Added by steve miller on December 10, 2020 at 4:38am — No Comments
At dinner with friends last Sunday, the topic of conversation fixated on -- what else -- the upcoming presidential election. That morning, a poll had been released by the …
Added by steve miller on October 13, 2020 at 6:00am — 1 Comment
Summary: This blog is part III of a series showcasing management and analytics of the daily World Covid-19 case/death data published by the Center for Systems Science and…
Added by steve miller on August 27, 2020 at 3:54am — No Comments
Summary: This blog is part II of a series showcasing management and analytics of the daily U.S. Covid-19 case/death data published by the Center for Systems Science and Engineering at Johns…
Added by steve miller on June 2, 2020 at 8:33am — No Comments
Summary: This blog showcases the handling of daily data of cases/deaths from Covid-19 in the U.S. published by the …
Added by steve miller on May 6, 2020 at 8:27am — No Comments
A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my then consulting company, I was asked to name what I considered the most important analytic technique. Though it surprised the audience, my answer, counts and frequencies, was a no-brainer for…
Added by steve miller on March 11, 2020 at 10:30am — No Comments
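Counts and frequencies need very little machinery. As a minimal sketch, assuming plain Python with only the standard library (the helper name `frequencies` and the sample values are hypothetical, not taken from the original post), a one-way frequency table with percentages can be built from `collections.Counter`:

```python
from collections import Counter


def frequencies(values):
    """Return (value, count, percent) tuples, most frequent first.

    Ties keep first-appearance order, as Counter.most_common does.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100.0 * n / total) for value, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    colors = ["red", "blue", "red", "green", "blue", "red"]
    for value, n, pct in frequencies(colors):
        print(f"{value}: {n} ({pct:.1f}%)")
```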
Summary: It's no secret that Python-Pandas is central to data management for analytics and data science today. Indeed, what we're seeing now is Pandas being extended to handle ever-larger data. Underappreciated is that Pandas is a tunable platform, supporting its own datatypes as well as those from the numerical library NumPy. Together, these comprise…
Added by steve miller on February 18, 2020 at 4:46am — 5 Comments
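Tuning Pandas datatypes can be sketched as follows, assuming pandas and NumPy are installed. The helper `shrink` and its `max_cat_ratio` threshold are hypothetical illustrations, not the approach from the original post: integer columns are narrowed with `pd.to_numeric(..., downcast="integer")`, which only picks a smaller type that holds every value, and low-cardinality string columns are converted to the Pandas `category` datatype.

```python
import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, max_cat_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df using smaller datatypes where the change is lossless.

    max_cat_ratio is a hypothetical threshold: a string column becomes
    categorical only if its distinct values are at most this share of rows.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Smallest signed integer type that still holds every value.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_string_dtype(s) and s.nunique() <= max_cat_ratio * len(s):
            # Each distinct string is stored once; rows hold small integer codes.
            out[col] = s.astype("category")
    return out


if __name__ == "__main__":
    # Hypothetical sample data.
    df = pd.DataFrame({"n": np.arange(1_000), "state": ["IL", "WI"] * 500})
    small = shrink(df)
    print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
    print(small.dtypes)
```

Float columns are deliberately left alone here: downcasting float64 to float32 can lose precision, so it is a judgment call best made column by column.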
Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, "normalized", and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed.
In my travels over the holidays, I…
Added by steve miller on January 15, 2020 at 5:29am — No Comments
Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record-ids. In Python-Pandas, such vectors are implemented via Pandas's powerful index construct; in R-data.table, they're accessible through the "which" and "row.names" functions. In both instances, joins to record-id vectors generate fast subsetted access.
How is the record-id vector approach helpful? For starters, the analyst can encapsulate common…
Added by steve miller on December 13, 2019 at 5:51am — No Comments
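A minimal Pandas sketch of the record-id idea, assuming a hypothetical frame keyed by a `rid` column: each common filter is evaluated once into an `Index` of record-ids, the id vectors are combined with set operations, and `.loc` performs the label lookup that subsets the data. The helper `ids_where` and the sample data are illustrative only, not code from the original post.

```python
import pandas as pd


def ids_where(df: pd.DataFrame, mask: pd.Series) -> pd.Index:
    """Return the record-ids (index labels) of the rows where mask is True."""
    return df.index[mask.to_numpy()]


# Hypothetical sample data keyed by a record-id column.
df = pd.DataFrame(
    {
        "rid": [101, 102, 103, 104, 105],
        "state": ["IL", "WI", "IL", "IN", "IL"],
        "sales": [10, 25, 7, 40, 13],
    }
).set_index("rid")

# Evaluate each common filter once and keep only its vector of record-ids.
il_ids = ids_where(df, df["state"] == "IL")  # 101, 103, 105
big_ids = ids_where(df, df["sales"] > 9)     # 101, 102, 104, 105

# Combine the id vectors with set operations, then subset by label lookup.
both = il_ids.intersection(big_ids)          # 101, 105
subset = df.loc[both]
print(subset)
```

Because the id vectors are plain `Index` objects, they can be saved, reused, and combined (`intersection`, `union`, `difference`) without re-evaluating the original conditions.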
Added by steve miller on November 4, 2019 at 9:04am — No Comments
I recently came across an interesting account by a practical data scientist of how to munge 25 TB of data. What first caught my eye was the article's title: "Using AWK and R to parse 25tb". I'm a big R user now, and I made a living with AWK 30 years ago as a budding data analyst. I also empathized with the author's recounting of…
Added by steve miller on September 21, 2019 at 5:30am — 2 Comments
Despite the consuming controversy surrounding his presidency, POTUS 45 has been able to secure solid ratings on the performance of the economy over the first 30 months of his administration. And he certainly isn't bashful about taking credit for the successes, opining loudly and often that his tax cuts and de-regulation initiatives…
Added by steve miller on September 4, 2019 at 8:39am — No Comments
Last time I wrote about using Python/Pandas as an adjunct for loading PostgreSQL tables. In this sequel, I demo how R can be used to collaborate with the database in…
Added by steve miller on August 8, 2019 at 6:22am — No Comments
I enjoy data-prep munging for analyses with computational platforms such as R, Python-Pandas, Julia, Apache Spark, and even relational databases. The wrangling cycle provides the opportunity to get a feel for, and preliminarily explore, data that are later to be analyzed/modeled.
A critical task I prefer to handle on a computational platform rather than in the database is…
Added by steve miller on July 30, 2019 at 6:34am — 1 Comment
Added by steve miller on July 2, 2019 at 9:00am — No Comments
I recently downloaded a 5-year Public Use Microdata Sample (PUMS) from the latest release of the American Community Survey (ACS) census data. The data contain a wealth of demographic information on both American households and…
Added by steve miller on June 24, 2019 at 12:42pm — 1 Comment
I pulled out a dusty copy of Thinking Stats by Allen Downey the other day. I highly recommend this terrific little read that teaches statistics with easily understood examples using Python. When I purchased the book eight years ago, the Python code proved invaluable as…
Added by steve miller on May 30, 2019 at 7:56am — No Comments
In my many years as a data scientist, I've spent more time doing forecast work than any other type of predictive modeling. More often than not, the challenges have involved forecasting demand for an organization's many products/lines of business a year or more out based on five or more years of actual data, generally of daily…
Added by steve miller on May 7, 2019 at 5:34am — 1 Comment
A little less than a year ago, I posted a blog on generating multivariate frequencies with the Python Pandas data management library, at the same time showcasing Python/R graphics interoperability. For my…
Added by steve miller on April 25, 2019 at 5:33am — No Comments
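Multivariate frequencies with Pandas can be sketched, assuming pandas is installed, by counting every observed combination of a set of columns with `groupby(...).size()` and adding percentages; for a two-way grid, `pd.crosstab` produces the same counts. The helper name `multi_freqs` and the sample data are hypothetical, and this is not necessarily the code from the original post.

```python
import pandas as pd


def multi_freqs(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Count every observed combination of cols, with percents, largest first."""
    out = (
        df.groupby(cols, dropna=False)  # keep missing values as their own group
        .size()
        .reset_index(name="count")
        .sort_values("count", ascending=False, kind="stable")
        .reset_index(drop=True)
    )
    out["percent"] = 100.0 * out["count"] / out["count"].sum()
    return out


if __name__ == "__main__":
    # Hypothetical sample data.
    df = pd.DataFrame(
        {
            "sex": ["F", "M", "F", "F", "M"],
            "agegrp": ["18-34", "35-54", "18-34", "55+", "18-34"],
        }
    )
    print(multi_freqs(df, ["sex", "agegrp"]))
```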
March Madness officially arrived at 6 PM CDT, Sunday 3/17/2019. Sixty-eight D1 schools -- 32 league champions and 36 at-large selections -- received invitations to this year's tournament, which starts…
Added by steve miller on March 18, 2019 at 5:35am — No Comments
© 2021 TechTarget, Inc.