A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my then consulting company, I was asked to name what I considered the most important analytic technique. Though a surprise to the audience, my answer, counts and frequencies, was a no-brainer for…
Summary: It's no secret that Python-Pandas is central to data management for analytics and data science today. Indeed, what we're seeing now is Pandas being extended to handle ever-larger data. Underappreciated is that Pandas is a tunable platform, supporting its own datatypes as well as those from the numerical library NumPy. Together, these comprise…
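To make the datatype point concrete, here is a minimal sketch, not the code from the post itself. It assumes a made-up data frame with hypothetical columns "state" and "count", and compares memory use before and after swapping in a Pandas-native datatype (category) and a narrower NumPy datatype (int8).

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only: 100,000 rows with a
# low-cardinality text column and a small-integer column.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "state": rng.choice(["IL", "CA", "NY", "TX"], size=n),
    "count": rng.integers(0, 100, size=n),
})

# Memory footprint with the default datatypes.
before = df.memory_usage(deep=True).sum()

# "category" is a Pandas datatype; np.int8 comes from NumPy.
# Values 0..99 fit comfortably in int8, so nothing is lost.
tuned = df.astype({"state": "category", "count": np.int8})
after = tuned.memory_usage(deep=True).sum()

print(f"bytes before: {before:,}  bytes after: {after:,}")
```

The exact byte counts depend on the Pandas version, so none are quoted here; the tuned frame should simply be noticeably smaller while holding the same values.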
Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, "normalized", and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed.
In my travels over the holidays, I…
Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record IDs. In Python-Pandas, such vectors are implemented via Pandas's powerful index construct; in R-data.table, they're accessible through the "which" and "row.name" functions. In both instances, joins to record-ID vectors generate fast subsetted access.
How is the record-ID vector approach helpful? For starters, the analyst can encapsulate common…
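The record-ID idea on the Pandas side might look like the following. This is a minimal sketch under stated assumptions, not the author's code: the data frame, the "record_id" index name, and the helper ids_where are all hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical data: the index plays the role of the record-ID vector.
df = pd.DataFrame(
    {"state": ["IL", "CA", "IL", "NY", "CA"],
     "amount": [10, 25, 40, 5, 60]},
    index=pd.Index([101, 102, 103, 104, 105], name="record_id"),
)

def ids_where(frame, mask):
    """Return the record IDs (index labels) of rows where mask is True."""
    return frame.index[mask.to_numpy()]

# Each filter is captured once as a vector of record IDs.
il_ids = ids_where(df, df["state"] == "IL")
big_ids = ids_where(df, df["amount"] >= 25)

# ID vectors combine with set operations, then select rows by label.
both = il_ids.intersection(big_ids)
subset = df.loc[both]
print(subset)
```

Because each filter is stored as a plain vector of IDs, it can be named, reused, and combined with other filters without re-evaluating the original conditions.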