Summary: This blog post details R data.table programming for handling multi-gigabyte data. It shows how the data can be efficiently loaded, "normalized", and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed.
In my travels over the holidays, I…
Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record IDs. In Python-Pandas, such vectors are implemented via the powerful Pandas index construct; in R-data.table, they're accessible through the "which" and "row.name" functions. In both cases, joining against a record-ID vector gives fast access to the matching subset of rows.
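As a rough illustration of the Pandas side of this idea, here is a minimal sketch. The toy data, the column names (`state`, `amount`), and the variable names are all hypothetical and not taken from the original post; it only shows one way record-ID vectors might be built from an index, combined, and used to subset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "state": ["MN", "WI", "MN", "IA", "WI"],
        "amount": [10, 25, 40, 5, 60],
    },
    index=pd.RangeIndex(5, name="record_id"),
)

# Capture each filter once as a reusable vector of record IDs.
mn_ids = df.index[(df["state"] == "MN").to_numpy()]
big_ids = df.index[(df["amount"] >= 25).to_numpy()]

# ID vectors can be combined with set operations on the index.
mn_and_big = mn_ids.intersection(big_ids)

# Label-based lookup against the index returns the subsetted rows.
subset = df.loc[mn_and_big]
print(subset)
```

Keeping the filters as ID vectors, rather than as repeated boolean expressions, means each one can be named, stored, and recombined without rescanning the data.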
How is the record-id vector approach helpful? For starters, the analyst can encapsulate common…
I recently came across an interesting account by a practical data scientist on how to munge 25 TB of data. What caught my eye at first was the article's title: "Using AWK and R to parse 25tb". I'm a big R user now and made a living with AWK 30 years ago as a budding data analyst. I also empathized with the author's recountings of…