steve miller
  • Male
  • Chicago, IL
  • United States

Steve miller's Friends

  • Kleanthis Koupidis
  • paul gureghian


steve miller's Page

Latest Activity

Ankit DS commented on steve miller's blog post R, Python, Julia -- and Polyglot
"@steve -Steve i am running your post today, i have recently started using Julia, your comment - A new competitor such as Julia is considerably behind from the get-go, remaining so until it can both attain a noticeable programmer presence and…"
May 13
steve miller liked steve miller's blog post Johns Hopkins Covid-19 Data and R, Part I -- data.table handling.
May 7
steve miller's blog post was featured

Johns Hopkins Covid-19 Data and R, Part I -- data.table handling.

Summary: This blog showcases the handling of daily data of cases/deaths from Covid-19 in the U.S. published by the Center for Systems Science and Engineering at Johns Hopkins University. The technology deployed to manage and explore the data is R along with…
May 6
steve miller's blog post was featured

Multi-Dimensional Frequencies with R data.table.

A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my then consulting company, I was asked to name what I considered the most important analytic technique. Though a surprise to the audience, my answer, counts and frequencies, was a no-brainer for me. I've made a living doing data analysis over the…
Mar 12
Hugo Bertini liked steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
Mar 1
Lance Norskog commented on steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
"I have not used VAEX, just saw it in the blizzard of numerical libraries :)"
Feb 24
steve miller commented on steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
"thanks for bringing VAEX to my attention. have you worked with it? is it stable? performant? I'll definitely take a look."
Feb 22
steve miller commented on steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
"I'm addressing category vars in a future blog. There can be storage and performance advantages for both R factors and Pandas categories even when the number of levels is in the thousands."
Feb 22
Lance Norskog commented on steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
"At some point VAEX might be what you want: Pandas API for data stored directly in memory-mapped files. https://vaex.readthedocs.io/en/latest/"
Feb 21
Subhash mantha commented on steve miller's blog post Dataframe Storage Efficiency in Python-Pandas
"There are additional benefits identifying a column as categorical when there are repeating values and the number of distinct values of the column is small. Most of the times the size of numeric data types is set to bit version of OS and it would…"
Feb 21
steve miller's blog post was featured

Dataframe Storage Efficiency in Python-Pandas

Summary: It's no secret that Python-Pandas is central to data management for analytics and data science today. Indeed, what we're seeing now is Pandas being extended to handle ever-larger data. Underappreciated is that Pandas is a tunable platform, supporting its own datatypes as well as those from numerical library Numpy. Together, these comprise a quite granular…
Feb 20
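
The teaser above is truncated, so the post's own code is not shown here. As a minimal sketch of the kind of dtype tuning it describes, the following assumes a made-up frame with hypothetical `age` and `state` columns, downcasts the integer column, converts the repeating strings to a Pandas category, and compares the memory footprint:

```python
import numpy as np
import pandas as pd


def build_frame(n=100_000, seed=0):
    """Build an illustrative frame; column names and values are hypothetical."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(0, 100, size=n),                     # 64-bit integers by default
        "state": rng.choice(["IL", "OH", "NY", "CA"], size=n),   # repeating strings
    })


def shrink(df):
    """Return a copy that uses narrower dtypes for the same values."""
    out = df.copy()
    # Values 0..99 fit in the smallest unsigned integer type.
    out["age"] = pd.to_numeric(out["age"], downcast="unsigned")
    # A handful of distinct strings is stored once, plus small integer codes.
    out["state"] = out["state"].astype("category")
    return out


df = build_frame()
small = shrink(df)

# deep=True asks Pandas to count the string payloads, not just the pointers.
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"before: {before:,} bytes; after: {after:,} bytes")
```

The exact savings depend on the data and the Pandas version, so no figures are quoted here.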
Cezar Baisanu liked steve miller's blog post Multi Gigabyte R data.table for Ohio Voter Registration/History
Jan 16
steve miller's blog post was featured

Multi Gigabyte R data.table for Ohio Voter Registration/History

Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, "normalized", and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed. In my travels over the holidays, I came across an …
Jan 15
steve miller's blog post was featured

Using "record id's" to facilitate processing in Python-Pandas and R-data.table.

Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record-id's. In Python-Pandas, such vectors are implemented via Pandas's powerful index construct; in R-data.table, they're accessible through the "which" and "row.name" functions. In both instances, joins to record-id vectors generate fast subsetted access. How is the record-id vector approach helpful? For starters, the analyst can encapsulate common subsetting conditions once and use many…
Dec 15, 2019
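
The record-id idea in the teaser above can be sketched in Pandas roughly as follows. This is an illustration under assumptions, not the post's own code: the frame and its `state`, `party`, and `age` columns are hypothetical, and the default integer index stands in for the record ids.

```python
import pandas as pd

# Hypothetical data; the default RangeIndex (0..5) serves as the record id.
df = pd.DataFrame({
    "state": ["IL", "OH", "IL", "NY", "OH", "IL"],
    "party": ["D", "R", "R", "D", "D", "R"],
    "age": [34, 51, 27, 63, 45, 58],
})

# Encapsulate each subsetting condition once, as an Index of record ids.
il_ids = df.index[df["state"] == "IL"]
over40_ids = df.index[df["age"] > 40]

# Reuse the saved ids: combine them with set operations on the Index...
il_over40 = df.loc[il_ids.intersection(over40_ids)]

# ...or pass them straight to .loc for label-based access to a subset.
il_mean_age = df.loc[il_ids, "age"].mean()

print(il_over40)
print(il_mean_age)
```

Saving the ids rather than the boolean masks keeps each condition small and lets later steps combine conditions with `intersection`, `union`, or `difference` on the Index.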
steve miller's blog post was featured

Working with Control Breaks Data in R.

About a year ago, a young neighbor who's enrolled in an MS in Data Science program asked my help on an R coding exercise. The challenge was to compute several new category attributes based on columns in an initially loaded dataframe. His solution was to loop through each of the df rows, populating the new vars with basic if/then logic. Kind of reminded me of how I might…
Nov 5, 2019

Profile Information

Job Title:
consultant
Seniority:
Consultant
Industry:
Consulting
Short Bio:
40 years' experience in consulting services surrounding BI, statistics, analytics, and data science. Most recent position was President of Inquidia Consulting and EVP of BI/Analytics at Braun Consulting. Have been a writer for information management, dataversity, and beyenetwork for 12 years.
LinkedIn Profile:
https://www.linkedin.com/in/steve-miller-58ab881/
Interests:
Other

Steve miller's Blog

Johns Hopkins Covid-19 Data and R, Part I -- data.table handling.

Posted on May 6, 2020 at 8:27am 0 Comments

Summary: This blog showcases the handling of daily data of cases/deaths from Covid-19 in the U.S. published by the …


Multi-Dimensional Frequencies with R data.table.

Posted on March 11, 2020 at 10:30am 0 Comments

A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my then consulting company, I was asked to name what I considered the most important analytic technique. Though a surprise to the audience, my answer, counts and frequencies, was a no-brainer for…


Dataframe Storage Efficiency in Python-Pandas

Posted on February 18, 2020 at 4:46am 5 Comments

Summary: It's no secret that Python-Pandas is central to data management for analytics and data science today. Indeed, what we're seeing now is Pandas being extended to handle ever-larger data. Underappreciated is that Pandas is a tunable platform, supporting its own datatypes as well as those from numerical library Numpy. Together, these comprise…


Multi Gigabyte R data.table for Ohio Voter Registration/History

Posted on January 15, 2020 at 5:29am 0 Comments

Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, "normalized", and counted. Readers can readily copy and enhance the code below for their own analytic needs. An intermediate level of R coding sophistication is assumed.

In my travels over the holidays, I…


© 2020 Data Science Central ®