Steve Miller's Blog (48)

Kicking Chicago with R.

Like most Chicago football fans, I was pretty distraught after the Bears lost last Sunday's playoff game courtesy of a missed field goal at the end -- a kick that first hit the goalpost and then the crossbar before ultimately failing miserably. While most local fans were grief-stricken like me, some were irrationally inconsolable, demanding the…

Added by steve miller on January 11, 2019 at 8:22am — No Comments

XGBoost with Python -- Part 0

After posting my last blog, I decided next to do a 2-part series on …

Added by steve miller on December 20, 2018 at 7:30am — 2 Comments

A So-So Second Date with Julia

A few months ago, I wrote a quite positive blog on the Julia analytics language,…

Added by steve miller on December 12, 2018 at 11:49am — No Comments

Matching the Exact Matching of MatchIt

I started a series on causal inference for data science a few weeks back. I think CI methodologies offer great potential for the DS discipline, given that much of our data is observational, i.e., outside…

Added by steve miller on November 19, 2018 at 1:31pm — No Comments

POTUS and the Stock Market

For those who follow the stock market, October's been a pretty rough month, with overall market levels, as measured by major indexes such as the Russell 3000 and the Wilshire 5000, now down into correction territory of 10 percent declines. The falls, unfortunately, closely follow a …

Added by steve miller on October 30, 2018 at 8:35am — No Comments

Mixing & Matching in R for Data Science

I've spent time over the last few months attempting to enhance my skills in the statistical sub-field of causal inference.

Overly simplified, causal inference comprises a series of methodologies and techniques to assist analysts in making the jump from association or correlation to cause and effect. How can one progress from noting a correlation between…

Added by steve miller on October 10, 2018 at 10:00am — 5 Comments

R, Python, Julia -- and Polyglot

A poll released recently showed Python increasing its lead over R as the language of choice for analytics professionals. Setting aside questions of the representativeness to the analytics practitioner population of…

Added by steve miller on September 24, 2018 at 10:55am — 5 Comments

A Little College Sports Analysis, Part III.

This is the third and (I promise) last of the series "A Little College Sports Analysis", wherein I attempt to use data from the Learfield Directors' Cup to evaluate the prowess of college athletic conferences. The …

Added by steve miller on September 10, 2018 at 11:21am — No Comments

A Little College Sports Analysis II -- 2017-2018 Directors' Cup Conference Rankings

Last time, I wrote on wrangling data from a pdf file to assemble a data set of D1 college athletic performance in the Learfield Directors' Cup competition. In this blog, I embellish that data, calculating individual school ranks from scores…

Added by steve miller on August 27, 2018 at 10:53am — No Comments

A Little College Sports Analysis, but First a Little Data Wrangling

I'm a big college sports fan, especially active in debates about which D1 conference is best. Five years ago, I came across the Learfield Directors' Cup, an annual evaluation/ranking program of college sports performance based on hard numbers. In separate rankings, Division I, II, and…

Added by steve miller on August 14, 2018 at 12:35pm — No Comments

A Little SQL with a Little R

My nephew's a very impressive young man. Five years ago, he received a PhD in Biochemistry/Molecular Biology from a prestigious university, earning numerous teaching and research awards along the way. He then took a faculty…

Added by steve miller on July 16, 2018 at 12:02pm — 6 Comments

ff and Too-Big-for-Memory Data in R -- Part III

After my last blog on the use of relational databases PostgreSQL and MonetDB to help compensate for R's RAM limitations, I received an email from a reader who asked if I'd ever used the R …

Added by steve miller on July 2, 2018 at 11:30am — No Comments

PostgreSQL, MonetDB, and Too-Big-for-Memory Data in R -- Part II

In PostgreSQL, MonetDB, and Too-Big-for-Memory Data in R -- Part I, I began to discuss how data that was too big for RAM is handled in R, a memory-constrained statistical platform. I attempted to demonstrate the potential of working…

Added by steve miller on June 13, 2018 at 10:00am — No Comments

PostgreSQL, MonetDB, and Too-Big-for-Memory Data in R -- Part I

In a blog from three months ago, I wrote on "kind of" big data in R. The "kind of" was meant as a caveat that data size in R is limited by RAM. I also mentioned the potential of working with relational data stores for even larger data, and made a vague proposal to…

Added by steve miller on June 4, 2018 at 11:00am — 2 Comments

Poker, Probability, Monte Carlo, and R

My daughter just started a business analytics Master's program. For the probability sequence of the core statistics course, one of her assignments is to calculate the probability of single 5-card draw poker hands from a 52-card…

Added by steve miller on May 23, 2018 at 11:30am — 2 Comments

Frequencies in Pandas -- and a Little R Magic for Python

I've got a big digital mouth. Last time, I wrote on frequencies using R, noting cavalierly that I'd done similar development in Python/Pandas. I wasn't lying, but the pertinent work I dug up from…

Added by steve miller on May 16, 2018 at 12:30pm — No Comments

My Favorite Statistical Procedure? Frequencies!

At a conference I attended a few years ago, a data scientist in a roundtable discussion replied to a question about what she considered the most important mathematical function in her work with: "the division operator". That clever response provided grist for my later answer to a similar question on my favorite statistical procedure: "frequencies and…

Added by steve miller on May 1, 2018 at 8:00am — 1 Comment

Reticulating Python and R -- the American Community Survey Data Dictionary to Meta Data III.

Data Dictionary to Meta Data III is the third and final blog devoted to demonstrating the automation of meta data creation for the American Community Survey 2012-2016 household data set, using a published data dictionary. DDMDI was a teaser to show how Python could be used to generate R statements that could in turn be cut/pasted/applied in an R Jupyter notebook to…

Added by steve miller on April 25, 2018 at 9:00am — No Comments

Data Dictionary to Meta Data II -- Simple Text Wrangling and Factor Creation in R

My blog last week articulated a first shot at automating the creation of meta data…

Added by steve miller on April 16, 2018 at 6:30am — No Comments

From Data Dictionary to Meta Data with Simple Text Wrangling in Python

My last DSC blog left me a bit disappointed. While the loads of the beefy household and population files for the American Community Survey worked well, the data, just about entirely integer, represents categorical attributes whose meta info is not…

Added by steve miller on April 11, 2018 at 12:00pm — No Comments

© 2020 Data Science Central ®
