
Steve Miller's Blog (42)

Working with Control Breaks Data in R.

Added by Steve Miller on November 4, 2019 at 9:04am

AWK -- a Blast from Wrangling Past.

I recently came across an interesting account by a practical data scientist on how to munge 25 TB of data. What caught my eye at first was the article's title: "Using AWK and R to parse 25tb". I'm a big R user now and made a living with AWK 30 years ago as a budding data analyst. I also empathized with the author's recountings of…

Added by Steve Miller on September 21, 2019 at 5:30am

Jobs, Unemployment and 45's Performance.

Despite the all-consuming controversy surrounding his presidency, POTUS 45 has secured solid ratings on the performance of the economy over the first 30 months of his administration. And he certainly isn't bashful about taking credit for the successes, opining loudly and often that his tax cuts and deregulation initiatives…

Added by Steve Miller on September 4, 2019 at 8:39am

Using Python and R to Load Relational Database Tables, Part II

Last time, I wrote about using Python/Pandas as an adjunct for loading PostgreSQL tables. In this sequel, I demo how R can be used to collaborate with the database in…

Added by Steve Miller on August 8, 2019 at 6:22am

Using Python and R to Load Relational Database Tables, Part I

I enjoy data-prep munging for analyses with computational platforms such as R, Python-Pandas, Julia, Apache Spark, and even relational databases. The wrangling cycle provides an opportunity to get a feel for, and preliminarily explore, data that will later be analyzed and modeled.

A critical task I prefer handling computationally rather than in the database is…

Added by Steve Miller on July 30, 2019 at 6:34am

Writing/Reading Large R dataframes/data.tables -- Addendum.

After posting my most recent blog using …

Added by Steve Miller on July 2, 2019 at 9:00am

Writing/Reading Large R dataframes/datatables.

I recently downloaded a 5-year Public Use Microdata Sample (PUMS) from the latest release of the American Community Survey (ACS) census data. The data contain a wealth of demographic information on both American households and…

Added by Steve Miller on June 24, 2019 at 12:42pm

Simulated Significance

I pulled out a dusty copy of Think Stats by Allen Downey the other day. I highly recommend this terrific little read, which teaches statistics with easily understood examples in Python. When I purchased the book eight years ago, the Python code proved invaluable as…

Added by Steve Miller on May 30, 2019 at 7:56am

Nowcasting Chicago Crime with Python-Pandas, and R.

In my many years as a data scientist, I've spent more time doing forecast work than any other type of predictive modeling. More often than not, the challenges have involved forecasting demand for an organization's many products/lines of business a year or more out, based on five or more years of actual data, generally of daily…

Added by Steve Miller on May 7, 2019 at 5:34am

Frequencies in Pandas Redux

A little less than a year ago, I posted a blog on generating multivariate frequencies with the Python Pandas data management library, at the same time showcasing Python/R graphics interoperability. For my…

Added by Steve Miller on April 25, 2019 at 5:33am

March Madness, KenPom and Python/Pandas.

March Madness officially arrived at 6 PM CDT, Sunday 3/17/2019. Sixty-eight Division I schools (32 league champions and 36 at-large selections) received invitations to this year's tournament, which starts…

Added by Steve Miller on March 18, 2019 at 5:35am

A Blast from Python Past -- Part 3

Last time, I posted Part 2 of a blog trilogy on data programming with Python. That article revolved around showcasing …

Added by Steve Miller on March 4, 2019 at 9:01am

A Blast from Python Past -- Part 2

Last week I posted the first of a three-part series on basic data programming with Python. For that article, I resurrected scripts written 10 years ago that deployed core Python data structures and functions to assemble a Python list for…

Added by Steve Miller on February 5, 2019 at 7:55am

A Blast from Python Past

I had an interesting discussion with one of my son's friends at a neighborhood gathering over the holidays. He's just reached the halfway point of a Chicago-area Master's in Analytics program and wanted to pick my brain on the state of the discipline.

Of the four major program foci of business, data, computation, and algorithms, he acknowledged…

Added by Steve Miller on January 28, 2019 at 8:24am

Kicking Chicago with R.

Like most Chicago football fans, I was pretty distraught after the Bears lost last Sunday's playoff game courtesy of a missed field goal at the end -- a kick that first hit the goalpost and then the crossbar before ultimately failing miserably. While most local fans were grief-stricken like me, some were irrationally inconsolable, demanding the…

Added by Steve Miller on January 11, 2019 at 8:22am

XGBoost with Python -- Part 0

After posting my last blog, I decided to do a two-part series next on …

Added by Steve Miller on December 20, 2018 at 7:30am

A So-So Second Date with Julia

A few months ago, I wrote a quite positive blog on the Julia analytics language,…

Added by Steve Miller on December 12, 2018 at 11:49am

Matching the Exact Matching of MatchIt

I started a series on causal inference for data science a few weeks back. I think CI methodologies offer great potential for the DS discipline, given that much of our data is observational, i.e., outside…

Added by Steve Miller on November 19, 2018 at 1:31pm

POTUS and the Stock Market

For those who follow the stock market, October's been a pretty rough month, with overall market levels, as measured by major indexes such as the Russell 3000 and the Wilshire 5000, now down 10 percent, into correction territory. The declines, unfortunately, closely follow a …

Added by Steve Miller on October 30, 2018 at 8:35am

Mixing & Matching in R for Data Science

I've spent time over the last few months attempting to enhance my skills in the statistical sub-field of causal inference.

To oversimplify, causal inference comprises a set of methodologies and techniques that help analysts make the jump from association or correlation to cause and effect. How can one progress from noting a correlation between factors…

Added by Steve Miller on October 10, 2018 at 10:00am


© 2019 Data Science Central ®
