
Reticulating Python and R — the American Community Survey Data Dictionary to Meta Data III.


Data Dictionary to Meta Data III is the third and final post in a series demonstrating how to automate meta data creation for the American Community Survey 2012-2016 household data set from a published data dictionary. Part I (DDMDI) was a teaser showing how Python could generate R statements that could then be cut and pasted into an R Jupyter notebook to create factor variables from integers. Part II took the work a step further, implementing the entire wrangling and data transformation exercise in R. Part III details how Python and R can interoperate in a single R-kernel notebook to produce the desired results, in this case by leveraging the R package reticulate.
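To make the Part I idea concrete, here is a minimal Python sketch of generating an R factor() statement from a code-to-label mapping. It is not the code from the original notebooks: the helper names (r_string, factor_statement), the data frame name household, the column TENURE, and its labels are all hypothetical and chosen only for illustration.

```python
def r_string(text):
    """Quote a Python string as an R string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def factor_statement(frame, column, levels):
    """Build one R assignment that recodes an integer column as a factor.

    `levels` maps each integer code to its label, in display order.
    """
    codes = ", ".join(str(code) for code in levels)
    labels = ", ".join(r_string(label) for label in levels.values())
    return (
        f"{frame}${column} <- factor({frame}${column}, "
        f"levels = c({codes}), labels = c({labels}))"
    )


# Hypothetical dictionary entry; the codes and labels are illustrative only.
tenure_levels = {1: "Owned", 2: "Rented", 3: "Occupied without rent"}
statement = factor_statement("household", "TENURE", tenure_levels)
print(statement)
```

The printed line is plain R text, so it can be pasted into an R notebook cell as described above. Generating one statement per dictionary entry in a loop would cover every coded variable.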

Python and R are the leading languages of data science, and their ever-increasing interoperation is of significant benefit to the DS community. The rpy2 library offers Python programmers access to R data and functions, while the R package reticulate allows for: “Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session….(and for) translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).”
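As a rough illustration of the "sourcing Python scripts" route, the sketch below shows the kind of Python function that could live in a file loaded from R with reticulate's source_python(). The function name parse_levels and the simplified "code .label" line format are assumptions made for this example, not the actual ACS dictionary layout or the author's script.

```python
def parse_levels(lines):
    """Collect 'code .label' lines into parallel lists of codes and labels.

    Assumes a simplified, hypothetical layout in which each level line
    looks like '<integer code> .<label>'; every other line is ignored.
    """
    codes, labels = [], []
    for line in lines:
        code, sep, label = line.strip().partition(" .")
        if not sep or not code.isdigit():
            continue  # skip headers, descriptions, and non-integer codes
        codes.append(int(code))
        labels.append(label.strip())
    return {"code": codes, "label": labels}


# Hypothetical dictionary fragment, for illustration only.
sample = [
    "TENURE 1",
    "    Tenure (hypothetical entry)",
    "        1 .Owned",
    "        2 .Rented",
    "        b .N/A",
]
parsed = parse_levels(sample)
print(parsed)
```

Returning a plain dict of lists keeps the Python side free of third-party dependencies. When called from R through reticulate, such a dict is expected to arrive as a named list, which R code can then turn into factor levels and labels.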

Major development platforms like Jupyter Notebook and RStudio with R Markdown embed additional capabilities that ease interoperation between Python and R. Don’t be surprised by the emergence of totally seamless and transparent “polyglot” development environments in the very near future.

See the remainder of the blog here.

See the source_python file here.