This article was posted by Vik Paruchuri.
Working with large JSON datasets can be a pain, particularly when they are too large to fit into memory. In cases like this, a combination of command line tools and Python can make for an efficient way to explore and analyze the data. In this post, we’ll look at how to leverage tools like Pandas to explore and map out police activity in Montgomery County, Maryland. We’ll start with a look at the JSON data, then segue into exploration and analysis of the JSON with Python.
When data is stored in SQL databases, it tends to follow a rigid structure that looks like a table. Here’s an example from a SQLite database:
id|code|name|area|area_land|area_water|population|population_growth|birth_rate|death_rate|migration_rate|created_at|updated_at 1|af|Afghanistan|652230|652230|0|32564342|2.32|38.57|13.89|1.51|2015-11-01 13:19:49.461734|2015-11-01 13:19:49.461734 2|al|Albania|28748|27398|1350|3029278|0.3|12.92|6.58|3.3|2015-11-01 13:19:54.431082|2015-11-01 13:19:54.431082 3|ag|Algeria|2381741|2381741|0|39542166|1.84|23.67|4.31|0.92|2015-11-01 13:19:59.961286|2015-11-01 13:19:59.961286
As you can see, the data consists of rows and columns, where each column maps to a defined property, like
code. In the dataset above, each row represents a country, and each column represents some fact about that country.
But as the amount of data we capture increases, we often don’t know the exact structure of the data at the time we store it. This is called unstructured data. A good example is a list of events from visitors on a website. Here’s an example of a list of events sent to a server:
Python has great JSON support, with the json library. We can both convert lists and dictionaries to JSON, and convert strings to lists and dictionaries. JSON data looks much like a dictionary would in Python, with keys and values stored.
In this post, we’ll explore a JSON file on the command line, then import it into Python and work with it using Pandas.
Table of Contents:
- The dataset
- Exploring the JSON data
- Extracting information on the columns
- Extracting the data
- Reading the data into Pandas
- Converting columns
- Subsetting the stops
To read the full article, click here.
Top DSC Resources
- Article: What is Data Science? 24 Fundamental Articles Answering This Question
- Article: Hitchhiker’s Guide to Data Science, Machine Learning, R, Python
- Tutorial: Data Science Cheat Sheet
- Tutorial: How to Become a Data Scientist – On Your Own
- Categories: Data Science – Machine Learning – AI – IoT – Deep Learning
- Tools: Hadoop – DataViZ – Python – R – SQL – Excel
- Techniques: Clustering – Regression – SVM – Neural Nets – Ensembles – Decision Trees
- Links: Cheat Sheets – Books – Events – Webinars – Tutorials – Training – News – Jobs
- Links: Announcements – Salary Surveys – Data Sets – Certification – RSS Feeds – About Us
- Newsletter: Sign-up – Past Editions – Members-Only Section – Content Search – For Bloggers
- DSC on: Ning – Twitter – LinkedIn – Facebook – GooglePlus
Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge