I prefer to structure my code the same way as an article, and if you have an academic background as well, you can probably relate. I usually start with a preamble, where I put all the packages and toolkits I would like to use. The main part follows, and then the rest (for example, where the results of the model should go, and all the connections). Especially if you are just getting started with Python, I recommend structuring and commenting your code as clearly as possible. Jupyter already offers several options for headings and comments in the notebook. A clear structure and sufficient comments help you write code that you can easily recap (and follow) months or even years later. In my experience, good code is often (partly) recycled.

In this blog post, I summarize what I call (1) the general preamble and (2) the visualization preamble. I also post (3) my map preamble and (4) my analysis preamble, but I will cover them in more detail in forthcoming posts.

1. General preamble

import pandas as pd

import numpy as np

import datetime as dt

I always need pandas and numpy, so I always start with these two. I often work with time series, which is why I need the third line. With datetime you can define dates, set an index on a date column, and do all other date-related manipulations.
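As a minimal sketch of these date manipulations, here is a small made-up data frame. The column names `date` and `value` and the dates themselves are only assumptions for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Define a start date with datetime (hypothetical date, for illustration only)
start = dt.date(2019, 1, 1)

# Build a small daily series; 'date' and 'value' are made-up column names
df = pd.DataFrame({
    "date": pd.date_range(start, periods=6, freq="D"),
    "value": np.arange(6, dtype=float),
})

# Set the index on the date column so that pandas treats the data as a time series
df = df.set_index("date")

# Date-based selection: label slicing on a date index includes the end date
first_days = df.loc[:"2019-01-03"]

# Date-based aggregation: sum the values over two-day windows
two_day_sums = df.resample("2D").sum()

print(first_days)
print(two_day_sums)
```

Once the index is a date index, slicing and resampling work directly on calendar dates instead of row positions.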

2. Visualization

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

These lines are very general: seaborn and matplotlib together cover most visualizations of any kind. The %matplotlib inline line is a Jupyter magic command that displays the plots inside the notebook.

Then we can become more specific and define the style and context. I prefer ‘whitegrid’ and ‘talk’ and set them at the beginning as my personalized defaults. Refer to the seaborn documentation for details and further options.
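The code for this step is not shown in the post. A minimal sketch, assuming the standard seaborn functions `set_style` and `set_context` are what is meant, could look like this:

```python
import seaborn as sns

# Assumed defaults taken from the text: 'whitegrid' style and 'talk' context
sns.set_style("whitegrid")   # white background with grid lines
sns.set_context("talk")      # larger fonts and lines, suited for presentations

# Inspect the settings that are now active
current_style = sns.axes_style()
current_context = sns.plotting_context()
print(current_style["axes.grid"], current_context["font.size"])
```

Both settings stay active for every plot created afterwards in the same session, so they only need to be set once in the preamble.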



Furthermore, if you work for a company that has a corporate identity with predefined colors, it is useful to create your own palette with the company colors. The graphs then get these colors automatically (as long as you do not override them). The corporate identity colors at my company are ordered in an agreed way, so the order of the list defines the order in which the colors are applied automatically.

mycolors = ['#0076a7','#ae5b3a','#CD997C','#EDD195'] #list of colors, I prefer hex numbers for colors

sns.set_palette(mycolors) # define the palette

sns.palplot(sns.color_palette()) # display
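To illustrate how the list order carries over to a plot, here is a small sketch that draws four made-up lines and reads back the colors they received. The data and the series labels are hypothetical.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import to_hex

mycolors = ['#0076a7', '#ae5b3a', '#CD997C', '#EDD195']
sns.set_palette(mycolors)  # the palette becomes the default color cycle

fig, ax = plt.subplots()
for i in range(len(mycolors)):
    # No color argument: each new line takes the next color from the palette
    ax.plot([0, 1], [i, i + 1], label=f"series {i + 1}")
ax.legend()

# Read back the colors that were actually used, in plotting order
used_colors = [to_hex(line.get_color()) for line in ax.get_lines()]
print(used_colors)
```

The first series gets the first color in the list, the second series the second, and so on, which is why the agreed ordering belongs in the list itself.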

3. Maps

Maps are a special kind of visualization and are worth a separate blog post (forthcoming). You will find my preamble for them below.

import folium

from folium.plugins import MarkerCluster  #if you want to cluster

from folium.plugins import MiniMap  # cool minimap in the lower right corner of the big map

from folium import plugins

from folium import FeatureGroup  #if you customize

4. Analysis

I am a time series econometrician, so my analysis preamble is heavily biased toward time series. You start a time series analysis with tests to understand the autocorrelation structure of your variables. I will go into more detail on data analysis with Python in forthcoming posts.

import statsmodels.api as sm

from statsmodels.tsa.stattools import adfuller  #Augmented Dickey-Fuller unit root test

from statsmodels.graphics.tsaplots import plot_acf #autocorrelation function

from statsmodels.graphics.tsaplots import plot_pacf  #partial autocorrelation

from statsmodels.tsa.api import VAR  #vector autoregression (DynamicVAR has been removed from recent statsmodels releases)

You will find many useful explanations and relevant packages on the statsmodels webpage.


Comment by Bradley Clay on October 25, 2019 at 5:22am

Thanks so much for the post.  I had not heard of folium before, wow! I look forward to your post on the time series.  
