
- Anyone can learn quantitative trading. You don’t need a PhD in Quantum Astrophysics to create quantitative trading systems or perform quantitative research
- The process of identifying a suitable trading strategy mirrors the scientific method: it requires forming hypotheses and making assumptions based on data to identify a statistical edge
- Quantitative research (data mining, hypothesis testing…) always precedes the backtesting of trading strategies

As a trading enthusiast, I have always wondered whether the best quant traders possessed predetermined trading strategies that they could use to consistently generate superior returns. I thought trading was as straightforward as solving an equation and using the solution to generate market-beating returns. After doing some research and chatting with a few professional quant traders, I started familiarizing myself with quantitative analysis techniques to get a better understanding of the entire quantitative research process.

Let’s look at what the quantitative research process looks like.

We’ll be analyzing the stock of one of the most popular companies in the world: Apple (ticker: AAPL).

Note: We will be using the research environment provided by QuantConnect to perform our research.

We start off our analysis by plotting the distribution of AAPL returns over the past 5000 days.

```python
# Import all the modules we will be using in our analysis
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = [11, 8]

qb = QuantBook()  # Open the QuantBook Analysis Tool from QuantConnect
aapl = qb.AddEquity("AAPL")  # Load AAPL historical data
history = qb.History(qb.Securities.Keys, 5000, Resolution.Daily)  # 5000 daily datapoints

# Drop the pandas 'symbol' index level
history = history.reset_index().drop('symbol', axis=1)

# Calculate AAPL returns and fill missing values
history['returns'] = (history['close'].pct_change() * 100).fillna(0)

sns.distplot(history['returns'], label='Distribution of AAPL returns')
plt.legend()
plt.show()
```

Figure 1–1: Histogram of the distribution of Apple’s daily returns

The next step would be to compare this distribution to a normal distribution. (A lot of models used in quantitative finance and statistics assume a normal or lognormal distribution)

Let’s generate some random data to plot the normal distribution.

```python
random = np.random.normal(scale=1.23, size=500000)
random_series = pd.Series(random)

sns.distplot(random_series, label='Returns sampled from normal distribution', color='blue')
plt.legend()
plt.show()
```

Figure 1–2: Histogram of the normal distribution (obtained by generating random data)

Now that we have plotted both distributions, let’s put them in one plot for comparison purposes.

Figure 1–3: AAPL distribution returns vs Random normal distribution

**Comparison Summary**

- The distribution of Apple stock daily returns resembles the normal distribution
- The distribution of Apple stock returns has “heavier tails”. In layman’s terms, we can expect outsized moves to the upside and the downside, more so than a normal distribution would suggest
- Statisticians often use the “kurtosis” of a distribution as a statistical measure to identify whether the tails of a given distribution contain extreme values
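As a quick check, scipy's `kurtosis` (which reports *excess* kurtosis, roughly 0 for a normal distribution) can quantify this. The sketch below uses a synthetic normal sample; in the notebook you would pass `history['returns']` instead.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
normal_sample = rng.normal(scale=1.23, size=500_000)

# Excess kurtosis of a normal sample is close to 0
print(kurtosis(normal_sample))
# In the notebook: kurtosis(history['returns']) -- heavy tails push this well above 0
```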

Jim Simons, arguably one of the most successful quant traders of all time, once said: “We search through historical data looking for anomalous patterns that we would not expect to occur at random.”

Let’s follow Jim’s advice and explore Apple’s historical data to see if we can uncover some interesting patterns. Let’s look at the hourly resolution data (typically hard to find for free but easily accessible through the QuantConnect platform).

```python
from datetime import timedelta

# 5000 days of AAPL hourly data
aapl_hour = qb.History(qb.Securities.Keys, timedelta(days=5000), Resolution.Hour)
aapl_hour = aapl_hour.reset_index().drop('symbol', axis=1)

# Extract the hour from each timestamp
aapl_hour['hour'] = aapl_hour['time'].apply(lambda x: x.hour)

# Calculate hourly returns
aapl_hour['returns'] = (aapl_hour['close'].pct_change() * 100).fillna(0)

aapl_hour.head()
```

Figure 1–4: AAPL hourly returns

```python
sns.barplot(x='hour', y='returns', data=aapl_hour)
```

Figure 1–5: Bar plot of average AAPL returns by hour

It looks like the most substantial returns were made overnight. An interesting idea would be to “buy at market close and sell at market open” to capture overnight gains. You can investigate this phenomenon further by exploring this research paper, which explains “the overnight drift” (most gains are made in the after hours).
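With daily bars that include an `open` column, the overnight and intraday legs of this idea can be separated directly. The sketch below uses made-up `open`/`close` numbers as a stand-in; in the notebook you would apply the same two lines to the daily `history` DataFrame.

```python
import pandas as pd

# Stand-in daily bars; in the notebook, use the daily `history` DataFrame instead
history = pd.DataFrame({
    'open':  [100.0, 101.0, 103.0, 102.5],
    'close': [100.5, 102.0, 102.0, 104.0],
})

# Overnight leg: previous close -> today's open
history['overnight'] = (history['open'] / history['close'].shift(1) - 1) * 100
# Intraday leg: today's open -> today's close
history['intraday'] = (history['close'] / history['open'] - 1) * 100

print(history[['overnight', 'intraday']].round(2))
```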

Let’s continue to explore the discrepancy that we previously discovered. The previous bar plot suggested that there were substantial gains made in the after hours. Let’s plot the cumulative performance of overnight returns vs intraday returns to better visualize and confirm this discrepancy.

```python
# The 10:00 bar is the first bar of the trading day, so its return includes the overnight gap
aapl_hour.query("hour == 10")['returns'].cumsum().plot(label='Overnight Returns')
aapl_hour.query("hour != 10")['returns'].cumsum().plot(label='Intraday Returns')
plt.legend()
```

Figure 1–6: Overnight Returns vs Intraday Returns (Apple Stock)

Our hypothesis was correct: it’s quite apparent that most returns are realized overnight.

Autocorrelation is a mathematical representation of the degree of similarity between a time series and a lagged version of itself over successive time intervals. In simpler terms, it describes how the present value of a series is related to its past values.

The goal of the quantitative analyst is to look for possible trends within the dataset. This can be accomplished by analyzing the Autocorrelation function plot (ACF plot).
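Here is a minimal sketch of an ACF plot built from pandas' `Series.autocorr` (statsmodels' `plot_acf` produces a similar picture). A white-noise series stands in for `history['returns']`; in the notebook you would pass the real series.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(size=5000))  # stand-in for history['returns']

lags = range(1, 31)
acf_vals = [returns.autocorr(lag=k) for k in lags]
band = 1.96 / np.sqrt(len(returns))  # approximate 95% significance band

plt.bar(lags, acf_vals)
plt.axhline(band, color='red', linestyle='--')
plt.axhline(-band, color='red', linestyle='--')
plt.title('ACF of returns (stand-in data)')
plt.show()
```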

Figure 1–7: ACF plot of AAPL daily returns

It looks like there are no significantly correlated lags (we are basically looking for autocorrelations that lie outside the red band).

If the first lag on the graph lay outside the red band, for instance, we would conclude that there is a statistically significant autocorrelation at lag 1 (on the x-axis of the ACF plot). Once you have that information, you could investigate the relationship between that lag and the stock’s annual volatility.

```python
from statsmodels.tsa.stattools import acf

# Rolling 100-day lag-1 autocorrelation of returns
history['rolling_lag_1'] = history['returns'].rolling(window=100).apply(lambda x: acf(x)[1], raw=True)

# Rolling 100-day annualized volatility
history['annVol'] = history['returns'].rolling(window=100).std() * np.sqrt(252)

sns.regplot(x='annVol', y='rolling_lag_1', data=history)
```

Figure 1–8: Rolling lag-1 autocorrelation vs annualized volatility

There seems to be a negative correlation between the volatility of AAPL and its lag-1 autocorrelation. Furthermore, we can visualize how that relationship has held up since 2015.

```python
# Date slicing requires the 'time' column as the index
history = history.set_index('time')
sns.regplot(x='annVol', y='rolling_lag_1', data=history.loc['2015-01-01':'2021-01-01'], label='2015-Now')
plt.legend()
```

Figure 1–9: Annual volatility vs rolling lag-1 autocorrelation (2015 to now)

As expected, the relationship doesn’t hold up: over this period there is no clear, consistent link between volatility and the rolling lag-1 autocorrelation of Apple stock returns. This is how you would typically investigate time series data.

**Key takeaways**

- The noise-to-signal ratio is extremely high in quantitative analysis. Clean and consistent patterns are usually very subtle and can vanish quickly
- Based on our research, a substantial portion of Apple stock returns is realized in the after hours


Posted 12 April 2021

© 2021 TechTarget, Inc.
