Contributed by Sharan Duggal.
Insider trading is often associated with the illegal activity of trading in shares of one's company based on material non-public information. But insider trading is not always illegal. It is not illegal to own, or to buy and sell, shares of the company you work for, as long as the transactions are disclosed publicly in a timely manner and the information used to trade is publicly available. This project focuses on the legal element of insider trading and its potential impact on short-term stock prices.
Technical trading schools often tout the relationship between insider transactions and stock prices. Some banks include this information as one of several indicators in a composite score of the relative strength of a stock. I wanted to do my own analysis to see if this relationship holds up, especially on a short-term time frame (i.e. 1 to 5 days after an insider transaction). If there is a relationship, then with some additional analysis it could be converted into a trading strategy.

Some additional questions I wanted to address:
- Are there any differences across sectors when it comes to the relationship between insider transactions and price?
- Does time come into play at all, and are there seasonality factors that I should be taking into account?
- Which machine learning methodology would perform best in predicting price behavior after an insider transaction?
To address my objectives, I web scraped Insidertrading.org and retrieved ~40,000 records of publicly disclosed insider transactions from the past year (August 1st, 2015 to August 5th, 2016) that involved a transaction size of 10,000 shares or more. I matched the ~40,000 transactions to ticker information on Yahoo.com (through the Quantmod package in R) and appended time-specific information for the 5 days following the date of each insider transaction. After discarding OTC (over-the-counter) stocks and any symbols that weren't listed on the NASDAQ, NYSE or AMEX exchanges, I was left with 28,769 records:
- 13,202 insider stock purchases
- 15,567 insider stock sales
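The filtering step described above can be sketched roughly as follows. This is a minimal Python illustration, not the R code used for the project; the record fields (`symbol`, `exchange`, `side`, `shares`) and the sample rows are hypothetical.

```python
from collections import Counter

# Exchanges kept in the analysis; everything else (e.g. OTC) is discarded.
LISTED_EXCHANGES = {"NASDAQ", "NYSE", "AMEX"}
MIN_SHARES = 10_000  # minimum transaction size considered


def filter_transactions(records):
    """Keep transactions of at least MIN_SHARES on a listed exchange."""
    return [
        r for r in records
        if r["exchange"] in LISTED_EXCHANGES and r["shares"] >= MIN_SHARES
    ]


def count_by_side(records):
    """Count how many of the kept transactions are buys vs. sells."""
    return Counter(r["side"] for r in records)


# Hypothetical sample rows, for illustration only.
sample = [
    {"symbol": "AAA", "exchange": "NYSE", "side": "Buy", "shares": 25_000},
    {"symbol": "BBB", "exchange": "OTC", "side": "Sell", "shares": 50_000},
    {"symbol": "CCC", "exchange": "NASDAQ", "side": "Sell", "shares": 12_000},
    {"symbol": "DDD", "exchange": "AMEX", "side": "Buy", "shares": 5_000},
]

kept = filter_transactions(sample)
side_counts = count_by_side(kept)
```

Here `BBB` is dropped for being an OTC stock and `DDD` for falling under the 10,000-share minimum, leaving one buy and one sell.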
The main outcome variable I wanted to look at was the percentage increase or decrease in price anywhere from one day to five days after an insider transaction. The chart below shows the average percentage change in price following an insider transaction. The price change per day is cumulative, i.e. the price change depicted for Day 5 is the difference between the closing price five days after the transaction and the closing price on the day of the transaction. Theoretically, someone could trade off an insider trading signal much sooner than the close of market. The closing price on the day of the transaction is a much more conservative number for average traders like myself, who would not be waiting at their computers for a signal, itching to make a trade. So all prices you will see in this blog are closing prices.

The chart shows that there appears to be a relationship between insider transactions and price changes. Following an insider sale (yellow part of the bar), we see a decline in price, and following an insider purchase we see an increase in price, which is exactly the behavior we want to see. Moreover, the extent of the change increases as the days go by. We also see an interesting difference between stocks priced under $5 and those priced over $5: stocks priced under $5 are generally more volatile than higher-priced stocks. Volatility can be good, though, for a trading strategy.
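To make the cumulative price change concrete, here is a minimal Python sketch. The function name and the closing prices are made up for illustration; the only thing taken from the text is the definition, where each day's change is measured against the close on the day of the transaction.

```python
def cumulative_pct_changes(closes):
    """Given closing prices [day 0, day 1, ..., day N], return the percentage
    change of each later close relative to the day-0 (transaction day) close."""
    base = closes[0]
    return [(close - base) / base * 100.0 for close in closes[1:]]


# Hypothetical closes: the transaction day followed by the next five trading days.
closes = [10.00, 10.10, 10.05, 10.20, 10.30, 10.50]
changes = cumulative_pct_changes(closes)  # one value each for Day 1 through Day 5
```

With these made-up prices, the Day 5 value compares 10.50 against 10.00, which is a gain of about 5%.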
Volume is often referenced as a confirmation signal for identifying support behind a stock's price movement. The volume chart on the right confirms this difference between lower- and higher-priced stocks. More importantly, the chart shows that following an insider transaction, volume on average picks up compared to the day of the transaction and stays up for the next couple of days. We don't see any negative numbers in this case.
Looking at the entire set of transactions over the past year, the volume of insider trades seems to have spiked late last year, in November and December, and there seems to be a lull in summer. The behavior in the spring and summer months doesn't seem to be atypical of the general market. Comparing the volume of insider transactions to the volume bars below the price chart of the general market, however, the November spike in insider trades does not show up in the general market. It may be useful to factor volume differences between insider trading and general market trading into a trading strategy, because they may highlight potential imbalances that are not apparent in the market.
The SPY (pictured in the above right-hand side chart) is an exchange-traded fund (ETF) that represents the S&P 500 index. It has been quite choppy all year, ending up a little higher than where it started out in August last year. Interestingly, the average percentage change in price following an insider transaction (right) follows the same pattern regardless of transaction side. For example, in February there was a large spike in the market, and the price change over the 5 days following an insider transaction was positive regardless of whether it was a purchase or a sale. So overall market performance seems to have an impact, and it should be factored into a good trading strategy.
Looking at price changes by sector (below), we see that the Energy sector responds in the most favorable way after an insider buy-side transaction. On average it has moved 4.88% following such a transaction. On the short side of things, the Energy sector again features as the leading mover (-1.6% on average) in the direction we would want the stock to move after insiders sell a portion of their shares. Health care comes next with a -1.2% move.
Following that exploratory data analysis, the next thing I wanted to address was the choice of a machine learning methodology that would appropriately model the overall relationship between insider transactions and price. I first looked into using multiple linear regression, but the diagnostic plots for the assumptions that need to be met showed that the data violated normality and equality of variances pretty egregiously. Additionally, there seemed to be a fair number of high-leverage and high-residual points, as indicated by the plot on the bottom right of the chart grid below. I did not want to remove outliers in this case because a high-volume stock, for example, could actually be a good thing for a trading strategy.

So I moved on to trying out a K nearest neighbor classification method, mainly because it places fewer assumptions on the data. As a first step, I included all predictor variables to see how the model would work. I wanted to use a mix of categorical and continuous variables, so I first converted the categorical variables to a 0-1 scale and normalized the continuous variables to a percentile between 0 and 1. This ensures that all the metrics are on a level playing field and that no one variable overshadows the others in the model.
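The two scaling steps might look something like the sketch below. It is written in Python for illustration and makes two assumptions that the text does not spell out: that "percentile" means the share of values less than or equal to a given value, and that categorical variables become one 0/1 indicator column per category. The helper names and sample values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(values):
    """Map each value to the fraction of values less than or equal to it,
    which puts every continuous variable on a (0, 1] scale."""
    ordered = sorted(values)
    n = len(values)
    # bisect_right returns how many sorted values are <= v.
    return [bisect_right(ordered, v) / n for v in values]


def one_hot(values):
    """Encode a categorical variable as 0/1 indicator columns.
    Returns the encoded rows and the category order used for the columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Hypothetical share prices and transaction sides.
scaled_prices = percentile_rank([5.0, 1.0, 3.0, 3.0])
encoded_sides, side_categories = one_hot(["Buy", "Sell", "Buy"])
```

Percentile ranks are insensitive to extreme values, which fits with keeping outliers in the data rather than removing them.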
Predictor variables: Side (“Buy”/“Sell”) | Sector | Share Price | # of Shares | Transaction Value | Remaining Shares Post Transaction | Exchange | Month | % Volume Change 1 Day after Transaction | Market Capitalization
Target variable: % Price Change 5 Days after Transaction (categorized: < -1.5% | -1.5% to 1.5% | > 1.5%)
Training set: 23,857 records (10,481 buy side | 13,376 sell side)
Test set: 3,945 records (1,949 buy side | 1,996 sell side), only records from August
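As a rough sketch of how such a classifier works, the Python below buckets the 5-day price change into the three categories and predicts the bucket of a new record by majority vote among its nearest neighbors. The value of `k`, the bucket labels, the use of Euclidean distance and the toy data are all assumptions made for illustration, not details of the model that was actually run.

```python
from collections import Counter
from math import dist


def price_bucket(pct_change, threshold=1.5):
    """Categorize a 5-day % price change as 'down', 'flat' or 'up'."""
    if pct_change < -threshold:
        return "down"
    if pct_change > threshold:
        return "up"
    return "flat"


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k training points
    closest to the query (Euclidean distance on the scaled features)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data: two features already scaled to the 0-1 range.
train_X = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
train_changes = [-3.0, -2.0, 0.5, 2.5, 4.0, 1.0]
train_y = [price_bucket(c) for c in train_changes]

low_prediction = knn_predict(train_X, train_y, [0.05, 0.15])
high_prediction = knn_predict(train_X, train_y, [0.95, 0.90])
```

Because the vote depends on distances, this is where putting every variable on the same 0-1 scale matters: an unscaled variable with a large range would dominate the distance calculation.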
The table shows the output of the model, with predicted values in the columns and true values in the rows. The diagonal cells represent the accurate predictions. At first glance, this model seems to perform okay, with 57% of the predicted drops in price being accurate, but note that the data set itself contains a higher number of records indicating a drop in price (i.e. there are more sell-side transactions in the data overall). I ran a chi-square test of independence on the model output, and the observed counts could not be distinguished from the counts expected given the distribution of the data. In other words, this model was not significantly better than simply guessing based on the distribution seen in the data.

I ran the model again, this time filtering for stocks priced below $5 because of their greater volatility. I also changed the thresholds of my price groups of interest to ±1% to be more inclusive for the 1st and 3rd groups, and removed the month variables from my predictor set since they did not add much explanatory power to the model. The results are included in the table here, and while there are fewer predictions overall (because of the filtered data set), the model performs better. The chi-square test of independence gives a p-value of 0.13, which is not significant, so we retain the null hypothesis: the predicted vs. expected values cannot be shown to be dependent on each other.
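For readers who want to see the mechanics of a chi-square test on a table of true vs. predicted categories, here is a minimal Python sketch. The two 3x3 tables are invented for illustration and are not the model's results. The p-value helper uses a closed form that is only valid for an even number of degrees of freedom (a 3x3 table has 4), and the code assumes no row or column total is zero.

```python
from math import exp


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a
    contingency table (rows = true category, columns = predicted category)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if true and predicted categories were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi_square_p_value_even_dof(stat, dof):
    """Upper-tail p-value of the chi-square distribution (even dof only)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even dof")
    half = stat / 2.0
    term, series = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        series += term
    return exp(-half) * series


# Invented tables: one where predictions track the truth, one where they do not.
informative = [[30, 5, 5], [5, 30, 5], [5, 5, 30]]
uninformative = [[20, 10, 10], [10, 5, 5], [20, 10, 10]]

stat_a, dof_a = chi_square_independence(informative)
p_a = chi_square_p_value_even_dof(stat_a, dof_a)
stat_b, dof_b = chi_square_independence(uninformative)
p_b = chi_square_p_value_even_dof(stat_b, dof_b)
```

In the second table every row has the same mix of predictions, so the statistic is zero and the p-value is 1: the predictions carry no information about the true category. In practice a library routine such as `scipy.stats.chi2_contingency` handles tables of any shape.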
I am happy to have been successful in modeling a crude relationship between insider transactions and stock prices in the short term, and in selecting a machine learning methodology that works as a good starting point. Clearly, a lot of improvements can still be made to the model. For one, I would love to look at factoring overall market behavior (both price and volume) into the model. Historic average monthly market performance could be a powerful predictor to add to the mix. I would also love to look a bit more closely at running models for specific sectors, especially the Energy and Health care sectors, since they have shown a stronger relationship with insider transactions in this analysis. Again, this is just the beginning of my analysis of such insider data, and as such I would very strongly advise anyone reading not to make any trades based on the information given here, but to use it as a learning tool to better understand the relationships that exist between insider transactions and stock prices.