**Previous Relevant Posts**

- Single regression with R to identify relationship between WTI and s...
- Getting stock volatility in R & Getting Histogram of returns

**What is CAPM?**

According to Investopedia (http://www.investopedia.com/terms/c/capm.asp),

The capital asset pricing model (CAPM) is a model that describes the relationship between risk and expected return and that is used in pricing of risky securities.

(Generally, you can understand that there is a linear relationship between risk (stock volatility) and the stock return.)

The general idea behind CAPM is that investors need to be compensated in two ways: time value of money and risk. The time value of money is represented by the risk-free (rf) rate in the formula and compensates the investors for placing money in any investment over a period of time. The other half of the formula represents risk and calculates the amount of compensation the investor needs for taking on additional risk. This is calculated by taking a risk measure (beta) that compares the returns of the asset to the market over a period of time and to the market premium (Rm-Rf).
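Written out, the formula that the quote above describes is the standard CAPM equation. The symbols are the usual textbook ones rather than notation from the quoted page: $E(R_i)$ is the expected return of asset $i$, $R_f$ is the risk-free rate, $E(R_m)$ is the expected market return, and $\beta_i$ is the asset's beta.

$$
E(R_i) = R_f + \beta_i \,\bigl(E(R_m) - R_f\bigr)
$$

The first term is the compensation for the time value of money, and the second term is the compensation for risk: beta multiplied by the market premium $(E(R_m) - R_f)$.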

**How do we approach it?**

Generally, rf is taken to be the T-bill rate. During 2015, it was almost 0.002 (0.2%), as the Fed kept interest rates low to boost the economy. CAPM can also be applied to a portfolio. Here, I am going to choose the top 20 NYSE technology stocks by market cap and look at March 2016. The fit is done by single regression, as in the previous post.

However, we should be careful about using single regression in this situation because of the presence of "alpha".

http://themarketmogul.com/wp-content/uploads/2015/05/Screen-Shot-20...

This alpha is the intercept of the regression and is not identical to rf. It is the return you can expect when your portfolio has zero volatility. Generally, a portfolio with a high alpha is regarded as a good one. We will come back to that.
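To make the roles of the intercept and the slope concrete, here is a minimal sketch in base R. The numbers are made up purely for illustration (they are not market data), and the variable names are my own. The returns are built to lie exactly on a line, so the regression should recover the intercept and slope that went in.

```r
# Synthetic, made-up numbers purely for illustration (not market data)
volatility <- c(0.02, 0.04, 0.06, 0.08, 0.10)
stock_return <- 0.01 + 0.5 * volatility   # exact line: intercept 0.01, slope 0.5

fit <- lm(stock_return ~ volatility)

alpha <- unname(coef(fit)[1])  # intercept: expected return at zero volatility
beta  <- unname(coef(fit)[2])  # slope: extra return per unit of volatility

print(c(alpha = alpha, beta = beta))
```

With real stocks the points scatter around the line, so alpha and beta are only estimates.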

**Strategy**

We are going to go through the general data science workflow.

(1) Data Gathering: We are going to gather the market data from the API.

(2) Data Manipulation: We choose only Top 20 firms in terms of market cap.

(3) Data Visualization: Draw the risk-return graph.

(4) Data Interpretation: See whether this graph is consistent with CAPM theory.
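Step (2) is the least obvious of the four, because market caps usually arrive as strings such as "$1.5B" and have to become numbers before they can be sorted. A minimal base R sketch of that idea follows. The helper name `parse_market_cap`, the tickers, and the values are all hypothetical, and the "$", "B", and "M" format is an assumption about the input.

```r
# Hypothetical helper: turn strings such as "$1.5B" or "$250M" into numbers
parse_market_cap <- function(x) {
  x <- gsub("\\$", "", x)                    # drop the dollar sign
  unit <- substring(x, nchar(x))             # last character: "B", "M", or a digit
  value <- as.numeric(gsub("[BM]", "", x))   # numeric part of the string
  scale <- ifelse(unit == "B", 1e9, ifelse(unit == "M", 1e6, 1))
  value * scale
}

# Made-up listings, not real quotes
listings <- data.frame(
  Symbol = c("AAA", "BBB", "CCC", "DDD"),
  MarketCap = c("$250M", "$1.5B", "$900M", "$12B"),
  stringsAsFactors = FALSE
)

listings$MarketCapNum <- parse_market_cap(listings$MarketCap)

# Sort by market cap, largest first, and keep the top 2
top2 <- head(listings[order(-listings$MarketCapNum), ], 2)
print(top2$Symbol)
```

The same pattern scales to a top 20 by changing the second argument of `head()`.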

**Code**

#I found this code is really versatile. Feel free to use this code in analyzing equity market!

#Getting volatility and return for the TOP 20 stocks in NYSE

library(TTR) #To get tickers

library(plyr) #For sorting

library(tseries) #For volatility / return

library(stringr) #String manipulation

library(calibrate) #To represent stock name on scatter plot

#NASDAQ, NYSE

market <- "NYSE"

#Technology, Finance, Energy, Consumer Services, Transportation, Capital Goods, Health Care, Basic Industries

sector <- "Technology"

getcapm <- function(stock) {

#Getting data from server

data <- get.hist.quote(stock, #Ticker symbol

start="2016-03-01", #Start date YYYY-MM-DD

end="2016-03-31" #End date YYYY-MM-DD

)

#We only take into account "Closing price", the price when the market closes

yesterdayprice <- data$Close

#This is a unique feature of R better than Excel

#I need to calculate everyday return

#The stock return is defined as (today price - yesterday price)/yesterday price

#For a zoo series, lag() with the default k=1 shifts the next day's price onto each date

todayprice <- lag(yesterdayprice)

#ret <- log(lag(price)) - log(price)

rets <- (todayprice - yesterdayprice)/yesterdayprice

#Scale daily volatility by sqrt(number of daily returns): volatility over the sample period (about one month), not annualized

vol <- sd(rets) * sqrt(length(todayprice))

#Getting Geometric Mean.

#You might be tempted to use just mean(). Don't do that in stock market.

#Compound the daily growth factors (1 + return), then take the n-th root

growth_factors <- as.numeric(rets) + 1

geometric_mean_return <- prod(growth_factors)^(1/length(growth_factors)) - 1

information <- c(geometric_mean_return, vol) #It's a trick to return multiple values in one return.

return(information)

}

convert_marketcap <- function(str) {

str <- gsub("\\$", "", str) #Get rid of "$" first

#The reason why I use \\ is that $ has a special meaning in regular expression

#Regular expression is not the topic. #I'll deal with later

multiplier <- str_sub(str,-1,-1) #Million? Billion?

pure_number <- as.numeric(gsub("(B|M)", "", str)) #Get rid of M or B. Turn it into number

if(multiplier == "B") {

#Billion

adjustment <- 1000000000

} else if(multiplier == "M") {

#Million

adjustment <- 1000000

} else {

#Don't adjust it.

adjustment <- 1

}

return (pure_number * adjustment)

}

original <- stockSymbols()

#Keep only the listings on the chosen exchange

listings <- original[original$Exchange==market,]

#As these data include "NA," we need to clean them up for further data manipulation.

#If you don't clean up NA, you would encounter error while manipulating

listings <- listings[!is.na(listings$MarketCap),]

listings <- listings[!is.na(listings$Sector),]

#I want to focus on the specific sector

listings <- listings[listings$Sector==sector,]

#Market cap is string right now. We need to convert this to number

listings$MarketCap <- sapply(listings$MarketCap, convert_marketcap)

#Sort the list in descending order of market capitalization

listings <- arrange(listings, desc(listings$MarketCap))

capm <- data.frame(ticker="", volatility=1:20, geometric_return=1:20)

capm$ticker <- listings$Symbol[1:20]

for(i in 1:20) {

information_on_stock <- getcapm(capm$ticker[i])

capm$geometric_return[i] <- information_on_stock[1]

capm$volatility[i] <- information_on_stock[2]

}

#Build the plot title, e.g. "NYSE / Technology in Mar 2016"

main_name <- paste(market, "/", sector, "in Mar 2016")

capm_regression<-lm(capm$geometric_return ~ capm$volatility)

plot(x=capm$volatility,y=capm$geometric_return,pch=19, main = main_name, xlab="Stock Volatility", ylab="Stock Return")

#I want to know which stocks are outliers, so label each point with its ticker.

textxy(capm$volatility, capm$geometric_return, capm$ticker)

abline(capm_regression, col="red") # regression line (y~x)

print(summary(capm_regression))
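The comment in the code about not using mean() for stock returns is worth a quick illustration. The sketch below uses three made-up closing prices (up 10%, then down 10%) and base R only. It shows that the arithmetic mean of the daily returns says "break even", while the geometric mean matches what actually happened to the money.

```r
# Made-up closing prices: up 10%, then down 10%
prices <- c(100, 110, 99)

# Daily simple returns: (today - yesterday) / yesterday
rets <- diff(prices) / head(prices, -1)

arithmetic_mean <- mean(rets)
geometric_mean <- prod(1 + rets)^(1 / length(rets)) - 1

# What actually happened over the whole period
total_return <- prices[length(prices)] / prices[1] - 1

print(c(arithmetic = arithmetic_mean,
        geometric = geometric_mean,
        total = total_return))
```

Compounding the geometric mean over the two days reproduces the total return of -1%, which the arithmetic mean of roughly zero does not.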

**Outcome Interpretation**

Because the risk-free rate is only 0.002, we can neglect it. (Right now, it makes sense to treat the risk-free rate as zero. We are living in a world of historically low risk-free rates.)

So the two remaining things are alpha and beta.

Call:

lm(formula = capm$geometric_return ~ capm$volatility)

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -0.017195 | -0.004096 | -0.002011 | 0.004146 | 0.018340 |

Coefficients:

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -0.002651 | 0.004949 | -0.536 | 0.599 |
| capm$volatility | 0.087089 | 0.072577 | 1.200 | 0.246 |

Residual standard error: 0.009088 on 18 degrees of freedom

Multiple R-squared: 0.07407, Adjusted R-squared: 0.02263

F-statistic: 1.44 on 1 and 18 DF, p-value: 0.2457

It is difficult to call this regression statistically meaningful, because both p-values are well above 0.05 (0.599 for the intercept and 0.246 for the slope) and R-squared is only 0.074. In this case, we say that "the idiosyncratic risk is much higher than the market risk." Idiosyncratic risk is risk that is specific to one company. It could come from management issues or from the quality of its products. But, for now, let's pretend that this regression is statistically meaningful.
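For readers wondering where those p-values come from, they can be recomputed from the estimates and standard errors in the output. This is a minimal sketch in base R, assuming the usual two-sided t-test with the 18 residual degrees of freedom reported above. The variable names are my own.

```r
# Estimates and standard errors copied from the regression output
estimate  <- c(intercept = -0.002651, slope = 0.087089)
std_error <- c(intercept =  0.004949, slope = 0.072577)
df_resid  <- 18   # residual degrees of freedom from the output

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(round(p_value, 3))

# TRUE would mean "significant at the 5% level"
print(p_value < 0.05)
```

Up to rounding, this should reproduce the t values and the p-values of 0.599 and 0.246 in the output, neither of which clears the 0.05 threshold.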

Alpha (the intercept) is -0.2651%, meaning that this is not a good portfolio. Again, alpha is the expected return when volatility is 0. Even with no volatility, we would still expect to lose money with this portfolio.

Beta is 0.087, meaning that we can expect an additional 8.7% return when we bear a volatility of 1.0. At a volatility of 0.5, that becomes 4.35%. (I would not call this a good portfolio.)
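The 8.7% and 4.35% figures are the slope's contribution on its own. To get the return the fitted line actually predicts, add alpha back in. Here is a small worked sketch using the coefficients from the output; the volatility levels are just example inputs.

```r
alpha <- -0.002651   # intercept from the regression output
beta  <-  0.087089   # slope on volatility from the regression output

volatility <- c(0, 0.5, 1.0)   # example volatility levels

# Fitted return on the regression line: alpha + beta * volatility
fitted_return <- alpha + beta * volatility

print(data.frame(volatility, fitted_return))
```

At a volatility of 0.5, for example, the line gives 0.0435445 from beta minus 0.002651 from alpha, or about 4.09%.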

In March 2016, Infosys (INFY) performed better than LinkedIn (LNKD). It was less volatile but generated a better return. If you had invested in Infosys in March 2016, it would have been a much better decision than buying LinkedIn stock.

Please keep in mind that these are all historical data. They do not tell you future performance, although past data can help inform a forecast.

**Assignment**

Please find the high-beta portfolio in Mar 2016. You can do that with my code; just tweak it. I'll post the answer later on.

More R codes? www.mbaprogrammer.com
