[The goal of this page]
When I read R introductions, I found the books filled with nothing but instructions. But the point of R is to solve real-life problems, so I want to keep this page short. We do need a few key concepts before tackling real problems, so here are the basic data structures and data manipulation methods.
Still, I believe the best way to learn the R programming language is to tackle real-life problems. Just skim through this page to see how things work.
[Assign Variables]
Unlike C, C++, or Java, R doesn't make you declare variable types, and unlike C or C++, it doesn't make you manage memory yourself. The beauty of R is that it lets you write code without in-depth knowledge of algorithms or memory management. Assigning a variable is easy.
Suppose you know the incoming cash flows for the next several years (T=1: 30, T=2: 50, T=3: 80) and want to discount them at r = 0.02. Typing 1+0.02 every time would be tedious. That's why programmers came up with a brilliant idea: assigning a value to a variable.
Here are some examples.
Keep in mind that the ">" and "+" signs are the R console prompts (">" starts a command and "+" continues one); you don't type them.
> r<-0.02
> 30/(1+r)
[1] 29.41176
> 50/(1+r)^2
[1] 48.05844
> 80/(1+r)^3
[1] 75.38579
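Building on the transcript above, a small script can store each discounted value in its own variable and add them up to get the present value of the whole stream. The names pv1, pv2, pv3, and total_pv are illustrative choices, not part of the original example. If the rate changes, you edit the single assignment to r and rerun the script.

```r
# The rate is assigned once and reused in every discount factor
r <- 0.02

# Cash flows from the example: 30 at T=1, 50 at T=2, 80 at T=3
pv1 <- 30 / (1 + r)
pv2 <- 50 / (1 + r)^2
pv3 <- 80 / (1 + r)^3

# Present value of the whole stream is the sum of the discounted pieces
total_pv <- pv1 + pv2 + pv3
print(total_pv)
```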
[Iterations]
When you run into a summation sign (∑), think of the for loop, which runs the same commands a set number of times.
<Annuity Problem>
Suppose we receive $50 at T=1, T=2, and T=3, and the discount rate is r = 0.05. What is the present value of this cash flow?
> pv <- 0
> for (i in 1:3) {
+ pv <- pv + 50/(1.05)^i
+ }
> print(pv)
[1] 136.1624
>
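The loop above computes PV = ∑ 50/(1.05)^t for t = 1 to 3. As a cross-check, here is a minimal sketch computing the same sum three ways: the loop, a vectorized version where sum() plays the role of the sigma sign, and the standard closed-form present value of an ordinary annuity. The variable names are illustrative assumptions.

```r
cf <- 50      # payment received at T = 1, 2, and 3
r  <- 0.05    # discount rate
n  <- 3       # number of payments

# 1) for loop, as in the transcript: one term of the sum per pass
pv_loop <- 0
for (i in 1:n) {
  pv_loop <- pv_loop + cf / (1 + r)^i
}

# 2) Vectorized: 1:n yields all exponents at once, sum() adds the terms
pv_vec <- sum(cf / (1 + r)^(1:n))

# 3) Closed-form present value of an ordinary annuity
pv_formula <- cf * (1 - (1 + r)^(-n)) / r

print(c(pv_loop, pv_vec, pv_formula))
```

All three should print the same value as the transcript. The vectorized form is usually preferred in R because it is shorter and avoids the explicit accumulator.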
[Data type]
Some data can't be represented by a single number; a cash flow is a good example. If you want historical stock performance, a time series type fits the need. Sometimes we want something like an Excel sheet, a mix of numbers, strings, and TRUE/FALSE values; in that case, consider a data frame. There is no golden rule for which data type to use in which situation. Part of being a good data scientist is having an eye for data types, because the right choice makes it easier to solve business problems quickly.
<Collection Data Type (aka Vector)>
A collection (vector) is a natural way to represent a cash flow.
Suppose the cash flow is T=1: 50, T=2: 100, T=3: 50, T=4: 80.
Here's how we define it as a vector.
> cf <- c(50,100,50,80) #c() combines values into a vector
> cf
[1] 50 100 50 80
> cf[1] #Getting value of first index
[1] 50
> cf[2] #Getting value of second index
[1] 100
> cf[3] #Getting value of third
[1] 50
> cf[4] #Fourth
[1] 80
>
Keep in mind that you don't need to specify indices when you create the vector. R numbers the elements automatically, starting at 1 (not at 0 as in C, C++, or Java).
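Because arithmetic on vectors works element by element, you can discount every cash flow at once instead of one index at a time. This sketch assumes a discount rate of 0.02 purely for illustration; the original example gives no rate for this cash flow, and the names periods and discounted are hypothetical.

```r
cf <- c(50, 100, 50, 80)   # cash flow at T = 1, 2, 3, 4
r  <- 0.02                 # assumed discount rate, for illustration only

periods    <- seq_along(cf)       # 1 2 3 4: the positions R assigned automatically
discounted <- cf / (1 + r)^periods # element-wise: each flow gets its own factor

print(discounted)
print(sum(discounted))     # present value of the whole cash flow
print(cf[2:3])             # several elements at once
print(length(cf))          # number of periods
```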
<Data Frame>
If you want a more Excel-like or database-like data type, a data frame meets that need. Suppose you need the following information.
Month | LIBOR rate | T-bill rate
1 month | 0.01 | 0.015
3 month | 0.02 | 0.022
6 month | 0.03 | 0.033
You can build the data frame like this.
> information_table <- data.frame(
+   month=c("1m", "3m", "6m"),
+   libor_rate=c(0.01, 0.02, 0.03),
+   tbill_rate=c(0.015, 0.022, 0.033)
+ )
> information_table[2]
  libor_rate
1       0.01
2       0.02
3       0.03
> information_table[,2]
[1] 0.01 0.02 0.03
>
When you name a variable or column, don't use any whitespace (space, tab, or newline); use an underscore instead, as in libor_rate.
If you type information_table[2], you get the 2nd column as a data frame.
If you type information_table[,2], you get the 2nd column as a plain vector.
If you type information_table[2,], you get the 2nd row.
If you type information_table[2,2], you get the value in the 2nd row and 2nd column.
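The indexing rules above can be checked in a short script. The variable names col_df, col_vec, row_2, cell_2_2, and high_libor are hypothetical; the last line also shows selecting a column by name and filtering rows with a logical condition, which goes slightly beyond the original examples.

```r
information_table <- data.frame(
  month      = c("1m", "3m", "6m"),
  libor_rate = c(0.01, 0.02, 0.03),
  tbill_rate = c(0.015, 0.022, 0.033)
)

col_df   <- information_table[2]      # 2nd column, still a data frame
col_vec  <- information_table[, 2]    # 2nd column as a plain numeric vector
row_2    <- information_table[2, ]    # 2nd row: the 3-month maturity
cell_2_2 <- information_table[2, 2]   # row 2, column 2

# Columns can also be selected by name, and rows by a logical condition
high_libor <- information_table[information_table[, "libor_rate"] > 0.015, "month"]

print(row_2)
print(high_libor)
```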
<List Data Type>
If you want to put collections of different types together, a list is useful. The collections don't have to be the same length, which makes lists especially handy for storing unstructured data.
> libor<-c(0.01, 0.02, 0.03) #Number
> tbill<-c(0.01, 0.02, 0.05, 0.12) #Number
> swap<-c("1m", "2m", "3m", "6m", "12m") #String
> counterpartyrisk<-c(TRUE, FALSE, TRUE) #Boolean
> list_all <- list(libor, tbill, swap, counterpartyrisk)
To access each collection, use [[]] instead of []. You can use [], but it returns a list containing the collection rather than the collection (vector) itself, which makes it harder to get at a specific value.
> list_all[1] #this returns a list
[[1]]
[1] 0.01 0.02 0.03

> list_all[[1]] #this returns the vector itself
[1] 0.01 0.02 0.03
> list_all[1][1] #this returns a list, too
[[1]]
[1] 0.01 0.02 0.03

> list_all[[1]][1] #1st value of the 1st vector
[1] 0.01
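If you give the list elements names when you build it, you can access them by name instead of remembering positions. This is a sketch with a hypothetical list called rates; it uses the same four collections as above.

```r
rates <- list(
  libor = c(0.01, 0.02, 0.03),
  tbill = c(0.01, 0.02, 0.05, 0.12),
  swap  = c("1m", "2m", "3m", "6m", "12m"),
  counterpartyrisk = c(TRUE, FALSE, TRUE)
)

first_libor <- rates[["libor"]][1]   # [[ ]] by name gives the vector, then [1] its 1st value
tbill_vec   <- rates$tbill           # $ works on named lists, too
sub_list    <- rates["libor"]        # single brackets keep the list wrapper

print(first_libor)
print(lengths(rates))  # each element keeps its own length
```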
<Factors (Categorical value)>
A factor is a categorical variable, such as Male/Female, Adult/Kid, or A/B/C/D/F grades. If you only use Excel, you'll rarely run into this data type, but if you work with databases such as Oracle, MySQL, or MS SQL Server, you should get familiar with it, because it separates the stored code (for example, 1 or 2) from the value it stands for (Male or Female).
> information_table <- data.frame(
+   people_name=c("Tom", "Jane", "Greg", "Kelly"),
+   people_gender=c(1, 2, 1, 2)
+ )
> information_table$people_gender <- factor(information_table$people_gender, labels=c("Male", "Female"))
> information_table
  people_name people_gender
1         Tom          Male
2        Jane        Female
3        Greg          Male
4       Kelly        Female
> information_table$people_gender
[1] Male   Female Male   Female
Levels: Male Female
One way to access an entire column of a data frame is the "$" operator: DataFrame$ColumnName.
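To see how a factor separates codes from labels, this sketch builds one directly from numeric codes, as they might be stored in a database table, and then inspects the labels, the internal integer codes, and the count per category. The names gender_codes, gender, and counts are hypothetical.

```r
gender_codes <- c(1, 2, 1, 2)   # codes as they might be stored in a database
gender <- factor(gender_codes, levels = c(1, 2), labels = c("Male", "Female"))

print(levels(gender))       # the labels
print(as.integer(gender))   # the internal integer codes
counts <- table(gender)     # how many rows fall in each category
print(counts)
print(gender == "Male")     # comparisons use the labels, not the codes
```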
<Time Series>
A time series is similar to a data frame; the main difference is that each row is indexed by a time stamp, such as 2015-01-01. This makes date-based calculations easier than with a plain data frame. In our posts, we are going to use the "zoo" type. I'll defer the details until we use "tseries" to get stock data from the internet. Again, I want to keep the introduction as simple as possible; I believe that's what differentiates it from other online R material. You learn a programming language or tool by actually solving problems with it.
[Functions]
Mathematically, a function can be written as y = f(x), and an R function is not much different: it returns a value corresponding to the input x. Given the same x, it returns the same y, unless the function is stochastic (for example, one that draws random numbers).
Let's write a function that returns the square (^2) of a number.
> getsquare<-function(x) {
+ y <- x^2
+ return(y)
+ }
>
> getsquare(2)
[1] 4
> getsquare(4)
[1] 16
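Functions become more useful when they wrap a calculation you repeat. The sketch below first rewrites getsquare on one line (an R function returns its last expression even without return()), then defines a hypothetical present_value helper, not from the original post, with a default rate of 0.05 that can be overridden by name.

```r
# Same squaring function, written on one line
getsquare <- function(x) x^2
print(getsquare(c(1, 2, 3)))   # arithmetic is vectorized, so a whole vector works

# Hypothetical helper: present value of a vector of cash flows at rate r
present_value <- function(cf, r = 0.05) {
  periods <- seq_along(cf)      # 1, 2, ..., length(cf)
  sum(cf / (1 + r)^periods)     # discount each flow, then add them up
}

print(present_value(c(50, 50, 50)))             # the annuity example, default r
print(present_value(c(30, 50, 80), r = 0.02))   # the variable-assignment example
```

Putting the rate in an argument with a default keeps the common case short while still letting you try other rates without editing the function.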
[Other data manipulation]
<sapply>
If you want to change every value in a data frame (say, multiply it by 2), you can use the sapply function. Here are the steps.
(1) Define (or read) the data.
(2) Define the function.
(3) Apply sapply to the data frame you want to change. The function defined in (2) is applied to each column, and because the arithmetic is vectorized, to every value in it.
(4) Check the result.
> information_table <- data.frame(
+   libor_rate=c(0.01, 0.02, 0.03),
+   tbill_rate=c(0.015, 0.022, 0.033)
+ )
> changevalues <- function(x) {
+   y <- x*2
+   return(y)
+ }
> information_table <- sapply(information_table, changevalues)
> information_table
     libor_rate tbill_rate
[1,]       0.02      0.030
[2,]       0.04      0.044
[3,]       0.06      0.066
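The rows in the output above are labeled [1,], [2,], [3,] because sapply simplifies its result to a matrix, so information_table is no longer a data frame after that assignment. Here is a minimal sketch of two ways to keep the data frame, assuming the same table and doubling rule; the names as_matrix, as_df, and doubled are illustrative.

```r
information_table <- data.frame(
  libor_rate = c(0.01, 0.02, 0.03),
  tbill_rate = c(0.015, 0.022, 0.033)
)
changevalues <- function(x) x * 2   # same doubling rule as in the transcript

# sapply() simplifies the per-column results into a matrix
as_matrix <- sapply(information_table, changevalues)
print(class(as_matrix))

# lapply() returns a list; assigning into as_df[] keeps the data frame structure
as_df <- information_table
as_df[] <- lapply(information_table, changevalues)
print(class(as_df))

# For plain arithmetic, operators on a data frame already apply to every value
doubled <- information_table * 2
print(doubled)
```

sapply is still the right tool when the function is more complex than a single operator; the lapply form is a common idiom when you want the result to stay a data frame.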
Here are some other introduction pages for R.
[Other tutorials]
R Tutorial 2 - If~else, File read/write, String manipulation
[Following Study]
By now you've learned almost all the basic functions you need to tackle finance problems. It's time to solve real problems. Here are some that may sharpen your R skills.
How to Calculate Annuity value in R (Mortgage Problem)