R tutorial (R programming basic 101)

[The goal of this page]
Most R introductions I have read are filled with nothing but instructions. But the goal of R is to solve real-life problems, so I want to keep this page as short as possible. In practice, though, you need to understand a few key concepts before you can tackle those problems. This page covers the basic data structures and data manipulation methods.

Still, I believe the best way to learn the R programming language is to tackle real-life problems. For now, just skim through this page to see how things work.

[Assign Variables]
Unlike in C, C++, or Java, you don't need to think about memory. The beauty of R is that it lets you write code without in-depth knowledge of algorithms or memory management. Assigning a variable in R is easy.

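Let's assume that you know the cash flows for the next several years (T=1: 30, T=2: 50, T=3: 80) and want to discount them at the rate r=0.02. It would be tedious to type 1+0.02 every time. That is why programmers came up with a brilliant idea: assigning a value to a variable.

Here are some examples.

Please keep in mind that a ">" or "+" at the start of a line is the command prompt, not something I type. R shows "+" as the prompt when the previous line is not yet a complete command. A "+" anywhere else in a line is the ordinary addition operator, and you do need to type it.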

> r<-0.02
> 30/(1+r)
[1] 29.41176
> 50/(1+r)^2
[1] 48.05844
> 80/(1+r)^3
[1] 75.38579
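
The same idea extends to the intermediate results. As a minimal sketch (the names pv1, pv2, pv3, and total_pv are my own and not part of the example above), you can store each discounted value in its own variable and then add them up:

```r
# Discount rate from the example above
r <- 0.02

# Present value of each year's cash flow (variable names are illustrative)
pv1 <- 30 / (1 + r)
pv2 <- 50 / (1 + r)^2
pv3 <- 80 / (1 + r)^3

# Total present value of the three cash flows
total_pv <- pv1 + pv2 + pv3
print(total_pv)
```

If the rate changes, you only need to change the first line and run the rest again.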

[Iterations]
When you run into the sigma sign (∑), think of the for loop, which lets you run the same commands a set number of times.

<Annuity Problem>
Let's assume that we have an incoming cash flow of $50 at T=1, T=2, and T=3, and that the discount rate is r=0.05. What is the present value of this cash flow?

> pv <- 0
> for (i in 1:3) {
+     pv <- pv + 50/(1.05)^i
+ }
> print(pv)
[1] 136.1624
>
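
The loop is not the only way to write this sum. The sketch below, which uses my own variable names (payment, n, pv_loop, and so on), compares the loop with a vectorized sum and with the standard closed-form annuity formula. All three should give the same present value.

```r
r <- 0.05       # discount rate from the annuity problem
payment <- 50   # cash flow received in each period
n <- 3          # number of periods

# Same loop as in the annuity problem
pv_loop <- 0
for (i in 1:n) {
    pv_loop <- pv_loop + payment / (1 + r)^i
}

# Vectorized: discount all periods at once, then add them up
pv_vectorized <- sum(payment / (1 + r)^(1:n))

# Closed form: standard annuity formula
pv_formula <- payment * (1 - (1 + r)^(-n)) / r

print(c(pv_loop, pv_vectorized, pv_formula))
```

The vectorized version is usually the most idiomatic in R, but the loop is easier to follow when each step depends on the previous one.
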
[Data type]

Some data cannot be represented by a single number. A cash flow is a good example. If you want to look at historical stock performance, a time series type meets the requirement. Sometimes you want a data type like an Excel sheet: a combination of numbers, strings, and true/false values. In that case, consider using a data frame. There is no golden rule for which data type to use in which situation. Part of being a good data scientist is having an eye for the data type that makes it easier to solve the business problem quickly.

<Collection Data Type (aka Vector)>
A collection type (a vector) is helpful for representing a cash flow.
Let's assume that there is a cash flow of T=1: 50, T=2: 100, T=3: 50, T=4: 80.

Here is how we define the collection type.

> cf <- c(50,100,50,80) #c means "combine"
> cf
[1]  50 100  50  80
> cf[1] #Getting value of first index
[1] 50
> cf[2] #Getting value of second index
[1] 100
> cf[3] #Getting value of third
[1] 50
> cf[4] #Fourth
[1] 80
>

Keep in mind that you don't need to type the indices for the cash flow. R numbers the elements automatically, starting from 1.
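
Most arithmetic in R works on a whole vector at once, which is what makes the collection type convenient for cash flows. Here is a rough sketch using the same cf vector. The discount rate r=0.05 and the names periods and pv are assumptions of mine, not part of the example above.

```r
cf <- c(50, 100, 50, 80)
r <- 0.05                    # assumed discount rate, not given in the example above

length(cf)                   # number of cash flows
cf[2:3]                      # second and third cash flows
cf * 2                       # every element is doubled

# Discount each cash flow by its own period, then add them up
periods <- seq_along(cf)     # 1, 2, 3, 4
pv <- sum(cf / (1 + r)^periods)
print(pv)
```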

<Data Frame>
If you want a data type that is more like an Excel sheet or a database table, the data frame meets that need. Let's assume that you need the following information.

Month    LIBOR rate  T-bill rate
1 month  0.01        0.015
3 month  0.02        0.022
6 month  0.03        0.033

You can build the data frame like this.

> information_table <- data.frame(
+     month=c("1m", "3m", "6m"),
+     libor_rate=c(0.01, 0.02, 0.03),
+     tbill_rate=c(0.015, 0.022, 0.033)
+ )
> information_table[2]
  libor_rate
1       0.01
2       0.02
3       0.03
> information_table[,2]
[1] 0.01 0.02 0.03
>

The important thing is that when you choose the name of a variable, make sure it contains no white space (space, tab, or newline).

If you type data_frame[2], you get the 2nd column as a data frame.
If you type data_frame[,2], you get the 2nd column as a plain vector.
If you type data_frame[2,], you get the 2nd row.
If you type data_frame[2,2], you get the value in the 2nd row and 2nd column.
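
Inside the brackets, the row comes before the comma and the column after it. The sketch below runs the four forms of indexing on the table built earlier. The names col_as_df, col_as_vec, second_row, and one_value are only there for illustration.

```r
information_table <- data.frame(
    month = c("1m", "3m", "6m"),
    libor_rate = c(0.01, 0.02, 0.03),
    tbill_rate = c(0.015, 0.022, 0.033)
)

col_as_df  <- information_table[2]      # 2nd column, still a data frame
col_as_vec <- information_table[, 2]    # 2nd column as a plain vector
second_row <- information_table[2, ]    # 2nd row, a one-row data frame
one_value  <- information_table[2, 2]   # row 2, column 2

print(second_row)
print(one_value)
```
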
<List Data Type>
If you want to put collections of different types together, the list type is useful. The collections do not have to have the same length. This is especially useful when you want to store unstructured data.

> libor<-c(0.01, 0.02, 0.03) #Number
> tbill<-c(0.01, 0.02, 0.05, 0.12) #Number
> swap<-c("1m", "2m", "3m", "6m", "12m") #String
> counterpartyrisk<-c(TRUE, FALSE, TRUE) #Boolean
> list_all <- list(libor, tbill, swap, counterpartyrisk)

To access each collection, use [[]] instead of []. You can use [], but you get a list back instead of the collection itself, which makes it harder to access a specific value.

> list_all[1] #this returns list type
[[1]]
[1] 0.01 0.02 0.03

> list_all[[1]] #this returns collection type(vector)
[1] 0.01 0.02 0.03
> list_all[1][1] #this returns list type too.
[[1]]
[1] 0.01 0.02 0.03

> list_all[[1]][1] #To get the 1st element of the 1st collection, this is the right command.
[1] 0.01
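
Positions can be hard to remember once a list grows. As a small sketch, you can also give each element a name when you build the list and then access it by name. The names below are simply the variable names from the example reused as labels.

```r
list_all <- list(
    libor = c(0.01, 0.02, 0.03),
    tbill = c(0.01, 0.02, 0.05, 0.12),
    swap = c("1m", "2m", "3m", "6m", "12m"),
    counterpartyrisk = c(TRUE, FALSE, TRUE)
)

list_all$tbill            # whole vector, accessed by name
list_all[["swap"]][2]     # 2nd element of the "swap" vector

# Length of each element; they do not have to match
element_lengths <- sapply(list_all, length)
print(element_lengths)
```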


<Factors (Categorical value)>
A factor is a categorical variable, such as Male/Female, Adult/Kids, or A/B/C/D/F grades. If you only use Excel, you will hardly run into this data type, but if you use databases such as Oracle, MySQL, or MS SQL Server, you should get familiar with the concept, as it separates the displayed value from the code value stored underneath.

> information_table <- data.frame(
+     people_name=c("Tom", "Jane", "Greg", "Kelly"),
+     people_gender=c(1, 2, 1, 2)
+ )
> information_table$people_gender <- factor(information_table$people_gender, labels=c("Male", "Female"))
> information_table
  people_name people_gender
1         Tom          Male
2        Jane        Female
3        Greg          Male
4       Kelly        Female
> information_table$people_gender
[1] Male   Female Male   Female
Levels: Male Female
One way to access an entire column of a data frame is to use "$": DataFrame$ColumnName
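
To see the separation between label and code, you can inspect a factor directly. This is a minimal sketch on a standalone vector rather than the data frame column. Passing levels explicitly is my own addition to make the mapping from code to label visible.

```r
# 1 is labelled "Male" and 2 is labelled "Female"
people_gender <- factor(c(1, 2, 1, 2),
                        levels = c(1, 2),
                        labels = c("Male", "Female"))

levels(people_gender)       # the labels: "Male" "Female"
as.integer(people_gender)   # internal integer codes (here they match the original 1/2 values)
table(people_gender)        # how many observations fall in each category
```
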
<Time Series>
A time series is similar to a data frame. The main difference is that each row is indexed by a time stamp, such as 2015-01-01, which makes date calculations easier than in a data frame. In our posts, we are going to use the "zoo" type. I want to defer the explanation until we run into "tseries", when we get stock data from the internet. Again, I want to keep the introduction as simple as possible. I believe that is the big differentiator from other online R material. You should learn a programming language or tool by actually solving problems with it.
[Defining functions]
Mathematically, a function can be written as y=f(x), and a function in R is not that different. It is supposed to return a value that corresponds to the x value. If x is the same, y is the same too, unless it is a stochastic function.

Suppose I want to get the square (^2) of a number. Let's do that.

> getsquare<-function(x) {
+     y <- x^2
+     return(y)
+ }
>
> getsquare(2)
[1] 4
> getsquare(4)
[1] 16
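
A function can take more than one argument. As a sketch that ties back to the discounting example, here is a hypothetical helper called discount (the name and its arguments are mine). In R the value of the last expression is returned, so an explicit return() is optional.

```r
# Present value of a cash amount received at time t, discounted at rate r
discount <- function(cash, r, t) {
    cash / (1 + r)^t
}

discount(30, 0.02, 1)                 # a single cash flow

# Vectors work too: three cash flows at T=1, 2, 3
discount(c(30, 50, 80), 0.02, 1:3)
```
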
[Other data manipulation]
<sapply>
If you want to change all the values in a data frame, say, multiply every value by 2, you can use the "sapply" command. Here are the steps.

(1) Define (or read) the data.
(2) Define the function.
(3) Apply sapply to the data frame that you want to change. The function defined in (2) is applied to each column, and therefore to all your data points.
(4) See the result. Note that sapply returns a matrix here rather than a data frame, which is why the row labels print as [1,], [2,], [3,].

> information_table <- data.frame(
+     libor_rate=c(0.01, 0.02, 0.03),
+     tbill_rate=c(0.015, 0.022, 0.033)
+ )
>
> changevalues <- function(x) {
+     y <- x*2
+     return (y)
+ }
>
> information_table<-sapply(information_table, changevalues)
> information_table
     libor_rate tbill_rate
[1,]       0.02      0.030
[2,]       0.04      0.044
[3,]       0.06      0.066
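
The result of sapply above is a matrix, so the "$" column access no longer works on it. If you need to keep a data frame, the following sketch shows two alternatives I would consider: assigning the result of lapply back into the columns, or using plain arithmetic on the data frame. The names as_matrix, doubled, and doubled_direct are illustrative.

```r
information_table <- data.frame(
    libor_rate = c(0.01, 0.02, 0.03),
    tbill_rate = c(0.015, 0.022, 0.033)
)
changevalues <- function(x) x * 2

# sapply simplifies the result to a matrix
as_matrix <- sapply(information_table, changevalues)
is.matrix(as_matrix)           # TRUE

# Assigning into doubled[] replaces the columns but keeps the data frame
doubled <- information_table
doubled[] <- lapply(doubled, changevalues)
is.data.frame(doubled)         # TRUE

# For simple arithmetic, the operator can be applied to the data frame directly
doubled_direct <- information_table * 2
print(doubled_direct)
```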

Here is another introduction page for R.
[Other tutorials]
R Tutorial 2 - If~else, File read/write, String manipulation

[Following Study]
I think you have now learned almost all the basic functions needed to tackle finance problems. It's time to solve real problems. Here is a problem that may enhance your R skills.

How to Calculate Annuity value in R (Mortgage Problem)


Comment

Comment by Heramb J. Joshi on April 13, 2016 at 6:03am

This is a great starting point for a new learner.

Some corrections...


in Topic <Data Frame>, it says,

if you type data_frame[,2] you can have access to 2nd row... But this gives us the second column transposed...

If you type data_frame[2,] it will give you the second row.

Thanks,

Heramb

Comment by Jeffrey E. Crosby on April 11, 2016 at 11:27am

Often the + is used as a prompt when a command is not complete.  Thanks for posting this!

Comment by Gregory Choi on April 7, 2016 at 2:47pm

Thank you for your correction on my ambiguity! What I mean was that when you type "information_table <- data.frame(" on your command prompt and hit enter button, and then you'll see '+' on your next command prompt. I'll make it correct :)

Comment by Rick Henderson on April 7, 2016 at 10:17am

In the first part of the tutorial you mention that '>' is the command prompt that you don't type, but you also say the "+" is the command prompt and you don't type it, but it is the addition operator so people will need to type it to follow along.

© 2019 Data Science Central