Before starting with R Programming: Basic R without any package installed

Open source software has become so powerful that large corporations have started preparing their traditional business analysts to move to open source tools, particularly R. I prepared this basic document to train some of my clients and local communities in Dallas. This article is not intended for people who have already been exposed to R, but for people who are new and want to learn the ABCs of R.

 

Downloading R from https://cran.r-project.org/ is rather simple. Once you download and install it on your computer and start R, you will see a prompt like >. The > means that you are ready to have some fun with R and learn enough to become dangerous.

 

R has three basic types: numbers, strings and logical values (TRUE/FALSE). You can start using the prompt to do some basic operations with these types.

 

Numbers:

> 4+5

Strings:

> "Hello, Meltem"

# self centered (if you want to write a note or comment out your code, you can use #)

Logicals:

> 5==5

> 5<6

> 6<5

> 2+2 == 4

Check what happens when you run the following (== compares two values, while = assigns):

T==TRUE, T=TRUE & F==FALSE, F=FALSE
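
Here is a minimal sketch of what this exercise is getting at, assuming a fresh R session. The names cmp, overwritten and restored are made up for illustration.

```r
# == compares two values and returns a logical value
cmp <- (2 + 2 == 4)          # TRUE

# T and F are ordinary variables that start out as TRUE and FALSE,
# so = (or <-) can overwrite them; TRUE and FALSE are reserved words
T = FALSE                    # legal, but confusing
overwritten <- (T == TRUE)   # FALSE, because T now holds FALSE

rm(T)                        # remove our copy so T means TRUE again
restored <- (T == TRUE)      # TRUE
```

This is one reason many R users prefer to spell out TRUE and FALSE instead of relying on T and F.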

You can assign values to some variables in R.

> x=3

OR

>x<-3

This will not print any output. To see what x is:

>x

Let’s do some operations on the numeric value x:

>x/2

> x+5

>x=x+7

I am sure some of you have already tried this. Yes, you can also assign strings to variables:

> y<-"Hello, Meltem"
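
To check which of the three basic types a variable holds, base R's class() function can help. A small sketch follows; z is a hypothetical extra variable added here to cover the logical case.

```r
x <- 3
y <- "Hello, Meltem"
z <- 5 < 6                   # result of a comparison

class_x <- class(x)          # "numeric"
class_y <- class(y)          # "character"
class_z <- class(z)          # "logical"
```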

Functions are an essential part of R, and there are a lot of useful functions in base R (without installing any package).

I will not be able to list all of the basic functions; however, I will give you some ways to check the available functions and, later, install packages.

For me the most basic function is adding a list of numbers, and R has a function called “sum”.

> sum(2,3,66) # you got it: you can assign the result to a variable!!

Another function I like is “rep”. You can repeat some values and create a vector in R.

> mydat<-rep("hello, meltem", times=3)
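
As a small sketch of how rep behaves, the times and each arguments (both part of base R) repeat values in different orders. The variable names below are illustrative only.

```r
mydat <- rep("hello, meltem", times = 3)   # the same string three times

whole <- rep(c(1, 2), times = 2)           # repeats the whole vector: 1 2 1 2
paired <- rep(c(1, 2), each = 2)           # repeats each element:     1 1 2 2
```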

Let’s get a little fancy:

>sqrt(sum(2,3,66)) # Yes, it is the square root of 2+3+66
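
The same calculation can be broken into steps by storing intermediate results in variables. This is only a sketch; total, root and rounded are names chosen for illustration.

```r
total <- sum(2, 3, 66)       # 71
root <- sqrt(total)          # square root of 71, about 8.426
rounded <- round(root, 2)    # 8.43
```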

If you want to understand a function, you can try:

>help(sum)

>example(min)
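
Beyond help and example, here is one possible way to look around in base R. It assumes a default session where the base and utils packages are attached; base_names and sum_like are illustrative names.

```r
# Every object that base R exports (mostly functions)
base_names <- ls("package:base")
n_base <- length(base_names)     # well over a thousand names

# Everything on the search path whose name starts with "sum"
sum_like <- apropos("^sum")

# Confirm that sum really is a function
sum_is_function <- is.function(sum)
```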

Most of the time I want to know which directory (AKA path) I am in, and I check where I am using:

>getwd()

If I want to be in a particular path, I love using:

> setwd("my path") # it saves the time and effort of going back and forth between directories.
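
A cautious pattern when changing directories is to remember where you started so you can return. The sketch below uses tempdir(), R's per-session temporary folder, only as a stand-in for "my path".

```r
old_dir <- getwd()           # remember where we started
setwd(tempdir())             # move to the temporary folder
here <- getwd()              # now reports the temporary folder
setwd(old_dir)               # go back to the starting directory
back <- getwd()
```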

One of my go-to functions is “list.files()”:

>list.files(path="~/your path", pattern="\\.your extension$")

Let’s say I want to see the Word files in my documents.

>list.files(path="~/documents/", pattern="\\.doc$")

In the second example, we want to find files whose names start with a letter from a to l, with case-sensitive matching.

> list.files(path="~/documents/", pattern="^[a-l]",ignore.case=FALSE)

Note: the pattern argument is a regular expression, not a shell wildcard. list.files() is also a good function for practicing the help and example functions.
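
To try list.files() without touching your own folders, here is a self-contained sketch that builds a throwaway folder first. The folder name listfiles_demo and the file names are invented for this example.

```r
# Build a small demo folder inside R's temporary directory
demo_dir <- file.path(tempdir(), "listfiles_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("apple.doc", "notes.txt", "Zebra.doc")))

# Files ending in .doc (pattern is a regular expression)
doc_files <- list.files(path = demo_dir, pattern = "\\.doc$")

# Files starting with a lowercase letter from a to l
a_to_l <- list.files(path = demo_dir, pattern = "^[a-l]", ignore.case = FALSE)

# Clean up the demo folder
unlink(demo_dir, recursive = TRUE)
```

Here doc_files holds the two .doc files, while a_to_l holds only apple.doc, because Zebra.doc starts with an uppercase letter outside the range.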

Practice: I am an avid runner and have a lot of running-related files on my computer, so I want to develop an example with my running files. Pick one of your own directories and work with it to see how it goes.

One of the marathons I ran is Cowtown, and I have a lot of Cowtown-related files in my running folder. I am only interested in the pictures taken at the finish line.

> mylist<-list.files(path="~/run/",pattern="\\.jpg$",ignore.case=FALSE)

>mylist[grep(pattern = "finish", x = mylist,ignore.case=TRUE)]

Note: grep() takes its name from the Unix/Linux command line tool that searches text using regular expressions. In R, grep() returns the positions of the elements that match the pattern. It is very useful to have some familiarity with it (HINT: start with help and example).
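
Here is a small sketch of the grep family on a made-up vector of file names (the names are hypothetical, not my real running files). grep() returns positions by default, value=TRUE returns the matching strings, and grepl() returns one logical per element.

```r
files <- c("cowtown_start.jpg", "cowtown_Finish_01.jpg",
           "finish_line.jpg", "map.jpg")

idx <- grep("finish", files, ignore.case = TRUE)                 # positions 2 and 3
hits <- grep("finish", files, ignore.case = TRUE, value = TRUE)  # the matching names
flags <- grepl("finish", files, ignore.case = TRUE)              # FALSE TRUE TRUE FALSE
```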

Before I talk about vectors, I want to leave you with a couple of teasers:

>(5:1)

>(5:1,pi)

HINT: the second one should give an error.

Vectors are collections of elements of the same basic type (numbers, strings or logicals). Vectors are defined with c(), like c(5:1, pi). The error you ran into came from the definition: we tried to define a vector including a descending sequence of integers and the number pi (3.141593) without wrapping them in c().

Try the following:

>myvec<-c(1,TRUE,"THREE")

>class(myvec)

All the values are converted to character, that is, strings of characters. This coercion is R's default behavior, but it might be wrong for your usage.
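
A short sketch of how this coercion plays out: when types are mixed, R picks the most general one (logical, then numeric, then character). The variable names are illustrative.

```r
myvec <- c(1, TRUE, "THREE")         # becomes "1" "TRUE" "THREE"

num_and_logical <- c(1, TRUE)        # TRUE becomes 1, so the class is "numeric"
class_mixed <- class(num_and_logical)

# Converting back: strings that are not numbers become NA (with a warning,
# silenced here on purpose)
back_to_numbers <- suppressWarnings(as.numeric(myvec))   # 1 NA NA
```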

The teaser I gave earlier, 5:1, is actually a decreasing vector, a form used frequently in R:

>c(8:4)

Or

>seq(8,4)

If you want to define the steps:

>seq(8,4,by=-0.4)

>seq(4,8,by=0.4)

HINT: The sign of by sets the direction: positive to increment, negative to decrement.
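
To see why the sign matters, here is a small sketch. It uses try() so that the deliberate mistake does not stop the script; down, up and wrong are illustrative names.

```r
down <- seq(8, 4, by = -0.4)     # 8.0, 7.6, ..., 4.0
up <- seq(4, 8, by = 0.4)        # 4.0, 4.4, ..., 8.0

n_steps <- length(down)                             # 11 values
same_values <- isTRUE(all.equal(rev(up), down))     # TRUE

# Going from 8 down to 4 with a positive step is an error
wrong <- try(seq(8, 4, by = 0.4), silent = TRUE)
sign_error <- inherits(wrong, "try-error")          # TRUE
```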

In most use cases we want to access the elements of a vector, so I want to talk about that for a short while. Let’s say we have a vector of strings:

>myvector<-c("summers","are","always","fun","with","family")

Let’s check the length of our sentence and call up the third word.

>length(myvector)

>myvector[3]

I am not very good at grammar, but let’s add a period to end the sentence.

>myvector<-c(myvector,".")

You can add another full sentence and try to access the elements of the vector. HINT: myvector[6:7] and myvector[c(6,7)] are the same.
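
A few more ways to pick elements, sketched on the same sentence. Negative indices drop elements, and paste() with collapse (both base R) glues the words back together.

```r
myvector <- c("summers", "are", "always", "fun", "with", "family", ".")

last_two <- myvector[c(6, 7)]      # "family" "."
same_two <- myvector[6:7]          # identical to the line above
no_period <- myvector[-7]          # everything except the 7th element

sentence <- paste(no_period, collapse = " ")
# "summers are always fun with family"
```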

Another simple vector operation is to name the elements of a vector in R, and also to change the elements of a vector.

For simplicity, let’s go ahead and define a short vector:

>mynumbers<-c(5,10,34)

>names(mynumbers)<-c("smallest","medium","largest")

>mynumbers["largest"]<-0 # Bummer, the largest element is not the largest anymore!!
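
As a small sketch of working with names, which.max() (base R) can confirm which element is really the largest after the change. The helper variable names are illustrative.

```r
mynumbers <- c(5, 10, 34)
names(mynumbers) <- c("smallest", "medium", "largest")

before <- names(which.max(mynumbers))   # "largest"

mynumbers["largest"] <- 0               # overwrite one element by name
after <- names(which.max(mynumbers))    # "medium"

medium_value <- mynumbers[["medium"]]   # 10, without the name attached
```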

I like numbers best when they are shown in pictures, and a barplot is one of the prettiest ways of showing numbers.

Let’s look at the federal spending of the US government over the last 10 years. It is publicly available data, and someone else has already shown it on bar plots (http://www.usgovernmentspending.com/spending_chart_2005_2015USr_XXs...). Since I had all the time in the world, I decided to draw my own barplot.

>federal_spending<-c(13.0937,13.8559,14.4776,14.7186,14.4187,14.9644,15.5179,16.1553,16.6632,17.3481,17.947)

>barplot(federal_spending)

It will show the federal spending between 2005 and 2015. In order to make the plot more meaningful, we can label the values by year.

Even though I have enough time to write the years one by one, I am way too lazy. So, I will show you the shortcut:

>years<-c(2005:2015)

>names(federal_spending)<-years

>barplot(federal_spending/1000,col="red",xlab="Years",ylab="Scaled spending by 1000 in $")

There is a modest decrease in GDP federal spending in 2009; however, the numbers picked up again and climbed to the $17 trillion level in 2015. If you want, you can work with the numbers and speculate further. I will stop here until I start writing another piece on data frames.
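
If you would rather check the 2009 dip with numbers than by eye, one possible sketch uses diff() from base R to compute the year-over-year change. The names change and dip_years are illustrative.

```r
federal_spending <- c(13.0937, 13.8559, 14.4776, 14.7186, 14.4187, 14.9644,
                      15.5179, 16.1553, 16.6632, 17.3481, 17.947)
years <- 2005:2015

# Difference between each year and the one before it (10 values)
change <- diff(federal_spending)

# Years in which the value went down compared with the previous year
dip_years <- years[-1][change < 0]      # 2009
```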
