
R Tutorial for Beginners: A Quick Start-Up Kit


Learn R: A Statistical Programming Language


Here's my quick start-up kit for you.
  1. Install R
    1. Linux (Debian/Ubuntu): "sudo apt-get install r-base" should do it
    2. Windows: download the installer from CRAN (cran.r-project.org)
  2. Open a Script window alongside the Console window when you run R
    1. It should look something like this:
      [Figure: Console and Script windows]
      In the Console you type commands directly: hit <enter> and R runs the line. When the prompt (the red ">") comes back, the command has finished.
    2. The Script window is for writing as much code as you want. To run part of it, highlight the lines you want and press Ctrl+R (or click the toolbar icon); the code runs in the Console.
    3. This basic setup is all you need to begin.
  3. The quickest approach is to go to the appendix ("A sample session") of the manual An Introduction to R and walk through typing in all the commands to see how R works. You'll quickly see that you assign equations, functions, values, objects, etc. with the " <- " operator: the result of whatever is on the right is stored in the named variable or object on the left.
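Here is what the " <- " operator looks like in a short console session. This is a minimal sketch; the names x and y are arbitrary placeholders:

```r
x <- c(3, 5, 7)   # c() combines values into a vector, stored under the name x
y <- mean(x)      # the result of mean(x) is stored under the name y
y                 # typing a name prints its value: [1] 5
```

Nothing is printed for the two assignment lines; R only shows a value when you type a name (or an expression) on its own.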


After you get the very basics, one of the first things you'll want to do is load a file. Use the sample data provided at the end of this post to practice on.


Set your working directory


From the Console window, choose:
File > Change dir...
Then type "dir()" in the Console and press <enter>.
You should see the file you saved as "mydata.csv" (the data below, cut n' pasted into a text file in that folder). If not, change your working directory until it is right.
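If you prefer typing to menus, the same steps can be done from the Console with a few base R functions. This is a sketch; the folder in the setwd() line is a hypothetical path, so replace it with the folder where you saved mydata.csv. On Windows, write paths with forward slashes (or doubled backslashes).

```r
getwd()                        # show the current working directory
# setwd("C:/Users/me/Rwork")   # hypothetical path: point this at your own folder
dir()                          # list the files in the working directory
file.exists("mydata.csv")      # TRUE once the working directory is right
```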


Load and view your data with:

> read.csv("mydata.csv")

Do it again, but this time load it into a variable named "data", then view it by typing "data" like this:
> data <- read.csv("mydata.csv")
> data
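Printing a whole data frame fills the screen quickly, so a few standard functions give a more compact view. As an assumption for this sketch, the sample data at the end of this post appears to match R's built-in airquality dataset (same columns, same 153 rows), so the code uses airquality as a stand-in; swap in read.csv("mydata.csv") once your file loads.

```r
data <- airquality   # stand-in for read.csv("mydata.csv")
head(data)           # first six rows
str(data)            # column names, types, and the first few values of each
dim(data)            # number of rows and columns: 153 6
```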

Now you're ready to start manipulating the data by pulling out only a subset of it.
Pull out a single column (everything after a hash symbol is a comment):
> oz <- data$Ozone ## pulled out Ozone column into a vector

Look at it by typing the variable name "oz" and pressing <enter>.
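The $ operator is one of several ways to pull a column out of a data frame. The sketch below, again using airquality as a stand-in for your loaded file, shows three equivalent forms; [[ ]] is handy when the column name is stored in a variable.

```r
data <- airquality         # stand-in for read.csv("mydata.csv")
oz1 <- data$Ozone          # by name with $
oz2 <- data[["Ozone"]]     # by name with [[ ]]; also works as data[[col]] with col <- "Ozone"
oz3 <- data[, "Ozone"]     # [row, column] indexing; an empty row index means all rows
identical(oz1, oz2) && identical(oz2, oz3)   # TRUE: all three give the same vector
```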
Now let's find all the values in oz where we have no data (i.e. "NA") and store the result in a vector object called "badoz". It is a logical vector: TRUE where the value is missing, FALSE elsewhere.
> badoz <- is.na(oz) ## TRUE for each missing (NA) value

Now let's use "oz" and "badoz" to get everything in the Ozone column ("oz") that DOESN'T have bad data, by using the exclamation character, which means "NOT this thing":
> oz[!badoz] ## keep only the values that are not NA

You've just pulled out a whole column and then removed all the missing values from it, leaving a clean vector of its own.
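To see exactly what is.na() and ! do, it helps to try them on a tiny vector first. The values here are made up for illustration:

```r
v <- c(41, NA, 12, NA, 18)   # small made-up vector with two missing values
bad <- is.na(v)              # FALSE TRUE FALSE TRUE FALSE
!bad                         # TRUE FALSE TRUE FALSE TRUE
v[!bad]                      # keeps the positions where !bad is TRUE: 41 12 18
sum(bad)                     # TRUE counts as 1, so this counts the NAs: 2
```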

Let's find the average of this data set:
> mean(oz[!badoz])

Let's get a summary:
> summary(oz[!badoz])
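Many summary functions can skip missing values themselves through an na.rm argument, which saves building the badoz vector. A sketch, with airquality standing in for the loaded file:

```r
oz <- airquality$Ozone          # stand-in for data$Ozone
m1 <- mean(oz[!is.na(oz)])      # the filtering approach used above
m2 <- mean(oz, na.rm = TRUE)    # drops the NAs before averaging
mean(oz)                        # NA: without na.rm, a single missing value makes the mean NA
all.equal(m1, m2)               # TRUE
summary(oz)                     # summary() also works unfiltered and reports the NA count
```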

Make another subset of data from the original where Ozone was more than 31 and Temperature was over 90 degrees:
> myY <- subset(data, (Ozone > 31 & Temp > 90))
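subset() keeps only the rows where the condition is TRUE; rows where it evaluates to NA (for example, a missing Ozone value) are dropped. The same filter can be written with square brackets, where which() turns the condition into the row positions that are TRUE. Indexing with the raw condition, data[keep, ], would instead produce rows filled with NA. A sketch using airquality as a stand-in:

```r
data <- airquality
keep <- data$Ozone > 31 & data$Temp > 90      # TRUE, FALSE, or NA for each row
myY  <- subset(data, Ozone > 31 & Temp > 90)  # drops FALSE and NA rows
myY2 <- data[which(keep), ]                   # which() returns positions of TRUE only
nrow(myY) == nrow(myY2)                       # TRUE: both give the same rows
```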

Let's get only the month of May into a subset:
> mM <- subset(data, Month == 5) ## find subset for the month of May
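Making one subset per month gets repetitive. tapply() applies a function to a column within each group, so it can compute the mean ozone for every month in one call. A sketch with airquality as a stand-in:

```r
data <- airquality
mM <- subset(data, Month == 5)        # May only, as above
nrow(mM)                              # 31 days in May
mean(mM$Ozone, na.rm = TRUE)          # mean ozone in May
by_month <- tapply(data$Ozone, data$Month, mean, na.rm = TRUE)
by_month                              # one mean per month, labelled 5 to 9
```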

So by now you get the idea here. The rest is mostly syntax.


The next thing you'll want to do is write FUNCTIONS to manipulate your data.


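Basic syntax is like this:
myfunction <- function() {
  x <- rnorm(100)  ## 100 random draws from a standard normal distribution
  mean(x)          ## the last value evaluated is what the function returns
}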
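Functions become more useful once they take arguments. Here is a minimal variation on the example above, where the number of draws, n, is an assumed argument with a default of 100:

```r
myfunction <- function(n = 100) {
  x <- rnorm(n)   # n random draws from a standard normal distribution
  mean(x)         # the last expression evaluated is the return value
}
set.seed(42)           # make the random draws reproducible
myfunction()           # uses the default, n = 100
myfunction(n = 10000)  # with more draws the mean tends to land closer to 0
```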

Let's write a function that accepts two parameters, a file name (id) and a folder (directory), and returns the data in that file. We'll add a third parameter, summarize, which defaults to FALSE; set it to TRUE and the function also prints a summary of the data. This could be done something like this:

getdata <- function(id, directory, summarize = FALSE) {
  file <- file.path(directory, id)  ## build the path, e.g. "datafolder/mydata.csv"
  data <- read.csv(file)
  if (summarize) {
    print(summary(data))           ## print the summary only when asked
  }
  return(data)
}

Once the function is defined in your session, call it with the file name, the folder, and the option TRUE (or leave the option off, since it defaults to FALSE), like this:

> getdata("mydata.csv", "datafolder", TRUE)

Save your function code as getdata.R, then reuse it in other functions and expand on it as you explore further and get better at using R.
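To reuse a saved function in a later session, load the file with source(), which runs every line in it. The sketch below is self-contained: it writes a short version of getdata() and a copy of the sample data (airquality as a stand-in) into R's temporary folder, then sources the script and calls the function. The temporary paths are assumptions for the demo; in practice you would just run source("getdata.R") from your working directory.

```r
tmp <- tempdir()                        # throwaway folder for the demo
script <- file.path(tmp, "getdata.R")
writeLines(c(
  "getdata <- function(id, directory, summarize = FALSE) {",
  "  data <- read.csv(file.path(directory, id))",
  "  if (summarize) print(summary(data))",
  "  data",
  "}"
), script)
write.csv(airquality, file.path(tmp, "mydata.csv"), row.names = FALSE)

source(script)                          # defines getdata() in this session
d <- getdata("mydata.csv", tmp)         # summarize left at its default, FALSE
dim(d)                                  # 153 6
```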

Enjoy!
Mitchell
*********************


##############
##Here is a cut n' paste data set to work with. Save it as "mydata.csv":
##############

"Ozone","Solar.R","Wind","Temp","Month","Day"
41,190,7.4,67,5,1
36,118,8,72,5,2
12,149,12.6,74,5,3
18,313,11.5,62,5,4
NA,NA,14.3,56,5,5
28,NA,14.9,66,5,6
23,299,8.6,65,5,7
19,99,13.8,59,5,8
8,19,20.1,61,5,9
NA,194,8.6,69,5,10
7,NA,6.9,74,5,11
16,256,9.7,69,5,12
11,290,9.2,66,5,13
14,274,10.9,68,5,14
18,65,13.2,58,5,15
14,334,11.5,64,5,16
34,307,12,66,5,17
6,78,18.4,57,5,18
30,322,11.5,68,5,19
11,44,9.7,62,5,20
1,8,9.7,59,5,21
11,320,16.6,73,5,22
4,25,9.7,61,5,23
32,92,12,61,5,24
NA,66,16.6,57,5,25
NA,266,14.9,58,5,26
NA,NA,8,57,5,27
23,13,12,67,5,28
45,252,14.9,81,5,29
115,223,5.7,79,5,30
37,279,7.4,76,5,31
NA,286,8.6,78,6,1
NA,287,9.7,74,6,2
NA,242,16.1,67,6,3
NA,186,9.2,84,6,4
NA,220,8.6,85,6,5
NA,264,14.3,79,6,6
29,127,9.7,82,6,7
NA,273,6.9,87,6,8
71,291,13.8,90,6,9
39,323,11.5,87,6,10
NA,259,10.9,93,6,11
NA,250,9.2,92,6,12
23,148,8,82,6,13
NA,332,13.8,80,6,14
NA,322,11.5,79,6,15
21,191,14.9,77,6,16
37,284,20.7,72,6,17
20,37,9.2,65,6,18
12,120,11.5,73,6,19
13,137,10.3,76,6,20
NA,150,6.3,77,6,21
NA,59,1.7,76,6,22
NA,91,4.6,76,6,23
NA,250,6.3,76,6,24
NA,135,8,75,6,25
NA,127,8,78,6,26
NA,47,10.3,73,6,27
NA,98,11.5,80,6,28
NA,31,14.9,77,6,29
NA,138,8,83,6,30
135,269,4.1,84,7,1
49,248,9.2,85,7,2
32,236,9.2,81,7,3
NA,101,10.9,84,7,4
64,175,4.6,83,7,5
40,314,10.9,83,7,6
77,276,5.1,88,7,7
97,267,6.3,92,7,8
97,272,5.7,92,7,9
85,175,7.4,89,7,10
NA,139,8.6,82,7,11
10,264,14.3,73,7,12
27,175,14.9,81,7,13
NA,291,14.9,91,7,14
7,48,14.3,80,7,15
48,260,6.9,81,7,16
35,274,10.3,82,7,17
61,285,6.3,84,7,18
79,187,5.1,87,7,19
63,220,11.5,85,7,20
16,7,6.9,74,7,21
NA,258,9.7,81,7,22
NA,295,11.5,82,7,23
80,294,8.6,86,7,24
108,223,8,85,7,25
20,81,8.6,82,7,26
52,82,12,86,7,27
82,213,7.4,88,7,28
50,275,7.4,86,7,29
64,253,7.4,83,7,30
59,254,9.2,81,7,31
39,83,6.9,81,8,1
9,24,13.8,81,8,2
16,77,7.4,82,8,3
78,NA,6.9,86,8,4
35,NA,7.4,85,8,5
66,NA,4.6,87,8,6
122,255,4,89,8,7
89,229,10.3,90,8,8
110,207,8,90,8,9
NA,222,8.6,92,8,10
NA,137,11.5,86,8,11
44,192,11.5,86,8,12
28,273,11.5,82,8,13
65,157,9.7,80,8,14
NA,64,11.5,79,8,15
22,71,10.3,77,8,16
59,51,6.3,79,8,17
23,115,7.4,76,8,18
31,244,10.9,78,8,19
44,190,10.3,78,8,20
21,259,15.5,77,8,21
9,36,14.3,72,8,22
NA,255,12.6,75,8,23
45,212,9.7,79,8,24
168,238,3.4,81,8,25
73,215,8,86,8,26
NA,153,5.7,88,8,27
76,203,9.7,97,8,28
118,225,2.3,94,8,29
84,237,6.3,96,8,30
85,188,6.3,94,8,31
96,167,6.9,91,9,1
78,197,5.1,92,9,2
73,183,2.8,93,9,3
91,189,4.6,93,9,4
47,95,7.4,87,9,5
32,92,15.5,84,9,6
20,252,10.9,80,9,7
23,220,10.3,78,9,8
21,230,10.9,75,9,9
24,259,9.7,73,9,10
44,236,14.9,81,9,11
21,259,15.5,76,9,12
28,238,6.3,77,9,13
9,24,10.9,71,9,14
13,112,11.5,71,9,15
46,237,6.9,78,9,16
18,224,13.8,67,9,17
13,27,10.3,76,9,18
24,238,10.3,68,9,19
16,201,8,82,9,20
13,238,12.6,64,9,21
23,14,9.2,71,9,22
36,139,10.3,81,9,23
7,49,10.3,69,9,24
14,20,16.6,63,9,25
30,193,6.9,70,9,26
NA,145,13.2,77,9,27
14,191,14.3,75,9,28
18,131,8,76,9,29
20,223,11.5,68,9,30


Comment by Mitchell A. Sanders on November 27, 2013 at 5:07am

Erratum: code line above "> oz <- x$Ozone ## pulled out Ozone column into a vector" should be instead:

> oz <- data$Ozone ## pulled out Ozone column into a vector

as there is no x variable yet and "data" is the frame variable holding data, not x.

Comment by Mitchell A. Sanders on October 28, 2013 at 3:10pm

I totally agree Dr. Z. Thanks for bringing that up. Link here.

Comment by Dr. Z on October 28, 2013 at 5:00am

The R GUI is quite unappealing. To facilitate working with R and learning its various functions I'd strongly recommend RStudio as an IDE. It's free and quite robust as a GUI, not to mention a good way to keep your projects organized and efficient. You can learn it in about a week (tops).


© 2014   Data Science Central
