Junvie Pailden, Ph.D.

May 27, 2014

SIUE - Stat 575 - Summer 2014

R offers a powerful and appealing interactive environment for exploring data, running simulations, etc.

R is platform independent, meaning it is available on Windows, Mac, and Linux.

R has extensive help resources, both online (just google any issue/question) and built in via help(…), e.g. help(lm).

R is not black-box software, i.e., you can trace how a function or package works by reading its R source, e.g. lm()

Many more!!!
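As a minimal sketch of the help and source-inspection points above (the choice of `apropos("mean")` and `sd` is illustrative and not from the original slides):

```r
# ?lm is shorthand for help(lm); only open the help page in an interactive session
if (interactive()) help(lm)

# apropos() lists objects whose names contain the given text
hits <- apropos("mean")
head(hits)

# Typing a function name without parentheses prints its definition
sd

# deparse() returns that definition as text
src <- deparse(sd)
```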

Download R from http://cran.us.r-project.org/

Install R. Leave all default settings in the installation options.

Download RStudio from http://rstudio.org/download/desktop and install it. Leave all default settings in the installation options.

Open RStudio.

```
# create an integer sequence
3:7
```

```
[1] 3 4 5 6 7
```

```
# create a sequence from 0 to 3 in increments of 0.5
seq(0,3,by=0.5)
```

```
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0
```

```
# create a repeated sequence
rep(pi,4)
```

```
[1] 3.142 3.142 3.142 3.142
```

Basic Operations

```
17+3+1
```

```
[1] 21
```

```
(2-3)*4
```

```
[1] -4
```

```
c(6,20,-3) # numbers
```

```
[1] 6 20 -3
```

```
c("words","are","wind") # strings
```

```
[1] "words" "are" "wind"
```

Operations

```
c(1,2,3,4) + 1
```

```
[1] 2 3 4 5
```

```
1/ c(1,2,3,4)
```

```
[1] 1.0000 0.5000 0.3333 0.2500
```

```
c(1,2,3,4)^2
```

```
[1] 1 4 9 16
```

Variable

```
# assignment
x <- 3
# is the same as
3 -> x
# and
x = 3
```

In this class, we will use `<-` for convenience. Be careful with `=` because it does not mean equals. For that, you need the `==` operator.

```
one <- 1
two <- 2
```

```
one = two # This means: assign the value of "two" to the variable "one"
```

```
one
```

```
[1] 2
```

```
two
```

```
[1] 2
```

Let's start again

```
one <- 1
two <- 2
```

```
one == two # This means: does the value of "one" equal the value of "two"?
```

```
[1] FALSE
```

```
a <- sqrt(2); b <- 1:3; c <- 2:4
```

Scalar addition and multiplication

```
a + b
```

```
[1] 2.414 3.414 4.414
```

```
a * b
```

```
[1] 1.414 2.828 4.243
```

Entrywise multiplication

```
b * c
```

```
[1] 2 6 12
```

`x` modulus `y`

```
17 %% 5
```

```
[1] 2
```

Integer Division

```
17 %/% 5
```

```
[1] 3
```
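The two operators fit together: the integer quotient times the divisor, plus the remainder, gives back the original number. A small sketch using the same values as above (the variable names `q` and `r` are only for illustration):

```r
x <- 17
y <- 5
q <- x %/% y      # integer quotient: 3
r <- x %% y       # remainder: 2
q * y + r == x    # TRUE
```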

General Form

```
f(argument1, argument2,...)
```

`sum(),mean(),sd()`

```
b <- c(1,2,3)
sum(b)
```

```
[1] 6
```

```
mean(b)
```

```
[1] 2
```

```
sd(b)
```

```
[1] 1
```

`exp(),cos(),log()`

```
exp(1)
```

```
[1] 2.718
```

```
cos(3.141593)
```

```
[1] -1
```

```
log2(1)
```

```
[1] 0
```

```
log(x=64,base=4)
```

```
[1] 3
```

```
height <- 58:72
weight <- c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)
hbar <- mean(height); hbar # mean of height, OR
```

```
[1] 65
```

```
n <- length(height);
sum(height)/n
```

```
[1] 65
```

```
var(height) # variance of height, OR
```

```
[1] 20
```

```
sum((height-hbar)^2)/(n-1)
```

```
[1] 20
```
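The standard deviation is the square root of the variance, so `sd()` can be checked the same way. A short sketch (the names `s_manual` and `s_builtin` are illustrative):

```r
height <- 58:72
s_manual <- sqrt(var(height))    # square root of the variance
s_builtin <- sd(height)          # built-in standard deviation
all.equal(s_manual, s_builtin)   # TRUE
```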

Find the correlation of height and weight?

```
height <- 58:72
weight <- c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)
# size
n <- length(height)
# mean
hbar <- mean(height)
wbar <- mean(weight)
# standard deviation
sdh <- sd(height)
sdw <- sd(weight)
# correlation coefficient
r <- sum((height-hbar)*(weight-wbar))/(sdh*sdw*(n-1))
```

```
# printing the results
print(c(n,hbar,wbar,sdh,sdw,r))
```

```
[1] 15.0000 65.0000 136.7333 4.4721 15.4987 0.9955
```

```
# lazy way
cor(height,weight)
```

```
[1] 0.9955
```

General Form

```
function(arglist) expr
return(value)
```

I want a function that will add two numbers

```
my_fun <- function(x,y){
x + y
}
my_fun(1,2)
```

```
[1] 3
```

The body of the function does not need to be on separate lines. If the body is a single expression, the braces aren't necessary.

```
my_fun2 <- function(x,y) x + y
my_fun2(1,2)
```

```
[1] 3
```

I can set default values, say `y=5`

```
my_fun2 <- function(x,y=5) x + y
my_fun2(1)
```

```
[1] 6
```
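A default can be overridden by position or by name, and named arguments may appear in any order. A brief sketch reusing `my_fun2` (the particular argument values are arbitrary):

```r
my_fun2 <- function(x, y = 5) x + y
my_fun2(1, 10)         # positional: x = 1, y = 10, returns 11
my_fun2(1, y = 10)     # named override of the default, returns 11
my_fun2(y = 2, x = 3)  # named arguments in any order, returns 5
```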

The `sapply()` function accepts a vector or list and a function, then applies the function to every element and returns the results, simplified to a vector when possible.

Because functions are also objects, I can pass a function into another function as the argument.

```
l <- 1:5
sapply(l, my_fun2)
```

```
[1] 6 7 8 9 10
```
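The function passed to `sapply()` can also be written inline, and extra arguments after the function are forwarded to it. A sketch under those two assumptions (the names `squares` and `shifted` are illustrative):

```r
my_fun2 <- function(x, y = 5) x + y
l <- 1:5
# anonymous function defined inline
squares <- sapply(l, function(v) v^2)   # 1 4 9 16 25
# y = 10 is passed on to my_fun2 for every element
shifted <- sapply(l, my_fun2, y = 10)   # 11 12 13 14 15
```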

Write a function that computes the correlation between two variables!

```
my_corr <- function(a,b){
# size
n <- length(a)
# mean
abar <- mean(a)
bbar <- mean(b)
# standard deviation
sda <- sd(a)
sdb <- sd(b)
# correlation coefficient
r <- sum((a-abar)*(b-bbar))/(sda*sdb*(n-1))
return(r)
}
```

```
height <- 58:72
weight <- c(115,117,120,123,126,129,132,135,139,142,146,150,154,159,164)
my_corr(height,weight)
```

```
[1] 0.9955
```
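One way to gain confidence in a hand-written function is to compare it with the built-in on different data. The sketch below restates `my_corr` in condensed form so the block runs on its own, and uses the built-in `mtcars` data set as an arbitrary second test case:

```r
# condensed restatement of my_corr
my_corr <- function(a, b) {
  sum((a - mean(a)) * (b - mean(b))) / (sd(a) * sd(b) * (length(a) - 1))
}
# all.equal() compares numbers up to a small tolerance
ok <- all.equal(my_corr(mtcars$wt, mtcars$mpg), cor(mtcars$wt, mtcars$mpg))
ok  # TRUE
```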

There are a few special values that are used in R

The `NA` values are used to represent missing values. You may encounter `NA` values in text loaded in R or in data loaded from databases (to replace `NULL` values).

```
v <- c(1,2,3)
v
```

```
[1] 1 2 3
```

```
length(v) <- 4
v
```

```
[1] 1 2 3 NA
```

`NA` values also appear when you expand a vector (matrix, array) beyond the size where values are defined.
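Missing values propagate through most computations, so it helps to know how to detect and skip them. A minimal sketch using `is.na()` and the `na.rm` argument of `mean()`:

```r
v <- c(1, 2, 3, NA)
is.na(v)                # FALSE FALSE FALSE  TRUE
mean(v)                 # NA, because one value is missing
mean(v, na.rm = TRUE)   # 2, missing values are dropped first
sum(is.na(v))           # 1 missing value
```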

If a computation results in a number that is too big, R will return `Inf` for a positive and `-Inf` for a negative.

```
2^1024
```

```
[1] Inf
```

```
-2 ^ 1024
```

```
[1] -Inf
```

```
1/0
```

```
[1] Inf
```

```
Inf-Inf # will return `NaN`
```

```
[1] NaN
```
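These special values can be tested for directly. A short sketch of the base R predicates:

```r
is.finite(1/0)       # FALSE
is.infinite(-1/0)    # TRUE
is.nan(Inf - Inf)    # TRUE
is.nan(NA)           # FALSE: NA is missing, not "not a number"
is.na(NaN)           # TRUE: NaN also counts as missing
```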

A list, created in R with `list()`, is an ordered collection of objects of possibly different types. Lists are frequently used to return several results of a function in a single object.

```
arya <- list(name='Arya of Winterfell',age=11,northman=TRUE)
arya
```

```
$name
[1] "Arya of Winterfell"
$age
[1] 11
$northman
[1] TRUE
```

You can see that the name of each item is preceded by a `$`. You can then reference each item in the list by its position or its name:

```
arya[1]
```

```
$name
[1] "Arya of Winterfell"
```

```
arya$name
```

```
[1] "Arya of Winterfell"
```

```
arya$age>15
```

```
[1] FALSE
```
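Note that single brackets return a (shorter) list, while double brackets return the element itself. A sketch of the difference, reusing the `arya` list:

```r
arya <- list(name = "Arya of Winterfell", age = 11, northman = TRUE)
class(arya[1])      # "list": still a list with one item
class(arya[[1]])    # "character": the item itself
arya[["age"]] + 1   # 12: double brackets also accept a name
```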

A matrix is a two-dimensional array. Matrices (same as vectors) can hold elements only of the same type.

```
# 5 by 4 matrix
m <- matrix(1:20,nrow=5,ncol=4)
m
```

```
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
```

By default, the matrix is populated by column

```
m <- matrix(1:20,nrow=5,ncol=4,byrow=TRUE)
m
```

```
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
```

To access the matrix, use square brackets.

```
m[10] # 10th entry columnwise
```

```
[1] 18
```

```
m[3,4] # entry on 3rd row, 4th column
```

```
[1] 12
```

```
m[3:5] # 3rd to 5th entry columnwise
```

```
[1] 9 13 17
```

```
m[3:5,2:3] # entries from 3rd thru 5th rows and 2nd thru 3rd columns
```

```
[,1] [,2]
[1,] 10 11
[2,] 14 15
[3,] 18 19
```
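Leaving an index empty selects the whole row or column, and a negative index drops entries. A sketch using the same row-filled matrix `m`:

```r
m <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
dim(m)     # 5 4
m[2, ]     # entire 2nd row: 5 6 7 8
m[, 3]     # entire 3rd column: 3 7 11 15 19
m[-1, 1]   # 1st column without the 1st row: 5 9 13 17
```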

You can also give names to each row and each column using `dimnames()`.

```
dimnames(m) <- list(c('a','b','c','d','e'),c('p','q','r','s'))
m
```

```
p q r s
a 1 2 3 4
b 5 6 7 8
c 9 10 11 12
d 13 14 15 16
e 17 18 19 20
```

Combine objects by rows `rbind()` or columns `cbind()`

```
S <- rbind(rep(FALSE,5),rep(NA,5))
rownames(S) <- c('All False','All NA')
S
```

```
[,1] [,2] [,3] [,4] [,5]
All False FALSE FALSE FALSE FALSE FALSE
All NA NA NA NA NA NA
```
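The column-wise counterpart works the same way. A small sketch with `cbind()` (the matrix `B` and its column names are made up for illustration):

```r
B <- cbind(1:3, 4:6)                  # two columns of length 3
dim(B)                                # 3 2
colnames(B) <- c("first", "second")
B[, "second"]                         # 4 5 6
```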

An array is an extension of the vector to more than two dimensions.

```
# 2 by 4 by 2 array
A <- array(1:16,c(2,4,2))
A
```

```
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
, , 2
[,1] [,2] [,3] [,4]
[1,] 9 11 13 15
[2,] 10 12 14 16
```

Interchange the first two subscripts on a 3-way array A

```
At <- aperm(A, c(2,1,3))
At
```

```
, , 1
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8
, , 2
[,1] [,2]
[1,] 9 10
[2,] 11 12
[3,] 13 14
[4,] 15 16
```
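To see what `aperm()` did, compare the dimensions and a single entry before and after. A sketch using the same `A` and `At`:

```r
A <- array(1:16, c(2, 4, 2))
At <- aperm(A, c(2, 1, 3))
dim(A)                             # 2 4 2
dim(At)                            # 4 2 2
A[1, 3, 2] == At[3, 1, 2]          # TRUE: first two subscripts swapped
all(At[, , 1] == t(A[, , 1]))      # TRUE: each slice is transposed
```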

Values can be *nominal*, *ordinal*, or *continuous*. In R, nominal and ordinal values are represented by `factor()`.

```
houses <- c('Stark','Lannister','Tully','Arryn','Tyrells','Baratheon','Martell')
factor(houses)
```

```
[1] Stark Lannister Tully Arryn Tyrells Baratheon Martell
Levels: Arryn Baratheon Lannister Martell Stark Tully Tyrells
```

By default, factor levels are created in alphabetical order.

```
factor(houses,ordered=TRUE,levels=houses)
```

```
[1] Stark Lannister Tully Arryn Tyrells Baratheon Martell
7 Levels: Stark < Lannister < Tully < Arryn < Tyrells < ... < Martell
```
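An ordered factor stores integer codes that follow the level order, which makes comparisons meaningful. A sketch with hypothetical size labels (not part of the original example):

```r
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
levels(sizes)          # "small" "medium" "large"
as.integer(sizes)      # 1 3 2: position of each value among the levels
sizes[1] < sizes[2]    # TRUE: small < large
```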

`apply()` applies a function to sections of an array (or matrix) and returns the results in an array (or matrix).

```
apply(array, margin, function, ...)
```

The margin argument is used to specify which margin we want to apply the function to and which margin we wish to keep.

```
mat1 <- matrix(rep(seq(4), 4), ncol = 4)
mat1
```

```
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
[4,] 4 4 4 4
```

```
#row sums of mat1, margin is 1
apply(mat1, 1, sum)
```

```
[1] 4 8 12 16
```

```
#column sums of mat1, margin is 2
apply(mat1, 2, sum)
```

```
[1] 10 10 10 10
```

```
#using a user defined function
sum.plus.2 <- function(x){
sum(x) + 2
}
#using the sum.plus.2 function on the rows of mat1
apply(mat1, 1, sum.plus.2)
```

```
[1] 6 10 14 18
```
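For plain sums, base R also provides `rowSums()` and `colSums()`, and the function given to `apply()` can be written inline. A sketch on the same `mat1`:

```r
mat1 <- matrix(rep(seq(4), 4), ncol = 4)
rowSums(mat1)   # 4 8 12 16, same as apply(mat1, 1, sum)
colSums(mat1)   # 10 10 10 10, same as apply(mat1, 2, sum)
# range (max minus min) of each row, using an anonymous function
apply(mat1, 1, function(x) max(x) - min(x))   # 0 0 0 0
```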

A data frame is the data structure we will use most often in this class. A data frame is a list that contains multiple named vectors of the same length. Whereas we usually fill in a spreadsheet or database table row by row, data frames are constructed column by column.

```
# head() returns the first rows of the data frame "cars"
head(cars)
```

```
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
```

```
# faster summary measures
summary(cars)
```

```
speed dist
Min. : 4.0 Min. : 2
1st Qu.:12.0 1st Qu.: 26
Median :15.0 Median : 36
Mean :15.4 Mean : 43
3rd Qu.:19.0 3rd Qu.: 56
Max. :25.0 Max. :120
```
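Building a small data frame by hand shows the column-wise construction. The sketch below reuses the first three height and weight values from earlier; the name `df` is illustrative:

```r
# each argument becomes one named column
df <- data.frame(height = 58:60, weight = c(115, 117, 120))
dim(df)                   # 3 2
df$weight                 # one column as a vector: 115 117 120
df[df$weight > 116, ]     # rows where weight exceeds 116
```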

General Form

```
if (arglist satisfies) {
do this one
} else {
do this two
}
```

Create a function that tells you whether a variable is greater than 20 or not

```
my_cond <- function(x){
if (x > 20) {
print("x is greater than 20")
} else {
print("x is not greater than 20")
}
}
x <- 10
my_cond(x)
```

```
[1] "x is not greater than 20"
```
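`if` tests a single condition. To apply the same test to every element of a vector, base R offers the vectorized `ifelse()`. A sketch with arbitrary values:

```r
x <- c(10, 20, 30)
# ifelse(test, value_if_true, value_if_false) works element by element
labels <- ifelse(x > 20, "greater than 20", "not greater than 20")
labels
```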

R has three forms of loops.

The first is `repeat`, which repeats a particular expression until it hits a `break` keyword.

```
x <- 0
repeat{if (x>10) break
else {print(x); x <- x+1}
}
```

```
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
```

Within the outermost braces is an `if-else` expression: `if (x>10) break else {print(x); x <- x+1}`. The inner set of braces is part of the `else` clause: `print(x); x <- x+1`. The semicolon separates the clause into two parts. The first is a `print` statement, and the second increments `x` so that the condition that terminates the loop, `x>10`, is eventually satisfied.

```
x <- 0
while (x < 10) {print (x); x <- x + 1}
```

```
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
```

R `for` loops iterate through each item in a vector or a list:

```
x <- 0
for (x in 1: 10) print(x)
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
```

The colon creates a vector, passing each integer from 1 to 10 to the loop.
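The vector does not have to be numeric. A sketch that loops over a character vector, both by element and by index with `seq_along()` (the `houses` values are borrowed from the factor example; `total` is an illustrative name):

```r
houses <- c("Stark", "Lannister", "Tully")

# loop over the elements themselves
total <- 0
for (h in houses) total <- total + nchar(h)
total   # 19 characters in all

# loop over the positions 1, 2, 3
for (i in seq_along(houses)) print(paste(i, houses[i]))
```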

```
len <- 10
fibvals <- numeric(len) # creates a vector of 0's of length 10
fibvals
```

```
[1] 0 0 0 0 0 0 0 0 0 0
```

```
fibvals[1] <- 1
fibvals[2] <- 1
for (i in 3:len) {
fibvals[i] <- fibvals[i-1]+fibvals[i-2]
}
fibvals
```

```
[1] 1 1 2 3 5 8 13 21 34 55
```

- Create a function that returns a Fibonacci sequence of any length.
- Create a function that returns a sequence of odd numbers of any length.

- An R package is a set of related functions and help files, bundled together.
- It is similar to libraries in C or toolboxes in Matlab.
- Normally, all functions within a single package are related: for example, the `stats` package contains functions for statistical analysis.
- There are a few public repositories of packages: the largest is CRAN, hosted by the R Foundation, with more than 4000 packages and mirrored at many sites worldwide. Of course, you need an internet connection to download packages from it.
- To use a package, you first need to install it into R.
- If you're using the R console user interface, you can use the package installer from the menu.
- You can also install R packages directly through the R console using `install.packages()`.
- To load up an R package, use the `library()` function.
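A minimal sketch of the install-then-load workflow. The package name `MASS` is only an example chosen here, and the install line is left commented out because it needs an internet connection:

```r
pkg <- "MASS"   # example package name, chosen for illustration

# requireNamespace() reports whether the package is already installed
have_pkg <- requireNamespace(pkg, quietly = TRUE)
if (!have_pkg) {
  # install.packages(pkg)   # uncomment to download from CRAN
  message(pkg, " is not installed")
} else {
  library(pkg, character.only = TRUE)   # load it for this session
}

search()   # lists the packages currently attached
```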

There are many ways to create a scatterplot in R. The basic function is `plot(x, y)`, where x and y are numeric vectors denoting the `(x,y)` points to plot.

```
# Simple Scatterplot
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
```

```
# Simple Scatterplot
plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
# Add fit lines
abline(lm(mpg~wt), col="red") # regression line (y~x)
lines(lowess(wt,mpg), col="blue") # lowess line (x,y)
```
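The same plot can be drawn without `attach()` by using the formula interface with a `data=` argument, which keeps the variable lookup explicit. This is offered as an alternative sketch, not as the approach used in the slides; `fit` is an illustrative name:

```r
# fit the regression once and reuse it
fit <- lm(mpg ~ wt, data = mtcars)

# formula interface: y ~ x, with columns looked up in mtcars
plot(mpg ~ wt, data = mtcars, main = "Scatterplot Example",
     xlab = "Car Weight", ylab = "Miles Per Gallon", pch = 19)
abline(fit, col = "red")   # regression line

coef(fit)   # intercept and slope of the fitted line
```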

```
names(mtcars)
```

```
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
```

```
# Consider only the variables mpg, disp, drat, and wt
pairs(~mpg+disp+drat+wt,data=mtcars,
main="Simple Scatterplot Matrix")
```

Boxplots can be created for individual variables or for variables by group. The format is `boxplot(x, data=)`, where `x` is a formula and `data=` denotes the data frame providing the data.

```
# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl,data=mtcars, main="Car Mileage Data",
xlab="Number of Cylinders", ylab="Miles Per Gallon")
```
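The centre line of each box is the group median, which can also be computed numerically. A sketch using `tapply()` to apply `median` within each cylinder group (`med` is an illustrative name):

```r
# median mpg within each level of cyl
med <- tapply(mtcars$mpg, mtcars$cyl, median)
med
names(med)   # "4" "6" "8"
```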

```
# Dotplot: Grouped Sorted and Colored
# Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # it must be a factor
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg,labels=row.names(x),cex=.7,groups= x$cyl,
main="Gas Mileage for Car Models\ngrouped by cylinder",
xlab="Miles Per Gallon", gcolor="black", color=x$color)
```