
MAN 555 MULTIVARIATE STATISTICS HANDOUT 1

A SHORT MANUAL FOR R


1. INTRODUCTION
Download R from the R-Project website: www.r-project.org
For R manuals, go to: R->Help->Manuals (in PDF) -> An Introduction to R
-> Data Import/Export
To get help on any R function, type ? followed by the name of the function.
Ex: ?mean
To search for a topic, type: help.search("normal distribution")
2. BASIC OPERATIONS
You can use R as a calculator.
> # Note that R skips a line starting with "#"
> 2+3
[1] 5
> 3^2
[1] 9
> sin(pi/6)
[1] 0.5
> a<-5
> b<-3
> a*b
[1] 15
> exp(1)
[1] 2.718282
> sqrt(4)
[1] 2

3. VECTORS AND MATRICES


A simple way to define a vector:
> a<-c(3,8,12,23,54)
> a
[1] 3 8 12 23 54
> a[1]
[1] 3
> a[5]
[1] 54
> length(a)
[1] 5
> 2*a
[1] 6 16 24 46 108
> a^2
[1] 9 64 144 529 2916
> a-3
[1] 0 5 9 20 51
> a/2
[1] 1.5 4.0 6.0 11.5 27.0
> sum(a)
[1] 100

(Just Luck!)

Here is how we define a matrix: give all the entries, then the number of rows and the number
of columns. Note how R assigns the numbers to the cells: it fills the matrix column by column.
> b=matrix(c(1,2,3,4,5,6),3,2)
> b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> b[2,1]
[1] 2
> b[3,2]
[1] 6
> t(b)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> 2*b
     [,1] [,2]
[1,]    2    8
[2,]    4   10
[3,]    6   12
> sum(b)
[1] 21
You can slice the rows or columns of matrix b.
> b[,1]
[1] 1 2 3
> b[3,]
[1] 3 6
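The fill order can be changed. As a small sketch (the names by_col and by_row are illustrative and do not appear elsewhere in this handout), the byrow argument of matrix() fills the cells row by row instead of column by column:

```r
# Default: entries fill column by column
by_col <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
# byrow = TRUE: entries fill row by row
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
print(by_col)
print(by_row)
# dim() returns the number of rows and columns
print(dim(by_row))
```

Both matrices are 3 by 2; only the position of the entries differs.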
You can generate regular sequences with R.
> a=1:10
> a
 [1]  1  2  3  4  5  6  7  8  9 10
> b<-seq(1,10,by=1)
> b
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(-2,2,by=0.2)->c
> c
 [1] -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8
[16]  1.0  1.2  1.4  1.6  1.8  2.0
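Sometimes the number of points is known rather than the step size. The following sketch (grid and ones are illustrative names) uses the length.out argument of seq() for that case, and rep() to repeat a value:

```r
# Five equally spaced points from 0 to 1; R works out the step size
grid <- seq(0, 1, length.out = 5)
print(grid)
# rep() repeats a value (or a vector) a given number of times
ones <- rep(1, times = 4)
print(ones)
```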

4. LOGICAL STATEMENTS AND FOR LOOPS


The logical statements work like:
if (something is true) {do this}
Ex:
> a=5
> if (2<3) {a=a^2}
> a
[1] 25
> a=5
> if (2>3) {a=a^2}
> a
[1] 5
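An if statement can also carry an else branch for the case where the condition is false. A minimal sketch with made-up values (value and label are illustrative names); note that else has to sit on the same line as the closing brace of the if block:

```r
# Hypothetical example: label a number by its sign
value <- -4
if (value >= 0) {
  label <- "non-negative"
} else {
  label <- "negative"
}
print(label)
```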
For loops are used for repetitive executions. They work like:
for (these values of an index) {do this}
Ex:
> a=1:5
> a
[1] 1 2 3 4 5
> sum=0
> for (i in 1:5) {sum=sum+a[i]}
> sum
[1] 15
Note: Normally we just use sum(a) for such a summation.
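As a further sketch of the same idea, the loop below stores the running total at every step and compares it with the built-in cumsum(). The names running and total are illustrative; using total rather than sum as a variable name also avoids hiding the built-in function behind a variable of the same name.

```r
a <- 1:5
# Loop version: store the running total at each step
running <- numeric(length(a))
total <- 0
for (i in seq_along(a)) {
  total <- total + a[i]
  running[i] <- total
}
print(running)
# Vectorized equivalent
print(cumsum(a))
```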

5. BASIC PLOTS
> x=c(1,2,3,4,5)
> y=c(2,3,3.3,4.4,6)
> plot(x,y)

> x=seq(-6,6,by=0.1)
> y=x^2
> plot(x,y)

> x=seq(-2*pi,2*pi,by=0.05)
> y=sin(x)
> plot(x,y,type="l")
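Plots usually need a title and axis labels. The sketch below repeats the sine plot with the main, xlab and ylab arguments of plot() and overlays a second curve with lines(); the title text and the choice of a cosine overlay are only illustrations:

```r
x <- seq(-2 * pi, 2 * pi, by = 0.05)
y <- sin(x)
# type = "l" draws a line; main, xlab and ylab set the title and axis labels
plot(x, y, type = "l", main = "Sine curve", xlab = "x", ylab = "sin(x)")
# Overlay a second curve (dashed) on the existing plot
lines(x, cos(x), lty = 2)
```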

6. DESCRIPTIVE STATISTICS
> x=c(5,4,2,1,3)
> mean(x)
[1] 3
> min(x)
[1] 1
> max(x)
[1] 5
> sort(x)
[1] 1 2 3 4 5
> median(x)
[1] 3
> sd(x)
[1] 1.581139
> var(x)
[1] 2.5
Standardize the data points
> sx=(x-mean(x))/sd(x)
> sx
[1] 1.2649111 0.6324555 -0.6324555 -1.2649111 0.0000000
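The same standardization is available through the built-in scale() function. A short sketch (sx2 is an illustrative name) that also confirms the standardized values have mean 0 and standard deviation 1:

```r
x <- c(5, 4, 2, 1, 3)
sx <- (x - mean(x)) / sd(x)
# scale() returns a one-column matrix; as.vector() drops the matrix structure
sx2 <- as.vector(scale(x))
print(sx2)
# Standardized data have mean 0 and standard deviation 1
print(mean(sx2))
print(sd(sx2))
```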
Simple Linear Regression
> x=c(1,2,3,4,5)
> y=c(2,3,3.3,4.4,6)
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       0.92         0.94
> plot(x,y)
> abline(lm(y~x))
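It is often convenient to store the fitted model in an object and query it. The sketch below (fit and new_point are illustrative names, and x = 6 is an arbitrary new point) extracts the coefficients, predicts y at a new x, and prints the full regression summary:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 3.3, 4.4, 6)
fit <- lm(y ~ x)
# Named vector with the intercept and the slope
print(coef(fit))
# Predicted value of y at x = 6, i.e. intercept + slope * 6
new_point <- data.frame(x = 6)
print(predict(fit, newdata = new_point))
# Standard errors, t-tests and R-squared
print(summary(fit))
```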

7. RANDOM NUMBER GENERATION


You can generate random numbers from well-known distributions using R.
Let us generate normal observations. The function we need is rnorm. The arguments of
this function are rnorm(sample size, mean, standard deviation). Type ?rnorm for help,
and also familiarize yourself with the help files of R. Try ?mean, ?seq, ?plot, etc.
> x=rnorm(100,2,4)
> x
[1] 7.00920380 0.89544292 1.11494166 2.23874095 2.34608868 4.63683431
[7] -5.74015100 10.04406693 6.00339253 -0.49875786 6.72831867 6.43631302
[13] -2.89722723 -3.24025063 4.08147886 0.83126034 5.94573466 5.47021533
[19] 1.00893045 -5.23225438 5.24711541 -1.20403596 0.87351557 -6.63826023
[25] 9.93123589 3.24957367 4.30184227 -0.85836402 -0.17998849 3.55573809
[31] -1.77535333 0.95957929 4.00390875 -1.75561633 -2.64425412 -1.97298189
[37] -2.96378153 -4.79864973 -0.41589372 -2.44137402 0.80500439 10.46648769
[43] 0.22298239 0.99368460 6.14366443 -0.98620661 4.05836224 -4.50692541
[49] 2.86500150 -4.05145320 4.33389987 6.94099516 -1.23494220 1.85258991
[55] 2.74636843 5.57507110 -2.12132808 -0.47312972 6.83258367 10.67801454
[61] 6.32845876 1.34148309 -1.26282137 2.13439048 6.96459583 -2.16817618
[67] 2.42898597 6.06420545 0.86332410 -8.12640186 1.68979043 5.51642595
[73] 1.13842998 4.73662907 2.66721494 0.94106871 1.66568516 3.94952422
[79] -3.91050838 -1.61665526 2.41700500 3.77226199 0.57492320 4.34364971
[85] 6.58262410 -0.87071441 -0.43844571 3.53671501 3.47224236 0.66874516
[91] 1.04185379 4.81033118 2.66605092 1.05299818 -1.28912612 -7.56619217
[97] 5.30682776 -0.02236963 4.93891849 5.73489180

> hist(x)

> mean(x)
[1] 1.748758
> sd(x)
[1] 3.938521
> boxplot(x)

A Q-Q plot to check for normality:
> qqnorm(x)

Shapiro-Wilk test for normality (null hypothesis: the data come from a normal
distribution).
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.9901, p-value = 0.6733

Here the p-value is large, so there is no evidence against the null hypothesis that the data
come from a normal distribution.
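Because the numbers are random, a new call to rnorm gives a different sample, and so a different mean, standard deviation and p-value, every time. A minimal sketch of how to make a simulation repeatable with set.seed(); the seed 123 is arbitrary and x1, x2 are illustrative names:

```r
# Fix the state of the random number generator; 123 is an arbitrary seed
set.seed(123)
x1 <- rnorm(100, mean = 2, sd = 4)
# Resetting the same seed reproduces the same sample
set.seed(123)
x2 <- rnorm(100, mean = 2, sd = 4)
print(identical(x1, x2))
```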
Now let us generate numbers from Beta distribution.
> y=rbeta(100,1,4)
> hist(y)

> boxplot(y)

> qqnorm(y)

> shapiro.test(y)
Shapiro-Wilk normality test
data: y
W = 0.8616, p-value = 3.249e-08
Here the p-value is very small, so the null hypothesis of normality is rejected.
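The p-value can also be pulled out of the test result and compared with a significance level in code. This is only a sketch: the seed and the level alpha = 0.05 are assumptions made for illustration, and the p-values will not match the ones printed above because the samples are different.

```r
set.seed(1)  # arbitrary seed, only so that the run is repeatable
x <- rnorm(100, 2, 4)
y <- rbeta(100, 1, 4)
alpha <- 0.05  # assumed significance level
# shapiro.test() returns a list; the p-value is stored in $p.value
p_x <- shapiro.test(x)$p.value
p_y <- shapiro.test(y)$p.value
print(c(normal = p_x, beta = p_y))
# TRUE means normality is rejected at level alpha
print(c(normal = p_x < alpha, beta = p_y < alpha))
```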

8. MATRIX ALGEBRA
> a=matrix(c(1,2,3,4),2,2)
> a
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> b=matrix(c(10,20,30,40),2,2)
> b
     [,1] [,2]
[1,]   10   30
[2,]   20   40
> a+b
     [,1] [,2]
[1,]   11   33
[2,]   22   44
> a-b
     [,1] [,2]
[1,]   -9  -27
[2,]  -18  -36
Transpose:
> t(a)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
Element by element multiplication:
> a*b
     [,1] [,2]
[1,]   10   90
[2,]   40  160
Matrix multiplication:
> a%*%b
     [,1] [,2]
[1,]   70  150
[2,]  100  220
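Unlike element by element multiplication, matrix multiplication depends on the order of the factors. A brief sketch using a and its transpose (left and right are illustrative names), together with the identity matrix produced by diag():

```r
a <- matrix(c(1, 2, 3, 4), 2, 2)
# Order matters: these two products differ
left  <- a %*% t(a)
right <- t(a) %*% a
print(left)
print(right)
# diag(2) is the 2 x 2 identity matrix, so this product returns a
print(a %*% diag(2))
```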
Matrix Inverse
> a=matrix(c(1,1,-1,-4,1,1,1,-2,1),3,3)
> a
     [,1] [,2] [,3]
[1,]    1   -4    1
[2,]    1    1   -2
[3,]   -1    1    1
> ainv=solve(a)
> ainv
     [,1] [,2] [,3]
[1,]    3    5    7
[2,]    1    2    3
[3,]    2    3    5
Let us check that a%*%ainv gives the identity matrix:
> a%*%ainv
             [,1]         [,2] [,3]
[1,] 1.000000e+00 0.000000e+00    0
[2,] 0.000000e+00 1.000000e+00    0
[3,] 4.440892e-16 8.881784e-16    1

Good enough! (The tiny entries such as 4.440892e-16 are rounding error and are effectively zero.)

Eigenvalues and eigenvectors
> a=matrix(c(3,2,1,-11,-8,-3,16,8,2),3,3)
> a
     [,1] [,2] [,3]
[1,]    3  -11   16
[2,]    2   -8    8
[3,]    1   -3    2
> eigen(a)
$values
[1] -3  2 -2

$vectors
           [,1]      [,2]       [,3]
[1,] -0.4082483 0.9370426 -0.5773503
[2,] -0.8164966 0.3123475  0.5773503
[3,] -0.4082483 0.1561738  0.5773503
Now let us check that a%*%v1 = e1*v1, where e1 = -3 is the first eigenvalue and v1 is the
corresponding eigenvector. First we must extract the first column of the above matrix.
> L=eigen(a)
> L$vectors
           [,1]      [,2]       [,3]
[1,] -0.4082483 0.9370426 -0.5773503
[2,] -0.8164966 0.3123475  0.5773503
[3,] -0.4082483 0.1561738  0.5773503
> v1=L$vectors[,1]
> v1
[1] -0.4082483 -0.8164966 -0.4082483
> a%*%v1
         [,1]
[1,] 1.224745
[2,] 2.449490
[3,] 1.224745
> -3*v1
[1] 1.224745 2.449490 1.224745

OK!

Singular Value Decomposition
> a
     [,1] [,2] [,3]
[1,]    3  -11   16
[2,]    2   -8    8
[3,]    1   -3    2
> svd(a)
$d
[1] 22.9607876  2.1782253  0.2399339

$u
           [,1]       [,2]       [,3]
[1,] -0.8544093  0.4892724  0.1749207
[2,] -0.4963077 -0.6688083 -0.5535107
[3,] -0.1538291 -0.5597392  0.8142657

$v
           [,1]       [,2]        [,3]
[1,] -0.1615656 -0.1971966  0.96695919
[2,]  0.6023509  0.7564359  0.25490812
[3,] -0.7817096  0.6236331 -0.00343243
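The three pieces returned by svd() can be multiplied back together to recover a. The sketch below (s and rebuilt are illustrative names) does this and compares the result with a using all.equal(), which allows for a small numerical tolerance:

```r
a <- matrix(c(3, 2, 1, -11, -8, -3, 16, 8, 2), 3, 3)
s <- svd(a)
# Rebuild a from its factors: u %*% diag(d) %*% t(v)
rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
print(round(rebuilt, 6))
# all.equal() compares up to a small numerical tolerance
print(all.equal(a, rebuilt))
```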
Matrix Determinant
> a
     [,1] [,2] [,3]
[1,]    3  -11   16
[2,]    2   -8    8
[3,]    1   -3    2
> det(a)
[1] 12
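The determinant ties in with the earlier results: it equals the product of the eigenvalues, (-3)(2)(-2) = 12, and its absolute value equals the product of the singular values. A short sketch that checks this numerically (ev is an illustrative name; expect small rounding differences in the printed values):

```r
a <- matrix(c(3, 2, 1, -11, -8, -3, 16, 8, 2), 3, 3)
# The determinant equals the product of the eigenvalues
ev <- eigen(a)$values
print(prod(ev))
# Its absolute value equals the product of the singular values
print(prod(svd(a)$d))
# A nonzero determinant means a is invertible, so solve(a) works
print(det(a) != 0)
```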
9. OPENING AND SAVING AN R SCRIPT
Typically we do not work directly in the R console. Instead, we write the program in a script,
save it, and then run it.
First, let us clear the workspace by going to R->Misc->Remove all objects.
Alternatively, you can type:
> rm(list=ls(all=TRUE))
Open a new script by going to R->File->New Script
Type your R code:

Save your work by going to R->File->Save

You can create a folder (For Example C:\Program Files\R\my codes\MAN656) for your
MAN 656 files:

Now run your code by going to R->Edit->Run all (the script window must be active to see this
menu). Here is the result:
