
Introduction to R:

• Why do we use R for statistical computing and graphics?
• Which companies are using R?
• Applications of R in the real world

R reserved words

R data types and constants

R data types:

• Logical, numeric, integer, complex, character and raw

R constants:

• Numeric
• Character
• Built-in

R data structures

• Vectors
• Lists
• Matrices
• Arrays
• Factors
  1. How to create factors?
  2. How to access components of a factor?
• Data frames
  1. How to create a data frame in R?
  2. How to access components of a data frame?

Row bind rbind() and column bind cbind() / Installing packages

R flow control (loops and if-then-else) / Importing data sets / How to read CSV files?

R charts and graphs

• Pie chart
• Bar chart
• Box plot
• Histogram
• Histogram with added parameters
• Line graphs
• Scatter plots
• Strip charts

R statistical functions

• Basic statistical functions: mean (average), median, mode, min, max
• Correlation, linear regression and multiple linear regression functions
• ANOVA functions

Mode of teaching: Lab sessions

Introduction to R programming:

• R is a programming language and environment commonly used in statistical computing, data analytics and scientific research.

• It is one of the most popular languages used by statisticians, data analysts, researchers and marketers to retrieve, clean, analyze, visualize and present data.

• Its expressive syntax and easy-to-use interface have made it increasingly popular in recent years.

Why do we use R for statistical computing and graphics?

• R is open source and free

• R is popular, and its popularity is still growing

• R runs on all major platforms (Windows, macOS and Linux)

• Knowing R can improve your job prospects in data-related roles

• R is used by the biggest tech companies

History of R

• Ross Ihaka and Robert Gentleman developed R at the University of Auckland. R is an implementation of the S programming language, which John Chambers and colleagues developed at Bell Laboratories, combined with lexical scoping semantics inspired by Scheme.
• R was named partly after the first names of its two authors and partly as a play on the name of S. The project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.

Which companies use R, and for what purpose?
Applications of R in the real world:

• Data science: Programming languages like R give data scientists the tools to collect data in real time, perform statistical and predictive analysis, create visualizations and communicate actionable results to stakeholders.

• Statistical computing: R has a rich package repository with more than 9100 packages, covering almost every statistical function you can imagine.

• Machine learning: Everyone from machine learning enthusiasts to researchers uses R to implement machine learning algorithms in fields like finance, genetics research, retail, marketing and health care.

Alternatives to R:

• SAS

• SPSS

• Python

Features of R

• R supports procedural programming with functions and object-oriented programming with generic functions. Procedural programming is built from procedures, records, modules and procedure calls, while object-oriented programming is built from classes, objects and functions.
• Packages are part of R programming. They collect sets of R functions into a single unit.
• R provides facilities for database input, exporting data, viewing data, variable labels, missing data, etc.
• R is an interpreted language, so we can work with it through a command-line interpreter.
• R supports matrix arithmetic.
• It has effective data handling and storage facilities.
• R supports a large pool of operators for performing operations on arrays and matrices.
• It has facilities to present the results of an analysis as graphs, either on screen or on hard copy.
We can obtain the installation files for R from the official R website (www.r-project.org). The website also hosts general documentation related to R, along with libraries of routines. We can simply download and install R from there.

Installing R on Windows

• Go to the official R website (www.r-project.org)

• Click on the CRAN link on the left sidebar

• Select a mirror

• Click “Download R for Windows”

• Click on the link that downloads the base distribution

• Run the file and follow the steps in the instructions to install R.

RStudio GUI

a. Features of RStudio
• Code highlighting that gives different colors to keywords and variables, making code easier to read
• Automatic bracket matching
• Code completion, which reduces the effort of typing commands in full
• Easy access to R Help, with additional features for exploring functions and their parameters
• Easy exploration of variables and values
RStudio is available free of charge for Linux, Windows, and Mac devices, which makes it a good option to use with R. To open RStudio, click the RStudio icon in the menu system or on the desktop.
b. Components of RStudio
• Source – The top left corner of the screen contains a text editor that lets the user work with source script files. Multiple lines of code can be entered here. Users can save R script files to disk and perform other tasks on the script.
• Console – The bottom left corner is the R console window. The console in RStudio is identical to the console in RGui. All the interactive work of R programming is performed in this window.
• Workspace and History – The top right corner is the R workspace and history window. It provides an overview of the workspace, where the variables created in the session can be inspected along with their values. This is also the area where the user can see a history of the commands issued in R.
• Files, Plots, Packages, and Help – The bottom right corner gives access to the following tools:

• Files – This is where the user can browse folders and files on a computer.
• Plots – This is where R displays the user's plots.
• Packages – This is where the user can view a list of all the installed packages.
• Help – This is where the user can browse the built-in Help system of R.

R reserved words
Comparison of R with other technologies:

• Data handling capabilities – R has good data handling capabilities and options for parallel computation.
• Availability / cost – R is open source and can be used anywhere free of charge.
• Advancement in tool – If you work with the latest techniques, R tends to get new features quickly.
• Ease of learning – R has a steep learning curve. Although it is a high-level language, simple procedures can sometimes take long code.
• Job scenario – R is a good option for start-ups and companies looking for cost efficiency.
• Graphical capabilities – R provides advanced graphical capabilities.
• Customer service support and community – R has a large and growing online community that provides support.

R code and explanation

Vectors:

A vector must have elements of the same type. When the elements passed to the c() function differ in type, the function coerces them to a common type.
Coercion goes from lower to higher types: from logical to integer to double to character.
Example 1:
Code:
x <- c(1, 5, 4, 9, 0)
typeof(x)   # "double"
length(x)   # 5

Example 2:
Code:
x <- c(1, 5.4, TRUE, "hello")
x           # every element is coerced to character
typeof(x)   # "character"
If we want to create a vector of consecutive numbers, the : operator is very helpful.
Code:
x <- 1:7; x
y <- 2:-2; y

Creating a vector using seq() function


Code:
seq(1, 3, by=0.2) # specify step size

seq(1, 5, length.out=4) # specify length of the vector

Using integer vector as index

• Vector indices in R start from 1, unlike most programming languages, where indices start from 0.
• We can use a vector of integers as an index to access specific elements.
• We can also use negative integers to return all elements except those specified.
• We cannot mix positive and negative integers while indexing. Real numbers, if used, are truncated to integers.
Code:

x <- c(0, 2, 4, 6, 8, 10)
x

[1] 0 2 4 6 8 10

x[3] # access 3rd element

[1] 4

x[c(2, 4)] # access 2nd and 4th element

[1] 2 6

x[-1] # access all but 1st element

[1] 2 4 6 8 10

x[c(2, -4)] # cannot mix positive and negative integers

Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts

x[c(2.4, 3.54)] # real numbers are truncated to integers

[1] 2 4

Using logical vector as index

• When we use a logical vector for indexing, the elements at the positions where the logical vector is TRUE are returned.
• This useful feature helps us filter a vector, as shown below.

x <- c(-3, -1, 0, 3) # example vector consistent with the results shown

x[c(TRUE, FALSE, FALSE, TRUE)]

[1] -3 3

x[x < 0] # filtering vectors based on conditions

[1] -3 -1

x[x > 0]

[1] 3

Using character vector as index

• This type of indexing is useful when dealing with named vectors. We can name each element of a vector.

x <- c("first"=3, "second"=0, "third"=9)

names(x)

[1] "first" "second" "third"

x["second"]

second
     0

x[c("first", "third")]

first third
    3     9

How to modify a vector in R?

• We can modify a vector using the assignment operator.
• We can use the techniques discussed above to access specific elements and modify them.
• If we want to truncate the vector, we can reassign it to a subset of itself.
x

[1] -3 -2 -1 0 1 2

x[2] <- 0; x # modify 2nd element

[1] -3 0 -1 0 1 2

x[x<0] <- 5; x # modify elements less than 0

[1] 5 0 5 0 1 2

x <- x[1:4]; x # truncate x to first 4 elements

[1] 5 0 5 0

How to delete a Vector?

We can delete a vector by simply assigning NULL to it.

x

[1] -3 -2 -1 0 1 2

x <- NULL

x

NULL

x[4]

NULL

Matrix:

• A matrix is a two-dimensional data structure in R.
• A matrix is similar to a vector but additionally contains the dimension attribute.
• All attributes of an object can be checked with the attributes() function (the dimensions can be checked directly with the dim() function).
• We can check whether a variable is a matrix with the class() function.
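
The checks listed above can be tried on a small matrix. This is only a sketch: the matrix m is a hypothetical example and is not used elsewhere in these notes.

```r
# Hypothetical 2 x 3 matrix, used only for illustration.
m <- matrix(1:6, nrow = 2, ncol = 3)

class(m)        # "matrix" "array" in R 4.0.0 and later, "matrix" in older versions
attributes(m)   # a list whose only element here is $dim
dim(m)          # number of rows, then number of columns
is.matrix(m)    # TRUE, because m has a dim attribute of length 2
```

Note that class() may return more than one value, so is.matrix() is the safer test inside an if() condition.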

R code for practice:

* Character constants

'example'

typeof("5")   # "character", because the value is quoted

* Numeric constants

Types of operators

Arithmetic operators

+     Adds two vectors
-     Subtracts the second vector from the first
*     Multiplies both vectors
/     Divides the first vector by the second
%%    Gives the remainder of dividing the first vector by the second
%/%   Gives the quotient of dividing the first vector by the second (integer division)
^     Raises the first vector to the exponent of the second vector

u <- c(2,3,4)

v <- c(9,8,7)

print (u+v)

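b <- c(1,2,3)

d <- c(9,8,7)   # named d rather than c, to avoid reusing the name of the c() function

print(b-d)

print (u-v)

print(v-u)

g <- c(1,2)

h <- c(2,3,4)
print(g*h)   # g is recycled to length 3; R warns because 3 is not a multiple of 2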

g <- c(1,2,3)

h <- c(3,5,6)

print (g*h)

print (g%%h)    # remainder

print(g %/% h)  # integer quotient

print (g^h)     # exponentiation

Built in constants

LETTERS

letters

pi

month.name

month.abb

Vector:

Basic statistical operations (code)


mean(c(0, 5, 1, -10, 6))

median(c(0, 5, 1, -10, 6))

var(c(0, 5, 1, -10, 6))

length(c(1, 5, 6, -2))

quantile(c(5,6,7))

sd(c(5,6,7,8))

max(c(5,6,7,8))

min(c(5,6,7,8))

sqrt(c(2, 4))
Mode function:

R has no built-in function for the statistical mode (the base function mode() returns the storage mode of an object), so we define one.

getmode <- function(v) {

uniqv <- unique(v)

uniqv[which.max(tabulate(match(v, uniqv)))]

}

v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

result <- getmode(v)

print(result)

# Create the vector with characters.

charv <- c("o","it","the","it","it")

# Calculate the mode using the user function.
result <- getmode(charv)
print(result)
Vector:

• A vector is a basic data structure in R. It contains elements of the same type. The data types can be logical, integer, double, character, complex or raw.
• A vector's type can be checked with the typeof() function.
• Another important property of a vector is its length. This is the number of elements in the vector and can be checked with the length() function.

apple <- c('red','green',"yellow")

print(apple)

# Get the class of the vector.

print(class(apple))

Bschools <- c('MMS','PGPM','PGDM')

Bschools

print(class(Bschools))

List:

• A list is a data structure having components of mixed data types.
• A vector having all elements of the same type is called an atomic vector, while a vector whose elements may have different types is called a list.
• We can check whether an object is a list with the typeof() function and find its length using length().
Here is an example of a list having three components, each of a different data type.

list1 <- list(c(2,5,3),21.3,sin)

# Print the list.

print(list1)

list2 <- list('MMS students',21.5,c(3,7,8,9))   # named list2 so that the list() function is not masked

list2
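
The typeof() and length() checks mentioned above, together with the usual ways of accessing list components, can be sketched as follows. The list student and its component names are hypothetical and chosen only for illustration.

```r
# Hypothetical list with named components, for illustration only.
student <- list(name = "Asha", marks = c(72, 85, 91), passed = TRUE)

typeof(student)      # "list"
length(student)      # 3 components

student$marks        # the numeric vector stored under "marks"
student[["name"]]    # double brackets return the component itself
student["name"]      # single brackets return a list of length 1
```

Double brackets (or $) are the usual choice when the component itself is needed, while single brackets are useful for taking a sub-list.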

Matrix:

Create a matrix
A matrix can be created using the matrix() function.
The dimensions of the matrix can be defined by passing appropriate values for the arguments nrow and ncol.

M = matrix( c('k','a','v','i','t','a'), nrow = 2, ncol = 3, byrow = TRUE)

print(M)

matrix(1:9, nrow = 3, ncol = 3)

matrix(1:9, nrow = 3)

matrix(1:9, nrow=3, byrow=TRUE) # fill matrix row-wise

x <- matrix(1:9, nrow = 3, dimnames = list(c("India","USA","UK"), c("C1","C2","C3")))

Changing and accessing column names and row names

colnames(x)

[1] "C1" "C2" "C3"

rownames(x)

[1] "India" "USA" "UK"

# It is also possible to change names

colnames(x) <- c("C1","C2","C3")

rownames(x) <- c("R1","R2","R3")

Column bind and row bind


cbind(c(1,2,3),c(4,5,6))

rbind(c(1,2,3),c(4,5,6))

cbind(c('t','e','a','c','h'),c(1,2,3,4,5))   # the numbers are coerced to character

rbind(c('t','e','a','c','h'),c(1,2,3,4,5))

How to modify a matrix and access its elements?

x[2,2] <- 10; x # modify a single element

x[x<5] <- 0; x # modify elements less than 5

x[-1,] # select all rows except the first

x[c(1,2),c(2,3)] # select rows 1 & 2 and columns 2 & 3

x[c(3,2),] # leaving the column field blank selects all columns of rows 3 and 2

x[,] # leaving the row as well as the column field blank selects the entire matrix


Factors

A factor is a data structure used for fields that take only a predefined, finite number of values (categorical data).
For example, a data field such as marital status may contain only values from single, married, separated, divorced, or widowed. In such a case, we know the possible values beforehand, and these predefined, distinct values are called levels. The following is an example of a factor in R.

seeds_rice <- c('IR 20','Basmati','IR 60','Kolam','kolam nasik','IR idli rice','IR 20','Basmati','wada kolam')

seeds_rice

factor_seeds <- factor(seeds_rice)

print(factor_seeds)

print(nlevels(factor_seeds))
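
Components of a factor are accessed in much the same way as elements of a vector. The sketch below uses a hypothetical factor named status, built from the marital-status values mentioned in the text; it is not part of the original exercise.

```r
# Hypothetical marital-status data, echoing the example in the text.
status <- factor(c("single", "married", "married", "single", "widowed"))

levels(status)       # the distinct categories, sorted alphabetically by default
status[2]            # one element; the full set of levels is kept
status[c(1, 4)]      # several elements by position
status[-1]           # everything except the first element
table(status)        # number of observations per level
as.integer(status)   # the integer codes stored behind the labels
```

Subsetting keeps all levels even when some no longer occur; droplevels() can be used to remove the unused ones.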

Data frames

BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)

print(BMI)

Temp <- data.frame(
Min = c(23,12,13,5),
Max = c(23,45,45,65)
)

print(Temp)

x <- data.frame("SN" = 1:2, "Age" = c(47,75), "Name" = c("kavita","ramalingam"))

str(x) # structure of x

x["Name"]

x$Name

x[["Name"]]

x[[3]]

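Combining two data frames

# smartbind() comes from the gtools package; install it first with install.packages("gtools")
library(gtools)

df1 = data.frame(a = c(1:5), b = c(6:10))

df2 = data.frame(a = c(11:15), b = c(16:20), c = LETTERS[1:5])

smartbind(df1,df2)   # columns missing from one data frame are filled with NA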

emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Ricky","Danish","Mini","Ryan","Gary"),
salary = c(643.3,515.2,671.0,729.0,943.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
stringsAsFactors = FALSE
)

print(emp.data)

Get the structure of the data frame

The structure of a data frame can be seen by using the str() function.

str(emp.data)

extract specific columns

result <- data.frame(emp.data$emp_name,emp.data$salary)

print(result)

extract first two rows

result <- emp.data[1:2,]

print(result)

3rd and 5th row and 2nd and 4th column

result <- emp.data[c(3,5),c(2,4)]

print(result)

add the dept column

emp.data$dept <- c("IT","Operations","IT","HR","Finance")

v <- emp.data

print(v)

Create a second data frame

emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE
)

# bind the rows of the two data frames (both must have the same columns)
emp.finaldata <- rbind(emp.data,emp.newdata)

print(emp.finaldata)

Using Loops

Exercise 1:

How to print a multiplication table

num = as.integer(readline(prompt = "Enter a number: "))

# use a for loop to iterate 10 times
for(i in 1:10) {
print(paste(num,'x', i, '=', num*i))
}

Exercise 2:

How to print an addition table

num = as.integer(readline(prompt = "Enter a number: "))

for(i in 1:10) {
print(paste(num,'+', i, '=', num +i))
}

Exercise 3:

To check whether the given number is even or odd

num = as.integer(readline(prompt="Enter a number: "))

if((num %% 2) == 0) {
print(paste(num,"is Even"))
} else {
print(paste(num,"is Odd"))
}
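
The exercises above read their input with readline(), which only waits for the user in an interactive session. A minimal sketch of the same flow control without user input is given below; the function name times_table and its upto argument are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: builds the table lines for a given number
# instead of reading the number with readline().
times_table <- function(num, upto = 10) {
  lines <- character(upto)
  for (i in seq_len(upto)) {
    lines[i] <- paste(num, "x", i, "=", num * i)
  }
  lines
}

print(times_table(7))

# The even/odd check from Exercise 3, applied to 1, 2, 3 and 4 with a while loop.
n <- 1
while (n <= 4) {
  if (n %% 2 == 0) {
    print(paste(n, "is Even"))
  } else {
    print(paste(n, "is Odd"))
  }
  n <- n + 1
}
```

Wrapping the loop in a function makes it easy to reuse the table for any number, and the while loop shows the second common loop form in R.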

Charts and their types:

max.temp <- c(22, 27, 26, 24, 23, 26, 28)

barplot(max.temp)

bar chart with added parameters:

barplot(max.temp,

main = "Maximum Temperatures in a Week",


xlab = "Degree Celsius",

ylab = "Day",

names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),

col = "darkred",

horiz = TRUE)

Plotting categorical data:

age <- c(17,18,18,17,18,19,18,16,18,18)

table(age)

barplot(table(age),
main="Age Count of 10 Students",
xlab="Age",
ylab="Count",
border="red",
col="blue",
density=10
)

Histogram

Built-in data set

str(airquality) # str() shows the structure of the data set

Temperature <- airquality$Temp

hist(Temperature)

Histogram with added parameters

hist(Temperature,
main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="darkmagenta",
freq=FALSE
)

Return value of hist()

h <- hist(Temperature)

Using the return value to add labels with text()

h <- hist(Temperature,ylim=c(0,40))

text(h$mids,h$counts,labels=h$counts, adj=c(0.5, -0.5))

histogram using different breaks

hist(Temperature, breaks=4, main="With breaks=4")

hist(Temperature, breaks=20, main="With breaks=20")

Histogram with non-uniform width:

hist(Temperature,
main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="chocolate",
border="brown",
breaks=c(55,60,70,75,80,100)
)

Box plot

str(airquality)

boxplot(airquality$Ozone) # ozone readings

boxplot(airquality$Ozone,
main = "Mean ozone in parts per billion at Roosevelt Island",
xlab = "Parts Per Billion",
ylab = "Ozone",
col = "orange",
border = "brown",
horizontal = TRUE,
notch = TRUE
)

b <- boxplot(airquality$Ozone) # the return value holds the statistics used to draw the plot

boxplot(Temp~Month,
data=airquality,
main="Different boxplots for each month",
xlab="Month Number",
ylab="Degree Fahrenheit",
col="orange",
border="brown"
)

strip chart

str(airquality)

stripchart(airquality$Ozone)

Using jitter as a method

stripchart(airquality$Ozone,
main="Mean ozone in parts per billion at Roosevelt Island",
xlab="Parts Per Billion",
ylab="Ozone",
method="jitter",
col="orange",
pch=1
)

To draw multiple strips, we first prepare the data set

# prepare the data
temp <- airquality$Temp

# generate a normal distribution with the same mean and sd
tempNorm <- rnorm(200,mean=mean(temp, na.rm=TRUE), sd = sd(temp, na.rm=TRUE))

# make a list
x <- list("temp"=temp, "norm"=tempNorm)

stripchart(x,
main="Multiple stripchart for comparison",
xlab="Degree Fahrenheit",
ylab="Temperature",
method="jitter",
col=c("orange","red"),
pch=16
)

Strip chart from a formula

stripchart(Temp~Month,
data=airquality,
main="Different strip chart for each month",
xlab="Months",
ylab="Temperature",
col="brown3",
group.names=c("May","June","July","August","September"),
vertical=TRUE,
pch=16
)

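TYPES OF CHARTS

Data set (stored in a data frame named CHARTS1)

class.interval   frequency
11.5-16.5        2
16.5-21.5        6
21.5-26.5        7
26.5-31.5        5
31.5-36.5        3

# build the data frame from the table above
CHARTS1 <- data.frame(
class.interval = c("11.5-16.5","16.5-21.5","21.5-26.5","26.5-31.5","31.5-36.5"),
frequency = c(2,6,7,5,3)
)

hist(CHARTS1$frequency,right = FALSE)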

histogram

v <- c(9,13,21,8,36,22,12,41,31,33,19)

hist(v,xlab = "Weight",col = "yellow",border = "blue")

hist(v,xlab = "Weight",col = "green",border = "red",
xlim = c(0,40), ylim = c(0,5), breaks = 5)

plot

v <- c(7,12,28,3,41)

plot(v,type = "o")
line chart

v <- c(7,12,28,3,41)

plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",

main = "Rain fall chart")

multiple lines in a chart

v <- c(7,12,28,3,41)

t <- c(14,7,6,19,3)

plot(v,type = "o",col = "red", xlab = "Month", ylab = "Rain fall",

main = "Rain fall chart")

lines(t, type = "o", col = "blue")

Stem-and-leaf plot

stem(CHARTS1$frequency)

Output:

The decimal point is at the |

2 | 00
4 | 0
6 | 00

Dot chart

dotchart(CHARTS1$frequency)

Scatter plot

plot(CHARTS1$frequency)

Bar plot

barplot(CHARTS1$frequency)

Grouped bar chart

Values <- matrix(c(28,40,38,50,53,55,38,30,53),
nrow=3,ncol=3,byrow=TRUE,
dimnames = list(c("A","B","C"),c("1947","1957","1967")))

State <- c("A","B","C")

colors <- c("darkblue","red","yellow")

barplot(Values, main="Production of paddy",
xlab="Years", col=colors,
beside=TRUE, ylab="Production of paddy in lakh tonnes")

legend("bottomright", State, cex=1.3, fill=colors)

Stacked bar chart (the same call without beside=TRUE)

barplot(Values, main="Production of paddy",
xlab="Years", col=colors,
ylab="Production of paddy in lakh tonnes")

legend("bottomright", State, cex=1.3, fill=colors)


slices <- c(10, 12,4, 16, 8)

lbls <- c("US", "UK", "Australia", "Germany", "France")

pie(slices, labels = lbls, main="Pie Chart of Countries")


slices <- c(10, 12, 4, 16, 8)

lbls <- c("US", "UK", "Australia", "Germany", "France")

pct <- round(slices/sum(slices)*100)

lbls <- paste(lbls, pct) # add percents to labels

lbls <- paste(lbls,"%",sep="") # add % to labels

pie(slices,labels = lbls, col=rainbow(length(lbls)),

main="Pie Chart of Countries")

x <- seq(-pi,pi,0.1)
plot(x, sin(x))

plot(x, sin(x),

main="The Sine Function",

ylab="sin(x)")

plot(x, sin(x),

main="The Sine Function",

ylab="sin(x)",

type="l",

col="blue")

plot(x, sin(x),

main="Overlaying Graphs",

ylab="",

type="l",

col="blue")

lines(x,cos(x), col="red")

legend("topleft",
c("sin(x)","cos(x)"),
fill=c("blue","red")
)

max.temp <- c(Sun = 22, Mon = 27, Tue = 26, Wed = 24, Thu = 23, Fri = 26, Sat = 28) # a named vector used for plotting

max.temp

Sun Mon Tue Wed Thu Fri Sat
 22  27  26  24  23  26  28

par(mfrow=c(1,2)) # set the plotting area into a 1*2 array

barplot(max.temp, main="Barplot")

pie(max.temp, main="Piechart", radius=1)

Temperature <- airquality$Temp

Ozone <- airquality$Ozone

par(mfrow=c(2,2))

hist(Temperature)

boxplot(Temperature, horizontal=TRUE)

hist(Ozone)

boxplot(Ozone, horizontal=TRUE)

# make labels and margins smaller

par(cex=0.7, mai=c(0.1,0.1,0.2,0.1))

Temperature <- airquality$Temp

# define area for the histogram

par(fig=c(0.1,0.7,0.3,0.9))

hist(Temperature)

# define area for the boxplot

par(fig=c(0.8,1,0,1), new=TRUE)

boxplot(Temperature)

# define area for the stripchart

par(fig=c(0.1,0.67,0.1,0.25), new=TRUE)

stripchart(Temperature, method="jitter")
Drawing a 3D plot

cone <- function(x, y){
sqrt(x^2+y^2)
}

# prepare our variables

x <- y <- seq(-1, 1, length= 20)

z <- outer(x, y, cone)

persp(x, y, z)

persp(x, y, z,

main="Perspective Plot of a Cone",

zlab = "Height",

theta = 30, phi = 15,

col = "springgreen", shade = 0.5)

Read a CSV file:

mydata <- read.csv("flowers.csv", header = TRUE)

mydata

Output:

> mydata
    flowers
1      rose
2     liliy
3    champa
4     mogra
5  malligai
6    mullai
7    orchid
8  hibiscus
9   jaswant
10 marigold

ANOVA (Analysis of Variance)

ANOVA code:

y1 = c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)

y2 = c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)

y3 = c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)

y = c(y1, y2, y3)

n = rep(7, 3)

group = rep(1:3, n)

group

tmp = tapply(y, group, stem)

tmpfn = function(x) c(sum = sum(x), mean = mean(x), var = var(x),

n = length(x))

tapply(y, group, tmpfn)

data = data.frame(y = y, group = factor(group))

fit = lm(y ~ group, data)

anova(fit)

df = anova(fit)[, "Df"]

names(df) = c("trt", "err")


df

anova(fit)["Residuals", "Sum Sq"]

anova(fit)["Residuals", "Sum Sq"]/qchisq(c(0.025, 0.975), 18,

lower.tail = FALSE)

output:

> y1 = c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)


> y2 = c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)
> y3 = c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)
> y = c(y1, y2, y3)
> n = rep(7, 3)
> n
[1] 7 7 7
> group = rep(1:3, n)
> group
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3
> tmp = tapply(y, group, stem)

The decimal point is at the |

16 | 8
17 | 6
18 | 28
19 | 17
20 | 1

The decimal point is at the |

15 | 9
16 | 4
17 | 47
18 | 47
19 | 1

The decimal point is at the |

15 | 29
16 | 57
17 | 17
18 | 8

>
> tmpfn = function(x) c(sum = sum(x), mean = mean(x), var = var(x),
+ n = length(x))
> tapply(y, group, tmpfn)
$`1`
       sum       mean        var          n
130.300000  18.614286   1.358095   7.000000

$`2`
       sum       mean        var          n
123.600000  17.657143   1.409524   7.000000

$`3`
       sum       mean        var          n
117.900000  16.842857   1.392857   7.000000

>
> data = data.frame(y = y, group = factor(group))
> fit = lm(y ~ group, data)
> anova(fit)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)
group      2 11.007  5.5033  3.9683 0.03735 *
Residuals 18 24.963  1.3868
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> df = anova(fit)[, "Df"]
> names(df) = c("trt", "err")
> df
trt err
  2  18
>
> anova(fit)["Residuals", "Sum Sq"]
[1] 24.96286
>
> anova(fit)["Residuals", "Sum Sq"]/qchisq(c(0.025, 0.975), 18,
+ lower.tail = FALSE)
[1] 0.7918086 3.0328790

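Interpretation:

If the p-value from the F test is less than 0.05, the null hypothesis (that all group means are equal) is rejected; otherwise we fail to reject it. In the output above, the p-value is 0.03735, so the null hypothesis is rejected at the 5% level.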
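
The decision rule can also be applied in code. The sketch below refits the model on the data from the ANOVA example and compares the p-value with an assumed significance level of 0.05; the names dat, p_value and alpha are introduced here only for illustration.

```r
# Data copied from the ANOVA example in this section.
y1 <- c(18.2, 20.1, 17.6, 16.8, 18.8, 19.7, 19.1)
y2 <- c(17.4, 18.7, 19.1, 16.4, 15.9, 18.4, 17.7)
y3 <- c(15.2, 18.8, 17.7, 16.5, 15.9, 17.1, 16.7)
dat <- data.frame(y = c(y1, y2, y3), group = factor(rep(1:3, each = 7)))

fit <- lm(y ~ group, data = dat)
p_value <- anova(fit)["group", "Pr(>F)"]   # p-value of the F test

alpha <- 0.05   # assumed significance level
if (p_value < alpha) {
  print("Reject the null hypothesis: at least one group mean differs")
} else {
  print("Fail to reject the null hypothesis")
}
```

The row name "group" and the column name "Pr(>F)" are the labels that appear in the printed ANOVA table, so the same indexing works for other one-way models once the row name is changed to match the factor.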

Correlation:

Data set (stored in a data frame named CORRELATION):

CORRELATION
   X  Y
1 10 20
2 12 13
3  9 12
4 13  5
5  6  9
6  8  2
7 12  5
8 13  6

Pearson correlation:

cor(CORRELATION, use="complete.obs", method="pearson")

Output:

            X           Y
X  1.00000000 -0.09610721
Y -0.09610721  1.00000000

Spearman correlation:

cor(CORRELATION, use="complete.obs", method="spearman")

Output:

            X           Y
X  1.00000000 -0.09697148
Y -0.09697148  1.00000000

Kendall correlation and covariance:

cor(CORRELATION, use="complete.obs", method="kendall")

cov(CORRELATION, use="complete.obs")

Output:

> cor(CORRELATION, use="complete.obs", method="kendall")
            X           Y
X  1.00000000 -0.03774257
Y -0.03774257  1.00000000
> cov(CORRELATION, use="complete.obs")
          X         Y
X  6.553571 -1.428571
Y -1.428571 33.714286
Data set:

 X  Y
10 20
12 13
 9 12
13  5
 6  9
 8  2
12  5
13  6
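
The notes use the CORRELATION data frame without showing how it is created. A minimal sketch, assuming it holds exactly the X and Y values listed in the data set, is given below; the variable r and the use of cor.test() are additions for illustration.

```r
# Data frame assumed to match the CORRELATION table printed in this section.
CORRELATION <- data.frame(
  X = c(10, 12, 9, 13, 6, 8, 12, 13),
  Y = c(20, 13, 12, 5, 9, 2, 5, 6)
)

# Pearson correlation between the two columns.
r <- cor(CORRELATION$X, CORRELATION$Y, method = "pearson")
r

# cor.test() adds a significance test for the same coefficient.
cor.test(CORRELATION$X, CORRELATION$Y, method = "pearson")
```

A coefficient this close to zero indicates almost no linear relationship between X and Y in this small sample.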
Regression

alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
alligator # view data

alligator_regression = lm(lnWeight ~ lnLength, data = alligator)

lm(formula = lnWeight ~ lnLength, data = alligator)

summary(alligator_regression)

Output:

> alligator_regression = lm(lnWeight ~ lnLength, data = alligator)
> lm(formula = lnWeight ~ lnLength, data = alligator)

Call:
lm(formula = lnWeight ~ lnLength, data = alligator)

Coefficients:
(Intercept)     lnLength
     -8.476        3.431

> summary(alligator_regression)
Call:
lm(formula = lnWeight ~ lnLength, data = alligator)

Residuals:
