
R Intro

Other Objects
Intrinsic Attributes: Mode and length

• Objects storing homogeneous values are atomic structures. Eg: vectors
• Objects such as lists (ordered structures holding heterogeneous types of data) are called recursive. Other recursive structures include functions and expressions
• Mode and length are intrinsic attributes
• Extrinsic attributes can be read and set
Attributes
• An empty numeric object can be declared as: e <- numeric() [Try checking the output of e[0]]
• If we now assign to an index outside the range of e, the length of e increases [Try e[3] <- 5]
• We can also extend the length directly, for example with length(e) <- 6.
• attr(object, name) gets or sets the attribute specified by name.
• Eg: attr(x, "dim") <- c(10,10) tells R to treat x as a 10x10 matrix, provided x has 100 elements [Try the same idea with the predefined e, choosing dimensions whose product equals length(e)]
• attr(x,"names")<-c("a","b","c")
• Class is also an attribute of an object; it lets R treat the object in a particular way (eg, vectors and matrices are printed in different formats, and functions like plot and summary take the class into account before processing their arguments)
• To temporarily remove the effects of class, use the function unclass(). For example, if winter has the class "data.frame" (or "ts") then
> winter #prints it in data frame (or ts) form, which is rather like a matrix, whereas
> unclass(winter) #prints it as a list (or a numeric vector for ts)
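The attribute operations above can be tried in one go. This is a minimal sketch; the data frame winter and its column temp are made-up illustrations, not data from the slides.

```r
e <- numeric()              # empty numeric vector, length 0
e[3] <- 5                   # assigning past the end extends e to NA NA 5
length(e) <- 6              # explicit extension; new slots are NA
attr(e, "dim") <- c(2, 3)   # 2 * 3 == length(e), so e is now a 2x3 matrix
is.matrix(e)                # TRUE

# Hypothetical data frame standing in for 'winter'
winter <- data.frame(temp = c(-2, 1, 3))
class(winter)               # "data.frame"
class(unclass(winter))      # "list"
```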
Coercion
• Coercion is R's attempt at type conversion
• Implicit coercion: if we try combining heterogeneous types in a homogeneous object, R automatically coerces them to a common type instead of throwing an error
Eg: > x <- c(1,"canada",3) [Check output]
• Explicit coercion: by using special functions
Eg: x <- 1:5
y <- as.character(x)
x <- as.numeric(y)
• When R tries but can't coerce a value, it substitutes NA and displays a warning
Eg: > x <- c("1","b","3")
> as.numeric(x) [Check output]
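A short sketch of the three coercion cases above. suppressWarnings() is used here only so the script runs quietly; at the console you would see the warning mentioned on the slide.

```r
x <- c(1, "canada", 3)      # implicit coercion: everything becomes character
class(x)                    # "character"

y <- as.character(1:5)      # explicit coercion to character
z <- as.numeric(y)          # and back to numeric

# "b" cannot be converted, so it becomes NA
w <- suppressWarnings(as.numeric(c("1", "b", "3")))
w                           # 1 NA 3
```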
Factors- Slicing and dicing
• Factors are used for grouping the elements of a vector
• Also referred to as enumerated types or categorical objects
> v <- c(40,2,83,28,58)
> f <- factor(c("A","B","C"))
> y <- split(v,f) #f is shorter than v, so it is recycled (R warns that 5 is not a multiple of 3)
Group   Values
A       40, 28
B       2, 58
C       83
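The same grouping can be obtained without relying on recycling. In this sketch the factor is written out at full length, which is an adaptation of the slide's example rather than the original call.

```r
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "B", "C", "A", "B"))  # one label per element of v
groups <- split(v, f)                    # named list, one entry per level
groups$A                                 # 40 28
groups$B                                 # 2 58
groups$C                                 # 83
```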
Factors …ctd
• Eg 2: Suppose there's a column in a table (a vector loc, say) specifying the office location of each employee. To extract the distinct office locations from this redundant list, we simply convert it to a factor:
> locf <- factor(loc)
Now we can split the tuples (rows of the table) using the split function above
• Try print(locf)
• Try levels(locf)
• f <- factor(loc, levels) #if the vector loc contains only a subset of the possible values and not the entire universe, specify all possible locations via levels
• The combination of a vector and a labeling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular.
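A sketch of the office-location example. The slides do not give the contents of loc, so the city names below are hypothetical.

```r
# Hypothetical office locations, one per employee
loc <- c("Delhi", "Pune", "Delhi", "Goa", "Pune", "Delhi", "Goa", "Pune")
locf <- factor(loc)
levels(locf)                 # "Delhi" "Goa" "Pune"

# Declare the full universe of locations, including one with no employees
all_loc <- factor(loc, levels = c("Delhi", "Goa", "Mumbai", "Pune"))
table(all_loc)               # Mumbai appears with a count of 0
```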
tapply(), User Functions
• Continue with the previous example and define (one income per entry of loc)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54)
• Now to calculate the mean of each group, use tapply(vector, index, function):
tapply(incomes, locf, mean)
• Suppose we want to define a function for the standard error:
stderror <- function(x) sqrt(var(x)/length(x))
Then use it: tapply(incomes, locf, stderror)
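Putting the pieces together, as a sketch. The loc vector is again a hypothetical stand-in, since the slides only name it; incomes and stderror are taken from the slide.

```r
# Hypothetical locations (assumption); incomes as given on the slide
loc <- c("Delhi", "Pune", "Delhi", "Goa", "Pune", "Delhi", "Goa", "Pune")
locf <- factor(loc)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54)

means <- tapply(incomes, locf, mean)      # one mean per location

stderror <- function(x) sqrt(var(x) / length(x))
ses <- tapply(incomes, locf, stderror)    # one standard error per location
```

With this particular loc, the Goa group holds 61 and 59, so its mean is 60 and its standard error is 1.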
Frequency table
• Suppose we wish to find out the number of employees in each location
– tapply(locf, locf, length)
– table(locf)
• Two-way frequency table: suppose we want to count how many employees earn a given amount in each location
– table(incomes, locf)
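A sketch of both frequency tables, using the same hypothetical loc vector as an assumption.

```r
# Hypothetical locations (assumption); incomes as given on the slides
loc <- c("Delhi", "Pune", "Delhi", "Goa", "Pune", "Delhi", "Goa", "Pune")
locf <- factor(loc)
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54)

counts_tapply <- tapply(locf, locf, length)  # employees per location
counts_table  <- table(locf)                 # same counts, as a table

two_way <- table(incomes, locf)   # rows: distinct incomes, columns: locations
two_way["60", "Delhi"]            # 2 employees in Delhi earn 60
```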
Ordered Factors: Ordinal data
• The levels of factors are stored in alphabetical order, or in the order they were
specified explicitly. Sometimes the levels will have some other natural ordering that
we want to record and make use of in plots(bar charts).
Eg: To represent the status of five projects. Each project has a status of low, medium, or
high:
> status <- c("Lo", "Hi", "Med", "Med", "Hi")
Now we create an ordered factor with this status data:
> ordered.status <- factor(status, levels=c("Lo", "Med", "Hi"), ordered=TRUE)
> ordered.status
[1] Lo Hi Med Med Hi
Levels: Lo < Med < Hi

> table(status)
status
 Hi  Lo Med
  2   1   2

> table(ordered.status) # gives a more intuitive representation
ordered.status
 Lo Med  Hi
  1   2   2
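Beyond tabulation, an ordered factor also supports comparisons. A small sketch using the status data from the slide:

```r
status <- c("Lo", "Hi", "Med", "Med", "Hi")
ordered.status <- factor(status, levels = c("Lo", "Med", "Hi"), ordered = TRUE)

as.integer(ordered.status)      # internal codes follow the level order: 1 3 2 2 3
at_least_med <- ordered.status >= "Med"   # FALSE TRUE TRUE TRUE TRUE
highest <- max(ordered.status)            # Hi
```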
Arrays
• Vectors with an additional attribute, the dim vector: a vector of non-negative integers specifying the respective extents of the array.
• If the length of the dim vector is k then the array is k-dimensional.
• Vector ~ 1D array [but with no dim vector, so similar, not the same]
• Matrix = 2D array [a special case provided for convenience, much as log10 is for log]
• The content of the array is stored physically in column-major order. R ensures that the length of the vector is the product of the lengths of the dimensions. The length of one or more dimensions may be zero.
• dim(): dim(z) <- c(3,5,100) # reshapes z (which must have 1500 elements) into a 3D array
The elements are divided into 100 matrices of size 3x5. The first 15 go to the first slice and so on. Within each matrix, the first elements fill the 3 rows of the first column, then move on to the next column. Thus the first subscript moves fastest (rows change first) and the last slowest.
• array(data_vector, dim_vector)
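The column-major layout can be checked directly. A minimal sketch, assuming z holds 1:1500 so that each element's value equals its position in the underlying vector:

```r
z <- 1:1500
dim(z) <- c(3, 5, 100)   # 100 slices, each a 3x5 matrix

z[2, 1, 1]   # 2:  next row, same column
z[1, 2, 1]   # 4:  next column starts after 3 rows
z[1, 1, 2]   # 16: next slice starts after 3 * 5 = 15 elements

a <- array(1:24, dim = c(2, 3, 4))   # same idea via array()
a[2, 3, 4]                           # 24, the last element
```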
Subscripting Array
• A[i,j,…]: a single element
• A[,,k]: all elements in the kth matrix
• A[i]: the ith element of the underlying vector, which is stored in column-major order
• A[indexvector]: same as above, for several elements
Examples
Example 1 (vector to matrix):
• z <- 1:10
• is.array(z)
• dim(z) <- c(2, 5)
• is.array(z)
• is.matrix(z)
• attr(z, "dim") = 10
• attr(z, "dim") = NULL
• attr(z, "dim") = c(5,2)
Example 2 (3D array):
z <- 1:30
dim(z) <- c(2,3,5)
is.array(z)
is.matrix(z)
print(z)
print(z[2,3,1])
print(z[2,3]) #ERROR!
print(z[,2,3]) #Correct
print(z[2,3,])
print(z[2,,1])
print(z[,,])
print(dim(z))
print(z[dim(z)]) #!!!! same as z[c(2,3,5)], not z[2,3,5]
print(z[c(1,2),,])
print(z[5])
print(z[c(1,2,3)])
Customizing Organization in Matrix and Arrays
• Notice that assigning dim always reshapes a vector in column-major order. We have no control over this.
• For more control over a matrix, use matrix():
matrix(vector, nrow=, ncol=, byrow=TRUE)
• The case of an array is not so simple. Here aperm(a, perm_vector) may be used to permute an array's dimensions. perm_vector is a permutation of the integers 1:length(dim(a)).
Eg: if a is a matrix, aperm(a, c(2,1)) creates the transpose of the matrix. For this simple case, the function t() is also available
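A sketch contrasting the two fill orders and showing what aperm() does to the dimensions. The variable names are illustrative only.

```r
m_col <- matrix(1:6, nrow = 2)                # default: filled column by column
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row
m_col[1, 2]   # 3
m_row[1, 2]   # 2

a <- array(1:24, dim = c(2, 3, 4))
b <- aperm(a, c(3, 1, 2))   # old dimension 3 comes first
dim(b)                      # 4 2 3
b[4, 2, 3] == a[2, 3, 4]    # TRUE: b[k, i, j] is a[i, j, k]

all(aperm(m_col, c(2, 1)) == t(m_col))   # TRUE: same as the transpose
```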
Computations using array
• Computation is done element by element
• Simplest case: both arrays have the same dimensions
• If not, the recycling rule applies:
– Vector+array:
• If the vector is shorter: the short vector is recycled to the length of the array, and the computation follows the physical storage order.
• If the vector is longer than the array: error!
– Array+array: dim must match, else error!
• The official way to coerce an array back to a simple vector object is to use as.vector()
– > vec <- as.vector(X) #clears dim attribute
– > vec <- c(X) #same effect
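These rules can be checked with a small array. In this sketch, tryCatch() is used only to turn the two error cases into TRUE/FALSE flags so the script keeps running.

```r
A <- array(1:6, dim = c(2, 3))   # rows: 1 3 5 / 2 4 6
v <- c(10, 20)
S <- A + v                       # v is recycled down the columns
S[1, ]                           # 11 13 15
S[2, ]                           # 22 24 26

# A vector longer than the array is an error
too_long <- tryCatch({ A + 1:7; FALSE }, error = function(e) TRUE)

# Arrays with different dim vectors are an error, even with equal lengths
mismatch <- tryCatch({ A + array(1:6, dim = c(3, 2)); FALSE },
                     error = function(e) TRUE)

vec <- as.vector(A)              # back to a plain vector: 1 2 3 4 5 6
```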
Outer Product
• C <- A %o% B
– The outer product is an array whose dimension vector is obtained by concatenating the two dimension vectors of the operand arrays (order is important). Its data vector is obtained by forming all possible products of elements of A with those of B.
• Alt: outer(A, B, "*")
• outer() can also be used to apply any two-argument function to all pairs of elements:
outer(A, B, FUN = "functionname")
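A small sketch of the outer product, first on vectors and then on arrays to show how the dimension vectors concatenate. The operands are arbitrary illustrations.

```r
A <- 1:3
B <- 1:4
C <- A %o% B               # 3x4 matrix, C[i, j] = A[i] * B[j]
C[2, 3]                    # 6

P <- outer(A, B, "+")      # same shape, but sums instead of products
P[3, 4]                    # 7

X <- array(1:6, dim = c(2, 3))
Y <- array(1:4, dim = c(2, 2))
dim(X %o% Y)               # 2 3 2 2
```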
Matrix multiplication
• A*B ### What do you expect??

• A %*% B: matrix multiplication (a vector operand is treated as a row or column vector, whichever makes the product conformable)

• crossprod(X, y) is the same as t(X) %*% y. If the second argument is omitted, it is taken to be the same as the first.
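The difference between * and %*% is easiest to see on a 2x2 case. A minimal sketch with made-up matrices:

```r
A <- matrix(1:4, nrow = 2)   # rows: 1 3 / 2 4
B <- matrix(5:8, nrow = 2)   # rows: 5 7 / 6 8

E <- A * B                   # element by element: rows 5 21 / 12 32
M <- A %*% B                 # matrix product:     rows 23 31 / 34 46

CP <- crossprod(A, B)        # t(A) %*% B without forming t(A) explicitly

x <- c(1, 2)
x %*% A                      # x acts as a row vector: 1x2 result 5 11
A %*% x                      # x acts as a column vector: 2x1 result 7 10
```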
Diagonal
• diag(v), where v is a vector, gives a diagonal
matrix with elements of the vector as the
diagonal entries.
• On the other hand diag(M), where M is a
matrix, gives the vector of main diagonal
entries of M.
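Both uses of diag() in a short sketch:

```r
D <- diag(c(1, 2, 3))        # 3x3 matrix with 1, 2, 3 on the diagonal
D[2, 2]                      # 2
D[1, 2]                      # 0

M <- matrix(1:9, nrow = 3)
d <- diag(M)                 # main diagonal of M: 1 5 9
```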
Solving linear equations: solve()
• Given A and b, where
> b <- A %*% x #System of equations
• The vector x is the solution of that linear equation system:
$x = A^{-1} b$
In R, x <- solve(A,b)
• Inverse of A: solve(A)
• [It is inefficient and potentially unstable to compute x <- solve(A) %*% b instead of solve(A,b).]
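A round-trip sketch: build b from a known x, then recover x. The matrix and vector are made-up values chosen so the arithmetic is easy to follow.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows: 2 1 / 1 3
x_true <- c(1, 2)
b <- A %*% x_true                      # 4 7

x <- solve(A, b)                       # solves A x = b directly
A_inv <- solve(A)                      # inverse, only if really needed
```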
cbind() and rbind()
• cbind() forms matrices by binding together matrices horizontally, or column-wise
> X <- cbind(arg_1, arg_2, arg_3, …) #returns a matrix with arg_1, arg_2, … concatenated as columns
– Input matrices must have the same column size, that is, the same number of rows.
– If a vector is present among the inputs, it is treated as a column; if it is short, it is cyclically extended.

• rbind(): binds matrices vertically, or row-wise; vectors are taken as rows.
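A sketch of both functions, including the cyclic extension of a short vector (here the single value 0):

```r
M <- matrix(1:4, nrow = 2)        # rows: 1 3 / 2 4

X <- cbind(M, c(9, 10), 0)        # vector becomes a column; 0 is recycled
dim(X)                            # 2 4
X[, 4]                            # 0 0

Y <- rbind(M, c(7, 8))            # vector becomes a new row
dim(Y)                            # 3 2
Y[3, ]                            # 7 8
```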
Eigenvalues, Eigenvectors, SVD
• For a symmetric matrix Sm, eigen(Sm) returns two components named values and vectors.
> ev <- eigen(Sm)
– > ev$val #vector of eigenvalues of Sm
– > ev$vec #the matrix of corresponding eigenvectors
– > evals <- eigen(Sm)$values #Alternate
– > evals <- eigen(Sm, only.values = TRUE)$values #saves computation when eigenvectors are not needed

• svd(M) returns components u, d and v such that M = U %*% D %*% t(V), where D is the diagonal matrix built from d.
– Eg: svd(M)$d returns the vector of singular values, i.e. the diagonal of D above
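A sketch that checks both decompositions numerically. Sm and M are small made-up matrices, not from the slides.

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)    # symmetric
ev <- eigen(Sm)
ev$values                                 # 3 1 (in decreasing order)
v1 <- ev$vectors[, 1]                     # eigenvector for the eigenvalue 3

M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
s <- svd(M)
length(s$d)                               # 2 singular values (a vector)
M_rebuilt <- s$u %*% diag(s$d) %*% t(s$v) # reconstructs M up to rounding
```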
Least squares fitting, QR decomposition

• lsfit(X, y) #y = vector of observations, X = the design matrix.
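A minimal sketch of a least squares fit on made-up data that lies exactly on the line y = 2 + 3x, so the expected coefficients are known. The qr.solve() call shows one way to get the same answer through a QR decomposition; it is an illustration, not necessarily how the slides intended to connect the two topics.

```r
# Hypothetical data: one predictor, observations exactly on y = 2 + 3 * x1
X <- cbind(x1 = c(1, 2, 3, 4, 5))
y <- 2 + 3 * X[, 1]

fit <- lsfit(X, y)              # an intercept column is added by default
fit$coefficients                # intercept 2, slope 3

# Same least squares problem via QR, with the intercept column made explicit
beta_qr <- qr.solve(cbind(1, X), y)
```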
