Other Objects
Intrinsic Attributes: Mode and length
• Objects storing homogeneous values are atomic structures. Eg: vectors
• Objects such as lists (ordered structures that can hold heterogeneous types of data) are called recursive. Others, such as functions and expressions, are recursive structures as well
• Mode and length are intrinsic attributes
• Extrinsic attributes can be get/set
Attributes
• An empty numeric object can be declared as: e <- numeric() [Try to check the output of e[0]]
• If we now assign to an index outside its range, the length of e increases [Try e[3] <- 5]
• We can also extend the length directly, for example length(e) <- 6
• attr(object, name) allows us to get/set the attribute specified by name.
• Eg: attr(x, "dim") <- c(10,10) tells R to treat x as a 10x10 matrix (x must have 100 elements) [Try the above with the predefined e, using dimensions whose product equals length(e)]
• attr(x, "names") <- c("a","b","c")
• Class is also an attribute of an object; it allows R to treat the object in a particular way (eg, vectors are printed in one format and matrices in another; functions like plot and summary also take the class into account before processing an object)
• To temporarily remove the effects of class, use the function unclass(). For example if winter has the class "data.frame" (or "ts") then
> winter # prints it in data frame (or ts) form, which is rather like a matrix, whereas
> unclass(winter) # prints it as an ordinary list (or as a plain numeric vector for ts)
Coercion
• Type conversion attempted by R
• Implicit Coercion: If we try combining heterogeneous types in a homogeneous object, R automatically coerces instead of throwing an error
Eg: > x <- c(1,"canada",3) [Check output]
• Explicit Coercion: by using special functions
Eg:
> x <- 1:5
> y <- as.character(x)
> x <- as.numeric(y)
• When R tries but can't coerce, it substitutes NA and displays a warning
Eg:
> x <- c("1","b","3")
> as.numeric(x) [Check output]
Factors- Slicing and dicing
• Factors are used for grouping the elements of a vector
• Also referred to as enumerated types or categorical objects
> v <- c(40,2,83,28,58)
> f <- factor(c("A","B","C","A","B")) # one label per element of v
> y <- split(v,f)
Group  Values
A      40, 28
B      2, 58
C      83
Factors …ctd
• Eg 2: Suppose there's a column in a table (a vector loc, say) specifying the office location of each employee. To extract the distinct office locations from this redundant list, we simply convert it to a factor:
> locf <- factor(loc)
Now we can split the tuples (rows of the table) using the split function above
• Try print(locf)
• Try levels(locf)
• f <- factor(loc, levels) # if the vector loc contains only a subset of the possible values and not the entire universe, specify all possible locations through levels
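The factor(loc) example above can be tried with a small made-up data set. This is only a sketch: the city names and the ids vector are hypothetical and are not part of the original example.

```r
# Hypothetical office locations for eight employees (illustrative data only).
loc <- c("Delhi", "Pune", "Delhi", "Goa", "Pune", "Delhi", "Goa", "Pune")
locf <- factor(loc)

# The distinct locations, stored as the levels of the factor.
print(levels(locf))

# Split a vector of (hypothetical) employee ids by location.
ids <- 1:8
groups <- split(ids, locf)
print(groups)

# Declare the full universe of locations, including one with no employees.
locf_all <- factor(loc, levels = c("Delhi", "Goa", "Mumbai", "Pune"))
print(table(locf_all))   # the unused level shows a count of 0
```

Passing levels explicitly keeps an unused location in later summaries instead of silently dropping it.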
• The combination of a vector and a labeling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular.
tapply(), User Functions
• Continue with the previous example and define (one income per employee, so loc must also have 8 entries)
> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54)
• Now to calculate the mean of each group, use tapply(vector, index, function)
• tapply(incomes, locf, mean)
• Suppose we want to define a function for the standard error:
> stderror <- function(x) sqrt(var(x)/length(x))
Then use it: tapply(incomes, locf, stderror)
Frequency table
• Suppose we wish to find out the number of employees in each location
– tapply(locf, locf, length)
– table(locf)
• Two-way frequency table: Suppose we want the frequency of each income at each location (how many people earn a given amount at each location)
– table(incomes, locf)
Ordered Factors: Ordinal data
• The levels of a factor are stored in alphabetical order, or in the order in which they were specified explicitly. Sometimes the levels have some other natural ordering that we want to record and make use of in plots (bar charts).
Eg: To represent the status of five projects. Each project has a status of low, medium, or high:
> status <- c("Lo", "Hi", "Med", "Med", "Hi")
Now we create an ordered factor with this status data:
> ordered.status <- factor(status, levels=c("Lo", "Med", "Hi"), ordered=TRUE)
> ordered.status
[1] Lo Hi Med Med Hi
Levels: Lo < Med < Hi
> table(status)
status
 Hi  Lo Med
  2   1   2
> table(ordered.status) # gives a more intuitive representation
ordered.status
 Lo Med  Hi
  1   2   2
Arrays
• Vectors with an additional attribute, the dim vector: a vector of non-negative integers specifying the respective extents of the array.
• If the length of the dim vector is k then the array is k-dimensional.
• Vector ~ 1D array [but with no dim vector, thus similar, not the same]
• Matrix = 2D array [provided separately for convenience, like log10]
• The content of the array is stored physically in column-major order. R ensures that the length of the vector is the product of the lengths of the dimensions. The length of one or more dimensions may be zero.
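A small sketch of the storage rule just described, using an arbitrary 2x3x4 array (the sizes and the names a and empty are chosen only for illustration):

```r
# Fill a 2 x 3 x 4 array with 1..24; the data are laid out in column-major order.
a <- array(1:24, dim = c(2, 3, 4))

print(dim(a))      # the dim vector
print(length(a))   # equals prod(dim(a))

# The first subscript varies fastest in the underlying vector.
print(a[2, 1, 1])  # second element of the data vector
print(a[1, 2, 1])  # third element: next column of the first slice
print(a[1, 1, 2])  # seventh element: the second slice starts after 2*3 values

# A dimension may have zero extent; the array then holds no data.
empty <- array(numeric(0), dim = c(0, 3))
print(length(empty))
```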
• dim(): dim(z) <- c(3,5,100) # reshapes z into a 3D array (z must have 3*5*100 = 1500 elements)
The elements are divided into 100 (3x5) matrices. The first 15 go to the first group/slice and so on. In each matrix, the elements first fill the 3 rows of the first column, then move to the next column. Thus the first subscript moves fastest (rows change first) and the last slowest.
• array(data_vector, dim_vector)
Subscripting Array
• A[i,j,…]
• A[,,k]: all elements in the kth matrix
• A[i]: the corresponding element of the underlying vector, which is stored in column-major order.
• A[indexvector]: same as above
examples
Example 1:
• z <- 1:10
• is.array(z)
• dim(z) <- c(2, 5)
• is.array(z)
• is.matrix(z)
• attr(z, "dim") = 10
• attr(z, "dim") = NULL
• attr(z, "dim") = c(5,2)
• print(dim(z))
Example 2:
• z <- 1:30
• dim(z) <- c(2,3,5)
• is.array(z)
• is.matrix(z)
• print(z)
• print(z[2,3,1])
• print(z[2,3]) #ERROR!
• print(z[,2,3]) #Correct
• print(z[2,3,])
• print(z[2,,1])
• print(z[,,])
• print(z[dim(z)]) #!!!!
• print(z[c(1,2),,])
• print(z[5])
• print(z[c(1,2,3)])
Customizing Organization in Matrix and Arrays
• Notice that R always fills a reshaped vector in column-major order. We have no control over this.
• For more control use matrix()
• matrix(vector, nrow=, ncol=, byrow=TRUE)
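To see what byrow changes, here is a minimal sketch with an arbitrary six-element vector (the names v, m_col and m_row are illustrative):

```r
v <- 1:6

# Default: the vector is laid out column by column.
m_col <- matrix(v, nrow = 2, ncol = 3)

# byrow = TRUE: the vector is laid out row by row instead.
m_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)

print(m_col)
print(m_row)

print(m_col[1, ])  # first row under column-major filling
print(m_row[1, ])  # first row under row-wise filling
```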
• The case of arrays is not so simple. Here aperm(a, perm_vector) may be used to permute an array's dimensions. perm_vector is a permutation of the integers 1:length(dim(a)). Eg: if a is a matrix, aperm(a, c(2,1)) creates the transpose of the matrix. For this simple case, the function t() is also available
Computations using array
• Computation is done element by element
• Both arrays should have the same dimensions
• If not, recycling rule:
– Vector+array:
  • If the vector is shorter: it is recycled (stretched) to the length of the array and the computation is then done in physical storage order.
  • If the array is shorter: error!
– Array+array: dim must match else error!
• The official way to coerce an array back to a simple vector object is to use as.vector()
• Coercion
– > vec <- as.vector(X) #clears dim attribute
– > vec <- c(X) #same as above
Outer Product
• C <- A %o% B
– The outer product is an array whose dimension vector is obtained by concatenating the two dimension vectors of the operand arrays (order is important). The data vector is obtained by forming all possible products of elements of A with elements of B.
• Alt: outer(A, B, "*")
The outer function can also be used to apply any other vectorized function of two arguments: outer(A, B, FUN = "functionname")
Matrix multiplication
• A*B ### What do you expect??
• A %*% B: matrix multiplication (a vector operand is treated as a row or column vector, whichever makes the product conformable)
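The difference between A*B and A %*% B is easiest to see on a tiny case. A minimal sketch with two arbitrary 2x2 matrices (the values are illustrative only):

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- A * B         # multiplies matching entries
product <- A %*% B           # true matrix product

print(elementwise)
print(product)

# A plain vector is promoted to whichever orientation conforms.
x <- c(1, 2)
print(dim(x %*% A))   # x used as a 1 x 2 row vector
print(dim(A %*% x))   # x used as a 2 x 1 column vector
```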
• crossprod(X, y) is the same as t(X) %*% y. If the second argument is omitted it is taken to be the same as the first.
Diagonal
• diag(v), where v is a vector, gives a diagonal matrix with the elements of the vector as the diagonal entries.
• On the other hand diag(M), where M is a matrix, gives the vector of main diagonal entries of M.
Solving linear equations: solve()
• Given A and b with
> b <- A %*% x # system of equations
• The vector x is the solution of that linear equation system: $x = A^{-1} b$. In R,
> x <- solve(A, b)
• Inverse of A: solve(A)
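A minimal sketch of solve() on a made-up 2x2 system, where the right-hand side is built from a known solution so the result can be checked (A, x_true and the other names are illustrative):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # a small invertible matrix
x_true <- c(1, -1)                     # the solution we expect to recover
b <- A %*% x_true                      # right-hand side of the system

x <- solve(A, b)        # solves A x = b directly
print(x)

A_inv <- solve(A)       # with one argument, solve() returns the inverse
print(A %*% A_inv)      # approximately the identity matrix
```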
• [It is inefficient and potentially unstable to compute x <- solve(A) %*% b instead of solve(A,b).]
CBIND() and RBIND()
• cbind() forms matrices by binding together matrices horizontally, or column-wise
> X <- cbind(arg_1, arg_2, arg_3, …) # returns a matrix with arg_1, arg_2, … concatenated column-wise
– Input matrices must have the same column length, that is, the same number of rows.
– If a vector is present among the inputs, it is treated as a column; if it is shorter than the number of rows, it is cyclically extended.
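A short sketch of cbind() with a matrix, a scalar and a vector, to show the cyclic extension (the inputs are arbitrary illustrative values):

```r
m <- matrix(1:6, nrow = 3)        # a 3 x 2 matrix

# The scalar 10 is recycled to fill a whole column;
# the length-3 vector becomes a column as is.
X <- cbind(m, 10, c(7, 8, 9))

print(X)
print(dim(X))    # three rows, four columns
print(X[, 3])    # the recycled column
```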
• rbind(): vectors are taken as rows; binds matrices vertically, or row-wise.
Eigenvalues, Eigenvectors, SVD
• For a symmetric matrix Sm, eigen(Sm) returns two components named values and vectors.
> ev <- eigen(Sm)
– > ev$val # vector of eigenvalues of Sm (partial match for ev$values)
– > ev$vec # the matrix of corresponding eigenvectors (partial match for ev$vectors)
– > evals <- eigen(Sm)$values # alternate
– > evals <- eigen(Sm, only.values = TRUE)$values # saves computation when the eigenvectors are not needed
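The eigen() calls above can be checked on a small symmetric matrix. This is a sketch with an arbitrary 2x2 example, chosen because its eigenvalues are known to be 3 and 1:

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric; eigenvalues are 3 and 1

ev <- eigen(Sm)
print(ev$values)     # eigenvalues, in decreasing order
print(ev$vectors)    # one eigenvector per column

# Check the defining relation Sm v = lambda v for the first pair.
v1 <- ev$vectors[, 1]
residual <- Sm %*% v1 - ev$values[1] * v1
print(max(abs(residual)))   # close to zero

# Eigenvalues only, skipping the eigenvector computation.
evals <- eigen(Sm, only.values = TRUE)$values
print(evals)
```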
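• svd(M) returns components u, d and v such that M = U %*% D %*% t(V), where D = diag(d).
– Eg: svd(M)$d returns the vector of singular values, that is, the diagonal of the D matrix above
Least squares fitting, QR decomposition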