Equal-variance check: ratio of variances in (0.5, 2) is acceptable; var(x), var(y),
or var.test(x, y) (or var.test(x ~ g) for a two-level grouping factor g)
summary(); range(); IQR()
IQR(days)/1.34898 #sigma estimated through IQR
rm=apply(B, 1, mean) #margin 1 indicates rows, 2 indicates columns, c(1, 2) both
Given a matrix A in R, write a program that subtracts each column of A by
the column's mean value to get a matrix with all column sums equal to zero.
t(t(A)-apply(A,2,mean))
Type I Error: P(H0 is rejected | H0), Type II Error: P(H0 is not rejected | H1)

One-sample test of H0: mean = mu0
mu0=17
n=length(mpg)
SE=sd(mpg)/sqrt(n)
T=(mean(mpg)-mu0)/SE
p.z=2*(1-pnorm(abs(T))) #normal approximation
cat("normal distribution method p-value=", round(p.z, 4)) #round(value, 4) keeps 4 decimal places
p.t=2*(1-pt(abs(T),df=n-1)) #t approximation
cat("t distribution p-value=", round(p.t,4))
t.test(mpg, mu=mu0)
#p.z is close to p.t and to the t.test result because n=61 is big and t_60 is
#close to N(0,1)
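The column-centering one-liner t(t(A)-apply(A,2,mean)) can be sanity-checked on a small invented matrix (the values of A below are made up for illustration):

```r
# Toy 3x2 matrix; after centering, every column sum should be 0
A = matrix(c(1, 2, 3, 10, 20, 30), nrow=3)
Ac = t(t(A) - apply(A, 2, mean))   # subtract each column's mean
colSums(Ac)                        # 0 0 (up to floating-point error)
# Equivalent idioms: sweep(A, 2, colMeans(A)) or scale(A, scale=FALSE)
```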
boxplot(salary~male+rank, names=c("male","female"))
boxplot(a~b, data=test, col="pink", main="Boxplot of pain by drug")
plot(qnorm(seq(0.05,0.95,0.05)), qt(seq(0.05,0.95,0.05), df=1))
qqplot(rt(1000,df=100), rnorm(1000)); quantile(rnorm(1000), c(0.25,0.75))

Assumptions for ANOVA:
1. Each group of samples is normally distributed.
2. Equal variances between treatments (homoscedasticity).
3. Each sample is randomly selected and independent.
#In ANOVA, p-value=0.0207<alpha, so we reject H0, indicating that mean days
#are significantly different over the four age groups.

Skew, kurtosis, trimmed, winsorized
#function for computing skewness
skew=function(x) {
  n=length(x)
  m2=sum((x-mean(x))^2)/n
  m3=sum((x-mean(x))^3)/n
  m3/m2^(3/2)*sqrt(n*(n-1))/(n-2) #adjusted Fisher-Pearson skewness
}
skew(wip1)
skew(wip2)
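A quick sanity check for skew() on samples of known shape (the seed is arbitrary; skew() is repeated here only so the snippet runs on its own):

```r
# Copy of skew() from above so this check is self-contained
skew=function(x) {
  n=length(x)
  m2=sum((x-mean(x))^2)/n
  m3=sum((x-mean(x))^3)/n
  m3/m2^(3/2)*sqrt(n*(n-1))/(n-2)
}
set.seed(1)                # arbitrary seed, for reproducibility
s1=skew(rnorm(5000))       # symmetric sample: value near 0
s2=skew(rexp(5000))        # right-skewed sample: value near 2 (the exponential's skewness)
c(s1, s2)
```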
#kurtosis
kurt=function(x) {
  n=length(x)
  m4=sum((x-mean(x))^4)/n
  m2=sum((x-mean(x))^2)/n
  (n-1)/((n-2)*(n-3))*((n+1)*m4/m2^2-3*(n-1)) #sample excess kurtosis
}
kurt(wip1)
#trimmed mean (trimming <10% can be ignored)
mean(days, trim=0.2)
#Winsorized mean
n=length(days)
a=0.2
days1=sort(days)
days1[1:(n*a)]=days1[n*a+1]
days1[(n-n*a+1):n]=days1[n-n*a]
mean(days1)
huber(days, k=1.5) #Huber M-estimate from MASS; winsorizes beyond the default k=1.5 scale units

State 2 assumptions for wilcox.test (paired):
1. Dependent samples: the two samples need to be dependent observations of the
same cases. The Wilcoxon signed-rank test assesses differences between a before
and an after measurement while accounting for individual differences in the baseline.
2. Independence: the paired observations are randomly and independently drawn.

Z-test: observations are independent, errors are normally distributed, sampling
is random.
T-test: normally distributed, homogeneity of variance, large sample size.
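A small made-up example contrasting the ordinary, trimmed, and winsorized means above (the data vector is invented; 20% matches the trim level used with days):

```r
x = c(2, 3, 3, 4, 4, 5, 5, 6, 6, 90)   # invented data with one large outlier
mean(x)                  # 12.8, pulled up by the outlier
mean(x, trim=0.2)        # 4.5: drops the 2 smallest and 2 largest values
n = length(x); a = 0.2
x1 = sort(x)
x1[1:(n*a)] = x1[n*a+1]            # replace the low tail with the nearest kept value
x1[(n-n*a+1):n] = x1[n-n*a]        # replace the high tail likewise
mean(x1)                 # 4.5: winsorized mean caps the outlier instead of dropping it
```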
### Inference for multiple groups
Categorical Data
flex=read.table("C:/Users.txt", header=T)
attach(flex)
diff=before-after
#sign test for two paired samples
ncount=sum(sign(diff[diff>0])) #number of positive signs
binom.test(ncount, length(diff), 0.5)
#Decision: p-value=0.109>0.05, do not reject H0 that prob of success=0.5.
#Conclusion: employees' attitudes toward the job show no significant
#difference between before and after the program.
#(one-sample) Wilcoxon signed-rank test on the sample of differences
wilcox.test(diff) #or equivalently wilcox.test(before, after, paired=T)
#p-value=0.0137<0.05, reject H0 that the median difference is 0.

CI and Hypothesis testing
n=length(x)
xbar=mean(x) #xbar=1.895
alpha=0.1
tstar=qt(1-alpha/2,df=n-1)
SE=sd(x)/sqrt(n)
cat("90% t CI: (", c(xbar-tstar*SE,xbar+tstar*SE), ")")
zstar=qnorm(1-alpha/2)
cat("90% z CI: (", c(xbar-zstar*SE,xbar+zstar*SE),")")
t.test(x, conf.level=0.9)
#C) one-sided interval
tstar=qt(1-alpha,df=n-1)
SE=sd(x)/sqrt(n)
cat("t 90% confidence upper bound: (-inf, ", xbar+tstar*SE, ")")
#t 90% confidence upper bound: (-inf, 1.954251 )#
zstar=qnorm(1-alpha)
cat(" z 90% confidence upper bound: (-inf, ", xbar+zstar*SE, ")")
#z 90% confidence upper bound: (-inf, 1.948666 )#
t.test(x, conf.level=0.9, alt="less")
#The three methods give almost the same result.
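The hand-built two-sided t interval can be checked against t.test directly; the sample x below is invented for illustration:

```r
# Made-up sample; the manual 90% t CI should match t.test's conf.int
x = c(1.2, 1.8, 2.1, 1.5, 2.4, 1.9, 2.2, 1.6)
n = length(x); xbar = mean(x); alpha = 0.1
SE = sd(x)/sqrt(n)
tstar = qt(1 - alpha/2, df = n - 1)
ci.hand = c(xbar - tstar*SE, xbar + tstar*SE)
ci.ttest = t.test(x, conf.level = 0.9)$conf.int
ci.hand; ci.ttest        # identical up to floating-point error
```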
#weibull
#binom
Simulation
#simulate N(mu, sigma^2) truncated below at a by accept-reject sampling
Nsim=10^4
X=rep(0,Nsim)
for (i in 1:Nsim){
  z=rnorm(1, mean=mu, sd=sigma)
  while(z<a) z=rnorm(1, mean=mu, sd=sigma)
  X[i]=z
}
#...and evaluate the algorithm for µ = 0, α = 1, and various values of
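One way to evaluate the accept-reject loop above is to compare the simulated mean against the standard truncated-normal formula E[Z | Z > a] = mu + sigma*dnorm((a-mu)/sigma)/(1-pnorm((a-mu)/sigma)); a sketch with mu=0, sigma=1, a=1 as in the exercise (seed arbitrary):

```r
set.seed(2)                      # arbitrary seed, for reproducibility
mu = 0; sigma = 1; a = 1; Nsim = 10^4
X = rep(0, Nsim)
for (i in 1:Nsim) {
  z = rnorm(1, mean = mu, sd = sigma)
  while (z < a) z = rnorm(1, mean = mu, sd = sigma)   # resample until z >= a
  X[i] = z
}
# Theoretical mean of N(0,1) truncated below at a=1:
m.theory = mu + sigma * dnorm((a - mu)/sigma) / (1 - pnorm((a - mu)/sigma))
c(mean(X), m.theory)             # both about 1.525
```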
#A row percentage table is most informative for these data:
labtest/rowSums(labtest)*100
#It shows that most improper tests (60%) occurred in the evening shift, while
#most proper tests (68%) were in the day shift.
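The same normalization is available via prop.table; the counts below are invented, since the real labtest table is not shown here, but they reproduce the 60%/68% pattern described above:

```r
# Hypothetical counts: rows = test outcome, columns = shift
labtest = matrix(c(34, 8, 16, 12), nrow = 2,
                 dimnames = list(c("proper","improper"), c("day","evening")))
labtest/rowSums(labtest)*100         # each row sums to 100
prop.table(labtest, margin = 1)*100  # built-in equivalent
```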