Equal-variance check: ratio of variances in (0.5, 2) is acceptable; var(x), var(y),
or var.test(x, y) (or var.test(x ~ g) for a two-level grouping factor g)
summary(); range(); IQR()
IQR(days)/1.34898 #sigma estimated through IQR
rm=apply(B, 1, mean) #margin 1 indicates rows, 2 indicates columns, c(1, 2) both
Given a matrix A in R, write a program that subtracts each column of A by
the column's mean value to get a matrix with all column sums equal to zero.
t(t(A)-apply(A,2,mean))
Type I Error: P(H0 is rejected | H0), Type II Error: P(H0 is not rejected | H1)

One-sample test of H0: mean = mu0
mu0=17
n=length(mpg)
SE=sd(mpg)/sqrt(n)
T=(mean(mpg)-mu0)/SE
p.z=2*(1-pnorm(abs(T))) #normal approximation
cat("normal distribution method p-value=", round(p.z, 4)) #round(value, 4) keeps 4 decimal places
p.t=2*(1-pt(abs(T),df=n-1)) #t approximation
cat("t distribution p-value=", round(p.t,4))
t.test(mpg, mu=mu0)
#p.z is close to p.t and to the t.test result because n=61 is big and t_60 is
#close to N(0,1)
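The column-centering one-liner t(t(A)-apply(A,2,mean)) can be sanity-checked on a small invented matrix (the values of A below are made up for illustration):

```r
# Toy 3x2 matrix; after centering, every column sum should be 0
A = matrix(c(1, 2, 3, 10, 20, 30), nrow=3)
Ac = t(t(A) - apply(A, 2, mean))   # subtract each column's mean
colSums(Ac)                        # 0 0 (up to floating-point error)
# Equivalent idioms: sweep(A, 2, colMeans(A)) or scale(A, scale=FALSE)
```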
boxplot(salary~male+rank, names=c("male","female"))
boxplot(a~b, data=test, col="pink", main="Boxplot of pain by drug")
plot(qnorm(seq(0.05,0.95,0.05)), qt(seq(0.05,0.95,0.05), df=1))
qqplot(rt(1000,df=100), rnorm(1000)); quantile(rnorm(1000), c(0.25,0.75))

Assumptions for ANOVA:
1. Each group of samples is normally distributed.
2. Equal variances between treatments (homoscedasticity).
3. Each sample is randomly selected and independent.
#In ANOVA, p-value=0.0207<alpha, so we reject H0, indicating that mean days
#are significantly different over the four age groups.

Skew, kurtosis, trimmed, winsorized
#function for computing skewness
skew=function(x) {
  n=length(x)
  m2=sum((x-mean(x))^2)/n
  m3=sum((x-mean(x))^3)/n
  m3/m2^(3/2)*sqrt(n*(n-1))/(n-2) #adjusted Fisher-Pearson skewness
}
skew(wip1)
skew(wip2)
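A quick sanity check for skew() on samples of known shape (the seed is arbitrary; skew() is repeated here only so the snippet runs on its own):

```r
# Copy of skew() from above so this check is self-contained
skew=function(x) {
  n=length(x)
  m2=sum((x-mean(x))^2)/n
  m3=sum((x-mean(x))^3)/n
  m3/m2^(3/2)*sqrt(n*(n-1))/(n-2)
}
set.seed(1)                # arbitrary seed, for reproducibility
s1=skew(rnorm(5000))       # symmetric sample: value near 0
s2=skew(rexp(5000))        # right-skewed sample: value near 2 (the exponential's skewness)
c(s1, s2)
```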
#kurtosis
kurt=function(x) {
  n=length(x)
  m4=sum((x-mean(x))^4)/n
  m2=sum((x-mean(x))^2)/n
  (n-1)/((n-2)*(n-3))*((n+1)*m4/m2^2-3*(n-1)) #sample excess kurtosis
}
kurt(wip1)
#trimmed mean (trimming <10% can be ignored)
mean(days, trim=0.2)
#Winsorized mean
n=length(days)
a=0.2
days1=sort(days)
days1[1:(n*a)]=days1[n*a+1]
days1[(n-n*a+1):n]=days1[n-n*a]
mean(days1)
huber(days, k=1.5) #Huber M-estimate from MASS; winsorizes beyond the default k=1.5 scale units

State 2 assumptions for wilcox.test (paired):
1. Dependent samples: the two samples need to be dependent observations of the
same cases. The Wilcoxon signed-rank test assesses differences between a before
and an after measurement while accounting for individual differences in the baseline.
2. Independence: the paired observations are randomly and independently drawn.

Z-test: observations are independent, errors are normally distributed, sampling
is random.
T-test: normally distributed, homogeneity of variance, large sample size.
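A small made-up example contrasting the ordinary, trimmed, and winsorized means above (the data vector is invented; 20% matches the trim level used with days):

```r
x = c(2, 3, 3, 4, 4, 5, 5, 6, 6, 90)   # invented data with one large outlier
mean(x)                  # 12.8, pulled up by the outlier
mean(x, trim=0.2)        # 4.5: drops the 2 smallest and 2 largest values
n = length(x); a = 0.2
x1 = sort(x)
x1[1:(n*a)] = x1[n*a+1]            # replace the low tail with the nearest kept value
x1[(n-n*a+1):n] = x1[n-n*a]        # replace the high tail likewise
mean(x1)                 # 4.5: winsorized mean caps the outlier instead of dropping it
```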
### Inference for multiple groups
Categorical Data
flex=read.table("C:/Users.txt", header=T)
attach(flex)
diff=before-after
#sign test for two paired samples
ncount=sum(sign(diff[diff>0])) #number of positive signs
binom.test(ncount, length(diff), 0.5)
#Decision: p-value=0.109>0.05, do not reject H0 that prob of success=0.5.
#Conclusion: employees' attitudes toward the job show no significant
#difference between before and after the program.
#(one-sample) Wilcoxon signed-rank test on the sample of differences
wilcox.test(diff) #or equivalently wilcox.test(before, after, paired=T)
#p-value=0.0137<0.05, reject H0 that the median difference is 0.

CI and Hypothesis testing
n=length(x)
xbar=mean(x) #xbar=1.895
alpha=0.1
tstar=qt(1-alpha/2,df=n-1)
SE=sd(x)/sqrt(n)
cat("90% t CI: (", c(xbar-tstar*SE,xbar+tstar*SE), ")")
zstar=qnorm(1-alpha/2)
cat("90% z CI: (", c(xbar-zstar*SE,xbar+zstar*SE),")")
t.test(x, conf.level=0.9)
#C) one-sided interval
tstar=qt(1-alpha,df=n-1)
SE=sd(x)/sqrt(n)
cat("t 90% confidence upper bound: (-inf, ", xbar+tstar*SE, ")")
#t 90% confidence upper bound: (-inf, 1.954251 )#
zstar=qnorm(1-alpha)
cat(" z 90% confidence upper bound: (-inf, ", xbar+zstar*SE, ")")
#z 90% confidence upper bound: (-inf, 1.948666 )#
t.test(x, conf.level=0.9, alt="less")
#The three methods give almost the same result.
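The hand-built two-sided t interval can be checked against t.test directly; the sample x below is invented for illustration:

```r
# Made-up sample; the manual 90% t CI should match t.test's conf.int
x = c(1.2, 1.8, 2.1, 1.5, 2.4, 1.9, 2.2, 1.6)
n = length(x); xbar = mean(x); alpha = 0.1
SE = sd(x)/sqrt(n)
tstar = qt(1 - alpha/2, df = n - 1)
ci.hand = c(xbar - tstar*SE, xbar + tstar*SE)
ci.ttest = t.test(x, conf.level = 0.9)$conf.int
ci.hand; ci.ttest        # identical up to floating-point error
```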
#weibull
#binom
Simulation
#simulate N(mu, sigma^2) truncated below at a by accept-reject sampling
Nsim=10^4
X=rep(0,Nsim)
for (i in 1:Nsim){
  z=rnorm(1, mean=mu, sd=sigma)
  while(z<a) z=rnorm(1, mean=mu, sd=sigma)
  X[i]=z
}
#...and evaluate the algorithm for µ = 0, α = 1, and various values of
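One way to evaluate the accept-reject loop above is to compare the simulated mean against the standard truncated-normal formula E[Z | Z > a] = mu + sigma*dnorm((a-mu)/sigma)/(1-pnorm((a-mu)/sigma)); a sketch with mu=0, sigma=1, a=1 as in the exercise (seed arbitrary):

```r
set.seed(2)                      # arbitrary seed, for reproducibility
mu = 0; sigma = 1; a = 1; Nsim = 10^4
X = rep(0, Nsim)
for (i in 1:Nsim) {
  z = rnorm(1, mean = mu, sd = sigma)
  while (z < a) z = rnorm(1, mean = mu, sd = sigma)   # resample until z >= a
  X[i] = z
}
# Theoretical mean of N(0,1) truncated below at a=1:
m.theory = mu + sigma * dnorm((a - mu)/sigma) / (1 - pnorm((a - mu)/sigma))
c(mean(X), m.theory)             # both about 1.525
```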
#A row percentage table is most informative for these data:
labtest/rowSums(labtest)*100
#It shows that most improper tests (60%) occurred in the evening shift, while
#most proper tests (68%) were in the day shift.
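The same normalization is available via prop.table; the counts below are invented, since the real labtest table is not shown here, but they reproduce the 60%/68% pattern described above:

```r
# Hypothetical counts: rows = test outcome, columns = shift
labtest = matrix(c(34, 8, 16, 12), nrow = 2,
                 dimnames = list(c("proper","improper"), c("day","evening")))
labtest/rowSums(labtest)*100         # each row sums to 100
prop.table(labtest, margin = 1)*100  # built-in equivalent
```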