Edwin Ardiansyah
RStudio is a powerful and productive user interface for R. It is free and open source.
BASIC FUNCTION
Calculator
R can be used as a calculator to evaluate arithmetic expressions.
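As a minimal sketch of calculator-style use, the lines below evaluate a few arithmetic expressions. The variable names are chosen here for illustration only.

```r
# Basic arithmetic, as it could be typed at the R console
sum_result  <- 2 + 3      # addition
diff_result <- 10 - 4     # subtraction
prod_result <- 6 * 7      # multiplication
quot_result <- 15 / 4     # division
pow_result  <- 2^10       # exponentiation
rem_result  <- 17 %% 5    # remainder (modulo)
int_result  <- 17 %/% 5   # integer division

# Show all results at once
print(c(sum_result, diff_result, prod_result, quot_result,
        pow_result, rem_result, int_result))
```

Typing an expression such as 2 + 3 directly at the console prints the result without needing print( ).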
Functions
Examples of math functions: exp( ), sqrt( ), log( ), sin( ), cos( ), tan( ), abs( )
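A short sketch of these functions in use follows. The input values are arbitrary and the variable names are illustrative.

```r
root_val <- sqrt(16)      # square root
exp_val  <- exp(1)        # e raised to the power 1
log_val  <- log(exp(2))   # log( ) is the natural logarithm by default
abs_val  <- abs(-7.5)     # absolute value
sin_val  <- sin(pi / 2)   # trigonometric functions expect radians
cos_val  <- cos(0)
tan_val  <- tan(0)

print(c(root_val, exp_val, log_val, abs_val, sin_val, cos_val, tan_val))
```

For a base-10 logarithm, log10( ) can be used instead of log( ).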
There are a number of ways to get help in R, and there is also a wide variety of online information.
Most installations of R come with a reasonably detailed help file called "An Introduction to R", but
this can be rather technical for first-time users of a statistics package.
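A few of the built-in ways to get help can be sketched as follows. The calls that open a help viewer are wrapped in interactive( ) so that the snippet also runs as a script; the topics looked up are arbitrary examples.

```r
# These open the help viewer, so they only run in an interactive session
if (interactive()) {
  help("sqrt")    # help page for a function
  ?read.csv       # shorthand for help("read.csv")
  help.start()    # browsable HTML help, including "An Introduction to R"
}

# Find function names matching a pattern
hits <- apropos("^read\\.")
print(hits)

# Show the arguments a function accepts
print(args(sqrt))
```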
DATA MANAGEMENT
Importing dataset
To import a dataset in CSV format, use the function read.csv( ).
In RStudio, you can also click Import Dataset in the Environment pane.
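A minimal sketch of read.csv( ) follows. To keep it self-contained, it first writes a tiny made-up CSV file to a temporary location; in practice the path would point to your own file. The object name survey and the values are illustrative assumptions.

```r
# Create a small example CSV file (made-up values, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("age,weight,gender",
             "23,150,m",
             "45,132,f",
             "31,180,m"), csv_path)

# Import the file into a data frame
survey <- read.csv(csv_path)

str(survey)    # structure: column names and types
head(survey)   # first rows
```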
Each one of these variables corresponds to a question that was asked in a survey. For example, for
genhlth, respondents were asked to evaluate their general health, responding either excellent, very
good, good, fair or poor. The exerany variable indicates whether the respondent exercised in the
past month (1) or did not (0). Likewise, hlthplan indicates whether the respondent had some form of
health coverage (1) or did not (0). The smoke100 variable indicates whether the respondent had
smoked at least 100 cigarettes in their lifetime. The other variables record the respondent's height
in inches, weight in pounds, desired weight (wtdesire), age in years, and gender.
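To make the variable layout concrete, the sketch below builds a four-row stand-in data frame with the same column names and inspects it. The object name cdc and all values are made up for illustration; they are not the survey data.

```r
# Tiny stand-in data frame (made-up values) with the variables described above
cdc <- data.frame(
  genhlth  = c("good", "excellent", "fair", "very good"),
  exerany  = c(1, 1, 0, 1),
  hlthplan = c(1, 0, 1, 1),
  smoke100 = c(0, 1, 1, 0),
  height   = c(70, 64, 61, 68),
  weight   = c(175, 125, 150, 160),
  wtdesire = c(175, 115, 130, 160),
  age      = c(77, 33, 49, 42),
  gender   = c("m", "f", "f", "m")
)

names(cdc)   # variable names
dim(cdc)     # number of rows and columns
str(cdc)     # type of each variable
```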
Suppose we want to sort our data by age and store the result in a new object called cdcs. The
code and the resulting display are as follows.
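One common way to sort a data frame in base R is with order( ); this is a sketch of that approach, not necessarily the code originally shown. The small data frame cdc below uses made-up values.

```r
# Made-up example data
cdc <- data.frame(
  age    = c(77, 33, 49, 42),
  weight = c(175, 125, 150, 160),
  gender = c("m", "f", "f", "m")
)

# order( ) returns the row positions that put age in ascending order
cdcs <- cdc[order(cdc$age), ]
print(cdcs)

# For descending order, order(cdc$age, decreasing = TRUE) can be used
```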
For a continuous variable, we can get a summary that includes the minimum and maximum values, the
median, the mean, the first and third quartiles, and the number of missing values, if any, by
calling the function summary( ).
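A brief sketch of summary( ) on a numeric vector with one missing value follows. The weights are made up.

```r
# Made-up weights, including one missing value
weight <- c(175, 125, 150, 160, NA, 190)

weight_summary <- summary(weight)
print(weight_summary)   # includes an NA's entry because a value is missing
```

Calling summary( ) on a whole data frame gives a summary of every column at once.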
We can see that wgroup is still numeric. We want to change the variable to nominal and give it
labels. R refers to a nominal variable as a factor.
Now the wgroup variable has been changed to nominal and labeled.
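The conversion can be sketched with factor( ). The coding of wgroup is not given in the text, so the codes 1 to 3 and the labels below are assumptions made purely for illustration.

```r
# Assumed coding (hypothetical): 1 = light, 2 = medium, 3 = heavy
wgroup <- c(1, 3, 2, 2, 1)
class(wgroup)            # numeric before conversion

wgroup <- factor(wgroup,
                 levels = c(1, 2, 3),
                 labels = c("light", "medium", "heavy"))

class(wgroup)            # factor after conversion
levels(wgroup)           # the labels
table(wgroup)            # frequency of each category
```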
Save object
R does not automatically save our modified object to a file. After first setting our working
directory, we can save the modified table by running the function write.csv( ).
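A minimal sketch of these two steps follows. It uses R's temporary directory as the working directory so that it runs anywhere; in practice you would pass your own folder to setwd( ). The file name cdcs.csv and the data values are illustrative.

```r
# Made-up object to save
cdcs <- data.frame(age = c(33, 42), weight = c(125, 160))

# Set the working directory; setwd( ) returns the previous one
old_wd <- setwd(tempdir())

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(cdcs, "cdcs.csv", row.names = FALSE)

saved_ok <- file.exists("cdcs.csv")   # check that the file was written
reread   <- read.csv("cdcs.csv")      # read it back to verify the content

setwd(old_wd)                         # restore the original working directory
print(reread)
```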
GRAPHICS IN R
Histogram
One way to display the distribution of continuous and discrete variables is to construct a histogram.
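A histogram can be sketched with hist( ). The weights below are simulated, not survey data, and the title and axis label are illustrative.

```r
set.seed(1)                                       # reproducible simulated data
weight <- round(rnorm(200, mean = 170, sd = 30))  # simulated weights in pounds

h <- hist(weight,
          main = "Distribution of weight",
          xlab = "Weight (pounds)")

h$breaks   # bin boundaries chosen by hist( )
h$counts   # number of observations in each bin
```

The breaks argument of hist( ) can be used to suggest a different number of bins.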
Boxplot
Another way to summarize data measured on a continuous or discrete scale is
to construct a boxplot. It is often used in exploratory data analysis to show the
shape of the distribution, its central value, and its variability. It is especially helpful for
indicating whether a distribution is skewed and whether there are any unusual
observations or outliers in the data set.
The dots above the boxplot are outliers. To hide the outliers in the plot and see the distribution
more clearly, we can add the argument outline = FALSE. This only changes the plot; the data
themselves are unchanged.
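The sketch below draws a boxplot with and without the outlier points. The weights are made up, with one deliberately extreme value.

```r
# Made-up weights; 400 is far above the rest
weight <- c(120, 135, 150, 155, 160, 165, 170, 180, 195, 400)

# Default boxplot: outliers are drawn as individual points
b1 <- boxplot(weight, ylab = "Weight (pounds)", main = "With outliers")

# outline = FALSE hides the outlier points in the plot only
b2 <- boxplot(weight, outline = FALSE,
              ylab = "Weight (pounds)", main = "Outliers hidden")

b1$out         # values flagged as outliers
length(weight) # the data still contain all 10 values
```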
Scatter Plot
A scatterplot is a useful summary of a set of bivariate data (two variables), usually drawn
before working out a linear correlation coefficient or fitting a regression line, as part of
exploratory data analysis. It gives a good visual picture of the relationship between the two
variables, and aids the interpretation of the correlation coefficient or regression model.
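As a sketch, plot( ) draws the scatterplot, cor( ) gives the correlation coefficient, and lm( ) with abline( ) adds a fitted regression line. The height and weight values are simulated for illustration.

```r
set.seed(2)                                      # reproducible simulated data
height <- rnorm(100, mean = 67, sd = 4)          # simulated heights in inches
weight <- 2.5 * height + rnorm(100, sd = 15)     # simulated weights in pounds

plot(height, weight,
     xlab = "Height (inches)", ylab = "Weight (pounds)",
     main = "Weight against height")

r   <- cor(height, weight)   # linear correlation coefficient
fit <- lm(weight ~ height)   # simple linear regression
abline(fit)                  # add the fitted line to the plot

print(r)
print(coef(fit))
```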
Bar chart
A bar graph is composed of discrete bars that represent different categories of data. The height of
each bar equals the quantity or frequency within that category. It is useful for displaying
categorical data and is best used to compare values across categories.
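A bar chart of frequencies can be sketched by passing the output of table( ) to barplot( ). The responses below are made up.

```r
# Made-up general health responses
genhlth <- c("good", "excellent", "fair", "very good",
             "good", "poor", "good", "very good")

# Fix the category order so the bars run from excellent to poor
genhlth <- factor(genhlth,
                  levels = c("excellent", "very good", "good", "fair", "poor"))

counts <- table(genhlth)   # frequency of each category
print(counts)

barplot(counts,
        xlab = "General health", ylab = "Frequency",
        main = "General health of respondents")
```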
STATISTICAL PROCEDURE
Test of normality
A normality test provides an objective assessment of whether the data follow a normal distribution.
Suppose we want to perform a normality test for weight by gender. The code is as follows.
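The text does not say which normality test was used. The sketch below assumes the Shapiro-Wilk test, shapiro.test( ), applied separately to each gender. The data are simulated.

```r
set.seed(3)   # reproducible simulated data
cdc <- data.frame(
  weight = c(rnorm(50, mean = 190, sd = 30), rnorm(50, mean = 150, sd = 30)),
  gender = rep(c("m", "f"), each = 50)
)

# Split weight by gender and run the Shapiro-Wilk test on each group
norm_tests <- lapply(split(cdc$weight, cdc$gender), shapiro.test)

print(norm_tests$m)   # result for males
print(norm_tests$f)   # result for females
```

A small p-value suggests a departure from normality. A normal Q-Q plot from qqnorm( ) is a common visual companion to the test.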
Wilcoxon rank sum test
Now let's compare weight between males and females. Let's say the normality assumption for the
weight of the two groups did not hold even after several attempts at transformation. Hence, a
non-parametric test is performed, and the medians are compared instead.
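A sketch using wilcox.test( ) with a formula follows, together with the group medians. The data are simulated and the object names are illustrative.

```r
set.seed(4)   # reproducible simulated data
cdc <- data.frame(
  weight = c(rnorm(40, mean = 190, sd = 30), rnorm(40, mean = 150, sd = 30)),
  gender = rep(c("m", "f"), each = 40)
)

# Median weight in each group
medians <- tapply(cdc$weight, cdc$gender, median)
print(medians)

# Wilcoxon rank sum test of weight between the two gender groups
wt <- wilcox.test(weight ~ gender, data = cdc)
print(wt)
```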
Chi square test
Baseline characteristics (such as demographics, known risk factors or confounders) are often
compared in observational studies (bivariate analysis) before proceeding to multivariable analysis
(e.g. logistic regression). Many of them are binary, such as gender and smoking status.
Suppose we wish to compare the proportion of people who had smoked 100 cigarettes in their lifetime
between males and females.
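This comparison can be sketched by cross-tabulating the two variables and passing the table to chisq.test( ). The counts below are made up. Note that for a 2 x 2 table chisq.test( ) applies Yates' continuity correction by default.

```r
# Made-up data: 60 males and 60 females
cdc <- data.frame(
  gender   = rep(c("m", "f"), times = c(60, 60)),
  smoke100 = c(rep(c(1, 0), times = c(35, 25)),   # males: 35 yes, 25 no
               rep(c(1, 0), times = c(25, 35)))   # females: 25 yes, 35 no
)

# Cross-tabulation of gender (rows) by smoking status (columns)
tab <- table(cdc$gender, cdc$smoke100)
print(tab)

# Row proportions: share of smokers within each gender
print(prop.table(tab, margin = 1))

# Chi-square test of independence
chi <- chisq.test(tab)
print(chi)
```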
Packages provide additional functions beyond the basic functions that come with R.
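As a sketch, a package is installed once with install.packages( ) and then loaded in each session with library( ). The example loads tools, a package that ships with R, so it runs without an internet connection; the package name in the commented line is a placeholder.

```r
# Installing from CRAN needs an internet connection and is done once:
# install.packages("somePackage")   # "somePackage" is a placeholder name

# Load a package for the current session
library(tools)

# A function provided by the tools package: extract a file extension
ext <- file_ext("cdc.csv")
print(ext)
```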
THANK YOU