Console Help
...you won't be able to make a delicious cake.
Numeric
Integer
Character
Logical (TRUE / FALSE)
Complex (rarely used)
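A minimal sketch of the five basic types, using class() to report the type of a single value. The example values are arbitrary; the L suffix is how R marks an integer literal.

```r
# class() reports the basic data type of each example value
types <- c(
  numeric   = class(3.14),    # "numeric" (double precision)
  integer   = class(5L),      # "integer": the L suffix makes an integer
  character = class("cake"),  # "character"
  logical   = class(TRUE),    # "logical"
  complex   = class(2 + 3i)   # "complex" (rarely used)
)
print(types)
```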
Equality is tested with == (double equals sign)
Inf: infinity
NaN: not a number
NA: not available, i.e. a missing value
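A short sketch of where Inf, NaN and NA come from and how they behave. The vector x is a made-up example. The last line passes na.rm = TRUE as a second argument so that mean() drops the missing value.

```r
pos_inf <- 1 / 0        # Inf
not_num <- 0 / 0        # NaN
x <- c(4, NA, 7)        # hypothetical data with one missing value
missing <- is.na(x)     # FALSE TRUE FALSE
mean_with_na <- mean(x)                 # NA: one value is missing
mean_without_na <- mean(x, na.rm = TRUE) # 5.5: the NA is dropped first
print(c(pos_inf, not_num, mean_with_na, mean_without_na))
```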
<- assigns a value to a name
Put data points of the same type into a vector
Element-wise computation
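A minimal sketch of assignment, vector creation with c(), and element-wise computation. The name heights and its values are hypothetical.

```r
# <- assigns; c() combines values of one type into a vector
heights <- c(160, 172, 181)

doubled <- heights * 2             # 320 344 362: each element is doubled
added   <- heights + c(1, 2, 3)    # 161 174 184: elements are added pairwise
is_172  <- heights == 172          # FALSE TRUE FALSE: == compares each element

# Mixing types coerces everything to one type (here, character)
mixed <- c(1, "a", TRUE)           # "1" "a" "TRUE"
print(doubled); print(added); print(is_172); print(mixed)
```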
Take the value(s) you want from a vector
Subset by position
Subset by logic
Subset by name
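A sketch of the three subsetting styles on one named vector. The names and scores are made up for illustration.

```r
scores <- c(alice = 90, bob = 72, cara = 85)   # hypothetical named vector

by_position <- scores[2]              # bob: 72
two_items   <- scores[c(1, 3)]        # alice: 90, cara: 85
by_logic    <- scores[scores > 80]    # keeps the elements where the test is TRUE
by_name     <- scores["cara"]         # cara: 85
print(by_position); print(two_items); print(by_logic); print(by_name)
```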
R is an environment in which statistical
methods are performed
R uses element-wise, vectorized operations
Basic data types in R and how to subset them
Input -> Function -> Output
Second argument
Use args(function_name) to reveal the arguments of that function
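A small sketch using sd() as the example function. args() prints the signature, and formals() returns the same argument list as data. The last line passes the second argument, na.rm.

```r
args(sd)                              # prints: function (x, na.rm = FALSE) followed by NULL
sd_args <- names(formals(sd))         # "x" "na.rm"
# Using the second argument: drop the NA before computing the standard deviation
sd_value <- sd(c(1, NA, 3), na.rm = TRUE)   # sqrt(2), about 1.414
print(sd_args); print(sd_value)
```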
Popular functions
Summarising x: mean(x), sum(x), sd(x), var(x), median(x), max(x), min(x), summary(x)
Checking and converting the type of x: class(x), is.numeric(x), is.integer(x), is.character(x), is.factor(x), is.logical(x), as.numeric(x), as.factor(x)
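A quick sketch that runs several of these functions on a small made-up vector, so their outputs can be checked by hand.

```r
x <- c(2, 4, 4, 5, 7, 8)     # hypothetical data
mean(x)       # 5
sum(x)        # 30
median(x)     # 4.5: the average of the two middle values, 4 and 5
var(x)        # 4.8: sum of squared deviations (24) divided by n - 1 (5)
sd(x)         # square root of the variance
max(x); min(x)
summary(x)    # minimum, quartiles, mean and maximum in one call

class(x)      # "numeric"
is.integer(x) # FALSE: plain numbers are stored as doubles, not integers
f <- as.factor(c("auto", "manual", "auto"))
is.factor(f)  # TRUE
levels(f)     # "auto" "manual"
```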
Your function name <- function() (this one takes two arguments)
Example: 2016 - 1988 = 28
Try using paste() inside your function, and also if ... else!
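One possible answer, as a sketch. The function name calc_age, its two arguments and the adult cut-off of 18 are hypothetical choices, not part of the exercise. It subtracts the two years, as in the 2016 - 1988 example, then builds a message with paste() inside an if ... else.

```r
# calc_age is a hypothetical example function taking two arguments
calc_age <- function(current_year, birth_year) {
  age <- current_year - birth_year
  if (age >= 18) {                       # 18 is an assumed cut-off
    paste("You are", age, "years old: an adult.")
  } else {
    paste("You are", age, "years old: not yet an adult.")
  }
}

calc_age(2016, 1988)   # "You are 28 years old: an adult."
```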
Well done guys!!
mutate
Your turn to answer these three questions using mtcars.
Hints: mtcars$mpg extracts the mpg column; which.min() returns the position of the smallest value.
Answer
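An illustration of the two hints together on the built-in mtcars data. It is not necessarily one of the three questions. which.min() returns only the first position when values tie: Cadillac Fleetwood and Lincoln Continental both have the lowest mpg (10.4).

```r
mpg <- mtcars$mpg                    # the mpg column as a numeric vector
lowest_pos <- which.min(mpg)         # 15: position of the (first) lowest mpg
lowest_car <- rownames(mtcars)[lowest_pos]
print(lowest_car)                    # "Cadillac Fleetwood"
```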
A good data analyst knows
how to treat his/her data
Data manipulation
dplyr: a widely used package for data manipulation, created by Hadley Wickham
Object (data frame) + Verb
select() + rename()
filter()
mutate()
arrange()
summarise()
select
select(dataframe, column1, column2, ...)
Here you have selected columns 1:5 and 8.
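A sketch of the slide's selection on mtcars, assuming dplyr is installed. It selects by position and then by name, and also shows rename(), whose syntax is new_name = old_name. The name miles_per_gallon is just an example.

```r
library(dplyr)

picked <- select(mtcars, 1:5, 8)          # columns 1 to 5 plus column 8
names(picked)                             # "mpg" "cyl" "disp" "hp" "drat" "vs"
picked_by_name <- select(mtcars, mpg, cyl, disp, hp, drat, vs)

renamed <- rename(mtcars, miles_per_gallon = mpg)   # new_name = old_name
```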
select
select(dataframe, starts_with("prefix"))
starts_with()
ends_with()
contains()
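The three helpers applied to mtcars column names, as a sketch assuming dplyr is installed. The search strings "d", "p" and "ar" are arbitrary examples.

```r
library(dplyr)

starts_d <- names(select(mtcars, starts_with("d")))  # "disp" "drat"
ends_p   <- names(select(mtcars, ends_with("p")))    # "disp" "hp"
has_ar   <- names(select(mtcars, contains("ar")))    # "gear" "carb"
print(starts_d); print(ends_p); print(has_ar)
```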
filter() makes it very easy to select the rows that match your criteria
filter(mtcars, am == 1) keeps only the manual gear cars
filter(dataframe, criterion1, criterion2) keeps rows that meet all of the criteria
We want to filter for cyl between 4 and 6
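A sketch of both filters on mtcars, assuming dplyr is installed. The cyl range is written two ways: as two comma-separated conditions, and with dplyr's between(), which includes both endpoints.

```r
library(dplyr)

manual <- filter(mtcars, am == 1)               # manual gear cars only
nrow(manual)                                    # 13

# Comma-separated conditions must all be TRUE (logical AND)
cyl_4_to_6 <- filter(mtcars, cyl >= 4, cyl <= 6)
nrow(cyl_4_to_6)                                # 18 (11 four-cylinder + 7 six-cylinder)

# between() expresses the same inclusive range
cyl_between <- filter(mtcars, between(cyl, 4, 6))
```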
mutate
mutate(dataframe, new_column = ...)
Example: GC = gear * carb
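The GC example as a sketch, assuming dplyr is installed. mutate() keeps all existing columns and appends the new one, so mtcars goes from 11 to 12 columns.

```r
library(dplyr)

with_gc <- mutate(mtcars, GC = gear * carb)   # new column GC = gear times carb
head(with_gc$GC, 3)                           # 16 16 4
ncol(with_gc)                                 # 12: the original 11 plus GC
```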
Alternatively, you can do it in multiple lines, using <- to assign each step to a new data frame
With the pipe, we just take the mtcars data %>% select the columns we want %>% then
filter for weight (wt) higher than 4.00 %>% and mutate a new column GC
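A sketch comparing the piped version with the multi-line version, assuming dplyr is installed (it provides %>%). The choice of columns wt, gear and carb is an assumption, because the slide does not say which columns were selected. Both versions produce the same data frame.

```r
library(dplyr)

# Piped version: each result flows into the next verb
piped <- mtcars %>%
  select(wt, gear, carb) %>%     # assumed columns of interest
  filter(wt > 4.00) %>%
  mutate(GC = gear * carb)

# Multi-line version: assign each intermediate data frame with <-
step1    <- select(mtcars, wt, gear, carb)
step2    <- filter(step1, wt > 4.00)
stepwise <- mutate(step2, GC = gear * carb)

nrow(piped)   # 4 cars weigh more than 4.00 (thousand lbs)
```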
arrange
arrange(dataframe, column you want to sort by); wrap the column in desc() to sort in descending order
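A sketch on mtcars, assuming dplyr is installed. Sorting is ascending by default. desc() reverses the order, and extra columns break ties.

```r
library(dplyr)

ascending  <- arrange(mtcars, mpg)         # lowest mpg first (10.4)
descending <- arrange(mtcars, desc(mpg))   # highest mpg first (33.9)
by_cyl     <- arrange(mtcars, cyl, desc(mpg))  # by cyl, then by mpg within each cyl
head(descending$mpg, 1)
```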
group_by
Together with summarise(), group_by() computes summaries per group
and can give you much better
insights about your data
Automatic or manual gear car: which type has the higher average weight?
group_by(am) + summarise()
0 = automatic gear, 1 = manual gear
Manual gear cars seem to be lighter!
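A sketch of the grouped summary behind this comparison, assuming dplyr is installed. n() counts the cars in each group. The names avg_wt and n are just the column names chosen here.

```r
library(dplyr)

avg_weight <- mtcars %>%
  group_by(am) %>%                        # 0 = automatic, 1 = manual
  summarise(avg_wt = mean(wt), n = n())   # mean weight and car count per group
print(avg_weight)
```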
Quiz
Install the new package hflights and load its data set into R.
Your boss wants to know the average distance by carrier, ordered from high to low.
mark property
This is what you will be able to create by the end of this course.
map
set
Map vs. Set
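= vs. :=
In ggvis, = maps a property to a variable (e.g. fill = ~cyl), while := sets it to a fixed value (e.g. fill := "red").
Understanding = vs. := is crucial for ggvis users.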
layer_points() can draw these shapes:
circle (default)
square
cross
diamond
triangle-up
triangle-down
A histogram is a very popular graph for seeing the distribution
of a continuous variable, and you
can easily change the bin width in ggvis
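To show what changing the bin width does to the counts, here is a sketch using base R's hist() with plot = FALSE rather than ggvis. The break points (widths 5 and 10) are arbitrary choices for mtcars$mpg.

```r
# Same variable, two bin widths; plot = FALSE returns the bin counts only
narrow <- hist(mtcars$mpg, breaks = seq(10, 35, by = 5),  plot = FALSE)
wide   <- hist(mtcars$mpg, breaks = seq(10, 40, by = 10), plot = FALSE)
narrow$counts   # 6 12 8 2 4: more detail, fewer cars per bin
wide$counts     # 18 10 4: smoother, but hides the shape within each bin
```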
factor() is used to change the class of a variable
to factor (categorical)
~interaction(cyl, am) creates all combinations of the two variables:
3 cyl values x 2 am values
So we have 3 x 2 = 6 groups
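A base R sketch of the same idea without ggvis. factor() turns the numeric cyl and am columns into categories. interaction() then builds one group per combination. The labels "automatic" and "manual" are added here for readability.

```r
cyl_f <- factor(mtcars$cyl)                                  # levels "4" "6" "8"
am_f  <- factor(mtcars$am, labels = c("automatic", "manual")) # 0 -> automatic, 1 -> manual
class(mtcars$cyl)   # "numeric"
class(cyl_f)        # "factor"

groups <- interaction(cyl_f, am_f)   # every cyl x am combination
nlevels(groups)     # 6 = 3 cyl levels x 2 am levels
table(groups)       # number of cars in each of the 6 groups
```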
Try the iris data set: use layer_densities() to plot Sepal.Length grouped by Species
titles
Simple to use ;)
add_legend
Nice one !!
customize
hp is numeric, so we use
scale_numeric() to set colors :D
Model performance
Data selection
Validation
Model selection
Supervised learning: regression, classification
Unsupervised learning: clustering
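A base R sketch of the supervised/unsupervised distinction. It uses lm() and kmeans() rather than caret. In regression the known target (mpg) guides the model. In clustering, kmeans() groups iris flowers without seeing Species, and the labels are only compared afterwards. The formula mpg ~ wt, the seed and the choice of 3 clusters are arbitrary illustration choices.

```r
# Supervised (regression): predict the known target mpg from weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # negative slope on wt: heavier cars get fewer miles per gallon

# Unsupervised (clustering): group flowers using only the 4 measurements
set.seed(42)
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)   # compare clusters with the true labels afterwards
```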
caret
(for supervised learning)
first ML
https://www.analyticsvidhya.com/
https://www.r-bloggers.com/
http://stackoverflow.com/
Now you have completed R Basics.
Hope you enjoyed the class!