INFOGRAPHICS
SEARCH... GO
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 1/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Table of Contents
1. What is dplyr?
2. What's special about dplyr?
3. dplyr vs. Base R Functions
4. Important dplyr Functions to remember
5. dplyr Practical Examples
What is dplyr?
The dplyr is a powerful R-package to manipulate, clean
and summarize unstructured data. In short, it makes
data exploration and data manipulation easy and fast
in R.
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 2/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
dplyr Tutorial
install.packages("dplyr")
library(dplyr)
Selecting columns
select() SELECT
(variables)
✖
Download
Love thisthe Dataset
Post?
Share
Spread the Word!
Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 5/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
mydata =
read.csv("C:\\Users\\Deepanshu\\Docu
ments\\sampledata.csv")
sample_n(mydata,3)
✖
Love this Post? Spread the Word!
Example 2 : Selecting Random Fraction
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 6/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
of Rows
sample_frac(mydata,0.1)
x1 = distinct(mydata)
select( ) Function
mydata3 = select(mydata,
starts_with("Y"))
mydata33 = select(mydata, -
starts_with("Y"))
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 9/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Helpers Description
✖
Love this Post? Spread the Word!
New order of variables are displayed below -
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 10/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
rename( ) Function
mydata6 = rename(mydata,
Index1=Index)
Output
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 11/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
lter( ) Function
✖
Love mydata9
this Post?= Spread
lter(mydata6, Index %in%
Share
the Word!
Share
c("A", "C") | Y2002 Tweet
>= 1300000) Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 13/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
summarise( ) Function
✖
Love this Post? Spread the Word!
Example
Share 18 : Summarize
Share selected
Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 14/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
variables
summarise(mydata, Y2015_mean =
mean(Y2015),
Y2015_med=median(Y2015))
Output
summarise_at(mydata, vars(Y2005,
Y2006), funs(n(), mean, median))
Output
summarise_at(mydata, vars(Y2011,
Y2012),
funs(n(), missing = sum(is.na(.)),
mean(., na.rm = TRUE), median(.,na.rm
= TRUE)))
Summarize : Output
set.seed(222)
mydata <-
data.frame(X1=sample(1:100,100),
X2=runif(100))
summarise_at(mydata,vars(X1,X2),
function(x) var(x - mean(x)))
✖
X1 X2
Love this Post? Spread the Word!
1 841.6667 0.08142161
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 16/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
summarise_if(mydata, is.numeric,
funs(n(),mean,median))
Alternative Method :
First, store data for all the numeric variables
numdata =
mydata[sapply(mydata,is.numeric)]
summarise_all(numdata,
funs(n(),mean,median))
✖
Love this Post? Spread the Word!
Share
summarise_all(mydata["Index"],
Share Tweet Subscribe
funs(nlevels(.), nmiss=sum(is.na(.))))
https://www.listendata.com/2016/08/dplyr-tutorial.html 17/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
nlevels nmiss
1 19 0
arrange() function :
Syntax
arrange(data_frame, variable(s)_to_sort)
or
data_frame %>% arrange(variable(s)_to_
sort)
arrange(mydata, desc(Index), Y2011)
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 18/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
dt = sample_n(select(mydata, Index,
State),10) ✖
Love this Post? Spread the Word!
Share
or
Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 19/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Output
group_by() function :
Use : Group data by categorical variable
Syntax :
group_by(data, variables)
or
data %>% group_by(variables)
t = summarise_at(group_by(mydata, ✖
Love Index),
this Post?
Share
Spread the
vars(Y2011,
Share
Word! funs(n(),
Y2012),
Tweet Subscribe
mean(., na.rm = TRUE)))
https://www.listendata.com/2016/08/dplyr-tutorial.html 20/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
do() function :
Use : Compute within groups
Syntax :
do(data_frame,
expressions_to_apply_to_each_group) ✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 21/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
✖
t = mydata %>% select(Index, Y2015)
Love this Post? Spread the Word!
%>%
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 22/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Output
Index Y2015
1 A 1647724
2 C 1330736
3 I 1583516
t = mydata %>%
group_by(Index)%>%
summarise(Mean_2014 = mean(Y2014,
na.rm=TRUE),
Mean_2015 = mean(Y2015,
na.rm=TRUE)) %>%
arrange(desc(Mean_2015))
mutate() function :
mutate(data_frame, expression(s) )
or
data_frame %>% mutate(expression(s))
mydata1 = mutate(mydata,
change=Y2015/Y2014)
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 24/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
mydata11 = mutate_all(mydata,
funs("new" = .* 1000))
Output
Warning messages:
1: In Ops.factor(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L,
5L, 6L, :
‘*’ not meaningful for factors
2: In Ops.factor(1:51, 1000) : ‘*’ not meaningful for
factors
mydata12 = mutate_at(mydata,
vars(Y2008:Y2010),
funs(Rank=min_rank(.)))
Output
mydata13 = mutate_at(mydata,
vars(Y2008:Y2010),
funs(Rank=min_rank(desc(.))))
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 26/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
join() function :
inner_join(x, y, by = )
left_join(x, y, by = )
right_join(x, y, by = )
full_join(x, y, by = )
semi_join(x, y, by = )
anti_join(x, y, by = )
c =rnorm(5),
d =letters[2:6])
✖
Love this Post? Spread the Word!
Share DataVertically Tweet Subscribe
Combine Share
https://www.listendata.com/2016/08/dplyr-tutorial.html 29/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
union(x, y)
Rows that appear in either or both x and y.
setdiff(x, y)
Rows that appear in x but not y.
✖
x=data.frame(ID = 1:6, ID1= 1:6)
Love this Post? Spread the Word!
y=data.frame(ID
Share = 1:6, ID1
Share = 1:6)
Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 30/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
union(x,y)
union_all(x,y)
Syntax :
✖
If aLove
value is less than 5, add
theit Word!
to 1 and if it is greater
or
than
this
Share
Post?
equal to 5,
Spread
addShare Tweet
it to 2. Otherwise 0. Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 31/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
df =data.frame(x = c(1,5,6,NA))
df %>% mutate(newvar=if_else(x<5, x+1,
x+2,0))
Output
Nested IF ELSE
Output
x flag
1 1 I am one
2 2 I am two
3 3 I am three
4 4 Others
5 5 Others
✖
6 NA I am missing
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 32/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Important Point
df = mydata %>%
rowwise() %>% mutate(Max= ✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 33/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
max(Y2012,Y2013,Y2014,Y2015)) %>%
select(Y2012:Y2015,Max)
Input Datasets
xy = bind_rows(df1,df2)
xy = rbind(df1,df2)
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 34/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
xy = bind_cols(x,y)
or
xy = cbind(x,y)
cbind Output
probs=0.75),
Pecentile_99=quantile(Y2015,
probs=0.99))
x= data.frame(N= 1:10)
x = mutate(x, pos = ntile(x$N,5))
length(unique(mtcars$cyl))
Result : 3
coef(summary(.$mod)))
)
✖
Love this Post? Spread the Word!
Example 44 : Number of levels in factor
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 37/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
variables
Like select_if() function, summarise_if() function lets
you to summarise only for variables where logical
condition holds.
summarise_if(mydata, is.factor,
funs(nlevels(.)))
variable State.
mydata11 = mutate_if(mydata,
is.numeric, funs("new" = .* 1000))
Endnotes ✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 38/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
About Author:
Deepanshu founded ListenData with a simple objective - Make analytics
easy to understand and follow. He has over 8 years of experience in data
science. During his tenure, he has worked with global clients in various
domains like Banking, Insurance, Telecom and Human Resource.
While I love having friends who agree, I only learn from those who don't
Let's Get Connected: Email | LinkedIn
Related Posts:
Add JavaScript and CSS in Shiny
Install and Load Multiple R Packages
R Dplyr : Data Manipulation (50 Examples)
Create Infographics with R
How to build login page in R Shiny App
Create Animation in R : Learn by Examples
Reply
Replies
Reply
Reply
thx
Reply
Reply
Replies
Very helpfull.
Reply
Reply
✖
Love this Post? Spread the Word!
Share
Replies Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 40/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
Thank you, this indeed very helpful and precise. Great Job!
Reply
Replies
Reply
Reply
Replies
Love this Post? your another
Spread tutorial - data.table. It's quite
the Word!
interesting to learn R from your blog posts.
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 41/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
100
===============
The fix is change nlevels() to length(unique(.)) as below
summarise_all(dt["Index"], funs(length(unique(.)),
sum(is.na(.))))
# A tibble: 1 × 2
length sum
1 19 0
Reply
Replies
Reply
Example #21
Reply
✖
Love this
Share
Post? Spread the Word!
Replies
Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 42/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
Reply
Thanks
Reply
Reply
Reply
Reply
✖
Love this Post? Spread the Word!
Share
Anonymous 30 August 2017 at 15:19
Share Tweet Subscribe
Example 39 is wrong.
https://www.listendata.com/2016/08/dplyr-tutorial.html 43/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
Replies
Reply
Thanks!
Reply
Reply
Replies
Reply
Amazing!
Reply
Reply
✖
Love this Post? Spread the Word!
Share February
Anonymous 24 Tweet
Share 2018 at 21:38 Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 44/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
Replies
Reply
Great tutorial.
Reply
Reply
Reply
Replies
Cheers!
Reply
Reply
✖
Love this Post? Spread the Word!
Share
Rishika Malik 13 July 2018 at 03:28
Share Tweet Subscribe
love the detailed yet simple explanations.
https://www.listendata.com/2016/08/dplyr-tutorial.html 45/47
7/15/2019 R Dplyr : Data Manipulation (50 Examples)
Reply
Please help..
do(group_by(filter(data,Index%in%c("C","A","I"))),head(.,2))
Reply
Reply
Reply
Reply
Replies
Reply
Publish Preview
← PREV NEXT →
✖
Love this Post? Spread the Word!
Share Share Tweet Subscribe
https://www.listendata.com/2016/08/dplyr-tutorial.html 47/47