Stata Cheat Sheets

Data Processing
with Stata 14.1 Cheat Sheet

For more info see Statas reference manual (stata.com)
Basic Syntax
All Stata functions have the same format (syntax):
[byvarlist1:]
command
[varlist2]
apply the
command across
each unique
combination of
variables in
varlist1
Useful Shortcuts
keyboard buttons
F2
describe data
Ctrl + 8
open the data editor
clear
delete data in memory
Ctrl + 9
open a new .do file
Ctrl + D
highlight text in .do file,
then ctrl + d executes it
in the command line
Arithmetic
PgUp
+ combine (strings)
Tab
cls
scroll through previous commands
add (numbers)
autocompletes variable name after typing part
subtract
clear the console (where results are displayed)
* multiply
Set up
pwd
print current (working) directory
cd "C:\Program Files (x86)\Stata13"
change working directory
dir
display filenames in working directory
fs *.dta
List all Stata data in working directory underlined parts
are shortcuts
capture log close
use "capture"
close the log on any existing do files or "cap"
log using "myDoFile.txt", replace
create a new log file to record your work and results
search mdesc
packages contain
find the package mdesc to install extra commands that
expand Statas toolkit
ssc install mdesc
install the package mdesc; needs to be done once
Import Data
sysuse auto, clear
for many examples, we
load system data (Auto data) use the auto dataset.
use "yourStataFile.dta", clear
load a dataset from the current directory frequently used
commands are
import excel "yourSpreadsheet.xlsx", /*
highlighted in yellow
*/ sheet("Sheet1") cellrange(A2:H11) firstrow
import an Excel spreadsheet
import delimited"yourFile.csv", /*
*/ rowrange(2:11) colrange(1:8) varnames(2)
import a .csv file
webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"
set web-based directory and load data from the web
[=exp]
[ifexp]
[inrange]
column to save output as condition: only apply to

apply
a new variable apply the function specific rows
command to
if something is true
bysort rep78 : summarize
price
[weight]
[usingfilename]
[,options]
apply
weights
pull data from a file

(if not loaded)
special options
for command
In this example, we want a detailed summary

with stats like kurtosis, plus mean and median
if foreign == 0 & price <= 9000, detail
To find out more about any command like what options it takes type helpcommand
AT COMMAND PROMPT
PgDn
function: what are

you going to do
to varlists?
/ divide
^ raise to a power
Basic Data Operations

Logic
&
and
! or ~ not
| or
== tests if something is equal

= assigns a value to a variable
== equal
!= not
or
~= equal
if foreign != 1 & price >= 10000

make
Chevy Colt
Buick Riviera
Honda Civic
Volvo 260
foreign
0
0
1
1
price
3,984
10,372
4,499
11,995
< less than

<= less than or equal to
> greater than
>= greater or equal to
if foreign != 1 | price >= 10000
make
Chevy Colt
Buick Riviera
Honda Civic
Volvo 260
foreign
0
0
1
1
price
3,984
10,372
4,499
11,995
Explore Data
VIEW DATA ORGANIZATION
describe make price

display variable type, format,
and any value/variable labels
count
count if price > 5000
number of rows (observations)
Can be combined with logic
ds, has(type string)
lookfor "in."
search for variable types,
variable name, or variable label
isid mpg
check if mpg uniquely
identifies the data
SEE DATA DISTRIBUTION
codebook make price

overview of variable type, stats,
number of missing/unique values
summarize make price mpg
print summary statistics
(mean, stdev, min, max)
for variables
inspect mpg
show histogram of data,
number of missing or zero
observations
histogram mpg, frequency
plot a histogram of the
distribution of a variable
BROWSE OBSERVATIONS WITHIN THE DATA
Missing values are treated as the largest
browse or
Ctrl + 8 positive number. To exclude missing
values, use the !missing(varname) syntax
open the data editor
list make price if price > 10000 & !missing(price)
clist ... (compact form)
list the make and price for observations with price > $10,000
display price[4]
display the 4th observation in price; only works on single values
gsort price mpg (ascending)
gsort price mpg (descending)
sort in order, first by price then miles per gallon
duplicates report
assert price!=.
finds all duplicate values in each variable
verify truth of claim
levelsof rep78
display the unique values for rep78
Tim Essam (tessam@usaid.gov) Laura Hughes (lhughes@usaid.gov)

follow us @StataRGIS and @flaneuseks
inspired by RStudios awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
Change Data Types
Stata has 6 data types, and data can also be missing:

no data
true/false
numbers
words
missing
string int long float double
byte
To convert between numbers & strings:
gen foreignString = string(foreign)
"1"
1 tostring foreign, gen(foreignString)
"1"
decode foreign , gen(foreignString)
"foreign"
1
gen foreignNumeric = real(foreignString)

"1"
"1"
destring foreignString, gen(foreignNumeric)
encode foreignString, gen(foreignNumeric) "foreign"
recast double mpg

generic way to convert between types
Summarize Data
include missing values create binary variable for every rep78
value in a new variable, repairRecord
tabulate rep78, mi gen(repairRecord)

one-way table: number of rows with each value of rep78
tabulate rep78 foreign, mi
two-way table: cross-tabulate number of observations
for each combination of rep78 and foreign
bysort rep78: tabulate foreign
for each value of rep78, apply the command tabulate foreign
tabstat price weight mpg, by(foreign) stat(mean sd n)
create compact table of summary statistics
displays stats
formats numbers for all data
table foreign, contents(mean price sd price) f(%9.2fc) row

create a flexible table of summary statistics
collapse (mean) price (max) mpg, by(foreign) replaces data
calculate mean price & max mpg by car type (foreign)
Create New Variables
generate mpgSq = mpg^2 gen byte lowPr = price < 4000

create a new variable. Useful also for creating binary
variables based on a condition (generate byte)
generate id = _n
bysort rep78: gen repairIdx = _n
_n creates a running index of observations in a group
generate totRows = _N
bysort rep78: gen repairTot = _N
_N creates a running count of the total observations per group
pctile mpgQuartile = mpg, nq = 4
create quartiles of the mpg data
see help egen
egen meanPrice = mean(price), by(foreign)
calculate mean price for each group in foreign for more options
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated January 2016

CC BY 4.0
Data Transformation

Select Parts of Data (Subsetting)

SELECT SPECIFIC COLUMNS
drop make
remove the 'make' variable
keep make price
opposite of drop; keep only variables 'make' and 'price'
FILTER SPECIFIC ROWS
drop if mpg < 20

drop in 1/4
drop observations based on a condition (left)
or rows 1-4 (right)
keep in 1/30
opposite of drop; keep only rows 1-30
keep if inrange(price, 5000, 10000)
keep values of price between $5,000 $10,000 (inclusive)
keep if inlist(make, "Honda Accord", "Honda Civic", "Subaru")
keep the specified values of make
sample 25
sample 25% of the observations in the dataset
(use set seed # command for reproducible sampling)
webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data

webuse "coffeeMaize.dta"
load demo dataset
MELT DATA (WIDE LONG)
reshape variables starting

with coffee and maize
unique id create new variable which captures

variable (key) the info in the column names
reshape long coffee@ maize@, i(country) j(year)

new variable
convert a wide dataset to long
TIDY DATASETS have
WIDE
LONG (TIDY) each observation
country
year coffee maize
melt
coffee maize maize
in its own row and
country coffee
2011
2012
2011 2012
Malawi
2011
each variable in its
Malawi
2012
Malawi
Rwanda
2011
own column.
Rwanda
Rwanda
Uganda
Uganda
cast
Uganda
CAST DATA (LONG WIDE)
2012
2011
2012
what will be create new variables

unique id with the year added
variable (key) to the column name
create new variables named

coffee2011, maize2012...
reshape wide coffee maize, i(country) j(year)

convert a long dataset to wide
When datasets are

tidy, they have a
consistent,
standard format
that is easier to
manipulate and
analyze.
xpose, clear varname

transpose rows and columns of data, clearing the data and saving
old column names as a new variable called "_varname"
Replace Parts of Data

rename (rep78 foreign) (repairRecord carType)
rename one or multiple variables
REPLACE MISSING VALUES
useful for cleaning survey datasets

mvdecode _all, mv(9999)
replace the number 9999 with missing value in all variables
mvencode _all, mv(9999)
useful for exporting data
replace missing values with the number 9999 for all variables
Label Data
Value labels map string descriptions to numbers. They allow the
underlying data to be numeric (making logical tests simpler)
while also connecting the values to human-understandable text.
label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel
define a label and apply it the values in foreign
note: data note here
place note in dataset

id
blue
webuse coffeeMaize2.dta, clear

save coffeeMaize2.dta, replace load demo data
webuse coffeeMaize.dta, clear
pink
id
CHANGE ROW VALUES
replace price = 5000 if price < 5000

replace all values of price that are less than $5,000 with 5000
recode price (0 / 5000 = 5000)
change all prices less than 5000 to be $5,000
recode foreign (0 = 2 "US")(1 = 1 "Not US"), gen(foreign2)
change the values and value labels then store in a new
variable, foreign2
blue
pink
blue
pink
should
contain
the same
variables
(columns)
append using "coffeeMaize2.dta", gen(filenum)

add observations from "coffeeMaize2.dta" to
current data and create variable "filenum" to
track the origin of each observation
MERGING TWO DATASETS TOGETHER

must contain a
common variable
id blue pink (id) id brown
ONE-TO-ONE
id
blue
pink brown _merge

3
3
3
MANY-TO-ONE
id
blue
pink
id brown
id
blue
pink brown _merge
_merge code
1 row only
(master) in ind2
2 row only
(using) in hh2
3 row in
(match) both
3
1
3
3
1
2
webuse ind_age.dta, clear

save ind_age.dta, replace
webuse ind_ag.dta, clear
merge 1:1 id using "ind_age.dta"
one-to-one merge of "ind_age.dta"
into the loaded dataset and create
variable "_merge" to track the origin
webuse hh2.dta, clear
save hh2.dta, replace
webuse ind2.dta, clear
merge m:1 hid using "hh2.dta"
many-to-one merge of "hh2.dta"
into the loaded dataset and create
variable "_merge" to track the origin
FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID

reclink match records from different data sets using probabilistic matching ssc install reclink
jarowinkler create distance measure for similarity between two strings ssc install jarowinkler
FIND MATCHING STRINGS
display strmatch("123.89", "1??.?9")

return true (1) or false (0) if string matches pattern
display substr("Stata", 3, 5)
return the string located between characters 3-5
list make if regexm(make, "[0-9]")
list observations where make matches the regular
expression (here, records that contain a number)
list if regexm(make, "(Cad.|Chev.|Datsun)")
return all observations where make contains
"Cad.", "Chev." or "Datsun"
compare the given list against the first word in make
TRANSFORM STRINGS
Combine Data
id
GET STRING PROPERTIES
display length("This string has 29 characters")

return the length of the string
* user-defined package
charlist make
display the set of unique characters within a string
display strpos("Stata", "a")
return the position in Stata where a is first found
list if inlist(word(make, 1), "Cad.", "Chev.", "Datsun")

return all observations where the first word of the
make variable contains the listed words
ADDING (APPENDING) NEW DATA
CHANGE COLUMN NAMES
label list
list all labels within the dataset
Manipulate Strings
Reshape Data
display regexr("My string", "My", "Your")

replace string1 ("My") with string2 ("Your")
replace make = subinstr(make, "Cad.", "Cadillac", 1)
replace first occurrence of "Cad." with Cadillac
in the make variable
display stritrim(" Too much Space")
replace consecutive spaces with a single space
display trim(" leading / trailing spaces ")
remove extra spaces before and after a string
display strlower("STATA should not be ALL-CAPS")
change string case; see also strupper, strproper
display strtoname("1Var name")
convert string to Stata-compatible variable name
display real("100")
convert string to a numeric or missing value
Save & Export Data
compress
compress data in memory
Stata 12-compatible file
save "myData.dta", replace
saveold "myData.dta", replace version(12)
save data in Stata format, replacing the data if
a file with same name exists
export excel "myData.xls", /*
*/ firstrow(variables) replace
export data as an Excel file (.xls) with the
variable names as the first row
export delimited "myData.csv", delimiter(",") replace
export data as a comma-delimited file (.csv)
updated March 2016

CC BY 4.0
Data Visualization
BASIC PLOT SYNTAX:
y3
main plot-specific options;

see help for complete set
graph bar (count), over(foreign, gap(*0.5)) intensity(*0.5)

graph hbar draws horizontal bar charts
graph bar (percent), over(rep78) over(foreign)
23
20
graph hbar ...
(asis) (percent) (count) over(<variable>, <options: gap(*#)

relabel descending reverse>) cw missing nofill allcategories
percentages stack bargap(#) intensity(*#) yalternate xalternate
DISCRETE X, CONTINUOUS Y
graph bar (median) price, over(foreign)
twoway pcspike wage68 ttl_exp68 wage88 ttl_exp88

(sysuse nlswide1)
Parallel coordinates plot
twoway scatter mpg weight, jitter(7)
twoway pccapsym wage68 ttl_exp68 wage88 ttl_exp88

Slope/bump plot
(sysuse nlswide1)
vertical, horizontal
half jitter(#) jitterseed(#)

diagonal [aweights(<variable>)]
17
10
vertical horizontal headlabel
THREE VARIABLES
twoway contour mpg price weight, level(20) crule(intensity)
twoway scatter mpg weight, mlabel(mpg)

scatter plot with labelled values
3D contour plot
twoway connected mpg price, sort(price)
regress price mpg trunk weight length turn, nocons

matrix regmat = e(V)
ssc install plotmatrix
plotmatrix, mat(regmat) color(green)
ccuts(#s) levels(#) minmax crule(hue | chue| intensity)

scolor(<color>) ecolor (<color>) ccolors(<colorlist>) heatmap
interp(thinplatespline | shepard | none)
jitter(#) jitterseed(#) sort cmissing(yes | no)

connect(<options>) [aweight(<variable>)]
scatter plot with connected lines and symbols

jitter(#) jitterseed(#) sort
see also
connect(<options>) cmissing(yes | no)
graph hbar ...
bar plot (asis) (percent) (count) (stat: mean median sum min max ...)
over(<variable>, <options: gap(*#) relabel descending reverse

sort(<variable>)>) cw missing nofill allcategories percentages
stack bargap(#) intensity(*#) yalternate xalternate
line
twoway area mpg price, sort(price)

line plot with area shading
heatmap
dot plot (asis) (percent) (count) (stat: mean median sum min max ...)
over(<variable>, <options: gap(*#) relabel descending reverse

sort(<variable>)>) cw missing nofill allcategories percentages
linegap(#) marker(#, <options>) linetype(dot | line | rectangle)
dots(<options>) lines(<options>) rectangles(<options>) rwidth
graph hbox mpg, over(rep78, descending) by(foreign) missing

graph box draws vertical boxplots
box plot
over(<variable>, <options: total gap(*#) relabel descending reverse

sort(<variable>)>) missing allcategories intensity(*#) boxgap(#)
medtype(line | line | marker) medline(<options>) medmarker(<options>)
ssc install vioplot
violin plot over(<variable>, <options: total missing>)>) nofill
vertical horizontal obs kernel(<options>) bwidth(#)

barwidth(#) dscale(#) ygap(#) ogap(#) density(<options>)
bar(<options>) median(<options>) obsopts(<options>)
JUXTAPOSE (FACET)
Plot Placement
twoway scatter mpg price, by(foreign, norescale)
SUPERIMPOSE
total missing colfirst rows(#) cols(#) holes(<numlist>)

compact [no]edgelabel [no]rescale [no]yrescal [no]xrescale
[no]iyaxes [no]ixaxes [no]iytick [no]ixtick [no]iylabel
[no]ixlabel [no]iytitle [no]ixtitle imargin(<options>)
twoway mband mpg weight || scatter mpg weight
sort cmissing(yes | no) vertical, horizontal

base(#)
plot median of the y values

bands(#)
scatter y3 y2 y1 x, marker(i o i) mlabel(var3 var2 var1)

plot several y values for a single x value
graph twoway scatter mpg price in 27/74 || scatter mpg price /*

*/ if mpg < 15 & price > 12000 in 27/74, mlabel(make) m(i)
Laura Hughes (lhughes@usaid.gov) Tim Essam (tessam@usaid.gov)

follow us @flaneuseks and @StataRGIS
ssc install binscatter
binscatter weight mpg, line(none)
plot a single value (mean or median) for each x value
vertical, horizontal base(#) barwidth(#)
medians nquantiles(#) discrete controls(<variables>)

linetype(lfit | qfit | connect | none) aweight[<variable>]
FITTING RESULTS
twoway dot mpg rep78
twoway lfitci mpg weight || scatter mpg weight
dot plot
vertical, horizontal base(#) ndots(#)

dcolor(<color>) dfcolor(<color>) dlcolor(<color>)
dsize(<markersize>) dsymbol(<marker type>)
dlwidth(<strokesize>) dotextend(yes | no)
calculate and plot linear fit to data with confidence intervals
level(#) stdp stdf nofit fitplot(<plottype>) ciplot(<plottype>)

range(# #) n(#) atobs estopts(<options>) predopts(<options>)
twoway lowess mpg weight || scatter mpg weight
twoway dropline mpg price in 1/5
dropped line plot
calculate and plot lowess smoothing
twoway rcapsym length headroom price
calculate and plot quadriatic fit to data with confidence intervals
bwidth(#) mean noweight logit adjust
vertical, horizontal base(#)
twoway qfitci mpg weight, alwidth(none) || scatter mpg weight

level(#) stdp stdf nofit fitplot(<plottype>) ciplot(<plottype>)
range(# #) n(#) atobs estopts(<options>) predopts(<options>)
range plot (y1 y2) with capped lines

vertical horizontal
see also rcap
REGRESSION RESULTS
twoway rarea length headroom price, sort

vertical horizontal sort
cmissing(yes | no)
combine 2+ saved graphs into a single plot
combine twoway plots using ||
twoway bar price rep78
bar plot
range plot (y1 y2) with area shading
graph combine plot1.gph plot2.gph...
mat(<variable) split(<options>) color(<color>) freq
SUMMARY PLOTS
graph dot (mean) length headroom, over(foreign) m(1, ms(S))
vioplot price, over(foreign)
save
scatter plot of each combination of variables
jitter(#) jitterseed(#) sort cmissing(yes | no)

connect(<options>) [aweight(<variable>)]
(asis) (percent) (count) over(<variable>, <options: gap(*#)

relabel descending reverse>) cw missing nofill allcategories
percentages stack bargap(#) intensity(*#) yalternate xalternate
plot size
scatter plot
bar plot
annotations
xline(xint) yline(yint) text(y x "annotation")

axes
graph matrix mpg price weight, half

y2
kdensity mpg, bwidth(3)
facet
by(var)
TWO+ CONTINUOUS VARIABLES

y1
bin(#) width(#) density fraction frequency percent addlabels

addlabopts(<options>) normal normopts(<options>) kdensity
kdenopts(<options>)
grouped bar plot
[if], <plot options>
custom appearance
histogram
DISCRETE
plot-specific options
[in]
<marker, line, text, axis, legend, background options> scheme(s1mono) play(customTheme) xsize(5) ysize(4) saving("myPlot.gph", replace)
histogram mpg, width(5) freq kdensity kdenopts(bwidth(5))
bwidth kernel(<options>
normal normopts(<line options>)
y1 y2 yn x
titles
sysuse auto, clear
smoothed histogram
variables: y first
title("title") subtitle("subtitle") xtitle("x-axis title") ytitle("y axis title") xscale(range(low high) log reverse off noline) yscale(<options>)
ONE VARIABLE
CONTINUOUS
graph <plot type>
twoway rbar length headroom price

range plot (y1 y2) with bars
vertical horizontal barwidth(#) mwidth

msize(<marker size>)
regress price mpg headroom trunk length turn

ssc install coefplot
coefplot, drop(_cons) xline(0)
Plot regression coefficients
baselevels b(<options>) at(<options>) noci levels(#)

keep(<variables>) drop(<variables>) rename(<list>)
horizontal vertical generate(<variable>)
regress mpg weight length turn

margins, eyex(weight) at(weight = (1800(200)4800))
marginsplot, noci
Plot marginal effects of regression
horizontal noci
updated February 2016
CC BY 4.0
Plotting in Stata 14.1
Apply Themes
ANATOMY OF A PLOT
title
annotation
200
Customizing Appearance
plots contain many features
y-axis
y-axis title
150
graph region
inner graph region
inner plot region
marker label
marker
grid lines
3
0
y-line
outer region
inner region
scatter price mpg, graphregion(fcolor("192 192 192") ifcolor("208 208 208"))
20
40
x-axis
specify the fill of the background in RGB or with a Stata color
scatter price mpg, plotregion(fcolor("224 224 224") ifcolor("240 240 240"))
60
80
x-axis title
LINES / BORDERS
arguments for the plot
objects (in green) go in the
options portion of these
commands (in orange)
SYNTAX
marker
<marker
options>
for example:
scatter price mpg, xline(20, lwidth(vthick))
mcolor(none)
COLOR
mcolor("145 168 208")
specify the fill and stroke of the marker

in RGB or with a Stata color
mfcolor("145 168 208")
msize(medium)
SIZE / THICKNESSS
ehuge
vhuge
huge
vlarge
color("145 168 208")
Oh
Dh
Th
Sh
oh
dh
th
sh
none i
jitter(#)
jitterseed(#)
randomly displace the markers
set seed
lcolor(none)
marker mlcolor("145 168 208")
tlcolor("145 168 208")

glcolor("145 168 208")
lwidth(medthick)
marker
specify the thickness

(stroke) of a line:
tick marks
grid lines
vvvthick
vvthick
vthick
thick
medthick
medium
medsmall
small
vsmall
tiny
vtiny
specify the marker symbol:
titles
marker label
lcolor("145 168 208")
specify the stroke color of the line or border
line
axes
grid lines
solid
dash
dot
medthin
thin
vthin
vvthin
vvvthin
none
lpattern(dash)
glpattern(dash)
specify the
line pattern
longdash
longdash_dot
shortdash
shortdash_dot
dash_dot
blank
axes off no axis/labels

noline
tick marks noticks
tick marks tlength(2)
grid lines nogrid nogmin nogmax
axes
tick marks xlabel(#10, tposition(crossing))

number of tick marks, position (outside | crossing | inside)
Laura Hughes (lhughes@usaid.gov) Tim Essam (tessam@usaid.gov)

follow us @flaneuseks and @StataRGIS
title(...)
subtitle(...)
xtitle(...)
ytitle(...)
annotation
text(...)
specify the color of the text
USING A SAVED THEME
twoway scatter mpg price, scheme(customTheme)

Create custom themes by
help scheme entries saving options in a .scheme file

see all options for setting scheme properties
adopath ++ "~/<location>/StataThemes"
set path of the folder (StataThemes) where custom
.scheme files are saved
set as default scheme
set scheme customTheme, permanently

change the theme
axis labels
xlabel(...)
ylabel(...)
legend
legend(...)
install William Buchanans package to generate custom

schemes and color palettes (including ColorBrewer)
USING THE GRAPH EDITOR
twoway scatter mpg price, play(graphEditorTheme)
color(none)
Select the
Graph Editor
marker label mlabcolor("145 168 208")
labcolor("145 168 208")
axis labels
mlwidth(thin)
tlwidth(thin)
glwidth(thin)
Schemes are sets of graphical parameters, so you dont

have to specify the look of the graphs every time.
net inst brewscheme, from("https://wbuchanan.github.io/brewscheme/") replace
<marker
options>
medium
msymbol(Dh)
APPEARANCE
medlarge
tick marks
axes
marker
tick marks
tick marks
TEXT
<line options> <marker xscale(...) grid lines

options> yscale(...)
xline(...)
xlabel(...)
yline(...)
legend
ylabel(...)
legend(region(...))
grid lines
specify the marker size:
large
POSITION
mfcolor(none)
specify the fill of the marker
line
100
y2
Fitted values
legend
specify the fill of the plot background in RGB or with a Stata color
SYMBOLS
line
50
y-axis labels
plot region
10
100
y-axis title
titles
subtitle
specify the size of the text:

size(medsmall)
marker label mlabsize(medsmall)
axis labels labsize(medsmall)
Click
Record
28 pt. vhuge
20 pt.
16 pt.
14 pt.
12 pt.
11 pt.
10 pt. medsmall
8 pt. small
huge
6 pt.
vsmall
4 pt.
tiny
vlarge
2 pt.
half_tiny
large
1.3 pt. third_tiny
medlarge 1 pt. quarter_tiny
medium 1 pt minuscule
Double click on
symbols and areas
on plot, or regions
on sidebar to
customize
Unclick
Record
marker label mlabel(foreign)

label the points with the values
of the foreign variable
axis labels
nolabels
axis labels
format(%12.2f )
no axis labels
change the format of the axis labels
legend
off
legend
label(# "label")
turn off legend
change legend label text
marker label mlabposition(5)

label location relative to marker (clock position: 0 12)
Save theme
as a .grec file
Save Plots
graph twoway scatter y x, saving("myPlot.gph") replace
save the graph when drawing
graph save "myPlot.gph", replace
save current graph to disk
graph combine plot1.gph plot2.gph...
combine 2+ saved graphs into a single plot
graph export "myPlot.pdf", as(.pdf)
see options to set
export the current graph as an image file size and resolution
updated June 2016

CC BY 4.0
Examples use auto.dta (sysuse auto, clear)

unless otherwise noted
univar price mpg, boxplot

ssc install univar
calculate univariate summary, with box-and-whiskers plot
stem mpg
return stem-and-leaf display of mpg
frequently used commands are
summarize price mpg, detail
highlighted in yellow
calculate a variety of univariate summary statistics
for Stata 13: ci mpg price, level (99)
ci mean mpg price, level(99)
r compute standard errors and confidence intervals
correlate mpg price
return correlation or covariance matrix
pwcorr price mpg weight, star(0.05)
return all pairwise correlation coefficients with sig. levels
mean price mpg
estimates of means, including standard errors
proportion rep78 foreign
estimates of proportions, including standard errors for
e
categories identified in varlist
ratio
estimates of ratio, including standard errors
total price
estimates of totals, including standard errors
TIME SERIES OPERATORS

L.
F.
D.
S.
lag x t-1
lead x t+1
difference x t-x t-1
seasonal difference x t-xt-1
tabulate foreign rep78, chi2 exact expected

tabulate foreign and repair record and return chi2
and Fishers exact statistic alongside the expected values
ttest mpg, by(foreign)
estimate t test on equality of means for mpg by foreign
r prtest foreign == 0.5
one-sample test of proportions
ksmirnov mpg, by(foreign) exact
Kolmogorov-Smirnov equality-of-distributions test
ranksum mpg, by(foreign) exact
equality tests on unmatched data (independent samples)
anova systolic drug webuse systolic, clear
analysis of variance and covariance
e pwmean mpg, over(rep78) pveffects mcompare(tukey)
estimate pairwise comparisons of means with equal
variances include multiple comparison adjustment
USEFUL ADD-INS
measure something
CATEGORICAL VARIABLES
identify a group to which

an observations belongs
INDICATOR VARIABLES
denote whether
T F
something is true or false
1900
SURVIVAL ANALYSIS
SURVEY DATA
stores results as e -class
more details at http://www.stata.com/manuals14/u25.pdf
o.
#
omit a variable or indicator

specify interactions
regress price io(2).rep78

regress price mpg c.mpg#c.mpg
specify rep78 variable to be an indicator variable

set the third category of rep78 to be the base category
set the base to most frequently occurring category for rep78
treat mpg as a continuous variable and
specify an interaction between foreign and mpg
set rep78 as an indicator; omit observations with rep78 == 2
create a squared mpg term to be used in regression
##
specify factorial interactions
regress price c.mpg##c.mpg
create all possible interactions with mpg (mpg and mpg2)
1970
1980
1990
webuse nhanes2b, clear
svyset psuid [pweight = finalwgt], strata(stratid)

declare survey design for a dataset
r
svydescribe
report survey data details
svy: mean age, over(sex)
estimate a population mean for each subpopulation
svy, subpop(rural): mean age
estimate a population mean for rural areas
e
svy: tabulate sex heartatk
report two-way table with tests of independence
svy: reg zinc c.age##c.age female weight rural
estimate a regression using survey weights
regress price mpg weight, robust

estimate ordinary least squares (OLS) model
on mpg weight and foreign, apply robust standard errors
regress price mpg weight if foreign == 0, cluster(rep78)
regress price only on domestic cars, cluster standard errors
rreg price mpg weight, genwt(reg_wt)
estimate robust regression to eliminate outliers
probit foreign turn price, vce(robust)
ADDITIONAL MODELS
estimate probit regression with
built-in Stata principal components analysis
pca
command
robust standard errors
factor analysis
factor
poisson nbreg
count outcomes
logit foreign headroom mpg, or
censored data
tobit
estimate logistic regression and
instrumental variables
ivregress ivreg2
report odds ratios
difference-in-difference
bootstrap, reps(100): regress mpg /* rddiff sscuser-written
install ivreg2 regression discontinuity
*/ weight gear foreign
xtabond xtabond2 dynamic panel estimator
estimate regression with bootstrapping psmatch2
propensity score matching
jackknife r(mean), double: sum mpg synth
synthetic control analysis
Blinder-Oaxaca decomposition
jackknife standard error of sample mean oaxaca
EXAMPLE
regress price i.rep78
regress price ib(3).rep78
fvset base frequent rep78
regress price i.foreign#c.mpg i.foreign
id 4
webuse drugtr, clear
DESCRIPTION
specify indicators
specify base indicator
command to change base
treat variable as continuous
id 3
stset studytime, failure(died)

r declare survey design for a dataset
stsum
summarize survival-time data
stcox drug age
e
estimate a cox proportional hazard model
OPERATOR
i.
ib.
fvset
c.

1950
2-period lag x t-2

2-period lead x t+2
difference of difference xt-xt1-(xt1-xt2)
lag-2 (seasonal difference) xtxt2
id 2
compact time series into means, sums and end-of-period values

tscollap
carryforward carry non-missing values forward from one obs. to the next
identify spells or runs in time series
tsspell
Estimation with Categorical & Factor Variables

CONTINUOUS VARIABLES
L2.
F2.
D2.
S2.
1850
id 1
100
webuse nlswork, clear
xtset id year
declare national longitudinal data to be a panel
xtdescribe
xtline plot
report panel aspects of a dataset
wage relative to inflation
r
xtsum hours
summarize hours worked, decomposing
standard deviation into between and
within components
xtline ln_wage if id <= 22, tlabel(#3)
plot panel data as a line plot
xtreg
ln_w c.age##c.age ttl_exp, fe vce(robust)
e
estimate a fixed-effects model with robust standard errors
4
200
1 Estimate Models
Statistical Tests
PANEL / LONGITUDINAL
2 Diagnostics
not appropriate after robust cluster( )
estat hettest test for heteroskedasticity

r
ovtest test for omitted variable bias
vif
report variance inflation factor
dfbeta(length)
Type help regress postestimation plots
calculate measure of influence for additional diagnostic plots
rvfplot, yline(0)
avplots
plot residuals
plot all partialmpg
rep78
against fitted
regression leverage
values
plots in one graph
Fitted values
weight
headroom
3 Postestimation
price
Summarize Data
webuse sunspot, clear
tsset time, yearly

declare sunspot data to be yearly time series
tsreport
r
report time series aspects of a dataset
generate lag_spot = L1.spot
create a new variable of annual lags of sun spots tsline plot
Number of sunspots
tsline spot
plot time series of sunspots
e
arima spot, ar(1/2)
estimate an auto-regressive model with 2 lags
price
Results are stored as either r -class or e -class. See Programming Cheat Sheet
TIME SERIES
price
By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types
price
Declare Data
Residuals
Data Analysis
commands that use a fitted model
regress price headroom length Used in all postestimation examples

display _b[length]
display _se[length]
return coefficient estimate or standard error for mpg
from most recent regression model
margins, dydx(length) returns e-class information when post option is used
return the estimated marginal effect for mpg
r
margins, eyex(length)
return the estimated elasticity for price
predict yhat if e(sample)
create predictions for sample on which model was fit
predict double resid, residuals
calculate residuals based on last fit model
test mpg = 0
r test linear hypotheses that mpg estimate equals zero
lincom headroom - length
test linear combination of estimates (headroom = length)
updated June 2016

CC BY 4.0
Programming
with Stata 14.1
Cheat Sheet
1 Scalars
both r- and e-class results contain scalars
scalar x1 = 3
create a scalar x1 storing the number 3
scalar a1 = I am a string scalar
create a scalar a1 storing a string
2 Matrices
Scalars can hold

numeric values or
arbitrarily long strings
DISPLAYING & DELETING BUILDING BLOCKS
[scalar | matrix | macro | estimates] [list | drop] b

list contents of object b or drop (delete) object b
[scalar | matrix | macro | estimates] dir
list all defined objects for that class
matrix list b
matrix dir
scalar drop x1
list contents of matrix b list all matrices delete scalar x1
GLOBALS
public or private variables storing text
available through Stata sessions
LOCALS
R- AND E-CLASS: Stata stores calculation results in two* main classes:
return results from general commands

such as summary or tabulate
return results from estimation

commands such as regress or mean
To assign values to individual variables use:

r individual numbers or strings
e rectangular array of quantities or expressions
e pointers that store text (global or local)
1 SCALARS
2 MATRICES
3 MACROS
Loops: Automate Repetitive Tasks
ANATOMY OF A LOOP
objects to repeat over

temporary variable used
only within the loop
requires local macro notation
* theres also s- and n-class
PUBLIC
available only in programs, loops, or .do files PRIVATE
mean price
e ereturn list
returns list of scalars, macros,
matrices and functions
summarize price, detail

r return list
returns a list of scalars
scalars:
r(N)
r(mean)
r(Var)
r(sd)
=
=
=
=
74
6165.25...
86995225.97...
2949.49...
...
Results are replaced

each time an r-class
/ e-class command
is called
scalars:
e(df_r)
e(N_over)
e(N)
e(k_eq)
e(rank)
=
=
=
=
=
73
1
73
1
1
generate meanN = e(N)

create a new variable equal to
obs. in estimation command
preserve create a temporary copy of active dataframe
restore points to test
restore restore temporary copy to original point set
code that changes data
generate p_mean = r(mean)
create a new variable equal to
average of price
ACCESSING ESTIMATION RESULTS
After you run any estimation command, the results of the estimates are
stored in a structure that you can save, view, compare, and export
regress price weight

Use estimates store
estimates store est1
to compile results
store previous estimation results est1 in memory for later use
ssc install estout
eststo est2: regress price weight mpg
eststo est3: regress price weight mpg foreign
estimate two regression models and store estimation results
estimates table est1 est2 est3
print a table of the two estimation results est1 and est2
EXPORTING RESULTS
see also while
Stata has three options for repeating commands over lists or values:
foreach, forvalues, and while. Though each has a different first line,
the syntax is consistent:
foreach x of varlist var1 var2 var3 {
Many Stata commands store results in types of lists. To access these, use return or
ereturn commands. Stored results can be scalars, macros, matrices or functions.
matrix ad2 = a , d
matrix ad1 = a \ d
row bind matrices
column bind matrices
matselrc b x, c(1 3) findit matselrc
select columns 1 & 3 of matrix b & store in new matrix x
mat2txt, matrix(ad1) saving(textfile.txt) replace
export a matrix to a text file
ssc install mat2txt
global pathdata "C:/Users/SantasLittleHelper/Stata"

define a global variable called pathdata
cd $pathdata add a $ before calling a global macro
change working directory by calling global macro
global myGlobal price mpg length
summarize $myGlobal
summarize price mpg length using global
basic components of programming
4 Access & Save Stored r- and e-class Objects
e-class results are stored as matrices
matrix a = (4\ 5\ 6)
matrix b = (7, 8, 9)
create a 3 x 1 matrix
create a 1 x 3 matrix
matrix d = b' transpose matrix b; store in d
3 Macros
Building Blocks
The estout and outreg2 packages provide numerous, flexible options for making tables
after estimation commands. See also putexcel command.
command `x', option

...
}
open brace must

appear on first line
command(s) you want to repeat

can be one line or many
close brace must appear

on final line by itself
FOREACH: REPEAT COMMANDS OVER STRINGS, LISTS, OR VARIABLES

foreach x in|of [ local, global, varlist, newlist, numlist ] {
list types: objects over which the
Stata commands referring to `x'
commands will be repeated
}
loops repeat the same command
STRINGS
over different arguments:
sysuse "auto.dta", clear
foreach x in auto.dta auto2.dta {
same as...
tab rep78, missing
sysuse "`x'", clear
tab rep78, missing
sysuse "auto2.dta", clear
}
tab rep78, missing
LISTS
foreach x in "Dr. Nick" "Dr. Hibbert" {
display length("Dr. Nick")
display length ( "` x '" )
display length("Dr. Hibbert")
}
When calling a command that takes a string,
surround the macro name with quotes.
VARIABLES
foreach x in mpg weight {
summarize `x'
}
must define list type
foreach x of varlist mpg weight {

summarize `x'
}
foreach in takes any list

as an argument with
elements separated by
spaces
foreach of requires you
to state the list type,
which makes it faster
summarize mpg
summarize weight
FORVALUES: REPEAT COMMANDS OVER LISTS OF NUMBERS

iterator
forvalues i = 10(10)50 {
display ì'
numeric values over
which loop will run
}
DEBUGGING CODE
Use display command to

show the iterator value at
each step in the loop
ITERATORS
display 10
display 20
...
i = 10/50
10, 11, 12, ...
i = 10(10)50
10, 20, 30, ...
i = 10 20 to 50 10, 20, 30, ...
local myLocal price mpg length

esttab est1 est2, se star(* 0.10 ** 0.05 *** 0.01) label
see also capture and scalar _rc
set trace on (off )
create local variable called myLocal with the
create summary table with standard errors and labels
trace
the
execution
of
programs
for
error
checking
strings price mpg and length
esttab using auto_reg.txt, replace plain se
sysuse auto, clear
PUTTING
IT
ALL
TOGETHER
summarize ` myLocal ' add a ` before and a ' after local macro name to call
export summary table to a text file, include standard errors
pull out the first word
summarize contents of local myLocal
generate
car_make
=
word(make,
1)
from the make variable
outreg2 [est1 est2] using auto_reg2.txt, see replace
levelsof rep78, local(levels)
calculate unique groups of
export summary table to a text file using outreg2 syntax
levelsof car_make, local(cmake)
define the
create a sorted list of distinct values of rep78,
car_make and store in local cmake
local i to be
local i = 1
store results in a local macro called levels
an iterator
Additional Programming Resources
store the length of local
local cmake_len : word count `cmake'
local varLab: variable label foreign can also do with value labels
cmake in local cmake_len
store the variable label for foreign in the local varLab
bit.ly/statacode
foreach x of local cmake {
download all examples from this cheat sheet in a .do file
display in yellow "Make group ì' is `x'"
TEMPVARS & TEMPFILES special locals for loops/programs
ssc install adolist
adoupdate
adolist
if ì' == `cmake_len' {
initialize
a
new
temporary
variable
called
temp1
tempvar temp1
Update user-written .ado files
List/copy user-written .ado files
tests the position of the
display "The total number of groups is ì'"
save squared mpg values in temp1
generate `temp1' = mpg^2
iterator,
executes
contents
net install package, from (https://raw.githubusercontent.com/username/repo/master) in brackets when the
summarize the temporary variable temp1
}
summarize `temp1'
install a package from a Github repository
condition is true
local i = `++i'
increment iterator by one
tempfile myAuto create a temporary file to see also
https://github.com/andrewheiss/SublimeStataEnhanced
}
save `myAuto'
be used within a program tempname
configure Sublime text for Stata 11-14
updated June 2016
CC BY 4.0

Stata Cheat Sheets

Diunggah oleh

Informasi Dokumen

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Stata Cheat Sheets

Diunggah oleh

Hak Cipta:

Format Tersedia

Data Processing

with Stata 14.1 Cheat Sheet

scroll through previous commands

autocompletes variable name after typing part

clear the console (where results are displayed)

column to save output as condition: only apply to

bysort rep78 : summarize

pull data from a file

In this example, we want a detailed summary

if foreign == 0 & price <= 9000, detail

function: what are

Basic Data Operations

== tests if something is equal

if foreign != 1 & price >= 10000

< less than

describe make price

SEE DATA DISTRIBUTION

codebook make price

BROWSE OBSERVATIONS WITHIN THE DATA

Missing values are treated as the largest

Tim Essam (tessam@usaid.gov) Laura Hughes (lhughes@usaid.gov)

inspired by RStudios awesome Cheat Sheets (rstudio.com/resources/cheatsheets)

Change Data Types

Stata has 6 data types, and data can also be missing:

gen foreignNumeric = real(foreignString)

recast double mpg

tabulate rep78, mi gen(repairRecord)

table foreign, contents(mean price sd price) f(%9.2fc) row

Create New Variables

generate mpgSq = mpg^2 gen byte lowPr = price < 4000

Disclaimer: we are not affiliated with Stata. But we like it.

updated January 2016

with Stata 14.1 Cheat Sheet

Select Parts of Data (Subsetting)

FILTER SPECIFIC ROWS

drop if mpg < 20

webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data

MELT DATA (WIDE LONG)

reshape variables starting

unique id create new variable which captures

reshape long coffee@ maize@, i(country) j(year)

CAST DATA (LONG WIDE)

what will be create new variables

create new variables named

reshape wide coffee maize, i(country) j(year)

When datasets are

xpose, clear varname

Replace Parts of Data

REPLACE MISSING VALUES

useful for cleaning survey datasets

Tim Essam (tessam@usaid.gov) Laura Hughes (lhughes@usaid.gov)

webuse coffeeMaize2.dta, clear

CHANGE ROW VALUES

replace price = 5000 if price < 5000

append using "coffeeMaize2.dta", gen(filenum)

MERGING TWO DATASETS TOGETHER

pink brown _merge

pink brown _merge

webuse ind_age.dta, clear

FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID

inspired by RStudios awesome Cheat Sheets (rstudio.com/resources/cheatsheets)

FIND MATCHING STRINGS

display strmatch("123.89", "1??.?9")

GET STRING PROPERTIES

graph bar (count), over(foreign, gap(0.5)) intensity(0.5)