Contents
1 Preliminaries
2 Undergraduate institution
3 Graduate institution
4 Field of Study
5 Subject area
Preliminaries
The awardee list was downloaded from https://www.fastlane.nsf.gov/grfp/AwardeeList.do?method=sort&method%3DloadAwardeeList&exportType=2. Some preprocessing was done in Microsoft Excel: baccalaureate institution names were normalized by converting them to lower case, and a Subject column was generated by splitting the Field of Study column on hyphens and keeping the part before the hyphen. The data were then exported in CSV format as NSFAwardeeList.csv.
Load libraries
R Code
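The functions called later in this report come from add-on packages: drop.levels() is provided by gdata, qplot() by ggplot2, and xtable() by xtable. A minimal sketch of a load step, assuming these three packages are the ones intended; it skips any package that is not installed rather than stopping with an error.

```r
# Assumed package list, inferred from the functions used later in the report.
pkgs <- c("gdata", "ggplot2", "xtable")

for (pkg in pkgs) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # character.only = TRUE lets library() take the name from a variable
    library(pkg, character.only = TRUE)
  } else {
    message("Package not installed, skipping: ", pkg)
  }
}
```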
Import data
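The sections that follow work with four count tables named ugrads, grads, fields, and subjects. The sketch below shows one way such tables might be built with read.csv() and table(). The column names (Baccalaureate.Institution, Current.Institution, Field.of.Study) and the three sample rows are assumptions made for illustration and are not taken from NSFAwardeeList.csv. It also shows how the lower-casing and Subject split described in the Preliminaries could be done in R instead of Excel; note that it splits on " - " (hyphen with surrounding spaces) rather than on bare hyphens, so that names such as "Bio-organic" stay intact.

```r
# Toy stand-in for NSFAwardeeList.csv; rows and column names are illustrative only.
csv_text <- 'Baccalaureate.Institution,Current.Institution,Field.of.Study
Stanford University,Massachusetts Institute of Technology,Engineering - Mechanical
stanford university,Stanford University,Life Sciences - Ecology
Amherst College,Harvard University,Chemistry - Bio-organic
'
awardees <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# With the real file this would be: awardees <- read.csv("NSFAwardeeList.csv")

# Normalize undergraduate institution names: trim whitespace, convert to lower case.
awardees$Baccalaureate.Institution <-
  tolower(trimws(awardees$Baccalaureate.Institution))

# Subject is the text before the first " - " in Field of Study.
awardees$Subject <- sub(" - .*$", "", awardees$Field.of.Study)

# Count awardees per category; table() orders the categories alphabetically.
ugrads   <- table(awardees$Baccalaureate.Institution)
grads    <- table(awardees$Current.Institution)
fields   <- table(awardees$Field.of.Study)
subjects <- table(awardees$Subject)

print(ugrads)
print(subjects)
```

Because table() returns categories in alphabetical order, as.data.frame() on any of these objects gives rows numbered alphabetically, which is why the row numbers in the sorted listings later on look scrambled.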
Undergraduate institution
> udf <- as.data.frame(ugrads)
> names(udf) = c("Undergrad", "Awardees")
> head(udf)
  Undergrad                Awardees
1 albion college                  3
2 allegheny college               1
3 american university             1
4 amherst college                 6
5 arizona state university        9
6 auburn university               1
> o <- order(-udf$Awardees)
> ugrads.sorted <- udf[o, ]
> head(ugrads.sorted, n = 35L)
    Undergrad                                   Awardees
279 university of california berkeley                 77
154 massachusetts institute of technology             50
238 stanford university                               38
380 university of washington                          38
 66 cornell university                                36
369 university of texas at austin                     36
102 georgia institute of technology                   35
201 princeton university                              32
 29 california institute of technology                31
113 harvard university                                30
 63 columbia university                               29
302 university of florida                             28
383 university of wisconsin madison                   28
417 yale university                                   27
282 university of california los angeles              25
 25 brown university                                  24
280 university of california davis                    20
294 university of chicago                             20
194 pennsylvania state univ university park           19
274 university of arizona                             19
 79 duke university                                   18
184 northwestern university                           18
328 university of minnesota twin cities               17
 95 franklin w. olin college of engineering           16
188 ohio state university                             16
358 university of rochester                           16
410 william marsh rice university                     16
340 university of north carolina at chapel hill       15
 42 carnegie mellon university                        14
 52 clemson university                                14
114 harvey mudd college                               14
178 new york university                               14
286 university of california santa barbara            14
326 university of michigan                            14
351 university of pittsburgh                          14
Select undergrad institutions with more than 20 awardees and draw a dotplot
R Code
> ugrads.top <- drop.levels(udf[udf$Awardees > 20, ])
> p <- qplot(x = Awardees, y = Undergrad, data = ugrads.top)
> print(p)
[Figure: dot plot of Awardees (x-axis, 30 to 70) against Undergrad (y-axis) for the 16 undergraduate institutions with more than 20 awardees.]
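The institutions in this dot plot appear in alphabetical order because that is the order of the factor levels. If a plot sorted by count is preferred, one option is to reorder the levels with base R's reorder() before plotting. A minimal sketch using four rows copied from the sorted listing; the data frame name top is hypothetical.

```r
# Four rows taken from the sorted undergraduate listing; 'top' is a hypothetical name.
top <- data.frame(
  Undergrad = factor(c("stanford university",
                       "massachusetts institute of technology",
                       "university of california berkeley",
                       "cornell university")),
  Awardees = c(38, 50, 77, 36)
)

# Factor levels default to alphabetical order, which the plot axis follows.
print(levels(top$Undergrad))

# reorder() re-sorts the levels by the Awardees value attached to each level.
top$Undergrad <- reorder(top$Undergrad, top$Awardees)
print(levels(top$Undergrad))
```

With ggplot2 the same idea can be applied inline, for example by passing y = reorder(Undergrad, Awardees) in the qplot() call.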
Generate a LaTeX-formatted table
R Code
> utable <- xtable(ugrads)
> print(utable, type = "latex", file = "undergrads.tex", tabular.environment = "longtable")
Graduate institution
> gdf <- as.data.frame(grads)
> names(gdf) = c("Grad", "Awardees")
> head(gdf)
  Grad                                                     Awardees
1 Arizona State University                                        9
2 Auburn University                                               2
3 Boston University                                               4
4 Boston University - Graduate School of Arts and Sciences        1
5 Boston University School of Medicine                            1
6 Brandeis University                                             1
Sort by number of awardees
R Code
> o <- order(-gdf$Awardees)
> grads.sorted <- gdf[o, ]
> head(grads.sorted, n = 19L)
    Grad                                       Awardees
    Massachusetts Institute of Technology           160
    University of California-Berkeley               156
    Stanford University                             152
    Harvard University                              103
    University of Washington                         76
    Cornell University                               58
    University of Michigan Ann Arbor                 58
    University of California-San Diego               49
    University of California-Davis                   42
    University of Wisconsin-Madison                  42
    Columbia University                              40
    University of California-Los Angeles             39
    Northwestern University                          37
    University of Illinois at Urbana-Champaign       34
  9 California Institute of Technology               33
171 Yale University                                  33
 14 Carnegie-Mellon University                       32
 25 Duke University                                  31
 64 Princeton University                             31
Select grad institutions with more than 30 awardees and draw a dotplot
R Code
> grads.top <- drop.levels(gdf[gdf$Awardees > 30, ])
> p <- qplot(x = Awardees, y = Grad, data = grads.top)
> print(p)
[Figure: dot plot of Awardees (x-axis, 40 to 160) against Grad (y-axis) for the 19 graduate institutions with more than 30 awardees.]
Generate a LaTeX-formatted table
R Code
> gtable <- xtable(grads)
> print(gtable, type = "latex", file = "grads.tex", tabular.environment = "longtable")
Field of Study
> fdf <- as.data.frame(fields)
> names(fdf) = c("Field", "Awardees")
> head(fdf)
  Field                     Awardees
1 Chemistry - Analytical           7
2 Chemistry - Bio-inorganic        8
3 Chemistry - Bio-organic         11
4 Chemistry - Biophysical          5
5 Chemistry - Environmental        2
6 Chemistry - Inorganic           25
> o <- order(-fdf$Awardees)
> fields.sorted <- fdf[o, ]
> head(fields.sorted, n = 10L)
    Field                                   Awardees
 49 Engineering - Mechanical                      83
 89 Life Sciences - Ecology                       83
 38 Engineering - Biomedical                      79
 39 Engineering - Chemical                        68
101 Life Sciences - Neurosciences                 61
 37 Engineering - Bioengineering                  59
 48 Engineering - Materials                       52
 42 Engineering - Electrical and Electronic       51
 93 Life Sciences - Evolutionary Biology          50
166 Psychology - Social                           43
Select fields of study with more than 40 awardees and draw a dotplot
R Code
> fields.top <- drop.levels(fdf[fdf$Awardees > 40, ])
> p <- qplot(x = Awardees, y = Field, data = fields.top)
> print(p)
[Figure: dot plot of Awardees (x-axis, 50 to 80) against Field (y-axis) for the fields of study with more than 40 awardees.]
Generate a LaTeX-formatted table
R Code
> ftable <- xtable(fields)
> print(ftable, type = "latex", file = "fields.tex", tabular.environment = "longtable")
Subject area
> sdf <- as.data.frame(subjects)
> names(sdf) = c("Subject", "Awardees")
> head(sdf, n = 10L)
   Subject                              Awardees
 1 Chemistry                                 158
 2 Comp/IS/Eng                               113
 3 Engineering                               524
 4 Geosciences                                78
 5 Life Sciences                             593
 6 Mathematical Sciences                      80
 7 Physics and Astronomy                     100
 8 Psychology                                134
 9 Social Sciences                           197
10 STEM Education and Learning Research       23
Sort by number of awardees
R Code
> o <- order(-sdf$Awardees)
> subjects.sorted <- sdf[o, ]
> head(subjects.sorted, n = 10L)
   Subject                              Awardees
 5 Life Sciences                             593
 3 Engineering                               524
 9 Social Sciences                           197
 1 Chemistry                                 158
 8 Psychology                                134
 2 Comp/IS/Eng                               113
 7 Physics and Astronomy                     100
 6 Mathematical Sciences                      80
 4 Geosciences                                78
10 STEM Education and Learning Research       23
Draw a dotplot of awardees by subject area
R Code
> p <- qplot(x = Awardees, y = Subject, data = subjects.sorted)
> print(p)
[Figure: dot plot of Awardees (x-axis, 100 to 500) against Subject (y-axis) for all ten subject areas.]
Generate a LaTeX-formatted table
R Code
> stable <- xtable(subjects)
> print(stable, type = "latex", file = "subjects.tex", tabular.environment = "longtable")