
2011 NSF Graduate Research Fellowship statistics

Elson Liu August 16, 2011

Contents
1 Preliminaries
2 Undergraduate institution
3 Graduate institution
4 Field of Study
5 Subject area

Preliminaries

The awardee list was downloaded from https://www.fastlane.nsf.gov/grfp/AwardeeList.do?method=sort&method%3DloadAwardeeList&exportType=2. Some preprocessing was done in Microsoft Excel: baccalaureate institutions were normalized by converting to lower case, and a Subject column was generated by splitting the Field of Study column on hyphens. The data were then exported in CSV format as NSFAwardeeList.csv.

Load libraries
R Code
> library(xtable)
> library(ggplot2)
> library(gdata)
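The Excel preprocessing described under Preliminaries could also be done in R, which would keep the whole analysis in one script. The following is a minimal sketch on a toy data frame, not the procedure actually used: the column name `Baccalaureate.Institution` and the sample rows are assumptions for illustration, and the split assumes the separator is a hyphen surrounded by spaces.

```r
# Toy stand-in for the raw awardee list (hypothetical rows and column names).
raw <- data.frame(
  Baccalaureate.Institution = c("Amherst College", "AMHERST COLLEGE", "Albion College"),
  Field.of.Study = c("Chemistry - Bio-inorganic", "Life Sciences - Ecology",
                     "Engineering - Mechanical"),
  stringsAsFactors = FALSE
)

# Normalize institution names by converting to lower case.
raw$Lower.Case.Baccalaureate <- tolower(raw$Baccalaureate.Institution)

# Keep the part of Field of Study before the first " - " as the subject.
raw$Subject <- sub(" - .*$", "", raw$Field.of.Study)

print(raw[, c("Lower.Case.Baccalaureate", "Subject")])
```

Matching on " - " rather than a bare hyphen leaves hyphenated field names such as "Bio-inorganic" from being mistaken for a separator.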

Import data
R Code
> df <- read.csv("NSFAwardeeList.csv", header = TRUE)

Undergraduate institution

Tabulate undergraduate institution frequencies

R Code
> ugrads <- table(df$Lower.Case.Baccalaureate, dnn = c("Number of awardees"))

Convert the table back to a data frame


R Code
> udf <- as.data.frame(ugrads)
> names(udf) = c("Undergrad", "Awardees")
> head(udf)

  Undergrad                Awardees
1 albion college                  3
2 allegheny college               1
3 american university             1
4 amherst college                 6
5 arizona state university        9
6 auburn university               1

Sort by number of awardees


R Code
> o <- order(-udf$Awardees)
> ugrads.sorted <- udf[o, ]
> head(ugrads.sorted, n = 35L)

    Undergrad                                   Awardees
279 university of california berkeley                 77
154 massachusetts institute of technology             50
238 stanford university                               38
380 university of washington                          38
66  cornell university                                36
369 university of texas at austin                     36
102 georgia institute of technology                   35
201 princeton university                              32
29  california institute of technology                31
113 harvard university                                30
63  columbia university                               29
302 university of florida                             28
383 university of wisconsin madison                   28
417 yale university                                   27
282 university of california los angeles              25
25  brown university                                  24
280 university of california davis                    20
294 university of chicago                             20
194 pennsylvania state univ university park           19
274 university of arizona                             19
79  duke university                                   18
184 northwestern university                           18
328 university of minnesota twin cities               17
95  franklin w. olin college of engineering           16
188 ohio state university                             16
358 university of rochester                           16
410 william marsh rice university                     16
340 university of north carolina at chapel hill       15
42  carnegie mellon university                        14
52  clemson university                                14
114 harvey mudd college                               14
178 new york university                               14
286 university of california santa barbara            14
326 university of michigan                            14
351 university of pittsburgh                          14
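The tabulate, convert, and sort steps above are repeated for each column in the sections that follow. As an illustration only, they could be wrapped in a small helper; `count_sorted()` is a hypothetical name that does not appear in the original analysis, and the input below is toy data rather than the awardee list.

```r
# Count the values of a vector and return a data frame sorted by
# decreasing count. count_sorted() is a hypothetical helper, not part
# of the original analysis.
count_sorted <- function(x, name) {
  counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
  names(counts) <- c(name, "Awardees")
  counts[order(-counts$Awardees), ]
}

# Toy input standing in for df$Lower.Case.Baccalaureate.
schools <- c("b college", "a university", "b college",
             "c institute", "b college", "a university")
result <- count_sorted(schools, "Undergrad")
print(result)
```

As in the original code, the row names of the result keep the positions from the alphabetical table, which is why the sorted listings show non-consecutive row numbers.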

Select undergrad institutions with more than 20 awardees and draw a dotplot
R Code
> ugrads.top <- drop.levels(udf[udf$Awardees > 20, ])
> p <- qplot(x = Awardees, y = Undergrad, data = ugrads.top)
> print(p)

[Figure: dot plot of Awardees (x-axis) by Undergrad (y-axis) for the 16 undergraduate institutions with more than 20 awardees.]
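The selection step relies on `drop.levels` from gdata so that institutions below the threshold do not appear as empty rows in the plot. A possible alternative, sketched here under the assumption of a reasonably recent R version, is base R's `droplevels()`, combined with `reorder()` so that the plot is ordered by count instead of alphabetically. The data frame `udf_toy` and its values are made up for illustration.

```r
# Toy counts standing in for udf (hypothetical values).
udf_toy <- data.frame(
  Undergrad = factor(c("a university", "b college", "c institute", "d university")),
  Awardees = c(25L, 77L, 3L, 38L)
)

# Keep institutions above the threshold and drop unused factor levels
# with base R's droplevels().
top <- droplevels(udf_toy[udf_toy$Awardees > 20, ])

# Reorder the factor levels by count so the plot reads from smallest
# (bottom) to largest (top).
top$Undergrad <- reorder(top$Undergrad, top$Awardees)
print(levels(top$Undergrad))

# Draw the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(top, ggplot2::aes(x = Awardees, y = Undergrad)) +
    ggplot2::geom_point()
  print(p)
}
```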

Generate a LaTeX-formatted table

R Code
> utable <- xtable(ugrads)
> print(utable, type = "latex", file = "undergrads.tex", tabular.environment = "longtable")

Graduate institution

Tabulate graduate institution frequencies


R Code
> grads <- table(df$Graduate.Institution, dnn = c("Number of awardees"))

Convert the table back to a data frame


R Code
> gdf <- as.data.frame(grads)
> names(gdf) = c("Grad", "Awardees")
> head(gdf)

  Grad                                                     Awardees
1 Arizona State University                                        9
2 Auburn University                                               2
3 Boston University                                               4
4 Boston University - Graduate School of Arts and Sciences        1
5 Boston University School of Medicine                            1
6 Brandeis University                                             1

Sort by number of awardees
R Code
> o <- order(-gdf$Awardees)
> grads.sorted <- gdf[o, ]
> head(grads.sorted, n = 19L)

    Grad                                       Awardees
43  Massachusetts Institute of Technology           160
94  University of California-Berkeley               156
73  Stanford University                             152
33  Harvard University                              103
157 University of Washington                         76
21  Cornell University                               58
128 University of Michigan Ann Arbor                 58
99  University of California-San Diego               49
95  University of California-Davis                   42
158 University of Wisconsin-Madison                  42
20  Columbia University                              40
97  University of California-Los Angeles             39
57  Northwestern University                          37
116 University of Illinois at Urbana-Champaign       34
9   California Institute of Technology               33
171 Yale University                                  33
14  Carnegie-Mellon University                       32
25  Duke University                                  31
64  Princeton University                             31

Select grad institutions with more than 30 awardees and draw a dotplot
R Code
> grads.top <- drop.levels(gdf[gdf$Awardees > 30, ])
> p <- qplot(x = Awardees, y = Grad, data = grads.top)
> print(p)

[Figure: dot plot of Awardees (x-axis) by Grad (y-axis) for the 19 graduate institutions with more than 30 awardees.]
Generate a LaTeX-formatted table

R Code
> gtable <- xtable(grads)
> print(gtable, type = "latex", file = "grads.tex", tabular.environment = "longtable")

Field of Study

Tabulate field of study frequencies

R Code
> fields <- table(df$Field.of.Study, dnn = c("Number of awardees"))

Convert the table back to a data frame


R Code
> fdf <- as.data.frame(fields)
> names(fdf) = c("Field", "Awardees")
> head(fdf)

  Field                     Awardees
1 Chemistry - Analytical           7
2 Chemistry - Bio-inorganic        8
3 Chemistry - Bio-organic         11
4 Chemistry - Biophysical          5
5 Chemistry - Environmental        2
6 Chemistry - Inorganic           25

Sort by number of awardees


R Code
> o <- order(-fdf$Awardees)
> fields.sorted <- fdf[o, ]
> head(fields.sorted, n = 10L)

    Field                                   Awardees
49  Engineering - Mechanical                      83
89  Life Sciences - Ecology                       83
38  Engineering - Biomedical                      79
39  Engineering - Chemical                        68
101 Life Sciences - Neurosciences                 61
37  Engineering - Bioengineering                  59
48  Engineering - Materials                       52
42  Engineering - Electrical and Electronic       51
93  Life Sciences - Evolutionary Biology          50
166 Psychology - Social                           43

Select fields of study with more than 40 awardees and draw a dotplot
R Code
> fields.top <- drop.levels(fdf[fdf$Awardees > 40, ])
> p <- qplot(x = Awardees, y = Field, data = fields.top)
> print(p)

[Figure: dot plot of Awardees (x-axis) by Field (y-axis) for the 11 fields of study with more than 40 awardees.]
Generate a LaTeX-formatted table

R Code
> ftable <- xtable(fields)
> print(ftable, type = "latex", file = "fields.tex", tabular.environment = "longtable")

Subject area

Tabulate subject area frequencies


R Code
> subjects <- table(df$Subject, dnn = c("Number of awardees"))

Convert the table back to a data frame


R Code
> sdf <- as.data.frame(subjects)
> names(sdf) = c("Subject", "Awardees")
> head(sdf, n = 10L)

   Subject                              Awardees
1  Chemistry                                 158
2  Comp/IS/Eng                               113
3  Engineering                               524
4  Geosciences                                78
5  Life Sciences                             593
6  Mathematical Sciences                      80
7  Physics and Astronomy                     100
8  Psychology                                134
9  Social Sciences                           197
10 STEM Education and Learning Research       23

Sort by number of awardees

R Code
> o <- order(-sdf$Awardees)
> subjects.sorted <- sdf[o, ]
> head(subjects.sorted, n = 10L)

   Subject                              Awardees
5  Life Sciences                             593
3  Engineering                               524
9  Social Sciences                           197
1  Chemistry                                 158
8  Psychology                                134
2  Comp/IS/Eng                               113
7  Physics and Astronomy                     100
6  Mathematical Sciences                      80
4  Geosciences                                78
10 STEM Education and Learning Research       23

Draw a dotplot of awardees for all subject areas
R Code
> p <- qplot(x = Awardees, y = Subject, data = subjects.sorted)
> print(p)

[Figure: dot plot of Awardees (x-axis) by Subject (y-axis) for all 10 subject areas.]
Generate a LaTeX-formatted table

R Code
> stable <- xtable(subjects)
> print(stable, type = "latex", file = "subjects.tex", tabular.environment = "longtable")

                                       Number of awardees
Chemistry                                             158
Comp/IS/Eng                                           113
Engineering                                           524
Geosciences                                            78
Life Sciences                                         593
Mathematical Sciences                                  80
Physics and Astronomy                                 100
Psychology                                            134
Social Sciences                                       197
STEM Education and Learning Research                   23
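The subject-area counts in this table sum to 2,000 awardees, so each count can be converted to a percentage share directly. The short sketch below does that arithmetic; the vector name `subjects_n` is introduced here for illustration and is not part of the original code.

```r
# Subject-area counts copied from the table above.
subjects_n <- c(
  "Chemistry" = 158, "Comp/IS/Eng" = 113, "Engineering" = 524,
  "Geosciences" = 78, "Life Sciences" = 593, "Mathematical Sciences" = 80,
  "Physics and Astronomy" = 100, "Psychology" = 134, "Social Sciences" = 197,
  "STEM Education and Learning Research" = 23
)

total <- sum(subjects_n)

# Percentage share of each subject area, largest first.
share <- sort(round(100 * subjects_n / total, 1), decreasing = TRUE)
print(total)
print(share)
```

Life Sciences and Engineering together account for 1,117 of the 2,000 awards, a little over half.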
