
Sources of Social Scientific Data

Introduction to Research Analysis (SOC 3040)

1 Overview
This document provides an overview of several major hubs of social scientific data that can
be used for statistical analysis. All of the sources discussed below can be accessed at no cost
by Tulane students (though a few might require registration). Please note that the inclusion
of a dataset in the list below is not conclusive evidence of high data quality. You should
always critically examine the data collection methodology and operationalization of any
dataset that you use. To put it mildly, data quality can vary wildly, even within the same
source. The list below is far from exhaustive; however, it does cover the most commonly
used sources of data that can be used to address a wide variety of topics in sociology.

If you are interested in additional sources of data, the library’s guide to sociology provides
a discussion of several sources of polling data pertinent to both the US and Latin America.
It is available at http://libguides.tulane.edu/sociology. If you need a bit more help
getting started, check out http://libguides.tulane.edu/socialsciencedata for great
advice that will clarify the steps you should take.

2 Sources of Survey Data


Survey data is by far the most widely used source of social science data. The sources below
are generally drawn from representative samples of a larger population. Some use a
panel design while others are cross-sectional.
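
The difference between the two designs can be made concrete with a small Python sketch. The records, respondent IDs, and function names below are hypothetical and are not drawn from any of the surveys in this section. The only point is that a panel re-interviews the same respondents, while repeated cross-sections draw a fresh sample each time.

```python
from collections import defaultdict

def waves_per_respondent(records):
    """Map each respondent ID to the sorted list of survey years in which they appear."""
    seen = defaultdict(set)
    for respondent_id, year in records:
        seen[respondent_id].add(year)
    return {rid: sorted(years) for rid, years in seen.items()}

def looks_like_panel(records):
    """Return True if at least one respondent was interviewed in more than one year."""
    return any(len(years) > 1 for years in waves_per_respondent(records).values())

# Hypothetical (respondent_id, survey_year) pairs, for illustration only.
cross_section = [("a1", 2006), ("a2", 2006), ("b1", 2008), ("b2", 2008)]
panel = [("p1", 2006), ("p2", 2006), ("p1", 2008), ("p2", 2008)]

print(looks_like_panel(cross_section))  # False
print(looks_like_panel(panel))  # True
```

In practice the codebook states the design outright; a check like this is only a quick sanity test on a file you have already downloaded.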

2.1 The General Social Survey


The General Social Survey (GSS) is one of the most widely used datasets in the social
sciences. Starting in 1972, the GSS was conducted annually in most years until 1994, and it
has been biennial since then. While each survey year of the GSS represents a cross-section of the
US population, recent waves have started to include a panel design. This survey focuses
on a wide variety of attitudinal, political, and socioeconomic measures. In addition, several
questions are asked in all or most iterations of the survey, making it possible to examine
attitudinal change over time. The data, methodology, and documentation are freely available
from http://www3.norc.org/gss+website/. The link also provides several online analysis
tools that can be used to quickly examine the distributions of variables of interest.
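
As a rough illustration of examining change over time with repeated cross-sections, the Python sketch below computes the share of respondents agreeing with a single question in each survey year. The responses are made up and the function name is hypothetical; nothing here is real GSS data. A real analysis would normally also apply the survey weights described in the documentation, which this sketch ignores.

```python
from collections import defaultdict

def share_agreeing_by_year(responses):
    """Return {year: proportion of respondents who agreed} from (year, agreed) pairs."""
    counts = defaultdict(lambda: [0, 0])  # year -> [number agreeing, number asked]
    for year, agreed in responses:
        counts[year][0] += int(agreed)
        counts[year][1] += 1
    return {year: agree / total for year, (agree, total) in sorted(counts.items())}

# Made-up responses to one repeated question; not real GSS data.
responses = [
    (1990, True), (1990, False), (1990, False), (1990, False),
    (2000, True), (2000, True), (2000, False), (2000, False),
    (2010, True), (2010, True), (2010, True), (2010, False),
]

for year, share in share_agreeing_by_year(responses).items():
    print(year, f"{share:.2f}")
```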

2.2 The World Values Survey


The World Values Survey (WVS) is an opinion survey and is somewhat similar to the GSS in
terms of the questions and topics. Unlike the GSS, however, the WVS provides representative
samples for about 90 countries ranging from the (post-)industrial democracies to the Global
South. As a result, the WVS is an excellent dataset if you are interested in comparing national
or regional variation in opinion. Starting in 1981, the WVS has taken place approximately
every 2 to 4 years, and the aggregate data file contains over 250,000 respondents. The repeated
waves also allow the examination of change over time within each country. The data, codebooks, and
methodology files are freely available at: http://www.worldvaluessurvey.org/. There are
also basic data analysis tools at the linked webpage.

2.3 The American National Election Studies


The American National Election Studies (ANES) is a nationally representative survey of
American voters, emphasizing political participation, public opinion, and voting trends. It
is extensively used in political science and political sociology. The ANES has been repeated
for every presidential election, starting in 1948. While it is a cross-sectional study at the core,
more recent modules of the ANES have adopted a panel design. The data, documentation,
and methodology are freely available online at http://www.electionstudies.org/.

2.4 The National Longitudinal Survey of Youth, 1979


The National Longitudinal Survey of Youth, 1979 (NLSY79) is a panel study of about 13,000
youths initially interviewed in 1979. The respondents were interviewed annually until 1994, and
then the NLSY79 became biennial. The NLSY79 contains extensive information on a variety of
topics, including each respondent’s demographic background, educational attainment, career
and occupational history, marital status, and drug/alcohol use. The data, codebooks, and
methodology for the NLSY79 are freely available at http://www.bls.gov/nls/nlsy79.htm.

2.5 The National Longitudinal Study of Adolescent Health


The National Longitudinal Study of Adolescent Health (Add Health) is a nationally
representative panel study of American high school students. The first set of interviews was
conducted in 1994–1995 with a sample of students in grades 7 through 12. There have since been
four follow-up interviews, the most recent of which was completed in 2008. Due to its panel
design, the Add Health provides an excellent way to assess how a nationally representative
sample grew into adulthood. The survey provides measures of each respondent’s family
background, community of residence, educational attainment, romantic relationships, friendship
circles, and many other factors. More recent surveys also contain biological and genetic
information. The unrestricted version of the Add Health is freely available from the Carolina
Population Center at http://www.cpc.unc.edu/projects/addhealth. It should be noted
that the Add Health is complex, and it requires a relatively significant effort to master the
nuances of the survey.

3 Sources of Population Data


There are a wide variety of places to access data and statistics about the US population. To
keep things brief, the major repositories are discussed rather than each of the hundreds of
individual databases. With a little exploration, it is easy to locate population data on nearly
every demographic topic at the sources below. This section focuses primarily on population
and demographic data for the US; data sources for other countries are briefly discussed in
section 4 below. Last, it is a very good idea to use Social Explorer to get started, which is
available through Tulane’s Library Databases.

3.1 Census Bureau


The Census Bureau is the standard source for information on the American population, both
currently and historically. The main portal for the Census Bureau is http://www.census.gov/.
Of particular note to sociologists are:

1. The Census of Population: Conducted every 10 years, the Census began in 1790 and
the last census was taken in 2010. The Census Bureau has devoted extensive resources
to getting Census data online, and it is relatively straightforward to retrieve data from
the 1990, 2000, and 2010 censuses.

2. The American Community Survey: Starting in 2007, the ACS is an annual survey
giving a snapshot of the demography of the US between census years. Data and
documentation for the ACS are available at http://www.census.gov/acs/www/.

3. The Economic Census: The economic census provides a wide variety of information
on businesses operating across the United States. Starting in 2002, the economic
census provides detailed information on every industry in the US. The economic census
occurs every 5 years (data for 2012 are just becoming available). It can be accessed at
http://www.census.gov/econ/.

Again, there are hundreds of different data products produced by the Census Bureau, so if
you are interested in population dynamics, it is well worth the effort to explore the Bureau’s
website.

3.2 Bureau of Labor Statistics


The Bureau of Labor Statistics (BLS) is the main source of data on economic dynamics,
employment and unemployment, consumer price trends, and wages in the US. The main
website is http://www.bls.gov/. Data on a wide variety of indicators are available at
http://www.bls.gov/data/, and many of the databases are time series that track
trends over time.

3.3 SODA POP


The Simple Online Data Archive for Population Studies (SODA POP) is a depository
maintained by the Population Research Institute at Penn State. It is a great source for many
databases related to population dynamics. While not all of the databases listed are available
for free download, the majority are easily accessible. The data products that are available
can be accessed through http://sodapop.pop.psu.edu/.

4 Sources of Political-Economic Data
Similar to the sources of data on the US population, there is an almost uncountable number
of places to locate cross-national information. As before, the main concentration is on the
major hubs of data that contain thousands of different data products. A little exploration
and patience will likely yield the data that you are looking for. These sources generally
provide annual or time-series estimates that are aggregated to the level of the nation-state.

4.1 The United Nations


The UN provides a wide variety of information on the demographic, educational, and
population dynamics of each country. The main data hub for the UN is http://data.un.org/.
Nearly all of the data can be easily downloaded in a variety of formats, and a robust search
tool is provided to help narrow down the amount of data that is available.

4.2 The World Bank


The WB excels at providing economic and population data for nearly every country. Many
of the data products are available as time-series, allowing for both inter- and intra-country
comparison. The main gateway for WB data is http://data.worldbank.org/. Similar to
the UN’s data service, nearly all of the WB’s datasets can be easily exported and there is a
search tool to look for specific measures.
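
The two kinds of comparison that time-series data allow can be sketched in a few lines of Python. The indicator values, country labels, and function names below are invented for illustration and are not actual World Bank figures; the sketch also assumes the data have already been exported and loaded into a dictionary keyed by country and year.

```python
def change_within_country(series, country, start_year, end_year):
    """Intra-country comparison: change in the indicator between two years."""
    return series[(country, end_year)] - series[(country, start_year)]

def rank_countries(series, year):
    """Inter-country comparison: countries ordered from highest to lowest value in one year."""
    values = {c: v for (c, y), v in series.items() if y == year}
    return sorted(values, key=values.get, reverse=True)

# Invented indicator values keyed by (country, year); not actual World Bank figures.
indicator = {
    ("Country A", 2000): 10.0, ("Country A", 2010): 14.0,
    ("Country B", 2000): 12.0, ("Country B", 2010): 13.0,
}

print(change_within_country(indicator, "Country A", 2000, 2010))  # 4.0
print(rank_countries(indicator, 2000))  # ['Country B', 'Country A']
print(rank_countries(indicator, 2010))  # ['Country A', 'Country B']
```

Here Country A grows faster within its own series and overtakes Country B in the cross-country ranking, which is the kind of pattern that only becomes visible when both comparisons are made.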

4.3 Correlates of War


The Correlates of War is the standard source of information on global conflict dynamics,
human rights abuses, and geo-political issues. Though coverage of individual countries varies,
the COW project began collecting data in 1963 and many nation-states are well represented. The
data is provided in several databases available at http://www.correlatesofwar.org/.

5 Major Data Depositories


Aside from the sources listed above, there are several major depositories for social scientific data
that store data on a wide variety of topics. If you are interested in examining a topic that
is not contained in the sources described above, there is a good chance it will be available
at one of the repositories.

5.1 Inter-university Consortium for Political and Social Research


The Inter-university Consortium for Political and Social Research (ICPSR) contains
thousands of different datasets on nearly every topic that social scientists study. Downloading
the data requires registration, which is free for Tulane students. It is usually possible to view
the codebooks and data collection without setting up an account. The URL for the ICPSR
is http://www.icpsr.umich.edu/icpsrweb/landing.jsp.

5.2 The Dataverse Network
The Dataverse Network is an open-source data sharing project maintained by the Institute
for Quantitative Social Science at Harvard. While not as comprehensive as ICPSR, the
breadth of the Dataverse Network is growing steadily over time. A list of the datasets that
are available can be found at http://dvn.iq.harvard.edu/dvn/.

5.3 The Association of Religion Data Archives


The Association of Religion Data Archives (ARDA) is the authoritative source for data
on religious affiliations, religious congregations, and religious activity more generally. The
databases cover not only the US but also many international sources of data on
religion. The ARDA is continually adding new sources and databases. It can be
found on the web at http://www.thearda.com/.