
DRAFT DO NOT CITE OR QUOTE

Coverage of the Foreign-Born Population in Censuses and Surveys: What Do We Think, What Do We Know, and What Can We Prove?

By Dean H. Judson¹

Abstract

It is widely assumed that the foreign-born population in general, and the unauthorized foreign-born population in particular, are not captured in surveys and the decennial Census as well as the native-born population. Many different estimates have been proffered, all with significant limitations. This paper attempts to summarize research on this coverage question. We first describe the methods used to correct net coverage error in censuses and surveys, and discuss the effect of net coverage error on censuses, surveys, and derived products. Then we describe the history of various coverage assertions or assumptions used in the literature (what do we think?). We then assess their relative merit (what do we know?). Finally, we attempt to assess what is provable about coverage of the foreign born (what can we prove?). We conclude by suggesting a research agenda that would address this coverage question in a statistically principled way.

Keywords: foreign-born coverage, coverage measurement, coverage correction, information integration, the coverage question

¹ This report was completed while the author was Senior Statistician for the Office of Immigration Statistics, and is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed on statistical and methodological issues are those of the author and not necessarily those of the Office of Immigration Statistics or Department of Homeland Security.


Introduction: the Coverage Question
Nonsampling Error
Net Coverage Error
Why is Net Coverage Error Important?
Correcting for Net Coverage Error in Censuses and Surveys
The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
The Effect of Net Coverage Error on the Census
Surveys: Weights and Population Controls
The Effect of Net Coverage Error on Survey Estimates
CPS
ACS
Other Surveys
Adjustments for Potential Net Coverage Error
What has been Asserted about Coverage of the Foreign Born?
What is Provable About Coverage of the Foreign Born?
Ethnographic Studies Suggest that the Foreign Born Avoid Detection
The Foreign Born Tend to Respond Later to Surveys
States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
What Can be Done to Measure Coverage of the Foreign Born?
Coverage Measurement Alternatives: Summary
Post Enumeration Surveys/Dual System Estimation
Demographic Benchmarking
Demographic Analysis
Direct enquiries
One-Way Record Linking
Synthetic Estimation
Research Proposals
References
Tables and Figures


Introduction: the Coverage Question


When conducting a census or survey, it is important that the operations of the census or survey properly represent the target population of interest. In general, the relationship between the enumerated or sampled group and the target population is referred to as the coverage of that population. In the absence of proper coverage, there is coverage error, and the survey estimates or census enumeration are biased with respect to the population. This fact is especially important for derived estimates, such as estimates of the unauthorized population, which are difficult in any case because of data gaps. The foreign-born population of the United States is estimated to consist of about 38 million people, 12.6% of the total population of about 302,000,000 as of 2007 (U.S. Census Bureau, American FactFinder, 2008). It is widely assumed that, for various reasons, the foreign-born population is particularly likely to be subject to coverage error in surveys and the census (Marcelli and Ong, 2002; Camarota, 2006). If true, this would have wide-ranging implications for the census itself, for ongoing surveys, and for estimates derived from them (including estimates of the naturalized, legal permanent resident, refugee/asylee, and unauthorized subpopulations). We will refer to this question as the coverage question, and we will repeatedly use the phrase "potential undercoverage" to emphasize that the undercoverage is potential in the absence of hard proof.² The purpose of this paper is to address the coverage question comprehensively and to define an agenda for converting assumptions about foreign-born undercoverage into proof of its existence and magnitude.
We will do so in the following steps: first, we will address the sources of error in surveys and the census; second, we will review the known effects of coverage error on surveys and the census, focusing on the foreign born where possible; third, we will summarize attempts or recommendations on how to measure coverage; fourth, we will assess which measurements of coverage are most viable; and finally, we will suggest a research agenda to definitively attack the coverage question. In what follows, we will be careful to distinguish between what an individual researcher asserts to be true (what do we think?), what the community of researchers generally agrees on (what do we know?), and concrete findings in this area (what can we prove?).³

² This is not a new concern; cf. Siegel (1976:15): "This report has tried to develop the view that it is not a practical goal to estimate directly the number of illegal aliens [sic] ... Thus, the effort to estimate the number of illegal aliens becomes principally an effort to measure the coverage of the total population by nativity" (italics added).

³ For the reader's interest, this three-part partition of knowledge is derived from the excellent movie And the Band Played On, based upon the book of the same name about the AIDS epidemic (Shilts, 1987).


Coverage error encompasses several kinds of error, distinct from sampling variability and from other forms of nonsampling error. To describe coverage error more fully, we begin with a general discussion of sampling variability and nonsampling error in surveys and the census.

Sampling Variability
The sampling error of a sample survey can be measured in several ways. The first measure that is usually desired is the variance of the sample estimate: the average, over all possible samples, of the squared deviations of the estimates from their expected value. An estimate of the variance can be obtained from the sample survey data themselves. If there are nonsampling errors or the sample is biased, then the deviations are taken around the true value of the statistic, and the measure is called the mean square error. Typically, the variance is denoted by σ² and the mean square error by MSE. Of these two measures, the MSE is more general, as the following decomposition shows. Suppose that $p$ is the value being estimated and $\hat{p}$ is the estimator of $p$; then the MSE of $\hat{p}$ is given by:

$$
\begin{aligned}
\mathrm{MSE}(\hat{p}) &= E\big[(\hat{p} - p)^2\big]
  = E\big[\big(\hat{p} - E(\hat{p}) + E(\hat{p}) - p\big)^2\big] \\
&= E\big[\big(\hat{p} - E(\hat{p})\big)^2\big] + \big(E(\hat{p}) - p\big)^2 \\
&= \mathrm{var}(\hat{p}) + \big[\mathrm{bias}(\hat{p})\big]^2 \qquad (1)
\end{aligned}
$$

The cross term $2\,E\big[\hat{p} - E(\hat{p})\big]\big(E(\hat{p}) - p\big)$ vanishes because $E\big[\hat{p} - E(\hat{p})\big] = 0$.

If $\hat{p}$ is unbiased, then the MSE is just the variance itself. In the presence of coverage error (or other kinds of nonsampling error), $\hat{p}$ is generally biased, and the squared bias contributes to the mean squared error.⁴
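A small Monte Carlo sketch can make equation (1) concrete. It assumes a hypothetical population in which 13% of persons are foreign born but only 90% of them are on the sampling frame; simple random samples drawn from the covered population then estimate the foreign-born share with a negative bias. The function name `simulate_mse` and all parameter values are illustrative assumptions, not figures from any survey.

```python
import random
import statistics


def simulate_mse(p_true=0.13, coverage_fb=0.90, n=500, reps=1000, seed=1):
    """Monte Carlo check that MSE = variance + bias**2 for a proportion.

    Hypothetical setup: a share p_true of the population is foreign born,
    but only a fraction coverage_fb of them appear on the frame, so each
    sampled person is foreign born with probability p_cov < p_true.
    """
    rng = random.Random(seed)
    # Foreign-born share among the covered (frame) population only.
    p_cov = p_true * coverage_fb / (p_true * coverage_fb + (1 - p_true))
    estimates = []
    for _ in range(reps):
        hits = sum(rng.random() < p_cov for _ in range(n))
        estimates.append(hits / n)
    mean_est = statistics.fmean(estimates)
    # Deviations around the TRUE value give the MSE ...
    mse = statistics.fmean((e - p_true) ** 2 for e in estimates)
    # ... deviations around the expected value give the variance.
    var = statistics.pvariance(estimates, mu=mean_est)
    bias = mean_est - p_true
    return mse, var, bias


mse, var, bias = simulate_mse()
print(f"MSE={mse:.6f}  var={var:.6f}  bias^2={bias ** 2:.6f}")
```

With these settings the squared bias is of the same order as the sampling variance; a larger sample would shrink only the variance term and leave the coverage bias in place.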

Nonsampling Error
In addition to the error of the estimates caused by sampling variability, there is another component of the total error in demographic data. Nonsampling error characterizes all surveys, whether sampling is used or not, including 100% surveys known as censuses. This component arises from mistakes made in eliciting, recording, and processing the response of an individual unit in the surveyed population. Every operation in a census or sample survey, and every factor within an operation, may contribute to nonsampling error. Lessler and Kalsbeek (1992:9) classify survey errors into four types:
⁴ Lohr (1999:256-258) develops a simple model illustrating the biasing effect of nonresponse. Let $N_M$ be the number of nonrespondents in a survey, $N$ the total number sampled in the survey, $\bar{p}_{RU}$ the mean response for respondents, $\bar{p}_{MU}$ the mean response for nonrespondents, and $\bar{p}_U$ the mean response for the population as a whole. The bias induced by nonresponse is then approximately

$$\frac{N_M}{N}\left(\bar{p}_{RU} - \bar{p}_{MU}\right).$$

This bias is small only if $N_M/N$ is small (there is little nonresponse) or $\bar{p}_{RU} - \bar{p}_{MU}$ is small (responders aren't much different from nonresponders).


- Frame errors: problems with developing lists of units to be sampled, duplication or omission of units, and the like;
- Sampling errors: error or variance associated with sampling itself;
- Nonresponse errors: errors associated with the failure to include a unit in the survey that should have been included, either through omission by the surveyor or through nonresponse or refusal by the respondent; and
- Measurement errors: errors associated with data collection operations, including the contribution of respondent, interviewer, and questionnaire to error.
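The nonresponse-bias approximation in footnote 4 can be checked with a small numerical sketch. For a fixed group split into respondents and nonrespondents, the identity is exact; the approximation in the footnote arises because the sample respondent mean is only approximately unbiased for the respondent mean. The function name and the example values (a hypothetical foreign-born share among respondents and nonrespondents) are illustrative assumptions.

```python
def nonresponse_bias(n_resp, mean_resp, n_nonresp, mean_nonresp):
    """Return the bias of the respondent mean, computed two ways.

    bias_direct  : respondent mean minus the mean over everyone
    bias_formula : (N_M / N) * (mean_resp - mean_nonresp), as in footnote 4
    """
    n_total = n_resp + n_nonresp
    overall_mean = (n_resp * mean_resp + n_nonresp * mean_nonresp) / n_total
    bias_direct = mean_resp - overall_mean
    bias_formula = (n_nonresp / n_total) * (mean_resp - mean_nonresp)
    return bias_direct, bias_formula


# Hypothetical: 10% foreign born among 900 respondents, 30% among 100 nonrespondents.
direct, formula = nonresponse_bias(900, 0.10, 100, 0.30)
print(round(direct, 4), round(formula, 4))  # both -0.02
```

Only 10% nonresponse is enough here to understate the foreign-born share by two percentage points, because nonrespondents differ sharply from respondents.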

Factors that are common to all of these operations, including training procedures, supervision, and the technical-staff intervention associated with each operation, may also be sources of nonsampling error. Typically, operations conducted in the office after the data are collected in the field are more amenable to control than the field operations are. It is also generally possible, through study of the office operations themselves, to measure the errors they introduce into the data collected in the field. Because nonsampling error made by the respondent, in interaction with the interviewer and the questionnaire, is more serious and less amenable to measurement than errors arising from other operations, nonsampling error is often loosely called response error. A typical example of response error is the tendency of persons in many countries to report ages ending in zero or five (Ewbank, 1981). Such response error often requires special detection and smoothing methods; these are described in Arriaga, Johnson, and Jamison (1994) and Judson and Popoff (2004).
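One standard demographic measure for detecting this kind of digit preference is Whipple's index. We use it here only as an illustration; we do not imply that it is the particular method used in the works cited above. The sketch assumes reported counts by single year of age are available as a dictionary; the function name is hypothetical.

```python
def whipple_index(counts_by_age):
    """Whipple's index of age heaping on terminal digits 0 and 5.

    counts_by_age maps single years of age to reported counts. The index
    compares the count at ages 25, 30, ..., 60 with one-fifth of the count
    at ages 23-62: 100 means no preference for 0 and 5, and 500 means every
    person in that range reported an age ending in 0 or 5.
    """
    heaped = sum(counts_by_age.get(age, 0) for age in range(25, 61, 5))
    total = sum(counts_by_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no persons aged 23-62")
    return 100 * heaped / (total / 5)


uniform = {age: 1000 for age in range(0, 90)}
print(whipple_index(uniform))   # 100.0: no preference
heaped = {age: 1000 for age in range(25, 61, 5)}
print(whipple_index(heaped))    # 500.0: all ages end in 0 or 5
```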

Net Coverage Error


Coverage error can take two forms: overcoverage and undercoverage. The former implies that a unit appears in the frame or sample more than once; the latter implies that a unit that should appear does not appear at all. Typically, the difference between overcoverage and undercoverage is referred to as net coverage error. Net coverage error can be positive, negative, or zero, and a net error of zero can arise either because there are no coverage errors or because overcoverage and undercoverage offset each other. Furthermore, net coverage error can vary across geographic, economic, or demographic subgroups. That is, one group could have a positive net coverage error and another group a negative one, so that one group is overcovered and the other undercovered. This state of affairs is known as differential coverage error.

At this point it is useful to discuss generally the causes of over- and undercoverage, before turning to the specific issues associated with the foreign born. Several authors (Fein, 1990; de la Puente, 1993; Anderson and Fienberg, 1999; Judson and Popoff, 2004) enumerate sources of coverage error, which we summarize here. One source of overcoverage found in Census 2000 was erroneous enumeration and duplication in the Master Address File frame (Jones, 2003). In the Census 2000 context, the operations that built the Master Address File created some units that were later found to be erroneous enumerations, or created multiple records for the same address. (For example, the same physical location might have multiple ways to write the address, and the file would contain all of them.) Comparing demographic housing benchmarks with the 1999 Decennial Master Address File, the control file for subsequent operations, implied a 5.3% overcoverage of addresses in that file. This led to overenumeration when the address received multiple forms or in-person followups (Fay, 2001). Similarly, overcoverage can be caused by accidental person duplication. For example, a census form might enumerate a student living away from home at college, while the student's parental household also enumerates that same student. As another example, persons in assisted-living situations might be enumerated both at a group quarters facility and at a household. The Census Bureau estimates a lower bound of 5.8 million such person duplicates in Census 2000 (Mule and Fenstermaker, 2003).

By contrast, undercoverage comes from many sources. The Census Bureau enumerates many of them: high mobility; rentership; language barriers; neighborhood resistance; irregular housing; non-standard living arrangements; loose attachment to a particular household; and concerns regarding confidentiality (de la Puente, 1993; Darga, 2000; Camarota, 2006). To this list might be added an active desire for concealment (Tourangeau, et al., 1997; Ellis, 1995; Valentine and Valentine, 1971).
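The distinction between net and differential coverage error can be illustrated with a toy calculation. The subgroup names and counts below are hypothetical; the sketch simply defines net coverage error as overcoverage minus undercoverage, so that positive values mean net overcoverage.

```python
def net_coverage_error(groups):
    """Net coverage error (overcount minus undercount) by group and in total.

    groups maps a group name to a dict with 'overcount' (duplicates and
    erroneous enumerations) and 'undercount' (persons missed).
    Positive values indicate net overcoverage; negative, net undercoverage.
    """
    by_group = {name: g["overcount"] - g["undercount"] for name, g in groups.items()}
    return by_group, sum(by_group.values())


# Hypothetical counts: the two groups' errors offset in the total.
groups = {
    "group_a": {"overcount": 50_000, "undercount": 20_000},
    "group_b": {"overcount": 10_000, "undercount": 40_000},
}
by_group, total = net_coverage_error(groups)
print(by_group, total)  # {'group_a': 30000, 'group_b': -30000} 0
```

A total of zero here hides a 30,000-person net undercoverage of one group, which is why differential coverage matters for subgroups such as the foreign born even when aggregate coverage looks good.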

Why is Net Coverage Error Important?


When a census or survey produces estimates, these estimates typically are not derived directly from the raw survey or census responses, because even with extensive operational quality control and intensive effort, the raw data do not fully represent the population. As Shapiro and Kostanich (1988:443) state: "we believe there is not general awareness of the deleterious effects of response error, and it is rarely estimated. In poorly designed and conducted household surveys, there can be many serious problems. In even the best household surveys, however, undercoverage and response error tend to be high and, in our opinion, are the two most important problems in the sample survey field" (italics added). To address this situation, a variety of processing steps and adjustments are applied to the data to make them represent the population of interest. (These steps differ for a 100% enumeration and a sample survey, so we address them separately.) Typically, the goal is to represent the population as a whole, not necessarily any particular subgroup. The question we address, therefore, is: do these adjustments correct for potential undercoverage of the foreign-born population, and if they do not, how much undercoverage remains?

Correcting for Net Coverage Error in Censuses and Surveys


In this section we describe attempts to correct for net coverage error in the census and in surveys. The approaches differ slightly because of their different data collection contexts (100% enumeration versus sample selection). One area that we will not consider is item imputation, which is required when item nonresponse occurs (failure to collect one or more data elements from a survey or census form). We will, however, consider unit imputation caused by unit nonresponse (where entire sample or enumeration units are missed).

The Census: Count Imputation and Post-Enumeration Survey Coverage Correction Factors
In the census context, extensive efforts are made to achieve a complete enumeration: these include advertising campaigns, school presentations, state and local partnership programs, careful organization, quality control checks, multiple-language questionnaires, querying neighbors, and multiple nonresponse followups, to name a few. However, despite all efforts, for a small fraction of housing units (about 1.4% of housing units in Census 2000; Zajac, 2003), nothing is known about the composition of the housing unit, even from a neighbor. In this case imputation is used.⁵ Imputation typically takes two forms: status imputation and count imputation. Status imputation determines whether the nonresponding unit should be considered occupied or vacant; if it is occupied, count imputation determines how many persons should have been enumerated there but were not. In Census 2000, both kinds of imputation were executed using the hot deck technique, which typically chooses a nearest neighbor to supply the imputed value.⁶ Item imputation is then used to fill in the characteristics of the imputed people. For Census 2000, an ambitious coverage correction program was developed, building on the earlier Post-Enumeration Survey (PES) and Post-Enumeration Program (PEP) used in past censuses. This coverage program, the Accuracy and Coverage Evaluation (A.C.E.), was a reenumeration of sampled census blocks, from address frame development to household enumeration, with specially designated interviewers. The A.C.E. sample was large enough that it could potentially have been used to adjust the census for net coverage error. The statistical theory behind the A.C.E. operation was not, as some suppose, to get the enumeration right in the A.C.E. and then use it to correct the census. Rather, it used dual system estimation theory, which assumes that the two enumerations are independent (rather than that one is superior to the other).
This independence assumption is critical to understanding why it is difficult to 1) incorporate nativity questions into the analysis in 2010 or 2) use other data sources (such as recent Lawful Permanent Resident admissions) to estimate coverage. We will deal with these technical matters later in this paper. Using the independence assumption, an estimate of the true population count is constructed for each of 840 (in 1990) or 416 (in 2000, after collapsing) predefined post-strata or estimation domains. (For example, American Indians living on reservations is one estimation domain; Hispanic renters is another.) From these domains, coverage correction factors were to be derived and used in non-sampled blocks to adjust the census (see U.S. Census Bureau, 2004, for a detailed design document). The Census Bureau did not commit in advance to adjusting the census; rather, staff conducted a series of evaluations, on the basis of which the Census Bureau was to recommend whether or not to adjust. The history of the A.C.E. is documented elsewhere, both by the Census Bureau and by others (Anderson and Fienberg, 1999; U.S. Census Bureau, 2004). In sum, the problems of duplication, of respondents' interpretation of residence rules, and of nonindependence between the original enumeration and the A.C.E. enumeration militated against using the A.C.E. to adjust the decennial results. Three decisions, in sequence, were made after several months of evaluation: 1) the A.C.E. could not be used for congressional apportionment; 2) the A.C.E. could not be used for congressional redistricting; and 3) the A.C.E. should not be used for adjusting the postcensal population estimates base (Mulry, 2006). The Census Bureau has not made public any plans for considering adjustment in 2010.

⁵ Technically, Zajac (2003) calls this "substitution," to distinguish it from "assignment" and "allocation," two forms of item imputation.

⁶ Note that these imputations were the subject of a lawsuit against the Census Bureau by the state of Utah, with Utah claiming that hot deck imputation was a form of prohibited sampling for nonresponse and the Census Bureau claiming otherwise. The Supreme Court decided in the Census Bureau's favor by a 5-4 margin. See Cantwell, Hogan, and Styles (2004) for a summary of Utah v. Evans.
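The logic of dual system estimation can be sketched with the basic capture-recapture (Petersen) estimator. This is a deliberate simplification: the production A.C.E. estimator was more elaborate, and this sketch keeps only the core ratio. The function names and numbers are hypothetical. The key point is the one made in the text: the estimator is valid only if inclusion in the census and inclusion in the A.C.E. are independent within a post-stratum.

```python
def dual_system_estimate(census_correct, p_sample_total, matches):
    """Petersen dual system estimate for one post-stratum.

    census_correct : correctly enumerated census persons in the sample area
    p_sample_total : persons found by the independent coverage survey
    matches        : persons found by both
    Valid only if census and survey inclusion are independent.
    """
    if matches <= 0:
        raise ValueError("dual system estimation needs at least one match")
    return census_correct * p_sample_total / matches


def coverage_correction_factor(census_count, census_correct, p_sample_total, matches):
    """Ratio of the dual system estimate to the raw census count."""
    return dual_system_estimate(census_correct, p_sample_total, matches) / census_count


# Hypothetical post-stratum: 1,040 census records, 40 of them erroneous.
ccf = coverage_correction_factor(1040, 1000, 950, 900)
print(round(ccf, 4))        # about 1.015
# Applied to a hypothetical non-sampled area with 20,000 census records:
print(round(20_000 * ccf))
```

If people missed by the census are also more likely to be missed by the coverage survey (correlation bias), matches are higher than independence implies, so both the estimate and the correction factor come out too low.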

The Effect of Net Coverage Error on the Census


The central purpose of the decennial census is to provide the Constitutional basis for the apportionment of representatives among the states. A second purpose is to provide small-area data for congressional redistricting (Government Accountability Office, 1998a). It is the third purpose of a census that is most germane to this paper: to serve as a population base for making postcensal population estimates at many levels of geography (national, state, county, and subcounty). These postcensal population estimates are used for federal funds distribution, but we will not focus on these distributive effects. For this paper, their most important use is as survey population controls, as described in the next section. To the extent that the census itself has differential net coverage error, that error will be propagated forward into postcensal estimates and projections. An illustration of these impacts can be seen in Robinson, et al. (2003), which compared the Demographic Analysis (DA) method with unadjusted decennial enumerations. Figure one, taken from that paper, exhibits net coverage error⁷ by 5-year age group, sex, and two race categories.

-- Insert figure one about here --

As can be seen in this figure, comparing decennial census results with the independent DA system of coverage measurement reveals important patterns by race, sex, and age. Ages 0-4 and 5-9 tend to exhibit net undercoverage for all race groups. Outside of these ages, nonblack females exhibit slight net overcoverage, while black males exhibit substantial net undercoverage from ages 20-64. Black females and nonblack males also exhibit slight undercoverage for approximately ages 30-59. Similar patterns were detected in the 1990 census (Robinson, et al., 2003). To the extent that these coverage errors remain in an unadjusted census, they propagate forward into postcensal estimates, and into surveys that use those estimates as poststratification controls.

⁷ The term used at the time was "net census undercount," with a negative net undercount implying an overcount. We prefer the term net coverage error in this context. It is important to note that the race detail DA can provide is limited to black, white, and all other races; typically the latter two are combined into a nonblack category. This is a limitation of the data sources used to construct the DA estimate, and from our point of view (our interest in coverage of the foreign born), it is a significant limitation.
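Two pieces of arithmetic underlie this section. Net coverage error measured by DA is commonly expressed as a net undercount rate, 100 × (DA − census) / DA; and a postcensal estimate built by the component method (base plus births, minus deaths, plus net migration) carries any error in its census base forward unchanged. The sketch below assumes those conventions; the counts are hypothetical and the function names are ours.

```python
def net_undercount_rate(da_estimate, census_count):
    """Percent net undercount relative to DA.

    Positive = net undercount (undercoverage); negative = net overcount.
    """
    return 100.0 * (da_estimate - census_count) / da_estimate


def postcensal_estimate(base, births, deaths, net_migration):
    """Component-method update of a census base population."""
    return base + births - deaths + net_migration


# Hypothetical group: DA says 1,000,000; the census counted 980,000.
print(net_undercount_rate(1_000_000, 980_000))   # 2.0
# The 20,000-person base error is still there after the update.
from_census = postcensal_estimate(980_000, 15_000, 8_000, 3_000)
from_da = postcensal_estimate(1_000_000, 15_000, 8_000, 3_000)
print(from_da - from_census)                      # 20000
```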

Surveys: Weights and Population Controls


In the survey environment, coverage correction has two primary components: adjustments based on survey design, and adjustments based on population controls derived from the previous census. Coverage correction typically takes the form of a series of weights, each of which attempts to correct for one or another form of net coverage error. For this section, we take a typical example based on the American Community Survey, although similar techniques are used with other surveys such as the Current Population Survey, Annual Social and Economic Supplement (ASEC). Weights are applied in phases:⁸

- Phase one: The sampling design is taken into account. This weight accounts for the combined probability of selection of the final sampled unit. Since a sample, by construction, does not cover the entire population of interest, this is the first correction: it corrects for unequal selection probabilities.
- Phase two: Any nonresponse followup design is taken into account. For the American Community Survey, approximately one in three households that respond to neither the mailing nor the telephone followup are sent to personal interview followup. By design, then, about two-thirds of these nonresponders are not followed up in person. This weight accounts for that by upweighting the households that receive personal interviews.
- Phase three: After phases one and two, a fraction of households remain noninterviews. (Some have used the phrase "hard core nonresponders" or similar language.) Little is known about these households except their geographical location and some limited information that interviewers can obtain by proxy. Phase three weights, or noninterview weights, are typically used to adjust the sampled and responding units to match the geography and information available on the nonresponders. Up to this point, the combined weights are purely survey-oriented.
- Phase four: After phase three, poststratification occurs. In this phase, independent housing unit estimates (typically constructed by demographers) are used as control totals; that is, the estimated survey housing unit values are controlled to the independent controls. The purpose is to reduce the nonresponse bias component of the mean squared error.
- Phase five: Finally, another round of poststratification occurs. In this phase, independent population estimates (again, demographically constructed) are used as control totals for person records: the estimated survey person characteristics (some combination of age, race, sex, and Hispanic origin) are controlled to the independent controls. As with housing unit controls, the purpose is to reduce the nonresponse bias component of the mean squared error.
⁸ These phases are necessarily a summary; they approximate the steps without being exact specifications. Typically the weights are combined multiplicatively to generate a final adjustment weight.
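The phased weighting described above can be sketched in a few lines. This is a simplified, hypothetical version: it collapses phases four and five into a single poststratification step, uses one set of cells for both the noninterview adjustment and poststratification, and assumes the one-in-three personal-visit subsample mentioned in the text. The record fields (`prob`, `subsampled`, `cell`, `interviewed`) are invented for the example and do not correspond to any Census Bureau file layout.

```python
from collections import defaultdict


def phased_weights(units, controls, followup_rate=1 / 3):
    """Return (unit, final_weight) pairs for interviewed units.

    units    : list of dicts with 'prob' (selection probability),
               'subsampled' (True if reached only through the personal-visit
               followup subsample), 'cell', and 'interviewed'.
    controls : dict mapping cell -> independent control total.
    Cells with no interviews would need to be collapsed; that step is omitted.
    """
    # Phases one and two: base weight times the followup subsampling factor.
    w = [(1.0 / u["prob"]) * (1.0 / followup_rate if u["subsampled"] else 1.0)
         for u in units]

    # Phase three: move noninterview weight onto interviews in the same cell.
    total, resp = defaultdict(float), defaultdict(float)
    for u, wi in zip(units, w):
        total[u["cell"]] += wi
        if u["interviewed"]:
            resp[u["cell"]] += wi
    pairs = [(u, wi * total[u["cell"]] / resp[u["cell"]])
             for u, wi in zip(units, w) if u["interviewed"]]

    # Phases four and five (collapsed): ratio-adjust each cell to its control.
    est = defaultdict(float)
    for u, wi in pairs:
        est[u["cell"]] += wi
    return [(u, wi * controls[u["cell"]] / est[u["cell"]]) for u, wi in pairs]


units = [
    {"prob": 0.01, "subsampled": False, "cell": "A", "interviewed": True},
    {"prob": 0.01, "subsampled": True, "cell": "A", "interviewed": True},
    {"prob": 0.01, "subsampled": False, "cell": "A", "interviewed": False},
    {"prob": 0.02, "subsampled": False, "cell": "B", "interviewed": True},
]
controls = {"A": 600.0, "B": 40.0}
for u, wt in phased_weights(units, controls):
    print(u["cell"], round(wt, 1))   # A 150.0, A 450.0, B 40.0
```

Each phase multiplies the previous weight, so the final weight is the product of the phase factors. The sketch also makes the key modeling assumption visible: within each cell, interviewed units stand in for noninterviewed ones.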


Lohr (1999:272) offers this caution about this sequence of adjustment weights: "The models for weighting adjustments for nonresponse are strong: in each weighting cell, the respondents and nonrespondents are assumed to be similar. ... These models never exactly describe the true state of affairs, and you should always consider their plausibility and implications. It is an unfortunate tendency of many survey practitioners to treat the weighting adjustment as a complete remedy and to then act as though there were no nonresponse" (italics added).

The Effect of Net Coverage Error on Survey Estimates


The effect of net coverage error on survey estimates can be summarized by the coverage ratio (Shapiro and Kostanich, 1988). A coverage ratio compares the survey's estimate of the number of people with a particular characteristic, before adjustment to population controls, with the corresponding estimate from updated decennial census figures. For example, a coverage ratio of .95 for males aged 50 to 59 indicates that the survey estimate of the number of people in this subpopulation is 95% of the updated census population estimate. Occasionally, the coverage ratio exceeds 1.0, indicating overcoverage of a particular category.
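Under the definition above, a coverage ratio is simply a quotient of two estimates for the same group. The sketch below uses hypothetical figures chosen to reproduce the .95 example in the text; it also prints the reciprocal, which, assuming a simple one-step ratio adjustment, is the factor poststratification applies to bring the survey estimate up to the control.

```python
def coverage_ratio(survey_estimate, control_total):
    """Survey estimate (before population controls) divided by the control."""
    return survey_estimate / control_total


# Hypothetical: males aged 50-59, in millions.
ratio = coverage_ratio(19.0, 20.0)
print(ratio)            # 0.95
print(1 / ratio)        # ratio-adjustment factor, about 1.053
```

A ratio above 1.0 gives a factor below 1.0, so overcovered groups are weighted down by the same mechanism.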

CPS
Coverage ratios for the Current Population Survey in the 2001-2004 period are summarized at http://www.bls.census.gov/cps/basic/perfmeas/coverage.htm. As noted on that page, average coverage ratios are typically about .90, and coverage ratios for males are typically lower than for females; this is particularly prominent for black males in that survey. Hispanic males and males of other races also have low ratios, in the .80 range. The coverage ratios also appear to trend slightly downward over the period.

ACS
The American Community Survey maintains a data quality web page, which summarizes coverage ratios.⁹ Some evidence of coverage differentials has also been presented at statistical meetings (e.g., Bruce, Navarro and Ahmed, 2007). At the national level, unadjusted ACS estimates exhibit coverage ratios of between .94 and .97 relative to total population estimates through the 2000-2006 period. Males approach 8% undercoverage, females about 4%. Hispanic coverage ratios range from .897 to .964, non-Hispanic white from .949 to .971.

Other Surveys
We only briefly mention other surveys. Three surveys that provide useful information on the foreign-born population are the Survey of Income and Program Participation (SIPP), the New Immigrant Survey (NIS), and the National Agricultural Workers Survey (NAWS).
⁹ Because ACS samples for personal interview followup, the coverage ratio is necessarily defined slightly differently than the analogous CPS coverage ratio. Coverage ratios can be found at: http://www.census.gov/acs/www/acs-php/quality_measures_coverage_2006.php.

The SIPP, by virtue of its longitudinal design, has an additional coverage dimension, apart from the coverage ratios found in other surveys. This additional dimension is sample attrition (the loss of formerly-responding households and people). Attrition, of course, causes later respondent groups to differ from earlier respondent groups, and requires attrition weights to correct for differential attrition across groups.
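A common form of attrition weighting, shown here as a hypothetical sketch rather than the SIPP's actual procedure, is a weighting-class adjustment: within each group, the weights of continuing respondents are inflated by the inverse of the group's weighted retention rate, so that each group's weighted total is preserved. The group labels and weights are invented for illustration.

```python
from collections import defaultdict


def attrition_adjust(panel):
    """panel: list of (weight, retained, group) tuples from the first wave.

    Returns (adjusted_weight, group) for retained cases. Each group's
    weighted total is preserved. A group with no retained cases simply
    drops out (losing its weight), so in practice it would be collapsed
    with another group.
    """
    total, kept = defaultdict(float), defaultdict(float)
    for weight, retained, group in panel:
        total[group] += weight
        if retained:
            kept[group] += weight
    return [(weight * total[group] / kept[group], group)
            for weight, retained, group in panel if retained]


panel = [
    (100.0, True, "recent_arrival"), (100.0, False, "recent_arrival"),
    (100.0, True, "native_born"), (100.0, True, "native_born"),
    (100.0, False, "native_born"),
]
print(attrition_adjust(panel))
# [(200.0, 'recent_arrival'), (150.0, 'native_born'), (150.0, 'native_born')]
```

The adjustment assumes that, within a group, those who drop out resemble those who stay: the same kind of assumption that Lohr warns about for nonresponse weighting.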

Adjustments for Potential Net Coverage Error


In a number of publications (e.g., Passel, Van Hook, and Bean, 2004; Passel and Suro, 2005; Passel, 2005, 2006, 2007; Passel and Cohn, 2008), the Pew Hispanic Center has published estimates of the size and characteristics of the unauthorized foreign-born population. This method uses the March Current Population Survey (CPS) as the base for the estimate. Undercoverage factors are derived from the Accuracy and Coverage Evaluation (Passel and Cohn, 2008: 13): a 2.0% undercount rate for legal resident immigrants (2.6% for legal resident immigrants who entered after 1980). Passel and Cohn cite Marcelli and Ong (2002) to justify a 12.5% undercount for unauthorized immigrants in the March CPS; Passel, Van Hook and Bean (2004) use a lower undercount of 9.1%. It is important to point out that the resulting estimate not only uses these assumptions but is also sensitive to them. Using the Passel, Van Hook and Bean (2004) estimates of the 2000 unauthorized population as a base, after some algebraic simplification, the resulting approximate residual estimation formula is (in millions):

$$\text{unauthorized} = \frac{20.461 - 13.136\,(1 - u_{legal})}{1 - u_{unauthorized}},$$

where $\text{unauthorized}$ is the final estimate of the number unauthorized; $u_{legal}$ is the assumed undercoverage rate for legal resident immigrants; and $u_{unauthorized}$ is the assumed undercoverage rate for unauthorized immigrants. If we fix $u_{legal}$ at 1.6%, matching Passel, Van Hook and Bean, the estimating equation simplifies to:

$$\text{unauthorized} = \frac{20.461 - 12.9}{1 - u_{unauthorized}}.$$

Thus, the unauthorized population is inflated by a factor of $1/(1 - u_{unauthorized})$.
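To see how the assumed rates feed into the estimate, the approximate formula, unauthorized = (20.461 - 13.136(1 - u_legal)) / (1 - u_unauthorized), can be evaluated directly. The following is a minimal sketch: the function names are ours, the constants are those of the approximate formula, and because that formula is itself an approximation, the printed values need not match the tabulated ones exactly.

```python
def residual_estimate(u_unauthorized, u_legal=0.016):
    """Approximate residual estimate of the unauthorized population, in millions.

    Implements (20.461 - 13.136 * (1 - u_legal)) / (1 - u_unauthorized);
    the default u_legal = 1.6% follows Passel, Van Hook and Bean (2004).
    """
    if not 0 <= u_unauthorized < 1:
        raise ValueError("u_unauthorized must lie in [0, 1)")
    residual = 20.461 - 13.136 * (1 - u_legal)  # residual before inflation
    return residual / (1 - u_unauthorized)


def inflator(u_unauthorized):
    """Factor by which the residual is inflated: 1 / (1 - u_unauthorized)."""
    return 1 / (1 - u_unauthorized)


# Sweep the assumed unauthorized undercount from 1% to 20%.
for pct in (1, 5, 10, 15, 20):
    u = pct / 100
    print(f"u = {pct:2d}%  inflator = {inflator(u):.4f}  "
          f"estimate = {residual_estimate(u):.2f} million")
```

Because 1/(1 - u) is convex in u, each additional point of assumed undercount raises the estimate by more than the previous point did.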

Table one illustrates the impact of this inflator on the final estimate (we vary the parameter from one percent to 20 percent).

-- Table one about here --


As the table shows, the possible unauthorized estimates range from 7.68 million (the lower bound) to 9.5 million (the upper bound). As the center column shows, the inflation factor increases at an increasing rate; that is, as the assumed rate grows, its impact on the final number also grows.

The Office of Immigration Statistics (OIS) is legally mandated to produce an estimate of the stock of the unauthorized population residing in the United States (Immigration and Nationality Act, Section 103(d)). The method is described (cf. Hoefer, Rytina, and Baker, 2008) as a residual method, and it is similar to the Pew Hispanic method described above:
1. A population base of the total foreign-born population is constructed from American Community Survey population estimates;
2. This base is then inflated for net undercoverage;
3. From this base are subtracted estimates of deaths, emigrants, legal permanent residents, refugees and asylees, and resident nonimmigrants; and
4. The residual is an estimate of the unauthorized population.

In this method, two undercoverage factors are applied: a 2.5% net undercoverage rate for the legal permanent resident (LPR), refugee, and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants[10] and the unauthorized. It is important to note that applying these two assumed net undercoverage rates means the resulting estimate is no longer consistent with published census estimates from the ACS. The unauthorized net coverage rate was based on previous DHS/INS unauthorized estimates, which cited Marcelli and Ong (2002). The authors note that the resulting estimate is sensitive to the assumed net undercoverage rate. As with the Pew Hispanic method, with some simplifying assumptions we can perform our own sensitivity analysis.
Ignoring some demographic particulars, and using notation similar to that above to illustrate the similarities, the 2007 formula is approximately (in millions):

$$\text{unauthorized} = \frac{28.8 - (1 - u_{legal})\,20.2\,(1 - \mathit{emig}) - 1.7\,(1 - u_{non})}{1 - u_{unauthorized}},$$

where $u_{legal}$ is the undercount rate for the legal resident population; $\mathit{emig}$ is the effective emigration rate[11]; $u_{non}$ is the undercount rate for nonimmigrants; and $u_{unauthorized}$ is the undercount rate applied to the residual unauthorized population in the numerator. Table two compares the relative sensitivity of the resulting estimate to each of the assumed rates. The shaded center line represents the assumed rates used in 2007 (again noting that this formula is an approximation).
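A one-at-a-time sensitivity analysis of this approximation can be sketched by varying each rate while holding the others at a baseline. In the sketch below, the baseline values u_legal = 2.5%, u_non = 10%, and u_unauthorized = 10% are those described above; the source does not report the effective emigration rate, so `emig = 0.10` and every range swept are placeholders chosen only to make the example run, and the printed spreads are not estimates.

```python
def ois_estimate(u_legal, emig, u_non, u_unauth):
    """Approximate 2007 residual formula, in millions."""
    numerator = (28.8
                 - (1 - u_legal) * 20.2 * (1 - emig)  # legal resident term
                 - 1.7 * (1 - u_non))                 # nonimmigrant term
    return numerator / (1 - u_unauth)  # inflate the residual for undercount


def one_at_a_time(baseline, ranges):
    """For each parameter, return (low, high) estimates over its range,
    holding every other parameter at its baseline value."""
    spans = {}
    for name, values in ranges.items():
        estimates = [ois_estimate(**{**baseline, name: v}) for v in values]
        spans[name] = (min(estimates), max(estimates))
    return spans


# u_legal, u_non, u_unauth follow the text; emig is a PLACEHOLDER assumption.
baseline = {"u_legal": 0.025, "emig": 0.10, "u_non": 0.10, "u_unauth": 0.10}
# Hypothetical ranges for illustration only; they are not the Table two ranges.
ranges = {
    "u_legal": [0.01, 0.025, 0.04],
    "emig": [0.08, 0.10, 0.12],
    "u_non": [0.05, 0.10, 0.20],
    "u_unauth": [0.05, 0.10, 0.20],
}
for name, (low, high) in one_at_a_time(baseline, ranges).items():
    print(f"{name:9s} spread of estimates: {high - low:.2f} million")
```

Comparing spreads across parameters, rather than single point values, is what a sensitivity table of this kind conveys.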
[10] Nonimmigrants are sometimes referred to as legal temporary migrants; they include visa holders who are not legally approved for long-term immigration or residence.
[11] This is an approximation to the internal formula.


-- Table two about here --

The table reveals that, over commonly used ranges, the formula is particularly sensitive to the assumed net undercoverage of the unauthorized foreign born. It is also notably sensitive to the assumed emigration rate, and less so to the coverage assumptions associated with nonimmigrants and with the legally resident foreign-born population.

We conclude with an examination of labor force estimates. During the economic prosperity of the 1990s, an anomaly appeared between the Current Population Survey's estimate of the labor force and the Current Employment Statistics (CES) survey's count of jobs reported by establishments. In essence, the two series, which normally run almost in parallel, began to diverge, with the CES reporting greater growth in jobs than the CPS reported in the labor force (Juhn and Potter, 1999). Juhn and Potter analyzed the differences as of 1999, considering three hypotheses: that the surveys treat multiple jobholding differently; that the payroll survey (CES) is adjusted upward by benchmarking; and that the working-age population was undercounted in the calculation of the household survey estimates. Their conclusion is notable: "We find that the third explanation, an underestimated working-age population, best accounts for the recent rise in the employment gap. Since the household survey calculates the level of employment by combining survey data with a census-based estimate of the U.S. working-age population, an undercount of that population will produce low employment numbers. Evidence suggests that the census has in fact historically underestimated this population. Significantly, the undercount appears to be highest among groups whose employment status is very sensitive to business cycle fluctuations. We contend that the steady expansion of the economy in the 1990s has enabled these cyclical workers to find employment. Their numbers, only partly captured in the census (and, by extension, in the household survey), have in recent years helped to boost the job count in the payroll survey, widening the gap between the surveys' employment estimates" (Juhn and Potter, 1999: 1).

The recently arrived (within 10 years) and very recently arrived (within 5 years) foreign born are more likely than longer-resident foreign-born or native-born persons to fall into the working-age population (Mosisa, 2002). To the extent that the foreign born, and the unauthorized foreign born in particular, are among these undercounted groups, their undercoverage would systematically understate the size of the labor force, particularly during expansions. The CES/CPS differential in the 1990s, and the subsequent (in 2000) discovery of additional, previously undetected net immigration (see, e.g., Robinson, et al., 2002: 22, or Nardone, et al., 2003: 12),[12] is consistent with this interpretation, and suggests that these results are sensitive to potential undercoverage of the foreign born. An analysis of the effect of CPS/CES differences on unemployment rates and other quantities of interest was presented by Schweitzer and Ransom (1999). They calculate the effect on late-1990s unemployment rates of assuming that CPS employment levels followed those of the CES. Their conclusions are presented in their Figure 1: either unemployment rates must have been much lower than reported in the CPS during that period, or labor force participation rates must have risen very quickly, or something was wrong with the population controls. Again, this is consistent with, but does not prove, the subsequent discovery of additional net immigration, and suggests that these results are sensitive to potential undercoverage of the foreign born.
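The mechanism Juhn and Potter describe can be illustrated with simple arithmetic: the household-survey employment level is a survey ratio multiplied by a population control, so people missing from the control are also missing from the employment level, even though their jobs can still be counted by establishments. The sketch below is hypothetical throughout (function names and numbers are ours); it assumes the missed group has its own employment-to-population ratio, which rises in an expansion.

```python
def employment_level(emp_pop_ratio, population_control):
    """Household-survey style employment level: survey ratio x population control."""
    return emp_pop_ratio * population_control


def missed_employment(missed_population, missed_emp_pop_ratio):
    """Jobs held by people absent from the population control.

    Such jobs can still appear in an establishment (payroll) count, so they
    show up as a gap between the two series.
    """
    return employment_level(missed_emp_pop_ratio, missed_population)


# Hypothetical: 2 million working-age people missing from the control.
missed = 2.0
for phase, ratio in (("slack labor market", 0.55), ("expansion", 0.75)):
    gap = missed_employment(missed, ratio)
    print(f"{phase}: about {gap:.2f} million jobs missing from the household survey")
```

Holding the size of the undercount fixed, the gap widens simply because the missed group's employment ratio rises, which is the cyclical pattern the quoted passage describes.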

What has been Asserted about Coverage of the Foreign Born?


Marcelli and Ong (2002) provide results of a study of foreign-born Mexicans in Los Angeles County; these results have been widely cited and used far outside their original context.[13] In this study, Marcelli and Ong developed a direct survey enquiry: respondents in households were asked directly, "Was [this person enumerated in the household] included in the 2000 questionnaire sent to the Census Department?" Later in the survey, respondents' legal statuses were also ascertained. They obtained a gross undercoverage rate of 10.6% for unauthorized respondents, 8.3% for legal immigrants, 4.5% for U.S. citizens, and 7.1% for temporary visitors. In contrast to direct enquiries, Van Hook and Bean (1998) used a demographic analysis approach, generating an expected population size (based on vital statistics) and comparing it to the observed population size in the 1990 census. Their results suggest net undercount rates ranging from 15 to 25 percent for unauthorized Mexican immigrants. In the context of evaluating coverage in Census 2000, Deardorff and Blumerman (in Robinson, 2001: Appendix A) develop several net undercoverage assumptions by migrant legal status and examine their impact on final demographic analysis estimates. In part, this was an attempt to explain the then-existing discrepancy between early A.C.E. coverage results and demographic analysis coverage results. For purposes of their scenarios, these assumptions ranged from 1-2% for legal migrants, 7-35% for legal temporary migrants, and 10-15% for the unauthorized. They note (p. A-12) that the coverage assumptions do not explain the different total populations calculated by DA and the A.C.E.

[12] Robinson, et al.: "For the 2000 DA, we were particularly concerned about the reliability of the immigration components and conducted a sensitivity analysis in response. This analysis led to the incorporation of an alternative set of DA estimates to allow for the possible understatement of immigration (specifically, undocumented immigration) in the initial DA components of growth." Nardone, et al.: "Additional studies carried out by the Census Bureau Population Division as part of the estimates evaluation indicated that the estimates of unauthorized migrants that were used in the 1990 based intercensal estimates were too low. The evaluations indicated that the residual foreign born population increased by about 5 million during the 1990 to 2000 decade rather than the 2.25 million (10 x 225,000) assumed in the 1990 based estimates."
[13] For the record, Marcelli (personal communication) considers applying these results to geographical contexts outside of Los Angeles County, or to other foreign-born populations residing in the United States, to be questionable at best.

Passel, Van Hook, and Bean (2004), Passel and Suro (2005), Passel (2005, 2006, 2007), and Passel and Cohn (2008) developed a series of estimates of the foreign-born population by legal status, including characteristics. These estimates are widely cited in the mainstream media. They assume a net undercoverage rate of 2% for legal resident immigrants (2.6% for those entering after 1980), based on census studies of net coverage error. For the unauthorized, however, a net undercoverage rate of 12.5% is assumed; Passel and Cohn (2008) cite Marcelli and Ong's (2002) work for this rate. A further source of assertions about coverage of the foreign born is the Office of Immigration Statistics estimates of the unauthorized population. As noted above, two undercoverage factors are applied: a 2.5% net undercoverage rate for the LPR, refugee, and asylee population as a whole, and a 10% net undercoverage rate for the nonimmigrants and the unauthorized. It is important in this context to quote the justification for these rates: "This was the same rate used in previous DHS estimates" (Department of Homeland Security, 2007). Of course, the previous estimates cited other previous estimates, and eventually this assumption can be traced to Office of Policy and Planning estimates developed by Robert Warren (2003), who cited Marcelli and Ong (2002). Other ranges have been applied to earlier censuses.[14] Unofficial estimates by Census staff put the undercount of illegal immigrants at about 33 percent in the 1980 census (Government Accountability Office, 1998b; Fernandez and Robinson, 1994); Passel (1986) suggested a range between 33 and 50 percent. For the 1990 census, various analyses put the figure at roughly 20-30 percent (Woodrow, 1991; Van Hook and Bean, 1997; Woodrow-Lafield, 1995).
GAO (1998) cites these and other sources in attesting to the difficulty of assessing these various assertions. For completeness, it is important to note the A.C.E. results from Census 2000, and what the non-adjustment decision of the Secretary of Commerce (coinciding with the Census Bureau's recommendation [ESCAP II]) implies. Because overall net coverage error was close to zero, and because it could not be determined that A.C.E.-based adjustments would improve distributional accuracy, no adjustment was performed. In effect, this treats the net coverage error of the foreign-born population as, for all practical purposes, zero. Implicitly, then, all population estimates built on the Census 2000 base assume that they cover the foreign-born population adequately.

What is Provable About Coverage of the Foreign Born?


While much is assumed or asserted about coverage of the foreign born, these claims are difficult to prove. In this section we survey some empirical results from the literature and provide some results of our own. These results suggest that net undercoverage of the foreign-born population is higher than for the rest of the population. In the following sections, we use the term "recent arrival" to refer to foreign-born persons whose year of entry is within 10 years of the current survey period, and the term "very recent arrival" to refer to those whose year of entry is within five years of the current survey period.[15]

[14] It is widely believed (e.g., Anderson and Feinberg, 1999) that census-taking improved steadily throughout the 20th century (duplication in Census 2000 possibly being a symptom of an overly aggressive attempt to eliminate undercoverage). Thus, estimates of undercoverage from earlier censuses may not apply to Census 2000.

Ethnographic Studies Suggest that the Foreign Born Avoid Detection


In the early 1990s, several ethnographic studies were performed in 29 sample areas (necessarily local and not statistically generalizable) containing some foreign-born groups. De la Puente (1993: 4) summarizes many of these results. De la Puente notes that complex households were common in sample areas with recent immigrants, especially Hispanic immigrants. These ad hoc households protect the identity of their members and thus contribute to within-household undercoverage.[16] Complex households (often containing more members than allowed by law or by building management) combine with fear of disclosure to produce avoidance of enumeration.

The Foreign Born Tend to Respond Later to Surveys


Work by Camarota and Capizzano (2004) combined ethnographic and quantitative analyses of operational ACS data. While the ethnographic results were broadly comparable with those described in the previous section, the quantitative data reveal that, in the study areas, the foreign born are more likely to be captured later in the survey process. Figures two and three, taken from Camarota and Capizzano (2004), illustrate this.

-- Insert figures two and three about here --

As these figures show, foreign-born respondents respond disproportionately in the later computer-assisted telephone interview (CATI) and computer-assisted personal interview (CAPI) phases of the ACS. Furthermore, foreign-born respondents from certain countries of birth (including some countries commonly held to be major sources of unauthorized immigration) tend to be captured later in the operational process. Complete household nonresponse represents the ultimate late response, and it is widely asserted by survey experts (with evidence given by, e.g., Treat and Stackhouse, 2002) that late responders, or households captured in nonresponse follow-up, differ systematically from early responders.

States with Higher Levels of Foreign Born, Particularly Unauthorized or Recent Arrivals, Tend to Have Lower Coverage Ratios
Figure four plots, for states, coverage ratios in the 2006 American Community Survey against the percentage of the total population that is recently arrived foreign born. Overlaid onto this plot is a simple linear regression with a 95% confidence interval. The regression line is analytically weighted to reflect the total population size of each state.

[15] The choice of years is somewhat arbitrary. Wilson (2008) uses a 10-year period to indicate "recent"; Clark and Patel (2004) use a 5-year period. The results described below are robust to either definition.
[16] One household site in this study indicated that of the thirteen persons living in the household, only six were enumerated in the 1990 census.

-- Insert figure four about here --

As can be seen, states with more recently arrived foreign born systematically tend to have lower total coverage ratios. It is important here to mention the ecological fallacy (Robinson, 1950). These are state-level data, and a correlation between recently arrived foreign born and lower coverage ratios does not necessarily imply individual-level net undercoverage. The demonstrated relationship suggests, but does not prove, such an individual-level relationship.
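The population-weighted fit of coverage ratios on recent-arrival percentages can be sketched as weighted least squares, one common reading of "analytically weighted." This minimal sketch computes only the fitted line, not the confidence band; the function name and the state data are hypothetical, made up for illustration.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y, w must have equal length >= 2")
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical states: percent recent-arrival foreign born, coverage ratio,
# and total population (the weight). Made-up numbers for illustration only.
pct_recent = [1.0, 2.5, 4.0, 6.0]
coverage = [1.00, 0.99, 0.97, 0.95]
population = [5.0, 8.0, 12.0, 30.0]
intercept, slope = weighted_linear_fit(pct_recent, coverage, population)
print(f"coverage ratio ~ {intercept:.3f} + ({slope:.4f}) x percent recent arrivals")
```

Weighting by population keeps small states from pulling the line as hard as large ones, which matters when a few populous states hold most recent arrivals.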

The Foreign Born, Particularly Recent Arrivals, Tend to Live in Areas with Higher Hard-to-Count Scores
Our next test uses the Planning Database (Robinson and Bruce, 2007; see also Bruce and Robinson, 2003; and Bruce, Robinson, and Sanders, 2001), a tool designed by the Census Bureau for planning and targeting areas that are harder to count than others. This database, built from Census 2000 short-form and long-form data, contains tabulations of variables relevant to net undercoverage. Further, for 65,184 census tracts, a hard-to-count (HTC) score is calculated. This score ranges from 0 (representing the easiest-to-count tracts, with no or very few indicators) to a theoretical maximum of 132 (representing tracts with all indicators of net undercoverage). Again we wish to be careful not to commit the ecological fallacy (Robinson, 1950). We emphasize that these are tract-level data, and while we will show that there is a correlation between recently arrived foreign-born persons and other hard-to-count indicators, we recognize that definitive proof requires individual-level data. To emphasize this point, we will continue to use the term "areas" to refer to tracts. The HTC score is a sum of weighted scores. The following description is from Robinson and Bruce (2007: 7): "a total of 12 variables that were correlated with nonresponse rates in 1990 and 2000 are used to derive the HTC score. The set of algorithms used to determine HTC scores is as follows: (1) each individual variable is sorted across geographic areas from high to low (e.g., sort tracts from highest percent poverty to lowest percent poverty), (2) scores (0 to 11) are assigned to each variable for each tract (e.g., values of 11 are given to tracts with the highest poverty rates of over 44.3 percent and values of 0 are given to tracts below the national median poverty rate of 9.9 percent in 2000), (3) the scores assigned to each of the 12 variables for a tract are summed to form a composite HTC score for the tract."
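The three quoted steps can be sketched in code. The description gives only the endpoints of the 0 to 11 scale (0 below the national median, 11 above a top cutoff, such as 44.3 percent for poverty); the nine intermediate cutpoints used here are evenly spaced placeholders, not the Census Bureau's actual cutpoints, and the function names are hypothetical. Comparing each value against ordered cutpoints stands in for the sort-and-assign steps.

```python
from bisect import bisect_left


def variable_score(value, cutpoints):
    """Score one HTC variable for one tract on the 0-11 scale.

    `cutpoints` holds 11 ascending thresholds: cutpoints[0] is the national
    median and cutpoints[-1] is the top cutoff. The score is the number of
    thresholds the value strictly exceeds, so values at or below the median
    score 0 and values above the top cutoff score 11.
    """
    if len(cutpoints) != 11 or list(cutpoints) != sorted(cutpoints):
        raise ValueError("expected 11 ascending cutpoints")
    return bisect_left(cutpoints, value)


def htc_score(tract, cutpoints_by_variable):
    """Sum the per-variable scores; with all 12 variables the maximum is 132."""
    return sum(variable_score(tract[name], cuts)
               for name, cuts in cutpoints_by_variable.items())


# Poverty: the 9.9 (median) and 44.3 (top) endpoints come from the quoted
# description; the nine values in between are evenly spaced PLACEHOLDERS.
POVERTY_CUTS = [9.9 + i * (44.3 - 9.9) / 10 for i in range(11)]

print(variable_score(5.0, POVERTY_CUTS))   # 0  (below the median)
print(variable_score(20.0, POVERTY_CUTS))  # 3  (middle of the placeholder scale)
print(variable_score(50.0, POVERTY_CUTS))  # 11 (above the top cutoff)
```

Summing twelve such scores, each capped at 11, is what bounds the composite at 12 x 11 = 132.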
The final HTC score is the sum of the ratings of the following components:
1) Percent vacant housing units;
2) Percent housing units that are not single-family structures;
3) Percent renter-occupied housing units;
4) Percent crowded occupied units;
5) Percent of families not in a husband/wife configuration;
6) Percent of occupied housing units with no phone service;
7) Percent of persons 25 and older who have not graduated from high school;
8) Percent of persons below the poverty level;
9) Percent of persons receiving public assistance;
10) Percent of persons 16 and older who are unemployed;
11) Percent of households that are linguistically isolated;
12) Percent of occupied housing units where the owner moved in within 1999-2000.

We have demonstrated above that the foreign born, and particularly recent-entry foreign born, tend to have characteristics that correlate with the items entering into the HTC score. Do they also tend to live in areas that have high HTC scores? Figure five presents a simple graphical analysis, with a linear regression fit (weighted by population size in each tract) overlaid onto the point cloud. Each point is a census tract.

-- Figure five about here --

As can be seen, there appears to be a correlation between the percentage of foreign born who entered within the last ten years and the HTC score. Paradoxically, when a linear regression is fit, for tracts with high levels of foreign born (beginning at about 80% of the total population), the linear fit begins to make predictions that are impossibly high (that is, above 132). This simple graphical finding suggests that there is residual variance to be explained that is not captured by the components included in the score itself. To test this hypothesis, we run a regression of the HTC score on the twelve components that enter into that score, with the addition of one variable: the percentage of the total population that is foreign born and has a year of entry within the last ten years. The result of this regression is presented in table three.
-- Insert table three about here --

As can be seen, and not surprisingly, the components of the HTC score tend to predict that score. Additionally, holding all other components of the score constant, every one-percentage-point increase in the recent-entry foreign-born share of a census tract is associated with an expected 0.475-point increase in the HTC score.
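The "holding all other components constant" reading comes from the fact that each coefficient in a multiple regression is a partial effect. The sketch below solves the weighted normal equations (X'WX)b = X'Wy in plain Python, so it covers both ordinary and population-weighted least squares; the function name and the toy data are hypothetical, and the sketch reports coefficients only, without standard errors.

```python
def weighted_least_squares(rows, y, weights):
    """Solve (X'WX) beta = X'Wy with an intercept; returns [b0, b1, ..., bk].

    rows    -- one list of predictor values per observation (e.g., per tract)
    y       -- outcome values (e.g., HTC scores)
    weights -- observation weights (use all 1.0 for ordinary least squares)
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept
    k = len(X[0])
    # Normal equations A beta = b.
    A = [[sum(w * xi[p] * xi[q] for w, xi in zip(weights, X)) for q in range(k)]
         for p in range(k)]
    b = [sum(w * xi[p] * yi for w, xi, yi in zip(weights, X, y)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta


# Toy data built exactly as y = 2 + 3 * component + 0.5 * recent_share.
rows = [[0, 0], [1, 0], [0, 2], [1, 2], [2, 5]]
y = [2.0, 5.0, 3.0, 6.0, 10.5]
weights = [1.0, 1.0, 1.0, 1.0, 2.0]
print([round(v, 6) for v in weighted_least_squares(rows, y, weights)])  # [2.0, 3.0, 0.5]
```

The last coefficient plays the role of the recent-arrival term: the change in the predicted score per unit of that share with the other predictor held fixed.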

Demographic Characteristics of Individual Foreign Born Persons, Particularly Recent Arrivals, Correlate With Hard to Count Indices
In previous sections, we have dealt with aggregate data at the state and tract levels. First, we have shown that coverage ratios tend to decline as the percentage of the population who are recently arrived foreign born increases. Second, we have shown that HTC scores for tracts tend to increase as the percentage of the population who are recently arrived foreign born increases, and that this relationship remains even when the components of the score itself are used to predict it. In each of the preceding sections we have noted the potential for the ecological fallacy. In this section we use microdata from the 2007 American Community Survey Public Use Microdata Sample to examine whether the components that correlate with net undercoverage are concentrated among individuals who are foreign born, in particular recent arrivals. We use the characteristics of individuals and households identified by the Census Bureau's Planning Database (Robinson and Bruce, 2007) as predictive of low return rates and hard-to-count status. Our analysis is simply framed. We have partitioned the population into four groups: the native born (including those born abroad of American parents); the older foreign born (foreign-born persons whose year of entry is more than 10 years before the survey date, i.e., before 1997); the recent foreign born (foreign-born persons whose year of entry is 6 to 10 years before the survey date, i.e., 1997-2001); and the very recent foreign born (foreign-born persons whose year of entry is within five years of the survey date, i.e., 2002 or later). We then compared whether these four groups differ on each of the individual hard-to-count characteristics. Recall that the characteristics identified in the HTC score are: vacant housing units, housing units that are not single-family structures, renter-occupied housing units, crowded occupied units, families not in a husband/wife configuration, occupied housing units with no phone service, persons over 25 who have not graduated from high school, persons below the poverty level, persons receiving public assistance, persons aged 16 and over who are unemployed,[17] households that are linguistically isolated, and occupied housing units where the owner moved in within 1999-2000.
With the exception of vacancy (which does not apply to individual respondents), we look at each characteristic individually, rather than in a multivariate context, performing simple difference-of-proportions (percentages) tests (Agresti, 1990) to assess statistical significance. In each case our null hypotheses are stated simply:

H0(1): The percentage of persons who exhibit the hard-to-count characteristic does not differ between native born and non-recent foreign born.
H0(2): The percentage of persons who exhibit the hard-to-count characteristic does not differ between native born and recently arrived foreign born.
H0(3): The percentage of persons who exhibit the hard-to-count characteristic does not differ between native born and very recently arrived foreign born.

Our alternative hypotheses are the converses of these null hypotheses. We use 90% two-sided confidence intervals: if the confidence intervals of the two comparison groups do not overlap, we reject the null hypothesis; otherwise, we retain it. Table four contains these tests.
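The interval-overlap decision rule can be sketched as follows. This minimal version assumes simple-random-sampling standard errors for each group's proportion; estimates from a complex survey such as the ACS would normally call for design-based standard errors instead. The function names and the example counts are hypothetical. Requiring non-overlapping intervals is more conservative than directly testing the difference, since two intervals can overlap even when the difference itself is significant.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.90):
    """Two-sided Wald confidence interval for a proportion (SRS assumption)."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def reject_equal_percentages(group_a, group_b, level=0.90):
    """Reject H0 (no difference) when the two groups' intervals do not overlap.

    Each group is a (number with the characteristic, group size) pair.
    """
    ci_a = proportion_ci(*group_a, level=level)
    ci_b = proportion_ci(*group_b, level=level)
    return not intervals_overlap(ci_a, ci_b)


# Hypothetical counts: persons in crowded housing among 1,000 native born
# and among 1,000 very recently arrived foreign born.
print(reject_equal_percentages((30, 1000), (60, 1000)))  # True
print(reject_equal_percentages((50, 100), (55, 100)))    # False
```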

[17] Note that persons who are out of the labor force are not considered unemployed in this definition; thus this percentage does not correspond to the traditional unemployment rate definition.


-- Insert table four about here --

We summarize the conclusions from this table:
1. The foreign born, particularly recent arrivals, are more likely to live in housing units that are not single-family structures;
2. The foreign born, particularly recent arrivals, are more likely to be renters rather than owners;
3. The recently arrived foreign born are more likely to live in crowded housing units;
4. The recently arrived foreign born are more likely to live in non-husband/wife families;
5. The recently arrived foreign born are more likely to live in housing units without a telephone;
6. The foreign born (age 25+) are more likely not to have graduated from high school;
7. The recently arrived foreign born are more likely to live in households below the poverty line; the non-recently-arrived foreign born are less likely to do so;
8. The non-recently-arrived foreign born are more likely to be receiving public assistance; the recent foreign born are less likely to be receiving public assistance;
9. The recently arrived foreign born are more likely to be unemployed;
10. The foreign born, particularly recent arrivals, are more likely to live in linguistically isolated households;
11. The recently arrived foreign born are more likely to be recent movers; the non-recently-arrived foreign born are less likely to be recent movers.

In sum, of the eleven characteristics that are considered to make a household hard to count, the recently arrived foreign born are more likely to exhibit ten of them, and less likely to exhibit one (public assistance receipt). The non-recently-arrived foreign born are more likely to exhibit six of these characteristics; for three characteristics the native born and non-recently-arrived foreign born are statistically indistinguishable; and for two characteristics the non-recently-arrived foreign born are less likely to exhibit the characteristic.

What Can be Done to Measure Coverage of the Foreign Born?


Despite the stated importance of this topic to many agencies (Government Accountability Office, 1998: 57-58), there have been few attempts to measure net undercoverage of the foreign born. This section details the alternatives, with appropriate caveats.

Coverage Measurement Alternatives: Summary


Coverage measurement is a difficult topic to summarize, but we attempt a brief summary here. Table five describes the coverage measurement approaches and their implied coverage ratios. One can imagine defining coverage as "doing it better," that is, determining what enumeration should have occurred. In this case the coverage ratio is the enumeration divided by the truth (as determined by the better system). Dual system estimation is based on "doing it independently." In this case the coverage ratio is the enumeration divided by the estimate from the independence model. The remaining methods (reverse record check, megalist/benchmark, and demographic accounting) each assume that the traced sample, the megalist or benchmark list, or the demographic summation, respectively, determines the coverage ratio (Popoff and Judson, 2004: 634 list these five alternatives; Judson, 2006 discusses coverage ratios).

-- Insert table five about here --

Post Enumeration Surveys/Dual System Estimation


A standard for evaluating coverage in a census is the dual system estimation (DSE) method using a post-enumeration survey (see Popoff and Judson, 2004: 633-637 for a summary). The DSE method has a long history; see, e.g., Chandrasekaran and Deming, 1949; Marks, Seltzer and Krotki, 1974; Wolter, 1986; and Hogan, 1992, 1993 and 2000, for theory and examples of the method in practice. The 1950 Census was the first to use a post-enumeration program, and the method has been used in subsequent censuses with increasing technical sophistication. The DSE method is a microdata approach (focusing on individual responses) rather than an aggregate approach (focusing on demographic aggregates; Judson, 2006). The 2000 Accuracy and Coverage Evaluation was based on the census short form. Because the nativity question is not asked on the short form, individual respondents could not be assigned to a foreign-born estimation domain, so it was not possible to construct coverage estimates for that domain. It has been asserted (e.g., Hogan, 2008) that technical and policy issues make it impossible to use dual system estimation to construct a coverage estimate specifically for the foreign born. The operative phrase here is, of course, "dual system estimation." Assuming that a dual system estimate is the gold standard, and noting that no legislative mandate exists to construct an estimate of coverage specifically for the foreign born,[18] the question is moot: without asking nativity on both the 2010 census and the Census Coverage Measurement (CCM) survey, it is not possible to classify individuals sufficiently to construct a full dual system estimate. What is not spoken of is the possibility of some other kind of coverage estimate.[19] It is to these other kinds of estimates that we now turn.
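In its simplest textbook form, dual system estimation is the capture-recapture (Lincoln-Petersen) estimator: with C persons counted in the census, P persons counted in the post-enumeration survey, and M persons matched between the two, the population estimate is C x P / M, and the census coverage ratio is C divided by that estimate. The sketch below implements only that textbook form; production estimators such as the A.C.E. also adjust for erroneous enumerations, imputations, and post-stratification. The domain labels and counts are hypothetical. The point it illustrates is that a domain-specific estimate needs all three counts tabulated by domain, which is why nativity would have to be recorded in both the census and the coverage survey.

```python
def dual_system_estimate(census, pes, matched):
    """Lincoln-Petersen estimate: census count x PES count / matched persons."""
    if matched <= 0 or matched > min(census, pes):
        raise ValueError("matched must be positive and no larger than either count")
    return census * pes / matched


def coverage_by_domain(counts):
    """counts maps domain -> (census count, PES count, matched count).

    Returns domain -> (dual system estimate, census coverage ratio).
    Every domain needs all three counts, so the domain variable must be
    recorded in both the census and the coverage survey.
    """
    results = {}
    for domain, (census, pes, matched) in counts.items():
        dse = dual_system_estimate(census, pes, matched)
        results[domain] = (dse, census / dse)
    return results


# Hypothetical counts for two estimation domains.
counts = {"domain A": (900, 100, 90), "domain B": (450, 60, 50)}
for domain, (dse, ratio) in coverage_by_domain(counts).items():
    print(f"{domain}: estimate {dse:.0f}, coverage ratio {ratio:.3f}")
```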

[18] This raises the question, of course, of whether there would be any support for such a legislative mandate, that is, a mandate to construct a dual system estimate, and a corresponding coverage correction factor, using nativity status as an estimation domain.
[19] Suppose that we stipulate, for the sake of argument, that dual system estimation is the gold standard. Even so, that stipulation raises a further question: if we cannot generate a gold-standard estimate, does that mean we should not attempt to produce some estimate, recognizing its caveats?


Demographic Benchmarking
A method that we will describe as demographic benchmarking has been presented in Pitkin and Park (2005) and proposed by Camarota (2006). For an appropriate target population, the demographic benchmarking method uses the highest-quality demographic data available to construct an estimate of that population in various aggregate quantities (e.g., age groups). These benchmarks are compared to census or survey results, and where the census or survey results differ from the demographic benchmarks, the difference is interpreted as net coverage error. The demographic benchmarking approach stands or falls on one particular assumption: that the benchmark is of sufficiently high quality to serve this role. What benchmarks have been used successfully? For a benchmark to be of sufficiently high quality, it must be measured with little or no error. Of the demographic statistics currently available, only two fully qualify: vital statistics on births and vital statistics on deaths. (In the United States, Medicare enrollment data are a candidate, with relatively minor correction for historical underenrollment; Robinson, et al., 2002.)
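The benchmarking comparison reduces to one ratio per aggregate cell: coverage ratio = census (or survey) count / benchmark, with net undercount = 1 - ratio. A minimal sketch follows, with hypothetical age groups and counts; it inherits the approach's key assumption that the benchmark itself is essentially error-free.

```python
def benchmark_coverage(census_counts, benchmark_counts):
    """Per-group coverage ratio (census / benchmark) and net undercount rate."""
    results = {}
    for group, benchmark in benchmark_counts.items():
        ratio = census_counts[group] / benchmark
        results[group] = {"coverage_ratio": ratio, "net_undercount": 1 - ratio}
    return results


# Hypothetical counts (thousands) by age group; in practice the benchmark
# would be built from high-quality sources such as birth and death records.
census = {"0-4": 19_000, "5-9": 19_800}
benchmark = {"0-4": 19_600, "5-9": 20_000}
for group, r in benchmark_coverage(census, benchmark).items():
    print(f"{group}: coverage {r['coverage_ratio']:.3f}, "
          f"net undercount {r['net_undercount']:.1%}")
```

Any error in the benchmark passes directly into the ratio, which is why the approach is only as good as its benchmark.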

Demographic Analysis
An extension of the demographic benchmarking approach is what we shall call demographic analysis (to which we alluded earlier). What distinguishes demographic analysis from benchmarking is that it attempts to construct a complete estimate of the population of interest, rather than of particular segments of it. Like benchmarking, it is fundamentally an aggregate approach rather than a microdata approach. Using demographic analysis to assess net coverage resembles the benchmarking approach: for an appropriate target population, it uses the highest-quality demographic data available to construct an estimate of that population in various aggregate quantities (e.g., age groups). For less well-known groups (e.g., immigrants, emigrants, unauthorized persons), demographically plausible models are constructed. The resulting demographic estimate is then compared to census or survey results, and the difference is interpreted as net coverage error. The challenge of the demographic analysis approach is that it requires more assumptions than the benchmarking approach. Benchmarking can rely on the relative strength of its underlying data sources: vital statistics, housing, school enrollment, or employment data. Demographic analysis, in contrast, must also rely on weaker assumptions about the components of migration.
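The accounting identity behind demographic analysis can be sketched in Python to show why the migration assumptions matter: holding everything else fixed, changing only the assumed number of emigrants changes the implied coverage ratio. The component names and all values are hypothetical illustrations, not estimates from any source.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Hypothetical components of change over one period."""
    base_population: float
    births: float
    deaths: float
    immigrants: float  # partly modeled (e.g., unauthorized arrivals)
    emigrants: float   # largely modeled; rarely measured directly


def da_estimate(c: Components) -> float:
    """Demographic accounting identity: P1 = P0 + B - D + I - E."""
    return c.base_population + c.births - c.deaths + c.immigrants - c.emigrants


def coverage_ratio(census_count: float, estimate: float) -> float:
    """Census count relative to the demographic estimate."""
    return census_count / estimate


if __name__ == "__main__":
    census = 1_029.0
    for assumed_emigrants in (10.0, 20.0):
        c = Components(1_000.0, 50.0, 30.0, 40.0, assumed_emigrants)
        est = da_estimate(c)
        print(assumed_emigrants, est, round(coverage_ratio(census, est), 3))
```

Births and deaths enter as registered counts, while immigrants and emigrants are modeled; any error in those modeled terms passes one-for-one into the estimate and hence into the implied coverage.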

Direct enquiries
Marcelli and Ong (2002), after reviewing demographic analysis and dual-system approaches (with an abbreviated discussion of synthetic estimation, to which we turn below), propose a direct enquiry as a method of estimating census undercoverage. This approach is microdata oriented, in that respondents in households are asked directly: "was [this person enumerated in the household] included in the 2000 questionnaire sent to the Census Department?" Thus, for an appropriate target population, a list of persons in that population is constructed, with adjustment for survey nonresponse. Within that target population, a direct enquiry is made as to whether each person was captured in census records; where a respondent indicates that a person was not reported on the census form, the report is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data. Obviously, the question about reporting to the census is potentially sensitive, presumably more so for foreign-born persons of unauthorized or ambiguous legal status. Because the question is sensitive, it is easy to imagine that some version of social desirability bias will affect the respondent's answer. Marcelli and Ong, while recognizing this criticism, defend the approach as follows: they work directly with the local population of interest; they use interviewers who are as non-threatening as possible; and they ask questions in the vernacular, with native speakers. All of these measures are designed to reduce response error due to respondent fear or resistance.
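The estimation step of a direct enquiry can be sketched as follows, assuming each person on the constructed list carries a nonresponse-adjusted survey weight and a yes/no answer to the inclusion question. The function name, data layout, and weights are hypothetical, and the sketch does nothing to correct response error:

```python
from typing import Iterable


def gross_undercoverage_rate(responses: Iterable[tuple[float, bool]]) -> float:
    """Weighted share of persons reported as NOT included in the census.

    Each response is (weight, reported_included), where the weight is a
    survey weight already adjusted for nonresponse. Misreporting, such as
    social desirability bias, passes straight into the estimate.
    """
    total = 0.0
    missed = 0.0
    for weight, reported_included in responses:
        total += weight
        if not reported_included:
            missed += weight
    if total <= 0:
        raise ValueError("total weight must be positive")
    return missed / total


if __name__ == "__main__":
    sample = [(1.2, True), (0.8, False), (1.0, True), (1.0, False)]
    print(round(gross_undercoverage_rate(sample), 3))
```

Only persons reported as missed enter the numerator, so the result is a gross undercoverage rate; nothing in this design detects erroneous or duplicate census enumerations.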

One-Way Record Linking


A microdata approach to estimating coverage error was tested by Heer and Passel (1987) in the Los Angeles metropolitan area. We will refer to this method as one-way record linking. For an appropriate target population, a list of persons with individually identifying information (e.g., full name, full date of birth, geographic locators) is constructed. This list is linked with the decennial enumeration list; where a list record fails to link to the census list, the nonlink is interpreted as a census coverage error, and the gross undercoverage rate is calculated from these data. The key to this approach is the assumption that the target-population list is complete, so that any difference between the list and the census enumeration must be gross census undercoverage. A weakness of the approach is the assumption that neither list contains gross overcoverage. A further weakness is that record linkage error, in particular a failure to link two records that refer to the same person, is indistinguishable from undercoverage. Privacy concerns about this use of records might arise as well.
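A minimal Python sketch of one-way linking, assuming exact matching on lightly normalized identifiers. Production record linkage would normally use probabilistic methods, so this deterministic key is a deliberate simplification; the field names and records are hypothetical.

```python
from typing import NamedTuple, Sequence


class Person(NamedTuple):
    name: str
    birth_date: str  # ISO format, e.g. "1970-03-15"
    locator: str     # geographic locator, e.g. a tract code (hypothetical)


def link_key(p: Person) -> tuple[str, str, str]:
    """Exact-match key after light name normalization (a simplification)."""
    return (" ".join(p.name.lower().split()), p.birth_date, p.locator)


def one_way_gross_undercoverage(target: Sequence[Person],
                                census: Sequence[Person]) -> float:
    """Share of target-list persons with no link in the census list.

    Every nonlink is counted as a census miss, so false nonlinks
    (linkage error) are indistinguishable from undercoverage.
    """
    if not target:
        raise ValueError("target list is empty")
    census_keys = {link_key(p) for p in census}
    nonlinks = sum(1 for p in target if link_key(p) not in census_keys)
    return nonlinks / len(target)


if __name__ == "__main__":
    target = [
        Person("Ana Lopez", "1970-03-15", "T001"),
        Person("Jin Park", "1985-11-02", "T001"),
        Person("Omar Diaz", "1992-06-30", "T002"),
        Person("Lia Chen", "1960-01-20", "T003"),
    ]
    census = [
        Person("ana  lopez", "1970-03-15", "T001"),
        Person("Jin Park", "1985-11-02", "T001"),
        Person("Lia Chen", "1960-01-20", "T003"),
    ]
    print(one_way_gross_undercoverage(target, census))  # 0.25
```

The linking runs in one direction only. Census records that never appear on the target list play no role, which is why the method says nothing about gross overcoverage.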

Synthetic Estimation
Synthetic methods for estimating coverage combine information from a coverage evaluation survey with demographic characteristics of the population of interest.20 The synthetic approach also has a long history (e.g., Gonzales, 1978) and makes use of both direct and indirect information. In fact, the intended application of the dual system estimator in the 2000 A.C.E. would itself have used synthetic estimation for nonsampled census areas: nonsampled census blocks would have been treated as an estimation domain to which coverage correction factors would have been applied synthetically. (A description of the application of the synthetic method can be found in Hogan, 2000, or Popoff and Judson, 2004.) The innovation proposed here is to apply the dual system coverage estimates, not to nonsampled census blocks, but to the foreign-born population, or some subset of it (such as recent arrivals), treated as an estimation domain, and to an estimate of the unauthorized foreign-born population, treated as a separate estimation domain. This would require tabulating the foreign-born population enumerated in Census 2000, allocating foreign-born respondents to the appropriate A.C.E. Revision II post-strata, determining the proportion of the foreign-born population that falls into each post-stratum, constructing the coverage factors for the A.C.E. Revision II post-strata, and calculating a synthetic estimate of the net coverage factor for the foreign-born population as a whole. The strength of this synthetic approach is that it would use the best available statistically designed coverage data, rather than relying on demographic assumptions or on the assumption that one or more lists is a benchmark. A weakness is that it makes the implicit assumption that, within a post-stratum, the foreign born have the same coverage factors as the population as a whole (the synthetic assumption). If it is assumed that the foreign-born population is at best as well covered as the population as a whole, then this method would generate an upper bound on the (signed) net coverage error rate: the true net undercoverage of the foreign born would be at least as large as the synthetic estimate.

20 This is a specific application of synthetic estimation in general, in which survey estimates of some specific characteristic are combined with demographic characteristics to construct the final estimand.
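The synthetic calculation described above can be sketched in a few lines of Python. This is a minimal illustration under the synthetic assumption, not the A.C.E. Revision II procedure. The post-stratum labels, counts, and coverage correction factors are hypothetical, and each factor is assumed to be the ratio of the dual system estimate to the census count in its post-stratum.

```python
def synthetic_estimate(fb_census: dict[str, float],
                       ccf: dict[str, float]) -> tuple[float, float, float]:
    """Apply post-stratum coverage correction factors to foreign-born counts.

    fb_census: enumerated foreign born in each post-stratum.
    ccf: coverage correction factor per post-stratum (DSE / census count).
    Returns (synthetic estimate, net coverage factor, net undercount rate).
    Relies on the synthetic assumption: within a post-stratum, the foreign
    born share the coverage of the post-stratum as a whole.
    """
    missing = fb_census.keys() - ccf.keys()
    if missing:
        raise KeyError(f"no coverage factor for post-strata: {sorted(missing)}")
    enumerated = sum(fb_census.values())
    if enumerated <= 0:
        raise ValueError("enumerated foreign-born total must be positive")
    estimate = sum(n * ccf[h] for h, n in fb_census.items())
    return estimate, estimate / enumerated, (estimate - enumerated) / estimate


if __name__ == "__main__":
    # Hypothetical post-strata, counts, and factors (not A.C.E. values).
    fb = {"stratum_a": 600.0, "stratum_b": 400.0}
    factors = {"stratum_a": 1.02, "stratum_b": 1.05}
    est, factor, rate = synthetic_estimate(fb, factors)
    print(round(est, 1), round(factor, 3), round(rate, 4))
```

The net coverage factor for the foreign born is a weighted average of the post-stratum factors, with weights equal to the foreign-born distribution across post-strata. Any within-stratum coverage difference between the foreign born and everyone else is invisible to this calculation.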

Research Proposals
We have argued in this paper that assessing the potential undercoverage of the foreign-born population is an important task for the statistical community. We have presented ethnographic, survey, and demographic evidence for such undercoverage. We have presented new findings that suggest, but do not prove, that the foreign-born population might be differentially undercovered both in the census and in ongoing surveys. We have shown that population estimates of public policy significance are highly sensitive to the assumed rate of net undercoverage. Given these findings, it is natural to conclude that the statistical community has a responsibility to find a way to estimate that number: the potential net undercoverage of the foreign born. We now summarize a sequence of research tasks to approach that number, beginning with the easiest to implement and proceeding to more difficult approaches.

The first method on our list is demographic benchmark studies. Pitkin and Park (2005) demonstrated that birth registry data, combined with reasonable demographic assumptions, can be used to construct a benchmark population estimate from which an estimate of coverage could be derived. Because Pitkin and Park's method is based on birth registry data, it does not suffer the weaknesses of the larger demographic analysis approach: it requires fewer difficult-to-maintain demographic assumptions. This approach would provide a net coverage error estimate.


The second method on our list is synthetic estimation. In the absence of a congressional mandate for a coverage estimate by nativity, it is still possible to cross-classify the foreign born on the characteristics evaluated in the 2000 A.C.E. (or those to be evaluated in Census Coverage Measurement in 2010). Given such a cross-classification and the published coverage correction factors, deriving a synthetic estimate is a straightforward mathematical exercise. This approach would provide a net coverage error estimate.

The third method on our list is a direct survey. Marcelli and Ong (2002) have demonstrated that direct enquiry, combined with demographic analysis, can begin to make headway in understanding the potential gross undercoverage of the foreign born. Following Marcelli's arguments, such a survey would appear to be best fielded by a trusted, non-governmental entity and designed from the bottom up to allay respondents' privacy concerns. This approach, as with one-way record linkage, would not provide information on gross overcoverage.

The fourth method on our list is a one-way record linkage study. We have described above the technical limitations of one-way record linkage, noting in particular that the assessment of coverage is biased by the presence of record linkage error. However, it appears to us that results from such a study could be adjusted for the presence of such error (e.g., Judson, 2007: 497), yielding at least some direct information about coverage of the foreign born. While this approach would provide information on gross undercoverage, it would not provide information on gross overcoverage.

The fifth method involves testing the feasibility of a post-enumeration survey in the context of the American Community Survey. As Hogan (2008) notes, Census 2010 will not have a nativity question; thus a dual system estimate using nativity as an estimation domain is not possible.
However, the American Community Survey does include a nativity question. With the development of appropriate statistical theory to account for the ACS's complex sample design, a post-ACS survey, designed along lines similar to the existing Census Coverage Measurement system, would provide coverage measurements by nativity (and, presumably, by other relevant characteristics).

Our sixth and final method involves testing nativity questions in a CCM framework for post-2010 purposes. The sensitivity of nativity questions might bias the dual system estimator by inducing correlation bias among the foreign-born population. While this is a reasonable concern, it warrants empirical testing. If, in fact, the impact of a nativity question is negligible, and a congressional mandate for such estimation were in place, then there would be no reason not to apply the gold-standard coverage measurement technique to developing a statistically principled estimate of the net coverage of the foreign born.
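To illustrate how a one-way linkage result might be corrected for false nonlinks (the fourth method above), here is a deliberately simple Python model. It is our illustrative assumption, not the adjustment described in Judson (2007). Suppose a fraction u of target-list persons were truly missed by the census, and that a truly enumerated person fails to link with probability f, a false nonlink rate that would itself have to be estimated (for instance, by clerical review of a subsample). Ignoring false links, the observed nonlink rate is q = u + (1 - u)f, which solves to u = (q - f)/(1 - f).

```python
def adjust_for_false_nonlinks(observed_nonlink_rate: float,
                              false_nonlink_rate: float) -> float:
    """Solve q = u + (1 - u) * f for the true miss rate u.

    Simplified model: false links are ignored, and f is assumed known
    and constant across the target population.
    """
    q, f = observed_nonlink_rate, false_nonlink_rate
    if not 0.0 <= q <= 1.0:
        raise ValueError("observed nonlink rate must lie in [0, 1]")
    if not 0.0 <= f < 1.0:
        raise ValueError("false nonlink rate must lie in [0, 1)")
    # q < f is consistent with no undercoverage; clip rather than report u < 0.
    return max(0.0, (q - f) / (1.0 - f))


if __name__ == "__main__":
    for f in (0.0, 0.02, 0.05):
        print(f, round(adjust_for_false_nonlinks(0.12, f), 4))
```

With a hypothetical observed nonlink rate of 0.12, a false nonlink rate of 0.05 lowers the implied miss rate to about 0.074. Even modest linkage error therefore matters, so the error rate would need to be measured as part of the study itself.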


References
Agresti, A. 1990. Categorical Data Analysis. New York: Wiley.
Anderson, M. J. and Fienberg, S. E. 1999. Who Counts? The Politics of Census-Taking in Contemporary America. New York, NY: Russell Sage Foundation.
Arriaga, E. E., Johnson, P. D., and Jamison, E. 1994. Population Analysis with Microcomputers, Vols. 1 and 2. Washington, D.C.: U.S. Department of Commerce, U.S. Census Bureau, International Programs Center.
Bill, W. 2002. A.C.E. Revision II: Calculating Aggregate Data Defined, Correct Enumeration, and Census Inclusion Rates (For Groups that Involve Aggregation Across Post-Strata). Online: http://www.census.gov/dmd/www/pdf/pp-40r.pdf.
Bruce, A., and Robinson, J. G. 2003. The Planning Database: Its Development and Use as an Effective Targeting Tool in Census 2000. Paper presented at the Annual Meetings of the Southern Demographic Association, Arlington, VA, October 24, 2003.
Bruce, A., Robinson, J. G., and Sanders, M. V. 2001. Hard-to-Count Scores and Broad Demographic Groups Associated with Patterns of Response Rates in Census 2000. Proceedings of the Social Statistics Section, American Statistical Association.
Camarota, S. and Capizzano, J. 2004. Assessing the Quality of Data Collected on the Foreign Born: An Evaluation of the American Community Survey (ACS). Online: http://www.sabresystems.com/whitepapers/CIS_whitepaper.pdf.
Camarota, S. 2006. Assessing the Quality of Data Collected on the Foreign Born: An Evaluation of the American Community Survey (ACS). Paper presented at the U.S. Census Bureau Conference, Immigration Statistics: Methodology and Data Quality. Alexandria, VA, February 13-14, 2006.
Cantwell, P. J., Hogan, H., and Styles, K. M. 2004. The Use of Statistical Methods in the U.S. Census: Utah v. Evans. The American Statistician, 58: 203-212.
Chandrasekar, C., and Deming, W. E. 1949. On a Method of Estimating Birth and Death Rates and the Extent of Registration. Journal of the American Statistical Association, 44: 101-115.
Clark, W. A. V. and Patel, S. 2004. Residential Choices of the Newly Arrived Foreign Born: Spatial Patterns and the Implications for Assimilation. California Center for Population Research On-Line Working Paper Series, CCPR-026-04, February 2004.
Darga, K. 2000. Fixing the Census Until it Breaks: An Assessment of the Undercount Adjustment Puzzle. Lansing, MI: Michigan Information Center.


De la Puente, M. 1993. Why Are People Missed or Erroneously Included by the Census: A Summary of Findings from Ethnographic Coverage Reports. Research Conference on Undercounted Ethnic Populations. Richmond, VA: U.S. Census Bureau.
Deardorff, K. and Blumerman, L. 2001. Appendix A: Estimates of the Foreign-Born Population by Migrant Status: 2000. In Robinson, J. G. 2001. ESCAP II: Demographic Analysis Results. Executive Steering Committee for A.C.E. Policy II, Report No. 1, October 13, 2001. Online: http://www.census.gov/dmd/www/pdf/Report1.PDF.
Ellis, Y. 1995. Examination of Census Omission and Erroneous Enumeration Based on 1990 Ethnographic Studies of Census Coverage. Pp. 515-520 in Proceedings of the American Statistical Association (Survey Research Methods Section). Alexandria, VA: American Statistical Association.
Ewbank, D. C. 1981. Age Misreporting and Age-Selective Underenumeration: Sources, Patterns, and Consequences for Demographic Analysis. Washington, D.C.: National Academy Press.
Fay, R. E. 2001. The 2000 Housing Unit Duplication Operations and Their Effect on the Accuracy of the Population Count. Paper presented at the Annual Meeting of the American Statistical Association, Atlanta, Georgia, August 5-9, 2001.
Fein, D. J. 1990. Racial and Ethnic Differences in U.S. Census Omission Rates. Demography, 27: 285-302.
Fernandez, E. W., and Robinson, J. G. 1994. Illustrative Ranges of the Distribution of Undocumented Immigrants by State. Technical Working Paper No. 8, October 1994. Online: http://www.census.gov/population/www/documentation/twps0008/twps0008.html.
Government Accountability Office. 1998a. Decennial Census: Overview of Historical Census Issues. GAO/GGD-98-103. Washington, D.C.: U.S. Government Accountability Office.
Government Accountability Office. 1998b. Immigration Statistics: Information Gaps, Quality Issues Limit Utility of Federal Data to Policymakers. GAO/GGD-98-164. Washington, D.C.: U.S. Government Accountability Office.
Hoefer, M., Rytina, N., and Baker, B. 2008. Estimates of the Unauthorized Immigrant Population Residing in the United States: January 2007. Online: http://www.dhs.gov/xlibrary/assets/statistics/publications/ois_ill_pe_2007.pdf.
Hogan, H. 1992. The 1990 Post-Enumeration Survey: An Overview. The American Statistician, 46: 261-269.


Hogan, H. 1993. The 1990 Post-Enumeration Survey: Operations and Results. Journal of the American Statistical Association, 88: 1047-1060.
Hogan, H. 2000. Accuracy and Coverage Evaluation: Theory and Application. Paper presented at the 2000 Joint Statistical Meetings, Indianapolis, Indiana, August 2-5, 2000.
Hogan, H. 2008. Letter to Ms. Judith Droitcour, Assistant Director, Applied Research and Methods, U.S. Government Accountability Office. Dated February 26, 2008.
Jones, J. 2003. Housing Unit Duplication in Census 2000. Census Bureau Evaluation O.10. Washington, DC: U.S. Census Bureau. Online: http://www.census.gov/pred/www/rpts/O.10.PDF.
Judson, D. H. 2006. Demographic Coverage Measurement: Can Information Integration Theory Help? Paper presented at the 2006 Joint Statistical Meetings, Seattle, WA, August 6-10, 2006.
Judson, D. H. 2007. Information Integration for Constructing Social Statistics: History, Theory and Ideas towards a Research Programme. Journal of the Royal Statistical Society, 170: 483-501.
Lohr, S. 1999. Sampling: Design and Analysis. Pacific Grove, CA: Brooks/Cole Publishing Company.
Marcelli, E. A. and Ong, P. M. 2002. 2000 Census Coverage of Foreign-Born Mexicans in Los Angeles County: Implications for Demographic Analysis. Paper presented at the 2002 annual meeting of the Population Association of America, Atlanta, GA, April.
Marks, E. S., Seltzer, W., and Krotki, K. J. 1974. Population Growth Estimation: A Handbook of Vital Statistics Measurement. New York, NY: The Population Council.
Mule, V. T., Jr., and Fenstermaker, D. 2003. Overview and Results of Further Study of Person Duplication for the A.C.E. Revision II. Paper presented at the 2003 Joint Statistical Meetings, Section on Survey Research Methods, San Francisco, CA, August.
Mulry, M. 2006. Summary of Accuracy and Coverage Evaluation for Census 2000. Statistical Research Division Research Report Series (Statistics #2006-3). Online: http://www.census.gov/srd/papers/pdf/rrs2006-03.pdf.
Passel, J. S. and Cohn, D. 2008. Trends in Unauthorized Immigration: Undocumented Inflow Now Trails Legal Inflow. Washington, DC: Pew Hispanic Center. Online: http://pewhispanic.org/files/reports/94.pdf.
Passel, J. S. 2007. Unauthorized Migrants in the United States: Estimates, Methods, and Characteristics. OECD Social, Employment and Migration Working Papers No. 57. Paris: OECD Working Party on Migration. Online: http://www.oecd.org/dataoecd/41/25/39264671.pdf.
Passel, J. S. 2006. Size and Characteristics of the Unauthorized Migrant Population in the U.S.: Estimates Based on the March 2005 Current Population Survey. Washington, DC: Pew Hispanic Center. Online: http://pewhispanic.org/files/reports/61.pdf.
Passel, J. S. 2005. Unauthorized Migrants: Numbers and Characteristics. Washington, DC: Pew Hispanic Center. Online: http://pewhispanic.org/files/reports/46.pdf.
Passel, J. S. and Suro, R. 2005. Rise, Peak and Decline: Trends in U.S. Immigration 1992-2004. Washington, DC: Pew Hispanic Center. Online: http://pewhispanic.org/files/reports/53.pdf.
Passel, J. S., Van Hook, J., and Bean, F. D. 2004. Estimates of Legal and Unauthorized Foreign Born Population for the United States and Selected States, Based on Census 2000. Report to the Census Bureau. Washington, DC: Urban Institute. Online: http://www.sabresystems.com/whitepapers/EMS_Deliverable_1_020305.pdf.
Pitkin, J. and Park, J. 2005. The Gap Between Births and Census Counts of Children Born in California: Undercount or Transnational Movement? Presented at the Population Association of America, Philadelphia, PA.
Popoff, C. L., and Judson, D. H. 2004. Some Methods of Estimation for Statistically Underdeveloped Areas. In: Swanson, D. A., and Siegel, J. (Eds.), The Methods and Materials of Demography, 2nd edition. New York: Elsevier.
Robinson, J. G. 2001. ESCAP II: Demographic Analysis Results. Executive Steering Committee for A.C.E. Policy II, Report No. 1, October 13, 2001. Online: http://www.census.gov/dmd/www/pdf/Report1.PDF.
Robinson, J. G., and Bruce, A. 2007. Tract Level Planning Database with Census 2000 Data. Online: http://www.census.gov/procur/www/2010communications/tract%20level%20pdb%20with%20census%202000%20data%2001-19-07.pdf, retrieved 12/08/2008.
Robinson, J. G., West, K. K., Adlakha, A., Bruce, A., Judson, D., and Gage, L. 2003. The Expanding Role of Demographic Analysis in 2000 and Beyond. Paper presented at the 2003 Joint Statistical Meetings, San Francisco, CA, August 3-7. Online: http://www.amstat.org/sections/srms/Proceedings/y2003/Files/JSM2003-000570.pdf.
Shapiro, G. and Kostanich, D. 1988. High Response Error and Poor Coverage Are Severely Hurting the Value of Household Survey Data. Proceedings of the Section on Survey Research Methods, American Statistical Association, 1988. Online: http://www.amstat.org/sections/SRMS/proceedings/y1988f.html.


Shores, R. 2002. A.C.E. Revision II: Adjustment for Correlation Bias. DSSD A.C.E. Revision II Memorandum Series #PP-53. Online: http://www.census.gov/dmd/www/pdf/pp-53r.pdf.
Siegel, J. 1976. Research Proposals for Estimating the Number of Illegal Aliens in the United States. Memorandum to Daniel B. Levine, U.S. Bureau of the Census, September 20, 1976.
Tourangeau, R., Shapiro, G., Kearney, A., and Ernst, L. 1997. Who Lives Here? Survey Undercoverage and Household Roster Questions. Journal of Official Statistics, 13: 1-18.
Treat, J. B., and Stackhouse, H. F. 2002. Demographic Comparison Self-Response and Personal Visit Interview in Census 2000. Population Research and Policy Review, 21: 39-51.
U.S. Census Bureau. 2004. Census 2000, Accuracy and Coverage Evaluation of Census 2000: Design and Methodology. Online: http://www.census.gov/prod/2004pubs/dssd03dm.pdf.
Valentine, C., and Valentine, B. L. 1971. Missing Men: A Comparative Methodological Study of Underenumeration and Related Problems. Unpublished report submitted to the U.S. Census Bureau, Washington, D.C.
Wilson, F. D. 2008. Components of Change in the Relative Size of the Foreign-Born Population, 1960-2000. Institute for Research on Poverty Seminar Series, University of Wisconsin-Madison, November 20, 2008. Online: http://www.irp.wisc.edu/newsevents/seminars/Presentations/Wilson-11-20-2008.pdf.
Wolter, K. 1986. Some Coverage Error Models for Census Data. Journal of the American Statistical Association, 81: 338-346.
Zajac, K. 2003. Analysis of Imputation Rates for the 100 Percent Person and Housing Unit Data Items from Census 2000. Census 2000 Evaluation B.1.a, September 25, 2003. Online: http://www.census.gov/pred/www/rpts/B.1.a.doc.


Tables and Figures


Table 1: Analysis of the Sensitivity of the Pew Hispanic Center Estimate of the Unauthorized Population to Potential Net Undercoverage
Sensitivity of final unauthorized estimates to Census coverage assumptions: PVHB, 2004

Formula: $\text{Implied unauthorized} = \dfrac{20.461 - 12.9}{1 - u_{\text{unauthorized}}}$

u_unauthorized   1/(1 - u_unauthorized)   Implied unauthorized (total)
0.01             1.010                    7.637
0.02             1.020                    7.715
0.03             1.031                    7.795
0.04             1.042                    7.876
0.05             1.053                    7.959
0.06             1.064                    8.044
0.07             1.075                    8.130
0.08             1.087                    8.218
0.09             1.099                    8.309
0.10             1.111                    8.401
0.11             1.124                    8.496
0.12             1.136                    8.592
0.13             1.149                    8.691
0.14             1.163                    8.792
0.15             1.176                    8.895
0.16             1.190                    9.001
0.17             1.205                    9.110
0.18             1.220                    9.221
0.19             1.235                    9.335
0.20             1.250                    9.451

Note: Formulae are taken from Passel, Van Hook, and Bean (2004).
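The two computed columns of Table 1 follow directly from its formula, and a short Python sketch should reproduce them to three decimals. The parameter names first_term and second_term are ours (hypothetical labels for the two constants in the formula); the table itself does not name them.

```python
def implied_unauthorized(u_unauthorized: float,
                         first_term: float = 20.461,
                         second_term: float = 12.9) -> float:
    """Table 1 formula: (20.461 - 12.9) / (1 - u_unauthorized)."""
    if not 0.0 <= u_unauthorized < 1.0:
        raise ValueError("u_unauthorized must lie in [0, 1)")
    return (first_term - second_term) / (1.0 - u_unauthorized)


if __name__ == "__main__":
    print("u     1/(1-u)  implied")
    for i in range(1, 21):
        u = i / 100
        print(f"{u:.2f}  {1 / (1 - u):.3f}    {implied_unauthorized(u):.3f}")
```

Every row is the same numerator inflated by 1/(1 - u), so each additional point of assumed net undercoverage adds slightly more to the implied total than the previous one.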


Table 2: Analysis of the Sensitivity of the Office of Immigration Statistics Estimate of the Unauthorized Population to Potential Net Undercoverage
Sensitivity of final unauthorized estimates to Census coverage and emigration assumptions: OIS, 2007

Panel A: e_rate (emigration rate) and u_legal

e_rate   Implied unauthorized   u_legal   Implied unauthorized
0.0506   9.591741               0         11.30347
0.0606   9.810574               0.0025    11.35113
0.0706   10.02941               0.005     11.39879
0.0806   10.24824               0.0075    11.44645
0.0906   10.46707               0.01      11.49411
0.1006   10.68591               0.0125    11.54177
0.1106   10.90474               0.015     11.58943
0.1206   11.12357               0.0175    11.63709
0.1306   11.34241               0.02      11.68475
0.1406   11.56124               0.0225    11.73241
0.1506   11.78007               0.025     11.78007
0.1606   11.99891               0.0275    11.82774
0.1706   12.21774               0.03      11.8754
0.1806   12.43657               0.0325    11.92306
0.1906   12.65541               0.035     11.97072
0.2006   12.87424               0.0375    12.01838

Panel B: u_non and u_unauth

u      Implied unauthorized (varying u_non)   Implied unauthorized (varying u_unauth)
0      11.59007                               10.60207
0.01   11.60907                               10.70916
0.02   11.62807                               10.81844
0.03   11.64707                               10.92997
0.04   11.66607                               11.04382
0.05   11.68507                               11.16007
0.06   11.70407                               11.27879
0.07   11.72307                               11.40007
0.08   11.74207                               11.52399
0.09   11.76107                               11.65062
0.10   11.78007                               11.78007
0.11   11.79907                               11.91243
0.12   11.81807                               12.0478
0.13   11.83707                               12.18628
0.14   11.85607                               12.32798
0.15   11.87507                               12.47302
0.16   11.89407                               12.62151
0.17   11.91307                               12.77357
0.18   11.93207                               12.92935
0.19   11.95107                               13.08897
0.20   11.97007                               13.25258

In each panel one parameter varies while the others are held at e_rate = 0.1506, u_legal = 0.025, u_non = 0.1, and u_unauth = 0.1; at these values every panel gives 11.78007.

Formula: $\text{Unauthorized} = \dfrac{28.8 - (1 - u_{\text{legal}})\,\bigl(20.2\,(1 - e_{\text{rate}})\bigr) - (1 - u_{\text{non}})\,1.7}{1 - u_{\text{unauth}}}$

Note: Formulae are taken from Hoefer, Rytina, and Baker (2008).


Table 3: Regression of Hard to Count Score on Components and "Recent Arrival" Foreign Born

Dependent variable: Hard to Count Score [0,132]

Variable                              Coefficient   (Robust t-statistic)
Pct Vacant HU                         0.271         (27.00)**
Pct Not Single U Strc                 0.117         (23.43)**
Pct Renter Occp HU                    0.163         (19.82)**
Pct Crowd Occp U                      0.088         (-1.35)
Pct Not HB WF HH                      0.269         (22.34)**
Pct Occp U No Ph Srvc                 0.617         (7.91)**
Pct Not HS Grad                       0.439         (25.30)**
Pct Prs Blw Pov Lev                   0.373         (15.51)**
Pct Pub Asst Inc                      0.696         (6.89)**
Pct Unemploy                          0.744         (11.58)**
Pct LI HH                             0.125         (2.64)**
Pct Occp HU Moved                     0.393         (25.86)**
Percent Foreign Born with YOE 90-00   0.477         (10.21)**
Constant                              -23.534       (53.36)**
Observations                          64954
R-squared                             0.93

Robust t statistics in parentheses. * significantly different from zero at the 5% level; ** significantly different from zero at the 1% level.

N = 64,954 Census tracts from Census 2000. Pct Vacant HU is the percent of housing units in the tract that are vacant; Pct Not Single U Strc is the percent of units in the tract that are not in single-unit structures; Pct Renter Occp HU is the percent of occupied units in the tract that are renter occupied; Pct Crowd Occp U is the percent of occupied units that are "crowded" (as defined by the planning database); Pct Not HB WF HH is the percent of households that are not "husband-wife" households; Pct Occp U No Ph Srvc is the percent of occupied units without phone service; Pct Not HS Grad is the percent of persons who are not high school graduates; Pct Prs Blw Pov Lev is the percent of persons who live in families below the poverty level; Pct Pub Asst Inc is the percent of total income that is from public assistance; Pct Unemploy is the percent of persons unemployed; Pct LI HH is the percent of households that are "linguistically isolated"; Pct Occp HU Moved is the percent of households that have moved in; Percent Foreign Born with YOE 90-00 is the percent of persons who are foreign born and have a year of entry later than 1989.


Table 4: Percent of Persons Native Born, Non-Recently-Arrived Foreign Born, Recently-Arrived Foreign Born, and Very-Recently-Arrived Foreign Born Who Exhibit Hard to Count Characteristics: American Community Survey, 2007

Percent of Persons Who Live in Non-Single Unit Structures
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 25.99     0.07             25.87       26.11
Foreign born, not recent    33.49     0.16             33.22       33.76
Foreign born, recent        48.86     0.38             48.23       49.48
Foreign born, very recent   59.55     0.41             58.88       60.22

Percent of Persons Who are Renters
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 30.47     0.11             30.30       30.65
Foreign born, not recent    33.84     0.17             33.56       34.11
Foreign born, recent        55.22     0.39             54.58       55.86
Foreign born, very recent   70.94     0.37             70.33       71.55

Percent of Persons Who Live In "Crowded" Living Conditions
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 1.56      0.02             1.52        1.60
Foreign born, not recent    5.19      0.07             5.08        5.31
Foreign born, recent        9.02      0.24             8.62        9.41
Foreign born, very recent   11.47     0.28             11.00       11.93

Percent of Persons Who Live In "Non-Husband/Wife" Households
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 21.33     0.08             21.20       21.45
Foreign born, not recent    20.89     0.17             20.62       21.16
Foreign born, recent        23.35     0.34             22.79       23.90
Foreign born, very recent   24.66     0.39             24.02       25.30

Percent of Persons Who Live In Housing Units Without a Telephone
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 4.54      0.03             4.49        4.59
Foreign born, not recent    3.58      0.07             3.47        3.69
Foreign born, recent        6.58      0.21             6.24        6.92
Foreign born, very recent   9.94      0.30             9.45        10.44

Percent of Persons Who are not High School Graduates
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 7.78      0.02             7.75        7.82
Foreign born, not recent    30.00     0.11             29.83       30.18
Foreign born, recent        23.39     0.22             23.02       23.76
Foreign born, very recent   18.17     0.25             17.76       18.57

Percent of Persons Who are Living Below the Poverty Line
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 15.12     0.06             15.02       15.22
Foreign born, not recent    13.49     0.11             13.30       13.68
Foreign born, recent        18.90     0.26             18.48       19.33
Foreign born, very recent   26.18     0.36             25.59       26.77

Percent of Persons Who are Receiving Public Assistance
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 0.98      0.01             0.97        0.99
Foreign born, not recent    1.16      0.03             1.10        1.21
Foreign born, recent        0.93      0.05             0.84        1.02
Foreign born, very recent   0.64      0.04             0.57        0.71

Percent of Persons Who are Unemployed
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 3.15      0.01             3.13        3.18
Foreign born, not recent    3.25      0.05             3.17        3.33
Foreign born, recent        3.59      0.09             3.44        3.74
Foreign born, very recent   3.93      0.11             3.75        4.12

Percent of Persons Who are Living in "Linguistically Isolated" Households
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 1.91      0.02             1.88        1.93
Foreign born, not recent    22.52     0.13             22.31       22.72
Foreign born, recent        38.18     0.29             37.70       38.66
Foreign born, very recent   46.48     0.45             45.73       47.22

Percent of Persons Who are Recent Movers
Group                       Percent   Standard Error   Lower MOE   Upper MOE
Native born                 14.74     0.07             14.63       14.86
Foreign born, not recent    11.92     0.11             11.74       12.09
Foreign born, recent        22.27     0.33             21.73       22.80
Foreign born, very recent   36.35     0.41             35.68       37.02

N=3,099,438 respondents from the 2007 American Community Survey PUMS; "Native born" includes persons born abroad of American parents; "Recent foreign born" are foreign-born persons whose year of entry is 1997 or later; "Very recent foreign born" are foreign-born persons whose year of entry is 2002 or later; "Not recent foreign born" are foreign-born persons whose year of entry is 1996 or earlier; "Percent" is the point estimate; "Standard error" is the replicate-weight standard error; "Lower MOE" is the 90% lower margin of error; "Upper MOE" is the 90% upper margin of error. * indicates that the reference group is statistically different from the native-born at the 90% level.
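The note defines Lower and Upper MOE as 90% bounds. A minimal Python sketch, assuming the usual ACS convention of plus or minus 1.645 standard errors (the first row of Table 4 is consistent with it), together with a simple z test of a group's difference from the native born. The note does not state which test was used, so treating the two estimates as independent is our assumption, and the function names are hypothetical.

```python
import math

Z_90 = 1.645  # standard normal critical value used for 90% ACS margins of error


def moe_bounds(estimate: float, se: float, z: float = Z_90) -> tuple[float, float]:
    """Lower and upper 90% bounds: estimate -/+ z * SE."""
    margin = z * se
    return estimate - margin, estimate + margin


def differs_at_90(est_a: float, se_a: float, est_b: float, se_b: float,
                  z: float = Z_90) -> bool:
    """Two-sided z test of a difference, treating the estimates as independent."""
    return abs(est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2) > z


if __name__ == "__main__":
    # First row of Table 4 (native born, non-single unit structures).
    lo, hi = moe_bounds(25.99, 0.07)
    print(f"{lo:.2f} {hi:.2f}")  # 25.87 26.11
    # Crowding: native born vs. foreign born, not recent.
    print(differs_at_90(1.56, 0.02, 5.19, 0.07))  # True
```

Because both groups come from the same survey, a replicate-weight standard error of the difference itself would be more exact than the independence approximation used here.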


Table 5: Methods of Coverage Measurement and Implied Coverage Ratio

Method                    Coverage ratio
Do it Better              Enumeration / Truth
Do it Independently       Enumeration / Ind. Model
Reverse record check      Enumeration / Traced Sample
Megalist/Benchmark        Enumeration / List
Demographic accounting    Enumeration / Sum

[Chart: percent net undercount (vertical axis) by 5-year age group, 0-4 through 75+ (horizontal axis), for Black males, Black females, non-Black males, and non-Black females.]

Figure 1. Percent Net Census Undercount (Net Coverage Error) by Race, Sex and Age: Revised 2000 DA. Source: Robinson, et al., 2003.


[Chart: percent of respondents by mode of data collection (Mail-In, CATI, CAPI), native vs. foreign-born, in unweighted and weighted panels.]

Figure 2. Mode of Data Collection by Nativity, Unweighted and Weighted. Source: Camarota and Capizzano, 2004.

[Chart: percent of respondents by mode of data collection (Mail-In, CATI, CAPI) for selected countries of birth.]

Figure 3. Mode of Data Collection by Country of Birth, Harris/Fort Bend Counties, 1999. Source: Camarota and Capizzano, 2004.


[Scatter plot: total population coverage ratio (vertical axis) against percent of foreign born who entered less than 10 years ago (horizontal axis), with fitted values and 95% CI.]

Figure 4. Relationship Between Recent Arrival Foreign Born and Coverage Ratios for 51 states, American Community Survey, 2006.


[Scatter plot: Hard to Count Score [0,132] (vertical axis) against Percent Foreign Born with YOE 90-00 (horizontal axis), with fitted values and 95% CI.]

Figure 5. Relationship Between Percent Recent Arrival Foreign Born and Hard to Count Scores, census tract data from 2000 Planning Database.
