Aris Spanos
To cite this article: Aris Spanos (2016) Transforming structural econometrics: substantive vs. statistical premises of inference, Review of Political Economy, 28:3, 426–437, DOI: 10.1080/09538259.2016.1154756

Nell and Errouaki, in Rational Econometric Man: Transforming Structural Econometrics, put forward their proposal on how to achieve that, by discussing the effectiveness of alternative proposals in the literature. There is a lot to agree with in this book, but the primary aim of this note is to initiate the dialogue on issues where opinions differ on how to transform structural econometrics. The discussion focuses on what I consider a crucial aspect of empirical modeling—statistical adequacy—but the authors question its practical usefulness for empirical modeling. I will attempt to make a case that ‘methodological institutionalism’ cannot be properly implemented without employing the notion of statistical adequacy.

Keywords: Statistical adequacy; structural econometrics; statistical inductive premises; substantive premises; trustworthy evidence
1. Introduction
The recent Great Recession, initiated by the financial crisis of 2008, raised numerous questions pertaining to economics as a scientific discipline, and, in particular, the soundness of
its empirical underpinnings. It also raised several foundational questions. How do we
acquire knowledge about economic phenomena? How do we distinguish between well-
grounded empirical knowledge and idle speculation stemming from overly simplistic
models or strong personal beliefs? How do we distinguish between ‘good’ theories and
‘bad’ theories? What is the role of the data in testing the adequacy of theories or hypoth-
eses? Indeed, how did we reach the current ‘unsatisfactory’ state of affairs in econometric
modeling?
One widely noted message from the recent Great Recession was that there is a
long way to go before the current approach to empirical modeling, dominated by
structural econometrics, can deliver empirical models that generate reliable inferences,
including forecasting, and trustworthy evidence for or against theories or claims, as
well as provide trustworthy guidance for economic policy-makers. The disagreements
are mainly focused on how the current methodology can be improved to achieve
these goals.
Nell and Errouaki offer a critical appraisal of structural econometrics that I largely agree
with, but there are also a few areas of disagreement. I share their view that Haavelmo
(1944) provides a useful basis to re-evaluate current econometric practice. My own per-
spective on the methodology of econometric modeling was also inspired by Haavelmo’s
monograph (see Spanos 1989, 2015). My disagreements are mostly related to the type
of changes needed to ameliorate current econometric practice.
The primary aim of this note is to initiate a dialogue on the effectiveness of different
proposals for transforming structural econometrics. My focus will be on what I consider
a crucial aspect of empirical modeling: statistical adequacy, i.e., the requirement that the
probabilistic assumptions imposed on the data are valid for the data in question. The
crucial importance of statistical adequacy stems from the fact that, in general, no trust-
worthy evidence for or against a theory or a hypothesis can be established on the basis
of a statistically mis-specified model (see Spanos 2007, 2010a).
This passage confuses statistical and substantive adequacy. Statistical adequacy is con-
cerned solely with the validity of the probabilistic assumptions imposed (implicitly or
explicitly) on the data. This is separate from substantive adequacy, which is concerned
with how well the estimated model, or a hypothesis framed in terms of that model,
sheds light on (describes, explains, predicts) the phenomenon of interest. For instance,
the assumptions of Normality and Markov dependence pertain to statistical adequacy,
but questions of omitted variables and causal connections among variables pertain to
substantive adequacy. The two forms of adequacy are related in so far as one needs to
secure statistical adequacy before posing questions pertaining to substantive adequacy.
This is to ensure the error-reliability of inference, i.e., to ensure that the nominal error
probabilities are closely approximated by the actual ones. The surest route to untrust-
worthy evidence is to apply a .05 significance level (nominal) test when the actual type
I error probability is closer to .90. This can easily occur when any of the assumptions
imposed on the data are invalid for the particular data (see Spanos and McGuirk 2001;
Spanos 2009). In this sense, statistical adequacy is the toll a modeler must pay for securing
statistically error-reliable answers to questions of substantive interest.
To give the reader a glimpse of the distorting effects of statistical misspecification on the
reliability of inference, consider the case of a simple linear regression model,

yt = β₀ + β₁xt + ut, t = 1, 2, … , n,

where the data z₀ := {(xt, yt), t = 1, 2, … , n}, instead of being Normal, Independent and
Identically Distributed (NIID), as the model (implicitly) assumes, exhibit a certain hetero-
geneity in the mean (trending). What would be the effects of such a mis-specification on
the least-squares estimators (β̂₀, β̂₁, s²) of (β₀, β₁, σ²)? The answer is that all three estimators will be biased and inconsistent. The same is true for the usual R², and in all these cases
the effects of bias and inconsistency will worsen as the sample size n increases; no solace
from a larger n. What about the effects of the mean-trending misspecification on the t-test
for the hypotheses:
H₀: β₁ = 0 vs. H₁: β₁ ≠ 0.
As shown in Spanos and McGuirk (2001), for an assumed nominal type I error of .05,
the actual error probability will be close to .98 for n = 50 and becomes 1.0 with n = 100; one
is guaranteed to reject a true null hypothesis 100% of the time! Similarly, the power of the
t-test will be totally distorted, rendering the t-test very sensitive close to the null (where
high power is not needed), but insensitive to larger discrepancies (where high power is
needed). Does such a situation arise in practice? Yes—very often (see Spanos [2011] for
an empirical example).
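The distortion described above is easy to reproduce by simulation. The sketch below is my own illustration, not code from Spanos and McGuirk (2001); the trend slopes, sample sizes and replication count are arbitrary choices. It estimates the actual type I error of the nominal .05 t-test when yt and xt are generated independently (so the null of no relationship is true) but each series carries its own deterministic trend:

```python
import numpy as np
from scipy import stats

def t_test_rejects(y, x, alpha=0.05):
    """OLS of y on (1, x) and the usual two-sided t-test of beta1 = 0."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)          # unbiased error-variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)
    t_stat = beta[1] / np.sqrt(cov[1, 1])
    return abs(t_stat) > stats.t.ppf(1 - alpha / 2, df=n - 2)

def actual_type_I_error(n, trend_y, trend_x, reps=2000, seed=0):
    """Monte Carlo estimate of the actual rejection frequency of the
    nominal .05 t-test when y and x are independent (H0 true), but
    each carries its own deterministic trend in the mean."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1, dtype=float)
    rejections = 0
    for _ in range(reps):
        y = trend_y * t + rng.standard_normal(n)
        x = trend_x * t + rng.standard_normal(n)
        rejections += t_test_rejects(y, x)
    return rejections / reps
```

With trend slopes such as 0.2 and 0.1 and n = 50, the estimated actual type I error comes out close to 1 rather than the nominal .05, while setting both slopes to zero (so the data really are NIID) recovers a rejection frequency near .05.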
In light of these comments, the above quotation makes little sense as a critique of stat-
istical adequacy because the notion pertains to the model as a whole and not to particular
hypotheses. Moreover, one can give straightforward answers to the question: ‘How
inadequate, in exactly what way?’ When an estimated model is statistically inadequate,
the modeler can find out which probabilistic assumptions are invalid using thorough
mis-specification testing. To the question ‘what then?’ the reply is that the modeler
should respecify the original model with a view to account for the systematic information
unaccounted for by the original model. The answer to the question ‘How does the inade-
quacy matter?’ is that ‘inferences based on a statistically inadequate model are likely to be
unreliable in the sense mentioned above’; there are significant discrepancies between the
nominal and actual error probabilities (see Spanos 2009; Mayo and Spanos 2004). Indeed,
in establishing statistical adequacy, the scientific issue is whether a particular assumption
is invalid for the data in question.
Having said that, the above questions and assertions by Nell and Errouaki about scien-
tific claims make more sense with regard to substantive adequacy. A structural model may
always come up short in attaining substantive adequacy, and a modeler is invariably inter-
ested in improving its adequacy. However, it is one thing to say that a structural model is a
mere approximation of the reality it aims to describe, and entirely another to claim that the
probabilistic assumptions imposed on the data are invalid; the latter will invariably under-
mine the reliability of any inferences based on such a model. A simple but statistically ade-
quate model may be perfectly adequate for answering substantive questions. This
confusion between substantive and statistical adequacy permeates the current econometric
literature and constitutes a serious stumbling block in recasting structural econometrics. A
glance through any traditional econometric textbook reveals that the problem of omitted
variables is routinely viewed as one of statistical mis-specification giving rise to biased and
inconsistent estimators, when the problem concerns substantive adequacy (see Spanos
2006b).
In the discussion that follows I will try to make the case that Nell and Errouaki’s ‘methodological institutionalism’ cannot be properly implemented without the notion of statistical adequacy.
from the statistical premises can easily give rise to inconsistent estimators. In the case of
testing, such departures induce discrepancies between the nominal and actual error prob-
abilities associated with such inference procedures. Without statistical adequacy, the tra-
ditional criteria of goodness-of-fit and the t-ratios are statistically meaningless, rendering
any inferences stemming from these results spurious and the resulting evidential assess-
ments untrustworthy; see Spanos (2012).
So when a traditional modeler decides that the estimated model provides evidence
against the substantive model, it is not obvious whether such evidence stems from the
falsity of the theory or the inappropriateness of the statistical premises employed for
the statistical quantification itself; in the philosophy of science, this is known as
Duhem’s ambiguity (see Mayo 1996). Similarly, when a traditional modeler decides that
the estimated model provides evidence for the substantive model, it is not obvious
whether the evidence is trustworthy or untrustworthy, with the untrustworthiness stem-
ming from statistically mis-specified premises. This situation arises often in practice
underlying the particular data Z₀. This is important for several reasons, including the fact
that the error assumptions often provide an incomplete and sometimes misleading picture
of the implicit statistical premises imposed on the data (see Spanos 2010c).
The substantive Mφ(z) and statistical Mθ(z) models in (1) and (2) can be conceptually separated by viewing the former as based on the theory and the latter not as a ‘solution’ of the former, but as a parameterization of the observable process {(yt | Xt), t ∈ ℕ} (see Spanos 2006a). A complete set of testable probabilistic assumptions for the multivariate linear regression model, [1]–[5], is given in Table 1.
The third step is to evaluate the statistical adequacy of Mθ(z) by probing the validity of
its assumptions (e.g., [1]–[5]) using thorough mis-specification (M-S) testing. Establishing
statistical adequacy presupposes that the statistical model is specified in terms of an intern-
ally consistent and complete set of testable probabilistic assumptions, such as given in
[1]–[5] in Table 1. In contrast, the traditional specification in terms of error assumptions
εt ∼ N(0, Ω(φ)) does not provide such a complete list of assumptions (see Spanos and Mayo 2004). The econometric literature has raised questions about the appropriateness
of M-S testing, including charges of double-use of data, pre-test bias and infinite
regress. However, these charges are based on inadequate understanding of M-S testing
and its role in establishing statistical adequacy (see Spanos 2000, 2010b).
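To make this third step concrete, the following is a minimal sketch of M-S testing for a linear regression; it is my own illustration, not the author’s test battery, and the particular choice of three residual probes (Normality, independence, t-invariance of the mean) is an assumption made for brevity:

```python
import numpy as np
from scipy import stats

def ms_tests(y, X):
    """Minimal mis-specification (M-S) battery for y = X @ beta + u,
    where X includes a constant column. Returns p-values; small values
    flag departures from the corresponding probabilistic assumption."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    # [1] Normality of the residuals (Jarque-Bera test)
    normality_p = stats.jarque_bera(u).pvalue
    # [2] Independence: a significant slope in the regression of u_t on
    #     u_{t-1} flags first-order residual autocorrelation
    independence_p = stats.linregress(u[:-1], u[1:]).pvalue
    # [3] t-invariance of the mean: a significant slope in the regression
    #     of u_t on a trend term flags unmodeled mean heterogeneity
    trend = np.arange(n, dtype=float)
    mean_t_invariance_p = stats.linregress(trend, u).pvalue
    return {"normality": normality_p,
            "independence": independence_p,
            "mean t-invariance": mean_t_invariance_p}
```

In this sketch an estimated model would be provisionally deemed statistically adequate only if none of the probes rejects; in practice a much richer battery (joint tests, higher-order dependence, variance heterogeneity) would be used.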
The fourth step is to re-specify the original statistical model when the M-S testing
reveals departures from its probabilistic assumptions. This step differs significantly
from the traditional error-fixing strategies of current econometric practice. It takes the
form of selecting a new set of assumptions for the observable process {Zt := (yt, Xt), t ∈ ℕ} that often gives rise to a different parameterization. The only constraint for the
re-specification is to preserve the parametric nesting of the structural model via identifying
restrictions.
In scientific disciplines where the available data are mostly observational, a statistically
adequate model resulting from respecifying the original (implicit) statistical model could
play a crucial role in guiding the search for better (substantively adequate) theories by
demarcating ‘what there is to explain’. A statistically adequate model represents empirical
regularities stemming from the probabilistic structure of the data chosen. Hence, an ade-
quate theory should be able to explain how these regularities are generated by the system
in question.
Once a statistically adequate model Mθ(z) that nests Mφ(z) is established, the latter con-
stitutes a reparameterization/restriction of the former via:
G(θ, φ) = 0,  θ ∈ Θ,  φ ∈ Φ,  (4)

where θ denotes the statistical parameters and φ the substantive parameters of interest.
Hence, the substantive model Mϕ(z) is said to be empirically valid when: (i) the (implicit)
statistical model Mθ(z) is statistically adequate, and (ii) the restrictions in (4) are data-
acceptable (see Spanos 1990).
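As one concrete way to check condition (ii) in the linear case, restrictions like (4) that reduce to linear constraints R·β = r on the statistical parameters can be assessed with a standard F-test on the statistically adequate regression. The sketch below is my own illustration under that linear-restriction assumption, not a procedure taken from the text:

```python
import numpy as np
from scipy import stats

def f_test_restrictions(y, X, R, r):
    """Standard F-test of the linear restrictions R @ beta = r in the
    regression y = X @ beta + u; a small p-value indicates that the
    restrictions are not data-acceptable."""
    n, k = X.shape
    q = R.shape[0]                      # number of restrictions
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y            # unrestricted OLS estimate
    u = y - X @ beta
    s2 = u @ u / (n - k)                # unbiased error-variance estimate
    d = R @ beta - r                    # discrepancy from the restrictions
    F = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (q * s2)
    return F, stats.f.sf(F, q, n - k)
```

For example, with simulated data satisfying β = (1, 2, 0), the true restriction β₂ = 0 typically yields a large p-value (data-acceptable), while the false restriction β₁ = 0 is decisively rejected.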
In general, one can show that behind every structural model, generically specified by

Mφ(z) = {f(z; φ), φ ∈ Φ}, z ∈ ℝⁿ_Z,  (1)

there exists a statistical model (often implicit) taking the generic form

Mθ(z) = {f(z; θ), θ ∈ Θ}, z ∈ ℝⁿ_Z,  (2)

where f(z; θ), z ∈ ℝⁿ_Z denotes the joint distribution of the sample Z := (Z₁, … , Zₙ). From
this perspective Mθ(z) is associated with a given set of data Z0, irrespective of the theory
that led to the choice of Z₀. Once selected, data Z₀ can take on ‘a life of its own’ as a particular realization of a generic process {Zt, t ∈ ℕ}. The link between data Z₀ and the process {Zt, t ∈ ℕ} is provided by a pertinent answer to the key question: ‘what probabilistic structure, when imposed on the process {Zt, t ∈ ℕ}, would render data Z₀ a truly typical realization thereof?’
Stage 1: Typicality. The ‘truly typical realization’ answer provides the relevant probabilistic structure for {Zt, t ∈ ℕ}, an answer that can be appraised using thorough M-S testing.

Stage 2: Parameterization. The relevant statistical model Mθ(z) is specified by choosing a particular parameterization θ ∈ Θ for {Zt, t ∈ ℕ}, with a view to nest (parametrically) the structural model Mφ(z), e.g. G(θ, φ) = 0, φ ∈ Φ.
The ontological commitments involved in specifying Mθ(z) concern only the existence
of a rich enough probabilistic structure to ‘model’ the chance regularities in Z0, a condition
that can be verified using statistical adequacy. In contrast, the ontological commitments
associated with the structural model are entirely different because Mϕ(z) is viewed as
aiming to approximate the actual mechanism underlying the phenomenon of interest by using abstraction and simplification, and by focusing on particular aspects (selecting the relevant observables Zt) of this phenomenon. To appraise substantive adequacy, one
needs to secure statistical adequacy first, and then proceed to probe for several potential
errors, like omitted but relevant factors, false causal claims, etc., using error-reliable infer-
ence procedures.
The distinction between the substantive and statistical premises plays a crucial role in
addressing a number of important issues in empirical modeling, including (i) statistical
model validation vs. statistical inference, (ii) model validation vs. model selection, (iii)
the choice of optimal instruments for the Instrumental Variables method of estimation,
and (iv) addressing the fallacies of rejection and acceptance (see Spanos 2014b).
where Q(z; θ) denotes the relevant inferential propositions pertaining to optimal estima-
tors, tests or predictors. Most statistics textbooks focus almost exclusively on this deduc-
tive component, but they rarely spell out the probabilistic assumptions comprising Mθ(z).
This model-based statistical induction adopts a frequentist interpretation of probability
that stems from a pertinent link between the mathematical framework and the data-gen-
erating mechanism giving rise to Z0 (Spanos 2013). Statistical adequacy plays a crucial role
in establishing this link by providing a way to verify the interpretive provision that data
Z₀ := (z₁, z₂, … , zₙ) constitute a ‘truly typical’ realization of the process {Zt, t ∈ ℕ}
whose probabilistic structure is specified by the statistical model Mθ(z). This is a crucial
first step in learning from data about phenomena of interest, because it secures the sound-
ness of this deductive component providing a ‘truth-preserving’ link. Moreover, when the
identifying restrictions G(θ, φ) = 0 are also valid, this ‘truth-preserving’ link can be
extended to include the inferential propositions Q∗(z; φ) associated with the structural
model Mϕ(z), in terms of which the substantive questions of interest are posed. In this
sense, statistical adequacy provides the cornerstone of model-based statistical induction.
This is because when Mϕ(z) is shown to be empirically valid (the data provide evidence
for G(θ, φ) = 0), the learning from data is passed from θ to φ, rendering Mφ(z) both stat-
istically and substantively meaningful (see Spanos 1990). The estimated Mϕ(z), when
empirically valid, can now provide the basis for inference (estimation, testing, prediction)
as well as policy framing.
What distinguishes the above model-based induction from the traditional discussions
of induction in the philosophy of science is that its premises, in the form of a statistical
model Mθ(z), provide a complete set of assumptions that are testable vis-à-vis data z0.
Moreover, there is a crucial difference between the nature of deduction in general and
that of the deductive component of the above model-based induction in particular. In traditional deductive inference there is a premium on inferences based on the minimal set of assumptions comprising the deductive premises; hence the incessant effort in mathematics to weaken the premises but retain the conclusions.
In the context of model-based induction, however, learning from data calls for viewing
the deductive component differently. The premium should be placed on a maximal set of
testable assumptions comprising the premises because that will give rise to more precise
inferences when statistical adequacy is secured. That is, we learn from data by applying
optimal (most effective) inferential procedures whose statistical adequacy has been
assured. For instance, a Uniformly Most Powerful test is nothing more than a testing procedure that has the highest capacity to detect discrepancies of any magnitude from the null.
In contrast, weak premises comprising non-testable assumptions would give rise to poten-
tially unreliable and imprecise inferences, i.e., ineffective procedures whose reliability is
questionable at best.
Unfortunately, over the last 60 years or so the econometrics journals have been awash
with technical papers deriving Consistent and Asymptotically Normal estimators and
related asymptotic tests for numerous statistical models, under a variety of mathematically
convenient assumptions (usually in terms of unobservable error terms), which are often
non-testable. Indeed, the trend in the econometric literature since the 1950s has generally
been to adopt weaker and weaker premises while neglecting to test their validity. The jus-
tification has been that a complete set of probabilistic assumptions is ‘unrealistic’, and thus
the answer to potentially unreliable inferences is to weaken such assumptions in an
attempt to render them more ‘realistic’. This explains the special emphasis placed by prac-
titioners on using the Generalized Method of Moments as well as nonparametric methods.
The question that this viewpoint raises is: how could one decide whether a set of probabilistic assumptions is ‘realistic’ or ‘unrealistic’ for a particular data set without testing its validity? This is a question practitioners should contemplate carefully before they
adopt methods that invoke non-testable assumptions. The continuous accumulation of
these technical results has left the practitioner none the wiser as to how and when to
apply them. Hence, practitioners try out the latest technical tools using different data
sets and functional forms, hoping that occasionally a computer output (goodness-of-fit/
prediction measures and statistical significance in conjunction with the theoretical mean-
ingfulness of key coefficients in the estimated substantive model) will enable them to ‘tell a
story’. However, this storytelling sheds no real light on economic phenomena because the
reported empirical ‘evidence’ is usually untrustworthy, exactly because the validity of the
probabilistic assumptions imposed on the data is left unexamined (Spanos 2006a).
In addition, model validation and respecification in current econometric practice have taken the form of ‘error-fixing’ (correcting for autocorrelation, heteroskedasticity, nonlinearity) and adopting the alternative in a mis-specification test (e.g. using Generalized Least Squares when the Durbin-Watson statistic is low)—i.e., institutionalizing the fallacy of rejection (Mayo and Spanos 2006)—or adjusting the standard errors to ensure consistency of
estimators. It can be shown that none of these strategies addresses the real problem: the unreliability of inference, in the form of significant discrepancies between actual and nominal error probabilities (Spanos and McGuirk 2001).
also in determining when they break down or when they experience shifts and regime
changes. Moreover, a statistically adequate model can help guide not only the search
for better theories that should account for the established statistical regularities, but also
the conceptual analysis and fieldwork as envisioned by Nell and Errouaki. For instance,
the presence of trends in a statistical model that has been shown to be statistically adequate
is perfectly acceptable for the error-reliability of any inferences based on this model.
However, from the substantive perspective such terms represent ignorance and could
be treated as generic statistical factors to be replaced by theoretically/institutionally mean-
ingful variables (Spanos 2010c).
I agree with Nell and Errouaki’s critique of neoclassical economic theory as focusing
almost exclusively on interpreting economic phenomena in terms of the choices of
rational individual agents, in a misguided attempt to ground econometric modeling on
deep parameters like preferences. However, what aspects of economic phenomena are
invariant and stable over time cannot be decided solely on a priori grounds. Hence, estab-
lishing a statistically adequate model with constant parameters could help guide that
search for deep structures from the data side.
I also agree with Nell and Errouaki’s critique of neoclassical economic theory for
failing to provide appropriate structural models pertinent for empirical modeling, but
my assessment is that the problem lies in misapplying that theory because its models
are not directly estimable using the data generated by the socio-economic system. What
is needed is to follow Haavelmo’s (1944) advice for modeling with observational data:
‘[when one] is presented with some results which, so to speak, Nature has produced in
all their complexity, his task [is] to build models that explain what has been observed’
(p. 7).
The key problem in recasting structural econometrics, as I see it, is constructing a per-
tinent bridge between theory and data. My first attempt to address this problem was to
propose a sequence of interconnecting models, structural (estimable) from the theory
side, and statistical from the data side, with the two sides ultimately enmeshing to form
an empirical model (see Spanos 1986, p. 21). This would require the extension of neoclas-
sical theories beyond demand and supply functions specified in terms of intentions,
because the latter are not usually observable. Such an extension entails more than the con-
ceptual analysis and fieldwork that Nell and Errouaki advocate. Neoclassical theory needs
to frame structural models in terms of price and quantity adjustment equations instead of
focusing exclusively on what happens at the equilibrium state. It will have to include the
development of more relevant models to replace the instantaneous adjustments and Wal-
rasian auctioneers currently invoked by neoclassical economics (see Spanos 1995; Boland
2014). I agree with Nell and Errouaki that these models should take fully into account the
institutional framework of the socio-economic system that gave rise to the data in
question.
Disclosure statement
No potential conflict of interest was reported by the author.
References
Boland, L. A. 2014. Model Building in Economics: Its Purposes and Limitations. Cambridge: Cambridge University Press.
Spanos, A. 2010c. ‘Statistical Adequacy and the Trustworthiness of Empirical Evidence: Statistical
vs. Substantive Information.’ Economic Modelling 27: 1436–1452.
Spanos, A. 2011. ‘Foundational Issues in Statistical Modeling: Statistical Model Specification and
Validation.’ Rationality, Markets and Morals 2: 146–178.
Spanos, A. 2012. ‘Philosophy of Econometrics.’ In Philosophy of Economics, edited by U. Maki. Amsterdam: Elsevier.
Spanos, A. 2013. ‘A Frequentist Interpretation of Probability for Model-Based Inductive Inference.’
Synthese 190: 1555–1585.
Spanos, A. 2014a. ‘Learning from Data: The Role of Error in Statistical Modeling and Inference.’ In
Error and Uncertainty in Scientific Practice, edited by M. Boumans, G. Hon and A. Petersen.
London: Pickering and Chatto.
Spanos, A. 2014b. ‘Reflections on the LSE Tradition in Econometrics: A Student’s Perspective.’
Œconomia 4–3: 343–380.
Spanos, A. 2015. ‘Revisiting Haavelmo’s Structural Econometrics: Bridging the Gap Between Theory and Data.’ Journal of Economic Methodology (forthcoming).
Spanos, A., and A. McGuirk. 2001. ‘Econometric Methodologies for the Model Specification
Problem: Addressing Old Problems in the New Century: The Model Specification Problem