Econometric Modelling
David F. Hendry
Nuffield College, Oxford University.
July 18, 2000
Abstract

The theory of reduction explains the origins of empirical models, by delineating all the steps involved in mapping from the actual data generation process (DGP) in the economy – far too complicated and high dimensional ever to be completely modeled – to an empirical model thereof. Each reduction step involves a potential loss of information from: aggregating, marginalizing, conditioning, approximating, and truncating, leading to a ‘local’ DGP which is the actual generating process in the space of variables under analysis. Tests of losses from many of the reduction steps are feasible. Models that show no losses are deemed congruent; those that explain rival models are called encompassing. The main reductions correspond to well-established econometrics concepts (causality, exogeneity, invariance, innovations, etc.) which are the null hypotheses of the mis-specification tests, so the theory has considerable excess content.

General-to-specific (Gets) modelling seeks to mimic reduction by commencing from a general congruent specification that is simplified to a minimal representation consistent with the desired criteria and the data evidence (essentially represented by the local DGP). However, in small data samples, model selection is difficult. We reconsider model selection from a computer-automation perspective, focusing on general-to-specific reductions, embodied in PcGets, an Ox package for implementing this modelling strategy for linear, dynamic regression models. We present an econometric theory that explains the remarkable properties of PcGets. Starting from a general congruent model, standard testing procedures eliminate statistically-insignificant variables, with diagnostic tests checking the validity of reductions, ensuring a congruent final selection. Path searches in PcGets terminate when no variable meets the pre-set criteria, or any diagnostic test becomes significant. Non-rejected models are tested by encompassing: if several are acceptable, the reduction recommences from their union; if they re-appear, the search is terminated using the Schwarz criterion.

Since model selection with diagnostic testing has eluded theoretical analysis, we study modelling strategies by simulation. The Monte Carlo experiments show that PcGets recovers the DGP specification from a general model with size and power close to commencing from the DGP itself, so model selection can be relatively non-distortionary even when the mechanism is unknown. Empirical illustrations for consumers’ expenditure and money demand will be shown live.

Next, we discuss sample-selection effects on forecast failure, with a Monte Carlo study of their impact. This leads to a discussion of the role of selection when testing theories, and the problems inherent in ‘conventional’ approaches. Finally, we show that selecting policy-analysis models by forecast accuracy is not generally appropriate. We anticipate that Gets will perform well in selecting models for policy.

Financial support from the UK Economic and Social Research Council under grant L138251009 Modelling Non-stationary Economic Time Series, R000237500, and Forecasting and Policy in the Evolving Macro-economy, L138251009, is gratefully acknowledged. The research is based on joint work with Hans-Martin Krolzig of Oxford University.


1 Introduction

The economy is a complicated, dynamic, non-linear, simultaneous, high-dimensional, and evolving entity; social systems alter over time; laws change; and technological innovations occur. Time-series data samples are short, highly aggregated, heterogeneous, non-stationary, time-dependent and inter-dependent. Economic magnitudes are inaccurately measured, subject to revision, and important variables are unobservable. Economic theories are highly abstract and simplified, with suspect aggregation assumptions; they change over time, and often rival, conflicting explanations co-exist. In the face of this welter of problems, econometric modelling of economic time series seeks to discover sustainable and interpretable relationships between observed economic variables.

However, the situation is not as bleak as it may seem, provided some general scientific notions are understood. The first key is that knowledge accumulation is progressive: one does not need to know all the answers at the start (otherwise, no science could have advanced). Although the best empirical model at any point will later be supplanted, it can provide a springboard for further discovery. Thus, model selection problems (e.g., data mining) are not a serious concern: this is established below, by the actual behaviour of model-selection algorithms. The second key is that determining inconsistencies between the implications of any conjectured model and the observed data is easy. Indeed, the ease of rejection worries some economists about econometric models, yet is a powerful advantage. Conversely, constructive progress is difficult, because we do not know what we don’t know, so cannot know how to find out. The dichotomy between construction and destruction is an old one in the philosophy of science: critically evaluating empirical evidence is a destructive use of econometrics, but can establish a legitimate basis for models.

To understand modelling, one must begin by assuming a probability structure and conjecturing the data generation process. However, the relevant probability basis is unclear, since the economic mechanism is unknown. Consequently, one must proceed iteratively: conjecture the process, develop the associated probability theory, use that for modelling, and revise the starting point when the results do not match consistently. This can be seen in the gradual progress from stationarity assumptions, through integrated-cointegrated systems, to general non-stationary, mixing processes: further developments will undoubtedly occur, leading to a more useful probability basis for empirical modelling.

These notes first review the theory of reduction in §2 to explain the origins of empirical models, then discuss some methodological issues that concern many economists. Despite the controversy surrounding econometric methodology, the ‘LSE’ approach (see Hendry, 1993, for an overview) has emerged as a leading approach to empirical modelling. One of its main tenets is the concept of general-to-specific modelling (Gets): starting from a general dynamic statistical model, which captures the essential characteristics of the underlying data set, standard testing procedures are used to reduce its complexity by eliminating statistically-insignificant variables, checking the validity of the reductions at every stage to ensure the congruence of the selected model. Section 3 discusses Gets, and relates it to the empirical analogue of reduction.
Recently, econometric model selection has been automated in a program called PcGets, which is an Ox package (see Doornik, 1999, and Hendry and Krolzig, 1999a) designed for Gets modelling, currently focusing on reduction approaches for linear, dynamic, regression models. The development of PcGets has been stimulated by Hoover and Perez (1999), who sought to evaluate the performance of Gets. To implement a ‘general-to-specific’ approach in a computer algorithm, all decisions must be ‘mechanized’. In doing so, Hoover and Perez made some important advances in practical modelling, and our approach builds on these by introducing further improvements.

Given an initial general model, many reduction paths could be considered, and different selection strategies adopted for each path. Some of these searches may lead to different terminal specifications, between which a choice must be made. Consequently, the reduction process is inherently iterative. Should multiple congruent contenders eventuate after a reduction round, encompassing can be used to test between them, with only the surviving – usually non-nested – specifications retained. If multiple models still remain after this ‘testimation’ process, a new general model is formed from their union, and the simplification process re-applied. Should that union repeat, a final selection is made using information criteria; otherwise a unique congruent and encompassing reduction has been located.

Automating Gets throws further light on several methodological issues, and prompts some new ideas, which are discussed in section 4. While the joint issue of variable selection and diagnostic testing using multiple criteria has eluded most attempts at theoretical analysis, computer automation of the model-selection process allows us to evaluate econometric model-selection strategies by simulation. Section 6 presents the results of some Monte Carlo experiments to investigate if the model-selection process works well or fails badly; their implications for the calibration of PcGets are also analyzed. The empirical illustrations presented in section 7 demonstrate the usefulness of PcGets for applied econometric research. Section 8 then investigates model selection in forecasting, testing, and policy analysis, and shows the drawbacks of some widely-used approaches.

2 Theory of reduction

First we define the notion of an empirical model, then explain the origins of such models by the theory of reduction.

2.1 Empirical models

In an experiment, the output is caused by the inputs and can be treated as if it were a mechanism:

\[
\underset{\text{[output]}}{y_t} \;=\; \underset{\text{[input]}}{f(z_t)} \;+\; \underset{\text{[perturbation]}}{\nu_t} \tag{1}
\]

where yt is the observed outcome of the experiment when zt is the experimental input, f (·) is the mapping from input to output, and νt is a small, random perturbation which varies between experiments conducted at the same values of z. Given the same inputs {zt}, repeating the experiment generates essentially the same outputs. In an econometric model, however:

\[
\underset{\text{[observed]}}{y_t} \;=\; \underset{\text{[explanation]}}{g(z_t)} \;+\; \underset{\text{[remainder]}}{\epsilon_t} \tag{2}
\]

\(y_t\) can always be decomposed into two components, namely \(g(z_t)\) (the part explained) and \(\epsilon_t\) (the part unexplained). Such a partition is feasible even when \(y_t\) does not depend on \(g(z_t)\). In econometrics:

\[
\epsilon_t = y_t - g(z_t). \tag{3}
\]

Thus, models can be designed by selection of zt. Design criteria must be analyzed, and lead to the notion of a congruent model: one that matches the data evidence on the measured attributes. Successive congruent models should be able to explain previous ones, which is the concept of encompassing, and thereby progress can be achieved.
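To make the point concrete, here is a minimal numerical sketch (illustrative only; the series and the linear choice of \(g(\cdot)\) are invented): the remainder \(\epsilon_t = y_t - g(z_t)\) can always be constructed, even when \(z_t\) has no bearing on \(y_t\).

```python
# Minimal sketch (not from the notes): any y_t can be decomposed as
# y_t = g(z_t) + e_t simply by defining e_t = y_t - g(z_t), even when
# z_t has no explanatory content for y_t.
import numpy as np

rng = np.random.default_rng(0)
T = 200
y = rng.standard_normal(T)            # pure noise "output"
z = rng.standard_normal(T)            # unrelated "input"

# Fit g(z_t) = a + b*z_t by least squares and form the remainder.
Z = np.column_stack([np.ones(T), z])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ coef                      # e_t = y_t - g(z_t) by construction

# The decomposition always "works": the remainder is uncorrelated with z
# in-sample, whether or not z actually explains y.
print(coef, np.corrcoef(e, z)[0, 1])  # correlation essentially zero by construction
```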

2.7 Sequential factorization

To create the innovation process, sequentially factorize \(X_T^1\) as:
\[
\mathsf{D}_X\!\left(X_T^1 \mid W_0, \Lambda_{b,T}^1\right) = \prod_{t=1}^{T} \mathsf{D}_x\!\left(x_t \mid X_{t-1}^1, W_0, \lambda_{b,t}\right).
\]
The mean-innovation error process is \(\epsilon_t = x_t - \mathsf{E}\!\left[x_t \mid X_{t-1}^1\right]\).

2.7.1 Sequential factorization of \(W_T^1\)

Alternatively:
\[
\mathsf{D}_W\!\left(W_T^1 \mid W_0, \phi_T^1\right) = \prod_{t=1}^{T} \mathsf{D}_w\!\left(w_t \mid W_{t-1}, \delta_t\right). \tag{9}
\]
The right-hand-side innovation process is \(\eta_t = w_t - \mathsf{E}\!\left[w_t \mid W_{t-1}^1\right]\).

2.7.2 Marginalizing with respect to \(V_T^1\)

\[
\mathsf{D}_w\!\left(w_t \mid W_{t-1}, \delta_t\right) = \mathsf{D}_{v|x}\!\left(v_t \mid x_t, W_{t-1}, \delta_{a,t}\right)\, \mathsf{D}_x\!\left(x_t \mid V_{t-1}^1, X_{t-1}^1, W_0, \delta_{b,t}\right),
\]
as \(W_{t-1} = \left(V_{t-1}^1, X_{t-1}^1, W_0\right)\). \(\mu\) must be obtained from \(\{\delta_{b,t}\}\) alone. Marginalize with respect to \(V_{t-1}^1\):
\[
\mathsf{D}_x\!\left(x_t \mid V_{t-1}^1, X_{t-1}^1, W_0, \delta_{b,t}\right) = \mathsf{D}_x\!\left(x_t \mid X_{t-1}^1, W_0, \delta_{b,t}^{*}\right).
\]
There is no loss of information if and only if \(\delta_{b,t} = \delta_{b,t}^{*}\ \forall t\), so that the conditional, sequential distribution of \(\{x_t\}\) does not depend on \(V_{t-1}^1\) (Granger non-causality).
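The Granger non-causality condition is testable in practice. The sketch below is illustrative only (scalar series, a single assumed lag, and an F-test; not PcGets code): it checks whether lagged \(v\) adds explanatory power for \(x\) once lagged \(x\) is included.

```python
# Illustrative check of the Granger non-causality reduction (a sketch,
# not PcGets code): does lagged v help predict x once lagged x is included?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T = 300
v = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):                        # x depends on its own lag only
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()

y = x[1:]                                                   # regressand: x_t
X_r = np.column_stack([np.ones(T - 1), x[:-1]])             # restricted: lagged x
X_u = np.column_stack([np.ones(T - 1), x[:-1], v[:-1]])     # unrestricted: add lagged v

def rss(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

q, dof = 1, len(y) - X_u.shape[1]
F = (rss(X_r, y) - rss(X_u, y)) / q / (rss(X_u, y) / dof)
p = stats.f.sf(F, q, dof)
print(f"F = {F:.2f}, p-value = {p:.2f}")     # large p-value: no detectable loss
```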

2.8 Mapping to I(0)

Needed to ensure conventional inference is valid, though many inferences will be valid even if this reduction is not enforced. Cointegration would need to be treated in a separate set of lectures.

2.9 Conditional factorization

Factorize the density of \(x_t\) into sets of \(n_1\) and \(n_2\) variables, where \(n_1 + n_2 = n\):
\[
x_t' = \left(y_t' : z_t'\right),
\]
where the \(y_t\) are endogenous and the \(z_t\) are non-modelled. Then:
\[
\mathsf{D}_x\!\left(x_t \mid X_{t-1}^1, W_0, \lambda_{b,t}\right) = \mathsf{D}_{y|z}\!\left(y_t \mid z_t, X_{t-1}^1, W_0, \theta_{a,t}\right)\, \mathsf{D}_z\!\left(z_t \mid X_{t-1}^1, W_0, \theta_{b,t}\right).
\]
\(z_t\) is weakly exogenous for \(\mu\) if (i) \(\mu = f(\theta_{a,t})\) alone; and (ii) \((\theta_{a,t}, \theta_{b,t}) \in \Theta_a \times \Theta_b\).

2.10 Constancy

Complete parameter constancy is:
\[
\theta_{a,t} = \theta_a \quad \forall t, \tag{14}
\]
where \(\theta_a \in \Theta_a\), so that \(\mu\) is a function of \(\theta_a\): \(\mu = f(\theta_a)\). The conditional density then becomes:
\[
\prod_{t=1}^{T} \mathsf{D}_{y|z}\!\left(y_t \mid z_t, X_{t-1}^1, W_0, \theta_a\right) \quad \text{with } \theta_a \in \Theta.
\]

2.11 Lag truncation

Fix the extent of the history of \(X_{t-1}^1\) in (15) at \(s\) earlier periods:
\[
\mathsf{D}_{y|z}\!\left(y_t \mid z_t, X_{t-1}^1, W_0, \theta_a\right) = \mathsf{D}_{y|z}\!\left(y_t \mid z_t, X_{t-s}^{t-1}, W_0, \delta\right).
\]

2.12 Functional form

Map \(y_t\) into \(y_t^* = h(y_t)\) and \(z_t\) into \(z_t^* = g(z_t)\), and denote the resulting data by \(X^*\). Assume that \(y_t^*\) and \(z_t^*\) simultaneously make \(\mathsf{D}_{y^*|z^*}(\cdot)\) approximately normal and homoscedastic, denoted \(\mathsf{N}_{n_1}[\eta_t, \Upsilon]\):
\[
\mathsf{D}_{y|z}\!\left(y_t \mid z_t, X_{t-s}^{t-1}, W_0, \delta\right) = \mathsf{D}_{y^*|z^*}\!\left(y_t^* \mid z_t^*, X_{t-s}^{*\,t-1}, W_0, \gamma\right).
\]

2.13 The derived model

\[
A(L)\, h(y)_t = B(L)\, g(z)_t + \epsilon_t, \tag{18}
\]
where \(\epsilon_t\) is approximately distributed as \(\mathsf{N}_{n_1}[0, \Sigma]\), and \(A(L)\) and \(B(L)\) are polynomial matrices (i.e., matrices whose elements are polynomials) of order \(s\) in the lag operator \(L\). \(\epsilon_t\) is a derived, and not an autonomous, process, defined by:
\[
\epsilon_t = A(L)\, h(y)_t - B(L)\, g(z)_t. \tag{19}
\]
The reduction to this generic econometric equation involves all the stages of aggregation, marginalization, conditioning, etc., transforming the parameters from \(\psi\), which determines the stochastic features of the data, to the coefficients of the empirical model.

2.14 Dominance

Consider two distinct scalar empirical models, denoted \(\mathsf{M}_1\) and \(\mathsf{M}_2\), with mean-innovation processes (MIPs) \(\{\nu_t\}\) and \(\{\epsilon_t\}\) relative to their own information sets, where \(\nu_t\) and \(\epsilon_t\) have constant, finite variances \(\sigma_\nu^2\) and \(\sigma_\epsilon^2\) respectively. Then \(\mathsf{M}_1\) variance dominates \(\mathsf{M}_2\) if \(\sigma_\nu^2 < \sigma_\epsilon^2\), denoted by \(\mathsf{M}_1 \succ \mathsf{M}_2\). Variance dominance is transitive, since if \(\mathsf{M}_1 \succ \mathsf{M}_2\) and \(\mathsf{M}_2 \succ \mathsf{M}_3\) then \(\mathsf{M}_1 \succ \mathsf{M}_3\), and anti-symmetric, since if \(\mathsf{M}_1 \succ \mathsf{M}_2\) then it cannot be true that \(\mathsf{M}_2 \succ \mathsf{M}_1\). A model without a MIP error can be variance dominated by a model with a MIP on a common data set. The DGP cannot be variance dominated in the population by any models thereof (see e.g. Theil, 1971, p. 543).

Let \(U_{t-1}\) denote the universe of information for the DGP and let \(X_{t-1}\) be the subset, with associated innovation sequences \(\{\nu_{u,t}\}\) and \(\{\nu_{x,t}\}\). Then as \(\{X_{t-1}\} \subseteq \{U_{t-1}\}\), \(\mathsf{E}[\nu_{u,t} \mid X_{t-1}] = 0\), whereas \(\mathsf{E}[\nu_{x,t} \mid U_{t-1}]\) need not be zero. A model with an innovation error cannot be variance dominated by a model which uses only a subset of the same information. If \(\epsilon_t = x_t - \mathsf{E}[x_t \mid X_{t-1}]\), then \(\sigma_\epsilon^2\) is no larger than the variance of any other empirical-model error defined by \(\xi_t = x_t - \mathsf{G}[x_t \mid X_{t-1}]\), whatever the choice of \(\mathsf{G}[\cdot]\): the conditional expectation is the minimum mean-square error predictor. These implications favour general rather than simple empirical models, given any choice of information set, and suggest modelling the conditional expectation. A model which nests all contending explanations as special cases must variance dominate in its class.

Let model \(\mathsf{M}_j\) be characterized by parameter vector \(\psi_j\) with \(\kappa_j\) elements; then, as in Hendry and Richard (1982): \(\mathsf{M}_1\) is parsimoniously undominated in the class \(\{\mathsf{M}_i\}\) if \(\forall i\), \(\kappa_1 \leq \kappa_i\) and no \(\mathsf{M}_i \succ \mathsf{M}_1\). Model selection procedures (such as AIC or the Schwarz criterion: see Judge, Griffiths, Hill, Lütkepohl and Lee, 1985) seek parsimoniously undominated models, but do not check for congruence.
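A small simulation illustrates variance dominance (an illustrative sketch; the DGP and the two models are invented): a model that nests another cannot have a larger in-sample residual variance.

```python
# Sketch of variance dominance (illustrative, not from the notes): a model
# that nests another cannot have a larger residual variance in-sample.
import numpy as np

rng = np.random.default_rng(2)
T = 500
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
y = 1.0 * x1 + 0.5 * x2 + rng.standard_normal(T)   # assumed DGP

def resid_var(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r.var()

M1 = np.column_stack([np.ones(T), x1, x2])   # nests M2
M2 = np.column_stack([np.ones(T), x1])
print(resid_var(M1, y) <= resid_var(M2, y))  # True: M1 variance-dominates M2
```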

[d] theory information, which often is the source of parameters of interest, and is a creative stimulus in economics;
[e] measurement information, including price index theory, constructed identities such as consumption equals income minus savings, data accuracy and so on; and:
[f] data of rival models, which could be analyzed into past, present and future in turn.

The six main criteria which result for selecting an empirical model are:
[a] homoscedastic innovation errors;
[b] weakly exogenous conditioning variables for the parameters of interest;
[c] constant, invariant parameters of interest;
[d] theory-consistent, identifiable structures;
[e] data-admissible formulations on accurate observations; and
[f] encompassing of rival models.

Models which satisfy the first five information sets are said to be congruent: an encompassing congruent model satisfies all six criteria.

3 General-to-specific modelling

The practical embodiment of reduction is general-to-specific (Gets) modelling. The DGP is replaced by the concept of the ‘local’ DGP (LDGP), namely the joint distribution of the subset of variables under analysis. Then a general unrestricted model (GUM) is formulated to provide a congruent approximation to the LDGP, given the theoretical and previous empirical background. The empirical analysis commences from this general specification, after testing for mis-specifications, and if none are apparent, it is simplified to a parsimonious, congruent representation, each simplification step being checked by diagnostic testing. Simplification can be done in many ways: and although the goodness of a model is intrinsic to it, and not a property of the selection route, poor routes seem unlikely to deliver useful models. Even so, some economists worry about the impact of selection rules on the properties of the resulting models, and insist on the use of a priori specifications: but these need knowledge of the answer before we start, so deny empirical modelling any useful role – and in practice, it has rarely contributed.

Few studies have investigated how well general-to-specific modelling does. However, Hoover and Perez (1999) offer important evidence in a major Monte Carlo study, reconsidering the Lovell (1983) experiments. They place 20 macro variables in a databank; generate one (y) as a function of 0–5 others; regress y on all 20 plus all lags thereof; then let their algorithm simplify that GUM till it finds a congruent (encompassing) irreducible result. They check up to 10 different paths, testing for mis-specification, collect the results from each, then select one choice from the remainder – by following many paths, the algorithm is protected against chance false routes, and delivers an undominated congruent model. Nevertheless, Hendry and Krolzig (1999b) improve on their algorithm in several important respects, and this section now describes these improvements.

3.1 Pre-search reductions

First, groups of variables are tested in the order of their absolute t-values, commencing with a block where all the p-values exceed 0.9, and continuing down towards the pre-assigned selection criterion, at which point deletion must become inadmissible. A less stringent significance level is used at this step, usually 10%, since the insignificant variables are deleted permanently. If no test is significant, the F-test on all variables in the GUM has effectively been calculated, establishing that there is nothing to model.
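A stylized sketch of this pre-search step is given below (illustrative only, and much simpler than the PcGets implementation; all function names are invented): regressors are ranked by their individual p-values, and the block of clearly insignificant ones is dropped permanently if its joint F-test does not reject at a loose level.

```python
# Stylized sketch of the pre-search reduction (illustrative; the PcGets
# implementation is more elaborate).  Regressors whose individual p-values
# all exceed `p_cut` form a block; the block is deleted permanently if its
# joint F-test is insignificant at the loose level `level`.
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-values for each coefficient in an OLS regression."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = (resid @ resid) / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(b / se), n - k)

def block_f_pvalue(X, y, drop):
    """p-value of the F-test that the coefficients in `drop` are jointly zero."""
    def rss(Z):
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ b
        return r @ r
    keep = [j for j in range(X.shape[1]) if j not in drop]
    q, dof = len(drop), len(y) - X.shape[1]
    F = (rss(X[:, keep]) - rss(X)) / q / (rss(X) / dof)
    return stats.f.sf(F, q, dof)

def presearch(X, y, level=0.10, p_cut=0.9):
    """Return the column indices kept after one pre-search pass (col 0 = intercept)."""
    pvals = ols_pvalues(X, y)
    block = [j for j in range(1, X.shape[1]) if pvals[j] > p_cut]
    if block and block_f_pvalue(X, y, block) > level:
        return [j for j in range(X.shape[1]) if j not in block]
    return list(range(X.shape[1]))
```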

3.2 Additional paths

Blocks of variables constitute feasible search paths, in addition to individual coefficients – like the block F-tests in the preceding sub-section, but now along search paths. All paths that commence with an insignificant t-deletion are also explored.

3.3 Encompassing

Encompassing tests select between the candidate congruent models at the end of path searches. Each contender is tested against their union, dropping those which are dominated by, and do not dominate, another contender. If a unique model results, select it; otherwise, if some are rejected, form the union of the remaining models, and repeat this round till no encompassing reductions result. That union then constitutes a new starting point, and the complete path-search algorithm repeats till the union is unchanged between successive rounds.
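The sketch below illustrates one such encompassing round under simplified assumptions (candidate models represented as lists of regressor-column indices; a parsimonious-encompassing F-test against the union). It is an illustrative stand-in for the PcGets step, not its actual code.

```python
# Sketch of one encompassing round (illustrative; not the PcGets source).
# Candidate terminal models are lists of regressor-column indices; each is
# F-tested against the union of all candidates, and candidates rejected
# against the union are discarded.
import numpy as np
from scipy import stats

def rss(Z, y):
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ b
    return r @ r

def encompassing_round(candidates, X, y, alpha=0.05):
    union = sorted(set().union(*candidates))
    survivors = []
    for cand in candidates:
        extra = [j for j in union if j not in cand]
        if not extra:                      # the candidate already is the union
            survivors.append(cand)
            continue
        q, dof = len(extra), len(y) - len(union)
        F = ((rss(X[:, cand], y) - rss(X[:, union], y)) / q) / (rss(X[:, union], y) / dof)
        if stats.f.sf(F, q, dof) > alpha:  # not rejected against the union: retained
            survivors.append(cand)
    return survivors, union
```

The complete algorithm then restarts the path searches from the union of the survivors, repeating until that union is unchanged between successive rounds.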

3.4 Information criteria

When a union coincides with the original GUM, or with a previous union, so no further feasible reductions can be found, PcGets selects a model by an information criterion. The preferred ‘final-selection’ rule presently is the Schwarz criterion, or BIC, defined as:

\[
\mathrm{SC} = -\frac{2 \log L}{T} + \frac{p \log T}{T},
\]

where L is the maximized likelihood, p is the number of parameters, and T is the sample size. For T = 140 and p = 40, minimum SC corresponds approximately to retaining the marginal regressor satisfying |t| ≥ 1.9.
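That threshold can be checked under the usual Gaussian-regression approximation (deleting one regressor with t-ratio t multiplies the residual sum of squares by \(1 + t^2/(T-p)\), while saving \(\log(T)/T\) in the penalty). The following is a verification sketch, not part of PcGets:

```python
# Check of the claim that, for T = 140 and p = 40, minimum SC retains a
# marginal regressor only if |t| >= ~1.9.  Under a Gaussian regression,
# -2 log L / T = log(RSS/T) + const, and deleting one regressor with
# t-statistic t multiplies RSS by (1 + t^2/(T - p)); deletion also saves
# log(T)/T in the penalty, so SC falls unless
# log(1 + t^2/(T - p)) > log(T)/T.
import numpy as np

T, p = 140, 40
t_crit = np.sqrt((T - p) * (np.exp(np.log(T) / T) - 1.0))
print(round(t_crit, 2))   # ~1.9: keep the regressor only if |t| >= 1.9
```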

3.5 Sub-sample reliability

For the finally-selected model, sub-sample reliability is evaluated by the Hoover–Perez overlapping split-sample test. PcGets concludes that some variables are definitely excluded; some are definitely included; and some have an uncertain role, varying from a reliability of 25% (included in the final model, but insignificant overall and in both sub-samples), through to 75% (significant overall and in one sub-sample, or in both sub-samples).
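An illustrative scoring rule consistent with the description above is sketched below; the exact weights used by PcGets may differ, and the function is invented for exposition.

```python
# Illustrative reliability scoring (the exact PcGets weights may differ):
# a retained variable earns credit for significance in the full sample and
# in each of the two overlapping sub-samples.
def reliability(sig_full: bool, sig_sub1: bool, sig_sub2: bool) -> float:
    score = 0.25                      # retained in the final model
    score += 0.25 if sig_full else 0.0
    score += 0.25 if sig_sub1 else 0.0
    score += 0.25 if sig_sub2 else 0.0
    return score

print(reliability(False, False, False))  # 0.25: kept but never significant
print(reliability(True, True, False))    # 0.75: significant overall + one sub-sample
```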

3.6 Significant mis-specification tests

If the initial mis-specification tests are significant at the pre-specified level, we raise the required significance level, terminating search paths only when that higher level is violated. Empirical investigators would re-specify the GUM on rejection. To see why Gets does well, we develop the analytics for several of its procedures.

4 The econometrics of model selection

The key issue for any model-selection procedure is the cost of search, since there are always bound to be mistakes in statistical inference: specifically, how bad is it to search across many alternatives? The conventional statistical analysis of repeated testing provides a pessimistic background: every test has a non-zero null rejection frequency (or size, if independent of nuisance parameters), and so type I errors

knowledge of everything in advance. But if partial explanations are devoid of use, and empirically we could discover nothing not already known, then no science could have progressed. That is clearly refuted by the historical record. The fallacy in Keynes’s argument is that since theoretical models are incomplete and incorrect, an econometrics that is forced to use such theories as the only permissible starting point for data analysis can contribute little useful knowledge, except perhaps rejecting the theories. When invariant features of reality exist, progressive research can discover them in part without prior knowledge of the whole: see Hendry (1995b). A similar analysis applies to the attack in Koopmans on the study by Burns and Mitchell: he relies on the (unstated) assumption that only one sort of economic theory is applicable, that it is correct, and that it is immutable (see Hendry and Morgan, 1995).

Data mining is revealed when conflicting evidence exists or when rival models cannot be encompassed – and if they can, then an undominated model results despite the inappropriate procedure. Thus, stringent critical evaluation renders the ‘data mining’ criticism otiose. Gilbert (1986) suggests separating output into two groups: the first contains only redundant results (those parsimoniously encompassed by the finally-selected model), and the second contains all other findings. If the second group is not null, then there has been data mining. On such a characterization, Gets cannot involve data mining, despite depending heavily on data basing.

When the LDGP is known a priori from economic theory, but an investigator did not know that the resulting model was in fact ‘true’, so sought to test conventional null hypotheses on its coefficients, then inferential mistakes will occur in general. These will vary as a function of the characteristics of the LDGP, and of the particular data sample drawn, but for many parameter values, the selected model will differ from the LDGP, and hence have biased coefficients. This is the ‘pre-test’ problem, and is quite distinct from the costs of searching across a general set of specifications for a congruent representation of the LDGP. If a wide variety of models would be reported when applying any given selection procedure to different samples from a common DGP, then the results using a single sample apparently understate the ‘true’ uncertainty. Coefficient standard errors only reflect sampling variation conditional on a fixed specification, with no additional terms from changes in that specification (see e.g., Chatfield, 1995). Thus, reported empirical estimates must be judged conditional on the resulting equation being a good approximation to the LDGP. Undominated (i.e., encompassing) congruent models have a strong claim to provide such an approximation, and, conditional on that, their reported uncertainty is a good measure of the uncertainty inherent in such a specification for the relevant LDGP.

The theory of repeated testing is easily understood: the probability \(p_\alpha\) that none of \(n\) tests rejects at \(100\alpha\%\) is:
\[
p_\alpha = (1 - \alpha)^n.
\]

When 40 tests of correct null hypotheses are conducted at α = 0.05, \(p_{0.05} \simeq 0.13\), whereas \(p_{0.005} \simeq 0.82\). However, it is difficult to obtain spurious t-test values much in excess of three despite repeated testing: as Sargan (1981) pointed out, the t-distribution is ‘thin tailed’, so even the 0.5% critical value is less than three for 50 degrees of freedom. Unfortunately, stringent criteria for avoiding rejections when the null is true lower the power of rejection when it is false. The logic of repeated testing is accurate as a description of the statistical properties of mis-specification testing: conducting four independent diagnostic tests at 5% will lead to about 19% false rejections. Nevertheless, even in that context, there are possible solutions – such as using a single combined test – which can substantially lower the size without too great a power loss (see e.g., Godfrey and Veale, 1999). It is less clear that the analysis is a valid characterization of selection procedures in general when more than one path is searched, so there is no ‘error correction’ for wrong reductions. In fact, the serious practical difficulty is not one of avoiding

‘spuriously significant’ regressors because of ‘repeated testing’ when many hypotheses are tested, it is retaining all the variables that genuinely matter. ‘Path dependence’ is when the results obtained in a modelling exercise depend on the simplification sequence adopted. Since the ‘quality’ of a model is intrinsic to it, and progressive research induces a sequence of mutually-encompassing congruent models, proponents of Gets consider that the path adopted is unlikely to matter. As Hendry and Mizon (1990) expressed the matter: ‘the model is the message’. Nevertheless, it must be true that some simplifications lead to poorer representations than others. One aspect of the value-added of the approach discussed below is that it ensures a unique outcome, so the path does not matter. We conclude that each of these criticisms of Gets can be refuted. Indeed, White (1990) showed that with sufficiently-rigorous testing, the selected model will converge to the DGP. Thus, any ‘overfitting’ and mis-specification problems are primarily finite sample. Moreover, Mayo (1981) emphasized the importance of diagnostic test information being effectively independent of the sufficient statistics from which parameter estimates are derived. Hoover and Perez (1999) show how much better Gets is than any method Lovell considered, suggesting that modelling per se need not be bad. Indeed, overall, the size of their selection procedure is close to that expected, and the power is reasonable. Moreover, re-running their experiments using our version ( PcGets ) delivered substantively better outcomes (see Hendry and Krolzig, 1999b). Thus, the case against model selection is far from proved.
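The arithmetic behind the repeated-testing figures quoted above is easily verified:

```python
# Arithmetic behind the repeated-testing figures quoted above.
def p_none_reject(alpha: float, n: int) -> float:
    """Probability that none of n independent true-null tests rejects at level alpha."""
    return (1 - alpha) ** n

print(round(p_none_reject(0.05, 40), 2))     # ~0.13
print(round(p_none_reject(0.005, 40), 2))    # ~0.82
print(round(1 - p_none_reject(0.05, 4), 2))  # ~0.19: overall size of four diagnostics at 5%
```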

4.1 Search costs

Let \(p_i^{\mathrm{dgp}}\) denote the probability of retaining the \(i\)th variable out of \(k\) when commencing from the DGP specification and applying the relevant selection test at the same significance level as the search procedure. Then \(1 - p_i^{\mathrm{dgp}}\) is the expected cost of inference. For irrelevant variables, \(p_i^{\mathrm{dgp}} \equiv 0\), so the whole cost for those is attributed to search. Let \(p_i^{\mathrm{gum}}\) denote the probability of retaining the \(i\)th variable when commencing from the GUM, and applying the same selection test and significance level. Then, the search costs are \(p_i^{\mathrm{dgp}} - p_i^{\mathrm{gum}}\). False rejection frequencies of the null can be lowered by increasing the required significance levels of selection tests, but only at the cost of also reducing power. However, it is feasible to lower the former and raise the latter simultaneously by an improved search algorithm, subject to the bound of attaining the same performance as knowing the DGP from the outset.

To keep search costs low, any model-selection process must satisfy a number of requirements. First, it must start from a congruent statistical model to ensure that selection inferences are reliable: consequently, it must test for model mis-specification initially, and such tests must be well calibrated (nominal size close to actual). Secondly, it must avoid getting stuck in search paths that initially inadvertently delete relevant variables, thereby retaining many other variables as proxies: consequently, it must search many paths. Thirdly, it must check that eliminating variables does not induce diagnostic tests to become significant during searches: consequently, model mis-specification tests must be computed at every stage. Fourthly, it must ensure that any ‘candidate’ model parsimoniously encompasses the GUM, so no loss of information has occurred. Fifthly, it must have a high probability of retaining relevant variables: consequently, a loose significance level and powerful selection tests are required. Sixthly, it must have a low probability of retaining variables that are actually irrelevant: this clashes with the fifth objective in part, but requires an alternative use of the available information. Finally, it must have powerful procedures to select between the candidate models, and any models derived from them, to end with a good model choice, namely one for which:

\[
L = \sum_{i=1}^{k} \left| p_i^{\mathrm{dgp}} - p_i^{\mathrm{gum}} \right|
\]

when t-test selection is used should the null model be rejected. In general, when there are no relevant variables, the probability of retaining no variables using t-tests with critical value cα is:

\[
\mathsf{P}\left(|t_i| < c_\alpha\ \forall i = 1, \ldots, k\right) = (1 - \alpha)^k. \tag{21}
\]

Combining (21) with the FG-test, the null model will be selected with approximate probability:

\[
p_{\mathsf{G}} = (1 - \gamma) + \gamma^{*} (1 - \alpha)^k, \tag{22}
\]

where \(\gamma^{*} \leq \gamma\) is the probability of \(\mathsf{F}_{\mathsf{G}}\) rejecting yet no regressors being retained (conditioning on \(\mathsf{F}_{\mathsf{G}} \geq c_\gamma\) cannot decrease the probability of at least one rejection). Since γ is set at quite a high value, such as 0.20, whereas α = 0.05 is more usual, \(\mathsf{F}_{\mathsf{G}} \geq c_{0.20}\) can occur without any \(|t_i| \geq c_{0.05}\). Evaluating (22) for γ = 0.20, α = 0.05 and k = 20 yields \(p_{\mathsf{G}} \simeq 0.87\); whereas the re-run of the Hoover–Perez experiments with k = 40 reported by Hendry and Krolzig (1999b) using γ = 0.01 yielded 97.2% in the Monte Carlo, as against a theory prediction from (22) of 99%. Alternatively, when γ = 0.1 and α = 0.01, (22) has an upper bound of 96.7%, falling to 91.3% for α = 0.05. Thus, it is relatively easy to obtain a high probability of locating the null model, even when 40 irrelevant variables are included, using relatively tight significance levels, or a reasonable probability for looser significance levels.
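The quoted figures follow directly from (22); the short calculation below takes \(\gamma^{*} = \gamma\), which gives an upper bound:

```python
# Evaluating (22), p_G = (1 - gamma) + gamma*(1 - alpha)^k, taking
# gamma* = gamma as an upper-bound approximation, for the cases in the text.
def p_null(gamma, alpha, k):
    return (1 - gamma) + gamma * (1 - alpha) ** k

print(round(p_null(0.20, 0.05, 20), 2))   # ~0.87
print(round(p_null(0.01, 0.05, 40), 3))   # ~0.991: theory prediction of ~99%
print(round(p_null(0.10, 0.01, 40), 3))   # ~0.967
print(round(p_null(0.10, 0.05, 40), 3))   # ~0.913
```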

4.4 Path selection probabilities

We now calculate how many spurious regressors will be retained in path searches. The probability distribution of one or more null coefficients being significant in pure t-test selection at significance level α is given by the k + 1 terms of the binomial expansion of:

\[
\left(\alpha + (1 - \alpha)\right)^k.
\]

The following table illustrates by enumeration for k = 3:

\[
\begin{array}{lcc}
\text{event} & \text{probability} & \text{number retained} \\
\mathsf{P}(|t_i| < c_\alpha,\ \forall i = 1, \ldots, 3) & (1-\alpha)^3 & 0 \\
\mathsf{P}(|t_i| \geq c_\alpha \mid |t_j| < c_\alpha,\ \forall j \neq i) & 3\alpha(1-\alpha)^2 & 1 \\
\mathsf{P}(|t_i| < c_\alpha \mid |t_j| \geq c_\alpha,\ \forall j \neq i) & 3(1-\alpha)\alpha^2 & 2 \\
\mathsf{P}(|t_i| \geq c_\alpha,\ \forall i = 1, \ldots, 3) & \alpha^3 & 3 \\
\end{array}
\]

Thus, for k = 3, the average number of variables retained is:

\[
n = 3 \times \alpha^3 + 2 \times 3(1-\alpha)\alpha^2 + 3\alpha(1-\alpha)^2 = 3\alpha = k\alpha.
\]

The result \(n = k\alpha\) is general. When α = 0.05 and k = 40, n equals 2, falling to 0.4 for α = 0.01: so even if only t-tests are used, few spurious variables will be retained. Combining the probability of a non-null model with the number of variables selected when the GUM F-test rejects gives \(p = \gamma\alpha\) (where p is the probability that any given variable will be retained), which does not depend on k. For γ = 0.1 and α = 0.01, we have p = 0.001. Even for γ = 0.25 and α = 0.05, p = 0.0125 – before search paths and diagnostic testing are included in the algorithm. The actual behaviour of PcGets is much more complicated than this, but can deliver a small overall size. Following the event \(\mathsf{F}_{\mathsf{G}} \geq c_\gamma\) when γ = 0.1 (so the null is incorrectly rejected 10% of the time), and approximating by 0.5 variables retained when that occurs, the average ‘non-deletion’ probability (i.e., the probability that any given variable will be retained) is \(p_r = \gamma n / k = 0.125\%\), as against the reported value of 0.19% found by Hendry and Krolzig (1999b). These are very small retention rates of spuriously-significant variables.

Thus, in contrast to the relatively high costs of inference discussed in the previous section, those of search arising from retaining additional irrelevant variables are almost negligible. For a reasonable GUM with (say) 40 variables where 25 are irrelevant, even without the pre-selection and multiple path searches of PcGets, and using just t-tests at 5%, roughly one spuriously significant variable will be retained by chance. Against that, from the previous section, there is at most a 50% chance of retaining each of the variables that have non-centralities around 2, and little chance of keeping them all: the difficult problem is retention of relevance, not elimination of irrelevance. The only two solutions are better inference procedures, or looser critical values; we will consider them both.
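These retention calculations are easy to reproduce, and a crude Monte Carlo (treating the null t-ratios as independent, an approximation to orthogonal regressors) confirms that pure t-test selection retains about \(k\alpha\) irrelevant variables on average:

```python
# Average number of irrelevant variables retained by pure t-test selection
# (n = k * alpha), and the 'non-deletion' probability quoted above.
import numpy as np
from scipy import stats

def expected_retained(k, alpha):
    return k * alpha

print(expected_retained(40, 0.05))        # 2.0 spurious retentions on average
print(expected_retained(40, 0.01))        # 0.4

# Following a false rejection by the GUM F-test (prob. gamma = 0.1) with
# roughly 0.5 variables retained when that happens:
gamma, n_bar, k = 0.1, 0.5, 40
print(gamma * n_bar / k)                  # 0.00125, i.e. ~0.125% per variable

# Quick Monte Carlo check that the average retention equals k * alpha.
rng = np.random.default_rng(3)
k, alpha, reps, T = 40, 0.05, 500, 100
c = stats.t.ppf(1 - alpha / 2, T - k)
retained = 0
for _ in range(reps):
    t_vals = rng.standard_t(T - k, size=k)   # null t-ratios, roughly independent
    retained += np.sum(np.abs(t_vals) >= c)
print(retained / reps)                       # close to k * alpha = 2
```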

4.5 Improved inference procedures

An inference procedure involves a sequence of steps. As a simple example, consider a procedure comprising two F-tests: the first is conducted at the γ = 50% level, the second at δ = 5%. The variables to be tested are first ordered by their t-values in the GUM, such that \(t_1^2 \leq t_2^2 \leq \cdots \leq t_k^2\), and the first F-test adds in variables from the smallest observed t-values till a rejection would occur, with either \(\mathsf{F}_1 > c_\gamma\) or an individual \(|t| > c_\alpha\) (say). All those variables except the last are then deleted from the model, and a second F-test is conducted of the null that the coefficients of all the remaining variables are jointly zero. If that rejects, so \(\mathsf{F}_2 > c_\delta\), all the remaining variables are retained; otherwise, all are eliminated. We will now analyze the probability properties of this two-step test when all k regressors are orthogonal, for a regression model estimated from T observations. Once m variables are included in the first step, non-rejection requires that (a) the diagnostics are insignificant; (b) the first m − 1 variables did not induce rejection; (c) \(|t_m| < c_\alpha\); and (d):

\[
\mathsf{F}_1(m, T - k) \simeq \frac{1}{m} \sum_{i=1}^{m} t_i^2 \leq c_\gamma. \tag{23}
\]

Clearly, any \(t_i^2 \leq 1\) reduces the mean \(\mathsf{F}_1\) statistic, and since \(\mathsf{P}(|t_i| < 1) = 0.68\), when k = 40 approximately 28 variables fall in that group; and \(\mathsf{P}(|t_i| \geq 1.65) = 0.1\), so only 4 variables should chance to have a larger \(|t_i|\) value on average. In the ‘conventional’ setting where α = 0.05 with \(\mathsf{P}(|t_i| < 2) \simeq 0.95\), only 2 variables will chance to have larger t-values, whereas slightly more than half will have \(t_i^2 < 0.5\) or smaller. Since \(\mathsf{P}(\mathsf{F}_1(20, 100) < 1 \mid \mathsf{H}_0) \simeq 0.53\), a first step with γ = 0.5 should eliminate all variables with \(t_i^2 \leq 1\), and some larger t-values as well – hence the need to check that \(|t_m| < c_\alpha\) (below we explain why collinearity between variables that matter and those that do not should not jeopardize this step). A crude approximation to the likely value of (23) under \(\mathsf{H}_0\) is to treat all t-values within blocks as having a value equal to the mid-point. We use the five ranges \(t_i^2 < 0.5\), 1, \(1.65^2\), 4, and greater than 4, using the expected numbers falling in each of the first four blocks, which yields:

\[
\mathsf{F}_1(38, 100) \simeq \frac{0.25 \times 20 + 0.75 \times 8 + 1.33^2 \times 8 + 1.82^2 \times 2}{38} \simeq 0.84,
\]
noting \(\mathsf{P}(\mathsf{F}_1(38, 100) < 0.84 \mid \mathsf{H}_0) \simeq 0.72\) (setting all ts equal to the upper bound of each block yields an illustrative upper bound of about 1.3 for \(\mathsf{F}_1\)). Thus, surprisingly large values of γ, such as 0.75, can be selected for this step yet have a high probability of eliminating almost all the irrelevant variables. Indeed, using γ = 0.75 entails \(c_\gamma \simeq 0.75\) when m = 20, since:
\[
\mathsf{P}\left(\mathsf{F}_1(20, 100) < 0.75 \mid \mathsf{H}_0\right) \simeq 0.75,
\]
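A Monte Carlo sketch of the two-stage procedure under the null (all k regressors irrelevant, with null t-ratios approximated as independent N(0,1) draws, a simplification of the setting above) confirms that the loose first-stage F-test eliminates nearly all of them:

```python
# Monte Carlo sketch of the two-stage F-procedure under the null (all k
# regressors irrelevant; null t-ratios approximated as independent N(0,1)).
# Stage 1 adds variables from the smallest t^2 upwards while the running
# F-statistic stays below its 50% critical value and |t| stays below c_alpha;
# those variables are then deleted en bloc.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
k, T, reps = 40, 140, 1000
gamma, alpha = 0.50, 0.05
c_alpha = stats.norm.ppf(1 - alpha / 2)

eliminated = []
for _ in range(reps):
    t2 = np.sort(rng.standard_normal(k) ** 2)
    m = 0
    for i in range(k):
        F1 = t2[: i + 1].mean()
        if F1 > stats.f.ppf(1 - gamma, i + 1, T - k) or t2[i] > c_alpha ** 2:
            break
        m = i + 1
    eliminated.append(m)

print(np.mean(eliminated))   # on average most of the 40 irrelevant variables go at stage 1
```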

PcGets embodies some further developments. First, PcGets undertakes ‘pre-search’ simplification F-tests to exclude variables from the general unrestricted model (GUM), after which the GUM is reformulated. Since variables found to be irrelevant on such tests are excluded from later analyses, this step uses a loose significance level (such as 10%). Next, many possible paths from that GUM are investigated: reduction paths considered include both multiple deletions as well as single, so t and/or F test statistics are used as simplification criteria. The third development concerns the encompassing step: all distinct contending valid reductions are collected, and encompassing is used to test between these (usually non-nested) specifications. Models which survive encompassing are retained; all encompassed equations are rejected. If multiple models survive this ‘testimation’ process, their union forms a new general model, and selection path searches recommence. Such a process repeats till a unique contender emerges, or the previous union is reproduced, then stops. Fourthly, the diagnostic tests require careful choice to ensure they characterize the salient attributes of congruency, are correctly sized, and do not overly restrict reductions. A further improvement concerns model choice when mutually-encompassing distinct models survive the encompassing step. A minimum standard error rule, as used by Hoover and Perez (1999), will probably ‘over-select’ as it corresponds to retaining all variables with |t| > 1. Instead, we employ information criteria which penalize the likelihood function for the number of parameters. Finally, sub-sample information is used to accord a ‘reliability’ score to variables, which investigators may use to guide their model choice. In Monte Carlo experiments, a ‘progressive research strategy’ (PRS) can be formulated in which decisions on the final model choice are based on the outcomes of such reliability measures.

5.1 The multi-path reduction process of PcGets

The starting point for Gets model selection is the general unrestricted model, so the key issues concern its specification and congruence. The larger the initial regressor set, the more likely adventitious effects will be retained; but the smaller the GUM, the more likely key variables will be omitted. Further, the less orthogonality between variables, the more ‘confusion’ the algorithm faces, leading to a proliferation of mutually-encompassing models, where final choices may only differ marginally (e.g., lag 2 versus 1).[1] Finally, the initial specification must be congruent, with no mis-specification tests failed at the outset. Empirically, the GUM would be revised if such tests rejected, and little is known about the consequences of doing so (although PcGets will enable such studies in the near future). In Monte Carlo experiments, the program automatically changes the significance levels of such tests.

The reduction path relies on a classical, sequential-testing approach. The number of paths is increased to try all single-variable deletions, as well as various block deletions from the GUM. Different critical values can be set for multiple and single selection tests, and for diagnostic tests. Denote by η the significance level for the mis-specification tests (diagnostics) and by α the significance level for the selection t-tests (we ignore F-tests for the moment). The corresponding p-values are denoted \(\hat\eta\) and \(\hat\alpha\), respectively. During the specification search, the current specification is simplified only if no diagnostic test rejects its null. This corresponds to a likelihood-based model evaluation, where the likelihood function of model M is given by the density:

\[
L_{\mathsf{M}}(\theta_{\mathsf{M}}) =
\begin{cases}
f_{\mathsf{M}}(\mathbf{Y}; \theta_{\mathsf{M}}) & \text{if } \min\left(\hat{\boldsymbol\eta}_{\mathsf{M}}(\mathbf{Y}; \tilde\theta_{\mathsf{M}}) - \eta\right) \geq 0, \\[4pt]
-\infty & \text{otherwise,}
\end{cases}
\]

where \(f_{\mathsf{M}}(\mathbf{Y}; \theta_{\mathsf{M}})\) is the probability density function (pdf) associated with model M at the parameter vector \(\theta_{\mathsf{M}}\), for the sample \(\mathbf{Y}\). The vector of test-statistic p-values, \(\hat{\boldsymbol\eta}_{\mathsf{M}}(\mathbf{Y}; \tilde\theta_{\mathsf{M}})\), is evaluated at the maximum likelihood estimate \(\tilde\theta_{\mathsf{M}}\) under model M, and mapped into its marginal rejection probabilities. So the pdf of model M is only accepted as the likelihood function if the sample information coheres with the underlying assumptions of the model itself.

[1] Some empirical examples for autoregressive-distributed lag (ADL) models and single-equation equilibrium-correction models (EqCM) are presented in section 7.

In Monte Carlo experiments, PcGets sets the significance levels of the mis-specification tests endogenously: when a test of the DGP (or ‘true model’) reveals a significant diagnostic outcome (as must happen when tests have a non-zero size), the significance level is adjusted accordingly. In the event that the GUM fails a mis-specification test at the desired significance level \(\eta''\), a more stringent critical value is used. If the GUM also fails at the reduced significance level \(\eta' < \eta''\), the test statistic is excluded from the test battery during the following search. Thus for the \(k\)th test we have:

\[
\eta_k = \left\{
\begin{array}{lll}
\eta'' & \text{if } \hat\eta_{k,\mathrm{GUM}}(\mathbf{Y}, \tilde\theta_{\mathrm{GUM}}) \in [\eta'', 1] & \text{(desired significance level)} \\
\eta' & \text{if } \hat\eta_{k,\mathrm{GUM}}(\mathbf{Y}, \tilde\theta_{\mathrm{GUM}}) \in [\eta', \eta'') & \text{(reduced significance level)} \\
0 & \text{if } \hat\eta_{k,\mathrm{GUM}}(\mathbf{Y}, \tilde\theta_{\mathrm{GUM}}) \in [0, \eta') & \text{(test excluded)}
\end{array}
\right.
\]

where \(0 < \eta' < \eta'' < 1\).[2] Each set of search paths is ended by an encompassing step (see e.g., Mizon and Richard, 1986, and Hendry and Richard, 1989). Used as the last step of model selection, encompassing seems to help control the ‘size’ resulting from many path searches. When a given path eliminates a variable x that matters, other variables proxy such an effect, leading to a ‘spuriously large’ – and mis-specified – model. However, some other paths are likely to retain x, and in the encompassing tests, the proxies will frequently be revealed as conditionally redundant, inducing a smaller final model, focused on the genuine causal factors.

[2] In contrast, Hoover and Perez (1999) drop such a test from the checking set (so an ever-increasing problem of that type may lurk undetected). Their procedure was justified on the grounds that if the GUM failed a specification test in a practical application, then an ‘LSE’ economist would expand the search universe to more variables, more lags, or transformations of the variables. In a Monte Carlo setting, however, it seems better to initially increase the nominal level for rejection, and if during any search path that higher level is exceeded, then stop; we find that sometimes such GUM tests cease to be significant as reduction proceeds, and sometimes increase to reveal a flawed path.
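The two evaluation rules of this sub-section can be expressed compactly. The sketch below is illustrative only (function names and arguments are invented, not the PcGets interface): the first function implements the likelihood-evaluation rule, the second the adjustment of a diagnostic's significance level based on its p-value on the GUM.

```python
# Sketch of the two evaluation rules above (illustrative, not PcGets source).
NEG_INF = float("-inf")

def evaluated_loglik(loglik: float, diag_pvalues, eta: float) -> float:
    """The model log-likelihood is accepted only if every diagnostic p-value
    is at least eta; otherwise the model is assigned -infinity."""
    return loglik if min(diag_pvalues) >= eta else NEG_INF

def test_level(p_gum: float, eta_reduced: float, eta_desired: float) -> float:
    """Significance level for a diagnostic during the search, given its
    p-value on the GUM: desired level, reduced level, or excluded (0)."""
    if p_gum >= eta_desired:
        return eta_desired       # 'desired significance level'
    if p_gum >= eta_reduced:
        return eta_reduced       # 'reduced significance level'
    return 0.0                   # 'test excluded'

print(test_level(0.30, 0.005, 0.01))   # 0.01: test behaves normally
print(test_level(0.008, 0.005, 0.01))  # 0.005: more stringent level used
print(test_level(0.001, 0.005, 0.01))  # 0.0: test dropped from the battery
```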