
VOLUME II FEBRUARY, 1903 No. 2

NOTES ON THE THEORY OF ASSOCIATION OF ATTRIBUTES IN STATISTICS.

BY G. UDNY YULE.

CONTENTS.

Introductory

  1. Notation; terminology; tabulation, etc.
  2. Consistence and inference
  3. Association
  4. On the theory of complete independence of a series of Attributes
  5. On the fallacies that may be caused by the mixing of distinct records

THE simplest possible form of statistical classification is "division" (as the logicians term it) "by dichotomy," i.e. the sorting of the objects or individuals observed into one or other of two mutually exclusive classes according as they do or do not possess some character or attribute; as one may divide men into sane and insane, the members of a species of plants into hairy and glabrous, or the members of a race of animals into males and females. The mere fact that we do employ such a classification in any case must not of course be held to imply a natural and clearly defined boundary between the two classes; e.g. sanity and insanity, hairiness and glabrousness, may pass into each other by such fine gradations that judgments may differ as to the class in which a given individual should be entered. The judgment must however be finally decisive; intermediates not being classed as such even when observed.

The theory of statistics of this kind is of a good deal of importance, not merely because they are of a fairly common type (the statistics of hybridisation experiments given by the followers of Mendel may be cited as recent examples) but because the ideas and conceptions required in such theory form a useful introduction to the more complex and less purely logical theory of variables. The classical writings on the subject are those of De Morgan*, Boole† and Jevons‡, the method and notation of the latter being used in the following Notes, the first three sections of which are an abstract of the two memoirs referred to below§.

* Formal Logic, chap. VIII., "On the Numerically Definite Syllogism," 1847.
† Analysis of Logic, 1847. Laws of Thought, 1854.
‡ "On a General System of Numerically Definite Reasoning," Memoirs of Manchester Literary and Philosophical Society, 1870. Reprinted in Pure Logic and other Minor Works, Macmillan, 1890.
§ "On the Association of Attributes in Statistics," Phil. Trans. A, Vol. 194 (1900), p. 257. "On the theory of Consistence of Logical Class Frequencies," Phil. Trans. A, Vol. 197 (1901), p. 91.


1. Notation; terminology; relations between the class frequencies; tabulation.

The notation used is as follows*:

N = total number of observations,
(A) = no. of objects or individuals possessing attribute A,
(α) = no. of objects or individuals not possessing attribute A,
(AB) = no. of objects or individuals possessing both attributes A and B,
(Aβ) = no. of objects or individuals possessing attribute A but not B,
(αB) = no. of objects or individuals possessing attribute B but not A,
(αβ) = no. of objects or individuals not possessing either attribute A or B,

and so on for as many attributes as are specified. A class specified by n attributes in this notation may be termed a class of the nth order. The attributes denoted by English capitals may be termed positive attributes, and their contraries, denoted by the Greek letters, negative attributes. If two classes are such that every attribute in the one is the negative or contrary of the corresponding attribute in the other they may be termed contrary classes, and their frequencies contrary frequencies; (AB) and (αβ), (ABγ) and (αβC) are for instance pairs of contraries.

If the complete series of frequencies arrived at by noting n attributes is being tabulated, frequencies of the same order should be kept together. Those of the same order are best arranged by taking separately the set or "aggregate" of frequencies derivable from each positive class by substituting negatives for one or more of the positive attributes. Thus the frequencies for the case of three attributes may conveniently be tabulated in the order:

Order 0. N
Order 1. (A), (α) : (B), (β) : (C), (γ)
Order 2. (AB), (Aβ), (αB), (αβ) : (AC), (Aγ), (αC), (αγ) : (BC), (Bγ), (βC), (βγ)
Order 3. (ABC), (αBC), (AβC), (ABγ), (αβC), (αBγ), (Aβγ), (αβγ)

But since all frequencies are used non-exclusively, (A) denoting the frequency of objects possessing the attribute A with or without others and so forth, the frequency of any class can always be expressed in terms of the frequencies of classes of higher order; that is to say we have

$$
\begin{aligned}
N &= (A) + (\alpha) = (AB) + (A\beta) + (\alpha B) + (\alpha\beta) = \text{etc.} \\
(A) &= (AB) + (A\beta) \\
    &= (ABC) + (AB\gamma) + (A\beta C) + (A\beta\gamma) = \text{etc.}
\end{aligned}
\qquad (1)
$$
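The non-exclusive use of class frequencies described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the eight third-order counts are invented for the purpose, and the helper name `class_frequency` and the tuple encoding of a class are hypothetical conveniences, not part of the original notation.

```python
from itertools import product


def class_frequency(ultimate, spec):
    """Frequency of the class described by `spec`.

    `ultimate` maps a tuple of booleans (one per attribute, True meaning the
    positive attribute is present) to the count of that ultimate class.
    `spec` maps an attribute index to True (positive) or False (negative).
    Attributes left out of `spec` are unspecified, so the result counts
    objects "with or without others", i.e. non-exclusively.
    """
    return sum(
        count
        for cell, count in ultimate.items()
        if all(cell[i] == want for i, want in spec.items())
    )


# Hypothetical third-order counts, keyed as (A?, B?, C?) in the order
# ABC, ABγ, AβC, Aβγ, αBC, αBγ, αβC, αβγ.
counts = [20, 5, 10, 15, 8, 12, 6, 24]
ultimate = dict(zip(product([True, False], repeat=3), counts))

N = class_frequency(ultimate, {})                    # order 0
A = class_frequency(ultimate, {0: True})             # order 1
AB = class_frequency(ultimate, {0: True, 1: True})   # order 2
A_beta = class_frequency(ultimate, {0: True, 1: False})

print(N, A, AB, A_beta)
```

With these invented counts, (A) is recovered both as (AB) + (Aβ) and as the sum of the four third-order classes containing A, which is the content of the expansion given in the text.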

* I have substituted small Greek letters for Jevons' italics. Italics are rather troublesome when speaking, as one has to spell out a group like AbcDE, "big A, little b, little c, big D, big E." It is simpler to say AβγDE. The Greek become more troublesome when many letters are wanted, owing to the non-correspondence of the alphabets, but this is not often of consequence.


2. Consistence and Inference.

Although the positive-class frequencies (including N under that heading) are all independent in the sense that no single one can be expressed in terms of the others, they are nevertheless subject to certain limiting conditions if they are to be self-consistent, i.e. such as might have been observed in one and the same field of observation or "universe," to use the convenient term of the logicians. Consider the case of three attributes, for example. It is evident that we must have

$$
\begin{array}{lll}
(AB) & \not< 0 & \text{as } (AB) \text{ must not be negative} \\
     & \not< (A) + (B) - N & \text{as } (\alpha\beta) \text{ must not be negative} \\
     & \not> (A) & \text{as } (A\beta) \text{ must not be negative} \\
     & \not> (B) & \text{as } (\alpha B) \text{ must not be negative}
\end{array}
\qquad (4)
$$

and similar conditions must hold for (AC) and (BC). But these are not the only conditions that must hold. The second-order frequencies must not only be such as not to imply negative values for the frequencies of other classes of their own aggregates, but also must not imply negative values for any of the third-order frequencies. Expanding all the third-order frequencies in terms of the frequencies of positive classes, and putting the resulting expansion ≮ 0, we have

$$
\begin{array}{llll}
 & & \text{or the frequency given below will be negative} & \\
(ABC) & \not< 0 & (ABC) & [1] \\
      & \not< (AB) + (AC) - (A) & (A\beta\gamma) & [2] \\
      & \not< (AB) + (BC) - (B) & (\alpha B\gamma) & [3] \\
      & \not< (AC) + (BC) - (C) & (\alpha\beta C) & [4] \\
      & \not> (AB) & (AB\gamma) & [5] \\
      & \not> (AC) & (A\beta C) & [6] \\
      & \not> (BC) & (\alpha BC) & [7] \\
      & \not> (AB) + (AC) + (BC) - (A) - (B) - (C) + N & (\alpha\beta\gamma) & [8]
\end{array}
$$

But if any one of the minor limits [1] to [4] be greater than any one of the major limits [5] to [8] these conditions are impossible of fulfilment. There are four minor limits to be compared with four major limits, or sixteen comparisons in all to be made; but the majority of these, twelve in all, only lead back to conditions of the form (4). The four comparisons of expansions due to contrary frequencies alone lead to new conditions, viz.

$$
\begin{array}{ll}
(AB) + (AC) + (BC) \not< (A) + (B) + (C) - N & [1],[8] \\
(AB) + (AC) - (BC) \not> (A) & [2],[7] \\
(AB) - (AC) + (BC) \not> (B) & [3],[6] \\
-(AB) + (AC) + (BC) \not> (C) & [4],[5]
\end{array}
$$


These conditions give limits to any one of the three frequencies (AB), (AC) and (BC) in terms of the other two and the frequencies of the first order, i.e. enable us to infer limits to the one class-frequency in terms of the others. It will very usually happen in practical statistical cases that the limits so obtained are valueless, lying outside those given by the simpler conditions (4), but that is merely because in practice the values of the assigned frequencies, e.g. (AB) and (AC), seldom approach sufficiently closely to their limiting values to render inference possible.
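The comparison of minor and major limits can be sketched numerically. The following Python fragment is a minimal illustration, not a procedure given in the memoir: the function names `abc_limits` and `is_consistent` are hypothetical, and both sets of frequencies are invented. It computes the greatest minor limit and the least major limit of (ABC) from the first-order and second-order positive frequencies, and treats the data as self-consistent when the former does not exceed the latter.

```python
def abc_limits(N, A, B, C, AB, AC, BC):
    """Return (greatest minor limit, least major limit) for (ABC).

    The minor limits correspond to conditions [1]-[4] and the major
    limits to conditions [5]-[8] of the text.
    """
    minor = max(0, AB + AC - A, AB + BC - B, AC + BC - C)
    major = min(AB, AC, BC, AB + AC + BC - A - B - C + N)
    return minor, major


def is_consistent(N, A, B, C, AB, AC, BC):
    """True when some non-negative set of third-order frequencies exists."""
    minor, major = abc_limits(N, A, B, C, AB, AC, BC)
    return minor <= major


# Hypothetical data lying close to their limiting values: (ABC) is pinned
# between 80 and 82, so inference is possible.
print(abc_limits(100, 90, 90, 90, 85, 85, 82))

# Hypothetical data in which every pair satisfies conditions (4) taken
# singly, yet (AB) + (AC) - (BC) = 75 exceeds (A) = 50.
print(is_consistent(100, 50, 50, 50, 40, 40, 5))
```

In the second set each second-order frequency is admissible by itself, but no value of (ABC) satisfies all eight limits at once, which is the kind of inconsistency that only the comparison of contrary frequencies reveals.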

3. Association.

Two attributes, A and B, are usually defined to be independent, within any given field of observation or "universe," when the chance of finding them together is the product of the chances of finding either of them separately. The physical meaning of the definition seems rather clearer in a different form of statement, viz. if we define A and B to be independent when the proportion of A's amongst the B's of the given universe is the same as in that universe at large. If for instance the question were put "What is the test for independence of small-pox attack and vaccination?", the natural reply would be "The percentage of vaccinated amongst the attacked should be the same as in the general population" or "The percentage of attacked amongst the vaccinated should be the same as in the general population." The two definitions are of course identical in effect, and permit of the same simple symbolical expression in our notation; the criterion of independence of A and B is in fact

$$
(AB) = \frac{(A)(B)}{N}. \qquad (7)
$$

In this equation the attributes specifying the universe are understood, not expressed. If all objects or individuals in the universe are to possess an attribute or series of attributes K it may be written

$$
(ABK) = \frac{(AK)(BK)}{(K)}.
$$

An equation of such form must be recognised as the criterion of independence for A and B within the universe K. As I have shewn in the first memoir referred to in note §, p. 121, if the relation (7) hold good, the three similar relations for the remaining frequencies of the "aggregate" (i.e. the set of frequencies obtained by substituting their contraries α, β for A or B or both) must also hold, viz.

$$
(A\beta) = \frac{(A)(\beta)}{N}, \qquad (\alpha B) = \frac{(\alpha)(B)}{N}, \qquad (\alpha\beta) = \frac{(\alpha)(\beta)}{N}.
$$
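The statement that the criterion of independence, once satisfied for (AB), is satisfied by the whole aggregate can be checked on a small invented example. The sketch below is illustrative only: the counts are hypothetical, the names `aggregate` and `independent` are assumptions made for the example, and the test is written in cross-multiplied form so that it stays exact in integers.

```python
def aggregate(N, A, B, AB):
    """Second-order aggregate (AB), (Aβ), (αB), (αβ) from N, (A), (B), (AB)."""
    return {"AB": AB, "Ab": A - AB, "aB": B - AB, "ab": N - A - B + AB}


def independent(N, first, second, cell):
    """Exact test of cell = first * second / N, cross-multiplied."""
    return cell * N == first * second


# Hypothetical counts chosen so that (AB) = (A)(B)/N = 80 * 50 / 200 = 20.
N, A, B, AB = 200, 80, 50, 20
agg = aggregate(N, A, B, AB)
alpha, beta = N - A, N - B

checks = [
    independent(N, A, B, agg["AB"]),
    independent(N, A, beta, agg["Ab"]),
    independent(N, alpha, B, agg["aB"]),
    independent(N, alpha, beta, agg["ab"]),
]
print(checks)
```

The same numbers also satisfy the verbal form of the definition: the proportion of A's amongst the B's, 20/50, equals the proportion of A's in the universe at large, 80/200.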


This point is frequently forgotten. In an investigation as to the inheritance of deaf-mutism in America*, for instance, only the offspring of deaf-mutes were observed, and the argument consequently breaks down on page after page into conjectural statements as to points on which the editor has no information— e.g. the proportion of deaf-mutes amongst the children of normals.

The difference of (AB)/(A) from (B)/N and of (AB)/(B) from (A)/N are of course not, as a rule, the same, and it would be useful and convenient to measure the "association" by some more symmetrical method, a "coefficient of association" ranging between ± 1 like the coefficient of correlation. In the first memoir referred to in note §, p. 121, such a coefficient, of empirical form, was suggested, but that portion of the memoir should now be read in connection with a later memoir by Professor Pearson†.

4. On the theory of complete independence of a series of Attributes.

The tests for independence are by no means simple when the number of attributes is more than two. Under what circumstances should we say that a series of attributes ABCD... were completely independent? I believe not a few statisticians would reply at once "if the chance of finding them together were equal to the product of the chances of finding them separately," yet such a reply would be in error. The mere result

$$
\frac{(ABCD\ldots)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdot\frac{(D)}{N}\cdots \qquad (9)
$$

does not in general give any information as to the independence or otherwise of the attributes concerned. If the attributes are known to be completely independent then certainly the relation (9) holds good, but the converse is not true. "Equations of independence" of the form (9) must be shewn to hold for more than one class of any aggregate, of an order higher than the second, before the complete independence of the attributes can be inferred.

From the physical point of view complete independence can only be said to subsist for a series of attributes ABCD... within a given universe, when every pair of such attributes exhibits independence not only within the universe at large but also in every sub-universe specified by one or more of the remaining attributes of the series, or their contraries. Thus three attributes A, B, C are completely independent within a given universe if AB, AC and BC are independent within that universe and also

AB independent within the universes C and γ,
AC independent within the universes B and β,
BC independent within the universes A and α.

* Marriages of the Deaf in America, ed. by E. A. Fay. Volta Bureau, Washington, 1898.
† Phil. Trans. Vol. 195, p. 16.


If a series of attributes are completely independent according to this definition relations of the form (9) must hold for the frequency of every class of every possible order. Take the class-frequency (ABCD) of the fourth order for instance. A and B are, by the terms of the definition, independent within the universe CD. Therefore

$$
(ABCD) = \frac{(ACD)(BCD)}{(CD)}.
$$

But A and C, and also B and C, are independent within the universe D. Therefore the fraction on the right is equal to

$$
\frac{1}{(CD)}\cdot\frac{(AD)(CD)}{(D)}\cdot\frac{(BD)(CD)}{(D)} = \frac{(AD)(BD)(CD)}{(D)^2}.
$$

But again AD, BD, CD are each independent within the universe at large; therefore finally

$$
\frac{(ABCD)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdot\frac{(D)}{N}.
$$

Any other frequency can be reduced step by step in precisely the same way.

Now consider the converse problem. The total frequency N is given and also the n frequencies (A), (B), (C), etc. In how many of the ultimate frequencies (ABCD...MN), (αBCD...MN), etc. must "relations of independence" of the form

$$
\frac{(ABCD\ldots MN)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdots\frac{(M)}{N}\cdot\frac{(N)}{N}
$$

hold good, in order that complete independence of the attributes may be inferred? The answer is suggested at once by the following consideration. The number of ultimate frequencies (frequencies of order n) is 2^n; the number of frequencies given is n + 1. If then all but n + 1 of the ultimate frequencies are given in terms of the equations of independence, the remaining frequencies are determinate; either these determinate values must be those that would be given by equations of independence, or a state of complete independence must be impossible.

Suppose all the ultimate class-frequencies to have been tested and found to be given by the equations of independence, with the exception of the negative class (αβγδ...μν) and the n classes with one positive attribute (Aβγδ...μν), (αBγδ...μν), etc. Take any one of these untested class-frequencies, (Aβγδ...μν), and we have for example

$$
\begin{aligned}
(A\beta\gamma\delta\ldots\mu\nu) = (A) &- (ABCD\ldots MN) \\
&- (ABCD\ldots M\nu) - \text{other terms with one negative} \\
&- (ABCD\ldots\mu\nu) - \text{other terms with two negatives} \\
&\;\;\vdots \\
&- (AB\gamma\delta\ldots\mu\nu) - \text{other terms with } n-2 \text{ negatives.}
\end{aligned}
$$


Replacing (C) by N − (γ) and regrouping in similar pairs of terms containing (D) and (δ) this will become

[...] etc.

and continuing the same process until all the frequencies (D), (E), ... (M), (N) are eliminated, i.e. n − 1 times altogether,

That is to say the theorem must be true quite generally: "A series of n attributes ABC...MN are completely independent if the relations of independence are proved to hold for 2^n − (n + 1) of the 2^n ultimate frequencies; such relations must then hold for the remaining n + 1 frequencies also." If the ultimate frequencies are only given by the relations of independence in n cases or less, independence may exist for certain pairs of attributes in certain universes but not in general. The mere fact of the relation holding for one class, e.g.

$$
\frac{(ABC\ldots MN)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdots\frac{(M)}{N}\cdot\frac{(N)}{N},
$$

implies nothing, in striking contrast to the simple case of two attributes, where 2^n − (n + 1) = 1 and only the one class-frequency need be tested in order to see if independence exists. In the case of three attributes the number of third-order classes is eight, of which four must be tested in order to be certain that complete independence exists. In the case of four attributes there are sixteen fourth-order classes of which eleven must be tested, and so on.
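That a relation of independence holding for a single ultimate class implies nothing can be seen from a small invented case. In the Python sketch below the eight counts are hypothetical, chosen so that (ABC)/N equals the product of (A)/N, (B)/N and (C)/N while A and B are plainly associated; the names `marginal`, `satisfies_relation` and `classes_to_test` are assumptions made for the illustration.

```python
from itertools import product
from math import prod


def marginal(ultimate, i):
    """First-order frequency of the i-th positive attribute."""
    return sum(count for cell, count in ultimate.items() if cell[i])


def satisfies_relation(ultimate, cell):
    """Exact test of the relation of independence for one ultimate class."""
    n = len(cell)
    N = sum(ultimate.values())
    firsts = [
        marginal(ultimate, i) if cell[i] else N - marginal(ultimate, i)
        for i in range(n)
    ]
    # cell / N = product(first / N), cross-multiplied to stay in integers.
    return ultimate[cell] * N ** (n - 1) == prod(firsts)


def classes_to_test(n):
    """Number of ultimate classes to be tested for n attributes."""
    return 2 ** n - (n + 1)


# Hypothetical counts keyed as (A?, B?, C?) in the order
# ABC, ABγ, AβC, Aβγ, αBC, αBγ, αβC, αβγ; N = 80, (A) = (B) = (C) = 40.
counts = [10, 20, 5, 5, 5, 5, 20, 10]
ultimate = dict(zip(product([True, False], repeat=3), counts))

holds = [cell for cell in ultimate if satisfies_relation(ultimate, cell)]
AB = ultimate[(True, True, True)] + ultimate[(True, True, False)]

print(len(holds), AB, [classes_to_test(n) for n in (2, 3, 4)])
```

Here the relation holds for (ABC) and for its contrary (αβγ) only, that is for fewer than four of the eight classes, and (AB) = 30 differs from (A)(B)/N = 20, so the attributes are not completely independent.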

I have dealt with the problem hitherto on the assumption that only the first-order and the nth order frequencies were given, and that the frequencies of intermediate orders were unknown, or at least uncalculated, for of course the frequencies of all lower orders may be expressed in terms of those of the nth order. If however the frequencies of all orders may be supposed known, the above result may be thrown into a somewhat interesting form. It will be remembered that the frequency of any class of any order may be expressed in terms of the frequencies of the positive classes [(A), (AB), (AC), (ABC), etc.] of its own and lower orders. Then complete independence exists for a series of attributes if the criterion of independence hold for all the positive-class frequencies up to that of the nth order. If we have for instance

$$
(ABCD\ldots MN) = \frac{1}{N^{n-1}}\,(A)(B)(C)(D)\ldots(M)(N)
$$

and also

$$
(BCD\ldots MN) = \frac{1}{N^{n-2}}\,(B)(C)(D)\ldots(M)(N)
$$

we must have

$$
(\alpha BCD\ldots MN) = (BCD\ldots MN) - (ABCD\ldots MN) = \frac{1}{N^{n-1}}\,\{(B)(C)(D)\ldots(M)(N)\}\,\{N - (A)\}
$$

and so on. The number of class-frequencies to be tested in order to demonstrate the existence of complete independence is, of course, the same as before, viz. 2^n − (n + 1).

It should be noted as a consequence of these results that the definition of "complete independence" given on p. 127 is redundant in its terms. It is quite true that if complete independence subsist for a series of attributes every possible pair must exhibit independence in every possible sub-universe as well as in the universe at large, but it is not necessary to apply the criterion of independence to all these possible cases. In the case of three attributes for instance the criterion of independence need only be applied to four frequencies, as we have just seen, in order to demonstrate complete independence; it cannot then be necessary, as suggested by the definition, to test nine different associations, viz.

|AB|      |AC|      |BC|
|AB|C|    |AC|B|    |BC|A|
|AB|γ|    |AC|β|    |BC|α|

in the notation of my memoir on Association (an expression like |AB|C| specifying "the association between A and B in the universe of C's"). It is in fact only necessary to test |AB|, |AC|, |BC| and |AB|C| (or one of the other three partial associations in positive universes). If these are zero, the remaining associations must be zero also; for we are given

$$
(ABC) = \frac{1}{(C)}\,(AC)(BC) = \frac{1}{N^2}\,(A)(B)(C),
$$

i.e. |AC|B|, |BC|A|, etc. are zero. Quite generally, it is only necessary, if the testing be supposed to proceed from the second order classes upwards, to test one of all the possible partial associations corresponding to each positive class. If there be four attributes ABCD, the six total associations |AB|, |AC|, |AD|, |BC|


negative value we cannot be sure that nevertheless |AB|C| and |AB|γ| are not both zero. Some given attribute might, for instance, be inherited neither in the male line nor the female line; yet a mixed record might exhibit a considerable apparent inheritance. Suppose for instance that 50 % of the fathers and of the sons exhibit the attribute, but only 10 % of the mothers and daughters. Then if there be no inheritance in either line of descent the record must give (approximately)

fathers with attribute and sons with attribute: 25 %
fathers with attribute and sons without attribute: 25 %
fathers without attribute and sons with attribute: 25 %
fathers without attribute and sons without attribute: 25 %

mothers with attribute and daughters with attribute: 1 %
mothers with attribute and daughters without attribute: 9 %
mothers without attribute and daughters with attribute: 9 %
mothers without attribute and daughters without attribute: 81 %

If these two records be mixed in equal proportions we get

parents with attribute and offspring with attribute: 13 %
parents with attribute and offspring without attribute: 17 %
parents without attribute and offspring with attribute: 17 %
parents without attribute and offspring without attribute: 53 %

Here 13/30 = 43⅓ % of the offspring of parents with the attribute possess the attribute themselves, but only 30 % of offspring in general, i.e. there is quite a large but illusory inheritance created simply by the mixture of the two distinct records. A similar illusory association, that is to say an association to which the most obvious physical meaning must not be assigned, may very probably occur in any other case in which different records are pooled together or in which only one record is made of a lot of heterogeneous material.
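The arithmetic of the mixed record can be reproduced exactly with rational numbers. The following Python sketch uses only the percentages stated in the text (50 % in the male line, 10 % in the female line, the two records mixed in equal proportions); the function names `independent_record` and `pool` are hypothetical.

```python
from fractions import Fraction


def independent_record(p_parent, p_child):
    """Fourfold record, as fractions of the whole, with no inheritance."""
    return {
        (True, True): p_parent * p_child,
        (True, False): p_parent * (1 - p_child),
        (False, True): (1 - p_parent) * p_child,
        (False, False): (1 - p_parent) * (1 - p_child),
    }


def pool(records):
    """Mix several records in equal proportions."""
    k = len(records)
    return {cell: sum(r[cell] for r in records) / k for cell in records[0]}


male_line = independent_record(Fraction(1, 2), Fraction(1, 2))
female_line = independent_record(Fraction(1, 10), Fraction(1, 10))
mixed = pool([male_line, female_line])

parents_with = mixed[(True, True)] + mixed[(True, False)]
offspring_with = mixed[(True, True)] + mixed[(False, True)]
rate_among_parents_with = mixed[(True, True)] / parents_with

print(mixed[(True, True)], offspring_with, rate_among_parents_with)
```

Each separate record is independent by construction, yet in the pooled record the proportion 13/30 among the offspring of parents with the attribute exceeds the general proportion 3/10.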

Consider the case quite generally. Given that |AB|C| and |AB|γ| are both zero, find the value of (AB). From the data we have at once

$$
(AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} = \frac{[(A) - (AC)]\,[(B) - (BC)]}{N - (C)},
$$

$$
(ABC) = \frac{(AC)(BC)}{(C)}.
$$

Adding

$$
(AB) = \frac{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)}{(C)\,[N - (C)]}.
$$


Write

$$
(AB)_0 = \frac{1}{N}(A)(B), \qquad (AC)_0 = \frac{1}{N}(A)(C), \qquad (BC)_0 = \frac{1}{N}(B)(C),
$$

subtract (AB)₀ from both sides of the above equation, simplify, and we have

$$
(AB) - (AB)_0 = \frac{N\,[(AC) - (AC)_0]\,[(BC) - (BC)_0]}{(C)\,[N - (C)]}.
$$

That is to say, there will be apparent association between A and B in the universe at large unless either A or B is independent of C. Thus, in the imaginary case of inheritance given above, if A and B stand for the presence of the attribute in the parents and the offspring respectively, and C for the male sex, we find a positive association between A and B in the universe at large (the pooled results) because A and B are both positively associated with C, i.e. the males of both generations possess the attribute more frequently than the females. The "parents with attribute" are mostly males; as we have only noted offspring of the same sex as the parents, their offspring must be mostly males in the same proportion, and therefore more liable to the attribute than the mostly-female offspring of "parents without attribute."

It follows obviously that if we had found no inheritance to exist in any one of the four possible lines of descent (male-male, male-female, female-male, and female-female), no fictitious inheritance could have been introduced by the pooling of the four records. The pooling of the two records for the crossed-sex lines would give rise to a fictitious negative inheritance (disinheritance) cancelling the positive inheritance created by the pooling of the records for the same-sex lines. I leave it to the reader to verify these statements by following out the arithmetical example just given should he so desire.

The fallacy might lead to seriously misleading results in several cases where mixtures of the two sexes occur. Suppose for instance experiments were being made with some new antitoxin on patients of both sexes. There would nearly always be a difference between the case-rates of mortality for the two. If the female cases terminated fatally with the greater frequency and the antitoxin were administered most often to the males, a fictitious association between "antitoxin" and "cure" would be created at once. The general expression for (AB) − (AB)₀ shews how it may be avoided; it is only necessary to administer the antitoxin to the same proportion of patients of both sexes. This should be kept constantly in mind as an essential rule in such experiments if it is desired to make the most use of the results.

The fictitious association caused by mixing records finds its counterpart in the spurious correlation to which the same process may give rise in the case of continuous variables, a case to which attention was drawn and which was fully discussed by Professor Pearson in a recent memoir*. If two separate records, for each of which the correlation is zero, be pooled together, a spurious correlation will necessarily be created unless the mean of one of the variables, at least, be the same in the two cases.

* Phil. Trans. A, Vol. 192, p. 277.
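The general expression for the excess of (AB) over its independence value can be checked against the imaginary case of inheritance, and against the rule for the antitoxin experiment. The Python sketch below is a numerical check under stated assumptions: it takes the pooled record as 200 pairs (100 in each line), with C standing for the male sex; the antitoxin figures are invented; and the function names `pooled_AB` and `apparent_excess` are hypothetical.

```python
from fractions import Fraction


def pooled_AB(N, A, B, C, AC, BC):
    """(AB) = (ABC) + (ABγ) when A and B are independent within C and within γ."""
    return Fraction(AC * BC, C) + Fraction((A - AC) * (B - BC), N - C)


def apparent_excess(N, A, B, C, AC, BC):
    """N[(AC) - (AC)0][(BC) - (BC)0] / {(C)[N - (C)]}."""
    AC0 = Fraction(A * C, N)
    BC0 = Fraction(B * C, N)
    return N * (AC - AC0) * (BC - BC0) / (C * (N - C))


# Inheritance example per 200 pairs: 50 of 100 fathers and sons, and
# 10 of 100 mothers and daughters, exhibit the attribute.
N, A, B, C, AC, BC = 200, 60, 60, 100, 50, 50
AB = pooled_AB(N, A, B, C, AC, BC)
AB0 = Fraction(A * B, N)
print(AB, AB0, AB - AB0, apparent_excess(N, A, B, C, AC, BC))

# Hypothetical antitoxin trial: A = antitoxin given, B = cure, C = male.
# The antitoxin goes to 40 % of each sex, so (AC) = (A)(C)/N and the
# fictitious association vanishes whatever the cure rates of the sexes.
print(apparent_excess(200, 80, 120, 100, 40, 70))
```

In the inheritance example both routes give an excess of 8 pairs in 200 (26 observed against 18 expected), and in the invented antitoxin trial the excess is zero because the treatment is independent of sex.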