








NOTES ON THE THEORY OF ASSOCIATION OF
ATTRIBUTES IN STATISTICS.
Introductory
THE simplest possible form of statistical classification is "division" (as the logicians term it) "by dichotomy," i.e. the sorting of the objects or individuals observed into one or other of two mutually exclusive classes according as they do or do not possess some character or attribute; as one may divide men into sane and insane, the members of a species of plants into hairy and glabrous, or the members of a race of animals into males and females. The mere fact that we do employ such a classification in any case must not of course be held to imply a natural and clearly defined boundary between the two classes; e.g. sanity and insanity, hairiness and glabrousness, may pass into each other by such fine gradations that judgments may differ as to the class in which a given individual should be entered. The judgment must however be finally decisive; intermediates not being classed as such even when observed.

The theory of statistics of this kind is of a good deal of importance, not merely because they are of a fairly common type (the statistics of hybridisation experiments given by the followers of Mendel may be cited as recent examples) but because the ideas and conceptions required in such theory form a useful introduction to the more complex and less purely logical theory of variables. The classical writings on the subject are those of De Morgan*, Boole† and Jevons‡, the method and notation of the latter being used in the following Notes, the first three sections of which are an abstract of the two memoirs referred to below§.

* Formal Logic, chap. VIII., "On the Numerically Definite Syllogism," 1847.
† Analysis of Logic, 1847. Laws of Thought, 1854.
‡ "On a General System of Numerically Definite Reasoning," Memoirs of Manchester Literary and Philosophical Society, 1870. Reprinted in Pure Logic and other Minor Works, Macmillan, 1890.
§ "On the Association of Attributes in Statistics," Phil. Trans. A, Vol. 194 (1900), p. 257. "On the theory of Consistence of Logical Class Frequencies," Phil. Trans. A, Vol. 197 (1901), p. 91.
* I have substituted small Greek letters for Jevons' italics. Italics are rather troublesome when speaking, as one has to spell out a group like AbcDE, "big A, little b, little c, big D, big E." It is simpler to say AβγDE. The Greek become more troublesome when many letters are wanted, owing to the non-correspondence of the alphabets, but this is not often of consequence.
Although the positive-class frequencies (including N under that heading) are all independent in the sense that no single one can be expressed in terms of the others, they are nevertheless subject to certain limiting conditions if they are to be self-consistent, i.e. such as might have been observed in one and the same field of observation or "universe," to use the convenient term of the logicians. Consider the case of three attributes, for example. It is evident that we must have

$$
\begin{array}{lll}
(AB) & \not< 0 & \text{as } (AB) \text{ must not be negative} \\
     & \not< (A) + (B) - N & \text{as } (\alpha\beta) \text{ must not be negative} \\
     & \not> (A) & \text{as } (A\beta) \text{ must not be negative} \\
     & \not> (B) & \text{as } (\alpha B) \text{ must not be negative}
\end{array}
\tag{4}
$$
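and similar conditions must hold for (AC) and (BC). But these are not the only conditions that must hold. The second-order frequencies must not only be such as not to imply negative values for the frequencies of other classes of their own aggregates, but also must not imply negative values for any of the third-order frequencies. Expanding all the third-order frequencies in terms of the frequencies of positive classes, and putting the resulting expansion ≮ 0, we have the following limits, each followed by its reference number and by the frequency that will be negative if the limit is not respected:

$$
\begin{array}{llll}
(ABC) & \not< 0 & [1] & (ABC) \\
      & \not< (AB) + (AC) - (A) & [2] & (A\beta\gamma) \\
      & \not< (AB) + (BC) - (B) & [3] & (\alpha B\gamma) \\
      & \not< (AC) + (BC) - (C) & [4] & (\alpha\beta C) \\
      & \not> (AB) & [5] & (AB\gamma) \\
      & \not> (AC) & [6] & (A\beta C) \\
      & \not> (BC) & [7] & (\alpha BC) \\
      & \not> (AB) + (AC) + (BC) - (A) - (B) - (C) + N & [8] & (\alpha\beta\gamma)
\end{array}
$$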
But if any one of the minor limits [1]—[4] be greater than any one of the major limits [5]—[8] these conditions are impossible of fulfilment. There are four minor limits to be compared with four major limits or sixteen comparisons in all to be made; but the majority of these, twelve in all, only lead back to conditions of the form (4). The four comparisons of expansions due to contrary frequencies alone lead to new conditions—viz.
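Written out, the four comparisons that lead to new conditions pair each minor limit with the major limit belonging to the contrary class. The block below is a reconstruction from the surrounding argument rather than a quotation, and it assumes the limits are those obtained from the third-order expansions (for instance, [2] is (AB) + (AC) − (A) and [7] is (BC)):

```latex
\begin{aligned}
 (AB) + (AC) + (BC) &\not< (A) + (B) + (C) - N && \text{from [1] and [8]} \\
 (AB) + (AC) - (BC) &\not> (A)                 && \text{from [2] and [7]} \\
 (AB) - (AC) + (BC) &\not> (B)                 && \text{from [3] and [6]} \\
-(AB) + (AC) + (BC) &\not> (C)                 && \text{from [4] and [5]}
\end{aligned}
```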
G. U. YULE 125
These conditions give limits to any one of the three frequencies (AB), (AC) and (BC) in terms of the other two and the frequencies of the first order, i.e. enable us to infer limits to the one class-frequency in terms of the others. It will very usually happen in practical statistical cases that the limits so obtained are valueless, lying outside those given by the simpler conditions (4), but that is merely because in practice the values of the assigned frequencies, e.g. (AB) and (AC), seldom approach sufficiently closely to their limiting values to render inference possible.
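A numerical illustration may make the point about valueless limits concrete. The Python sketch below computes the limits of (BC) from N, the first-order frequencies, (AB) and (AC). The function name `bc_limits` and the figures are assumptions made for illustration; the inequalities coded are the simple conditions of form (4) applied to (BC), together with those obtained by setting each minor limit of (ABC) not greater than the major limit of the contrary class.

```python
def bc_limits(N, A, B, C, AB, AC):
    """Return (lower, upper) limits for the class-frequency (BC)."""
    lower = max(
        0,                        # (BC) itself must not be negative
        B + C - N,                # (beta gamma) must not be negative
        AB + AC - A,              # minor limit [2] against major limit [7]
        A + B + C - N - AB - AC,  # minor limit [1] against major limit [8]
    )
    upper = min(
        B,                        # (B gamma) must not be negative
        C,                        # (beta C) must not be negative
        B - AB + AC,              # minor limit [3] against major limit [6]
        C + AB - AC,              # minor limit [4] against major limit [5]
    )
    return lower, upper


# Hypothetical universe of 100 with (A) = (B) = (C) = 50.
close_to_limits = bc_limits(100, 50, 50, 50, AB=45, AC=45)
far_from_limits = bc_limits(100, 50, 50, 50, AB=25, AC=25)
print(close_to_limits, far_from_limits)
```

In the first case (AB) and (AC) lie near their own upper limits and (BC) is confined to the range 40 to 50; in the second the new conditions add nothing to the simple limits 0 and 50.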
Two attributes, A and B, are usually defined to be independent, within any given field of observation or "universe," when the chance of finding them together is the product of the chances of finding either of them separately. The physical meaning of the definition seems rather clearer in a different form of statement, viz. if we define A and B to be independent when the proportion of A's amongst the B's of the given universe is the same as in that universe at large. If for instance the question were put "What is the test for independence of small-pox attack and vaccination?", the natural reply would be "The percentage of vaccinated amongst the attacked should be the same as in the general population" or "The percentage of attacked amongst the vaccinated should be the same as in the general population." The two definitions are of course identical in effect, and permit of the same simple symbolical expression in our notation; the criterion of independence of A and B is in fact

$$(AB) = \frac{(A)(B)}{N}. \tag{7}$$

In this equation the attributes specifying the universe are understood, not expressed. If all objects or individuals in the universe are to possess an attribute or series of attributes K it may be written

$$(ABK) = \frac{(AK)(BK)}{(K)}.$$

An equation of such form must be recognised as the criterion of independence for A and B within the universe K. As I have shewn in the first memoir referred to in note §, p. 121, if the relation (7) hold good, the three similar relations for the remaining frequencies of the "aggregate" (i.e. the set of frequencies obtained by substituting their contraries α, β for A or B or both) must also hold, viz.

$$(\alpha B) = \frac{(\alpha)(B)}{N}, \qquad (A\beta) = \frac{(A)(\beta)}{N}, \qquad (\alpha\beta) = \frac{(\alpha)(\beta)}{N}.$$
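The statement that one relation of independence carries the other three with it can be checked numerically. The sketch below uses hypothetical figures and function names of my own choosing; it fills in the aggregate from N, (A), (B) and (AB) and tests the criterion in cross-multiplied form, (XY)·N = (X)(Y), so that only whole numbers are compared.

```python
def aggregate(N, A, B, AB):
    """Fill in the four second-order classes from N, (A), (B) and (AB)."""
    return {
        ("A", "B"): AB,
        ("A", "beta"): A - AB,
        ("alpha", "B"): B - AB,
        ("alpha", "beta"): N - A - B + AB,
    }


def independence_holds_everywhere(N, A, B, AB):
    """True if the criterion holds for all four classes of the aggregate."""
    first_order = {"A": A, "alpha": N - A, "B": B, "beta": N - B}
    cells = aggregate(N, A, B, AB)
    # Cross-multiplied form of (XY) = (X)(Y)/N.
    return all(n * N == first_order[x] * first_order[y]
               for (x, y), n in cells.items())


# (AB) = 12 equals (A)(B)/N = 40 * 30 / 100, so all four relations hold.
print(independence_holds_everywhere(100, 40, 30, 12))
# (AB) = 20 departs from 12, and the criterion fails.
print(independence_holds_everywhere(100, 40, 30, 20))
```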
This point is frequently forgotten. In an investigation as to the inheritance of deaf-mutism in America*, for instance, only the offspring of deaf-mutes were observed, and the argument consequently breaks down on page after page into conjectural statements as to points on which the editor has no information— e.g. the proportion of deaf-mutes amongst the children of normals.
The differences of (AB)/(A) from (B)/N and of (AB)/(B) from (A)/N are of course not, as a rule, the same, and it would be useful and convenient to measure the "association" by some more symmetrical method, a "coefficient of association" ranging between ±1 like the coefficient of correlation. In the first memoir referred to in note §, p. 121, such a coefficient, of empirical form, was suggested, but that portion of the memoir should now be read in connection with a later memoir by Professor Pearson†.
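The coefficient suggested in that memoir is the one now generally known as Yule's Q, the difference of the cross-products of the aggregate divided by their sum. The sketch below assumes that form; the function name and the figures are hypothetical and are not taken from these Notes.

```python
def coefficient_of_association(AB, Ab, aB, ab):
    """Q = [(AB)(ab) - (Ab)(aB)] / [(AB)(ab) + (Ab)(aB)].

    Ab stands for (A beta), aB for (alpha B), ab for (alpha beta).
    Q is undefined (division by zero) when both cross-products vanish.
    """
    same = AB * ab
    crossed = Ab * aB
    return (same - crossed) / (same + crossed)


# An aggregate satisfying the criterion of independence gives Q = 0.
print(coefficient_of_association(12, 28, 18, 42))
# A hypothetical aggregate with positive association.
print(coefficient_of_association(20, 20, 10, 50))
```

Q is symmetrical in A and B, which is the property the text asks for, and it reaches +1 or −1 when one of the four classes is empty.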
The tests for independence are by no means simple when the number of attributes is more than two. Under what circumstances should we say that a series of attributes ABCD... were completely independent? I believe not a few statisticians would reply at once "if the chance of finding them together were equal to the product of the chances of finding them separately," yet such a reply would be in error. The mere result

$$\frac{(ABCD\ldots)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdot\frac{(D)}{N}\cdots \tag{9}$$

does not in general give any information as to the independence or otherwise of the attributes concerned. If the attributes are known to be completely independent then certainly the relation (9) holds good, but the converse is not true. "Equations of independence" of the form (9) must be shewn to hold for more than one class of any aggregate, of an order higher than the second, before the complete independence of the attributes can be inferred.
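A small counter-example shows why the product relation for a single class proves nothing. The eight frequencies below are invented for the purpose: the relation of form (9) holds for the class (ABC), yet A and B are plainly associated in the universe at large.

```python
# Hypothetical frequencies of the eight third-order classes; keys are
# (A, B, C) with 1 for the attribute and 0 for its contrary.
cells = {
    (1, 1, 1): 8, (1, 1, 0): 16, (1, 0, 1): 4, (1, 0, 0): 4,
    (0, 1, 1): 4, (0, 1, 0): 4, (0, 0, 1): 16, (0, 0, 0): 8,
}


def freq(pattern):
    """Class-frequency for a pattern such as (1, 1, None) for (AB)."""
    return sum(n for key, n in cells.items()
               if all(p is None or p == k for p, k in zip(pattern, key)))


N = freq((None, None, None))
A, B, C = freq((1, None, None)), freq((None, 1, None)), freq((None, None, 1))
AB, ABC = freq((1, 1, None)), freq((1, 1, 1))

# Relation of form (9) for the single class (ABC), cross-multiplied.
product_relation_holds = ABC * N**2 == A * B * C
# Criterion of independence for the pair A, B in the universe at large.
pair_AB_independent = AB * N == A * B
print(product_relation_holds, pair_AB_independent)  # prints: True False
```

Here N = 64 and (A) = (B) = (C) = 32, so the product gives 8 for (ABC), the figure assumed; but (AB) = 24 against the 16 required for independence.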
From the physical point of view complete independence can only be said to subsist for a series of attributes ABCD... within a given universe, when every pair of such attributes exhibits independence not only within the universe at large but also in every sub-universe specified by one or more of the remaining attributes of the series, or their contraries. Thus three attributes A, B, C are completely independent within a given universe if AB, AC and BC are independent within that universe and also

AB independent within the universes C and γ,
AC independent within the universes B and β,
BC independent within the universes A and α.

* Marriages of the Deaf in America, ed. by E. A. Fay. Volta Bureau, Washington, 1898.
† Phil. Trans. Vol. 195, p. 16.
If a series of attributes are completely independent according to this definition relations of the form (9) must hold for the frequency of every class of every possible order. Take the class-frequency (ABCD) of the fourth order for instance. A and B are, by the terms of the definition, independent within the universe CD. Therefore

$$(ABCD) = \frac{(ACD)(BCD)}{(CD)}.$$

But A and C, and also B and C, are independent within the universe D. Therefore the fraction on the right is equal to

$$\frac{1}{(CD)}\cdot\frac{(AD)(CD)}{(D)}\cdot\frac{(BD)(CD)}{(D)} = \frac{(AD)(BD)(CD)}{(D)^2}.$$

But again AD, BD, CD are each independent within the universe at large; therefore finally

$$(ABCD) = \frac{(A)(B)(C)(D)}{N^3}.$$
Any other frequency can be reduced step by step in precisely the same way.
Now consider the converse problem. The total frequency N is given and also the n frequencies (A), (B), (C), etc. In how many of the ultimate frequencies (ABCD...MN), (αBCD...MN), etc. must "relations of independence" of the form

$$\frac{(ABCD\ldots MN)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdots\frac{(M)}{N}\cdot\frac{(N)}{N}$$

hold good, in order that complete independence of the attributes may be inferred? The answer is suggested at once by the following consideration. The number of ultimate frequencies (frequencies of order n) is $2^n$; the number of frequencies given is n + 1. If then all but n + 1 of the ultimate frequencies are given in terms of the equations of independence, the remaining frequencies are determinate; either these determinate values must be those that would be given by equations of independence, or a state of complete independence must be impossible. Suppose all the ultimate class-frequencies to have been tested and found to be given by the equations of independence, with the exception of the negative class (αβγδ...μν) and the n classes with one positive attribute (Aβγδ...μν), (αBγδ...μν), etc. Take any one of these untested class-frequencies, (Aβγδ...μν), and we have for example

$$
\begin{aligned}
(A\beta\gamma\delta\ldots\mu\nu) = (A) &- (ABCD\ldots MN) \\
&- (ABCD\ldots M\nu) - \text{other terms with one negative} \\
&- (ABCD\ldots\mu\nu) - \text{other terms with two negatives} \\
&\;\;\vdots \\
&- (AB\gamma\delta\ldots\mu\nu) - \text{other terms with } n-2 \text{ negatives.}
\end{aligned}
$$
Replacing (C) by N − (γ) and regrouping in similar pairs of terms containing (D) and (δ) this will become
and continuing the same process until all the frequencies (D) (E) ... (M) (N) are eliminated, i.e. n − 1 times altogether,
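The outcome of the elimination can be sketched compactly as follows. This is a condensed restatement, not the step-by-step regrouping of the text, and the symbols are assumptions introduced here: $X_i$ stands for the i-th attribute after A, taken either positively or as its contrary $\xi_i$, and the primed sum runs over every such choice except the one in which all are contraries. Each class in the sum has been tested, so its frequency is (A) times the product of the chosen first-order frequencies, divided by $N^{n-1}$; and $(X_i) + (\xi_i) = N$.

```latex
\begin{aligned}
(A\beta\gamma\ldots\nu)
  &= (A) - \sum{}' (A\,X_2 X_3 \ldots X_n) \\
  &= (A) - \frac{(A)}{N^{n-1}}
     \Bigl\{ \prod_{i=2}^{n} \bigl[ (X_i) + (\xi_i) \bigr]
             - (\beta)(\gamma)\cdots(\nu) \Bigr\} \\
  &= (A) - \frac{(A)}{N^{n-1}}
     \Bigl\{ N^{n-1} - (\beta)(\gamma)\cdots(\nu) \Bigr\} \\
  &= \frac{(A)(\beta)(\gamma)\cdots(\nu)}{N^{n-1}} .
\end{aligned}
```

The untested class therefore takes the value that the equation of independence would itself assign to it.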
That is to say the theorem must be true quite generally: "A series of n attributes ABC...MN are completely independent if the relations of independence are proved to hold for $2^n - (n+1)$ of the $2^n$ ultimate frequencies; such relations must then hold for the remaining n + 1 frequencies also." If the ultimate frequencies are only given by the relations of independence in n cases or less, independence may exist for certain pairs of attributes in certain universes but not in general. The mere fact of the relation holding for one class, e.g.
implies nothing, in striking contrast to the simple case of two attributes, where $2^n - (n+1) = 1$ and only the one class-frequency need be tested in order to see if independence exists. In the case of three attributes the number of third-order classes is eight, of which four must be tested in order to be certain that complete independence exists. In the case of four attributes there are sixteen fourth-order classes of which eleven must be tested, and so on.
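The counts quoted for two, three and four attributes follow from subtracting the n + 1 given frequencies from the number of ultimate classes. A short Python check (the function name is an assumption for illustration):

```python
def classes_to_test(n):
    """Ultimate class-frequencies that must be tested for n attributes."""
    ultimate_classes = 2 ** n   # classes of order n
    given = n + 1               # N and the n first-order frequencies
    return ultimate_classes - given


for n in (2, 3, 4):
    print(n, 2 ** n, classes_to_test(n))
```

This prints 1 of 4 classes for two attributes, 4 of 8 for three, and 11 of 16 for four, in agreement with the text.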
I have dealt with the problem hitherto on the assumption that only the first-order and the nth order frequencies were given, and that the frequencies of intermediate orders were unknown, or at least uncalculated, for of course the frequencies of all lower orders may be expressed in terms of those of the nth order. If however the frequencies of all orders may be supposed known, the above result may be thrown into a somewhat interesting form. It will be remembered that the frequency of any class of any order may be expressed in terms of the frequencies of the positive classes [(A) (AB) (AC) (ABC) etc.] of its own and lower orders. Then complete independence exists for a series of attributes if the criterion of independence hold for all the positive-class frequencies up to that of the nth order. If we have for instance
and also
we must have
and so on. The number of class-frequencies to be tested in order to demonstrate the existence of complete independence is, of course, the same as before, viz. $2^n - (n+1)$.
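One instance consistent with the passage, for three attributes, may be sketched as follows. It is an illustration under the assumption that the criterion has been found to hold for the positive classes (AB) and (ABC), i.e. $(AB) = (A)(B)/N$ and $(ABC) = (A)(B)(C)/N^2$; the frequency of a class containing a contrary then follows by subtraction:

```latex
(AB\gamma) = (AB) - (ABC)
           = \frac{(A)(B)}{N} - \frac{(A)(B)(C)}{N^{2}}
           = \frac{(A)(B)\,[N - (C)]}{N^{2}}
           = \frac{(A)(B)(\gamma)}{N^{2}} .
```

The same subtraction, repeated, carries the relation to every remaining class of the aggregate.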
It should be noted as a consequence of these results that the definition of "complete independence" given on p. 127 is redundant in its terms. It is quite true that if complete independence subsist for a series of attributes every possible pair must exhibit independence in every possible sub-universe as well as in the universe at large, but it is not necessary to apply the criterion of independence to all these possible cases. In the case of three attributes for instance the criterion of independence need only be applied to four frequencies, as we have just seen, in order to demonstrate complete independence; it cannot then be necessary, as suggested by the definition, to test nine different associations, viz.

|AB|, |AC|, |BC|, |AB|C|, |AB|γ|, |AC|B|, |AC|β|, |BC|A|, |BC|α|

in the notation of my memoir on Association (an expression like |AB|C| specifying "the association between A and B in the universe of C's"). It is in fact only necessary to test |AB|, |AC|, |BC| and |AB|C| (or one of the other three partial associations in positive universes). If these are zero, the remaining associations must be zero also; for we are given

$$(ABC) = \frac{(AC)(BC)}{(C)} = \frac{(A)(B)(C)}{N^2} = \frac{(AB)(BC)}{(B)} = \frac{(AB)(AC)}{(A)},$$

i.e. |AC|B|, |BC|A|, etc. are zero. Quite generally, it is only necessary, if the testing be supposed to proceed from the second order classes upwards, to test one of all the possible partial associations corresponding to each positive class. If there be four attributes ABCD, the six total associations |AB|, |AC|, |AD|, |BC|
negative value we cannot be sure that nevertheless |AB|C| and |AB|γ| are not both zero. Some given attribute might, for instance, be inherited neither in the male line nor the female line; yet a mixed record might exhibit a considerable apparent inheritance. Suppose for instance that 50 % of the fathers and of the sons exhibit the attribute, but only 10 % of the mothers and daughters. Then if there be no inheritance in either line of descent the record must give (approximately)
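fathers with attribute and sons with attribute: 25 %
fathers with attribute and sons without attribute: 25 %
fathers without attribute and sons with attribute: 25 %
fathers without attribute and sons without attribute: 25 %

mothers with attribute and daughters with attribute: 1 %
mothers with attribute and daughters without attribute: 9 %
mothers without attribute and daughters with attribute: 9 %
mothers without attribute and daughters without attribute: 81 %

If these two records be mixed in equal proportions we get

parents with attribute and offspring with attribute: 13 %
parents with attribute and offspring without attribute: 17 %
parents without attribute and offspring with attribute: 17 %
parents without attribute and offspring without attribute: 53 %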
Here 13/30 = 43⅓ % of the offspring of parents with the attribute possess the attribute themselves, but only 30 % of offspring in general, i.e. there is quite a large but illusory inheritance created simply by the mixture of the two distinct records. A similar illusory association, that is to say an association to which the most obvious physical meaning must not be assigned, may very probably occur in any other case in which different records are pooled together or in which only one record is made of a lot of heterogeneous material.
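The arithmetic of the pooled record can be reproduced exactly with rational numbers. The sketch below builds each no-inheritance record from the stated proportion possessing the attribute (50 % and 10 %), mixes the two in equal proportions, and recovers the figures above; the function and key names are my own.

```python
from fractions import Fraction


def no_inheritance_record(p):
    """Proportions of (parent, offspring) classes when both generations
    show the attribute with chance p and there is no inheritance."""
    q = 1 - p
    return {
        "with/with": p * p,
        "with/without": p * q,
        "without/with": q * p,
        "without/without": q * q,
    }


fathers_sons = no_inheritance_record(Fraction(1, 2))
mothers_daughters = no_inheritance_record(Fraction(1, 10))

# Mix the two records in equal proportions.
pooled = {k: (fathers_sons[k] + mothers_daughters[k]) / 2 for k in fathers_sons}

# Proportion with the attribute among offspring of parents with it ...
among_parents_with = pooled["with/with"] / (
    pooled["with/with"] + pooled["with/without"])
# ... and among offspring in general.
among_all_offspring = pooled["with/with"] + pooled["without/with"]

print(pooled)
print(among_parents_with, among_all_offspring)  # prints: 13/30 3/10
```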
Consider the case quite generally. Given that |AB|C| and |AB|γ| are both zero, find the value of (AB). From the data we have at once

$$(ABC) = \frac{(AC)(BC)}{(C)}, \qquad (AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} = \frac{[(A)-(AC)]\,[(B)-(BC)]}{N-(C)}.$$

Adding

$$(AB) = \frac{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)}{(C)\,[N-(C)]}.$$

Write

$$(AB)_0 = \frac{(A)(B)}{N}, \qquad (AC)_0 = \frac{(A)(C)}{N}, \qquad (BC)_0 = \frac{(B)(C)}{N},$$

subtract $(AB)_0$ from both sides of the above equation, simplify, and we have

$$(AB) - (AB)_0 = \frac{N}{(C)\,[N-(C)]}\,\bigl[(AC)-(AC)_0\bigr]\,\bigl[(BC)-(BC)_0\bigr].$$

That is to say, there will be apparent association between A and B in the universe at large unless either A or B is independent of C. Thus, in the imaginary case of inheritance given above, if A and B stand for the presence of the attribute in the parents and the offspring respectively, and C for the male sex, we find a positive association between A and B in the universe at large (the pooled results) because A and B are both positively associated with C, i.e. the males of both generations possess the attribute more frequently than the females. The "parents with attribute" are mostly males; as we have only noted offspring of the same sex as the parents, their offspring must be mostly males in the same proportion, and therefore more liable to the attribute than the mostly-female offspring of "parents without attribute." It follows obviously that if we had found no inheritance to exist in any one of the four possible lines of descent (male-male, male-female, female-male, and female-female), no fictitious inheritance could have been introduced by the pooling of the four records. The pooling of the two records for the crossed-sex lines would give rise to a fictitious negative inheritance (disinheritance) cancelling the positive inheritance created by the pooling of the records for the same-sex lines. I leave it to the reader to verify these statements by following out the arithmetical example just given should he so desire.

The fallacy might lead to seriously misleading results in several cases where mixtures of the two sexes occur. Suppose for instance experiments were being made with some new antitoxin on patients of both sexes. There would nearly always be a difference between the case-rates of mortality for the two. If the female cases terminated fatally with the greater frequency and the antitoxin were administered most often to the males, a fictitious association between "antitoxin" and "cure" would be created at once.
The general expression for $(AB) - (AB)_0$ shews how it may be avoided; it is only necessary to administer the antitoxin to the same proportion of patients of both sexes. This should be kept constantly in mind as an essential rule in such experiments if it is desired to make the most use of the results. The fictitious association caused by mixing records finds its counterpart in the spurious correlation to which the same process may give rise in the case of continuous variables, a case to which attention was drawn and which was fully discussed by Professor Pearson in a recent memoir*. If two separate records, for each of which the correlation is zero, be pooled together, a spurious correlation will necessarily be created unless the mean of one of the variables, at least, be the same in the two cases.
* Phil. Trans. A, Vol. 192, p. 277.
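The general expression can be checked against the inheritance example, and the rule about equal proportions illustrated, with a few lines of Python. The sketch assumes the expression in the form $(AB) - (AB)_0 = N\,[(AC)-(AC)_0]\,[(BC)-(BC)_0] \,/\, \{(C)[N-(C)]\}$, with A and B independent within both C and γ; the function name and the frequencies (200 pairs, half of them male) are chosen for illustration.

```python
from fractions import Fraction


def pooled_excess(N, A, B, C, AC, BC):
    """Return (AB) - (AB)0 computed directly and by the general expression,
    when A and B are independent within the universes C and gamma."""
    # Direct route: add the two sub-universe frequencies (ABC) + (AB gamma).
    AB = Fraction(AC * BC, C) + Fraction((A - AC) * (B - BC), N - C)
    direct = AB - Fraction(A * B, N)
    # General expression in terms of the departures of (AC) and (BC)
    # from their independence values.
    AC0, BC0 = Fraction(A * C, N), Fraction(B * C, N)
    formula = Fraction(N, C * (N - C)) * (AC - AC0) * (BC - BC0)
    return direct, formula


# Inheritance example: 100 father-son and 100 mother-daughter pairs.
print(pooled_excess(200, 60, 60, 100, AC=50, BC=50))
# Same totals, but A spread over the sexes in equal proportion: no excess.
print(pooled_excess(200, 60, 60, 100, AC=30, BC=50))
```

In the first case both routes give an excess of 8 pairs in 200, i.e. 13 % observed against 9 % expected; in the second, (AC) equals its independence value and the fictitious association vanishes, which is the design rule stated for the antitoxin experiment.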