Taylor & Francis

Communications in Statistics - Theory and Methods
ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20

The Chi Square Test With Both Margins Fixed
I.S. Alalouf

To cite this article: I.S. Alalouf (1987) The Chi Square Test With Both Margins Fixed, Communications in Statistics - Theory and Methods, 16:1, 29-43, DOI: 10.1080/03610928708829350
To link to this article: http://dx.doi.org/10.1080/03610928708829350
Published online: 27 Jun 2007.
COMMUN. STATIST.-THEOR. METH., 16(1), 29-43 (1987)
Copyright © 1987 by Marcel Dekker, Inc.

THE CHI SQUARE TEST WITH BOTH MARGINS FIXED

I.S. Alalouf
Université du Québec à Montréal

Key Words and Phrases: contingency table; Pearson's statistic; asymptotic distribution

ABSTRACT

It is well known that the chi square test for independence in a two-way contingency table is valid when the cell frequencies follow either a multinomial distribution or a product of multinomial distributions. We show that the test is valid in the third case as well, namely the case where both margins are fixed. The proof is reasonably self-contained, relying essentially on the Central Limit Theorem, and is easily made to cover the first two cases. It also leads naturally to a discussion of partitioning degrees of freedom.

1. INTRODUCTION

Consider an $r \times c$ contingency table with frequencies $x_{ij}$, $i = 1, \ldots, r$, $j = 1, \ldots, c$. Let $x_{1+}, \ldots, x_{r+}$ be the row totals and $N_1, \ldots, N_c$ the column totals. Let $N = \sum_{j=1}^{c} N_j$ be the total sample size. It is well known that, under the hypothesis of independence, the distribution of Pearson's chi square statistic,
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - x_{i+} N_j / N)^2}{x_{i+} N_j / N} \qquad (1.1)$$

is asymptotically a chi square under either one of the following models: (1) $N$ is fixed and the $x_{ij}$'s have a joint multinomial distribution; (2) the column totals $N_1, \ldots, N_c$ are fixed and, for $j = 1, \ldots, c$, the random vectors $(x_{1j}, \ldots, x_{rj})$ are independent multinomials.

Specifically, let $\{p_{ij},\ i = 1, \ldots, r,\ j = 1, \ldots, c\}$ denote the cell probabilities under the multinomial model (1), with $p_{i+} = \sum_{j=1}^{c} p_{ij}$, $i = 1, \ldots, r$, and $p_{+j} = \sum_{i=1}^{r} p_{ij}$, $j = 1, \ldots, c$. The null hypothesis is that $p_{ij} = p_{i+} p_{+j}$, and under this hypothesis the joint probability function of the cell frequencies is

$$\frac{N!}{\prod_{i=1}^{r} \prod_{j=1}^{c} x_{ij}!} \prod_{i=1}^{r} \prod_{j=1}^{c} (p_{i+} p_{+j})^{x_{ij}} \qquad (1.2)$$

Under the product multinomial model (2), the null hypothesis is that each of the $c$ multinomials has the same probability vector, $(p_{1+}, \ldots, p_{r+})$ say. Under this hypothesis and model, the joint probability function of the cell frequencies is

$$\prod_{j=1}^{c} \left[ \frac{N_j!}{\prod_{i=1}^{r} x_{ij}!} \prod_{i=1}^{r} p_{i+}^{x_{ij}} \right] \qquad (1.3)$$

Finally, it may happen that both sets of margins are fixed. One example of where this might occur in practice is the following. Suppose $N$ individuals belonging to $c$ classes (ethnic groups, for example) are hired for $N$ summer jobs which fall into $r$ categories. Both the number of individuals of each class and the number of jobs of each category are fixed. The hypothesis of independence in this context is the hypothesis that individuals are assigned to different job categories without regard to their
class. Under this assumption, the joint probability function of the cell frequencies is

$$\frac{\prod_{i=1}^{r} x_{i+}! \, \prod_{j=1}^{c} N_j!}{N! \, \prod_{i=1}^{r} \prod_{j=1}^{c} x_{ij}!} \qquad (1.4)$$

The distribution (1.4) is derived under the assumption of random assignment of people to jobs.

The sequence of models (1.2), (1.3) and (1.4) may be obtained by successive conditioning: (1.3) is the conditional distribution of the $x_{ij}$ given the column totals; and (1.4) is the conditional distribution under (1.3) given the row totals. Indeed, even (1.4) may be considered as a conditional distribution under a model in which the $x_{ij}$'s are independent Poisson variables, given the sum $N$ (see, e.g., Haberman, 1974).

It is known that the asymptotic distribution of (1.1) is chi square under (1.2) or (1.3), as well as under the independent Poissons model.

Under (1.4), does the statistic (1.1) have an asymptotic chi square distribution? The answer is that it does, provided each of the marginal frequencies tends to infinity in such a way that the relative marginal frequencies converge to numbers in (0,1). We prove this in the next section on the assumption that the vector of observations is asymptotically a multivariate normal, while the proof of the asymptotic normality is given in the appendix. Both proofs are reasonably self contained, relying on the central limit theorem and on some basic results on the distribution of quadratic forms. We point out in Section 3 that the same derivation leads very easily to the conclusion that (1.1) is asymptotically a chi square in the two other models: one multinomial with no constraints on the margins, and $c$ multinomials with constraints only on the column totals. We also show how an overall chi square may be split into several independent chi squares to test various subhypotheses. Thus one proof covers a number of issues related to the traditional chi square test.

2. MAIN RESULT

Consider a population of $N$ elements partitioned into $r$ classes with frequencies $x_{1+}, \ldots, x_{r+}$, $\sum_{i=1}^{r} x_{i+} = N$. A sampling scheme which leads to (1.4) is the following. Suppose a sample of size $N_1$ is drawn without replacement, then a sample of size $N_2$ from the remaining elements of the population, and so on until a sample of size $N_c$ is drawn from a population which by then contains only $N_c$ elements. That is, the last sample is not random conditional on all previous samples. Then the probability function of the $x_{ij}$'s is given by (1.4). Let

$$x_j = (x_{1j}, \ldots, x_{rj})', \quad j = 1, \ldots, c \qquad (2.1)$$

be the $j$-th column of the contingency table, and let

$$x = (x_1', \ldots, x_c')'. \qquad (2.2)$$

Let

$$\pi = (\pi_1, \ldots, \pi_r)' = (x_{1+}/N, \ldots, x_{r+}/N)'$$
$$p = (p_1, \ldots, p_c)' = (N_1/N, \ldots, N_c/N)' \qquad (2.3)$$
$$D = \mathrm{diag}(\pi_1, \ldots, \pi_r)$$

Finally, define

$$V = D - \pi \pi' \qquad (2.4)$$
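The sampling scheme of Section 2 lends itself to a small simulation. The following Python sketch is not part of the paper. It assumes the table is stored as a list of rows; the function names `pearson_chi_square` and `sample_fixed_margins`, the margins, the seed and the number of replications are all illustrative choices. It draws tables with both margins fixed by shuffling the population and cutting it into successive samples of sizes $N_1, \ldots, N_c$, then computes the statistic (1.1) for each table.

```python
import random


def pearson_chi_square(table):
    """Pearson's statistic (1.1) for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (x - expected) ** 2 / expected
    return stat


def sample_fixed_margins(row_totals, col_totals, rng):
    """Draw one table with both margins fixed, following the scheme of Section 2.

    The population holds row_totals[i] elements of class i. Shuffling it and
    cutting it into consecutive blocks of sizes N_1, ..., N_c is equivalent to
    drawing the successive samples without replacement.
    """
    population = [i for i, size in enumerate(row_totals) for _ in range(size)]
    rng.shuffle(population)
    table = [[0] * len(col_totals) for _ in row_totals]
    start = 0
    for j, size in enumerate(col_totals):
        for i in population[start:start + size]:
            table[i][j] += 1
        start += size
    return table


# Illustrative margins (assumed, not from the paper): r = 3, c = 2, N = 100.
rng = random.Random(0)
rows, cols = [30, 50, 20], [40, 60]
tables = [sample_fixed_margins(rows, cols, rng) for _ in range(2000)]
stats = [pearson_chi_square(t) for t in tables]
mean_stat = sum(stats) / len(stats)
print(round(mean_stat, 2))
```

If the chi square approximation with $(r-1)(c-1)$ degrees of freedom is reasonable for these margins, the printed average should be close to 2, the mean of a chi square with 2 degrees of freedom.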
Then, using (1.4), we find that