The following are some comments by Richard Reyment, who has worked on problems in this area: "Is this useful? Generally speaking, biologists know absolutely nothing about the geometry of the simplex, and this is also true of a great many statisticians. For the geomathematical fraternity, however, the subject is of great importance because it is often connected to analyses involving large-scale economic aspects, where an inappropriate analysis can waste great sums of money.

G. G. Simpson was among the first biologists to point out that ratios cannot be used in correlation exercises such as those indicated in the Course 1 agenda. Originally, it was Karl Pearson who in 1898 proved that ratios induce spurious correlations. This was in relation to so-called standardised data vectors. In recent years geomathematicians have taken the subject much further, following the results of the statistician John Aitchison, who proved that correlation coefficients are not defined in simplex space, that is, the space in which percentages, frequencies, etc. lie. This is a consequence of the fact that such data have a constant sum, and any alteration introduced into a matrix of compositions changes the sum in a manner that is beyond control. This is not a problem for open-space data, of course. Ref.: John Aitchison, The Statistical Analysis of Compositional Data, Chapman and Hall (1986); a slightly revised version was reprinted in 2003.

Hence, multivariate analyses involving compositional data must be made using the appropriate algebra for distributions on the simplex. Applying the "open-space" standard version can only lead to incorrect results. Since Aitchison published the original work, the applied mathematicians Professors Vera Pawlowsky-Glahn and Juan José Egozcue have raised the bar several levels by introducing the concept of a finite-dimensional Hilbert space into the analysis of simplicial geometry. This leads to very elegant solutions.
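Pearson's point about ratios that share a denominator can be seen in a small simulation. The sketch below is illustrative only. It assumes three mutually independent variables X, Y and Z, each drawn from an arbitrarily chosen uniform distribution on [1, 10], and uses only the Python standard library. X and Y are uncorrelated by construction, yet X/Z and Y/Z come out clearly positively correlated because both carry the same Z.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=1):
    """Draw three independent positive variables and compare
    corr(X, Y) with corr(X/Z, Y/Z)."""
    rng = random.Random(seed)
    # Uniform(1, 10) is an arbitrary assumption; any positive,
    # independent variables with non-trivial spread show the effect.
    xs = [rng.uniform(1, 10) for _ in range(n)]
    ys = [rng.uniform(1, 10) for _ in range(n)]
    zs = [rng.uniform(1, 10) for _ in range(n)]
    raw = pearson_r(xs, ys)
    ratio = pearson_r([x / z for x, z in zip(xs, zs)],
                      [y / z for y, z in zip(ys, zs)])
    return raw, ratio


raw_r, ratio_r = simulate()
print(f"corr(X, Y)     = {raw_r:+.3f}")   # near zero: X and Y are independent
print(f"corr(X/Z, Y/Z) = {ratio_r:+.3f}")  # clearly positive: induced by the shared Z
```

Changing the seed or the sample size moves the numbers slightly, but the ratio correlation stays well away from zero; it comes entirely from the common divisor, not from any relationship between X and Y.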
An indispensable reference is the recently published volume edited by A. Buccianti, G. Mateu-Figueras and V. Pawlowsky-Glahn, Compositional Data Analysis in the Geosciences: From Theory to Practice, published by the Geological Society of London, Special Publication No. 264, 2006 (212 pp.), http://www.geolsoc.org.uk/bookshop

Best wishes,
Richard A. Reyment"

------------------------
F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University
www: http://life.bio.sunysb.edu/ee/rohlf

> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]] On Behalf Of Richard Wright
> Sent: Saturday, September 27, 2008 2:05 AM
> To: [log in to unmask]
> Subject: using ratios in MV correlational analysis
>
> There is a scattered literature on the dangers, or otherwise, of using
> ratios in correlational analyses.
>
> I have read what looks like a non-obfuscatory paper on this topic by
> Firebaugh and Gibbs, "User's Guide to Ratio Variables", American
> Sociological Review, Vol. 50, No. 5 (1985), pp. 713-722.
>
> On page 721 the authors state: "Avoid mixed methods (part ratio, part
> component). If Z is controlled by division rather than by
> residualization, all of the other variables should be divided by Z.
> Should only some of the variables be divided by Z, the effect of Z is
> 'controlled' for some variables and not for others, and a defensible
> interpretation of the results is difficult."
>
> The reason for my interest is that I am trying to evaluate a
> morphometric paper that does linear discriminant analysis on a mixture
> of measurements and ratios derived from those same measurements. For
> example, the analysis includes (A) Length as well as Height/Length and
> (B) Height and Breadth as well as Height/Breadth and Height/Length.
>
> This paper seems to be an example of the 'mixed method' that Firebaugh
> and Gibbs warn against, where data are part ratio, part measurement,
> and spurious correlations are introduced into the data.
>
> So my first question is whether I am correct in this interpretation.
>
> My second question also concerns ratios.
>
> In his Multivariate Statistical Methods (2nd ed., 1994), B.F.J. Manly
> suggests controlling for the effects of absolute size differences in a
> PCA of pots (goblets) by expressing the measurements as "a proportion
> of the sum of all measurements on that goblet."
>
> Given that each variable is divided by the same sum, this example of
> the use of ratios seems to be a case that Firebaugh and Gibbs would
> not frown on.
>
> I shall welcome any comments on these questions and any pointers to
> relevant literature.
>
> Richard
>
> ----------------------------------------------
> CLASS-L list.
> Instructions: http://www.classification-society.org/csna/lists.html#class-l
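Both of Richard Wright's questions can be explored with a short simulation. The sketch below is a hypothetical illustration, not taken from the morphometric paper or from Manly: it assumes independent uniform measurements with made-up ranges for Length, Height and Breadth. Part (1) shows the "mixed method" problem, where a measurement and a ratio built from it are correlated even when the underlying measurements are independent. Part (2) applies Manly's proportion-of-total scaling, which the compositional-data literature calls closure; once every row sums to 1, each part's covariances with the other parts must add up to minus its own variance, which is the constant-sum constraint Reyment describes. Part (3) uses the centred log-ratio (clr) transform from Aitchison (1986) as one example of the log-ratio approach; its values do not change under closure.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def closure(row):
    """Express each measurement as a proportion of the row total."""
    total = sum(row)
    return [v / total for v in row]


def clr(row):
    """Centred log-ratio: log of each part minus the mean log of the row."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


rng = random.Random(2)
n = 5000
# Hypothetical specimens: three independent, positive measurements
# (the ranges are arbitrary assumptions).
length = [rng.uniform(20, 40) for _ in range(n)]
height = [rng.uniform(10, 20) for _ in range(n)]
breadth = [rng.uniform(5, 15) for _ in range(n)]

# (1) Mixed method: a measurement alongside a ratio built from it.
r_mixed = pearson_r(length, [h / l for h, l in zip(height, length)])
print(f"corr(L, H/L) = {r_mixed:+.3f}")  # negative although L and H are independent

# (2) Closure: rows now sum to 1, so each part's covariances with the
# other parts must add up to minus its own variance.
closed = [closure(r) for r in zip(length, height, breadth)]
cols = list(zip(*closed))
for i in range(3):
    others = sum(covariance(cols[i], cols[j]) for j in range(3) if j != i)
    print(f"part {i}: var = {covariance(cols[i], cols[i]):.5f}, "
          f"sum of covariances with others = {others:.5f}")

# (3) clr coordinates of each row sum to zero, and because closure only
# rescales a row, clr(closure(row)) equals clr(row) up to rounding.
clr_rows = [clr(r) for r in closed]
print(max(abs(sum(r)) for r in clr_rows))  # ~0, up to rounding
```

The forced negative covariances in part (2) appear whatever the raw measurements are, which is why dividing every variable by the same total does not by itself make ordinary correlation-based analyses of the resulting proportions safe.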