Multiple Regression

Description

Multiple regression analysis is a statistical tool for understanding the relationship between two or more variables. Multiple regression involves a variable to be explained—called the dependent variable—and explanatory variables that are thought to produce or be associated with changes in the dependent variable. For example, a multiple regression analysis might estimate the effect of the number of years of work on salary. Salary would be the dependent variable; years of experience would be the explanatory variable.

Multiple regression analysis is sometimes well suited to the analysis of data about competing theories for which there are several possible explanations for the relationships among a number of explanatory variables. Multiple regression typically uses a single dependent variable and several explanatory variables to assess the statistical data pertinent to these theories. In a case alleging sex discrimination in salaries, for example, a multiple regression analysis would examine not only sex, but also other explanatory variables such as education and experience. The employer-defendant might use multiple regression to argue that salary is a function of the employee’s education and experience, and the employee-plaintiff might argue that salary is also a function of the individual’s sex.

Multiple regression also may be useful (1) in determining whether a particular effect is present; (2) in measuring the magnitude of a particular effect; and (3) in forecasting what a particular effect would be, but for an intervening event. In a patent infringement case, for example, a multiple regression analysis could be used to determine (1) whether the behavior of the alleged infringer affected the price of the patented product; (2) the size of the effect; and (3) what the price of the product would have been had the alleged infringement not occurred. Over the past several decades, the use of regression analysis in court has grown widely.
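The salary example above can be sketched in code. The following is a minimal illustration, not an analysis from any actual case: all of the data are simulated under an assumed relationship, and ordinary least squares (the estimation technique underlying multiple regression) is used to recover the coefficient on each explanatory variable while holding the others constant.

```python
import numpy as np

# Hypothetical simulation: salary as a function of education, experience,
# and a sex indicator. The "true" coefficients below are assumptions made
# purely for illustration.
rng = np.random.default_rng(0)
n = 200
education = rng.integers(12, 21, n).astype(float)   # years of schooling
experience = rng.integers(0, 30, n).astype(float)   # years of experience
sex = rng.integers(0, 2, n).astype(float)           # indicator variable

# Assumed underlying relationship, plus random noise.
salary = (20_000 + 2_000 * education + 1_000 * experience
          - 3_000 * sex + rng.normal(0, 2_000, n))

# Design matrix: a constant term plus the three explanatory variables.
X = np.column_stack([np.ones(n), education, experience, sex])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

# coef[3] estimates the salary difference associated with sex,
# holding education and experience constant.
print(coef)
```

Because the data were simulated, the estimated coefficients land near the assumed values; with real litigation data, the dispute is precisely over which explanatory variables belong in the design matrix.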
Although multiple regression analysis has been used most frequently in cases of sex and race discrimination and antitrust violations, other applications include census undercounts, voting rights, the study of the deterrent effect of the death penalty, rate regulation, and intellectual property. Multiple regression analysis can be a source of valuable scientific testimony in litigation. However, when inappropriately used, regression analysis can confuse important issues while having little, if any, probative value. In EEOC v. Sears, Roebuck & Company, in which Sears was charged with discrimination against women in hiring practices, the Seventh Circuit acknowledged that “[m]ultiple regression analyses, designed to determine the effect of several independent variables on a dependent variable, which in this case is hiring, are an accepted and common method of proving disparate treatment claims.” However, the court affirmed the district court’s findings that the “E.E.O.C.’s regression analyses did not ‘accurately reflect Sears’ complex, nondiscriminatory decision-making processes’” and that the “’E.E.O.C.’s statistical analyses [were] so flawed that they lack[ed] any persuasive value.’” Serious questions also have been raised about the use of multiple regression analysis in census undercount cases and in death penalty cases.

Moreover, in interpreting the results of a multiple regression analysis, it is important to distinguish between correlation and causality. Two variables are correlated when the events associated with the variables occur more frequently together than one would expect by chance. For example, if higher salaries are associated with a greater number of years of work experience, and lower salaries are associated with fewer years of experience, there is a positive correlation between salary and number of years of work experience.
However, if higher salaries are associated with less experience, and lower salaries are associated with more experience, there is a negative correlation between the two variables. A correlation between two variables does not imply that one event causes the second. Therefore, in making causal inferences, it is important to avoid spurious correlation. Spurious correlation arises when two variables are closely related but bear no causal relationship because they are both caused by a third, unexamined variable. For example, there might be a negative correlation between the age of certain skilled employees of a computer company and their salaries. One should not conclude from this correlation that the employer has necessarily discriminated against the employees on the basis of their age. A third, unexamined variable, such as the level of the employees’ technological skills, could explain differences in productivity and, consequently, differences in salary. Or, consider a patent infringement case in which increased sales of an allegedly infringing product are associated with a lower price of the patented product. This correlation would be spurious if the two products have their own noncompetitive market niches and the lower price is due to a decline in the production costs of the patented product.

Pointing to the possibility of a spurious correlation should not be enough to dispose of a statistical argument, however. It may be appropriate to give little weight to such an argument absent a showing that the alleged spurious correlation is either qualitatively or quantitatively substantial. For example, a statistical showing of a relationship between technological skills and worker productivity might be required in the age discrimination example above. Causality cannot be inferred by data analysis alone; rather, one must infer that a causal relationship exists on the basis of an underlying causal theory that explains the relationship between the two variables.
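The age-and-skill example above can be made concrete with a small simulation. In this hypothetical sketch (all numbers are assumptions for illustration), salary is driven entirely by technological skill, and younger employees happen to have higher skill, so age and salary show a strong negative raw correlation even though age has no causal effect; including the confounding variable in the regression drives the age coefficient toward zero.

```python
import numpy as np

# Hypothetical data: skill causes salary; age causes skill but not salary.
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(25, 65, n)
skill = 10 - 0.1 * age + rng.normal(0, 0.5, n)             # younger -> more skilled
salary = 30_000 + 5_000 * skill + rng.normal(0, 1_000, n)  # skill alone sets pay

# Raw correlation: age and salary look strongly (negatively) related.
raw_corr = np.corrcoef(age, salary)[0, 1]

# Regress salary on both age and skill: once the third variable is
# included, the age coefficient is close to zero.
X = np.column_stack([np.ones(n), age, skill])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
print(raw_corr, coef[1], coef[2])
```

This is the statistical counterpart of the text’s point: the negative age–salary correlation is spurious, and a regression that omits the confounding variable would misattribute the effect to age.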
Even when an appropriate theory has been identified, causality can never be inferred directly. One must also look for empirical evidence that there is a causal relationship. Conversely, the fact that two variables are correlated does not guarantee the existence of a causal relationship; it could be that the model—a characterization of the underlying causal theory—does not reflect the correct interplay among the explanatory variables. In fact, the absence of correlation does not guarantee that a causal relationship does not exist. Lack of correlation could occur if (1) there are insufficient data; (2) the data come from inaccurate measurements; (3) the data do not allow multiple causal relationships to be sorted out; or (4) the model is specified wrongly because of the omission of a variable or variables related to the variable of interest.

There is a tension between any attempt to reach conclusions with near certainty and the inherently probabilistic nature of multiple regression analysis. In general, statistical analysis involves the formal expression of uncertainty in terms of probabilities. That statistical analysis generates probabilities that there are relationships should not be seen in itself as an argument against the use of statistical evidence. The only alternative might be to use less reliable anecdotal evidence.

This chapter addresses a number of procedural and methodological issues that are relevant in considering the admissibility of, and weight to be accorded to, the findings of multiple regression analyses. It also suggests some standards of reporting and analysis that an expert presenting multiple regression analyses might be expected to meet. Section 6-2 discusses research design—how the multiple regression framework can be used to sort out alternative theories about a case. Section 6-3 concentrates on the interpretation of the multiple regression results, from both a statistical and a practical point of view.
Section 6-4 briefly discusses the qualifications of experts. Section 6-5 emphasizes procedural aspects associated with the use of the data underlying regression analyses. Finally, the Appendix delves into the multiple regression framework in further detail; it also contains a number of specific examples that illustrate the application of the technique.

Source Publication

Modern Scientific Evidence: The Law and Science of Expert Testimony

Source Editors/Authors

David L. Faigman, David H. Kaye, Michael J. Saks, Joseph Sanders

Publication Date

2002

Edition

2

Volume Number

1
