Psychology 557

Psychometric Methods

Fall 2007

Instructor: Patrick E. McKnight, Ph.D.
Office: David King 2064/2065
Office Hours: Tues 3:30pm-4:30pm and by appointment
Phone: (703) 993–8292
Class Location: DK 2072
Class Date/Time: Tuesday 4:30pm-7:10pm
Class website: http://mres.gmu.edu/PSYC557/
Important Dates: Please see GMU academic calendar
Syllabus: PDF format

Overview

This course is a survey of important measurement topics within social and behavioral science. Typical measurement classes tend to be heavily mathematical and overburden students with details they rarely need. In sharp contrast, this course covers material similar to that of other graduate courses in social science measurement, but at the conceptual level. The course is technical in nature only in the sense that measurement is a technical aspect of all critical inquiry.

Prerequisites

Due to the nature of the material and the relevance to research, I assume all students will have successfully completed a graduate course in statistics and a graduate course in research methods. I do not intend to cover in great detail the statistical models underlying measurement tools, but it is essential to understand statistical procedures in general and to appreciate how measurement fits into the process of research.

Course Requirements and Grading

The course covers many topics in only a semester, so the reading requirements might be heavier than what most graduate students experience in other courses. Let me emphasize this point - this course values thinking about measurement in general and therefore requires an above-average amount of reading. I expect all students to attend every class, complete all the readings prior to the class meeting, and come prepared to discuss the topic as outlined. In exchange for these requirements, I require neither written papers nor exams. One brief assignment will be discussed the first day of the course and will require some work outside class. Grades, therefore, are based on class discussion and this brief assignment.

Readings and Required Texts

The readings will be made available in electronic format. Each article is scanned into an Adobe Acrobat file (i.e., a PDF file). The quality of some readings is not great, but all articles are readable in the format in which they are distributed. Some students may prefer to get the original articles from the source journals, but that is left to each student to decide. The electronic versions are distributed at no charge to students enrolled in the course.

The following three books are required for the course. They may be purchased either at the bookstore or online. Readings from these books are not required until later in the semester; however, you might want to order the books at the beginning of the semester. I would urge you not to rely on the library for your course materials. Other students and faculty members may recall these books at any time, and that might interfere with your completing the readings for the course.

Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory, Classical test theory, and Reliability and the classical true score model.

Shavelson, R.J., and Webb, N.M. (1991). Generalizability theory: a primer. Newbury Park, CA: Sage Publications.

Kraemer, H.C. (1992). Evaluating medical tests: objective and quantitative guidelines. Newbury Park, CA: Sage Publications.

Topic Outline and Readings

1: Introduction and orientation

The first class meeting covers the brief, semester-long assignment. Please come prepared to discuss your interests in the course and your expectations for learning measurement.

2: Logic of measurement

The following articles address the general topic and logic of measurement. In particular, the concept of measurement and what measures ought to deliver is the focus of these readings and our in-class discussion.

Jackson, D.N. (1971). The dynamics of structured personality tests. Psychological Review, 78, 229-248.

Dawes, R.M. (1989). Measurement models for rating and comparing risks: the context of AIDS. In L. Sechrest, H. Freeman, and A. Mulley (Eds.) Health services research methodology: a focus on AIDS. Washington, D.C.: DHHS, National Center for Health Services Research and Health Care Technology Assessment, 31-44.

Webb, E., Campbell, D.T., Schwartz, R.D., Sechrest, L., and Grove, J. (1981). Nonreactive measures in the social sciences. Boston: Houghton-Mifflin, 1-40, 41-77, 78-87, 197-240, 275-330.

Nicholls, J.G., Licht, B.G., and Pearl, R.A. (1982). Some dangers of using personality questionnaires to study personality. Psychological Bulletin, 92, 572-580.

3: Concepts of measurement

Measurement, like all technical areas within science, contains many specific terms. Frequently we pejoratively call such terms jargon, but in the case of measurement, the terms convey important details worth knowing. The following readings cover the use and misuse of many measurement terms.

Rotter, J.B. (1990). Internal versus external control of reinforcements: a case history of a variable. American Psychologist, 45, 489-493.

Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performances and scientific inquiry into score meaning. American Psychologist, 50, 741-749.

Houts, A.C., Cook, T.D., and Shadish, W.R. (1986). The person-situation debate: A critical multiplist perspective. Journal of Personality, 54, 52-105.

Wallace, J. (1966). An abilities conception of personality. The American Psychologist, 21, 132-138.

In addition to the preceding, you should read the following paper, which we will use as background and discuss on a continuing basis during the remainder of the course. This paper is of unusual importance, although it has, sadly in my estimation, fallen into neglect.

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694.

4: Statistics for measurement

This session will be devoted to the discussion of a variety of statistical procedures and tests that are especially useful in measurement research and development and that are not always covered adequately in more general statistics courses.

Ozer, D.J. (1985). Correlation and the coefficient of determination. Psychological Bulletin, 97, 307-315.

Rodgers, J.L., and Nicewander, W.A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42, 59-66.

Sapolsky, R.M. (1987). The case of the falling nightwatchmen. Discover, July, 42-45.

Kaufman, A.S. (1972). Restriction of range: questions and answers. Test Service Bulletin, No. 59, The Psychological Corporation.

Test Service Bulletin. (1953). Better than chance. The Psychological Corporation, No. 45, 8-12.

Test Service Bulletin. (1954). The correction for guessing. The Psychological Corporation, No. 46, 13-16.

Test Service Bulletin. (1956). How accurate is a test score? The Psychological Corporation, No. 50.

5: Scaling and scoring responses

Frequently overlooked or downright neglected, scaling is an essential part of measurement. We will discuss the following articles in the context of scaling and scoring instruments. Pay attention to the broader perspective when reading these articles. The important matters are not necessarily in the details but in the points with the greatest implications for our approach to measurement.

Diamond, J. (1987). Soft sciences are often harder than hard sciences. Discover, Aug., 34-39.

Borgatta, E.F., and Bohrnstedt, G.W. (1981). Level of measurement: once over again. In E.F. Borgatta and G.W. Bohrnstedt (Eds.) Social measurement: current issues. Beverly Hills, CA: Sage Publications, 23-37.

Buss, D.M., and Ozer, D.J. (1980). Inference and the interpretation of test scores. American Psychologist, 35, 475-476.

Sechrest, L., McKnight, P.E., and McKnight, K.M. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065-1071.

Ware, J.E., and Sherbourne, C.D. (1992). The MOS 36-item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473-483.

6: Classical test theory

I begin covering specific measurement theory today. Classical test theory (CTT) is the oldest and longest-standing measurement theory in social science. While the theory and its methods have attracted many detractors, it remains the most widely used theory.

Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory, Classical test theory, and Reliability and the classical true score model.

Rogosa, D.R., and Willett, J.B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343.

Lachar, D., and Gruber, C.P. (1993). Development of the Personality Inventory for Youth: a self-report companion to the Personality Inventory for Children. Journal of Personality, 61, 81-98.

Cortina, J.M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.

7: Classical test theory

We continue our discussion of CTT but focus now on the implications of the methods and how they might lead us to appreciate measures.

Cronbach, L.J., and Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Miller, M.B. (1995). Coefficient alpha: a basic introduction from the perspective of classical test theory and structural equation modeling. Structural Equation Modeling, 2, 255-273.

Sechrest, L. (1963). Incremental validity: a recommendation. Educational and Psychological Measurement, 23, 153-158.

8: Generalizability theory

In this class session, I will introduce the topic of generalizability. It is important that you read the following book chapters carefully. G-theory, as it is called, is not an easy topic, but these authors do the best job of conveying this complex material.

Shavelson, R.J., and Webb, N.M. (1991). Generalizability theory: a primer. Newbury Park, CA: Sage Publications. pp. 1-45.

9: Generalizability theory

We will continue our discussion of G-theory during this class. An in-class exercise will offer a bit more insight into the workings of the procedure.

Shavelson, R.J., and Webb, N.M. (1991). Generalizability theory: a primer. Newbury Park, CA: Sage Publications. pp. 83–98.

Chambers, L.W., Haight, M., Norman, G., and MacDonald, L. (1987). Sensitivity to change and the effect of mode of administration on health status measurement. Medical Care, 25, 470-479.

Slack, M.K., Sabers, D., Larson, L.N., McGhan, W.F., and Bootman, J.L. (1992). Reliability indexes for use in educational experiments: Cronbach’s alpha versus a G study. Journal of Pharmacy Teaching, 3, 33-48.

10: Factor Analysis and the Multitrait-Multimethod Matrix

Factor Analysis: Perhaps the oldest method in any measurement area, factor analysis remains a vital force in evaluating social science instruments. The following articles provide you with an introduction and application of factor analysis.

Goldberg, L.R., and Digman, J.M. (1994). Revealing structure in the data: principles of exploratory factor analysis. In S. Strack (Ed.) Differentiating normal and abnormal personality. New York: Springer Publishing Co.

Figueredo, A.J., Ferketich, S.L., and Knapp, T.R. (1991). More on MTMM: the role of confirmatory factor analysis. Research in Nursing and Health, 14, 387-391.

MTMM: This topic is one of the more vexing for most graduate students. I would encourage all students to read carefully through the first article before reading the second. The devils are in the details regarding MTMM, but you must know the purpose of the procedure before you can understand - at any depth, that is - the merits of different approaches to analyzing MTMM data.

Campbell, D.T., and Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Ferketich, S.L., Figueredo, A.J., and Knapp, T.R. (1991). The multitrait-multimethod approach to construct validity. Research in Nursing and Health, 14, 315-320.

11: Latent Response Models: Item response theory and Rasch models

We have now come to the point in the course where we cover what is termed “modern measurement theory.” As I noted previously, CTT is the longest-standing theory, but item response theory (IRT) is the successor to CTT in many measurement domains. I will introduce IRT in some detail; however, to understand my lecture you will need to read the following article very carefully.

Embretson, S.E. (1999). Issues in the Measurement of Cognitive Abilities. In S.E. Embretson and S.L. Hershberger (Eds.) New Rules of Measurement. Mahwah, NJ: Lawrence Erlbaum.

Drasgow, F., and Hulin, C.L. (1990). Item response theory. In M.D. Dunnette (Ed.). Handbook of industrial and organizational psychology, Vol. 1. Palo Alto: Consulting Psychologists Press, Inc., 577-635.

12: Applications of Latent Response Models

We continue our discussion of latent response models with these examples.

King, D.W., King, L., Fairbank, J.A., Schlenger, W.E., and Surface, C.R. (1993). Enhancing the precision of the Mississippi Scale for Combat-Related Posttraumatic Stress Disorder: an application of Item Response Theory. Psychological Assessment, 5, 457-471.

Harris, M.M., and Sackett, P.R. (1987). A factor analysis and item response theory analysis of an employee honesty test. Journal of Business and Psychology, 2, 122-135.

13: Measurement for decisions

The reading for this week is an entire book. While this might appear daunting, it is not. The book is short - fewer than 250 pages - and easy to read. Kraemer does a masterful job of conveying the topics, and she provides excellent examples. Do not feel compelled to learn all the nuances of these methods - read the book for description and clarity.

Kraemer, H.C. (1992). Evaluating medical tests: objective and quantitative guidelines. Newbury Park, CA: Sage Publications.

We finish the course with a classic article by Paul Meehl and his colleague Al Rosen. This paper extends Kraemer’s descriptions of signal detection theory but provides a broader context and specific examples within the clinical psychology domain.

Meehl, P.E., and Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.

TBA: Meta-analysis and validity generalization

If time permits, we will cover meta-analysis and validity generalization. Measurement is not always about measurement theory. At times, we might be interested in collecting data that serve as indicators of other unobservable entities such as research findings. Meta-analysis is one method for gathering and making sense of multiple research findings.

Schmidt, F.L., Gast-Rosenberg, I., and Hunter, J.E. (1980). Validity generalization results for computer programmers. Journal of Applied Psychology, 65, 643-661.

Schmidt, F.L. (1992). What do the data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist, 47, 1173-1181.