Closed-ended question formats: The nature, validity and reliability of their responses

Teresa Garcia-Marques, Rui Bártolo-Ribeiro


This paper aims to support researchers in deciding whether and how to use closed-ended questions in
their questionnaires by critically reviewing the literature on the implications of such decisions for
the nature, validity and reliability of the resulting measure. The review provides arguments to help
researchers decide how best to operationalize their variables through a closed response format. These
arguments support decision-making in the construction of the response options offered to the
respondent and in the choice of scale type: graphical or non-graphical; categorical or continuous
assessment; with three or more points; with or without labels and, if labeled, with what kind of
labels. Further topics relevant to the construction of a response scale are illustrated through the
specific case in which “perceived frequencies” are measured.


Keywords: Assessment scales, Questionnaires, Bias.




