Errors in Survey Data

All survey data are subject to error, and errors can arise from innumerable, often unexpected, sources. Some errors are generated during the planning stage of a survey; others are created during the later stages of data recording and processing. We cannot list all the ways in which surveys can go wrong. It is possible, however, to raise our awareness of the problem by focusing on two broad categories of error: random error and systematic error (or bias).

Random Error

The first major type of error of which statisticians must be aware is called random error, or chance error, or sampling error, and equals the difference between the value of a variable obtained by taking a single random sample and the value obtained by taking a census (or by averaging the results of all possible random samples of like size).

This type of error is associated only with sample surveys. It arises from the operation of chance that determines which particular units of the population happen to be included in the sample. This error can be positive or negative, tiny or huge, but its expected magnitude can always be reduced by increasing the size or number of random samples taken, and it is zero in a census.
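A small simulation can make this concrete. The sketch below is only an illustration: the population (10,000 normally distributed values), the sample sizes, and the helper name mean_abs_sampling_error are assumptions made up for the example, not part of any real survey. It measures the average gap between a sample mean and the census mean for a small sample, a large sample, and a "sample" that is the whole population.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, trials, rng):
    """Average absolute gap between a sample mean and the census mean."""
    census_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        # Simple random sample, drawn without replacement.
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - census_mean))
    return statistics.fmean(errors)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population, invented for this illustration.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = mean_abs_sampling_error(population, 25, 500, rng)
large = mean_abs_sampling_error(population, 2_500, 500, rng)
# A "sample" containing every unit is a census.
census = mean_abs_sampling_error(population, len(population), 1, rng)

print(f"n=25:    {small:.3f}")
print(f"n=2500:  {large:.3f}")
print(f"census:  {census:.3f}")
```

Under these assumptions the typical error shrinks as the sample grows and vanishes when every unit is included.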

Systematic Error or Bias

Systematic error, or bias, or nonsampling error, equals the difference between the value of a variable obtained by taking a census (or by averaging the results of all possible random samples of a given size) and the true value.

Unfortunately, bias can be hard to detect, and its size–unlike that of random error–cannot be estimated. For this reason, statisticians who seek to discover the truth must become aware of the major sources of bias and try their best to neutralize them.
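The contrast with random error can be sketched in a simulation, where (unlike in a real survey) the true values are known. Everything here is assumed for illustration: the population, the constant OVERSTATEMENT of 2 units that every respondent adds to the true value, and the helper avg_error. The point is that the average error stays near the bias no matter how large the sample becomes.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical true values; in a real survey these are unknown.
true_values = [rng.gauss(50, 10) for _ in range(10_000)]

OVERSTATEMENT = 2.0  # assumed systematic distortion in every answer
reported = [v + OVERSTATEMENT for v in true_values]

true_mean = statistics.fmean(true_values)
# Bias: the census of reported values minus the true value.
census_bias = statistics.fmean(reported) - true_mean

def avg_error(sample_size, trials=300):
    """Average signed error of the sample mean relative to the truth."""
    errors = [
        statistics.fmean(rng.sample(reported, sample_size)) - true_mean
        for _ in range(trials)
    ]
    return statistics.fmean(errors)

print(f"bias in a census:      {census_bias:.3f}")
print(f"average error, n=25:   {avg_error(25):.3f}")
print(f"average error, n=2500: {avg_error(2_500):.3f}")
```

A larger sample narrows the scatter around the biased value, but it does not move that value any closer to the truth.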

How Bias Can Enter Surveys: The Planning Stage

If statisticians are not careful, they can literally build systematic error into the very design of their surveys. Such error can take the form of selection bias, response bias, or nonresponse bias.

Selection Bias

Selection bias is a systematic tendency to favor the inclusion in a survey of selected elementary units with particular characteristics, while excluding other units with other characteristics. As a result of selection bias, any data that are eventually collected are bound to overrepresent the former characteristics. The use of a faulty frame or the selection of a nonrandom sample can easily result in selection bias.
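As a rough illustration of a faulty frame, the sketch below invents a population in which 60 percent of households have a phone and those households have higher incomes on average. The figures, the phone-only frame, and all variable names are assumptions for the example. A perfectly random sample drawn from the incomplete frame still overstates the population mean.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population of (has_phone, income) pairs.
population = []
for _ in range(10_000):
    has_phone = rng.random() < 0.6
    income = rng.gauss(60 if has_phone else 40, 8)
    population.append((has_phone, income))

true_mean = statistics.fmean([income for _, income in population])

# Faulty frame: only households with a phone can ever be selected.
frame = [income for has_phone, income in population if has_phone]
sample = rng.sample(frame, 2_000)

selection_error = statistics.fmean(sample) - true_mean
print(f"population mean:      {true_mean:.2f}")
print(f"sample mean (frame):  {statistics.fmean(sample):.2f}")
print(f"selection error:      {selection_error:.2f}")
```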

Response Bias

Response bias is a tendency for answers to survey questions to be wrong in some systematic way. Nothing, probably, contributes more to this problem than the faulty design of questionnaires. A good questionnaire contains instructions that motivate respondents to tell the truth by providing them with the reasons for the survey. A poor one may contain leading questions, which steer the respondent toward a particular answer; such questions can be obvious or extremely subtle. Response bias can also result from the sequencing of different questions in such a way that people answer one question within a frame of reference provided by a (possibly unrelated) earlier question.

Nonresponse Bias

Nonresponse bias is a systematic tendency for selected elementary units with particular characteristics not to contribute data in a survey while other such units, with other characteristics, do. In the presence of this problem, even a census based on a perfect frame, or a perfectly selected random sample, will fail. They will yield faulty conclusions because the data actually collected will in fact constitute a convenience sample–for example, of the most strongly opinionated people among all the people who were supposed to be in the survey. Questionnaire features that contribute to nonresponse bias include a physically unattractive design; hard-to-read print; questions that are boring, unclear, or long and involved; an excessive number of questions; bad sequencing of questions so that respondents are forced to jump back and forth from topic to topic; and, in the case of multiple-choice questions, the specification of answers that are not mutually exclusive or are excessively restricted to particular points of view, while omitting other possible views.
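The claim that even a census fails under nonresponse can be sketched as follows. The setup is entirely hypothetical: each person has an opinion strength, and the assumed rule in responds makes strongly opinionated people more likely to answer. Every unit is approached, yet the answers actually collected overstate how strongly people feel.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical opinion strength for every unit in the population.
strengths = [abs(rng.gauss(0, 2)) for _ in range(10_000)]

def responds(strength):
    """Assumed rule: stronger opinions mean a higher chance of answering."""
    probability = min(1.0, 0.1 + 0.2 * strength)
    return rng.random() < probability

# A census: every unit is asked, but only some reply.
respondents = [s for s in strengths if responds(s)]

mean_all = statistics.fmean(strengths)
mean_respondents = statistics.fmean(respondents)
nonresponse_bias = mean_respondents - mean_all

print(f"response rate:          {len(respondents) / len(strengths):.2f}")
print(f"mean strength, all:     {mean_all:.2f}")
print(f"mean strength, replies: {mean_respondents:.2f}")
```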

How Bias Can Enter Surveys: The Collection Stage

Even when the designers of surveys successfully avoid the emergence of bias by using proper frames, carefully selecting random samples, meticulously creating and pretesting questionnaires, and the like, those who execute surveys can still allow bias to enter the process.

Selection bias is apt to enter a survey when interviewers are instructed to select, within broad guidelines, the particular individuals they will question.

Response bias can arise for a number of reasons during data collection. Both interviewers and respondents can be at fault. Interviewers, often inadvertently, may elicit a particular “acceptable” answer to a question by their dress and choice of words (which may betray their social class) or by their tone of voice and demeanor (a gesture of disapproval or surprise at some answers will almost surely affect other answers). Interviewers can also make systematic mistakes when recording answers–for example, by consistently categorizing part-time income or work as full-time. And interviewers have been known to fake answers completely. Even the timing of a survey can create response bias.

Even when questionnaires are designed well, respondents often give false answers for a variety of reasons. Respondents may simply not know the answer but may give one anyway to conceal their ignorance. They may give whatever answer they think will please the interviewer. Respondents may give distorted answers in line with some current fad toward optimism or pessimism. And they may tell deliberate falsehoods to mislead competitors or impress the interviewer. People are particularly likely to boast of successes and hide failures on so-called “prestige questions” concerning, say, their knowledge of current events or famous people, their reading of books or level of education, their grade-point average or wealth, and even their brushing of teeth and taking of baths.

The data collection stage also generates its share of nonresponse bias. Respondents often refuse to answer questions they consider too personal, such as those about their age, health, and sexual habits, about their income, or about illegal and immoral activities. Indeed, many respondents refuse to answer any questions at all, particularly in surveys conducted by phone, mail, or e-mail.

How Bias Can Enter Surveys: The Processing Stage

The emergence of bias during data collection can conceivably be minimized by the careful training and supervision of interviewers and by a variety of techniques designed to forestall incorrect or refused responses. These techniques include prominent and favorable coverage by the news media about surveys to be undertaken, personal special-delivery letters to respondents on the stationery of well-known organizations or even over the signature of respected individuals, credible promises of anonymity or of rewards, and more. Nevertheless, bias can enter even at the data-processing stage. Multitudes of people who code, edit, keypunch, tabulate, print, and otherwise manipulate data have multitudes of opportunities for making non-canceling errors. Those who code answers to open-ended questions (which elicit answers in people’s own words) may consistently miscategorize them; editors may consistently introduce high or low values when encountering incomplete or illegible responses; and they may fail to eliminate outliers, or “wild values” (maverick responses that are not believable because they differ greatly from the majority of observed values).
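One common way to screen for such wild values, offered here only as a sketch, is the interquartile-range rule: flag anything more than 1.5 times the interquartile range beyond the quartiles. The rule is a swapped-in convention rather than something this article prescribes, and the data and the helper name flag_wild_values are invented for the example.

```python
import statistics

def flag_wild_values(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical reported weekly hours; 350 looks like a keying error.
hours = [32, 35, 31, 36, 34, 33, 30, 350]
flagged = flag_wild_values(hours)
print(flagged)  # [350]
```

A flagged value is only a candidate for review; whether it is truly unbelievable remains a judgment for the editor.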

Thank you for reading!
