Basic Concepts of Data Collection

Statistical work cannot be performed in a vacuum. Before all else, such work requires the acquisition of a crucial type of raw material: information relevant to the subject matter under study. Before you begin collecting data, you should become familiar with a few basic concepts that you will employ.

Internal vs. External Data Sources

Sometimes relevant information already exists somewhere; in that case, an investigator need only find it. A business administrator, for example, might simply search the firm’s internal records for material that already resides in filing cabinets or computer memories. Thus, customer records would provide names, addresses, telephone numbers, data on amounts purchased, credit limits, and more. Employee records would provide names, addresses, job titles, years of service, salaries, social security numbers, and even numbers of sick days used. Production records would contain lists of products, part numbers, and quantities produced, along with associated labor costs, raw material consumption, and equipment usage. A government economist would, similarly, have access to a vast database held by the Bureau of the Census, the Department of Labor, the Federal Reserve Board, and the Office of Management and Budget, to name but a few. All the sources just mentioned are internal sources.

In addition to scouring internal sources of information, our business administrator or government economist could also look for external depositories of already existing data and persuade their owners to share information. Indeed, all kinds of organizations routinely gather data and sell them to would-be users in the private sector and in government agencies alike.

From the point of view of the professional statistician, collecting data is much more complicated than checking sources of already existing data. It concerns the question of how valid data can be generated in the first place.

Elementary Units and Variables

A statistical investigation invariably focuses on people or things with characteristics in which someone is interested. The persons or objects possessing the characteristics that interest the statistician are called elementary units. Any characteristic of an elementary unit is called a variable. A complete listing of all elementary units relevant to a statistical investigation is called a frame. Any single observation about a specified characteristic of interest is called a datum; it is the basic unit of the statistician’s raw material. Any collection of observations about one or more characteristics of interest, for one or more elementary units, is called a data set. A data set is univariate, bivariate, or multivariate depending on whether it contains information on one variable only, on two variables, or on more than two. The table shown below contains a multivariate data set.

Selected Characteristics of All the Full-Time Employees of an Organization
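The vocabulary above can be made concrete with a minimal Python sketch. The records are invented for illustration and are not the data from the table; the names `employees`, `frame`, and `build_data_set` are likewise hypothetical.

```python
# Hypothetical records, invented for illustration only.
# Each dict is one elementary unit (an employee); each key is a
# characteristic of interest (a variable); each value is a single datum.
employees = [
    {"name": "Employee A", "job_title": "Clerk", "years_of_service": 3, "annual_salary": 31000},
    {"name": "Employee B", "job_title": "Engineer", "years_of_service": 12, "annual_salary": 58000},
    {"name": "Employee C", "job_title": "Manager", "years_of_service": 20, "annual_salary": 72000},
]

# The frame is the complete listing of elementary units.
frame = [record["name"] for record in employees]


def build_data_set(records, variables):
    """Collect observations on the chosen variables and label the result."""
    if not variables:
        raise ValueError("at least one variable is required")
    data_set = [{v: record[v] for v in variables} for record in records]
    if len(variables) == 1:
        kind = "univariate"
    elif len(variables) == 2:
        kind = "bivariate"
    else:
        kind = "multivariate"
    return data_set, kind


salaries, salaries_kind = build_data_set(employees, ["annual_salary"])
full, full_kind = build_data_set(
    employees, ["job_title", "years_of_service", "annual_salary"]
)
```

Here `salaries` is a univariate data set and `full` is a multivariate one, because the label depends only on how many variables the data set covers, not on how many elementary units it contains.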

Population vs. Sample

There are two important concepts we must consider: (1) The set of all possible observations about a specified characteristic of interest is called a statistical population. (2) A subset of a statistical population, or of the frame from which it is derived, is called a sample.
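As a rough illustration of how frame, population, and sample relate, consider the sketch below. The unit IDs and values are hypothetical, and choosing the subset at random is an assumption made here; the definition above only requires that a sample be a subset.

```python
import random

# Hypothetical frame: IDs of all ten elementary units (employees).
frame = [f"E{i:02d}" for i in range(1, 11)]

# Hypothetical observations of one characteristic (years of service),
# one per elementary unit in the frame.
observed_years = [1, 3, 4, 4, 7, 9, 12, 15, 20, 25]
years_of_service = dict(zip(frame, observed_years))

# The statistical population is the set of all possible observations
# about that characteristic.
population = list(years_of_service.values())


def draw_sample(frame, observations, size, seed=None):
    """Pick `size` units from the frame and return their observations."""
    rng = random.Random(seed)          # seeded for reproducibility
    chosen = rng.sample(frame, size)   # selection without replacement
    return {unit: observations[unit] for unit in chosen}


sample = draw_sample(frame, years_of_service, size=4, seed=0)
```

The sample is drawn through the frame, so every sampled observation can be traced back to the elementary unit it came from.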


Qualitative vs. Quantitative Variables

Any given characteristic of interest to the statistician can differ in kind or in degree among various elementary units. A variable that is normally described in words rather than numerically (because it differs in kind rather than degree) is called a qualitative variable. As shown in the table above, examples of qualitative variables are race, sex, and job title. Qualitative variables can, in turn, be binomial or multinomial. Observations about a binomial qualitative variable can be made in only two categories: for example, male or female, employed or unemployed, correct or incorrect, defective or satisfactory, elected or defeated, absent or present. Observations about a multinomial qualitative variable can be made in more than two categories; consider job titles, colors, languages, religions, or types of businesses.

On the other hand, a variable that is normally expressed numerically (because it differs in degree rather than kind among the elementary units under study) is called a quantitative variable. Examples of quantitative variables, as shown in the table above, include years of service and annual salary. Quantitative variables can, in turn, be discrete or continuous. Observations about a discrete quantitative variable can assume values only at specific points on a scale of values, with gaps between them. Examples are the number of children in families, of employees in firms, of students in classes, of rooms in houses, of cars in stock, or of cows in pastures. Observations about a continuous quantitative variable can, in contrast, assume values at all points on a scale of values, with no breaks between possible values. Consider height, temperature, time, volume, or weight.
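The four-way classification can be sketched as a small Python function. This is only a heuristic under stated assumptions: it judges a variable by its recorded values, it treats whole-number records as discrete, and it assumes every category of a qualitative variable appears at least once. In practice the classification depends on the nature of the variable, not on how it happens to be recorded. The name `classify_variable` is hypothetical.

```python
from numbers import Real


def classify_variable(values):
    """Heuristically label a variable from its observed values."""
    if not values:
        raise ValueError("at least one observation is required")

    # Quantitative: every observation is a number (booleans excluded).
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if numeric:
        # Assumption: integer-only records indicate a discrete variable.
        whole = all(isinstance(v, int) for v in values)
        return ("quantitative", "discrete" if whole else "continuous")

    # Qualitative: observations fall into named categories.
    categories = set(values)
    if len(categories) < 2:
        raise ValueError("need at least two observed categories to classify")
    return ("qualitative", "binomial" if len(categories) == 2 else "multinomial")


sex = classify_variable(["male", "female", "female"])
job_title = classify_variable(["Clerk", "Engineer", "Manager"])
children = classify_variable([0, 2, 3, 1])
height_m = classify_variable([1.72, 1.80, 1.65])
```

A continuous variable recorded only to the nearest whole unit (for example, height in centimeters) would be mislabeled as discrete by this sketch, which is why the heuristic should not replace judgment about the variable itself.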


Surveys vs. Experiments

The collection of data from elementary units without exercising any particular control over factors that make these units different from one another and that may, therefore, affect the characteristic of interest being observed is called an observational study or survey.

On the other hand, the collection of data from elementary units while exercising control over some or all factors that may make these units different from one another and that may, therefore, affect the characteristic of interest being observed is called an experiment.
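One way an investigator might exercise such control is to decide which elementary units receive which level of a factor. The sketch below assigns units to groups at random; random assignment is an assumption introduced here for illustration, not something the definitions above prescribe, and the function name and unit IDs are hypothetical.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly assign each unit to one treatment, in near-equal group sizes."""
    rng = random.Random(seed)      # seeded for reproducibility
    shuffled = list(units)         # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


units = ["U1", "U2", "U3", "U4", "U5", "U6"]   # hypothetical elementary units
assignment = assign_treatments(units, ["treatment", "control"], seed=1)
```

In a survey, by contrast, no such assignment step exists: the investigator simply records each unit as it is found.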

Census Taking vs. Sampling

A census is a complete survey in which observations about one or more characteristics of interest are made for every elementary unit that exists.

A sample survey is a partial survey in which observations about one or more characteristics are made for only a subset of all existing elementary units.
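The difference in coverage can be sketched in a few lines of Python. The salary figures and unit IDs are hypothetical, and the subset used for the sample survey is picked by hand purely for illustration.

```python
from statistics import mean

# Hypothetical annual salaries for every elementary unit that exists.
salaries = {"E01": 30000, "E02": 42000, "E03": 51000, "E04": 38000, "E05": 64000}


def census(observations):
    """Observe the characteristic for every elementary unit."""
    return dict(observations)


def sample_survey(observations, units):
    """Observe the characteristic for only the listed subset of units."""
    return {unit: observations[unit] for unit in units}


census_result = census(salaries)
survey_result = sample_survey(salaries, ["E02", "E05"])

census_mean = mean(list(census_result.values()))
survey_mean = mean(list(survey_result.values()))
```

The two means generally differ, because the sample survey sees only part of what the census sees.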

Census vs. Sample

References

Kohler, H., 1994. Statistics For Business And Economics. 3rd ed. New York: HarperCollins College Publishers, pp.5-10.
