Date of Publication: December 2000 CYFERNet For Professionals

Section 6: Designing Data Collection Instruments

Using Existing Data


The use of existing data is often helpful, and can save considerable time and money if the information is of good quality. The evaluation may call for data from records such as class attendance, lists of learning objectives, pre- and post-tests, prevention files and scores on standardized instruments. The disadvantage of existing data is that the evaluator has no control over which data are collected, the quality of the data, and the methods of data entry, filing, and storage.

When the evaluator has decided to pull information from existing forms, it will be important to plan carefully and be specific about what type of information is needed. For example, a form may have information about gender of the parent and the child, age of the child, employment status, and whether the family receives ancillary support from organizations like WIC. The evaluator will need a systematic way to transform this information into numerical data.

Type of Variables

The evaluator can enter this information directly into a spreadsheet (this is discussed below). Alternatively, it might be more practical to record this information on paper and enter it into the computer at a later date. Below is a sample coding form that could be used to extract information from existing records. Notice that each response category (e.g., male/female) has a corresponding numerical value. Information that is already numerical (e.g., age) does not need a code; the variable name can simply be written with a space left for the coder to enter the value.

Example 6.1: Coding Demographic Information

Gender of child
male ........ 1
female ...... 2

Age of child
               ______

WIC recipient
yes ......... 1
no .......... 0
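The coding form above can also be expressed as a small lookup table. The following is a minimal sketch, not a prescribed procedure: the field names (`child_gender`, `child_age`, `wic_recipient`) and the sample record are hypothetical, chosen only to mirror Example 6.1.

```python
# Hypothetical codebook mirroring Example 6.1; field names are assumptions.
CODEBOOK = {
    "child_gender": {"male": 1, "female": 2},
    "wic_recipient": {"yes": 1, "no": 0},
}


def code_record(record):
    """Turn one extracted record (text answers) into numerical codes."""
    coded = {}
    for field, codes in CODEBOOK.items():
        answer = record[field].strip().lower()
        if answer not in codes:
            # Flag unexpected answers instead of guessing a code.
            raise ValueError(f"Unexpected answer {answer!r} for {field}")
        coded[field] = codes[answer]
    # Age is already numerical, so it is entered as is.
    coded["child_age"] = int(record["child_age"])
    return coded


# Made-up record, as it might be copied from an existing form.
sample = {"child_gender": "Female", "child_age": "7", "wic_recipient": "yes"}
coded_sample = code_record(sample)
print(coded_sample)
```

Raising an error on an unexpected answer, rather than silently assigning a code, is one way to surface problems in the existing records early.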

For many variables, the numbers, as numbers, are arbitrary and meaningless. For example, by convention, "male" is typically coded "1," and "female" is coded "2" (or female can be coded as "1" if preferred). But if an average for the variable "gender" were computed, the number derived would make no sense. These types of variables are called "categorical." Categorical variables are useful in describing the sample. For example, the evaluator may want to know how many male and female children are in a sample. A frequency distribution with the number of "1's" and "2's" would provide this information.
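As a rough illustration of the point above, the sketch below counts the codes for a categorical variable and also computes their average to show why the average is not interpretable. The data are made up, and the names `gender_codes` and `GENDER_LABELS` are assumptions introduced for this example.

```python
from collections import Counter
from statistics import mean

# Made-up gender codes for six children (1 = male, 2 = female).
gender_codes = [1, 2, 2, 1, 2, 2]
GENDER_LABELS = {1: "male", 2: "female"}

# Frequency distribution: how many times each code appears.
frequencies = Counter(gender_codes)
for code, label in GENDER_LABELS.items():
    print(f"{label}: {frequencies[code]}")

# The average of the codes can be computed, but it describes nothing real:
# a value of about 1.67 is neither "male" nor "female".
average_code = mean(gender_codes)
print(f"mean of codes: {average_code:.2f}")
```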

Technical Tip 6.1: Coding Yes/No Variables

At first glance, the codes for yes/no questions may also appear arbitrary. However, a general rule is to code "yes" as "1," and "no" as "0." This is helpful when adding together a series of yes/no questions (such as the number of agencies that are involved with a particular family, or the number of life stressors that a client reports). If yes/no has been coded in this way, the sum of the variables in a set is meaningful: it equals the number of "yes" answers. On the other hand, if "no" is coded as "2" (as many researchers do), the evaluator will have more difficulty interpreting the sum of various yes/no questions, or may have to "recode" the no's in the statistical program, adding another step to the process.
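The difference between the two coding schemes can be seen in a short sketch. The stressor names and answers below are hypothetical, made up for this illustration.

```python
# Made-up yes/no answers about life stressors, coded yes = 1, no = 0.
stressors = {"job_loss": 1, "recent_move": 0, "illness": 1, "divorce": 0}

# With 1/0 coding the sum is simply the number of "yes" answers.
stressor_count = sum(stressors.values())
print(stressor_count)  # 2

# The same answers with "no" coded as 2: the sum is hard to interpret.
stressors_alt = {"job_loss": 1, "recent_move": 2, "illness": 1, "divorce": 2}
alt_sum = sum(stressors_alt.values())
print(alt_sum)  # 6

# The extra step: recode every 2 to 0 before adding.
recoded = {name: 0 if code == 2 else code for name, code in stressors_alt.items()}
recoded_count = sum(recoded.values())
print(recoded_count)  # 2
```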

Coding Forms

Should the evaluator develop a coding form or simply enter the data into the computer? That depends on the resources available for the evaluation. It will need to be determined whether the data are to be entered into the computer at each site or sent to a central office. If sites have the facilities to enter data, it might make more sense to enter this information directly into the computer rather than completing a coding sheet. This may be particularly helpful if the statistical package being used presents a data template so that coders do not have to look up codes every time they enter information. On the other hand, some statistical packages do not have this option, and it may make more sense to have the coder simply circle the appropriate code on a sheet. Further, some facilities or sites will not allow the evaluator to remove files from a specified area. Unless the coder has a laptop computer or access to a computer with appropriate software, a coding sheet may still be the best bet.

If a pre-coded coding sheet is developed, it should be pretested on every form being used. Following in-house testing, the pre-coded coding sheet should be sent to a few other sites for them to try. This type of pretesting will save time in the long run and will help minimize confusion about how the information should be coded.

Technical Tip 6.2: Developing a Coding Form

Sometimes when individuals develop coding forms, they are tempted to squeeze as many questions as they can onto a page. As thrifty as this may be, tightly spaced questions increase the number of coding errors. Questions should instead be spread out on the page, with codes listed and ample space left between questions. The evaluator should use only one column of information on a page, not two. This format makes it easier to both code and enter data, and will save considerable time.

The "Coder" of Data

Who should code the data? Again, this depends on the type of data and resources available. In many cases, someone at the site can code data out of the files (although it might be necessary to give release time or compensate the coder for extra time). Sites may decide to hire an extra person to help with this task. Or the evaluator may decide to pay someone to photocopy the relevant documents and have them coded at a central location. The types of data that are the most amenable to coding are unambiguous variables such as gender, age, or duty status. When the coder must make a judgment call, more opportunity for coding error occurs. Although qualitative or open-ended coding can be done, it requires more training of coders and introduces greater chances for error (Sudman & Bradburn, 1982). For variables where coders must make judgment calls (such as assigning a single code to a paragraph, or page, of narrative data), the evaluator should provide coding guidelines that will help coders make these decisions. Pre-testing the coding form will help the evaluator to anticipate the types of questions and concerns that might arise. Last, interrater reliability assessments could be conducted to determine the extent to which two or more coders agree in their judgments, that is, whether coders are interpreting the data in a consistent fashion.
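One simple way to look at interrater agreement is sketched below. The text does not name a particular statistic; percent agreement and Cohen's kappa (which adjusts for agreement expected by chance) are shown here only as two commonly used options. The codes assigned by the two coders are made up.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of records on which the two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same, non-empty set of records")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: probability both coders pick the same code at random,
    # given how often each coder used each code.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Made-up codes from two coders for the same ten narrative records.
coder_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 records, but kappa is lower than 0.80 because some of that agreement would be expected by chance alone.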

Client Confidentiality

A final concern has to do with protecting the confidentiality of client responses (Fowler, 1998). Much of the information contained in client records is highly confidential and must be protected. One way to protect confidentiality is to identify families only by number. This number can be assigned by the evaluator or taken from existing records, such as the last four digits of the Social Security number. Another important issue is how to track families through multiple data collection opportunities. The evaluator may need to develop a protocol for assigning numbers so that data from multiple sources are matched to the right family. The evaluator may also decide to keep a master list of family names and their numbers in order to keep track of information relevant to each family. Obviously, this list must be handled carefully and kept secured (e.g., in a locked filing cabinet) when not in use. Below is an example of a study in the child abuse literature that drew all of its information from existing sources.

Example 6.2: Use of Existing Documents

Use of existing documents can be a rich source of data. One example is the Maltreatment and School Achievement study conducted at the Family Life Development Center, Cornell University (Eckenrode, Laird & Doris, 1993).

Records that were coded included information from the child abuse register and school records. The register provided information on the type and severity of maltreatment and the demographic characteristics of the child and family. School records provided information on academic performance, grade repetitions, school transfers, home moves, and disciplinary actions.

If there are concerns about confidentiality, it may be more appropriate to have the data coded by someone else. The evaluator may decide to have someone already familiar with the families do the coding. Such an insider might be more sensitive to the family's issues and, therefore, have a greater appreciation for confidentiality. On the other hand, by using someone from the outside who is not familiar with the families, confidentiality might not be an issue at all. Yet another option is to have someone obliterate the names and other identifying information before turning the information over to a coder. Again, these types of decisions need to be made on a case-by-case basis. For some small sites, it might make more sense to have the information coded on site. For sites involving multiple families, the information might be better handled at a central coding location.
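The ideas of identifying families only by number, keeping a separate master list, and removing identifying information before coding can be sketched as follows. This is an illustration under stated assumptions, not a recommended protocol: the field names, the starting number 1001, and the use of the family name alone as the matching key are all hypothetical simplifications (in practice two families may share a name).

```python
import itertools

# Fields treated as identifying in this sketch; an assumption for illustration.
IDENTIFYING_FIELDS = {"family_name", "address", "phone"}


def make_id_assigner(start=1001):
    """Return an assign() function plus the master list it maintains."""
    master_list = {}  # family name -> number; must be stored securely
    counter = itertools.count(start)

    def assign(family_name):
        key = family_name.strip().lower()
        if key not in master_list:
            master_list[key] = next(counter)
        return master_list[key]

    return assign, master_list


def deidentify(record, assign):
    """Replace identifying fields with the family's number."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["family_id"] = assign(record["family_name"])
    return cleaned


assign, master_list = make_id_assigner()

# Made-up records for the same family from two different sources.
register_record = {"family_name": "Smith", "address": "12 Elm St", "maltreatment_type": 2}
school_record = {"family_name": "smith ", "phone": "555-0100", "grade_repetitions": 1}

coded_register = deidentify(register_record, assign)
coded_school = deidentify(school_record, assign)
print(coded_register)
print(coded_school)
```

Because both sources pass through the same `assign` function, the two records receive the same number and can be matched later without the coder ever seeing the family's name.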

Items vs. Scale Scores

Another type of existing information is scores on previously administered standard instruments. With standard instruments, the evaluator can either enter a summary score or enter all the individual items. Entering a final score is obviously faster (and may be the only information available). On the other hand, entering the responses to each item provides more flexibility, and allows the evaluator to look at individual items or sub-scales.
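The flexibility gained by entering individual items can be illustrated with a short sketch. The six items, their responses, and the two sub-scales below are hypothetical, and scoring by simple addition is an assumption; a real instrument's manual determines how items are scored.

```python
# Made-up item responses for one client on a hypothetical six-item instrument.
item_responses = {"q1": 3, "q2": 1, "q3": 4, "q4": 2, "q5": 0, "q6": 3}

# Hypothetical sub-scales, each defined by the items that belong to it.
SUBSCALES = {
    "subscale_a": ["q1", "q2", "q3"],
    "subscale_b": ["q4", "q5", "q6"],
}

# The summary score can always be rebuilt from the items...
total_score = sum(item_responses.values())

# ...and so can each sub-scale, which is not possible from the total alone.
subscale_scores = {
    name: sum(item_responses[item] for item in items)
    for name, items in SUBSCALES.items()
}

print(total_score)      # 13
print(subscale_scores)  # {'subscale_a': 8, 'subscale_b': 5}
```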

As can be seen, existing files can provide a rich source of data. However, it may also be that information is still needed that is not in the files. How to collect this additional information is the focus of the next section.
