CYFERNet For Professionals
Date of Publication: December 2000

Section 5: Assessing Program Impact

Reviewing Design Options: Example for a Couples Communication Program

In the previous section, three basic types of evaluation design were described. More than one evaluation design can be used for the same program. Below, five alternative designs are presented for evaluating a program intended to increase communication skills in couples.

  1. One might use a quasi-experimental design with non-completers as the comparison group. This approach would be more cost-effective than screening a large population for matched controls, and would also have the advantage of baseline data already collected at intake. With this approach, a substantial proportion of non-completers could be identified, located, and interviewed by phone. Self-report measures would be obtained on the same outcome instruments as those administered to completers. Drop-out interviews would also provide a means to assess the reasons for program discontinuance or dissatisfaction.
  2. One design modification would be to add a new outcome measure, administered only at the second assessment (time two), for both experimental and comparison groups.
  3. An alternative model would be to use a design similar to that described above comparing class or treatment group couples to matched couples receiving alternative programs.
  4. Another evaluation could be conducted by using a multiple comparison group design with Group 1 (Communication Couples group), Group 2 (Alternate marital program), and Group 3 (Communication Group non-completers).
  5. A true experiment could be organized with random assignment of families to the Couples Communication program, an alternate program, or a no-"treatment" condition, such as a wait-listed group or a group that attends only one overview class. Long-term follow-up would be desirable with this type of design. The strength of this design is its greater scientific validity compared with the designs noted above.
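
The random assignment described in option 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the family identifiers, the condition labels, and the function name assign_conditions are hypothetical, and a real study would follow whatever assignment protocol its design calls for.

```python
import random


def assign_conditions(family_ids, conditions, seed=None):
    """Randomly assign each family to exactly one condition.

    Families are shuffled, then dealt out in turn, so group sizes
    differ by at most one. All names here are hypothetical.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(family_ids)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, family in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(family)
    return groups


# Hypothetical example: 90 families and three conditions.
families = [f"family_{n:03d}" for n in range(1, 91)]
conditions = ["couples_communication", "alternate_program", "wait_list"]
groups = assign_conditions(families, conditions, seed=2000)
for condition, members in groups.items():
    print(condition, len(members))
```

Recording the seed along with the assignment list allows the allocation to be audited later.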

The information presented above is intended to help the evaluator select an appropriate design. Once a design has been determined, the scope of the evaluation should be considered. This is the next topic.

Evaluation Scope

Impact evaluations come in all sizes. For any of the designs described above, it is possible to have versions that are "small," "medium," or "large." Factors that contribute to the "size," or scope, of the evaluation are described below. In particular, the sample and the available resources are considered. Questions related to both of these factors are found on Worksheet 5.2.

  • Who Will Participate In the Study? Identifying The Sample.

In evaluating the program, all participating families can be assessed or a sample of clients who are "representative" of the target population can be drawn. Unless the number of participating clients is small, drawing a sample will be the most sensible choice. On Worksheet 5.2, questions are provided to assist the evaluator in thinking about the number of clients or families participating in the program under evaluation. One of the questions to be considered is the percentage of those participating that might yield an acceptable number for an evaluation. For example, if there are 9,000 parents who participated in a parent-education program, it might be decided that 10% (900) of the participants would yield an adequate sample size. Or if a program has multiple sites, the evaluator might decide to sample by site. For example, in the United States Air Force, if 30 air base sites are using a particular program, the evaluator might decide to sample clients from 20% (6) of the participating bases (Henry, 1998; Pecora, Fraser, Nelson, McCroskey & Meezan, 1995).

One factor that might influence the decision about the percentage of participating clients to be included is the size of the sample that each percentage yields. It is important to have enough subjects to demonstrate program impact, but not so many that the evaluation becomes unmanageable. A sample that is too small may lead to the false conclusion that an intervention has failed. The general rule is: the larger, the better. However, a large sample might not be possible, or even necessary. If resources are limited, it might be necessary to consult a statistician to determine the minimum sample size needed to demonstrate an effect.

Technical Tip 5.1: Number of Subjects

It is important to have enough subjects in each group. Too few subjects will not yield enough data to show statistically significant effects even if these exist. A good rule is to have at least 50 subjects per group. The smaller the effect, the greater the number of subjects needed to detect it, and vice versa. Sample size considerations are particularly important when the outcome indicator of program effectiveness is a low-probability behavior, such as the onset of physical aggression in a group that is not high-risk.
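
The relationship between effect size and the number of subjects can be illustrated with a common textbook approximation for comparing the means of two groups. The formula, the use of a standardized effect size (Cohen's d), and the defaults of a two-sided 5% significance level and 80% power are assumptions introduced here for illustration; they are not taken from this manual, and the sketch is not a substitute for consulting a statistician.

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed in each of two groups.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized difference between group means.
    The function name and the default values are illustrative assumptions.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects require many more subjects per group.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: {subjects_per_group(d)} subjects per group")
```

Under these assumptions, a medium effect (d = 0.5) needs roughly 63 subjects per group, which is in line with the rule of thumb above, while a small effect (d = 0.2) needs several hundred.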

Much has been written on the subject of sampling, and a full discussion of the various types of sampling is beyond the scope of this manual. However, there are two sampling methods that are easy to implement. One is the simple random sample. Using this procedure, an evaluator would randomly draw a sample of, say, 20% of the target population, with every family having an equal chance of selection. A convenient approximation is to select every fifth family from a list of all participant families, beginning at a randomly chosen starting point (strictly speaking, this is a systematic sample). Or one might decide to collect data from every family that comes into the program in a given time period (e.g., every Tuesday and Thursday, or on alternate weeks), provided that families arriving in those periods do not differ systematically from other families.
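
As a minimal sketch, the two list-based procedures can be written as follows. The function names and the family list are hypothetical, and the "every fifth family" version begins at a random starting point so that no position on the list is favored.

```python
import random


def simple_random_sample(families, fraction, seed=None):
    """Draw the given fraction of families; every family has an equal chance."""
    rng = random.Random(seed)
    families = list(families)
    sample_size = round(len(families) * fraction)
    return rng.sample(families, sample_size)  # sampling without replacement


def systematic_sample(families, interval, seed=None):
    """Select every `interval`-th family, starting at a random list position."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start between 0 and interval - 1
    return list(families)[start::interval]


# Hypothetical list of 500 participant families.
families = [f"family_{n:03d}" for n in range(1, 501)]
print(len(simple_random_sample(families, 0.20, seed=1)))  # 20% of 500
print(len(systematic_sample(families, 5, seed=1)))        # every fifth family
```

Both calls select 100 of the 500 families. The systematic version is easier to carry out by hand, but it should be avoided when the list has a repeating pattern that lines up with the selection interval.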

A more sophisticated technique is the stratified random sample. This technique still involves random selection, but ensures that different groups are represented in the sample. Demographic characteristics are commonly used to create groups. For example, a sample may be stratified by age, or gender, or ethnicity, or some other important characteristic. If there is a small number from a certain group (e.g., single mothers), the evaluator might decide to "over-sample" this group (i.e., have a higher percentage of this particular group in the sample than is present in the target population) in order to have a sufficient number of participants for data analysis. Alternatively, the evaluator might decide to sample low-risk and high-risk families in the same proportions in which they occur in the target population. These sampling techniques can also be used to sample by site. In a stratified sample, sites can be selected that reflect the range of size, mission, and location of all sites participating in the program (Pietrzak, Ramier, Renner, Ford, & Gilbert, 1990).
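
A stratified random sample with over-sampling might be sketched as below. The record layout (an "id" and a "household" field), the group labels, the sampling fractions, and the function name stratified_sample are all hypothetical and serve only to illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_sample(families, stratum_of, fractions, seed=None):
    """Randomly sample within each stratum at that stratum's own fraction.

    `stratum_of` maps a family record to its stratum label, and
    `fractions` maps each label to the share of that stratum to draw.
    A larger fraction for a small stratum "over-samples" it.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for family in families:
        strata[stratum_of(family)].append(family)
    sample = []
    for label, members in strata.items():
        size = min(len(members), round(len(members) * fractions[label]))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: 900 two-parent families and 100 single mothers.
population = (
    [{"id": n, "household": "two_parent"} for n in range(900)]
    + [{"id": n, "household": "single_mother"} for n in range(900, 1000)]
)
# Draw 10% of two-parent families but over-sample single mothers at 50%.
fractions = {"two_parent": 0.10, "single_mother": 0.50}
sample = stratified_sample(population, lambda f: f["household"], fractions, seed=7)
print(len(sample))
```

Because over-sampled groups are no longer represented in their population proportions, results for the whole population generally need to be weighted back before they are reported.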

With any sampling technique, there must be specified inclusion/exclusion criteria to minimize the bias that can occur when a sample is drawn in a non-standard way (e.g., sampling only "compliant" or "nice" families). It is important to keep track of people who were asked to participate but refused. Dividing the number of those who participated by the total number of those who were approached gives the compliance rate. For example, if 1,000 clients who were eligible to participate were approached, and 700 actually participated, the compliance rate would be 70%.
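
The compliance rate calculation is simple enough to check by hand, but a small helper makes it easy to repeat for each site or recruitment wave. The function name compliance_rate is a hypothetical choice for this sketch.

```python
def compliance_rate(participated, approached):
    """Return the share of approached clients who actually participated."""
    if approached <= 0:
        raise ValueError("approached must be a positive count")
    if not 0 <= participated <= approached:
        raise ValueError("participated must be between 0 and approached")
    return participated / approached


# The example from the text: 700 participants out of 1,000 approached.
print(f"{compliance_rate(700, 1000):.0%}")  # prints 70%
```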

  • Are Sufficient Resources Available?

A second scope issue to consider is whether there are sufficient resources to conduct an evaluation. In Section 1 (Subsection "Get An Overview Of The Program"), lack of human or financial resources was listed as one of the potential pitfalls in an evaluation. A common mistake is to drastically under-estimate the amount of work and money necessary to conduct an evaluation. Below is a listing of some of the factors related to the amount of resources needed for an evaluation, and the adjustments that may be made in scaling down a project.

  • Project Scale
    The more sites and/or subjects included in the evaluation, the greater both the complexity of data collection and expense of the project. If resources are limited, an evaluation of fewer sites and/or fewer subjects is the best choice.
  • Instrument Development
    Developing data collection instruments for an evaluation can increase both cost and length of time necessary for an evaluation. If it is decided that developing instruments is necessary, they should be kept as simple as possible. (Developing efficient instruments is described further in Section 6, Subsection "Collecting New Data.") On the other hand, attempting to use existing data that is in poor shape (or unusable) can cost more in time and money in the long run.
  • Pretesting Instruments
    An issue related to instrument development is pretesting. When developing an instrument, field testing will be needed. Pretesting can be brief or extensive. If resources are limited, a less-intensive session of pretesting is warranted. Often, three to five pretests are sufficient.
  • Data Coding & Cleaning
    Data coding and cleaning refer to preparing data for entry into the computer. The less coding and cleaning required, the faster (and cheaper) this step will be. Closed-ended response categories and concise questionnaires are the most efficient use of resources. Questionnaire design is described in detail in Section 6 (Subsection "Designing Survey Questions").

Bottom Line: Both design and scope influence the level of difficulty in implementation. For each design, there is a "big" and "small" version. Factors associated with increased expense/time are as follows:

  • multiple sites
  • multiple assessments
  • lengthy assessments
  • complex data that requires extensive coding and cleaning
  • large numbers of subjects

If resources are limited, or results are needed quickly, a small evaluation with either focus groups or a survey is the most appropriate choice.
