Copyright © 2009. National Academy of Sciences. All rights reserved.
Appendix A
Paired Testing and the 2000 Housing Discrimination Survey
Stephen L. Ross
This paper was prepared for a National Research Council workshop on
the use of paired testing to study racial and ethnic discrimination
in housing markets. A primary motivation for the conduct of this
workshop was to examine methodological issues surrounding the use
of newspaper advertisements for initiating tests. This methodology
was used in the 1989 Housing Discrimination Study (HDS) and is
being used in Phase I of the 2000 HDS. The approach involves a
two-stage sampling of newspaper advertisements from medium-sized
and large U.S. metropolitan areas with substantial minority
populations. In the first stage, metropolitan areas are selected as
test sites; in the second, tests are conducted within each site on the
basis of a sample of advertisements from the major metropolitan newspaper.
This paper is organized into two major sections. The first
introduces the concept of paired testing and reviews the major
issues surrounding its use. The second provides a brief summary of
the design of Phase I of the 2000 HDS, including a more detailed
discussion of the advertisement-based sampling approach and
potential alternatives.
Stephen L. Ross is an associate professor of
economics in the Department of Economics, University of
Connecticut.
PAIRED TESTING METHODOLOGY
Basic Approach
The basic logic behind a paired test for discrimination is fairly straightforward. Two testers, one white and one minority, are matched on characteristics that are relevant to the market transaction being considered. Each tester is then sent to inquire about a market transaction under fairly controlled and highly similar circumstances. For example, in the case of rental housing, the two testers would be similar in age and physical appearance, assigned the same income and family status, and sent to inquire about the same rental unit and/or to the same rental agency using a common protocol. The result of each tester's inquiry and the treatment experienced are reported and documented in isolation from the other tester. The two testers' experiences are combined and compared at a later date by an independent third party.
Any differences between the paired testers' experiences are considered evidence of adverse or differential treatment. Paired testing is designed to measure the level or frequency of adverse treatment discrimination in a given market, where adverse treatment discrimination is defined as instances in which the treatment of an individual is adversely affected by his or her race, ethnicity, or other legally protected characteristic. Paired testing measures the level or frequency observed based on a specific protocol for sampling the market. Therefore, testing cannot measure the actual impact of discrimination on individuals in the marketplace. For example, if real estate agents steer minority home buyers away from discriminatory lenders, a paired test of the mortgage market will not capture the mitigating effect of this behavior.
In addition, paired testing will not uncover the existence of adverse impact discrimination in a given market. Adverse impact discrimination is defined as follows. A firm or a set of firms in a market engages in many economic transactions, and for each transaction there is a relevant population of reasonable candidates. Adverse impact discrimination occurs when the policy of one or a number of firms places the minority group within the relevant population at a disadvantage relative to the majority even when the policy is applied uniformly, and this policy cannot be justified by business necessity. Naturally, this type of discrimination cannot be detected by testing because the policy is applied uniformly, and systematic racial differences in treatment may not exist.
Paired Testing Versus Analysis of Market Outcomes
As mentioned earlier, the key difference between findings based on
testing data and those based on analysis of market outcomes is that
testing isolates the incidence or level of discrimination observed
when pairs of testers are assigned to enter a market following
exogenous sampling and testing protocols. This structure raises
issues concerning the relevance of the observed patterns of adverse
treatment. First, the sampling and testing protocols may not yield a
sample of market entries that is representative of the types of
experiences typically observed in the marketplace. For example, in
the 1989 HDS, a sample of units advertised in major metropolitan
areas may not have been representative of the available housing
stock. Likewise, the testing protocol, which required testers to
walk into a real estate agency and refer to an advertisement they
had found in the newspaper, may not resemble the approach followed
by most consumers when entering the housing market. Second, testers
are sampled in a nonrandom manner based on a hiring process, which
may lead to systematic differences between the population of white
and minority testers. Finally, results based on testing data ignore
the mitigating influence of minority attempts to avoid
discrimination or mitigate the impact of experienced discriminatory
behavior.
While these concerns are important when interpreting the results of
a testing study, the design features that lead to these concerns
are also important positive attributes of testing as a research
tool. Studies of market outcomes often face considerable design
challenges because unobserved individual characteristics may
influence key determinants of treatment, such as income, education,
and work history, and also influence treatment directly
(endogeneity bias), and these unobservables may influence
individuals' choices concerning whether and how to enter a specific
market (selection bias). For example, Ondrich et al. (2001) find
that the initial request of a potential home buyer has a large
influence on the treatment experienced, but such a request is
typically unobserved in market data. In a paired test, by contrast, many of
the observable determinants of treatment are assigned and therefore uncorrelated
with tester unobservables. In addition, the protocols eliminate any
possibility of selection bias by exogenously sampling from a
population and by establishing a testing protocol that is followed
carefully by both testers.
Of course, actual characteristics of testers, such as education or
work experience, may influence their behavior during a test and as
a result affect their treatment. If so, these characteristics may
bias the results of a testing
study because of across-race differences in these characteristics or
the nonrandom assignment of testers to particular tests. Naturally,
the goal of the testing protocols and tester training is to minimize
the variation in behavior across testers, which should in turn limit
the influence of actual characteristics on testers' behavior and
therefore on observed treatment. A well-designed paired-testing study
may in fact dramatically limit the potential for omitted-variable bias
by insulating observed outcomes from individual characteristics that
are often difficult to observe or record and potentially correlated
with race within the population. Heckman and Siegelman (1993) and
Ondrich et al. (2000, 2001) test whether testers are heterogeneous
over attributes that influence treatment in employment and housing
tests, respectively. The evidence for employment tests is mixed, and
the evidence for housing tests does not support the conclusion that
testers are heterogeneous in a way that influences treatment.
Moreover, the interpretation of observed racial differences is much
more straightforward with testing data than with market data. First,
tests for discrimination based on market data completely incorporate
the effects of any compensating behavior by the individuals being
discriminated against even if such behavior imposes additional costs
on the minority group. For example, in mortgage markets, a home buyer
may avoid potential discrimination in underwriting by seeking out a
higher-cost lender with lower standards. Alternatively, a home buyer
may obtain a mortgage from a second lender after being discriminated
against, but only after losing his or her first-choice home.
Second, observed racial differences in testing data represent adverse
treatment against minorities. On the other hand, analyses of market
data often combine the outcomes of individuals who engaged in economic
transactions with different firms. Even in a model that controls for
all relevant individual characteristics, observed racial differences
may arise because on average, minorities engage in economic
transactions with firms that have different policies, standards, or
prices from those of firms that are typically engaged by whites. If
these behavioral differences between firms are not justified by
business necessity, the observed racial differences would be described
as adverse impact discrimination. However, the behavioral differences
may arise because the firms operate in different market segments and
therefore represent legitimate business practices, in which case the
observed racial differences in the market should not be classified as
discrimination. Market analyses often cannot distinguish among these
three explanations for racial differences in outcomes (see Ross and
Yinger, 1999).
The paired structure of the tests also provides two significant
advantages. First, the comparison is based on observationally
equivalent individuals being treated differently by the same firm or
individual, and the results of such comparisons carry considerable
narrative power in both legal and policy arenas. Second, the structure
of a paired test results in substantial statistical power for
detecting discrimination. Specifically, the likelihood of similar
treatment of two testers is very high because they have the same
relevant characteristics and have been sent into very similar
circumstances. The high probability of similar treatment decreases the
likelihood that differences in treatment arise by chance and increases
the ability to statistically isolate systematic adverse treatment of a
given group.
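The statistical-power argument above can be illustrated with a rough sketch. It compares the standard error of a difference in favorable-treatment rates from a paired design with that from two independent samples that have the same marginal rates. The formulas are the standard ones for paired and independent binary outcomes and are not taken from the studies discussed here; the function names and the probabilities in the example are hypothetical.

```python
import math


def paired_se(p_white_only: float, p_minority_only: float, n_pairs: int) -> float:
    """Standard error of the net measure from n_pairs paired tests.

    p_white_only    -- probability that only the white tester is favored
    p_minority_only -- probability that only the minority tester is favored
    The per-pair difference is +1, 0, or -1, so its variance is
    p_white_only + p_minority_only - (p_white_only - p_minority_only) ** 2.
    """
    net = p_white_only - p_minority_only
    variance = p_white_only + p_minority_only - net ** 2
    return math.sqrt(variance / n_pairs)


def unpaired_se(p_white: float, p_minority: float, n_each: int) -> float:
    """Standard error of a difference in rates from two independent samples."""
    variance = p_white * (1 - p_white) + p_minority * (1 - p_minority)
    return math.sqrt(variance / n_each)


# Hypothetical cell probabilities: both favored 0.60, white only 0.15,
# minority only 0.05, neither 0.20. Most pairs are treated alike.
p_both, p_w_only, p_m_only = 0.60, 0.15, 0.05
se_paired = paired_se(p_w_only, p_m_only, n_pairs=400)
se_unpaired = unpaired_se(p_both + p_w_only, p_both + p_m_only, n_each=400)
print(f"paired SE: {se_paired:.4f}, unpaired SE: {se_unpaired:.4f}")
```

When most pairs receive the same treatment, the two off-diagonal probabilities are small, so the paired standard error is smaller than the unpaired one for the same number of tests.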
Measuring Adverse Treatment
The results of a test are typically described using two measures of
adverse treatment—gross and net. Gross adverse treatment is
the portion or fraction of tests in which the white tester received
more favorable treatment than the minority tester based on the
reports of the two testers and a predetermined criterion for
favorable treatment. Net adverse treatment is the fraction of tests
in which whites were favored minus the fraction of tests in which
minorities were favored. If the treatment can be described by a
binary variable in which favorable treatment for one tester is
recorded as a one and unfavorable treatment as a zero, the white
tester is favored over the minority tester when the former records
a one and the latter a zero. If the treatment is described by an
ordinal or continuous variable, the white tester is favored if he
or she records a higher value than the minority tester. For
continuous variables, a threshold will usually be established, and
the testers are assumed to have experienced equal treatment if the
difference in white and minority treatment does not exceed the
threshold.
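As a minimal sketch of the definitions above, the following computes the gross and net measures from a list of paired outcomes. The function names, the example data, and the handling of the equal-treatment threshold are illustrative assumptions, not the procedure used in any of the studies cited.

```python
from typing import Iterable, List, Tuple


def classify(white: float, minority: float, threshold: float = 0.0) -> int:
    """Return +1 if the white tester is favored, -1 if the minority tester
    is favored, and 0 if the difference does not exceed the threshold."""
    diff = white - minority
    if diff > threshold:
        return 1
    if diff < -threshold:
        return -1
    return 0


def gross_and_net(pairs: Iterable[Tuple[float, float]],
                  threshold: float = 0.0) -> Tuple[float, float]:
    """Compute (gross, net) adverse treatment as fractions of all tests."""
    outcomes: List[int] = [classify(w, m, threshold) for w, m in pairs]
    n = len(outcomes)
    white_favored = sum(1 for o in outcomes if o == 1) / n
    minority_favored = sum(1 for o in outcomes if o == -1) / n
    return white_favored, white_favored - minority_favored


# Hypothetical binary outcomes (1 = favorable treatment), as (white, minority).
tests = [(1, 1), (1, 0), (0, 1), (1, 0), (0, 0),
         (1, 1), (1, 0), (1, 1), (0, 1), (1, 1)]
gross, net = gross_and_net(tests)
print(f"gross = {gross:.2f}, net = {net:.2f}")
```

For binary outcomes the default threshold of zero reproduces the one-versus-zero comparison; for continuous outcomes a positive threshold treats small differences as equal treatment.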
Both the gross and net measures of adverse treatment may provide
misleading estimates of the actual extent of discrimination even
within the sampling frame being examined by the set of tests. The
gross measure is likely to include differences in treatment that
arise simply because the testers' visits differed in some
unobserved way, and it may therefore overstate discrimination. The
net measure is intended to correct for this problem by subtracting
instances in which the white tester experiences adverse treatment
relative to the minority tester. The net measure is constructed
under the assumption that adverse treatment against the white
tester occurs only because the testers' visits differed, and so
adverse treatment against the
white tester provides an accurate measure of the number of instances
of minority adverse treatment that arose because the testers' visits
differed. In some cases, however, adverse treatment of the white
tester may have been based on the tester's race. For example, in a
housing test, the white tester may not be shown a unit in a minority
neighborhood because he or she is white. In this case, the net measure
will understate discrimination because the frequency of white adverse
treatment overstates the frequency of minority adverse treatment that
arose from differences between the two testers' visits. For
alternative discussions of net and gross adverse treatment, see Fix et
al. (1993) and Heckman and Siegelman (1993).
This problem may be avoided by the use of a three-person test, often
called a “sandwich test.” In a sandwich test, two white
and one minority tester are matched, assigned similar characteristics,
and sent into the same market conditions. In this test, the potential
exists for two individuals of the same race to receive differential
treatment. These differences in treatment cannot be caused by race and
must have arisen because of differences between the visits. Therefore,
these differences can be used to construct a net measure that measures
discrimination more accurately. Specifically, the frequency of adverse
treatment of one white tester relative to the other, which can arise
only because of differences in the two testers' visits, is subtracted
from the frequency of adverse treatment of the minority tester
relative to a white tester.
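The sandwich-test adjustment described above might be sketched as follows. The sketch assumes that one white tester in each test is designated as the comparison tester for both differences; the text does not specify this choice, and the function name and data are hypothetical.

```python
from typing import Iterable, Tuple


def sandwich_net(tests: Iterable[Tuple[int, int, int]]) -> float:
    """Net measure from three-person tests.

    Each test is (white_a, white_b, minority), with 1 = favorable treatment.
    white_a is the (assumed) comparison tester.
    """
    records = list(tests)
    n = len(records)
    # Frequency with which the minority tester fares worse than white_a.
    minority_adverse = sum(1 for wa, wb, m in records if wa > m) / n
    # Frequency with which white_b fares worse than white_a; race cannot
    # explain this difference, so it reflects differences between visits.
    same_race_adverse = sum(1 for wa, wb, m in records if wa > wb) / n
    return minority_adverse - same_race_adverse


# Hypothetical outcomes for eight three-person tests.
example = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0),
           (0, 0, 0), (1, 1, 0), (1, 0, 0), (0, 1, 1)]
print(sandwich_net(example))
```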
Alternatively, additional information concerning each test might be
used to uncover the extent of discrimination experienced in a sample
of tests. For example, in the housing market, information may be
available concerning whether the white and minority testers saw the
same agent during their visits or whether the advertised unit was in a
neighborhood with a large percentage of minority residents. If the
gross measure declines dramatically for the subsample in which the
testers saw the same agent, that measure must seriously overstate
discrimination. Alternatively, if the vast majority of white-favored
tests occur when the advertised units are located in neighborhoods
with large minority populations, the net measure must understate
discrimination.
Ondrich et al. (2000) use this information and the structure provided
by a parametric model to estimate upper and lower bounds on housing
discrimination using the 1989 HDS. The frequency measures of adverse
treatment discussed above can be thought of as simple nonparametric
estimates of the probability of adverse treatment. The same
probabilities can be predicted using the estimates from a parametric
model of a test. A
paired test can be modeled as two separate decisions by an economic
agent, where the unobservables associated with those two decisions
share a common component because of the paired nature of the test. One
possible specification is a bivariate probit in which each equation
models the treatment of one tester, and there is a correlation between
the treatments received by the pair. Unobservable differences between
the two testers' visits are likely to decrease the correlation between
the treatments and increase the predicted probability of adverse
treatment of the minority tester relative to the white tester, that is,
the gross measure. Ondrich et al. (2000) control for these differences by
increasing the correlation between the equations, which removes the
contribution of differences between the visits and revises the gross
measure downward.
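The link between the correlation and the predicted gross measure can be illustrated with a small simulation. This is not the estimator of Ondrich et al. (2000); it is a sketch that draws correlated standard normal errors for the two testers, applies a probit-style rule (favorable treatment when index plus error is positive), and counts how often only the white tester is favored. The function name, the index values, and the number of draws are assumptions.

```python
import math
import random


def simulated_gross(rho: float, index_white: float = 0.0,
                    index_minority: float = 0.0,
                    draws: int = 20_000, seed: int = 0) -> float:
    """Share of simulated tests in which only the white tester is favored."""
    rng = random.Random(seed)
    white_only = 0
    for _ in range(draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Build two standard normal errors with correlation rho.
        e_white = z1
        e_minority = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        favored_white = index_white + e_white > 0
        favored_minority = index_minority + e_minority > 0
        if favored_white and not favored_minority:
            white_only += 1
    return white_only / draws


low_corr = simulated_gross(rho=0.2)
high_corr = simulated_gross(rho=0.9)
print(f"rho=0.2: {low_corr:.3f}  rho=0.9: {high_corr:.3f}")
```

With identical indices for the two testers, every white-favored outcome in the simulation comes from the idiosyncratic part of the errors, so raising the correlation shrinks the simulated gross measure.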
Implementation Issues
In the abstract, the strategy of sending a pair of testers to
attempt the same market transaction following a common protocol
appears simple and fairly straightforward. However, many market
transactions are quite complex, involving substantially more
interactions than simply a negotiation of prices and quantities,
and only limited information concerning the nature and form of
these transactions may be available. A testing effort will be
successful only if the design sends testers into the market in a
systematic and realistic manner.
The first design step for a testing effort is to define a point of
entry into the market. This point of entry becomes the basis for
sampling the market. A test must be initiated by random or
stratified sampling from a well-defined population. For example, in
the case of a rental housing market, tests might be initiated on
the basis of sampling the population of available rental units or
the population of agents who represent rental properties. However,
there is no reliable source for the population of available units
or even the population of agents for rental properties. Even if the
population of agents could be observed for a specific metropolitan
area, it is unlikely that any information would be available on the
volume of business handled by individual agents. An alternative
approach used in the 1989 HDS was to sample from the population of
housing advertisements appearing in the major metropolitan
newspaper, which was easily observable and provided a reasonable
mechanism for entering the market.
Once a test has been initiated, the testers must approach the
economic agent who has been sampled or who represents the property,
job, or good
that has been sampled. A tester's approach should be consistent with
both the sampling frame discussed above and approaches commonly
witnessed by the economic agent being tested. In the case of the 1989
HDS, testers walked into a real estate agency and inquired about a
unit in an advertisement that had been selected randomly from the
newspaper. This behavior would be expected by real estate agents since
advertisements are typically used to attract customers, and this
protocol also explicitly tied the treatment experienced to the unit
that had been sampled. In some markets, however, a realistic point of
entry is more difficult to implement. For example, independent
mortgage brokers would be a very difficult group to test because many
mortgage brokers obtain the majority of their business through
referrals from builders or real estate agents. These brokers would
notice if they and their competitors simultaneously experienced a
substantial increase in direct contacts either by phone or by walk-in.
Recent Applications
1989 Housing Discrimination Study
The 1989 HDS was a major national study of discrimination
against African Americans and Hispanics in both the rental and
sales housing markets. The study sampled newspaper
advertisements in 25 metropolitan areas to produce national
estimates of housing discrimination. For each advertisement
sampled, a pair of testers who were matched by age and gender
were assigned an appropriate income level for the sampled
housing. The testers were then sent to the agency that placed the advertisement to
inquire about the advertised unit and request to see it and any
other similar available housing.
The 1989 HDS was designed to measure the national incidence of
discrimination arising during visits by qualified home seekers
to a sample of units advertised for sale or rent in major
metropolitan area newspapers across the United States. The
sample of advertised units was drawn in two stages. First, a
sample of metropolitan areas was drawn from major U.S.
metropolitan areas with a central city population of 100,000 or
more and a substantial proportion of African Americans and/or
Hispanics based on the 1980 census (12 percent African American
and/or 7 percent Hispanic). Additional tests were conducted in
five of these sites to support more in-depth analysis. These
sites were chosen with certainty based on their substantial
minority population to increase the statistical precision of the
national estimates. Each
selected area became an African American-white and/or a Hispanic-white
site for the 1989 HDS. Within each site, weekly samples of
advertisements were drawn randomly from the Sunday newspaper.
A system of weights was generated to represent the inverse of the
probability of selection for any given advertisement and to adjust for
oversampling and nonresponse. These weights represented the joint
probability of site selection and advertisement selection within a
site, controlling for advertisement volume from week to week,
saturation of the housing market within any week, and attrition within
the sample of advertisements. Weighted racial differences in treatment
provide an estimate of average adverse treatment in a national sample
of advertisements.
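A minimal sketch of this weighting logic follows. It assumes that the weight is the inverse of the joint selection probability and that the nonresponse adjustment divides by a response rate; the actual HDS adjustments for advertisement volume, saturation, and attrition are not reproduced, and all names and numbers are hypothetical.

```python
from typing import Iterable, Tuple


def analytic_weight(p_site: float, p_ad: float,
                    response_rate: float = 1.0) -> float:
    """Inverse of the joint selection probability, inflated for nonresponse
    (the form of the nonresponse adjustment is an assumption)."""
    return 1.0 / (p_site * p_ad * response_rate)


def weighted_net(records: Iterable[Tuple[int, float]]) -> float:
    """Weighted net adverse treatment.

    Each record is (outcome, weight), where outcome is +1 if the white
    tester was favored, -1 if the minority tester was favored, 0 otherwise.
    """
    rows = list(records)
    total_weight = sum(w for _, w in rows)
    return sum(o * w for o, w in rows) / total_weight


# Hypothetical tests from two sites with different selection probabilities.
records = [
    (1, analytic_weight(p_site=0.5, p_ad=0.10)),
    (0, analytic_weight(p_site=0.5, p_ad=0.10)),
    (-1, analytic_weight(p_site=1.0, p_ad=0.05)),
    (1, analytic_weight(p_site=1.0, p_ad=0.05, response_rate=0.8)),
]
print(round(weighted_net(records), 3))
```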
The study provided estimates of adverse treatment for a variety of
measures covering housing availability, sales effort, terms and
conditions (rental only), and financing assistance (sales only). For
the treatment variable “Was the advertised unit available to the
tester?” the gross incidence of adverse treatment was 17.2,
15.5, 11.1, and 9.5 percentage points for African American-white
rental, Hispanic-white rental, African American-white sales, and
Hispanic-white sales tests, respectively. The corresponding net
incidence of adverse treatment for these samples was 5.5, 8.4, 5.5,
and 4.2 percentage points. (See Yinger, 1995, for an in-depth look at
the results of the 1989 HDS.) The study also examined geographic
differences in treatment for the five in-depth sites and provided
estimates of racial steering by neighborhood racial composition, per
capita income, and median house value (see Turner and Mikelsons, 1991,
for these results).
Other Applications
The first systematic application of paired testing to hiring,
conducted in 1989, focused on discrimination against Hispanic men
applying for entry-level jobs in Chicago and San Diego. In each of
these sites, approximately 150 paired tests were conducted, based
on random samples of job openings advertised in the major
metropolitan newspapers. A similar study of hiring discrimination
against African American men was conducted a year later in Chicago
and Washington, D.C. About 200 paired tests were conducted
in each metro area, again based on random samples of advertised job
openings. Both studies found that white applicants were able to
advance further in the hiring process than their minority
counterparts in a statistically significant share of cases.
Specifically, in the Hispanic-white tests in which both testers
were able to submit an application, whites received an interview and
Hispanics did not 22 percent of the time,
while in the African American-white tests, only whites received an
interview 9 percent of the time. These numbers are based on the gross
measure of adverse treatment. Net adverse treatment was 14 and 6
percent for Hispanic-white and African American-white tests,
respectively. In addition, whites were significantly more likely to
receive encouragement in the hiring process (Kenney and Wissoker,
1994).
A 1998 pilot study used paired testing to assess the extent and forms
of possible discrimination in the home insurance market. Testers in
three metropolitan areas posed as buyers of closely matched homes
located in minority and white neighborhoods. They called insurance
agents on the telephone to seek insurance quotes. The homes,
neighborhoods, and insurance seekers were matched on a wide range of
characteristics so that the primary difference within a paired test
was whether the home was located in a minority or white neighborhood.
Results indicated that buyers in white neighborhoods were no more
likely than those in minority neighborhoods to receive quotes, but
they were slightly more likely to be offered some desirable types of
coverage (in one site) and to receive higher levels of service than
minorities (in another site). In Phoenix, substantially higher
premiums were quoted for homes in Hispanic neighborhoods, but because
the white and Hispanic neighborhoods were in different insurance
rating territories, the study could not determine definitively whether
the difference in premiums might have been due to legitimate
differences in rates of risk and loss (Wissoker et al., 1998).
The 1999 Homeownership Testing Project is a pilot study of
discrimination in the pre-application phase of the mortgage market.
This testing effort includes tests for African Americans and Hispanics
in two major metropolitan areas. In each area, a stratified sample of
lenders was selected by loan volume based on Home Mortgage Disclosure
Act data. The testers were assigned income, assets, and debts
sufficient to qualify to purchase a home priced at the median sales
price in the area. The assignment was structured so that the
qualifying price was constrained by the down payment, and income and
debts were assigned so that the mortgage would conform to standard
secondary market guidelines. The testers were also provided with an
A– credit history profile. The results of this study are not yet
available.
In 1999, the Urban Institute analyzed enforcement tests that had been
conducted by the National Fair Housing Alliance (NFHA) in five sites.
In two of the sites, statistically significant differences were found
between the treatment of white and African American testers. In 16
percent of the tests in Chicago and 25 percent of the tests in Atlanta,
the white applicant received a quote, defined as information about a
loan product with an estimate of monthly mortgage payments and closing
costs, while the African American applicant did not. The net measures of
adverse treatment in Chicago and Atlanta were 13 and 25 percent,
respectively. It should be noted that the lender sample for the NFHA
tests was not random; rather, lenders were chosen using indicators
based on the Home Mortgage Disclosure Act data (Smith and Delair,
1999).
2000 HOUSING DISCRIMINATION STUDY: PHASE I
Basic Structure of Study
Phase I of the 2000 HDS is designed to study discrimination in
both rental and sales housing markets against African Americans,
Hispanics, Asian Americans, and Native Americans. The study will
provide estimates of the national incidence and severity of
discrimination against African Americans and Hispanics in
medium-sized and large metropolitan area housing markets. The
study will also provide less precise metropolitan-level
estimates of discrimination for all African American and
Hispanic sites, as well as metropolitan-level estimates for the
pilot Asian American and Native American sites. In the Asian
American pilot study, separate estimates will be developed on
the basis of different major ethnic sub-groups to assess the
importance of ethnicity in the treatment of Asian Americans.
Finally, given the concentration of the Native American
population in small metropolitan and rural areas, the study will
include pilot testing for Native Americans in two small
metropolitan areas and the surrounding hinterland.
The 2000 HDS follows the basic methodology of the 1989 HDS. The
point of entry to the market is an advertisement in a major
metropolitan newspaper. The study is based on a sampling of
advertisements in the relevant major metropolitan newspapers,
followed by a test in which the testers approach the relevant
agent or agency and identify their interest in the advertised
unit and similar units. The tests are paired in the sense that
two individuals, one white and one minority, pose as otherwise
identical home seekers. Observed differences in treatment
between racial groups are interpreted as the adverse treatment
expected to be experienced
by a qualified minority member inquiring about a randomly chosen
housing unit advertised in the newspaper.
The use of a sample of newspaper advertisements offers several
advantages. First, the classified advertisements provide a clearly
defined list of housing units that are currently on the market and for
which information is available to individuals in search of housing.
Second, newspaper advertisements provide a credible starting point for each
test. Third, this common starting point increases the match between the two
testers' visits relative to simply approaching a real estate agency
and therefore increases the statistical power available from a
given-sized sample of tests. Finally, the advertisement sampling
approach matches the sampling methodology of the 1989 HDS, increasing
comparability between the two studies. The weaknesses of the
advertisement sampling frame are discussed later in this section.
Sampling Design
The national samples of African American-white and Hispanic-white
tests are two-stage samples. First, a sample of sites (16 African
American-white and 10 Hispanic-white) is selected from the
population of medium-sized to large metropolitan areas with
substantial populations of the minority group being tested. A site
is included in the sample if the central city population exceeds
100,000 and the percentage of the minority group in the site
exceeds that in the U.S. population overall. Probabilities of
selection from the population of sites are based on the
metropolitan area population. Then, advertisements are drawn weekly
from the major metropolitan newspaper in each site. The samples of
Asian American and Native American tests are single-stage samples
drawn weekly from the major newspapers of individual metropolitan
areas (two Asian American sites with three ethnic groups and one
Native American site). In all sites, sufficient tests are being
conducted to provide metropolitan-level estimates of adverse
treatment (72 tests per tenure).
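The first-stage site selection described above might look roughly like the sketch below. The eligibility rule follows the text; drawing sites with probability proportional to metropolitan population is an assumption, since the text says only that the probabilities are "based on" population. The site names, populations, and the national minority share are hypothetical.

```python
import random
from typing import Dict, List, NamedTuple


class Site(NamedTuple):
    name: str
    central_city_pop: int
    metro_pop: int
    minority_share: float


def eligible(site: Site, national_share: float) -> bool:
    """Central city over 100,000 and minority share above the U.S. share."""
    return (site.central_city_pop > 100_000
            and site.minority_share > national_share)


def selection_probabilities(sites: List[Site]) -> Dict[str, float]:
    """Single-draw probabilities proportional to metro population (assumed)."""
    total = sum(s.metro_pop for s in sites)
    return {s.name: s.metro_pop / total for s in sites}


def draw_sites(sites: List[Site], k: int, seed: int = 0) -> List[Site]:
    """Draw k distinct sites, each draw proportional to metro population."""
    rng = random.Random(seed)
    pool = list(sites)
    chosen: List[Site] = []
    for _ in range(min(k, len(pool))):
        pick = rng.choices(pool, weights=[s.metro_pop for s in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


candidates = [
    Site("Site A", 600_000, 2_000_000, 0.20),
    Site("Site B", 80_000, 500_000, 0.30),     # central city too small
    Site("Site C", 150_000, 1_000_000, 0.05),  # minority share too low
    Site("Site D", 300_000, 1_000_000, 0.15),
]
frame = [s for s in candidates if eligible(s, national_share=0.12)]
probs = selection_probabilities(frame)
sample = draw_sites(frame, k=1)
```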
The sampling of advertisements is a centralized process conducted
at the Urban Institute in Washington, D.C. The real estate sections
of the Sunday newspapers for all sites are shipped to the Urban
Institute every Sunday. A site must be sampled within a couple of
hours of receipt so the sample can be relayed back to the local
fair housing group for testing in a timely fashion. For each site,
the order of the advertisement sample is randomized, and the
advertisements are forwarded to the local group one at a time (see
the next subsection for a more detailed discussion).
One of two sampling methods is used to select advertisements for
rental and sales tests—systematic sampling or grid sampling.
Systematic sampling involves the “numbering” of
advertisements in a newspaper and the subsequent selection of a
systematic sample using an interval designed to yield the target
number of selections. Systematic sampling is employed when the number
of advertisements is relatively small (say, fewer than 1,000) and
confined to a specific format in the classified section. All rental
advertisement selections are made using this method. Grid sampling is
essentially an area sampling technique whereby a randomly assigned
sampling grid is overlaid on the newspaper to reveal the areas
(rectangles) that represent the sample. (Application of one grid is
tantamount to a 1 in 24 sampling fraction.) Each advertisement is
defined by a single point on the newspaper using an objective rule
(i.e., the upper corner of the first letter of the first word in the
line of descriptive text). Accordingly, all advertisements have the
same chance of selection regardless of their size. Grid sampling is
used for very large newspaper classified sections that include one or
more supplements and can contain up to 3,000 advertisements.
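The systematic selection described above can be illustrated with a short Python sketch. It numbers the advertisements 1 through N, derives an interval from the target number of selections, and steps through the list from a random start. The function name, the fractional interval, and the random start are assumptions made for illustration; the text does not specify how the interval is actually computed.

```python
import random


def systematic_sample(n_ads, target, rng=None):
    """Return 1-based advertisement numbers chosen by systematic sampling.

    n_ads  -- number of advertisements after "numbering" the section
    target -- desired number of selections
    rng    -- optional random.Random instance (hypothetical parameter)
    """
    rng = rng or random.Random()
    if n_ads <= 0 or target <= 0:
        return []
    if target >= n_ads:
        # Fewer advertisements than requested: take every one.
        return list(range(1, n_ads + 1))

    interval = n_ads / target          # fractional interval, greater than 1
    start = rng.random() * interval    # random start within the first interval
    picks = []
    for k in range(target):
        position = start + k * interval
        # Convert the position to a 1-based advertisement number,
        # guarding against a floating-point overshoot at the end.
        picks.append(min(int(position), n_ads - 1) + 1)
    return picks
```

With 900 numbered rental advertisements and a target of 72, the interval is 12.5, so every advertisement has the same 72 in 900 chance of selection.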
Regardless of the selection method, once an advertisement has been
selected, it is reviewed to determine eligibility. To be eligible, a
housing unit must be within the metropolitan area boundaries, and must
be a rental property in a complex represented by an agent or a
single-family home or condominium for sale. For example, the rental
tests exclude shared rentals, seasonal rentals, and properties rented
by owners, while sales tests exclude seasonal or temporary housing,
income-generating properties, and properties for sale by the owner.
Finally, the advertisement itself may not clearly indicate whether a
housing unit is eligible, so the eligibility criteria are applied
by the local testing agency on the basis of information gathered on
site. The sampling team at the Urban Institute draws substantially
more advertisements than the number of tests planned in case some are
determined to be ineligible by local testing agencies.
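The eligibility rules above can be expressed as a simple screening function. The sketch below is only an illustration: the record layout, field names, and property-type labels are hypothetical, and in practice some of these fields would be filled in by the local testing agency rather than read from the advertisement.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    # All field names are hypothetical, chosen for this example only.
    tenure: str                  # "rental" or "sales"
    in_metro_area: bool          # unit lies within the metropolitan boundaries
    property_type: str           # "complex", "single_family", "condominium", ...
    agent_represented: bool      # False if rented or sold directly by the owner
    seasonal_or_temporary: bool = False
    shared: bool = False             # shared rental
    income_generating: bool = False  # income-generating property for sale


def is_eligible(ad):
    """Apply the eligibility criteria described in the text."""
    if not ad.in_metro_area:
        return False
    if ad.seasonal_or_temporary or not ad.agent_represented:
        return False
    if ad.tenure == "rental":
        return ad.property_type == "complex" and not ad.shared
    if ad.tenure == "sales":
        return (ad.property_type in {"single_family", "condominium"}
                and not ad.income_generating)
    return False
```

Because many selections fail a screen of this kind, the sampling team draws more advertisements than the number of tests planned.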
At the analysis stage, sample weights will be developed for each
ethnic group at both the metropolitan and national levels for the
African American-white and Hispanic-white tests. The national sampling
weights will be the inverse of the product of the site selection probability and the
probability of selection of the advertisement. This weight will be
adjusted for nonresponse to form a national analytic weight for use in
national analyses (trends since 1989, as well as year 2000 estimates).
Separate metropolitan analytic weights are being developed for each
site. These will be used in creating metropolitan report cards (i.e.,
developing metro-specific estimates). The metropolitan analytic weight is the
product of a sampling weight and a nonresponse adjustment. The
sampling weight reflects the probability of selection of the
advertisement and incorporates selection within the classified section
as well as across weeks. In addition, the sampling weight controls for
market saturation within a week if it occurs. In other words, in some
small markets or during a week when many advertisements are
ineligible, the entire pool of advertisements sent to the local office
at a site may be used. Finally, the weights will be adjusted for
nonresponse.
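The weighting steps can be sketched as follows. This example assumes the usual survey convention that a sampling weight is the reciprocal of the overall selection probability, and it models the nonresponse adjustment as division by a response rate; both are assumptions made for illustration, as are the function names. For a metropolitan weight, the site selection probability would simply be 1.

```python
def analytic_weight(p_site, p_ad, response_rate):
    """Inverse-probability weight with a simple nonresponse adjustment.

    p_site        -- probability that the site was selected (1.0 for metro weights)
    p_ad          -- probability that the advertisement was selected within the site
    response_rate -- share of sampled advertisements that yielded a usable test
    """
    for name, value in (("p_site", p_site), ("p_ad", p_ad),
                        ("response_rate", response_rate)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    sampling_weight = 1.0 / (p_site * p_ad)
    return sampling_weight / response_rate


def weighted_share(outcomes, weights):
    """Weighted share of tests in which the outcome flag is True."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w for flag, w in zip(outcomes, weights) if flag) / total
```

If the entire pool of advertisements is used in a saturated week, the within-week selection probability is 1 and that week contributes no further inflation to the weight.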
Statistical analysis using the sample weights will generate
confidence intervals for the gross measures and hypothesis tests for
the net measures. The standard errors of estimates
will be adjusted to account for the complex sampling design; see Kish
(1965) and Wolter (1985). Given the small number of tests available in
any given test site, statistical analysis will also be conducted for
the metropolitan report cards using exact permutation tests (see
Agresti, 1990, for a general discussion and Heckman and Siegelman,
1993, for the use of these tests in a testing context).
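One simple exact test of this kind is the two-sided sign test on the tests that show differential treatment. The sketch below is not necessarily the procedure used in the 2000 HDS; it assumes that, under the null hypothesis of no systematic difference, each such test is equally likely to favor either tester. The function name and arguments are hypothetical.

```python
from math import comb


def exact_sign_test(white_favored, minority_favored):
    """Two-sided exact p-value for paired outcomes.

    Tests with equal treatment are dropped. Under the null hypothesis the
    number of white-favored tests is Binomial(n, 0.5), where n is the
    number of tests with differential treatment.
    """
    n = white_favored + minority_favored
    if n == 0:
        return 1.0
    k = max(white_favored, minority_favored)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Because the p-value is computed by exact enumeration rather than a large-sample approximation, it remains valid for the small number of tests available in a single site.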
Test Protocol
A test begins with the selection of an eligible advertisement at
the Urban Institute and the submission of a test authorization form
to the local test coordinator specifying the type of test to be
conducted, the order in which the testers should contact the
housing provider, and whether a narrative (a quality control
measure) must be completed for this test. Selection proceeds in
order of the randomized list of advertisements. An advance call by a
nonminority individual to obtain information concerning
availability (rental tests only), price, size, and location is
conducted for all rental tests and for sales tests if this
information is not available in the advertisement. Tester income
and financial characteristics (sales tests only) are assigned to
match the price of the housing unit. Occupations and employers are
assigned consistent with these characteristics, but specific
occupations (e.g., law enforcement) and regional employers are
excluded based on the belief that these occupations or employers
might receive some special treatment. Marital status and family
structure are assigned on the basis of the size of the unit and the
desire to obtain a fairly equal distribution of family types.
The local agency assigns the selected advertisement to one minority
and one white tester as soon as two testers of the same gender and
comparable ages are available. The testers each call to set up an
appointment and visit in alternating order. These calls should be 1 to
6 hours apart for rental testing and 24 to 48 hours apart for sales
testing. The actual tester visits should also be 1 to 6 hours apart
for rental testing and 24 to 96 hours apart for sales testing. Rental
testers make one visit to a rental housing site to inquire about the
availability of the advertised and similar units. A similar protocol
is followed by sales testers, except that the tester is available for
a follow-up visit to see additional units, and provision has been made
to record follow-up phone calls by the real estate agent. Testers are
required to take notes during their visit and to document its results
on standardized forms within 1 hour of completing the visit. The local
test coordinator debriefs all testers, and also collects and reviews
all test file materials. Test narratives are required on a small
number of randomly chosen tests to provide information for a quality
control review of test files. Testers are not told before performing
a test whether a narrative will be required.
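The spacing rules for calls and visits can be checked mechanically. The following sketch encodes the windows stated above in a lookup table; the table name, function name, and the choice to treat the window endpoints as inclusive are assumptions made for this example.

```python
from datetime import datetime

# Allowed gap between the two testers' contacts, in hours (from the text).
WINDOWS_HOURS = {
    ("rental", "call"): (1, 6),
    ("rental", "visit"): (1, 6),
    ("sales", "call"): (24, 48),
    ("sales", "visit"): (24, 96),
}


def spacing_ok(tenure, contact, first, second):
    """Return True if two contacts are spaced within the protocol window.

    tenure  -- "rental" or "sales"
    contact -- "call" or "visit"
    first, second -- datetime objects for the two testers' contacts
    """
    low, high = WINDOWS_HOURS[(tenure, contact)]
    gap_hours = abs(second - first).total_seconds() / 3600
    return low <= gap_hours <= high
```

For example, two rental visits made at 9:00 and 12:00 on the same day fall within the window, whereas the same 3-hour gap would be too short for a sales test.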
Limitations of and Alternatives to Random Sampling of Advertisements
While the use of a sample of advertisements offers many advantages,
there are a number of disadvantages associated with this sampling
strategy. First, the units advertised in the newspaper may not
accurately represent the population of available housing units.
Units may be advertised because they are especially attractive or
in desirable neighborhoods and will attract clients to the agency.
Alternatively, some units may be left unadvertised in order to control
more closely the population of home seekers who have access to a unit or
the neighborhood in which it is located. Moreover, in the case of
sales tests, most home buyers do not learn from the newspaper about
the home they actually purchase. Finally, the importance of
newspapers in marketing housing may be declining
over time as the Internet is increasingly used to market a wide
variety of products.
Within Phase I of HDS 2000, a two-pronged strategy is being used in
a small number of pilot sites to examine the limitations of the
newspaper sampling frame. First, newspapers
list housing advertisements by community and sometimes by smaller
geographic regions for large central cities. The distribution of
advertisements by community will be examined and compared with
estimates of the distribution of rental and owner-occupied housing
across communities in each of the pilot sites. This comparison will
make it possible to identify communities in which housing
units are underrepresented in newspaper advertisements and to draw
additional samples of advertisements from these communities.
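The comparison of advertisement and housing distributions might be carried out along the following lines. This sketch compares each community's share of advertisements with its share of housing units and flags communities whose ratio falls below a cutoff. The cutoff of 0.5, the function name, and the input layout are assumptions made for illustration; the text does not state what threshold defines underrepresentation.

```python
def underrepresented(ad_counts, unit_counts, ratio_cutoff=0.5):
    """Flag communities whose share of ads is low relative to housing stock.

    ad_counts    -- dict mapping community -> number of advertisements
    unit_counts  -- dict mapping community -> estimated number of housing units
    ratio_cutoff -- hypothetical threshold on (ad share / unit share)
    Returns a dict mapping each flagged community to its ratio.
    """
    total_ads = sum(ad_counts.values())
    total_units = sum(unit_counts.values())
    if total_ads == 0 or total_units == 0:
        raise ValueError("need at least one advertisement and one housing unit")

    flagged = {}
    for community, units in unit_counts.items():
        if units == 0:
            continue  # no housing stock to represent
        ad_share = ad_counts.get(community, 0) / total_ads
        unit_share = units / total_units
        ratio = ad_share / unit_share
        if ratio < ratio_cutoff:
            flagged[community] = ratio
    return flagged
```

The comparison would be run separately for rental and owner-occupied housing, since the two tenures are sampled and tested separately.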
Second, after the completion of a test, the actual address of the
advertised unit is available. The Urban Institute will perform a
geographic analysis of these addresses in an attempt to identify
regions of the metropolitan area that do not appear in the sample of
advertisements, and will acquire socioeconomic characteristics for
regions and/or neighborhoods from private vendors, such as Claritas.
Once these regions have been identified, six to ten neighborhoods
(three to five per tenure) will be selected, and a variety of local or
neighborhood-level sources will be used to identify housing units for
testing. These tests will be used for comparison with traditional
advertisement-based tests, but cannot be combined with the sample of
the latter because of the nonrandom nature of the selection process.
There are many other types of marketing that might be considered for
Phases II and III of HDS 2000. First, the population of advertisements
might be expanded to include Internet and other easily observable
metropolitan-wide sources of advertisements. This expansion would
likely increase the base of marketed units covered without sacrificing
comparability to Phase I because the sample would still be drawn from
a metropolitan-wide population of advertisements. The addition of local
sources of advertisements (below the metropolitan level) as discussed
above would expand the base further, but at the expense of
comparability. Other, more extreme modifications to the protocols
might involve a sampling of agencies and agents rather than
advertisements. As discussed earlier, it can be quite difficult to
compile a complete, nonduplicative list of rental or sales real estate
agents, and nearly impossible to obtain any measure of volume for
these agents. Finally, attempts might be made to sample available
units. One possibility is the random sampling of streets and the
second-stage selection of street-level advertisements from the
selected streets. This approach might provide a fairly representative
sample of units for sales tests (with the exception of condominiums),
but is unlikely to provide a representative sample for rental tests
since the use of street-level advertisements for rental properties is
far less uniform.
Imperfect Pairs and Differences Across Visits
As discussed earlier, the paired-testing approach is unlikely to
yield a perfect match within a test. First, the testers approach
the selected agent at
different times, and as a result the circumstances and treatment they
encounter may differ. The possibility of such differences implies that
the frequency of adverse treatment of minority testers, the gross
measure, may capture differences that do not represent discrimination.
In addition, testers are paired only on gender and age, and therefore
may differ on many characteristics that might influence behavior
during a test. This second problem may exacerbate the error in gross
adverse treatment as a measure of discrimination while creating the
potential for more severe biases in the analysis. Specifically, the
populations of white and minority testers may differ systematically on
characteristics that influence treatment. If so, the net and gross
measures capture a combination of discrimination and the effect of
racial differences in unobserved tester characteristics.
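The two summary measures can be computed from per-test outcomes as in the sketch below. The gross measure follows the definition given above, the frequency with which the minority tester is treated adversely. The net measure is assumed here, following common usage in the paired-testing literature, to subtract the frequency with which the white tester is treated adversely. The outcome labels and function name are hypothetical.

```python
def gross_and_net(outcomes):
    """Compute unweighted gross and net adverse-treatment measures.

    outcomes -- list of labels, one per test:
                "white_favored"    (minority tester treated adversely)
                "minority_favored" (white tester treated adversely)
                "equal"            (no difference in treatment)
    Returns (gross, net) as shares of all tests.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no tests supplied")
    white_favored = sum(1 for o in outcomes if o == "white_favored")
    minority_favored = sum(1 for o in outcomes if o == "minority_favored")
    gross = white_favored / n
    net = gross - minority_favored / n
    return gross, net
```

If random differences between visits were the only source of unequal treatment, the two kinds of unequal outcomes would tend to occur equally often, so the net measure would be near zero even when the gross measure is not.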
The 2000 HDS is attempting to address these issues. To the author's
knowledge, Phase I of this study is the first paired-testing effort
that records actual tester characteristics and makes those
characteristics available for analysis. The characteristics collected
include employment status and history, education level, individual and
household income, household structure, and experience as a home
seeker. Earlier research by Heckman and Siegelman (1993) and Ondrich
et al. (2000, 2001) found only limited evidence that tester
characteristics affect treatment. The data analyzed in these studies,
however, contain no information about testers beyond an identification
number, and these analyses were based on examining the experiences of
pairs of testers who conducted multiple tests together. In HDS 2000,
the analysis will exploit the information on actual tester
characteristics, as well as test characteristics such as the
attributes of the advertised unit and observed circumstances during a
tester's visit, to determine whether these factors influence treatment
and whether such influences affect observed net and gross adverse
treatment.
Finally, Phase II of the 2000 HDS will include three-person or triplet
tests to examine the influence of random differences between visits
and testers on observed adverse treatment. These tests will take two
forms: minority-white-white and white-minority-minority. The form will
be randomized over tests. This approach will minimize noise by
limiting the time between same-race visits while also ensuring that
the first two visits of each triplet will yield a standard paired
test.
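The triplet design can be sketched in a few lines. The example below draws one of the two forms at random for each test and splits it into the standard cross-race pair (the first two visits) and the same-race pair (the last two visits). The names and the equal-probability draw are assumptions made for illustration.

```python
import random

# The two triplet forms named in the text, listed in visit order.
TRIPLET_FORMS = (
    ("minority", "white", "white"),
    ("white", "minority", "minority"),
)


def assign_triplet_form(rng=None):
    """Randomly choose the visit order for one triplet test."""
    rng = rng or random.Random()
    return rng.choice(TRIPLET_FORMS)


def split_triplet(form):
    """Return (standard_pair, same_race_pair) for a triplet form.

    The first two visits form a standard minority-white paired test;
    the last two visits are made by testers of the same race.
    """
    return form[:2], form[1:]
```

Differences in treatment within the same-race pair cannot reflect race, so they give a direct reading of how much unequal treatment arises from random differences between visits and testers.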
REFERENCES
Agresti, Alan 1990 Categorical Data Analysis. New York: John Wiley
and Sons.
Fix, M., G. Galster, and R. Struyk 1993 An overview of auditing for
discrimination. In Clear and Convincing Evidence: Testing for
Discrimination in America, M. Fix and R. Struyk, eds. Washington,
D.C.: Urban Institute Press.
Heckman, J., and