Suggested Citation: "Appendix A Paired Testing and the 2000 Housing Discrimination Survey." National Research Council. 2002. Measuring Housing Discrimination in a National Study: Report of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/10311.

Appendix A

Paired Testing and the 2000 Housing Discrimination Survey

Stephen L. Ross

This paper was prepared for a National Research Council workshop on the use of paired testing to study racial and ethnic discrimination in housing markets. A primary motivation for the workshop was to examine methodological issues surrounding the use of newspaper advertisements for initiating tests. This methodology was used in the 1989 Housing Discrimination Study (HDS) and is being used in Phase I of the 2000 HDS. The approach involves a two-stage sampling of newspaper advertisements from medium-sized and large U.S. metropolitan areas with substantial minority populations. In the first stage, metropolitan areas are selected as test sites; in the second, tests are conducted within each site on the basis of a sampling of advertisements from the major metropolitan newspaper.

This paper is organized into two major sections. The first introduces the concept of paired testing and reviews the major issues surrounding its use. The second provides a brief summary of the design of Phase I of the 2000 HDS, including a more detailed discussion of the advertisement-based sampling approach and potential alternatives.

Stephen L. Ross is an associate professor of economics in the Department of Economics, University of Connecticut.

PAIRED TESTING METHODOLOGY

Basic Approach

The basic logic behind a paired test for discrimination is fairly straightforward. Two testers, one white and one minority, are matched on characteristics that are relevant to the market transaction being considered. Each tester is then sent to inquire about a market transaction under fairly controlled and highly similar circumstances. For example, in the case of rental housing, the two testers would be similar in age and physical appearance, assigned the same income and family status, and sent to inquire about the same rental unit and/or to the same rental agency using a common protocol. The result of each tester's inquiry and the treatment experienced are reported and documented in isolation from the other tester. The two testers' experiences are combined and compared at a later date by an independent third party. Any differences between the paired testers' experiences are considered evidence of adverse or differential treatment.

Paired testing is designed to measure the level or frequency of adverse treatment discrimination in a given market, where adverse treatment discrimination is defined as instances in which the treatment of an individual is adversely affected by his or her race, ethnicity, or other legally protected characteristic. Paired testing measures the level or frequency observed based on a specific protocol for sampling the market. Therefore, the testing cannot measure the actual impact of discrimination on individuals in the marketplace. For example, if real estate agents steer minority home buyers away from discriminatory lenders, a paired test of the mortgage market will not capture the mitigating effect of this behavior.

In addition, paired testing will not uncover the existence of adverse impact discrimination in a given market. Adverse impact discrimination is defined as follows. A firm or a set of firms in a market engages in many economic transactions, and for each transaction there is a relevant population of reasonable candidates. Adverse impact discrimination occurs when the policy of one or a number of firms places the minority group within the relevant population at a disadvantage relative to the majority even when the policy is applied uniformly, and this policy cannot be justified by business necessity. Naturally, this type of discrimination cannot be detected by testing because the policy is applied uniformly, and systematic racial differences in treatment may not exist.

Paired Testing Versus Analysis of Market Outcomes

As mentioned earlier, the key difference between findings based on testing data and those based on analysis of market outcomes is that testing isolates the incidence or level of discrimination observed when pairs of testers are assigned to enter a market following exogenous sampling and testing protocols. This structure raises issues concerning the relevance of the observed patterns of adverse treatment. First, the sampling and testing protocols may not yield a sample of market entries that is representative of the types of experiences typically observed in the marketplace. For example, in the 1989 HDS, a sample of units advertised in major metropolitan areas may not have been representative of the available housing stock. Likewise, the testing protocol, which required testers to walk into a real estate agency and refer to an advertisement they had found in the newspaper, may not resemble the approach followed by most consumers when entering the housing market. Second, testers are sampled in a nonrandom manner based on a hiring process, which may lead to systematic differences between the population of white and minority testers. Finally, results based on testing data ignore the mitigating influence of minority attempts to avoid discrimination or mitigate the impact of experienced discriminatory behavior.

While these concerns are important when interpreting the results of a testing study, the design features that lead to these concerns are also important positive attributes of testing as a research tool.
Studies of market outcomes often face considerable design challenges because unobserved individual characteristics may influence key determinants of treatment, such as income, education, and work history, and also influence treatment directly (endogeneity bias). These unobservables may also influence individuals' choices concerning whether and how to enter a specific market (selection bias). For example, Ondrich et al. (2001) find that the initial request of a potential home buyer has a large influence on the treatment experienced, but such a request is typically unobserved in market data. In a paired test, by contrast, many of the observable determinants of treatment are assigned and therefore uncorrelated with tester unobservables. In addition, the protocols eliminate any possibility of selection bias by exogenously sampling from a population and by establishing a testing protocol that is followed carefully by both testers.

Of course, actual characteristics of testers, such as education or work experience, may influence their behavior during a test and as a result affect their treatment. If so, these characteristics may bias the results of a testing study because of across-race differences in these characteristics or the nonrandom assignment of testers to particular tests. Naturally, the goal of the testing protocols and tester training is to minimize the variation in behavior across testers, which should in turn limit the influence of actual characteristics on testers' behavior and therefore on observed treatment. A well-designed paired-testing study may in fact dramatically limit the potential for omitted-variable bias by insulating observed outcomes from individual characteristics that are often difficult to observe or record and potentially correlated with race within the population. Heckman and Siegelman (1993) and Ondrich et al. (2000, 2001) test whether testers are heterogeneous over attributes that influence treatment in employment and housing tests, respectively. The evidence for employment tests is mixed, and the evidence for housing tests does not support the conclusion that testers are heterogeneous in a way that influences treatment.

Moreover, the interpretation of observed racial differences is much more straightforward with testing data than with market data. First, tests for discrimination based on market data completely incorporate the effects of any compensating behavior by the individuals being discriminated against even if such behavior imposes additional costs on the minority group. For example, in mortgage markets, a home buyer may avoid potential discrimination in underwriting by seeking out a higher-cost lender with lower standards. Alternatively, a home buyer may obtain a mortgage from a second lender after being discriminated against, but only after losing his or her first-choice home.

Second, observed racial differences in testing data represent adverse treatment against minorities. On the other hand, analyses of market data often combine the outcomes of individuals who engaged in economic transactions with different firms.
Even in a model that controls for all relevant individual characteristics, observed racial differences may arise because on average, minorities engage in economic transactions with firms that have different policies, standards, or prices from those of firms that are typically engaged by whites. If these behavioral differences between firms are not justified by business necessity, the observed racial differences would be described as adverse impact discrimination. However, the behavioral differences may arise because the firms operate in different market segments and therefore represent legitimate business practices, in which case the observed racial differences in the market should not be classified as discrimination. Market analyses often cannot distinguish among these three explanations for racial differences in outcomes (see Ross and Yinger, 1999).

The paired structure of the tests also provides two significant advantages. First, the comparison is based on observationally equivalent individuals being treated differently by the same firm or individual, and the results of such comparisons carry considerable narrative power in both legal and policy arenas. Second, the structure of a paired test results in substantial statistical power for detecting discrimination. Specifically, the likelihood of similar treatment of two testers is very high because they have the same relevant characteristics and have been sent into very similar circumstances. The high probability of similar treatment decreases the likelihood that differences in treatment arise by chance and increases the ability to statistically isolate systematic adverse treatment of a given group.

Measuring Adverse Treatment

The results of a test are typically described using two measures of adverse treatment: gross and net. Gross adverse treatment is the fraction of tests in which the white tester received more favorable treatment than the minority tester, based on the reports of the two testers and a predetermined criterion for favorable treatment. Net adverse treatment is the fraction of tests in which whites were favored minus the fraction of tests in which minorities were favored. If the treatment can be described by a binary variable in which favorable treatment for one tester is recorded as a one and unfavorable treatment as a zero, the white tester is favored over the minority tester when the former records a one and the latter a zero. If the treatment is described by an ordinal or continuous variable, the white tester is favored if he or she records a higher value than the minority tester.
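
To make the two measures concrete, the following minimal sketch computes gross and net adverse treatment from a list of paired outcomes. The function name, the data layout, and the ten example tests are hypothetical and are not drawn from any HDS data. The sketch assumes that a higher recorded value means more favorable treatment and that the tests are unweighted.

```python
from typing import List, Tuple


def adverse_treatment(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (gross, net) adverse treatment for (white, minority) outcome pairs.

    Each pair holds the treatment recorded for the white tester and for the
    minority tester in one test; a higher value is assumed to be more favorable.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one paired test is required")
    # Tests in which the white tester recorded the more favorable treatment.
    white_favored = sum(1 for white, minority in pairs if white > minority)
    # Tests in which the minority tester recorded the more favorable treatment.
    minority_favored = sum(1 for white, minority in pairs if minority > white)
    gross = white_favored / n
    net = (white_favored - minority_favored) / n
    return gross, net


# Hypothetical binary outcomes (1 = favorable, 0 = unfavorable) for ten tests.
tests = [(1, 0), (1, 1), (0, 1), (1, 0), (0, 0),
         (1, 0), (1, 1), (0, 0), (1, 1), (1, 1)]
gross, net = adverse_treatment(tests)
print(f"gross = {gross:.2f}, net = {net:.2f}")
```

In this made-up sample the white tester is favored in three of ten tests and the minority tester in one, so the gross measure is 0.30 and the net measure is 0.20.
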
For continuous variables, a threshold will usually be established, and the testers are assumed to have experienced equal treatment if the difference in white and minority treatment does not exceed the threshold.

Both the gross and net measures of adverse treatment may provide misleading estimates of the actual extent of discrimination even within the sampling frame being examined by the set of tests. The gross measure is likely to include differences in treatment that arise simply because the testers' visits differed in some unobserved way, and it may therefore overstate discrimination. The net measure is intended to correct for this problem by subtracting instances in which the white tester experiences adverse treatment relative to the minority tester. The net measure is constructed under the assumption that adverse treatment against the white tester occurs only because the testers' visits differed, and so adverse treatment against the white tester provides an accurate measure of the number of instances of minority adverse treatment that arose because the testers' visits differed. In some cases, however, adverse treatment of the white tester may have been based on the tester's race. For example, in a housing test, the white tester may not be shown a unit in a minority neighborhood because he or she is white. In this case, the net measure will understate discrimination because the frequency of white adverse treatment overstates the frequency of minority adverse treatment that arose from differences between the two testers' visits. For alternative discussions of net and gross adverse treatment, see Fix et al. (1993) and Heckman and Siegelman (1993).

This problem may be avoided by the use of a three-person test, often called a "sandwich test." In a sandwich test, two white testers and one minority tester are matched, assigned similar characteristics, and sent into the same market conditions. In this test, the potential exists for two individuals of the same race to receive differential treatment. These differences in treatment cannot be caused by race and must have arisen because of differences between the visits. Therefore, these differences can be used to construct a net measure that measures discrimination more accurately. Specifically, the frequency of adverse treatment of one white tester relative to the other, which can arise only because of differences in the two testers' visits, is subtracted from the frequency of adverse treatment of the minority tester relative to a white tester.

Alternatively, additional information concerning each test might be used to uncover the extent of discrimination experienced in a sample of tests. For example, in the housing market, information may be available concerning whether the white and minority testers saw the same agent during their visits or whether the advertised unit was in a neighborhood with a large percentage of minority residents.
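
The sandwich-test adjustment described above can be illustrated with a short sketch. The source does not say which white tester serves as the reference in each comparison, so the sketch assumes one white tester (labeled A here) is the reference for both: the share of tests in which A is favored over the second white tester (B) estimates visit-to-visit noise, and it is subtracted from the share in which A is favored over the minority tester. The function name and the eight example tests are hypothetical.

```python
from typing import List, Tuple


def sandwich_net(tests: List[Tuple[int, int, int]]) -> float:
    """Net adverse treatment from three-person tests.

    Each test is (white_a, white_b, minority), where a higher value is
    assumed to mean more favorable treatment.
    """
    n = len(tests)
    if n == 0:
        raise ValueError("at least one test is required")
    # Share of tests in which the minority tester fares worse than white A.
    minority_adverse = sum(1 for wa, wb, m in tests if wa > m) / n
    # Share of tests in which white B fares worse than white A; race cannot
    # explain this difference, so it reflects differences between visits.
    white_adverse = sum(1 for wa, wb, m in tests if wa > wb) / n
    return minority_adverse - white_adverse


# Hypothetical binary outcomes for eight three-person tests.
example = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (1, 1, 1),
           (0, 0, 0), (1, 0, 1), (1, 1, 0), (0, 1, 0)]
adjusted = sandwich_net(example)
print(f"adjusted net measure = {adjusted:.2f}")
```

Here the minority tester fares worse than white tester A in four of eight tests and white tester B fares worse than A in two, giving an adjusted net measure of 0.25.
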
If the gross measure declines dramatically for the subsample in which the testers saw the same agent, that measure must seriously overstate discrimination. Alternatively, if the vast majority of white-favored tests occur when the advertised units are located in neighborhoods with large minority populations, the net measure must understate discrimination.

Ondrich et al. (2000) use this information and the structure provided by a parametric model to estimate upper and lower bounds on housing discrimination using the 1989 HDS. The frequency measures of adverse treatment discussed above can be thought of as simple nonparametric estimates of the probability of adverse treatment. The same probabilities can be predicted using the estimates from a parametric model of a test. A paired test can be modeled as two separate decisions by an economic agent, where the unobservables associated with those two decisions share a common component because of the paired nature of the test. One possible specification is a bivariate probit in which each equation models the treatment of one tester, and there is a correlation between the treatments received by the pair. Unobservable differences between the two testers' visits are likely to decrease the correlation between the treatments and increase the predicted probability of adverse treatment of the minority tester relative to the white tester (the gross measure). Ondrich et al. (2000) control for such differences between the visits by increasing the correlation between the equations, which revises the gross measure downward.

Implementation Issues

In the abstract, the strategy of sending a pair of testers to attempt the same market transaction following a common protocol appears simple and fairly straightforward. However, many market transactions are quite complex, involving substantially more interactions than simply a negotiation of prices and quantities, and only limited information concerning the nature and form of these transactions may be available. A testing effort will be successful only if the design sends testers into the market in a systematic and realistic manner.

The first design step for a testing effort is to define a point of entry into the market. This point of entry becomes the basis for sampling the market. A test must be initiated by random or stratified sampling from a well-defined population. For example, in the case of a rental housing market, tests might be initiated on the basis of sampling the population of available rental units or the population of agents who represent rental properties.
However, there is no reliable source for the population of available units or even the population of agents for rental properties. Even if the population of agents could be observed for a specific metropolitan area, it is unlikely that any information would be available on the volume of business handled by individual agents. An alternative approach used in the 1989 HDS was to sample from the population of housing advertisements appearing in the major metropolitan newspaper, which was easily observable and provided a reasonable mechanism for entering the market.

Once a test has been initiated, the testers must approach the economic agent who has been sampled or who represents the property, job, or good that has been sampled. A tester's approach should be consistent with both the sampling frame discussed above and approaches commonly witnessed by the economic agent being tested. In the case of the 1989 HDS, testers walked into a real estate agency and inquired about a unit in an advertisement that had been selected randomly from the newspaper. This behavior would be expected by real estate agents since advertisements are typically used to attract customers, and this protocol also explicitly tied the treatment experienced to the unit that had been sampled. In some markets, however, a realistic point of entry is more difficult to implement. For example, independent mortgage brokers would be a very difficult group to test because many mortgage brokers obtain the majority of their business through referrals from builders or real estate agents. These brokers would notice if they and their competitors simultaneously experienced a substantial increase in direct contacts either by phone or by walk-in.

Recent Applications

1989 Housing Discrimination Study

The 1989 HDS was a major national study of discrimination against African Americans and Hispanics in both the rental and sales housing markets. The study sampled newspaper advertisements in 25 metropolitan areas to produce national estimates of housing discrimination. For each advertisement sampled, a pair of testers who were matched by age and gender were assigned an appropriate income level for the sampled housing. The testers were then sent to the agency that placed the advertisement to inquire about the advertised unit and request to see it and any other similar available housing.

The 1989 HDS was designed to measure the national incidence of discrimination arising during visits by qualified home seekers to a sample of units advertised for sale or rent in major metropolitan area newspapers across the United States. The sample of advertised units was drawn in two stages.
First, a sample of metropolitan areas was drawn from major U.S. metropolitan areas with a central city population of 100,000 or more and a substantial proportion of African Americans and/or Hispanics based on the 1980 census (12 percent African American and/or 7 percent Hispanic). Additional tests were conducted in five of these sites to support more in-depth analysis. These sites were chosen with certainty based on their substantial minority population to increase the statistical precision of the national estimates. Each selected area became an African American-white and/or a Hispanic-white site for the 1989 HDS. Within each site, weekly samples of advertisements were drawn randomly from the Sunday newspaper.

A system of weights was generated to represent the inverse of the probability of selection for any given advertisement and to adjust for oversampling and nonresponse. These weights represented the joint probability of site selection and advertisement selection within a site, controlling for advertisement volume from week to week, saturation of the housing market within any week, and attrition within the sample of advertisements. Weighted racial differences in treatment provide an estimate of average adverse treatment in a national sample of advertisements.

The study provided estimates of adverse treatment for a variety of measures covering housing availability, sales effort, terms and conditions (rental only), and financing assistance (sales only). For the treatment variable "Was the advertised unit available to the tester?" the gross incidence of adverse treatment was 17.2, 15.5, 11.1, and 9.5 percentage points for African American-white rental, Hispanic-white rental, African American-white sales, and Hispanic-white sales tests, respectively. The corresponding net incidence of adverse treatment for these samples was 5.5, 8.4, 5.5, and 4.2 percentage points. (See Yinger, 1995, for an in-depth look at the results of the 1989 HDS.) The study also examined geographic differences in treatment for the five in-depth sites and provided estimates of racial steering by neighborhood racial composition, per capita income, and median house value (see Turner and Mikelsons, 1991, for these results).

Other Applications

The first systematic application of paired testing to hiring, conducted in 1989, focused on discrimination against Hispanic men applying for entry-level jobs in Chicago and San Diego.
In each of these sites, approximately 150 paired tests were conducted, based on random samples of job openings advertised in the major metropolitan newspapers. A similar study of hiring discrimination against African American men was conducted a year later in Chicago and Washington, D.C. As before, the tests (about 200 in each metro area) were based on random samples of advertised job openings. Both studies found that white applicants were able to advance further in the hiring process than their minority counterparts in a statistically significant share of cases. Specifically, in the Hispanic-white tests in which both testers were able to submit an application, whites received an interview and Hispanics did not 22 percent of the time, while in the African American-white tests, only whites received an interview 9 percent of the time. These numbers are based on the gross measure of adverse treatment. Net adverse treatment was 14 and 6 percent for Hispanic-white and African American-white tests, respectively. In addition, whites were significantly more likely to receive encouragement in the hiring process (Kenney and Wissoker, 1994).

A 1998 pilot study used paired testing to assess the extent and forms of possible discrimination in the home insurance market. Testers in three metropolitan areas posed as buyers of closely matched homes located in minority and white neighborhoods. They called insurance agents on the telephone to seek insurance quotes. The homes, neighborhoods, and insurance seekers were matched on a wide range of characteristics so that the primary difference within a paired test was whether the home was located in a minority or white neighborhood. Results indicated that buyers in white neighborhoods were no more likely than those in minority neighborhoods to receive quotes, but they were slightly more likely to be offered some desirable types of coverage (in one site) and to receive higher levels of service than minorities (in another site). In Phoenix, substantially higher premiums were quoted for homes in Hispanic neighborhoods, but because the white and Hispanic neighborhoods were in different insurance rating territories, the study could not determine definitively whether the difference in premiums might have been due to legitimate differences in rates of risk and loss (Wissoker et al., 1998).

The 1999 Homeownership Testing Project is a pilot study of discrimination in the pre-application phase of the mortgage market. This testing effort includes tests for African Americans and Hispanics in two major metropolitan areas.
In each area, a stratified sample of lenders was selected by loan volume based on Home Mortgage Disclosure Act data. The testers were assigned income, assets, and debts sufficient to qualify to purchase a home priced at the median sales price in the area. The assignment was structured so that the qualifying price was constrained by the down payment, and income and debts were assigned so that the mortgage would conform to standard secondary market guidelines. The testers were also provided with an A- credit history profile. The results of this study are not yet available.

In 1999, the Urban Institute analyzed enforcement tests that had been conducted by the National Fair Housing Alliance (NFHA) in five sites. In two of the sites, statistically significant differences were found between the treatment of white and African American testers. White applicants received a quote (defined as information about a loan product with an estimate of monthly mortgage payments and closing costs) while African American applicants did not in 16 percent of the tests in Chicago and 25 percent of the tests in Atlanta. The net measures of adverse treatment in Chicago and Atlanta were 13 and 25 percent, respectively. It should be noted that the lender sample for the NFHA tests was not random; rather, lenders were chosen using indicators based on the Home Mortgage Disclosure Act data (Smith and Delair, 1999).

2000 HOUSING DISCRIMINATION STUDY: PHASE I

Basic Structure of Study

Phase I of the 2000 HDS is designed to study discrimination in both rental and sales housing markets against African Americans, Hispanics, Asian Americans, and Native Americans. The study will provide estimates of the national incidence and severity of discrimination against African Americans and Hispanics in medium-sized and large metropolitan area housing markets. The study will also provide less precise metropolitan-level estimates of discrimination for all African American and Hispanic sites, as well as metropolitan-level estimates for the pilot Asian American and Native American sites. In the Asian American pilot study, separate estimates will be developed on the basis of different major ethnic subgroups to assess the importance of ethnicity in the treatment of Asian Americans. Finally, given the concentration of the Native American population in small metropolitan and rural areas, the study will include pilot testing for Native Americans in two small metropolitan areas and the surrounding hinterland.

The 2000 HDS follows the basic methodology of the 1989 HDS. The point of entry to the market is an advertisement in a major metropolitan newspaper.
The study is based on a sampling of advertisements in the relevant major metropolitan newspapers, followed by a test in which the testers approach the relevant agent or agency and identify their interest in the advertised unit and similar units. The tests are paired in the sense that two individuals, one white and one minority, pose as otherwise identical home seekers. Observed differences in treatment between racial groups are interpreted as the adverse treatment expected to be experienced by a qualified minority member inquiring about a randomly chosen housing unit advertised in the newspaper.

The use of a sample of newspaper advertisements offers several advantages. First, the classified advertisements provide a clearly defined list of housing units that are currently on the market and for which information is available to individuals in search of housing. Second, newspaper advertisements provide a credible starting point for each test. This common starting point increases the match between the two testers' visits relative to simply approaching a real estate agency and therefore increases the statistical power available from a given-sized sample of tests. Finally, the advertisement sampling approach matches the sampling methodology of the 1989 HDS, increasing comparability between the two studies. The weaknesses of the advertisement sampling frame are discussed later in this section.

Sampling Design

The national samples of African American-white and Hispanic-white tests are two-stage samples. First, a sample of sites (16 African American-white and 10 Hispanic-white) is selected from the population of medium-sized to large metropolitan areas with substantial populations of the minority group being tested. A site is included in the sample if the central city population exceeds 100,000 and the percentage of the minority group in the site exceeds that in the U.S. population overall. Probabilities of selection from the population of sites are based on the metropolitan area population. Then, advertisements are drawn weekly from the major metropolitan newspaper in each site. The samples of Asian American and Native American tests are single-stage samples drawn weekly from the major newspapers of individual metropolitan areas (two Asian American sites with three ethnic groups and one Native American site).
In all sites, sufficient tests are being conducted to provide metropolitan-level estimates of adverse treatment (72 tests per tenure).

The sampling of advertisements is a centralized process conducted at the Urban Institute in Washington, D.C. The real estate sections of the Sunday newspapers for all sites are shipped to the Urban Institute every Sunday. A site must be sampled within a couple of hours of receipt so the sample can be relayed back to the local fair housing group for testing in a timely fashion. For each site, the order of the advertisement sample is randomized, and the advertisements are forwarded to the local group one at a time (see the next subsection for a more detailed discussion).

One of two sampling methods is used to select advertisements for rental and sales tests: systematic sampling or grid sampling. Systematic sampling involves the "numbering" of advertisements in a newspaper and the subsequent selection of a systematic sample using an interval designed to yield the target number of selections. Systematic sampling is employed when the number of advertisements is relatively small (say, fewer than 1,000) and confined to a specific format in the classified section. All rental advertisement selections are made using this method. Grid sampling is essentially an area sampling technique whereby a randomly assigned sampling grid is overlaid on the newspaper to reveal the areas (rectangles) that represent the sample. (Application of one grid is tantamount to a 1 in 24 sampling fraction.) Each advertisement is defined by a single point on the newspaper using an objective rule (i.e., the upper corner of the first letter of the first word in the line of descriptive text). Accordingly, all advertisements have the same chance of selection regardless of their size. Grid sampling is used for very large newspaper classified sections that include one or more supplements and can contain up to 3,000 advertisements.

Regardless of the selection method, once an advertisement has been selected, it is reviewed to determine eligibility. To be eligible, a housing unit must be within the metropolitan area boundaries, and must be a rental property in a complex represented by an agent or a single-family home or condominium for sale. For example, the rental tests exclude shared rentals, seasonal rentals, and properties rented by owners, while sales tests exclude seasonal or temporary housing, income-generating properties, and properties for sale by the owner.
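
The systematic sampling step described above can be sketched as follows. This is a generic textbook version, not the Urban Institute's actual procedure: the random starting point and the fractional interval are assumptions, since the source specifies only that advertisements are numbered and selected at an interval designed to yield the target number of selections. The function name and the figures of 950 advertisements and 50 selections are hypothetical.

```python
import random
from typing import List


def systematic_sample(n_ads: int, target: int, rng: random.Random) -> List[int]:
    """Select `target` zero-based positions from ads numbered 0..n_ads-1.

    The sampling interval is n_ads / target, and the start is drawn uniformly
    within the first interval (an assumed convention), so every advertisement
    has the same chance of selection.
    """
    if not 0 < target <= n_ads:
        raise ValueError("target must be between 1 and the number of ads")
    interval = n_ads / target
    start = rng.random() * interval
    # Step through the numbered list at a fixed interval from the random start;
    # the min() guards against a floating-point overshoot at the last position.
    return [min(int(start + i * interval), n_ads - 1) for i in range(target)]


# Hypothetical classified section with 950 rental ads and a target of 50 draws.
selected = systematic_sample(950, 50, random.Random(0))
print(selected[:5])
```

Eligibility screening would then be applied to each selected advertisement, which is why more advertisements are drawn than the number of tests planned.
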
Finally, the advertisement itself may not clearly identify whether a housing unit is eligible, so the eligibility criteria are applied by the local testing agency on the basis of information gathered on site. The sampling team at the Urban Institute draws substantially more advertisements than the number of tests planned in case some are determined to be ineligible by local testing agencies.

At the analysis stage, sample weights will be developed for each ethnic group at both the metropolitan and national levels for the African American-white and Hispanic-white tests. The national sampling weights will be based on the product of the site selection probability and the probability of selection of the advertisement. This weight will be adjusted for nonresponse to form a national analytic weight for use in national analyses (trends since 1989, as well as year 2000 estimates).

Separate metropolitan analytic weights are being developed for each site. These will be used in creating metropolitan report cards (i.e., developing metro-specific estimates). The metropolitan analytic weight is the product of a sampling weight and a nonresponse adjustment. The sampling weight reflects the probability of selection of the advertisement and incorporates selection within the classified section as well as across weeks. In addition, the sampling weight controls for market saturation within a week if it occurs. In other words, in some small markets or during a week when many advertisements are ineligible, the entire pool of advertisements sent to the local office at a site may be used. Finally, the weights will be adjusted for nonresponse.

Statistical analysis using the sample weights will generate confidence intervals for the gross measures and hypothesis tests for the net measures. The standard errors of estimates will be adjusted to account for the complex sampling design; see Kish (1965) and Wolter (1985). Given the small number of tests available in any given test site, statistical analysis will also be conducted for the metropolitan report cards using exact permutation tests (see Agresti, 1990, for a general discussion and Heckman and Siegelman, 1993, for the use of these tests in a testing context).

Test Protocol

A test begins with the selection of an eligible advertisement at the Urban Institute and the submission of a test authorization form to the local test coordinator specifying the type of test to be conducted, the order in which the testers should contact the housing provider, and whether a narrative (a quality control measure) must be completed for this test. Selection proceeds in order of the randomized list of advertisements. An advance call by a nonminority individual to obtain information concerning availability (rental tests only), price, size, and location is conducted for all rental tests and for sales tests if this information is not available in the advertisement.
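The authorization step can be pictured as a small record issued for each eligible advertisement, taken in randomized order. The sketch below is illustrative only: the field names, the rule that alternates which tester makes first contact from one test to the next, and the 10 percent narrative rate are assumptions, not details given in the HDS 2000 protocol.

```python
import random
from dataclasses import dataclass

@dataclass
class TestAuthorization:
    ad_id: str
    test_type: str            # "rental" or "sales"
    first_contact: str        # "minority" or "white"
    narrative_required: bool  # quality-control flag

def authorize_tests(ads, n_tests, rng, narrative_rate=0.1):
    """Walk a randomized list of (ad_id, test_type, eligible) tuples and
    issue authorizations for eligible ads until n_tests are issued."""
    order = list(ads)
    rng.shuffle(order)                       # randomize the advertisement order
    issued = []
    for ad_id, test_type, eligible in order:
        if len(issued) == n_tests:
            break
        if not eligible:                     # ineligible ads are skipped
            continue
        # Assumed rule: alternate the first-contact tester across tests.
        first = "minority" if len(issued) % 2 == 0 else "white"
        issued.append(TestAuthorization(
            ad_id, test_type, first, rng.random() < narrative_rate))
    return issued

# Made-up pool of 30 rental ads, one third of them ineligible.
pool = [(f"ad{i}", "rental", i % 3 != 0) for i in range(30)]
for auth in authorize_tests(pool, 5, random.Random(1989)):
    print(auth)
```

Drawing more advertisements than planned tests, as the sampling team does, is what lets a loop of this kind skip ineligible selections and still reach the target.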
Tester income and financial characteristics (sales tests only) are assigned to match the price of the housing unit. Occupations and employers are assigned consistent with these characteristics, but specific occupations (e.g., law enforcement) and regional employers are excluded based on the belief that these occupations or employers might receive some special treatment. Marital status and family structure are assigned on the basis of the size of the unit and the desire to obtain a fairly equal distribution of family types.

The local agency assigns the selected advertisement to one minority and one white tester as soon as two testers of the same gender and comparable ages are available. The testers each call to set up an appointment and visit in alternating order. These calls should be 1 to 6 hours apart for rental testing and 24 to 48 hours apart for sales testing. The actual tester visits should also be 1 to 6 hours apart for rental testing and 24 to 96 hours apart for sales testing. Rental testers make one visit to a rental housing site to inquire about the availability of the advertised and similar units. A similar protocol is followed by sales testers, except that the tester is available for a follow-up visit to see additional units, and provision has been made to record follow-up phone calls by the real estate agent.

Testers are required to take notes during their visit and to document its results on standardized forms within 1 hour of completing the visit. The local test coordinator debriefs all testers, and also collects and reviews all test file materials. Test narratives are required on a small number of randomly chosen tests to provide information for a quality control review of test files. Testers are not informed that a narrative is required prior to performing the test.

Limitations of and Alternatives to Random Sampling of Advertisements

While the use of a sample of advertisements offers many advantages, there are a number of disadvantages associated with this sampling strategy. First, the units advertised in the newspaper may not accurately represent the population of available housing units. Units may be advertised because they are especially attractive or in desirable neighborhoods and will attract clients to the agency. Alternatively, some units may not be advertised to more closely control the population of home seekers who have access to a unit or the neighborhood in which it is located. Moreover, in the case of sales tests, most home buyers do not learn from the newspaper about the home they actually purchase.
Finally, the importance of newspapers in marketing housing may be declining over time as the Internet is increasingly used to market a wide variety of products.

Within Phase I of HDS 2000, a two-pronged strategy, to be carried out in a small number of pilot sites, is being used to examine the limitations of the newspaper sampling frame. First, newspapers list housing advertisements by community and sometimes by smaller geographic regions for large central cities. The distribution of advertisements by community will be examined and compared with estimates of the distribution of rental and owner-occupied housing across communities in each of the pilot sites. This comparison will make it possible to identify communities in which housing units are underrepresented in newspaper advertisements and to draw additional samples of advertisements from these communities.

Second, after the completion of a test, the actual address of the advertised unit is available. The Urban Institute will perform a geographic analysis of these addresses in an attempt to identify regions of the metropolitan area that do not appear in the sample of advertisements, and will acquire socioeconomic characteristics for regions and/or neighborhoods from private vendors, such as Claritas. Once these regions have been identified, six to ten neighborhoods (three to five per tenure) will be selected, and a variety of local or neighborhood-level sources will be used to identify housing units for testing. These tests will be used for comparison with traditional advertisement-based tests, but cannot be combined with the sample of the latter because of the nonrandom nature of the selection process.

There are many other types of marketing that might be considered for Phases II and III of HDS 2000. First, the population of advertisements might be expanded to include Internet and other easily observable metropolitan-wide sources of advertisements. This expansion would likely increase the base of marketed units covered without sacrificing comparability to Phase I because the sample would still be drawn from a metropolitan-wide sample of advertisements. The addition of local sources of advertisements (below the metropolitan level) as discussed above would expand the base further, but at the expense of comparability. Other, more extreme modifications to the protocols might involve a sampling of agencies and agents rather than advertisements. As discussed earlier, it can be quite difficult to compile a complete, nonduplicative list of rental or sales real estate agents, and nearly impossible to obtain any measure of volume for these agents. Finally, attempts might be made to sample available units.
One possibility is the random sampling of streets and the second-stage selection of street-level advertisements from the selected streets. This approach might provide a fairly representative sample of units for sales tests (with the exception of condominiums), but is unlikely to provide a representative sample for rental tests since the use of street-level advertisements for rental properties is far less uniform.

Imperfect Pairs and Differences Across Visits

As discussed earlier, the paired-testing approach is unlikely to yield a perfect match within a test. First, the testers approach the selected agent at different times, and as a result the circumstances and treatment they encounter may differ. The possibility of such differences implies that the frequency of adverse treatment of minority testers, the gross measure, may capture differences that do not represent discrimination. In addition, testers are paired only on gender and age, and therefore may differ on many characteristics that might influence behavior during a test. This second problem may exacerbate the error in gross adverse treatment as a measure of discrimination while creating the potential for more severe biases in the analysis. Specifically, the populations of white and minority testers may differ systematically on characteristics that influence treatment. If so, the net and gross measures capture a combination of discrimination and the effect of racial differences in unobserved tester characteristics.

The 2000 HDS is attempting to address these issues. To the author's knowledge, Phase I of this study is the first paired-testing effort that records actual tester characteristics and makes those characteristics available for analysis. The characteristics collected include employment status and history, education level, individual and household income, household structure, and experience as a home seeker. Earlier research by Heckman and Siegelman (1993) and Ondrich et al. (2000, 2001) found only limited evidence that tester characteristics affect treatment. The data analyzed in these studies, however, contain no information about testers beyond an identification number, and these analyses were based on examining the experiences of pairs of testers who conducted multiple tests together.
In HDS 2000, the analysis will exploit the information on actual tester characteristics, as well as test characteristics such as the attributes of the advertised unit and observed circumstances during a tester's visit, to determine whether these factors influence treatment and whether such influences affect observed net and gross adverse treatment.

Finally, Phase II of the 2000 HDS will include three-person or triplet tests to examine the influence of random differences between visits and testers on observed adverse treatment. These tests will take two forms: minority-white-white and white-minority-minority. The form will be randomized over tests. This approach will minimize noise by limiting the time between same-race visits while also ensuring that the first two visits of each triplet will yield a standard paired test.
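The gross and net measures referred to throughout this appendix can be made concrete with a small sketch. It assumes each test is coded by which tester, if either, was favored; it takes the gross measure to be the (optionally weighted) share of tests in which the white tester was favored, and the net measure to be that share minus the share in which the minority tester was favored. The function names, the coding scheme, and the illustrative counts are assumptions, and the exact sign test shown is only one simple member of the family of exact tests cited earlier, not necessarily the procedure used in HDS 2000.

```python
from math import comb

def adverse_treatment(outcomes, weights=None):
    """Return (gross, net) adverse-treatment measures.

    outcomes: one of "white", "minority", or "equal" per test, naming
              the favored tester (hypothetical coding scheme).
    weights:  optional analytic weights; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(outcomes)
    total = sum(weights)
    white = sum(w for o, w in zip(outcomes, weights) if o == "white")
    minority = sum(w for o, w in zip(outcomes, weights) if o == "minority")
    return white / total, (white - minority) / total

def exact_sign_test(n_white, n_minority):
    """Two-sided exact p-value for H0: among tests with unequal treatment,
    each tester is equally likely to be the favored one (unweighted)."""
    m = n_white + n_minority
    if m == 0:
        return 1.0
    k = max(n_white, n_minority)
    upper_tail = sum(comb(m, i) for i in range(k, m + 1)) / 2 ** m
    return min(1.0, 2 * upper_tail)

# Made-up counts for one site with 72 tests.
outcomes = ["white"] * 20 + ["minority"] * 8 + ["equal"] * 44
gross, net = adverse_treatment(outcomes)
p_value = exact_sign_test(20, 8)
print(round(gross, 3), round(net, 3), round(p_value, 4))
```

The sign test conditions on the tests with unequal treatment and ignores the "equal" outcomes, which is why it remains usable with the small number of tests available in a single site. It does not use the analytic weights or adjust for the complex sampling design.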

REFERENCES

Agresti, A. 1990. Categorical Data Analysis. New York: John Wiley and Sons.
Fix, M., G. Galster, and R. Struyk. 1993. An overview of auditing for discrimination. In Clear and Convincing Evidence: Testing for Discrimination in America, M. Fix and R. Struyk, eds. Washington, DC: Urban Institute Press.
Heckman, J., and P. Siegelman. 1993. The Urban Institute studies: Their methods and findings. In Clear and Convincing Evidence: Testing for Discrimination in America, M. Fix and R. Struyk, eds. Washington, DC: Urban Institute Press.
Kenney, G.M., and D.A. Wissoker. 1994. An analysis of the correlates of discrimination facing young Hispanic job-seekers. American Economic Review 84(June):674-683.
Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
Ondrich, J., S. Ross, and J. Yinger. 2000. How common is housing discrimination? Improving on traditional measures. Journal of Urban Economics 47:470-500.
Ondrich, J., S. Ross, and J. Yinger. 2001. Now You See It, Now You Don't: Why Some Homes Are Hidden From Black Buyers. Unpublished manuscript.
Ross, S., and J. Yinger. 1999. Does discrimination exist? The Boston Fed study and its critics. In Mortgage Lending Discrimination: A Review of Existing Evidence, M. Turner and F. Skidmore, eds. Urban Institute Monograph Series on Race and Discrimination. Washington, DC: Urban Institute Press.
Smith, R., and M. Delair. 1999. New evidence from lender testing: Discrimination at the pre-application stage. In Mortgage Lending Discrimination: A Review of Existing Evidence, M. Turner and F. Skidmore, eds. Urban Institute Monograph Series on Race and Discrimination. Washington, DC: Urban Institute Press.
Turner, M.A., and M. Mikelsons. 1991. Patterns of racial steering in four metropolitan areas. Journal of Housing Economics 2:199-234.
Wissoker, D., W. Zimmermann, and G. Galster. 1998. Testing for Discrimination in Home Insurance. Washington, DC: Urban Institute Press.
Wolter, K. 1985. Introduction to Variance Estimation. New York: Springer-Verlag, Inc.
Yinger, J. 1995. Closed Doors, Opportunities Lost: The Continuing Costs of Housing Discrimination. New York: Russell Sage Foundation.


