

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Appendix E

Statistical Rationale Behind Some Initial Findings on the Relative Statistical Plausibility of a Multiple-Frame Approach to Estimating the Victimization Rate of Rape and Sexual Assault

William D. Kalsbeek¹

This paper expands the discussion in Chapter 10 on the use of a multiple-frame approach to estimating the incidence of rape and sexual assault in household surveys of the Bureau of Justice Statistics (BJS). It explores the statistical rationale behind some initial findings on the relative statistical plausibility of a multiple-frame approach.²

¹ Kalsbeek is a professor in the Department of Biostatistics at the University of North Carolina. He served as cochair of this panel.
² A presentation on the statistical issues in this appendix was given at the Joint Statistical Meetings in Montreal in August 2013 (Kalsbeek, Spencer, and House, 2013), available at http://www.amstat.org/meetings/jsm/2013/onlineprogram/AbstractDetails.cfm?abstractid=309226 [December 2013].

BACKGROUND AND ASSUMPTIONS

1. The primary analysis objective is to estimate the proportion (P) of persons in the target population who have been a victim of a rape or sexual assault (RSA) in some calendar year.
2. The following two overlapping frames are involved in defining a dual-frame (DF) sample design that might be used to estimate P: (1) an administrative frame consisting of persons seen, treated, or processed for their RSA during the same calendar year, and (2) a standard area household frame of the residential population of the kind used for the National Crime Victimization Survey (NCVS).

3. The administrative frame is a subset of the area household frame, and thus the two frames overlap. However, one can define two nonoverlapping strata by treating those in the administrative frame as one stratum and all members of the area household frame not included in the administrative frame as the second stratum. This implies that a sample for the second stratum selected from the area household frame would need to be screened to exclude members of the administrative frame. Forming these two strata is the simplest frame construction arrangement for a dual-frame design and is comparable to the frame structure used in telephone sampling of landline and cell-only households (Hartley, 1962; Lohr, 2011).
4. The administrative frame might be chosen from any of the following sets of people who: (1) filed a crime complaint with the police or some other law enforcement agency, (2) were victims of RSA or aggravated assault when an accused perpetrator is charged with a crime and tried in the criminal justice system, (3) were treated for assault-related health consequences by a hospital emergency department, (4) were clients of victim support services (e.g., rape crisis centers, domestic violence shelters), (5) were registered residents of Indian reservations, (6) were treated at Indian Health Services facilities, or (7) were patients of outpatient mental health clinics.
5. A simple form of sampling (i.e., simple random sampling with replacement, SRSWR) is applied separately to the administrative and the nonadministrative household strata.
6. The dual-frame sample design is seen as an alternative to a single-frame (SF) design that, like the current NCVS, uses only a standard area household frame. While more complex forms of stratified cluster sampling would be used with both the DF and SF designs, one assumes SRSWR sampling is applied to each frame, with the presumption that the effects of greater sampling complexity would cancel, thus sustaining a comparison between the two design alternatives.

DETERMINING THE MOST COST-EFFICIENT SAMPLE ALLOCATION AMONG STRATA IN THE DUAL-FRAME DESIGN

One can consider the simplest case of multiframe sample design, in which the set of population members comprising two overlapping frames is divided into two nonoverlapping sampling strata, as, for instance, with cell and landline frames in telephone sampling (Hartley, 1962; Lohr, 2011). In the situation described above, we have two nonoverlapping sampling strata formed by the members of: (1) the administrative frame (A), and (2) the nonadministrative household frame (HH), consisting of those members of the household frame who are not members of the administrative frame. Under this scenario one can derive the precision of a dual-frame estimator of the prevalence of rape and sexual assault from well-known properties of estimators based on stratified samples.

For stratified SRSWR, the variance of the estimator $p_W = \sum_{h=1}^{H} W_h p_h$ of $P$ for the general case of selecting a sample of size $n$ from $H$ strata is

$$V(p_W) = \sum_{h=1}^{H} W_h^2 V(p_h) = \sum_{h=1}^{H} W_h^2 \frac{P_h(1-P_h)}{n_h},$$

where, for the h-th stratum, $W_h = N_h/N$ is the proportion of the population, $P_h$ is the proportion of victims of RSA among all $N_h$ population members, and $p_h$ is the proportion of RSA victims among the $n_h$ sample members.

If one defines $C_h$ as the average cost of adding another survey respondent in the h-th stratum, then one can use the simple linear variable cost model $C^* = \sum_{h=1}^{H} C_h n_h$ and the Cauchy-Schwarz inequality to establish the sample allocation that minimizes the product $V(p_W) \times C^*$. The most cost-efficient (C-E) sample allocation to the h-th stratum is thereby

$$n_h^{(C-E)} = K \frac{W_h\sqrt{P_h(1-P_h)}}{\sqrt{C_h}}, \qquad [1]$$

where

$$K = \frac{C^*}{\sum_{h=1}^{H} W_h\sqrt{P_h(1-P_h)}\,\sqrt{C_h}}.$$

Applying the general result from Eq. [1] to the two-stratum setting of the dual frame gives

$$n_A^{(C-E)} = K \frac{W_A\sqrt{P_A(1-P_A)}}{\sqrt{C_A}} \qquad [2]$$

for the administrative stratum, and

$$n_{HH}^{(C-E)} = K \frac{W_{HH}\sqrt{P_{HH}(1-P_{HH})}}{\sqrt{C_{HH}}} \qquad [3]$$

for the household stratum, where

$$K = \frac{C^*}{W_A\sqrt{P_A(1-P_A)}\,\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}\,\sqrt{C_{HH}}}.$$
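As a numerical check on Eqs. [1] to [3], the following Python sketch evaluates the cost-efficient allocation under the SRSWR assumptions above. It is a minimal illustration, not production survey software: the function name `cost_efficient_allocation` and its argument layout are hypothetical, and the usage lines plug in the Example 1 inputs given later in this appendix (W_A = 0.00056, P = 0.001, P_A = 0.83, unit costs of 346 and 173 dollars, and a budget of 26 million dollars).

```python
import math


def cost_efficient_allocation(W, P, C, budget):
    """Cost-efficient (C-E) stratum allocation of Eq. [1] under SRSWR.

    W, P, C: per-stratum population shares W_h, RSA rates P_h, and
    average unit costs C_h. budget: total variable cost C*.
    Returns (n, K): unrounded stratum sample sizes n_h and the constant K.
    """
    # sigma_h = sqrt(P_h (1 - P_h)), the SD of the 0/1 RSA status indicator
    sigma = [math.sqrt(p * (1 - p)) for p in P]
    # K = C* / sum_h W_h sigma_h sqrt(C_h)
    K = budget / sum(w * s * math.sqrt(c) for w, s, c in zip(W, sigma, C))
    # n_h = K W_h sigma_h / sqrt(C_h)
    n = [K * w * s / math.sqrt(c) for w, s, c in zip(W, sigma, C)]
    return n, K


# Example 1 inputs: stratum 0 = administrative (A), stratum 1 = household (HH)
W_A, P, P_A = 0.00056, 0.001, 0.83
P_HH = (P - W_A * P_A) / (1 - W_A)  # household-stratum RSA rate
n, K = cost_efficient_allocation([W_A, 1 - W_A], [P_A, P_HH], [346, 173], 26e6)
print(round(n[0]), round(n[1]))  # approximately 955 and 148,380
```

Because $\sum_h C_h n_h^{(C-E)} = K \sum_h W_h\sqrt{P_h(1-P_h)}\sqrt{C_h} = C^*$, the allocation spends exactly the budget; in practice the resulting $n_h$ would then be rounded to whole interviews.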

VARIANCE OF A DUAL-FRAME ESTIMATE BASED ON THE MOST COST-EFFICIENT ALLOCATION

The variance of $p_W$ for stratified SRSWR with the most cost-efficient sample allocation (i.e., the $n_h^{(C-E)}$) for the case of H strata can be shown to be

$$V_{DF}^{(C-E)}(p_W) = \frac{\sum_{h=1}^{H} W_h\sqrt{P_h(1-P_h)}\big/\sqrt{C_h}}{n \Big/ \sum_{h=1}^{H} W_h\sqrt{P_h(1-P_h)}\,\sqrt{C_h}},$$

where $n = \sum_h n_h^{(C-E)}$ is the total sample size. For the two-stratum case,

$$V_{DF}^{(C-E)}(p_W) = \frac{W_A\sqrt{P_A(1-P_A)}/\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}/\sqrt{C_{HH}}}{n \Big/ \left[W_A\sqrt{P_A(1-P_A)}\,\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}\,\sqrt{C_{HH}}\right]}. \qquad [4]$$

Dual-Frame vs. Single-Frame Area Household Frame Design

Consider a cost-equivalent comparison of the dual-frame (DF) estimator with a single-frame (SF) estimator based on a sample of size $n_{SF} = C^*/C_{HH}$, so that the total variable cost of data collection for the SF design is also $C^*$. For design comparability one assumes SRSWR sampling from the household frame, in which case the variance of the SF estimator ($p_{HH}$) of P is simply

$$V_{SF}(p_{HH}) = \frac{P(1-P)}{n_{SF}}. \qquad [5]$$

The variances of estimates of P by the DF and SF designs can be compared using the ratio

$$R_V = \frac{V_{DF}^{(C-E)}(p_W)}{V_{SF}(p_{HH})}. \qquad [6]$$

Other Comparison Indicators

1. Ratio of Average Unit Costs for the Two Dual-Frame Strata: This indicator is the ratio of the average cost of adding another respondent to the administrative stratum to the comparable average cost for the nonadministrative household stratum. It is computed as

$$\theta = \frac{C_A}{C_{HH}}. \qquad [7]$$

2. Ratio of Stratum RSA Rates for the Dual-Frame Design: Cochran (1977, Section 5.6) notes that, relative to an unstratified SRSWR design and when stratum unit costs are equal, the effectiveness of the most cost-efficient stratum allocation for stratified SRSWR depends on the extent of stratum differences in (i) $P_h$ and (ii) the standard deviation of the RSA status (i.e., $\sigma_h = \sqrt{P_h(1-P_h)}$). Differences in (ii) are especially pronounced for extremely small (or large) values of $P_h$, as is the case here, with the RSA prevalence P being about 0.001 and with $P_A \gg P_{HH}$. The indicator used to measure the relative sizes of $P_A$ and $P_{HH}$ is

$$\beta = \frac{P_A}{P_{HH}}. \qquad [8]$$

3. Extent of Oversampling Members of the Administrative Frame in the Dual-Frame Design: This is a descriptive indicator of the relatively greater sampling intensity in the administrative stratum compared to the household stratum in the DF design. It is computed as

$$\phi = \frac{f_A^{(C-E)}}{f_{HH}^{(C-E)}} = \frac{n_A^{(C-E)}/N_A}{n_{HH}^{(C-E)}/N_{HH}}. \qquad [9]$$

4. Percentage of Dual-Frame Sample from Administrative Stratum: This indicates how much of the total dual-frame sample ($n_{DF}$) comes from the administrative frame. It is computed as

$$\text{Admin Sample \%} = 100 \times \frac{n_A^{(C-E)}}{n_{DF}}. \qquad [10]$$

5. Relative Size of the Dual-Frame Sample Compared to the Single-Frame Sample: This indicates the comparative total sample sizes for the DF design ($n_{DF}$) vs. the SF design ($n_{SF}$). It is computed as

$$\text{Relative Overall Sample Size} = n_{DF}/n_{SF}. \qquad [11]$$

6. Relative Standard Error of the Estimate for the Dual-Frame Design: This is a relative measure of the precision of the dual-frame estimate with the most cost-efficient stratum allocation. It is computed as

$$RSE_{DF}^{(C-E)}(p_W) = \frac{\sqrt{V_{DF}^{(C-E)}(p_W)}}{P}. \qquad [12]$$
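The comparison quantities in Eqs. [4] to [12] can be computed together for the two-stratum case. The Python sketch below is a minimal illustration, assuming SRSWR within strata, a household-stratum rate backed out as $P_{HH} = (P - W_A P_A)/(1 - W_A)$ (as done in Example 1), and $C_A = \theta C_{HH}$; the function `compare_designs` and its argument names are hypothetical. The usage lines plug in the Example 1 and Example 2 inputs that appear later in this appendix.

```python
import math


def compare_designs(W_A, P, P_A, C_HH, theta, budget, N):
    """Two-stratum DF vs. cost-equivalent SF comparison (Eqs. [4] to [12]).

    Assumes SRSWR within strata; the household-stratum rate P_HH is
    backed out from the overall prevalence P, and C_A = theta * C_HH.
    """
    W_HH = 1 - W_A
    P_HH = (P - W_A * P_A) / (1 - W_A)
    C_A = theta * C_HH                            # from Eq. [7]
    s_A = math.sqrt(P_A * (1 - P_A))              # SD of 0/1 RSA status, stratum A
    s_HH = math.sqrt(P_HH * (1 - P_HH))           # SD of 0/1 RSA status, stratum HH
    sum_sqrt_c = W_A * s_A * math.sqrt(C_A) + W_HH * s_HH * math.sqrt(C_HH)
    sum_inv_sqrt_c = W_A * s_A / math.sqrt(C_A) + W_HH * s_HH / math.sqrt(C_HH)
    K = budget / sum_sqrt_c
    n_A = K * W_A * s_A / math.sqrt(C_A)          # Eq. [2]
    n_HH = K * W_HH * s_HH / math.sqrt(C_HH)      # Eq. [3]
    n_DF = n_A + n_HH
    V_DF = sum_inv_sqrt_c / (n_DF / sum_sqrt_c)   # Eq. [4]
    n_SF = budget / C_HH
    V_SF = P * (1 - P) / n_SF                     # Eq. [5]
    N_A = W_A * N
    N_HH = N - N_A
    return {
        "n_A": n_A,
        "n_HH": n_HH,
        "V_DF": V_DF,
        "V_SF": V_SF,
        "RV": V_DF / V_SF,                        # Eq. [6]
        "theta": C_A / C_HH,                      # Eq. [7]
        "beta": P_A / P_HH,                       # Eq. [8]
        "phi": (n_A / N_A) / (n_HH / N_HH),       # Eq. [9]
        "admin_pct": 100 * n_A / n_DF,            # Eq. [10]
        "rel_size": n_DF / n_SF,                  # Eq. [11]
        "RSE_DF": math.sqrt(V_DF) / P,            # Eq. [12]
    }


# Example 1 (theta = 2) and Example 2 (theta = 10) inputs
base = dict(W_A=0.00056, P=0.001, P_A=0.83, C_HH=173, budget=26e6, N=250_000_000)
ex1 = compare_designs(theta=2, **base)
ex2 = compare_designs(theta=10, **base)
print(round(ex1["RV"], 3), round(ex2["RV"], 3))  # 0.549 0.566

# Sensitivity of R_V to smaller P_A, holding theta = 10
for p_a in (0.60, 0.50, 0.40, 0.30, 0.20):
    print(p_a, round(compare_designs(theta=10, **{**base, "P_A": p_a})["RV"], 3))
```

With these inputs the sketch gives $R_V$ of about 0.549 for $\theta = 2$ and about 0.566 for $\theta = 10$ (a 43% variance reduction), and the loop reproduces the $P_A$ sensitivity values reported for Example 2 to within rounding. Computing $V_{DF}^{(C-E)}$ as $\big(\sum_h W_h\sigma_h\sqrt{C_h}\big)^2/C^*$ gives the same value, because $n_{DF} = K\sum_h W_h\sigma_h/\sqrt{C_h}$.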

EXAMPLE 1: [$\theta = C_A/C_{HH} = 2$]

Consider the following setting, in which we compare the statistical quality of estimates from a DF design that uses police records as the administrative source with comparable (i.e., cost-equivalent) estimates from a household SF design as currently used in the NCVS. To determine the relative utility of the DF and SF designs, one might pose this question: How would the variance of a DF estimate of RSA prevalence, $V_{DF}^{(C-E)}(p_W)$, compare with the variance of a comparable SF estimate, $V_{SF}(p_{HH})$, obtained for the same cost?

To answer this question within the context of the design assumptions, definitions, and theoretical findings described previously in this document, consider the following numerical values:

1. Police records are to be used to define an administrative stratum of crime victims, so the size of the administrative stratum is specified as about $N_A = 140{,}000$, obtained by extrapolating to the total U.S. population the partial national count of 96,122 assaults/attempts to commit rape in the 1997 Uniform Crime Reports, as reported on p. 25 of Crime in the United States 1997 (Federal Bureau of Investigation, 1997) at http://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/1997/toc97.pdf.
2. From an August BJS Selected Findings report by C.M. Rennison (Bureau of Justice Statistics, 2002b) at http://bjs.ojp.usdoj.gov/content/pub/pdf/rsarp00.pdf, the NCVS estimate of the average annual number of RSAs reported to police (1992-2000) was about 116,300. Thus, the proportion of police records on assaults/attempts to commit rape that would turn out to be an RSA would be about $P_A = 116{,}300/140{,}000 = 0.83$.³
3. Persons living at addresses define the household frame (as in the NCVS). According to Bureau of Justice Statistics (2008a), the total number of persons 12 years of age or older was about $N = 250{,}000{,}000$ in 2007, thus making the size of the household stratum $N_{HH} = N - N_A = 249{,}860{,}000$ and the proportion of the population in the administrative stratum about $W_A = 1 - W_{HH} = 140{,}000/250{,}000{,}000 = 0.00056$.
4. $P = 0.001$, based on figures from Criminal Victimization, 2007 (Bureau of Justice Statistics, 2008a), which can be found at http://bjs.ojp.usdoj.gov/content/pub/pdf/cv07.pdf.
5. Based on a 2009 FCSM Research Conference paper presented by Michael R. Rand of BJS (Rand, 2009; see pages 9 and 16 of the paper at http://www.fcsm.gov/09papers/Rand_X-B.doc), the funds available to conduct the NCVS in FY2009 amounted to $C^* = \$26\text{M}$, and about 150,000 NCVS interviews were completed in 2008. These figures imply an average cost per completed interview of about $C_{HH} = \$26\text{M}/150{,}000 \approx \$173$ for the household stratum.

³ If, for confidentiality protection, the types of crimes sampled through police records were broader, then $P_A$ would be lower, perhaps much lower, than this value.

Dual-Frame Design:

If the average cost per completed interview for the police records (administrative) stratum is two times that of the NCVS-like household stratum, then $\theta = C_A/C_{HH} = 2$ and thus $C_A = \$346$. First, determine the RSA rate for the household stratum as

$$P_{HH} = \frac{P - W_A P_A}{1 - W_A} = 0.00053550,$$

which makes $P_A = 0.83$ larger than $P_{HH}$ by a factor of about $\beta = P_A/P_{HH} \approx 1{,}550$. The standard deviations of the 0/1 RSA status indicator for the two strata thus differ by a factor of

$$\sigma_A/\sigma_{HH} = \sqrt{P_A(1-P_A)}\Big/\sqrt{P_{HH}(1-P_{HH})} \approx 16.2.$$

Because of these substantial stratum differences in $P_h$ and $\sigma_h = \sqrt{P_h(1-P_h)}$, one might expect from Eq. (5.37) in Cochran (1977) that a cost-efficient stratum allocation in this dual-frame context will produce substantially greater precision in estimates of P than a single-frame approach relying solely on household sampling. As shown below, this is the case.

Using Equations [2] and [3], we find that the most cost-efficient allocation of the dual-frame sample given $C^*$ for the police records stratum will be

$$n_A^{(C-E)} = \left[\frac{C^*}{W_A\sqrt{P_A(1-P_A)}\,\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}\,\sqrt{C_{HH}}}\right]\frac{W_A\sqrt{P_A(1-P_A)}}{\sqrt{C_A}} \approx 955$$

and for the household stratum,

$$n_{HH}^{(C-E)} = \left[\frac{C^*}{W_A\sqrt{P_A(1-P_A)}\,\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}\,\sqrt{C_{HH}}}\right]\frac{W_{HH}\sqrt{P_{HH}(1-P_{HH})}}{\sqrt{C_{HH}}} \approx 148{,}380.$$

Thus, the total sample size for the DF design in this case would be 149,334, of which 955 (about 0.6%) would be from the police records stratum. The variance of the weighted estimate of P from the DF design, based on this most cost-efficient sample allocation between strata, will be

$$V_{DF}^{(C-E)}(p_W) = \frac{W_A\sqrt{P_A(1-P_A)}/\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}/\sqrt{C_{HH}}}{n \Big/ \left[W_A\sqrt{P_A(1-P_A)}\,\sqrt{C_A} + W_{HH}\sqrt{P_{HH}(1-P_{HH})}\,\sqrt{C_{HH}}\right]} = 3.649 \times 10^{-9}.$$

Cost-Equivalent Single-Frame Design:

Turning now to the SF design, also with a budget of $C^* = \$26\text{M}$ and $C_{HH} = \$173$, the sample size we can afford for the household frame is $n_{SF} = C^*/C_{HH} = 150{,}289$, which is only slightly greater than the total sample for the DF design. The variance of the single-frame estimate will therefore be

$$V_{SF}(p_{HH}) = \frac{P(1-P)}{n_{SF}} = \frac{0.001(1-0.001)}{150{,}289} = 6.647 \times 10^{-9}.$$

Cost-Equivalent Design Comparison:

Comparing the variances for RSA estimates from the DF and SF designs with $C^* = \$26\text{M}$, we have

$$R_V = \frac{V_{DF}^{(C-E)}(p_W)}{V_{SF}(p_{HH})} = 0.549,$$

implying that the variance for the DF design is about 45% lower than the cost-equivalent variance for the SF design.

EXAMPLE 2: [$\theta = C_A/C_{HH} = 10$]

Consider the same setting as above but where $\theta = C_A/C_{HH} = 10$, i.e., where the average cost for the police records stratum is 10 times greater than for the household stratum (e.g., because it may be much more difficult to sample, recruit, and collect data from the sample obtained from police records). Here, the most cost-efficient allocation of the DF sample changes to $n_A^{(C-E)} = 420$ and $n_{HH}^{(C-E)} = 146{,}086$, and the variance ratio is $R_V = 0.566$, implying a 43% lower variance for the DF design.

1. An important factor in the much higher average unit cost for the police records stratum is the need to broaden the search for RSA cases beyond those persons reporting assaults/attempts to commit rape (e.g., to also include aggravated assaults by a male on a female), which lowers $P_A$. The following values show how $R_V$ changes (with $\theta = 10$) as $P_A$ becomes smaller:

| $P_A$ | $R_V$ |
|---|---|
| 0.60 | 0.709 |
| 0.50 | 0.768 |
| 0.40 | 0.825 |
| 0.30 | 0.879 |
| 0.20 | 0.930 |

These findings indicate that even at lower concentrations of RSA victims and substantially higher average unit costs for this administrative source, the dual-frame approach produces reasonable gains over a cost-equivalent single-frame approach.

2. I have produced a wider range of findings for all of the statistical and process indicators defined above to illustrate more broadly the comparative results for the dual-frame approach versus a cost-equivalent single-frame approach when police records are the administrative source for the dual frame.

SOME FINAL THOUGHTS

Admittedly, the utility of the comparative findings in this document is somewhat limited by several simplifying assumptions I have made, particularly (i) the use of a contrived two-stratum framework for the two overlapping frames of the dual frame, created by screening out members of one frame when sampling from the other; (ii) the assumption of SRSWR sampling instead of stratified multistage cluster sampling in each stratum;⁴ and (iii) the consideration of sampling error only, without also including nonsampling errors such as nonresponse and measurement error. Nonetheless, I believe that these preliminary findings strongly suggest that it would be worthwhile for BJS to investigate more closely the feasibility of using a dual-frame approach for estimating rates of RSA, particularly if these estimates are obtained from an independent RSA victimization survey as recommended by the panel. Finally, the panel's suggestions for further investigation of the dual-frame approach might include incorporating the more realistic elements that my simplifying assumptions above overlook.
⁴ Kalsbeek, Spencer, and House (2013) provide more information on the potential efficiency reductions expected from relaxing this assumption.
