This chapter summarizes the workshop’s eighth session, which focused on evaluating the reliability and validity of current rural area classifications. Stephan Goetz and Yicheol Han (Pennsylvania State University) prepared a commissioned paper, Evaluation of Rural Area Classifications Using Statistical Modeling, for the workshop. Goetz described their results, and Mark Shucksmith (Newcastle University) presented on evaluating the validity and reliability of the classifications via “ground-truthing.” The discussant was Carlianne Patrick (Georgia State University), and her discussion was followed by an open floor discussion. The moderator was Mark Partridge (Ohio State University).
STATEMENT BY STEPHAN GOETZ¹
Goetz noted that he and Yicheol Han had evaluated the Rural-Urban Continuum Codes (RUCC), the Urban Influence Codes (UIC), the Rural-Urban Commuting Area (RUCA) Codes, and the Frontier and Remote (FAR) Codes using statistical modeling. They also constructed a measure of their own and evaluated it in a similar way. Their analysis asked the following questions:
- Are existing classifications still useful, or should they be changed?
- Is there an optimal number of categories?
- Are statistical methods useful for assessing the classification systems?
- Should alternative outcome measures, other than population growth and poverty, be considered for use in defining classifications and evaluating them?
¹This presentation is based on Goetz and Han (2015a).
Goetz and Han also explored whether there are now better tools for classifying counties in terms of their status along the rural-urban continuum. In particular, they applied principles from network science to start thinking about counties differently.
Statistical Performance of Existing Codes
Goetz stated that he and Han used adjusted R-squared values as the “goodness of fit” criterion for 1990, 2000, and 2010 to evaluate how well the RUCC, UIC, RUCA, and FAR codes fit various dependent variables. The goal was to compare the individual codes against one another and also over time. Following the regional typology of the OECD, Goetz and Han focused on the employment/population ratio, an important variable that is quite highly correlated with poverty status. Using this ratio also results in a more dynamic measure of population. They found that the RUCA Codes do reasonably well on existing (OECD) outcome measures. Perhaps not surprisingly, he commented, the UIC does well on the employment/population ratio, but its goodness-of-fit values declined somewhat over time. Goetz observed that this decline might be one of the concerns that motivated this workshop.
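To make the goodness-of-fit criterion concrete: when an outcome is regressed on nothing but dummies for the categories of a classification code, the fitted value for each county is simply its category mean, so R-squared is the share of variation lying between categories, and the adjusted version penalizes each extra dummy. The sketch below assumes that dummy-only specification (Goetz and Han's models may have differed); the function name `adjusted_r2_for_codes` and the county values are hypothetical.

```python
from collections import defaultdict


def adjusted_r2_for_codes(codes, outcome):
    """Adjusted R-squared of an OLS regression of `outcome` on an intercept
    plus dummies for every category of `codes` except one.

    With only category dummies as regressors, the OLS fitted value for each
    county is its category mean, so R-squared is the between-category share
    of the total variation in the outcome.
    """
    n = len(outcome)
    grand_mean = sum(outcome) / n
    groups = defaultdict(list)
    for code, y in zip(codes, outcome):
        groups[code].append(y)
    # Residual sum of squares: deviations from each category's own mean.
    ssr = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    # Total sum of squares: deviations from the overall mean.
    sst = sum((y - grand_mean) ** 2 for y in outcome)
    k = len(groups) - 1  # number of dummy regressors
    return 1 - (ssr / sst) * (n - 1) / (n - k - 1)


# Hypothetical counties: a three-category code and an employment/population ratio.
codes = [1, 1, 2, 2, 3, 3]
emp_pop = [0.62, 0.60, 0.55, 0.57, 0.48, 0.50]
print(round(adjusted_r2_for_codes(codes, emp_pop), 4))  # 0.9339
```

Computing this value for each code, and for each census year, on the same outcome gives the kind of side-by-side comparison described above. The n - k - 1 denominator requires more counties than categories.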
He suggested that the decline over time in the fit of these regressions may reflect changes in the nature and geographic distribution of the outcome variables, rather than a loss of relevance of the classification codes. To examine this possibility, Goetz and Han plotted poverty rates from the Small Area Income and Poverty Estimates (SAIPE) Program for 1989–2013 against the 2013 RUCC categories. They found that the nation did quite well in reducing poverty through most of the 2000s, a period when poverty was mostly a rural problem. However, poverty has increased since 2007–2008. The point is that poverty has shifted geographically: it has risen in urban areas and is now also prevalent in suburban areas.
Goetz said that evaluation using a few other variables showed that RUCA Codes perform very well on outcome measures such as population density, percentage rural population, and percentage farm area.
Optimal Classification of Codes
Goetz next explored whether any of the existing classification codes could be collapsed into fewer categories to produce a simpler classification scheme. He said that he and Han concluded that collapsing the codes would not work consistently across variables or across time: different socioeconomic variables would require different reclassifications, and those reclassifications would also have to change over time. The message, therefore, is that the codes work well as they are now, and collapsing them into fewer categories would not be straightforward. For these reasons, he suggested that reducing the number of categories in the classification codes is not feasible.
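The argument against collapsing categories can be illustrated with a small, invented example: merge each pair of adjacent categories in turn and keep the merge that loses the least fit. The best merge for one outcome need not be the best merge for another. This sketch assumes that only adjacent categories are merged and uses adjusted R-squared from a dummy-only regression as the criterion; `best_adjacent_merge` is a hypothetical helper, not Goetz and Han's procedure.

```python
from collections import defaultdict


def adjusted_r2(codes, outcome):
    """Adjusted R-squared of a regression of outcome on category dummies."""
    n = len(outcome)
    mean = sum(outcome) / n
    groups = defaultdict(list)
    for c, y in zip(codes, outcome):
        groups[c].append(y)
    ssr = sum((y - sum(g) / len(g)) ** 2 for g in groups.values() for y in g)
    sst = sum((y - mean) ** 2 for y in outcome)
    k = len(groups) - 1
    return 1 - (ssr / sst) * (n - 1) / (n - k - 1)


def best_adjacent_merge(codes, outcome):
    """Merge each pair of adjacent (ordered) categories in turn and return the
    pair whose merged code keeps the highest adjusted R-squared."""
    cats = sorted(set(codes))
    best = None
    for a, b in zip(cats, cats[1:]):
        merged = [a if c == b else c for c in codes]  # relabel b as a
        score = adjusted_r2(merged, outcome)
        if best is None or score > best[1]:
            best = ((a, b), score)
    return best


codes = [1, 1, 2, 2, 3, 3, 4, 4]
outcome_a = [10, 11, 10.5, 11.5, 20, 21, 30, 31]  # categories 1 and 2 look alike
outcome_b = [10, 11, 20, 21, 30, 31, 30.5, 31.5]  # categories 3 and 4 look alike
print(best_adjacent_merge(codes, outcome_a)[0])  # (1, 2)
print(best_adjacent_merge(codes, outcome_b)[0])  # (3, 4)
```

With these toy values, no single four-to-three collapse is best for both outcomes, which mirrors the finding that any reclassification would be variable-specific.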
Alternative Classification Approach
Goetz and Han also explored a potential new classification system by applying network principles to community-level data to try to improve fit. He said that counties are positioned within commuting and potentially other networks in terms of information, commuter, resource, and other flows. He asked about the availability of better measures of access to economic opportunities, such as jobs and income, and diversity of labor markets.
Goetz reported that they used the 3,141 × 3,141 county matrix of commuting flows for the entire United States, based on Census Bureau data from 1990, 2000, and 2010, to develop two measures. Unlike the current measures, which assign each county to a single labor market area (LMA), their approach allowed a county to belong to multiple LMAs through its commuting links and took the number of such LMAs into account.
One way to use these data is to calculate, from commuting flows, the number of distinct LMAs to which a county is connected, allowing for the fact that LMAs may overlap. Membership in more LMAs through commuting flows would provide more diverse economic opportunities and more stability over time. With this approach, a county can be classified by how many different LMAs it belongs to; following a portfolio approach, the hypothesis is that the larger the number of LMAs a county belongs to, the more robust its economy might be. This is the Diversity measure.
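The Diversity idea can be sketched with a deliberately simplified stand-in for overlapping LMAs. The summary does not describe how Goetz and Han delineated LMAs, so this sketch assumes that every county is the core of its own overlapping LMA and that county i counts as a member of county j's LMA when at least 5 percent of i's resident workers commute to j. The cutoff, the function name `diversity`, and the flow matrix are all hypothetical.

```python
def diversity(flows, min_share=0.05):
    """Hypothetical Diversity score: for each county i, count the other
    counties j whose LMA county i is linked to, where i is treated as a
    member of j's LMA if at least `min_share` of i's resident workers
    commute to j. flows[i][j] = workers living in i and working in j.
    """
    scores = []
    for i, row in enumerate(flows):
        total = sum(row)  # all resident workers of county i
        linked = sum(
            1
            for j, f in enumerate(row)
            if j != i and total > 0 and f / total >= min_share
        )
        scores.append(linked)
    return scores


# Invented 4-county commuting matrix (rows: home county, columns: work county).
flows = [
    [900, 60, 30, 10],
    [60, 800, 100, 40],
    [0, 20, 980, 0],
    [100, 100, 100, 700],
]
print(diversity(flows))  # [1, 2, 0, 3]
```

Because a county can clear the cutoff for several destinations at once, the memberships overlap, which is the feature that distinguishes this measure from a one-county-one-LMA assignment.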
The second way to look at labor markets is in terms of proximity to potential jobs, which may offer commuters more economic opportunity. Higher gross payroll in a commuting destination provides access to more potential income (jobs) and a larger scale of economic opportunities. Goetz and Han adopted a gravity model to measure a county’s access to total earnings through commuting to other counties (i.e., total employment weighted by wages). This is referred to as the Proximity measure.
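The summary does not give the functional form of Goetz and Han's gravity model, so the sketch below assumes the common textbook form A_i = sum over j != i of E_j / d_ij^beta, where E_j is total earnings in county j, d_ij is the distance between counties i and j, and beta is a distance-decay exponent set to 2 purely for illustration. The function name `proximity` and all inputs are hypothetical.

```python
def proximity(earnings, distance, beta=2.0):
    """Gravity-style access to earnings in other counties.

    earnings[j]    : total earnings (employment weighted by wages) in county j
    distance[i][j] : distance between counties i and j, in consistent units
    beta           : hypothetical distance-decay exponent
    """
    n = len(earnings)
    return [
        # Earnings elsewhere count for more when they are closer to county i.
        sum(earnings[j] / distance[i][j] ** beta for j in range(n) if j != i)
        for i in range(n)
    ]


earnings = [100.0, 400.0, 50.0]
distance = [[0, 10, 20], [10, 0, 15], [20, 15, 0]]
print([round(v, 3) for v in proximity(earnings, distance)])  # [4.125, 1.222, 2.028]
```

County 0 scores highest here because it sits close to the large earnings pool in county 1, even though its own earnings are modest; its own earnings are excluded because the measure concerns access through commuting to other counties.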
Developing a Network-Based County Code
Goetz said overlapping LMAs can be evaluated using network principles. For example, under the LMA definition currently used, the City of New York is classified as belonging to a single LMA, but in reality it is part of multiple overlapping LMAs. When labor markets overlap, it is possible to consider a particular county’s membership in different LMAs. The more two counties send workers to the same kinds of counties, such as adjacent counties, the more similar they are.
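One common way to formalize the idea that two counties are similar when they send workers to the same destinations is the cosine similarity of their commuting-destination profiles. This is offered as an illustration, not as the measure Goetz and Han used. The sketch assumes that flows to the two counties being compared are dropped, so that their large within-county flows do not dominate; `outflow_similarity` and the matrix are hypothetical.

```python
import math


def outflow_similarity(flows, a, b):
    """Cosine similarity of counties a and b's commuting-destination
    profiles, ignoring flows into a and b themselves.
    flows[i][j] = workers living in i and working in j."""
    cols = [k for k in range(len(flows)) if k not in (a, b)]
    va = [flows[a][k] for k in cols]
    vb = [flows[b][k] for k in cols]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0  # 0.0 if either county has no outflows


# Invented 5-county matrix: counties 3 and 4 are job centers.
flows = [
    [500, 0, 0, 100, 0],  # county 0 sends workers to county 3
    [0, 400, 0, 80, 0],   # county 1 also sends workers to county 3
    [0, 0, 300, 0, 60],   # county 2 sends workers to county 4
    [0, 0, 0, 1000, 0],
    [0, 0, 0, 0, 800],
]
print(outflow_similarity(flows, 0, 1))  # 1.0
print(outflow_similarity(flows, 0, 2))  # 0.0
```

Counties 0 and 1 come out identical in this sense even though they never exchange commuters, because they feed the same destination.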
As a next step in their exploratory analysis, Goetz and Han combined diversity and proximity, two dimensions of economic access to employment, into a single measure that can be viewed as a rural area classification code. They tentatively called it the Network-Based County Code (NBCC). Consistent with network terminology, they used three categories, or types, of counties. The top category, Hub, includes counties connected to at least 12 other LMAs. The middle range, Hybrid, has between 7 and 12 LMAs. Hinterland counties are those with connections to 1 through 7 LMAs. In addition, there are isolated counties, which have small potential earnings in terms of the gravity model and no proximity to other LMAs.
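The category rules above can be written as a small classification function. Two assumptions are needed: the stated ranges share their endpoints (7 and 12), so this sketch assigns a boundary value to the higher class; and because the summary does not say how proximity enters the cutoffs, the sketch uses only the count of linked LMAs, treating a county with no links as Isolated. `nbcc_type` and its default parameters are hypothetical.

```python
def nbcc_type(n_lmas, hub_min=12, hybrid_min=7):
    """Assign a hypothetical NBCC type from the number of other LMAs a county
    is linked to. Boundary values (exactly 7 or 12) go to the higher class,
    which is an assumption; proximity is ignored in this sketch."""
    if n_lmas >= hub_min:
        return "Hub"
    if n_lmas >= hybrid_min:
        return "Hybrid"
    if n_lmas >= 1:
        return "Hinterland"
    return "Isolated"  # no commuting links to other LMAs


print([nbcc_type(n) for n in (0, 3, 7, 12, 20)])
# ['Isolated', 'Hinterland', 'Hybrid', 'Hub', 'Hub']
```

Exposing the cutoffs as parameters makes it easy to test alternative thresholds, a question raised later in the discussion.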
Goetz reiterated that they are using neither population, adjacency, nor metropolitan/nonmetropolitan status to develop their code. Their code is based purely on commuting flows.
Comparative Evaluation of NBCC
Goetz and Han compared the fit of the NBCC on existing outcome measures, as well as on new measures not included in the previous comparison. Goetz said that their general conclusion is that the NBCC is not consistently better on existing outcome measures. It does better for some variables, such as the employment-to-population ratio. The fit generally still declines over time for most variables, including employment, population, and population growth, as well as percentage rural population and percentage farm area. One exception is population density.
Looking at the fit of the NBCC to other outcome measures raises the question of what is being measured. For example, the NBCC does well at predicting social capital and quite well at predicting poverty change. These dynamic measures are perhaps better explained by the NBCC than by the four ERS codes, Goetz said. Looking at economic mobility measures, it turns out that the most remote rural areas do better at allowing children to move up the income ladder. The NBCC also does better than other existing measures in terms of correlation with the teenage birth rate, upward economic mobility, the child poverty rate, and change in child poverty.
Goetz stated that their conclusion is that the existing ERS measures continue to perform well. They could be tweaked, but any tweaking would depend on the outcome measure selected. Their alternative measure, the NBCC, which considers counties’ positions in the network, may offer a better goodness of fit, especially for measures of economic mobility or perhaps for some dynamic measures such as poverty change.
STATEMENT BY MARK SHUCKSMITH
Shucksmith discussed evaluating the validity of rural area classifications via “ground-truthing.” He noted that there is no objective definition of rural, and for some areas current rural area classifications do not make sense. For example, the Grand Canyon is classified as a metropolitan region, and the city of Inverness in Scotland is classified as rural, while the island of Arran is classified as an urban area. Shucksmith explained that issues of scale, boundaries, and data availability give rise to these problems. These issues are especially problematic in a cross-national analysis such as the European Development Opportunities in Rural Areas (EDORA) project. The overarching aim of EDORA was to examine the delineation of rural areas across the 28 countries of the European Union (EU), in order to better understand how EU, national, and regional policy could enable these areas to build upon their specific potentials to achieve “smart, sustainable and inclusive growth” (Copus and Hornstrom, 2012). Developing a rural typology across the EU faced several challenges: each country has its own cultural idea of what rural is and its own definition of rural, and the data and variables available in national datasets vary. In practice, the analysis used a three-dimensional framework rather than a one-dimensional classification. The three dimensions were (1) urban/rural (remote/accessible), (2) economic structure (diversification), and (3) accumulation–depletion (performance). Each dimension had four categories.
Shucksmith summarized the results along the economic structure dimension. The four structure types were intermediate and predominantly rural (agrarian), consumption countryside, diversified with a strong secondary sector, and diversified with a strong market sector. The analysis revealed that the least successful areas were those most dependent on agriculture, while those doing best in terms of economic performance were the consumption countryside and areas where the tertiary sector now dominates. Shucksmith explained that the study used case studies to investigate in more depth the issues facing each structure type and its stability, and to confirm validity.
Shucksmith said the study leads to the question, relevant to this workshop, of whether mixed methods might be helpful, which he illustrated by citing a study of ward-based classification of rural housing markets in England (Shucksmith et al., 2012). This analysis adopted a theory-based approach, deciding a priori that demand, supply, and existing local housing opportunities would be modeled as three axes in a GIS-based principal components analysis. These were used to derive and map a typology of rural housing markets, whose validity was then tested through qualitative case studies undertaken by independent researchers. These case studies confirmed the analysis while also adding depth and understanding of the processes: the different types of areas had very different problems, which required different policy responses.
Shucksmith noted that two fundamental questions, what is rural and what is the purpose of the classification, have come up throughout this workshop. He described a 2004 effort by the Department for Environment, Food and Rural Affairs (DEFRA) to introduce a new rural-urban classification of census output areas (units of about 300 people) based on two axes, settlement size and sparsity of population, each of which is easy for the public to understand. Moreover, using these two dimensions was much more analytically precise, and often revealing or surprising. For example, housing becomes systematically less affordable as settlement size decreases; had a combined “rural” variable been used, this relationship would have been masked by an offsetting sparsity effect. The simple two-dimensional rural classification/definition proved much more helpful in understanding rural areas’ characteristics and in raising questions about underlying processes of change. This rural definition was also useful in many other spheres, he said, for example in many official datasets, in work by the Commission for Rural Communities, and in annual reports titled The State of the Countryside.
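The masking effect Shucksmith described can be reproduced with a toy example. The house-price-to-income ratios below are invented (higher means less affordable) and were chosen so that, at a given level of sparsity, smaller settlements are less affordable, while sparse areas are more affordable. A single rural/urban split then shows no difference at all, whereas grouping on both axes recovers the settlement-size effect. `group_mean` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical output areas: (settlement size, sparsity, price-to-income ratio).
areas = [
    ("town", "less sparse", 6.0),
    ("town", "less sparse", 6.0),
    ("village", "less sparse", 8.0),
    ("village", "sparse", 4.0),
]


def group_mean(rows, key):
    """Average affordability ratio within each group defined by `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    return {k: mean(v) for k, v in groups.items()}


# One-dimensional view: every village is simply "rural".
print(group_mean(areas, lambda r: "rural" if r[0] == "village" else "urban"))
# {'urban': 6.0, 'rural': 6.0}

# Two-dimensional view: settlement size and sparsity kept separate.
print(group_mean(areas, lambda r: (r[0], r[1])))
# {('town', 'less sparse'): 6.0, ('village', 'less sparse'): 8.0, ('village', 'sparse'): 4.0}
```

In the combined view the higher ratio in less sparse villages and the lower ratio in sparse villages cancel out, which is the offsetting effect the two-axis classification avoids.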
Shucksmith asked what should be done if an analysis contradicts ground truths. Does that make the analysis invalid, he asked, or might an analysis that runs counter to received wisdom instead be iconoclastic “myth-busting”? Shucksmith reiterated that the role of research is to challenge and reveal power-infused discourses, not only to confirm them.
He closed with the following thoughts about the validity of rural classifications. First, he suggested using mixed methods to triangulate and enrich analysis. Second, he recommended engaging in knowledge exchange by conducting research with rural communities and respecting their forms of knowledge alongside the expert knowledge of scientists. Third, he suggested exploring all three dimensions of rural space: localities, representations, and everyday lives. Finally, he urged researchers to be critical and reflexive, remembering that all knowledge is power-infused, and to ask who gains from any proposed rural classification scheme.
STATEMENT BY CARLIANNE PATRICK
Evaluating Statistical Systems
Patrick said that a statistical study to evaluate the validity and reliability of classification methods should inform whether the classification system provides a valid and reliable measure or categorization of counties’ urban or rural status. The parameters of evaluation thus depend critically upon what is meant by valid, reliable, urban, and rural, she said. In other words, what is the purpose of the classification system?
She reiterated that the purpose is one of the critical questions to answer before redesigning a classification system. As has been discussed, she noted, it is unlikely that one classification system will serve everyone’s purposes all the time. If the purpose of the classification system is simply to refine and further characterize urban-rural status according to Census definitions of integration with economic agglomerations of various sizes, then the current classification systems do that. If instead it is supposed to be valid or reliable in the sense that it captures a general sense of rurality or urbanization, that is something different, although probably highly correlated with the types and intensity of land use.
If classifications are considered valid and reliable when they describe groups of counties that are similar, at the time of measurement, with respect to the characteristics usually associated with rural or urban, then evaluation can be viewed in one particular way. If instead classifications are considered valid and reliable when they describe groups of counties that behave similarly with respect to changes in population and population characteristics, employment, and economic structure, that might call for a slightly different set of characteristics, she said. It is very possible that places that look the same at one point in time also change in similar ways. But, she said, it is also possible that places that look the same at a particular point in time on one measure, such as population size, behave differently over time because of one of these other factors.
When thinking about an evaluation using statistical modeling, the selection of dependent variables is very important because different variables address different concepts of “validity” and “reliability,” Patrick said. The Goetz and Han study evaluates the correlation between current classification systems and a necessarily small set of potential dependent variables. Some of these are single-point-in-time variables; others are change variables. Some might be thought of as outcomes rather than descriptions of rural and urban status. That leads to the question, she said, of whether these variables represent the things the classification system is intended to explain, and whether they capture the desired degrees of rural and urban. The idea behind using a statistical model is that these variables are a function of the information captured by the classification system. Each dependent variable describes a somewhat different concept of rural and urban.
Alternative Classification Evaluation Methods
Patrick pointed out that Goetz and Han created the NBCC as an alternative classification system and compared its performance against the current classification systems. One strength of their classification, she said, is that it is based on county-networked labor markets, making it a very county-centric rather than metro-centric concept. The measure also incorporates information on proximity to earnings, defined by commuting patterns and distance. She said she was not clear about the construction of the proximity measures; specifically, she asked whether earnings are the appropriate metric.
The NBCC rural and urban categories are based on labor market and proximity thresholds, and she asked how the thresholds were chosen. Evaluation of this type does not necessarily show whether other thresholds could do better or how many categories are optimal. If it is agreed that these are the right dependent variables for describing rural/urban status, then this analysis could be considered a baseline for comparing alternatives, for example, measures constructed with other thresholds in an effort to gain more explanatory power.
She pointed out that other statistical evaluation methods include calculating parametric estimates of current and proposed components of classification systems to determine their relative explanatory power, or doing nonparametric work and letting the data speak. If the purpose can be articulated in terms of what the classification is meant to explain and which variables it should predict, then nonparametric methods such as kernel density estimation or locally weighted regression can be used, she suggested. Alternatively, with the same understanding, one could do a maximum likelihood grid search.
Patrick said that nonparametric techniques could help determine where the natural groupings of different types of places are, the relationships between the variables used to define rural and urban, and how important outcomes change. It would then be possible to create categories based on the thresholds identified by these natural groupings and inflection points. Then, she said, an analysis similar to that of Goetz and Han could test whether these groupings perform better.
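As one illustration of letting the data suggest groupings, a kernel density estimate of a single classification variable can be scanned for valleys (local minima), which are natural candidates for category boundaries. This sketch assumes a Gaussian kernel with a hand-picked bandwidth of 0.3 and uses invented values (think of them as log population density); `gaussian_kde` and `density_valleys` are hypothetical helpers written from scratch, not a particular library's API.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def density_valleys(data, bandwidth, steps=200):
    """Grid points where the estimated density has a local minimum,
    i.e. candidate boundaries between natural groupings."""
    lo, hi = min(data), max(data)
    f = gaussian_kde(data, bandwidth)
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ys = [f(x) for x in xs]
    return [xs[k] for k in range(1, steps) if ys[k] < ys[k - 1] and ys[k] < ys[k + 1]]


# Two invented clusters of counties on one variable.
data = [1.0, 1.2, 1.1, 0.9, 1.3, 4.0, 4.2, 3.9, 4.1, 3.8]
print([round(v, 2) for v in density_valleys(data, 0.3)])  # [2.55]
```

The result depends on the bandwidth: a much smaller bandwidth can create spurious valleys inside a cluster, so in practice the bandwidth choice would itself need justification.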
Another possibility she suggested is to use maximum likelihood grid search methods to identify threshold values. This involves iterating through every candidate threshold value and using the value of the likelihood function to determine which threshold best fits the data. This process can be repeated for any number of groupings. By comparing specifications with different numbers of thresholds, it is possible to see how much additional explanatory power comes from adding categories and to choose the optimal number of groupings. Ground-truthing and qualitative methods could then verify the validity of thresholds identified by nonparametric and grid search methods.
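The grid search can be sketched for a simple step model, assuming that an outcome y has its own mean within each group cut along a classification variable x and that errors are normal, so the variance can be profiled out of the log-likelihood. Every observed x value (except the smallest) is tried as a cut point, and every combination is tried when more than one threshold is requested. `max_loglik` and the data are hypothetical.

```python
import math
from itertools import combinations


def max_loglik(x, y, n_thresholds):
    """Grid search over threshold values on x for a step model in which y has
    its own mean in each of the n_thresholds + 1 groups, with normal errors.
    Returns the maximized log-likelihood and the best cut points."""
    n = len(y)
    # Candidate cuts: each observed x except the smallest, so no group is empty.
    candidates = sorted(set(x))[1:]
    best_ll, best_cuts = -math.inf, ()
    for cuts in combinations(candidates, n_thresholds):
        groups = {}
        for xi, yi in zip(x, y):
            idx = sum(xi >= c for c in cuts)  # how many cuts this x clears
            groups.setdefault(idx, []).append(yi)
        ssr = sum((v - sum(grp) / len(grp)) ** 2 for grp in groups.values() for v in grp)
        ssr = max(ssr, 1e-12)  # guard against log(0) on a perfect fit
        # Normal log-likelihood with the error variance profiled out (SSR / n).
        ll = -0.5 * n * (math.log(2 * math.pi * ssr / n) + 1)
        if ll > best_ll:
            best_ll, best_cuts = ll, cuts
    return best_ll, best_cuts


x = list(range(1, 11))
y = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.1, 2.9, 3.0, 3.2]
for k in range(3):
    ll, cuts = max_loglik(x, y, k)
    print(k, cuts, round(ll, 2))
```

With these invented data the single best cut falls at x = 6. Because the maximized likelihood can never fall when another cut is added, choosing the number of groupings requires weighing the gain against the extra parameters, for example with an information criterion.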
Patrick added that she would expect the outcomes evaluated in the workshop discussion thus far (e.g., population growth, employment-to-population ratios, poverty, social capital outcomes, and other outcomes) to behave similarly, as technology and preferences change, for places characterized by similar population size and density and similar degrees of connection and access. In addition, analysts might consider alternative ways of capturing connection and access. Research suggests that connection and access might be captured by distance to agglomerations of a particular size or density, reflecting access to the different functions and specialties that might be available within those agglomerations. In addition to, or perhaps even instead of, commuting patterns, analysts might consider input/output relationships as one measure of access to these types of goods and services.
David Brown asked Shucksmith to explain co-production of data. Shucksmith responded that he had discussed co-production of knowledge, rather than co-production of data. He explained it as academic experts recognizing the complementary expertise of people living in rural or urban areas. Working together helps the process of producing results, he said.
John Logan commented on the value of a session about ground-truthing and validation, but he echoed Patrick’s observation that validating a classification requires knowing what variable it is to be ground-truthed against. He referred to Douglas O’Brien’s earlier point that the government uses classifications to target resources to places that have needs and lack the capacity to support themselves. He expressed concern about the accurate measurement of needs for all counties and subcounty areas, and about a lack of clarity in the concept of government capacity. There are many sources of data, not necessarily at the national level, for some parts of the country. Rurality may be defined based on size and density because they can be measured, or because people believe size and density reflect some concept of rurality. Logan said that he does not think the important question is whether size and density, connectedness, or other characteristics are highly associated with rurality. He said the question should be whether they are highly associated with need and capacity, so that the government can provide services to people. Or, from his own perspective, the question is: given where people live, what resources are available and what are the opportunity structures? Shucksmith responded that there is a lot of qualitative research on the elements of capacity, but the question is whether data are attached to the research. He cautioned against the danger of going where the data lead regardless of relevance.
Stephan Goetz suggested that measures such as social capital pick up some of these factors. He referred to Linda Lobao’s comment that, while costly, it is possible to get measures of capacity for all counties. He observed that O’Brien had also alluded to the difficulty of showing impact in communities. It is not just a matter of need and a lack of government capacity, Goetz commented; it is also a matter of showing that a new water system or other investment pays for itself.
John Pender commented that it is important to distinguish between a concept of interest and what that concept or construct is thought to affect. He said O’Brien’s hypothesis is that being in a rural area means having less access to needed goods or services. That may be true at one point in time but could change over time; for example, poverty may have been greater in rural areas 50 years ago than it is today.
David Plane said that he liked Goetz and Han’s (2015b) nonmutually exclusive view of commuting patterns. In previous work in New England, he found that a typical New Englander in the 1970 Census lived in the commuting sheds of four to seven major metropolitan areas. He suggested that Goetz might try a different data source, for example the Longitudinal Employer-Household Dynamics (LEHD) database. He noticed from Goetz’s analysis that big counties with larger places in multiple directions around them tended to score high, whereas smaller counties did not have that possibility. He said he wondered what the effect would be if the analysis could be downscaled to individuals. One of his concepts about rurality, he said, is that people who live in rural areas have to go to multiple places to obtain what they need, including work.
Goetz referred to an earlier comment that there may not be many gains in going down to smaller geographies. He said he is intrigued by the prospect of looking at commuting by industry. He said another opportunity is to apply central place theory to commuting or migration flows, and he asked what additional insights might be gained by using some of the network analysis tools applied in central place theory. Rose Olfert observed that she and her colleagues had looked at commuting sheds around places at different levels of the urban hierarchy in the prairies. They found systematically larger commuting sheds around lower-order places.
David Brown said almost all the quantitative analysis discussed at this workshop has used aggregate-level data. For topics such as commuting, one question that microlevel data could address is whether people who move from urban to rural areas keep their jobs in the urban areas. Brown also said that retirement migration is an issue of life course transitions, which he said has been missing from the discussion. He asked how an understanding of microprocesses at the household and individual levels fits into changing macro social structures. Goetz commented that he and Han had not done analysis at the individual level, but they have looked at overlapping commuting and migration networks. He asked about people’s motivations. Individual data would be even better, he said, but he and Han think they can tease some of that out with migration data.
Michael Ratcliffe noted that he and Alan Murray had both mentioned the use of cell phone data as indicators of individuals and their movements. Some navigation companies are making their data available, so it is possible to get data at the individual level.
Lobao pointed to a tension that she felt throughout this workshop. Demographers are focused on settlement processes; like the urban and rural codes, they have historically looked at settlement processes, population flows, and movements across the subnational United States. She noted a secondary, or perhaps even primary, question raised during the workshop: Why are settlement patterns being measured? Is it to measure distress, poverty, or gaps in resources? She suggested that this question be elevated.
As a follow-on, Brown noted that the workshop was motivated by a belief that context matters: where a person lives and works affects his or her life chances and opportunities. What is needed, Brown stated, is to figure out whether it makes a difference where a person lives. Responding to these comments, Richelle Winkler identified two different purposes. One concerns the purpose of a classification scheme, which should be to measure capacity and need. The other is that a measure of density and proximity may help in understanding whether a variable is related to need or capacity, and how that relationship changes over time. She said that the current classification systems are set up to address these two fundamentally different things.