Chapter 3
Data Needed for Modeling

3.1 Introduction

Many data are required for model development, validation, and application. This chapter briefly describes the data used for these functions. Model application data primarily include socioeconomic data and transportation networks. These data form the foundation of the model for an area, and if they do not meet a basic level of accuracy, the model may never adequately forecast travel. When preparing a model, it is wise to devote as much attention as necessary to developing and assuring the quality of input data for both the base year and the forecast years. This chapter provides an overview of primary and secondary data sources and the limitations of typical data.

3.2 Socioeconomic Data and Transportation Analysis Zones

Socioeconomic data include household and employment data for the modeled area and are usually organized into geographic units called transportation analysis zones (TAZs, sometimes called traffic analysis zones or simply zones). Note that some activity-based travel forecasting models operate at a more disaggregate level than the TAZ (for example, the parcel level); however, the vast majority of models still use TAZs. The following discussion of data sources is applicable to any level of model geography.

TAZ boundaries usually follow major roadways, jurisdictional borders, and geographic boundaries, and zones are defined to contain homogeneous land uses to the extent possible. The number and size of TAZs can vary but should generally obey the following rules of thumb when possible:

• The number of residents per TAZ should be greater than 1,200 but less than 3,000;
• Each TAZ should yield less than 15,000 person trips per day; and
• The size of each TAZ should be from one-quarter to one square mile in area.

The TAZ structure in a subarea of particular interest may be denser than in areas farther away. It is important that TAZs are sized and bounded properly (Cambridge Systematics, Inc. and AECOM Consult, 2007).
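The rules of thumb above lend themselves to a simple screening check on a draft zone system. The following is a minimal sketch, not a standard tool: the `Zone` record and its field names are hypothetical, and only the numeric thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the rules of thumb listed in the text.
MIN_RESIDENTS = 1_200
MAX_RESIDENTS = 3_000
MAX_DAILY_PERSON_TRIPS = 15_000
MIN_AREA_SQ_MI = 0.25
MAX_AREA_SQ_MI = 1.0


@dataclass
class Zone:
    """Hypothetical TAZ record; the field names are illustrative only."""
    zone_id: int
    residents: int
    daily_person_trips: int
    area_sq_mi: float


def rule_of_thumb_flags(zone: Zone) -> list:
    """Return a description of each rule of thumb the zone does not meet."""
    flags = []
    if not (MIN_RESIDENTS < zone.residents < MAX_RESIDENTS):
        flags.append("residents outside 1,200 to 3,000")
    if zone.daily_person_trips >= MAX_DAILY_PERSON_TRIPS:
        flags.append("15,000 or more person trips per day")
    if not (MIN_AREA_SQ_MI <= zone.area_sq_mi <= MAX_AREA_SQ_MI):
        flags.append("area outside 0.25 to 1.0 square miles")
    return flags


# Invented example zones: one that meets all three rules, one that meets none.
typical_zone = Zone(zone_id=101, residents=2_000, daily_person_trips=9_000, area_sq_mi=0.5)
oversized_zone = Zone(zone_id=102, residents=5_000, daily_person_trips=20_000, area_sq_mi=2.0)
```

Because the text qualifies these rules with "when possible," a flagged zone is a candidate for review (for example, splitting or merging), not necessarily an error.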
In general, there is a direct relationship between the size and number of zones and the level of detail of the analysis being performed using the model; greater detail requires a larger number of zones, where each zone covers a relatively small land area. TAZs are typically aggregations of U.S. Census geographic units (blocks, block groups, or tracts, with smaller units preferred), which allows the use of census data in model development. To facilitate the use of U.S. Census data at the zonal level, an equivalency table showing which zones correspond with which census units should be constructed. Table 3.1 provides a brief example of such a table.

Once the zone system is developed and mapped and a census equivalency table is constructed, zonal socioeconomic data can be assembled for the transportation planning process. Estimates of socioeconomic data by TAZ are developed for a base year, usually a recent past year for which the necessary model input data are available, and are used in model validation. Forecasts of socioeconomic data for future years must be developed by TAZ and are estimated based on future land use forecasts prepared either using a manual process or with the aid of a land use model. Because socioeconomic forecasts are a key input to the travel demand model, their accuracy greatly affects the accuracy of a travel demand forecast.

3.2.1 Sources for Socioeconomic Data

Data availability and accuracy, the ability to make periodic updates, and whether the data can be reasonably forecast into the future are the primary criteria in determining what data
will be used in a model (see footnote 1). With that consideration and the understanding that in some cases it may be an objective to gather base year data for other planning purposes, the following sources should be evaluated. In general, population and household data come from the U.S. Census Bureau and employment data from the Bureau of Labor Statistics (BLS, part of the United States Department of Labor), as well as their equivalent state and local agencies. Many of the programs are collaborations between the two federal agencies. Socioeconomic input data are also available from a number of private vendors.

Population and Households

Four major data sources for population and household information are described in this subsection: the decennial U.S. Census, the American Community Survey (ACS), ACS Public Use Microdata Samples (PUMS), and local area population data.

Decennial U.S. Census. The decennial census offers the best source for basic population and household data, including age, sex, race, and relationship to head of household for each individual. The census also provides data for housing units (owned or rented). These data are available at the census block level and can be aggregated to traffic zones. The decennial census survey is the only questionnaire sent to every American household with an identifiable address. The 2010 Census is the first since 1940 to exclude the "long form." Previously, approximately one in every six households received the long form, which included additional questions on individual and household demographic characteristics, employment, and journey-to-work. The absence of the long form means that modelers must obtain these data (if available) from other sources, such as the American Community Survey (see below).

American Community Survey. The ACS has replaced the decennial census long form.
Information such as income, education, ethnic origin, vehicle availability, employment status, marital status, disability status, housing value, housing costs, and number of bedrooms may be obtained from the ACS. The ACS content is similar to the Census 2000 long form, and questions related to commuting are about the same as for the long form, but the design and methodology differ. Rather than surveying about 1 in every 6 households once every 10 years, as had been done with the long form, the ACS samples about 1 in every 40 addresses every year, or 250,000 addresses every month. The ACS uses household addresses from the Census Master Address File that covers the entire country each year. The ACS thus samples about 3 million households per year, which translates into a sample of less than 2.5 percent per year. As a result of the smaller sample size, multiple years are required to accumulate sufficient data to permit small area tabulation by the Census Bureau in accordance with its disclosure rules. Table 3.2 highlights the ACS products, including the population and geography thresholds associated with each period of data collection.

The sample size for the ACS, even after 5 years of data collection, is smaller than that of the old census long form. Thus, the ACS 5-year estimates have margins of error about 1.75 times as large as those associated with the 2000 Census long form estimates, and this must be kept in mind when making use of the data. AASHTO and the FHWA offer Internet resources providing additional detail on ACS data and usage considerations.

ACS Public Use Microdata Samples. The Census Bureau produces the ACS PUMS files so that data users can create custom tables that are not available through pretabulated data products (U.S. Census Bureau, 2011a). The ACS PUMS files are a set of untabulated records about individual people or housing units. PUMS files show the full range of population and housing unit responses collected on individual ACS questionnaires.
For example, they show how respondents answered questions on occupation, place of work, etc. The PUMS files contain records for a subsample of ACS housing units and group quarters persons, with information on the characteristics of these housing units and group quarters persons plus the persons in the selected housing units. The Census Bureau produces 1-year, 3-year, and 5-year ACS PUMS files. The number of housing unit records contained in a 1-year PUMS file is about 1 percent of the total in the nation, or approximately 1.3 million housing unit records and about 3 million person records. The 3-year and 5-year ACS PUMS files are multiyear combinations of the 1-year PUMS files with appropriate adjustments to the weights and inflation adjustment factors. They typically cover large geographic areas with a population greater than 100,000, known as Public Use Microdata Areas (PUMAs), and therefore have some limits in application for building a socioeconomic database for travel forecasting, but they can be helpful because of the detail included in each record. PUMS data are often used as seed matrices in population synthesis to support more disaggregate levels of modeling (such as activity-based modeling). PUMS users may also benefit from looking at Integrated PUMS (IPUMS), which makes PUMS data available for time series going back over decades with sophisticated extract tools.

Local area population data. Some local jurisdictions collect and record some type of population data. In many metropolitan areas, the information is used as base data for developing cooperative population forecasts for use by the MPO as travel model input.

Table 3.1. Example TAZ to Census geography equivalency table.

TAZ    Census Block
101    54039329104320
101    54039329104321
101    54039329104322
102    54039329104323
102    54039329104324

Source: Martin and McGuckin (1998).

Footnote 1: The explanatory power of a given variable as it relates to travel behavior must also be considered; however, such consideration is subordinate to the listed criteria. A model estimated using best-fit data that cannot be forecast beyond the base year, for example, provides little long-term value in forecasting.

Employment

Obtaining accurate employment data at the TAZ level is highly desirable but more challenging than obtaining household data for a number of reasons, including the dynamic nature of employment and retail markets; the difficulty of obtaining accurate employee data at the site level; and the lack of an equivalent control data source, such as the U.S. Census, at a small geographic level. Six potential sources of data are discussed in this subsection.

Quarterly Census of Employment and Wages. Previously called ES-202 data, a designation still often used, the Quarterly Census of Employment and Wages (QCEW) provides a quarterly count of employment and wages at the establishment level (company names are withheld due to confidentiality provisions), aggregated to the county level and higher (state, metropolitan statistical area). Data are classified using the North American Industry Classification System (NAICS). The QCEW is one of the best federal sources for at-work employment information.
State employment commissions. State employment commissions generally document all employees for tax purposes. Each employer is identified by a federal identification number, number of employees, and a geocodable address usually keyed to where the payroll is prepared for the specified number of employees.

Current Population Survey. The Current Population Survey (CPS) is a national monthly survey of about 50,000 households to collect information about the labor force. It is a joint project of the Census Bureau and the BLS. The CPS may be useful as a comparison between a local area's labor force characteristics and national figures.

Market research listings. Many business research firms (e.g., Infogroup and Dun and Bradstreet) sell listings of all (or major) employers and number of employees by county and city. These listings show business locations by street addresses, as well as post office boxes.

Longitudinal Employer–Household Dynamics. Longitudinal Employer–Household Dynamics (LEHD) (U.S. Census Bureau, 2011b) is a program within the U.S. Census Bureau that uses statistical and computational techniques to combine federal and state administrative data on employers and employees with core Census Bureau censuses and surveys. LEHD excludes some employment categories, including self-employed and federal workers, and data are not generated for all states (as of September 2011, the exceptions were Connecticut, Massachusetts, and New Hampshire, as well as the District of Columbia, Puerto Rico, and the U.S. Virgin Islands). Users of LEHD should also be mindful of limitations with the methodology used to assemble the data, including the use of Minnesota data as the basis for matching workers to workplace establishments and the match (or lack of match) with Census Transportation Planning Products (discussed below). Murakami (2007) provides an examination and discussion of LEHD issues for transportation planners. The LEHD Quarterly Workforce Indicators (QWI) report is a useful source for modelers, particularly as a complement to the QCEW.

Local area employment data. Few areas record employment data other than a broad listing of major employers with the highest number of employees locally, typically reported by a local chamber of commerce or similar organization.

Table 3.2. ACS data releases.

                                                                        Years Covered, by Planned Year of Release
Data Product      Population  Geographic Threshold                      2010       2011       2012       2013
                  Threshold
1-year estimates  65,000+     PUMAs, counties, large cities             2009       2010       2011       2012
3-year estimates  20,000+     Counties, large cities                    2007–2009  2008–2010  2009–2011  2010–2012
5-year estimates  All areas*  Census tracts, block groups in            2005–2009  2006–2010  2007–2011  2008–2012
                              summary file format

*5-year estimates will be available for areas as small as census tracts and block groups.
Source: U.S. Census Bureau.

Special Sources

Census Transportation Planning Products. Previously called the Census Transportation Planning Package, the Census Transportation Planning Products (CTPP) Program (AASHTO, 2011) is an AASHTO-sponsored data program funded by member state transportation agencies and operated with support from the FHWA, Research and Innovative Technology Administration, FTA, U.S. Census Bureau, MPOs, state departments of transportation (DOTs), and the TRB. CTPP includes tabulations of interest to the transportation community for workers by place of residence, by place of work, and for flows between place of residence and place of work. CTPP are the only ACS tabulations that include flow information. Examples of special dimensions of tabulation include travel mode, travel time, and time of departure. CTPP are most frequently used as an observed data source for comparison during model validation but are sometimes used as a primary input in model development, particularly in small areas where local survey data are unavailable.

The previous CTPP tabulations were based on the decennial census long form. The CTPP 2006 to 2008 is based on the ACS and is available at the county or place level for geography meeting a population threshold of 20,000. The CTPP 2006 to 2010, anticipated to be available in 2013, will provide data at the census tract, CTPP TAZ, and CTPP Transportation Analysis District (TAD) levels. ACS margin of error considerations apply to the CTPP.

Aerial photography.
Often, aerial or satellite photographs available at several locations on the Internet can be used to update existing land use, which can then be used as a cross-check in small areas to ensure that population and employment data take changes in land use into account. It is crucial to know the date of the imagery (when the pictures were taken) before using it for land use updates. Aerial photography is also useful in network checking, as discussed later.

Other commercial directories. Some commercial directories provide comprehensive lists of household and employment data sorted by name and address. For households, such information as occupation and employer can be ascertained from these sources. For business establishments, the type of business (including associations, libraries, and organizations that may not be on the tax file) can be determined. Other commercial databases provide existing and forecasted households and employment by political jurisdiction.

Other sources. Data on school types, locations, and enrollment are typically obtained directly from school districts and state departments of education (DOE). Large private schools might have to be contacted directly to obtain this information if the state DOE does not maintain records for such schools.

3.2.2 Data Source Limitations

Population

The main data source for establishing a residential database is the decennial census. Other sources do not provide comparable population statistics by specific area (i.e., block level). Often, the base year for modeling does not coincide with a decennial census year. In that case, data from the decennial census should be used as the starting point and updated with available data from the census and other sources to reflect the difference between the decennial census year and the base year.
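Starting a residential database from block-level census counts relies on the TAZ-to-census-block equivalency illustrated in Table 3.1. The following is a minimal sketch of that aggregation step. The block identifiers and zone numbers are those of Table 3.1; the population values, the extra unmatched block, and the function name are invented for illustration.

```python
from collections import defaultdict

# TAZ-to-census-block equivalency, copied from Table 3.1.
BLOCK_TO_TAZ = {
    "54039329104320": 101,
    "54039329104321": 101,
    "54039329104322": 101,
    "54039329104323": 102,
    "54039329104324": 102,
}

# Hypothetical block populations (not real census values).
BLOCK_POPULATION = {
    "54039329104320": 410,
    "54039329104321": 385,
    "54039329104322": 520,
    "54039329104323": 640,
    "54039329104324": 275,
    "54039329104399": 50,  # deliberately absent from the equivalency table
}


def aggregate_to_taz(block_values, block_to_taz):
    """Sum block-level values by TAZ and list blocks with no zone assignment."""
    totals = defaultdict(int)
    unmatched = []
    for block_id, value in block_values.items():
        taz = block_to_taz.get(block_id)
        if taz is None:
            unmatched.append(block_id)
        else:
            totals[taz] += value
    return dict(totals), unmatched


taz_population, unmatched_blocks = aggregate_to_taz(BLOCK_POPULATION, BLOCK_TO_TAZ)
```

Reporting unmatched blocks separately, instead of silently dropping them, is a design choice made here so that gaps in the equivalency table surface before the zonal totals are compared against control totals.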
Employment

Each of the previously identified data sources has some deficiency in accurately specifying employment for small geographic areas:

• The census provides total labor force by TAZ; however, this represents only the employment of residents and not total employment located in the zone.
• The census also shows labor force statistics by industry group but does not compile this by employer and specific geographic area (i.e., block).
• The CTPP counts employed persons, not jobs. For persons with more than one job, characteristics of only the principal job are collected.
• Considerations regarding margin of error apply to use of CTPP or ACS data (or any data, for that matter).
• The employment commission data may provide accurate employment for each business but only partially list street addresses.
• Market research listings have all employers by street address. Although these listings are extensive, their accuracy is controlled internally and they often cannot be considered comprehensive (because of the lack of information regarding collection methodology), but they offer a check for other data sources.
• The land use data obtained from aerial photography provide a geographic location of businesses but do not provide numbers of employees.
• Employment commission data (as well as other data on employers) often record a single address or post office box of record; employee data from multiple physical locations may be aggregated when reported (i.e., the headquarters of a firm may be listed with the total employment combined for all establishments).
• Government employment is not included in some data sources (including market research listings) or is included incompletely. Government employment sites are often either double-counted in commercially available data sources or "lumped" (i.e., multiple sites reported at one address). For example, public school employees are not always assigned to the correct schools.

Employment data are the most difficult data component to collect. None of the data sources alone offers a complete inventory of employment by geographic location. Therefore, the methodology for developing the employment database should be based on the most efficient and accurate method by which employment can be collected and organized into the database file. All data must be related to specific physical locations by geocoding. Planning for supplementary local data collection remains the best option for addressing deficiencies in source data on employment; however, this effort must be planned several years in advance to ensure that resources can be made available for survey development, administration, and data analysis.

For all sources of socioeconomic data, users must be aware of disclosure-avoidance techniques applied by the issuing agency and their potential impact on the use of the data in model development.

3.2.3 Base and Forecast Year Control Totals for the Database

The control totals for the database should be determined before compilation of the data.
The source of the control totals for population should be the decennial census. Control totals for employment at the workplace location are more difficult to establish; however, the best source is usually the QCEW or state employment commission data.

When the most recent census data are several years old, it may be desirable to have a more recent base year for the model, especially in faster growing areas. This means that some data may not be available at the desired level of detail or segmentation; for example, the number of households for a more recent year may be available, but not the segmentation by income level. Analysts often use detailed information from the most recent year for which it is available to update segmentations, such as applying percentages of households by segment from the census year to the total number of households for a more recent year. In some cases, estimates of totals (for example, employment by type) may not be available at all for the base year. Other data sources, such as building permits, may be used to produce estimates for more recent years, building upon the known information for previous years.

Census data are, of course, unavailable for forecast years. Some of the agencies discussed above, as well as state agencies, counties, and MPOs, produce population, housing, and employment forecasts. Such forecasts are often for geographic subdivisions larger than TAZs, and other types of segmentation may also be more aggregate than in data for past years. This often means that analysts must disaggregate data for use as model inputs. Data are typically disaggregated using segmentation from the base year data, often updated with information about land use plans and planned and proposed future developments.

3.3 Network Data

The estimation of travel demand requires an accurate representation of the transportation system serving the region. The most direct method is to develop networks of the system elements.
All models include a highway network; models that include transit elements and mode choice must also include a transit network. Sometimes, a model includes a bicycling or a walk network. Accurate transportation model calibration and validation require that the transportation networks represent the same year as the land use data used to estimate travel demand.

3.3.1 Highway Networks

The highway network defines the road system in a manner that can be read, stored, and manipulated by travel demand forecasting software. Highway networks are developed to be consistent with the TAZ system. Therefore, network coding is finer for developed areas containing small zones and coarser for less-developed areas containing larger zones. The types of analyses for which the model will be used determine the level of detail required. A rule of thumb is to code in roads one level below the level of interest for the study. One highway network may be used to represent the entire day, but it may be desirable to have networks for different periods of the day that include operational changes, such as reversible lanes or peak-period HOV lanes. Multiple-period networks can be stored in a single master network file that includes period or alternative-specific configurations for activation and deactivation.

Each TAZ has a centroid, which is a point on the model network that represents all travel origins and destinations in a zone. Zone centroids should be located at the center of activity of the zone (not necessarily coincident with the geographic center), using land use maps, aerial photographs, and local knowledge. Each centroid serves as a loading point to the highway and transit systems and, therefore, must be connected to the model network.

Sources for Network Data

Digital street files are available from the Census Bureau (TIGER/Line files), other public sources, or several commercial vendors and local GIS departments. Selecting the links for the coded highway network requires the official functional classification of the roadways within the region, the average traffic volumes, street capacities, TAZ boundaries, and a general knowledge of the area. Other sources for network development include the FHWA National Highway Planning Network, Highway Performance Monitoring System (HPMS), Freight Analysis Framework Version 3 (FAF3) Highway Network, National Transportation Atlas Database, and various state transportation networks. All of these resources may be useful as starting points for development or update of a model network. However, each has limitations in terms of cartographic quality; available network attributes; source year; and, especially with commercial sources, copyrights, all of which should be considered when selecting a data source to use.

In states where the state DOT has a database with the roadway systems already coded, the use of the DOT's coded network can speed up the network coding process. Questions can be directed to the DOT, and such a working relationship between DOT and MPO helps the modeling process because both parties understand the network data source.
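The centroid placement guidance earlier in this section (center of activity, not geographic center) can be illustrated with a weighted mean of activity locations. This is only a sketch under stated assumptions: the text calls for land use maps, aerial photographs, and local knowledge, so a computed point like this would at most be a starting estimate. The function name, the sample coordinates, and the choice of households plus jobs as the weight are all hypothetical.

```python
def activity_centroid(points):
    """Weighted mean location of activity points.

    points: iterable of (x, y, weight) tuples, where weight is an assumed
    measure of activity such as households plus jobs at that location.
    """
    points = list(points)
    total_weight = sum(weight for _, _, weight in points)
    if total_weight <= 0:
        raise ValueError("total activity weight must be positive")
    x = sum(px * weight for px, _, weight in points) / total_weight
    y = sum(py * weight for _, py, weight in points) / total_weight
    return (x, y)


# Invented example: two activity clusters in one zone, the eastern one three
# times as large, which pulls the centroid east of the unweighted midpoint.
sample_points = [(0.0, 0.0, 100), (4.0, 0.0, 300)]
weighted_center = activity_centroid(sample_points)
unweighted_center = activity_centroid((x, y, 1) for x, y, _ in sample_points)
```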
Highway Network Attributes

Highway links are assigned attributes representing the level of service afforded by the segment and associated intersections. Link distance based on the true shape of the roadway (including curvature and terrain), travel time, speed, link capacity, and any delays that will impact travel time must be assigned to the link. Characteristics such as the effect of traffic signals on free-flow travel time should be considered (see Parsons Brinckerhoff Quade & Douglas, 1992). Three basic items needed by a transportation model to determine impedance for the appropriate assignment of trips to the network are distance, speed, and capacity. Additional desirable items may include facility type and area type.

Facility Type and Area Type

The link attributes facility type and area type are used by many agencies to determine the free-flow speed and per-lane hourly capacity of each link, often via a two-dimensional look-up table.

Area type refers to a method of classifying zones by a rough measure of land use intensity, primarily based on population and employment density. A higher intensity of land use generally means more intersections, driveways, traffic signals, turning movements, and pedestrians, and, therefore, slower speeds. Sometimes, roadway link speeds and capacities are adjusted slightly based on the area type where they are located. Common area type codes include central business district (CBD), CBD fringe, outlying business district, urban, suburban, exurban, and rural. The definition of what is included in each area type is somewhat arbitrary since each study area is structured differently. In some models, area type values are assigned during the network building process on the basis of employment and population density of the TAZ centroid that is nearest to the link (Milone et al., 2008). Note that, since area type definitions are aggregate and "lumpy," their use in models may result in undesirable boundary effects.
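A density-based area type assignment, and the boundary effect it can produce, can be sketched as follows. The break points and the density measure (population plus employment per acre) are hypothetical, since the text notes that each study area defines its own classes; the "outlying business district" class is omitted because it is not usually defined by density alone (an assumption of this sketch).

```python
# Hypothetical lower bounds on density (persons plus jobs per acre) for each
# area type, ordered from most to least intense. Not taken from any model.
AREA_TYPE_BREAKS = [
    (50.0, "CBD"),
    (20.0, "CBD fringe"),
    (10.0, "urban"),
    (3.0, "suburban"),
    (0.5, "exurban"),
]


def area_type(population, employment, acres):
    """Classify a zone by combined population and employment density."""
    if acres <= 0:
        raise ValueError("zone area must be positive")
    density = (population + employment) / acres
    for lower_bound, label in AREA_TYPE_BREAKS:
        if density >= lower_bound:
            return label
    return "rural"


# Two invented 100-acre zones that differ by only ten residents fall on
# opposite sides of the 10.0 break and receive different area types.
zone_a = area_type(population=990, employment=0, acres=100)
zone_b = area_type(population=1_000, employment=0, acres=100)
```

Any speed or capacity default keyed to these classes would therefore differ between the two nearly identical zones, which is the kind of boundary effect the text cautions about.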
In many cases, use of continuous variables will be superior to use of aggregate groupings of zone types.

Facility type is a designation of the function of each link and is a surrogate for some of the characteristics that determine the free-flow capacity and speed of a link. Facility type may be different from functional classification, which relates more to ownership and maintenance responsibility of different roadways. Table 3.3 provides common facility types used by some modeling agencies. Features such as HOV lanes, tolled lanes, and reversible lanes are usually noted in network coding to permit proper handling but may not be facility types per se for the purposes of typical speed/capacity look-up tables.

Link Speeds

Link speeds are a major input to various model components. The highway assignment process relates travel times and speeds on links to their volume and capacity. This process requires what are commonly referred to as "free-flow" speeds. Free-flow speed is the mean speed of passenger cars measured during low to moderate flows (up to 1,300 passenger cars per hour per lane). Free-flow link speeds vary because of numerous factors, including:

• Posted speed limits;
• Adjacent land use activity and its access control;
• Lane and shoulder widths;
• Number of lanes;
• Median type;
• Provision of on-street parking;
• Frequency of driveway access; and
• Type, spacing, and coordination of intersection controls.

Transportation models can use any of several approaches to simulate appropriate speeds for the links included in the network. Speeds should take into account side friction along the road, such as driveways, and the effect of delays at traffic signals. One way to determine the free-flow speed is to conduct travel time studies along roadways included in the network during a period when traffic volumes are low and little if any delay exists. This allows the coding of the initial speeds based on observed running speeds on each facility. Speed data are also available from various commercial providers (e.g., Inrix), and in some jurisdictions, speed information on certain facilities is collected at a subsecond level. An alternative approach is to use a free-flow speed look-up table. Such a table lists default speeds by area and facility type, which are discussed later.

Although regional travel demand forecasting validation generally focuses on volume and trip length-related measures, there is often a desire to look at loaded link speeds and travel times. The analyst should be cognizant that "model time" may differ from real-world time due to the many network simplifications present in the modeled world, among other reasons. Looking at changes in time and speed can be informative (e.g., by what percentage are speeds reduced or travel times increased). When looking at such information for the validation year, a variety of sources may be available for comparative purposes, including probe vehicle travel time studies, GPS data collection, and commercial data.

Link Capacity

In its most general sense, capacity is used here as a measure of vehicles moving past a fixed point on a roadway in a defined period of time; for example, 1,800 vehicles per lane per hour.
In practice, models do not uniformly define capacity. Some models consider capacity to be applied during free-flow, uncongested travel conditions, while others use mathematical formulas and look-up tables based on historical research on speed-flow relationships [e.g., Bureau of Public Roads (BPR) curves and other sources] in varying levels of congestion on different types of physical facilities. Throughout this report, the authors have tried to specify what is meant for each use of “capacity.”

The definitive reference for defining highway capacity is the Highway Capacity Manual (Transportation Research Board, 2010), most recently updated in 2010. “Capacity” in a traffic engineering sense is not necessarily the same as the capacity variable used in travel demand model networks. In early travel models, the capacity variable used in such volume-delay functions as the BPR formula represented the volume at Level of Service (LOS) C; whereas, in traffic engineering, the term “capacity” traditionally referred to the volume at LOS E. The Highway Capacity Manual does contain useful information for the computation of roadway capacity, although many of the factors that affect capacity, as discussed in the manual, are not available in most model highway networks.

Table 3.3. Typical facility type definitions.

Facility Type | Definition | Link Characteristics
Centroid Connectors | Links that connect zones to a network that represent local streets or groups of streets. | High capacity and low speed
Freeways | Grade-separated, high-speed, high-capacity links. Freeways have limited access with entrance and exit ramps. | Top speed and capacity
Expressways | Links representing roadways with very few stop signals serving major traffic movements (high speed, high volume) for travel between major points. | Higher speed and capacity than arterials, but lower than freeways
Major Arterials | Links representing roadways with traffic signals serving major traffic movements (high speed, high volume) for travel between major points. | Lower speed and capacity than freeways and expressways, but more than other facility types
Minor Arterials | Links representing roadways with traffic signals serving local traffic movements for travel between major arterials or nearby points. | Moderate speed and capacity
Collectors | Links representing roadways that provide direct access to neighborhoods and arterials. | Low speed and capacity
Ramps | Links representing connections to freeways and expressways from other roads. | Speeds and capacity between a freeway and a major arterial
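The BPR formula cited above is commonly written as t = t0 × [1 + α(v/c)^β], where t0 is the free-flow travel time, v is the link volume, and c is the capacity variable. The sketch below is illustrative only: the coefficients α = 0.15 and β = 4 are the widely quoted textbook defaults rather than values recommended by this report, and the function name and sample link are hypothetical. Because the result depends on v/c, the same coefficients produce different delays depending on whether c represents the LOS C or the LOS E volume.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Congested link travel time from the BPR volume-delay function.

    alpha and beta are the commonly quoted defaults (an assumption here);
    individual models often recalibrate them by facility type.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


# Hypothetical link: 2.0 minutes at free flow, capacity of 1,800 vehicles per hour.
for volume in (0, 900, 1800, 2700):
    minutes = bpr_travel_time(2.0, volume, 1800.0)
    print(f"volume={volume:>5}  travel time={minutes:.3f} min")
```

At zero volume the function returns the free-flow time, and at v/c = 1 it returns the free-flow time increased by the factor (1 + α).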
Link capacities are a function of the number of lanes on a link; however, lane capacities can also be specified by facility and area type combinations. Several factors are typically used to account for the variation in per-lane capacity in a highway network, including:

• Lane and shoulder widths;
• Peak-hour factors;
• Transit stops;
• Percentage of trucks²;
• Median treatments (raised, two-way left turn, absent, etc.);
• Access control;
• Type of intersection control;
• Provision of turning lanes at intersections and the amount of turning traffic; and
• Signal timing and phasing at signalized intersections.

Some models use area type and facility type to define per-lane default capacities and default speed. The number of lanes should also be checked using field verification or aerial or satellite imagery to ensure accuracy.

Some networks combine link capacity and node capacity to better define the characteristics of a link (Kurth et al., 1996). This approach allows for a more refined definition of capacity and speed by direction on each link based on the characteristics of the intersection being approached. Such a methodology allows better definition of traffic control and grade separation at an intersection.

Typical Highway Network Database Attributes

The following highway network attributes are typically included in modeling databases:

• Node identifiers, usually numeric, and their associated x-y coordinates;
• Link identifiers, either numeric, defined by “A” and “B” nodes, or both;
• Locational information (e.g., zone, cutline, or screenline location);
• Link length/distance;
• Functional classification/facility type, including the divided or undivided status of the link’s cross section;
• Number of lanes;
• Uncongested (free-flow) speed;
• Capacity;
• Controlled or uncontrolled access indicator;
• One-way versus two-way status;
• Area type; and
• Traffic count volume (where available).
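One possible way to organize per-lane defaults by area type and facility type is a simple look-up table, with link capacity obtained by multiplying the per-lane value by the number of lanes. The following is a minimal sketch; the category names and all numeric values are placeholders chosen for illustration, not recommended defaults.

```python
# Hypothetical per-lane defaults keyed by (area type, facility type).
# Each entry is (capacity in vehicles per lane per hour, free-flow speed in mph).
# The numbers are placeholders, not values taken from this report.
DEFAULTS = {
    ("urban", "freeway"): (1800, 60),
    ("urban", "major_arterial"): (800, 35),
    ("rural", "freeway"): (1900, 65),
    ("rural", "major_arterial"): (900, 45),
}


def link_defaults(area_type, facility_type, lanes):
    """Return default link capacity and free-flow speed for one direction of a link."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    per_lane_capacity, speed = DEFAULTS[(area_type, facility_type)]
    return {"capacity": per_lane_capacity * lanes, "free_flow_speed": speed}


print(link_defaults("urban", "freeway", 3))
```

A table of this kind keeps the defaults in one place, so a change to an area type or facility type definition propagates to every link that uses it.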
3.3.2 Transit Networks

Most of the transit network represents transit routes using the highways, so the highway network should be complete before coding transit. Transit network coding can be complex. Several different modes (e.g., express bus, local bus, light rail, heavy rail, commuter rail, bus rapid transit) may exist in an area; and each should have its own attribute code. Peak and off-peak transit service likely have different service characteristics, including headways, speeds, and possibly fares; therefore, separate peak and off-peak networks are usually developed. The transit networks are developed to be consistent with the appropriate highway networks and may share node and link definitions.

Table 3.4 is a compilation of transit network characteristics that may be coded into a model’s transit network. Characteristics in italics, such as headway, must be included in all networks, while the remaining characteristics, such as transfer penalty, may be needed to better represent the system in some situations. Transit networks representing weekday operations in the peak and off-peak periods are usually required for transit modeling; sometimes, separate networks may be required for the morning and afternoon peak periods, as well as the mid-day and night off-peak periods.

The development of bus and rail networks begins with the compilation of transit service data from all service providers in the modeled area. Transit networks should be coded for a typical weekday situation, usually represented by service provided in the fall or spring of the year. Two types of data are needed to model transit service: schedule and spatial (the path each route takes). Although the data provided by transit operators will likely contain more detail than needed for coding a transit network, software can be used to calculate, for each route, the average headway and average run time during the periods for which networks are created.
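As an illustration of the kind of calculation such software performs, the sketch below summarizes one route's schedule for one time period. It rests on several assumptions that are not prescribed by this report: times are expressed as minutes after midnight, a trip belongs to the period in which it departs, and average headway is taken as the period length divided by the number of departures. The sample schedule is invented.

```python
from statistics import mean


def summarize_route(trips, period_start, period_end):
    """Average headway and run time (minutes) for one route, direction, and period.

    trips is a list of (departure_minute, arrival_minute) pairs.
    Returns None when no trip departs within the period.
    """
    in_period = [(dep, arr) for dep, arr in trips if period_start <= dep < period_end]
    if not in_period:
        return None
    return {
        "trips": len(in_period),
        # Assumed definition: period length divided by number of departures.
        "avg_headway": (period_end - period_start) / len(in_period),
        "avg_run_time": mean(arr - dep for dep, arr in in_period),
    }


# Invented schedule for a hypothetical route; the last trip departs after the period.
AM_TRIPS = [(360, 400), (375, 417), (390, 435), (420, 464), (450, 491), (480, 520), (560, 600)]
print(summarize_route(AM_TRIPS, 360, 540))  # period from 6:00 to 9:00
```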
² Facilities experiencing greater-than-typical truck traffic (say, greater than 5 percent for urban facilities; greater than 10 percent for nonurban facilities) have an effective reduction in capacity available for passenger cars (i.e., trucks reduce the capacity available by their passenger car equivalent value; a simplified value of 2 is often used). Trucks in this context are vehicles F5 or above on the FHWA classification scheme, the standard Highway Capacity Manual definition.

Transit Line Files

Local bus line files are established “over” the highway network. Sometimes nodes and links, which are coded below the grain of the TAZ system, must be added to the highway network so that the proximity of transit service to zonal activity centers can be more accurately represented. These subzonal highway links, which are used to more accurately reflect transit route alignments, should be disallowed from use during normal highway path-building and highway assignments. Local bus stops are traditionally coded at highway node locations.

Transit line files can be designated for different types of service or different operators using mode codes, which designate a specific provider (or provider group) or type of service. Premium transit lines that operate in their own right-of-way are coded with their own link and node systems rather than on top of the highway network. Some modeling software requires highway links for all transit links, thus necessitating the coding of “transit only” links in the highway network. The modeler may not be provided with detailed characteristics for transit services that do not already exist in the modeled area and may need guidance with regard to what attribute values should be coded for these new services (FTA, 1992). Each transit line can be coded uniquely and independently so that different operating characteristics by transit line can be designated.

Transit line files contain information about transit lines, such as the headway, run time, and itinerary (i.e., the sequence of nodes taken by the transit vehicle as it travels its route). Some models compute the transit speed as a function of underlying highway speed instead of using a coded run time. Line files are time-of-day specific, so there is a set of line files for each time period for which a network is coded. One can usually designate stops as board-only or alight-only (useful for accurately coding express bus service). Similarly, one can code run times for subsections of a route, not just for the entire route, a feature useful for the accurate depiction of transit lines that undergo extensions or cutbacks, or which travel through areas with different levels of congestion.
One can also store route-specific comments (such as route origin, route destination, and notes) in line files.

Table 3.4. Transit network characteristics and definitions.

Transit Network Characteristic | Description
Drive access link | A link that connects TAZs to a transit network via auto access to a park-and-ride or kiss-and-ride location.
Effective headway* | The time between successive transit vehicles on multiple routes with some or all stops in common.
Headway | The time between successive arrivals (or departures) of transit vehicles on a given route.
Local transit service | Transit service with frequent stops within a shared right-of-way with other motorized vehicles.
Mode number | Code to distinguish local bus routes from express bus, rail, etc.
Park-and-ride-to-stop link | A walk link between a park-and-ride lot and a bus stop, which is used to capture out-of-vehicle time associated with auto access trips, and also for application of penalties associated with transfers.
Premium transit service | Transit service (e.g., bus rapid transit, light rail transit, heavy rail, commuter rail) with long distances between infrequent stops that may use exclusive right-of-way and travel at speeds much higher than local service.
Route description | Route name and number/letter.
Run time | The time in minutes that the transit vehicle takes to go from the start to the finish of its route and a measure of the average speed of the vehicle on that route.
Transfer link | A link used to represent the connection between stops on two transit lines that estimates the out-of-vehicle time associated with transfers, and also for application of penalties associated with transfers.
Transfer penalty | Transit riders generally would rather have a longer total trip without transfers than a shorter trip that includes transferring from one vehicle to another; therefore, a penalty is often imposed on transfers to discourage excess transfers during the path-building process.
Walk access link | A link that connects TAZs to a transit network by walking from a zone to bus, ferry, or rail service; usually no longer than one-third mile for local service and one-half mile for premium service (some modeling software distinguishes access separately from egress).
Walking link | A link used exclusively for walking from one location to another. These links are used in dense areas with small TAZs to allow trips to walk between locations rather than take short transit trips.
*Italics indicate characteristics that must be included in all networks.

Access Links

It is assumed that travelers access the transit system by either walking or driving. Zone centroids are connected to the transit system via a series of walk access and auto access paths. In the past, modeling software required that walk access and auto access links be coded connecting each zone centroid to the transit stops within walking or driving distance. These separate access links are still seen, particularly in models that have been converted from older modeling software packages. Current modeling software generally allows walk or auto access paths to be built using the highway network links, including, where appropriate, auxiliary links that are not available to vehicular traffic (such as walking or bicycle paths).

Walk paths are coded to transit service that is within walking distance of a zone to allow access to and egress from transit service. The maximum walking distance may vary depending on urban area, with larger urban areas usually having longer maximum walk distances, although generalizations about typical values could be misleading. The best source for determining maximum walk distances is an on-board survey of transit riders. Some models may classify “short” and “long” walk distances.

Auto access paths are used to connect zones with park-and-ride facilities or train stations. Auto access paths are coded for zones that are not within walking distance (as classified by that model) of transit service but are deemed to be used by transit riders from a zone. A rule-based approach (for example, maximum distance between the zone centroid and the stop) is often used to determine which zones will have auto access to which stops. Again, the best source for determining which zones should have auto access is an on-board survey of transit riders.

Travel Times and Fares

The time spent on transit trips (including time spent riding on transit vehicles, walking or driving to and from transit stops, transferring between transit lines, and waiting for vehicles) must be computed. This computation is done by skimming the transit networks for each required variable (for example, in-vehicle time, wait time, etc.).
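Wait time is one of the variables typically skimmed. A common convention is to take one-half of the headway for frequent service and to cap the result for infrequent service, where riders are assumed to time their arrival to the schedule. A minimal sketch follows; the function name is illustrative and the 15-minute cap is a hypothetical value, not one taken from this report.

```python
def average_wait_time(headway, max_wait=15.0):
    """Average initial wait (minutes) implied by a route headway (minutes).

    Half the headway is used for frequent service; max_wait (a hypothetical
    cap) limits the wait assumed for infrequent service.
    """
    if headway <= 0:
        raise ValueError("headway must be positive")
    return min(headway / 2.0, max_wait)


for headway in (10, 30, 60):
    print(f"headway={headway} min  wait={average_wait_time(headway)} min")
```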
In-vehicle times are generally computed from the network links representing transit line segments, with speeds on links shared with highway traffic sometimes computed as a function of the underlying (congested) highway speed. Wait times are usually computed from headways: one-half of the headway represents the average wait time for frequent service, while a maximum wait time is often used for infrequent service, where travelers know the schedules and arrange their arrival times at stops accordingly. Auto access/egress times are often computed from highway networks. Walk access/egress times are sometimes computed assuming average speeds applied to distances from the highway networks.

Transit fares used in the mode choice process must be computed. The process may need to produce multiple fare matrices representing the fare for different peak and off-peak conditions. This can be done in multiple ways. If the fare system is distance based, then transit fares can be calculated by the modeling software by skimming the fare over the shortest path just as the time was skimmed. Systems that use one fare for all trips in the study area can assign a fare to every trip using transit. More complex systems with multiple fare tariffs will require unique approaches that may be a combination of the previous two or require the use of special algorithms. Some transit systems require transfer fares that are applied whenever a rider switches lines or from one type of service to another.

3.3.3 Updating Highway and Transit Networks

Transportation networks change over time and must be coded to represent not only current conditions for the base year, but also forecasting scenarios so that models can be used to forecast the impact of proposed changes to the highway network. Socioeconomic data and forecasts must also be updated, and these can affect network attributes (for example, area type definitions that depend on population and employment density).
It is good transportation planning practice to have a relatively up-to-date base year for modeling, particularly when there are major changes to the supply of transportation facilities and/or newer socioeconomic data available. Many of the same data sources, such as digitized street files, aerial photographs, and state and local road inventories, can be used to update the network to a new base year. A region’s Transportation Improvement Program (TIP) and state and local capital improvement programs (CIPs) are also very useful for updating a network representing an earlier year to a more recent year. Traffic volumes and transit ridership coded in the network should also be updated for the new base year.

Most MPOs and many local governments use models to evaluate short- and long-range transportation plans to determine the effect of changes to transportation facilities in concert with changes in population and employment and urban structure on mobility and environmental conditions in an area. Updating the transportation network to a future year requires some of the same data sources, as well as additional ones. In addition to TIPs and CIPs, master plans, long-range transportation plans, comprehensive plans, and other planning documents may serve as the source of network updates.

3.3.4 Network Data Quality Assurance

Regardless of the sources, network data should be checked using field verification or an overlay of high-resolution aerials or satellite imagery.
Visual inspection cannot be used to verify certain link characteristics, such as speed and traffic volume, which may often be verified using databases and GIS files available from state DOTs or other agencies.

One approach used to verify coded distances is to use the modeling software to build two zone-to-zone distance matrices: the first using airline distance calculated using the x-y coordinates for each centroid, and the second using the over-the-road distance calculated from paths derived using the coded distance on each link. If one matrix is divided by the other, the analyst can look at the results and identify situations where the airline distance is greater than the over-the-road distance, or where the airline distance is much lower. These situations should be investigated to determine if they are the result of a coding error.

Coded speeds can be checked in a similar fashion by creating skim trees (zone-to-zone time matrices) for each mode and dividing the distance matrix by them. Resulting high or low speeds should be investigated to determine if they are the result of coding errors.

There are other data sources that may be used for reasonableness checking of roadway networks. For example, the HPMS has network data that may be used to check model networks.

Quality assurance applies to transit networks, as well as highway networks. Local data sources may be available to check the networks against. For example, transit operators can often provide line-level data on run times, service hours, and service miles, which can be compared to model estimates of the same. The Travel Model Validation and Reasonableness Checking Manual, Second Edition (Cambridge Systematics, Inc., 2010b) includes detailed discussions of other transit network checking methods, including comparing modeled paths to observed paths from surveys and assigning a trip table developed from an expanded transit survey to the transit network.
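The coded-distance check described above can be sketched as follows. The example assumes planar centroid coordinates in the same units as the coded link distances, a precomputed dictionary of over-the-road distances between zone pairs, and a hypothetical flagging threshold of 2.0 for the ratio of over-the-road to airline distance; an appropriate threshold depends on the network. All zone numbers and distances are invented.

```python
import math


def airline_distance(a, b):
    """Straight-line distance between two (x, y) centroid coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def flag_distance_problems(centroids, network_distance, max_ratio=2.0):
    """Return (zone_i, zone_j, ratio) for zone pairs whose ratio of
    over-the-road to airline distance is below 1.0 or above max_ratio."""
    flagged = []
    for (i, j), road in network_distance.items():
        if i == j:
            continue  # intrazonal pairs have no airline distance
        air = airline_distance(centroids[i], centroids[j])
        if air == 0:
            continue  # coincident centroids cannot be checked this way
        ratio = road / air
        if ratio < 1.0 or ratio > max_ratio:
            flagged.append((i, j, round(ratio, 2)))
    return flagged


# Invented example: coordinates and distances in miles.
CENTROIDS = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
NETWORK_DISTANCE = {(1, 2): 5.5, (1, 3): 8.0, (2, 3): 14.0}
print(flag_distance_problems(CENTROIDS, NETWORK_DISTANCE))
```

In this example the pair (1, 3) is flagged because the path is shorter than the straight line, which suggests an understated link distance, and the pair (2, 3) is flagged because the path is nearly three times the airline distance. The same pattern applies to the speed check, with distance divided by skimmed time.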
3.4 Validation Data

Model validation is an important component of any model development process. As documented in the Travel Model Validation and Reasonableness Checking Manual, Second Edition (Cambridge Systematics, Inc., 2010b), planning for validation and ensuring that good validation data are available are tasks that should be performed as an integral part of the model development process.

Model validation should cover the entire modeling process, including checks of model input data and all model components. While reproduction of observed traffic counts and transit boardings may be important validation criteria, they are not sufficient measures of model validity. Adjustments can be made to any model to reproduce base conditions. Pendyala and Bhat (2008) provide the following comments regarding travel model validation:

There is no doubt that any model, whether an existing four-step travel demand model or a newer tour- or activity-based model, can be adjusted, refined, tweaked, and, if all else fails, hammered to replicate base year conditions. Thus, simply performing comparisons of base year outputs from four-step travel models and activity-based travel models alone (relative to base year travel patterns) is not adequate . . . the emphasis needs to be on capturing travel behavior patterns adequately from base year data, so that these behavioral patterns may be reasonably transferable in space and time.

3.4.1 Model Validation Plan

The development of a model validation plan at the outset of model development or refinement is good model development practice. The validation plan should establish model validation tests necessary to demonstrate that the model will produce credible results. Such tests depend, in part, on the intended uses of the model. Validation of models intended for support of long-range planning may have increased focus on model sensitivity to key input variables and less focus on the reproduction of traffic counts or transit boardings.
Conversely, models intended for support of facility design decisions or project feasibility probably require a strong focus on the reproduction of traffic counts or transit boardings.

The validation plan should identify tests and validation data for all model components. A good approach for the development of a validation plan is to identify the types of validation tests and the standards desired (or required) prior to identifying whether the required validation data are available. Then, once the tests and required data have been identified, the available validation data can be identified and reviewed. Data deficiencies can then be pinpointed and evaluated against their importance to the overall model validation, as well as the cost, time, and effort required to collect the data.

3.4.2 Example Model Validation Tests

Ideally, model validation tests should address all model components. The list of tests shown in Table 3.5 was developed by a panel of travel modeling experts who participated in the May 2008 Travel Model Improvement Program Peer Exchange on Travel Model Validation Practices (Cambridge Systematics, Inc., 2008b). The table is intended to provide examples of tests and sources of data that may be used to validate travel models.
Table 3.5. Example primary and secondary model validation tests.

Model Component: Networks/Zones
Tests (primary, then secondary): Correct distances on links; Network topology, including balance between roadway network detail and zone detail; Appropriateness of zone size given spatial distribution of population and employment; Network attributes (managed lanes, area types, speeds, capacities); Network connectivity; Transit run times; Intrazonal travel distances (model design issue); Zone structure compatibility with transit analysis needs (model design issue); Final quality control checks based on review by end users; Transit paths by mode on selected interchanges
Potential Validation Data Sources: GIS center line files; Transit on-board or household survey data

Model Component: Socioeconomic Data/Models
Tests (primary, then secondary): Households by income or auto ownership; Jobs by employment sector by geographic location; Locations of special generators; Qualitative logic test on growth; Population by geographic area; Types and locations of group quarters; Frequency distribution of households and jobs (or household and job densities) by TAZ; Dwelling units by geographic location or jurisdiction; Households and population by land use type and land use density categories; Historical zonal data trends and projections to identify “large” changes (e.g., in autos/household from 1995 to 2005)
Potential Validation Data Sources: Census SF-3 data; QCEW; Private sources, such as Dun & Bradstreet

Model Component: Trip Generation
Tests (primary, then secondary): Reasonableness check of trip rates versus other areas; Logic check of trip rate relationships; Checks on proportions or rates of nonmotorized trips; Reasonableness check of tour rates; Cordon lines by homogeneous land use type
Potential Validation Data Sources: Chapter 4 of this report; Traffic counts (or intercept survey data) for cordon lines; Historic household survey data for region; NHTS (2001 or 2009)

Model Component: Trip Distribution
Tests (primary, then secondary): Trip length frequency distributions (time and distance) by market segments; Worker flows by district; District-to-district flows/desire lines; Intrazonal trips; External station volumes by vehicle class; Area biases (psychological barrier, e.g., river); Use of k-factors (design issue); Comparison to roadside intercept origin-destination surveys; Small market movements; Special groups/markets; Balancing methods
Potential Validation Data Sources: ACS/CTPP data; Chapter 4 of this report; Traffic counts (or intercept survey data) for screenlines; Historic household survey data for region; NHTS (2001 or 2009)

Model Component: Mode Choice
Tests (primary, then secondary): Mode shares (geographic level/market segments); Check magnitude of constants and reasonableness of parameters; District-level flows; Sensitivity of parameters to LOS variables/elasticities; Input variables; Mode split by screenlines; Frequency distributions of key variables; Reasonableness of structure; Market segments by transit service; Existence of “cliffs” (cutoffs on continuous variables); Disaggregate validation comparing modeled choice to observed choice for individual observations
Potential Validation Data Sources: Traffic counts and transit (or intercept survey data) for screenlines; CTPP data; Chapter 4 of this report; Transit on-board survey data; NHTS (2001 or 2009); Household survey data (separate from data used for model estimation)

Model Component: Transit Assignment
Tests (primary, then secondary): Major station boardings; Bus line, transit corridor, screenline volumes; Park-and-ride lot vehicle demand; Transfer rates; Kiss-and-ride demand; Transfer volumes at specific points; Load factors (peak points)
Potential Validation Data Sources: Transit boarding counts; Transit on-board survey data; Special surveys (such as parking lot counts)

Model Component: Traffic Assignment
Tests (primary, then secondary): Assigned versus observed vehicles by screenline or cutline; Assigned versus observed vehicles speeds/times (or vehicle hours traveled); Assigned versus observed vehicles (or vehicle miles traveled) by direction by time of day; Assigned versus observed vehicles (or vehicle miles traveled) by functional class; Assigned versus observed vehicles by vehicle class (e.g., passenger cars, single-unit trucks, combination trucks); Subhour volumes; Cordon lines volumes; Reasonable bounds on assignment parameters; Available assignment parameters versus required assignment parameters for policy analysis; Modeled versus observed route choice (based on data collected using GPS-equipped vehicles)
Potential Validation Data Sources: Permanent traffic recorders; Traffic count files; HPMS data; Special speed surveys (possibly collected using GPS-equipped vehicles)

Model Component: Time of Day of Travel
Tests (primary, then secondary): Time of day versus volume peaking; Speeds by time of day; Cordon counts; Market segments by time of day
Potential Validation Data Sources: Permanent traffic recorder data; NHTS (2001 or 2009); Historic household survey data for region; Transit boarding count data

Source: Cambridge Systematics, Inc. (2008b).
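Several of the traffic assignment tests in Table 3.5 compare assigned with observed volumes. A minimal sketch of one such comparison, percent difference by screenline, is given below. The function name, screenline names, and volumes are invented for illustration, and no acceptance threshold is implied.

```python
def screenline_percent_difference(assigned, observed):
    """Percent difference of assigned from observed volume for each screenline.

    Screenlines with a missing assigned volume or a nonpositive count are skipped.
    """
    results = {}
    for name, count in observed.items():
        if count <= 0 or name not in assigned:
            continue
        results[name] = 100.0 * (assigned[name] - count) / count
    return results


# Invented daily volumes for two hypothetical screenlines.
OBSERVED = {"River": 50000, "North": 20000}
ASSIGNED = {"River": 54000, "North": 19000}
for name, pct in screenline_percent_difference(ASSIGNED, OBSERVED).items():
    print(f"{name}: {pct:+.1f}%")
```

A positive value indicates that the model assigns more vehicles across the screenline than were counted; a negative value indicates fewer.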