Management and Use of Data for Transportation Performance Management: Guide for Practitioners
Reporting
Introduction • Foundation • Reporting • Insight • Cases 27

Step 3 Store & Manage Data

This step includes validating, cleaning, normalizing, aggregating, and integrating data; storing the data in one or more repositories, either within the agency or “in the cloud”; producing documentation needed for both technical and business users of the data; and managing access to the data, both to protect it from unauthorized use and to ensure that it is accessible to those who need it. This step also includes activities to design, develop, and manage databases and technical infrastructure for data storage and data integration. The key decisions that agencies must make are:
• where and how to store data,
• how to make sure data can be integrated across repositories as needed,
• which best practices should be implemented for QA and documentation, and
• how much data to keep.

“Data is just like crude. It’s valuable, but if unrefined it cannot really be used.” Michael Palmer
Step 3.1 Establish Databases

Design Databases to Support Analysis Needs. Performance measures rely on a deep archive of data to develop an accurate baseline, understand multi-year and seasonal trends, and establish reasonable targets. Database design supporting performance measures should consider requirements for reporting, trend analysis, and root cause analysis. Design should also consider the possibility that requirements may change over time; for example, an agency may decide to calculate different metrics drawing on the same raw data sources. Therefore, both raw and transformed data may need to be stored. When raw data is voluminous (for example, pavement images), processed data can be maintained in active storage and the raw data can be kept in lower-cost archive storage.

Determine Data Retention Policies. If retention policies are not modernized to reflect changes in storage costs, or if they are set without a full understanding of business needs, the agency risks losing valuable data and TPM capability. Ten or more years ago, data storage hardware was both physically large and expensive. Agencies therefore implemented data retention policies that limited both the amount and the duration of storage, to manage budgets and constrained physical space in data centers. Both the size and cost of storage have dropped dramatically since then. Given these savings and the storage options now available, agencies can re-examine their retention policies to make sure they align with business needs. For example:
• An agency may be required to report performance of the system to the Federal or State Government in 15-minute intervals. That agency may be tempted to aggregate raw data arriving at 1-minute intervals and retain only the aggregate information to save space. Later, the agency may identify a need to track incident management performance metrics, which require the original 1-minute data that tracks growing and shrinking queue lengths, user delay, arterial signal performance, and the effects of secondary incidents. If the 1-minute data are gone, the agency may be unable to track that metric accurately (if at all).
• A data set may have limited value by itself and would be considered unimportant for retention. However, when combined with other data sets, it may provide new and important performance insights that neither data set could provide in isolation.

Case G: MARC repurposed two open positions, a “GIS specialist” and a “demographer,” into “data developer” positions capable of creating and managing systematic workflows for data gathering and organization. The developers have since created automated processes to obtain datasets and import them into SQL databases. The databases have front-end interfaces that greatly simplify the process of querying them to extract the information MARC needs.

Data storage architectures can be designed to consider both current and future TPM data requirements. For example, cleaned and processed data that supports frequent or immediate TPM needs can be stored on faster and more accessible servers. Raw data used to generate those measures and calculations can be stored on lower-end servers or storage arrays, either locally or with a cloud-based back-up provider. Agency TPM personnel should work collaboratively with IT departments and records managers within the agency to draft more modern retention policies.

Plan for Both “Big” and “Small” Data. As the concept of “big data” becomes more prevalent and hyped, there is a risk that data proponents and decision makers will focus too much on big data and overlook the value of “small data.” For example, in an effort to utilize a big data set, agencies may invest in big data platforms that are not designed to handle smaller data sets. The push to keep on top of the latest technology trends can divert resources from existing data and performance management programs and may not meet the agency’s predominant requirements.

How does an agency determine when data is big enough to require a big data storage approach? The true sign of a need for a big data platform is when traditional storage and processing techniques become inadequate for the specific usage needs that have been identified. Agencies should strike a balance in their investment in big and small data platforms, and work backwards from their use cases to their storage strategies. For example:
• It is appropriate to store some data sets in relational databases if they lend themselves well to normalization, indexing, joining, and database-supported statistical analysis. Calculating incident clearance metrics can be done very effectively using a traditional database management system.
• On the other hand, complex analytics that parse through billions or even trillions of data points to identify problem locations, prioritize projects, compute user delay, or understand the complexities of signal retiming efforts are computationally expensive and are not cost-effective in traditional relational databases.

To fully leverage both types of data sets, agencies can develop simple interfaces that operate across both traditional relational databases and big data platforms.
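The retention example in Step 3.1 and the point about relational databases can be illustrated together. The sketch below uses Python's standard-library `sqlite3` module to keep hypothetical 1-minute detector speeds in a relational table and derive 15-minute summaries with a query, so the aggregate is computed from the raw data rather than replacing it. The table name `detector_1min`, its columns, and the speed values are illustrative assumptions, not a prescribed schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database of hypothetical 1-minute detector speeds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE detector_1min ("
        " detector_id TEXT NOT NULL,"
        " minute INTEGER NOT NULL,"  # minutes since midnight
        " speed_mph REAL NOT NULL,"
        " PRIMARY KEY (detector_id, minute))"
    )
    # 30 minutes at 60 mph, except a 5-minute queue at 20 mph (minutes 5-9).
    rows = [("D1", m, 20.0 if 5 <= m < 10 else 60.0) for m in range(30)]
    conn.executemany("INSERT INTO detector_1min VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def fifteen_minute_summary(conn: sqlite3.Connection) -> list:
    """Derive 15-minute average and minimum speeds; the raw table is left untouched."""
    return conn.execute(
        "SELECT detector_id, (minute / 15) * 15 AS bin_start,"  # integer division
        " AVG(speed_mph), MIN(speed_mph)"
        " FROM detector_1min"
        " GROUP BY detector_id, bin_start"
        " ORDER BY detector_id, bin_start"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for detector, start, avg_speed, min_speed in fifteen_minute_summary(conn):
        print(f"{detector} minute {start:>2}: avg {avg_speed:.1f} mph, min {min_speed:.1f} mph")
```

With these made-up values, the first 15-minute average is about 46.7 mph, which dilutes five minutes of 20 mph queueing. Because the raw rows are retained, the minimum speed (or a new incident metric defined later) can still be computed from the source.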
Agencies should have strategies for both big and small data, including a way to integrate across both types of data. A well-designed storage architecture can be flexible enough to accommodate both kinds of data sets, as well as incremental changes as technology and data continue to develop.

Identify Storage Options. In the past, agencies had to rely on in-house systems to store and process their data. As cloud computing becomes more mainstream, agencies now face a choice between storing data in-house or in the cloud. A variety of commercial and open source data storage options are also available, both in the cloud and on-premises. What is right for each agency is highly dependent on the TPM use case, agency policies (procurement, information technology, etc.), IT staff support, the type of data, and how frequently the data will be accessed. There are several general considerations each agency should address as it evaluates its options.

Consider Commercial Cloud and Specialized Hosted Solutions. Commercial cloud providers’ services (such as Amazon and Azure) appear to be very affordable when pricing is based solely on the amount of data to be stored. However, agencies need to consider more than just the size of their data when estimating costs. Pricing is also highly dependent on the number of transactions (the number of times the data is accessed and processed) and on how much bandwidth is used. For example, it may be inexpensive to load raw data into the cloud but very costly to extract it when it is needed to calculate metrics to support TPM. The appropriate approach then may be to move data processing and metric calculations to the cloud as well, avoiding the cost of downloading data each time it needs to be processed. Alternatively, agencies can rely on specialized hosted solutions from their partners: universities, consultants, and sister agencies. Partners may have a better understanding of the transportation and TPM domain and provide more cost-effective approaches for storing transportation data. While in-house solutions often provide more control, they also require a larger investment in workforce and physical infrastructure. However, storing data in the cloud may not be cost-effective for highly transactional systems. It is important to consider all these aspects to avoid becoming a “hostage” to a vendor or service, or investing in an internal system that becomes obsolete because the agency cannot continue to innovate and stay current.

Case H: New Jersey DOT uses an outsourced cloud-based hosting platform and subject matter expertise to collect, store, and analyze complex data sets to support agency TPM efforts.
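The pricing point above (storage, transactions, and bandwidth) lends itself to back-of-the-envelope arithmetic. The Python sketch below compares two hypothetical monthly workloads: repeatedly downloading a raw archive to compute metrics locally, versus computing in the cloud and downloading only the results. Every unit price and volume is a made-up placeholder, not any provider's actual rate; substitute published rates before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Hypothetical unit prices in dollars; replace with a provider's published rates."""
    storage_per_gb_month: float
    egress_per_gb: float
    per_1000_requests: float
    compute_per_hour: float


def monthly_cost(rates: CloudRates, stored_gb: float, egress_gb: float,
                 requests: int, compute_hours: float) -> dict:
    """Break a month's bill into storage, egress, transaction, and compute parts."""
    parts = {
        "storage": stored_gb * rates.storage_per_gb_month,
        "egress": egress_gb * rates.egress_per_gb,
        "transactions": requests / 1000 * rates.per_1000_requests,
        "compute": compute_hours * rates.compute_per_hour,
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    rates = CloudRates(storage_per_gb_month=0.02, egress_per_gb=0.09,
                       per_1000_requests=0.0004, compute_per_hour=0.50)
    archive_gb = 2_000
    # Scenario A: download the whole archive four times a month, compute locally.
    download = monthly_cost(rates, archive_gb, egress_gb=4 * archive_gb,
                            requests=1_000_000, compute_hours=0)
    # Scenario B: compute in the cloud, download only 5 GB of results.
    in_cloud = monthly_cost(rates, archive_gb, egress_gb=5,
                            requests=1_000_000, compute_hours=20)
    print(f"download-and-process: ${download['total']:,.2f}")
    print(f"process-in-cloud:     ${in_cloud['total']:,.2f}")
```

Under these placeholder rates, egress accounts for $720 of scenario A's roughly $760 monthly total, while scenario B comes to about $51. The model ignores local computing costs, so it shows the shape of the trade-off rather than a real estimate.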
Plan for Data Security. Implement a sound data back-up strategy that will allow you to restore data in the event of a hardware failure, a cyberattack, or an inability to physically access facilities. If your data contains personally identifiable information (PII) or other sensitive elements, it should be clearly categorized as sensitive and managed to prevent unauthorized access. TPM-related data that may be sensitive includes crash reports, travel survey data, and data from mobile devices. Some agencies have policies that do not allow sensitive data to be stored in the cloud. However, many cloud providers have robust security policies to both prevent and recover from cybersecurity compromises, whereas agencies may have limited funds and expertise to implement robust security mechanisms.

Maintain Metadata. While some data sets may be considered “self-explanatory,” metadata and documentation are critical. For example, highway crashes may appear to be a straightforward data set. On closer examination, you may find that one jurisdiction gathers data using a different definition of serious injury than another. Data may be collected using a mixture of electronic and manual processes, with different quality assurance processes applied. Newer data sets may be provisional and subject to further updates. Metadata and documentation become even more important when data is used in calculations to support TPM. Two individuals can use the same raw data and measure definition, yet execute calculations differently depending on the context and interpret the results completely differently.

Metadata should be maintained at both the data set and data element level. Data set metadata covers information such as source, spatial and temporal scope, quality, and access classification. Data element metadata covers meaning, origins, usage, value domain, and format. Standards for data set level metadata can be found in International Organization for Standardization (ISO) 19115 and the Office of Management and Budget’s (OMB) Project Open Data (POD) Schema. Standards for data element level metadata can be found in ISO/IEC 11179. Metadata and documentation that are frequently updated and audited help minimize confusion and variations in interpretation. Metadata and documentation must be properly versioned so that data processed under different versions of the metadata can be interpreted and processed correctly.
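One way to keep the two levels of metadata described above together, with a version attached, is sketched below in Python. The field names are hypothetical and only loosely follow the categories in the text; they are not the element names defined by ISO 19115, the POD schema, or ISO/IEC 11179. The crash-severity codes use the KABCO injury scale purely as an illustrative value domain, and the data set description is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """Data element level: meaning, origin, usage, value domain, format."""
    name: str
    meaning: str
    origin: str
    usage: str
    value_domain: frozenset  # allowed values; empty means unconstrained
    fmt: str


@dataclass(frozen=True)
class DatasetMetadata:
    """Data set level: source, scope, quality, access classification, version."""
    title: str
    source: str
    spatial_scope: str
    temporal_scope: str
    quality_note: str
    access_class: str
    version: str
    elements: tuple = ()

    def element(self, name: str) -> ElementMetadata:
        for elem in self.elements:
            if elem.name == name:
                return elem
        raise KeyError(f"no metadata for element {name!r}")


def out_of_domain(meta: DatasetMetadata, name: str, values) -> list:
    """Return values that fall outside the documented value domain."""
    domain = meta.element(name).value_domain
    return [v for v in values if domain and v not in domain]


# Hypothetical example record; every string here is illustrative.
crash_meta = DatasetMetadata(
    title="Statewide crash records (hypothetical)",
    source="Police crash reports",
    spatial_scope="Statewide, all public roads",
    temporal_scope="Five most recent calendar years",
    quality_note="Most recent year provisional",
    access_class="sensitive",
    version="2.1",
    elements=(
        ElementMetadata(
            name="severity",
            meaning="Most severe injury in the crash",
            origin="Officer-coded field on the crash report",
            usage="Serious injury and fatality performance measures",
            value_domain=frozenset({"K", "A", "B", "C", "O"}),
            fmt="1-character code",
        ),
    ),
)

if __name__ == "__main__":
    print(out_of_domain(crash_meta, "severity", ["K", "A", "X", "O"]))
    print(f"severity codes checked against metadata version {crash_meta.version}")
```

Recording the metadata version next to every computed result is what lets a later reader tell which definitions were in force when a number was produced.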
For more information...
1. ISO 19115-Geographic Information-Metadata. https://www.iso.org/standard/53798.html
2. ISO/IEC 11179-Information Technology-Metadata Registries. http://metadata-standards.org/11179/
3. NCHRP Project 17-75, Leveraging Big Data to Improve Traffic Incident Management (2019, publication forthcoming). http://apps.trb.org/cmsfeed/TRBNetProjectDisplay.asp?ProjectID=4051
4. Integrating Emerging Data Sources into Operational Practice (FHWA, 2017). https://rosap.ntl.bts.gov/view/dot/34175
5. NCHRP Project 08-36 Task 130, Inventory and Assessment of Methods for Making Collected Transportation Data Anonymous (2016). http://onlinepubs.trb.org/onlinepubs/nchrp/docs/NCHRP08-36(130)_FR.pdf
Step 3.2 Load & Integrate Data

Establish Repeatable Data Loading Processes. Ad-hoc data loading conducted in a rushed manner is a recipe for disaster. Repeatable processes need to be set up, and ideally automated, to load and transform raw data into a form suitable for use. When (not if) a problem occurs with a data load, procedures should be in place to roll back the load and then repeat the process once the issue is identified. Sometimes a series of loads is needed to refresh data in various repositories. For example, new bridge inspection data may be loaded into a staging database for review and quality assurance. The data may then be transferred to the bridge management system database for analysis, and to the agency’s road inventory system. These data flows should be thoroughly tested, automated, and well documented. Accurate and detailed documentation is essential, especially when data loads occur infrequently and multiple systems and staff from different business units are involved.

Store Both Raw and Processed Data. Storing transformed performance data in addition to raw data can facilitate analysis and reporting.

Make Use of Data Integration Tools. A wide array of commercial and open source tools is available to support data integration processes. Some tools are geared to building extract-transform-load processes for data warehouse environments; others are geared to big data sets. Several excellent tools focus on integrating geospatial data. Using these tools requires expertise and involves a learning curve, but they can save a great deal of time on data loading and integration tasks while also reducing the risk that errors are introduced through highly manual processes.
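The repeatable, roll-back-able load described in Step 3.2 can be sketched with database transactions. The Python example below uses the standard-library `sqlite3` module to load one batch of hypothetical bridge inspection records into a staging table as a single unit: if any record fails validation, nothing from that batch is kept, and rerunning a batch replaces its earlier rows instead of duplicating them. The table name `staging_inspection`, its columns, and the 0 to 9 rating check are illustrative assumptions.

```python
import sqlite3


def create_staging(conn: sqlite3.Connection) -> None:
    """Hypothetical staging table for bridge inspection records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_inspection ("
        " batch_id TEXT NOT NULL, structure_id TEXT NOT NULL, rating INTEGER NOT NULL)"
    )
    conn.commit()


def load_batch(conn: sqlite3.Connection, batch_id: str, records: list) -> bool:
    """Load one batch atomically; any invalid record rolls back the whole batch.

    Deleting the batch's earlier rows first makes the load repeatable:
    rerunning it after a fix replaces the batch instead of duplicating it.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("DELETE FROM staging_inspection WHERE batch_id = ?", (batch_id,))
            for rec in records:
                rating = rec.get("rating")
                # Illustrative validation rule: integer condition rating from 0 to 9.
                if not isinstance(rating, int) or not 0 <= rating <= 9:
                    raise ValueError(f"invalid rating {rating!r} for {rec.get('structure_id')}")
                conn.execute(
                    "INSERT INTO staging_inspection (batch_id, structure_id, rating)"
                    " VALUES (?, ?, ?)",
                    (batch_id, rec["structure_id"], rating),
                )
        return True
    except ValueError as err:
        print(f"batch {batch_id} rolled back: {err}")
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_staging(conn)
    good = [{"structure_id": "B-001", "rating": 7}, {"structure_id": "B-002", "rating": 5}]
    bad = [{"structure_id": "B-003", "rating": 6}, {"structure_id": "B-004", "rating": 12}]
    print(load_batch(conn, "batch-01", good))  # True
    print(load_batch(conn, "batch-02", bad))   # prints a roll-back message, then False
    print(load_batch(conn, "batch-01", good))  # True; the rerun replaces, not duplicates
    print(conn.execute("SELECT COUNT(*) FROM staging_inspection").fetchone()[0])  # 2
```

The same pattern (clear the target slice, validate, insert, commit or roll back as one unit) carries over to the later hops from staging into the management and inventory systems.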
Case G: At MARC, use of automated processes and commercial data integration tools for maintaining key datasets has greatly simplified the process of querying them, which means MARC is able to dedicate more time to analyzing data, not just collecting it.

Case K: Virginia DOT integrated pavement condition data with data on planned paving projects to produce performance monitoring reports that tracked anticipated versus actual changes in condition and the likelihood of achieving performance targets. The data integration effort relied on standardization of several data elements across two databases.
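The Virginia DOT case notes that integration depended on standardizing data elements across two databases. The Python sketch below illustrates that idea under stated assumptions: route identifiers appear in different styles ("I-95", "i95", "US 1") in two hypothetical tables, so a normalization function is applied before joining pavement sections to planned projects by route and overlapping milepost range. This is not VDOT's actual method; field names and values are invented.

```python
import re


def normalize_route(route_id: str) -> str:
    """Canonical route key: uppercase, punctuation and spaces removed ('I-95' -> 'I95')."""
    return re.sub(r"[^A-Z0-9]", "", route_id.upper())


def match_projects(condition_rows: list, project_rows: list) -> list:
    """Pair pavement sections with planned projects on the same route whose milepost ranges overlap."""
    projects_by_route = {}
    for proj in project_rows:
        projects_by_route.setdefault(normalize_route(proj["route"]), []).append(proj)
    matches = []
    for sec in condition_rows:
        for proj in projects_by_route.get(normalize_route(sec["route"]), []):
            # Ranges overlap (sharing more than an endpoint) when each starts before the other ends.
            if sec["begin_mp"] < proj["end_mp"] and proj["begin_mp"] < sec["end_mp"]:
                matches.append((sec["section_id"], proj["project_id"]))
    return matches


if __name__ == "__main__":
    condition = [
        {"section_id": "S1", "route": "I-95", "begin_mp": 10.0, "end_mp": 11.0},
        {"section_id": "S2", "route": "US 1", "begin_mp": 0.0, "end_mp": 1.0},
    ]
    projects = [
        {"project_id": "P1", "route": "i95", "begin_mp": 10.5, "end_mp": 12.0},
        {"project_id": "P2", "route": "US-1", "begin_mp": 2.0, "end_mp": 3.0},
    ]
    print(match_projects(condition, projects))  # [('S1', 'P1')]
```

Without `normalize_route`, "I-95" and "i95" would never join. Real route naming and linear referencing conventions vary by agency, so the mapping itself should be documented as metadata.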
For more information...
1. NCHRP Synthesis 523: Integration of Road Safety Data from State and Local Sources (2018). http://www.trb.org/Main/Blurbs/177990.aspx
2. NCHRP 08-36, Task 131: Transportation Data Integration to Develop Planning Performance Measures (2017). http://onlinepubs.trb.org/onlinepubs/nchrp/docs/NCHRP08-36(131)_FR.pdf
3. Informational Guide for State, Tribal and Local Safety Data Integration (FHWA, 2016). https://safety.fhwa.dot.gov/rsdp/downloads/fhwasa16118.pdf
4. Data Integration Primer (FHWA, 2010). https://www.fhwa.dot.gov/asset/dataintegration/if10019/if10019.pdf
Step 3.3 Assess & Improve Data Quality

Data Quality Assessment. Poor quality data may significantly affect calculated performance metrics and therefore TPM decisions. Step 2.2 above discussed the importance of planning for data quality as part of data acquisition and outlined the contents of a data quality management plan. However, some existing data sets needed for TPM may be of unknown quality. A data quality assessment can be conducted to determine whether a data set is suitable for use in TPM. Quality assessment can consider multiple characteristics, including completeness, currency, accuracy, and consistency; data accessibility and interoperability are also sometimes considered. Assessing data quality involves establishing data quality metrics and measurement methods. For example, a metric for crash data completeness might be the percentage of records that are missing a location code, which could be measured through a simple data query. Accuracy is typically assessed through a combination of independent verification for a sample of the records and validation checks to make sure measured values are within expected ranges.

Quality Management. Quality management is a continuous process that starts prior to data acquisition and continues through the entire data life cycle. It should include analysis and flagging of data records that fail specific quality policies and thresholds. For example, pavement roughness measurements less than 30 inches/mile or travel speeds over 150 mph might be flagged as suspect. It is important to find the right balance when planning for data quality improvement. All too often, agencies spend large amounts of resources attempting to clean, scrub, and validate data, only to find that data issues remain regardless of how much time and energy is spent on cleaning. Perfection becomes the enemy of the good, and agencies end up never fully using the data to inform decisions. Worse, the department (or person) responsible for the data hides it or prevents others from using it because of potential issues, fear, liability, etc. As soon as data (in any form) becomes available, it can and should be analyzed for quality and consistency. The act of analyzing data, even before it has been cleaned or validated, is important for guiding and informing potential users, applications, and data investment decisions.

“Data that is loved tends to survive.” Kurt Bollacker

Case D: I-95 Corridor Coalition Data Use Agreements contain explicit data quality specifications that ensure third-party data meets the quality standards required to support TPM.

Annotating (Not Discarding) Suspect Records. When suspect data records are encountered, a methodical process should be followed to flag these records and address the gaps in a carefully planned manner. Bad data records should not be summarily deleted: discarding them can cause downstream analysis problems and distort calculations if the resulting gaps are not properly addressed. One way to address suspect or missing records is to fill gaps with historical or otherwise imputed data. When this approach is used, the imputed records must be flagged so that TPM decisions account for this simulated or modeled input. In some situations, marking and tracking bad data can provide important information for improving future data quality. For example, a crash data manager might observe a pattern of inaccurate or incomplete crash records from one particular source. As another example, a traffic sensor may exhibit a specific pattern in which it reports erroneous data (or no data) every day during the 8 a.m. hour. If bad data is discarded or not otherwise tracked, this particular failure (and its cause) may never become evident and will continue to affect metrics based on that data.

For more information...
1. Development of a Computational Framework for Big Data-Driven Prediction of Long-Term Bridge Performance and Traffic Flow (Midwest Transportation Center, 2018). https://rosap.ntl.bts.gov/view/dot/36042
2. Crash Data Improvement Program Guide (NHTSA, 2017). https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812419
3. National Performance Management Research Data Set (NPMRDS): Speed Validation for Traffic Performance Measures (Oklahoma Department of Transportation, 2017). http://www.okladot.state.ok.us/Research/FinalRep_2300_FHWA-OK-17-02.pdf
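The quality practices in Step 3.3 (a completeness metric, range checks, flagging rather than deleting, and marking imputed values) can be combined in a short Python sketch. The 150 mph threshold comes from the text; the record layout, the flag names, the negative-speed check, and imputation from a per-sensor, per-hour historical value are assumptions made for illustration.

```python
MAX_PLAUSIBLE_MPH = 150.0  # threshold quoted in the text


def percent_missing(records: list, key: str) -> float:
    """Completeness metric: percentage of records with no value for `key`."""
    if not records:
        return 0.0
    missing = sum(1 for rec in records if rec.get(key) is None)
    return 100.0 * missing / len(records)


def flag_speeds(records: list, historical: dict) -> list:
    """Annotate every record in place; nothing is deleted.

    Flags: 'ok', 'suspect' (outside the plausible range), 'missing', or
    'imputed' (gap filled from a hypothetical historical lookup).
    """
    for rec in records:
        speed = rec.get("speed_mph")
        if speed is None:
            rec["flag"] = "missing"
            fill = historical.get((rec["sensor_id"], rec["hour"]))
            if fill is not None:
                rec["speed_mph"], rec["flag"] = fill, "imputed"
        elif speed < 0 or speed > MAX_PLAUSIBLE_MPH:
            rec["flag"] = "suspect"
        else:
            rec["flag"] = "ok"
    return records


def problem_counts(records: list) -> dict:
    """Count non-ok flags by (sensor, hour) so recurring failures stand out."""
    counts = {}
    for rec in records:
        if rec["flag"] != "ok":
            key = (rec["sensor_id"], rec["hour"])
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    data = [
        {"sensor_id": "S1", "hour": 8, "speed_mph": None},
        {"sensor_id": "S1", "hour": 9, "speed_mph": 55.0},
        {"sensor_id": "S2", "hour": 8, "speed_mph": 212.0},
        {"sensor_id": "S1", "hour": 8, "speed_mph": None},
    ]
    print(f"{percent_missing(data, 'speed_mph'):.0f}% missing")  # 50% missing
    flag_speeds(data, historical={("S1", 8): 48.0})
    print([rec["flag"] for rec in data])  # ['imputed', 'ok', 'suspect', 'imputed']
    print(problem_counts(data))  # {('S1', 8): 2, ('S2', 8): 1}
```

Because every record survives with a flag, a tally like `problem_counts` can surface the "same sensor, same hour" failure pattern described above, which deletion would hide.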
Capabilities Checklist: Store & Manage Data

Basic
☐ Data needed for TPM is stored in databases that are managed and regularly backed up to provide protection from unauthorized access and corruption.
☐ Back-ups are tested on a regular, established cycle (e.g., monthly).
☐ Quality control procedures are in place to flag records that do not meet established validation criteria.
☐ Data dictionary information (metadata) is maintained and stored in a standardized fashion.
☐ Annual data snapshots are created for coordinated reporting across data programs.

Advancing
☐ Hardware and software requirements for data storage, updating, integration, and access are understood.
☐ Central data repositories have been established to integrate data from multiple sources and provide source data for reporting and analysis.
☐ Cloud and hosted storage options are considered for larger and more complex data sets.
☐ Data retention policies and archiving protocols have been updated to reflect lower storage costs and analysis of TPM business data needs.
☐ A range of data storage options is available to support databases with high transaction volumes and memory-intensive calculations, as well as archived data retained for future use.
☐ Standards have been adopted to enable combining data from different sources.
☐ Data from multiple sources are fused to assemble a more complete and accurate data set than would be possible from any single source.
☐ Where appropriate, edge computing techniques are used, involving data processing at the source (e.g., at the site of the field sensor) rather than within a centralized repository.

Do’s and Don’ts

Do:
✓ Consider cloud storage to reduce the agency’s IT footprint and make it easier to scale storage up or down based on need.
✓ Explore hosted solutions from partners (universities, consultants, and sister agencies) to provide cost-effective approaches to managing large and complex data sources.
✓ Explore how fusing disparate data sources can add value to your analysis and capabilities.
✓ Build or hire expertise in statistical analysis and computer programming to effectively analyze and transform data into TPM-related information.
✓ Adjust your agency’s data retention policies and storage architectures so that potentially useful data isn’t permanently destroyed.
✓ Establish repeatable, automated, and documented data loading processes.
✓ Take advantage of commercial data integration tools.

Don’t:
✗ Delete older data. The minute you get rid of it, you’ll find you need it again.
✗ Delete erroneous data records. Flag them instead.
✗ Aggregate data sets to the lowest common denominator to save on storage space.
✗ Let the allure of “big data” technologies prevent you from continuing to invest in proven solutions.
✗ Rely on “ad-hoc” approaches to loading and integrating data.
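The checklist item on testing back-ups on a regular cycle can be made concrete. The Python sketch below uses the standard-library `sqlite3` module's `Connection.backup` method to copy a database and then verifies the copy by comparing a row count and a SHA-256 fingerprint for every table. Two in-memory databases stand in for the production store and the back-up media; the fingerprinting approach and the `crash` table are illustrative assumptions, and the ordering step assumes ordinary rowid tables.

```python
import hashlib
import sqlite3


def table_fingerprints(conn: sqlite3.Connection) -> dict:
    """Map each user table to (row count, SHA-256 of its rows in rowid order)."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    fingerprints = {}
    for table in tables:
        digest = hashlib.sha256()
        count = 0
        # Assumes ordinary rowid tables (not WITHOUT ROWID tables).
        for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid'):
            digest.update(repr(row).encode("utf-8"))
            count += 1
        fingerprints[table] = (count, digest.hexdigest())
    return fingerprints


def backup_and_verify(source: sqlite3.Connection, target: sqlite3.Connection) -> bool:
    """Copy `source` into `target` and confirm the copy matches table by table."""
    source.backup(target)
    return table_fingerprints(source) == table_fingerprints(target)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE crash (id INTEGER PRIMARY KEY, severity TEXT)")
    src.executemany("INSERT INTO crash (severity) VALUES (?)", [("K",), ("O",), ("B",)])
    src.commit()
    restored = sqlite3.connect(":memory:")
    print("backup verified:", backup_and_verify(src, restored))
```

A scheduled job running a check like this, and alerting when it returns `False`, is one way to turn "back-ups are tested" from an intention into evidence.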
Step 4 Share Data

This step includes sharing transportation performance data across business units within an agency, across agencies, or with the general public. This includes, but is not limited to, transmitting data and reports to meet reporting obligations. Agencies benefit from sharing data through improved coordination across jurisdictions, enhanced understanding of joint priorities, and leveraging of investments. Note: this step focuses on the mechanics of data sharing and reporting, including tool selection; see Step 5 for a discussion of data analysis and Step 6 for a discussion of communicating data.

“There’s a digital revolution taking place both in and out of government in favor of open-sourced data, innovation, and collaboration.” Kathleen Sebelius, Former United States Secretary of Health and Human Services
Step 4.1 Establish Reporting & Presentation Infrastructure

Select and Deploy Analysis and Reporting Tools. The data analysis and reporting tools available to agency staff are a critical element in making effective use of data. These can include tools that fuse “siloed” data from disparate sources, tools that fill in gaps (“missing data”), and tools that identify or screen data outliers. Other important tools support analytics and visualization that help agencies “see” into the data: asking questions, identifying issues, deriving meaning from the data, and communicating those insights to others. Tools include commercial business intelligence (BI) packages that support both traditional reporting and dashboards; GIS tools; statistical analysis packages; and specialized tools geared to particular types of performance data, for example, asset management systems and analytics platforms for congestion performance reporting. While it is unlikely that a single reporting and analysis tool can meet all of the agency’s needs, keep in mind that every new tool requires support to bring on new releases, train users, and troubleshoot issues. It is best to follow a disciplined and coordinated process of defining needs and requirements, and of considering whether existing tools are sufficient, before bringing on a new tool.

Make Build versus Buy Decisions. Developing the analytics software and databases that make data easier to analyze and accessible to end users can be a significant hurdle for agencies. To build successful tools independently, an agency will typically need to draw upon the expertise of software engineers, system architects, user interface and user experience design specialists, developers, and project managers.
The tools will need to be maintained over time; therefore, ample documentation is needed, along with knowledgeable staff who can be called upon over the course of many years to keep the tools up to date. Building complex tools with extremely small teams can be risky and costly to an agency. Because of the high barrier to entry and the continuing maintenance costs of developing custom tools, many agencies are now choosing either to purchase off-the-shelf tools or to leverage tools that other agencies and universities have already paid to develop. This effectively creates a pooled-fund approach to software development and maintenance. This approach is becoming easier even for agencies that are unaccustomed to purchasing services, and for those that have historically not adopted tools and products developed outside their own agency, or even outside their own state.

Case E: Maryland SHA uses RITIS visual analytics to combine disparate data sets and derive valuable information as part of after-action reviews for operational improvements.

Whether an agency decides to build its own tools, hire consultants to build custom tools, or leverage existing tools, the following items should be considered.

In-house Development
• Allocate ample time to working on requirements for usability and functionality, and recruit multiple user groups to understand expected usage.
• Find an experienced partner: attempt to procure the services of a consultant who has performed similar work for other agencies. Analysis tools may need customization and tailoring, but a proven provider is often more reliable than a standard consultant.
• Recognize that initial startup will be costly. There are several private-sector and university providers that have excellent archiving, fusion, and analytics products. Some of these systems work across borders and across multiple agencies. When possible, consider adopting technologies or products similar to those of neighboring jurisdictions, so that shared experiences, knowledge, and resources can be leveraged.
• Avoid “black box” solutions that do not explain the underlying technologies, algorithms, or methods used to calculate the performance measures. Ensure the chosen provider has documented procedures that can be shared with software engineers and data analysts. Some providers have multi-state/agency steering committees that collectively drive the features of the archive products to ensure they are constantly meeting user needs.
Purchasing Tools
• More and more states and MPOs are starting to purchase probe-based speed data; however, fewer agencies are investing in the tools needed to analyze the data in ways that enable better decisions. Probe data vendors, for example, have analytic tools that are sold at prices lower than the cost of reproducing those tools inside the agency. These tools dramatically improve the productivity of analysts by making the data much easier to analyze. In addition, these tools provide capabilities that might previously have taken agencies months of effort to produce.
• Most vendors can provide return on investment (ROI) examples, including case studies from previous applications, showing how much money other organizations have been able to save by investing in third-party data analytics tools or third-party data. The I-95 Corridor Coalition has produced these types of ROI documents for its member agencies, showing the benefits of some of their probe and incident data analytics products. More information can be obtained at www.i95coalition.org, or by reaching out to any third-party data provider.

Purchasing Services
• For agencies that are not comfortable using analytic tools and are not interested in doing in-house data analysis, hiring outside consultants or university support may prove to be a viable option. Consultants and universities frequently have access to scientists, statisticians, database programmers, economists, and other analysts who would otherwise be difficult to hire at state and local agencies. When seeking out-of-agency services, it is wise to review product and project portfolios for examples of prior work to ensure that the agency’s needs match the skills of the consultant or university personnel proposed for a project.
• When hiring outside support (consultants or universities), consider a phased approach to projects. Start small, and ensure the consultant is able to perform basic analysis and fusion tasks with the data available. If the consultants are successful, work can progress to bigger analysis tasks, adding layers of complexity and building on prior work and available data sets. Initiating extremely large analysis tasks that are not easily broken down into smaller deliverables can be a recipe for confusion, cost overruns, disappointment, and waste.
• Regardless of who does the work, it is advisable to avoid mandating that consultants use specific tools, technologies, or techniques to deliver a solution. New technologies, methodologies, and tools are developed quickly and often, and requiring outdated technologies can unnecessarily limit both the agency and the consultant in performing analytical tasks. Allow the consultants to drive these decisions based on what they perceive to be the most efficient and effective tools and methods.
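The build-versus-buy trade-off above (a costly startup and continuing maintenance for custom tools, versus purchase prices for existing ones) can be framed as a simple multi-year cost comparison. The Python sketch below computes an undiscounted total cost of ownership and the first year, if any, in which building becomes cheaper. All dollar figures are hypothetical placeholders, and the model deliberately leaves out discounting, training, and the risk of losing the few staff who understand a custom tool.

```python
def total_cost(upfront: float, annual: float, years: int) -> float:
    """Undiscounted total cost of ownership over a planning horizon."""
    return upfront + annual * years


def breakeven_year(build_upfront: float, build_annual: float,
                   buy_upfront: float, buy_annual: float, max_years: int = 30):
    """First year in which cumulative build cost drops below buy cost, or None."""
    for year in range(1, max_years + 1):
        if total_cost(build_upfront, build_annual, year) < total_cost(buy_upfront, buy_annual, year):
            return year
    return None


if __name__ == "__main__":
    # All figures are hypothetical placeholders.
    # Custom tool: large startup cost plus in-house maintenance each year.
    # Purchased tool: small setup fee plus an annual subscription.
    print(breakeven_year(600_000, 150_000, 20_000, 120_000))  # None: building never catches up
    print(breakeven_year(600_000, 50_000, 20_000, 120_000))   # 9
```

If maintaining a custom tool costs more each year than a subscription, the up-front investment is never recovered; even when it is cheaper to maintain, the payback period can outlast the tool's useful life.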
For more information...
1. NCHRP Project 03-128: Business Intelligence Techniques for Transportation Agency Decision Making (report forthcoming). http://apps.trb.org/cmsfeed/TRBNetProjectDisplay.asp?ProjectID=4352
2. Development of a Travel-Time Reliability Measurement System (Minnesota Department of Transportation, 2018). http://www.dot.state.mn.us/research/reports/2018/201828.pdf
3. Implementation of Probe Data Performance Measures (Pennsylvania Department of Transportation, 2017). https://rosap.ntl.bts.gov/view/dot/32283
Step 4.2 Establish Data Standards & Formats
Take Advantage of Data Standards. There are a number of data standards that can be adopted for agency data sets and/or used when sharing transportation system performance data between agencies (see Table 2). Some data standards cover data dictionary information (data elements and their definitions); others are more comprehensive and specify data formats, message structures, and technical mechanisms and protocols for sharing.
Data standards can make sharing processes easier. However, standardization should not be a prerequisite for sharing: it is more beneficial to share non-standard data than to share nothing. While standards are absolutely necessary in some instances (such as vehicle-to-vehicle safety communications), their use can break down in practice. Standards may become cumbersome because they try to address every possible data element and use case; alternatively, standards are extended with custom fields and effectively lose the benefit of being a standard.
Agencies may be asked to comply with a standard imposed by an external entity as a condition for data sharing. This can have unintended consequences if that standard requires data to be "dumbed down" to the lowest common denominator to satisfy the needs of the external entity. To comply with the standard and remain on budget, agencies may permanently modify their data to match it and thereby lose much of the data's value for future use.
This issue of standards becomes even more challenging when dealing with big data. Unstructured data and crowdsourced data are rarely standardized or clean, but they may still have substantial value to an agency in supporting TPM. The key to successful data sharing is to adhere to standards when possible, but not at the cost of losing insight or capability from non-standard data sets.
"The wonderful thing about standards is that there are so many of them to choose from." Grace Murray Hopper
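The advice to adhere to standards when possible, without losing the value of non-standard data, can be illustrated with a short Python sketch. It assumes a hypothetical incident data dictionary (`INCIDENT_DICTIONARY`) whose element names are invented for this example; they are not taken from TMDD, IEEE 1512, or any other standard listed in Table 2. Fields the dictionary does not define, and values of the wrong type, are set aside rather than discarded.

```python
from typing import Any

# Hypothetical data dictionary: element name -> expected Python type.
# The element names are illustrative and do not come from a published standard.
INCIDENT_DICTIONARY: dict[str, type] = {
    "incident_id": str,
    "start_time": str,  # e.g., an ISO 8601 timestamp string
    "latitude": float,
    "longitude": float,
    "lanes_blocked": int,
}


def split_record(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any], list[str]]:
    """Split a record into conforming fields, extension fields, and problems.

    Fields the dictionary does not define, and values of the wrong type,
    go into `extensions` instead of being dropped, so nothing is lost.
    """
    standard: dict[str, Any] = {}
    extensions: dict[str, Any] = {}
    problems: list[str] = []
    for name, value in record.items():
        expected = INCIDENT_DICTIONARY.get(name)
        if expected is None:
            extensions[name] = value  # non-standard field: keep it
        elif isinstance(value, expected):
            standard[name] = value
        else:
            problems.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
            extensions[name] = value  # keep the raw value for later review
    for name in INCIDENT_DICTIONARY:
        if name not in record:
            problems.append(f"{name}: missing")
    return standard, extensions, problems


record = {
    "incident_id": "INC-1",
    "start_time": "2019-05-01T08:15:00",
    "latitude": 38.90,
    "longitude": -77.03,
    "lanes_blocked": 2,
    "crowd_report_count": 14,  # hypothetical non-standard field from a crowdsourced feed
}
standard, extensions, problems = split_record(record)
```

In this sketch an agency could share `standard` in the agreed format while keeping `extensions` for its own TPM analysis, rather than permanently reshaping its data to fit the standard.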
Table 2. Example Data Standards Related to TPM
Data Type: Applicable Standards
Safety:
• American National Standards Institute (ANSI) Standard D16, Manual on Classification of Motor Vehicle Traffic Crashes
• Model Minimum Uniform Crash Criteria (MMUCC) (https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812433)
• Model Inventory of Road Elements (MIRE) (https://safety.fhwa.dot.gov/rsdp/downloads/fhwasa17048.pdf)
Pavement Condition:
• HPMS Field Manual: defines pavement data elements
• AASHTO Standard R43-13, Standard Specification for Transportation Materials and Methods of Sampling and Testing, Standard Practice for Quantifying Roughness of Pavement
Bridge Condition:
• FHWA National Bridge Inspection Standards
System Performance:
• IEEE 1512-2006: Standard for Common Incident Management Message Sets for use by Emergency Management Centers
• ITE TMDD 3.3: Traffic Management Data Dictionary (TMDD) Standard for Center to Center Communications
• ASTM E2665-08: Standard Specifications for Archiving ITS-Generated Traffic Monitoring Data
Other:
• Open Geospatial Consortium: a variety of standards for geospatial data
• All-Roads Network of Linearly Referenced Data (ARNOLD) manual: provides guidance and best practices for building linear referencing systems (LRS) covering all public roads
• National Information Exchange Model (NIEM): provides a common vocabulary that enables efficient information exchange across diverse public and private organizations
Select File Formats. Certain file formats have advantages over others when it comes to sharing data between agencies. For example, exchanging PDF files containing detour plans may make sense on an individual case basis, but it significantly reduces the ability to process the information automatically and incorporate it into TPM processes. Ideally, data should be provided in a machine-readable format that offers the most flexibility for integration into TPM tools. Common data file formats found on open data platforms include JSON, XML, CSV, and KML.
For more information...
1. Open Geospatial Consortium (website) (http://www.opengeospatial.org/standards)
2. Project Open Data (website) https://project-open-data.cio.gov/
3. General Transit Feed Specification (website) http://gtfs.org/
4. National Information Exchange Model (website) (https://www.niem.gov/)
5. National Bridge Inventory Resources (website) https://www.fhwa.dot.gov/bridge/nbi.cfm
6. USDOT JPO ITS Standards Program (website) https://www.standards.its.dot.gov/
7. Manual on Classification of Motor Vehicle Traffic Crashes, Eighth Edition, ANSI D16 (Association of Transportation Safety Information Professionals, 2017) (http://www.atsip.org/ANSI_Ver_2017_D16.pdf)
8. HPMS Field Manual (FHWA, 2016) https://www.fhwa.dot.gov/policyinformation/hpms/fieldmanual/
9. Traffic Monitoring Guide (FHWA, 2016) https://www.fhwa.dot.gov/policyinformation/tmguide/tmg_fhwa_pl_17_003.pdf
10. All Roads Network of Linear Referenced Data (ARNOLD) Reference Manual (FHWA, 2014) https://www.fhwa.dot.gov/policyinformation/hpms/documents/arnold_reference_manual_2014.pdf
11. AASHTO Standard R43-13, Standard Specification for Transportation Materials and Methods of Sampling and Testing, Standard Practice for Quantifying Roughness of Pavement, 34th/2014 Edition 1-56051-606-4. (AASHTO, 2014)
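As a small illustration of the Select File Formats guidance, the following Python sketch converts a CSV extract into JSON using only the standard library, so that numeric columns arrive as numbers rather than text. The column names and the `csv_to_json` helper are hypothetical, and a production feed would also need agreed field definitions and metadata.

```python
import csv
import io
import json


def csv_to_json(csv_text: str, numeric_fields: set[str]) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Columns named in `numeric_fields` are converted to numbers so that
    consuming tools receive numeric values rather than strings.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in numeric_fields:
            if row.get(name):  # skip missing or empty values
                row[name] = float(row[name])
        rows.append(row)
    return json.dumps(rows, indent=2)


# Hypothetical daily travel-time extract; column names are illustrative.
sample = (
    "segment_id,date,avg_travel_time_min\n"
    "SEG-001,2019-05-01,12.5\n"
    "SEG-002,2019-05-01,8.0\n"
)
print(csv_to_json(sample, {"avg_travel_time_min"}))
```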
Step 4.3 Publish Data
Designate Authoritative Data Sources. Authoritative data sources for performance measure calculation should have been established as part of Step 1.3, Identify Analysis and Reporting Requirements. In preparation for publication, it is also important to designate authoritative sources for the computed performance measures and for any contextual data to be provided in the reports. Only designated authoritative sources should be used for reporting. Following this guideline will help ensure that information released to the public is consistent and quality-checked.
Determine what Data to Share. The growing "open data" movement is creating the need for agencies to decide what data to proactively make available to the public, what data to provide on request, and what data to keep restricted. Several jurisdictions have developed policy guidance on data classification. For example, the District of Columbia defines five levels:
• Level 0: Open (the default classification)
• Level 1: Public, Not Proactively Released (e.g., due to potential litigation risk or administrative burden)
• Level 2: For District Government Use (exempt from the Freedom of Information Act but not confidential and of value within the agency)
• Level 3: Confidential (sensitive or restricted from disclosure)
• Level 4: Restricted Confidential (unauthorized disclosure can result in major damage or injury)
DC has adopted the philosophy that data should be open by default and restricted only if there is a reason to do so.
Select Data Sharing Methods. Sharing methods can vary from very basic file transmission such as FTP to more complex asynchronous, persistent transmission methods such as subscriptions, web services, and others. Open data sharing platforms such as data.gov have been established at the federal level and by many state agencies.
While simple methods may be quick and inexpensive to implement, they can in some situations diminish the value of shared data. For example, files posted to an FTP site once a day introduce unnecessary latency and reduce certain TPM capabilities.
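The "open by default" classification approach described above under Determine what Data to Share can be expressed compactly in Python. The five levels mirror the District of Columbia list, but the `RELEASE_CHANNEL` mapping and the `classify` helper are assumptions made for this sketch, not DC's published procedures.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """The five District of Columbia classification levels listed above."""
    OPEN = 0
    PUBLIC_NOT_PROACTIVELY_RELEASED = 1
    DISTRICT_GOVERNMENT_USE = 2
    CONFIDENTIAL = 3
    RESTRICTED_CONFIDENTIAL = 4


# Illustrative handling rules: an assumption for this sketch, not DC policy.
RELEASE_CHANNEL = {
    DataLevel.OPEN: "publish on open data portal",
    DataLevel.PUBLIC_NOT_PROACTIVELY_RELEASED: "provide on request",
    DataLevel.DISTRICT_GOVERNMENT_USE: "share within government only",
    DataLevel.CONFIDENTIAL: "restricted access",
    DataLevel.RESTRICTED_CONFIDENTIAL: "restricted access",
}


def classify(restriction_reasons: dict[DataLevel, str]) -> DataLevel:
    """Open by default: return the most restrictive level that has a documented reason."""
    return max(restriction_reasons, default=DataLevel.OPEN)


level = classify({DataLevel.DISTRICT_GOVERNMENT_USE: "internal working draft"})
print(level.name, "->", RELEASE_CHANNEL[level])
```

Requiring a written reason for every restriction is one way to make "restricted only if there is a reason to do so" auditable: a data set with no documented reasons falls through to Level 0.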
To support system performance management, agencies should strive to share data in near real time and at the highest possible resolution in order to provide the most flexibility and usefulness. For example, the Maryland State Highway Administration allows external entities to securely subscribe to its real-time operations data sharing interface, which pushes incident information out to subscribers as operators enter it into the agency's Advanced Traffic Management System (ATMS) platform. This approach allows partners to integrate the data into their own systems and become aware of significant incidents as soon as they occur (as opposed to minutes or hours later). This is important because it enables better real-time tracking of incident clearance times, responder activities, and other measures that senior managers often request.
Share Data within the Agency. Departments within agencies often invest in data collection and data services to satisfy specific needs. For example, operations groups may procure and install sensors to support real-time operations, while planning groups may install different devices with a slightly different configuration to support planning and modeling needs. However, agencies frequently fail to evaluate existing investments within the agency. There could be significant overall cost savings if departments evaluated existing data sets within the agency and adjusted existing configurations or agreements rather than going through a completely separate procurement process. This is particularly true for larger agencies.
Share Data with Other Agencies. Sharing data with other agencies provides significant benefits to all parties, as well as to the traveling public. Access to other agencies' data allows a more holistic approach to TPM, as well as better coordination in efforts to improve performance.
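The push-style subscription described for Maryland can be contrasted with a once-a-day FTP drop using a minimal in-process Python sketch. This is not the Maryland interface: `Incident` and `IncidentFeed` are hypothetical names, and a real data sharing interface would deliver messages over a secure network protocol to remote subscribers. The sketch only shows the publish/subscribe pattern, in which each record reaches subscribers at the moment it is entered.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class Incident:
    incident_id: str
    description: str
    entered_at: datetime


@dataclass
class IncidentFeed:
    """In-process stand-in for a push-style data sharing interface."""
    subscribers: list[Callable[[Incident], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Incident], None]) -> None:
        """Register a partner's handler to receive future incidents."""
        self.subscribers.append(callback)

    def publish(self, incident: Incident) -> None:
        # Push to every subscriber as soon as the operator enters the record,
        # instead of waiting for a scheduled batch export.
        for callback in self.subscribers:
            callback(incident)


received: list[Incident] = []
feed = IncidentFeed()
feed.subscribe(received.append)  # a partner agency's handler
feed.publish(Incident("INC-42", "Crash blocking two lanes", datetime(2019, 5, 1, 8, 15)))
```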
Some of the challenges to sharing data with other agencies involve data sharing methods, formats, and agreements. It is also important for agencies to develop methods for integrating external data. Separating relevant data from noise is an important exercise that can have a significant impact on TPM output. For example, an agency integrating incident data from a neighboring agency may want to focus only on external incidents close to the border, major regional incidents, or external incidents that may affect the agency's area of responsibility. This means that, for TPM purposes, the agency must develop a policy for identifying incidents of significance to the agency and its system, while avoiding the trap of throwing away too much data that may be of use in the future.
The Metropolitan Area Transportation Operations Coordination (MATOC) Program enhanced real-time data sharing among agencies. This allowed agencies to become aware of incidents sooner, respond to and clear incidents more quickly, alert travelers more quickly, and develop standard operating procedures (SOPs) that account for the impacts of regional and cross-jurisdictional events. (Case F)
The Florida DOT created an open data portal for sharing data both internally and with the public. Several FDOT business units had already begun to post important data sets online at various websites. The portal provided a central location for data sharing, making it easier for people to locate available data. (Case C)
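The policy of flagging external incidents of significance, without throwing the rest away, can be sketched in Python. The 10-mile radius, the border reference points, and the "minor/moderate/major" severity scale are all assumptions for illustration, and an agency's actual rule could instead use route, jurisdiction, or impact criteria. Distance is computed with the haversine great-circle formula.

```python
import math
from dataclasses import dataclass


@dataclass
class ExternalIncident:
    incident_id: str
    lat: float
    lon: float
    severity: str  # hypothetical scale: "minor", "moderate", or "major"


EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_significant(inc: ExternalIncident, border_points: list[tuple[float, float]],
                   radius_miles: float = 10.0) -> bool:
    """Significant if major, or within radius_miles of any border reference point."""
    if inc.severity == "major":
        return True
    return any(haversine_miles(inc.lat, inc.lon, lat, lon) <= radius_miles
               for lat, lon in border_points)


def partition(incidents: list[ExternalIncident], border_points: list[tuple[float, float]],
              radius_miles: float = 10.0) -> tuple[list[ExternalIncident], list[ExternalIncident]]:
    """Split incidents into (significant, archived); nothing is discarded."""
    significant: list[ExternalIncident] = []
    archived: list[ExternalIncident] = []
    for inc in incidents:
        target = significant if is_significant(inc, border_points, radius_miles) else archived
        target.append(inc)
    return significant, archived


border = [(39.0, -77.0)]  # hypothetical border reference point
incidents = [
    ExternalIncident("A", 39.05, -77.0, "minor"),  # about 3.5 miles from the border point
    ExternalIncident("B", 40.0, -75.0, "minor"),   # far away and minor
    ExternalIncident("C", 40.0, -75.0, "major"),   # far away but major
]
significant, archived = partition(incidents, border)
```

Keeping the `archived` list, rather than filtering it out at ingest, leaves room to revisit the significance rule later without having lost the underlying data.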
Share Data with the Public. Public agencies have a responsibility to provide the best possible service to their customers: the general traveling public. One component of this responsibility is sharing information about the agency's performance with the public. While some data elements may be sensitive, most transportation data can be shared to better inform the public regarding system performance. In addition to open data, agencies can provide easily digestible, interactive reports on system performance.
One challenge with open data is that it is exposed to a general public with varying levels of understanding of raw data, which can lead to misinterpretation. While this is a real challenge, it should not be a barrier to sharing data with the public. Agencies can provide metadata, documentation, and sample applications to help users better understand raw data and its potential uses.
Application Programming Interfaces (APIs) allow users to develop their own data ingestion and processing applications and add value to existing data sets. To distribute information effectively, an agency must be able to share data via APIs for integration with other applications and systems. For example, the City of Chicago publishes APIs for historical congestion estimates, average daily traffic counts, and other TPM-related data sets. Similarly, the New York State DOT publishes APIs for historical traffic and transit events across the state and New York City.
Consider Data Sharing Agreements. When agencies share data with each other, there is frequently a need for some type of agreement or memorandum of understanding. Inter-agency agreements, especially those between states, have sometimes been difficult to negotiate because of governing law language and other restrictive terms and conditions.
Because of this, many agencies are now opting to make their data "open" to the public, eliminating the need for data sharing agreements. Other agencies opt for informal, handshake agreements. Informal agreements can work well, though agencies with significant staff turnover will want to document any agreements in place.
Investigate Public-Private Data Sharing Arrangements. Over the last decade, the private sector has emerged as an important partner when it comes to transportation data to support TPM. Private sector data providers have been able to leverage technology and innovation in ways that public agencies often cannot. Sharing data with the private sector has become both more important and more prevalent in recent years. Agencies are benefiting not only from obtaining new data sets from the private sector, but also from the value the private sector adds to existing agency data sets.
Agencies should negotiate data sharing contracts with private sector entities carefully. In particular, agencies should pay close attention to data use restrictions and seek maximum flexibility in the use of data. This includes the ability to share data with universities and partner agencies, and the ability to generate and share reports and summaries with the general public. Agencies, in turn, should treat the private sector as an equal partner that can assist in disseminating information to the public and provide valuable insight into customer behavior and travel patterns.
Provide Tools for Easy Data Access. Data has little value if it is not easily accessible. With continued improvement in bandwidth, web-based tools and data portals are becoming the norm. These tools allow users to log in and access data from anywhere with an internet connection. In addition to web-based access, the user interface and efficiency of the applications are critical. Poor user interfaces can make it difficult to understand what data and capabilities are available. Similarly, executing a query on a data set and waiting several hours or even days to receive an answer is unacceptable: users must be able to quickly define a question and receive a response for data and information to be useful. This means that agencies need to go beyond establishing databases or big data platforms and ensure that appropriate tools exist to access, visualize, and manipulate data for TPM. In many cases, more than one type of tool will be required to meet the needs and skill sets of different types of users.
For example, some agencies make one reporting package available for technical staff and "power users" and a second for more casual users.
The I-95 Corridor Coalition collaboratively developed a public-private partnership between member agencies and third-party data providers to take advantage of the latest private sector data offerings. It created a liberal and flexible model data use agreement that has become the "gold standard" for agencies and consortiums across the country for over a decade. (Case D)
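Two points above, publishing data through APIs and letting users get answers quickly, both depend on returning data in small, predictable pages rather than one huge download. The Python function below is a minimal sketch of the logic a paginated JSON endpoint might run. The `limit` and `offset` parameter names, the 1,000-record cap, and the response layout are assumptions for illustration. They do not describe the Chicago or New York State DOT APIs, and the web server itself is omitted.

```python
import json
from urllib.parse import parse_qs

DEFAULT_LIMIT = 100
MAX_LIMIT = 1000  # hypothetical cap that keeps each response small and fast


def handle_request(query_string: str, records: list[dict]) -> str:
    """Return one page of `records` as JSON, selected by limit/offset parameters."""
    params = parse_qs(query_string)
    limit = int(params.get("limit", [str(DEFAULT_LIMIT)])[0])
    offset = int(params.get("offset", ["0"])[0])
    limit = max(0, min(limit, MAX_LIMIT))  # clamp to a sensible range
    offset = max(0, offset)
    page = records[offset:offset + limit]
    return json.dumps({
        "offset": offset,
        "limit": limit,
        "count": len(page),
        "total": len(records),
        "data": page,
    })


# Hypothetical daily traffic counts standing in for a published data set.
counts = [{"station_id": i, "adt": 1000 + i} for i in range(250)]
response = json.loads(handle_request("limit=100&offset=200", counts))
```

Reporting `total` alongside each page lets a client, or an agency's own dashboard, know how many further requests it needs without downloading everything at once.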
For more information...
1. Data Presentation on Transportation Agency Websites: Trends and Best Practices (Caltrans, 2017) http://transweb.sjsu.edu/sites/default/files/1501-data-presentation-on-transportation-agency-websites-trends-and-best-practices.pdf
2. Uses of Geospatial Applications for Transportation Performance Management (FHWA, 2016) https://www.gis.fhwa.dot.gov/documents/Uses_of_Geospatial_Applications_for_Transportation_Performance_Management_Case_Studies.pdf
3. State of the Practice on Data Access, Sharing, and Integration (FHWA, 2016) https://rosap.ntl.bts.gov/view/dot/35860
4. NCHRP Synthesis 460, Sharing Operations Data Among Agencies (2014) http://www.trb.org/Publications/Blurbs/170868.aspx
5. Geospatial Tools for Data Sharing: Case Studies of Select Transportation Agencies (FHWA, 2014) https://rosap.ntl.bts.gov/view/dot/12147
Capabilities Checklist: Share Data
Basic
☐ Employees are aware of key performance data sources within the agency.
☐ There are clear agency policies in place that data should be shared unless the need to protect it is demonstrated.
☐ There are protocols defined for how to share data to meet different needs that consider use of state and federal open data portals and hosted or cloud solutions.
☐ Open data portals are used to share data.
☐ Data explanations are provided in "plain English" to help users understand meaning, sources, and limitations.
Advancing
☐ Data governance and stewardship structures have been established to facilitate communication about data sharing and to identify opportunities for synergies across business units for collaborating or combining data sources.
☐ Data sharing agreements are used (internal to an agency and between an agency and its partners) that specify what data will be shared, when, and how, and that establish a clear understanding of data limitations and expectations for use.
☐ Data are shared in formats designed to meet the needs of different users, which may include standard reports, data feeds, and dashboards.
☐ Data with sensitive elements are sanitized for public distribution.
☐ Data contracts and sharing agreements are reviewed to ensure that agency flexibility is retained.
Do's and Don'ts
Do:
✓ Strive to open your data up to partner agencies and the public.
✓ Make sure that your data sets are ready to be shared by putting in place some standard criteria (no sensitive information, passed basic quality review, from an authoritative source, etc.).
✓ Treat other agencies as partners with whom you want to share your data so that you can improve system safety and reliability.
✓ Put your data into standard formats when it's simple and improves upon your capabilities.
Don't:
✗ Assume a one-size-fits-all data feed will work for both the public and your agency partners.
✗ Sign data sharing agreements with restrictive "governing law" language.
✗ Let a lack of standardization become an excuse for not sharing data.
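The "ready to be shared" criteria in the Do's list (no sensitive information, passed basic quality review, from an authoritative source) and the checklist item on sanitizing sensitive elements can be combined into a simple pre-publication check. This Python sketch is illustrative only: the `Dataset` fields, the `SENSITIVE_FIELDS` list, and the helper names are hypothetical, and real sanitization rules would come from the agency's own data classification policy.

```python
from dataclasses import dataclass, field

# Hypothetical field names treated as sensitive in this sketch.
SENSITIVE_FIELDS = {"driver_name", "license_plate", "phone_number"}


@dataclass
class Dataset:
    name: str
    records: list[dict] = field(default_factory=list)
    passed_quality_review: bool = False
    authoritative_source: bool = False


def sanitize(records: list[dict]) -> list[dict]:
    """Return copies of the records with sensitive fields removed."""
    return [{k: v for k, v in r.items() if k not in SENSITIVE_FIELDS} for r in records]


def readiness_issues(ds: Dataset) -> list[str]:
    """List the reasons a data set is not yet ready to share (empty list means ready)."""
    issues = []
    if not ds.passed_quality_review:
        issues.append("has not passed basic quality review")
    if not ds.authoritative_source:
        issues.append("not from a designated authoritative source")
    found = sorted({k for r in ds.records for k in r} & SENSITIVE_FIELDS)
    if found:
        issues.append("contains sensitive fields: " + ", ".join(found))
    return issues


crashes = Dataset(
    name="crash_records",
    records=[{"crash_id": "C-1", "severity": "injury", "license_plate": "ABC123"}],
    passed_quality_review=True,
    authoritative_source=True,
)
before = readiness_issues(crashes)
crashes.records = sanitize(crashes.records)
after = readiness_issues(crashes)
```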