The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


OCR for page 57
TSI officials note that the index cannot be scaled down to the local level. The TSI was intended to be a national-level index. Scaling it down to the local level poses many difficulties, foremost being availability of data. Trucking information at the local level is not available, nor is railroad freight information. There was a request from Fannie Mae for quarterly regional information. Given the current processes and sources of data collection, analysis, seasonal adjustments, and indexing and the weighting and chaining process involved in generating the TSI, there is no plan to scale the national index to a local or regional level.

The systems and processes involved are detailed, often requiring manual manipulation of data, collection of data from air carrier websites, and revision of the data for three months before they are made available in a stable state. The TSI team has streamlined the process and the data analysis significantly, making it possible to generate the reports in a timely manner.

The TSI staff report that the level of effort involved is significant. There are also some current uncertainties about roles and responsibilities. Even in its current state, reports are published as tentative for the last three months. After monitoring changes for a quarter, the earliest month is moved from a preliminary to a final state and the latest monthly report is added in a preliminary state. In this way the current three months of data are always shown in a "preliminary" state.

Trucking

Monthly truck ton-mile data are not available through a federal agency, so the data are obtained from the American Trucking Association using a calculated truck tonnage index. When the official data become available, the preliminary values are replaced. There is a small cost associated with the purchase of these data.

Air

Aviation data are collected from the airline websites and the Office of Airline Information (OAI). Oftentimes the data are not readily available from the OAI dataset. The data change frequently, and the TSI team has to be prepared to include the changes and to replace data as the data become officially available from the airlines.

Rail

The data are obtained from FRA and do not include data from Amtrak and the Alaskan Railway Corp. Commuter rail is included in transit.

TSI Challenges and Lessons Learned

Among the challenges that the TSI effort faces is the need for continuous effort to educate the management, the public, and other potential users on the value of the measures. The TSI team has a media person who is focused on educating and communicating the use and value of the TSI.

The TSI experience also suggests that long-term funding and the ability to recruit expertise will be necessary to establish a comprehensive freight performance measurement system. As noted, the TSI project started with a team of 22 people. After the initial start-up effort, the staff was reduced to five federal employees and two consultants.

The TSI experience also illustrates that process and quality reviews are integral where data from varied sources have to be collected, analyzed, scrubbed, filtered, and then combined to create the index. Data availability has to be studied and various alternative sources of data need to be tapped. The TSI team notes that 50 percent of the data is lost through the process of data scrubbing, cleaning, and filtering prior to being included in the published TSI. Where possible, receiving processed data from the source reduces some of the data scrubbing efforts. One such example of scrubbed data is the rail data received from FRA. Also, making sure that the required data will be available through the life of the measure is important. Moreover, sometimes data are not available in time to complete all necessary tasks required to meet the tight windows of generating the monthly reports. At least one set of trade association data was only available, forcing a three-month lag for the TSI.

Data Considerations for the Freight Report Card

Based upon the findings of the literature, the case studies, and the interviews with stakeholders, the following data-quality considerations will need to be addressed in the development of a Freight System Report Card.

Use Common Definitions for Common Understanding

In order for stakeholders to generate and to use the data needed to create a set of national freight performance measures, there needs to be clarity regarding what each measure and each piece of data means. Clear definitions, not only for each measure but also for the data that feed each measure, will promote a common understanding of the data and the measure among all stakeholders. This can be accomplished by defining the metadata, that is, data that describe data.

There are many variations to the definition of metadata, but a common definition is one provided by Webopedia, which defines metadata as "Data that describes how and when and by whom a particular set of data was collected, and how the data are formatted." The TRB Final Metadata

Working Group Report 2006 states one of the many values of metadata thus:

"Metadata provides information necessary for data to be understood and interpreted by a wide range of users . . . metadata are particularly important when the data users are physically or administratively separated from the data producers. Metadata also reduce the workload associated with answering the same questions from different users about the origin, transformation, and character of the data."14

Metadata management is not an easy task, but it is essential when working with data from multiple sources and is easier to implement if formalized at the start of a project rather than enforced after the data have been pulled together from a variety of different sources. Agencies have worked independently of each other for decades and each has its own data structures, naming conventions, and formats. In the past decade, with public agencies collaborating and conducting peer studies informally, they have moved toward similar understanding and definitions of data in many areas of transportation. However, there is much that is still needed. In some of the newer areas, such as Geographic Information Systems, there is much more standardization. In order for the performance measures for the freight transportation system to be successful, the metadata for the framework should be defined early in the process.

Ensure Data Quality

Data quality is the essential component that makes data valuable to users. This includes accuracy, consistency, timeliness, and completeness. Data quality, as defined by the British Columbia Government Information Resource Management Glossary, is "the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use." Data quality refers to how closely the data can portray the real phenomena. The quality of the data is what determines if a decision maker will rely on the data to make decisions.

While the importance of having high-quality data is intuitively clear, it takes considerable effort to ensure that the quality of data is maintained. Because the quality of data will have a significant impact on decision making, a process will need to be implemented to systematically ensure quality checks of the data being used to populate the freight performance management framework.

Draw On Multiple Data Sources

Organizations developing and deploying new applications routinely use available newer technologies, databases, and programming languages, in addition to those already in use. The result is a hybrid of databases and technologies within an organization, each often having varying standards, formats, and quality. Within an organization, even when the data are managed by one central group, the data often come from multiple databases that do not necessarily communicate with each other. This issue is magnified when data management is decentralized. The explosion of data leads to many challenges with sorting, selecting, and retrieving relevant data, including scrubbing, preprocessing, and integrating data from multiple sources. With data being stored in different systems that use different technologies and databases, the challenges of communicating between databases also have to be addressed. Therefore, dealing effectively with multiple sources of data becomes a major issue when working across agencies and crossing over to accessing data from the private sector.

In measuring the performance of a multimodal freight system, formal mechanisms will need to be put in place to ensure that data derived from multiple sources or silos, covering a range of technologies, systems, and databases, are adequately preprocessed and integrated prior to populating the framework.

Adopt Data Standards and Formats

Data standards and formats play a very important role when integrating data from different sources. Some of these may seem very simple and conceptually easy to resolve, but in dealing with millions of records from multiple sources, the issues get compounded. All of these issues are resolvable, but the time required to address each of them adds to the overall analysis and preprocessing time.

One simple example of data formats involves a freight cost of $5,000.50 recorded in several databases. It would be common for one database to store this information in a text format (five thousand dollars and 50 cents), another to save it in currency format but capture it as "Dollars 5000," and yet another to record it in a currency format, but with more detail, as "$5000.50." Several detailed steps will have to be followed in this simple example to integrate cost information from these multiple sources. A data format for the final integrated data will first have to be established. Data from each source will then have to be processed for conversion to that final format before they can be integrated. The analysis and preprocessing needed for use of such data for establishing a performance management framework will be dependent on the number of sources, which could involve numerous public and private organizations. Appropriate attention will therefore be necessary to bring together data from private and public agencies covering the multiple modes, standards, and formats involved, to ensure that the data are preprocessed appropriately for conversion to the final format established for the performance measures framework.
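The two integration steps in the freight-cost example (establish a final format, then convert each source to it) can be illustrated with a minimal Python sketch. It is not a method described in the report: the function name `normalize_cost`, the choice of a two-decimal `Decimal` as the final format, and the tiny number-word vocabulary are all assumptions made for illustration, and only the three representations named in the example are handled.

```python
import re
from decimal import Decimal

# Assumed final format: a Decimal rounded to whole cents.
CENT = Decimal("0.01")

# Hypothetical, deliberately small vocabulary for spelled-out amounts.
_UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

_NUMERIC = r"\d[\d,]*(?:\.\d+)?"
_SYMBOL_RE = re.compile(rf"^\$\s*({_NUMERIC})$")                   # "$5000.50"
_PREFIX_RE = re.compile(rf"^dollars\s+({_NUMERIC})$", re.IGNORECASE)  # "Dollars 5000"
_TEXT_RE = re.compile(                                             # "five thousand dollars and 50 cents"
    r"^(.+?)\s+dollars(?:\s+and\s+(\d{1,2})\s+cents)?$", re.IGNORECASE
)


def _words_to_int(phrase):
    """Convert a short phrase such as 'five thousand' to an integer."""
    total, current = 0, 0
    for word in phrase.lower().split():
        if word in _UNITS:
            current += _UNITS[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"unsupported number word: {word!r}")
    return total + current


def normalize_cost(raw):
    """Convert one source representation to the assumed final format."""
    text = raw.strip()
    match = _SYMBOL_RE.match(text) or _PREFIX_RE.match(text)
    if match:
        return Decimal(match.group(1).replace(",", "")).quantize(CENT)
    match = _TEXT_RE.match(text)
    if match:
        dollars = _words_to_int(match.group(1))
        cents = int(match.group(2) or 0)
        return (Decimal(dollars) + Decimal(cents) / 100).quantize(CENT)
    raise ValueError(f"unrecognized cost format: {raw!r}")
```

Calling `normalize_cost` on each of the three source strings yields a value in the common format, but the "Dollars 5000" record has already lost its 50 cents at the source, so conversion alone cannot restore it. This is one reason the choice of final format and the quality checks on each source have to be settled before integration.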

Address Data Integration

As mentioned earlier, data integration is the process of standardizing data definitions and data structures by using a common conceptual schema across a collection of data sources. Integrated data will be consistent and logically compatible in different systems or databases and can be used across time and users.

Historically, data warehousing has been a technique successfully used by organizations to bring together data from multiple sources for reporting and decision making. The Ohio DOT, an agency that is advanced in the use of performance measures, has at least five different types of databases that use various programming languages, ranging from newer languages such as Java to older languages such as COBOL. It has successfully used data warehousing to bring together data from many different applications and many different databases to provide information to assist with decision making. The model used by the Virginia and Ohio DOTs, to create a data warehouse to provide information about performance of operations and assets, has been successful. The data warehouse approach also addresses the issue raised by Minnesota DOT about parochial systems and systems that duplicate data. In using a data warehouse, data can be extracted from different sources and the necessary logic can be applied to compute various statistical figures about performance of the measures (for example, percentage of time in a day that the traffic flow is below a specified service level). Alternatively, data may be summarized, integrated, or broken down and saved as more granular components. The granular information can then be used for the performance measure dashboard, decision support, or other reporting systems and to provide answers to ad hoc queries by users. Historical data required for trend analysis or computation of lagging and leading indicators of performance of measures can also be obtained from the data in the data warehouse.

Consider Access Time

The time taken to access information is important to the usability of any system, particularly one envisioned for highway operation data as sought for this project. The information technology industry invests millions of dollars each year in researching user behavior to improve the user's experience. If the goal is to make these performance measures available nationally for users to access for decision making, then one factor that needs to be considered for usability is the time taken from the moment a user commences an attempt to access the information to the moment when the user actually retrieves the information. As the volume of data in the system grows, the time taken to access the information will also increase. Long access times discourage users from using a system. The database design will have to take the access time into consideration and also provide for both active and dormant data. The design should consider a tiered approach to data storage in which cheaper storage is used for less frequently used data, while frequently accessed data could be on high-performing disk storage. Backup and recovery processes should be formalized, tested, and implemented from the very beginning.

Plan for Archived vs. Real-Time Data Needs

In addition to archived data, the measures for the performance of the freight transportation system could include real-time systems such as Intelligent Transportation Systems (ITS). In the Freight Information Real-Time System for Transportation (FIRST), ITS was used by the Port Authority of New York and New Jersey from mid-2001 until December 2003 to provide real-time freight information.15 The Port of Vancouver has also successfully used ITS to improve freight movement. In both instances, data quality and availability of data were among the items listed that required attention for successful deployment of the systems.

According to the USDOT, "ITS can facilitate the safe, efficient, secure, and seamless movement of freight. Applications being deployed provide for tracking of freight and carrier assets such as containers and chassis, and improve the efficiency of freight terminal processes, drayage operations, and international border crossings."

The architecture required to report summary data is different from that required for real-time decisions or trend analysis. Any real-time or near real-time information that is required will need to consider additional factors such as data latency, frequency of refresh, and the frequency at which data need to be presented for each measure. Near real-time measures will require that data be captured, cleansed, and loaded in near real-time.

Plan for the Sustainability of the Framework

For continuity of decision making, it is important that any set of measures be sustained beyond the initial deployment and continue to provide timely and accurate information during the entire period of its use or life. The purpose of freight performance measures is to provide information to allow decision makers to make informed decisions and for users to see the performance of the freight transportation system not for the short term, but for several years. This can happen only if the framework is sustainable and available for the period of its intended use. Sustainability involves ensuring that timely, accurate data are available, that they can be easily accessed by