TSI officials note that the index is not scalable down to the local level. The TSI was intended to be a national-level index. Scaling it down to the local level poses many difficulties, foremost being availability of data. Trucking information at the local level is not available, nor is railroad freight information. There was a request from Fannie Mae for quarterly regional information. Given the current processes and sources of data collection, analysis, seasonal adjustments, and indexing and the weighting and chaining process involved in generating the TSI, there is no plan to scale the national index to a local or regional level.

The systems and processes involved are detailed, often requiring manual manipulation of data, collection of data from air carrier websites, and revision of the data for three months prior to making it available in a stable state. The TSI team has gone through significant streamlining of the process and data analysis, making it possible to generate the reports in a timely manner.

The TSI staff report that the level of effort involved is significantly high. There are also some current uncertainties about roles and responsibilities. Even in its current state, reports are published as tentative for the last three months. After monitoring changes for a quarter, the earliest month is moved from a preliminary to a final state and the latest monthly report is added in a preliminary state. In this way the current three months of data are always shown in a "preliminary" state.

Trucking

Monthly truck ton-mile data is not available through a federal agency, so the data are obtained from the American Trucking Association using a calculated truck tonnage index. When the official data become available, the preliminary values are replaced. There is a small cost associated with purchase of these data.

Air

Aviation data are collected from the airline websites and the Office of Airline Information (OAI). Oftentimes the data are not readily available from the OAI dataset. The data change frequently, and the TSI team has to be prepared to include the changes and to replace data as the data become officially available from the airlines.

Rail

The data are obtained from FRA and do not include data from Amtrak and the Alaskan Railway Corp. Commuter rail is included in transit.

TSI Challenges and Lessons Learned

Among the challenges that the TSI effort faces is the need for continuous effort to educate the management, the public, and other potential users on the value of the measures. The TSI team has a media person who is focused on educating and communicating the use and value of the TSI.

The TSI experience also suggests that long-term funding and the ability to recruit expertise will be necessary to establish a comprehensive freight performance measurement system. As noted, the TSI project started with a team of 22 people. After the initial start-up effort, the staff was reduced to five federal employees and two consultants.

The TSI experience also illustrates that process and quality reviews are integral where data from varied sources have to be collected, analyzed, scrubbed, filtered, and then combined to create the index. Data availability has to be studied and various alternative sources of data need to be tapped. The TSI team notes that 50 percent of the data is lost through the process of data scrubbing, cleaning, and filtering prior to being included in the published TSI. Where possible, receiving processed data from the source reduces some of the data scrubbing efforts. One such example of scrubbed data is the rail data received from FRA. Also, making sure that the required data will be available through the life of the measure is important. Moreover, sometimes data are not available in time to complete all necessary tasks required to meet the tight windows of generating the monthly reports. At least one set of trade association data was only available, forcing a three-month lag for the TSI.

Data Considerations for the Freight Report Card

Based upon the findings of the literature, the case studies, and the interviews with stakeholders, the following data-quality considerations will need to be addressed in the development of a Freight System Report Card.

Use Common Definitions for Common Understanding

In order for stakeholders to generate and to use the data needed to create a set of national freight performance measures, there needs to be clarity regarding what each measure and each piece of data means. Clarity of definitions--not only for each measure, but also for the data that feeds each measure--will promote a common understanding of the data and the measure among all stakeholders. This can be accomplished by defining the metadata, that is, data that describe data.

There are many variations to the definition of metadata, but a common definition is one provided by Webopedia, which defines metadata as "Data that describes how and when and by whom a particular set of data was collected, and how the data are formatted." The TRB Final Metadata
Working Group Report 2006 states one of the many values of metadata thus:

"Metadata provides information necessary for data to be understood and interpreted by a wide range of users . . . metadata are particularly important when the data users are physically or administratively separated from the data producers. Metadata also reduce the workload associated with answering the same questions from different users about the origin, transformation, and character of the data."14

Metadata management is not an easy task, but it is essential when working with data from multiple sources and is easier to implement if formalized at the start of a project rather than enforced after the data has been pulled together from a variety of different sources. Agencies have worked independent of each other for decades and each has its own data structures, naming conventions, and formats. In the past decade, with public agencies collaborating and conducting peer studies informally, they have moved toward similar understanding and definitions of data in many areas of transportation. However, there is much that is still needed. In some of the newer areas, such as Geographic Information Systems, there is much more standardization. In order for the performance measures for the freight transportation system to be successful, the metadata for the framework should be defined early in the process.

Ensure Data Quality

Data quality is the essential component that makes data valuable to users. This includes accuracy, consistency, timeliness, and completeness. Data quality, as defined by the British Columbia Government Information Resource Management Glossary, is "the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use." Data quality refers to how closely the data can portray the real phenomena. The quality of the data is what determines if a decision maker will rely on the data to make decisions.

While the importance of having high-quality data is intuitively clear, it takes considerable effort to ensure that the quality of data is maintained. Because the quality of data will have a significant impact on decision making, a process will need to be implemented to systematically ensure quality checks of the data being used to populate the freight performance management framework.

Draw On Multiple Data Sources

Organizations developing and deploying new applications routinely use available newer technologies, databases, and programming languages, in addition to those already in use. The result is a hybrid of databases and technologies within an organization, each often having varying standards, formats, and quality. Within an organization, even when the data are managed by one central group, the data often come from multiple databases that do not necessarily communicate with each other. This issue is magnified when data management is decentralized. The explosion of data leads to many challenges with sorting, selecting, and retrieving relevant data, including scrubbing, preprocessing, and integrating data from multiple sources. With data being stored in different systems that use different technologies and databases, the challenges of communicating between databases also have to be addressed. Therefore, dealing effectively with multiple sources of data becomes a major issue when working across agencies and crossing over to accessing data from the private sector.

In measuring the performance of a multimodal freight system, formal mechanisms will need to be put in place to ensure that data derived from multiple sources or silos, covering a range of technologies, systems, and databases, are adequately preprocessed and integrated prior to populating the framework.

Adopt Data Standards and Formats

Data standards and formats play a very important role when integrating data from different sources. Some of these may seem very simple and conceptually easy to resolve, but in dealing with millions of records from multiple sources, the issues get compounded. All of these issues are resolvable, but the time required to address each of them adds to the overall analysis and preprocessing time.

One simple example of data formats involves a freight cost of $5,000.50 recorded in several databases. It would be common for one to store this information in a text format (five thousand dollars and 50 cents), another to save it in currency format but capture it as "Dollars 5000" and in yet another database to record it in a currency format, but with more detail, as "$5000.50." Several detailed steps will have to be followed in this simple example to integrate cost information from these multiple sources. A data format for the final integrated data will first have to be established. Data from each source will then have to be processed for conversion to that final format before it can be integrated. The analysis and preprocessing needed for use of such data for establishing a performance management framework will be dependent on the number of sources, which could involve numerous public and private organizations. Appropriate attention will therefore be necessary to bring together data from private and public agencies covering the multiple modes, standards, and formats involved, to ensure that the data are preprocessed appropriately for conversion to the final format established for the performance measures framework.
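
The freight-cost example above can be made concrete with a short sketch. It is only an illustration under stated assumptions, not a method described in the report: the target format is assumed to be a decimal dollar amount with two places, and the function names (`normalize_cost`, `words_to_int`) and the small word table are hypothetical helpers that cover just the three representations mentioned in the text.

```python
import re
from decimal import Decimal

# Hypothetical word tables; they cover only what this illustration needs.
_UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
_TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
_SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text):
    """Convert a phrase such as 'five thousand' (digits also allowed) to an int."""
    total, current = 0, 0
    for word in re.findall(r"[a-z]+|\d+", text.lower()):
        if word.isdigit():
            current += int(word)
        elif word in _UNITS:
            current += _UNITS[word]
        elif word in _TENS:
            current += _TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in _SCALES:
            total += current * _SCALES[word]
            current = 0
        elif word != "and":
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


def normalize_cost(raw):
    """Convert one source representation to the assumed final format (Decimal, 2 places)."""
    text = raw.strip()
    # Currency-style forms: "$5000.50", "$5,000.50", "Dollars 5000"
    m = re.fullmatch(r"(?:\$|dollars)\s*([\d,]+(?:\.\d{1,2})?)", text, flags=re.IGNORECASE)
    if m:
        return Decimal(m.group(1).replace(",", "")).quantize(Decimal("0.01"))
    # Text form: "<amount> dollars [and <amount> cents]"
    m = re.fullmatch(r"(.+?)\s+dollars(?:\s+and\s+(.+?)\s+cents)?", text, flags=re.IGNORECASE)
    if m:
        dollars = words_to_int(m.group(1))
        cents = words_to_int(m.group(2)) if m.group(2) else 0
        if cents >= 100:
            raise ValueError(f"cents out of range in {raw!r}")
        return (Decimal(dollars) + Decimal(cents) / 100).quantize(Decimal("0.01"))
    raise ValueError(f"unrecognized cost format: {raw!r}")


if __name__ == "__main__":
    for source_value in ["five thousand dollars and 50 cents", "Dollars 5000", "$5000.50"]:
        print(f"{source_value!r} -> {normalize_cost(source_value)}")
```

Note that the "Dollars 5000" record normalizes to 5000.00, not 5000.50: conversion to a common format cannot restore detail that a source never captured, which is one reason format decisions and data-quality checks need to be considered together.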
Address Data Integration

As mentioned earlier, data integration is the process of standardizing data definitions and data structures by using a common conceptual schema across a collection of data sources. Integrated data will be consistent and logically compatible in different systems or databases and can be used across time and users.

Historically, data warehousing has been a technique successfully used by organizations to bring together data from multiple sources for reporting and decision making. The Ohio DOT, an agency that is advanced in the use of performance measures, has at least five different types of databases that use various programming languages, ranging from newer languages such as Java to older languages such as COBOL. It has successfully used data warehousing to bring together data from many different applications and many different databases to provide information to assist with decision making. The model used by the Virginia and Ohio DOTs, to create a data warehouse to provide information about performance of operations and assets, has been successful. The data warehouse approach also addresses the issue raised by Minnesota DOT about parochial systems and systems that duplicate data.

In using a data warehouse, data can be extracted from different sources and the necessary logic can be applied to compute various statistical figures about performance of the measures (for example, percentage of time in a day that the traffic flow is below a specified service level). Alternatively, data may be summarized, integrated, or broken down and saved as more granular components. The granular information can then be used for the performance measure dashboard, decision support, or other reporting systems and to provide answers to ad hoc queries by users. Historical data required for trend analysis or computation of lagging and leading indicators of performance of measures can also be obtained from the data in the data warehouse.

Consider Access Time

The time taken to access information is important to the usability of any system, particularly one envisioned for highway operation data as sought for this project. The information technology industry invests millions of dollars each year in researching user behavior to improve the user's experience. If the goal is to make these performance measures available nationally for users to access for decision making, then one factor that needs to be considered for usability is the time taken from the moment a user commences an attempt to access the information to the moment when the user actually retrieves the information. As the volume of data in the system grows, the time taken to access the information will also increase. Long time periods to access information discourage users from using a system. The database design will have to take into consideration the access time and also design for both active and dormant data. The design should consider a tiered approach to data storage in which cheaper storage is used for less frequently used data, while frequently accessed data could be on high-performing disk storage. Backup and recovery processes should be formalized, tested, and implemented from the very beginning.

Plan for Archived vs. Real-Time Data Needs

In addition to archived data, the measures for the performance of the freight transportation system could include real-time systems such as Intelligent Transportation Systems (ITS). In the Freight Information Real-Time System for Transportation (FIRST), ITS was used by the Port Authority of New York and New Jersey from mid-2001 until December 2003 to provide real-time freight information.15 The Port of Vancouver has also successfully used ITS to improve freight movement. In both instances, data quality and availability of data were among the items listed that required attention for successful deployment of the systems.

According to the USDOT, "ITS can facilitate the safe, efficient, secure, and seamless movement of freight. Applications being deployed provide for tracking of freight and carrier assets such as containers and chassis, and improve the efficiency of freight terminal processes, drayage operations, and international border crossings."

The architecture required to report summary data is different from that required for real-time decisions or trend analysis. Any real-time or near real-time information that is required will need to consider additional factors such as data latency, frequency of refresh, and the frequency at which data need to be presented for each measure. Near real-time measures will require that data be captured, cleansed, and loaded in near real-time.

Plan for the Sustainability of the Framework

For continuity of decision making, it is important that any set of measures be sustained beyond the initial deployment and continue to provide timely and accurate information during the entire period of its use or life. The purpose of freight performance measures is to provide information to allow decision makers to make informed decisions and for users to see the performance of the freight transportation system not for the short term, but for several years. This can happen only if the framework is sustainable and available for the period of its intended use. Sustainability involves ensuring that timely, accurate data are available, that they can be easily accessed by