
Pages 43-74



From page 43...
4 Overview of Approach

This section of the report presents the methodology and findings from a comprehensive research effort assessing the current state of practice in data management (emerging transportation technology data in particular) for transportation agencies.
From page 44...
Finally, the findings from the research were synthesized and then presented, reviewed, and discussed at a stakeholder validation workshop. Participants at this workshop were able to see the preliminary data management framework that had been developed and to provide further commentary from their personal experience.
From page 45...
IT communities, university IT department representatives, the National Association of State Technology Directors, DC Web Women, and the Association for Information and Image Management), feedback from DOTs on challenges and success factors for implementing data systems, and industry trends in the use of data systems (including technology recommendations for building sustainable platforms, data governance guidance, software deployment guidance, and acquisition recommendations).
From page 46...
The objectives of Integrating Emerging Data Sources into Operational Practice (Gettman et al., 2017) were to provide agencies responsible for traffic management with an introduction to big data tools and technologies that could be used to aggregate, store, and analyze new forms of traveler-related data; to identify the challenges and options to consider when compiling, using, and sharing these data; and to describe ways the tools/technologies could be integrated into existing systems.
From page 47...
very high-level common data curation models supporting the data lifecycle and the design of data pipelines running on ITS data infrastructure. It should be noted that the concepts and recommendations presented here are described in more detail in NCHRP Research Reports 865 and 904.
From page 48...
they face many technical and institutional challenges in doing so. These challenges include a lack of trust in externally collected data, failure to view data improvement as a priority, and overly restrictive data use agreements with public and private partners.
From page 49...
Structural health monitoring systems involve different types of sensors that result in the collection of massive volumes of data with diverse and complex data types (e.g., video images, traffic information, weather data).
From page 50...
Figure 12. Big Data Framework for Smart Grids (Daki, El Hannani, Aqqal, Haidine, & Dahbi, 2017)
From page 51...
Figure 13. Agencies Responding to the Online Survey

Table 3 provides a detailed breakdown of responses by type of emerging transportation technology project.
From page 52...
heatmap that there were three very incomplete survey responses (rows where nearly all columns are red).
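Incomplete responses like those visible in the heatmap can also be flagged programmatically before analysis. A minimal sketch follows; the field names, sample records, and the 0.75 completeness threshold are illustrative assumptions, not taken from the survey instrument:

```python
# Flag survey rows where most answers are missing (the "mostly red" rows
# in the heatmap). Threshold and sample data are hypothetical.

def incomplete_rows(responses, threshold=0.75):
    """Return indices of responses missing more than `threshold` of their fields."""
    flagged = []
    for i, row in enumerate(responses):
        n_missing = sum(1 for value in row.values() if value is None)
        if n_missing / len(row) > threshold:
            flagged.append(i)
    return flagged

sample = [
    {"agency": "DOT A", "project": "SPaT", "data_use": "planning",
     "documentation": "yes", "sharing": "open"},
    {"agency": "DOT B", "project": None, "data_use": None,
     "documentation": None, "sharing": None},  # nearly empty response
]
print(incomplete_rows(sample))  # → [1]
```

A stricter threshold (e.g., 0.9) would keep the second record, so the cutoff should be chosen to match how "very incomplete" is defined for the survey at hand.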
From page 53...
Figure 15.
From page 54...
Figure 17.
From page 55...
Figure 19. How is Your Agency Planning to Use the Data Being Collected?
From page 56...
scope. While the survey results offer a glimpse into these projects, it is difficult to draw many conclusions beyond the overall lack of documentation and information available.
From page 57...
4.2.2.4 How Are the Data Being Used?

In general, the data collected by transportation agencies involved in the deployment and testing of emerging transportation technology projects are not currently being used to any great extent.
From page 58...
There is an awareness among some states and cities that these overly restrictive implementations of the department-led curation model will not be sustainable and that a more advanced form of data curation process will need to be implemented; however, as of now, no significant progress has been made in developing more modern data curation models or adapting them to connected vehicle and smart city use. This may be due, in part, to the fact that very few public agencies employ data scientists, or that data management is not a prioritized focus of these projects.
From page 59...
the inevitable flood of data. In addition, many agencies lack the technical, cultural, policy, and legal experience needed to deal with such data and are currently relying on their contractors for big data management.
From page 60...
Table 4. Project Documentation Noted by Survey Respondents

Project                          | # of Responses | No Documents | Yes, but Cannot Share | Can Share, but Did Not | Shared Documents
JPO CV Pilot                     | 0  | 0  | 0 | 0 | 0
JPO CV Testbeds                  | 1  | 1  | 0 | 0 | 0
DOT Smart City                   | 1  | 0  | 0 | 0 | 1
FHWA ATCMTD                      | 2  | 0  | 0 | 0 | 2
SPaT Challenge                   | 10 | 4  | 1 | 2 | 3
FTA MOD                          | 3  | 1  | 1 | 1 | 0
FHWA-FTA ATTRI Applications      | 4  | 2  | 1 | 0 | 1
Crowdsourcing Using Social Media | 3  | 2  | 0 | 1 | 0
Other                            | 4  | 0  | 0 | 1 | 3
Totals                           | 28 | 10 | 3 | 5 | 10

Table 5.
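Tallies like the Totals row in Table 4 are easy to cross-check in a few lines of code. The sketch below encodes the counts reported above (column order: total responses, no documents, cannot share, did not share, shared) and verifies that each project's response count equals the sum of its status columns:

```python
# Cross-check Table 4: each project's response count should equal the sum of
# its four documentation-status columns, and the column sums should match the
# reported Totals row (28, 10, 3, 5, 10).

rows = {
    "JPO CV Pilot":                     (0, 0, 0, 0, 0),
    "JPO CV Testbeds":                  (1, 1, 0, 0, 0),
    "DOT Smart City":                   (1, 0, 0, 0, 1),
    "FHWA ATCMTD":                      (2, 0, 0, 0, 2),
    "SPaT Challenge":                   (10, 4, 1, 2, 3),
    "FHWA-FTA ATTRI Applications":      (4, 2, 1, 0, 1),
    "FTA MOD":                          (3, 1, 1, 1, 0),
    "Crowdsourcing Using Social Media": (3, 2, 0, 1, 0),
    "Other":                            (4, 0, 0, 1, 3),
}

# Row check: total responses == sum of the four status columns.
for project, (total, *statuses) in rows.items():
    assert total == sum(statuses), project

# Column check: summing each column reproduces the Totals row.
totals = tuple(sum(col) for col in zip(*rows.values()))
print(totals)  # → (28, 10, 3, 5, 10)
```

The same pattern (row sums vs. a reported totals row) is a cheap sanity check for any survey cross-tabulation before it is published.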
From page 61...
Table 6. Documentation Available from Other Emerging Transportation Technology Projects

Columns: Project Name; Type of Documentation (DMP, System Diagram, Open Data, Metadata, Data Quality, DPP, Data Security, Data Retention, Data Definition, Archived Data).

New York City CV Pilot: ✓ ✓ *
From page 62...
Table 7. Big Data Benchmark and Assessment Methodology

Columns: Focus Area; Low Benchmark; Moderate Benchmark; High Benchmark.

Data Collection
- Low Benchmark: Little or no data collected; data collected is in outdated or proprietary formats; data collected is not relevant; source data are deleted or modified; PII is collected in an insecure process; no documented data collection procedures.

Data Modeling & Design
- Low Benchmark: Did not reference any existing models or frameworks when designing data workflow; no data usability assessments performed; model does not allow for ad hoc data augmentation or other continuous development practices; model does not include any data masking techniques.
From page 63...
(Table 7, continued)

Data Security
- Low Benchmark: Sensitive information stored in plaintext; no privacy filters; no network encryption or endpoint protection; rigid authentication structure hinders authorized use of data; insecure authentication process fails to prevent unauthorized use of data; a great amount of time and effort required to grant access to a new user.
- Moderate Benchmark: Sensitive information stored in a somewhat secure manner; some privacy filters and/or encryption employed for PII; basic level of network and endpoint security; authorization structure somewhat hinders authorized use; outdated or inadequate authentication process fails to fully secure data; some amount of time and effort required to grant access to a new user.
- High Benchmark: All sensitive information fully secured from collection to data product; privacy filters and other safeguards applied at time of collection; all relevant security software and procedures employed; fluid and convenient authorization structure; authorization process is up to date and fully prevents all unauthorized use; easy to grant new users access when warranted.

Data Quality
- Low Benchmark: Data quality is unknown; no data quality rankings performed; no data quality dashboards available; no processes in place to flag low quality data.
- Moderate Benchmark: Data quality is somewhat known; basic data quality rankings performed; data quality dashboard available; manual processes in place to flag low quality data.
- High Benchmark: Data quality is fully known and actively monitored; detailed data quality rankings performed; data quality dashboard and other tools available; both automated and manual processes used to flag low quality data.

Data Governance
- Low Benchmark: 3rd party owns data and severely restricts use; very high cost to use data; data can only be analyzed with a small number of proprietary tools; data system is ineffectively managed; no data management software; no data management plan; no one in-house is familiar with the system; no system monitoring.
- Moderate Benchmark: Organization owns data but with rigid use restrictions; high cost to use data; data can only be analyzed with a handful of tools; data system is adequately managed; sufficient data management software is used; some data management plan has been written; a few people are familiar with the system; system activity dashboards available.
- High Benchmark: Organization owns data and sensibly guides use; reasonable cost to store and use data; data can be analyzed with any number of tools; data system is well managed; optimal data management software in use; data management plan is sensible and frequently updated; multiple in-house experts on the system are available; both reactive and proactive monitoring is in place.

Data Integration & Interoperability
- Low Benchmark: Data cannot be integrated into new systems without significant effort; each system uses its own data type and siloed data source; no uniform organization plan, and folder structures are unique to each dataset; no uniform classification taxonomy.
- Moderate Benchmark: Data must be converted into a new format before integrating into a new system; some systems connect to the same data source; some, but not all, datasets are organized using the same folder structure; some datasets use similar classification taxonomies.
- High Benchmark: Data can be integrated without conversion or modification; all systems connect to a single data source; all datasets organized using a single planned folder structure; all datasets conform to a single, documented classification taxonomy.

Data Warehousing & Business Intelligence
- Low Benchmark: Business needs are not met, and business leaders see data as worthless; few or no useful BI products or visualizations; BI products are "siloed" to where they are only used by the original stakeholders for the original use case.
- Moderate Benchmark: Some business needs are met, but little worth is perceived; some useful BI products or basic visualizations; some BI products and visualizations are infrequently shared among stakeholders.
- High Benchmark: All stakeholders derive real value from the data; all current needs satisfactorily met by a suite of BI products and visual dashboards; BI products and processes are regularly reviewed so that the most successful can be shared and emulated.
From page 64...
(Table 7, continued)

Data Analytics
- Low Benchmark: Data must be copied to other systems for analysis; analysis results written across many different systems; no data streaming processes; software outdated or proprietary and cannot handle big data.
- Moderate Benchmark: Some data must be copied to other systems for analysis; analysis results written on a separate system from source data; little or no data streaming capabilities; software used is modern and open source but not designed for big data.
- High Benchmark: All analysis possible on the same system as source data; all analysis results written to the same system as source data; system designed to handle streaming data in real time; software designed specifically to handle big data.

Data Development
- Low Benchmark: No data products being developed; no review processes being performed; no in-house knowledge of data development processes.
- Moderate Benchmark: Some data products being developed; reliant upon third party for development and support; not all data views available; data products meet some needs of the organization.
- High Benchmark: Many data products being used and developed; in-house experts can write and understand the code; all relevant data views available; data products meet all current needs of the organization.

Documents & Content
- Low Benchmark: No documentation maintained; documentation is never reviewed; no clear ownership of documentation responsibilities; no automated processes to update data documentation.
- Moderate Benchmark: Some documentation maintained in an offline format; some documentation is sporadically reviewed; documentation ownership is clear but there is no incentive for owners to keep documentation updated; automated status checks made but no automated document editing available.
- High Benchmark: Detailed documentation is updated regularly; documentation available in an online, web-based format; all documentation is regularly reviewed, revised, and updated; owners encouraged to manage content regularly; automated processes regularly update web documentation with information extracted from live datasets.

Reference & Master Data
- Low Benchmark: No reference diagrams or documentation maintained; no visual representation of dataset relations; master data values are inconsistent.
- Moderate Benchmark: Some reference documentation maintained in an offline format; visual representation exists but is outdated or otherwise inaccurate; master data values exist in multiple siloed locations but are consistent.
- High Benchmark: Detailed documentation is available in an online, web-based format that is updated regularly; visual representation exists in a regularly updated and highly legible format; master data values are stored and managed in one accessible location.

Metadata
- Low Benchmark: No metadata catalog; all metadata are dataset dependent; metadata are never made available to data users.
- Moderate Benchmark: Metadata catalog only applies to some data; some groups of similar datasets are augmented with similar metadata fields; some users may be able to access metadata fields applied to some datasets.
- High Benchmark: Metadata catalog used for all applicable data; all datasets are augmented with the same well-documented metadata fields wherever possible; all metadata for all datasets, along with associated documentation, is made available wherever appropriate.

Data Dissemination
- Low Benchmark: Data are unavailable to all but a few users; no thought given to implementing open data policies; no shared data products.
- Moderate Benchmark: Data are unavailable to some users who could benefit from them; some open data policies in use; some data products are shared via an unmonitored process.
- High Benchmark: Data are available for use by all users for whom they are relevant; open data policies applied wherever possible; all relevant data products are shared with authorized users whose usage is monitored and who may bear some of the costs involved.
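A benchmark rubric like the one above can be applied by scoring each focus area on a three-point scale (low, moderate, high) and summarizing the result. The sketch below shows one way to do that; the focus-area scores are hypothetical examples, not assessments from the report:

```python
# Score each focus area as 1 (Low), 2 (Moderate), or 3 (High benchmark) and
# summarize an agency's overall data management maturity.
# The example scores are hypothetical, not taken from the report.

LEVELS = {1: "Low", 2: "Moderate", 3: "High"}

def summarize(scores):
    """Return (mean score, count of focus areas at each benchmark level)."""
    counts = {name: 0 for name in LEVELS.values()}
    for area, score in scores.items():
        counts[LEVELS[score]] += 1
    mean = sum(scores.values()) / len(scores)
    return mean, counts

hypothetical = {
    "Data Collection": 2,
    "Data Security": 1,
    "Data Quality": 1,
    "Data Governance": 2,
    "Metadata": 3,
}
mean, counts = summarize(hypothetical)
print(round(mean, 2), counts)  # → 1.8 {'Low': 2, 'Moderate': 2, 'High': 1}
```

Reporting the distribution of levels alongside the mean matches how the report frames its findings: the question of interest is whether scores cluster high, low, or mixed across agencies, not any single agency's number.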
From page 65...
It is important to note that the purpose of performing these assessments was not to single out individual agencies for corrective action, but rather to identify the state of practice across all the agencies generally. It is most useful to know whether most agencies scored high, scored low, or showed a mix of high and low scores.
From page 66...
Considering the assessment results, most organizations are, at best, in the early stages of developing modern ways of managing emerging technology data. Typically, organizations deploying emerging transportation technologies have started thinking about open data policies, privacy protection, and data collection but have not yet finalized their documentation or procedures.
From page 67...
It should be noted that the development of the benchmarking methodology (from the foundation principles of modern data management) led to the development of a Data Management Capability Maturity Self-Assessment (DM-CMSA).
From page 68...
Following the introduction, there were four cycles focusing on each of the four data lifecycle management components: "create," "store," "use," and "share." Each cycle began with an overview of the framework component, followed by a break-out session gathering feedback related to that component and a report-out session where the most relevant insights were shared across all break-out groups. Each of the break-out groups consisted of 5-7 attendees and one moderator, and the members of each group shifted to create as many varied discussions as possible.
From page 69...
o Many have little to no understanding about big data and do not see a problem to solve using big data; as such, developing a modern, big data environment is perceived as nothing more than a new cost.
o Most do not have the bandwidth to learn about big data on top of existing responsibilities.
From page 70...
o The cost of acquiring new datasets limits how much agencies can access.
o When vendors negotiate directly with elected officials and other leadership, it precludes transportation agencies from having the opportunity to negotiate for data access.
From page 71...
o Many organizations have no approved "Data Scientist" pay scale, leading them to approve significantly lower salaries as compared to those in the private sector.
o Traditional data expertise is abundant, while modern, big data expertise remains scarce.
From page 72...
o Organizations require some consideration of both their current use cases and future data needs.
□ Communication
"If executives don't understand it in 5 minutes, it gets pushed aside."
o Analysts, data teams, and IT professionals need to be able to communicate their data initiatives in clear business terms that executives and elected officials can understand.
From page 73...
analysts, engineers, and front-line decision-makers, and not just be a conversation between the vendor and limited groups within an organization.
o Legal teams need to understand data contracts, especially the need to maximize ownership of the data while minimizing liability.
From page 74...
o Appropriate compensation categories are needed for analysts, engineers, and data scientists.
o Internal data teams need to be appropriately sized.
