Appendix B: Interview Guide and Questions

Appendix B contains the script and questions for the telephone interviews conducted as part of the data gathering for this project.

Interview Script and Questions

Hi, my name is ____ from ____. Thanks for taking time out of your busy schedule to meet with us. As you're aware, we're conducting this interview on behalf of NCHRP Project 08-116: "Framework for Managing Data from Emerging Transportation Technologies to Support Decision-Making." Based on our initial research and your survey responses, we identified your agency and project as being on the cutting edge of data management to support decision-making for emerging transportation technology deployments. We wanted to take the next ____ to walk through a series of interview questions to learn more about how you are managing data for your project.

Your input is important to help guide the project team in establishing a framework with specific procedures for identifying, collecting, aggregating, analyzing, and disseminating data from these early tests and deployments. It will also ensure that the challenges, successes, and lessons experienced by early adopters are captured and communicated to peer groups. The framework will focus on data management for emerging transportation technologies, including connected vehicles, automated vehicles, on-demand and shared mobility services, smart city technologies, accessible transportation technologies, and other emerging technologies.

While I have a list of interview questions, the intent is to keep the discussion informal. Please note that individual respondents will not be identified by name unless you give express permission. If you have any questions regarding this NCHRP project, or would like additional information, please feel free to contact me. I'll be providing you with my contact information following this interview. Again, thank you for volunteering to participate in this interview.
If you don't have any questions, let's get started.

A. Overview

1. Can you provide a brief overview of your emerging technology project? What is the scope of the project and what is the role of data in supporting decision-making for your agency and other end users?
   Note: Information should be pre-populated based on survey results. If needed, you can ask if there is a website or documentation that the project team would be willing to share with the NCHRP project to learn more about their emerging technology project.
2. What is the status of your emerging technology project? For example, is the project in the initial planning phase, design phase, or operations phase?
   Note: Again, if we already have this information from the survey, we can simply ask the interviewee to confirm the information.
3. Before we dive into some more data-specific questions, can you take a minute to discuss your role in the project? How familiar are you with the data that is being collected and how it is being managed as part of the project?
   a. Is there someone else in your agency or working on the project that you believe would be beneficial for us to contact to learn more about the data and how it is being managed?

B. Data Collection (Components: Data Collection)

4. Who (i.e., what agency/organization) is collecting data for this project?
5. What types of data are you collecting and how is it being collected?
   a. Describe the nature of the data being generated/collected.
   b. Do you have procedures and policies to manage data collection? If documented, would you be willing to provide them to the NCHRP team?
   c. How are data collected (from the point where they are generated to the point where they are stored)? If you have any documentation or diagram showing this process, would you be willing to provide it to the NCHRP team?
   d. Is the source data ever directly modified after it is collected?
   e. What type of software is being used to collect the data?
6. What file format(s) are you collecting data in? (e.g., XML, binary, JSON, CSV)
   a. Do you have the ability to provide a sample of the raw data? If so, please provide it.
7. Can you speak to the amount of data that is being generated by your project?
   a. How much data is being generated/collected hourly/daily?
   b. How often is new data being generated by the system?
8. Are there any institutional, policy, and/or technical challenges related to data collection that you have encountered? And how have you overcome these challenges?
   a. Are there any resource, policy, legal, or other requirements, restrictions, or limitations surrounding the data collection?

C. Data Storage (Components: Data Storage, Data Architecture, Metadata)

9. Who (what organization or organizations) is responsible for the storage of the data?
10. Where and how is your data stored?
   a. At what level is the data being stored?
      e.g., event level, vehicle level, road level; temporal level (millisecond, minute, hour, month, year); geographic level (geo-coordinate, address, FIPS code, zip code, county, state)
   b. In what format is the data stored? (e.g., spreadsheet, flat files, relational database, NoSQL database; in-house vs. cloud; third-party)
   c. How long do you keep the stored data?
11. What portion of the data is stored for analysis (readily accessible) vs. stored in archive (not readily accessible)?
   a. What is the volume of the data being stored for analysis: megabytes (e.g., spreadsheet, PDF), gigabytes (e.g., relational databases), terabytes (e.g., large relational databases), petabytes (e.g., NoSQL databases)?
   b. What is the volume of the data being stored in archive: megabytes (e.g., spreadsheet, PDF), gigabytes (e.g., relational databases), terabytes (e.g., large relational databases), petabytes (e.g., NoSQL databases)?
12. Do you have any legal or policy restrictions regarding what data you can store?
   a. What is the nature of the data being stored for analysis in comparison to the data being collected? If not the same as what's being collected, can you provide a sample of the data?

D. Data Organization (Components: Data Architecture, Database Operations)

13. How are data organized within the system?
   a. What type of database(s) do you use? (Relational Database, NoSQL Database, etc.)
   b. Do you have a metadata catalog? If so, please share.
   c. Do you have a data schema (database schema, XML schema, JSON schema, ontology, etc.)?
      a. How are the folders containing data organized?
      b. Do you have a naming convention for the data files?
      c. Are the tables well formed with one column per feature, one row per observation, and one table per type of observational unit?
      d. Is the data geo-spatially located?
      e. What dimensions (other than temporal and spatial) is your metadata composed of?

E. Data Resilience (Components: Data Architecture, Database Operations, Data Storage)

14. How is the data backed up and preserved?
   a. How often are backups performed?
   b. Is at least one backup offsite or cloud-based?
   c. How much time would it take to recover the data if the primary storage system suffered a total failure?
   d. Is the data being versioned (can earlier versions of the data be retrieved)?
15. Do you have a disaster recovery plan? If so, would you be willing to share it with the NCHRP team?

F. Data Security (Components: Data Security, Data Storage)

16. How is the data secured?
   a. Who controls access to the data and how is data access controlled?
      a. Who is allowed to enter (write) data?
      b. Who has (or can get) access to (read) the data?
      c. How does a user authenticate when accessing the data? (Password? Two-factor authentication?)
      d. What process is followed when a new user needs access to the data?
   b. What security measures are in place to prevent unauthorized access to the systems that handle sensitive data?
   c. How is the data initially secured at time of collection?
17. Does any of the data you are collecting contain personally identifiable information (PII)?
   a. What type of PII is being collected?
   b. What type of protection is used for PII? Is the data deleted? Is it stored as a hash? Separated into its own table?
   c. What processes are in place to anonymize PII and sensitive data?

G. Data Quality (Components: Data Quality, Data Collection, Data Governance, Database Operations)

18. What methods do you use to verify or manage the quality of the collected data?
   a. How is the quality of the data being managed in the data store (e.g., schema, data audits)?
   b. What dashboards or other monitoring processes are in place to identify data quality?
   c. Do you employ a data quality ranking system?
   d. How is the data quality initially managed at the time of collection? How is the history and origin of each data point being tracked from raw collection of data to storage (i.e., data provenance)?
19. How do you manage lower quality data?
   a. When low quality data are identified, are they deleted or managed in some other way?
   b. What automated processes do you have in place to detect and flag low quality data?
   c. Is there a system in place to empower all data users to manually flag low quality data they find?

H. Data Governance (Components: Data Governance, Database Operations)

20. Who owns the data and is responsible for its management?
   a. Are users restricted in what tools they may use with the data?
   b. Are there any service providers that contractually own or control use of the data?
   c. Are there costs associated with accessing the data?
      e.g., public/free, one-time fee, subscription-based, pay as you go
21. What, if any, institutional and policy barriers are you facing in governing and managing the data?
   a. What data management challenges have been encountered?
      e.g., lack of standards, privacy, security, legal

I. Data System Architecture (Components: Database Operations, Data Storage)

22. Describe the systems that interact with your data.
   a. Do you have a graphical representation of your system?
   b. Is any piece of software or architecture in use closed source or proprietary?
23. Is the architecture managed in-house or is it cloud-based?

J. Data System Management (Components: Data Governance)

24. Do you have a data management plan for the data store? If so, would you be willing to share the Data Management Plan with the NCHRP team? If a DMP doesn't exist, why not?
25. What data management software is being used (if any)?
   a. Are any dashboards in place for monitoring system activity?
   b. Are any proactive measures in place to detect and/or prevent abuse of the data environment?
26. Who maintains the systems that interact with your data?
   a. Does your organization employ experts in-house to maintain these systems, or is that work contracted out?
   b. How is user activity tracked within the system?

K. Data Documentation (Components: Documents & Content, Reference & Master Data)

27. What documentation do you have for these processes and how is that documentation managed?
   a. How frequently is the documentation reviewed and updated?
   b. Where is the documentation located?
   c. How is the documentation accessed?
   d. What format is the documentation in? (PDF file, web-based, etc.)

L. Data Product Development (Components: Data Development)

28. Is your project applying agile software development processes (or iterative development processes) where a working model is put in place and then repeatedly improved upon?
29. What are the data products your pilot is generating?
   a. Who is developing the software for your project's data products?
   b. Do you have any data views (reports, dashboards, visualizations, etc.) you can share?
   c. As part of the data products, can raw data be queried or downloaded? Are data products persistent or temporary?
M. Data Processing and Analysis (Components: Data Analytics, Data Development)

30. Who analyzes the data and how do they interact with the data?
      e.g., APIs, dashboards, query tools, custom programs to analyze the data
   a. Are there costs associated with accessing the data?
      e.g., public/free, one-time fee, subscription-based, pay as you go
31. What types of analyses are being performed?
   a. What specific kinds of analysis are associated with the data?
      e.g., aggregate, merge, update, modify, match and compare, classify, forecast, filter
   b. What is the nature of the data analysis on the system?
      e.g., same queries at scheduled intervals, ad hoc queries, streaming analysis
   c. What data are able to be analyzed in real time?
   d. Are the results of any analysis written to the same system as the data?
   e. Does the data need to be copied to a new location for analysis?
32. What tools are being used for the analyses?
   a. What tools are in place to enable "big data" analysis?
   b. Is the software used by data analysts open source or proprietary?

N. Data Product Dissemination (Components: Data Security, Data Storage)

33. Who are the current and future data consumers?
34. How do you plan on making the data product(s) generated by your pilot available to its current or future users?
   a. Is the data going to be restricted to a single entity, or will it be available publicly in some form?
   b. If an unexpected new data consumer requests access to the data, what process will grant them access?
35. Does your agency have an Open Data Policy? And if so, can you describe your policy?
   a. Were there any institutional, policy, and/or technical challenges related to setting up your agency's Open Data Policy that you can share with us? And how have you overcome these challenges?

O. Wrap-up

That concludes our interview. Thank you for your participation. Your input is extremely valuable to the NCHRP team and will be instrumental as the team develops its framework to help state and local transportation agencies manage data from emerging transportation technologies. If you have additional thoughts or would be willing to share project-related documentation with us, please feel free to share them with us by email at ____. Thanks again for your time and have a great day!