Implementing Information Findability Improvements in State Transportation Agencies

CHAPTER 2 The Pilot Projects

Pilot projects were conducted at the Washington, Utah, and Iowa State DOTs. The WSDOT pilot demonstrated ways to improve search and navigation for engineering manual content. The UDOT and IADOT pilots demonstrated techniques for automating tagging of documents to improve their findability. Each of the pilot projects is described in the following sections.

2.1 WSDOT Pilot

The WSDOT findability pilot was designed to build on a separate ongoing Manual Modernization effort at WSDOT to improve management and findability of content within WSDOT's engineering manuals. This effort was originally inspired by Missouri DOT's Engineering Policy Guide wiki that combined content from their various manuals. Because WSDOT's manuals are produced in PDF, there was no easy mechanism to search across the body of manuals to find information related to a particular topic. WSDOT's vision was to enable search and navigation to the content of interest across its various manuals.

Preliminary work had looked at explicit cross-references built into the manuals to gain an understanding of interrelationships across the different manuals. WSDOT was interested in further exploring connections across the manuals by implementing a pilot demonstration system within their web platform (Drupal). WSDOT had selected eight manuals to include in this pilot demonstration; each manual had content related in some manner to stormwater management.

The WSDOT findability pilot included the following activities that augmented the agency's Manual Modernization effort:

• Implement an unsupervised machine learning technique (cluster analysis) to identify common topics or themes across the manuals and understand how these topics are distributed in different manual chapters;
• Demonstrate an algorithm to partially automate the process of splitting up manual sections into subsections or "chunks" of text to be displayed on individual web pages;
• Create a set of categories or "facets" to serve as filters of manual content based on likely ways that manual users would want to search and navigate the consolidated body of content;
• Develop a sample ontology (set of terms with associated semantic relationships) based on the selected facets;
• Demonstrate auto-classification of individual manual sections based on selected terms in the ontology;
• Work with subject matter experts to identify compelling search/discovery use cases for the manuals website and validate the ontology; and
• Create a video demo of several use case scenarios showing how the ontology adds value for search and discovery.

A screenshot of the WSDOT Manual Modernization Pilot website is shown in Figure 2.

Figure 2. WSDOT's Manual Modernization Pilot website: search results page.

2.2 UDOT Pilot

UDOT's pilot also built on an existing initiative. The agency was piloting implementation of a commercially available information indexing and discovery tool called "Knowvation." This tool includes full-text search, faceted navigation, and spatial (map-based) search capabilities. It also has the capability of "crawling" specified disk locations and building an index of content while leaving the content in place. UDOT was interested in initially testing this tool to provide a way for planners working on corridor studies to search across multiple agency repositories to find data sets and documents relevant to the corridor.

The Knowvation pilot indexed content from UDOT's ProjectWise engineering content management system, a shared file drive in one of the agency's regions, and the agency's repository of data sets. The Knowvation tool provided a way to search spatially, by project number, and by file type (e.g., PDF and XLS), but it did not have the capability to filter search results by content types such as as-built plans, contracts, and studies. The original repositories did not categorize documents based on content type, so there was no source of information for this type of filter.

The UDOT findability pilot was designed to add value to the Knowvation implementation by creating an automated approach to identifying selected content types. Six content types were included in the pilot: agreements, project concept reports, design exceptions, quitclaim deeds, warranty deeds, and highway easement deeds. The pilot also tested techniques for automated extraction of project numbers and other important metadata elements from documents. Automatically extracted metadata could be used to fill in gaps in the Knowvation metadata for documents sourced from locations (such as file drives) that do not provide this metadata. A screenshot of the Knowvation system is shown in Figure 3.

Figure 3. UDOT's Knowvation search tool.

The UDOT pilot also included a separate effort to replicate the cluster analysis technique applied in the WSDOT pilot for engineering manuals. This part of the pilot explored transferability of cluster analysis results across DOTs.

2.3 IADOT Pilot

IADOT was in the process of upgrading their legacy Electronic Records Management System (ERMS) that had long served as the agency's official system of record for construction project (and other) records. They were interested in ways to streamline the document intake processes for this system. Document sources included the ProjectWise repository, used during the project design process to store design files and related correspondence, and DocExpress, used to support eConstruction and facilitate secure document exchange with external partners.

Processes were in place to ensure that all project plans were entered into the ERMS at the point of letting. However, subsequent changes in plans (including as-builts) and other project documents were not necessarily included. There was an effort made to ingest authoritative copies of files from ProjectWise and DocExpress documents into the ERMS, but this required considerable manual effort because of the lack of consistent metadata. Consistent metadata were needed to enable efficient entry in the ERMS. Consistent metadata were also key for determining whether documents were already in the ERMS.

The scope of the IADOT findability pilot was to test methods for auto-classification of and metadata extraction from plan and proposal documents in DocExpress. This would address IADOT's current pain points related to registering documents from DocExpress in the ERMS and ensuring that the ERMS has the most authoritative project information. Auto-classifying documents and extracting key metadata elements should help to streamline the process of checking whether the DocExpress contents have already been registered in the ERMS.

IADOT was interested in extracting project identification numbers (PINs), project numbers, and work types. While project locations are also important, IADOT maintains good data on the locations associated with each project. If a project number is available, then a data service can be used to identify the location.

2.4 Summary of Findability Improvement Techniques Demonstrated

The WSDOT pilot demonstrated a solution to the common challenge of having multiple, separately published but interrelated engineering guidance documents in an agency without a reliable way to find definitive information about a given topic. It showed the potential value of (1) moving from a library of stand-alone PDF documents to a searchable body of web content and (2) integrating standard terminology with definitions and relationships across terms into a search tool. The pilot system showed promise for improving users' ability to quickly search and navigate a complex body of technical content and find answers to their questions.
See Section 5.1 for a description of how the WSDOT pilot system was created.

The WSDOT and UDOT pilots demonstrated use of an analysis technique for clustering a set of documents (in this case, chapters of engineering manuals) by topic area. By grouping words and concepts according to their relatedness, cluster analysis can discern the major topic areas in a collection of manuals. This technique can be used to inform creation of subject area categories for search and navigation and to understand the degree to which certain topics are covered within multiple documents. It can also be used to identify commonly appearing terms for a given subject area for incorporation into controlled vocabularies or auto-classification rules. See Section 5.2 for further information on the unsupervised machine learning technique used in the pilots for conducting the cluster analysis.

The UDOT and IADOT pilots demonstrated that automated or semi-automated document classification and metadata extraction can be viable alternatives to manual metadata creation, which is both time-consuming and often inconsistent. The ability to create metadata that otherwise would not be available improves existing content management system and search tool capabilities, helping users to quickly locate information required for their jobs. See Sections 5.3 and 5.4 for further information on document classification and metadata extraction techniques used in the pilots.
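
As a rough illustration of the document classification and metadata extraction idea summarized above, the following Python sketch assigns one of the six UDOT pilot content types using keyword rules and pulls project numbers out of the text with a regular expression. It is not the method used in the pilots (those are described in Sections 5.3 and 5.4). The keyword phrases, the project number format, and the function names `classify`, `extract_project_numbers`, and `build_metadata` are all assumptions made for this example.

```python
import re

# Hypothetical keyword rules: the phrases are illustrative assumptions,
# not the rules used in the UDOT or IADOT pilots.
CONTENT_TYPE_RULES = {
    "quitclaim deed": ["quitclaim deed", "quit claim deed"],
    "warranty deed": ["warranty deed"],
    "highway easement deed": ["highway easement"],
    "design exception": ["design exception"],
    "project concept report": ["concept report"],
    "agreement": ["agreement"],
}

# Hypothetical project number format (e.g. "F-0089(123)45"). Real agency
# formats differ and would need their own expressions.
PROJECT_NUMBER = re.compile(r"\b[A-Z]{1,4}-\d{3,4}\(\d{1,3}\)\d{0,3}")


def classify(text):
    """Return the best-scoring content type, or None when no rule fires."""
    lowered = " ".join(text.lower().split())  # normalize case and whitespace
    scores = {
        label: sum(lowered.count(phrase) for phrase in phrases)
        for label, phrases in CONTENT_TYPE_RULES.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the earlier rule
    return best if scores[best] > 0 else None


def extract_project_numbers(text):
    """Return unique project numbers in order of first appearance."""
    seen = []
    for match in PROJECT_NUMBER.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def build_metadata(text):
    """Combine classification and extraction into one metadata record."""
    return {
        "content_type": classify(text),
        "project_numbers": extract_project_numbers(text),
    }


sample = """WARRANTY DEED
Project No. F-0089(123)45
The Grantor hereby conveys and warrants to the Department ...
This Warranty Deed relates to Project No. F-0089(123)45."""

record = build_metadata(sample)
print(record)
```

A record like this could be used to fill metadata gaps for files that come from sources, such as shared drives, that carry no content type or project number. In practice, rules of this kind would need review by subject matter experts and testing against a labeled sample before being relied on.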