


Pages 9-65



From page 9...
... NCHRP Web-Only Document 279: Information Findability Implementation Pilots at State Transportation Agencies
2.0 Washington State DOT Tests
2.1 Planning and Scoping
Round 1
The WSDOT findability test was designed to build on a separate, ongoing WSDOT effort to improve the management and findability of engineering manual content. That effort was originally inspired by another state DOT's implementation of a wiki combining content from its various manuals.
From page 10...
... • Create a video demo that walks through several use-case scenarios demonstrating how the ontology adds value for search and discovery.
2.2 Content Collection and Analysis
Round 1
Content Harvesting and Processing
WSDOT provided 18 of its engineering manuals in PDF format for the content analysis.
From page 11...
... WSDOT Manual | Description
Roadside Manual (RSM) | The Roadside Manual supplements the Roadside Policy Manual (RPM) by explaining how to implement the policies found in the RPM.
From page 12...
... WSDOT Manual | Description
Development Services Manual (DSM) | The Development Services Manual is a major component of the department's overall strategy to promote a consistent statewide development review process and the application of mitigation policies.
From page 13...
... WSDOT Manual | Description
Utilities Accommodation Policy (UAP) | The Utilities Accommodation Policy was established in cooperation with the utility industry.
From page 14...
... Figure 1. Example section from the Temporary Erosion and Sediment Control Manual.
From page 15...
... Figure 2. PHP script designed to convert a PDF document to a .txt or .html file.
From page 16...
... These words are called "stop words." Common stop words include: a, an, and, are, as, at, be, by, for, from, has, he, in, is, it, its, of, on, that, the, to, was, were, will, with.
Table of Contents and Glossaries
The Table of Contents (TOCs)
From page 17...
... Many different text analysis algorithms and software applications can perform text mining. We used a Natural Language Processing (NLP)
From page 18...
... Table 2 lists the 80 most frequently used words, which together account for 20% of the total word count. While text mining can provide evidence of words and themes that represent the meaning of the corpus, manual effort is required to distinguish meaningful words that are candidates for inclusion in a taxonomy or ontology describing the content of the manuals.
From page 19...
... Single Word Term | # Manuals | Frequency Count | Cumulative %
agency | 18 | 2579 | 11.2%
permit | 18 | 2531 | 11.4%
management | 16 | 2525 | 11.6%
regions | 18 | 2495 | 11.8%
flows | 17 | 2479 | 12.0%
typed | 18 | 2458 | 12.3%
agreements | 18 | 2448 | 12.5%
requirement | 18 | 2426 | 12.7%
water | 17 | 2398 | 12.9%
standard | 18 | 2331 | 13.1%
determined | 18 | 2328 | 13.3%
reviewed | 17 | 2312 | 13.5%
processed | 18 | 2275 | 13.7%
siting | 18 | 2267 | 13.9%
environmental | 18 | 2235 | 14.1%
maintenance | 18 | 2207 | 14.3%
within | 18 | 2175 | 14.5%
system | 18 | 2139 | 14.7%
department | 18 | 2134 | 14.9%
contracted | 17 | 2060 | 15.0%
datums | 18 | 2038 | 15.2%
documented | 18 | 2036 | 15.4%
existed | 18 | 1996 | 15.6%
offices | 18 | 1953 | 15.7%
impacting | 18 | 1951 | 15.9%
making | 18 | 1920 | 16.1%
transportation | 18 | 1872 | 16.2%
soil | 17 | 1857 | 16.4%
lined | 18 | 1842 | 16.6%
material | 18 | 1825 | 16.7%
times | 18 | 1823 | 16.9%
limits | 18 | 1782 | 17.0%
runoff | 14 | 1773 | 17.2%
showing | 18 | 1771 | 17.4%
From page 20...
... Single Word Term | # Manuals | Frequency Count | Cumulative %
locations | 18 | 1756 | 17.5%
cities | 17 | 1732 | 17.7%
following | 18 | 1722 | 17.8%
conditioned | 18 | 1710 | 18.0%
locals | 18 | 1691 | 18.1%
pipes | 13 | 1686 | 18.3%
developments | 18 | 1674 | 18.4%
reported | 16 | 1661 | 18.6%
foot | 18 | 1641 | 18.7%
roadways | 18 | 1637 | 18.8%
sloping | 16 | 1623 | 19.0%
also | 18 | 1616 | 19.1%
appendix | 16 | 1610 | 19.3%
considered | 18 | 1576 | 19.4%
formed | 18 | 1575 | 19.5%
approvals | 18 | 1570 | 19.7%
lands | 17 | 1568 | 19.8%
surveys | 16 | 1568 | 20.0%
Each of the highlighted terms is used in 14 to 18 manuals, indicating that many terms are used widely across the manuals. Table 3 lists two-word phrases.
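The columns in Tables 2 and 3 (number of manuals containing a term, frequency count, and cumulative percentage) can be computed with a few lines of Python. This is a minimal sketch, not the script used in the pilot: it assumes each manual has already been tokenized and that the cumulative percentage is relative to the total token count of the corpus. Because several two-word phrases in Table 3 contain stop words (e.g., "referred to," "number of"), the sketch also assumes bigrams are formed before stop-word removal. Manual names in the example are hypothetical.

```python
from collections import Counter


def term_table(manuals, top_n=80):
    """Rows of (term, # manuals containing it, frequency, cumulative % of all tokens).

    `manuals` maps a manual name to its list of tokens (assumed already
    lowercased and cleaned). Percentages are relative to every token in the
    corpus, not just the top_n terms.
    """
    frequency = Counter()
    manual_count = Counter()
    for tokens in manuals.values():
        frequency.update(tokens)
        manual_count.update(set(tokens))  # count each manual at most once per term

    total = sum(frequency.values())
    rows, running = [], 0
    for term, count in frequency.most_common(top_n):
        running += count
        rows.append((term, manual_count[term], count, round(100 * running / total, 1)))
    return rows


def bigrams(tokens):
    """Adjacent two-word phrases, e.g. ["referred", "to"] -> ["referred to"]."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


if __name__ == "__main__":
    corpus = {  # hypothetical, pre-tokenized manual text
        "Manual A": "highway runoff is referred to the highway runoff manual".split(),
        "Manual B": "flow control for highway runoff".split(),
    }
    two_word = {name: bigrams(tokens) for name, tokens in corpus.items()}
    for row in term_table(two_word, top_n=3):
        print(row)
```

The same `term_table` function serves both tables: pass single tokens (after stop-word removal) for Table 2, or the output of `bigrams` for Table 3.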
From page 21...
... 2-Word Term | # Manuals | Frequency Count
referred to | 18 | 669
Washington state | 18 | 665
highway runoff | 11 | 649
limited access | 14 | 607
service manual | 8 | 597
accordance with | 15 | 591
number of | 18 | 547
traffic controls | 13 | 540
used for | 16 | 528
less than | 17 | 522
required to | 17 | 507
local agencies | 15 | 499
included in | 17 | 476
states department | 18 | 455
meet the | 16 | 444
responsible for | 17 | 442
use in | 17 | 441
necessary to | 18 | 438
utility accommodations | 7 | 427
displaced person | 1 | 425
consultant services | 3 | 424
creek near | 1 | 421
standard specification | 14 | 420
development services | 7 | 419
management practices | 13 | 418
due to | 17 | 416
use the | 16 | 416
applied to | 18 | 413
cost of | 17 | 397
real estate | 13 | 396
plan and | 18 | 394
related to | 17 | 387
provided the | 17 | 385
reviewed and | 17 | 381
From page 22...
... 2-Word Term | # Manuals | Frequency Count
shown in | 14 | 380
best management | 11 | 378
runoff treatment | 4 | 378
land use | 14 | 373
way plan | 10 | 355
real properties | 9 | 354
flow control | 5 | 353
contact the | 16 | 351
area of | 17 | 346
plan sheets | 12 | 345
described in | 16 | 341
information on | 18 | 339
information management | 1 | 336
orders to | 18 | 330
personal property | 4 | 328
highway rights | 13 | 319
subject to | 16 | 311
wsdot environmental | 9 | 309
see the | 15 | 308
standard plan | 11 | 308
purpose of | 17 | 300
changes in | 18 | 299
associated with | 16 | 298
relocation assistance | 5 | 297
cost estimates | 14 | 295
access control | 12 | 293
compliance with | 15 | 293
complied with | 18 | 290
sediment control | 9 | 284
Cluster Analysis
Cluster analysis divides textual data into conceptually meaningful groups. In the context of understanding and classifying unstructured text, cluster analysis identifies conceptual classes that can be used for classification of content.
From page 23...
... as the basis for identifying categories and subcategories of concepts that, when arranged hierarchically, can help a user navigate and find content that meets their requirements. We selected the open-source programming language Python 3.6, with its various components, to perform the analysis.
From page 24...
... Function | Definition
Parsing | Parsing in NLP is the process of determining the syntactic structure of a text by analyzing its constituent words based on the underlying grammar. The output of parsing is a parse tree in which the sentence is the root and the intermediate nodes are noun phrases and verb phrases.
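To illustrate why parsing helps vocabulary work, the sketch below walks a parse tree and collects its noun phrases, which are natural candidates for taxonomy terms. The tree is written by hand for a made-up sentence; in practice it would come from a parser (for example, one of the parsers available through NLTK). The nested-tuple representation is an assumption chosen for brevity.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Leaves are (part-of-speech tag, word). Sentence and tags are illustrative.
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NN", "culvert")),
    ("VP",
     ("VBZ", "drains"),
     ("NP", ("DT", "the"), ("NN", "roadside"), ("NN", "ditch"))),
)


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """All words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def noun_phrases(node):
    """Text of every NP node, in pre-order (root first, then children left to right)."""
    if is_leaf(node):
        return []
    found = [" ".join(words(node))] if node[0] == "NP" else []
    for child in node[1:]:
        found.extend(noun_phrases(child))
    return found


print(noun_phrases(TREE))  # ['The culvert', 'the roadside ditch']
```

Nested noun phrases (an NP inside another NP) would both be returned, which is usually what term extraction wants: the longer phrase and its head can each become candidate terms.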
From page 25...
... We decided to focus on the k-means clustering algorithm, which assigns each document to exactly one cluster, because our preliminary results from the Latent Dirichlet Allocation (LDA) models showed that most documents belonged to a single topic. See references (2)
From page 26...
... Table 5 shows the 18 clusters that resulted from the analysis (the full set of 18 manuals, divided into chapters).
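The report does not spell out the feature weighting used for clustering, so the sketch below assumes a common setup: TF-IDF vectors (one per manual chapter), k-means with squared Euclidean distance on length-normalized vectors, and the highest-weighted centroid terms reported as a cluster's "top terms." Initialization simply uses the first k chapters, which keeps the example deterministic; real runs usually use k-means++ initialization and several restarts (as scikit-learn's KMeans does by default). The sample chapters are hypothetical, already-stemmed token lists.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows): one L2-normalized TF-IDF vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocab}  # one common smoothing variant
    rows = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=50):
    """Plain k-means; each row ends up in exactly one cluster."""
    centroids = [list(r) for r in rows[:k]]  # deterministic init: first k documents
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=15):
    """Highest-weighted terms in a centroid (cf. the "Top 15 Terms" column)."""
    ranked = sorted(zip(vocab, centroid), key=lambda pair: pair[1], reverse=True)
    return [term for term, weight in ranked[:n] if weight > 0]


chapters = [
    ["sepa", "land", "appeal", "review"],
    ["culvert", "pipe", "flow", "culvert"],
    ["land", "sepa", "mitig", "review"],
    ["pipe", "culvert", "flow", "runoff"],
]
vocab, rows = tfidf_rows(chapters)
labels, centroids = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 1]
print(top_terms(vocab, centroids[1], n=3))
```

Normalizing each vector to unit length keeps long chapters from dominating the centroids purely because they contain more words.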
From page 27...
... Cluster | Top 15 Terms
15. Development Review | sepa, gma, counti, local, land, appeal, los, agenc, impact, propos, environment, mitig, review, cipp, rtpo
16.
From page 28...
... Round 2
The round 2 tests did not add content to the manuals site; activities focused on creating additional ways to navigate and search the original set of eight manuals.
2.3 Solution Development and Testing
Round 1
Scope
Development of the WSDOT findability solution consisted of the following activities:
• Ingesting the manual "chunks" into Drupal (WSDOT's web content management system)
From page 29...
... • Which fields are required?
• How should each field be indexed and searched?
From page 30...
... • Asset type (e.g., material related to traffic signals or culverts)
• Mode (e.g., material related to pedestrian and bicycle accommodations)
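The field and facet questions above can be prototyped before committing to a content model. The sketch below is hypothetical throughout: the required-field list, facet names, and sample chunks are assumptions, and it does not use any Drupal API. It rejects chunks with missing required fields and filters keyword matches by facet values, mimicking facet filtering of the kind planned for the pilot site.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for one manual "chunk".
REQUIRED_FIELDS = ("manual", "section", "title", "body")


@dataclass
class Chunk:
    manual: str
    section: str
    title: str
    body: str
    facets: dict = field(default_factory=dict)  # facet name -> set of tagged terms

    def __post_init__(self):
        missing = [name for name in REQUIRED_FIELDS if not getattr(self, name).strip()]
        if missing:
            raise ValueError(f"missing required fields: {missing}")


def search(chunks, query, **facet_filters):
    """Keyword match on title/body, then keep chunks carrying every requested facet value."""
    needle = query.lower()
    hits = []
    for chunk in chunks:
        if needle not in f"{chunk.title} {chunk.body}".lower():
            continue
        if all(value in chunk.facets.get(facet, set()) for facet, value in facet_filters.items()):
            hits.append(chunk)
    return hits


chunks = [  # hypothetical manuals, section numbers, and text
    Chunk("Manual A", "4-3", "Culvert design", "Size each culvert for the design flow.",
          {"asset": {"culvert"}}),
    Chunk("Manual B", "1510", "Pedestrian facilities", "Keep sidewalks clear near culvert ends.",
          {"asset": {"culvert"}, "mode": {"pedestrian"}}),
]
print([c.title for c in search(chunks, "culvert")])
print([c.title for c in search(chunks, "culvert", mode="pedestrian")])
```

Keeping facets as sets of terms per facet lets one chunk carry several values (for example, two asset types) without a separate record for each.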
From page 31...
... • The Practical Solutions thesaurus and the Engineering Publications glossary are also useful resources, but they consist primarily of terms and definitions and contain very few synonyms (equivalent terms) or relationships between terms.
From page 32...
... Process Group | Element
Design | Project Delivery Method Selection; Project Data, Survey Data, and Base Map; Interchange Justification; Access Control; Materials (Roadway); Geotechnical; Bridge and Structures; Roadway Geometrics and Plans; Hydraulics/Drainage; Partnerships; Railroad Facilities Plans; Roadside Restoration and Site Development; Traffic Analysis; Traffic Design & Plans; Utilities; Work Zone Traffic Control - Design & Plans; Design Documentation; R/W Base Map and R/W Plans
Environmental Review and Permitting | Endangered Species Act Compliance; Section 106 & EO 05-05 Compliance; Discipline Reports; NEPA/SEPA Compliance; Environmental Permits; Environmental Commitment File
Design-Build Procurement | Design-Build Contract Package; Statement of Qualification Phase; Proposal Phase
Plans, Specifications & Estimates | Contract Plan Sheets Preparation; Contract Specifications Development; Construction Estimate Development; Construction Permits; Constructability Reviews; PS&E Reviews; Project Shelf; Contract Ad & Award
From page 33...
... Process Group | Element
Real Estate Services | Appraisal/Administrative Offer Summary; Review & Determination of Value; Acquisition; Relocation/Relocation Review Board and/or Adjudicative Hearings; Property Management; Condemnation/Possession & Use; R/W Certification
Construction | Construction Engineering; Construction Milestones
Table 7. WSDOT's ECM Taxonomies
Discipline | Categories
Agreements | Architectural & Engineering Services; Construction; Developer Services; Information Technology; Inter-Agency; Leases & Rentals; Personal Services; Purchased Services & Goods; Railroad; Rates; Specialty Group Internal Agreements; Utilities
Buildings | Architectural; Electrical; Foundations; Mechanical; Superstructures
Bridges & Structures | Design Documentation; Plans, Specifications and Estimates
Construction Management | Construction Administration; Payroll and Other Confidential Information
Environmental | Archaeological & Other Confidential Information; Cultural Resources; Endangered Species Act; Hazardous Materials; NEPA-SEPA; Permits
From page 34...
... Discipline | Categories
(continued) | Public Lands - Section 4(f)
From page 35...
... Discipline Categories Project Reporting Regional & Statewide Programming Risk Management Scheduling Scoping Trend Analysis Vendor Payments (Accounts Payable) Work Order Accounting Workforce Planning Project Design Design Documentation Plans, Specifications and Estimates Public Involvement Administrative - Internal Communications Informational Materials & Web News & Media Public Outreach & Responses Real Estate and Right of Way Acquisition Appraisal Property Management Relocation Survey Photogrammetry Aerial Photography Computer Aided Engineering Photogrammetry & Remote Sensing Survey Traffic Services Analysis Illumination ITS Signals Signing Pavement Marking Utilities and Railroads Utilities Railroads
Based on the input from target users and the review of available vocabulary resources, the research team selected three initial facets for auto-classification: asset, master deliverables list, and subject.
From page 36...
... Table 8. Selected Facets and Scope for Rule Development
Facet | Scope | Definition/Description
Asset | Culvert | A pipe or concrete box structure that drains open channels, swales, or ditches under a roadway or embankment.
From page 37...
... drainage asset, or similarly, a culvert "is a" drainage asset. These parent-child relationships occur throughout the hierarchy.
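The "is a" relationships described above can be stored as child-to-parent links and walked upward, which is what allows a search on a broad term such as "drainage asset" to also find content tagged with a narrower one. The sketch below is a simplification, not the structure of the pilot ontology: it assumes a single parent per term (OWL ontologies built in Protégé allow several), and every term other than "culvert" and "drainage asset" is hypothetical.

```python
# Hypothetical excerpt of an asset hierarchy: child -> parent ("is a").
IS_A = {
    "box culvert": "culvert",
    "pipe culvert": "culvert",
    "culvert": "drainage asset",
    "drainage asset": "asset",
}


def ancestors(term):
    """Broader terms from nearest to root; assumes at most one parent per term."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        if term in chain:  # guard against an accidental cycle in the data
            break
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` equals `category` or sits anywhere beneath it."""
    return term == category or category in ancestors(term)


print(ancestors("box culvert"))  # ['culvert', 'drainage asset', 'asset']
print(is_a("pipe culvert", "drainage asset"))  # True
```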
From page 38...
... Figure 6. Culvert ontology.
From page 39...
... the Stormwater BMP Ontology was reviewed with subject matter experts, who validated the structure and provided additional equivalent and related terms.
Tagging Process
The engineering manual content was automatically tagged using the terms in the ontologies described in the previous section.
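Automatic tagging of this kind can be sketched as dictionary matching: each preferred term and its equivalent terms become one case-insensitive pattern, and a chunk receives the preferred term whenever any variant appears. The pilot's own tool chain (Protégé, Apache Jena, and Taggr) is not reproduced here; this plain-Python version is only an illustration, and the ontology excerpt in it is hypothetical.

```python
import re

# Hypothetical ontology excerpt: preferred term -> equivalent terms.
ONTOLOGY = {
    "culvert": ["culverts", "cross culvert"],
    "detention pond": ["detention ponds", "detention basin"],
    "bioretention": ["rain garden", "rain gardens"],
}


def build_patterns(ontology):
    """One case-insensitive, whole-word pattern per preferred term."""
    patterns = {}
    for preferred, equivalents in ontology.items():
        variants = sorted({preferred, *equivalents}, key=len, reverse=True)  # longest first
        alternation = "|".join(re.escape(v) for v in variants)
        patterns[preferred] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def tag(text, patterns):
    """Preferred terms whose pattern matches anywhere in the text, alphabetically."""
    return sorted(term for term, pattern in patterns.items() if pattern.search(text))


PATTERNS = build_patterns(ONTOLOGY)
print(tag("Route flow from the Cross Culvert into a rain garden.", PATTERNS))
# ['bioretention', 'culvert']
```

Whole-word matching keeps a term from firing inside unrelated longer words, but it also means plural or inflected variants must either be listed as equivalent terms or handled by stemming the text first.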
From page 40...
... Technology | Description
Apache Jena | Apache Jena is an open-source framework that uses the ontology created in Protégé to find and extract term relationships from the engineering manuals. When it finds a word or concept in an engineering manual, Jena creates and stores a triple that describes the relationship.
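A triple is a subject-predicate-object statement. The small store below is not Apache Jena's API (Jena is a Java framework whose models are usually queried with SPARQL); it only illustrates the shape of the data Jena records. The chunk identifier and the predicate names, apart from IsPartOf, which the report uses for part-whole relationships, are hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Tiny in-memory triple store; a None argument in query() acts as a wildcard."""

    def __init__(self):
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None):
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("chunk-0042", "hasTag", "culvert")      # hypothetical chunk ID and predicate
store.add("culvert", "isA", "drainage asset")     # hypothetical predicate name
store.add("invert", "IsPartOf", "culvert")
print(store.query(predicate="hasTag", obj="culvert"))
```

Because every fact has the same three-part shape, questions such as "which chunks are tagged with culvert?" and "what is part of a culvert?" become the same kind of pattern match.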
From page 41...
... The second process flow starts when a new content object is added to the corpus. The content is uploaded to the MySQL database, which triggers a workflow that applies tags to the document using the class relationships described in the ontology.
From page 42...
... Figure 8 shows an example of how preprocessing occurs.
Figure 8.
From page 43...
... step occurs in Taggr. Taggr converts all uppercase letters to lowercase.
From page 44...
... Figure 9. Content tagging validation process.
From page 45...
... • IsPartOf – used within the culverts facet to represent part-whole relationships. Example: An Invert is part of a Culvert.
From page 46...
... • collection and conveyance: drainage system elements that collect water and convey it to another location
• energy dissipation: drainage system elements that help to limit erosion by reducing flow velocity
• storage and dispersion: drainage system elements that provide temporary or permanent storage for water or cause water to be spread over a wide area.
Materials Facet
In round 1, terms representing culvert materials were incorporated into the asset facet.
From page 47...
... WSDOT Deliverables Expectation Matrix Topic | NCHRP 20-97 Cluster Analysis Topic(s)
Roadway Geometrics and Plans | 11.
From page 48...
... • Tag based on rules: develop a set of rules that check for words and phrases signaling each topic. The common words identified in the cluster analysis could provide a starting point.
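A rule of this kind might require a minimum number of distinct signal terms before assigning a topic, which reduces false positives from a single passing mention. In the sketch below, the two topic names appear in the WSDOT deliverables tables in this section, but the signal terms and the two-hit threshold are hypothetical; in practice they could be seeded from the cluster analysis top terms. Matching is by plain substring, so "alignment" would also match "realignment."

```python
# Hypothetical rules: topic -> signal words and phrases.
RULES = {
    "Roadway Geometrics and Plans": ["alignment", "superelevation", "sight distance"],
    "Hydraulics/Drainage": ["culvert", "runoff", "stormwater", "ditch"],
}


def topics_for(text, rules=RULES, min_hits=2):
    """Topics whose rule finds at least `min_hits` distinct signal terms in the text."""
    lowered = text.lower()
    matched = []
    for topic, signals in rules.items():
        hits = {signal for signal in signals if signal in lowered}
        if len(hits) >= min_hits:
            matched.append(topic)
    return matched


print(topics_for("Convey highway runoff along the roadside ditch to the culvert."))
# ['Hydraulics/Drainage']
```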
From page 49...
... • Expert Design Engineer: advising a Design-Build project team on how steep they could make the side slope of a ditch to fit within the available right of way. (Design, Highway Runoff, Hydraulics)
From page 50...
... and if so, what the most cost-effective way to comply is. (Design, Highway Runoff, TESC)
From page 51...
... Scenario 3: Floodplain Mitigation
During a Design-Build project meeting, a question came up about floodplain mitigation requirements. They searched for "floodplain."
From page 52...
... once users gain experience with the basic features of the site, they would begin to use the filtering options more frequently. The fact that users did not immediately begin using the filters underscores the importance of up-front design work involving target users.
From page 53...
... 2.4 Washington State DOT Implementation Plan
The purpose of the implementation plan is to provide WSDOT with a roadmap for future development and application of the techniques demonstrated in this test.
Introduction
WSDOT participated as a test agency for NCHRP 20-97: Improving Findability and Relevance of Transportation Information.
From page 54...
... Task | Explanation
A3. Select and implement vocabulary management and text analytics tool(s)
From page 55...
... Task | Explanation
D Add Content and Adjust Ontology and Facets
D1.
From page 56...
... One option to consider is using Protégé as the master source for agency terminology. This tool can store glossary definitions, synonyms, and other term relationships.
From page 57...
... Language Toolkit.) The Transportation Research Board has selected PoolParty to manage the Transportation Research Thesaurus (TRT), following an evaluation of products conducted as part of NCHRP Project 20-109 and documented in NCHRP Report 874.
From page 58...
... B Test the Technology Solution and Specify Enhancements
B1.
From page 59...
... Summarize and validate findings
Summarize the key findings of the interviews and answer the following questions:
• Are there particular clusters of manuals that users want to search horizontally (across manuals)?
• Are there search facets or terms that appear to be common across multiple users?
From page 60...
... • Implement the ability to view and navigate the parent-child relationships built into the ontology. Currently, the facets along the left side of the pilot site do not show the hierarchy of terms.
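One way to expose the hierarchy is to render each facet as an indented tree with roll-up counts, so that a parent term reports the content tagged with it or with any of its descendants. The sketch below assumes a single parent per term and uses hypothetical terms and counts. It sums tag counts, so a chunk tagged with both a parent and a child would be counted twice; a production version would count distinct chunks instead.

```python
from collections import defaultdict

# Hypothetical facet hierarchy (child -> parent) and chunk counts per term.
PARENT = {"culvert": "drainage", "pipe": "drainage", "box culvert": "culvert"}
COUNTS = {"culvert": 12, "box culvert": 3, "pipe": 5}


def render_facet(parent, counts):
    """Indented facet tree; each label shows its own count plus all descendants'."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    roots = sorted(set(parent.values()) - set(parent))

    def total(term):
        return counts.get(term, 0) + sum(total(c) for c in children[term])

    lines = []

    def walk(term, depth):
        lines.append(f"{'  ' * depth}{term} ({total(term)})")
        for child in sorted(children[term]):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(render_facet(PARENT, COUNTS)))
```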
From page 61...
... C4. Establish tagging process
Tags will need to be refreshed whenever new content is added or the ontology is adjusted.
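Because tags depend on both the content and the ontology, one simple way to decide what to re-tag is to record a fingerprint of each at tagging time and compare them later. The sketch below is an illustration under stated assumptions: the function names and the dict-shaped ontology export are hypothetical. New or edited chunks are flagged individually, and any change to the ontology flags every chunk.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 of any JSON-serializable value (chunk text or ontology export)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def chunks_to_retag(chunks, tag_state, ontology):
    """IDs of chunks whose tags are missing or stale.

    chunks:    {chunk_id: text}
    tag_state: {chunk_id: (content_hash, ontology_hash)} recorded when last tagged
    ontology:  JSON-serializable export, e.g. {preferred_term: [equivalent terms]}
    """
    ontology_hash = fingerprint(ontology)
    return sorted(
        chunk_id for chunk_id, text in chunks.items()
        if tag_state.get(chunk_id) != (fingerprint(text), ontology_hash)
    )


ontology = {"culvert": ["culverts"]}
chunks = {"chunk-1": "Culvert design flow", "chunk-2": "New section on ditches"}
state = {"chunk-1": (fingerprint(chunks["chunk-1"]), fingerprint(ontology))}
print(chunks_to_retag(chunks, state, ontology))  # ['chunk-2'] (only the new chunk)
ontology["culvert"].append("cross culvert")
print(chunks_to_retag(chunks, state, ontology))  # ['chunk-1', 'chunk-2'] (ontology changed)
```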
From page 62...
... Decompose the manuals into "chunks." Chunks are intact sections and subsections of manual content. A script was created as part of the NCHRP 20-97 test to decompose the manuals and output HTML pages.
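As a rough illustration of this step (not the NCHRP 20-97 script itself), the sketch below splits extracted manual text into chunks wherever a line looks like a numbered section heading and renders each chunk as a minimal HTML page. The heading pattern (a number such as "4-3" or "4-3.2" followed by a title) is an assumption; each manual's numbering convention would need its own pattern. Text before the first heading is dropped, and each non-blank line becomes one paragraph, since rejoining lines wrapped by PDF extraction is left out for brevity.

```python
import html
import re

# Assumed heading format: a section number like "4-3" or "4-3.2", whitespace, then a title.
HEADING = re.compile(r"^(\d+-\d+(?:\.\d+)*)\s+(\S.*)$")


def split_chunks(text):
    """Split manual text into {section, title, paragraphs} chunks at each heading line."""
    chunks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            current = {"section": match.group(1), "title": match.group(2), "paragraphs": []}
            chunks.append(current)
        elif current is not None and line:  # skip blank lines and front matter
            current["paragraphs"].append(line)
    return chunks


def to_html(chunk):
    """Render one chunk as a small standalone HTML page (all text escaped)."""
    heading = html.escape(f"{chunk['section']} {chunk['title']}")
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in chunk["paragraphs"])
    return (f"<!DOCTYPE html>\n<html><head><title>{heading}</title></head>\n"
            f"<body>\n<h1>{heading}</h1>\n{body}\n</body></html>\n")


sample = """Front matter
4-1 General
Culverts convey water under the roadway.

4-2 Design Flow
Check that headwater depth < the allowable limit & document it.
"""
for chunk in split_chunks(sample):
    print(to_html(chunk))
```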
From page 63...
... Update the Cluster Analysis
Starting with the prepared text files, apply tools in the Python Natural Language Toolkit (NLTK) to remove punctuation and stop words and to perform stemming (reducing words to their root form).
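The preprocessing steps can be sketched with the standard library alone. This is a stand-in, not the NLTK pipeline the plan describes: the stop-word list is the one quoted earlier in this section (NLTK's English list, available through nltk.corpus.stopwords, is longer), and crude_stem is a deliberately simple, hypothetical suffix stripper in place of a real stemmer such as NLTK's PorterStemmer. Stems in Table 5 such as "counti" and "agenc" look like the output of a Porter-style stemmer, which this toy function does not reproduce.

```python
import string

# Stop words listed earlier in this section.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "he",
    "in", "is", "it", "its", "of", "on", "that", "the", "to", "was", "were",
    "will", "with",
}
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)  # note: also removes hyphens


def crude_stem(word):
    """Hypothetical toy stemmer: drops "-ing" and a plural "-s" only."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then stem each token."""
    tokens = text.lower().translate(_STRIP_PUNCT).split()
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("The culverts are draining the roadside ditch."))
# ['culvert', 'drain', 'roadside', 'ditch']
```

Stop words are removed before stemming so that the stop-word list can be matched against whole, unmodified words.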
From page 64...
... relationships first. Then, enter labels for each of the classes (terms)
From page 65...
... • Oversee text analysis/controlled vocabulary development.
• Oversee development of requirements for a vocabulary management tool.
