National Academies Press: OpenBook
Suggested Citation: "Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 215
Page 216
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 216
Page 217
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 217
Page 218
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 218
Page 219
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 219
Page 220
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 220
Page 221
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 221
Page 222
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 222
Page 223
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 223
Page 224
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 224
Page 225
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 225
Page 226
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 226
Page 227
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 227
Page 228
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 228
Page 229
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 229
Page 230
Suggested Citation:"Volume II - Background Research." National Academies of Sciences, Engineering, and Medicine. 2017. Improving Findability and Relevance of Transportation Information: Volume I—A Guide for State Transportation Agencies, and Volume II—Background Research. Washington, DC: The National Academies Press. doi: 10.17226/24804.
×
Page 230


Volume II Background Research

Contents

Volume II Background Research

Chapter 1 Introduction
  1.1 Project Overview
  1.2 Document Overview
Chapter 2 Current Practices for Improving Findability
  2.1 Overview
  2.2 Literature Review
  2.3 Findability Practices—DOTs
  2.4 Findability Practices—Other Organization Types
Chapter 3 Framework for Improving Findability
  3.1 Framework Development Process
  3.2 Final Framework
Chapter 4 Pilot Demonstration
  4.1 Pilot Objectives
  4.2 Identification of Pilot Agencies
  4.3 Summary of Pilot Activities
  4.4 Level of Effort for the Pilot
  4.5 Transferability and Scalability of the Pilot
Chapter 5 Conclusions and Future Research Needs
  5.1 Conclusions
  5.2 Future Research Needs
References
Appendix Pilot Findability Report
  A.1: Pilot Overview
  A.2: Assessment
  A.3: Content Collection
  A.4: Solution Development
  A.5: Test and Evaluation
  Annex 1: Pilot Classification Rule Descriptions
  Annex 2: Example Scenarios Using Faceted Search Design
  Annex 3: Evaluation Metrics

Chapter 1 Introduction

1.1 Project Overview

Research Objectives

The objective of this research was to improve state department of transportation (DOT) information findability by (1) defining a management framework, including responsibilities of a transportation agency and its partners, for classification, search, and retrieval of transportation information; (2) describing successful practices for organizing and classifying information (e.g., ontologies or metadata schemas) that can be adapted to classification, search, and retrieval of the diversity of information a transportation agency creates and uses; (3) developing federated or enterprise search procedures that a DOT can use to make transportation information available to users, subject to concerns for security and confidentiality; and (4) undertaking an example implementation of the management framework, the organization and classification practices, and search procedures to demonstrate enhanced findability for a DOT’s data.

Research Scope and Tasks

NCHRP Project 20-97 was structured in four phases:

• Phase 1 involved information gathering to document current practices for improving findability and relevance of information. It included a literature review, interviews with five state DOTs, and compilation of information about practices in non-DOT organizations based on the research team’s prior experience.
• Phase 2 involved developing a framework for improving findability for use by DOTs, based on the information gathered in Phase 1.
• Phase 3 involved a pilot demonstration of techniques for improving findability at a state DOT. This pilot focused on findability of construction project information and the application of text analytics tools for automated classification of content and improving relevancy of search results.
• Phase 4 involved documenting the results of the research in this final report, and development of a stand-alone guide to improving findability. The resulting guide is presented as Volume I of this research report.

1.2 Document Overview

Volume II of NCHRP Research Report 846 provides a high-level summary of the project methodology and deliverables. The balance of Volume II is organized as follows:

• Chapter 2 documents the information gathering activities and summarizes practices for improving findability.
• Chapter 3 documents the framework that was developed for improving findability.
• Chapter 4 provides an overview of the pilot demonstration.
• Chapter 5 contains conclusions from the research, including lessons learned and suggested ideas for future research.
• The Appendix and Annex materials contain a more detailed description of the pilot activities.

Chapter 2 Current Practices for Improving Findability

2.1 Overview

The initial substantive research task involved a review of “successful practices for ensuring findability of mission-critical information in public and private sector organizations,” leading to the development of an initial framework for ensuring findability. To accomplish this objective, the research team conducted a practice review involving a literature search, telephone surveys with five state DOTs, and documentation of prior research team project experience with findability projects outside of the state DOT community. Each of these activities is discussed in this chapter.

2.2 Literature Review

A limited literature review was conducted, focusing on three areas: (1) general references on information architecture and search, (2) specific references on transportation information management and findability, and (3) state DOT enterprise architecture studies. References reviewed are listed in Table II-1. Findings of relevance to this project are briefly summarized below.

Information Architecture and Search

As noted in the working plan, a rich base of research, guidance, and practice examples related to information governance, architecture, and search exists across multiple domain areas. General lessons from information architecture practice include:

• Ensure that improvement initiatives will have clear benefits to the organization that can be measured. Identify the organization’s critical information assets, and target findability issues that are adversely impacting efficiency or effectiveness.
• Tailor solutions to different search questions and associated patterns. Distinguish simple lookup needs from more open-ended discovery needs.
• Recognize that success of any findability initiative depends on actual usage. Providing convenient and satisfying search experiences for users is essential.
• Make sure that the solutions fit the organization’s capabilities; do not pursue a resource-intensive approach if it cannot be realistically sustained. For example, approaches that require ongoing staff efforts to manually classify documents generally are not sustainable.
• Leverage new technologies. For example, automation of content discovery and indexing has advanced; semi-automated approaches can provide effective results and are more cost-effective than manual processes.

Table II-1. Literature review: Findability of transportation information.

Information Architecture and Search
1. Information Management Best Practices, Volume 1 (The Information Management Foundation), 2010. https://books.google.com/books?id=mRhgHhvhiUsC&lpg=PA4&ots=4Ihtwj-Kq&dq=boiko%20Hartman%20best%20prACTICES&PG=pa1=onepage&q=boiko%20Hartman%20best%20practices&f-false
2. Search Patterns (Morville and Callender), 2010. http://searchpatterns.org/
3. Ambient Findability (Morville), 2005. http://www.amazon.com/Ambient-Findability-Peter-Morville/dp/0596007655/findability-20/

Transportation Information Management
4. NCHRP Report 754: Improving Management of Transportation Information (Cambridge Systematics), 2013. http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_rpt_754.pdf
5. NCHRP Report 643: Implementing Transportation Knowledge Networks (Spy Pond Partners, LLC), 2009. http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_rpt_643.pdf

State DOT Enterprise Architecture
6. Development of a Strategic Enterprise Architecture Design for Ohio DOT (Cooney, Clement, and Shah), 2014. http://www.dot.state.oh.us/Divisions/Planning/SPR/Research/reportsandplans/Reports/2014/Administration/134756_FR.pdf
7. Kansas DOT Enterprise Architecture, 2005. http://www.mdt.mt.gov/other/webdata/external/research/DOCS/RESEARCH_PROJ/IT_ARCH/TASK_2.PDF (pp. 11-15)

The Information Management Foundation published a set of 19 best practice articles on topics including taxonomy and content management integration, metadata creation and management, knowledge transfer, enterprise information management investments, web analytics, and media creation management (see Table II-1). One of these articles, referencing a content management initiative at Motorola, contained sufficient information to be included as one of the research team’s standard findability practice examples. Other articles were informative, but they were either not directly applicable to state DOTs or not provided in a case study format. The introduction to that volume offers several principles that are reinforced by the best practice articles, and that are useful to keep in mind in designing findability improvements:

• Information is communication. In designing findability improvements, it is necessary to consider what kind of communication is being supported or assisted.
• Information has value. Information management takes effort, and therefore candidate efforts need to be evaluated based on business value.
• Information has audiences. It is important to obtain an in-depth understanding of the needs and search behaviors of information users, who are the targets of any findability improvement. Otherwise, there is a real danger that resources will be invested without providing value.

• Information has a life cycle. A full life cycle view of information is needed to ensure findability, and successful content management efforts encompass the creation, tagging, conversion, publication, and retirement/culling processes.

Peter Morville’s two books provide valuable background on the nature of search and strategies for designing useful search environments (see Table II-1). Some fundamental concepts of relevance to identifying successful practices for findability include:

• Findability is a challenge because information is stored in multiple repositories, with multiple inconsistent ways of labeling and categorization, and inherent ambiguities in language. Moreover, in most organizations, “findability falls through the cracks,” meaning that nobody is responsible for the end result.
• Improving findability means considering the nature of searches. Search objectives range from looking for a specific item (e.g., find the AASHTO Asset Management Guide) to exploring available resources within a topic area (e.g., find out what guidance or experience exists for roundabout design). Also, searchers vary in terms of search skill level and familiarity with the content space.
• Where exploratory searches are common, search interfaces based on faceted navigation can be very helpful. This approach is used on many shopping websites (e.g., on www.Amazon.com). Such interfaces are powered by structured databases incorporating standardized metadata for each item.
• Findability can be approached not only through “pull” methods (in which a user actively searches for content), but also through “push” methods (in which the user receives content based on subscriptions or role-based targeting).
• Search success can be evaluated based on precision and recall. Precision measures how well a system will retrieve only the relevant documents (e.g., the percentage of results that are relevant); recall measures how well a system will retrieve all of the relevant documents (e.g., the percentage of available relevant documents that were included in the search results). Because precision and recall often are inversely related, it is helpful to understand which metric is more important when designing search capabilities.
• Full-text search performance depends on the size of the search pool. As collections increase in size, both precision and recall decline.
• Removal of redundant, outdated, and trivial content from the search pool is helpful for shrinking the search space and improving search results. This can be tackled via content policies that define what should (and should not) be stored, and by regular weeding of the collection.
• Use of descriptive metadata for subject and content type is increasingly valuable for improving findability as collections grow larger. However, while metadata improves search performance, centralized, manual tagging is typically too expensive and time-consuming for most large-scale search applications.
• Different search objects require different findability strategies. An approach for a relatively small set of policy documents might rely on full-text search, whereas an approach for a large collection of photographs would require use of keywords or structured metadata.
• Effective search design patterns include use of faceted navigation, use of “Best Bets” for the most common queries, use of auto-complete and auto-suggest as the user is typing the search criterion, emphasis on presenting the “best results” first (through well-tuned relevance algorithms), options to sort by date, options to filter by format and content type, use of personalization information, and use of diversity algorithms to guard against redundant results.
• Federated search is helpful when searches across multiple sources are needed, but performance can be slow, and metadata-based queries are limited to the “lowest common denominator” across the different sources. Building a unified index of content across repositories is an alternative that can achieve the same objective.

• A variety of approaches can be taken for the design of content classification and tagging methods. There is no single best way; the approach should be designed to fit the need:
  – Taxonomies can be helpful in situations for which findability can be enhanced via a hierarchical breakdown or tree structure of content (e.g., locating construction project information applicable to project phases, and tasks within phases).
  – Faceted classification can be helpful in situations for which users want to search based on different criteria, such as locating meeting records based on date of meeting, organizational unit running the meeting, or type of content produced (e.g., presentation slides, meeting minutes, meeting agenda, etc.).
  – Standard key words can be helpful to facilitate common searches. Use of thesauri can extend the value of key words by establishing preferred terms as well as equivalent and associated (broader and narrower) terms. For example, a user searching for “performance measures” could be directed to resources that were tagged with the terms “structurally deficient” or “pavement condition index”.
  – The Resource Description Framework (RDF) can be used to document a formal, machine-readable representation of relationships across terms, providing a powerful semantic foundation for search-based applications, and the ability to link independently produced data resources. For an example, see the BBC’s Wildlife Ontology (http://www.bbc.co.uk/ontologies/wo), used to power the organization’s Wildlife Finder website (http://www.bbc.co.uk/nature/wildlife).
  – Free-form tagging (or folksonomies) can be helpful for social media posts and other content that is somewhat transient in nature.
• Google’s search methods combine full-text, metadata, and popularity measures in which inbound links constructed by humans are, in effect, used as metadata.
• Typical intranet searches do not perform as well as Google searches of the Internet given the absence of structured metadata to power faceted navigation, and insufficient scale to support full-text relevance-ranking algorithms.

Transportation Information Management

NCHRP Report 754: Improving Management of Transportation Information reviewed the state of the practice in transportation agency information management (see Table II-1). This report identified successful strategies in some areas, including organization and distribution of structured data (notably traffic data and geographic data), use of content and document management systems such as SharePoint and ProjectWise, sharing of internal research reports, and DOT library services including cataloging of printed documents. Challenges also were identified in several areas:

• The growing number of information sources, which make it time-consuming and difficult for staff to determine what is most relevant and valuable to read and share.
• A need to make website and other content findable through improved information organization and use of key words.
• The siloed nature of information creation and management within the DOT, impeding findability and use of centralized information management strategies.
• A lack of user training on how to discover and retrieve data and information.
• A lack of executive policy direction on information management.
• Highly constrained staffing and financial resources for improving findability and providing reference support.

The report also presented a variety of information management strategies, organized around processes for capturing, administering, and retrieving information. Strategies related to improved findability included:

• Establish agency policies for information governance, archiving, and records retention.
• Provide content in electronic format; use digital preservation and allocate funds to address electronic file management.

• Leverage available technology for information storage and retrieval.
• Establish categorization schemes for data and information management.
• Use taxonomies, semantic schemes, and authoritative glossaries and vocabularies, and use taxonomy management tools.

NCHRP Report 643: Implementing Transportation Knowledge Networks was an earlier study that established a business plan for knowledge sharing within and across transportation agencies (see Table II-1). This study included focus groups and a web-based survey to better understand information needs. Reported information needs were wide ranging; the list below provides a flavor of the diversity of information being sought and the nature of search requirements.

Searches for Specific Documents
• Search for a particular engineering standard.
• Search for an older plan or engineering document (especially difficult if the project name changed or if the project was split or combined).
• Search for online equipment maintenance manuals.
• Search for unpublished or “gray literature” (e.g., presentations from internal meetings or national conferences).
• Find current active contracts and agreements.

Searches for Information on a Specific Topic and of a Specific Content Type
• Search for research reports or information about best practices. For example, “Has a study been done about outdoor advertising practices?”
• Search for latest developments for a specific technique or technology application. For example, “What are the latest technologies for automated speed enforcement?”
• Search for activities at peer agencies. For example, “What are other agencies doing in the area of innovative finance?” “Who is using electronic signatures on plans?”
• Find current links to different websites with information collections. For example, “Where can I look for information about pavement preservation methods?”
• Search for existing or pending/proposed local, state, and/or federal legislation related to a particular topic area.

People Searches
• Find specific contacts at a peer agency for different functional areas.

Data Set Searches
• Search for data relevant to a particular question (e.g., construction costs, vehicle registration trends, freight movements).

Project Searches
• Search for historical information or construction details related to a specific project.

Recorded Event Searches
• Search for historical information about events (e.g., details about a particular crash or incident).

Based on the identified information needs, the report presented a vision for a central information portal including federated search capabilities with the following components:

• Information search, giving access to various organized information sources including agency survey results, library catalogs, data sets, and legislation.
• Topic search, giving access to curated sets of information resources, maintained by designated national topic leaders, organized by resource type (e.g., research report, synthesis, data set, etc.).

• People search, by role (e.g., traffic engineer for City X) or by area of expertise.
• Calendar search for events by date range and topic.
• News search for articles by topic area, keyword, and source.
• Research search (e.g., giving access to TRB sources and other sources on active transportation research projects).

The business plan recommended the following performance measures related to findability:

• Changes in access time and in cost for a standard “basket” of information goods.
• Percentage of unique transportation library holdings that can be discovered via available search tools.
• Percentage of active and completed research projects that can be discovered.

State DOT Enterprise Architecture

The state DOT enterprise architecture studies for Ohio DOT and Kansas DOT did not explicitly address findability but do provide useful models of state DOT business processes and information systems that serve as a context for understanding search needs and behaviors (see Table II-1). Figures II-1 and II-2 show two products of the Kansas DOT enterprise architecture study. Figure II-1 is a value-chain view that distinguishes primary and supporting activities of the agency and illustrates the life cycle of core business process activities. The value chain provides a way of understanding the business context for information organization. The categories identified can provide a useful way to classify agency information resources and understand their creation and utilization patterns. Figure II-2 illustrates a high-level data model that provides another way of categorizing different information resources in an agency.

Figure II-1. Kansas DOT enterprise architecture value-chain diagram.
Source: Redrawn from figure in draft document from the Kansas State DOT.
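As a rough illustration of how business-activity categories such as those in a value chain could be used to classify information resources and narrow a search, the following Python sketch tags a few resources and filters on those tags. It is only a sketch under stated assumptions: the category names, resource titles, and the `Resource`, `facet_counts`, and `filter_by` names are hypothetical and are not taken from the Kansas DOT study or from any particular search product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One information resource with two descriptive metadata fields."""
    title: str
    business_activity: str  # hypothetical value-chain category
    content_type: str       # hypothetical content type


# Hypothetical catalog; a real agency would draw these values from its
# own enterprise architecture and metadata standards.
CATALOG = [
    Resource("Bridge inspection report", "Maintenance", "report"),
    Resource("Corridor study", "Planning", "report"),
    Resource("As-built plan set", "Construction", "plan"),
    Resource("Pavement condition data", "Maintenance", "data set"),
]


def facet_counts(resources, facet):
    """Count how many resources carry each value of the named facet."""
    return Counter(getattr(r, facet) for r in resources)


def filter_by(resources, **selected):
    """Keep resources whose metadata matches every selected facet value."""
    return [
        r for r in resources
        if all(getattr(r, name) == value for name, value in selected.items())
    ]


if __name__ == "__main__":
    # Counts per category are what a faceted interface would show beside
    # each filter option.
    print(facet_counts(CATALOG, "business_activity"))
    # Selecting values in two facets narrows the result set.
    for match in filter_by(CATALOG, business_activity="Maintenance",
                           content_type="report"):
        print(match.title)
```

The sketch depends on every resource carrying consistent values for each category, which is why an agreed set of categories matters more than the filtering logic itself.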

Current practices for Improving Findability II-11

Source: Redrawn from figure in draft document from the Kansas State DOT.
Figure II-2. Kansas DOT enterprise architecture high level data model.

Figure II-3 shows a simplified business process model developed as part of the Ohio DOT enterprise architecture study. Similar to the Kansas DOT value chain, it offers a way to associate the agency’s information resources with key business activities. In effect, these architectural views provide ways of dividing up the state DOT information space into logical categories that can support findability of information. These categories may be used to develop one or more facets for DOT information search. (Note that Figure II-3 was adapted from the original and simplified to show its essential elements.)

Figure II-3. Ohio DOT business process view (simplified).

2.3 Findability Practices: DOTs

Methodology

Five state DOTs were selected for interviews based on the research team’s familiarity with ongoing initiatives related to improving findability:
• The Washington State DOT
• The Virginia DOT
• The Mississippi DOT
• The Illinois DOT
• The Colorado DOT

Each interview was divided into the following sections:
• Basic information about the agency (e.g., number of employees, system size).
• Description of successful practices for improving findability.
• Current practices for managing selected types of content.
• Approaches for managing special content types (images, data sets, social media).
• Findability improvement needs.

If an agency had adopted information organization schemes or classification methods, the research team requested copies.

The interviews proved to be a very useful method for gaining a good understanding of the “information management landscape” at state DOTs. Several commonalities were identified across the five agencies with respect to content management tools, processes, and challenges. One observation from the exercise was that, because information management is not typically a highly centralized activity in state DOTs, one would need to interview many different individuals across multiple departments to obtain a complete picture of content storage, organization, and search practices. For the most part, the individuals interviewed had a reasonably broad and complete understanding of formalized information management systems and practices. In some instances, they consulted with others. For the Washington State DOT, the research team conducted follow-up calls to fill in some of the details for specific content types.

Summary of Findings

The DOTs interviewed were facing several common challenges:
• Lack of consistency across business units as to what content is stored, where it is stored, and in what format.
• Lack of a coordinated approach for management of structured and unstructured information resources.
• Lack of ability to search across different information repositories in the organization.
• Limited formalized metadata standards; the metadata in use was primarily administrative rather than descriptive in nature.
• Lack of formalized information governance processes.

In general, the DOTs interviewed had implemented the following types of practices to provide information findability:
• Deployment of content management systems for construction project plans.
• Deployment of content management/collaboration software for sharing of corporate, business unit, and team content.
• Implementation of data warehouses and geographic information system (GIS) portals to provide centralized access to structured data resources.
• Digitizing paper documents for archiving and retrieval.

Successful Practices for Enhancing Findability

Successful practices for findability (as selected by the interview subjects) were as follows:
• Colorado DOT. The Colorado DOT identified two initiatives: their Online Transportation Information System (OTIS) website and their Document Retention Program (DRP).
– OTIS provides a single point of access for roadway data using a GIS platform.
– The DRP applies “lean” business process improvement methodologies for document retention to meet legal, regulatory, or audit requirements. The department is developing an implementation plan for improving consistency and efficiency of the program.
There are more than 40 document retention coordinators distributed across the different functional areas of the department. All of the content will be stored either on ProjectWise (engineering documents), SAP Content Server (financial documents), or SharePoint (everything else). Each document will be assigned to a retention schedule, which will serve as a classifier. Retention schedules are being refined and streamlined. Metadata is being defined but emphasizes document management rather than search.
– The Colorado DOT plans to use the Fast Search and Transfer (FAST) search engine to provide federated search capabilities across the three repositories. The DOT is exploring use of the Perceptive product for supporting document intake workflow. The DOT also has a governance committee for document management systems. They have developed suggested standards and user-friendly guides to available document storage options.
• Illinois DOT. The Illinois DOT identified two initiatives: a SharePoint implementation and their data warehouse program.
– The Illinois DOT was an early adopter of SharePoint, and its use has become “part of the culture.” Basic governance is in place for defining new content types, defining metadata elements, and ensuring searchability of content across SharePoint sites. There are more than 5 million documents in the system. IDOT implemented SharePoint to cut down on duplicate copies, eliminate multiple potentially conflicting versions of documents, and provide a business platform for content management, team collaboration, and workflow. Built-in workflow is a key factor for success. IDOT uses an add-on tool for workflow design and a companion tool from KnowledgeLake for document capture. An initial effort with this product was conducted to capture content related to American Recovery and Reinvestment Act projects.
– The Illinois DOT has established a data warehouse program that includes extract-transform-load processes for capture of information from a diverse set of legacy systems, and a business intelligence portal providing access to several subject area data marts including construction, financial information, payroll, safety, and human resources. The DOT does not use a taxonomy for SharePoint, but is currently working to develop one as part of a records management system implementation. This is a collaborative effort involving the Bureau of Information Processing, the records coordinator, and the Illinois DOT’s library.
• Mississippi DOT. The Mississippi DOT did not identify a single successful effort, but rather offered information on their overall approach to content management utilizing three systems: Microsoft SharePoint, Bentley ProjectWise, and EMC ApplicationXtender.
– SharePoint was implemented at the Mississippi DOT in 2003 and is used for human resources content, transportation commission documents, standard operating procedures, e-forms, and business unit collaboration.
– ProjectWise was implemented more recently and is used for management of construction project files (including CAD plans).
– ApplicationXtender is an older product that is used for scanning and archiving documents for records management. Documents include permits, financial records, law enforcement records, and project-related files.
– The Mississippi DOT uses the native SharePoint search engine for federated searches spanning SharePoint, ApplicationXtender (AX), and files stored on shared network drives. Goals for implementation of these systems included improved searchability and access, decreased paper file storage, and support for internal and external collaboration.
– The Mississippi DOT has established metadata standards and has defined clear roles for content management. They have an enterprise content management (ECM) team that meets regularly to update governance and ensure standardization across the agency.
• Virginia DOT. The Virginia DOT was focusing on improving management of 1,400+ policy and procedure documents to ensure that DOT staff can find the most recent, authoritative versions of these essential corporate documents.
– The Virginia DOT was implementing a new tool that will allow for simultaneous publication of updated documents in both SharePoint (used for the agency’s intranet) and the external website. The Virginia DOT’s Knowledge Management Office was taking the lead for this initiative, handling metadata development and assignment, including controlled vocabularies for document type and subject key words.
– The Virginia DOT has also implemented DeepWeb, a federated search tool that allows simultaneous searches of their library catalog, several subscription databases, the Virginia DOT’s Twitter feed and YouTube channels, the 50-state DOT Google search (which searches content within state DOT public-facing websites), TRID, and other national and international transportation information sources. This capability is working well and the Virginia DOT is beginning to consider how it might be integrated with the DOT’s SharePoint site.
• Washington State DOT. The Washington State DOT identified several initiatives of note, including development of an agency data catalog and metadata repository (“DOTS”), a data warehouse, a physical library staffed by professional librarians, use of SharePoint for document sharing and collaboration, use of ProjectWise for engineering document sharing, and deployment of a GIS tool providing access to widely used geospatial data. The DOT also noted that they are piloting a tool called Varonis for security management and file utilization tracking. Varonis has capabilities for automated content classification based on pattern and dictionary-based content matching. (The agency subsequently reported that it did not implement this tool, in part due to its cost.)
– “DOTS” is a custom application that harvests metadata from the Washington State DOT’s Information Technology (IT)-managed databases. DOTS maintains definitions for thousands of business terms, and maps these terms to data elements. Currently, DOTS has a limited text search function. Planned improvements will add the capability to search based on use of synonyms. A single full-time equivalent position is devoted to maintaining DOTS; other staff support is provided as needed. The data warehouse provides a single, authoritative source of integrated data to support core business functions, providing answers to questions that would have previously been prohibitively time-consuming to answer. Data are stored in SQL Server.
– The Washington State DOT is currently replacing their existing query/reporting tool with IBM Cognos. The DOT is served by six professional librarians staffing four physical libraries (main, materials lab, terminal engineering, and vessels engineering). The first two libraries are affiliated with the Washington State Library; the latter two libraries support the Washington State Ferries division. Librarians assign keywords and Transportation Research Thesaurus (TRT) index terms to research reports.
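The synonym-based search planned for DOTS, in which business terms are mapped to data elements and can be found by alternate names, can be illustrated with a short sketch. The Python example below is a minimal illustration under stated assumptions and does not describe how DOTS itself is built: the `Term` structure, the glossary entries, and the data element names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A business term with a definition, synonyms, and mapped data elements."""
    name: str
    definition: str
    synonyms: set = field(default_factory=set)
    data_elements: list = field(default_factory=list)


def build_index(terms):
    """Map every label (preferred name or synonym, lowercased) to its terms."""
    index = {}
    for term in terms:
        for label in {term.name, *term.synonyms}:
            index.setdefault(label.lower(), []).append(term)
    return index


def search(index, query):
    """Return terms whose preferred name or any synonym contains the query."""
    needle = query.strip().lower()
    found = {}
    for label, terms in index.items():
        if needle in label:
            for term in terms:
                # De-duplicate by preferred name when several labels match.
                found.setdefault(term.name, term)
    return list(found.values())


# Hypothetical glossary entries, for illustration only.
GLOSSARY = [
    Term("Annual Average Daily Traffic",
         "Average daily vehicle count over a year.",
         {"AADT", "traffic volume"},
         ["TRAFFIC.AADT_COUNT"]),
    Term("Pavement Condition Rating",
         "Score summarizing surface distress.",
         {"PCR", "pavement score"},
         ["PAVEMENT.COND_RATING"]),
]
INDEX = build_index(GLOSSARY)

for term in search(INDEX, "AADT"):
    print(term.name, "->", term.data_elements)
```

A user who searches for the abbreviation is led to the preferred term and its mapped data elements even though the abbreviation never appears in the term name. The synonym list carries the findability benefit, so keeping it current is the ongoing maintenance cost.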
Information Management Practices and Goals

Additional findings from the state DOT interviews are summarized in the remainder of this section.

Where Do These DOTs Store and Manage Content?

Identified storage locations are:
• Database servers
• Dedicated video servers
• Shared network drives
• Local hard disks on employee desktops and laptops
• External websites
• Intranet sites
• Cloud storage locations
• Project websites (internal and external)
• Physical libraries

Identified information repositories and content management systems are:
• SharePoint (agency-wide, departmental, and team documents)
• ProjectWise (engineering documents)
• Falcon (design plans)
• Oracle content management system/Stellent
• OpenText/ECM LiveLink
• DocuWare (records management solution)
• EMC ApplicationXtender (AX)
• GIS portals
• SAP Content Server
• Data warehouse
• Web-based content management system (e.g., Plone)
• Custom applications (e.g., for right-of-way management, project management)
• Social media (cloud) (e.g., YouTube, Twitter, Facebook, Flickr)

What Other Tools Do DOTs Use for Information Management?

Search tools used are:
• Google Search
• SharePoint Search
• FAST search
• Other search tools (e.g., those built into documents and content management systems)

Query and reporting tools consist of:
• IBM Cognos

Catalogs/metadata repositories used are:
• Library catalog software (EOS)
• An enterprise metadata repository (at the Virginia DOT)
• A data catalog (at the Washington State DOT)

Other tools used are:
• KnowledgeLake Capture for SharePoint
• Nintex Workflow Designer (a SharePoint add-in)
• Informatica (for data transformation)

What Approaches Are Used for Metadata and Categorization of Content?

The approaches used are:
• Keywords for subject and document type, assigned by Knowledge Management or Library staff for policies and procedures (Virginia DOT).
• Use of standard link fields across systems (e.g., vendor ID, project ID) (Illinois DOT).
• Taxonomy for records management system (Illinois DOT, under development).
• Definition of standard content types and metadata elements for SharePoint.
• Federal Geographic Data Committee metadata for GIS data sets (descriptive metadata; limited value for search).
• Standard folder organization structures and metadata elements (project ID, location) for ProjectWise.
• Business concept definition management; association with data elements (Washington State DOT).

What Findability-Related Business Goals Do They Have?

• Enable improved discovery in response to litigation, audits, and Freedom of Information Act (FOIA) requests.
• Support core business needs (e.g., asset management).
• Ensure recovery of valuable documents in the event of a disaster or hardware failure.
• Reduce duplication through making available centrally accessible repositories, use of links rather than copies, version control, and so forth.
• Avoid information loss due to lack of organized information management.
• Protect investment in costly plans/studies and make sure they can be found.
• Facilitate getting new employees up to speed (e.g., ability to find and review background documents relevant to a position).
• Promote data sharing by ensuring that people know what data are available and how to access them.

What Types of Needs Are Recognized?

Standards, policies, and governance-related needs are:
• Develop agency-wide standards for managing construction project-related content and reducing time/costs of finding information when claims are filed.
• Provide clear guidance on where different types of content can and should be stored.
• Ensure electronic content is text-readable/searchable.
• Implement common classification approach across content types and storage locations.
• Develop metadata solutions that recognize the wide variety of heterogeneous content types.
• Obtain stronger endorsement/management support for coordination of application and data architecture to promote data re-use and ensure integration across systems.
• Put in place stronger information governance to accomplish and sustain findability improvements.

Education and training needs are:
• Make the case for a more disciplined approach to content management, without which it is difficult to implement and enforce strict governance policies.
• Train users on how and where to search.
• Improve awareness of when full text search is sufficient and when a more structured approach to metadata is needed.
• Provide education to ensure that staff understands the importance of information management (including unstructured content) and the role of information owners.

Content management capabilities needs are:
• Provide reliable and persistent electronic storage for content.
• Automate workflow for life cycle management of content.
• Digitize archival paper records; reduce/eliminate paper generation.
• Implement content/document management systems to provide electronic access (the alternative being paper files in boxes or scanned files on CDs).
• Manage access to content based on roles (e.g., tied to Active Directory) across different repositories.

Search capabilities needs are:
• Support searches for specific documents as well as searches by topic area to provide a satisfying user experience (e.g., “Google-like” or “Amazon-like”) and provide meaningful search refiners (e.g., date, document type, content classification).
• Build standard search/retrieval services into line-of-business applications.
• Reduce the number of places to search for information; provide a single search interface to look across multiple repositories (federated search), including across SharePoint, library catalog, data repositories, and file servers.
• Provide spatial search capabilities across content types.
• Provide an approach to finding documents stored on external websites from internal search tools.
• Identify where full text search capability is sufficient and where additional effort to invest in taxonomy development and tagging is worth the cost.
• Improve email and archived records findability.
• Improve web content organization and findability.

2.4 Findability Practices: Other Organization Types

Methodology

To complement the state DOT findability practice examples, the research team drew on its experience working with a range of organization types over the past decade to develop six findability improvement examples. A seventh example was drawn from the literature review. Selected follow-up calls to the organizations were made to obtain updated information. The seven organizations were:
• The U.S. Government Accountability Office (U.S. GAO).
• Battelle (a 5,000-employee engineering and science consulting organization).
• The Wyndham Hotel Group.
• Boehringer Ingelheim (a major international pharmaceutical company with more than 40,000 employees).
• A major industrial conglomerate with 32,000 employees.
• First Wind (a 200-employee renewable energy company).
• Motorola (a multinational telecommunications company with more than 20,000 employees).

These cases represent a variety of organization types, approaches, technologies, types of content, and applications for findability improvement. A standard template was developed to document successful practices, including the following types of information:
• Practice description.
• Business case.
• Scope of content included.
• Organizational units leading and supporting the effort.
• Technologies used for information storage, access, and search.
• Metadata and classification schemes.
• Responsibilities and resources for tagging/indexing, vocabulary management, and search monitoring.
• Information governance policies and processes.
• Reported benefits.

Summary of Findings

• Organizations. The organizations ranged in size from 200 employees to more than 40,000 employees. Organization types included a government agency, an energy start-up, an industrial manufacturer, a biotech firm, and an engineering and science consulting company.
• Findability practices. Examples were split between a focus on enterprise search capabilities and a focus on content management functions.
• Scope of content included. Target content types included project information, scientific literature, and web content. Applications targeted both internal and external content.
• Organizational responsibilities. A variety of units had primary responsibility for both the content being searched and the overall project to improve search. One common theme was the need for collaboration between central groups and distributed groups (e.g., field offices) and the need for a dedicated team working together to plan and implement improvements.
• Technologies. Content and document management platforms included SharePoint, Documentum, Adobe Experience Manager, HP Autonomy, and FatWire (acquired by Oracle). Search engines included those built into these content management platforms, Lucene/Solr, Verity, and AskMe. Taxonomy management software included DataHarmony, SchemaLogic, and ConceptSearching. The use of text analytics software (Teragram [SAS], ConceptSearching, Inxight, and others) was a major factor in improving the overall quality of tagging documents as well as reducing the cost and time required for creating metadata.
• Metadata. Most of the efforts involved creating metadata standards and well-structured vocabularies (including keywords). Several of the examples illustrated the use of taxonomies and particularly faceted taxonomies.
• Information governance and policies. These examples illustrated the importance of having a well-defined policy and process for adding metadata to content. Applications of broader information management policies were not explored.
• Benefits. Certain shared benefits were exhibited by all or most of the projects, with some specific variations. The main benefit areas were:
– Improved search and improved quality of metadata.
– Reduced cost and time for searching and for creating metadata.
– Reduced cost for business processes, including customer self-service, eProcurement, online training, and others.
– Reduced need to re-create documents and remove duplicate documents from current repositories.
– Increased value from existing information resources.
– Value added from new applications and information retrieval capabilities built on top of search.
• Success factors and lessons. Success factors and lessons learned for each example are summarized in Table II-2.

Table II-2. Lessons and success factors from findability practice examples.

Organization, followed by its lessons and success factors:
• U.S. Government Accountability Office (U.S. GAO)
– Good quality metadata is essential for good search.
– Efficiency of metadata tagging can be improved with hybrid human and text analytics.
– Developing good rules for auto-categorization is essential for success.
– Good auto-categorization rules require a combination of subject matter expertise and library science expertise.
– It is important to understand how information management software works in different environments.
• Major Industrial Conglomerate
– Need for a comprehensive approach to findability aligned with business strategic objectives.
– Need to recognize that taxonomies require ongoing maintenance and refinement.
• First Wind
– Essential to get a complete understanding of processes for content creation and retrieval.
– Need a good understanding of search technology functions and capabilities, especially within SharePoint.
• Wyndham Hotel Group
– Importance of integrating different perspectives from multiple teams.
– Collaborative approach of consultant expertise and in-house business understanding.
• Battelle
– Important to match the level of detail in a taxonomy to the information needs (do not over-engineer).
– Development of an integrated search capability for documents, people, and external technical information provided business value.
– Making the link to impacts on critical business processes is essential to get support for findability improvements.
– A hybrid tagging approach is a powerful way to assign metadata for improved search.
• Boehringer Ingelheim
– Traditional keyword search was inadequate.
– Use of faceted search to filter results improved findability.
– Text analytics works on highly scientific and technical literature as well as general semi-structured office documents.
• Motorola
– Important to do content management system development and website redesign in parallel.
– Important to recognize the many ways a taxonomy can be leveraged.
– Important to consider three different aspects of taxonomy implementation (taxonomy for navigation and search, taxonomy management, content tagging).

Implications for State DOT Findability

The example search and content management practices assembled generally were more advanced than those in place within the DOTs interviewed for this project. Although many of these organizations are larger (and in some cases, less financially constrained) than the typical state DOT and they have different types of needs, many of the practices described could potentially be implemented within a DOT environment. For example, many DOTs could develop or adapt existing taxonomies and use taxonomy terms to tag information resources either manually or semi-automatically. DOTs could also implement faceted search capabilities to improve users’ ability to navigate through available content. DOTs could also devote additional resources to refinement of search tools based on user feedback.

The examples illustrate useful approaches and lessons that are applicable for design of findability improvements in any organization. By looking at search and information initiatives in a variety of environments, it is possible to get a deeper understanding of search and what factors lead to success. A number of general lessons can be seen in these examples:
• First, improving search clearly takes much more than buying a new search engine. Meaningful improvement almost always requires taking a deeper look at how search is being used and at the entire information life cycle. It is important to take a comprehensive, strategic perspective that considers integration across all the parts of the organization involved with information access.
• Second, for many search applications, having high quality metadata that is based on well-designed taxonomies is essential for success. In most cases, the more metadata added, the better the search experience will be. Text analytics software applications have the potential to improve metadata quality and partially automate the process of assigning metadata to content.
• Third, implementation of faceted search based on well-defined facets is a successful practice.
• Fourth, search can be improved incrementally without undertaking a large and expensive information initiative. For example, simply assigning subject matter experts and/or librarians to tag documents as Best Bets can result in gradual improvements to the overall search experience.
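The third and fourth lessons, faceted refinement and curated Best Bets, can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions rather than the approach of any organization described here: the `Doc` fields, the sample documents, the facet names, and the `BEST_BETS` table are hypothetical, and the keyword match is deliberately naive.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A document with two metadata fields that can serve as facets."""
    title: str
    doc_type: str
    subject: str


def matches(doc, query):
    """Naive keyword match: every query word must appear in the title."""
    title = doc.title.lower()
    return all(word in title for word in query.lower().split())


def search(docs, query, best_bets, **facets):
    """Keyword search narrowed by selected facet values, Best Bets first."""
    hits = [d for d in docs if matches(d, query)]
    for name, value in facets.items():
        hits = [d for d in hits if getattr(d, name) == value]
    pinned = best_bets.get(query.strip().lower(), [])
    # sorted() is stable, so hits that are not pinned keep their original order.
    return sorted(
        hits,
        key=lambda d: pinned.index(d.title) if d.title in pinned else len(pinned),
    )


def facet_counts(hits, facet):
    """Count hits per facet value, as shown next to search refiners."""
    return Counter(getattr(d, facet) for d in hits)


# Hypothetical documents and curated Best Bets, for illustration only.
DOCS = [
    Doc("Bridge Inspection Manual", "manual", "bridges"),
    Doc("Bridge Deck Repair Study", "report", "bridges"),
    Doc("Bridge Inspection Policy Memo", "policy", "bridges"),
    Doc("Pavement Design Manual", "manual", "pavement"),
]
BEST_BETS = {"bridge inspection": ["Bridge Inspection Policy Memo"]}

for doc in search(DOCS, "bridge inspection", BEST_BETS):
    print(doc.title)
print(facet_counts(search(DOCS, "bridge", BEST_BETS), "doc_type"))
```

The Best Bets table is the incremental piece: a librarian or subject matter expert can add one query-to-document entry at a time without touching the search engine. The facet counts depend on consistent metadata being assigned to every document.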

Chapter 3

Framework for Improving Findability

3.1 Framework Development Process

Based on the information gathering activities, the research team developed a preliminary framework for improving findability at DOTs. This framework recognized the complexity of improving findability in a DOT and the need for a multi-pronged approach involving:
• Understanding information seeking behaviors and needs.
• Mapping the information landscape (i.e., identifying where different types of information are stored).
• Understanding the information management life cycle to determine where and how to improve practices for metadata assignment, designation of authoritative documents, and cleanup of redundant and outdated content.
• Developing appropriate solutions integrating information management, search, and classification/metadata elements.

The framework was refined during the pilot activities, and again during the creation of the final guidance document. New elements were added to identify agency motivations (business drivers) for pursuing improvements to findability, and for the overall approach to implementation.

3.2 Final Framework

Major Elements of the Framework

An important insight drawn from the information gathering and the pilot that was reflected in the final framework is that improving findability of transportation information within a DOT is not something that can be done in a single project or initiative. Rather, a set of techniques, tools, and organizational functions can be implemented or strengthened over time and applied to meet a set of targeted business objectives.

Each agency can approach findability improvement with different emphasis areas or implementation sequences. For example, some agencies may want to begin by focusing solely on information management improvements to clean up and improve organization of content on file drives, email systems, and content management systems. Some may want to focus on improving performance of their existing intranet or content management system search tools by automating metadata assignment using text analytics techniques. Others may want to implement new enterprise search tools that index content across multiple repositories. Especially when paired with text analytics software that can automate the process of metadata creation and improve metadata quality, search tools have the potential to substantially improve findability in an agency.

All of these techniques for improving findability require resources and focused attention. Without a solid grounding in specific business needs, and clear demonstration that there is a solution that can make a noticeable difference, it is unlikely that resources for findability improvement will be allocated and sustained. A clear focus on meeting business needs and matching of solutions to needs is critical to making progress.

The final framework for improving findability is illustrated in Figure II-4. The top-level element, business drivers, covers major reasons why agencies would be motivated to implement findability improvements. The middle elements, planning and implementation, cover (1) how to establish requirements for improvements so that they fit with agency business needs and information resources, and (2) how to support continuous improvements to findability through a phased implementation approach and appropriate management functions and staff capabilities. The planning and implementation elements rest on three pillars representing key techniques and practices from which agencies can draw as they develop their implementation strategy.

Figure II-4. Framework for DOT information findability.

Business Drivers

The top portion of the framework identifies four key motivations for pursuing findability improvements. These motivations were identified as part of the information gathering activities for the project, and were reinforced during the pilot.
• Reduced time spent searching for information. Employees spend significant amounts of time trying to find information. A paper by Cleverley (2015) reported that a review of several surveys spanning different business sectors found that “24% of a business professional’s time is spent looking for information.” Reducing the amount of time it takes to track down available information makes more time available for productive work.
• More re-use of information, less re-work. Employees who cannot easily ascertain whether something already has been done that they could build upon may end up “reinventing the wheel.” The resulting re-work diminishes the value of agency investments to develop reports, studies, data sets, etc.
• Ensure use of authoritative information. Lack of ability to find the most current, authorized versions of documents or data sets is a common issue at many organizations, including DOTs. Use of outdated information can create risks for the agency, including inconsistent or improper implementation of agency policies and procedures, which can impact timely project delivery and consistent use of proven effective design practices.
• Efficient response to FOIA requests and claims. DOTs face an increasing number of public information requests, which can consume substantial amounts of time to fulfill. Similarly, responding to construction claims may require compilation of detailed records, including emails. Reducing time to compile this information is an important motivation for improving findability.

Planning for Findability Improvements

The second portion of the framework covers the steps needed to target and design improvements to match the needs and the information landscape of the organization.
• User needs. The starting point for any findability improvement is an understanding of information needs, current search behaviors, and pain points. This understanding can be obtained through online surveys, focus groups, interviews, and to some extent, a review of existing search logs.
• Information landscape. Once information needs and search behaviors are understood, it is important to develop an information landscape, or “map of the territory” with respect to information repositories and their contents. This information landscape provides the basis for identifying which repositories and which content types should be targeted. Once targets are established, it is useful to obtain a picture of how information is created, updated, culled, and archived, including who is involved, what the processes are, etc. Based on this understanding, opportunities and constraints can be identified for improving information management practices, search, and metadata.

Implementation of Findability Improvements

The third portion of the framework covers a general approach to implementing findability improvements and identifies key implementation activities. The implementation element of the framework recognizes that it is not possible to address all of an agency’s findability needs with a single solution or project. The range of needs across the agency will require multiple solutions. An incremental approach is recommended, grounded in the initial development of a vision that guides future activities.

The Road Map

A seven-step road map is suggested, involving:
1. Establishing an architectural vision for findability involving shared information repositories, common metadata elements, common terminology, master data management, and enterprise search capabilities.
2. Identifying a focus area for improvement.
3. Conducting an assessment.
4. Identifying candidate improvements.
5. Implementing “quick wins” (improvements that can be easily accomplished with existing resources).
6. Implementing a pilot improvement that is consistent with and supports the architectural vision.
7. Expanding and formalizing the pilot.

The architectural vision provides a big-picture view of how the agency will pursue findability improvements. It defines a set of guiding principles and cross-cutting resources (e.g., technology tools, metadata standards) that will be applied and refined over time. The vision also identifies priority needs.

A deliberate process of developing a vision that involves key players in the organization is important to building an understanding of how different activities must fit together. This process can be integrated within a DOT’s overall business planning or information management strategic planning efforts. With a vision and strategy in place, the agency can implement incremental improvements, each of which may focus on a particular business area or type of content. DOTs can build their organizational and technology capabilities with each initiative.

To make significant progress in improving findability, it is necessary early on to obtain management understanding of what is needed and why (i.e., how improved findability enhances management functions). Once this understanding is established, a collaborative approach to improvement can be pursued involving existing units that are concerned with improving access to information for decision making (e.g., IT, data management, library, records management, intranet manager, engineering document management system owner, collaboration system owner, etc.). Creating a structure for this collaboration, or identifying an existing team with the right membership, also is important to establishing a focal point for taking action in a coordinated way.
Several operational functions need to be considered for supporting findability, including:
• Establishing policies and standards. A root cause of difficulties with finding information is the lack of disciplined and consistent practices across organizational units for naming conventions, storage locations, metadata assignment, and so forth. Agencies should anticipate the need to establish and facilitate implementation of clear policies and standards for expected information management behaviors.
• Putting in place training and change management functions. Training and change management are important both for introduction of new standard practices and for adoption of content management systems and other tools.
• Ongoing operational support. This includes management of search and related tools, assignment of metadata through manual, semi-automated, or automated means, and management of data integration and synchronization processes. Availability of staff resources to manage, monitor, and improve search over time is an important success factor for a findability solution.

Agencies need to plan for and resource these functions, keeping in mind that specialized skill sets will be required.

Findability Techniques

A range of techniques can be used to improve management of information, provide better search and navigation tools, and build standard terminology and metadata needed to support findability. Any given findability improvement may involve a combination of three types of techniques, which are summarized briefly in this section. Volume I of this research report provides more detailed descriptions.

Information Management Techniques

Findability techniques related to information management include:
• Document management systems and content management systems. Use of these systems offers a more structured and contained environment for content than the “Wild West” of shared file drives and email attachments.
• Content storage and cleanup policies and practices. These policies and practices provide consistency with respect to where different types of content can be found, as well as assurance that outdated or obsolete documents are removed or archived.
• File naming conventions. Using naming conventions allows users and information management staff to understand file contents without needing to open them.
• Scanning practices. Well-defined scanning practices ensure that files are text searchable.
• Security and access controls. Effective security and access controls will provide adequate protection of sensitive information without imposing unnecessary barriers to search across repositories.

Search and Navigation Techniques

Findability techniques related to search and navigation include:
• Enterprise search. These tools support search both within individual information repositories and across different repositories.
• Faceted navigation. These interfaces allow a user to explore a body of information resources by selecting from a set of filters that restrict what resources appear on the list.
• Auto-suggest. This search tool capability improves search performance by suggesting standard terms that match a user-entered search string.
• Search monitoring and tuning. This technique enhances search performance based on targeting of problem areas observed through review of search logs.
• Search-based applications, meaning software applications in which a search engine platform (rather than a database) is used as the core infrastructure for information access and reporting.
Metadata and Terminology Development Techniques

Findability techniques related to metadata and terminology development include:
• Standard agency metadata elements and content types. Adoption of standard metadata schemes provides consistency across search interfaces and facilitates implementation of federated search and service-oriented models for discovery of information resources.
• Standard classifications. Lists of values for common elements (e.g., organizational units, project phases, work types, material types, or infrastructure asset types) can be standardized.
• Terminology resources. This technique involves adapting or building taxonomies, synonym lists, and so forth to integrate into search tools to improve their effectiveness.
• Automated metadata creation. This technique involves the application of text analytics to automate or assist assignment of metadata elements.
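To make two of the search and navigation techniques above more concrete, the following is a minimal sketch of faceted navigation and auto-suggest over a handful of tagged documents. It is an illustration only, not the pilot's implementation: the document records, facet names, and function names are all hypothetical, and a production system would rely on a search engine rather than in-memory lists.

```python
from collections import Counter

# Hypothetical tagged documents; titles, facet names, and values are illustrative only.
DOCS = [
    {"title": "Daily work report A", "content_type": "Daily Work Report", "district": "Bristol"},
    {"title": "Work order 7", "content_type": "Work Order", "district": "Bristol"},
    {"title": "Daily work report B", "content_type": "Daily Work Report", "district": "Richmond"},
]


def filter_by_facets(docs, selections):
    """Keep only documents whose metadata matches every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selections.items())]


def facet_counts(docs, facet):
    """Count documents per value of one facet, as shown next to each filter option."""
    return Counter(d[facet] for d in docs if facet in d)


def auto_suggest(prefix, terms, limit=5):
    """Suggest standard terms that start with the typed prefix (case-insensitive)."""
    p = prefix.casefold()
    return sorted(t for t in terms if t.casefold().startswith(p))[:limit]


bristol_docs = filter_by_facets(DOCS, {"district": "Bristol"})
remaining_types = facet_counts(bristol_docs, "content_type")
suggestions = auto_suggest("dr", ["Drainage", "Driveway", "Bridge"])
```

Each facet selection narrows the result list, and the counts are recomputed on the narrowed list so that users only see filter values that still lead to results.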

Chapter 4

Pilot Demonstration

4.1 Pilot Objectives

A pilot demonstration was undertaken to:
• Test and validate concepts and methods for improving findability and relevance of transportation information.
• Identify areas for refinements.
• Demonstrate effectiveness of findability improvements.
• Provide a documented case study application that could be used to strengthen the value of the report.

4.2 Identification of Pilot Agencies

The following criteria were identified for selection of pilot agencies:
• Availability of an agency point person who supports the effort and can marshal the necessary resources to support it.
• Agency level of interest in enhanced search.
• Extent to which existing information repositories and search tools reflect “typical” DOT practice (to maximize relevance of pilot results to other agencies).
• Ability to provide the necessary access to agency systems and search tools (direct or via agency staff) to enable the research team to implement a search improvement.
• Availability of target users to participate in interviews and testing process.
• Availability of information management staff (library, data management, website management, etc.) to support the effort.
• Existence of usable controlled vocabulary and/or subject matter taxonomy as a starting point (although a lower priority, the existence of such a vocabulary or subject matter taxonomy could provide the effort with a “leg up”).

Based on these criteria, and on indications of potential interest from panel members, the research team contacted three agencies to explore their potential participation in the pilot for NCHRP Project 20-97: the Washington State DOT, the Virginia DOT, and the Wisconsin DOT. A project briefing document was provided to each agency describing the objectives and scope of NCHRP Project 20-97 in general and the pilot in particular. The Wisconsin DOT declined to participate due to internal resource constraints.
The Washington State DOT and the Virginia DOT expressed a strong interest in participating. Follow-up telephone conversations were held with staff in both agencies to discuss potential pilot scopes that would both be helpful for the agencies and meet the project objectives. The rest of this section summarizes content from these telephone conversations.

The Virginia DOT

Background: The Virginia DOT’s Knowledge Management Office is responsible for ensuring findability of the agency’s mission-critical content through enhancement of information management and classification methods. The DOT uses SharePoint 2010 for its corporate intranet. SharePoint is also used as the agency’s intranet platform for document sharing and team collaboration. The Virginia DOT uses the FAST search tool that is embedded within the SharePoint environment. The agency has deployed a corporate document repository on SharePoint and continues to improve classification and management of these documents. The Virginia DOT also is in the process of developing a high level taxonomy for describing its content, and plans to build out different elements over time.

Needs: The Virginia DOT was interested in addressing several priority findability issues. Specifically, the DOT sought to:
• Improve management of active construction project documents, including construction inspector logs and notes, material test results, contractor invoices, certified payroll submittals, and so forth. Practices for managing this content varied across districts, with some districts utilizing SharePoint and others relying on folder structures on shared drives.
• Improve the likelihood that a search for a particular document would return a single authoritative source and distinguish authoritative documents as such.

The Washington State DOT

Background: In 2015, the Washington State DOT established an Enterprise Information Governance Group and adopted eight principles for data and information management. At the time of the interview, this group was discussing next steps toward improving enterprise content management (ECM) at the agency. Several distinct content management and collaboration systems were in use, including Oracle ECM, LiveLink, Bentley ProjectWise, and Microsoft SharePoint.
The Washington State DOT also had recently completed two projects with students from Kent State University: one focused on improving findability of information in support of agency responses to public disclosure requests (PDRs), and a second focused on developing an asset taxonomy.

The PDR project recommended a core metadata structure and developed high level specifications for each of the metadata elements. The core metadata structure included the following elements:
• Title
• Organizational Unit
• Region/Division
• Date Created
• File Type
• Content Type
• Abstract-Description
• Transportation Keywords
• Transportation Asset
• Project ID
• Project Phase
• Business Function/Records Class

Work also was done to test the build-out for two elements: Content Type and Transportation Keywords. The Content Type vocabulary involved integration of a broad set of categories for records classification at the statewide and DOT levels. The Transportation Keywords provided a full build-out for the “Transportation Asset” element.
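As an illustration of how the twelve core metadata elements listed above might be carried in software, the sketch below models them as a single record type. The element names come from the list; everything else is an assumption made for the example, including the Python field names, the data types, the choice of which elements are optional, and the sample values.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class CoreMetadata:
    """One record per document; which elements are required is an assumption."""
    title: str
    organizational_unit: str
    region_division: str
    date_created: date
    file_type: str
    content_type: str
    abstract_description: str = ""
    transportation_keywords: List[str] = field(default_factory=list)
    transportation_asset: List[str] = field(default_factory=list)
    project_id: Optional[str] = None
    project_phase: Optional[str] = None
    business_function_records_class: Optional[str] = None


def missing_elements(record):
    """List the elements that are still empty, e.g. to queue a document for tagging."""
    empty = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or value == "" or value == []:
            empty.append(f.name)
    return empty


# Hypothetical sample record.
sample = CoreMetadata(
    title="Bridge inspection summary",
    organizational_unit="Bridge Office",
    region_division="Headquarters",
    date_created=date(2015, 6, 1),
    file_type="PDF",
    content_type="Report",
)
```

A check such as `missing_elements(sample)` gives a simple way to see which elements would still need manual or automated assignment.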

Needs: The Washington State DOT had an interest in leveraging the work completed to date in order to move forward with implementation of a findability solution in support of PDR information searches and potentially a broader set of findability use cases. Specifically, the agency wanted to validate with users the content type and asset taxonomies that had been developed, improve them as needed, investigate practical approaches to creation and assignment of metadata, and develop strategies to integrate use of metadata with current (and future) content management solutions and search tools. Agency staff noted that, currently, each content type had a different set of metadata elements with multiple taxonomies in use, so a strategy for migrating or mapping to a consistent set of metadata elements was needed.

Agency Selection

An important objective for the pilot was to demonstrate application of the findability framework and demonstrate benefits from findability improvements. Achievement of this objective required a fairly intensive effort at a single agency. The research team recommended focusing on a single agency but involving the second agency in a review of results and discussion of transferability. Both agencies met many of the defined selection criteria with respect to level of interest and willingness to provide access to systems and staff support. The Virginia DOT was selected to be the pilot agency for the following reasons:
• A focus on construction project document findability is more tractable in the context of a brief pilot effort than the broader and more complex issue of findability in support of PDRs.
• The technology environment for content management and search at the Washington State DOT was in flux, and less typical than that at the Virginia DOT (SharePoint and the FAST search engine). Products of a pilot at the Virginia DOT were therefore perceived as likely to have a greater potential for re-use in other DOTs.
4.3 Summary of Pilot Activities

The pilot involved the following steps:
• An assessment of findability needs based on interviews with relevant stakeholders and identification of specific scenarios on which to focus.
• Assembly of a body of content to be searched based on the selected scenarios, and a review of relevant content types and storage locations.
• Identification of a standard set of content categorization elements (facets) that would allow users to search or navigate to content of interest.
• Development of a semantic model (a set of classification categories and associated terminology) for describing the content collection based on text mining and review of available agency data sources (e.g., project lists, standard pay item lists).
• Development and automation of rules for auto-classification of the content using a commercial text analytics package.
• Design of a search/navigation solution that allowed a user to enter search terms and refine the search based on the various facets.
• Evaluation of recall and precision for different types of searches, comparing use of the solution that was developed to a “plain vanilla” full text search of the same body of content.
• Evaluation of transferability of the auto-categorization rules to similar content obtained from the Washington State DOT.

The research team initially explored conducting the pilot in-house at the Virginia DOT, but was not able to do this because of hardware, software and IT staffing constraints. The text analytics software vendor selected for this project (Smartlogic) agreed to host a cloud environment and

run the indexing and categorization processes using the specifications and rules developed by the research team. Because the solution created as part of this project is based on specific commercial platforms, it is intended as a demonstration of capabilities only; it is not a packaged software product intended for distribution. The rules developed for auto-classification are documented in detail in Annex 1 to this volume of NCHRP Research Report 846, and the ontology developed has been provided to NCHRP as one of the research products of NCHRP Project 20-97, to be made available on request.

Table II-3 summarizes the specific activities of the pilot. The pilot project is described in greater detail in the Volume II Appendix of this research report.

4.4 Level of Effort for the Pilot

The pilot was implemented by a team with the following roles:
• Business lead. This individual had expertise in DOT business processes, structure, and data.
• Text analytics lead. This individual had expertise in information architecture, metadata, and text analytics.
• Text analytics/taxonomy specialists. These individuals focused on ontology and rule development.
• Business analysts. These individuals focused on content harvesting, content analysis, content conversion, and solution testing.

Table II-3. Pilot summary.

Step Activities

1. Identify Needs

Appoint a Lead
• The Virginia DOT Knowledge Management Office Director was the lead for the initiative. She works closely with agency IT staff to improve the use of SharePoint as a platform for managing both corporate documents and work group documents.

Identify the Emphasis Area
• The Virginia DOT wanted to focus on findability of construction project information. Practices for information storage and organization were not consistent across districts. Response to public information requests and assembling information related to construction claims were pain points.
An engineering content management system was under development, but it was not going to incorporate pre-existing documents.

Identify Stakeholders
• Knowledge Management/Library staff who offer services to help staff find information.
• Central Construction Office staff involved in developing tools and standards and conducting statewide analysis of construction costs and performance.
• District Construction staff involved in day-to-day management of construction projects (e.g., construction manager, area construction engineer, contracts manager, project controls engineer, technology resource engineer).
• IT staff responsible for development and support for content management and collaboration software (in this case, SharePoint) and associated search capabilities.

2. Define the Target Scope

Select Target Needs
• After analysis, four categories of needs were identified to be addressed in the pilot:
  - Find a specific known document for a project (e.g., an estimate) using a variety of search criteria.
  - Find/review all available documents for a project (e.g., for a FOIA request).
  - Search across projects to find projects with a specified item, material, or construction technique.
  - Research reasons for delays and changes.

Identify Target Content
• Stakeholders identified a variety of construction project content types and provided examples.
• For purposes of the pilot, four content types were selected based on likely business value and relevance to the selected four types of needs: daily work reports, project work orders (change orders), source of materials forms, and project profile forms.

Assemble the Content
• The pilot involved gathering content of the selected types from SharePoint sites and shared drives. Target content for approximately 250 projects was assembled.
• The body of content included emails with attachments that matched one of the selected content types. These attachments were extracted from the emails.
• Data for each active construction project was assembled, including universal project code (UPC), which is the agency’s “cradle-to-grave” project identifier; contract ID; work type; cost; district; route; etc.

3. Prepare the Content

Identify and Analyze Stakeholder Needs
• Several needs were identified based on the interviews, including:
  - Provide all records or certain types of records for a particular project (e.g., to respond to a public information request or claim).
  - Find projects that have installed a particular make and manufacture of an item.
  - Find projects that have used a particular construction technique (e.g., cold-in-place recycling).
  - Identify systemic issues that contribute to construction projects not meeting goals for on-time, on-budget, environmental, and quality scores.
• Some needs expressed would best be addressed through improvements to structured databases and query tools. These needs were not addressed, because the pilot was focused on improving search and exploration of unstructured content. Some needs expressed were general or complex (e.g., “What projects have involved use of innovative materials?”). Further specificity was required to understand whether improved search capabilities could meet the need.

Interview Stakeholders
• Stakeholder interviews focused on (1) understanding current information sources and management practices and (2) identifying findability needs and concerns.

4. Develop the Solution

Identify the Search Facets
• Based on the search needs and the content analysis, the following search criteria were identified: content type, contract award amount (ranges), contractor name, district, type of equipment, jurisdiction name, manufacturer/supplier name, material type, pay item code, project ID/name, highway system category, route ID, type of work, work issue and work order category.
• For each of these items, source(s) for lists of possible values were identified. In some cases, these were based on the categories included in the project data set (e.g., the list of Virginia DOT districts). In other cases, values were based on text mining of content (e.g., to produce a list of manufacturers and suppliers).

Populate the Semantic Model
• The text analytics tool selected for the pilot (Smartlogic) powers faceted navigation and search based on a semantic model (ontology). This model was populated based on the search facets and lists of values.
• Building the ontology involved specifying relationships across different terms (e.g., “Bristol District” is a “District”; “Bland County” is in “Bristol District”, etc.).
• The ontology also included synonym lists, including for example, multiple types of project identifiers that referred to the same project, and pay item numbers and names.

Develop Auto-Categorization Rules
• Rules were developed to assign metadata or tags to projects for the different search facets. Some of these rules were fairly straightforward and driven by the ontology. Others required more in-depth analysis and iterative development (e.g., identifying work orders that were related to “drainage issues”).
• Assignment of a project number was based on rules that looked for UPCs, project numbers, and contract numbers (using different formatting conventions).
Once the project number was found, it was used as the lookup value to tag the document with project attributes from the project data set (e.g., district, routes, contract award amount, type of work, and road system).

Clean up the Content
• Non-searchable PDFs were converted using optical character recognition (OCR) software, with varying results based on original scan quality.
• Irrelevant and obviously duplicative content was eliminated from the body of content to be searched.
• Excessively long files were removed.
• The project data set was cleaned to reduce inconsistencies, fill in missing data, and normalize identifiers and classifiers.

Analyze the Content
• The body of content was analyzed to (1) inform development of rules for auto-classifying documents and (2) identify issues that would impact the effectiveness of the search capability to be developed.
• Issues included inconsistent and non-informative file naming conventions, non-searchable files (scanned PDFs), variations in the work order and daily construction report formats over time and across project offices, presence of duplicate documents, and existence of very long documents consisting of compilations of individual daily work reports.

Build the Interface
• A faceted search interface was built in a SharePoint environment using the Microsoft FAST search engine; however, the text analytics software used for the pilot can be configured for use with other platforms and search engines.
• The interface allowed users to explore the content and refine search results by filtering through selected criteria.
• The interface included an auto-suggest feature that would provide users with matching terms from the ontology as they started typing.

5. Evaluate the Solution

Set up a Test Environment
• The goal of the evaluation was to compare the solution that was developed to the search capability that could be provided by an “out-of-the-box” full text search capability without faceted navigation or auto-classification.
• To accomplish this goal, a “plain vanilla” search environment was set up, pointed to the same body of content.

Define Metrics
• The testing process focused on measurement of recall and precision. Recall is typically defined as the fraction of all relevant items that were returned from a search. Precision is the fraction of items returned from a search that are relevant to the user’s search query.
• In practice, it is time-consuming to test for relevance because (1) each document must be manually reviewed and (2) clear protocols for determining relevance must be established, given that each person may have a different definition of what this means. The metrics and test cases used struck a balance between the need to obtain meaningful test results and the time requirements to conduct the tests.

Define Test Cases
• A set of specific test cases was defined to provide coverage of different content types and search criteria.
• For each test case, steps were defined for conducting a search in both test environments.

Perform the Tests
• For recall, tests measured the percentage of known relevant documents in the top 30 results.
• For precision, tests measured (1) the percentage of relevant documents in the top 20 results and (2) the number of documents needed to find 10 relevant results.
• Tables of results were compiled for each test case.

Summarize Findings
• Test results were analyzed to highlight test cases in which the developed solution provided a significant advantage over the vanilla search. In some cases where results appeared to be similar, further enhancements to the rules were identified for potential future implementation.
• Limitations of the test process were recognized (e.g., improvements to the convenience of the search experience provided by the faceted search and the auto-suggest were not tested and would require a more extensive process involving actual target users and subjective evaluation measures).

Test and Refine the Solution
• The development process involved multiple cycles of testing and refinement. Sample searches were conducted to identify situations in which search results varied from expectations. The samples were used to refine the auto-categorization rules.
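The recall and precision measures described in Table II-3 can be sketched as follows. This is an illustrative reading of the definitions given in the table, not the test harness used in the pilot: the function names are hypothetical, and the choice to divide precision by the number of results actually returned (rather than by the cutoff) is an assumption.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the known relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one known relevant document is required")
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant (0.0 if nothing was returned)."""
    relevant = set(relevant_ids)
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


def results_needed(ranked_ids, relevant_ids, n):
    """Number of results a user must scan to find n relevant ones, or None if fewer exist."""
    relevant = set(relevant_ids)
    found = 0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            found += 1
            if found == n:
                return position
    return None


# Hypothetical ranked result list and relevance judgments for one test case.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "z"}
```

In the pilot's terms, the cutoffs would be k = 30 for recall, k = 20 for precision, and n = 10 for the number of results needed; each measure would be computed once for the developed solution and once for the plain full text search, then compared per test case.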

The estimated total effort was about 700 hours across all team members. An approximate breakdown of this effort across the major pilot activities appears at the end of this section. The experience of the pilot team provides a benchmark for planning similar future efforts. Note that the estimated hours only include design and development of the solution. Additional time was required for documentation, team coordination, and evaluation (assessment of precision and recall). Technical activities to establish the test environment and run the indexing processes to apply the auto-categorization rules were performed by the text analytics vendor and are not included in the estimate.

Specific design and development activities included:
• Needs analysis (24–40 hours). This task included stakeholder identification, interviews, synthesis, and identification of information seeking scenarios to address.
• Content assembly (160 hours). The pilot involved harvesting content from various locations to develop a stand-alone external test platform, and converting content to text-readable format. The content harvesting portion of this step would not be necessary for the more typical situation, in which the solution involved searching and indexing content in its native repositories.
• Semantic model development (120 hours). This task involved analysis and text mining of the content and incorporation of other sources.
• Rule development for auto-categorization (160 hours). This activity involved the initial specification of rules for auto-classifying content with terms in the semantic model. The majority of this time was spent on finding and refining the keywords to use in work issue categorization rules.
• Solution testing and refinement (200–240 hours). The research team also engaged in iterative testing and refinement of auto-categorization rules.
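As a quick check on the figures above, the activity estimates can be summed to see how they relate to the stated total of about 700 hours. The sketch below simply adds the low and high ends of each range; the variable names are illustrative.

```python
# (low, high) hour estimates for each activity, taken from the list above.
EFFORT_HOURS = {
    "Needs analysis": (24, 40),
    "Content assembly": (160, 160),
    "Semantic model development": (120, 120),
    "Rule development for auto-categorization": (160, 160),
    "Solution testing and refinement": (200, 240),
}

low_total = sum(low for low, _ in EFFORT_HOURS.values())
high_total = sum(high for _, high in EFFORT_HOURS.values())

print(f"Estimated design and development effort: {low_total} to {high_total} hours")
```

The activities sum to between 664 and 720 hours, which is consistent with the estimate of about 700 hours.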
Following the pilot, the Virginia DOT expressed an interest in better understanding what the cost of software licensing would be for the text analytics product. The agency completed a questionnaire provided by the vendor in order to obtain a pricing estimate. The following pricing factors were included on the questionnaire, with the Virginia DOT’s responses:
• Volume of content to be indexed (2,000 gigabytes [GB], 4 million documents).
• Anticipated growth in content over next few years (500 GB per year).
• Type of categorization to be performed (taxonomy/ontology-based vocabulary; options not selected included entity extraction and personally identifiable information [PII]).
• Current crawl frequency and duration (incremental crawl every 15 minutes, 10-minute duration; full crawl once a week, 13-hour duration).
• Number of queries per day (6,000).
• Number of users accessing (7,200).
• Number of ontology editor users (1–3).
• Integrations (FAST search engine, SharePoint).
• Plan to tag content outside of SharePoint (Yes).
• Ontology widgets on search results page (Best Bets, related topics, search facets/filters/refiners).

The software cost estimate was between $200,000 and $300,000, which included a 20% maintenance fee for 12 × 5 (12 hours × 5 days) technical support for the initial year. The estimate did not include hardware, installation, configuration, or training costs.

4.5 Transferability and Scalability of the Pilot

Transferability

Based on obtaining samples of content from the Washington State DOT, the research team concluded that most of the ontology, auto-categorization rules, and faceted search design developed for the Virginia DOT pilot could be easily adapted for other DOTs. The content type categorization rules would require minor adjustments to reflect different titles, and the agency-specific data

for projects, districts, highway system categories, and so forth would need to be replaced with equivalent data. Interestingly, two of three rule sets for identifying types of work issues (i.e., drainage and weather-related issues) worked well with no modifications on Washington State DOT content. The third rule set (utility issues) would need further refinement, possibly because of the need to use specific utility company names in this rule.

Another transferability consideration relates to software platforms for solution development. Based on a review of websites and conference presentations, the research team estimated that at least one-half of state DOTs use SharePoint; however, it should not be assumed that all DOTs would want to base their findability solutions within this platform. Further, a variety of enterprise search and text analytics tools are available. NCHRP Research Report 846, Volume I, Appendix E presents a partial list of commercially available tools, and these tools are evolving rapidly.

The pilot developed for this project was constructed for demonstration only. Agencies wishing to replicate the solution developed in the pilot would need to re-create the steps taken to fit within their own software environments. However, much of the effort for the pilot was spent on design and development of the semantic structures and the auto-categorization rules rather than on implementing them within the text analytics tool; and the effort involved to set up the faceted search in SharePoint was not extensive.

Scalability

The pilot implementation, which covered only a fraction of DOT content (specifically, construction daily work reports, change orders, source of materials forms, and project profile forms), required an intensive effort.
The question of scalability is therefore important to consider; namely, what would it take to implement this type of faceted search solution for a larger portion of the DOT’s information resources? Although the research team did not estimate what percentage of the total content the pilot represented, the researchers would not expect a linear relationship to exist between the number of content types included and the number of hours necessary to implement a search capability. In fact, with marginal additional effort, the ontology and rule base developed for the pilot could be adapted for additional content types.

It would be particularly straightforward to extend the pilot framework for other types of text-based construction project-related content. Given that many of the facets relate to construction projects already, the main work required would be to develop rules for auto-categorizing the new content types. In most cases, these rules would be fairly straightforward to develop, particularly when standard titles or text blocks identify the content types.

The effort required to extend the pilot beyond project-related content is highly dependent on the complexity of required rules. As previously noted, substantial time was spent in the pilot to develop and refine the rules for classifying work issues. This involved time-consuming manual review of multiple documents to develop lists of key words on which to base rules. Other rules were considerably simpler and quicker to develop, particularly those that leveraged available resources such as pay item code lists, and project data files allowing assignment of district, highway system, cost range, and so forth, based on a project identifier.
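The two simpler kinds of rules described above (identifying a content type from a standard title, and assigning facets from project data based on a project identifier) can be illustrated with a minimal Python sketch. It is not the pilot’s rule set or the vendor tool’s rule syntax. The title patterns, the project identifier format (P-1001), the PROJECT_DATA table, and the cost range boundaries are all assumptions made up for illustration.

```python
import re

# Hypothetical title patterns; real form titles vary by district and over time.
CONTENT_TYPE_RULES = [
    ("Daily Work Report",
     re.compile(r"\b(daily work report|inspector'?s? (daily report|diary)|C-84)\b", re.I)),
    ("Work Order", re.compile(r"\b(work order|change order|C-10)\b", re.I)),
    ("Source of Materials", re.compile(r"\b(source of materials|C-25)\b", re.I)),
]

# Hypothetical project data file, keyed by a made-up project identifier format.
PROJECT_DATA = {
    "P-1001": {"district": "District A", "highway_system": "Primary", "cost": 750_000},
    "P-1002": {"district": "District B", "highway_system": "Interstate", "cost": 6_200_000},
}
PROJECT_ID_PATTERN = re.compile(r"\bP-\d{4}\b")


def cost_range(cost):
    """Bucket a project cost into an illustrative cost range facet value."""
    if cost < 1_000_000:
        return "Under $1M"
    if cost < 5_000_000:
        return "$1M to $5M"
    return "$5M and over"


def categorize(text):
    """Return facet tags for one document, or an empty dict if no rule fires."""
    tags = {}
    heading = text[:200]  # standard titles normally appear near the top of a form
    for content_type, pattern in CONTENT_TYPE_RULES:
        if pattern.search(heading):
            tags["content_type"] = content_type
            break
    match = PROJECT_ID_PATTERN.search(text)
    if match and match.group(0) in PROJECT_DATA:
        project = PROJECT_DATA[match.group(0)]
        tags["project_id"] = match.group(0)
        tags["district"] = project["district"]
        tags["highway_system"] = project["highway_system"]
        tags["cost_range"] = cost_range(project["cost"])
    return tags


report = "INSPECTOR'S DAILY REPORT (Form C-84)\nProject P-1001\nCrew placed 40 tons of asphalt."
print(categorize(report))
```

Because the facet values come from a lookup table rather than from the document text, a single project identifier found in a document is enough to populate several facets, which is why rules of this kind were quicker to develop than the work issue rules.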

Chapter 5

Conclusions and Future Research Needs

5.1 Conclusions

DOT Business Drivers for Findability

Based on the literature review, information gathering, and pilot activities conducted for NCHRP Project 20-97, a recognized need exists to improve findability of information within transportation agencies. The needs of greatest concern are retrieving information efficiently in response to FOIA requests, PDRs, and legal claims, and ensuring that employees can find current, authoritative versions of agency policies, manuals, guidance, and standards. Agencies also understand the potential benefits of making it quicker and easier to find relevant information, including reduced time spent searching and improved ability to re-use information that has already been created rather than duplicating efforts.

Because search capabilities in DOTs are typically basic and fragmented, employee expectations are low. Employees may not attempt to discover information that would be helpful for their tasks at hand, beyond use of specialized applications that serve their particular job functions. Those employees who do seek additional information rely on asking colleagues, visiting known intranet pages, and conducting external Internet searches. When an external information request comes in, staff may be asked to spend hours sifting through email files, file drives, and databases to develop a response. Considerable opportunity exists for agencies to realize employee time savings from findability improvements, though these time savings will be spread across the organization and, as a result, difficult to track.

Importance of Understanding Needs

The findability of information in a DOT is not a simple problem with a single solution. Multiple types of information needs exist, and a multi-faceted approach is needed to address most types of needs. The guide presented as Volume I of this research report emphasizes understanding user needs as a key step in any findability improvement. Doing this avoids wasting effort on improvements that have no benefit, and provides the foundation for effective design of a solution. Some types of needs will be readily apparent (e.g., improving relevance of intranet searches for guidance information). Other types of needs may be latent because people may not think to ask for something that could be helpful but is not currently possible. It is essential to identify customers and involve them in the process of developing solutions.

Practices for Improving Findability

In most cases, a combination of information management discipline, effective deployment of enterprise search tools that index content within multiple agency repositories, and design and implementation of a workable metadata strategy will be required to improve findability. Developing a workable metadata strategy means standardizing on an essential set of metadata elements that (1) are helpful for information search and discovery and (2) can be reliably populated using a combination of manual and automated methods. Development and ongoing improvement and management of terminology resources are integral to the metadata strategy, because terminology provides lists of values for metadata elements. Terminology resources also are needed for building effective search solutions with features such as auto-suggest and query expansion to include synonyms and related terms. The guidance developed for this project can be used by DOTs and other transportation agencies to assess and strengthen each of these elements.

While search within organizations (called “enterprise search”) does not perform as well as Internet search, the private sector examples reviewed and the pilot capabilities demonstrated show that search within an organization can, in fact, be improved substantially. Improvements to search within DOTs would likely lead to identification of new applications and benefits. Following the demonstration of pilot capabilities, one DOT employee remarked:

If we could index our structured and unstructured data, it would solve most of our search and findability issues. It would help to structure our information landscape. It would get people thinking, “What else can we do?” . . . The faceted search allows you to group like items together and gives cohesiveness to content. Once all related items are findable, maybe people will take more care in making just the appropriate information available (versioning, duplicate info). I think the ripple effect of having this would be enormous. . . . It would be a game changer (Personal Communication 2016).
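To make the role of terminology resources concrete, the auto-suggest and query expansion features mentioned above under the metadata strategy can be sketched in a few lines of Python. This is an illustrative sketch only, not a feature of any particular search product. The TERMINOLOGY table is a hypothetical stand-in for a managed vocabulary; its entries reuse equivalent terms that appear in this report (for example, work orders are also known as change orders).

```python
# Hypothetical terminology resource: preferred term -> synonyms and variants.
TERMINOLOGY = {
    "work order": ["change order"],
    "daily work report": ["inspector's daily report", "inspector diary"],
    "cold-in-place recycling": ["CIP recycling"],
}


def build_lookup(terminology):
    """Map every term (preferred or variant) to its full group of equivalent terms."""
    lookup = {}
    for preferred, variants in terminology.items():
        group = [term.lower() for term in (preferred, *variants)]
        for term in group:
            lookup[term] = group
    return lookup


def expand_query(query, lookup):
    """Return the query plus its equivalent terms, original first, without duplicates."""
    normalized = query.strip().lower()
    expanded = [normalized]
    for term in lookup.get(normalized, []):
        if term not in expanded:
            expanded.append(term)
    return expanded


def suggest(prefix, lookup, limit=5):
    """Auto-suggest: known terms that start with the typed prefix."""
    normalized = prefix.strip().lower()
    return sorted(term for term in lookup if term.startswith(normalized))[:limit]


lookup = build_lookup(TERMINOLOGY)
print(expand_query("Change Order", lookup))
print(suggest("cold", lookup))
```

The same table feeds both features, which is the practical reason for managing terminology as a shared resource rather than inside a single search application.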
A wide variety of open source and commercial text analytics and search products are available, and they are growing in sophistication. These technologies support development of faceted search capabilities, tuning of results relevancy ranking, and automating the assignment of metadata, all of which are essential ingredients for improved search.

Need for an Integrated and Coordinated Management Approach

To develop, deploy, and maintain findability capabilities, agencies must put in place the right set of roles and responsibilities, and acquire or build the necessary types of expertise. At DOTs, putting in place the necessary organizational functions, skills, and disciplines presents more of a challenge than acquiring and implementing supporting tools and technologies for findability. To provide much of the needed expertise, however, DOTs can look to their existing staff resources, including librarians, content managers, documentation engineers, project controls specialists, data managers, records managers, website managers, and other IT professionals. A coordinated approach to improving findability is needed, leveraging available skills. Several DOTs have established data or information governance groups that can provide a focal point for coordinated implementation of improvements.

One aspect of the pilot demonstration implemented for this project involved leveraging available data resources (construction project data) to facilitate search of unstructured documents. Doing this made it possible to build a search capability that could, for example, auto-complete a project number entered into a search box and find documents that referenced only a contract number related to that project number. Other key search facets were populated from lists of districts, lists of jurisdictions, and other reference data sets.
This example highlights the importance of an integrated approach to findability for both structured and unstructured data and information resources. The implication is that staff responsible for developing agency business applications, GIS, and business intelligence capabilities should work collaboratively with those involved in developing search capabilities focused on unstructured content. For example, master and reference data management practices are valuable not only to support structured reporting but also for search applications.
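The project number example described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the pilot implementation. The identifier formats (PRJ-1001, CN-77), the PROJECT_TO_CONTRACTS crosswalk, and the sample documents are hypothetical; in practice the crosswalk would be derived from the agency’s structured construction project data.

```python
import re

# Hypothetical crosswalk from structured project data: project number -> contract numbers.
PROJECT_TO_CONTRACTS = {
    "PRJ-1001": ["CN-77"],
    "PRJ-1002": ["CN-80", "CN-81"],
    "PRJ-2001": [],
}

# Hypothetical unstructured documents; some mention only a contract number.
DOCUMENTS = [
    {"id": "doc1", "text": "Work order for project PRJ-1001, added guardrail."},
    {"id": "doc2", "text": "Daily report under contract CN-77: rain delay."},
    {"id": "doc3", "text": "Contract CN-80 materials notice."},
    {"id": "doc4", "text": "Contract CN-770 closeout memo."},
]


def autocomplete(prefix, limit=10):
    """Suggest known project numbers that start with what the user has typed."""
    typed = prefix.strip().upper()
    return sorted(n for n in PROJECT_TO_CONTRACTS if n.startswith(typed))[:limit]


def mentions(identifier, text):
    """True if the identifier appears as a whole token (so CN-77 does not match CN-770)."""
    pattern = r"(?<![\w-])" + re.escape(identifier) + r"(?![\w-])"
    return re.search(pattern, text) is not None


def search_by_project(project_number, documents):
    """Find documents that mention the project number or any related contract number."""
    identifiers = [project_number, *PROJECT_TO_CONTRACTS.get(project_number, [])]
    return [d["id"] for d in documents
            if any(mentions(i, d["text"]) for i in identifiers)]


print(autocomplete("prj-10"))
print(search_by_project("PRJ-1001", DOCUMENTS))
```

The second document is returned for PRJ-1001 even though it never mentions that project number, which is the benefit of joining reference data to unstructured content.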

5.2 Future Research Needs

Future research would be beneficial in several areas:
• Additional DOT pilots to validate, extend, and facilitate adoption of the findability practices developed in the initial pilot.
• Investigation of machine learning techniques for auto-categorization of content.
• Investigation of techniques for auto-categorization of DOT image files.

Additional DOT Findability Pilots

Additional pilot implementations of the auto-categorization and faceted search techniques developed in the Virginia DOT pilot would provide several benefits. They would:
• Enable a new set of agencies to gain exposure to these techniques, understand their potential benefits, and have a basis for evaluating ongoing staffing and implementation options.
• Provide an opportunity for extension of the resource base (ontology and auto-categorization rules logic) that can be made available to the entire DOT community.
• Allow for development of expanded guidance for DOTs covering findability scenarios beyond those tested at the Virginia DOT (related to construction project information).
• Allow for additional validation of the transferability of the techniques and an improved understanding of resources needed for extending capabilities beyond construction project information.

A sample set of tasks for DOT pilots is suggested below:
1. Prepare a prospectus detailing pilot objectives and time requirements, solicit agency interest, and select agencies to participate.
2. Meet with agency staff; identify target content and search needs.
3. Refine the ontology developed in NCHRP Project 20-97, working with agency-specific branches. The product would be an updated, expanded version of the ontology developed in the Virginia DOT pilot, reflecting work done in any additional pilots. It would include both generic and agency-specific elements (e.g., each agency’s list of regions would be different).
4. Develop and refine rules for auto-categorization based on each agency’s content collection. The product would be documentation of an expanded set of rules for automatically tagging content with terms from the ontology (e.g., an expanded set of rules to identify projects). Each agency included in the pilot could use these rules to implement an auto-categorization function using the technology solution of its choice. The documented rule set could also be adapted for use by other agencies.
5. Develop an agency work plan for implementation. Each agency would be provided with a work plan detailing tasks for implementing software and processes for improving findability, applying the ontology and classification rules. This task would involve interviews with agency staff to develop an approach to identify technical implementation solutions compatible with their existing IT environments. The work plan would include special attention to opportunities for implementation of spatial search interfaces and integration with business intelligence capabilities.
6. Produce a summary guidance document for general DOT use providing implementation guidance based on the pilot experience. Provide rules and ontology as separate reference files. Provide example work plans (from Task 5) as a resource for DOTs to use in developing their own work plans. Based on the work plans, general guidance could be included on how to integrate the products of NCHRP Project 20-97 into DOT geospatial and business intelligence applications.

The tasks outlined above provide a practical approach to validating and refining the products of NCHRP Project 20-97 and providing additional exposure to the techniques within the DOT community. They do not, however, involve actual implementation or development of the techniques. This approach is suggested given the costs and risks of developing a specific technology solution for each participating agency.

A logical follow-on activity would be to support development of faceted search and auto-categorization capabilities within the agency’s production environment. An in-house implementation would provide a real-world example of what is involved in building a search index across repositories and working through access restrictions. This approach would require a greater level of involvement of agency IT staff than was possible in the first pilot. However, agencies might view this as an opportunity for staff to gain experience, which could be applied in the future if the agency decided to move forward with production implementation. An actual implementation project might be considered following the initial pilots in order to produce a functioning example capability within a DOT that could be demonstrated to other interested agencies.

Machine Learning Investigation

Techniques involving machine learning for automated categorization of documents are currently used for a wide variety of applications, with an emphasis on legal e-discovery. Machine learning techniques involve manual classification of a set of “seed” documents, and then using the set of manually classified documents to derive algorithms for assigning classifications to other documents. The potential advantage of these techniques is that they would not require extensive manual rule development. The disadvantage is that they do require a subject matter expert to manually classify the set of seed documents, and the algorithms that are developed are not transparent. Given that there is interest in applying machine learning techniques, it would be valuable to test application of these techniques for some specific DOT use cases.
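The seed-document approach described above can be illustrated with a minimal Python sketch. The report does not name a particular algorithm; a multinomial naive Bayes classifier with add-one smoothing is used here only as a simple stand-in, and commercial text analytics tools may work quite differently. The seed texts and their labels are invented for illustration and are not drawn from agency content.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical seed set: (text, work issue label) pairs classified by hand.
SEED = [
    ("Standing water in ditch, culvert blocked, pumping required", "drainage"),
    ("Pipe culvert washed out, regraded ditch for drainage", "drainage"),
    ("Heavy rain stopped paving, crew sent home", "weather"),
    ("Snow and freezing temperatures delayed concrete pour", "weather"),
    ("Unmarked gas line found, waiting on utility company to relocate", "utilities"),
    ("Power pole relocation by utility not complete, work delayed", "utilities"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_documents):
    """Count documents per label and words per label from the seed set."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in seed_documents:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log score."""
    label_counts, word_counts, vocabulary = model
    total_documents = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_documents)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in the seed set carry no evidence
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(SEED)
print(classify("Rain delayed paving today", model))
```

Note that no keyword lists or rules were written by hand; the trade-off noted above is that the only way to see why a document received a label is to inspect the learned word counts.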
Two areas of potential interest are:
• Speeding response to FOIA requests.
• Assignment of construction work issue categories, extending the work conducted in the NCHRP Project 20-97 pilot.

For the FOIA application, potential tasks would be:
1. Assembling an advisory panel from several DOTs composed of staff involved in preparing responses to FOIA (or state public disclosure) requests.
2. Collecting examples from the advisory panel members of common requests that are time-consuming to fulfill.
3. Selecting one or more example requests from advisory panel members who are willing to provide content associated with the request for the analysis.
4. For each request, assembling a body of content, including the items that were provided in response to the FOIA request along with other content that was not relevant to the request.
5. Utilizing a text analytics tool to analyze a selected set of seed content and develop algorithms for selecting new applicable content.
6. Testing the algorithms developed on a mixture of additional content (beyond the seed set), including both relevant and irrelevant items.
7. Comparing and analyzing results across the different FOIA requests.
8. Writing a summary report with conclusions about the level of effort required and the accuracy of results.

A second test of machine learning would build on the work conducted in the Virginia DOT pilot to auto-classify construction change orders and daily work reports for work issues. It would utilize a seed set of documents classified with one or more work issues (e.g., drainage, weather, utilities) to train the analytics software. Then, the software would be used to auto-classify another set of documents, potentially from a different agency. The result could be used to compare the level of effort and outcomes for application of the rule-based and machine learning methods of classification.

Findability of Image Files

NCHRP Project 20-97 emphasized findability of text-based content. Given that DOTs have large (and growing) collections of image files (both photographs and video images), techniques are needed for improving findability of these images. A large body of research on multimedia information retrieval is available covering a variety of techniques, from facial recognition to machine learning techniques utilizing social tagging. Potential tasks in this research topic could include:
• A critical review of existing techniques and their applicability to DOTs.
• Documentation of case study examples of organizations that have implemented advanced techniques for auto-classification and search of images.
• Identification of available open source and commercial products that provide auto-tagging capabilities for images.
• Development of guidance for extending the findability framework established in NCHRP Project 20-97 for inclusion of image content types.

References

Boiko, B., and E. M. Hartman (Eds.) (2010). TIMAF Information Management Best Practices, Vol. 1. Utrecht, The Netherlands: Erik Hartman Communicatie.
Cleverley, P. H. (2015). The best of both worlds: Highlighting the synergies of combining manual and automatic knowledge organization methods to improve information search and discovery. Knowledge Organization, 42(6), pp. 428–444.
NCHRP Project 20-109. (n.d.). “Enhancement of the Transportation Research Thesaurus.” Project description retrieved July 25, 2016, from: http://apps.trb.org/cmsfeed/TRBNetProjectDisplay.asp?ProjectID=4061.
Personal communication. Washington State Department of Transportation focus group participant (March 11, 2016).
Virginia Department of Transportation. (2016). Construction Dashboard Project. Project details retrieved January 4, 2016, from: http://dashboard.virginiadot.org/Pages/Projects/ConstructionOriginal.aspx.

Appendix

Pilot Findability Report

This appendix documents implementation of a findability pilot for the Virginia DOT and an analysis of the transferability of pilot results with the Washington State DOT. Three annexes providing additional details are included at the end of the appendix:
• Annex 1: Pilot Classification Rule Descriptions
• Annex 2: Example Scenarios Using Faceted Search Design
• Annex 3: Pilot Evaluation Metrics and Description

A.1: Pilot Overview

The research team conducted pilot activities at the Virginia DOT (VDOT) in order to demonstrate an application of the findability framework and potential benefits from findability improvements. The pilot demonstrated and validated the concepts and methods proposed in the guidance to improve findability, assessed the effort required in making these improvements, and evaluated the transferability to other DOTs. The pilot also provided an opportunity to document a case study for inclusion in the guidance, creating resources that other DOTs can build upon, such as a faceted search navigation design and a set of common terms to use in document classification rules.

As described in the pilot proposal, the research team first identified candidate agencies and potential pilot project scopes at each agency. After identifying these project scopes, the research team selected VDOT as the “primary” agency for the pilot, with a focus on construction project document findability. VDOT staff identified construction documents as a priority findability issue for VDOT, and the issue was tractable in the context of a brief pilot effort. Meeting the research objectives required a fairly intensive effort with the “primary” agency, VDOT, for development and testing of a solution to improve management of active construction project documents. A “secondary” agency, the Washington State DOT (WSDOT), provided a set of sample content resources to allow the research team to consider variations across the two agencies and the transferability of the pilot solutions.

This pilot project consisted of a number of activities, grouped into four areas: assessing findability needs, collecting content, developing a solution, and testing and evaluating that solution. The process used to structure pilot project activities is displayed in Figure II-A-1. Each of these activities is further detailed in the remainder of the document.
Pilot Objectives
• Demonstrate and validate concepts and methods to improve findability
• Assess effort required and transferability to other DOTs
• Document a case study for inclusion in the guidance

Figure II-A-1. Pilot activity process.
1. Assessment: A. Stakeholder Identification; B. Interviews; C. Findability Needs
2. Content Collection: A. Content Type Selection; B. Content Harvesting, Analysis, and Conversion; C. Project Data and Profiles
3. Solution Development: A. Semantic Model Development; B. Rule Development and Refinement; C. Faceted Search Design
4. Test and Evaluation: A. Rule-based vs. “Vanilla” FAST Search; B. Testing and Subjective Evaluation; C. Transferability Analysis
Source: Adapted from figure in internal draft document from Kansas City DOT (2005).

A.2: Assessment

Stakeholder Identification

The research team identified three main types of stakeholders: Knowledge Management and Library staff, Central Construction Office staff, and District Construction staff. Individuals in these groups have responsibility for improving construction information management and search capabilities, and routinely search for construction-related information.

Knowledge Management and Library employees within transportation agencies are stakeholders because this research relates to the findability and organizational structure of information within the agency. At VDOT, Knowledge Management and Library employees work heavily with the content management and collaboration portal platform used in the pilot analysis, and are knowledgeable both in its use and its content. The Knowledge Management Office identified a pilot goal of improving the ability to find needed project information through a search of this portal.

The research team identified Central Construction Office staff as a second group of stakeholders interested in improving findability of project information, particularly related to project costs and schedule. For example, the Central Construction Office could use patterns in unstructured text (e.g., in construction daily work reports) to identify issues leading to project cost overruns or time extensions.

Finally, the research team identified District Construction staff as the third group of stakeholders for this pilot. This group includes the District Construction Manager, Area Construction Engineer, Contracts Manager, Project Controls Engineer, and Technology Resource Manager. While the Central Construction Office staff wanted to find patterns across projects, District Construction staff primarily focused on locating information about specific projects.

Interviews

A 3-day site visit to VDOT was conducted during May 4–6, 2015, during which members of the research team met with Knowledge Management staff, construction staff from three districts, representatives of the central office Construction Division, and Information Technology staff responsible for the agency’s content management/collaboration platform implementation, including search capability configuration. Following these initial meetings, research team members followed up with VDOT Library staff and with individuals holding statewide responsibilities for construction scheduling and materials. The research team also worked with Information Technology staff to discuss the technical approach to the pilot.

In discussions with VDOT Information Technology staff, it became clear that VDOT would have difficulty hosting the actual pilot demonstration on agency servers. There were several reasons for this, including availability of suitable hardware, initiation of activities to migrate to an updated software version, security challenges associated with demonstrating a federated search approach, and logistical difficulties providing direct access to agency servers and databases to external consultants within the timeframe of the pilot. As a result, the research team initiated discussions with a vendor of search and text analytics software, who agreed to host the pilot (in “the cloud”) and provide access to the necessary software. This allowed the research team to demonstrate new kinds of search capabilities in an environment that the team could easily control.

Also through the interviews, the research team identified document content types used at VDOT, as listed in Figure II-A-2.
The research team’s efforts focused on three content types: daily work reports/inspector diaries, work orders, and source of materials forms. The research team also included project profiles in the pilot, which are publicly available from the VDOT website. Descriptions of each of these content types are included in the “Content Type Selection” section.

Figure II-A-2. Content types.
• Correspondence
• Meeting Minutes
• Contracts
• Work Orders (C-10)
• Daily Work Reports/Inspector Diaries (C-84)
• Material Documentation (C-85)
• Source of Materials Forms (C-25)
• Subletting Request (C-31)
• EEO Reports (C-64)
• Estimates (C-79)
• Starting and Completion (C-5)
• Vouchers
• Price Adjustments
• Blast Reports
• Environmental Compliance Reports
• Contractor Inspection Reports
• Certified Payroll
• Design Field Changes
• Job Mix Designs
• Materials Test Results
• Insurance Certificates
• Tracking Logs
• Notice of Intent
• Claims
Source: VDOT; list adapted from figure in internal draft document from Kansas City DOT (2005).

The interviewees also identified a number of content management methods at VDOT that would affect the pilot.

This included the following findings. More detail on a number of these findings is included in the “Content Harvesting, Analysis, and Conversion” section.
• A number of documents include scanned images that are not text searchable.
• The documents demonstrate a lack of file naming conventions. Often, file names are context dependent (e.g., “estimate #1”) or not meaningful (e.g., “SCAN001”).
• “Standard” forms include a number of variations, reflecting changes over time, across districts, and across projects.
• Folder hierarchies that store the documents contain variations in folder structure.
• Document storage includes extensive use of email and email attachments.
• There is duplication of content across the SharePoint drive, including within project folders.
• Documents contain minimal metadata.

Multiple VDOT interviewees expressed an interest in using some degree of automation to add metadata to documents. This idea is described in more detail in the “Semantic Model Development” section.

Interviews also provided the research team with a more complete understanding of current searches and pain points for each of the identified stakeholders. Interviewees noted that a major pain point is responding to Freedom of Information Act (FOIA) requests, audits, Notices of Intent (NOIs), and claims. These searches require significant effort to locate documents. Additionally, the common practices of making multiple copies of a document and distributing documents via email attachments create problems with locating the most recent or “authoritative” version of a document. They also make the application of retention schedules difficult. Interviewees also noted the risk of document loss when an employee leaves a position and leaves content on personal drives not accessible to other staff.
They noted that the new Construction Document Management System is intended to address many of these areas but will not address management of historical content.

Following the initial set of interviews, the research team developed several potential user scenarios on which to focus the pilot, and began investigating these scenarios. The information garnered from the interviews served as the foundation for the pilot design, both in defining the existing information infrastructure and methods, and in understanding search needs that the pilot search tool should address.

Findability Needs

Through the interviews with VDOT staff, the research team identified business questions and associated information search needs. Table II-A-1 summarizes these questions and search needs. The Comments column of this table includes the research team’s assessment of whether the business question was an appropriate candidate for inclusion in the pilot. The business needs that the research team chose to use to guide the development of specific search scenarios for the pilot are marked with asterisks.

Table II-A-1. Summary of VDOT Business Questions and Search Needs

| ID | Business Question/Search Need | Comments |
|---|---|---|
| 1 | Where was work actually done (for projects that do not have a single route-from-to location)? | Daily work report descriptions do not generally include location, and stationing information in the item block of the form is typically blank. |
| 2* | Where have we used cold-in-place pavement recycling? | Can search for variants of “cold-in-place recycling”, “CIP recycling”, etc. within daily work reports and classify projects based on results.¹ |
| 3 | What were the success factors and lessons learned from projects of a particular type (e.g., design-build, accelerated bridge construction, etc.)? | These would be best addressed through interviews; search capability would not add substantial value. Design-build contracts can be identified via existing structured data (contract type), and use of accelerated bridge construction is sufficiently specialized that the state bridge engineer would be aware of these projects. |
| 4 | What projects have involved innovative use of materials, and what was done? | Would need to have more specificity (e.g., list of specific techniques) to investigate potential. |
| 5* | Provide all records (or certain types of records) for a particular project (in response to a FOIA request, audit request, NOI, claim investigation, or Construction Quality Inspection Program check). | Pilot demonstrates ability to retrieve multiple document types based on one of several IDs, leveraging a crosswalk for project identifiers. |
| 6 | Locate a Right-of-Way agreement or a set of correspondence for a particular project. | Similar to above: rules could be used to partially compensate for inconsistent tagging of documents by project number, but the biggest barrier to findability relates to information management practice (i.e., storing documents in a searchable location). |
| 7* | Find recent projects that installed a particular make or manufacturer of an item (e.g., Trinity guardrail GR-9), based on construction item and source of material. | Pilot allows a search to use a combination of Material facet and Supplier facet to search the Source of Materials forms. |
| 8* | Find construction documents based on one or a combination of: tax map parcel, project number/universal project code (UPC), project type (paving, bridge, etc.), fixed completion date (for active projects), construction document type, district, county, route, owner, contractor, subcontractor, cost range, types of material, item code/category, responsible charge engineer. | Pilot used this business need as input for design of a more comprehensive search tool for projects, and can demonstrate some of these search items. |
| 9 | Find projects that have used a particular type of asphalt binder within a specified date range. | This could be accomplished through a search of pay items in SiteManager. |
| 10 | Find all documents associated with a given project that reference “KARST” (a geological condition characterized by sinkholes) as part of an investigation of when this condition was discovered. | A simple text search would meet this need; no complex rules required. A more complete set of project records (including emails) would be required to meet this need. Not representative of a common search need. |
| 11 | Find out asset install dates by mining the daily work reports. | This could potentially be accomplished through a search of the structured data on pay items placed by date in SiteManager. |
| 12* | Identify projects that used a particular pay item or category of pay items (in order to guide selection of pay items to include on a project being designed). | Pilot demonstrates use of a pay item category facet as a way to drill down to a collection of projects that used one of a related set of items. |
| 13* | Identify systemic issues that contribute to construction projects not meeting goals for on-time, on-budget, environmental and quality scores. Currently very time-consuming to do this research. | Pilot partially addresses this by providing a way to filter project documents by whether there was an issue identified on a Daily Work Report or a Work Order with a particular type of reason/cause (e.g., utilities, drainage issues). |
| 14 | Respond to inquiries from other DOTs on various topics: e.g., use of a particular technique and material for bridge deck overlays (alternative to asphalt); how to respond to issue related to longitudinal crack on bridge decks. | Pilot partially addresses this through the materials facet, but this need is really a collection of unrelated topical investigations that would require further specificity to assess. |

¹ Although the research team intended to incorporate this search need into the pilot, there was not sufficient content available to test or implement this scenario.

Pilot Findability Report II-49

The selected information access needs can be classified into four categories, which organize the pilot evaluation test cases described in the “Testing and Subjective Evaluation” section:

1. Find a Single Known Document for a Project (e.g., an estimate) Using a Variety of Search Criteria
2. Find/Review All Documents for a Project (e.g., for a FOIA Request)
3. Search Across Projects - Find Projects with Item, Material, Construction Technique
4. Research Reasons for Delays and Changes

A.3: Content Collection

Content Type Selection

After analyzing the content types listed in Figure II-A-2, the research team selected three types of content: daily work reports, work orders, and Source of Materials forms. These were selected based on their likely business value and relevance to the business questions in Table II-A-1. An emphasis was placed on content with valuable unstructured text that could not easily be discovered via existing database applications or query tools.

Daily work reports (VDOT Form C-84), also known as Inspector’s Daily Reports, are forms completed by construction inspectors. They are used to record pay item quantities placed and equipment used, and provide a narrative of activities and conditions on the construction site. Daily work reports were selected because they contain substantial blocks of free-form text that could be mined to derive useful information about construction projects.

Work orders, also known as change orders (VDOT Form C-10), are used to authorize a change in contract scope, schedule or budget. These contain text descriptions of the location and type of work included in the change, and the justification or reasons for requesting the change. Like daily work reports, work orders were selected because they contain substantial blocks of free-form text that could potentially be mined to derive useful information about construction projects.

Both daily work reports (DWRs) and work orders (WOs) can be created within VDOT’s SiteManager application. However, SiteManager includes only a rudimentary search capability, and is not accessible to most users. In addition, content analysis revealed many examples of DWRs and WOs that appeared to have been created outside of SiteManager, as well as many examples of PDFs and HTML reports created from SiteManager but stored independently on file drives and team collaboration sites.

Source of Materials forms (VDOT Form C-25) are completed by contractors and detail the intended manufacturer or supplier for each type of material to be used on a construction project. Materials Division employees receive this form and fill in the required testing method for each material. The construction inspector uses the form to verify that the sources have been approved and that appropriate testing takes place. Source of Materials forms were selected to illustrate a search capability for a combination of material type and vendor, that is, for information that is fundamentally structured (i.e., pay items and vendor/supplier names) but not currently collected in a structured database. This capability would have been useful for the investigations related to the recent issues with the Trinity guardrail.

In addition to the three content types above, the research team elected to include a fourth type that would provide a user with general information about a construction project. Project profiles for each construction project were created using information that was publicly available for download from the “Project Delivery” section of VDOT’s online dashboard. Each profile contains project details, a project summary, contact information, and budget and schedule details. Project profiles were selected to provide a document that would allow users searching for a project to quickly access information about that project.

Content Harvesting, Analysis, and Conversion

In general, content collection was time-consuming because document naming and storage locations are not standardized across districts. To collect daily work reports, work orders, and Source of Materials forms, the research team conducted a series of searches for key words or identifiers that appeared within forms (e.g., “Form C-25”). Documents meeting the search criteria were downloaded; they came in a variety of formats (including PDF, MHT, Word, Excel, RTF, MSG). Some of these documents were produced from systems (e.g., AASHTOWare SiteManager), while others were stand-alone forms and related correspondence. The collection, analysis, and conversion process for each type of document is described in more detail below.

Roughly 3,000 daily work reports, 1,000 Source of Materials forms, 1,000 work orders, and 6,000 project profiles were collected. From this, the research team limited the content to approximately 2,000 daily work reports, 1,000 Source of Materials forms, 1,000 work orders, and 2,000 project profiles. This accomplished two objectives: it increased the computing performance of the search function by limiting the total content volume, and it avoided skewing search results toward a particular document type (e.g., including all 6,000+ project profiles would have accounted for over half of the pilot content).

Document Harvesting

The research team obtained a collection of content through a combination of methods: direct provision of files by districts (from their shared drives and/or team sites), searches for particular document types on VDOT’s content management/collaboration platform, and direct downloads from this platform of entire project folders. As noted during the interviews, the folder structure varied from project to project. Although content is often contained in subfolders multiple levels down in the project folder hierarchy (and not in a consistent location from project to project), once found, this structure allows for bulk download of all of the work orders and/or daily work reports for a project. The research team downloaded and stored these documents in folders named with the project number, and did not alter the original file names (i.e., the research team gave each content type-project combination a folder and maintained the original filenames for all content within the folder). This improved the speed of document collection and increased the number of documents collected, as not all documents appeared in the initial searches. This approach was not used for Source of Materials forms because the initial searches resulted in a suitably high volume of content.

VDOT’s Knowledge Management Office facilitated access to a body of content that was stored on a contractor content management/collaboration site for a design-build megaproject. This document collection process followed a similar search pattern as the initial search collection process, although

with fewer issues related to naming conventions, which were more standardized given that the content was for a single project. There were some variations between the megaproject and the internal VDOT content. For example, Form C-10 was called a “Change Order” rather than a “Work Order”.

Content Analysis and Conversion

To identify how many different projects the collected body of content represented, the research team created a spreadsheet of individual projects through an examination of individual daily work report and work order files. VDOT construction projects are identified by a state project number (e.g., “0023-101-102,C501”), a sequentially assigned UPC (a cradle-to-grave project identifier, e.g., “15786”), and a contract ID (for the main construction contract). Files collected typically contained at least one of these identifiers. The team downloaded information from VDOT’s dashboard that includes each of these three identifiers (as described in the “Project Data and Profiles” section). Using this information, the team matched the project identifiers for the existing files to this master file. Based on this exercise, the assembled content represents approximately 250 different projects.

Analysis of the body of content led to the following observations. Each of these issues represents a challenge likely to be faced by other organizations seeking to implement improved search capabilities across existing repositories:

Naming Conventions. As noted in the “Content Harvesting, Analysis, and Conversion” section, different projects maintain different naming conventions. At times, naming conventions also differ within projects (e.g., a single project could have filenames of “Work Order 2” and “WO 3,” include a date at the end of some file names, and include the project number at varying levels of specificity within the file name). File names are also often context dependent (e.g., the “Work Order 2” document noted above depends on the project folder in which it appears for its meaning). These differences and inconsistencies could make it difficult for a user to find a specific work order within any content repository.

Text Recognition. Many of the downloaded documents are scanned PDF documents that do not contain searchable text. In this original state, the search tool would not be able to read or index these PDFs. To take advantage of these documents, the research team used optical character recognition (OCR) software to convert them to searchable documents. Due to the number of documents, this required extensive processing time. The resulting document quality was generally good, but varied based on the quality of the original scan. Converting PDF images to PDF text files made them readable. However, once converted, different pieces of software may read the OCRed text differently. Specifically, the classification software seems to have read some zeroes as Os, Bs as 8s and so on. This is common in OCRed text. The search tool, FAST, appears to have read some of those characters more accurately. These observations are based on testing classification in one tool (where it is possible to see exactly how the software read the characters) and search in another (which returned some documents for searches on strings that the classification software had read differently).

Email Documents. Many of the relevant documents were in email format, with embedded attachments containing work orders, daily work reports, and Source of Materials forms. Since the search tool utilized for the pilot was not configured to search text within attachments, the research

team opened each email and downloaded the relevant attachments. Although Smartlogic could be configured to search both email and attachment text, it would only be able to do so for attachments with searchable text. By downloading the relevant attachments, the research team was able to use OCR software to recognize text in the attachments and increase the content base. Emails presented a challenge within the search tool in that content could be repeated multiple times within an email chain (e.g., once in an original email, and again multiple times as part of the replies).

Historical Evolution of “Standard” Forms. Each of the content types included a variety of formats, presumably reflecting changes in practice over time, variations across districts, and variations in contractor-created forms. For example, most daily work reports were entered into VDOT’s construction management software (SiteManager), and then either printed and scanned to PDFs or output and saved as HTML or Microsoft Word (.doc) files. Some were completed using a standard form (C-84) and saved as .doc files. Others were completed in Microsoft Word using a custom format (i.e., not using the C-84 form). The collected content contains 14 varieties of daily work report documents (including documents with titles such as “Inspector’s Daily Report,” “PM Diary,” and “Daily Report of Construction”). Similarly, the collected content contains nine varieties of work orders (including “change orders”). Notably, the work orders often have similar sections, but in different sequences. For example, the slight differences between two versions of the Form C-10 (from 2006 and 2007) include:

- The inclusion of a VDOT-defined “category” field (e.g., “ADD” for additional work not originally planned) in the 2007 version;
- Different language specifying the “Contract ID” field (“Job Des. Or Contract ID. No.” in the 2006 version, “Contract ID No.” in the 2007 version);
- Different language specifying the explanation for the proposed work (“Engineer’s Explanation of Necessity for Proposed Work” in the 2006 version, “Responsible Charge Engineer’s Explanation of Necessity for Proposed Work” in the 2007 version);
- Different specification of the time effect of the work order (referencing “A Time Extension” of a specified number of additional calendar days with a specified new fixed completion date in the 2006 version, compared to the specified contract time limit prior to approval and upon approval of the work order in the 2007 version).

Other differences in the forms, such as the title (e.g., “Work Order” and “Change Order,” or “Daily Work Report” and “Daily Report of Construction”), could have more of an impact on findability. As discussed in the “Solution Development” section, understanding these variations is essential to developing the search capability, because it enables search logic to take advantage of patterns in the documents.

Related Documents. In addition to the official work order documents, several other related documents were collected from the same folders as the work orders: the FHWA Conceptual Approval Request letter, and various cover letters and transmittal slips with related information on prices, etc. Official work orders are entered and tracked in SiteManager.

Duplicate Documents. Because of the different approaches used to harvest documents, the content collection process resulted in some duplication.
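The report does not say how duplicates were identified or removed. As one possible approach, a minimal sketch (not the research team's method) is to group harvested files by a hash of their bytes. The function names `file_digest` and `find_duplicates` and the folder layout are assumptions made for illustration. Note that this only catches byte-identical copies; the same form scanned twice, or saved in two formats, would not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Because the file names in the collection are inconsistent (“Work Order 3” versus “WO 3”), comparing content rather than names is the safer default in a sketch like this.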

Document Length. Some documents were quite lengthy. For example, PM Diaries often contain a full month (or more) of daily work reports. In these cases, information relevant to a user search may be limited to 1-2 pages within a 100-page document. Similarly, some work orders contain significant additional information, in which the Form C-10 is only one to two pages out of hundreds of pages in the document (and may be located in the later portion of the document). This can reduce the value of the search tool, which can find documents containing relevant information but cannot point the user to the relevant portion of the document. This capability could be developed via customization of search or text analytics capabilities but was beyond the scope of the pilot. For the purposes of the pilot, the research team attempted to limit the inclusion of these documents in the final content by selecting documents based on file size, mainly to improve processing speed, but also to prevent these documents from overwhelming search results by matching on many search terms. Alternatively, fully built-out rules could similarly prevent these lengthy documents from consistently appearing as top results.

Content Storage Locations. Differences across districts in where files were stored (content management system/collaboration portal or file servers) made the content collection process complex and would similarly complicate development of an enterprise search tool.

Project Data and Profiles

Project List

As noted above, the research team compiled project information from publicly available sources. The research team downloaded a project list available from the VDOT Dashboard website’s “Project Delivery” section.[2] This website provides information for all construction projects dating back to FY 1999. The downloaded project list includes the following fields for each project:

- District
- Route
- Road System (e.g., Primary, Secondary, Urban, Rural)
- UPC
- Description
- Contract ID
- Original Specified Completion Date
- Estimated Completion Date
- Current Contract Amount
- Award Amount
- Cost of Work to Date
- Final UA Cost
- Acceptance Date
- Contract Type

[2] http://dashboard.virginiadot.org/Pages/Projects/ConstructionOriginal.aspx

- Type of Work (e.g., Bridge Widening, Grade / Drain / Pave)
- Type of Work (Code)
- Type of Work (Group)
- On Time
- On Budget

The description field contains a variety of information, often including the project number and location. The research team harvested this information to create separate fields for project number and location. For example, a description of “2009 PLANT MIX SCHEDULE (South Hill, Mecklenburg County) (PM4A-058-026,N501)” represents the PM4A-058-026,N501 project in South Hill, Mecklenburg County. In extracting the project number for each project, the research team created a mapping between project numbers and contract numbers. As described in the “Search Capability” section, the search tool includes a “smart” search capability based on this information: it retrieves selected structured project information from the project list and generates results for documents containing the contract ID, project number, or UPC.

A number of the project list items required significant data cleaning and normalization. For example, the project team:

- Separated the project number and location from the description, as described above
- Normalized route information and added five synonyms for each (RT. 66 and I-66, for example)
- Separated route numbers for some projects that involved more than one route
- Mapped contract award amounts to ranges
- Separated multiple UPC codes into columns so that they could be imported individually and be mapped to each project
- Separated and added road system(s) to each project

Project Profiles

On the same public website as the project list, VDOT provides access to project profiles for each of the projects contained in the project list. Because of the volume of project profile documents available, the research team wrote a script to automate a PDF download of each of the project profiles.
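Three of the project list cleaning steps described under “Project List” can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the research team's script. The function names are hypothetical. The description pattern assumes the location and project number appear as the last two parenthesized groups, as in the quoted example. The handling of amounts that fall exactly on a range boundary is a guess. The route variants are taken from forms quoted in this report rather than from the team's actual list of five synonyms.

```python
import re

# Assumed layout: "<title> (<location>) (<project number>)".
DESCRIPTION_RE = re.compile(
    r"^(?P<title>.*?)\s*\((?P<location>[^()]*)\)\s*\((?P<project_number>[^()]*)\)\s*$"
)


def split_description(description: str) -> dict:
    """Split a project list description into title, location and project number."""
    m = DESCRIPTION_RE.match(description)
    if m is None:
        # Leave unparseable descriptions for manual review.
        return {"title": description.strip(), "location": None, "project_number": None}
    return {key: value.strip() for key, value in m.groupdict().items()}


def award_range(amount: float) -> str:
    """Map a contract award amount to a Contract Award Amount facet label.

    Assumption: each range includes its lower bound and excludes its upper bound.
    """
    if amount < 500_000:
        return "Less than $500,000"
    if amount < 1_000_000:
        return "$500,000 - $1,000,000"
    if amount < 5_000_000:
        return "$1,000,000 - $5,000,000"
    return "$5,000,000 and above"


def route_synonyms(number: int, interstate: bool = False) -> list[str]:
    """Generate illustrative surface forms for a route number."""
    names = [f"Route {number}", f"RTE {number}", f"RT {number}", f"RT. {number}"]
    if interstate:
        names += [f"Interstate {number}", f"I-{number}"]
    return names


parsed = split_description(
    "2009 PLANT MIX SCHEDULE (South Hill, Mecklenburg County) (PM4A-058-026,N501)"
)
```

Descriptions that do not fit the assumed pattern are returned unparsed rather than guessed at, which matches the report's observation that the field “often” (not always) contains the project number and location.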
The project profiles give search tool users quick access to information about projects.

A.4: Solution Development

Semantic Model Development

The research team explored the possible use of a variety of semantic resources, and used some of these as input into the development efforts. One resource was a set of search logs from InsideVDOT, which the research team mined for terms to include in the semantic model. The research team examined the TRT (Transportation Research Thesaurus) but found it to be of limited value given the highly specific focus of the pilot. The TRT provides broad coverage of transportation concepts at a general level, whereas the pilot required more specific terms.

The research team turned to content from the districts. The research team began by using a number of text analytics tools to mine this content for entities and noun phrases, and manually analyzing content for concepts and complex relationships.

Using the understanding gained from analyzing VDOT’s content, the research team designed the architecture of the semantic ontology to respond to VDOT’s specific search needs. This architecture governs the facets in the semantic model, how those facets are related to one another, and how they work together to classify content. It is designed to be intuitive to users and responsive to the information seeking needs that were identified in the interviews.

After designing the architecture of the model, the research team created facets, imported data, and created relationships among terms. The semantic model includes a number of project-specific facets available from the master project list, which could allow for additional analysis by VDOT staff. The top-level categories of the ontology are based on the metadata fields or facets that would be useful in a search application. Table II-A-2 documents the items included in each facet available for search. Users would be able to filter search results by selecting criteria from this second level of the semantic model (and in further detail by selecting criteria from the third level of the model and beyond if desired).
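The idea of filtering results by facet values can be sketched as follows. This is an illustrative model only: the `Document` class, the `filter_by_facets` function and the sample documents are hypothetical, and the pilot itself relied on the commercial search and classification tools named in this report rather than on code like this. The facet names and values are taken from Table II-A-2.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document plus the facet values assigned to it by classification."""
    name: str
    facets: dict[str, set[str]] = field(default_factory=dict)


def filter_by_facets(docs: list[Document], selections: dict[str, str]) -> list[Document]:
    """Keep only the documents that carry every selected facet value."""
    return [
        d for d in docs
        if all(value in d.facets.get(facet, set()) for facet, value in selections.items())
    ]


# Hypothetical sample content.
docs = [
    Document("WO 3.pdf", {"Content Type": {"Work Order"},
                          "District": {"Bristol District"},
                          "Work Issue": {"Utilities Issue"}}),
    Document("DWR 12.pdf", {"Content Type": {"Daily Work Report"},
                            "District": {"Bristol District"}}),
    Document("C-25.pdf", {"Content Type": {"Source of Materials"},
                          "District": {"Salem District"}}),
]

hits = filter_by_facets(docs, {"District": "Bristol District",
                               "Content Type": "Work Order"})
```

Each additional selection narrows the result set, which is the drill-down behavior the facets are meant to support.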

Table II-A-2. Semantic Model Facet Included Values and Value Sources

Facet | Source of Values | Included Values
Content Type | Content Analysis | Values limited to: Work Order; Daily Work Report; Source of Materials; Project Profile; Related to Work Order
Contract Award Amount | Project List | Ranges of: Less than $500,000; $500,000 - $1,000,000; $1,000,000 - $5,000,000; $5,000,000 and above
Contractors | Text Mining of Content | Variety
District | Project List (VDOT Master Data) | Values limited to: Bristol District; Culpeper District; Fredericksburg District; Hampton Roads District; Lynchburg District; Northern Virginia District; Richmond District; Salem District; Staunton District
Equipment | Text Mining of Content | 20 top-level equipment categories, with additional subcategories
Jurisdiction | City and County List; Text Mining of Content | All possible values for Virginia cities and counties
Manufacturers and Suppliers | Text Mining of Content | Variety
Materials | List of Materials; Text Mining of Content | 30 top-level materials categories, with additional subcategories
Pay Items | VDOT Standard Item Code Table[3] | Variety

[3] Virginia DOT, “Standard Item Code Table,” available at http://www.virginiadot.org/business/resources/const/itemcodestandard.pdf

Facet | Source of Values | Included Values
Projects | Project List | Variety
Road System | Project List | Values limited to: Interstate; Primary; Primary (Arterial); Rural; Secondary; Urban; Various
Routes | Project List | Variety
Type of Work | Project List | Values limited to: Box Culvert; Bridge; Bridge Ordinary Maintenance; Bridge Painting; Bridge Repair (& Rehab); Bridge Widening; Demolition; Fence Repair / Replace; GR Replacement / Repair; Grade / Drain / Pave; Jacked Pipe / Pipe Rehab; Maint Replacement; New Roadway; Pavement Marking / Markers; Pavement Repair; Paving / Asphalt; Paving / Concrete; Planting; Sidewalk, Curb & Gutter; Signals; Signing / Sign Overlay; Surface (Overlay & Treatment); Utility; Widen Roadway; Wildflowers
Work Issue | Content Analysis; Adapted from Previous Work (Sun and Meng)[4] | Values limited to:[5] Drainage Issue; Utilities Issue; Weather Issue
Work Order Categories | Content Analysis; Work Order Category Lists in VDOT Content | Values limited to: ADD (Additional work not originally planned); CHAR (Changes per Section 104.2 (Character of Work)); CONT (Error or omission in contract document); LEG (Local, State or Federal government proposal); MISC (Does not fit into other categories); NBID (Items specified in contract with set unit price, not bid on by contractor); PLAN (Plan error or omission); POL (Changes in VDOT Policy); RENW (Renewing / Extending time limit on a renewable contract); UTIL (Delays caused by utility issues); VALU (Contractor Value Engineering Proposal); VDOT (Late NTP or VDOT caused delay)

Figure II-A-3 displays the facets in the top level of the semantic model. Users have the ability to search and/or filter by these facets. Users can also drill down within a facet to view subcategories (e.g., selecting “District” in Figure II-A-3 would then allow the user to select a district from among the list of districts).

[4] Sun, Ming and Xianhai Meng. “Taxonomy for change causes and effects in construction projects,” International Journal of Project Management 27 (2009), p. 560-572.

[5] The research team also identified the following issues as candidates for inclusion in a full development: concrete issues, contractor issues, equipment issues, external delay issues, materials issues, paving issues, plan changes, safety work issues, traffic maintenance issues, and value propositions. These candidates are included in the ontology, although the research team did not focus on them for testing.

Figure II-A-3. Top-level semantic model categories.

The research team used Smartlogic’s Ontology Manager to organize the data into a structure, which defines relationships between terms. Figure II-A-4 provides an example of how the model incorporates these relationships. In the example, the user has selected “Bristol District,” which is an element of the “District” facet, displayed above the center circle. Bristol District “has” a number of associated “Routes,” which are displayed in the circles to the right of the center circle. Similarly, “Bristol District” “has” a number of “Projects,” displayed in the circles below the center circle. Finally, “Bristol District” “is District of” multiple “Jurisdictions,” displayed in the circles to the left of the center circle. The search tool interface provides the user with this visual way to view relationships between facets.

Figure II-A-4. Example of semantic model.

The content analysis was also used to help model the relationships between the different facets. One of the advantages of an ontology over a simple taxonomy is this ability to model multiple relationships and types of relationships. Table II-A-3 provides a list of the relationships in the VDOT ontology. Relationships in the VDOT ontology fall into three groups:

- Hierarchical, which creates a parent-child relationship;
- Associative, which links two equal concepts in a non-linear way; and
- Equivalence, which includes synonyms and other terms that should be treated as equivalent to the preferred term.

Hierarchical relationships are limited to broader and narrower terms, as their name indicates. The research team defined the associative relationships to indicate the function of each term in the relationship. The relationships are reciprocal, just as many relationships in language are. For example, “is a parent” has a reciprocal relationship of “has a child.” Similarly, “is a parent” could have a reciprocal relationship of “is a child of.” The relationships then read either as “Tim is a parent of Sarah” paired with “Tim has a child,” or as “Tim is a parent of Sarah” paired with “Sarah is a child of Tim.”

The defined (or named) relationship is a reference for the user and allowed the research team to use those relationships in different ways for classification. For example, in Table II-A-3, districts comprise counties and cities, and counties and cities are in districts. But districts also have relationships with Projects and Routes, and defining those relationships lets the user discover information through search. For example, a user can look for counties that are located in the Lynchburg District without having to know the name of the county first.

Table II-A-3. List of relationships in the VDOT pilot ontology.

Term class | Term subclass | Relationship | Class of terms relationship is with | Relationship type
Content Type | C-10 | - | - | hierarchical
Content Type | C-10 | has work order | Work Order Category | associative
Content Type | C-10 | is work issue | Work Issues | associative
Content Type | C-25 | - | - | hierarchical
Content Type | C-84 | - | - | hierarchical
Content Type | C-84 | has work order | Work Order Category | associative
Content Type | C-84 | is work issue | Work Issues | associative
Content Type | Project profile | - | - | hierarchical
Content Type | Related to C-10 | - | - | hierarchical
Content Type | Related to C-10 | related to C-10 | C-10 | associative
Contract Award Amount | - | is contract award amount of | Project | associative
District | - | has a roadway | Routes | associative
District | - | has project | Projects | associative
District | - | is District of | County | associative
District | - | is District of | City | associative
Jurisdiction | City | - | - | hierarchical
Jurisdiction | City | is in County | County | associative
Jurisdiction | City | is in District | District | associative
Jurisdiction | County | - | - | hierarchical
Jurisdiction | County | is in County | County | associative
Jurisdiction | County | is in District | District | associative
Jurisdiction | County | has a roadway | Routes | associative
Pay Items | - | has item ID | - | equivalence

Term class | Term subclass | Relationship | Class of terms relationship is with | Relationship type
Projects | - | has contract ID | - | equivalence
Projects | - | has project ID short | - | equivalence
Projects | - | has UPC | - | equivalence
Projects | - | involves route | Route | associative
Projects | - | is project of | District | associative
Projects | - | has road system type | Road System | associative
Projects | - | has contract award amount | Contract Award Amount | associative
Projects | - | has type of work | Type of Work | associative
Road System | - | is road system | Projects | associative
Routes | - | has road UF[6] | - | equivalence
Routes | - | is a roadway of | District | associative
Routes | - | is part of project | Projects | associative
Type of Work | - | is type of work | Projects | associative
Work Issue | - | has work issue | C-84 | associative
Work Issue | - | has work issue | C-10 | associative
Work Order Category | - | - | - | hierarchical
Work Order Category | - | is work order category | C-10 | associative

[6] “road UF” is a named synonym that translates to “road use for.” Giving synonym types a specific name allows the user to identify them more easily in the ontology. More importantly, this also allows the user to set up types of terms as a facet for search and to weight terms differently. For example, identifying contract IDs with a specific term type name lets the user count it as an equal to project IDs, and naming route synonyms as road UF lets the user weight it at 0.25 or at any score that helps classification.

Rule Development and Refinement

The Smartlogic Semaphore – Classification Server tool includes text analytics features that allow for development and application of rules that improve search results over a simple full text search capability. This proprietary, commercial software served as the basis for the pilot efforts to demonstrate improved findability.

Text analytics is software that can be used to add structure to unstructured content, which can automate assignment of metadata to improve search within the enterprise. The basic elements of text analytics include auto-categorization and entity/fact extraction. Auto-categorization can characterize the subject of content, while entity extraction can pull out key concepts and information from a set of documents (e.g., materials and locations). Both processes start with content analysis, through manual sampling to understand patterns, and through text mining software to extract noun phrases. Entity extraction aims for collecting all significant noun phrases, while categorization is built upon phrases

that are unique to each subject area. The research team used these processes to create rules for tagging content for each element in the semantic model.

The software the research team used for this project goes through these basic steps when classifying a document:

- It looks for vocabulary (terms, term variants, phrases and patterns) in documents and metadata related to documents.
- It applies weightings to any vocabulary it finds to build an overall score for each term in the ontology.
- It adjusts weightings based on terminology frequency, location of the terms, proximity to other terms, the combination of terms found and the format and layout of the text containing terms.
- If the score exceeds a (configurable) threshold of 0.48, the document is tagged with classification results.

The process of developing and refining rules to classify documents and automate metadata tagging is discussed in more detail below.

Unstructured and Semi-Structured Text-Based Rule Development

The pilot focused on building rules for “unstructured” text to classify documents. For example, a simple rule to categorize a document’s content type as “Work Order” would determine if the term “work order” appears at a particular location in the document. Such a rule would avoid false positives (i.e., documents that contain the phrase “work order” but are actually some other content type) that might be obtained via a simple search of the term “work order”.

Another part of the rule development included the use of synonyms, which cover everything from simple terms that refer to the same item or concept to common misspellings or abbreviations. For example, project numbers, UPCs, and contract IDs each identify a project. The research team added UPCs and contract IDs as synonyms for project numbers so that users could find a project by looking for any of the three.
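The interplay of vocabulary, weights, synonyms and the threshold can be made concrete with a deliberately simplified sketch. It is not Semaphore's scoring algorithm, which the report does not specify: here a document's score for a term is assumed to be the sum of the weights of the surface forms found, capped at 1.0, and the frequency, location and layout adjustments are omitted. The vocabulary, the weights other than the 0.25 mentioned in the “road UF” footnote, and the function names are hypothetical.

```python
import re

THRESHOLD = 0.48  # the configurable threshold quoted in the text

# Hypothetical vocabulary for one ontology term: (surface form, weight).
ROUTE_66 = [
    ("I-66", 1.0),           # assumed preferred label
    ("Interstate 66", 1.0),  # assumed full-weight variant
    ("Route 66", 0.25),      # lower-weighted route synonym
    ("RT 66", 0.25),         # lower-weighted route synonym
]


def score(text: str, vocabulary: list[tuple[str, float]]) -> float:
    """Sum the weights of the vocabulary entries found in `text`, capped at 1.0."""
    total = 0.0
    for phrase, weight in vocabulary:
        # Match the phrase only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            total += weight
    return min(total, 1.0)


def is_tagged(text: str, vocabulary: list[tuple[str, float]],
              threshold: float = THRESHOLD) -> bool:
    """Tag the document only if its score exceeds the threshold."""
    return score(text, vocabulary) > threshold
```

Under these assumptions a single low-weight synonym is not enough to tag a document, while a preferred label, or two independent synonyms together, is.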
Similarly, the rules included pay item numbers as synonyms for pay item names, so searching for a number will return the name of the item and vice versa. In a third example, a user searching for a particular route will find documents containing a number of route identifiers (e.g., “RTE 66”, “Route 66”, “RT 66”, “Interstate 66”, and “I-66”). Because the four types of content used in the pilot have more structure than many documents due to having fairly consistent fields, the research team used that structure to improve the categorization rules. For example, the top section of many work order documents included specific fields containing values for project identifiers and work order categories (the VDOT-specified reason for the work order), among other facets. The research team developed rules to tag these work order categories. In a more complete application, more advanced structure rules could be developed. In addition to textual variation, rules were specified to distinguish between when a word appears as a category indicator and when it appears elsewhere in the text. For example, the word “Plan” appears throughout typical work orders so a rule only counts “Plan” (and the other work categories) if it appears in conjunction with the word, “Category:” (which often occurs in the top section as noted above). The rules were further refined to account for variations in spelling and the placement of designation words in the text. For example, seeing “Category:” followed by “Plan” worked in some

cases, but in others, a considerable (and unpredictable) number of words separated “Category:” and “Plan”. Some of the intervening words existed because of poor scan quality and text recognition. To account for this, the rule searches for the word “Plan” only within two paragraphs of the term “Category:”. This increased the categorization accuracy from unusable to highly accurate. It is a general rule of thumb that 60%-70% accuracy is a minimum acceptable level and 90%+ is usually the goal, although that varies with the type of content and the type of application.

Figure II-A-5 illustrates the programming language used for this example rule. In the actual development, the word “PLAN” is replaced by a variable that points to a list of terms, since in some work orders the word is replaced by the description.

Work Issue Classification

The research team also focused significant efforts on building rules for auto-categorizing work orders and DWRs based on the types of work issues encountered, as described in free text blocks within the documents. This required reading through a sample of the work orders collected to develop an understanding of the types of issues that appeared as work order justifications across agency construction projects. Based on this understanding, the research team reviewed previous work to categorize work issues. The most relevant reference identified was by Sun and Meng, who developed a taxonomy of change causes in construction projects.7

The research team ultimately chose to focus the rule development on identifying weather, drainage, and utilities issues. The challenge in building rules for these issues is in distinguishing when they are problems or when they prompt a work order or plan change. The rules built for the pilot accomplish that in some cases, but in others capture an occurrence more than a problem. Full rule development beyond a pilot could further refine these rules to the desired extent.
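The proximity rule described on this page (count “Plan” only when it appears within two paragraphs of “Category:”) might be approximated in ordinary code as below. This is a sketch under stated assumptions, not the rule language of the classification software: paragraphs are assumed to be separated by blank lines, and the function names and the `max_gap` parameter are hypothetical.

```python
import re


def paragraphs(text):
    """Split text into non-empty paragraphs on blank lines (an assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def find_category(text, categories, indicator="Category:", max_gap=2):
    """Return the first category term found in the indicator paragraph or in
    the next `max_gap` paragraphs; return None if nothing qualifies."""
    paras = paragraphs(text)
    for i, para in enumerate(paras):
        if indicator.lower() not in para.lower():
            continue
        # Look only at the indicator paragraph and the few that follow it.
        for candidate in paras[i:i + max_gap + 1]:
            for cat in categories:
                if re.search(rf"\b{re.escape(cat)}\b", candidate, re.IGNORECASE):
                    return cat
    return None
```

Passing a list of category terms rather than the single word “PLAN” mirrors the note that the actual rule points to a list of terms.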
<sequence sequencetype="paragraph">
  <paragraph>
    <text data="Category:" />
  </paragraph>
  <skip count="2"/>
  <paragraph>
    <text data="PLAN" />
  </paragraph>
</sequence>

Figure II-A-5. Sample work order category rule.

7 Sun, Ming and Xianhai Meng. “Taxonomy for change causes and effects in construction projects,” International Journal of Project Management 27 (2009), p. 560-572.

Upon selecting an initial set of work issues, the research team conducted a search across compiled documents for key related terms. For example, to search for the weather-related work issues, the research team used terms such as “storm”, “rain”, “heavy rain”, “warmer weather”, “winter”, “shutdown”, “muddy”, “snow”, “weather event”, “hurricane”, and “cold temperature”. Similarly, to search for utility issues, the research team searched for documents containing related terms such as “obstruction”, “utility”, “conflict”, “gas line”, “sewer”, “waterline”, “electric”, “power”, and “signal”. Based on these searches, the research team selected a subset of work orders that fit into various categories. For each of these documents, the research team also recorded the text that triggered the categorization. These patterns, including sentence structure, word combinations, and the location of key words, served as the foundation for the initial rule development of work issues. The use of text

analytics (including entity extraction) supplemented this analysis to find additional terms and phrases to use in the rules.

For the work issue rule development, two standard section headings (structural elements) signified a location within work orders to search for work issues: “Location and Description of Proposed Work” and “Responsible Charge Engineers Explanation of Necessity for Proposed Work.” The research team developed rules to look for work issues only in these two sections, eliminating much of the noise and many false positives.

The work issue rules use a two-step classification process. First, they classify for content type. Almost all C-10s (whether they are “work orders” or “change orders”) have the two “signifying” phrases in them. If those two phrases are present, the software classifies the document as a work order and gives it a “score” of 100, indicating full certainty.

Second, the software classifies for work issues. This is challenging because the explanations tend to be short and use non-distinct terms. For example, the word “utility” or “Verizon” may be the only term present that indicates the nature of a utility issue described in the work order. Simply looking for those terms will return many false positives. Meanwhile, searching for longer terms such as “move gas main” produces limited results (e.g., gas main line moves may be planned from the beginning of the project and not require a work order). Using these general terms is the only option available to classify the documents, so the software uses this strategy but gives these documents a low score, indicating a low confidence level in accuracy. In the pilot, the research team used simple lists of these terms to develop the work issue rules (for example, the list in Figure II-A-6 provides phrases used in the rule that auto-categorizes utility issues).
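The two-step process described above could be sketched as follows. This is an illustration only: the split into "specific" and "general" term lists, the issue scores of 70 and 30, and the function names are assumptions made for the example, while the two section headings and the content type score of 100 come from the text.

```python
# The two "signifying" section headings named in the text.
SECTION_HEADINGS = (
    "Location and Description of Proposed Work",
    "Responsible Charge Engineers Explanation of Necessity for Proposed Work",
)

# Hypothetical split of utility terms; the scores are assumed values.
SPECIFIC_TERMS = ("gas line in conflict", "utility relocation", "relocate gas lines")
GENERAL_TERMS = ("utility", "verizon")
SPECIFIC_SCORE = 70  # assumed: longer, more distinctive phrase
GENERAL_SCORE = 30   # assumed: non-distinct term, low confidence


def scoped_text(text):
    """Return the lowercased text that follows the two headings, or None if
    either heading is missing. Each section runs to the next heading or the end."""
    lowered = text.lower()
    spans = []
    for heading in SECTION_HEADINGS:
        start = lowered.find(heading.lower())
        if start == -1:
            return None
        spans.append((start, start + len(heading)))
    spans.sort()
    parts = []
    for i, (_, body_start) in enumerate(spans):
        body_end = spans[i + 1][0] if i + 1 < len(spans) else len(lowered)
        parts.append(lowered[body_start:body_end])
    return " ".join(parts)


def classify_work_order(text):
    """Step 1: content type from the headings. Step 2: utility issue from scoped terms."""
    body = scoped_text(text)
    if body is None:
        return {"content_type": None, "content_score": 0,
                "utility_issue": False, "issue_score": 0}
    if any(term in body for term in SPECIFIC_TERMS):
        issue_score = SPECIFIC_SCORE
    elif any(term in body for term in GENERAL_TERMS):
        issue_score = GENERAL_SCORE
    else:
        issue_score = 0
    return {"content_type": "Work Order", "content_score": 100,
            "utility_issue": issue_score > 0, "issue_score": issue_score}
```

Restricting the term search to the text under the two headings is what keeps a passing mention of "utility" elsewhere in the file from triggering the issue tag.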
In a full development, the rules would be generalized further based on analysis of word combination patterns. This process was applied to daily work reports using a similar approach, expanding work order issue categorizations to issues or topics addressed in daily work reports.

abandoned gas line; adjustment due to utilities; conflict with existing utilities; Dominion Power; existing pipes replaced; existing utilities; existing Verizon; fire hydrant; gas line; gas line in conflict; gas line in the way; gas main; gas main in conflict; install new manholes; new manholes; old gas lines; power company; power lines; relocate gas lines; relocated utilities; relocating utilities; sewer line; sewer main; streetlight poles; streetlight relocation; telecommunication duct; telephone cable; telephone lines; utilities adjustment; utilities conflict; utility delay; utility relocation; utility situation; utility work; Verizon; water service lines; waterline alignment; waterline placement; waterline relocation; waterline system

Figure II-A-6. Utility issue terms.

Rule Refinement

The development of these categorization rules requires a number of testing cycles to refine the rules for greater accuracy. Normally, the process started by finding terms that would correctly identify as many of the target documents as possible (recall). This was followed by testing and refining the rules to reduce the number of false positives (precision). For example, applying this approach to content type

classification provided an indication that overly broad rules were capturing casual references to work orders in addition to work orders themselves.

Another common technique starts with a small set of documents and achieves maximum recall and precision, then tests the rules against new and larger sets of documents. This process can continue almost indefinitely, but since the payoff decreases each time, it is necessary to set an acceptable level of accuracy. For a pilot, this level is normally somewhat lower than for full development. The research team used this technique to develop rules for identifying work issues, using a subset of the total collection of documents to develop the rules. This led to the conclusion that the initial attempt to compensate for non-distinct terms was too restrictive, limiting the ability to identify issues in a larger document set.

Full development beyond what was done for the pilot would entail generalizing the rules to other facets besides work issue, applying the rules to more content (which might mean additional development), and aiming for “production level” accuracy.

A full set of rules applied in the pilot is included in Annex 1. This includes rules for each of the facets specified in Table II-A-2.

Incorporating Structured Data

The rules locating project identifiers allowed the search tool to incorporate structured project information from the project list. As described in the discussion of “synonyms,” the research team linked the UPC and contract ID to project numbers through this list, so that a document containing any of the three project identifiers is tagged with a project number. The team also looked for a shortened version of each project number. Identified as a “project ID short” in the ontology, this identifier finds project numbers without the leading (FO) or (NFO) of the project.
It also drops the characters following the comma in project numbers, as these indicate different phases of the same project. This approach did find more projects, but the research team adds a caution to this approach: it would require additional testing before a full implementation to ensure that it is finding projects correctly. Once the project number is found using any of these identifiers, it is then used as the lookup value to tag the document with project attributes, such as District, Routes, Contract Award Amount, Type of Work, and Road System. This capability results in a more powerful search tool that combines use of structured data resources (e.g., master project data) with text search capabilities.
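The identifier lookup described above can be sketched as a small index over the project list. Everything concrete here is hypothetical: the project number format, the UPC and contract ID values, the attribute names, and the function names are invented for illustration, and the plain substring matching stands in for whatever matching the pilot rules actually used.

```python
import re

# Hypothetical master project list; every value is invented for illustration.
PROJECT_LIST = [
    {
        "project_number": "(NFO)0066-029-123,C501",
        "upc": "12345",
        "contract_id": "C0001234",
        "district": "Hampton Roads",
        "road_system": "Interstate",
    },
]


def short_id(project_number):
    """Drop a leading (FO) or (NFO) and everything after the first comma."""
    without_prefix = re.sub(r"^\((?:FO|NFO)\)", "", project_number.strip())
    return without_prefix.split(",", 1)[0]


def build_index(projects):
    """Map each identifier (number, short ID, UPC, contract ID) to its project."""
    index = {}
    for project in projects:
        keys = (project["project_number"], short_id(project["project_number"]),
                project["upc"], project["contract_id"])
        for key in keys:
            index[key.lower()] = project
    return index


def tag_document(text, index):
    """Tag a document with project attributes if any known identifier appears in it."""
    lowered = text.lower()
    for key, project in index.items():
        if key in lowered:
            return {name: project[name]
                    for name in ("project_number", "district", "road_system")}
    return None
```

Short identifiers such as a five-digit UPC can match unrelated numbers under substring matching, which is one reason the caution about further testing before a full implementation applies to a sketch like this as well.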

Faceted Search Design

The research team built a simple search interface, with facets on the left and search results in the middle of the screen. The faceted search design allows users to explore the content and refine the search results by filtering through selected criteria. Annex 2 includes example search scenarios with screenshots of the faceted search design. The challenge in building a faceted search capability is in generating enough metadata to support each of the facets; however, the rule development described above allowed the research team to do this by auto-categorizing documents with metadata.

Figure II-A-7 demonstrates the search tool’s use of facets in an example based on a search for documents in the Hampton Roads District. Upon search submittal, the list of facets appears to allow the users to further filter the results. For example, a user searching for daily work reports for projects in the Hampton Roads District would be able to select “Daily work report” under the “Content Type” facet, and the original set of 1,224 Hampton Roads documents would be further limited to the 197 Hampton Roads District daily work reports. The user can sequentially select facet criteria to allow for combinations of criteria across facets and further limit the results.

The faceted search design allows users to select a combination of filters to more quickly find information in response to user business questions and search needs. It also allows users to find target documents by starting with what they know (e.g., that the project involved an Interstate road system or a particular type of work). In this way, a rich set of facets can support a variety of users who start with different knowledge.

Users can begin a search in two ways: with a free text search or with a taxonomy-driven search. As users begin typing, the system automatically suggests any terms that match in the taxonomy.
This type-ahead feature displays terms that match the letters a user types, with the closest matches displaying first in the list. For example, if a user types “Ut” into the search box, the search tool offers suggestions for the “UTIL” work order category, the “Utilities Issue” work issue, and the “Mitchell Utilities LLC” manufacturer, among others. This option allows users to search for documents about a term in the taxonomy instead of documents that only mention a term. For example, a file (such as an email) might mention a daily work report but not contain one. A simple text search for “daily work report” would return this file as a result, while a taxonomy-driven search would return only files that actually contain daily work reports (because the categorization rules will only tag these documents as daily work reports).

Figure II-A-7. Facets within example search.
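The facet filtering and type-ahead behavior described in this section can be sketched in a few lines. This is not how the FAST-based pilot tool is implemented: the document records, the facet names, and the ranking of prefix matches ahead of other matches are assumptions made for the example.

```python
from collections import Counter

# Hypothetical tagged documents; in the pilot these tags come from the rules.
DOCS = [
    {"id": 1, "district": "Hampton Roads", "content_type": "Daily work report"},
    {"id": 2, "district": "Hampton Roads", "content_type": "Work order"},
    {"id": 3, "district": "Richmond", "content_type": "Daily work report"},
]


def filter_docs(docs, **selected):
    """Keep documents whose facet values match every selected criterion."""
    return [d for d in docs
            if all(d.get(facet) == value for facet, value in selected.items())]


def facet_counts(docs, facet):
    """Count documents per facet value, as shown beside each facet entry."""
    return Counter(d[facet] for d in docs if facet in d)


def suggest(prefix, taxonomy, limit=5):
    """Type-ahead: terms starting with the typed letters first, then terms
    that merely contain them (an assumed notion of 'closest match')."""
    typed = prefix.lower()
    starts = [t for t in taxonomy if t.lower().startswith(typed)]
    contains = [t for t in taxonomy if typed in t.lower() and t not in starts]
    return (starts + contains)[:limit]
```

Selecting a second facet value is simply another call to `filter_docs` with one more keyword argument, which corresponds to the sequential narrowing from all Hampton Roads documents to Hampton Roads daily work reports.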

The pilot solution also includes a capability to browse facets at the outset instead of needing to type in a search term to start a search. This provides users with a starting point for the search if they are unsure of what search term to use. For example, a user may have an interest in searching by a particular type of work, but may want to examine the types of work listed under this facet before choosing what term to use. This discovery works in a similar way to the visual ontology, as discussed in the Semantic Model Development section.

The faceted tool incorporates both structured and unstructured data. It allows users to search or filter by the structured information contained in the project list or the unstructured information contained in the documents. For example, a user could search for utility issues on projects greater than $1,000,000 using:

1) The rules built around unstructured information to develop the work issue facet and automatically identify work issues in free-form text;
2) The incorporation of structured information for contract amount based on the project identifier; and
3) The identification of a project identifier that bridges the unstructured and structured information.

The pilot version of the faceted search tool does not currently incorporate logic related to the location of information within a multi-faceted search. For example, a user intending to find documents containing a specified material supplied by a specified manufacturer could filter document results to those containing the specified material, then filter a second time to those containing the specified supplier. The ensuing results, however, will be limited to documents that include the specified material and the specified supplier.
Current rules do not guarantee a relationship between these two facets – it is possible that the manufacturer appears in the document in reference to an entirely different material, in a different location within the document. This type of logic could be a future extension for the search tool. The research team also considered adding a Best Bets option to select specific documents to automatically return at the top of the results list when a user types a particular term. The pilot does not include this feature because the research team did not have enough information about specific documents that users might want to search for from the interviews; however, this could also serve as a future extension for the search tool. Another future extension could include a user personalization feature. This feature could only expose certain sets of facets for particular users. This is something that could greatly enhance the usefulness of search. For the pilot, all of the content was loaded into a single repository. However, current commercially available search tools support searches across repositories. These capabilities require custom configuration to account for different file formats, access methods, and security protocols. Additionally, although the pilot design used the Microsoft FAST search engine, other search engines could be used. For example, Google Search, Apache Solr, and others could incorporate similar concepts into a faceted search design. Basic features of the Pilot search tool were recorded in a video file, which can be accessed at: http://sites.spypondpartners.com/nchrp2097/Solution%20Demonstration.mp4

A.5: Test and Evaluation

Rule-Based vs. Plain Vanilla FAST Search

One component of the pilot evaluation differentiates between a rule-based search and a plain vanilla FAST search. The rule-based search includes the rules and automation specified by the research team, while the vanilla search includes the same body of content but without any built-in rules, i.e., similar to the status-quo “out-of-the-box” search environment. The vanilla search was set up by porting the pilot content into a separate instance of the content management system without the ontology and developed facets. There are a limited number of facets available with the standard setup, including document type (Word, PDF, etc.), Author (who last touched the file), Date (file date), and Company (a generic set of company names). The evaluation compares the two environments.

Testing and Subjective Evaluation

The research team collected baseline metrics on success rate, using a selected set of search test cases that reflect the business questions and search needs collected through the VDOT interviews. The baseline metrics for these test cases were collected using the vanilla search, with post-improvement values collected through the rule-based search. Evaluation metrics include:

• Precision of results
• Recall of results
• Time spent to compile results
• Qualitative user feedback

Metrics specifying the precision of results were collected to identify if the rule-based search resulted in documents more or less relevant to the user than the vanilla search. These metrics include:

• The total number of results, which is used both to calculate the percentage of total results that were relevant and to identify test cases where a search appeared to have “too many” results, i.e., cases unlikely to have high precision.
• The position of the first relevant document, which signifies whether the earliest results (sorted by relevancy) are relevant to the test case.
A user searching for an individual document would prefer to have that document appear earlier in the results; similarly, a user searching for a set of documents would prefer to find relevant documents immediately rather than after many results. Relevancy was defined individually for each test case.
• The number of relevant documents in the top 20 results assesses how many of the first set of documents were relevant to the user. In cases where the search returned fewer than 20 results, this metric considers all results (e.g., the number of relevant documents in the top 15 results in a search that only returns 15 results).
• The percentage of documents in the top 20 results that are relevant provides a key precision comparison metric. The percentage is calculated by dividing the number of relevant documents identified by 20 (or by the total number of results if fewer than 20).

• The number of documents needed to find 10 relevant results (or in some scenarios, a number less than 10) allows the research team to compare the vanilla and rule-based search capabilities to find a set of documents.

The research team also collected metrics related to the recall of results to assess which search method is more capable of returning a full set of documents desired by the user. These metrics include:

• The number of known relevant documents, which is used to calculate the recall percentage, and is evaluated based on the content collection structure. The majority of the content was downloaded by project, so individual project documents are readily identified outside of the search environment. The remaining documents were not downloaded by project, but were each examined individually to determine the associated project. Because of this structure, recall metrics focus on project identifiers. To evaluate recall of other facets (e.g., work issues), the research team would need to read and manually tag each of the documents in the content set, a time-consuming process.
• The number of relevant documents in the top 30 results assesses how many of the first set of results are relevant to the user. In cases where the search returns fewer than 30 results, this metric considers all results (e.g., the number of relevant documents in the top 15 results for a search that only returns 15 results). The choice to examine the top 30 results was based on time constraints, and the concept that a well-performing search should find the relevant results early in the result set.
• The recall in the top 30 results divides the number of relevant documents in the top 30 results by the number of known relevant documents. A higher number represents higher recall, i.e., the search finds a greater percentage of the total known results.

The research team also applied other evaluation metrics in specific cases when appropriate.
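The precision and recall metrics defined above reduce to a few short calculations over a ranked result list. The sketch below is illustrative: the function names are hypothetical, results are assumed to be document identifiers in ranked order, and `relevant` is the set of identifiers judged relevant for a test case.

```python
def precision_at_k(results, relevant, k=20):
    """Share of the top k results that are relevant; uses all results if fewer than k."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall_at_k(results, relevant, k=30):
    """Relevant documents in the top k divided by the number of known relevant documents."""
    if not relevant:
        return 0.0
    return sum(1 for doc in results[:k] if doc in relevant) / len(relevant)


def first_relevant_position(results, relevant):
    """1-based rank of the first relevant result, or None if there is none."""
    for position, doc in enumerate(results, start=1):
        if doc in relevant:
            return position
    return None


def results_needed(results, relevant, target=10):
    """Number of results a user must read to reach `target` relevant documents."""
    found = 0
    for position, doc in enumerate(results, start=1):
        if doc in relevant:
            found += 1
            if found == target:
                return position
    return None
```

Running the same functions on the vanilla and rule-based result lists for one test case gives directly comparable numbers for that case.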
For example, when searching for a specific document, the research team evaluated the result position of the document to demonstrate the relative ease with which a user could find the document (in a sense, as a proxy for the time that it would take a reader to find the document). The full set of evaluation test cases and metrics is provided in Annex 3.

General findings from the pilot evaluation are as follows:

• The combination of structured (i.e., linked through the project identifier) and unstructured information allows the user to conduct rule-based searches that would not be possible in a vanilla search by using related information from outside of the documents.
• The rule-based search often results in higher precision of specified searches. In some cases, this comes at the cost of lower recall. This is particularly true for searches related to the project number (which included combinations of letters and numbers), as it appeared that the FAST software more easily distinguished “0”s from “O”s and “5”s from “e”s in documents with poor scan quality, possibly due to differences in the capabilities of reading OCRed text in each piece of software.
• Including rules to limit the scope of work issue identification within the document based on the standard section headings resulted in higher precision but lower recall. A more complete application could explore tweaking the rule to capture some of the variants present in the

section heading language, which could increase recall. Similarly, a more complete application could examine if including rules for finding work issues in additional parts of the document would allow for increased recall while maintaining a high level of precision.
• The use of synonyms in the rule-based search provides greater precision and recall in a number of searches. For example, a rule-based search is able to find documents that use one of three project identifiers (project numbers, contract IDs, or UPCs) even when users search for another. If a user searches for project number 867-4305 and a document uses the contract ID 1986 but not project number 867-4305, the search results will include the document. Synonym matching behind the scenes means that users need to know only one important piece of information to find the content they need.
• The vanilla search is adequate for searches of terms that do not have many synonyms in common use or that do not appear in different contexts. For example, the vanilla search would have high precision and recall of results on a search for Source of Materials forms, which have fairly constant language. However, terms that apply to multiple contexts are better suited to a rule-based search. For example, a vanilla search for a specific route may return results using the number in a different context (e.g., as part of an address). The rule-based search is able to provide some structure to these searches to improve result precision.
• The rule-based search interface can accommodate misspellings by suggesting search terms from the ontology, resulting in greater search recall and precision. Similarly, the rule-based search interface can suggest terms to further filter within facets, allowing a user to add specificity to a search through name recognition.
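The last finding, suggesting ontology terms to a user who misspells a query, can be illustrated with fuzzy string matching. This is an assumption about one way to obtain such suggestions, not a description of the pilot interface: the sketch uses Python's standard `difflib` module, and the function name, the sample terms, and the 0.6 cutoff are illustrative choices.

```python
import difflib


def suggest_terms(query, ontology_terms, n=3, cutoff=0.6):
    """Return up to n ontology terms that closely resemble the typed query,
    best match first. Matching is case-insensitive."""
    lookup = {term.lower(): term for term in ontology_terms}
    matches = difflib.get_close_matches(query.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[m] for m in matches]


# Example: a misspelled work issue still leads to the intended term.
WORK_ISSUES = ["Utilities Issue", "Weather Issue", "Drainage Issue"]
suggestions = suggest_terms("Utilites Issue", WORK_ISSUES)
```

Because the suggestions come from the ontology rather than from the document text, selecting one leads to documents tagged with that term rather than documents that merely share the misspelling.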
The test cases in Annex 3 are organized into the four information needs categories described in the “Findability Needs” section. A brief summary of results applicable to each of these categories is provided below:

1. Find a Single Known Document for a Project (e.g., an estimate) Using a Variety of Search Criteria. The rule-based search can accomplish this effectively, but for some types of searches a vanilla search can also do so if specified at a similar level. This is particularly true for simple searches with a few search terms. The main advantages in using the rule-based search here derive from general findings discussed above.

2. Find/Review All Documents for a Project (e.g., for a FOIA Request). The rule-based search is able to find documents containing the project number, contract ID, or UPC, regardless of which is specified in the search. This flexibility provides an advantage over the vanilla search, which requires that the project identifier specified exist in the document. The degree to which this is helpful depends on the search. For example, searching across all project documents would favor the rule-based approach because of the variety of project identifiers used in different project documents; searching for daily work reports containing a specific contract ID could return similar results in both a rule-based and vanilla search because the contract ID is so prevalent in daily work reports.

3. Search Across Projects - Find Projects with Item, Material, Construction Technique. Again, a major advantage here of a rule-based search derives from the use of synonyms. For example, a rule-based search for a pay item by number would also find instances where the pay item name appears without a number. Meanwhile, the structure of the Source of Materials forms results in high levels of precision for both vanilla and rule-based searches related to materials,

with the main advantages in using the rule-based search derived from general findings discussed above.

4. Research Reasons for Delays and Changes. The ability for the rule-based search to define work issues allows a user to specify a singular work issue instead of multiple terms (e.g., the “utility work issue” is built on over 40 phrases). This categorization allows a user to more easily search for documents with a specified issue. The rule-based search limits the number of results that a user must review to research reasons for delays and changes, with high precision for these results. Additionally, the rule-based search provides the user with the ability to search for documents with a VDOT-specified work order category with higher precision than through the vanilla search (due to the frequent use of the work order category terms such as “VDOT” and “ADD” in a different context).

Virginia DOT User Evaluation

The research team demonstrated the pilot solution at a focus group at the Virginia DOT on February 9, 2016. Focus group participants included:

• Telecom Coordinator, Business Owner for InsideVDOT (VDOT’s intranet)
• District Technical Resource Manager
• Knowledge Management Office Director
• Quality Specialist for Internet and Extranet
• Information and Knowledge Management Program Coordinator
• District Technology Resource Manager
• Project Manager for the Project Document Management System Project
• Construction Project Controls Lead
• District Construction Administrator

Feedback from focus group participants focused mainly on potential implementation of a text analytics solution. Users noted that implementation would require rules, governance, and buy-in across the agency. They would need to make a business case (e.g., staff time or money savings) in order to receive that buy-in and funding.
One participant offered a potential business case: this type of findability solution could decrease storage costs (by reducing duplicative information) and save millions of dollars. This type of argument would need to be further demonstrated and reinforced.

Additionally, focus group participants discussed the agency roles that would need to accompany an implementation effort. For example, they suggested a “reviewer” role for checking metadata. This effort would also require agency staff with the skills to build the ontology and rules. VDOT does have a professional librarian who has this skill set.

Focus group participants were also interested in the technical implementation. They mentioned that any text analytics solution would need to be integrated with VDOT’s current content management/collaboration solutions as part of the document intake process. Furthermore, implementation of a federated search tool would require work to identify and work through variations in permissions. The indexer could be granted the necessary access in order to do the auto-classification, but user access restrictions on the content itself would need to be enforced.

Transferability Analysis

The final component of the pilot evaluation considers the transferability to other agencies of the findability solution and development process. While each findability solution should be driven by an information architecture that is tailored to the agency’s situation, the general ideas behind the pilot setup contained in this report are transferable, including the logic behind the rules in Annex 1. Using a similar approach in any software will likely produce similar results. WSDOT provided information and feedback for purposes of this transferability analysis.

Transferability Subjective Testing

To test the transferability of classification rules, WSDOT provided content similar to the VDOT content: 100 inspector’s daily work reports, 134 change orders, and 112 requests for approval of material. This content contains similar language and form fields, with similar content quality issues (e.g., poor scans, variation of usage and forms). However, these documents were generated from applications, so they are more consistent from document to document.

The research team evaluated the effort needed to convert rules developed for VDOT documents to apply to the WSDOT documents in order to provide an estimate of the transferability of the rules developed to other agencies. To do this, the team directly applied the ontology built for VDOT to the WSDOT content using Smartlogic’s Classification Server.

Table II-A-4 contains transferability results of the rules to classify content type. These metrics examine the recall of the classification, comparing the documents that were correctly classified as a given content type to the known total number of documents of that content type. The results demonstrate that the current classification rules for content type as work order or inspector’s report could be directly applied to WSDOT without any changes.
The naming difference of the materials form would require only the simple addition of the WSDOT term to the rule in order to correctly classify those documents at WSDOT.

Table II-A-4. Content type transferability results: recall.

Content Type | Total Documents | Correctly Classified | Recall
Work Orders/Change Orders | 134 (including 5 unreadable) | 125 | 93%
Daily Work Reports/Inspector’s Daily Reports | 100 (including 2 unreadable) | 98 | 98%
Source of Materials Forms/Request for Approval of Material | 112 | 0 | 0%

Table II-A-5 presents transferability results of the rules to classify work issues. These metrics examine the precision of the classification, comparing the documents that were correctly classified to the total number of documents that were classified for each work issue. Based on these results, the current rules for a drainage issue and a weather issue could be applied directly to WSDOT; however, the rules for a utility issue tend to capture some of the materials listed in the change order. Improving the precision of these rules would require narrowing the language and adjusting the scope of the rules to avoid that content.

Table II-A-5. Work issue transferability results: precision.

Work Issue | Total Documents Classified | Correctly Classified | Precision
Drainage Issue | 19 | 19 | 100%
Weather Issue | 13 | 11 | 85%
Utility Issue | 16 | 7 | 44%

To customize the pilot content and process, WSDOT could follow the steps shown in Figure II-A-8. A similar approach also would apply to other agencies choosing to customize this approach. The successful classification of most of WSDOT’s content suggests, however, that the information architecture developed for VDOT is usable by other agencies even though the ontology will need customization. The expected effort to customize other facets for WSDOT would vary. For example:

- Classifying manufacturers and suppliers, contractors, materials, equipment, district, and jurisdiction could be done within hours for each facet by extracting data for each and replacing the existing rules with the extracted data lists. Some of these items (e.g., materials) may also be able to build on the existing lists.
- Generating project information (e.g., type of work, award amount, etc.) and using this information to link to project identifiers within documents would take moderate effort (i.e., multiple days).

[Figure II-A-8. Customization process. Flowchart from Pilot Content to Customized Content. Box labels: Use Data from Current Pilot System(s) (Capture Spelling, Synonyms, Model Relationships); Supplement with Lists not in Database (Text Mining, Existing Resources); Customize with DOT-Specific Terms; Create Project Profiles; Develop Text Analytics Rules (Adjust Indicator Text); Model Ontology Relationships (Project - District, Route, etc.); Round of Testing and Refinement.]

Washington State DOT User Evaluation

WSDOT staff also provided input through a focus group discussion on March 8, 2016. Participants provided feedback on how the features of the rule-based search applied to the WSDOT context and uses. Participants represented the following functions:

Knowledge Management Records Management Risk Management Communications (website) Construction Materials Research Data Management Information Technology Security Library Services Asset Management

The WSDOT focus group participants noted a number of existing information management “pain points” related to both technical and process challenges. These pain points included:

- Determining which document is authoritative.
- Fragmentation of information repositories making documents difficult to find.
- Apparent redundancy of documents that actually provide a valuable historical record.
- Different formatting in documents received from subcontractors, including handwritten notes, which leads to difficulty in determining what material was used or what was installed on a project.
- Difficulty in finding information on assets that have been replaced.
- The use of email records that follow employees instead of remaining connected to the employee position.
- The use of multiple project identifiers.
- Internal and external dissatisfaction with search capabilities.

Focus group participants raised a number of questions and discussion points about the transferability of the pilot. As in the focus groups at VDOT, these discussions focused mainly on the potential implementation of the pilot search tool or a similar tool in the WSDOT setting. Participants were interested in the level of effort involved in this process, including the initial effort required to set up the environment, build the taxonomy, and tag documents. Beyond the initial effort, participants were interested in the effort required to maintain everything, including administration, validation, and updating of the ontology and search capabilities.
Participants noted that because taking the next steps with this process requires both time to lead the effort and additional financial resources, it would be useful to consider where findability efforts would yield the greatest payoff. Much of the conversation also focused on the difficulty of applying this tool to a complex DOT information landscape. Participants were interested in the search tool’s ability to search both within databases and across repositories, and had questions about how to build the search tool across different kinds of servers, permissions, and access requirements. The focus group participants also discussed the capabilities of the text analytics tool used for the pilot, the availability of other, similar tools, and the mechanics of developing and maintaining a taxonomy over time. Finally, one participant noted that an ideal future tool would be able to integrate text analytics capabilities with a geospatial front end (i.e., allow users to conduct a faceted search for documents beginning with a map selection).

Annex 1 Pilot Classification Rule Descriptions

The following subsections provide descriptions of the rules used to classify documents. The actual rules used in the pilot were built using a programming language (as demonstrated in the example rule in Figure II-A-5). Each rule includes weightings of different factors that the research team identified while testing content for successful classification. Those factors are set to count as described below, but the algorithm in the software adjusts weights up and down.

Content Type

Daily Work Report

Find one of the following phrases:
1) “Daily report of construction”
2) “Inspector’s daily report”
3) “project diary – daily work report”
4) “PM Diary”
Weight any of these phrases as 0.5.

Related to Work Order

Find the phrase “work order” combined with one of the following:
1) “approval”
2) “proposed”
3) Other related ideas (e.g., asking for a signature, or language in an email chain negotiating a work order or signifying that a work order is coming)
This series of phrases is meant to distinguish work orders from emails and documents that refer to work orders. Each of these phrases is weighted 0.5. If more than one of these phrases is present, the document is likely to have a higher relevancy score as the weight of phrases is cumulative in Content Types.

Source of Materials Form

Find one of the following phrases:
1) “VIRGINIA DEPARTMENT OF TRANSPORTATION SOURCE OF MATERIALS” (must be all capital letters and an exact phrase)
2) “SOURCE OF MATERIALS” (must be all capital letters)
3) “C-25”

The first listed phrase is on all C-25 forms. This may not catch Source of Materials that are not on the official form, but it screens out references to Source of Materials in emails and daily work reports. If the second phrase is caught instead, a lesser weighting (of 0.25) is used to contribute to the classification. This helps account for references in other material where writers have used all capital letters for the entire document.

Exclude the following combination from contributing to the classification:
1) “source of materials” in conjunction with any tense of the verb “to be” has zero weight, since this combination generally referred to a Source of Materials instead of denoting a C-25.
The first phrase is set to “score if found,” meaning that it is weighted at 1.0. The second and third phrases are weighted at 0.25.

Work Order

Find one of the following:
1) The phrases “location and description of proposed work” and “responsible charge engineers explanation of necessity for proposed work”
2) The phrase “Change Order” (must be capitalized and an exact phrase).
Both of these phrases are weighted at 1.0.

Contract Award Amount

Find the following:
1) A project identifier, as defined in the “Project” rules
If the project identifier is found, the contract award amount is added based on the project information provided in the project list. Contract award amounts have no weightings as their scores are inherited from the project rules.

Contractor

Find one of the following:
1) A single mention of the contractor
2) A single mention of the contractor with a variation of the name that drops a suffix.
In the second case, examples of the dropped suffix include but are not limited to: “Inc.”, “LLC”, “Corp.”, “Corporation”, “Company”, and “Co.” For example, the rules would find references to “Branscome Inc.” or “Branscome”. Each contractor is weighted at 1.0.
The list of values for contractors is defined in a separate list.
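As an illustration only, the contractor rule above could be approximated in Python as follows. This is a minimal sketch, not the Smartlogic rule used in the pilot: the function names, the case-sensitive whole-word matching, and the handling of a comma before the suffix are assumptions, and the suffix pattern contains only the examples named above.

```python
import re

# Suffixes named in the rule description; the pilot's full list is not given.
SUFFIX_PATTERN = re.compile(r"[,\s]+(?:Inc\.|LLC|Corp\.|Corporation|Company|Co\.)$")


def name_variants(name):
    """Return the listed name plus a variant with a trailing suffix dropped."""
    variants = {name}
    stripped = SUFFIX_PATTERN.sub("", name)
    if stripped:
        variants.add(stripped)
    return variants


def find_contractors(text, contractor_list):
    """Map each contractor mentioned at least once in text to a weight of 1.0."""
    found = {}
    for name in contractor_list:
        for variant in name_variants(name):
            # Assumption: whole-word, case-sensitive matching.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found[name] = 1.0  # a single mention is enough
                break
    return found
```

The same approach would apply to the manufacturer and supplier rule, which uses the same suffix variations (e.g., “BMG Metals, Inc.” or “BMG Metals”).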

District

Find one of the following:
1) A project identifier, as defined in the “Project” rules
2) Several mentions of a district
If the project identifier is found, the district is added based on the project information provided in the project list. In most instances, districts have no weightings as their scores are inherited from the project rules. Districts inherit their scores from project identifiers because documents often do not refer to the district by name, but it can be important for users to know the district associated with a project. However, when found in documents, districts are weighted as follows: the first mention receives a weight of 0.35. Subsequent weightings are adjusted algorithmically so that it takes four mentions of a specific district to score a base relevancy of 0.48. The list of values for districts is defined in a separate list.

Equipment

Find:
1) A single mention of the equipment
Each piece of equipment is weighted at 1.0. The list of values for equipment is defined in a separate list.

Jurisdiction

Find one of the following:
1) A single mention of a city
2) A single mention of a county
Each city and county phrase is weighted at 1.0. The values for cities and counties are defined in separate lists.

Manufacturers and Suppliers

Find one of the following:
1) A single mention of the manufacturer/supplier
2) A single mention of the manufacturer/supplier with a variation of the name that drops a suffix.
In the second case, examples of the dropped suffix include but are not limited to: “Inc.”, “LLC”, “Corp.”, “Corporation”, “Company”, and “Co.” For example, the rules would find references to “BMG Metals, Inc.” or “BMG Metals”. Each manufacturer and supplier is weighted at 1.0. The list of values for manufacturers and suppliers is defined in a separate list.

Materials

Find:
1) A single mention of the material
Each material is weighted at 1.0. The list of values for materials is defined in a separate list.

Pay Items

Find one of the following:
1) A single mention of the pay item name
2) A single mention of the pay item number
Each pay item is weighted at 1.0. The list of values for pay items is defined in a separate list. Either of these two items will link to the document so that users will find all documents for a pay item when searching with either pay item identifier, even if the specified identifier is not in the document.

Project

Find one of the following:
1) A project number with or without the leading “(FO)” or “(NFO)”
2) A project number without any number-letter combination that follows the final comma
3) A UPC number preceded by “UPC”
4) A contract ID
As an example of the project number criteria (1 and 2 above), the rules would search for “0015-030-117” and “(FO)0015-030-117” to find project number “(FO)0015-030-117,C501.” Any of these four items will link to the document so that users will find all documents for a project when searching with any project identifier, even if the specified identifier is not in the document. Each project number, contract ID, and UPC is weighted at 1.0.

Road System

Find the following:
1) A project identifier, as defined in the “Project” rules
If the project identifier is found, the road system is added based on the project information provided in the project list. Road systems have no weightings as their scores are inherited from the project rules.

Route

Find one of the following:
1) A project identifier, as defined in the “Project” rules
2) Several mentions of a route preceded by “RTE”
3) Several mentions of a route preceded by “RT”
4) Several mentions of a route preceded by “ROUTE”
5) Several mentions of a route preceded by “U.S.”
6) Several mentions of a route preceded by “SR”
7) Several mentions of a route preceded by “State Route”
If the project identifier is found, the route is added based on the project information provided in the project list. In most instances, routes have no weightings as their scores are inherited from the project rules. Routes inherit their scores from project identifiers because documents often do not refer to the route by name, but it can be important for users to know the route associated with a project. However, when found in documents, routes are weighted as follows: the first mention receives a weight of 0.35. Subsequent weightings are adjusted algorithmically so that it takes four mentions of a specific route to score a base relevancy of 0.48. Any of the route identifiers will link to the document so that users will find all documents for a route when searching with any route identifier, even if the specified identifier is not in the document (e.g., a user searching for “RTE 66” will also find documents containing “RT 66”, “ROUTE 66”, etc.). The list of values for routes is defined in a separate list.

Type of Work

Find the following:
1) A project identifier, as defined in the “Project” rules
If the project identifier is found, the type of work is added based on the project information provided in the project list. Type of work has no weighting as its score is inherited from the project rules.

Work Issues

The work issue rules have a complex scoring system. First, a document is identified as one of the following content types:
1) Work Order
2) Daily Work Report
3) Related to Work Order
If the document classifies as one of these content types, it is given an initial score of 0.35 or 0.40. Otherwise, it is given an initial score of 0. The document is then scanned for a list of phrases applicable to the specified work issue. The rules search for these phrases in three ways:
1) An exact phrase (a “phrase score”)
2) Within two words of one another (a “near score”)
3) Within a sentence (a “sentence score”)

For example, the phrase “abandoned gas line” could have the following matches:
1) A phrase score: “The design calls to remove the abandoned gas line”
2) A near score: “The gas line was abandoned”
3) A sentence score: “The gas utility abandoned several lines, necessitating additional work.”
Based on these matches, the document score will increase. For example, a document that is one of the three specified content types and contains one work issue phrase would score a 0.48. If it contains two work issue phrases, it would score a 0.55. A document must score 0.48 or above to classify for a work issue. If the rules find a work issue term outside of the three specified content types, the document receives a score of 0.11. If the rules find multiple work issue phrases, the score will increase with each instance (with the possibility that a document that contains many references to a work issue will score 0.48 or above and classify as a work issue even if it does not meet the content type criteria). This scoring process applies to each of the work issues. The following subsections provide further detail on the phrases included to classify each work issue.

Drainage Issue

Use the work issue criteria defined in the “Work Issue” section introduction, combined with the following phrases specific to drainage issues:
1) “add underdrain”
2) “adequate drainage”
3) “cleaning storm drain”
4) “different drainage”
5) “drainage abilities”
6) “drainage alteration”
7) “drainage analysis”
8) “drainage change”
9) “drainage error”
10) “drainage modifications”
11) “drainage problem”
12) “drainage revision”
13) “drainage structures”
14) “draining delay”
15) “erosion control”
16) “erosion problem”
17) “excessive erosion”
18) “modified underdrain”
19) “necessary drainage”
20) “new drainage”
21) “permanent diversion ditch”
22) “planned drop inlet”
23) “positive drainage”
24) “required new wingwalls”
25) “revise drainage”
26) “storm drain installation”
27) “storm drainage”
28) “storm sewer placement”
29) “underdrain installation”
30) “water ponding”

Utility Issue

Use the work issue criteria defined in the “Work Issue” section introduction, combined with the following phrases specific to utility issues:
1) “abandoned gas line”
2) “adjustment due to utilities”
3) “conflict with existing utilities”
4) “Dominion Power”
5) “existing pipes replaced”
6) “existing utilities”
7) “existing Verizon”
8) “fire hydrant”
9) “gas line”
10) “gas line in conflict”
11) “gas line in the way”
12) “gas main”
13) “gas main in conflict”
14) “install new manholes”
15) “new manholes”
16) “old gas lines”
17) “power company”
18) “power lines”
19) “relocate gas lines”
20) “relocated utilities”
21) “relocating utilities”
22) “remove gas lines”
23) “sewer line”
24) “sewer main”
25) “streetlight poles”
26) “streetlight relocation”
27) “telecommunication duct”
28) “telephone cable”
29) “telephone lines”
30) “utilities adjustment”
31) “utilities conflict”
32) “utility delay”
33) “utility relocation”
34) “utility situation”
35) “utility work”
36) “Verizon”
37) “Verizon in conflict”
38) “water service lines”
39) “waterline alignment”
40) “waterline placement”
41) “waterline relocation”
42) “waterline system”
These phrases do not include the term “utility issue” as this phrase is used to identify work order categories. Using it would result in misclassified documents.

Weather Issue

Use the work issue criteria defined in the “Work Issue” section introduction, combined with the following phrases specific to weather issues:
1) “anticipated hurricane”
2) “cold temperatures”
3) “due to rain”
4) “due to showers”
5) “due to weather”
6) “extreme heat”
7) “extreme rains”
8) “flood plain”
9) “heat”
10) “heavy rain”
11) “heavy rainfall”
12) “hot weather”
13) “ponding at road edge”
14) “prolonged curing”
15) “rain delay”
16) “rain events”
17) “rainfall inspection”
18) “severe weather”
19) “shutdown due to rain”
20) “shutdown due to weather”
21) “shutdown during months”
22) “warmer weather”
23) “weather conditions”
24) “weather delay”
25) “weather event”
26) “wet conditions”
27) “wet roadway conditions”
28) “wet weather”
29) “winter shut down”

Work Order Categories

Find all of the following:
1) Identification that the document is a work order, as described in the “Work Order” section
2) A work order category abbreviation within two paragraphs of “Category:”
Limiting the documents to work orders eliminates the possibility of finding a work order category in other document types. The list of values for work order categories is defined in a separate list. Finding a work order category abbreviation includes a positional restriction (i.e., two paragraphs) in order to find work order categories that OCR does not read as being on the same line as “Category:” but that are found near “Category:”. This restricts the search from finding work order category information contained elsewhere in the document (e.g., in a list at the bottom), a variation that could be accounted for in a production environment (provided that the information is typed and not a handwritten “x” in a list box).
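The positional restriction in the work order category rule can be sketched in Python as shown below. This is a hedged illustration rather than the pilot’s rule: the abbreviations “UTL” and “DRN” are hypothetical placeholders (the pilot’s category list is not reproduced in this annex), paragraphs are assumed to be separated by blank lines in the OCR text, and the two-paragraph window is assumed to apply in both directions.

```python
import re

# Hypothetical abbreviations; the pilot's actual category list is defined elsewhere.
CATEGORY_ABBREVIATIONS = ("UTL", "DRN")

SAMPLE_TEXT = (
    "WORK ORDER\n\nCategory:\n\nUTL\n\n"
    "Description of work\n\nOther text\n\nDRN"
)


def find_work_order_categories(text, is_work_order,
                               abbreviations=CATEGORY_ABBREVIATIONS,
                               max_distance=2):
    """Return abbreviations found within max_distance paragraphs of 'Category:'."""
    if not is_work_order:
        # Only documents already classified as work orders are considered.
        return set()
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    found = set()
    for anchor, paragraph in enumerate(paragraphs):
        if "Category:" not in paragraph:
            continue
        start = max(0, anchor - max_distance)
        end = min(len(paragraphs), anchor + max_distance + 1)
        window = "\n".join(paragraphs[start:end])
        for abbreviation in abbreviations:
            if re.search(r"\b" + re.escape(abbreviation) + r"\b", window):
                found.add(abbreviation)
    return found
```

In the sample text, “UTL” sits one paragraph after “Category:” and is returned, while “DRN” at the bottom of the document falls outside the window and is ignored.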

Annex 2 Example Scenarios Using Faceted Search Design

This Annex provides examples of search scenarios. These examples include screenshots of the faceted search design, in order to illustrate the pilot tool that the research team built. A video illustrating these and additional scenarios can be accessed at: http://sites.spypondpartners.com/nchrp2097/Solution%20Demonstration.mp4

Scenario 1: Finding Daily Work Reports for a Project

1. It is possible to search by entering the UPC: “18944” into the search box. The corresponding project number appears, as it is linked through the project list.
2. It is also possible to search by entering the contract number (“R00018944C02”) into the search box. The corresponding project number appears, as it is also linked through the project list. It is not necessary to type the entire number due to the auto-suggest feature.
3. Finally, it is possible to search using the project number.
4. Once selected, clicking on the magnifying glass begins a search and leads to a set of 14 results. A user can further filter these results by any number of facets. In the image below, the user may choose to filter by the “daily work report” content type by clicking on that facet. This further limits the set of results.
5. There are 13 results. This can be further filtered. For example, a user searching for daily work reports within a specific time range could in theory filter using a date range (although the daily work reports in the sample did not have consistent or readable dates, so pilot rules did not work with the date). But to simulate this, the pilot does include the document update date, as demonstrated below. Selecting a group here may further limit the results (e.g., to two results for the earliest document modified date range).
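The identifier linking that Scenario 1 relies on can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the pilot’s implementation: the function names and the dictionary-based project list are hypothetical, lookups are assumed to be case-insensitive, and the single project record uses the identifiers listed for the work order test case in Table II-A-11a of Annex 3. The project number variants follow the “Project” rule in Annex 1.

```python
import re


def project_number_variants(project_number):
    """Return the forms of a project number described in the Project rule."""
    # Drop the number-letter combination after the final comma (e.g., ",C501").
    without_suffix = project_number.rsplit(",", 1)[0]
    # Drop a leading "(FO)" or "(NFO)".
    without_prefix = re.sub(r"^\((?:FO|NFO)\)", "", without_suffix)
    return {project_number, without_suffix, without_prefix}


def build_identifier_index(project_list):
    """Map every known identifier to its project number (assumed structure)."""
    index = {}
    for project in project_list:
        keys = project_number_variants(project["project_number"])
        keys.add("UPC " + project["upc"])  # UPC numbers are preceded by "UPC"
        keys.update(project["contract_ids"])
        for key in keys:
            index[key.upper()] = project["project_number"]
    return index


# One record, using the identifiers shown in Table II-A-11a.
PROJECT_LIST = [
    {
        "project_number": "(NFO)0615-047-169",
        "upc": "50057",
        "contract_ids": ["U00050057C01", "C00050057C01"],
    }
]
INDEX = build_identifier_index(PROJECT_LIST)


def resolve(identifier):
    """Return the linked project number, or None if the identifier is unknown."""
    return INDEX.get(identifier.strip().upper())
```

With such an index, a search on any one identifier can be expanded to all identifiers for the same project, which is what allows documents to be found even when they contain a different identifier than the one entered.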

Scenario 2: Finding Assets Supplied by a Specific Manufacturer

1. This example will search for content from a specific sign manufacturer, Korman Signs. Since suppliers and manufacturers are in the ontology, a user can search for the supplier directly by entering the name in the search box, and selecting “Korman Signs… in Manufacturers and Suppliers.”
2. This search results in 64 documents. The user can then further filter using the facet lists. For example, the user can filter the content by district: selecting “Hampton Roads District” would result in 16 documents.
3. This can then be further filtered by any number of criteria to further refine the results.

Annex 3 Evaluation Metrics

The following subsections present test case evaluations for the four different categories of information access needs identified in the VDOT interviews. Each test case description is accompanied by a definition of relevancy and a set of steps for both the vanilla and rule-based searches.

Find a Single Known Document for a Project (e.g., an Estimate) Using a Variety of Search Criteria

The first set of test cases compares how well the vanilla and rule-based searches could find a single document for a project using various search criteria. In the first example of this (Tables II-A-6a and II-A-6b), the research team searched for a daily work report on a specific date, route, and district, with an unknown project. This search assumes that the user would recognize the description when reading the document (i.e., the document file name is used in the relevancy criteria under the assumption that the user would recognize the text when reading this document). In this test case, the rule-based search is able to take a complex set of information and narrow it down to a manageable number of documents for the user to read through. Specifying the rule-based search takes more steps than with the vanilla search, but this time is inconsequential in comparison to reading through the larger volume of content returned by the vanilla search.

Table II-A-6a. Relevancy Criteria and Search Steps: Daily Work Report for a Specific Date, Route, and District

Relevancy:
1) Document is a daily work report.
2) Daily work report is for the date of 9/7/2012.
3) Daily work report is for a project on Route 13.
4) Daily work report is for a project in Hampton Roads District.
5) Document is “0013-001-623/EST 06 DIARY REPORT.pdf”
Vanilla Search Steps:
1) Search: 9/7/2012 daily work report route 13
Rule-Based Search Steps:
1) Type 9/7/2012 without using auto-suggest, then search.
2) Select Content Type = Daily work report
3) Select District = Hampton Roads District
4) Select Route = US Route 13
5) Read each document until desired document found.

Table II-A-6b. Metrics: Daily Work Report for a Specific Date, Route, and District

Metric | Vanilla | Rule-Based
Total number of results (excluding duplicates) | 59 | 14
Result position of targeted document | 39 | 6
Number of steps to specify search | 1 | 4

A second test case (Tables II-A-7a and II-A-7b) similarly anticipates that the user will recognize the document when reading it, and provides the criteria that the user is searching for a FHWA conceptual approval of work orders related to a drainage issue, in “.doc” format. In this example, the rule-based search is able to use the “Related to Work Order” facet to narrow the drainage issue content to a more manageable volume. Limiting it to Microsoft Word documents narrows this even further, resulting in a maximum of only five documents to read through, compared to 36 for the vanilla search. Although the total number of results is an improvement over the vanilla search, the result position of the targeted document is only a few documents earlier. In this case, the user is able to specify criteria in the vanilla search well enough to find the document as the fifth result. Although the rule-based search encourages a user to increase the detail of search specification by providing additional options, a vanilla search does not. The level of specification is critical in understanding the effectiveness of the vanilla search.

Table II-A-7a. Relevancy Criteria and Search Steps: Microsoft Word Document for FHWA Conceptual Approval of Work Order Related to Drainage Issue

Relevancy:
1) Document is a Word Document
2) Document is a signed FHWA Conceptual Approval
3) Conceptual approval is related to a drainage issue
4) Document is “Concept_WO_10_Approval.doc”
Vanilla Search Steps:
1) Search FHWA approval drainage
2) Select Result Type = Microsoft Word
Rule-Based Search Steps:
1) Type and select Drainage issue in auto-suggest, then search.
2) Select Content Type = Related to work order
3) Select Result Type = Microsoft Word
4) Read each document until desired document found.

Table II-A-7b. Metrics: Microsoft Word Document for FHWA Conceptual Approval of Work Order Related to Drainage Issue

Metric | Vanilla | Rule-Based
Total number of results (excluding duplicates) | 36 | 5
Result position of targeted document | 5 | 2
Number of steps to specify search | 2 | 3

The third single-document test case (Tables II-A-8a and II-A-8b) searches for a specific work order for a bridge project in the Virginia Beach jurisdiction. In this test case, the rule-based search limits the documents to a manageable size to review, with some knowledge of the project and issue. Meanwhile, the vanilla search provides a considerable difference in the number of results depending on how much detail is specified in the search (with a greater level of detail specified and better results in Version 1 than in Version 2). Although it is not the case here, the position of the relevant result could also vary significantly based on the user’s knowledge about the document. With substantial knowledge and a highly specific search, the rule-based search does not provide an improvement over the vanilla search. With less direct knowledge about the document, it is more likely to provide an advantage by offering suggestions on how to refine the search.

Table II-A-8a. Relevancy Criteria and Search Steps: Work Order for a Virginia Beach Bridge Project

Relevancy:
1) Document is a work order.
2) Document is for a project that has “Bridge” type of work.
3) Document is for a project in Virginia Beach.
4) Work order about installing a horizontal directional drilled sanitary sewer force main
Vanilla Search Steps (Version 1):
1) Search: Virginia Beach bridge work order horizontal drilled sewer main
Vanilla Search Steps (Version 2):
1) Search: Virginia Beach work order sewer
Rule-Based Search Steps:
1) Type and select Virginia Beach in auto-suggest, then search.
2) Select Type of Work = Bridge
3) Read each document until desired document found.

Table II-A-8b. Metrics: Work Order for a Virginia Beach Bridge Project

Metric | Vanilla Version 1 | Vanilla Version 2 | Rule-Based
Total number of results (excluding duplicates) | 3 | 74 | 5
Result position of targeted document | 1 | 2 | 3
Number of steps to specify search | 1 | 1 | 2

Find / Review All Documents for a Project (e.g., for FOIA Request)

The research team also evaluated a number of test cases related to finding a set of documents for a specific project, based on the use of project identifiers. Because the rule-based search is able to link all project identifiers (contract ID, UPC, and project number), it is able to find project documents containing any of the three identifiers regardless of which the user enters in the search. This provides an advantage over the vanilla search where there are differences in the identifier type entered in the search and the identifier type displayed in the document.

The first test case (Tables II-A-9a, II-A-9b, and II-A-9c) measures precision and recall of daily work reports using a specific contract ID to search. This search has the same perfect performance for both the vanilla and rule-based searches. This demonstrates that both searches are capable of finding project daily work reports based on the contract ID. As this identifier often appears in daily work reports, this result is unsurprising.

Table II-A-9a. Relevancy Criteria and Search Steps: Daily Work Reports for Contract ID V00014672C01

Relevancy:
1) Document contains a daily work report
2) Daily work report is for Project 0337-122-F14, UPC 14672, or Contract ID V00014672C01 or C00014672C01.
Vanilla Search Steps:
1) Search: V00014672C01 daily work report
Rule-Based Search Steps:
1) Type V00014672C01 in auto-suggest, select the corresponding project number, and search.
2) Select Content Type = Daily work report.
3) Look for relevant documents

Table II-A-9b. Precision Metrics: Daily Work Reports for Contract ID V00014672C01

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 8 | 8 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 8 | 8 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 100% | 100% |
| Documents needed to find 5 relevant results | 5 | 5 |

Table II-A-9c. Recall Metrics: Daily Work Reports for Contract ID V00014672C01

| Recall Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Number of known relevant documents | 8 | 8 |
| Number of relevant documents in top 30 results | 8 | 8 |
| Recall in top 30 results | 100% | 100% |

The evaluation tells a different story when searching for documents using the UPC, another type of project identifier. The following test case (Tables II-A-10a and II-A-10b) searches for daily work reports for the same project, using the UPC instead of the contract ID. Although the vanilla search has high recall for this project when using the contract ID, it finds only 25% of the documents when using the UPC. Meanwhile, the results for the rule-based search remain unchanged. The rule-based search performs better than the vanilla search by linking the UPC to the project number and contract ID, enabling it to find documents that contain any of the project identifiers. The vanilla search, on the other hand, can only find documents that contain the UPC, an identifier less frequently used in VDOT daily work reports.

Table II-A-10a. Relevancy Criteria and Search Steps: Daily Work Reports for UPC 14672

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a daily work report. 2) Daily work report is for Project 0337-122-F14, UPC 14672, or Contract ID V00014672C01 or C00014672C01. | 1) Search: daily work report 14672 | 1) Type UPC 14672 in auto-suggest, select the corresponding project number, and search. 2) Select Content Type = Daily work report. 3) Look for relevant documents. |

Table II-A-10b. Recall Metrics: Daily Work Reports for UPC 14672

| Recall Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Number of known relevant documents | 8 | 8 |
| Number of relevant documents in top 30 results | 2 | 8 |
| Recall in top 30 results | 25% | 100% |

This pattern is similar when searching for work orders using the UPC, as illustrated in the following test case (Tables II-A-11a, II-A-11b, and II-A-11c). Again, the rule-based search outperforms the vanilla search for document recall, finding 92% of all relevant documents in the top 30 results (compared to 15% in the vanilla search). Notably, the vanilla search only returns 5 documents. The ability to find documents that include any project identifier enables the rule-based search to generate this higher recall value. Both the vanilla and rule-based searches have high precision (80% and 100%, respectively, in the top 20 results), demonstrating that high recall does not compromise precision.

Table II-A-11a. Relevancy Criteria and Search Steps: Work Orders for UPC 50057

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is for Project 0615-047-169, UPC 50057, or Contract ID U00050057C01 or C00050057C01. | 1) Search: 50057 work order | 1) Type 50057 in auto-suggest and select (NFO)0615-047-169, then search. 2) Select Content Type = Work order. 3) Look for relevant documents. |

Table II-A-11b. Precision Metrics: Work Orders for UPC 50057

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 5 | 31 |
| Position of first relevant document | 2 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 4 | 20 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 80% | 100% |
| Documents needed to find 10 relevant results | Only 4 found | 10 |

Table II-A-11c. Recall Metrics: Work Orders for UPC 50057

| Recall Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Number of known relevant documents | 26 | 26 |
| Number of relevant documents in top 30 results | 4 | 24 |
| Recall in top 30 results | 15% | 92% |

The vanilla search is highly effective in finding documents using the project number, however. The next test case (Tables II-A-12a and II-A-12b) considers recall of a search for work orders using the project number as the identifier. In this evaluation, the vanilla search performs better than the rule-based search.
The search rules are unable to find all documents with the specified project number. Built-out rules in a post-pilot scenario could improve on these results, as they would be able to account for issues such as poor scan quality causing the digit 0 to be misread as the letter o. (The FAST software appeared to be better able to read OCRed text in documents with poor scan quality, as discussed in the “Content Harvesting, Analysis, and Conversion” section.)
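A rule that tolerates the 0/o confusion described above could be sketched as follows. This is only an illustration under stated assumptions: the identifier pattern (four digits, three digits, and a three-character suffix, as in 0337-122-F14) is inferred from the examples in this annex, and the function name `find_project_numbers` is hypothetical rather than part of the pilot's rule set.

```python
import re

# Assumed shape of a project number, e.g. "0337-122-F14" or "0615-047-169".
# The first two groups should be digits, but OCR may have produced "O" or "o".
CANDIDATE = re.compile(r"\b([0-9Oo]{4})-([0-9Oo]{3})-([A-Za-z0-9]{3})\b")


def _letters_to_zero(segment: str) -> str:
    """Replace a misread letter O/o with the digit 0 in a numeric segment."""
    return segment.replace("O", "0").replace("o", "0")


def find_project_numbers(text: str) -> list[str]:
    """Return normalized project numbers found in OCRed text."""
    found = []
    for match in CANDIDATE.finditer(text):
        first, second, suffix = match.groups()
        # The suffix may legitimately contain letters (e.g. "F14"), so it is
        # only upper-cased, not converted.
        found.append(
            f"{_letters_to_zero(first)}-{_letters_to_zero(second)}-{suffix.upper()}"
        )
    return found


print(find_project_numbers("Work order for project o337-122-F14 approved"))
```

Normalizing only the segments that are expected to be numeric avoids corrupting suffixes that contain genuine letters; a production rule would likely need to handle other OCR confusions as well.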

Table II-A-12a. Relevancy Criteria and Search Steps: Work Orders for Project Number 0337-122-F14

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is for Project 0337-122-F14, UPC 14672, or Contract ID V00014672C01 or C00014672C01. | 1) Search: 0337-122-F14 work order | 1) Type 0337-122-F14 in auto-suggest, select the corresponding project number, and search. 2) Select Content Type = Work Order. 3) Look for relevant documents. |

Table II-A-12b. Recall Metrics: Work Orders for Project Number 0337-122-F14

| Recall Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Number of known relevant documents | 34 | 34 |
| Number of relevant documents in top 30 results | 27 | 18 |
| Recall in top 30 results | 79% | 53% |

While the first few test cases in this information need category considered daily work reports and work orders, the following test case (Tables II-A-13a and II-A-13b) searches for Source of Materials forms for a specific project. In this test case, the vanilla search is unable to prioritize Source of Materials forms. The first set of documents found in the vanilla search consists of work orders instead of Source of Materials forms. Following the work order results, most of the documents are Source of Materials forms; the precision would be higher if the testing included a greater number of results (beyond the top 20 documents). Meanwhile, the rule-based search has extremely high precision, likely due to the consistent structure of Form C-25 (the Source of Materials form). It is able to identify the content type using the built rules with high precision.

Table II-A-13a. Relevancy Criteria and Search Steps: Source of Materials Forms for Project 0615-047-169

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a Source of Materials form. 2) Work order is for Project 0615-047-169, UPC 50057, or Contract ID U00050057C01 or C00050057C01. | 1) Search: 0615-047-169 Form C-25 | 1) Type 0615-047-169 in auto-suggest and select project, then search. 2) Select Content Type = Source of materials. |

Table II-A-13b. Precision Metrics: Source of Materials Forms for Project 0615-047-169

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 42 | 17 |
| Position of first relevant document | 16 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 5 | 17 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 25% | 100% |
| Documents needed to find 10 relevant results | 28 | 10 |

A final test case for this information need category demonstrates how the rule-based search can help identify documents from a project when using an incorrect project identifier (Tables II-A-14a and II-A-14b). In this case, there is much higher recall in the rule-based search. The vanilla search has high precision, but finds a limited number of results because the contract ID identified in the work orders is “C00050057C01” instead of “U00050057C01.” If the user typed “C00050057C01” instead of “U00050057C01” in the vanilla search, recall would be significantly higher in the top 30 results. If the user typed “C00050057C01” in the rule-based search, auto-suggest would not complete this, which would prompt the user to look up a different contract ID. The rule-based search is able to map U00050057C01 to the project number, which appears in the documents.

Table II-A-14a. Relevancy Criteria and Search Steps: Work Orders for Contract ID U00050057C01

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is for Project 0615-047-169, UPC 50057, or Contract ID U00050057C01 or C00050057C01. | 1) Search: U00050057C01 work order | 1) Type U00050057C01 in auto-suggest and select (NFO)0615-047-169, then search. 2) Select Content Type = Work order. 3) Look for relevant documents. |

Table II-A-14b. Recall Metrics: Work Orders for Contract ID U00050057C01

| Recall Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Number of known relevant documents | 26 | 26 |
| Number of relevant documents in top 30 results | 4 | 24 |
| Recall in top 30 results | 15% | 92% |

Search Across Projects – Find Projects with Item, Material, Construction Technique

The next set of test cases allows users to find documents or projects using a particular pay item, material, construction technique, or other identifier. The first test case in this set (Tables II-A-15a and II-A-15b) searches for daily work reports that include pay item 12600.

In this evaluation, the vanilla search is able to find a set of daily work reports that the rule-based search fails to classify. Both have 100% precision, but the vanilla search has higher recall in this case. A further built-out rule-based search would include rules that better identify these documents as daily work reports, and in turn would likely have recall similar to that of the vanilla search.

Table II-A-15a. Relevancy Criteria and Search Steps: Daily Work Reports including Pay Item 12600

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document is a daily work report. 2) Daily work report has a pay item for 12600, or “Std. Comb. Curb & Gutter CG-6.” | 1) Search: “12600” + “daily work report” | 1) Type 12600 in auto-suggest, select “STD. COMB. CURB & GUTTER CG-6,” then search. 2) Select Content Type = Daily work report. |

Table II-A-15b. Precision Metrics: Daily Work Reports including Pay Item 12600

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 14 | 7 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 14 | 7 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 100% | 100% |
| Documents needed to find 5 relevant results | 5 | 5 |

While the previous test case searches for documents by pay item number, the following test case searches for documents by the pay item name (Tables II-A-16a and II-A-16b). In this case, the vanilla search finds a number of documents with references to pay item Underdrain UD-4 (which is a more common pay item), but it is unable to distinguish UD-4 from UD-2. The vanilla search also finds some documents that do not reference underdrains at all (not even UD-4), but instead likely match the “UD” part of the search to other words in the documents such as “include.” In this textual search for pay items, the rule-based search provides a more precise match to the desired terms.

Table II-A-16a. Relevancy Criteria and Search Steps: Documents including Pay Item for Underdrain UD-2

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document has a pay item for 00585, 00598, “Underdrain UD-2,” or “Underdrain Modified UD-2.” | 1) Search: “underdrain” + “UD-2” | 1) Type UD-2 in auto-suggest, select “Underdrain UD-2,” then search. |
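The suspected “UD” effect in this test case can be illustrated by contrasting substring matching with whole-token matching. The sketch below is an assumption about the general mechanism only; the report does not document how the FAST software tokenizes queries, and the function names are hypothetical.

```python
import re


def substring_match(query: str, text: str) -> bool:
    """True if the query occurs anywhere in the text, even inside a word."""
    return query.lower() in text.lower()


def token_match(query: str, text: str) -> bool:
    """True only if the query equals a whole token (hyphenated codes kept intact)."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return query.lower() in tokens


unrelated = "Costs include traffic control"
print(substring_match("UD", unrelated))  # "ud" occurs inside "include"
print(token_match("UD", unrelated))
print(token_match("UD-2", "Install Underdrain UD-4 at station 10"))
```

Keeping “UD-2” together as a single token is what allows it to be distinguished from “UD-4”; splitting on the hyphen would lose that distinction.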

Table II-A-16b. Precision Metrics: Documents including Pay Item for Underdrain UD-2

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 96 | 19 |
| Position of first relevant document | 10 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 3 | 19 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 15% | 100% |
| Documents needed to find 10 relevant results | More than 30 | 10 |

The following test case attempts to identify Source of Materials forms referencing a specific supplier (Tables II-A-17a and II-A-17b). The main difference between the vanilla and rule-based searches in this example is that the vanilla search returns some letters that reference both Korman Signs and Source of Materials forms. This results in lower precision than in the rule-based search, where all 49 documents are relevant. The rule-based search is able to identify that these letters simply reference, but do not contain, Source of Materials forms, so it does not include them in the results. Meanwhile, the vanilla search finds the “Source of Materials” reference text and includes the documents.

Table II-A-17a. Relevancy Criteria and Search Steps: Source of Materials Forms for Korman Signs

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a Source of Materials form. 2) Source of Materials form contains entry for Korman Signs. | 1) Search: “Korman Signs” + “Source of Materials” | 1) Type and select Korman Signs in auto-suggest, then search. 2) Select Content Type = Source of materials. |

Table II-A-17b. Precision Metrics: Source of Materials Forms for Korman Signs

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 65 | 49 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 16 | 20 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 80% | 100% |
| Documents needed to find 10 relevant results | 14 | 10 |

The rule-based search is able to expand on this type of search and add further criteria. The following test case (Tables II-A-18a and II-A-18b) similarly searches for Source of Materials forms containing a specific supplier, but adds that the project should be for a primary road. This structured information is available in the project list, so the rule-based search is able to search for a project identifier and link it to the road type.
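The linkage between project identifiers and structured project attributes can be sketched as a lookup table built from the project list. The identifiers below are taken from the test cases in this annex; the `road_system` values are placeholders invented for illustration, and the class and function names are hypothetical, not the pilot's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Project:
    project_number: str
    upc: str
    contract_ids: tuple
    road_system: str  # placeholder values, not taken from the report


PROJECT_LIST = [
    Project("0337-122-F14", "14672", ("V00014672C01", "C00014672C01"), "Primary"),
    Project("0615-047-169", "50057", ("U00050057C01", "C00050057C01"), "Secondary"),
]

# Index every known identifier to its project record.
INDEX = {
    ident.upper(): project
    for project in PROJECT_LIST
    for ident in (project.project_number, project.upc, *project.contract_ids)
}


def expand_identifiers(term: str) -> set:
    """Return all identifiers linked to the project that `term` refers to."""
    project = INDEX.get(term.strip().upper())
    if project is None:
        return {term}
    return {project.project_number, project.upc, *project.contract_ids}


def document_matches(text: str, term: str, road_system: Optional[str] = None) -> bool:
    """Match if the text contains any linked identifier and the project
    satisfies the optional road-system filter."""
    project = INDEX.get(term.strip().upper())
    if road_system is not None and (project is None or project.road_system != road_system):
        return False
    upper_text = text.upper()
    return any(ident.upper() in upper_text for ident in expand_identifiers(term))


# A document citing only the "C" contract ID is still found via the "U" contract ID.
print(document_matches("Work order, contract C00050057C01", "U00050057C01"))
```

Because the road system is read from the project record rather than from the document text, the filter works even when a document never mentions the road type.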

The vanilla search finds more documents, but is unable to classify them by road system. A high proportion of all projects are for primary roads, so the vanilla search performs fairly well; however, it would not perform as well for other road types. The recall of the rule-based search is not as high as that of the vanilla search, as the rule-based search does not account for misspellings of company names. This could be addressed in a fully built-out rule-based search function.

Table II-A-18a. Relevancy Criteria and Search Steps: Source of Materials Forms for Asphalt Emulsion Inc. on a Primary Road Project

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a Source of Materials form. 2) Source of Materials form contains entry for Asphalt Emulsion Inc. 3) Source of Materials form is for a project for a road system classified as “primary.” | 1) Search: “Asphalt Emulsion Inc. Source of Materials” | 1) Type and select Asphalt Emulsion Inc in auto-suggest, then search. 2) Select Road System = Primary. 3) Select Content Type = Source of materials. |

Table II-A-18b. Precision Metrics: Source of Materials Forms for Asphalt Emulsion Inc. on a Primary Road Project

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 58 | 11 |
| Position of first relevant document | 4 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 13 | 11 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 65% | 100% |
| Documents needed to find 10 relevant results | 16 | 10 |

In a related test case (Tables II-A-19a and II-A-19b), the rule-based search allows the user to limit content to Source of Materials forms and a specific manufacturer, with the goal of identifying a particular contractor that the manufacturer supplied (using name recognition). In this test case, the metrics are the total number of results in the specified search, and the number of documents that the user would need to read to identify the contractor (i.e., the first document in which the contractor appears).

The list of contractors in the rule-based search under the “Contractors” facet can quickly provide name recognition, as is the case in finding “Branscome Inc.” and “Slurry Pavers, Inc.” in this test case. For these two instances, the user would not need to read any documents to identify these contractors in the rule-based search because they appear in the Contractors Facet List. However, if a contractor is not one of the most frequent contractors appearing in the results, it will not show up in the Contractors Facet List. In these cases, finding the contractor in the rule-based search requires a similar approach to the vanilla search: opening each of the documents and reading the name of the contractor until it is recognized (as was the case for finding “Curtis Contracting, Inc.”). If the number of contractors appearing in the facet list remains small for usability, there is a smaller advantage over the vanilla search.
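The facet behavior described above, where only the most frequent contractors are listed, can be sketched with a simple frequency count. The contractor names come from this test case, but the documents, counts, and facet size below are invented for illustration, and `contractor_facet` is a hypothetical helper.

```python
from collections import Counter

# Illustrative result set: one entry per document returned by the search.
RESULTS = [
    {"id": "doc1", "contractors": ["Branscome Inc."]},
    {"id": "doc2", "contractors": ["Branscome Inc."]},
    {"id": "doc3", "contractors": ["Branscome Inc."]},
    {"id": "doc4", "contractors": ["Slurry Pavers, Inc."]},
    {"id": "doc5", "contractors": ["Slurry Pavers, Inc."]},
    {"id": "doc6", "contractors": ["Curtis Contracting, Inc."]},
]


def contractor_facet(results, size):
    """Return the `size` most frequent contractors with their document counts."""
    counts = Counter()
    for doc in results:
        # Count each contractor once per document.
        counts.update(set(doc["contractors"]))
    return counts.most_common(size)


visible = [name for name, _ in contractor_facet(RESULTS, size=2)]
print(visible)
# A contractor outside the top `size` entries is not listed, so the user
# would have to open documents to find it.
print("Curtis Contracting, Inc." in visible)
```

The `size` parameter captures the trade-off noted in the text: a short facet list is easier to scan but hides less frequent contractors.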

Table II-A-19a. Relevancy Criteria and Search Steps: Name Recognition of Contractor that Korman Signs Supplied

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a Source of Materials form. 2) Source of Materials form contains entry for Korman Signs. | 1) Search: “Korman Signs” + “Source of Materials” | 1) Type and select Korman Signs in auto-suggest, then search. 2) Select Content Type = Source of materials. |

Table II-A-19b. Metrics: Name Recognition of Contractor that Korman Signs Supplied

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 69 | 62 |
| Documents read to find “Branscome Inc.” | 17 | 0 (Contractors List) |
| Documents read to find “Slurry Pavers, Inc.” | 32 | 0 (Contractors List) |
| Documents read to find “Curtis Contracting, Inc.” | 11 | 9 |

The rule-based search can also help a user identify when a search is incorrectly specified (e.g., misspelled). A vanilla search will take the misspelled search entry and return zero relevant results (unless a document contains the same misspelling). Meanwhile, the auto-suggest feature of the rule-based search suggests terms based on the built ontology. The following test case (Tables II-A-20a and II-A-20b) provides evaluation results for a search in which the user misspells the supplier name. In the vanilla search, the user searches with this misspelled name. In the rule-based search, the user begins to type the misspelled supplier name, then selects the correctly spelled name using the auto-suggest feature. Alternatively, the user could enter and search by the incorrect name in the rule-based search, then select the correctly spelled name from the list of possible topics. Due to the misspelling, the vanilla search does not return any results. Because the rule-based search proposes a spelling correction as the supplier is typed, it is able to match documents to the correct supplier.
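The two correction paths described above (prefix completion while typing, and selecting a close match after a misspelled search) can be sketched as follows. This is a minimal illustration, not the pilot's auto-suggest implementation: the term list is a small stand-in for the ontology, and Python's standard `difflib` similarity matching is swapped in for whatever mechanism the pilot software used to propose possible topics.

```python
import difflib

# Stand-in for ontology terms; names are taken from the test cases in this annex.
ONTOLOGY_TERMS = ["Korman Signs", "Asphalt Emulsion Inc", "Slurry Pavers, Inc."]


def suggest_by_prefix(typed, terms):
    """Terms that start with what the user has typed so far (case-insensitive)."""
    prefix = typed.strip().lower()
    return [term for term in terms if term.lower().startswith(prefix)]


def suggest_close(typed, terms, cutoff=0.8):
    """Terms whose spelling is close to a fully typed, possibly misspelled entry."""
    by_lower = {term.lower(): term for term in terms}
    hits = difflib.get_close_matches(
        typed.strip().lower(), list(by_lower), n=3, cutoff=cutoff
    )
    return [by_lower[hit] for hit in hits]


print(suggest_by_prefix("Korm", ONTOLOGY_TERMS))
print(suggest_close("Kormen signs", ONTOLOGY_TERMS))
```

Prefix completion helps here only because the misspelling occurs after the first four characters; the similarity fallback covers the case where the full misspelled name has already been entered.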
Table II-A-20a. Relevancy Criteria and Search Steps: Source of Materials Form with “Kormen” Signs Supplier Misspelling

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a Source of Materials form. 2) Source of Materials form contains entry for Kormen Signs or Korman Signs. | 1) Search: Kormen Signs Source of Materials | 1) Type “Korm” in auto-suggest, select Korman Signs, then search. Alternatively, type “Kormen signs” and search, then select “Korman Signs” from the list of possible topics. 2) Select Content Type = Source of materials. |

Table II-A-20b. Precision Metrics: Source of Materials Form with “Kormen” Signs Supplier Misspelling

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 0 | 49 |
| Position of first relevant document | N/A | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 0 | 20 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 0% | 100% |
| Documents needed to find 10 relevant results | 0 results | 10 |

The rule-based search is also able to find documents based on project location. It does this in two ways: by linking the project identifier to the defined project information (e.g., district, route) using the project list, and by finding the location information directly in the document. The following test case (Tables II-A-21a and II-A-21b) evaluates this ability in a search for documents on a specific route, and finds considerably higher precision in the rule-based search.

Table II-A-21a. Relevancy Criteria and Search Steps: Documents for Projects on Route 679

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document is on a project for Route 679. | 1) Search: “Route 679” | 1) Type Route 679 in auto-suggest, select “State Route 679,” and search. |

Table II-A-21b. Precision Metrics: Documents for Projects on Route 679

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 51 | 17 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 2 | 17 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 10% | 100% |
| Documents needed to find 10 relevant results | More than 30 | 10 |

The following test case (Tables II-A-22a and II-A-22b) further limits the content to work orders on a specific Interstate route within a specific district. In this example, the vanilla search is unable to identify both the district and the route of the project. Meanwhile, the rule-based search is able to identify both pieces of information with precision.

Table II-A-22a. Relevancy Criteria and Search Steps: Work Orders Related to Route I-95 in Richmond District

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is about Route I-95. 3) Work order is for project in Richmond District. | 1) Search: I-95 work order Richmond District | 1) Type and select I-95 in auto-suggest, then search. 2) Select Content Type = Work order. 3) Select District = Richmond District. |

Table II-A-22b. Precision Metrics: Work Orders Related to Route I-95 in Richmond District

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 106 | 6 |
| Position of first relevant document | 0 in top 30 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 0 | 6 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 0% | 100% |
| Documents needed to find 5 relevant results | 0 in top 30 | 5 |

Research Reasons for Delays and Changes

The final set of test cases examines the ability to research reasons for delays and changes in work. These reasons often appear initially in daily work reports, and then are included in work orders, which define material changes to a project. As noted in the “Rule Development and Refinement” section, two standard section headings (structural elements) signified a location within work orders to search for work issues. The following metrics include an evaluation of precision with and without this rule as part of the rule-based search. The evaluation tables that follow differentiate between “Rule-Based” (which does not include this rule) and “Rule-Based (Headings)” (which does include this rule). The search steps are the same for each rule-based evaluation, as the only differentiation occurs in the rules on the back-end.
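The precision and recall metrics reported in the evaluation tables can be computed from a ranked list of relevance judgments. The sketch below assumes each result has already been judged relevant or not; the function names are hypothetical. As a check, the vanilla column of Tables II-A-11b and II-A-11c (5 results, first relevant document at position 2, 4 relevant documents, 26 known relevant documents) implies the ranking used in the example.

```python
def precision_at(ranked, k=20):
    """Share of the top k results (or of all results if fewer than k) that are relevant."""
    top = ranked[:k]
    return sum(top) / len(top) if top else 0.0


def recall_at(ranked, known_relevant, k=30):
    """Share of the known relevant documents that appear in the top k results."""
    return sum(ranked[:k]) / known_relevant


def first_relevant_position(ranked):
    """1-based position of the first relevant result, or None if there is none."""
    for position, relevant in enumerate(ranked, start=1):
        if relevant:
            return position
    return None


def documents_needed(ranked, target):
    """Number of results read before `target` relevant ones are found, or None."""
    found = 0
    for position, relevant in enumerate(ranked, start=1):
        found += relevant
        if found == target:
            return position
    return None


# Vanilla search for work orders by UPC 50057: one non-relevant result first,
# then four relevant results.
vanilla = [False, True, True, True, True]
print(first_relevant_position(vanilla))
print(f"{precision_at(vanilla):.0%}")
print(f"{recall_at(vanilla, known_relevant=26):.0%}")
print(documents_needed(vanilla, target=10))
```

Returning `None` from `documents_needed` corresponds to table entries such as “Only 4 found,” where the target number of relevant documents is never reached.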
The first test case (Tables II-A-23a and II-A-23b) evaluates the ability of the different search methods to find work orders related to utility issues. In this test case, the documents returned by the vanilla search often include a list of VDOT-specified work order categories at the end, with an accompanying box marked for each issue that applied. This list includes a VDOT-specified “UTIL” category with the description: “Delays caused by utility issues.” Because of the presence of this list, the vanilla search matches the search for “utility” to this description even when the utility list item does not apply to the document.

Meanwhile, the rule-based search is able to identify work orders related to utility issues with 75% precision in the top 20 results. The rule-based search includes a rule to exclude this list of VDOT-specified work order categories, which along with the other built-in rules improves the precision of the results over the vanilla search. Including the section heading elements in the rule-based search further increases the precision to 100%; however, it may decrease recall relative to the other rule-based search (under the assumption that additional relevant results would have been found in the other rule-based search if additional documents were reviewed beyond the top 20). As noted in the “Testing and Subjective Evaluation” section, the recall of the rule-based search that uses section headers could be increased using additional variations for a more complete application.

Table II-A-23a. Relevancy Criteria and Search Steps: Work Orders Related to Utility Issues

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is about or due to a utility issue. | 1) Search: Utility work order | 1) Type and select Utility issue in auto-suggest, then search. 2) Select Content Type = Work order. |

Table II-A-23b. Precision Metrics: Work Orders Related to Utility Issues

| Precision Metric | Vanilla | Rule-Based | Rule-Based (Headings) |
| --- | --- | --- | --- |
| Total number of results (excluding duplicates) | 974 | 222 | 16 |
| Position of first relevant document | 1 | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 3 | 15 | 16 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 15% | 75% | 100% |
| Documents needed to find 10 relevant results | More than 30 | 13 | 10 |

The research team also built out rules for a number of other issues, focusing in particular on drainage, utilities, and weather issues. To develop estimates of effectiveness compared to a vanilla search, the following test case examines a set of 30 rule-based search results for each of these three work issues, specifying only the work issue and the content type (work order). It calculates the percentage of documents containing the work issue term within the document (e.g., the percentage of documents containing the term “drainage” for a drainage issue). Tables II-A-24a and II-A-24b provide the results of these searches for the rule-based search that does not include rules for the standard section headings (in order to provide a greater number of results for the evaluation). For the utilities issue search, daily reports of construction are incorrectly classified as work orders because of two sections with “utility” in the title.
Since the majority of the first 30 documents for utilities are of this type, these are not included in the count. Across the three work issues, utilities issue has the lowest percentage of documents classified that contain the name of the issue within the document. This is similarly true for the phrases used to build the classification rules: “utility” or “utilities” appear in 26% of the phrases used to build the utilities issue rules, “weather” appears in 31% of the phrases used to build the weather issue rules, and “drainage” appears in 53% of the phrases used to build the drainage issues rules (as specified in Annex 1). The high percentage of these documents that contain the search term in them decreases the advantage of using classification rules, as a plain search on “drainage,” for example, would be expected to capture a considerable number of the drainage issue documents. Meanwhile, a plain search for “utility” or “utilities” is less likely to be successful in comparison to the rule-based search. Further building out the rules to include additional phrases not containing the “title” word would increase this benefit of using a rule-based search.
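The calculation behind this comparison is a simple share of examined documents whose text contains the issue label. A minimal sketch follows, assuming plain document text is available; `share_containing` and the sample documents are hypothetical, while the counts used in the final check are those reported in Table II-A-24b.

```python
# Terms checked for each issue label; "utility" and "utilities" are both
# counted for the utilities issue, following the discussion above.
ISSUE_TERMS = {
    "drainage issue": ("drainage",),
    "utilities issue": ("utility", "utilities"),
    "weather issue": ("weather",),
}


def share_containing(documents, terms):
    """Fraction of documents whose text contains at least one of the terms."""
    if not documents:
        return 0.0
    hits = sum(any(term in doc.lower() for term in terms) for doc in documents)
    return hits / len(documents)


# Invented sample: the second document is tagged as a drainage issue by a
# phrase-based rule but never uses the word "drainage".
sample = [
    "Work order to correct drainage problem at station 12",
    "Work order to repair excessive erosion near the drop inlet",
]
print(share_containing(sample, ISSUE_TERMS["drainage issue"]))

# Reproducing the reported percentages from the reported counts (30 examined each).
reported = {"Drainage": 28, "Utilities": 22, "Weather": 27}
percentages = {issue: round(100 * count / 30) for issue, count in reported.items()}
print(percentages)
```

Documents such as the second sample are exactly where classification rules add value over a plain keyword search, since the keyword alone would miss them.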

Table II-A-24a. Relevancy Criteria and Search Steps: Work Orders that Directly Reference the Work Issue

| Relevancy | Rule-Based Search Steps |
| --- | --- |
| 1) Document is a work order. 2) Document is tagged with a specified work issue. | 1) Type and select desired issue in auto-suggest, then search. 2) Select Content Type = Work Order. 3) Search document to see if the issue label appears within the document text: “drainage” for a “drainage issue,” “utility” for a “utilities issue,” and “weather” for a “weather issue.” |

Table II-A-24b. Metrics: Work Orders that Directly Reference the Work Issue

| Metric | Drainage | Utilities | Weather |
| --- | --- | --- | --- |
| Number of documents examined | 30 | 30 | 30 |
| Number of documents containing text of specified issue within document (e.g., “Drainage” is found within document tagged with “drainage issue”) | 28 | 22 | 27 |
| Percent of documents examined containing text of specified issue within document | 93% | 73% | 90% |

The following test cases look more closely at weather and drainage issues. Tables II-A-25a and II-A-25b consider specifically work orders related to weather issues. Of the three searches in this test case, the rule-based search that includes the section heading rules is the only one with high precision. Again, the difference in the total number of results is noticeable, as the rule-based search with section heading rules returns less than 1% of the total number of results of the vanilla search. Reading through all the documents in the vanilla search would take a significant effort, but may result in finding more work orders related to weather issues than either version of the rule-based search. If the search rules were fully built out, this expected recall gap would decrease.

Notably, without the section heading rule, the rule-based search catches phrases that would apply to weather (e.g., “weather conditions”) that appear in the document outside of the work order form. Variation in how the agency provides these forms (e.g., supplemental information in the document sometimes relates specifically to the work order purpose, and sometimes relates to other aspects of the document, such as materials instructions) requires complex rules to increase the precision in these cases. Fully built-out rules would attempt to identify and address these subtle complexities in order to capture these cases while maintaining a high level of precision.

Table II-A-25a. Relevancy Criteria and Search Steps: Work Orders Related to Weather Issues

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order. 2) Work order is about or due to a weather issue. | 1) Search: Weather work order | 1) Type and select Weather issue in auto-suggest, then search. 2) Select Content Type = Work order. |

Table II-A-25b. Precision Metrics: Work Orders Related to Weather Issues

| Precision Metric | Vanilla | Rule-Based | Rule-Based (Headings) |
| --- | --- | --- | --- |
| Total number of results (excluding duplicates) | 1,810 | 48 | 12 |
| Position of first relevant document | 1 | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 8 | 11 | 12 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 40% | 55% | 100% |
| Documents needed to find 10 relevant results | 36 | 19 | 10 |

The next test case (Tables II-A-26a and II-A-26b) examines daily work reports about drainage issues. This test case includes two different vanilla search specifications: the second expands on the first to include a number of additional terms beyond “drainage” in an attempt to construct a search that would account for synonyms in a similar way as the rule-based search.

Because the first vanilla search only searches for “drainage,” a number of vanilla search result documents include a section on drainage with the phrase “there was no activity on drainage items on this date.” The rule-based search has a more limited number of results, but is highly accurate in those results. The first vanilla search has a high number of results, but would require considerable effort to read through all results to find documents related to drainage. It is also likely that fully built-out search rules would increase the number of documents found in the rule-based search by including additional synonym phrases.

Since the first vanilla search has low precision and a high number of results, the second version adds a number of drainage terms that were used to build the rule-based search rules. Because the software used in the pilot has a limit on the number of characters in a search (200 in an advanced search), this search mostly uses keywords instead of the full phrases used to improve precision in the rule-based search (e.g., the FAST search includes “erosion,” while the rule-based search includes rules for phrases such as “erosion control,” “erosion problem,” and “excessive erosion”). This second test actually increases the total number of results to the full set of daily work reports because “daily work report” is the only “required” phrase. The vanilla search does not have a way to specify that “daily work report” appear in addition to one of the other phrases. Although the research team anticipated that this search specification would increase the relevancy of the top results, the precision actually decreases from the first to the second vanilla search.
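The difference between the behavior observed for the second vanilla search and the behavior the research team intended can be expressed as two matching predicates. This is a sketch of the logic as described in the text, not of the FAST query engine itself: in the observed behavior the optional terms do not restrict the result set, whereas the intended behavior requires the mandatory phrase plus at least one drainage term. The function names and sample documents are hypothetical.

```python
REQUIRED = ["daily work report"]
# A subset of the optional terms from the Version 2 query.
OPTIONAL = ["drainage", "underdrain", "storm drain", "erosion", "drop inlet"]


def observed_match(text, required, optional):
    """Behavior described for Version 2: only the required phrase filters results."""
    lowered = text.lower()
    return all(phrase in lowered for phrase in required)


def intended_match(text, required, optional):
    """Intended behavior: required phrase AND at least one optional term."""
    lowered = text.lower()
    return all(phrase in lowered for phrase in required) and any(
        term in lowered for term in optional
    )


docs = [
    "Daily Work Report: paving crew placed surface course on Route 679",
    "Daily Work Report: repaired erosion at drop inlet after heavy rain",
]
print([observed_match(doc, REQUIRED, OPTIONAL) for doc in docs])
print([intended_match(doc, REQUIRED, OPTIONAL) for doc in docs])
```

Under the observed behavior every daily work report qualifies, which is consistent with the result count growing to the full set of daily work reports.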

Table II-A-26a. Relevancy Criteria and Search Steps: Daily Work Reports Related to Drainage Issues

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a daily work report. 2) Daily work report discusses drainage issues. | Version 1: 1) Search: daily work report pm diary drainage. Version 2: 1) Search: ALL("daily work report") ANY("drainage" "underdrain" "storm drain" "erosion" "diversion ditch" "drop inlet" "wingwalls" "storm sewer" "water ponding" "drainage problem" "drainage structures") | 1) Type and select Drainage issue in auto-suggest, then search. 2) Select Content Type = Daily work report. |

Table II-A-26b. Precision Metrics: Daily Work Reports Related to Drainage Issues

| Precision Metric | Vanilla Version 1 | Vanilla Version 2 | Rule-Based |
| --- | --- | --- | --- |
| Total number of results (excluding duplicates) | 721 | 2,436 | 13 |
| Position of first relevant document | 1 | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 9 | 6 | 11 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 45% | 30% | 85% |
| Documents needed to find 10 relevant results | 21 | 25 | 12 |

Because of the standard daily work report section on drainage, the following test case examines work orders related to drainage issues, and further limits these to a specific district (Tables II-A-27a and II-A-27b). Again, the vanilla search produces a high number of results. While it has high precision in the top 20 results, it finds only documents that specify “drainage” and would not find the same range of drainage issues as a rule-based search. The vanilla search precision is higher for work orders than for daily work reports because the “drainage” placeholder section does not exist in work orders, where drainage is discussed only when relevant. Notably, confirming that the document is for a project in Hampton Roads District requires an extra step of looking up the project information in the vanilla search. A fully built-out rule-based search with high precision would eliminate this step and reduce the effort required in the rule-based search compared with the vanilla search.

As in the examples measured in Tables II-A-18b and II-A-20b, the rule-based search that uses the section headings again has 100% precision. However, the reduced recall is clear in this example, as the number of relevant documents found in this search is lower than the other rule-based search and vanilla search. A fully built-out search containing the section header rule would look to capture more of the relevant documents found in the other two searches.

Table II-A-27a. Relevancy Criteria and Search Steps: Work Orders Related to Drainage Issues in Hampton Roads District

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order | 1) Search: drainage work order Hampton Roads | 1) Type and select Hampton Roads District in auto-suggest, then search. |
| 2) Work order is due to or about a drainage issue. | | 2) Select Content Type = Work order. |
| 3) Work order is for project in Hampton Roads District | | 3) Select Work issue = Drainage issue. |

Table II-A-27b. Precision Metrics: Work Orders Related to Drainage Issues in Hampton Roads District

| Precision Metric | Vanilla | Rule-Based | Rule-Based (Headings) |
| --- | --- | --- | --- |
| Total number of results (excluding duplicates) | 121 | 46 | 11 |
| Position of first relevant document | 2 | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 15 | 14 | 11 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 75% | 70% | 100% |
| Documents needed to find 10 relevant results | 14 | 15 | 10 |

The rule-based search is also able to identify VDOT-specified work order categories, such as the “UTIL” category discussed earlier. The next test case (Tables II-A-28a and II-A-28b) examines results from searching for the “CHAR” work order category, defined by VDOT as “Changes per Section 104.2 (Character of Work).” In this example, the rule-based search precisely defines where “CHAR” refers to the work order category, while the vanilla search cannot do so.
The rule-based search excels in this test case, where the rules can focus on a specific term (“Category:”) to identify the presence of the work order category of interest.

Table II-A-28a. Relevancy Criteria and Search Steps: Work Orders with “CHAR” Work Order Category

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order | 1) Search: “Category: CHAR” | 1) Type and select CHAR in auto-suggest, then search. |
| 2) The work order category is listed as CHAR. | | 2) Select Content Type = Work Order. |
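The idea of anchoring a rule on the “Category:” label can be sketched as follows. This is a minimal illustration only: the regular expression, the function names, and the sample text are hypothetical, and the plain substring test is a simplified stand-in for keyword matching, not a description of how the pilot software's vanilla or rule-based search is implemented.

```python
import re

# Hypothetical rule: a category code counts only when it directly follows
# the "Category:" label, rather than appearing anywhere in the document.
CATEGORY_RULE = re.compile(r"Category:\s*([A-Z]{3,5})\b")


def work_order_category(text):
    """Return the code that follows 'Category:', or None if no label is found."""
    match = CATEGORY_RULE.search(text)
    return match.group(1) if match else None


def keyword_match(text, term):
    """Simplified stand-in for a keyword search: case-insensitive substring test."""
    return term.lower() in text.lower()


# Invented sample text for illustration; not taken from any VDOT document.
sample = (
    "Work Order 7\n"
    "Category: UTIL\n"
    "VDOT inspector noted that the character of the work is unchanged."
)

print(keyword_match(sample, "CHAR"))   # keyword test also hits 'character'
print(keyword_match(sample, "VDOT"))   # keyword test hits the inspector note
print(work_order_category(sample))     # label-anchored rule reads the category field
```

In this sketch the keyword test matches “CHAR” and “VDOT” even though neither is the work order category, while the label-anchored rule returns only the code that follows “Category:”.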

Table II-A-28b. Precision Metrics: Work Orders with “CHAR” Work Order Category

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 112 | 24 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 2 | 20 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 10% | 100% |
| Documents needed to find 10 relevant results | More than 30 | 10 |

The final test case (Tables II-A-29a and II-A-29b) extends the prior example to identify work orders on a specific project with the specified work order category. In this case, the “VDOT” work order category is used, with the idea that a search including the term “VDOT” may lower precision when searching across VDOT documents. In this test case, the vanilla search is unable to distinguish where in the document “VDOT” appears. So, if it finds work order, category, and VDOT, it matches to the document, resulting in low precision. The rule-based search has higher precision for this. However, it also has lower recall than the vanilla search (possibly due to fewer matches with the project number).

Table II-A-29a. Relevancy Criteria and Search Steps: Work Orders on Project Number 0337-122-F14 with “VDOT” Work Order Category

| Relevancy | Vanilla Search Steps | Rule-Based Search Steps |
| --- | --- | --- |
| 1) Document contains a work order | 1) Search: “0337-122-F14” + “Category: VDOT” | 1) Type and select (FO)0337-122-F14 in auto-suggest, then search. |
| 2) Work order is for project 0337-122-F14 | | 2) Select Content Type = Work Order. |
| 3) The work order category is listed as VDOT. | | 3) Work order category = VDOT. |

Table II-A-29b. Precision Metrics: Work Orders on Project Number 0337-122-F14 with “VDOT” Work Order Category

| Precision Metric | Vanilla | Rule-Based |
| --- | --- | --- |
| Total number of results (excluding duplicates) | 32 | 6 |
| Position of first relevant document | 1 | 1 |
| Number of relevant documents in top 20 results (or in all results if fewer than 20) | 9 | 5 |
| Percentage of documents in top 20 results (or in all results if fewer than 20) that are relevant | 45% | 83% |
| Documents needed to find 5 relevant results | 10 | 6 |
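The precision metrics reported in Tables II-A-26b through II-A-29b can be computed from a ranked list of relevance judgments. The sketch below assumes each result has already been judged relevant or not by a reviewer and that duplicates have been removed. The function and field names are hypothetical, and the example ordering is invented: only its totals (13 results, 11 relevant, 10 relevant found within the first 12) are chosen to reproduce the rule-based column of Table II-A-26b.

```python
def precision_metrics(judgments, top_n=20, target=10):
    """Summarize a ranked list of True/False relevance judgments.

    judgments: one boolean per result, in ranked order, duplicates removed.
    top_n:     cutoff for the top-of-list metrics (all results are used
               when the list is shorter than top_n).
    target:    number of relevant documents a user is trying to find.
    """
    top = judgments[:top_n]
    # 1-based rank of the first relevant result, or None if there is none.
    first_relevant = next(
        (pos for pos, rel in enumerate(judgments, start=1) if rel), None
    )
    relevant_in_top = sum(1 for rel in top if rel)
    percent_relevant = round(100 * relevant_in_top / len(top)) if top else None

    # Number of results a user reads before reaching `target` relevant ones.
    needed = None
    found = 0
    for pos, rel in enumerate(judgments, start=1):
        if rel:
            found += 1
            if found == target:
                needed = pos
                break

    return {
        "total_results": len(judgments),
        "first_relevant": first_relevant,
        "relevant_in_top": relevant_in_top,
        "percent_relevant": percent_relevant,
        "needed_for_target": needed,  # None means the target was never reached
    }


# Invented ordering of 13 results (11 relevant) used only to exercise the function.
example = [True, True, True, False, True, True, True,
           False, True, True, True, True, True]
print(precision_metrics(example))
```

A result of `None` for `needed_for_target` corresponds to cases where the target count is not reached in the judged results; how far down the list reviewers judged in such cases is not specified here.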



TRB's National Cooperative Highway Research Program (NCHRP) Report 846: Improving Findability and Relevance of Transportation Information (Volumes I and II) provides practices and tools to facilitate on-demand retrieval of useful information stored in project files, libraries, and other agency archives. The report defines a management framework for classification, search, and retrieval of transportation information; documents successful practices for organizing and classifying information that can be adapted to search and retrieval of the diversity of information a transportation agency creates and uses; develops federated or enterprise search procedures that an agency can use to make transportation information available to users, subject to concerns for security and confidentiality; and demonstrates implementation of the management framework, information organization and classification practices, and search procedures. Volumes I and II provide guidance and background information designed to assist agencies to tailor findability procedures and tools to meet their particular needs.
