


Suggested Citation:"Chapter 7 - System High-Level Architecture." Transportation Research Board. 2014. Designing the Archive for SHRP 2 Reliability and Reliability-Related Data. Washington, DC: The National Academies Press. doi: 10.17226/22281.



Page 79

Chapter 7
System High-Level Architecture

The SHRP 2 Archive system consists of the following components:

• Amazon Web Services (AWS);
• Apache HTTP server;
• WordPress system with specific SHRP 2 plugins and themes;
• MySQL database;
• Tomcat application server;
• Solr search engine; and
• S2A server.

These components are interconnected, as shown in Figure 7.1. Detailed information on the system components is provided below.

7.1 Amazon Web Services

AWS is a bundle of remote computing services that provides a cloud-computing platform offered over the Internet. Both the L13 report and L13A team's assessments indicated that the cloud-based service is a viable solution for hosting the Archive.

From the project team's point of view, the architecture proposed in the L13 report (see Section 3.1.10 earlier in this report) was slightly outdated. To that end, the team modified the proposed architecture and leveraged the extensive cloud-based services Amazon provides to the public. The team deployed the Archive system on a bundle consisting of the following components:

• Amazon Elastic Compute Cloud (EC2). EC2 provides virtual servers and is delivered on the CentOS operating system. EC2 manages the data and information via Elastic Block Storage (EBS). EBS provides volume-based storage that has a separate life span and can be attached to any instance. EBS module size is 200 GB and can be resized. For now the team has used the medium M3 instance for the EC2 module. It should be noted that in the design, the team has not implemented a hot standby instance as a backup for cases in which the operation of the EC2 module fails. Amazon guarantees uptime of more than 99%. In case of any potential failure, the administration team can set up another instance in a couple of hours.
• Amazon Relational Database Service (RDS). Database administration (e.g., configuration, backup, monitoring resource consumption) can be an expensive and error-prone task. The purpose of this module is to provide a relational database service via Amazon cloud that helps users save money and avoid errors. RDS supports three popular relational databases: MySQL, SQL Server, and Oracle. The Archive uses MySQL for managing its database system. As of April 2014, the size of the database was 500 GB. The service is elastic. Therefore scaling up the storage is easy.
• Simple Storage Service (S3)/Glacier. This service is used to back up the database and the file system. The Archive backs up the contents of the EBS daily and the RDS biweekly on S3. S3 keeps the data for 1 month and then moves them to Glacier, a cost-efficient archival storage system with very high availability and very low failure rate. It should be noted that sending and retrieving data to and from Glacier can take time. The size of the S3 storage service is 2 TB (as of April 2014).

7.2 WordPress

WordPress is one of the most popular open source content management and blogging systems available. WordPress was selected as the core CMS of the Archive after a thorough assessment of various COTS CMSs (see Section 3.5.4 for more information).

WordPress requires a web server with PHP support, a URL rewriting facility, and an instance of MySQL. The Archive system uses Apache as the HTTP server. Apache is a preferred option that developers normally implement with WordPress because it provides PHP interpretation and URL rewriting.
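As a rough illustration of the backup retention described in Section 7.1 (backups stay on S3 for 1 month and then move to Glacier), the sketch below classifies a backup by its age. The function name, the tier labels, and the approximation of "1 month" as 30 days are assumptions made for illustration; the report does not say how the lifecycle rule is actually configured.

```python
from datetime import date

# Assumption: the report's "1 month" on S3 is approximated as 30 days.
S3_RETENTION_DAYS = 30


def storage_tier(backup_date: date, today: date) -> str:
    """Return the tier a backup would sit in under the stated retention policy."""
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    # Backups stay on S3 during the retention window, then move to Glacier.
    return "S3" if age_days <= S3_RETENTION_DAYS else "Glacier"
```

A two-week-old backup would be classified as "S3" and a three-month-old one as "Glacier". Retrieval from the latter can take time, as noted in Section 7.1.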

Page 80

7.2.1 Themes

The WordPress theme is the face and graphical aspect of the website, which encompasses the entire user experience. Therefore, the appearance of the user interface is built on the basis of a theme. A theme is a bundle of template files (PHP files to provide logic and structure), CSS files (to keep the style), images, and JavaScript files. There are many WordPress theme resources available that can be used directly or customized. The SHRP 2 Archive theme is a child theme of WordPress's Twenty Eleven general theme. The SHRP 2 Archive theme was customized for the Archive user interface.

7.2.1.1 Key Open Source JavaScript Libraries Used in the Archive

JavaScript works within WordPress. It can be used within WordPress template files in WordPress themes or child themes. As recommended by the L13 report, the project team tried to use open source libraries as much as possible. Table 7.1 summarizes the open source JavaScript libraries used to deliver some of the core functionalities of the Archive system.

7.2.2 Plugins

In WordPress, a plugin is a PHP file that provides specific functionality to a website. It allows the theme to achieve a certain objective and helps users tailor the website for their specific needs. Table 7.2 shows the list of plugins used for the Archive.

Table 7.1. Open Source JavaScript Libraries

| JavaScript | Description |
| --- | --- |
| Recline | Library to build data applications. It can be integrated with Leaflet, Slickgrid, and Highcharts. This library was used as a platform that delivers the visualization functionalities on the Data tab located on top of the data set pages. |
| Slickgrid | Grid/spreadsheet view of the data sets |
| Highcharts | Data set plots and graphs |
| Leaflet | Interactive maps features (i.e., markers, overlapping marker spiderfier) |
| Cloudmade | Map tiles based on OpenStreetMap. At the time of writing this report, Cloudmade stopped providing the free service. The team is looking into finding alternatives, such as Google or Nokia. |
Figure 7.1. Components of SHRP 2 Archive. [Diagram not reproduced. Labeled elements: Web Browser (Dashboard, Search, Filters, Maps, Visualization, Download, Comment, Ranking, Admin); Internet; Apache; WordPress Framework; SHRP 2 Theme; Recline JS; Slickgrid JS; HighChart JS; SHRP2 Workflow; S2 Comment Form; Theme My Login; Solr Plugin; SHRP2 Ingestion; CustomMeta; L13A WP-Admin; Custom Email; Attribute; Database (MySQL) (WordPress Content, Metadata, Indexed Tables); File System; S2A Server, Java (Indexing Datasets, Validation, Loading Artifacts to Solr); Solr Search Engine; Tomcat; EC2; RDS; EBS. Legend categories: Theme, JS, Plugin, AWS Element.]

Page 81

Table 7.2. Plugins Used in Archive

| Plugin | Description |
| --- | --- |
| Attributes plugin | Used to handle inappropriate content, ratings, and such. This feature was implemented into the system but is not being used. |
| Custom e-mail | Sends custom e-mail from SHRP 2 Archive plugins and adds a custom registration e-mail. |
| L13A ingestion | Implements the custom file ingestion process for the L13A Reliability data archive. |
| L13A WordPress-Admin Restriction Mod | Hides the WordPress admin banner on top of the site. |
| Meteor Slides | Easily creates responsive slideshows with WordPress that are mobile friendly and simple to customize. In the SHRP 2 Archive system, the administrator has the ability to insert a slideshow at the homepage. |
| S2 comment form | A plugin to add custom fields to the comment form. |
| SHRP 2 Custom Meta | This plugin defines and enables custom metadata fields. |
| SHRP 2 Workflow | Enables administrators to manage artifacts in the SHRP 2 Archive. |
| Solr for WordPress | Indexes, removes, and updates documents in the Solr search engine. |
| Theme My Login | Themes the WordPress log in, registration, and forgot password pages according to your theme. |

7.3 MySQL Database

The only database that is supported by WordPress is MySQL version 5.0.15 or greater (the version number may change later). For most applications WordPress normally deals with the database by itself, so the developer does not need to worry about the structure and the design of the database.

However, for this project, the development team has customized the database. The customization was implemented in two forms: modifying existing WordPress tables and adding new tables. Section 7.3.1 and Section 7.3.2 review the native WordPress tables and the SHRP 2-specific tables in more detail.

Note that the Archive stores data sets in two formats:

• Flat file, which is the original .csv format kept in WordPress's file system; and
• Database table, which is used for visualizing and filtering the data sets. The system generates these tables automatically by converting .csv files to database tables during the back-end processing (see Figure 5.3).

7.3.1 Database Diagram

Figure 7.2 provides a visual overview of the SHRP 2 Archive database and the relations between the tables required to operate the Archive. Tables starting with "s2_dset_" are converted from original data set files in .csv format. The general naming convention for a data set table is "s2_dset_ArtifactID"; ArtifactID is a unique identifier that is assigned to each file (artifact) by WordPress. Note that the s2_dset_1001 table is only an example of a data set table.

7.3.2 Overview of Database Tables

Table 7.3 lists database tables for the Archive.

Table 7.4 to Table 7.8 show fields in tables created or modified for the Archive. Table names starting with "s2" represent relations specifically created for the Archive system.

7.4 Solr Search Engine Server

Solr is an open source enterprise search engine that performs keyword search on the Archive. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Apache Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JavaScript Object Notation (JSON) APIs that make it easy to use from virtually any programming language (Apache Lucene 2014). The Archive's Solr engine has been installed on Apache Tomcat. Solr indexes any artifact and metadata being uploaded into the Archive before they become available on the Archive.

7.5 S2A Server

S2A server is a back-end module, written in Java, to manage each artifact's workflow and processing states in the Archive. Depending on the type, an artifact goes through different back-end processes. The workflow controls various processing paths that an artifact goes through, from the time it is uploaded into the Archive until the moment it becomes available in (or gets deleted from) the Archive. S2A core functionalities are listed in Subsection 5.2.4 (Step 4. Back-End Processing).

There are three state variables by which the status of an artifact in the Archive is defined. These variables are

(text continues on page 85)

Page 82

Figure 7.2. SHRP 2 Archive database diagram. [Diagram not reproduced; the tables, columns, and indexes legible in it are listed below.]

s2_artifact_relations
  Columns: artifact_id INT(11), relation_id INT(11)
  Indexes: PRIMARY, relations_relation_id

s2_col_types
  Columns: id BIGINT(20), idx INT(10), name VARCHAR(80), type INT(10), label VARCHAR(80), min_range VARCHAR(80), max_range VARCHAR(80)
  Indexes: (none legible)

s2_dset_1001
  Columns: _rowid__ BIGINT(20), STN_ID DOUBLE, HOUR DOUBLE, FFS DOUBLE, DOW DOUBLE, DOW_NO_HOLIDAY DOUBLE, SHRP_SECTION DOUBLE, LANE_QTY DOUBLE, DIR_TXT VARCHAR(2), DATE DATE, STA_TIME TIME, LANES_REPORTING DOUBLE, VOLUME DOUBLE, SPEED DOUBLE, VMT DOUBLE, VHT DOUBLE, SMS DOUBLE, DELAY DOUBLE, ROUTE VARCHAR(4), Mile_Post DOUBLE
  Indexes: PRIMARY, STN_ID, HOUR, FFS, DOW, DOW_NO_HOLIDAY, SHRP_SECTION, LANE_QTY, DIR_TXT, DATE, STA_TIME, LANES_REPORTING, VOLUME, SPEED, VMT, VHT, SMS, DELAY, ROUTE, Mile_Post

s2_events
  Columns: rec_id INT(10), wp_id BIGINT(20), event_id INT(10), dt TIMESTAMP, ok TINYINT(1), msg VARCHAR(80)
  Indexes: PRIMARY

wp_posts
  Columns: ID BIGINT(20), post_author BIGINT(20), post_date DATETIME, post_date_gmt DATETIME, post_content LONGTEXT, post_title TEXT, post_excerpt TEXT, post_status VARCHAR(20), comment_status VARCHAR(20), ping_status VARCHAR(20), post_password VARCHAR(20), post_name VARCHAR(200), to_ping TEXT, pinged TEXT, post_modified DATETIME, post_modified_gmt DATETIME, post_content_filtered LONGTEXT, post_parent BIGINT(20), guid VARCHAR(255), menu_order INT(11), post_type VARCHAR(20), post_mime_type VARCHAR(100), comment_count BIGINT(20), s2_wf_state INT(11), s2_proc_state INT(11), s2_proc_msg VARCHAR(80), s2_numrecs BIGINT(20), s2_numrecs_inserted BIGINT(20), s2_numrecs_rejected BIGINT(20), num_ratings INT(11), num_downloads INT(11), rating FLOAT, s2_last_mod TIMESTAMP
  Indexes: PRIMARY, post_name, type_status_date, post_parent, post_author

Relationships to wp_posts: fk_s2_events_wp_posts1, fk_s2_col_types_wp_posts1, fk_s2_artifact_relations_wp_posts1

Page 83

Table 7.3. SHRP 2 Archive Database Tables

| Table Name | Description | Created By |
| --- | --- | --- |
| wp_commentmeta | Each comment features information called the metadata, and it is stored in the wp_commentmeta table. | WordPress (a) |
| wp_comments | The comments within WordPress are stored in the wp_comments table. | WordPress |
| wp_links | The wp_links table holds information related to the links entered into the Links feature of WordPress. (This feature has been deprecated but can be reenabled with the Links Manager plugin.) | WordPress |
| wp_options | The options set under the Administration > Settings panel are stored in the wp_options table. See Option Reference for option_name and default values. | WordPress |
| wp_postmeta | Each post features information called the metadata, and it is stored in the wp_postmeta. Some plugins may add their own information to this table. | WordPress |
| wp_posts | The core of the WordPress data is the posts. They are stored in the wp_posts table. Also pages and navigation menu items are stored in this table. This table is customized for the Archive and includes information on workflow state, record inserted, number of ratings, average rating, number of downloads, and last time the artifact was modified. | WordPress (This table is customized for the Archive.) |
| wp_terms | The categories for both posts and links and the tags for posts are found within the wp_terms table. | WordPress |
| wp_term_relationships | Posts are associated with categories and tags from the wp_terms table, and this association is maintained in the wp_term_relationships table. The associations of links to their respective categories are also kept in this table. | WordPress |
| wp_term_taxonomy | This table describes the taxonomy (category, link, or tag) for the entries in the wp_terms table. | WordPress |
| wp_usermeta | Each user features information called the metadata, and it is stored in wp_usermeta. | WordPress |
| wp_users | The list of users is maintained in table wp_users. | WordPress (This table is customized for the Archive.) |
| s2_artifact_relations | The table stores the relationships among artifacts. | L13A team |
| s2_col_types | The column types of each data set are stored in this table. | L13A team |
| s2_dset_ArtifactID | Artifact_ID represents the artifact ID number of a data set (automatically generated by WordPress). This table stores the content of a data set and is used for visualizing and filtering. | L13A team |
| s2_events | This table stores the ingestion state of all the artifacts. | L13A team |

(a) For more information on WordPress database descriptions, visit http://codex.wordpress.org/Database_Description.

Table 7.4. S2_artifact_relations Table Fields

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| artifact_id | int (11) | NO | PRI | | |
| relation_id | int (11) | NO | PRI | | |

Table 7.5. S2_col_types Table Fields

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| id | bigint (20) unsigned | NO | PRI | | |
| idx | int (10) unsigned | NO | PRI | | |
| name | varchar (80) | NO | | | |
| type | int (10) unsigned | NO | | | |
| label | varchar (80) | NO | | | |
| min_range | varchar (80) | YES | | | |
| max_range | varchar (80) | YES | | | |
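To make the composite primary key in Table 7.4 concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The artifact IDs, the helper name related_artifacts, and the reading of relation_id as the ID of the related artifact are assumptions for illustration; the report does not spell out the semantics of that column.

```python
import sqlite3


def make_relations_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like s2_artifact_relations (Table 7.4)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE s2_artifact_relations (
            artifact_id INTEGER NOT NULL,
            relation_id INTEGER NOT NULL,
            PRIMARY KEY (artifact_id, relation_id)
        )
        """
    )
    return conn


def related_artifacts(conn: sqlite3.Connection, artifact_id: int) -> list:
    """Hypothetical helper: list the relation_id values stored for one artifact."""
    rows = conn.execute(
        "SELECT relation_id FROM s2_artifact_relations "
        "WHERE artifact_id = ? ORDER BY relation_id",
        (artifact_id,),
    )
    return [row[0] for row in rows]


conn = make_relations_db()
# Made-up artifact IDs, used only to exercise the table.
conn.executemany(
    "INSERT INTO s2_artifact_relations VALUES (?, ?)",
    [(1001, 1002), (1001, 1003), (1002, 1001)],
)
```

Because both columns form the primary key, the same pair cannot be stored twice, while one artifact can still be related to many others.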

Page 84

Table 7.6. S2_dset_ArtifactID Table Fields

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| _rowid__ | bigint (20) | NO | PRI | | auto_increment |
| Data set column (a) | Column type | Depends | MUL | | |

(a) This table stores data set columns. The field and type vary depending on the data set.

Table 7.7. Wp_posts Table Fields

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| ID | bigint (20) unsigned | NO | PRI | | auto_increment |
| post_author | bigint (20) unsigned | NO | MUL | 0 | |
| post_date | datetime | NO | | 0000-00-00 00:00:00 | |
| post_date_gmt | datetime | NO | | 0000-00-00 00:00:00 | |
| post_content | longtext | NO | | | |
| post_title | text | NO | | | |
| post_excerpt | text | NO | | | |
| post_status | varchar (20) | NO | | publish | |
| comment_status | varchar (20) | NO | | open | |
| ping_status | varchar (20) | NO | | open | |
| post_password | varchar (20) | NO | | | |
| post_name | varchar (200) | NO | MUL | | |
| to_ping | text | NO | | | |
| pinged | text | NO | | | |
| post_modified | datetime | NO | | 0000-00-00 00:00:00 | |
| post_modified_gmt | datetime | NO | | 0000-00-00 00:00:00 | |
| post_content_filtered | longtext | NO | | | |
| post_parent | bigint (20) unsigned | NO | MUL | 0 | |
| guid | varchar (255) | NO | | | |
| menu_order | int (11) | NO | | 0 | |
| post_type | varchar (20) | NO | MUL | post | |
| post_mime_type | varchar (100) | NO | | | |
| comment_count | bigint (20) | NO | | 0 | |
| s2_wf_state | int (11) | NO | | 0 | |
| s2_proc_state | int (11) | NO | | 0 | |
| s2_proc_msg | varchar (80) | NO | | | |
| s2_numrecs | bigint (20) unsigned | NO | | 0 | |
| s2_numrecs_inserted | bigint (20) unsigned | NO | | 0 | |
| s2_numrecs_rejected | bigint (20) unsigned | NO | | 0 | |
| num_ratings | int (11) | NO | | 0 | |
| num_downloads | int (11) | NO | | 0 | |
| s2_last_mod | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
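The conversion of a .csv data set into an s2_dset_ArtifactID table (Table 7.6) can be sketched as follows. This is not the S2A server's code, which is written in Java and not shown in the report. It is a minimal Python illustration that uses SQLite in place of MySQL, and the function names, the typing rule (DOUBLE when every value parses as a number, otherwise VARCHAR(80)), and the sample rows are all assumptions.

```python
import csv
import io
import sqlite3


def is_number(text: str) -> bool:
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def load_dataset(conn: sqlite3.Connection, artifact_id: int, csv_text: str) -> str:
    """Load CSV text into a table named s2_dset_<artifact_id>; return the table name."""
    table = f"s2_dset_{int(artifact_id)}"  # int() keeps the name safe to interpolate
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # assumes a rectangular file
    for name in header:
        if not name.isidentifier():
            raise ValueError(f"unsupported column name: {name!r}")
    # Assumed typing rule: numeric in every row -> DOUBLE, otherwise VARCHAR(80).
    numeric = [all(is_number(row[i]) for row in rows) for i in range(len(header))]
    col_defs = []
    for name, num in zip(header, numeric):
        sql_type = "DOUBLE" if num else "VARCHAR(80)"
        col_defs.append(f'"{name}" {sql_type}')
    conn.execute(
        f"CREATE TABLE {table} "
        f"(_rowid__ INTEGER PRIMARY KEY AUTOINCREMENT, {', '.join(col_defs)})"
    )
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    typed_rows = [
        [float(value) if num else value for value, num in zip(row, numeric)]
        for row in rows
    ]
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({marks})", typed_rows)
    return table


conn = sqlite3.connect(":memory:")
sample = "STN_ID,DIR_TXT,SPEED\n101,NB,55.5\n102,SB,61.0\n"  # made-up rows
table = load_dataset(conn, 1001, sample)
```

The sample reuses three column names from the s2_dset_1001 example in Figure 7.2. The generated _rowid__ key plays the role of the auto_increment primary key in Table 7.6.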

Page 85

(continued from page 81)

stored in the wp_posts table. Table 7.9 summarizes the state variables.

• s2_wf_state shows an artifact's workflow state. Figure 7.3 depicts the various workflow states.
• s2_proc_state indicates the back-end processing status of an artifact. See Subsection 5.2.4 (Step 4. Back-End Processing) for more information.
• s2_proc_msg provides processing outcomes in a message for the creator. The message is displayed on the My Artifact list located on the My Profile page.

Table 7.8. S2_events Table Fields

| Field | Type | Null | Key | Default | Extra |
| --- | --- | --- | --- | --- | --- |
| rec_id | int (10) unsigned | No | PRI | NULL | Auto_increment |
| wp_id | bigint (20) unsigned | No | | NULL | |
| event_id | int (10) unsigned | No | | 0 | |
| dt | timestamp | No | | CURRENT_TIMESTAMP | |
| ok | tinyint (1) | No | | 1 | |
| msg | varchar (80) | No | | | |

Table 7.9. State Variables

s2_wf_state: Workflow approval state

| Value | Meaning | Triggered by |
| --- | --- | --- |
| 0 | Ingest, the artifact is in the ingestion process but not yet submitted. This is the default state for a new artifact. | |
| 3 | Unprocessed | Submit button clicked in Step 4 of the ingestion process |
| 2 | Processing | Administrator reviews and approves |
| 1 | Published, available for public use | S2A server completes processing |
| -1 | Pretrash | Administrator sends artifact to Bin state |
| -3 | Trash | S2A server moves artifact from Gulag to Bin state |
| -4 | Processing error (validation, loading, or indexing) | S2A server (see s2_proc_state and s2_proc_msg for details) |

s2_proc_state: Processing state

| Value | Meaning |
| --- | --- |
| -1 | Error |
| 0 | Unprocessed (default) |
| 1 | Validating |
| 2 | Validation failed (see s2_proc_msg) |
| 3 | Loading |
| 4 | Load failed (see s2_proc_msg) |
| 5 | Indexing |
| 6 | Indexing failed (see s2_proc_msg) |
| 10 | Processing success |

s2_proc_msg: Message for users from artifact processing

• "Data set ingestion finished."
• "Could not parse XXX fields. Optimizing table."
• "Calculating column extents."
• "Internal error: unknown column type."
• "Failed to load."
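The numeric codes in Table 7.9 can be given readable names with a pair of enumerations. The sketch below is only an illustration in Python: the numeric values come from Table 7.9, but the class names, member names, and the describe_artifact helper are hypothetical and are not taken from the S2A server or the WordPress plugins.

```python
from enum import IntEnum


class WorkflowState(IntEnum):
    """s2_wf_state codes from Table 7.9 (member names are illustrative)."""
    INGEST = 0
    UNPROCESSED = 3
    PROCESSING = 2
    PUBLISHED = 1
    PRETRASH = -1
    TRASH = -3
    PROCESSING_ERROR = -4


class ProcState(IntEnum):
    """s2_proc_state codes from Table 7.9 (member names are illustrative)."""
    ERROR = -1
    UNPROCESSED = 0
    VALIDATING = 1
    VALIDATION_FAILED = 2
    LOADING = 3
    LOAD_FAILED = 4
    INDEXING = 5
    INDEXING_FAILED = 6
    SUCCESS = 10


# Processing states for which Table 7.9 points the reader to s2_proc_msg, plus Error.
FAILED_PROC_STATES = {
    ProcState.ERROR,
    ProcState.VALIDATION_FAILED,
    ProcState.LOAD_FAILED,
    ProcState.INDEXING_FAILED,
}


def describe_artifact(s2_wf_state: int, s2_proc_state: int, s2_proc_msg: str = "") -> str:
    """Hypothetical helper: turn the three wp_posts state columns into one line of text."""
    wf = WorkflowState(s2_wf_state)    # raises ValueError for an unknown code
    proc = ProcState(s2_proc_state)
    text = f"workflow={wf.name}, processing={proc.name}"
    if proc in FAILED_PROC_STATES and s2_proc_msg:
        text += f" ({s2_proc_msg})"
    return text
```

For example, describe_artifact(-4, 4, "Failed to load.") yields "workflow=PROCESSING_ERROR, processing=LOAD_FAILED (Failed to load.)", while a code that Table 7.9 does not list raises ValueError.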

Page 86

Figure 7.3. Archive processing finite state diagram. [Diagram not reproduced.]
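Since the diagram is not reproduced in this text, the only workflow information available is the "Triggered by" column of Table 7.9. The sketch below maps each trigger to the s2_wf_state value it produces. The trigger names are hypothetical labels, and the code deliberately does not check which state-to-state moves Figure 7.3 permits, because that cannot be recovered here.

```python
# Hypothetical trigger labels mapped to the s2_wf_state codes listed in Table 7.9.
TRIGGER_TO_WF_STATE = {
    "submit_clicked": 3,   # Unprocessed: Submit clicked in Step 4 of ingestion
    "admin_approved": 2,   # Processing: administrator reviews and approves
    "s2a_completed": 1,    # Published: S2A server completes processing
    "admin_trashed": -1,   # Pretrash: administrator sends artifact to Bin state
    "s2a_trashed": -3,     # Trash: S2A server moves artifact to Bin state
    "s2a_failed": -4,      # Processing error reported by the S2A server
}

DEFAULT_WF_STATE = 0  # Ingest, the default state for a new artifact


def replay(triggers):
    """Return the s2_wf_state after each trigger, starting from the default state."""
    states = [DEFAULT_WF_STATE]
    for trigger in triggers:
        if trigger not in TRIGGER_TO_WF_STATE:
            raise KeyError(f"unknown trigger: {trigger}")
        states.append(TRIGGER_TO_WF_STATE[trigger])
    return states
```

Replaying submit, approval, and successful processing gives the sequence 0, 3, 2, 1, that is, Ingest, Unprocessed, Processing, Published.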


Designing the Archive for SHRP 2 Reliability and Reliability-Related Data (2014)

Chapter: Chapter 7 - System High-Level Architecture
