
Site-Based Video System Design and Development (2012)

Chapter 6 - Site Observer Design



The video system design needs to be robust and accurate, which normally implies the need for sensor redundancy, which in this case means using multiple cameras. Accuracy is still dependent on the limitations of the individual sensors—in this case, the video camera. Although the error process is complex, depending on the detail and quality of tracked visual features and on whether there are unknown latencies in the video capture, it is instructive to estimate fundamental limits based on the simple assumption that errors arise purely from camera image pixelation. This is the ideal case, in which errors caused by lack of definition in the vehicle image, lens distortion, inaccurate calibration, blurring through motion, loss of images through occlusion, and so forth can be adequately controlled through the use of synchronized shutter operation, the image redundancy of multiple cameras, careful camera calibration and checking, and error reduction through Kalman filtering.

In Figure 6.1, we assume that the vehicle comes within distance D = 10 m of the camera and that the camera is mounted at a height of d = 3 m above the feature being tracked. The pixelation error θ is assumed to result from a field of view of 70° in the vertical sense; with an image resolution of 640 × 480, the angle subtended by one pixel is approximately 0.0025 rad. The resulting uncertainty in horizontal position is then

$$ x = \theta\, d \left( 1 + \frac{D^2}{d^2} \right), $$

which turns out to be just a little less than 10 cm. This suggests that the required tracking accuracy is feasible, but only if suitable steps are taken to control errors at each step in the data capture and analysis process. This places considerable importance on taking an essentially 3-D approach, in which feature heights are estimated as part of the estimation process.

[Figure 6.1. Basic camera geometry and resulting pixelation error.]

Farther from the intersection center, approach behavior is likely to be of some importance, although not beyond a distance of approximately 100 m. If the cameras are all based close to the intersection, the lateral and longitudinal positions of approaching vehicles are again dependent on field of view, image resolution, and baseline distance d, but in this case the baseline may be horizontal rather than vertical (longitudinal position cannot be satisfactorily resolved if the camera is directly aligned with the approach lane). A 10-m baseline, or offset from the vehicle approach path (assumed straight), then leads through a similar calculation to a longitudinal error of approximately 2.5 m. Lateral position theoretically can be resolved to within approximately 1 m, and overall we would expect that the vehicle can be located with a precision of 2–5 m at that kind of range, without requiring additional cameras or other sensors positioned remotely from the intersection.
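Both figures can be checked directly from the pixelation-error relation above. The following is a minimal sketch of that arithmetic; the function name is ours, not the report's.

```python
import math

def pixelation_error(theta, d, D):
    """Horizontal position uncertainty x = theta * d * (1 + D^2/d^2)
    for angular pixel size theta (rad), baseline d (m), and range D (m)."""
    return theta * d * (1.0 + (D / d) ** 2)

# Angular size of one pixel: 70 degree vertical field of view over 480 pixels.
theta = math.radians(70) / 480                    # ~0.0025 rad

# Near the intersection: camera 3 m above the tracked feature, vehicle at 10 m.
print(pixelation_error(theta, d=3.0, D=10.0))     # ~0.09 m, just under 10 cm

# On the approach: 10-m horizontal baseline, vehicle at 100 m.
print(pixelation_error(theta, d=10.0, D=100.0))   # ~2.6 m, close to the 2.5 m quoted
```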
Design Concept

The design concept is not strictly derived from the earlier tasks, but it is strongly motivated by the capabilities and limitations of current approaches, as well as by the most challenging system requirements, which can be loosely summarized:

• Automation: good in commercial systems but lacking in research-based systems;
• Accuracy: especially lacking in commercial systems and likely to be a continuing challenge for refinement; and
• System integration: again a weakness of all existing research-based systems.

Automation and system integration will emerge naturally within the design, so the focus here is on the accuracy problems and the shortfalls found in current system architectures.

Basic video processing methods are heavily reliant on background subtraction, which provides a deceptively elegant (to the human observer) method to derive a trajectory. Once the background is recognized, it is subtracted pixel by pixel from the immediate scene, so that only the vehicles and other transient objects remain. Following the centroid of any extracted shape, we have an approximate trajectory in the camera image, and we are surely 90% of the way to a solution—the remaining 10% cannot be so hard? (A minimal sketch of this naive pipeline appears below.)
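The sketch uses OpenCV's stock MOG2 background subtractor and blob centroids; the video file name and blob-area threshold are placeholders, not values from the report.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("approach_camera.avi")     # placeholder file name
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                # foreground = transient objects
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # suppress speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 200:              # placeholder area threshold
            continue
        m = cv2.moments(c)
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        # (cx, cy) traces an image-plane "trajectory" for the blob; as the text
        # explains, it is biased by shadows, occlusion, and viewing angle.
```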

Unfortunately, this is not so. Used by SAVME and most other commercial systems, this method supplies approximate data on transitory object locations; that is adequate to determine whether a pre-identified road section is occupied (e.g., for a virtual loop detector), but it gives tracking results that are highly sensitive to factors such as shadows, occlusions, and the effects of viewing angle (Figure 6.2). Here C1 has a high-level viewpoint, whereas C2 has a much lower viewing angle. The camera shadows shown are indicative of the parts of the visual scene that may be confounded with the actual object when viewed from that particular angle. The high viewing angle makes the image insensitive to the vertical dimension of the object (the vehicle to be tracked), and a reasonably accurate plan view of the object is captured. On the other hand, C2 produces a stretched image of the solid object; although this is obvious to the human observer, the background subtraction algorithm is not so smart, and the stretched silhouette cannot be used directly to infer the 2-D (plan) view of the object from the perspective projection observed.

[Figure 6.2. Effect of camera viewing (altitude) angle on boundary estimation.]

As described earlier, SAVME countered this problem by using very high camera mounting towers, whereas NGSIM also had the luxury of high camera positions and made improvements by using 3-D rigid-body vehicle shapes to replicate the effect of the vertical dimension. Even the IVSS study used high camera positions, although with problems of camera motion in windy conditions. In general, high camera locations are not available, and non-ideal camera angles are forced by site access. In addition, irregular or unpredictable vehicle shapes, rapid changes in perspective, the effects of glare, and so forth make solid models unreliable estimators.

Multicamera Feature Tracking

Based on the above, a basic architecture for the system is presented. Enhancements are to be made by combining feature tracking and grouping with background subtraction, and by doing this simultaneously from multiple cameras with overlapping fields of view. This is expected to provide greater precision in motion estimation, although the same 2-D projection problem limits the accuracy of absolute position estimation.

Another significant problem, one that is never easily solved, is the need to cluster and separate features, with each cluster uniquely associated with a rigid vehicle object in the world. The perspective of a single camera again makes this harder, and although any image-based tracker is going to face this challenge at some point, it is expected that multiple views can help reduce the problem.
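The report's matching and grouping algorithms are not specified here, and, as noted later in this chapter, the same physical feature is not generally seen by two cameras. Still, where a cross-camera match does exist, large-baseline triangulation illustrates the kind of 3-D information the multicamera synthesis can exploit. The projection matrices and feature coordinates below are invented purely for illustration.

```python
import numpy as np
import cv2

# Hypothetical 3x4 projection matrices for two shutter-synchronized cameras
# (in normalized image coordinates); real values come from site calibration.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                   # reference camera
P2 = np.hstack([np.eye(3), np.array([[-10.0], [0.0], [0.0]])])  # 10-m baseline

# One matched feature, observed at the same shutter instant in both images.
u1 = np.array([[0.12], [0.03]], dtype=np.float64)
u2 = np.array([[0.02], [0.03]], dtype=np.float64)

# Linear triangulation returns homogeneous 4x1 world coordinates.
X = cv2.triangulatePoints(P1, P2, u1, u2)
X = (X[:3] / X[3]).ravel()
print(X)   # ~[12, 3, 100]: the feature is located in 3-D, height included
```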

It is worth comparing this with the camera-domain image-processing techniques typical of most commercial and research lab systems, whose architecture is illustrated in Figure 6.3. When information derived from video processing is incomplete or indeterminate, iteration is needed to improve and correct estimates, as represented by the closed loop. This may require manual user intervention in the estimation, and a video archive is also needed to support this iterative loop. To avoid losing image quality, high volumes of video data must be recorded and retained for analysis.

[Figure 6.3. Camera-domain object tracking.]

In the multicamera design concept, the known properties and existing algorithms of 2-D camera-domain video image processing are retained (Figure 6.4). The uncertainty in grouping and segmentation is removed from the image domain, so that once processed for features, the video stream is discarded. As much basic information as necessary is extracted from each individual camera video stream, and additional compressed video is recorded to enable operator review. 3-D bounding boxes will be derived from the features and subsequently projected over the compressed video stream for visual confirmation.

[Figure 6.4. Site Observer concept.]

The advantages of this general approach are summarized as follows:

• Only existing video image processing techniques are used.
• No uncompressed video archive is needed.
• Automated processing is built into the architecture.
• System development comprises three parallel strands of activity (see below).
• The parallel architecture is inherently expandable. There is no computational constraint on using large numbers of cameras for large and complex intersections or even networks of intersections.
• Large-baseline stereographic information is directly incorporated, increasing the potential accuracy of the measurement system many times compared with that of simple single-camera systems.

System Design

The basic system comprises multiple cameras (four are shown: C1, C2, C3, C4) that are simultaneously triggered via a real-time data acquisition system residing on a host computer, which houses storage for both a database and compressed video. The compressed video is not part of the automatic system, being reserved for development and review purposes. In Figure 6.5, we mention MPEG-4 compression, but any form of compression, including subsampling of image frames or pixels, as well as JPEG compression of individual frames, could be used.

Preprocessing of the video image streams uses proven algorithms to extract features from the video input. The synchronous shutter trigger is significant: it ensures that features are extracted from different cameras at the same instant, allowing feature registration at a later stage. The absolute time reference from GPS allows multiple systems of this type to be integrated later, as feature databases are uploaded to a remote server. At the intersection, each camera station has a local networked computer to perform feature extraction and image compression. The results are sent via network to a host, which is also responsible for basic control functions, such as starting and stopping the local data collection and uploading the feature sets.

A large-scale system might comprise several CDAPS subsystems (Figure 6.6). In any case, the expectation is that a separate server, based at a research lab or transportation facility, will host the data, and it is there that final postprocessing into trajectories and other analysis will take place. Ideally, high-speed Ethernet is available for communication with the data center, but if not, data upload can be performed periodically in manual fashion; in any case, without the need to capture raw video, data volumes should not be onerous.
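The report does not specify a feature record format. Purely as a hypothetical illustration of how a GPS-timestamped, registration-ready record could fit within the roughly 50 bytes per feature assumed in the storage estimate below, consider:

```python
import struct

# Hypothetical packed feature record (not from the report). A GPS-synchronized
# timestamp and a camera ID are what make cross-camera registration of
# simultaneously captured features possible at the postprocessing stage.
FEATURE_RECORD = struct.Struct(
    "<d"   # GPS timestamp, seconds (8 bytes)
    "B"    # camera ID (1 byte)
    "2f"   # image position u, v in pixels (8 bytes)
    "2f"   # image velocity du, dv in pixels/s (8 bytes)
    "6f"   # short local appearance descriptor (24 bytes)
)
print(FEATURE_RECORD.size)   # 49 bytes, within the ~50-byte budget
```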
Assuming a rather generous 50 bytes per feature, 50 features per frame, and a frame rate of 20 Hz, a system with four cameras will produce a data store that grows at less than 1 GB per hour. This is not insignificant, but it is orders of magnitude less than the corresponding figure of 88 GB per hour for raw video images (assuming 1-byte-per-pixel monochrome images and a modest 640 × 480 camera resolution). The feature storage also does not greatly expand with camera resolution or with the use of color, and at times of sparse traffic the storage growth drops to zero. Thus, even without a dedicated network connection, the system can be made to run for many weeks without the need to offload data, and the advantages are very clear.
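A back-of-envelope check of the two storage figures, using the report's own assumptions:

```python
# Feature data: 50 bytes/feature x 50 features/frame x 20 frames/s x 4 cameras.
feature_rate = 50 * 50 * 20 * 4                # 200,000 bytes per second
print(feature_rate * 3600 / 1e9)               # ~0.72 GB per hour

# Raw video: 640 x 480 monochrome at 1 byte/pixel, 20 Hz, 4 cameras.
video_rate = 640 * 480 * 1 * 20 * 4            # ~24.6 million bytes per second
print(video_rate * 3600 / 1e9)                 # ~88.5 GB per hour
```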

The design concept has other potential advantages, not least that it builds on previous work and expertise. Challenges do exist in the design concept, but these are almost all concentrated at the level of algorithm performance at the integration stage: how best to carry out feature segmentation and grouping using stored features, and how to ensure that vehicle bodies are correctly located, with features merged or separated appropriately. These are not trivial problems, but they are familiar ones; they depend on geometry much more than on low-level image processing, and the multicamera synthesis approach provides redundant information, not available to a single-camera system, with which to resolve uncertainties. Note, however, that to ensure adequate coverage, the expectation is not to use stereo imaging per se; even when the same vehicle is seen in two cameras, the features recorded are normally expected to be from different parts of the vehicle. Expanding the concept to stereo vision may be possible in the future, but in that case it would be necessary to double the number of cameras.

[Figure 6.5. Camera, data acquisition and preprocessing system (CDAPS).]

[Figure 6.6. Integrated vehicle tracking system—Site Observer.]

One further point about the architecture is its suitability for migration to future hardware. New video products are becoming available all the time, and in the future it may be preferred to integrate the video preprocessing function within the camera body itself. Cameras are already available with powerful digital signal processors built in, and in many commercial vehicle-sensing systems there is a clear trend toward locating the heavy computation within the camera. In this case, feature information may be broadcast over wireless Ethernet, greatly increasing installation flexibility. Although this is not the best choice for the development of the current robust prototype system, the option for future development in this direction is a major strength of the design concept.

TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-S09-RW-1: Site-Based Video System Design and Development documents the development of a Site Observer, a prototype system capable of capturing vehicle movements through intersections by using a site-based video imaging system.

The Site Observer system provides a means of viewing crashes and near crashes, as well as a basis for developing objective measures of intersection conflicts. In addition, the system can be used to collect before-and-after data when design or operational changes are made at intersections. Furthermore, it yields detailed and searchable data that can help determine exposure measures.

This report is available in electronic format only.
