Digitizing the Shape and Appearance of Three-Dimensional Objects
Computer Science Department
There is a growing interest within the design, manufacturing, and graphics communities in building a device capable of digitizing the shape and appearance of physical objects. Depending on the type of object digitized, the resulting computer model may have many uses. For small objects, applications may include product design; reverse engineering; museum archiving; and creation of models for visual simulation, movie making, videogames, and home shopping. For large objects, applications may include architectural preservation, engineering retrofits, virtual reality flythroughs, and recording of such cultural artifacts as sculptures, historic buildings, and archeological sites. If the object is small enough and the computer model is of high enough quality, a physical replica of the object can be produced using a rapid prototyping technology such as stereolithography. This type of system, if inexpensive enough, rightfully could be called a "3D fax machine."
Together with my students in the Stanford Computer Graphics Laboratory, I have been building a prototype for such a machine. In this paper I describe how we obtain range images of small objects using a laser-stripe triangulation scanner, how we combine these range images to produce a watertight computer model, and how we fabricate a physical replica from the model. I also summarize some preliminary results on digitizing color. Finally, I outline the future of this project and comment briefly on some possible economic implications.
Passive Versus Active Sensing
I restrict the discussion here to optical sensing technologies—that is, those technologies that employ visible light. Hence, we are concerned only with the
external surfaces of objects. In this domain, acquiring the shape and appearance of an object requires solving an inverse rendering problem: given a set of images, one must solve for scene illumination, sensor geometry, object geometry, and object reflectance. This is a central problem in the computer vision field, and if the images are acquired using passive sensing, such as a videocamera, it is a hard problem. The difficulty arises in large part from the necessity of finding corresponding features in multiple images, each of which may in itself be very complex.
However, if the images are acquired using active sensing—for instance, with a light-stripe scanner—the problem is greatly simplified. In particular, by limiting the problem domain to a stationary scanner for small objects, we can control sensor geometry and scene illumination, thereby eliminating several variables from the problem. By employing active sensing using structured light, we can measure geometry and reflectance independently, eliminating still more variables. Finally, by providing computer control over the operation of the scanner, we can acquire redundant data, improving the robustness (i.e., error tolerance) of the system. To bypass the many difficulties associated with passive sensing, we employ active sensing in our work.
Digitizing Shape

The goal of shape digitization is to produce a seamless, occlusion-free, geometric representation of the externally visible surfaces of an object. Restricting ourselves now to active optical sensing technologies, we are aware of many devices on the market that can digitize the shape of one side of an object. These devices are generically called range finders, and their output is called a range image—a rectangular lattice of pixels, each of which contains a distance from the sensor to the object (Besl, 1989). The system we are building in the Stanford Computer Graphics Laboratory employs a modified Cyberware laser-stripe triangulation scanner as shown in Figure 1.
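The principle behind laser-stripe triangulation can be sketched in a few lines of Python. The angle-angle formulation, function name, and example values below are illustrative assumptions; the Cyberware scanner's actual calibration model is more involved.

```python
import math

def triangulate_depth(baseline_m, laser_angle_rad, camera_angle_rad):
    """Range to a laser-stripe sample by simple angle-angle triangulation.

    The laser projector and camera are separated by a known baseline;
    each reports the angle at which it sees the illuminated point.
    (Illustrative geometry only, not the scanner's real calibration.)
    """
    # The three angles of the triangle sum to pi, giving the
    # interior angle at the object point:
    gamma = math.pi - laser_angle_rad - camera_angle_rad
    # The law of sines then yields the camera-to-point range.
    return baseline_m * math.sin(laser_angle_rad) / math.sin(gamma)
```

For a 1 m baseline with both angles at 45 degrees, the interior angle is 90 degrees and the range is sin(45°)/sin(90°) ≈ 0.707 m. Repeating this for every pixel along the stripe, as the stripe sweeps over the object, produces the range image.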
A harder task is to digitize objects having multiple sides and self-occlusions (i.e., parts that obscure other parts). Several methods have been proposed to solve this problem, involving scanning the object from several directions and then combining data from the individual scans. In our laboratory we have investigated methods based on fine-grained polygon meshes (Turk and Levoy, 1994) and fine-grained voxel arrays (Curless and Levoy, 1996). The second method has given us our best results, so I describe it here briefly.
Working with one range image at a time, we first convert the image to a signed distance function defined over 3D-space. In other words, for each point in space, this function gives an estimate of the distance forward or backward from that point to the surface of the object. In places where the laser did not see the object because of occlusions, this function will have gaps in it. We then sample this function on a lattice, producing an array of voxels
(literally, volume elements), and combine it using a simple additive scheme with a voxel array representing everything we have seen so far. Digitizing a complicated object may require 50 or more such scans. This takes several hours, but since our motion platform is partially automated, only a few minutes of human interaction is necessary. The result is a voxel array densely populated with signed distances. Finally, we extract a contour surface (also called a level set) at the zero-distance level. We typically represent this contour surface as a mesh of tiny polygons, frequently numbering in the millions. This mesh is our best estimate of the object's surface.
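The combining step above can be sketched as a running weighted average per voxel. This is a simplified reading of Curless and Levoy (1996); the array layout, the weight handling, and the function name are assumptions for illustration.

```python
import numpy as np

def merge_scan(cum_dist, cum_weight, scan_dist, scan_weight):
    """Fold one range scan's signed-distance samples into the running
    voxel grid (simplified, in the spirit of Curless and Levoy, 1996).

    cum_dist, cum_weight: running per-voxel weighted-average distance and weight.
    scan_dist, scan_weight: this scan's signed distances and confidences;
    scan_weight is 0 where the laser saw nothing (occlusion gaps).
    """
    new_weight = cum_weight + scan_weight
    num = cum_weight * cum_dist + scan_weight * scan_dist
    # Weighted average; where no scan has contributed yet, leave the
    # distance unchanged (the epsilon guards against division by zero).
    cum_dist = np.where(new_weight > 0,
                        num / np.maximum(new_weight, 1e-12),
                        cum_dist)
    return cum_dist, new_weight
```

Because each scan only adds to the running sums, the 50 or more scans of a complicated object can be folded in one at a time, in any order, without storing them all in memory.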
The meshes created using this algorithm have two desirable properties. First, because they arise from the addition of many overlapping scans, they are relatively free of sensor noise. Second, because they are contour surfaces, they are watertight, having no holes or self-intersections. Mathematically, we say that they are manifolds. This property allows us to fabricate a physical replica of the model, as shown in Figure 2.
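Extracting the contour surface comes down to locating where the signed distance crosses zero between neighboring voxels. A minimal sketch of that interpolation, the same rule that marching-cubes-style algorithms apply along each cube edge, might look like this (the function name is illustrative):

```python
def zero_crossing(x0, x1, d0, d1):
    """Position of the surface between two voxel centers at x0 and x1
    whose signed distances d0 and d1 straddle zero.

    Linear interpolation along the edge; a full contour extractor
    applies this on every sign-changing edge of every voxel cube and
    stitches the resulting vertices into polygons.
    """
    assert d0 * d1 < 0, "distances must straddle the zero level"
    t = d0 / (d0 - d1)          # fraction of the way from x0 to x1
    return x0 + t * (x1 - x0)
```

For example, distances of -1 and +1 at voxel centers 0 and 1 place the surface vertex at 0.5, midway along the edge.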
Digitizing Appearance

The amount of light reflected from a surface depends on both the direction of illumination and the direction of reflection. Each of these two directions is specified by two angles. The resulting four-dimensional function is called the bidirectional reflectance distribution function (BRDF; Nicodemus et al., 1977). The BRDF typically varies with the wavelength of the illumination, and, for textured objects, it also varies from point to point on the object's surface.
Construction of the BRDF for a given material can be done analytically—by mathematically modeling the underlying physics—or empirically—by measuring the light reflected from samples of the material. Where analytical models are available, the analytical approach is preferred. One example is the Cook-Torrance-Sparrow reflection model for metals and simple plastics. Some of the most realistic computer graphics pictures to date have been produced using this model (Cook and Torrance, 1981). But for compound materials, materials subjected to extensive surface finishing, or materials of unknown composition, analytical methods may be infeasible. Unfortunately, this is true for many materials used in design and manufacturing applications, and in these situations empirical methods are necessary.
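As a concrete illustration of an analytical model, the specular term of the Cook-Torrance model (Cook and Torrance, 1981) can be sketched as follows. This version substitutes Schlick's approximation for the full Fresnel equations, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def cook_torrance_specular(n, l, v, m=0.3, f0=0.9):
    """Specular term of the Cook-Torrance model (sketch).

    n, l, v: unit normal, light, and view vectors (numpy arrays);
    m: RMS microfacet slope (roughness); f0: reflectance at normal
    incidence, fed into Schlick's approximation of the Fresnel term.
    """
    h = (l + v) / np.linalg.norm(l + v)          # half-angle vector
    nl, nv, nh, vh = n @ l, n @ v, n @ h, v @ h
    # Beckmann microfacet slope distribution
    tan2 = (1.0 - nh**2) / nh**2
    d = np.exp(-tan2 / m**2) / (m**2 * nh**4)
    # Geometric attenuation (masking/shadowing of microfacets)
    g = min(1.0, 2 * nh * nv / vh, 2 * nh * nl / vh)
    # Schlick approximation to Fresnel reflectance
    f = f0 + (1.0 - f0) * (1.0 - vh)**5
    return (f / np.pi) * d * g / (nl * nv)
```

Even this simple sketch shows why empirical measurement is attractive for difficult materials: the model stands or falls on parameters such as m and f0, which have no obvious values for a compound or heavily finished surface.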
A device capable of measuring the light leaving an object as a function of the angles of illumination and reflection is called a scatterometer or gonioreflectometer (Hunter and Harold, 1987). Such a device typically is accurate but expensive and slow. In keeping with our goal of building inexpensive devices, we are designing a new generation of handheld gonioreflectometers that employ wide-angle optics, high-resolution sensor arrays, and commodity multimedia chips to compress the digitized reflections. Our goal is to build a device that is capable of quickly characterizing the reflectance of an unknown material to an accuracy sufficient to generate computer animations of objects covered with that material. Of course, we would love to build a color 3D fax machine, but no such technology as yet exists. A preliminary result of our work in digitizing object appearance is shown in Figure 3.
Future Directions

There are several directions this project might take in the future. First, our present scanning methods are time consuming and not fully automated. We need algorithms that, given a partial computer model, can determine the "next best view" to acquire unseen portions of the object. Such an algorithm would drive a robot arm that holds either the object or the camera, making scanning completely automatic.
Second, although colored dense polygon meshes suffice for some of our applications, other applications require a higher-level representation. For applications in computer animation and computer-aided design (CAD), we have
developed a representation that combines both B-spline surface patches to capture overall shape and displacement maps (sometimes called offset functions) to capture fine surface detail (Krishnamurthy and Levoy, 1996). This hybrid representation yields a coarse but efficient model suitable for animation and a fine but more expensive model suitable for rendering. An example is shown in Figure 4.
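A minimal sketch of the two ingredients of this hybrid representation, evaluating a point on a B-spline and offsetting it along the surface normal by a displacement-map value, might look like this. The paper's surfaces are tensor-product patches over (u, v); the one-dimensional curve segment and scalar displacement below are simplified stand-ins.

```python
import numpy as np

def cubic_bspline_point(ctrl, t):
    """Point on a uniform cubic B-spline segment (4 control points,
    0 <= t <= 1), via the standard basis matrix. A surface patch
    applies this in both u and v."""
    basis = np.array([[-1.0,  3.0, -3.0, 1.0],
                      [ 3.0, -6.0,  3.0, 0.0],
                      [-3.0,  0.0,  3.0, 0.0],
                      [ 1.0,  4.0,  1.0, 0.0]]) / 6.0
    tv = np.array([t**3, t**2, t, 1.0])
    return tv @ basis @ np.asarray(ctrl, dtype=float)

def displaced_point(base_point, unit_normal, displacement):
    """Offset the coarse spline-surface point along its normal by the
    scalar displacement-map value (the 'offset function' of the text)."""
    return np.asarray(base_point) + displacement * np.asarray(unit_normal)
```

The division of labor is the point: animation and editing manipulate only the sparse control points, while rendering adds the dense displacements back in.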
Third, recent advances in time-of-flight laser range-finding technology permit us to scan accurately such large objects as cars, buildings, engineering sites, movie sets, and so on. Combined with our present system, this technology allows us to build virtual environments of unprecedented complexity. Of course, increases in the complexity of computer models must be accompanied by improvements in the software needed to manipulate these models as well as improvements in the rendering engines needed to display them.
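Time-of-flight range finding rests on a simple relation: a pulse travels to the surface and back, so the range is half the round-trip time multiplied by the speed of light. The sketch below shows only this basic principle (real scanners also calibrate for timing and environmental effects; the function name is illustrative).

```python
def tof_range_m(round_trip_s, c=299_792_458.0):
    """Range in meters from a time-of-flight measurement.

    round_trip_s: measured out-and-back travel time of the pulse, in
    seconds; c: speed of light in m/s. The factor of one half accounts
    for the pulse covering the distance twice.
    """
    return 0.5 * c * round_trip_s
```

The demanding part in practice is the timing: resolving range to a millimeter requires resolving the round trip to a few picoseconds.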
Economic Implications

As a technology becomes less expensive, it can be applied in an increasing number of areas. We have already seen commercial applications of 3D digitization emerging at the professional and retail levels. For example, shoe and clothing manufacturers, with an eye on the custom-fit market, have begun installing range-image scanners in their larger stores. Some of these scanners are capable of digitizing an entire body at once, and many of them have our software embedded in them.
Another example can be found in several museums, which are evaluating this technology for scanning their most precious sculptures. If someone should ever again attack Michelangelo's Pietà with a hammer and chisel (and one hopes no one will), there might be a 3D backup from which to restore it. On a more positive note, museums could use these 3D archives in conjunction with emerging rapid-casting technologies to make perfect replicas to sell to the public. These copies, authorized by the museum, would revolutionize the art reproduction industry, which currently is pervaded by poor-quality replicas often made from photographs of the original.
It is doubtful that this technology will be cheap enough—or fast enough—to appear in homes anytime soon, much as my children would love to own a Lego® duplicator. But even if consumer applications are unlikely, we at Stanford have been driven in this project by the image of a popular consumer device: the microwave oven. In our dreams we see a fax machine about that size; we put the object in, close the door, press a button, and a few minutes later we have a computer model. We then punch in a telephone number, and a few minutes later our collaborator across the country opens the door of his box and retrieves a physical replica of the object.
References

Besl, P. J. 1989. Active optical range imaging sensors. In Advances in Machine Vision, J. L. C. Sanz, ed. New York: Springer-Verlag.
Cook, R. L., and K. E. Torrance. 1981. A reflectance model for computer graphics. Pp. 307-316 in SIGGRAPH '81 proceedings, published as Computer Graphics 15(3).
Curless, B., and M. Levoy. 1996. A volumetric method for building complex models from range images. Pp. 303-312 in SIGGRAPH '96. Computer graphics proceedings, annual conference series. New York: Association for Computing Machinery.
Hunter, R. S., and R. W. Harold. 1987. The Measurement of Appearance. New York: John Wiley & Sons.
Krishnamurthy, V., and M. Levoy. 1996. Fitting smooth surfaces to dense polygon meshes. Pp. 313-324 in SIGGRAPH '96. Computer graphics proceedings, annual conference series. New York: Association for Computing Machinery.
Nicodemus, F. E., J. C. Richmond, J. J. Hsia, I. W. Ginsberg, and T. Limperis. 1977. Geometric Considerations and Nomenclature for Reflectance. NBS Monograph 160. Washington, D.C.: National Bureau of Standards.
Turk, G., and M. Levoy. 1994. Zippered polygon meshes from range images. Pp. 311-318 in SIGGRAPH '94. Computer graphics proceedings, annual conference series. New York: Association for Computing Machinery.