Edmund L. Russell
Advanced Micro Devices
Massive Data Sets in Semiconductor Manufacturing
1 Introduction
Increasingly in industry, and recently in semiconductor manufacturing, the automation of data collection is becoming so common that one of the questions statisticians are most frequently asked is: "How do we turn all this data into information?" This is partly due to the introduction of sensor-based data collection, network connectivity, and the availability of cheap data storage devices. The consulting statistician in this industry is beginning to be faced with massive data sets.
2 Background
The semiconductor manufacturing environment is a high-volume manufacturing environment. In a typical processing sequence, over 100 process operations may be performed before the raw material, crystalline silicon wafers up to 8 inches in diameter, is converted into wafers carrying up to several thousand unpackaged electronic circuits, called die. Perhaps a few dozen additional manufacturing operations related to packaging, in which the die are encapsulated in plastic or ceramic, occur before the product is ready for sale.
In the semiconductor manufacturing industry, competition is generally very aggressive. Prices generally fall throughout any given product's life-cycle, and that life-cycle can be very short; it is often measured in months. Even in this environment of short life-cycles and falling prices, the demands for product quality and reliability are extreme. Customers are beginning to demand that product be delivered with fewer than 10 parts per million defective and with reliability such that fewer than 30 failures are expected per billion device hours of operation.
These characteristics of falling prices, increasing competition, complex state-of-the-art processes, and short life-cycles, combined with the high capital cost of bringing a new manufacturing facility on-line, are creating a great need to collect and analyze ever greater amounts of data.
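The quality and reliability targets above translate directly into expected counts. The sketch below is plain arithmetic, assuming a hypothetical shipment of one million devices, continuous operation for one year, and a constant failure rate. The unit "failures per billion device hours" is what the industry calls a FIT (failures in time); the function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def expected_defective(parts_shipped: int, ppm: float) -> float:
    """Expected number of defective parts at a given parts-per-million level."""
    return parts_shipped * ppm / 1e6


def expected_failures(devices: int, hours: float, fit: float) -> float:
    """Expected failures for a constant failure rate given in FIT,
    i.e. failures per billion (1e9) device hours."""
    return devices * hours * fit / 1e9


if __name__ == "__main__":
    shipped = 1_000_000  # hypothetical shipment size
    print(expected_defective(shipped, 10))                           # 10.0
    print(round(expected_failures(shipped, HOURS_PER_YEAR, 30), 1))  # 262.8
```

One million devices running around the clock for a year accumulate about 8.76 billion device hours, so a 30 FIT target still implies roughly 263 field failures, on top of about 10 defective parts at shipment at the 10 ppm level.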
3 Data Overview
Data taken during manufacturing are available from a variety of sources. Some of the data are collected as a normal part of the statistical process control (SPC) effort. Some are collected as a normal part of the electrical screening tests that ensure product quality. At AMD we currently collect and summarize approximately 2 gigabytes of such data per day.
In order to discover better ways of controlling key process steps, a number of companies are now automatically collecting some data using process-state sensors. Even in the development stage, the data volume from these sensors is huge. It is now possible to collect over 1 megabyte of sensor data per wafer in a plasma etch step alone. Given that there are typically 10 or more such steps in a manufacturing process, and that an average wafer fabrication site produces several thousand wafers per week, the potential data volume for analysis is huge.
Some of the reasons we wish to collect manufacturing data and perform the analyses include:
- process and product characterization
- process optimization
- yield optimization
- process control
- design for manufacturing
How do these needs differ from the same needs in a more typical manufacturing environment? The first and foremost difference is that data are available from a large number of process operations, and much of that data can be collected automatically. The second is that the manufacturing process involves a large number of steps, some of which are essentially single-wafer steps and others of which are batch processing steps of various batch sizes. In addition, much of the summary data collected at this time are highly correlated due to the nature of the underlying physics and chemistry of the processing operations. There is also an established practice of taking multiple measures of the same electrical characteristics using test cells of varying sizes and properties.
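The sensor data-volume argument in this section is a product of three factors. As a sketch, assuming exactly 1 MB per wafer per sensored step, 10 such steps, and 3,500 wafers per week (a hypothetical value standing in for "several thousand"), the weekly volume works out as follows:

```python
MB_PER_WAFER_PER_STEP = 1.0  # "over 1 megabyte of sensor data per wafer" per etch step
SENSORED_STEPS = 10          # "10 or more such steps"
WAFERS_PER_WEEK = 3_500      # hypothetical value for "several thousand wafers per week"


def weekly_volume_gb(mb_per_wafer_step: float, steps: int, wafers_per_week: int) -> float:
    """Weekly sensor data volume in gigabytes, using 1 GB = 1000 MB."""
    return mb_per_wafer_step * steps * wafers_per_week / 1000.0


if __name__ == "__main__":
    gb = weekly_volume_gb(MB_PER_WAFER_PER_STEP, SENSORED_STEPS, WAFERS_PER_WEEK)
    # prints: about 35 GB per week, 1.8 TB per year
    print(f"about {gb:.0f} GB per week, {gb * 52 / 1000:.1f} TB per year")
```

Each factor enters linearly, so doubling any one of the three inputs doubles the weekly volume.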
So many of the apparently "independent" observations are not actually independent.
There are other sources of data, less directly related to manufacturing, that may be used with the manufacturing data. These include the output of process simulators and die design simulators. It is becoming more standard throughout the semiconductor industry to link these simulators together in chains to get a better picture of the expected performance characteristics of processes and semiconductor devices. These expectations may then be compared to actual manufacturing experience.
Manufacturing process data are typically collected in four different stages, each of which provides a characteristic type of data for analysis. These data types are:
- die fabrication data
- wafer electrical data
- sort electrical data
- final test data
4 Die Fabrication Data
Die fabrication data are, at this time, typically in-process SPC data. Although SPC data and their uses in the manufacturing environment are fairly well understood, there has been some interest, both within AMD and in other companies, in further leveraging the SPC data. Given that a typical manufacturing process often has about 100 process operations providing SPC data, there would appear to be a good opportunity for mining the SPC data for process characterization and optimization. However, even when several lots of wafers have experienced some similar problem earlier in the process, it can be quite difficult to determine whether all of those lots ran through the same piece of equipment within a given span of time. This is because wafer lots are generally not processed serially, owing to the mix of products in a manufacturing facility and the differences in the process flows among the products.
We are also beginning to get process-state data on individual process operations, such as plasma etch, from process-state sensors that are being added to the equipment. These sensors can provide over 10,000 measurements per wafer. A number of companies are examining this type of data for its potential to provide model-based process control for run-to-run, or wafer-to-wafer, modification of the processing parameters. Because many, if not most, of the process steps in semiconductor manufacture are under the control of process controllers and can also be fitted with sensors, the future data potential is enormous. For instance, for the etch step in a typical wafer fabrication site, it would not be unreasonable to expect approximately 35 GB of data per week to be collected at some point in the future.
This process-state sensor data is typically autocorrelated within the process step itself, and there is frequently some correlation from wafer to wafer, in both single-wafer and batch-process steps. It is evident that the observed autocorrelation patterns in the data change over time within the processing of a single wafer, and the autocorrelation patterns may even change in response to changes in the processing environment.
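To illustrate why a single autocorrelation estimate per process step can mislead when the structure changes within the processing of one wafer, the sketch below simulates a hypothetical sensor trace as an AR(1) series whose coefficient switches from 0.9 to -0.5 halfway through the step, then estimates the lag-1 autocorrelation in non-overlapping windows. The AR(1) model, the switch point, and the 500-sample window are assumptions chosen for illustration, not properties of any real etch sensor.

```python
import random


def autocorr(x: list[float], lag: int) -> float:
    """Sample autocorrelation of x at `lag` (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def simulated_trace(n: int = 2000, seed: int = 0) -> list[float]:
    """Hypothetical sensor trace: AR(1) with coefficient 0.9 for the first
    half of the step and -0.5 for the second half."""
    rng = random.Random(seed)
    x, trace = 0.0, []
    for t in range(n):
        phi = 0.9 if t < n // 2 else -0.5
        x = phi * x + rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


def windowed_lag1(trace: list[float], window: int) -> list[float]:
    """Lag-1 autocorrelation in consecutive, non-overlapping windows."""
    return [autocorr(trace[i:i + window], 1)
            for i in range(0, len(trace) - window + 1, window)]


if __name__ == "__main__":
    trace = simulated_trace()
    print("whole trace:", round(autocorr(trace, 1), 2))
    print("by window:  ", [round(r, 2) for r in windowed_lag1(trace, 500)])
```

On this simulated trace the first two windows show strong positive lag-1 autocorrelation and the last two show negative autocorrelation; the whole-trace estimate blends the two regimes into one number that matches neither.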
Both SPC and sensor-based die fabrication data are now being explored in a variety of ways to assist with optimizing individual process steps with respect to product quality, reliability, yield, and throughput. A complication is that it is not known at this time what a "signal" is for a good process, or even whether the present process-state sensors are the correct ones to use.
For instance, one of the hidden complications of sensor-based data is that the sensors have response functions. That is, the observed signal can be distorted or even completely disguised by the response of the sensor itself and by how the sensor interacts with its measurement environment. There are few statistical tools available today, and precious little training for statisticians, on identifying and dealing with sensor response functions.
Another hidden complication is that most of these process operations are under the active control of a process controller, often a PID controller. So we must also deal with the apparent signal induced in the data by the process controller itself. Here again, statisticians are often not very conversant with process controllers and may not even be aware that at times the "signal" from a process may be mostly due to the controller trying to "settle."
5 Wafer Electrical Data
This type of data is typically taken towards the end of the manufacturing of the semiconductor circuits, but before the die on the wafer are individually separated and packaged. The data represent parametric measurements taken from special test structures located near the individual die on the wafers. Generally these test structures are neither a test die nor a part of the circuit itself. They are intended to provide information to the manufacturing site's engineers about the basic health of the manufacturing process.
The typical set of wafer electrical tests comprises about 100 to 200 or more electrical tests on fundamental components or building blocks of the electronic circuits. Most of the values retained in the databases are processed data, not the raw data taken directly from the test structures. The reported values are typically chosen to be most informative about particular fabrication process operations. This leads to many of the test values being highly correlated with each other. In addition, for any given test, consistent patterns across the individual silicon wafers, or across wafers within the same wafer lot, may be evident to the analyst. These patterns are often due to the underlying physics or chemistry of the process steps and so are expected to some extent. So with wafer electrical test data, we have data that can be both spatially and temporally correlated by test site as well as highly autocorrelated within test site.
This type of data represents some of the potentially most useful data gathered during the manufacturing of the wafer. Engineers and managers generally believe that there are strong relationships between the wafer electrical tests and the individual processing operations. Indeed, there have been numerous occasions in the industry where misprocessed product was identifiable at wafer electrical test. If this holds in the more general setting, then it would be highly desirable to produce models relating in-process data to wafer electrical data for "normal" product, so that corrections can be made to the process to achieve optimum performance.
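A quick way to see how much of a wafer electrical test set is redundant is to screen every pair of tests for high correlation. The sketch below does this in plain Python; the four test names, the shared "process driver" that generates three of them, and the 0.9 threshold are hypothetical choices for illustration. A real analysis would use a numerical library and the site's actual test list.

```python
import math
import random
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(tests: dict[str, list[float]], threshold: float = 0.9):
    """Return (test_a, test_b, r) for every pair of tests with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(tests), 2):
        r = pearson(tests[a], tests[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


def simulate_sites(n_sites: int = 200, seed: int = 1) -> dict[str, list[float]]:
    """Hypothetical wafer electrical test results at n_sites test sites."""
    rng = random.Random(seed)
    tests = {"vt_long": [], "vt_short": [], "idsat": [], "rs_poly": []}
    for _ in range(n_sites):
        driver = rng.gauss(0.0, 1.0)  # hypothetical shared process variable
        tests["vt_long"].append(0.50 + 0.05 * driver + rng.gauss(0.0, 0.005))
        tests["vt_short"].append(0.45 + 0.05 * driver + rng.gauss(0.0, 0.008))
        tests["idsat"].append(500.0 - 40.0 * driver + rng.gauss(0.0, 5.0))
        tests["rs_poly"].append(20.0 + rng.gauss(0.0, 1.0))  # independent of driver
    return tests


if __name__ == "__main__":
    for a, b, r in correlated_pairs(simulate_sites()):
        print(f"{a:>8} ~ {b:<8} r = {r:+.3f}")
```

With these simulated data the three tests driven by the same underlying variable are flagged as mutually correlated, while the unrelated test is not. In a real data set, such clusters suggest that the tests carry far fewer independent pieces of information than their count implies.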
6 Sort Electrical Test Data
This data is generated by an electrical pre-screen, usually a 100% pre-screen, of each of the individual die on all wafers in a wafer lot. These tests are often functional tests; however, some tests may be performed on critical electrical parameters, such as product speed. As with wafer electrical test data, there is a potential for hundreds of data values collected per individual die on a wafer. And as with wafer electrical tests, there are often patterns discernible in the test results across the wafers and from wafer to wafer.
Sort electrical test represents the first time the product itself is actually electrically tested. The primary use of this data is to pre-screen the product so that resources are not wasted in the remainder of the manufacturing process by embedding non-functional die in plastic or ceramic packages. Another possible use of this data, though one that is not frequently pursued, is to relate the sort results to the wafer electrical test data and thus to the wafer fabrication process itself. It is worth noting that in such a case the analyst would have hundreds of dependent variables and hundreds of independent variables simultaneously.
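Center-to-edge patterns of the kind described above can be summarized by computing sort yield in concentric rings of the wafer. This is one simple summary among many, not a method the chapter prescribes; the die-coordinate convention (coordinates measured from the wafer center), the three equal-width rings, and the simulated wafer with a weaker outer edge are all assumptions.

```python
import math
import random
from collections import defaultdict


def zone_yields(wafer_map: dict[tuple[int, int], bool], radius: float,
                n_zones: int = 3) -> list[float]:
    """Fraction of passing die in each of n_zones concentric rings of equal
    width; die coordinates are measured from the wafer center. Empty rings
    report NaN."""
    passed: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for (x, y), ok in wafer_map.items():
        zone = min(int(math.hypot(x, y) / radius * n_zones), n_zones - 1)
        total[zone] += 1
        passed[zone] += ok
    return [passed[z] / total[z] if total[z] else float("nan")
            for z in range(n_zones)]


def simulated_wafer(radius: int = 10, seed: int = 7) -> dict[tuple[int, int], bool]:
    """Hypothetical sort map: 5% fail rate inside 70% of the radius, 50% outside."""
    rng = random.Random(seed)
    wafer = {}
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            r = math.hypot(x, y)
            if r <= radius:
                p_fail = 0.05 if r < 0.7 * radius else 0.50
                wafer[(x, y)] = rng.random() >= p_fail  # True means the die passed
    return wafer


if __name__ == "__main__":
    print([round(v, 2) for v in zone_yields(simulated_wafer(), radius=10)])
```

A yield that falls from the inner ring to the outer ring, as in this simulated wafer, is the kind of spatial signature that could then be compared with wafer electrical test results from nearby test structures.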
7 Final Electrical Test Data
This data is electrical test data taken on the finished product. There can be literally thousands of highly correlated tests performed on each packaged die. Many of these tests are parametric in nature; however, only pass/fail data may be available for analysis unless special data collection routines are used. Most of the tests are related directly to product specifications and so are tests of the product's fitness for use. Since there can be considerable interaction between the package and the silicon die, some electrical tests are possible for the first time at this stage of manufacture.
As with the sort electrical test data, we would like to relate these test results to the die fabrication process. In addition, we would also like to relate them to the package assembly process. Possible additional uses of this data include the identification of defective subpopulations and marginal product, product characterization, and process characterization.
At this stage of the manufacturing process, there can literally be thousands of highly correlated measurements taken on literally millions of packaged die. Even with relatively sparse sampling, the data sets that accumulate for characterization and analysis can be enormous. Because of the sizes of the databases and the large number of measurements taken on each die, the available statistical methods for analyzing them tend to be highly inadequate. Even seemingly trivial tasks such as validating and reformatting the data become major problems!
8 Statistical Issues
The types of data sets that we are beginning to see in the semiconductor manufacturing industry present major challenges to the applied statistician. In this industry there are, simultaneously, high "n," high "p," and high "n and p" problems to be dealt with.
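When a full final-test data stream is too large to store or load, a uniform random subsample can still be drawn in a single pass with bounded memory. The sketch below uses the classic reservoir-sampling algorithm (often called Algorithm R); it is a swapped-in technique, not one the chapter names, and the die-record generator is a hypothetical stand-in for a tester data feed.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> list[T]:
    """Uniform random sample of k items (without replacement) from a stream of
    unknown length, in one pass and O(k) memory (Algorithm R)."""
    sample: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)     # uniform on 0..i inclusive
            if j < k:
                sample[j] = item      # item i is kept with probability k/(i+1)
    return sample


def die_records(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a final-test data feed, one record per die."""
    for die_id in range(n):
        yield {"die_id": die_id, "lot": f"LOT{die_id // 10_000:04d}"}


if __name__ == "__main__":
    picked = reservoir_sample(die_records(1_000_000), k=1_000, rng=random.Random(42))
    print(len(picked), picked[0])
```

Because every record has the same k/n chance of ending up in the reservoir, the sample mean of any measured quantity is an unbiased estimate of its mean over the full stream, while memory use stays at k records no matter how many die are tested.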
Because of the turnaround time required for these types of analyses, and the potential for rewarding results, many managers and engineers are trying to use other methods, such as neural nets and CART-like methods, to analyze these types of data sets. Most are unaware that these methods are also statistical in nature and have particular strengths and weaknesses. By and large, the best way to attack these types of problems is not really known.
We often have great difficulty just dealing with the database issues, much less the analytical issues. Data volume, even with simple analyses, is a significant stumbling block for many software packages. Sematech has investigated a limited number of statistical packages, evaluating their performance in doing simple analyses with merely "large" data sets, and found that most gave up or failed to work with reasonable performance.
Once one begins analyzing a "small" problem in, say, 70 main effects with multiple responses and moderate correlation among some explanatory variables, and is told that interactions are expected, the analyst quickly discovers the limitations of the common statistical methods at his disposal. Even such routine concepts as goodness of fit become less intuitive. Even describing the domain of the explanatory variables can be a challenge!
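A quick count shows how fast a "small" 70-factor problem grows once interactions are expected. The sketch counts the coefficients in a full model containing an intercept, all main effects, and all interactions up to a chosen order, assuming each factor enters through a single coefficient (for example, a two-level or linear term).

```python
from math import comb


def model_terms(n_factors: int, max_order: int = 2) -> int:
    """Coefficients in a model with an intercept, all main effects, and all
    interactions up to `max_order` among n_factors factors."""
    return 1 + sum(comb(n_factors, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    print(model_terms(70, 1))  # 71
    print(model_terms(70, 2))  # 2486
    print(model_terms(70, 3))  # 57226
```

With all two-factor interactions, ordinary least squares already needs at least 2,486 observations just to make the model estimable, before any degrees of freedom remain for error; correlated explanatory variables make the resulting estimates less stable still.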
9 Conclusions
How do we cope? The short answer is: "Not well." Statistical resources are generally somewhat scarce in many industries, and those resources are often directed towards supporting ongoing SPC and simple design of experiments efforts. In addition, many applied statisticians are, at best, somewhat hobbled by a number of other factors:
- a statistical education that was too focused on the simpler core techniques,
- a lack of capable and efficient analytical tools, and
- a lack of ability to rapidly validate and reformat data.
An applied industrial statistician who is lacking in any of these three areas on any given project may be in serious trouble.
The more adventuresome analysts explore the use of non-standard methodologies (or non-standard uses of methodologies) such as AID, CART, neural nets, genetic algorithms, kriging, PLS, PCR, ridge regression, factor analysis, cluster analysis, projection pursuit, Kalman filtering, and Latin hypercube sampling, to mention only a few. Less adventuresome analysts use more commonly taught procedures such as stepwise regression, logistic modeling, fractional factorial and response surface experiments, and ARIMA modeling. However, far too many applied statisticians are unable to apply more than a couple of these techniques. Virtually all the meaningful training is "on the job."
Even very highly experienced analysts fall prey to the special problems found with sensor-based data and processes that use automated controllers. These problems include, but are definitely not limited to, modeling:
- processes that have more than one type of regime (such as turbulent flow and laminar flow),
- non-stationary time series that change order suddenly for brief periods of time,
- sensor response functions rather than process signal,
- data from differing stoichiometries in the same experiments,
- PID controller settling time as noise, and
- the responses of PID controllers rather than the process itself.
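Several items in this list, sensor response functions and PID controller behavior in particular, can be seen in a small simulation. The sketch below is a forward-Euler simulation assuming a first-order process, a first-order-lag sensor, and a textbook parallel-form PID controller that acts on the sensor reading. It starts from rest with a constant setpoint and records both the true process value and the logged sensor value. The gains, time constants, step size, and the 2% settling band are all hypothetical.

```python
def simulate_loop(setpoint: float = 1.0, steps: int = 400, dt: float = 0.1,
                  kp: float = 2.0, ki: float = 1.0, kd: float = 0.1,
                  tau_process: float = 2.0, tau_sensor: float = 1.0):
    """Forward-Euler simulation of a PID loop starting from rest with a
    constant setpoint. Returns (true_values, sensor_values), one per step."""
    y = 0.0           # true process value
    s = 0.0           # sensor reading: the only thing that gets logged
    integral = 0.0
    prev_err = setpoint - s
    true_vals, sensed = [], []
    for _ in range(steps):
        err = setpoint - s                 # the controller acts on the sensor, not the process
        integral += err * dt
        deriv = (err - prev_err) / dt      # setpoint is constant, so only sensor changes move this
        prev_err = err
        u = kp * err + ki * integral + kd * deriv
        y += dt * (u - y) / tau_process    # process: tau_p * dy/dt = u - y
        s += dt * (y - s) / tau_sensor     # sensor response: tau_s * ds/dt = y - s
        true_vals.append(y)
        sensed.append(s)
    return true_vals, sensed


def settling_index(x: list[float], target: float, tol: float = 0.02) -> int:
    """Index after which x stays within +/- tol * |target| of target."""
    for k in range(len(x), 0, -1):
        if abs(x[k - 1] - target) > tol * abs(target):
            return k
    return 0


if __name__ == "__main__":
    true_vals, sensed = simulate_loop()
    print("peak true value:   ", round(max(true_vals), 3))
    print("peak sensor value: ", round(max(sensed), 3))
    print("sensor settles after", settling_index(sensed, 1.0), "steps")
```

In this configuration the logged trace overshoots the setpoint and takes many samples to settle, and the true process overshoots further than the sensor reports. None of that structure reflects a change in the process itself; it is produced entirely by the controller and the sensor's response function.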
Failure to deal appropriately with these sorts of problems in the design of the data collection can jeopardize the results of any analysis, no matter how numerically sophisticated the basic techniques.
10 Recommendations
To adequately address these problems, we need a permanent forum focusing specifically on the issues inherent in massive industrial data sets, possibly including an annual conference and a journal. Issues that I would like to see addressed include:
- improving the education of applied statisticians. The ability to use a standard statistics package, perform simple tests of hypothesis, design and analyze standard fractional factorial experiments, and set up SPC programs does not reflect the needs of the future.
- developing an understanding of which types of advanced modeling techniques provide leverage against which types of data for different analytical goals. Developing an understanding of the problems related to sensors in particular is critical. It is no longer sufficient to fit simple linear models and simple time series to data.
- developing more graphical aids, and more knowledge of how to use existing graphics, to assist in analysis. Graphical aids can assist with analysis and can help explain findings to engineers and management.
- developing data handling and reformatting/parsing techniques so that throughput can be increased. A large portion of the time spent in a typical analysis is often in data handling and parsing.
- making new algorithms more readily available and providing a source for training in the usage of the new algorithms.
- encouraging software vendors to provide analytical tools that can handle large quantities of data, either in the number of observations or in the number of variables, for identified high-leverage techniques.
This research is supported by ARPA/Rome Laboratory under contract #F30602-93-0100, and by the Dept. of the Army, Army Research Office under contract #DAAH04-95-1-0466. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Advanced Research Projects Agency, Rome Laboratory, or the U.S. Government.