Emerging Drinking Water Contaminant Databases: A European Perspective
Drinking water production in Western and Central Europe has its similarities and dissimilarities compared with the situation in North America. Groundwater is heavily utilized in certain regions, whereas others depend on the availability of surface water from lakes and rivers. Chlorination is avoided as far as possible; slow sand filtration or bank filtration is the method of choice, followed by advanced treatment such as activated carbon, ozonization, and at some places membrane filtration.
Groundwater, if unpolluted, is preferred. Lake water is often remarkably free of chemical pollutants but has its own biological problems (growth of microalgae or cyanobacteria), whereas river water is often quite heavily polluted and in need of various treatment steps.
The quality of drinking water and the number of contaminants that might be encountered in it strongly depend on the quality of the raw water (except for disinfection byproducts). The quality of the raw water in turn depends not only on chemical and biological properties (which are influenced by industrial activities, effluents of municipalities or runoff from agriculture) but also recreational activities, shipping routes, river bank structures and so forth. So, at least in Western Europe, databases are rarely used for drinking water parameters only but invariably contain millions of data on raw water too. That is the reason I would like to concentrate here on both aspects, since most contaminants that we encounter in raw water can find their way to drinking water, at least if treatment fails.
As secretary of the International Association of Waterworks (IAWR) in the Rhine basin and director of the Dutch-Belgian Association of Waterworks, I am particularly familiar with the situation in Western and Central Europe and base this paper on my experiences in these regions. It can safely be stated that the most advanced treatment of surface water for the production of drinking water can be found in this part of Europe (Switzerland, Germany, France, and the Netherlands). Also, there is probably no other river in Europe that for decades has been so intensely monitored for hundreds of parameters. According to a recent list of the Association of Rhine and Meuse Waterworks (RIWA), more than 1,200 chemical substances have been detected in Rhine water and/or drinking water. Not surprisingly, database structure and development are important for those waterworks associations and follow modern lines of thinking.
The Rhine Catchment Area
The reason for this preoccupation with the Rhine is the special situation of its basin. The Rhine River is not only one of the major rivers of Europe, but its river basin is also the most densely populated and most heavily industrialized area in Europe. The catchment area of the Rhine belongs to eight different countries.
After World War II, when industrial activities started again and wastewater treatment was almost unknown, the waterworks in the Rhine basin became worried. Since they did not expect fast results from governmental activities, they decided to start their own monitoring programs along the Rhine and Meuse rivers and the important lakes and tributaries. The intention of the waterworks was not only to measure the extent of the pollution but also to confront industries and governments with the data and to demand the elimination of toxic or other relevant substances, especially those that are not biodegradable.
Out of necessity, the 120 waterworks in the Rhine basin, belonging to eight different countries, created an international association (the IAWR) specifically for the catchment area of the Rhine. All waterworks are also members of their national waterworks association, but they needed a second, more specialized, association for the problems of their own river basins. Nowadays it is modern to speak about policies based on river basins, but 40 years ago this was quite unique.
Governments were and are of course important because of the regulations and laws that are issued, but they were not very active in finding and inhibiting pollution. The waterworks associations were much more active in this field: apart from monitoring water quality at many places, staff members also went by boat to relevant emission points of industries or communities to find the sources of pollution. And they could do what no government institution could: cross the border and take water samples in a neighboring country.
This would in itself not be enough because detecting the source of pollution and filling the database is one thing but eliminating the pollution is another matter. Discussions with industries or governments could take days or weeks but were always successful if the polluting substances could be traced in drinking water and were toxic. The reason for success was that in such a densely populated area the general public is very sensitive about the purity of its drinking water. The pressure on industries and governments would have been enormous if the waterworks associations had had to publish in the newspapers that their demands were refused. Thanks to the efforts of the international associations of waterworks of this area, the International Rhine Commission, and numerous discussions with industries and governments, the water quality of the Rhine has improved dramatically.
Drinking Water Production in Western and Central Europe
One short remark concerning the possibility of drinking water contamination may be useful. When the Romans conquered part of the Rhine valley (nearly 2,000 years ago) they had the problem of finding drinking water. They never used river water directly; they also did not like to dig wells near rivers but preferred to build aqueducts that carried water from the mountains to the valley. Today as well, communities from large cities to small villages all use either deep groundwater (about 50 to 300 m deep) or riverbank filtration. Numerous studies have shown that even severe accidents in chemical industries will not make the bank filtration water useless, since normally not more than one percent of the pollutant, or even much less, will be found in the wells. What hurts are high constant levels of pollution.
Thus, large cities like Cologne or Dusseldorf, producing drinking water for one million inhabitants each, use bank filtration, while cities such as Bonn derive their water from reservoirs in the mountains. Also in the Netherlands, river water is not directly used. The city of Amsterdam, for example, pumps Rhine water from the center of the country via a 60-km-long pipeline to the dunes near Amsterdam, where it is infiltrated and passes through thick layers of sand; after about three months the water is pumped up again and treated. All larger waterworks in the Rhine catchment area use rapid and slow sand filtration, activated carbon, and ozone for disinfection. Chlorine as a disinfectant is avoided and cities such as Amsterdam have not used it for at least 10 years.
Monitoring and Databases in Western Europe
For almost 50 years there has been a parallel development in monitoring efforts and in databases. The waterworks associations concentrated on raw water and drinking water; the governments concentrated on raw water, sediments, and many other aspects that concern water from rivers, lakes, or groundwater. As a result, monitoring programs of the governments contain many more aspects but analyze fewer contaminants in raw water and nothing in drinking water. Databases of waterworks are often more specialized for water parameters and more flexible and less cluttered with other datasets.
In recent years, collaboration between waterworks associations and government institutions has improved, particularly in the Netherlands, where according to an agreement between RIWA (waterworks) and RIZA (government), relevant water contaminants are analyzed by only one of both organizations, depending on the monitoring station. A free exchange of data ensures that both sides can utilize the results in their publications.
The Strategic Position of Databases
A database with records concerning water quality should not be established without a purpose. It is an important part in a sequence of events, from questions to meaningful answers, but not an endpoint:
- Question: Usually the process starts with a question about water quality at a given location. If a river is concerned, the answer would be insufficient if only the water quality at the spot were analyzed because a second question would then inquire about the source of the pollutant, which is probably farther upstream.
- Monitoring: A monitoring program must therefore be specified that contains the strategic position of the monitoring points (e.g., at the border of a country or near emissions pipes of industries or bigger cities), the frequency of monitoring and the possibility of seasonal influences (such as climate, seasonal use of pesticides, or batch production in industries), the parameters that should be analyzed, and the appropriate analytical methods.
- Database: The data shall be placed in a database in such a way that they remain meaningful and are easily extracted. Since the structures of databases are not easily changed, the definition of fields, possible ways of input, connection with other programs and output by way of tables, graphs, and so forth must be thoroughly discussed. A database is nothing but a tool that stores datasets and keeps them available. Often, the analytical data alone do not give an answer to the original question because they must be interpreted and discussed together with other factors.
- Interpretation: Interpretation of data is the next step, taking into account the detection limit of the analytical method, the recovery rate, plausibility of the result, and so forth.
- Integration: This is followed by integration with other facts or factors, such as limiting values concerning human health or ecotoxicology, presence of other contaminants with similar or antagonistic behavior, biological or chemical degradation, and the influence of radiation. A given contaminant is not regarded as being isolated from the rest of the waterborne pollutants but is part of a much larger group of substances or organisms.
- Answer: Finally, an answer is given that (under ideal conditions) not only contains the concentration of a certain substance in water but gives an interpretation of the data as far as the methods are concerned and places these data in a broader context of health-related information and possible synergistic or antagonistic effects.
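The interpretation and integration steps of this sequence can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the record layout, the placeholder parameter name `substance_x`, the numbers, and the simple recovery correction are hypothetical and do not describe any existing monitoring system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    parameter: str           # placeholder name, not a real monitoring parameter
    value: float             # raw analytical result, in ug/L
    detection_limit: float   # detection limit of the method, in ug/L
    recovery_rate: float     # fraction recovered by the method (0 < r <= 1)


def interpret(m: Measurement, limiting_value: Optional[float] = None) -> dict:
    """Interpretation and integration for one record (assumed rules)."""
    if m.value < m.detection_limit:
        # Below the detection limit no concentration can be reported.
        return {"parameter": m.parameter, "status": "below detection limit",
                "corrected": None, "exceeds_limit": None}
    corrected = m.value / m.recovery_rate   # assumed recovery correction
    exceeds = None if limiting_value is None else corrected > limiting_value
    return {"parameter": m.parameter, "status": "quantified",
            "corrected": corrected, "exceeds_limit": exceeds}


# Illustrative numbers only.
answer = interpret(Measurement("substance_x", 0.09, 0.01, 0.75), limiting_value=0.1)
trace = interpret(Measurement("substance_x", 0.005, 0.01, 0.75), limiting_value=0.1)
```

The point of the sketch is that the stored value alone (0.09) would not reveal an exceedance of the assumed limiting value; only the interpreted result does.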
A Database for Whom?
Water quality monitoring and storage of data depend on who is interested. Most decision makers in governments are chiefly interested in information that determines whether a policy and the subsequent management program are actually achieving the desired results. Waterworks, on the other hand, may only be interested in those parameters for which limiting values exist. Also, politicians may find other parameters important but sometimes not for a long time. Interest can shift from one legislation period to another, depending on scientific insight, changing requirements, or pressure by action groups.
Thus, in the Netherlands, interest is shifting from contaminants in water to those in sediments, and ecological and ecotoxicological studies are now high on the priority list. In former years government institutions in the Netherlands studied roughly the same parameters as the waterworks (such as chloride, nitrogen, phosphate, heavy metals, and pesticides); now they concentrate more on substances in sediments (and try to use mathematical modeling to enable them to calculate the concentration of a substance in the water phase). Biotests or the study of contaminants in biota or the food chain gain more and more importance.
Growing costs and shifting priorities make the work with mathematical models more important. Based on the concentration of a contaminant in sediments, the concentration in water is calculated. The results will also be stored in databases under the name of the parameter. In this case it is important to make a reference to the mathematical model that has been used and also to the sediment data, in case a better model can produce more accurate predictions of the concentrations.
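A minimal sketch of such a reference is given below. The field names, the identifiers, and the numbers are hypothetical, and the linear partitioning relation (water concentration = sediment concentration divided by a distribution coefficient Kd) is merely an assumed stand-in for whatever model is actually used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str
    parameter: str
    value: float
    unit: str
    origin: str                        # "measured" or "modelled"
    model_id: Optional[str] = None     # which model produced the value
    source_ids: Tuple[str, ...] = ()   # sediment records the model used


def model_water_value(sediment: Record, kd_l_per_kg: float, model_id: str) -> Record:
    """Derive a water-phase record from a sediment record (assumed C_w = C_s / Kd)."""
    return Record(
        record_id=f"{sediment.record_id}-{model_id}",
        parameter=sediment.parameter,
        value=sediment.value / kd_l_per_kg,   # mg/kg divided by L/kg gives mg/L
        unit="mg/L",
        origin="modelled",
        model_id=model_id,
        source_ids=(sediment.record_id,),
    )


# Illustrative numbers only.
sediment = Record("S-001", "substance_x", 2.0, "mg/kg", "measured")
first = model_water_value(sediment, kd_l_per_kg=1000.0, model_id="partition-v1")
# A later model can be rerun because the sediment record is still referenced.
revised = model_water_value(sediment, kd_l_per_kg=800.0, model_id="partition-v2")
```

Because every modelled record names its model and its sediment source, a calculated value can never be mistaken for a measured one, and it can be recalculated when the model improves.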
Contents and Uses of Databases
Databases have several purposes: they are filled with relevant information on physical properties, chemical substances, (micro)biological data, limiting values, etc.; a query makes it possible to obtain the raw data; connections between parameters or monitoring points should be possible; calculations and statistical operations should be possible; and output in form of tables or graphs is standard. Databases are used to check the status quo (e.g., the drinking water quality at a certain moment); for operational purposes (how water treatment is performing); to determine trends over months or years; to show the parallel development of certain parameters (e.g., orthophosphate and total phosphate) or the interdependency between parameters (e.g., oxygen concentration and dissolved organic carbon); and to check whether regulations (e.g., concerning waste water treatment or nitrogen removal) are effective.
Depending on the interests of a person who asks a question, a database must contain different sets of data: for a trend analysis over decades, one analysis per week or month can be sufficient (depending on the parameter); for operational purposes, continuous measurements every 10 minutes or every hour might be necessary; and in other cases the mere presence or absence of a parameter (perhaps above a limiting value) might already be sufficient.
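What these frequencies mean for the size of a database can be estimated with simple arithmetic. The sketch below counts the records produced per parameter and monitoring station in one year; the function name and the 365-day year are my own simplifications.

```python
from datetime import timedelta


def records_per_year(interval: timedelta, days: int = 365) -> int:
    """Number of complete sampling intervals that fit into one year."""
    return timedelta(days=days) // interval


for label, interval in [
    ("every 10 minutes", timedelta(minutes=10)),
    ("hourly", timedelta(hours=1)),
    ("weekly", timedelta(weeks=1)),
    ("monthly (30 days)", timedelta(days=30)),
]:
    print(f"{label:>18}: {records_per_year(interval):>6} records per parameter and station")
```

A 10-minute series thus produces roughly a thousand times as many records as weekly sampling, which is worth the effort only if the question really needs that resolution.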
"Decision Makers" and Their Responsibility
An old-style decision maker simply wanted to know "everything" because he felt responsible. In those days hundreds of pages were filled with tables containing thousands of data. More often than not a good decision was not really possible. A well-filled database was the basis for these reports.
A modern-style decision maker has no time, hates voluminous reports, and wants the information preferably on one page, accompanied by a graph and a few figures enabling a "yes" or "no" answer. How many data are necessary for such a report? Can a database give a complete answer? Here, I think, interpretation of data and integration with other aspects gain importance; a database report in itself may not be sufficient.
Formerly, a report consisted of datasheets listing the analytical data of all parameters that had been analyzed. The database had to produce these reports based either on actual measurements or statistical data such as monthly averages, minima and maxima, percentiles, and so forth. It was hoped that these reports fulfilled a need.
Nowadays a decision maker is not interested in all data of all parameters; there are only some general questions and a request for an answer, based of course also on the contents of databases. Questions such as these require an answer:
- Is this country safe, now and in the future, or are there problematic spots on the map?
- Can drinking water be produced from surface water or groundwater? Is this water safe and healthy? What is the relationship between the costs of treatment and pollution? Are there dangerous substances in water and what are the health risks? Who is producing these contaminants and why?
- Will the natural environment be the same for future generations? Will it improve or deteriorate and why? Can this process be stopped?
- Can people swim in waters without health risks?
- What is the effect of climate change over longer periods? How does it affect water availability and quality? Is drinking water safe?
- Will there be enough water of sufficient quality for people, shipping purposes, power plants, industries, and agriculture? Which sector has priority?
For a politician these are important questions and a thorough answer is expected. This should be easy, one assumes, because databases contain the results of monitoring efforts for which millions have been paid.
From Data to Information
To get information, you need data. For every piece of information the right number of data is required. Less data would make the information fragmentary or would produce perhaps no information at all. A huge number of data might, on the other hand, not produce more information; it would simply be a waste of time and money. Thus, a question about the water temperature of a certain place, date, and time could be based on one analytical record in the database, whereas a comparison of the average water temperature of two years would be based on two statistical operations, which use a dataset of perhaps only 12 measurements per year (one per month) or literally thousands of data per year.
The number of analytical data thus depends obviously on the precision that is required and the situation in the area. To measure the water temperature of the Congo River every 10 minutes would be nonsense, since it doesn't change much in that country. To measure the temperature of the Rhine every 10 minutes would also be nonsense for most questions, although it differs considerably during the seasons, but we know already that the pattern doesn't change much over the years or decades. On the other hand, if contaminants must be detected that are released in batches by unknown industries, much more data may be required.
Accordingly, there is no general rule concerning the amount of data required for meaningful information. It depends on the country, the situation, the contaminants, and the questions that are likely being asked. As far as a database is concerned, it should be able to harbor a large number of records (on the order of millions), respond in a flexible way to the information needs of today and tomorrow, and connect well to other software programs.
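The temperature example can be made concrete with a purely synthetic seasonal curve. The sketch below uses no measured data: the sine curve, its 12 degree mean, and its 8 degree swing are invented for illustration, and the function names are hypothetical.

```python
import math


def synthetic_temperature(day: float) -> float:
    """Synthetic seasonal curve, not measured data: 12 degC mean, 8 degC swing."""
    return 12.0 + 8.0 * math.sin(2.0 * math.pi * day / 365.0)


def annual_mean(samples_per_year: int) -> float:
    """Mean of equally spaced samples taken over one synthetic year."""
    step = 365.0 / samples_per_year
    values = [synthetic_temperature(i * step) for i in range(samples_per_year)]
    return sum(values) / len(values)


monthly_mean = annual_mean(12)          # one value per month
ten_minute_mean = annual_mean(52560)    # one value every 10 minutes
difference = abs(monthly_mean - ten_minute_mean)
```

For such a smooth and regular pattern the annual average from 12 values is practically the same as that from more than 50,000 values. The argument does not carry over to irregular events such as batch releases, which a monthly sample would mostly miss.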
Short- and Long-Term Questions
Short-term questions are related to everyday decisions: these concern either questions about the status quo of a water source or comparisons between conditions of recent years. Long-term questions concern, for example, predictions of the water flow of a river (depending on all possible criteria such as water usage by industries or agriculture up to discussions about climate changes); trends in water quality and future impacts (e.g., concerning population growth, higher demands, new scientific results concerning toxicology); trends in biological species composition, population growth of water biota, indication of adverse factors; development of riverbeds and riverbanks in time, influence of sea-level rising; rivers and lakes as commercial shipping routes; and development of recreational shipping and its influence on water quality, biological diversity, and structure of riverbanks and bank vegetation.
Data Collection: Systematic or Pragmatic Approach
A systematic approach covering all kinds of aspects would require a database that has been structured in such a way that it can contain a wide variety of data for which questions have actually been asked or might be asked in the future. It would contain billions of data, most of which will never be extracted, but it hopefully will contain some data for most questions.
The pragmatic approach will not gather all data. Instead, it mainly concentrates on present needs and expectations. Many inquiries that are now of interest (e.g., concerning ecology or endocrine disruptors) were not asked 20 years ago; consequently, our dataset is very limited. Also, many questions have only become possible because of the advancement of analytical methods, and in this case, too, older data are lacking.
That is the reason why a more pragmatic approach gains ground: the problem-oriented approach. In river systems such as the Rhine, where water quality has improved considerably over the past two decades, water quality still is the daily bread for waterworks but not for governments. The latter are nowadays more interested in the quality of river sediments, since the sediments act like a sink containing all kinds of undesirable chemical substances of the past, which are slowly released into the environment—effects that are all the more noticeable because of the much improved water quality. Thus, interest is gradually switching to new matrices, and the study of older parameters or former matrices is reduced.
Design of Databases: History and Modern Developments
Old-Style Database Design
Years ago—and here I speak of personal experience in our waterworks associations—a database was programmed by a specialist who often belonged to the computer department of an institute or came from a company specializing in databases. During that time, computer departments were powerful parts of the institute; they decided which hardware and software had to be used and often had a preference for programming the database entirely themselves.
Often a tailor-made database was created for the special needs of that time. More often than not, only the programmer knew the details, and he was often the only person who could change or adjust the database. Generally, a precise description of the structure was missing or was not adjusted in subsequent years, and the source code was usually not available for other specialists. Frequently, additional parts of the database program were written in different programming languages, depending on the knowledge of the programmer at the time and the availability of new programming languages over time.
Mainframes or minicomputers were bought, running under Unix or similar systems, and installed in special air-conditioned rooms and could be operated only by specialists. Queries had to be written in SQL, a language normally not mastered by a decision maker. The production of tables, graphs, or other forms of output also required a specialist.
New-Style Database Design
Changes took place that affected the well-being of the computer departments and could only be brought about after years of heated discussions. The important questions centered not so much on the design and integrity of the new-style database but on the users. In our case it was mandatory that even a secretary be able to extract data without the help of a specialist.
A modern database should be user friendly (can be utilized by all authorized persons, from secretary to director); flexible (can be easily adjusted or rebuilt, depending on future demands); based (preferably) on powerful PCs (no longer minicomputers or mainframes); available in a PC network or on standalone PCs; able to insert new data automatically via modem or from disks (no specialist required); and able to produce tables or graphs via query by example.
The database of the RIWA-IAWR is a modern example: it works on powerful PCs under Windows 95 or NT; its structure is modular, consisting of a core database for raw data, surrounded by a variable number of specialized modules (e.g., for production of tables or graphs, calculations, statistical methods, trends, detection of extreme values, monitoring schemes); it is based entirely on existing programs such as Access for the database and parts of the graphs, Word for texts, Excel for tables and other graphs, statistics programs, and so forth; programming was only necessary to interconnect the commercially available software; the structure of the database and its modules is documented, and the documentation is constantly kept up to date; and the source code is available.
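The modular idea, a core that holds only raw data and modules that merely read from it, can be sketched as follows. This is not the RIWA-IAWR implementation: SQLite (through Python's standard `sqlite3` module) stands in for Access, and the table layout, the station name, and the values are hypothetical.

```python
import sqlite3


def create_core(conn: sqlite3.Connection) -> None:
    """Core database: raw data only, no reporting logic."""
    conn.execute(
        """CREATE TABLE measurement (
               parameter_code TEXT NOT NULL,
               location       TEXT NOT NULL,
               sampled_on     TEXT NOT NULL,   -- ISO date
               value          REAL NOT NULL)"""
    )


def table_module(conn, parameter_code, location):
    """Module 1: raw data as a table, ordered by date."""
    return conn.execute(
        "SELECT sampled_on, value FROM measurement "
        "WHERE parameter_code = ? AND location = ? ORDER BY sampled_on",
        (parameter_code, location),
    ).fetchall()


def statistics_module(conn, parameter_code, location):
    """Module 2: basic statistics computed from the same core table."""
    low, mean, high, n = conn.execute(
        "SELECT MIN(value), AVG(value), MAX(value), COUNT(*) FROM measurement "
        "WHERE parameter_code = ? AND location = ?",
        (parameter_code, location),
    ).fetchone()
    return {"min": low, "mean": mean, "max": high, "n": n}


conn = sqlite3.connect(":memory:")
create_core(conn)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [("0230", "station_a", "1998-01-15", 100.0),   # illustrative values
     ("0230", "station_a", "1998-02-15", 120.0),
     ("0230", "station_a", "1998-03-15", 140.0)],
)
rows = table_module(conn, "0230", "station_a")
stats = statistics_module(conn, "0230", "station_a")
```

A new module can be added or an old one replaced without touching the core table, which is the main reason for choosing this structure.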
Database of the Association of Rhine and Meuse Waterworks (RIWA)
The databases of the waterworks associations in Amsterdam (RIWA) and Karlsruhe (TZW, Germany) have similar structures and philosophies. The database at TZW is still being developed; therefore, I shall give some comments based on the RIWA database, which has existed for several years:
- Number of parameters: there are about 1,500 parameters in the database, and the number is constantly growing.
- Code: all parameters have a four-digit number for the concentrations of a chemical substance (e.g., 0230 for chloride in μg/L) or for the numbers of biological entities per 100 μ/L (e.g., bacteria, oocysts of Cryptosporidium); other laboratories use five digits.
- Entity: the entity is an integral part of the parameter; thus, arsenic in micrograms per liter would be another parameter than arsenic in ng/L; entities are not uniform in laboratory information management systems (LIMS)—at least not in Europe; confusion can thus easily be avoided by treating them as different parameters; if necessary, automatic calculation is switched on to transfer data from the nanograms per liter to the micrograms per liter parameters.
- Control: Find extreme (and probably faulty) values; this is done either by a special module that produces a report or visually by checking the graphs of all parameters for all monitoring places in a certain year. A graph module produces all necessary graphs during the night, which are then placed on the screen at intervals of 5 seconds; about 700 graphs can be checked visually per hour. The complete monitoring network can thus be checked within one or two days.
- Validation: After validation of the data at the end of the year, the dataset for the year is closed and changes in the dataset can only be inserted by a few authorized persons.
- Statistics: After validation the normal statistical calculations are carried out (e.g., minimum, average, maximum, means, percentiles). The resulting data are preserved in the database and are thus quickly available.
- Reports: For reporting purposes the report module produces the tables of our yearbooks. Several other formats are already fixed, and every user can easily build his or her own format (query by example).
- Graphics: Graphs can contain the data of one or several years, with or without the relevant limiting values. Boxplots are built in and so is the possibility for combining datasets of one parameter but from different places or datasets of different parameters from the same location.
- Export: Datasets, tables, and graphs can be exported as comma- or tab-delimited ASCII files or as Word or Excel files for use in other programs.
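Three of the steps listed above (transfer between units, control of extreme values, and the yearly statistics) can be illustrated with a short sketch. The rule used to flag extremes (values outside the quartiles by more than 1.5 times the interquartile range) is an assumption of mine and not necessarily the rule of the RIWA module; the series is invented.

```python
import statistics

NG_PER_UG = 1000.0


def ng_to_ug(values_ng_per_l):
    """Transfer values from a ng/L parameter to the matching ug/L parameter."""
    return [v / NG_PER_UG for v in values_ng_per_l]


def flag_extremes(values, k=1.5):
    """Return values outside Q1 - k*IQR .. Q3 + k*IQR (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def yearly_summary(values):
    """Statistics that would be stored after validation."""
    return {"min": min(values), "mean": statistics.mean(values), "max": max(values)}


# Invented series in ug/L with one suspicious value.
series = [1.0, 1.2, 1.1, 0.9, 1.0, 12.0, 1.1, 1.0]
suspect = flag_extremes(series)
# Here the suspect value is simply dropped; in practice a person decides.
validated = [v for v in series if v not in suspect]
summary = yearly_summary(validated)
converted = ng_to_ug([500.0, 1500.0])
```

The flagged value is only a candidate for correction; whether it is a typing error or a real pollution event remains a matter for validation by the laboratory.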
Delivery of Raw Data
Since data are coming from about 20 different laboratories or institutions, using many different LIMS or database engines, we had two choices: (1) either ask that the requested data be delivered in a certain format that we would define, or (2) allow the data to be delivered in the default format of their system and create a conversion program for each data format at our place. The first option would have been easier for us, but it did not work. So we decided to adapt ourselves and accept all kinds of formats provided the format of a lab remains the same for at least one year and we are notified about format changes. To everyone's satisfaction, this works perfectly.
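The second option amounts to one small conversion routine per laboratory, all of which deliver the same internal record. The sketch below shows the idea with two invented delivery formats; the laboratory identifiers, the column layouts, and the sample lines are hypothetical.

```python
import csv
import io


def convert_lab_a(text):
    """Hypothetical lab A: semicolon-separated 'code;date;value', decimal comma."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue                      # skip empty lines
        code, date, value = row
        records.append({"parameter_code": code, "sampled_on": date,
                        "value": float(value.replace(",", "."))})
    return records


def convert_lab_b(text):
    """Hypothetical lab B: tab-separated with a header line 'date, param, result'."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{"parameter_code": r["param"], "sampled_on": r["date"],
             "value": float(r["result"])} for r in reader]


# One converter per delivering laboratory; a format change means editing one entry.
CONVERTERS = {"lab_a": convert_lab_a, "lab_b": convert_lab_b}


def import_delivery(lab_id, text):
    """Convert a delivery in the lab's own format to the common record layout."""
    if lab_id not in CONVERTERS:
        raise ValueError(f"no converter registered for {lab_id!r}")
    return CONVERTERS[lab_id](text)


from_a = import_delivery("lab_a", "0230;1998-03-01;101,5\n")
from_b = import_delivery("lab_b", "date\tparam\tresult\n1998-03-01\t0230\t101.5\n")
```

Both deliveries end up as identical records, so the rest of the database never needs to know which laboratory used which format.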
New Demands for Databases
For New Contaminants
Most databases have been built for chemical parameters but often include colony counts of bacteria. Our database is no exception. This poses the question whether new contaminants can be incorporated or not. As long as the presence of these contaminants is reported in concentration per volume or number per volume, there should not be a problem. This is probably the case for the majority of new contaminants, whether they are new chemical substances, medicinal drugs in water, chemicals acting as endocrine disruptors, or the counts of Cryptosporidium and Giardia in water. Also, parameters such as the mutagenicity of water pose no problem if expressed in revertants per volume.
Other characterizations of raw water or drinking water are often not preserved in the computer but are kept in a graphical way: the movements of daphnia or mussels are recorded as graphs and are usually kept in this way only. It would nevertheless be possible to keep the underlying data in the computer and to produce the graph anew if necessary. Many other biotests that produce a yes/no answer can also be stored in the database. However, the use of computers in storing data from biotests is often not regarded as necessary. If performed on a more regular basis it would make one problem very clear: the almost complete lack of standardization, which does not allow the biotest results to be compared—at least not in the way it is done with chemical parameters. But this may be merely a matter of time and agreement.
Biological data in the classical sense of the word are often placed in specialized databases, not together with chemical parameters. The reasons for this are that thousands of names can be involved if algae and invertebrates are studied; synonyms should be inserted and kept up to date, since in older literature different names are often used for the same taxon; a hierarchical structure is wanted that shows the relationship between taxa (not only an alphabetical list of unrelated names); and the systematics must be updated.
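The two special requirements, synonyms and hierarchy, can be sketched with a small data structure. The class and method names are hypothetical, and the taxon names are deliberate placeholders rather than real organisms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    name: str                          # currently accepted name
    rank: str                          # e.g. "family", "genus", "species"
    parent: Optional[str] = None       # accepted name of the next higher taxon
    synonyms: List[str] = field(default_factory=list)


class Taxonomy:
    def __init__(self) -> None:
        self._taxa: Dict[str, Taxon] = {}
        self._synonyms: Dict[str, str] = {}

    def add(self, taxon: Taxon) -> None:
        self._taxa[taxon.name] = taxon
        for old_name in taxon.synonyms:
            self._synonyms[old_name] = taxon.name

    def resolve(self, name: str) -> str:
        """Map an accepted name or a synonym to the accepted name."""
        if name in self._taxa:
            return name
        return self._synonyms[name]    # KeyError if the name is unknown

    def lineage(self, name: str) -> List[str]:
        """Accepted names from the taxon up to the highest stored rank."""
        current = self._taxa[self.resolve(name)]
        path = [current.name]
        while current.parent is not None:
            current = self._taxa[current.parent]
            path.append(current.name)
        return path


# Placeholder names, not real taxa.
taxonomy = Taxonomy()
taxonomy.add(Taxon("Exampleidae", "family"))
taxonomy.add(Taxon("Examplea", "genus", parent="Exampleidae"))
taxonomy.add(Taxon("Examplea nova", "species", parent="Examplea",
                   synonyms=["Examplea vetus"]))
lineage = taxonomy.lineage("Examplea vetus")
```

A count recorded decades ago under the older name is thus found under the accepted name, together with its position in the hierarchy.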
Databases for Screening Purposes
Another specialized part concerns the screening of water sources using either gas chromatography/mass spectroscopy (GC/MS) or liquid chromatography/mass spectroscopy (LC/MS) fingerprint methods. GC/MS produces a spectrum of many peaks of substances. Depending on the source, some or many of the more prominent peaks can be identified; others belong to unknown substances. With the amazing progress in analytical methods, some of these peaks will be identified in later years, and it would be valuable to compare the new results with older GC/MS spectra.
To facilitate this study, the Dutch organization of waterworks has built a database (called Infospec) that can hold all of the relevant data of the GC/MS spectra and can (retroactively) attach a name to them if the underlying substance is recognized. This program works independently but can be connected to the normal database. The program was developed by KIWA for PCs under DOS. The new version will run under Windows and is a joint effort of KIWA (for the Dutch waterworks) and RIZA (for the Dutch government). It will be used in the whole Rhine catchment area and will also be available worldwide.
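The principle of attaching a name retroactively can be sketched as follows. This does not describe the internal design of Infospec, which I do not reproduce here; the record layout is hypothetical, and matching on retention time plus a single mass value is a strong simplification of real spectrum comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peak:
    sample_id: str
    retention_time_min: float
    base_peak_mz: float
    substance: Optional[str] = None   # None while the peak is unidentified


def attach_name(peaks: List[Peak], retention_time_min: float, base_peak_mz: float,
                substance: str, rt_tol: float = 0.05, mz_tol: float = 0.5) -> int:
    """Name all still-unknown archived peaks that match within the tolerances."""
    updated = 0
    for peak in peaks:
        if (peak.substance is None
                and abs(peak.retention_time_min - retention_time_min) <= rt_tol
                and abs(peak.base_peak_mz - base_peak_mz) <= mz_tol):
            peak.substance = substance
            updated += 1
    return updated


# Invented archive of unidentified peaks from earlier years.
archive = [
    Peak("1995-rhine-01", 12.30, 200.0),
    Peak("1996-rhine-07", 12.32, 200.1),
    Peak("1996-rhine-07", 20.10, 150.0),
]
# Years later one substance is recognized; older spectra are updated in one pass.
n_named = attach_name(archive, 12.31, 200.0, "substance_x")
```

Once a substance is recognized, all older samples containing the same peak are labeled at once, which makes it possible to see how long the substance has been present.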
How Many Databases Are Necessary?
In former days, and many government institutions still work the same way, there was one single database to hold all the data that were thought to be valuable or necessary. These were very complicated databases indeed and not at all flexible. Nowadays, at least as far as the waterworks associations in Western and Central Europe are concerned, smaller, more specialized, databases are preferred. Based on PCs under Windows, with a modular structure they are user friendly and easy to maintain. Classical databases (containing chemical and microbiological parameters), biological databases, and the so-called Infospec database for GC/MS screening can easily be combined in a network if the structure (modular concept) and the interface are properly designed.
From Data to Information
As noted earlier, information is a much more advanced stage and is reached after evaluation of datasets and integration with other aspects. Modern decision makers rely on information, whereas tables with raw data (the original product of a database) are relevant on the working floor and in the hands of specialists.
Information is not easily obtained. Many different databases and a lot of expertise are necessary. The information, once given, may be sufficient for a politician, but for a scientist it is often useless since no reference is made to the data that have been used. Many of the modern reports of governmental institutions are therefore difficult to follow: the message may be clear but not the underlying data. That may be the reason that in Europe data must be available on a much broader scale. Some countries (such as Germany) are very reluctant to put raw data on disks, CDs, or the Internet; others (e.g., the Netherlands) prefer to open the databases to the public.
The waterworks in the Netherlands, for instance, always had a policy to publish all data. Nothing was kept secret, even disturbing facts. As a consequence people trust their waterworks. They know the waterworks associations are active and alert and will do everything in their power to change the situation for the better.
All countries, however, regardless of their policies concerning the availability of data, fear that raw data can be interpreted in a very different (and sometimes wrong) way by an average citizen in the streets. They are therefore trying to find better ways of communication. At present, many different graphical approaches are being explored that could deliver meaningful information rather than just raw data, and of course, this being Europe, each country does it in a different way.
The old graveyards of data, held in mainframes or minicomputers, are a chapter of the past. Specialized databases—for instance, for chemical, biological, or GC/MS-data—are developed under Windows and kept in local networks. The structures must be as flexible as possible, based on specialized modules surrounding a database core. Reporting facilities have become a matter of utmost concern, and the information to the general public must follow the modern lines: from reports to disks, CD-ROM, and the Internet.