Open-Source Geographic Information Systems Software: Myths and Realities
National Institute for Space Research, Brazil, and University of Maine, United States
The development of open-source software has received substantial attention recently. Following the successful examples of projects such as Linux, Apache, and Perl, policy makers and researchers have shown strong interest in the dynamics of open-source software production (Benkler, 2003). A topic of particular interest is the adoption of open-source software systems in developing nations, as a means of reducing licensing costs and of promoting indigenous technological development through access to the source code of these systems. A recent survey on intellectual property rights and international development commissioned by the government of the United Kingdom underpins such policies with an explicit recommendation:
Developing countries and their donor partners should review policies for procurement of computer software, with a view to ensuring that options for using low-cost and/or open-source software products are properly considered and their costs and benefits carefully evaluated. (Barton et al., 2002)
Many studies that discuss the development of open-source software portray an idealized view that considers such software to be a product of a committed group of individuals. These individuals would operate on a distributed network, where each programmer works on a small but meaningful module. The programmers are isolated, communicating by means of a central repository and mailing lists. The incentives to participate operate on an individual level (Weber, 2002). Some authors go as far as identifying in open-source software a new mode of organizational structure denoted by commons-based peer production (Benkler, 2003). Others claim that the globally distributed skill induced by open source will loosen the grip of the richest countries on innovation (Kogut and Metiu, 2001).
This paper analyzes in detail one segment of the open-source software market in an attempt to find out the true extent of such claims and to establish the basis for a realistic view of the open-source movement. We will focus on geoinformation technology, which includes geographical information systems (GIS), location-based services, and remotely sensed image processing. We have chosen the geoinformation market for two main reasons. First, it is a key technology for developing nations, given its vast range of applications in areas such as environmental protection, urban management, agricultural production, deforestation mapping, public health assessment, crime fighting, and socioeconomic measurements. Second, the authors are experts in the area, with substantial experience in geoinformation software development, and are in a qualified position to assess the different products.
Director for Earth Observation, National Institute for Space Research (INPE), Brazil. Web: http://www.dpi.inpe.br/Gilberto.
National Center for Geographic Information and Analysis and Department of Spatial Information Science and Engineering, University of Maine, Orono, ME 04469-5711, USA, firstname.lastname@example.org.
We consider the following questions: (1) What are the conditions of open-source software development? (2) Who builds geoinformation open-source software products? (3) Is there a need for innovative open-source software in geoinformation applications? (4) How can developing countries obtain geoinformation open-source software to meet their national needs?
Our survey indicates that the view of open-source software as a product of a team of committed individuals is not realistic, at least for the geoinformation market. Most products are built either by a very small team of individuals or by corporations, and large collaborative networked teams are responsible for a small number of products. Most projects reverse-engineer existing designs or comply with standards, and few products are innovative. Therefore, there is much scope for new ideas, especially considering recent advances in geographical information science and spatial databases and the much-increased availability of Earth observation satellites. Given the constraints in open-source software production, such advances will not happen spontaneously and will require public intervention to fund innovation.
In order to support our claims we first examine the need for innovative geoinformation tools. We consider different models of open-source software production from an intellectual property viewpoint, and then review the process of open-source geoinformation software production. Lastly, we propose a model for open-source projects in the developing world based on networks of government-financed institutions.
THE NEED FOR INNOVATION IN GEOINFORMATION TECHNOLOGY
One of the motivations for our survey on open-source GIS software is to identify the extent of innovation in the community. There are three main drivers for innovation in geoinformation technology: (1) the evolution of database management systems to handle spatiotemporal data types; (2) the availability of a new generation of Earth observation satellites; and (3) the recent advances in geographical information science.
The complete integration of spatial data types in database management systems is bound to change completely the development of GIS technology, enabling a transition from the monolithic systems of today (that contain hundreds of functions) to a generation of spatial information appliances, small systems tailored to specific user needs (Egenhofer, 1999). Coupled with the data-handling capabilities of a new generation of database management systems, rapid application development environments will enable the construction of “vertically integrated” solutions, directly tailored to user needs. Therefore, an important challenge for the GIS community is finding ways to take advantage of the new generation of spatially enabled database systems to build “faster, cheaper, smaller” GIS technology.
A second reason for developing open-source spatial analysis tools is the need to resolve the “knowledge gap” in the process of deriving information from images and digital maps. This knowledge gap has arisen because our capacity to build sophisticated data-collecting instruments (such as remote-sensing satellites, digital cameras, and GPS) is not matched by our means of producing information from these data sources (MacDonald, 2002). To a significant extent we are failing to exploit the potential of the spatial data we collect. For example, there are very few techniques for image data mining in remote-sensing archives, and thus we are failing to use the information available in our large Earth observation data archives. Much of this knowledge gap has resulted from a substantial imbalance in public expenditure in geoinformation technology. Major Earth observation satellite programs such as ENVISAT and EOS have budgets in the billion-dollar range, where the vast majority of the money is spent on building and operating the satellites and sensors.
An additional challenge is how to incorporate recent advances from geographical information science into mainstream GIS. A number of important results have been produced in research areas such as spatiotemporal data models (Erwig et al., 1999), geographical ontologies (Fonseca et al., 2002), spatial statistics and spatial econometrics (Anselin et al., 1999), cellular automata (Batty, 2000), and environmental modeling (Burrough, 1998). These results have largely been outside of the reach of the user community because of a lack of widely available tools and systems that support them.
MODELS OF INFORMATION PRODUCTION IN OPEN-SOURCE SOFTWARE
From an intellectual property viewpoint we distinguish three models of information production for open-source software: (1) the postmature model; (2) the standards-led model; (3) the innovation-led model.
The postmature model exists in strongly consolidated markets. In many cases one proprietary product has a very large market share. As this product becomes popular its functionality and conceptual model become well established, and it becomes part of the public commons. Switching costs will prevent a new commercial product from capturing market share even if sold at lower prices. In this case there is a strong incentive for newcomers to license their products as open source. Many users will consider that the perceived benefits of open source outweigh the cost of switching from the commercial product they might be using. One example is the Open Office productivity suite. Alternatively, a private corporation may decide to license a product previously associated with private intellectual property rights as open-source software. Such is the case for the Mozilla browser.
The standards-led model exists when the establishment of standards consolidates a technology and allows compatible solutions from different producers to compete in the marketplace, thus opening an opportunity for open-source products. Newcomers can benefit from the substantial intellectual effort that goes into establishing a standard. An example is the SQL database standard, which has motivated products such as MySQL. Another example is the POSIX standard for operating system interfaces, which has reduced switching costs from other UNIX-based environments to Linux.
The innovation-led model results when universities, public institutions, and corporations produce work that has no direct equivalent in the commercial sector. As we shall see later, innovation is the product of the private sector, either directly (e.g., the Qt multi-platform interface system) or by a spin-off of a successful research project. As an example of the latter the University of California developed the Postgres database management system as a research project (Stonebraker and Rowe, 1986). After an unsuccessful commercialization attempt a private company took over the development of Postgres, added SQL support, named the resulting product PostgreSQL, and made it available as open source.
WHO BUILDS OPEN-SOURCE GIS SOFTWARE?
For a more detailed analysis of GIS open-source software developers, we surveyed 70 GIS open-source projects, mainly using a listing provided by the freegis.org site, a repository for open-source software. Based on size, geographical distribution, and affiliation we distinguished three categories of open-source software-development teams:
Individual-led projects. The project team consists of one to three individuals, usually from the same location and working in their spare time. The software products usually are small specialized applications that address specific requirements. In general the developer of the software is also its first user. Examples include the Vis5D visualization tool (Hibbard et al., 1994), the Gstat geostatistical package (Pebesma and Wesseling, 1998), and the shapelib library for reading ArcView® shapefiles.
Collaborative networks. The project core team consists of 15 to 30 geographically distributed individuals. The developers usually have a separate job and do their work in their spare time, or in part-time hours allocated in agreement with their employer. Examples include the GRASS spatial analysis toolkit and the R collection of statistical functions.
Corporation-based. The project core team is part of an institution and is usually a group of three to eight programmers. There can be outside collaborators, but the main design decisions are made within the institution and in some cases should also address the commercial objectives of these corporations. Examples include the PostGIS extension to the PostgreSQL database management system, and the TerraVision systems for terrain visualization on the Internet.
For additional information, see the FreeGIS project at http://www.freegis.org/.
We characterized each product according to its intellectual-property model and its development team. The results contradict the naïve view of open-source projects as a product of committed teams, based on peer pressure. More than half of the projects are led by individuals, and only four (6 percent) are based on a loose network of collaborators. The presence of corporation-based projects is very strong, with 41 percent of all cases examined. The results are further proof that all software, either open or closed source, is constrained by the essential properties of its development process: conceptual design, program granularity, cohesion of the programming team, and dissemination strategy.
The relatively small proportion of innovative projects (19 percent) indicates that the design of most open-source software products is based on the postmature and standards-led production models, where the main aim is not directly to produce innovation but to lower licensing costs and to break commercial monopolies. The strong presence of standards-led products is also a direct reflection of the influence of the OpenGIS consortium in the developer community. This result further illustrates the notion that the hardest part of software development is the conceptual design of the intended product (Brooks, 1987). The two innovative projects developed by a networked team of programmers are GRASS and R. Both products have a simple and well-understood conceptual design, and their innovative contribution lies not in their design but in the analysis functions that scientists develop using these environments.
Out of the 29 corporation-based institutions involved in developing open-source GIS, 17 are private companies, 8 are government institutions, and only 4 are universities. This result indicates that the research community is usually not interested in a direct involvement in long-term, open-source projects. Maintaining and supporting an open-source software project requires considerable resources beyond the reach of most university groups. For a research prototype to evolve into an open-source product a team of developers must take over from the original research team and establish a support and maintenance infrastructure for the product.
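As a rough consistency check, the team-type shares quoted in this section can be recomputed from the project counts. The sketch below is only illustrative and is not part of the original survey. The counts of 4 networked projects and 29 corporation-based ones come from the text (treating each of the 29 institutions as one project is an assumption), while the count of individual-led projects is assumed to be the remainder of the 70 surveyed projects. The names `TOTAL_PROJECTS`, `team_counts`, and `percent_shares` are hypothetical.

```python
# Illustrative tally of the survey's team categories (not the authors' own code).
TOTAL_PROJECTS = 70  # number of projects surveyed, as stated in the text

team_counts = {
    "collaborative network": 4,  # stated in the text (6 percent)
    "corporation-based": 29,     # from the text; assumes one project per institution
}
# Assumption: every remaining project is individual-led.
team_counts["individual-led"] = TOTAL_PROJECTS - sum(team_counts.values())


def percent_shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percent_shares(team_counts)
for name, share in shares.items():
    print(f"{name}: {team_counts[name]} projects, {share} percent")
```

Under these assumptions the rounded shares agree with the reported figures of 53, 6, and 41 percent for individual-led, networked, and corporation-based projects.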
Problem granularity is another important factor for open-source projects, and each type of software induces a different breakdown strategy. In most cases there is a strong limit on module size, which forces successful open-source products to be the products of small teams. The fact that GRASS consists of a set of independent executables is evidence that open-source development by distributed teams requires a software structure that can be broken into small, manageable parts.
Our survey of the open-source GIS projects also considered the maturity, support, and functionality of each product. We measured the maturity of a project by three factors: (1) the number of software releases; (2) the number of changes in each release; and (3) the achievement of the project’s stated goals. To assess support we investigated whether the project had an established maintenance team, and evaluated the mailing lists, bug indicators, and improvement requests. The evaluation of functionality considered the number of modules and the difficulty of the algorithms involved. Each project was graded on a scale from 1 to 5, where 5 is best.
The results indicate a significant difference in all three aspects (maturity, support, and functionality) between individual-led products and corporation-based ones. This indicates that the corporate environment is better suited for long-term software development than an individual’s perspective. Individuals are constrained by their duties, which very rarely include full-time support for open-source software development, whereas many corporations rely on earning indirect revenues (e.g., consultancy fees) from their open-source products. In many cases the corporation might be performing a public service or developing the product based on public funding. The results also indicate that the difference between a corporation and a collaborative network team is much smaller. This is consistent with the overall picture of the open-source world, that a committed team of individuals can produce results that are comparable to (or better than) those produced by corporations.
USING AND PRODUCING OPEN-SOURCE SOFTWARE IN DEVELOPING NATIONS
The preceding sections have examined the nature of open-source software development and outlined the main characteristics of its production. We have argued that most mature and successful products require the establishment of organizational structures dedicated to their production. The consequences for developing nations are significant. Many developing nations are currently actively considering policies to support or enforce the adoption of open-source software by public institutions (Dravis, 2002). The arguments in favor of adoption by public institutions include (Ghosh et al., 2002):
Lower cost. Adoption of personal computers based on open-source software for public use can reduce initial entry cost by as much as 50 percent. Easier replication of solutions is also possible. Large-scale public projects can greatly benefit from having a prototype developed and tested that can then be replicated across the country with no additional software costs.
Independence from proprietary technology. Many governments are increasingly concerned with overdependence in some important markets on a small number of vendors.
Security. Governments and governmental agencies are becoming aware of the risks they are subject to when adopting proprietary software solutions in sensitive areas, such as e-government, e-procurement, elections, and public finance.
Availability of efficient and low-cost software. The virtuous examples of some products (such as Linux and Apache) have encouraged statements about the widespread availability of open-source software for public use.
Ability to develop custom applications and to redistribute the improved products. Given the open nature of open-source software, skilled local programmers could adapt the software to fit local needs and thus increase the efficiency of the services provided by the improved products.
The authors consider that there is enough empirical evidence to support the first three claims, but the issues regarding software availability and ease of customization are far more problematic and require a much closer examination. Most successful open-source software tools are infrastructure products, such as operating systems, programming languages, and Web servers. By contrast, the number of mature open-source, end-user applications is much smaller (Schmidt and Schnitzer, 2002). Operating systems, compilers, and Web servers respond to the needs of technically qualified information technology professionals, who can more easily adapt to the demands of products where support might be available only on the Internet and might require expertise in the English language.
There is a huge demand for end-user applications in developing nations, especially in the public sector. However, our survey indicates that corporations dominate open-source software development. These corporations will develop software based on their strategic interests, which are unlikely to include the full range of end-user applications needed by developing countries. Therefore, if governments in developing nations aim to profit from the potential benefits of open source, they must intervene and dedicate a substantial amount of public funds to support the establishment and long-term maintenance of open-source software projects.
The benefits of this strategy could be substantial. Consider, for example, the case of urban cadastral systems based on a spatial database for medium-size cities. The typical base cost of a commercial spatial database solution for one city is $100,000. If 10 cities were to adopt such a solution in a given year, there would be a saving of $1 million per year on licensing fees, which could finance local development and local adaptation.
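The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, assuming (as the example implicitly does) that the open-source alternative carries no licensing fee. The function name `licensing_savings` and its parameters are hypothetical.

```python
def licensing_savings(cities, license_cost_per_city, open_source_license_cost=0):
    """Licensing fees avoided when `cities` adopt an open-source spatial database."""
    return cities * (license_cost_per_city - open_source_license_cost)


# Figures from the text: 10 cities, $100,000 base cost per commercial solution.
saving = licensing_savings(cities=10, license_cost_per_city=100_000)
print(f"Estimated saving: ${saving:,}")
```

The estimate covers licensing fees only. The costs of local development and adaptation that these savings are meant to finance are not modeled here.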
There is also a substantial additional benefit of investing in qualified human resources. Government strategies for supporting indigenous open-source software development and adaptation would result in a learning-by-doing process. Such processes, as opposed to learning-by-using, are credited with fostering innovation in the developed world (Landes, 1999), and the same lessons could apply to those nations supporting emerging economies.
As an example of government-funded projects, a group of research and development institutions in Brazil is currently developing TerraLib, an open-source GIS library that enables quick development of custom-built applications for spatial data analysis. As a research tool TerraLib aims to enable the development of GIS prototypes that would include recent advances in geoinformation science. On a practical side TerraLib enables quick development of custom-built applications using spatial databases. Projects such as TerraLib show that open-source GIS projects can make substantial contributions to the spatial information community by providing a platform for innovation and collaborative development (Câmara et al., 2000).
CONCLUSIONS
This work examines the nature of open-source software development by looking in detail at the application area of geoinformation technology. We surveyed 70 open-source GIS software projects and concluded that the Linux paradigm is the exception rather than the rule, and that corporations are the main developers of successful open-source products. Since networked teams develop only 6 percent of all open-source GIS products, our result refutes the view that open-source software development defines a new mode of production. As established by extensive research, good software design and development are the products of qualified teams that operate at a high level of interaction. Developing software in a decentralized manner requires a modular design, which is difficult to achieve for most applications, since few software products can be broken into very small parts without a substantial increase in interaction costs.
The direct participation of universities in open-source software is limited because of the conflict between the generation of new research ideas and the need for long-term software maintenance and upgrades. As a result innovative projects account for less than 20 percent of the total, and a large proportion of the projects (53 percent) simply aim to provide standardized components for spatial data processing. Individuals or small teams develop more than half of the products surveyed, and their best results are specialized applications aimed at conversion and visualization of data in established formats. Corporations account for 41 percent of all products, and their software is of much better quality than individual-led software. This demonstrates that the impetus behind open-source software is not coming from altruistic individuals working in the midnight hour, but from professional programmers.
These results have important consequences for public policy guidance. First, good open-source software is the product of corporations, which will build it based on their strategic intents. Therefore, governments worldwide that try to benefit from the open-source software model by simply establishing legislation that mandates its use could be frustrated in their objectives, because of the lack of suitable public-sector applications. To create the software they need, governments must establish publicly funded projects for open-source development and adaptation to local needs. Failure to understand the open-source development model will result in a lost opportunity for the developing world to reduce the current technological gap between the rich and poor nations.
REFERENCES
Anselin, L., P. Longley, M. Goodchild, D. Maguire, and D. Rhind. 1999. “Interactive Techniques and Exploratory Spatial Data Analysis,” Geographical Information Systems: Principles, Techniques, Management and Applications. Geoinformation International, Cambridge.
Barton, J., D. Alexander, C. Correa, R. Mashelkar, G. Samuels, and S. Thomas. 2002. Integrating Intellectual Property Rights and Development Policy, UK Department for International Development, London.
Batty, M. 2000. “GeoComputation Using Cellular Automata,” in GeoComputation, eds. S. Openshaw and R. J. Abrahart, Taylor & Francis, London, 95-126.
Benkler, Y. 2003. “Coase’s Penguin, or, Linux and The Nature of the Firm,” Yale Law Journal 112, winter 2002-2003.
Brooks, F. 1987. “No Silver Bullet: Essence and Accidents of Software Engineering,” IEEE Computer, 20(4):10-19, April.
Burrough, P. 1998. “Dynamic Modelling and Geocomputation,” Geocomputation: A Primer, eds. P. Longley, S. Brooks, R. McDonnell and B. Macmillan, John Wiley, New York.
Câmara, G., R. Souza, B. Pedrosa, L. Vinhas, A. Monteiro, J. Paiva, M. Carvalho, and M. Gattass. 2000. TerraLib: Technology in Support of GIS Innovation, II Workshop Brasileiro de Geoinformática, GeoInfo2000, Instituto Nacional de Pesquisas Espaciais, São Paulo.
Dravis, P. 2002. “A Survey on Open Source Software,” The Dravis Group, San Francisco.
Egenhofer, M. 1999. Spatial Information Appliances: A Next Generation of Geographic Information Systems, First Brazilian Workshop on GeoInformatics, Campinas, Brazil, Instituto Nacional de Pesquisas Espaciais, São Paulo.
Erwig, M., R. H. Güting, M. Schneider, and M. Vazirgiannis. 1999. “Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases,” GeoInformatica 3(3):269-296.
Fonseca, F., M. Egenhofer, P. Agouris, and G. Câmara. 2002. “Using Ontologies for Integrated Geographic Information Systems,” Transactions in GIS 6(3):231-257.
Ghosh, R. A., B. Krieger, R. Glott, and G. Robles. 2002. Open Source Software in the Public Sector: Policy within the European Union, International Institute of Infonomics, University of Maastricht, Maastricht.
Hibbard, W., B. Paul, A. Battaiola, and D. Santek. 1994. “Interactive Visualization of Earth and Space Science Computations,” IEEE Computer 27(7):65-72.
Kogut, B., and A. Metiu. 2001. Distributed Knowledge and the Global Organization of Software Development, Massachusetts Institute of Technology Press, Cambridge, MA.
Landes, D. S. 1999. The Wealth and Poverty of Nations. W. W. Norton & Co, New York.
MacDonald, J. 2002. The Earth Observation Business and the Forces That Impact It, Earth Observation Business Network 2002, MacDonald Dettwiler, Vancouver, CA.
Pebesma, E. J., and C. G. Wesseling. 1998. “Gstat: A Program for Geostatistical Modelling, Prediction and Simulation,” Computers & Geosciences 24(1):17-31.
Schmidt, K. M., and M. Schnitzer. 2002. Public Subsidies for Open Source? Some Economic Policy Issues of the Software Market, Seminar for Economic Theory, Ludwig Maximilian University, Munich.
Stonebraker, M., and L. A. Rowe. 1986. “The Design of POSTGRES,” in Proceedings of ACM-SIGMOD International Conference on the Management of Data, ACM Press, Washington, D.C., pp. 340-355.
Weber, S. 2000. “The Political Economy of Open Source Software,” Berkeley Roundtable on the International Economy, University of California, Berkeley.