Computational clusters
Computational clusters for GIS and remote sensing
Driven by personal experience of long processing times for basic operations in the analysis of long time series of raster maps, I am investigating software tools for simple cluster computing in the typical instructional GIS lab. I am using GRASS, the GIS originally developed by the U.S. Army Construction Engineering Research Laboratories (USA-CERL, 1982-1995), a branch of the U.S. Army Corps of Engineers, as a tool for land management and environmental planning by the military. GRASS is now Free (Libre)/Open Source Software released under the GNU General Public License (GPL), and has evolved into a powerful system with a wide range of applications in many areas of scientific research. Its user base has grown slowly compared with that of more modern GIS packages because of its origins in UNIX and other engineering-heritage computer systems, where a command-line user interface is typical. Although GRASS has repeatedly been adapted to provide a graphical user interface (GUI), it is still harder to learn and use than most other workstation GIS packages. The most recent of these efforts packages a full Win32 GRASS installation with Quantum GIS, an evolving open source GIS viewer and editor that has become powerful and flexible enough to serve many needs in teaching and research.
A command-line interface is ideal, however, for scripting batch-oriented processing, as is typical for bulk datasets like raster time series. Using Cygwin, another collection of UNIX/Linux heritage software tools ported to the Win32 environment, I am developing techniques for simple control of coarse-grain parallel processing of such bulk datasets, for off-hours use of an instructional laboratory as a computational cluster. As in all such parallel processing work, the nature and the payoff of the necessary programming effort depend strongly on the type of problem addressed: while it may be easy, with 16 computers, to achieve a 16x speedup on problems that repeat a GIS operation on each of a series of 16 map files, it may be difficult to achieve any speedup of the same operation on a single map 16x as large, depending on the operation. GRASS and many other GIS packages are structured around file data objects, which are natural data sources and sinks in a parallel computation environment; but database-resident objects are also common. Database and file servers are congestion points for data access and, together with limited network bandwidth, complicate algorithm design and implementation.
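The coarse-grain case described above (one GIS operation repeated over a series of map files) can be sketched in a few lines of Python. This is only an illustration, not the lab's actual control scripts: the function names `assign_maps` and `ideal_speedup` and the map names are hypothetical, and the speedup figure assumes every map takes the same time to process and that data access costs nothing, which the congestion points mentioned above will not allow in practice.

```python
import math


def assign_maps(map_names, n_workers):
    """Round-robin assignment of map files to workers (one list per worker)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    buckets = [[] for _ in range(n_workers)]
    for i, name in enumerate(map_names):
        buckets[i % n_workers].append(name)
    return buckets


def ideal_speedup(n_maps, n_workers):
    """Upper bound on speedup, assuming equal time per map and free I/O.

    The job finishes when the busiest worker finishes, so the elapsed time
    is ceil(n_maps / n_workers) map-times instead of n_maps map-times.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_maps == 0:
        return 1.0
    rounds = math.ceil(n_maps / n_workers)
    return n_maps / rounds


if __name__ == "__main__":
    # Hypothetical map names, for illustration only.
    maps = ["map_%02d" % i for i in range(1, 17)]
    for worker, work in enumerate(assign_maps(maps, 16)):
        print(worker, work)
    print(ideal_speedup(16, 16))  # 16 maps on 16 machines
    print(ideal_speedup(1, 16))   # one large map: no gain from this scheme
```

Under these assumptions 16 maps on 16 machines give the full 16x, while a single map gives no speedup at all unless the operation itself can be split, which is the distinction drawn in the paragraph above.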
To simplify use of such systems for practical work on GIS and remote sensing data, I am investigating methods for automatically measuring DBMS, filesystem, cluster, and network performance parameters, to provide a configuration reference for implementing algorithms that can adapt to the available cluster resources. For my own work on time series analysis, I am developing a library of such algorithms, structured around existing GRASS libraries, which can be used either through a scripting tool or through the Quantum GIS user interface. This library could also be implemented using commercial GIS products such as ArcGIS and Idrisi. If time permits, selected modules may be developed for each such environment for benchmarking. Other cluster parameters and components should also be varied in benchmarking, such as processor power, the speed of the lab network (1000/100/10 Mb/s Ethernet), and the use of a DBMS versus the filesystem as the source and sink of data objects.
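As a rough idea of what automatically measuring one of these parameters might look like, the Python sketch below times writing and then reading back a scratch file in a given directory, which could be a local disk or a mounted file-server share. The function name `measure_file_throughput` and its parameters are hypothetical, and the method is deliberately crude: the read figure in particular is likely to be inflated by operating system caching, so it should be taken as an upper bound rather than a measurement of the server.

```python
import os
import tempfile
import time


def measure_file_throughput(directory, size_mb=8, block_kb=256):
    """Time writing and re-reading a scratch file in `directory`.

    Returns a dict with the byte count and write/read rates in MB/s.
    """
    block = os.urandom(block_kb * 1024)
    n_blocks = max(1, (size_mb * 1024) // block_kb)
    total_bytes = n_blocks * len(block)

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device or server
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    megabytes = total_bytes / (1024 * 1024)
    return {
        "bytes": total_bytes,
        "write_mb_s": megabytes / max(write_s, 1e-9),
        "read_mb_s": megabytes / max(read_s, 1e-9),
    }


if __name__ == "__main__":
    # The system temp directory stands in for a lab file share here.
    print(measure_file_throughput(tempfile.gettempdir()))
```

Running the same function against a local directory and against a network share on each lab machine would give one entry of the configuration reference described above; DBMS and raw network measurements would need their own probes.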
For more information see http://gis.uml.edu/eeas, UML-EEAS-GIS Lab current activity, and UML-EEAS-GIS Lab current status.
- WinXP mixed-use lab (scheduled instructional, student, computational use)
- GRASS raster and vector GIS computational toolset
- MS VS2005 or Cygwin tool development environment
- MPI interprocessor control
- Quantum GIS desktop visualization and authoring for mapserver presentation
