Program Information


Information Technology (IT) is changing all aspects of life, including how scientific research is conducted. Over the years, a few grass-roots efforts in the geosciences have started to take advantage of these new resources, but IT has not yet had a major impact on the field. To improve understanding of IT and familiarize a group of students and researchers with these new tools, we have designed a Summer Institute for Geoscientists lecture series focused primarily on the community's immediate needs in information technology. The goal of this Summer Institute is to educate a group of earth scientists and expand the community of IT users in earth science research.

With the broad and rapid adoption of IT in science, and the advent of major initiatives such as the NSF Cyberinfrastructure program and the UK’s e-Science activity, it is essential that the community at large be well positioned to take full advantage of new opportunities to improve research and education through IT-enabled approaches. This Summer Institute is a step in that direction, designed as an educational and outreach activity in the geosciences that focuses primarily on immediate needs. We have chosen a format that provides a quick introduction to six key IT topics relevant to the community's overall science goals in the near term. Each course will be taught by an expert in that field, assisted by other experts from the San Diego Supercomputer Center and NCAR. Lectures will be followed by hands-on lab exercises. The sections below describe each course in detail.

Workshop Course Schedule


Monday, August 16, 2004
9:00AM - 5:00PM


Welcome & Introduction

Data Management
(Instructors: Bertram Ludaescher, Kai Lin)

NOTE: Several of the sessions below include hands-on segments

  • Relational Data Model
    • Introduction/background
    • Foundations: relations, attributes, keys, ...
    • Introduction to SQL and Datalog
  • XML Data Management I
    • XML & semistructured data
    • DTDs and XML Schemas
    • Querying and transforming XML: XPath, XQuery, XSLT


  • XML Data Management II
    • Querying and transforming XML: XPath, XQuery, XSLT
    • Data Mediation
  • Database Design
    • Conceptual modeling overview
    • Entity Relationship model, UML, ...
    • Logical and physical database design
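
To make the relational foundations listed above (relations, attributes, keys, SQL) concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not course material: the `station` and `sample` tables, their columns, and the data are all invented for illustration.

```python
import sqlite3

def build_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE station (
            station_id INTEGER PRIMARY KEY,  -- key: uniquely identifies each row
            name       TEXT NOT NULL
        );
        CREATE TABLE sample (
            sample_id  INTEGER PRIMARY KEY,
            station_id INTEGER NOT NULL REFERENCES station(station_id),  -- foreign key
            rock_type  TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO station VALUES (?, ?)",
                     [(1, "Alpha"), (2, "Beta")])
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                     [(10, 1, "granite"), (11, 1, "basalt"), (12, 2, "granite")])
    return conn

def stations_with(conn, rock_type):
    """Return the names of stations that have at least one sample of rock_type."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.name
        FROM station AS s
        JOIN sample AS p ON p.station_id = s.station_id
        WHERE p.rock_type = ?
        ORDER BY s.name
        """,
        (rock_type,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `stations_with(build_db(), "basalt")` joins the two relations on the shared key and returns the names of the matching stations.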

Tuesday, August 17, 2004
8:30AM - 5:00PM


Data Management (Continued)

  • From Data to Knowledge Representation
    • Data semantics and integrity constraints
    • Knowledge representation languages: concept maps, controlled vocabularies, ontologies
    • Standards (RDF, OWL, ...)
  • Putting It All Together
    • Semantic data annotation and registration
    • Ontology-based data discovery, browsing, and querying
    • Hands-on Session/Demonstrations


Geographic Information Systems

  • Introduction
    (Instructor: Ilya Zaslavsky)
    • Basics of GIS
    • Setting up with Virtual Campus accounts
    • GIS software
    • Review of GIS principles and applications
  • Hands-on training with Virtual Campus courses
    (Instructors: Ilya Zaslavsky, Reza Wahadj)
  • GIS Mapping [lecture and hands on]
    (Instructors: Ilya Zaslavsky, Reza Wahadj)
    • GIS roots in cartography.
    • What is map information; spatial data structures and models.
    • Types of maps in GIS.
    • Making simple maps with ArcGIS.
    • Good maps and bad maps.

Wednesday, August 18, 2004
8:30AM - 5:00PM


Geographic Information Systems (Continued)

  • Geo-databases [lecture and hands on]
    (Instructors: Ilya Zaslavsky, Reza Wahadj)
    • Logic of GIS applications; suitability analysis/site selection.
    • Operations on maps, map queries; map combination rules, and some geometric problems of map combination.
    • Buffers and neighborhoods. Surface modeling and representation; operations on surfaces.
  • Planning and designing GIS applications [lecture and hands on]
    (Instructors: Ilya Zaslavsky, Reza Wahadj)
  • GIS Data sources and Internet mapping [lecture and hands on]
    (Instructors: Ilya Zaslavsky, Ashraf Memon, Reza Wahadj)
    • GIS data sources
    • XML in online GIS
    • GML, Internet GIS
    • Internet map servers
    • ArcIMS, GIS infrastructure


  • GIS Data sources and Internet mapping (Continued)
  • GIS in Geo-informatics Network
    (Instructors: Ashraf Memon, Ilya Zaslavsky)
    • Demonstration of GEON spatial data registration and online mapping
  • Questions
    (Instructors: Ilya Zaslavsky, Ashraf Memon)

Thursday, August 19, 2004
8:30AM - 5:00PM


Web Services

  • Overview
    (Instructor: Ashraf Memon)
    • The need for Web services
    • Service-oriented architecture (SOA)
    • Core technologies: XML, SOAP, WSDL
  • Establishing a foundation for web services
    (Instructor: Ashraf Memon)
    • XML with examples
    • The Simple Object Access Protocol (SOAP)
    • SOAP message structure
    • The message envelope, header and body
    • The role of WSDL
    • Identifying operations and messages
  • Creating Web services [hands on training]
    (Instructors: Ashraf Memon, Karan Bhatia, Longjiang Ding)
    • Writing service classes in Java
    • Generating service
    • Deploying services with Apache Axis
    • Generating client files and testing them
    • Examples of web services in other languages
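
As a rough illustration of the SOAP message structure covered above (an envelope containing a header and a body), the following sketch builds a SOAP 1.1 request with Python's standard xml.etree.ElementTree module. The hands-on sessions themselves use Java and Apache Axis; the `getElevation` operation, its `lat` and `lon` parameters, and the `urn:example:geo` namespace are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:geo"  # hypothetical application namespace

def build_request(lat, lon):
    """Return a SOAP 1.1 message, as a string, for a made-up getElevation call."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # The header is optional in SOAP; it is left empty here to show where it sits.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The body carries the operation and its arguments as application-defined XML.
    operation = ET.SubElement(body, f"{{{APP_NS}}}getElevation")
    ET.SubElement(operation, f"{{{APP_NS}}}lat").text = str(lat)
    ET.SubElement(operation, f"{{{APP_NS}}}lon").text = str(lon)
    return ET.tostring(envelope, encoding="unicode")
```

In practice a WSDL document would describe which operations and message parts a service accepts, and a toolkit would generate this XML rather than the caller assembling it by hand.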


  • Creating data access [hands on training]
    (Instructors: Ashraf Memon, Karan Bhatia, Longjiang Ding)
    • Creating a web service for data access (ASCII, Database)
    • Creating a web service from an existing program/class (implementing algorithm or process)
    • Packaging, deploying, generating clients, and testing them
  • Chaining of multiple web services
    (Instructor: Ashraf Memon)
    • Introduction to spatial web services
    • Examples of existing web services
    • Chaining of existing web services
  • Running example, advanced topics, and pointers to references
    (Instructor: Ashraf Memon)
    • Security
    • Tools
    • Online tutorials
    • Reading material
  • Questions
    (Instructors: Ashraf Memon, Longjiang Ding, Karan Bhatia)


6:00PM Dinner Meeting
(Speaker: Chaitan Baru - "GEON")

Friday, August 20, 2004
8:30AM - 3:00PM


Grid Services
(Instructor: Karan Bhatia)
At a high level, Grid Computing consists of a methodology for creating "virtual organizations" (VOs), that is, organizations composed of individuals and resources spread throughout other, non-virtual organizations (NVOs). Typically, the resources and individuals of the VO are geographically distributed, and the resources are owned and managed by the NVOs and donated for use in the VO subject to various security and use conditions. There are many technical challenges inherent in building a VO, including security, data management, resource discovery, user interfaces, and application development. In this talk we address each of these issues, illustrating the current state of the art and best practices within the grid community. In addition, as grid services merge with web services, it is important to understand the similarities and differences. The talk therefore also discusses the new services-based architecture for "grid services" and current efforts to make use of them.

Scientific Workflows
(Instructors: Ilkay Altintas, Bertram Ludaescher)

  • Scientific Workflows I
    • Overview on Scientific Workflows
    • Introduction to the Kepler System
    • Demonstration: Workflow Creation and Execution


Parallel Computing
(Instructor: Tim Kaiser)

  • Definition of parallel computing
  • Advantages and disadvantages of parallel computing
  • Types of parallel computing
    • Shared memory
    • Message passing
  • Introduction to Message Passing Interface (MPI)
  • An example scientific application using MPI
  • Overview of resources

We will start our talk with a definition of parallel high-performance computing (HPC). This definition will lead us to a discussion of the advantages, and then the disadvantages, of parallel computing.

There are many ways to categorize parallel computing. We will break the subject down into the shared memory and message passing paradigms. An overview will be given of each. We will then spend the remaining time discussing message passing parallel computing.

The primary library used for message passing is the Message Passing Interface (MPI). We will show simple examples of MPI-based programs. This will lead us to the discussion of a simple scientific application. We will show how this application can be parallelized using MPI. Parallelizing this application demonstrates many of the common tasks involved in creating a real parallel program.
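
The message passing pattern described above can be sketched without MPI. The example below is not MPI code and is not taken from the course: it uses threads and queues from the Python standard library only so that it runs anywhere, whereas in an MPI program each worker would be a separate process (a rank) and the sends and receives would be MPI calls. The function names and the sum-of-squares task are invented for illustration.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive one chunk of numbers and send back the sum of their squares."""
    chunk = inbox.get()                       # blocking receive
    outbox.put(sum(x * x for x in chunk))     # send the partial result

def parallel_sum_of_squares(values, n_workers=4):
    """Scatter values across workers, then reduce their partial sums."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(inbox, results))
               for inbox in inboxes]
    for t in threads:
        t.start()
    # Scatter: worker number `rank` gets every n_workers-th element, starting at rank.
    for rank, inbox in enumerate(inboxes):
        inbox.put(values[rank::n_workers])
    # Reduce: collect one partial sum per worker and add them up.
    total = sum(results.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

The workers share no data apart from the messages they exchange, which is the defining feature of the message passing paradigm. Splitting the input, computing partial results, and combining them are the same steps that parallelizing a real application with MPI involves.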

We will finish our discussion with an overview of some resources available for parallel programming.