The NSF Special Report on Cyberinfrastructure states,

[E]nvironments and organizations, enabled by cyberinfrastructure, are increasingly required to address national and global priorities, such as understanding global climate change, protecting our natural environment, applying genomics-proteomics to human health, maintaining national security, mastering the world of nanotechnology, and predicting and protecting against natural and human disasters, as well as to address some of our most fundamental intellectual questions such as the formation of the universe and the fundamental character of matter.

That's how the NSF Blue Ribbon Advisory Committee in their report, Revolutionizing Science and Engineering Through Cyberinfrastructure, summarized the scientific need for cyberinfrastructure. The new capabilities are 'essential, not optional, to the aspirations of research communities,' the report states.
Indeed, the demand for cyberinfrastructure has come from research communities that recognize the many ways it will allow them to push the scientific envelope. While geoscience has been an early adopter of information technology (IT), there is potential for greater adoption of modern IT approaches and a corresponding need for training in this field.

The Cyberinfrastructure Summer Institute for Geoscientists (CSIG) has been designed to provide such training, with the goal of improving the understanding of IT concepts among geoscientists and familiarizing graduate students and researchers with new IT tools. CSIG 2005 builds upon the successful first CSIG offering in August 2004.

The week-long curriculum is focused on IT topics that serve the immediate science goals of the community. The format provides a quick introduction to key IT topics with some hands-on exercises. The overall course structure is as follows. The topic for Day 1 is Data Modeling, with emphasis on scientific data. Day 2 will introduce the concept of Web Services with hands-on exercises, and also provide an introduction to Grid Services. Day 3 will cover GIS concepts and software and will discuss Web services-based standards for GIS information. Day 4 will discuss issues related to data integration and map integration, including advanced concepts such as semantic data integration using ontologies. There will be a discussion of technologies related to Data Registration, Ontology-based Search, and Data Integration. Finally, Day 5 will provide an introduction to scientific workflow systems and their role in scientific analysis and collaboration.

The goal of the CSIG is to give geoscientists an "IT head start" and to expand the community of IT users in earth science research.

Workshop Course Schedule*
NOTE: Several of the sessions below include hands-on segments


Monday, July 18, 2005
9:00AM - 5:00PM

9:00 am Welcome & Introduction (15 minutes)


Data Management
(Instructors: Chaitan Baru, Kai Lin)

9:15 am Basics of Data Modeling
(Chaitan Baru)
  • Overview
  • Data Models
    • Scientific data
    • Observational and field data
    • Model outputs
    • Spatial data
  • Separation of application logic from data representations
  • Foundations of the relational data model: relations, attributes, keys, introduction to SQL
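As a rough illustration of the relational concepts listed for this session (relations, attributes, keys, and a first SQL query), the sketch below uses Python's standard-library sqlite3 module. The table names, column names, and data values (station, sample, "Ridge A", and so on) are hypothetical, invented for this example, and are not taken from the course materials.

```python
import sqlite3

# Hypothetical schema and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys

# Each table is a relation; each column is an attribute.
conn.execute("""
    CREATE TABLE station (
        station_id INTEGER PRIMARY KEY,   -- primary key
        name       TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE sample (
        sample_id  INTEGER PRIMARY KEY,
        station_id INTEGER NOT NULL REFERENCES station(station_id),  -- foreign key
        rock_type  TEXT NOT NULL
    )""")

conn.executemany("INSERT INTO station VALUES (?, ?)",
                 [(1, "Ridge A"), (2, "Basin B")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(10, 1, "basalt"), (11, 1, "granite"), (12, 2, "shale")])


def samples_per_station(connection):
    """Join the two relations on the key and count samples per station."""
    return connection.execute("""
        SELECT s.name, COUNT(*)
        FROM station AS s
        JOIN sample AS p ON p.station_id = s.station_id
        GROUP BY s.name
        ORDER BY s.name""").fetchall()


result = samples_per_station(conn)
print(result)
```

The query is written against the logical schema only, which is one small instance of separating application logic from the data representation.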
10:15 am

10:30 am The Relational Data Model and SQL
(Chaitan Baru)
12:00 pm

1:00 pm Modeling Scientific Data [hands-on exercise]
(Kai Lin, Viswanath Nandigam)
  • Data versus metadata
  • Entity Relationship model, UML
  • Logical and physical database design
2:45 pm

3:00 pm Modeling Scientific Data [hands-on example]
(Kai Lin)
  • XML and semi-structured data
  • XML, DTDs and XML Schemas
  • Querying and transforming XML: XPath, XQuery, XSLT
  • OWL and RDF
5:00 pm

Tuesday, July 19, 2005
8:30AM - 5:00PM


Web Services
(Instructors: Ashraf Memon, Longjiang Ding, Ghulam Memon)

8:30 am Overview
(Ashraf Memon)
  • The need for web services
  • Service-oriented architecture (SOA)
  • Core technologies: XML, SOAP, WSDL
9:00 am Establishing a foundation for web services
(Ashraf Memon)
  • XML with examples
  • The Simple Object Access Protocol (SOAP)
    • SOAP message structure
    • The message envelope, header and body
  • The WSDL
    • The role of WSDL
    • Identifying operations and messages
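To make the envelope, header, and body structure concrete, here is a minimal sketch that builds a SOAP 1.1 message with Python's standard-library xml.etree.ElementTree. The application namespace (http://example.org/geo), the GetStationName operation, and the stationId parameter are hypothetical names chosen for illustration; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/geo"  # hypothetical application namespace


def build_request(station_id):
    """Build a SOAP envelope with an empty header and a one-operation body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")   # optional metadata goes here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical operation and parameter; a WSDL would normally define these.
    operation = ET.SubElement(body, f"{{{APP_NS}}}GetStationName")
    ET.SubElement(operation, f"{{{APP_NS}}}stationId").text = str(station_id)
    return ET.tostring(envelope, encoding="unicode")


message = build_request(42)
print(message)

# Parse the message back and list the envelope's direct children.
parsed = ET.fromstring(message)
children = [child.tag for child in parsed]
```

In practice a toolkit such as Apache Axis generates and parses these messages from the WSDL, so hand-building an envelope like this is mainly useful for seeing what travels over the wire.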
9:45 am

10:00 am Creating Web services [hands-on training]
(Ashraf Memon, Ghulam Memon, Longjiang Ding)
  • Writing service classes in Java
  • Generating a web service from a class
  • Deploying web services with Apache Axis
  • Generating client files and testing them
12:00 pm

1:00 pm Creating Web services [hands-on training] (cont'd)
(Ashraf Memon, Ghulam Memon, Longjiang Ding)
  • Creating a web service for data access (ASCII files, Database)
  • Creating a web service from an existing program or command-line tool (implementing an algorithm or process)
3:00 pm

3:15 pm Hands-on training on tools to consume existing web services from WSDL
(Ashraf Memon, Ghulam Memon, Longjiang Ding)
  • Creating a web service client using the wsdl2java tool
  • Testing the generated client
4:15 pm Advanced topics and pointers to references
(Ashraf Memon)
  • Tools for Web Services (Eclipse 3.0 and others)
  • Overview of Web Services Security
  • Examples of Web Services in other languages
  • Online tutorials
  • Discussion and reading material
5:00 pm

Wednesday, July 20, 2005
8:30AM - 5:00PM


Geographic Information Systems (GIS)
(Instructors: Ilya Zaslavsky, Ashraf Memon, Ghulam Memon)

8:30 am Introduction
(Ilya Zaslavsky)
  • Basics of GIS
  • Setting up with Virtual Campus accounts
  • GIS software
  • Review of GIS principles and applications
10:15 am

10:30 am Introduction (cont'd)
(Ilya Zaslavsky, Ashraf Memon, Ghulam Memon)
  • Hands-on training on Virtual Campus courses
  • Making maps [lecture and hands-on]
    • Making simple maps with ArcGIS and other tools
12:00 pm

1:00 pm Geo-databases [lecture and hands-on]
(Ilya Zaslavsky, Ashraf Memon, Ghulam Memon)
  • Logic of GIS applications; suitability analysis/site selection.
  • Operations on maps, map queries; map combination rules, and some geometric problems of map combination.
  • Buffers and neighborhoods. Surface modeling and representation; operations on surfaces.
2:30 pm GIS Data sources and Internet mapping [lecture and hands-on]
(Ilya Zaslavsky, Ashraf Memon, Ghulam Memon)
  • GIS data sources
  • GML, Internet GIS
  • Internet map servers
3:00 pm

3:15 pm GIS Data sources and Internet mapping [lecture and hands-on] (cont'd)
(Ilya Zaslavsky, Ashraf Memon, Ghulam Memon)
5:00 pm

Thursday, July 21, 2005
8:30AM - 5:00PM


Information Integration and Knowledge Representation
(Instructors: Kai Lin, Ashraf Memon, Chaitan Baru, Doug Greer)

8:30 am Distributed Database Concepts
(Chaitan Baru, Kai Lin, Doug Greer)
  • Concepts of distributed databases
  • Federated Databases for the Geosciences
10:00 am

10:15 am Knowledge Representation
(Kai Lin)
  • Data semantics and integrity constraints
  • Knowledge representation languages: concept maps, controlled vocabularies, ontologies
  • Standards (RDF, OWL, ...)
  • Ontology-based integration
11:15 am Map Integration
(Ashraf Memon)
  • Knowledge-based integration of WMS (Web Map Service) maps
  • Integration using WFS (Web Feature Service)
  • Demonstration and discussion of GEON spatial data registration and online mapping
12:00 pm

1:00 pm Web services in GEON with example
(Kai Lin, Ashraf Memon)
  • Mapping Services
  • Ontology Services
  • Registration Services
  • Integration Services
2:30 pm Putting It All Together
(Kai Lin, Ashraf Memon)
  • Semantic data annotation and registration
  • Ontology-based data discovery, browsing, and querying
  • Hands-on Session/Demonstrations
3:00 pm

3:15 pm Putting It All Together (cont'd)
(Kai Lin, Ashraf Memon)

Web Services Security
(Instructor: Sriram Krishnan)

4:00 pm Web Services security
(Sriram Krishnan)
  • Why security matters in Web Services
  • Overview of different security models for implementation
5:00 pm

Friday, July 22, 2005
8:30AM - 3:00PM


Scientific Workflows
(Instructor: Efrat Jaeger)

8:30 am Scientific Workflows
  • Overview of Scientific Workflows
  • Introduction to the Kepler System
  • Demonstration: Workflow Creation and Execution
  • Workflow examples in Geosciences
  • Hands-on exercises (if time allows)
    • Customizing "actors" in Kepler: defining Web services-based actors and using database and command-line actors
    • Linking or chaining actors together and executing a workflow
    • Creating nested workflows
    • Modifying an existing workflow
10:15 am

Parallel Computing
(Instructor: Tim Kaiser)

10:30 am Parallel Computing
  • Definition of parallel computing
  • Advantages and disadvantages of parallel computing
  • Types of parallel computing
    • Shared memory
    • Message passing
12:00 pm

1:00 pm Parallel Computing (cont'd)
  • Introduction to Message Passing Interface (MPI)
  • An example scientific application using MPI
  • Overview of resources
3:00 pm

