
Practical 5
Aim: Study of Globus Toolkit 5.0 and new features of improved versions

The open source Globus Toolkit is a fundamental enabling technology for the "Grid," letting people share computing power, databases, and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing local autonomy. The toolkit includes software services and libraries for security, information infrastructure, resource monitoring, discovery, and management, data and file management, communication, fault detection, and portability. It is packaged as a set of components that can be used either independently or together to develop applications, and it is a substrate on which leading IT companies are building significant commercial Grid products. Such a toolkit is needed because every organization has unique modes of operation, and collaboration between multiple organizations is hindered by incompatibility of resources such as data archives, computers, and networks.

Component Details
GSI C COMPONENT

Component Overview
The Globus Toolkit Pre-Web Services Authentication and Authorization component provides APIs and tools for authentication, authorization and certificate management. The authentication API is built using Public Key Infrastructure (PKI) technologies, e.g. X.509 Certificates and TLS. In addition to authentication, it features a delegation mechanism based upon X.509 Proxy Certificates.

Authorization support takes the form of two APIs. The first provides a generic authorization API that allows callouts to perform access control based on the client's credentials (i.e. the X.509 certificate chain). The second provides a simple access control list that maps authorized remote entities to local (system) user names. This second mechanism also provides callouts that allow third parties to override the default behavior and is currently used in the Gatekeeper and GridFTP servers.

Feature summary
Features new in GT 5.0.0
Support for processing host certificates containing X.509 subjectAltName extensions with dNSName or iPAddress values.

Other Supported Features
Authentication of users using standard X.509 End Entity and Proxy Certificates.
Delegation using X.509 Proxy Certificates.
Pluggable authorization based on the client's certificate chain for GridFTP and GRAM2.
Pluggable authorization for GRAM2 based on the RSL of the job.
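The access control list described in the component overview, which maps authorized remote entities to local user names, can be pictured with a small lookup routine. The sketch below assumes the commonly seen one-entry-per-line layout of a quoted certificate subject followed by comma-separated local names; the subjects, user names, and function names are hypothetical, and this is an illustration rather than the GSI C implementation.

```python
import shlex


def parse_gridmap(text):
    """Return a dict mapping certificate subject -> list of local user names.

    Assumed layout (one entry per line): "<quoted subject>" user1[,user2...]
    Blank lines, '#' comments and malformed lines are skipped.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)
        if len(parts) != 2:
            continue
        subject, users = parts
        mapping[subject] = [u for u in users.split(",") if u]
    return mapping


def local_user(mapping, subject, requested=None):
    """Pick the local account for a subject, or None if not authorized.

    With no requested name the first mapped account is used; a requested
    name is honoured only if it is mapped to that subject.
    """
    users = mapping.get(subject)
    if not users:
        return None
    if requested is None:
        return users[0]
    return requested if requested in users else None


# Hypothetical entries, for illustration only.
SAMPLE = '''
# subject                               local account(s)
"/O=Grid/OU=Example/CN=Alice Smith" alice
"/O=Grid/OU=Example/CN=Bob Jones" bjones,bob
'''

if __name__ == "__main__":
    entries = parse_gridmap(SAMPLE)
    print(local_user(entries, "/O=Grid/OU=Example/CN=Bob Jones"))
```

An unknown subject, or a request for an account the subject is not mapped to, yields None, which corresponds to denying access.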

Technology dependencies
The GSI C component depends on the following GT components:

C Common Libraries

The GSI C component depends on the following 3rd party software:

OpenSSL

Tested platforms
Tested platforms for GSI C:

i386 Linux

Backward compatibility summary

Protocol changes in GSI C since GT 4.2.x

None

API changes since GT 4.2.x

None

Exception changes since GT 4.2.x

Not applicable

Schema changes since GT 4.2.x

Not applicable

Associated Standards
Associated standards for GSI C:

RFC 3820 Proxy Certificates
RFC 2744 GSSAPI: C-bindings
RFC 2743 GSSAPI
GSSAPI Extensions
RFC 2246 TLS

GSI-OPENSSH COMPONENT

Component Overview
GSI-OpenSSH is a modified version of OpenSSH that adds support for X.509 proxy certificate authentication and delegation, providing a single sign-on remote login and file transfer service. GSI-OpenSSH can be used to log in to remote systems and transfer files between systems without entering a password, relying instead on a valid proxy credential for authentication. GSI-OpenSSH forwards proxy credentials to the remote system on login, so commands requiring proxy credentials (including GSI-OpenSSH commands) can be used on the remote system without the need to manually create a new proxy credential on that system.

Feature summary
Features new in GT 5.0.0

None.

Other Supported Features
The gsissh command provides a secure remote login service with forwarding of X.509 proxy credentials.
The gsiscp and gsisftp commands provide a secure file transfer service authenticated with X.509 proxy credentials, mimicking the rcp/scp and ftp/sftp commands.
All standard OpenSSH features are supported, excluding Kerberos authentication. Kerberos authentication is not compatible with GSI-enabled OpenSSH.
The GSI-OpenSSH server can replace the standard system SSH server in typical environments.
If no username is given on the command-line, GSI-OpenSSH automatically determines the username that corresponds to the X.509 proxy certificate subject in the server's grid-mapfile.

Summary of Changes in GSI-OpenSSH
GT 5.0.0 contains GSI-OpenSSH version 4.7. GSI-OpenSSH clients now attempt only GSI authentication by default, rather than the confusing behavior of attempting other SSH authentication methods when GSI authentication fails. (The GSI-OpenSSH server still supports both GSI and non-GSI authentication methods by default, for compatibility with both GSI and non-GSI clients.) See the GSI-OpenSSH Release History for more details on this and other GSI-OpenSSH versions.

Technology dependencies
GSI-OpenSSH depends on the following GT components:

GSI C

GSI-OpenSSH depends on the following 3rd party software:

OpenSSH

Tested platforms
Tested Platforms for GSI-OpenSSH:

Mac OS X 10.5
x86/x86_64 GNU/Linux
PPC AIX 5.3
Sun4u Solaris 5.10

Backward compatibility summary
GSI-OpenSSH is backward compatible.

SIMPLECA COMPONENT

Component Overview
SimpleCA provides a simple implementation of a certification authority which can issue X.509 certificates to Globus Toolkit users and services.

Feature Summary
Features new in GT 5.0.0

None

Other Supported Features


Easy creation of X.509 certificates for use with the Globus Toolkit
Easy creation of GPT packages for the created SimpleCA

Summary of Changes in SimpleCA
Other than bug fixes, no changes have occurred for SimpleCA since the last stable release, 4.2.x.

Technology Dependencies
SimpleCA depends on the following GT components:

Non-WS Authentication and Authorization

SimpleCA depends on the following 3rd party software:

OpenSSL

Backward Compatibility Summary

Protocol changes for SimpleCA since GT 4.2.x

Not applicable

API changes since GT 4.2.x

Not applicable

Exception changes since GT 4.2.x

Not applicable

Schema changes since GT 4.2.x

Not applicable

MYPROXY COMPONENT

Component Overview
MyProxy is open source software for managing X.509 Public Key Infrastructure (PKI) security credentials (certificates and private keys). MyProxy combines an online credential repository with an online certificate authority to allow users to securely obtain credentials when and where needed. Users run myproxy-logon to authenticate and obtain credentials, including trusted CA certificates and Certificate Revocation Lists (CRLs).

Feature summary
Supported Features
Users can obtain certificates and trust roots from the MyProxy CA using myproxy-logon.
Users can store and retrieve multiple X.509 proxy credentials using myproxy-init and myproxy-logon.
Users can store and retrieve multiple X.509 end-entity credentials using myproxy-store and myproxy-retrieve.
Users and administrators can manage trust roots (CA certificates and CRLs) using myproxy-logon and myproxy-get-trustroots.
Administrators can load the repository with X.509 end-entity credentials on the users' behalf using myproxy-admin-load-credential.
Administrators can use the myproxy-admin-adduser command to create user credentials and load them into the MyProxy repository.
Administrators can use the myproxy-admin-addservice command to create host credentials and load them into the MyProxy repository.
Users and administrators can set access control policies on the credentials in the repository.
If allowed by policy, job managers (such as Condor-G) can renew credentials before they expire.
The MyProxy server enforces local site passphrase policies using a configurable external call-out.

Summary of Changes in MyProxy
GT 5.0.0 contains MyProxy v5.0. MyProxy support for managing trust roots (CA certificates and CRLs) has improved, including a new myproxy-get-trustroots command. See the MyProxy Release Notes for more details on this and other MyProxy versions.

Technology dependencies
MyProxy depends on the following GT component:

GSI C

Tested platforms
Tested Platforms for MyProxy:

Mac OS X 10.5
x86/x86_64 GNU/Linux
PPC AIX 5.3
Sun4u Solaris 5.10

Backward compatibility summary
All MyProxy versions are fully backwards compatible.

Associated Standards
Associated standards for MyProxy:

GFD-E.054 MyProxy Protocol
RFC 3820 Proxy Certificates
RFC 2246 TLS

GRIDFTP COMPONENT

Component Overview
GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks. The GridFTP protocol is based on FTP, the highly popular Internet file transfer protocol. We have selected a set of protocol features and extensions defined already in IETF RFCs and added a few additional features to meet requirements from current data grid projects.

Feature Summary
Features new in GT 5.0.0:

Improved failure restart capability in globus-url-copy:

A new option to store untransferred urls for later restarting is available. In case of any failures, this option allows users to restart transfers from a checkpoint rather than restarting from scratch. This option can be used for directory transfers, single file transfer or a list of files transfer.

Stall detection:

It is possible that the transfer hangs due to filesystem errors, network errors or GridFTP server errors. A new option is available in globus-url-copy to specify how long before canceling/restarting a transfer with no data movement.

Client-side host aliasing:

This allows concurrent transfers to be extended to multiple different hosts rather than multiple connections to the same host, without relying on DNS.

Features that continue to be supported from previous versions

SSH security for GridFTP control channel
Running the GridFTP server with GFork
GridFTP Multicasting / Network overlays
Netlogger's bottleneck detection for GridFTP transfers
GSI security: This is the PKI based, de facto standard security system used in Grid applications. Kerberos is also possible but is not supported and can be difficult to use due to divergence in the capabilities of GSI and Kerberos.
Third-party transfers: Very common in Grid applications, this is where a client mediates a transfer between two servers (both likely at remote sites) rather than between the server and itself (called a client/server transfer).
Cluster-to-cluster data movement or Striping: GridFTP can do coordinated data transfer by using multiple computer nodes at the source and destination.
Partial file access: Regions of a file may be accessed by specifying an offset into the file and the length of the block desired.
Reliability/restart: The receiving server periodically (the default is 5 seconds, but this can be changed) sends restart markers to the client. Each marker is a message specifying which bytes have been successfully written to the disk. If the transfer fails, the client may restart the transfer and provide these markers (or an aggregated equivalent marker), and the transfer will pick up where it left off. This can include holes in the file.
Large file support: All file sizes, lengths, and offsets are 64 bits in length.
Data channel reuse: The data channel can be held open and reused if the next transfer has the same source, destination, and credentials. This saves the time of connection establishment, authentication, and delegation. This can be a huge performance difference when moving lots of small files.
Parallel transfers (Multiple TCP streams between a pair of hosts).
TCP Buffer size control (Protocol supports Manual and Automatic; Only Manual Implemented).
Server-side computation (Extended Retrieve (ERET) / Extended Store (ESTO) commands).
Based on Standards: RFC 959, RFC 2228, RFC 2389, IETF Draft MLST-16, GGF GFD.020.
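The reliability/restart feature relies on aggregating restart markers and resuming only what is missing, including holes in the middle of a file. A minimal sketch of that bookkeeping follows. It assumes each marker is reduced to a (start, end) byte range with an exclusive end; that tuple representation and the function names are illustrative choices, not the GridFTP wire format.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) byte ranges; end is exclusive."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(written, file_size):
    """Return the byte ranges that still have to be transferred."""
    gaps = []
    position = 0
    for start, end in merge_ranges(written):
        if start > position:
            gaps.append((position, start))  # a hole before this range
        position = max(position, end)
    if position < file_size:
        gaps.append((position, file_size))  # the untransferred tail
    return gaps


if __name__ == "__main__":
    markers = [(0, 100), (300, 400), (100, 200), (350, 500)]
    print(merge_ranges(markers))         # aggregated equivalent marker
    print(missing_ranges(markers, 600))  # what a restart would request
```

Merging first gives the "aggregated equivalent marker" the text mentions; the gaps are what a restarted transfer would still need to send.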

Other Supported Features
On the client side we provide a scriptable tool called globus-url-copy. This tool can take advantage of all the GridFTP protocol features and can also do protocol translation between FTP, HTTP, HTTPS, and POSIX file IO on the client machine. We also provide a set of development libraries and APIs for developers wishing to add GridFTP functionality to their application.

Summary of Changes in GridFTP

The default flavor of the GridFTP server has been changed to non-threaded.

Technology dependencies
GridFTP depends on the following GT components:

Non-WS (General) Authentication & Authorization
C Common Libraries
XIO

GridFTP depends on the following 3rd party software:

OpenSSL (version is included in release)

Tested platforms
Tested platforms for GridFTP:

i386 Linux
ia64 Linux (TeraGrid)
AIX 5.2
Solaris 9
PA-RISC HP/UX 11.11
ia64 HP/UX 11.22
Tru64 Unix
Mac OS X

While the above list includes platforms on which we have tested GridFTP, it does not imply support for a specific platform. However, we are interested in hearing reports of success or bug reports on any platform.

Backward compatibility summary
Protocol changes since GT 4.2.x

None

API changes since GT 4.2.x

None

Exception changes since GT 4.2.x

Not Applicable (GridFTP is not Java-based)

Schema changes since GT 4.2.x

Not Applicable (GridFTP is not SOAP-based)

Associated Standards
Associated standards for GridFTP:

RFC 959 Base FTP protocol
RFC 2228 GSSAPI security extensions for FTP
RFC 2389 FEAT, OPTS, etc.
Extensions to FTP (IETF FTP Working Group draft) for structured directory listings and the SIZE and MDTM commands
GFD.020 GridFTP extensions

REPLICA LOCATION COMPONENT

Component Overview
The Replica Location Service (RLS) is a server that provides for the registration and lookup of replica information. Within the RLS, there are two types of services, a catalog service and an index service.

Feature summary
Features New in GT 5.0.0

None since GT 4.2.1.

Other Supported Features
Comprehensive C library for replica registration, replica lookup, replica attributes, index queries, and administrative tasks.
Command line (globus-rls-cli) tool for client operations on catalogs and indexes.
Command line (globus-rls-admin) tool for administrative tasks.

Summary of Changes in RLS

Streamlined startup for RLS.

When the RLS server was started, initialization previously took anywhere from several seconds to minutes, depending on the number of entries in the RLS database. During this time, users could not issue queries to the RLS database. The streamlined startup feature allows users to issue read-only queries to the RLS during initialization. This is achieved by creating Bloom filters during the initialization, in a separate thread, and disallowing queries that update the database, so as not to interfere with the Bloom filter creation.

Improved support for 64-bit operating systems and better compliance with ODBC specifications.

Backward compatible with GT 4 RLS protocols, APIs, command-line interfaces, and databases.
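The Bloom filters mentioned under streamlined startup are compact bit arrays that answer "might this name be present?" without storing the names themselves. The sketch below shows the general technique only. The bit-array size, the number of hash functions, the use of SHA-256, and the sample logical file name are all assumptions made for illustration; how RLS actually sizes and hashes its filters is not described here.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    catalog_summary = BloomFilter()
    catalog_summary.add("lfn://example/run42.dat")  # hypothetical logical file name
    print(catalog_summary.might_contain("lfn://example/run42.dat"))
```

Because adding an entry only sets bits, a filter can be built incrementally in a background thread while read-only lookups continue, which fits the behavior described above.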

Technology dependencies
RLS depends on the following GT components:

globus_core
globus_common
globus_io
globus_gssapi_gsi
globus_usage

RLS depends on the following 3rd party software:

RDBMS: SQLite*, MySQL, PostgreSQL, or Oracle
ODBC manager: iODBC, unixODBC
ODBC driver: SQLite-ODBC*, MyODBC, psqlODBC, or Oracle

* The RLS comes installed with and configured to use these components.

Tested platforms
Tested platforms for RLS include Debian Lenny on AMD64, and CentOS 5.3 on AMD64.

Backward compatibility summary
Protocol changes since GT 4.2.x

None

API changes since GT 4.2.x

None

Exception changes since GT 4.2.x

None

Schema changes since GT 4.2.x

None

Associated Standards
Associated standards for RLS: The RLS is implemented as a conventional service and, as such, does not conform to the WSRF or other WS set of specifications.

GRAM5 COMPONENT

Component Overview
The Grid Resource Allocation and Management (GRAM5) component is used to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM5 is not a job scheduler, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM5 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.

Feature summary
Features new since 4.2.x
Server-side architectural changes to improve scalability, performance, and reliability
Improved error notification protocol compared to GRAM2
TeraGrid Gateway Identity support for job auditing
Usage stats messages
Added support for Sun Grid Engine (SGE)

Other Standard Supported Features


Remote job execution and management
Uniform and flexible interface to local resource managers
File staging before and after job execution
File and directory clean up after job termination
Service auditing for each submitted job

Removed Features
The GRAM5 client tools have dropped support for the DUROC API for task co-allocation
The GRAM5 service no longer streams output and error during job execution; instead this data is sent after the job terminates
The GRAM5 service no longer provides intra-job communication via the DUCT API
GRAM5 does not rely on XML schemas and WSDL service definitions

Summary of Changes in GRAM5
GRAM5 represents a significant improvement over the GRAM2 and GRAM4 service implementations. GRAM2's limitation is scalability; GRAM4's is reliability. GRAM5 is both reliable and scalable. It is important to note that GRAM5 is GRAM2 compatible. There are other improvements as well, such as completely rewritten service logging based on the CEDPS logging best practices, TeraGrid Gateway Identity support for job auditing, support for job exit codes, and usage stat support. We have been very encouraged by our performance results, which show more than 10x the scalability of GRAM2 and a roughly 10x reduction in resource consumption on the service host. We welcome your feedback as you integrate GRAM5 into your production grids.

Technology dependencies
GRAM depends on the following GT components:

Globus Common
GSI C
GridFTP server

Tested platforms
Tested platforms for GRAM5:

Linux
o CentOS 5.3 x86_64
o Debian 4.0 x86_64
Mac OS X
o Mac OS X 10.5.8

Backward compatibility summary
Protocol changes in GRAM since the GT 4.2.x series:
The GRAM5 service uses a superset of the GRAM2 protocol for communication between the client and service. The extensions supported in GRAM5 are implemented in such a way that they are ignored by GRAM2 services or clients. These extensions provide improved error messages and version detection.
GRAM5 does not support task co-allocation using DUROC and its related protocols. Jobs submitted using DUROC directives will fail.
GRAM5 does not support file streaming. The standard output and standard error streams are sent after the job completes instead of during execution.

Associated Standards
None

C COMMON LIBRARIES COMPONENT

Component Overview
The C Common Libraries provide an abstraction layer for data types, libc system calls, and data structures used throughout the Globus Toolkit and useful for applications that use the Globus Toolkit.

Feature summary
Features new in release GT 5.0.0:
globus_range_list abstraction added
globus_logging abstraction added
globus_options added. This is common code for parsing options from the command line, environment variables, or configuration files.

Summary of Changes in C Common Libraries

No significant changes have happened for C Common Libraries since GT 4.2.x.

Technology dependencies
C Common Libraries only depend on the globus_core module.

Tested platforms
The C common libraries work on any platform on which the toolkit is supported.

Backward compatibility summary
API changes since GT version 4.2.x

globus_range_list abstraction added
globus_logging abstraction added

All of the GT 3.2 API is still functional in GT 5.0.0.

Associated Standards
There are no standards implemented by the C common libraries.

METRICS REPORTING COMPONENT

Why are we doing this?
The Globus Alliance receives support from government funding agencies. In a time of funding scarcity, these agencies must be able to demonstrate that the scientific community is benefiting from their investment. To this end, we want to provide generic usage data about such things as the following:

how many people use GridFTP
how many jobs run using GRAM
how many GT4 web services containers are running

To this end, we have added support to the Globus Toolkit that will allow installations to send us generic usage statistics. By participating in this project, you help our funders to justify continuing their support for the software on which you rely.

The overview
Components affected for GT 4.0 are:
o GridFTP
o Java WS Core
o C WS Core
o WS GRAM
o Reliable File Transfer (RFT) service
o RLS

The data sent is as generic as possible.
Every component affected has a section titled "Usage Statistics" in its Users and Admin guides that lists precisely what is sent and the configuration control that is available (which you can use to disable the ability to send the data).
To make this a win-win proposition, the receiver for the data is made available from CVS. This means that a (virtual) organization could set up its own listener and collect organization-wide usage statistics.

What is sent?
The components affected for GT 4.0 are GridFTP, RLS, Java WS Core, C WS Core, WS GRAM, and the Reliable File Transfer (RFT) Service. We send the "how much" data, not "the what" data. For instance, GridFTP sends the number of bytes, how long the transfer took, how many streams were used, etc. It does NOT send filenames, usernames, or even the destination IP, since sending it would mean the source site deciding to disclose information about the destination site. Each component has a section in its Users and Admin guides listing what component-specific data is sent, and the Admin guide explains configurations related to the usage statistics. Links to these sections are provided here:

Java Core
WS GRAM
RFT
GridFTP
RLS

Header data that may be sent by every component, not including the component-specific data listed above, is:

Component identifier
Usage data format identifier
Time stamp
Source IP address
Source hostname (to differentiate between hosts with identical private IP addresses)
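To make the header fields above concrete, the sketch below packs them into one datagram and sends it without waiting for any reply, which is the fire-and-forget style that a single UDP packet allows. The byte layout, the numeric identifiers, and the function names are hypothetical and chosen only for illustration; the real packet format is defined by the toolkit's usage statistics documentation, not by this example.

```python
import socket
import struct
import time

# Hypothetical layout (network byte order): component id (2 bytes),
# usage data format id (2 bytes), timestamp (4 bytes), IPv4 address (4 bytes),
# followed by a 1-byte hostname length, the hostname, then component data.
HEADER = struct.Struct("!HHI4s")


def build_usage_packet(component_id, format_id, source_ip, hostname,
                       payload=b"", now=None):
    """Pack the generic header fields plus component-specific payload."""
    timestamp = int(time.time()) if now is None else int(now)
    header = HEADER.pack(component_id, format_id, timestamp,
                         socket.inet_aton(source_ip))
    name = hostname.encode("ascii")  # must be at most 255 bytes
    return header + bytes([len(name)]) + name + payload


def parse_usage_packet(packet):
    """Inverse of build_usage_packet, as a collector might apply it."""
    component_id, format_id, timestamp, raw_ip = HEADER.unpack_from(packet, 0)
    offset = HEADER.size
    name_len = packet[offset]
    name_end = offset + 1 + name_len
    return {
        "component_id": component_id,
        "format_id": format_id,
        "timestamp": timestamp,
        "source_ip": socket.inet_ntoa(raw_ip),
        "hostname": packet[offset + 1:name_end].decode("ascii"),
        "payload": packet[name_end:],
    }


def send_usage_packet(packet, collector):
    """Send one datagram to collector=(host, port); never raise on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(packet, collector)
        except OSError:
            pass  # losing a statistic must not disturb the actual task
```

Swallowing send errors mirrors the trade-off the text describes: some data may be lost, but reporting cannot interfere with the job or transfer itself.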

How is the data sent?
The messages are sent as a single UDP packet. While this may cause us to lose some data, it drastically reduces the possibility that the usage statistics reporting can adversely affect the operation of the software.

When is the data sent?
Once per "task" (GridFTP transfer, GRAM job, container invocation, etc.), either immediately upon startup or at completion of the task.

What will the data be used for?

The data will be used for answering questions such as:


How many jobs were run with GRAM last month?
How many gigabytes of data has GridFTP moved?

We will also try and mine the data to answer operational questions such as:

What percentage of the jobs run complete successfully?
Of the ones that fail, what is the most common fault code returned?

The data will NOT be used to answer questions such as "IP 123.456.789.012 sent 10 TB of data last month." Our intent is to make the data that we get generic enough that we do not have to worry what is done with it. We record the IP only for counting purposes to know how many sites there are, but we will not produce site-specific statistics.

EXTENSIBLE IO COMPONENT

Component Overview
Globus XIO is an extensible input/output library written in C for the Globus Toolkit. It provides a single API (open/close/read/write) that supports multiple wire protocols, with protocol implementations encapsulated as drivers. The XIO drivers distributed with 5.0.0 include TCP, UDP, file, HTTP, GSI, GSSAPI_FTP, TELNET and queuing. In addition, Globus XIO provides a driver development interface for use by protocol developers. This interface allows the developer to concentrate on writing protocol code rather than infrastructure, as XIO provides a framework for error handling, asynchronous message delivery, timeouts, etc. The XIO driver-based approach maximizes the reuse of code by supporting the notion of a driver stack. XIO drivers can be written as atomic units and stacked on top of one another. This modularization provides maximum flexibility and simplifies the design and evaluation of individual protocols.

Feature summary
Features new in release 5.0.0
Driver specific string attributes. Set values like TCP buffer size via string at runtime.
UDT driver.
Mode E Driver
Telnet Driver
Queuing Driver
Ordering Driver
Dynamically loadable drivers.

Other Supported Features


Single API to swappable IO implementations.
Asynchronous IO support.
Native timeout support.
Data descriptors for providing driver specific hints.
Modular driver stacks to maximize code reuse.
TCP, UDP, file, HTTP, telnet, mode E, GSI drivers.
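The driver-stack idea can be illustrated independently of the C library. In the conceptual sketch below, one transport driver sits at the bottom and transform drivers are pushed on top of it, while the caller only ever uses read and write on the top of the stack. The class names and the choice of compression and base64 as example transforms are hypothetical; this is not the Globus XIO API, only a picture of how stacking composes behavior.

```python
import base64
import zlib


class MemoryTransport:
    """Bottom-of-stack driver: keeps bytes in memory instead of a socket."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data

    def read(self):
        data, self.buffer = bytes(self.buffer), bytearray()
        return data


class TransformDriver:
    """Driver that sits above another driver and rewrites passing data."""

    def __init__(self, below):
        self.below = below

    def write(self, data):
        self.below.write(self.encode(data))

    def read(self):
        return self.decode(self.below.read())

    def encode(self, data):
        return data

    def decode(self, data):
        return data


class CompressDriver(TransformDriver):
    def encode(self, data):
        return zlib.compress(data)

    def decode(self, data):
        return zlib.decompress(data)


class Base64Driver(TransformDriver):
    def encode(self, data):
        return base64.b64encode(data)

    def decode(self, data):
        return base64.b64decode(data)


def build_stack(transport, driver_classes):
    """Push drivers bottom-up; the last class listed ends up on top."""
    handle = transport
    for driver_class in driver_classes:
        handle = driver_class(handle)
    return handle


if __name__ == "__main__":
    transport = MemoryTransport()
    handle = build_stack(transport, [Base64Driver, CompressDriver])
    handle.write(b"hello grid " * 4)
    print(handle.read())
```

Swapping the transport or reordering the transforms changes what goes over the wire without touching the calling code, which is the reuse argument made for driver stacks above.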

Deprecated Features

GSSAPI_FTP driver now distributed with the GridFTP Server

Summary of Changes in XIO


The TCP driver has been modified to randomize the selection of ephemeral ports.
Minor code cleanups.

Technology dependencies
XIO depends on the following GT components:

Globus Core
Globus Common
Globus GSSAPI

Tested platforms
Tested Platforms for XIO:

Linux
o Mandrakelinux release 10.1
o SuSE Linux 9.1 (i586)
o Debian GNU/Linux 3.1
o Red Hat Linux release 9
SunOS
o SunOS 5.9 sun4u sparc SUNW,Sun-Fire-280R
MacOS
o Darwin Kernel Version 7.9.0

New Features in improved versions of Globus toolkit 5.0


Globus Toolkit 5.0.1 New Features:

GridFTP
o New globus-url-sync command for syncing individual files or directories
o New server option to control the default permissions of created files
o New server option to time out on slow or hanging filesystems
o New server logging level to include transfer statistics
GRAM5
o Improved reliability with Condor-G clients
o Fixed a number of bugs and memory leaks
MyProxy
o Updated to MyProxy v5.1
GSI-OpenSSH
o Updated to GSI-OpenSSH v5.2
GSI
o Added OpenSSL 1.0.0 Support

Globus Toolkit 5.0.2 New Features:

GridFTP
o Synchronization (globus-url-copy -sync) feature that transfers files only if they do not exist at the destination or differ from the source
o An offline mode for the server
GRAM5
o Improvements have been made to address all the known blocker issues for production deployment on TeraGrid and OSG
MyProxy
o Updated to MyProxy v5.2
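The synchronization rule for 5.0.2 (transfer a file only if it is missing at the destination or differs from the source) can be sketched as a simple planning step. What counts as "differs" is an assumption here: the sketch treats a different size or a newer source modification time as a difference, and the record type, function names, and file names are hypothetical. The actual comparison criteria used by globus-url-copy are not stated in this document.

```python
from collections import namedtuple

# Hypothetical per-file metadata record used only for this illustration.
FileInfo = namedtuple("FileInfo", ["size", "mtime"])


def needs_transfer(source, destination):
    """Decide whether one file must be copied.

    destination is None when the file does not exist at the destination.
    """
    if destination is None:
        return True
    return source.size != destination.size or source.mtime > destination.mtime


def plan_sync(source_files, dest_files):
    """Return the sorted names of files that a sync run would transfer."""
    return sorted(
        name
        for name, info in source_files.items()
        if needs_transfer(info, dest_files.get(name))
    )


if __name__ == "__main__":
    source = {
        "a.dat": FileInfo(100, 10),
        "b.dat": FileInfo(200, 20),
        "c.dat": FileInfo(300, 30),
    }
    destination = {
        "a.dat": FileInfo(100, 10),  # identical: skipped
        "b.dat": FileInfo(150, 20),  # size differs: transferred
    }
    print(plan_sync(source, destination))
```

Files that exist only at the destination are left alone in this sketch, since the feature as described only decides which source files to send.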

Globus Toolkit 5.0.3

GridFTP
o Added new command: Data Channel Security Context (DCSC). Useful for third-party transfers between GridFTP servers that use different CA certificates.
o Added GridFTP server chrooting, which allows an admin to limit the directories a GridFTP server can access
o Added command strings for the '-disable-command-list' option for GridFTP server configuration
o Added progress markers for stream mode
GRAM5
o Fixed a variety of bugs: PBS and Condor specific, improved reliability, improved usability
o Fixed bugs preventing build on Solaris
MyProxy
o Updated to MyProxy version v5.3
GSI-Enabled OpenSSH
o Updated to gsissh version 5.2

Globus Toolkit 5.0.4

GridFTP
o Fixed GridFTP server bugs in striped or split configurations related to DCSC, setting driver stacks, and hanging backend processes.
o Fixed globus-url-copy bugs related to sync operations.
o Added globus-gridftp-server options -dc-default and -fs-default to make configurable the default XIO driver stack for the network and filesystem. Previously, only the client was able to set this in a session. This makes it possible to always use drivers such as the rate limiting driver or the netlogger driver.
GRAM5
o Added RSL attributes to make it easier to debug an individual GRAM job.
o Added unique job manager log file names so that all log files for all users can be written to a central location instead of each user's home directory.
MyProxy
o Updated to MyProxy version 5.4
