
CAPTOOLS PROJECT FINAL REPORT: EVALUATION AND APPLICATION OF THE COMPUTER AIDED PARALLELISATION TOOLS

David O'Neal, NCSA/UIUC
R. Luczak, Rice University
M. White, Ohio Aerospace Institute

 


We begin this final report with a brief overview of Amdahl's Law of Parallelization. This section serves two purposes: it defines basic terms used throughout the article, and it establishes a method for evaluating the effectiveness of CAPTools-generated parallel source codes. Notes describing the basic care and feeding of the software (Usage) lead into a discussion of more severe limitations (Caveats). Fundamental descriptions of each evaluation code precede more detailed observations regarding scalability (Applications). Significant findings are then summarized and final recommendations are made in closing.

 

Introduction:

This report extends a previous study of the Computer Aided Parallelisation Tools software package (http://captools.gre.ac.uk/) developed by the University of Greenwich, London [1]. Product effectiveness, deployment and training requirements, and the more general issue of continued support for the project are all addressed by this installment.

Scalability and the level of effort required to achieve it are considered first. An informative inventory of essential features and basic usage strategies follows directly. A review of project accomplishments leads into a discussion of appropriate courses of action, and key recommendations are made in closing.

 

Objectives:

The goal of this project is to evaluate the potential of the Computer Aided Parallelisation Tools software by examining its effectiveness when applied to a representative set of DoD and University research codes.

 

Methods/procedures/apparatus:

A set of application codes was identified for testing various aspects of the CAPTools product. In some cases, special treatment was required to accommodate the input format supported by the CAPTools DoD-2.1Beta release. All experiments were performed on the Origin 2000 machines at ASC MSRC and NCSA. Basic approaches and caveats associated with the use of the tool are listed. More comprehensive descriptions of the applications are included. Scalability data through 64 processors is presented. Efficiencies of the resultant codes (percentage of operations parallelized) are estimated through the application of Amdahl's Law.
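As an illustration of how such an estimate can be derived, the sketch below inverts Amdahl's Law, S(N) = 1 / ((1 - p) + p/N), to recover the parallel fraction p from a speedup S measured on N processors. The function names and the sample speedup are hypothetical and are not measurements from this study.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's Law for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's Law: estimate p from a speedup measured on n > 1 processors."""
    if n <= 1:
        raise ValueError("at least two processors are needed to estimate p")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Hypothetical measurement: a speedup of 16 observed on 64 processors.
p = parallel_fraction(16.0, 64)
print(f"estimated parallel fraction: {p:.4f}")
print(f"speedup predicted on 64 processors: {amdahl_speedup(p, 64):.1f}")
```

Under this model, even a code that is roughly 95% parallel reaches only a quarter of the ideal speedup on 64 processors, which is why the estimated fraction is a more telling figure than raw timings.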

 

Results and Discussion:

The CAPTools product was evaluated with respect to its ability to deal with two significant DoD codes and two academic research programs. A total of 15 cases were considered, including three datasets produced by the Ohio Supercomputer Center. At least some measure of speedup was observed for all seven of the CAPTools-generated OpenMP codes, while only five of the eight CAPLib experiments met with similar results (speedup data for the other three cases was not presented). The effort required to achieve these results varied from case to case, but the OpenMP tests were consistently less demanding, both in terms of required parallel programming expertise and time to complete.

The memory reduction feature could not be applied to one of the test codes (PFEM) due to the presence of an unstructured mesh. In another case (FDL3DI), attempts to apply the feature resulted in segmentation faults. When memory reduction is not (or cannot be) applied, the entire dataset is replicated across the partition.
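The practical consequence of replication can be seen with simple arithmetic. The sketch below is a simplified model, not a description of how CAPTools lays out memory: it assumes an even split under memory reduction and ignores halo or overlap regions. The function name and the 8 GiB dataset size are hypothetical.

```python
def per_process_bytes(dataset_bytes, n_procs, memory_reduction):
    """Approximate bytes held by each process (simplified; halo regions ignored)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if memory_reduction:
        # Even split across processes, rounded up to a whole byte.
        return -(-dataset_bytes // n_procs)
    # Without memory reduction every process holds a full copy.
    return dataset_bytes


DATASET = 8 * 2**30  # hypothetical 8 GiB dataset
for n in (1, 16, 64):
    replicated = per_process_bytes(DATASET, n, memory_reduction=False)
    reduced = per_process_bytes(DATASET, n, memory_reduction=True)
    print(n, replicated // 2**20, "MiB replicated,", reduced // 2**20, "MiB reduced")
```

In the replicated case the per-process footprint stays constant as processors are added, so the aggregate memory demand grows linearly with the size of the partition.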

The CAPLib model requires much more from the user in every respect. Detailed knowledge of the input source code is presumed. Issues related to data distribution must be considered prior to any other parallelization steps. A thorough understanding of parallel programming concepts is required in order to guide CAPTools through the process.

The effectiveness of the CAPLib model is also highly dependent on the presence of a well-defined mesh. Our work with the N-Body code made this perfectly clear. In contrast, OpenMP codes are not subject to this restriction, but are instead dependent on the presence of large arrays. Results for the OpenMP version of N-Body were exceptional.

For two of our test codes, CAPTools implemented block-oriented communications as loops around single element transmissions. This limitation is reportedly slated for improvement in the next release. Until then, users may need to modify message passing output source codes by hand in order to achieve acceptable levels of performance.
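A simple latency-plus-bandwidth cost model suggests why element-by-element transmission is expensive. The sketch below is only a model: the latency, bandwidth, and message sizes are hypothetical round numbers, not measurements from the Origin 2000, and the function does not correspond to any CAPLib call.

```python
def transfer_time(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Time to send n_messages of equal size, each paying a fixed startup latency."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)


# Hypothetical parameters chosen only for illustration.
N_ELEMENTS = 10_000   # array elements to transmit
WORD_BYTES = 8        # one double-precision value
LATENCY_S = 10e-6     # per-message startup cost in seconds
BANDWIDTH = 100e6     # sustained bytes per second

# Loop around single-element sends: one message per element.
elementwise = transfer_time(N_ELEMENTS, WORD_BYTES, LATENCY_S, BANDWIDTH)
# One block send carrying the whole array.
block = transfer_time(1, N_ELEMENTS * WORD_BYTES, LATENCY_S, BANDWIDTH)

print(f"element-wise: {elementwise * 1e3:.2f} ms, block: {block * 1e3:.2f} ms")
```

With these assumed values the element-wise loop pays the startup latency 10,000 times, so aggregating the transfer into one block is more than 100 times faster in the model.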

We also observed in some cases that CAPTools did not recognize situations that called for the use of collective communication calls. Instead, less efficient code was written in terms of CAPLib primitives. This is a much more difficult problem that is not currently scheduled for near-term improvement. Here we have indication of a long-term need to understand the CAPLib interface and implement optimizations explicitly when working with message passing models.
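To illustrate what is lost when a collective operation is written in terms of point-to-point primitives, the sketch below counts communication steps for a global sum. It simulates the two patterns on a plain Python list and does not use CAPLib or MPI. The tree pattern is an assumption about how collectives are commonly implemented, and the function names are hypothetical.

```python
def linear_reduce(values):
    """Root receives one partial value at a time: len(values) - 1 sequential steps."""
    total = values[0]
    steps = 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_reduce(values):
    """Pairwise combination: about log2(len(values)) rounds of concurrent exchanges."""
    vals = list(values)
    n = len(vals)
    rounds = 0
    stride = 1
    while stride < n:
        # Every pair in this round could exchange data at the same time.
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds


partials = list(range(1, 65))  # one hypothetical partial sum per processor
print(linear_reduce(partials))
print(tree_reduce(partials))
```

Both patterns produce the same sum, but on 64 processors the linear version serializes 63 receives at the root while the tree version completes in 6 rounds.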

 

Conclusions and Recommendations:

Users faced with the challenge of parallelizing a FORTRAN 77 code should certainly consider the CAPTools package first. Work may be required to successfully load or analyze any given input file, but once accomplished, the value of the information represented by the Call Graph, the Dependency Graph and the Directives Browser is significant. Transformation Menu options may also be used to effect serial optimizations such as loop interchanges, splits and skews. The effort required to achieve this minimal level of progress is considered worthwhile for any porting project.
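As a small illustration of why a loop interchange can matter, the sketch below lists the memory offsets touched when a two-dimensional array stored in column-major (Fortran) order is traversed with either index innermost. It is a model of access order only, it does not invoke CAPTools, and the function name is hypothetical. An interchange is legal only when no data dependence forbids it, which is the kind of information a dependency analysis provides.

```python
def column_major_offsets(n_rows, n_cols, inner):
    """Flat offsets visited for A(i, j) in column-major storage.

    inner selects which index varies fastest in the loop nest: "i" or "j".
    """
    if inner == "i":
        # j outer, i inner: consecutive offsets (unit stride).
        return [i + j * n_rows for j in range(n_cols) for i in range(n_rows)]
    if inner == "j":
        # i outer, j inner: jumps of n_rows between consecutive accesses.
        return [i + j * n_rows for i in range(n_rows) for j in range(n_cols)]
    raise ValueError("inner must be 'i' or 'j'")


print(column_major_offsets(3, 2, "i"))
print(column_major_offsets(3, 2, "j"))
```

Both orders visit the same elements, but the first walks memory with unit stride, which generally makes better use of cache lines on machines with deep memory hierarchies.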

Where the option exists, users are advised to work with the OpenMP model. Compilers supporting OpenMP are usually associated with distributed shared memory platforms and so we are indirectly recommending that users target DSM machines like the SGI Origin 2000. The simplicity of this approach is most attractive. OpenMP source may be generated immediately after completing an analysis. Setting aside formatting changes, the output files look very much like the input files, which reduces the effort of any subsequent debugging or tuning. The build process is also completely straightforward. For two of the four test codes considered by this evaluation, the corresponding OpenMP executables were the most efficient. As previously noted, this is currently the only practical choice for meshless codes. The CAPO (CAPTools OpenMP) program out of NASA Ames has been a robust and practical tool for many months now. A CAPO installation based on the configuration developed at NCSA should be deployed across all of the HPCMP centers.

We are less enthusiastic about the CAPTools message passing model. It is most effective in the hands of an experienced programmer, but this type of DoD user is not inclined to develop message passing logic that depends on a set of proprietary libraries. Novice programmers may find the CAPLib interface less complicated than MPI for example, but such users aren't well suited to the demands imposed by the model, especially when debugging or tuning is required (as is all too often the case). The former situation isn't likely to change, and the possibility of a member of the latter group producing a significant result within a reasonable amount of time is also considered to be remote.

Therefore, we recommend that funding to support the development of the OpenMP model should be continued. Deliverables associated with the current version of the UG proposal for CY5 should be interpreted accordingly. Specific items of interest include core support, updates of the Openwin libraries, repair and enhancement of database management features, and improvements to the documentation system with respect to OpenMP.

On the PET side of the equation, we recommend the award of an additional 15-20% FTE position to develop a comprehensive success story over the next 9 months. The suppressor model and one other highly visible DoD code would be targeted. This relatively small adjustment to the budget would yield a significant feature presentation for ASC PET. Support for the implementation of a comprehensive CAPTools tutorial based on the OpenMP model is also recommended. Another 5% FTE (minimum) is indicated. NCSA and UTK are each qualified to produce such a document.

As a final remark, the collaboration between UG and NASA Ames has been very successful, but we would still like to see a single, integrated version of the CAPTools package emerge. The organization behind the product should present itself to the community in the simplest possible terms. Impediments associated with acquisition, installation, and maintenance of complementary versions of the same product should certainly be removed. Further consideration of interoperability issues (between models) is also warranted.

 

References:

D. O'Neal, R. Luczak, and M. White, CAPTools Project: Evaluation and Application of the Computer Aided Parallelisation Tools, proceedings of the DoD High Performance Computing Users Group Conference, Monterey, CA, 1999.

G. Amdahl, Validity of the single-processor approach to achieving large-scale computational capabilities, proceedings of the AFIPS Conference, volume 30, page 483, AFIPS Press, 1967.

W. Schönauer, Scientific Supercomputing: Architecture and Use of Shared and Distributed Memory Parallel Computers, self-edition, Karlsruhe, Germany, 2000.

D. O'Neal and J. Urbanic, On Microprocessors, Memory Hierarchies, and Amdahl's Law, proceedings of the DoD High Performance Computing Users Group Conference, Monterey, CA, 1999.

MPI: A Message Passing Interface Standard, University of Tennessee, Knoxville, TN, May 5, 1994.

OpenMP Fortran Application Program Interface, Version 1.0, October, 1997.

P. Leggett, S. Johnson, and M. Cross, CAPLib: A Thin Layer Message Passing Library to Support Computational Mechanics Codes on Distributed Memory Systems, internal report, Parallel Processing Research Group, Center for Numerical Modelling and Process Analysis, University of Greenwich, London, UK, 2000.

D. O'Neal and R. Reddy, The Parallel Finite Element Method, in proceedings of the Cray User Group Inc., Spring Conference, Denver, CO, 1995.

Computer Aided Parallelization Tools User's Guide, Parallel Processing Research Group, University of Greenwich, London, UK, Version 2.0 Beta, October, 1998.

 

Related documents and images:

http://www.ncsa.uiuc.edu/EP/CSM/publications/2000/UCG00_CAPTools.pdf

http://www.ncsa.uiuc.edu/EP/CSM/presentations/UGC00_CAPTools_PPT.pdf

http://www.ncsa.uiuc.edu/EP/CSM/software/captools/

 

Acknowledgements:

ASC PET, University of Greenwich (London), NASA Ames Research Center, AFRL Basic CFD Research Branch, Wright-State University, Ohio Aerospace Institute, University of Tennessee, University of Illinois (NCSA)

 

