March, 2000
CAPO Development Team
NASA Ames Research Center
M/S T27A-2
Moffett Field, CA 94035-1000
Please send any feedback on CAPO to:
capo@nas.nasa.gov
The success of CAPO relies on accurate interprocedural data dependence information, which is currently provided by CAPTools. CAPO generates compiler directives in three stages:
This writeup is not a user manual. We simply hope to give beta testers enough basic information to try out the tool. Feedback can be sent to the address above.
For more information on CAPTools, check the web site at
http://captools.gre.ac.uk/.
For more information on the LCM project, check
http://www.nas.nasa.gov/Groups/Tools/Projects/LCM.
For major changes in different versions of CAPO, see WhatsNew.CAPO. More documents can be accessed from the web page at http://www.nas.nasa.gov/Tools/CAPO.
The executable of CAPO in this distribution is in
captool/bin/{machine}/capo
where {machine} is sgi for SGI workstations and
sun for SUN workstations.
A better approach for inspecting the loops is to use the Directives Browser implemented in CAPO (see Section 3 for details). The browser can be activated from the "View/Directives" menu and is designed to display information that is directly relevant to directive insertion. For instance, the browser provides more interactive information on why loops are parallel or serial. The user can concentrate on loops that are indicated as serial and manipulate the dependence graph if needed. This is an iterative process. It is always a good idea to save the result to a database whenever a change is made, before directives are inserted.
% setenv OMP_NUM_THREADS 8
% a.out (or your_program_name)
- Totally Serial (TS)
Loop itself is serial (with loop-carried true dependence, I/O
statements, Exit statements, or fewer than 2 iterations)
AND not within or containing any parallel loops.
'Exit' statements are statements that jump out of the loop,
for example via GOTO or RETURN.
Sub filters:
True Recursion - loop-carried true dependence, but containing
no I/O or exit statements
I/O or Exit - I/O and/or exit statements, and with true dependence
No Granularity - loop with one or fewer iterations, or use of string
range (:) in a possible pipeline loop
- Covered Serial (CS)
Loop itself is serial (loop-carried true dependence, I/O or Exit
statements) AND within or containing parallel loops.
Sub filters:
True Recursion - loop-carried true dependence, containing parallel loops
I/O or Exit - I/O or exit statements, containing parallel loops
Inside Parallel - inside parallel loops
- Falsely Serial (FS)
Loop itself has no loop-carried true dependence and no exit statements
AND not within any parallel loops, but may contain parallel loops.
Sub filters:
Privatization - loop-carried anti or output dependence, no I/O
statements, and/or with non-privatizable variables.
I/O Statement - I/O statements, without true dependence
No Granularity - no granularity or use of string
range (:) in a possible parallel loop
- Reductions (RD)
Loop with reductions. Symbol with '()' indicates an array.
- Pipeline (PP)
Loop could be used as part of a parallel pipeline.
- Chosen (CS)
Loop is a parallel loop other than reduction and pipeline loops
AND not within other parallel loops.
Sub filters:
Normal - containing no copyin/out variables
CopyIn/Out - with copyin/out variables
Ordered - with 'ordered' variables, for example, scalars
assigned in an IF statement and not privatizable
- Not Chosen (NC)
Loop is parallel, but not chosen due to other chosen parallel loops.
The loop is either inside or containing parallel loops.
Sub filters:
Inside Parallel - inside other parallel loops (excluding I/O loop)
I/O Statement - possible parallel I/O, inside or containing
parallel loops
No Granularity - no granularity, but containing parallel loops
For all cases -
Sub filters:
User Defined - as user defined loop type
Show Parallel I/O:
Yes - show loops with parallel I/O in the Sub filters
No - treat loops with I/O as serial
Usually one wants to go through the following loop types:
- Totally Serial->True Recursion
- Covered Serial->True Recursion
- Falsely Serial->Privatization
- Chosen->CopyIn/Out
and use the Why window to find out the reason for a particular loop type.
A loop can fail to be parallel for several reasons, for example loop-carried TRUE/ANTI/OUTPUT dependences or non-privatizable variables (reuse of memory). If one is sure that some of these dependences are false (mostly due to a lack of input information for the dependence analysis) and can be removed, the Dep-Graph browser can be used. A shortcut is provided in the Why window, where variables can be selected from the Var-List boxes and the relevant dependences removed by clicking the 'Remove' button. The following dependences will be removed, based on the loop/variable type:
Loop-Type        Var-List     Dependence-Type
--------------------------------------------------------------------
Totally Serial   True-dep     Loop-carried TRUE dependence
                 Anti-dep     Loop-carried ANTI dependence
                 Output-dep   Loop-carried OUTPUT dependence
Covered Serial   True-dep     Loop-carried TRUE dependence
                 Anti-dep     Loop-carried ANTI dependence
                 Output-dep   Loop-carried OUTPUT dependence
Falsely Serial   Anti-dep     Loop-carried ANTI dependence
                 Output-dep   Loop-carried OUTPUT dependence
                 In/Out-dep   TRUE dependence from outside of the loop
Chosen Parallel  Copyin/Out   TRUE dependence from outside of the loop
Once a change to the dependence graph (either via the Dep-Graph browser
or via the WhyDirectives browser) is made, be sure to save the change
to the database (File-> Save Database) and re-perform the directive
analysis ("Update Directives..." button).
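The table above can be read as a simple lookup from (loop type, Var-List box) to the kind of dependence that the 'Remove' button targets. The sketch below only transcribes that table; the names REMOVABLE and removable_dependence are illustrative and are not part of CAPO.

```python
# Transcription of the loop-type / Var-List table above.
# REMOVABLE and removable_dependence are hypothetical names used for illustration.
REMOVABLE = {
    ("Totally Serial", "True-dep"): "Loop-carried TRUE dependence",
    ("Totally Serial", "Anti-dep"): "Loop-carried ANTI dependence",
    ("Totally Serial", "Output-dep"): "Loop-carried OUTPUT dependence",
    ("Covered Serial", "True-dep"): "Loop-carried TRUE dependence",
    ("Covered Serial", "Anti-dep"): "Loop-carried ANTI dependence",
    ("Covered Serial", "Output-dep"): "Loop-carried OUTPUT dependence",
    ("Falsely Serial", "Anti-dep"): "Loop-carried ANTI dependence",
    ("Falsely Serial", "Output-dep"): "Loop-carried OUTPUT dependence",
    ("Falsely Serial", "In/Out-dep"): "TRUE dependence from outside of the loop",
    ("Chosen Parallel", "Copyin/Out"): "TRUE dependence from outside of the loop",
}


def removable_dependence(loop_type, var_list):
    """Return the dependence type listed for this pair, or None if the pair is not in the table."""
    return REMOVABLE.get((loop_type, var_list))
```

A pair that is absent from the table, such as ("Falsely Serial", "True-dep"), returns None, which matches the definition of a Falsely Serial loop as one without loop-carried true dependence.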
Parallel - from parallel without granularity
Serial - from parallel loop, including reduction
Reduction - from parallel loop or serial loop with loop-carried
true dependence
Break - from any other cases
Only the conversions indicated above are possible from the dialog box.
Although loop types can also be redefined in the user-defined loop file,
use of the LoopType dialog box is safer. However, keep in mind that
changing the loop type manually could potentially lead to incorrect
results if the above rules are not followed carefully.
There are two selectable types of routine duplication (rdup):
- 'Loop Usage' as the (default) type for rdup if a routine is used
both inside and outside parallel loop(s).
- 'Region Usage' as the type for rdup if a routine is used inside a
parallel loop and inside parallel region but outside parallel loop.
The second option conforms to the OpenMP standard in that a parallel region
can be nested inside a parallel loop but not inside a parallel region.
Format of this file:
'#' sign starts a comment
'key value' pair defines an entry
ENV_VARIABLE    KEY                 DEFAULT          POSSIBLE VALUES
CAPO_PAR                            capo-inp.par
CAPO_LOG        log-file            on               (off on stdout)
CAPO_LOGNAME    log-file-name       codeoutput.log
CAPO_LOGINFO    log-info            std              (min std more debug)
CAPO_PLOOP      loop-granularity    6                (0 1 2 ...)
CAPO_TYPE       directive-type      omp              (omp sgi sgix)
CAPO_REGION     region-type         default          (loop bloop one join full)
CAPO_OPTIMIZE   optimize-type       o2               (off on o2)
CAPO_USERLOOP   user-loop-file      user-loop.par
CAPO_DIRCLEAR   directive-clear     default-list     (off on filename)
CAPO_TPRIV      tpriv-directive     on               (off on)
CAPO_COMMENT    comment-type        f90              (f77 f90)
CAPO_USEPARTI   use-parti-loop      no               (no yes)
CAPO_ORDERED    ordered-directive   off              (off on)
CAPO_RDUPTYPE   rdup-type           loop             (loop region)
Notes:
log-file (CAPO_LOG):
off    -- Logging to file is off; only minimal messages are printed on screen
on     -- Information is logged to a log file
stdout -- Information is printed to stdout (screen)
log-info (CAPO_LOGINFO):
min   -- Only minimal information is logged or printed
std   -- Print the standard set of log information
more  -- Print more detailed log information, including region and loop numbers in the final Fortran file
debug -- Print debugging information, probably more than you want, including region and loop numbers in the final Fortran file
directive-type (CAPO_TYPE):
omp  -- Produce OpenMP directives (default)
sgi  -- Produce SGI native directives
sgix -- Produce OpenMP directives with SGI extensions (currently, only 'NEST' is supported)
region-type (CAPO_REGION):
loop  -- consider only one loop for one region (no pipeline)
bloop -- consider one block + one loop for one region (no pipeline)
one   -- consider one region (region not joined, no pipeline)
join  -- consider joined region (outer loop nesting, no pipeline)
full  -- consider full region (region joined and possible pipeline)
For SGI directives, only "loop" is allowed for the region type (region-type). The default region-type is "loop" for SGI and "full" for OMP.
optimize-type (CAPO_OPTIMIZE):
off -- Do not do any optimization
on  -- Try to reduce synchronization at end-of-loop
o2  -- Use logical disprove (sometimes slow)
# starts comment
#RoutineName LoopNumber NewType
routine_name loop_count S|P|R|B
Entries are specified line by line. Routine_name is case insensitive. For a program without the main-routine name defined, "_MAIN" can be used to indicate the main routine. "loop_count" is the loop number counted from the beginning of a given routine. Currently the following loop types are supported:
"S" for serial
"P" for parallel
"R" for reduction
"B" for break-type (e.g. so that a parallel region won't be formed
around this loop).
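As an illustration of the user-defined loop file format described in this section, here is a minimal parsing sketch. It assumes that the optional reduction attachment directly follows the "R" letter (for example R[+:SUM] or R[+:SUM()]) and that several attachments, if present, are written back to back. Neither detail is spelled out in this writeup, and the function name parse_user_loops is hypothetical.

```python
import re

VALID_TYPES = {"S", "P", "R", "B"}
# Assumed shape of a reduction attachment: [OPR:VAR] or [OPR:VAR()].
REDUCTION = re.compile(r"\[([^:\[\]]+):([^\[\]]+)\]")


def parse_user_loops(text):
    """Parse user-loop entries into (routine, loop_number, type, reductions) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, loop_no, new_type = line.split()[:3]
        kind, attachment = new_type[0].upper(), new_type[1:]
        if kind not in VALID_TYPES:
            raise ValueError(f"unknown loop type in: {raw!r}")
        reductions = []
        for op, var in REDUCTION.findall(attachment):
            is_array = var.endswith("()")  # the "()" form marks an array reduction
            reductions.append((op, var[:-2] if is_array else var, is_array))
        # Routine names are case insensitive, so normalize them to upper case.
        entries.append((name.upper(), int(loop_no), kind, reductions))
    return entries
```

For instance, a line such as `l2norm 1 R[+:SUM()]` (an invented entry, not taken from a CAPO distribution) would be read as a reduction loop in L2NORM with an array reduction on SUM.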
The "R" type can optionally be followed by a
"[OPR:VAR]" or "[OPR:VAR()]" list
to indicate the reduction operator and reduction variable, with no space in between. The second form indicates an array reduction.
"cdir$",              /* Cray vector directive */
"cmic$",              /* Cray autotasking directive */
"c$par",              /* PCF directive */
"c$doacross", "c$&",  /* SGI multiprocessing directive */
"c$ ", "c$\t",
"c$omp",              /* OMP directive */
"c$sgi"               /* SGI OMP extension */
The default setting is to use the above list. The 'clearing' action may be turned off by setting CAPO_DIRCLEAR to 'off'. Additional directives may be added to the default list by prefixing the filename for CAPO_DIRCLEAR with a '+'.
A dirclear-list file simply contains a list of directives (keywords) to be considered. A keyword should begin with one of 'C', '!', or '*'. A '-' sign can be added in front of a keyword to indicate that the corresponding directive should not be cleared (i.e. it keeps its original form); otherwise, the keyword is added to the list.
tpriv-directive (CAPO_TPRIV):
off -- Use an alternative method to handle private variables
on  -- Try to create THREADPRIVATE directives
rdup-type (CAPO_RDUPTYPE):
loop   -- the (default) type for rdup, applied if a routine is used both inside and outside parallel loop(s).
region -- the type for rdup applied if a routine is used inside a parallel loop and inside a parallel region but outside a parallel loop.
The second option conforms to the OpenMP standard in that a parallel region can be nested inside a parallel loop but not inside a parallel region.
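The possible values listed in the notes above lend themselves to a small consistency check before CAPO is run. The following is a sketch only: the names ALLOWED and check_settings are hypothetical, and the only cross-key rule it encodes is the one stated above, that SGI directives allow only the "loop" region type.

```python
# Possible values per key, transcribed from the parameter table and notes above.
# 'default' is shown as the table default for region-type but is not among the
# listed possible values, so it is deliberately left out of this sketch.
ALLOWED = {
    "log-file": {"off", "on", "stdout"},
    "log-info": {"min", "std", "more", "debug"},
    "directive-type": {"omp", "sgi", "sgix"},
    "region-type": {"loop", "bloop", "one", "join", "full"},
    "optimize-type": {"off", "on", "o2"},
    "tpriv-directive": {"off", "on"},
    "comment-type": {"f77", "f90"},
    "use-parti-loop": {"no", "yes"},
    "ordered-directive": {"off", "on"},
    "rdup-type": {"loop", "region"},
}


def check_settings(settings):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    for key, value in settings.items():
        if key == "loop-granularity":
            if not str(value).isdigit():
                problems.append(f"{key}: expected a non-negative integer, got {value!r}")
        elif key in ALLOWED and value not in ALLOWED[key]:
            problems.append(f"{key}: {value!r} not in {sorted(ALLOWED[key])}")
    # Rule from the notes: SGI directives only allow the "loop" region type.
    if settings.get("directive-type") == "sgi" and settings.get("region-type", "loop") != "loop":
        problems.append("region-type: only 'loop' is allowed with directive-type 'sgi'")
    return problems
```

Keys that take file names (log-file-name, user-loop-file, directive-clear) are not checked, since any name may be valid there.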
Generate-NOWAIT          - enable/disable NOWAIT directive
Transform-Induction-Loop - enable/disable induction loop treatment
Handle-Array-Reduction   - enable/disable array reduction
Remove-Old-Directives    - enable/disable removing old directives
Apply-UserLoop-Type      - enable/disable applying userloop types
Setup-Pipeline-Loop      - enable/disable pipeline loop
# env: CAPO_PAR
# Parameters for CAPTools-based Parallelizer with OpenMP (CAPO)
# They apply to version 1.0
# env: CAPO_LOG
# defines if log-information is wanted
log-file on (off on stdout)
# env: CAPO_LOGNAME
# defines log-file name when log-file = on
log-file-name (default: codeoutput.log)
# env: CAPO_LOGINFO
# defines type of information to be logged
log-info std (min std more debug)
# env: CAPO_PLOOP
# defines granularity (min. no. of iters.) for parallel loops
loop-granularity 6 (0 1 2 ...)
# env: CAPO_TYPE
# defines type of directives to be produced
directive-type omp (omp sgi sgix)
# env: CAPO_REGION
# defines type of parallel regions to be considered
region-type full (loop bloop one join full)
# env: CAPO_OPTIMIZE
# defines optimization type for parallel regions
optimize-type o2 (off on o2)
# env: CAPO_USERLOOP
# defines the file name for user-defined loop types
user-loop-file (default: user-loop.par)
# env: CAPO_DIRCLEAR
# defines the file name for directives to be cleared
directive-clear Default (off on filename)
# env: CAPO_TPRIV
# switches on/off the generation of THREADPRIVATE
tpriv-directive on (off on)
# env: CAPO_COMMENT
# chooses a comment type for directives
comment-type f90 (f77 f90)
# env: CAPO_USEPARTI
# uses partitioned loops for directives
use-parti-loop no (no yes)
# env: CAPO_ORDERED
# creates ORDERED code section
ordered-directive off (off on)
# env: CAPO_RDUPTYPE
# defines routine duplication type
rdup-type loop (loop region)
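To make the 'key value' format of the sample file concrete, here is a minimal reader sketch. It assumes that the parenthesized text after a value is only a hint listing the possible values and is ignored, and that a key followed directly by a parenthetical has no explicit value. How CAPO itself treats these cases is not stated in this writeup, and parse_capo_params is a hypothetical name.

```python
def parse_capo_params(text):
    """Read 'key value' entries; lines starting with '#' are comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        key = fields[0]
        # Assumption: a field opening with '(' starts the hint, so no value was given.
        if len(fields) > 1 and not fields[1].startswith("("):
            params[key] = fields[1]
        else:
            params[key] = None
    return params
```

With the sample file above, this reader would report "on" for log-file and no explicit value for log-file-name, leaving the caller to fall back on the documented default.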
In the case of "more" and "debug", additional labels (region# and loop#) are added as comments for parallel loops in the generated parallel code. Regions and loops are labeled within a given routine, sequentially.
Routine: ROUTINE_NAME
Loop # (loop_variable), group #, level #: parallel/serial
TYPE? Reason for serial...
"TYPE?" is one of the types from the loop type list:
"REDU", "NPAR", "PAR", "IO", "LVAR", "SER", "ANTI", "PIPE", "BRK", "UPIPE", "PAREG", "INDU", "INPLP", "RDINP", "GRAN", "PARTI"
As an example, part of the analysis for three routines in LU is given here (with log-info set to more).
Routine: BUTS
Loop 1 (J), group 1, level 1: parallel, granularity - ok
PAR-> directives to be added for the loop <1,1>
Loop 2 (I), group 1, level 2: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
Loop 3 (M), group 1, level 3: parallel, granularity - no
Loop 4 (J), group 2, level 1: serial
PIPE? true dependence, pipeline loop? dvector: V[0,0,-1,0]
Loop 5 (I), group 2, level 2: serial
PIPE? true dependence, pipeline loop? dvector: V[0,-1,0,0]
Loop 6 (M), group 2, level 3: parallel, granularity - no
Loop 7 (M), group 2, level 3: parallel, granularity - no
*** Total number of loops: 7, parallel: 5, serial: 2, directive: 1
Routine: JACU
Loop 1 (J), group 1, level 1: parallel, granularity - ok
PAR-> directives to be added for the loop <1,1>
Loop 2 (I), group 1, level 2: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
*** Total number of loops: 2, parallel: 2, serial: 0, directive: 1
...
Routine: SSOR
Loop 1 (I), group 1, level 1: serial
ANTI? loop carried output or non-exact anti dependence: ELAPSED
Loop 2 (I), group 2, level 1: serial
ANTI? loop carried output or non-exact anti dependence: ELAPSED
Loop 3 (ISTEP), group 3, level 1: serial
BRK? break out of the loop or comm-call inside the loop
Loop 4 (K), group 3, level 2: parallel, granularity - ok
PAR-> directives to be added for the loop <2,1>
Loop 5 (J), group 3, level 3: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
Loop 6 (I), group 3, level 4: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
Loop 7 (M), group 3, level 5: parallel, granularity - no
Loop 8 (K), group 3, level 2: serial
SER? loop carried true dependence: ELAPSED
Loop 9 (K), group 3, level 2: serial
SER? loop carried true dependence: ELAPSED
Loop 10 (K), group 3, level 2: parallel, granularity - ok
PAR-> directives to be added for the loop <2,2>
Loop 11 (J), group 3, level 3: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
Loop 12 (I), group 3, level 4: parallel, granularity - ok
INPLP? no directive, loop inside a parallel loop
Loop 13 (M), group 3, level 5: parallel, granularity - no
*** Total number of loops: 13, parallel: 8, serial: 5, directive: 2
>>>> Grand total: num_routines 25, num_loops 157
loops: parallel 145, serial 12, directive 30
The label for a parallel loop with a directive to be added (PAR->)
is given as a <level,group> pair. For a serial loop, only one
variable is listed as the cause of serialization. For a potential
pipeline loop, the dependence vector for the first related variable
is given for the corresponding loop, as with V[0,0,-1,0] for
loop 4 (J) in routine BUTS.
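The per-routine totals in the log can be cross-checked with a short script. The sketch below assumes the line shapes shown in the sample output above ("Loop N (VAR), group N, level N: parallel|serial" and "PAR->" for an added directive); the name summarize_loops is hypothetical.

```python
import re

# Matches lines such as: Loop 4 (J), group 2, level 1: serial
LOOP_LINE = re.compile(
    r"Loop\s+(\d+)\s+\((\w+)\),\s+group\s+(\d+),\s+level\s+(\d+):\s+(parallel|serial)"
)


def summarize_loops(log_text):
    """Count loops, parallel loops, serial loops, and PAR-> directive lines."""
    summary = {"loops": 0, "parallel": 0, "serial": 0, "directive": 0}
    for match in LOOP_LINE.finditer(log_text):
        summary["loops"] += 1
        summary[match.group(5)] += 1
    summary["directive"] = log_text.count("PAR->")
    return summary
```

Applied to the BUTS excerpt above, the counts should agree with the "*** Total number of loops" line printed for that routine.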
The user-defined loop types are applied after the loop classification. Therefore, it is the user's responsibility to ensure the correctness of user-supplied loop types.
Routine: ROUTINE_NAME <yes/no/inploop/noploop>
<yes> - with directives for parallel loops
<no> - no directives
<inploop> - routine is called inside a parallel loop
<noploop> - routine has no parallel loop, but may contain potential
pipeline loops
A sample result from the analysis of NPB-LU looks like the following.
Routine: APPLU <yes>
Routine: READ_INPUT <no>
Routine: DOMAIN <no>
Routine: SETCOEFF <no>
Routine: SETBV <yes>
Routine: SETIV <yes>
Routine: ERHS <yes>
Routine: SSOR <yes>
Routine: TIMER_CLEAR <no>
Routine: JACLD <yes>
Routine: BLTS <yes>
Routine: JACU <yes>
Routine: BUTS <yes>
Routine: RHS <yes>
Routine: TIMER_START <no>
Routine: L2NORM <yes>
Routine: TIMER_STOP <no>
Routine: ELAPSED_TIME <no>
Routine: WTIME <no>
Routine: ERROR <yes>
Routine: EXACT <no>
Routine: PINTGR <yes>
Routine: VERIFY <no>
Routine: PRINT_RESULTS <no>
Routine: TIMER_READ <no>
>>> Total routines: 25, checked: 24, with directives: 13
in/outside ploop: 0, in/with ploop: 0, no ploop: 12
Total directive loops: 30, effective: 30, in ploop: 0
The last line of the statistics indicates how many loops are candidates
for directives, how many of them actually receive directives,
and how many of them are nested inside other loops with directives.
The next step is to construct parallel regions based on the loop information. A parallel region includes at least one parallel loop or pipeline loop, with possible basic blocks at the beginning of the loop. No nested parallel loops are considered at this point. Two neighboring regions can be joined together if no code other than comments or no-ops exists between them. Individual regions are labeled sequentially within a routine. For each region, a number in () indicates the end (or last) region of a joined area (of regions). For disjoint regions, the end region is the same as the region itself. Additional information included for a region is the loops in the region and the type of the region. Regions are also summarized for a routine as "region-type-summary".
Region-type:
one ploop     containing exactly one parallel loop (no pipeline)
+prev-block   one parallel loop plus any preceding basic blocks
sub ploop     one or more parallel loops nested at different levels
pipeline      potential pipeline
<default>     region with joined neighbors
Region-type-summary:
DEFAULT       routine contains normal parallel regions
PIPE          routine is part of a pipeline region
UPIPE         routine contains potential pipeline regions
Sample outputs from the analysis of NPB-LU:
Region-in-Routine: BUTS region-type-summary: UPIPE
Parallel region 1 (2): loops [1-3]
Parallel region 2 (2): loops [4-7]
*** Total number of regions: 2, joined regions: 1
Region-in-Routine: JACU region-type-summary: DEFAULT
Parallel region 1 (1): loops [1-2] one ploop
*** Total number of regions: 1, joined regions: 1
Region-in-Routine: SSOR region-type-summary: DEFAULT
Parallel region 1 (1): loops [4-7] one ploop
Parallel region 2 (2): loops [10-13] one ploop
*** Total number of regions: 2, joined regions: 2
Once the initial regions are determined, routines are checked for possible pipeline regions across routines. If such a region is identified, the pipeloop limit is checked against all other parallel loops in the same pipeline region for alignment. If a discrepancy is found, a message is printed as either "not the same limit" or "low-high limit swapped!". In the first case, the suggested pipeline operation may produce incorrect run-time results, and the generated code needs further checking. In the second case, CAPTools automatically swaps the loop limits to ensure consistency. If pipeline loops are not desirable, set the environment variable CAPO_REGION to "join".
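The limit check described above can be pictured with a small comparison routine. This is a sketch of the idea only, not the CAPTools implementation: it assumes loop bounds are simple names or integers, treats a missing stride as 1, and ignores the loop variable. The function names are hypothetical.

```python
import re

# Matches statements such as: DO J=JEND,JST,-1
DO_STMT = re.compile(
    r"DO\s+(\w+)\s*=\s*(\w+)\s*,\s*(\w+)\s*(?:,\s*(-?\w+))?", re.IGNORECASE
)


def parse_do(stmt):
    """Split a simple DO statement into (variable, first, last, stride)."""
    match = DO_STMT.match(stmt.strip())
    if match is None:
        raise ValueError(f"not a simple DO statement: {stmt!r}")
    var, first, last, stride = match.groups()
    return var.upper(), first.upper(), last.upper(), (stride or "1").upper()


def _negate(stride):
    """Flip the sign of a stride written as text, e.g. '-1' <-> '1'."""
    return stride[1:] if stride.startswith("-") else "-" + stride


def compare_limits(pipeloop, thisloop):
    """Classify thisloop against pipeloop using the messages quoted above."""
    _, p_first, p_last, p_stride = parse_do(pipeloop)
    _, t_first, t_last, t_stride = parse_do(thisloop)
    if (p_first, p_last, p_stride) == (t_first, t_last, t_stride):
        return "same limit"
    if (p_first, p_last) == (t_last, t_first) and _negate(p_stride) == t_stride:
        return "low-high limit swapped!"
    return "not the same limit"
```

Comparing `DO J=JEND,JST,-1` with `DO J=JST,JEND,1` falls into the swapped case, which corresponds to the situation CAPTools is said to repair automatically.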
For LU, routines BUTS and JACU were identified to be part of a pipeline region in routine SSOR and information was generated as follows.
Region-in-Routine: BUTS region-type-summary: PIPE
pipeloop: DO J=JEND,JST,-1 (BUTS)
thisloop: DO J=JEND,JST,-1 (BUTS) same limit
Region-in-Routine: JACU region-type-summary: PIPE
pipeloop: DO J=JEND,JST,-1 (BUTS)
thisloop: DO J=JST,JEND,1 (JACU) low-high limit swapped!
Region-in-Routine: SSOR region-type-summary: DEFAULT
Parallel region 1 (1): loops [4-7] one ploop
Parallel region 2 (2): loops [8-8] pipeline
Parallel region 3 (3): loops [9-9] pipeline
Parallel region 4 (4): loops [10-13] one ploop
*** Total number of regions: 4, joined regions: 4
>>>> Grand total: routines 25, regions 34, joined regions 26
Parallel regions are further optimized by removing end-of-loop synchronization (using the 'NOWAIT' construct). Although a conservative approach is taken, careful examination of NOWAIT is still needed. For example, pay attention to the WARNING messages on 'EndLoop-Sync required/re-enforced'. If any problem occurs, the optimization can always be switched off (setenv CAPO_OPTIMIZE off).
For LU, this is the summary after region optimization:
>>>> Total number of syncs removed: 7, in 4 routines (13 checked)
- clearing any old directives if CAPO_DIRCLEAR is not off (section 4.3),
- searching for threadprivate common blocks and inserting the THREADPRIVATE directive if CAPO_TPRIV is not off,
- duplicating routines if needed, and
- inserting region/loop-level directives.
Information resulting from these four actions is not fed back to the Directives Browser; it appears only as directives in the source code. Thus, once directives are inserted, the Directives Browser should not be used to make further changes.
A threadprivate common block is one that has all its variables used as private (including copyin) in all the parallel regions of the whole program. This means that even a single non-private use of a variable can prevent the common block from becoming threadprivate. In debug mode, the reasons a common block was determined to be threadprivate or shared can be examined. See Section 5.4 for details. Normally, messages are printed for identified threadprivate common blocks and the routines that contain them. An example is given here.
T_PRIV common blocks:
-/WORK_1D/-18: SP SET_CONSTANTS EXACT_RHS INITIALIZE ADI TXINVR X_SOLVE NINVR
Y_SOLVE PINVR Z_SOLVE LHSINIT TZETAR ADD VERIFY ERROR_NORM COMPUTE_RHS
RHS_NORM
-/WORK_LHS/-18: SP SET_CONSTANTS EXACT_RHS INITIALIZE ADI TXINVR X_SOLVE
NINVR Y_SOLVE PINVR Z_SOLVE LHSINIT TZETAR ADD VERIFY ERROR_NORM
COMPUTE_RHS RHS_NORM
>>> THREADPRIVATE directive added for 2 common blocks in 18 routines
Warnings may be printed for common blocks that could potentially be
threadprivate:
WARNING! SSOR... region 4, loop 8 /CJAC/ Type conflict: old SHARED, new PRIV - use SHARED
This indicates that in routine SSOR all variables in common block /CJAC/ are used as private in region 4, but the common block is shared in other places. In debug mode, one can trace further to find where the common block is shared.
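The threadprivate rule stated in this section amounts to an "all regions, all variables" test. The sketch below illustrates that rule only; the data layout (a mapping from region label to per-variable usage), the usage labels "private", "copyin" and "shared", and the name is_threadprivate are assumptions made for the example.

```python
# Usage kinds that count as private for this test (assumed labels).
PRIVATE_LIKE = {"private", "copyin"}


def is_threadprivate(usage_by_region):
    """True if every variable of the common block is private-like in every region.

    usage_by_region maps a region label to {variable_name: usage_kind}.
    An empty mapping yields True here; that choice is arbitrary for the sketch.
    """
    return all(
        kind in PRIVATE_LIKE
        for usage in usage_by_region.values()
        for kind in usage.values()
    )
```

In a case like the /CJAC/ warning above, where the block is private in one region but shared elsewhere, the function returns False, mirroring the "use SHARED" outcome.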
Directives are added by annotating the call graph and using the parallel region information obtained in 5.2. The call paths are printed as the insertion progresses. Each routine is visited only once.
Routine: APPLU
Routine: APPLU->SETCOEFF
Routine: APPLU
Routine: APPLU->SETBV
Routine: APPLU
Routine: APPLU->SETIV
Routine: APPLU
Routine: APPLU->ERHS
Routine: APPLU
Routine: APPLU->SSOR
Routine: APPLU->SSOR->RHS
Routine: APPLU->SSOR->RHS->TIMER_START
Routine: APPLU->SSOR->RHS->TIMER_START->ELAPSED_TIME
Routine: APPLU->SSOR->RHS->TIMER_START->ELAPSED_TIME->WTIME
Routine: APPLU->SSOR->RHS->TIMER_START->ELAPSED_TIME
Routine: APPLU->SSOR->RHS->TIMER_START
Routine: APPLU->SSOR->RHS
Routine: APPLU->SSOR->RHS->TIMER_STOP
Routine: APPLU->SSOR->RHS
Routine: APPLU->SSOR
Routine: APPLU->SSOR->L2NORM
INFO! Array reduction variable replaced with local critical in region 1 -
SUM() --> SUM_CAP1()
Routine: APPLU->SSOR
Routine: APPLU->SSOR->JACLD
Routine: APPLU->SSOR
Routine: APPLU->SSOR->BLTS
Routine: APPLU->SSOR
WARNING! Potential memory conflict for shared variable in region <2,1> - ELAPSED
Routine: APPLU->SSOR->JACU
Routine: APPLU->SSOR
Routine: APPLU->SSOR->BUTS
Routine: APPLU->SSOR
WARNING! Potential memory conflict for shared variable in region <3,1> - ELAPSED
Routine: APPLU
Routine: APPLU->ERROR
INFO! Array reduction variable replaced with local critical in region 1 -
ERRNM() --> ERRNM_CAP1()
Routine: APPLU
Routine: APPLU->PINTGR
Routine: APPLU
Routine: APPLU->VERIFY
Routine: APPLU
WARNINGs for "...variable used after a parallel region" and "potential
memory conflict", and INFOs on the changes made to routine arguments,
should be examined carefully. These are only warnings and may or may not
indicate programming errors. They mark cases where CAPO is uncertain of
its decision, and the user needs to inspect the generated code at the
indicated places for verification. The parallel region is labeled as a
<region_number, parallel_loop_number> pair in the call path immediately
preceding the warning message.
Meanings of keywords in the WARNING message:
"variable" -- a variable used in the current routine scope
"common-variable" -- a variable used outside the current scope
e.g. through COMMON blocks or SAVE statements
in a subroutine
"Shared" -- variable shared in the current region
"PLocal" -- potential private variable in the current region
"Control" -- variable with multiple control paths, i.e. variable
could be updated either inside or outside the
current region
"I/O statement" -- routine called inside a parallel region
contains i/o (OPEN,READ,WRITE,CLOSE) statements
"STOP statement" -- routine called inside a parallel region
contains STOP/PAUSE statements
"Potential memory conflict" -- for shared variable that can cause
memory conflict in a parallel region
If a private variable in a parallel region is updated via a COMMON block
in a subroutine, CAPO tries to privatize such a variable by adding it to
the subroutine's argument list and renaming the original variable in the
COMMON block of the subroutine. CAPO will generate the following INFO
messages in this process:
New argument () added to CALL OTHER_ROUTINE():# in ROUTINE_NAME
New symbol () added to the argument list of ROUTINE_NAME
Common block /cblk/ duplicated for ROUTINE_NAME
CAPO automatically performs a code transformation for a reduction variable that is an array element. The corresponding message looks like:
Array reduction variable replaced with scalar in region # -
OLD_ARRAY_ELEMENT --> NEW_SCALAR_VARIABLE
- UserLoop information for user-defined loop types
Userloop: Defined loop # in routine ROUTINENAME - newtype
The newtype is one of (S, P, R, B) as mentioned in section 4.3
- List of old directives to be cleared
- Summary of loop type with list of all dependence vector deltas for
pipeline loops
- List of symbols and types in each region
TYPE
Private - Local symbol
Reduction - Scalar reduction variable
ArrayReduction - Array reduction variable
Shared - Shared symbol
LastPrivate - Usage in & after the region
FirstPrivate - Usage in & before the region
CopyInOut - Shared, with no (or no provable) dependence on the loop variable
ThreadPrivate - Used in a threadprivate common block
UnknownType - Type not defined yet
CONTROL
No-Control Symbol not in a control dependence
Control-Dep Symbol in a control dependence
SCOPE
In-Scope Symbol defined in the current routine
Not-in-Scope Symbol not defined in the current routine
(defined via common block or save statement)
Not-in-Use Symbol passed into a subroutine but not used
in the subroutine
DTYPE:DEPTH (printed in [.:.])
IO -1, Input/Output
NT 0, Non-exact True
NA 1, Non-exact Anti
NO 2, Non-exact Output
ET 3, Exact True
EA 4, Exact Anti
EO 5, Exact Output
CT 6, Control
UN 7, Unknown type
Depth = 0 for loop-independent dependence
- List of routine call types, indicating the usage of a routine
inside/outside parallel regions/loops. Five bits are used:
bit1 [0x01] called outside parallel region
bit2 [0x02] called inside paregion but outside parallel loop
bit3 [0x04] called inside parallel loop
bit4 [0x08] called outside parallel loop (= bit1 | bit2)
bit5 [0x10] called inside parallel region
- Information on updating duplicated routines
Replace call to DROUTINE with CAP_DROUTINE in ROUTINE
Removed ROUTINE from the calledby list of DROUTINE
Added ROUTINE to the calledby list of CAP_DROUTINE
- List of symbols and affine expressions for testing loop limits
(such as in the removal of end-of-loop synchronizations)
HOME (LOOP-VAR-EXPR, #hits) Low <EXPR> High <EXPR> [A1:INDX,A2:INDX..]
(LOOP-VAR-EXPR, #hits) Low <EXPR> High <EXPR> [B1:INDX,B2:INDX..]
OTHER (NONLOOP-EXPR, #hits) [C1:INDX,C2:INDX..]
(NONLOOP-EXPR, #hits) [D1:INDX,D2:INDX..]
Here <EXPR> is a symbolic expression, A,B,C,D are array names, INDX is
the relevant array index. The lists are for both source and sink.
- Summary of fields associated with the ploopinfo data struct, mainly
for development purposes.
Loop Lvar D/L Type G WP IP Flag
Routine: ROUTINE_NAME
# var ?/? TYPE? ? ? ? [321]
'Loop' -- the loop number in a routine
'Lvar' -- the loop variable name
'D' -- the 'dlevel' value
'L' -- the 'level' value of the loop
'Type' -- one of type strings given in Section 5.1
'G' -- the loop granularity flag (internal info only)
'WP' -- '1' containing parallel loop, '0' without parallel loop
'IP' -- '1' inside parallel loop, '0' not inside parallel loop
'Flag' -- three bits for internal usage only
- Symbols and their types in common blocks (for testing threadprivate)
Meanings of symbol types:
[U] - Unset
[P] - Private
[R] - Reduction
[A] - ArrayReduction
[S] - Shared (RW)
[s] - Shared (Readonly)
[L] - LastPrivate
[F] - FirstPrivate
[C] - CopyInOut
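The five routine call-type bits listed earlier in this debug information can be decoded with a simple mask test. The sketch below only restates the bit values and descriptions given above; the names CALL_BITS and decode_call_type are hypothetical.

```python
# Bit values and meanings as listed for the routine call types.
CALL_BITS = [
    (0x01, "called outside parallel region"),
    (0x02, "called inside parallel region but outside parallel loop"),
    (0x04, "called inside parallel loop"),
    (0x08, "called outside parallel loop"),
    (0x10, "called inside parallel region"),
]


def decode_call_type(flags):
    """Return the descriptions of all bits set in a call-type value."""
    return [label for bit, label in CALL_BITS if flags & bit]
```

For example, a value with bits 0x01 and 0x04 set describes a routine that is called both outside any parallel region and inside a parallel loop.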