Pragmas Support in Open64

From Open64 Wiki

Revision as of 10:49, 5 November 2009 by Hellbenter (Talk | contribs)

Available Pragma Support

Documentation on pragmas, with related links, for the Open Research Compiler (Open64):
  • List of pragmas that Open64 compiler supports:

(common/com/wn_pragmas.cxx)

  /* Master pragma descriptor list: */ A general description of each pragma follows:
      
  Pragma Name: NULL
  Pragma User: NULL
  Pragma Scope: UNKNOWN
  Comment: Pragma 0 is undefined to make sure the front-ends send a valid pragma.
  Pragma Name: "INLINE_BODY_START"
  Pragma User: IPA, WOPT
  Pragma Scope: ON
  Comment: Mark start of an inlined function body.
  Pragma Name: "INLINE_BODY_END"
  Pragma User: IPA, WOPT
  Pragma Scope: OFF
  Comment: Mark end of an inlined function body.
  Pragma Name: "INLINE_DEPTH" 
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "INLINE_LOOPLEVEL" 
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "AGGRESSIVE_INNER_LOOP_FISSION"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "FISSION"
  Pragma User: LNO
  Pragma Scope: POINT  
  Comment: fission the surrounding l loops here 
  Pragma Name: "FISSIONABLE"
  Pragma User: LNO    
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "FUSE"
  Pragma User: LNO    
  Pragma Scope: SPECIAL 
  Comment: fuse the next n loops for l levels 
  Pragma Name: "FUSEABLE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "NO_FISSION"
  Pragma User: LNO    
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "NO_FUSION"
  Pragma User: LNO    
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "INTERCHANGE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Interchange the surrounding loops based on the loop indices specified.
  Pragma Name: "NO_INTERCHANGE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Do not interchange loops.
  Pragma Name: "BLOCKING_SIZE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Specify sizes for blocking.
  Pragma Name: "NO_BLOCKING"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Do not block loop
  Pragma Name: "UNROLL"
  Pragma User: CG
  Pragma Scope: WN
  Comment: Unroll loop n times
  Pragma Name: "BLOCKABLE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Block loops as specified by indices
  Pragma Name: "PREFETCH"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Specify prefetch for each cache level
  Pragma Name: "PREFETCH_MANUAL"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Specify handling of manual prefetches
  Pragma Name: "PREFETCH_REF"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Generate prefetch node for array ref 
  Pragma Name: "PREFETCH_REF_DISABLE"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: Disable specified array prefetches 
  Pragma Name: "IVDEP"
  Pragma User: LNO
  Pragma Scope: WN
  Comment: Force mem ref indep
  Pragma Name: "OPTIONS"
  Pragma User: IPA, LNO, WOPT and CG
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "OPAQUE_REGION"
  Pragma User: IPA, LNO, WOPT and CG
  Pragma Scope: PU
  Comment: N/A   
  Pragma Name: "FREQUENCY"
  Pragma User: CG
  Pragma Scope: POINT
  Comment: Provide hints regarding execution
  Pragma Name: "DISTRIBUTE"
  Pragma User: LNO
  Pragma Scope: WN
  Comment: A suitable point for loop distribution can be suggested to the compiler without actually 
  performing the source code modifications.
  Pragma Name: "REDISTRIBUTE"
  Pragma User: LNO
  Pragma Scope: WN
  Comment: Allows you to dynamically redistribute previously distributed arrays. 
  Pragma Name: "DISTRIBUTE_RESHAPE"
  Pragma User: LNO
  Pragma Scope: WN
  Comment: Program makes no assumptions about the storage layout of the array. The compiler 
  performs aggressive optimizations for reshaped arrays that violate standard layout assumptions 
  but guarantee the desired data distribution for that array. 
  Pragma Name: "DYNAMIC"
  Pragma User: LNO
  Pragma Scope:  WN
  Comment: By default, the compiler assumes that a distributed array is not dynamically 
  redistributed, and directly schedules a parallel loop for the specified data affinity. This pragma 
  informs the compiler that array may be dynamically redistributed. 
  Pragma Name: "ACCESSED_ID"
  Pragma User: WOPT
  Pragma Scope: SPECIAL
  Comment: Probably flags for loads and stores (to be verified).
  Pragma Name: "PFOR_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: This directive describes loops whose iterations are independent and can be performed 
  concurrently. (To be verified; not certain this is correct.)
  Pragma Name: "ENTER_GATE"
  Pragma User: MP    
  Pragma Scope: ON
  Pragma Name: "EXIT_GATE"
  Pragma User: MP
  Pragma Scope: ON
  Comment: These two pragmas form a variation of the barrier idea: no thread can go beyond the
   exit until all threads have crossed the enter gate.
  Pragma Name: "BARRIER"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "CHUNKSIZE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Tells the compiler which values to use for chunksize.
  Pragma Name: "COPYIN"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: Copies the value from the master thread's version of a global variable into the slave
   thread's version. 
  Pragma Name: "CRITICAL_SECTION_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A

  Pragma Name: "CRITICAL_SECTION_END"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "DOACROSS"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "IF"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Lets you set up a condition that is evaluated at run time to determine whether to run the 
   statements serially or in parallel. At compile time, it is not always possible to judge how much 
   work a parallel region does (for example, loop indices are often calculated from data supplied at 
   run time). The if clause lets you avoid running trivial amounts of code in parallel when the possible 
   speedup does not compensate for the overhead associated with running code in parallel.
  Pragma Name: "LASTLOCAL"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: For variables that are local to each process, the compiler saves only the value from the 
  logically last iteration of the loop when the loop completes.
  Pragma Name: "LOCAL"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Tells the compiler the names of all the variables that must be local to each thread
  Pragma Name: "MPSCHEDTYPE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: How to share the loop iterations among the processors 
  Pragma Name: "ORDERED"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Identifies a structured block of code that must be executed in sequential order.
  Pragma Name: "PARALLEL_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "PARALLEL_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "PARALLEL_DO"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "PDO_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "PDO_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "PSECTION_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "PSECTION_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "REDUCTION"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: The compiler keeps local copies of the variables and combines them when it exits the loop
  Pragma Name: "SECTION"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "SHARED"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Tells the compiler the names of all the variables that the threads must share.
  Pragma Name: "SINGLE_PROCESS_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A   
  Pragma Name: "SINGLE_PROCESS_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "ITERATE_VAR"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "ITERATE_INIT"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "ITERATE_COUNT"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "ITERATE_STEP"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "AFFINITY"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "DATA_AFFINITY"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Data affinity applies only to distributed arrays 
  Pragma Name: "THREAD_AFFINITY"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Thread affinity assigns particular iterations to a particular thread. 
  Pragma Name: "NUMTHREADS"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: How many of the available threads to use when running this region in parallel. (The
   default is all the available threads.) 
  Pragma Name: "NOWAIT"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: To remove the barrier at the end of the first loop
  Pragma Name: "PAGE_PLACE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Allows the explicit placement of data.


  Pragma Name: "SL2_MAJOR_PSECTION_BEGIN"
  Pragma User: CG
  Pragma Scope: ON
  Comment: fork_joint
  Pragma Name: "SL2_MINOR_PSECTION_BEGIN"
  Pragma User: CG
  Pragma Scope: ON
  Comment: fork_joint
  Pragma Name: "SL2_SECTION"
  Pragma User: CG
  Pragma Scope: ON
  Comment: fork_joint
  Pragma Name: "ONTO"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "LASTTHREAD"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "NORECURRENCE"
  Pragma User: LNO
  Pragma Scope: SCOPE_POINT
  Comment: N/A
  Pragma Name: "NEXT_SCALAR"
  Pragma User: LNO
  Pragma Scope: SCOPE_POINT
  Comment: N/A
  Pragma Name: "PURPLE_CONDITIONAL"
  Pragma User: PURPLE
  Pragma Scope: PURPLE_CONDITIONAL
  Comment: N/A
  Pragma Name: "PURPLE_UNCONDITIONAL"
  Pragma User: PURPLE
  Pragma Scope: PURPLE_UNCONDITIONAL
  Comment: N/A
  Pragma Name: "WOPT_FINISHED_OPTIMIZATION"
  Pragma User: WOPT
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ARCLIMIT"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Sets the size of the “dependence arc data structure”
  Pragma Name: "KAP_CONCURRENTIZE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: To mark eligible loops to run concurrently (in parallel).
  Pragma Name: "KAP_INLINE_FILE"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_INLINE_PU"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_LIMIT"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_MINCONCURRENT"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Executing a loop in parallel incurs overhead that varies from loop to loop. If a loop has little work, parallel execution might be slower than serial execution because of the overhead. Beyond a certain amount of work, however, parallel execution improves performance.
  Pragma Name: "KAP_NOCONCURRENTIZE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_NOINLINE_FILE"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_NOINLINE_PU"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_OPTIMIZE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ROUNDOFF"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_SCALAR_OPTIMIZE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_CTHRESHOLD"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_EACH_INVARIANT_IF_GROWTH"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Limits the total additional lines of code generated through invariant-IF restructuring in each loop.
  Pragma Name: "KAP_MAX_INVARIANT_IF_GROWTH"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Limits the total additional lines of code generated through invariant-IF restructuring in each program unit.
  Pragma Name: "KAP_STORAGE_ORDER"
  Pragma User: CG
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_BOUNDS_VIOLATIONS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NOBOUNDS_VIOLATIONS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_CONCURRENT_CALL"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_DO"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_DOPREFER"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_EQUIVALENCE_HAZARD"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NOEQUIVALENCE_HAZARD"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_LAST_VALUE_NEEDED"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NOLAST_VALUE_NEEDED"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_PERMUTATION"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NORECURRENCE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_RELATION"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NOSYNC"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_TEMPORARIES_FOR_CONSTANT_ARGUMENTS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NOTEMPORARIES_FOR_CONSTANT_ARGUMENTS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A 
  Pragma Name: "KAP_ASSERT_ARGUMENT_ALIASING"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_BENIGN"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_DEPENDENCE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_FREQUENCY"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_IGNORE_ANY_DEPENDENCE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_IGNORE_ASSUMED_DEPENDENCE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NO_ARGUMENT_ALIASING"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NO_CONCURRENT_CALL"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_NO_INTERCHANGE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_USE_COMPRESS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A  
  Pragma Name: "KAP_ASSERT_USE_EXPAND"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_USE_CONTROLLED_STORE"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_USE_GATHER"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_ASSERT_USE_SCATTER"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_OPTIONS"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "PREAMBLE_END"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "FLIST_SKIP_BEGIN"
  Pragma User: W2F
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "FLIST_SKIP_END"
  Pragma User: W2F
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "CLIST_SKIP_BEGIN"
  Pragma User: W2C
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "CLIST_SKIP_END"
  Pragma User: W2C
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "FILL_SYMBOL"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Tells the compiler to insert any necessary padding to ensure that the user variable does not share a cache-line or page with any other symbol.
  Pragma Name: "ALIGN_SYMBOL"
  Pragma User: LNO
  Pragma Scope: PU
  Comment: Specifies alignment of user variables, typically at cache-line or page boundaries.
  Pragma Name: "INDEPENDENT_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "INDEPENDENT_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "KAP_OPTION_INLINE"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "KAP_OPTION_NOINLINE"
  Pragma User: IPA
  Pragma Scope: PU
  Comment: N/A
  Pragma Name: "_CRI_IVDEP"
  Pragma User: MP
  Pragma Scope: WN
  Comment: N/A
  Pragma Name: "_CRI_NOVECTOR"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_NOVSEARCH"   
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_PREFERVECTOR"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_SHORTLOOP"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_CASE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_ENDCASE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_COMMON"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_GUARD"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A  
  Pragma Name: "_CRI_ENDGUARD"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "_CRI_ENDLOOP"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_PARALLEL"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "_CRI_ENDPARALLEL"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "_CRI_PREFERTASK" 
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_TASKCOMMON"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_TASKLOOP"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_SHARED"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_VALUE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_DEFAULTS"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_MAXCPUS"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_SAVELAST"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_CHUNKSIZE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_NUMCHUNKS"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_TASK"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  
  Pragma Name: "_CRI_NOTASK"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_ALIGN"   
  Pragma User: CG
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_BL"
  Pragma User: CG
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "_CRI_CNCALL"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "MPNUM"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "COPYIN_BOUND"
  Pragma User: MP
  Pragma Scope: COPYIN_BOUND
  Comment: N/A
  Pragma Name: "SYNC_DOACROSS"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: N/A
  Pragma Name: "DEFAULT"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: N/A
  
  Pragma Name: "FIRSTPRIVATE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: Combines the behavior of the PRIVATE clause with automatic initialization of the variables in its list
  Pragma Name: "MASTER"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "ORDERED"
  Pragma User: MP
  Pragma Scope: ON
  Comment: N/A
  Pragma Name: "END_ORDERED"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "ATOMIC"
  Pragma User: MP
  Pragma Scope: WN
  Comment: N/A
  Pragma Name: "ORDERED_LOWER_BOUND"
  Pragma User: MP
  Pragma Scope: WN
  Comment: N/A
  Pragma Name: "ORDERED_STRIDE"
  Pragma User: MP
  Pragma Scope: WN
  Comment: N/A
  Pragma Name: "END_MARKER"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: N/A
  Pragma Name: "PARALLEL_SECTIONS"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "START_STMT_CLUMP"
  Pragma User: REGION
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "END_STMT_CLUMP"
  Pragma User: REGION
  Pragma Scope: POINT
  Comment: N/A
  Pragma Name: "TYPE_OF_RESHAPED_ARRAY"
  Pragma User: LNO
  Pragma Scope: WN
  Comment: N/A
  Pragma Name: "ASM_CONSTRAINT"
  Pragma User: CG
  Pragma Scope: WN
  Comment: Each of which indicates an operand constraint for an output operand and the negative preg number that will be used to refer to the output value corresponding to it
  Pragma Name: "ASM_CLOBBER"
  Pragma User: CG
  Pragma Scope: WN
  Comment: Indicate registers clobbered by the given assembly code.
  Pragma Name: "FORALL"
  Pragma User: LNO
  Pragma Scope: SPECIAL
  Comment: #ifdef KEY  
  Pragma Name: "COPYPRIVATE"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: by jhs, 02.9.3
  Pragma Name: "PARALLEL_WORKSHARE"
  Pragma User: MP
  Pragma Scope: POINT
  Comment: by jhs, 04.3.10
  Pragma Name: "PWORKSHARE_BEGIN"
  Pragma User: MP
  Pragma Scope: ON
  Comment: by jhs, 04.3.10
  Pragma Name: "PWORKSHARE_END"
  Pragma User: MP
  Pragma Scope: OFF
  Comment: by jhs, 04.3.10
  Pragma Name: "THREADPRIVATE"
  Pragma User: MP
  Pragma Scope: SPECIAL
  Comment: by jhs, 02.9.18 
  Pragma Name: NULL
  Pragma User: NULL
  Pragma Scope: UNKNOWN
  Comment: MAX_WN_PRAGMA

common/com/wn_pragmas.h

   Pragma scopes:
   * UNKNOWN
   * PU - Affects entire current program unit
   * WN - Affects next whirl statement node
   * POINT - Affects this point of the code

   /* matching on/off pragmas must belong to the same block */
   * ON - Start of affected scope
   * OFF - End of affected scope

   * SPECIAL - pragma-specific rule for scope
   * MAX_SCOPE_PRAGMA - Last one in enum


 /* Schedule types (for WN_PRAGMA_MPSCHEDTYPE) */
 typedef enum {
   WN_PRAGMA_SCHEDTYPE_UNKNOWN,
   /* The real schedule type will be specified at run time, based on
      environment variables. */
   WN_PRAGMA_SCHEDTYPE_RUNTIME,
   /* Default scheduling. */
   WN_PRAGMA_SCHEDTYPE_SIMPLE,
   /* The run-time scheduler gives each thread chunksize iterations of the
      loop, which are then assigned to the threads in an interleaved way. */
   WN_PRAGMA_SCHEDTYPE_INTERLEAVE,
   /* The run-time scheduler gives each thread chunksize iterations of the
      loop. chunksize should be smaller than the total number of iterations
      divided by the number of threads. The advantage of dynamic over simple
      is that dynamic helps distribute the work more evenly. */
   WN_PRAGMA_SCHEDTYPE_DYNAMIC,
   /* Guided self-scheduling: the run-time scheduler gives each processor a
      varied number of iterations of the loop. This is like dynamic, but
      instead of a fixed chunksize, the chunks begin with big pieces and end
      with small pieces. */
   WN_PRAGMA_SCHEDTYPE_GSS,
   WN_PRAGMA_SCHEDTYPE_PSEUDOLOWERED,
   MAX_PRAGMA_SCHEDTYPE
 } WN_PRAGMA_SCHEDTYPE_KIND;

 /* Possible values for the default clause */
 typedef enum {
   WN_PRAGMA_DEFAULT_UNKNOWN,
   WN_PRAGMA_DEFAULT_NONE,
   WN_PRAGMA_DEFAULT_SHARED,
   WN_PRAGMA_DEFAULT_PRIVATE,
   MAX_PRAGMA_DEFAULT
 } WN_PRAGMA_DEFAULT_KIND;
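
The "big pieces first, small pieces last" behaviour described for guided self-scheduling can be sketched in C. This is only an illustration and not the Open64 run-time: the helper names gss_next_chunk and gss_chunks are hypothetical, and the rule chunk = ceil(remaining / nthreads) is an assumption (the textbook GSS rule), not something taken from wn_pragmas.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Open64: size of the next chunk under
 * guided self-scheduling, assuming chunk = ceil(remaining / nthreads). */
static size_t gss_next_chunk(size_t remaining, size_t nthreads)
{
    if (remaining == 0 || nthreads == 0)
        return 0;
    return (remaining + nthreads - 1) / nthreads;
}

/* Record successive chunk sizes in sizes[] until every iteration has been
 * handed out or max_chunks entries are written; returns the chunk count. */
static size_t gss_chunks(size_t iterations, size_t nthreads,
                         size_t *sizes, size_t max_chunks)
{
    assert(sizes != NULL);
    size_t n = 0;
    while (iterations > 0 && n < max_chunks) {
        size_t c = gss_next_chunk(iterations, nthreads);
        if (c == 0)
            break;              /* nthreads == 0: nothing can be scheduled */
        sizes[n++] = c;
        iterations -= c;
    }
    return n;
}
```

Under this assumed rule the chunk sizes never increase, so early chunks keep scheduling overhead low while the small final chunks even out the load. A fixed chunksize, as in the dynamic type, would return the same size on every call instead.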

They support #pragma pack (n), #pragma options options-list, #pragma frequency-hint. Look at Section 4.2.2 in. Have these pragmas from PathScale been merged with the latest release of Open64?

The following are the C/C++ Compiler features:

   #pragma pack (n)
   This pragma specifies that each field of the next structure is aligned to n bytes if the field's natural
   alignment is not smaller than n.
   #pragma options <list-of-options>
   Optimization flags can now be changed via directives in the user program.
   #pragma frequency-hint
   The user can provide a hint to the compiler regarding which branch of an IF-statement is more likely to be executed at runtime.
   This hint allows the compiler to optimize code generated for the different branches.
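
As a rough illustration of the pack pragma above, the following C sketch compares a naturally aligned structure with a packed one. It assumes the GCC-style spelling, where #pragma pack(1) stays in effect until #pragma pack() restores the default; whether Open64 accepts the reset form is not confirmed here. The structure and function names are made up for the example.

```c
#include <stddef.h>

/* Default layout: the compiler usually pads after 'tag' so that 'value'
 * starts on its natural alignment boundary. */
struct natural_record {
    char tag;
    int  value;
};

/* With pack(1), no field is aligned to more than 1 byte, so 'value'
 * follows 'tag' directly. */
#pragma pack(1)
struct packed_record {
    char tag;
    int  value;
};
#pragma pack()   /* assumed GCC-style reset to the default alignment */

/* Byte offset of 'value' in each layout. */
static size_t natural_value_offset(void)
{
    return offsetof(struct natural_record, value);
}

static size_t packed_value_offset(void)
{
    return offsetof(struct packed_record, value);
}
```

The packed layout saves space, but access to the misaligned field may be slower on some targets, so packing is best kept to structures whose layout is dictated externally (file formats, wire protocols).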
  • SGI Compiler Suite: [1]

Loop Nest Optimization #pragma Directives [2]:

  For example:
  #pragma unroll [n] 
  This directive instructs the compiler to add n-1 copies of the loop body to the inner loop. If the loop that this directive 
  immediately precedes is an inner loop, then it indicates standard unrolling (version 7.2 and later). If the loop that this directive 
  immediately precedes is not innermost, then outer loop unrolling (unroll and jam) is performed (version 7.0 and later). 
  The value of n must be at least 1. If it is 1, then unrolling is not performed. 
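
To make "add n-1 copies of the loop body" concrete, here is a minimal C sketch for n = 4. The first function carries the directive as spelled above; a compiler that does not recognize it ignores the pragma (possibly with a warning). The second function is a hand-written equivalent of what the unrolled inner loop might look like; the function names are illustrative, and the code an actual compiler generates may differ.

```c
#include <stddef.h>

/* The loop as the programmer writes it, with the directive placed
 * immediately before the loop it applies to. */
static long sum_rolled(const int *a, size_t n)
{
    long s = 0;
#pragma unroll 4
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: three extra copies of the body per iteration, plus
 * a remainder loop for trip counts that are not a multiple of 4. */
static long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)      /* leftover iterations */
        s += a[i];
    return s;
}
```

Both functions return the same result for any input; the unrolled form executes roughly a quarter as many loop-control branches, at the cost of a larger loop body.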
  • Sun Fortran compilers, f77 and f95: [3]

F77 has further pragmas, called "directives". These are supported in LNO for F77 only.

Other Compilers and Pragmas

Loop Unrolling: http://publib.boulder.ibm.com/infocenter/comphelp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.aix.doc/compiler_ref/pragma_descriptions.html

  • ARM Compiler: [6]

Loop Unrolling: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/CJACACFE.html and http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/CJAHJDAB.html

  • MIPSpro C and C++ Pragmas :[7]

Loop Unrolling: http://www.risc.uni-linz.ac.at/education/courses/ws2003/intropar/origin-new/Pragmas/sgi_html/ch08.html#Z47239

  • Sun C++ compiler pragma: [8] and
 http://blogs.sun.com/solarisdev/entry/new_article_prefetching
  • Microsoft C and C++ compilers: [9]
  • Intel® C++ Compiler Pragmas: [10] and [11]
  • PGI Fortran & C accelerator programming model document : download

Points to be analyzed

 Given any application with no dependencies among its statements, what happens when loop unrolling is applied?
 1) Number of loops unrolled increases
 2) Longest path delay increases
 3) Fewer resources are shared iteratively
 4) Maximum clock frequency decreases
 5) Total number of clock cycles needed decreases
 6) Circuit size increases and number of components needed increases
 7) Replicating the loop body can eliminate loop overhead, improve the cache hit rate and reduce branching
 What is the largest feasible unroll factor, and what factors does it depend on?
 The best implementation requires a balance between
 area, performance and memory access, and of course lower production costs.

Proposed steps to manage the unrolling factor on a reconfigurable platform:

 a) Choose a reconfigurable board/platform.
 b) Choose an application that has no dependencies in the kernel and get the profiling information.
 c) Find the maximum time taken to determine the area utilized by one kernel "k" on the FPGA.
 d) What is the area available?
 e) Time consumed by "k" in software and in hardware (common measurement).
 f) Number of clock cycles for memory transfer operations for one "k".
 g) Available bandwidth.
 h) Set an upper bound for the optimal unroll factor (algorithm); call it "up".
 i) What is the time taken for a new instance "n" of "k" to start, taking into account memory read, memory write and the time taken for that
 instance "n" to run on hardware?
 j) Different instances may request memory accesses at different times, and these accesses should not overlap. This sets another bound; call it
 "uo".
 k) So the time taken to run "n" instances of kernel "k" depends on memory read, memory write, the time for one instance of k on
 hardware, and the number of instances.
 l) For example, if there are 2 instances that have to run in parallel, calculate the timings discussed above plus the delay
 required for the second instance to get inputs from the first instance.
 m) This gives an idea of how much time it would take for "n" instances of kernel "k" to run on hardware.
 Can the optimal unrolling factor be decided yet?
 Yes. How?
 a) Find the number of iterations before and after loop unrolling.
 b) Find the time taken/clock cycles required for kernel "k" and the loop nests of k() in both hardware and software.
 c) This gives the speedup.
 d) That said, a small gain in speedup with a large increase in area utilization indicates that the unroll factor should not be increased.
 (Some hints that came to mind so far; this needs to be refined.)
 A calculation or algorithm is required to justify point "d)".
 The above conditions hold for one kernel.
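
One possible way to turn the steps above into a calculation is sketched below in C. Everything in it is an assumption made for illustration: the struct kernel_model fields, the timing model (memory transfers of the instances are serialized so they never overlap, while the instances compute in parallel), and the stopping rule (stop increasing the unroll factor once one more instance saves less than a chosen fraction min_gain of the running time). It is a starting point for discussion, not a validated algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model for one kernel "k"; all fields are assumptions. */
struct kernel_model {
    double t_read;            /* memory read time per instance  */
    double t_write;           /* memory write time per instance */
    double t_compute;         /* hardware run time per instance */
    double area_per_instance; /* FPGA area used by one instance */
    double area_available;    /* total FPGA area available      */
};

/* Time for `iterations` iterations with `u` parallel instances, assuming
 * memory transfers are serialized and computation runs in parallel. */
static double unrolled_time(const struct kernel_model *m,
                            unsigned iterations, unsigned u)
{
    assert(m != NULL && u > 0);
    unsigned rounds = (iterations + u - 1) / u;
    return rounds * (u * (m->t_read + m->t_write) + m->t_compute);
}

/* Greedy choice: grow the unroll factor up to the area bound, and stop as
 * soon as one more instance saves less than `min_gain` of the time. */
static unsigned pick_unroll_factor(const struct kernel_model *m,
                                   unsigned iterations, double min_gain)
{
    assert(m != NULL && m->area_per_instance > 0.0);
    unsigned u_max = (unsigned)(m->area_available / m->area_per_instance);
    if (u_max > iterations)
        u_max = iterations;

    unsigned best = 1;
    double t_prev = unrolled_time(m, iterations, 1);
    for (unsigned u = 2; u <= u_max; u++) {
        if (t_prev <= 0.0)
            break;                        /* nothing left to gain */
        double t = unrolled_time(m, iterations, u);
        double gain = (t_prev - t) / t_prev;
        if (gain < min_gain)
            break;                        /* small gain for extra area */
        best = u;
        t_prev = t;
    }
    return best;
}
```

In this model the area bound plays the role of "up", and the serialized memory term is what makes the speedup flatten out as instances are added, which is the effect point "d)" asks to quantify. Because the search is greedy, it can stop at a factor where the gain dips temporarily; an exhaustive scan up to the area bound would avoid that.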

Other Challenges

 What about multiple kernels and multiple instances "n1", "n2" of k1, k2, and so on?

Some hints from Beowolf.org http://www.pathscale.com/pdf/PathOpt2-paper.pdf

Common LNO flags

<config>
  <define name="common_LNO">
    <option> -LNO:opt=0 </option>
    <option> -LNO:blocking=off </option>
    <option> -LNO:ou_max=5 </option>
    <option> -LNO:fusion=2 </option>
    <option> -LNO:cse=off </option>
    <choose k="1">
      <option> -LNO:simd=0 </option>
      <option> -LNO:simd=1 </option>
      <option> -LNO:simd=2 </option>
    </choose>
  </define>
</config>

$ pathopt2 -f lno.xml -r ./test pathcc @ -o test test.c

This example directs pathopt2 to use the file lno.xml as the configuration file, to use the command pathcc options -o test test.c for the building phase (where options is iteratively replaced according to the rules specified in the configuration file lno.xml), and to use ./test for the testing phase.


$ ./pathopt2 -f lno.xml -t combo -r ./himeno pathf90 @ -o himeno himeno_200.f
Flags Build Test Real User System
----------------------- ----- ---- ---------- ---------- ----------
-O3 -LNO:opt=0 
-O3 -LNO:blocking=off 
-O3 -LNO:ou_max=5 
-O3 -LNO:fusion=2 
-O3 -LNO:cse=off 
-O3 -LNO:simd=0 
-O3 -LNO:simd=1 
-O3 -LNO:simd=2 
-O3 -LNO:simd=1 
-O3 -LNO:simd=2 
-O3 -LNO:cse=off 
-O3 -LNO:simd=1 -LNO:simd=2
-O3 -LNO:simd=1 -LNO:cse=off
-O3 -LNO:simd=2 -LNO:cse=off
-O3 -LNO:simd=1 -LNO:simd=2 -LNO:cse=off

Guided Build Management with PathOpt2

In general, using PathOpt2 involves four steps:
1. Create the option configuration file, or use the one provided by PathScale.
2. Create build and run scripts to automate these functions.
3. Run PathOpt2. Interpret the results.
4. Choose a more detailed execute target, and repeat from step 2.

Comparison of different loop unrolling pragmas

 ARM - unrolls regardless of whether it is beneficial, for both vectorized and non-vectorized loops, i.e. loops with and without
 dependencies among iterations. The default is 4. #pragma unroll_completely can only be used immediately before a for loop, a 
 while loop, or a do ... while loop. When compiling at -O3 -Otime, the compiler automatically unrolls loops where it is beneficial to do 
 so. You can use this pragma to request that the compiler unroll a loop that has not been unrolled automatically.
 Sun - has prefetch operations, i.e. it takes care of when the next data has to arrive. Prefetch instructions can increase the speed of an 
 application substantially by bringing data into cache so that it is available when the processor needs it. Note that the performance 
 benefit due to prefetch instructions is hardware-dependent and prefetches which improve performance on one chip may not have the same 
 effect on a different chip.
 
 HP - You can apply an unroll factor that you think is best for the given loop or apply no unroll factor to the loop. If this pragma is 
 not specified, the compiler uses its own heuristics to determine the best unroll factor for the inner loop. The UNROLL pragma must be 
 immediately followed with a loop statement and will be ignored if it is not an innermost loop. You can tell the compiler to unroll the 
 loops that have less than n operations, where the default size n is 60.
 
 Intel compiler - Currently applied only to the innermost loop. The option -funroll-all-loops unrolls all loops, even if the number of 
 iterations is uncertain when the loop is entered.
 
 IBM Compiler - Only one pragma may be specified on a loop. The pragma must appear immediately before the loop or the #pragma block_loop 
 directive to have effect. If number is not specified and if -qhot, -qsmp, or -O4 or higher is specified, the optimizer determines an 
 appropriate unrolling factor for each nested loop. The pragma affects only the loop that follows it. It cannot be applied to do while 
 and while loops. Dependencies in the loop must not be "backwards-looking". For example, a statement such as 
 A[i][j] = A[i - 1][j + 1] + 4 must not appear within the loop.
 IBM has directives: 
 ASSERT: independence of iterations and trip count.
 CNCALL: no procedure in the loop has a loop-carried dependency.
 PERMUTATION: array contains no repeated values in the scope of a loop.
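
A small C experiment suggests why a "backwards-looking" dependence such as A[i][j] = A[i - 1][j + 1] + 4 is a problem when an outer loop is unrolled and its copies are fused into the inner loop. That this is the reason for the IBM restriction is an assumption; the sketch only shows that reordering the iterations changes the result. The function names and the array size N are chosen for the example.

```c
#define N 4

/* Original iteration order: row i is finished before row i + 1 starts. */
static void sweep_original(int a[N][N])
{
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N - 1; j++)
            a[i][j] = a[i - 1][j + 1] + 4;
}

/* Outer loop unrolled by 2 with both copies fused into the inner loop.
 * The second statement reads a[i][j + 1] before the first statement has
 * updated it, so the dependence is violated. */
static void sweep_jammed(int a[N][N])
{
    int i = 1;
    for (; i + 1 < N; i += 2)
        for (int j = 0; j < N - 1; j++) {
            a[i][j]     = a[i - 1][j + 1] + 4;
            a[i + 1][j] = a[i][j + 1] + 4;
        }
    for (; i < N; i++)                  /* remainder row, original order */
        for (int j = 0; j < N - 1; j++)
            a[i][j] = a[i - 1][j + 1] + 4;
}
```

Starting from an all-zero array, the two functions produce different values in row 2, which is why a compiler must either prove that such a dependence is absent or be told so by the programmer before applying this kind of transformation.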
 Loop Unrolling and Register Spilling
 Examine the source code and count how many loads/stores are required.
 Compare with the assembly code.
 It may be necessary to distribute or split the loops, probably by fission.
 Count the number of array elements and variables in the loop, then cross-check against the logical registers available. If 
 more registers are required than are available, the loop could be split, more basic blocks could be introduced, or some if-else 
 statements could be written that forcibly split the loop into several basic blocks and are later taken out of the assembler code by hand. 
 Loop Unrolling and FPGA
 The critical parameter is throughput, so the optimal unrolling factor can differ from program to program. 
 FPGA area may grow or shrink depending on the increase in the code size. Because the increased number of operations can be mapped to many 
 hardware functional units, the implementation can exploit concurrency at the expense of FPGA capacity. The total running time of an 
 implementation of a loop in an FPGA is given by the product of the number of cycles it takes to execute the code and the clock period, 
 i.e. the cycle count divided by the frequency of the FPGA which permits the safe operation of the realized design.
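
The running-time relation above can be sketched in C as follows. The struct design_point and both function names are hypothetical; the sketch only encodes time = cycles / clock frequency and picks the fastest of several candidate implementations.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate implementation of the loop; fields are assumptions. */
struct design_point {
    unsigned unroll;   /* unroll factor of this implementation        */
    double   cycles;   /* cycles needed to execute the loop           */
    double   fmax_hz;  /* highest safe clock frequency of the design  */
};

/* Running time in seconds: cycles times clock period = cycles / frequency. */
static double run_time_seconds(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles / clock_hz;
}

/* Index of the candidate with the shortest running time. */
static size_t fastest_design(const struct design_point *d, size_t n)
{
    assert(d != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (run_time_seconds(d[i].cycles, d[i].fmax_hz) <
            run_time_seconds(d[best].cycles, d[best].fmax_hz))
            best = i;
    }
    return best;
}
```

Since a larger unroll factor tends to lower the cycle count but also the achievable clock frequency, the fastest design is not necessarily the one with the largest unroll factor. The cycle counts and frequencies have to come from measurement or synthesis reports.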


List of pragmas we may support in the near future (feedback welcome)

  • PRAGMA OPTIMIZE ON|OFF