*****************************************************************************
**			TAU Portable Profiling Package			   **
**			http://www.cs.uoregon.edu/research/paracomp/tau    **
*****************************************************************************
**    Copyright 1997-2004				   	   	   **
**    Department of Computer and Information Science, University of Oregon **
**    Advanced Computing Laboratory, Los Alamos National Laboratory        **
**    Research Center Juelich, ZAM Germany			           **
*****************************************************************************
/*******************************************************************
 *                                                                 *
 *        Tuning and Analysis Utilities Installation Procedure     *
 *                           Version 2.13                          *
 *                                                                 *
 *******************************************************************
 *    For installation help, see INSTALL.                          *
 *    For release notes, see README.                               *
 *    For JAVA instructions, see README.JAVA                       *
 *    For licensing information, see LICENSE.                      *
 *    For a tutorial on using TAU, open html/index.html in your    *
 *        web browser.                                             *
 *    For more information, including updates and new releases,    *
 *        see http://www.cs.uoregon.edu/research/paracomp/tau      *
 *    For help, reporting bugs, and making suggestions, please     *
 *        send e-mail to tau-bugs@cs.uoregon.edu                   *
 *******************************************************************/


General Installation Procedure: 
-------------------------------
Microsoft Windows users should refer to instructions in Windows-Readme.txt. 

The following instructions are meant for Unix users.

1.  Configure the package for your system.

After uncompressing and untarring tau, the user needs to configure, compile and
install the package. This is done by invoking:

% ./configure
% make install

TAU is configured by running the configure script with appropriate options that
select the profiling and tracing components that are used to build the TAU 
library.  The `configure' shell script attempts to guess correct values for 
various system-dependent variables used during compilation, and creates the 
Makefile(s) (one in each subdirectory of the source directory).

NOTE: It is highly recommended that you select the *minimal* set of options 
*****
that satisfies the instrumentation and measurement parameters that you need. 
Multiple configurations can be created by running configure several times 
with a different set of options each time. Commonly used configurations are 
typically installed using the 'installtau' tool described below.

% ./configure -help 
Usage: configure [OPTIONS]
  where [OPTIONS] are:
-c++=<compiler>  ............................ specify the C++ compiler.
    options [CC|KCC|g++|*xlC*|cxx|pgCC|FCC|guidec++|aCC|c++|ecpc|icpc].
-cc=<compiler> ................................ specify the C compiler.
                            options [cc|gcc|KCC|pgcc|guidec|*xlc*|ecc].
-pdt_c++=<compiler>  ............ specify a different PDT C++ compiler.
    options [CC|KCC|g++|*xlC*|cxx|pgCC|FCC|guidec++|aCC|c++|ecpc|icpc].
-fortran=<compiler> ..................... specify the Fortran compiler.
   options    [gnu|sgi|ibm|ibm64|hp|cray|pgi|absoft|fujitsu|sun|compaq|
  	                                         kai|nec|hitachi|intel]
-useropt='<parameters>' ............... list of commandline parameters.
-prefix=<dir> ................ Specify a target installation directory.
-pthread .................................. Use pthread thread package.
-sproc .................................. Use SGI sproc thread package.
-tulipthread=<dir> .......... Specify location of Tulip/Smarts package.
-smarts .................. Use SMARTS API for threads (use with above).
-openmp ........................................... Use OpenMP threads.
-opari=<dir>... Specify location of Opari OpenMP tool (use with above).
-opari_region ......... Report performance data for all OpenMP regions.
-opari_construct ... Report performance data for all OpenMP constructs.
-pcl=<dir> ..... Specify location of PCL (Performance Counter Library).
-papi=<dir> ............... Specify location of PAPI (Performance API).
-pdt=<dir> ........ Specify location of PDT (Program Database Toolkit).
-jdk=<dir> ...... Specify location of JAVA 2 Development Kit (jdk1.2+).
-dyninst=<dir> ................... Specify location of DynInst Package.
-mpi .......................... Specify use of TAU MPI wrapper library.
-mpiinc=<dir> ............. Specify location of MPI include dir and use
                           the TAU MPI Profiling and Tracing Interface.
-mpilib=<dir> ............. Specify location of MPI library dir and use
                           the TAU MPI Profiling and Tracing Interface.
-mpilibrary=<library> ................ Specify a different MPI library.
            e.g., -mpilibrary=-lmpi_r                                  
-nocomm  ........ Disable tracking communication events in MPI library.
-epilog=<dir>  ............ Specify location of EPILOG Tracing package.
-pythoninc=<dir> ........ Specify location of Python include directory.
-pythonlib=<dir> ............ Specify location of Python lib directory.
-tag=<unique name> ........ Specify a tag to identify the installation.
-muse ................................. Specify the use of MAGNET/MUSE.
-muse_event ............................Specify the use of MAGNET/MUSE.
                                w/ non-monotonically increasing values.
-muse_multiple......................... Specify the use of MAGNET/MUSE.
                                    w/ monotonically increasing values.
-TRACE ..................................... Generate TAU event traces.
-PROFILE ............ Generate profiles (summary statistics) (default).
-PROFILECALLPATH ......................... Generate call path profiles.
-PROFILESTATS .................. Enable standard deviation calculation.
-MULTIPLECOUNTERS ............ Use multiple hardware counters and time.
-COMPENSATE ........ Compensate for profiling measurement perturbation.
-SGITIMERS .......... Use fast nanosecond timers on SGI R10000 systems.
-CRAYTIMERS ............ Use fast nanosecond timers on Cray X1 systems.
-LINUXTIMERS ......... Use low overhead TSC Counter for wallclock time.
-CPUTIME .......... Use usertime+system time instead of wallclock time.
-PAPIWALLCLOCK ........ Use PAPI to access wallclock time. Needs -papi.
-PAPIVIRTUAL   .......... Use PAPI for virtual (user) time calculation.
-noex .................. Use no exceptions while compiling the library.
-help ...................................... display this help message.
***********************************************************************

The following  command-line options are available to configure:

-prefix=<directory>
   
   Specifies the destination directory where the header, library and binary 
   files are copied. By default, these are copied to subdirectories <arch>/bin 
   and <arch>/lib in the TAU root directory. 
   
-arch=<architecture>
   
   Specifies the architecture. If the user does not specify this option, 
   configure determines the architecture. For SGI, the user can specify one 
   of sgi32, sgin32 or sgi64 for 32, n32 or 64 bit compilation modes 
   respectively. The files are installed in the <architecture>/bin and 
   <architecture>/lib directories. 

IMPORTANT NOTE: For IBM architectures, we use rs6000 and ppc64 to denote the 
   32 bit compilation modes on AIX Power4 and Linux Power4 respectively. TAU 
   detects these architectures automatically, and 32 bit compilation is the 
   default. To select 64 bit compilation, use -arch=ibm64 for AIX or 
   -arch=ibm64linux for Linux on Power4. 
   On IBM Linux Power4, use -c++=g++ for 32 bit GNU g++, and 
   -c++=powerpc64-linux-g++ with -cc=powerpc64-linux-gcc for the 64 bit GNU 
   compilers. These compilers are installed as /usr/bin/g++ (32 bit) and 
   /opt/cross/bin/powerpc64-linux-g++ (64 bit). Under IBM Linux Power4, 
   xlf90 is the default Fortran compiler with both g++/gcc and xlC/xlc.

   
-c++=<C++ compiler>
   
   Specifies the name of the C++ compiler. Supported C++ compilers include 
   KCC (from KAI/Intel), CC, g++ and powerpc64-linux-g++ (from GNU), FCC 
   (from Fujitsu), xlC (from IBM), guidec++ (from KAI/Intel), aCC (from HP), 
   c++ (from Apple), and pgCC (from PGI). 
   
-cc=<C Compiler>
   
   Specifies the name of the C compiler. Supported C compilers include cc, 
   gcc and powerpc64-linux-gcc (from GNU), pgcc (from PGI), fcc (from Fujitsu), 
   xlc (from IBM), and KCC (from KAI/Intel).

-pdt_c++=<C++ Compiler> 
   Specifies a different C++ compiler for PDT (tau_instrumentor). This is 
   typically used when the library is compiled with one C++ compiler 
   (specified with -c++) and tau_instrumentor is compiled with a different 
   <pdt_c++> compiler. For example, -c++=pgCC -cc=pgcc -pdt_c++=KCC -openmp ... 
   uses PGI's OpenMP compilers for TAU's library and KCC for tau_instrumentor.
   
-fortran=<Fortran Compiler>
   
   Specifies the name of the Fortran90 compiler. Valid options are:
   gnu, sgi, ibm, ibm64, hp, cray, pgi, absoft, fujitsu, sun, compaq, and intel.

-tag=<Unique Name>

   Specifies a tag in the name of the stub Makefile and TAU makefiles to 
   uniquely identify the installation. This is useful when more than one MPI 
   library may be used with different versions of compilers.
   e.g., 
   % configure -c++=icpc -cc=icc -tag=intel71-vmi -mpiinc=/vmi2/mpich/include 

-pthread
   
   Specifies pthread as the thread package to be used. In the default mode, no 
   thread package is used. 
   
-tulipthread=<directory>
   
   Specifies Tulip threads (HPC++) as the threads package to be used as well 
   as the location of the root directory where the package is installed. 
   [ Ref: http://www.acl.lanl.gov/tulip ]
   
-tulipthread=<directory> -smarts
   
   Specifies  SMARTS (Shared Memory Asynchronous Runtime System) as the 
   threads package to be used. <directory> gives the location of the SMARTS 
   root directory. [ Ref: http://www.acl.lanl.gov/smarts ]

-openmp
   Specifies OpenMP as the threads package to be used. 
   [ Ref: http://www.openmp.org ]

-opari=<dir>
   Specifies the location of the Opari OpenMP directive rewriting tool. 
   Using the Opari source-to-source instrumentor in conjunction with TAU 
   exposes OpenMP events for instrumentation. See the examples/opari directory.
   [ Ref: http://www.fz-juelich.de/zam/kojak/opari/ ]
   Note: There are two versions of Opari: the standalone version 
   (opari-pomp-1.1.tar.gz) and the newer version in the opari/ directory of 
   KOJAK (kojak-<ver>.tar.gz). Please upgrade to the KOJAK version (especially 
   if you're using IBM xlf90) and specify -opari=<kojak-dir>/opari while 
   configuring TAU.
   
-opari_region 
   Report performance data for only OpenMP regions and not constructs. 
   By default, both regions and constructs are profiled with Opari.

-opari_construct 
   Report performance data for only OpenMP constructs and not regions.
   By default, both regions and constructs are profiled with Opari.

-pdt=<directory>
   
   Specifies the location of the installed PDT (Program Database Toolkit) root 
   directory. PDT is used to build tau_instrumentor, a C++, C and F90 
   instrumentation program that automatically inserts TAU annotations in the 
   source code. If PDT is configured with a subdirectory option (-compdir=<opt>)
   then TAU can be configured with the same option by specifying 
   -pdt=<dir> -pdtcompdir=<opt>. 

   [ Ref: http://www.acl.lanl.gov/pdtoolkit ]
   
-pcl=<directory>
  
   Specifies the location of the installed PCL (Performance Counter Library) 
   root directory. PCL provides a common interface to access hardware 
   performance counters on modern microprocessors. The library supports 
   Sun UltraSparc I/II, PowerPC 604e under AIX, MIPS R10000/12000 under IRIX, 
   HP/Compaq Alpha 21164, 21264 under Tru64 Unix and Cray Unicos (T3E) and the 
   Intel Pentium family of microprocessors under Linux. This option specifies 
   the use of hardware performance counters for profiling (instead of time).  
   To measure floating point instructions, set the environment variable 
   PCL_EVENT to PCL_FP_INSTR (for example). Refer to the TAU User's Guide or
   PCL Documentation (pcl.h) for other event names.
   [ Ref : http://www.fz-juelich.de/zam/PCL ]

-papi=<directory>

   Specifies the location of the installed PAPI (Performance API) root 
   directory. PAPI specifies a standard application programming interface (API) 
   for accessing hardware performance counters available on most modern 
   microprocessors. To measure floating point instructions, set the 
   environment variable PAPI_EVENT to PAPI_FP_INS (for example). Refer to the 
   TAU User's Guide or PAPI Documentation for other event names.
   [ Ref : http://icl.cs.utk.edu/projects/papi/api/ ]
   
-jdk=<directory>
   Specifies the location of the Java 2 development kit (jdk1.2+). See
   README.JAVA for instructions on using TAU with Java 2 applications. 
   This option should only be used for configuring TAU to use JVMPI for 
   profiling and tracing of Java applications. It should not be used for 
   configuring paraprof, which uses java from the user's path. 

-dyninst=<directory>
   Specifies the location of the DynInst (dynamic instrumentation) package. 
   See README.DYNINST for instructions on using TAU with DynInstAPI to 
   instrument a binary at runtime (instead of manual instrumentation) or 
   prior to execution by rewriting it. 
   [ Ref: http://www.cs.umd.edu/projects/dyninstAPI/ ]

-mpiinc=<dir>
   
   Specifies the directory  where mpi header files reside (such as mpi.h and 
   mpif.h). This option also generates the TAU MPI wrapper library that 
   instruments MPI routines using the MPI Profiling Interface. See the 
   examples/NPB2.3/config/make.def file for its usage with Fortran and MPI 
   programs and examples/pi/Makefile for a C++ example that uses MPI. 
   
-mpilib=<dir>
   
   Specifies the directory where mpi library files reside. This option should 
   be used in conjunction with the -mpiinc=<dir> option to generate the TAU 
   MPI wrapper library. 

-mpilibrary=<lib>
   
   Specifies the use of a different MPI library. By default, TAU uses
   -lmpi or -lmpich as the MPI library. This option allows the user to specify
   another library. e.g., -mpilibrary=-lmpi_r  for specifying a thread-safe MPI 
   library.

-nocomm
   Allows the user to turn off tracking of messages (synchronous/asynchronous) in
   TAU's MPI wrapper interposition library. Entry and exit events for MPI routines 
   are still tracked. Affects both profiling and tracing.
   
-epilog=<dir>
   
   Specifies the directory where the EPILOG tracing package [FZJ] is installed.
   This option should be used in conjunction with the -TRACE option to generate
   binary EPILOG traces (instead of binary TAU traces). EPILOG traces can then
   be used with other tools such as EXPERT. EPILOG comes with its own 
   implementation of the MPI wrapper library and the POMP library used with 
   Opari. Using this option overrides TAU's libraries for MPI and OpenMP.

-pythoninc=<dir>
   
   Specifies the location of the Python include directory. This is the directory
   where Python.h header file is located. This option enables python bindings to 
   be generated. The user should set the environment variable PYTHONPATH to 
   <TAUROOT>/<ARCH>/lib/bindings-<options> to use a specific version of the TAU 
   Python bindings. By importing package pytau, a user can manually instrument the source
   code and use the TAU API. On the other hand, by importing tau and 
   using tau.run('<func>'), TAU can automatically generate instrumentation. See
   examples/python directory for further information.

-pythonlib=<dir>
   
   Specifies the location of the Python lib directory. This is the directory
   where *.py and *.pyc files (and config directory) are located. This option is 
   mandatory for IBM when Python bindings are used. For other systems, this option 
   may be omitted (but -pythoninc=<dir> still needs to be specified). 

-PROFILE 

   This is the default option; it specifies summary profile files to be 
   generated at the end of execution. Profiling generates aggregate statistics 
   (such as the total time spent in routines and statements), and can be used 
   in conjunction with the profile browser paraprof to analyse the performance. 
   Wallclock time is used for profiling  program entities. 
   
-PROFILECALLPATH 

   This option generates call path profiles, which show the time spent in a 
   routine when it is called by another routine in the calling path. "a => b"
   stands for the time spent in routine "b" when it is invoked by routine "a".
   This option is an extension of -PROFILE, the default profiling option. 
   By setting the TAU_CALLPATH_DEPTH environment variable, the user can vary 
   the depth of the callpath. See examples/calltree for further information.

-PROFILESTATS
   
   Specifies the calculation of additional statistics, such as the standard 
   deviation of the exclusive time/counts spent in each profiled block. This 
   option is an extension of -PROFILE, the default profiling option.
   
-COMPENSATE 
   
   Specifies online compensation of performance perturbation. When this 
   option is used, TAU computes its overhead and subtracts it from the 
   profiles. It can be used only when profiling is chosen. This option works
   with MULTIPLECOUNTERS as well, but while it is relevant for removing 
   perturbation with wallclock time, it cannot accurately account for 
   perturbation with hardware performance counts (e.g., L1 Data cache misses).
   See TAU Publication [Europar04] for further information on this option. 

-PROFILECOUNTERS
   
   Specifies use of hardware performance counters for profiling under IRIX  
   using the SGI R10000 perfex counter access interface. The use of this option 
   is deprecated in favor of the -pcl=<dir> and -papi=<dir> options described 
   above. 

-MULTIPLECOUNTERS
   
   Allows TAU to track more than one quantity (multiple hardware counters, CPU
   time, wallclock time, etc.) Configure with other options such as -papi=<dir>,
   -pcl=<dir>, -LINUXTIMERS, -SGITIMERS, -CRAYTIMERS, -CPUTIME, -PAPIVIRTUAL, 
   etc. See examples/multicounters/README file for detailed instructions on 
   setting the environment variables for this option. If -MULTIPLECOUNTERS is 
   used with the -TRACE option, tracing employs the COUNTER1 variable for 
   wallclock time. 
   
-SGITIMERS
   
   Specifies use of the free running nanosecond resolution on-chip timer on 
   the MIPS R10000. This timer has a lower overhead than the default timer on 
   SGI, and is recommended for SGIs. 

-CRAYTIMERS
   
   Specifies use of the free running nanosecond resolution on-chip timer on 
   the CRAY X1 cpu (accessed by the rtc() syscall). This timer has a 
   significantly lower overhead than the default timer on the X1, and is 
   recommended for profiling. Since this timer is not synchronized across 
   different cpus, this option should not be used with the -TRACE option for
   tracing a multi-cpu application, where a globally synchronized realtime 
   clock is required. 

-LINUXTIMERS
   Specifies the use of the free running nanosecond resolution time stamp 
   counter (TSC) on Pentium III+ and Itanium family of processors under Linux.
   This timer has a lower overhead than the default timer and is recommended.

-CPUTIME
   Uses usertime + system time instead of wallclock time. It gives the CPU
   time spent in the routines.  This currently works only on LINUX systems 
   for multi-threaded programs and on all systems for single-threaded programs. 
   
-PAPIWALLCLOCK
   Uses PAPI (must specify -papi=<dir> also) to access high resolution CPU 
   timers for wallclock time. The default case uses gettimeofday() which 
   has a higher overhead than this. 

-PAPIVIRTUAL
   Uses PAPI (must specify -papi=<dir> also) to access process virtual time.
   This represents the user time for measurements. 


-TRACE
   
   Generates event-trace logs, rather than summary profiles. Traces show when 
   and where an event occurred, in terms of the location in the source code and
   the process that executed it. Traces can be merged and converted using 
   tau_merge and tau_convert utilities respectively, and  visualized using 
   Vampir, a commercial trace visualization tool. [ Ref http://www.pallas.de ]

-muse
  
   Specifies the use of MAGNET/MUSE to extract low-level information from the
   kernel. To use this configuration, the Linux kernel has to be patched with 
   MAGNET, and MUSE has to be installed on the executing machine. Also, magnetd 
   has to be running with the appropriate handlers and filters installed. The 
   user can specify the package by setting the environment variable 
   TAU_MUSE_PACKAGE. By default, it uses the "count" package. Please refer to 
   README.MUSE for more information.
   
-noex
   
   Specifies that no exceptions be used while compiling the library. This is 
   relevant for C++. 
   
-useropt=<options-list>
   
   Specifies additional user options such as -g or -I. For multiple options, 
   the options list should be enclosed in single quotes.
   
-help
   
   Lists all the available configure options and quits. 

   Examples:

   % ./configure -c++=KCC 
   Use TAU with KCC
 
   % ./configure -c++=CC -useropt='-g -I/local/apps/STL/'
   Use TAU with SGI CC and add the above user defined options to the 
   commandline.

   % ./configure -TRACE -PROFILE 
   Enable both profiling and tracing.

   % ./configure -c++=KCC -SGITIMERS -tulipthread=/home/smarts/build/smarts-1.0
     -smarts -arch=sgin32 -prefix=/usr/local/packages/tau
   Use TAU with KCC and fast nanosecond timers on SGI and use SMARTS with -n32
   options and install the files in /usr/local/packages/tau

   % ./configure -c++=KCC -cc=cc -arch=sgi64 -mpiinc=/local/apps/mpich/include
     -mpilib=/local/apps/mpich/lib/IRIX64/ch_p4 -SGITIMERS -pdt=/local/apps/pdt
   Use TAU with KCC, and cc on 64 bit SGI systems and use MPI wrapper libraries
   with SGI's low cost timers and use PDT for automated source code 
   instrumentation.

   % ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi -openmp
     -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib
   Use OpenMP+MPI using KAI's Guide compiler suite and use PAPI for accessing
   hardware performance counters for measurements.
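
   Several options described above are steered at run time through 
   environment variables: PCL_EVENT, PAPI_EVENT and TAU_CALLPATH_DEPTH. The 
   following is a minimal sketch for Bourne-type shells (sh, ksh, bash); csh 
   users would use setenv instead. The depth value 3 is only an illustrative 
   assumption, not a recommended setting.

```shell
# Count floating point instructions instead of time. Set the variable that
# matches the counter library TAU was configured with.
PAPI_EVENT=PAPI_FP_INS;  export PAPI_EVENT    # TAU configured with -papi=<dir>
PCL_EVENT=PCL_FP_INSTR;  export PCL_EVENT     # TAU configured with -pcl=<dir>

# Call path depth for a library configured with -PROFILECALLPATH.
# The value 3 is an arbitrary example (assumption).
TAU_CALLPATH_DEPTH=3;    export TAU_CALLPATH_DEPTH
```

   The variables must be set in the shell that launches the instrumented 
   program, before the program starts.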

***********************************************************************
   To install *multiple* (typical) configurations of TAU at a site, you may use the 
   script 'installtau'. It takes options similar to those described above. It 
   invokes ./configure <opts>; make clean install;  to create multiple libraries that 
   may be requested by the users at a site. 
   % installtau -help

TAU Configuration Utility 
***********************************************************************
Usage: installtau [OPTIONS]
  where [OPTIONS] are:
-arch=<arch>  
-fortran=<compiler>  
-cc=<compiler>   
-c++=<compiler>   
-useropt=<options>  
-pdt=<pdtdir>  
-papi=<papidir>  
-mpiinc=<mpiincdir>  
-mpilib=<mpilibdir>  
-mpilibrary=<mpilibrary>  
-opari=<oparidir>  
***********************************************************************
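
   As a rough illustration of building several configurations in sequence, 
   the sketch below runs ./configure followed by make clean install once per 
   option set. It is not the installtau script itself. The three option sets 
   and the MPI paths are hypothetical examples, and the helper names run and 
   build_configs are introduced here for illustration. The script only prints 
   the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: build several TAU configurations one after another.
# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        # </dev/null keeps the command from consuming the option list below.
        "$@" </dev/null || exit 1
    fi
}

build_configs() {
    # One hypothetical option set per line; adjust the paths for your site.
    while read -r opts; do
        [ -z "$opts" ] && continue
        # $opts is intentionally unquoted so it splits into separate options.
        run ./configure $opts
        run make clean install
    done <<EOF
-pthread
-mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib
-TRACE -PROFILE
EOF
}

build_configs
```

   Run it from the TAU root directory. Each pass leaves its own library and 
   stub makefile in the <arch>/lib directory.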


2. Compilation.

   Type `make clean install' to compile the package. 

   Make installs the library and its stub makefile  in <prefix>/<arch>/lib 
   subdirectory and installs utilities such as pprof and paraprof in 
   <prefix>/<arch>/bin subdirectory.

   
   Add the <prefix>/<arch>/bin subdirectory to the path in your .cshrc file.
   e.g.,
   # in .cshrc file
   set path=($path /usr/local/packages/tau/sgi64/bin)
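
   For Bourne-type shells (sh, ksh, bash), a comparable setting can go in 
   .profile. This is a sketch that assumes the same installation prefix and 
   architecture as the csh example; TAU_BIN is a helper variable introduced 
   here for illustration.

```shell
# in .profile file
# Assumes TAU was installed under /usr/local/packages/tau for sgi64.
TAU_BIN=/usr/local/packages/tau/sgi64/bin
case ":$PATH:" in
    *":$TAU_BIN:"*) ;;                 # already on the path, nothing to do
    *) PATH="$PATH:$TAU_BIN" ;;        # append the TAU bin directory
esac
export PATH
```

   The case statement keeps the directory from being appended twice when the 
   file is sourced more than once.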

   See the examples included with this distribution in the examples/ directory.
   The README file in examples directory describes the examples. 

3. Instrumentation.
   JAVA requires no special instrumentation. To use TAU with JAVA, the 
   LD_LIBRARY_PATH environment variable must include the TAU <arch>/lib 
   directory. See README.JAVA for instructions regarding its usage.
   For other languages such as C++, C, and Fortran 90, TAU instrumentation in 
   the form of macros or routines must be added  to the source code to 
   identify routine transitions. It can be done automatically using the C++ 
   instrumentor - tau_instrumentor,  based on the Program Database Toolkit, or 
   manually using the instrumentation API (Application Programmers Interface). 
   The API is explained in detail in the documentation available at
   http://www.acl.lanl.gov/tau download page and can be seen in the examples 
   directory. This process involves identifying functions and associating each 
   function with one or more TAU profile groups. This enables selectively 
   profiling groups of functions. By default all instrumented functions that 
   are invoked are profiled.
   
   % cd examples/instrument
   % ./simple
   % pprof
   % paraprof

   To use tau_instrumentor, the C++ source code instrumentor: 
   a. Install pdtoolkit. [ Ref: http://www.acl.lanl.gov/pdtoolkit ] 
      % ./configure -arch=IRIX64 -KCC

   b. Install TAU using the -pdt configuration option.
      % ./configure -pdt=/usr/local/packages/pdtoolkit-1.0 -c++=KCC -arch=sgi64 

   c. Modify the makefile to invoke cxxparse from PDT which generates a 
      program database file (.pdb) that contains program  entities (such as 
      routine locations) and tau_instrumentor that uses the .pdb file and the 
      C++ source code to generate an instrumented version of the source code.  
      See examples/autoinstrument/Makefile. 
      
      % cd examples/autoinstrument; make
      % klargest 
      % pprof

   d. tau_reduce is a utility that can determine which routines should not
      be instrumented. Instrumentation in frequently called light-weight routines
      may introduce undue perturbation and distort the performance data. tau_reduce
      examines the profile output and a set of rules for de-instrumentation and 
      produces a selective instrumentation file that can be fed to tau_instrumentor
      or tau_run and specifies which routines should not be instrumented. To see an 
      example of this utility, see examples/reduce (examples/README file has a description).
      Also, utils/TAU_REDUCE.README file contains information about tau_reduce and the
      format for specifying the rules for removing instrumentation. 
      % cd examples/reduce
      % make 

   To illustrate the use of TAU Fortran 90 instrumentation API, we have 
   included the NAS Parallel Benchmarks 2.3 LU and SP suites in the 
   examples/NPB2.3 directory [Ref http://www.nas.nasa.gov/NAS/NPB/ ].
   See the config/make.def makefile that shows how TAU can be used with 
   MPI  (with the TAU MPI Wrapper library) and Fortran 90. To use this, TAU
   must be configured using the -mpiinc=<dir>  and -mpilib=<dir> options. The
   default Fortran 90 compiler used is f90. This may be changed by the user in
   the makefile. LU is completely instrumented and uses the instrumented MPI
   library whereas SP has minimal instrumentation in the top level routine
   and relies on the instrumented MPI wrapper library. 
 
4. Paraprof.

   Paraprof is the GUI for TAU performance analysis. It requires Java 1.2+. An
   earlier version of the profile browser, racy, was implemented using Tcl/Tk.
   It is also available in this distribution but support for racy will be 
   gradually phased out. Users are encouraged to use paraprof instead. Paraprof 
   does *not* require -jdk=<dir> option to be specified (which is used for 
   configuring TAU for analyzing Java applications). The 'java' jvm program 
   should be in the user's path.
   NOTE: If paraprof does not work properly, please rebuild Paraprof.jar file by
   % cd tau-xxx/tools/src/paraprof
   % make clean; make
   Before you do this, please ensure that javac (1.2+) is in your path. 

5. Performance Database: PerfDB

   PerfDB is a tool related to the TAU framework.  The PerfDB database is
   designed to store and provide access to TAU profile data.  A number of
   utility programs have been written in Java to load the data into PerfDB
   and to query the data.  With PerfDB, users can perform performance analyses 
   such as regression analysis, scalability analysis across multiple trials, 
   and so on.  An unlimited number of comparative analyses are available
   through the PerfDB toolkit.  Work is being done to provide the user
   with standard analysis tools, and an API has been developed to access the
   data with standard Java classes. For installation and usage instructions, 
   please refer to the tools/src/perfdb/doc/README file. 


6. TAU System Requirements :
   -------------------------
I) The Profiling Library needs a recent C++ compiler. Our recommended list:
	a) Kuck and Associates' (http://www.kai.com) KCC compiler
	b) KAI's KAP/Pro (http://www.kai.com) OpenMP guidec++ compiler
	c) SGI (http://www.sgi.com) MipsPro 7.2+ CC compiler 
	d) PGI (http://www.pgroup.com) 3.0 pgCC compiler for Linux
	e) GNU (http://www.gnu.org) gcc-2.95 g++ compiler
	f) IBM (http://www.ibm.com) xlC C++ compiler for IBM SP
	g) SUN (http://www.sun.com) Sun CC 5.0+ compiler
	h) HP (http://www.hp.com) Tru64 cxx 6.x compiler
	i) HP (http://www.hp.com) aCC compiler 
 
II) Platforms :
   TAU has been tested on 
	a) SGI IRIX 6.5 systems (Origin 2000) with KCC, CC, g++, guidec++.
	b) LINUX x86 PC clusters with 
		i)	KAI KCC compiler,
		ii)	GNU g++/egcs compiler,
		iii)	PGI pgCC, pgcc, pgf90 compiler suite,
		iv)	Fujitsu C++/f90 compiler suite,
		v)	KAI KAP/Pro compiler suite,
		vi)	Intel C++/C/F90 compiler suite,
		vii)	NAGWare F90 compilers,
		viii)	Lahey F90 compilers,
		ix)	Absoft F90 compilers.
	c) Sun Solaris2 with g++, KCC. 
	d) HP PA-RISC systems running HP-UX with g++, and aCC. 
	e) Cray T3E with Cray C++ compiler, and KAI KCC.
	f) HP Tru64 Alpha with g++, cxx.
	g) HP Alpha Linux clusters with g++.
	h) Microsoft Windows. Tested with MS Visual C++ v6.0.
	i) IBM SP AIX (RS6000) systems with KCC, and xlC compilers.
	j) PowerPC Linux with g++.
	k) IA-64 Linux with g++, SGI Pro64 and Intel C++/C/F90 compilers.
	l) Apple OS X (Darwin) with c++, IBM xlf compilers.
	m) Hitachi SR8000 with KCC, g++, Hitachi cc and f90 compilers. 
	n) NEC SX-5 system with NEC c++, cc, and f90 compilers.
	o) Cray SV-1 with Cray compilers.
	p) Cray X1 with Cray compilers.
	q) AMD Opteron and IA-64 Linux systems with GNU compilers.
	   

   TAU may work with minor modifications on other platforms.
	
III) Software Requirements :
   a) java
   paraprof requires Java 1.2+. Java can be downloaded from http://www.sun.com

   b) Tcl/Tk
   TAU's GUI racy needs Tcl 7.4/Tk 4.0 or better. The default is 8.0. 
   Tcl/Tk can be downloaded from http://www.scriptics.com. 
   NOTE: Tcl/Tk is only required for running the profile browser racy. The
   current version of TAU supports the new Java based paraprof profile browser that
   replaces the Tcl/Tk based racy. 
    
7. Modifying user's Makefile for Tracing/Profiling.

   TAU provides a makefile stub file which is placed in the installation
   directory <prefix>/<arch>/lib/Makefile.tau[-optionlist]. Users need to 
   include this makefile and use the make variables TAU_INCLUDE, TAU_LIBS 
   and TAU_DEFS appropriately in their makefiles. See 
   examples/instrument/Makefile.
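
   One possible shape for such a makefile is sketched here as a small shell 
   script that writes it out. The stub path, the output name 
   Makefile.example and the source file simple.cpp are hypothetical; only 
   the variables TAU_INCLUDE, TAU_DEFS and TAU_LIBS are taken from the stub 
   makefile, and CXX is whatever make or your environment provides. Treat it 
   as a starting point and compare it with the makefiles in the examples 
   directory.

```shell
#!/bin/sh
# Sketch: write a minimal makefile that includes the TAU stub makefile.
# TAU_MAKEFILE is a hypothetical path; use the stub your configure run installed.
TAU_MAKEFILE=${TAU_MAKEFILE:-/usr/local/packages/tau/sgi64/lib/Makefile.tau}
OUT=${OUT:-Makefile.example}

# printf emits the literal tab that make requires before each recipe line.
{
    printf 'include %s\n\n' "$TAU_MAKEFILE"
    printf 'CXXFLAGS = $(TAU_INCLUDE) $(TAU_DEFS)\n'
    printf 'LIBS     = $(TAU_LIBS)\n\n'
    printf 'simple: simple.o\n'
    printf '\t$(CXX) -o $@ simple.o $(LIBS)\n\n'
    printf 'simple.o: simple.cpp\n'
    printf '\t$(CXX) $(CXXFLAGS) -c simple.cpp\n'
} > "$OUT"
```

   TAU_INCLUDE and TAU_DEFS go on the compile line and TAU_LIBS on the link 
   line, so the instrumented sources see the TAU headers and the executable 
   links against the TAU library.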

8. Examples of configuration and usage on the IBM SP
        
     % cd tau-2.x
     Example I:
     Profiling a Multithreaded C++ program (compiled with xlC)
     
     % configure -pthread
     % make clean; make install
     % set path=($path <TAU DIRECTORY>/rs6000/bin)
     % cd examples/threads
     % make; 
     % hello
     
       It has two threads: the profiling data should show functions executing on
       each thread
     % pprof
       This is the text based profile browser.
     % paraprof  
     
     Example II:
     Profiling an MPI program using the TAU MPI wrapper library.
     
     % configure -mpiinc=/usr/lpp/ppe.poe/include -mpilib=/usr/lpp/ppe.poe/lib
     % make clean; make install
     % cd examples/pi
     % make 
     % poe cpi -procs 4 -rmpool 2
     % pprof or paraprof
       Note: Using the MPI Profiling Interface TAU can generate profile data for 
       all MPI routines as well.
     
     Example III:
     Profiling an application written in C++ (compiled with KCC) using automatic 
     source code instrumentation and using CPU time instead of (the default) 
     wallclock time.
     [ For KCC you'll need % module load KCC]
     Download PDT (Program Database Toolkit) from http://www.acl.lanl.gov/pdtoolkit
     
     % cd pdtoolkit-1.x
     % configure 
     % make ; make install
       This takes a while...
     
     Next configure TAU to use PDT for automatic source code instrumentation.
     % cd tau-2.x
     % configure -c++=KCC -cc=cc -pdt=<pdtoolkit-1.x root directory> -CPUTIME
     		e.g.,   ... -pdt=/u1/sameer/pdtoolkit-1.3 ...
     % make clean; make install
     % cd examples/autoinstrument
     % make 
       This takes klargest.cpp, an uninstrumented file, parses it (PDT), and 
       invokes tau_instrumentor, which takes the PDT output and generates an 
       instrumented C++ file, which when linked with the TAU library, generates
       performance data when executed.
     % klargest
     % pprof
     % paraprof
     
     Example IV:
     Tracing an MPI program (compiled with KCC) and displaying the traces in 
     Vampir.
     
     % configure -c++=KCC -cc=cc -mpiinc=/usr/lpp/ppe.poe/include 
       	  -mpilib=/usr/lpp/ppe.poe/lib -TRACE
     % make clean; make install
     % cd examples/pi
     % make CXX=mpKCC
     % poe cpi -procs 4 -rmpool 2 2000
       Calculate the value of pi using 2000 iterations. 
     
     % tau_merge tautrace.*.trc cpi.trc
     % tau_convert -vampir cpi.trc tau.edf cpi.pv
     
     % vampir cpi.pv 
     
     In the Menu, choose Preferences -> Color Styles -> Activities and choose a 
     distinct color for each activity. 
     
     Example V:
     Profiling an OpenMPI (OpenMP + MPI) C program using xlC.
     % configure -openmp -mpiinc=/usr/lpp/ppe.poe/include 
         -mpilib=/usr/lpp/ppe.poe/lib  
     % cd examples/openmpi
     % make CXX=mpCC_r CC=mpcc_r
     % setenv OMP_NUM_THREADS 2
     % poe stommel -procs 2 -rmpool 2 
     % pprof
