Automatic parallelization tool
An automatic parallelization tool is a computer program that aids in the automatic parallelization of existing sequential code, converting it into parallel code. It aims to facilitate the re-use of already written software with the performance benefits of parallelization, saving the need to rewrite the software entirely.
In the past, parallel hardware was only implemented in high-end machines or by means of distributed computing, but with the advent of GPUs and multi-core CPUs in consumer devices it has become widespread in low-end computers as well. Hence, it has become desirable to automate the process of converting older, single-threaded applications to take advantage of parallel hardware. Furthermore, the existence of automatic parallelization tools can enable programmers to focus on writing applications in a single-threaded manner while still benefiting from parallelization. Some caveats in the conversion include handling issues such as synchronization and deadlock avoidance, which do not arise in single-threaded computation.
Need for automatic parallelization
Past techniques provided solutions for languages such as FORTRAN and C; however, they were not sufficient. They targeted specific kinds of program sections, such as loops or other particular pieces of code. Identifying opportunities for parallelization is a critical step in generating a multithreaded application. The need to parallelize applications is partially addressed by tools that analyze code to exploit parallelism, using either compile-time or run-time techniques. Some parallelizing compilers have these techniques built in, but the user must identify the code to be parallelized and mark it with special language constructs; the compiler then recognizes these constructs and analyzes the marked code for parallelization. Other tools parallelize only a special form of code, such as loops. Hence, a fully automatic tool for converting sequential code to parallel code is required.
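As an illustration of such user-directed constructs, the sketch below marks a loop with an OpenMP pragma so that a parallelizing compiler generates the threading code; this is a minimal generic example, not tied to any particular tool:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double a[1000], b[1000];
        for (int i = 0; i < 1000; i++) b[i] = i;

        /* The programmer marks the loop; the compiler generates the threads. */
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            a[i] = 2.0 * b[i];

        printf("%f\n", a[999]);
        return 0;
    }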
General procedure of parallelization
1. The process starts with identifying code sections that the programmer feels have parallelism possibilities. Often this task is difficult because the programmer who wants to parallelize the code did not originally write it, or is new to the application domain. Thus, although this first stage in the parallelization process seems easy at first, it may not be so.
2. The next stage is to shortlist the identified code sections that are actually parallelizable. This stage is again very important and difficult, since it involves a lot of analysis; code in C/C++ that involves pointers is particularly hard to analyze. Special techniques such as pointer alias analysis and function side-effect analysis are required to conclude whether a section of code depends on any other code. The more dependencies in the identified code sections, the smaller the possibilities of parallelization.
3. The next stage is removing dependencies, where possible, by changing the code. The code is transformed such that the functionality, and hence the output, is not changed but the dependency, if any, on another code section or instruction is removed.
4. The last stage in parallelization is generating the parallel code. This code is always functionally equivalent to the original sequential code but contains additional constructs or code sections which, when executed, create multiple threads or processes, as in the sketch after this list.
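For example, a sum accumulation creates a cross-iteration dependency (step 2); the dependency can be removed by giving each thread a private partial sum that is combined at the end (step 3), which can be expressed with an OpenMP reduction clause in the generated code (step 4). A minimal sketch, not taken from any particular tool:

    #include <omp.h>

    double dot(const double *x, const double *y, int n) {
        double sum = 0.0;          /* every iteration writes sum: a dependency */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];    /* each thread keeps a private partial sum */
        return sum;                /* partial sums are combined on exit */
    }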
Automatic parallelization techniques
See also the main article: automatic parallelization.
Scan
This is the first stage, in which the scanner reads the input source files to identify all static and extern usages. Each line in the file is checked against pre-defined patterns to segregate it into tokens. These tokens are stored in a file which is later used by the grammar engine. The grammar engine checks patterns of tokens that match pre-defined rules to identify variables, loops, control statements, functions etc. in the code.
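A minimal sketch of such pattern-based scanning; the token classes and patterns below are invented for illustration and do not correspond to any particular tool:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Classify one whitespace-separated token against simple patterns. */
    static const char *classify(const char *tok) {
        if (strcmp(tok, "for") == 0 || strcmp(tok, "while") == 0)
            return "LOOP_KEYWORD";
        if (strcmp(tok, "static") == 0 || strcmp(tok, "extern") == 0)
            return "STORAGE_CLASS";
        if (isdigit((unsigned char)tok[0]))
            return "NUMBER";
        if (isalpha((unsigned char)tok[0]) || tok[0] == '_')
            return "IDENTIFIER";
        return "SYMBOL";
    }

    int main(void) {
        char line[] = "static int i = 0";
        for (char *tok = strtok(line, " "); tok; tok = strtok(NULL, " "))
            printf("%-12s %s\n", classify(tok), tok);
        return 0;
    }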
Analyze
The analyzer is used to identify sections of code that can be executed concurrently. It uses the static data information provided by the scanner-parser. The analyzer first finds all the functions that are totally independent of each other and marks them as individual tasks. It then finds which tasks have dependencies on one another.
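The following sketch illustrates the underlying idea with invented task descriptors: two tasks conflict when one writes data the other reads or writes; otherwise they can be marked as independent:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical task descriptor: variables a task reads and writes,
       encoded as strings of single-letter names for this sketch. */
    typedef struct { const char *name, *reads, *writes; } Task;

    static bool overlap(const char *s, const char *t) {
        for (; *s; s++)
            if (strchr(t, *s)) return true;
        return false;
    }

    /* Tasks conflict if one writes something the other reads or writes. */
    static bool dependent(Task a, Task b) {
        return overlap(a.writes, b.reads) || overlap(b.writes, a.reads)
            || overlap(a.writes, b.writes);
    }

    int main(void) {
        Task f = {"f", "x", "y"};   /* y = f(x)                      */
        Task g = {"g", "y", "z"};   /* z = g(y): reads what f writes */
        Task h = {"h", "a", "b"};   /* touches neither x nor y       */
        printf("f-g dependent: %d\n", dependent(f, g)); /* prints 1 */
        printf("f-h dependent: %d\n", dependent(f, h)); /* prints 0 */
        return 0;
    }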
Schedule
The scheduler lists all the tasks and their dependencies on each other in terms of execution and start times. The scheduler produces a schedule that is optimal in terms of the number of processors to be used or the total execution time of the application.
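A toy sketch of the load-balancing part of such scheduling, assigning each independent task to the core that becomes free first; the task names and durations are invented:

    #include <stdio.h>

    #define NTASKS 4
    #define NCORES 2

    int main(void) {
        /* Invented independent tasks with estimated execution times. */
        const char *name[NTASKS] = {"A", "B", "C", "D"};
        const int   cost[NTASKS] = { 3,   2,   2,   1 };
        int free_at[NCORES] = {0};          /* time each core becomes free */

        for (int t = 0; t < NTASKS; t++) {
            int core = 0;                   /* pick the earliest-free core */
            for (int c = 1; c < NCORES; c++)
                if (free_at[c] < free_at[core]) core = c;
            printf("task %s -> core %d, start %d, end %d\n",
                   name[t], core, free_at[core], free_at[core] + cost[t]);
            free_at[core] += cost[t];
        }
        return 0;
    }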
Code Generation
The scheduler generates a list of all the tasks and the details of the cores on which they will execute, along with the time for which they will execute. The code generator then inserts special constructs in the code that are read during execution by the scheduler. These constructs instruct the scheduler about the core on which a particular task will execute, along with its start and end times.
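For instance, the inserted constructs might create one thread per independent task, in the spirit of the pthreads sketch below (an invented example, not the output of any specific generator):

    #include <pthread.h>
    #include <stdio.h>

    /* Two tasks the analyzer found to be independent. */
    static void *task_a(void *arg) { (void)arg; puts("task A"); return NULL; }
    static void *task_b(void *arg) { (void)arg; puts("task B"); return NULL; }

    int main(void) {
        pthread_t ta, tb;
        /* Constructs inserted by the code generator: spawn and join tasks. */
        pthread_create(&ta, NULL, task_a, NULL);
        pthread_create(&tb, NULL, task_b, NULL);
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        return 0;
    }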
Parallelization tools
There are a number of automatic parallelization tools for Fortran, C, C++, and several other languages.
YUCCA
YUCCA is a sequential-to-parallel automatic code conversion tool developed by KPIT Technologies Ltd., Pune. It takes as input C source code, which may comprise multiple source and header files, and gives as output transformed multi-threaded parallel code using pthreads functions and OpenMP constructs. The YUCCA tool performs task and loop level parallelization.
Par4All
Par4All is an automatic parallelizing and optimizing compiler for C and Fortran sequential programs. The purpose of this source-to-source compiler is to adapt existing applications to various hardware targets such as multicore systems, high-performance computers and GPUs. It creates a new source code and thus allows the original source code of the application to remain unchanged.
Cetus
Cetus is a compiler infrastructure for the source-to-source transformation of software programs, developed at Purdue University. Cetus is written in Java and provides a basic infrastructure for writing automatic parallelization tools or compilers. The basic parallelizing techniques Cetus currently implements are privatization, reduction variable recognition and induction variable substitution. A new graphical user interface was added in February 2013, and speedup calculation and graph display were added in May 2013. A Cetus remote server in a client–server model was also added in May 2013, letting users optionally transform C code through the server; this is especially useful when users run Cetus on a non-Linux platform. An experimental HUBzero version of Cetus was also implemented in May 2013, through which users can run Cetus in a web browser.
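As an illustration of one of these techniques, induction variable substitution replaces a variable incremented on every iteration with a closed-form expression of the loop counter, removing the cross-iteration dependency (a generic sketch, not Cetus output):

    /* Before: j carries a dependency from one iteration to the next. */
    void before(double *a, int n) {
        int j = 0;
        for (int i = 0; i < n; i++) {
            j = j + 2;
            a[i] = j;
        }
    }

    /* After substitution: each iteration is independent and parallelizable. */
    void after(double *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = 2 * (i + 1);   /* closed form of j */
    }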
PLUTO
PLUTO is an automatic parallelization tool based on the polyhedral model. The polyhedral model for compiler optimization is a representation of programs that makes it convenient to perform high-level transformations such as loop nest optimizations and loop parallelization. Pluto transforms C programs from source to source for coarse-grained parallelism and data locality simultaneously. The core transformation framework mainly works by finding affine transformations for efficient tiling and fusion, but is not limited to those. OpenMP parallel code for multicores can be automatically generated from sequential C program sections.
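To give a flavor of such transformations, loop tiling restructures a loop nest into blocks that fit in cache, which also exposes coarse-grained parallelism across tiles; a hand-written sketch of the idea, not actual PLUTO output:

    #define N 1024
    #define TILE 64

    /* Tiled version of a transpose: the data is touched tile by tile for
       locality, and the outer tile loops could be spread across cores. */
    void tiled(double a[N][N], double b[N][N]) {
        for (int ii = 0; ii < N; ii += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int j = jj; j < jj + TILE; j++)
                        a[i][j] = b[j][i];
    }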
Polaris compiler
The Polaris compiler takes a Fortran77 program as input, transforms the program so that it runs efficiently on a parallel computer, and outputs this program version in one of several possible parallel FORTRAN dialects. Polaris performs its transformations in several "compilation passes". In addition to many commonly known passes, Polaris includes advanced capabilities performing the following tasks: array privatization, data dependence testing, induction variable recognition, interprocedural analysis, and symbolic program analysis.
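As an example of one such pass, array privatization gives each thread its own copy of a temporary array that would otherwise be shared scratch storage, removing a false dependency; a generic sketch using an OpenMP private clause:

    /* tmp is only used as scratch space within one iteration, so each
       thread can safely get a private copy instead of sharing it. */
    void smooth(double *out, const double *in, int n) {
        double tmp[3];
        #pragma omp parallel for private(tmp)
        for (int i = 1; i < n - 1; i++) {
            tmp[0] = in[i - 1];
            tmp[1] = in[i];
            tmp[2] = in[i + 1];
            out[i] = (tmp[0] + tmp[1] + tmp[2]) / 3.0;
        }
    }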
Intel C++ compiler
The auto-parallelization feature of the Intel C++ Compiler automatically translates serial portions of the input program into semantically equivalent multi-threaded code. Automatic parallelization determines the loops that are good work-sharing candidates, performs the data-flow analysis to verify correct parallel execution, and partitions the data for threaded code generation as is needed in programming with OpenMP directives. The OpenMP and auto-parallelization features provide the performance gains of shared memory on multiprocessor systems.
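For instance, a loop over disjoint data with no cross-iteration dependencies is the kind of work-sharing candidate such analysis looks for; the restrict qualifiers below let the data-flow analysis prove the iterations independent (a generic example, not taken from Intel's documentation):

    /* restrict asserts dst and src do not alias, so the compiler's
       data-flow analysis can verify correct parallel execution. */
    void scale(float *restrict dst, const float *restrict src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i];
    }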