Disclosed is a method for mapping heavily communicating Message Passing Interface (MPI) processes to the same node on a cluster of single-core or multi-core Symmetric Multiprocessors (SMPs) in order to reduce communication overhead. Within a node, the method also maps the MPI processes to cores that are close to each other, which reduces communication overhead further.
A Method for Message Passing Interface (MPI) Process Mapping to Minimize the Communication Latency on Symmetric Multiprocessor (SMP) and Multicore Architectures
A method is disclosed for mapping heavily communicating Message Passing Interface (MPI) processes to the same node on a cluster of single-core or multi-core Symmetric Multiprocessors (SMPs) in order to reduce communication overhead. Within a node, the method also maps the MPI processes to cores that are close to each other, which reduces communication overhead further.
The method disclosed herein takes a compiler-based communication analysis technique, which maps MPI processes at the inter-node level, and extends it to the intra-node level. In addition to assigning MPI processes to nodes, the method therefore maps the processes to individual cores on each node.
Consider an example in which 16 processes are to be launched on 2 nodes, each node being a two-way quad-core system (8 cores per node). Using the compiler-based communication analysis technique, the mapping at the inter-node level is determined to be as follows:
Node1: 0, 2, 4, 6, 8, 10, 12, 14
Node2: 1, 3, 5, 7, 9, 11, 13, 15
where the numbers 0, 1, 2, …, 15 are the ranks of the MPI processes.
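The effect of such an inter-node mapping can be illustrated with a minimal sketch. The communication pattern used here is purely hypothetical (each rank r sends one unit of data to rank (r + 2) mod 16); the disclosure does not state the actual pattern found by the compiler analysis. The helper name inter_node_volume and the block mapping used for comparison are likewise assumptions introduced only for illustration.

```python
# Inter-node mapping from the example above (node name -> MPI ranks).
node_to_ranks = {
    "Node1": [0, 2, 4, 6, 8, 10, 12, 14],
    "Node2": [1, 3, 5, 7, 9, 11, 13, 15],
}
# Invert the mapping so each rank can be looked up directly.
rank_to_node = {r: n for n, ranks in node_to_ranks.items() for r in ranks}


def inter_node_volume(comm, placement):
    """Sum the message volume of rank pairs placed on different nodes."""
    return sum(v for (a, b), v in comm.items() if placement[a] != placement[b])


# Hypothetical pattern (NOT from the source): rank r sends 1 unit to (r + 2) % 16.
comm = {(r, (r + 2) % 16): 1 for r in range(16)}

# Naive block mapping for comparison: ranks 0-7 on Node1, ranks 8-15 on Node2.
block_map = {r: ("Node1" if r < 8 else "Node2") for r in range(16)}

cut_analysis = inter_node_volume(comm, rank_to_node)
cut_block = inter_node_volume(comm, block_map)
print(cut_analysis, cut_block)  # prints: 0 4
```

Under this assumed pattern, the even/odd mapping keeps every communicating pair on the same node, whereas the block mapping sends four messages across the network.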
To find the cores that are close to each other on a node, two copies of a simple MPI latency-measuring application are executed on different combinations of cores. Based on the measured latencies, it is found that on Node1 cores 0, 1, 2, 3 are close to each other and cores 4, 5, 6, 7 are close to each other; these groups are represented as (0,1,2,3) and (4,5,6,7).
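One way to turn pairwise latency measurements into core groups is sketched below. It is only an illustration under stated assumptions: the latency values (0.5 and 1.2 microseconds) and the 1.0 microsecond threshold are made up, and the function name group_close_cores is hypothetical. The disclosure does not specify how the groups are derived from the measurements; this sketch simply joins any two cores whose measured latency falls below the threshold.

```python
from itertools import combinations


def group_close_cores(latency_us, threshold_us):
    """Group cores linked by pairwise latencies below threshold_us.

    latency_us maps a core pair (a, b) to its measured latency.
    Groups are the connected components of the "close" relation,
    computed with a small union-find structure.
    """
    cores = sorted({c for pair in latency_us for c in pair})
    parent = {c: c for c in cores}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), lat in latency_us.items():
        if lat < threshold_us:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the lowest-numbered core as the representative.
                parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for c in cores:
        groups.setdefault(find(c), []).append(c)
    return [tuple(g) for g in sorted(groups.values())]


# Made-up latencies: 0.5 us within a quad-core package, 1.2 us across packages.
latency_us = {
    (a, b): (0.5 if a // 4 == b // 4 else 1.2)
    for a, b in combinations(range(8), 2)
}
groups = group_close_cores(latency_us, threshold_us=1.0)
print(groups)  # prints: [(0, 1, 2, 3), (4, 5, 6, 7)]
```

With real measurements, the threshold would have to be chosen from the observed latency distribution, for example at the gap between the intra-package and inter-package clusters.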
Thereafter, a graph partitioning algorithm is applied at the intra-node level to determine he...