1/28/2009

Programming Windows HPC Server - Using the MPI Model

Conventionally, HPC/parallel problems can be roughly divided into the following two categories [ref]:

- Data Parallel: these applications divide the input data into a number of completely independent parts. The same computation is performed on each part, and some kind of post-processing is usually needed after the computations.

- Task Parallel: these are jobs whose functionality can be divided into many small tasks, each of which can be executed on one CPU core. The tasks may or may not need to communicate with each other.
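
To make the data-parallel case concrete, here is a minimal sketch of a block partition: the input is cut into independent ranges, the same computation runs on each range, and a small post-processing step combines the results. The names chunkFor, partialSum and combine are hypothetical helpers invented for this illustration, and the "computation" is assumed to be a simple sum.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Half-open index range [first, second) of the input owned by one task.
typedef std::pair<std::size_t, std::size_t> Range;

// Block partition: split n items among nProc tasks as evenly as possible.
// The first (n % nProc) ranks receive one extra item each.
Range chunkFor(std::size_t rank, std::size_t nProc, std::size_t n)
{
    assert(nProc > 0 && rank < nProc);
    const std::size_t base = n / nProc;
    const std::size_t extra = n % nProc;
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t size = base + (rank < extra ? 1 : 0);
    return Range(begin, begin + size);
}

// The "same computation on each part": here, the sum of one chunk.
long long partialSum(const std::vector<int>& data, Range r)
{
    return std::accumulate(data.begin() + r.first, data.begin() + r.second, 0LL);
}

// The post-processing step: combine the per-task results.
long long combine(const std::vector<long long>& partials)
{
    return std::accumulate(partials.begin(), partials.end(), 0LL);
}
```

With 10 items and 4 tasks, chunkFor gives ranks 0 to 3 the ranges [0,3), [3,6), [6,8) and [8,10). In a real cluster job, each task would call partialSum on its own range only, and one task would gather the partial results and call combine.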

Using another taxonomy (orthogonal to data/task parallel), parallel problems can be divided into:

- Embarrassingly Parallel: for these applications, little or no effort is required to separate the problem into a number of small tasks, each of which runs on one CPU core. The tasks do not need to cooperate, and little or no post-processing is needed.

- Dependent Parallel: these are problems in which there are dependencies among the tasks, so the tasks must communicate. Communication can be accomplished by sharing variables (on shared-memory architectures) or by passing (sending/receiving) messages (typically on distributed-memory architectures).

The de facto standard interface for the message-passing model is MPI, which is the focus of this article.

[Figure: Windows HPC Server network topology (from Microsoft)]

MPI is only an API standard; there are various implementations (see the reference section). The programming examples in this article use MS-MPI on Windows HPC Server.

Part I - Environment for MPI Programming on Windows HPC Server

To begin with, you should have the following environment:
- A Windows HPC cluster (1 head node, N compute nodes)
- An HPC application development machine

The HPC application development machine should have the following software installed:
- Visual Studio 2005/2008
- HPC Pack 2008 Client Utilities
- HPC Pack 2008 SDK

Then you should configure your VS environment:
1. Set the MPI include directory: VS->Tools->Options->Projects and Solutions->VC++ Directories. For each platform (Win32/x64), choose "Include files" in the "Show directories for" dropdown list and add "$(hpc pack 2008 sdk)\include", where $(hpc pack 2008 sdk) is the directory in which your HPC Pack 2008 SDK is installed.
2. Set the MPI library directory: in the same dialog, for each platform (Win32/x64), choose "Library files" in the "Show directories for" dropdown list and add "$(hpc pack 2008 sdk)\Lib\i386" (Win32) or "$(hpc pack 2008 sdk)\Lib\amd64" (x64).
3. Link against the MPI import library: in each project's Properties->Linker->Input->Additional Dependencies, add msmpi.lib. The example in Part II also calls gethostname, so add Ws2_32.lib as well.

Part II - Programming Using MPI APIs in Visual Studio

For those who are new to MPI, here is a basic introduction; for more information, see [4][5][6]:
1. An MPI application consists of many processes, each of which is called a task in MPI. Each task has a unique identifier from 0 to N - 1, called its rank in MPI vocabulary. Ranks are used to identify the source and target of a message.
2. MPI is message-oriented, not connection/stream-oriented.
3. MPI uses a tag to identify the message type.
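
The three points above can be illustrated with a toy, in-process model of how a receive is matched against waiting messages. This is not MS-MPI code: Mailbox, Message, ANY_SOURCE and ANY_TAG are hypothetical names invented for this sketch, and a real MPI_Recv blocks until a matching message arrives instead of returning false.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy stand-ins for MPI's wildcards (hypothetical, not the MPI constants).
const int ANY_SOURCE = -1;
const int ANY_TAG = -1;

// One whole message: delivery is per message, not as a byte stream.
struct Message
{
    int source;        // rank of the sender
    int tag;           // application-defined message type
    std::string body;  // payload
};

// Hypothetical mailbox holding the messages sent to one receiving rank.
class Mailbox
{
public:
    void deliver(int source, int tag, const std::string& body)
    {
        assert(source >= 0 && tag >= 0);  // senders name a concrete rank and tag
        Message m = { source, tag, body };
        queue_.push_back(m);
    }

    // Remove and return the oldest message matching (source, tag).
    // Returns false when nothing matches.
    bool receive(int source, int tag, Message& out)
    {
        for (std::size_t i = 0; i < queue_.size(); ++i)
        {
            const Message& m = queue_[i];
            const bool sourceOk = (source == ANY_SOURCE || source == m.source);
            const bool tagOk = (tag == ANY_TAG || tag == m.tag);
            if (sourceOk && tagOk)
            {
                out = m;  // copy before erasing the queue entry
                queue_.erase(queue_.begin() + i);
                return true;
            }
        }
        return false;
    }

private:
    std::deque<Message> queue_;
};
```

If ranks 2 and 1 have each delivered a message with tag 7, receive(1, 7, m) picks the message from rank 1 even though the one from rank 2 arrived first, while receive(ANY_SOURCE, ANY_TAG, m) takes whichever message is oldest.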

Now let's write an MPI application that reports where it is running:
1. Create a new empty Win32 console application in VS.
2. Add a new C++ source file named MpiHello.cxx with the following content:
MPI Hello Source Code

#include <mpi.h>
#include <windows.h>   // MAX_PATH
#include <stdio.h>
#include <stdlib.h>
#include <string.h>    // strlen

// gethostname is a Winsock function; link with Ws2_32.lib.

int main(int argc, char** argv)
{
    int nProc;              // number of tasks in MPI_COMM_WORLD
    int nThisRank;          // rank of this task, 0 .. nProc - 1
    char host[MAX_PATH];
    char msg[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nProc);
    MPI_Comm_rank(MPI_COMM_WORLD, &nThisRank);

    gethostname(host, sizeof(host) / sizeof(host[0]));

    if (nThisRank == 0)
    {
        // Rank 0 is the master: it prints its own host, then one
        // message from each of the other nProc - 1 tasks.
        printf("Master Process is running on host[%s].\n", host);

        char rcvMsg[1024];
        MPI_Status status;
        for (int i = 1; i < nProc; ++i)
        {
            // Accept messages from any worker, in arrival order.
            MPI_Recv(rcvMsg,
                sizeof(rcvMsg),
                MPI_CHAR,
                MPI_ANY_SOURCE,
                MPI_ANY_TAG,
                MPI_COMM_WORLD,
                &status);
            printf("%s\n", rcvMsg);
        }
    }
    else
    {
        // Every other rank is a worker: it reports its host to rank 0.
        sprintf_s(msg,
            sizeof(msg),
            "Worker Process [%d] of [%d] is running on host [%s].",
            nThisRank,
            nProc,
            host);

        // Send the string including its terminating '\0' to rank 0 with tag 0.
        MPI_Send(msg,
            (int)strlen(msg) + 1,
            MPI_CHAR,
            0,
            0,
            MPI_COMM_WORLD);
    }

    MPI_Finalize();

    return 0;
}

3. Some explanation:
- To use the MPI APIs, include the mpi.h header file.
- Each MPI task should start with MPI_Init() and end with MPI_Finalize().
- MPI_Comm_size() returns the number of tasks in the application.
- MPI_Comm_rank() returns the rank of the calling task.
- Rank 0 receives with MPI_ANY_SOURCE, so the worker messages are printed in arrival order, not in rank order.

Now build your application and use the MPI tool mpiexec to run it with 4 tasks:
mpiexec -n 4 MpiHello.exe

The console output will look like this (the worker lines may appear in any order):
Master Process is running on host[hpc-01].
Worker Process [2] of [4] is running on host [hpc-01].
Worker Process [3] of [4] is running on host [hpc-01].
Worker Process [1] of [4] is running on host [hpc-01].
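
The worker lines above are out of rank order because rank 0 prints each message as it arrives. If deterministic output matters, one option is to buffer the messages with their sender rank and print them sorted. The sketch below shows only that sorting step; Arrival and inRankOrder are hypothetical names, and it is assumed that the sender rank was read from the MPI_SOURCE field of the MPI_Status filled in by MPI_Recv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (sender rank, message text), stored in the order rank 0 received them.
typedef std::pair<int, std::string> Arrival;

bool bySourceRank(const Arrival& a, const Arrival& b)
{
    return a.first < b.first;
}

// Return the message texts ordered by sender rank.
std::vector<std::string> inRankOrder(std::vector<Arrival> arrivals)
{
    std::stable_sort(arrivals.begin(), arrivals.end(), bySourceRank);
    std::vector<std::string> lines;
    for (std::size_t i = 0; i < arrivals.size(); ++i)
    {
        assert(arrivals[i].first >= 1);  // rank 0 only receives in this program
        lines.push_back(arrivals[i].second);
    }
    return lines;
}
```

A simpler alternative is to pass the loop index i instead of MPI_ANY_SOURCE to MPI_Recv, which forces rank order at the cost of waiting for each rank in turn.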

Part III - Deploy and Run an MPI Application on a Windows HPC Cluster

1. Deploy
- Copy MpiHello.exe to \\your_head_node\App\
(You may instead copy the application binary, and any extra input data files, to each compute node's local disk. Deployment is almost entirely a matter of copying code and data files.)

2. Submit a Windows HPC job
- In a Cmd.exe shell, change to the $(Hpc Pack 2008)\Bin directory.
- Run the following command:
Job.exe submit /scheduler:your_head_node /jobname:MpiHello /numprocessors:6-6 /workdir:\\your_head_node\users\your_name /stdout:_OUT.txt /user:your_domain\your_name mpiexec.exe \\your_head_node\app\MpiHello.exe

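On return, the job ID is displayed in the console window. The /numprocessors:6-6 option requests a minimum and a maximum of 6 processors, which is why the output below shows 6 tasks.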
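
Most of the submit command is site-specific, so it can help to see which parts vary. The sketch below assembles the same command line from a handful of parameters. SubmitParams and buildSubmitCommand are hypothetical helpers, and only the options that appear in the command above are used.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Site-specific values for the submit command (all placeholders).
struct SubmitParams
{
    std::string headNode;    // cluster head node name
    std::string domainUser;  // account in domain-backslash-name form
    std::string userDir;     // directory name under the head node's users share
    std::string jobName;
    std::string exeName;     // executable copied to the head node's app share
    int numProcessors;       // used as both the minimum and the maximum
};

// Build the "Job.exe submit ..." command line as a single string.
std::string buildSubmitCommand(const SubmitParams& p)
{
    assert(p.numProcessors > 0);
    std::ostringstream cmd;
    cmd << "Job.exe submit"
        << " /scheduler:" << p.headNode
        << " /jobname:" << p.jobName
        << " /numprocessors:" << p.numProcessors << "-" << p.numProcessors
        << " /workdir:\\\\" << p.headNode << "\\users\\" << p.userDir
        << " /stdout:_OUT.txt"
        << " /user:" << p.domainUser
        << " mpiexec.exe \\\\" << p.headNode << "\\app\\" << p.exeName;
    return cmd.str();
}
```

Filling the structure with your_head_node, your_domain\your_name, your_name, MpiHello, MpiHello.exe and 6 reproduces the command shown above character for character.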

You can now use the Job Management component in HPC Cluster Manager to monitor the progress of your HPC application.

When the job has finished successfully, go to \\your_head_node\users\your_name and open the _OUT.txt file. Its contents will be very similar to:

Master Process is running on host[hpc-01].
Worker Process [1] of [6] is running on host [hpc-01].
Worker Process [3] of [6] is running on host [hpc-02].
Worker Process [4] of [6] is running on host [hpc-03].
Worker Process [2] of [6] is running on host [hpc-02].
Worker Process [5] of [6] is running on host [hpc-03].

[Reference]

About MPI

1. Message Passing Interface on Wikipedia.
2. The MPI standard.
4. Tutorial on MPI by William Gropp.
5. MPI Tutorial at ANL.
6. Great MPI tutorial by LLNL.
7. C++ MPI Exercises by John Burkardt.
8. Book online: MPI The Complete Reference.

About MS-MPI

10. Windows HPC Server 2008 - Using MS-MPI whitepaper.
11. Using Microsoft MPI (@TechNet).
12. MPI.NET Home Page.

About Windows HPC Programming

20. MS HPC Dev Center.
21. Classic HPC Programming Using Visual C++ (Doc, Code).
22. HPC Developing Using .NET: MPI.NET.
