Deploying and Monitoring in a Windows Cluster

When you solve data- or compute-intensive problems on a large-scale cluster, deploying and monitoring the system are important tasks alongside software development. In this article, I will show how to accomplish both tasks on a Windows cluster with little or no software development effort.

First, let's clarify the two terms:
Deploying - copying binaries and data (usually from a file server) to remote machines and setting up configuration files
Monitoring - starting and stopping processes on remote machines and checking their status periodically

If deployment performance is critical, or if your deployment policies are very complicated, you may want to develop your own management software or adopt a mature cluster management tool from an ISV or the open source community.

Here, however, I focus only on how to accomplish this kind of work with ready-to-use tools and scripts (those that ship with the OS or come from a well-known community).

[In the following sections, I assume you have just a bare Windows OS installed on the nodes in the cluster. I have a previous article on how to install Windows automatically on a large cluster.]

1. How to deploy your data and executables

The simplest and most intuitive solution is to create a shared folder on each machine and copy your data and binaries into it.
But how do you share a folder on a remote machine? You have two choices:
- Use the Windows default shares. You can run the "net share" command to see which folders are shared in a default installation.
- Run the "net share" command on the remote machines to share whatever folders you want. PsExec is an ideal tool for running remote commands; this blog has a previous article on how PsExec works.
When the shared folders are ready, you can use copy/xcopy/robocopy to copy your application data and executables to them.
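
The two steps above (share a folder remotely, then copy into it) can be sketched as a small Python driver that only builds and runs the command lines. This is a minimal sketch, not a finished tool: the node names, the share name "deploy", the local path C:\deploy and the source path are hypothetical, and it assumes psexec and robocopy are on the PATH and that the current account already has administrative rights on the nodes.

```python
import subprocess

def unc_path(node, share):
    # Build a UNC path of the form \\<node>\<share>.
    return "\\\\" + node + "\\" + share

def share_command(node, share, local_path):
    # PsExec runs "net share <name>=<path>" on the remote node.
    return ["psexec", "\\\\" + node, "net", "share", share + "=" + local_path]

def copy_command(source, node, share):
    # "robocopy <source> <destination> /E" copies the whole tree,
    # including empty subfolders.
    return ["robocopy", source, unc_path(node, share), "/E"]

def robocopy_ok(exit_code):
    # robocopy signals copy failures with exit codes of 8 or higher;
    # lower values are informational.
    return exit_code < 8

def deploy(source, nodes, share="deploy", local_path="C:\\deploy"):
    # Share the target folder on each node, then copy the payload into it.
    # Returns the list of nodes where the copy reported failures.
    failed = []
    for node in nodes:
        subprocess.run(share_command(node, share, local_path))
        result = subprocess.run(copy_command(source, node, share))
        if not robocopy_ok(result.returncode):
            failed.append(node)
    return failed
```

Calling deploy with a file server path and a list of node names would run the commands node by node. The sketch does not check whether the share already exists or whether the folder is present on the node, so a real script would need to handle those cases.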

2. How to start/stop/monitor remote executables

Let's look at starting first. Windows has a great feature that lets you create, start, and stop Windows services remotely; try "sc /?" to learn more. You can leverage this mechanism to control remote applications:
- Make every executable a Windows service. This is convenient for management but requires a lot of coding effort.
- Write one "God" Windows service that starts and stops ordinary local applications according to a configuration file. You then only need the SC command to control this one service. Most production solutions adopt this method. You can also add some cool features to this "God" service, for example restarting an application when it detects that the application has terminated.
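
Whichever of the two options you pick, the remote control side comes down to calling SC with a machine name. Here is a minimal sketch in Python that builds those command lines; the service name "AppHost" and the node names are hypothetical, and the sketch assumes the service is already installed on every node.

```python
import subprocess

ACTIONS = ("start", "stop", "query")

def sc_command(node, action, service):
    # Build "sc \\<node> <action> <service>".
    if action not in ACTIONS:
        raise ValueError("unsupported action: " + action)
    return ["sc", "\\\\" + node, action, service]

def control_all(nodes, action, service):
    # Run the same SC action on every node and collect each exit code.
    results = {}
    for node in nodes:
        results[node] = subprocess.run(sc_command(node, action, service)).returncode
    return results
```

For example, control_all(["node01", "node02"], "start", "AppHost") would try to start the service on both nodes and return a dictionary of exit codes, where a non-zero code points at a node that needs a closer look.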

But you have an alternative: the great PsExec. It can start remote applications in a graceful manner. Keep in mind:
- It can only start executables.
- It is itself built on the remote service control feature of Windows.
- To stop a remote executable, you can use taskkill, a system tool that comes with Windows.
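
As a rough illustration of the PsExec/taskkill pair, the sketch below builds one command to start an executable on a node and one to stop it by image name. The executable path and node name are hypothetical. The -d switch tells PsExec not to wait for the started process to exit, and /F asks taskkill to terminate the process forcefully; whether you want either depends on your application.

```python
def start_command(node, exe_path, *args):
    # "psexec \\<node> -d <exe> [args...]" starts the program remotely
    # and returns without waiting for it to finish.
    return ["psexec", "\\\\" + node, "-d", exe_path] + list(args)

def stop_command(node, image_name, force=True):
    # "taskkill /S <node> /IM <image>" ends every process with that image name
    # on the remote node; /F forces termination.
    command = ["taskkill", "/S", node, "/IM", image_name]
    if force:
        command.append("/F")
    return command
```

Both functions return argument lists that can be passed to subprocess.run, for example subprocess.run(start_command("node01", "C:\\deploy\\worker.exe")).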

To monitor the status of a remote executable, you can:
- Use the tasklist tool provided by Windows.
- If the executable is a Windows service, use the SC command.
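
Both monitoring options produce text that a script has to interpret. The following sketch shows one way to do that in Python: it asks tasklist for CSV output without a header (/FO CSV /NH) filtered by image name, and it pulls the STATE line out of "sc query" output. The image name and the sample outputs in the usage note are hypothetical, and the parsing assumes the English output format of both tools.

```python
import csv

def tasklist_command(node, image_name):
    # Ask the remote node for processes with this image name, as header-less CSV.
    return ["tasklist", "/S", node, "/FI", "IMAGENAME eq " + image_name,
            "/FO", "CSV", "/NH"]

def parse_tasklist_csv(output):
    # Return (image name, PID) pairs; non-CSV lines such as the
    # "INFO: No tasks are running..." message are skipped.
    processes = []
    for row in csv.reader(output.splitlines()):
        if len(row) >= 2 and row[1].isdigit():
            processes.append((row[0], int(row[1])))
    return processes

def parse_sc_state(output):
    # Extract the state name (for example RUNNING) from "sc query" output.
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("STATE"):
            parts = stripped.split(":", 1)[1].split()
            if len(parts) > 1:
                return parts[1]
    return None
```

A periodic check could run tasklist_command through subprocess.run with capture_output=True and text=True, feed the captured stdout to parse_tasklist_csv, and treat an empty result as "the process is not running on this node".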

3. Security, AD Account and Access Control

Many Windows clusters use a dedicated, independent domain; consequently, you have to go through some kind of boundary server to access them.

In my experience, do NOT run the deploy/monitor commands above on boundary servers! The account and access control problems are really annoying!
It's better to do these tasks on a machine that is part of the dedicated cluster, so that none of your commands needs a user name or password. The only problem is that you may need some extra effort to make your data/executable repository accessible from this machine.

A final tip:
- You may want decent, professional-looking output from your scripts, so colored console output is highly desirable. cecho may help: it is a color-capable version of the echo command.
