6/29/2009

I/O Concept - Blocking/Non-Blocking VS Sync/Async

  These concepts are discussed in the context of Winsock, but the basic ideas apply to other I/O types and to other operating systems, such as the Linux/Unix world, as well.

I - Blocking V.S. Non-Blocking


  Under blocking mode, socket I/O operations, connect and accept operations all block until the operation in question is completed.
  Under non-blocking mode, if a Winsock call cannot complete immediately, the call fails and WSAGetLastError() returns a WSAEWOULDBLOCK error.

  The calls always return immediately, but if a call fails, nothing has happened, i.e. no I/O request was issued at all.

  When a socket is created, it is a blocking socket by default. To change the socket's operation mode from blocking to non-blocking, you can use WSAAsyncSelect(), WSAEventSelect(), or the FIONBIO command in the ioctlsocket() API call.
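The same blocking vs. non-blocking distinction exists on the portable BSD-style socket API. As a hedged sketch (not Winsock code), here is the idea in Python, where a non-blocking read that cannot complete raises BlockingIOError, the analogue of a WSAEWOULDBLOCK failure; socketpair() stands in for a real client/server connection:

```python
import socket

# A connected socket pair stands in for a client/server connection.
a, b = socket.socketpair()
a.setblocking(False)  # analogous to ioctlsocket(s, FIONBIO, &nonzero) in Winsock

try:
    a.recv(1024)                 # no data queued yet, so this cannot succeed
    first = "unexpected data"
except BlockingIOError:          # Winsock would instead fail with WSAEWOULDBLOCK
    first = "would block"

b.sendall(b"hello")              # now there is data in the receive buffer...
second = a.recv(1024).decode()   # ...so the same non-blocking call succeeds

print(first)
print(second)
a.close()
b.close()
```

Note that the failed call has no effect at all: no I/O request is left pending, which is exactly the point of this mode.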

II - Sync V.S. Async

  On the Windows platform, Async I/O is also called Overlapped I/O.
  In Winsock 2, you create an overlapped socket using WSASocket() with the WSA_FLAG_OVERLAPPED flag, or simply using the socket() API. You can use the Win32 file I/O APIs or the Winsock 2 WSASend(), WSASendTo(), WSARecv(), and WSARecvFrom() functions to initiate an overlapped I/O operation.

  If an overlapped I/O operation cannot complete immediately, the call fails and WSAGetLastError() returns WSA_IO_PENDING or ERROR_IO_PENDING, which is actually the same #define as WSA_IO_PENDING.

  So all calls return immediately, but if the operation can't be completed at that time, a corresponding request is issued to the underlying system.

  Setting a socket's overlapped I/O attribute doesn't mean that the socket will perform an overlapped I/O operation. For example, if you specify NULL for both the completion function and the overlapped structure in WSARecv() and WSASend(), or you simply call the recv or send functions, they will complete in a blocking fashion. To make sure the I/O is performed in an overlapped fashion, you need to provide an overlapped structure in your I/O call, depending on the function you use.

  If you use the SO_RCVBUF and SO_SNDBUF options to set the TCP stack receive and send buffers to zero, you basically instruct the TCP stack to perform I/O directly using the buffer provided in your I/O call. Therefore, in addition to the non-blocking advantage of overlapped socket I/O, the other advantage is better performance, because you save a buffer copy between the TCP stack buffer and the user buffer for each I/O call.

III - I/O Multiplexing VS Async I/O

1. In I/O Multiplexing, you want to be notified when one or more I/O conditions are ready, so that issuing the corresponding I/O operation won't block.
2. In the Async model, I/O operations won't block even if they can't be completed immediately, so you can issue multiple I/O requests simultaneously. When you get the notification message, it tells you that the I/O operations are completed.

As summarized in [2][3]:
- Blocking I/O means that the calling system does not return control to the caller until the operation is finished. As a result, the caller is blocked and cannot perform other activities during that time.
- Non-blocking Synchronous I/O means that the call returns control to the caller immediately and the caller is not made to wait. The invoked system immediately returns one of two responses: if the call was executed and the results are ready, the caller is told so. Alternatively, the invoked system can tell the caller that it has no resources (no data in the socket) to perform the requested action.
- Non-blocking Asynchronous I/O means that the call returns control to the caller immediately, reporting that the requested action was started. The invoked system will notify the caller (by a callback, for example) when the result is ready for processing.
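The asynchronous case above can be made concrete with a small hedged sketch using Python's asyncio as a portable stand-in for overlapped I/O with a completion callback; fake_io and on_done are made-up names for illustration:

```python
import asyncio

events = []

async def fake_io():
    # Stands in for a long-running I/O operation; the caller is not blocked on it.
    await asyncio.sleep(0.01)
    return "payload"

def on_done(task):
    # The "invoked system" notifies the caller once the result is ready.
    events.append("completed: " + task.result())

async def main():
    task = asyncio.ensure_future(fake_io())  # the call returns immediately
    task.add_done_callback(on_done)          # register the completion notification
    events.append("request issued, doing other work")
    await task                               # only here does the caller wait

asyncio.run(main())
print(events)
```

The recorded order shows the caller regaining control first, with the completion notification arriving later, which is exactly the non-blocking asynchronous contract.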

[Reference]
1. http://support.microsoft.com/kb/181611
2. Advanced Programming in the UNIX Environment
3. Two High Performance I/O Design Patterns
4. Boost application performance using asynchronous I/O

6/18/2009

Winsock I/O Model - Part II : Implementation

In the previous post [6], I summarized several scalable network I/O models in theory. In this article, I will give concrete code showing how to use each model to build a scalable network server. Building a scalable server is a challenging task that requires many considerations; here we consider only the network I/O model.

When a client request comes in, the server does some floating-point calculation, gets the server system time, and then responds to the client with this information. The code can be found in the ServeTheClient() function in Util.cxx.

1. Multi-threading


The server consists of two parts:
- One Listening Thread: listen and accept client connection
- Many Worker Threads: one for each client

When a new client comes in, the Listening Thread accepts it and creates a new worker thread to serve it. The worker thread reads a string from the client, does the processing, and sends the output back to the client.

This is the simplest solution; the code can be found here: Multithreading Network Server.
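The listening-thread / worker-thread split can be sketched portably with Python's threading and socket modules instead of Winsock. This is a hedged illustration only: serve_client, run_server, and the echo protocol are made up, and the real request processing lives in ServeTheClient():

```python
import socket
import threading

def serve_client(conn):
    # Worker thread: read the request, "process" it, send the result back.
    data = conn.recv(1024)
    conn.sendall(b"echo:" + data)
    conn.close()

def run_server(info):
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(5)
    info["port"] = listener.getsockname()[1]
    info["ready"].set()
    conn, _ = listener.accept()           # Listening Thread accepts the client...
    threading.Thread(target=serve_client, args=(conn,)).start()  # ...one worker per client
    listener.close()

info = {"ready": threading.Event()}
threading.Thread(target=run_server, args=(info,)).start()
info["ready"].wait()

client = socket.socket()
client.connect(("127.0.0.1", info["port"]))
client.sendall(b"hi")
reply = client.recv(1024).decode()
client.close()
print(reply)
```

The one-thread-per-client shape is easy to read, which is why it is the baseline model, but the thread count grows linearly with the connection count.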

2. I/O Multiplexing - BSD Select


In this model, the main thread only initializes the Winsock environment and starts listening; other worker threads do the real work. Each worker thread handles up to N - 1 client connections (N is 64 on most BSD-compatible systems); one slot is reserved for the listening socket.

When using BSD Select to implement a network server, you should set the server socket (the socket used to listen for and accept client connections) into non-blocking mode. The reason is that even if select() reports that a client connection request has arrived, accept() may still block, since it's possible that many threads got the same notification.

There is a SELECT_CONN_CTX data structure for each client connection. It's definitely a state machine style server architecture.

One drawback of the current implementation is that the number of threads only increases, never decreases. The number of worker threads should shrink when the number of concurrent client connections drops.

An alternative design is to use some dedicated thread(s) to listen for and accept client connections, while other threads only serve client requests. Since this introduces data shared among different threads, synchronization mechanisms (such as locks) are needed.

The code - BSD Select based Network Server
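The core accept/read loop of this model can be sketched portably with Python's select module, a thin wrapper over BSD select(). This is a hedged sketch: the listener is non-blocking for the reason given above, and the ping/pong payloads are invented for illustration:

```python
import select
import socket

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(5)
listener.setblocking(False)      # so a spurious wakeup can never block in accept()

client = socket.socket()
client.connect(("127.0.0.1", listener.getsockname()[1]))
client.sendall(b"ping")

sockets = [listener]
served = False
while not served:
    # One select() call watches the listening socket and every client socket.
    readable, _, _ = select.select(sockets, [], [], 1.0)
    for s in readable:
        if s is listener:
            conn, _ = s.accept()         # select said this won't block
            conn.setblocking(False)
            sockets.append(conn)
        else:
            data = s.recv(1024)          # select said data is ready
            s.sendall(b"pong:" + data)
            served = True

reply = client.recv(1024).decode()
print(reply)
listener.close()
client.close()
```

One thread multiplexes the listening socket and all client sockets through the single select() call, which is the whole point of this model.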

3. I/O Multiplexing - Event Select

It's very similar to the BSD Select model, with some small differences:
- WSAWaitForMultipleEvents() rather than select() is used to wait for network events
- Since the wait function only returns one array index value, we should check all event handles that follow the returned one to ensure fairness
- WSAEnumNetworkEvents() is used to determine the exact network event
- Each thread can only serve WSA_MAXIMUM_WAIT_EVENTS - 1 client connections

The drawbacks and possible new architecture are the same as for BSD Select.

Note: The socket should be in non-blocking mode to avoid blocking, for the same reason as in the BSD Select model. But WSAEventSelect() puts a socket into non-blocking mode automatically, so we don't need to call another function to do this as in the BSD Select model.

The code - Event Select based Network Server

4. Overlapped I/O - Event Waiting

In this model, you associate each Winsock call with an overlapped data structure, and associate a kernel event with each overlapped data structure. This event is used to get async/overlapped call completion notifications.

Core logic:
- The main thread only deals with initialization work
- Worker threads do async accept using AcceptEx()
- When a client connects, it's served in state machine style
- When WSAWaitForMultipleEvents() returns, we should check all following handles to ensure fairness
- When there is no free slot, a new worker thread is created to serve more clients

Since it uses a kernel event to get completion notification, it has the same drawback as the "Kernel Event Select" model - each thread can only serve WSA_MAXIMUM_WAIT_EVENTS - 1 client connections.

The code - Overlapped I/O with Event Waiting based Network Server

5. Overlapped I/O - Callback Routine

In this model, you pass an extra callback routine parameter when issuing overlapped I/O calls.

Since each client connection is represented by a context data structure, which can be accessed in the callback routine, this model is pretty simple:
- no multiple threading
- no synchronization
- callback routine represents state transition

Core Logic:
- The main thread is an async listen/accept loop
- AcceptEx() sends completion notification using a kernel event
- The main thread must wait in an alertable state in order for the overlapped I/O callback routines to be executed

The code - "Overlapped I/O with Callbacking" based Network Server

6. Overlapped I/O - IO Completion Port

This is the most elegant solution for a scalable network server:
- The main thread is a listen/accept thread; when a client connection arrives, it posts a notification to the IOCP and lets a worker thread process it.
- Each worker thread deals with each client connection in state machine style: network I/O completion notifications trigger state transitions and the next operation call.
- Let the system control thread count and scheduling.

You can see from the source code that the logic is very clear and easy to understand. If you are going to write a network server on the Windows platform, this model is highly recommended.

The code - "Overlapped I/O with IOCP" based Network Server

Note:
The original version of CreateIoCompletionPort() is heavily overloaded in its semantics. So I created two separate functions, CreateNewIoCompletionPort() and AssociateDeviceWithIoCompletionPort(), to make each one do only one simple thing. The definitions and implementations can be found in Util.h and Util.cxx.

Conclusion:
Here I listed implementations of a scalable network server using various network I/O models. In the next article, I will do some load tests on each server and compare the performance/scalability of each model.

[Reference]
1. Write Scalable Winsock Application using I/O Completion Port
2. Writing scalable server applications using I/O Completion Port
3. Book - Network Programming for Microsoft Windows
4. Network I/O system call implementation on Linux
5. Winsock Network I/O Model - Part I : Theory
6. Scalable I/O Model

6/10/2009

Winsock I/O Model - Part I : Concept

The basic steps to do windows socket programming are simple and straightforward:

Server Side

  1. Initialize Winsock.
  2. Create a socket.
  3. Bind the socket.
  4. Listen on the socket for a client.
  5. Accept a connection from a client.
  6. Receive and Send data.
  7. Disconnect.

Client Side

  1. Initialize Winsock.
  2. Create a socket.
  3. Connect to the server.
  4. Send and Receive data.
  5. Disconnect.
For more detailed information, please refer to the Winsock Programming Guide @ MSDN [2].

But the hard problem is how to write a high performance & highly scalable network application, which means:
- Serve as many concurrent connections as possible
- Response each request as soon as possible

The various solutions to this problem are called Network I/O Models. I wrote a general I/O model article earlier [1]; this post focuses on the same problem in the network area. Using the same taxonomy as [1], we can categorize the various Network I/O Models as:

1. Multitasking

2. I/O Multiplexing
 - BSD Select
 - Window Message Select
 - Kernel Event Select

3. Overlapped (Async) I/O
 - Polling using GetOverlappedResult()
 - Waiting on Kernel Event
 - Callback using Completion Routine
 - Queuing using I/O Completion Port

Let's detail on each model one by one:

1. Multitasking

Usually, you use one thread to serve one client connection in this model. Since each thread consumes some resources (such as memory for its stack), and context switching among a large number of threads is expensive, this model is not considered scalable.

2. I/O Multiplexing

- 2.1 BSD Select

In this model, you issue sync network I/O requests only when you know they won't block. You call the BSD-style select() function to check which operations on which sockets won't block. (I.e., you use select() to get network event notifications.)

The advantage of this model is that you can multiplex connections and I/O operations in a single thread, which greatly reduces OS resource consumption. The drawback is that the number of socket handles one select() call supports is limited (it defaults to 64), and there are expensive data transfer costs between user and kernel space as well as high kernel algorithm complexity.

- 2.2. Window Message Select

In this model, network events, such as a socket connection arriving or data being ready, are sent to some window procedure as standard window messages, just like a keystroke or mouse movement event.

To use this model, you call WSAAsyncSelect() to associate a socket handle with some window handle. For which situations trigger which window messages, see the WSAAsyncSelect doc at MSDN.

The advantage is the same as BSD Select; the drawback is that it requires a window handle and window procedure, which is inconvenient for some applications (such as services or console applications).

- 2.3. Kernel Event Select

This model is similar to BSD Select. You call WSAEventSelect() to associate a socket with a WSAEvent, and call WSAWaitForMultipleEvents() to wait for network event notification. When the kernel event is signaled, you call WSAEnumNetworkEvents() to determine which network event(s) occurred on which socket handle.

Compared with the Window Message model, this model doesn't need a window object and message queue. But since one call can only wait on at most 64 WSAEvents, multiple threads are needed to handle more concurrent socket connections. So it is not considered a truly scalable model either.

3. Overlapped I/O

Overlapped I/O can be performed only on sockets created through the WSASocket() function with the WSA_FLAG_OVERLAPPED flag set, or on sockets created through the socket() function. When working in the overlapped model, each time you make a network call you should pass a WSAOVERLAPPED structure as a parameter. The call will return immediately whether the operation is completed or not.

Overlapped network I/O only works with the following APIs:
WSASend
WSASendTo
WSARecv
WSARecvFrom
WSARecvMsg
AcceptEx
ConnectEx
DisconnectEx
TransmitFile
TransmitPackets
WSAIoctl

Completion Notification - Overlapped socket calls return immediately after being issued; there are 3 ways to get completion notification:
- 3.1 Callback: the completion routine gets called when the calling thread is in an alertable state. The completion routine is specified as an optional parameter to some socket calls.
- 3.2 Event: WSAOVERLAPPED contains a WSAEVENT handle, which is signaled when the related I/O operation is completed. You must set a valid WSAEvent handle in the overlapped structure that is passed to non-blocking socket calls.
- 3.3 Polling: after issuing overlapped socket calls, you can call WSAGetOverlappedResult() to poll the completion status of previous non-blocking calls.

4. Overlapped I/O + IOCP

In the IOCP model, you associate a socket handle with an I/O completion port, and any further overlapped socket operation will use this IOCP for completion notification. Compared with the previous models, this one is the most complicated. You must fully understand the Overlapped I/O and I/O Completion Port mechanisms to understand this model well.

Typically, when an overlapped I/O call is made, a pointer to an OVERLAPPED structure is passed as a parameter to that I/O call. GetQueuedCompletionStatus() will return the same pointer when the operation completes. With this structure alone, however, an application can't tell which operation just completed. In order to keep track of the operations that have completed, it's useful to define your own OVERLAPPED structure that contains any extra information about each operation queued to the completion port. This is the so-called "per-I/O data".

The CompletionKey parameter used when associating a socket with an IOCP (using CreateIoCompletionPort()) can be used to store "per-handle data". For each completion event related to a socket handle, the corresponding CompletionKey can be retrieved as the 3rd parameter of GetQueuedCompletionStatus().

With IOCP, which is in fact a kernel queue object, there is no limit on how many concurrent connections you can manage in one thread. And IOCP provides system support to optimize thread scheduling and reduce context switch cost. So this model is the preferred scalable server network model.

This article focuses only on theory and ideas; in the next article on this topic, I will show real code using these ideas.

NOTES:
1. WSAAsyncSelect()/WSAEventSelect() will change associated sockets from blocking mode to non-blocking mode, but BSD Select doesn't have this side effect
2. In the I/O Multiplexing model (using BSD/Message/Event Select), a robust application must be prepared for the possibility that a network event arrives but the corresponding Winsock call can't complete immediately.
3. "Socket overlapped I/O versus blocking/nonblocking mode" is a great article about Sync/Async vs Blocking/Non-Blocking
4. When using an I/O multiplexing model, the socket is usually set to non-blocking mode (manually when using BSD Select, automatically when using Message/Event Select).

[Reference]
1. High Performance I/O Models
2. Winsock Programming Guide @ MSDN
3. Windows Sockets: A Quick And Dirty Primer
4. BSD Socket Introduction
5. Socket overlapped I/O versus blocking/nonblocking mode
6. Book : Network Programming for Microsoft Windows

6/04/2009

Inside Scalable I/O Model - In Sync & Async Way

  The so-called "I/O Model" describes how you write code to accomplish I/O tasks, such as connecting to a server, writing data to disk, or receiving data from the network. Programmatically accomplishing I/O tasks is far more complicated than just calling a single related OS API, such as WriteFile() or ReadFile().

  In this article, I try to summarize and explain all the options you have when programming I/O tasks.

I. What's Sync & Async I/O

  The concepts - "Asynchronous and Synchronous" come from the telecommunication domain. They mainly deal with timing - "Sync communication needs an external clock signal to coordinate the rhythm of sending/receiving sides, while in Async model, any timing required to recover data from the communication symbol is encoded within the symbols."[1]

  Sync/Async were introduced into computer science with a different meaning: "Async operations are those occurring independently of the issuing program flow, while in the Sync model, the requesting program flow must wait for the operations to acknowledge completion."[1]

  Particularly, "Asynchronous I/O is a form of I/O processing that permits other processing to continue before the transmission has finished"[2]; in Synchronous I/O, the operation sender "must wait until the hardware has completed the physical I/O, so that it can be informed of the success or failure of the operation"[5].

  MSDN also has a great article about the Sync/Async concepts - Synchronous and Asynchronous I/O.

  NOTE: in the following sections, we use Async/Non-Blocking and Sync/Blocking interchangeably and don't distinguish them. This convention conforms to Wikipedia and MSDN, but some popular textbooks (such as APUE) differentiate these terms explicitly.

II. Why Invent Two I/O Models?

  The Sync I/O model is simple to understand and straightforward to use, but you must wait while the I/O is being performed, so the ultimate performance is limited.

  The basic motivation for Async I/O is to boost performance. In fact, there are two assumptions behind this:
  1. I/O operations are time consuming, so there IS a lot of time for the CPU to execute other instructions while the I/O operations are going on.
  2. There is a large amount of work available to execute while the results from the I/O operations are not yet available.

   If these two assumptions do not hold, the performance of your code may not improve even if you adopt the powerful Async I/O programming model.

III. How to do Scalable I/O Operations?

The challenges for scalable I/O model are:
1. What code do we execute after I/O operations are issued (whether in a sync or async way)?
2. How do we detect, and what do we do, when I/O operations are completed?

Scalable I/O operations can be accomplished in either Sync or Async way:

- Scalable Synchronous I/O Models

  In the Sync I/O model, API calls return only when the requested operation completes. But this doesn't mean that these calls must block. For example, suppose you want to read data from a socket: if there is some data in the socket buffer before you issue the read request, the read() call will succeed without blocking.

  In order to avoid CPU idling while I/O operations are blocked, you may execute other tasks that are not related to the pending I/O, or avoid I/O blocking by only issuing I/O operations at the proper time. That is to say:

1. Multitasking - Code in other tasks gets executed after an I/O request is issued, and the code that follows the I/O operations gets executed when they are completed. (A task may take the form of a thread or a process.)

2. I/O Multiplexing - Use one thread to handle multiple sync I/O device handles.
  The key idea of this model is that you call sync I/O functions only when you know the call will NOT block the calling thread (because the data/status is ready even if you haven't issued any corresponding I/O operation; for example: connect() or read() on a socket).
  You use some sync calls (such as select, poll, epoll) to check the status of multiple device handles, to learn which I/O operations on them will not block.
  Essentially, I/O multiplexing provides a way to check the status of multiple device handles in one sync call. It returns only when some events have happened on some handles. Thus you can handle multiple Sync I/O operations in one thread.

- Scalable Asynchronous I/O Models

  In the Async model, an I/O operation call returns immediately after being issued, and the code that follows gets executed. The various models differ only in how they get I/O completion notification:

1. Polling - Use various polling functions (such as HasOverlappedIoCompleted() on Windows) to poll whether some pending Async I/O has completed

2. Callback - The kernel will call some user-provided callback functions when I/O operations are completed.

3. Waiting - The OS kernel will signal some kernel event objects when I/O operations are completed. Application code just waits on these events to get the completion notification.

4. Queuing - The kernel will place an information packet into a queue (such as an I/O Completion Port on Windows) when an I/O operation completes. Client code checks that queue for completion information.
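The queuing idea can be sketched portably with a thread-safe queue standing in for the completion port. This is a hedged analogue only: queue.Queue mimics the kernel completion queue, and overlapped_op is an invented stand-in for a real overlapped operation:

```python
import queue
import threading

completion_port = queue.Queue()      # stands in for the kernel completion queue

def overlapped_op(op_id, payload):
    # A simulated overlapped operation: it completes in the background
    # and posts a completion packet to the queue.
    completion_port.put((op_id, "done:" + payload))

def worker(results):
    # Worker threads block on the queue and wake when a packet arrives,
    # like a thread blocked in GetQueuedCompletionStatus().
    op_id, data = completion_port.get()
    results[op_id] = data

results = {}
w = threading.Thread(target=worker, args=(results,))
w.start()
threading.Thread(target=overlapped_op, args=(1, "request")).start()
w.join()
print(results[1])
```

The design point is decoupling: the issuer never learns which worker handled the completion, which is what lets the system balance work across a small thread pool.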

Another advantage of Async I/O is that it reduces memory copy cost. Other I/O models need to copy data (for send/receive) between the system buffer and the application buffer. In Async I/O, the system knows the receiving/sending buffer when the I/O request is issued, so it can use that buffer directly; there is no need to copy memory between system space and user space.

IV. I/O Models on Windows Platform

- Multi-threading is supported through thread library
- I/O multiplexing is supported by select(), WSAAsyncSelect(), WSAEventSelect()
- Async Polling/Async Waiting is supported by means of Overlapped I/O
- Async Callback model is supported by means of Alertable I/O
- Async Queuing is supported by means of I/O Completion Port

NOTE:

1. To understand Alertable I/O, you must first get familiar with the Asynchronous Procedure Call (APC) mechanism, which is a great infrastructure for inter-thread/inter-process communication: the Win32 API QueueUserAPC() can be used to queue an APC entry to another thread's APC queue, even in another process.
- The most famous usage of this facility is gracefully terminating a thread: if a thread is in an alertable waiting state and you want to terminate it, you can post a dummy APC to it. That thread will then resume from the wait state and execute the dummy APC, and you can perform graceful termination actions after that. [More detail can be found in chapter 10 of "Windows via C/C++".]

2. Alertable I/O and Overlapped I/O are old techniques; use I/O Completion Ports as much as you can, as they are easy to use and also very powerful.

V. I/O Models on Linux Platform

- Multi-threading is supported
- I/O multiplexing is supported by select(), poll(), epoll_xxx()
- Async Callback is supported by Linux AIO using system signal or notify function
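As a small hedged illustration of the multiplexing support on this platform, Python's selectors module picks epoll on Linux behind a select()-like interface (the "peer-a" tag is invented for illustration):

```python
import selectors
import socket

sel = selectors.DefaultSelector()    # uses epoll on Linux, kqueue on BSD, etc.
a, b = socket.socketpair()
a.setblocking(False)

sel.register(a, selectors.EVENT_READ, data="peer-a")
b.sendall(b"wake up")                # makes the registered handle readable

got = []
for key, mask in sel.select(timeout=1.0):   # one call checks all registered handles
    got.append((key.data, key.fileobj.recv(1024).decode()))

print(got)
sel.close()
a.close()
b.close()
```

Because registration is persistent, the handle set is transferred to the kernel once rather than on every wait call, which is the key efficiency gain of epoll over select/poll.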

NOTE:
- poll() improves on select() in that there is no limitation on handle count, and epoll_xxx() improves on poll() in that data transfer between kernel and user space is greatly reduced and the complexity of the corresponding kernel logic is also reduced from O(N^3) to O(N). [17]

VI. I/O Models on .Net

- Async Callback is supported in the form of Event-based Asynchronous Pattern
- Async Polling/Async Waiting/Async Callback is supported in the form of IAsyncResult Async Pattern

.Net also provides a mechanism for developers to call Synchronous method in Asynchronous way using Asynchronous Programming Using Delegates.

More information can be found @ Asynchronous Programming Design Patterns

VII. I/O Models on Java

- I/O multiplexing is supported by the Readiness Selection pattern in the Java NIO library. It works similarly to the BSD Select mechanism
- Async Polling/Async Waiting/Async Callback will be supported in JDK7 AIO, in Java Future style and Callback style [26]

The difference between Java NIO and AIO can be described as "With non-blocking I/O, you're getting events through a selector when the channel is ready to do I/O. The asynchronous I/O gives you a notification when the I/O is completed".[27]

Currently, scalable I/O libraries on Java are very raw & preliminary; many I/O libraries for Java exist in the community, such as Apache MINA, JBoss Netty, Grizzly from GlassFish, and Naga.

Update@12/23/2009:
- The I/O Multiplexing mechanism is essentially a Reactor design pattern, while Asynchronous I/O is in fact a Proactor design pattern. [31]

[Reference]

- General
1. Wikipedia explanation, Async vs Sync, Async
2. Asynchronous IO, http://en.wikipedia.org/wiki/Asynchronous_I/O
3. C10K Problem, http://www.kegel.com/c10k.html
4. MPI Async I/O, http://beige.ucs.indiana.edu/I590/node109.html
5. Sync/Async by Oracle GURU, http://www.ixora.com.au/notes/asynchronous_io.htm

- Windows
6. Mulithreaded Async IO V.S. IO Completion Port
7. IO Completion Port (IOCP Introduction, Inside IOCP)
8. Callback supports by Windows Kernel
9. Tips for Async I/O and IOCP
10. Async vs Sync I/O on Windows

- Linux
11. Async IO on Linux from Lighttpd
12. Ideas for Async-IO model from Squid
13. Kernel AIO support for Linux
14. Linux AIO Programming Introduction
15. Linux AIO Design Notes
16. epoll performance analysis
17. select, poll, epoll comparison
18. I/O Event Handling Under Linux

- .Net
19. .NET Asynchronous Programming Design Patterns
20. .Net Asynchronous File I/O
21. Call Sync Method in Async Way on .Net

- Java
22. Java I/O Tutorial
23. Java Async I/O Library from IBM
24. Java NIO (Tutorial, JDK Doc, JSR51, Intr1, Intr2)
25. Java NIO.2/AIO (JSR203 OpenJdk NIO Page, Design Notes)
26. Java One 2009(NIO in JDK7, AIO in JDK7)
27. Interview on Java NIO.2
28. Java Concurrency Tutorial

- Talk
29. http://www.slideshare.net/Arbow/asynchronous-io-programming
[This Presentation describes many reasoning and insights behind Async IO]
30. http://www.slideshare.net/directi/async-io-and-multithreading-explained
[Interesting and Intuitve explanation on Threading and Async IO]

31. Great Article on Comparing High Performance I/O Design Patterns
32. Asynchronous I/O and Asynchronous Disk I/O Explorer
33. I/O Multiplexing & Scalable Socket Servers

6/01/2009

Brop2 - Time Management Made Practical

The Zen of Results is a great article about time management in work and life, but it's too long to catch the core ideas quickly. Here I summarize the core principles/practices to make them easier to remember and easier to apply in daily work and life.

Monday Vision (Make Plan Before Action)
- Plan what to do in the upcoming week
- Identify work items from long term planning, work calendar, information inbox
- Identify valuable outcomes

Daily Outcome
- Make a detailed Todo list every day
- Prioritize work items
- Focus on Outcome/Value/Result

Friday Reflection
- Identify what's good, what's bad
- Think why good, why bad
- Apply what you learned in the future
- Balance your work and life

To make it simpler, I coined a word - BROP2
- Balance your work & life
- Reflect what's good/bad periodically
- Outcome drive your action and priority
- Plan before action, keep big picture in mind
- Prioritize work items: MUST, SHOULD, COULD

Brop2 your work and life!