3/27/2008

Command @ Windows - Part I

Ever since it was first introduced to the world, Windows has had the reputation that "it is NOT CUI friendly". And for decades it has been hated for this by many talented people from the *Nix world.

As time goes by, things get better. Microsoft finally realized that a good CUI (a command set and a shell environment) is very important for a decent operating system, especially for an OS aimed at the server market, where automation is one of the biggest concerns. Consequently, Microsoft shipped its latest shell environment, "PowerShell" (I like to call it POSH), along with lots of useful commands. In this series of articles, I will introduce some of the ones that interest me most.

First, you can go HERE to get a full list of the commands available on a Windows system. Most likely, it will take you to a TechNet URL. The list is complete and detailed, but somewhat boring...

1. shutdown - Shutting down the Machine
  The *Nix world has "reboot", while Windows has "shutdown". But both of them can do "reboot" AND "shutdown". Interesting, isn't it?
  There are two main types of parameters:
  operation type:
    /l - log off
    /s - shutdown
    /r - reboot
    /h - hibernate
  Misc options:
    /t xxx - time to wait before the action happens, xxx is in seconds
    /m \\machine_name - the remote machine to shut down
  One thing worth mentioning: if no option is given, the command behaves the same as "shutdown /?". You must give at least one operation parameter to make it actually shut down.

  You can type "shutdown /?" for the detailed usage information.
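
  As a rough illustration of how the two parameter groups combine, here is a small Python sketch that only builds the argument list and never runs it. The function name and the action names are made up for this example. The `\\` prefix on the machine name follows the form shown by "shutdown /?". Allowing /t and /m only together with /s and /r is a conservative choice of this sketch, not a rule taken from the list above.

```python
def build_shutdown_command(action, delay_seconds=None, machine=None):
    """Return the argument list for a Windows `shutdown` call (nothing is executed)."""
    # Operation type: exactly one of these is required.
    flags = {"logoff": "/l", "shutdown": "/s", "reboot": "/r", "hibernate": "/h"}
    if action not in flags:
        raise ValueError("unknown action: %r" % (action,))
    args = ["shutdown", flags[action]]

    # Misc options: this sketch only allows them for shutdown and reboot.
    if delay_seconds is not None or machine is not None:
        if action not in ("shutdown", "reboot"):
            raise ValueError("delay/machine are only used with shutdown or reboot here")
    if delay_seconds is not None:
        if delay_seconds < 0:
            raise ValueError("delay must not be negative")
        args += ["/t", str(delay_seconds)]
    if machine is not None:
        args += ["/m", "\\\\" + machine]  # remote targets are written as \\name
    return args
```

  For example, build_shutdown_command("reboot", delay_seconds=60) gives the pieces of "shutdown /r /t 60", which could then be handed to subprocess.run on a Windows machine.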

2. findstr - Find string in files using Regular Expression
  It's an amazing tool, something like "grep" in the *nix world. The command syntax is: findstr [optional parameters] string_to_find [optional files].
  We can group its parameters into two types:
  About match pattern:
    /l - use search string literally
    /r - use search string as regular expression
    /i - ignore case when searching
    /b - match at the beginning of a line
    /e - match at the end of a line
    /s - search files in subdirectories
  About display format:
    /n - display line number of found line
    /o - display offset of found string
    /m - display file name only
  If you don't give a list of files to search, it searches the content read from stdin. This is a good feature to combine with a command pipeline. For example, we can use:
dir /n /b /s d:\ | findstr /r /i /e ".*\.pdf"
to list all PDF files on the D drive with their full paths. And we can use:
dir /n /b /s E:\ | findstr /r /i /e ".*wcf[^\\]*\.chm"
to find CHM ebooks on the E drive whose file names contain "wcf". It's very fast and cool!

Here is a summary of the regular expression syntax:
.         Wildcard: any character
*         Repeat: zero or more occurrences of previous character or class
^         Line position: beginning of line
$         Line position: end of line
[class]   Character class: any one character in set
[^class]  Inverse class: any one character not in set
[x-y]     Range: any characters within the specified range
\x        Escape: literal use of metacharacter x
\<xyz     Word position: beginning of word
xyz\>     Word position: end of word
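
To see what the "wcf" pattern actually accepts, here is a small Python sketch that imitates "findstr /r /i /e" on a list of lines. It is only an analogy: Python's re module is not findstr's regex engine, and the function name and the sample paths are made up for this example.

```python
import re


def findstr_at_end(lines, pattern, ignore_case=True):
    """Keep the lines whose END matches `pattern` (roughly findstr /r /e, plus /i)."""
    flags = re.IGNORECASE if ignore_case else 0
    # Anchoring with $ plays the role of /e: the match must finish at end of line.
    regex = re.compile("(?:" + pattern + ")$", flags)
    return [line for line in lines if regex.search(line)]


# Hypothetical paths, as "dir /b /s" might print them.
paths = [
    r"E:\books\Programming WCF Services.chm",  # "wcf" is in the file name
    r"E:\wcf\intro.chm",                       # "wcf" is only in a directory name
    r"E:\books\wcf.chm.bak",                   # does not end in .chm
]
matches = findstr_at_end(paths, r".*wcf[^\\]*\.chm")
```

Only the first path is kept. The [^\\]* part forbids a backslash between "wcf" and ".chm", which is what restricts the match to the file name instead of any directory along the path.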

3. where - find it in PATH
  It's used for searching too. But two things make it different from findstr:
  a. By default, it only searches the current directory and the directories listed in the PATH variable
  b. It can't recognize regular expression strings, just strings with the wildcards * and ?.
  This is a good tool for finding out which specific executable runs when you use a command found in PATH. For example, "where cmd.exe" tells you where "cmd.exe" is stored.
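
  The PATH lookup with wildcards can be sketched in a few lines of Python. This is a simplified imitation, not how where is implemented: the function name is made up, fnmatch also accepts [seq] patterns, and the real command's handling of the current directory and PATHEXT is left out.

```python
import fnmatch
import os


def where(pattern, path=None):
    """List files matching a * / ? pattern in each directory of a PATH-style string."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    # Walk the directories in PATH order, so the first hit is the one a shell would run.
    for directory in path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            full_path = os.path.join(directory, name)
            if fnmatch.fnmatch(name, pattern) and os.path.isfile(full_path):
                matches.append(full_path)
    return matches
```

  Calling where("cmd.exe") on Windows should list each copy of cmd.exe found along PATH. When only the first hit is needed, the standard library's shutil.which does a similar lookup.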

4. whoami, hostname, winver/ver, systeminfo - all things about yourself
whoami - currently logged-in user
hostname - name of current machine
winver/ver - windows version information
systeminfo - system information

3/20/2008

System Books on Windows OS


Windows System Programming 3E

The advantages of the first book "Windows System Programming":
1. It covers broader topics, nearly all the topics you need to know as system developer
2. It has some useful performance comparisons using concrete code
3. It has some helpful comparisons with Unix counterparts

The drawbacks of this book:
1. Most topics are only skin deep, especially the DLL/Mem/Net related chapters
2. The writing style is not perfectly clear (at least to me)

In general, it's a good starting point. After reading it, you will know what to dig deeper into.


Windows Via C/C++ 5E

The second book may be the most famous Windows system programming book:
1. It covers core system programming topics in great detail: DLL/Thread/Mem etc.
2. It explains these topics clearly with useful code examples
3. BUT, its coverage is somewhat narrow: no file I/O, networking, or security
4. This book only tells you HOW; very little is said about WHY.

In general, this book should be your reference book in daily work.


Windows Internals 5E

The last book covers a broad range of OS topics in great detail. But it only tells you how and why the underlying system works. It helps you understand the system components, and it is not directly meant for programming. But as you go deeper into Windows system development, you will eventually need this book.

However, this book only shows those internals through plain text or kernel debugging tools. No OS source code is exposed, and there are no guidelines on how to modify the source code and rebuild it to see the effects. This is the main drawback of this book compared with its Linux counterparts.

3/15/2008

LiveJournal's Backend: A history of scaling

Today, LiveJournal serves 20 million+ dynamic page views per day for 1 million users. This scalable website was developed by Danga Interactive (now part of Six Apart). In late 2005, the developers released a presentation about the backend architecture of the site. The interesting part is not that it describes the architecture, but that it describes the history of the architecture: you can learn a lot from how a scalable infrastructure evolved.

The PDF version of the presentation can be found at: http://www.danga.com/words/2007_06_usenix/usenix.pdf

You can also see the Flash version here: http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling

From this presentation we can learn that the main components of a highly scalable web site (an interactive one, meaning it hosts user-created content) are:

1. Load Balancer (most likely F5 hardware such as BIG-IP, or a software reverse web proxy, a role many web servers can play, or even LVS/HA). It sits at the very front of the whole system, distributes client requests among the back-end server pool, and sends the responses back properly.

2. Web Server Farm (the easiest part to scale, and possibly diskless). These servers are CPU bound. Since web requests are processed per user, the work of a web application divides very naturally, so scaling the web logic component is easy and straightforward: "Add More Machines". But when user session data is kept in an unstable server farm (one where servers may go down), a standalone cache server is needed. If the business logic is complex and separated from the application logic, a framework such as EJB/Spring is needed, but this only happens in the business processing (mainly Java) world.

3. Distributed Cache (this lives inside your application logic and is not a content delivery network; it speeds up database access, is network I/O bound, and needs lots of memory). It should be kept apart from the web servers to give the whole system better availability. The trade-off between performance and capability is very interesting and challenging. There are many available solutions in this area, for example Oracle Coherence (formerly Tangosol), JBoss Cache, and Danga's memcached.

4. Database Partitioning/Replication/Clustering (most likely the bottleneck of your system, both I/O and CPU bound). This is the most challenging part to scale. The presentation devotes most of its content to the DB discussion. Good DB schema design (vertical/horizontal partitioning) and cluster configuration (replication/mirroring, master/slave) are critical to an extremely scalable and available web system.

5. Reliable Distributed Storage System, especially for media (photo/video) hosting service providers (RAID, NFS, or a home-brewed file system).

6. Monitoring and failover service.

Danga designed and implemented their own cache, reverse proxy, and distributed storage systems. It's so cool!

Their memcached is widely used in many very popular web sites.
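
To make the distributed cache component (item 3 above) more concrete, here is a rough Python sketch of the usual read path: look in the cache first, go to the database only on a miss, then fill the cache. Everything in it is an assumption for illustration. The class and function names are made up, plain dicts stand in for the cache servers, and it does not use memcached's real client API or hashing scheme.

```python
import zlib


class ShardedCache:
    """Spread keys over several cache servers; plain dicts stand in for the servers."""

    def __init__(self, server_names):
        self.names = sorted(server_names)
        self.servers = {name: {} for name in self.names}

    def server_for(self, key):
        # A stable hash, so every web node picks the same server for a given key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.names)
        return self.names[index]

    def get(self, key):
        return self.servers[self.server_for(key)].get(key)

    def set(self, key, value):
        self.servers[self.server_for(key)][key] = value


def load_profile(user_id, cache, query_database):
    """Try the cache, fall back to the database on a miss, then fill the cache."""
    key = "profile:%d" % user_id
    profile = cache.get(key)
    if profile is None:
        profile = query_database(user_id)  # slow path: one real database read
        cache.set(key, profile)
    return profile
```

With this shape, a second load_profile call for the same user never reaches the database. Note that plain modulo hashing moves most keys to a different server whenever a cache server is added or removed, which is one reason real deployments pick their hashing scheme carefully.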

P.S.
1. Since the web farm nodes are all diskless, they are netbooted from a redundant NFS image. This makes the server farm cheaper and more stable. (Disk failure is one of the main causes of system failure.)

2. The current database (MySQL) architecture is one global DB cluster plus nine user-specific data clusters. Within a user data cluster, the pattern is master-master (the two nodes actually mirror each other). Only the global DB cluster uses the master-slave pattern. Why not make all clusters master-slave? Because this layout helps speed up write operations. With one master (write) node and 100 slave (read) nodes, a lot of time is spent on write propagation. Meanwhile, every piece of data is duplicated on every DB node, which is unnecessary and only wastes space.
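
The partitioned layout in point 2 can be modelled with a toy Python sketch. It is only an illustration under stated assumptions: the class name is made up, dicts stand in for database nodes, and both the idea that the global cluster stores a user-to-cluster mapping and the "fewest users" placement policy are assumptions of this sketch, not details given above.

```python
class UserClusterRouter:
    """Toy model: one global lookup table plus N user clusters of two mirrored nodes."""

    def __init__(self, cluster_count=9):
        # Stand-in for the global cluster: user_id -> cluster number (assumed layout).
        self.global_map = {}
        # Each user cluster is a master-master pair, modelled as two dicts.
        self.clusters = {n: ({}, {}) for n in range(1, cluster_count + 1)}

    def cluster_for(self, user_id):
        """Look up (or assign) the cluster that owns this user's data."""
        if user_id not in self.global_map:
            # Hypothetical placement policy: pick the cluster with the fewest users.
            counts = {n: 0 for n in self.clusters}
            for n in self.global_map.values():
                counts[n] += 1
            self.global_map[user_id] = min(counts, key=lambda n: (counts[n], n))
        return self.global_map[user_id]

    def write(self, user_id, key, value):
        """Apply the write to both mirrored nodes; return how many nodes were touched."""
        nodes = self.clusters[self.cluster_for(user_id)]
        for node in nodes:
            node[(user_id, key)] = value
        return len(nodes)

    def read(self, user_id, key):
        """Read from the owning cluster (here simply from its first node)."""
        first, _second = self.clusters[self.cluster_for(user_id)]
        return first.get((user_id, key))
```

In this model a write touches two nodes, whatever the total number of clusters. In the single-master example with 100 slaves, the same write would have to be applied on 101 nodes.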

A good article about Danga's presentation can be found here: http://www.linuxjournal.com/article/7451