4/29/2009

Feed Flex with Input and Its Multiple Buffer Support

Flex is a great tool that helps implement lexers in compilers. After the lexer is generated by Flex, you (or the Yacc-generated code) can call yylex() to tell the lexer to tokenize the input source code. yylex() returns when it matches a rule whose action contains a return statement (the return value is whatever the action code returns), or when it reaches the end of the input file (the return value is 0).

But where does the lexer read the input source code from?

The auto-generated lexer uses the YY_INPUT() macro to read input data into an internal buffer and operates on this buffer directly. When the buffer is empty, YY_INPUT()'s default definition reads data from the file pointed to by the global pointer yyin into the internal buffer, and the scan operation continues. By default, yyin points to stdin.

So, if you want the lexer to scan a specific file, you can point yyin to the corresponding FILE pointer, or you can call yyrestart() to reset it indirectly.

But this is not enough. Consider the following two scenarios:

1. You want to process a language that has #include directives similar to C/C++. How can you switch to the included file immediately after recognizing the directive?

2. You want to process language source code that is stored in an in-memory data structure. How can you tell the lexer to read from that memory location?

Flex provides the so-called "multiple buffer" mechanism to solve these problems. The basic idea is that you can switch among buffers while tokenizing; the lexer just reads characters from the *current* buffer directly.

To switch to another buffer, you must create it first. You can use:

1. YY_BUFFER_STATE yy_create_buffer(FILE *file, int size) to create a buffer from a file pointer

2. yy_scan_string(const char *str) / yy_scan_bytes(const char *bytes, int len) / yy_scan_buffer(char *base, yy_size_t size) to create a buffer from in-memory data

After a buffer is created, you can switch to it using:
- yy_switch_to_buffer()

The lexer will then read from the new buffer during further scanning.

When the buffer is no longer needed, don't forget to delete it using:
- yy_delete_buffer()
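
As an illustration of scenario 2, here is a minimal sketch. It assumes the usual non-reentrant C scanner generated by Flex is compiled and linked together with this C++ driver; the "branch x y;" input string and the token-printing loop are only for demonstration.

#include <cstdio>

// Declarations normally provided by the Flex-generated scanner (C linkage).
extern "C" {
    typedef struct yy_buffer_state *YY_BUFFER_STATE;
    int yylex(void);
    YY_BUFFER_STATE yy_scan_string(const char *str);
    void yy_delete_buffer(YY_BUFFER_STATE buffer);
}

int main()
{
    // yy_scan_string() creates a buffer from the in-memory string and also
    // switches to it, so no explicit yy_switch_to_buffer() is needed here.
    YY_BUFFER_STATE buf = yy_scan_string("branch x y;");

    // yylex() returns 0 when it reaches the end of the current buffer.
    while (int token = yylex())
        std::printf("token = %d\n", token);

    yy_delete_buffer(buf);  // release the buffer when scanning is done
    return 0;
}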

Please read the Flex documentation for details about these functions.

[Reference]
http://fixunix.com/unix/526410-flex-bison-parse-command-line-arguments.html
http://flex.sourceforge.net/manual/Multiple-Input-Buffers.html
http://www.delorie.com/gnu/docs/flex/flex.1.html

4/25/2009

Unicode Support in Flex

Flex/Yacc are widely used in programming-language design and implementation projects, especially projects from the academic community. But it's a pity that many serious projects that need language-compilation functionality don't use Flex even if they use Yacc. For example, MySQL uses Yacc and a hand-written lexer to do SQL parsing. One of the reasons is that the original Flex lacks direct Unicode support, which is unacceptable for software aimed at today's flat world.

A hot discussion with useful information on this problem can be found here.

Although Flex doesn't have direct Unicode support, there are some solutions to this problem:

Solution 1 - Use UTF-8 Encoding

The basic idea is that, with UTF-8 encoding, the source code can be treated as a byte stream that is compatible with ASCII, which the current Flex handles well.

For a single Unicode character (rather than a character-range expression), just substitute it with its UTF-8 counterpart (most likely a multi-byte sequence). For a Unicode character-range expression, you may need to rewrite it as several regexes using explicit byte values (such as \x1f, \xa5).

Let's see some concrete examples:

1. Suppose you plan to use the Chinese word "转移" as the language keyword for a branch statement; you can write the rule as:
"\xe8\xbd\xac\xe7\xa7\xbb" { return KW_BRANCH; }
Here, "e8 bd ac e7 a7 bb" is the UTF-8 representation of "转移".

2. Suppose you plan to define the Unicode version of "any char" (like '.' in the ASCII version), that is, to give the characters in the range '\u0000' - '\uffff' the name UniAnyChar; you can write the definition as:

UniAnyChar [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEF][\x80-\xBF][\x80-\xBF]

To fully understand this example, you need to know how UTF-8 works and how regexes are used in Flex. Please read the following links for further information.
http://lists.gnu.org/archive/html/help-flex/2005-01/msg00030.html
http://lists.gnu.org/archive/html/help-flex/2005-01/msg00043.html

In fact, in this solution you are writing regexes against byte patterns rather than a more readable character set. You can see that if you adopt this method, the lex rules will be very hard to understand and maintain.

Solution 2 - Unicode Version Flex

There is a patch that modifies Flex to support Unicode. After applying this patch and building the binary, you can use the -U option to process a Unicode lex file, in which regexes can be written using Unicode characters directly. Here is an example of a Unicode lex file.

Here are the instructions on how to apply this patch.

But this patch is NOT an official release, and it only works with Flex version 2.5.4a. Furthermore, it is designed for the Linux world; you may need Cygwin/MinGW to apply it on the Windows platform.

Other Tool

Of course you have other choices, such as using a language with built-in Unicode support and its corresponding lex/yacc tools.

Or you can use another fantastic tool called ANTLR, which works with C/C++.

Some papers have also reported Unicode Flex projects.

NOTE:
- On the Windows platform, you can use WideCharToMultiByte()/MultiByteToWideChar() with CodePage = CP_UTF8 to do UTF-8 encoding/decoding, as in the sketch below.
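
For example, here is a minimal sketch of UTF-16 to UTF-8 conversion (assuming <windows.h> is available; the helper name ToUtf8 is just illustrative):

#include <windows.h>
#include <string>

std::string ToUtf8(const std::wstring &wide)
{
    if (wide.empty())
        return std::string();

    // First call: ask how many bytes the UTF-8 result needs.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                    NULL, 0, NULL, NULL);
    std::string utf8(bytes, '\0');

    // Second call: perform the actual conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                        &utf8[0], bytes, NULL, NULL);
    return utf8;
}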

4/19/2009

Memory Management in Native Code

Memory management is a core task in the native world; careless use of dynamic memory may cause the following problems:

- 1. Heap fragmentation, which introduces a performance penalty since it breaks data locality
- 2. Memory leaks, which are a program-correctness problem and a horrible defect for long-running software

Here I summarize some tips related to these two issues: memory optimization & memory correctness.


Part I - Mem Optimization

General Principles
1. Ensure mem layout cohesion (aka. improve data locality)
2. Avoid frequent alloc/free (i.e. batch memory operations: prefer a few large operations over many small ones)

How to implement them?
- Redesign your data structures to make them live in large blocks
- Keep unrelated data structures in separate regions
- Use a memory pool to manage memory (see the sketch below)
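
Here is a minimal sketch of a fixed-size block pool (illustrative only, not thread-safe; the class name FixedPool is made up for this example):

#include <cstddef>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : storage_(blockSize * blockCount)
    {
        // Pre-link every block into the free list: one big allocation up
        // front instead of many small ones later.
        for (std::size_t i = 0; i < blockCount; ++i)
            freeList_.push_back(&storage_[i * blockSize]);
    }

    void* Allocate()
    {
        if (freeList_.empty())
            return 0;                       // pool exhausted
        void* block = freeList_.back();
        freeList_.pop_back();
        return block;
    }

    void Free(void* block) { freeList_.push_back(block); }

private:
    std::vector<char>  storage_;   // one contiguous region: good locality
    std::vector<void*> freeList_;
};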


Part II - Mem Correctness

One of the challenges of native code development is avoiding memory leaks. It is all too easy to forget to release every memory block/object that has been allocated explicitly.

Types of Mem Leaks
1. Constant Leak: the allocated memory is simply never released
2. Casual Leak: the allocated memory is not released under certain conditions
3. One-Time Leak: the allocated memory is not released, but that line of code only gets executed once (for example, memory allocated in the ctor of a singleton object)
4. Implicit Leak: memory blocks are held too long (released too late in the application life cycle; this kind of leak happens even in GC-enabled languages such as Java/.NET, for example when unused objects are still reachable from the GC root set)

To deal with Mem Leak problem, you have two choices:
- Avoid it
- Detect & Fix it

Sec. I - How to Avoid Mem Leak?

1. Adopt Resource Acquisition Is Initialization (RAII) Mechanism in C++

std::auto_ptr is a good choice if RAII semantics are right for your problem. If you want to ensure your object/memory gets released whenever control leaves a scope (for example, through multiple exit paths or a potential exception), RAII can solve your problem.

But it transfers ownership on copy, so it can't safely be passed around by value or put into STL containers.
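
A minimal sketch of the RAII idea discussed above, using std::auto_ptr (the Resource type and the Process() function are hypothetical):

#include <memory>
#include <stdexcept>

struct Resource {
    void Use() { /* ... */ }
};

void Process(bool fail)
{
    std::auto_ptr<Resource> res(new Resource);   // owns the Resource

    if (fail)
        throw std::runtime_error("something went wrong");  // res is still freed

    res->Use();
    // res deletes the Resource automatically when it goes out of scope,
    // on the normal path and on the exception path alike.
}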

2. stack based allocation

_alloca() allocates memory from the stack rather than the heap. The memory returned is released automatically when the function returns.

But there is a risk of stack overflow, since the stack is far smaller than the heap.
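
A minimal sketch (MSVC-specific; _alloca() comes from <malloc.h>, and the PrintUpper() helper is made up for illustration):

#include <malloc.h>
#include <cctype>
#include <cstdio>
#include <cstring>

void PrintUpper(const char *text)
{
    std::size_t len = std::strlen(text) + 1;

    // Allocated on the current stack frame; freed automatically on return,
    // so there is nothing to delete/free -- but a huge len can overflow the stack.
    char *copy = static_cast<char *>(_alloca(len));
    std::memcpy(copy, text, len);

    for (char *p = copy; *p; ++p)
        *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));

    std::printf("%s\n", copy);
}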

3. Reference Counting (a.k.a. shared/smart pointers)

Use a data structure to track how many owners are referencing the memory block or object. When the reference count drops to zero, the memory/object is released.

std::tr1::shared_ptr and boost::shared_ptr are both based on the reference-counting and RAII concepts. They resolve auto_ptr's limitations: they can be put into containers and passed as parameters and return values.

But if your objects have cyclic references, this mechanism doesn't work (see the sketch at the end of this subsection). The fundamental problem behind it is that the semantics of "useless" should be defined as "Can't Be Reached", not "No One References".

Another drawback of smart-pointer-style reference counting is that it can't handle pointers that need to go into a union, because a union can't contain member fields that have user-defined ctors, non-trivial default ctors, dtors, copy ctors, etc.
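
A minimal sketch of the cyclic-reference problem (assuming Boost is available; the Node type is made up for illustration):

#include <boost/shared_ptr.hpp>
#include <cstdio>

struct Node {
    boost::shared_ptr<Node> next;                 // strong reference to the peer
    ~Node() { std::printf("Node destroyed\n"); }
};

int main()
{
    boost::shared_ptr<Node> a(new Node);
    boost::shared_ptr<Node> b(new Node);
    a->next = b;
    b->next = a;   // cycle: each node keeps the other's count above zero

    // When a and b go out of scope, the two Nodes still reference each other,
    // so neither destructor ever runs -- the memory is leaked.
    // The usual fix is to make one direction a boost::weak_ptr, which does
    // not contribute to the reference count.
    return 0;
}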

4. Garbage Collection (yes, gc for C/C++)

Most modern GCs use the Mark & Sweep algorithm. The idea is that the GC keeps a pointer list of all heap-allocated objects plus a root-set pointer list. When a collection is triggered, it traverses from the root set to find and mark all reachable objects; the unmarked objects are garbage that can be deleted. GC for C/C++ is a huge topic; [7] is a very good reference.
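
To make the idea concrete, here is a toy mark-and-sweep sketch (illustrative only; real collectors for C/C++ such as [7] are far more involved):

#include <cstddef>
#include <set>
#include <vector>

struct Object {
    bool marked;
    std::vector<Object*> children;   // outgoing references
    Object() : marked(false) {}
};

class Collector {
public:
    Object* New() { Object* o = new Object; all_.insert(o); return o; }
    void AddRoot(Object* o) { roots_.insert(o); }

    void Collect()
    {
        // Mark phase: everything reachable from the roots.
        for (std::set<Object*>::iterator it = roots_.begin(); it != roots_.end(); ++it)
            Mark(*it);

        // Sweep phase: delete whatever was not marked, then reset the marks.
        for (std::set<Object*>::iterator it = all_.begin(); it != all_.end(); ) {
            if (!(*it)->marked) { delete *it; all_.erase(it++); }
            else { (*it)->marked = false; ++it; }
        }
    }

private:
    void Mark(Object* o)
    {
        if (o->marked) return;
        o->marked = true;
        for (std::size_t i = 0; i < o->children.size(); ++i)
            Mark(o->children[i]);
    }

    std::set<Object*> all_;     // every heap-allocated object
    std::set<Object*> roots_;   // the root set
};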

General-purpose GC for C/C++ is difficult, but for your specific application requirements it may not be that challenging. According to my own GC implementation experience, the most difficult part is defining your object-ownership policy.

GC is great, but it still can't handle "semantic garbage": if you hold references to objects that you will in fact never use again, GC won't collect the memory and other resources owned by those objects.

Essentially, Memory Management is all about Consistency of Ownership.
- Each mem block should have an owner
- Each mem block should have only 1 owner
- Only the owner of the mem block is responsible for its life cycle

So, the most important design principle of C/C++ memory management is to think carefully about the ownership of each object/memory block: when it should be released, and who is responsible for releasing it.

Sec. II - How to Detect Mem Leaks?

1. Use Debug Version C RunTime library

1.1 Use _CrtDumpMemoryLeaks()

Step 1. Include the following directives in each cpp source file (stdlib.h must come before crtdbg.h so that the allocation-mapping macros work):
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

Step 2. call _CrtDumpMemoryLeaks() at the line where you want to check memory leaks.
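
Putting the two steps together, a minimal sketch (MSVC, Debug build; the deliberately leaked buffer is only there to produce a report):

#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

int main()
{
    int *leaked = (int *)malloc(10 * sizeof(int));   // never freed, on purpose
    (void)leaked;

    // Reports every block still allocated at this point; with
    // _CRTDBG_MAP_ALLOC defined, the report includes file/line information.
    _CrtDumpMemoryLeaks();
    return 0;
}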

This method has the drawback that memory objects released after the _CrtDumpMemoryLeaks() invocation will still be reported as leaked. (This happens when memory is released in a global object's dtor.) It's a false positive.

1.2 Use _CrtSetDbgFlag()

Add the following code at the entry point of your application

int nFlag = _CrtSetDbgFlag( _CRTDBG_REPORT_FLAG );
nFlag |= _CRTDBG_LEAK_CHECK_DF;
_CrtSetDbgFlag( nFlag );

This method doesn't have the drawback of 1.1, but you have no control over when the leak check is performed (it runs automatically at program exit).

1.3 Use _CrtMemState

_CrtMemState cms1, cms2, cms3;
_CrtMemCheckpoint(&cms1);

/* code to check */

_CrtMemCheckpoint(&cms2);
if (_CrtMemDifference(&cms3, &cms1, &cms2))
{
    _CrtMemDumpStatistics(&cms3);
}

This code dumps heap statistics about the changes that happened in the "code to check" section.

_CrtSetReportMode() can be used to control where this diagnostic information is output.

_crtBreakAlloc / {,,msvcrtd.dll}_crtBreakAlloc / _CrtSetBreakAlloc() can be used to break into the debugger on a specific allocation number.
For more info on the CRT memory debug routines, please see references [1] and [2].

2. Monitor Process Working Set

The Win32 API GetProcessMemoryInfo() can query the process working set size. You can use this API to check whether the working set size changes after calling some suspicious function.

It can't tell you where the memory leak happens, but it is a good way to write unit tests that catch memory leak problems (see the sketch below).
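
A minimal sketch (link with psapi.lib; SuspiciousFunction() is a placeholder for the code under test):

#include <windows.h>
#include <psapi.h>
#include <cstdio>

static SIZE_T CurrentWorkingSet()
{
    PROCESS_MEMORY_COUNTERS pmc = { 0 };
    pmc.cb = sizeof(pmc);
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    return pmc.WorkingSetSize;
}

int main()
{
    SIZE_T before = CurrentWorkingSet();

    // SuspiciousFunction();   // call the code you suspect of leaking here

    SIZE_T after = CurrentWorkingSet();
    std::printf("working set delta: %lu bytes\n", (unsigned long)(after - before));
    return 0;
}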

3. Use Professional Tools

BoundsChecker
IBM Purify
LeakTracer(Linux)
Windows Leaks Detector
Valgrind(Linux)
MemWatch
Insure++
Visual Leak Detector
User Mode Dump Heap

Part III - Other Tips

1. use "#define SAFE_DELETE(ptr) if (ptr != NULL) { delete ptr; ptr = NULL; }" to avoid redeleting the same object.
2. remember to delete objects pointed by elements in container that contains pointers.
3. pair delete/new delete[]/new[] malloc/free correctly.
4. use "new (std::nothrow)" to eliminate exceptions raising in low mem situation.


NOTE: (lessons learned from investigating this topic)

When solving hard problems, Be Sure To:
1. Use well-known idioms and well-understood mechanisms
2. Keep things as simple as possible

Memory Management:
1. It's another subsystem/component of your whole system
2. Design this component with care
3. Avoid using new/delete directly

The techniques discussed here apply not only to memory blocks, but also to any type of "resource" that needs explicit acquisition and release.

[Reference]

Mem Leak
1. Mem Leak Detection http://www.ddj.com/cpp/204800654
2. Microsoft CRT Debug Routines http://msdn.microsoft.com/en-us/library/1666sb98(VS.71).aspx
3. Microsoft CRT Debug Tech http://msdn.microsoft.com/en-us/library/zh712wwf.aspx
4. Mem Debugger List http://en.wikipedia.org/wiki/Memory_debugger
5. Purify from IBM - Use Purify for C code
6. Mem Leak in Java/.Net http://www.agiledeveloper.com/articles/MemoryLeak092002.pdf

Garbage Collection
7. C/C++ GC from HP http://www.hpl.hp.com/personal/Hans_Boehm/gc/
8. GC for C/C++ http://blog.codingnow.com/2008/06/gc_for_c.html
9. How .Net GC works, GC in OO Language, Auto GC

Understanding Mem Management
10. C++ Mem Management http://www.slideshare.net/reachanil/c-memory-management
11. Inside Mem Management http://www.ibm.com/developerworks/linux/library/l-memory/
12. C++ Memory Management: From Fear to Triumph (Part 1, Part 2, Part 3)
13. http://www.cantrip.org/wave12.html
14. Mem Optimization http://www.codingnow.com/2008/memory_management.ppt
15. Mem Mgmt 4 Sys Coder http://www.enderunix.org/simsek/articles/memory.pdf

Misc
16. C++ smart pointers http://www.onlamp.com/lpt/a/6559
17. Mem Leak Definition
18. Is Mem Leak Ever OK?

4/17/2009

Reading in Database Systems 4E

from http://redbook.cs.berkeley.edu/bib4.html

Data Models and DBMS Architecture

Michael Stonebraker Joseph M. Hellerstein. What Goes Around Comes Around.

Joseph M. Hellerstein Michael Stonebraker. Anatomy of a Database System.

Query Processing

Patricia G. Selinger Morton M. Astrahan Donald D. Chamberlin Raymond A. Lorie Thomas G. Price. Access Path Selection in a Relational Database Management System.. Proc. SIGMOD Conference, 1979, 23-34.

Leonard D. Shapiro. Join Processing in Database Systems with Large Main Memories.. ACM Trans. Database Syst., 11(3), 1986, 239-264.

David J. DeWitt Jim Gray. Parallel Database Systems: The Future of High Performance Database Systems.. Commun. ACM, 35(6), 1992, 85-98.

Goetz Graefe. Encapsulation of Parallelism in the Volcano Query Processing System.. Proc. SIGMOD Conference, 1990, 102-111.

Chris Nyberg Tom Barclay Zarka Cvetanovic Jim Gray David B. Lomet. AlphaSort: A Cache-Sensitive Parallel External Sort. VLDB J., 4(4), 1995, 603-627.

Lothar F. Mackert Guy M. Lohman. R* Optimizer Validation and Performance Evaluation for Distributed Queries.. Proc. VLDB, 1986, 149-159.

Michael Stonebraker Paul M. Aoki Witold Litwin Avi Pfeffer Adam Sah Jeff Sidell Carl Staelin Andrew Yu. Mariposa: A Wide-Area Distributed Database System. VLDB J., 5(1), 1996, 48-63.

Data Storage and Access Methods

Norbert Beckmann Hans-Peter Kriegel Ralf Schneider Bernhard Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.. Proc. SIGMOD Conference, 1990, 322-331.

Michael Stonebraker. Operating System Support for Database Management.. Commun. ACM, 24(7), 1981, 412-418.

Jim Gray Goetz Graefe. The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb.. SIGMOD Record, 26(4), 1997, 63-68.

David A. Patterson Garth A. Gibson Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID).. Proc. SIGMOD Conference, 1988, 109-116.

Transaction Management

Jim Gray Raymond A. Lorie Gianfranco R. Putzolu Irving L. Traiger. Granularity of Locks and Degrees of Consistency in a Shared Data Base.. IBM, September, 1975.

H. T. Kung John T. Robinson. On Optimistic Methods for Concurrency Control.. Proc. VLDB, 1979, 351.

Rakesh Agrawal Michael J. Carey Miron Livny. Concurrency Control Performance Modeling: Alternatives and Implications.. ACM Trans. Database Syst., 12(4), 1987, 609-654.

Philip L. Lehman S. Bing Yao. Efficient Locking for Concurrent Operations on B-Trees.. ACM Trans. Database Syst., 6(4), 1981, 650-670.

C. Mohan Donald J. Haderle Bruce G. Lindsay Hamid Pirahesh Peter M. Schwarz. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging.. ACM Trans. Database Syst., 17(1), 1992, 94-162.

C. Mohan Bruce G. Lindsay Ron Obermarck. Transaction Management in the R* Distributed Database Management System.. ACM Trans. Database Syst., 11(4), 1986, 378-396.

Jim Gray Pat Helland Patrick E. O'Neil Dennis Shasha. The Dangers of Replication and a Solution.. Proc. SIGMOD Conference, 1996, 173-182.

Extensible Systems

Michael Stonebraker. Inclusion of New Types in Relational Data Base Systems.. Proc. ICDE, 1986, 262-269.

Joseph M. Hellerstein Jeffrey F. Naughton Avi Pfeffer. Generalized Search Trees for Database Systems.. Proc. VLDB, 1995, 562-573.

Guy M. Lohman. Grammar-like Functional Rules for Representing Query Optimization Alternatives.. Proc. SIGMOD Conference, 1988, 18-27.

Database Evolution

Surajit Chaudhuri Vivek R. Narasayya. AutoAdmin 'What-if' Index Analysis Utility.. Proc. SIGMOD Conference, 1998, 367-378.

Philip A. Bernstein. Applying Model Management to Classical Meta Data Problems.. Proc. CIDR, 2003.

C. Mohan Inderpal Narang. Algorithms for Creating Indexes for Very Large Tables Without Quiescing Updates.. Proc. SIGMOD Conference, 1992, 361-370.

Data Warehousing

Surajit Chaudhuri Umeshwar Dayal. An Overview of Data Warehousing and OLAP Technology.. SIGMOD Record, 26(1), 1997, 65-74.

Patrick E. O'Neil Dallan Quass. Improved Query Performance with Variant Indexes.. Proc. SIGMOD Conference, 1997, 38-49.

Jim Gray Surajit Chaudhuri Adam Bosworth Andrew Layman Don Reichart Murali Venkatrao Frank Pellow Hamid Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals.. Data Min. Knowl. Discov., 1(1), 1997, 29-53.

Yihong Zhao Prasad Deshpande Jeffrey F. Naughton. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates.. Proc. SIGMOD Conference, 1997, 159-170.

Stefano Ceri Jennifer Widom. Deriving Production Rules for Constraint Maintainance.. Proc. VLDB, 1990, 566-577.

Joseph M. Hellerstein Ron Avnur Vijayshankar Raman. Informix under CONTROL: Online Query Processing.. Data Min. Knowl. Discov., 4(4), 2000, 281-314.

Yannis Kotidis Nick Roussopoulos. DynaMat: A Dynamic View Management System for Data Warehouses.. Proc. SIGMOD Conference, 1999, 371-382.

Data Mining

Tian Zhang Raghu Ramakrishnan Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases.. Proc. SIGMOD Conference, 1996, 103-114.

John C. Shafer Rakesh Agrawal Manish Mehta. SPRINT: A Scalable Parallel Classifier for Data Mining. Proc. VLDB, 1996, 544-555.

Rakesh Agrawal Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases.. Proc. VLDB, 1994, 487-499.

Surajit Chaudhuri Vivek R. Narasayya Sunita Sarawagi. Efficient Evaluation of Queries with Mining Predicates.. Proc. ICDE, 2002, 529-.

Web Services and Databases

Eric A. Brewer. Combining Systems and Databases: A Search Engine Retrospective.

Sergey Brin Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine.. Computer Networks, 30(1-7), 1998, 107-117.

Sergej Sizov Martin Theobald Stefan Siersdorfer Gerhard Weikum Jens Graupmann Michael Biwer Patrick Zimmer. The BINGO! System for Information Portal Generation and Expert Web Search.. Proc. CIDR, 2003.

Dean Jacobs. Data Management in Application Servers.

Serge Abiteboul. Querying Semi-Structured Data.. Proc. ICDT, 1997, 1-18.

Roy Goldman Jennifer Widom. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases.. Proc. VLDB, 1997, 436-445.

Jianjun Chen David J. DeWitt Feng Tian Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases.. Proc. SIGMOD Conference, 2000, 379-390.

Stream-Based Data Management

Eric N. Hanson Chris Carnes Lan Huang Mohan Konyala Lloyd Noronha Sashi Parthasarathy J. B. Park Albert Vernon. Scalable Trigger Processing.. Proc. ICDE, 1999, 266-275.

Praveen Seshadri Miron Livny Raghu Ramakrishnan. The Design and Implementation of a Sequence Database System.. Proc. VLDB, 1996, 99-110.

Ron Avnur Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. Proc. SIGMOD Conference, 2000, 261-272.

Donald Carney Ugur Çetintemel Mitch Cherniack Christian Convey Sangdon Lee Greg Seidman Michael Stonebraker Nesime Tatbul Stanley B. Zdonik. Monitoring Streams - A New Class of Data Management Applications.. Proc. VLDB, 2002, 215-226.

4/14/2009

Thread Model for Multi-threading Programming

The meaning of Thread Model (a.k.a. Thread Paradigm or Thread Design Pattern) varies with context; it may refer to:
- How threads are implemented and their relationship with processes in the OS
- How multi-threading is used to implement client/server based systems
- How multiple threads work together to accomplish some large-scale & complicated task

In this post, Thread Model means how to arrange multiple threads to work together and get assigned tasks done.

1. The Models

Master/Slave (a.k.a Boss/Worker or Master/Worker)
The master divides the whole task into small sub-tasks (a.k.a. work items), assigns them to the worker threads, and then waits/polls until all the work items are done. Thread Pool and Work Queue are variants of this model.

This model is best suited for asynchronous I/O frameworks, such as scalable network services or work-queue libraries. The boss thread receives client requests and puts them into a queue; each worker thread fetches a request, serves it, and continues the loop.

Pipeline
The pipeline model divides a task into a series of steps and passes the results of one step to the thread processing the next. Each thread does one thing to each piece of data and passes the results to the next thread in line. Usually, all stages are active at the same time, so the whole system's throughput is limited by the slowest stage. It is very similar to an assembly line in the manufacturing industry.

This model is suitable for stream data processing such as 3D graphics rendering.

Work Crew (a.k.a Peer Model)
All threads are of the same type; each processes a specific portion of the data but performs the same operations. These threads may need to communicate in some predefined way and cooperate in the remaining steps (for example, an SMP version of merge sort).

The peer model is suitable for applications that have a fixed or well-defined set of inputs (classical HPC applications), such as matrix multipliers, parallel database search engines, etc.

Producer/Consumer
It's just a special (and simple) case of the Pipeline model, or a kind of Master/Slave model, depending on your perspective. In this model, the producer writes data into a buffer and the consumer reads data from that buffer. If the data is a work-item description, it is a special kind of Master/Slave model; if it's pure data that needs further manipulation, it is a kind of Pipeline model. A sketch follows below.
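
As a minimal sketch of the Producer/Consumer model (assuming POSIX threads; the fixed item count of 10 and the int work items are only for illustration):

#include <pthread.h>
#include <queue>
#include <cstdio>

static std::queue<int> g_buffer;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_notEmpty = PTHREAD_COND_INITIALIZER;

void* Producer(void*)
{
    for (int i = 0; i < 10; ++i) {
        pthread_mutex_lock(&g_lock);
        g_buffer.push(i);                    // produce one work item
        pthread_cond_signal(&g_notEmpty);    // wake a waiting consumer
        pthread_mutex_unlock(&g_lock);
    }
    return 0;
}

void* Consumer(void*)
{
    for (int consumed = 0; consumed < 10; ++consumed) {
        pthread_mutex_lock(&g_lock);
        while (g_buffer.empty())             // wait until data is available
            pthread_cond_wait(&g_notEmpty, &g_lock);
        int item = g_buffer.front();
        g_buffer.pop();
        pthread_mutex_unlock(&g_lock);
        std::printf("consumed %d\n", item);
    }
    return 0;
}

int main()
{
    pthread_t p, c;
    pthread_create(&p, 0, Producer, 0);
    pthread_create(&c, 0, Consumer, 0);
    pthread_join(p, 0);
    pthread_join(c, 0);
    return 0;
}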

2. Case Studies

WPF thread model
http://msdn.microsoft.com/en-us/library/ms741870.aspx

COM thread model
http://blogs.msdn.com/oldnewthing/archive/2004/06/02/146671.aspx
http://blogs.msdn.com/larryosterman/archive/2004/04/28/122240.aspx

Apache Axis2/C
http://wso2.org/library/articles/understanding-axis2-c-threading-model

Language Thread Model Overview(C#/Python/Perl/Ruby)
http://justin.harmonize.fm/index.php/2008/09/threading-model-overview/

[Reference]
1. Thread Design Pattern, http://www.cercs.gatech.edu/multicore/images/f/f0/MultithreadingModels.ppt
2. Concurrent Programming with Threads, http://www.gridbus.org/~raj/tut/multi-threading.ppt
3. Perl Thread Doc http://perldoc.perl.org/perlthrtut.html
4. HPC thread model, http://software.intel.com/en-us/articles/threading-models-for-high-performance-computing-pthreads-or-openmp/
5. The Thread Model, http://www.informit.com/articles/article.aspx?p=169479&seqNum=5
6. Pthread Tutorial, http://cs.gmu.edu/~white/CS571/pthreadTutorial.pdf
7. Another Pthread Tutorial, http://randu.org/tutorials/threads/
8. Parallel and Distributed Programming Using C++, http://www.informit.com/store/product.aspx?isbn=0131013769

4/09/2009

Compiler Books - the Dragon, Whale and Tiger one

The Dragon book focuses more on the front end (especially the theory of language parsing); it's suitable as a college textbook.

The Whale book focuses more on back-end optimization; it's strongly recommended for industrial compiler developers.


The Tiger book balances theory and implementation. It uses one simple but concrete language compiler as a running example throughout the whole book, and it covers nearly all topics about compilers and languages at proper depth. If you just want to read one compiler book, this one is for you!

If you are doing language implementation, you may need another practical book about compilers, titled "Yacc & Lex".

With these books in hand, you definitely won't face any technical (domain-expertise) barriers to designing and implementing a programming language!

But in fact, reading and understanding these books is not the most difficult part of compiler implementation. According to my compiler front-end implementation experience, the real challenge is understanding the detailed semantics of the language you are going to implement.

Another challenge lies in engineering excellence. Compilers are large and complicated pieces of software; engineering skills such as design patterns, good coding style, and efficient memory management are all critical factors in the success of a compiler implementation.

4/07/2009

The Zen of Results

“Absorb what is useful, reject what is useless, add what is specifically your own.”
– Bruce Lee

Sources Of Insight has a great article titled as The Zen of Results, which talks about daily personal management tips.

The basic practices it proposes are simply:

Monday Vision
– Each Monday, identify the most important outcomes for the week.
– Take the time to see the forest from the trees.

Daily Outcomes
– Each day, make a short To Do list (toward Outcomes)
– Title it by date (e.g. 01-29-08).
– Start by listing your MUST items. Next, list your SHOULD or COULD.
– Use this list throughout your day, as you fish your various streams for action (Your streams include meetings, email, conversations, or bursts of brilliance throughout the day.)

Friday Reflection
– Evaluate what you got done, or didn't, and why.
– Identify 3 things that are working well.
– Identify 3 things that are not working well.
– Carry your lessons forward to your Monday Vision.

It also summarized its key concepts as:

– Carve out time for what’s important!
– Outcomes guide your action.
– Keep your outcomes scannable at a glance.
– Organize outcomes by work projects, personal projects, and life frame.
– Balance is your friend.
– Prioritize incoming action items against your outcomes.
– Triage incoming action items to either do, queue, schedule or delegate it.
– Use daily tickler lists for action items.
– Schedule items that will take time.
– Think in terms of value delivered (to yourself or others).
– Windows of opportunity matter!
– If you fall off the horse, you can get back on.
– It’s question-driven.
– Principles, patterns and practices over tool-driven.

It has detailed suggestions on how to apply these practices in daily life, and a free ebook is available here.

I use some of these practices in my daily life and they work well. Hope they work for you as well.