4/29/2009

Feed Flex with Input and Its Multiple Buffer Support

Flex is a great tool for implementing the lexer in a compiler. Once Flex has generated the lexer, you (or Yacc-generated code) can call yylex() to tell the lexer to tokenize the input source code. yylex() returns when it matches a rule whose action contains a return statement (the return value is whatever the action code returns), or when it reaches the end of the input file (the return value is 0).
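As a minimal sketch of this calling convention, the Flex specification below has two rules that return a token code and two that do not. The token names TOK_NUMBER and TOK_IDENT are hypothetical, made up for this illustration; a Yacc-generated parser would normally supply its own.

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical token codes for this sketch only. */
enum { TOK_NUMBER = 1, TOK_IDENT = 2 };
%}
%%
[0-9]+                  { return TOK_NUMBER; }
[A-Za-z_][A-Za-z0-9_]*  { return TOK_IDENT; }
[ \t\r\n]+              { /* no return: skip whitespace and keep scanning */ }
.                       { /* no return: ignore any other character */ }
%%
int main(void)
{
    int tok;

    /* yylex() returns the value from a matching action, or 0 at end of input. */
    while ((tok = yylex()) != 0)
        printf("token %d: %s\n", tok, yytext);
    return 0;
}
```

Assuming the file is saved as lexer.l, it can be built with "flex lexer.l" followed by "cc lex.yy.c -o lexer"; the program then reads from stdin.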

But where does the lexer read the input source code from?

The auto-generated lexer uses the YY_INPUT() macro to read input data into an internal buffer, and it operates on this buffer directly. When the buffer is exhausted, the default definition of YY_INPUT() reads more data from the file pointed to by the global pointer yyin into the internal buffer, and scanning continues. By default, yyin points to stdin.

So, if you want the lexer to scan a specific file, you can point yyin at the corresponding FILE pointer before the first call to yylex(), or you can call yyrestart() with that FILE pointer, which resets yyin for you.
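The following sketch shows one way to do this: it scans every file named on the command line, calling yyrestart() for each one. The single word-printing rule is only a placeholder for a real grammar.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
[A-Za-z]+   { printf("word: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%
int main(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        FILE *fp = fopen(argv[i], "r");
        if (fp == NULL) {
            perror(argv[i]);
            continue;
        }
        /* Point the lexer at fp; yyrestart() also sets yyin. */
        yyrestart(fp);
        /* No rule returns a value, so yylex() runs until end of file. */
        yylex();
        fclose(fp);
    }
    return 0;
}
```

Because no action returns, each yylex() call consumes a whole file and returns 0 at its end, after which the loop moves on to the next file.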

But this is not enough. Consider the following two scenarios:

1. You want to process a language that has #include directives similar to those in C/C++. How can you switch to the included file immediately after recognizing the directive?

2. You want to process source code that is stored in an in-memory data structure. How can you tell the lexer to read from that memory location?

Flex provides a so-called "multiple buffer" mechanism to solve these problems. The basic idea is that you can switch among buffers while you are tokenizing. The lexer simply reads characters from the *current* buffer.

To switch to another buffer, you must create it first. You can use:

1. YY_BUFFER_STATE yy_create_buffer( FILE *file, int size ) to create a buffer from a file pointer

2. yy_scan_string(const char *str), yy_scan_bytes(const char *bytes, int len), or yy_scan_buffer(char *base, yy_size_t size) to create a buffer from in-memory data
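For the in-memory case, a minimal sketch might look like the following. The string "abc 123 def" and the two printing rules are made up for illustration. Note that the yy_scan_*() functions both create the buffer and make it the current one, and that the returned handle should be released with yy_delete_buffer() when scanning is finished.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
[0-9]+      { printf("number: %s\n", yytext); }
[a-z]+      { printf("word: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%
int main(void)
{
    /* Hypothetical input held in memory instead of in a file. */
    const char *source = "abc 123 def";

    /* Copies the string into a new buffer and switches to it. */
    YY_BUFFER_STATE buf = yy_scan_string(source);

    /* No rule returns a value, so this scans the whole string. */
    yylex();

    /* Release the buffer once it is no longer needed. */
    yy_delete_buffer(buf);
    return 0;
}
```

Running the resulting program prints "word: abc", "number: 123", and "word: def" on separate lines. yy_scan_string() works on a copy of the string; yy_scan_buffer() scans in place instead, and requires the last two bytes of the buffer to be NUL.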

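After the buffer is created, you can switch to it using:
- yy_switch_to_buffer()

(The yy_scan_*() functions already switch to the buffer they create, so an explicit switch is needed only after yy_create_buffer().)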

The lexer will then read from the new buffer as scanning continues.

When a buffer is no longer needed, don't forget to delete it using:
- yy_delete_buffer()
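Putting the three calls together, the #include scenario can be sketched roughly as below. This is an illustration modeled on the approach in the Flex manual, not a complete preprocessor: the start condition INCL, the arrays saved_buffers and open_files, and the limit MAX_INCLUDE_DEPTH are all names invented for this example, and only the form #include "file" at the start of a line is recognized.

```lex
%option noyywrap
%x INCL
%{
#include <stdio.h>
#include <stdlib.h>

#define MAX_INCLUDE_DEPTH 10   /* arbitrary limit for this sketch */

static YY_BUFFER_STATE saved_buffers[MAX_INCLUDE_DEPTH]; /* buffers to return to */
static FILE *open_files[MAX_INCLUDE_DEPTH];              /* files opened for includes */
static int depth = 0;
%}
%%
^"#include"[ \t]*\"   { BEGIN(INCL); }
.|\n                  { ECHO; }

<INCL>[^"\n]+\"       {
        FILE *fp;

        yytext[yyleng - 1] = '\0';          /* drop the closing quote */
        if (depth >= MAX_INCLUDE_DEPTH) {
            fprintf(stderr, "includes nested too deeply\n");
            exit(1);
        }
        fp = fopen(yytext, "r");
        if (fp == NULL) {
            perror(yytext);
            exit(1);
        }
        /* Remember where we were, then create and switch to a new buffer. */
        saved_buffers[depth] = YY_CURRENT_BUFFER;
        open_files[depth] = fp;
        depth++;
        yy_switch_to_buffer(yy_create_buffer(fp, YY_BUF_SIZE));
        BEGIN(INITIAL);
    }
<INCL>.|\n            { fprintf(stderr, "malformed #include\n"); BEGIN(INITIAL); }

<<EOF>>               {
        if (depth == 0)
            yyterminate();                  /* end of the outermost input */
        /* End of an included file: delete its buffer and resume the outer one. */
        depth--;
        yy_delete_buffer(YY_CURRENT_BUFFER);
        fclose(open_files[depth]);
        yy_switch_to_buffer(saved_buffers[depth]);
    }
%%
int main(void)
{
    yylex();    /* reads stdin and echoes it with includes expanded */
    return 0;
}
```

The FILE pointers are tracked separately because yy_delete_buffer() frees only the buffer and does not close the underlying file. Recent Flex versions also provide yypush_buffer_state() and yypop_buffer_state(), which maintain such a stack internally.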

Please read the Flex documentation for details about these functions.

[Reference]
http://fixunix.com/unix/526410-flex-bison-parse-command-line-arguments.html
http://flex.sourceforge.net/manual/Multiple-Input-Buffers.html
http://www.delorie.com/gnu/docs/flex/flex.1.html
