TkGen - A Lexical Analyzer Generator

What is tkgen?

TkGen is a lexical analyzer generator (also known as scanner generator) for C++, written in C++.

Input format

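Basically, the input file is a list of token names and regular expressions: each token name sits on its own line, followed by its regular expression on the next line. Input file sample:

{{{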
NUMBER
[0-9]+(\.[0-9]+)?
SYMBOL
[a-zA-Z]+([0-9]+[a-zA-Z]+)?
BLANKS
[\ \n]+
OPEN
\(
CLOSE
\)
MULTI
\*
DIV
\/
PLUS
\+
MINUS
\-
}}}

The output is a C++ header file containing the transition information of a DFA (deterministic finite automaton).

The generated file is combined with two template classes to create the final scanner, which recognizes tokens from a source of characters such as a file or a string.
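
As a rough illustration of how DFA transition data can drive a scanner, the sketch below hand-codes a table for the NUMBER rule only and walks it with a longest-match loop. The table layout and the names `kTransitions`, `kAccepting`, `CharClass`, and `MatchNumber` are assumptions made for this example; the header actually generated by TkGen may look quite different.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, hand-written DFA for NUMBER: [0-9]+(\.[0-9]+)?
// This is not TkGen output; it only illustrates the idea.
namespace sketch {

const int kDead = -1;

// Columns: 0 = digit, 1 = '.', 2 = any other character.
const int kTransitions[4][3] = {
    /* 0: start         */ { 1, kDead, kDead },
    /* 1: integer part  */ { 1, 2,     kDead },
    /* 2: just saw '.'  */ { 3, kDead, kDead },
    /* 3: fraction part */ { 3, kDead, kDead },
};

// States 1 and 3 are accepting: a complete NUMBER ends there.
const bool kAccepting[4] = { false, true, false, true };

inline int CharClass(char c)
{
    if (c >= '0' && c <= '9') return 0;
    if (c == '.') return 1;
    return 2;
}

// Returns the length of the longest NUMBER starting at text[pos],
// or 0 if no NUMBER starts there.
inline std::size_t MatchNumber(const std::string& text, std::size_t pos)
{
    int state = 0;
    std::size_t lastAccept = 0;
    for (std::size_t i = pos; i < text.size(); ++i) {
        state = kTransitions[state][CharClass(text[i])];
        if (state == kDead)
            break;                       // no transition: stop scanning
        if (kAccepting[state])
            lastAccept = i - pos + 1;    // remember the longest match so far
    }
    return lastAccept;
}

} // namespace sketch
```

With this table, `sketch::MatchNumber("12.5+3", 0)` returns 4, and `sketch::MatchNumber("12.", 0)` returns 2 because the trailing dot never reaches an accepting state.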

===Try tkgen online===

[[http://www.thradams.com/webtkgen.aspx]]

===How to use the generated code?===

To create a tokenizer you will need two more classes:

* TokenizerStream
* Tokenizer

Both can be found at [[tkgencode.htm|Tokenizer and InputStream]].

Complete sample:

{{{cpp
#include "stdafx.h"
#include <iostream> // std::wcout
#include <fstream>  // std::wifstream

// Download it from http://www.thradams.com/codeblog/tkgencode.htm
#include "tokenizer.h"

// Generated by tkgen: copy it from the online tkgen and paste it into your file.
#include "statemachine.h"

int _tmain(int argc, _TCHAR* argv[])
{
    if (argc < 2)
        return 1; // no input file given

    std::wifstream ss(argv[1]);
    FileTokenizerStream<wchar_t> fileStream(ss);
    // Assumption: Tokenizer is parameterized on the stream type.
    Tokenizer<FileTokenizerStream<wchar_t> > tk(fileStream);

    std::wstring lexeme;
    Tokens token;
    // Print each token name together with the text it matched.
    while (tk.NextToken(lexeme, token))
    {
        std::wcout << TokensToString(token) << L": '" << lexeme << L"'" << std::endl;
    }
    return 0;
}
}}}

Input file details

TkGen accepts the following regex syntax:

{{{
?   : optional
+   : one or more
*   : zero or more
.   : any char
[]  : or-groups
\   : escape
0-9 : range inside groups
(Note: ^ is not yet supported)
}}}
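
To get a feel for these operators, the sketch below checks a few patterns with `std::regex` from the C++ standard library. This is only a stand-in: TkGen has its own matcher, and `std::regex` (ECMAScript grammar) accepts more syntax than listed above, so the example sticks to the operators TkGen documents. `FullMatch` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Illustration only: std::regex stands in for TkGen's own matcher.
// Returns true when the whole text matches the pattern.
inline bool FullMatch(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// Examples (backslashes are doubled because of C++ string literals):
//   FullMatch("[0-9]+(\\.[0-9]+)?", "3.14")  -> true   (+, ?, escape, range)
//   FullMatch("[0-9]+(\\.[0-9]+)?", "3.")    -> false  (fraction needs digits)
//   FullMatch("a.c", "abc")                  -> true   (. matches any char)
//   FullMatch("ab*", "a")                    -> true   (* allows zero b's)
```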

===Download sample===

[[tkgensample1.zip]]

===References===

===Acknowledgments===

===History===