TkGen is a lexical analyzer generator (also known as a scanner generator) for C++, written in C++.
Basically, the input file is a list of token names, each followed on the next line by its regular expression. Input file sample:
{{{
NUMBER
[0-9]+(\.[0-9]+)?
SYMBOL
[a-zA-Z]+([0-9]+[a-zA-Z]+)?
BLANKS
[\ \n]+
OPEN
\(
CLOSE
\)
MULTI
\*
DIV
\/
PLUS
\+
MINUS
\-
}}}
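The format is simple enough to read with a few lines of code. The sketch below is a hypothetical reader that collects the name/regex pairs in file order. `TokenRule` and `ReadRules` are illustrative names, not part of tkgen. Skipping blank lines and stripping a trailing carriage return are extra assumptions, not documented tkgen behavior.

{{{cpp
#include <istream>
#include <string>
#include <vector>

// Hypothetical representation of one rule of the input file.
struct TokenRule {
    std::string name;   // token name, e.g. NUMBER
    std::string regex;  // regular expression, e.g. [0-9]+(\.[0-9]+)?
};

// Reads alternating name/regex lines in file order.
// Assumptions (not taken from tkgen): blank lines are skipped, a trailing
// carriage return is removed, and a final name without a regex is ignored.
std::vector<TokenRule> ReadRules(std::istream& in) {
    std::vector<TokenRule> rules;
    TokenRule current;
    bool haveName = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (!haveName) {
            current.name = line;      // first line of a pair: token name
            haveName = true;
        } else {
            current.regex = line;     // second line of a pair: its regular expression
            rules.push_back(current);
            haveName = false;
        }
    }
    return rules;
}
}}}

Applied to the sample above, this reader would return nine rules, from NUMBER to MINUS.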
The output is a C++ header file containing the transition information of a DFA (deterministic finite automaton).
The generated file is combined with two template classes to create the final scanner, which recognizes tokens from a source of characters such as a file or a string.
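The following hand-written sketch illustrates what "DFA transition information" means. It is a table-driven scanner for part of the sample: NUMBER, BLANKS, and the operator and parenthesis tokens (SYMBOL is omitted for brevity). It is *not* the format tkgen generates. `TokenKind`, `kTransitions`, `kAccept` and `NextToken` are hypothetical names. The longest-match ("maximal munch") rule is the usual scanner convention and is assumed here, not taken from tkgen's documentation.

{{{cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical token kinds for a subset of the sample (SYMBOL omitted).
enum class TokenKind { None, Number, Blanks, Open, Close, Plus, Minus, Multi, Div };

// Characters are grouped into classes so the table stays small.
enum CharClass { kDigit, kDot, kBlank, kOpen, kClose, kPlus, kMinus, kMulti, kDiv, kOther, kNumClasses };

int Classify(char c) {
    if (c >= '0' && c <= '9') return kDigit;
    switch (c) {
        case '.': return kDot;
        case ' ': case '\n': return kBlank;   // BLANKS: [\ \n]+
        case '(': return kOpen;
        case ')': return kClose;
        case '+': return kPlus;
        case '-': return kMinus;
        case '*': return kMulti;
        case '/': return kDiv;
        default: return kOther;
    }
}

constexpr int kNumStates = 11;
constexpr int kDead = -1;  // no transition: the current token cannot be extended
using Table = std::array<std::array<int, kNumClasses>, kNumStates>;

// Transition table; every transition not set here goes to kDead.
Table BuildTable() {
    Table t;
    for (auto& row : t) row.fill(kDead);
    t[0][kDigit] = 1; t[0][kBlank] = 4; t[0][kOpen] = 5;  t[0][kClose] = 6;
    t[0][kPlus] = 7;  t[0][kMinus] = 8; t[0][kMulti] = 9; t[0][kDiv] = 10;
    t[1][kDigit] = 1; t[1][kDot] = 2;   // [0-9]+ then an optional fraction
    t[2][kDigit] = 3;                   // a digit is required after the dot
    t[3][kDigit] = 3;                   // [0-9]+ after the dot
    t[4][kBlank] = 4;                   // [\ \n]+
    return t;
}
const Table kTransitions = BuildTable();

// Token recognized in each state (None means the state is not accepting).
const TokenKind kAccept[kNumStates] = {
    TokenKind::None,   TokenKind::Number, TokenKind::None,  TokenKind::Number,
    TokenKind::Blanks, TokenKind::Open,   TokenKind::Close, TokenKind::Plus,
    TokenKind::Minus,  TokenKind::Multi,  TokenKind::Div};

// Longest-match scan starting at pos. On success stores the lexeme and kind,
// advances pos past the lexeme and returns true. Returns false at end of input
// or when no token matches at pos (lexical error).
bool NextToken(const std::string& src, std::size_t& pos, std::string& lexeme, TokenKind& kind) {
    int state = 0;
    std::size_t lastEnd = pos;
    TokenKind lastKind = TokenKind::None;
    for (std::size_t i = pos; i < src.size(); ++i) {
        int next = kTransitions[state][Classify(src[i])];
        if (next == kDead) break;                   // cannot extend the token further
        state = next;
        if (kAccept[state] != TokenKind::None) {    // remember the last accepting point
            lastKind = kAccept[state];
            lastEnd = i + 1;
        }
    }
    if (lastKind == TokenKind::None) return false;
    lexeme = src.substr(pos, lastEnd - pos);
    kind = lastKind;
    pos = lastEnd;
    return true;
}
}}}

With longest match, `12.` is scanned as NUMBER `12` and scanning then stops at the lone `.`, because the scanner falls back to the last accepting position.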
===Try tkgen online===
[[http://www.thradams.com/webtkgen.aspx]]
===How to use the generated code?===
To create a tokenizer you will need two more classes:
  * TokenizerStream
  * Tokenizer
Both can be found on the [[tkgencode.htm|Tokenizer and InputStream]] page.
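Complete sample:
{{{cpp
// Include the Tokenizer and TokenizerStream classes here;
// download them from http://www.thradams.com/codeblog/tkgencode.htm
// Include the header generated by tkgen here:
// copy it from the online tkgen and paste it into your file.
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        std::wcerr << L"usage: pass the input file name as the first argument" << std::endl;
        return 1;
    }
    std::wifstream ss(argv[1]);
    FileTokenizerStream<wchar_t> fileStream(ss);
    // Assumed declaration (the original line was truncated); check tkgencode.htm for the exact template arguments.
    Tokenizer<FileTokenizerStream<wchar_t>> tk(fileStream);
    std::wstring lexeme;
    Tokens token; // Tokens and TokensToString come from the generated header
    while (tk.NextToken(lexeme, token))
    {
        std::wcout << TokensToString(token) << L": '" << lexeme << L"'" << std::endl;
    }
}
}}}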
===Input file details===
TkGen accepts the following regular expression syntax:
{{{
?   : optional
+   : one or more
*   : zero or more
.   : any char
[]  : or-groups
\   : escape
0-9 : range inside groups
()  : grouping (as used in the NUMBER and SYMBOL samples)
}}}
Note: ^ is not yet supported.
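A quick way to sanity-check a pattern before feeding it to tkgen is to try it with `std::regex`, whose default ECMAScript grammar shares the operators listed above. This is a swapped-in engine, not tkgen itself, so results only carry over for that common subset. For example, ECMAScript `.` does not match line terminators, which may differ from tkgen's "any char". `WholeMatch` is a hypothetical helper name.

{{{cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern` using
// std::regex's default ECMAScript grammar (a different engine from tkgen's).
// Throws std::regex_error if `pattern` is not valid ECMAScript syntax.
bool WholeMatch(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
}}}

Backslashes must be doubled inside ordinary C++ string literals. For example, `WholeMatch("[0-9]+(\\.[0-9]+)?", "3.14")` is true, while `WholeMatch("[0-9]+(\\.[0-9]+)?", "3.")` is false because the NUMBER rule requires a digit after the dot.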
===Download sample===
[[tkgensample1.zip]]