==TkGen - A Lexical Analyzer Generator==
===What is tkgen?===
TkGen is a lexical analyzer generator (also known as a scanner generator) for C++, written in C++.
===Input format===
Basically, the input file is a list of token names and regular expressions. Input file sample:
{{{
NUMBER
SYMBOL
BLANKS
OPEN    \(
CLOSE   \)
MULTI   \*
DIV     \/
PLUS    \+
MINUS   \-
}}}
The output is a C++ header file containing the DFA (deterministic finite automaton) transition information.
The generated file is combined with two template classes to create the final scanner, which recognizes tokens from a character source such as a file or a string.
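To illustrate what "DFA transition information" means in practice, here is a minimal hand-written sketch of a table-driven scanner for part of the sample above. It is not tkgen's generated format: the state numbering and the `Tok`, `Transition`, `Accepting`, `NextToken` and `Tokenize` names are hypothetical, NUMBER is assumed to be `[0-9]+` (its expression is not given above), and blanks are skipped instead of being returned as a BLANKS token. The scanner follows the longest-match rule: it keeps stepping the DFA while transitions exist and returns the token of the last accepting state it passed through.

```cpp
// Hypothetical sketch of a table-driven scanner; NOT tkgen's generated code.
// Assumptions: NUMBER is [0-9]+, and blanks are skipped instead of being
// returned as a BLANKS token.
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Tok { None, Number, Plus, Minus, Multi, Div, Open, Close };

// DFA transition function. State 0 is the start state, state 1 is "inside a
// NUMBER", states 2..7 are reached after a single operator character.
// Returns -1 when there is no transition.
int Transition(int state, char c)
{
    const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    if (state == 0)
    {
        if (digit) return 1;
        switch (c)
        {
        case '+': return 2;
        case '-': return 3;
        case '*': return 4;
        case '/': return 5;
        case '(': return 6;
        case ')': return 7;
        default:  return -1;
        }
    }
    if (state == 1 && digit) return 1;  // keep consuming digits
    return -1;                          // operator states have no outgoing edges
}

// Token recognized when the DFA is in a given state (None = not accepting).
Tok Accepting(int state)
{
    static const Tok kAccept[] = { Tok::None, Tok::Number, Tok::Plus, Tok::Minus,
                                   Tok::Multi, Tok::Div, Tok::Open, Tok::Close };
    return (state >= 0 && state < 8) ? kAccept[state] : Tok::None;
}

// Longest-match scan starting at pos. Returns false at end of input or when
// no token can be recognized at pos.
bool NextToken(const std::string& src, std::size_t& pos, std::string& lexeme, Tok& token)
{
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos])))
        ++pos;
    int state = 0;
    std::size_t i = pos, lastEnd = pos;
    Tok lastTok = Tok::None;
    while (i < src.size())
    {
        state = Transition(state, src[i]);
        if (state < 0) break;
        ++i;
        if (Accepting(state) != Tok::None) { lastTok = Accepting(state); lastEnd = i; }
    }
    if (lastTok == Tok::None) return false;
    lexeme = src.substr(pos, lastEnd - pos);
    token = lastTok;
    pos = lastEnd;
    return true;
}

// Collects every (lexeme, token) pair until the end of input or the first error.
std::vector<std::pair<std::string, Tok>> Tokenize(const std::string& src)
{
    std::vector<std::pair<std::string, Tok>> tokens;
    std::size_t pos = 0;
    std::string lexeme;
    Tok token = Tok::None;
    while (NextToken(src, pos, lexeme, token))
        tokens.emplace_back(lexeme, token);
    return tokens;
}
```

For example, `Tokenize("12 + (3*45)")` yields seven tokens: `12`, `+`, `(`, `3`, `*`, `45` and `)`. A generated scanner would typically encode the transitions as data rather than as a hand-written `switch`, but the driving loop works the same way.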
===Try tkgen online===
===How to use the generated code?===
To create a tokenizer you will need two more classes:
- TokenizerStream
- Tokenizer
Both can be found in tokenizer.h (see the download link in the sample below).
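Complete sample:
{{{cpp
#include "stdafx.h"   // precompiled header (Visual C++ projects)
#include <fstream>
#include <iostream>
#include <string>

// download it from http://www.thradams.com/codeblog/tkgencode.htm
#include "tokenizer.h"

// generated by tkgen: copy it from the online tkgen and paste it into your file
#include "statemachine.h"

int main(int argc, char* argv[])
{
    if (argc < 2) return 1;           // expects the input file name as the first argument
    std::wifstream ss(argv[1]);

    // NOTE: the constructor arguments below are reconstructed; check tokenizer.h
    FileTokenizerStream<wchar_t> stream(ss);
    Tokenizer<StateMachine, FileTokenizerStream<wchar_t> > tk(stream);

    std::wstring lexeme;
    Tokens token;
    while (tk.NextToken(lexeme, token))
    {
        std::wcout << TokensToString(token) << L": '" << lexeme << L"'" << std::endl;
    }
    return 0;
}
}}}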
===Input file details===
TkGen accepts the following regular expression syntax:
{{{
?   : optional
+   : one or more
*   : zero or more
.   : any char
\   : escape
0-9 : range inside groups
}}}
(Note: ^ is not yet supported.)
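The meaning of each operator can be checked with a small backtracking matcher. This is only a sketch for experimenting with the syntax: tkgen itself compiles expressions into a DFA rather than backtracking, and the names `Atom`, `Parse`, `MatchFrom` and `FullMatch` are hypothetical. It assumes that "groups" means bracketed character classes such as `[a-z0-9]`, does not support `^` (consistent with the note above) or parenthesized sub-expressions, and requires the whole string to match, as a token's lexeme would.

```cpp
// Hypothetical sketch: a backtracking matcher for the operators listed above.
// tkgen builds a DFA instead; this only illustrates what each operator means.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One pattern element: the set of characters it accepts plus an optional quantifier.
struct Atom
{
    bool any = false;                           // '.' accepts any character
    std::vector<std::pair<char, char>> ranges;  // a literal c is stored as {c, c}
    char quant = 0;                             // 0 (exactly once), '?', '+' or '*'

    bool Accepts(char c) const
    {
        if (any) return true;
        for (const auto& r : ranges)
            if (r.first <= c && c <= r.second) return true;
        return false;
    }
};

// Parses literals, '.', '\x' escapes, '[...]' classes with ranges such as 0-9,
// and a trailing '?', '+' or '*'. Assumption: "groups" are bracketed classes;
// '^' and parenthesized sub-expressions are not supported.
bool Parse(const std::string& re, std::vector<Atom>& atoms)
{
    std::size_t i = 0;
    while (i < re.size())
    {
        Atom a;
        const char c = re[i++];
        if (c == '.')
        {
            a.any = true;
        }
        else if (c == '\\')
        {
            if (i >= re.size()) return false;     // dangling escape
            a.ranges.push_back({re[i], re[i]});
            ++i;
        }
        else if (c == '[')
        {
            while (i < re.size() && re[i] != ']')
            {
                const char lo = re[i++];
                char hi = lo;
                if (i + 1 < re.size() && re[i] == '-' && re[i + 1] != ']')
                {
                    hi = re[i + 1];                // range such as 0-9
                    i += 2;
                }
                a.ranges.push_back({lo, hi});
            }
            if (i >= re.size()) return false;     // missing ']'
            ++i;                                  // skip ']'
        }
        else
        {
            a.ranges.push_back({c, c});
        }
        if (i < re.size() && (re[i] == '?' || re[i] == '+' || re[i] == '*'))
            a.quant = re[i++];
        atoms.push_back(a);
    }
    return true;
}

// Does atoms[k..] match all of s[pos..]? Tries the longest repetition first, then backs off.
bool MatchFrom(const std::vector<Atom>& atoms, std::size_t k, const std::string& s, std::size_t pos)
{
    if (k == atoms.size()) return pos == s.size();
    const Atom& a = atoms[k];
    const std::size_t minRep = (a.quant == 0 || a.quant == '+') ? 1 : 0;
    const std::size_t maxRep = (a.quant == '+' || a.quant == '*') ? s.size() - pos : 1;
    std::size_t n = 0;
    while (n < maxRep && pos + n < s.size() && a.Accepts(s[pos + n])) ++n;
    for (std::size_t r = n + 1; r-- > minRep; )  // r = n, n-1, ..., minRep
        if (MatchFrom(atoms, k + 1, s, pos + r)) return true;
    return false;
}

// True if the whole string s matches the expression re.
bool FullMatch(const std::string& re, const std::string& s)
{
    std::vector<Atom> atoms;
    return Parse(re, atoms) && MatchFrom(atoms, 0, s, 0);
}
```

For example, `FullMatch("[0-9]+", "123")` returns true, while `FullMatch("[0-9]+", "12a")` returns false. Backtracking can take exponential time on some patterns, which is one reason a generator like tkgen precomputes a DFA instead.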
===Download sample===
===Acknowledgments===
- Cesar Mello, for encouraging me over the years to implement this kind of DFA-based tokenizer generator.
- Marcelo B., for the feedback and the patience in discussing NFAs, DFAs, etc.
===History===
- 18 Nov 2009: web page released
- 02 Dec 2010: compact version added