Template Sets - Extending the Template Definition Language (TDL)

Important:

Starting with v4.0, the TDL interpreter and parser have been frozen: no new functionality will be added; they'll only be adjusted to match changes in the LLBLGen Pro designer assemblies so the TDL interpreter keeps working properly. This means, for example, that no TDL functionality has been added for Table Valued Functions. From v4.0 onwards, it's recommended to write all templates using the .lpt system. To get started with the .lpt system, check out the various .lpt templates shipped with LLBLGen Pro. You can include .lpt templates inside TDL templates.

The TDL interpreter/parser won't be removed from the system in the future; they just won't receive new updates to support new functionality in the designer.

Preface

Although the TDL as shipped with LLBLGen Pro is sufficient to generate a wide range of code structures, it can be limiting in some situations. In those cases it can be handy to extend the TDL with new statements so you can generate exactly what you want. This section of the SDK documentation briefly describes the TDL parser and TDL interpreter tandem and where you should place your code to extend the two so they recognize your new statements. It's recommended to use .lpt templates whenever possible, and to extend the TDL language only if you really have to.

The template parser and interpreter are not set up to have a pluggable language description, nor are they generated from an LR(1) grammar by a parser generator. The reason is that the grammar is not LR(1): because of the pattern-matching nature of the language it will always cause shift/reduce conflicts, even though it's just a set of simple statements and two constructs. The hand-written parser is also more straightforward; it doesn't require a degree in Computer Science to understand what it does, whereas LR(1) grammar and parser technology would require background knowledge of how LR(1) parsing and LR(1) grammars work before you could extend the grammar and language. The TDL parser/interpreter work with an LL(1)-like mechanism.

Some terminology is however unavoidable. This section uses terms like Terminal, Non-terminal, Token, Lexical Analyzer, Parser and Interpreter. These terms are described briefly where necessary. If you need more information about them, feel free to use your favorite search engine: the terms are used here in their generally accepted meaning, so with a few definitions found on the net you should be able to fully understand what's going on. If you are the lucky owner of "Compilers: Principles, Techniques, and Tools" by Aho, Sethi and Ullman, you're all set and have all the information you'll need to extend the TDL.

General structure

The complete structure of Lexical Analyzer, Parser and Interpreter is generally referred to as the 'parser' in most systems, but as you'll see it in fact consists of three blocks, executed in this order: the Lexical Analyzer reads the template and transforms it into a series of tokens. This token stream is fed to the Parser, which transforms these tokens, or better: terminals, into a stream of non-terminals. The stream of non-terminals is fed to the Interpreter, which interprets the non-terminals and their tokens and emits text to the output stream. The terms are briefly explained below.

Token
    A token is a named element which is recognized in the input stream. For example, C# has statements like 'for'. 'for' is a 3-character string, but when it is recognized in the input stream, the string is converted to a token, which is normally a number. Tokens are found by the Lexical Analyzer. TDL's Lexical Analyzer tokenizes (converts strings to tokens) everything, including strings which are not represented by an element in the TDL language; strings which do not represent a TDL element are tokenized as 'LiteralText'. This way the input stream, which consists solely of ASCII characters, is converted to a list of tokens which can be handled much more easily. Tokens are objects and contain their matching literal text, so an input stream can be re-created from the token stream.

Terminal
    A terminal is an element in a grammar rule which is only defined at the right side of the '->' operator in grammar rules. Terminals and non-terminals are terms related to the (E)BNF notation, which is the general notation for defining grammars. As an example of an EBNF grammar rule, consider this:

        IfStatementStart -> If Expression Then

    The elements 'If' and 'Then' (shown in italics in the usual notation) are terminals: they represent tokens, here the literal strings 'If' and 'Then'. 'Expression' is not a terminal but a non-terminal: it will be defined further on in the grammar, where it will appear at the left side of the '->' operator. 'IfStatementStart' is the non-terminal of this grammar rule. The rule means that when you see If Expression Then in the input stream, you can reduce it to the non-terminal IfStatementStart. In the end you can reduce all input to a single non-terminal, and parsing is then complete.

Non-terminal
    See Terminal. An element in an EBNF grammar rule which can be specified at the left and at the right side of the '->' operator. Non-terminals always represent one or more terminals; they never directly represent tokens or strings of text. Non-terminals are essential because when a stream of tokens is transformed to a stream of non-terminals by a parser, the interpreter can work much faster: it can decide what to do based on a single non-terminal and can be sure the non-terminal is correct, in other words, that the token stream which forms the base of the non-terminal is correct.

Lexical Analyzer
    The piece of functionality which scans the input stream of ASCII text for tokens. The TDL lexical analyzer uses a mechanism of regular expressions and tries to match these with the input stream. All matches are tokens; all non-matched text is literal text. It constructs a token stream of all tokens found and returns that stream as the lexical result.

Parser
    The piece of functionality which scans the token stream supplied to it for sets of tokens which form a non-terminal; for example, Foreach Entity CrLf will be transformed into the EntityLoopStart non-terminal. All tokens will be transformed into non-terminals; erroneous input will be transformed into LiteralText non-terminals, so the parser will always succeed.

Interpreter
    The piece of functionality which handles each non-terminal. Each non-terminal is handled by a dedicated handler routine, one per non-terminal. These routines are called by a general non-terminal handler which examines the current non-terminal and then calls the right handler routine. The interpreter doesn't know about a template set; it simply sees non-terminals and, based on these non-terminals, emits strings to the output stream, be it text inside a token or data read from the project.
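
To make the three blocks concrete, here is a minimal sketch of the pipeline in C#. The Token and NonTerminal types, the enum values and the method signatures below are illustrative stand-ins, not the actual SDK classes; only the concepts (tokens carrying their matched literal text, non-terminals wrapping the tokens they were reduced from, errors ending up as LiteralText) come from the descriptions above.

    using System;
    using System.Collections.Generic;

    // Illustrative stand-ins for the real SDK types; names and members are assumptions.
    public enum Tokens { LiteralText, Foreach, Entity, CrLf }
    public enum NonTerminals { LiteralText, EntityLoopStart }

    public class Token
    {
        public Tokens Id;
        public string MatchedText;   // tokens keep their matched literal text, so the input can be re-created
    }

    public class NonTerminal
    {
        public NonTerminals Id;
        public List<Token> Tokens = new List<Token>();   // the terminals this non-terminal was reduced from
    }

    public static class PipelineSketch
    {
        // The three blocks, executed in order: template text -> tokens -> non-terminals -> output text.
        public static string Generate(string templateText,
            Func<string, List<Token>> lexicalAnalyzer,
            Func<List<Token>, List<NonTerminal>> parser,
            Func<List<NonTerminal>, string> interpreter)
        {
            List<Token> tokens = lexicalAnalyzer(templateText);   // 1. tokenize the ASCII input
            List<NonTerminal> nonTerminals = parser(tokens);      // 2. reduce tokens to non-terminals
            return interpreter(nonTerminals);                     // 3. emit the output text
        }
    }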

Adding a new statement

If you want to add a new statement to the TDL, you have to make changes to the Parser and the Interpreter. Before you start, it's best to check out the code of the current parser and interpreter and see how things are implemented. Below are the steps to take to add a statement to the TDL; two hedged code sketches after the list illustrate them.
  1. Add token and non-terminal enum values. First, add new values for your tokens to the Tokens enum in EnumsConstants.cs and a new value for your statement's non-terminal to the NonTerminals enum. Do not add tokens at the start of the enums, as the first two are reserved. Append new values at the end, preferably with a 'jump' in numbering so an update of the parser will not harm your code (see the first sketch after this list).
  2. Add token definitions. To make the Lexical Analyzer recognize your new token(s), you have to define a regular expression and a constructor call for each token. You do that in the Parser.cs file, in the routine CreateTokenDefinitions. Look at the other tokens to get an idea of what you have to do; it's pretty straightforward.
  3. Add a statement handler to the Parser. The parser feeds the Lexical Analyzer all the token definitions created in CreateTokenDefinitions. When it receives all tokens back from the Lexical Analyzer, it starts parsing them. The start token <[ and the end token ]> are taken care of by the general routine ParseTokens. You have to add a call to your own routine to the statement handler, ParseStatement; this way the parser will be able to call your token handler, which transforms the stream of tokens into a non-terminal. When you're adding a 'Name single token' statement or a 'single token' statement, it's wise to simply adjust the existing handlers for these non-terminals, as that frees you from implementing custom routines for creating and handling your non-terminal. If you go for a custom token handler, you have to implement that routine yourself; check out an existing token handler to see how to efficiently transform a stream of tokens into a non-terminal (see the second sketch after this list). You'll notice that 'NextForeach' is not part of a loop non-terminal: the parser collects all loop starts and if starts in two lists, and if an if or foreach statement is not properly closed, the non-terminal belonging to the start of that statement is transformed into a LiteralText non-terminal. This way the interpreter will not wait indefinitely for the NextForeach or EndIf statement but will simply emit the start of the if or foreach statement as literal text, showing you there's an error.
  4. Add a non-terminal handler to the Interpreter. To get output from your statement, the interpreter has to do something when it sees your non-terminal. To make the Interpreter see it, you have to add a call to your handler to the HandleNonTerminal() routine. This is the beating heart of the Interpreter; all non-terminals are handled from there. The routine is a simple switch based on the NonTerminal id found in the current non-terminal; it simply calls the handler for the non-terminal seen. In your handler you can do whatever you want, for example scan the tokens for optional parameters, emit code, or read project parameters and emit a single statement. You can also call HandleNonTerminal() again from your own handler, for example to handle nested statements. See the existing handlers for details about how to perform simple as well as complex operations (see the second sketch after this list).
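
The first sketch below illustrates steps 1 and 2, extending the illustrative enums from the pipeline sketch earlier. The enum names, the file names and the routine CreateTokenDefinitions are the ones mentioned above; everything else, in particular the TokenDefinition type, its constructor, the field _tokenDefinitions and the explicit enum value 500, is a hypothetical stand-in for whatever the shipped code actually uses.

    using System.Collections.Generic;

    // Step 1 (EnumsConstants.cs), sketched: append new values at the end of the existing
    // enums; the explicit value 500 is an arbitrary 'jump' so a parser update won't collide.
    public enum Tokens
    {
        // ... the two reserved values and the shipped tokens ...
        MyNewStatement = 500   // hypothetical new token
    }

    public enum NonTerminals
    {
        // ... the shipped non-terminals ...
        MyNewStatement = 500   // hypothetical new non-terminal
    }

    // Hypothetical stand-in for however the shipped parser stores a token's
    // regular expression; the real type in Parser.cs will differ.
    public class TokenDefinition
    {
        public Tokens TokenId;
        public string Pattern;   // regular expression matched by the Lexical Analyzer
        public TokenDefinition(Tokens tokenId, string pattern) { TokenId = tokenId; Pattern = pattern; }
    }

    public partial class Parser
    {
        private readonly List<TokenDefinition> _tokenDefinitions = new List<TokenDefinition>();

        // Step 2 (Parser.cs), sketched: register a regular expression for the new token.
        private void CreateTokenDefinitions()
        {
            // ... existing token definitions ...
            _tokenDefinitions.Add(new TokenDefinition(Tokens.MyNewStatement, @"MyNewStatement"));
        }
    }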
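
The second sketch illustrates steps 3 and 4, re-using the illustrative Token and NonTerminal types from the sketches above. ParseStatement and HandleNonTerminal() are the routines named in the steps; the signatures, the TextWriter parameter and the helper methods are assumptions, so check the shipped code for the real shapes.

    using System.Collections.Generic;
    using System.IO;

    public partial class Parser
    {
        // Step 3, sketched: a custom token handler, wired up in ParseStatement, which
        // reduces the tokens of the new statement to its non-terminal.
        private NonTerminal ParseMyNewStatement(List<Token> tokens, ref int index)
        {
            var result = new NonTerminal { Id = NonTerminals.MyNewStatement };
            result.Tokens.Add(tokens[index]);   // consume the MyNewStatement token
            index++;
            return result;
        }
    }

    public partial class Interpreter
    {
        // Step 4, sketched: extend the central switch in HandleNonTerminal() with a case
        // for the new non-terminal, and add a handler routine which emits the output.
        private void HandleNonTerminal(NonTerminal current, TextWriter output)
        {
            switch (current.Id)
            {
                // ... cases for the existing non-terminals ...
                case NonTerminals.MyNewStatement:
                    HandleMyNewStatement(current, output);
                    break;
            }
        }

        private void HandleMyNewStatement(NonTerminal current, TextWriter output)
        {
            // Emit whatever the statement should produce: text from the tokens,
            // or data read from the loaded project.
            output.Write("// output of MyNewStatement");
        }
    }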

Testing and debugging the parser/interpreter

When you're done with the implementation of your new statement, you can build the TDL parser and TDL interpreter assemblies. Close LLBLGen Pro and copy the debug builds of TDLParser and TDLInterpreter into the TaskPerformers directory of your LLBLGen Pro installation. Also copy the .pdb files into this directory; these files will point Visual Studio .NET to the right source files. When you've done that, add some of your new statements to a template (preferably a new template file), bind that template file to a templateID in a templatebindings file, define a task in the preset you want to use, and specify the template ID you've bound to your new template. It's then time to start LLBLGen Pro and generate some code using your new template and statement. If you want to debug the process, simply place a breakpoint in the Parser or Interpreter classes and attach to a running LLBLGen Pro process. Then generate code; the breakpoint should be hit by Visual Studio .NET and you can step through the code.
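
As an illustration, a test template could look like the snippet below. 'MyNewStatement' is the hypothetical statement from the sketches above; the <[ and ]> delimiters and the Foreach Entity / NextForeach construct are part of the shipped TDL, as described earlier.

    <[Foreach Entity]>
    <[MyNewStatement]>
    <[NextForeach]>

With a template like this, your handler's output should appear once per entity in the generated file, which makes it easy to verify that the lexer, parser and interpreter changes all line up.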

LLBLGen Pro v4.0 SDK documentation. ©2002-2013 Solutions Design