The scripting language

CodeWorker is best seen as a script interpreter designed to parse and generate any kind of text or source code. The interpreter accepts a number of options on the command line, some of which resemble those of a compiler.

CodeWorker doesn't provide a graphical user interface, but a console mode allows interaction with the user.

1 Command line of the interpreter

The leader script is the name given to the script that the interpreter executes first. There are six ways to pass this leader script to the interpreter via the command line:

To make it easier to locate an input file among several directories, the option -I specifies a path to search. It gives more flexibility in sharing input files (both scripts and user files, but not generated or expanded files) between directories, and it avoids writing relative or absolute paths in scripts.

Properties can be defined on the command line with the option -define (or -D). These properties are intended to be read from scripts.
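The switch table further down states that -define VAR=value assigns a value to VAR, and that the bare form -define VAR assigns "true". The Python sketch below only models that rule; the function name parse_define and the sample property names are hypothetical, and this is not the interpreter's actual parsing code.

```python
def parse_define(argument: str) -> tuple[str, str]:
    """Split one -define argument into (name, value).

    Models the documented rule only: a bare name receives "true".
    """
    name, separator, value = argument.partition("=")
    if not separator:
        # No '=' present: the documentation says "true" is assigned by default.
        return name, "true"
    return name, value


# Hypothetical arguments, as they might follow -define or -D.
properties = dict(parse_define(arg) for arg in ["TARGET=java", "VERBOSE"])
print(properties)
```

In a script, getProperty("VERBOSE") would then be expected to return "true" under this rule.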

It is recommended to specify a working directory with the option -path. The assigned value is accessible in scripts via the function getWorkingPath(). This working directory generally indicates the output path for copying or generating files, but the script developer decides how to use it.

CodeWorker interprets scripts efficiently. However, a standalone executable is more convenient to run than the interpreter plus a set of script files. Moreover, once the scripts are stable, compiling them into an executable makes the project run a few times faster. The option -c++ translates the leader script and all its dependencies to ready-to-compile C++ source code.

To help track down errors, an integrated debugger is started with the option -debug. It runs in the console, and its classical commands let you take control of the execution and explore the stack and the variables.

The following table lists all switches allowed on the command line:

Switch    Description
-args [arg]* Pass some arguments to the command line. The list of arguments stops at the end of the command line or as soon as an option is encountered. The arguments are stored in a global array variable called _ARGS.
-autoexpand file-to-expand The file file-to-expand is explored for expanding code at markups, executing a template-based script inserted just below each markup. It is equivalent to executing the script function autoexpand(file-to-expand, project).
-c++ generated-project-path CodeWorker-path? Translates the leader script and all its dependencies to C++ source code, once the execution of the leader script has completed (same job as compileToCpp()). The CodeWorker-path is optional and gives the path to the includes and libraries of the software. However, it is now recommended to specify CodeWorker-path with the switch -home.
-c++2target script-file generated-project-path target-language? Translates the leader script and all its dependencies to C++ source code. The C++ is then translated to a target language, all of this once the execution of the leader script has completed. Do not forget to give the path to the includes and libraries of CodeWorker by setting the switch -home.
A preprocessor definition called "c++2target-path" is automatically created. It contains the path of the generated project. Call getProperty("c++2target-path") to retrieve the path value.
target-language is optional if at least one script of the project carries the target in its filename, just before the extension. Example: "myscript.java.cwt" means that the target language of this script is "java".
A property can follow the name of the target language, separated by a '=' symbol. The property is accessible via getProperty("c++2target-property"), and its nature depends on the target. For instance, in Java, this property represents the package the generated classes will belong to. Example: java=org.landscape.mountains.
-c++external filename To generate C++ source code for implementing all functions declared as external in scripts.
-commentBegin format To specify the format of a beginning of comment.
-commentEnd format To specify the format of a comment's end.
-compile scriptFile To compile a script file, just to check whether the syntax is correct.

-commands commandFile Loads from commandFile all the arguments ordinarily given on the command line. It must be the only switch, and nothing else may be passed on the command line.
-console To open a console session (default mode if no script to interpret is specified via -script, -compile, -generate or -expand).
-debug [remote]? To debug a script in a console while executing it. The optional argument remote defines parameters for a remote socket control of the debugging session. remote looks like <hostname>:<port>. If <hostname> is empty, CodeWorker runs as a socket server.
-define VAR=value or -D ... Defines variables, as when using the C++ preprocessor or when passing properties to the Java compiler. These variables are similar to properties, insofar as they aren't used during the preprocessing of the scripts to interpret. The form -define VAR may be used when no value has to be assigned; in that case, "true" is assigned to variable VAR by default. The script function getProperty("VAR") gives the value of variable VAR.
-expand pattern-script file-to-expand Script file pattern-script is executed to expand file file-to-expand at its markups. It is equivalent to executing the script function expand(pattern-script, project, file-to-expand).
-fast To optimize speed. While processing generation, the output file is built into memory, instead of into a temporary file.
-generate pattern-script file-to-generate Script file pattern-script is executed to generate file file-to-generate. It is equivalent to executing the script function generate(pattern-script, project, file-to-generate).
-genheader text Adds a header at the beginning of all generated files, followed by a text (see procedure setGenerationHeader()).
-help or ? Help about the command line.
-home CodeWorker-path Specifies the path to the home directory of CodeWorker.
-I path Specify a path to explore when trying to find a file while invoking include or parseFree or parseAsBNF or generate or expand or ... This option may be repeated to specify more than one path.
-insert variable_expression value Creates a new node in the main parse tree project and assigns a constant value to it. It is equivalent to executing the statement insert variable_expression = "value";.
-nologo The interpreter doesn't write the copyright in the shell at the beginning.

-nowarn warnings The specified warning types are ignored. They are separated by pipe symbols. Currently, the only recognized type is undeclvar, the warning that guards the developer against the use of an undeclared variable.
-parseBNF BNF-parsing-script source-file The script file BNF-parsing-script parses source-file from an extended BNF grammar. It is equivalent to executing the script function parseAsBNF(BNF-parsing-script, project, source-file).
-path path Output directory, returned by the script function getWorkingPath(), and ordinarily used to specify where to generate or copy a file.
-quantify [outputFile]? Executes scripts in quantify mode, which measures coverage and execution time. Results are saved to the HTML file outputFile, or displayed on the console if outputFile is not given.
-report report-file request-flag Generates a report once the execution has completed. The report is saved to the file report-file, and the nature of the information depends on the flag request-flag. This flag must be built by computing a bitwise OR of one or several of the following integer constants:
  • 1: provides every output file written by a template-based script (generate(), expand() or translate())
  • 2: provides every input file scanned by a BNF parse script (parseAsBNF() or translate())
  • 4: provides details of coverage recording for every output file using the #coverage directive
  • 8: provides details of coverage recording for every input file using the #matching directive
  • 16: provides details of coverage recording for every output file written by a template-based script
  • 32: provides details of coverage recording for every input file scanned by a BNF parse script
Note that flags 16 and 32 may consume a great deal of time and memory, depending both on how many input/output files you have to process and on their size.
-script script-file Defines the leader script, which will be executed first.
-stack depth Limits the depth of recursive function calls, to avoid a stack overflow. By default, the depth is set to 1000.
-stdin filename To change the standard input for reading from an existing file. It may be useful for running a scenario.
-stdout filename To change the standard output for writing it to a file.
-time To display the execution time expressed in milliseconds, just before exiting.

-translate translation-script source-file file-to-generate Script file translation-script performs a source-to-source translation. It is equivalent to executing the script function translate(translation-script, project, source-file, file-to-generate).
-varexist To trigger a warning when a script reads the value of a variable that doesn't exist.
-verbose To display internal messages of the interpreter (information).
-version version-name Forces scripts to be interpreted as if written for the earlier version given by version-name.
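The request-flag of the -report switch is a bitwise OR of the integer constants 1 to 32 listed in the table above. As a minimal sketch, the Python below combines three of them; the symbolic names in ReportFlag and the file names "leader.cws" and "report.txt" are hypothetical, since the documentation only gives the numeric values.

```python
from enum import IntFlag


class ReportFlag(IntFlag):
    """Numeric values come from the -report description; names are illustrative."""
    OUTPUT_FILES = 1         # output files written by template-based scripts
    INPUT_FILES = 2          # input files scanned by BNF parse scripts
    COVERAGE_DIRECTIVE = 4   # coverage for output files using #coverage
    MATCHING_DIRECTIVE = 8   # coverage for input files using #matching
    OUTPUT_COVERAGE = 16     # coverage for every output file
    INPUT_COVERAGE = 32      # coverage for every input file


# Ask for written files, scanned files and #matching coverage details.
request_flag = (ReportFlag.OUTPUT_FILES
                | ReportFlag.INPUT_FILES
                | ReportFlag.MATCHING_DIRECTIVE)

# Hypothetical command line built from the combined flag.
command_line = ["codeworker", "-script", "leader.cws",
                "-report", "report.txt", str(int(request_flag))]
print(" ".join(command_line))
```

Here 1 | 2 | 8 gives 11, so the switch would read -report report.txt 11.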

Note that the interpreter offers a convenient shorthand for running a common script with arguments:

codeworker <script-file> <arg1> ... <argN> [<switch>]*

This form replaces the more verbose:

codeworker -script <script-file> -args <arg1> ... <argN> [<switch>]*
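The equivalence between the two forms can be modelled as a small rewriting step. The Python sketch below is only an illustration: the function expand_shorthand is hypothetical, and it assumes that a switch is any token starting with '-', which follows the -args rule that the argument list stops as soon as an option is encountered.

```python
def expand_shorthand(arguments: list[str]) -> list[str]:
    """Rewrite '<script-file> <arg>... [<switch>]*' into the verbose form.

    Assumption: a token starting with '-' is a switch and ends the
    script's argument list.
    """
    if not arguments or arguments[0].startswith("-"):
        # Empty or already in switch form: leave untouched.
        return list(arguments)
    script, rest = arguments[0], arguments[1:]
    script_args = []
    switches = []
    for index, token in enumerate(rest):
        if token.startswith("-"):
            switches = rest[index:]
            break
        script_args.append(token)
    expanded = ["-script", script]
    if script_args:
        expanded += ["-args", *script_args]
    return expanded + switches


print(expand_shorthand(["gen.cws", "model.xml", "out", "-time"]))
```

Under this assumption, an argument that itself begins with '-' (a negative number, for instance) would be mistaken for a switch.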

A console mode is launched when the command line is empty. The console only accepts scripts written in the common syntax, with common functions and procedures. So, parsing and generation scripts aren't typed directly on the console.

2 Syntax generalities and statements

A script in CodeWorker consists of a series of statements that are organized into blocks (also known as compound statements). A statement is an instruction the interpreter has to execute.

A single statement must close with a semicolon (';'). A compound statement is defined by enclosing instructions between braces ('{}'). A block can be used everywhere you can use a single statement and must never end with a semicolon after the trailing brace.

Comments are indicated either by surrounding the text with '/*' and '*/' or by preceding the rest of the line to ignore with a double slash ('//').

There are three families of scripts. To facilitate syntax highlighting in editors, or to indicate the type of a script at a glance, we suggest using file extensions that depend on the nature of the script. The following table shows the extensions commonly used in CodeWorker.

Extension    Description
".cwt" a template-based script, for text generation
".cwp" an extended-BNF parse script, for parsing text
".cws" a common script, neither of the above

The structure of the grammar is so rich that it is a challenge to find an editor with a syntax highlighting engine powerful enough to handle it. JEdit lets you write production rules to describe a grammar, so it is possible to express the syntax highlighting of the scripting language.

You'll find a package dedicated to JEdit on the Web site, for the inclusion of these new highlighting modes. Many thanks to Patrick Brannan for this contribution.

2.1 Preprocessor directives

A preprocessor directive always starts with a '#' symbol and is followed by the name of the directive.

2.1.1 Including a file

The #include filename directive tells the preprocessor to replace the directive at the point where it appears by the contents of the file specified by the constant string filename. The preprocessor looks for the file in the current directory and then searches along the path specified by the -I option on the command line.

2.1.2 Extending the language via a package

A package is an extension of the scripting language that allows adding new functions in CodeWorker at runtime. A package is implemented as an executable module, which exports all new functions the developer wants to make available in the interpreter.

Loading of a package

The preprocessor directive #use tells the interpreter that it must extend itself with the functions exposed by a package.

The syntax is: #use package-name

Loading a package more than once has no effect.

The name of the package must prefix the name of the function, when calling it: package-name::my-function(parameters...)

Example:

#use PGSQL
PGSQL::connect("-U pilot -d emergencyDB");
local sRequest = "SELECT solution FROM average_adjustment WHERE damage = 'broken wing'";
local listOfSolutions;
PGSQL::selectList(sRequest, listOfSolutions);
if listOfSolutions.empty()
  traceLine("No solution. Suggestion: parachute jump?");
else {
  traceLine("Solutions:");
  foreach i in listOfSolutions
    traceLine(" -" + i);
}
PGSQL::disconnect(); // if the plane hasn't crashed yet

The PGSQL package serves here for connecting to and querying a PostgreSQL database. For this example, the package exports three functions: PGSQL::connect, PGSQL::selectList and PGSQL::disconnect.

The executable module

CodeWorker expects a dynamic library, whose name is deduced from the package name and from the platform the interpreter is running on.
The short name of the dynamic library is the package name with "cw" appended. The extension of the dynamic library must be ".dll" under Microsoft Windows, and ".so" under Linux.

You must put the dynamic library at a place where CodeWorker will find it at runtime.
Microsoft Windows proceeds in the following order to locate the library:

Under Unix, a relative path for the shared object refers to the current directory (according to the man description of dlopen(3C)).

So, when CodeWorker reads #use PGSQL, it searches for a dynamic library called "PGSQLcw.dll" under Windows or "PGSQLcw.so" under Linux.
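The naming rule can be written down in a few lines. The Python sketch below models it; the function package_library_name and the platform keys "windows" and "linux" are hypothetical labels, and only the "cw" suffix and the two extensions come from the text above.

```python
def package_library_name(package: str, platform: str) -> str:
    """Return the dynamic library file name expected for a package.

    Rule from the documentation: package name + "cw" + platform extension.
    """
    extensions = {"windows": ".dll", "linux": ".so"}
    return package + "cw" + extensions[platform]


for platform in ("windows", "linux"):
    print(platform, "->", package_library_name("PGSQL", platform))
```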

Building a package

This section is intended for those who want to build their own packages, for binding to a database or to a graphical library ... or just for gluing with their own libraries.

When the interpreter finds the preprocessor directive #use package-name in a script, it loads the executable module and executes the exported C-like function CW4DL_EXPORT_SYMBOL void package-name_Init(CW4dl::Interpreter*).

The preprocessor definition CW4DL_EXPORT_SYMBOL and the namespace CW4dl are both declared in the C++ header file "CW4dl.h". This header file is located in the "include" directory if you downloaded binaries, and at the root of the project if you downloaded sources.

The C-like function 'package-name_Init()' MUST be present! C-like means that it is declared extern "C" (done by CW4DL_EXPORT_SYMBOL).

Initializing the module that way is useful for registering new functions in the engine, via the function createCommand() of the interpreter (see the header file "CW4dl.h" in the declaration of the class Interpreter for learning more about it).

Every function to export must start its declaration with the preprocessor definition CW4DL_EXPORT_SYMBOL (means 'extern "C"', but a little more under Windows).

Every function returns const char*. CodeWorker's keyword null designates an atypical tree node. It accepts neither navigation nor reference, and can only be passed as a parameter to a function. On the C++ side, this null tree node is seen as a null pointer of type CW4dl::Tree*.

The interpreter CW4dl::Interpreter represents the runtime context of CodeWorker. It is the unavoidable intermediary between the module you are building and CodeWorker.
Use it for:

The #line directive sets the line counter of the script file being parsed to another number. The line just after the directive is given the number specified after #line.

2.1.3 Changing the syntax of the scripting language

The #syntax directive tells the preprocessor not to parse the following instructions as classical statements of the scripting language, but as conforming to another syntax. It allows adapting the syntax to what you are programming. The directive has the following form:
"#syntax" [parsing-mode [':' BNF-script-file]? | BNF-script-file]

How does it work? The piece of source code that doesn't conform to the syntax of the scripting language is put between the directives #syntax ... and #end syntax. If the trailing directive isn't found, the remainder of the script is considered to be written in the foreign syntax. Be careful: to be recognized, the trailing directive must start at the beginning of the line, and no spaces are allowed between # and end.
At runtime, this piece of source code is parsed and processed via the BNF script file.

Note that it is possible to attach an identifier (called parsing-mode above) to a script file, and to specify later, in any other script, the parsing mode only; CodeWorker will find the corresponding BNF script file. This avoids handling the physical name of the BNF parsing file where a logical parsing-mode name is more convenient.

Example:

     // the first time, a parsing mode may be attached to the BNF script file
     #syntax shell:"TinyShell.cwp"
     ...
     #end syntax
     
     // at the second call, it isn't recommended to use the path of the parsing file
     // it is better to use the parsing mode registered previously
     #syntax shell
     ...
     #end syntax
     
     // here, I know that I'll call it once only, so I don't care about a parsing mode
     #syntax "MakeFile.cwp"
     ...
     #end syntax

where the parsing script "TinyShell.cwp" might look like:

      // file "GettingStarted/TinyShell.cwp":
      tinyShell ::=
              #ignore(C++)
              #continue
              [
                  #readIdentifier:sCommand
                  #ignore(blanks) #continue
                  command<sCommand>
              ]* #empty;
     
      //----------------------------//
      // commands of the tiny shell //
      //----------------------------//
      command<"copy"> ::=
              #continue parameter:sSource parameter:sDestination
              => {copyFile(sSource, sDestination);};
     
      command<"rmdir"> ::=
              #continue parameter:sDirectory
              => {removeDirectory(sDirectory);};
     
      command<"del"> ::=
              #continue parameter:sFile
              => {deleteFile(sFile);};
     
     
      //--------------------
      // Some useful clauses
      //--------------------
      parameter:value ::=
              #readCString:parameter
                  |
              #!ignore #continue [~[' ' | '\t' | '\r' | '\n']]+:parameter;

Of course, the parsing and the processing are implemented in the scripting language, so changing the syntax will be slower than keeping the default one. However, it allows writing code that is easy to maintain and to understand.

2.1.4 Managing changes in a multi-language generation

The directives #reference and #attach notify you when a change made in a script that generates code for one language has not yet been carried over to the script for another language. For example, suppose you are writing a framework in both C++ and Java. You add some new features to the C++ generation or correct some mistakes. Sooner or later, you must remember to update the Java generation as well. With these directives, a warning is produced until the changes have been carried over to the other script.

How does it work? Directives must delimit the piece of script you have changed:
"#reference" key
...
"#end" key

The key is an identifier that allows putting more than one reference area into a script file. A #reference area might cover one or more #reference directives, without confusing about boundaries. The directive must be put at the beginning of the line.

Here are the directives delimiting the piece of script that should be updated later in another file:
"#attach" reference-file ':' reference-key
...
"#end" reference-key

A #attach area might cover one or more #reference or #attach directives, as a #reference area. The directive must be put at the beginning of the line.

The first time CodeWorker encounters the reference script file, it computes a number that depends on the content of the area. The first time CodeWorker encounters an attached script file, it retrieves the magic number of the reference area, identified by both the file name and the key of the reference. Initially, therefore, the reference and attached areas are considered similar. CodeWorker stores the magic number of the reference just after the #attach directive:
"#attach" reference-file ':' reference-key ',' reference-number

A script file that must be updated to store the magic numbers of its attached areas is rewritten at the end of the parsing, and only if no error was encountered. If the writefileHook() function (see writefileHook) is implemented, it is called, and the script file isn't changed if it returns false. If the script file is read-only, the corresponding readonlyHook() function is called (see readonlyHook). If it isn't possible to save the script file, an error is thrown.

When a change occurs in the reference area, the magic number is recomputed the next time CodeWorker encounters it. When an attached piece of script is encountered after the change, the old magic number of the reference is compared to the new one. If they aren't the same, a warning is displayed to notify that the attached area hasn't been updated yet.

Once the changes have been carried over into the attached area, the magic number of the reference must be removed from the #attach directive (don't forget the comma too!). The next time the interpreter encounters this attached area, it retrieves the magic number of the reference area again, and the reference area and the attached area are once more considered similar.

Of course, the use of these directives is quite constraining. However, it is the only way in CodeWorker to ensure that features and corrections have been carried over to all generated languages.
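The life cycle of a magic number can be simulated in a few lines. The Python sketch below is a model under stated assumptions: the documentation only says that the number depends on the content of the reference area, so zlib.crc32 stands in for the real computation, and the names magic_number and check_attached are hypothetical.

```python
import zlib


def magic_number(area_text: str) -> int:
    """Stand-in for CodeWorker's computation: any content-dependent number."""
    return zlib.crc32(area_text.encode("utf-8"))


def check_attached(stored_number, reference_text):
    """Return (number to keep after #attach, whether a warning is due).

    stored_number is None when no number follows the #attach directive,
    either on the first encounter or after the developer removed it.
    """
    current = magic_number(reference_text)
    if stored_number is None:
        return current, False            # adopt the reference's number
    return stored_number, stored_number != current


reference_v1 = "return getName();"
reference_v2 = "return getname();"       # the reference area was edited

stored, warn_first = check_attached(None, reference_v1)
_, warn_after_change = check_attached(stored, reference_v2)
stored, warn_after_reset = check_attached(None, reference_v2)
print(warn_first, warn_after_change, warn_after_reset)
```

The middle call models the warning phase: the attached area still carries the old number while the reference has changed.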

2.2 Constant literals

CodeWorker handles all basic types as strings, and doesn't distinguish a double from a boolean or a date. A string literal is a sequence of characters from the source character set enclosed in double quotation marks (" "). String literals are used to represent a sequence of characters which, taken together, form a null-terminated string. How the data is interpreted depends on the context: the function

increment(index)

expects that its argument index contains a number, but stored as a string.

A constant tree describes a tree as a list of constant trees and expressions, intended to be assigned to a variable. Example:

local aVariable = "a"{["yellow", "red":"or"{.alternative="orange"}], .vehicle="submarine"};

You'll find more information in the sub section Scope below.

2.3 Variables, declaration and assignment

Variables serve as containers for the data you use in scripts. The data type is a tree, which may be reduced to a single leaf node that contains nothing but a value.

2.3.1 Declaring variables

It isn't necessary to declare a variable before using it for the first time. A variable that is assigned without being declared is understood as a new sub-node to be added to the current tree context. The current context is obtained through the read-only variable called this. It corresponds to the main parse tree, whose root name is project, when you are in the leader script, and to the variable passed as a parameter when calling a parsing or pattern script.

The following table lists all predefined variable names (accessible from anywhere) and their meaning:

Variable Name    Description
project The main parse tree, always present.
this It points to the current context variable.
_ARGS An array of all custom command-line arguments. Custom arguments follow the script file name or the switch -args on the command line.
_REQUEST If the interpreter works as a CGI program, it stores all parameters of the request in an association table. The key is the parameter name, which is associated with the corresponding value.

A variable that is read without being declared returns an empty string, but doesn't cause the creation of a sub-node. The danger is that you aren't safe from a spelling mistake. To prevent it, put the option -varexist on the command line and use the function existVariable() to check whether a variable exists or not.

2.3.2 Scope

When you declare a local variable, it is valid for use within a specific area of code, called the scope. When the flow of execution leaves the scope, the content of the variable, a subtree specially allocated during its declaration, is deleted and disappears forever from the stack. A scope is delimited by a block.

To declare a variable on the stack, use the following declaration statement:
local-variable-statement ::= "local" local-variable-declaration ';'
local-variable-declaration ::= variable [ '=' assignment-expression ]?
assignment-expression ::= constant-tree | expression
constant-tree ::= [tree-value]? '{' [tree-array-or-attribute [',' tree-array-or-attribute]* ]? '}'
tree-value ::= expression
tree-array-or-attribute ::= tree-array | tree-attribute
tree-attribute ::= '.' attribute-name '=' assignment-expression
tree-array ::= '[' tree-array-item [',' tree-array-item]* ']'
tree-array-item ::= expression ':' assignment-expression | assignment-expression

An extension of the syntax allows the declaration of more than one variable in one shot. A comma separates the variable declarations:
local-variable-statement ::= "local" local-variable-declaration [ ',' local-variable-declaration ]* ';'

The local variable points to a new empty tree, pushed into the stack.

To assign a reference to another variable, instead of either the result of evaluating an expression or a constant tree, use the following declaration statement instead:
local-ref-statement ::= "localref" local-ref-declaration [ ',' local-ref-declaration ]* ';'
local-ref-declaration ::= variable '=' reference

In CodeWorker versions strictly older than 1.13, local variables declared in the body of a script or in the scope of a function may be accessed from within called functions during their lifetime. A more recent CodeWorker interpreter may therefore behave differently.

This stack management existed for historical reasons, but it is now obsolete and often reflects an implementation error. To protect you from this kind of mistake, a warning may be displayed, so that scripts written for versions strictly older than 1.13 can continue to run. Specify a version strictly older than 1.13 on the command line (option -version) to request that CodeWorker perform this check and generate the warning.

To correct this kind of mistake in old scripts, the variable should be propagated in an argument for functions that refer to it.

To declare a global variable, use the global statement. The declaration of a global variable can be specified anywhere in scripts. The first time the declaration of a global variable is encountered, the interpreter registers it as accessible from any point into scripts. The second time the interpreter encounters a global declaration for the variable, the latter remains global but its content is cleared.
Note that if a local variable or an attribute of the current node (this) has the same name as an existing global variable, the global variable remains hidden until the flow of control leaves the scope that contains the homonym.

The global declaration statement looks like:
global-variable-statement ::= "global" global-variable-declaration [ ',' global-variable-declaration ]* ';'
global-variable-declaration ::= variable [ '=' assignment-expression ]?

2.3.3 Navigating along branches

It is possible to navigate along a branch of the subtree put into the variable. A branch points to a node of the subtree. The syntax looks generally like:
branch ::= variable ['.' sub-node]*

If the branch isn't known before runtime, it may be built during the execution.

Example: while parsing an XML file, each time an XML attribute is encountered, one creates the corresponding attribute into the parse tree. But the name of the attribute is discovered during the parsing. The directive #evaluateVariable(expression) allows doing it. expression is evaluated at runtime and provides a branch:

#evaluateVariable("a.b.c")

will resolve the path "a.b.c" at runtime and navigate from a to c.
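Resolving a branch from a string amounts to walking the tree one name at a time. The Python sketch below models only the dotted form; the Node class and resolve_branch function are hypothetical, array access is left out, and returning None for a missing node is a choice of this sketch rather than CodeWorker behaviour.

```python
class Node:
    """A tree node: a string value plus named sub-nodes."""

    def __init__(self, value: str = ""):
        self.value = value
        self.attributes = {}


def resolve_branch(scope: dict, path: str):
    """Follow a dotted path such as "a.b.c" from a variable to a sub-node."""
    names = path.split(".")
    node = scope.get(names[0])           # the variable that roots the branch
    for name in names[1:]:
        if node is None:
            return None                  # a step of the branch is missing
        node = node.attributes.get(name)
    return node


# Build the tree a -> b -> c, with a value on the leaf.
a = Node()
a.attributes["b"] = Node()
a.attributes["b"].attributes["c"] = Node("leaf value")

branch = "a." + "b" + ".c"               # path assembled at runtime
print(resolve_branch({"a": a}, branch).value)
```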

A node may contain an array of nodes, which are indexed by a key that is a constant string. A branch allows navigating through arrays, and the definitive syntax of branches conforms to:
branch ::= "#evaluateVariable" '(' expression ')'
                ::= variable ['.' sub-node | array-access]*
array-access ::= '[' expression ']'
                ::= '#' ["front" | "back" | "parent" | "root"]
                ::= '#' '[' integer-expression ']'

We see that there are some ways to access an item node of an array or to change how to navigate from nodes to nodes:

2.3.4 Assignments

CodeWorker provides some different ways to put a data into a variable or into the node pointed to by a branch:

2.4 Expressions

2.4.1 Presentation

The BNF representation of an expression looks like:
expression ::= boolean-expr | ternary-expr
boolean-expr ::= comparison-expr [boolean-op comparison-expr]
boolean-op ::= '&' | '&&' | '|' | '||' | '^' | '^^'
ternary-expr ::= comparison-expr '?' expression ':' expression
comparison-expr ::= concatenation-expr [comparison-op concatenation-expr | "in" constant-set]
constant-set ::= '{' constant-string [',' constant-string]* '}'
comparison-op ::= '<' | '<=' | '==' | '=' | '!=' | '<>' | '>' | '>='
concatenation-expr ::= stdliteral-expr ['+' stdliteral-expr]*
stdliteral-expr ::= literal-expr
                ::= '$' arithmetic-expr '$'
literal-expr ::= constant-string | number
                ::= "true" | "false"
                ::= '(' expression ')'
                ::= '!' literal-expr
                ::= preprocessor-expr
                ::= function-call
                ::= variable-or-branch

arithmetic-expr ::= comparith-expr [boolean-op comparith-expr]*
comparith-expr ::= sum-expr [comparison-op sum-expr]
sum-expr ::= shift-expr [['+' | '-'] shift-expr]*
shift-expr ::= factor-expr [["<<" | ">>"] factor-expr]*
factor-expr ::= literal-expr [['*' | '/' | '%'] literal-expr]*
unary-expr ::= literal-expr ["++" | "--"]
literal-expr ::= string | variable-expr | number | unary-expr
                ::= '~' literal-expr
preprocessor-expr ::= '#' ["LINE" | "FILE"]

where:

2.4.2 Arithmetic expressions

The classical syntax of the interpreter forces expressions to work on sequences of characters. Consequently, comparison operators apply the lexicographical order, the '+' operator concatenates two strings, and the '*' operator doesn't exist.

Of course, there are functions that handle strings as numbers to perform an arithmetic operation (the 'add()' or 'mult()' functions for instance) or a comparison (the 'isPositive()' or 'inf()' functions for instance).

However, it is clearly more convenient to write arithmetic operations and comparisons in a natural way, using operators instead of the corresponding functions. So, CodeWorker provides an escape mode that draws its inspiration from the way LaTeX expresses mathematical formulas: arithmetic expressions are delimited by the symbol '$'.

Example:


local a = 11;
local b = 7;
traceLine("Classical mode = '"
    + inf(add(mult(5, a), 3), sub(mult(a, a), mult(b, b))) + "'");
traceLine("Escape mode = '" + $5*a + 3 < a*a - b*b$ + "'");

Output:

Classical mode = 'true'
Escape mode = 'true'
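The example above gives the same answer in both modes, but lexicographical and numeric comparison can disagree. The Python sketch below contrasts the two on string operands; the function names are hypothetical, and Python's own string ordering is used as a stand-in for the lexicographical order mentioned in the text.

```python
def classical_less_than(left: str, right: str) -> bool:
    """Model of '<' in classical mode: compare the character sequences."""
    return left < right


def escaped_less_than(left: str, right: str) -> bool:
    """Model of '<' between '$' delimiters: compare the numeric values."""
    return float(left) < float(right)


a, b = "11", "7"
print(classical_less_than(a, b))   # '1' sorts before '7'
print(escaped_less_than(a, b))     # 11 is not less than 7

# The arithmetic of the example above: 5*a + 3 against a*a - b*b.
left = 5 * float(a) + 3
right = float(a) * float(a) - float(b) * float(b)
print(left, right, left < right)
```

With a = 11 and b = 7, the two sides evaluate to 58 and 72, which is why both traces in the example print 'true'.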

2.5 Common statements

2.5.1 The 'if' statement

The BNF representation of the if statement is:
if-statement ::= "if" expression then-statement ["else" else-statement]?

The if statement evaluates the expression that immediately follows the keyword. The expression must be of arithmetic, text, variable or condition type. In both forms of the if syntax, if the expression evaluates to a non-empty string, the statement dependent on the evaluation is executed; otherwise, it is skipped.

In the if...else syntax, the second statement is executed if the result of evaluating the expression is an empty string. The else clause of an if...else statement is associated with the closest previous if statement that does not have a corresponding else statement.
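
A short sketch of both rules, using hypothetical variables sName, a and b: the first test shows an empty string selecting the else branch, and the second shows an else clause attaching to the closest if:

local sName = "";
if sName {
    traceLine("hello, " + sName);
} else {
    // executed, because sName is an empty string
    traceLine("no name was given");
}

local a = true;
local b = false;
// the else clause below belongs to 'if b', the closest if without an else
if a
    if b {
        traceLine("a and b");
    } else {
        traceLine("a, but not b");
    }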

2.5.2 The 'while'/'do' statements

The BNF representation of the while statement is:
while_statement ::= "while" expression statement

The while statement lets you repeat a statement or compound statement as long as a specified expression evaluates to a non-empty string. The expression in a while statement is evaluated before the body of the loop is executed. Therefore, the body of the loop may never be executed. If the expression returns an empty string, the while statement terminates and control passes to the next statement in the program. If the expression is non-empty, the process is repeated. The while statement can also terminate when a break or return statement is executed within the statement body. When a continue statement is encountered, control jumps directly to the evaluation of the expression.

Note that the break and continue statements apply to the innermost enclosing loop statement (foreach/forfile/select, do/while), which is the first one they encounter while leaving instruction blocks.

The BNF representation of the do statement is:
do_statement ::= "do" statement "while" expression ';'

The do-while statement lets you repeat a statement or compound statement until a specified expression becomes an empty string. The expression in a do-while statement is evaluated after the body of the loop is executed. Therefore, the body of the loop is always executed at least once. If the expression returns an empty string, the do-while statement terminates and control passes to the next statement in the program. If the expression is non-empty, the process is repeated. The do-while statement can also terminate when a break or return statement is executed within the statement body. When a continue statement is encountered, control is transferred to the evaluation of the expression.
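
As a sketch of the difference between the two loops, assuming a counter named i (a hypothetical variable): the while loop tests its condition first, whereas the do loop runs its body once even when the condition is already false:

local i = 0;
while $i < 3$ {
    traceLine("while: i = " + i);
    increment(i);
}
// here i is worth 3, so the condition is already false,
// but the body of the do loop is still executed once
do {
    traceLine("do: i = " + i);
    increment(i);
} while $i < 3$;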

2.5.3 The 'switch' statement

The BNF representation of this statement is:
switch_statement ::= "switch" '(' expression ')' '{' (label_declaration)* ("default" ':' statement)? '}'
label_declaration ::= ["case" | "start"] constant_string ':' statement

The switch statement allows selection among multiple sections of code, depending on the value of an expression. The expression enclosed in parentheses, the controlling expression, must be of string type.

The switch statement causes an unconditional jump to, into, or past the statement that is the switch body, depending on the value of the controlling expression, the constant string values of the case or start labels, and the presence or absence of a default label. The switch body is normally a compound statement (although this is not a syntactic requirement). Usually, some of the statements in the switch body are labeled with case labels or with start labels or with the default label. The default label can appear only once.

The constant-string in the case label is compared for equality with the controlling expression. The constant-string in the start label is compared for equality with the first characters of the controlling expression. In a given switch statement, no two constant strings in start or case statements can evaluate to the same value.

The behaviour of the switch statement depends on how the controlling expression matches the labels. If a case label exactly matches the controlling expression, control is transferred to the statement following that label. Otherwise, start labels are tried in lexicographical order, and control is transferred to the statement following the first label that matches the beginning of the controlling expression. If no start label matches either, control is transferred to the default statement or, if there is none, an error is thrown.

A switch statement can be nested. In such cases, case or start or default labels associate with the most deeply nested switch statements that enclose them.

Control is not impeded by case or start or default labels. To stop execution at the end of a part of the compound statement, insert a break statement. This transfers control to the statement after the switch statement.
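
The matching rules can be sketched as follows, with a hypothetical variable sFile and labels chosen for illustration. No case label equals the controlling expression here, so the start label that matches its first characters is selected:

local sFile = "README.txt";
switch(sFile) {
    case "Makefile":
        // exact match required
        traceLine("a makefile");
        break;
    start "README":
        // matches any value beginning with "README"
        traceLine("a readme file");
        break;
    default:
        traceLine("something else");
}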

2.5.4 The 'foreach' statement

The BNF representation of this statement is:
foreach_statement ::= "foreach" iterator "in" [direction]?
                [sorted_declaration]? [cascading_declaration]? list-node body_statement
direction ::= "reverse"
sorted_declaration ::= "sorted" ["no_case"]? ["by_value"]?
cascading_declaration ::= "cascading" ["first" | "last"]?

A foreach statement iterates all items of the list owned by node list-node. The iterator refers to the current item of the list, and the body statement is executed on it.

Items are iterated either in the order of entrance, or in alphabetical order if the option sorted is set. The sort operates on keys, unless the option by_value is set. The order is inverted if the option reverse is chosen. To ignore the case, sorted must be followed by no_case. If not, any uppercase letter is considered smaller than any lowercase letter.

      // file "Documentation/ForeachSampleSorted.cws":
      local list;
      insert list["silverware"] = "tea spoon";
      insert list["Mountain"] = "Everest";
      insert list["SilverWare"] = "Tea Spoon";
      insert list["Boat"] = "Titanic";
      insert list["acrobat"] = "Circus";
     
      traceLine("Sorted list in a classical order:");
      foreach i in sorted list {
          traceLine("\t" + key(i));
      }
      traceLine("Note that uppercases are listed before lowercases." + endl());
     
      traceLine("Sorted list where the case is ignored:");
      foreach i in sorted no_case list {
          traceLine("\t" + key(i));
      }
     
      traceLine("Reverse sorted list:");
      foreach i in reverse sorted list {
          traceLine("\t" + key(i));
      }
     
      traceLine("Reverse sorted list where the case is ignored:");
      foreach i in reverse sorted no_case list {
          traceLine("\t" + key(i));
      }

Output:

Sorted list in a classical order:
    Boat
    Mountain
    SilverWare
    acrobat
    silverware
Note that uppercases are listed before lowercases.

Sorted list where the case is ignored:
    acrobat
    Boat
    Mountain
    SilverWare
    silverware
Reverse sorted list:
    silverware
    acrobat
    SilverWare
    Mountain
    Boat
Reverse sorted list where the case is ignored:
    silverware
    SilverWare
    Mountain
    Boat
    acrobat

Control need not flow sequentially through the body statement: break and return exit the loop for good, and continue transfers control to the head of the foreach statement for the next iteration.
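
A minimal sketch of continue and break inside a foreach, on a hypothetical list whose keys and values are chosen for illustration:

local list;
insert list["a"] = "apple";
insert list["b"] = "";
insert list["c"] = "cherry";
insert list["d"] = "date";
insert list["e"] = "elderberry";
foreach i in list {
    // skip items whose value is an empty string
    if !i continue;
    // leave the loop as soon as key "d" is reached
    if key(i) == "d" break;
    traceLine(key(i) + " -> " + i);
}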

Option cascading allows propagating foreach on item nodes. The way it works is illustrated by an example:


    foreach i in cascading myObjectModeling.packages ...

At the beginning, i points to myObjectModeling.packages#front and the body is executed. Before iterating i to the next item, the foreach checks whether the item node myObjectModeling.packages#front owns attribute packages or not. If yes, it applies recursively foreach on myObjectModeling.packages#front.packages.

Option cascading avoids writing the following code:


function propagateOnPackages(myPackages : node) {
    foreach i in myPackages {
        // my code to apply on this package
        // then recurse if the current item owns attribute 'packages'
        if existVariable(i.packages)
            propagateOnPackages(i.packages);
    }
}
propagateOnPackages(myObjectModeling.packages);

Option cascading offers two behaviours:

2.5.5 The 'forfile' statement

The BNF representation of this statement is:
forfile_statement ::= "forfile" iterator "in" [sorted_declaration]? [cascading_declaration]? file-pattern body_statement
sorted_declaration ::= "sorted" ["no_case"]?
cascading_declaration ::= "cascading" ["first" | "last"]?

A forfile statement iterates over the names of all files that match the filter file-pattern. The iterator refers to the current item of the list composed of retained file names, and the body statement is executed on it. Note that the file pattern may begin with a path, which cannot contain wildcard characters ('*' and '?').

As for the foreach statement, items are iterated either in the order of entrance, or in alphabetical order of keys if the option sorted is set. To ignore the case, the option must be followed by no_case. If not, any uppercase letter is considered smaller than any lowercase letter.

Control need not flow sequentially through the body statement: break and return exit the loop for good, and continue transfers control to the head of the forfile statement for the next iteration.
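
A small sketch of a sorted, non-recursive iteration. The pattern "*.cws" and the variable names are assumptions made for illustration; the loop lists the matching files of the current directory, ignoring the case, and counts them:

local iCount = 0;
forfile sFile in sorted no_case "*.cws" {
    traceLine(sFile);
    increment(iCount);
}
traceLine(iCount + " script file(s) found");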

The option cascading allows propagating forfile on directories recursively. The way it works is illustrated by an example:

      // file "Documentation/ForfileSample.cws":
      local iIndex = 0;
      forfile i in cascading "*.html" {
          if $findString(i, "manual_") < 0$ &&
              $findString(i, "Bugs") < 0$ {
                  traceLine(i);
          }
          // if too long, stop the iteration
          if $iIndex > 15$ break;
          increment(iIndex);
      }

Output:

cs/DOTNET.html
cs/tests/data/MatchingTest/example.csv.html
Documentation/LastChanges.html
java/JAVAAPI.html
java/data/MatchingTest/example.csv.html
Scripts/Tutorial/GettingStarted/defaultDocumentation.html
WebSite/AllDownloads.html
WebSite/examples/basicInformation.html
WebSite/highlighting/basicInformation.html
WebSite/repository/highlighting.html
WebSite/repository/JEdit/Entity.java.cwt.html
WebSite/serewin/ExempleIllustre.html
WebSite/tutorials/DesignSpecificModeling/tutorial.html
WebSite/tutorials/DesignSpecificModeling/highlighting/demo.cws.html
WebSite/tutorials/overview/tinyDSL_spec.html
WebSite/tutorials/overview/scripts2HTML/CodeWorker_grammar.html

At the beginning, i points to the first HTML file of the current directory and the body is executed. Before iterating i to the next item, the forfile checks whether the directory of the current file owns subfolders or not. If yes, it applies recursively forfile on subfolders.

Option cascading offers two behaviours:

2.5.6 The 'select' statement

The BNF representation of this statement is:
select_statement ::= "select" iterator "in" [sorted_declaration]? node-motif body_statement
sorted_declaration ::= "sorted" first-key [',' other-key]*
first-key ::= branch
other-key ::= branch

A select statement iterates a list of nodes that match a motif expression. The iterator refers to the current item of the list composed of retained nodes, and the body statement is executed on it.

      // file "Documentation/SelectSample.cws":
      local a;
      pushItem a.b;
      pushItem a.b#back.c = "01";
      pushItem a.b#back.c = "02";
      pushItem a.b#back.c = "03";
      pushItem a.b;
      pushItem a.b#back.c = "11";
      pushItem a.b#back.c = "12";
      pushItem a.b#back.c = "13";
      pushItem a.b;
      pushItem a.b#back.c = "21";
      pushItem a.b#back.c = "22";
      pushItem a.b#back.c = "23";
      select i in a.b[].c[] {
          traceLine("i = "+ i);
      }

Output:

i = 01
i = 02
i = 03
i = 11
i = 12
i = 13
i = 21
i = 22
i = 23

As for the foreach statement, items are iterated either in the order of entrance, or according to the sorting result if the option sorted is set.

Control need not flow sequentially through the body statement: break and return exit the loop for good, and continue transfers control to the head of the select statement for the next iteration.

2.5.7 The 'try'/'catch' statement

The BNF representation of this statement is:
try-catch-statement ::= "try" try-statement "catch" '('error_message_variable')' catch-statement

Error handling is implemented by using the try, catch, and error keywords. With error handling, your program can communicate unexpected events to a higher execution context that is better able to recover from such abnormal events. These errors are handled by code that is outside the normal flow of control.

The compound statement after the try clause is the guarded section of code. An error is thrown (or raised) when command error(message-text) is called or when CodeWorker encounters an internal error. The compound statement after the catch clause is the error handler, and catches (handles) the error thrown. The catch clause statement indicates the name of the variable that must receive the error message.
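
A minimal sketch of a guarded section. The variable name sError is hypothetical, and declaring it before the try is a conservative assumption rather than a documented requirement. The exact text received in sError may contain more than the message passed to error():

local sError;
try {
    traceLine("before the error");
    error("something went wrong");
    // not reached: control has been transferred to the handler
    traceLine("after the error");
} catch(sError) {
    traceLine("error caught: '" + sError + "'");
}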

2.5.8 The 'exit' statement

The BNF representation of this statement is:
exit_statement ::= "exit" integer-expression ";"

An exit statement leaves the application and returns an error code, given by the integer-expression.

Example:

exit -1;

2.6 User-defined functions

The BNF representation of a user-defined function to implement is:
user-function ::= classical-function-definition | template-function-definition
classical-function-definition ::= classical-function-prototype compound-statement
classical-function-prototype ::= "function" function-name '(' parameters ')'
template-function-definition ::= see section 2.6.4, Template functions, for more information
parameters ::= parameter [',' parameter]*
parameter ::= argument [':' parameter-mode [':' default-value]? ]?
parameter-mode ::= "value" | "node" | "reference" | "index"
default-value ::= "project" | "this" | "null" | "true" | "false" | constant-string

The scripting language lets users implement their own functions. Parameters may be passed to the body of the function. A value may be returned by the function and, if so, the return type is necessarily a sequence of characters. Of course, functions manage their own stack, and so accept recursive calls.

An argument may have a default value, which is used if the parameter is missing in a call. All following arguments must then have default values too. A node argument can't have a constant string as a default value, but it can default to a global variable.
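
As an illustration of the grammar above, here is a sketch of a function whose second argument has a default value. The function greet and its argument names are hypothetical:

function greet(sName : value, sGreeting : value : "Hello") {
    return sGreeting + ", " + sName + "!";
}

// the second parameter is missing: the default value "Hello" is used
traceLine(greet("world"));
// both parameters are given explicitly
traceLine(greet("world", "Goodbye"));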

2.6.1 Parameters and return value

Arguments passed by parameter must be chosen among the following modes:

If you omit to return a value from a function, it returns an empty string; in that case, you are expected to call this function as a procedure, and the result isn't exploited. The special procedure nop takes a function call as parameter and allows executing the function and ignoring the result. It isn't compulsory to use nop for calling a function as a procedure. As in C or C++, you can type the function call followed by a semi-colon and the result is lost.

There are two possibilities for returning a value:

If you wish to execute a particular process in any case before leaving a function and:

2.6.2 The 'finally' statement

The finally statement guarantees that the block of instructions that follows the keyword will be systematically executed before leaving the function. This declaration may be placed anywhere in the body of the function. Its syntax conforms to:
finally-statement ::= "finally" compound-statement

Example:

      // file "Documentation/FinallySample.cws":
      1 function f(v : value) {
      2     traceLine("BEGIN f(v)");
      3     finally {
      4         traceLine("END f(v)");
      5     }
      6     // the body of the function, with more than
      7     // one way to exit the function, for example:
      8     if !v return "empty";
      9     if v == "1" return "first";
      10     if v == "2" return "second";
      11     if v == "3" return "third";
      12     return "other";
      13 }
      14
      15 traceLine("...f(1) has been executed and returned '" + f(1) + "'");

line 3: the finally statement may be put anywhere in the body,
line 4: this statement will be executed while exiting the function, even if an exception was raised.

Output:

BEGIN f(v)
END f(v)
...f(1) has been executed and returned 'first'

2.6.3 Unusual function declarations

It may happen that a function prototype must be declared before being implemented, because of a cross-reference with another function for instance. The scripting language offers the forward declaration to answer this need. To do that, the prototype of the function is written, preceded by the declare keyword:
forward-declaration ::= "declare" function-prototype ';'

If the body of the function must be implemented in another library, in C++ for example, the prototype of the function is preceded by the external keyword (see section C++ binding):
external-declaration ::= "external" function-prototype ';'
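
A sketch of a forward declaration resolving a cross-reference between two functions. The functions isEven and isOdd are hypothetical and only valid for non-negative integers:

declare function isOdd(n : value);

function isEven(n : value) {
    if $n == 0$ return true;
    // isOdd is known here thanks to the forward declaration
    return isOdd($n - 1$);
}

function isOdd(n : value) {
    if $n == 0$ return false;
    return isEven($n - 1$);
}

if isEven(4) traceLine("4 is even");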

2.6.4 Template functions

CodeWorker proposes a special category of functions called template functions. Because CodeWorker doesn't provide a typed scripting language, template must not be understood as it is commonly used in C++ for instance.

A template function represents a set of functions with the same prototype, except the dispatching constant. The dispatching constant is a constant string that extends the name of the function. These functions instantiate the template function for a particular dispatching constant. Each instantiated function implements its own body.

The BNF representation of a template function to implement is:
template-function-definition ::= instantiated-function-definition | generic-function-definition
instantiated-function-definition ::= instantiated-function-prototype compound-statement
instantiated-function-prototype ::= "function" function-name '<' dispatching-constant '>' '(' parameters ')'
dispatching-constant ::= a constant string between double quotes
generic-function-definition ::= generic-function-prototype [compound-statement | template-based-body]
generic-function-prototype ::= "function" function-name '<' generic-key '>' '(' parameters ')'
generic-key ::= an identifier that matches any dispatching constant with no attached prototype
template-based-body ::= "{{" template-based-script "}}"
template-based-script ::= a piece of template-based script describing the generic implementation

A call to a template function requires a dispatching expression to determine the dispatching constant. The dispatching expression is evaluated during the execution and CodeWorker resolves which instantiated function of this template to call: the result of the dispatching expression must match the dispatching constant of the instantiated function. The BNF representation of a call to a template function is:
instantiated-function-call ::= function-name '<' dispatching-expression '>' '(' parameters ')'
parameters ::= expression [',' expression]*

Note that a dispatching constant may be empty and such an instantiated function can be called as a classical function. In fact, classical functions are considered as instantiated functions where the dispatching constant is empty.

Template functions bring generic programming to the language. Let us imagine that we need a function getType(myType : node), with a variant for every language we could have to generate (C++, Java, ...). Normally, you would write the following lines to recover the type depending on the language for which you are producing the source code:


if doc_language == "C++" {
    sType = getCppType(myParameterType);
} else if doc_language == "JAVA" {
    sType = getJAVAType(myParameterType);
} else {
    error("unrecognized language '" + doc_language + "'");
}

Thanks to template functions, you may replace the preceding lines with the following one:


sType = getType<doc_language>(myParameterType);

with:


function getType<"JAVA">(myType : node) {
    ... // implementation for returning a Java type
}

function getType<"C++">(myType : node) {
    ... // implementation for returning a C++ type
}

During the execution, the function getType<T>(myType : node) resolves on what instantiated function it has to dispatch: either getType<"JAVA">(myType : node) or getType<"C++">(myType : node), depending on what value is assigned to variable doc_language.

Trying to call an instantiated function that doesn't exist raises an error at runtime. However, one might provide a default implementation. For instance:


function getType<T>(myType : node) {
    ... // common implementation for any unrecognized language
}

For those that know generic programming with C++ templates, here is a classical example of using template functions:


function f<1>() { return 1; }
function f<N>() { return $N*f<$N - 1$>()$; }
local f10 = f<10>();
if $f10 != 3628800$ error("10! should be worth 3628800");
traceLine("10! = " + f10);

Output:

10! = 3628800

To provide more flexibility in the implementation of the template function, depending on the generic key <T>, the body admits a template-based script to implement the source code of the function. The specialization of the function for a given template instantiation key is then resolved at runtime.

Example:
The template function f inserts a new attribute in a tree node. The attribute has the name passed to the generic key for instantiation, and the value of the instantiation key is assigned to the new attribute. Then, the function calls itself recursively on the instantiation key without the last character.
For instance, the source code of f<"field"> should be:

function f<"field">(x : node) {
      insert x.field = "field";
      f<"fiel">(x); // cut the last character
}

Code:

// a synonym of f<"">(x : node), terminal condition for recursive calls
function f(x : node) {/*does nothing*/}

function f<T>(x : node) {{
      // '{{' announces a template-based script, which
      // will generate the correct implementation during the instantiation
      insert x.@T@ = "@T@";
      f<"@T.rsubString(1)@">(x);
@
      // '}}' announces the end of the template-based script
}}

f<"field">(project);
traceObject(project);

Output:

Tracing variable 'project':
      field = "field"
      fiel = "fiel"
      fie = "fie"
      fi = "fi"
      f = "f"
End of variable's trace 'project'.

2.6.5 Methods

For more readability, syntactical facilities are offered to call functions on a node as if this function was a method of the node. For example, it is possible to call function leftString on the node a like this: a.leftString(2), instead of the classical functional form: leftString(a, 2).

The rule is that every function (user-defined included) whose first argument is passed either by value or by node or by index (but never by reference) can propose a method call.

In that case, the method call applies on the first argument, which has to be a node. The BNF representation of a method call is:
method-call ::= variable '.' function-name '(' parameters ')'
parameters ::= expression [',' expression]*
where parameters have missed the first argument of the function called function-name.
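
The two notations can be compared on the leftString example mentioned above. This is a sketch with a hypothetical variable sText; both calls are expected to yield the first four characters of the text:

local sText = "CodeWorker";
// classical functional form
traceLine(leftString(sText, 4));
// method form: sText stands for the first argument of leftString
traceLine(sText.leftString(4));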

There are some exceptions where the method doesn't apply to the first argument:

The following methods offer a synonym to the function name:

2.6.6 The 'readonly' hook

The BNF representation of this statement is:
readonlyHook-statement ::= "readonlyHook" '(' filename ')' compound-statement

The token filename is the argument name that the user chooses for passing the name of the file to the body of the hook.

This special function allows implementing a hook that is called each time a read-only file is encountered while generating the output file through the generate or expand instruction.

Limitations: only one declaration of this hook is authorized, and it can't be declared inside a parsing or pattern script.

Example:

Common usage: file to generate has to be checked out from a source code control system (see system command to run executables).

readonlyHook(sFilename) {
  if !getProperty("SSProjectFolder") || !getProperty("SSWorkingFolder") || !getProperty("SSExecutablePath") || !getProperty("SSArchiveDir") {
    traceLine("WARNING: properties 'SSProjectFolder' and 'SSWorkingFolder' and 'SSExecutablePath' and 'SSArchiveDir' should be passed to the command line for checking out read-only files from Source Safe");
  } else {
    if startString(sFilename, getProperty("SSWorkingFolder")) {
      local sourceSafe;
      insert sourceSafe.fileName = sFilename;
      generate("SourceSafe.cwt", sourceSafe, getEnv("TMP") + "/SourceSafe.bat");
      if sourceSafe.isOk {
        putEnv("SSDIR", getProperty("SSArchiveDir"));
        traceLine("checking out '" + sFilename + "' from Source Safe archive '" + getProperty("SSArchiveDir") + "'");
        local sFailed = system(getEnv("TMP") + "/SourceSafe.bat");
        if sFailed {
          traceLine("Check out failed: '" + sFailed + "'");
        }
      }
    } else {
      traceLine("Unable to check out '" + sFilename + "': working folder starting with '" + getProperty("SSWorkingFolder") + "' expected");
    }
  }
}

2.6.7 The 'write file' hook

This special function allows implementing a hook that will be called just before writing a file, after ending a text generation process such as expanding or generating or translating text.

It is very important to notice that it returns a boolean value. A true value means that the generated text must be written into the file. A false boolean value means that the generated text doesn't have to be written into the file.

CodeWorker always interprets a function that doesn't return a value explicitly as returning an empty string. If you forget to return a value, the generated text will not be written into the file!

The BNF representation of this statement is:
writefileHook-statement ::= "writefileHook" '(' filename ',' position ',' creation ')' compound-statement

Argument  Type     Description
filename  string   The argument name that the user chooses for passing the file name to the body of the hook.
position  int      The argument name that the user chooses for passing a position where a difference occurs between the new generated version of the file and the previous one. If the files don't have the same size, the position is worth -1.
creation  boolean  The argument name that the user chooses for passing whether the file is created or updated. The argument is worth true if the file doesn't exist yet.

Limitations: only one declaration of this hook is authorized, and it can't be declared inside a parsing or pattern script.

Example:

writefileHook(sFilename, iPosition, bCreation) {
    if bCreation {
        traceLine("Creating file '" + sFilename + "'!");
    } else {
        traceLine("Updating file '" + sFilename + "', difference at " + iPosition + "!");
    }
    return true;
}

2.6.8 The 'step into' hook

This special function is automatically called just before the extended BNF engine resolves the production rule of a BNF non-terminal. Combined with stepoutHook(), it is very useful for trace and debug tasks.

This hook can be implemented in parse scripts only.

The BNF representation of this statement is:
stepintoHook-statement ::= "stepintoHook" '(' sClauseName ',' localScope ')' compound-statement

Argument     Type    Description
sClauseName  string  The name of the non-terminal.
localScope   tree    The scope of parameters used in the production rule.

2.6.9 The 'step out' hook

This special function is automatically called once the extended BNF engine has finished the resolution of a BNF non-terminal. Combined with stepintoHook(), it is very useful for trace and debug tasks.

This hook can be implemented in parse scripts only.

The BNF representation of this statement is:
stepoutHook-statement ::= "stepoutHook" '(' sClauseName ',' localScope ',' bSuccess ')' compound-statement

Argument     Type     Description
sClauseName  string   The name of the non-terminal.
localScope   tree     The scope of local variables and parameters used in the production rule.
bSuccess     boolean  Whether the resolution of the production rule has succeeded or not.
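
As a sketch of how the two hooks combine for tracing, the following declarations, to be placed in a parse script, report each non-terminal as the engine enters and leaves it. The trace messages are chosen for illustration, and the argument names simply reuse those of the BNF above:

stepintoHook(sClauseName, localScope) {
    traceLine("entering " + sClauseName);
}

stepoutHook(sClauseName, localScope, bSuccess) {
    if bSuccess {
        traceLine("leaving " + sClauseName + ": matched");
    } else {
        traceLine("leaving " + sClauseName + ": failed");
    }
}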

2.7 Statement's modifiers

A statement's modifier is a directive that stands just before a statement, meaning an instruction or a compound statement.

This directive operates some actions in the scope of the statement and then restores the behaviour as being before.

This action may be:

2.7.1 Statement's modifier 'delay'

This keyword stands just before an instruction or a compound statement. It executes the statement and then, it measures the time it has consumed.

The function getLastDelay() gives you the last measured duration.

Example:


local list;
local iIndex = 4;
delay while isPositive(decrement(iIndex)) {
    pushItem list = "element " + iIndex;
    traceLine("creating node '" + list#back + "'");
}
traceLine("time of execution = " + getLastDelay() + " seconds");

Output:

creating node 'element 3'
creating node 'element 2'
creating node 'element 1'
time of execution = 0.000037079177335661762 seconds

2.7.2 Statement modifier 'quiet'

This keyword stands just before an instruction or a compound statement. It executes the statement, and all messages intended for the console are concatenated into a string instead of being displayed. The variable that receives the concatenation of messages is specified after the quiet keyword.

The BNF representation of the quiet statement modifier looks like:
quiet_modifier ::= "quiet" '(' variable ')' statement

Note that the variable must have been declared before, as a local one or as an attribute of the parse tree. If this variable doesn't exist while executing the statement, an error is raised.
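
A minimal sketch, with a hypothetical variable sMessages declared beforehand as required: the message traced inside the quiet statement is not displayed but captured, and is displayed afterwards by the last line:

local sMessages;
quiet(sMessages) {
    traceLine("this message is captured instead of being displayed");
}
traceLine("captured text: '" + sMessages + "'");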

2.7.3 Statement modifier 'new project'

This keyword stands just before an instruction or a compound statement. A new, empty project parse tree is created and temporarily replaces the current one. The statement is executed and, once control leaves the statement, the temporary parse tree is removed, and the previous project comes back as the current one.

The BNF representation of the new_project statement modifier looks like:
new_project_modifier ::= "new_project" statement

This statement modifier is useful to handle a task that doesn't have to interact with the main parse tree.
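
A sketch of a task isolated from the main parse tree. The attribute name scratch is hypothetical; it is inserted into the temporary project only and disappears with it:

new_project {
    // 'project' designates the temporary, initially empty parse tree
    insert project.scratch = "temporary data";
    traceObject(project);
}
// here, 'project' designates the main parse tree again
traceObject(project);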

2.7.4 Statement modifier 'file as standard input'

This keyword stands just before an instruction or a compound statement. A new standard input is opened for reading data. Generally, the keyboard is the standard input, but here, it will be the content of the file whose name is passed to the argument filename. Once the execution of the statement has completed, the previous standard input comes back.

The BNF representation of the file_as_standard_input statement's modifier looks like:
file_as_standard_input_modifier ::= "file_as_standard_input" '(' filename ')' statement

This statement modifier is useful to replay a sequence of commands for the debugger or to drive the standard input from an external module that puts its instructions into a file for a batch mode or anything else.

2.7.5 Statement modifier 'string as standard input'

This keyword stands just before an instruction or a compound statement. A new standard input is opened for reading data. Generally, the keyboard is the standard input, but here, it will be the content of the string that is passed as argument. Once the execution of the statement has completed, the previous standard input comes back.

The BNF representation of the string_as_standard_input statement's modifier looks like:
string_as_standard_input_modifier ::= "string_as_standard_input" '(' expression ')' statement

The standard input is the result of evaluating expression.

This statement modifier is useful to drive the standard input of CodeWorker from an external module, such as a JNI library or an external C++ application (see chapter external bindings).
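
A sketch of the modifier in use. It assumes that the function inputLine(), called here with a single boolean argument that disables the echo, reads one line from the standard input; this signature is an assumption to check against the reference of built-in functions:

string_as_standard_input("first line" + endl() + "second line" + endl()) {
    // reads from the string instead of the keyboard
    local sLine = inputLine(false);
    traceLine("read from the string: '" + sLine + "'");
}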

2.7.6 Statement modifier 'parsed file'

This keyword stands just before an instruction or a compound statement that belongs to a parsing/translation script exclusively. A new input file is opened for source scanning and temporarily replaces the previous one during the execution of the statement. The statement is executed and, once the controlling sequence leaves the statement, the input file is closed properly and the previous one comes back.

The BNF representation of the parsed_file statement modifier looks like:
parsed_file_modifier ::= "parsed_file" '(' filename ')' statement

The token filename is an expression that is evaluated to give the name of the input file.

This statement modifier is useful to handle a task that must redirect the text to parse into another input file. An example could be to emulate the C++ preprocessing on #include directives.
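
The #include use case can be pictured with a short Python sketch (an analogy, not CodeWorker code). It assumes the files are held in an in-memory dictionary named files and uses a hypothetical function scan; when an #include line is met, scanning switches to the included file and then resumes in the including one.

```python
import re

# Matches lines of the form: #include "name"
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def scan(filename, files, seen=()):
    """Return the lines of `filename` with #include directives expanded."""
    if filename in seen:
        raise ValueError("circular include: " + filename)
    output = []
    for line in files[filename].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Switch temporarily to the included file, then resume here.
            output.extend(scan(match.group(1), files, seen + (filename,)))
        else:
            output.append(line)
    return output


files = {
    "main.cpp": '#include "util.h"\nint main() { return twice(2); }',
    "util.h": "int twice(int x) { return 2 * x; }",
}
expanded = scan("main.cpp", files)
```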

2.7.7 Statement modifier 'parsed string'

This keyword stands just before an instruction or a compound statement that belongs to a parsing/translation script exclusively. The result of an expression is taken as the source to scan and temporarily replaces the previous input during the execution of the statement. The statement is executed and, once the controlling sequence leaves the statement, the previous input comes back.

The BNF representation of the parsed_string statement modifier looks like:
parsed_string_modifier ::= "parsed_string" '(' expression ')' statement

The token expression is an expression that is evaluated to give the text to scan.

This statement modifier is useful to handle a task that must temporarily parse a string.

2.7.8 Statement modifier 'generated file'

This keyword stands just before an instruction or a compound statement that belongs to a pattern script exclusively. A new output file is opened for source code generation, preserving protected areas as usual, and temporarily replaces the current one during the execution of the statement. The statement is executed and, once the controlling sequence leaves the statement, the output file is closed properly and the previous one takes its place.

The BNF representation of the generated_file statement modifier looks like:
generated_file_modifier ::= "generated_file" '(' filename ')' statement

The token filename is an expression that is evaluated to give the name of the output file.

This statement modifier is useful to handle a task that must redirect the generated text into another output file. An example could be to split an HTML text to generate into a few files for implementing a frame set.

2.7.9 Statement modifier 'generated string'

This keyword stands just before an instruction or a compound statement that belongs to a pattern script exclusively. The output stream is redirected into a variable that temporarily replaces the current output stream during the execution of the statement. The statement is executed and, once the controlling sequence leaves the statement, the variable is populated with the output produced within this scope and the previous output stream takes its place.

The BNF representation of the generated_string statement modifier looks like:
generated_string_modifier ::= "generated_string" '(' variable ')' statement

The variable argument gives the name of the variable that will be populated with the generated text. This variable must already exist, either declared on the stack or referring to a node of the current parse tree.
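
A rough Python analogy of this redirection is given below. It is not CodeWorker code: emit_header is a hypothetical generating routine, and the standard output plays the role of the output stream of a pattern script.

```python
import io
from contextlib import redirect_stdout


def emit_header(name):
    # Hypothetical generating routine: it writes to the current output.
    print("class " + name + " {")


buffer = io.StringIO()
with redirect_stdout(buffer):  # output is captured for this scope only
    emit_header("Planet")
    print("};")

# Once the scope is left, the variable receives the generated text.
text = buffer.getvalue()
```

The generating code is unchanged; only the destination of its output differs during the scope.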

2.7.10 Statement modifier 'appended file'

This keyword stands just before an instruction or a compound statement that belongs to a pattern script exclusively. A new output file is opened so that generated source code is appended at the end of the file, and it temporarily replaces the current one during the execution of the statement. The statement is executed and, once the controlling sequence leaves the statement, the output file is closed properly and the previous one takes its place.

The BNF representation of the appended_file statement modifier looks like:
appended_file_modifier ::= "appended_file" '(' filename ')' statement

The token filename is an expression that is evaluated to give the name of the output file to append.

3 Common functions and procedures

All functions and procedures described below may be encountered in any kind of script: parsing, source code generation and file expanding, process driving, included script files.

Category interpreter: Function for running a CodeWorker script
autoexpand Expands a file on markups, following the directives self-contained in the file.
executeString Executes a script given in a string.
executeStringQuiet Interprets a string as a script and returns all traces intended to the console.
expand Expands a file on markups, following the directives of a template-based script.
extendExecutedScript Extends the current executed script dynamically with the content of the string.
generate Generates a file, following the directives of a template-based script.
generateString Generates a string, following the directives of a template-based script.
parseAsBNF Parses a file with a BNF script.
parseFree Parses a file with an imperative script.
parseFreeQuiet Parses a file with an imperative script, reroutes all console messages and returns them as a string.
parseStringAsBNF Parses a string with a BNF script.
traceEngine Displays the state of the interpreter.
translate Performs a source-to-source translation or a program transformation.
translateString Performs a source-to-source translation or a program transformation on strings.

Category string: Functions for handling strings
charAt Returns the character present at a given position of a string.
completeLeftSpaces Completes a string with spaces to the left so that it reaches a given size.
completeRightSpaces Completes a string with spaces to the right so that it reaches a given size.
composeAdaLikeString Converts a sequence of characters to an Ada-like string without double quote delimiters.
composeCLikeString Converts a sequence of characters to a C-like string without double quote delimiters.
composeHTMLLikeString Converts a sequence of characters to an HTML-like text
composeSQLLikeString Converts a sequence of characters to a SQL-like string without single quote delimiters.
coreString Extracts the core of a string, leaving the beginning and the end.
countStringOccurences Counts the occurrences of a string in another.
cutString Cuts a string at each separator encountered.
endString Compares the end of the string.
endl Returns an end-of-line, depending on the operating system.
equalsIgnoreCase Compares two strings, ignoring the case.
executeString Executes a script given in a string.
executeStringQuiet Interprets a string as a script and returns all traces intended to the console.
findFirstChar Returns the position of the first character amongst a set, encountered into a string.
findLastString Returns the position of the last occurrence of a string in another.
findNextString Returns the next occurrence of a string in another.
findString Returns the first occurrence of a string in another.
generateString Generates a string, following the directives of a template-based script.
joinStrings Joins a list of strings, adding a separator between them.
leftString Returns the beginning of a string.
lengthString Returns the length of a string.
midString Returns a substring starting at a point for a given length.
parseStringAsBNF Parses a string with a BNF script.
repeatString Returns the concatenation of a string repeated a few times.
replaceString Replaces a substring with another.
replaceTabulations Replaces tabulations with spaces.
rightString Returns the end of a string.
rsubString Returns the left part of a string, ignoring last characters.
startString Checks the beginning of a string.
subString Returns a substring, ignoring the first characters.
toLowerString Converts a string to lowercase.
toUpperString Converts a string to uppercase.
trim Eliminates leading and trailing whitespaces.
trimLeft Eliminates the leading whitespaces.
trimRight Eliminates the trailing whitespaces.
truncateAfterString Special truncation of a string.
truncateBeforeString Special truncation of a string.

Category array: Functions handling arrays
findElement Checks the existence of an entry key in an array.
findFirstSubstringIntoKeys Returns the first entry key of an array, containing a given string.
findNextSubstringIntoKeys Returns the next entry key of an array, containing a given string.
getArraySize Returns the number of items in an array.
insertElementAt Inserts a new element to a list, at a given position.
invertArray Inverts the order of items in an array.
isEmpty Checks whether a node has items or not.
removeAllElements Removes all items of the array.
removeElement Removes an item, given its entry key.
removeFirstElement Removes the first item of the array.
removeLastElement Removes the last item of the array.

Category node: Functions handling a node
clearVariable Removes the subtree and assigns an empty value.
equalTrees Compares two subtrees.
existVariable Checks the existence of a node.
getVariableAttributes Extracts all attribute names of a tree node.
removeRecursive Removes a given attribute from the subtree.
removeVariable Removes a given variable.
slideNodeContent Moves the subtree elsewhere on a branch.
sortArray Sorts an array, considering the entry keys.

Category iterator: Functions handling an iterator
createIterator Creates an iterator pointing to the beginning of a list.
createReverseIterator Creates a reverse iterator pointing to the end of a list.
duplicateIterator Duplicates an iterator.
first Returns true if the iterator points to the first item.
index Returns the position of an item in a list.
key Returns the entry key of the item pointed to by the iterator.
last Returns true if the iterator points to the last item.
next Moves an iterator to the next item of a list.
prec Moves an iterator to the previous item of a list.

Category file: Functions handling files
appendFile Writes the content of a string to the end of a file
canonizePath Builds an absolute path, starting from the current directory.
changeFileTime Changes the access and modification times of a file.
chmod Changes the permissions of a file.
copyFile Copies a file.
copyGenerableFile Copies a file with protected areas or expandable markups, only if the hand-typed code differs between source and destination.
copySmartFile Copies a file only if the destination differs.
createVirtualFile Creates a transient file in memory.
createVirtualTemporaryFile Creates a transient file in memory, CodeWorker choosing its name.
deleteFile Deletes a file on the disk.
deleteVirtualFile Deletes a transient file from memory.
existFile Checks the existence of a file.
existVirtualFile Checks the existence of a transient file, created in memory.
exploreDirectory Browses all files of a directory, recursively or not.
fileCreation Returns the creation date of a file.
fileLastAccess Returns the last access date of a file.
fileLastModification Returns the last modification date of a file.
fileLines Returns the number of lines in a file.
fileMode Returns the permissions of a file.
fileSize Returns the size of a file.
getGenerationHeader Returns the comment to put into the header of generated files.
getShortFilename Returns the short name of a file
indentFile Indents a file, depending on the target language.
loadBinaryFile Loads a binary file and stores each byte in a hexadecimal representation of 2 digits.
loadFile Returns the content of a file or raises an error if not found.
loadVirtualFile Returns the content of a transient file or raises an error if not found.
pathFromPackage Converts a package path to a directory path.
relativePath Returns the relative path, which allows going from a path to another.
resolveFilePath Gives the location of a file with no ambiguity.
saveBinaryToFile Saves binary data to a file.
saveToFile Saves the content of a string to a file
scanDirectories Explores a directory, filtering filenames.
scanFiles Returns a flat list of all filenames matching with a filter.

Category directory: Functions handling directories
changeDirectory Changes the current directory (chdir() in C).
copySmartDirectory Copies files of a directory recursively only when destination files differ from source files.
createDirectory Creates a new directory.
existDirectory Checks the existence of a directory.
exploreDirectory Browses all files of a directory, recursively or not.
getCurrentDirectory Returns the current directory (getcwd() in C).
removeDirectory Removes a directory from the disk.
scanDirectories Explores a directory, filtering filenames.
scanFiles Returns a flat list of all filenames matching with a filter.

Category URL: Functions working on URL transfers (HTTP,...)
decodeURL Decodes an HTTP URL.
encodeURL Encodes a URL to HTTP.
getHTTPRequest Sends an HTTP GET request.
postHTTPRequest Sends an HTTP POST request.
sendHTTPRequest Sends an HTTP request.

Category datetime: Functions handling date-time
addToDate Changes a date by shifting its internal fields days/months/years or time.
compareDate Compares two dates.
completeDate Extends an incomplete date with today characteristics.
fileCreation Returns the creation date of a file.
fileLastAccess Returns the last access date of a file.
fileLastModification Returns the last modification date of a file.
formatDate Changes the format of a date.
getLastDelay Returns the time consumed to execute a statement.
getNow Returns the current date-time.
setNow Fixes the current date-time.

Category numeric: Functions handling numbers
add Equivalent admitted writing is $a + b$.
ceil Returns the smallest integer greater than or equal to a number.
decrement Equivalent admitted writing is set a = $a - 1$;.
div Equivalent admitted writing is $a / b$.
equal Equivalent admitted writing is $a == b$.
exp Returns the exponential of a value.
floor Returns the largest integer less than or equal to a number.
increment Equivalent admitted writing is set a = $a + 1$;.
inf Equivalent admitted writing is $a < b$.
isNegative Equivalent admitted writing is $a < 0$.
isPositive Equivalent admitted writing is $a > 0$.
log Returns the Neperian logarithm.
mod Equivalent admitted writing is $a % b$.
mult Equivalent admitted writing is $a * b$.
pow Raises a number to the power of another.
sqrt Calculates the square root.
sub Equivalent admitted writing is $a - b$.
sup Equivalent admitted writing is $a > b$.

Category standard: Classical functions of any standard library
UUID Generates a UUID.
error Raises an error message
inputKey If any, returns the last key pressed on the standard input.
inputLine Waits for a line from the standard input of the console.
isIdentifier Checks whether a string is a C-like identifier or not.
isNumeric Checks whether a string is a floating-point number or not.
randomInteger Generates a pseudorandom number.
randomSeed Changes the seed of the pseudorandom generator.
traceLine Displays a message to the console, adding a carriage return.
traceObject Displays the content of a node to the console.
traceStack Displays the stack to the console.
traceText Displays a message to the console.

Category conversion: Type conversion
byteToChar Converts a byte (hexadecimal representation of 2 digits) to a character.
bytesToLong Converts a 4-bytes sequence to an unsigned long integer in its decimal representation.
bytesToShort Converts a 2-bytes sequence to an unsigned short integer in its decimal representation.
charToByte Converts a character to a byte (hexadecimal representation of 2 digits).
charToInt Converts a character to the integer value of the corresponding ASCII.
hexaToDecimal Converts a hexadecimal representation to an integer.
hostToNetworkLong Converts a 4-bytes representation of a long integer to the network bytes order.
hostToNetworkShort Converts a 2-bytes representation of a short integer to the network bytes order.
longToBytes Converts an unsigned long integer in decimal base to its 4-bytes representation.
networkLongToHost Converts a 4-bytes representation of a long integer to the host bytes order.
networkShortToHost Converts a 2-bytes representation of a short integer to the host bytes order.
octalToDecimal Converts an octal representation to a decimal integer.
shortToBytes Converts an unsigned short integer in decimal base to its 2-bytes representation.

Category system: Functions relative to the operating system
computeMD5 Computes the MD5 of a string.
environTable Equivalent of environ() in C
existEnv Checks the existence of an environment variable.
getEnv Returns an environment variable, or raises an error if it doesn't exist.
openLogFile Opens a log file for logging every console trace.
putEnv Puts a value to an environment variable.
sleep Suspends the execution for millis milliseconds.
system Equivalent to the C function system().

Category command: Relative to the command line
compileToCpp Translates a script to C++.
getIncludePath Returns the include path passed via the option -I.
getProperty Returns the value of a property passed via the option -D.
getVersion Returns the version of the interpreter.
getWorkingPath Returns the output directory passed via option -path.
setIncludePath Changes the option -I while running.
setProperty Adds/changes a property (option -D) while running.
setVersion Gives the version of scripts currently interpreted by CodeWorker.
setWorkingPath Does the job of the option -path.

Category generation: Functions relative to generation
addGenerationTagsHandler Adds your own CodeWorker's tags handler
autoexpand Expands a file on markups, following the directives self-contained in the file.
expand Expands a file on markups, following the directives of a template-based script.
extractGenerationHeader Gives the generation header of a generated file, if any.
generate Generates a file, following the directives of a template-based script.
generateString Generates a string, following the directives of a template-based script.
getCommentBegin Returns the current format of a comment's beginning.
getCommentEnd Returns the current format of a comment's end.
getGenerationHeader Returns the comment to put into the header of generated files.
getTextMode Returns the text mode amongst "DOS", "UNIX" and "BINARY".
getWriteMode Returns how text is written during a generation (insert/overwrite).
listAllGeneratedFiles Gives the list of all generated files.
removeGenerationTagsHandler Removes a custom generation tags handler
selectGenerationTagsHandler Selects your own CodeWorker's tags handler for processing generation tasks
setCommentBegin Changes what a beginning of comment looks like, perhaps before expanding a file.
setCommentEnd Changes what an end of comment looks like, perhaps before expanding a file.
setGenerationHeader Specifies a comment to put at the beginning of every generated file.
setTextMode "DOS", "UNIX" or "BINARY"
setWriteMode Selects how to write text during a generation (insert/overwrite).
translate Performs a source-to-source translation or a program transformation.
translateString Performs a source-to-source translation or a program transformation on strings.

Category parsing: Functions relative to scanning/parsing
parseAsBNF Parses a file with a BNF script.
parseFree Parses a file with an imperative script.
parseFreeQuiet Parses a file with an imperative script, reroutes all console messages and returns them as a string.
parseStringAsBNF Parses a string with a BNF script.
translate Performs a source-to-source translation or a program transformation.
translateString Performs a source-to-source translation or a program transformation on strings.

Category socket: Socket operations
acceptSocket Listens for a client connection and accepts it.
closeSocket Closes a socket descriptor.
createINETClientSocket Creates a stream socket connected to the specified port and IP address.
createINETServerSocket Creates a server stream socket bound to a specified port.
receiveBinaryFromSocket Reads binary data from the socket, knowing the size.
receiveFromSocket Reads text or binary data from a socket.
receiveTextFromSocket Reads text from a socket, knowing the size.
sendBinaryToSocket Writes binary data to a socket.
sendTextToSocket Writes text to a socket.

Category unknown: Various types of function
loadProject Loads a parse tree previously saved thanks to saveProject().
not The boolean negation, equivalent to !a.
produceHTML
saveProject Saves a parse tree to XML or to a particular text format.
saveProjectTypes Factorizes nodes of the projects to distinguish implicit types for node and saves it to XML.

3.1 acceptSocket

3.2 add

3.3 addGenerationTagsHandler

3.4 addToDate

3.5 appendFile

3.6 autoexpand

3.7 bytesToLong

3.8 bytesToShort

3.9 byteToChar

3.10 canonizePath

3.11 ceil

3.12 changeDirectory

3.13 changeFileTime

3.14 charAt

3.15 charToByte

3.16 charToInt

3.17 chmod

3.18 clearVariable

3.19 closeSocket

3.20 compareDate

3.21 compileToCpp

3.22 completeDate

3.23 completeLeftSpaces

3.24 completeRightSpaces

3.25 composeAdaLikeString

3.26 composeCLikeString

3.27 composeHTMLLikeString

3.28 composeSQLLikeString

3.29 computeMD5

3.30 copyFile

3.31 copyGenerableFile

3.32 copySmartDirectory

3.33 copySmartFile

3.34 coreString

3.35 countStringOccurences

3.36 createDirectory

3.37 createINETClientSocket

3.38 createINETServerSocket

3.39 createIterator

3.40 createReverseIterator

3.41 createVirtualFile

3.42 createVirtualTemporaryFile

3.43 cutString

3.44 decodeURL

3.45 decrement

3.46 deleteFile

3.47 deleteVirtualFile

3.48 div

3.49 duplicateIterator

3.50 encodeURL

3.51 endl

3.52 endString

3.53 environTable

3.54 equal

3.55 equalsIgnoreCase

3.56 equalTrees

3.57 error

3.58 executeString

3.59 executeStringQuiet

3.60 existDirectory

3.61 existEnv

3.62 existFile

3.63 existVariable

3.64 existVirtualFile

3.65 exp

3.66 expand

3.67 exploreDirectory

3.68 extendExecutedScript

3.69 extractGenerationHeader

3.70 fileCreation

3.71 fileLastAccess

3.72 fileLastModification

3.73 fileLines

3.74 fileMode

3.75 fileSize

3.76 findElement

3.77 findFirstChar

3.78 findFirstSubstringIntoKeys

3.79 findLastString

3.80 findNextString

3.81 findNextSubstringIntoKeys

3.82 findString

3.83 first

3.84 floor

3.85 formatDate

3.86 generate

3.87 generateString

3.88 getArraySize

3.89 getCommentBegin

3.90 getCommentEnd

3.91 getCurrentDirectory

3.92 getEnv

3.93 getGenerationHeader

3.94 getHTTPRequest

3.95 getIncludePath

3.96 getLastDelay

3.97 getNow

3.98 getProperty

3.99 getShortFilename

3.100 getTextMode

3.101 getVariableAttributes

3.102 getVersion

3.103 getWorkingPath

3.104 getWriteMode

3.105 hexaToDecimal

3.106 hostToNetworkLong

3.107 hostToNetworkShort

3.108 increment

3.109 indentFile

3.110 index

3.111 inf

3.112 inputKey

3.113 inputLine

3.114 insertElementAt

3.115 invertArray

3.116 isEmpty

3.117 isIdentifier

3.118 isNegative

3.119 isNumeric

3.120 isPositive

3.121 joinStrings

3.122 key

3.123 last

3.124 leftString

3.125 lengthString

3.126 listAllGeneratedFiles

3.127 loadBinaryFile

3.128 loadFile

3.129 loadProject

3.130 loadVirtualFile

3.131 log

3.132 longToBytes

3.133 midString

3.134 mod

3.135 mult

3.136 networkLongToHost

3.137 networkShortToHost

3.138 next

3.139 not

3.140 octalToDecimal

3.141 openLogFile

3.142 parseAsBNF

3.143 parseFree

3.144 parseFreeQuiet

3.145 parseStringAsBNF

3.146 pathFromPackage

3.147 postHTTPRequest

3.148 pow

3.149 prec

3.150 produceHTML

3.151 putEnv

3.152 randomInteger

3.153 randomSeed

3.154 receiveBinaryFromSocket

3.155 receiveFromSocket

3.156 receiveTextFromSocket

3.157 relativePath

3.158 removeAllElements

3.159 removeDirectory

3.160 removeElement

3.161 removeFirstElement

3.162 removeGenerationTagsHandler

3.163 removeLastElement

3.164 removeRecursive

3.165 removeVariable

3.166 repeatString

3.167 replaceString

3.168 replaceTabulations

3.169 resolveFilePath

3.170 rightString

3.171 rsubString

3.172 saveBinaryToFile

3.173 saveProject

3.174 saveProjectTypes

3.175 saveToFile

3.176 scanDirectories

3.177 scanFiles

3.178 selectGenerationTagsHandler

3.179 sendBinaryToSocket

3.180 sendHTTPRequest

3.181 sendTextToSocket

3.182 setCommentBegin

3.183 setCommentEnd

3.184 setGenerationHeader

3.185 setIncludePath

3.186 setNow

3.187 setProperty

3.188 setTextMode

3.189 setVersion

3.190 setWorkingPath

3.191 setWriteMode

3.192 shortToBytes

3.193 sleep

3.194 slideNodeContent

3.195 sortArray

3.196 sqrt

3.197 startString

3.198 sub

3.199 subString

3.200 sup

3.201 system

3.202 toLowerString

3.203 toUpperString

3.204 traceEngine

3.205 traceLine

3.206 traceObject

3.207 traceStack

3.208 traceText

3.209 translate

3.210 translateString

3.211 trim

3.212 trimLeft

3.213 trimRight

3.214 truncateAfterString

3.215 truncateBeforeString

3.216 UUID

4 The extended BNF syntax for parsing

A BNF description of a grammar is more flexible and more synthetic than a procedural description of parsing. CodeWorker accepts parsing scripts that conform to a BNF.

BNF is the acronym of Backus-Naur Form, and consists of describing a grammar with production rules. When the BNF-driven script is executed, the first production rule encountered in the script that isn't a special one (beginning with a '#', like the #empty clause) is chosen as the main non-terminal to match against the input stream.

A non-terminal (often called a clause in the documentation) breaks down into terminals and other non-terminals. Defining how to break down a non-terminal is called a production rule. A clause is valid as soon as the production rule matches its part of the input stream.

The syntax of a clause looks like:
["#overload"]? <clause_specifier> <preprocessing> "::=" <sequence> ['|' <sequence>]* ';'

where:
<preprocessing> ::= "#!ignore" | "#ignore" ['(' <ignore-mode> ')']? ';'
<ignore-mode> ::= "blanks" | "C++" | "JAVA" | "HTML" | "LaTeX";
<sequence> ::= non-terminal | terminal;
<terminal> ::= symbol of the language: a constant character or string

A sequence is a set of terminals and non-terminals that must match the input stream, starting at the current position. A production rule may propose alternatives: if a sequence doesn't match, the engine tries the next one (the alternation symbol '|' separates the sequences).

A regular expression asks for reading tokens from the input stream. If tokens are put in sequence, one after the other, they are evaluated from left to right and all of them must match the input stream. For example, "class" '{' is a sequence of 2 terminals, which requires that the input stream first matches "class" and is then followed by '{'.
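
The two rules just described (tokens of a sequence matched from left to right, alternatives tried in order) can be sketched in Python. This is an illustration only, restricted to constant strings and without any ignore mode; match_sequence and match_rule are names made up for the example.

```python
def match_sequence(tokens, text, pos):
    """Match constant tokens left to right; return the new position or None."""
    for token in tokens:
        if not text.startswith(token, pos):
            return None  # one token fails, so the whole sequence fails
        pos += len(token)
    return pos


def match_rule(alternatives, text, pos=0):
    """Try each sequence in order and keep the first one that matches."""
    for sequence in alternatives:
        end = match_sequence(sequence, text, pos)
        if end is not None:
            return end
    return None


# Roughly: rule ::= "class" '{' | "struct" '{';
rule = [["class", "{"], ["struct", "{"]]
```

With this rule, match_rule(rule, "class{") succeeds on the first sequence, while "struct{" only succeeds after the first alternative has failed.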

Putting #overload just before the declaration of a production rule means that the non-terminal was already defined and that it must be replaced by this new rule when called. Example:

nonterminal ::= "bye";
...
#overload nonterminal ::= "bye" | "quit" | "exit";

Now, calling nonterminal executes the second production rule. Use the directive #super to call the overloaded clause. The previous overloading might be written:

...
#overload nonterminal ::= #super::nonterminal | "quit" | "exit";
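
For readers who think in terms of method overriding, the relation between #overload and #super can be pictured with the Python sketch below. It is an analogy under simplifying assumptions (a rule is a predicate on a whole string), and the class Grammar is hypothetical.

```python
class Grammar:
    """Hypothetical rule table; a rule is a predicate on a whole string."""

    def __init__(self):
        self.rules = {}

    def define(self, name, rule):
        self.rules[name] = rule

    def overload(self, name, make_rule):
        previous = self.rules[name]  # plays the role of #super::name
        self.rules[name] = make_rule(previous)

    def matches(self, name, text):
        return self.rules[name](text)


grammar = Grammar()
grammar.define("nonterminal", lambda text: text == "bye")
# The new rule calls the overloaded one, then adds two alternatives.
grammar.overload(
    "nonterminal",
    lambda parent: lambda text: parent(text) or text in ("quit", "exit"),
)
```

After the overload, every call to nonterminal reaches the new rule, and the old one is reachable only through the reference kept at overload time.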

#overload takes an important place in the reuse of BNF scripts. A parser might be built by reusing a scanner, where some non-terminals only have to be extended, for populating a parse tree for instance.

The statement #transformRules also provides a convenient way to reuse a BNF script.

It defines a rule that describes how to transform the header (left member) and the production rule (right member) of a non-terminal declaration.

Example:

INTEGER ::= #ignore ['0'..'9']*;

INTEGER is the header and #ignore ['0'..'9']* is the production rule.

During the compilation of a BNF parse script, before processing the declaration of a non-terminal, the compiler checks whether a transforming rule validates the name of the non-terminal. If so, both the header of the declaration and the production rule are translated, following the directives of the rule.

The #transformRules statement must be put in the BNF script, before the production rules to transform.

The syntax of the statement #transformRules looks like:
transform-rules ::= "#transformRules" filter header-transformation prod-rule-transformation
filter ::= expression
header-transformation ::= '{' translation-script '}'
prod-rule-transformation ::= '{' translation-script '}'

The filter is a boolean expression, applied on the name of the production rule. The variable x contains the name of the production rule.

header-transformation consists of a translation script, which describes how to transform the header. If the block remains empty, the header doesn't change.

prod-rule-transformation consists of a translation script, which describes how to transform the production rule. If the block remains empty, the production rule doesn't change.

Example:

This example describes how to transform each production rule, whose name ends with "expr".

or_expr ::= and_expr ["&&" and_expr]*;
becomes
or_expr(myExpr : node) ::= and_expr(myExpr.left) ["&&":myExpr.operator and_expr(myExpr.right)]*;

The original production rules are just scanning the input, and the example shows how to transform them for populating a node of the parse tree.

#transformRules
    // The filter accepts production rules that have a name
    // ending with "expr" only.
    // Note that the variable x holds the name
    // of the production rule.
    x.endString("expr")
    
    
    // A script for transforming the header of the production rule:
    {
        // By default, copies the input to the output
        #implicitCopy
        // Writes the declaration of the parameter myExpr
        // after the non-terminal and copies the rest.
        header ::= #readIdentifier
            => {@(myExpr : node)@}
            ->#empty;
    }
    
    
    // A script for transforming the production rule itself:
    {
        #implicitCopy
        // - Pass the left member of the expression to populate,
        // to the first non-terminal,
        // - assign the operator to the expression,
        // - Pass the right member of the expression to populate,
        // to the second non-terminal.
        // In any case, the rest of the production rule remains
        // invariant.
        prodrule ::= [
                #readIdentifier
                =>{@(myExpr.left)@}
                ->[
                    "'" #readChar "'" => {@:myExpr.operator@}
                  |
                    #readCString => {@:myExpr.operator@}
                ]
                #readIdentifier
                =>{@(myExpr.right)@}
            ]?
            ->#empty;
    }
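
To make the effect of this transformation concrete, here is a Python sketch that performs the same rewriting on the text of a declaration. It is not how CodeWorker implements #transformRules; it only reproduces the textual result on the or_expr example with regular expressions, and the function names are hypothetical.

```python
import re


def transform_header(header):
    # Append the parameter declaration after the non-terminal name.
    return re.sub(r"^(\w+)", r"\1(myExpr : node)", header, count=1)


def transform_body(body):
    # first non-terminal, text in between, operator literal, blanks,
    # second non-terminal
    pattern = re.compile(r"""(\w+)(.*?)("[^"]*"|'.')(\s*)(\w+)""", re.S)
    return pattern.sub(
        r"\1(myExpr.left)\2\3:myExpr.operator\4\5(myExpr.right)",
        body,
        count=1,
    )


def transform_rule(declaration):
    header, body = declaration.split("::=", 1)
    name = header.strip()
    if not name.endswith("expr"):  # the filter: x.endString("expr")
        return declaration
    return transform_header(name) + " ::= " + transform_body(body.strip())


before = 'or_expr ::= and_expr ["&&" and_expr]*;'
after = transform_rule(before)
```

A declaration whose name does not end with "expr", such as INTEGER, is returned unchanged because the filter rejects it.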

4.1 BNF tokens

Below are described all BNF tokens that CodeWorker recognizes:

4.2 Preprocessing of a clause

If no preprocessing has been specified for a clause, characters are ignored in the input stream just before running the clause, following the current ignore mode (determined by the predefined clause #ignore).

Sometimes, the ignore mode has to change before calling the clause. Let's imagine that C++ comments and blanks are ignored, except at some places where a line comment holding a description is expected. If the clause that matches the line comment is called description, the following sequence must be written each time a description has to be read:

#ignore(blanks) description:sDescription #ignore(C++)

Thanks to the preprocessing of clause, it is possible to require a specific ignore mode while calling a clause. For example:

description #ignore(blanks) ::= "//" #!ignore [~['\r' | '\n']]*:description;

In our example, each time a description has to be read, calling the clause description is naturally reduced to:

description:sDescription
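
The difference between the two ignore modes on this example can be shown with a small Python model. It is a sketch under stated assumptions: only the modes "blanks" and "C++" are modelled, with regular expressions chosen for the illustration, and skip_ignored and description are hypothetical functions.

```python
import re

IGNORE_MODES = {
    "blanks": re.compile(r"\s*"),
    # Blanks plus // line comments and /* ... */ block comments.
    "C++": re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.S),
}


def skip_ignored(text, pos, mode):
    """Return the position reached after skipping what the mode ignores."""
    return IGNORE_MODES[mode].match(text, pos).end()


def description(text, pos):
    """Clause carrying its own ignore mode: only blanks are skipped."""
    pos = skip_ignored(text, pos, "blanks")  # preprocessing of the clause
    if not text.startswith("//", pos):
        return None
    end = pos + 2
    while end < len(text) and text[end] not in "\r\n":
        end += 1
    return text[pos + 2:end], end


source = "  // radius in km\nint r;"
```

In "C++" mode the comment would be skipped before the clause could see it, whereas the clause-level "blanks" mode leaves it in place so that description can read it.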

4.3 Inserting instructions of the scripting language

Instructions of the scripting language may be inserted into a sequence of tokens. They are considered valid, except when the controlling sequence is interrupted by the break statement. These instructions don't match anything in the input stream; they generally serve to check the consistency of data and to populate the parse tree. They are announced by the symbol '=>':
"=>" instruction ';' or
"=>" compound-statement
where a compound-statement is a block of instructions between braces.

Example:

class_declaration(myClass : node) ::=
    "class" IDENT:myClass.name
     => traceLine("name = '" + myClass.name + "'");
     [
       ':' IDENT:sParent
       => {
         if !findElement(sParent, listOfClasses)
           error("class '" + sParent + "' hasn't been declared yet!");
         ref myClass.parent = listOfClasses[sParent];
       }

     ]?
     '{' ...

The first switch to the scripting language is just a trace instruction, which must end with a semicolon; this semicolon isn't the end of the clause! The second switch to the scripting language does a little more work, which is put between braces.

Be careful with declarations of local variables. A variable declared in a compound statement disappears once the controlling sequence leaves that scope. To declare a variable local to the clause, you can write:
...
=> local myVariable;

In some particular cases, you may have to execute a BNF sequence from within such a piece of common script. The only way is to use the directive

#applyBNFRule

followed by a non-terminal call.

4.4 Common properties of BNF tokens

The sequence of characters that a BNF token has matched may be assigned to a variable. The variable then follows the token, separated by a colon:
token ':' variable_name

Example:

IDENT : sName

(where IDENT ::= ['a'..'z' | 'A'..'Z']+) means that if the clause IDENT is valid, the identifier matched by the BNF token is assigned to sName. Be careful: if the variable doesn't exist, it is pushed onto the stack, unlike in a classic CodeWorker script, which requires local variables to be declared explicitly.

You can also ask for the text covered by the BNF token to be concatenated to the previous value of the variable:
B:+v

Example:
If v holds "nebula:" and the sentence starts with "Orion.", then v becomes "nebula:Orion" after the resolution of:
#readIdentifier:+v
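
The difference between plain assignment and concatenation can be sketched in Python. This is a model only: the dictionary env stands for the variable stack, and read_identifier, assign and append are hypothetical helpers, with the identifier pattern simplified.

```python
import re

# Simplified identifier pattern (an assumption for this sketch).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def read_identifier(text, pos=0):
    # Return (matched_text, new_pos), or (None, pos) when nothing matches.
    m = IDENT.match(text, pos)
    return (m.group(), m.end()) if m else (None, pos)

def assign(env, name, matched):
    # Models token:v, creating the variable if it doesn't exist yet.
    env[name] = matched

def append(env, name, matched):
    # Models token:+v, concatenating to the previous value.
    env[name] = env.get(name, "") + matched

env = {"v": "nebula:"}
matched, pos = read_identifier("Orion.")
append(env, "v", matched)
print(env["v"])    # nebula:Orion
```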

The sequence of characters that a BNF token has matched may be required to equal a constant or to belong to a set of values. The constant or the set of values then follows the token, separated by a colon, as for variables:
token ':' constant_value [':' variable_name] or
token ':' '{' values_of_the_set '}' ':' variable_name where
values_of_the_set ::= constant_value [',' constant_value]*
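
As a rough Python model of this constraint (a sketch, not CodeWorker code; match_token and its allowed parameter are hypothetical), the token fails when the matched text is not among the accepted values, and the input position is left unchanged:

```python
import re

# Simplified identifier pattern (an assumption for this sketch).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def match_token(text, pos, allowed=None):
    # Return (matched_text, new_pos), or (None, pos) when the match fails.
    # `allowed` models the constant or the set of values after the colon;
    # None means that any identifier is accepted.
    m = IDENT.match(text, pos)
    if m is None:
        return None, pos
    if allowed is not None and m.group() not in allowed:
        return None, pos
    return m.group(), m.end()

print(match_token("public int x;", 0, {"public", "private"}))  # ('public', 6)
print(match_token("static int x;", 0, {"public", "private"}))  # (None, 0)
print(match_token("class A", 0, {"class"}))                    # ('class', 5)
```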

Examples:

4.5 BNF directives

Some directives are available:

4.6 Declaring a clause

We have seen that a clause may expect some arguments. Such a clause conforms to the syntax:
clause_specifier ::= clause_name [parameters]? [':' return_type]? '::=' clause_body
clause_name ::= identifier [template_resolution]?
template_resolution ::= '<' [identifier | constant_string] '>'
parameters ::= '(' parameter [',' parameter]* ')'
parameter ::= argument_name ':' argument_mode
argument_mode ::= "value" | ["node" | "variable"] | "reference"
return_type ::= "list" | "node" | "value"
clause_body ::= rule_expression ';'

where the argument mode means:

Mode               Description
value              the parameter is passed by value to the clause, as for user-defined functions
node or variable   the parameter expects a tree node, as for user-defined functions
reference          the parameter expects a reference to a variable, which allows changing the node pointed to by the variable, as for user-defined functions

Example:

attribute_declaration(myAttribute : node, sClassName : value) ::= type_specifier(myAttribute.type) IDENT:myAttribute.name;

When reusing production rules from a scanner to build a parser, for example, the non-terminal symbols of the parser need to be passed a node intended to be filled with parsing information, or to contain some data about the context.

There is a special clause that the user may have to define, named #ignore. It lets users implement their own production rule for processing empty characters between tokens.

This clause doesn't expect any parameter:
#ignore ::= ... /*the production rule of how to skip characters*/;

To activate it in a production rule, type #ignore with no parameter.

In some cases, you might have to define more than one customized #ignore clause. This is possible too, by assigning a key to each new special clause when implementing it:
#ignore["the key"] ::= ... /*the production rule of how to skip characters*/;

To activate it in a production rule, type #ignore("the key") with no parameter, as you could have written #ignore(C++) for activating a predefined ignore mode.

Note that these special clauses must appear at the beginning of the extended-BNF script, before they are first activated in a production rule.

5 Reading tokens for parsing

The functions and procedures described below are available in one kind of parsing script: those that read tokens in a procedural way, through a set of appropriate functions and procedures. All the examples that illustrate how to use them are applied to the following text to parse:

      // file "Documentation/ParsingSample.txt":
      identifier: _potatoes41$
      numbers: 42 23.45e6
      string: "a C-like string that accepts backslash-escape sequences"
      word: 1écurie_1stable
      blanks:
           "blanks are ignored"
      spaces: "spaces are ignored"
      C++: /*comment*/
           // other comment
           "blanks and C++ comments are ignored"
      HTML: <!--comment-->
           "blanks and HTML comments are ignored"
      LaTeX: % comment
      "blanks must be skipped explicitly"
           "only comments were ignored"

No syntax extension is provided for this mode of parsing, so it is really procedure-driven, as opposed to the BNF-driven mode seen in the preceding section.

5.1 attachInputToSocket

5.2 countInputCols

5.3 countInputLines

5.4 detachInputFromSocket

5.5 getInputFilename

5.6 getInputLocation

5.7 getLastReadChars

5.8 goBack

5.9 lookAhead

5.10 peekChar

5.11 readAdaString

5.12 readByte

5.13 readBytes

5.14 readCChar

5.15 readChar

5.16 readCharAsInt

5.17 readChars

5.18 readIdentifier

5.19 readIfEqualTo

5.20 readIfEqualToIdentifier

5.21 readIfEqualToIgnoreCase

5.22 readLine

5.23 readNextText

5.24 readNumber

5.25 readPythonString

5.26 readString

5.27 readUptoJustOneChar

5.28 readWord

5.29 setInputLocation

5.30 skipBlanks

5.31 skipEmptyCpp

5.32 skipEmptyCppExceptDoxygen

5.33 skipEmptyHTML

5.34 skipEmptyLaTeX

5.35 skipSpaces

6 Syntax and instructions for generating source code

A script that must be processed for source code generation is called a pattern script in the CodeWorker vocabulary. There are three ways to generate a file:

A pattern script, except in translation mode, begins with a sequence of characters written exactly as they must appear in the output file, until the special character '@' or the JSP-like tag '<%' is encountered. It then switches to script mode, and everything is interpreted as script instructions until the special character '@' or the JSP-like tag '%>' is encountered. The content of the script file is then again understood as a sequence of characters to write into the output file, up to the next special character. And so it continues, switching from one mode to the other...

For convenience, the script mode may be restricted to a single expression (often a variable expression) whose value is written into the output file.
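
The alternation between raw text and script mode can be sketched in a few lines of Python. This is a deliberately reduced model, not the CodeWorker engine: it only handles the '@' delimiter, it assumes that every script part is a bare variable expression, and expand_template and context are hypothetical names.

```python
def expand_template(template, context):
    # Even-numbered parts are raw text, odd-numbered parts are expressions.
    out = []
    for i, part in enumerate(template.split("@")):
        if i % 2 == 0:
            out.append(part)                          # copied verbatim
        else:
            out.append(str(context[part.strip()]))    # value of the expression
    return "".join(out)

text = expand_template("rough text @this.name@ EOL\n",
                       {"this.name": "VARIABLE_CONTENT"})
print(text, end="")    # rough text VARIABLE_CONTENT EOL
```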

Expanding a file consists of generating code at some determined points of the file. These points are called markups and are written ##markup##"name-of-the-markup", surrounded by comment delimiters.

For example, a valid markup embedded in a C++ file could be:
//##markup##"factory"
and a valid markup embedded in an HTML file could be:
<!--##markup##"classes"-->

A pattern script intended to expand code is launched through the procedure expand, which expects three parameters:

Each time CodeWorker encounters a markup, it calls the pattern script, which decides how to populate it. The code generated by the pattern script for this markup is surrounded by the tags ##begin##"name-of-the-markup" and ##end##"name-of-the-markup", automatically added by the interpreter. If protected areas were put into the generated code, they are preserved the next time the expansion is run.

Note that CodeWorker doesn't change what is written outside the markups and their begin/end delimiters.
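
The expansion cycle can be approximated by the following Python sketch. It rests on several assumptions that the text above does not confirm: only C++ line comments are handled, the begin and end tags are assumed to sit on their own lines right after the markup, protected areas are not modelled, and expand_lines and generate are hypothetical names.

```python
import re

# A markup on its own line, inside a C++ line comment.
MARKUP = re.compile(r'^[ \t]*//##markup##"(?P<key>[^"]*)"[ \t]*$')

def expand_lines(lines, generate):
    # Return a new list of lines where each markup is followed by a
    # freshly generated block; text outside the blocks is left as is.
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        m = MARKUP.match(line)
        if not m:
            continue
        key = m.group("key")
        begin, end = f'//##begin##"{key}"', f'//##end##"{key}"'
        # Drop the block generated by a previous expansion, if any.
        if i < len(lines) and lines[i].strip() == begin:
            while i < len(lines) and lines[i].strip() != end:
                i += 1
            i += 1                      # skip the end tag itself
        out.append(begin)
        out.extend(generate(key))       # the pattern script's contribution
        out.append(end)
    return out

source = ['class A {', '//##markup##"factory"', '};']
once = expand_lines(source, lambda key: [f'// generated for {key}'])
print("\n".join(once))
```

Running the expansion a second time on its own result leaves the file unchanged, since the previous block is replaced rather than duplicated.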

A script intended to work in translation mode first expects a BNF-like description, as presented in the section BNF syntax. As for any kind of BNF-driven script, procedure-driven script may be embedded in braces after the symbol '=>'. This compound statement may contain a piece of pattern script, as described in the preceding paragraph, which takes charge of generating code into the output file. Note that the flow of execution enters the compound statement in script mode.

As for parsing, there are functions to handle a position in the output stream. However, the principle is quite different, insofar as the current position of the output stream cannot be changed and always points to the end.

A position is called a floating location and has an ID. A floating location is used for overwriting or inserting text at a point of the stream that has already been generated. While generating a C++ body, for example, it may be useful to insert '#include' preprocessor directives as references to other headers are discovered while iterating over the parse tree.

The procedure newFloatingLocation attaches a position to an ID, which is the name of the location. The function getFloatingLocation returns the position attached to a given floating location ID.

Inserting text at a position shifts all floating locations that point to, or after, the insertion point. The offset corresponds to the size of the inserted text. This is why it is called a floating location: the position initially assigned to the ID might change later.
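
The shifting rule can be illustrated with a small Python model of an output stream. It is a sketch of the behavior described above, not CodeWorker's implementation; the class OutputStream and its method names are hypothetical and merely echo the built-in names.

```python
class OutputStream:
    def __init__(self):
        self.text = ""
        self.locations = {}          # floating location ID -> position

    def write_text(self, s):
        # The current position is always the end of the stream.
        self.text += s

    def new_floating_location(self, key):
        # Attach the current end position to the given ID.
        self.locations[key] = len(self.text)

    def get_floating_location(self, key):
        return self.locations[key]

    def insert_text(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        # Shift every location pointing to, or after, the insertion point.
        for key, loc in self.locations.items():
            if loc >= pos:
                self.locations[key] = loc + len(s)

    def insert_text_to_floating_location(self, key, s):
        self.insert_text(self.locations[key], s)

out = OutputStream()
out.write_text("// header\n")
out.new_floating_location("includes")
out.write_text("void f() {}\n")
out.insert_text_to_floating_location("includes", '#include "a.h"\n')
out.insert_text_to_floating_location("includes", '#include "b.h"\n')
print(out.text, end="")
print(out.get_floating_location("includes"))   # 40
```

Because the location itself is shifted by each insertion, successive '#include' lines land one after the other, in the order in which they were discovered.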

You'll find below a list of all built-in functions and procedures that may be used in a pattern script, as well as typical preprocessor directives.

6.1 Preprocessor directive: coverage recording

A feature has been added to code generation to trace where the output comes from. In CodeWorker, an output file is generated by a template-based script. The directive #coverage asks for the recording of every script position that gives rise to a piece of the output file.

This directive may be placed anywhere in the script under study, and requires a variable that the code generation engine will populate with coverage data. The variable will be a sorted list of segments, entirely determined by their starting position in the output and by the position of the corresponding script instruction. These positions are stored in the attributes output and script respectively.

An additional piece of information is assigned to the node representing the segment. It specifies the type of script instruction, which is one of the following values:

Example:

rough text @this.name@ EOL
@

#coverage(project.coverage)

Suppose that this.name holds "VARIABLE_CONTENT". The script generates the following output file:

rough text VARIABLE_CONTENT EOL

The variable project.coverage then holds the following list:

   ["0"] = "R"
     |--+
       script = 12
       output = 0
   ["1"] = "W"
     |--+
       script = 21
       output = 11
   ["2"] = "R"
     |--+
       script = 29
       output = 27
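
The idea of coverage recording can be sketched in Python as follows. This model is an approximation under loud assumptions: "R" is taken to mean a segment of raw text and "W" the writing of an expression value, and the script positions are plain 0-based offsets into the template, so they are not expected to match the script values shown in the list above. Only the output offsets (0, 11 and 27) are reproduced. The names expand_with_coverage and context are hypothetical.

```python
def expand_with_coverage(template, context):
    # Expand a '@'-delimited template and record, for each segment, its
    # type and its starting offsets in the script and in the output.
    out, coverage = "", []
    script_pos = 0
    for i, part in enumerate(template.split("@")):
        if part:
            is_raw = (i % 2 == 0)
            coverage.append({
                "type": "R" if is_raw else "W",   # assumed meaning of R and W
                "script": script_pos,             # 0-based offset (assumption)
                "output": len(out),
            })
            out += part if is_raw else str(context[part])
        script_pos += len(part) + 1               # +1 for the '@' delimiter
    return out, coverage

text, coverage = expand_with_coverage("rough text @this.name@ EOL\n",
                                      {"this.name": "VARIABLE_CONTENT"})
for segment in coverage:
    print(segment["type"], segment["output"])
# R 0
# W 11
# R 27
```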

6.2 Aspect-Oriented Programming and template-based scripts

This section will be extended later. For now, we just focus on the features related to AOP in CodeWorker.

A template-based script can declare joint points during the generation process. A joint point marks a notable place in the output stream, such as the declaration of attributes or the body of a method. The developer is free to create as many joint points as needed, and to give them whatever meaning is wanted.

The syntax of a joint point looks like:
jointpoint [iterate]? name [( context )]? ;
or
jointpoint [iterate]? name [( context )]? instruction

context is a variable expression. If context is not specified, it defaults to this.

When the interpreter encounters a joint point, it checks the existence of the variable context. If the variable exists, the interpreter looks for actions to execute before, around and after the joint point.

These actions are referred to as advices. They are normally intended to generate text at particular joint points. To determine which joint points an advice applies to, the developer has to define point cuts. A point cut takes the form of a boolean expression attached to the advice.

The syntax of an advice looks like:
advice advice-type : pointcut instruction
with:
advice-type ::= before | around | after | before_iteration | around_iteration | after_iteration
Note that pointcut is a boolean expression whose variable scope contains two local variables:

When the interpreter considers a joint point, it first looks for each advice of type before whose point cut matches, and executes them in the order in which they were implemented. Next, it looks for each advice working around the joint point (of type around) and executes them. When the interpreter leaves a joint point, it executes each advice of type after whose point cut matches.

An advice can execute the body of a joint point at any time, possibly changing the current context, using the directive #jointpoint:
#jointpoint [( context )]?

If the option iterate is requested, the joint point works on an array. If the array is empty, the interpreter bypasses the joint point. If not, the interpreter iterates over each element of the array, looking for advices before, around and after each iteration. The corresponding advice types are referred to as before_iteration, around_iteration and after_iteration.

Example of a jointpoint working on an array:

      // file "Documentation/AOP-example1.cwt":
      Separation of concerns in CodeWorker:
      @
      // a list of method declarations (names only)
      insert this.methods["display"].name = "display";
      insert this.methods["delete"].name = "delete";
      insert this.methods["visit"].name = "visit";
     
      // a joint point announcing the implementation
      // of methods
      jointpoint iterate methods(this.methods) {
          // generates the skeleton of a method
          @ void @this.name@() {}
      @
      }
     
      @... This is the end of the test
      @
     
      // Implementation of an aspect to apply on methods:
      // several advices
     
      // first advice: called while entering the joint point
      advice before : jointpoint == "methods" {
          @// beginning of methods
      @
      }
     
      // advice to apply around each method iteration
      advice around_iteration : jointpoint == "methods" {
          @ // BEGIN around iteration
      @
          #jointpoint
          @ // OUT around iteration
      @
      }
     
      // last advice: called while leaving the joint point
      advice after : jointpoint == "methods" {
          @// end of methods
      @
          traceObject(this, 10);
      }

It generates the following output:

      // file "Documentation/AOP-example1.txt":
      testing separation of concerns in CodeWorker:
     
      // beginning of methods
          // BEGIN around iteration
          void display() {}
          // OUT around iteration
          // BEGIN around iteration
          void delete() {}
          // OUT around iteration
          // BEGIN around iteration
          void visit() {}
          // OUT around iteration
      // end of methods
     
      This is the end of the test
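
The order in which advices fire around an iterating joint point can be modelled in Python. This is only a sketch of the mechanism, with assumptions stated plainly: run_jointpoint is a hypothetical function, the proceed callback stands for the directive #jointpoint, the body is assumed to run directly when no around_iteration advice matches, an empty array is assumed to skip the before and after advices as well, and the before_iteration and after_iteration types are left out.

```python
def run_jointpoint(name, items, body, advices, emit):
    # Model of: jointpoint iterate name(items) body
    if not items:
        return                      # empty array: the joint point is bypassed

    def matching(kind):
        # Advices of the given type whose point cut accepts this joint point.
        return [fn for k, pointcut, fn in advices if k == kind and pointcut(name)]

    for fn in matching("before"):
        fn(emit)
    for item in items:
        arounds = matching("around_iteration")
        if not arounds:
            body(item, emit)
        for fn in arounds:
            # The advice decides when to run the body, through proceed().
            fn(emit, lambda item=item: body(item, emit))
    for fn in matching("after"):
        fn(emit)

def is_methods(jointpoint):
    return jointpoint == "methods"

def before(emit):
    emit("// beginning of methods")

def around_iteration(emit, proceed):
    emit("    // BEGIN around iteration")
    proceed()
    emit("    // OUT around iteration")

def after(emit):
    emit("// end of methods")

advices = [("before", is_methods, before),
           ("around_iteration", is_methods, around_iteration),
           ("after", is_methods, after)]

lines = []
run_jointpoint("methods", ["display", "delete", "visit"],
               lambda method, emit: emit(f"    void {method}() {{}}"),
               advices, lines.append)
print("\n".join(lines))
```

The printed lines follow the same before, around-each-iteration, after ordering as the generated file shown above.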

6.3 allFloatingLocations

6.4 attachOutputToSocket

6.5 countOutputCols

6.6 countOutputLines

6.7 decrementIndentLevel

6.8 detachOutputFromSocket

6.9 equalLastWrittenChars

6.10 existFloatingLocation

6.11 flushOutputToSocket

6.12 getFloatingLocation

6.13 getLastWrittenChars

6.14 getMarkupKey

6.15 getMarkupValue

6.16 getOutputFilename

6.17 getOutputLocation

6.18 getProtectedArea

6.19 getProtectedAreaKeys

6.20 incrementIndentLevel

6.21 indentText

6.22 insertText

6.23 insertTextOnce

6.24 insertTextOnceToFloatingLocation

6.25 insertTextToFloatingLocation

6.26 newFloatingLocation

6.27 overwritePortion

6.28 populateProtectedArea

6.29 remainingProtectedAreas

6.30 removeFloatingLocation

6.31 removeProtectedArea

6.32 resizeOutputStream

6.33 setFloatingLocation

6.34 setOutputLocation

6.35 setProtectedArea

6.36 writeBytes

6.37 writeText

6.38 writeTextOnce