Integrating source code generation into a project

1 Reusability

IT people are convinced of the importance of reusability, which applies to designs, source code, and technical and functional components:

designs make it possible to reuse modeling solutions called design patterns,
an object-oriented language addresses reusability with:
- polymorphism,
- inheritance,
- encapsulation.
Some languages provide reflection, which opens a large field of possibilities thanks to a kind of generalization of the source code. Others bring template functions and classes, which are another step towards factoring the source code.
components may be seen as black boxes intended to provide services; the way to reuse a component is to embed it in a module (object, library, executable, server) and to offer an interface for accessing its services.

Some questions relate to reusability in the sense of capitalizing on skills and IT knowledge, independently of the business domain, but have no satisfying answer today:

How can any kind of formal representation be integrated as part of the project's modeling design?
A design is generally built in a modeling tool, but the modeling tool cannot accept a design enriched with an exotic format that conforms to an unknown syntax and that requires changing its internal representation of the world and the way it is implemented.
How can the implementation of a design pattern be automated?
Perhaps your modeling tool can generate some classical design patterns from the design. Perhaps it is even possible, under some constraints, to implement your own design patterns via the modeling tool. But what about flexibility, readability, convenience of use, and how well your needs are covered?
How can the bridge between the design and the implementation be automated?
Modeling tools can generate the source code corresponding to the design in some classical languages. The implementation is often restricted to writing the skeleton of classes and implementing some design patterns. If you want to choose another language:
- you may have to pay fees for a new package,
- you may have to adapt some properties that depend on the target language.
The generated code is quite poor, and you feel that it should be possible to generate more from the design.
How can the style of the implementation and the features to include be customized?
It depends on the flexibility of your modeling tool. The style for implementing some parts (attributes, methods) can often be customized. But completely changing the way an implementation is done may become very tedious and unreadable (if it is possible at all!). For example:
- moving a behaviour out of the class and into a visitor-like design pattern where everything must be implemented,
- implementing a wrapper that inserts, updates and extracts objects into/from a database, where all the convenient stored procedures are generated too.
The simpler and more flexible it is to change the generation, the further you can progress in complexity and in proposing new features. For instance, if the syntax chosen for writing Java or C++ programs were XML, which is very fashionable because it lends itself so easily to computer processing, human beings would not have been able to write programs as complex and powerful as today's, because the syntax is ill-suited and too verbose: too many symbols would be required to express simple concepts.
If you are convinced by this argument, you may also be convinced that code generation should be described in a language adapted as closely as possible to that task. This rules out XML, but also Visual Basic and all general-purpose programming languages (C, C++, Java, Fortran, ...).
How can a project be protected from the constraining and often definitive choice of a target language?
The only way to keep a project independent of the choice of language is to move the implementation details into the design as much as possible. The most convenient way to express implementation details is often, but not always, a programming language. If so, this programming language should be devoted to the modeling phase and be translatable to a classical programming language.
Writing in such a language makes validation rather tricky if no adapted environment is available, so it slows down the development process: change the design, then generate the code, then test, then carry the corrections back to the design, and go round the loop again. If few protected areas (implementation present only in the source code of the target language, with nothing coming from the design) have been populated, a good compromise might be to maintain a set of scripts for each target language. The difficulty lies in keeping all sets of scripts at the same level. The drawback of supporting several languages is the way standard libraries are referred to: some functionality exists in one language but not in another, or is used differently. However, it is possible to minimize the impact of choosing a language and to isolate rigorously what depends exclusively on a programming language.
How can a new feature be propagated into a lot of source files?
The new feature may consist of serializing all business objects to an XML stream, for instance. It is impossible to implement it with reflection as in Java, because one cannot distinguish between an aggregation and a common association, which do not lead to the same serialization (the description of the aggregated object is embedded in the description of the parent). The most convenient way is to have the modeling design at hand and to modify the generation process as simply as possible.
How can the impact of modifying the modeling design on the implementation be strongly limited?
The more computer-exploitable information you put into the design, the less source code or documentation you will have to modify or add by hand each time the design changes. The design must be able to express as many concepts as possible, which will then be implemented automatically.
How can the implementation of an architecture be automated? Another underlying question: how can one remain independent of the choice of architecture while building the core of applications or modules? If you have no tools for automatically generating the communication layer with the framework you have chosen:
- you will do it by hand and waste a lot of time on it,
- you will exploit some facilities provided by the framework to integrate your implementation, but you will write framework-dependent source code to give access to the functional part (see J2EE for example).
In both cases, the investment required to implement the layer will discourage you from rewriting it later for another framework.
How can IT knowledge be carried over to another project? You hope that the developers who own the technical skills will not leave before the end, and that they will agree to work on the same technologies.
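
To illustrate the question about propagating a new feature such as XML serialization, here is a minimal sketch in Python (not CodeWorker syntax). It assumes a hypothetical in-memory design, ClassDesign and Link, in which every link records whether it is a plain attribute, an aggregation or an association, which is precisely the information that reflection does not give. The emitted Java-like text relies on invented names (XmlWriter, open, close, attribute, reference, getId) that exist only for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """One member of a designed class (hypothetical design model)."""
    name: str
    target: str
    kind: str  # "attribute", "aggregation" or "association"


@dataclass
class ClassDesign:
    """One class of the design, with its members."""
    name: str
    links: List[Link] = field(default_factory=list)


def generate_to_xml(cls: ClassDesign) -> str:
    """Emit Java-like source text of a toXml method for one designed class."""
    lines = [
        "public void toXml(XmlWriter out) {",
        f'    out.open("{cls.name}");',
    ]
    for link in cls.links:
        if link.kind == "attribute":
            lines.append(f'    out.attribute("{link.name}", {link.name});')
        elif link.kind == "aggregation":
            # The aggregated object is described inside its parent.
            lines.append(f"    {link.name}.toXml(out);")
        elif link.kind == "association":
            # An associated object is only referred to, not embedded.
            lines.append(f'    out.reference("{link.name}", {link.name}.getId());')
        else:
            raise ValueError(f"unknown kind of link: {link.kind}")
    lines.append(f'    out.close("{cls.name}");')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    order = ClassDesign("Order", [
        Link("number", "integer", "attribute"),
        Link("lines", "OrderLine", "aggregation"),
        Link("customer", "Customer", "association"),
    ])
    print(generate_to_xml(order))
```

Because the method is produced from the design, adding the feature to every business class amounts to running the same function over all classes of the design instead of editing each source file by hand.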

CodeWorker proposes an answer for each of these points.

2 The interest of controlling the format of the design

The first interest of controlling the format of the design is, obviously, being able to load data into the source code generator.

The second one is to allow a modeling language to be adapted to specific needs. This may mean enriching a UML (Unified Modeling Language) design with features that are necessary for a better mapping to the implementation, if you consider that UML is not expressive enough to allow source code generation that is as fine-grained as expected and independent of the target language. Today, thanks to Rational Rose, more detailed information can be added to a UML design, but it depends on the target language. For instance, if a method's parameter has to be passed by reference, the designer has to know that C++ will be generated and then writes std::string& explicitly as the correct translation. The design can then no longer be reused for Java code generation. In the end, there is a strong dependency between the design and the language used for generation.
So, some extensions might be added to UML to remain free of the choice of the target language:

containers, such as list<value-type> or hashtable<key-type, value-type> or set<value-type>,
PK<basic-type> for an attribute that plays the role of a kind of primary key,
conditions of existence of an attribute that determine whether an optional attribute must be populated or not,
checking rules that must be valid if the attribute exists,
definition of constructors,
a small design pattern called build, which is applied to aggregations and generates a build<aggregation-name> method for each constructor of the aggregated object,
a small design pattern called add, which is applied to lists of aggregations for building new instances and adding them automatically to the lists,
const, static and virtual (or the final keyword, depending on whether the C++ or the Java philosophy of polymorphism is adopted) as method specifiers,
the parameter modes:
- in: the parameter is passed by reference and cannot be changed,
- out: the parameter will be created and assigned within the method,
- inout: the parameter is passed by reference and may be changed,
- nothing: the parameter is passed by value (copied onto the stack in C++),
the throw<exception-type> to specify what kind of exceptions might be raised by a method,
some design patterns:
- the visitor,
- the redirection of methods and accessors to an encapsulated object,
- multiple dispatch, as in Ada,
some free annotations (perhaps not properly reusable), such as sql, to specify how to map the design to a SQL schema (attributes to make persistent, objects to map to a table or to merge into another, ...) and to generate the stored procedures for inserting, deleting and selecting objects.
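
As an illustration of the parameter modes listed above, the following Python sketch (not CodeWorker syntax) renders the same language-neutral parameters for two targets. Everything in it is an assumption made for the example: the type table, the choice of a non-const reference for out in C++, and the Holder<T> helper class used for out and inout in Java, which is hypothetical.

```python
# Hypothetical mapping from design-level types to concrete types per target.
TYPE_NAMES = {
    "cpp": {"string": "std::string", "integer": "int"},
    "java": {"string": "String", "integer": "Integer"},
}

# One declaration template per parameter mode. None stands for "no mode",
# that is, a parameter passed by value.
PARAMETER_TEMPLATES = {
    "cpp": {
        "in": "const {type}& {name}",
        "out": "{type}& {name}",
        "inout": "{type}& {name}",
        None: "{type} {name}",
    },
    "java": {
        # In Java, final only prevents reassigning the parameter.
        "in": "final {type} {name}",
        # Holder is an assumed helper class wrapping a mutable value.
        "out": "Holder<{type}> {name}",
        "inout": "Holder<{type}> {name}",
        None: "{type} {name}",
    },
}


def render_parameter(name, design_type, mode, target):
    """Return the declaration of one parameter for the given target language."""
    concrete = TYPE_NAMES[target][design_type]
    template = PARAMETER_TEMPLATES[target][mode]
    return template.format(type=concrete, name=name)


def render_signature(method, parameters, target):
    """Return a void method signature from (name, type, mode) triples."""
    rendered = ", ".join(
        render_parameter(name, design_type, mode, target)
        for name, design_type, mode in parameters
    )
    return f"void {method}({rendered})"


if __name__ == "__main__":
    parameters = [
        ("label", "string", "in"),
        ("count", "integer", None),
        ("result", "string", "out"),
    ]
    print(render_signature("describe", parameters, "cpp"))
    print(render_signature("describe", parameters, "java"))
```

The design only states in, out, inout or nothing; the target-specific spelling, such as std::string&, lives in the generation tables, so the same design can serve both languages.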

The advantage of enriching UML is that one can draw the design in a modeler, Rational Rose for instance, and put the extensions into it. The design becomes less readable and the project can no longer be generated with the modeler, but classes and relationships are built and displayed in a very convenient way. Of course, the design file must be readable by CodeWorker, and that is the case for "*.mdl" files, which are produced by Rational Rose (this isn't advertising!). See the script repository at codeworker@free.fr to retrieve the appropriate parsing script.

If readability has suffered too much from adding all these features in a graphical modeler, or if the data you want to handle are far from a UML representation, you can develop your own modeling language or adapt one of those proposed in the script repository.

Just one point to note: the syntax of the designs to parse does not matter, but the structure of the parse tree is very important to guarantee reusability. So, if you want to improve a modeling language provided in the script repository, be careful about the changes you make to the parse tree. Adding new attributes to nodes has no impact on the existing generation scripts, but removing or renaming attributes of the parse tree will change the generated text (some expected attributes will not be found).
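
The effect of changing the parse tree can be sketched in Python (not CodeWorker syntax). The sketch assumes a parse tree made of nested dictionaries and a generation function that reads a missing attribute as an empty string; both the tree layout and that lookup rule are assumptions of the example.

```python
def generate_struct(node):
    """Generate a C++-like struct from a class node of the parse tree.

    Only the attributes "name", "attributes" and "type" are read; a missing
    attribute is treated as an empty string (an assumption of this sketch).
    """
    lines = [f"struct {node.get('name', '')} {{"]
    for attribute in node.get("attributes", []):
        lines.append(f"    {attribute.get('type', '')} {attribute.get('name', '')};")
    lines.append("};")
    return "\n".join(lines)


original = {
    "name": "Point",
    "attributes": [{"name": "x", "type": "double"}],
}

# New attributes added on the nodes: the existing script ignores them.
enriched = {
    "name": "Point",
    "doc": "a 2D point",
    "attributes": [{"name": "x", "type": "double", "persistent": True}],
}

# "name" renamed to "identifier" on the class node: the script no longer
# finds what it expects, and the generated text silently changes.
renamed = {
    "identifier": "Point",
    "attributes": [{"name": "x", "type": "double"}],
}

if __name__ == "__main__":
    print(generate_struct(original) == generate_struct(enriched))
    print(generate_struct(renamed))
```

The renamed tree still produces text, but with an empty struct name, which is the kind of silent divergence that a documented parse tree structure helps to avoid.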

So, an effort must be made to document the structure of the parse tree as much as possible, to avoid divergence, so that your work can be reused and you can reuse the work of somebody else.

3 Driving the implementation with CodeWorker

CodeWorker provides a scripting language whose syntax is adapted to parsing and code generation tasks. It also offers an easy way to navigate parse trees. These three main aspects allow both acquiring data and generating any kind of free text in a very convenient way (adapted syntax and data structures).
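
The three aspects (parsing, navigating a parse tree, generating text) can be sketched in Python; this is not CodeWorker's scripting language, only an illustration of the workflow. The tiny design notation, the shape of the tree and the function names parse_design and generate_java are all invented for the example.

```python
import re

# A hypothetical design notation: class Name { field : type; ... }
CLASS_RE = re.compile(r"class\s+(\w+)\s*\{([^}]*)\}")
FIELD_RE = re.compile(r"(\w+)\s*:\s*(\w+)\s*;")


def parse_design(text):
    """Parse the design text into a parse tree made of dicts and lists."""
    tree = {"classes": []}
    for class_match in CLASS_RE.finditer(text):
        fields = [
            {"name": name, "type": type_name}
            for name, type_name in FIELD_RE.findall(class_match.group(2))
        ]
        tree["classes"].append({"name": class_match.group(1), "fields": fields})
    return tree


def generate_java(tree):
    """Navigate the parse tree and emit a Java class skeleton per class."""
    lines = []
    for cls in tree["classes"]:
        lines.append(f"public class {cls['name']} {{")
        for member in cls["fields"]:
            lines.append(f"    private {member['type']} {member['name']};")
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    design = "class Point { x : double; y : double; }"
    print(generate_java(parse_design(design)))
```

Keeping the parse tree as the only link between the two functions means that another generator, for C++ or SQL for instance, could be written against the same tree without touching the parser.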

CodeWorker offers some basic functionality for handling files and directories, which avoids resorting to shell or Perl scripts when building development/test environments within a team for generating/compiling/debugging/sharing source code. The tree data structure and the foreach...cascading statement (see foreach) provide a very convenient way to visualize/navigate directories.
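
The idea of handling a directory as a tree can be sketched in Python with the standard os module. This does not reproduce the foreach...cascading statement, whose syntax is not shown here; the tree layout and the names directory_tree and list_sources are assumptions of the example.

```python
import os


def directory_tree(root):
    """Load a directory into a tree: one node per directory, files as leaves."""
    node = {
        "name": os.path.basename(os.path.normpath(root)),
        "files": [],
        "dirs": [],
    }
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            node["dirs"].append(directory_tree(path))
        else:
            node["files"].append(entry)
    return node


def list_sources(node, suffix, prefix=""):
    """Walk the tree depth-first and collect the files ending with suffix."""
    here = os.path.join(prefix, node["name"])
    found = [
        os.path.join(here, name)
        for name in node["files"]
        if name.endswith(suffix)
    ]
    for child in node["dirs"]:
        found.extend(list_sources(child, suffix, here))
    return found


if __name__ == "__main__":
    # Lists the Python files below the current directory.
    for source in list_sources(directory_tree("."), ".py"):
        print(source)
```

Once the directory is a tree, the same navigation code used for parse trees can drive tasks such as collecting the sources to compile.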

...