A whole re-engineering project on OpenVMS

Gerard Calliet, pia-sofer.
gerard.calliet@pia-sofer.fr

Introduction

The client must migrate from OpenVMS Alpha to OpenVMS Itanium. Most of the system is not a problem: it is COBOL / DECforms, or Tuxedo. HP provides the former, Oracle the latter. But a core application runs on Rally (an Oracle 4GL), which Oracle has not supported since 1993. There is no port of the Rally engine to Itanium, so this application blocks virtually the entire migration.


HP France has professional “jokers” that it calls on in difficult cases; in this case, the company X9000. pia-sofer and X9000 have worked together for several years for the sake of the VMS ecosystem (the provms association). A team is formed, and the adventure begins. 25 large Rally applications must be rewritten in COBOL / DECforms. Because the application has been frozen for at least 10 years, the client does not want to retrain the users, so the requirement is strict iso-functionality.


The only possible starting points are the 25 Rally Reports, printable documents describing the applications, automatically generated by Rally. The client had tried to port some applications and had to write 20,000 to 30,000 lines of COBOL / DECforms per application. So the target is about a million lines of code. The original developers are gone. Rally has become a little-known product. The time between the order and the deadline is short.


Principles

The maxims that guided the project:

  • "Put ourselves in the place of the creators to do it again, deal properly with the realities of the situation",
  • "A program is a formal object created by human beings"
  • "Innovation is not the struggle of the modern against the old"
  • "There is no learning without iteration"

Their consequences for us:

  • A detailed analysis of the existing system, a precise choice of tools,
  • An approach that combines human intelligence and automatic processes,
  • Include old and new technologies, old and new skills,
  • Systematically iterate analysis, experimentation and understanding.

Realization

Production: from reading to writing

This chapter describes the whole chain of tools for analysis and production. Nothing in this chain requires human intervention: fragment by fragment, the final puzzle is assembled.


The next chapter explains how this chain came to exist, and how it reaches the right result. For now, on to the engines.


First of all, thanks to Mr. Thierry Uso, who has been porting all kinds of open source products to OpenVMS for many years. Many of the choices of "engines" used are due to him, and it is not uncommon that the engines themselves were ported to OpenVMS by him.


From text to content: regular expressions in Ruby

The text of the Rally Report must first be transformed into material that a computer can use. The best way to interpret this kind of text is with regular expressions. We started by writing a parser for Rally Reports, developed in Ruby. The Ruby language is particularly well suited to regular expressions and to rapid development; it is object oriented and has good XML interfaces.


Our parser is able to read and interpret all the Rally clauses used in the 25 applications (about 90% of all Rally clauses in general). The parser takes into account all the vocabulary for describing applications and all the values specific to each application, and turns them into an XML document. The parser does not add or subtract anything from the Report. It simply transforms a pure text document into an XML document. Writing this parser was our first step in learning Rally.
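
The principle can be illustrated with a minimal sketch, written here in Python rather than in the Ruby of the actual parser. The report layout (FORM and FIELD lines with key=value attributes) and the XML element names are invented for the example; the real Rally Report syntax is not reproduced here. Only the idea comes from the text above: regular expressions recognize each clause, and the result is an XML tree that neither adds to nor removes from the report.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical report excerpt: NOT the real Rally Report layout.
SAMPLE_REPORT = """\
FORM CUSTOMER_EDIT
  FIELD NAME type=TEXT length=30
  FIELD BALANCE type=NUMBER length=12
FORM ORDER_LIST
  FIELD ORDER_ID type=NUMBER length=8
"""

# One regular expression per (invented) clause of the report.
FORM_RE = re.compile(r"^FORM\s+(?P<name>\w+)\s*$")
FIELD_RE = re.compile(r"^\s+FIELD\s+(?P<name>\w+)(?P<attrs>(?:\s+\w+=\w+)*)\s*$")
ATTR_RE = re.compile(r"(\w+)=(\w+)")


def report_to_xml(text):
    """Turn the text report into an XML tree; every non-blank line must match a clause."""
    root = ET.Element("application")
    current_form = None
    for line in text.splitlines():
        if not line.strip():
            continue
        form = FORM_RE.match(line)
        if form:
            current_form = ET.SubElement(root, "form", name=form.group("name"))
            continue
        field = FIELD_RE.match(line)
        if field and current_form is not None:
            attrs = dict(ATTR_RE.findall(field.group("attrs")))
            ET.SubElement(current_form, "field", name=field.group("name"), **attrs)
            continue
        # Refuse to skip anything silently: an unknown clause is an error.
        raise ValueError(f"unrecognized clause: {line!r}")
    return root


if __name__ == "__main__":
    print(ET.tostring(report_to_xml(SAMPLE_REPORT), encoding="unicode"))
```

Raising an error on an unknown line, instead of ignoring it, is one way to make sure the parser really covers every clause that appears in the reports.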


The universal repository: xml

In all phases, XML is chosen as the repository for the information that describes the applications. XML has the advantage of being meaningful to both humans and machines, and XML editors make it easy to perform simple explorations (XPath), syntheses, or data extractions (XQuery or XSLT transformations). Our whole chain for analysis and production primarily produces XML. The analysis and processing of XML documents is central to our learning of the general structures of Rally and of the specifics of each application.
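
As a small illustration of this kind of exploration, the sketch below uses the limited XPath subset of Python's standard xml.etree.ElementTree module. The document structure (application, form and field elements, and their attributes) is hypothetical and does not reflect the project's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical application description; element and attribute names are invented.
APP_XML = """
<application name="DEMO">
  <form name="CUSTOMER_EDIT">
    <field name="NAME" type="TEXT"/>
    <field name="BALANCE" type="NUMBER"/>
  </form>
  <form name="ORDER_LIST">
    <field name="ORDER_ID" type="NUMBER"/>
  </form>
</application>
"""

root = ET.fromstring(APP_XML)

# Exploration: every numeric field, wherever it appears in the application.
numeric_fields = [f.get("name") for f in root.findall(".//field[@type='NUMBER']")]

# Synthesis: number of fields per form.
fields_per_form = {form.get("name"): len(form.findall("field")) for form in root.iter("form")}

print(numeric_fields)
print(fields_per_form)
```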


From Pseudo-code to its meaning: antlr

Like any 4GL, Rally contains a micro-procedural language (ADL) that allows programmers to adapt standard Rally interactions to the actual needs of the application. (ADL is the “JavaScript of Rally”.)


The ADLs are snippets that look like a simplified Pascal, recorded in the Rally Report. The parser isolates them and stores them as text files.


To extract their meaning, we need a tool like a micro-compiler that knows the ADL language. The product antlr (a parser generator, a modern counterpart of yacc) is used. Starting from the statement of the ADL grammar, antlr generates Java classes that perform the compiler tasks.


This micro-compiler extracts the essential information from the ADL code and turns it into XML snippets, which are then embedded in the XML document for the application.
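
To give an idea of what such a micro-compiler does, here is a hand-written sketch in Python. It is not the antlr-generated Java code of the project, and the three statement forms it accepts (assignment, CALL, IF ... THEN) are a made-up subset chosen for the example, not the real ADL grammar. It only shows the path from source text to tokens, to a parse, to an XML snippet.

```python
import re
import xml.etree.ElementTree as ET

# Tokens of the invented mini-language: ':=', words/numbers, and a few symbols.
TOKEN_RE = re.compile(r"\s*(?:(?P<assign>:=)|(?P<word>\w+)|(?P<punct>[;<>=]))")


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser producing XML elements instead of machine code."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.pos += 1
        return token

    def program(self):
        root = ET.Element("adl")
        while self.peek() is not None:
            root.append(self.statement())
        return root

    def statement(self):
        if self.peek() == "IF":
            self.take("IF")
            node = ET.Element("if", left=self.take(), op=self.take(), right=self.take())
            self.take("THEN")
            node.append(self.statement())  # the guarded statement is nested
            return node
        if self.peek() == "CALL":
            self.take("CALL")
            node = ET.Element("call", name=self.take())
        else:
            target = self.take()
            self.take(":=")
            node = ET.Element("assign", target=target, value=self.take())
        self.take(";")
        return node


def compile_adl(source):
    return Parser(tokenize(source)).program()


if __name__ == "__main__":
    tree = compile_adl("IF BALANCE < 0 THEN STATUS := OVERDRAWN; CALL REFRESH;")
    print(ET.tostring(tree, encoding="unicode"))
```

A generated parser has the advantage that the grammar stays readable as a grammar; the hand-written form is used here only to keep the example self-contained.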


Analysis, transformations, synthesis: Scala

Once we know by which rules "Rally objects" will be translated into "COBOL or DECforms objects", we must carry out these transpositions by program.


The translation programs take as input the XML documents describing the Rally application, and produce as output a set of XML documents describing the DECforms and COBOL fragments that we want to create. These translation programs are written in Scala. Scala is a language that runs on the Java virtual machine, with excellent XML handling, and it adds the power of the functional paradigm to the object paradigm. It can run on any OS with a Java virtual machine, including OpenVMS (see the Scala port by Thierry Uso).
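
The shape of such a translation program can be sketched as follows, in Python instead of the Scala actually used. The transposition rule (a table from Rally field types to target field kinds) and all element names are invented for the illustration; the project's real rules are far richer.

```python
import xml.etree.ElementTree as ET

# Invented transposition rule: Rally field type -> kind of target field.
TYPE_RULES = {"TEXT": "character", "NUMBER": "numeric"}


def translate_form(rally_form):
    """Map one (hypothetical) Rally form description to a target panel description."""
    target = ET.Element("decforms-panel", name=rally_form.get("name"))
    for field in rally_form.findall("field"):
        kind = TYPE_RULES.get(field.get("type"))
        if kind is None:
            # A type with no rule means the transposition model is incomplete.
            raise ValueError(f"no rule for field type {field.get('type')!r}")
        ET.SubElement(
            target,
            "panel-field",
            name=field.get("name"),
            kind=kind,
            length=field.get("length", "1"),
        )
    return target


if __name__ == "__main__":
    source = ET.fromstring(
        '<form name="CUSTOMER_EDIT">'
        '<field name="NAME" type="TEXT" length="30"/>'
        '<field name="BALANCE" type="NUMBER" length="12"/>'
        "</form>"
    )
    print(ET.tostring(translate_form(source), encoding="unicode"))
```

Note that the output is still XML describing the fragment to create, not the final code itself.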





Producing the code: xslt, antlr

It is not a good choice to produce the target code directly from the analysis and synthesis programs.


The final code in COBOL or DECforms consists of fragments of synthetic structures, corresponding to the various transpositions of the Rally programming models to their DECforms or COBOL counterparts. These models are expressed as templates (XSL). The final code fragments, specific to each application, are instantiated using XSLT. An XSLT transformation takes as input one template and one file of application-specific values, and produces the final code.
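
The "one template plus one file of values" principle is sketched below. Python's string.Template stands in for XSLT here, simply because it needs no external library; the record layout, the values file and its element names are all hypothetical.

```python
from string import Template
import xml.etree.ElementTree as ET

# Generic model, shared by all applications (stand-in for an XSL template).
RECORD_TEMPLATE = Template("       01  ${record}-REC.\n${fields}")
FIELD_TEMPLATE = Template("           05  ${name}  PIC ${picture}.\n")

# Application-specific values (hypothetical file content).
VALUES_XML = """
<values record="CUSTOMER">
  <field name="CUST-NAME" picture="X(30)"/>
  <field name="BALANCE" picture="S9(10)V99"/>
</values>
"""


def instantiate(values_xml):
    """Combine the generic template with one application's values."""
    values = ET.fromstring(values_xml)
    fields = "".join(
        FIELD_TEMPLATE.substitute(name=f.get("name"), picture=f.get("picture"))
        for f in values.findall("field")
    )
    return RECORD_TEMPLATE.substitute(record=values.get("record"), fields=fields)


if __name__ == "__main__":
    print(instantiate(VALUES_XML))
```

Keeping the template separate from the values means that a correction to a programming model is made once, in the template, and reaches every application at the next run of the chain.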


For the translation of procedural snippets (ADL), antlr is used to produce either COBOL, to be called from DECforms as a Procedural Escape, or IFDL, to be included as an internal response in the DECforms form.


Refactoring: Scala, Perl

The fragments produced may require an additional refactoring pass. Refactoring operations are written in Scala or Perl, as appropriate.


Seeing: graphml, yEd

Rally applications can be viewed as graphs of objects. Our analysis programs, in addition to pure data analysis, produce documentation in the GraphML dialect. The .graphml files are turned into pictures by the freely available tool yEd.
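
A minimal sketch of producing such a file is given below, in Python. The object names and the "calls" relation between them are invented. Only the core GraphML elements (graphml, graph, node, edge) are written; labels and layout, which yEd keeps in its own extension data, are deliberately left out.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def to_graphml(edges):
    """Serialize a list of (source, target) pairs as a directed GraphML graph."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(tag("graphml"))
    graph = ET.SubElement(root, tag("graph"), id="application", edgedefault="directed")
    # Every object mentioned in an edge becomes a node, in a stable order.
    for name in sorted({name for edge in edges for name in edge}):
        ET.SubElement(graph, tag("node"), id=name)
    for index, (source, target) in enumerate(edges):
        ET.SubElement(graph, tag("edge"), id=f"e{index}", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical navigation links between forms of one application.
    print(to_graphml([("MENU", "CUSTOMER_EDIT"), ("MENU", "ORDER_LIST")]))
```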


These graphics are used both to conduct the analysis and synthesis and to provide help in the final debugging of applications.


Glue: DCL

All these machines for extraction, synthesis and refactoring create and use a large number of files (on average a thousand files per application). The production chain is a series of operations, and the end result must combine fragments from different phases.


All these operations are performed in DCL, which is simple and powerful enough for such situations. Some simple XML write operations are even done directly in DCL.


Conception: writing, understanding

Rally Model, DECforms Model

The transposition operation presupposes that it is possible to determine the programming model of Rally and to find equivalents in the DECforms programming model. We must also understand how the writers of the applications used the Rally model.


The Rally and DECforms models are similar, yet structurally different in important respects. Some differences are immediately recognizable: for example, Rally manages access to data transparently, while DECforms cannot work without a COBOL data access layer. Other issues emerged gradually, such as the stack-oriented management of interaction sequences or the implementation of conditional navigation.


These three examples of difficult decision-making are outlined below.


Above all, we must note that these discoveries and inventions were mainly the work of the "COBOL" team, who brought their great experience and sagacity to bear. Without slighting the others, we must mention here the key roles of Ms. Myra Benseghir and Marie-France Clech.


Inventions

Data Access

Rally has a data catalog, parameterized rules for transparent access to data, and a micro-API language (DML) for data access within ADLs. All this machinery, built into the Rally engine, is replaced by a set of specialized modules in COBOL and SQL-Cobol. The templates of these modules are instantiated from the data catalog of each application.


These modules reproduce Rally's transactional model, which is multi-database and multi-transaction, using the "Connect" blocks of Rdb.


Recursive DECforms

Rally organizes user transactions and ADL actions according to a stack model. DECforms, by contrast, is typically invoked through sequences of requests from a COBOL program or an ACMS task.


To fill this gap, we used the reentrant mode of DECforms: DECforms calls itself through external Escape Routines.


Navigation Engines

Conditional navigation in Rally and DECforms has similar results but very different implementations. The Rally model can be described as mainly geographical, with memory of routes along pre-drawn paths, whereas the DECforms model can be described as mainly historical, dynamically activating and deactivating the fields the user visits.


The transposition resulted in the creation of a specific software component (which we named GPS) that transforms a Rally "geography" into a DECforms "history".


Experimentation, confrontation

The various transposition models were not "born fully armed" from our heads. A number of fundamental assumptions guided our work. But it was through patient experimentation, through confrontation with actual behaviors, and through the discoveries and new formulations we drew from experience that we reached a detailed understanding of the Rally models and could devise possible transpositions into DECforms.


Finalization

We handled the major issues on the one hand, but we also had to create detailed answers for each application or part of an application.


It was really only at the very end of the process that we finally understood the precise capabilities of the applications, once we had mastered the Rally dialect and the idioms chosen by the client.


The end point: to know the true function and behavior of the application. There the client's role was decisive. Just as we would not have succeeded without bridging old and new skills internally, the ultimate success was only possible thanks to the collaboration between the client and us. Here we must thank the customer's teams for their support in these final, difficult operations.


Method

From right to left, from left to right?

Re-engineering is by nature confronted with the need to combine induction and deduction. In any case it is necessary to walk the creators' path backward, from their results back to their initial choices, to compare what they planned with the actual outcome, and to weigh the constraints of the environment against the free choices of development. We can call this part of the work "active reading": a reading that not only understands the meaning but must also know the style, and sometimes has to re-learn the language itself.


In the other direction, writing the results is basically an experiment that aims both to construct the results and to deepen knowledge of the existing system. We can call this part "experimental writing".


The method is a continuous communication between this reading and this writing.


Parallels must meet: « V » « U »

The method has a well-known relative, but departs from it significantly.


This well-known relative is the famous “V” cycle, which places a top-down analysis on the left beside a bottom-up implementation process on the right. Functional specification and running software are at the top of the diagram, and programming is the junction point of the "V".


Our method keeps the top-down vision for analysis, the constructive vision for realization, and the need for correspondence between the two branches of the "V".


But the projected junction seems to distort the perspective. A better picture is the familiar twin-tower diagram used to describe network protocols. In that diagram, the junction point is the medium of exchange itself (everything actually travels on the Ethernet wire, the optical fiber, etc.), and each protocol layer forms a unity of its own across the two towers.


This (“U”) schema is a better image for our method.


Any software realization aims at a "right" program as the correct point of contact between the user's need and its implementation. And any software work joins top-down and bottom-up cycles, searching for convergence.


The error of perspective in the "V" drawing is the belief that the junction is deductive and necessary, whereas it is both deductive and inductive, and is only aimed at. We join projection-production cycles, carried out with our modern tools and by our modern designers, with cycles of real invention and reflection, carried out by our COBOL experts. By repeating these cycles, and promoting an ongoing exchange between designers and implementers, we build and refine our junction point: our concrete, actual and correct programs.


Time

The apparent risk of this method is that nothing formally guarantees that the process will converge. This risk is merely more visible than in a conventional method, whose assurance is a false one.


The real difficulty of the method is the exponential pace of its timing and of its quality cycle. The first rounds are very long, sometimes discouraging, and the method is slow in the beginning. However, once the first convergences occur, progress accelerates very quickly, and quality rises just as quickly.


Results, prospects

Productivity

As stated in our header, the result in terms of productivity is impressive. The project produced about 1.3 million lines of code from scratch (ready for use), with only text documents as a starting point. The total effort was 1000 man-days, all phases of design and construction included. So we had an average production rate of 1300 lines per person per day.


Accuracy

An interesting result is the speed with which accuracy was reached once the core of our work had been established. The number of errors reported by the client was very limited, even though the required level of accuracy was the highest.


The reason for this is the method. The convergence of the method takes place in the programs themselves. When convergence is reached, the programs are actually very accurate.


XML Knowledge base

The intermediate XML results, which describe the COBOL and DECforms “objects” produced, as well as the generic DECforms and COBOL templates, are by-products of the port. First of all, they are very rich program documentation.


These by-products can easily be reused to transform the existing software, or to create interfaces that integrate naturally into a service-oriented architecture (SOA).


Conclusion

We finally come back to a fundamental cognitive constant, namely that there is no learning without iteration. Re-engineering is the domain par excellence of the reactivation of formalisms through iterative learning; it must combine the strong intuition of designers with the use of good engines.


It is also noteworthy that the OpenVMS culture, the pool of skills around it, and the availability on OpenVMS Itanium of most modern tools were key factors in the success of the port.


Our experience has validated our method, which runs top-down and bottom-up approaches in parallel, and we were able to make good use of a large set of modern tools on OpenVMS. Our workshop is ready for new porting and modernization projects. One of the next projects is FMS; then, why not, DECforms or ACMS.


For more information