AnalysisFrameworkUpdate

Kevin Brightwell edited this page Aug 26, 2015 · 1 revision

Analysis Framework Changes

Introduction

It was discussed that a new way to analyse the result of parsing an input was needed. The idea was to automatically detect where token leaves are needed based on their location in the grammar. For example, invariant is the parent token of constraint, so whenever an invariant token needs to be analysed, all that needs to be done is a search for an appropriate analyzer that can handle those subtokens (genExpr, for example).

Details

The way we implemented it is that general rules, that is, ones with definitions like:

    invariant : [[constraint]]

need an analyzer for the rule name (here, invariant). The only other things that need analyzers are tokens that look like [name], etc.

An analyzer is detected by convention: its name has the suffix Analyzer, it can be cast to the superclass Analyzer, and it lives in the cruise.umple.analysis package. For example, InvariantAnalyzer { isA Analyzer; } fits this pattern. For [token]-style tokens, TokenAnalyzer is acceptable, as is a more specific name derived from the enclosing rule. For example, constraintName : [name]... needs an analyzer for the [name] token; because NameAnalyzer is too general and would match other names, ConstraintNameNameAnalyzer can be used instead. This capability can be used cleverly. For example, EqualsOp is a unique enough name that the context in which the token is created is known to be within some sort of constraint, so a general EqualsOpAnalyzer can be specified for the majority of cases. Further, an AssociationEqualsOpAnalyzer can be used where the equals-operator token must be handled specially.
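The naming convention above can be sketched as a lookup that prefers the most specific candidate name before falling back to more general ones. This is a minimal, hypothetical illustration: the class names in the registry come from the examples in this page, but the lookup method itself is illustrative, not Umple's actual implementation.

```java
import java.util.*;

public class AnalyzerLookup {
    // Simulated set of class names found in cruise.umple.analysis.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
        "InvariantAnalyzer", "ConstraintNameNameAnalyzer",
        "EqualsOpAnalyzer", "AssociationEqualsOpAnalyzer", "TokenAnalyzer"));

    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Resolve an analyzer for a [token]-style token: try the enclosing rule's
    // name plus the token name first, then the token name alone, then the
    // generic TokenAnalyzer.
    static String resolve(String parentRule, String tokenName) {
        List<String> candidates = Arrays.asList(
            capitalize(parentRule) + capitalize(tokenName) + "Analyzer", // most specific
            capitalize(tokenName) + "Analyzer",                          // general
            "TokenAnalyzer");                                            // fallback
        for (String c : candidates) {
            if (KNOWN.contains(c)) return c;
        }
        return null;
    }

    public static void main(String[] args) {
        // [name] inside constraintName resolves to the specific analyzer,
        // because a bare NameAnalyzer would be too general.
        System.out.println(resolve("constraintName", "name"));
        // equalsOp resolves to the association-specific analyzer when present.
        System.out.println(resolve("association", "equalsOp"));
    }
}
```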

Once we have the names, the GrammarAnalyzer in cruise.umple.parsing.analysis (a separate entity from the Analyzer described above) builds a tree of Analyzers within the current UmpleModel, each with all of its possible child Analyzers. Analysis of a token can then begin from any position. For example, within the analyzeInvariant method, where we know the token passed is an invariant token, we call the analyzeToken method inherited from the Analyzer superclass. The superclass first calls the prepare method, which the subclass Analyzers inherit, then tries to call analyzeToken for each of the token's subtokens. To call analyzeToken it must first create a new instance of the child class, and to do so it must know which variables to pass. It does this by name matching: if a parent invariant analyzer has a variable "uClassifier" set, and the child genExpr analyzer has a variable with the same name, the invariant analyzer will try to set "uClassifier" on the new genExpr analyzer instance. In this way variables are passed from one analyzer to another without that interaction having to be specified explicitly. NOTE: this can have unintended consequences when a variable is meant to be initialized only within one class, but an ancestor analyzer happens to have a variable with the same name and passes its value down. That said, it is pretty straightforward.
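The name-matching hand-off can be sketched with plain Java reflection. Everything here is illustrative: the field names come from the genExpr example above, but the classes and the passVariables helper are assumptions about how such a mechanism could look, not Umple's real code.

```java
import java.lang.reflect.Field;

public class VariablePassing {
    static class InvariantAnalyzer {
        public String uClassifier = "Student"; // set while analyzing the parent token
    }
    static class GenExprAnalyzer {
        public String uClassifier;             // same name => receives the parent's value
        public String rawLine;                 // no matching field in the parent => untouched
    }

    // Copy every parent field whose name and type match a field on the child.
    static void passVariables(Object parent, Object child) throws Exception {
        for (Field pf : parent.getClass().getDeclaredFields()) {
            try {
                Field cf = child.getClass().getDeclaredField(pf.getName());
                if (cf.getType().equals(pf.getType())) {
                    cf.set(child, pf.get(parent));
                }
            } catch (NoSuchFieldException ignored) {
                // The child does not declare this variable; skip it.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        InvariantAnalyzer parent = new InvariantAnalyzer();
        GenExprAnalyzer child = new GenExprAnalyzer();
        passVariables(parent, child);
        System.out.println(child.uClassifier); // passed down by name matching
        System.out.println(child.rawLine);     // no source in the parent, stays null
    }
}
```

This also makes the caveat in the note concrete: any parent field named like a child field gets copied, whether or not that was intended.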

One other thing to note is that Analyzer classes must have no-argument constructors. Initialization within the constructor is possible, but not recommended; instead use the prepare method inherited from the Analyzer superclass.

The final step, after the attempt to call analyzeToken on the child classes for each subtoken, is a call to the analyze method of the Analyzer. This method is inherited and should be overridden by subclasses, though its body may be empty. The analyze step is usually where the analyzer does "what it needs to", such as a name token setting the attribute under construction to the correct value, or the different elements of a constraint being added to the growing "rawLine".
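The full lifecycle described above (prepare, then recurse into subtokens, then analyze) can be sketched as a template method on the superclass. This is a toy model, assuming simplified Token and lookup stand-ins; the real Umple classes differ.

```java
import java.util.*;

public class Lifecycle {
    // A toy token: a rule name plus subtokens.
    static class Token {
        String name; List<Token> subs = new ArrayList<>();
        Token(String n, Token... s) { name = n; subs.addAll(Arrays.asList(s)); }
    }

    static List<String> trace = new ArrayList<>();

    static abstract class Analyzer {
        // Template method: the superclass drives the three phases.
        final void analyzeToken(Token t) {
            prepare(t);                              // 1. subclass setup hook
            for (Token sub : t.subs) {
                Analyzer child = lookup(sub.name);   // 2. recurse via child analyzers
                if (child != null) child.analyzeToken(sub);
            }
            analyze(t);                              // 3. do this token's own work
        }
        void prepare(Token t) { trace.add("prepare " + t.name); }
        abstract void analyze(Token t);
    }

    static class InvariantAnalyzer extends Analyzer {
        void analyze(Token t) { trace.add("analyze invariant"); }
    }
    static class ConstraintAnalyzer extends Analyzer {
        void analyze(Token t) { trace.add("analyze constraint"); }
    }

    // Stand-in for the analyzer tree: map a subtoken name to its analyzer.
    static Analyzer lookup(String name) {
        return name.equals("constraint") ? new ConstraintAnalyzer() : null;
    }

    public static void main(String[] args) {
        Token invariant = new Token("invariant", new Token("constraint"));
        new InvariantAnalyzer().analyzeToken(invariant);
        System.out.println(String.join(", ", trace));
    }
}
```

Note that each analyze call runs after its children have finished, which is why a constraint's pieces can accumulate into the parent's "rawLine" before the parent finalizes it.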

The three general steps in creating new Analyzer classes are:

  1. Give the class the variables it needs, trying to make sure those names don't conflict. These primarily include variables the parent class needs to send to the child class, as well as variables that hold values for collection.
  2. Make the prepare and analyze methods work appropriately, each doing a small piece of the work. The analyze method should only do the bit of analysis that concerns the token that has been passed; it should not concern itself with super- or subtokens, which will be handled by their own Analyzers, assuming this model works.
  3. Make sure the entry point into the analysis works. model.getAnalyzer("invariant").analyzeToken(token) is a typical example. These entry points may someday all collapse into a single call on the root token, but only once all of UmpleInternalParser has been converted to the new format; until then, the modularization of the different analyses is key. There need not be a complete overhaul, just a slow creep towards the new paradigm.
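The entry-point pattern in step 3 can be sketched as a registry on the model. The Model class and its map here are hypothetical stand-ins for UmpleModel; only the getAnalyzer(...).analyzeToken(...) shape is taken from the text above.

```java
import java.util.*;

public class EntryPoint {
    interface Analyzer { String analyzeToken(String token); }

    // Stand-in for UmpleModel: a map from rule names to root analyzers.
    static class Model {
        Map<String, Analyzer> analyzers = new HashMap<>();
        Analyzer getAnalyzer(String rule) { return analyzers.get(rule); }
    }

    public static void main(String[] args) {
        Model model = new Model();
        model.analyzers.put("invariant", t -> "analyzed " + t);
        // Typical entry point while only part of the parser is converted:
        // each converted rule is invoked independently through the model.
        System.out.println(model.getAnalyzer("invariant").analyzeToken("invariant-token"));
    }
}
```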

In the end there is a structure of classes, each of which does its own task in the context of analysis, ordered so that each parent Analyzer only has access to the child Analyzers it needs.

One additional piece of possible future functionality this technology enables is the ability to add .class files directly into the /bin/cruise/umple/analysis folder and have new grammar rules work without any knowledge of the other features of Umple. All that would be required is an import of the Analyzer class the new client's analyzer inherits from, an import of Token for obvious reasons, and an import of UmpleModel (though perhaps the god-class nature of the UmpleModel may someday change to accommodate such new classes, for example via a more generalized "generator" class to inherit from).

In the end, efficiency is slightly reduced in favour of a more modular code base that can easily be improved upon. Note also that prior to this framework, analysis was by far the fastest step in the generation process, so it can stand to use a little more time and still not bog down the system. Finally, this new framework should lead to a more readable and maintainable code base, allowing for quicker integration of new features.