SqlGenerationSpecification

Timothy Lethbridge edited this page Jun 14, 2016 · 2 revisions
Clone this wiki locally

TL Tim has inserted comments in italics, with TL at the start. It is suggested others use the same notation

Introduction

SQL generation from Umple is still in it's infancy. Currently, only basic Umple can be converted to SQL successfully. This page is dedicated to the specification for future SQL code generation. Feel free to add comments or edit the page to add important Umple ideas that I may have missed or wrote bad SQL for.

TL Note that the experimental version can be tried in UmpleOnline by first typing 'av' into the line number box, in order to activate advanced features. The initial version was created in Fall 2011 by UCOSP students, and Adam has been working on improving it in Winter 2012. Although eventually this feature could help industrial users, initially I would like as a minimum to be able to generate basic SQL tables for use in teaching, with the objective of showing students what a database would look like, corresponding to any class diagram (with states shown using enums).

Details

Namespace Statements

Meaning in SQL:

  • The most similar idea in SQL to namespaces is the schema (database). A schema (database) is basically a group of database tables that share something in common. Usually, these tables are all part of the same system. In SQL, every table must be part of a schema. #### Concerns:
  • None #### Generated SQL:
CREATE DATABASE IF NOT EXISTS `[given namespace]`;
USE `[given namespace]`;

Status:

  • Working as specified

Interface Declaration

Meaning in SQL:

  • No reasonable SQL analogue. Interfaces are ignored entirely #### Concerns:
  • None.
  • Adam and Tim agree on this. #### Generated SQL:
  • N/A #### Status:
  • Working as specified

Class Declaration

Meaning in SQL:

  • The agreed upon choice is to generate a new SQL table for each Umple Class.
  • TL: Agreed, but note that these would not be the only classes; also there would be classes for many-many associations.
Umple “isA” keyword
  • Approach 1: have a parent table and a child table where the child has a foreign key to the parent. However, when querying the table for a child you would have to always use join operations, which are taxing on the database. Also, if you wanted to add a new child to the database, you would have to first insert a row into the parent table using the child's “parent” information. Then do a second insert into the child table with the child's “child” information, and include a foreign key to the parent row you added just prior. SQL can't do this on it's own.
  • In the case of classes that are children of other classes, what should their Primary Key(s) be? TL answer: Always the same primary key as in the class at the top of the hierarchy.
  • Approach 2: Have a child table that duplicates all the fields of the parent(s) to the top of the hierarchy. This means that queries at the 'parent' level would have to query all child tables.
  • Approach 3: Generate code only for databases that support inheritance.
  • TL: Use Approach 1 for now, but design the code so it would be easy to convert to allow use of database syntax for a database that allows inheritance. Comment well so we can figure out how this would be achieved.
Umple “singleton” keyword
  • The direct translation of this pattern into SQL would be a table that never has more then a single row in it. The singleton pattern can be implemented in SQL using triggers.
DELIMITER //
CREATE TRIGGER `singleton_test_table` BEFORE INSERT ON `test_table`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @num_rows FROM `test_table`;
    IF @num_rows > 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to add additional row to Singleton table `test_table`';
    END IF;
END;
//
DELIMITER ;
  • TL this would be a 'nice to have' but for now would be a lower priority unless it can be done very easily
Umple “immutable” keyword
  • An immutable table in SQL would be one where rows cannot be edited once they are entered. Like the singleton pattern, it can be implemented using triggers. Note: the INSERT trigger is there to catch REPLACE statements.
DELIMITER //
CREATE TRIGGER `immutable_test_table_insert` BEFORE INSERT ON `test_table`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @num_rows FROM `test_table` WHERE `test_table`.`name` = NEW.`name` AND `test_table`.`location` = NEW.`location`;
    IF @num_rows > 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify existing row in Immutable table `test_table`';
    END IF;
END;
//
CREATE TRIGGER `immutable_test_table_update` BEFORE UPDATE ON `test_table`
FOR EACH ROW BEGIN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify existing row in Immutable table `test_table`';
END;
//
DELIMITER ;
  • Note that individual attributes can be immutable, or the entire table.
  • TL this would be a 'nice to have' but for now would be a lower priority unless it can be done very easily. All code generated needs to work in MYSQL. Eventually we might add profiles for use in other databases
Umple “depend” keyword
  • In SQL, there exists a “USE” statement. The meaning of this statement is similar to Umple's “depend” statement but not exactly the same. In SQL, the “USE” statement allows tables to be referenced without a particular database name preceding them. For example, the SQL “USE the_system;” allows one to reference the table the_system.users as, simply, users. This seems like an excellent analogue to Umple's “depend” statement, except for the fact that only the latest SQL “USE” statement is used.
  • Note that only databases can be referenced with the “USE” statement.
    • Concern 1: Issue generating SQL from Umple when Umple code contains a “depend” statement and a “namespace” statement. Impossible for Umple parser to figure out what class names are covered by the “depend” statement and which are covered by the “namespace” statement. For example, consider the following Umple code. The questionable item is the table banned_ip. Is it a pre-existing table in the system_abc database or is it a pre-existing table in the system_xyz table? How is Umple to know? Other languages will figure this out during compilation or runtime, but due to the limitation of the USE statement in SQL (described above), we must know which is which while generating the SQL from Umple.
namespace `system_abc`;

class `users` 
{
  depend `system.xyz`;

  name;
  password;
  ip_address VARCHAR(15);
  /* 
  * I would use CHECK here, but can't in Umple yet
  * Instead, here is a method, to be passed through to the generated SQL
  */

  CREATE TRIGGER `prevent_banned` BEFORE INSERT ON `users`
  FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @isBanned FROM `banned_ip` WHERE `banned_ip`.`ip` = NEW.`ip`;
    IF @isBanned > 0 THEN
      SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to add new user with banned IP address';
    END IF;
  END;
}
  * Answer from _TL: depends is allowing reference to other namespaces in the same way as SQL use. Namespace is defining the contents of this namespace, so there is no conflict._
* Concern 2: If the user enters a class name instead of a database name in an Umple “depend” statement, then the generated SQL will contain errors.
  * Answer from _TL: Don't generate code with errors. Better to generate nothing than code with errors._
Umple “key” keyword
  • The Umple “key” keyword has the same basic function as SQL's “PRIMARY KEY” clause.
  • Note: If the user did not specify a key in a class, then force the first available option below to be the Primary Key of the class. If none of the below options are possible, then the class shouldn't be generated at all.
    1. The parent class' Primary Key attribute(s) if class is child.
    2. The first attribute of the class.
    3. The Foreign Key of the first associated class.
Umple “trace” keyword
  • Obviously this Umple keyword has no place in SQL. It's existence will be ignored when generating SQL.
Method Definitions
  • User written methods in Umple code are written in the base language. Therefore, if the user plans on generating SQL, they will naturally write these “methods” in SQL as SELECT, INSERT, CREATE FUNCTION, etc statements. It is not unreasonable to pass these methods on to the generated code as is. They should be placed after the CREATE TABLE statement.
  • TL We will be adding a 'strictness' keyword soon that will allow people to avoid any methods at all and have them detected

Concerns:

  • The biggest issue I foresee is the question, “what if the table already exists?” Generating “CREATE TABLE IF NOT EXISTS class name” is probably the best answer to this question, but there exists the possibility of using an “ALTER TABLE” statement. If “ALTER TABLE” is used, it makes generating safe and error free SQL from the given Umple much more difficult to guarantee though.
  • I've noticed that in other languages, if you create a totally empty class in Umple, you can generate correct code from it. However, in SQL, a totally empty Umple class would cause SQL errors (can't have a table with no attributes & no Primary key). If the table has only attributes or only associations, then it's fine, but lacking any structure is a problem. I assume the best course of action is to throw a warning and generate nothing for that class.
  • TL: For the first cut, I would be content with “CREATE TABLE IF NOT EXISTS". or even simply "CREATE TABLE". Reason: In reality there would be a need for a ruby-like delta mechanism. For now we will limit Umple to creating a new database. If the Umple then changes, there would be a need to create an Umple delta and hence and SQL table delta. This would be future work

Generated SQL:

USE [given depend name];
CREATE TABLE IF NOT EXISTS [given class name]
(
  [If child class, add parent class' Primary Key attribute(s)]
  [Attributes & Associations]
  PRIMARY KEY([user-defined key(s) of class OR the parent class' Primary Key attribute(s) if class is child and no key was specified OR the first attribute of the class if no key was specified OR the Foreign Key of the first associated class])
);

[If singleton, add singleton trigger (see above)]

[If immutable, add immutable triggers (see above)]

[given methods (if any)]

Status:

  • Class declaration is done.
  • Key contents are output as the Primary Key of the class. A class with no key has a Primary Key generated for it using the available attributes/associations within the class.

Attribute Declaration

Meaning in SQL:

  • Since Umple classes are being converted to SQL tables, Umple class attributes must be interpreted as columns (otherwise known as “entity attributes”) of the generated SQL table. Columns have types like attributes do, so this conversion makes the most sense.
  • The Umple keywords "lazy" and "settable" are ignored in SQL generation. In SQL, all attributes are defaulted to NULL if no default value was given, so “lazy” is implemented by default. "internal" attributes are ignored entirely (non-existent in the generated SQL).

Concerns:

  • String type attributes have a very long max length (possibly only limited by the available system memory, depending on the language). In SQL, limiting strings to a small size is preferable to allowing very long strings to exist in the database. Very long strings can, however, be stored if desired as “TEXT” types. The issue is that “TEXT” types cannot have default values. So the Umple code 'String location = “United States”' could not be converted to valid SQL. This problem can be avoided by using the “VARCHAR” type instead of the “TEXT” type. However, while the “TEXT” type allows for strings up to 65k bytes long, “VARCHAR” allows only up to 255 bytes long.
  • If attribute type is another class within the Umple file, then it must be detected and converted into a SQL foreign key. If the attribute type is a class not contained within the given Umple file, then a foreign key cannot be specified (since we don't know the Primary Key(s) of the other class) so it must be converted into a “BLOB” type instead.
  • The Umple keyword “defaulted” has the same meaning in SQL as initializing an attribute. Actually implementing the “reset” function in SQL is technically possible (using a user-defined function), but adds hard questions to the code generation. For example, where to store the default values? (Tim says: ignore for now)
  • How to implement the SQL "CHECK" clause in Umple? (Other then the const usage)

Generated SQL:

If attribute type is an Umple primitive:

[given attribute name] [translated attribute type],

If attribute type is valid:

[given attribute name] [given attribute type],

If attribute type is a class within the Umple file, then for each Primary Key of the other class:

[given attribute name]_[Primary Key name] [attribute type of the Primary Key],
CONSTRAINT fk_[given attribute name]_[Primary Key name] FOREIGN KEY ([Primary Key name]) REFERENCES [associated class name]([Primary Key name]),

If attribute type is not valid and not a class within the Umple file:

[given attribute name] BLOB,

In all cases, (except type is not valid and not a class within the Umple file):

[if initialized or defaulted, add “DEFAULT [given value]”]
[if autounique, add “AUTO_INCREMENT”]
[if const, add “CHECK ([given attribute name] = '[given value]')”]
[if unique, add “UNIQUE”]
[if notnull, add “NOT NULL”]

Furthermore, for const attributes, add the following after the class declaration:

DELIMITER //
CREATE TRIGGER `const_[attribute name]_insert` BEFORE insert ON `[class name]`
FOR EACH ROW BEGIN
    IF NEW.`[attribute name]` <> "[const value]" THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify constant attribute `[attribute name]` in  `[class name]`';
    END IF;
END;
//
CREATE TRIGGER `const_[attribute name]_update` BEFORE update ON `[class name]`
FOR EACH ROW BEGIN
    IF NEW.`[attribute name]` <> "[const value]" THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify constant attribute `[attribute name]` in  `[class name]`';
    END IF;
END;
//
DELIMITER ;

(Note that the "unique" and "notnull" Umple keywords are not implemented yet issue 87 and issue 88 )

If attribute is immutable, add the following trigger to the class which contains it. Note: the INSERT trigger is there to catch REPLACE statements.

DELIMITER //
CREATE TRIGGER `immutable_[attribute name]_insert` BEFORE INSERT ON `[class name]`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @isDisallowed FROM `[class name]` WHERE /*Not sure what to put yet...*/;
    IF @isDisallowed > 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify Immutable attribute `[attribute name]` in  `[class name]`';
    END IF;
END;
//
CREATE TRIGGER `immutable_[attribute name]_update` BEFORE UPDATE ON `[class name]`
FOR EACH ROW BEGIN
    IF OLD.`[attribute name]` <> NEW.`[attribute name]` THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Attempt to modify Immutable attribute `[attribute name]` in  `[class name]`';
    END IF;
END;
//
DELIMITER ;

Status:

  • Simple attributes work fine.
  • Autounique, initialized, defaulted, and const all work.
  • Internal attributes are ignored in SQL generation as specified.
  • Attribute types are handled properly unless the attribute type is that of another class.

Association Declaration

Meaning in SQL:

  • In SQL, associations between classes (tables) are called Foreign keys. A Foreign key is an attribute that has some constraints on it that make sure it always references a particular row in another table.

Concerns:

  • “Many” associations cannot be stored within a table row. This means that traversing the association from the “many” class to an associated class is impossible. Instead, the user must traverse the other way. In “many to many” associations, traversal becomes impossible completely, unless a new association class is created that contains two “one to many” associations in it. Still, traversal is more difficult.
  • Unidirectional associations can be somewhat enforced if the association is a “constant to constant” type. However, the user can always circumvent this enforcement by running a different query. Other than in this case, they cannot be enforced at all. (Look at indexes)
  • “MtoN” associations are only enforceable via the addition of triggers. However, the real question is what to do if the M..N constraint is broken? Cascade delete all the remaining associated rows, or simply throw a "Multiplicity out of bounds" error? TL: Do the latter I think.
  • Unsure how to handle Primary Keys that are also Foreign Keys. For example:
    class A
    {
        id INT;
        1 – 1 B;
        key {B};
    }

    class B
    {
        id INT;
        1 – 1 A;
        key {ID};
    }
* _TL: In the above case, you would have to inject a column into A that would have the structure of the key of B._
  • In just about any (all?) situation regarding associations (though constant -- constant or 0..1 -- 0..1 or 0..1 -- constant associations are the most obvious), the foreign key constraints should be created AFTER the creation of all of the tables in the database, in order to avoid missing reference errors (can't make a foreign key constraint to a table that doesn't exit). This means, potentially, that outputting all classes, with no foreign key constraints within the CREATE TABLE statements themselves, into a single file with the foreign key constraints declared at the bottom of the file is the only way to ensure error-free SQL.
    • TL Seems reasonable to me.

Generated SQL:

If multiplicity is a constant or 0..1, then for each Primary Key in the associated class:

[associated class name]_[associated Primary Key name][add “_[association number]” if multiplicity > 1] [associated Primary Key type] [add “NOT NULL” if multiplicity is constant],
CONSTRAINT fk_[associated class name]_[associated Primary Key name][add “_[association number]” if multiplicity > 1] FOREIGN KEY ([associated class name]_[associated Primary Key name][add “_[association number]” if multiplicity > 1]) REFERENCES [associated class name]([associated Primary Key name]),

If multiplicity is M..N or * -- M..N or * -- * then add the following outside of any other class declaration:

CREATE TABLE IF NOT EXISTS `[associated class A name]_to_[associated class B name]`
(
  [class A's Primary Keys],
  [class B's Primary Keys],
  PRIMARY KEY ([All attributes]),
  [Add Foreign Key constraints for every attribute of the form]
  CONSTRAINT fk_[associated class name]_[associated Primary Key name] FOREIGN KEY ([associated class name]_[associated Primary Key name]) REFERENCES [associated class name]([associated Primary Key name]),
);

Add the following for each M..N associaiton

DELIMITER //
CREATE TRIGGER `multiplicity_bounds_[associated class A name]_to_[associated class B name]_insert` BEFORE INSERT ON `[associated class A name]_to_[associated class B name]`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @num_rows FROM `[associated class A name]_to_[associated class B name]` WHERE [Not sure what to put here yet...];
    IF @num_rows > [N] THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Insert cannot be performed due to multiplicity out of bounds';
    END IF;
END;
//
CREATE TRIGGER `multiplicity_bounds_[associated class A name]_to_[associated class B name]_insert` BEFORE UPDATE ON `[associated class A name]_to_[associated class B name]`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @num_rows FROM `[associated class A name]_to_[associated class B name]` WHERE [Not sure what to put here yet...];
    IF @num_rows < [M] OR @num_rows > [N] THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Update cannot be performed due to multiplicity out of bounds';
    END IF;
END;
//
CREATE TRIGGER `multiplicity_bounds_[associated class A name]_to_[associated class B name]_insert` BEFORE DELETE ON `[associated class A name]_to_[associated class B name]`
FOR EACH ROW BEGIN
    SELECT COUNT(*) INTO @num_rows FROM `[associated class A name]_to_[associated class B name]` WHERE [Not sure what to put here yet...];
    IF @num_rows < [M] THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'ERROR: Delete cannot be performed due to multiplicity out of bounds';
    END IF;
END;
//
DELIMITER ;

Status:

  • Associations with 0..1 or constant multiplicities are done. The next order of business is to get association classes added for M..N or * associations.

State Machine Declaration

Meaning in SQL:

  • State machines have no obvious place in SQL. Tim has suggested replacing the entire user-specified state machine with an ENUM type attribute, with the names of each possible state as it's possible values. #### Concerns:
  • None #### Generated SQL: Add the following to the class that contains the state machine:
ENUM([Comma separated list of all state machine state names]),

Status:

  • Complete.

List of SQL Generation Issues

In approximate order of severity.

  1. Proper handling of private keys that are blank OR that have associations in them
    • Key contains associations: Must recursively visit each associated class and determine its primary key.
      • Would be nice if you could set the primary key to internally be whatever it should be as a result of this recursive check.
      • Can be made more efficient by caching the result of the recursive check in one way or another.
        • There is probably a better way to implement this then a cached result. Such as the addition and use of an isExternal flag/option for various entities.
    • Key was not defined: Make the first appearing attribute into the key. If no attributes in class, make the first appearing association into the key, and run the above recursive check. If no associations in class, then (ideally) make the class external and remove references to it in all other classes.
      • If class is made external, then every other class must be checked for unidirectional associations to the, now external, class. If an association exists, make it external as well (does this option exist currently?) and remove reference to the association from the primary key if it is part of it (make the reference external as well (this can't be done right now)). Then make sure the primary key is non-empty. If it is empty, re-run the algorithm above.
        • This is recursive in a horrible way. Every change caused by this causes the algorithm to recurse over every non-external class. O(n^2)
          • This can made O(n) by changing current implementation of unidirectional associations. Add association variables to both classes as if it is unidirectional, but add boolean "traversable" flag to association variable class. Make one direction traversable and the other not.
  2. Proper code generation regarding the position of the declaration of foreign key constraints.
    • Foreign key constraints should be declared after the declaration of all the tables.
      • One way to implement this is to change what the getCode() method returns. Currently it returns a String, but if it were made to return a array or list of strings then the problem could be solved. The writeFile() method would have to store all of the returned string lists/arrays for each class and then output them all at once (all of the first strings, then all of the second strings, etc).
        • This change would actually solve the problem of how to output other things as well, like user-defined methods and all the triggers that need to be defined for various things (trigger examples shown above in the specification).
        • This change would affect the other languages though. They would output a string list/array with just one string in it and would write that file right away. Still, it would be nice if this could be implemented without affecting the other language generators?
        • TL: There are several refactoring possibilities to consider. One would be to leave getCode as it is, but to create getCodeArray. getCodeWould call this in the SQL case. See if you can figure out a way to eliminate massive changes elsewhere, or to do minor refactorings elsewhere. An objective would be to avoid changing existing test cases to the largest extent possible
          • What about, instead of changing the getCode() method as described above, add special trigger strings to the .umpletl files that will allow the getCode() method to separate the returned string into it's important categories. This way, none of the other languages would be modified and no new methods (other then maybe private helper methods) would be added. This will introduce some extra overhead to the compilation of UmpleToSql, but it also has a very tiny effect on the existing code. I believe a smaller effect then implementing a getCodeArray method.
  3. Proper handling of any association with a "m..n" or a "*" in its multiplicity
    • I think that the following should work:
//Speaking from the position of the "m..n" or "*" class (class A)
If the association is bidirectional
  If the other side of the association is 0..1 or a constant
    Make the association external
    //Ignore the association from this point on.
  Else
    Make the association external
    Make the associated class' (class B's) association external
    Create a new Internal class
      Add unidirectional 1..[m..n or *] association to class A //Does this have to be internal as well?
      Add unidirectional 1..[0..1 or constant] association to class B //Same question
If association is unidirectional
  //Option 1
  Make the association external
  Create an internal unidirectional 0..1 or constant (whatever it was) to * association in class B
  //Option 2
  Make the association external
  Create a new Internal class
    Add unidirectional 1..[m..n or *] association to class A 
    Add unidirectional 1..[0..1 or constant or m..n or *] association to class B
  1. Assuming the SQL generation was flawless, many existing test cases (cruise.umple\test\cruise\umple\implementation) would cause invalid SQL code (or no code at all) to be generated, making the tests themselves pointless if left unchanged.
    • Custom test cases are required. I'm unsure if this is possible in the cruise.umple\test\cruise\umple\implementation tests.
  2. Association multiplicity enforcement
    • 0..1 or constant multiplicities are easily enforced.
    • Other multiplicities require triggers for enforcement.
      • If enforcement is strict, then it disallows the ability to add any new rows to tables (in certain cases). Loose enforcement seems required.
        • Loose enforcement = Only check lower bound when removing or modifying a row. Only check upper bound when adding or modifying a row.
          • TL: Enforcement would be great, but do it as a subsequent increment
  3. MySQL allows only one trigger per event per table (events: Insert, Update, Delete). This makes the generation of triggers a bit more complicated, though it is still possible. Just keep this in mind.