Add CUBRID Engine #93

Open
stefansbv opened this Issue Jun 13, 2013 · 51 comments


@stefansbv
Contributor

I have made an attempt to implement the CUBRID engine; it would be great if you could take a look. It's in my fork, in the 'cubrid' branch.

@theory
Owner
theory commented Jun 13, 2013

Ooh, awesome, thanks! I've never heard of CUBRID; love that you're jumping on this. I've just had a quick look. Here are a few very preliminary thoughts.

  • Pity that "change" is a reserved word. Rather than copy all the queries out of App::Sqitch::Role::DBIEngine, maybe add a change_col method to App::Sqitch::Role::DBIEngine, with a default of "change", and override it in cubrid.pm to return "change_name"? Then update all references to that column to get it from the method instead of hard-coding it. I would think that would let you eliminate a ton of duplicate code.
  • Looks like CUBRID supports a STRING type; may I suggest using that instead of VARCHAR(n)? Unless it would kill performance for some reason, of course. I personally prefer not to set those sorts of limits unless I have a business need, but Postgres makes that easy with its TEXT type, and not all databases have the same flexibility.
  • No time zone, but it looks like everything is supposed to be UTC (GMT), yes? If so, just make sure timestamps are inserted with times UTC.
  • I see that CUBRID has a high level of MySQL compatibility. And while MySQL does not appear to have SCHEMA support, the truth is that its DATABASE objects are really schemas: you can join across them, for example. Is the same true of CUBRID?
  • The LIST/SEQUENCE thing looks like a perfect fit for the use of arrays in Postgres and Oracle. I wonder, though, do you need to quote the values in the lists created by __log_tags_param() and friends? And maybe escape characters in the strings? (Sqitch object names can have punctuation characters in them, though not white space).
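The change_col idea from the first bullet might be sketched like so. This is illustrative only: the package names here are invented stand-ins, not actual Sqitch code.

```perl
use strict;
use warnings;

# Sketch only: the role supplies a default column name, and an engine
# subclass overrides it. Package names are invented for the demo.
package My::Role::DBIEngine;
sub change_col { 'change' }            # default for engines without the conflict

package My::Engine::cubrid;
our @ISA = ('My::Role::DBIEngine');
sub change_col { 'change_name' }       # "change" is reserved in CUBRID

package main;
printf "SELECT %s FROM changes\n", My::Engine::cubrid->change_col;
```

Queries built in the role then interpolate `$self->change_col` instead of hard-coding the column, so only the one-line override lives in the engine class.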

Provided you get all the tests passing and address the issues you yourself have listed in the comments, I guess I will have to install CUBRID and test it all out myself so I can merge it.

Love that this might go in before MySQL support. :-P

Thanks for your efforts, greatly appreciated!

—Theory

@stefansbv
Contributor

Thanks for your appreciation :-)

I have changed the type from VARCHAR(n) to STRING and I see no reason for performance problems.

TIMESTAMP is also a reserved word... From the manual: "these reserved keywords can be used [as] an identifier when they are enclosed in double quotes, square brackets, or backtick symbol (`)".

Still working on the other problems...

Stefan

@theory
Owner
theory commented Jun 14, 2013

They seem a bit aggressive in their use of reserved words…pity.

@stefansbv
Contributor

Another difference: have to remove | comment 'BEGIN' in the generated SQL scripts...

@theory
Owner
theory commented Jun 14, 2013

Yeah, I have to do that for Oracle scripts, too. I think it might be necessary to create engine-specific default templates.

@stefansbv
Contributor

There is something I don't understand... I do as you say: add a change_col method to App::Sqitch::Role::DBIEngine, with a default of "change", override it in cubrid.pm to return "change_name", and update all references to that column to get it from the method instead of hard-coding it.

Then in DBIEngine I have something like:

sub current_state {
    my $chgcol = $self->change_col;
    ...
    SELECT $chgcol, ...
}

But in cubrid.pm should be(?):

sub change_col { 'change_name' }

sub current_state {
    my $chgcol = $self->change_col;
    ...
    SELECT $chgcol AS [change], ...
}

to get the same data structure, but then I still duplicate almost all the code.

@theory
Owner
theory commented Jun 14, 2013

Well, the hope is that you would need to duplicate far fewer methods in cubrid.pm. Of course, for those you do have to override, you can hard-code change_name if you want.

@stefansbv
Contributor

Regarding the LOCK TABLE issue, the best I could find is to use SET TRANSACTION ISOLATION LEVEL 6 instead (the default level is 3). That means that all other read or write operations from other transactions are delayed until the transaction is committed.

Another option would be, as suggested on the forum, to use triggers to prevent the modification of the changes table. The triggers would be dynamically dropped/created to allow only the current user to make changes.

use Sys::Hostname qw(hostname);

sub init_triggers {
    my $self = shift;
    my $dbh  = $self->dbh;
    my $current_user = uc($self->user) . '@' . hostname;
    foreach my $stmt (qw{insert update delete}) {
        $dbh->do(
            qq{
            DROP TRIGGER trg_${stmt}_lock_changes;}
        );
        $dbh->do(qq{
            CREATE TRIGGER trg_${stmt}_lock_changes
            BEFORE $stmt ON changes
            IF
            (
               user() != '$current_user'
            )
            EXECUTE REJECT;
        });
    }
    return;
}

The caveat is that the same user@host can still make modifications from another instance of Sqitch.

@theory
Owner
theory commented Jun 17, 2013

Well, the point is not to prevent other folks from accessing the database. It's to prevent more than one instance of Sqitch deploying or reverting at once. Locking the changes table to prevent anyone else from writing accomplishes this very well. Serializable isolation will not (it would just prevent conflicts between two transactions).

I'm not sure the trigger would work, either, as I would assume that one might use the same user for each run of Sqitch. Also, if that's all in a single transaction, would the triggers even be visible enough in other transactions to execute?

It may be that you can't effectively create such a lock. Which would be a shame, but not the end of the world.

@stefansbv
Contributor

Another idea is to create a simple custom locking mechanism: a flag somewhere in the database. If it's ON, other users must cancel; if it's OFF, turn the flag ON, proceed, then turn it OFF.

I realize now that the trigger idea is bad as I presented it; it does nothing. If a user can drop another user's triggers and create his own instead, it is pointless. :-)

@theory
Owner
theory commented Jun 18, 2013

Well, you have to commit such a "lock" so that other transactions see it. And then you have to deal with the race condition that comes from two processes checking at the same time:

  1. Process 1 checks, sees it's free
  2. Process 2 checks, sees it's free
  3. Process 1 takes the lock
  4. Process 2 takes the lock

This is why locking is a PITA. I'm glad it's easy to access cross-transactional locks in Postgres.
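The race above disappears only when checking and taking are one atomic step rather than two. A hedged stand-alone illustration in plain Perl, using an O_EXCL file create as a stand-in for whatever atomic primitive the engine actually offers (the path is made up for the demo; this is not a proposal for the real engine code):

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp qw(tempdir);

# Demo path only; a real implementation would use an engine primitive.
my $dir      = tempdir( CLEANUP => 1 );
my $lockfile = "$dir/demo.lock";

# O_EXCL makes create-if-absent a single atomic step: either we created
# the file (got the lock) or it already existed (someone else holds it).
# There is no separate "check" for a second process to sneak between.
sub take_lock {
    sysopen my $fh, $lockfile, O_CREAT | O_EXCL | O_WRONLY or return 0;
    close $fh;
    return 1;
}
sub release_lock { unlink $lockfile }

print take_lock() ? "lock acquired\n" : "lock busy\n";
print take_lock() ? "lock acquired\n" : "lock busy\n";  # second attempt fails
release_lock();
```

The same shape works in SQL when a single atomic statement (for example, an UPDATE whose WHERE clause tests the flag) both checks and sets the flag.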

@stefansbv
Contributor

Regarding your question about joining across databases, my opinion is that it is not possible. Unlike in Pg, each database has to be started before it can be used. From my tests, the FROM clause accepts a user name (user.table) but not a database name. However, I can forward the question to the CUBRID forum if you like.

In this case the live test setup will be a little more complicated: the databases must* be created before the testing.

  • The cubrid createdb command can be used to create a database.
@theory
Owner
theory commented Jun 18, 2013

The user.table syntax sounds like schemas. In Oracle, a username is always also a schema. Is there a default schema, like public? Or are things always put into the user's own schema if a different schema/username is not specified using the dot syntax?

See how the Oracle engine does it. It just uses the current schema and expects the deployment scripts to specify a different schema name/username when they deploy. From the Oracle tutorial:

First Sqitch created the metadata tables used to track database changes. The structure and name of the metadata tables varies between databases, but in Oracle they are simply stored in the current schema -- that is, the schema with the same name as the user you've connected as. In this example, that schema is scott. If you'd like to keep the Sqitch metadata in some other schema, set the core.oracle.sqitch_schema configuration variable:

% sqitch config core.oracle.sqitch_schema sqitch

Ideally, only Sqitch data will be stored in this schema, so it probably makes the most sense to create a superuser named sqitch or something similar and use it to deploy changes.

@stefansbv
Contributor

There is a default user named public. It can be used to GRANT privileges to all the users at the same time. The creator of the table is the owner, and the DBA user can change the owner of the table.

Unfortunately CREATE TABLE sqitch.projects... ignores the user part and makes the current, logged in, user the owner of the table.

It might work using users instead of schemas because SELECT * FROM sqitch.projects throws an error if sqitch is not the current user, but the names of the tables have to be unique in the database.

@theory
Owner
theory commented Jul 2, 2013

Yeah, that sounds like the Oracle approach. If you use the Oracle username sqitch to do a deploy, the objects are created in the sqitch schema, which means you can qualify them with the username, e.g., sqitch.projects.

On the other hand, the same user runs the deploy scripts. If there is no way in the scripts themselves to specify that objects should be owned by some other user (no SET USER foo or some such), then everything will have to be owned by the deploying user, including the Sqitch tables. Which would be unfortunate.

BTW, I have spent most of the last week porting Sqitch to MySQL. It includes addressing the reserved column name change. You might want to rebase off that branch. Or wait a day or so, as I will finish the tutorial and merge it into master, and you can rebase from there. My fix for the reserved word was to table-qualify all references to the column, where possible, and to double-quote the column name in the two places where MySQL was too stupid to realize that they are unambiguously column names.

@stefansbv
Contributor

So we are stuck with the separate database approach, yes?

Table-qualifying all references to the column is good, but not enough :) CUBRID wants c."change", but that seems to be OK, at least for Pg and SQLite.

I have merged your mysql branch, added the supplementary quoting (including for timestamp), and it looks good so far; it has eliminated some duplicated code (8 subs).

Currently I'm stuck at:

t/cubrid.t .......... 1/?     
    ...
    #   Failed test 'Database error should be converted to Sqitch exception'
    #   at t/lib/DBIEngineTest.pm line 89.
    # expecting: App::Sqitch::X
    # found: normal exit
Can't call method "ident" without a package or object reference at t/lib/DBIEngineTest.pm line 91.
...

Any hints for where to look?

@theory
Owner
theory commented Jul 2, 2013

So we are stuck with the separate database approach, yes?

Yeah, I think so.

Table-qualifying all references to the column is good, but not enough :) CUBRID wants c."change", but that seems to be OK, at least for Pg and SQLite.

Seriously? I would file a bug against that. It's one way it's not compatible with MySQL.

I have merged your mysql branch, added the supplementary quoting (including for timestamp), and it looks good so far; it has eliminated some duplicated code (8 subs).

\o/

Any hints for where to look?

Yeah, it looks like the HandleError sub defined in the connect call is not being reached for some reason.

@stefansbv
Contributor

Using DBI_TRACE=1 prove -vlr t/cubrid.t:

ok 15 - And it should show the proper schema in the error message
!! ERROR: -20001 'CUBRID DBMS Error : (-493) Syntax: In line 1, column 13 before ' __bar_____'
Syntax error: unexpected 'INTO', expecting SELECT or VALUE or VALUES or '(' ' (err#0)
<- do('INSERT blah INTO __bar_____')= ( undef ) [1 items] at DBIEngineTest.pm line 89
not ok 16 - Database error should be converted to Sqitch exception

#   Failed test 'Database error should be converted to Sqitch exception'
#   at t/lib/DBIEngineTest.pm line 89.
# expecting: App::Sqitch::X
# found: normal exit
   ERROR: -20001 'CUBRID DBMS Error : (-493) Syntax: In line 1, column 13 before ' __bar_____'
Syntax error: unexpected 'INTO', expecting SELECT or VALUE or VALUES or '(' ' (err#0)
<- DESTROY(DBI::db=HASH(0xabeb9d0))= ( undef ) [1 items] at Builder.pm line 239

Can't call method "ident" without a package or object reference at t/lib/DBIEngineTest.pm line 91.

@theory
Owner
theory commented Jul 2, 2013

So it is throwing the proper error, that's good. Maybe it's a bug in DBD::Cubrid that HandleError does not work? Or maybe it does, but it turns it into a string or something?

@theory
Owner
theory commented Jul 2, 2013

Maybe stick some debugging code in that HandleError sub to see if it's reached. Something like

print STDERR "here I am!\n";
@stefansbv
Contributor

It's not called; looks like a bug to me.

@theory
Owner
theory commented Jul 3, 2013

It's not called; looks like a bug to me.

Okay, I suggest wrapping it in a SKIP block and only running the test if $class ne App::Sqitch::Engine::cubrid. You can update it to test on Cubrid if a new version is released with a fix and you either require that version (in which case just remove the SKIP block) or detect the version and only test HandleError on the fixed version.
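The suggested SKIP block might look roughly like this. The surrounding test harness (and how $class is supplied) is assumed, not shown; only the skip pattern itself is the point.

```perl
use strict;
use warnings;
use Test::More;

# In the real test file, $class is the engine class under test.
my $class = 'App::Sqitch::Engine::cubrid';

SKIP: {
    skip 'DBD::cubrid does not invoke HandleError', 1
        if $class eq 'App::Sqitch::Engine::cubrid';
    # The HandleError test would run here for other engines.
    pass 'Database error converted to Sqitch exception';
}

done_testing;
```

If a fixed driver release appears, the `if` condition can be extended to compare `$DBD::cubrid::VERSION` against the fixed version, as suggested above.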

@theory
Owner
theory commented Jul 3, 2013

BTW, I have now merged the mysql branch into master. You might want to rebase from there. Also, some of what I ended up having to do might be of interest; if so, check out the merge commit or individual commits starting from 75f2ad2 and ending with 7b4e087.

@theory
Owner
theory commented Jul 4, 2013

Crap, the quoting of changes makes Oracle crap out. The quotes force case-sensitivity, and unlike MySQL, Postgres, and SQLite, the default case internal to Oracle is uppercase. So "CHANGE" will work, but not "change".

This is starting to really chap my hide.

I think I need to add something to the Engine interface to specify reserved names. Blargh.

@stefansbv
Contributor

I suppose that a global rename from change to change_name is out of the question because of the backward compatibility problems?!... or maybe it is possible to make a plan to automate the change ;-)

@theory
Owner
theory commented Jul 4, 2013

Yeah. Besides, some other engine will be added that reserves some other term we use. Might as well generalize the solution now. Commit coming shortly.

@theory theory added a commit that referenced this issue Jul 4, 2013
@theory Add _quote_idents() to DBI Engine role.
The double-quoting of "changes" for the MySQL support broke the Oracle
support. This is because the double-quotes force case-sensitivity, and unlike
MySQL, SQLite, and Postgres, Oracle internally uses uppercase identifiers, not
lowercase. So it would have to be `"change"` for MySQL and `"CHANGE"` for
Oracle.

So instead add the `_quote_idents()` method. Pass it a list of identifiers and
it quotes any that are reserved. By default it quotes none, but the MySQL
engine will quote `changes`. Currently used only in the two places where MySQL
requires it. Its use is likely to be expanded in the future when other engines
require it such as CUBRID (issue #93).
dfcdbae
@theory
Owner
theory commented Jul 4, 2013

So you will probably need to add the use of that method to a few more places for the CUBRID support, @stefansbv.

@stefansbv
Contributor

Not a problem. Speaking of limitations, I first tried to make a Firebird engine, but there is a limit on the index size, so the project and uri columns of the projects table would be limited to VARCHAR(250). That was a big surprise for me, because I thought I knew Firebird... :-)

@theory
Owner
theory commented Jul 4, 2013

MySQL has similar limitations, which is why its columns are not all TEXT. Setting things up to be UTF-8, I made them as big as could be indexed, which is 255.

The only databases that don't make me nuts with shit like this are Postgres and SQLite.

@stefansbv
Contributor

Then there is still hope for a Firebird engine, if all goes well with CUBRID... :-)

@stefansbv
Contributor

What about the c."change" issue that I mentioned earlier? You have not taken it into account. A bug report would, at least, delay the implementation of the CUBRID engine indefinitely.

Is it necessary to override _dt in pg.pm and oracle.pm?

@theory
Owner
theory commented Jul 4, 2013

What about the c."change" issue that I mentioned earlier? You have not taken it into account. A bug report would, at least, delay the implementation of the CUBRID engine indefinitely.

Well, I think a bug report would be useful in any event. But for now, I think you will need to do something like this:

--- a/lib/App/Sqitch/Role/DBIEngine.pm
+++ b/lib/App/Sqitch/Role/DBIEngine.pm
@@ -45,7 +45,7 @@ sub _limit_default { undef }

 sub _simple_from { '' }

-sub _quote_idents { shift; @_ }
+sub _quote_idents { shift; wantarray ? @_ : $_[0] }

 sub _in_expr {
     my ($self, $vals) = @_;
@@ -80,35 +80,41 @@ sub latest_change_id {

 sub current_state {
     my ( $self, $project ) = @_;
-    my $cdtcol = sprintf $self->_ts2char_format, 'c.committed_at';
-    my $pdtcol = sprintf $self->_ts2char_format, 'c.planned_at';
-    my $tagcol = sprintf $self->_listagg_format, 't.tag';
+    my $cdtcol = sprintf $self->_ts2char_format, 'c.' . $self->_quote_idents('committed_at');
+    my $pdtcol = sprintf $self->_ts2char_format, 'c.' . $self->_quote_idents('planned_at');
+    my $tagcol = sprintf $self->_listagg_format, 't.' . $self->_quote_idents('tag');
     my $dbh    = $self->dbh;
+    my $cols   = 'c.' . join "\n             , c.", qw(
+        change_id
+        change
+        project
+        note
+        committer_name
+        committer_email
+        planner_name
+        planner_email
+    );
+    my $group  = 'c.' . join "\n             , c.", qw(
+        change_id
+        change
+        project
+        note
+        committer_name
+        committer_email
+        committed_at
+        planner_name
+        planner_email
+        planned_at
+    );
     my $state  = $dbh->selectrow_hashref(qq{
-        SELECT c.change_id
-             , c.change
-             , c.project
-             , c.note
-             , c.committer_name
-             , c.committer_email
+        SELECT $cols
              , $cdtcol AS committed_at
-             , c.planner_name
-             , c.planner_email
              , $pdtcol AS planned_at
              , $tagcol AS tags
           FROM changes   c
           LEFT JOIN tags t ON c.change_id = t.change_id
          WHERE c.project = ?
-         GROUP BY c.change_id
-             , c.change
-             , c.project
-             , c.note
-             , c.committer_name
-             , c.committer_email
-             , c.committed_at
-             , c.planner_name
-             , c.planner_email
-             , c.planned_at
+         GROUP BY $group
          ORDER BY c.committed_at DESC
          LIMIT 1
     }, undef, $project // $self->plan->project ) or return undef;

Kind of a PITA, I know, but the only way to make sure all engines behave in the future.

Is it necessary to override _dt in pg.pm and oracle.pm?

Only if you feel like you need it. Those are left from before the creation of the DBIEngine role, and used as a shorthand for converting values. If you find you are doing it a lot, go ahead and use it. It wasn't needed in the SQLite and MySQL libraries, because they override very few of the DBIEngine methods.

@stefansbv
Contributor

The sequence type has no support in the driver. From the POD:

Note that `DBD::cubrid` does not support BIT, SET, MULTISET and
SEQUENCE now.

Not having any supporting functions in the driver or in SQL to work with the SEQUENCE type makes it difficult to use it.

The input has to be a string in this format: "{'a','b','b'}", and the output is a string like this: "{a,b,b}". So we would need two different methods to format the input and the output; I think it is better to change to the STRING type for now.

@stefansbv
Contributor

Another small issue is a warning when the uri is undef, but the record is inserted:

ok 26 - Should have no registered projects
Use of uninitialized value in subroutine entry at ... linux-thread-multi/DBI.pm line 1675.
Use of uninitialized value in subroutine entry at ... App/Sqitch/Role/DBIEngine.pm line 324.
ok 27 - Register the project

Probably another bug in the driver? DBD::Pg doesn't throw any warnings for undef parameters.

@stefansbv
Contributor

Good news: down to the last 2 live test failures (not counting the skipped ones)...

not ok 181 - Should now have three current tags in reverse chron order

#   Failed test 'Should now have three current tags in reverse chron order'
#   at t/lib/DBIEngineTest.pm line 915.
#     Structures begin differing at:
#          $got->[0]{tag} = '@beta'
#     $expected->[0]{tag} = '@gamma'
ok 182 - Should have 9 events
ok 183 - The limit param to search_events should work
ok 184 - The offset param to search_events should work
not ok 185 - The limit and offset params to search_events should work together

#   Failed test 'The limit and offset params to search_events should work together'
#   at t/lib/DBIEngineTest.pm line 959.
#     Structures begin differing at:
#          $got->[3] = HASH(0xa76c8b8)
#     $expected->[3] = Does not exist
ok 186 - Should work to set direction "DESC" in search_events

But still there are many problems...

@theory
Owner
theory commented Jul 8, 2013

Probably another bug in the driver? DBD::Pg doesn't throw any warnings for undef parameters.

Yeah, the driver should detect undef and insert a NULL.

@theory
Owner
theory commented Jul 8, 2013

Failed test 'Should now have three current tags in reverse chron order'

Does CUBRID have sub-second precision in its datetime type? Looks like it does.

If that's not the issue, it probably has the same problem as MySQL, which is that the order in which strings are concatenated by the group_concat() aggregate is random. See 01c29a9 for how I worked around that issue in the test: resorting them in Perl code only if the engine is MySQL. You can just modify that code to also check for cubrid, I suspect.
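The Perl-side workaround amounts to normalizing the aggregated tag order on both sides before comparing. A minimal stand-alone illustration (the sample rows are made up; the real test compares hashrefs returned by search_events and friends):

```perl
use strict;
use warnings;

# group_concat()-style aggregates may emit tags in any order, so sort
# both the got and expected structures before comparing them.
my $got      = [ map +{ tag => $_ }, qw(@beta @gamma @alpha) ];
my $expected = [ map +{ tag => $_ }, qw(@alpha @beta @gamma) ];

@$got      = sort { $a->{tag} cmp $b->{tag} } @$got;
@$expected = sort { $a->{tag} cmp $b->{tag} } @$expected;

print join(',', map { $_->{tag} } @$got), "\n";   # now in a stable order
```

In the test suite this re-sort would be applied only when the engine class is one of the affected ones (MySQL, and presumably cubrid).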

@stefansbv
Contributor

The problem was that I did not understand what add_second_format is about, so it was not set; now the first failed test passes. One more to debug...

@stefansbv
Contributor

Not much progress since the last message; I have reported some issues and I'm waiting for the next release of the driver... maybe that will also magically fix the last failed test :)

Cannot bind SEQUENCE type value in Perl

HandleError subroutine is NOT called on ERROR

The Callback function is not invoked on "connect"

When bind value is "undef", NULL should be inserted with no warning

Setting the DBI "TraceLevel" attribute does not work

Another issue is with the csql utility. It doesn't have a quiet option and reports the time it took to complete the command. IPC::System::Simple doesn't seem to be able to capture the output on STDERR or STDOUT (or I don't know how to use it).

The code:

use strict;
use warnings;
use IPC::System::Simple qw(capturex $EXITVAL EXIT_ANY);

my $output = capturex(EXIT_ANY, "csql", "--version");

print "CAPTURE: >>$output<<\n";
print "Program exited with value $EXITVAL\n";

Prints:

csql (CUBRID utilities)
9.1
Copyright (C) 2008 Search Solution Corporation. All rights reserved by Search Solution.
CAPTURE: >><<
Program exited with value 0

Update: And the reason for not capturing that copyright string is that it is printed to STDERR!
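(One way to fold that STDERR banner into the captured stream is a 2>&1 redirection through the shell. A tiny stand-alone illustration, with perl itself standing in for csql so the snippet runs anywhere; whether merging STDERR is actually desirable here is a separate question.)

```perl
use strict;
use warnings;

# Stand-in for `csql --version`, which prints its banner to STDERR.
my $cmd    = qq{"$^X" -e "print STDERR qq(9.1\\n)"};
my $output = qx{$cmd 2>&1};    # merge STDERR into the captured stream
print "CAPTURE: >>$output<<\n";
```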

Any hints?

Regards,
Stefan

@theory
Owner
theory commented Jul 29, 2013

Not much progress since the last message; I have reported some issues and I'm waiting for the next release of the driver... maybe that will also magically fix the last failed test :)

Okay, I won't hold up the next release for it.

Any hints?

Use $self->sqitch->capture(qw(csql --version)) to capture STDOUT. I tend to leave STDERR alone, so that if there is an error, the error output is visible in the shell.

You probably want to do something like run_verify() as implemented for MySQL and PostgreSQL: It uses capture, but switches to run when verbosity is enabled.

@stefansbv
Contributor

There is a new release of the CUBRID driver, but the problems remain: the issues are marked as fixed, but there are no related changes in the code.

@theory
Owner
theory commented Sep 13, 2013

Well, that doesn't inspire much confidence, does it?

@stefansbv
Contributor

Good news: there is a new driver version, 9.2.0.0001, and it looks like the important issues are fixed:

  • HandleError subroutine is NOT called on ERROR;
  • The Callback function is not invoked on "connect";
  • When bind value is "undef", NULL should be inserted with no warning;

Maybe I can (with a little help) make it after all...

@theory
Owner
theory commented Oct 8, 2013

Awesome!

@stefansbv
Contributor

Bad news: the reason for the last failed test, 'The limit and offset params to search_events should work together', seems to be another bug in DBD::cubrid.

A short test script shows that LIMIT works OK without OFFSET, but the two fail to work together: a SELECT from the athlete table of the demodb test database returns 21 records with LIMIT=3 and OFFSET=2.

@theory
Owner
theory commented Aug 26, 2014

Any news on this, @stefansbv?

@theory theory removed this from the v1.0.0 milestone Aug 26, 2014
@stefansbv
Contributor

Unfortunately, no, it seems to be a bug in CCI, at a lower level than DBD::cubrid.

@theory
Owner
theory commented Aug 26, 2014

Oh, bummer. Sounds kind of serious. Are they not fixing it?

@stefansbv
Contributor

The last comment in http://jira.cubrid.org/browse/APIS-693 is:

The result of cci is not correct in my opinion,so the other drivers are not correct,too.

but I don't know if it was reported to the developers of CCI.

@theory
Owner
theory commented Aug 26, 2014

Maybe time to leave a nag message there? (Kinda like I'm doing here. ;-P)
