
Record and Record Array functionality #16

Closed
hsyed opened this issue Nov 28, 2013 · 30 comments

Comments

@hsyed
Contributor

hsyed commented Nov 28, 2013

Just an FYI,

I hacked together a way of doing records and arrays of records. I started working on it from this Stack Overflow question.

The adapted mappers are here.

It's based on regular expressions and pattern matching. Rather nasty. Hope you guys can come up with a nicer way of doing it and add it to the project <3. There should be a way of generating matchers and regular expressions from the case classes, but my Scala metaprogramming skills suxor :D I need records in hstore in the next few days. I'm dreading having to alter hstore, as the operator support looks tasty and I would want those features.

P.s., sorry if this is not the right place to put this. I don't know where feature requests go.

Regards

Hassan

@tminglei
Owner

yes, it's a good idea to add support for the pg composite type. i'll try to implement it.
but i want to take into account the related operations support, as well as composite type in/out.

i did some research these days, but need more time before i start coding.

will keep you posted on updates.

@hsyed
Contributor Author

hsyed commented Nov 29, 2013

Cool :D thanks. I've been looking into it as well, doing it properly that is. I'm already having problems with regular expressions when it comes to parsing strings.

I think we need a lifted representation of a record, and to use that description to parse and generate records. What do you think?

@tminglei
Owner

I've started coding for it.
it won't have a lifted representation at first, but it will be convenient to use.

@hsyed
Contributor Author

hsyed commented Nov 30, 2013

Aah, you're around. I started looking into it more last night. Have you found a way to parse the array / composite value strings? I couldn't find any Java code that does it in the postgres JDBC libraries; I could only find a basic tokenizer that doesn't seem to do much. Are you going to write your own parser?

@tminglei
Owner

yes, maybe

@hsyed
Contributor Author

hsyed commented Nov 30, 2013

This guy goes into some detail on the issues of parsing postgres custom types.

Are you going to allow nested arrays / composites? I think that should definitely be covered, as I was doing an hstore of
hstore => Array[ Composite[ (Int, Array(Composite[....])]] .

I was making use of Python's psycopg2 in my work; it dealt very well with composite types.

I have some experience with writing parsers/tokenisers. Would you like me to look into writing a tokeniser for the composite type using the Scala parser combinator library? If you have it under control I'll get out of your way <3.


On a separate note, I'm currently still writing postgres DDL. If we had arrays and composites along with all the stuff you've already implemented, I could do all the DDL off of the lifted representation. I mention this because in that case we would need a lifted representation of the composite type.

All we need is something like this:

case class ForumQuote(index:       Int,
                      startOffset: Int,
                      endOffset:   Int,
                      isDirect:    Boolean,
                      quotedId:    Int)

object ForumQuoteRecord extends CompositeType[ForumQuote]("forum_quote") {
  def index = column[Int]("index")
  def startOffset ...
  def endOffset   ...
  def isDirect    ...
  def quotedId    ...
  def Type = index ~ startOffset ~ endOffset ~ isDirect ~ quotedId
}

@tminglei
Owner

yes, it would be good help. please go ahead

my basic approach is,

  1. parse the composite array's string into a list of composite strings, and then parse each composite string into a list of field strings
  2. for each composite's string list, convert it to an instance of the case class corresponding to the composite

for the lifted representation, yes, we need it if we want to add ddl support.
thanks for your help
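The two steps above can be sketched with a depth-aware splitter. This is only an illustration, not slick-pg code (the name `CompositeSplit` is made up), and it deliberately ignores the quoting/escaping that the rest of this thread wrestles with:

```scala
object CompositeSplit {
  // Split a '{...}' or '(...)' literal into its top-level parts by tracking
  // nesting depth; commas inside nested elements are not split points.
  // Note: quoting/escaping is ignored entirely in this sketch.
  def splitTopLevel(s: String): List[String] = {
    val inner = s.substring(1, s.length - 1)  // drop the outer delimiters
    val parts = scala.collection.mutable.ListBuffer[String]()
    val cur = new StringBuilder
    var depth = 0
    for (c <- inner) c match {
      case '(' | '{'         => depth += 1; cur += c
      case ')' | '}'         => depth -= 1; cur += c
      case ',' if depth == 0 => parts += cur.toString; cur.clear()
      case _                 => cur += c
    }
    parts += cur.toString
    parts.toList
  }
}
```

Step 1 is then `splitTopLevel("{(1,2),(3,4)}")`, which gives the composite strings `(1,2)` and `(3,4)`, and applying it again to `(1,2)` gives the field strings.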

@hsyed
Contributor Author

hsyed commented Nov 30, 2013

Ok, I will start by writing a tokeniser as follows.

I will make a list of test strings covering:

  1. All the primitive types.
  2. Arrays
  3. Composite types
  4. Nested arrays / composite types.

I will then write a level 1 tokeniser for this, which puts the tokens onto a stack.

I guess a level 2 tokeniser should then reduce the stack into level 2 tokens: case classes ValueToken, ArrayToken and CompositeToken.

From here we should be able to map the data.

We can add other level 1 tokens and reduction cases later on for new types.

@tminglei
Owner

ok, thanks again.

@hsyed
Contributor Author

hsyed commented Dec 1, 2013

I've implemented integer / decimal nested array / record parsing so far; strings are next.

object Tokenizer extends JavaTokenParsers {
  trait Element
  case class ParElement(value: List[Element]) extends Element
  case class CurElement(value: List[Element]) extends Element
  trait Value extends Element
  case class WholeInteger(value: String) extends Value
  case class Decimal(value: String) extends Value

  def signedIntegral = "[+-]?\\d+".r ^^ { WholeInteger(_) }
  def decimal = "[+-]?\\d+\\.\\d+".r ^^ { Decimal(_) }

  // try decimal first, otherwise the integral parser matches its prefix
  def numeric: Parser[Value] = decimal | signedIntegral
  def element: Parser[Element] = numeric | parElement | curElement

  def parElement = '{' ~> repsep(element, ",") <~ '}' ^^ { ParElement(_) }
  def curElement = '(' ~> repsep(element, ",") <~ ')' ^^ { CurElement(_) }

  def tokenize: Parser[List[Element]] = rep(parElement)

  def apply(input: String) =
    parseAll(tokenize, new scala.util.parsing.input.CharArrayReader(input.toCharArray))
}

@tminglei
Owner

tminglei commented Dec 1, 2013

cool

@hsyed
Contributor Author

hsyed commented Dec 1, 2013

It seems nested records/arrays have a quotation format depending on depth.

Level 1 is unquoted
Level 2 uses "
Level 3 uses ""
Level 4 uses """"

Composite type "public.rec"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | integer |

Composite type "public.rec2"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | rec[]   |

    Table "public.test5"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | rec2[]  |

val |            val2
-----+-----------------------------
   1 | {"(1,\"{\"\"(1,2)\"\"}\")"}
   1 | {"(1,\"{\"\"(1,2)\"\"}\")"}

insert into test5 values(1,Array[Row(1,Array[Row(1,2)::rec])::rec2]);

Ouch, and it gets crazier for strings

Composite type "public.rec"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | text    |

Composite type "public.rec2"
 Column | Type  | Modifiers
--------+-------+-----------
 val    | text  |
 val2   | rec[] |

     Table "public.test"
 Column |  Type  | Modifiers
--------+--------+-----------
 val    | rec2[] |

                      val
-----------------------------------------------------------------
 {"(\"ni hao\",\"{\"\"(1,\\\\\"\"domo arigato\\\\\"\")\"\"}\")"}
(1 row)

I just figured monads out a few days ago (sort of).

I think I need to change the parser monad to track the depth.


Composite type "public.rec3"
 Column | Type | Modifiers
--------+------+-----------
 val    | rec2 |

Composite type "public.rec4"
 Column | Type | Modifiers
--------+------+-----------
 val    | rec3 |

   Table "public.test8"
 Column | Type | Modifiers
--------+------+-----------
 val    | rec4 |
 ("(""(""""domo arigato"""",""""{""""""""(1,\\\\\\\\""""""""ni hao\\\\\\\\"""""""")""""""""}"""")"")") 

omg, the quotes and backslashes grow as a power of 2 (2^n) depending on depth.


Ok I have figured out a way of dealing with it :D
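To make that growth concrete, here is a tiny sketch (not slick-pg code; `quoteOnce`/`quoteDeep` are made-up names) of what repeated pg-style quote doubling does to a value. Each embedding level wraps the string in quotes and doubles the quotes already inside, so quote runs explode exponentially with depth. Real pg output additionally interleaves backslash escapes in array context, so the actual strings above differ in detail:

```scala
object PgQuoting {
  // One application of pg composite quoting: wrap in quotes and double
  // any quotes already inside the value.
  def quoteOnce(s: String): String = "\"" + s.replace("\"", "\"\"") + "\""

  // Embed a value n levels deep by applying the doubling n times.
  def quoteDeep(s: String, n: Int): String =
    (1 to n).foldLeft(s)((acc, _) => quoteOnce(acc))
}
```

`quoteDeep("x", 1)` is `"x"`, `quoteDeep("x", 2)` is `"""x"""`, and at depth n the leading quote run has length 2^n - 1: the exponential blow-up observed above.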

@tminglei
Owner

tminglei commented Dec 2, 2013

Hi @hsyed, I submitted some draft code to the new branch struct. The major changes are in utils.JdbcTypeHelper.

It will be enhanced and refactored.

@hsyed
Contributor Author

hsyed commented Dec 2, 2013

Cool. I made it to a parser that handled almost everything, but I couldn't get past the quotation problem. I decided to write a tokeniser first. The output is as follows:

(" hassan","("" ali"",""{""""(\\\\"""" ali\\\\"""")""""}"")")
List(ArrOpen(,-1), Marker(",0), Chunk(hassan), Marker(",0), Comma(), ArrOpen(",0), Marker("",1), Chunk(ali), Marker("",1), Comma(), RecOpen("",1), ArrOpen("""",2), Marker(\\\\"""",3), Chunk(ali), Marker(\\\\"""",3), ArrClose("""",2), RecClose("",1), ArrClose(",0), ArrClose(,-1))
true
{"(1,\"{\"\"(2,\\\\\"\"{\\\\\"\"\\\\\"\"(3,3)\\\\\"\"\\\\\"\"}\\\\\"\")\"\"}\")"}
List(RecOpen(,-1), ArrOpen(",0), Chunk(1), Comma(), RecOpen(\",1), ArrOpen(\"\",2), Chunk(2), Comma(), RecOpen(\\\\\"\",3), ArrOpen(\\\\\"\"\\\\\"\",4), Chunk(3), Comma(), Chunk(3), ArrClose(\\\\\"\"\\\\\"\",4), RecClose(\\\\\"\",3), ArrClose(\"\",2), RecClose(\",1), ArrClose(",0), RecClose(,-1))
object Tokenizer extends RegexParsers {
  trait Token
  case class Comma() extends Token
  case class Chunk(value: String) extends Token

  // note: pg uses '{' for arrays and '(' for records; the Arr/Rec names
  // below are swapped relative to that, but are used consistently
  case class ArrOpen(marker: String, l: Int) extends Token
  case class RecOpen(marker: String, l: Int) extends Token
  case class ArrClose(marker: String, l: Int) extends Token
  case class RecClose(marker: String, l: Int) extends Token
  case class Marker(m: String, l: Int) extends Token

  var level = -1

  def marker = "[\\\\\"]+".r
  def open: Parser[Token] = opt(marker) ~ (elem('{') | elem('(')) ^^ { case x ~ y =>
    val r = y match {
      case '{' => RecOpen(x.getOrElse(""), level)
      case '(' => ArrOpen(x.getOrElse(""), level)
    }
    level += 1; r
  }
  def close: Parser[Token] = (elem('}') | elem(')')) ~ opt(marker) ^^ { case x ~ y =>
    level -= 1
    x match {
      case '}' => RecClose(y.getOrElse(""), level)
      case ')' => ArrClose(y.getOrElse(""), level)
    }
  }
  def comma = elem(',') ^^ { _ => Comma() }
  def chunk = """[^}){(\\,"]+""".r ^^ { Chunk(_) }
  def tokens = open | close | marker ^^ { Marker(_, level) } | comma | chunk
  def tokenise = rep(tokens)

  def apply(input: String) = {
    level = -1
    parseAll(tokenise, new scala.util.parsing.input.CharArrayReader(input.toCharArray))
  }
}

I will rewrite this list and keep only those markers that signify opening and closing strings. I will further reduce those quotation markers to a single ", or a double for strings that have '' in them. Then I'll run it through a parser.

Sigh, it seems I haven't catered for escapes. But the good thing is escapes have a regular pattern, which you can verify with:

select Row(Array[Row(Array[Row('hello \n "" world')])]); 

Ok, quotations and escapes are tokenising correctly :D

("{""(\\""{\\""\\""(\\\\\\\\\\""\\""hello \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n \\\\\\\\\\""\\""\\\\\\\\\\""\\""\\\\\\\\\\""\\""\\\\\\\\\\""\\"" world\\\\\\\\\\""\\"")\\""\\""}\\"")""}")
List(ArrOpen(,-1), RecOpen(",0), ArrOpen("",1), RecOpen(\\"",2), ArrOpen(\\""\\"",3), Quote("), Chunk(hello ), Escape(\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\,4,\n), Quote(""""), Chunk(world), Quote("), ArrClose(\\""\\"",3), RecClose(\\"",2), ArrClose("",1), RecClose(",0), ArrClose(,-1))

@hsyed
Contributor Author

hsyed commented Dec 2, 2013

Do you have any advice on how you would like me to proceed? The output above is usable, but I could parse it further.

The parser I was working on is here, so have a look at it and tell me what you want.

I think you probably want the arrays and records grouped, so:

trait Element
case class ParElement(value: List[Element]) extends Element
case class CurElement(value: List[Element]) extends Element

Do you want me to parse what I can when it comes to the value types? I think you will have to check the value types anyway, so there isn't much value in me parsing beyond just tokenising them, and it is probably better to do that outside of the parser. This is the value-type parsing I got to before.

trait Value extends Element
case class WholeInteger(value: String) extends Value
case class Decimal(value: String) extends Value
case class QuotedValue(value: String) extends Value

If you don't need any further parsing, I'll probably write a simple function for grouping things together.

@tminglei
Owner

tminglei commented Dec 2, 2013

hi @hsyed, i think you've already done enough, and no more parsing will be needed.
based on your work, i can simplify the implementation of my part. thanks so much^^

btw, for value type, i think case class Value(value: String) extends Element will be enough, because, according to the type info provided by reflection, it can find the right converter to convert the string to the expected value.

Can you send me a pull request? to branch slick2 and/or slick1; it's up to you whether you want it implemented for the slick2 and/or slick1 version of slick-pg.

Thanks again!

@hsyed
Contributor Author

hsyed commented Dec 2, 2013

Heya, ok, I finished the tokeniser and merger. This is what it does now:

("{""(\\""{\\""\\""(\\\\\\\\\\""\\""hello \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\n \\\\\\\\\\""\\""\\\\\\\\\\""\\""\\\\\\\\\\""\\""\\\\\\\\\\""\\"" world\\\\\\\\\\""\\"")\\""\\""}\\"")""}")
CTArray(List(ArrOpen(,-1), CTRecord(List(RecOpen(",0), CTArray(List(ArrOpen("",1), CTRecord(List(RecOpen(\\"",2), CTArray(List(ArrOpen(\\""\\"",3), CTString(List(SingleQuote(), Chunk(hello ), Escape(4,\n), Quote(""""), Chunk(world), SingleQuote())), ArrClose(\\""\\"",3))), RecClose(\\"",2))), ArrClose("",1))), RecClose(",0))), ArrClose(,-1)))

This is my first piece of open source work :D

The next thing we need is asynchronous upsert operations supported at the ORM level. That + hstore/gist/gin + records = nosql in postgres :D Postgres doesn't seem to have merge support, so the asynchronous element would have to be done in an actor, I guess.

@tminglei
Owner

tminglei commented Dec 3, 2013

hi @hsyed, great job. and i did some refactoring according to the parsed result structure, as below:

  //TODO analogous to @hsyed's implementation, should be deleted later
  sealed trait Element
  case class ValueE(value: String) extends Element
  case class ArrayE(elements: List[Element]) extends Element
  case class CompositeE(members: List[Element]) extends Element

here is the latest code, and you can see some simple tests at the bottom.

Will do more refactoring when applying it to slick-pg.
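For illustration, once a string has been parsed into this Element tree, the second step (Element => case class) can be as simple as a pattern match per composite type. A hedged sketch, with a made-up `Rec` target type (real code would pick converters from reflection/type info, as discussed above):

```scala
sealed trait Element
case class ValueE(value: String) extends Element
case class ArrayE(elements: List[Element]) extends Element
case class CompositeE(members: List[Element]) extends Element

// Hypothetical target type, not from slick-pg
case class Rec(a: Int, b: Int)

object ElementToRec {
  // Convert a parsed CompositeE into a Rec, field by field.
  def toRec(e: Element): Rec = e match {
    case CompositeE(List(ValueE(a), ValueE(b))) => Rec(a.toInt, b.toInt)
    case other => sys.error(s"unexpected element shape: $other")
  }
}
```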

@hsyed
Contributor Author

hsyed commented Dec 3, 2013

See comment in pull request :D

@tminglei
Owner

tminglei commented Dec 3, 2013

hi @hsyed, i merged your changes into the Slick1 branch.
Thanks so much for your great work^^

@tminglei
Owner

tminglei commented Dec 4, 2013

hi @hsyed, you'd better send a pull request to branch slick2, too. then it will be merged into branch master when i merge changes from slick2 to master, and your name will appear in the contributors.

I'd like your name to appear in the contributors ^^

@tminglei
Owner

tminglei commented Dec 5, 2013

hi @hsyed, we also need a generator from Composite/Array to pg string.

I already implemented the function which consumes a Composite/Array object and produces a PGElements.Element, pls check the code here.
are you interested in implementing the function which consumes a PGElements.Element and produces a pg string?

BTW, i did some refactoring of your code, to unify the code and make it convenient for using PGElements. you can check the latest code here.

again, thanks so much for your nice contribution^^

@hsyed
Contributor Author

hsyed commented Dec 5, 2013

I just made two important functions tail recursive. Performance should be good now.


Sure, what do you have in mind? The "Array[Row()]" style of doing it? I didn't get a chance to make sense of the way postgres interleaves quotes and backslashes. Array[Row()] is probably not bad anyway, considering at nesting level 4 you're already doing 2^3 interleaved \ or ".

@tminglei
Owner

tminglei commented Dec 5, 2013

thanks for your enhancement^^


well, I already implemented: CompositeType => PGElements.Element and List[T] => PGElements.Element.

so I want a PGElements.Element => pg string implementation, then i can combine them and get CompositeType => pg string and List[T] => pg string.

(ps: we needn't care about pg string => CompositeType and pg string => List[T], because we already get them by combining pg string => PGElements.Element with PGElements.Element => CompositeType / PGElements.Element => List[T])
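A minimal sketch of the requested PGElements.Element => pg string direction, assuming the ValueE/ArrayE/CompositeE shape from earlier in the thread. `PgRender` is a made-up name, and it handles only one level of quoting, not the deep-nesting escaping discussed above:

```scala
sealed trait Element
case class ValueE(value: String) extends Element
case class ArrayE(elements: List[Element]) extends Element
case class CompositeE(members: List[Element]) extends Element

object PgRender {
  // Quote a value per pg composite rules: wrap in quotes, double inner quotes.
  private def quote(s: String): String = "\"" + s.replace("\"", "\"\"") + "\""

  def render(e: Element): String = e match {
    case ValueE(v)      => v
    case ArrayE(es)     => es.map {
      case c: CompositeE => quote(render(c))  // records inside arrays get quoted
      case other         => render(other)
    }.mkString("{", ",", "}")
    case CompositeE(ms) => ms.map(render).mkString("(", ",", ")")
  }
}
```

For one level of nesting this matches the pg literals shown above, e.g. an array of one record renders as `{"(1,2)"}`.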

@hsyed
Contributor Author

hsyed commented Dec 5, 2013

Ok, this will not be a problem with the Array and Row notation. Although I seem to remember issues with JDBC in the past when it came to composite data and Array and Row. (I can't remember, but I might have had to do a Row()::row_type.)

Could you do the following for me:

  1. Test whether postgres will accept Array[ and Row( formatting through JDBC, for 3-4 levels.
  2. Write a test function with some data to call the function.
  3. Tell me where you want the function

I ask for 1 because I don't have much experience with JDBC, and I ask for 2 because I was developing the parser code in a messy project with messy tests. It would be easier if I open the slick-pg project in the IDE directly, and I don't have any experience of how to write unit tests with Scala :D


My hack for inserting arrays of composites, which started this feature request, produces the following SQL. As you can see, this doesn't use Array or Row notation. But it's only 1 level deep. If you want to combine pg strings at nesting levels > 2 without Array and Row, this will be a nightmare.

println(TestTable.insertStatementFor(List(ForumQuote(1,2,3,false,4),ForumQuote(1,2,3,false,4)),List(ForumLink(1,2,"hassan"))))
INSERT INTO "test_table" ("fq","fl") select {"(1, 2, 3, false,4)","(1, 2, 3, false,4)"}, {"(1,2,hassan)"}
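Sticking with the Array[Row()] notation would sidestep the escaping entirely. A hypothetical sketch (the names `row` and `array` are made up, not slick-pg API) of how such literals could be assembled:

```scala
object RowNotation {
  // Build a ROW(...) literal cast to the given composite type name.
  def row(typeName: String, fields: Seq[String]): String =
    fields.mkString("ROW(", ",", s")::$typeName")

  // Build an ARRAY[...] literal from already-rendered elements.
  def array(elems: Seq[String]): String =
    elems.mkString("ARRAY[", ",", "]")
}
```

Nesting the calls reproduces literals in the shape of the earlier insert statement, e.g. `array(Seq(row("rec2", Seq("1", array(Seq(row("rec", Seq("1", "2"))))))))` gives `ARRAY[ROW(1,ARRAY[ROW(1,2)::rec])::rec2]`, with no quote doubling at any depth.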

@tminglei
Owner

tminglei commented Dec 5, 2013

Ok, I'll do them in the morning (ps: it's midnight in my timezone now :D), and give you updates.
Thanks

@hsyed
Contributor Author

hsyed commented Dec 5, 2013

Ok, cool :D Are you Chinese? My wife is from China and is in Hong Kong on business at the moment :D

@tminglei
Owner

tminglei commented Dec 5, 2013

yes, I lived in Xiamen, China, same timezone with Hong Kong :D

@hsyed
Contributor Author

hsyed commented Dec 5, 2013

What a coincidence. My wife's family is from Xiamen; her parents live in Fuzhou though. I spent Chinese New Year in Xiamen a few years ago :D

@tminglei
Owner

tminglei commented Dec 6, 2013

hi @hsyed, so nice to meet you :D


btw, to prevent this thread becoming too long, i filed a new issue #23 to continue
