-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support unsigned integer #618
Support unsigned integer #618
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks great so far.
@@ -146,6 +146,23 @@ object EOF extends Decoder[EOF] { | |||
case class EOF(warnings: Short, serverStatus: ServerStatus) extends Result | |||
|
|||
/** | |||
* These bit masks are to understand whether corresponding attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be styled like:
/**
*
*/
case Type.Tiny => ByteValue(str.toByte) | ||
case Type.Short => ShortValue(str.toShort) | ||
case Type.Int24 => IntValue(str.toInt) | ||
case Type.Long if field.isUnsigned() => LongValue(str.toLong) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't work for longs bigger than 2^63, right? should we add a field to LongValue that we're encoding an unsigned long in a signed long? or add a new type, UnsignedLong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get. Longs bigger than 2^63 has Type.LongLong instead of Type.Long. Should we use isUnsigned check for Type.LongLongs too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, unsigned longs have the same number of bits as signed longs, they're just encoded differently, so we could conceivably encode them as signed longs in scala. it's just a design decision. I don't think it would make sense to make them LongLongs, since we know it can't be most of the numbers in LongLongs. yeah, it would be nice to use isUnsigned for all of the types that could be unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok then, just to clarify. I am using isUnsigned for Type.LongLong too and add another test case for that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly. suppose mysql wants to return an unsigned long of 9223372036854775808 (as we talked about below). "9223372036854775808".toLong fails. How do we do this properly? Note that mysql will consider it a long, not a longlong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and if the data type of the column in MySQL, is unsigned long then the type of the field in finagle side is Type.LongLong.
BTW by saying
The column should be able to be unsigned long
You mean BIGINT UNSIGNED, right. Is there a data type long n MySQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there's a Long type, that's the field that we match on in this code. although reading more, I think it's actually 4 bytes, not 8 bytes. but I'm really starting to think that all of the numeric values that we get out of mysql, we should annotate whether they're unsigned or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too because the unsigned logic applies to all numeric values.
Think about smallint if it is unsigned it can go until 65535 in MySQL side, but we are converting it to Short which has 32767 as a MaxValue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mosesn, should I write and send the new version that includes all numerical fields with isUnsigned check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me!
@@ -111,6 +112,13 @@ class NumericTypeTest extends FunSuite with IntegrationClient { | |||
case v => fail("expected a RawValue but got %s".format(v)) | |||
} | |||
} | |||
|
|||
test("extract %s from %s".format("unsigned", rowType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a test for a long that's bigger than what scala can fit into a long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sth like: 9223372036854775808 (BigInt(Long.MaxValue) + 1)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be perfect =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the delay, I was sick over the weekend. I'm not sure how much I love this API, but since unsigned stuff is probably not the common case (and is difficult to represent in scala), and we can always change it if we have to, it's probably OK. let me get some more eyes on this but I think this should be good to go modulo a few small changes.
case Type.Float => FloatValue(str.toFloat) | ||
case Type.Double => DoubleValue(str.toDouble) | ||
case Type.Year => ShortValue(str.toShort) | ||
case Type.Tiny if field.isSigned() => ByteValue(str.toByte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment as to why we're doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that you're talking about isSigned logic, right? If it is, should I write comments for BinaryEncodedRow too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be rad, but I think if you add the helper method you described, that will document it well enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That helper method might need implicit parameters, is that OK? Some project might have rules likes "do not use implicits" so I want to ask.
case Type.Tiny => ShortValue(str.toShort) | ||
case Type.Short if field.isSigned() => ShortValue(str.toShort) | ||
case Type.Short => IntValue(str.toInt) | ||
case Type.Int24 if field.isSigned() => IntValue(str.toInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can merge ones where it doesn't matter if it's signed or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I thought it might be good to see all cases in discrete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like them explicit too, tbh.
Hi @mosesn, get well soon. Yeah, I do not like checking isSigned in each case too. I though maybe adding this kind of function might help but couldn't be sure:
so that isUnsigned check would be only in that function. |
@mehmetgunturkun I like that, |
Cool, let me change this code into that. Awesome! |
|
||
private val unsignedLongMaxValue: BigInt = BigInt("18446744073709551615") | ||
|
||
def readUnsignedLongLE(): BigInt = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these seem generally useful. Maybe we should move them under ByteReader
, where we have O(1) get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it is under twitter/util repo, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for patch!
/** | ||
* Retrieves string of the given code. | ||
*/ | ||
private[mysql] def getCodeString(code: Short): String = code match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added for debugging actually but left as is, it might be helpful.
case Type.Tiny => ShortValue(str.toShort) | ||
case Type.Short if field.isSigned() => ShortValue(str.toShort) | ||
case Type.Short => IntValue(str.toInt) | ||
case Type.Int24 if field.isSigned() => IntValue(str.toInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like them explicit too, tbh.
Hi @mosesn I tried to do that with the helper function I mentioned It gets too complicated due to the implicit classes, so I guess this is a more clear way to do that, but if you want to see that version I can create another branch for you to review. @vkostyukov I added this part of the code to ByteReader, but how should I merge that to this? One idea is to wait for twitter/util to release a new version then upgrade its version in twitter/finagle and remove the duplicated part. |
@mehmetgunturkun no need for a full review, but do you mind posting your work in a gist? I'm a bit curious. |
Hi @mosesn here is the link for gist: https://gist.github.com/mehmetgunturkun/e3f7974819a29bc6cd048eb4366707cc Check |
@mehmetgunturkun ah, I see what you mean. LGTM then. |
@mosesn What about the merge with twitter/util? Are we going to wait for twitter/util to publish changes in ByteReader? |
@mehmetgunturkun FYI, I've put up that change for review internally. It required some additional changes to finagle/finagle-netty4 but should be (hopefully) shipped later today. |
@mehmetgunturkun this was merged in util twitter/util@94ab01f, so you should be able to take advantage of it now. thanks for your patience! |
Hi @mosesn Should I wait for new release of twitter/util (probably 6.44.0) to move on to this issue, or is there a quicker way to use develop branch of twitter/util? |
@@ -207,6 +223,9 @@ case class Field( | |||
) extends Result { | |||
def id: String = if (name.isEmpty) origName else name | |||
override val toString = "Field(%s)".format(id) | |||
|
|||
def isUnsigned(): Boolean = (flags & FieldAttributes.UnsignedBitMask) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some style nits for ya that I missed earlier:
We typically leave parens off of these types of methods (both defining and calling) unless they mutate in some way. Since this doesn't, please omit here and in isSigned
@mehmetgunturkun you don't have to wait for the new release. the instructions in CONTRIBUTING are for using the tip of util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@ryanoneill Was the change, you mentioned, related to CopyingByteBufByteReader.scala? |
@mehmetgunturkun yes, via |
Codecov Report
@@ Coverage Diff @@
## develop #618 +/- ##
===========================================
- Coverage 68.2% 68.03% -0.18%
===========================================
Files 634 634
Lines 21101 21161 +60
Branches 1567 1578 +11
===========================================
+ Hits 14392 14396 +4
- Misses 6709 6765 +56
Continue to review full report at Codecov.
|
@mosesn I guess I broke something. I just merged develop -> Support_Unsigned_Integer and removed these unnecessary functions, but coverage is decreased. I did sth. wrong? |
@mehmetgunturkun, I have some bad news: we have a few problems. First one is simple: we need a Second one is not so simple, but not too hard. With your change, most int // Will result in a MatchError
// terrible idea, but it happens and only fails at runtime. :(
val LongValue(foo) = row("foo").get A workaround for this is to make consideration of signedness configurable with a default of off. In my minds eye you just need to add a flag to the Row implementations and in the match statements do something like this: field.fieldType match {
...
case Type.Long if considerSign && field.isSigned =>
... What do you think? |
@bryce-anderson, I got it, I'm adding that flag. For CanbeParameter what should be the typeCode for BigIntValue, is Type.LongLong ok: |
@mehmetgunturkun I think |
Hi @bryce-anderson, sorry for long delay. I have added considerSign flag to Row and take as a parameter in constructors of BinaryEncodedRow and StringEncodedRow, but I am not sure who is going to set this flags. BTW, if it is not clear, I can push the current version for you to check. |
@mehmetgunturkun, it would be great to see the code. Ultimately this will probably be something that gets set during client construction with a stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach works for me. Thanks!
@@ -20,6 +20,14 @@ trait Row { | |||
/** The values for this Row. */ | |||
val values: IndexedSeq[Value] | |||
|
|||
/**/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an unfinished comment.
Fun side note, this broke github's syntax highlighting
@@ -70,16 +78,17 @@ class StringEncodedRow(rawRow: Buf, val fields: IndexedSeq[Field], indexMap: Map | |||
RawValue(field.fieldType, field.charset, false, bytes) | |||
else { | |||
val str = new String(bytes, Charset(charset)) | |||
println(field, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray print
val signedTextEncodedQuery = """SELECT `tinyint`, `smallint`, `mediumint`, `int`, `bigint`, `float`, `double`,`decimal`, `bit` FROM `numeric` """ | ||
runTest(c, signedTextEncodedQuery)(testRow) | ||
|
||
// TODO Comment out after ignoreUnsigned = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean remove TODO part (Line 51) or remove comments for unsigned test cases?
new CanBeParameter[BigInt] { | ||
def sizeOf(param: BigInt) = 8 | ||
def typeCode(param: BigInt) = Type.LongLong | ||
def write(writer: MysqlBufWriter, param: BigInt) = writer.writeBytes(param.toByteArray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert that the BigInt being written can indeed fit in 8 bytes. (Unless this is verified somewhere else that I can't see)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting checking the size of the array, to be sure that it is shorter than or equal to 8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since (AFAICT) BigInt could store values that would be larger than 8 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it is larger than 8, take first 8 bytes or throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think throwing is the way to go. I'd rather have that then write invalid data.
@bryce-anderson what say you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @nepthar: I think a throw
is fine in this case, although I don't know where that exception will surface off hand. Also, I think you need to make sure you write exactly 8 bytes and right now you will write less in the majority of cases:
scala> BigInt(10).toByteArray
res1: Array[Byte] = Array(10) // only one byte!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can create an array of bytes in size of 8 and fill it in little endian order, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that or just pad with 0's via .writeByte(0x0)
.
@@ -20,7 +20,7 @@ trait Row { | |||
/** The values for this Row. */ | |||
val values: IndexedSeq[Value] | |||
|
|||
/**/ | |||
/** The value is to consider unsigned flag of field or not */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha awesome commit message :)
Sorry, I didn't mean to suggest that a comment was necessary here, only that there was a stray "/**/"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I miswrote that, message should have been "a sane comment" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, I think we've got all of the basis covered one you switch to a named exception. It'll help stack traces make sense.
val lengthOfByteArray: Int = byteArray.length | ||
|
||
if (lengthOfByteArray > 8) { | ||
throw new Exception(s"The length of BigInt is larger than 8 bytes: $lengthOfByteArray > 8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create a new named exception for this (can define it somehwere in the file here). I suggest something like:
class BigIntTooLongException(size: Int) extends Exception(s"BigInt is stored as an unsigned long and thus cannot be grater than 8 bytes. Size: $size"
@bryce-anderson I am trying to merge develop branch into Support_Unsigned_Integer. However, I cannot pass all tests in develop branch. NumericTypeTest.scala fails and when I change |
@mehmetgunturkun hard to say for sure without more details, but I don't understand why you would be changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is what you had concerns with regarding merging master, so maybe its already addressed.
case Type.Tiny => ShortValue(reader.readUnsignedByte()) | ||
case Type.Short if isSigned(field) => ShortValue(reader.readShortLE()) | ||
case Type.Short => IntValue(reader.readUnsignedShortLE()) | ||
case Type.Int24 if isSigned(field) => IntValue(reader.readIntLE()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type.Int24
and TypeInt24 if signed
need to call reader.readMediumLE()
and reader.readUnsignedMediumLE()
, respectively.
# Conflicts: # finagle-mysql/src/main/scala/com/twitter/finagle/mysql/Row.scala
@bryce-anderson merged |
} | ||
|
||
for (i <- 0 until lengthOfByteArray) { | ||
writer.writeByte(byteArray(i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to reverse the order of the bytes we read out of byteArray
: the BigInt.toByteArray
method returns the bytes in big-endian order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry I should have checked that - sending the fix.
val signedTextEncodedQuery = """SELECT `tinyint`, `smallint`, `mediumint`, `int`, `bigint`, `float`, `double`,`decimal`, `bit` FROM `numeric` """ | ||
runTest(c, signedTextEncodedQuery)(testRow) | ||
|
||
// TODO Comment out after ignoreUnsigned = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ready to be uncommented now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, not sure how to set ignoreUnsigned
flag for StringEncodedRow and BinaryEncodedRow.
class StringEncodedRow(rawRow: Buf, val fields: IndexedSeq[Field], indexMap: Map[String, Int], val ignoreUnsigned: Boolean = true) extends Row
class BinaryEncodedRow(rawRow: Buf, val fields: IndexedSeq[Field], indexMap: Map[String, Int], val ignoreUnsigned: Boolean = true)
As you see I set it to true
in the constructor, how to set it to false explicitly, especially in a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're going to need to make an way to configure the Client
to operate in 'support unsigned' mode. This will entail making a Stack.Param
and threading the boolean all the way through. Which will be a bit of a pain, but it is what it is. I'd remove the default argument for the Row constructors since you're always going to have to pass it up from a higher level anyway.
Stack.Params can be a little convoluted at times (all times, really), so look at the HTTP implementation for an example here. The basic idea is to put typed 'params' (Stack.Param[T]
) into a map where the stored items are typed based on the key (the T
) and all params have a default.
It can then be configured using the pattern:
client.configured(SupportUnsignedInts(true))
.whatever(..)
.newService(..)
@mehmetgunturkun, hows it going on this? |
@mehmetgunturkun this has been merged internally and we're just waiting for the github sync process to publish it. Good news is that it should also make it out into pending finagle release. |
Summary: This reverts commit ba1438af3971129c7e6c80080efdf669a7fde4e3. This commit is based on a Github PR, #618, and the authorship wasn't merged correctly so this will revert the work, then a follow up will revert-the-revert to fix the authorship. Differential Revision: https://phabricator.twitter.biz/D80814
Closed by 598acc5. Thanks again, @mehmetgunturkun. |
Hey, @bryce-anderson that's great news. Thanks for your support too. |
Summary: This reverts commit ba1438af3971129c7e6c80080efdf669a7fde4e3. This commit is based on a Github PR, twitter#618, and the authorship wasn't merged correctly so this will revert the work, then a follow up will revert-the-revert to fix the authorship. Differential Revision: https://phabricator.twitter.biz/D80814
Summary: This reverts commit ba1438af3971129c7e6c80080efdf669a7fde4e3. This commit is based on a Github PR, twitter#618, and the authorship wasn't merged correctly so this will revert the work, then a follow up will revert-the-revert to fix the authorship. Differential Revision: https://phabricator.twitter.biz/D80814
Problem
finagle-mysql does not support unsigned integers. If an integer field is with unsigned attribute, it has larger maximum limit (4294967295) than signed integer. Since there is no support for unsigned integer in Java, if the value is higher than Integer.MAX_VALUE (2147483647), it is interpreted as IntValue(-1)
Solution
In order to understand the field is unsigned or not, I added bit masks for type attributes and a function to Field class. Bit masks for attributes are listed in the source code of mysql (https://github.com/mysql/mysql-server/blob/5.7/include/mysql_com.h). During field decoding, if the field is unsigned, I created a LongValue and used readUnsignedIntLE function from MysqlBufReader.
Additionally, added an integration test case into com.twitter.finagle.mysql.integration.TypeTest.NumericTypeTest. Added unsigned int field, inserted an integer value higher than Integer.MAX_VALUE, and expected to be parsed as LongValue.