New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support foreign EntityIds. #678
Conversation
@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId { | |||
/** | |||
* @since 0.5 | |||
*/ | |||
const PATTERN = '/^P[1-9]\d{0,9}\z/i'; | |||
const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception case in the constructor is not tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is :prefix1:prefix2:Q42
allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it gets normalized to prefix1:prefix2:Q42
in the constructor.
* Returns the foreign repository name or empty string for local entities. | ||
* @return string | ||
*/ | ||
public function getRepoName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-private should have @since
tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
} | ||
|
||
protected function sanitizeIdSerialization( $id ) { | ||
return $this->upcaseLocalId( ltrim( $id, ':' ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ltrim here is not tested as far as I can tell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
|
||
private function upcaseLocalId( $id ) { | ||
$parts = explode( ':', $id ); | ||
$parts[count( $parts ) - 1] = strtoupper( end( $parts ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it decided to not have (local) IDs with colons in them? I'm not sure if such a thing would be needed for commons or some such.
If there is no such decision, then how about not doing this little code reuse via inheritance thing and simply putting the few lines in both ItemId and PropertyId? Same goes for the test. As far as I remember the reason for why we have this code and the test like this is that we started out with just having EntityId and only later introduced ItemId and PropertyId, ie for legacy reasons and not by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local IDs never have colons in them (we decided that), but foreign IDs can be "chained": foo:bar:Q123 is Q123 in the repo that is called bar by the repo foo. That's also how interwiki prefixes work. You can do en:fr:wiktionary:de:wikibooks:it:wikipedia:en:Foo
. This will result in a series of HTTP redirects, eventually giving you the Foo page on English Wikipedia.
My personal preference for code re-use would be: protected static upcaseLocalId() called from the constructors of ItemId and ProeprtyId.
f293876
to
a2c1ee9
Compare
/** | ||
* @return string | ||
*/ | ||
public function getLocalPart() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so it is clear for me: Given serialized ID foo:bar:Q1337
is "local part" of it Q1337
or bar:Q1337
.
I thought the former but I am not sure if we're done with the naming discussion here. If the latter is the local part what is Q1337
then?
Sorry if I am just creating confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea... "core ID"? "actual ID"? "local local ID"?
Or the bar:Q1337
part could be called the "foreign local ID"? Or "relative ID"?...
Bah. The two hardest things in programming. Cache invalidation, naming things, and off-by-one errors.
@@ -18,6 +18,13 @@ | |||
protected $serialization; | |||
|
|||
/** | |||
* @param $serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is always a string, right? then you could have @param string $serialization
here
* Returns an array with 3 elements: the foreign repository name as the first element, the local ID as the last | ||
* element and everything that is in between as the second element. | ||
* | ||
* @param $serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string?
* element and everything that is in between as the second element. | ||
* | ||
* @param $serialization | ||
* @return array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems PHP's explode always returns an array of strings, so this method also does so. This might also be @return string[]
then
* @param array $parts | ||
* @return string | ||
*/ | ||
public static function joinSerialization( $parts ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be a type hint for $parts
, not only phpdoc @param
comment
@@ -48,7 +48,7 @@ private function assertValidIdFormat( $idSerialization ) { | |||
* @return int | |||
*/ | |||
public function getNumericId() { | |||
return (int)substr( $this->serialization, 1 ); | |||
return (int)substr( $this->getSerialization(), 1 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is wrong, as it should take the last part of the serialized ID and drop P
from it. E.g. this will break for foo:P123
.
Also, I believe you forgot to do changes regarding accessing $this->serialization
you did for this class in ItemId
class as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, getNumericId should fail for foreign IDs. Using the numeric ID of a foreign ID locally can never be right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly documentation issues.
Normalization should be split into a part that always applies, and a part that only applies for some IDs.
} | ||
|
||
/** | ||
* Concatenates parts of an EntityId serialization with ':'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't state the contract. The contract is "builds an ID serialization from the parts returned by splitSerialization()".
* element and everything that is in between as the second element. | ||
* | ||
* @param $serialization | ||
* @return array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be string[] and should say that this wall always be 3 elements.
/** | ||
* Concatenates parts of an EntityId serialization with ':'. | ||
* | ||
* @param array $parts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string[]. Could be required to be exactly three elements, but maybe it's useful to leave this open.
} | ||
|
||
/** | ||
* Returns the foreign repository name or empty string for local entities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should say that it returns '' if it's a local ID. Should also clarify that for a "chained" foreighn ID, this returns the first part only.
public function getLocalPart() { | ||
$parts = self::splitSerialization( $this->serialization ); | ||
|
||
return self::joinSerialization( array( '', $parts[1], $parts[2] ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The '' bit isn't needed.
return strpos( $this->serialization, ':' ) > 0; | ||
} | ||
|
||
protected static function normalizeIdSerialization( $id ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add at least a minimal doc block that states the type of the parameter. Perhaps also explain when this is (or should be) called.
protected static function normalizeIdSerialization( $id ) { | ||
$parts = self::splitSerialization( ltrim( $id, ':' ) ); | ||
|
||
return self::joinSerialization( array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trimming of the leading ":" should not be tied to the upper-casing of the local part. The leading ":" must always be stripped, that's part of the convention for foreign repo prefixes, which use the same syntax for all IDs.
Making the local part upper case is tied to specific EntityId types, and should only be called by those that actually need that.
* @return array | ||
*/ | ||
public static function splitSerialization( $serialization ) { | ||
$parts = explode( ':', $serialization ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should strip any leading ":" first.
* @param $serialization | ||
*/ | ||
protected function __construct( $serialization ) { | ||
$this->serialization = $serialization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should fail if $serialization[0] === ':', and also if $serialization is empty or has the wrong type. Keep the checks simple though, this is a potential performance hotspot, and callers (subclass constructors) should already be doing normalization.
} | ||
|
||
/** | ||
* @return string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain what the "local part" is, especially in the context of chained foreign IDs.
@@ -24,7 +24,7 @@ class ItemId extends EntityId implements Int32EntityId { | |||
*/ | |||
public function __construct( $idSerialization ) { | |||
$this->assertValidIdFormat( $idSerialization ); | |||
$this->serialization = strtoupper( $idSerialization ); | |||
parent::__construct( parent::normalizeIdSerialization( $idSerialization ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What motivated using a constructor in the parent class? Definitely not worth it IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to do normalization in the constructor and to make serialization
private to the parent class which isn't possible yet since it's still used in the unserialize
methods of the subclasses.
* @dataProvider serializationSplitProvider | ||
*/ | ||
public function testJoinSerialization( $serialization, $split ) { | ||
$this->assertSame( ltrim( $serialization, ':' ), EntityId::joinSerialization( $split ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having logic such as ltrim
in tests is an anti-pattern.
$prefixRemainder = implode( ':', $parts ); | ||
|
||
return array( $repoName, $prefixRemainder, $localPart ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is procedural and uses a positional array for returning multiple values. Both are not cool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not the prettiest code but we decided to leave it procedural for now since the alternative was less readable. Also left it positional since the order in which the parts are returned is more intuitive than the names we could come up with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Positional arrays act as tuples, which seem to be a standard feature in several languages, especially functional ones. They are ideomatic in php via the list() syntax.
Also, returning a list seems fine here, since this is a method that splits something.
I agree with jakobw: We could add keys and make the result associative - if we could think of good names for the parts. "start", "middle", "end" are not very helpful...
a3e6314
to
a13e8c2
Compare
@brightbyte @manicki @JeroenDeDauw thanks for the review! |
dfc0baa
to
1e5a85f
Compare
Updated again to do foreign repo prefix validation in |
@@ -17,6 +18,30 @@ | |||
|
|||
protected $serialization; | |||
|
|||
const PATTERN = '/^:?(\w+:)*[^:]+\z/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, this regular expression does allow :a:b:Q1
, but the expressions in the subclasses do not. I believe this should not be allowed, to make things as simple as possible and not have multiple string representations (a:Q1
and :a:Q1
) for the exact same ID.
Please add tests for all the cases listed above. :Q1
should succeed while :a:Q1
should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leading colons can and should always be ignored. They are currently stripped by the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should :a:Q1
fail? It should just be normalized to a:Q1
.
To wit, :a:Q1
is valid as input, but should always be normalized. No EntityId would ever return a serialization starting with : from getSerialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex in ItemId and PropertyId do not allow :a:Q1
, but this here does. One must be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thiemowmde I think you're looking at an outdated version of ItemId and PropertyId. Github shows some strange things on the "Conversation" page of the pull request since they introduced their new review features...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we agree this morning to remove the prefix matching part from this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, this is in EntityId. I would have expected the pattern for checking the prefix syntax to have a different constant name than the patterns that check the final parts of the ID.
throw new InvalidArgumentException( '$serialization must be a string' ); | ||
} | ||
|
||
if ( $serialization === '' ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it best practice to merge the first two trivial assertions into a single "must be a non-empty string".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually started to do it the other way around, for readability... I'd consider it a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will always merge it when I see it. Super trivial and super easy to understand and to read. I wish there would be a "non-empty string" check in Assert. This pattern repeats a thousand times in our code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thiemowmde just make a pull request, it's "ours" anyway: https://github.com/wmde/Assert
} | ||
|
||
private function assertValidSerialization( $serialization ) { | ||
if ( empty( $serialization ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use empty() for strings, because empty( "0" ) returns true, which is not what we want.
empty() should really only be used on arrays. For strings, just use === ''.
The same problem exists with == 0 or !$foo.
@@ -24,7 +25,7 @@ class ItemId extends EntityId implements Int32EntityId { | |||
*/ | |||
public function __construct( $idSerialization ) { | |||
$this->assertValidIdFormat( $idSerialization ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to validate the prefix and the final part of the ID separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if you already saw the latest version of this but validating the two parts completely separately is kind of a chicken-egg-problem since the splitting method also wants to validate the serialization. EntityId now looks at the entire string to validate the prefix and also makes sure that there's something at the end that does not contain colons. The last part is then validated in the subclass as it should be.
@@ -48,7 +49,11 @@ private function assertValidIdFormat( $idSerialization ) { | |||
* @return int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document that this throws a RuntimeException if called on a foreign ID.
@@ -64,7 +69,7 @@ public function getEntityType() { | |||
* @return string | |||
*/ | |||
public function serialize() { | |||
return json_encode( array( 'property', $this->serialization ) ); | |||
return json_encode( array( 'property', $this->getSerialization() ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really like to drop support for php serialization, but i'm not 100% sure we really use it nowhere, ever.
@@ -15,7 +16,7 @@ class ItemId extends EntityId implements Int32EntityId { | |||
/** | |||
* @since 0.5 | |||
*/ | |||
const PATTERN = '/^Q[1-9]\d{0,9}\z/i'; | |||
const PATTERN = '/^(:?|(\w+:)+)Q[1-9]\d{0,9}\z/i'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.
@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId { | |||
/** | |||
* @since 0.5 | |||
*/ | |||
const PATTERN = '/^P[1-9]\d{0,9}\z/i'; | |||
const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.
* @return string | ||
*/ | ||
public static function joinSerialization( array $parts ) { | ||
return implode( ':', array_filter( $parts ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not fully compatible with the splitSerialization
implementation above. splitSerialization
guarantees the first last element is never empty. But this code here returns the exact same result for all these test cases:
[ 'Q1' , '', '' ]
[ '' , 'Q1', '' ]
[ '' , '', 'Q1' ]
This is quite bad. Two of these cases should fail (I believe the first two).
This code also fails for [ 'wd', '', '0' ]
, [ '0', '0', '0' ]
and so on, because the string "0"
is falsy in PHP. But according to the regex on top this string is a valid ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It's now checking that the last element is not an empty string and also makes sure '0'
doesn't get filtered.
public function getLocalPart() { | ||
$parts = self::splitSerialization( $this->serialization ); | ||
|
||
return self::joinSerialization( array( $parts[1], $parts[2] ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shifts everything and passes the entity ID as the "remainder" part, and the remainder part as a prefix.
joinSerialization( [ '', $parts[1], $parts[2] ] )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only cause the returned string to have a preceding :
which will be ignored.
* @param string $id | ||
* @return string | ||
*/ | ||
protected static function upcaseLocalId( $id ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"upcase" is a weird naming convention. Why not "upperCase"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I heard upcase before e.g. in the Ruby standard library but I could change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this method turns the string upper case is an implementation detail, by the way, and should not be in the name. Something like "normalizeLocalId" would be better, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normalizing the local ID should be the concern of the subclass. This method is just there for convenience so that all subclasses can easily access it to normalize the local ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed with Thiemo that it would be cleaner to drop the convenience function, and just copy the relevant line of code to the child constructors that need it. I don't care terribly much either way, though.
throw new RuntimeException( 'getNumericId must not be called on foreign ItemIds' ); | ||
} | ||
|
||
return (int)substr( $this->getSerialization(), 1 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know this is normalized and does not start with a colon? I do miss this line of code somehow, or don't see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's normalized in the parent constructor.
@@ -64,7 +70,7 @@ public function getEntityType() { | |||
* @return string | |||
*/ | |||
public function serialize() { | |||
return json_encode( array( 'property', $this->serialization ) ); | |||
return json_encode( array( 'property', $this->getSerialization() ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of basically reverting the inlining we did here (and the same in other lines of code in this patch) and calling the getters again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that serialization
is supposed to be private to the parent class. The only thing blocking that is that it's still set directly in the unserialize
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to not touch these "magic" serialization methods to much, and not discuss this unrelated issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this can be left for later. If we can't change unserialize(), why change serialize().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it back to not using getSerialization
@@ -23,8 +24,9 @@ class ItemId extends EntityId implements Int32EntityId { | |||
* @throws InvalidArgumentException | |||
*/ | |||
public function __construct( $idSerialization ) { | |||
$this->assertValidIdFormat( $idSerialization ); | |||
$this->serialization = strtoupper( $idSerialization ); | |||
parent::__construct( self::upcaseLocalId( $idSerialization ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is upcaseLocalId called here, when both upcaseLocalId and the constructor call are in the parent class? The way this code is organized is a bit confusing and can mislead people into using this wrong. Basically: Not all entity IDs must be upper case. This is a decision the specific subclass (ItemId and PropertyId) must make. Which raises the question why the helper method is in the base class. Answer: Code sharing.
In the past we tried hard to get rid of code sharing in the Entity and EntityId base classes.
I suggest to rearrange the code in a way that you can reuse the split method, do a trivial strtoupper( $array[2] )
here in the subclass and then pass the array to the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to rearrange the code in a way that you can reuse the split method, do a trivial strtoupper( $array[2] ) here in the subclass and then pass the array to the base class.
I don't understand the last bit here. Where would I pass an array to the base class and what would the base class do with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method encodes knowledge about an implementation detail in the wrong class. The base class should not know anything about the fact that some subclasses do a magic "to upper case" conversion. Simply inline the strtoupper
in the two subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
26a4612
to
11ad539
Compare
public function getLocalPart() { | ||
$parts = self::splitSerialization( $this->serialization ); | ||
|
||
return self::joinSerialization( array( $parts[1], $parts[2] ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you do it seems a bit hacky way to me. I know it works currently but one could argue it only works by chance.
The code does not check this but I believe one might expect EntityId::joinSerialization to always get array with three elements (per symmetry to EntityId::splitSerialization always returning three-element array).
What @thiemowmde suggested seems to make sense in this context (as we are skipping the first element - the repo name). Your call to joinSerialization
might be interpreted as a short form of joinSerialization(array( $parts[1], $parts[2], '' ) )
. Which would be obviously wrong.
And also I think given you've changed the array_filter
call in joinSerialization
the code @thiemowmde proposed should work just fine and not generate redundant leading colon (if I am reading it right, I havent checked it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that having ''
as the first array element does not do anything to the return value. In fact I had it there in the beginning and removed it by request here: #678 (comment). It does seem to confuse people though so I guess I'll add it again :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, yeah. After looking at at this code multiple times I even started to think maybe something like return $parts[1] !== '' ? $parts[1] . $parts[2] : $parts[2];
would do better here :)
11ad539
to
41e8f75
Compare
@@ -17,6 +18,30 @@ | |||
|
|||
protected $serialization; | |||
|
|||
const PATTERN = '/^:?(\w+:)*[^:]+\z/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we agree this morning to remove the prefix matching part from this pattern?
@@ -64,7 +70,7 @@ public function getEntityType() { | |||
* @return string | |||
*/ | |||
public function serialize() { | |||
return json_encode( array( 'property', $this->serialization ) ); | |||
return json_encode( array( 'property', $this->getSerialization() ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this can be left for later. If we can't change unserialize(), why change serialize().
The new non-private methods still don't have |
* @param string $serialization | ||
* @return string[] Array containing the serialization split into 3 parts. | ||
*/ | ||
public static function splitSerialization( $serialization ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this could be protected and non-static
Edit: I had a go at modifying the code and this appears to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work with wmde/WikibaseDataModelServices#146 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just checked, and yes, it does
edit: keeping in mind that the services PR also depends on #679 which is depending on this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - I thought the EntityIdParser would need splitSerialization, which is removed in Jeroen's PR? I'll have to look at it more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this for now, we can still discuss Jeroen's change before the next release.
*/ | ||
public function __construct( $serialization ) { | ||
self::assertValidSerialization( $serialization ); | ||
$this->serialization = self::normalizeIdSerialization( $serialization ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the derivatives, I suspect this normalization is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure that any leading ":" is always stripped.
* | ||
* @return string | ||
*/ | ||
public function getRepoName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRepositoryName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* | ||
* @return string | ||
*/ | ||
public function getLocalPart() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this naming. If you only remove the first repository prefix, then the remainder can still be non-local right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it turned out to be tricky naming this one. Suggestions are very welcome!
* @dataProvider invalidSerializationProvider | ||
*/ | ||
public function testSplitSerializationFails_GivenInvalidSerialization( $serialization ) { | ||
$this->setExpectedException( 'InvalidArgumentException' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidArgumentException::class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this and all similar cases!
* @dataProvider invalidJoinSerializationDataProvider | ||
*/ | ||
public function testJoinSerializationFails_GivenEmptyId( $parts ) { | ||
$this->setExpectedException( 'InvalidArgumentException' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidArgumentException::class
* @dataProvider invalidSerializationProvider | ||
*/ | ||
public function testConstructor( $serialization ) { | ||
$this->setExpectedException( 'InvalidArgumentException' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidArgumentException::class
@@ -147,4 +151,9 @@ public function invalidNumericIdProvider() { | |||
); | |||
} | |||
|
|||
public function testGetNumericIdThrowsExceptionOnForeignIds() { | |||
$this->setExpectedException( 'RuntimeException' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RuntimeException::class
@@ -147,4 +150,9 @@ public function invalidNumericIdProvider() { | |||
); | |||
} | |||
|
|||
public function testGetNumericIdThrowsExceptionOnForeignIds() { | |||
$this->setExpectedException( 'RuntimeException' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RuntimeException:class
array( ':Q42', 'Q42' ), | ||
array( 'foo:Q42', 'foo:Q42' ), | ||
array( 'foo:bar:q42', 'foo:bar:Q42' ), | ||
array( 'Q42', 'Q42' ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is there twice
Follow up to #678 This removes the addition of static code to EntityId and pushes the responsibility of parsing serializations out of the class. I did not touch various other issues and have created a number of small ones, so this leaves the code in a state, that while it works, still needs some work before it is mergeable. That includes making names consistent again and adding some edge case tests. The first constructor argument contains what the existing getter has named the "local part", which can also include prefixes defined on other sites. It's not immedidately clear to me why we need this here, so I'd be nice if someone could explain that. Should this not already have been resolved by the time the id is constructed?
Follow up to #678 This removes the addition of static code to EntityId and pushes the responsibility of parsing serializations out of the class. I did not touch various other issues and have created a number of small ones, so this leaves the code in a state, that while it works, still needs some work before it is mergeable. That includes making names consistent again and adding some edge case tests. The first constructor argument contains what the existing getter has named the "local part", which can also include prefixes defined on other sites. It's not immedidately clear to me why we need this here, so I'd be nice if someone could explain that. Should this not already have been resolved by the time the id is constructed?
I think the responsibility of this string splitting is misplaced and that all this static code is going down the wrong path. Follow up: #681 |
41e8f75
to
1a6953b
Compare
@JeroenDeDauw I agree in principle, but I have not yet seen a good alternative. The current solution isn't perfect, but I think the cost in terms of technical dept is low. The problem as I see it is that there is now a shared responsibility between the EntityId classes and the EntityIdParser implementations: both need to be able to split the ID into the relevant three parts (the EntityId for validation and normalization, the parser for mapping prefixes and picking the right instantiator callback for the EntityId). I want to keep the knowledge about the syntax used for these parts in one place (currently, in static methods in EntityId). A cleaner way to do this would be an EntityIdPartitioner class I suppose (with "parser" being taken). But the EntityId's constructor would need access to this, which makes things annoying again. In any case, the syntax used for prefixing IDs is not "pluggable", it has to be the same for all types of entities, and more importantly, it has to be the same across all repos to allow federation. Because of this, I think it's ok to encode it in static methods bound to the EntityId base class: because it is part of the contract of all EntityIds. |
Bug: T145516
1a6953b
to
ac1cd71
Compare
*/ | ||
public function __construct( $serialization ) { | ||
self::assertValidSerialization( $serialization ); | ||
$this->serialization = self::normalizeIdSerialization( $serialization ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure that any leading ":" is always stripped.
* @param string $serialization | ||
* @return string[] Array containing the serialization split into 3 parts. | ||
*/ | ||
public static function splitSerialization( $serialization ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this for now, we can still discuss Jeroen's change before the next release.
Task on Phabricator: https://phabricator.wikimedia.org/T145516