Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support foreign EntityIds. #678

Merged
merged 1 commit into from Sep 22, 2016
Merged

Conversation

jakobw
Copy link
Member

@jakobw jakobw commented Sep 19, 2016

@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId {
/**
* @since 0.5
*/
const PATTERN = '/^P[1-9]\d{0,9}\z/i';
const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception case in the constructor is not tested

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is :prefix1:prefix2:Q42 allowed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it gets normalized to prefix1:prefix2:Q42 in the constructor.

* Returns the foreign repository name or empty string for local entities.
* @return string
*/
public function getRepoName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-private should have @since tag

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

}

protected function sanitizeIdSerialization( $id ) {
return $this->upcaseLocalId( ltrim( $id, ':' ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ltrim here is not tested as far as I can tell

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!


private function upcaseLocalId( $id ) {
$parts = explode( ':', $id );
$parts[count( $parts ) - 1] = strtoupper( end( $parts ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it decided to not have (local) IDs with colons in them? I'm not sure if such a thing would be needed for commons or some such.

If there is no such decision, then how about not doing this little code reuse via inheritance thing and simply putting the few lines in both ItemId and PropertyId? Same goes for the test. As far as I remember the reason for why we have this code and the test like this is that we started out with just having EntityId and only later introduced ItemId and PropertyId, ie for legacy reasons and not by design.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local IDs never have colons in them (we decided that), but foreign IDs can be "chained": foo:bar:Q123 is Q123 in the repo that is called bar by the repo foo. That's also how interwiki prefixes work. You can do en:fr:wiktionary:de:wikibooks:it:wikipedia:en:Foo. This will result in a series of HTTP redirects, eventually giving you the Foo page on English Wikipedia.

My personal preference for code re-use would be: protected static upcaseLocalId() called from the constructors of ItemId and ProeprtyId.

@jakobw jakobw changed the title Support foreign EntityIds. [WIP] Support foreign EntityIds. Sep 19, 2016
/**
* @return string
*/
public function getLocalPart() {
Copy link
Member

@manicki manicki Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so it is clear for me: Given serialized ID foo:bar:Q1337 is "local part" of it Q1337 or bar:Q1337.
I thought the former but I am not sure if we're done with the naming discussion here. If the latter is the local part what is Q1337 then?
Sorry if I am just creating confusion.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea... "core ID"? "actual ID"? "local local ID"?
Or the bar:Q1337 part could be called the "foreign local ID"? Or "relative ID"?...

Bah. The two hardest things in programming. Cache invalidation, naming things, and off-by-one errors.

@@ -18,6 +18,13 @@
protected $serialization;

/**
* @param $serialization
Copy link
Member

@manicki manicki Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is always a string, right? then you could have @param string $serialization here

* Returns an array with 3 elements: the foreign repository name as the first element, the local ID as the last
* element and everything that is in between as the second element.
*
* @param $serialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string?

* element and everything that is in between as the second element.
*
* @param $serialization
* @return array
Copy link
Member

@manicki manicki Sep 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems PHP's explode always returns an array of strings, so this method also does so. This might also be @return string[] then

* @param array $parts
* @return string
*/
public static function joinSerialization( $parts ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a type hint for $parts, not only phpdoc @param comment

@@ -48,7 +48,7 @@ private function assertValidIdFormat( $idSerialization ) {
* @return int
*/
public function getNumericId() {
return (int)substr( $this->serialization, 1 );
return (int)substr( $this->getSerialization(), 1 );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is wrong, as it should take the last part of the serialized ID and drop P from it. E.g. this will break for foo:P123.
Also, I believe you forgot to do changes regarding accessing $this->serialization you did for this class in ItemId class as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, getNumericId should fail for foreign IDs. Using the numeric ID of a foreign ID locally can never be right.

Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly documentation issues.
Normalization should be split into a part that always applies, and a part that only applies for some IDs.

}

/**
* Concatenates parts of an EntityId serialization with ':'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't state the contract. The contract is "builds an ID serialization from the parts returned by splitSerialization()".

* element and everything that is in between as the second element.
*
* @param $serialization
* @return array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be string[] and should say that this wall always be 3 elements.

/**
* Concatenates parts of an EntityId serialization with ':'.
*
* @param array $parts

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string[]. Could be required to be exactly three elements, but maybe it's useful to leave this open.

}

/**
* Returns the foreign repository name or empty string for local entities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say that it returns '' if it's a local ID. Should also clarify that for a "chained" foreighn ID, this returns the first part only.

public function getLocalPart() {
$parts = self::splitSerialization( $this->serialization );

return self::joinSerialization( array( '', $parts[1], $parts[2] ) );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The '' bit isn't needed.

return strpos( $this->serialization, ':' ) > 0;
}

protected static function normalizeIdSerialization( $id ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add at least a minimal doc block that states the type of the parameter. Perhaps also explain when this is (or should be) called.

protected static function normalizeIdSerialization( $id ) {
$parts = self::splitSerialization( ltrim( $id, ':' ) );

return self::joinSerialization( array(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trimming of the leading ":" should not be tied to the upper-casing of the local part. The leading ":" must always be stripped, that's part of the convention for foreign repo prefixes, which use the same syntax for all IDs.
Making the local part upper case is tied to specific EntityId types, and should only be called by those that actually need that.

* @return array
*/
public static function splitSerialization( $serialization ) {
$parts = explode( ':', $serialization );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should strip any leading ":" first.

* @param $serialization
*/
protected function __construct( $serialization ) {
$this->serialization = $serialization;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fail if $serialization[0] === ':', and also if $serialization is empty or has the wrong type. Keep the checks simple though, this is a potential performance hotspot, and callers (subclass constructors) should already be doing normalization.

}

/**
* @return string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain what the "local part" is, especially in the context of chained foreign IDs.

@@ -24,7 +24,7 @@ class ItemId extends EntityId implements Int32EntityId {
*/
public function __construct( $idSerialization ) {
$this->assertValidIdFormat( $idSerialization );
$this->serialization = strtoupper( $idSerialization );
parent::__construct( parent::normalizeIdSerialization( $idSerialization ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What motivated using a constructor in the parent class? Definitely not worth it IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to do normalization in the constructor and to make serialization private to the parent class which isn't possible yet since it's still used in the unserialize methods of the subclasses.

* @dataProvider serializationSplitProvider
*/
public function testJoinSerialization( $serialization, $split ) {
$this->assertSame( ltrim( $serialization, ':' ), EntityId::joinSerialization( $split ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having logic such as ltrim in tests is an anti-pattern.

$prefixRemainder = implode( ':', $parts );

return array( $repoName, $prefixRemainder, $localPart );
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is procedural and uses a positional array for returning multiple values. Both are not cool

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not the prettiest code but we decided to leave it procedural for now since the alternative was less readable. Also left it positional since the order in which the parts are returned is more intuitive than the names we could come up with.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Positional arrays act as tuples, which seem to be a standard feature in several languages, especially functional ones. They are ideomatic in php via the list() syntax.

Also, returning a list seems fine here, since this is a method that splits something.

I agree with jakobw: We could add keys and make the result associative - if we could think of good names for the parts. "start", "middle", "end" are not very helpful...

@jakobw jakobw force-pushed the foreign-entityids branch 2 times, most recently from a3e6314 to a13e8c2 Compare September 20, 2016 15:37
@jakobw
Copy link
Member Author

jakobw commented Sep 20, 2016

@brightbyte @manicki @JeroenDeDauw thanks for the review!
I updated the patch and tried to address all of the comments.

@jakobw jakobw force-pushed the foreign-entityids branch 2 times, most recently from dfc0baa to 1e5a85f Compare September 21, 2016 11:26
@jakobw jakobw changed the title [WIP] Support foreign EntityIds. Support foreign EntityIds. Sep 21, 2016
@jakobw
Copy link
Member Author

jakobw commented Sep 21, 2016

Updated again to do foreign repo prefix validation in EntityId so that the subclasses don't have to know about it as per discussion with @manicki and @brightbyte.

@@ -17,6 +18,30 @@

protected $serialization;

const PATTERN = '/^:?(\w+:)*[^:]+\z/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, this regular expression does allow :a:b:Q1, but the expressions in the subclasses do not. I believe this should not be allowed, to make things as simple as possible and not have multiple string representations (a:Q1 and :a:Q1) for the exact same ID.

Please add tests for all the cases listed above. :Q1 should succeed while :a:Q1 should fail.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading colons can and should always be ignored. They are currently stripped by the constructor.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should :a:Q1 fail? It should just be normalized to a:Q1.
To wit, :a:Q1 is valid as input, but should always be normalized. No EntityId would ever return a serialization starting with : from getSerialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex in ItemId and PropertyId do not allow :a:Q1, but this here does. One must be wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thiemowmde I think you're looking at an outdated version of ItemId and PropertyId. Github shows some strange things on the "Conversation" page of the pull request since they introduced their new review features...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we agree this morning to remove the prefix matching part from this pattern?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, this is in EntityId. I would have expected the pattern for checking the prefix syntax to have a different constant name than the patterns that check the final parts of the ID.

throw new InvalidArgumentException( '$serialization must be a string' );
}

if ( $serialization === '' ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it best practice to merge the first two trivial assertions into a single "must be a non-empty string".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started to do it the other way around, for readability... I'd consider it a matter of taste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will always merge it when I see it. Super trivial and super easy to understand and to read. I wish there would be a "non-empty string" check in Assert. This pattern repeats a thousand times in our code base.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thiemowmde just make a pull request, it's "ours" anyway: https://github.com/wmde/Assert

}

private function assertValidSerialization( $serialization ) {
if ( empty( $serialization ) ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use empty() for strings, because empty( "0" ) returns true, which is not what we want.
empty() should really only be used on arrays. For strings, just use === ''.

The same problem exists with == 0 or !$foo.

@@ -24,7 +25,7 @@ class ItemId extends EntityId implements Int32EntityId {
*/
public function __construct( $idSerialization ) {
$this->assertValidIdFormat( $idSerialization );
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to validate the prefix and the final part of the ID separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if you already saw the latest version of this but validating the two parts completely separately is kind of a chicken-egg-problem since the splitting method also wants to validate the serialization. EntityId now looks at the entire string to validate the prefix and also makes sure that there's something at the end that does not contain colons. The last part is then validated in the subclass as it should be.

@@ -48,7 +49,11 @@ private function assertValidIdFormat( $idSerialization ) {
* @return int

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document that this throws a RuntimeException if called on a foreign ID.

@@ -64,7 +69,7 @@ public function getEntityType() {
* @return string
*/
public function serialize() {
return json_encode( array( 'property', $this->serialization ) );
return json_encode( array( 'property', $this->getSerialization() ) );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to drop support for php serialization, but i'm not 100% sure we really use it nowhere, ever.

@@ -15,7 +16,7 @@ class ItemId extends EntityId implements Int32EntityId {
/**
* @since 0.5
*/
const PATTERN = '/^Q[1-9]\d{0,9}\z/i';
const PATTERN = '/^(:?|(\w+:)+)Q[1-9]\d{0,9}\z/i';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.

@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId {
/**
* @since 0.5
*/
const PATTERN = '/^P[1-9]\d{0,9}\z/i';
const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization emthod.

* @return string
*/
public static function joinSerialization( array $parts ) {
return implode( ':', array_filter( $parts ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not fully compatible with the splitSerialization implementation above. splitSerialization guarantees the first last element is never empty. But this code here returns the exact same result for all these test cases:

  • [ 'Q1' , '', '' ]
  • [ '' , 'Q1', '' ]
  • [ '' , '', 'Q1' ]

This is quite bad. Two of these cases should fail (I believe the first two).

This code also fails for [ 'wd', '', '0' ], [ '0', '0', '0' ] and so on, because the string "0" is falsy in PHP. But according to the regex on top this string is a valid ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It's now checking that the last element is not an empty string and also makes sure '0' doesn't get filtered.

public function getLocalPart() {
$parts = self::splitSerialization( $this->serialization );

return self::joinSerialization( array( $parts[1], $parts[2] ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shifts everything and passes the entity ID as the "remainder" part, and the remainder part as a prefix.

joinSerialization( [ '', $parts[1], $parts[2] ] )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only cause the returned string to have a preceding : which will be ignored.

* @param string $id
* @return string
*/
protected static function upcaseLocalId( $id ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"upcase" is a weird naming convention. Why not "upperCase"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I heard upcase before e.g. in the Ruby standard library but I could change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this method turns the string upper case is an implementation detail, by the way, and should not be in the name. Something like "normalizeLocalId" would be better, I believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalizing the local ID should be the concern of the subclass. This method is just there for convenience so that all subclasses can easily access it to normalize the local ID.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed with Thiemo that it would be cleaner to drop the convenience function, and just copy the relevant line of code to the child constructors that need it. I don't care terribly much either way, though.

throw new RuntimeException( 'getNumericId must not be called on foreign ItemIds' );
}

return (int)substr( $this->getSerialization(), 1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know this is normalized and does not start with a colon? I do miss this line of code somehow, or don't see it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's normalized in the parent constructor.

@@ -64,7 +70,7 @@ public function getEntityType() {
* @return string
*/
public function serialize() {
return json_encode( array( 'property', $this->serialization ) );
return json_encode( array( 'property', $this->getSerialization() ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of basically reverting the inlining we did here (and the same in other lines of code in this patch) and calling the getters again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that serialization is supposed to be private to the parent class. The only thing blocking that is that it's still set directly in the unserialize method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to not touch these "magic" serialization methods to much, and not discuss this unrelated issue here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this can be left for later. If we can't change unserialize(), why change serialize().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back to not using getSerialization

@@ -23,8 +24,9 @@ class ItemId extends EntityId implements Int32EntityId {
* @throws InvalidArgumentException
*/
public function __construct( $idSerialization ) {
$this->assertValidIdFormat( $idSerialization );
$this->serialization = strtoupper( $idSerialization );
parent::__construct( self::upcaseLocalId( $idSerialization ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is upcaseLocalId called here, when both upcaseLocalId and the constructor call are in the parent class? The way this code is organized is a bit confusing and can mislead people into using this wrong. Basically: Not all entity IDs must be upper case. This is a decision the specific subclass (ItemId and PropertyId) must make. Which raises the question why the helper method is in the base class. Answer: Code sharing.

In the past we tried hard to get rid of code sharing in the Entity and EntityId base classes.

I suggest to rearrange the code in a way that you can reuse the split method, do a trivial strtoupper( $array[2] ) here in the subclass and then pass the array to the base class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rearrange the code in a way that you can reuse the split method, do a trivial strtoupper( $array[2] ) here in the subclass and then pass the array to the base class.

I don't understand the last bit here. Where would I pass an array to the base class and what would the base class do with it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method encodes knowledge about an implementation detail in the wrong class. The base class should not know anything about the fact that some subclasses do a magic "to upper case" conversion. Simply inline the strtoupper in the two subclasses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@jakobw jakobw force-pushed the foreign-entityids branch 3 times, most recently from 26a4612 to 11ad539 Compare September 21, 2016 15:05
public function getLocalPart() {
$parts = self::splitSerialization( $this->serialization );

return self::joinSerialization( array( $parts[1], $parts[2] ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you do it seems a bit hacky way to me. I know it works currently but one could argue it only works by chance.
The code does not check this but I believe one might expect EntityId::joinSerialization to always get array with three elements (per symmetry to EntityId::splitSerialization always returning three-element array).
What @thiemowmde suggested seems to make sense in this context (as we are skipping the first element - the repo name). Your call to joinSerialization might be interpreted as a short form of joinSerialization(array( $parts[1], $parts[2], '' ) ). Which would be obviously wrong.
And also I think given you've changed the array_filter call in joinSerialization the code @thiemowmde proposed should work just fine and not generate redundant leading colon (if I am reading it right, I havent checked it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having '' as the first array element does not do anything to the return value. In fact I had it there in the beginning and removed it by request here: #678 (comment). It does seem to confuse people though so I guess I'll add it again :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah. After looking at at this code multiple times I even started to think maybe something like return $parts[1] !== '' ? $parts[1] . $parts[2] : $parts[2]; would do better here :)

@@ -17,6 +18,30 @@

protected $serialization;

const PATTERN = '/^:?(\w+:)*[^:]+\z/';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we agree this morning to remove the prefix matching part from this pattern?

@@ -64,7 +70,7 @@ public function getEntityType() {
* @return string
*/
public function serialize() {
return json_encode( array( 'property', $this->serialization ) );
return json_encode( array( 'property', $this->getSerialization() ) );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this can be left for later. If we can't change unserialize(), why change serialize().

@JeroenDeDauw
Copy link
Contributor

The new non-private methods still don't have @since tags

* @param string $serialization
* @return string[] Array containing the serialization split into 3 parts.
*/
public static function splitSerialization( $serialization ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this could be protected and non-static

Edit: I had a go at modifying the code and this appears to work

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@manicki manicki Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked, and yes, it does
edit: keeping in mind that the services PR also depends on #679 which is depending on this PR!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - I thought the EntityIdParser would need splitSerialization, which is removed in Jeroen's PR? I'll have to look at it more closely.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this for now, we can still discuss Jeroen's change before the next release.

*/
public function __construct( $serialization ) {
self::assertValidSerialization( $serialization );
$this->serialization = self::normalizeIdSerialization( $serialization );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the derivatives, I suspect this normalization is not needed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make sure that any leading ":" is always stripped.

*
* @return string
*/
public function getRepoName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRepositoryName

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*
* @return string
*/
public function getLocalPart() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this naming. If you only remove the first repository prefix, then the remainder can still be non-local right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it turned out to be tricky naming this one. Suggestions are very welcome!

* @dataProvider invalidSerializationProvider
*/
public function testSplitSerializationFails_GivenInvalidSerialization( $serialization ) {
$this->setExpectedException( 'InvalidArgumentException' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidArgumentException::class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this and all similar cases!

* @dataProvider invalidJoinSerializationDataProvider
*/
public function testJoinSerializationFails_GivenEmptyId( $parts ) {
$this->setExpectedException( 'InvalidArgumentException' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidArgumentException::class

* @dataProvider invalidSerializationProvider
*/
public function testConstructor( $serialization ) {
$this->setExpectedException( 'InvalidArgumentException' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidArgumentException::class

@@ -147,4 +151,9 @@ public function invalidNumericIdProvider() {
);
}

public function testGetNumericIdThrowsExceptionOnForeignIds() {
$this->setExpectedException( 'RuntimeException' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeException::class

@@ -147,4 +150,9 @@ public function invalidNumericIdProvider() {
);
}

public function testGetNumericIdThrowsExceptionOnForeignIds() {
$this->setExpectedException( 'RuntimeException' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeException:class

array( ':Q42', 'Q42' ),
array( 'foo:Q42', 'foo:Q42' ),
array( 'foo:bar:q42', 'foo:bar:Q42' ),
array( 'Q42', 'Q42' ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this is there twice

JeroenDeDauw added a commit that referenced this pull request Sep 22, 2016
Follow up to #678

This removes the addition of static code to EntityId and pushes
the responsibility of parsing serializations out of the class.

I did not touch various other issues and have created a number of
small ones, so this leaves the code in a state, that while it works,
still needs some work before it is mergeable. That includes making
names consistent again and adding some edge case tests.

The first constructor argument contains what the existing getter
has named the "local part", which can also include prefixes defined
on other sites. It's not immedidately clear to me why we need this
here, so I'd be nice if someone could explain that. Should this
not already have been resolved by the time the id is constructed?
JeroenDeDauw added a commit that referenced this pull request Sep 22, 2016
Follow up to #678

This removes the addition of static code to EntityId and pushes
the responsibility of parsing serializations out of the class.

I did not touch various other issues and have created a number of
small ones, so this leaves the code in a state, that while it works,
still needs some work before it is mergeable. That includes making
names consistent again and adding some edge case tests.

The first constructor argument contains what the existing getter
has named the "local part", which can also include prefixes defined
on other sites. It's not immedidately clear to me why we need this
here, so I'd be nice if someone could explain that. Should this
not already have been resolved by the time the id is constructed?
@JeroenDeDauw
Copy link
Contributor

I think the responsibility of this string splitting is misplaced and that all this static code is going down the wrong path. Follow up: #681

@brightbyte
Copy link

@JeroenDeDauw I agree in principle, but I have not yet seen a good alternative. The current solution isn't perfect, but I think the cost in terms of technical dept is low.

The problem as I see it is that there is now a shared responsibility between the EntityId classes and the EntityIdParser implementations: both need to be able to split the ID into the relevant three parts (the EntityId for validation and normalization, the parser for mapping prefixes and picking the right instantiator callback for the EntityId).

I want to keep the knowledge about the syntax used for these parts in one place (currently, in static methods in EntityId). A cleaner way to do this would be an EntityIdPartitioner class I suppose (with "parser" being taken). But the EntityId's constructor would need access to this, which makes things annoying again.

In any case, the syntax used for prefixing IDs is not "pluggable", it has to be the same for all types of entities, and more importantly, it has to be the same across all repos to allow federation. Because of this, I think it's ok to encode it in static methods bound to the EntityId base class: because it is part of the contract of all EntityIds.

*/
public function __construct( $serialization ) {
self::assertValidSerialization( $serialization );
$this->serialization = self::normalizeIdSerialization( $serialization );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make sure that any leading ":" is always stripped.

* @param string $serialization
* @return string[] Array containing the serialization split into 3 parts.
*/
public static function splitSerialization( $serialization ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this for now, we can still discuss Jeroen's change before the next release.

@brightbyte brightbyte merged commit ced2370 into wmde:master Sep 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants