Support foreign EntityIds. #678

jakobw · 2016-09-19T09:44:55Z

Task on Phabricator: https://phabricator.wikimedia.org/T145516

JeroenDeDauw · 2016-09-19T14:31:07Z

src/Entity/PropertyId.php

@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId {
 	/**
 	 * @since 0.5
 	 */
-	const PATTERN = '/^P[1-9]\d{0,9}\z/i';
+	const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i';


The exception case in the constructor is not tested

We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization method.

Is :prefix1:prefix2:Q42 allowed?

Yes, it gets normalized to prefix1:prefix2:Q42 in the constructor.

JeroenDeDauw · 2016-09-19T14:31:26Z

src/Entity/EntityId.php

+	 * Returns the foreign repository name or empty string for local entities.
+	 * @return string
+	 */
+	public function getRepoName() {


non-private should have @since tag

is this fixed?

JeroenDeDauw · 2016-09-19T14:33:05Z

src/Entity/EntityId.php

+	}
+
+	protected function sanitizeIdSerialization( $id ) {
+		return $this->upcaseLocalId( ltrim( $id, ':' ) );


The ltrim here is not tested as far as I can tell

is this fixed?

JeroenDeDauw · 2016-09-19T14:36:57Z

src/Entity/EntityId.php

+
+	private function upcaseLocalId( $id ) {
+		$parts = explode( ':', $id );
+		$parts[count( $parts ) - 1] = strtoupper( end( $parts ) );


Was it decided to not have (local) IDs with colons in them? I'm not sure if such a thing would be needed for commons or some such.

If there is no such decision, then how about not doing this little code reuse via inheritance thing and simply putting the few lines in both ItemId and PropertyId? Same goes for the test. As far as I remember the reason for why we have this code and the test like this is that we started out with just having EntityId and only later introduced ItemId and PropertyId, ie for legacy reasons and not by design.

Local IDs never have colons in them (we decided that), but foreign IDs can be "chained": foo:bar:Q123 is Q123 in the repo that is called bar by the repo foo. That's also how interwiki prefixes work. You can do en:fr:wiktionary:de:wikibooks:it:wikipedia:en:Foo. This will result in a series of HTTP redirects, eventually giving you the Foo page on English Wikipedia.

My personal preference for code re-use would be: protected static upcaseLocalId() called from the constructors of ItemId and PropertyId.
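
A minimal sketch of how a chained serialization such as foo:bar:Q123 could be split into the three parts discussed in this review (repository name, remaining prefix, final ID). The function name `splitChainedId` is hypothetical and this is not the code from the patch; it assumes the input has already been validated.

```php
<?php

/**
 * Hypothetical helper (not the PR's code): splits a serialization into
 * array( repository name, remaining prefix, final ID part ).
 * Assumes the input has already been validated.
 */
function splitChainedId( $serialization ) {
	// A leading ':' carries no meaning, so drop it before splitting.
	$parts = explode( ':', ltrim( $serialization, ':' ) );

	// The last element is the ID as the innermost repository knows it.
	$localPart = array_pop( $parts );

	// The first remaining element, if any, names the foreign repository.
	$repoName = $parts === array() ? '' : array_shift( $parts );

	// Whatever is left over is the chain between the two.
	$prefixRemainder = implode( ':', $parts );

	return array( $repoName, $prefixRemainder, $localPart );
}

list( $repoName, $prefixRemainder, $localPart ) = splitChainedId( 'foo:bar:Q123' );
// $repoName === 'foo', $prefixRemainder === 'bar', $localPart === 'Q123'
```

Destructuring with list() keeps the positional result readable at the call site, which is the usual argument for returning a plain array here.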

manicki · 2016-09-19T16:05:48Z

src/Entity/EntityId.php

+	/**
+	 * @return string
+	 */
+	public function getLocalPart() {


Just so it is clear to me: given the serialized ID foo:bar:Q1337, is its "local part" Q1337 or bar:Q1337?
I thought the former, but I am not sure if we're done with the naming discussion here. If the latter is the local part, what is Q1337 then?
Sorry if I am just creating confusion.

Yea... "core ID"? "actual ID"? "local local ID"?
Or the bar:Q1337 part could be called the "foreign local ID"? Or "relative ID"?...

Bah. The two hardest things in programming. Cache invalidation, naming things, and off-by-one errors.

manicki · 2016-09-19T16:10:42Z

src/Entity/EntityId.php

@@ -18,6 +18,13 @@
 	protected $serialization;

 	/**
+	 * @param $serialization


this is always a string, right? then you could have @param string $serialization here

manicki · 2016-09-19T16:10:58Z

src/Entity/EntityId.php

+	 * Returns an array with 3 elements: the foreign repository name as the first element, the local ID as the last
+	 * element and everything that is in between as the second element.
+	 *
+	 * @param $serialization


manicki · 2016-09-19T16:14:40Z

src/Entity/EntityId.php

+	 * element and everything that is in between as the second element.
+	 *
+	 * @param $serialization
+	 * @return array


it seems PHP's explode always returns an array of strings, so this method also does so. This might also be @return string[] then

manicki · 2016-09-19T16:16:34Z

src/Entity/EntityId.php

+	 * @param array $parts
+	 * @return string
+	 */
+	public static function joinSerialization( $parts ) {


There could be a type hint for $parts, not only phpdoc @param comment

manicki · 2016-09-19T16:33:22Z

src/Entity/PropertyId.php

@@ -48,7 +48,7 @@ private function assertValidIdFormat( $idSerialization ) {
 	 * @return int
 	 */
 	public function getNumericId() {
-		return (int)substr( $this->serialization, 1 );
+		return (int)substr( $this->getSerialization(), 1 );


I believe this is wrong, as it should take the last part of the serialized ID and drop P from it. E.g. this will break for foo:P123.
Also, I believe you forgot to do changes regarding accessing $this->serialization you did for this class in ItemId class as well.

Actually, getNumericId should fail for foreign IDs. Using the numeric ID of a foreign ID locally can never be right.
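
A minimal sketch of that rule as a stand-alone function, assuming a normalized serialization without a leading colon. The name `numericIdOf` and the exception message are hypothetical; the actual method lives in ItemId and PropertyId.

```php
<?php

/**
 * Hypothetical stand-alone version of the idea above (not the PR's code):
 * refuse to derive a numeric ID from a serialization that carries a
 * repository prefix. Assumes a normalized serialization such as 'P123'.
 */
function numericIdOf( $serialization ) {
	// Any ':' means the ID belongs to another repository.
	if ( strpos( $serialization, ':' ) !== false ) {
		throw new RuntimeException( 'Numeric IDs are only defined for local IDs' );
	}

	// Drop the one-letter type prefix ('P' or 'Q') and cast the rest.
	return (int)substr( $serialization, 1 );
}
```

Throwing instead of returning the number for foo:P123 avoids silently confusing a foreign P123 with the local P123.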

brightbyte

Mostly documentation issues.
Normalization should be split into a part that always applies, and a part that only applies for some IDs.

brightbyte · 2016-09-19T16:43:36Z

src/Entity/EntityId.php

+	}
+
+	/**
+	 * Concatenates parts of an EntityId serialization with ':'.


That doesn't state the contract. The contract is "builds an ID serialization from the parts returned by splitSerialization()".

brightbyte · 2016-09-19T16:44:35Z

src/Entity/EntityId.php

+	 * element and everything that is in between as the second element.
+	 *
+	 * @param $serialization
+	 * @return array


should be string[] and should say that this will always be 3 elements.

brightbyte · 2016-09-19T16:45:25Z

src/Entity/EntityId.php

+	/**
+	 * Concatenates parts of an EntityId serialization with ':'.
+	 *
+	 * @param array $parts


string[]. Could be required to be exactly three elements, but maybe it's useful to leave this open.

brightbyte · 2016-09-19T16:46:36Z

src/Entity/EntityId.php

+	}
+
+	/**
+	 * Returns the foreign repository name or empty string for local entities.


Should say that it returns '' if it's a local ID. Should also clarify that for a "chained" foreign ID, this returns the first part only.

brightbyte · 2016-09-19T16:47:04Z

src/Entity/EntityId.php

+	public function getLocalPart() {
+		$parts = self::splitSerialization( $this->serialization );
+
+		return self::joinSerialization( array( '', $parts[1], $parts[2] ) );


The '' bit isn't needed.

brightbyte · 2016-09-19T16:51:18Z

src/Entity/EntityId.php

+		return strpos( $this->serialization, ':' ) > 0;
+	}
+
+	protected static function normalizeIdSerialization( $id ) {


Please add at least a minimal doc block that states the type of the parameter. Perhaps also explain when this is (or should be) called.

brightbyte · 2016-09-19T16:53:33Z

src/Entity/EntityId.php

+	protected static function normalizeIdSerialization( $id ) {
+		$parts = self::splitSerialization( ltrim( $id, ':' ) );
+
+		return self::joinSerialization( array(


The trimming of the leading ":" should not be tied to the upper-casing of the local part. The leading ":" must always be stripped, that's part of the convention for foreign repo prefixes, which use the same syntax for all IDs.
Making the local part upper case is tied to specific EntityId types, and should only be called by those that actually need that.

brightbyte · 2016-09-19T16:53:58Z

src/Entity/EntityId.php

+	 * @return array
+	 */
+	public static function splitSerialization( $serialization ) {
+		$parts = explode( ':', $serialization );


Should strip any leading ":" first.

brightbyte · 2016-09-19T16:54:49Z

src/Entity/EntityId.php

+	 * @param $serialization
+	 */
+	protected function __construct( $serialization ) {
+		$this->serialization = $serialization;


Should fail if $serialization[0] === ':', and also if $serialization is empty or has the wrong type. Keep the checks simple though, this is a potential performance hotspot, and callers (subclass constructors) should already be doing normalization.
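
A minimal sketch of such cheap checks, written as a free function for illustration. The name `assertPlausibleSerialization` and the messages are hypothetical, not taken from the patch.

```php
<?php

/**
 * Hypothetical guard (not the PR's code) with the three cheap checks
 * suggested above: type, emptiness, and no leading colon.
 */
function assertPlausibleSerialization( $serialization ) {
	if ( !is_string( $serialization ) || $serialization === '' ) {
		throw new InvalidArgumentException( '$serialization must be a non-empty string' );
	}

	// Safe to index: the string is known to be non-empty at this point.
	if ( $serialization[0] === ':' ) {
		throw new InvalidArgumentException( '$serialization must not start with a colon' );
	}
}
```

Each check is a single comparison, so the guard stays cheap even if the constructor is called very often.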

brightbyte · 2016-09-19T16:56:09Z

src/Entity/EntityId.php

+	}
+
+	/**
+	 * @return string


Explain what the "local part" is, especially in the context of chained foreign IDs.

JeroenDeDauw · 2016-09-19T23:03:04Z

src/Entity/ItemId.php

@@ -24,7 +24,7 @@ class ItemId extends EntityId implements Int32EntityId {
 	 */
 	public function __construct( $idSerialization ) {
 		$this->assertValidIdFormat( $idSerialization );
-		$this->serialization = strtoupper( $idSerialization );
+		parent::__construct( parent::normalizeIdSerialization( $idSerialization ) );


What motivated using a constructor in the parent class? Definitely not worth it IMO.

The idea was to do normalization in the constructor and to make serialization private to the parent class which isn't possible yet since it's still used in the unserialize methods of the subclasses.

JeroenDeDauw · 2016-09-19T23:05:39Z

tests/unit/Entity/EntityIdTest.php

+	 * @dataProvider serializationSplitProvider
+	 */
+	public function testJoinSerialization( $serialization, $split ) {
+		$this->assertSame( ltrim( $serialization, ':' ), EntityId::joinSerialization( $split ) );


Having logic such as ltrim in tests is an anti-pattern.

JeroenDeDauw · 2016-09-19T23:10:55Z

src/Entity/EntityId.php

+		$prefixRemainder = implode( ':', $parts );
+
+		return array( $repoName, $prefixRemainder, $localPart );
+	}


This code is procedural and uses a positional array for returning multiple values. Both are not cool

I agree that it's not the prettiest code but we decided to leave it procedural for now since the alternative was less readable. Also left it positional since the order in which the parts are returned is more intuitive than the names we could come up with.

Positional arrays act as tuples, which seem to be a standard feature in several languages, especially functional ones. They are idiomatic in PHP via the list() syntax.

Also, returning a list seems fine here, since this is a method that splits something.

I agree with jakobw: We could add keys and make the result associative - if we could think of good names for the parts. "start", "middle", "end" are not very helpful...

jakobw · 2016-09-20T15:49:47Z

@brightbyte @manicki @JeroenDeDauw thanks for the review!
I updated the patch and tried to address all of the comments.

jakobw · 2016-09-21T11:52:08Z

Updated again to do foreign repo prefix validation in EntityId so that the subclasses don't have to know about it as per discussion with @manicki and @brightbyte.

thiemowmde · 2016-09-21T11:58:30Z

src/Entity/EntityId.php

@@ -17,6 +18,30 @@

 	protected $serialization;

+	const PATTERN = '/^:?(\w+:)*[^:]+\z/';


Wait, this regular expression does allow :a:b:Q1, but the expressions in the subclasses do not. I believe this should not be allowed, to make things as simple as possible and not have multiple string representations (a:Q1 and :a:Q1) for the exact same ID.

Please add tests for all the cases listed above. :Q1 should succeed while :a:Q1 should fail.

Leading colons can and should always be ignored. They are currently stripped by the constructor.

Why should :a:Q1 fail? It should just be normalized to a:Q1.
To wit, :a:Q1 is valid as input, but should always be normalized. No EntityId would ever return a serialization starting with : from getSerialization.

The regex in ItemId and PropertyId do not allow :a:Q1, but this here does. One must be wrong.

@thiemowmde I think you're looking at an outdated version of ItemId and PropertyId. Github shows some strange things on the "Conversation" page of the pull request since they introduced their new review features...

Didn't we agree this morning to remove the prefix matching part from this pattern?

Ah sorry, this is in EntityId. I would have expected the pattern for checking the prefix syntax to have a different constant name than the patterns that check the final parts of the ID.
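
A minimal sketch of the behaviour this thread settles on: a leading colon is accepted as input but never survives normalization. The pattern is the one quoted in the diff; the constant name `PREFIXED_ID_PATTERN` and the function `normalizePrefixedId` are hypothetical.

```php
<?php

// The pattern quoted in this thread: optional leading colon, any number
// of "prefix:" segments, then a final part that contains no colon.
const PREFIXED_ID_PATTERN = '/^:?(\w+:)*[^:]+\z/';

/**
 * Hypothetical helper (not the PR's code): validates the raw input, then
 * strips the leading colon so that each ID has one string representation.
 */
function normalizePrefixedId( $serialization ) {
	if ( !is_string( $serialization ) || preg_match( PREFIXED_ID_PATTERN, $serialization ) !== 1 ) {
		throw new InvalidArgumentException( 'Not a valid prefixed ID' );
	}

	return ltrim( $serialization, ':' );
}

var_dump( normalizePrefixedId( ':a:Q1' ) === normalizePrefixedId( 'a:Q1' ) ); // bool(true)
```

With this split, :a:Q1 and a:Q1 are both valid inputs, but only a:Q1 is ever stored or returned.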

thiemowmde · 2016-09-21T11:59:58Z

src/Entity/EntityId.php

+			throw new InvalidArgumentException( '$serialization must be a string' );
+		}
+
+		if ( $serialization === '' ) {


I find it best practice to merge the first two trivial assertions into a single "must be a non-empty string".

I actually started to do it the other way around, for readability... I'd consider it a matter of taste.

I will always merge it when I see it. Super trivial and super easy to understand and to read. I wish there would be a "non-empty string" check in Assert. This pattern repeats a thousand times in our code base.

@thiemowmde just make a pull request, it's "ours" anyway: https://github.com/wmde/Assert

brightbyte · 2016-09-20T18:20:40Z

src/Entity/EntityId.php

+	}
+
+	private function assertValidSerialization( $serialization ) {
+		if ( empty( $serialization ) ) {


Please don't use empty() for strings, because empty( "0" ) returns true, which is not what we want.
empty() should really only be used on arrays. For strings, just use === ''.

The same problem exists with == 0 or !$foo.
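
A short demonstration of the pitfall. The helper name `isEmptyString` is hypothetical and only there to make the strict comparison explicit.

```php
<?php

// The string '0' is a perfectly good ID part, but PHP treats it as falsy.
var_dump( empty( '0' ) ); // bool(true)
var_dump( !'0' );         // bool(true)
var_dump( '0' == 0 );     // bool(true)

/**
 * Hypothetical helper (name is an assumption): the strict comparison only
 * matches the empty string itself.
 */
function isEmptyString( $value ) {
	return $value === '';
}

var_dump( isEmptyString( '0' ) ); // bool(false)
var_dump( isEmptyString( '' ) );  // bool(true)
```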

brightbyte · 2016-09-20T18:24:45Z

src/Entity/ItemId.php

@@ -24,7 +25,7 @@ class ItemId extends EntityId implements Int32EntityId {
 	 */
 	public function __construct( $idSerialization ) {
 		$this->assertValidIdFormat( $idSerialization );


This needs to validate the prefix and the final part of the ID separately.

I'm not sure if you already saw the latest version of this but validating the two parts completely separately is kind of a chicken-egg-problem since the splitting method also wants to validate the serialization. EntityId now looks at the entire string to validate the prefix and also makes sure that there's something at the end that does not contain colons. The last part is then validated in the subclass as it should be.

brightbyte · 2016-09-20T18:26:09Z

src/Entity/ItemId.php

@@ -48,7 +49,11 @@ private function assertValidIdFormat( $idSerialization ) {
 	 * @return int


Please document that this throws a RuntimeException if called on a foreign ID.

brightbyte · 2016-09-20T18:27:31Z

src/Entity/PropertyId.php

@@ -64,7 +69,7 @@ public function getEntityType() {
 	 * @return string
 	 */
 	public function serialize() {
-		return json_encode( array( 'property', $this->serialization ) );
+		return json_encode( array( 'property', $this->getSerialization() ) );


I would really like to drop support for PHP serialization, but I'm not 100% sure we really use it nowhere, ever.

brightbyte · 2016-09-20T18:30:55Z

src/Entity/ItemId.php

@@ -15,7 +16,7 @@ class ItemId extends EntityId implements Int32EntityId {
 	/**
 	 * @since 0.5
 	 */
-	const PATTERN = '/^Q[1-9]\d{0,9}\z/i';
+	const PATTERN = '/^(:?|(\w+:)+)Q[1-9]\d{0,9}\z/i';


We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization method.

brightbyte · 2016-09-20T18:31:03Z

src/Entity/PropertyId.php

@@ -15,7 +15,7 @@ class PropertyId extends EntityId implements Int32EntityId {
 	/**
 	 * @since 0.5
 	 */
-	const PATTERN = '/^P[1-9]\d{0,9}\z/i';
+	const PATTERN = '/^(:?|(\w+:)+)P[1-9]\d{0,9}\z/i';


We decided earlier today that the patterns should not cover the prefix. It will be left to the DispatchingEntityIdParser to apply the pattern to only the last part of the serialized ID, by using the splitSerialization method.

thiemowmde · 2016-09-21T12:05:55Z

src/Entity/EntityId.php

+	 * @return string
+	 */
+	public static function joinSerialization( array $parts ) {
+		return implode( ':', array_filter( $parts ) );


This is not fully compatible with the splitSerialization implementation above. splitSerialization guarantees the ~~first~~ last element is never empty. But this code here returns the exact same result for all these test cases:

[ 'Q1' , '', '' ]

[ '' , 'Q1', '' ]

[ '' , '', 'Q1' ]

This is quite bad. Two of these cases should fail (I believe the first two).

This code also fails for [ 'wd', '', '0' ], [ '0', '0', '0' ] and so on, because the string "0" is falsy in PHP. But according to the regex on top this string is a valid ID.

Good point. It's now checking that the last element is not an empty string and also makes sure '0' doesn't get filtered.
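
A minimal sketch contrasting the two behaviours. `naiveJoin` mirrors the one-liner from the diff; `strictJoin` is a hypothetical variant that follows the fix described in the reply and is not the code that was merged.

```php
<?php

// The reviewed version: array_filter() without a callback drops every
// falsy element, which includes the string '0'.
function naiveJoin( array $parts ) {
	return implode( ':', array_filter( $parts ) );
}

/**
 * Hypothetical stricter variant (not the PR's code): requires a non-empty
 * last element and removes only elements that are exactly ''.
 */
function strictJoin( array $parts ) {
	$last = end( $parts );
	if ( !is_string( $last ) || $last === '' ) {
		throw new InvalidArgumentException( 'The last part must be a non-empty string' );
	}

	$nonEmpty = array_filter( $parts, function ( $part ) {
		return $part !== '';
	} );

	return implode( ':', $nonEmpty );
}

var_dump( naiveJoin( array( 'wd', '', '0' ) ) );  // string(2) "wd"
var_dump( strictJoin( array( 'wd', '', '0' ) ) ); // string(4) "wd:0"
```

In this sketch, array( 'Q1', '', '' ) and array( '', 'Q1', '' ) are rejected because their last element is empty, which matches the two cases the reviewer expected to fail.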

thiemowmde · 2016-09-21T12:07:50Z

src/Entity/EntityId.php

+	public function getLocalPart() {
+		$parts = self::splitSerialization( $this->serialization );
+
+		return self::joinSerialization( array( $parts[1], $parts[2] ) );


This shifts everything and passes the entity ID as the "remainder" part, and the remainder part as a prefix.

joinSerialization( [ '', $parts[1], $parts[2] ] )

This will only cause the returned string to have a preceding : which will be ignored.

thiemowmde · 2016-09-21T12:08:39Z

src/Entity/EntityId.php

+	 * @param string $id
+	 * @return string
+	 */
+	protected static function upcaseLocalId( $id ) {


"upcase" is a weird naming convention. Why not "upperCase"?

I heard upcase before e.g. in the Ruby standard library but I could change it.

The fact that this method turns the string upper case is an implementation detail, by the way, and should not be in the name. Something like "normalizeLocalId" would be better, I believe.

Normalizing the local ID should be the concern of the subclass. This method is just there for convenience so that all subclasses can easily access it to normalize the local ID.

I discussed with Thiemo that it would be cleaner to drop the convenience function, and just copy the relevant line of code to the child constructors that need it. I don't care terribly much either way, though.

thiemowmde · 2016-09-21T12:10:27Z

src/Entity/ItemId.php

+			throw new RuntimeException( 'getNumericId must not be called on foreign ItemIds' );
+		}
+
+		return (int)substr( $this->getSerialization(), 1 );


Do we know this is normalized and does not start with a colon? I do miss this line of code somehow, or don't see it.

It's normalized in the parent constructor.

thiemowmde · 2016-09-21T12:12:31Z

src/Entity/PropertyId.php

@@ -64,7 +70,7 @@ public function getEntityType() {
 	 * @return string
 	 */
 	public function serialize() {
-		return json_encode( array( 'property', $this->serialization ) );
+		return json_encode( array( 'property', $this->getSerialization() ) );


What is the benefit of basically reverting the inlining we did here (and the same in other lines of code in this patch) and calling the getters again?

The reason is that serialization is supposed to be private to the parent class. The only thing blocking that is that it's still set directly in the unserialize method.

I suggest not touching these "magic" serialization methods too much, and not discussing this unrelated issue here.

Yea, this can be left for later. If we can't change unserialize(), why change serialize().

Changed it back to not using getSerialization

thiemowmde · 2016-09-21T12:34:13Z

src/Entity/ItemId.php

@@ -23,8 +24,9 @@ class ItemId extends EntityId implements Int32EntityId {
 	 * @throws InvalidArgumentException
 	 */
 	public function __construct( $idSerialization ) {
-		$this->assertValidIdFormat( $idSerialization );
-		$this->serialization = strtoupper( $idSerialization );
+		parent::__construct( self::upcaseLocalId( $idSerialization ) );


Why is upcaseLocalId called here, when both upcaseLocalId and the constructor call are in the parent class? The way this code is organized is a bit confusing and can mislead people into using this wrong. Basically: Not all entity IDs must be upper case. This is a decision the specific subclass (ItemId and PropertyId) must make. Which raises the question why the helper method is in the base class. Answer: Code sharing.

In the past we tried hard to get rid of code sharing in the Entity and EntityId base classes.

I suggest to rearrange the code in a way that you can reuse the split method, do a trivial strtoupper( $array[2] ) here in the subclass and then pass the array to the base class.

I don't understand the last bit here. Where would I pass an array to the base class and what would the base class do with it?

This method encodes knowledge about an implementation detail in the wrong class. The base class should not know anything about the fact that some subclasses do a magic "to upper case" conversion. Simply inline the strtoupper in the two subclasses.
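
A minimal sketch of the arrangement suggested in this thread, assuming the base class only strips the leading colon and the subclass decides about upper-casing. The class names `SketchEntityId` and `SketchItemId` are hypothetical stand-ins, and validation is left out for brevity.

```php
<?php

// Hypothetical stand-in for the base class: knows the prefix syntax,
// but nothing about upper-casing.
abstract class SketchEntityId {

	protected $serialization;

	public function __construct( $serialization ) {
		// Stripping the leading ':' applies to every kind of ID.
		$this->serialization = ltrim( $serialization, ':' );
	}

	public function getSerialization() {
		return $this->serialization;
	}

}

// Hypothetical stand-in for a subclass whose final ID part is upper case.
class SketchItemId extends SketchEntityId {

	public function __construct( $idSerialization ) {
		// Upper-case only the last colon-separated part, inline.
		$parts = explode( ':', $idSerialization );
		$parts[count( $parts ) - 1] = strtoupper( end( $parts ) );

		parent::__construct( implode( ':', $parts ) );
	}

}

$id = new SketchItemId( 'foo:bar:q42' );
// $id->getSerialization() === 'foo:bar:Q42'
```

The repository prefixes keep their original case; only the part owned by the subclass is changed.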

manicki · 2016-09-21T14:56:08Z

src/Entity/EntityId.php

+	public function getLocalPart() {
+		$parts = self::splitSerialization( $this->serialization );
+
+		return self::joinSerialization( array( $parts[1], $parts[2] ) );


The way you do it seems a bit hacky to me. I know it works currently but one could argue it only works by chance.
The code does not check this but I believe one might expect EntityId::joinSerialization to always get an array with three elements (per symmetry to EntityId::splitSerialization always returning a three-element array).
What @thiemowmde suggested seems to make sense in this context (as we are skipping the first element - the repo name). Your call to joinSerialization might be interpreted as a short form of joinSerialization( array( $parts[1], $parts[2], '' ) ). Which would be obviously wrong.
And also I think given you've changed the array_filter call in joinSerialization, the code @thiemowmde proposed should work just fine and not generate a redundant leading colon (if I am reading it right, I haven't checked it).

I agree that having '' as the first array element does not do anything to the return value. In fact I had it there in the beginning and removed it by request here: #678 (comment). It does seem to confuse people though so I guess I'll add it again :)

Haha, yeah. After looking at this code multiple times I even started to think maybe something like return $parts[1] !== '' ? $parts[1] . ':' . $parts[2] : $parts[2]; would do better here :)
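
Written out with the ':' separator between the remaining prefix and the final part, the one-liner could look like the sketch below. The function name `localPartOf` is hypothetical; `$parts` is assumed to be the three-element array returned by splitSerialization.

```php
<?php

/**
 * Hypothetical helper (not the PR's code). $parts is the three-element
 * array described earlier: repository name, remaining prefix, final ID.
 */
function localPartOf( array $parts ) {
	// Only add the separator when there is a remaining prefix to separate.
	return $parts[1] !== '' ? $parts[1] . ':' . $parts[2] : $parts[2];
}
```

For array( 'foo', 'bar', 'Q1337' ) this yields bar:Q1337, and for array( 'foo', '', 'Q1337' ) it yields Q1337, without going through joinSerialization at all.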

brightbyte · 2016-09-21T16:02:59Z

src/Entity/EntityId.php

@@ -17,6 +18,30 @@

 	protected $serialization;

+	const PATTERN = '/^:?(\w+:)*[^:]+\z/';


Didn't we agree this morning to remove the prefix matching part from this pattern?

brightbyte · 2016-09-21T16:04:24Z

src/Entity/PropertyId.php

@@ -64,7 +70,7 @@ public function getEntityType() {
 	 * @return string
 	 */
 	public function serialize() {
-		return json_encode( array( 'property', $this->serialization ) );
+		return json_encode( array( 'property', $this->getSerialization() ) );


Yea, this can be left for later. If we can't change unserialize(), why change serialize().

JeroenDeDauw · 2016-09-22T03:29:05Z

The new non-private methods still don't have @since tags

JeroenDeDauw · 2016-09-22T03:48:55Z

src/Entity/EntityId.php

+	 * @param string $serialization
+	 * @return string[] Array containing the serialization split into 3 parts.
+	 */
+	public static function splitSerialization( $serialization ) {


As far as I can tell this could be protected and non-static

Edit: I had a go at modifying the code and this appears to work

Does it work with wmde/WikibaseDataModelServices#146 ?

I've just checked, and yes, it does
edit: keeping in mind that the services PR also depends on #679 which is depending on this PR!

I'm confused - I thought the EntityIdParser would need splitSerialization, which is removed in Jeroen's PR? I'll have to look at it more closely.

I'll merge this for now, we can still discuss Jeroen's change before the next release.

JeroenDeDauw · 2016-09-22T03:51:36Z

src/Entity/EntityId.php

+	 */
+	public function __construct( $serialization ) {
+		self::assertValidSerialization( $serialization );
+		$this->serialization = self::normalizeIdSerialization( $serialization );


Looking at the derivatives, I suspect this normalization is not needed

we need to make sure that any leading ":" is always stripped.

JeroenDeDauw · 2016-09-22T03:58:48Z

src/Entity/EntityId.php

+	 *
+	 * @return string
+	 */
+	public function getRepoName() {


getRepositoryName

JeroenDeDauw · 2016-09-22T03:59:37Z

src/Entity/EntityId.php

+	 *
+	 * @return string
+	 */
+	public function getLocalPart() {


I'm a bit confused by this naming. If you only remove the first repository prefix, then the remainder can still be non-local right?

Yes, it turned out to be tricky naming this one. Suggestions are very welcome!

JeroenDeDauw · 2016-09-22T04:09:08Z

tests/unit/Entity/EntityIdTest.php

+	 * @dataProvider invalidSerializationProvider
+	 */
+	public function testSplitSerializationFails_GivenInvalidSerialization( $serialization ) {
+		$this->setExpectedException( 'InvalidArgumentException' );


InvalidArgumentException::class

Fixed this and all similar cases!

JeroenDeDauw · 2016-09-22T04:09:12Z

tests/unit/Entity/EntityIdTest.php

+	 * @dataProvider invalidJoinSerializationDataProvider
+	 */
+	public function testJoinSerializationFails_GivenEmptyId( $parts ) {
+		$this->setExpectedException( 'InvalidArgumentException' );


InvalidArgumentException::class

JeroenDeDauw · 2016-09-22T04:09:18Z

tests/unit/Entity/EntityIdTest.php

+	 * @dataProvider invalidSerializationProvider
+	 */
+	public function testConstructor( $serialization ) {
+		$this->setExpectedException( 'InvalidArgumentException' );


InvalidArgumentException::class

JeroenDeDauw · 2016-09-22T04:09:26Z

tests/unit/Entity/ItemIdTest.php

@@ -147,4 +151,9 @@ public function invalidNumericIdProvider() {
 		);
 	}

+	public function testGetNumericIdThrowsExceptionOnForeignIds() {
+		$this->setExpectedException( 'RuntimeException' );


RuntimeException::class

JeroenDeDauw · 2016-09-22T04:09:45Z

tests/unit/Entity/PropertyIdTest.php

@@ -147,4 +150,9 @@ public function invalidNumericIdProvider() {
 		);
 	}

+	public function testGetNumericIdThrowsExceptionOnForeignIds() {
+		$this->setExpectedException( 'RuntimeException' );


RuntimeException::class

JeroenDeDauw · 2016-09-22T04:34:33Z

tests/unit/Entity/ItemIdTest.php

+			array( ':Q42', 'Q42' ),
+			array( 'foo:Q42', 'foo:Q42' ),
+			array( 'foo:bar:q42', 'foo:bar:Q42' ),
+			array( 'Q42', 'Q42' ),


Now this is there twice

Follow-up to #678. This removes the addition of static code to EntityId and pushes the responsibility of parsing serializations out of the class. I did not touch various other issues and have created a number of small ones, so this leaves the code in a state that, while it works, still needs some work before it is mergeable. That includes making names consistent again and adding some edge case tests. The first constructor argument contains what the existing getter has named the "local part", which can also include prefixes defined on other sites. It's not immediately clear to me why we need this here, so it'd be nice if someone could explain that. Should this not already have been resolved by the time the ID is constructed?

JeroenDeDauw · 2016-09-22T05:21:03Z

I think the responsibility of this string splitting is misplaced and that all this static code is going down the wrong path. Follow up: #681

brightbyte · 2016-09-22T11:27:54Z

@JeroenDeDauw I agree in principle, but I have not yet seen a good alternative. The current solution isn't perfect, but I think the cost in terms of technical debt is low.

The problem as I see it is that there is now a shared responsibility between the EntityId classes and the EntityIdParser implementations: both need to be able to split the ID into the relevant three parts (the EntityId for validation and normalization, the parser for mapping prefixes and picking the right instantiator callback for the EntityId).

I want to keep the knowledge about the syntax used for these parts in one place (currently, in static methods in EntityId). A cleaner way to do this would be an EntityIdPartitioner class I suppose (with "parser" being taken). But the EntityId's constructor would need access to this, which makes things annoying again.

In any case, the syntax used for prefixing IDs is not "pluggable", it has to be the same for all types of entities, and more importantly, it has to be the same across all repos to allow federation. Because of this, I think it's ok to encode it in static methods bound to the EntityId base class: because it is part of the contract of all EntityIds.

Bug: T145516

brightbyte · 2016-09-22T16:41:00Z

src/Entity/EntityId.php

+	 */
+	public function __construct( $serialization ) {
+		self::assertValidSerialization( $serialization );
+		$this->serialization = self::normalizeIdSerialization( $serialization );


we need to make sure that any leading ":" is always stripped.

brightbyte · 2016-09-22T16:41:39Z

src/Entity/EntityId.php

+	 * @param string $serialization
+	 * @return string[] Array containing the serialization split into 3 parts.
+	 */
+	public static function splitSerialization( $serialization ) {


I'll merge this for now, we can still discuss Jeroen's change before the next release.

JeroenDeDauw suggested changes Sep 19, 2016

jakobw changed the title ~~Support foreign EntityIds.~~ [WIP] Support foreign EntityIds. Sep 19, 2016

jakobw force-pushed the foreign-entityids branch from f293876 to a2c1ee9 Compare September 19, 2016 15:13

manicki requested changes Sep 19, 2016

manicki mentioned this pull request Sep 19, 2016

Add support for repo prefixes in DispatchingEntityIdParser #679

Merged

brightbyte suggested changes Sep 19, 2016

JeroenDeDauw reviewed Sep 19, 2016

jakobw force-pushed the foreign-entityids branch 2 times, most recently from a3e6314 to a13e8c2 Compare September 20, 2016 15:37

jakobw force-pushed the foreign-entityids branch 2 times, most recently from dfc0baa to 1e5a85f Compare September 21, 2016 11:26

jakobw changed the title ~~[WIP] Support foreign EntityIds.~~ Support foreign EntityIds. Sep 21, 2016

thiemowmde reviewed Sep 21, 2016

brightbyte suggested changes Sep 21, 2016

thiemowmde reviewed Sep 21, 2016

jakobw force-pushed the foreign-entityids branch 3 times, most recently from 26a4612 to 11ad539 Compare September 21, 2016 15:05

manicki reviewed Sep 21, 2016

jakobw force-pushed the foreign-entityids branch from 11ad539 to 41e8f75 Compare September 21, 2016 15:25

brightbyte suggested changes Sep 21, 2016

JeroenDeDauw reviewed Sep 22, 2016

JeroenDeDauw mentioned this pull request Sep 22, 2016

Changed how "forgeign repository name" is stored in entity ids #681

Closed

jakobw force-pushed the foreign-entityids branch from 41e8f75 to 1a6953b Compare September 22, 2016 11:04

Support foreign EntityIds.

ac1cd71

Bug: T145516

jakobw force-pushed the foreign-entityids branch from 1a6953b to ac1cd71 Compare September 22, 2016 13:57

brightbyte approved these changes Sep 22, 2016

brightbyte merged commit ced2370 into wmde:master Sep 22, 2016

manicki mentioned this pull request Oct 14, 2016

Remove no longer valid note in EntityId docs #685

Merged
