Introduce EntityRetrievingClosestReferencedEntityIdLookup #195

Merged: 1 commit, Apr 24, 2018
249 changes: 249 additions & 0 deletions src/Lookup/EntityRetrievingClosestReferencedEntityIdLookup.php
@@ -0,0 +1,249 @@
<?php

namespace Wikibase\DataModel\Services\Lookup;

use Wikibase\DataModel\Entity\EntityId;
use Wikibase\DataModel\Entity\EntityIdValue;
use Wikibase\DataModel\Entity\PropertyId;
use Wikibase\DataModel\Services\Entity\EntityPrefetcher;
use Wikibase\DataModel\Snak\PropertyValueSnak;
use Wikibase\DataModel\Snak\Snak;
use Wikibase\DataModel\Statement\StatementListProvider;

/**
* Service for getting the closest entity (out of a specified set),
* from a given starting entity. The starting entity, and the target entities
* are (potentially indirectly, via intermediate entities) linked by statements
* with a given property ID, pointing from the starting entity to one of the
* target entities.
*
* @since 3.10
*
* @license GPL-2.0-or-later
* @author Marius Hoch
*/
class EntityRetrievingClosestReferencedEntityIdLookup implements ReferencedEntityIdLookup {

/**
* @var EntityLookup
*/
private $entityLookup;

/**
* @var EntityPrefetcher
*/
private $entityPrefetcher;

/**
* @var int Maximum search depth: Maximum number of intermediate entities to search through.

Contributor

Oh, this is confusing. A "maximum depth" is not the same as a "maximum number of entities to search through". This search spans a tree. If the tree is very flat, the search depth might be something like 5 while a few thousand entities have already been traversed. Please count the entities, not the depth. We already run into issues with this in the QualityConstraints codebase. @lucaswerkmeister might be able to provide an example.

Further down in the constructor I see both a depth and some maximum number. It's hard to tell what they do. I think a depth is not needed.

If you think having both is helpful, I would not make the depth limit throw an exception; only reaching the maximum number of entities should throw. My reasoning: think of a tree with two branches, the first extremely deep and the second only a few entities deep. Together they exceed the "maximum number of entities" setting. If the depth check throws an exception, nothing is found. But if the depth check only stops the traversal of that particular branch, a match in the second, smaller branch would still be found.

I hope this makes sense.
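The point that depth and the number of traversed entities differ can be made concrete with a flat graph. This is a minimal sketch, not code from the PR: `bfsLevels()` is a hypothetical helper, and the `$graph` array (entity id => ids it references) is an assumed stand-in for real entity lookups. It assumes PHP 7.4 or later.

```php
<?php
/**
 * Hypothetical helper: group everything reachable from $from into BFS levels.
 * Level 0 is the start entity, level 1 its direct references, and so on.
 */
function bfsLevels(array $graph, string $from): array
{
    $levels = [[$from]];
    $seen = [$from => true];
    while (true) {
        $next = [];
        foreach ($levels[count($levels) - 1] as $id) {
            foreach ($graph[$id] ?? [] as $ref) {
                if (!isset($seen[$ref])) {
                    $seen[$ref] = true;
                    $next[] = $ref;
                }
            }
        }
        if (!$next) {
            return $levels;
        }
        $levels[] = $next;
    }
}

// A flat "star": Q1 references 1000 entities that reference nothing themselves.
$flat = ['Q1' => array_map(fn (int $i): string => 'Q' . (100 + $i), range(1, 1000))];
$levels = bfsLevels($flat, 'Q1');

$depth = count($levels) - 1;                       // 1 level below the start
$visited = array_sum(array_map('count', $levels)); // 1001 entities in total
```

With a depth of only 1, a depth cap alone would not prevent this search from loading 1001 entities.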

Member Author

I hope I correctly understood your comment, if not, please correct me!

Please note that this does not do a depth-first search but a breadth-first search, so there's no need to "make the code stop traversing a particular branch". In this context, I think it is useful for a caller to be able to say that they only want paths up to a certain maximum length.
The entity visit limit is just an additional safeguard, which I don't expect to fire very often; only in specific cases where one or more visited entities reference many other entities.

I agree that the documentation for this could be improved. I'll see what I can do there.
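The breadth-first search described above, with the depth limit bounding path length and the visit limit acting as a safeguard, can be sketched on plain arrays. This is a minimal sketch, not the PR's implementation: the hypothetical `closestReferenced()` takes a `$graph` that maps an entity id string to the ids it references through the chosen property (standing in for `EntityLookup` plus `getMainSnaks()`), and it throws plain `RuntimeException`s where the PR throws its dedicated exceptions. It assumes PHP 7.4 or later.

```php
<?php
/**
 * Minimal breadth-first sketch on plain arrays (hypothetical, not the PR's code).
 * $graph maps an entity id to the ids it references through the chosen property.
 */
function closestReferenced(array $graph, string $from, array $targets, int $maxDepth, int $maxVisits): ?string
{
    $visited = [];
    $toVisit = [$from];

    // $maxDepth + 1 levels, because looking at $from itself already is a step.
    for ($step = 0; $step <= $maxDepth; $step++) {
        $next = [];
        foreach ($toVisit as $id) {
            $visited[$id] = true;
            if (count($visited) > $maxVisits) {
                throw new RuntimeException("Maximum number of entity visits ($maxVisits) exhausted.");
            }
            foreach ($graph[$id] ?? [] as $ref) {
                if (in_array($ref, $targets, true)) {
                    // Levels are processed in order, so the first hit is a closest target.
                    return $ref;
                }
                $next[] = $ref;
            }
        }
        // Drop already visited entities and duplicates before descending one level.
        $toVisit = array_values(array_unique(array_diff($next, array_keys($visited))));
        if (!$toVisit) {
            return null; // Nothing left to explore: no target is referenced.
        }
    }

    throw new RuntimeException("Maximum depth of $maxDepth exhausted.");
}

$graph = ['Q1' => ['Q2', 'Q3'], 'Q2' => ['Q4'], 'Q3' => ['Q42'], 'Q4' => ['Q5']];
$closest = closestReferenced($graph, 'Q1', ['Q42'], 1, 100); // 'Q42', via Q3
```

Because whole levels are expanded one at a time, the depth limit never cuts off a shallower branch in favor of a deeper one; it only bounds how long a path may be.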

Contributor

I'm afraid I still don't know what this integer represents. The documentation uses the words "depth" (as in how deep a tree is) and "number" (as in the number of visited leaves) in the same sentence. For me these are two very different concepts. Which one does this integer represent?

Member

My understanding is that this represents the maximum depth of the tree. You can also think of the depth as the number of entities above the current leaf (i.e., intermediate entities) at any point. Does that make sense?

Contributor

> You can also think of the depth as the number of entities above the current leaf […]

"Above the current leaf" describes yet another tree. Which "number of entities" from that tree is meant: a horizontal or a vertical slice?

Member Author

Maximum number of intermediate entities to search through: given a path Q1 -> … -> Q42, this is the number of entities represented by the "…".
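Read this way, `$maxDepth` is the number of intermediate entities on the path, which is why 0 means only directly referenced entities are found. A tiny hypothetical helper (`requiredMaxDepth()`, not part of the PR) makes the arithmetic explicit for a path given as a list of ids:

```php
<?php
/**
 * Hypothetical helper: the smallest $maxDepth that still finds the target at the
 * end of $path, i.e. the number of intermediate entities between start and target.
 */
function requiredMaxDepth(array $path): int
{
    if (count($path) < 2) {
        throw new InvalidArgumentException('A path needs a start and a target entity.');
    }
    return count($path) - 2; // Exclude the starting entity and the target.
}

$direct = requiredMaxDepth(['Q1', 'Q42']);             // 0: Q1 references Q42 directly
$indirect = requiredMaxDepth(['Q1', 'Q2', 'Q3', 'Q42']); // 2: Q2 and Q3 are intermediates
```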

Contributor

So "maxDepth" is indeed the maximum depth of that particular tree. :-) Thanks!

* For example 0 means that only the entities immediately referenced will be found.
*/
private $maxDepth;

/**
* @var int Maximum number of entities to retrieve.
*/
private $maxEntityVisits;

/**
* Map (entity id => true) of already visited entities.
*
* @var bool[]
*/
private $alreadyVisited = [];

/**
* @param EntityLookup $entityLookup
* @param EntityPrefetcher $entityPrefetcher
* @param int $maxDepth Maximum search depth: Maximum number of intermediate entities to search through.
* For example if 0 is given, only the entities immediately referenced will be found.
* If this limit gets exhausted, a MaxReferenceDepthExhaustedException is thrown.
* @param int $maxEntityVisits Maximum number of entities to retrieve during a lookup.
* If this limit gets exhausted, a MaxReferencedEntityVisitsExhaustedException is thrown.
*/
public function __construct(
EntityLookup $entityLookup,
EntityPrefetcher $entityPrefetcher,
$maxDepth,
$maxEntityVisits
) {
$this->entityLookup = $entityLookup;
$this->entityPrefetcher = $entityPrefetcher;
$this->maxDepth = $maxDepth;
$this->maxEntityVisits = $maxEntityVisits;
}

/**
* Get the closest entity (out of $toIds), from a given entity. The starting entity, and
* the target entities are (potentially indirectly, via intermediate entities) linked by
* statements with the given property ID, pointing from the starting entity to one of the
* target entities.
*
* @since 3.10
*
* @param EntityId $fromId
* @param PropertyId $propertyId
* @param EntityId[] $toIds
*
* @return EntityId|null Returns null in case none of the target entities are referenced.
* @throws ReferencedEntityIdLookupException
*/
public function getReferencedEntityId( EntityId $fromId, PropertyId $propertyId, array $toIds ) {
if ( !$toIds ) {
return null;
}

$this->alreadyVisited = [];

Contributor

Couldn't you keep this cache for the lifetime of the object?

Member Author

No, because we may want to visit the same entities again if another (or even the same) reference relationship is queried.
We may add a caching decorator for this in the future, but that's not this class's business.
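A caching decorator of the kind mentioned above could look roughly like this. It is a hypothetical sketch: `ClosestIdLookup`, `CachingClosestIdLookup` and `CountingLookup` are illustrative names, and string ids stand in for the `EntityId`/`PropertyId` objects of the real `ReferencedEntityIdLookup` interface. It caches final results per query, leaving the per-call `$alreadyVisited` reset inside the lookup untouched, and it assumes the order of the target ids does not affect the result (in the PR they are only checked with `in_array()`).

```php
<?php
/** Hypothetical, simplified lookup interface using string ids. */
interface ClosestIdLookup
{
    public function getReferencedEntityId(string $fromId, string $propertyId, array $toIds): ?string;
}

/** Decorator that remembers the result of each (from, property, targets) query. */
final class CachingClosestIdLookup implements ClosestIdLookup
{
    private ClosestIdLookup $inner;
    /** @var array<string, string|null> */
    private array $cache = [];

    public function __construct(ClosestIdLookup $inner)
    {
        $this->inner = $inner;
    }

    public function getReferencedEntityId(string $fromId, string $propertyId, array $toIds): ?string
    {
        sort($toIds); // Normalize the target order so equivalent queries share a key.
        $key = $fromId . '|' . $propertyId . '|' . implode(',', $toIds);
        // array_key_exists() rather than isset(), so cached null results are reused too.
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = $this->inner->getReferencedEntityId($fromId, $propertyId, $toIds);
        }
        return $this->cache[$key];
    }
}

/** Stub inner lookup that counts how often it is actually asked. */
final class CountingLookup implements ClosestIdLookup
{
    public int $calls = 0;

    public function getReferencedEntityId(string $fromId, string $propertyId, array $toIds): ?string
    {
        $this->calls++;
        return in_array('Q42', $toIds, true) ? 'Q42' : null;
    }
}

$inner = new CountingLookup();
$lookup = new CachingClosestIdLookup($inner);
$first = $lookup->getReferencedEntityId('Q1', 'P31', ['Q42', 'Q5']);
$second = $lookup->getReferencedEntityId('Q1', 'P31', ['Q5', 'Q42']); // served from the cache
```

This cache is unbounded and lives as long as the decorator; a production version would likely need an eviction policy or a shared cache backend. Exceptions are not cached, so a lookup that hit one of the limits is retried on the next call.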


$steps = $this->maxDepth + 1; // Add one as checking $fromId already is a step
$toVisit = [ $fromId ];

while ( $steps-- ) {
$this->entityPrefetcher->prefetch( $toVisit );
$toVisitNext = [];

foreach ( $toVisit as $curId ) {
$result = $this->processEntityById( $curId, $fromId, $propertyId, $toIds, $toVisitNext );
if ( $result ) {
return $result;
}
}
// Remove already visited entities
$toVisit = array_unique(
array_diff( $toVisitNext, array_keys( $this->alreadyVisited ) )
);

if ( !$toVisit ) {
return null;
}
}

// Exhausted the max. depth without finding anything.
throw new MaxReferenceDepthExhaustedException(
$fromId,
$propertyId,
$toIds,
$this->maxDepth
);
}

/**
* Find out whether an entity (directly) references one of the target ids.
*
* @param EntityId $id Id of the entity to process
* @param EntityId $fromId Id this lookup started from
* @param PropertyId $propertyId
* @param EntityId[] $toIds
* @param EntityId[] &$toVisit List of entities that still need to be checked
* @return EntityId|null Target id the entity refers to, null if none.
*/
private function processEntityById(
EntityId $id,
EntityId $fromId,
PropertyId $propertyId,
array $toIds,
array &$toVisit
) {
$entity = $this->getEntity( $id, $fromId, $propertyId, $toIds );
if ( !$entity ) {
return null;
}

$mainSnaks = $this->getMainSnaks( $entity, $propertyId );

foreach ( $mainSnaks as $mainSnak ) {
$result = $this->processSnak( $mainSnak, $toVisit, $toIds );
if ( $result ) {
return $result;
}
}

return null;
}

/**
* @param EntityId $id Id of the entity to get
* @param EntityId $fromId Id this lookup started from
* @param PropertyId $propertyId
* @param EntityId[] $toIds
*
* @return StatementListProvider|null Null if not applicable.
*/
private function getEntity( EntityId $id, EntityId $fromId, PropertyId $propertyId, array $toIds ) {
if ( isset( $this->alreadyVisited[$id->getSerialization()] ) ) {
trigger_error(
'Entity ' . $id->getSerialization() . ' already visited.',
E_USER_WARNING
);

return null;
}

$this->alreadyVisited[$id->getSerialization()] = true;

if ( count( $this->alreadyVisited ) > $this->maxEntityVisits ) {
throw new MaxReferencedEntityVisitsExhaustedException(
$fromId,
$propertyId,
$toIds,
$this->maxEntityVisits
);
}

try {
$entity = $this->entityLookup->getEntity( $id );
} catch ( EntityLookupException $ex ) {
throw new ReferencedEntityIdLookupException( $fromId, $propertyId, $toIds, null, $ex );
}

if ( !( $entity instanceof StatementListProvider ) ) {
return null;
}

return $entity;
}

/**
* Decide whether a single Snak is pointing to one of the target ids.
*
* @param Snak $snak
* @param EntityId[] &$toVisit List of entities that still need to be checked
* @param EntityId[] $toIds
* @return EntityId|null Target id the Snak refers to, null if none.
*/
private function processSnak( Snak $snak, array &$toVisit, array $toIds ) {
if ( ! ( $snak instanceof PropertyValueSnak ) ) {
return null;
}
$dataValue = $snak->getDataValue();
if ( ! ( $dataValue instanceof EntityIdValue ) ) {
return null;
}

$entityId = $dataValue->getEntityId();
if ( in_array( $entityId, $toIds, false ) ) {
return $entityId;
}

$toVisit[] = $entityId;

return null;
}

/**
* @param StatementListProvider $statementListProvider
* @param PropertyId $propertyId
* @return Snak[]
*/
private function getMainSnaks(
StatementListProvider $statementListProvider,
PropertyId $propertyId
) {
return $statementListProvider
->getStatements()
->getByPropertyId( $propertyId )
->getBestStatements()
->getMainSnaks();
}

}
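The "Remove already visited entities" step in `getReferencedEntityId()` mixes `EntityId` objects with the string keys of `$alreadyVisited`. This works because `array_diff()` and `array_unique()` compare elements by their string form, while the `in_array( ..., false )` target check in `processSnak()` uses loose `==`, which treats two objects of the same class with equal properties as equal. The stand-alone sketch below demonstrates both behaviors with a hypothetical `FakeEntityId` class; it assumes, as the lookup implicitly does, that ids convert to their serialization through `__toString()`.

```php
<?php
/** Hypothetical stand-in for an entity id that stringifies to its serialization. */
final class FakeEntityId
{
    private string $serialization;

    public function __construct(string $serialization)
    {
        $this->serialization = $serialization;
    }

    public function __toString(): string
    {
        return $this->serialization;
    }
}

$alreadyVisited = ['Q1' => true, 'Q2' => true];
$toVisitNext = [new FakeEntityId('Q2'), new FakeEntityId('Q3'), new FakeEntityId('Q3')];

// array_diff() and array_unique() compare elements as strings, so the objects are
// filtered against the visited serializations and deduplicated.
$toVisit = array_values(array_unique(array_diff($toVisitNext, array_keys($alreadyVisited))));

// Loose in_array() uses ==: distinct instances with equal properties match.
$looseMatch = in_array(new FakeEntityId('Q3'), [new FakeEntityId('Q3')], false);
// Strict in_array() uses ===, which requires the very same instance.
$strictMatch = in_array(new FakeEntityId('Q3'), [new FakeEntityId('Q3')], true);
```

This is why the target check passes `false` as the third argument: the ids found in snaks are never the same instances as those in `$toIds`.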
51 changes: 51 additions & 0 deletions src/Lookup/MaxReferenceDepthExhaustedException.php
@@ -0,0 +1,51 @@
<?php

namespace Wikibase\DataModel\Services\Lookup;

use Exception;
use Wikibase\DataModel\Entity\EntityId;
use Wikibase\DataModel\Entity\PropertyId;

/**
* @since 3.10
*
* @license GPL-2.0-or-later
* @author Marius Hoch
*/
class MaxReferenceDepthExhaustedException extends ReferencedEntityIdLookupException {

/**
* @var int
*/
private $maxDepth;

/**
* @param EntityId $fromId
* @param PropertyId $propertyId
* @param EntityId[] $toIds
* @param int $maxDepth
* @param string|null $message
* @param Exception|null $previous
*/
public function __construct(
EntityId $fromId,
PropertyId $propertyId,
array $toIds,
$maxDepth,
$message = null,
Exception $previous = null
) {
$this->maxDepth = $maxDepth;
$message = $message ?: 'Referenced entity id lookup failed: Maximum depth of ' . $maxDepth . ' exhausted.';

parent::__construct( $fromId, $propertyId, $toIds, $message, $previous );
}

/**
* @return int
*/
public function getMaxDepth() {
return $this->maxDepth;
}

}
52 changes: 52 additions & 0 deletions src/Lookup/MaxReferencedEntityVisitsExhaustedException.php
@@ -0,0 +1,52 @@
<?php

namespace Wikibase\DataModel\Services\Lookup;

use Exception;
use Wikibase\DataModel\Entity\EntityId;
use Wikibase\DataModel\Entity\PropertyId;

/**
* @since 3.10
*
* @license GPL-2.0-or-later
* @author Marius Hoch
*/
class MaxReferencedEntityVisitsExhaustedException extends ReferencedEntityIdLookupException {

/**
* @var int
*/
private $maxEntityVisits;

/**
* @param EntityId $fromId
* @param PropertyId $propertyId
* @param EntityId[] $toIds
* @param int $maxEntityVisits
* @param string|null $message
* @param Exception|null $previous
*/
public function __construct(
EntityId $fromId,
PropertyId $propertyId,
array $toIds,
$maxEntityVisits,
$message = null,
Exception $previous = null
) {
$this->maxEntityVisits = $maxEntityVisits;
$message = $message ?: 'Referenced entity id lookup failed: Maximum number of entity visits (' .
$maxEntityVisits . ') exhausted.';

parent::__construct( $fromId, $propertyId, $toIds, $message, $previous );
}

/**
* @return int
*/
public function getMaxEntityVisits() {
return $this->maxEntityVisits;
}

}
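Because both new exceptions extend `ReferencedEntityIdLookupException`, a caller can either handle every failure through the base class or tell the two limits apart. A minimal sketch of the latter follows, using hypothetical stand-in classes (`LookupFailed`, `DepthExhausted`, `VisitsExhausted`) whose default messages mirror the ones above but whose constructors are reduced to the limit and an optional message; `describeLookupResult()` is likewise illustrative.

```php
<?php
/** Hypothetical stand-ins mirroring the exception hierarchy added in this PR. */
class LookupFailed extends Exception {}

class DepthExhausted extends LookupFailed
{
    private int $maxDepth;

    public function __construct(int $maxDepth, ?string $message = null)
    {
        $this->maxDepth = $maxDepth;
        parent::__construct($message ?: 'Referenced entity id lookup failed: Maximum depth of ' . $maxDepth . ' exhausted.');
    }

    public function getMaxDepth(): int
    {
        return $this->maxDepth;
    }
}

class VisitsExhausted extends LookupFailed
{
    private int $maxEntityVisits;

    public function __construct(int $maxEntityVisits, ?string $message = null)
    {
        $this->maxEntityVisits = $maxEntityVisits;
        parent::__construct($message ?: 'Referenced entity id lookup failed: Maximum number of entity visits (' .
            $maxEntityVisits . ') exhausted.');
    }

    public function getMaxEntityVisits(): int
    {
        return $this->maxEntityVisits;
    }
}

/** Runs a lookup and reports which limit, if any, stopped it. */
function describeLookupResult(callable $lookup): string
{
    try {
        return 'found: ' . ($lookup() ?? 'nothing');
    } catch (DepthExhausted $e) {
        return 'too deep (limit ' . $e->getMaxDepth() . ')';
    } catch (VisitsExhausted $e) {
        return 'too many visits (limit ' . $e->getMaxEntityVisits() . ')';
    } catch (LookupFailed $e) {
        return 'lookup failed: ' . $e->getMessage();
    }
}
```

The specific subclasses have to be caught before the base class, since PHP uses the first matching `catch` block.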