Wikibase Phrase Entity, Viewing

July 12, 2024 By addshore
This entry is part 7 of 7 in the series Wikibase ecosystem

In my previous post, we got to the point of being able to create a new Wikibase Entity. It is stored in the MediaWiki database as a page, however we can’t actually view it via any interface yet.

In this post, we will work through another set of code changes, tackling each issue as we see it arise, until we can see the entity represented in the various places that users might expect.

Viewing the page

The provided entity serialization is neither legacy nor current

When clicking on one of the links on Special:RecentChanges to a phrase page that we have created, we get our first error.

/wiki/Phrase:Phrase66900b01937842.29097733 MWContentSerializationException: The provided entity serialization is neither legacy nor current
from /var/www/html/w/extensions/Wikibase/lib/includes/Store/EntityContentDataCodec.php(253)

The full stack trace is a little large, but you can find it in a paste bin.

This error is very similar to an issue we saw in the creation blog post, but this time the codec class can not deserialize what we have stored in the database, as we have not registered a deserializer for phrases.

Adding a deserializer to the entity registration file is very simple:

Def::DESERIALIZER_FACTORY_CALLBACK => static function ( \Wikibase\DataModel\Deserializers\DeserializerFactory $deserializerFactory ) {
	return new PhraseDeserializer();
},

And the deserializer itself can look something like this:

<?php

namespace Wikibase\Repo\Phrase;

use Deserializers\TypedObjectDeserializer;

class PhraseDeserializer extends TypedObjectDeserializer {

	public function __construct() {
		parent::__construct(PhraseDocument::TYPE,'type');
	}

	public function deserialize( $serialization ) {
		return new PhraseDocument(
			new PhraseId($serialization['id']),
			$serialization['language'],
			$serialization['phrase']
		);
	}

}

This deserializer relies on the id and type being part of the serialization, which is not something that I added to the serialization in the previous post. So I’ll need to add that to the PhraseSerailizer in order for TypedObjectDeserializer to function correctly.

The serialize method should look something like this:

public function serialize( $object ) {
	return [
		'id' => $object->getId()->getSerialization(),
		'type' => $object->getType(),
		'language' => $object->getLanguage(),
		'phrase' => $object->getPhrase(),
	];
}

This also means that the entities we created previously, which lack the id and type in their serialization, will not be deserializable moving forward. If we try to load one of these, we will continue to get the exception noted above. However, if we create a new entity with the new serializer, we can continue to the next error.
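To illustrate why the older entities can no longer be loaded, here is a minimal standalone sketch of a type-tagged round trip. It does not use Wikibase or the Deserializers library; `sketchSerialize` and `sketchDeserialize` are hypothetical helpers working on plain arrays, and they only mimic the idea of checking the `type` key before deserializing.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for PhraseSerailizer / PhraseDeserializer, using plain arrays.
const SKETCH_PHRASE_TYPE = 'phrase';

function sketchSerialize( string $id, string $language, string $phrase ): array {
	return [
		'id' => $id,
		'type' => SKETCH_PHRASE_TYPE,
		'language' => $language,
		'phrase' => $phrase,
	];
}

function sketchDeserialize( array $serialization ): array {
	// Reject anything that does not carry the expected type tag,
	// which is roughly the situation the older stored revisions are in.
	if ( ( $serialization['type'] ?? null ) !== SKETCH_PHRASE_TYPE ) {
		throw new InvalidArgumentException( 'Serialization is missing the "phrase" type tag' );
	}
	if ( !isset( $serialization['id'], $serialization['language'], $serialization['phrase'] ) ) {
		throw new InvalidArgumentException( 'Serialization is missing a required key' );
	}
	return [
		'id' => $serialization['id'],
		'language' => $serialization['language'],
		'phrase' => $serialization['phrase'],
	];
}

// A serialization produced with the new format round-trips.
$new = sketchSerialize( 'Phrase669100fbb322c4.15128648', 'en', 'Hello world' );
$roundTripped = sketchDeserialize( $new );

// An old-style serialization without id and type fails the check.
$old = [ 'language' => 'en', 'phrase' => 'Hello world' ];
$oldFailed = false;
try {
	sketchDeserialize( $old );
} catch ( InvalidArgumentException $e ) {
	$oldFailed = true;
}
```

In a real setup, one option would be a maintenance step that rewrites the old revisions, but for this experiment simply creating fresh entities is enough.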

⚠️ I expect that with some refactoring it may be possible to remove this requirement of including the type and ID within the serialization, but it seems this is the status quo currently, and is the easiest to copy for now. ⚠️

After fixing this error, our code is at 49afe2026a8914c79bc034ef4166a8dff41b2410.

No EntityDocumentView is registered for entity type ‘phrase’

/wiki/Phrase:Phrase669100fbb322c4.15128648 OutOfBoundsException: No EntityDocumentView is registered for entity type 'phrase'
from /var/www/html/w/extensions/Wikibase/repo/includes/ParserOutput/DispatchingEntityViewFactory.php(51)

Again, another large stack trace, but you can find it in a paste bin.

Once again, at the line that is causing this error, we see another case where some part of our entity registration is missing, in this case it is entityViewFactoryCallbacks.

The registration of a View service is simple:

Def::VIEW_FACTORY_CALLBACK => function(
	\Language $language,
	\Wikibase\Lib\TermLanguageFallbackChain $fallbackChain,
	\Wikibase\DataModel\Entity\EntityDocument $entity
) {
	return new PhraseView();
},

And the PhraseView implementation itself we can keep nice and simple for now.

This code should just output a small string on the page, showing both the language and the phrase that is set.

<?php

namespace Wikibase\Repo\Phrase;

use Wikibase\View\EntityDocumentView;
use Wikibase\View\ViewContent;

class PhraseView implements EntityDocumentView {

	public function getTitleHtml( \Wikibase\DataModel\Entity\EntityDocument $entity ) {
		return "Title of " . $entity->getId()->getSerialization();
	}

	public function getContent( \Wikibase\DataModel\Entity\EntityDocument $entity, $revision ): ViewContent {
		/** @var PhraseDocument $entity */
		return new ViewContent(
			"Language: " . $entity->getLanguage() . "<br>Phrase: " . $entity->getPhrase(),
			[]
		);
	}
}
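The view above concatenates the language and phrase straight into the HTML. That is fine for an experiment, but if a phrase could contain markup it would probably be worth escaping the values first. A minimal standalone sketch, assuming a hypothetical helper `sketchPhraseHtml` that is not part of Wikibase or of my branch:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds the same markup as PhraseView::getContent,
// but escapes the user-supplied values before placing them in HTML.
function sketchPhraseHtml( string $language, string $phrase ): string {
	return 'Language: ' . htmlspecialchars( $language, ENT_QUOTES, 'UTF-8' )
		. '<br>Phrase: ' . htmlspecialchars( $phrase, ENT_QUOTES, 'UTF-8' );
}

// Markup inside the phrase ends up as visible text rather than as HTML.
$html = sketchPhraseHtml( 'en', '<b>bold</b> & more' );
```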

Reloading the page that we are trying to view, we do now see our content rendering, however we also see another large error presented above.

At the end of this section, our code is at 3aa7aa2bedd3c17c1c66536fc4f66924551a7d66

Failed to parse EntityId config var

This is actually only a warning, and I am only seeing this as I have development mode etc turned on, otherwise at this point, things would appear to be working from a user perspective.

wfLogWarning( $msg = 'Failed to parse EntityId config var: Phrase669100fbb322c4.15128648', $callerOffset = ???, $level = ??? )	
.../OutputPageEntityIdReader.php:52

The warning once again comes from a section of code that is looking for something in the entity registration, which I have not yet defined.

In this case, the entityIdParser in Wikibase doesn’t know how to parse IDs for phrase entities.

Looking at the stacktrace, this parser is needed in the Wikibase hook for OutputPageBodyAttributes, which is adding some HTML class attributes to the body of the page, based on the entity type.

And looking at what the Wikibase ID parser requires us to do, we must define some EntityIdBuilders. Continuing down the rabbit hole, we see that these builders actually depend on the ENTITY_ID_PATTERN and ENTITY_ID_BUILDER parts of entity registration.

This one is a very simple addition to the definitions:

Def::ENTITY_ID_PATTERN => '/^Phrase[0-9a-z]+\.[0-9]+/i',
Def::ENTITY_ID_BUILDER => static function ( $serialization ) {
	return new PhraseId( $serialization );
},

Reloading the page one more time, we have a fully functional (in the sense that there are no errors) page for the phrase.
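As a quick sanity check of what that pattern accepts, here is a small standalone sketch. The helper `sketchIsPhraseId` is hypothetical and only mirrors the decision a pattern-based ID parser has to make; it is not how Wikibase wires the pattern and builder together.

```php
<?php
declare(strict_types=1);

// The pattern from the entity registration above.
const SKETCH_PHRASE_ID_PATTERN = '/^Phrase[0-9a-z]+\.[0-9]+/i';

// Hypothetical helper: does this serialization look like a phrase ID?
function sketchIsPhraseId( string $serialization ): bool {
	return preg_match( SKETCH_PHRASE_ID_PATTERN, $serialization ) === 1;
}

$matchesPhrase = sketchIsPhraseId( 'Phrase669100fbb322c4.15128648' );
$matchesItem = sketchIsPhraseId( 'Q42' );
// The pattern has no trailing $ anchor, so it only checks the start of the string.
$matchesWithSuffix = sketchIsPhraseId( 'Phrase669100fbb322c4.15128648 trailing' );
```

The last case shows that the pattern, as written, would also accept strings with extra content after the ID, which may or may not matter depending on where the parser is used.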

At the end of this section, the code is now at 52a94cde6c5e50cd2a27e9d2b7cc70d0dea9c7c4.

Special:RecentChanges

Interestingly, as part of the above changes, Special:RecentChanges is now broken for display. (I was actually expecting this, as it happened in my 2022 branch too).

LogicException: Unable to find Wikibase\DataAccess\PrefetchingTermLookup

/wiki/Special:RecentChanges LogicException: Unable to find Wikibase\DataAccess\PrefetchingTermLookup Service callback for Entity Type phrase for Source local
from /var/www/html/w/extensions/Wikibase/lib/includes/ServiceBySourceAndTypeDispatcher.php(54)

So, something along this code path has changed, and now requires an extra service to be defined (I did have a quick look but didn’t come to any concrete conclusion about what changed and don’t think it’s worth staring at too much.)
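The shape of this error suggests a lookup of per-entity-type callbacks that throws when a type has nothing registered. A minimal standalone sketch of that idea follows; `SketchTypeDispatcher` is a hypothetical class, not the real ServiceBySourceAndTypeDispatcher, and it ignores entity sources entirely.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified dispatcher keyed only by entity type.
final class SketchTypeDispatcher {
	private string $serviceName;
	/** @var array<string, callable> */
	private array $callbacks;

	public function __construct( string $serviceName, array $callbacks ) {
		$this->serviceName = $serviceName;
		$this->callbacks = $callbacks;
	}

	public function getServiceForType( string $entityType ) {
		// No callback registered for this type: fail loudly, as in the error above.
		if ( !isset( $this->callbacks[$entityType] ) ) {
			throw new LogicException(
				"Unable to find {$this->serviceName} Service callback for Entity Type $entityType"
			);
		}
		return ( $this->callbacks[$entityType] )();
	}
}

$dispatcher = new SketchTypeDispatcher( 'PrefetchingTermLookup', [
	'item' => static function () {
		return 'item term lookup';
	},
] );

$itemService = $dispatcher->getServiceForType( 'item' );

// Nothing is registered for 'phrase', so the lookup throws.
$message = null;
try {
	$dispatcher->getServiceForType( 'phrase' );
} catch ( LogicException $e ) {
	$message = $e->getMessage();
}
```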

⚠️ This is another point that highlights some of the bad assumptions that still exist within the Wikibase entity system, as my entity doesn’t have any terms, so why do I need to define something relating to them for my entity? ⚠️

So, let’s try adding a Def::PREFETCHING_TERM_LOOKUP_CALLBACK service to our definition. Conveniently, Wikibase provides a NullPrefetchingTermLookup that we can shove in there for now if we don’t want to worry about prefetching.

Def::PREFETCHING_TERM_LOOKUP_CALLBACK => static function () {
	return new \Wikibase\DataAccess\NullPrefetchingTermLookup();
},

LogicException: Unable to find Wikibase\Lib\Store\EntityUrlLookup

Very similar to the error above, another service that is required somewhere for recent changes now, despite it working before?

/wiki/Special:RecentChanges LogicException: Unable to find Wikibase\Lib\Store\EntityUrlLookup Service callback for Entity Type phrase for Source local
from /var/www/html/w/extensions/Wikibase/lib/includes/ServiceBySourceAndTypeDispatcher.php(54)

You can find the full stack trace in a pastebin.

Again, Wikibase already provides a fairly basic implementation of this service that I don’t need to modify at all, and which can be dropped right in.

Def::URL_LOOKUP_CALLBACK => static function () {
	return new \Wikibase\Lib\Store\TitleLookupBasedEntityUrlLookup( WikibaseRepo::getEntityTitleLookup() );
},

LogicException: Unable to find Wikibase\Lib\Store\EntityExistenceChecker

Again very similar to the two errors above, yet another service is required now that wasn’t before…

/wiki/Special:RecentChanges LogicException: Unable to find Wikibase\Lib\Store\EntityExistenceChecker Service callback for Entity Type phrase for Source local
from /var/www/html/w/extensions/Wikibase/lib/includes/ServiceBySourceAndTypeDispatcher.php(54)

You can find the full stack trace for this error in a pastebin.

And once again, Wikibase already provides a drop in service for this that we can use.

Def::EXISTENCE_CHECKER_CALLBACK => static function () {
	$services = \MediaWiki\MediaWikiServices::getInstance();
	return new \Wikibase\Lib\Store\TitleLookupBasedEntityExistenceChecker(
		WikibaseRepo::getEntityTitleLookup( $services ),
		$services->getLinkBatchFactory()
	);
},

LogicException: Unable to find Wikibase\Lib\Store\EntityTitleTextLookup

Now for one that didn’t happen at this point in my 2022 branch, so something must have changed within Wikibase to now require this service for the recent changes page.

/wiki/Special:RecentChanges LogicException: Unable to find Wikibase\Lib\Store\EntityTitleTextLookup Service callback for Entity Type phrase for Source local
from /var/www/html/w/extensions/Wikibase/lib/includes/ServiceBySourceAndTypeDispatcher.php(54)

Once again, the full stack trace can be found in a pastebin.

The fact that I don’t already know the answer to this error allows me to explore how I found the services for the above fixes. Ultimately, I have been looking at the various existing entity type definitions for items, properties, lexemes, mediainfo and using them as examples.

If we have a look at the item entity type definition, we can see that item also provides a very basic, and already provided default service.

In fact, properties, lexemes, forms, senses and mediainfo all use this same service, but all also must explicitly provide it to function as a Wikibase entity (maybe good, maybe bad, certainly worth thinking about).

So let’s shove this templated code in:

Def::TITLE_TEXT_LOOKUP_CALLBACK => function () {
	return new \Wikibase\Lib\Store\TitleLookupBasedEntityTitleTextLookup(
		WikibaseRepo::getEntityTitleLookup()
	);
},

And tada, recent changes works again.

And at the end of this section the code is at 5b085f3c29cd8950608eb90508785f8aa6a4a19f.

wbgetentities

Next, I’ll take a little look at the action API that is provided by Wikibase for entities out of the box.

The following URL should present me with a JSON representation of my phrase entity.

http://default.mediawiki.mwdd.localhost:8080/w/api.php?action=wbgetentities&ids=Phrase669100fbb322c4.15128648

Instead, it presents me with an error!

{
    "error": {
        "code": "internal_api_error_Serializers\\Exceptions\\UnsupportedObjectException",
        "info": "[ac565d7c5c4b33e90c81a405] Exception caught:",
        "errorclass": "Serializers\\Exceptions\\UnsupportedObjectException",
        "*": "Serializers\\Exceptions\\UnsupportedObjectException at /var/www/html/w/vendor/serialization/serialization/src/Serializers/DispatchingSerializer.php(46)\nfrom /var/www/html/w/vendor/serialization/serialization/src/Serializers/DispatchingSerializer.php(46)\n#0 /var/www/html/w/extensions/Wikibase/repo/includes/Api/ResultBuilder.php(395): Serializers\\DispatchingSerializer-&gt;serialize()\n#1 /var/www/html/w/extensions/Wikibase/repo/includes/Api/ResultBuilder.php(335): Wikibase\\Repo\\Api\\ResultBuilder-&gt;getModifiedEntityArray()\n#2 /var/www/html/w/extensions/Wikibase/repo/includes/Api/GetEntities.php(351): Wikibase\\Repo\\Api\\ResultBuilder-&gt;addEntityRevision()\n#3 /var/www/html/w/extensions/Wikibase/repo/includes/Api/GetEntities.php(196): Wikibase\\Repo\\Api\\GetEntities-&gt;handleEntity()\n#4 /var/www/html/w/includes/api/ApiMain.php(1952): Wikibase\\Repo\\Api\\GetEntities-&gt;execute()\n#5 /var/www/html/w/includes/api/ApiMain.php(928): ApiMain-&gt;executeAction()\n#6 /var/www/html/w/includes/api/ApiMain.php(899): ApiMain-&gt;executeActionWithErrorHandling()\n#7 /var/www/html/w/includes/api/ApiEntryPoint.php(158): ApiMain-&gt;execute()\n#8 /var/www/html/w/includes/MediaWikiEntryPoint.php(200): MediaWiki\\Api\\ApiEntryPoint-&gt;execute()\n#9 /var/www/html/w/api.php(44): MediaWiki\\MediaWikiEntryPoint-&gt;run()\n#10 {main}"
    },
    "servedby": "34649244130e"
}

This error appears to be talking about Serializers once again, which we have already defined as part of the entity type definition, however some additional definition must be missing.

Looking at the stacktrace, the issue comes from ResultBuilder (a class I wrote 11 years ago) which ultimately takes in a different serializer for presentation to users than is used for database storage.

This comes from WikibaseRepo::getAllTypesEntitySerializer which ultimately makes use of the SERIALIZER_FACTORY_CALLBACK key from the entity type definitions.

For convenience, as there is currently no need to have different presentation and storage serialization, we can just provide the same PhraseSerailizer here.

Def::SERIALIZER_FACTORY_CALLBACK => static function ( \Wikibase\DataModel\Serializers\SerializerFactory $serializerFactory ) {
	return new PhraseSerailizer();
},

Reloading the wbgetentities API, we can now see our presented serialization (combined with the additional spec that the current action API adds to the JSON).

At this point in the post, the code is at ec64885a7b3b4cb205f025056a5c2a2f025f8864.

Special:EntityData

Here I immediately started to regret a change that I made between my 2022 branch, and this new 2024 branch.

IDs with .s in them

In the 2022 branch I was using bin2hex(random_bytes(16)) to generate my entity IDs. In my 2024 branch I decided to move to uniqid( 'Phrase', true ). The side effect of this is that the IDs are generated with a . character in them, such as Phrase669100fbb322c4.15128648, whereas the old IDs would look like a2ce04965e4a9dcb03939d7c87f71dc4.

This is a small lesson in either “if it ain’t broke don’t fix it”, or “don’t make unnecessary changes at the same time as doing something else”.

Trying to load Special:EntityData for the first time the error message looks a bit odd, and also different to what I saw in 2022.

http://default.mediawiki.mwdd.localhost:8080/wiki/Special:EntityData/Phrase669100fbb322c4.15128648.json

The error seems to have chopped off part of the ID.

Rather than figure out where within Wikibase this is happening, I instead opted to change my ID generation so that this . was no longer present. This can be seen in 8585316e668e80490889ad21a4d60280d57937d6.

⚠️ It would be great if this were either documented earlier, or enforced by something in code if the “framework” of Wikibase cannot handle .s in IDs. Just another possible pain point highlighted :) ⚠️

ID Capitalization

I created a new phrase, which has a fresh ID with no . within it, and when trying to retrieve this via Special:EntityData another odd thing happens.

I request:

http://default.mediawiki.mwdd.localhost:8080/wiki/Special:EntityData/Phrase6691235471660120190320.json

However, my URL gets rewritten to:

http://default.mediawiki.mwdd.localhost:8080/wiki/Special:EntityData/PHRASE6691235471660120190320.json

And I get an error that says the entity can not be found…

This is the error I was expecting, as it also happened in 2022. This change in case can be traced back to EntityDataUriManager, which makes an assumption that all characters in an entity ID MUST be upper case (code from 11 years ago).

I roughly remember the reason code like this exists. Back in the day, particularly in statement GUIDs there was a case where GUIDs would be created with a lowercase P or Q id, such as q63 instead of Q63. Since then, for the entities that generally exist today, and because there is no reason not to, all entity IDs have all characters upper case.
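The effect on my mixed-case ID can be shown with a small standalone sketch. The `$store` array is a hypothetical stand-in for a case-sensitive entity lookup, and the `strtoupper` call only approximates what happens to the ID on its way through Special:EntityData.

```php
<?php
declare(strict_types=1);

// Hypothetical store keyed by the exact (case-sensitive) entity ID.
$store = [
	'Phrase6691235471660120190320' => [ 'language' => 'en', 'phrase' => 'Hello' ],
];

$requested = 'Phrase6691235471660120190320';
// Approximation of the upper-casing applied to the requested ID.
$normalized = strtoupper( $requested );

$foundAsRequested = array_key_exists( $requested, $store );
$foundAfterUppercase = array_key_exists( $normalized, $store );
```

The ID as requested is found, while the upper-cased form is not, which matches the "entity can not be found" behaviour described above.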

There are two approaches to “fix” this for my branch.

  1. Only uppercase the Int32EntityId IDs, allowing the PhraseId to have different handling in this case
    • This is what I did back in 2022
    • Pros: My phrase ID can stay the same
    • Cons: There might be even more places in Wikibase that make this assumption that I will have to fix moving forward
  2. Make my PhraseId generation always be uppercase
    • This is going to be my chosen approach for this case
    • Pros: I don’t have to touch Wikibase code for this
    • Cons: PHRASE looks ugly? & I once again change my ID (but I just did that anyway…)

After this ID generation change my code is at 3a74ae4829ed72c0ee5285b32c105fd43e411057.
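A minimal standalone sketch of ID generation along these lines, using the expression from the summary at the end of this post. The function name `sketchGeneratePhraseId` is hypothetical, and the trailing `$` in the check is my own stricter variant of the pattern.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: a PH prefix, no dots, everything upper case.
function sketchGeneratePhraseId(): string {
	return strtoupper( str_replace( '.', '', uniqid( 'PH', true ) ) );
}

$id = sketchGeneratePhraseId();

// Upper-casing the ID again must not change it, so URL rewriting is harmless.
$isStable = strtoupper( $id ) === $id;
$hasNoDot = strpos( $id, '.' ) === false;
$matches = preg_match( '/^PH[0-9A-Z]+$/', $id ) === 1;
```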

And now making a request for the JSON of the entity via Special:EntityData works, and makes use of the same presentation based entity serializer defined above for wbgetentities.

RDF mapping

Requesting another format such as RDF or TTL also works at this stage, although the mapping of the content itself is lacking.

...

data:PH6691283689E56137095964 a schema:Dataset ;
	schema:about wd:PH6691283689E56137095964 ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	schema:version "7"^^xsd:integer ;
	schema:dateModified "2024-07-12T12:57:26Z"^^xsd:dateTime .

wd:PH6691283689E56137095964 a wikibase:Phrase .

In order to add additional RDF mapping for the phrase entity, I need to define a Def::RDF_BUILDER_FACTORY_CALLBACK.

The service definition is once again very simple:

Def::RDF_BUILDER_FACTORY_CALLBACK => static function (
	$flavorFlags,
	\Wikibase\Repo\Rdf\RdfVocabulary $vocabulary,
	\Wikimedia\Purtle\RdfWriter $writer,
	$tracker,
	$dedupe
) {
	return new PhraseRdfBuilder(
		$vocabulary,
		$writer
	);
},

As is the service itself (if you know enough about RDF etc…)

<?php

namespace Wikibase\Repo\Phrase;

use Wikibase\Repo\Rdf\EntityRdfBuilder;
use Wikibase\Repo\Rdf\RdfVocabulary;
use Wikimedia\Purtle\RdfWriter;
use Wikibase\DataModel\Entity\EntityDocument;

class PhraseRdfBuilder implements EntityRdfBuilder {

	private $vocabulary;
	private $writer;

	public function __construct(
		RdfVocabulary $vocabulary,
		RdfWriter $writer
	) {
		$this->vocabulary = $vocabulary;
		$this->writer = $writer;
	}

	public function addEntity( EntityDocument $entity ){
		// Stolen from TermsRdfBuilder::getLabelPredicates
		$labelPredicates = [
			[ 'rdfs', 'label' ],
			[ RdfVocabulary::NS_SKOS, 'prefLabel' ],
			[ RdfVocabulary::NS_SCHEMA_ORG, 'name' ],
		];

		// Emit one triple per label predicate, each carrying the phrase text tagged with its language.
		foreach ( $labelPredicates as [ $namespace, $localName ] ) {
			$this->writer->say( $namespace, $localName )->text( $entity->getPhrase(), $entity->getLanguage() );
		}
	}
}

Reloading the TTL output, we now see our additional data that shows the phrase content.

...

data:PH6691283689E56137095964 a schema:Dataset ;
	schema:about wd:PH6691283689E56137095964 ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	schema:version "7"^^xsd:integer ;
	schema:dateModified "2024-07-12T12:57:26Z"^^xsd:dateTime .

wd:PH6691283689E56137095964 a wikibase:Phrase ;
	rdfs:label "Phrase with new ID again"@en ;
	skos:prefLabel "Phrase with new ID again"@en ;
	schema:name "Phrase with new ID again"@en .
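To make the mapping from the three label predicates to the triples above a little more concrete, here is a standalone sketch that renders the same lines as plain text. `sketchLabelTriples` is a hypothetical helper that does not use Purtle, and its escaping only covers backslashes and double quotes, so it is not a general Turtle writer.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: renders the three label triples for a phrase as Turtle text.
function sketchLabelTriples( string $subject, string $phrase, string $language ): string {
	$predicates = [ 'rdfs:label', 'skos:prefLabel', 'schema:name' ];
	// Language-tagged literal, with backslashes and double quotes escaped.
	$literal = '"' . addcslashes( $phrase, "\\\"" ) . '"@' . $language;

	$lines = [];
	foreach ( $predicates as $predicate ) {
		$lines[] = "\t$predicate $literal";
	}
	return "$subject a wikibase:Phrase ;\n" . implode( " ;\n", $lines ) . " .";
}

$turtle = sketchLabelTriples( 'wd:PH6691283689E56137095964', 'Phrase with new ID again', 'en' );
```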

After this RDF mapping change, the code is now at 84c6e127912ef763501fff6ca1a32d27fb57d085.

A summary

We can now actually view the phrase in a few different ways and formats (UI, action API, RDF).

There are still many errors dotted around the place to fix. For example, simply trying to load the MediaWiki action API documentation currently results in an error, search doesn’t work other than for “Page title matches”, and there is currently no way to edit the phrase at all.

These 2 posts bring me mostly to the end of my 2022 branch, except there I also messed around with fingerprints a little, and added a non JS termbox to the experimental entity, both of which I won’t be covering in this series.

Although it has taken 2 blog posts, 19 files, and 535 line additions to get to this point, in essence all I have defined at this stage is the following 15 lines of information for Wikibase and MediaWiki to do things with.

mediawiki:
 - namespace:
   - id: 4269
name:
 - human: Phrase
 - internal: phrase
id:
 - format: /^PH[0-9A-Z]+/
 - generation: strtoupper( str_replace( '.', '', uniqid( 'PH', true )))
fields:
 - language: string name:Language,validation:languageCode
 - phrase: string name:Phrase,validation:len<=1000
rdf:
 - labels:
   - language: phrase