Wikibase Phrase Entity, Creation

July 11, 2024, by addshore
This entry is part 6 of 7 in the series Wikibase ecosystem

Finally, after a long lead up discussing what an entity is, looking at some examples of entity extensions, and one extension that chose not to make use of the Wikibase Entity system and EntityDocument, we get to the real question: what does it take to create a new type of data entity within Wikibase that implements the EntityDocument interface and makes use of the various integrations that have evolved over the past 10+ years?

I slapped together a very rough branch exploring this in 2022, but it’s hard to follow at best, and doesn’t really discuss any of the challenges that crop up along the way. This post, and those following, are the redo, with much more context. With any luck, it will work mostly as before, as Wikibase hasn’t changed much internally in the last 2 years when it comes to how entities are handled.

If you want to follow along, you’ll need a development environment, and for that I would recommend the mwcli walkthrough that I wrote in the past weeks.

Where to start

I have a slight advantage here, as the closest thing to documentation on how to add a new entity type to Wikibase is the documentation of the various fields that make up the entity registration system.

Beyond that, your only way in would likely be to start looking at one of the extensions that already provides an additional entity type, such as WikibaseMediaInfo, and the entity type registration that it makes. But each of these extensions comes with its own complexity to muddle your view.

So, rather than reading through any documentation, I’m going to follow the same process as my 2022 branch, and start where it seems to “feel right”.

More specifically, I’m working with the Wikibase extension loaded in my LocalSettings.php file.

<?php
require_once '/mwdd/MwddSettings.php';

wfLoadSkin('Vector');

# Load Wikibase Repository
wfLoadExtension( 'WikibaseRepository', "$IP/extensions/Wikibase/extension-repo.json" );
require_once "$IP/extensions/Wikibase/repo/ExampleSettings.php";

I have decided to model a simple phrase, that has a language code (that the phrase is in), and also some text that represents the phrase itself.

Lorem ipsum dolor sit amet

en

And for the purposes of these blog posts, and to make following along simple, I’ll be including all code I write directly in the Wikibase extension, in repo/includes/Phrase.

I’ll make working commits along the way, that directly tie to progression through the posts.

Data model

I know that within Wikibase code, there is an interface called EntityDocument which needs to be implemented. This in turn requires the implementation of EntityId for my specific type of Entity.

<?php

namespace Wikibase\Repo\Phrase;

use Wikibase\DataModel\Entity\EntityDocument;

class PhraseDocument implements EntityDocument {

	const TYPE = 'phrase';

	private $id;
	private $language;
	private $phrase;

	public function __construct( ?PhraseId $id = null, string $language = 'en', string $phrase = '' ) {
		$this->id = $id;
		$this->language = $language;
		$this->phrase = $phrase;
	}

	public function getType() {
		return self::TYPE;
	}

	public function getId() {
		return $this->id;
	}

	public function setId( $id ) {
		if ( $id instanceof PhraseId ) {
			$this->id = $id;
		} else {
			throw new \InvalidArgumentException( 'Invalid id type' );
		}
	}

	public function isEmpty() {
		return $this->phrase === '';
	}

	public function equals( $target ) {
		return $target instanceof self && $this->language === $target->language && $this->phrase === $target->phrase;
	}

	public function copy() {
		// The ID may still be null for a new entity, and clone fails on null.
		return new self( $this->id === null ? null : clone $this->id, $this->language, $this->phrase );
	}
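
	// Getters used by the special page, serializer and differ later in this post.
	public function getLanguage(): string {
		return $this->language;
	}

	public function getPhrase(): string {
		return $this->phrase;
	}
}

This EntityDocument implementation includes the methods required by the interface, plus simple getters for the language and phrase that later code in this post relies on.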

The constructor includes the language and phrase values that I want the document to contain.

In the document class, I already refer to a PhraseId that I need to also create.

<?php

namespace Wikibase\Repo\Phrase;

use Wikibase\DataModel\Entity\EntityId;

class PhraseId implements EntityId {

	private string $id;

	public function __construct( string $id ) {
		$this->id = $id;
	}

	public function getEntityType() {
		return PhraseDocument::TYPE;
	}

	public function getSerialization() {
		return $this->id;
	}

	public function __toString() {
		return $this->id;
	}

	public function equals( $target ) {
		return $target instanceof self && $this->id === $target->id;
	}

	
	public function __serialize(): array {
		return [ 'serialization' => $this->id ];
	}

	public function __unserialize( array $data ): void {
		$this->__construct( $data['serialization'] );
		if ( $this->id !== $data['serialization'] ) {
			throw new \InvalidArgumentException( '$data contained invalid serialization' );
		}
	}

}

It’s nice to see that in comparison to my 2022 branch, it seems that I no longer need to additionally implement getLocalPart and getRepositoryName within my EntityId implementation, which used to be requirements of the interface.

You’ll see a bunch of pretty empty looking code in these classes. One of the reasons for this is that the interfaces leave a lot of, albeit simple, work for implementors to do. SerializableEntityId does exist to enable some of this boilerplate logic to come for free, but I’ll avoid using it in these posts for now.
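
As a rough illustration of what that boilerplate does, here is a minimal sketch of the __serialize / __unserialize round trip. SketchPhraseId is a hypothetical stand-in that deliberately does not implement the Wikibase EntityId interface, so that it runs on its own.

```php
<?php

// Hypothetical stand-in for PhraseId, without the Wikibase EntityId
// interface, so that the snippet runs without Wikibase loaded.
final class SketchPhraseId {

	private string $id;

	public function __construct( string $id ) {
		$this->id = $id;
	}

	public function getSerialization(): string {
		return $this->id;
	}

	// Called by serialize(); returns the data that should be stored.
	public function __serialize(): array {
		return [ 'serialization' => $this->id ];
	}

	// Called by unserialize() on an instance created without the constructor.
	public function __unserialize( array $data ): void {
		$this->id = $data['serialization'];
	}
}

$original = new SketchPhraseId( 'PHRASE1' );
$restored = unserialize( serialize( $original ) );

echo $restored->getSerialization(), "\n"; // PHRASE1
```

The restored object is a separate instance that carries the same ID string.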

You should now be able to imagine that I could have a phrase with ID PHRASE1, in language en and an actual value of Hello world!

However, these classes are not currently connected to Wikibase or MediaWiki in any way, and I have no way of creating a new phrase.

At the end of this section, our code is at a245e111aeb7ec7a4187e2ae314fa64b643c3f4d.

Content

Content is a core concept of MediaWiki that Wikibase makes use of.

In order to get everything hooked up, we will need to implement the EntityContent abstract class and EntityHandler abstract class from within Wikibase. These in turn extend AbstractContent and ContentHandler from within MediaWiki.

These abstract classes, both within Wikibase and MediaWiki, provide a basic mapping of functionality between Wikibase and its entities and the MediaWiki content system, as well as between MediaWiki content and the rest of MediaWiki.

A minimal PhraseContent, that simply holds an entity via a generic EntityHolder might look like this.

<?php

namespace Wikibase\Repo\Phrase;

use \Wikibase\Repo\Content\EntityHolder;
use \Wikibase\Repo\Content\EntityContent;

class PhraseContent extends EntityContent {

	const ID = 'phrase';

	private $holder;

	public function __construct(
		EntityHolder $holder
	) {
		parent::__construct( PhraseContent::ID );
		$this->holder = $holder;
	}

	public function getEntity() {
		return $this->holder->getEntity();
	}

	public function getEntityHolder() {
		return $this->holder;
	}

	public function getTextForSearchIndex() {
		return ""; // TODO implement in the future
	}

	public function isEmpty() {
		return ( !$this->holder || $this->getEntity()->isEmpty() );
	}

	public function getIgnoreKeysForFilters() {
		return [];
	}
}

⚠️ The default implementation of getTextForSearchIndex brings us to one of the first elements where the entity system tries to impose possibly unwarranted assumptions on entities. It expects entities to have a Fingerprint (labels, descriptions, aliases), which we do not have at this stage, hence the need to override this method with an empty string for now. ⚠️

And the handler for this content type and entity might look like this.

<?php

namespace Wikibase\Repo\Phrase;

use \Wikibase\Repo\Content\EntityHolder;

class PhraseContentHandler extends \Wikibase\Repo\Content\EntityHandler {

	public function getEntityType() {
		return PhraseDocument::TYPE;
	}

	public function makeEmptyEntity() {
		return new PhraseDocument();
	}

	protected function newEntityContent( EntityHolder $entityHolder = null ) {
		return new PhraseContent( $entityHolder );
	}

	public function makeEntityId( $id ) {
		return new PhraseId( $id );
	}

}

As with the data model implementations, so far we have written a lot of nothing, and that nothing is also still not really connected to anything.

At the end of this section, our code is at c037bdc915e1f9f06f6f794147b3d716057bb1ea

Registration

So, we have classes, but nothing is connected, and it needs to be connected in a few different ways.

Entity type definition

We can create a minimal entity type definition.

<?php

namespace Wikibase\Repo\Phrase;

use Wikibase\Lib\EntityTypeDefinitions as Def;

return [
	Def::CONTENT_MODEL_ID => PhraseContent::ID,
];

And load it directly into the list of other default Wikibase entity type definitions (although normally this would be done via an extension hook).

The file you are looking for is repo/WikibaseRepo.entitytypes.php, and your change will look something like this:

...
return [
	\Wikibase\Repo\Phrase\PhraseDocument::TYPE => require __DIR__ . '/includes/Phrase/Definition.php',
	'item' => [
...

Namespace

Wikibase also needs to be told where to try and store this entity in MediaWiki itself. To do that, we need to add it to the list of entity namespaces (again something normally done via an extension hook).

There are quite a few places where this needs to be hooked up, again one of the pain points of entity implementation. The best way I found to figure it all out was to simply search for something like WB_NS_PROPERTY, which only occurs a handful of times, and add the phrase namespace ID there too.

One file you need to look for is repo/config/Wikibase.default.php, and we want to end up with the following snippet, that says we will store the phrase entities in the namespace with ID 4269.

...
$entityNamespaces = [
	'item' => WB_NS_ITEM,
	'property' => WB_NS_PROPERTY,
	\Wikibase\Repo\Phrase\PhraseDocument::TYPE => 4269,
];
...

repo/includes/RepoHooks.php also has a reference.

...
$wgExtraNamespaces[WB_NS_PROPERTY_TALK] = 'Property_talk';
$wgExtraNamespaces[4269] = 'Phrase';

$wgNamespacesToBeSearchedDefault[WB_NS_ITEM] = true;
...

And the namespace also needs an i18n name which can be provided in repo/Wikibase.i18n.namespaces.php in the list under $namespaceNames['en'] (for now).

...
	WB_NS_QUERY_TALK    => 'Query_talk',
	4269 => 'Phrase',
];
...
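
One thing that may be worth keeping in mind when picking a namespace ID: MediaWiki core treats odd namespace IDs above the main namespace as talk namespaces, and 4269 is odd. The sketch below only mirrors that parity rule; sketchIsTalkNamespace is a hypothetical helper, not a MediaWiki function, and whether the odd ID causes any trouble for the phrase namespace has not been checked here.

```php
<?php

// Mirrors the parity convention that MediaWiki core uses for namespaces:
// IDs above the main namespace (0) that are odd are talk namespaces.
// sketchIsTalkNamespace() is a hypothetical helper, not a MediaWiki function.
function sketchIsTalkNamespace( int $index ): bool {
	return $index > 0 && $index % 2 === 1;
}

foreach ( [ 0, 1, 4268, 4269 ] as $ns ) {
	echo $ns, ' => ', sketchIsTalkNamespace( $ns ) ? 'talk' : 'subject', "\n";
}
```

If the namespace behaves strangely later on, trying an even ID such as 4268 might be a reasonable experiment.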

At the end of this section, our code is at 7afc1e59abded170f26533d3e5e4e0df65763368

Special page for creation

At this stage, the boilerplate definitions are all hooked up, but none of this code is actually usable, so let’s create a basic special page to try and create our first actual stored phrase entity.

The special page code detracts from the main focus of this post, so I’ll leave you to view the full body in the commit, and again this is all done in Wikibase for simplicity, but could also be done from an extension.

The interesting parts are, firstly, that ID generation for this new entity currently happens in here (because I am assuming nothing else can make these entities). This is done using PHP’s uniqid function, with a prefix of Phrase.

new PhraseId( uniqid( 'Phrase', true ) )

⚠️ When drafting this post, Ollie and I were caught out for 30 minutes or so when the ID started with a lower case letter, i.e. phrase, which led to an invalid title error from within MediaWiki. ⚠️
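
One way to avoid being caught out by this would be to reject lower case prefixes before an ID is ever generated. The sketch below assumes the problem is only the first character (presumably because MediaWiki capitalises the first letter of page titles by default); sketchNewPhraseIdString is a hypothetical helper and is not part of the commit.

```php
<?php

// Hypothetical helper: generates an ID string in the same way as the
// special page, but rejects prefixes that do not start with an upper
// case ASCII letter.
function sketchNewPhraseIdString( string $prefix = 'Phrase' ): string {
	if ( preg_match( '/^[A-Z]/', $prefix ) !== 1 ) {
		throw new InvalidArgumentException(
			"ID prefix must start with an upper case letter, got '$prefix'"
		);
	}
	// The second argument asks uniqid() to add extra entropy to the ID.
	return uniqid( $prefix, true );
}

$id = sketchNewPhraseIdString();
echo $id, "\n";

try {
	sketchNewPhraseIdString( 'phrase' );
} catch ( InvalidArgumentException $e ) {
	echo $e->getMessage(), "\n";
}
```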

Secondly, this special page does the saving using the generic Wikibase repo EntityStore.

\Wikibase\Repo\WikibaseRepo::getEntityStore()->saveEntity(
	$entity,
	"New phrase created in language " . $entity->getLanguage() . " with content " . $entity->getPhrase(),
	$this->getContext()->getUser(),
	EDIT_NEW
);

If we head to our list of special pages, we can see a new one called New Phrase, and opening it gives us a very basic UI.

At the end of this section, our code is at b17e4659216c3a57cc555fe1c36a48fa1902fb46

Make it work

Of course, if you try to enter some text in each box and save, it doesn’t work, as we have missed some things along the way.

Let’s try and figure out what those are…

No content handler defined

First we have an issue with No content handler being defined for our new phrase entity type.

/wiki/Special:NewPhrase OutOfBoundsException: No content handler defined for entity type phrase
from /var/www/html/w/extensions/Wikibase/repo/includes/Content/EntityContentFactory.php(114)

Looking at this line, I can see that entityHandlerFactoryCallbacks is being checked, and seemingly the phrase entity type has not configured one of these callbacks. This is needed for the connection between MediaWiki and Wikibase, and can be added in our Definition.php file using the Def::CONTENT_HANDLER_FACTORY_CALLBACK key.

// Assumes `use Wikibase\Repo\WikibaseRepo;` at the top of Definition.php.
Def::CONTENT_HANDLER_FACTORY_CALLBACK => function() {
	$services = \MediaWiki\MediaWikiServices::getInstance();
	return new PhraseContentHandler(
		PhraseContent::ID,
		null, // unused
		WikibaseRepo::getEntityContentDataCodec( $services ),
		WikibaseRepo::getEntityConstraintProvider( $services ),
		WikibaseRepo::getValidatorErrorLocalizer( $services ),
		WikibaseRepo::getEntityIdParser( $services ),
		WikibaseRepo::getFieldDefinitionsFactory( $services )
			->getFieldDefinitionsByType( PhraseDocument::TYPE ),
		null
	);
},
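
To make the failure mode a little more concrete, here is a simplified sketch of a registry of factory callbacks keyed by entity type. SketchHandlerRegistry is a hypothetical stand-in; the real lookup lives in Wikibase’s EntityContentFactory and differs in detail.

```php
<?php

// Hypothetical, simplified registry. The real lookup lives in Wikibase's
// EntityContentFactory and differs in detail.
final class SketchHandlerRegistry {

	/** @var array<string, callable> entity type => factory callback */
	private array $callbacks;

	public function __construct( array $callbacks ) {
		$this->callbacks = $callbacks;
	}

	public function getHandlerFor( string $entityType ) {
		if ( !isset( $this->callbacks[$entityType] ) ) {
			throw new OutOfBoundsException(
				"No content handler defined for entity type $entityType"
			);
		}
		// The handler is built on demand by calling the registered callback.
		return ( $this->callbacks[$entityType] )();
	}
}

$registry = new SketchHandlerRegistry( [
	'item' => static function () {
		return 'item handler';
	},
] );

echo $registry->getHandlerFor( 'item' ), "\n";

try {
	$registry->getHandlerFor( 'phrase' );
} catch ( OutOfBoundsException $e ) {
	echo $e->getMessage(), "\n";
}
```

Registering a callback under the phrase key is what makes the second lookup succeed, which is the role the definition entry plays.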

MWContentSerializationException

Next we end up getting told that MediaWiki, or rather the Wikibase storage layer, doesn’t know how to serialize this content.

/wiki/Special:NewPhrase MWContentSerializationException:
from /var/www/html/w/extensions/Wikibase/lib/includes/Store/EntityContentDataCodec.php(154)

Taking a look at the line of code that generates this error, the main dispatching entitySerializer for all Wikibase entities is used, and it doesn’t know how to serialize the phrase entity, as we haven’t yet told it how.

For this, we need another alteration to the entity type definition:

Def::STORAGE_SERIALIZER_FACTORY_CALLBACK => function( \Wikibase\DataModel\Serializers\SerializerFactory $serializerFactory ) {
	return new PhraseSerializer();
},

We also need to define a PhraseSerializer. For now this will simply take the language and phrase, setting them in easy to read keys.

<?php

namespace Wikibase\Repo\Phrase;

class PhraseSerializer implements \Serializers\DispatchableSerializer {

	public function isSerializerFor( $object ) {
		return $object instanceof PhraseDocument;
	}

	public function serialize( $object ) {
		return [
			'language' => $object->getLanguage(),
			'phrase' => $object->getPhrase(),
		];
	}
}
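
To get a feel for what ends up being stored, the following sketch serializes the Hello world! example from earlier and encodes it as JSON, assuming the default JSON content format is in use. SketchPhrase and sketchSerializePhrase are hypothetical stand-ins, so that the snippet runs without Wikibase loaded.

```php
<?php

// Hypothetical stand-in for PhraseDocument, reduced to the two getters
// that the serializer calls.
final class SketchPhrase {

	private string $language;
	private string $phrase;

	public function __construct( string $language, string $phrase ) {
		$this->language = $language;
		$this->phrase = $phrase;
	}

	public function getLanguage(): string {
		return $this->language;
	}

	public function getPhrase(): string {
		return $this->phrase;
	}
}

// Same shape as the serialize() method above, written as a plain function.
function sketchSerializePhrase( SketchPhrase $phrase ): array {
	return [
		'language' => $phrase->getLanguage(),
		'phrase' => $phrase->getPhrase(),
	];
}

$data = sketchSerializePhrase( new SketchPhrase( 'en', 'Hello world!' ) );
echo json_encode( $data ), "\n"; // {"language":"en","phrase":"Hello world!"}
```

Note that neither the entity ID nor the entity type is part of this serialization.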

Diffing the provided types of entities is not supported

/wiki/Special:NewPhrase RuntimeException: Diffing the provided types of entities is not supported
from /var/www/html/w/extensions/Wikibase/lib/packages/wikibase/data-model-services/src/Diff/EntityDiffer.php(64)

Taking a look at the line causing this error, we see a similar pattern to the one we saw above. differStrategies is expected to have a strategy per entity type, and we haven’t added one for phrase, and no default is provided.

So we need to register and create a basic differ, specifically for the phrase entity type.

Def::ENTITY_DIFFER_STRATEGY_BUILDER => static function () {
	return new PhraseDiffer();
},

<?php

namespace Wikibase\Repo\Phrase;

use Diff\DiffOp\DiffOpAdd;
use Diff\DiffOp\DiffOpChange;
use Diff\DiffOp\DiffOpRemove;
use Wikibase\DataModel\Entity\EntityDocument;

class PhraseDiffer implements \Wikibase\DataModel\Services\Diff\EntityDifferStrategy {

	public function canDiffEntityType( $entityType ) {
		return $entityType === PhraseDocument::TYPE;
	}

	public function diffEntities( EntityDocument $from, EntityDocument $to ) {
		$dops = [];

		if ( $from->isEmpty() && !$to->isEmpty() ) {
			$dops['language'] = new DiffOpAdd( $to->getLanguage() );
			$dops['phrase'] = new DiffOpAdd( $to->getPhrase() );
		} elseif ( !$from->isEmpty() && $to->isEmpty() ) {
			$dops['language'] = new DiffOpRemove( $from->getLanguage() );
			$dops['phrase'] = new DiffOpRemove( $from->getPhrase() );
		} elseif ( !$from->isEmpty() && !$to->isEmpty() ) {
			if ( $from->getLanguage() !== $to->getLanguage() ) {
				$dops['language'] = new DiffOpChange( $from->getLanguage(), $to->getLanguage() );
			}
			if ( $from->getPhrase() !== $to->getPhrase() ) {
				$dops['phrase'] = new DiffOpChange( $from->getPhrase(), $to->getPhrase() );
			}
		}

		return new PhraseDiff( $dops );
	}

	public function getConstructionDiff( EntityDocument $entity ) {
		return new PhraseDiff( [
			'language' => new DiffOpAdd( $entity->getLanguage() ),
			'phrase' => new DiffOpAdd( $entity->getPhrase() ),
		] );
	}

	public function getDestructionDiff( EntityDocument $entity ) {
		return new PhraseDiff( [
			'language' => new DiffOpRemove( $entity->getLanguage() ),
			'phrase' => new DiffOpRemove( $entity->getPhrase() ),
		] );
	}

}Code language: PHP (php)

And this Differ in turn makes use of a PhraseDiff object too.

<?php

namespace Wikibase\Repo\Phrase;

use Diff\DiffOp\Diff\Diff;
use Diff\DiffOp\DiffOp;
use Wikibase\DataModel\Services\Diff\EntityDiff;

class PhraseDiff extends EntityDiff {

	public function __construct( array $operations = [] ) {
		parent::__construct( $operations );
	}

	public function getContentDiff() {
		return $this['content'] ?? new Diff( [], true );
	}

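	public function isEmpty(): bool {
		// FIXME: Needs to be fixed, otherwise conflict resolution may lead to unexpected results.
		// PhraseDiffer only sets 'language' and 'phrase' keys, never 'content',
		// so as written this always reports an empty diff.
		return $this->getContentDiff()->isEmpty();
	}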

	public function toArray( callable $valueConverter = null ): array {
		throw new \LogicException( 'toArray() is not implemented' );
	}

}
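
As a library-free illustration of the comparison that the differ performs, the sketch below works on plain arrays and returns plain arrays instead of DiffOp objects. sketchDiffPhrase is a hypothetical helper, and it simplifies things by treating a missing field as the stand-in for an empty entity.

```php
<?php

// Hypothetical, library-free version of the comparison in PhraseDiffer.
// Each operation is a plain array instead of a Diff\DiffOp object, and a
// missing field stands in for an empty entity.
function sketchDiffPhrase( array $from, array $to ): array {
	$ops = [];
	foreach ( [ 'language', 'phrase' ] as $field ) {
		$old = $from[$field] ?? null;
		$new = $to[$field] ?? null;
		if ( $old === $new ) {
			continue; // Unchanged fields produce no operation.
		}
		if ( $old === null ) {
			$ops[$field] = [ 'type' => 'add', 'new' => $new ];
		} elseif ( $new === null ) {
			$ops[$field] = [ 'type' => 'remove', 'old' => $old ];
		} else {
			$ops[$field] = [ 'type' => 'change', 'old' => $old, 'new' => $new ];
		}
	}
	return $ops;
}

$ops = sketchDiffPhrase(
	[ 'language' => 'en', 'phrase' => 'Hello' ],
	[ 'language' => 'en', 'phrase' => 'Hello world!' ]
);
print_r( $ops );
```

Only the phrase field shows up in the result here, since the language is the same on both sides.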

Finally, it saves!

Trying to create the phrase one final time using the special page, we no longer get an error, and instead get redirected to Special:RecentChanges (what we expect on success).

In the list of recent changes we can see the entity (and MediaWiki page) that we have created.

However, don’t expect to be able to view the entity anywhere yet, as we haven’t defined anything to allow that to happen.

  • The Wikibase action API doesn’t know how to look up the entity (get from an ID to a MediaWiki page / entity object)
  • MediaWiki doesn’t know how to deserialize the stored entity (PhraseSerializer above is only one way)
  • MediaWiki doesn’t know how to display the entity, even if it could read it from the database into an object
  • etc…

At the end of this section, our code is at 581a694d5624edc1bf0746e800004da57840784f.

A summary

We now have a simple phrase entity, with a language and phrase value, which can be created, but not viewed.

The next post will likely cover:

  • Viewing the entity in the UI
  • Adding link formatting to MediaWiki for the entity
  • Viewing the entity via the API (wbgetentities)
  • Viewing RDF output for the entity (Special:EntityData)

At this point, most of these will just present errors instead.
