EntitySchema, and the entity flip-flop
The EntitySchema extension, previously called WikibaseSchema, has had an interesting life since its initial creation back in early 2019.
The main point this story is intended to highlight is that EntitySchema started off its planned life as an Entity within a Wikibase. As the development team started work on an initial version, it flipped away from an entity. And in continued development, it has slowly inched its way back towards perhaps being an Entity.

Background
As is noted in the first ADR of the extension (which was actually written in 2023), the team initially decided to try to develop the extension entirely separately from Wikibase:
Although Entity Schemas relate to Wikibase entities by name and purpose, the implementation of the EntitySchema extension, at the time of this decision, is completely decoupled from Wikibase, and the concept of Entities that it adds to MediaWiki. Thus, a MediaWiki instance can theoretically operate with only the EntitySchema extension, and without the Wikibase extension installed.
Keeping EntitySchema separate from Wikibase, and the idea of an Entity it provides altogether, was a conscious decision to not marry its implementation to the inherent complexity of Wikibase itself. As well as an attempt to avoid overloading EntitySchema with unnecessary functionality so that its ongoing implementation could be done iteratively and in a more flexible, organic manner, to answer user’s needs as they are brought to us.
0001 Extend Entity Schema to support additional “traits” ADR
In a nutshell, this extension, and the developments and discussions about it over the past years (which are still happening today), is one of the things that led me to recently write a series of blog posts about what I think an “entity” is, as well as looking at some other entities and the use of EntityDocument in the codebase.
Project kick-off
Internally within WMDE, the extension started off (having already been planned and discussed for some time) with a series of kick-off meetings in December 2018. The first was deemed to have left too many open questions, hence a second follow-up meeting. Ultimately, a team formed around the creation of the extension and this started further discussions.
I feel it is worth noting that during these years at WMDE, “The Journey Model” was being used, which is a modification of the Spotify model for making agile work at scale. A key part of this, which likely ties into some of the interesting presentation and development along this journey, is the desire that a team working on a feature would “not last more than 1Q(uarter) to avoid long running teams”. This short-running team would be called a hike.
As I remember it, Product had presented a high level overview of a problem:
- enable humans and machines to find items that do not fit a certain shape in order to find mistakes and omissions in our data
- increase the confidence in our data
As well as some key usage scenarios for a first version:
- I’d like to find existing schemas that are relevant to me
- I’d like to understand what an existing schema does
- I’d like to discuss an existing schema and form modelling consensus
- I’d like to adapt an existing schema
- I’d like to be able to store a new schema
- I’d like to be able to test a set of entities against an existing schema
And some vision into the future:
- First: Allow storing of schemas
- Next: Allow storing of explanatory text and categorizing schemas
- Later: Allow checking the current entity against a schema
It was clear to see, through the mockups of what this thing might look like, that Product’s intention back then was for this to be an Entity (whatever that meant in the Wikibase of 2018). The page mockup explicitly references a “termbox”, something that to date only appears on Entity pages.


Even ahead of this kick-off meeting, other elements of discussion certainly pointed toward a Schema being an Entity, such as “schemas will be identified by a sequential number, prefixed by the letter “O” (wd:O123)”.
But this strongly ties back to what you think an entity is, of course.
Scoping
After the kick-off, discussions within the implementing team led to the conclusion that the currently scoped work could be carried out entirely separately from Wikibase. I don’t believe this was documented publicly anywhere, but it wouldn’t surprise me if there is some documentation hidden in an internal-only WMDE Google Doc.
Development
Thanks to the ever open records that are contained within the Wikimedia Phabricator instance, Gerrit UI, and git repos (mirrored to GitHub), we can get a pretty good idea of how the early development happened.
You can find all Phabricator tasks in the order that they were created, the Gerrit code reviews that were happening, and all commits that were ultimately made to the extension, although some of these links may need tweaking as the number of pages grows!
2019 hike
I may have missed things in the below summary, but I think it gives a rather nice condensed view on how extension development may work, and also how EntitySchema evolved in the past.
Roughly speaking, and glossing over small unimportant patches, development looked something like this:
- 2019 Jan:
  - Make a git repo
  - Make a boilerplate MediaWiki extension
  - Add a MediaWiki content type and namespace to hold the schemas
  - Add a form of labels, descriptions and aliases to the content
  - Add an incremental ID generator, and use them as MediaWiki titles
  - Check some permissions while using Special:NewSchema
  - Make use of Fingerprint, Serialization and Deserialization from Wikibase
    - Ultimately this starts storing the labels, descriptions and aliases the same way within JSON as is done in Wikibase, but does not make EntitySchema an entity.
  - Disable use of the MediaWiki provided edit API (and followup)
- 2019 Feb:
  - Don’t show “create” tab on non-existing schema pages
  - Refactorings to UseCases and Domain\Model
  - Add SchemaDiffer
  - Introduce a lighter way to write schemas internally (and use it)
  - Start making use of presentation objects
  - Drop dependencies on Wikibase
    - Labels, descriptions, and aliases are no longer making use of Fingerprint from Wikibase
  - Add Schema patcher
  - Implement undo
  - Add created and edited schemas to user watchlist according to settings
  - Add special page to get text of the schema
- 2019 March:
  - Remove duplicate aliases in SetSchemaLabelDescriptionAliases
  - Implement editing schema in multiple languages
  - Set display title from label and schema ID
  - Add link to check entities against schema
  - Report edit conflicts when submitting schema text (and badges)
  - Validate length of label, description, aliases
  - Translatable edit summaries
  - Make Schema namespace immovable
  - Check for internal edit conflicts in undo/restore
- 2019 April:
  - Make edit summaries specific for label, description or aliases
  - Improve search index test for Schemas
  - Grand rename from WikibaseSchema to EntitySchema
So a working extension was created and released to the community in a single quarter!
Within the list of things that the team had to work through above, you can see many that already existed, one way or another, within Wikibase, and that Wikibase either forces you to use or optionally provides for you to use.
These include edit conflict handling, ID generation, undos, restores and patches, translatable edit summaries (as well as basic out-of-the-box edit summaries), search index integration, figuring out how to configure MediaWiki content correctly (not directly editable, immovable namespaces, etc.), special pages for basic interactions, diffing, and storage.
This list, and the first iteration of development above, will become more relevant as I continue this series of blog posts, so watch out for what is next!
The intermediate
Between 2019 and 2023, a series of small pokes, prods, adjustments and minor updates happened to the extension.

2023 focus
In 2023, a second round of work started on the extension. This was tracked in a series of milestones (M1-M5) on Phabricator, with the overall goals of making EntitySchemas linkable from statements on Entities, having them appear as formatted text instead of IDs in many MediaWiki interfaces, and having them use a standard termbox (as is done on Items and Properties).
In summary:
- M1: Technical preparation. Updating dependencies and development docs, first prototypes for hook containers and dependency injection, etc.
- M2: Linking to EntitySchemas in statements. EntitySchemas should now be a valid datatype for Statements, including a sensible RDF export and being queryable in the WDQS.
- M3: EntitySchemas shown as labels instead of IDs. EntitySchemas show up as their label (with fallbacks) in all the relevant places: watchlists, recent changes, statement values, listings, and when used with Lua.
- M4: EntitySchema Termbox standardised (seemingly not done, as I see no tasks in M4)
- M5: Project closure
And this is the time period that saw a series of ADRs start to be written about how the extension was going to continue being developed.
- 1 Extend Entity Schema to support additional “traits”
- 2 Use Cypress for Browser Testing
- 3 The wiring for creating a new EntitySchema Datatype will be in the EntitySchema extension
Ultimately these overall goals, I would argue, come for free from Wikibase and being an Entity, as did many of the code patches that needed to be implemented in the 2019 effort. And this is again something I hope to explore in future posts.
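To make the M3 goal of showing a label “with fallbacks” a little more concrete, here is a minimal sketch of the idea in plain PHP. It is not code from EntitySchema or Wikibase: the function name, the fallback chain and the example labels are all made up for illustration, and the real language fallback handling in Wikibase is considerably richer than this.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper (not part of EntitySchema or Wikibase): choose the text
 * to display for a schema by walking a language fallback chain, and fall
 * back to the bare ID when no label is available.
 *
 * @param string $id Schema ID, e.g. "E10"
 * @param array<string,string> $labels Language code => label
 * @param string[] $fallbackChain Language codes, most preferred first
 */
function displayTextForSchema( string $id, array $labels, array $fallbackChain ): string {
	foreach ( $fallbackChain as $languageCode ) {
		// Use the first language in the chain that has a non-empty label.
		if ( isset( $labels[$languageCode] ) && $labels[$languageCode] !== '' ) {
			return $labels[$languageCode];
		}
	}
	// Nothing matched, so the ID is the last resort.
	return $id;
}

// Illustrative data only.
$labels = [ 'en' => 'human', 'de' => 'Mensch' ];

echo displayTextForSchema( 'E10', $labels, [ 'de-at', 'de', 'en' ] ), "\n"; // Mensch
echo displayTextForSchema( 'E10', [], [ 'de-at', 'de', 'en' ] ), "\n"; // E10
```

Even this toy version has to be applied consistently in watchlists, recent changes, statement values and so on, which is the kind of baseline behaviour that entities already get from Wikibase.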
Complexity of Wikibase
ADR1 does reference the “inherent complexity of Wikibase” in its introduction, but I’ll refer back to some quotes from a prior post here.
Many lessons have been learnt throughout their current ~11 year lifespan, and reimplementing the idea of human-readable text for an ID from scratch will lead to things being missed, and years down the line ultimately (and hopefully) ending up in the same place, just now with 2 systems to maintain.
addshore.com – Lexeme and MediaInfo, implementing EntityDocument – June 2024
The quote above is specifically talking about labels, descriptions and aliases, how they are handled in Wikibase and mapped to MediaWiki, and in particular how they are stored and exposed to users.
Roughly, what I see reflecting on the ~5 year lifetime of the EntitySchema extension is exactly this.
Something minimal was implemented, which met neither the requirements that Product was already aware of, nor the feature level that Wikibase and Entities already provided across the board. And many years later M3 is finally delivering this baseline functionality to the extension.
I think part of the attraction of rewrites is that it saves you the trouble of understanding the old system. But in my experience, that just means that you’ll repeat the mistakes because you didn’t take the time to learn from the past.
Wise-ish German man
I don’t think it’s necessarily only mistakes that end up being repeated, but also core functionality that ends up being missed because it is assumed by some, or not understood by others, and seen as unnecessary complexity.
EntitySchema is now kind of an Entity?
Now to talk about the main reason that I wrote this post at all…
It is the “Support additional types in wbsearchentities” Gerrit change that was merged in the past weeks.
In a nutshell, this patch adds EntitySchemas, which are not an entity in code, to the wbsearchentities action API module of Wikibase so that they can be found by their ID.

So, according to wbsearchentities, EntitySchemas are now an Entity: they have an entityId, and also labels, descriptions and aliases that share at least some commonality with all other entities.
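As a rough illustration of what such a lookup might look like from the outside, here is a small PHP sketch that only builds a request URL for the wbsearchentities module. The function name is hypothetical, and the `entity-schema` value for the `type` parameter is my assumption about how the new type is exposed, so check the API help of your wiki before relying on it.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper: build a wbsearchentities URL that searches for
 * EntitySchemas. The 'entity-schema' type value is an assumption.
 */
function buildSchemaSearchUrl( string $apiBase, string $search, string $language ): string {
	$params = [
		'action' => 'wbsearchentities',
		'search' => $search,
		'language' => $language,
		'type' => 'entity-schema', // assumed type identifier
		'format' => 'json',
	];
	// Pass the separator explicitly so the result does not depend on ini settings.
	return $apiBase . '?' . http_build_query( $params, '', '&' );
}

echo buildSchemaSearchUrl( 'https://www.wikidata.org/w/api.php', 'E10', 'en' ), "\n";
```

The sketch stops at building the URL on purpose; actually sending the request and decoding the JSON response is left out.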
Reasoning & Approach
This change relates to the M2 milestone in the last batch of work done on the extension, “Linking to EntitySchemas in statements”.
What I believe happened is the team implemented their own “expert” (a UI element in Wikibase that is used for editing a particular type of data, that is normally the target value of a statement).
But along part of that journey, likely to save time, they decided to make use of the existing wbsearchentities API to return the results for the expert to use. Some other modifications were also needed to Wikibase to enable this “non entity id” value to be used as a statement value.
This has ultimately led to a working feature, but I’d argue it comes at a rather confusing cost.

Implications
EntitySchema is now in a weird middle ground. As time progresses, it slowly looks more and more like an entity, but it still doesn’t quack like one, and when looking under the surface, the complexity required to maintain this second not-an-entity system alongside the existing entity system is going up and up.
Take a look specifically at the way EntitySchema now hooks into the Wikibase entity search. The wiring for search across all entities used to look like this:
'WikibaseRepo.EntitySearchHelperCallbacks' => function ( MediaWikiServices $services ): array {
	return WikibaseRepo::getEntityTypeDefinitions( $services )
		->get( EntityTypeDefinitions::ENTITY_SEARCH_CALLBACK );
},
Ultimately this looks at all registered entities, and uses their defined ENTITY_SEARCH_CALLBACK to add things to the Wikibase entity search results.
Adding EntitySchema to this search, as it is not an entity and can thus not make use of entity registration, has introduced another hook to this point for things that are not entities to use.
'WikibaseRepo.EntitySearchHelperCallbacks' => function ( MediaWikiServices $services ): array {
	$callbacks = WikibaseRepo::getEntityTypeDefinitions( $services )
		->get( EntityTypeDefinitions::ENTITY_SEARCH_CALLBACK );
	$services->getHookContainer()->run( 'WikibaseRepoEntitySearchHelperCallbacks', [ &$callbacks ] );
	return $callbacks;
},
It may look small, but this is ultimately the start of a second entity registration system.
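To show why I call this a second registration system, here is a self-contained sketch of the pattern in plain PHP. Only the hook name comes from the snippet above. `TinyHookContainer` is a stand-in that I made up and is not MediaWiki’s HookContainer, and the callbacks and their results are purely illustrative.

```php
<?php
declare( strict_types=1 );

/** Minimal stand-in for a hook container; NOT MediaWiki's HookContainer. */
final class TinyHookContainer {
	/** @var array<string,callable[]> */
	private array $handlers = [];

	public function register( string $hook, callable $handler ): void {
		$this->handlers[$hook][] = $handler;
	}

	public function run( string $hook, array $args ): void {
		foreach ( $this->handlers[$hook] ?? [] as $handler ) {
			// References inside $args survive unpacking, so handlers
			// can modify the caller's variables.
			$handler( ...$args );
		}
	}
}

// First system: callbacks that come from registered entity types (illustrative).
$entityTypeCallbacks = [
	'item' => static fn ( string $text ): array => [ "item result for $text" ],
	'property' => static fn ( string $text ): array => [ "property result for $text" ],
];

// Second system: something that is not an entity appends itself via a hook.
$hooks = new TinyHookContainer();
$hooks->register(
	'WikibaseRepoEntitySearchHelperCallbacks',
	static function ( array &$callbacks ): void {
		$callbacks['entity-schema'] = static fn ( string $text ): array => [ "schema result for $text" ];
	}
);

function getSearchCallbacks( array $fromEntityTypes, TinyHookContainer $hooks ): array {
	$callbacks = $fromEntityTypes;
	$hooks->run( 'WikibaseRepoEntitySearchHelperCallbacks', [ &$callbacks ] );
	return $callbacks;
}

$callbacks = getSearchCallbacks( $entityTypeCallbacks, $hooks );
echo implode( ', ', array_keys( $callbacks ) ), "\n"; // item, property, entity-schema
```

The search code now has two sources of “types” to reason about: those that went through entity registration, and those that only arrived through the hook.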
More explanation is now included alongside some parts of the Wikibase code, explaining this complexity too.
/**
 * @internal
 * @return string[] List of entity type identifiers for search.
 * This includes all the {@link self::getEnabledEntityTypes() enabled entity types},
 * and potentially additional types that are not registered with Wikibase’s entity registration yet.
 * Such “types” must be used with caution, as they may not support anything other than search.
 */
public static function getEnabledEntityTypesForSearch( ContainerInterface $services = null ): array {
	return ( $services ?: MediaWikiServices::getInstance() )
		->get( 'WikibaseRepo.EnabledEntityTypesForSearch' );
}
Wording in many places along this search path just no longer makes sense in terms of what an Entity is known to be in code within the Wikibase extension.
What if?
My general hope is that as the team comes to need more and more things that are provided by Wikibase Entities “for free” via existing interfaces, or as the team needs EntitySchemas to exist within the Wikibase ecosystem, rather than next to it (such as adding to the query service), EntitySchema will eventually end up an Entity.
Of course there is work to be done around Entity registration, and tidying up the legacy that has existed for over 10 years at this point.
I highlighted many of these issues some years ago in a badly named branch to Wikibase where I added a new Entity type. Ultimately, this will be the branch that I will rewrite in my next few blog posts, stepping through issues the code changes identify.
Hi Adam!
I notice that you push emphasis in some of your posts to quantify or subclassify “what is a Entity, or what should be an Entity, or is this going to be/or should be an Entity?” in the Wikipedia/Wikidata context. For the ecosystem, what is an Entity has not changed over its lifetime since it was introduced into the Wikibase extension. Hear me out a bit, if I’ve ruffled feathers just now. There is a difference between “giving a thing an identifier” and “this thing is an Entity (or Concept)… and has an identifier”. And I see that you sometimes skip over that most important part of “identified things / identified documents” in the Wikibase ecosystem. Indeed EntitySchemas need and have an identifier. BUT. Just because EntitySchemas have an identifier and their internal object model reuses the Entity model (really it’s actually a Document model, remember?!?!) that does not automatically classify or make them an entity. Multiple kinds of document objects need an identifier. (remember the last post and my mention of Document?). In the case of “a Document that describes kinds of Entities” (an EntitySchema), those documents also need an identifier like E32. Which really should have been ES32 in my opinion to avoid confusing folks that they are true Entities when they really are not. Instead EntitySchemas are Documents that describe kinds of Entities or the shape of Entities, or simply “traits of an Entity”.
The important point to take away here from my opinion is that just because internally an identifier is given to some Document object in the ecosystem, that identifier DOES NOT automatically assert or bring along an Entity classification for the Document or described Thing.
My general hope is that you really mentally replace the high level Entity construct that is in Wikibase with that of a Document, as previously suggested. Then things make much more sense and we also don’t further confuse new developers coming in. My further hope is that you PUSH THE TEAM to “rename the damn thing” to Document to avoid further confusion across namespaces and domains so that in the future, AI doesn’t also hallucinate with the Wikibase document model at its core that has many kinds of Document objects like Entities, EntitySchemas, etc.
I wrote a very nice reply to this and it failed to save, and my second attempt doesn’t feel as good, but I think I hit on the major points again here…
In a nutshell, I totally agree with everything you have written here.
This series of posts focusing on Wikibase ecosystem / Wikibase entities is trying to cover most of the journey around Wikibase entities, how they started, where we are now (with the introduction of Document in the code base https://addshore.com/2024/06/wikibase-from-entity-to-entitydocument/), and where we should probably go, both in terms of code and more abstract ideas.
https://en.wikipedia.org/wiki/Dissoi_logoi
Two quotes I very much like from your comment are:
While writing content about Wikibase, it is hard not to refer to these things as entities, or Wikibase entities, as both in code and interfaces to date this is still what they are called.
+1 to this, and this aligns with a previous post where I say it needs a new name, but I don’t have a particularly strong opinion on what to call it, though Document could be a strong candidate.
The main thing the team has to commit to in order to push forward in this area, and make their and everyone else’s lives easier is to actually start working on the central Wikibase codebase again in a large way, rewrite “entity registration” as it is currently implemented, and simultaneously take into account everything that has been learned through adventures such as EntitySchema (highlighted in the blog post), and other Wikibase entities that have been added along the way on top of the original Item and Property.
With luck, that is around the corner.