Wikibase Phrase Entity, Viewing

This entry is part 7 of 7 in the series Wikibase Entities

In my previous post, we got to the point of being able to create a new Wikibase Entity. It is stored in the MediaWiki database as a page, but we can’t actually view it via any interface yet.

In this post, we will work through another set of code changes, tackling each issue as we see it arise, until we can see the entity represented in the various places that users might expect.

Viewing the page

The provided entity serialization is neither legacy nor current

When clicking on one of the links on Special:RecentChanges to a phrase page that we have created, we get our first error.

/wiki/Phrase:Phrase66900b01937842.29097733 MWContentSerializationException: The provided entity serialization is neither legacy nor current
from /var/www/html/w/extensions/Wikibase/lib/includes/Store/EntityContentDataCodec.php(253)

The full stack trace is a little large, but you can find it in a pastebin.

This error is very similar to an issue we saw in the creation blog post, but this time the codec class cannot deserialize what we have stored in the database, as we have not registered a deserializer for phrases.

Adding a deserializer to the entity registration file is very simple:
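For illustration, such a registration might look roughly like the following sketch, modelled on how other entity types (such as lexemes) register their callbacks. The file name and the PhraseDeserializer class are assumptions, not the actual implementation:

```php
<?php
// WikibasePhrase.entitytypes.php — hypothetical sketch. Wikibase entity
// type definition files return an array of callbacks keyed by entity type;
// PhraseDeserializer is an assumed class that turns the stored JSON back
// into our Phrase entity object.

return [
	'phrase' => [
		'deserializer-factory-callback' => function ( $deserializerFactory ) {
			return new PhraseDeserializer();
		},
	],
];
```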

Read more

Wikibase Phrase Entity, Creation

This entry is part 6 of 7 in the series Wikibase Entities

Finally, after a long lead-up of discussing what an entity is, looking at some examples of entity extensions, and one extension that chose not to make use of the Wikibase Entity system & EntityDocument, we arrive at the question: what does it take to create a new type of data entity within Wikibase that implements the EntityDocument interface and makes use of the various integrations that have evolved over the past 10+ years?

I slapped together a very rough branch exploring this in 2022, but it’s hard to follow at best, and doesn’t really discuss any of the challenges that crop up along the way. This post, and those following, are the redo, with much more context. And with any luck, it will work mostly as before, as Wikibase hasn’t changed much internally in how entities are handled over the last 2 years.

If you want to follow along, you’ll need a development environment, and for that I would recommend the mwcli walkthrough that I wrote in recent weeks.

Where to start

I have a slight advantage here, as the closest thing that comes to documentation around how to add a new entity type to Wikibase is the documentation of the various fields that make up the entity registration system.

Beyond that, your only way in would likely be to start looking at one of the extensions that already provides an additional entity type, such as WikibaseMediaInfo, and the entity type registration that it makes. But each of these extensions comes with its own complexity to muddle your view.

Read more

EntitySchema, and the entity flip-flop

This entry is part 5 of 7 in the series Wikibase Entities

The EntitySchema extension, previously called WikibaseSchema, has had an interesting life since its initial creation back in early 2019.

The main point this story is intended to highlight is that EntitySchema started off its planned life as an Entity within a Wikibase. As the development team started work on an initial version, it flipped away from an entity. And in continued development, it has slowly inched its way back towards perhaps being an Entity.

Background

As is noted in the first ADR of the extension (which was actually written in 2023), the team initially decided to try and develop the extension entirely separately from Wikibase:

Although Entity Schemas relate to Wikibase entities by name and purpose, the implementation of the EntitySchema extension, at the time of this decision, is completely decoupled from Wikibase, and the concept of Entities that it adds to MediaWiki. Thus, a MediaWiki instance can theoretically operate with only the EntitySchema extension, and without the Wikibase extension installed.

Keeping EntitySchema separate from Wikibase, and the idea of an Entity it provides altogether, was a conscious decision to not marry its implementation to the inherent complexity of Wikibase itself. As well as an attempt to avoid overloading EntitySchema with unnecessary functionality so that its ongoing implementation could be done iteratively and in a more flexible, organic manner, to answer user’s needs as they are brought to us.

0001 Extend Entity Schema to support additional “traits” ADR

In a nutshell, this extension, and the developments and discussions about it over the past years (which are still happening today), is one of the things that led me to recently write a series of blog posts about what I think an “entity” is from my perspective, as well as to look at some other entities and the use of EntityDocument in the codebase.

Project kick-off

Internally within WMDE, the extension started off (having already been planned and discussed for some time) with a series of kick-off meetings in December 2018. The first of these was deemed to have too many open questions, hence the follow-up with a second. Ultimately, a team formed around the creation of the extension, and this started further discussions.

Read more

Wikibase Repository development environment (mwcli)

This entry is part 4 of 7 in the series Wikibase Entities

Back in 2022, while working at Wikimedia Germany, I ran two sessions with people from the Wikibase Stakeholder Group, focused on Ecosystem Enablement.

These sessions were video recorded and documented in quite a lot of detail, but working through the videos would probably be a bit of a drawn-out experience, as they were focused on a workshop setting with participants following along.

  • Session 1, 2022-04-28: Using mwcli, loading extensions, understanding MediaWiki’s general extension mechanism (Video, Overview)
  • Session 2, 2022-05-24: Running your first extension, Wikibase stable interface policy, MediaWiki hooks, building a new API function (Video, Overview)

In this post, I will focus on the core steps required to get a MediaWiki and Wikibase Repository development environment set up in a few minutes with mwcli. It will serve as a basis for some blog posts that I will be writing in the future.

Getting mwcli

If you head to the home page of mwcli, you’ll see a link to an installation guide.

Read more

Lexeme and MediaInfo, implementing EntityDocument

This entry is part 3 of 7 in the series Wikibase Entities

As we continue the journey, looking at Entity and EntityDocument within Wikibase, the next useful things to look at are the third and fourth most widely used (at least within the Wikimedia space) entity types for Wikibase.

Both of these entity types make use of the EntityDocument, with none of the old assumptions baked into the Entity base class that used to exist.

MediaWiki extensions

As these entity types were decoupled from the main body of Wikibase, they were developed as MediaWiki extensions: https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo and https://www.mediawiki.org/wiki/Extension:WikibaseLexeme

This was the easy choice at the time, and probably still makes perfect sense, as Wikibase itself is a MediaWiki extension, and there is already a common pattern of extensions extending extensions. This ultimately saves some work around coding an extension mechanism, though we should remember that the Wikibase codebase ultimately has free choice in how it can be extended.

Read more

Wikibase, from Entity to EntityDocument

This entry is part 2 of 7 in the series Wikibase Entities

The term document has already come up a few times while discussing what a Wikibase entity is, and if that should change (be that in name only, code or structures), including in my first post of this series.

Looking at the very first definition of entity in the DuckDuckGo search that I performed 6 seconds ago, an entity is:

Something that exists as a particular and discrete unit.

The American Heritage® Dictionary of the English Language, 5th Edition

At the most basic level, it’s fairly straightforward to say that a Wikibase doesn’t hold the actual entities (such as a type of tree), rather data about said entities.

And in a nutshell, this data is collected within a document.

Image from “What is the semantic web” by ontotext.com

Quoting a few choice people again, before diving deeper into this topic…

The “entities” in the Wikibase base are not Entities. They are descriptions of entities. The entity is the thing in the world, not the data we have about it, even though colloquially, we don’t make the distinction. But we have separate URIs for the thing and the description in the abstract and for specific renderings.
I think that’s important to mention when discussing what an entity “is”.

Daniel Kinzler in conversation, June 2024

The data model chose to use the term “Entity” for the top-level Thing/class in the hierarchy of the data model. But in reality, a better term would have been “Document” or “Record”. In general, the confusion is often due simply to folks that are more familiar with one of the domains than the other, between OOP Objects and Semantic Web Objects.

Thad Guidry in a comment, June 2024

Read more

Wikibase: What is an entity?

This entry is part 1 of 7 in the series Wikibase Entities

I left the Wikidata and Wikibase teams roughly a year ago, and at the time there were some long and deep discussions going on inside the team trying to define what an entity was, and what should and should not be an entity.

At the recent Hackathon in Tallinn, this topic resurfaced for me, as current and previous members of the Wikidata and Wikibase teams were in attendance, as was I.

I have opinions, others have opinions, and I feel that a short blog post summarizing the currently publicly written details, as well as some of the more on-point things I have heard people say, may help further discussion, or perhaps bring it to some kind of conclusion.

What I actually found when pulling the various written details together is that they mostly describe what I would say is the ideal path forward without rewriting the world (of Wikibase). But it’s taken me a while to sit back, relax, and actually reread all the things that we have written over the years.

Read more

Creating properties with statements using Wikidata Integrator

Wikidata Integrator is a Python library that simplifies data integration with Wikidata (and other Wikibases). It is focused on Wikibase concepts (as opposed to some libraries, which are MediaWiki focused) and has a user-friendly interface.

I’m currently working on a demo Wikibase and decided to bring all of the data into the Wikibase using a Jupyter notebook, and Wikidata Integrator was my library of choice for this task. (Jupyter notebooks are interactive coding environments that allow users to create and share documents containing live code, visualizations, and explanations.)

Along that journey I found the Wikidata Integrator documentation slightly lacking, but I managed to get initial property and item creation working with little effort. However, I couldn’t get properties to be created with statements already on them (a subsequent edit was needed instead).
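For context, whatever library you use, creating a property together with its statements ultimately comes down to a single wbeditentity API call whose data parameter carries labels, a datatype, and claims at once. Here is a sketch of that JSON shape built with only the standard library; the property IDs, label, and value are invented for the example:

```python
import json

def string_statement(property_id: str, value: str) -> dict:
    """Build a Wikibase statement (claim) with a string-valued mainsnak."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }

# Payload for creating a property that already carries a statement,
# to be sent as the `data` parameter of action=wbeditentity&new=property.
payload = {
    "labels": {"en": {"language": "en", "value": "example property"}},
    "datatype": "string",
    "claims": {"P2": [string_statement("P2", "hello")]},
}

print(json.dumps(payload, indent=2))
```

If a library only submits labels and datatype on creation, the claims have to go in a second edit, which matches the behaviour I was seeing.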

Read more

Wikibase and reconciliation

Over the years I have created a few little side projects, worked on other folks’ Wikibases, and of course Wikidata. And the one thing that I still wish would work better out of the box is reconciliation.

What is reconciliation

In the context of Wikibase, reconciliation refers to the process of matching or aligning external data sources with items in a Wikibase instance. It involves comparing the data from external sources with the existing data in Wikibase to identify potential matches or associations.

The reconciliation process typically follows these steps:

  1. Data Source Identification: Identify and select the external data sources that you want to reconcile with your Wikibase instance. These sources can include databases, spreadsheets, APIs, or other structured datasets.
  2. Data Comparison: Compare the data from the external sources with the existing data in your Wikibase. This step involves matching the relevant attributes or properties of the external data with the corresponding properties in Wikibase.
  3. Record Matching: Determine the level of similarity or matching criteria to identify potential matches between the external data and items in Wikibase. This can include exact matches, fuzzy matching, or other techniques based on specific properties or identifiers.
  4. Reconciliation Workflow: Develop a workflow or set of rules to reconcile the identified potential matches. This may involve manual review and confirmation or automated processes to validate the matches based on predefined criteria.
  5. Data Integration: Once the matches are confirmed, integrate the reconciled data from the external sources into your Wikibase instance. This may include creating new items, updating existing items, or adding additional statements or qualifiers to enrich the data.

Reconciliation plays a crucial role in data integration, data quality enhancement, and ensuring consistency between external data sources and the data stored in Wikibase. It enables users to leverage external data while maintaining control over data accuracy, completeness, and alignment with their knowledge base.
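As a toy illustration of the data comparison and record matching steps above, here is a minimal fuzzy-matching sketch using only the standard library. The records, item IDs, labels, and threshold are all invented for the example:

```python
from difflib import SequenceMatcher

# Invented sample data: external records to reconcile, and the
# labels of items already in a Wikibase instance.
external_records = ["Douglas Adams", "Marie Curei", "Ada Lovelace"]
wikibase_labels = {"Q42": "Douglas Adams", "Q7186": "Marie Curie"}

def best_match(record, labels, threshold=0.8):
    """Return (item_id, score) for the closest label above threshold, else None."""
    scored = [
        (item_id, SequenceMatcher(None, record.lower(), label.lower()).ratio())
        for item_id, label in labels.items()
    ]
    item_id, score = max(scored, key=lambda pair: pair[1])
    return (item_id, score) if score >= threshold else None

matches = {record: best_match(record, wikibase_labels) for record in external_records}
# "Douglas Adams" matches Q42 exactly; "Marie Curei" (a typo) still
# matches Q7186 via fuzzy matching; "Ada Lovelace" finds no match and
# would be a candidate for creating a new item.
print(matches)
```

Real reconciliation tools layer identifiers, type constraints, and property-based scoring on top of this kind of string similarity, but the core match-or-create decision is the same.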

Existing reconciliation

One of my favourite places to reconcile data for Wikidata is by using OpenRefine. I have two previous posts looking at my first time using it, and a follow-up, both of which take a look at the reconciliation interface (You can also read the docs).

Read more

What is Wikibase Cloud (according to ChatGPT)

In this post, I’ll be asking ChatGPT a bunch of questions about Wikibase Cloud, mainly to see how accurate it manages to be, and how helpful the information is.

Messages to ChatGPT will be in bold, things that are wrong in red, and things that are right in green.

Firstly, “What is ChatGPT?”

But I needed to follow up, asking “Can you answer again but in the third person?”

ChatGPT is a large language model developed by OpenAI that has been trained on a vast corpus of text data. The acronym GPT stands for “Generative Pretrained Transformer,” which refers to the model’s ability to generate natural language responses to text-based prompts. ChatGPT uses a transformer neural network architecture that enables it to process and comprehend long sequences of text. Its training data is sourced from a diverse range of materials, including books, articles, and websites, and covers a broad spectrum of topics. ChatGPT has been designed to assist users with natural language processing tasks such as answering questions, summarizing text, and generating human-like responses in conversation.

ChatGPT on 4 March 2023

But when asking about Wikibase Cloud it gets a little confused and answers incorrectly…

What is Wikibase Cloud?

Wikibase Cloud is a hosting service provided by Wikimedia Cloud Services that allows users to create and manage their own Wikibase instances in the cloud. …

ChatGPT on 4 March 2023
The full ChatGPT response for “What is Wikibase cloud”

Wikibase Cloud is a hosting service provided by Wikimedia Deutschland, so the reference to Wikimedia Cloud Services here is incorrect.

So naturally, I need to ask “Are you sure it is provided by Wikimedia Cloud Services?”

Read more