Reflection on filling a new Wikidata item

December 4, 2021 By addshore

A few days ago I watched a Twitch stream by Molly / GorillaWarfare in which they created the English Wikipedia article for Louis W. Roberts. I decided to follow along and populate the matching Wikidata item (Q109662645) with as much information as I could from the same references that were being found for the Wikipedia article.

Along the way, I remembered some of the quirks of the manual editing experience on Wikidata and noted some other things that might interest folks more generally.

This is a write-up of those thoughts.

Louis W. Roberts

Louis featured on a list of African-Americans in Boston whose articles were being created or improved on English Wikipedia. This list was generated from the content of a book called “African-Americans in Boston: more than 350 years” by Robert C. Hayden, which can be found on archive.org. Louis is specifically noted on page 149.

This provided some starting context and some good details to match against other references. From there, multiple other references expanding on what was known about Louis were found using search engines, the Wikipedia Library and more, all of which are now included on the Wikipedia and Wikidata pages.

Wikipedia article formation

Molly started creating the article in their userspace, with the first version containing a single line of content. This was gradually expanded while looking through the references, and after the first hour there was enough referenced content to warrant a move to the main namespace. It made sense to keep the article in userspace while it was being worked on, so that unsuspecting readers would not stumble on a work-in-progress article.

After the article landed in the main namespace, categories were added using the HotCat gadget. The article continued to expand, gaining more references and categories. A person infobox was added using data already found and stated in the article. Categories were sorted using another user script, and content continued to be added.

A few days after Molly had finished editing, another user came along and added the Short description template, copying the description from the Wikidata item.

Screenshot of the Louis W. Roberts article after creation by Molly

Wikidata item population

While following the stream, I tried to add roughly the same information to the Wikidata item that was appearing in the Wikipedia article. Molly had already added the first few basic statements to the item, such as instance of, sex or gender, given name, family name, occupation and employer, and the single reference to African-Americans in Boston page 149 already existed on one of the statements.

My first changes added the initial reference to two more existing statements and added a new alias matching the Wikipedia article title. I then noticed that the existing employer statement could be more specific per one of the newly found references, so I altered the value and added a new reference.

Date of birth and date of death came next, and I immediately followed up these changes by adding new statements with preferred rank for more specific dates that I found in another reference. A referenced place of birth followed.

Many education and employer related changes ensued, followed by the realization that one of the references I had been copying between statements with a gadget was incorrect and not something I wanted to keep, so I removed it.

More content was added, followed by another realization that some data was incorrect (an end time was before its start time). I’ll skip over the rest, but many more statements were added and tweaks made.

Comparison

I have a feeling that the creation of the Wikipedia article and Wikidata item would probably have been much more interesting to watch than to read after the fact with a bunch of links to diffs. But you’ll have to cope with my poorly written adventure for context.

Edits & Bytes: The Wikipedia article was fairly complete after around 32 edits, totalling 8k bytes of text storage for the final revision. The Wikidata item was fairly complete after around 114 edits, totalling 71k bytes of text storage for the final revision. That’s 3-4x the number of edits of the Wikipedia article, and roughly 9x the number of bytes in the final revision.
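As a quick sanity check on those ratios, here is a minimal Python sketch that simply divides the rounded figures quoted above. The variable and function names are illustrative, and because the byte counts are only given to the nearest thousand, the results are approximate.

```python
# Rounded figures from the comparison above; byte counts are approximate.
wikipedia = {"edits": 32, "bytes": 8_000}
wikidata = {"edits": 114, "bytes": 71_000}


def ratio(metric: str) -> float:
    """How many times larger the Wikidata figure is than the Wikipedia one."""
    return wikidata[metric] / wikipedia[metric]


edit_ratio = ratio("edits")  # 114 / 32 = 3.5625
byte_ratio = ratio("bytes")  # 71_000 / 8_000 = 8.875

print(f"edits: {edit_ratio:.1f}x, bytes: {byte_ratio:.1f}x")
```

So the edit count lands between 3x and 4x, while the final revision comes out a little under 9x larger.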

References: The Wikipedia article includes a reference list of 8 sources, and these are referenced 34 or so times in the article text. References need only be defined once using a <ref> tag, and can then be referred to by name for subsequent uses. The Wikidata item contains 51 distinct references (no reuse is available) making use of 7 sources on 27 statements.
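To make the “no reuse” point concrete, here is a minimal Python sketch over a heavily simplified version of the JSON shape Wikidata uses for an item’s claims: each statement carries its own list of reference blocks, so a source cited on several statements is stored once per statement. P248 (stated in) and P854 (reference URL) are the usual properties for naming a source; the placeholder source values, the helper names and the trimmed-down snak structure are assumptions for illustration, not data from this item.

```python
from collections import Counter
from typing import Dict, List, Optional


# Hypothetical helpers building trimmed-down reference blocks.
# Real Wikidata JSON snaks also carry snaktype, property, datatype, a hash, etc.
def stated_in(item_id: str) -> dict:
    return {"snaks": {"P248": [{"datavalue": {"value": {"id": item_id}}}]}}


def reference_url(url: str) -> dict:
    return {"snaks": {"P854": [{"datavalue": {"value": url}}]}}


# Placeholder claims: property ID -> statements, each with its own references.
claims: Dict[str, List[dict]] = {
    "P569": [{"references": [stated_in("Q_BOOK"), reference_url("https://example.org/a")]}],  # date of birth
    "P570": [{"references": [stated_in("Q_BOOK")]}],  # date of death
    "P108": [{"references": [reference_url("https://example.org/a")]}],  # employer
}


def reference_source(ref: dict) -> Optional[str]:
    """Identify a reference block's source by 'stated in', else 'reference URL'."""
    snaks = ref.get("snaks", {})
    if "P248" in snaks:
        return snaks["P248"][0]["datavalue"]["value"]["id"]
    if "P854" in snaks:
        return snaks["P854"][0]["datavalue"]["value"]
    return None


statements = [s for group in claims.values() for s in group]
reference_blocks = [r for s in statements for r in s.get("references", [])]
source_counts = Counter(src for src in map(reference_source, reference_blocks) if src)

print(len(statements), "statements,", len(reference_blocks), "reference blocks,",
      len(source_counts), "sources")
print(source_counts)  # each source appears once per statement that cites it
```

Here two sources end up as four separate reference blocks, which is the same effect that turns 7 sources into 51 references on the real item.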

Wikidata editing quirks

The DuplicateReferences gadget provides copy links next to existing references. This helps avoid retyping things when a reference already exists that is the same as, or very similar to, what you want to add. There is also functionality to drag references from one statement to another (possibly provided by the same gadget). As noted above, the gadget sometimes doesn’t quite hit the spot when you want a similar but slightly different reference: you can end up copying things you don’t want and need to remove them later.

As a long-time Wikidata editor, I know when to use the rank feature and what rank means. A couple of my edits set a preferred rank, leaving some less specific but still referenced values at normal rank. However, the interface for making this change is not very user friendly, and there is no real guidance in the editing flow covering how and when to use it.
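Rank matters because consumers usually read only the “best” statements for a property: the preferred ones if any exist, otherwise the normal ones, and never the deprecated ones (this is, for example, what the truthy wdt: triples in the Wikidata Query Service expose). The Python sketch below illustrates that rule on simplified statements; the flat "value" field and the placeholder values are assumptions, since real statements keep their value under mainsnak and datavalue.

```python
from typing import Any, Dict, List

Statement = Dict[str, Any]


def best_statements(statements: List[Statement]) -> List[Statement]:
    """Preferred statements if any exist, otherwise normal ones; deprecated are never returned."""
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


# Simplified date of birth statements (placeholder values, not the item's data).
date_of_birth = [
    {"rank": "normal", "value": "less specific date (year precision)"},
    {"rank": "preferred", "value": "more specific date (day precision)"},
]

for statement in best_statements(date_of_birth):
    print(statement["value"])
```

Without the preferred rank, both dates would be returned as equally valid, which is why setting it matters even when the less specific value stays on the item with its reference.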

When creating some statements that needed to point to Items that did not already exist (mother and father), I had to break out of my editing flow and navigate to Special:NewItem to create the entities to point to. A nicer experience would be the ability to create such Items on the fly while creating a statement.

As the item got longer and longer, it got harder to copy references between statements, and also harder to keep track of the statements I had already added so as not to add them again. When complete, the Wikidata item was 4-6 whole screen heights on my monitor, meaning lots of scrolling back and forth.

Some information ends up duplicated. For example, educated at statements often contain a qualifier for academic degree, but there can also be a top-level academic degree statement. The more I thought about this, the more I thought that editing Wikidata one level higher might be nice, with some other community-maintained definitions mapping higher-level data input to actual statement changes.

Wikidata data in use

Some Wikipedia projects, such as eo.wikipedia.org, have an extension called ArticlePlaceholder enabled. This provides a special page on the Wikipedia, reachable from search results, that shows some information about a given concept along with a prompt to create an article for the topic.

Wikidata and other smaller Wikipedia projects can also link to the Reasonator tool created by Magnus Manske that can give a summary of a topic, and is viewable in multiple languages. Other elements, such as timelines, are also generated in this tool.

Infoboxes are used across Wikipedia sites. We saw above that the current English Wikipedia article uses a manual infobox created by Molly using Infobox person. There is also a Wikidata-powered version of this infobox, called Infobox person/Wikidata.

According to the template transclusion count tool, this Wikidata-powered infobox is currently used only 3.7k times on English Wikipedia.

At the time of writing this, you can find an example of this infobox on the Aelbert Cuyp article. But you can also see a previewed rendering of the Wikidata infobox for the Louis Roberts article in this tweet.

Final thoughts

I really enjoyed watching Molly’s stream editing Wikipedia, and I’ll be sure to join again. It’s also a great excuse to relax and do some Wikidata editing.

I’m not really sure how many folks edit individual Wikidata items in this way anymore. The largest numbers of Wikidata edits come from other bulk editing interfaces, or from bots, but that doesn’t mean that high-quality individual editing should not be possible.

Using some napkin maths, I’d say that there are generally 400-900k edits per day. Roughly 60k edits per day come from the Wikidata UI (so ~10%), and ~5k edits a day come from changes on client sites such as Wikipedia (so ~1%).
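The percentages come from dividing each source of edits by the high and low estimates of the daily total. A minimal Python sketch of that napkin maths, using only the rounded figures above (the constant and function names are illustrative):

```python
from typing import Tuple

# Rounded daily figures from the paragraph above.
TOTAL_LOW, TOTAL_HIGH = 400_000, 900_000
UI_EDITS = 60_000     # edits made through the Wikidata UI
CLIENT_EDITS = 5_000  # edits made from client sites such as Wikipedia


def share_range(part: int) -> Tuple[float, float]:
    """Percentage of all edits: against the high total (lower bound) and the low total (upper bound)."""
    return 100 * part / TOTAL_HIGH, 100 * part / TOTAL_LOW


ui_low, ui_high = share_range(UI_EDITS)
client_low, client_high = share_range(CLIENT_EDITS)
print(f"Wikidata UI: {ui_low:.1f}%-{ui_high:.1f}%")
print(f"Client sites: {client_low:.2f}%-{client_high:.2f}%")
```

That gives roughly 6.7-15% for the UI and 0.56-1.25% for client sites, so ~10% and ~1% are reasonable single-number summaries of those ranges.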

It’d be really nice to connect the content creation workflows a bit more. The research done when writing either a Wikipedia article or a Wikidata item is ultimately the work that we want to be able to share easily between projects. If, when editing Wikipedia, you could define facts/statements as you went and have them included in Wikidata, I imagine we would reduce effort everywhere and increase reuse of this research across projects.