Creating new Wikidata items with OpenRefine and Quickstatements

July 8, 2020 By addshore

Following on from my blog post using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the Tors on Dartmoor.

This post assumes you already have some knowledge of Wikidata and Quickstatements, and that you have OpenRefine set up.

Note: If you are having problems with the reconciliation service it might be worth giving this mailing list post a read!

Getting some data

I searched around for a while, looking at various lists of tors on Dartmoor. From a variety of sources I slowly compiled a list that seemed to be quite complete into a Google Sheet. This list included some initial names and rough OS Map grid coordinates (P613).

In order to load the data into OpenRefine I exported the sheet as a CSV and dragged it into OpenRefine using the same process as detailed in my previous post.

Reconciliation in OpenRefine

This data set doesn’t yet link to Wikidata at all! And that’s where the OpenRefine reconciliation features get used once again.

Column5 represents something that is close to a label for Wikidata items, and that is what I will use for reconciliation, alongside matching the type of tor (Q1343179).

Reconciliation took a few minutes and matched the tors that already exist on Wikidata with the names that were loaded into OpenRefine. Depending on the data you’re reconciling with you might want to choose a more general type or even no type at all, but be prepared to do more manual work matching things.
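
Reconciliation services accept batches of queries as JSON, so matching on a name plus a type can be pictured roughly as follows. This is only a sketch in Python, not what OpenRefine literally runs: the function name, the second tor name and the limit of 3 candidates are assumptions made for illustration, and nothing here contacts a real service.

```python
import json

TOR_TYPE = "Q1343179"  # tor, the type used for reconciliation in this post


def build_reconciliation_queries(names, type_qid=TOR_TYPE, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    return {
        f"q{i}": {"query": name, "type": type_qid, "limit": limit}
        for i, name in enumerate(names)
    }


# "Example Tor" is a made-up name used purely for illustration.
queries = build_reconciliation_queries(["Fox Tor", "Example Tor"])

# A client would send this JSON string as the "queries" form field.
payload = json.dumps(queries)
print(payload)
```

Passing `type_qid=None` would correspond to reconciling with no type at all, at the cost of more candidates to review by hand.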

The screenshot below shows the records, with reconciliation applied, filtered by judgement state (on the left hand side). “matched” refers to records that were already linked to a Wikidata item and “none” refers to those that need some manual work.

Note: this screenshot was taken after I performed my data load, hence many are matched, but it still illustrates the manual matching process.

Even the “matched” records should probably be checked, depending on the options that are used for reconciliation. Next, the records with no match need to either be connected to one of the found Wikidata items or set to “Create new item”.

The case described here is very simple, and there are many more details that can be taken into account with reconciliation. You can find more docs here.

Mutating a data element

The grid reference in the data set is not yet in the correct format for Wikidata, which expects a format with no spaces, such as SX604940.

To fix this, the “Edit cells” > “Replace” option can be used to simply replace any whitespace with nothing.

Although the screenshot doesn’t show much, as whitespace is being replaced, this had the desired effect on the data!
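
For anyone who would rather clean the column before it ever reaches OpenRefine, the same replacement is a one-liner in Python. This is a sketch of the equivalent operation, not the OpenRefine feature itself, and the function name is my own.

```python
import re


def strip_whitespace(grid_ref: str) -> str:
    """Remove every run of whitespace from a grid reference."""
    return re.sub(r"\s+", "", grid_ref)


print(strip_whitespace("SX 604 940"))  # prints SX604940
```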

There are also many other mutations that can be applied, including regex alterations which open up a world of possibilities.
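
As an illustration of what a regex-based step can do, the sketch below tidies a grid reference and then checks its shape. The pattern is an assumption on my part (two letters followed by an even number of digits, from 2 to 10); the format constraint Wikidata actually applies to P613 may be stricter, so treat this as a sanity check only.

```python
import re

# Assumed shape: two letters, then 1 to 5 pairs of digits.
GRID_REF = re.compile(r"^[A-Z]{2}(?:\d{2}){1,5}$")


def normalise_grid_ref(raw: str):
    """Return the cleaned grid reference, or None if it looks malformed."""
    cleaned = re.sub(r"\s+", "", raw).upper()
    return cleaned if GRID_REF.match(cleaned) else None


print(normalise_grid_ref("sx 6261 6981"))  # prints SX62616981
print(normalise_grid_ref("SX 604 94"))     # prints None (odd number of digits)
```

Rows that come back as `None` are the ones worth checking by hand before any upload.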

Mapping to Wikidata

The “Schema” tab is the next one to look at. It allows the simple table of data to be mapped to Wikidata items and statements.

To get here I clicked “+ add item” and used tor (Q1343179) as the type for the items.

The name of the tor which is in my Column5 can be used as an English label.

Finally, the one data value from my table can be included as a statement: OS grid reference (P613) can be added, referring to Column9 for the value. The data set also included a URL value in another column, which was the source of the grid reference. This was also added as a reference, with a retrieved (P813) date.

Editing with Quickstatements

I’m sure there is a way to create these items within OpenRefine itself. However, I wanted to try out the Quickstatements integration, which is why I chose this creation method.

Under the “Wikidata” menu there is an item allowing an “Export to QuickStatements”. Clicking this will generate a list of Quickstatements commands (sample below).

Q1343179	Len	"Fox Tor (Fox Tor Mires)"
Q1343179	P613	"SX62616981"	S813	+2020-07-08T00:00:00Z/11	S854	"https://someURL"
Q1343179	P613	"SX74257896"	S813	+2020-07-08T00:00:00Z/11	S854	"https://someURL"
Q1343179	P613	"SX70908147"	S813	+2020-07-08T00:00:00Z/11	S854	"https://someURL"
Q1343179	P613	"SX55689094"	S813	+2020-07-08T00:00:00Z/11	S854	"https://someURL"

These commands can be pasted into a “New batch” on the Quickstatements tool.

Clicking “Import V1 commands” and then “Run” will start making your edits.
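
If you ever need to write V1 commands by hand rather than export them, the sketch below shows one way a brand new tor item could be expressed, using `CREATE` followed by `LAST` lines. It is an assumption about a reasonable layout, not a reproduction of what the OpenRefine export produces. The function name is hypothetical, the instance of (P31) line is my addition, and no escaping of quotes inside values is attempted.

```python
from datetime import date


def new_item_commands(label, grid_ref, source_url, retrieved):
    """Build tab-separated QuickStatements V1 lines for one new tor item."""
    # Day-precision time value, as in the sample above (/11 means day).
    day = f"+{retrieved.isoformat()}T00:00:00Z/11"
    return [
        "CREATE",
        "\t".join(["LAST", "Len", f'"{label}"']),
        # instance of (P31) tor (Q1343179): added here as an assumption.
        "\t".join(["LAST", "P31", "Q1343179"]),
        "\t".join(["LAST", "P613", f'"{grid_ref}"',
                   "S813", day, "S854", f'"{source_url}"']),
    ]


commands = new_item_commands(
    "Fox Tor (Fox Tor Mires)", "SX62616981", "https://someURL", date(2020, 7, 8)
)
print("\n".join(commands))
```

Running a handful of items as a first batch, as I did, is a cheap way to catch formatting mistakes before the full run.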

The edits

You can see the initial batches of edits in the editgroups tool (which indexes this sort of batched editing) here and here. The first was a small test batch; the second completed the full run.