It's a blog

Tag: Wikidata (Page 1 of 6)

Tech Lead Digest – August 2021

Welcome to the 4th instalment of my Wikidata & Wikibase Tech lead Digest for August 2021. For previous instalments see Q1, Q2 & July.

🧑‍🤝‍🧑Wikidata & Wikibase

The Wikidata Query Builder has been deployed. The Wikidata Query Builder provides a visual interface for building a simple Wikidata query. It is ideal for users with little or no experience in SPARQL.

The Wikibase fall release, which will be compatible with MediaWiki 1.36 will be made in the next month or so. At some point in the next 3-6 months we will likely also make a Wikibase 1.37 release. Keep an eye out on the mailing list for these.

Work is about to wrap up on the next iteration of Wikibase Federated Properties which will enable the use of properties from multiple sources at once, such as Wikidata and also the local Wikibase.

Work continues on the Wikidata Mismatch Finder. You can read more about this future tool on wikidata.org.

The campsite worked on many other things. Most notably Ladsgroup spotted that SpamBlacklist was rendering content on Wikidata twice (phabricator). This fix resulted in a rather significant improvement in save times for Wikidata, and users of Wikibase and SpamBlacklist in combination.

Continue reading

Tech Lead Digest – July 2021

Welcome to the third installment of my tech lead digest digest. In order to allow myself some extra space to write, and also to provide these public updates and thoughts on a more regular basis, this is now becoming a monthly digest.

I’m going to try to incorporate some of the ongoings from other Wikidata / Wikibase projects, as well as my regular digest and reading.

🧑‍🤝‍🧑Wikidata & Wikibase

Work continues on the next iteration of Wikibase Federated Properties (phabricator board). This work will allow use of properties from multiple sources at once, such as Wikidata and also the local Wikibase.

Work also continues on the Wikidata Mismatch Finder (phabricator board) which is a tool to enable finding mismatches between Wikidata’s data and data in other databases.

The Campsite continues to work on a variety of smaller tasks, in the last month including a new release of our Design System, dealing with the Query Builder security review and preparing for deployment, performing some maintenance on WBStack including preparing for 1.36 and adding Elasticsearch. We also continue to support a university team in deploying a new Property Suggester algorithm (announcement coming soon), work towards tagging all edits made from the UI (T236893), as well as many other smaller tasks.

Continue reading

Tackling Technical Debt, big and small, in Wikidata and Wikibase

If you’re working with legacy code, chances are you’ve inherited some technical debt. Infact, if you’re working with code, chances you’re already surrounded by technical debt of varying sizes, at least by some measures.

Some believe that technical debt is something to be avoided, and that technical debt that exists is a dirty secret that should be hidden. The reality is that technical debt is a fact of life when code iteratively changes to deliver product solutions.

Striving for programming perfection is great in principle, but ultimately code is meant to deliver features, and there is always a good, better and best approach, with many other variations in-between.

Over the last year at Wikimedia Deutschland we have worked on refining how we record, triage, prioritize and tackle technical debt within the Wikidata and Wikibase product family.

There are many thoughts out there about how to track, tackle, and prioritize technical debt. This post is meant to represent the current status of the Wikidata / Wikibase team. Hopefully you find this useful.

Continue reading

Tech Lead Digest – Q2 2021

This is the second installment of my tech lead digest digest with my tech lead hat on for the Wikidata Wikibase team.

This is a digest of my internal digest emails. These contain lots of links to reading, podcasts and general goings on that could be useful to a wider audience.

🧑‍🤝‍🧑Wikidata & Wikibase

Continue reading

Tech Lead Digest – Q1 2021

At some point last year I started sending a weekly internal digest to the Wikidata Wikibase team with my tech lead hat on.

The emails are internal only but contain lots of links to reading, podcasts and general goings on that could be useful to everyone.

So here is my first Wikidata Wikibase tech lead digest digest!

🧑‍🤝‍🧑Wikidata & Wikibase

Continue reading

WBStack setting changes, Federated properties, Wikidata entity mapping & more

During the first 3 months of 2021, some Wikimedia Deutschland engineers, from the Wikidata / Wikibase team, spent some time working on WBStack as part of an effort to explore the WBaaS (Wikibase as a service) topic during the year, as outlined by the development plan.

We want to make it easier for non-Wikimedia projects to set up Wikibase for the first time and to evaluate the viability of Wikibase as a Service.

Wikibase 2021 Development plan

This has lead to a few new Wikibase features being exposed through the WBStack dashboard for sites that run on the platform. These features are primarily features developed by the Wikibase team in 2020 and 2021. The work also brought some other quality of life improvements for the settings pages.

Here is a quick rundown of what’s new and improved.

Continue reading

Twitter bot powered by Github Actions (WikidataMeter)

Recently 2 new Twitter bots appeared in my feed, fullyjabbed & fullyjabbedUK, created by iamdanw and powered entirely by Github Actions (code).

I have been thinking about writing a Twitter bot for some time and decided to copy this pattern running a cron based Twitter bot on Github Actions, with an added bit of free persistence using jsonstorage.net.

This post if my quick walkthrough of my new bot, WikidataMeter, what it does and how it works. You can find the code version when writing this blog post here, and the current version here.

Continue reading

Testing WDQS Blazegraph data load performance

Toward the end of 2020 I spent some time blackbox testing data load times for WDQS and Blazegraph to try and find out which possible setting tweaks might make things faster.

I didn’t come to any major conclusions as part of this effort but will write up the approach and data nonetheless incase it is useful for others.

I expect the next step toward trying to make this go faster would be via some whitebox testing, consulting with some of the original developers or with people that have taken a deep dive into the code (which I started but didn’t complete).

Continue reading

Faster munging for the Wikidata Query Service using Hadoop

The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF dump that comes directly from Wikidata.

Back in 2019 this step took 20 hours and now takes somewhere between 1-2 days as Wikidata has continued to grow. The original munge step (munge.sh) makes use of only a single CPU. The WMF has been experimenting for some time with performing this step in their Hadoop cluster as part of their modern update mechanism (streaming updater). An additional patch has now also made this useful for the current default load process (using loadData.sh).

This post walks through using the new Hadoop based munge step with the latest Wikidata TTL dump on Google clouds Dataproc service. This cuts the munge time down from 1-2 days to just 2 hours using an 8 worker cluster. Even faster times can be expected with more workers, all the way down to ~20 minutes.

Continue reading

How can I get data on all the dams in the world? Use Wikidata

During my first week at Newspeak house while explaining Wikidata and Wikibase to some folks on the terrace the topic of Dams came up while discussing an old project that someone had worked on. Back in the day collecting information about Dams would have been quite an effort, compiling a bunch of different data from different sources to try to get a complete worldwide view on the topic. Perhaps it is easier with Wikidata now?

Below is a very brief walkthrough of topic discovery and exploration using various Wikidata features and the SPARQL query service.

A typical known Dam

In order to get an idea of the data space for the topic within Wikidata I start with a Dam that I know about already, the Three Gorges Dam (Q12514). Using this example I can see how Dams are typically described.

Continue reading
« Older posts

© 2021 Addshore

Theme by Anders NorénUp ↑