It's a blog

Author: addshore (Page 1 of 13)

Creating a new replica after purging binlogs with bitnami mariadb docker images

I have been using the bitnami mariadb docker images and helmfiles for just over a year now in a personal project (wbstack). I have 1 master and 1 replica set up in a cluster serving all of my SQL needs. As the project grew, disk space became pressing, and from early on I had to start automatically purging the binlogs by setting expire_logs_days to 14. This meant that I could no longer easily scale up the cluster, as new replicas would no longer be able to build themselves entirely from the binlogs.
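For context, the retention can be set in the server configuration or at runtime; a minimal sketch of the relevant statements (the exact key to set in the bitnami chart values may differ):

```sql
-- Minimal sketch: cap binlog retention at 14 days, as described above.
-- The same setting can go in my.cnf / the chart's configuration block as
-- expire_logs_days=14 so it survives restarts.
SET GLOBAL expire_logs_days = 14;

-- Check which binlogs currently exist, and manually purge old ones if needed.
SHOW BINARY LOGS;
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 14 DAY;
```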

This blog post walks through the way that I ended up creating a new replica from my master after my replica corrupted itself and I was all out of binlogs. This relates directly to the GitHub issue on the bitnami docker images: https://github.com/bitnami/bitnami-docker-mariadb/issues/177

The walkthrough was performed on a Google Kubernetes Engine cluster using the 7.3.16 bitnami/mariadb helm chart, which contains the 10.3.22-debian-10-r92 bitnami/mariadb docker image. If you are using something newer, expect some differences, but in principle it should all work the same.
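The full steps are in the post itself; as rough orientation, the end goal is the usual MariaDB replication setup, where the freshly seeded replica is pointed at the master at a known binlog position. The host, credentials and coordinates below are placeholders, not values from my cluster:

```sql
-- Rough sketch only: the binlog file and position come from whatever
-- backup or dump was used to seed the new replica.
CHANGE MASTER TO
  MASTER_HOST = 'my-mariadb-master',
  MASTER_USER = 'replicator',
  MASTER_PASSWORD = 'replicator-password',
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS = 4;

START SLAVE;

-- Confirm the replica is catching up.
SHOW SLAVE STATUS\G
```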

Continue reading

2020 Election, Registered voters misinformation #voterfraud?

On November 4th 2020 I managed to get an overview of exactly how misinformation and “fake news” can start so accidentally and spread so rapidly.

While scrolling through Twitter during the 2020 US Presidential election, I spotted some tweets saying that more people had voted in Wisconsin than were originally registered in the state. You can find a bunch of them using this Twitter search.

A quick Google search for some data turned up a worldpopulationreview.com list of states by registered voter count for 2020 as the first result, interestingly with the same value as in the tweets: 3,129,000. Looking into the “Sources” for the page, helpfully listed by the author, I couldn’t see any data being referenced for 2020, only for 2018 and 2016. The page has the wrong title!

Some more research led me to what appeared to be the first fact-check article, which also confirmed that the number being circulated appeared to be from 2018, not 2020.

Rather than leaving it there, for whatever reason I decided to get more involved, dig a little deeper, talk to some people on Twitter, and see what I could change as this misinformation continued to spread.

Continue reading

Faster munging for the Wikidata Query Service using Hadoop

The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF dump that comes directly from Wikidata.

Back in 2019 this step took 20 hours; it now takes somewhere between 1 and 2 days as Wikidata has continued to grow. The original munge step (munge.sh) makes use of only a single CPU. The WMF has been experimenting for some time with performing this step in their Hadoop cluster as part of their modern update mechanism (streaming updater). An additional patch has now also made this useful for the current default load process (using loadData.sh).
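For reference, the classic single-CPU munge step is run against the gzipped TTL dump along these lines (the paths here are illustrative):

```bash
# Illustrative invocation of the single-threaded munge step:
# read the full Wikidata TTL dump and write munged, chunked output files.
./munge.sh -f data/latest-all.ttl.gz -d data/mungeOut
```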

This post walks through using the new Hadoop-based munge step with the latest Wikidata TTL dump on Google Cloud's Dataproc service. This cuts the munge time down from 1–2 days to just 2 hours using an 8-worker cluster. Even faster times can be expected with more workers, all the way down to ~20 minutes.
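These are not the exact commands from the walkthrough, but creating a suitable Dataproc cluster looks roughly like this (project, region and machine type are placeholders):

```bash
# Rough sketch: spin up an 8-worker Dataproc cluster to run the Hadoop munge on.
gcloud dataproc clusters create wdqs-munge \
  --project=my-gcp-project \
  --region=europe-west1 \
  --num-workers=8 \
  --worker-machine-type=n1-standard-8
```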

Continue reading

How can I get data on all the dams in the world? Use Wikidata

During my first week at Newspeak House, while explaining Wikidata and Wikibase to some folks on the terrace, the topic of dams came up in a discussion of an old project that someone had worked on. Back in the day, collecting information about dams would have been quite an effort, compiling a bunch of different data from different sources to try to get a complete worldwide view of the topic. Perhaps it is easier with Wikidata now?

Below is a very brief walkthrough of topic discovery and exploration using various Wikidata features and the SPARQL query service.

A typical known Dam

In order to get an idea of the data space for the topic within Wikidata, I start with a dam that I already know about, the Three Gorges Dam (Q12514). Using this example I can see how dams are typically described.
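From there, a natural next step is to query for everything that is an instance of (a subclass of) a dam. A sketch of the kind of SPARQL involved, assuming Q12323 is the item for “dam”:

```sparql
# Sketch: list dams with their country, assuming Q12323 is the "dam" item.
SELECT ?dam ?damLabel ?countryLabel WHERE {
  ?dam wdt:P31/wdt:P279* wd:Q12323 .       # instance of (a subclass of) dam
  OPTIONAL { ?dam wdt:P17 ?country . }     # country (P17), where stated
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
```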

Continue reading

Creating new Wikidata items with OpenRefine and Quickstatements

Following on from my blog post using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the Tors on Dartmoor.

This post assumes you already have some knowledge of Wikidata and Quickstatements, and have OpenRefine set up.

Note: If you are having problems with the reconciliation service it might be worth giving this mailing list post a read!

Getting some data

I searched around for a while, looking at various lists of tors on Dartmoor. Slowly I compiled what seemed to be a fairly complete list from a variety of sources into a Google Sheet. This list included some initial names and rough OS Map grid coordinates (P613).

In order to load the data into OpenRefine I exported the sheet as a CSV and dragged it into OpenRefine using the same process as detailed in my previous post.
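The sheet itself was just a couple of columns; the exported CSV looked something like this (the rows below are illustrative, not my actual data):

```csv
name,grid_reference
Hound Tor,SX 742 790
Haytor,SX 757 770
Great Mis Tor,SX 562 769
```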

Continue reading

Using OpenRefine with Wikidata for the first time

I have long known about OpenRefine (previously Google Refine), a tool for working with data, manipulating and cleaning it. As of version 3.0 (May 2018), OpenRefine includes a Wikidata extension, allowing for reconciliation against Wikidata and also editing Wikidata directly (as far as I understand it). You can find some documentation on this topic on Wikidata itself.

This post serves as a summary of my initial experiences with OpenRefine, including some very basic reconciliation from a Wikidata Query Service SPARQL query, and making edits on Wikidata.

In order to follow along you should already know a little about what Wikidata is.

Starting OpenRefine

I tried out OpenRefine in two different setups, both of which were easy to get going following the installation docs: on my actual machine and in a VM. For the VM I also had to use the -i option to make the service listen on a different IP: refine -i 172.23.111.140
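A quick sketch of the two setups (the IP below is just the one my VM happened to have):

```bash
# On my actual machine: default settings, the UI listens on 127.0.0.1:3333.
./refine

# In the VM: bind to the VM's own IP so a browser on the host can reach it.
./refine -i 172.23.111.140
```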

Continue reading

Minecraft Java mod using Bukkit / Spigot

I have owned Minecraft Java for several years, but despite being a software developer, I have never looked into creating a mod, until now! This is certainly a different topic compared with my regular blog posts, but as always, I hope it will help someone somewhere.

I stumbled upon a video by one of the fastest-growing Minecraft YouTube channels (Dream) in which he quickly demonstrates creating some mods from suggestions in comments. My journey starts here, and with the fact that I can see an org.bukkit.event.Listener class imported.
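That Listener interface is the heart of the Bukkit event-driven plugin model. A minimal sketch of a plugin class that registers itself and reacts to an event (not the exact code from the video or from my template):

```java
import org.bukkit.ChatColor;
import org.bukkit.event.EventHandler;
import org.bukkit.event.Listener;
import org.bukkit.event.player.PlayerJoinEvent;
import org.bukkit.plugin.java.JavaPlugin;

public final class ExamplePlugin extends JavaPlugin implements Listener {

    @Override
    public void onEnable() {
        // Register this class so Bukkit delivers events to the @EventHandler methods below.
        getServer().getPluginManager().registerEvents(this, this);
    }

    @EventHandler
    public void onPlayerJoin(PlayerJoinEvent event) {
        // Greet every player as they join the server.
        event.getPlayer().sendMessage(ChatColor.GREEN + "Hello from ExamplePlugin!");
    }
}
```

The plugin also needs a small plugin.yml describing its name, version and main class before the server will load it.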

This post should serve as a guide that works today, and I also now have a template bukkit mod on GitHub that you may find useful, as all Bukkit templates that I found were years out of date. However, perhaps I should have been looking for Spigot templates! Figuring all of this out only took an hour or so, and at the end of it, I was able to create a mod that left me with a world which you can see below.

Continue reading

Browser extension to clear your Facebook advert interests


At the end of 2018, I wrote a blog post that included some JavaScript code to quickly remove all of your Facebook advert interests from this settings page. This has started to become one of my more popular posts and so I decided to take another pass at the project and convert the code into a browser extension.

The new extension provides the user with an extra button on the ad interests page that will automatically go through and click all of the remove buttons for all of the interest tabs that appear on the top bar. The UI isn’t the best, but it is functional!
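Under the hood the idea is simple; a sketch of the core loop (the selector here is a placeholder, since Facebook's markup changes often and the real extension has to locate the remove buttons itself):

```javascript
// Placeholder selector: Facebook's actual markup changes frequently, so the
// extension needs a more robust way of finding the remove buttons.
const REMOVE_BUTTON_SELECTOR = '[aria-label="Remove"]';

function clickAllRemoveButtons() {
  const buttons = document.querySelectorAll(REMOVE_BUTTON_SELECTOR);
  buttons.forEach((button) => button.click());
  return buttons.length; // how many interests were removed in this pass
}
```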

Continue reading

mediawiki-docker-dev v1 rewrite

Back in 2017 at the Wikimedia Hackathon, I played around with Docker and docker-compose in relation to MediaWiki and testing with multiple setups at once while developing, meaning multiple PHP versions, web servers and databases. My original slides can still be found here.

Since then mediawiki-docker-dev evolved into less of a testing system and more of a development environment, allowing the use of a master replica DB setup, easily swappable PHP versions, debugging and more. The project on GitHub currently has 40 stars, 38 forks and has seen 17 people contributing back.
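The underlying idea is a docker-compose stack along these lines (a simplified sketch of the concept, not the project's actual compose file):

```yaml
# Simplified sketch: a web container per PHP version plus a master/replica DB pair.
version: "3"
services:
  web-php72:
    image: php:7.2-apache
    volumes:
      - ./mediawiki:/var/www/html
  db-master:
    image: mariadb:10.3
    environment:
      MYSQL_ROOT_PASSWORD: example
  db-replica:
    image: mariadb:10.3
    environment:
      MYSQL_ROOT_PASSWORD: example
```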

Over the past couple of years, developer productivity and development environments have been a big discussion area. The Wikimedia technical conference in 2019 had Developer Productivity as its main topic. There have also been efforts in a few directions to figure out what is best for the majority of people. These include local-charts (a Kubernetes-based environment) and MediaWiki-Docker (a simple docker-compose based environment).

Continue reading

Adding git bash to Windows terminal

I just saw a tweet saying that Windows terminal is now generally available, so I had to give it a try.

After downloading it from the store and booting it up, I realized that only PowerShell, cmd and WSL are listed by default (and also Azure, which I don’t really care about).

Clicking around the UI a little, there is a settings menu item that opens a JSON configuration file in Notepad. This configuration file defines the behaviour of the terminal, including the profiles that can be loaded.

After a bit of searching and documentation reading, I came up with this profile, which I now use for my git bash installation (I hope it can help you too).
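The exact profile is in the full post, but the general shape of a Git Bash entry in the settings file is something like this (the install paths and GUID are placeholders for whatever your machine uses):

```json
{
    "guid": "{00000000-0000-0000-0000-000000000001}",
    "name": "Git Bash",
    "commandline": "C:\\Program Files\\Git\\bin\\bash.exe --login -i",
    "icon": "C:\\Program Files\\Git\\mingw64\\share\\git\\git-for-windows.ico",
    "startingDirectory": "%USERPROFILE%"
}
```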

Continue reading

© 2020 Addshore
