Addshore

It's a blog


2020 Year Review

Another year is up, and what a year it has been. I finally open sourced wbstack, complained about fake news, looked at Minecraft mods and took a look at how COVID-19 was affecting Wikipedia page views.

I make this post mainly for me to be able to look back at each year in a small snapshot. You can find similar posts for previous years in 2019, 2018 and 2017.

Currently I generate this post in a very manual way, sifting through data from WordPress stats, Twitter Analytics and my Github user page. Maybe I should change that for next year!

Blogging

On the whole this blog continues to grow year on year, both in terms of content and readers.

Continue reading

Google outage article by The Express ‘This could be 9/11 of hacks’


I’m here after a certain Google outage led to at least one sensational headline, misleading some people who then contacted me asking for an opinion. I was aware of the outage at the time, as I was trying to use Google products. The headline that I look at below just made me laugh at the time, and I had to dive into it a bit more.

On the 14th of December 2020 Google had a pretty large outage for nearly an hour due to problems with their User ID service, which makes up part of their authentication infrastructure. The postmortem of the incident is up explaining exactly what happened, as is a less technical blog post.

On the day, and following the incident, there was quite a bit of media coverage on the topic. One article by The Daily Express stood out to me as really aiming to mislead with its headline: Google DOWN: ‘This could be 9/11 of hacks’ Security expert admits grave concerns.

Continue reading

Auto reloading pi kiosk script from Github

While at Newspeak House in 2020 I found myself wanting to change how the screens dotted around the place worked. A little bit of context is needed here. These screens were in the communal areas, each attached to a Raspberry Pi, and each running a kiosk script to load a browser and website when they first boot up. The code for the screens is on Github, and the pis do not have SSH enabled…

I wanted to change the website that they pointed to. In essence this meant going around and modifying the kiosk script on 6 or so pis using a small bluetooth keyboard and mouse. While doing that, to avoid anyone needing to do it again in the future, I modified the kiosk script to automatically reload itself from Github.
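The real kiosk script lives in the screens repo on Github and is a shell script; the following is just a hypothetical Python rendering of the same idea, with a placeholder repo URL and local path: on boot, fetch the latest copy of the script, fall back to whatever is already on disk if the pi is offline, and then run it.

```python
# Hypothetical sketch: self-updating kiosk launcher for a Raspberry Pi screen.
import subprocess
import urllib.request

# Placeholder URL: the raw location of the kiosk script in the Github repo.
SCRIPT_URL = "https://raw.githubusercontent.com/example/screens/main/kiosk.sh"
LOCAL_PATH = "/home/pi/kiosk.sh"

try:
    # Pull down the latest version of the script and overwrite the local copy.
    latest = urllib.request.urlopen(SCRIPT_URL, timeout=10).read()
    with open(LOCAL_PATH, "wb") as f:
        f.write(latest)
except OSError:
    # No network: keep running the copy we already have.
    pass

# Hand over to whatever version of the kiosk script is now on disk.
subprocess.run(["bash", LOCAL_PATH], check=False)
```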

Continue reading

Open Sourcing WBStack

wbstack organization on Github

Open Sourcing the code and config for WBStack has always been part of the plan, although functionality came first throughout the first year or so. Finally there is a github organization for wbstack containing 16 public repositories that make up the entire deployment for wbstack.com.

This effort took a few weeks: splitting sensible components out of the original mono repo (started back in 2017 and now at over 1600 commits), making sure that no secrets were swept up along the way, and trying to preserve git history where possible.
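As a rough illustration of the history-preserving part (not necessarily the tooling actually used for the wbstack split), git-filter-repo can carve a single component out of a monorepo while keeping only the commits that touched it. The repository and directory names below are placeholders.

```python
# Hypothetical sketch: extract one component from a monorepo, keeping its history.
import subprocess

COMPONENT = "api"            # placeholder: a sub-directory of the monorepo
NEW_REPO = "../wbstack-api"  # placeholder: where the extracted repo will live

# Work on a fresh clone so the original monorepo is left untouched.
subprocess.run(["git", "clone", "wbstack-monorepo", NEW_REPO], check=True)

# Keep only the history that touched the component, and move it to the
# root of the new repository.
subprocess.run(
    ["git", "filter-repo",
     "--path", f"{COMPONENT}/",
     "--path-rename", f"{COMPONENT}/:"],
    cwd=NEW_REPO,
    check=True,
)
```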

Although everything is now on Github, that doesn’t mean it is all clearly understandable just yet, or in the most sensible layout; that will come with time.

Continue reading

Creating a new replica after purging binlogs with bitnami mariadb docker images

I have been using the bitnami mariadb docker images and helmfiles for just over a year now in a personal project (wbstack). I have 1 master and 1 replica set up in a cluster serving all of my SQL needs. As the project grew, disk space became pressing, and from early on I had to start automatically purging the binlogs, setting expire_logs_days to 14. This meant that I could no longer easily scale up the cluster, as new replicas would not be able to entirely build themselves.

This blog post walks through the way that I ended up creating a new replica from my master after my replica corrupted itself and I was all out of binlogs. This directly relates to a Github issue on the bitnami docker images: https://github.com/bitnami/bitnami-docker-mariadb/issues/177

The walkthrough was performed on a Google Kubernetes Engine cluster using the 7.3.16 bitnami/mariadb helm charts which contain the 10.3.22-debian-10-r92 bitnami/mariadb docker image. So if you are using something newer expect some differences, but in principle it should all work the same.
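For flavour, here is a minimal sketch of the general recovery idea only; the walkthrough itself works through the in-cluster pods and secrets, and the hostnames, password and exact flags below are assumptions. The gist: take a fresh dump from the master that records its current binlog position, load it into the empty replica, and start replication from that recorded position.

```python
# Hypothetical sketch: reseed a replica from the master when the needed
# binlogs have already been purged. Hostnames and password are placeholders.
import subprocess

MASTER = "wbstack-mariadb-master"    # placeholder service name
REPLICA = "wbstack-mariadb-replica"  # placeholder service name
ROOT_PW = "change-me"                # placeholder root password

# Dump everything from the master; --master-data=1 embeds a CHANGE MASTER TO
# statement recording the binlog file and position the dump corresponds to.
dump = subprocess.run(
    ["mysqldump", "-h", MASTER, "-uroot", f"-p{ROOT_PW}",
     "--all-databases", "--single-transaction", "--master-data=1"],
    check=True, capture_output=True, text=True,
).stdout

# Load the dump into the new replica (which already has the master host and
# replication credentials from the chart config), then start replicating.
subprocess.run(
    ["mysql", "-h", REPLICA, "-uroot", f"-p{ROOT_PW}"],
    input=dump + "\nSTART SLAVE;\n",
    check=True, text=True,
)
```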

Continue reading

2020 Election, Registered voters misinformation #voterfraud?

On November 4th 2020 I managed to get an overview of exactly how misinformation and “fake news” can start so accidentally, and spread so rapidly.

While scrolling through Twitter during the 2020 US Presidential election, I spotted some tweets saying that more people had voted in Wisconsin than were originally registered in the state. You can find a bunch of them using this twitter search.

After performing a quick Google search looking for some data, I found a worldpopulationreview.com list of states by registered voter count for 2020 as the first result, interestingly with the same value as included in the tweets: 3,129,000. Looking into the “Sources” of the page, helpfully listed by the author, I couldn’t see any data being referenced for 2020, only 2018 and 2016. This page has the wrong title!

Some more research led me to what appeared to be the first fact check article, also confirming that the number being circulated appeared to be from 2018, not 2020.

Rather than leaving it there, for whatever reason I decided to get more involved, dig a little deeper, talk to some people on twitter and see what I could change as this misinformation continued to be spread.

Continue reading

Faster munging for the Wikidata Query Service using Hadoop

The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF dump that comes directly from Wikidata.

Back in 2019 this step took 20 hours and now takes somewhere between 1-2 days as Wikidata has continued to grow. The original munge step (munge.sh) makes use of only a single CPU. The WMF has been experimenting for some time with performing this step in their Hadoop cluster as part of their modern update mechanism (streaming updater). An additional patch has now also made this useful for the current default load process (using loadData.sh).

This post walks through using the new Hadoop based munge step with the latest Wikidata TTL dump on Google Cloud's Dataproc service. This cuts the munge time down from 1-2 days to just 2 hours using an 8 worker cluster. Even faster times can be expected with more workers, all the way down to ~20 minutes.
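To give a feel for what an 8 worker setup looks like, here is a hedged sketch of standing up such a cluster and submitting a Spark job with the gcloud CLI; the cluster name, region, machine type, bucket paths, jar and job arguments are all placeholders rather than the exact values from the walkthrough.

```python
# Hypothetical sketch: create a Dataproc cluster and submit a Spark-based
# munge job. All names and paths are placeholders.
import subprocess

CLUSTER = "wdqs-munge"   # placeholder cluster name
REGION = "europe-west1"  # placeholder region

# Spin up an 8 worker cluster; more workers should mean a faster munge.
subprocess.run([
    "gcloud", "dataproc", "clusters", "create", CLUSTER,
    "--region", REGION,
    "--num-workers", "8",
    "--worker-machine-type", "n1-standard-8",
], check=True)

# Submit the munge as a Spark job; jar and arguments are placeholders.
subprocess.run([
    "gcloud", "dataproc", "jobs", "submit", "spark",
    "--cluster", CLUSTER, "--region", REGION,
    "--jar", "gs://my-bucket/rdf-spark-tools.jar",
    "--",
    "--input", "gs://my-bucket/latest-all.ttl.gz",
    "--output", "gs://my-bucket/munged/",
], check=True)
```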

Continue reading

How can I get data on all the dams in the world? Use Wikidata

During my first week at Newspeak House, while explaining Wikidata and Wikibase to some folks on the terrace, the topic of Dams came up in a discussion of an old project that someone had worked on. Back in the day, collecting information about Dams would have been quite an effort, compiling a bunch of different data from different sources to try to get a complete worldwide view on the topic. Perhaps it is easier with Wikidata now?

Below is a very brief walkthrough of topic discovery and exploration using various Wikidata features and the SPARQL query service.

A typical known Dam

In order to get an idea of the data space for the topic within Wikidata I start with a Dam that I know about already, the Three Gorges Dam (Q12514). Using this example I can see how Dams are typically described.
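A minimal sketch of that first step, assuming the public query.wikidata.org endpoint and the requests library: ask which classes the Three Gorges Dam (Q12514) is an instance of (P31), which is a quick way to find the item(s) that Dams hang off in Wikidata.

```python
# Ask the Wikidata Query Service what Q12514 (Three Gorges Dam) is an
# instance of, as a starting point for exploring how Dams are modelled.
import requests

QUERY = """
SELECT ?class ?classLabel WHERE {
  wd:Q12514 wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "dam-exploration-example/0.1"},
    timeout=60,
)
for row in response.json()["results"]["bindings"]:
    print(row["class"]["value"], row["classLabel"]["value"])
```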

Continue reading

Creating new Wikidata items with OpenRefine and Quickstatements

Following on from my blog post using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the Tors on Dartmoor.

This post assumes you already have some knowledge of Wikidata, Quickstatements, and have OpenRefine setup.

Note: If you are having problems with the reconciliation service it might be worth giving this mailing list post a read!

Getting some data

I searched around for a while looking at various lists of tors on Dartmoor. Slowly I compiled a list, which seemed to be quite complete, from a variety of sources into a Google Sheet. This list included some initial names and rough OS Map grid coordinates (P613).

In order to load the data into OpenRefine I exported the sheet as a CSV and dragged it into OpenRefine using the same process as detailed in my previous post.
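For context, the end product of this kind of workflow is a batch of QuickStatements commands. Below is a hedged sketch of generating V1-style commands straight from such a CSV; the column names are assumptions, and the real workflow goes via OpenRefine's reconciliation and schema export rather than a hand-rolled script.

```python
# Hypothetical sketch: turn a CSV of tors into QuickStatements V1 commands
# that create one item each, with an English label and an OS grid
# reference (P613). Column names are assumptions.
import csv

with open("tors.csv", newline="") as f:
    for row in csv.DictReader(f):
        print("CREATE")                                   # new item
        print(f'LAST\tLen\t"{row["name"]}"')              # English label
        print(f'LAST\tP613\t"{row["grid_reference"]}"')   # OS grid reference
```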

Continue reading

© 2021 Addshore
