Reducing Java JVM memory usage in Containers and on Kubernetes

For a while I have been running a Wikibase query service update script for WBStack, which is a Java application on a Kubernetes cluster. Part of that journey has included the updater using all available memory, hitting the Kubernetes memory limit and being OOM killed. The title of the post is a little verbose, but I wanted to include all of the keywords that might help people find the answers to the memory issues that I was running into.

Before getting into the weeds, if you want the answers, head straight to https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/

UPDATE: This blog post was written with Java 8. Java 10+ now automatically recognizes memory limits and enforces them. https://www.docker.com/blog/improved-docker-container-integration-with-java-10/
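When the JVM does not pick up the container limit by itself, one common workaround is to pass an explicit heap cap that sits below the container's memory limit. The sketch below is only an illustration and is not taken from the post: the `heap_flag` helper and the 75% fraction are assumptions, while `-Xmx` is the standard JVM flag for the maximum heap size.

```python
def heap_flag(container_limit_mib: int, heap_fraction: float = 0.75) -> str:
    """Return a -Xmx flag sized as a fraction of the container memory limit.

    heap_fraction is an assumed value; the remainder is left for non-heap
    JVM memory such as metaspace and thread stacks.
    """
    if container_limit_mib <= 0:
        raise ValueError("container_limit_mib must be positive")
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in (0, 1]")
    heap_mib = int(container_limit_mib * heap_fraction)
    return f"-Xmx{heap_mib}m"


# Example: a container limited to 512 MiB of memory.
print(heap_flag(512))
```

The right fraction depends on how much non-heap memory the application needs, so treat the default here as a starting point rather than a recommendation.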

Read more

WBStack 2020 Update 2 (May)

This entry is part 5 of 12 in the series WBStack

WBStack is now in its 7th month with 76 user accounts who have created 226 MediaWiki sites running Wikibase, of which 145 are currently online (81 deleted sites). 295,000 edits have now been made in total, an increase of 95,000 in the last month, which roughly equates to 2 edits a minute for the month.
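The figures above can be checked with a few lines of arithmetic. This is a quick sketch that assumes a 30-day month; the variable names are only for illustration.

```python
sites_created = 226
sites_online = 145
edits_last_month = 95_000

# Sites that are no longer online.
sites_deleted = sites_created - sites_online

# Assumption: a 30-day month.
minutes_in_month = 30 * 24 * 60
edits_per_minute = edits_last_month / minutes_in_month

print(sites_deleted)               # 81
print(round(edits_per_minute, 1))  # 2.2
```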

The most active site is currently UniTest which is “a Wikibase sandbox with information about the research ecosystem”. Second and third come School of Design and Hercules Demo.

Screenshot of the WESO UniTest Main Page, 17 May 2020

Read more

Wikidata Map May – November 2019

This entry is part 13 of 17 in the series Wikidata Map

It’s time for another blog post in my Wikidata map series, this time comparing the item maps that were generated on the 13th May 2019 and 11th November 2019 (roughly 6 months). I’ll again be using Resemble.js to generate a difference image highlighting changed areas in pink, and break down the areas that have had the greatest change throughout the 6-month period. The full comparison image can be found here.

Differences in the Wikidata map, highlighted in pink, for changes between May 2019 and November 2019

If you don’t know what Wikidata is, or what items are, then give this page a read. This map shows all items that have a “coordinate location” as a light pixel on a black canvas. The more items with coordinates in a single pixel, the brighter that pixel. This map is generated using code that can be found here.
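The idea of "more items in a pixel means a brighter pixel" can be sketched as a simple counting grid. This is not the code that generates the real maps: the equirectangular projection, the grid size and the logarithmic brightness scale are all assumptions made for illustration.

```python
import math


def density_grid(coords, width=360, height=180):
    """Count (latitude, longitude) pairs per pixel on a width x height canvas.

    Assumes a plain equirectangular projection: longitude maps linearly
    to x and latitude maps linearly to y, with north at the top.
    """
    grid = [[0] * width for _ in range(height)]
    for lat, lon in coords:
        x = min(int((lon + 180) / 360 * width), width - 1)
        y = min(int((90 - lat) / 180 * height), height - 1)
        grid[y][x] += 1
    return grid


def brightness(count, max_count):
    """Map an item count to a 0-255 grey level (log scale is an assumption)."""
    if count == 0 or max_count == 0:
        return 0  # no items: the pixel stays black
    return round(255 * math.log1p(count) / math.log1p(max_count))


# Two items share one pixel, a third sits elsewhere.
grid = density_grid([(0.0, 0.0), (0.0, 0.0), (51.5, -0.1)])
max_count = max(max(row) for row in grid)
print(grid[90][180], brightness(grid[90][180], max_count))
```

A log scale keeps sparsely populated pixels visible next to very dense ones; a linear scale would work the same way but would leave most of the canvas close to black.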

Read more

2019 Year Review

This entry is part 3 of 7 in the series Year Reviews

A year or so ago I decided to start making yearly posts reviewing one of my online list. I’m a bit late this year considering it is April already, but it’s been one rollercoaster after another during the start of 2020.

Blog stats

  • 23,940 page views, up from 12,374 (93% increase)
  • 16,276 visitors, up from 8,578 (89% increase)
  • 11 posts, down from 25
  • 101 comments, up from 28

It’s a shame I wrote less, but I did go travelling for 6 months of the year, so it makes sense.

Read more

WBStack 2020 Update 1

This entry is part 4 of 12 in the series WBStack

WBStack has now been up and running for 6 months. During that time it has helped 70 people create 178 MediaWiki installs running Wikibase, a SPARQL query service and QuickStatements, all at the click of a button, with a total of around 200,000 edits across all sites.

The most active site is currently virus-taxonomy.wiki.opencura.com which was developed during the Virtual Biohackathon on COVID-19 as a staging environment for “improving the taxonomy of viruses on Wikidata”. It currently stands at 20,000 edits and around 7,000 Items.

Screenshot of the virus-taxonomy Wikibase Main Page, 19 April 2020

Thanks again to Rhizome, who run their very own Wikibase, for their support paying the Google Cloud bill in the early stages of this project.

Read more

Automatic cleanup of old gcloud container images

I have been using Google Cloud Build for a budget project for roughly a year now. Cloud Build stores built images in a storage bucket which you are of course billed for. Within the first weeks of using it I realized that I needed some automated way to clean up unused and old images that were built there.

At the time I had a quick search around on the web for something already implemented that I could copy, but I came up blank, and decided putting my problem off would be the best solution. I filed issue number 6 for my project and left it for future me.

Now it’s time to finally close that issue, and I hope others might also find the small bash script useful.
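The selection step of such a cleanup can be sketched independently of any cloud tooling. The example below is not the bash script from the post: the `select_stale` helper, the `(digest, created_at)` tuples and the default thresholds are all hypothetical, and the actual listing and deletion of images is left out.

```python
from datetime import datetime, timedelta, timezone


def select_stale(images, now, max_age_days=30, keep_latest=5):
    """Return digests of images that are safe to delete.

    images is a list of (digest, created_at) tuples. The newest
    keep_latest images are always kept; of the rest, only those older
    than max_age_days are selected.
    """
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    return [
        digest
        for digest, created_at in newest_first[keep_latest:]
        if created_at < cutoff
    ]


now = datetime(2020, 5, 1, tzinfo=timezone.utc)
images = [
    ("sha256:aaa", now - timedelta(days=1)),
    ("sha256:bbb", now - timedelta(days=40)),
    ("sha256:ccc", now - timedelta(days=90)),
]
print(select_stale(images, now, max_age_days=30, keep_latest=1))
```

Always keeping the newest few images, regardless of age, avoids deleting the only copy of something that is still deployed but rarely rebuilt.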

Read more

Add Exif data back to Facebook images – 0.10

Screenshot of the Facebook Exif tool version 0.10

In 2019 I wrote a post introducing a tool that I created to add Exif data back to images downloaded as part of a Facebook information download. The tool allowed me to download and delete my uploaded Facebook images while keeping some of the useful data such as date taken. After some Twitter pressure I have finally released an updated and slightly fixed version, and it’s time that I wrote an updated guide to go with it!

What is Exif data?

Exchangeable image file format (officially Exif) is a standard that specifies the formats for images and tags used by digital cameras and other systems handling image files.

Snipped from Wikipedia

Common Exif data for an image includes the time that it was taken, the camera make and model and the coordinate data for the location of the image.

Read more

Covid-19 Wikipedia pageviews, a first look

World events often have a dramatic impact on online services. A past example would be the death of Michael Jackson, which brought down Twitter and Wikipedia and made Google believe that they were under attack, according to the BBC.

Events like the COVID-19 (Coronavirus) pandemic have a less instantaneous effect, but trends can still be seen to change. Cloudflare recently posted about some of the internet-wide traffic changes due to the pandemic and various government announcements, quarantines and lockdowns.

Currently the main English Wikipedia article for the COVID-19 pandemic is receiving roughly 1.2 million page views per day (14 per second). This article has already gone through 4 different names over the past months, and the pageview rate continues to climb.
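The per-second figure follows directly from the daily total, as this small check shows (variable names are only for illustration):

```python
views_per_day = 1_200_000
seconds_per_day = 24 * 60 * 60

views_per_second = views_per_day / seconds_per_day
print(round(views_per_second))  # 14
```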

Wikipedia pageviews tool showing English Wikipedia COVID-19 pandemic article views up to 21 March 2020 (source)

Read more

WBStack Infrastructure (2020)

This entry is part 3 of 12 in the series WBStack

UPDATE: You can find an up-to-date 2021 version of this post here.

WBStack currently runs on a Google Cloud Kubernetes cluster made up of 2 virtual machines, one e2-medium and one e2-standard-2. This adds up to a current total of 4 vCPUs and 12GB of memory. No Google specific services make up any part of the core platform at this stage, meaning WBStack can run wherever there is a Kubernetes cluster with little to no modification.

A simplified overview of the internals can be seen in the diagram below, where blue represents the Google provided services and green represents everything running within the Kubernetes cluster.

Read more

WBStack – November review

This entry is part 2 of 12 in the series WBStack

It’s been roughly 1 month since WBStack appeared online, and it’s time for a quick review of what has been happening in the first month. If you don’t already know what WBStack is, then head to my introduction post.

The number of users and wikis has slowly been increasing. In my last post I stated “20 users on the project with 30 Wikibase installs”. 3 weeks after that post WBStack now sits at roughly 38 users with roughly 65 wikibases. Many of these wikibases are primarily users’ test wikis, but that’s great, the barrier to trying out Wikibase is definitely lowered.

If you would like an invite code to try WBStack, or have any related thoughts or ideas, then please get in touch.

What’s changed

As WBStack is a shared platform, all changes mentioned in this blog post are immediately visible on all hosted Wikibases. In the future there will be various options to turn things on and off, but at this early stage things are being kept simple.

Read more