WBStack is a platform allowing shared scalable hosting of Wikibase and surrounding services.
A year ago I made an initial post covering the state of WBStack infrastructure. Since then some things have changed, and I have also had more time to create a clear diagram. So it is time for the 2021 edition.
I have been using the bitnami mariadb docker images and helmfiles for just over a year now in a personal project (wbstack). I have 1 master and 1 replica setup in a cluster serving all of my SQL needs. As the project grew disk space became pressing and from an early time I has to start automatically purging the bin logs setting expire_logs_days to 14. This meant that I could no longer easily scale up the cluster, as new replicas would not be able to entirely build themselves.
The walkthrough was performed on a Google Kubernetes Engine cluster using the 7.3.16 bitnami/mariadb helm charts which contain the 10.3.22-debian-10-r92 bitnami/mariadb docker image. So if you are using something newer expect some differences, but in principle it should all work the same.
For a while I have been running a Wikibase query service update script for WBStack, which is a Java application on a Kubernetes cluster. Part of that journey has included the updater using all available memory, hitting into the kubernetes memory limit and being OOM killed. The title of the post is a little verbose, but I wanted to include all of the keywords that might help people find the answers to the memory issues that I was running into.
UPDATE: You can find an up to date 2021 version of this post here.
WBStack currently runs on a Google Cloud Kubernetes cluster made up of 2 virtual machines, one e2-medium and one e2-standard-2. This adds up to a current total of 4 vCPUs and 12GB of memory. No Google specific services make up any part of the core platform at this stage meaning WBStack can run wherever there is a Kubernetes cluster with little to no modification.
A simplified overview of the internals can be seen in the diagram below where blue represents the Google provided services, with green representing everything running within the kubernetes cluster.
While working on a new Mediawiki project, and trying to setup a Kubernetes cluster on Wikimedia Cloud VPS to run it on, I hit a couple of snags. These were mainly to do with ingress into the cluster through a single static IP address and some sort of load balancer, which is usually provided by your cloud provider. I faffed around with various NodePort things, custom load balancer setups and ingress configurations before finally getting to a solution that worked for me using ingress and a traefik load balancer.
Below you’ll find my walk through, which works on Wikimedia Cloud VPS. Cloud VPS is an openstack powered public cloud solution. The walkthrough should also work for any other VPS host or a bare metal setup with few or no alterations.