Creating a new replica after purging binlogs with bitnami mariadb docker images

November 15, 2020, by addshore

I have been using the bitnami mariadb docker images and helmfiles for just over a year now in a personal project (wbstack). I have a 1 master, 1 replica setup in a cluster serving all of my SQL needs. As the project grew, disk space became a pressing concern, and from fairly early on I had to start automatically purging the binlogs, setting expire_logs_days to 14. This meant that I could no longer easily scale up the cluster, as new replicas would not be able to entirely build themselves.
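For reference, that purging is just the standard MariaDB binlog expiry setting; it can be set at runtime like this (or equivalently via expire_logs_days=14 in the [mysqld] section of the server config):

SET GLOBAL expire_logs_days = 14;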

This blog post walks through the way that I ended up creating a new replica from my master after my replica corrupted itself and I was all out of binlogs. This directly relates to the GitHub issue on the bitnami docker images: https://github.com/bitnami/bitnami-docker-mariadb/issues/177

The walkthrough was performed on a Google Kubernetes Engine cluster using the 7.3.16 bitnami/mariadb helm charts, which contain the 10.3.22-debian-10-r92 bitnami/mariadb docker image. So if you are using something newer, expect some differences, but in principle it should all work the same.

First failed attempt

My first failed attempt involved starting up an empty replica and trying to dump all of the data across, following general best practices for this sort of thing, such as using mysqldump, mariabackup, or a data copy (mariadb docs, mysql docs).

It seems I hit the issue described on GitHub, where most users are deleted as part of the startup process, meaning you can’t really do a lot with the image alone.

However, root login is not available on the slave instance because all users other than mysql.sys are deleted during mysql_initialize() in libmysql.sh.

bitnami docker-mariadb issue #177

I’m sure there is probably some fix that could be made inside the docker images to make this easier, but rather than wait for that, let’s hack around it…

Solution requirements

This solution needs a working master that you can set into read only mode for a period. The length of that period will be determined by the size of your data and your copy method.

You also need somewhere to copy the master’s entire /bitnami/mariadb directory to.

Depending on how long the whole process takes, you’ll also need binlogs covering the time period of this process, as at the end the new replica will need to catch up from the point of the first step. For me this was a few minutes; for others it could be longer.

In my case I had a working master, and a replica that was ~2 weeks old, and thus I no longer had the binlogs to catch it up with. The replica with old data is not used as part of the creation process, so this process also works for creating a replica when only a master exists.

Snapshot data with clear position

All of the data needed for the replica, or any other host, is in the /bitnami/mariadb directory of the master, so in essence you need to grab a copy of that in a consistent state without causing too much service disruption.

To do this, set the master into read only mode and flush the binary logs to disk.

FLUSH TABLES WITH READ LOCK;
SET GLOBAL read_only = ON;
FLUSH BINARY LOGS;

Then take a copy of the /bitnami/mariadb directory. Depending on your deployment solution, you might need to do this in one of a variety of ways. If you can just copy the directory, do so.
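For example, with direct access to the filesystem a plain recursive copy that preserves ownership and permissions is enough, or you can stream a tar out of a kubernetes pod (the pod name sql-mariadb-master-0 below is an assumption based on a release called sql):

# Directly on the host, preserving ownership and permissions
cp -a /bitnami/mariadb /backup/mariadb-snapshot

# Or stream the directory out of the master pod
kubectl exec sql-mariadb-master-0 -- tar -C /bitnami -cf - mariadb > mariadb-snapshot.tar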

Remember that this step will have your master in a read only state for some period of time. The length of time to copy the data will depend on your method of copying, data size, etc.

Once the copy / snapshot is complete, you can turn read only mode off.

SET GLOBAL read_only = OFF;
UNLOCK TABLES;

Using the helmcharts on GKE?

If using the helm charts on a GKE cluster making use of persistent disks, you can use something like this to find and create a snapshot of your disk.

# Get the name of the PVC claim for the node
kubectl get pvc -l app=mariadb,release=sql,component=master

# Match that PVC claim up to a gcloud disk in this list
gcloud compute disks list

# Create a snapshot of that disk
DISK_TO_COPY=gke-cluster-1-fe067d2a-pvc-fe32db80-h5a3-55r9-9851-74610g8y88g5
gcloud compute disks snapshot \
    ${DISK_TO_COPY} \
    --snapshot-names=tmp-sql-snapshot-1 \
    --project=PROJECTNAME \
    --zone=us-east1-b \
    --storage-location=us-east1

Overwrite new replica data with the copy

You need to copy this snapshotted /bitnami/mariadb over the /bitnami/mariadb of a newly created and non-operational (because of the issues explained above) replica.

When just deploying docker images this could be as simple as copying the data to a new directory on disk that can later be mounted by the replica service, or to a new docker volume.
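As a sketch with plain docker, assuming the copy sits at /backup/mariadb-snapshot on the host, you could seed a fresh named volume using a throwaway container:

# Create a volume for the new replica and fill it from the snapshot copy
docker volume create mariadb-replica-data
docker run --rm \
    -v mariadb-replica-data:/bitnami/mariadb \
    -v /backup/mariadb-snapshot:/from:ro \
    bitnami/minideb:buster \
    sh -c 'cp -a /from/. /bitnami/mariadb/'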

Using the helmcharts on GKE?

If using the helm charts on a GKE cluster, you’ll need to scale up your replica pool so that a new PVC and disk are allocated. You can then scale the replicas back down so that the PVC and disk remain, but are not being accessed by the service. Then you can follow these (probably overcomplicated) steps to get the data where you need it without messing anything up.
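The scale up and back down itself might look something like this (the statefulset name is an assumption matching a release called sql; check kubectl get statefulsets for yours):

# Scale the replica pool up so a new PVC and disk get allocated
kubectl scale statefulset sql-mariadb-slave --replicas=2

# Once the new PVC is bound, scale back down;
# statefulset PVCs (and their disks) are left in place when pods are removed
kubectl scale statefulset sql-mariadb-slave --replicas=1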

# Restore the snapshot to a temporary disk
SNAPSHOT_NAME=tmp-sql-snapshot-1
SNAPSHOT_SIZE=$(gcloud compute snapshots describe ${SNAPSHOT_NAME} --format=json | jq -r .diskSizeGb) && echo $SNAPSHOT_SIZE
gcloud compute disks create tmp-snapshot-restore-1 \
    --size ${SNAPSHOT_SIZE} \
    --source-snapshot ${SNAPSHOT_NAME} \
    --zone us-east1-b \
    --type pd-ssd

# Get the name of the PVC claim for the node,
# and from that the gcloud disk for the replica
kubectl get pvc -l app=mariadb,release=sql,component=slave
gcloud compute disks list
RESTORE_TO_DISK=gke-cluster-1-hh044d7c-pvc-0e05e05c-1b8g-11gg-j104-92648o8v022cf

# Create a small VM that has this snapshot disk & the new replica disk mounted
gcloud compute instances create tmp-snapshot-restore-machine \
    --machine-type=f1-micro \
    --boot-disk-size=10GB \
    --zone=us-east1-b \
    --disk=name=tmp-snapshot-restore-1,mode=ro,device-name=from \
    --disk=name=${RESTORE_TO_DISK},mode=rw,device-name=to

# Connect to the VM
gcloud compute ssh tmp-snapshot-restore-machine --zone=us-east1-b

On that machine copy the data over:

# See the devices
lsblk

# Mount them
sudo mkdir -p /srv/source /srv/target
sudo mount -o ro,noload /dev/sdb /srv/source
sudo mount -o rw /dev/sdc /srv/target

# Copy everything over
# (~8 mins for 16GB disks (13GB full))
# (~14 mins for 30GB disks (15GB full) )
sudo rm -rf /srv/target/*
sudo cp -a /srv/source/. /srv/target/

And then delete the machine once the copy is complete:

gcloud compute instances delete tmp-snapshot-restore-machine --zone=us-east1-b

Starting the replica

This “replica” is now in quite an odd state, having the data and some settings of a master, but the startup commands of a replica, so we need to fix that. In order to stop replication attempting to run on startup (and failing) you’ll need to use the --skip-slave-start option. If you have liveness checks you may also want to turn these off.
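If you are running the image with plain docker rather than the helm charts, I believe the same flag can be passed through the image’s MARIADB_EXTRA_FLAGS environment variable, something like:

# Volume name and image tag here are from the earlier steps / my setup
docker run -d \
    -e MARIADB_EXTRA_FLAGS="--skip-slave-start" \
    -v mariadb-replica-data:/bitnami/mariadb \
    bitnami/mariadb:10.3.22-debian-10-r92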

You can then look at the binlogs that exist on the actual master and on the replica in creation to figure out where replication needs to start. This should be fairly easy to see, as the master will have started a new binlog when it was set to read only and snapshotted.

SHOW BINARY LOGS;

You want to start replication from the newest binlog that exists on the master but not on your new replica. You need to find the starting position in that binlog (probably 4?). To do that you can look at the first few entries.

SHOW BINLOG EVENTS IN 'mysql-bin.000381' LIMIT 10;

Using a mysql client on the new replica you can then set up replication. The values in these commands come from either the binary log checks above, or the general environment variables / settings you are using. You can read more about these commands in the MariaDB CHANGE MASTER TO documentation.

CHANGE MASTER TO MASTER_HOST='XXX',
MASTER_PORT=XXX,
MASTER_USER='XXX',
MASTER_PASSWORD='XXX',
MASTER_CONNECT_RETRY=10,
MASTER_LOG_FILE='XXX',
MASTER_LOG_POS=XXX;

And start replication.

START SLAVE;
SHOW SLAVE STATUS\G

Remember that we started this service with the --skip-slave-start option, so be sure to remove that and restart the service once replication is all caught up and everything is working.

Using the helmcharts on GKE?

If you are using the helm charts on GKE, you might find the following values snippet useful for this section.

slave:
  extraFlags: --skip-slave-start
  livenessProbe:
    enabled: false
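Applying that snippet (and later removing it once the replica has caught up) is just a normal helm upgrade of the release, e.g. for a release named sql:

helm upgrade sql bitnami/mariadb -f values.yaml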

Cleanup

Remember that you have left a bunch of copies of data all over the place that you probably want to clean up.

Also note that your new replica has all of the master’s binlogs on it taking up space; you probably want to go ahead and delete these now (I’m not sure if mariadb in replication mode will automatically purge them?).
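Deleting them from a mysql client on the new replica might look like this (PURGE BINARY LOGS never removes the log currently in use):

SHOW BINARY LOGS;
PURGE BINARY LOGS BEFORE NOW();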

Final thoughts

A simplified version of this solution would also work for creating a new replica from an existing replica. This would avoid the need for read only time on the master, and also for lots of the “faff” covered above. You should just be able to 1) pause replication, 2) snapshot, 3) copy to the new location, and 4) start up the new replica.
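A minimal sketch of that flow, run against the existing replica, with the snapshot and copy happening between the two halves using any of the methods covered above:

-- On the existing replica: stop replication so the data is consistent
STOP SLAVE;
SHOW SLAVE STATUS\G

-- ... snapshot / copy /bitnami/mariadb to the new replica here ...

-- Resume replication on the existing replica
START SLAVE;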

Snapshotting the master in this way to get around the GitHub issue discussed could be useful in other ways too. The main problem with the GitHub issue is that the replicas start up with no users to do anything with. You could take a snapshot with inconsistent data (no read only) to fire up a replica just so that you have a root user, etc. From there you could follow normal replica creation guides using mysqldump or similar.