Online RDS column type modification, using pt-online-schema-change from EC2

I’m using Percona Toolkit to do an online schema modification today, and thought I would document the process, especially as even the installation guides seem to be badly linked and out of date, and didn’t work for me out of the box…

EC2 instance

This is all running on a t3.micro EC2 instance with Ubuntu. I deliberately didn’t go with Amazon Linux, as I wanted to be able to use apt. For simplicity’s sake, I’ll be using the EC2 Instance Connect feature, which lets you connect to a session right in the web browser! (although copy and paste through it is annoying)

This instance of course also needs access to your MySQL server, in this case an RDS instance, so I’ll go ahead and allow it in the RDS instance’s security group.

Percona toolkit

Percona Toolkit is a powerful open-source collection of advanced command-line tools designed to help MySQL and MariaDB DBAs perform tasks like online schema changes, replication troubleshooting, and data auditing safely and efficiently.

It’s used at Wikimedia for online database migrations (which is how I know about it), though I have never actually used it myself!

Read more

Profiling Wikibase APIs and import speed

There has been some recent chat once again in the Wikibase Telegram groups around importing, and the best approach to import a large amount of data into a Wikibase instance. Two years ago I started a little GitHub project aimed at profiling the speed of loading via the action API across various settings, DB versions, etc., as well as trying out a bulk load API. I have just taken the opportunity to take another look at it and visualize some of the comparisons, given the changes over the last two years.

In case you don’t want to read and follow everything below, the key takeaways are:

  • An EPS (edits per second) of around 150 is achievable on a single laptop
  • When testing imports, you really need to test at least 50k items to get meaningful figures
  • The two ID-generation-related settings are VERY IMPORTANT if you want to maximise import speed
  • Make async requests, but not too many, likely tuned to the number of CPUs you have serving web requests; you want near 100% utilization (see the sketch after this list)
  • A batch API, such as FrozenMink/batchingestionextension, would dramatically increase import speed
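On the concurrency point, here is a minimal sketch of capping the number of in-flight requests with a buffered channel. This is my own illustration rather than code from the profiling project, and the endpoint URL and payloads are placeholders:

import (
	"bytes"
	"net/http"
	"sync"
)

// importItems POSTs each payload to the given API endpoint, keeping at most
// `concurrency` requests in flight at once (e.g. roughly the number of CPUs
// serving web requests on the Wikibase host).
func importItems(endpoint string, payloads [][]byte, concurrency int) []error {
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	var errs []error

	for _, p := range payloads {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the request
		go func(body []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the request finishes

			resp, err := http.Post(endpoint, "application/x-www-form-urlencoded", bytes.NewReader(body))
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
				return
			}
			resp.Body.Close()
		}(p)
	}

	wg.Wait()
	return errs
}

Cranking the concurrency up or down while watching CPU utilization on the web-serving side is the quickest way to find the sweet spot.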

Some napkin math benchmarks for smallish items, I would hope:

  • 1 million items, 2 hours (validated)
  • 10 million items, 1 day
  • Wikidata (116 million items), 14+ days
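As a rough sanity check on those figures (my own arithmetic, not from the post): at a steady ~150 edits per second a million items take just under two hours, and the larger estimates above presumably leave some headroom for throughput dropping as item count and item size grow.

package main

import "fmt"

// Napkin math: estimated wall-clock time at a fixed edits-per-second rate.
func main() {
	const eps = 150.0 // roughly the single-laptop rate mentioned above
	for _, items := range []float64{1_000_000, 10_000_000, 116_000_000} {
		hours := items / eps / 3600
		fmt.Printf("%.0f items: ~%.1f hours (~%.1f days)\n", items, hours, hours/24)
	}
	// Prints approximately:
	//   1000000 items: ~1.9 hours (~0.1 days)
	//   10000000 items: ~18.5 hours (~0.8 days)
	//   116000000 items: ~214.8 hours (~9.0 days)
}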

Read more

A copy-paste Go SQL mock for GORM

When it comes to writing robust and reliable tests for your Go applications, having a well-structured and efficient testing setup is crucial. One common challenge in testing Go applications is dealing with database interactions. To ensure that your code functions correctly, it’s essential to create a controlled environment for database operations during testing. In this blog post, we’ll explore how to create a mock GORM database for testing purposes, allowing you to isolate and verify your database interactions in a controlled manner.

Here is some copy-and-paste code (explained below) which should get you started.

import (
	"github.com/DATA-DOG/go-sqlmock"
	"gorm.io/driver/mysql"
	"gorm.io/gorm"
	"gorm.io/gorm/logger"
)

func NewMockGORM() (GORM *gorm.DB, close func()) {
	// Create an underlying sql.DB backed by sqlmock rather than a real database
	db, mock, err := sqlmock.New()
	if err != nil {
		panic(err)
	}

	// GORM always runs this query when opening a MySQL connection, so mock it for all tests
	mock.ExpectQuery("SELECT VERSION()").WillReturnRows(sqlmock.NewRows([]string{"version"}).AddRow("5.7.0"))

	// Open GORM on top of the mocked connection, with logging silenced for quieter test output
	GORM, err = gorm.Open(mysql.New(mysql.Config{Conn: db}), &gorm.Config{Logger: logger.Default.LogMode(logger.Silent)})
	if err != nil {
		panic(err)
	}

	return GORM, func() { db.Close() }
}

The code block above defines a NewMockGORM function that sets up a mock GORM database instance for testing. Let’s break down what this code does and how it can be a valuable addition to your testing toolkit.
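As a rough sketch of how this might slot into a test, assuming the same package and imports as the helper above (NewUserRepository here is a hypothetical constructor taking a *gorm.DB, not something from this post):

func TestUserRepository_List(t *testing.T) {
	// Build the mocked GORM instance and make sure the underlying connection gets closed.
	gormDB, closeDB := NewMockGORM()
	defer closeDB()

	// Hand the mocked *gorm.DB to the code under test.
	repo := NewUserRepository(gormDB)
	_ = repo
}

Any queries the test actually triggers will also need expectations registered with go-sqlmock, along the same lines as the SELECT VERSION() expectation above.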

Setting up a Mock GORM Database

Read more