Online RDS column type modification, using pt-online-schema-change from EC2
I’m using Percona Toolkit to do an online schema modification today, and thought I would document the process, especially as even the installation guides seem to be badly linked and out of date, and didn’t work out of the box…
EC2 instance
This is all running on a t3.micro EC2 instance with Ubuntu. I deliberately didn’t go with Amazon Linux, as I wanted to be able to use apt. For simplicity’s sake, I’ll be using the EC2 Instance Connect feature, which lets you open a shell session in the web browser! (although copy and paste through it is annoying)
This instance of course also needs access to your MySQL server, in this case an RDS instance. So I’ll go ahead and add the EC2 instance’s security group to the RDS instance’s inbound rules.
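If you’d rather do this from the command line than the console, an inbound rule along these lines works (both security group IDs below are hypothetical placeholders)…

# --group-id is the RDS instance's security group;
# --source-group is the EC2 instance's security group
aws ec2 authorize-security-group-ingress \
--group-id sg-0aaa111bbb222ccc3 \
--protocol tcp \
--port 3306 \
--source-group sg-0ddd444eee555fff6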
Percona toolkit
Percona Toolkit is a powerful open-source collection of advanced command-line tools designed to help MySQL and MariaDB DBAs perform tasks like online schema changes, replication troubleshooting, and data auditing safely and efficiently.
It’s used at Wikimedia for online database migrations (which is how I know about it), however I had never actually used it myself!
Installation Resources:
- https://docs.percona.com/percona-toolkit/installation.html
- https://docs.percona.com/percona-software-repositories/installing.html
However, following these exactly doesn’t work, and I ended up running the following on my Ubuntu EC2 instance to get started…
sudo apt update
sudo apt install curl
curl -O https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo apt install ./percona-release_latest.generic_all.deb
sudo percona-release enable tools release
sudo apt update
sudo apt install percona-toolkit
And we have it installed.
ubuntu@ip-1-2-3-4:~$ pt-online-schema-change --version
pt-online-schema-change 3.6.0
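Before going any further, it’s worth checking the instance can actually reach the RDS endpoint (the hostname below is a placeholder)…

sudo apt install mysql-client
mysql -h your-db.abc123xyz.eu-west-1.rds.amazonaws.com -u aaaa -p -e "SELECT VERSION();"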
Using tmux
When running a long-running pt-online-schema-change on a remote server, using tmux ensures your session stays alive even if your SSH connection drops. Start by launching tmux with tmux and you’ll be inside a new session.
Essential tmux shortcuts:
- Detach session: Ctrl+b → d
- Reattach session: tmux attach (or tmux attach -t <name>)
- New window: Ctrl+b → c
- Next window: Ctrl+b → n
- Previous window: Ctrl+b → p
- Split pane horizontally: Ctrl+b → "
- Split pane vertically: Ctrl+b → %
- Move between panes: Ctrl+b → arrow keys
- Rename window: Ctrl+b → , (to rename the session: Ctrl+b → $)
This way, you can keep your schema change running in one pane while monitoring logs or running queries in others — all without losing your place if your SSH session drops.
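For a change that runs for days, I’d suggest a named session so it’s obvious what to reattach to later (“schema-change” is just an example name)…

tmux new -s schema-change      # start a named session
tmux attach -t schema-change   # reattach, e.g. after a dropped connection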
Scrolling in the AWS console can be annoying, but you can also dump the whole scrollback to a file via the command line, even when detached from tmux…
tmux capture-pane -pS - > ~/tmux-buffer.txt
The change
Then you can simply run the change…
pt-online-schema-change \
--alter "MODIFY COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT" \
D=someDB,t=someTable \
--host 111.222.111.222 \
--user aaaa \
--password xxxx \
--port 3306
You need to add --execute to actually run the change, or you can start with --dry-run, which creates and alters the new (empty) table without copying any data.
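A cautious first pass might look like this; once happy, swap --dry-run for --execute (--ask-pass prompts for the password rather than leaving it in your shell history)…

pt-online-schema-change \
--alter "MODIFY COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT" \
D=someDB,t=someTable \
--host 111.222.111.222 \
--port 3306 \
--user aaaa \
--ask-pass \
--dry-run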
When executing, you’ll see output like this…
No slaves found. See --recursion-method if host ip-10.9.8.7 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `someDB`.`someTable`...
Creating new table...
Created new table someDB._someTable_new OK.
Altering new table...
Altered `someDB`.`_someTable_new` OK.
2025-08-12T12:20:15 Creating triggers...
2025-08-12T12:20:15 Created triggers OK.
2025-08-12T12:20:15 Copying approximately 257784808 rows...
Copying `someDB`.`someTable`: 0% 08:09:32 remain
Copying `someDB`.`someTable`: 0% 08:05:05 remain
Copying `someDB`.`someTable`: 1% 08:07:49 remain
...
Copying `someDB`.`someTable`: 98% 04:55 remain
Copying `someDB`.`someTable`: 98% 04:43 remain
Copying `someDB`.`someTable`: 98% 04:28 remain
2025-08-12T17:40:06 Copied rows OK.
2025-08-12T17:40:06 Analyzing new table...
2025-08-12T17:40:06 Swapping tables...
2025-08-12T17:40:06 Swapped original and new tables OK.
2025-08-12T17:40:06 Dropping old table...
2025-08-12T17:40:07 Dropped old table `someDB`.`_someTable_old` OK.
2025-08-12T17:40:07 Dropping triggers...
2025-08-12T17:40:07 Dropped triggers OK.
Successfully altered `someDB`.`someTable`.
Once completed, you’ll see your updated table, with no read or write interruptions, just some increased load during the operation…
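You can confirm the new column definition with a quick SHOW CREATE TABLE (connection details as before)…

mysql -h 111.222.111.222 -u aaaa -p -e "SHOW CREATE TABLE someDB.someTable\G"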
Behind the scenes
pt-online-schema-change performs online schema modifications by creating a new table with the desired structure, then copying data from the original table in small chunks. It uses triggers on the original table to capture ongoing changes (inserts, updates, deletes) and apply them to the new table in real time. Once fully synced, it swaps the tables atomically, minimizing downtime and allowing reads and writes throughout the process.
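As a heavily simplified sketch of those steps in SQL, using the placeholder table from earlier (columns a and b are stand-ins, and the real tool also handles chunk sizing, throttling, retries, and foreign keys)…

-- 1. Create a copy of the table and apply the desired change to it
CREATE TABLE `someDB`.`_someTable_new` LIKE `someDB`.`someTable`;
ALTER TABLE `someDB`.`_someTable_new`
  MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

-- 2. Triggers mirror ongoing writes into the new table during the copy
--    (similar triggers cover UPDATE and DELETE)
CREATE TRIGGER pt_osc_someTable_ins AFTER INSERT ON `someDB`.`someTable`
FOR EACH ROW REPLACE INTO `someDB`.`_someTable_new` (`id`, `a`, `b`)
VALUES (NEW.`id`, NEW.`a`, NEW.`b`);

-- 3. Existing rows are copied in small primary-key chunks ("nibbles")
INSERT LOW_PRIORITY IGNORE INTO `someDB`.`_someTable_new` (`id`, `a`, `b`)
SELECT `id`, `a`, `b` FROM `someDB`.`someTable`
WHERE `id` >= 1 AND `id` <= 1000 LOCK IN SHARE MODE;

-- 4. One atomic rename swaps the tables; the triggers and old table are then dropped
RENAME TABLE `someDB`.`someTable` TO `someDB`.`_someTable_old`,
             `someDB`.`_someTable_new` TO `someDB`.`someTable`;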
My reality
EDIT on 19 August, once my change had actually finished…
My table was at ID 4,150,000,000, which was nearing the maximum value of 4,294,967,295 for my unsigned int column.
The table grows by roughly 140k entries per hour, or around 40 per second.
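That gave roughly 4,294,967,295 − 4,150,000,000 ≈ 145 million IDs of headroom, or only around 43 days at 140k per hour. A quick check along these lines (assuming MAX(id) roughly tracks the auto-increment counter) shows the runway…

SELECT MAX(id) AS current_max,
       4294967295 - MAX(id) AS ids_remaining,
       ROUND((4294967295 - MAX(id)) / 140000) AS hours_left_at_140k_per_hour
FROM someDB.someTable;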
The migration I was performing dropped two columns that were no longer used, and also changed the primary key from int unsigned to bigint unsigned…
DROP COLUMN zz_xxxConfigId,
DROP COLUMN zz_xxxDeviceId,
MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
When I first started the migration, the tool estimated that it would take 1 day and 12 hours; it actually took 4 days and 6 hours. Throughout the execution I saw the ETA slowly creep up until roughly halfway through, at which point it started to decrease again.
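As an aside, the frequency of those progress lines is tunable via the --progress option (default time,30, i.e. a line every 30 seconds); for a multi-day copy, something less chatty may be preferable…

# Print a progress line every 10 minutes instead of every 30 seconds
pt-online-schema-change --progress time,600 ...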
Another thing of note for my migration: the final copying line, which said 99% and 3 seconds remaining, actually remained the tool’s last output for around 4 hours. During this time I could still see the tool’s inserts happening in the process list, and the ID there was still slowly approaching the table’s current MAX id, but with 50+ million rows still to go…
INSERT LOW_PRIORITY IGNORE INTO `db`.`_xxx_new` (`id`, `a`, `b`, `c`, `d`)
SELECT `id`, `a`, `b`, `c`, `d` FROM `db`.`xxx`
FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '4175516775')) AND ((`id` <= '4175519539'))
LOCK IN SHARE MODE /*pt-online-schema-change 52232 copy nibble*/
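These copy statements are easy to watch from another tmux pane with something like (host and user as before)…

mysql -h 111.222.111.222 -u aaaa -p -e "SHOW FULL PROCESSLIST;" | grep 'copy nibble'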
Once this caught up, despite the continual inserts into the table, the tool correctly swapped the tables around, dropped the triggers, and tada, the migration was done.
Overall, my change saw:
- 34GB index increase on the table when converting to bigint unsigned (from int unsigned)
- 17GB data decrease on the table when dropping 2 empty columns
- Overall, less than a 15GB increase for the change
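A query along these lines against information_schema, run before and after the change, is one way to measure those sizes (data_length and index_length are estimates for InnoDB)…

SELECT table_name,
       ROUND(data_length / 1024 / 1024 / 1024, 1) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb
FROM information_schema.TABLES
WHERE table_schema = 'someDB' AND table_name = 'someTable';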
Rather watch a video?
I have not watched this one myself, but PlanetScale are likely to be trusted!