Migrating databases without anyone knowing

Jul 26, 2023

8 min read

Development


A high-level migration guide outlining how we migrated our databases from self-hosted MySQL to GCP-hosted MySQL with zero downtime.


If there's anything in our technology stack that demands too much responsibility, technical debt, and attention from our staff, it is the database and the management of it. Every few months or so we like to evaluate and understand potential technical roadblocks and hurdles that could keep our team from reaching goals and building our core products. In addition to this evaluation phase, we also like to look at areas of high risk versus business impact. What this means is that even though there may not be things currently distracting the team from their day-to-day, there might be "what-ifs" surrounding a topic that lead to technical debt.


Our first MySQL implementation


VOXO MySQL Open Source Implementation


Above is a repo containing ALL of the necessary components to run a multi-regional, single-primary, async-replicated MySQL cluster deployed in Kubernetes. It relies heavily on the Orchestrator project along with Consul KV. Our Kubernetes config for this can also be found here: VOXO k8s. As you can see, this project uses ProxySQL and consul-template. This stack is a powerhouse and makes the deployment/management lifecycle completely automated and free of maintenance burden.
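
To make the read/write splitting piece of that stack a little more concrete, here's a rough sketch of the kind of ProxySQL layout consul-template ends up rendering from Consul KV. The hostgroup numbers, IPs, and rule ordering below are illustrative assumptions, not our exact config.

```sql
-- Illustrative ProxySQL admin commands (hostgroups and IPs are made up):
-- hostgroup 10 = writer (current primary), hostgroup 20 = local regional readers
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.1.5', 3306);  -- primary
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.2.7', 3306);  -- regional replica
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.2.8', 3306);  -- regional replica

-- Keep locking reads on the writer, send everything else SELECT-shaped to the readers
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME;     SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
```

Rules are evaluated in rule_id order, so the FOR UPDATE match keeps locking reads on the primary before the catch-all SELECT rule routes everything else to the replicas.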


However, there are two issues with this stack:

  • I built it myself (single maintainer)

  • The Orchestrator project (used by GitHub) has been abandoned and is no longer actively maintained


With these two issues staring us in the face like a double-barrel shotgun waiting to fire at any moment, we decided to plan a migration to hosted MySQL.


The hunt for the MySQL promised land


We struck out on a discovery path to find a MySQL implementation that checked most of our boxes:

  • Automated backups

  • Point-in-time recovery

  • Self healing

  • Eventually consistent (we don't need immediate write consistency)

  • Point and click admin (or CLI/API based management)

  • Write forwarding (nice to have but not required since we run ProxySQL already)

  • Zero-downtime disk space scale-out

  • Regional node placement (we need the data very close to the application)

  • Local regional reads (read traffic needs to route to the closest node)

  • Single primary

  • Mostly MySQL compatible

  • Automatic primary recovery

  • Some level of maturity

  • HTAP (analytical query) nice to have


The item that heavily narrowed down the list of possible providers below was regional node placement. Here is the narrowed-down list of options that mostly checked all of the boxes ⬆️


We went with Google MySQL, but let's talk about each good option ⬇️


PlanetScale


This is the most aesthetically pleasing, elegant, and proven option in my opinion (especially since we used to run Vitess in our environment). The technology behind this product originated at YouTube and provided the MySQL infrastructure for many millions of requests per second. I LOVE their migration feature, currently in BETA. It is well thought out and exactly how a DB migration should be implemented. The biggest turn-off with this solution, and the reason we didn't go this route, is that the replicas are forced to have a completely different username/password and connection string. That is a deal breaker for companies that need multi-regional databases, in my opinion.


AWS Aurora Serverless V2


This is a clever solution and even has write-forwarding (which is why we considered it as an option). It is the ONLY option in this list that wouldn't require us to manage ProxySQL instances. That wasn't a big enough reason to go this route, especially since AWS regions are few and far between compared to GCP's region choices.


Google Cloud MySQL


As a result of checking ALL of our boxes (except for HTAP), this was a clear winner. We also get LAN IP access out of the box since we are hosting in various GCP regions. It was a breeze to stand up a few nodes in each of our hosting regions and begin a data migration using the built-in GCP migration tool.


Turso


This is the most forward-thinking option above in my opinion. It is super lightweight and checks ALL of the boxes in our checklist except for maturity. Embedded replicas made this product super attractive. I've met with these guys; they come from building databases at previous companies and have a better path for gaining traction than I think anyone realizes (including myself), but the product is still pretty new. This is SQLite (or libsql) under the hood (which is fine for us since we primarily use Drizzle-ORM), however, like I said, we need more time before going all-in.


TiDB


These guys are doing great stuff at massive scale. The most impressive thing about their solution is the HTAP functionality. The ability to run transactional queries just as fast as a traditional relational database, alongside analytical queries, was a huge selling point. Currently we use BigQuery for analytical queries, and it would be nice to eventually use the same database for both query types. TiDB also stores data as key-value pairs rather than rows and columns. That approach is attractive and scary at the same time: attractive because it's essentially infinitely scalable, scary because of MySQL compatibility. They have done a good job solving the MySQL compatibility problem, and this is the company we went furthest into the testing phase with. There were two main reasons we didn't go this route:


  1. Cost (it was way more expensive than any of the other solutions)

  2. Uncertainty using a KV backed storage (this product needs a little more time in the market in my opinion)



The migration


I'll keep this short and sweet.

Google has a data migration tool that is pretty handy: you define a source MySQL instance and it sets up async replication to the Google MySQL primary for you. You can then create replicas of this new primary however you see fit, so that's what we did. The new primary it sets up isn't actually a primary yet, but it becomes one as soon as you are ready to complete the migration. When you choose to complete the migration, it sets the read_only flag to 0 on the primary and disconnects it from the source. Not bad. So we set this up, created all of the replicas we needed in each region, and noted the LAN IP of each one.
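
If you want to sanity-check that behavior yourself, a couple of plain MySQL statements are enough. This is just a verification sketch, not something the migration tool requires (SHOW REPLICA STATUS is the MySQL 8.0.22+ spelling; older versions use SHOW SLAVE STATUS).

```sql
-- While the Cloud SQL instance is still replicating from the self-hosted source:
SELECT @@global.read_only;   -- expect 1: the future primary is still read-only
SHOW REPLICA STATUS\G        -- expect Replica_IO_Running / Replica_SQL_Running: Yes

-- After completing the migration (promotion):
SELECT @@global.read_only;   -- expect 0: the new primary now accepts writes
SHOW REPLICA STATUS\G        -- expect an empty result: source replication has stopped
```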

Since we already use ProxySQL and Consul KV, and would continue to need ProxySQL (for read/write splitting and local regional reads), we simply used our existing ProxySQL cluster in each GKE region to direct traffic throughout the whole migration. It was beautiful, and here's how it worked.

Once the data migration started, we let it run for a few weeks to test different failure scenarios and then proceeded with the following steps:

  1. Added the LAN IPs of a couple of Google replicas to Consul KV (which ProxySQL immediately picked up and started load balancing read traffic to). We let this run for a while to make sure performance wasn't degraded in any way and that our sub-millisecond queries were still within their threshold.

  2. Removed the existing self-hosted MySQL read-replica IPs from Consul KV so ALL read traffic was now hitting Google MySQL. We let this run for a while until we felt comfortable with a primary switchover.

  3. Clicked the button in the Google data migration tool to "Make as primary and stop source replication", which made the Google MySQL instance the primary. We immediately swapped the primary IP in Consul KV with the Google primary's LAN IP, and all write traffic from each region started flowing to the new MySQL primary in Google (the cutover is sketched below).
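
For a sense of what that last step amounts to on each ProxySQL node once consul-template re-renders the server list from Consul KV, here's a rough sketch. The hostgroup number and IPs are illustrative, not our actual values.

```sql
-- Net effect on the writer hostgroup (10 here) after the Consul KV swap:
DELETE FROM mysql_servers WHERE hostgroup_id = 10;   -- drop the old self-hosted primary
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, '10.20.0.4', 3306);                      -- Google MySQL primary LAN IP (example)
LOAD MYSQL SERVERS TO RUNTIME;                       -- applied live, no ProxySQL restart
SAVE MYSQL SERVERS TO DISK;
```

Because the change is loaded to runtime, the application never sees a new connection string or a restart, which is what kept the cutover invisible.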



At some point in the future I can see us revisiting this as new relational database offerings gain more traction and maturity. Overall this was a somewhat painless experience ONLY because we use Consul and ProxySQL together. It would have been almost impossible without those two tools.
