The Importance of Being Agile in the Database World

We all know DevOps. We build small and well-tested increments, we deploy them often, and we automate pipelines so we don’t have to worry about manual steps. We build monitoring around our applications, introduce alerts, roll back offending changes, and get notifications when things do not work well. However, we don’t have the same abilities around our databases. We may have a hard time debugging performance issues. We may lack specific insights into what happens and why databases slow down. Even schema migrations and modifications may simply get out of control. To address these challenges, we need to be able to modify database structures efficiently, without significant downtime or performance degradation. This agility in managing schema changes is key to maintaining speed and flexibility in our database strategies. But how can we move fast around databases? How can we be agile in the database world? Read on to see.

Automate Your Testing Today

Many things can break around databases. Just like we test our applications, we need to test our databases. However, our test suites are not well suited for that. We tend to verify that our applications read and write the right data, but we disregard how they do it. We don’t check whether the proper indexes were used or whether we sent one query instead of loading data lazily. We don’t verify how many rows were read; we focus only on how many of them were returned by the database. Similarly, we don’t test our rollback procedures, so we risk data loss with each such modification. To be agile, we need to cover all of this with automated tests that can capture issues without our intervention.
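
To make this concrete, here is a minimal sketch of what such checks could look like. It assumes an in-memory SQLite database and Python’s built-in sqlite3 module; the tables, the index name idx_users_email, and the helper functions are hypothetical and exist only for illustration. Other databases expose execution plans differently, so treat this as one possible shape of the test, not a ready-made recipe.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the step description

def uses_index(conn, sql, index_name, params=()):
    """True if any step of the plan mentions the given index."""
    return any(index_name in step for step in plan_details(conn, sql, params))

def count_selects(conn, action):
    """Run action() and count the SELECT statements it sends to the database."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        action()
    finally:
        conn.set_trace_callback(None)
    return sum(1 for s in seen if s.lstrip().upper().startswith("SELECT"))

# Hypothetical schema and data, used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    CREATE INDEX idx_users_email ON users (email);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, f"user{i}@example.com", f"User {i}") for i in range(1, 4)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i, 10.0 * i) for i in range(1, 4)])
conn.commit()

def load_lazily():
    # N+1 pattern: one query for the orders, then one more per order.
    orders = conn.execute("SELECT id, user_id FROM orders").fetchall()
    return [conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
            for _, uid in orders]

def load_with_join():
    # The same data fetched with a single query.
    return conn.execute(
        "SELECT o.id, u.name FROM orders o JOIN users u ON u.id = o.user_id"
    ).fetchall()

indexed = uses_index(conn, "SELECT name FROM users WHERE email = ?",
                     "idx_users_email", ("user1@example.com",))
unindexed = uses_index(conn, "SELECT id FROM users WHERE name = ?",
                       "idx_users_email", ("User 1",))
lazy_queries = count_selects(conn, load_lazily)
join_queries = count_selects(conn, load_with_join)
```

A test suite could then assert that the email lookup uses the index and that the join version sends a single query, so a regression in how we read the data fails the build even when the returned rows are still correct.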

We may also think that our load tests will capture these issues. This is indeed correct, as load tests can find performance problems and show whether our queries are fast enough for production. However, load tests pose their own problems. They are very expensive to build and maintain. We need to deal with GDPR, data anonymization, and stateful applications. They also happen far too late in the pipeline. Load tests run after we have already implemented our changes, reviewed them, and merged them into the repository, so when they find issues, we need to go back to the whiteboard and start the implementation from scratch. Last but not least, load tests take too long to execute. We need to run them for hours to fill the caches and verify that our applications are reliable.

Similarly, schema migrations are not covered by our tests. We run test suites after we have already finished the migrations. We don’t check how long they took to complete or whether they caused table rewrites. If there are any performance issues during migrations, we won’t find them with our testing pyramid. However, we’ll immediately notice the problem when we deploy to production.
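
As an illustration, a migration can be exercised like any other piece of code. The sketch below assumes SQLite and Python’s sqlite3 module; the events table, the migration statements, and the five-second budget are hypothetical values chosen for the example. It times the migration against a populated table, checks that no rows are lost, and verifies that a failing migration rolls back cleanly. It does not detect table rewrites, which would require database-specific inspection.

```python
import sqlite3
import time

def timed_migration(conn, statements):
    """Apply the statements in one transaction and return the elapsed seconds."""
    start = time.perf_counter()
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except Exception:
        # Undo everything if any step fails, then let the test see the error.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    return time.perf_counter() - start

def column_names(conn, table):
    """List the column names of a table (table name is trusted test input)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# isolation_level=None lets us control transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("BEGIN")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [(f"event-{i}",) for i in range(10_000)])
conn.execute("COMMIT")

MIGRATION = [  # hypothetical migration under test
    "ALTER TABLE events ADD COLUMN created_at TEXT",
    "CREATE INDEX idx_events_created_at ON events (created_at)",
]
BUDGET_SECONDS = 5.0  # hypothetical time budget for this data size

rows_before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
elapsed = timed_migration(conn, MIGRATION)
rows_after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# A migration that fails halfway should leave no trace behind.
try:
    timed_migration(conn, [
        "ALTER TABLE events ADD COLUMN broken TEXT",
        "CREATE INDEX idx_bad ON no_such_table (broken)",
    ])
    rolled_back_cleanly = False
except sqlite3.OperationalError:
    rolled_back_cleanly = "broken" not in column_names(conn, "events")
```

The time budget only means something when the test table is close to production size, which is exactly the limitation of small test databases.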

Recommended reading: Common Challenges in Schema Migration & How To Overcome Them

We use databases that are too small in our testing procedures. We don’t capture performance issues early and we waste too much time running load tests. In some cases, like schema migrations, we have no tests at all to verify that they’re okay to deploy. This decreases our velocity, breaks our applications, and stops us from being agile. We need a solution to all these issues. The solution is database guardrails.

Database guardrails can analyze our queries, schema migrations, configurations, and database designs right when we’re working on the code changes. We don’t need to run our commits through the pipelines or load tests. Instead, we can get the checks done entirely in our IDE and developer environments. Database guardrails use observability and projection to the production database to check the execution plans, statistics, and configurations, and to verify that everything will work well after deployment.

Understanding Instead of Watching

Once we deploy to production, many things can change over time. CPU load may spike, memory usage may increase, data size may grow, and data distribution may change. We need to notice these issues early and work fast to fix them. However, that is not enough. We shouldn’t accept how monitoring tools work today. They just swamp us with raw signals and expect us to do the reasoning. They show that the CPU load increased, but they don’t explain why it happened. We are the ones responsible for investigating the issue and explaining the root cause. This must change.

To move fast, we can’t troubleshoot on our own. We need to get actionable insights instead of raw signals. We need to move from monitoring to observability and from seeing to understanding. Database guardrails can do that for us. They can connect the dots and show us how things interconnect, what the problem is, and how to fix it. Instead of “seeing the CPU load increasing”, we can “understand that the last deployment changed the query which in turn caused the index to not be used anymore which resulted in higher CPU usage”. The solution is to fix the query or the index.
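
A very small version of this kind of reasoning can be sketched as a comparison of execution plans before and after a change. The example assumes SQLite and Python’s sqlite3 module; the users table, the index idx_users_email, and both helper functions are hypothetical, and a real tool would correlate much more than a single plan.

```python
import sqlite3

def searches_with_index(conn, sql, index_name, params=()):
    """True if the plan contains a SEARCH step that uses the given index."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return any(row[3].startswith("SEARCH") and index_name in row[3]
               for row in rows)

def explain_regression(conn, old_sql, new_sql, index_name, params=()):
    """Return an explanation if the new query stopped using the index."""
    before = searches_with_index(conn, old_sql, index_name, params)
    after = searches_with_index(conn, new_sql, index_name, params)
    if before and not after:
        return (f"The new query no longer uses {index_name}; "
                "expect a full table scan and higher CPU usage.")
    return None

# Hypothetical schema, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

old_query = "SELECT name FROM users WHERE email = ?"
# Wrapping the column in a function prevents SQLite from using the plain index.
new_query = "SELECT name FROM users WHERE lower(email) = ?"

message = explain_regression(conn, old_query, new_query,
                             "idx_users_email", ("user@example.com",))
unchanged = explain_regression(conn, old_query, old_query,
                               "idx_users_email", ("user@example.com",))
```

Here the message points at the cause (the index is no longer used) and not only at the symptom, which is the difference between seeing and understanding.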

Recommended reading: Observability vs Monitoring: Key Differences & How They Pair

The next step we need to take is to move from automated investigation to automated fixing. Many issues can be resolved automatically if we integrate our systems well. Observability tools can reason about performance and reliability, and they can create code or configuration changes on our behalf. These changes can be applied automatically, or we may need to approve them explicitly. This way, we can have the issues fixed immediately without any work done on our end.

Last but not least, we need to prevent issues from happening. We can’t move fast if we need to roll back often or if we fail along the way. Our goal is not to fix issues fast. Our ultimate goal is not to have the issues at all. This is what it takes to be agile. This may be hard to achieve and we may need to take many steps to get there, but this is the north star we’re aiming for.

Metis lets you avoid all these problems. Metis can check your changes even before you commit them to the repository. It analyzes queries, schema migrations, execution plans, performance, and correctness of everything you push through your pipelines. Metis integrates with your CI/CD and prevents bad changes from reaching production. But it doesn’t stop there. Metis gives you observability around your production database and connects all the dots for you. It analyzes database-oriented metrics, monitors deployments, extensions, and configurations, fixes issues on your behalf, and alerts you when things can’t be fixed automatically. This way you can move faster and automate everything in your CI/CD pipeline.

Let’s Do It Now

Being agile in the database world is about preventing issues from happening, moving towards automated understanding and fixing, and having database-oriented checks along the way. We can’t use the same tools and processes as we did many years ago. We need to adopt a new approach and have tools that support us. Database guardrails give us all of that. They can protect our developers from creating slow code, analyze schemas and configurations, and verify every part of software development in our pipelines. They can also turn raw monitoring signals into automated understanding that explains what happened and how to fix it. We need that no matter where we work. The world will not get less complex, and we need to build new tools and processes to move even faster.