Platform Engineers Must Change Developers and Databases - And Here Is How

We have gone through a big transformation in recent decades. We learned how to deliver software better, and we devised new practices and patterns to streamline the work between teams. However, the current software development lifecycle (SDLC) is still far from perfect. Teams spend too much time handing off artifacts, they communicate inefficiently, and they lack proper checks early in the pipeline. Platform Engineers can reshape the industry by shifting guardrails left to protect production, but they need proper tools and processes to do so. In this article, we are going to see what Platform Engineers can do to make Developers’ lives better.

The Inefficiency of Today’s Software Engineering

Twenty years ago, we had no cloud environments. Most software ran on on-premises servers. Applications were mostly self-contained and limited to only a few building blocks. One application had one database, a couple of web servers, and file storage. When there was a bug, we knew where to look for the logs and traces, and how to troubleshoot.

However, we didn’t have teams that were capable of delivering the software efficiently. Developers worked on their changes, then packaged them into changesets that were handed to system engineers to be deployed later on. Effectively, Developers were not involved in deployment and maintenance. When there was a bug, system engineers had to step in and troubleshoot the issue. When Developers were needed, the communication barrier was inevitable and led to a much longer remediation process.

We learned that Developers can’t be excluded from the deployment and maintenance. Initially, we just postulated that both Developers and system engineers should work hand-in-hand to streamline the operations. We called it DevOps and saw that it was good. However, later on, we realized that it’s still inefficient when two teams work together but are still independent and separate. This is how DevOps Engineers emerged and this is how we decided to merge competencies to make both the development and the deployment much smoother. Now, DevOps Engineers can develop the business code and deploy it. They can deploy the cloud infrastructure using Infrastructure as Code (IaC) solutions. They have the tools and the processes to efficiently develop, deploy, and troubleshoot their web applications and infrastructure.

The world has changed significantly now. We use microservices, each small application has an independent database, and the services talk to each other. When there is a bug, it is unclear where to look for it. The complexity of communication is higher than ever before, the systems are heavily distributed, and signals are scattered throughout the ecosystem. While we can deploy components much faster now, we struggle to manage the complexity. We don’t have effective solutions to prevent the issues from happening in production. We lack processes that would streamline the debugging. Most importantly, we can’t scale our teams.

When an application had just one database, everything was easier. A team of Database Administrators was enough to manage the database, configure replication, maintenance tasks, partitioning, and optimizations. Deployments were rare, so it was easy to notice bugs and help with performance issues. However, today we deploy many applications and we do that many times every day. If there is a performance drop, then it’s unclear which microservice is responsible for that. Even if we can pinpoint the problem, we can’t scale the team of Database Administrators to fix all the issues in time. We lack the tools and processes to do so.

Platform Engineers need to change that. Just like we learned that Developers and System Engineers can work together and then turned them into DevOps Engineers, we need to give Developers ownership of all the components they work with. They already own the deployment, the CI/CD pipelines, and the IaC pieces. However, they still don’t own the databases. They can’t take control of their data because they can’t easily monitor it, they don’t have tools to automatically troubleshoot issues, and they lack a process aligned with their CI/CD pipelines that would prevent faulty changes from being deployed to production. Platform Engineers can change that. Let’s see how.

How to Give Developers the Ownership

There are three parts that Platform Engineers need to cover to let Developers own their databases. They are:

  • Tools and processes for preventing faulty code from reaching production

  • Means for semantic monitoring of the databases

  • Solutions for automated troubleshooting

Let’s go through these areas one by one to understand what we need.

Preventing Faulty Code from Reaching Production

Many issues can arise around databases without anyone noticing: the N+1 queries problem, missing or excessive indexes, eager versus lazy loading in ORMs, schema migrations, and impedance mismatch, just to name a few. We covered them in greater detail in our article about ORMs’ bad sides.

It’s important to understand that Developers can’t prevent these problems with what they have today. They don’t have good tools and processes that would let them spot performance issues while they develop their applications. They don’t test databases properly, as we described in our article on how to test databases, because current CI/CD solutions and the testing pyramid are unable to detect those issues. Unit and integration tests focus on data correctness. They won’t check whether we face the N+1 queries problem, whether we use an index, or whether we decrease performance by rewriting the code to use a Common Table Expression (CTE). Unit and integration tests verify if we get the right data, not if we get the data right. Load tests can help, but they happen way too late in the pipeline. They are executed nearly at the end of the deployment pipeline, long after we wrote, reviewed, and merged the code. While load tests can identify some issues, they don’t save Developers’ time.
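
To illustrate the gap between “the right data” and “getting the data right”, here is a minimal sketch using Python’s built-in sqlite3 module. The schema (authors, books) and all function names are assumptions made up for this example. Both functions return identical results, so a correctness test passes for either one, yet they issue very different numbers of queries.

```python
import sqlite3


def build_db():
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author-{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                     [(i, (i % 5) + 1, f"book-{i}") for i in range(1, 11)])
    conn.commit()
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one extra query per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_joined(conn):
    """The same data fetched with a single JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


def count_selects(conn, fn):
    """Run fn(conn) and count the SELECT statements SQLite executed for it."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        value = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return value, len(selects)


conn = build_db()
slow, slow_count = count_selects(conn, titles_n_plus_one)
fast, fast_count = count_selects(conn, titles_joined)
print("same data:", slow == fast, "| queries:", slow_count, "vs", fast_count)
```

A test that only compares the returned dictionaries cannot tell these two implementations apart. A check on the query count, such as a budget asserted in the test suite, is one possible way to catch the difference before the code is merged.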

However, as we described in our database testing tutorial, we can build proper database guardrails that help Developers detect these issues early and push all the checks as far to the left as possible. Database guardrails can detect unused indexes, wrong configuration, performance issues, and wrong ORM settings right when it’s needed: when Developers write their code. We can do all of that before Developers make a single commit, which significantly decreases the turnaround time. This way Developers can finally own the performance of their databases without hampering their productivity.
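
As a rough sketch of what such an early check could look like, the snippet below asks SQLite for a query plan and flags full scans. It is not how any particular guardrail product works; the table, the index name, and the full_scans helper are hypothetical, and it relies on SQLite’s EXPLAIN QUERY PLAN output, whose detail text starts with “SCAN” for a scan and “SEARCH” for an index lookup.

```python
import sqlite3


def full_scans(conn, sql, params=()):
    """Return the plan steps in which SQLite scans instead of searching.

    Assumption: the fourth column of EXPLAIN QUERY PLAN rows is the
    human-readable detail, which begins with "SCAN" for a scan.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in plan if row[3].startswith("SCAN")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id the filter needs a full table scan.
before = full_scans(conn, query, (42,))

# After adding the index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = full_scans(conn, query, (42,))

print("scans before index:", before)
print("scans after index:", after)
```

A check like this can run locally or in a pre-commit hook against the queries an application issues, so a missing index is reported while the code is being written rather than after a load test.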

Semantic Monitoring of Databases

Another aspect that Platform Engineers need to handle to let Developers own their databases is monitoring. Current monitoring solutions are far from perfect. They swamp users with raw data, they aggregate signals in ways that hide problems in specific user cohorts, or they don’t allow for easy debugging to figure out where the issues are. We explain this problem more in our article covering the difference between monitoring and observability.
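
A small worked example shows how aggregation can hide a cohort-level problem. The tenant names, latency numbers, and the 100 ms threshold below are invented for illustration only.

```python
from statistics import mean

THRESHOLD_MS = 100  # hypothetical alerting threshold

# Made-up query latencies (milliseconds) grouped by tenant.
latencies_ms = {
    "tenant-a": [20, 22, 19, 21] * 25,          # 100 fast requests
    "tenant-b": [18, 23, 20, 19] * 25,          # 100 fast requests
    "tenant-c": [900, 950, 1000, 1050, 1100],   # 5 requests, all slow
}


def slow_cohorts(samples, threshold_ms):
    """Return the cohorts whose mean latency exceeds the threshold."""
    return sorted(
        name for name, values in samples.items() if mean(values) > threshold_ms
    )


# The global mean mixes all tenants together and stays under the threshold.
overall_mean = mean(v for values in latencies_ms.values() for v in values)

# The per-tenant view exposes the one cohort that is suffering.
flagged = slow_cohorts(latencies_ms, THRESHOLD_MS)

print("overall mean below threshold:", overall_mean < THRESHOLD_MS)
print("cohorts above threshold:", flagged)
```

An alert defined on the overall mean would stay silent here, even though every request from one tenant is slow. Breaking the signal down by cohort is what makes the problem visible.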

To let Developers own their databases, we need to build tools that are aware of database-related activities and how Developers work with them. Database monitoring tools need to understand schema migrations, maintenance tasks, different hosting methods, multi-tenant applications, database extensions, configuration, and many other aspects. We can’t ask Developers to own their databases if monitoring tools swamp them with raw data without an explanation of what actually happens in the system.

However, Platform Engineers can move from telemetry and monitoring to understanding and observability, as we described in our article covering observability. Platform Engineers can incorporate tools that are database-aware, and then let Developers use them. Once it’s done, Developers can finally monitor their databases and understand how they evolve over time.

Automated Troubleshooting

Last but not least, Developers can’t own their databases if they need to do all the hard work and manual steps themselves. Setting thresholds, configuring alarms, reviewing dashboards, and correlating queries with REST requests can all be automated. Instead of having monitoring systems that report “high CPU usage”, we need a full story like “we deployed changes to production that modified the data distribution, which made the execution plan outdated, so the query in line 123 no longer uses the index”. This is the full story that we need.
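
The idea of turning a symptom into a story can be sketched as a simple time-window correlation of events. This is a deliberately naive illustration, not the method of any specific tool: the Event type, the event kinds, and the one-hour window are all assumptions, and real systems would need stronger evidence than timestamps alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    at: datetime
    kind: str    # hypothetical kinds: "deploy", "plan_change", "alert"
    detail: str


def explain_alert(alert, events, window=timedelta(hours=1)):
    """Chain the non-alert events that happened shortly before the alert."""
    earlier = sorted(
        (
            e for e in events
            if e.kind != "alert" and alert.at - window <= e.at <= alert.at
        ),
        key=lambda e: e.at,
    )
    if not earlier:
        return f"{alert.detail}: no related change found in the last {window}"
    chain = " -> ".join(f"{e.kind}: {e.detail}" for e in earlier)
    return f"{chain} -> alert: {alert.detail}"


noon = datetime(2024, 1, 1, 12, 0)
history = [
    Event(noon - timedelta(hours=5), "deploy", "release 1.4.1"),
    Event(noon - timedelta(minutes=30), "deploy", "release 1.4.2"),
    Event(noon - timedelta(minutes=10), "plan_change",
          "orders query switched to full scan"),
]
cpu_alert = Event(noon, "alert", "high CPU usage")

print(explain_alert(cpu_alert, history))
```

Even this crude version replaces a bare “high CPU usage” with a sequence that points at a recent release and a changed execution plan, which is the kind of narrative the paragraph above asks for.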

Platform Engineers need to provide Developers with tools that tell the story instead of just explaining the symptoms. This way Developers will be able to fix issues much faster, without spending time on tedious and mundane troubleshooting that involves getting logs from many places and using grep to look for correlation IDs. This needs to be automated based on what we know about databases. We described various strategies to improve database performance and the goal is to have them automated in the system. Once we have all these three areas covered with database guardrails, we can let Developers own their databases again. Let’s see what benefits we can get.

Benefits of Ownership with Database Guardrails

The most important benefit of using database guardrails and letting Developers own their databases is scalability. We don’t limit our teams anymore. We unlock their full potential and let them move at their fastest pace. Since they don’t need to work with other teams that lack the full context, they can work much faster and reduce the communication burden. Just as we learned that decreasing the communication time between Developers and System Engineers was only the first step and that we had to turn them into DevOps Engineers, we now need to remove the dependency on other teams. Developers don’t need to depend on System Engineers or Database Administrators anymore. Developers can finally maintain their databases.

This leads to much faster evolution. Since each database is now owned by the microservice owner, any issues with the database are also owned and resolved by the owner. We don’t need to centralize performance management anymore; we don’t need to keep teams of Database Administrators who know how to optimize but can’t cope with the pace.

Yet another aspect is reducing the bus factor risk. Since the knowledge of the database is no longer held only within one Database Administrators team but is shared across the team that owns the service, we don’t need to worry about people leaving the company or taking longer vacations. Teams can manage the database task handover just like they manage the regular development workstreams. This unlocks the potential of the agile process. Database tasks become just another kind of task that can be handled with the Scrum methodology.

Finally, Developers owning their databases can minimize the time needed for identifying and fixing the database issues. Developers don’t need to spend time on slow and mundane tasks anymore. They immediately see the issues (thanks to semantic monitoring), they know the full story (thanks to automated troubleshooting), and they can just fix the problems on their own. No more war rooms or call bridges to understand what’s going on.

Next Steps and Look into the Future

Database guardrails mark a new era for Developers and databases, but it’s just the beginning. Thanks to machine learning, we can later turn automated troubleshooting into automated code changes. Just like we have static code analysis finding common issues in our programming languages, we can have tools that submit pull requests fixing typical issues, based on data captured automatically from the production database. Instead of raising a ticket to change the ORM configuration, database guardrails can change the code and just ask for approval.

Developers owning their databases will also be able to use their best practices around CI/CD to improve the state of databases. The testing pyramid will be extended to check not only business requirements saying “what to do” but also “how to do that”. We will not only do the right thing, but we will also do things right.

Finally, we’ll be able to decrease the communication bottlenecks between teams and across different roles. We’ll be able to move from DevOps to DevDbOps. That’s the direction we need to take to unlock Developers’ potential.

Summary

The world has become much more complex in recent years. We have more databases, more services, more communication, and more moving pieces. Just as we moved to DevOps and built CI/CD with Infrastructure as Code to deploy changes faster, we need to incorporate database guardrails to let Developers own their databases. It’s the role of Platform Engineers to push this new approach within their companies. Thanks to database guardrails and Metis, this is now easier than ever.