Defending your database delivery pipeline against drift

TL;DR

The simplest way to prevent drift is to prohibit direct changes to the production database. But prohibition alone is no panacea: until the gaps in your delivery capability are identified and addressed, an adaptive approach to drift may be required.

If, after assessing your environment, you expect it to be affected by drift at some point, you might be thinking about next steps. While there are some useful techniques and tools in this space, these tend to address the symptoms of the problem rather than its underlying causes. In this post I'll look at a couple of ways of dealing with drift in an enterprise environment, including the trade-offs involved with each approach.

The Non-negotiable approach

One method of attempting to tackle drift is to simply adopt a hard-line position on all database changes, decreeing that under no circumstances should the pipeline be bypassed in favour of out-of-process deployments (be they automated or otherwise). The obvious advantage of this approach is its simplicity: by making compliance a clear-cut issue, the scope for misinterpretation is virtually nil. But while clearly communicating expectations is important, toeing such a strict line may not necessarily work in your favour.

Why issuing ultimatums can be an own goal

In the previous post, I outlined some of the ways in which a database pipeline can fall short of an enterprise's delivery needs. And while there is a lot of truth to the adage "if it hurts, do it more often", attempting to conquer every change use case before firmly grasping the constraints of your environment may turn out to be counterproductive.

What I've witnessed in teams that follow such a strict approach is that it tends to lead to creative workarounds being deployed to production covertly (e.g. schema changes performed via application code). While this may reflect deeper cultural issues in an organisation, it's important to consider the team's appetite for large-scale change at the outset of any process improvement. The last thing you want to hear when embarking on such an ambitious project is the phrase "let's just go back to the old way", before you've even had a chance to show the value of your pipeline.

Fortunately there is another way to deal with drift that doesn't involve taking such a hard-line stance; instead of requiring a big bang process change to take place, it favours a more flexible approach to how the pipeline is introduced.

The Tolerance-within-limits approach

Rather than seeing drift as something to be avoided at all costs, an alternative is to build a degree of tolerance into the pipeline so it can accommodate changes applied via an alternate deployment method. At first this may seem like giving up and letting chaos reign, but as I hope to illustrate there is a difference between taking a calculated decision to relax the rules versus simply abandoning them altogether.

Understanding your constraints

Say, for example, you anticipate that your pipeline will be able to handle 9 out of 10 database change use cases. But every now and again that pesky 1 in 10 comes along that your pipeline doesn't support, for example:

  • When a data change requires capabilities that your dev tooling/migration framework doesn't have (e.g. the ability to atomically apply changes across database boundaries; see the sketch after this list)
  • When the data change will need to be performed in batches over a number of days (more on this below)
  • When a COTS software vendor needs to apply a change to your database with a custom tool
  • When you're doing almost anything with SQL replication
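As a concrete illustration of the first case, here's a minimal, hypothetical sketch of the kind of change a DBA might apply by hand: a single transaction spanning two databases on the same instance, which most migration frameworks (scoped to a single database) can't express. All database, table and column names are invented for illustration.

```sql
-- Hypothetical: record an order's invoice in a second database atomically.
-- A cross-database transaction on the same SQL Server instance is atomic,
-- but most migration frameworks only target one database at a time.
BEGIN TRANSACTION;

UPDATE SalesDb.dbo.Orders
SET    BillingStatus = 'Invoiced'
WHERE  OrderId = 12345;

INSERT INTO BillingDb.dbo.Invoices (OrderId, Amount, CreatedAt)
VALUES (12345, 99.95, SYSUTCDATETIME());

COMMIT TRANSACTION;
```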

Until such time as the pipeline can handle every single one of your deployment use cases (assuming that goal is achievable or even desirable), adopting a more tolerant approach toward drift may be needed.

Pragmatically dealing with drift

Think of it as approaching the problem with an adaptive mindset: by taking a hard look at your pipeline's capabilities (or lack thereof), you'll be in a better position to serve the needs of each change use case that comes along. If you're just starting to think about automating database deployments, this approach also has the benefit of lowering the barrier to entry as you'll be able to start your pipeline off with a reduced scope.

However, creating a drift-tolerant pipeline is not simply about opening up the production environment to any type of ad hoc method of change. It requires assessing the risk of each candidate change use case and putting in place the appropriate checks and balances to prevent impact on the delivery pipeline.

Part 1: Defining the pipeline scope

Start with the default position that everything should go through the delivery pipeline. Then, look at each change outlier on its merits to decide whether it is an acceptable candidate for an alternate deployment method. Consider what kind of impact drifted objects would have on the ability to effectively develop, test and release new application functionality via the delivery pipeline. For example, including table schema changes and stored procedure/view/function changes in source control can be vitally important to the calling application (not to mention for traceability), so these should usually be kept within the pipeline scope.

Conversely, anything that isn't critical to the functionality of the calling application may be a candidate for an alternate deployment method. While generalising about suitable change types is difficult, there are a few specific cases that I've seen work quite well outside of the pipeline:

  • Storage object changes: dataspaces like files and partitions may be critical to the performance of the live environment, but they may not be functionally important to the calling application at all, so they can be excluded from the pipeline scope
  • Data transformations: if data movement needs to occur over a period of days, running it through the pipeline might hold up other changes from getting through to production if the pipeline can't handle concurrent deployments, or if the tool has a hard timeout (e.g. Azure DevOps will halt a release after 48 hours); see the batching sketch at the end of this section
  • Regular maintenance tasks: updating statistics and extending partition ranges are often best left to CRON jobs (although the code for those jobs can always go via the pipeline)
  • Security object changes: adding users/groups to roles is often better handled outside the pipeline, especially when production is located in a separate domain. Note: I'd still recommend controlling object-level permissions via the pipeline, to avoid accidental loss of access arising from DROP/CREATE-style changes; following role-based access control helps keep clear lines between pipeline and non-pipeline changes (see the sketch after this list)
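To illustrate that role-based split, here's a minimal sketch with hypothetical role, table and group names: the role and its object-level permissions are deployed by the pipeline, while membership changes are applied directly by an administrator.

```sql
-- Deployed via the pipeline: the role and its permissions live in source
-- control, so a DROP/CREATE-style change can re-grant them without
-- anyone losing access.
IF DATABASE_PRINCIPAL_ID('app_reader') IS NULL
    CREATE ROLE app_reader;

GRANT SELECT ON dbo.Orders   TO app_reader; -- hypothetical table
GRANT SELECT ON dbo.Invoices TO app_reader; -- hypothetical table

-- Applied outside the pipeline (e.g. by a DBA, per environment):
-- ALTER ROLE app_reader ADD MEMBER [CONTOSO\ReportingUsers];
```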

When choosing which objects to accommodate via an alternate deployment method, there is a balance to be struck between risk and benefit. Don't be afraid to readjust the pipeline scope as developments tip the scales in one direction or the other.
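And returning to the data-transformation case above, here's a minimal sketch of the batching pattern that is typically run as a stand-alone script or agent job rather than as a pipeline deployment. Table and column names are hypothetical.

```sql
-- Hypothetical long-running backfill, run in small batches outside the
-- pipeline so it doesn't block other releases or hit deployment timeouts.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) dbo.Orders
    SET    LegacyStatusCode = NewStatusCode
    WHERE  LegacyStatusCode IS NULL
           AND NewStatusCode IS NOT NULL;

    SET @rows = @@ROWCOUNT;

    -- Pause between batches to limit blocking and transaction log growth.
    WAITFOR DELAY '00:00:05';
END;
```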

Part 2: Keeping the baseline in sync

The practice of occasionally refreshing databases in the dev, QA and staging environments with sanitised clones of the production database is as old as time. Whether or not you consider this a good practice, keeping lifecycle environments in sync is generally accepted as being critical to reliable test outcomes (i.e. if it worked in dev, it should also work when released to prod). Given that drift has a tendency to arise even when the standard process is strictly followed, refreshing the lifecycle environments in this way is often a necessary evil.

And if it's necessary to refresh lifecycle environments, then it follows that source control should be refreshed too, especially when you consider that developers need to be able to rely on the baseline as an accurate starting point for new changes.

Source and Target drift

For the sake of clarity, I've focused on dealing with target drift in this post (i.e. the type of drift that typically results from direct changes to the production environment) as opposed to source drift (which originates in your migration scripts). However, as the following steps will actually resolve both types of drift simultaneously, your schema/data comparison tool may report some unexpected differences.

Steps for migration frameworks

If you're using a migration framework for database development (e.g. Flyway, DbUp, Redgate SQL Change Automation etc.), the process for synchronising the baseline will be along these lines:

  1. In source control, create a new branch at the tag or commit corresponding to the most recent production deployment
  2. Use your migration framework to deploy the schema from scratch to a test SQL instance
  3. Use a comparison tool, like Visual Studio Schema Compare or Redgate SQL Compare, to compare the production environment (from a snapshot file, preferably) to your test SQL instance
  4. Select the drifted objects that you wish to import back into source control, and save the generated change script as a new migration
  5. Modify the migration to make it idempotent, so that the script won't attempt to re-apply changes that already exist in production (see the sketch below)
  6. Commit and merge the migration file to trunk
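As a minimal sketch of step 5, assuming the drift was a column added directly to production (table and column names are hypothetical), a guard clause turns the migration into a no-op wherever the change already exists:

```sql
-- Imported drift: a column that was added directly to production.
-- The IF NOT EXISTS guard means the migration applies the change in
-- dev/QA/staging but does nothing when it reaches production.
IF NOT EXISTS (
    SELECT 1
    FROM   sys.columns
    WHERE  object_id = OBJECT_ID('dbo.Orders')
           AND name = 'ShippedAt'
)
BEGIN
    ALTER TABLE dbo.Orders ADD ShippedAt DATETIME2 NULL;
END;
```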

Keeping static/reference/lookup data in sync

To refresh lookup data from production, simply repeat the above steps, swapping the schema comparison tool in step 3 for a data comparison tool.
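One way to express the resulting sync by hand is an idempotent MERGE per lookup table; here's a minimal sketch with hypothetical names and values:

```sql
-- Sync a lookup table to match the values captured from production.
-- MERGE keeps the script idempotent: re-running it leaves the table unchanged.
MERGE dbo.OrderStatus AS target
USING (VALUES
    (1, 'Pending'),
    (2, 'Shipped'),
    (3, 'Cancelled')
) AS source (StatusId, StatusName)
ON target.StatusId = source.StatusId
WHEN MATCHED AND target.StatusName <> source.StatusName THEN
    UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StatusId, StatusName) VALUES (source.StatusId, source.StatusName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```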

Steps for desired-state tools

If you're using a desired-state tool for database development (e.g. SQL Server Data Tools or SQL Source Control):

  1. In source control, create a new branch at the tag or commit corresponding to the most recent production deployment
  2. Use Visual Studio Schema Compare/Redgate SQL Compare to compare the production environment to the project folder containing your object file declarations
  3. Select the drifted objects and click the Update/Synchronize button
  4. Commit and merge the file(s) to trunk

The importance of refreshing prior to deployment

Tools that use a desired-state approach (as opposed to the incremental style of migration frameworks) can cause anything not present in source control to be reverted in the target environment at deployment time. Fortunately, the creators of these tools saw fit to build in a number of options to prevent this from occurring. For example, SSDT includes the ability to block deployment if drift is detected (the BlockWhenDriftDetected publish option, for databases registered as a Data-tier Application).

Steps for tools that support an external baseline

Tool vendors are starting to realise the value of simplifying the way that drift is managed. One notable example is Redgate SQL Change Automation's integration with their SQL Clone product, which allows the baseline to be sourced from a full snapshot of the production database, as opposed to storing/maintaining the baseline as a set of migrations (disclosure: I was previously employed by Redgate). Read more about how to set up this integration.

The possibility of using an external baseline is an exciting development that I'll explore more in a later post.

Conclusion

The inevitability of drift requires an open mind about the types of solutions that can help mitigate its effects. Protecting the integrity of the pipeline starts with getting to grips with the limits of its capabilities and then channelling efforts into either addressing those limitations or allowing certain types of changes to take an alternate path to production.

While it might feel unsatisfying to let the occasional change type run outside of the pipeline, remember not to let the perfect be the enemy of the good. A lot of value can be gained through automation even when it is limited in scope, and having the flexibility to choose the right tool for the job is valuable in and of itself.

Continue reading

View a summary of key points from this blog series

Image credit: Albert Pego