Unit Testing for Data Warehousing / Database Development
In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).
I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.
In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.
Why Unit Testing?
The concept of unit testing in software development exists already for a long time. Though for database development it is less known and popular. However, unit testing can be very powerful and important for several reasons.
The first two reasons that I want to highlight are data accuracy and integrity, I mean, which data engineer didn’t ever make a mistake during development in one of the SQL scripts that caused duplication of records? Or, that you configured a parameter the wrong way? By creating unit tests, you can catch a lot of these mistakes.
Unit testing can be very helpful, the use of them can identify coding issues early in the development process, which can help for the overall reliability of data warehouse or database development work.
Key-Concepts
Before we get started with the practical side of unit testing, we need to understand the fundamentals and key concepts.
To get started, we need to identify testable functionality, a dimension load for example, updating a specific column in a database by using a Stored Procedure… The most important is that we need to be able to test these components completely isolated.
To get this done, we will need to mock our data, which needs to cover various scenarios, for example to unit test a fact table load for sales, we need test data for direct sales, returns… depending on the environment. All my examples are with a retail environment in mind.
For every functionality, we will need to write different test cases where we will be defining our input data, the expected output data, and the assertions we are making. (how our input data should look like after executing the functionality) For example, we mock input data for sales with a sales transaction and how the LOB system would generate it. We also prepare the expected output for our sales load, and how we want to represent the input data in our Sales fact in the end. Finally, we compare our expected output with the output generated by the sales load.
As soon as we have prepared our test cases it is time to execute our unit tests.
Tools
There are a couple of unit testing frameworks available, since we are focussing on SQL unit testing, the most relevant unit testing frameworks would be
- tSQLt: tSQLt – Database Unit Testing for SQL Server
- SQL Server Unit Tests (included in SSDT): Verifying Database Code by Using SQL Server Unit Tests – SQL Server Data Tools (SSDT) | Microsoft Learn
- Redgate SQL Test: SQL Test – SQL Server Unit Testing Tool From Redgate (red-gate.com)
Since I want to keep my unit tests solely in T-SQL and want to keep it completely free, I will be diving into tSQLt in future data recipes. And, since I’m a big fan of Database Projects, we will be using a Database Project to set up our Database Unit Test Cases.
CI/CD Integration
Unit Testing automation will come into the picture later on, to make sure that all the unit tests are successfully passed with every single deployment. In future data recipes, we will go through the steps to take to automate our unit tests with Azure DevOps
Next Data Recipe in this series: Getting started with tSQLt