Automate Data Warehouse Testing

Automated Data Warehouse Testing

Automate Data Warehouse Testing

This book aims to assist QA (Quality Assurance) teams to automate data warehouse testing. The complexities and challenges of data warehouse development have evolved over the past few years and testing teams are now expected to verify each implementation by running tests multiple times to ensure the system hasn’t regressed. Volumes of data are fed into the data warehouse through heterogeneous sources having data quality issues at source. Extract, Transform, Load (ETL) processes have grown too complex. Data warehouse testing is usually associated with a very high cost of quality and any defect slippage has a significant effect on downstream end-user systems. As a result, the ability to perform automated data warehouse testing becomes extremely useful as it provides testers and the organisation as a whole an added level of confidence.

One good thing about automation is that you can initiate your tests before leaving the office and then, when you get back in the morning, your test results are ready for review!

There are usually millions of records in a data warehouse and it is not possible to check all of this data manually. You may have an automated test that picks up 1000 records from the sales table and verifies the ETL logic while you are getting your tea! Furthermore, this becomes even more powerful when you consider the fact that this test may be picking up a different set of 1000 records each time it is executed, providing additional data coverage on each execution.

“With an automated test in my quiver, I always feel like I have an extra pair of hands.”

With each additional automated test you can concentrate on other important tasks while your ‘automated buddy’ is executing the tests. The help from this buddy becomes even more significant when as a testing team your are given a new build to test at the eleventh hour and in a few hours are asked “What do you guys think about the new build?” In situations like this, you can feel positive that you have some automated tests that will give you a degree of confidence about the quality of the build.

I have experience working in a Quality Assurance role, on a similar data warehousing project and thought it would help other people like me who perhaps need a helping hand in testing.

Automated data warehouse testing is a powerful, rare and exciting new skill that can benefit any testing team.

This book is aimed at novice to intermediate level testers who want to try their hands at automated testing. It provides a step by step guide that will teach you how to setup an Automation Framework from scratch. As we will see, the framework is quite generic and as such can be applied to automate most data warehousing projects. This book concentrates on the ‘practical side’ of automated testing rather than the ‘theoretical side’. It includes a complete listing of the automated code for the sample data warehouse that we will setup for testing. The code listings explain the logic for the individual tests and generic functions.

As usual, the book is divided into a number of chapters.

  • Chapter 1, Getting Started, introduces the data warehouse architecture. It provides a high level explanation of the components of a data warehouse and explains how they fit together. It also summarises the main reasons to automate testing and provides some golden rules for you to follow when automating tests.
  • Chapter 2, Sample Dimensional Data Warehouse, discusses the three models of a system – the business, the logical and the physical model. For automation purposes, we will need a example data warehouse and this chapter sets the scene for this. Throughout this book we will consider one hypothetical system called Order Processing System in its source form (source system) and convert it to a star schema for our dimensional data warehouse (target system). We will then go through an important document called Data Mapping which is a primary input to QA test design, in addition to other documentation available in the project. We will identify the the source-to-target mappings required for our sample data warehouse and see how to derive our expected and actual results from this mapping document.
  • Chapter 3, Automation Approach, deals with the very important need for a framework which is the driving force behind a successful automation. It looks at the software you will need to setup the Automation Framework and one piece of good news is that all of this is open source and freeware! It also discusses the Automation Framework components. We will write our first automated test in this chapter, and expand on it to add a number of important features to the framework such as logging to console files (so that you can refer back when required) etc.
  • Chapter 4, Setting up Sample Data Warehouse, describes how to set up the databases that act as our source systems, staging area and data warehouse. Each database is defined in three stages – creating the database files, defining the schema and finally loading the data.
  • Chapter 5, Automating Staging Area, is where the real automation of the data warehouse testing begins, when the data is transferred from the source systems to the staging area and we ensure that the transferred data is correct. Before we finish this chapter we will add further features to the Automation Framework.
  • Chapter 6, Automating Dimensional Data Warehouse, deals with automating the testing of the dimensional data warehouse and the type of tests that are applicable. We will then automate the fact table in our sample data warehouse and add further important features to the Automation Framework.
  • Chapter 7, Automating Other Data Sources, looks at automating the data feeds, as the data in a data warehouse can arrive from a variety of sources. This chapter deals with automating the data feeds from commonly used CSV and Excel files. We will cover XLSX and XLS Excel formats.
  • Chapter 8, Automated HTML Test Report, describes how to report on the test results. The ability to run a lot of tests is of little benefit if the results of the tests are not easy to understand. We need the framework to automatically generate reports from the tests and present the results in an easy to read format. A report should provide details of where the failures have occurred and what test data was used, including what SQL queries were fired on the source and target systems. This chapter explains how to achieve this and we will generate a formatted HTML report of our testing. The report also generates a test summary for the senior management.
  • Chapter 9, Data Profiling, deals with how to automate some of the data quality checking tasks that are very helpful in identifying faults.

A unique step by step guide for testers to perform Automated Data Warehouse Testing.