Practical story — how we conduct complex database testing.
I won’t tell you how to create a test plan, select/build test strategy and so on. A lot of articles describing this. Below I will give strict details based on our experience and share tips what we collect during database testing.
Our test team had following starting points:
- back end build using Oracle schemas;
- incomplete or outdated data mapping documentation.
Schemas structure:
- Schema A: 11 tables with columns range from 15 till 700+. It collects data from external source system as 1-to-1 relation.
- Schema B: 12 tables having approximately the same columns range but data in some columns received from schema A being aggregated, filtered and converted based on some simple rules.
- Schema C: 5 tables with columns range from 15 till 120. This schema is the main one from logic perspective as all calculations and data manipulations are performing here. Data collected from Schema B being aggregated, transformed and reworked according to business rules. Results are stored into this schema.
Relying on the BE structure and requirements it was agreed to split testing into 2 phases: phase 1 — include testing from source till Schema B; phase 2 — include testing of business rules and data from Schema B to Schema C.
Phase 1. Testing between Source, Schema A and Schema B.
1. Validate type and size of each column.
2. Validate number of records per table.
3. Validate data and transformation logic.
Phase 2. Testing between Schema B and Schema C.
1. Validate business logic for Schema C.
2. Validate data and some transformation logic between B and C schemas.
Details for Phase 1:
validation of type and size. This test is aimed to check that type and size of corresponding column between each two instances are correct and they also matching requirements. To make this test, we execute next query per table and store results for later comparison in a CSV file.
Select c.column_name, c.data_type, c.data_length
From sys.all_tab_columns c
Inner Join sys.all_tables t on c.owner = t.owner and c.table_name = t.table_name
Where c.owner = ‘${db_owner}’ and c.table_name = ‘${table}’
Order By c.column_id
Usually this type of validation is done across all involved databases. Of course there are can be exceptions. For our case we skipped this type of validation between source system as due to its limitation there are only two types of data, so it is useless to check.
Why we do it? This checks are important as if types will be not match — data can be incorrectly decoded during moving into next schema or front end/third party solution will be incorrectly behave. Like date type into numeric, etc. With size all slightly easy — data can be just shorten during its move to next schema.
validate number of records per table. This test is to confirm that records are not lost or added during transferring between schemas. For example using inner vs left joins, etc. To do this, Select count(*) From ${table} query is executed and results are stored in special CSV file for later use.
validation of data and transformation logic. This test is to validate that data flow starting from source dB until Schema B per every table is correct. This step is complex and most time consuming. In comparison with other tests this tests are highly dependant on SQL knowledge and test framework architecture. If with SQL knowledge all clear, then regarding test framework architecture it is important how independently keywords are created and how deep dependence between them is. Than they will be adaptive when new requirements will came up. To do this test, test team created complex SQL queries that are equivalent of child schema. As we were not able to test whole data, we select randomly 100 of unique IDs and use them in “where” clause.
Query example:
Select tab1.col1, tab2.col2
From table1 tab1
Inner Join table2 tab2 on tab1.col4 = tab2.col4
Where tab1.col4 in (‘unique ids list’)
Order By tab1.col1
Why we do it? Two main points: point 1 — validate that data is moving without loss; point 2 — validate DEVs work.
Details for Phase 2:
validate business logic. Business has some logic and rules according to which data is being aggregated. So we implemented that rules using SQL queries and compare results that we receive with existing results in Schema C.
Query example:
-- business logic 1
Select case when col1 is null and col2 = ‘oracle’ then ‘True’ else ‘False’ end Valid_val
From table1
Where deleted_flag != ‘Y’
validate data and some transformation logic. Here we made the same action as we did for validation of data and transformation logic in Phase 1 but taking into account business logic as due to it we are having limitation in data, like work with data that has deleted_flag = ‘Y’ only.
Our framework structure:
1. Suite setup.
1.1 We connect to source dB and select randomly 100 unique IDs. Store that IDs in temp variable.
1.2 Get total rows number per table for each data source.
1.3 Get columns type and size per table for each data source.
1.4 Connect to each data source and select data for the same table filtered by 100 IDs.
2. Start Tests execution.
Note: Ideally we could skip data validation for Schema A and validate data between source and Schema B to reduce time. But in our case data between source and Schema A moving as 1-to-1 with some minor transformations and between A and B schemas more complex transformations are taking place. So to avoid having too complex queries on level B we test all.
This was a brief description with some details of how we build our test framework and select tests. During implementation of tests and also after completion we collect some hints that I want to share:
Tip 1. Collect and work with data per table only.
Main problem that we had - Suite Setup from the very beginning was incorrectly build. We collect all data for ALL tables and then do all testing. This implementation sounds good as we have only one connection per database during whole test execution. But on the other side, in case of some change needs to be done in any select statement, for testing this — whole data should be retrieved. Also if some tests will fail during execution, to rerun them — again, whole data should be retrieved.
Tip 2. Do not rely fully on queries created by dev team.
Test team should work on creation it’s own queries to test: a) DEV team work; b) data flow. Main weak point here is that Test team can also made a mistake during queries creation. To minimise this, we added peer-review for each finished task by minimum of one teammate. Is case of complex queries, two teammates had to review the final work.
Tip 3. Work with CSV files instead of Excel.
As amount of data is quiet big firstly we used to work with Excel files. So we store data from all data sources in Excel and then do a comparison via Python. But later we realized that working with Excel files are consuming a lot of time. After making simultaneous tests using all available Python modules to work with Excel we switched to storing and comparing data in CSV format. This allow us to increase speed of suite setup and file comparison functions.
Tip 4. Store all queries in separate files.
As amount of queries and their complexity quiet big, it is useful to store them into separated TXT or SQL files. This will help to debug a query easily and also be better understandable for newcomers.
Data Base testing is complex thing. But as any kind of testing, it will become easy task if to plant carefully and decompose whole ‘testing’ into smaller and separated items. Of course, main point here is good knowledge of SQL. But imho if test team work with close cooperation and had good planing, all goals can be archived on a high level.
Thank you for reading!