In collaboration with a leading international bank, our recent challenge centred on overseeing and ensuring the accurate migration of data between two databases. This critical task was integral to the migration of business processes from a legacy mainframe application to a modern system with an entirely revamped structure. As the new processes were developed directly on the contemporary application, the challenge lay in constructing queries that meticulously compared data between the two databases, considering essential transformations. The complexity increased due to the need for content-accurate and consistently structured queries, given the vast volume of data points involved.
When transferring a data point in a 1-to-1 manner, the tests were relatively straightforward, with a low probability of errors. However, even in such cases, tests were imperative to account for potential loss of information during the transfer, leading to an incomplete dataset. Conversely, a significant portion of data points in the new database resulted from the combination and transformation of multiple pieces of information. Here, a profound understanding of the underlying business logic was crucial, demanding meticulous testing to identify mismatches and mitigate the likelihood of migration errors.
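Both kinds of check can be captured in a single comparison query. The sketch below, using illustrative table and column names rather than the client's actual schema, verifies a 1-to-1 field (a balance that should survive migration unchanged) and a transformed field (two legacy name columns combined into one):

```python
import sqlite3

# Hypothetical schemas for illustration only; the real tables and
# transformation rules belonged to the client's migration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE legacy_customer (cust_id INTEGER, first_name TEXT, last_name TEXT, balance REAL);
    CREATE TABLE new_customer    (cust_id INTEGER, full_name TEXT, balance REAL);

    INSERT INTO legacy_customer VALUES (1, 'Ada',  'Lovelace', 100.0),
                                       (2, 'Alan', 'Turing',   250.0);
    INSERT INTO new_customer    VALUES (1, 'Ada Lovelace', 100.0),
                                       (2, 'Alan Turing',  999.0);  -- 1-to-1 mismatch
""")

# 1-to-1 check: balance must be identical on both sides.
# Transformation check: full_name must equal first_name || ' ' || last_name.
mismatches = con.execute("""
    SELECT l.cust_id,
           CASE WHEN l.balance <> n.balance THEN 'balance' END AS bad_1to1,
           CASE WHEN l.first_name || ' ' || l.last_name <> n.full_name
                THEN 'full_name' END                           AS bad_transform
    FROM legacy_customer l
    JOIN new_customer n USING (cust_id)
    WHERE l.balance <> n.balance
       OR l.first_name || ' ' || l.last_name <> n.full_name
""").fetchall()

print(mismatches)  # → [(2, 'balance', None)]
```

The transformation check is where the business logic lives: the concatenation rule here stands in for whatever mapping the new application actually applies, and each such rule needs its own test.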
To facilitate testing, data before and after migration was made accessible on a SAS server. This enabled the team to develop SQL queries, bringing information from both systems together for thorough testing. The resulting scripts were designed to run frequently, ideally on a daily basis, with all test results logged in a central dataset and visualized in a meaningful dashboard.
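A central results dataset of this kind can be as simple as one row per test module per daily run, which a dashboard then reads. The following is a minimal sketch with assumed table, column, and module names:

```python
import sqlite3
from datetime import date

# Sketch of a central result log: one row per test module per daily run.
# Table, column, and module names are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE test_results (
        run_date    TEXT,
        module_name TEXT,
        n_checked   INTEGER,
        n_incorrect INTEGER
    )
""")

def log_result(module_name: str, n_checked: int, n_incorrect: int) -> None:
    """Append one test outcome to the central dataset."""
    con.execute(
        "INSERT INTO test_results VALUES (?, ?, ?, ?)",
        (date.today().isoformat(), module_name, n_checked, n_incorrect),
    )

# A daily run would call log_result once per module:
log_result("customer_address", n_checked=125_000, n_incorrect=42)
log_result("account_balance",  n_checked=98_000,  n_incorrect=0)

rows = con.execute("SELECT module_name, n_incorrect FROM test_results").fetchall()
print(rows)  # → [('customer_address', 42), ('account_balance', 0)]
```

Keeping a timestamped row per run is what makes the trend visible: the dashboard can plot `n_incorrect` over `run_date` and show whether remediation is working.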
Step by step, we tackled the different data topics that were identified in cooperation with our client’s experts, who work daily with this data. This whole project was, in fact, an excellent example of strong teamwork. Throughout the project, we worked closely with our client’s international data team, their customer data team, and the teams in charge of both databases.
In line with our client’s priorities, we created several SQL modules in a SAS environment to keep track of incorrect data points: for example, data points that appear in only one of the two databases, or data points whose information does not match between the two. After finishing a module, we further enhanced, optimized, or refactored it depending on its results and performance.
Those SQL modules were complemented by a Kibana dashboard that presents the information on incorrect data points in an easy-to-read format. Any spikes appearing in the graphs are investigated so that our client’s data teams can undertake remediation actions. Ultimately, this ensures that the number of incorrect data points shown in the graphs steadily decreases as the information transfer continues.
To meet these requirements, we established a robust, automated process for verifying the correctness of data transfers and detecting any inaccuracies. Additionally, for auditability, it was imperative to maintain a detailed record of all actions taken. We also adhered to relevant privacy standards throughout the project.
Our contribution has been all about empowering the data teams. With the implementation of a dashboard and the utilization of SQL queries, we've enabled these teams to effectively address and resolve database discrepancies. Since Datashift's involvement in the project, there has been a noticeable surge in the creation and tracking of modules, highlighting our commitment to enhancing data quality. We've instilled a culture of continuous improvement, where established workflows are regularly scrutinized and adapted for heightened performance and efficiency. The project finished on time, concluding successfully in November.
The introduction of data views has significantly benefited the client, offering a swift and convenient means of assessing data quality. This newfound capability allows for the rapid identification of inconsistencies, which, in turn, enables the team to promptly address underlying issues. Our commitment to providing these valuable tools has not only elevated data quality but has also streamlined the entire data management process, facilitating a smoother and more efficient operation.