Monitoring data quality just got easier with Soda

5 September 2022

In many of our recently completed client projects, data quality emerged as a hot topic. For our clients, we’ve co-developed a data quality solution based on the Soda Cloud platform. Soda is a new kid on the block when it comes to data quality. It excels at making it easy for business users to monitor the health of their data, and lets them define data quality rules without a high level of technical knowledge.

Building a tailor-made data quality solution on Soda Cloud

Soda Cloud is a data quality platform that enables business teams to monitor the quality of their data through a series of dedicated health dashboards, and dive deeper into data sets whenever they don’t meet data quality rules. Partnering up with our clients and the Belgium-based company Soda, the developer of the platform, Datashift implements tailor-made data quality solutions on Soda Cloud.

[Figure: Monitoring dashboard in Soda. Caption: Monitoring your data quality with Soda]

During such an implementation, we typically connect Soda Cloud to the most commonly used data sources, such as Athena, PostgreSQL, Snowflake and Apache Spark. We also work closely with our client’s business teams to translate their data expectations into Soda data quality rules and create the related data health dashboards.

Typically, we then define several monitors in Soda that automatically detect data issues whenever business data sets are scanned, so that clients can resolve those issues long before they have a downstream impact. In one use case, a client wanted Soda Cloud to support their Socially Responsible Investing, making it easier for them to halt or discontinue investments when the related data quality rules are no longer met. In another use case, data is bought from an external party and needs to pass quality checks before the company internalizes it. If the data fails those checks, the buyer is notified immediately and can refuse to buy the data. One might even consider laying down the data quality checks in the contract between the buyer and the supplier.
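To make the purchased-data use case more concrete, here is a minimal sketch in plain Python of such a quality gate. It does not use Soda itself: the check names only mimic the style of Soda checks, and the column names, sample records and the helpers `Check`, `missing_count`, `duplicate_count` and `run_gate` are all hypothetical, introduced purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named data quality rule (hypothetical helper, not a Soda API)."""
    name: str
    passes: Callable[[list], bool]


def missing_count(rows, column):
    # Number of rows where the column is absent, None or an empty string.
    return sum(1 for r in rows if r.get(column) in (None, ""))


def duplicate_count(rows, column):
    # Number of distinct values that occur more than once in the column.
    counts = Counter(r.get(column) for r in rows)
    return sum(1 for c in counts.values() if c > 1)


def run_gate(rows, checks):
    # Return the names of all failed checks; an empty list means "accept".
    return [c.name for c in checks if not c.passes(rows)]


# Hypothetical rules agreed between buyer and supplier.
checks = [
    Check("row_count > 0", lambda rows: len(rows) > 0),
    Check("missing_count(isin) = 0", lambda rows: missing_count(rows, "isin") == 0),
    Check("duplicate_count(isin) = 0", lambda rows: duplicate_count(rows, "isin") == 0),
]

# Made-up delivery from an external data supplier.
delivery = [
    {"isin": "XX0000000001", "esg_score": 71},
    {"isin": "", "esg_score": 64},
    {"isin": "XX0000000001", "esg_score": 71},
]

failed = run_gate(delivery, checks)
decision = "accept" if not failed else "refuse"
print(decision, failed)
```

In this sketch the delivery is refused because one identifier is missing and another appears twice. In a real setup the scan would run in Soda and the notification would come from Soda Cloud, but the decision logic stays this simple: any failed check blocks the data from being internalized.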

Defining data quality rules in Soda Core

When it comes to defining data quality rules, a lot of change is happening in Soda. Traditionally, the business team needed to pass their data quality requirements to IT through an Excel file, and the IT team had to translate those requirements into Soda rules.

Quite a few things can go wrong during this type of communication, such as requirements that refer to data columns that no longer exist. It’s also not always straightforward for the IT team to define the appropriate Soda rules. After all, IT is not that familiar with the data owned by the business community. Ideally, therefore, business end-users would be able to write their own data quality rules in Soda and take full ownership of the quality of their data.
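The "column no longer found" problem is easy to picture in code. The sketch below, again in plain Python and independent of Soda, compares a list of business requirements (as they might arrive in a spreadsheet) with the current schema of each data set and flags the ones that can no longer be translated into a rule. The `Requirement` class, the `find_stale_requirements` helper and all data set and column names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One row of a hypothetical requirements spreadsheet."""
    dataset: str
    column: str
    rule: str


def find_stale_requirements(requirements, schemas):
    # A requirement is stale when its data set is unknown
    # or its column is not part of that data set's current schema.
    stale = []
    for req in requirements:
        columns = schemas.get(req.dataset, set())
        if req.column not in columns:
            stale.append(req)
    return stale


# Made-up requirements handed over by the business team.
requirements = [
    Requirement("investments", "isin", "must not be missing"),
    Requirement("investments", "sustainability_rating", "must be between 0 and 100"),
    Requirement("counterparties", "country", "must not be missing"),
]

# Made-up current schema: the rating column was renamed in the meantime.
schemas = {"investments": {"isin", "esg_score", "country"}}

for req in find_stale_requirements(requirements, schemas):
    print(f"Cannot translate: {req.dataset}.{req.column} ({req.rule})")
```

When the people who own the data write the rules themselves, this kind of mismatch tends to surface right away instead of after a round trip between business and IT.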

That is now becoming possible with the transition from the older Soda SQL version to the new Soda Core. Thanks to the new Soda Core web interface, data quality rules can be written and shared without extensive technical knowledge. Rather than having to rely on IT specialists to add data quality tests through the back end, business end-users can do this by themselves in the Soda front end.

In addition to the business knowledge they already have, end-users only need to understand Soda Core’s intuitive checks language, SodaCL (Soda Checks Language). We help them achieve that through a series of training classes, such as the ones we are currently organizing for the investment company where we’ve developed a data quality solution on the Soda Cloud platform.

The Soda data quality platform is alive and kicking

The new Soda Core web interface is an excellent example of how Soda is working on making data quality increasingly accessible to business end-users. As we’ve pointed out before, making it possible for business end-users to implement data quality rules by themselves is crucial for the future of data quality. The more closely those rules match what’s relevant for the business, the more likely it is that issues will be found in time, thus preventing high-impact errors.

By decentralizing data quality and empowering business end-users, Soda cuts out a lot of the ping-pong that currently goes on between technical and business teams. Are you eager to know more about Soda's hands-on data quality testing and monitoring approach? Are you looking for a data quality solution that integrates well with your technology stack and can be set up fairly quickly? Reach out to us for more information.