Data integrity is the extent to which you can rely on a given set of data for use in decision-making. The level of data integrity depends on the data’s completeness, accuracy and consistency. The more incomplete, inaccurate and inconsistent data your enterprise manages, the lower your level of data integrity.
Data integrity issues and causes
Lack of consistency: if one source of data uses one definition or calculation for “Customer Lifetime Value,” and another source uses a different definition or calculation, but still calls it “Customer Lifetime Value,” the inconsistency will reduce your ability to understand your enterprise’s situation and make effective decisions.
Lack of completeness: when multiple data sources are combined to produce one record (for example, in a CRM), some records end up more complete than others (e.g. Anne’s record has name, email, phone number, address and profession; Bob’s record has only name and email). This uneven completeness across your combined data asset can skew any decision that uses it as a basis, which means reduced data integrity.
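To make the Anne and Bob comparison concrete, here is a minimal sketch of how per-record completeness could be scored. The field names in EXPECTED_FIELDS, the sample values and the completeness function are illustrative assumptions, not part of any particular CRM.

```python
# Hypothetical list of fields a fully populated CRM record would carry.
EXPECTED_FIELDS = ("name", "email", "phone", "address", "profession")


def completeness(record, fields=EXPECTED_FIELDS):
    """Return the fraction of expected fields that hold a non-empty value."""
    filled = sum(1 for field in fields if record.get(field) not in (None, ""))
    return filled / len(fields)


# Made-up sample records mirroring the example in the text.
anne = {
    "name": "Anne",
    "email": "anne@example.com",
    "phone": "555-0100",
    "address": "1 Main Street",
    "profession": "Engineer",
}
bob = {"name": "Bob", "email": "bob@example.com"}

anne_score = completeness(anne)  # all five fields present
bob_score = completeness(bob)    # only two of five fields present
```

A score like this can be averaged across a whole dataset to track completeness over time, although which fields count as "expected" is a policy decision for your enterprise.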
Lack of accuracy: this is a nice way of saying that the data is simply wrong. Wrong numbers, wrong names, wrong information. Much inaccurate data is due to human influence on data assets (most of it unintentional, like mistakes in data entry, but malicious actors can play a part as well). Sometimes errors are introduced by the automated processes through which the data is transferred or transformed. Sometimes the data was correct… once upon a time. Now it’s out-of-date, which makes it unreliable (or, at the very least, less reliable). The chance of making the right decisions is significantly reduced when they are based on wrong data.
How to ensure data integrity
The way to ensure data integrity is with a combination of:
- Data management policies
- Data management tools
- Controls and constraints
Data management policies dictate how your data is processed, stored, combined and used – the very things that impact data integrity. Does your data first sit in silos and get merged just before the analytics and reporting stage? What interoperability standards are used? What automated workflows do you have in place to detect and correct data errors? Who is able to access and change data? When your policies are designed to prioritize data accuracy, completeness, consistency, currency and validity, that’s a giant step forward for your enterprise data integrity.
Data management tools help you achieve data consistency. Tools like a data catalog keep everyone in the enterprise on the same page when it comes to any data asset: the same business definition, the same technical info, the same data owner and steward. It’s the one-stop shop for everyone in your enterprise. And when the data catalog is automated, it updates itself, so it stays current. A tool like a data lineage solution enables your data team to assess the validity of data by viewing its path from its point of origin in your data landscape through every system it has traversed.
Data integrity controls and constraints address integrity in databases and in other structured ways of storing or viewing data. As data is loaded into the database or processed for an integration flow, it is checked against a set of rules. For example, a field that is supposed to hold “AGE” will only accept a value that is an integer. Anything else (e.g. a fraction, a string of letters, etc.) makes the entry invalid. These constraints either filter out data that doesn’t meet the requirements, preventing it from ever reaching the data store or view, or flag it for a human to check and correct.
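As an illustration of the “AGE” rule, the following is a minimal sketch of a load-time check that accepts integer values and flags everything else for review. The function names, the record layout and the sample data are assumptions made for this example; a real database would typically enforce the same rule through column types and constraints.

```python
def is_valid_age(value):
    """Accept only whole numbers, as the AGE rule in the text requires."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def partition_records(records):
    """Split records into those that pass the AGE rule and those flagged for review."""
    accepted, flagged = [], []
    for record in records:
        if is_valid_age(record.get("AGE")):
            accepted.append(record)
        else:
            flagged.append(record)
    return accepted, flagged


# Made-up incoming records: one valid, one fraction, one string of letters.
incoming = [
    {"name": "Anne", "AGE": 34},
    {"name": "Bob", "AGE": 34.5},
    {"name": "Cara", "AGE": "thirty"},
]

accepted, flagged = partition_records(incoming)
```

Here the flagged records are kept rather than discarded, which matches the option of routing invalid entries to a human for correction. Dropping them outright would be the stricter "filter" variant.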