Data Quality Implementation in Data Warehouses | Toptal

Data Quality Dimensions

Data quality (DQ) dimensions are a common way to identify and cluster DQ checks. Definitions vary, and so does the number of dimensions: some frameworks list 16 or even more. From a practical perspective, it is less confusing to start with a few dimensions and establish a shared understanding of them among your users.

  • Completeness: Is all required data available and accessible? Are all needed sources available and loaded? Was data lost between stages?
  • Consistency: Is there erroneous, conflicting, or inconsistent data? For example, a contract in a “Terminated” state must have a valid termination date that is on or after the contract’s start date.
  • Uniqueness: Are there any duplicates?
  • Integrity: Is all data linked correctly? For example, are there orders linking to nonexistent customer IDs (a classic referential integrity problem)?
  • Timeliness: Is the data current? For example, in a data warehouse with daily updates, I would expect yesterday’s data to be available today.
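
As a rough illustration of how these five dimensions can translate into concrete checks, here is a minimal Python sketch that uses plain lists of dicts in place of warehouse tables. The table and column names (orders, customers, contracts, customer_id, termination_date, and so on), the helper functions, and the one-day lag threshold are hypothetical assumptions chosen for illustration; in a real warehouse, checks like these would often run as SQL queries after each load. Keying the final report by dimension shows how dimensions can serve to cluster individual checks.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical helpers: one check per DQ dimension, operating on lists of dicts.

def missing_between_stages(staging_keys, warehouse_keys):
    """Completeness: keys present in staging but lost before reaching the warehouse."""
    return sorted(set(staging_keys) - set(warehouse_keys))

def inconsistent_terminations(contracts):
    """Consistency: 'Terminated' contracts need a termination date >= start date."""
    bad = []
    for c in contracts:
        if c["status"] != "Terminated":
            continue
        end = c.get("termination_date")
        if end is None or end < c["start_date"]:
            bad.append(c["contract_id"])
    return bad

def duplicate_keys(rows, key):
    """Uniqueness: key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def orphaned_orders(orders, customers):
    """Integrity: orders that reference customer IDs which do not exist."""
    known = {c["customer_id"] for c in customers}
    return [o["order_id"] for o in orders if o["customer_id"] not in known]

def is_timely(latest_load_date, today, max_lag_days=1):
    """Timeliness: with daily loads, yesterday's data should be present today."""
    return latest_load_date >= today - timedelta(days=max_lag_days)

# Hypothetical sample data with one deliberate defect per dimension.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # customer 3 does not exist
    {"order_id": 12, "customer_id": 2},
    {"order_id": 12, "customer_id": 2},  # duplicate order_id
]
contracts = [
    {"contract_id": "A", "status": "Active", "start_date": date(2024, 1, 1)},
    {"contract_id": "B", "status": "Terminated", "start_date": date(2024, 3, 1),
     "termination_date": date(2024, 2, 1)},  # ends before it starts
    {"contract_id": "C", "status": "Terminated", "start_date": date(2024, 1, 1),
     "termination_date": None},  # termination date missing
]

# Results clustered by dimension.
report = {
    "completeness": missing_between_stages([10, 11, 12, 13],
                                           [o["order_id"] for o in orders]),  # [13]
    "consistency": inconsistent_terminations(contracts),  # ['B', 'C']
    "uniqueness": duplicate_keys(orders, "order_id"),     # [12]
    "integrity": orphaned_orders(orders, customers),      # [11]
    "timeliness": is_timely(date(2024, 6, 9), today=date(2024, 6, 10)),  # True
}
for dimension, result in report.items():
    print(dimension, result)
```

Each check returns the offending keys rather than a simple pass/fail flag, which makes it easier to trace a failed check back to the rows that caused it.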
