Decision makers are finally starting to internalize the business value of having an accurate and thorough understanding of the data movement process. They get that in order for the BI group to be successful, they need to provide them with best of breed data lineage tools. But, what many managers don’t get yet is why the horizontal data lineage that many tools provide simply isn’t enough for the BI group.
We’ve discussed the difference between horizontal and vertical data lineage and why both are necessary for a robust BI environment (and we all know that a robust BI environment is key for the business to flourish). To recap, horizontal data lineage—the “traditional” notion of data lineage—traces the path of data from system-to-system and database-to-database, and finally to reports, dashboards, and key performance indicators (KPIs) that businesses use to make tactical and strategic decisions, comply with regulatory reporting requirements, and more.
Today we’ll dive a bit deeper into the concept of vertical data lineage.
Horizontal data lineage is concerned with the systems involved in the journey from source to destination. Vertical data lineage is all about the transformations that occur along the way—what we call the “column-to-column” path. With vertical data lineage, BI analysts can delve into the minutiae of various extract, transform, and load (ETL) processes a data field encounters on its journey.
Vertical data lineage enables BI professionals to find answers to questions such as:
-Did a field name change along the way? Was it done by design, or did a system make the change automatically to avoid duplicate field names when combining data from different sources?
-Did a format change at some point? For example, did a data field with four-digit years become shortened to two-digit years?
-Did a numeric field lose precision (i.e., decimal places)? Was it truncated or rounded? What rounding rules were used?
-Given that different systems and databases handle missing data values differently, was there a change in how missing values are treated along the way? Do missing values stay missing from beginning to end, or do they become literal “.”, “ “, or “NULL” at some point?
-Was a column calculated on the basis of data from other columns? What formula was used?
The Advantage of Vertical Data Lineage
Just as a Cartesian plane, with its x and y-axes, can convey much more information than a simple number line, the combination of horizontal and vertical data lineage (what we at Octopai call “advanced data lineage”) gives BI professionals an unprecedented level of insight into the lifecycle of data from generation to generation in a database (or, increasingly, from streaming, real-time sources) to consumption in a report or dashboard. This provides BI analysts and developers two key capabilities:
–Tracking the source of reporting errors – The visual tools Octopai provides can significantly reduce time spent on hunting bugs.
–Conducting impact analysis – All too often, system and database owners make changes to database structures, table names, column metadata, stored procedures, and other data properties without regard to how these changes might negatively affect calculations, downstream systems
This is the power Octopai’s advanced data lineage mapping tools bring to the table.
Using these tools, BI professionals spend less time putting out fires and more time doing what brought them into the BI field in the first place: Finding creative and innovative ways to leverage the mountains of data every enterprise has in order to make informed, educated, insightful decisions—decisions that result in competitive advantage for the business.