Once upon a time, almost all data had one shape, and it was a rectangle.
For data organized according to the relational database model, in which tables relate to each other through “primary keys” and “foreign keys,” this approach worked well, as long as different databases didn’t have to “talk” to each other, at least not directly.
Over time, however, the data landscape in many organizations grew more complicated. Not everything sat in neat tables anymore, and queries came to involve layers of transformations, views, stored procedures, and other artifacts.
The result was a spider web of connections and relationships that were difficult to manage, characterize, document, and maintain.
What’s one to do?
One Database to Rule Them All? (Hint: No)
One possible approach is to try to bring everything together in one place. Just import data from source A into destination database B.
But although this might work for absorbing small data sources into a larger one, it soon becomes impractical. A better approach is to leave the data sources alone and bring summarized versions of each into a data warehouse.
When designed and implemented properly, a data warehouse standardizes data structures from disparate sources to enable apples-to-apples reporting. This works well too, but usually involves complex transformations and a dizzying array of business rules and exceptions.
The Hurrier the Data Goes, the Behinder You Get
Meanwhile, both the number of data sources and the volume of data are expanding at an alarming rate, and much of the new incoming data is unstructured information. The problem of how to organize, manage, and extract meaning from the data is becoming intractable. To paraphrase a saying popularly attributed to the White Rabbit in Lewis Carroll’s Alice in Wonderland, the hurrier the data goes, the behinder you get.
In particular, enterprises are having an ever more difficult time understanding where their data comes from, where it goes, and how it gets there.
And when there’s a problem, tracking down the source can take days or weeks of manual detective work.
Data Lineage: A Crucial Business Tool
There’s no getting around the proliferation of data sources and the explosion of data volume. However, there is a tool that businesses can leverage to get a handle on things: data lineage.
Simply put, data lineage is a tracking tool that documents the data’s journey from source to target. Every step of the journey is presented graphically, in a way that’s easy to understand, manipulate, interpret, and report on. This visual aspect is the key to data lineage’s usefulness in any large and complex data landscape.
How Data Lineage Works
The key to data lineage is metadata, that is, data about the data, such as:
– Physical location of the data
– File format
– Who owns and is responsible for the data
– How the data is secured, and who has access to it
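As an illustration, the kinds of metadata listed above could be captured in a small record like the one below. This is only a sketch: the class name `DatasetMetadata`, its field names, and the sample values are all hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata record for one dataset (illustrative only)."""
    name: str                       # identifier used to refer to the dataset
    location: str                   # physical location of the data
    file_format: str                # file format, e.g. "csv" or "parquet"
    owner: str                      # who owns and is responsible for the data
    readers: tuple[str, ...] = ()   # who has access to the data


# Made-up example values, purely for demonstration.
orders = DatasetMetadata(
    name="orders_summary",
    location="warehouse.sales.orders_summary",
    file_format="parquet",
    owner="sales-data-team",
    readers=("analysts", "finance"),
)

print(orders.owner)
```

Keeping the record immutable (`frozen=True`) is one possible design choice: a metadata snapshot then reflects the state of the dataset at discovery time and is replaced, rather than edited, on the next scan.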
Automated data lineage implies automated metadata discovery, and any automated data lineage tool incorporates or leverages an automated metadata discovery process (which has additional benefits of its own).
An automated data lineage tool examines the metadata and makes the connections from source to target. And being automated, it can be executed quickly (in just a few seconds, versus days or weeks for manual data lineage). Moreover, it can be executed repeatedly—as frequently as you need to ensure your data lineage is up to date and represents the true state of your data environment at all times.
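To make the idea of "making the connections from source to target" concrete, here is a minimal sketch that treats lineage as a directed graph and walks it backward from a target to everything that feeds it. The dataset names, the `EDGES` list, and both function names are assumptions invented for this example; a real tool would derive the edges from discovered metadata rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical (source, target) pairs, as if discovered from metadata.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "warehouse.sales_summary"),
    ("staging.orders", "warehouse.sales_summary"),
    ("warehouse.sales_summary", "reports.quarterly_revenue"),
]


def build_upstream_index(edges):
    """Map each target dataset to the set of datasets that feed it directly."""
    upstream = defaultdict(set)
    for source, target in edges:
        upstream[target].add(source)
    return upstream


def trace_upstream(dataset, upstream):
    """Return every dataset that directly or indirectly feeds `dataset`."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:   # guards against revisiting and cycles
                seen.add(parent)
                stack.append(parent)
    return seen


index = build_upstream_index(EDGES)
origins = sorted(trace_upstream("reports.quarterly_revenue", index))
print(origins)
```

Because the traversal is cheap, it can be rerun whenever the edges are rediscovered, which is what allows the lineage view to stay current.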
The Future of Data Lineage Visualization
Enterprises are incorporating and leveraging not just more data, but data in more formats and through more delivery mechanisms.
Remember when maps were printed on paper? Services such as MapQuest were revolutionary in part because they were updated much more frequently. And being digital, they could be manipulated and used in ways that paper maps never could. They are so useful, in fact, that it’s difficult to imagine a time when they didn’t exist.
That’s what automation is doing with data lineage: bringing the data landscape into the 21st century with modern visualization and management tools. Data lineage tools, especially automated ones, are poised to become just as essential to the enterprise as present-day database management and reporting tools. Are you ready for automated data lineage?