End-to-end data lineage is the tracing of a data point’s path throughout your entire data landscape, from source (where it entered) to target (where it ended its journey), including what transformations it underwent and what other data and systems it affected.
In a typical data environment, end-to-end data lineage begins at a source system and tracks the data through ETL or ELT processes, the data storage repository, analysis systems, and finally analytics and reporting systems.
Types of end-to-end data lineage
Not all end-to-end data lineage is created equal. “End-to-end” tells you about the breadth of the lineage, but it doesn’t tell you about the depth.
End-to-end system lineage takes a bird’s-eye view, enabling you to see the overall path of the data in question through the different systems in your data environment. This type of data lineage is ideal for predicting or analyzing the impact of a process change, discovering redundant processes, or visualizing end-to-end data flow at a high level.
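As a rough illustration of system-level impact analysis, the sketch below models the environment as a directed graph of systems and walks it breadth-first to list everything downstream of a changed process. The system names and the `SYSTEM_LINEAGE` and `downstream_systems` identifiers are hypothetical, chosen only for this example. A real lineage tool would harvest this graph from metadata rather than hard-code it.

```python
from collections import deque

# Hypothetical system-level lineage: each key feeds data into the systems listed.
SYSTEM_LINEAGE = {
    "crm_source": ["etl_pipeline"],
    "erp_source": ["etl_pipeline"],
    "etl_pipeline": ["data_warehouse"],
    "data_warehouse": ["analytics_layer"],
    "analytics_layer": ["sales_dashboard", "finance_report"],
}


def downstream_systems(lineage, start):
    """Return every system reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:  # skip systems already visited
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Which systems could be affected by a change to the ETL pipeline?
print(downstream_systems(SYSTEM_LINEAGE, "etl_pipeline"))
# → ['data_warehouse', 'analytics_layer', 'sales_dashboard', 'finance_report']
```

At this level each node is a whole system, so the result tells you which systems to check after a change, not which columns inside them are affected.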
Need to dig deeper and see precisely where data went – or went wrong? For root cause analysis of reporting errors, regulatory compliance audit preparation, or impact analysis of a change to a column in the source system, you need column-to-column, end-to-end data lineage. This deep level of data lineage gives you column-level transparency of your end-to-end data flow.
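To make column-to-column lineage concrete, here is a minimal sketch of a root cause trace: starting from a report column, it walks upstream through each recorded transformation until it reaches columns with no known source. The column names, transformations, and the `COLUMN_LINEAGE`, `trace_upstream`, and `source_columns` identifiers are all assumptions made for illustration, and the recursion assumes the lineage contains no cycles.

```python
# Hypothetical column-level lineage: each target column maps to the
# (source column, transformation) pairs it is derived from.
COLUMN_LINEAGE = {
    "report.revenue_usd": [("warehouse.fact_sales.revenue_usd", "SUM by month")],
    "warehouse.fact_sales.revenue_usd": [
        ("staging.orders.amount", "amount * fx_rate"),
        ("staging.fx.fx_rate", "amount * fx_rate"),
    ],
    "staging.orders.amount": [("crm.orders.amount", "copy")],
    "staging.fx.fx_rate": [("erp.rates.rate", "rename")],
}


def trace_upstream(lineage, column, depth=0):
    """Yield (depth, column, transformation) for every ancestor of `column`.

    Assumes the lineage is acyclic; a cycle would recurse without end.
    """
    for source, transformation in lineage.get(column, []):
        yield depth + 1, source, transformation
        yield from trace_upstream(lineage, source, depth + 1)


def source_columns(lineage, column):
    """Return the sorted root columns (no recorded upstream) feeding `column`."""
    ancestors = trace_upstream(lineage, column)
    return sorted({col for _, col, _ in ancestors if col not in lineage})


# Print the upstream path of a suspicious report figure, indented by depth.
for depth, col, transformation in trace_upstream(COLUMN_LINEAGE, "report.revenue_usd"):
    print("  " * depth + f"{col}  (via: {transformation})")

print(source_columns(COLUMN_LINEAGE, "report.revenue_usd"))
# → ['crm.orders.amount', 'erp.rates.rate']
```

Because every hop records the transformation applied, the same trace that locates the origin of a reporting error can also serve as the evidence an auditor asks for.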
End-to-end data lineage vs. other types of lineage
You may not always want or need to understand the entire end-to-end data flow. Sometimes the purpose of your data lineage analysis is:
- To understand the logic of a specific ETL process
- To visualize the data flow of a database object
- To locate dependencies within a particular report
In those and other such cases, you need focused, localized data lineage. End-to-end data lineage is not necessary here, but column-to-column data lineage is a must.
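One way to picture localized lineage is as a filter over column-level mappings: keep only the mappings performed by the single process you care about and ignore the rest of the landscape. The sketch below assumes each mapping is tagged with the process that performs it. The `ColumnEdge` type, the `EDGES` data, and the process names are hypothetical.

```python
from typing import NamedTuple


class ColumnEdge(NamedTuple):
    source: str   # upstream column
    target: str   # downstream column
    process: str  # job or object that performs the mapping


# Hypothetical column-to-column mappings across the whole environment.
EDGES = [
    ColumnEdge("crm.orders.amount", "staging.orders.amount", "load_orders"),
    ColumnEdge("staging.orders.amount", "warehouse.fact_sales.revenue_usd", "build_fact_sales"),
    ColumnEdge("staging.fx.fx_rate", "warehouse.fact_sales.revenue_usd", "build_fact_sales"),
    ColumnEdge("warehouse.fact_sales.revenue_usd", "report.revenue_usd", "monthly_report"),
]


def local_lineage(edges, process):
    """Keep only the column mappings performed by one process."""
    return [edge for edge in edges if edge.process == process]


# Show the column-level logic of a single ETL process.
for edge in local_lineage(EDGES, "build_fact_sales"):
    print(f"{edge.source} -> {edge.target}")
```

The filter narrows the breadth of the lineage to one process while keeping the column-level depth.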
In most modern data environments, end-to-end data lineage management must be automated to be effective. Modern organizations simply have too much data, and that data changes too often, for a manual process to keep end-to-end lineage current at scale.