Data has become the norm. It’s literally everywhere, but are we really able to fully understand what our data is telling us? Are we even seeing the whole picture?
Probably not. Unless you’re using a tool that provides full data lineage, the story you’re able to glean is, well, let’s just say incomplete. Why? Keep reading to find out.
Let’s define data lineage
We can define data lineage as the data lifecycle, or the data journey. This lifecycle includes where the data originates, how it has moved from point to point, and, of course, where it is today. Through data lineage, organizations can better understand what happens to data as it travels through different pipelines (ETL processes, files, reports, databases, etc.) and can therefore make more informed business decisions. Data lineage also enables companies to trace specific business data back to its sources. This saves significant time and resources when tracking down errors, implementing process changes, and carrying out system migrations, and in turn greatly improves BI efficiency.
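To make "tracing sources" a little more concrete, here is a minimal sketch that models lineage as a graph and walks it upstream from a report to its original sources. It is not how Octopai or any particular tool works internally. The asset names (`sales_report`, `dw.fact_sales`, `etl_load_sales`, `crm.orders`, `erp.invoices`) and the `UPSTREAM` map are hypothetical examples, and we assume the lineage edges are already known.

```python
from collections import deque

# Hypothetical lineage edges (illustrative only): each key is a data asset,
# each value lists the assets it is directly derived from.
UPSTREAM: dict[str, list[str]] = {
    "sales_report": ["dw.fact_sales"],
    "dw.fact_sales": ["etl_load_sales"],
    "etl_load_sales": ["crm.orders", "erp.invoices"],
    "crm.orders": [],
    "erp.invoices": [],
}


def trace_sources(asset: str, upstream: dict[str, list[str]]) -> set[str]:
    """Walk upstream breadth-first and return the assets that have no
    further upstream inputs, i.e. where the data originally came from."""
    sources: set[str] = set()
    seen = {asset}
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        parents = upstream.get(node, [])
        if not parents:
            sources.add(node)  # nothing feeds this asset, so it is an origin
        for parent in parents:
            if parent not in seen:  # guard against revisiting shared inputs
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(trace_sources("sales_report", UPSTREAM)))
# ['crm.orders', 'erp.invoices']
```

If a number in `sales_report` looks wrong, this kind of walk narrows the search to the two systems it ultimately comes from, rather than every table in the warehouse.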
What does data lineage look like?
The image at the top of this post shows how Octopai compares and presents the lineage of two different reports (from the same system or from different systems). The comparison clearly illustrates the differences between the reports and lets users quickly understand exactly how two or more reports ended up different. Specifically, an additional ETL process and table appear in the lineage of the report at the bottom of the screen but are missing from the lineage of the report at the top. This is the point at which the two reports began to diverge.
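The idea behind such a comparison can be sketched in a few lines: collect everything upstream of each report and take the set difference. This is a simplified illustration under our own assumptions, not Octopai's algorithm. The names `report_top`, `report_bottom`, `etl_adjust_sales`, and `dw.sales_adjusted` are hypothetical and chosen to mirror the example above, where the bottom report has an extra ETL process and table.

```python
# Hypothetical lineage edges: asset -> assets it is directly derived from.
UPSTREAM: dict[str, list[str]] = {
    "report_top": ["dw.fact_sales"],
    "report_bottom": ["dw.sales_adjusted"],
    "dw.sales_adjusted": ["etl_adjust_sales"],
    "etl_adjust_sales": ["dw.fact_sales"],
    "dw.fact_sales": ["etl_load_sales"],
    "etl_load_sales": ["crm.orders"],
}


def ancestors(asset: str, upstream: dict[str, list[str]]) -> set[str]:
    """Return every asset upstream of `asset` (its full lineage)."""
    seen: set[str] = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def compare_lineage(a: str, b: str, upstream: dict[str, list[str]]) -> dict[str, set[str]]:
    """Split two reports' lineage into shared and report-specific assets."""
    lineage_a, lineage_b = ancestors(a, upstream), ancestors(b, upstream)
    return {
        "only_in_a": lineage_a - lineage_b,
        "only_in_b": lineage_b - lineage_a,
        "shared": lineage_a & lineage_b,
    }


diff = compare_lineage("report_top", "report_bottom", UPSTREAM)
print(sorted(diff["only_in_b"]))  # ['dw.sales_adjusted', 'etl_adjust_sales']
```

The assets that appear in only one lineage are exactly the extra ETL process and table, and the shared asset they branch from (`dw.fact_sales` here) is where the two reports start to diverge.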
Data lineage is a visual representation of the overall flow of data. It shows how data is transformed by ETL processes, so organizations can assess the quality of their data before it is loaded into an analytics tool. In other words, data lineage visualization is an overview: a journey map of our data.
How metadata fits in
Not surprisingly, just as metadata has become central to the broader realm of big data governance, it is also a key player when it comes to data lineage. Let me explain:
Whereas data lineage is the visual representation of the data journey, the data presented in the lineage must first be located and verified. That is done via none other than our dear friend, metadata. Indeed, metadata and lineage are intertwined: it is through metadata that we can find every data item related to a specific report or ETL process, see all of its dependencies, and trace its entire lifecycle. In short, metadata is to data lineage what wheels are to a car. Metadata is what makes data lineage possible, and demand for big metadata tools is growing rapidly.
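As a rough illustration of how metadata turns into lineage, the sketch below assumes each ETL job and report is described by a metadata record listing what it reads and writes. The record format, the field names `name`, `reads`, and `writes`, and all asset names are hypothetical; real tools harvest this information from many different sources and formats. From those records we build dependency edges, then find everything a given ETL process depends on and everything it feeds.

```python
from collections import defaultdict

# Hypothetical metadata records, e.g. harvested from ETL job and report
# definitions. The field names are illustrative assumptions.
METADATA = [
    {"name": "etl_load_sales", "reads": ["crm.orders", "erp.invoices"], "writes": ["dw.fact_sales"]},
    {"name": "etl_build_mart", "reads": ["dw.fact_sales"], "writes": ["mart.sales_summary"]},
    {"name": "sales_report", "reads": ["mart.sales_summary"], "writes": []},
]


def build_lineage(records):
    """Turn read/write metadata into upstream and downstream adjacency maps."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for rec in records:
        for src in rec["reads"]:  # the job/report depends on what it reads
            upstream[rec["name"]].add(src)
            downstream[src].add(rec["name"])
        for dst in rec["writes"]:  # what it writes depends on the job
            upstream[dst].add(rec["name"])
            downstream[rec["name"]].add(dst)
    return upstream, downstream


def reachable(start, edges):
    """Collect every node reachable from `start` by following `edges`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def related_items(name, records):
    """Everything `name` depends on, and everything that depends on it."""
    upstream, downstream = build_lineage(records)
    return {"depends_on": reachable(name, upstream), "feeds": reachable(name, downstream)}


items = related_items("etl_build_mart", METADATA)
print(sorted(items["depends_on"]))
# ['crm.orders', 'dw.fact_sales', 'erp.invoices', 'etl_load_sales']
print(sorted(items["feeds"]))
# ['mart.sales_summary', 'sales_report']
```

Without the read/write metadata there would be no edges to follow, which is the sense in which metadata is what makes the lineage possible.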
Whether you’re gearing up to comply with GDPR or need a better grasp of your data to improve your business, you must have a data lineage tool in place. You absolutely must know where every single piece of data originated, and you must clearly understand every change it underwent and every stop it made along the way. Knowing the entire story behind each and every data item is a clear case of ‘knowledge is power’: the more information an organization has, the smarter and more capable it becomes.
Is your data-driven organization struggling to get full, accurate data lineage? Octopai gets it done in 5 seconds, enabling BI groups to double their capacity. Get in touch now for a free trial.