You Can’t Have Best-in-Class Governance without Best-in-Class Lineage

Without a complete and accurate understanding of how data flows throughout the organization, it is extremely difficult to establish the processes and metrics necessary for a successful data governance program. Best-in-class data lineage, which provides multi-layered views of the data (cross-system, end-to-end, and inner-system lineage), plays a critical role in knowledge transfer, issue identification, insight into the use of sources and resources, impact analysis, and definition clarity, all of which are essential for best-in-class data governance.


In this webinar, you’ll hear it straight from the horse’s mouth as Anilh Rameshwar, Data Architect at Zego, shares exactly how automated data lineage provides his department with unprecedented visibility into their data, which is absolutely critical for the organization’s data governance efforts.

Webinar Transcript

Shannon: Hello and welcome. My name is Shannon Kempe, and I’m the chief digital manager of DATAVERSITY. We would like to thank you for joining this DATAVERSITY webinar, “You Can’t Have Best-in-Class Governance Without Best-in-Class Data Lineage,” sponsored today by Octopai. Just a couple of points to get us started: due to the large number of people attending these sessions, you will be muted during the webinar.

For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you’d like to tweet, we encourage you to share highlights or questions via Twitter using #dataversity. If you’d like to chat with us or with each other, we certainly encourage you to do so. Just to note, the Zoom chat defaults to the panelists, but you may absolutely change that to network with everyone.

To access the Q&A or the chat panels, you can find the icons for those features in the bottom middle of your screen. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speakers for today, David Bitton and Anilh Rameshwar.

David has over 20 years of experience working with technology companies and a solid history of global leadership success in business-to-business enterprise sales, specifically software as a service. During the last four years, he has led sales and business development efforts at Octopai, where he enjoys helping BI and analytics professionals harness the power of automated data lineage and discovery to achieve full control of their data.

Anilh is an accomplished big data analyst, data developer, database developer, and software engineer with 15 years of experience processing multi-petabyte datasets, skilled in finding the subtle nuances of data that make the difference between day-to-day metrics and valuable business insights. With that, I will give the floor to David and Anilh to get today’s webinar started. Hello, and welcome.

David: Thank you very much, Shannon. I’m super excited to be hosting this today together with our good friend Anilh at Zego. What I’d like to do is jump straight into the presentation, where I’ll have Anilh share some of their challenges and how they used Octopai to address them. Anilh, would you like to introduce yourself? I think Shannon already did, but maybe you’d like to talk about the company that you work for, the existing data environment, and so on.

Anilh: Thanks for having me. My name is Anilh Rameshwar. I’m the data architect at Zego. I started at Zego in February of 2020. At that point, what I came into was a software stack with a BI solution that was built on Microsoft SQL Server. There was no lineage whatsoever and there was a decided lack of trust in the data. Maybe we can go to the next slide here, David.

David: Sure.

Anilh: Excellent. There were multiple reports being consumed by multiple business units, and this had evolved over time without any governance or oversight. There was no data governance team. It was developed by third-party consultants. What happened is, as the data got into more and more hands, the trust in the data began to erode. By the time I arrived, nobody really trusted what they found in the data warehouse. The metrics were in conflict with queries against source systems.

I was tasked with rebuilding Zego’s entire data platform from the ground up. Part of that endeavor included providing an end-to-end data lineage solution so that we could rebuild trust in the data and make effective business decisions. Go to the next slide, please.

David: Sure. With these challenges in mind, you embarked on your team’s initiatives, correct?

Anilh: Correct.

David: Can you share a little bit about that with us?

Anilh: Sure. The initiatives were primarily to regain trust in the data so we could make effective business decisions, or are you talking about the actual mechanics behind it? Sorry, David.

David: That’s okay. Whatever you’d like to share with us. Basically, the slide here, the data engineering initiatives.

Anilh: Sure. We had four core applications that were disparate. The legacy data warehouse only included one of those four applications. Folks were frequently trying to pull data from application B and commingle it with data from application A, and they were getting the wrong results. We determined that our best course of action was to build a data lake in Snowflake. From the data lake, we developed a conformed layer and then on top of that conformed layer, we developed the reporting layer.
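The layering Anilh describes (raw data lake, then a conformed layer, then a reporting layer on top) can be sketched in plain Python. All application and field names below are hypothetical, not Zego’s actual schema; the point is that commingling data from two applications becomes safe once both sources are conformed to one schema and one set of units:

```python
# Sketch of a conformed layer: records from two disparate source
# applications are normalized into one common schema before any
# reporting logic sees them. All field names are hypothetical.

def conform_app_a(record):
    """Application A stores amounts in dollars with its own key names."""
    return {
        "customer_id": record["cust_id"],
        "amount_usd": record["amount"],
        "event_date": record["date"],
    }

def conform_app_b(record):
    """Application B stores amounts in cents and uses different key names."""
    return {
        "customer_id": record["customerId"],
        "amount_usd": record["amount_cents"] / 100,
        "event_date": record["eventDate"],
    }

def build_conformed_layer(app_a_rows, app_b_rows):
    """Commingling is now safe: both sources share one schema and units."""
    return [conform_app_a(r) for r in app_a_rows] + \
           [conform_app_b(r) for r in app_b_rows]

rows = build_conformed_layer(
    [{"cust_id": 1, "amount": 19.99, "date": "2021-01-05"}],
    [{"customerId": 2, "amount_cents": 4550, "eventDate": "2021-01-06"}],
)
print(rows[1]["amount_usd"])  # 45.5
```

A reporting layer built only on `build_conformed_layer`’s output can no longer get the wrong results by mixing incompatible units or keys.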

This is where we decided we needed a data lineage solution. Some of the challenges that we faced with the legacy data warehouse were fundamentally related to the source code, which was buried deep within stored procedures and SSIS packages. We found multiple problems: for example, stale dimension data where our lookup table hadn’t been updated, changes in source system behavior where the legacy data warehouse code remained static, and transformations that changed over time without history tracking. In other words, the metrics would change over time.
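One of the problems listed, dimensions changing over time without history tracking, is commonly handled with a type 2 slowly changing dimension. Below is a minimal in-memory sketch in Python with hypothetical field names; this illustrates the general technique, not Zego’s actual implementation:

```python
from datetime import date

def scd2_upsert(dimension, key, new_attrs, as_of):
    """Type 2 slowly changing dimension: instead of overwriting a row,
    close out the current version and append a new one, so lookups
    and metric definitions keep their history."""
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            if row["attrs"] == new_attrs:
                return dimension  # no change, nothing to version
            row["end_date"] = as_of  # close the old version
    dimension.append({"key": key, "attrs": new_attrs,
                      "start_date": as_of, "end_date": None})
    return dimension

dim = []
scd2_upsert(dim, 42, {"segment": "SMB"}, date(2020, 1, 1))
scd2_upsert(dim, 42, {"segment": "Enterprise"}, date(2021, 1, 1))
print(len(dim))  # 2 versions retained
```

Because the old version is retained with its validity window, a metric computed for 2020 still joins against the attributes that were true in 2020.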

Then, one of the scariest pieces that I had to tackle was the derived columns with ambiguous names and unknown definitions. We decided to rebuild all of this in Snowflake and tackle the majority of those problems with the conformed layer. The data pipeline– Oh, there we go. It’s the next slide. I think I was one slide ahead of you. I apologize.

David: No, worries. Sure. This slide?

Anilh: I think I’ve gone over all of these. The biggest thing was, we didn’t have a single source of truth. The other challenge we faced with the legacy data warehouse was that it only served the finance department, but then as it began to be used by the customer success department and our tech ops teams, they were getting the wrong data. That’s when we determined we needed Octopai and a data lineage solution so we could accurately track data from the source systems all the way to end consumers.

David: Sure, okay. This effort could have been reduced to a few hours with a data lineage solution in place, is that what you’re saying?

Anilh: One of the big challenges is a report consumer would get a report and they would compare the results to what they saw in the source system. Then it would take hours or weeks to actually track down what had changed, what was the variance between what they were seeing in the legacy data warehouse report, and what they were seeing in the source system.

David: All right. Up until now, there was basically no management tracking or visibility into where sensitive data exists beyond the source systems, or how it was consumed. Zego is now, as I understand, applying appropriate data masking and governance policies in order to ensure that the data is protected from the source systems through all the different endpoints. Can you maybe elaborate on that?

Anilh: Yes, thanks for reminding me of that. One of the other challenges I found with the legacy data warehouse is that sensitive data (thankfully not PCI, but PII and other sensitive data) was exposed, originally for consumption by the finance department. However, as the usage grew without governance, that data ended up in the wrong hands. What we also found is that data was being shared directly with our clients, without any governance or oversight.

We needed to put a stop to that as soon as we got a data governance policy in place. We use Snowflake’s tagging mechanism to tag the data, and we’re using Octopai now to see where the sensitive data lands in the conformed layer. Then we’re applying masking policies, using Octopai’s data lineage, to ensure that departments don’t see data that they’re not privy to.
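The tag-then-mask pattern Anilh describes can be simulated in a few lines of Python. In Snowflake itself this is done declaratively, with object tags and masking policies attached to columns; the tag names and the role rule below are hypothetical, chosen only to illustrate the idea:

```python
# Simulating tag-driven masking: columns carry tags, and a masking
# rule decides per role whether the real value is returned.
# Tag names and roles are hypothetical.

COLUMN_TAGS = {
    "email": {"pii"},
    "ssn": {"pii", "sensitive"},
    "invoice_total": set(),
}

def mask_value(column, value, role):
    """Finance may see the PII it owns; other departments get a mask."""
    if "pii" in COLUMN_TAGS.get(column, set()) and role != "finance":
        return "***MASKED***"
    return value

print(mask_value("email", "a@example.com", "customer_success"))  # ***MASKED***
print(mask_value("invoice_total", 99.0, "customer_success"))     # 99.0
```

The value of lineage here is knowing *which* downstream columns in the conformed layer inherit the `pii` tag, so the policy is attached everywhere the sensitive data lands, not just at the source.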

David: Great. Thank you for sharing that with us, Anilh. To summarize what we’ve covered today: what you need to look for in a lineage solution in order to have a best-in-class governance program. Data lineage is actually a crucial part of data governance, since it provides the record of data movement. There are three top features to look for in a good data lineage solution, so that it provides the best support to your governance program.

Those are: coverage, with as many systems as possible covered, such as ETLs, data warehouses, and analysis and reporting tools; a visual map, which is quick to follow and understand; and, of course, an automated data catalog that integrates lineage, allowing access to assets with the capability to track their lineage at any given time. What I’m going to do now is jump into the demo portion of tonight’s webinar.

Sure, so give me one moment; you should be able to see the Octopai demo environment in front of us today. Thanks once again, Anilh. I’m going to now jump into the demo and show the attendees here today how Octopai can actually provide best-in-class lineage for their data governance initiative. What we have here on the screen is the Octopai demo environment. I’ll go through what we have on the screen, and then we’ll jump into a demo using a few different use cases.

On the left-hand side are the different modules within Octopai, showing you what we understand to be best-in-class data lineage, because we actually have three layers of lineage in Lineage XD: cross-system lineage, inner-system lineage, and end-to-end column lineage. Together with that, we also have the discovery space, and I’ll explain to you why that’s important in a few moments as well.

All right, so to further explain what we have on the screen: on the left-hand side, in our demo environment, we have 398 different ETLs from the various systems that we can see here on the screen. In the middle we have exactly 3,247 DB objects, including tables and views, from these various systems, and that’s basically a sampling of the types of technologies that Octopai is able to automatically extract the metadata from, out of the box.

Then on the right-hand side we see the BI tools, or the reporting tools, with 23 different reports. What I’d like to do is show you the power of Octopai with reference to a few different use cases, as I mentioned. We’ll touch at a very high level on the various areas within Octopai, and from there, of course, you’ll be able to see why it’s important to understand, at a very granular level, the data lineage of your data environment to support a best-in-class data governance initiative.

The way I’ll do that is through a use case. The first one I will go through is the most common one that we see amongst our customers, or that our customers tell us is the most common one in their organizations and data environments. That is, you have an error in a report. I’m sure we’re all very familiar with that. In most organizations today, if there’s an error in a report, the way it’s handled is probably very similar to this.

Let’s say you have Mr. or Mrs. CFO looking at a report, say at the end of the quarter. Of course, they’re stressed, needing to support the quarterly earnings. Let’s say there’s something wrong with that report. Of course, they’re going to open up a support ticket. The appropriate team now needs to look into that to try to understand what went wrong with that report. Most likely they’ll need to go through a process which is very similar in most organizations.

That is, you’ll probably start off by taking a look at the map of the systems, and then taking a look at the tables and views that were involved in the creation of that report. They’ll probably look into the glossary to see if the labels were given the same names, and if not, which glossary was used. If the error isn’t in the database area, they’ll probably then look into the ETL. Of course, all of this will be done manually.

Most likely it will involve multiple people within different teams in the organization that have different responsibilities within their domain. What I’m getting at is basically a lot of people and a lot of time, and it may not even be 100% accurate, so it’s basically not efficient. In most organizations that would literally take anywhere from hours, days, or weeks, to even months in the most extreme case.

Now what I’d like to do is show you that same scenario, giving you an example of the lineage capabilities within Octopai, and show you how that would be done literally automatically in a few seconds. Let’s imagine for a moment that the issue we’re having is in a report called Customer Products. I’m going to come into Octopai’s lineage space and type in the name of the report that we’re having trouble with. Right now we’re going to go into the first level of lineage that Octopai provides.

Remember, I mentioned three layers of lineage in Octopai Lineage XD: cross-system, inner-system, and then end-to-end column-to-column. Right now, at the high level, what we need to understand is how that report was created. As I typed it in, you see here that Octopai has filtered through all the metadata and shown me the report we’re having trouble with. If I click on cross-system lineage, in about a second I now understand how that report was created; in most organizations, just getting to this very high level of understanding may take hours or days.

As you can see here, at the click of a mouse we now have that understanding on the screen. On the right-hand side is the report that we’re having trouble with. As we move to the left, we can start to see how that report was created, and we see that there are two views that were involved in the creation of that report. If I click on any object on the screen, as you see here, a radial dial comes up, which offers us more capabilities and more information.

Let’s say, for example, I needed a visualization of this view; maybe there were many different transformations and I wanted a visualization which may help me understand it. Clicking on that will show us the source, transformation, and target. As I move to my left, I continue to trace how the data landed in that report. I can see here that there were also three different tables involved in the creation of that report.

Similarly, if I click on a table, I get that same radial dial. In this case, I’m sure you’re familiar with being tasked with making a change to a specific table: a calculation, a transformation, whatever that might be in that table. What we can see here is that if I click on that table, I get that familiar radial dial. What we see on the bottom right is a six with an arrow to the right. That means there are actually target objects, objects that are dependent on that table.

If I were to make changes to that table, I can be fairly sure that some, if not all, of these different objects that have now popped up would be affected, including an additional stored procedure, a tabular table, measure groups, and these three different reports. You can imagine how long that would have taken, of course, if you had to do that manually, or in another situation where you’re not using Octopai.

In any case, as we move to the left, we come to the ETLs that were involved in that report: not one ETL, but multiple different ETLs were involved in the creation of that report. The reason why I’m pointing that out is that many organizations are using many different systems, because systems come along and are integrated, maybe there’s a merger and acquisition, maybe there’s a legacy system that you just haven’t put to rest, and you’ve introduced new technologies, so you’re probably using many different systems to manage and move your data.

That’s not a challenge for Octopai; as you can see here, we can still show you the path that data has taken in order to land in that report. Now, a couple more things that I wanted to point out before I continue on. You may have noticed, and I know it’s probably small on your screen, but there is a shadow to the right of this object over here, this table. There is a shadow to the left of this ETL, and there’s a shadow all the way around this ETL over here.

What that’s telling you is that there are dependent objects, or that this object is actually sourcing from other objects, and you can continue to decipher or unravel the lineage by just clicking. In this case, for example, the eight to the left will show us the other objects that this ETL is sourcing from, which is basically these eight different tables. To continue on with our scenario, we asked our customer, the one that was having trouble with this report, if they had any idea what went wrong with that report.

They admitted that a few weeks earlier, before they started using Octopai, they had made changes to this one ETL over here. Most likely, when they make changes, they usually run into or encounter production issues, which is a common scenario in most organizations. Now we asked them: if they were going to be making those changes, and they knew that they were going to encounter production issues, why not be proactive? Why not look into what will be affected?

Make the appropriate corrections and save everybody the hassle of the production issues, avoid the data quality issues that result from all those production issues, and increase the confidence in the data, or the trust in the data, as Anilh was speaking about before, because if the data is seen as solid, then of course the trust goes up. Now, as we all know, that’s a lot easier said than done, because in most organizations, in order to do that, it means looking into many, many different objects: many different ETLs, tables and views, reports, and so on.

It could be literally thousands or even hundreds of thousands, so trying to be proactive is almost impossible. Most organizations work in a reactive way, of course, trying to avoid production issues whenever changes are made. Then, if there are production issues, they address them as they become apparent, and therein lie a lot of the issues with data quality.

Of course, that’s because you’re only fixing what you know of, and if you’re only fixing what you know of, I’m sure you can imagine that there will be things that fall through the cracks. Now, with Octopai, we’ve empowered you to become proactive, so you can actually ensure that there are few or no production issues by understanding exactly what will break if you make a change. This customer, if they wanted to make a change to that ETL, is now empowered to understand exactly what will be affected.

Now, before I jump into that: what we’ve shown you so far, at the system level, which is the highest level of lineage, was a root cause analysis for a specific report. Now we’re going to go the other way. We’re going to do an impact analysis. Let’s say, for example, before we were to make a change to this ETL, we jumped into the cross-system lineage of that ETL. We understood exactly the lineage of that ETL in order to be prepared to make the appropriate corrections before we made those changes.

When we started this scenario, we were looking into this one report over here; that was the error we were having trouble with. Now, when we have complete clarity and understand the lineage of this ETL, we can see that when changes were made to this ETL, that was most likely not going to be the only error.

Most likely, some if not all of these different objects on the screen could have or would have been affected by any one change to this ETL. Of course, these stored procedures, dimensions, tabular tables, measure groups, views, tables, and reports could have been affected. What will most likely happen as time progresses, in most organizations, is that these reports will start to get opened by different people at different times throughout the year. Then we hope that those users who open these reports will notice the errors in them.

They will open a support ticket. Now, I say hope because, of course, as you understand, if they don’t notice the errors, then it’s just worse. Let’s say we hope they noticed the errors and opened the support tickets. Those responsible for looking into those errors now have quite a task to figure out what the root cause is. As you can imagine, throughout the year you’re probably not stuck with only the two, three, four, or, what do we have here, seven different reports that have errors in them; it’s probably hundreds, if not thousands.

We established earlier that it will most likely take anywhere from hours to days, or even weeks, to get to the root cause of an error, so you can multiply and extrapolate and understand how much time is being wasted by those who are looking into that. Of course, if they were using Octopai, they could know from the get-go that this ETL is the root cause.

They could put all that time and effort to better use, such as migration projects, data governance initiatives, data quality initiatives, and so on. To continue on: what we’ve shown you is a root cause analysis, and then we showed you an impact analysis at the system level. What I’d like to do now is show you the next level of lineage, which is inner-system lineage.

Let’s say we now needed to actually make a change to this ETL, and we wanted to know what the impact of those changes might be at the column level. Simply clicking on the ETL, we’re going to jump into the inner-system lineage. If you’re using SSIS, you’ll be familiar with this; I’m really just taking a 90,000-foot view and dropping down. We’re going from the top all the way down into what we see here, the container, and then within the container, we can now see the data flows themselves.

If I needed an understanding at the column level, within the system itself with the inner-system lineage, I simply click on map view. Now if I click on any field, I can actually see the journey that that field has taken from the source all the way to the target. In addition to that, what we see here on the screen in green are the sources, in orange the transformations, and in red the targets.

Now, let’s say you have a transformation: you’ll have a little icon on the top left over here that tells you that there is actually a transformation in there. If I double-click it, I can see the expression for that transformation. Additionally, if you have a calculation, you’ll see an FX somewhere in the lineage; you can double-click that FX, and you’ll get that calculation. Of course, at this level, we can go forwards and backwards within the system. We have complete inner-system lineage.

Here, I’m going to give you an idea. I’m going to go backwards, taking a look at the ETL that’s loading that table itself. What we see here is also the data flow at the column level. Now that I’ve shown you the inner-system lineage, I want to continue on and show you the actual column-to-column lineage. Finding that out is very simple. Let’s say the error, or the issue, that you’re having is with the unit price column.

Clicking on the three dots and then on end-to-end column lineage will now show me the lineage of that column from the moment that column enters the landscape, all the way to the reporting system. We can see it at the column level, at the schema level, at the table level, and also at the database level, giving you the granularity that you might need in order to understand, or to help you with, your day-to-day activities.
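Conceptually, end-to-end column lineage like this is a walk over a graph of column-to-column mappings extracted from the metadata. A toy forward traversal in Python; the column names are hypothetical and this is not Octopai’s implementation:

```python
# Toy column-lineage graph: edges point from a source column to the
# target columns it feeds. Walking forward from one column yields its
# end-to-end downstream lineage. Names are hypothetical.

EDGES = {
    "src_db.orders.unit_price": ["stage.orders.unit_price"],
    "stage.orders.unit_price": ["dw.fact_sales.unit_price"],
    "dw.fact_sales.unit_price": ["report.customer_products.unit_price"],
}

def downstream(column):
    """Breadth-first walk from one column to every column it feeds."""
    seen, queue = [], [column]
    while queue:
        col = queue.pop(0)
        for target in EDGES.get(col, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

print(downstream("src_db.orders.unit_price"))
```

Reversing the edge direction and running the same walk gives the root-cause view: every upstream column that could have corrupted a given report field.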

Moving on, now let’s say you needed to get an understanding of this column: you want to complete the picture and get a business description of that column. For example, it’s tax amount that we clicked on, and now we get a business description of it. In this case, what’s a demo environment without a little issue? Let’s take a look here. This is the one actually that I was looking for.

Here we go. We’re supposed to jump into unit price; that’s the one that I clicked on. What we see here is, first of all, there’s a checkmark on it, which tells us that it is approved. If you come into this description, you can now get a business description, being confident that it was approved by the data owner. The automated data catalog is actually built for you automatically; it’s the A in ADC. The way we do that is by extracting the metadata and analyzing the metadata in order to create it for you.

The descriptions can also be populated, but of course, the caveat is that those descriptions exist somewhere within your environment. I won’t go into all of the details of our data catalog; of course, we can schedule another call for that. Before that, I just wanted to show you one more point, which is data discovery. What we were in was the end-to-end column-to-column lineage; I’m going to go back to that from here.

All right, we’re here. We understand the column lineage. Now let’s say we want to understand the column itself: everywhere that column is referenced and what would be impacted if I needed to make a change. That’s also completely integrated. Right-clicking on it and clicking on search in discovery will take us to the final module that I wanted to show you. Octopai now goes through all of the different systems that are connected to it.

You can see here ADF, Informatica, SSIS, the various ETLs, databases, data warehouses, analysis and reporting tools, and so on. It shows you everywhere it’s found unit price within your environment. If you need to make a change, you’re going to need to take this into consideration. It’s also going to show you where it’s not found. You see here in green where it is and how many times it’s found, and in gray where it’s telling you it’s not.

I would say that’s just as important, saving you that much time by not having to look into those systems at all. I’m going to go further to give you more of an idea of the granularity of information you can get from the data discovery module. Let’s say, for example, we see here SQL Server: it’s found unit price in objects 46 times. If I click on any one of those green objects on the screen, it gives us more information; in this case, we’re looking at the objects themselves.

If I jump into any one of these, I can actually jump into the definition. When I click on the definition, what pops up on the right-hand side is the SQL that was used to create that definition. In this case, let’s take a look at it, fingers crossed that it works, and it’s showing us a map, or a visualization, of that. Finally, you can see that with Octopai’s automation, we can help you reduce the amount of time that you’ve been investing in looking for these things or trying to trace back the lineage.

For example, in this case, we can see here one specific column. If you needed to make a change, and I’m sure that happens very often, you now understand, literally in seconds, the impact those changes would have and how much effort it would take, of course, to do that project. Shannon, that was everything that I had to share. Maybe you wanted to open the panel up to questions?

Shannon: Absolutely. Just to answer the most commonly asked question, a reminder: I will send a follow-up email for this webinar by end of day Monday with links to the slides and the recording of the session. Diving in here, there have been a lot of questions in both the chat and the Q&A. I’ll try to get to the Q&A here in a second, but I just wanted to jump into this first question that came in for you, Anilh.

I know you answered it in the chat, but just if you want to expand on it: did you decide to do the platform rebuild using Snowflake, et cetera, before you devised a data management strategy, after, or during?

Anilh: It was in parallel. We knew that part of the entire data architecture solution and rebuilding trust in the data would include a data management solution. In addition, when I started with Snowflake we did not have a data governance team or a data governance office. Those were installed approximately three months after we got the Snowflake development effort started.

Shannon: Awesome. David, what kind of metadata is collected from reports? Also, do you take SQL code, views, functions, et cetera, as part of the metadata scanning task?

David: Sure. What we’re extracting from the different systems: every system is different, but for example, we’re looking at tables and views, and of course we’re looking at SQL and stored procedures, et cetera. As long as it’s technology that is supported by Octopai, we can go in out of the box, extract that metadata, and provide you with that lineage.

There’s nothing different that you need to do in order to work with Octopai. As long as it’s one of these technologies that are supported here, we can connect to it out of the box, extract that metadata, and provide you with that lineage that you saw here.

Shannon: Perfect. Anilh, how did you integrate your metadata management practice and solution with Octopai’s data lineage, and does Octopai leverage an organization’s metadata repository?

Anilh: What we-

Shannon: Yes, go ahead.

Anilh: What we’ve done is, for each of the underlying source platform applications, we first imported their metadata, just their information schema: information_schema.tables and information_schema.columns. That’s imported into the Snowflake data lake. Then, when it gets to the conformed layer, we’re actually adding to it; it’s part of a requirement in our deployment.

You have to have a business definition that’s included in JSON construct. That’s what we export to Octopai into their automated data catalog. Hopefully, that answers the question.
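For readers following along, here is a minimal sketch of what that requirement might look like in practice: a business definition packaged as a JSON construct alongside column metadata pulled from `information_schema`. The field names and the helper function are illustrative assumptions; Zego’s actual schema was not shown in the webinar.

```python
import json

def build_column_metadata(table, column, data_type, business_definition):
    """Assemble the JSON construct attached to each conformed-layer column.

    Field names here are illustrative; the deployment's actual
    schema may differ.
    """
    return {
        "table": table,
        "column": column,
        "data_type": data_type,
        "business_definition": business_definition,
    }

# The structural metadata itself would come from a query such as:
#   SELECT table_name, column_name, data_type
#   FROM information_schema.columns
#   WHERE table_schema = 'CONFORMED';
record = build_column_metadata(
    "DIM_CUSTOMER", "CUSTOMER_ID", "NUMBER",
    "Surrogate key identifying a customer across source systems",
)
payload = json.dumps(record)  # what would be exported to the data catalog
```

The point of making the business definition a required field is that a deployment fails review without it, which is how definition coverage stays complete.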

Shannon: Yes. Anything you want to add, David?

David: No, that was perfectly answered. Thank you.

Shannon: I love it. There are lots of questions here, David, about what Octopai connects to or doesn’t connect to. Do you have a list of products that you connect to?

David: Yes, sure. I showed that earlier, but you can simply come to the supported technologies page on octopai.com and you’ll see all of the technologies that we support out of the box. Currently, that’s what we have here; what’s coming soon will be available in the next quarter or two. In addition to what you see here, we are developing open APIs, which, alongside the out-of-the-box technologies, you’ll be able to use to connect to just about any other technology. In essence, you’ll be able to have full lineage whether a technology is supported out of the box or not.

In addition to that, we also have augmented links and that is currently available today. That is also for technologies that we don’t support. That is a somewhat manual process for the unsupported system, but you can do that once, and then it will be represented within the lineage. Hopefully, that answers that question.

Shannon: Yes, and I love it. Okay, and I just put the link in the chat for everyone in case you need that. What does Octopai data lineage do that– How does your data lineage differ from what Snowflake just announced with their integration, and how are you–?

David: Sorry, was that the end of the question?

Shannon: It was. [chuckles]

David: Okay, certainly. First of all, Octopai, as far as we understand, has the broadest breadth and depth of technology support, as you saw on the screen here. I would imagine that the ETL, data warehouse, and reporting tool coverage is going to be a lot different than what Snowflake is supporting, as is the depth of the lineage that you can see within Octopai, as you saw here today: not just one or two layers of lineage.

We provide you with all three layers of lineage. And the third point, which I really didn’t cover yet: setting up Octopai literally takes hours, not days, weeks, or months as maybe some of the competitors might say. Literally hours.

Shannon: Does that target extension show where exactly the error occurred?

David: I didn’t understand the question.

Shannon: Does that target expansion, sorry, the six, show exactly where the error occurred?

David: No. We provide you with the information; of course, we don’t tell you where the error could be, but we give you the information so that you can then go ahead and correct it. Going forward, actually, we are working on AI technology that will actually show you, and even bring you to, the actual area or the actual space, for example, if it’s a Snowflake-specific column. We are working on that going forward; that will be available.

Shannon: What relation did you use to connect the conceptual entity “person” to the logical entity “person”? Entity belongs to entity; note the goal is to connect the conceptual model to the logical model.

David: I’ll have to defer that question to our technical people, and get back to the questioner via email.

Shannon: All right. Sorry, my questions just moved. Sorry, let me get back to my questions here. [chuckles] I would assume that the business terms are harvested from the available column descriptions if any, if the column descriptions do not exist, can one manually enter a business term definition and lock it, so it cannot be changed when the process is run again?

David: Yes, absolutely. In addition to that, if it’s not in the reporting system, if you have that kept somewhere for example, in a spreadsheet, we can also upload that into Octopai.

Anilh: I’d actually like to augment that response, David because one of the most attractive pieces of the automated data catalog is, it does allow you to track– I know it’s beyond the scope of lineage, but you can identify the data stewards, and that’s where the final arbiter of that definition exists. You can actually run reports against that, you can tag individuals to say, “Hey subject matter expert on this table, has this definition changed?”

Shannon: Very cool.

David: I can actually maybe show that right now. Let’s say, for example, you have unit price and it says, “Unit price in US Dollars including tax.” Like Anilh said, you have, of course, the data owners and the data stewards, but you can also just have a chat with them. If you needed to ask a question, just click on the chat button, type in the name of whoever the data steward is, and ask them a question; they’ll get an email indicating that there is a question, and they have to come back here to answer it.

The point is, we had a lot of people asking, why don’t you use Slack, or why don’t you use Teams? That would actually defeat the purpose. The reason is that when you ask the questions and get the answers, most likely that question and answer will come up again, maybe even the same day. With Octopai, you actually have it listed here, so that following users who will be looking for similar questions can find the answers for themselves.

Shannon: How’s the lineage harvested by Octopai?

David: Great question. Give me one second, and I hope to be able– Actually, I don’t have that in this PowerPoint presentation. Actually, maybe I do. No, I don’t. All right, it’ll take me a little bit of time to find that slide, but I’ll explain it in any case. The way it works is, Octopai sends you a client; Anilh can attest to this, the client setup literally takes no more than an hour or two, and of course that’s assuming that you have the appropriate permissions. If you do, it shouldn’t take more than an hour or two.

What you’re doing is basically pointing Octopai to the various systems that you’re going to be extracting metadata from, such as the ETL, the data warehouse, and the analysis and reporting tools. We give you full instructions on where we need you to point Octopai to. Once you hit the run button, Octopai goes ahead, connects to those systems, extracts that metadata, and saves it into XML format, and those XML-formatted files can of course be opened and inspected to ensure that there’s no actual data in them.

Which is another point that I want to make sure is absolutely clear: we don’t analyze data whatsoever, so there is no data going outside of your environment. It is strictly metadata. Once you’ve confirmed that, those XML files can be uploaded to the cloud, to your instance, the customer’s portal within Octopai. Once that metadata, those XML files, have been uploaded there, that triggers the Octopai service to run.

That’s where all of the magic happens, where the algorithms, the machine learning, and the vast amount of processing power come into play to crunch that metadata and provide it in the way that you saw here today.
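To make the inspect-before-upload step concrete, here is a small sketch of what a metadata-only XML file could look like. Octopai’s actual file format is not public; the element names below are assumptions. The key property David describes is that the file carries only structure (table and column names), never row data.

```python
import xml.etree.ElementTree as ET

def metadata_to_xml(tables):
    """Serialize extracted metadata (structure only, never row data) to XML.

    Element names are illustrative, not Octopai's actual format.
    `tables` is a list of dicts: {"name": ..., "columns": [(name, type), ...]}.
    """
    root = ET.Element("metadata")
    for table in tables:
        t = ET.SubElement(root, "table", name=table["name"])
        for col, dtype in table["columns"]:
            ET.SubElement(t, "column", name=col, type=dtype)
    return ET.tostring(root, encoding="unicode")

xml_doc = metadata_to_xml(
    [{"name": "DIM_CUSTOMER", "columns": [("CUSTOMER_ID", "NUMBER")]}]
)
# Opening this file before upload shows only table and column names,
# which is how a customer can confirm no cell values leave the environment.
```

This is why the inspection step is cheap: a human (or a compliance script) can grep the file for anything that looks like data before letting it leave the network.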

Shannon: Can Octopai auto-tag a business term from the conceptual data model to a column in the PDD in the data catalog?

David: I didn’t understand the question. I’ll have to defer that one, again, to our technical people. We will answer those questions via email after the webinar.

Shannon: I love it, I’ll make sure I get those over to you. Can– Sorry, I’m just getting tongue-tied here. Can Octopai parse Python scripts and display them in lineage? It’s another technical question. I see SQL transformations.

David: Yes, so we do, of course, support SQL stored procedures, or any type of external stored procedures from databases or data warehouses that are supported by Octopai. Python currently is not supported by Octopai; we’re looking into developing that specifically. However, as I mentioned earlier, we are in the middle of developing open APIs, which will enable you to connect to and read basically any Python or JSON script and so on, and extract the metadata from there.

Shannon: Awesome. I saw a question in here Anilh, on how large is your company?

Anilh: Zego was recently acquired by Global Payments, but prior to our acquisition we were fewer than 500.

Shannon: Can Octopai infer or determine all variations of customer ID across the data landscape, like SUS_ID, C_ID, customer_number, et cetera, and show they all mean or are the same thing?

David: Absolutely. That’s actually part of the lineage and what I showed you earlier. Yes, absolutely.
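To give a flavor of how such matching can work, here is a toy heuristic that normalizes column names so that variants collapse to one canonical form. This is purely illustrative; Octopai’s actual matching algorithms are proprietary and the synonym table and example names below are assumptions.

```python
import re

# Illustrative token synonyms only; a real matcher would be far richer
# and likely combine name heuristics with lineage evidence.
SYNONYMS = {"cust": "customer", "cus": "customer", "c": "customer",
            "num": "id", "number": "id"}

def canonical(name):
    """Reduce a column name like CUS_ID or customer_number
    to a canonical form so variants can be grouped together."""
    tokens = re.split(r"[_\s]+", name.lower())
    return "_".join(SYNONYMS.get(t, t) for t in tokens)

variants = ["CUS_ID", "C_ID", "customer_number", "Customer Id"]
groups = {canonical(v) for v in variants}
# All four variants collapse to the single canonical form "customer_id".
```

In practice, name heuristics alone misfire, which is why having the lineage itself (the same value flowing through all these columns) is the stronger signal.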

Shannon: How about data governance, like complying with CCPA? A client can request to delete their social security number, things like that.

David: Absolutely. That’s one of the main use cases. If you need to ensure, for example, that a customer’s social insurance number has been deleted, you need to know exactly where that is found within the environment, and Octopai can show you that.

Shannon: So many great questions coming in. Is it required to have PK/FK relationships connected between logical entities for a data catalog tool to show data lineage, or are entities enough in the LDM, or do we need relations?

David: That is, again, a deferred question. I apologize for that.

Shannon: No worries, lots of great questions here. For the best data lineage, what is the prerequisite? Should the databases have primary key, foreign key all set up?

David: Really, there’s nothing else that you need to do. As I mentioned earlier, if it’s technology that we support out of the box, you don’t need to do anything to prepare for Octopai or work differently in order for Octopai to extract that metadata. The key point is that it’s a technology that’s supported by Octopai. If that’s the case, that’s where the algorithms come into play: we connect to those sources and extract that metadata, and through the analysis that we do with the algorithms, the machine learning, the processing power, and the fact that we analyze all three layers, the semantic, presentation, and physical layers, we’re able to provide you with that lineage. There’s nothing else that you need to do.

Shannon: How quickly is the harvest done and then translated into lineage? Also, how are systems connected together? Do you analyze feeds from one system to another?

David: Yes, of course, even if they’re in different locations; those again are major use cases for Octopai providing you with that lineage. How is it done? As I mentioned earlier, we connect to the various systems and extract that metadata. For that entire process, the initial setup should take an hour; after that, it should take you half an hour to do the extraction automatically, and that can be set up to run on a weekly basis. The upload shouldn’t take more than a few minutes.

The analysis can take up to 24 to 48 hours. That doesn’t mean it always takes 24 to 48 hours, but it can. For example, most customers will work this way: they’ll upload a new extraction of the metadata on a Friday, and Monday morning they’ll be certain to have a new version and can continue working with development. [silence] Any other questions, Shannon?

Shannon: Yes, I was talking to my mute button. Does Octopai scan and link other objects in the lineage, like XML, JSON file structures, and flat file structures, or languages like C#?

David: Languages in general are not supported by Octopai, except, if you want to call SQL a language, that is supported, as in the scripts. XML, yes, is supported; flat files, yes, are supported in our discovery module, which I showed you earlier.

Shannon: What about source to target maps in spreadsheets?

David: I’m assuming that the customer is asking if they have source-to-target maps in spreadsheets? No, I don’t see that being supported. It’s not necessary in any case, because Octopai will do that for you. Actually, let me answer it a different way, in case that question is asking if we can provide a source-to-target map in an Excel spreadsheet.

Yes, that goes against the reason for Octopai, which is the automation and being able to see that within Octopai. But within Octopai, you can export everything into Excel spreadsheets, and that should answer both sides of that question.

Shannon: What’s your licensing and pricing structure?

David: Octopai is priced for the platform; there is one price, and per module there is no separate price. There is no charge for anything else, so all of the users can use Octopai at no additional cost. All of the training is included, the cloud fees are included, and maintenance and upgrades are included. Together with that subscription, you also get a dedicated customer success manager. The moment you sign on with Octopai, we assign you a customer success manager, or CSM.

They take you through the ropes from beginning to end and provide you with any amount of training necessary. We can get into the specific details on what the costs are; if anybody wants to schedule a call, we can talk about your specific environment, and I can give you exact pricing.

Shannon: I love it. Very nice. Can business terms be tagged for sensitivity, for personally identifiable information, PII, and that relation inherited and expressed in downstream objects, like a materialized view, so when I’m developing a report, I can see fields X, Y, Z as PII?

David: Repeat that question, because I think I might actually be able to answer it; if not, I’ll have to defer. I think I heard something about tagging. Can you do that again?

Shannon: Yes. Let me rephrase a little bit. Can I tag sensitive information, personally identifiable information, and is that relation inherited and expressed in downstream objects like a materialized view, so when I’m developing a report, I can see that this field is tagged as personally identifiable information?

David: Yes, sure. Absolutely. Within the data catalog, you can do that. You have the capability of tagging, for example, PII, and then of course you can see it through the lineage. Within the automated data catalog, you can understand the lineage and see that a column, for example, is PII or sensitive.
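The inheritance the question asks about is essentially a graph traversal: once a source column is tagged PII, everything downstream of it in the lineage inherits the tag. Here is a minimal sketch of that idea; the edge list, object names, and function are invented for illustration and are not Octopai’s implementation.

```python
from collections import deque

def propagate_tags(edges, tagged):
    """Propagate a sensitivity tag (e.g. PII) from source columns to every
    downstream object via breadth-first search over lineage edges.

    A sketch of inherited tagging, not any vendor's implementation.
    `edges` is a list of (source, target) pairs; `tagged` is the seed set.
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    seen, queue = set(tagged), deque(tagged)
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

edges = [("crm.ssn", "conformed.customer.ssn"),
         ("conformed.customer.ssn", "mv_customer_report.ssn")]
tagged = propagate_tags(edges, {"crm.ssn"})
# The materialized-view column inherits the PII tag automatically,
# so a report developer sees the field flagged without re-tagging it.
```

The practical payoff is exactly what the questioner wants: tag once at the source, and every derived view or report field shows up as sensitive.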

Shannon: Great. Can Octopai scan older programs used for ETL, like C, Basic, Java, COBOL?

David: No. I don’t know that there’s any technology still in use, well, that could scan COBOL, but no, Octopai does not.

Shannon: Does it have intelligence to show potential data relations from one data source to another?

David: Yes, absolutely.

Shannon: Can you input do a CSV? Is it a CSV import?

David: Sorry, I didn’t catch that. Is that an addition to the question that you just mentioned, or is it a new question?

Shannon: New question.

David: I didn’t understand the question.

Shannon: New question.

David: What is it?

Shannon: CSV import?

David: Can we import CSV? Yes, absolutely. That would be supported within data discovery, as I mentioned earlier when I was asked the question about flat files and XML; it’s similar.

Shannon: I don’t see GCP technologies, for example BigQuery, on the supported technologies page. Do you know if they’re on the roadmap?

David: Yes, they are on the roadmap for later this year.

Shannon: Now does the lineage know an object is a BI report? Is it necessary to import metadata separately from a reporting server?

David: No, absolutely not. As mentioned earlier, there’s nothing that you need to do separately or differently in order to prepare for Octopai to work. The key criterion is that you’re using technology that’s supported by Octopai, such as Power BI, for example. If you are, we connect to it out of the box automatically with that initial setup, and we extract everything that we need. There’s nothing else that you need to do.

Shannon: Can you show an example of your data masking?

David: Data masking. I don’t know that I mentioned that we have it; we don’t.

Shannon: Okay.

David: Sorry. Let me take one step back. We do not analyze data, as we mentioned earlier, we’re only analyzing metadata, so of course there’s no reason for data masking in Octopai.

Shannon: Makes sense. There are a lot of questions in the chat about data quality. Is there any way to manage data quality in Octopai, and if so, how?

David: Data quality is outside the scope of Octopai. Having said that, though, we are working on BI for BI, or business intelligence for the business intelligence team, a data intelligence environment, I guess, that is going to be developed and released in the next year, which will be able to give you insights and ideas about data quality and so on.

Shannon: If there aren’t any keys, foreign keys, in the database, is it still able to provide the lineage?

David: I would imagine the answer is yes. I don’t know the exact answer to this question, but I know that we don’t need anything other than extracting the metadata. The only other thing that we might need, on occasion, is the connection parameters; that’s the only other thing Octopai would require in order to provide you with that lineage. So I would say the answer is yes.

Shannon: All right. I think the other questions are a bit technical, so I think that’s all the questions we have for now. Well, David and Anilh, this has been so great. I will get all those technical questions that we weren’t able to get to today over to you, so we can get them included in our follow-up. Again, I’ll send a follow-up email by end of day Monday with links to the slides and the recording of this webinar.

David: Thank you, Shannon. I just wanted to take this opportunity to thank Anilh once again. Really appreciate it, and thank you, everyone, for joining and listening to what we have to say. If you liked what you saw, or if there’s anything that piqued your interest and you’d like to find out more, we’d encourage you to schedule a call with one of our representatives to take you in more detail through everything that you saw here today.

Shannon: Thank you both. Thanks to our attendees. Hope you all have a great day.

Anilh: My pleasure. Thanks.

Shannon: Now let me introduce our speakers for today, David Bitton and Anilh Rameshwar.

David has over 20 years of experience working with technology companies and a solid history of global leadership success in business-to-business enterprise sales, specifically software as a service. During the last four years, he has led sales and business development efforts at Octopai, where he enjoys helping BI and analytics professionals harness the power of automated data lineage and discovery to achieve full control of their data.

Anilh is an accomplished big data analyst, data developer, database developer, and software engineer with 15 years of experience processing multiple petabyte datasets, and is skilled in finding the subtle nuances of data that make the difference between day-to-day metrics and valuable business insights. With that, I will give the floor to David and Anilh to get today’s webinar started. Hello, and welcome.

David: Thank you very much, Shannon. I’m super excited today to host this together with our good friend Anilh at Zego. What I’d like to do is jump straight into the presentation, where I’ll have Anilh share some of their challenges and how they used Octopai to address them. Anilh, would you like to introduce yourself? I think Shannon already did, but maybe you’d like to talk about the company that you work for, the existing data environment, and so on.

Anilh: Thanks for having me. My name is Anilh Rameshwar. I’m the data architect at Zego. I started at Zego in February of 2020. At that point, what I came into was a software stack with a BI solution that was built on Microsoft SQL Server. There was no lineage whatsoever and there was a decided lack of trust in the data. Maybe we can go to the next slide here, David.

David: Sure.

Anilh: Excellent. There were multiple reports being consumed by multiple business units, and this had evolved over time without any governance or oversight. There was no data governance team; it was developed by third-party consultants. What happened is, as the data got into more and more hands, the trust in the data began to erode. By the time I arrived, nobody really trusted what they found in the data warehouse. The metrics were in conflict with queries against source systems.

I was tasked with rebuilding Zego’s entire data platform from the ground up. Part of that endeavor included providing an end-to-end data lineage solution so that we could rebuild trust in the data and to make effective business decisions. Go to the next slide, please.

David: Sure. With these challenges in mind, you embarked on your team’s initiatives, correct?

Anilh: Correct.

David: Can you share a little bit about that with us?

Anilh: Sure. The initiatives were primarily to regain trust in the data so we could make effective business decisions, or are you talking about the actual mechanics behind it? Sorry, David.

David: That’s okay. Whatever you’d like to share with us. Basically, the slide here, the data engineering initiatives.

Anilh: Sure. We had four core applications that were disparate. The legacy data warehouse only included one of those four applications. Folks were frequently trying to pull data from application B and commingle it with data from application A, and they were getting the wrong results. We determined that our best course of action was to build a data lake in Snowflake. From the data lake, we developed a conformed layer and then on top of that conformed layer, we developed the reporting layer.

This is where we decided we needed a data lineage solution. Some of the challenges that we faced with the legacy data warehouse were fundamentally related to the source code, which was buried deep within stored procedures and SSIS packages. We found multiple problems: for example, stale dimension data where a lookup table hadn’t been updated, changes in source system behavior where the LDW code remained static, and transformations that changed over time without history tracking. In other words, the metrics would change over time.

Then, one of the scariest pieces that I had to tackle was the derived columns with ambiguous names and unknown definitions. We decided to rebuild all of this in Snowflake and tackle the majority of those problems with the conformed layer. The data pipeline– Oh, there we go. It’s the next slide. I think I was one slide ahead of you. I apologize.

David: No, worries. Sure. This slide?

Anilh: I think I’ve gone over all of these. The biggest thing was, we didn’t have a single source of truth. The other challenge we faced was that the current system, the legacy data warehouse, only served the financial department, but then as it began to be used by the customer success department and our tech ops teams, they were getting the wrong data. That’s when we determined we needed Octopai and a data lineage solution, so we could accurately track data from the source systems all the way to end consumers.

David: Sure, okay. This effort could have been reduced to a few hours with a data lineage solution in place, is that what you’re saying?

Anilh: One of the big challenges is a report consumer would get a report and they would compare the results to what they saw in the source system. Then it would take hours or weeks to actually track down what had changed, what was the variance between what they were seeing in the legacy data warehouse report, and what they were seeing in the source system.

David: All right. Up until now, there was basically no management, tracking, or visibility into where sensitive data exists beyond the source systems and how it was consumed. Zego is now, as I understand, applying appropriate data masking and governance policies in order to ensure that the data is protected from the source systems through all the different endpoints. Can you maybe elaborate on that?

Anilh: Yes, thanks for reminding me of that. One of the other challenges I found with the legacy data warehouse is sensitive data, thankfully, not PCI, but PII and other sensitive data was exposed, originally for consumption by the finance department. However, as the usage grew without governance, the inappropriate data was in the hands of the inappropriate folks. What we also found is that data was being shared directly to our clients, without any governance or oversight.

We needed to put a stop to that as soon as we got a data governance policy in place. We use Snowflake’s tagging mechanism to tag the data, and we’re using Octopai now to see where the sensitive data lands in the conformed layer. Then we’re applying masking policies using Octopai’s data lineage to ensure that the appropriate departments don’t see data that they’re not privy to.
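For readers unfamiliar with the Snowflake side of the flow Anilh describes, the tag-then-mask pattern boils down to a few DDL statements. The sketch below generates them as strings; the table, role, and policy names are placeholders, not Zego’s actual objects, and a real deployment would tune the CASE logic per data type and role hierarchy.

```python
def masking_ddl(table, column, allowed_role, policy="mask_pii"):
    """Generate Snowflake statements for the tag-then-mask flow:
    tag the sensitive column, define a masking policy, and attach it.

    Names (table, role, policy) are placeholders for illustration.
    """
    return [
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET TAG sensitivity = 'pii';",
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING) "
        f"RETURNS STRING -> CASE WHEN CURRENT_ROLE() = '{allowed_role}' "
        f"THEN val ELSE '***MASKED***' END;",
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET MASKING POLICY {policy};",
    ]

stmts = masking_ddl("conformed.customer", "ssn", "FINANCE_ROLE")
for stmt in stmts:
    print(stmt)
```

The lineage tool’s role in this is the middle step Anilh calls out: knowing where the tagged source column lands in the conformed layer tells you which columns need the policy attached.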

David: Great. Thank you for sharing that with us, Anilh. Basically, to summarize what you covered today: what do you need to look for in a lineage solution in order to have a best-in-class governance program? Data lineage is actually the crucial part of data governance, since it provides the record of data movement. There are three top features to look for in a good data lineage solution so that it provides the best support to your governance program.

Those are: coverage, where we want to see as many systems as possible covered, such as ETLs, data warehouses, and analysis and reporting tools; a visual map, which is quick to follow and understand; and the automated data catalog, which integrates lineage and allows access to assets with the capability to track their lineage at any given time. What I’m going to do now is jump into the demo portion of tonight’s webinar.

Sure, so give me one moment, you should be able to see the Octopai demo environment in front of us today. Thanks, once again, Anilh. I’m going to now jump into the demo and show some of the attendees here today, how Octopai can actually provide the best-in-class lineage for their data governance initiative. What we have here on the screen is the Octopai demo environment. On the left-hand side, what we see here, just to go through what we have on the screen, and then we’ll jump into a demo using a few different use cases.

On the left-hand side, these are basically the different modules within Octopai, providing, as far as we understand, the best-in-class data lineage, because we actually have three layers of data lineage, Lineage XD: cross-system lineage, inner-system lineage, and end-to-end column lineage. Together with that, we also have the discovery space, and I’ll explain why that’s important in a few moments as well.

All right, so to further explain what we have on the screen: on the left-hand side, in our demo environment we have 398 different ETLs from the various systems that we can see here on the screen. In the middle we have exactly 3,247 DB objects, including tables and views, from these various systems, and that’s basically a sampling of the types of technologies that Octopai is able to automatically extract metadata from, out of the box.

Then here on the right-hand side we see the BI tools, or the reporting tools, and the 23 different reports. What I’d like to do is show you the power of Octopai with reference to a few different use cases, as I mentioned. We’ll touch at a very high level on the various areas within Octopai, and from there, of course, you’ll be able to see how important it is to understand, at a very granular level, the data lineage of your data environment to support a best-in-class data governance initiative.

The way I’ll do that is through a use case. The first one I’ll go through is the most common one that we see among our customers, or that our customers tell us is the most common one in their organizations or data environments. That is: you have an error in a report. I’m sure we’re all very familiar with that. In most organizations today, if there’s an error in a report, the way it’s handled is probably very similar to this.

Let’s say you have Mr. or Mrs. CFO looking at a report, say at the end of the quarter. Of course they’re stressed, needing to support or provide the quarterly earnings. Let’s say there’s something wrong with that report. Of course they’re going to open a support ticket, and the appropriate team is now going to need to look into it to try to understand what went wrong with that report. Most likely they’ll need to go through a process which is very similar in most organizations.

That is, they’ll start off by probably taking a look at the map of the systems, and then taking a look at the tables and views that were involved in the creation of that report. They’ll probably look into the glossary to see if the labels were given the same names and, if not, sorry, which glossary was used. After that, if the error is not in the database area, they’ll probably look into the ETL. Of course, all of this will be done manually.

Most likely it will involve multiple people within different teams in the organization that have different responsibilities within their domains. What I’m getting at is: a lot of people, a lot of time, and it may not even be 100% accurate, so it’s basically not efficient. Of course, in most organizations that would literally take anywhere from hours to days to weeks, even months in the most extreme case.

Now what I’d like to do is show you that same scenario, giving you an example of the lineage capabilities within Octopai, and show you how that would be done automatically, literally in a few seconds. Let’s imagine for a moment that the issue we’re having is in a report called Customer Products. I’m going to come into Octopai’s lineage space and type in the name of the report that we’re having trouble with. Then we’re going to go into the first level of lineage that Octopai provides.

Remember, I mentioned three layers of lineage, Octopai Lineage XD: cross-system, inner-system, and end-to-end column-to-column. Right now, at the high level, what we need to understand is how that report was created. As I typed it in, you see here that Octopai filtered through all the metadata and showed me the report we’re having trouble with. If I click on cross-system lineage, in about a second I now understand how that report was created; in most organizations, just getting to this very high level of understanding may take hours or days.

As you can see here, at the click of a mouse we now have that understanding on the screen. On the right-hand side is the report we’re having trouble with. As we move to the left, we can start to see how that report was created, and we see that there are two views that were involved in its creation. If I click on any object on the screen, as you see here, a radial dial comes up that offers us more capabilities and more information.

Let’s say, for example, I needed to get a visualization of this view; maybe there were many different transformations, and a visualization might help me understand it. Clicking on that shows us the source, the transformation, and the target. As I move to my left, I continue to trace how the data landed in that report. I can see here that there were also three different tables involved in the creation of that report.

Similarly, if I click on a table I get that same radial dial. In this case, I’m sure you’re familiar with being tasked with making a change to a specific table: a calculation, a transformation, whatever that might be. What we see here, when I click on that table and get that familiar radial dial, is a six with an arrow to the right on the bottom right. That means there are actually six target objects, objects that are dependent on that table.

If I were to make changes to that table, I can be fairly sure that some, if not all, of these objects that have now popped up would be affected, including an additional stored procedure, a tabular table, measure groups, and these three different reports. You can imagine how long that would have taken if you had to do it manually, or in any situation where you’re not using Octopai.
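The dependency expansion described here, click a table and see every downstream object that depends on it, is conceptually a breadth-first walk over a directed metadata graph. A minimal sketch, with a hypothetical edge list standing in for the harvested metadata (object names are invented for illustration, not Octopai’s internal model):

```python
from collections import deque

# Hypothetical metadata: each object maps to the objects that consume it
# (table -> stored procedures, views, reports, etc.).
DOWNSTREAM = {
    "dbo.DimCustomer": ["sp_LoadCustomer", "vw_CustomerProducts"],
    "vw_CustomerProducts": ["rpt_CustomerProducts", "rpt_QuarterlySales"],
    "sp_LoadCustomer": ["tabular.Customer"],
}

def impact_analysis(obj: str) -> set[str]:
    """Return every object transitively affected by a change to `obj`."""
    affected, queue = set(), deque([obj])
    while queue:
        current = queue.popleft()
        for target in DOWNSTREAM.get(current, []):
            if target not in affected:
                affected.add(target)
                queue.append(target)
    return affected

# A change to the dimension table ripples out to five downstream objects.
print(sorted(impact_analysis("dbo.DimCustomer")))
```

Root cause analysis is the same traversal run against the reversed edges: start from the broken report and walk upstream instead of downstream.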

In any case, as we move to the left we come to the ETLs that were involved in that report: not one ETL, but multiple different ETLs were involved in its creation. The reason I’m pointing that out is that many organizations are using many different systems, because systems come along and get integrated, maybe there’s a merger and acquisition, maybe there’s a legacy system that you just haven’t put to rest while you’ve introduced new technologies. You’re probably using many different systems to manage and move your data.

That’s not a challenge for Octopai; as you can see here, we can still show you the path that data has taken in order to land on that report. Now, a couple more things I wanted to point out before I continue. You may have noticed, although it’s probably small on your screen, that there is a shadow to the right of this object over here, this table. There is a shadow to the left of this ETL, and there’s a shadow all the way around this ETL over here.

What that’s telling you is that there are dependent objects, or that the object is sourcing from other objects, and you can continue to decipher or unravel the lineage just by clicking. In this case, for example, the eight to the left will show us the other objects that this ETL is sourcing from, which is basically these eight different tables. To continue with our scenario, we asked our customer, the one who was having trouble with this report, if they had any idea what went wrong with it.

They admitted that a few weeks earlier, before they started using Octopai, they had made changes to this one ETL over here. Most likely, when they make changes, they run into production issues, which is a common scenario in most organizations. We asked them: if they were going to make those changes and knew they would encounter production issues, why not be proactive? Why not look into what will be affected?

Make the appropriate corrections and save everybody the hassle of the production issues, avoid the data quality issues that result from all those production issues, and increase the confidence or trust in the data, as Anilh was speaking about before, because if the data is seen as solid, then of course the trust goes up. Of course, as we all know, that’s a lot easier said than done, because in most organizations doing that means looking into many, many different objects: many different ETLs, tables, views, reports, and so on.

It could be literally thousands or even hundreds of thousands, so trying to be proactive really is almost impossible. Most organizations work reactively, of course trying to avoid production issues whenever changes are made. Then, if there are production issues, they address them as they become apparent, and therein lie a lot of the issues with data quality.

That is because you’re only fixing what you know of, and if you’re only fixing what you know of, I’m sure you can imagine that things will fall through the cracks. With Octopai, we’ve empowered you to become proactive, so you can actually ensure that there are few or no production issues by understanding exactly what will break if you make a change. This customer, if they wanted to make a change to that ETL, is now empowered to understand exactly what will be affected.

Before I jump into that: what we’ve shown you so far, at the system level, which is the highest level of lineage, was a root cause analysis for a specific report. Now we’re going to go the other way; we’re going to do an impact analysis. Let’s say, for example, that before we made a change to this ETL, we jumped into the cross-system lineage of that ETL. We understood its lineage exactly, so that we were prepared to make the appropriate corrections before making those changes.

When we started this scenario, we were looking into this one report over here; that was the error we were having trouble with. Now, when we have complete clarity and understand the lineage of this ETL, we can see that when changes were made to this ETL, that was most likely not going to be the only error.

Most likely, some if not all of these different objects on the screen could have been affected by any one change to this ETL. Of course, these stored procedures, dimensions, tabular tables, measure groups, views, tables, and reports could have been affected. What will most likely happen, as time progresses in most organizations, is that these reports will start to get opened by different people at different times throughout the year. Then we hope that the users opening these reports will notice the errors in them.

They will open a support ticket. I say hope because, as you understand, if they don’t notice the errors, it’s just worse. Let’s say we’re lucky: they noticed the errors and opened the support tickets. Those responsible for looking into those errors now have a difficult task trying to figure out what the root cause is. As you can imagine, throughout the year you’re not stuck with only two, or three, or four, or what do we have here, seven different reports with errors in them; it’s probably hundreds, if not thousands.

We established earlier that it will most likely take anywhere from hours to days or even weeks to get to the root cause of an error, so you can multiply and extrapolate to understand how much time is being wasted by those looking into that. Of course, if they were using Octopai, they could know from the get-go that this ETL is the root cause.

They could then put all that time and effort to better use, such as migration projects, data governance initiatives, data quality initiatives, and so on. To continue on: what we’ve shown you is a root cause analysis, and then an impact analysis at the system level. What I’d like to do now is show you the next level of lineage, which is inner-system lineage.

Let’s say we now needed to actually make a change to this ETL, and we wanted to know what the impact of those changes might be at the column level. By simply clicking on the ETL, we jump into the inner-system lineage. If you’re using SSIS, you’ll be familiar with this; I’m really just taking the 90,000-foot view and dropping down. We’re going from the top all the way down into what we see here, the container, and within the container we can now see the data flows themselves.

If I needed to get an understanding at the column level within the system itself, with the inner-system lineage, I simply click on map view. Now if I click on any field, I can actually see the journey that field has taken from the source all the way to the target. In addition to that, what we see on the screen in green are the sources, in orange the transformations, and in red the targets.

Now, if you have a transformation, you’ll have a little icon on the top left over here that tells you there is actually a transformation in there. If I double-click it, I can see the expression for that transformation. Additionally, if you have a calculation, you’ll see an FX somewhere in the lineage; you can double-click that FX and you’ll get that calculation. Of course, at this level we can go forwards and backwards within the system. We have complete inner-system lineage.

Here I’m going to give you an idea: I’m going to go backwards, taking a look at the ETL that’s loading that table itself. What we see here is also the data flow at the column level. Now that I’ve shown you the inner-system lineage, I want to continue on and show you the actual column-to-column lineage. Finding that out is very simple. Let’s say the issue you’re having is with the unit price column: right-click on it and click on end-to-end column lineage.

Actually, clicking on the three dots and then on end-to-end column lineage will now show me the lineage of that column from the moment it enters the landscape all the way to the reporting system. We can see it at the column level, the schema level, the table level, and also at the database level, giving you the granularity you might need to understand it, or to help you with your day-to-day activities.

Continuing on, let’s say you needed to get an understanding of this column. You want to complete the picture and get a business description of that column. For example, it’s tax amount that we clicked on, and now we get a business description of it. In this case, what’s a demo environment without a little issue? Let’s take a look here. This is the one I was actually looking for.

Here we go. We’re supposed to jump into unit price; that’s the one I clicked on. What we see here, first of all, is a checkmark on it, which tells us that it is approved. If you come into this description, you can now get a business description, confident that it was approved by the data owner. The automated data catalog is actually built for you automatically; it’s the A in ADC. The way we do that is by extracting the metadata and analyzing it in order to create the catalog for you.

The descriptions can also be populated, but of course the caveat is that those descriptions have to exist somewhere within your environment. I won’t go into all of the details of our data catalog; of course, we can schedule another call for that. Before that, I just wanted to show you one more thing, which is data discovery. We were in the end-to-end column-to-column lineage; I’m going to go back to that from here.

All right, we’re here. We understand the column lineage. Now let’s say we want to understand the column itself: everywhere that column is referenced, and what would be impacted if I needed to make a change. That’s also completely integrated. Clicking on it, and then on search and discovery, will take us to the final module that I wanted to show you. Octopai now goes through all of the different systems that are connected to it.

You can see here ADF, Informatica, SSIS: the various ETLs, databases, data warehouses, analysis and reporting tools, and so on. It shows you everywhere it has found unit price within your environment. If you need to make a change, you’re going to need to take this into consideration. It’s also going to show you where it’s not found. You see in green where it is and how many times it’s found, and in gray where it’s telling you it’s not.

That is, I would say, just as important: it saves you that much time by letting you skip looking into those systems at all. I’m going to go further to give you an idea of the granularity of information you can get from the data discovery module. Let’s say, for example, we see here SQL Server: it has found unit price in objects 46 times. If I click on any one of those green objects on the screen, it gives us more information; in this case, we’re looking at the objects themselves.
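The discovery behavior demonstrated here, searching one column name across every connected system and counting the hits per system, can be approximated as a scan over a metadata inventory. A sketch with an invented inventory; the system, table, and column names are illustrative only:

```python
# Invented inventory: system -> list of (table, column) pairs harvested as metadata.
INVENTORY = {
    "SQL Server": [("Sales.OrderLines", "UnitPrice"),
                   ("Sales.Invoices", "UnitPrice"),
                   ("Sales.Invoices", "TaxAmount")],
    "Informatica": [("stg_orders", "unit_price")],
    "Power BI": [("CustomerProducts", "Total")],
}

def discover(column: str) -> dict[str, int]:
    """Count case- and underscore-insensitive matches of `column` per system."""
    key = column.replace("_", "").lower()
    return {
        system: sum(1 for _, col in cols if col.replace("_", "").lower() == key)
        for system, cols in INVENTORY.items()
    }

# Systems with a zero count are just as informative: no need to look there.
print(discover("UnitPrice"))
```

The zero entries correspond to the gray objects in the demo: systems you can safely skip when assessing a change.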

If I jump into any one of these, I can actually jump into the definition. When I click on the definition, what pops up on the right-hand side is the SQL that was used to create that definition. In this case, let’s take a look at it, fingers crossed that it works, and it’s showing us a map, a visualization of it. Finally, you can see that with Octopai’s automation, we can help you reduce the amount of time you’ve been investing in trying to trace back the lineage.

For example, in this case we can see one specific column. If you needed to make a change, and I’m sure that happens very often, you now understand literally in seconds the impact those changes would have and how much effort it would take to do that project. Shannon, that was everything I had to share. Maybe you wanted to open the panel up to questions?

Shannon: Absolutely. Before we answer the most commonly asked questions, just a reminder: I will send a follow-up email for this webinar by end of day Monday with links to the slides and to the recording of the session. Diving in here, there have been a lot of questions in both the chat and the Q&A. I’ll try to get to the Q&A here in a second, but I just wanted to jump into this first question that came in for you, Anilh.

I know you answered it in the chat, but if you want to expand on it: did you decide to build the platform rebuild using Snowflake et cetera before you devised a data management strategy, or after, or during?

Anilh: It was in parallel. We knew that part of the entire data architecture solution and rebuilding trust in the data would include a data management solution. In addition, when I started with Snowflake, we did not have a data governance team or a data governance office. Those were established approximately three months after we got the Snowflake development effort started.

Shannon: Awesome. David, what kind of metadata is collected from reports? Also, do you take SQL code, views, functions, et cetera, as part of the metadata scanning task?

David: Sure. What we’re extracting from the different systems varies; every system is different. For example, we’re looking at tables and views, and of course at SQL, stored procedures, et cetera. As long as it’s a technology that is supported by Octopai, we can actually go in there out of the box, extract that metadata, and provide you with that lineage.

There’s nothing different that you need to do in order to work with Octopai. As long as it’s one of the technologies supported here, we can connect to it out of the box, extract that metadata, and provide you with the lineage that you saw here.

Shannon: Perfect. Anilh, how did you integrate your metadata management practice and solution with Octopai’s data lineage, and does Octopai leverage an organization’s metadata repository?

Anilh: What we-

Shannon: Yes, go ahead.

Anilh: What we’ve done is, for each of the underlying source database platforms and applications, we first imported their metadata, just their information schema: information_schema.tables and information_schema.columns. That’s imported into the Snowflake data lake. Then, when it gets to the conformed layer, we’re actually adding something that is part of a requirement in our deployment.

You have to have a business definition that’s included in a JSON construct. That’s what we export to Octopai into their automated data catalog. Hopefully that answers the question.
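A JSON construct of the kind Anilh describes, a required business definition attached to each conformed-layer column and exported to the catalog, might look roughly like this. The field names and the steward address are assumptions for illustration, not Zego’s actual schema:

```python
import json

def business_definition(table: str, column: str, definition: str,
                        steward: str, approved: bool = False) -> str:
    """Build a JSON business-definition payload for export to a data catalog.

    Field names here are hypothetical; the real deployment's construct
    may differ.
    """
    return json.dumps({
        "table": table,
        "column": column,
        "definition": definition,
        "steward": steward,
        "approved": approved,
    })

payload = business_definition(
    "conformed.customer", "unit_price",
    "Unit price in US dollars including tax",
    steward="data.governance@example.com", approved=True,
)
print(payload)
```

Making the definition a deployment requirement, as described, means a table cannot reach the conformed layer without its catalog entry, which keeps the catalog from drifting behind the warehouse.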

Shannon: Yes. Anything you want to add, David?

David: No, that was perfectly answered. Thank you.

Shannon: I love it. There are lots of questions here, David, about what Octopai connects to or doesn’t connect to. Do you have a list of products that you connect to?

David: Yes, sure. I showed that earlier, but you can simply go to the octopai.com supported-technologies page and you’ll see all of the technologies that we support out of the box. Currently, that’s what we have here; what’s coming soon will be available in the next quarter or two. Additionally, we are developing open APIs which, in addition to the out-of-the-box technologies, you’ll be able to use to connect to just about any other technology. In essence, you’ll be able to have full lineage whether a technology is supported out of the box or not.

In addition to that, we also have augmented links, and that is available today. That is also for technologies that we don’t support. It’s a somewhat manual process for the unsupported system, but you do it once, and then it will be represented within the lineage. Hopefully that answers the question.

Shannon: Yes, and I love it. Okay, I just put the link in the chat for everyone in case you need it. What does Octopai’s data lineage do that– How does your data lineage differ from what Snowflake just announced with their integration, and how are you–?

David: Sorry, was that the end of the question?

Shannon: It was. [chuckles]

David: Okay, certainly. First of all, Octopai, as far as we understand it, has the broadest breadth and depth of supported technologies, as you saw on the screen here. I would imagine that our coverage of ETL, data warehouse, and reporting tools is going to be a lot broader than what Snowflake is supporting, as is the depth of the lineage that you can see within Octopai, as you saw here today: not just one or two layers of lineage.

We provide you with all three layers of lineage. And the third point, which I really didn’t cover yet: setting up Octopai literally takes hours, not the days, weeks, or months that maybe some of the competitors require. Literally hours.

Shannon: Does that target extension show exactly where the error occurred?

David: I didn’t understand the question.

Shannon: Does that target expansion, sorry, the six, show exactly where the error occurred?

David: No. We provide you with the information; we don’t tell you where the error could be, but we give you the information so that you can then go ahead and correct it. Going forward, actually, we are working on AI technology that will actually show you, and even bring you to, the actual area, for example a Snowflake-specific column. We are working on that going forward; that will be available.

Shannon: What relation did you use to connect the conceptual entity “person” to the logical entity “person”: entity belongs to entity? Note the goal is to connect the conceptual model to the logical model.

David: I’ll have to defer that question to our technical people and get back to the questioner via email.

Shannon: All right. Sorry, my questions just moved; let me get back to them here. [chuckles] I would assume that the business terms are harvested from the available column descriptions, if any. If the column descriptions do not exist, can one manually enter a business term definition and lock it, so it cannot be changed when the process is run again?

David: Yes, absolutely. In addition to that, if it’s not in the reporting system but you have it kept somewhere, for example in a spreadsheet, we can also upload that into Octopai.

Anilh: I’d actually like to augment that response, David, because one of the most attractive pieces of the automated data catalog is that it allows you to track– I know it’s beyond the scope of lineage, but you can identify the data stewards, and that’s where the final arbiter of that definition exists. You can actually run reports against that; you can tag individuals to say, “Hey, subject matter expert on this table, has this definition changed?”

Shannon: Very cool.

David: I can actually show that right now. Let’s say, for example, you have unit price and it says, “Unit price in US dollars including tax.” Like Anilh said, you have, of course, the data owners and the data stewards, but you can also just have a chat with them. If you needed to ask a question, just click on the chat button, type in the name of whoever the data steward is, and ask them a question. They’ll get an email indicating that there is a question, and they have to come back here to answer it.

The point is, we had a lot of people asking, why don’t you use Slack, or why don’t you use Teams? That would actually defeat the purpose. The reason is that when you ask a question and get an answer, that question and answer will most likely come up again, maybe even the same day. With Octopai, you actually have it listed here, so that subsequent users who are looking into similar questions can find the answers for themselves.

Shannon: How is the lineage harvested by Octopai?

David: Great question. Give me one second– Actually, I don’t have that in this PowerPoint presentation, and it would take me a little time to find the slide, so I’ll just explain it. The way it works is, Octopai sends you a client, and Anilh can attest to this: the client setup literally takes no more than an hour or two, assuming you have the appropriate permissions. If you do, it shouldn’t take more than an hour or two.

What you’re doing is basically pointing Octopai at the various systems that you’re going to be extracting metadata from, such as the ETL, the data warehouse, and the analysis and reporting tools. We give you full instructions on where we need you to point Octopai. Once you hit the run button, Octopai goes ahead, connects to those systems, extracts that metadata, and saves it in XML format, and those XML files can of course be opened and inspected to ensure that there’s no data in them.

Which is another point that I want to make absolutely clear: we don’t analyze data whatsoever, so there is no data going outside of your environment. It is strictly metadata. Once you’ve confirmed that, those XML files can be uploaded to the cloud, to your instance, the customer’s portal within Octopai. Once those XML files have been uploaded there, that triggers the Octopai service to run.

That’s where all of the magic happens: the algorithms, the machine learning, and the vast amount of processing power come into play to crunch that metadata and provide it in the way that you saw here today.
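The extraction flow outlined above, pull structure only (never row data) and serialize it to XML files that a customer can inspect before upload, can be sketched like this. The element and attribute names are illustrative, not Octopai’s actual file format:

```python
import xml.etree.ElementTree as ET

# Hypothetical harvested metadata: table structure only, no row data.
TABLES = {
    "Sales.Invoices": ["InvoiceID", "CustomerID", "UnitPrice", "TaxAmount"],
}

def to_xml(tables: dict[str, list[str]]) -> str:
    """Serialize table/column metadata to an inspectable XML document."""
    root = ET.Element("metadata", source="SQLServer")
    for table, columns in tables.items():
        t = ET.SubElement(root, "table", name=table)
        for col in columns:
            ET.SubElement(t, "column", name=col)
    return ET.tostring(root, encoding="unicode")

doc = to_xml(TABLES)
print(doc)
```

Because the file carries only names and structure, a security team can open it and verify that nothing sensitive leaves the environment before the upload step, which is exactly the inspection point described in the transcript.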

Shannon: Can Octopai auto-tag business terms from the conceptual data model to columns in the PDD in the data catalog?

David: I didn’t understand the question, so I’ll have to defer that one, again, to our technical people. We will answer those questions via email after the webinar.

Shannon: I love it, I’ll make sure I get those over to you. Can Octopai– Sorry, I’m getting tongue-tied here. Can Octopai parse Python scripts and display them in lineage? It’s another technical question. I see SQL transformations.

David: Yes, so we do support SQL stored procedures, or any type of external stored procedures from databases or data warehouses that are supported by Octopai. Python currently is not supported by Octopai; we’re looking into developing that specifically. However, as I mentioned earlier, we are in the middle of developing open APIs, which will enable you to connect to and read basically any Python or JSON script and extract the metadata from there.

Shannon: Awesome. I saw a question in here, Anilh, on how large is your company?

Anilh: Zego was recently acquired by Global Payments, but prior to our acquisition we were fewer than 500.

Shannon: Can Octopai infer or determine all variations of customer ID across the data landscape, like SUS_ID, C_ID, customer_number, et cetera, and show that they all mean, or are, the same thing?

David: Absolutely. That’s actually part of the lineage and what I showed you earlier. Yes, absolutely.
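One simple way to group naming variants like the ones in the question is to normalize each column name against a small abbreviation table before comparing. This is only an illustrative heuristic, not Octopai’s matching algorithm, which, per the demo, also uses the lineage links themselves to connect columns across systems:

```python
# Illustrative abbreviation table; a real matcher would be far richer
# and would also use lineage edges, not just names.
SYNONYMS = {"cust": "customer", "c": "customer", "num": "id", "number": "id"}

def normalize(name: str) -> str:
    """Reduce a column name to a canonical underscore-joined token string."""
    tokens = [SYNONYMS.get(t, t) for t in name.lower().split("_") if t]
    return "_".join(tokens)

variants = ["CUST_ID", "C_ID", "customer_number"]
print({v: normalize(v) for v in variants})
```

All three variants reduce to the same canonical form, so they can be grouped as one logical field; name-based matching alone produces false positives, which is why combining it with lineage evidence is the more reliable approach.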

Shannon: How about data governance, like complying with CCPA? A client can request deletion of their social security number, things like that.

David: Absolutely. That’s one of the main use cases. If you need to ensure, for example, that a customer’s social insurance number has been deleted, you need to know exactly where that is found within the environment, and Octopai can show you that.

Shannon: So many great questions coming in. Is it required to have relationships, PK/FK relations, connected between logical entities for a data catalog tool to show data lineage, or are entities in the LDM enough, or do we need the relations?

David: That is, again, a deferred question; I apologize for that.

Shannon: No worries, lots of great questions here. For the best data lineage, what is the prerequisite? Should the databases have primary keys and foreign keys all set up?

David: Really, there’s nothing else that you need to do. As I mentioned earlier, if it’s a technology that we support out of the box, you don’t need to do anything to prepare for Octopai, and you don’t need to work differently in order for Octopai to extract that metadata. The key point is that it’s a technology that’s supported by Octopai. If that’s the case, that’s where the algorithms come into play: we connect to those sources and extract that metadata, and through the analysis that we do with the algorithms, the machine learning, and the processing power, together with the fact that we analyze all three layers, the semantic, presentation, and physical layers, we’re able to provide you with that lineage. There’s nothing else that you need to do.

Shannon: How quickly is the harvest done and then translated into lineage? Also, how are systems connected together? Do you analyze feeds from one system to another?

David: Yes, of course, even if they’re in different locations; those again are major use cases for Octopai providing you with that lineage. How is it done? As I mentioned earlier, we connect to the various systems and extract that metadata. For that entire process, the initial setup should take an hour; after that, it should take you about half an hour to do the extraction automatically, and that can be set up to run on a weekly basis. The upload shouldn’t take more than a few minutes.

The analysis can take up to 24 to 48 hours. That doesn’t mean it always takes 24 to 48 hours, but it can take up to that. For example, most customers will work this way: they’ll upload a new extraction of the metadata on a Friday, and on Monday morning they’ll be certain to have a new version, and they can continue working with development. [silence] Any other questions, Shannon?

Shannon: Yes, I was talking to my mute button. Does Octopai scan and link other objects in the lineage, like XML or JSON file structures, or flat file structures, like C#?

David: Languages in general are not supported by Octopai, except, if you want to call SQL a language, that is supported, as in the scripts. XML, yes, it is supported; flat files, yes, they are supported in our discovery module, which I showed you earlier.

Shannon: What about source to target maps in spreadsheets?

David: I’m assuming the customer is asking whether we support source-to-target maps kept in spreadsheets? No, I don’t see that being supported. It’s not necessary in any case, because Octopai will do that for you. Actually, let me answer it a different way, in case the question is asking whether we can provide a source-to-target map in an Excel spreadsheet.

Keeping that only in a spreadsheet goes against the reason for Octopai, which is the automation and being able to see it within Octopai. But within Octopai, you can export everything into Excel spreadsheets, and that should answer both sides of that question.

Shannon: What’s your licensing and pricing structure?

David: Octopai is priced for the platform: there is one price, and there is no per-module price. There is no charge for anything else, so all of the users can use Octopai at no additional cost. All of the training is included, the cloud fees are included, and maintenance and upgrades are included. Together with that subscription you also get a dedicated customer success manager. The moment you sign on with Octopai, we assign you a CSM.

They take you through the ropes from beginning to end and provide you with any amount of training necessary. We can get into the specific details on what the costs are; if anybody wants to schedule a call, we can talk about your specific environment, and I can give you exact pricing.

Shannon: I love it. Very nice. Can business terms be tagged for sensitivity, for personally identifiable information (PII), and can that relation be inherited and expressed in downstream objects, like a materialized view, so when I’m developing a report, I can see fields X, Y, Z as PII?

David: Can you repeat that question? I think I might actually be able to answer it; if not, I’ll have to defer. I think I heard something about tagging. Can you say that again?

Shannon: Yes, let me rephrase a little bit. Can I tag sensitive information, personally identifiable information, and have that relation inherited and expressed in downstream objects like a materialized view, so when I’m developing a report, I can see that this field is tagged as personally identifiable information?

David: Yes, sure. Absolutely. Within the data catalog, you can do that. You have the capability of tagging, for example, PII, and then of course you can see it through the lineage. If you want, you can actually trace the lineage within the automated data catalog and see that a column, for example, is PII or sensitive.

Shannon: Great. Can Octopai scan older programs used for ETL, like C, BASIC, Java, COBOL?

David: No. I don’t know that there’s any technology still in use that could scan COBOL, but no, Octopai does not.

Shannon: Does it have intelligence to show potential data relations from one data source to another?

David: Yes, absolutely.

Shannon: Can you input via a CSV? Is there a CSV import?

David: Sorry, I didn't catch that. Is that an addition to the question that you just mentioned, or is it a new question?

Shannon: New question.

David: I didn't understand the question.

Shannon: New question.

David: What is it?

Shannon: CSV import?

David: Can we import CSV? Yes, absolutely. That would be supported within the data discovery, as I mentioned earlier when I answered the question about flat files; XML is similar.

Shannon: I don't see GCP technologies, for example BigQuery, on the supported technologies page. Are they on the roadmap?

David: Yes, they are on the roadmap for later this year.

Shannon: Now, does the lineage know that an object is a BI report? Is it necessary to import metadata separately from a reporting server?

David: No, absolutely not. As mentioned earlier, there's nothing that you need to do separately or differently in order to prepare, or for Octopai to work. The key criterion is that you're using technology that's supported by Octopai, such as Power BI, for example. If you are, we connect to it out of the box automatically with that initial setup, and we extract everything that we need. There's nothing else that you need to do.

Shannon: Can you show an example of your data masking?

David: Data masking? I don't know that I mentioned that we have it; we don't.

Shannon: Okay.

David: Sorry, let me take one step back. We do not analyze data; as we mentioned earlier, we're only analyzing metadata, so of course there's no need for data masking in Octopai.

Shannon: Makes sense. There are a lot of questions in the chat about data quality. Is there any way to manage data quality in Octopai, and if so, how?

David: Data quality is outside of the scope of Octopai. Having said that, though, we are working on BI for BI, or business intelligence for the business intelligence team, an environment that is going to be developed and released in the next year, which will be able to give you insights and ideas about data quality and so on.

Shannon: If there aren't any keys, foreign keys, in the database, is it still able to provide the lineage?

David: I would imagine the answer is yes. I don't know the exact answer to this question, but I know that we don't need anything other than extracting the metadata. The only other thing that we might need on occasion is the connection parameters; that is the only other thing that Octopai would require in order to provide you with that lineage, so I would say the answer is yes.

Shannon: All right. I think the other questions are a bit technical, so I think that's all the questions we have for now. Well, David and Anilh, this has been so great. I will get all those technical questions that we weren't able to get to today over to you, so we can get them included in our follow-up. Again, I'll send a follow-up email by end of day Monday for this webinar, with links to the slides and links to the recording as well.

David: Thank you, Shannon. I just want to take this opportunity to thank Anilh once again. I really appreciate it, and thank you, everyone, for joining and listening to what we have to say. If you liked what you saw, or if there's anything that piqued your interest and you'd like to find out more, we'd encourage you to schedule a call with one of our representatives to take you through everything that you saw here today in more detail.

Shannon: Thank you both. Thanks to our attendees. Hope you all have a great day.

Anilh: My pleasure. Thanks.