Amnon Drori: Hi everyone again. We can start. Thank you, everyone, for sparing the time today. My name is Amnon Drori. I’m one of the co-founders at Octopai. What we would like to do today is share with you some of our insights regarding migration, drawn from the many projects we have been involved in over the past couple of years. Octopai helps business intelligence organizations run their BI landscape better. One of the most popular use cases we have seen in the past couple of years is what we call migration: moving from one system to another because of a business need.
What we’ve decided to do today is share some of the insights we got from our customers as they carried out migration projects over the past couple of years. Some of them moved from one system to another. Some of them moved from one environment to another. We wanted to capture some of the challenges, some of the insights, and some of the ways they dealt with them, so that we can share those with you today.
Whether you have a migration project underway or are just thinking about one, one of the things we’ve learned is that within the BI landscape, changes are happening between different tools and different environments. In some of the projects we were involved in, we’ve seen companies move away from what they call legacy systems to more modern systems in the BI space, case in point, moving from IBM DataStage ETL, which may have been in place for the past 7 to 10 years, to Azure Data Factory or other ETL tools.
In some cases, we’ve seen moves away from Oracle to Snowflake or other cloud databases. Even more commonly, we’ve seen moves away from old legacy reporting tools like Cognos or OBIEE, the Oracle reporting system, to more robust, more modern reporting tools, case in point, Tableau, Power BI, or others.
Now, in every migration from one system to another, whether because a company bought another company and wants to consolidate systems or get rid of the old ones, or because you want to let your business users enjoy better reporting capabilities and therefore want to move from old, costly systems to more modern ones, there are questions that come up all the time, either in preparation for the project or during it, and they involve a lot of difficulty when you want to start.
What we would like to do is share some of the questions we’ve been asked while helping our clients as they were thinking about migration projects. Before we asked them those questions, we asked, “What worries you? What are you afraid of? What concerns you about a migration project?” From their experience of having done this a couple of times before, they said, “When you think about migrating from one system to another, and you want to make it even spicier, say you also want to move from one environment to another, from on-prem to the cloud, it takes a lot of preparation to make that happen.”
It takes a lot of time to migrate the project in a way that leaves you well prepared. It’s a tedious effort involving many cycles just to make sure you have migrated from the old system to the new one within the timeframe and within budget. The scariest part is when you go live, when you move away entirely from the old system to the new one. Some projects took six to eight months on the low end, and some took two to three years.
So what were the questions being asked during the preparation of a BI migration project? What can we share with you today? For example, take one of the projects we were involved in over the past two and a half years: moving away from OBIEE to Tableau in the cloud. Here are some of the questions that were asked even while thinking about moving from the old BI system to the new one.
It starts from the very, very basics: how many reports exist in OBIEE at all? Do we know how many? Do we know who uses those reports? If we know that, do we know whether all the reports are being used at all? Do we really need to migrate all of them to Tableau? If there are 17,000 reports, do we need to recreate all of them in Tableau, or maybe we don’t have to do all of them? How many duplicate reports do we have, where maybe we can keep only the latest versions? Among the unique reports, which are the most sensitive ones we should be aware of, so we don’t miss them?
Once we’ve identified the reports we do want to recreate, where is the data coming from? When you generate a report, the report captures data from the data warehouse, but if you recreate it in Tableau, where should you take the data from? In practice, which are the right database tables and views associated with the report in the old system that I now need to associate with the new system? How do I easily capture the metadata and its descriptions, what you know as a business glossary or data dictionary, so that I can recreate them exactly in the new reporting system?
There is also migration we have witnessed in ETLs, which is related to this. For example, I want to move away from DataStage to SSIS or Azure Data Factory. How many business processes do I have in place at all? How many of them are actually being consumed? Because if they’re not being consumed, I may decide not to migrate them, or even to delete them. Another case in point: how many data pipes exist at all? Is it 1,000 ETLs? Is it 2,000 ETL processes? What are the interconnections and dependencies between the different business processes?
Go back to the reporting case. If there are business processes I have decided not to migrate, the way to know whether they can be dismissed is to understand whether any reports depend on running those specific ETLs. If no report depends on them at the end of the day, maybe I can delete those ETL processes and no report will be damaged.
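The check described here, finding ETLs that no report depends on, can be sketched as a graph walk. This is a toy illustration under invented names and edges; it is not Octopai’s implementation, just the shape of the logic.

```python
# Hypothetical sketch: given extracted metadata edges (producer -> consumers),
# flag ETL processes with no report anywhere downstream. Such ETLs are
# candidates to skip or retire during migration. All names are invented.

downstream = {
    "etl_load_sales":  ["tbl_sales"],
    "etl_load_legacy": ["tbl_legacy"],   # feeds a table no report reads
    "tbl_sales":       ["rpt_customer_products"],
    "tbl_legacy":      [],
}

def feeds_a_report(node, graph, seen=None):
    """True if any path from `node` reaches a report (rpt_* prefix here)."""
    if seen is None:
        seen = set()
    for nxt in graph.get(node, []):
        if nxt in seen:
            continue
        seen.add(nxt)
        if nxt.startswith("rpt_") or feeds_a_report(nxt, graph, seen):
            return True
    return False

etls = [n for n in downstream if n.startswith("etl_")]
deletable = [e for e in etls if not feeds_a_report(e, downstream)]
print(deletable)  # etl_load_legacy feeds no report, so it can likely be retired
```

Real landscapes would derive the edge map from extracted metadata rather than a hand-written dictionary, but the decision rule is the same.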
Understanding the entire BI landscape, which includes ETL systems, data warehouses and databases, analysis services, and reporting tools, collectively represents the landscape around business intelligence. To answer those questions, some tool needs to be in place to help those customers as they think about, plan, and consider a migration project.
The most popular tools we have seen clients use, either individually or together, are these. The most popular one, as you know, is data lineage, which helps an organization understand where data is coming from, where it is going, and what happens to it as it flows between different BI systems. Then there is data discovery, where you want to understand where you can find certain data elements, calculations, and formulas within the landscape. And there are business glossaries, which associate the meaning of a data element with its true description, which I then need to recreate in the new reporting tool.
To that end, we’ve seen that if you are able to collect, meaning extract, metadata very easily from all the different tools that exist in your landscape, you will probably have a better mapping of the entire flow of data, which informs decisions such as dismissing an ETL, a data warehouse, or a reporting tool.
When you want to migrate your reports from OBIEE to Tableau, you need to be able to map every report that exists in your OBIEE back to where its data is coming from. Meaning: what are the exact named ETL processes and the exact named database tables and views that are all involved in landing the data on a specific report? Then you can recreate the report in the new reporting system and associate it with those exact named ETL processes and database tables.
That mapping is what takes a long time for an organization to produce. It is one of the most painful, time-consuming, and costly areas of such a project. This is why we decided that if you can extract metadata, centralize it, and analyze it, that will help you build the tools that are critical for your migration project, answer those questions, and, as you migrate, continuously make sure the data keeps flowing properly.
Once you have the metadata centralized and analyzed, you can shrink-wrap the analyzed metadata into different shapes and forms of visualization, what you know as tools: as I mentioned, data lineage, discovery, or even version management of that metadata, and obviously a data catalog.
Now, once you have it in place, it can also be used for other use cases. We’ve seen clients use the centralized metadata, together with the different stakeholders who collectively participate in the project, whether migrating from one system to another, migrating from one environment to another, from on-prem to cloud or within the cloud, or dismissing other tools, to answer those questions.
Let me share some examples in our demo environment that will illustrate what, in practice, clients did to use data lineage and data discovery when they started mapping out what they needed to do for a migration. I’m going to move over to our demo environment. What you see here is a collection of metadata from a variety of different tools that represent a virtual client.
You can see about 400 ETL processes, in this case DataStage, Informatica, SSIS, and stored procedures. For this virtual client, those 400 ETL processes ship data into about 2,500 database tables and views; the client has Oracle, SQL Server, and also Analysis Services. And we have 23 reports being generated in several reporting tools.
Let’s take one example, and through that example you can see what our clients did thousands of times with Octopai, automatically. Let’s decide to migrate a report called Customer Products. Why? Because it’s being generated in a certain system, and I now want to recreate it in a different reporting system that lets me do much more visualization and analysis within the reporting tool.
What I will do first is find out where the report exists. I will look in the reporting section for that specific report called Customer Products. I’m going to type in “Customer Products.” Once I find this report, since I want to recreate it in a different reporting system, what I want to understand is where the data lands on that report when it runs.
My question is: which of the 2,500 database tables and views, and which ETL processes, are relevant to landing the data on that specific report? If I knew that, then once I recreate the report in the new system, I could associate it with the right database tables and views that are now used to land the data on that report. This is where I’m going to use lineage, or rather reverse lineage.
Reverse lineage means that once I click that button, Octopai will analyze how the data landed in that report by figuring out and exposing the exact named database tables, views, and ETLs that are explicitly responsible for landing the data in that report. I’m going to click on the lineage and see what happens. Within only a few seconds, this is what you get. This report right here, called Customer Products, being generated in SSRS in this example, is actually based on a view. Here’s the view.
That view actually captures data from these exact named database tables and views, and here are the three ETL processes responsible for shipping the data from the data sources, landing it in these tables, which are then aggregated into that view, which is the basis for landing the data on that report. The ability to map that with a click of a button helps me understand that these are the exact tables I will probably need to associate with the report when it is recreated in the new reporting system, by the way, with additional capabilities.
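Conceptually, reverse lineage is an upstream walk over dependency edges. Here is a minimal sketch under invented names, not Octopai’s actual engine: starting from a report, collect every view, table, and ETL anywhere upstream of it.

```python
# Toy "reverse lineage": edges point from an object to the objects it
# depends on (report -> view -> tables -> ETLs). All names are invented.

upstream = {
    "rpt_customer_products": ["vw_customer_products"],
    "vw_customer_products":  ["tbl_customers", "tbl_products", "tbl_orders"],
    "tbl_customers":         ["etl_load_customers"],
    "tbl_products":          ["etl_load_products"],
    "tbl_orders":            ["etl_load_orders"],
}

def reverse_lineage(node, graph):
    """All upstream objects a node depends on, via an iterative depth-first walk."""
    found = set()
    stack = [node]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in found:
                found.add(dep)
                stack.append(dep)
    return found

print(sorted(reverse_lineage("rpt_customer_products", upstream)))
```

The result, one view, three tables, and three ETLs, mirrors the demo: it is exactly the set of objects that must be re-associated when the report is recreated in the new tool.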
Another interesting part: once I recreate that report in the new reporting system, I will probably have more capabilities to massage the data and present it differently. I might decide that I now want to ship more data to the new reporting system to enrich that report, which is currently limited by the old reporting system. To do that, I would probably decide to adjust or modify some of the ETL processes that are involved in landing the data in that report.
Before I do that, I want to understand the lineage moving forward. Meaning, if I were to consider changing something in this ETL, I would want to understand whether that impacts only this report or any other report that might be used in my organization, maybe through a different tool. So now I’m going to double-click on the ETL and run the lineage forward, meaning, “Show me all the directly and indirectly impacted tables, views, reports, dimensions, and schemas that might be affected if I were considering or planning any change here.” I’m going to run the lineage and see what happens.
I’m now traveling the other way around. If I were to consider making any changes here in order to enrich the report being recreated in the new reporting system, I need this mapping to account for all the other reports generated in different systems. Let’s pick the sales report, for example; I can see it’s being generated in Business Objects. I should have this mapping before making any change, to make sure I’m not going to hit any errors or problems as I plan to go live.
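Forward lineage, the impact analysis just described, is the same walk in the opposite direction: from an ETL, follow the edges downstream and list every report that could break. Again a hedged toy sketch with invented names, not the product’s code.

```python
# Toy impact analysis: before changing an ETL, walk the producer->consumer
# edges forward and list every directly or indirectly affected object,
# then filter for reports. All names here are hypothetical.
from collections import deque

downstream = {
    "etl_load_dwh":  ["tbl_customers", "tbl_sales"],
    "tbl_customers": ["rpt_customer_products"],
    "tbl_sales":     ["vw_sales", "rpt_customer_products"],
    "vw_sales":      ["rpt_sales"],   # e.g. a report in another tool
}

def impacted(node, graph):
    """Breadth-first walk of everything downstream of `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

hits = impacted("etl_load_dwh", downstream)
print(sorted(n for n in hits if n.startswith("rpt_")))  # reports at risk
```

Here a change to one ETL would touch two reports, possibly in two different reporting tools, which is exactly why the mapping is needed before any change goes live.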
Now, about this mapping right here: when we discuss an equivalent lineage of this complexity with our clients, they’ve shared with us that this one mapping alone takes about two to three weeks to build manually. The earlier mapping I showed you, from the report backwards, takes about two to four days. When you have 5,000 or 10,000 reports, that is a huge amount of effort. So why not map all the reports with their associated lineage in an automated way?
By the way, you can ask Octopai to produce an output document that maps and visualizes each and every report, so you can have this mapping with a click of a button. In other words, you can have as many lineages as you want documented by Octopai with a click of a button. Just to give a quick illustration: if you think about 400 ETL processes, or the other way around, 23 reports being fed by 2,500 database tables and views, just this collection of relationships can lead to hundreds of thousands, even millions, of data pipes. The question is, why would you want to do this manually when a click of a button can do it for you?
Let’s take the other use case. What if I want to migrate an ETL process to a new ETL tool? Let’s take another ETL, called Load Data Warehouse. What I want to do now is understand where this ETL process exists. I can see it’s running in SSIS. I want to understand what’s going on within this Load Data Warehouse. Can I keep it? Should I delete it? Is there any report that depends on running this ETL?
As before, I clicked the button called Lineage, and now I can see everything that depends on and is associated with that ETL. What’s also important is to understand not only the relationships between ETLs, databases, and reports, but also the column level of that ETL, by drilling into that specific ETL. By the way, you can do that in SSIS, stored procedures, DataStage, Informatica, or any other tool you may have.
Let’s drill into that tiny bubble. Clicking on the package view, I find a container. In the container, I can find four different maps, actually five. Each one of those maps is the gray line coming out of that Load Data Warehouse into a certain table. If you want to understand the lineage inside one of these boxes, this is the column-level lineage you would need to map in each of those containers, in each of those Load Data Warehouse ETLs. That can take weeks of work. The question is, why wouldn’t you want to use automation instead?
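Column-level lineage goes one layer deeper than the object-level maps above: within a single ETL mapping, it records which source columns feed each target column, including derived columns. A minimal, hypothetical sketch (invented column names, not a real DataStage or SSIS parse):

```python
# Toy column-level lineage for one ETL mapping: keys are the source
# columns, values are the target column they feed. Derived columns
# (e.g. a concat) simply have multiple sources. Names are invented.

mapping = {
    ("src.FirstName", "src.LastName"): "dwh.FullName",   # concatenation
    ("src.TaxRate", "src.NetAmount"):  "dwh.TaxAmount",  # rate applied to net
}

def sources_of(target, column_map):
    """All source columns that feed a given target column."""
    return [s for srcs, tgt in column_map.items() if tgt == target for s in srcs]

print(sources_of("dwh.FullName", mapping))
```

Multiply this by every container in every ETL and the weeks-of-manual-work estimate in the demo becomes easy to believe.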
The last thing I want to show you on this side is how you can leverage the same metadata that has been extracted from your BI landscape and present it not only as a relationship map, what you know as lineage, but also in a different form called discovery. Discovery is a way to present the same analyzed metadata in a Google style. The Google-style search enables you to find a specific field that you want to keep or drop, modify or mask, or associate with other fields that carry the same meaning.
In this case, you can look for a certain data element and map it throughout the entire landscape that exists, or whose metadata you have decided to extract. It is presented in a way that leverages the semantic language of each and every BI system you currently use. You can see ETL systems. You can see databases. You can see reporting tools from different vendors.
The ability to centralize metadata from different tools in one repository, and to create one single view that helps you map the metadata so you can decide which fields to migrate, and to verify this all the way down by drilling into the SQL script to find exactly where a field exists, gives you a lot of confidence about what exists in your landscape. You can find it with a click of a button.
The last thing I wanted to show here is how you again leverage the same metadata that has been extracted and presented in discovery or in a relationship map like lineage, and how that same metadata is captured and shrink-wrapped into something we call a business glossary. Let’s take an example. In this case, I want to migrate a report called Sales Report, but I also want to understand which fields and data elements exist in that report that I might leverage for other reports. Now, let’s run the lineage inside that report.
I’m going to open the report view. In this report, I can see the structure of the report, the inner structure, and I can see the presentation layer. This is what the business users see. There are two columns, Full Name and Tax Amount. I can also see that Full Name is a calculated column created by a concat of two physical data elements called FirstName and LastName. Let’s understand what exactly a full name means.
I’m going to go to the business glossary and look for a description of the full name. Why? Because I want to make that description available in the glossary of the new reporting system, so that the next analyst will understand which report exists, what the purpose of that report is, and all the relevant data elements they can leverage when creating additional reports.
I’m going to type “full name” here, and I find the specific data element. I’m going to click on it. On the right-hand side, you can see that it appears both in SSRS and in Power BI. If you look very carefully, we also have a different form, a full name written with a space, and that variant appears only in BO. Clicking on the full name description in BO, it turns out we have a great description of that specific full name with a space. We can see very clearly that this full name data element includes first name, last name, and also middle name.
A couple of things to understand. First of all, the full name I have is missing the middle name; I only have two parts. I need to be aware of that when I recreate the report in the new reporting system. The second thing to understand is that this data element originates in, or is captured from, SAP, and the owner is Jeff Smith. Also, this data element is automatically linked to two physical data elements, Full Name and Sales Report.
If I associate the full name with a space and the one without a space, and as you can see, that description cannot be found outside BO, I will be able to populate the full name description to all the relevant Full Name data elements that may be written differently but mean the same thing. The way to do this is by leveraging Octopai’s automatic association of similar names with other fields that do not have descriptions.
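The association of similarly named fields can be sketched as name normalization: treat “Full Name”, “FullName”, and “full_name” as the same glossary term so that one description populates all variants. This is a deliberately simple illustration with invented data, not the product’s matching logic, which would presumably handle far more cases.

```python
# Toy glossary propagation: normalize field names so spelling variants
# share one description. Glossary content echoes the demo; names invented.
import re

def canonical(name):
    """Lowercase and strip spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", name).lower()

glossary = {
    "full name": "First, middle, and last name concatenated. Owner: Jeff Smith.",
}

fields = ["FullName", "Full Name", "full_name", "TaxAmount"]
described = {
    f: next((d for term, d in glossary.items() if canonical(term) == canonical(f)), None)
    for f in fields
}
print(described)  # all three full-name variants get the one description
```

A field like TaxAmount, with no matching glossary term, simply stays undescribed, which is the gap an analyst would then fill.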
Going back to what I can share with you today: if I had to capture the key things most of our customers decided to share with us that are worth passing on, it would be the following. First of all, don’t migrate everything. In other words, don’t migrate things you don’t need. How do you know what those are? By automatically mapping the entire landscape you have: reports, ETLs, business processes.
Once you understand the full mapping, which may consist of millions of data pipes, only then do you have a true, accurate map of your landscape. That map becomes the baseline, the starting point, for deciding what not to migrate, which in most cases could be the most costly part of the project.
Second, you can compare different ETL systems to find out whether you have replications of certain data elements you don’t want to migrate. Or, the other way around, you might have duplicate reports with only minor changes between them; you don’t want to migrate 5,000 reports when maybe only 2,500 of them are unique.
How do you compare them? You can use Octopai, or any other tool that automates the analysis of the metadata, to show you the differences between reports, which may lead you to decide not to migrate all of them.
Third, simplify. Simplify the processes in order to reduce the interdependencies among various types of data. In some cases, you really don’t need to migrate the complexity to the new system, and that is a great opportunity to improve how the new system will look compared with what was developed in the old system over the past 5, 6, 8, or 10 years, in different layers of development by many different people, some of whom may no longer be in the organization, not to mention the lack of documentation. If there’s one thing I’m certain of, it’s that if I asked you how much of your BI landscape is actually documented, you would probably say only a few percent, if that.
Last, protect your critical data path. When you do migrations, aside from migrating everything you want to migrate, you want to focus on those critical areas that must be migrated well: zero problems, zero mistakes. Therefore you want to have the entire landscape mapped beforehand, so that once you migrate you can keep mapping what has been migrated, keep comparing what has been migrated with what you expected, and make sure that when you go live you don’t hit any bottlenecks or holes during the migration that could leave you with data leaks, data mismatches, or things like that.
Before I finish, I want to recommend that you listen to my colleague, Amichai. Amichai is our customer success manager and works with many of our clients. There’s a section in his video, I think about two minutes long, called “Migrating systems doesn’t have to be the end for us.” That is true feedback from some of the projects he has been involved in with our clients.
If you want to get in touch with us, either through info or through my email, feel free to send us emails asking questions about migrations or anything on your mind in approaching such projects. I can also recommend that you go to our website, where, under the Resources section, you will find white papers, blogs, and our most popular download, the eBook about migration, which has more information beyond what I shared with you today.
You can also learn about the different BI use cases related to migration, which in some cases are also under a very tight microscope as part of overall data governance policies. Migration is not just a technical project; it impacts the entire organization.
With that said, if you have any questions, feel free to ask now or send an email to [email protected] or to me. We’ll be more than happy to answer your questions or connect you with our technical team. Anything more we can share from our four years of experience automating metadata for these use cases and migrations, we will be more than happy to share. We’re going to wait maybe another minute to see if you have any questions in the Q&A.
I’ve got one of the questions here, from Alex. Thank you so much. The question is: does the forward lineage mapping presume that metadata has been previously captured from all possible downstream applications? The answer is yes. We do not generate metadata; we show the metadata in a relationship map, and we need the metadata that relates to that specific project.
The way it’s done is by capturing, or extracting, the metadata from the relevant systems that collectively are part of the environment you want to use for your new system. Once we analyze it, you are able to use Octopai for that. If there is any system whose metadata we have not captured, we will not be able to show it.
Here’s another question: what do you estimate is the average saving in terms of effort? Oh, that’s a good question.
Some clients have told us that during migration we saved them anywhere between 30% and 40% in time, or 20% to 30% in effort. Effort means that the same team could have finished the project in less time, or fewer people could have done the project, and more accurately.
I can share two projects we have been involved with. One was migrating DataStage ETL processes to Informatica; the DataStage processes existed within one of the acquired companies. We shortened the project from 1.4 years to about eight months, so that’s a saving of about six to eight months of work for about five people. You can imagine that if you do the multiplication, you get to hundreds of thousands of dollars.
The second part was not just the time frame. They had budgeted three test cycles before going live, and by using Octopai automation they cut one and a half cycles, which means they did not do three cycles. They did one full cycle before going live and another half cycle on the critical part, and that again saved them a huge amount of time, very critical time to market, and a lot of money.
The other project we were involved in where I can share numbers was moving away from OBIEE, shifting into Tableau about 13,000 reports out of what were initially thought to be 17,000. We helped that client understand two things. First, did they really have 17,000 reports? In practice, with full mapping from Octopai, they found only 13,000. Of those, they migrated only 7,000, given that they ran duplicate analysis with Octopai and understood the downstream ETL relationships to those specific reports. The project was planned for two years; it ended up taking one year and two months. We saved them about $6,000 in that project. This is a long answer to a short question.
There are no additional questions at this point. I would like to encourage you to contact us and ask questions, and if you can share some feedback, that’s great. If you would like more information about how you can use Octopai, we’ll be delighted to provide it. With that said, I want to thank you again for your time, your patience, and your interest. Should you be interested in automating your entire landscape, Octopai is here to serve you. Thank you so much for your time. Thank you for your patience. Thank you for participating in today’s webinar. Have a great day and stay safe. Thank you.