Enterprises today are focused on ensuring they have robust data management tools in place to enable them to find and understand their data. Some organizations know exactly what they need, while others can be overwhelmed or confused by all the different solutions out there.
Take “Data Dictionary”, “Business Glossary” and “Data Catalog”, for example. They might sound similar, but are they? Yes and no. They’re actually quite different in what they are and how they’re used. When differentiating between them, it is important to note that they are commonly silo- or project-based, which limits the value they provide across a multi-system business intelligence infrastructure.
Let’s review each and clarify their uses.
What is a Data Dictionary?
As the name suggests, Data Dictionaries provide information about your data. Descriptions can include data attributes, fields, or other properties such as data type, length, valid values, default values, relations with other data fields, business definition, transformation rules, business rules and constraints: anything you need to define each physical data element inside operational data sources and data warehouses. The same applies to logical BI data objects, and the descriptions should carry business meaning, not just technical detail.
A Data Dictionary should be a one-stop shop for IT system analysts, designers and developers to understand everything about their metadata. It is used to help translate data-level business requirements into technical requirements, and should ideally present this information in an easily understood, structured and organized way. IT teams should be able to tell within a few seconds exactly which inputs are needed to meet project goals, from attribute type to field requirements to default values.
Data Dictionaries are often presented in spreadsheet format, with rows and columns defining each attribute or metadata category that needs to be addressed in a system. Sometimes they are maintained by hand, with someone entering, annotating and refreshing the content manually. In other cases, a Data Dictionary reads the system catalog of a database and pulls specific objects into the dictionary. For a column this may include:
- Column Name
- Column Location
- Column Datatype
- Descriptive information that a user has entered and that is available in the System Catalog
Information within a Data Dictionary mainly helps BI developers. The Data Dictionary is essentially an inventory that shows which tables and columns exist.
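As a rough illustration of pulling column details out of a system catalog, here is a minimal sketch using Python’s built-in sqlite3 module. The `customer` table and the `build_data_dictionary` helper are made up for this example, and other databases expose their catalogs differently, so treat it as one possible shape rather than a reference implementation.

```python
import sqlite3


def build_data_dictionary(conn):
    """Return one entry per column, read from SQLite's own catalog."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, name, dtype, notnull, default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append({
                "column_name": name,
                "column_location": f"{table}.{name}",
                "column_datatype": dtype,
                "nullable": not notnull,
                "default_value": default,
                # SQLite keeps no column comments, so this part stays manual.
                "description": "",
            })
    return entries


# Hypothetical example table, created in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, status TEXT DEFAULT 'active')"
)
data_dictionary = build_data_dictionary(conn)
for entry in data_dictionary:
    print(entry)
```

Note that the result is still only an inventory: the `description` field is empty until someone fills it in, which is exactly the manual part described above.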
What is a Business Glossary?
Business Glossaries help define terminology across business units. They offer clear definitions across the entire enterprise with the goal of keeping terms consistent and helping everyone stay on the same page.
A quality Business Glossary is an important part of collaboration, particularly in larger businesses that span numerous departments. You’d be surprised at how differently each business unit defines the data elements relevant to its own operations, even in related departments (such as sales and marketing). Because self-service users define the logical meaning of data elements and can create their own calculated columns, there is a lot of room for inconsistency.
What is a Data Catalog?
Much like the BI team’s role in creating a single source of truth in the data, Data Catalogs provide a single source of truth about your data. While Business Glossaries help define terminology across business units and Data Dictionaries provide technical information about physical data assets, Data Catalogs are a one-stop shop for anyone shopping for data they would like to use, manage or understand.
The Data Catalog ties the business terms to the physical data assets and includes capabilities intended to make organizational data easy to locate, understand and use. Having good business definitions is great, but if they are not related to their underlying data, their value is greatly limited: when any user wants to locate data associated with a business term, they will need to start hunting for it with the BI team. Good technical documentation is great for the technical team, but it will only take your BI so far. The BI team will most often still need to spend the majority of its time mediating between the data users and the data. True independence in using the data to its fullest can only be achieved by fully democratizing the data to all data citizens, through documentation that bridges the gaps between technical, physical and semantic data assets and their related business terms and aspects.
Data-driven organizations have realized the necessity of empowering data users. Doing this starts with organization-wide, accessible documentation and collaboration around the data. These organizations recognize that an excellent Data Catalog is the most efficient way to unleash the power and hidden value of all the data they work so hard to create and maintain.
With a quality Data Catalog, any data consumer or creator should be able to easily locate any data asset and understand what the data means, how it can be used, what its limitations are, who is responsible for it and what else it is associated with. Last but not least, the catalog should enable in-context collaboration around the data.
Here’s a chart that clearly lays out the differences between a business glossary, a data dictionary and a data catalog:
A Data Catalog should be more. Much more.
Having an accurate understanding of what’s going on in your BI systems is a must. But you cannot have that understanding when your Data Catalog standardizes terminology only within single silo systems rather than across the whole landscape, and certainly not if the terminology isn’t tied directly to the data it represents.
Differences in Application
Clearly, every business needs both Data Dictionaries and Business Glossaries, but there’s still plenty of confusion out there about the application of each. What is clear, though, is that both require a lot of time and manual setup and are generally difficult to implement. Most importantly, combining capabilities from both into a collaborative Data Catalog is the only way to make the most of your data.
Data dictionaries provide IT frameworks
Since Data Dictionaries deal with the specifications of each database and system, they’re used more by IT teams. Data Dictionaries are used primarily by the designers and engineers who build/change the processes, and as such, they’re fairly technical. Most departments outside of IT won’t deal with Data Dictionaries too often.
Business glossaries offer more company-wide consistency
Business Glossaries are a bit more accessible. As Business Glossaries standardize definitions, they’re often used by just about everyone on the business side of the organization, especially business analysts. Unlike Data Dictionaries (which are more technical), Business Glossaries are more logic-based; their purpose is to clarify terminology and help each department attach one agreed definition to each term it uses.
Because Business Glossaries commonly live in siloed tools, the definition of terms is not always standardized, which leads to multiple truths. Ideally, business intelligence teams will keep both resources close at hand and integrate them into their decision-making. Business Glossaries provide the business language, and Data Dictionaries provide the technical details. Together, these aspects influence how communication flows across a company and how teams collaborate on each project.
The Major Difference Between a Data Dictionary, a Business Glossary, and a Data Catalog
The main issue with Data Dictionaries is that they typically only display a database’s physical structure, which isn’t usually enough information for a BI developer to understand what each column contains and certainly not information that could be of value to anyone outside IT.
This is a significant issue that both Data Dictionaries and Business Glossaries face. A Business Glossary contains business terms, but exactly which database columns relate to these business terms? A Data Dictionary contains database columns, but which business terms relate to these database columns?
The first way to reconcile the discrepancy between a Data Dictionary and a Business Glossary is manually. Many organizations have attempted this but the task can be extremely expensive and time-consuming, and the results may be prone to errors. This is typically performed by analyzing data values in the physical columns. Attempting to determine what each column translates to in business terms can be unreliable. Additionally, the budget required relative to the results produced is not feasible for many companies.
Data profiling is another approach implemented by many organizations. In this context, “data profiling” means automatically scanning and classifying the content of a column. This solution also poses some problems. For example, it may be discovered that each individual value in a column contains an “@” sign followed by a domain name. As we all know, this signifies an email address. But what type of email address is it exactly? Profiling cannot say, which limits our understanding of what the column actually represents and how it can be applied in a business context. So the only way to remedy this situation involves manual effort, which is costly and may result in many mistakes.
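The limitation is easy to see in a small sketch. The Python below is an illustrative profiler, not any particular product’s method: the `profile_column` helper, the deliberately loose email pattern and the 95% threshold are all assumptions made for this example.

```python
import re

# Deliberately loose pattern: something, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_column(values, threshold=0.95):
    """Label a column 'email' when nearly all non-empty values look like one."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    matches = sum(1 for v in non_empty if EMAIL_PATTERN.match(v))
    return "email" if matches / len(non_empty) >= threshold else "unknown"


# Two hypothetical columns with very different business meanings.
billing_contacts = ["ap@acme.example", "finance@globex.example"]
support_contacts = ["help@acme.example", "it-desk@globex.example"]

print(profile_column(billing_contacts))
print(profile_column(support_contacts))
```

Both columns receive the same label, even though one holds billing contacts and the other holds support contacts. The profiler recognizes the format of the values but not their business meaning.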
Instead of wasting time trying to connect the dots, this is where the Data Catalog comes in, as it helps fix the discrepancy between Data Dictionary and Business Glossary. It does so by attaching meaningful business information to data assets, which helps explain where the data originates and the type of data that is associated with each business term (and vice versa).
Octopai’s Data Catalog is automated. This means that instead of having to manually create each data asset from scratch, whether it be a column in a report, a metric in an analytical model, a process or a table in the database, all assets from the entire BI landscape are harvested automatically, centralized and organized in a searchable, collaborative module.
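To make the two-way link between business terms and physical columns concrete, here is a minimal Python sketch. The `MiniCatalog` class, the term names and the column paths are all hypothetical, and a real catalog stores far more (owners, lineage, descriptions), so this shows only the core idea of a lookup that works in both directions.

```python
from collections import defaultdict


class MiniCatalog:
    """Toy catalog: links business terms to physical columns, both ways."""

    def __init__(self):
        self._columns_by_term = defaultdict(set)
        self._terms_by_column = defaultdict(set)

    def link(self, term, column):
        # Record the relationship in both indexes so either side can be queried.
        self._columns_by_term[term].add(column)
        self._terms_by_column[column].add(term)

    def columns_for(self, term):
        return sorted(self._columns_by_term.get(term, ()))

    def terms_for(self, column):
        return sorted(self._terms_by_column.get(column, ()))


catalog = MiniCatalog()
catalog.link("Customer Email", "crm.customer.email")
catalog.link("Customer Email", "dwh.dim_customer.email_addr")
catalog.link("Billing Contact", "dwh.dim_customer.email_addr")

print(catalog.columns_for("Customer Email"))
print(catalog.terms_for("dwh.dim_customer.email_addr"))
```

A business user can start from a term and find the columns behind it, while a developer can start from a column and find the terms it supports.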
So how do you boost your data governance efforts?
How metadata management automation can help organizations implement a Data Catalog
For organizations working on implementing a Data Catalog, having a full view of metadata across the entire BI infrastructure is critical. With metadata management automation, all metadata from each individual silo tool throughout the BI landscape is centralized in one place and easily accessible.
Likewise, organizations looking at implementing a Data Catalog would be able to get a full description and full path for any searched element if they leveraged metadata management automation. Octopai, for example, takes all the descriptions from database objects and reporting tools (from the semantic/logic layers) and correlates logical columns to physical columns, drastically improving accuracy and streamlining the implementation of a Data Catalog.
Happy Boss, Happy BI Team: Automate Your Data Catalog
Building a Data Catalog can be an incredibly time-consuming and costly project – especially since it demands lots of manual data entry by multiple teams. This kind of project can cause your BI team to burn out and keep them from focusing on the tasks they were hired to do. We don’t want that now, do we? Actually, since building a Data Catalog demands so much costly manpower and dedicated time, most enterprises opt to keep putting it off, despite how critical it is to the organization.
Using automation to build a Data Catalog nips both those problems in the bud as the majority of the manual work is done, you guessed it, automatically. Seriously – your data assets are automatically harvested and populated into your Data Catalog, which is generated on the spot with your own BI metadata and all documentation you already have in place.
Using automation to generate your Data Catalog provides other benefits as well. If Joe from IT adds a data asset in the reporting system, you don’t have to worry about him updating you about it while building the Data Catalog. It will update automatically. In addition, any process done manually carries a pretty high chance of errors. Automation removes the data-entry mistakes that come with manual work.
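One way to picture that automatic update is as a comparison between what a harvester finds and what the catalog already holds. The Python sketch below is purely illustrative and is not how any specific product works: the `sync_catalog` helper, the asset names and the shape of the harvested data are assumptions made for this example.

```python
def sync_catalog(catalog_entries, harvested):
    """Align catalog_entries with harvested assets; keep existing descriptions.

    catalog_entries: dict of asset name -> {"type": ..., "description": ...}
    harvested:       dict of asset name -> asset type, as found in the BI tools
    """
    added = sorted(set(harvested) - set(catalog_entries))
    removed = sorted(set(catalog_entries) - set(harvested))
    for asset in added:
        # New assets arrive with an empty description for someone to fill in.
        catalog_entries[asset] = {"type": harvested[asset], "description": ""}
    for asset in removed:
        del catalog_entries[asset]
    return added, removed


# Hypothetical state: one documented report, then Joe adds a second one.
catalog_entries = {
    "reports.sales_summary": {
        "type": "report",
        "description": "Monthly sales by region",
    },
}
harvested = {
    "reports.sales_summary": "report",
    "reports.churn_weekly": "report",
}

added, removed = sync_catalog(catalog_entries, harvested)
print("added:", added)
print("removed:", removed)
```

The new report appears in the catalog without anyone having to report it, and the description already written for the existing report is left untouched.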
First understand what it is that you have, and then get organized.
Many companies choose to work with Octopai as they embark on a Data Catalog, Business Glossary or Data Dictionary project as they know just how cumbersome such a project can be, and they understand the added value of metadata management automation – specifically when it comes to cutting down the set-up time, reducing the amount of manual tracing and boosting overall project accuracy.
Moreover, it is important to realize that there’s no real use in getting organized before you are able to see everything you have. Organizations use Octopai to discover and understand all their metadata, and then they move to the next step of getting organized with Data Dictionaries, Business Glossaries, Data Catalogs etc.
Updated August 2021