Centralized metadata management is an approach in which metadata from every source is copied to, and processed in, a central metadata repository. Every use of that metadata, whether a query or an application, runs through that repository.
Benefits of centralized metadata management
The primary benefit of the centralized approach to big data metadata management is the establishment of a single source of truth, which helps avoid cross-department misunderstandings caused by differing taxonomies or definitions. The usefulness of, and trust in, the enterprise’s data increase correspondingly. Similarly, from a data governance perspective, it is much easier to implement global policies, standards and rules when dealing with one repository.
Because the central metadata repository copies and processes the metadata as it comes in, it can add further metadata that improves understanding and supports efficient use of the data.
The information in the central repository is also independent of the original source systems, so it can be accessed without relying on those systems.
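The copy, enrich and query cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the record fields and the example sources ("crm", "billing") are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetadataRecord:
    """One metadata entry copied from a source system (hypothetical shape)."""
    source: str
    name: str
    attributes: dict
    enrichments: dict = field(default_factory=dict)
    copied_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CentralMetadataRepository:
    """Holds copies of source metadata; all queries go through this object."""

    def __init__(self):
        self._records = {}

    def ingest(self, source, name, attributes):
        # Copy the attributes so later changes in the source do not leak in.
        record = MetadataRecord(source, name, dict(attributes))
        self._records[(source, name)] = record
        return record

    def enrich(self, source, name, **extra):
        # Additions live only in the central copy, never in the source system.
        self._records[(source, name)].enrichments.update(extra)

    def query(self, name):
        # Served entirely from the central copy; no source system is contacted.
        return [r for r in self._records.values() if r.name == name]


repo = CentralMetadataRepository()
crm_metadata = {"type": "string", "nullable": False}
repo.ingest("crm", "customer_id", crm_metadata)
repo.ingest("billing", "customer_id", {"type": "int"})
repo.enrich("crm", "customer_id", owner="sales-ops", glossary_term="Customer ID")

crm_metadata["type"] = "uuid"  # a later change in the source system
results = repo.query("customer_id")
```

In this sketch the query still reports the "crm" type as "string" after the source changes to "uuid". That shows both the independence from the source systems and the reason a central copy has to be kept synchronized.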
Disadvantages of centralized metadata management
The most obvious disadvantage of the centralized approach is that there is a single point of failure. If the central metadata repository has an issue, all enterprise metadata management and use are affected.
Additionally, because the information in the central repository is a copy of the original source data, it may not reflect up-to-the-minute changes in the source systems. Keeping the central repository synchronized with the most current information in the source systems requires significant resources, but the alternative is an outdated repository and steadily decreasing trust in the metadata.
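One simple way to see which central copies have fallen behind is to compare the time each copy was last synchronized with the time the source last changed. The sketch below assumes both timestamps are available, which will not be true of every source system; the function name and the entry names are hypothetical.

```python
from datetime import datetime, timezone


def find_stale_entries(synced_at, source_changed_at):
    """Return names whose source changed after the central copy was last synced.

    Both arguments map an entry name to a timezone-aware datetime.
    """
    stale = []
    for name, synced in synced_at.items():
        changed = source_changed_at.get(name)
        if changed is not None and changed > synced:
            stale.append(name)
    return sorted(stale)


synced_at = {
    "orders.total": datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
    "orders.status": datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
}
source_changed_at = {
    "orders.total": datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc),
    "orders.status": datetime(2023, 12, 31, 17, 0, tzinfo=timezone.utc),
}
stale = find_stale_entries(synced_at, source_changed_at)
```

Here only "orders.total" is reported, because its source changed after the last synchronization. A real synchronization job would then re-copy the stale entries, and that re-copying is where the resource cost arises.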
Alternatives to centralized metadata management
The two alternatives to centralized metadata management are distributed metadata management and hybrid (or “federated”) metadata management.
In the distributed approach, each source system processes and stores its own metadata in its own repository. There is no central metadata repository; instead, there is a “central metadata engine,” which accepts user queries and retrieves information from the source repositories in real time.
Distributed metadata management has the advantage of constant access to the most current information without the need for synchronization. Its disadvantage is that it cannot add metadata or enhance understanding beyond what is contained in the source systems.
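A minimal sketch of such an engine follows, assuming each source repository can be reached through a simple lookup function. Plain dictionaries stand in for the source repositories here, and the class and source names are hypothetical.

```python
class DistributedMetadataEngine:
    """Routes each query to the source repositories; stores nothing itself."""

    def __init__(self, sources):
        # sources: source name -> callable(name) returning a dict or None
        self._sources = sources

    def query(self, name):
        results = {}
        for source_name, lookup in self._sources.items():
            found = lookup(name)  # live call to the source system
            if found is not None:
                results[source_name] = found
        return results


# Stand-ins for two source system repositories.
crm = {"customer_id": {"type": "string"}}
billing = {"customer_id": {"type": "int"}, "invoice_id": {"type": "int"}}

engine = DistributedMetadataEngine({"crm": crm.get, "billing": billing.get})
before = engine.query("customer_id")
crm["customer_id"] = {"type": "uuid"}  # change made in the source system
after = engine.query("customer_id")
```

Because nothing is cached, the second query reflects the source change immediately. The engine also has nowhere to keep enhancements, which matches the limitation described above.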
In the federated, or hybrid, approach, the “central metadata engine” that accesses information from source system repositories in real time is combined with the “central metadata repository,” which stores copies of source system metadata and can enhance it with additional information.
Hybrid metadata management provides most enterprise use cases with the best of both worlds: real-time metadata access, ensuring that all information is current and reliable, along with storage of a baseline copy in the central repository (in the event that a source system experiences downtime) and the ability to add useful enhancements.
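The combination can be sketched as a service that tries the source system first, refreshes its baseline copy on success, and falls back to that copy when the source is unreachable. Everything named here (HybridMetadataService, FlakySource, the "_served_from" marker) is a hypothetical illustration, and a ConnectionError is assumed to be how downtime shows up.

```python
class HybridMetadataService:
    """Live lookups with a stored baseline copy and central enhancements."""

    def __init__(self, sources):
        self._sources = sources    # source name -> callable(name) -> dict
        self._baseline = {}        # (source, name) -> last known copy
        self._enhancements = {}    # (source, name) -> added metadata

    def enhance(self, source, name, **extra):
        self._enhancements.setdefault((source, name), {}).update(extra)

    def query(self, source, name):
        key = (source, name)
        try:
            metadata = dict(self._sources[source](name))  # real-time access
            self._baseline[key] = metadata                # refresh baseline
            origin = "live"
        except ConnectionError:
            if key not in self._baseline:
                raise  # no baseline copy to fall back on
            metadata = self._baseline[key]
            origin = "baseline"
        return {**metadata, **self._enhancements.get(key, {}), "_served_from": origin}


class FlakySource:
    """Stand-in for a source system that can be taken offline."""

    def __init__(self, data):
        self.data = data
        self.online = True

    def __call__(self, name):
        if not self.online:
            raise ConnectionError("source system is down")
        return self.data[name]


crm = FlakySource({"customer_id": {"type": "string"}})
service = HybridMetadataService({"crm": crm})
service.enhance("crm", "customer_id", owner="sales-ops")

first = service.query("crm", "customer_id")   # answered by the source system
crm.online = False
second = service.query("crm", "customer_id")  # answered from the baseline copy
```

The "_served_from" field is included only so a caller can tell whether an answer is current or a fallback; a real system might expose that differently.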