
Data Mesh Strategy: How to Plan for Data Mesh Implementation Success

Fishing nets.
Chain mail.
Basketball shorts.

What do they have in common?


From mesh to data mesh

The term mesh is defined as an interlaced structure. Before the computing era, mesh usually referred to material made of a network of interlocking pieces of metal or fabric. From nets for fishing or keeping mosquitoes at bay to chain mail armor and chain link fences, mesh’s big benefit was its combination of burden distribution and flexibility.

When there is a burden to be borne or pressure to be withstood, mesh’s great advantage is that it distributes that burden over all the linked entities that make it up. No one entity has to support all the others; they all bear part of the burden and support each other.

If you’re fishing with a pole and string, the single string needs to be able to support the weight of the entire fish – and all the pressure put on it as the fish thrashes around, trying to get away. If you’re using a net, on the other hand, no single thread making up the net has to support the entire fish’s weight; all the threads in the network (ah, yes – this is what “network” used to mean…) combine their strength to bear much more strain than any one of them on their own. 

Mesh’s composition of many interlocking pieces also gives it flexibility. Try shaping a wooden fence into the rolls in which chain link fences are sold. And just imagine if flexible mosquito netting were a solid “mosquito tent.” For one, draping it over your bed would be much tougher. Plus, its benefits in malaria prevention would probably be outweighed by its tendency to cause suffocation and heatstroke.

As mesh has moved down the timeline into the digital era, the term has been used to describe mesh networks and mesh computing. Mesh networks consist of multiple connected nodes that all take part in routing information, giving the network multiple possible routes for information to travel. Distributing the computing and routing burden, plus the flexibility in how information can be routed, makes for a more resilient network.
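
To make that resilience concrete, here is a minimal Python sketch. It assumes a hypothetical five-node topology (`MESH`) and uses plain breadth-first search as a stand-in for a real routing protocol; the point is only that when one node fails, traffic can still reach its destination over another path.

```python
from collections import deque

# Hypothetical five-node mesh: each node links to several neighbors,
# so there is usually more than one path between any two nodes.
MESH = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"B", "E"},
    "E": {"C", "D"},
}


def shortest_route(graph, start, goal, failed=frozenset()):
    """Breadth-first search for a shortest route that avoids failed nodes.

    Returns the route as a list of node names, or None if no route exists.
    """
    if start in failed or goal in failed:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(shortest_route(MESH, "A", "E"))                # ['A', 'C', 'E']
    print(shortest_route(MESH, "A", "E", failed={"C"}))  # ['A', 'B', 'D', 'E']
```

Real mesh protocols have their own route discovery and metrics; the search here only illustrates that losing a node reroutes traffic instead of cutting it off.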

The latest appearance of the term “mesh” is in the concept of data mesh, coined by Zhamak Dehghani in her landmark 2019 article, “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.”

How is data mesh a mesh? 

Data mesh is an approach to data architecture that is intentionally distributed, with intentionally decentralized ownership and governance. Data is owned and governed by domain-specific teams, who treat the data as a product to be consumed by other domain-specific teams. Unified standards and tools for governance, discoverability and access enable a data mesh to function smoothly as an ecosystem. 

In data mesh architecture, the data engineering burden is distributed among domain-specific data engineering teams. They produce a final product – clean, usable, domain-specific data – ready to be consumed by any other business domain team in the enterprise. 

This burden distribution removes the central bottleneck, cuts long waits on IT requests, and reduces the misunderstandings that arise when the data engineering team is disconnected from the business domain it is asked to serve.

Sounds so simple, no?

Well, no. 

Implementing a data mesh strategy successfully has several prerequisites – some of them technical and practical, others more ideological.

Let’s take a look at some must-have components of a data mesh strategy.

Embrace decentralization

For years, centralization was the direction of data management. If we could just get all our data into one place – a warehouse, a lake, a lakehouse, whatever – we’d be good, said the theory.

True, centralization was usually better than fragmented data silos. But the problem with centralization became apparent as the amount of data managed by the average enterprise got bigger, and bigger, and BIGGER. Centralized monolithic systems distanced those who understood the technical side of data from those who understood the contextual side of data. These big data monoliths introduced or exacerbated the need for lengthy data cleansing and processing in order to get usable data. 

The key to moving ahead is to take a step back. Not a step back to fragmented data silos, but a step back in the direction of decentralization. Intentional decentralization. One of the core data mesh principles is decentralized operations with centralized standards, otherwise known as a federated governance model. 

This paradigm shift needs to be at the heart of any data mesh strategy implementation. Centralization works well for small systems of limited complexity. Once you reach enterprise scale, embracing decentralization (and federation) is the key to success.
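
The “decentralized operations, centralized standards” split can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `DataProduct` fields and the three policy rules are hypothetical. Domain teams create and own their products; one shared, centrally defined policy set is applied uniformly to all of them.

```python
from dataclasses import dataclass


# Hypothetical descriptor that a domain team publishes for each data product.
@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str        # accountable person or team inside the domain
    description: str


# Centralized standards: rules every domain must satisfy.
# These specific rules are illustrative assumptions, not a prescribed list.
def has_owner(product):
    return bool(product.owner.strip())


def has_description(product):
    return len(product.description.strip()) >= 20


def name_is_namespaced(product):
    return product.name.startswith(f"{product.domain}.")


GLOBAL_POLICIES = {
    "has_owner": has_owner,
    "has_description": has_description,
    "name_is_namespaced": name_is_namespaced,
}


def check_product(product, policies=GLOBAL_POLICIES):
    """Return the names of the central policies this product violates."""
    return [name for name, rule in policies.items() if not rule(product)]


if __name__ == "__main__":
    orders = DataProduct("sales.orders", "sales", "sales-data-team",
                         "Daily order lines from the web shop, one row per item.")
    tickets = DataProduct("tickets", "support", "", "Tickets")
    print(check_product(orders))   # []
    print(check_product(tickets))  # ['has_owner', 'has_description', 'name_is_namespaced']
```

Each domain decides what its products contain and when they change; only the policy set is shared, which is what keeps the decentralized pieces interoperable.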

Domain-driven leads to dominion

The data domain is the organizing principle of domain-driven design and of data mesh architecture.

Any large organization is divided up into different functional domains, e.g. sales, customer support, business administration. While these different domains will often have concepts in common (e.g. a “customer” exists in the context of both sales and customer support; “revenue” is a concept used by both sales and business administration), the exact definitions and uses are going to differ based on the domain context. 

Trying to come up with one model that will unify, explain and direct a large, complex system is doomed to failure. A much more effective approach is to divide the system into recognizable, functional domains, come up with unified intra-domain models, and explicitly identify the interrelationships between the different domains and their models.   
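
In domain-driven design terms, each domain is a bounded context with its own model, and the relationships between contexts are made explicit. The Python sketch below applies this to the “customer” example above; the class names, fields and the revenue threshold used in the mapping are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical per-domain "customer" models: each domain keeps only the fields
# that matter in its own context instead of sharing one enterprise-wide class.
@dataclass(frozen=True)
class SalesCustomer:
    customer_id: str
    account_manager: str
    lifetime_revenue: Decimal


@dataclass(frozen=True)
class SupportCustomer:
    customer_id: str
    support_tier: str
    open_tickets: int


def to_support_customer(sales_customer: SalesCustomer, open_tickets: int = 0) -> SupportCustomer:
    """Explicit translation from the sales model to the support model.

    The revenue threshold for "priority" support is an illustrative assumption.
    """
    tier = "priority" if sales_customer.lifetime_revenue >= Decimal("100000") else "standard"
    return SupportCustomer(
        customer_id=sales_customer.customer_id,
        support_tier=tier,
        open_tickets=open_tickets,
    )


if __name__ == "__main__":
    acme = SalesCustomer("C-001", "Dana", Decimal("250000"))
    print(to_support_customer(acme, open_tickets=2))
    # SupportCustomer(customer_id='C-001', support_tier='priority', open_tickets=2)
```

The shared concept (the customer ID) links the two models, while the translation function is the one place where their interrelationship is spelled out.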

The human body with its specialized systems and cells is an amazing example of a domain-driven ecosystem. The circulatory system and the gastrointestinal system are both parts of the large enterprise we call the human body. They have concepts in common (e.g. blood), but what blood means and does in the gastrointestinal system differs in many functional ways from what blood means and does in the circulatory system. 

Additionally, making the business domain the basic unit of data ownership, management and development opens the door to owning, managing and developing that data more effectively. The data owners, data engineers and software developers whose work focuses on a specific data domain understand its context more deeply and feel a greater sense of responsibility for its data.

The starting point of a data mesh strategy is appreciating your organization as made up of different domains defined by their business function – and spreading that perspective among your teams.

Data as a product to be delivered

What’s the difference between data as an asset and data as a product?

If you’re in charge of an asset, your job is to manage it: make sure it stays in good shape and is available should anyone want to use it. 

If you’re in charge of a product, your job is to deliver it: make sure it’s in good shape and get it into the hands of the people who want to use it. 

If you’re in charge of products, and at the end of the sales quarter they’re all still in perfect shape, sitting on warehouse shelves, then something went wrong.

In the data mesh approach, the perspective of any data domain’s cross-functional team (data owner, data engineer, software developer, etc.) is that they are a team in charge of a data product. Their job is to make sure that their domain data is in good shape and get it into the hands of people across the organization who want to use it.

Suddenly, work is about customer expectations and customer support. It’s about making the resources you have practically – not just theoretically – available for use. 

Adopting this view of data is key to reaping one of the major benefits of the data mesh approach: availability of high-quality, accessible, self-serve data at scale. 
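
Treating data as a product usually means publishing a contract that consumers can rely on without filing a ticket. Here is a minimal Python sketch, assuming a hypothetical orders product whose contract is just a column list and a 24-hour freshness guarantee: the owning team is responsible for staying within those bounds, and consumers read the data directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class OrdersDataProduct:
    """Hypothetical sales-domain data product with a simple published contract."""
    rows: list                 # the data itself, as a list of dicts
    last_refreshed: datetime   # when the owning domain team last updated it
    schema: tuple = ("order_id", "customer_id", "amount")
    max_staleness: timedelta = timedelta(hours=24)

    def is_fresh(self, now=None):
        now = now or datetime.now(timezone.utc)
        return now - self.last_refreshed <= self.max_staleness

    def read(self, now=None):
        """Self-serve read: consumers get contract-conforming rows or a clear error."""
        if not self.is_fresh(now):
            raise RuntimeError("data product is stale; contact the owning domain team")
        for row in self.rows:
            if set(row) != set(self.schema):
                raise ValueError(f"row does not match published schema: {row}")
        return [dict(row) for row in self.rows]  # copies, so consumers can't mutate the product


if __name__ == "__main__":
    product = OrdersDataProduct(
        rows=[{"order_id": 1, "customer_id": "C-001", "amount": 99.5}],
        last_refreshed=datetime.now(timezone.utc) - timedelta(hours=3),
    )
    print(product.read())  # [{'order_id': 1, 'customer_id': 'C-001', 'amount': 99.5}]
```

A product that sits stale or breaks its schema fails loudly at read time, which mirrors the “products left on the shelf” problem: the contract makes unusable data visible to the team that owns it.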

And speaking of scale…

Scale the pattern, not the system

What must you do to keep on top of your data environment (or, in fact, anything) as it gets bigger and more complex?

If you said, “Scale!”, you’re right – and you also might be wrong.

It depends on what you intend to scale.

Let’s say you fish for a living. You’re used to catching about 10 fish a day with your current net. Then you find a hidden cove that’s teeming with fish, and you realize that if you had a bigger net, you could catch more fish!

So you create a larger-scale version of your fishing net. It looks just like your old one, but 10 times as big. Do you catch 100 fish with your new net?

Unfortunately, no. Because you scaled the entire fishing net, the holes are also 10 times as big. So all the fish swim through, and you end up with zero fish.

What you needed to do was to scale the pattern. Keep the holes the same size, but add 10 times more of them.

When you scale a system, it can grow too big and break, or become a bottleneck.

When you scale a pattern, each piece still does what it’s designed to do, and they all help each other out. 

This appreciation of the difference between scaling the system and scaling the pattern is a key prerequisite to data mesh strategy success. Data mesh is about scaling the pattern. Every data product is a self-contained, self-maintained system. Dealing with more data, or more consumers, is a matter of replicating and connecting another pattern piece: another data product, another API, another data subscription. 

Sometimes the pattern can be scaled on different levels. In a data mesh process flow, each data product has its own data catalog. The enterprise data catalog is the same “data catalog pattern” taken to the level of the entire data mesh, serving as the coordinated access point to all of the product-level data catalogs.
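
To illustrate scaling the pattern at two levels, here is a minimal Python sketch in which each data product keeps its own catalog and an enterprise catalog reuses the same search interface across all of them. The class and method names are hypothetical, and real catalogs track far more metadata (lineage, ownership, quality).

```python
from dataclasses import dataclass, field


@dataclass
class ProductCatalog:
    """Catalog for a single data product: dataset name -> list of column names."""
    product: str
    datasets: dict = field(default_factory=dict)

    def search(self, term):
        term = term.lower()
        return [
            f"{self.product}/{name}"
            for name, columns in self.datasets.items()
            if term in name.lower() or any(term in col.lower() for col in columns)
        ]


@dataclass
class EnterpriseCatalog:
    """The same catalog pattern one level up: one access point over product catalogs."""
    products: list = field(default_factory=list)

    def register(self, catalog):
        # Scaling the pattern: a new data product adds one more small catalog
        # instead of growing a single central one.
        self.products.append(catalog)

    def search(self, term):
        return [hit for catalog in self.products for hit in catalog.search(term)]


if __name__ == "__main__":
    enterprise = EnterpriseCatalog()
    enterprise.register(ProductCatalog("sales.orders", {"orders": ["order_id", "customer_id", "amount"]}))
    enterprise.register(ProductCatalog("support.tickets", {"tickets": ["ticket_id", "customer_id", "status"]}))
    print(enterprise.search("customer_id"))  # ['sales.orders/orders', 'support.tickets/tickets']
```

Adding data or consumers means registering another product catalog; no single piece has to grow, which is the fishing-net lesson applied to metadata.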

Can you make it mesh?

Implementing a data mesh strategy for data management is not a simple job. The most critical (and most challenging!) pieces are the paradigm shifts: 

  • Taking a view of your organization as made up of different domains defined by their business function
  • Viewing data as a product that domain-specific cross-functional teams have the responsibility to make usable and available to other teams in your enterprise 
  • Intentionally decentralizing data operations

Of course, data mesh technology and technical implementation take time and have their own challenges. But if you manage to get your organization to think like a data mesh, you’re well on the way to data mesh success.

Is your organization Octopied?

With effortless onboarding and no implementation costs, Octopai’s data intelligence platform gives you unprecedented visibility and trust into the most complex data environments.