From data warehouse to data mesh: actionable data is still key

A data mesh flips the script on centralization and monolithic data architecture by decentralizing data management to the different business domains in the enterprise.

Using data is difficult, as the annual Big Data and AI Executive Survey and similar studies clearly illustrate. Companies know this and have spent the past three decades trying to make it easier, eagerly drawn to the next “platform du jour” that promises greater data access and analytic insights. First came enterprise data warehouses (EDWs), then the various warehouses and lakes designed for the cloud, and now data mesh is all the rage.

Time and again, these approaches have spawned different schools of thought, each with its own vocal industry leaders and advocates, about how companies should manage and run their data. Separating hype from substance can be exhausting. And we can expect only more solutions to emerge until organizations address a fundamental and overlooked challenge within their data stacks: data usability.

The centralized mindset

The traditional EDW introduced the idea of integrating structured data in one place, where it would be easy to access for business intelligence (BI) reporting. The data would be highly managed, meaning organizations would populate their EDWs only with data deemed necessary for specific BI reports. While this helped save resources and costs, it also meant cutting out other valuable, related data that could provide deeper, more useful insights.

To collect and leverage even more of their data, companies moved the EDW concept to the cloud. For the most part, these companies saw and wanted to emulate the successes of the world’s digital-native FAANGs, which gain competitive advantage by using rich data in the cloud to guide business decisions and hyper-personalize products and services for customers. But cloud-based EDWs were still limited to structured data, disregarding the vast wealth of unstructured data in the modern business. Most organizations therefore ended up just replicating their on-premise BI reports instead of achieving something transformational.

In 2010, data lakes emerged as a promising solution: organizations would consolidate all raw, unstructured, semi-structured, and structured data into one central location, available for use in analytics, predictive modeling, machine learning, and more. However, data lakes were soon likened to “data swamps” because poor design, governance and management often turned them into expensive dumping grounds for all data. The data would be far from usable, creating mistrust in its quality and in the resulting insights or solutions.

Anyone who has experienced limitations in BI reporting or a data swamp will not be surprised to learn that a TDWI research study of 244 companies using a cloud data warehouse or lake found that 76 percent experienced most, if not all, of the same challenges as in their on-premise environments.


Decentralize with a data mesh

Originally proposed by Zhamak Dehghani of ThoughtWorks, the data mesh flips the script on centralization and monolithic data architecture by decentralizing data management to the different business domains in the enterprise. The goal of a data mesh is for each business domain to treat its data as a product that it can transform, use, and make available to users in other domains.

The thinking is that your business domain experts know best whether the data is up to date, accurate and reliable, and are better placed to deliver the right data at the right time. In a fully centralized approach, they would rely on data teams, which are often resource-limited and have to juggle numerous competing requests from other business units, causing delays. With data mesh, there is no longer a need to query data from a massive data lake, so users can act on the data closer to where it originates, accelerating the time to insights and value. The mesh is held together by federated computational governance: core standards, rules and regulations that apply across the entire organization to ensure interoperability between the domain units and their data products.
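To make federated computational governance a little more concrete, here is a minimal Python sketch of one way shared rules could be checked automatically against a domain's data product. Everything in it is an assumption made for illustration: the `DataProduct` class, the `REQUIRED_METADATA` fields, and the lowercase-column rule are hypothetical and not part of any data mesh standard.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide standards; the field names are illustrative only.
REQUIRED_METADATA = {"owner", "domain", "description", "refresh_schedule"}


@dataclass
class DataProduct:
    name: str
    metadata: dict
    schema: dict = field(default_factory=dict)  # column name -> type name


def governance_violations(product: DataProduct) -> list[str]:
    """Return the shared rules this data product breaks (empty list = compliant)."""
    problems = []
    # Rule 1 (assumed): every product must carry the required metadata fields.
    for key in sorted(REQUIRED_METADATA - set(product.metadata)):
        problems.append(f"missing metadata: {key}")
    # Rule 2 (assumed): every product must publish a schema.
    if not product.schema:
        problems.append("no published schema")
    # Rule 3 (assumed): column names are lowercase so domains can join on them.
    for column in product.schema:
        if column != column.lower():
            problems.append(f"column not lowercase: {column}")
    return problems


orders = DataProduct(
    name="sales.orders",
    metadata={"owner": "sales-team", "domain": "sales"},
    schema={"order_id": "string", "CustomerID": "string"},
)
for problem in governance_violations(orders):
    print(problem)
```

The rules live in one shared place, while each domain remains free to build and own its product; a check like this could run whenever a domain publishes or updates a data product.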

It is important to note that data mesh is not a single turnkey solution, but rather an organizational approach that can use multiple technologies and may even include a data lake. Since the approach is radically different from what organizations are used to, change management is required, including getting buy-in from domain experts who are used to consuming reports rather than doing the data engineering and data science work themselves. To make this decentralized model a success, data upskilling within the domain units will therefore be necessary.


Data usability is still a prevalent issue

While data mesh sounds fundamentally different from the cloud data warehouses and lakes that have long dominated the industry, all of these approaches present similar challenges that underscore the need for data usability.

The fundamental problem is that raw data is useless. You have huge amounts of data full of errors, duplicate information, inconsistencies and different formats, all floating around in isolation in different systems. With cloud data warehouses and lakes, these bits are usually just moved from their on-premises environments to the cloud with their existing issues, warts and all. The data is still isolated and siloed, except it’s all in one place now. This is why people end up experiencing the same on-premise challenges in the cloud. These floating bits eventually need to be ingested, integrated, and enriched to become usable.
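As a small illustration of what making raw data usable involves, the Python sketch below normalizes inconsistent formats and then removes the duplicates that only become visible after normalization. The sample rows, the field names, and the two date formats are made up for this example.

```python
from datetime import datetime

# Made-up raw records showing typical issues: inconsistent casing, stray
# whitespace, mixed date formats, and a duplicate row.
raw_rows = [
    {"email": " Ana@Example.com ", "signup": "2023-01-15"},
    {"email": "ana@example.com", "signup": "15/01/2023"},
    {"email": "bo@example.com", "signup": "2023-02-01"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats for this sketch


def parse_date(text: str) -> str:
    """Return the date in ISO format, trying each assumed input format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    """Normalize each row, then keep the first occurrence of each record."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (row["email"].strip().lower(), parse_date(row["signup"]))
        if record not in seen:
            seen.add(record)
            cleaned.append({"email": record[0], "signup": record[1]})
    return cleaned


print(clean(raw_rows))
```

The first two rows look different in their raw form but describe the same signup, which is why the duplicate check runs only after normalization.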

The same transformation must take place with a data mesh. The only difference is that, instead of central data teams doing the work, each business domain becomes responsible for its own data. The decentralized nature of a data mesh can also introduce new complexities. For example, it can lead to business domains duplicating efforts and resources on the same datasets. In addition, data products from one business domain can often be beneficial to other domains. So beyond uncovering relationships between datasets, users also need to reconcile entities in data products across domains, such as when combining data from different systems to form a complete picture of a customer.
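Reconciling entities across domains can be sketched in a few lines of Python. The two data products, their fields, and the matching rule (same normalized email means same customer) are assumptions chosen to keep the example small; real entity resolution usually needs fuzzier matching on names, addresses, and other attributes.

```python
# Hypothetical data products from two domains describing overlapping customers.
sales_customers = [
    {"id": "S-1", "name": "Ana Silva", "email": "Ana.Silva@example.com"},
    {"id": "S-2", "name": "Bo Chen", "email": "bo.chen@example.com"},
]
support_contacts = [
    {"id": "T-9", "name": "A. Silva", "email": "ana.silva@example.com "},
    {"id": "T-7", "name": "Cy Park", "email": "cy.park@example.com"},
]


def match_key(record: dict) -> str:
    """Assumed matching rule: same normalized email means same customer."""
    return record["email"].strip().lower()


def reconcile(**products):
    """Group record ids from every domain under a shared match key."""
    entities = {}
    for domain, records in products.items():
        for record in records:
            by_domain = entities.setdefault(match_key(record), {})
            by_domain.setdefault(domain, []).append(record["id"])
    return entities


linked = reconcile(sales=sales_customers, support=support_contacts)
for email, ids in linked.items():
    print(email, ids)
```

Here the sales record `S-1` and the support record `T-9` end up under one customer even though the names and raw email strings differ, while the other two customers each appear in only one domain.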

We highlighted the need to upskill business users within a data mesh. A shift toward more citizen data scientists may be necessary even among companies that don’t adopt data mesh, simply because of the rampant shortage of data scientists, with the latest estimates pointing to a gap of 250,000 between job openings and job searches. The talent shortage, coupled with the growing amount of data in modern businesses, means few organizations are able to leverage their data effectively at scale.

Establish a data usability layer

Whether your organization takes a centralized or decentralized approach to business data management, you ultimately need a way to connect, integrate, and understand all the pieces of information from across your business. If you don’t have the talent to do this critical work and the amount of data is overwhelming, then automation is something to consider.

Today, AI can be applied to automate the ingestion, enrichment, and distribution of data from all sources and to manage every step necessary to obtain actionable data assets. You go from having fragmented, floating bits of information to linking and merging them within a metadata layer, or data usability layer, in your data stack, so that the data is ready for use in reporting, analytics, products and services by any user.

A data usability layer sits alongside any cloud data warehouse, data lake, or data mesh environment. It lets companies get the most out of whichever strategy they choose by helping them understand, use and monetize every last bit of data at scale.
