Master Data Management explained

In this blog series we dive into the world of master data management and explain what it is, what its future trends are, and why it is so important.

Imagine sending a confidential email to the wrong address, having a factory stand idle due to a malfunction, or even crashing a rocket into a planet instead of safely reaching its orbit. All of these are real-life examples caused by poor master data quality. In the first case, the problem was customer data containing an erroneous address for a person. The second case was caused by poor master data regarding the factory maintenance schedule. The third case happened due to invalid reference data about the metrics used. The saying “the devil is in the details” applies really well in the world of master data. Such seemingly small issues can quickly grow into major problems impacting the whole business. Most of these data quality issues could have been easily avoided with proper master data management.

But what exactly is master data and how should it be managed? This blog series dives into the world of master data management: what it is, why it is important, and how it should be executed. But before going too deep into technical detail, let’s start by defining what can be considered master data. There are various definitions for the term and none of them is absolute. One definition is that master data describes the most relevant business entities, on which the activities of an organization are based. But what are those entities?

Each industry has its own master data requirements and characteristics, but a total of five common core master data domains can be identified:

  1. Product (What): All master data related to e.g. products, services, or assets offered or owned by the organization. This domain can vary a lot depending on the industry.
  2. Party (Who): Domain consisting of all business partner related master data such as customers, suppliers, distributors, employees or citizens. A party can be either a person or an organization, and its data contains e.g. the name, party identifier and contact details.
  3. Location (Where): Master data regarding places, sites and regions. Sales territories, cities, offices and production facilities are examples of locations.
  4. Account (How): Explains how parties are related to each other and to the things offered by organizations. It contains data such as accounts, financials, agreements, contracts and other relationships between entities.
  5. Calendar (When): Time domain of master data containing e.g. validity periods and schedules associated with the creation, marketing, shipping, and obsolescence of products. This is the most ambiguous domain since the time dimension can also be seen as part of the life-cycle management of other domains.
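As a rough illustration, the five domains and the questions they answer can be written down as a small lookup structure. The Python sketch below is only that, a sketch: the names `Domain`, `ENTITY_DOMAINS` and `classify`, as well as the selection of entity types, are assumptions made for this example and do not come from any MDM product.

```python
from enum import Enum


class Domain(Enum):
    """The five core master data domains and the question each one answers."""
    PRODUCT = "What"
    PARTY = "Who"
    LOCATION = "Where"
    ACCOUNT = "How"
    CALENDAR = "When"


# Hypothetical mapping from entity types to domains, based on the examples above.
ENTITY_DOMAINS = {
    "service": Domain.PRODUCT,
    "asset": Domain.PRODUCT,
    "customer": Domain.PARTY,
    "supplier": Domain.PARTY,
    "employee": Domain.PARTY,
    "sales_territory": Domain.LOCATION,
    "office": Domain.LOCATION,
    "contract": Domain.ACCOUNT,
    "agreement": Domain.ACCOUNT,
    "validity_period": Domain.CALENDAR,
}


def classify(entity_type: str) -> Domain:
    """Return the domain of an entity type; raise KeyError if it is unknown."""
    return ENTITY_DOMAINS[entity_type.lower()]
```

In a real organization the mapping would be agreed with the business rather than hard-coded, and some entity types may reasonably be argued into more than one domain.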

Master data also has some special characteristics compared to other types of data:

  • Master data always describes the basic characteristics of entities in the real world.
  • Master data has a relatively long lifecycle.
  • Transaction volumes do not directly affect the amount of master data.
  • Master data can exist without transaction data, but not vice versa.
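The last two characteristics can be made concrete with a small check. The following Python sketch assumes a toy setup in which customers are the master data and orders are the transaction data; the function name `find_orphan_transactions` and the sample records are made up for illustration.

```python
def find_orphan_transactions(master_ids, transactions):
    """Return transactions whose customer_id has no matching master record."""
    known = set(master_ids)
    return [t for t in transactions if t["customer_id"] not in known]


# Master data: one record per real-world customer, however often they buy.
customers = {"C1": "Acme Oy", "C2": "Nordic Parts Ab"}

# Transaction data: the volume grows with business activity.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C1"},
    {"order_id": 3, "customer_id": "C9"},  # refers to a customer that does not exist
]

orphans = find_orphan_transactions(customers, orders)
```

Customer C2 is valid master data even though it has no orders, while order 3 is flagged because it points to a customer that is missing from the master data.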

Now that there is a clear picture of what master data is, we can define what master data management is. There are plenty of different definitions for the term master data management (MDM). I would define MDM as a method of enabling an organization to create a unified view of its core data. The focus of MDM is to create an integrated, accurate, timely, and complete set of data needed to manage and grow the business. The goal of master data management can be summarized in these three objectives [1]:

  1. Provide a consistent understanding and trust of master data entities
  2. Provide a mechanism for consistent use of master data across the organization
  3. Accommodate and manage change
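To give a feel for what a "unified view" can mean in practice, here is a minimal Python sketch that merges records describing the same customer from two systems. The merge rule (keep the non-empty value from the most recently updated source) is an assumption chosen for this example, not a rule prescribed by [1], and the function name `merge_records` and the sample data are hypothetical.

```python
from datetime import date


def merge_records(records):
    """Merge source records for one entity into a single unified record.

    Assumed rule: for each attribute, keep the non-empty value from the
    most recently updated source record.
    """
    unified = {}
    # Process oldest first, so newer non-empty values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record["attributes"].items():
            if value not in (None, ""):
                unified[field] = value
    return unified


crm = {
    "source": "crm",
    "updated": date(2019, 3, 1),
    "attributes": {"name": "Acme Oy", "email": "info@acme.example", "phone": ""},
}
billing = {
    "source": "billing",
    "updated": date(2018, 11, 15),
    "attributes": {"name": "ACME", "email": "", "phone": "+358 00 000 0000"},
}

unified = merge_records([crm, billing])
```

Real MDM solutions typically use richer rules, for example trusting different systems for different attributes, but the principle of producing one agreed record from several sources is the same.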

The Zachman Framework for Enterprise Architecture [2] defines six categories of enterprise architecture components. Interestingly, the core master data domains share the similar categories of Who, What, Where, How and When. The remaining category is “Why”, and it should definitely be taken into account when thinking about master data management. The “Why” dimension can be seen as the reasoning behind the whole master data management initiative. This is where most MDM initiatives fail when the business is not involved. That being said, the next part of the MDM blog series will look more closely at the “Why” of master data management.

“Without a systematic way to start and keep data clean, bad data will happen.” — Donato Diorio

REFERENCES:

[1] Dreibelbis, A., Hechler, E., Milman, I., Oberhofer, M., van Run, P., & Wolfson, D. (2008). Enterprise master data management: an SOA approach to managing core information. Pearson Education.

[2] Zachman, J. (2006). The Zachman Framework for Enterprise Architecture (pp. 1-15). Virginia: Zachman Framework Associates.

A Data Catalog can be the foundation for your data democracy – if you think of it as more than just a catalog

The hype around data catalog software is justified, but the term "data catalog" is misleading and often misunderstood. We should talk about data libraries, i.e. combinations of software, new ways of working and user experience, all aiming to drive data democratization.

I love libraries.

In my younger days, I spent a lot of time in libraries. Many associate libraries with books, but for me libraries are also about music. In libraries, I explored music that would greatly shape my personality and my own approach to music as my personal passion. This was all before music was digitized, so I had to acquire music in physical form.

For me, libraries were a perfectly designed service for music discovery:

  • All products were indexed and easy to find
  • Music magazines were adjacently available for further context of the product
  • You could preview (listen to) the products right there and then
  • The product offering was actively curated by knowledgeable professionals
  • The service was public and available to anyone who had an interest.

I was very curious about non-mainstream music and libraries helped me find artists like Tom Waits, Nick Cave and Pavement, who have all since meant much to me. Had I relied only on my own social tribe (friends and schoolmates), I would never have discovered these artists. This is why I will always be thankful for the existence of libraries.

The Hype Around Data Catalog Software

Much has been said and written about data catalog software in the last year. Data catalogs have been called “the new black”[1] and “the most important data management breakthrough to have emerged in the last decade”[2]. While I agree with most of what Forrester, Gartner and others say, they have all gotten one important aspect wrong: The term “data catalog” is misleading and too narrow. These new software solutions are to data-driven organizations what libraries were to me: a safe place for discovery, learning and transformation. Think of this software as an open-to-everyone service, where:

  • Content is cataloged and curated by experts
  • Contextual information is easily available alongside the core product
  • It is safe and easy to explore the content.

As a bonus, while traditional libraries encourage silence, modern data catalog software encourages conversations, thought exchange and collaboration. It fosters data literacy, which is essential for data-driven organizations. Libraries, as we know them, are a foundation for our democracy and even civilization. A data catalog, or a Data Library as I would call it, should be the foundation for your organization’s data/analytic democracy.

I should also add that, unlike with books in libraries, data catalog software does not require data to be physically loaded into the software. The software connects to the data at its original location.
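A minimal Python sketch of this idea follows: a catalog entry holds descriptive metadata and a pointer to where the data lives, but never the data itself. The class `CatalogEntry`, the `search` function, the field names and the example locations are all assumptions made for illustration and do not reflect the data model of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about a data asset; the data itself stays where it is."""
    name: str
    location: str            # pointer to the source system (hypothetical URI)
    description: str = ""
    tags: list = field(default_factory=list)
    curator: str = ""


def search(entries, term):
    """Return entries whose name, description or tags contain the term."""
    term = term.lower()
    return [
        e for e in entries
        if term in e.name.lower()
        or term in e.description.lower()
        or any(term in t.lower() for t in e.tags)
    ]


catalog = [
    CatalogEntry("customers", "postgres://crm.example/public/customers",
                 "Customer master records", ["party", "pii"], "data.steward"),
    CatalogEntry("orders", "s3://sales-lake.example/orders/",
                 "Order transactions", ["sales"]),
]

hits = search(catalog, "PII")
```

A user searching for "PII" finds the customers entry and learns where the data is located and who curates it, without the catalog ever holding a copy of the customer records.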

Building the Case for a Data Library

Each organization has to identify its own drivers and build its own business case for a data library. I have worked on and studied data catalog initiatives driven by very different needs:

  • A desire to capture more contextual metadata
  • An aim to foster collaboration in a data & analytic community
  • A need to identify certain data assets for compliance purposes (e.g. PII for GDPR)
  • An ambition to collect information about a multitude of data platforms into one location.

Below are some examples of how common pain points (or “opportunities” if you prefer) can be expressed as benefits for a catalog business case, and how these can then drive the focus of the solution.

 

Some of you might now be wondering what happened to metadata management, ETL, lineage and similar functionalities often associated with data catalogs. While these are important features, they narrow the solution too much and you end up with a technical data catalog, not a data library.

Time & Productivity. You are likely to achieve greater buy-in and success when focusing on business benefits like shorter time to insight or a productivity boost. These benefits are achieved by reducing the time data workers spend finding, accessing and learning how to use data.

Better Decisions. You can also focus on the benefits of faster and more accurate decision making. Faster analytics leads to faster decisions, which leads to seizing business opportunities faster. Similarly, a better understanding of available data assets leads to richer and more precise analytic outputs, making business decisions more accurate.

Data Value Assessment. Finally, Chief Data Officers in particular should actively be looking at the value of their data assets to understand investment and optimization opportunities. Features like data utilization metrics and user stories shared in the data library will help a CDO understand and assess the value of their data assets.
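As a simple illustration of a data utilization metric, the Python sketch below counts queries and distinct users per data asset from an access log. The log format, the function name `utilization` and the two chosen metrics are assumptions for this example; actual catalog products collect and present usage data in their own ways.

```python
from collections import Counter


def utilization(access_log):
    """Count queries and distinct users per data asset from an access log."""
    queries = Counter(event["asset"] for event in access_log)
    users = {}
    for event in access_log:
        users.setdefault(event["asset"], set()).add(event["user"])
    return {
        asset: {"queries": count, "distinct_users": len(users[asset])}
        for asset, count in queries.items()
    }


log = [
    {"asset": "customers", "user": "anna"},
    {"asset": "customers", "user": "ben"},
    {"asset": "customers", "user": "anna"},
    {"asset": "orders", "user": "ben"},
]

metrics = utilization(log)
```

Numbers like these do not measure value on their own, but they give a CDO a starting point for asking which data assets are heavily used and which are not used at all.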

The highest prioritized benefits should then drive your approach to implementing a data library. Is it more important to capture rich metadata or usage data? Will you favor crowdsourcing or automation for creating the content? Is the library centrally curated or can anyone add/edit/remove contents? The answers to these questions should drive your technology selection and implementation plan.

It’s a Way of Working Aiming To Deliver a Great User Experience

While modern data catalog software products are already pretty great on their own, just like libraries, they won’t provide value without active curation and stewardship. The key to success with a data library is to make it a way of working. This means changing behaviors in your organization, i.e. what people do and how they do it.

Libraries in essence are just hollow buildings with empty shelves. What makes them work is their content, and the people curating and designing services around that content. You need to think of data libraries similarly. Identify the people most passionate about your data assets and promote them to become “Supreme Information Curators” or “Esteemed Data Council Members”. Avoid making this about policing and controlling, and instead focus on enabling and empowering users. Then, use insight from how your organization works to design a data library user experience that does for data workers what libraries did for me: change beliefs, thoughts and behaviors. Well-designed data libraries will make data workers more engaged, productive, collaborative and knowledgeable. By doing so, the libraries will also drive data workers to contribute to the common wisdom of your organization.

Data Catalogs Are Just Software But Data Libraries Are Foundational Capabilities

The data catalog software market is a dynamic and interesting market, and these products can solve a number of business problems. However, they miss the mark if they are deployed as catalogs only. Think of these products as the enabler of your organization’s data library and the foundation for your data and/or analytic democracy. When building the case for your data library, identify which business problems are most relevant to your organization and then look for solutions that address those specific problems. A data library is a way of working, not just a technology solution. Find the right people in your organization and give them the responsibility – and privilege – of curating your organization’s data wisdom. Think of your data library as a service that provides value to its users and changes behaviors. If you get this user experience right, your organization will generate more value from its data assets, and your data people will be smarter, more engaged and more productive than any of your competitors.

—–

At Solita we have explored the data catalog software market extensively and have also implemented data catalog/library solutions with our clients. We’d love to help you build a business case for a data library and implement a library service that drives data democracy in your organization.

[1] Gartner: “Data Catalogs Are the New Black in Data Management and Analytics” Analyst Report, Ehtisham Zaidi, December 13, 2017

[2] 451 Research: “From out of nowhere: the unstoppable rise of the data catalog”, Matt Aslett, October 10, 2018
