A Data Catalog can be the foundation for your data democracy – if you think of it as more than just a catalog

The hype around data catalog software is justified, but the term "data catalog" is misleading and often misunderstood. We should instead talk about data libraries, i.e. combinations of software, new ways of working and user experience, all aiming to drive data democratization.

I love libraries.

In my younger days, I spent a lot of time in libraries. Many associate libraries with books, but for me libraries are also about music. In libraries, I explored music that would greatly shape my personality and my own approach to music as my personal passion. This was all before music was digitized, so I had to acquire music in physical form.

For me, libraries were a perfectly designed service for music discovery:

  • All products were indexed and easy to find
  • Music magazines were adjacently available for further context of the product
  • You could preview (listen to) the products right there and then
  • The product offering was actively curated by knowledgeable professionals
  • The service was public and available to anyone who had an interest.

I was very curious about non-mainstream music, and libraries helped me find artists like Tom Waits, Nick Cave and Pavement, who have all since meant much to me. Had I relied only on my own social tribe (friends and schoolmates), I would never have discovered these artists. This is why I will always be thankful for the existence of libraries.

The Hype Around Data Catalog Software

Much has been said and written about data catalog software in the last year. Data catalogs have been called “the new black”[1] and “the most important data management breakthrough to have emerged in the last decade”[2]. While I agree with most of what Forrester, Gartner and others say, they have all gotten one important aspect wrong: the term “data catalog” is misleading and too narrow. These new software solutions are to data-driven organizations what libraries were to me: a safe place for discovery, learning and transformation. Think of this software as an open-to-everyone service, where:

  • Content is cataloged and curated by experts
  • Contextual information is easily available alongside the core product
  • It is safe and easy to explore the content.

As a bonus, while traditional libraries encourage silence, modern data catalog software encourages conversations, thought exchange and collaboration. It fosters data literacy, which is essential for data-driven organizations. Libraries, as we know them, are a foundation for our democracy and even civilization. A data catalog, or a Data Library as I would call it, should be the foundation for your organization’s data/analytic democracy.

I should also add that, unlike with books in libraries, data catalog software does not require data to be physically loaded into the software. The software connects to the data at its original location.
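To make the "connects to the data at its original location" idea concrete, here is a minimal sketch in Python. It is not how any particular product works; the class names, fields and the example URI are all hypothetical and chosen only for illustration. The point is that the library stores descriptions of data assets and pointers to where they live, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about a data asset; the data itself stays in the source system."""
    name: str
    location: str  # hypothetical URI pointing at the asset's original location
    owner: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


class DataLibrary:
    """A toy in-memory index of catalog entries (illustrative assumption only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Only metadata is stored here; no rows are copied from the source.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogEntry]:
        # Case-insensitive match against name, description and tags.
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in t.lower() for t in e.tags)
        ]


library = DataLibrary()
library.register(CatalogEntry(
    name="customer_orders",
    location="postgresql://warehouse.example.internal/sales/orders",  # made-up URI
    owner="sales-analytics",
    description="One row per customer order",
    tags=["sales", "orders"],
))

matches = library.search("orders")
print([(m.name, m.location) for m in matches])
```

A real product would add connectors, access control and automated metadata harvesting on top of this, but the separation between the index and the underlying data stays the same.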

Building the Case for a Data Library

Each organization has to identify its own drivers and build its own business case for a data library. I have worked on and studied data catalog initiatives driven by very different needs:

  • A desire to capture more contextual metadata
  • An aim to foster collaboration in a data & analytic community
  • A need to identify certain data assets for compliance purposes (e.g. PII for GDPR)
  • An ambition to collect information about a multitude of data platforms into one location.

Below are some examples of how common pain points (or “opportunities” if you prefer) can be expressed as benefits for a catalog business case, and how these can then drive the focus of the solution.

Some of you might now be wondering what happened to metadata management, ETL, lineage and similar functionalities often associated with data catalogs. While these are important features, they narrow the solution too much and you end up with a technical data catalog, not a data library.

Time & Productivity. You are likely to achieve greater buy-in and success when focusing on business benefits like shorter time to insight or a productivity boost. These benefits are achieved by reducing the time data workers spend finding, accessing and learning how to use data.

Better Decisions. You can also focus on the benefits of faster and more accurate decision making. Faster analytics leads to faster decisions, which in turn lets you seize business opportunities sooner. Similarly, a better understanding of available data assets leads to richer and more precise analytic outputs, making business decisions more accurate.

Data Value Assessment. Finally, Chief Data Officers in particular should actively be looking at the value of their data assets to understand investment and optimization opportunities. Features like data utilization metrics and user stories shared in the data library will help a CDO understand and assess the value of their data assets.
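As a rough illustration of what a data utilization metric could look like, the Python sketch below counts how often each data asset is accessed and by how many distinct users. The access log, its format and the function name are assumptions made up for this example; real catalog products expose usage data in their own ways.

```python
from collections import Counter

# Hypothetical access log: (user, data asset) pairs
access_log = [
    ("anna", "customer_orders"),
    ("ben", "customer_orders"),
    ("anna", "web_clicks"),
    ("anna", "customer_orders"),
]


def utilization(log):
    """Return per-asset access counts and distinct-user counts."""
    accesses = Counter(asset for _, asset in log)
    users = {}
    for user, asset in log:
        users.setdefault(asset, set()).add(user)
    return {
        asset: {"accesses": count, "distinct_users": len(users[asset])}
        for asset, count in accesses.items()
    }


metrics = utilization(access_log)
for asset, stats in metrics.items():
    print(asset, stats)
```

Numbers like these do not measure value on their own, but combined with user stories they give a CDO a starting point for deciding where to invest.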

The highest prioritized benefits should then drive your approach to implementing a data library. Is it more important to capture rich metadata or usage data? Will you favor crowdsourcing or automation for creating the content? Is the library centrally curated or can anyone add/edit/remove contents? The answers to these questions should drive your technology selection and implementation plan.

It’s a Way of Working Aiming To Deliver a Great User Experience

Modern data catalog software products are already pretty great on their own but, just like libraries, they won’t provide value without active curation and stewardship. The key to success with a data library is to make it a way of working. This means changing behaviors in your organization, i.e. what people do and how they do it.

Libraries in essence are just hollow buildings with empty shelves. What makes them work is their content, and the people curating and designing services around that content. You need to think of data libraries similarly. Identify the people most passionate about your data assets and promote them to become “Supreme Information Curators” or “Esteemed Data Council Members”. Avoid making this about policing and controlling, and instead focus on enabling and empowering users. Then, use insight from how your organization works to design a data library user experience that does for data workers what libraries did for me: change beliefs, thoughts and behaviors. Well-designed data libraries will make data workers more engaged, productive, collaborative and knowledgeable. By doing so, the libraries will also drive data workers to contribute to the common wisdom of your organization.

Data Catalogs Are Just Software But Data Libraries Are Foundational Capabilities

The data catalog software market is dynamic and interesting, and these products can solve a number of business problems. However, they miss the mark if they are deployed as catalogs only. Think of these products as the enabler of your organization’s data library and the foundation for your data and/or analytic democracy. When building the case for your data library, identify which business problems are most relevant to your organization and then look for solutions that address those specific problems. A data library is a way of working, not just a technology solution. Find the right people in your organization and give them the responsibility – and privilege – of curating your organization’s data wisdom. Think of your data library as a service that provides value to its users and changes behaviors. If you get this user experience right, your organization will generate more value from its data assets, and your data people will be smarter, more engaged and more productive than those of any of your competitors.

-----

At Solita we have explored the data catalog software market extensively and have also implemented data catalog/library solutions with our clients. We’d love to help you build a business case for a data library and implement a library service that drives data democracy in your organization.

[1] Gartner: “Data Catalogs Are the New Black in Data Management and Analytics” Analyst Report, Ehtisham Zaidi, December 13, 2017

[2] 451 Research: “From out of nowhere: the unstoppable rise of the data catalog”, Matt Aslett, October 10, 2018