Analyst Chat #157: How to Refine Data like Oil - Data Quality and Integration Solutions


Who has not heard the statement that "Data is the new oil"? But oil needs to be refined, and so does data. The challenge of gathering, integrating, cleansing, improving, and enriching data across the complete range of data sources in an organization, both to enable use of that data and to support data governance and data security initiatives, is the topic of this episode. Martin Kuppinger joins Matthias and explains this market segment and its relevance on the occasion of the publication of a new Leadership Compass covering "Data Quality and Integration Solutions".

Welcome to the KuppingerCole Analyst Chat. I'm your host. My name is Matthias Reinwarth, I’m the Director of the Practice Identity and Access Management here at KuppingerCole Analysts. To keep up with the ongoing research that we are doing, I have invited my colleague and the founder, or one of the founders of KuppingerCole Analysts, Martin Kuppinger. And we want to talk about a topic that I think we have covered at least partially already. Hi, Martin. Good to see you.

Hi, Matthias. Pleasure to talk to you again.

Pleasure to have you. And we want to talk about a topic that is called Data Quality and Data Integration. I think we did an episode around data management and data consolidation already. You did a Leadership Compass, and it's a new one. So what have you done before and where are the changes?

Yes. So it's not changes, it's adding to that. When we look at this entire field of data, I sometimes get the question, "Why is KuppingerCole Analysts looking at this area?" And I think the story is very simple and very consistent as well. KuppingerCole Analysts' roots are in identity: digital identity, identity management, identity governance, and access governance. We also look at cybersecurity, which is very close to identity, with identity being a major element within security. From a governance perspective, we have access governance, which is about access to data. It also goes into data governance, like information rights management or secure information sharing, so protecting unstructured data. And so it's very logical to also look at how we deal with structured data, so data in databases and other types of systems. Databases have become a very broad term covering a range of technologies. When I was young, there were mainly relational databases and, from the past, a few hierarchical ones, etc. So this has fundamentally changed. But there's this need for data governance, and data governance means we need to be in control of data. One element of being in control of data is data catalogs and metadata management, understanding where data resides. This is what a data catalog provides. And then we have the other part, which is about integrating that data, bringing it together, and enforcing quality, which is also part of that but goes beyond it, because it's also what makes data, so to speak, ready for being used. So when we look at this, there's surely this aspect of data utilization.
So how do you make data available for all the various business use cases like analytics, BI (business intelligence), and so on, but also what do you need to do to really be able to mine the value of data? This will not work when you don't have the right foundation, and this foundation is what we covered, on the one hand, with data catalogs and metadata management, so getting an overview. And now we have added data quality and integration, so integrating data and keeping quality high. With these two things, we are talking about a foundation. There will be other research on data governance, and we have already done a lot around data privacy. So we are adding to this and complementing this, starting from where we have our roots but expanding into this field, because in the end you can't protect what you don't know. You can't govern what you don't know. You can't manage what you don't know, and you can't use what you don't know.

From the advisory perspective, we see more and more that attribute-based, policy-based access management is in place, and that heavily relies on data quality. Therefore, we already see in some projects the need arising for ETL solutions, so extract, transform, and load solutions. Are these solutions also looked at in this Leadership Compass?

This is part of it, yes. So this segment of ETL and ELT, which are two variants of the same thing, extract, transform, load or extract, load, transform, is more a question of the overall flows, and these are part of data integration. Data quality then comes in to enforce the quality of data, but also data enrichment, to enrich data. And I think you're spot-on, there are use cases which are probably not that much on the radar of many organizations, because the main focus surely is on data utilization. Enrichment, for instance, frequently focuses on address enrichment, address verification, stuff like that. But there are other use cases in, so to speak, our traditional domains, like authorization. When you build on data from data sources and on attributes you use in a policy, then you must, on the one hand, ensure that the policy is correct, and on the other hand, ensure that the data used for making the decision is correct, so that this is the real, valid data, not altered by someone, etc. So yes, there are use cases in other domains which are gaining momentum, and I'm quite sure we will see more of these.
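To make the ETL/ELT distinction mentioned here concrete, the following is a minimal Python sketch, not taken from the episode or from any vendor product. The function names, the in-memory lists standing in for source, staging, and target systems, and the cleansing rule are all assumptions chosen for illustration. It shows only that both variants run the same steps and differ in where the transformation happens.

```python
from typing import Dict, List

Row = Dict[str, str]


def extract(source: List[Row]) -> List[Row]:
    """Copy rows out of the (hypothetical) source system."""
    return [dict(row) for row in source]


def transform(rows: List[Row]) -> List[Row]:
    """Illustrative cleansing step: trim whitespace, lowercase the email."""
    return [
        {"name": r["name"].strip(), "email": r["email"].strip().lower()}
        for r in rows
    ]


def run_etl(source: List[Row], target: List[Row]) -> None:
    # ETL: rows are transformed in the pipeline, before they reach the target.
    target.extend(transform(extract(source)))


def run_elt(source: List[Row], staging: List[Row], target: List[Row]) -> None:
    # ELT: raw rows are loaded first (here into a staging list), and the
    # transformation runs afterwards on the loaded data.
    staging.extend(extract(source))
    target.extend(transform(staging))


# Hypothetical sample data, invented for this sketch.
source = [{"name": "  Ada Lovelace ", "email": "ADA@Example.org "}]

etl_target: List[Row] = []
run_etl(source, etl_target)

staging: List[Row] = []
elt_target: List[Row] = []
run_elt(source, staging, elt_target)
```

In this sketch both targets end up with the same cleansed rows; the ELT variant additionally keeps the raw rows in `staging`, which is one reason the choice is described above as a question of the overall flow.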

Right. From the picture that you brought with you, I can see this is really a broad scope of potential functionality that can be covered. Is this a uniform market, or is this a market which has different types of solutions combined into this Leadership Compass? How homogeneous is this?

It's not super homogeneous. When we look at the data quality and data integration part, we have vendors that really are more on the quality side and vendors that are more on the integration side. That is one distinction. There are also vendors in this market that primarily focus, for instance, on customer data integration, so specific domains where they have their strengths. We have some vendors that are more on the end user side, so really also delivering insights and making them available to the business user, and there are vendors that are very developer-centric, providing solutions that are really focused on developers who create solutions where data becomes integrated. So it's a bit of a mix of different approaches. We see that also from a product portfolio perspective: a lot of vendors have their data quality or data integration technology, sometimes this is combined, sometimes they have separate products, and sometimes this merges with MDM, so master data management capabilities, which are usually more use case or domain specific, like product master data management or customer master data management. So it's not a fully homogeneous market segment yet. And it probably never will be, because there are so many use cases, so many domains, and different types of users.

Okay. It is a Leadership Compass that you've produced in that area. So you are looking at actual vendors and their products and services. When we talk about homogeneous, how diverse is that market? Are these the big players? Are these small players, startups, or is it again a mix of those?

A mix. There are some of the very large players we see in the data market, like Informatica, IBM, and Oracle, in this field, surely. But there are also a lot of focused vendors with specific solutions, specifically the ones that are targeting developers. Some of these are a bit smaller. So it's really a broad mix. What we also see is vendors that come more from the data utilization side, like analytics, etc., that are adding capabilities like that because they learn and feel that they need to do something like that. So this is a rather broad perspective we are taking here.

Okay. My usual question at that point is this: many organizations, vendors, and products come with the promise of adding machine learning, artificial intelligence, and more intelligent mechanisms to get a grip on the sheer amount of data and the sheer number of challenges around data. Is this also something that you can see in that market? Are they coming with that promise, and are they delivering?

Yes. Several of the vendors, not all, have some level of AI/ML in their solutions. And I think there's a lot of potential. When you look specifically at the data quality aspects, so understanding and analyzing data quality and where things are correct or not correct, then these technologies really can augment the users and simplify things beyond just rule-based approaches. They definitely help in this area, and we see these technologies increasingly being used across the entire space of data management and data utilization, which, as can be seen in the graphic, is a broader space with a couple of core elements: the data catalogs and metadata management, the data quality and integration, the data analysis, the data utilization, and the data governance. So we really have a range of different areas. And I dare to say there is no single area where AI/ML couldn't contribute and deliver value to the users.
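For context, the rule-based baseline that AI/ML is said to go beyond can be sketched in a few lines of Python. This is a hedged illustration only: the rule names, the record fields, the five-digit postal code format, and the simple email pattern are assumptions invented for this sketch and do not describe any vendor's product.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Deliberately simple pattern for illustration, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rule set: each rule returns True when the record passes.
RULES: Dict[str, Rule] = {
    "name_present": lambda r: bool(r.get("name", "").strip()),
    "email_wellformed": lambda r: bool(EMAIL_PATTERN.match(r.get("email", ""))),
    "postal_code_5_digits": lambda r: bool(
        re.fullmatch(r"\d{5}", r.get("postal_code", ""))
    ),
}


def failed_rules(record: Record) -> List[str]:
    """Return the names of all rules the record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]


def quality_score(records: List[Record]) -> float:
    """Share of records that pass every rule (0.0 to 1.0)."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if not failed_rules(r))
    return passed / len(records)


# Hypothetical sample records, invented for this sketch.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.org", "postal_code": "12345"},
    {"name": "", "email": "not-an-email", "postal_code": "1234"},
]
```

Fixed rules like these only catch what someone thought to write down in advance, which is the limitation the answer above alludes to when it describes AI/ML as augmenting the user.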

Right. In such a podcast episode, we can only scratch the surface of such a market. So I really recommend that people interested in this Leadership Compass, in that market segment, head over to our website and look at the Leadership Compass that has just been published. It's called Data Quality and Integration Solutions, so look out for this. During the creation of this Leadership Compass, Martin, have you identified trends when it comes to how that market will develop in the future? Are there some actual trends that you would expect to surface quite quickly in this market?

Hmm. I think, you know, the point is it's a very, very mixed market. We see this AI/ML utilization, and using it to augment the users, surely as one of the evolutions. We see, I would say more generally speaking, that this market is becoming more and more relevant. Data quality is understood as something we need to have. So it's probably more the trend toward saying, okay, we need that foundation, that strong foundation for being successful. It's not that ETL or ELT are new, they have been around for decades. But more specifically, the perspectives on how to enrich data and how to make all this available to business users as well, to provide them insight into that and to enable them to work on improving data quality, this is really changing. So when you look at UIs, etc., we see a lot of modern UIs, modern dashboards, and so on, to help the users really work with the data.

Great. Thank you. So it's really the relevance which is the trend. We are getting to a market that really supports the business. Thank you very much, Martin, for being my guest today. Again, I can highly recommend having a look at this document. And if there are any questions for Martin, for me, about the content of this episode, or about the actual topic, please leave them in the comments section just below if you're watching this on YouTube, and we will get back to you. If there are any questions and you're listening to this as an audio podcast, please have a look at the show notes. There will be a link where you can reach out to us, and we are happy to answer the questions. Please get in touch, and in any case, just have a look at the document. Thank you very much again, Martin, for being my guest today.

Thank you, Matthias. Bye.