[UPBEAT MUSIC] Hi, everyone. My name is Yetkin Ozkucur. I run the professional services team here at Quest. With me today, I have Raj Joseph. He's the CEO of DQLabs. And today, we will be talking about the topic, "Unlocking the power of modern data quality." Wow, it's a mouthful. So, Joseph-- I'm sorry, Raj, why don't you start with talking a little bit about yourself, your background. How long have you been in this market? And what have you been doing?
Sure. Thanks for having me on board for this presentation. Just to give you some background, my first job was in data quality, trying to fix bad data. Fast forward 25 years, and I'm still doing the same, but now in a way where many enterprises and organizations can leverage the power of modern data quality.
The passion behind it is that this has been a problem since day one of data. If you could solve it once and for all, that would be great. That is what really fuels the passion behind DQLabs. We have been on this journey for the last three or four years, helping organizations, partnering with you specifically, and doing a lot of innovative things as part of that.
Thank you. So today, we will get a glimpse of the new release, 2.0. I've heard a lot of good things about it, and I'm really excited to see it live today. But before we dive in, Raj, what makes DQLabs different? How is DQLabs different from the other tools in the market? Can you summarize it in a couple of bullet points?
Yeah, definitely. That's a good question, Yetkin. There is a lot of hype in the market. The first thing we focus on is data quality from an end-to-end perspective. What that means is we first help organizations identify unknown issues, things they may not even know about. That's one, and that capability is made possible by observability.
Second, there are lots of organizations today that know what the problem is. They want to measure it and have some ability to handle it. An example could be partner data, or data moving between two different systems; source-to-target comparisons could be another use case. That's the second aspect: measurability for known issues.
And the third one is context: understanding and treating a data quality process or remediation in context is very important. That's where the discovery component comes in, on the semantic side, and we'll talk more about how we closely align with business terms, glossaries, things like that.
And the last one is, of course, we are all doing this to identify and fix the issues. If we cannot fix and improve the quality, why do it at all? So that's the end-to-end focus. The main difference is that we do it in a more automated way, which comes along with enablement.
So in short, we position ourselves as a modern data quality platform providing two capabilities in one platform: augmented data quality, which is primarily data quality automation made possible by AI/ML, and data observability, all in one single platform for users, powered by semantics and [INAUDIBLE]. That has been our major differentiation. Our features lean toward no-code automation and a business-user-friendly focus.
No, that's great, Raj. I think it's clear how you differentiate. So today, like I said, we'd like to focus on the newest platform. I've heard a lot of good things, I've seen a couple of demos, and I've seen our customers get excited when they saw a glimpse of it. So can you give us an overview of some of the new features in this platform? What should we be excited about?
Yeah, I'll give you some highlights. We will definitely show a demo so you can see the platform in action rather than me just talking, so let me set the stage in terms of how the platform is designed. One: in any organization's ecosystem or landscape, you have multiple personas.
If you see this triangle, you can use it as a way to understand the organization hierarchy from top to bottom, with the leaders at the top. At the bottom is the foundational infrastructure, the data architecture, where data engineers are primarily responsible for making sure data is available, reliable, and so on.
In the middle is the consumption layer, with business users-- reporting analysts, or data scientists building models and things of that sort. All in all, what we have built is a platform that provides feature sets for these different personas. If you are looking purely from a data engineering standpoint, you can leverage the observability capability to monitor the data out of the box, without writing any rules, with alerts going back into Teams or Slack so you can collaborate and identify issues.
One level up, if you're a data scientist building a model, or a report user building a BI report, you want to make sure your data is good. So you have very specific requirements in terms of measuring the data and making sure what goes in avoids any degradation of report quality or model quality.
There, you can define your own custom rules, or use the semantic layer to automatically configure and discover rules-- all in all, the ability to do a business quality evaluation out of the box. At the top is leadership, who may want to know whether the KPIs they use for decisions are trustworthy. They want an easy way to tag those KPIs and understand their trustworthiness in the decision-making process, along with reporting insights.
So we have lots of features by persona, which we'll go through in the demo. That's one main call-out. The second call-out is that the whole platform is designed around observe, measure, discover. You can see these are core features, and these are augmented features, which we will talk about. And then we are also leveraging genAI capabilities, which I'm going to show a little bit of in the demo.
So bringing all of that into one single platform, and making it more conversational, is another aspect. We will also see some of these observability use cases right out of the box, with no code, and how you'll be able to use them in the platform. And we have way more integrations than anyone: a hundred-plus product integrations, 250-plus measures and checks, and 500-plus quality profile metrics, which we'll also get into.
Sounds exciting, Raj. So are we moving towards the demo now?
Yeah, yeah, let me go into the demo. So you should be seeing a screen. Let me know if you do.
Yeah, I can see it.
So for the first user, let's say I'm an end user. I come in and want to know the quality of all the assets I may have in the landscape. Or if I'm searching for something to build a report, or even, from a data scientist standpoint, to build some models, I can quickly come here and search through any of the top searches, or put in any keywords and search for matching tables.
This view gives you a scorecard, where you have every asset and the type of source it is coming from. We bring together, in a simple metadata view, a data quality score, the number of alerts that were automatically found, and any issues you are working through from a remediation standpoint.
So this scorecard view is not just for data assets. If you go into, for example, Tableau reports, you have the same visualization. If you go into pipelines-- dbt, for example-- you have the same thing. And if you want to go down to attributes, more like columns, you have that ability too.
Not only that, you have lots of different filters on your left. If you want to quickly focus on the attributes with issues, you can come down and do it in two clicks, like this. You also have the ability to filter by score, looking at either the highest or the lowest scores, whatever filters you choose.
And then, especially with a catalog system such as erwin DI, you also have governance and stewardship filters: any domains you may be related to, any applications, or even custom tags. For example, if I want to say, OK, this is the data science model I created, and I want to see all the elements associated with it, I can quickly click into this and see all the associated elements. In this example, I'm showing you API-related elements as filters.
The quick navigation gives you high-level visibility without going into the details of the what and the how, and an easy way to discover new assets along with the quality metrics associated with them. So that's one use case from an end-user standpoint.
Now let's say I'm a data science modeler in the process of building a claims policy or insurance model, a risk-mitigation model. Then I can go deeper into the tables or the data I'm interested in. So now I'm in a particular table I'm interested in, and I understand it's coming from Snowflake.
I immediately see all the alerts that have been automatically identified. And at the top here, I have some metadata-level information: the score, the alerts, the issues, how many users have viewed this, the usage factor-- what queries have been run today against this particular table-- and, more importantly, the stewardship information, the owners connected with it.
But mainly, I can go from here and look into any one of these alerts if it's of interest. Or if I'm really focused on a particular column and want to go deeper into the profiling information, I can go from the table to that column-- City, in this example-- and do something called deep profiling.
So this is the data distribution page, which gives you more statistics about completeness, cardinality, character distribution, space distribution. And not only that: every parameter you see here, you can simply click and observe. The platform is automatically observing it for you.
So if I come here, I can see the distribution of null counts over a period of time, and anytime there is a high level of nulls, I get alerted. This way, as a data scientist building models, not only can I keep feature testing and model testing in check, I can also keep tabs on the quality of the data feeding those models, easily and in an automated way, using DQLabs.
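To make the null-trend idea concrete for readers, here is a minimal sketch, independent of DQLabs, of what trending a column's daily null rate and alerting on a spike can look like. The history values, the z-score cutoff, and the alerting style are all illustrative assumptions, not the platform's actual logic.

```python
import statistics

# Daily null-rate history for one column (hypothetical values);
# the last entry is today's observation.
null_rates = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.094]

history, latest = null_rates[:-1], null_rates[-1]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

# Alert when today's null rate sits far outside the recent trend.
z_score = (latest - mean) / stdev if stdev else 0.0
if z_score > 3:
    print(f"ALERT: null rate {latest:.1%} deviates from trend (z={z_score:.1f})")
```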
We have almost 250-plus profile metrics. Here is an example of value frequency, where you can see the platform has automatically flagged the bad data and the good data. If we come further down, you can see the distribution of values within a particular column, and again, the platform, using the business context, has identified what is good and bad.
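As a rough illustration of what a value-frequency check does (not DQLabs' algorithm), very rare values in a column can be surfaced as suspects with a simple frequency threshold. The data and the 10% cutoff below are made up.

```python
from collections import Counter

# Hypothetical column values; very rare entries get flagged as suspect.
cities = ["Austin"] * 12 + ["Dallas"] * 7 + ["Aust1n"]

counts = Counter(cities)
total = sum(counts.values())
for value, count in counts.most_common():
    share = count / total
    label = "suspect" if share < 0.10 else "ok"
    print(f"{value!r}: {count} ({share:.0%}) -> {label}")
```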
But more importantly, if I see something that is bad, I can also tag that data in the platform, and it will learn from those tags. So this is the automation and the power of AI/ML we are talking about, where users immediately get a lift just by connecting to one particular table and understanding a little about the distribution and the characteristics of the data, across multiple personas and use cases.
Yeah, Raj, I mean, the UI looks much cleaner, much more business-user-friendly. I was going to ask the persona question. So obviously, with this kind of UI, you can definitely speak to business users and end users, correct?
Yep. We segued a little into the business users there. But there is another aspect for data engineering and more engineering-focused users. For engineers-- going back to the triangle slide I was showing-- they want to make sure the metadata-level information is available and good and that there is no deviation.
In those instances, they can come here to the property level. Without going into details: for every table, every asset, we automatically measure observability metrics at the table level-- things like Volume, Schema, Freshness.
For example, if I click Freshness here, what this gives you is when the table was last updated-- how up to date your data really is-- and it flags when a table is not getting updated for whatever reason. It could be a pipeline failure, or your partner data didn't come in when it was supposed to and the load didn't happen. All of this can be easily identified and an alert sent. This way, your engineers can make sure your data is available in the first place.
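In essence, a freshness check compares each table's last-load time against an expected cadence. Here is a minimal sketch of that idea, with hypothetical tables and a 24-hour SLA; it is not DQLabs' implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical load log: when each table last received new rows.
last_loaded = {
    "claims": datetime.now(timezone.utc) - timedelta(hours=3),
    "partner_feed": datetime.now(timezone.utc) - timedelta(hours=30),
}
freshness_sla = timedelta(hours=24)  # each table should load at least daily

for table, loaded_at in last_loaded.items():
    age = datetime.now(timezone.utc) - loaded_at
    if age > freshness_sla:
        print(f"ALERT: {table} last loaded {age} ago, SLA is {freshness_sla}")
```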
Then, on top of that, your business users can use features like the deep profiling, data distribution, and trending we talked about a few minutes ago, layering those checks on top. And the leaders and other personas navigating in-- the end users-- can simply come and get all this information in a much more standardized way and decide what to use and where to use it. So those are the three personas, really. And everything is done out of the box, with minimal coding, for users to leverage.
That's also interesting, Raj. Observability is obviously a very hot topic; we hear a lot of vendors talk about it in different ways. But I notice you kept mentioning alerts. So if I'm understanding right, all these different personas, these different users, don't need to keep coming back to the platform to see how the data quality is changing-- they'll just be notified?
Yeah, agreed. That is the purpose. Think of DQLabs as an engine running behind the scenes. We automatically look into your whole ecosystem and identify all those alerts. And the alerts are not only found but also categorized as high, medium, or low.
Any one of those alerts also carries details about what failed. For example, I clicked into this particular one: it shows a phone length range has been the problem, and that has created an alert.
From here, if I want to create an issue, I can do that too. And both alerts and issues can be pushed into Slack or Teams, whichever collaboration tool you prefer. So to your point, they don't have to come here if you set up the integration with your medium of choice-- email, Slack, Teams, or even a ticketing system or ServiceNow workflow.
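For a sense of what such a push integration involves, both Slack and Teams accept JSON posted to an incoming-webhook URL. This sketch uses only the Python standard library; the webhook URL and the alert fields are placeholders, not DQLabs' integration code.

```python
import json
from urllib import request

# Placeholder incoming-webhook URL for a Slack channel.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def notify(alert: dict) -> None:
    """Push one alert into the team's channel."""
    text = f"[{alert['severity'].upper()}] {alert['table']}: {alert['detail']}"
    req = request.Request(
        WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)

notify({"severity": "high", "table": "claims", "detail": "phone length out of range"})
```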
This is great. Another question, Raj: when we think about data quality, we of course think about trust in data-- what's the quality of my data?-- and answering those questions is the primary purpose. But in terms of ROI, when our customers invest in DQLabs, what other benefits can they expect-- for instance, from an operational perspective, cost, performance, or other aspects? Can you talk about that a little?
Yeah. So we normally look at it from a table standpoint, but as I was pointing out, we also look into Tableau reports, or any kind of BI report, and also pipelines. So this gives you an overall view of your ecosystem. Let me go back to the example table I was looking at.
If I come here, I can easily identify the usage of this particular table, the number of issues, and also the related [INAUDIBLE] in terms of utilization, and any deviation in it. Furthermore, [INAUDIBLE], if you see here, we are showing an example-- if I come into this, let me scroll down.
See, not only are we showing the information; this way, you also get the overall usage and which pipeline it comes from. From here, I can go into the pipeline too and do whatever needs to be done-- all brought together in one place in the platform, which, again, is available from a reporting standpoint.
So multiple users can come and create reports in different ways. For example, if I'm looking at it from a cost or measurement standpoint, I can build on whatever I need: it could be quality score, usage score, or cost-based metrics. All these metrics can be presented in different tables and charts. Or you can even create a hierarchical dashboard, where you can see your organization's high-level assets, go into a domain, and then drill further down into the insights of a particular attribute, and so on. Similarly, if you want to look at it more from a scoring standpoint, you can do that within this dashboard too, with timely information from an issue and alert navigation standpoint.
That's very interesting. And as you were scrolling, Raj, I mean, I keep seeing Snowflake, SAP, different technologies. I know you showed the list of technologies earlier in your slide. I'm sure our customers can find it on your website.
Yeah.
What about when I'm a customer looking for a technology and I cannot find it-- something you don't support yet?
Yeah. Today, we support 100-plus integrations-- some of the major call-outs I was showing earlier here; you can see a lot of them. We support all the major providers. And in case we find something we don't support today, we have created a process where connector development takes just two to four weeks, from creation all the way to go-live.
We have streamlined the process. As you can see here, I selected just a subset to show the breadth. We have data virtualization support with Denodo; pipelines like Fivetran and dbt from a transformation standpoint; legacy tools like DBTU; and modern lakehouse clouds like Databricks or Snowflake, plus the SAP ecosystem-- HANA, ECC, BW.
We cover the whole database spectrum, plus Google BigQuery and Azure Synapse with AD, just to give you a sense of how we support all the modern clouds-- AWS, Azure, Google-- and also on-prem-based SQL Server, legacy tools, dbt, and things of that sort. And these connect directly from here.
And the connection is also very straightforward. You can simply come into any one of these connections and establish it. If I click into this, you can connect to any instance, at either the account level or the table level. Here, you can see you don't have to select everything-- you can if you want-- or you can go table by table, or even down to a specific set of columns. There's much more granularity available right out of the box.
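DQLabs configures this scoping through its UI. Purely as an illustration of the idea, a connection scoped from account level down to specific columns might be described along these lines; none of the field names reflect DQLabs' actual API or configuration format.

```python
# Illustrative connection scope only; field names are hypothetical.
connection = {
    "type": "snowflake",
    "account": "acme_prod",                         # account-level: monitor everything
    "include_tables": ["CLAIMS_DB.PUBLIC.CLAIMS"],  # or go table by table
    "include_columns": ["CITY", "PHONE"],           # or only specific columns
}

# Narrowest scope wins: columns, then tables, then the whole account.
scope = ("columns" if connection.get("include_columns")
         else "tables" if connection.get("include_tables")
         else "account")
print(f"Profiling at {scope} level")
```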
Yeah, that's impressive. And doing a time check, I think we have less than 10 minutes left. I'm used to seeing it on prem, like the actual desktop application, and I realize you did the whole demo in the cloud.
You did mention there's a cloud option which, to be honest, seems quite similar to what we had on prem from a feature-function perspective. Can you talk a little about that? Is there a difference between cloud and on prem? And what do you recommend to our customers when they're deciding which way to go? What's your guidance?
I think, in the cloud, it's all auto-scaled. So you don't have to worry about infrastructure, instrumentation, or scalability. On the back end, we automatically do the workflow orchestration and the compute, whatever is needed, and scale based on the size of the data you are processing as you go.
If it's on prem, then if it's a [INAUDIBLE], you are talking about limited scalability. We can do containerization, like EKS and Docker, but then we are somewhat dependent on your infrastructure engineers.
Today, we do support both cloud and on prem, but it really comes down to the speed and scale of deployment. If you don't have the resources, or you are a little less regulated, or you are OK using the cloud, you can do it. The main difference is we do not pull any data, which makes cloud a much easier option.
Since we are not pulling any data from your system-- we only extract metadata information-- the process is very seamless. Your data never leaves. We are SOC 2 Type 2 compliant, along with HIPAA, and we are expanding our regulatory compliance for different verticals. So security and privacy are well taken care of, which makes it an obvious choice to go faster with cloud rather than going through a deployment process.
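The general pattern behind metadata-only profiling is to query schema catalogs rather than the table rows themselves. A tiny sketch of that idea follows, using SQLite's PRAGMA as a stand-in for a warehouse's INFORMATION_SCHEMA views; this illustrates the pattern, not DQLabs' actual mechanism.

```python
import sqlite3

# Metadata-only pattern: read schema information, never the rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER, city TEXT, phone TEXT)")

# PRAGMA table_info returns (cid, name, type, notnull, default, pk).
for _cid, name, col_type, *_rest in conn.execute("PRAGMA table_info(claims)"):
    print(f"column={name} type={col_type}")
```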
Thanks. I mean, it's good to have options, like you said. All right, let's start wrapping up the demo. A lot of new features, a lot of nice UI-- I'm impressed, and I'm sure our customers will be too. So what's your personal favorite, Raj? What's your personal favorite new feature?
That's a very good question. There are a lot of features I like, but the one I like most is the feature set around genAI. Of course, genAI has totally changed what we see across lots of different applications, and when it comes to data quality, we see it making things even more conversational.
Assume DQLabs is an engine powering all this quality and observability information. Now, with genAI, you additionally have a CoPilot you can converse with and ask any questions. You can ask about the quality checks you want it to run; you can even create checks. And not only that, you can also ask about missing data, things like that.
There are a lot of different use cases. So here is a quick architecture view of how we navigate through that. Here is the CoPilot, and behind the CoPilot, we are using our metadata repository to make it metadata-aware.
As a user navigates through different pages, it tracks the page you are on, so it is a little more context-aware. It also knows what actions you can take, all made possible via DQLabs' LLM foundational model. We are also leveraging some client data-store-specific LLM models to make this an option for various use cases.
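As an illustration only: "metadata-aware" can simply mean bundling catalog context and the current page with the user's question before calling an LLM. Nothing in this sketch reflects DQLabs' internals; the function name and metadata fields are invented for the example.

```python
# Hypothetical prompt assembly for a metadata-aware copilot.
def build_prompt(question: str, current_page: str, asset_metadata: dict) -> str:
    context = "\n".join(f"- {key}: {value}" for key, value in asset_metadata.items())
    return (
        f"You are a data quality copilot. The user is viewing: {current_page}.\n"
        f"Known metadata for this asset:\n{context}\n"
        f"Question: {question}"
    )

print(build_prompt(
    "Which column has the most nulls?",
    "table detail: CLAIMS",
    {"source": "Snowflake", "columns": 42, "quality_score": 81},
))
```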
Something to call out: you can now identify problems in large data sets by asking questions, rather than configuring something or digging through issues and alerts. You can get recommendations for what a missing value could be. You can even write checks, or ask the CoPilot to run some of those checks.
There are a lot of different use cases we could cover, but in the interest of time, I'll go quickly through a few here. Let me log into the platform. You can see there is a CoPilot here.
Within this, I can ask various questions; I'll show some simple ones. If I'm interested in understanding the most-used asset here, I can come in and simply type: show the top most-queried asset. And it immediately gives you the table that has been queried the most.
Even more interesting, you can type something like: show me the most-queried asset with the lowest score. So I'm asking, OK, show me the one that has bad data but is also the most used. You can add a little nuance using all this metadata information, and it gives you that as well.
Not only that, you can go straight into that particular table; from here, by clicking into this, you can do whatever you want to do. Another simple example, in the interest of time: let's say I'm looking into these issues here, I go into this particular alert, and I now want to create a remediation plan or an issue.
I can come here and say: create an issue for this alert. What happens is the CoPilot automatically identifies which alert we are talking about, goes and gets all the information, extracts it, and creates an issue. Right now, you can see there are no issues associated with it yet.
Now, if I close this, refresh the screen, and go back to the alert, you will see an issue tagged: the CoPilot has created it. Not only that, it puts lots of information on the issue itself. It has also created a workflow in Jira-- I can work it from here and assign users-- and created a PDF report, all available.
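For readers curious about what the Jira handoff involves under the hood, creating an issue through Jira's REST API is a single authenticated POST to its v2 create-issue endpoint. In this sketch, the site URL, project key, and credentials are placeholders, and DQLabs' own integration may work differently.

```python
import base64
import json
from urllib import request

# Placeholders: your Jira site, project key, and API credentials.
JIRA_URL = "https://your-site.atlassian.net/rest/api/2/issue"
AUTH = base64.b64encode(b"user@example.com:api_token").decode()

payload = {
    "fields": {
        "project": {"key": "DQ"},
        "summary": "Phone length out of range on CLAIMS",
        "description": "Auto-created from a data quality alert.",
        "issuetype": {"name": "Task"},
    }
}
req = request.Request(
    JIRA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Basic {AUTH}"},
)
request.urlopen(req)  # response body contains the new issue's id and key
```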
Now I can take this issue description, go into my Jira, and search for it, and I find the same issue reported there with that information. That's a quick demo. There are a lot more capabilities we could show, but as I was pointing out, we are making it more and more conversational using the abilities of genAI. More to come-- this feature set probably needs a separate demo of its own.
Wow, Raj, a lot of new features, a lot to look forward to in the new year with the new release. So thank you very much for your time today, Raj. And thank you very much, everyone, for participating and listening. As always, if you have any questions or want a follow-up conversation, feel free to ping us in the chat, or find us on quest.com or erwin.com. We'll be happy to reconnect with you. Thank you very much.
Sure, thanks. Thank you, everyone. Thank you, Yetkin, for having me.
[UPBEAT MUSIC]