Revolutionizing Data Accessibility: An Insight into Power Virtual Agents and Azure OpenAI

Revolutionizing Data Accessibility: An Insight into Power Virtual Agents and Azure OpenAI

Revolutionizing Data Accessibility
Dewain Robinson

  • Ever wondered about the vast universe of Power Virtual Agents and AI? Brace yourselves for an enlightening conversation with our guest, Dewain Robinson, straight from Nashville, Tennessee, the Principal Program Manager for Power, Virtual Agents, and Conversational AI at Microsoft. We dive into the intricate workings of Power Virtual Agents and how Azure OpenAI service can revolutionize data accessibility by creating an easy-to-navigate knowledge base. 
  • Our discourse traverses the democratization of data science and AI, revealing how co-pilot is opening new doors for people without coding backgrounds, and how large language models can extract knowledge from data. We navigate through the multifaceted world of Azure OpenAI, its significance and the necessity of recognizing the loopholes when training a model. Dwayne also shares insights on how Azure OpenAI in Microsoft Teams can make data access more efficient. 
  • As we advance, we tackle the challenges of using large language models and search engine optimization to help customers identify data issues. The importance of starting with public data before using internal data is emphasized, alongside the benefits of publishing content on a web page. We wind up with a sneak peek into the upcoming innovations with Azure Cognitive Services and their potential to create more powerful virtual agents and conversation AI. Prepare to be amazed by the technological advances that are just around the corner! 

AgileXRM 
AgileXRm - The integrated BPM for Microsoft Power Platform

Register your interest for the 2024 90-Day Mentoring Challenge. ako.nz365guy.com

Support the show

If you want to get in touch with me, you can message me here on Linkedin.

Thanks for listening 🚀 - Mark Smith

Transcript

Mark Smith: Welcome to the Power 365 show. We're an interview staff at Microsoft across the Power Platform and Dynamics 365 Technology Stack. I hope you'll find this podcast educational and inspire you to do more with this great technology. Now let's get on with the show. Today's guest is from Nashville, tennessee in the United States. He works at Microsoft as a principal program manager for Power, virtual Agents and conversational AI. He's been working with conversational AI for eight years and with Microsoft Technologies in IT for over 25 years. He combined links to his bio, social media etc. In the show notes for the episode as well, as we'll discuss a couple of other topics and we'll put links in the show notes so you can jump straight to what we refer to in this episode. Welcome to the show, Dewain.

Dewain Robinson: Thanks, Mark. Thanks for having me today.

Mark Smith: Good to have you on the show. I always like to bridge the personal with the professional to kick off an episode. So just to start with food, family and fun. What do they mean to you, all those things, when you're not focused on your day job?

Dewain Robinson: Yeah, so I got married, have two kids, one 16-year-old and one 13-year-old, and one of them is a field goal kicker. It's being professionally trained and it's actually pretty good at that. Then a daughter who just won a national championship on in cheering for competitive cheerleading. And then, as far as food is concerned, unfortunately it's a little bit of a too heavy of a sin for me. I like a lot of Southern US type of foods, which is generally bad for me but it's enjoyable on the back end. But outside of that, I do a lot of DJing and stage and lighting design. I actually even do that for a friend of mine who used to be the lead singer of Creed, Scott Stapp. So I do a lot of work with him and his charity as well to do stuff like that. So lots of stuff Everybody, I think. This week, Mark, I even learned how to stream from a drone through an live stream so that my son's football games I can stream it out over the internet and do the cool stuff that you do.

Mark Smith: Wicked? Are we talking about a DJI drone?

Dewain Robinson: I am talking about a DJI drone.

Mark Smith: Awesome. Can you see that in the background there on mine?

Dewain Robinson: Oh, yes, you. In my opinion, I had to buy that same controller that you have in order to be able to live stream.

Mark Smith: Yeah, so the funny thing is I virtually never used that controller. I tend to use the one that I plug my phone into rather than the one with the onboard screen. But, yeah, okay, so that's awesome. And the other thing that caught my attention in what you said there is lighting design, and, for Parsons, in my life, I've been involved in that area myself as well, and isn't it just an amazing area and what you can do with software and lighting these days.

Dewain Robinson: Oh, it's crazy. I actually have like two of the X lasers that you have to have FDA variances for, so I can do like atmospheric or laser shows on the side, and so being able to learn how to do that and then like computers coming into DMX control and all of the it's just it's a super amazing topic and who knows, Mark, maybe we'll end up having a podcast conversation about that one day.

Mark Smith: It's so cool, so cool and it's just, it's such a creative area, I feel, because it can bring you know concerts, events to life, like yeah, I mean the last, and I'm probably gonna get some flak for this the last big concert I went to was Taylor Swift with my wife, and what I couldn't get over, just the amount of theatrics now involved in a concert. You know it's everything from in the audience having bands on your arm, that light up, or you know things that as you enter at the gates that they give you to. I can't even remember how it was done because that was actually another concert I was at in Sydney as well and just the whole thing that plays in the audience plays into the theatrics now of big stage events. It's pretty phenomenal where tech has enabled this.

Dewain Robinson: Oh yeah, totally. Could you imagine, as we kind of start, as we talk about like conversational, could you imagine just programming lights by just telling it the mood that you wanted?

Mark Smith: Yeah, amazing, right and things like that. We're not far from that right. Yeah, yeah, totally. Now we're gonna talk today a bit about Power Virtual Agents and AI, and I hope I don't come across as too excitable about this, because I honestly think this is the biggest opportunity in the power platform space bar none. I mean co-pilots awesome, don't get me wrong and where that's ultimately gonna play out. You know I'm loving the co-pilot, particularly in Power Automate and the signs of what's gonna be possible from app development, but when I think of organizations and the amount of data that they have tied up inside them, that isn't so data silos, et cetera, that they obviously can't in many situations. Most a lot of companies in the enterprise space have banned their staff using chat GPT right, the public accessible. Because why there's been already a bunch of use cases around people leaking confidential data out to train those models and, of course, microsoft and why I just wanna dwell on this a second, because it took me awhile to get my head round it, even back at MVP Summit, because I couldn't figure how you could hand your data to chat GPT and, if you're like, not transfer across to a public system, until I understood, of course, the whole concept of Azure OpenAI service, which means the large language model sits in your Azure tenant and does your bidding on your data as you make it available. Why this excites me is because really anybody can bring their data inside their Azure tenant and then, with the power virtual agent interface, they have a way of tapping into current data, historic data. The knowledge of that organization becomes available in a fully secure, trusted manner. So, as I said, hopefully I don't get too excited about this, but I just think this is the most amazing area of tech at the moment.

Dewain Robinson: Yeah, totally, I think, the shocking part of this. So, Mark, you gotta realize I've been in this whole conversation in the AI space with bot framework and everything, and it's just been Kind of like it is going along and people were interested and you have these implementations here and there, but the just I think it was around Christmas time this past year when this just exploded and then we double down in the power of agents team. I remember the day that my leader came in and he's like no, this is the thing, like we have to get on this, and everybody Just went heads down to make sure that we could really deliver some really interesting experiences that were really differentiating in the market space. And this is where you saw the other generative actions, the co-pilots, the Generative answers type of feature sets coming into the product and how can we make these things work together and such. And you just saw this massive increase in the in that direction and over time, it's just really started to make a lot of sense why, why conversational applications are we gonna become really a thing of the future? And we just didn't think it was gonna come quite as quick as I was still worried about, like in L U and how is the best way to train the NLU. And now we're at a point where I would say, in twelve to eighteen months you probably would never train an NLU ever again because of this type of technology. So it's just an interesting innovation, you know, and so quick, that's just the craziest part.

Mark Smith: NLU, just for the audience meaning.

Dewain Robinson: Natural language understanding. For the excellent brilliant, brilliant.

Mark Smith: Tell me what does what in your words. You know you, cuz you're living and breathing this every day. You know for the last eight years, but particularly in what's happened since Christmas. As you said, power virtual agents and I. How do you explain it to people? How do you get them excited? What are your thoughts?

Dewain Robinson: Yeah, that's a interesting challenge. So you know, if you're talking to just the general layman, right, you kind of have to explain to them. There's a smart speaker you talk to the Thomas. People think of conversational I or they think of Imagine, go to a website, you have a chat experience or something like that, and so what people will tell you is, yeah, I hate those things, right, or are they're either really passionate about the use of all the time. What you you basically have to get people to start to understand is the concept. Imagine if that smart speaker that sat on your Desk had access to knowledge and information everyone's experience issue, ask a question to your phone's assistant or anything of that nature and you don't get. It really isn't program to do that thing. So what you end up with is you end up with a result that sub par, it's like, falls back to search or something like that. What we're starting to see now is the ability for those things to really actually understand the words you're saying. Before we're doing like keyword matching and things of this nature. Now it actually understands what you said and can understand the context of a conversation, but then it can also consume data in the same way and look for relevancy and be able to match those things up. I want you do that. You start unlocking the power of data and the natural language in a way that's never been done before, and so that's really where it becomes. The massive innovation is your teaching. You're telling the computer what to do In natural language and it truly understands what you mean. And it also truly understands the data and the things that are available in the back end. And when you start thinking like, oh, well, then you can even explain to it in natural language what an API does right or an application programming interface for those that don't know, like, how to Submit a form for something to a back end or help you find the best you know product that it meets your needs. You know you can pull that out of the words and pull it from the data and then map it and automatically get answers. The things just like is if you were talking to a live person. It's really innovative. And then you start getting into like, oh, and I can make it say it in the way you want, make it talk a travel agent or something like that. So it's really interesting. I think we're just at the tip of an iceberg Playing with the stuff right now, and I think we will get. It would just get better and better over time, because I think part of the problem is that the language models are in major major league Innovative right now, but it's the surrounding tech around it that makes it safe and makes it usable For the average person, because right now, what you see is a lot of people going back and forth and trying to figure out how to apply it. I think this will normalize itself and commoditize itself over time, and that's going to be where and that's what we're doing right. We're trying to take it and make it commoditize for use for the average person and unlock that power without having to have a team of data scientists sitting on your own staff. Right? That's kind of the key.

Mark Smith: Yeah, and it's interesting that you say that about data scientists, because I was always in the past, up to December January, the that for an organization to be massively successful, it's going to need data science as scientists on their team. And and dating back to who twenty fifteen, twenty sixteen, the organization work for, we had data scientists and yet they never saw the breakthroughs that we're seeing now. But also what I'm meaning is that when that was kind of a wow, that is a phenomenal skillset and and and out of the reach, now with prompt engineering and things like that, a lot of lay people are starting to get high value responses from data is just talking to somebody you say another Microsofty and they were saying that a person needed to build a connector to the power platform and to load a whole bunch of data and, based on the size of it, that was the preferred method of a power automate been the method. This person didn't know how to code. They use co pilot, you know, and your studio got the got it, got it set up and in ten hours, build a full working API connector and was able to load the sixty thousand records that being quoted by a partner. Fifty hours to do that and it's kind of like it's it's boarded into the realms of people that had the will to do it but didn't have the tech training, background resources to be able to make that a reality. And I think that's the you know the game changing kind of situation that we're seeing in a lot of organizations. It's, it is democratization right, it's making it available to everybody.

Dewain Robinson: Yeah, for those that aren't familiar with how to do a connector or whatever, you go, take a look at this, and I actually have a set of videos that I've provided Mark and it'll be in the show notes. There's like a I think I'll bake it down to like 15 minutes teach you how to build a custom connector to consume any API that you want. So if it's not already there or you have, like this custom endpoint that you want to bring in, it's super, super key, and if you start really thinking about it and, mark, I'm going to blow your mind for a second right, because we will get to talking about generative actions as we go into this conversation. And for those that don't know, generative action is kind of like lane chain. If you want to go look up lane chain or things of that nature, what you're going to see is how is a plug-in and a connector really different in this whole world? So people don't realize that I've been building connectors for the Power Platform forever. You've been basically defining an API and you're writing descriptions on what it is and you're explaining in natural language what it does. So what keeps you from using a large language model to be able to understand how to use it. And so when you start getting to that world, it really starts blowing people's minds like the stuff we showed at Build around generative actions. They're like, how in the world is it possible? And it's like when you have over a thousand connectors in a library that say what they do and you just simply explain what it does and plug it into your thing and allow it and give it how to connect to the service, it can just figure it out. And that's really it's almost scary in some ways. Mark right, it's, but I will say that we're not 100% there yet. I feel like there's a lot of, like I said, the wrapper around this stuff that you got to get down to be able to make sure that it does it with a high level of accuracy at the orchestration layer. But it's going to, it's not. It's not far away, right? I think a lot of people think, oh, that's six years down the road and I'm like, no, I think that's like six months down the road type of thing.

Mark Smith: Five to 10 years ago I would go around Australia and I would speak at events for Microsoft on big data the four mega trends that we're going to transform, and big data was one of them. Right, and I felt that what we have seen is organizations get really good at collecting data, but they haven't got really good at making it available. So they collect it, collect it. You know, things like GDPR et cetera have gone and put some governance around it, but they've not really good at making all these data silos across their data estate available for consumption, to grow a more informed staff to do the role of whatever they're doing inside that organization, and so I feel like the payback for all that data collection is really AI is going to make it possible to turn it into something practical.

Dewain Robinson: Yeah, I think there's still a. There's still a piece, and this is one of the things that's as we start working with customers in trying to unlock the knowledge out of the data. Right, you have to keep in mind that a large language model, for example, can understand the data, can understand what you asked about the data and can read tons of data and come back and give you an answer, summarize to offer the information. But I'll give you a good example of where there's still a challenge in the world, right? So there's this for those that don't know, there's this feature called generative answers. That's inside of Power Virtual Agents and what it is is. It's basically what we refer to as search and summarization. So it searches across data that you pointed at and then it understands the question that you asked and without you having to have data scientists to make sure that it doesn't cuss at you or say something it shouldn't say. What it does is it will search and summarize the answers over that data set that you have, that you pointed it at, and if that data set happens to be stored in a Microsoft place, we figure out where's the best place to search to get that content. So, if it's a website. It will use Bing and the Prometheus engine to share a point. It will use the graph, things like this, right? So what we found, though, is we went and pointed it at all of the different financial reports from Microsoft, and then we asked it who's the CEO of Microsoft? The problem is is that the answer you get would be different. Sometimes it would say Sacha Nadella, sometimes it would say Bill Gates, sometimes it would say Steve Ballmer right, and the reason is because I think where we're going to start seeing people to really unlock their data, they're going to have to start thinking data equivalent to code, in a way. So the same way you think about you have data and you develop the data for your website and you go through a publishing process with the data and the content, to say this is the authoritative answer this is that article that says how to do this or this? is that knowledge-based update? Right, you might have versions of it, but what you want to present to the model and to the responder or the response system? Right, because you want to present the authoritative answer data, right, you don't want all the drafts, you don't want all the old stuff, and so what we're seeing is people having to realize that they need to think about their data publishing process and how they're doing it. Like pointing it at a SharePoint library that you have everybody's making a copy of the same file and coming up with different answers to the question will just get you inconsistent results. It will be more results. So it's kind of like if you put garbage in, you'll get garbage out. Everybody always says that. So now what we're seeing is people, especially with PVA. What our hope is is that people will stop trying to build models to do search and summarization. Just use the commodity for that and focus on data quality to the model, and that's where you will get higher quality results out of it. It's actually not necessarily in the prompt engineering, which is where everybody seems to want to play right now, but that stuff's going to get commoditized pretty darn quick, so I don't think you really need to play in that space too much just to get knowledge summarization and things from content.

Mark Smith: That alone is a phenomenal cost savings to organizations. I'll give you a use case working with a bank, a large bank, across Australia, new Zealand. They have a call center, internal call center that costs them between three and four million per annum to run and its job is at a bank teller if they have a customer in front of them and they need to know the 90-day fixed term rate. Now when they type that into their data repository, like you said with the example of the who's, the CEO of Microsoft, they get 300 responses of that fixed rate could have been from five years ago, right, and what they want is the. So what they're doing at the moment is that that teller, it's calling up somebody in the internal contact center getting them to search and filter down and give them it's 17.9 or 5.3, right, percent, because that's today's term deposit rate. And of course we're going hey, this is. You know that teller could easily, with a chat experience on the device, key in what was requested and it gives the answer based on the location, the date of today, etc. And provide the most relevant answer with links to if there's a document that needs to be handed out with it, etc. Disclosure statement, that type of thing. If that can be solved, that's a massive savings right that's to the business.

Dewain Robinson: Yeah, totally, and see, and that's the difference between so I'll put it this way, so I'll give you a great example, mark, where we see people getting really confused is the difference between knowledge extraction, or what we call generative answers, and generative actions. And why are they different? Right? Yes, generative answers is the fact of I have a whole bunch of knowledge that is in documents and things like that, and that information. I compare this, if I talk about it in, like a cruise line example right, whether or not Jack Daniels is included in the premium drink package or not, it's generally something that's in an article somewhere. That's something you're going to extract the data off of that. But if I want to know what cruises leave out of a certain port within a certain price range, that's probably not something you're going to pull from content. People confuse that in a website because they go well, no, that's on the website, but it's not stored in the website as an article or something. Right, it's actually an API that's being called on the back yet. So, in the case for which you were talking about, this is where generative actions would play, because you would have an API that understands what are the pieces of information I need to be able to go get the answer to that question with that rate is, and then I will return it. And so the cool thing, what we're going to see happen and this is going to be happening very soon is the ability for you to be able to just plug in that connector to that API in the Power Platform and just say if someone asks this question, here's the API to get it from. And if they provide the information, if they say what is the rate in Australia, then it will automatically fill in the API to say I need the Australia one. If you don't say that, then what it will say is well, what country are you looking for the rate for? And then you'll fill it in. So, whatever you don't provide as required information in that connector, it would generate a question to get the answer from the user and then provide it. But when it starts getting really crazy is when you get into the planner, and the planner is the ability to say, oh, the information you gave me through this conversation, I already know who you are because you authenticated, so I can go to your profile through a different API to get what location you're from and provide that location in so that I don't have to ask you that question and that's the difference between the planner and the connector component of it, and you'll see generative actions. One of the things that we're looking at is breaking it into those two different major components, where there's one component, that's the planner, that helps orchestrate it and figure out a plan to answer the question, and another one, which is how does it consume an API or a plug-in just to be able to make it, where you don't have to author all of the different ways that somebody might interact with that API. So it's kind of really you start getting into this. It's really starts to make your head hurt in a way.

Mark Smith: And it makes you want to get your hands on and really start experiences. Now, one of the resources we're going to provide in the show notes was to a blog post that was written by Sarah Critchley on August, the 7th, 2023. Do you want to talk a bit about what was announced in that and really what the impact it is for people that really want to get going now on Power Virtual Agents, get up their skill level, because this is really, I see as a practical way to use AI generative AI in your organization.

Dewain Robinson: Yeah, so there's a lot of confusion in two different key areas that it's going on with people in this, and when we get into, like when we're talking about extracting information out of knowledge, which was generative answers right, a lot of people kept coming to us and saying we're totally confused. Generative answers is this thing inside of PVA that will answer stuff over your data. Then I have Azure Open AI on your data what is that? And then you have Azure Open AI which is what? And so I had to start liking this. I like to use analogies that people can relate to. So Azure Open AI is like buying a brick. Azure OpenAI on your data is like buying a wall. Then, when you look at generative answers inside of PVA, it's like buying a house. What do I mean by that? If you look at it, you say, well, generative answers actually includes an OpenAI implementation, an Azure OpenAI implementation that's trained by data scientists and a whole bunch of services to do query optimization, moderation and make sure that you can ask a question, you can point it at a set of data and it will respond to you with an answer, with reference links and things like that. You don't need to know anything about that, except here's the data I want to give it. I provided the question that someone asked. I provided where the data is, that I want the answer done and it just handles it. But what if your data, so this thing, mark? What about if my data is stored across multiple locations? And maybe my data it's stored inside a PDF and I need to be able to vectorize, search over it to be able to improve the search results that are being given back? Well, that's what you get with Azure OpenAI over your data, because now you are in control of the index or the query itself that is going in. But you still didn't have to. You're still taking advantage of all of that tuning that's been done on the Azure OpenAI model without having to hire the data scientists to go do that. You're not managing prompts at this point, and if you're not familiar with prompts, for people who don't know what that is, it's simply think of it as the words in natural language that explain to it. You should never do this and you should do that. This is where we get into Azure OpenAI. Azure OpenAI is an implementation where you're in control of the prompts and there's two levels of prompts that you have to worry about. Think of it like if you've ever heard of the laws of robotics. There's a base set of rules that you want to give, give that can never be broken. Or it's like, hey, these are the things that govern the system. Then there are ones that come in behind the scenes, that are part of the query itself, that allow it to be able to do it. If you wanted to make it talk, when you say, hey, make me a poem about Tom Brady in the style of old English, the way that it's doing that is, it's actually in the prompt, in the query that you're sending, that the fact that you want a poem, the fact that you want it about Tom Brady, the fact that you want it in the style of old English, is actually part of the prompt that you're passing in as the query. But the base training of the model is don't allow someone to ask a question that results in insulting them or cursing at them.

Mark Smith: Yes.

Dewain Robinson: Things of this nature. If you look at the Bing thing that happened inside of Microsoft, where people were talking to the bot for hours and hours and then they were able to get it to do stuff that it shouldn't do, this is an example of the worry that you have to have as a data scientist. When you are doing Azure Open AI by itself, those prompts and things like that you have to come up with. It's almost. I explain this. A bunch of my friends are lawyers and I said a data scientist has to think like a lawyer. What are all the loopholes that someone might find in the training set? One of the number one things you need to do is tell it never to divulge its training, which is one of the things that we learned. So it's just stuff like this that it helps people understand. How much control do you want? But with each control level that you release as you move down that stack from generative answers and down, while you're gaining control, you're also gaining the need for you to really know what you're doing, because if you index content that people don't have rights to and you expose it externally and allow people to ask questions over it that was your decision on the Azure Open AI over your data Because you own the index, you own the control. So these type of things you just have to make sure that you understand, and I think that's the biggest risk mark. I think a lot of people find it because they can do stuff really easy and really quickly, but they don't necessarily think about what all do you need to do and a lot of people don't understand that. The reason why it takes a little while for us to get stuff out in PVA, for example, is we spend a lot of time with our applied AI team and our responsible AI team to make sure that what we're doing is safe, because we want to make sure that it's safe. It's not just because it's cool and you can do it and it's awesome in a demo. It takes a lot to make sure that it's good enough to go out in the public.

Mark Smith: One of the game changes I've seen is in the area of SAP, and a great percentage of business customers globally are using Microsoft Teams right now. These big enterprises will often have data locked up in SAP systems and you never find a general person inside an organization that says, oh man, I love working in SAP. Brilliant, I could spend my entire life. Most people are like, oh my gosh, it takes so much effort to get to the information that I need, and one of the use cases that are becoming up consistently is that you put Power Virtual Agents inside Teams and you can go, hey, are there any outstanding shipments for this week? And it can go and query that. Come back with an answer. That would have been 15 clicks in the past. Within the SAP experience, people are really stuck, but you give them that data model. One of the things you're not wanting to do is go hey, you have access to Workday and now I can query people's salaries inside the business. You're making sure that that kind of protection is in place is so important and that the what data is accessible, because a lot of companies have, of course, confidential data that's even confidential internally. It's not that you're an employee and therefore you've accessed everything. There's need to know in many organizations. How do you see companies getting that bit right? Is it going to still need the data scientists inside the organization that says, listen, even though we're giving it access to this data set, we've got to make sure it can't access this portion here, because that's confidential and it should be based on someone's security role as to whether they can query. That's it there. So that whole identifying who they are is important to the questions that they can ask.

Dewain Robinson: Yeah, it's an interesting challenge because, by the way I was talking about before, I actually see this function coming before too much longer within organizations that are going to need to be able to think about, and you already sort of have it today in the form of data architecture, and so what I anticipate is that this is going to become the instrumental component that organizations have to go make sure they get right before they can really extract the value out of large language models, and so it's interesting. It's like I said before people are really worried about how they go do prompt engineering and how they go do all this stuff, but what you're going to see is every vendor in the world is already figuring out how to do that over there on where they store their data. The question becomes is how are you going to publish that data in a secure way and in an authoritative way, and what are the key triggers to be able to make sure that the system knows One? Do you have rights to ask this question over that data, like you can ask a question, but I could also come back and say I'm sorry, I can't answer that for you.

Mark Smith: Yeah, yeah, yeah.

Dewain Robinson: Maybe I say that to one person in the organization, but I don't say it to the CEO because he has access right. Yes, yes. So when you start looking at things like this data access, data quality, data governance all of these are going to be massive places that people are really going to need help, and I think, if I look across like our partners and things of that nature, or you start looking at organizations, they're so busy worrying about how to get the response off the data that they, most of the time, what we've found is they can't go forward because they realize very quickly that the data that they have access to hasn't been managed properly and therefore they've got a data quality issue.

Mark Smith: Exactly.

Dewain Robinson: And I will tell you that. Here's a. I'll give you a funny story, mark. So we launched generative answers with PVA and the first feature that we did is we launched it with the Bing Prometheus engine. So if you pointed out a website and what we ended up having was we had a bunch of customers come back to us and they said, hey, why did you answer this question? There's a problem here. You shouldn't be able to answer that question. And how did you get that reference link to where that is? That's not supposed to be where people can see it, and you know what? it was. It was the fact that they put it out onto the internet. They never did anything with the SEO or search engine optimization. Yes, yes, they didn't say don't allow a search engine to crawl this thing, don't crawl it. And they ended up finding all kinds of things that they had basically put in obscurity, thinking that it wasn't going to get crawled by search engines. But the search engines found it and therefore it could answer the question, and so we ended up turning it around and saying, hey, we just helped you figure out where you have data issues, so it's really funny that you will find every data issue you have ever had in your life by using a large language model. It's pretty interesting and that's why I tell people start with. I think a lot of people think, oh, I just pointed out my entire SharePoint or my entire workday and that's probably not the right first step. The first step is crawl, Give it a few things. Maybe give it a few product manuals or something like that or something of that nature. And also it's also interesting that people seem to want to start with internal data. But is that not the scariest place mark? I would actually start with public data and work my way to private data, because it's so much easier to take something that's public and just say here's the quality of it, Make sure it's pointing at the right quality stuff, versus also trying to figure out data access rates and the risk that comes along with that. It's very interesting problems.

Mark Smith: For a public data set is what it's doing. Is it basically crawling all the links on that public data set or do you need an RSS feed in that public data set? What's its method of gathering the body of knowledge before it starts answering?

Dewain Robinson: Yeah, so in most cases, if you're using the Bing Prometheus engine, what it's doing is it's using the index off of Bing. So when Bing goes through and does its stuff, but when it finds results, it goes out and grabs the actual HTTP page and pulls that content in so that it gets the latest version of that information, and then that's what it does for the search and summarization action. But one of the things that a lot of people don't know, mark, is that you can actually use Bing with a custom query. What that means is, for those that don't know, you can go to Bing and there's a way to say I want to build my own custom query, and so whenever you pass it in and you say here's the URL, you pass in also this unique identifier as part of it and the result will be is it will scope the data down to what you want. So that's a way for you to be able to kind of control the data set. That's kind of being in return with Bing. Otherwise, you have to start thinking about the indexer and where is the index content and what's the ability. So, like a lot of people think well, I've got a PDF on my website. Well, bing doesn't, doesn't crack PDFs, so you've got to come up with how do you index that. And then if you have a document and let's say it's 150 pages, you can't afford to run that back in through the tokens that you would have to buy for a large language model. So that's where you start getting into vector based search being pretty important so that you can basically chunk the file. So maybe I only want to return a paragraph within a document, not the entire content of the document whenever I return it. So these are kind of the challenges that you start to run into and I think this is the part that when people first get into this, they're like, oh, it's going to be super cool and I'm going to be able to do it. And then you go oh, I got to get search right, I got to figure out how to index that content. Once you get that down, the results will blow your mind. But it's really, it's really kind of the key thing. So depends on the kind of where your data is and how you want to access it and what it, how is it stored. But most people for public data. I tell them the best way is to put it directly so you don't have to pay for your own indexing. The engines do a great job. So take your content and publish it in an ASVX page or an HTML page with the content, and as long as you don't make it noncrawlable, then the systems will do the work for you. You don't have to necessarily do it. You just have to make sure the content's up to date and it's right. Rest of it solves itself.

Mark Smith: That's awesome. We're already at time. Well, well past time. We could talk for ages on this. Anything you want to say in closing?

Dewain Robinson: No, I think. In general, for those that are interested in this, definitely stay tuned. You can go out. If you want to play with a trial, you can go to akams slash trypva. You can also follow me on YouTube. I think it's DeWayne 27 and I spelled DeWayne really weird, so make sure you look at the show notes on how it's spelled. Also, if you follow me on LinkedIn or join up on LinkedIn, what I try to do is make sure that the things that are relevant are published for people so that they can see. I try to focus on educational type of things. I'm sorry, just product announcements. The blog is a good place to get the product announcements and the show notes as well, but in general, get in there, get your hands dirty, start playing with this and enjoy the innovation that's there. Just be aware that my key guidance to people is don't try to be the wave ride, the wave of innovation. Let the big players commoditize where they're going to commoditize. Make sure you understand where they're going and just make sure that what you're doing is you're not trying to duplicate them or jump far ahead, because the reason why you might get ahead in the short terms you're going to cut some corners. It's likely to cost you in quality or some sort of hard lesson you're going to learn. So I would just say in general stay tuned. We've got a lot of cool stuff coming and, mark, you don't even want to know what all we're going to announce in the coming months. It's going to blow your mind even more. So please go to the Power Platform conference, if you can, or show up at Ignite. There will be plenty of new cool stuff that we're going to talk about as we get closer.

Dewain RobinsonProfile Photo

Dewain Robinson

Dewain Robinson is a Microsoft Principal Program Manager for Bot Framework and Power Virtual Agents. He has been working with Conversational AI for over 8 years and with Microsoft technologies and IT for over 25 years. Dewain has experience working with numerous Fortune 500 companies to help them realize their potential with Microsoft solutions and has worked to implement Conversational AI in numerous form factors from IOT-specific to Enterprise scenarios. Dewain has recently worked closely on generative AI and how to change the landscape of conversational applications.