Event Recording

Yann Lechelle - Personal Voice Assistants: How Personal, How Much Assistance?

Name: Yann Lechelle - Personal Voice Assistants: How Personal, How Much Assistance?
Uploaded: 2017-12-10T12:00:00+01:00
Duration: 31 min 8 s
Description: Presentation at the Consumer Identity World 2017 EU in Paris, France

Yann Lechelle

COO

Snips

Posted on Dec 10, 2017

Transcript

So Snips is a very, very interesting company. It happens to be French, but above all it's dealing with a very important topic, which is how you handle, you know, the whole area of new interfaces, particularly voice.

Thank you all, good afternoon. I think it's almost lunchtime. So before we dive into the topic of personal assistants, I'd like to walk you through the last 70 years of user interfaces. I promise this will be quick, so we'll do it in 70 seconds. So this is what computers used to be. Back then, you had to have a degree.

In fact, the computers were shipped with humans. It was very complicated. Then every 10 years you get an upgrade and the systems become more usable: here with punch cards, right? Still complicated. You still need a degree, but it's easier to use and interact with.

And then, you know, you could type directly into the machine, but with no feedback. You had to print it out. Still need a degree. These guys are the inventors of Unix. And then you have a terminal with direct feedback, right? So the interface becomes easier, more intuitive. Perhaps you don't need a degree, but you need to be seriously motivated to operate the machine. Some of you may remember this one: no degree required, but definite motivation. And then the graphical user interface. This is a new paradigm that brings a layer of abstraction on top of the operating system.

And this was done at Xerox PARC, the Palo Alto Research Center. This was the inspiration for the Mac, which was the inspiration for Windows.

So this level of abstraction allowed pretty much everyone, say any adult, to operate those machines for business purposes or for pleasure, addressing a much wider population. The next level of evolution is the personal digital assistant. Some of them were connected, some were not, and they converged exactly 10 years ago in the iPhone. This convergence allows nearly 2 billion people today to use modern computing with, you know, quite another ease. So what about the next three decades? When you see that history on this timeline, where does it bring us?

Well, typically computers will become more accessible. If you map it here, you can see that as you go from bottom to top, you get to higher levels of abstraction and you address a much wider population. Let's not talk about the last one, which is a, you know, far-fetched topic, but essentially voice interfaces and augmented reality are next. And they are bringing technology to the masses, where you don't need a degree.

Kids, any generation, can use them without instructions. I'll come back to that later. In the past, interfaces were artificial; in the future, they're natural.

In fact, you do need artificial intelligence to make machines adapt to us, and that actually addresses the digital divide. Here the divide is not only an economic factor: we're talking about accessibility, not access to technology by virtue of its being affordable. So let's talk about voice interfaces. That's the next bit. But slightly before we do that, let's look at the powers that generate the shift in the way we interact with machines and in how technology gets to us. The situation is not very stable, and you'll see why.

So first of all, we live in a world that is today dominated by platforms, platforms that are the gateways to the new technologies. We use them every day for work or personal life. And the world is architected around these three major constructs. The first one is the corporation. Its goal is to maximize quarterly profit, and it uses the tools, AI and data, to achieve that goal. Then you have the states; hopefully they actually maximize general welfare, hopefully. And then the citizens, pretty much left on their own. They don't have access to big data. They don't have access to personal AI.

So it's fairly imbalanced, right? The anchor is at the bottom. You have states and corporations that are driving a lot of data, big data and AI algorithms, to drive the overall ecosystems. That's fairly advanced.

In fact, if you look at the numbers, there is a club, a very restricted club, and these entities each have a billion users. And there are two more. These platforms do not have an operating system; they're sitting on top of the operating systems. Amazon is the exception, being in retail, and we can soon add others, right? These entities reach a billion users, and that gives them special powers.

That's a problem, because they have a tendency to be data black holes, and the more they do that and the more they reinforce their ecosystems, the less corporations outside of these platforms, and individuals, can achieve a balanced offering. So this is one problem. And the problem is made worse when there's a police state that forces the platforms to hand over the data, or essentially takes the data and uses it for state surveillance. Europe, which is kind of slow to adapt to these shifts, is responding with the GDPR.

The GDPR is going to restore some of the balance, some of the power, back to the people. So that's the first big framework that is essential to understand. But consumers also do have a choice. It's not true that they are alone. They have the ability to choose which services to promote and take advantage of. So the first thing, when you talk about voice assistants: everyone knows that Alexa is coming to France. It has not yet arrived because Amazon has not localized it to French.

Google was first in France, but Amazon's Echo is the reference in Silicon Valley, New York, London. They sold 35 million devices, almost one per home. It's amazing. People are concerned, because these are listening microphones that keep streaming what is being said to the cloud. So what happens to your voice clips? Voice is a biometric marker, just like the fingerprint. So is it okay for the voice to be processed in the cloud? People are concerned, and some people actually say, oh, new generations don't care.

They share by default. Well, it's not true.

The younger generations, if you ask them, are very concerned. What they usually care more about is FOMO, the fear of missing out. So they will choose a platform not because it protects them, but because they want to be there, because of social pressure. But given two identical platforms, they will choose the one that preserves their interests. Take this use case, where SMS and WhatsApp for the longest time were not encrypted. If you're using SMS, the operator of course knows what was sent in the message, but then the state can also listen in, right?

That has been the case for 20 years, and WhatsApp was no different. WhatsApp is owned by Facebook, and Facebook is a big data black hole, as I call it.

Now, this is really a case study worth noticing, because Telegram came out of Russia with end-to-end encryption, which means that no entity could actually see what's going on inside the messages. Perhaps this was motivated by hackers, or, you know, it was used by terrorists, I don't know, but typically by a tiny fraction of its 100 million users. Some people decided that this was a better, equivalent product to use, which in turn worried WhatsApp, because WhatsApp is looking for a dominant position. So what were they to do? Encrypt, right?

Facebook has no interest in building encryption into WhatsApp, because it prevents them from accessing the actual data content, which allows them to build better bots and so on. But consumer choice between equal services forced them to operate differently. The slide before was actually about resilience. When you have a cloud offering, the consumer is left with a service that can stop functioning, which happened probably six months ago when Amazon had an outage. All of your home's connected devices didn't operate.

So consumers will have a choice, because they're looking for features that are resilient, and all things being equal, they'll prefer privacy. They'll prefer resilience.

So the cloud-by-default platforms are not necessarily the better choice, but they are the starting point. The other interesting factor from a consumer standpoint is that in the past century, technology was mostly available in corporations. The big IT budgets were there, and you actually discovered computing at work and so on and so forth. But today it's the contrary: we have better technology at home than we have at work. As a result, consumers force the workplace to adapt and provide different technologies and platforms.

Perhaps you remember that five years ago it was called bring your own device, where IT departments just gave up and said, okay, bring your own iPhone if you want to. Right?

So this is a new era in which we live. And we have a choice as individuals, as parents, as professionals. It's part of the equation.

So essentially what I'm saying here is that consumers have a way to rebalance the overall architecture. Finally, and this is probably the most intriguing one, there is the notion of exponentials. You perhaps remember this quote: software is eating the world. The more recent quote is: AI is going to eat software. So you can quote me on this, but I'm sure others have said it: AI is going to eat the world.

Now, why is that? AI is a big word that people, you know, mix up, or perhaps associate with Hollywood fantasies: AI is going to kill mankind. Maybe not. AI is the family of algorithms within which you have machine learning, within which you have deep learning. Deep learning is the one that captures most of the attention today, because it's something that is cryptic.

People don't necessarily know, and even the experts don't understand that well, what's going on inside a neural net. The most intriguing part about AI is that its exponential strength is unmatched and not necessarily under control. You remember this event that took place a year or two ago.

Great, amazing, but it took human beings at DeepMind, now owned by Google, years of man-machine interaction to build an algorithm using deep learning to beat the world champion. That was amazing. But what's most amazing is that the best experts in AI, the people who actually know how deep learning works, didn't expect this to happen for another 10 years. So the best-equipped human beings were wrong in their predictions by 10 years. And this is nothing compared to what happened two weeks ago.

DeepMind wrote a new piece of software that learned how to beat the champion software from two years ago in just three days, with no human interaction whatsoever: 100 games to zero. This is exponential. In other words, wow, right? Human beings could not predict the first event; forget about the second one. It's mind-blowing. Go is the most complex game that exists in terms of predicting and establishing winning strategies.

So let's look at an exponential, and this one is Moore's law. Some people say Moore's law is ending, because there's only so much you can pack into a microprocessor, but it's not just Moore: it's Moore plus big data. When you combine all these things, it keeps going in this direction. Let's call it high tech. Now the problem, and this is precisely what happened with the predictions on Go, is that it's high tech versus us. I've been coding for the past 35-plus years, I still code at times, and I eat big data for breakfast, and I feel the crunch.
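As a rough worked example of what an exponential like Moore's law implies, assume the commonly cited doubling of transistor counts about every two years (an approximation, not a figure from the talk). Over a single decade that compounds to a 32-fold increase:

```latex
N(t) = N_0 \cdot 2^{t/T}, \qquad T \approx 2\ \text{years}
\quad\Longrightarrow\quad
\frac{N(10\ \text{years})}{N_0} = 2^{10/2} = 32
```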

I feel that something is happening and it's rapidly spinning, not out of control, but in a way that makes it hard to establish a business strategy, or even to predict how you should build your platform day to day. So I invite all of you, and this is a must, to go deep. AI is a toolbox. You need to understand what it can do for you, for your career, for your products, for your children, right? This is essential.

You need to understand the elements of the toolbox, whether you should implement them, whether you should provide escape strategies for these things. At Snips, we use a number of these techniques in our products: you've heard mentions of big data, machine learning, also recommendation engines. These things are not necessarily all in the same type of product. So AI is definitely something I can talk about when explaining to you what Snips does. So the next decade is voice. If you look at the ecosystem, you also need to look at this. This is the next 10 years.

The past 10 years have been the emergence, first the convergence and then the existence, of what we know as the smartphone and the app economy. It's massive; everyone participates. And there are just two players: there is a duopoly in terms of platforms, iOS and Android. That was the last 10 years. This new decade will go even further, much faster. As I said, Amazon Echo is responsible for the blue bar, most of it, which means one device per home. Think of it. The smartphone is for mobility, while the smart speaker is for immobility, the common space.

The space in your home could potentially be voice-enabled. And given that this is currently the competitive field, again, the usual suspects: Amazon, Google Home, HomePod coming soon to an Apple Store near you. But then there are others: this is in South Korea, also South Korea; they've recreated an entire ecosystem around that. This is in Japan, with kids' companions.

So voice is entering the home at an unprecedented pace. Whether you want it or not, you will be using a voice assistant in the next few years, pretty much everywhere, because Google and Apple and Amazon will make sure of it: they'll spend hundreds of millions of dollars in marketing to educate us, or if not us, our kids. And of course the older generations, who would prefer not to twiddle with screens and buttons. If you can ask the machine about the weather or a ticket and so on, you will; it's much simpler. Why is that?

I would expect that, if you look at the competitive field, essentially every connected device will potentially have voice control built in: appliances, cars, and so on. For typical use cases, it's pretty easy to imagine intuitively that every speaker should be able to respond to voice for music or audio content, and TV screens and projectors should respond to these sorts of queries, right? Whether they're simple, like the volume, which is a function that's part of the device itself, or something like "bring up the German subtitles and mute because I have a phone call incoming", which is pretty complex.

These machines should be able to do it, and they can. The car is a perfect use case, right? Until we have self-driving cars, it'd be great to keep your hands on the wheel and just talk to your car to select the radio station, bring the windows down or, you know, handle the AC. Then of course toys and robots: I was in Japan two weeks ago, and robotics is actually the best case for a brand having its own voice assistant. So the billion-dollar question, in light of all of this, is: how do you do a personal assistant? Or is there just one personal assistant?

Is it going to be Siri? Is it going to be Amazon? Are we all, as service providers and device manufacturers, going to be satellites of these dominating platforms? The answer is no. There's a way, and it's maybe counterintuitive at this stage, to preserve the margin, the brand, and also the data. And data is key for us. So I'd like to briefly introduce Snips. Snips is a company that focuses on building voice assistant technologies for B2B, and I'm the COO, part of a team of PhDs in machine learning, informatics and applied mathematics. We're 41 people.

We're actually just around the corner: our offices are literally a three-minute walk from here. R&D is based in Paris, we have a small office in New York, and, you know, we claim French tech, because it turns out that in France we have very good mathematicians, and France is a good place to do AI. Our goal is to make technology disappear. This is our long-term vision: by using AI, put an AI assistant in every device. In other words, devices should be able to respond to us, much like all technology. The door handle is a technology.

You look at the door handle, and you can operate it in the most natural way. If you can bring a voice assistant into a lamp or into a new projector, you should be able to interact with it directly. It turns out that voice, and this goes for transactions as well, requires half the brain power to operate.

In fact, if you're looking at your phone screen while you walk down the street, you're likely to have an accident, because you can't do two things at the same time while working with a visual interface. So voice is much more efficient for many purposes. Not all purposes, of course, because we live with multimodal interactions. This is very practical. We've rebuilt an entire chain to process the voice on device, right? It goes from the wake word, which makes sure that the technology then listens to your instructions, to ASR, automatic speech recognition.

And then NLU, natural language understanding. And then the dialogue piece, where we're going to be very humble: in 2017 and '18, machines are not ready to talk to humans, and humans are not ready to have conversations with machines. That's the chatbot fallacy. That's my interpretation of the ecosystem. Going back to the triangle, chatbots only help reinforce Facebook and Tencent, because these guys don't have an operating system. They don't have the right to have apps within their apps.

So they're forcing all developers to build chatbots, even though they're not ready for all those conversations, and these channels are reinforcing those ecosystems, because they're going to do banking, they're going to do travel booking.

So what we've rebuilt is essentially the bare-bones value chain to process voice on device. And we have a process to build actual models on a per-use-case basis. This works on device, which is also known as edge computing. That's very interesting, because you can see that over the past 70 years there's been a swing back and forth between servers and clients. The smartphone is a very powerful computer, as is the personal computer.
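The on-device chain described in the talk (wake word, then automatic speech recognition, then natural language understanding, then a deliberately minimal dialogue step) can be sketched as a sequence of local function calls. This is a hypothetical illustration, not Snips code: real wake-word detection and ASR work on audio, so both are stand-ins operating on already-transcribed text, and every name here is made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional

WAKE_WORD = "hey snips"  # hypothetical wake phrase


@dataclass
class Intent:
    name: str
    slots: Dict[str, str]


def detect_wake_word(utterance: str) -> Optional[str]:
    """Stand-in for a wake-word detector: return the command that follows
    the wake phrase, or None if the phrase was not heard."""
    text = utterance.strip().lower()
    if text.startswith(WAKE_WORD):
        return text[len(WAKE_WORD):].strip(" ,")
    return None


def transcribe(command_audio: str) -> str:
    """Stand-in for on-device ASR; a real system would decode audio here."""
    return command_audio


def understand(text: str) -> Optional[Intent]:
    """Toy NLU with a single hypothetical intent."""
    if "volume" in text:
        return Intent("set_volume", {"direction": "up" if "up" in text else "down"})
    return None


def respond(intent: Optional[Intent]) -> str:
    """Minimal dialogue step: confirm the action or admit failure."""
    if intent is None:
        return "Sorry, I didn't understand."
    return f"{intent.name}: {intent.slots}"


def run_pipeline(utterance: str) -> Optional[str]:
    command = detect_wake_word(utterance)
    if command is None:
        return None  # not addressed to the device: nothing else runs
    return respond(understand(transcribe(command)))


print(run_pipeline("Hey Snips, volume up"))  # set_volume: {'direction': 'up'}
```

Each stage hands its result to the next within the same process: if the wake word is not heard, nothing further runs and nothing leaves the device.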

So we've swung between servers and big clients in many ways. We are back at the point where devices pack sufficient power that we can bring a lot of intelligence into the devices themselves. That's what we've done. And by doing that, we are GDPR-compliant, because we process the voice where it belongs, which is next to the people speaking. If the voice is dealt with on a server, that creates a compliance issue. As for benchmarks, we've published one, which is admittedly unfair, because we're comparing ourselves with Google, Facebook, Amazon, and Microsoft.

These guys have a general-purpose stack to operate with voice. Their performance is good, not great. Our performance is great, but not generally speaking: we're great because we verticalize the use cases. So on a vertical basis we have better performance, because that's what we're trying to do. We're trying to make sure that this video projector understands my will, my voice commands about the video projector. I don't want to ask the video projector what the age of Obama is; that's general purpose. And it runs on a tiny, cheap computer, you know, like the Raspberry Pi.
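To make the verticalization argument concrete, here is a sketch of a narrow-domain parser for a video projector: it recognizes a handful of hypothetical intents and deliberately returns nothing for general-purpose questions. The talk describes models trained per use case; hand-written regular expressions are swapped in here purely for illustration, and the intent names are made up.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical intents for a video projector: the whole "domain" fits in a table.
PATTERNS = {
    "power_on": re.compile(r"\b(?:turn|switch) on\b"),
    "power_off": re.compile(r"\b(?:turn|switch) off\b"),
    "set_volume": re.compile(r"\bvolume (?P<direction>up|down)\b"),
    "subtitles": re.compile(r"\b(?P<language>english|french|german) subtitles\b"),
}


def parse(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for an in-domain command, or None otherwise."""
    text = utterance.lower()
    for intent, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None  # out of domain: the projector does not try to answer
```

With this table, `parse("Bring up the German subtitles")` yields `("subtitles", {"language": "german"})`, while `parse("What's the age of Obama?")` yields `None`: the smaller the domain, the easier it is to be accurate within it.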

So this is five to ten dollars' worth of components, and it's fast. It works on device; there's no latency going back and forth to the web, so it's very snappy. And we've built a self-service system: people can quickly build an assistant using natural language. This is a new way to program machines, by the way: natural language.

It's a new interface between programmers and their machines, and there's a marketplace where you can use prebuilt bundles. We just launched it, and we have 5,000 developers and 12,000 assistants available in multiple languages. This is our first product, a service that has been available since the summer. And this is a robot, the Keecker, from a French company as well. The Keecker is essentially a device with a video projector, on wheels. When you talk to it, you can say, "Hey Keecker, project a video on the wall."

You can say, "Follow me to the bedroom and project on the ceiling," because the projector goes both ways. They integrated our technology so that you can say "Hey Snips", sorry, "Hey Keecker, follow me." In the past they were using Google speech, which required saying "OK Google", which was not in the best interest of their brand. It also operated through the cloud, which meant that if the Keecker was in a corner of the room with no Wi-Fi access, it wouldn't fully work. So you had to carry it, which defeated the purpose.

So, in a nutshell, it's possible to have a different approach. Of course it requires a bit more CPU on the device, but it's possible to provide voice assistance, or voice interactions with devices and services, with a different model.

So yes: voice everywhere, expected pretty much everywhere near you in the coming future. And this is it. So we have time for questions.

Great. Thank you very much. We have a question here. It's going to be lunchtime, so it's going to be quick, but important.

Again, I'm going to make it quick: two questions in one. One: you mentioned connected devices, maybe 10 slides before the end. My understanding from the last few slides of what Snips does is that you don't necessarily need to be online.

So the question is: does it work entirely autonomously, without an internet connection? And two, if so or if not, how do you secure, because this is something we haven't discussed much this morning, all the voice data that you're collecting? Thank you.

Thanks. So I think we have to distinguish. When we talk about AI, we really need to slice the problem. Voice understanding requires AI. Voice understanding is a skill that a kid acquires, or masters, around age 10.

Once you've acquired it, you don't necessarily need to feed it with more data. That's what we've done by specializing our voice assistants to one specific use case. Let's say it's a coffee machine maker, and they want to build a barista inside the machine. Once we've built the barista, it works indefinitely in the same way. So if you buy the coffee machine, put it on your shelf and don't connect it to the Wi-Fi, it works for 10 years as it should. There's no data collection unless the coffee machine maker wants to collect data; not Snips, since we don't have servers.

We just provide the software, with the data models and the language models for the use case. Once it's integrated, if the coffee machine is connected, the maker can decide to collect data, in case people suddenly say "I want a chocolate milk" and the coffee machine doesn't do that, and they want to report that consumers would also like to have chocolate. Right. So data collection is beside the point in terms of voice understanding. Now, the cognitive part is a different matter.

If you say to your coffee machine, "I would like the same coffee as yesterday," then you need to build business logic, or use-case logic, such as short-term memory and voice identification, which is a module we're providing. I don't want my wife's coffee; I want my coffee, the one I ordered yesterday. That's handled on device as well, right? So there's no transaction in the air for the voice. Does that answer the question?
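The "same coffee as yesterday" example combines two pieces: a speaker identity and a short-term memory of that speaker's last order. A minimal sketch, assuming the speaker ID is supplied by a voice identification module (not shown); the class, methods and speaker identifiers are hypothetical, and everything stays in local memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OrderMemory:
    """Hypothetical on-device short-term memory: last drink per speaker."""
    last_order: Dict[str, str] = field(default_factory=dict)

    def order(self, speaker_id: str, drink: str) -> str:
        # speaker_id is assumed to come from a voice identification module.
        self.last_order[speaker_id] = drink
        return drink

    def same_as_last_time(self, speaker_id: str) -> Optional[str]:
        # None means this speaker has no remembered order yet.
        return self.last_order.get(speaker_id)


memory = OrderMemory()
memory.order("speaker_a", "espresso")
memory.order("speaker_b", "cappuccino")
print(memory.same_as_last_time("speaker_a"))  # espresso
```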

Well, just as a follow-up: isn't a core part of AI and machine learning systems that all the new data trains the model so that it performs better tomorrow than it does today? So if you're not hosting a server, if you're not collecting data back, how do you make the voice capabilities in the robot better next week than they are today?

Okay. So the answer is, there is no absolute answer. It's not black or white; it's always a compromise. But there's a huge fallacy when it comes to big data and AI: it's not a linear gain. Adding more data does not improve your models linearly, right? It's the law of diminishing returns. When I talk about language, we've created models that work reliably for 95% of the population, regardless of the speaker; that's for the language model, right? And we've managed to do it because we know the use case.
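The diminishing-returns point can be illustrated numerically. A minimal sketch, assuming an illustrative power-law learning curve error(n) = a·n^(-b) with made-up constants a = 1 and b = 0.5 (not measurements from Snips or anyone else): each doubling of the data costs twice as many examples as the previous one but removes less error.

```python
from typing import List


def error_rate(n_examples: int, a: float = 1.0, b: float = 0.5) -> float:
    """Illustrative power-law learning curve; a and b are made-up constants."""
    return a * n_examples ** (-b)


sizes: List[int] = [1_000 * 2 ** k for k in range(6)]  # 1k, 2k, ..., 32k examples
gains: List[float] = [
    error_rate(small) - error_rate(large) for small, large in zip(sizes, sizes[1:])
]

for small, large, gain in zip(sizes, sizes[1:], gains):
    print(f"{small:>6} -> {large:>6} examples: error drops by {gain:.4f}")
```

This only models the first half of the argument; the second half, that more data can be repetitive or biased and therefore worse, is not captured by a smooth curve.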

But if you talk about a big data model, it's the same thing: amassing much more data doesn't mean you're going to be better. In fact, it might be worse, because you keep getting the same data. You don't get outliers, you don't get anything new, right?

Or it's self-reinforcing, or it's biased. In other words, big data for the sake of data is not of value in itself. Right now, it's a compromise. If the coffee machine maker wants to collect the voice clips and stream them to their data centers, perhaps if the user has opted in, which is entirely optional, right, maybe they get access to a much broader set of queries, which they can use to retrain the model and then do an over-the-air update. Right. That's entirely possible.

Our approach is to say: okay, we're going to use the possibilities of edge computing to rebalance the data collection a little bit, which means data collection is entirely optional, not the default; hence the privacy-by-design approach. Right? So let's distinguish: what is AI? What is big data? What do you use it for? Today the default approach is: I'll collect everything, put it on the server, and if the regulator slaps my hand, I'll do something about it. That's the default approach today, and it's possible to do it differently.
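Privacy by design, as described here, means understanding always happens on device and uploading is opt-in rather than opt-out. A minimal sketch of that default; `DeviceSettings`, `UploadQueue` and `share_voice_clips` are hypothetical names, not part of any real SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeviceSettings:
    share_voice_clips: bool = False  # privacy by default: sharing is opt-in


@dataclass
class UploadQueue:
    settings: DeviceSettings
    pending: List[bytes] = field(default_factory=list)

    def handle_clip(self, clip: bytes) -> None:
        # Understanding has already happened on device; the clip is queued for
        # the maker's servers only if the user explicitly opted in.
        if self.settings.share_voice_clips:
            self.pending.append(clip)


default_device = UploadQueue(DeviceSettings())
default_device.handle_clip(b"raw audio")
print(default_device.pending)  # []
```

The design choice is simply which value the flag starts with: a device that nobody configures collects nothing.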

It sounds to me like what you're talking about is thinking about what data should be used for what purpose, and optimizing for the consumer experience at the moment. My question is actually about the more nuts-and-bolts authentication protocols. How are you thinking about voice in this context? Are you thinking about voice prints? Are you thinking about NLP for voice-based identification based on speech patterns? What are you thinking?

So we are very much focused first on trying to tackle consumer use cases. In that sense, the module we are working on is voice discrimination, not voice authentication. The coffee machine example is "I want the same coffee as yesterday": it has to know that it's me, not my wife or my kids. Or you want to lock out the kids, so that you lock the system, maybe for wine or something. That's pretty easy to do. Identity-grade, secure authentication is a different type of technology, a whole set of technologies, because your voice changes between the morning and the evening and all that. So we're attacking that today.
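The distinction between voice discrimination and voice authentication can be sketched with speaker embeddings. Discrimination is a closed-set choice: it always picks the most similar enrolled household member. Verification is an open-set check that can reject a voice outright. The three-dimensional vectors below are made up and stand in for the output of a speaker-embedding model, and the 0.95 threshold is arbitrary; this is not Snips's module, and real authentication needs far more robustness (the voice drifts between morning and evening, as noted above).

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def discriminate(sample: Vector, household: Dict[str, Vector]) -> str:
    """Closed-set: always returns the most similar enrolled member."""
    return max(household, key=lambda name: cosine(sample, household[name]))


def verify(sample: Vector, claimed: Vector, threshold: float = 0.95) -> bool:
    """Open-set: accepts only if similarity clears a (made-up) threshold."""
    return cosine(sample, claimed) >= threshold


# Made-up embeddings standing in for a speaker-embedding model's output.
household = {"parent": [0.9, 0.1, 0.0], "child": [0.1, 0.9, 0.2]}
stranger = [0.6, 0.5, 0.6]
```

With these toy vectors, `discriminate(stranger, household)` still returns `"child"` (cosine about 0.69), whereas `verify(stranger, household["child"])` returns `False`. That gap is why discrimination is enough to pick the right coffee but not to unlock anything sensitive.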
