Duplex shows Google failing at ethical and creative AI design

May 10, 2018 at 01:28PM
via TechCrunch

Google CEO Sundar Pichai milked the woos from a clappy, home-turf developer crowd at its I/O conference in Mountain View this week with a demo of an in-the-works voice assistant feature that will enable the AI to make telephone calls on behalf of its human owner.

The so-called ‘Duplex’ feature of the Google Assistant was shown calling a hair salon to book a woman’s hair cut, and ringing a restaurant to try to book a table, only to be told it did not accept bookings for fewer than five people.

At which point the AI changed tack and asked about wait times, earning its owner and controller, Google, the reassuring intel that there wouldn’t be a long wait at the elected time. Job done.

The voice system deployed human-sounding vocal cues, such as ‘ums’ and ‘ahs’, to make the “conversational experience more comfortable”, as Google couches it in a blog post about its intentions for the tech.

The voices Google used for the AI in the demos were not synthesized robotic tones but distinctly human-sounding, in both the female and male flavors it showcased.

Indeed, the AI pantomime was apparently realistic enough to convince some of the genuine humans on the other end of the line that they were speaking to people.

At one point the bot’s ‘mm-hmm’ response even drew appreciative laughs from a techie audience that clearly felt in on the ‘joke’.

But while the home crowd cheered enthusiastically at how capable Google had seemingly made its prototype robot caller — with Pichai going on to sketch a grand vision of the AI saving people and businesses time — the episode is worryingly suggestive of a company that views ethics as an after-the-fact consideration.

One it does not allow to trouble the trajectory of its engineering ingenuity.

A consideration which only seems to get a look in years into the AI dev process, at the cusp of a real-world rollout — which Pichai said would be coming shortly.

Deception by design

“Google’s experiments do appear to have been designed to deceive,” agreed Dr Thomas King, a researcher at the Oxford Internet Institute’s Digital Ethics Lab, discussing the Duplex demo. “Because their main hypothesis was ‘can you distinguish this from a real person?’. In this case it’s unclear why their hypothesis was about deception and not the user experience… You don’t necessarily need to deceive someone to give them a better user experience by sounding naturally. And if they had instead tested the hypothesis ‘is this technology better than preceding versions or just as good as a human caller’ they would not have had to deceive people in the experiment.

“As for whether the technology itself is deceptive, I can’t really say what their intention is — but… even if they don’t intend it to deceive you can say they’ve been negligent in not making sure it doesn’t deceive… So I can’t say it’s definitely deceptive, but there should be some kind of mechanism there to let people know what it is they are speaking to.”

“I’m at a university and if you’re going to do something which involves deception you have to really demonstrate there’s a scientific value in doing this,” he added, agreeing that, as a general principle, humans should always be able to know that an AI they’re interacting with is not a person.

Because who, or what, you’re interacting with “shapes how we interact”, as he put it. “And if you start blurring the lines… then this can sow mistrust into all kinds of interactions, where we would become more suspicious as well as needlessly replacing people with meaningless agents.”

No such ethical conversations troubled the I/O stage, however.

Yet Pichai said Google had been working on the Duplex technology for “many years”, and went so far as to claim the AI can “understand the nuances of conversation” — albeit still evidently in very narrow scenarios, such as booking an appointment or reserving a table or asking a business for its opening hours on a specific date.

“It brings together all our investments over the years in natural language understanding, deep learning, text to speech,” he said.

What was yawningly absent from that list, and seemingly also lacking from the design of the tricksy Duplex experiment, was any sense that Google has a deep and nuanced appreciation of the ethical concerns at play around AI technologies that are powerful and capable enough to pass as human, thereby playing lots of real people in the process.

The Duplex demos were pre-recorded, rather than live phone calls, but Pichai described the calls as “real” — suggesting Google representatives had not in fact called the businesses ahead of time to warn them its robots might be calling in.

“We have many of these examples where the calls quite don’t go as expected but our assistant understands the context, the nuance… and handled the interaction gracefully,” he added after airing the restaurant unable-to-book example.

So Google appears to have trained Duplex to be robustly deceptive — i.e. to be able to reroute around derailed conversational expectations and still pass itself off as human — a feature Pichai lauded as ‘graceful’.

And even if the AI’s performance is patchier in the wild than Google’s demo suggested, passing as human is clearly the CEO’s goal for the tech.

While trickster AIs might bring to mind the iconic Turing Test — where chatbot developers compete to develop conversational software capable of convincing human judges it’s not artificial — it should not.

Because the application of the Duplex technology does not sit within the context of a high profile and well understood competition. Nor was there a set of rules that everyone was shown and agreed to beforehand (at least so far as we know — if there were any rules Google wasn’t publicizing them). Rather it seems to have unleashed the AI onto unsuspecting business staff who were just going about their day jobs. Can you see the ethical disconnect?

“The Turing Test has come to be a bellwether of testing whether your AI software is good or not, based on whether you can tell it apart from a human being,” is King’s suggestion on why Google might have chosen a similar trick as an experimental showcase for Duplex.

“It’s very easy to say look how great our software is, people cannot tell it apart from a real human being — and perhaps that’s a much stronger selling point than if you say 90% of users preferred this software to the previous software,” he posits. “Facebook does A/B testing but that’s probably less exciting — it’s not going to wow anyone to say well consumers prefer this slightly deeper shade of blue to a lighter shade of blue.”

King also makes the point that, had Duplex been deployed under Turing Test conditions, it would have been rather less likely to take in so many people, because, well, those slightly jarringly timed ums and ahs would soon have been spotted, uncanny valley style.

Ergo, Google’s PR flavored ‘AI test’ for Duplex is also rigged in its favor — to further supercharge a one-way promotional marketing message around artificial intelligence. So, in other words, say hello to yet another layer of fakery.

How could Google introduce Duplex in a way that would be ethical? King reckons it would need to state up front that it’s a robot and/or use an appropriately synthetic voice so it’s immediately clear to anyone picking up the phone the caller is not human.

“If you were to use a robotic voice there would also be less of a risk that all of your voices that you’re synthesizing only represent a small minority of the population speaking in ‘BBC English’ and so, perhaps in a sense, using a robotic voice would even be less biased as well,” he adds.

And of course, not being up front that Duplex is artificial embeds all sorts of other knock-on risks, as King explained.

“If it’s not obvious that it’s a robot voice there’s a risk that people come to expect that most of these phone calls are not genuine. Now experiments have shown that many people do interact with AI software that is conversational just as they would another person but at the same time there is also evidence showing that some people do the exact opposite — and they become a lot ruder. Sometimes even abusive towards conversational software. So if you’re constantly interacting with these bots you’re not going to be as polite, maybe, as you normally would, and that could potentially have effects for when you get a genuine caller that you do not know is real or not. Or even if you know they’re real perhaps the way you interact with people has changed a bit.”

Safe to say, as autonomous systems get more powerful and capable of performing tasks that we would normally expect a human to be doing, the ethical considerations around those systems multiply as fast as the potential applications. We’re really just getting started.

But if the world’s biggest and most powerful AI developers believe it’s totally fine to put ethics on the backburner then risks are going to spiral up and out and things could go very badly indeed.

We’ve seen, for example, how microtargeted advertising platforms have been hijacked at scale by would-be election fiddlers. But the overarching risk where AI and automation technologies are concerned is that humans become second class citizens vs the tools that are being claimed to be here to help us.

Pichai said the first — and still, as he put it, experimental — use of Duplex will be to supplement Google’s search services by filling in information about businesses’ opening times during periods when hours might inconveniently vary, such as public holidays.

Though for a company on a general mission to ‘organize the world’s information and make it universally accessible and useful’ what’s to stop Google from, down the line, deploying a vast phalanx of phone bots to ring and ask humans (and their associated businesses and institutions) for all sorts of expertise which the company can then liberally extract and inject into its multitude of connected services, monetizing the freebie human-augmented intel via our extra-engaged attention and the ads it serves alongside?

During the course of writing this article we reached out to Google’s press line several times to ask to discuss the ethics of Duplex with a relevant company spokesperson. But ironically — or perhaps fittingly enough — our hand-typed emails received only automated responses.

Pichai did emphasize that the technology is still in development, and said Google wants to “work hard to get this right, get the user experience and the expectation right for both businesses and users”.

But that’s still ethics as a tacked-on afterthought, not where it should be: locked in place as the keystone of AI system design.

And this at a time when platform-fueled AI problems, such as algorithmically fenced fake news, have snowballed into huge and ugly global scandals with very far reaching societal implications indeed — be it election interference or ethnic violence.

You really have to wonder what it would take to shake the ‘first break it, later fix it’ ethos of some of the tech industry’s major players…

Google Assistant making calls pretending to be human not only without disclosing that it's a bot, but adding "ummm" and "aaah" to deceive the human on the other end with the room cheering it… horrifying. Silicon Valley is ethically lost, rudderless and has not learned a thing.

— zeynep tufekci (@zeynep) May 9, 2018

Ethical guidance relating to what Google is doing here with the Duplex AI is actually pretty clear if you bother to read it — to the point where even politicians are agreed on foundational basics, such as that AI needs to operate on “principles of intelligibility and fairness”, to borrow phrasing from just one of several political reports that have been published on the topic in recent years.

In short, deception is not cool. Not in humans. And absolutely not in the AIs that are supposed to be helping us.

Transparency as AI standard

The IEEE technical professional association put out a first draft of a framework to guide ethically designed AI systems at the back end of 2016 — which included general principles such as the need to ensure AI respects human rights, operates transparently and that automated decisions are accountable.

In the same year the UK’s BSI standards body developed a specific standard, BS 8611, a guide to the ethical design and application of robots and robotic systems, which explicitly names identity deception (intentional or unintentional) as a societal risk, and warns that such an approach will eventually erode trust in the technology.

“Avoid deception due to the behaviour and/or appearance of the robot and ensure transparency of robotic nature,” the BSI’s standard advises.

It also warns against anthropomorphization due to the associated risk of misinterpretation. So Duplex’s ums and ahs don’t just suck because they’re fake but because they are misleading, and so deceptive, and therefore carry the knock-on risk of undermining people’s trust not only in your service but, more widely still, in other people generally.

“Avoid unnecessary anthropomorphization,” is the standard’s general guidance, with the further steer that the technique be reserved “only for well-defined, limited and socially-accepted purposes”. (Tricking workers into remotely conversing with robots probably wasn’t what they were thinking of.)

The standard also urges “clarification of intent to simulate human or not, or intended or expected behaviour”. So, yet again, don’t try and pass your bot off as human; you need to make it really clear it’s a robot.

For Duplex the transparency that Pichai said Google now intends to think about, at this late stage in the AI development process, would have been trivially easy to achieve: It could just have programmed the assistant to say up front: ‘Hi, I’m a robot calling on behalf of Google — are you happy to talk to me?’
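To make the point concrete, here is a minimal sketch of what a disclosure-first call opening could look like. It is purely illustrative: the names `DISCLOSURE`, `AFFIRMATIVE`, `CallResult` and `start_call` are hypothetical, the consent check is a naive keyword match, and nothing here reflects how Duplex is actually built. The opening line paraphrases the wording suggested above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical opening line, paraphrasing the disclosure suggested in the article.
DISCLOSURE = "Hi, I'm a robot calling on behalf of Google. Are you happy to talk to me?"

# Assumption: a tiny set of replies treated as consent. A real system would need
# far more robust language understanding than an exact keyword match.
AFFIRMATIVE = {"yes", "yeah", "sure", "ok", "okay", "go ahead"}


@dataclass
class CallResult:
    disclosed: bool
    consented: bool
    transcript: List[str] = field(default_factory=list)


def start_call(reply_to_disclosure: str, task_prompt: str) -> CallResult:
    """Disclose robotic identity first; continue with the task only on consent."""
    transcript = [f"BOT: {DISCLOSURE}", f"HUMAN: {reply_to_disclosure}"]
    normalized = reply_to_disclosure.strip().lower().rstrip(".!")
    consented = normalized in AFFIRMATIVE
    if consented:
        transcript.append(f"BOT: {task_prompt}")
    else:
        # No consent: end the call politely rather than carrying on regardless.
        transcript.append("BOT: No problem. Goodbye.")
    return CallResult(disclosed=True, consented=consented, transcript=transcript)


if __name__ == "__main__":
    result = start_call("Sure.", "I'd like to book a haircut for Tuesday.")
    print("\n".join(result.transcript))
```

The design choice that matters is the ordering: the disclosure is the first thing said on every call, and the booking request is only spoken once the human has agreed to continue.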

Instead, Google chose to prioritize a demo ‘wow’ factor, showing Duplex pulling the wool over busy and trusting humans’ eyes, and by doing so showed itself tone-deaf on the topic of ethical AI design.

Not a good look for Google. Nor indeed a good outlook for the rest of us who are subject to the algorithmic whims of tech giants as they flick the control switches on their society-sized platforms.

“As the development of AI systems grows and more research is carried out, it is important that ethical hazards associated with their use are highlighted and considered as part of the design,” Dan Palmer, head of manufacturing at BSI, told us. “BS 8611 was developed… alongside scientists, academics, ethicists, philosophers and users. It explains that any autonomous system or robot should be accountable, truthful and unprejudiced.

“The standard raises a number of potential ethical hazards that are relevant to the Google Duplex; one of these is the risk of AI machines becoming sexist or racist due to a biased data feed. This surfaced prominently when Twitter users influenced Microsoft’s AI chatbot, Tay, to spew out offensive messages.

“Another contentious subject is whether forming an emotional bond with a robot is desirable, especially if the voice assistant interacts with the elderly or children. Other guidelines on new hazards that should be considered include: robot deception, robot addiction and the potential for a learning system to exceed its remit.

“Ultimately, it must always be transparent who is responsible for the behavior of any voice assistant or robot, even if it behaves autonomously.”

Yet despite all the thoughtful ethical guidance and research that’s already been produced, and is out there for the reading, here we are again being shown the same tired tech industry playbook applauding engineering capabilities in a shiny bubble, stripped of human context and societal consideration, and dangled in front of an uncritical audience to see how loud they’ll cheer.

Leaving important questions — over the ethics of Google’s AI experiments and also, more broadly, over the mainstream vision of AI assistance it’s so keenly trying to sell us — to hang and hang.

Questions like how much genuine utility there might be for the sorts of AI applications it’s telling us we’ll all want to use, even as it prepares to push these apps on us, because it can — as a consequence of its great platform power and reach.

A core ‘uncanny valley-ish’ paradox may explain Google’s choice of deception for its Duplex demo: Humans don’t necessarily like speaking to machines. Indeed, oftentimes they prefer to speak to other humans. It’s just more meaningful to have your existence registered by a fellow pulse-carrier. So if an AI reveals itself to be a robot the human who picked up the phone might well just put it straight back down again.

“Going back to the deception, it’s fine if it’s replacing meaningless interactions but not if it’s intending to replace meaningful interactions,” King told us. “So if it’s clear that it’s synthetic and you can’t necessarily use it in a context where people really want a human to do that job. I think that’s the right approach to take.

“It matters not just that your hairdresser appears to be listening to you but that they are actually listening to you and that they are mirroring some of your emotions. And to replace that kind of work with something synthetic — I don’t think it makes much sense.

“But at the same time if you reveal it’s synthetic it’s not likely to replace that kind of work.”

So really Google’s Duplex sleight of hand may be trying to conceal the fact AIs won’t be able to replace as many human tasks as technologists like to think they will. Not unless lots of currently meaningful interactions are rendered meaningless. Which would be a massive human cost that societies would have to — at very least — debate long and hard.

Trying to prevent such a debate from taking place by pretending there’s nothing ethical to see here is, hopefully, not Google’s designed intention.

King also makes the point that the Duplex system is (at least for now) computationally costly. “Which means that Google cannot and should not just release this as software that anyone can run on their home computers.

“Which means they can also control how it is used, and in what contexts — and they can also guarantee it will only be used with certain safeguards built in. So I think the experiments are maybe not the best of signs but the real test will be how they release it — and will they build the safeguards that people demand into the software,” he adds.

As well as a lack of visible safeguards in the Duplex demo, there’s also — I would argue — a curious lack of imagination on display.

Had Google been bold enough to reveal its robot interlocutor it might have thought more about how it could have designed that experience to be both clearly not human but also fun or even funny. Think of how much life can be injected into animated cartoon characters, for example, which are very clearly not human yet are hugely popular because people find them entertaining and feel they come alive in their own way.

It really makes you wonder whether, at some foundational level, Google lacks trust in both what AI technology can do and in its own creative abilities to breathe new life into these emergent synthetic experiences.