ISRF Flexi-Grant recipient Bridget Vincent delves into the arrival of AI-narrated audiobooks, shedding light on the profound ethical dilemmas they bring to the forefront.
Meet Madison and Jackson, the AI narrators or “digital voices” soon to be reading some of the audiobooks on Apple Books. They sound nothing like Siri or Alexa or the voice telling you about the unexpected item in the bagging area of your supermarket checkout. They sound warm, natural, animated. They sound real.
With their advanced levels of realism, Apple’s new AI voices present the genuine possibility that the listener will be unaware of their artificiality. Even the phrase used in Apple’s catalogues of digitally-narrated audiobooks – “this is an Apple Books audiobook narrated by a digital voice based on a human narrator” – is ambiguous. It’s not clear from this phrase who or what is doing the narrating.
This ambiguity means that it would be possible for you to download an audiobook voiced by Jackson, start listening and think (if you think about it at all) that the voice you hear is that of a voice actor. But does this matter?
If the listener is wholly unaware that the narrator is digital, this raises some of the many ethical questions (such as that of consent) that arise whenever users are unaware that they are interacting with an AI-driven technology, rather than with a person.
The more complicated – and more interesting – problem, however, arises when we are both aware and unaware of the narrator’s artificiality. When you listen to an AI narrator, you may know that you are interacting with an artificially intelligent entity. But, as so many of us already do with chatbots, many listeners will partially suspend this awareness and project ideas of personhood onto the digital voice, somewhat as we do for these books’ fictional characters.
Most worryingly, Apple’s marketing language is engaging in its own form of pretence in presenting the “digital voice” technology as harmless. The Apple Books for Authors audiobook information page emphasises the technology’s potential for democratising audiobook creation and plays down the impact on human actors. Indeed, the website explicitly positions the technology as being on the side of the little guy – Apple claims to be “empowering indie authors and small publishers”.
This pretence ultimately operates by capitalising on the multiple meanings of the word “heard”. Apple claims that “only a fraction of books are converted to audio – leaving millions inaccessible to readers who prefer audiobooks, whether by choice or necessity”. Apple’s statement that “Every book deserves to be heard” is an especially canny choice given its built-in associations with democratic representation and inclusivity.
It’s certainly the case that using digital narration means that authors don’t shoulder the financial costs or time burden of narrating the books themselves. And, indeed, this means more people can produce audiobooks.
But in potentially eroding the livelihood of another kind of small operator (the voice artist), the new digital narration technology doesn’t so much stand up for the little guy as set the interests of two different little guys against each other.
In a further twist, the datasets used to train Apple’s digital voices have, in some cases, been reported to include the work of existing voice artists, drawing their considerable indignation.
In presenting itself as disrupting “big audiobook” and favouring small players, Apple’s marketing follows a recognisable trope. This involves a technological “disruptor” touting the ability of individual operators to participate in previously closed-off areas of commercial activity without passing on the corporate profits made through such “inclusivity”.
What is perhaps unsettling about this new technology, then, is not the unfamiliarity of its powers but the familiar ring of “platform capitalism” – when big companies provide the technology for others to operate.
The frequently-sued Uber and the frequently-banned Airbnb have by now lost much of their sheen as engines of accessibility. Their initial identity, however, was grounded in the use of democratic rhetoric, from Uber telling potential drivers “you’re in charge”, to Airbnb’s claim to be founded in “connection and belonging”.
So the use of pseudo-altruistic language by tech disruptors is nothing new. What is new is the window onto this seductive fiction offered by the encounter with AI narrators. After all, the self-deception involved in assuming that your narrator is human parallels, in many ways, the self-deception required to believe that Apple’s digital voice technology is an altruistic development.
Reflecting on the connection between these acts of imagination is necessary because, so often, it’s easier just to believe. It’s easier just to believe that your Uber driver is there for the flexibility, that your Airbnb host is just a neighbourhood guy rather than a property conglomerate that owns half the street.
It’s easier to believe, but it’s not always easy to identify and understand the dynamics of this belief. The experience of listening to an artificially-intelligent narrator might help us catch our own brains in the act of self-deception – including the act of buying AI-narrated audiobooks because a marketing website tells us it’s the democratic thing to do.
Australian National University
Bridget Vincent earned her PhD at Cambridge University as a General Sir John Monash Scholar, followed by a McKenzie Postdoctoral Research Fellowship at the University of Melbourne. She also served as a Postdoctoral Affiliate at Clare Hall, Cambridge, with funding from an Endeavour Research Fellowship. Before joining ANU, she taught modern and contemporary literature at Cambridge and Nottingham Universities. She has also published literary journalism and op-eds in prominent publications like The Guardian, The Times Higher Education, The Age, Cordite, and The Australian Book Review.