Something has been nagging at me recently. I read a lot of tech news, and it seems automated translation is about to get a whole lot better, and a whole lot more mobile. Meanwhile there is the burgeoning prospect of augmenting ourselves with technology to enhance our squishy human brains – the so-called ‘singularity’ of machine-human interaction. What’s nagging me is that these developments will surely have a massive effect on how people speaking different languages interact. That in turn will drive an existential shift in our approach to ‘language rights’.
But so far nobody has really said anything about all this, perhaps because the technology seems so distantly futuristic. It really isn’t though. It’s basically here already. There is some way left to go, and right now the market is still more hype than reality. So I’ll start by reviewing where we actually are now, then I’ll join the dots from there to everyone being seamlessly equipped with reliable, universal live translation. Then I’ll think through what that might mean for the field of language rights.
First is the question of reliability. Computer translation has long been a bit of a clichéd joke – accidentally translating your holiday request for a medium spiced latte into an insult about the barista’s mother. Computers can do the basics, but only humans get all the nuances right. Right? Well, that’s changing.
Machine translation is not yet as good as human translation, especially between very dissimilar languages. A recent university study of Korean-English translation, pitting various translation programs against a fleshy rival, came out decisively in favour of us air-breathers. Even so, the programs averaged around one-third accuracy, which is pretty good. Meanwhile another recent controlled test, comparing the accuracy of automated translation tools, concludes that “new technologies of neural and adaptive translation are not just hype, but provide substantial improvements in machine translation quality”.
A recent article in The Economist charts incremental improvements in accuracy over recent decades. Machine translation is now improving more quickly than ever, fuelled by advances in artificial intelligence, neural networks, and machine learning – essentially computers learning on their own, not waiting for frail humans to gradually program them during waking hours and between meals. Computers can now independently chew over vast databases of natural language, compare common patterns, and refine their own algorithms – see for example this pre-review academic paper outlining the Google Neural Machine Translation (GNMT) system. The more data that goes in, the more accurate the output becomes. An update to Google Translate in November 2016 improved the system “more in a single leap than we’ve seen in the last ten years combined”.
So, highly reliable real-time automated translation doesn’t seem such a distant mirage.
And importantly, since it’s learning from spontaneous human input, it’s not just consulting dictionary-like formal grammars and vocabulary, but rather the way people really speak and write. This is not just good news for speakers of different languages but also of different non-standard dialects. The computer doesn’t care if you speak proper! It only cares if you speak in a way that’s approximately comparable to how other people have spoken.
This kind of live translation is going mobile too, thanks to clever phone apps like Speak & Translate, or Google Translate. Point your phone at the Ristorante Italiano, and on the screen you’ll see an Italian Restaurant – not as blocky subtitles, but as lettering that overlays the sign in front of you, as if the sign were written in your language.
The next piece of the puzzle is voice recognition. Growing up in the 1980s and 1990s, I remember my dad using auto-dictation. And. Painstakingly. Reading. Out. Each. Word. But again, times have changed. The Economist article I cited earlier also reviews recent leaps forward in voice recognition that let it cope with unfamiliar voices and rapid speech.
Combining that with live translation gives you Skype’s new real-time translator. Search online for videos of that, and once you get past the slick corporate promotional videos you should find examples of multilingual friends taking it for a spin. Their reactions tend to be some mix of impressed, confused and amused, so there’s room for improvement. But as I noted above, a lot of time, effort and money is being pumped into this. Expect further big advances, soon.
And again, this technology is going mobile. With apps like Google Translate you can speak into your phone, and an automated voice will speak a translation aloud. As that gets more accurate, it’ll be easier and easier to have reasonably natural conversations between languages.
But… translating your voice isn’t much use when there’s lots of other noise around. Technology has two answers here. Firstly, noise filtering techniques are improving (Kirch & Zhu 2016), and are the focus of much innovative energy – search Google Scholar for ‘voice audio noise’ and you’ll find a flurry of recent patents. Secondly, machine lip reading is advancing rapidly too: comparing human sounds with their corresponding mouth movements – so-called ‘visemes’. Your phone can’t hear you? No problem if it can at least see you (and phones needn’t limit themselves to the puny smear of light that we squinting humans rely on).
Artificial intelligence is being applied here too, similarly outpacing clunky mammalian programmers clicking away one key at a time.
If voice recognition is improving, what about voice production? That Economist article I mentioned also notes that machine learning is being applied to understanding pronunciation. That in turn points to a future auto-translator that not only translates your words but even mimics your actual voice.
The next piece I’ll put into the mix is perhaps the weirdest. We’ve all seen dubbed movies where the actors’ lips don’t match the translation. Pretty soon, that mismatch will disappear. Enter Face2Face, a computer algorithm developed by researchers at the University of Erlangen-Nuremberg, the Max Planck Institute for Informatics, and Stanford University. It works by filming someone making facial expressions, and then dynamically mapping those expressions onto a moving face in a video, in real time. The absolutely bizarre result is the ability to force anyone in any video to assume any facial expression you wish, including mouthing out different words.
For now it is mostly useful in movies, but think about the likely direction of the technology. The research I noted above on ‘visemes’ could straightforwardly lead to a database of the mouth movements needed for all human sounds. A computer could then map matching facial expressions onto live video – that is, onto your face, as you speak into your phone or webcam.
Of course, you’d still have to speak through a device. That’s a bit awkward. But that leads me to the last piece of this increasingly futuristic (but, as I’m trying to convince you, not that futuristic!) puzzle: Augmented Reality.
Watch this 2016 TED talk demonstrating a current AR headset, and think about how that could combine with live audio translation and Face2Face. Just think it through. You could meet someone speaking an unfamiliar language; the headset could translate their voice while also augmenting their moving face; and you would hear and see them speaking your language.
You’d look a bit weird with that thing strapped to your head; but there are already more compact AR headsets, like Google Glass – already in everyday use by many factory workers to flash up details of products in front of them. Apple is making plenty of noises about moving into AR. So is Facebook. Wearing clunky glasses is still pretty awkward, and uncool – probably why they never caught on outside factories – but in 2016 a patent was filed for a tiny AR implant that fits inside your eyeball. Who might have filed that patent? Surprise surprise: Google.
Then there are recent advances in data storage and miniaturised processing power, for example a technique to write data to single atoms (Natterer et al. 2017), and the newly created ‘LI-RAM’ microchips promising supercomputer-like power inside tiny devices. In-ear technology is already available, of course. So what if that unwieldy headset were instead little more than a glint in your eye and a bud in your ear, Black Mirror style? Not so awkward anymore. And if it featured reliable live translation and Face2Face, suddenly Babel disintegrates completely in a puff of pixelated smoke.
This is the ‘singularity’ I mentioned at the start: the merging of wobbly human parts with synthetic improvements. It is predicted to arrive within the next few decades, and is already attracting active venture capital. The market research firm Global Market Insights predicts a $165bn market in AR by 2024.
My point is that, once you imagine all these pieces of nascent technology floating around and rapidly improving, their journey into a single new gadget doesn’t seem remotely unlikely. And you know what rampant neoliberal capitalism really likes? New gadgets!
So, that’s part 1, the gadgetry. I give it ten years before live, unnoticeable, automated translation between anyone anywhere is utterly trivial. OK, let’s be really cautious: twenty years. Now on to part 2: what does all this mean for how we currently think about language rights?
We move now from the field of technology into the academic sphere of sociolinguistics and political philosophy. In broad terms, the field of language rights has three overarching aims that relate to speakers of lesser-used minority languages:
- To pursue basic freedoms by preventing discrimination on the basis of the language you speak.
- Beyond basic freedoms, to create a world in which speakers of minority languages aren’t alienated from normal life. This means ensuring accessibility of services in different languages. That can also mean training people to speak minority languages, to aid communication.
- To promote languages as important and valuable goods in themselves: emblems of cultural diversity, with a value that transcends their material benefit to particular groups.
These three broad goals are referred to by François Grin, respectively, as “negative rights”, “positive rights”, and a “third pillar … [which] cannot be understood strictly in terms of rights” (2003:84). If these are the current aims of language rights, let’s relate each one to the technological leaps outlined above.
In this future scenario, negative rights are essentially no longer relevant. Speak whatever language you like! Outright language bans tend to be based on chauvinistic nationalism and the jealous wish to hear nothing but the One True Language all around (and that One language has a funny tendency to be different in each country). Even the most ardent linguistic nationalist could simply set their translator to filter everything into beige monolingual monotony. Best of all, they could do it without bothering anyone.
Positive rights would be trivially easy to achieve, but only if all languages are included in the translation database. This, then, would be a new area of debate for language rights: ensuring inclusion of minority languages. That actually leads on pretty smoothly from current debates about inclusion of minority languages in education and civic life. Remember, though: machine learning has the potential to make that process a lot cheaper and quicker, so it could be less burdensome than current debates over manual translation.
If all that were made a reality, then minority language speakers need never feel isolated or excluded again. Just like the majority language speakers I mentioned under negative rights, minority language speakers could translate everyone into their own language.
The same applies to speakers of non-standard vernaculars. I pointed out earlier that machine learning works on natural input, not standard language. Your translation device could translate into any language variety you like. In fact, since your device would understand your own language patterns and your own voice best of all, it could even make everyone sound exactly like you.
As I noted above, positive rights currently also involve training other people in minority languages, so that minority language speakers can interact seamlessly with different organisations. That would change in this future scenario, if minority language speakers could hear, see and speak their language all around them at the flick of a (virtual) switch. There would be no need for anyone to be trained in minority languages, at least not to lift barriers faced by their speakers.
And this of course cuts both ways. Universal translation removes the need to bother learning ‘majority’ languages or standard varieties. Which language you speak becomes immaterial if the technology enables us to understand one another regardless of the actual noises coming out of our faces.
So what about the “third pillar”: celebrating languages as goods in themselves, whether or not that delivers material benefits? Actually, this one more or less gets a free pass. Even if we can all understand each other, any transcendent value that languages hold is unaffected by the material barriers and benefits that concern negative and positive rights. This includes learning a language for personal or emotional rewards – for example this touching recent article about a First Nations Canadian learning her heritage language despite its deathbed status, or this account of a similar effort in Singapore. That rationale could continue unchanged.
The third pillar is not just something that affects individuals. It is also a mainstay of many governmental policies to revitalise minority languages as important bearers of culture and heritage, above and beyond any material benefits they might bring. That motivation may endure, just without any need for people to learn the language unless they really want to.
But what about achieving literacy in the first place? Being part cyborg and translating all the text around you won’t help if you can’t read at all. Literacy is easiest to achieve in your first language, so surely there remains a need for provision in minority languages, at least in teaching? Well, not necessarily: not if each child could simply see learning materials augmented to appear in their language, while their teacher’s voice could be auto-translated too, even to sound like one of their parents. There may come a day when there is no need to translate textbooks or train teachers in minority languages; it could all be virtually delegated.
And as I noted above, that means you can learn to read in any dialect. That in turn means the imperative to approximate any kind of standard dialect begins to fade from view. That has massive implications for other areas of language rights in relation to language standardisation and ‘correct’ language.
Peering further into the future (though perhaps not very much further), we find the possibility of your little translation gadget no longer relying on you wobbling your gooey speech organs at all. It could just read your thoughts directly. Again, this is not science fiction but a predictable advance of existing technology. It is already possible to read basic yes/no responses with electrodes mounted on a head cap (Chaudhary et al. 2017). One neurologist went further and surgically implanted electrodes on his brain, then recorded which neurons fired up as he spoke certain sounds, words and phrases. He had to remove the electrodes after a few weeks for safety reasons; and ethical approval has not yet been granted for wider testing. Nevertheless, his preliminary results suggest clear potential. Vaunted tech maestro Elon Musk has caught the scent, and launched a company, Neuralink, dedicated to the brain-machine merger. Hot on Musk’s heels, Facebook has announced similar plans. The potential end point of this tech is word-free communication, where written and spoken language are seen as mere quaint extravagances.
But wait a minute. This talk of ubiquitous live translation is all very nice, but not everyone gets the latest gadgets for Christmas. Rampant neoliberal capitalism loves new gadgets, but it also seems to love stark and growing socioeconomic inequalities. It also loves those gadgets to improve so fast that, even when disadvantaged folks get hold of them, there’s already a better model for those higher up the global elite food chain.
Still, unequal access is nothing new: it is just as true of public funding for literacy programmes and the provision of services in minority languages. So today’s arguments about funding for minority language literacy programmes could be tomorrow’s arguments about equitable rollout of translation gadgets.
That might seem beyond the largesse of governments, but think about the internet. Only two decades ago it was a rather exclusive luxury; but today it’s the subject of huge government subsidy, philanthropic investment in poorer countries, and even a UN resolution.
Certainly, there would be stark inequalities in access to AR translation technologies; but it seems unlikely that the response would be to withhold support for wider access and simply keep funding old-fashioned literacy programmes. How would that look, while the global elite lorded their ability to understand all humanity over everyone else? One kind of inequality (in literacy and availability of services) would be replaced by another (in live translation), but both would be addressed by very similar politics.
So, equal access to live translation could be just another new area for the field of language rights.
To reprise the three goals of language rights outlined above: negative and positive rights could in time be subsumed and transformed into debate over access to translation technologies. Meanwhile the “third pillar” could remain largely intact, though confined to language learning as a meaningful and rewarding leisure pursuit, rather than a reason to urge others to learn minority languages for the sake of accessibility.
Overall, then, the future may be very different; yet in other ways, the more things change, the more they may stay the same. The gradual push to equalise internet access worldwide is bearing fruit. If the same happened with live translation facilities, the inequalities I just outlined could be overcome in time.
Given the pace of technological improvements, and the steady spread of access to other technologies, how long will it really be before we all simply and instantly comprehend each other? There will no doubt be decades of teething troubles, and people will still find plenty of ways to misunderstand each other and start fights, just as speakers of the same language do now. But there is reason for optimism in a future of worldwide mutual understanding, with freedom to speak how you like, and nobody coercing anyone else to speak a certain way. If any of those positive outcomes are possible, then I for one welcome our robot overlords.
Chaudhary, U., B. Xia, S. Silvoni, L. G. Cohen & N. Birbaumer. 2017. Brain-computer interface-based communication in the completely locked-in state. PLoS Biology 15(1). PMID: 28141803
Grin, F. 2003. Language Policy Evaluation and the European Charter for Regional or Minority Languages. New York: Palgrave Macmillan.
Jones, Carwyn. 2015. Letter to David Melding AM ‘Committee for the Scrutiny of the First Minister: Meeting on 13 March 2015’. www.senedd.assembly.wales/documents/s44696/CSFM402-15ptn2.pdf
Kirch, Nicole & Na Zhu. 2016. A discourse on the effectiveness of digital filters at removing noise from audio. Journal of the Acoustical Society of America 139, 2225. https://dx.doi.org/10.1121/1.4950680
Natterer, Fabian D., Kai Yang, William Paul, Philip Willke, Taeyoung Choi, Thomas Greber, Andreas J. Heinrich & Christopher P. Lutz. 2017. Reading and writing single-atom magnets. Nature 543: 226–228. https://doi.org/10.1038/nature21371
Sayers, D. 2016. Exploring the enigma of Welsh language policy (or, How to pursue impact on a shoestring). In R. Lawson & D. Sayers (eds.), Sociolinguistic Research: Application and Impact. London: Routledge. 195–214. http://www.routledge.com/books/details/9780415748520/