How to explain the failure of voice tweets


In 2020, Twitter decided to develop voice tweets, which let you tweet with your own voice rather than in writing. They immediately drew criticism and incomprehension. Quickly abandoned, the feature now seems confined to niche uses. A rather telling tale of failure.

“The first voice tweets are a failure”, writes editorial project manager Niri Brusa. “Remember voice tweets? Lmao”, asks a scathing Mashable headline, echoing a number of perplexed internet users. The verdict is hard to dispute: voice tweets are a semi-failure. So what happened?

In recent years, podcasts have emerged, voice assistants have become mainstream – in short, the “vocal turn” of the Internet has gradually taken shape. “One thing is certain, the revolution will be podcasted”, humorist Augustin Shackelpopoulos gently mocks in his show, DAVA 8.

Voice tweets therefore seemed to have everything they needed to fit in with the times without much difficulty. More specifically, Twitter’s bet was to ride the boom in voice messages. “It’s a whole generation, born with new technologies, which prefers to communicate by voice messages”, Le Figaro was already writing in 2019. Family conversations on WhatsApp, arguments on Messenger: our whole social life can now be summed up in these little voice capsules, which have managed to combine the convenience of the answering machine with the brevity of digital messages.

To understand the profound lack of interest in this feature, we can start with some very pragmatic reasons. “There has been quite a debate about moderation. A legitimate debate since there can be more problems with voice than in text”, analyses Emmanuelle Patry, marketing strategy consultant. Moderation only after the fact set teeth on edge. The back-pedaling was immediate: Twitter announced that subtitles would soon be added to voice tweets. For the consultant, this is further proof that voice alone is a dead end:

“The idea of voice tweets was a response to the vocal trend, in search of sound, but it doesn’t really meet users’ needs. The vocal trend is starting to be strong on Instagram, and in private conversations generally, but has not caught on in public conversations”.

In reality, there is a misunderstanding about the rise of orality on Web 2.0. The voice is in a ‘state of limbo’: neither really intimate, because it is aimed at a specific audience, nor entirely public.

A strategic mistake?

For Elsa Godart, philosopher at Gustave Eiffel University, the specific grammar of the social network does not allow good integration of voice tweets:

“Twitter’s native format is that of jousting. It’s normally tit for tat, but with voice tweets you can’t see anything. We’re no longer in the age of presenteeism, we’re in the age of instantaneousness”.

“The voice message is very egotistical, we subject the other to a message”, says the researcher, author of the essay Éthique de la sincérité. Survivre à l’ère du mensonge (Ethics of Sincerity. Surviving the era of lies) (Armand Colin, 2020). Some people denounce this verbal attack, where the interlocutor cannot add his or her point of view. A mechanism that is not new – it is also present in private messaging.

Voice in public is now inseparable from its polemical side. Where the author of a very long article used to be answered with “Didn’t read LOL”, an expression that has become typical of web folklore, the Internet user now has little choice but to refuse to use or listen to voice tweets, for the simple reason that their very form prevents any dialogue.

An intimate experience, really?

The emerging discussion is no longer the same, according to Elsa Godart:

“The voice is intrusive and fragments daily life, the silence at work. It is disturbing. Sometimes you are told ‘I don’t have time to listen to your message’. It creates a break, you wait five hours before listening to it”.

Voice tweets intrude into the silent regime of written conversation. This sudden intrusion into intimacy is not to everyone’s taste. “The vocal social networks that are beginning to emerge, such as Clubhouse, are losing sight of the fact that the most interesting thing is listening to people talking to each other, like on the radio or in a podcast. Whereas a voice on its own is quite disembodied”, laments Alexandra Profizi, a specialist in digital practices.

The voice alone loses its intimate aspect and drowns in the flood of images and videos that makes up the social network’s timeline. Between the noise of images and the silence of scrolling, there seems to be no room for another format. Alexandra Profizi explains:

“First there were the classic phone calls, with a beginning and an end to the conversation. Then there were mobile phones, which could be answered anywhere. Then there were social networks with a certain return to the written word, and to vocal tools that took over the fragmented aspect of social networks”.

The vocal turn is therefore not obvious. It is a question of returning to the origins of the telephone, even though practices and needs have greatly evolved.

Twitter is just starting to test its “Spaces” feature, modelled on Clubhouse. Here the social network intends to focus on “the intimacy of the human voice”.

“It reminds me of Bubble, a social network that allowed you to make sound bubbles, little capsules that you could listen to. But as far as I know, it didn’t really work”, explains Emmanuelle Patry. There is a contradiction in wanting to make the voice of users heard. An online account is a staging, a way of appearing to the world. Making your voice heard is, in a way, breaking the image that you have built for yourself online. For David Le Breton, author of Éclats de voix: une anthropologie des voix (Métailié, 2011), the voice often disappears in favour of the meaning of what is said. However, on social networks, there is nothing more salient than a voice that breaks the silent continuity of scrolling text.

All the more so as the format, limited by the number of seconds, does not allow a real connection to be established, as Elsa Godart reminds us:

“The choice to limit messages to 240 seconds is a form of castration of language, it is the antithesis of speech”.

The betrayal of images 

For a long time, the Internet was the realm of the written word. Web 1.0 had difficulty supporting multimedia additions, such as sound or video. Today, one would think that all formats are consumed indiscriminately. However, one element rises above the others. According to Social Insider, images are still the preferred mode of expression for brands, “whether on Instagram, Facebook or Twitter”. It is therefore not easy to force orality in through voice tweets or Spaces.

Alexandra Profizi, who was interested in memes in her book Le Temps de l’ironie (The Time of Irony) (L’Aube, 2020), agrees:

“The combination of the written and the visual has given rise to a lot of creativity, because we play on the complementarity of a written sentence and an image that will create the contrast. It is this combination that gives more power to a publication”.

“Don’t underestimate Twitter’s voice tweet feature – it could completely change the platform”, warns Forbes magazine. Citing a 2005 study, the article points out that when speaking, people only manage to tell a literal statement from sarcasm in “56% of cases”, “which is hardly more than random chance”. The famous Poe’s law, according to which “it is difficult to distinguish extremism from satire of extremism on the Internet unless the author clearly indicates his/her intent”, is not about to be overturned by the kind of “sincerity” that the spoken form can convey. “Even if it could be said that to some extent, the basic form of utterance in Twitter is irony”, adds Alexandra Profizi.

In practice, current uses lean toward humor: a staging of oneself through speech, in the absence of one’s image on the screen.

The constant play with limits

Generally speaking, audio humor is an integral part of web culture. Vocal memes exist (from “This is Sparta!” to Señor Chang’s “Ha, Gay!”) and often manage to detach themselves from the images they originally came with. GIFs are still commonly used to evoke them: the sound is then merely suggested by the animated image. The vocal aspect of the meme is there, but hollowed out. On Twitch, the mechanics are different: voice memes can be inserted in the form of small jingles.

For its part, the voice tweet, because of its hybrid format, does not really allow a meme form to emerge. Everything therefore rests on a composite use: the voice tweet merely takes up written comic elements and is often nothing more than written content read aloud.

The culture of writing seems to cling to Twitter, which is nevertheless trying to develop a more pronounced audiovisual grammar (as with the arrival of Fleets).

On the fringes, a few large accounts make occasional use of the feature, such as the BBC, to promote its Doctor Who series.

These companies have clearly understood that voice on the Internet only works in relation to an image, and in a context of parodic diversion. As for the media – especially podcasts – they have always known how to enhance their productions by using graphic possibilities that already exist.

In his book Visual Thinking (1969), the art historian Rudolf Arnheim develops the idea that the boundaries between images, text and sound are becoming increasingly porous. The Internet proved him right. Isolating just one aspect nowadays seems like a risky gamble: if you listen to podcasts, it is because you are doing something else on the side. The voice tweet is not long enough to do the dishes, and it is not a personal enough message to make you feel involved. 

In this, Twitter has probably not understood the revolution brought about by platforms like Twitch or TikTok, where sound, image and text fuse in an almost organic way. As for the Spaces feature, it is gradually opening up to emojis, and this addition is not insignificant: “reuse” is integral to the Internet. TikTok videos are reposted on Twitter, tweets are screenshotted for Instagram, and vice versa. In short, it is to this endless conversation that voice tweets refuse to open up. The only choice left is between the impermeable ‘in-between’ and very private clubs, like Clubhouse and Spaces.
