Artificial intelligence is proving to be one of the pillars on which the metaverse will be built. Its contributions start with the processing of user-generated data and continue with generative AI models that create photorealistic virtual environments and avatars resembling their users, as well as the ability to recognise body movements and thus make the metaverse experience more natural.
But AI will also breathe new life into the digital characters that populate virtual worlds, such as non-human characters and personal assistants, and enable everyone to understand each other in their own language by translating speech in real time. Artificial intelligence could help create increasingly engaging and user-friendly experiences to maximise activity and engagement time, as is the case in social networks today. It could even act as a watchdog, stopping harassment before it happens – as long as we don’t have a problem with an intrusive AI listening to all our conversations and judging our every move.
A definition for the metaverse
One of the difficulties we have today is defining with sufficient accuracy what the metaverse is, a term that has come to the fore more for marketing reasons than anything else. Those of us who were already online in the early Nineties will remember similar problems in defining “cyberspace” during the years of the Internet boom; a dilemma long since forgotten, because the pervasive and continuous use of the medium made naming and definition issues fade into the background – an adjustment that will probably also happen to the metaverse in the years to come.
However, if we insist on settling on a definition, we can consider the metaverse as a series of digital environments with various levels of immersiveness – from a simple browser or smartphone to fully virtual reality environments – allowing interaction between many users (an environment limited by design to a single user does not fall under our definition). The metaverse, however, isn’t simply a multi-user video game; rather, it will assume a role important and engaging enough to represent a whole new piece of human existence or, if you like, a digital layer that will overlap and interconnect with the physical one.
One of the fundamental elements of this new environment will be the interactions we will have with other users, which will lead us to invest resources – time, yes, but also financial resources – to improve our status and our experience in the digital world. We will buy digital goods and services, perhaps in the form of NFTs, from companies and other users, feeding a parallel but interlinked economy.
We will be able to assume identities other than our physical one, which in some cases may be more fulfilling and engaging than the one we already have: think of Willem Dafoe’s character in the 1999 film eXistenZ, who was a petrol station attendant in the real world but a deity in the digital reality. In fact, this is something that has already been happening for years with MMORPGs, Massively Multiplayer Online Role-Playing Games, where millions of people shed their everyday clothes to become wizards, warriors and elves, and where many spend a fortune buying digital goods and services that are useful only in the game.
The metaverse, given its immersiveness, the greater involvement between users and the network effect all but ensured by the huge investments of several Big Tech companies (Facebook/Meta first and foremost), could represent a new way of enriching one’s existence – or a colossal waste of time, depending on how the various implementation phases are handled and how society at large responds.
AI use cases for the metaverse
Artificial intelligence will provide fundamental support to the metaverse, simplifying people’s access to digital environments, as well as helping with content generation and interaction between humans and virtual worlds. Here are some of the most important use cases.
Holding everything together
Primum vivere, said the ancient Romans. First, you need to live, and for the metaverse to live, servers and network systems need to be up and running. As companies hosting MMORPGs (such as World of Warcraft or Elder Scrolls Online, to name but a couple) are well aware, running an infrastructure that can simultaneously host over half a million users every day requires titanic efforts in terms of computational resources.
This is precisely why Meta recently unveiled the AI Research SuperCluster (RSC), one of the most powerful AI supercomputers in the world, which when completed – in mid-2022 – will be the most powerful, bar none. As stated by the company, one of the supercomputer’s tasks will be to take care of the metaverse, i.e. keeping digital worlds up and running and hosting the activities of millions of users, even simultaneously, without slowdowns or resource problems.
Artificial intelligence will also be used to scan and process in real-time the enormous amount of data produced every second by the activities of users in the company’s metaverse, to make other use cases possible.
Creating virtual environments
A digital world requires the presence of digital places – rooms, villas, grassy hills – that allow whoever occupies them at that moment to move around, interact with the environment and carry out the various activities permitted by that particular place, whether it be a meeting room immersed in a mountainous landscape, a comet in deepest space or a reproduction of Minas Tirith. In the past, building these digital environments meant teams of developers semi-manually creating every single part, from the hills to the sea, placing trees and furniture by dragging them with the mouse, not to mention making sure floors and objects had the right collision geometry (who hasn’t fallen through the world because of a missing collision somewhere on the map?). Tomorrow, a generative AI model will create all this, with very little human input.
It will be able to recreate environments that exist in the physical world, generating a 3D scene with stunning realism from still photographs, thus allowing us to accurately reproduce any existing place, from the Colosseum to the gardens of the Alhambra in Granada, to the veranda of our beach house.
AI will also be able to generate completely made-up places. It may start from a few inputs by a developer, but then reinforcement learning will take over and the algorithms will design places that are increasingly comfortable, or enjoyable, for human users. AI could analyse which environments we seem to enjoy the most, or relax in the most, extract their features and keep experimenting by creating places that are even more fun or even more relaxing, refining the technique with each iteration until the perfect areas are created for our demanding human needs.
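The iterate-and-refine loop described above can be sketched in miniature. Everything here is illustrative: the environment parameters, the mutation steps and the `engagement_score` function (a stand-in for real user-engagement telemetry) are all made up for the example.

```python
import random

def engagement_score(env):
    # Stand-in for a real engagement metric; here we pretend users prefer
    # moderate tree density and gentle hills (entirely made-up targets).
    return -abs(env["trees"] - 40) - abs(env["hill_height"] - 5)

def refine_environment(generations=200, seed=0):
    """Hill-climbing sketch: mutate environment parameters and keep
    whichever variant scores higher on the (simulated) metric."""
    rng = random.Random(seed)
    best = {"trees": rng.randint(0, 100), "hill_height": rng.randint(0, 20)}
    for _ in range(generations):
        candidate = {
            "trees": max(0, best["trees"] + rng.randint(-5, 5)),
            "hill_height": max(0, best["hill_height"] + rng.randint(-2, 2)),
        }
        # Accept the mutation only if it scores better, so the
        # environment can never get worse between iterations.
        if engagement_score(candidate) > engagement_score(best):
            best = candidate
    return best
```

A production system would replace the scoring function with measurements of how real users behave in each generated space, and the mutation step with a far richer generative model; only the shape of the loop carries over.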
Making your own avatar
Although in the metaverse potentially nobody knows who you are, there will be situations – such as metaverse-hosted business meetings – where masquerading behind a nickname and a Salvador Dalí mask may not be commonly accepted behaviour. In those environments it will be necessary, and useful, to be present not only under one’s real name but also with an avatar that looks as much like us as possible. Artificial intelligence can help here too, with models that analyse our photos and recreate a 3D avatar in our image and likeness.
Mapping body movements
If you have ever spent some time in VR, you know that the current interfaces are not the best. This undermines the objective of keeping people in the metaverse for as long as possible, or making them log in as frequently as possible. So one goal is to make VR interactions more natural, allowing people to perform tasks as easily as picking up an object or waving a hand. To do this, artificial intelligence will look at our body movements, capturing them through sensors of different types and transforming them into commands or avatar movements.
Raising your hand to greet someone should be as simple as in the physical world, without holding any controller in your hand, and opening or closing a virtual panel should be easy and immediate, with the AI correctly interpreting your every movement.
But the recognition won’t stop there. AI will also be able to copy our facial expressions onto the avatar, so that our smile is also the avatar’s smile, transferring more and more expressions – frowning, yawning, surprise, blinking, etc. – onto our digital twin, to make our transposition from the physical to the digital world as realistic as possible.
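The mapping from captured movement to avatar action can be illustrated with a deliberately simple rule-based sketch. A real system would run a learned pose-estimation model over raw sensor streams; the feature names and thresholds below are hypothetical.

```python
def interpret_pose(wrist_y, shoulder_y, hand_open):
    """Toy mapping from (simulated) body-tracking readings to avatar
    actions: vertical positions of wrist and shoulder, plus whether
    the hand is open, decide what the avatar does."""
    if wrist_y > shoulder_y and hand_open:
        return "wave"        # raised open hand -> greet someone
    if wrist_y > shoulder_y and not hand_open:
        return "grab_panel"  # raised closed hand -> grab a virtual panel
    return "idle"            # anything else -> no avatar action
```

For example, `interpret_pose(1.8, 1.5, True)` – wrist above shoulder, hand open – maps to a wave, exactly the “raise your hand to greet someone” gesture mentioned above. The interesting engineering lives in producing reliable inputs from noisy sensors, which is where the AI does the heavy lifting.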
Giving life to the metaverse digital denizens
In a digital world, we need digital people. As we already know, artificial intelligence is now able to hold discussions, correctly interpreting input and producing appropriately correlated output, giving the impression of understanding what is being said and being able to reply. This ability, achieved through large language models of which GPT-3 is one example among many, can be incorporated into the various digital agents that will populate the metaverse to produce highly realistic virtual assistants or companions.
In online games these agents are called NPCs (Non-Player Characters), i.e. elements that are usually graphically similar to user avatars but are there only to do a few simple tasks, such as starting a quest, handing out rewards, giving out info or serving cosmetic purposes (e.g. walking around). Over the years, some games have given these digital agents slightly more complex tasks, such as following the player on adventures and fighting alongside them. But even then, these ‘companions’ do not show great signs of intelligence (quite the contrary, trust me on this).
In the metaverse, thanks to AI, these NPCs or personal assistants will take on a completely new guise, performing ‘intelligent’ actions and far more complex tasks. Imagine a digital assistant helping novice users move around and explore the metaverse, recognising their mistakes and suggesting ways to correct them (or, in some cases, actually getting them out of trouble). Or imagine a digital secretary taking incoming messages while we are in a meeting in the metaverse, relaying them to us only once the meeting is over.
Or again, since this already happens with various smartphone apps, let’s imagine an area of the metaverse where virtual characters act as friends or even companions – characters to converse with, confide our problems in, or even entertain a ‘romantic friendship’ with. Let’s not be surprised by this: the ability of AI to create photorealistic human representations, together with the capability to hold conversations of particular depth, will make digital romance a more widespread ‘guilty pleasure’ in the not-so-distant future. The metaverse can host that, too.
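Architecturally, such an agent is a thin shell around a language model: it keeps conversational state and delegates reply generation. The sketch below stubs the model with canned, keyword-triggered replies purely so the example is self-contained; in practice the `model` callable would wrap a GPT-style service.

```python
def fake_language_model(prompt):
    """Stand-in for a large language model; returns canned replies
    keyed on crude keyword matching. Illustrative only."""
    text = prompt.lower()
    if "quest" in text:
        return "A dragon has been spotted near the old mill. Will you investigate?"
    if "hello" in text or "hi" in text:
        return "Welcome, traveller! How can I help you today?"
    return "I'm not sure I follow, but tell me more."

class NPC:
    """Minimal NPC shell: keeps a conversation history and delegates
    reply generation to whatever model callable it is given."""
    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.history = []

    def say(self, user_utterance):
        self.history.append(("user", user_utterance))
        reply = self.model(user_utterance)
        self.history.append((self.name, reply))
        return reply
```

Swapping `fake_language_model` for a real one is the whole point of the design: the NPC shell (identity, memory, game-world hooks) stays the same while the conversational brain improves underneath it.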
Translating every language in real-time

Real-time translation is one of the use cases explicitly acknowledged by Meta, which will dedicate part of its supercomputer specifically to this activity. The idea here is to enable groups of people from different countries, each speaking a different language, to speak and understand each other in real-time. To do this, the artificial intelligence model will first need to recognise the language spoken by one user, interpret every single word and its meaning, translate it correctly into the language spoken by the other interlocutor and then generate the translated text in audio format, perhaps with the same voice as the first interlocutor (an audio deepfake would need to be used to simulate the voice).
All of this is already possible in theory. In practice, it requires massive resources, especially if you want to do it in near-real time and at the scale that the metaverse requires. But Meta has been directing resources there for quite some time already. In 2019 it released Wav2vec, to recognise speech structures directly from raw audio without the need for transcribed text. In May 2021 it demonstrated that unsupervised machine learning could recognise speech better than other methods, while in November of the same year its multi-language translation model beat other bilingual models in a machine translation competition. Meta stated in no uncertain terms that its goal is to create a universal translator.
We now know that all these research efforts, which started years ago, were aimed at finding a way for people from different countries to speak together in their native languages, and what better use case than the metaverse to put this project into practice.
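The pipeline just described – detect the language, translate, synthesise speech in the original speaker’s voice – can be sketched with stubbed stages. Every component below is a placeholder (the two-word dictionary, the keyword-based language detector, the string-returning synthesiser); only the pipeline shape reflects the description above.

```python
# Tiny stand-in for a machine-translation model: Italian -> English only.
TINY_DICTIONARY = {("it", "en"): {"ciao": "hello", "mondo": "world"}}

def detect_language(text):
    # Stub: a real system would use a language-identification model on audio.
    return "it" if any(w in text for w in ("ciao", "mondo")) else "en"

def translate(text, src, dst):
    # Word-by-word lookup; real translation models work on whole sentences.
    table = TINY_DICTIONARY.get((src, dst), {})
    return " ".join(table.get(word, word) for word in text.split())

def synthesize(text, voice_id):
    # Stub for text-to-speech cloned to the original speaker's voice.
    return f"<audio voice={voice_id}>{text}</audio>"

def translate_speech(utterance, listener_lang, speaker_voice):
    """Chain the three stages: detect -> translate -> synthesise."""
    src = detect_language(utterance)
    translated = translate(utterance, src, listener_lang)
    return synthesize(translated, speaker_voice)
```

So `translate_speech("ciao mondo", "en", "alice")` runs Italian text through the English dictionary and wraps the result in Alice’s (stubbed) voice. The real engineering challenge is doing each stage with large neural models at conversational latency, which is precisely why Meta is pointing a supercomputer at it.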
Algorithms that increase engagement and presence
Even in the metaverse, metrics such as engagement, uptime, login frequency and so on will be crucial for companies hosting digital environments and content. Just as all social networks today are pushing to keep us as long as possible in their systems, the metaverse too will feature recommendation and content selection algorithms that will do everything in their power to show us what we are most interested in. And they won’t let go.
This is a very familiar situation: when we’re bored we already log on to social networks to see what is going on, to read something new, to talk to our contacts or – more frequently – because some notification prompted us to do so. And when we open the app or the site, various algorithms are at work to prevent us from disconnecting too quickly.
The same will apply to the metaverse, only heightened by the fact that we will enter an immersive environment, where algorithms will benefit from a higher degree of attention from us, as some of our most important senses (sight, hearing, even touch) will be fully given over to the immersive experience. AI models will be able to envelop our senses, provoke our interest and ultimately seize our attention much more effectively than a simple smartphone screen.
If you spend some of your time in virtual reality, you may already have noticed that an hour there seems to pass more quickly – a pull held in check, for now, only by VR devices that are still too bulky and heavy on the face. When it becomes physically easier to immerse oneself in the metaverse, and when today’s clumsy content-recommendation systems become more sophisticated and intelligent, disengaging from this new digital existence will certainly become more difficult. AI, unfortunately, will be one more weapon that companies use to convince us to stay connected as much as possible. To make matters worse, many already expect the metaverse to be a place full of marketing and manipulation.
Moderation and identification of harmful behaviour
As already reported in some news stories, the metaverse is not free of typically human problems such as harassment or bullying. These are challenges that all platforms have to come to terms with. Facebook first, and Meta today, know something about this and do not want their grandiose project to be ruined by the harasser next door.
Artificial intelligence already helps human moderators intercept and examine suspicious behaviour; in the metaverse, these controls will only increase. Let’s not forget that in virtual reality every movement of our avatar can easily be recorded and documented, as can every word we say or hear. As the immersiveness and sophistication of devices increases – think of the body trackers we referred to earlier – the data points that could be intercepted and analysed by AI will only increase.
With this amount of information, it would not be impossible to build models that estimate the probability that harassment is happening (or is about to happen). Given enough data, one could analyse the behaviour that occurred before, during and after harassment complaints to train a model that recognises or predicts such incidents with good accuracy.
Such a system, hypothetical for now, would be very useful in allowing everyone to enjoy the digital experience without disturbance or offence, but it also raises several questions, such as the degree of intrusion we are willing to allow into our private digital interactions.
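To make the idea concrete, here is a toy version of such a probability model: a logistic scoring function over behavioural signals. The feature names and weights are entirely hypothetical – a real model would be trained on labelled moderation data rather than hand-tuned.

```python
import math

# Hand-picked weights for illustration only; both the features and
# their importance are hypothetical.
WEIGHTS = {
    "repeated_close_approaches": 1.2,   # avatar repeatedly invading personal space
    "blocked_by_target_before": 2.0,    # target previously blocked this user
    "rapid_message_bursts": 0.8,        # flooding the target with messages
}
BIAS = -3.0  # negative bias: most interactions are benign

def harassment_probability(features):
    """Logistic model: maps a dict of behavioural feature counts to a
    probability in (0, 1) that the interaction is harassment."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With all features at zero the score stays near 5%, while an interaction combining repeated close approaches, a prior block and message flooding scores well above 90% – the kind of output a moderation system could use to flag, throttle or escalate. The privacy question raised above is visible right in the inputs: every one of those features requires continuously recording what users do.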