I often think about how Generative AI and the metaverse will intersect and overlap. Both are seen as multi-trillion-dollar, revolutionary opportunities. So, where’s the unique common ground between these two emerging technologies?
WTF? ChatGPT Builds Metaverse Assets
Bart Trzynadlowski created ChatARKit, a concept I find inspiring because it integrates ChatGPT into a real-time AR camera experience using Apple’s ARKit. In the video below, you’ll see him generate 3D assets in real time by prompting the AI:
It’s as if he has the entire world in his palm, with godlike powers over his own virtual domain. This feature makes AR leagues more interesting because you can ask the AI to materialize anything in your view. Finding an asset online, sometimes paying for the file, and then uploading it into your AR environment takes time. That’s to say nothing of the hours it takes to design the asset yourself, which is usually what’s required.
Broadly speaking, I think Generative AI will play a critical role in building metaverse spaces and the things we fill them with.
The metaverse largely depends on sandbox environment architecture, a style of game design used in Minecraft and Roblox that gives users control over customizing and building things in their world. Meta Horizon Worlds is a sandbox metaverse, for example.
The challenge with this freedom to create in the metaverse is that it’s not that easy. Designing and deploying 3D assets is nothing like opening the iPhone camera app, snapping a picture, and uploading it.
“We’re able to fill the internet with interesting stuff because everybody is capable of taking a picture, recording a video, or writing words,” says Rev Lebaredian, VP for Omniverse and simulation technology at the chipmaker Nvidia. “If we are going to create a 3-D internet, then you absolutely have to have the people who are participating in it creating content as well—and the only hope we have of making that happen is if AI can help us.” – Time
Generative AI, on the other hand, gives this power of 3D creation to everyone. And it only takes a prompt. With AI, we can fill the metaverse with detail-rich, customizable visuals far faster than humans can by hand. And because generation is so quick, you can do it on the fly.
Imagine being able to build an entire metaverse environment with the ease of creating a mood board on Pinterest. You type a couple of things in, pick a design style, pin a few items, and you end up with a solid vision. The AI takes this further by then building whatever you envisioned to the design specs you outlined. In the near term, we’ll generate individual assets, one by one, with the AI’s help and place them throughout the metaverse. Eventually, we’ll generate entire virtual neighborhoods and explorable metaverse cities from a single prompt.
There are many text-to-3D generators on the market, including GET3D and Magic3D from Nvidia and DreamFusion from Google, along with adjacent work like Meta’s text-to-video model Make-A-Video. But the ultimate vision for this AI-meets-the-metaverse crossover would be for the Generative AI to be native to the metaverse builder platform.
Meta has a retention problem. They can get people to try Meta Horizon Worlds, but they can’t get people to come back. Although building in Horizon Worlds is by far the easiest of all the platforms, it’s not where it needs to be. The learning curve is too steep and the time investment too large. That’s why nine out of ten Horizon users leave within a month. An AI integration like this, I feel, would solve their retention problem because it would make building metaverse worlds incredibly simple and addictive.
Frankly, for this type of feature to have a profound impact, I believe we need Multimodal Large Language Models (MLLMs). As you may recall from Everydays 171, MLLMs are different in that they operate on multimodal inputs and outputs – they’re multi-sensory, in a sense. In other words, MLLMs can see what’s in a picture while also taking input from text and sound, and can then generate output in any of these formats.
MLLMs would be crucial for metaverse building because they’d need to be able to see the space as you do in order to take your feedback and be effective in co-designing 3D worlds with you.