Alright learning crew, welcome back to PaperLedge! Today we're diving into the fascinating world of virtual reality and asking a really cool question: can AI, specifically those super-smart Large Language Models (LLMs) like the ones powering your chatbots, actually play VR games?
Now, when we play VR, it feels pretty natural, right? We know if we want to pick something up, we reach out with the controller, squeeze a trigger, and bam, it's in our virtual hand. But think about it: your brain is doing a ton of work translating the idea of "pick up bottle" into the specific button presses and movements needed to make it happen in the game.
This new research introduces something called ComboBench, a benchmark designed to test exactly how well LLMs can handle this translation. Think of it like a VR obstacle course for AI. The researchers picked four popular VR games – Half-Life: Alyx, Into the Radius, Moss: Book II, and Vivecraft – and created 262 different scenarios. So it's a pretty big test!
These scenarios require the AI to figure out the sequence of actions needed to complete a task. Let's say the goal is to "start the car in Vivecraft". A human player instinctively knows they need to get in the car, find the key, insert it into the ignition, and turn it. The AI has to figure out all those steps and then translate them into the correct controller inputs. That's a complex chain of reasoning that goes beyond simple fact retrieval.
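To make that translation concrete, here's a rough sketch (in Python) of what such a plan might look like once it's broken down into controller-level steps. The action names, fields, and the overall structure are my own illustration of the idea, not ComboBench's actual action format.

```python
# Illustrative only: a high-level goal decomposed into ordered,
# controller-level primitives, roughly the kind of translation the
# benchmark asks a model to produce. Field names and motions are
# assumptions, not ComboBench's real schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ControllerAction:
    hand: str                     # "left" or "right"
    motion: str                   # e.g. "reach_to", "turn"
    target: str                   # the in-game object being manipulated
    button: Optional[str] = None  # controller input, e.g. "grip", "trigger"

# Hypothetical decomposition of "start the car in Vivecraft"
start_car_plan = [
    ControllerAction("right", "reach_to", "car_door", button="grip"),
    ControllerAction("right", "reach_to", "car_key", button="grip"),
    ControllerAction("right", "reach_to", "ignition", button="grip"),
    ControllerAction("right", "turn", "ignition", button="trigger"),
]

for step in start_car_plan:
    print(f"{step.hand} hand: {step.motion} {step.target} ({step.button})")
```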
The researchers pitted seven different LLMs, including big names like GPT-4o and Gemini 1.5 Pro, against these challenges. They also compared the AI's performance to how humans performed and to pre-programmed solutions (the "ground truth").
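To give a feel for how that comparison could work, here's a small scoring sketch: it measures how much of a ground-truth action sequence the model's predicted sequence recovers, in the right order. The metric (based on matching subsequences) is my own stand-in, not necessarily the one the paper uses.

```python
# A stand-in metric for comparing a predicted action sequence against a
# ground-truth one: fraction of ground-truth steps recovered in order.
# Not necessarily ComboBench's actual scoring method.

from difflib import SequenceMatcher

def sequence_score(predicted: list[str], ground_truth: list[str]) -> float:
    matcher = SequenceMatcher(a=predicted, b=ground_truth)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ground_truth) if ground_truth else 0.0

predicted    = ["open_door", "grab_key", "insert_key", "press_trigger"]
ground_truth = ["open_door", "grab_key", "insert_key", "turn_key"]
print(sequence_score(predicted, ground_truth))  # -> 0.75
```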
Here’s the juicy part. The results showed that even the best LLMs still struggle compared to humans. While models like Gemini 1.5 Pro are pretty good at breaking down tasks into smaller steps (like understanding that "starting the car" involves multiple actions), they have trouble with what the researchers call "procedural reasoning" and "spatial understanding."
"While top-performing models like Gemini-1.5-Pro demonstrate strong task decomposition capabilities, they still struggle with procedural reasoning and spatial understanding compared to humans."Imagine trying to explain to an AI how to tie a shoelace. You can tell it the steps, but it's much harder to convey the subtle hand movements and spatial relationships involved. That's the kind of challenge these LLMs are facing in VR.
Interestingly, the researchers also found that giving the AI a few examples of how to perform similar tasks – what they call "few-shot examples" – dramatically improved performance. It's like showing the AI a cheat sheet! This suggests there's a lot of potential to improve LLMs' VR skills with targeted training.
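Here's roughly what that looks like in practice: a few worked (task, actions) examples get placed into the prompt before the new task is posed. The wording and helper function below are illustrative assumptions on my part, not the prompts actually used in the paper.

```python
# Illustrative few-shot prompt builder: prepend worked (task -> actions)
# examples before asking for a plan for a new task. The prompt wording is
# an assumption, not the paper's actual prompt format.

def build_few_shot_prompt(examples: list[tuple[str, list[str]]], new_task: str) -> str:
    parts = ["Translate each VR task into an ordered list of controller actions.\n"]
    for task, actions in examples:
        parts.append(f"Task: {task}")
        parts.append("Actions: " + " -> ".join(actions) + "\n")
    parts.append(f"Task: {new_task}")
    parts.append("Actions:")
    return "\n".join(parts)

examples = [
    ("open the drawer", ["reach_to drawer_handle", "press grip", "pull back", "release grip"]),
    ("pick up the bottle", ["reach_to bottle", "press grip"]),
]
print(build_few_shot_prompt(examples, "start the car"))
```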
Why does this research matter? Because it pinpoints exactly where today's LLMs break down when language has to become embodied action: not in understanding the goal, but in the procedural reasoning and spatial understanding needed to carry it out. And it shows that targeted examples can meaningfully close that gap.
The research team has made all their data and tools publicly available at https://sites.google.com/view/combobench, so anyone can try their hand at training an AI to conquer VR!
So, after thinking about all this, here are some questions that pop into my head: if a few examples help this much, how far can few-shot prompting take these models before they need genuinely better spatial reasoning? And what would it take for an LLM to develop the kind of intuitive feel for hands, objects, and space that human players bring to VR without even thinking about it?
That's all for this week's episode of PaperLedge. Until next time, keep learning and stay curious!