X Square Robot’s Wang Qian: Robots will eventually reach Mars

Interview with 100 Founders: China's Next Wave — Founder #2

Apr 11, 2026

At EAIDC 2026 in Shenzhen, X Square Robot CEO Wang Qian outlined the company’s push toward scalable embodied AI, predicting that general-purpose robots could eventually operate even in extreme environments like Mars.

On March 30, 2026, in Shenzhen, a quiet but telling milestone took place in the embodied AI world. X Square Robot (自变量机器人) hosted the inaugural Embodied AI Developers Conference — EAIDC 2026 — billed as the world’s first large-scale gathering dedicated specifically to developers building embodied AI systems. The event brought together researchers, engineers, and technology companies from across the industry for live robotic demonstrations, a national-level hackathon, and focused discussions on a topic that most robotics conferences still treat as futuristic: not whether robots can be intelligent, but how to deploy them at scale in the real world.

That shift in framing, from “can it work?” to “how do we ship it?”, is precisely what distinguishes X Square from most companies in this space. While competitors have spent the past two years perfecting demo videos of robots doing backflips, X Square has focused on a harder, less photogenic problem: building a foundation model general enough to run in a real home, a messy and variable space, without failing the moment something unexpected happens.

EAIDC was, in some ways, a coming-out party. X Square is no longer a well-funded startup operating in the background. It is increasingly the company setting the terms of the conversation. After the conference, Wang Qian (王潜), X Square’s founder and CEO, sat down with Pandaily for an exclusive interview to discuss the road ahead.

What X Square Actually Builds

To understand X Square, it helps to understand what problem it is trying to solve.

Traditional industrial robots are essentially very precise, very fast machines that follow pre-programmed instructions. They excel in structured environments — the same task, the same object, the same position, repeated thousands of times. Change any variable and they fail. They have no ability to observe the world, reason about it, and adapt.

The company’s answer is what it calls an Embodied Intelligence Foundation Model — specifically, a Vision-Language-Action (VLA) model that processes sensory input (video, language, haptic signals) and outputs physical action (joint torques, velocities, poses) in a single end-to-end architecture. The model, branded the Great Wall series with its flagship called WALL-A, is designed to enable robots to perceive, decide, and act autonomously across diverse environments and tasks without being explicitly reprogrammed for each one.

X Square claims WALL-A is currently one of the world’s largest-scale unified embodied foundation models — achieved via a novel three-stage training paradigm, a proprietary architecture combining shared attention with expert-routed feed-forward modules, and training data drawn from a mix of self-collected real-world demonstrations, open-source robot datasets, and automatically generated multimodal data.
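To make the phrase “shared attention with expert-routed feed-forward modules” concrete, here is a toy sketch of one such block in plain Python. It is not WALL-A's architecture, which the article does not detail. Routing tokens by a fixed modality tag, the three expert names, the tiny dimensions, and the random weights are all assumptions made for illustration; a real system might learn its routing instead.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def shared_attention(tokens):
    """Single-head self-attention over ALL tokens, whatever their modality.

    Queries, keys, and values are the raw token vectors (no learned
    projections) to keep the toy small.
    """
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * value[i] for w, value in zip(weights, tokens))
                      for i in range(dim)])
    return mixed


def make_expert(dim, hidden, rng):
    """Build a two-layer ReLU feed-forward network with random weights."""
    w_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        activations = [max(0.0, z) for z in matvec(w_in, x)]
        return matvec(w_out, activations)

    return expert


def block(tokens, modalities, experts):
    """One block: shared attention, then a per-token expert feed-forward."""
    attended = shared_attention(tokens)
    # Residual connection around the shared attention step.
    hidden = [[t + a for t, a in zip(tok, att)]
              for tok, att in zip(tokens, attended)]
    # ASSUMPTION: each token is routed by its modality tag, not a learned gate.
    return [[h + f for h, f in zip(hid, experts[mod](hid))]
            for hid, mod in zip(hidden, modalities)]


rng = random.Random(0)
DIM = 4
experts = {name: make_expert(DIM, 8, rng)
           for name in ("vision", "language", "action")}  # hypothetical names
tokens = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
modalities = ["vision", "vision", "language", "language", "action"]
output = block(tokens, modalities, experts)
```

The point of the split is that every token can attend to every other token, so vision, language, and action information mix freely, while the feed-forward capacity is specialized per token type.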

In September 2025, X Square open-sourced a developer-facing version of its model called WALL-OSS, which has since been integrated into Hugging Face’s LeRobot framework — a move that simultaneously builds community goodwill and feeds the data flywheel. EAIDC itself can be read as an extension of this strategy: by convening the ecosystem, X Square is positioning its open-source model as the platform developers build on top of.

On the hardware side, X Square produces two robot platforms: Quanta X1, a wheeled dual-arm robot designed for precision manipulation in service environments, and Quanta X2, a wheeled humanoid launched in August 2025. It also makes the Artixon Hand, a 20-DoF dexterous robotic hand that the company claims is the world’s first such hand to be controlled directly by a foundation model rather than traditional motion-planning algorithms.

The company is headquartered in Shenzhen with offices in Beijing and Shanghai, and has deployed its robots in hotel groups, logistics companies, elderly care facilities, supermarkets, and research institutions. Early commercial revenue is coming in from education, hospitality, and elder care; household services are also underway, including a collaboration with classifieds platform 58.com.

The Founder: A Transformer Pioneer Who Quit Finance to Build Robots

Wang Qian is not a typical robotics entrepreneur. His academic background is in deep learning — specifically, he was among the earliest researchers in the world to introduce the attention mechanism into neural networks, publishing work at the same conference as Google’s early attention paper in 2014, three years before the Transformer architecture would reshape the entire field of AI.

After completing his bachelor’s and master’s degrees at Tsinghua University, he pursued a PhD at the University of Southern California, conducting robotics learning and human-robot interaction research at top American robotics laboratories. But he pivoted mid-career — to quantitative finance. He founded a quant fund in the United States, which by his own account was financially successful. And then he spent many nights unable to sleep.

“I kept thinking: I should have stayed in robotics,” Wang has said in interviews. In 2023, he dissolved the fund and returned to China.

His co-founder and CTO, Wang Hao (王昊), brings a complementary profile. He holds a PhD in computational physics from Peking University and previously led the algorithm team for the Fengshenbang large language model at the Institute for Intelligent Computing (IDEA Research), overseeing development of China’s first hundred-billion-parameter foundation model and one of the earliest trillion-parameter models, Ziya.

Together, the two founders embody X Square’s core thesis: that building a general-purpose robot brain requires expertise in both large foundation models and robotics learning — and that the two cannot be solved in isolation.

“We Were the Only Variable”

The name “X Square” is not arbitrary. In mathematics, x represents the independent variable — the thing that changes and drives outcomes. The Chinese name, 自变量 (zì biànliàng), carries the same meaning with an additional nuance: zì means “self” or “autonomous.” The company wants to be the variable that changes the world — and it wants that change to be self-generated, not dependent on others.

This framing turns out to be an accurate description of how X Square navigated its early years.

When the company was founded in late 2023, the embodied intelligence landscape in China was already competitive. Galbot (银河通用) and Agibot (智元机器人) had both launched that same year with considerable fanfare, larger teams, and more initial funding. X Square, by contrast, operated with little public visibility and struggled to raise early rounds.

“The biggest difficulty was that nobody trusted us,” Wang Qian has recalled. “Early investors didn’t believe we could actually pull this off. They bet on the team being decent enough that even if we failed at this, we’d find something else to do.”

What changed was output. X Square released its first embodied intelligence model just two months after founding. By October 2024, it had trained WALL-A — at the time, the largest-parameter general-purpose embodied manipulation model in the world. The demos showed something qualitatively different: robots that could hang laundry, prepare shaved ice, wind cables around pegs, and sort parcels of arbitrary shape, all controlled by a single model without task-specific reprogramming. By the time Physical Intelligence (PI), the American benchmark company in this space, released its π0 model in late 2024 — validating the end-to-end VLA approach — X Square had already been on that path for over a year.

“We didn’t need to copy anyone’s homework,” Wang has said. “We were already there.”

By early 2025, the funding picture had transformed entirely. Meituan led a Series A round; Alibaba Cloud and Hongshan (formerly Sequoia China) co-led an A+ round of nearly RMB 1 billion; and in January 2026, an A++ round of RMB 1 billion closed with backers including ByteDance and Hongshan. The company has now raised approximately $280 million in total and is the only Chinese embodied intelligence startup to have simultaneously attracted backing from Meituan, Alibaba, and ByteDance.

The Technical Bet: Why End-to-End Matters

The central debate in embodied intelligence — and the one X Square has wagered its existence on — is architectural. Should a robot’s “brain” be built as a modular stack (perception module → planning module → control module), or as a single end-to-end model that processes raw sensor input and outputs raw motor commands without explicit intermediate representations?

X Square’s answer has been unambiguous from day one: end-to-end, always.

Wang Qian’s reasoning is both technical and philosophical. Modular systems, he argues, impose human-designed abstractions onto a continuous physical world. Each handoff between modules introduces error, latency, and brittleness. The physical world doesn’t segment neatly into “perception” and “planning” — it is a continuous stream of force, contact, spatial relationships, and temporal dependencies. A model that learns to navigate that stream as a whole, rather than as a sequence of sub-problems, will generalize better and fail more gracefully.
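The contrast between the two designs can be sketched in a few lines of Python. This is a toy illustration of the argument, not X Square's code: the `Observation`, `SceneEstimate`, `Plan`, and `Action` types, the function names, and the arithmetic inside each stage are all hypothetical. What it shows is one specific failure mode of hand-designed interfaces: the scene estimate has no field for haptics, so that signal never reaches the controller, whereas the end-to-end policy maps the full observation to torques in one step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    instruction: str       # language input (ignored by this toy's arithmetic)
    image: List[float]     # stand-in for camera pixels
    haptics: List[float]   # stand-in for touch/force readings


@dataclass
class Action:
    joint_torques: List[float]


# --- Modular stack: perception -> planning -> control --------------------

@dataclass
class SceneEstimate:
    object_position: float  # hand-designed interface: no slot for haptics


@dataclass
class Plan:
    target_position: float


def perceive(obs: Observation) -> SceneEstimate:
    # Compresses the whole observation into one human-chosen number.
    return SceneEstimate(object_position=sum(obs.image) / len(obs.image))


def plan(scene: SceneEstimate) -> Plan:
    return Plan(target_position=scene.object_position)


def control(step: Plan) -> Action:
    return Action(joint_torques=[0.5 * step.target_position])


def modular_policy(obs: Observation) -> Action:
    return control(plan(perceive(obs)))


# --- End-to-end: one function from raw observation to raw action ---------

def make_end_to_end_policy(weights: List[float]) -> Callable[[Observation], Action]:
    """Return a policy that maps all sensor values straight to a torque.

    A real model would be a trained network; here `weights` is a fixed
    linear map chosen by hand.
    """
    def policy(obs: Observation) -> Action:
        features = obs.image + obs.haptics
        torque = sum(w * x for w, x in zip(weights, features))
        return Action(joint_torques=[torque])

    return policy


end_to_end_policy = make_end_to_end_policy([0.1, 0.1, 1.0])

firm_grip = Observation("pick up the cup", image=[0.2, 0.4], haptics=[0.9])
no_contact = Observation("pick up the cup", image=[0.2, 0.4], haptics=[0.0])
```

In this toy, `modular_policy` returns the same action for `firm_grip` and `no_contact`, because the difference between them is lost at the first handoff, while `end_to_end_policy` responds to it. The sketch says nothing about which approach performs better in practice; it only makes the structural difference visible.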

“Embodied intelligence is a foundation model for the physical world — independent from, and parallel to, language models and multimodal models for the virtual world,” Wang said at the 2025 MEET conference. “The long-term advancement of robotics intelligence depends on general-purpose AI capability.”

The practical implication is that WALL-A takes language, video, and haptic sensor data as input and outputs velocity, pose, and torque directly — with no intermediate planning layer. The same model controls the company’s wheeled robots, its dexterous hand, and in principle any robotic platform it is trained on. In July 2025, X Square demonstrated what it claims was an industry first: a foundation model controlling a high-degree-of-freedom dexterous hand to perform complex manipulation tasks, including card dealing, without task-specific engineering. The company has also integrated chain-of-thought reasoning natively into the model — enabling robots to break down complex instructions into subtask sequences and reason through them before acting.

Where the Robots Are Actually Working
