Nexdata Expands Physical AI Data Collection Factory with Total Investment of 2.5 Billion JPY, Offering Datasets for Foundation Models and Ego-centric Data Collection
Nexdata invested over 2.5 billion yen to build an 8,000 sq meter physical AI data collection factory. Operating over 400 robots, it provides crucial real-world datasets for robotics development.
Published: April 14, 2026 at 21:00
Background
AI technology is shifting from the era of Large Language Models (LLMs), which specialize in generating information in the digital space, to the stage of 'Physical AI,' which interacts directly with the physical world and operates autonomously. In Japan, the working-age population is shrinking as the birthrate falls and the population ages, and this coincides with growing demand for automation in the manufacturing and service industries, accelerating market expansion. While conventional generative AI was aimed primarily at processing text and 2D images, Physical AI is positioned as next-generation infrastructure that integrates sensor-based environmental perception with the physical movements of robots, directly contributing to solving real-world challenges.
Large-scale Data is Essential for Physical AI Development
It is increasingly accepted in the industry that the 'Scaling Law' applies to the development of Physical AI, just as it does for LLMs. To improve model versatility and control accuracy in real environments, developers need large-scale, high-quality real-world data covering diverse physical phenomena and movement patterns that simulations cannot fully replicate. However, real-world data collection has become the biggest bottleneck in the development process, because of the cost of building environments, the difficulty of synchronizing and calibrating multiple sensors, and the burden of annotation.
To solve this issue, Nexdata has invested over 2.5 billion yen in total to construct a dedicated data collection factory spanning over 8,000 square meters. We provide comprehensive data solutions to accelerate Physical AI development, ranging from in-factory data collection, through ego-centric real-environment data collection and annotation, to off-the-shelf datasets supporting environmental perception, decision-making, and motion control. Through cost advantages driven by large-scale production and data assets that are immediately usable in development, we help shorten the lead time for developing Physical AI and Vision-Language-Action (VLA) models and improve the accuracy of real-world adaptation.
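As an illustration of the sensor synchronization difficulty mentioned above, the following minimal Python sketch pairs each camera frame with the nearest joint-state sample by timestamp and drops pairs that are too far apart. It is not Nexdata's pipeline: the function names, the two example streams, and the 20 ms tolerance are all assumptions made for the example.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbouring samples.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(camera_ts, joint_ts, max_gap=0.02):
    """Pair each camera timestamp with the nearest joint-state timestamp.

    Both lists are in seconds and must be sorted in ascending order.
    Pairs whose gap exceeds `max_gap` (a hypothetical tolerance) are dropped.
    Returns a list of (camera_index, joint_index) tuples.
    """
    if not joint_ts:
        return []
    pairs = []
    for ci, t in enumerate(camera_ts):
        ji = nearest_index(joint_ts, t)
        if abs(joint_ts[ji] - t) <= max_gap:
            pairs.append((ci, ji))
    return pairs


# Hypothetical streams: a 10 Hz camera and an irregular joint-state log.
camera_ts = [0.00, 0.10, 0.20]
joint_ts = [0.001, 0.05, 0.099, 0.15, 0.26]
pairs = align(camera_ts, joint_ts)
```

In this example the third camera frame has no joint sample within the tolerance, so it is discarded rather than paired with stale data. Real multi-sensor rigs usually also need clock-offset estimation and spatial calibration, which this sketch does not attempt.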
Large-scale, Low-cost Data Supply Realized by an 8,000 sq meter Data Collection Factory
Nexdata has invested over 2 billion yen specifically into data infrastructure for Physical AI development. Currently operating two large-scale data collection factories with dedicated space exceeding 8,000 square meters, we can simultaneously run over 400 diverse robot platforms, including humanoid robots, quadruped robots, industrial robot arms, and multi-fingered manipulators.
Within the facilities, we have established diverse scenario environments that faithfully replicate real operational environments such as homes, pharmacies, manufacturing lines, and logistics warehouses, staffed by over 600 full-time operators and management personnel.
This makes it possible to efficiently produce high-quality Physical AI data that covers the entire development phase: from pre-training of large-scale foundation models, fine-tuning for specific tasks, to imitation learning where robots learn by mimicking human demonstrations.
[Images: Comprehensive Collection Factory; Dedicated Robot Hand Collection Factory]
Off-the-shelf Datasets to Train the 'Perception, Brain, and Cerebellum' of Physical AI
In Physical AI development, it is standard practice to design along a 3-layer architecture consisting of the 'Environmental Perception Layer,' 'Decision-making Layer,' and 'Motion Execution Layer,' which are responsible for understanding the environment, action planning, and precise control, respectively. Nexdata can produce datasets rapidly because it has large-scale specialized data collection hubs and dedicated collection personnel. To date, we have cumulatively provided the following cost-effective datasets:
■ Environment Database (Environmental Perception Layer)
Contains over 288 million sets of high-precision 3D models and real-world scene data. By providing a simulation base close to real environments containing diverse lighting conditions, object placements, and background patterns, it supports the improvement of a robot's spatial awareness and object detection accuracy.
■ Brain Dataset (Decision-making Layer)
Contains 4,000 hours of ego-centric multi-task execution videos. It comprehensively covers the visual inputs and action sequences of humans performing daily tasks (e.g., cooking, cleaning up, and displaying products), making it well suited to planning tasks that require long-term dependencies and to training context-aware decision-making models.
■ Cerebellum/Body Dataset (Motion Execution Layer)
Contains over 10,000 sets of high-fidelity trajectory data, joint angle time series, and haptic feedback information. It can be utilized as foundational data to support the learning of low-level control policies in imitation learning and reinforcement learning, as well as precise motion execution in real environments.
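To make the 3-layer architecture described above more concrete, here is a toy Python sketch of one perception, decision, and execution cycle. It is an illustrative assumption, not a description of any Nexdata product or dataset format: the class names, the `reach_m` threshold, and the gripper trajectory are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Percept:
    """Output of the environmental perception layer (hypothetical)."""
    object_name: str
    distance_m: float


@dataclass
class Action:
    """Output of the decision-making layer (hypothetical)."""
    name: str
    target: str


def perceive(raw_distances: Dict[str, float]) -> List[Percept]:
    """Perception: turn raw sensor readings into detected objects."""
    return [Percept(name, d) for name, d in sorted(raw_distances.items())]


def decide(percepts: List[Percept], reach_m: float = 0.8) -> Action:
    """Decision-making: grasp the nearest object if reachable, else approach.

    Assumes at least one percept; an empty list raises ValueError.
    """
    nearest = min(percepts, key=lambda p: p.distance_m)
    verb = "grasp" if nearest.distance_m <= reach_m else "approach"
    return Action(verb, nearest.object_name)


def execute(action: Action, steps: int = 3) -> List[float]:
    """Motion execution: emit a toy gripper-opening trajectory (1.0 = open)."""
    if action.name != "grasp":
        return [1.0] * steps  # keep the gripper open while approaching
    return [1.0 - (i + 1) / steps for i in range(steps)]  # close linearly


def run_once(raw_distances: Dict[str, float]) -> List[float]:
    """Run one perception -> decision -> execution cycle."""
    return execute(decide(perceive(raw_distances)))


trajectory = run_once({"cup": 0.5, "box": 1.2})
```

In a real system each function would be a learned model trained on the corresponding kind of data (scene data, ego-centric task video, and trajectory or haptic recordings); keeping the layers behind separate interfaces is what allows them to be trained and swapped independently.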
Furthermore, Grasping, Manipulation, and Haptic Feed