【Research Summary and Key Points】 We have developed a technology that lets a robot arm estimate the shape of objects that conventional 3D measurement systems struggle with, such as transparent containers and glossy packaging, from images taken by a single camera, and then grasp them successfully.

We also developed a method that automatically determines the optimal image-capture positions and camera path by balancing shape-estimation accuracy against the distance the camera must travel, even when the object must be observed from multiple viewpoints.

In experiments with an actual robot, the method achieved a 96.0% grasping success rate while reducing camera travel distance by 52% and total handling execution time by 19% compared to HEAPGrasp w/o VP, an alternative version developed in the same study.

This research achievement is expected to promote the automation of processes that previously relied on human labor, contributing to improved productivity through the combination of high-precision grasping and efficient operation.

【Research Overview】 A research group consisting of Associate Professor Shogo Arai of the Department of Mechanical Engineering, Faculty of Science and Technology, Tokyo University of Science, and Ginga Kennis (a second-year master's student in the same department, 2025 academic year) has developed methods for "3D measurement (*1)" and "grasp planning (*2)" for objects that robot arms find difficult to grasp, such as transparent or glossy items. They also reduced the camera movement and processing time required for image capture in a hand-eye configuration (*3).

Transparent containers and glossy packaging have conventionally been difficult for robot arms to grasp automatically, because light reflecting from and passing through their surfaces makes depth sensors and general-purpose 3D measurement systems unreliable. The research group therefore combined semantic segmentation (*4) of RGB images, which is less affected by an object's optical properties, with "Shape from Silhouette (*5)," which reconstructs shape from object contours observed from multiple viewpoints. Observing from more viewpoints improves accuracy, but moving the camera takes time, which conflicts with the cycle times required in manufacturing. To resolve this trade-off, the group introduced a cost function that balances the gain in 3D measurement accuracy against camera travel distance, and used it to optimize image-capture positions and movement paths.
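The idea of a cost function that trades accuracy against travel can be illustrated with a greedy next-view selector. This is a minimal sketch under stated assumptions, not the HEAPGrasp formulation: the function name `select_viewpoints`, the per-viewpoint accuracy gains, and the weight on travel distance are all hypothetical, and the gains are held fixed even though a real planner would likely re-estimate them after each image.

```python
import math


def select_viewpoints(start, candidates, gains, weight, num_views):
    """Greedily choose viewpoints, trading expected accuracy gain against travel.

    start      -- current camera position (x, y, z) in metres
    candidates -- candidate camera positions (x, y, z) in metres
    gains      -- hypothetical expected accuracy gain per candidate (higher is better)
    weight     -- hypothetical cost per metre of camera travel
    num_views  -- number of viewpoints to visit
    Returns the chosen candidate indices in visiting order and the total travel.
    """
    current = start
    remaining = list(range(len(candidates)))
    order, travelled = [], 0.0
    for _ in range(min(num_views, len(candidates))):
        # Cost of moving to candidate i: travel penalty minus expected gain (lower is better).
        best = min(remaining, key=lambda i: weight * math.dist(current, candidates[i]) - gains[i])
        travelled += math.dist(current, candidates[best])
        current = candidates[best]
        remaining.remove(best)
        order.append(best)
    return order, travelled


# Made-up numbers: two candidates sit close to the camera, two are far away.
candidates = [(0.5, 0.0, 0.4), (0.0, 0.5, 0.4), (-0.5, 0.0, 0.4), (0.45, 0.05, 0.4)]
gains = [1.0, 1.0, 1.0, 0.9]
order, travelled = select_viewpoints((0.5, 0.0, 0.5), candidates, gains, weight=2.0, num_views=2)
print(order, round(travelled, 3))  # [0, 3] 0.171
```

Raising `weight` favours nearby viewpoints; setting it to zero reduces the rule to picking the highest-gain views regardless of how far the camera has to move.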

In experiments with an actual robot, the system achieved a 96.0% grasping success rate across transparent, glossy, and opaque objects. Compared to HEAPGrasp w/o VP, an alternative version developed in this study, it reduced camera travel distance by 52% and total handling execution time by 19%.

This research achieves robust grasping of objects with challenging optical properties while reducing the time cost of multi-viewpoint observation. It is expected to broaden the range of tasks robots can take on, promoting automation and improving productivity at manufacturing sites.

The results were published online in the international academic journal "IEEE Robotics and Automation Letters" on January 12, 2026. They are also scheduled to be presented at the 2026 IEEE International Conference on Robotics & Automation (ICRA 2026), a top-tier robotics conference that draws attention from around the world.

[Figure 1: Performance comparison between previous research (VGN, GraspNeRF) and the method developed in this study (HEAPGrasp w/o VP, HEAPGrasp)] [Figure 2: (Left) Grasping a transparent object (Center) Grasping an object inside a transparent bag (Right) Grasping a glossy object]

【Research Background】 Robot arms improve productivity and reduce manual labor in factories and production sites by automatically grasping, transferring, and aligning parts and products. In recent years, advances in high-precision cameras and 3D measurement technology have allowed them to take on increasingly complex tasks. However, transparent and glossy objects have remained a challenge: light reflection and transmission make 3D measurement unstable, which hinders automatic grasping. In addition, accurately determining an object's position and orientation requires capturing images from multiple viewpoints, which adds measurement and processing time. This lengthens cycle times and lowers the efficiency of the entire system.

Therefore, the research group focused on semantic segmentation using RGB images and Shape from Silhouette. Furthermore, they aimed to achieve both accuracy and efficiency by introducing a cost function that balances 3D measurement accuracy with camera travel distance, optimizing the trajectory of the hand-eye camera.
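Once a set of viewpoints has been chosen, the order in which the camera visits them also affects how far it travels. The sketch below finds the shortest visiting order by brute force. It is an illustration only, not the trajectory optimizer used in the paper: it assumes straight-line camera moves between viewpoints, the function names are hypothetical, and because it checks every permutation it is practical only for a handful of viewpoints.

```python
import math
from itertools import permutations


def path_length(start, points):
    """Total straight-line camera travel when visiting points in order from start."""
    total, current = 0.0, start
    for p in points:
        total += math.dist(current, p)
        current = p
    return total


def shortest_visit_order(start, viewpoints):
    """Try every visiting order and return the one with the least travel (n! orderings)."""
    best = min(permutations(viewpoints), key=lambda order: path_length(start, order))
    return list(best), path_length(start, best)


# Made-up viewpoints on a line: visiting them nearest-first is optimal here.
order, length = shortest_visit_order((0.0, 0.0, 0.0), [(0.2, 0.0, 0.0), (0.1, 0.0, 0.0), (0.3, 0.0, 0.0)])
print(order, round(length, 3))  # [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)] 0.3
```

For more than a few viewpoints, a heuristic ordering (for example nearest-neighbour) would replace the exhaustive search.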

【Details of Research Results】 In this study, verification was conducted using an actual robot, comparing conventional baseline methods (VGN, GraspNeRF) with the newly developed methods (HEAPGrasp, HEAPGrasp w/o VP).

The conventional VGN method achieved an 88.5% success rate for opaque objects, but this dropped to 72.0% for glossy objects and 53.8% for transparent objects. GraspNeRF showed a similar decline: 91.7% for opaque objects, but only 52.2% for glossy objects and 68.2% for transparent objects. In contrast, HEAPGrasp w/o VP and HEAPGrasp, developed in this study, achieved success rates of 92.6% or higher in every category. They also generalized well to unknown objects not used during training, achieving success rates of 98.0% and 96.0%, respectively. The difference in success rates stems primarily from the difference in 3D measurement accuracy.

Because the conventional GraspNeRF method requires six viewpoints around the workspace, it needed a camera travel distance of 2.33 m and an execution time of 18.8 seconds. HEAPGrasp w/o VP, developed in this study, reduced these to 2.03 m and 9.91 seconds. HEAPGrasp reduced them further, to 0.97 m and 8.01 seconds: a 52% reduction in camera travel distance and a 19% reduction in execution time compared to HEAPGrasp w/o VP.
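The percentages above can be checked directly from the reported distances and times. The short script below computes the relative reductions; it confirms that the 52% and 19% figures are measured against HEAPGrasp w/o VP, and shows that against GraspNeRF the same numbers correspond to reductions of roughly 58% and 57%.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Camera travel distance (m) and execution time (s) as reported in the text.
results = {
    "GraspNeRF": (2.33, 18.8),
    "HEAPGrasp w/o VP": (2.03, 9.91),
    "HEAPGrasp": (0.97, 8.01),
}

dist_new, time_new = results["HEAPGrasp"]
for baseline in ("HEAPGrasp w/o VP", "GraspNeRF"):
    dist_old, time_old = results[baseline]
    print(
        f"vs {baseline}: distance -{percent_reduction(dist_old, dist_new):.0f}%, "
        f"time -{percent_reduction(time_old, time_new):.0f}%"
    )
# vs HEAPGrasp w/o VP: distance -52%, time -19%
# vs GraspNeRF: distance -58%, time -57%
```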

Associate Professor Arai, who led this research, commented: "In manufacturing and logistics, objects that are difficult to automate, such as transparent trays and glossy bags, remain in the process, creating bottlenecks that rely on human hands. We undertook this research with the goal of creating a mechanism that is less susceptible to optical properties and achieves 'seeing and grasping' with the minimum necessary movement. These results are expected to promote the robotization of processes that previously relied on human labor, contributing to improved productivity. Furthermore, the design, which reduces the need for pre-adjustment, makes it easier to introduce into existing facilities and handle a wider variety of products. In the future, we aim to realize a robot system that can easily generate optimal movements according to operational conditions, even in workplaces with frequent setup changes or additional objects."

【Terminology】 *1: 3D Measurement
Technology that uses cameras or distance sensors to acquire the position and shape of an object as 3D information (x, y, z). It is the foundation that lets robots determine an object's precise position and orientation for grasping or transfer.

*2: Grasp Planning
The process of determining the position and orientation from which a robot can stably grasp an object, based on shape and orientation information obtained through 3D measurement.

*3: Hand-Eye Configuration
A setup in which a camera is mounted on the robot's end-effector (hand) and the robot's movements are controlled based on the camera's visual information.

*4: Semantic Segmentation
An image recognition method that classifies each pixel in an image into semantic classes such as object or background. Because it can extract an object's region at the pixel level, it is used to improve the accuracy of object recognition in 3D measurement and grasp planning.

*5: Shape from Silhouette
A method for estimating an object's 3D shape from silhouette images acquired from multiple viewpoints. Although it has difficulty reproducing the internal shape of an object, its computation is relatively simple and it can reconstruct the outer shape of an object stably.
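How semantic segmentation and Shape from Silhouette fit together can be sketched in a toy example: per-pixel class scores (the kind of output a segmentation network might produce) are turned into an object silhouette, and silhouettes from several views are intersected to carve a voxel "visual hull." This is a simplified illustration, not the authors' implementation. The views are assumed to be orthographic along the grid axes, image rows and columns are assumed to map directly to grid coordinates, the scores are made-up values, and the function names are hypothetical. A practical system would instead project voxels through calibrated perspective cameras whose poses come from the robot arm.

```python
from itertools import product


def scores_to_mask(scores, object_class):
    """Segmentation step: give each pixel its highest-scoring class and
    return the (row, col) pixels labelled as the object, i.e. its silhouette."""
    return {
        (r, c)
        for r, row in enumerate(scores)
        for c, pixel in enumerate(row)
        if max(range(len(pixel)), key=pixel.__getitem__) == object_class
    }


def carve_visual_hull(size, silhouettes):
    """Shape-from-Silhouette step: keep every voxel of a size**3 grid whose
    projection lies inside all silhouettes.

    silhouettes maps a viewing axis (0, 1 or 2) to a set of occupied 2D cells.
    Views are assumed orthographic along a grid axis (a simplification).
    """
    def project(voxel, axis):
        # Orthographic projection: drop the coordinate along the viewing axis.
        return tuple(v for i, v in enumerate(voxel) if i != axis)

    return {
        voxel
        for voxel in product(range(size), repeat=3)
        if all(project(voxel, axis) in mask for axis, mask in silhouettes.items())
    }


# Made-up [background, object] scores for a 3x3 top-down image (looking along axis 2).
bg, obj = [0.9, 0.1], [0.2, 0.8]
top_scores = [[bg, bg, bg], [bg, obj, bg], [bg, bg, bg]]

silhouettes = {
    2: scores_to_mask(top_scores, object_class=1),  # {(1, 1)}; rows taken as x, columns as y
    0: {(1, z) for z in range(3)},                  # side view, cells in (y, z)
    1: {(1, z) for z in range(3)},                  # front view, cells in (x, z)
}
hull = carve_visual_hull(3, silhouettes)
print(sorted(hull))  # [(1, 1, 0), (1, 1, 1), (1, 1, 2)]
```

Because every voxel that projects inside all silhouettes is kept, the hull can only over-approximate the true shape, which is why hidden internal shape is hard to recover with this approach.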

【Paper Information】 Journal Name: IEEE Robotics and Automation Letters
Paper Title: HEAPGrasp: Hand-Eye Active Perception to Grasp Objects with Diverse Optical Properties
Authors: Ginga Kennis and Shogo Arai
DOI: 10.1109/LRA.2026.3653331

FACT BOX

  • Source: PR TIMES
  • Category: research