Ricoh Develops Multimodal Large Language Model with Reasoning Performance in "GENIAC" Phase 3

Ricoh has developed "Qwen3-VL-Ricoh-32B-20260227," a multimodal large language model with reasoning capabilities that can accurately interpret complex documents including charts and tables, as part of Phase 3 of the "GENIAC (Generative AI Accelerator Challenge)" project by METI and NEDO. A lightweight version, "Qwen3-VL-Ricoh-8B-20260227," is being released for free, aiming to streamline knowledge utilization within enterprises.
New product | NQ 0/100 | Source: PR Times

Published: March 30, 2026 at 20:10

Ricoh Co., Ltd. (President and CEO: Akira Oyama) has completed development of "Qwen3-VL-Ricoh-32B-20260227," a base model for a multimodal large language model with reasoning capabilities*2 (hereinafter, "reasoning LMM") that can accurately interpret diverse documents, including those containing charts and tables. The model was developed in Phase 3 of "GENIAC (Generative AI Accelerator Challenge)"*1, a project run by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO) to strengthen Japan's generative AI development capabilities. A key feature of the model is its ability to understand complex documents through multi-step reasoning.

Furthermore, starting today we are releasing, free of charge, a lightweight model, "Qwen3-VL-Ricoh-8B-20260227," built with the technology used to develop this model. Ricoh's independently developed benchmark tool*3, which specializes in evaluating reasoning performance, will also be released at a later date.

[Public Access]
https://huggingface.co/ricoh-ai/Qwen-3-VL-Ricoh-8B-20260227



**1. Background of Initiatives and Societal Challenges**

A large multimodal model (LMM) is an AI technology that can process multiple types of data at once, such as text, images, audio, and video. Because it can handle such a wide range of data formats, expectations for it are high: LMMs perform well on tasks such as summarizing text from screenshots and answering questions about charts and tables.

Companies accumulate a wide variety of documents, including transactional records such as invoices and receipts, management materials such as business strategies and plans, service manuals, internal technical standards, and quality control standards. These documents contain not only text but also figures, tables, and images. Expectations are high that companies can use them more efficiently and draw on them to create new value and innovation. At the same time, problems have been pointed out, such as "text searches do not return the intended results" and "search functions alone are not enough to make full use of documents."

In recent years, companies have also faced management challenges such as working more efficiently amid labor shortages, passing on skills as veteran employees retire, and translating documents into multiple languages as the number of foreign workers grows. Against this backdrop, the need to use internal corporate knowledge efficiently with the help of AI is growing.

In Phase 2 of GENIAC, which began in August 2024, Ricoh developed a 70-billion-parameter LMM and released its base model and an independently developed benchmark tool free of charge. In January 2026, Ricoh also developed a compact 32-billion-parameter LMM based on "Qwen2.5-VL-32B-Instruct," a model from the Qwen family of large language models (LLMs) developed and provided by Alibaba Cloud in China.



**2. Achievements in This Phase**

In Phase 3, building on "Qwen3-VL-32B-Instruct*4," we developed "Qwen3-VL-Ricoh-32B-20260227," a base model for a reasoning LMM that can accurately understand complex documents through multi-step reasoning. Thanks to refinements in training methods such as reinforcement learning*5 and curriculum learning*6, the model can relate charts and tables that span multiple pages to one another and generate highly accurate answers even to questions that demand advanced reading comprehension. For reinforcement learning, we designed our own reward functions to improve training efficiency while suppressing overfitting. For curriculum learning, we optimized the difficulty settings and the pace of learning.
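
As a rough illustration of the two training ideas mentioned above, the following Python sketch shows one common way to stage a curriculum (easier samples first, harder ones added gradually) together with a placeholder reward function. This is not Ricoh's implementation, whose details are not given in this release. The `Sample` fields, the 0-to-1 difficulty scale, the linear pacing, and the exact-match reward are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical training example (all fields are assumptions)."""
    question: str
    answer: str
    difficulty: float  # assumed scale: 0.0 = easiest, 1.0 = hardest


def curriculum_stages(samples, num_stages):
    """Yield cumulative training pools, easiest samples first.

    Stage k holds the easiest k/num_stages share of the data, so harder
    samples are introduced gradually (a linear pacing assumption).
    """
    ordered = sorted(samples, key=lambda s: s.difficulty)
    for stage in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * stage // num_stages)
        yield ordered[:cutoff]


def toy_reward(predicted: str, reference: str) -> float:
    """Return 1.0 for a match after whitespace/case normalization, else 0.0.

    A placeholder for a verifiable reward; the reward functions actually
    used are not described in the source.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

In a real training loop, each stage's pool would be handed to the trainer in turn, and the reward would score the model's sampled answers during reinforcement learning. How fast the pool grows and how difficulty is measured are exactly the kinds of settings the text says were tuned.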

As a result of these efforts, we confirmed benchmark results comparable to those of large commercial models such as "Gemini2.5-Pro" (as of February 17, 2026). Ricoh developed its own benchmark tool to evaluate the model's reasoning performance.