Stockmark Inc. Gold Sponsors JSAI2026, Presents Paper Verifying Limitations of Existing VLMs in Complex Document Understanding
Stockmark Inc. will be a gold sponsor of the 2026 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2026) and will present a paper examining the limitations of existing vision-language models (VLMs) in understanding complex documents. The research asks how accurately AI can comprehend intricate real-world documents.
Published: April 28, 2026 at 22:00
Stockmark Inc. (Headquarters: Minato-ku, Tokyo; Representative Director and CEO: Tatsu Hayashi; hereinafter "the Company"), which independently develops domestic generative AI foundations and provides generative AI services, announces that it will be a gold sponsor of the 2026 Annual Conference of the Japanese Society for Artificial Intelligence (the 40th conference; hereinafter "JSAI2026"), to be held in June 2026.
Furthermore, at this conference, Takahashi and Aita, researchers from the Company's LLM organization, will present a paper on benchmarking large vision-language models (VLMs) on cultivation calendar images.
This research does not signal a new business venture in the agricultural sector. Rather, it uses cultivation calendars, a representative example of documents that are extremely difficult to analyze, to examine a cross-industry technical challenge: how accurately AI can understand complex real-world documents. The Company will apply the insights gained from this research to advance technologies for structuring business documents, specifications, drawings, and internal knowledge, as well as the foundational technologies that support corporate AI BPR.
Background of Sponsorship
The Company's mission is "to reinvent the mechanism of value creation and advance humanity." We are engaged in product development and research and development utilizing cutting-edge natural language processing technology and generative AI technologies such as LLMs. Currently, we provide "Aconnect," an AI agent that supports R&D sites in manufacturing, and "SAT Agent Cockpit," which structures complex internal data into an AI-utilizable format, thereby promoting corporate AI BPR (Business Process Re-engineering).
The Japanese Society for Artificial Intelligence (JSAI), which brings together Japan's leading AI researchers, plays an extremely important role in the development of AI technology in Japan and its real-world application. To further energize and develop the domestic AI research community, the Company will again serve as a gold sponsor this year.
Presented Paper: Verification of VLM Limitations Using Cultivation Calendars
At this conference, Takahashi and Aita of the Company will present research titled "FiT-QA: VQA Benchmark for Cultivation Calendars – Dataset Construction and Limitations of General-Purpose VLMs".
A cultivation calendar is a practical document that compactly integrates operational information regarding crop cultivation, along with tables, figures, photos, annotations, and time-series information, all on a single page. Compared to general documents, it requires reading information across multiple domains and integrating reasoning according to context, making it an extremely difficult comprehension target for AI.
In this research, we propose "FiT-QA (Figures and Tables Question Answering)," a VQA (Visual Question Answering) benchmark for these cultivation calendar images. FiT-QA contains 347 images and 1,152 QA pairs in two subsets: easy-QA, which was automatically generated and then manually edited and verified, and difficult-QA, which was manually written to require integrated reasoning across multiple domains. In an evaluation with high-performance general-purpose VLMs, errors remained even on easy-QA, and the models answered only a limited share of difficult-QA questions correctly.
These results clarify the limitations of directly applying existing general-purpose VLMs to complex, practical real-world documents. FiT-QA is being released as a practical benchmark for future model development and evaluation.
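As a rough illustration of how a benchmark split into easy-QA and difficult-QA might be scored, the following minimal Python sketch computes accuracy separately for each subset. It is not the paper's evaluation code: the `QAItem` fields, the `normalize` helper, and the use of normalized exact-match scoring are all assumptions made for illustration, and the sample questions and answers are invented placeholders rather than FiT-QA data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """One question-answer pair tied to a calendar image (hypothetical schema)."""
    image_id: str   # assumed identifier for a cultivation calendar image
    split: str      # "easy" or "difficult", mirroring easy-QA / difficult-QA
    question: str
    answer: str     # reference answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_split(items, predictions):
    """Return {split: exact-match accuracy}.

    `predictions` maps (image_id, question) to the model's answer string.
    A missing prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.split] += 1
        predicted = predictions.get((item.image_id, item.question), "")
        if normalize(predicted) == normalize(item.answer):
            correct[item.split] += 1
    return {split: correct[split] / total[split] for split in total}


# Invented placeholder data, not taken from FiT-QA.
items = [
    QAItem("img-001", "easy", "When does sowing start?", "Early April"),
    QAItem("img-001", "easy", "Which crop is shown?", "Tomato"),
    QAItem("img-002", "difficult", "Which task overlaps with harvest?", "Pest control"),
]
predictions = {
    ("img-001", "When does sowing start?"): "early  april",
    ("img-001", "Which crop is shown?"): "Cucumber",
    ("img-002", "Which task overlaps with harvest?"): "Irrigation",
}

scores = accuracy_by_split(items, predictions)
print(scores)  # {'easy': 0.5, 'difficult': 0.0}
```

Reporting the two subsets separately, rather than as a single pooled score, keeps errors on the simpler questions visible alongside the harder integrated-reasoning cases. Exact matching is the simplest possible scoring rule; an actual evaluation may use a different metric.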
Presentation Session Details
・Title: FiT-QA: VQA Benchmark for Cultivation Calendars – Dataset Construction and Limitations of General-Purpose VLMs
・Presenters: Kosuke Takahashi, Hayato Aita (Stockmark Inc.)
Kazuki Miyawaki, Sumire Nakagawa, Taichi Kimura (Otaru University of Commerce)
Kazuma Kadowaki (Japan Research Institute, Ltd.)
Akio Kobayashi, Masahiro Otomo, Junichi Ishihara, Kenta Baba (National Agriculture and Food Research Organization, Institute of Agricultural Information)
・Date and Time: June 9, 2026 (Tuesday) 14:00〜15:30
・Venue: Venue Y (Exhibition Hall AB-1)
・Session URL: https://pub.confit.atlas.jp/ja/event/jsai2026/presentation/2Yin-A-50
Aim of Paper Presentation: From Agricultural Document Research to Business Document Analysis
At first glance, the "FiT-QA: Cultivation Calendar VQA Benchmark" presented here may appear to be agricultural research, outside the Company's primary business domain. However, the underlying technical challenge is closely tied to our business.
Stockmark has consistently positioned "advanced comprehension of complex documents" as a crucial theme in both R&D and solution provision. Through "Aconnect" and "SAT Agent Cockpit," we accurately structure practical documents that contain a mixture of tables, figures, and annotations, converting them into an AI-utilizable format.