Corpy & Co., Inc. (Headquarters: Chiyoda-ku, Tokyo; CEO: Kohei Yamamoto; hereinafter "Corpy") is an AI startup spun out of the University of Tokyo and Inria (the French National Institute for Research in Digital Science and Technology) that supports the realization of mission-critical AI with XAI and QAAI technologies. Corpy is pleased to announce the release of the results of its research and development on "Creation of an Implementation Guide for Enterprises from the Perspective of Operational Planning and Management," a theme concerning the safety evaluation of generative AI. The work began in April 2025 as part of the New Energy and Industrial Technology Development Organization (NEDO) project "Research and Development on Strengthening AI Safety / Promotion of Research, Development, and Verification for Strengthening AI Safety."
**Background of this initiative: Urgent need to ensure generative AI safety and respond to international standards**
With the rapid spread of generative AI, safety-related risks such as hallucinations (output that is factually incorrect), prompt injection (unintended behavior triggered by malicious input), and the generation of harmful content have become significant societal challenges. International moves toward AI regulation are accelerating, with the EU AI Act being phased in across Europe. In Japan as well, companies are under strong pressure to establish systems for managing and evaluating AI safety systematically. In this context, ISO/IEC 42001, an international standard for AI management systems, provides a framework for organizations to address AI risks. However, the standard does not specify concrete safety evaluation methods or criteria, so each organization must decide for itself "what to evaluate and in what order."
**Overview and Achievements of Research and Development**
In this project, Corpy aimed to bridge the practical gap between the requirements of ISO/IEC 42001 and the practice of generative AI safety evaluation, and developed the following deliverables:

Deliverable ①: Report "Generative AI Safety Evaluation Protocol Based on AI Management System and its Implementation Guide"

This implementation guide organizes a generative AI safety evaluation protocol, aligned with ISO/IEC 42001, into three phases (Analysis, Testing, Reporting). It lays out the whole sequence from risk assessment through test plan formulation and evaluation execution to report creation, so that practitioners can follow it concretely. Using a hypothetical customer support system built on a vision-language model (*1) as an example, it also presents specific evaluation cases such as integration testing (*3) for jailbreak attacks (*2) and unit testing (*5) for data poisoning detection (*4). In addition, it raises issues and provides examples for concepts that matter in practice, such as "access" and "agency" (*6) in risk assessment, "exposure mapping" (*8) when using LLM-as-a-Judge (*7) for safety evaluation, and "chain of trust" (*9) in supply chain management.

Deliverable ②: Generative AI Safety Evaluation Template (with examples)

This is a recording template that corresponds to each step of the evaluation protocol. It covers every process: business situation analysis, stakeholder analysis, system structure analysis, risk assessment, risk response plans, statement of applicability, test plans, test methods, and resources used for testing. It includes worked examples based on a hypothetical chatbot system, which companies can use as a reference when applying the template to their own AI systems.
**Features of the Deliverables**
The main features of these deliverables are as follows:

* **Consistency with ISO/IEC 42001:** Clearly outlines the process from the requirements of the AI management system standard to its application in generative AI safety evaluation.
* **Three-phase systematic evaluation protocol:** Clear steps of Analysis (PA) → Testing (PB) → Reporting (PC).
* **Practical evaluation examples:** Presents concrete test scenarios using vision-language models.
* **Templates:** Recording formats that can be used in conjunction with the evaluation protocol.
**Publication of Deliverables**
Deliverables ① and ② can be downloaded from the following link: https://corpy.app.box.com/s/fijqk4vu4nawvl15mxyt809xh3sp3jkq

* ① Report (Japanese version, PDF format)
* ② Evaluation Template (with examples, XLSX format)

*The deliverables are scheduled to be released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license after the determination of copyright ownership.
**Future Outlook**
Corpy will leverage the knowledge gained from this project to continue contributing to the international standardization and social implementation of AI safety evaluation technology. By promoting approaches compliant with AI management system standards, including ISO/IEC 42001, and supporting the creation of an environment where companies can utilize AI with confidence, we will accelerate the realization of "mission-critical AI."
**About this Project**
* Project Name: Research and Development on Strengthening AI Safety / Promotion of Research, Development, and Verification for Strengthening AI Safety
* Project Executor: New Energy and Industrial Technology Development Organization (NEDO)
* Implementation Structure: National Institute of Advanced Industrial Science and Technology (AIST), Citadel AI Inc., Corpy & Co., Inc.
* Corpy's Assigned Theme: Creation of an implementation guide for enterprises from the perspective of operational planning and management
* Period: April 2025 - March 2026
**Glossary**
*1 **Vision-Language Model (VLM):** A general term for AI models that can understand and process both images and text. They can answer questions based on images and describe the content of images.
*2 **Jailbreak Attack:** An attack method that attempts to bypass the safety constraints set for an AI using clever instructions (prompts) to elicit harmful output that should originally be rejected.
*3 **Integration Test:** A test that verifies whether multiple components (parts) of a system work correctly as a whole when combined. Here, it confirms the overall safety of the AI system.
*4 **Data Poisoning:** An attack method that intentionally contaminates AI training data with malicious data to mislead the AI's judgment and output.
*5 **Unit Test:** A test that verifies individual components (parts) of a system in isolation. Here, specific safety items are evaluated individually.
*6 **Access and Agency:** Two important perspectives in risk assessment. "Access" refers to what kind of data and functions the AI system can interact with, and "Agency" refers to the degree to which the AI can make autonomous judgments and actions. Higher access and agency imply greater risk.
*7 **LLM-as-a-Judge:** A method that utilizes large language models (LLMs) as "evaluators" to automatically determine the safety and quality of AI output. It reduces the burden of manual evaluation while ensuring a certain level of evaluation accuracy.
*8 **Exposure Mapping:** A method for systematically identifying and visualizing parts of an AI system that are vulnerable to external attacks or unauthorized use (exposure surface).
*9 **Chain of Trust:** A concept that ensures the continuous reliability of each element (training data, models, tools, etc.) within the AI system's supply chain (each stage of development and provision). If trust is compromised at any single point, it affects the safety of the entire system.
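
As a rough illustration of the chain-of-trust idea in *9, the Python sketch below treats each supply-chain element as an artifact with a digest recorded at approval time, and considers the chain trusted only if every element still matches its digest. The names `Artifact`, `approve`, `broken_links`, and `chain_is_trusted`, the sample contents, and the use of SHA-256 digest comparison are all assumptions made for this example; they are not taken from the report or the template.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Artifact:
    """One supply-chain element (hypothetical structure, not from the report)."""
    name: str
    content: bytes
    expected_sha256: str  # digest recorded when the element was approved


def approve(name: str, content: bytes) -> Artifact:
    """Record an element together with the digest of its approved content."""
    return Artifact(name, content, hashlib.sha256(content).hexdigest())


def broken_links(chain: list[Artifact]) -> list[str]:
    """Return the names of elements whose content no longer matches its digest."""
    return [
        a.name
        for a in chain
        if hashlib.sha256(a.content).hexdigest() != a.expected_sha256
    ]


def chain_is_trusted(chain: list[Artifact]) -> bool:
    """The chain is trusted only if every single element verifies."""
    return not broken_links(chain)


# Illustrative chain: training data, model, and tool, as listed in the glossary.
chain = [
    approve("training_data", b"curated examples"),
    approve("model", b"model weights v1"),
    approve("tool", b"search plugin v1"),
]

# Simulate tampering with one element after it was approved.
tampered = [
    replace(a, content=b"poisoned examples") if a.name == "training_data" else a
    for a in chain
]

print(chain_is_trusted(chain))     # every element verifies
print(broken_links(tampered))      # names of the elements that fail
print(chain_is_trusted(tampered))  # one failure breaks the whole chain
```

The all-or-nothing check in `chain_is_trusted` mirrors the point made in the glossary entry: a single compromised element is enough to withdraw trust from the entire system.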
**About Corpy & Co.**
* Company Name: Corpy & Co., Inc.
* Established: March 2017
* Head Office: 1-44-11 Kanda Jinbocho, Chiyoda-ku, Tokyo
* Representative Director: Kohei Yamamoto
* Website: https://corpy.co.jp/
**Inquiries regarding this press release**
Corpy & Co., Inc. Public Relations: pr@corpy.co.jp
FACT BOX
- Source: PR TIMES
- Category: research