[University Entrance Exam x Generative AI Project] Latest AI Surpasses Top Scorers at University of Tokyo and Kyoto University, Scoring Perfect Marks in Some Subjects

LifePrompt Inc. tested the latest AI models, including ChatGPT 5.2 and Gemini 3 Pro, on the University of Tokyo and Kyoto University entrance exams. Several models surpassed the highest human scores, showing a dramatic gain in reasoning capability in just one year.
Source: PR Times

Published: April 28, 2026 at 21:56

In February 2026, LifePrompt Inc. (Headquarters: Shinjuku-ku, Tokyo; CEO: Satoshi Endo) had the latest generative AI models (ChatGPT 5.2 Thinking, Gemini 3 Pro Preview, and Claude 4.5 Opus) answer entrance exam questions for the University of Tokyo and Kyoto University.

The answers were graded with the cooperation of Kawaijuku instructors and KIES Inc. ChatGPT 5.2 Thinking and Gemini 3 Pro Preview scored above the highest human scores (the level of the top-ranked examinee) in all six admission categories at the University of Tokyo and in almost all departments and faculties at Kyoto University. Perfect marks were recorded in multiple subjects, most notably Mathematics.

Detailed results are available on note: https://note.com/lifeprompt/n/n85674c186fbc

■ Key Verification Results

▼ University of Tokyo (Most difficult: Science III / Max 550 points)
- ChatGPT 5.2 Thinking: 503.59
- Gemini 3 Pro Preview: 496.54
- Claude 4.5 Opus: 451.99
- Reference: 2026 Science III Highest Human Score: 453.60
* ChatGPT and Gemini surpassed the highest human scores in all six categories (Science I, II, III / Humanities I, II, III).

▼ Kyoto University (Most difficult: Faculty of Medicine / Max 1275 points)
- ChatGPT 5.2 Thinking: 1176.38
- Gemini 3 Pro Preview: 1122.75
- Claude 4.5 Opus: 1005.25
- Reference: 2026 Faculty of Medicine Highest Human Score: 1098.25
* ChatGPT exceeded the highest human score in all 19 departments at Kyoto University, Gemini in 18, and Claude in 14.
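
To put the headline figures above in perspective, the short sketch below computes each model's margin over the highest human score and its share of the maximum possible points. The scores are copied from the results listed above; the table layout and the `summarize` helper are illustrative and are not part of LifePrompt's system.

```python
# Scores copied from the results above; the structure of this table is illustrative.
RESULTS = {
    "UTokyo Science III": {
        "max": 550.0,
        "human_top": 453.60,
        "ai": {
            "ChatGPT 5.2 Thinking": 503.59,
            "Gemini 3 Pro Preview": 496.54,
            "Claude 4.5 Opus": 451.99,
        },
    },
    "KyotoU Faculty of Medicine": {
        "max": 1275.0,
        "human_top": 1098.25,
        "ai": {
            "ChatGPT 5.2 Thinking": 1176.38,
            "Gemini 3 Pro Preview": 1122.75,
            "Claude 4.5 Opus": 1005.25,
        },
    },
}


def summarize(results):
    """Return (exam, model, margin over human top score, percent of max) rows."""
    rows = []
    for exam, data in results.items():
        for model, score in data["ai"].items():
            margin = round(score - data["human_top"], 2)
            share = round(100 * score / data["max"], 1)
            rows.append((exam, model, margin, share))
    return rows


if __name__ == "__main__":
    for exam, model, margin, share in summarize(RESULTS):
        print(f"{exam:28s} {model:22s} {margin:+8.2f} pts  {share:5.1f}% of max")
```

Read this way, Claude 4.5 Opus finished 1.61 points below the highest human score in UTokyo Science III, while ChatGPT 5.2 Thinking finished 49.99 points above it.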

▼ Subjects with Perfect Scores
- UTokyo Math (Science/120 pts): ChatGPT, Gemini
- UTokyo Math (Humanities/80 pts): ChatGPT, Gemini
- KyotoU Math (Science/200 pts): ChatGPT, Gemini
- KyotoU Math (Humanities/150 pts): ChatGPT
- KyotoU Chemistry (100 pts): ChatGPT

In last year's verification, the top AI score for UTokyo Science Math was 38 points. This year it reached full marks (120 points), a quantitative sign of how quickly AI reasoning has advanced.
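
The year-over-year jump can be made concrete with a little arithmetic. The sketch below assumes that last year's UTokyo Science Math paper was also scored out of 120 points, which the text above does not state; the `yoy_change` helper is illustrative.

```python
LAST_YEAR_TOP = 38   # top AI score last year, from the text above
THIS_YEAR_TOP = 120  # full marks for UTokyo Math (Science), from the list above
MAX_POINTS = 120     # assumption: the maximum was the same in both years


def yoy_change(prev, curr, max_points):
    """Summarize the change between two scores on the same point scale."""
    return {
        "gain_points": curr - prev,
        "prev_percent": round(100 * prev / max_points, 1),
        "curr_percent": round(100 * curr / max_points, 1),
        "ratio": round(curr / prev, 2),
    }


if __name__ == "__main__":
    print(yoy_change(LAST_YEAR_TOP, THIS_YEAR_TOP, MAX_POINTS))
```

Under that assumption, the top score rose by 82 points, from roughly 32% of the maximum to 100%.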

■ Methodology
To ensure fairness, the company ran the exams through its own automated test-taking system.
- Entrance exam PDFs were converted to images page by page and sent to the AI models via API.
- Human intervention was eliminated by using direct system-to-system communication instead of a chat interface.
- Prompts were standardized across all subjects (for example, restricting answers to high school curriculum knowledge and requiring LaTeX output).
- No web browsing was used; answers were based solely on the models' trained knowledge and reasoning.
- Written (free-response) answers were graded by Kawaijuku instructors using the same standards as for human students.
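
The steps above can be sketched roughly as follows. This is a minimal illustration, not LifePrompt's proprietary system: the prompt wording, the `ExamRequest` structure, and `build_request` are all hypothetical, and the sketch stops at building the request body rather than calling any particular vendor's API.

```python
import base64
import json
from dataclasses import asdict, dataclass

# Hypothetical shared instructions; the actual prompt wording is not published in the source.
STANDARD_PROMPT = (
    "Answer the exam questions shown in the attached page images. "
    "Use only knowledge within the high school curriculum. "
    "Write mathematical expressions in LaTeX."
)


@dataclass(frozen=True)
class ExamRequest:
    """One request per exam paper (hypothetical structure)."""
    subject: str
    prompt: str
    pages_b64: tuple           # one base64 string per page image, in page order
    web_browsing: bool = False  # browsing stays off, as in the methodology

    def to_json(self) -> str:
        # Serialize to a JSON body that a system-to-system client could send.
        return json.dumps(asdict(self))


def build_request(subject, page_images):
    """Encode page images and attach the same prompt regardless of subject."""
    pages = tuple(base64.b64encode(img).decode("ascii") for img in page_images)
    return ExamRequest(subject=subject, prompt=STANDARD_PROMPT, pages_b64=pages)


if __name__ == "__main__":
    # Placeholder bytes stand in for real page images rendered from a PDF.
    request = build_request("Mathematics", [b"page-1-image", b"page-2-image"])
    print(request.to_json()[:80])
```

Keeping the prompt in a single constant is one simple way to guarantee that every subject receives identical instructions, which is the property the standardization step is meant to protect.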

■ Analysis by Kawaijuku Instructors (Excerpts)
Instructors pointed out both strengths and weaknesses:
"All three AIs far exceeded expectations. ChatGPT's answering capability was especially astounding." (Mr. Akira Mukai, Biology)
"All AIs produced passing-level answers. The processing speed is beyond comparison with humans." (Mr. Tadashi Ogura, Japanese History)

Clear weaknesses also emerged:
- Image recognition: Challenges in recognizing structural formulas, graphs, and maps (especially Claude).
- Logical composition: Weakness in presenting logical and causal relationships despite vast knowledge.
- Output control: Frequent failures to follow character limits or physical answer sheet constraints.
- Convention bias: Missteps caused by prioritizing Western physics conventions over those used in Japan.
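
The output-control weakness is the easiest of these to check mechanically. The sketch below shows one way a character-limit check might look. The counting rule (every character counts, line breaks do not) is an assumption, since the source does not describe how limits were enforced, and `check_char_limit` is a hypothetical helper.

```python
def check_char_limit(answer: str, limit: int) -> dict:
    """Check an answer against a character limit.

    Assumed rule: every character counts as one, as on a grid-style
    answer sheet, and line breaks are ignored.
    """
    used = len(answer.replace("\r", "").replace("\n", ""))
    return {
        "used": used,
        "limit": limit,
        "within_limit": used <= limit,
        "over_by": max(0, used - limit),
    }


if __name__ == "__main__":
    print(check_char_limit("abc\ndef", 5))
```

A check like this could be run on each model answer before grading to flag responses that would not fit the answer sheet.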

■ CEO Comment
"I was genuinely moved to see the highest scores ever for UTokyo. This verification clarified which tasks AI can score perfectly on and which it cannot. In business, success depends on designing tasks in a way that AI can solve. The intelligence of foundational models has been proven on the common ground of entrance exams. From here, the competition moves to specific domains—how to connect AI to proprietary data and operations for business impact. Looking at the leap from 38 points to full marks in math in one year, we must rethink our systems rather than adjusting to current AI limits."