RAG AI Chatbot 'chai+': Announcement of Patent Acquisition for '3-Stage Hybrid Search Engine' Combining Vector, Keyword, and Semantic Re-ranking
Defide Inc. has acquired a patent for its unique search engine technology used in the RAG-type AI chatbot 'chai+'. By combining three methods (vector search, keyword search, and semantic re-ranking), it structurally suppresses hallucinations and achieves high accuracy even with specialized terminology.
📋 Article Processing Timeline
- 📰 Published: April 28, 2026 at 23:00
- 🔍 Collected: April 28, 2026 at 14:31
- 🤖 AI Analyzed: April 28, 2026 at 15:53 (1h 21m after Collected)
AI/DX consulting firm Defide Inc. (Akasaka, Minato-ku, Tokyo; CEO: Tetsuya Yamamoto) is pleased to announce that it has acquired Patent No. 7851525, 'A program that searches documents and responds to user questions,' for its RAG-type AI chatbot 'chai+.'
This patented technology is a unique hybrid search engine that combines three search methods: (1) Vector Search (semantic similarity), (2) Keyword Search (token match), and (3) Semantic Re-ranking. It has been recognized as the core technology of chai+, addressing at once two problems that are difficult to avoid with a single search method: missed information (relevant documents not retrieved) and hallucinations (factual errors by the AI).
■ Background: Why is Generative AI often called 'Unusable'?
General-purpose generative AI, such as ChatGPT, has a fundamental limitation: it cannot draw on company-specific information (internal regulations, product manuals, contracts, FAQs, etc.) when answering. Furthermore, even when RAG (Retrieval-Augmented Generation) is introduced, configurations that rely solely on vector search often retrieve 'semantically similar but irrelevant documents' or lose significant accuracy on questions containing technical jargon or proper nouns.
▶ Three Limitations of Conventional RAG:
1. Vector search alone is inaccurate for questions containing technical terms or proper nouns.
2. Keyword search alone fails to retrieve relevant documents when the wording differs.
3. Insufficient search accuracy directly leads to hallucinations.
The 3-stage hybrid search engine was developed to fundamentally resolve these bottlenecks.
■ Patent No. 7851525: How the 3-Stage Hybrid Search Engine Works
Title of Invention: Program for searching documents and responding to user questions
Patent Information (J-PlatPat): https://www.j-platpat.inpit.go.jp/c1801/PU/JP-7851525/15/ja
The core of this patent is to split documents into page-level chunks (small search units) and to build and maintain two indices for each chunk in parallel: an 'Embedding Vector' index and a 'Token (Keyword)' index. For each query, the optimal chunk is selected through three search phases, and the response clearly indicates to the user the source document to which that chunk belongs.
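The page-level chunking and the two parallel indices described above can be sketched as follows. This is a minimal illustration, not Defide's implementation: the input format (a mapping from document name to a list of page texts), the `build_indices` function, the chunk-ID scheme, and the character-trigram `embed` function (a hypothetical stand-in for a real embedding model) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Simple whitespace tokenizer; real systems (especially for Japanese) need a proper tokenizer.
    return text.lower().split()

def embed(text):
    # Hypothetical stand-in for an embedding model: character-trigram counts.
    s = f" {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def build_indices(documents):
    """documents: {document name: [page 1 text, page 2 text, ...]} (assumed input format)."""
    chunks = {}                      # chunk_id -> metadata, including the source document
    vector_index = {}                # chunk_id -> embedding vector
    token_index = defaultdict(set)   # token -> {chunk_id, ...} (inverted index)
    for name, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            cid = f"{name}#p{page_no}"  # one chunk per page
            chunks[cid] = {"source": name, "page": page_no, "text": text}
            vector_index[cid] = embed(text)
            for tok in tokenize(text):
                token_index[tok].add(cid)
    return chunks, vector_index, token_index

# Usage: two pages of one (hypothetical) manual.
chunks, vectors, tokens = build_indices({"manual.pdf": ["Leave policy", "Expense policy"]})
print(sorted(tokens["policy"]))  # ['manual.pdf#p1', 'manual.pdf#p2']
```

Keeping the source name and page on every chunk is what later allows a response to point back to the document it came from.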
[Search Phases]
STEP 1: Vector Search (Embedding Vector) - Rapidly finds chunks that are semantically similar to the question in vector space. Retrieves 'semantically close' documents even when the keywords differ.
STEP 2: Keyword Search (Token/BM25) - Finds chunks whose tokens match keywords in the question. Covers expressions that semantic search handles poorly, such as technical jargon and proper nouns.
STEP 3: Semantic Re-ranking (Re-ranking score) - Merges the results of STEPs 1 and 2 and makes the final selection of the chunk most semantically relevant to the question. The three-stage filtering maximizes accuracy.
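The three search phases can be sketched as a small, self-contained Python program. This is a minimal illustration under stated assumptions, not Defide's patented implementation: the toy corpus, the character-trigram `embed` function (standing in for a real embedding model), and the `rerank_score` placeholder (standing in for a semantic re-ranking model such as a cross-encoder) are all hypothetical. The keyword stage uses the standard Okapi BM25 formula with the typical parameter values `k1=1.5` and `b=0.75`.

```python
import math
from collections import Counter

# Hypothetical toy corpus of page-level chunks: (chunk_id, text).
CHUNKS = [
    ("hr_manual.pdf#p3", "employees may carry over unused paid leave to the next year"),
    ("product_spec.pdf#p7", "the xr-200 sensor supports a sampling rate of 1 khz"),
    ("contract_a.pdf#p2", "the contract may be terminated with thirty days notice"),
]

def tokenize(text):
    return text.lower().split()

def embed(text):
    # Stand-in for a learned embedding model: character-trigram counts.
    s = f" {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query, texts, k1=1.5, b=0.75):
    # Okapi BM25 score of every text against the query.
    docs = [tokenize(t) for t in texts]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rerank_score(query, text):
    # Placeholder re-ranker (hypothetical): blends trigram similarity with query-term coverage.
    # A production system would typically score (query, chunk) pairs with a cross-encoder model.
    q = set(tokenize(query))
    if not q:
        return 0.0
    coverage = len(q & set(tokenize(text))) / len(q)
    return 0.5 * cosine(embed(query), embed(text)) + 0.5 * coverage

def hybrid_search(query, chunks, k=2):
    texts = [t for _, t in chunks]
    idx = range(len(chunks))
    qv = embed(query)
    vec = [cosine(qv, embed(t)) for t in texts]        # STEP 1: vector search
    kw = bm25_scores(query, texts)                     # STEP 2: keyword (BM25) search
    top_vec = sorted(idx, key=lambda i: vec[i], reverse=True)[:k]
    top_kw = sorted(idx, key=lambda i: kw[i], reverse=True)[:k]
    pool = set(top_vec) | set(top_kw)                  # merge candidates from both stages
    best = max(pool, key=lambda i: rerank_score(query, texts[i]))  # STEP 3: re-rank
    return chunks[best]

print(hybrid_search("xr-200 sampling rate", CHUNKS)[0])
print(hybrid_search("how many days notice to terminate the contract", CHUNKS)[0])
```

Because the candidate pool is the union of each stage's top-k results, a chunk found by only one method (for example, a proper-noun match that only BM25 catches) can still be selected at the re-ranking step.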
By generating responses based only on chunks selected through this 3-stage process, the risk of hallucinations is structurally suppressed. Furthermore, by showing the source document, it ensures both reliability and transparency of the answer.
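The idea of answering only from the selected chunks and showing their source can be sketched as two small helpers. This is a hypothetical illustration: the function names, the chunk dictionary shape (`source`, `page`, `text`), and the prompt wording are assumptions, and the call to the language model itself is deliberately left out.

```python
def build_grounded_prompt(question, selected):
    # Only the selected chunks are passed as context; the model is told not to go beyond them.
    context = "\n\n".join(
        f"[{i}] ({c['source']}, p.{c['page']})\n{c['text']}"
        for i, c in enumerate(selected, start=1)
    )
    return (
        "Answer using ONLY the numbered excerpts below. "
        "If they do not contain the answer, say that you cannot answer.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

def attach_sources(answer, selected):
    # Append the source documents of the chunks used, so the user can verify the answer.
    sources = sorted({f"{c['source']} (p.{c['page']})" for c in selected})
    return f"{answer}\n\nSources: {', '.join(sources)}"

# Usage with one hypothetical selected chunk; the prompt would be sent to an LLM (not shown).
selected = [{"source": "hr_manual.pdf", "page": 3,
             "text": "Unused paid leave carries over to the next year."}]
prompt = build_grounded_prompt("Can unused leave be carried over?", selected)
print(attach_sources("Yes, unused paid leave carries over.", selected))
```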
■ Comparison with Conventional RAG / General Generative AI
- Search Method: Conventional RAG uses a single method (vector search only); chai+ uses a 3-stage hybrid (patented).
- Technical Terms/Proper Nouns: Conventional RAG tends to miss them; chai+'s keyword search aims for zero misses.
- Hallucination: Frequent in conventional RAG; greatly reduced risk in chai+, which uses only chunks drawn from the company's own documents.
- Evidence: A black box in conventional RAG; chai+ clearly indicates the source document.
■ Implications for Corporate AI Utilization
▶ Business Challenges Solved by this Patent:
- Handling inquiries about internal regulations and manuals - Significantly reduces the workload of HR, legal, and general affairs departments.
- FAQ support based on product specifications and technical documents - Enables high-precision customer responses with reduced risk of errors.
- Searching and summarizing contracts and reports - Instant retrieval from large internal document stores.