Building a High-Precision Travel Guide: Optimizing RAG for a Jeju Island AI Chatbot

Introduction: The Limits of Vanilla LLMs

When building a tourism assistant for a specific location like Jeju Island, generic LLMs often fall short. They might hallucinate trail names, provide outdated cafe hours, or miss the subtle nuances of local documents. To solve this, I developed a RAG (Retrieval-Augmented Generation) based chatbot designed to provide reliable, document-grounded travel advice.

The Problem: Noise and Hallucination

In early prototypes, the system faced three major hurdles:

Semantic Mismatch: Standard vector search sometimes retrieved documents that were mathematically similar but contextually irrelevant.
Stagnant UX: Waiting for an entire paragraph to generate led to a jarring user experience, especially on mobile networks.
Document Complexity: Turning raw PDFs into a searchable database without losing the structural context was surprisingly difficult.

The Challenge: Improving Retrieval Precision

The core challenge was not just "finding" information, but "ranking" it correctly. If the top-3 retrieved chunks aren't the most relevant ones, the LLM can still produce a confident but incorrect answer, because it has no way of knowing that better context existed. I needed a way to filter out the noise before it reached the prompt.

The Solution: A Multi-Stage RAG Pipeline

1. Robust Chunking and Embedding

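Instead of splitting the text into disjoint blocks, I implemented a sliding window approach: each chunk is 600 characters long and shares 150 characters with the previous one, so a sentence cut at one chunk boundary is still likely to appear intact in the neighboring chunk. I used the BAAI/bge-m3 model for embeddings, known for its high performance in multilingual contexts.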

# Configuration for high-quality retrieval
CHUNK_SIZE = 600
OVERLAP = 150
RAG_MODEL_NAME = 'BAAI/bge-m3'

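# Sliding-window chunking: consecutive chunks share `overlap` characters
def get_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between chunk starts (450 with the defaults)
    return [text[i:i + size] for i in range(0, len(text), step)]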
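
To make the window arithmetic concrete, here is a small sketch that computes only the character offsets of each chunk. The helper name `chunk_spans` is hypothetical and not part of the original project. It also shows one optional variation, stopping as soon as a window reaches the end of the text, which the code above does not do.

```python
def chunk_spans(text_len, size=600, overlap=150):
    """Return the (start, end) character offsets of each sliding window.

    Hypothetical helper for illustration; not part of the original project.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap  # 600 - 150 = 450 characters between window starts
    spans = []
    for start in range(0, text_len, stride):
        end = min(start + size, text_len)
        spans.append((start, end))
        if end == text_len:
            # This window already reaches the end of the text, so a further
            # window would only repeat characters that are already covered.
            break
    return spans


for start, end in chunk_spans(1500):
    print(f"chunk covers characters [{start}, {end})")
```

For a 1,500-character text this yields windows starting at 0, 450, and 900. The list comprehension in get_chunks would also emit a fourth chunk starting at 1350, whose 150 characters are already contained in the third chunk. Whether to keep that short tail is a design choice; dropping it saves one embedding per document without losing any text.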

2. High-Precision Reranking (The "Secret Sauce")

To solve the "semantic mismatch" problem, I introduced a Cross-Encoder reranker (Dongjin-kr/ko-reranker). The system first retrieves 15 candidates via vector search (ChromaDB). The reranker then scores each query-chunk pair jointly, reading both texts together rather than comparing two precomputed vectors, and only the 3 highest-scoring chunks go into the prompt.

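# Two-stage retrieval logic
import numpy as np

# 1. Initial retrieval: fetch 15 candidates by vector similarity
results = collection.query(query_texts=[query_text], n_results=15)
fetched_chunks = results["documents"][0]  # documents for the first (only) query

# 2. Reranking for precision: score each (query, chunk) pair
scores = rerank_model.predict([(query_text, chunk) for chunk in fetched_chunks])
sorted_indices = np.argsort(scores)[::-1]  # highest score first
final_context_chunks = [fetched_chunks[i] for i in sorted_indices[:3]]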
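
The reranking step can also be written as a small function that takes the scorer as a parameter, which makes it easy to test without loading a model. The sketch below is an illustration under assumptions: `rerank_top_k` and `word_overlap_scorer` are hypothetical names, and the toy scorer only counts shared words as a stand-in for a real cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank_top_k(
    query: str,
    chunks: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, chunk) pair and keep the k highest-scoring chunks."""
    if not chunks:
        return []  # nothing retrieved, so there is nothing to rerank
    scores = list(score_pairs([(query, chunk) for chunk in chunks]))
    if len(scores) != len(chunks):
        raise ValueError("scorer must return exactly one score per chunk")
    ranked = sorted(zip(chunks, scores), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def word_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for a cross-encoder: counts words shared by query and chunk."""
    return [
        float(len(set(q.lower().split()) & set(c.lower().split())))
        for q, c in pairs
    ]


candidates = [
    "Cafe opens at 9",
    "Hallasan trail opens at 6",
    "Seongsan sunrise trail",
    "Hallasan trail hours are 6 to 12",
]
top = rerank_top_k("hallasan trail hours", candidates, word_overlap_scorer, k=3)
for chunk, score in top:
    print(f"{score:.0f}  {chunk}")
```

In the real pipeline, `score_pairs` would presumably be `rerank_model.predict`, which receives the same list of (query, chunk) pairs. Returning the scores alongside the chunks also makes it possible to log them or to drop candidates below a threshold, should that prove useful.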

3. Real-Time UX with FastAPI Streaming

To make the bot feel responsive, I utilized FastAPI’s StreamingResponse combined with the OpenAI API’s stream=True option. This allows the UI to render the answer piece by piece (roughly token by token) as it is generated, instead of waiting for the full response.

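from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

# `client` (OpenAI client) and GPT_MODEL are defined elsewhere in the app
@app.post("/chat")
async def chat(request: Request):
    # ... (retrieval logic that builds `messages`)
    def generate():
        response = client.chat.completions.create(
            model=GPT_MODEL,
            messages=messages,
            stream=True  # Enabling real-time streaming
        )
        for chunk in response:
            if not chunk.choices:  # some chunks carry no choices; skip them
                continue
            content = chunk.choices[0].delta.content
            if content:
                yield content

    # Raw text is yielded without SSE "data:" framing, so serve it as plain text
    return StreamingResponse(generate(), media_type="text/plain; charset=utf-8")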
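
One detail worth noting: raw text pieces are not valid Server-Sent Events on their own. A text/event-stream body expects each message as one or more "data:" lines followed by a blank line. For anyone who wants an SSE-compatible endpoint, a minimal sketch of that framing follows. The helper names `sse_event` and `sse_stream` are hypothetical, and the "[DONE]" end marker is a convention assumed here, not something the SSE format requires.

```python
from typing import Iterable, Iterator


def sse_event(text: str) -> str:
    """Frame text as a single Server-Sent Events message."""
    # Normalize line endings, then emit one "data:" line per line of text.
    # The blank line at the end terminates the message.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return "".join(f"data: {line}\n" for line in normalized.split("\n")) + "\n"


def sse_stream(pieces: Iterable[str], done_marker: str = "[DONE]") -> Iterator[str]:
    """Wrap an iterable of text pieces as SSE messages, ending with a marker."""
    for piece in pieces:
        if piece:  # skip empty pieces rather than sending empty messages
            yield sse_event(piece)
    yield sse_event(done_marker)  # assumed convention so clients know when to stop


for message in sse_stream(["Hi", " there"]):
    print(repr(message))
```

With this in place, the endpoint could return `StreamingResponse(sse_stream(generate()), media_type="text/event-stream")`, and the client would concatenate the data payloads until it sees the end marker.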

Key Takeaways

Reranking is Essential: Moving from simple similarity search to a reranked pipeline reduced hallucinations by limiting the prompt to the three chunks the cross-encoder scored as most relevant to the question.
Overlap Matters: A 150-character overlap was the "sweet spot" for maintaining context between chunks, preventing the model from losing information at the split points.
Async/Streaming UX: In modern web apps, the perception of speed (streaming) is often more important than the actual execution time.

By combining ChromaDB, BGE-M3, and ko-reranker, I built a system that doesn't just "chat," but gives travelers exploring the beauty of Jeju Island answers grounded in the source documents.
