Building a High-Precision Travel Guide: Optimizing RAG for a Jeju Island AI Chatbot


Introduction: The Limits of Vanilla LLMs

When building a tourism assistant for a specific location like Jeju Island, generic LLMs often fall short. They might hallucinate trail names, provide outdated cafe hours, or miss the subtle nuances of local documents. To solve this, I developed a RAG (Retrieval-Augmented Generation) based chatbot designed to provide reliable, document-grounded travel advice.


The Problem: Noise and Hallucination

In early prototypes, the system faced three major hurdles:

  1. Semantic Mismatch: Standard vector search sometimes retrieved documents that were close to the query in embedding space but irrelevant to what the user was actually asking.

  2. Stagnant UX: Waiting for an entire paragraph to generate led to a jarring user experience, especially on mobile networks.

  3. Document Complexity: Turning raw PDFs into a searchable database without losing the structural context was surprisingly difficult.


The Challenge: Improving Retrieval Precision

The core challenge was not just "finding" information, but "ranking" it correctly. Only the top 3 retrieved chunks reach the prompt, so if they aren't the best ones, the LLM can produce a confident but incorrect answer. I needed a way to filter out the noise before it reached the prompt.


The Solution: A Multi-Stage RAG Pipeline

1. Robust Chunking and Embedding

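Instead of splitting the text into disjoint fixed-size blocks, I implemented a character-based sliding window so that consecutive chunks share some text and context is preserved across split points. For embeddings I used the BAAI/bge-m3 model, which is known for its strong performance in multilingual contexts.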

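# Configuration for high-quality retrieval
CHUNK_SIZE = 600   # characters per chunk
OVERLAP = 150      # characters shared by consecutive chunks
RAG_MODEL_NAME = 'BAAI/bge-m3'

# Sliding window chunking: each window starts (size - overlap) characters after the previous one
def get_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    return [text[i:i + size] for i in range(0, len(text), step)]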
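
To see what the overlap does in practice, here is a minimal, self-contained sketch that runs the same windowing on a 26-character string with a window of 10 and an overlap of 4. The function name `sliding_window_chunks` and the small sizes are chosen for illustration only and are not part of the project code.

```python
def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # distance between the starts of consecutive windows
    return [text[i:i + size] for i in range(0, len(text), step)]


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, size=10, overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
# yz
```

Each window repeats the last 4 characters of the previous one, so a sentence cut at a boundary still appears whole in one of the two chunks. Note that the final window ("yz") is already contained in the one before it. With the values used in this post (600 and 150) the stride is 450 characters, and the same kind of redundant tail appears whenever the last window starts within the final 150 characters of the text; dropping or merging it is one possible refinement.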

2. High-Precision Reranking (The "Secret Sauce")

To solve the "semantic mismatch" problem, I introduced a Cross-Encoder reranker (Dongjin-kr/ko-reranker). The system first retrieves 15 candidates via vector search (ChromaDB). The reranker then scores each query-chunk pair jointly, which is slower than comparing precomputed embeddings but more sensitive to context, and only the 3 highest-scoring chunks are kept.

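import numpy as np

# Two-stage retrieval logic
# 1. Initial retrieval: fetch 15 candidates by vector similarity
results = collection.query(query_texts=[query_text], n_results=15)
fetched_chunks = results["documents"][0]  # documents for the first (and only) query

# 2. Reranking for precision: score every (query, chunk) pair with the cross-encoder
scores = rerank_model.predict([(query_text, chunk) for chunk in fetched_chunks])
sorted_indices = np.argsort(scores)[::-1]  # highest score first
final_context_chunks = [fetched_chunks[i] for i in sorted_indices[:3]]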
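
The reranking step can also be isolated into a small helper that is easy to test without loading a model. The sketch below is an illustration under assumptions: `rerank` and `toy_overlap_scores` are hypothetical names, and the word-overlap scorer is only a stand-in for the cross-encoder, not how the real reranker computes relevance.

```python
from typing import Callable, Sequence

Pair = tuple[str, str]


def rerank(
    query: str,
    chunks: Sequence[str],
    score_pairs: Callable[[list[Pair]], Sequence[float]],
    top_k: int = 3,
) -> list[str]:
    """Return the top_k chunks ordered by descending relevance score."""
    if not chunks:
        return []
    scores = score_pairs([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def toy_overlap_scores(pairs: list[Pair]) -> list[float]:
    """Stand-in scorer: fraction of query words that also occur in the chunk."""
    scores = []
    for query, chunk in pairs:
        query_words = set(query.lower().split())
        chunk_words = set(chunk.lower().split())
        scores.append(len(query_words & chunk_words) / max(len(query_words), 1))
    return scores


# Placeholder strings, not real travel data
candidates = [
    "hallasan trail overview",
    "seongsan sunrise notes",
    "hallasan trail entrance parking",
    "udo ferry notes",
]
top = rerank("hallasan trail entrance", candidates, toy_overlap_scores, top_k=2)
print(top)
```

In the real pipeline, the cross-encoder's `predict` method would presumably be passed as `score_pairs` in place of the toy scorer, since it takes the same list of (query, chunk) pairs.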

3. Real-Time UX with FastAPI Streaming

To make the bot feel responsive, I used FastAPI's StreamingResponse together with the OpenAI client's stream=True option. This lets the UI render the answer incrementally, in small text fragments (roughly token by token), as it is generated.

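from fastapi import Request
from fastapi.responses import StreamingResponse

@app.post("/chat")
async def chat(request: Request):
    # ... (retrieval logic)
    def generate():
        response = client.chat.completions.create(
            model=GPT_MODEL,
            messages=messages,
            stream=True,  # Enabling real-time streaming
        )
        for chunk in response:
            # Some chunks carry no choices or an empty delta; skip them
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content
            if content:
                yield content

    # Raw text fragments are yielded, so plain text is declared here;
    # "text/event-stream" would require SSE "data:" framing of each fragment.
    return StreamingResponse(generate(), media_type="text/plain; charset=utf-8")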
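
One caveat with the `text/event-stream` media type: a browser `EventSource` expects every message to be framed as one or more `data:` lines followed by a blank line, so raw text fragments need wrapping before they are sent. The following is a minimal sketch of that framing, not the project's code; `to_sse` is a hypothetical helper, and the closing `done` event is an assumed convention rather than anything the format requires.

```python
from typing import Iterable, Iterator


def to_sse(pieces: Iterable[str]) -> Iterator[str]:
    """Wrap text fragments in Server-Sent Events framing."""
    for piece in pieces:
        if not piece:
            continue  # nothing to send for empty fragments
        # A fragment containing newlines must be sent as several data: lines
        lines = piece.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the client knows when to stop listening
    yield "event: done\ndata: \n\n"


frames = list(to_sse(["Hello", " wor\nld", ""]))
for frame in frames:
    print(repr(frame))
```

A generator like `generate()` could be wrapped as `to_sse(generate())` before being handed to `StreamingResponse`. If the frontend simply reads the response body with a stream reader instead of `EventSource`, this framing is unnecessary.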

Key Takeaways

  • Reranking is Essential: Moving from simple similarity search to a reranked pipeline reduced hallucinations, because the LLM now sees only the three chunks the cross-encoder scores as most relevant instead of whatever happens to be nearest in embedding space.

  • Overlap Matters: A 150-character overlap was the "sweet spot" for maintaining context between chunks, preventing the model from losing information at the split points.

  • Async/Streaming UX: In modern web apps, the perception of speed (streaming) is often more important than the actual execution time.

By combining ChromaDB, BGE-M3, and a Ko-Reranker, I built a system that doesn't just "chat," but provides document-grounded, high-quality information for travelers exploring the beauty of Jeju Island.
