The story behind building a production-grade RAG chatbot for campus KKP/PI guidelines: the path from hybrid search to contextual conversations turned out to be more complex than expected
One thing I noticed during college is that students often struggle more with understanding procedures than with the coursework itself.
On my campus, once students enter semester 6, they are required to take either KKP (internship program) or PI (research project). These courses are considered among the most important stages before graduation, but they also create a lot of confusion.
Ironically, the official guideline book already exists.
But even with the handbook available, students still repeatedly ask questions like:
At first, I thought:
“Maybe students are just too lazy to read.”
But after observing more carefully, I realized the real problem was different.
The handbook itself was long, dense, and difficult to navigate when someone only needed one specific answer. Most students did not want to read dozens of pages just to find one procedure.
As a result:
Some students were even too hesitant or embarrassed to ask lecturers directly, so they simply guessed the procedure themselves.
That was the moment I started thinking:
“What if students could simply chat with the handbook instead of manually searching through it?”
That simple question became the starting point of this project.
This project was fully self-initiated.
No lecturer asked me to build it. It was not a class assignment. I simply felt the problem was real enough to solve.
At first, my goal was actually very simple:
create a chatbot that could answer questions based on the official KKP/PI guideline documents.
But once I started building it, I quickly realized that creating a “working chatbot” was easy.
Creating a chatbot that gives answers that are actually helpful was much harder.
That realization completely changed the way I approached AI systems.
When I first built the system, I used a simple RAG pipeline.
Technically, it worked.
The chatbot could answer questions. The retrieval system returned relevant chunks. The responses looked convincing.
But when I tested it myself, something felt off.
Some answers were technically correct but still confusing. Some lacked context. Some referenced incomplete information. Some sounded intelligent while still being unhelpful.
That was the first time I realized something important:
In many AI systems, the biggest problem is not the model. It is how information is retrieved and structured.
From that point onward, I spent much more time improving retrieval architecture than changing the LLM itself.
One of the biggest technical challenges was document chunking.
At first, I used a common fixed-size chunking approach because that was what most tutorials recommended.
But the KKP/PI guideline documents had hierarchical structures:
When the document was split carelessly into fixed token sizes, the meaning broke apart.
For example:
The retrieval was still “technically relevant,” but the answers no longer felt useful for humans.
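To make the failure mode concrete, here is a minimal sketch of fixed-size chunking. It assumes whitespace-separated words as a stand-in for tokens, and the handbook excerpt is invented for illustration; it is not taken from the real KKP/PI guideline.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of `size` words, ignoring structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical handbook excerpt, invented for illustration only.
handbook = (
    "Section 2 Registration. "
    "Students submit the proposal form to the coordinator. "
    "Section 3 Final Report. "
    "The report is submitted after the supervisor signs it."
)

chunks = fixed_size_chunks(handbook, size=8)
for i, chunk in enumerate(chunks):
    # The middle chunk mixes the end of Section 2 with the heading of Section 3,
    # and the last chunk has lost its heading entirely.
    print(i, repr(chunk))
```

A retriever can still match the last chunk on keywords, but whoever reads it no longer knows which section the rule belongs to.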
That was when I implemented a parent-child chunking strategy:
This increased complexity significantly:
But the quality improvement was worth it.
And honestly, this was the project that made me truly understand that building AI products is often more about system design than model selection.
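The parent-child idea can be sketched roughly as follows. This is one common reading of the technique, not necessarily the exact design used in this project: small child chunks are matched against the query, and the full parent section is what gets returned. The word-overlap scoring stands in for a real embedding or hybrid search, and the section texts and names such as `build_index` and `retrieve_parent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int  # index into the parents list


def build_index(sections: list[str], child_size: int = 6):
    """Treat each section as a parent and split it into small child chunks."""
    parents = list(sections)
    children = []
    for pid, section in enumerate(parents):
        words = section.split()
        for i in range(0, len(words), child_size):
            children.append(Child(" ".join(words[i:i + child_size]), pid))
    return parents, children


def tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!:").lower() for w in text.split()}


def retrieve_parent(query: str, parents: list[str], children: list[Child]) -> str:
    """Match the query against children, then return the whole parent section."""
    q = tokens(query)
    best = max(children, key=lambda c: len(q & tokens(c.text)))
    return parents[best.parent_id]


# Hypothetical sections, invented for illustration only.
sections = [
    "Section 2 Registration: students submit the proposal form "
    "to the coordinator before the deadline.",
    "Section 3 Final Report: the report is submitted after the "
    "supervisor signs the approval page.",
]

parents, children = build_index(sections)
result = retrieve_parent("Who signs the approval page?", parents, children)
print(result)
```

The match happens on a small, precise chunk, but the answer is generated from the complete section, heading included. The cost is the extra bookkeeping between children and parents.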
One of the most valuable lessons from this project came from evaluation.
I used RAGAS to evaluate the system using metrics like:
At one point, I became too focused on improving the numbers.
I kept optimizing the system to increase evaluation scores.
And the scores improved.
But when I manually tested the chatbot again, the experience actually became worse.
The answers became:
That was the moment I realized:
Good evaluation metrics do not automatically create good user experiences.
So I changed my priorities.
Instead of optimizing purely for metrics, I started optimizing for:
Some metrics decreased slightly afterward. But the actual chatbot experience became much better.
This project taught me that engineering is often about balancing trade-offs, not maximizing a single number.
Another challenge was handling follow-up questions.
Humans naturally ask questions like:
“Then how long is it?”
But for a retrieval system, that sentence has almost no meaning without previous context.
So I started implementing:
The system learned to distinguish:
I also implemented query reformulation so ambiguous questions could be rewritten into clearer search queries before retrieval.
This made the chatbot feel much less robotic and much more conversational.
And strangely, this became one of my favorite parts of the project.
Because I realized I was no longer just building “an AI model.”
I was designing how humans interact with information.
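As a rough illustration of the follow-up handling described above, here is a minimal sketch. It uses a simple keyword heuristic to detect follow-ups and prepends the previous question to build a standalone query. A production system would more likely ask an LLM to do the rewriting, so treat this as a stand-in: the marker list, the function names, and the example history are all assumptions.

```python
# Words that often signal a question depends on earlier context (assumed list).
FOLLOW_UP_MARKERS = {"it", "that", "then", "this", "they", "those"}


def is_follow_up(question: str, max_words: int = 6) -> bool:
    """Heuristic: short questions containing a referring word are follow-ups."""
    words = [w.strip(".,?!").lower() for w in question.split()]
    return len(words) <= max_words and any(w in FOLLOW_UP_MARKERS for w in words)


def reformulate(question: str, history: list[str]) -> str:
    """Return a standalone search query to send to the retriever."""
    if not history or not is_follow_up(question):
        return question
    # Prepend the most recent user question so the retriever sees the topic.
    return f"{history[-1]} {question}"


# Hypothetical conversation, invented for illustration only.
history = ["What are the requirements for the KKP internship?"]
query = reformulate("Then how long is it?", history)
print(query)
```

With this in place, "Then how long is it?" reaches the retriever together with the KKP question that preceded it, while a self-contained question such as "How do I register for PI?" passes through unchanged.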
This project taught me far more than RAG pipelines or vector databases.
It changed the way I think about software engineering itself.
I learned:
More importantly, I learned that good engineering is not about adding complexity everywhere.
It is about understanding:
Although this project started as a personal experiment, it solved a real problem inside my campus environment.
The chatbot helps students:
For me personally, this project became one of the experiences that pushed me deeper into:
It also made me realize that I genuinely enjoy building systems that combine:
And I think that realization is ultimately the most valuable outcome of this project.