Redmine RAG system

It was so tempting to give it the title "Oops, I did it again."

The Goal

The goal was to extract all issues from a Redmine instance, anonymize the data, and build a local RAG system for semantic search and Q&A.

Just like the previous Bugzilla experiment, this one started with data extraction. I planned a simple bulk download via the Redmine API. Then came the first problem: Redmine’s API doesn’t return journal entries (comments) from the project-level endpoint, even with the include=journals parameter. I tried several workarounds, but nothing helped. In the end, the solution was to change strategy and fetch each issue individually via /issues/{id}.json. This was much slower, but it guaranteed complete data, including all comments.
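
As a sketch, the per-issue fetch can look like this in Python with the requests library; the instance URL and API key are placeholders, while the endpoint and include=journals parameter come straight from Redmine’s REST API:

    import requests

    BASE_URL = "https://redmine.example.com"   # placeholder instance URL
    API_KEY = "your-api-key"                   # placeholder key
    HEADERS = {"X-Redmine-API-Key": API_KEY}

    def fetch_issue(issue_id):
        # The per-issue endpoint honors include=journals,
        # unlike the project-level issue listing.
        resp = requests.get(
            f"{BASE_URL}/issues/{issue_id}.json",
            headers=HEADERS,
            params={"include": "journals"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["issue"]

    issue = fetch_issue(12345)
    print(issue["subject"], len(issue.get("journals", [])))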

[Read More]

I want to have a hot shower

From Tesseract Troubles to Local VLM

It all started last summer

That was when my family moved to a new place. In our previous home we had a district heating service that included unlimited hot water for a fixed price. That was awesome: not very environmentally friendly, and actually not very cheap, but we never ran out of hot water.

The boiler

In our new home we are independent of the city’s hot water service. This is good because we pay exactly for the energy we use. We have a 300-liter hot water heater that we switch on to heat only as much water as we need. In most households such a boiler has a thermostat set to a specific temperature, and the heater keeps all 300 liters at that temperature around the clock. I don’t like this approach: no matter how good the tank’s insulation is, it loses heat over time. I needed a smarter system.
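
For a sense of scale, here is a back-of-the-envelope sketch of what one full heat-up of the tank costs in energy; the inlet and setpoint temperatures are illustrative guesses, not figures from my actual setup:

    # Rough energy to heat a full 300 L tank; temperatures are assumed,
    # not measured values from my boiler.
    volume_kg = 300            # 1 L of water is ~1 kg
    c_water = 4186             # specific heat of water, J/(kg*K)
    delta_t = 50               # e.g. 10 degC inlet -> 60 degC setpoint
    energy_joules = volume_kg * c_water * delta_t
    print(f"~{energy_joules / 3.6e6:.1f} kWh per full heat-up")  # ~17.4 kWh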

[Read More]

Building a Local Bugzilla RAG System

A guide to building a local RAG system for Bugzilla data using Ollama and ChromaDB.

My goal was to build a local database that could:

  • Ingest my ~4GB Bugzilla database
  • Answer questions or give advice on new bugs based on historical ones
  • Run offline on my openSUSE Tumbleweed machine, which is equipped with 64GB RAM and an AMD Ryzen 7 PRO 7840U

Naturally, my first idea was to build a standalone LLM like GPT. But fine-tuning an LLM on custom data is resource-intensive, to put it mildly: when I started fine-tuning one on my laptop, the process had reached only 1% after running for a full week. Using cloud-based services or investing in powerful new hardware was not an option. There is also the problem that standalone LLMs may hallucinate or generate inaccurate information, especially on domain-specific topics. The other disadvantage is that they are static: once trained, they don’t know anything that happened afterward.
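
As a rough sketch of the direction I took instead, here is what an Ollama-plus-ChromaDB retrieval flow can look like in Python; the collection name, model name, and sample texts are illustrative, and for brevity it uses ChromaDB’s built-in default embedder rather than an Ollama embedding model:

    import chromadb
    import ollama

    client = chromadb.PersistentClient(path="./bugzilla_db")
    collection = client.get_or_create_collection("bugs")  # name is illustrative

    # Index a bug report; ChromaDB embeds documents with its default
    # embedding function unless another one is configured.
    collection.add(
        ids=["bug-1234"],
        documents=["Crash on startup when the config file is missing ..."],
    )

    # Retrieve the most similar historical bugs for a new report.
    question = "App crashes immediately after launch"
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])

    # Ask a local model, grounded in the retrieved bugs.
    answer = ollama.chat(
        model="llama3",  # any locally pulled model
        messages=[{
            "role": "user",
            "content": f"Based on these past bugs:\n{context}\n\nQuestion: {question}",
        }],
    )
    print(answer["message"]["content"])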

[Read More]