Automatically classify personal notes into 5 categories with LLM (Ollama) — distill pipeline design and pitfalls
🔗 Series Table of Contents: This article is Implementation Edition (3) of the series "AI Assistant Operations Notes - A practical record of raising Copilot / Claude Code as your partner".
## What you can learn from this article
- The structure of the "I write notes but forget to promote them to lessons learned" problem that comes up when operating a personal RAG
- How 5-category automatic classification with an LLM (Ollama / a Japanese-specialized Llama 3) works
- Why a sloppy implementation ends up classifying "I'm sorry" as a lesson learned, and how to deal with it
- How to use JSON mode, and a fallback design for when it still fails
- Preventing reprocessing with a state file + mtime-based change detection
- Effects and caveats after one month of operation
## Target audience
- Those running a personal RAG who see "unprocessed notes" piling up
- Those who want to put a local LLM (Ollama) to work in practical batch processing
- Those who want practical know-how for reducing unexpected misclassifications in LLM classification tasks
## Operating environment
| Item | Version |
|---|---|
| Python | 3.13 (venv) |
| Ollama | Server started (localhost:11434) |
| LLM for classification | microai/suzume-llama3 (Japanese specialized Llama 3) |
| Embedding | nomic-embed-text (for ChromaDB input) |
| ChromaDB | Persistence mode |
## 1. Introduction: the "write-and-forget notes" problem
When operating a personal RAG, the first step is to increase the amount of "material" fed into ChromaDB. I built a `rag note "..."` command that lets me leave a short note the moment something occurs to me.
After a few weeks, `data/diary/` had accumulated hundreds of note entries.
```markdown
# 2026-04-30

## 04:09
Lesson learned for Qiita 429: The value of the Rate-Reset header is the only correct answer.
Don't follow the assumptions of Gemini or other LLMs (wait 24 hours, etc.).

## 12:30
2026-04-30 Implemented the my-rag-brain distill feature.
Background: I originally wrote all my notes as note (diary/),
Manual operation to promote as lesson/idea/knowledge/profile...

## 15:51
Countermeasures against distill misclassification: The issue of AI's apology letters and operating instructions being misclassified as lessons will be addressed in two steps...
```
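The batch treats each `## HH:MM` block in a diary file as one entry. The splitting code is not shown in this article, so here is a minimal sketch, assuming every entry starts with a `## HH:MM` heading line as in the example above; `split_entries` is a hypothetical name.

```python
import re

# Assumption: each entry starts with a heading line like "## 04:09"
ENTRY_HEADING = re.compile(r"^## (\d{2}:\d{2})[ \t]*$", re.MULTILINE)


def split_entries(markdown: str) -> list[tuple[str, str]]:
    """Split one diary file into (time_str, body) pairs.

    Anything before the first "## HH:MM" heading (such as the
    "# YYYY-MM-DD" title) is ignored, as are empty entries.
    """
    matches = list(ENTRY_HEADING.finditer(markdown))
    entries = []
    for i, m in enumerate(matches):
        start = m.end()
        # The body runs until the next heading, or to the end of the file
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[start:end].strip()
        if body:
            entries.append((m.group(1), body))
    return entries
```

Headings such as `## 22:43 [auto-distilled]` in promoted files do not match this pattern, because it requires the line to end right after the time.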
Running a vector search with `rag search "Qiita Rate-Reset"` does surface the note's contents. But something is missing.
What was missing was categorization such as "this should be used as a lesson to prevent recurrence" and "this should be saved as an idea for the future." While everything stays a note, the search results are an undifferentiated mix. I wanted to distinguish "lessons I wrote at the time" from "ideas I wrote at the time."
At first I tried classifying by hand: reading through the notes and promoting entries into `data/lessons/`, `data/ideas/`, and so on.
**I gave up after 3 days.** Manual steps always get forgotten; I forgot even right after writing a note.
So I thought: **maybe I should let an LLM do the classification.**
Ollama was already running locally, and the Llama 3 family seemed capable of telling Japanese notes apart. So I decided to build a batch that reads a month's worth of notes at once and automatically sorts them into 5 categories.
This article covers that implementation, and the accident that predictably happened when I built it sloppily.
## 2. Definition of the 5 categories
First, decide which categories the entries should go into.
| Category | What to include | Example |
|---|---|---|
| `lesson` | Concrete rules to prevent recurrence ("next time, do it this way") | "The Rate-Reset header is the only correct answer" |
| `idea` | Ideas or plans you want to implement or realize in the future | "Convert personal RAG to MCP and connect it to Copilot" |
| `knowledge` | Technical specifications, API specifications, objective facts | "nomic-embed-text has 768 dimensions" |
| `profile` | Values, beliefs, motivations, strong preferences | "Prioritize human readability over machine optimization" |
| `conclusion` | Judgments reached through discussion and consideration | "Qiita = summary version, Astro = complete version" |
| `none` | None of the above (work records, apologies, procedure manuals) | "Sorry, we will correct it" |
`none` was made a category of its own as a countermeasure for the "apology-as-lesson incident" described below.
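Since the LLM ultimately returns free text, it helps to pin the six labels down in one place and map anything unexpected to `none`. A small sketch; `PROMOTABLE_TYPES`, `VALID_TYPES`, and `normalize_type` are hypothetical names, not taken from the original code.

```python
# The five promotable categories; "none" is valid but never promoted
PROMOTABLE_TYPES = ("lesson", "idea", "knowledge", "profile", "conclusion")
VALID_TYPES = frozenset(PROMOTABLE_TYPES + ("none",))


def normalize_type(value: object) -> str:
    """Return a valid category name; anything unrecognized becomes "none"."""
    if not isinstance(value, str):
        return "none"
    label = value.strip().lower()
    # Also guards against the model echoing the template, e.g. "lesson|idea|..."
    return label if label in VALID_TYPES else "none"
```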
## 3. The naive implementation and the accident that followed immediately
At first I simply threw the text at the LLM with a bare-bones prompt:

```python
import ollama

# text: one note entry
prompt = f"""Please classify the text below as lesson / idea / knowledge / profile / conclusion.

Text: {text}

Category: """
response = ollama.chat(model="llama3.2:3b", messages=[{"role": "user", "content": prompt}])
```

It worked. I tested about 30 items and got **plausible-looking classifications**, so I happily ran it over everything.
When I checked the results, I found **a large number of apologies in the `lessons/` directory**.
```markdown
## 22:43 [auto-distilled]
Sorry. Be careful next time so you don't repeat the same mistake.

**Original text (excerpt)**:
Sorry. The previous execution was incorrect.

---
```
The LLM probably reasoned like this:
"Don't make the same mistake again" and "I'll be more careful next time" sound like a lesson, so: lesson!
It does read that way, but this is an apology the AI wrote right after one of its mistakes, not a lesson. The actual lesson lives elsewhere, in sentences I wrote by hand such as "The Rate-Reset header is the only correct answer."
Other common misclassifications include:
- "Please paste the following as is into Gemini Ultra" → classified as idea (actually an operating instruction)
- "That's right" → classified as conclusion (actually just agreement)
- A short fragment of about three lines → forced into lesson (actually there is nothing to base a judgment on)

Forcing text with no real content into some category: this is LLM classification trap No. 1.
## 4. Countermeasure: tell the LLM it may choose "none"
When I broke down the problem, it looked like this.
### 4.1 LLMs are reluctant to choose "not applicable"
In my experience, LLMs rarely answer "not applicable" when none of the given options fit. They tend to force the input into one of the choices they were offered.
Countermeasure: offer `none` as an explicit option, and include "examples of when to choose none" in the prompt.
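```python
# (The actual prompt is written in Japanese; translated here.)
CLASSIFY_PROMPT = """\
Please read the note entry below and place it into the most appropriate category.

Categories and selection criteria:
- lesson     : contains a concrete rule to prevent recurrence ("next time, do it this way")
- idea       : contains an idea or plan you want to implement or realize in the future
- knowledge  : contains technical specs, API specs, or objective facts about a tool
- profile    : contains the author's values, beliefs, motivations, or strong preferences
- conclusion : contains a conclusion, agreement, or design decision reached through discussion and consideration
- none       : none of the above (work logs, apologies, operating procedures, conversational back-and-forth, etc.)

Examples of selecting none:
- Apologies or agreement such as "I'm sorry" or "That's right"
- Operating instructions or procedures such as "Please run the following"
- Plain descriptions of how the work went (a record of what was done)
- Fragments too short to read any rule or insight from

Note entry:
---
{text}
---

Thought process:
1. Does this entry contain an insight, rule, or piece of knowledge that can be used next time?
2. If it does: which of the categories above is closest?
3. If it does not: select none

Output (JSON only; no code blocks or extra text):
{{"reasoning": "One sentence on why you chose that category", "type": "lesson|idea|knowledge|profile|conclusion|none", "summary": "Summary in one sentence (empty if none)"}}
"""
```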
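Three points:

1. **List 3 to 4 concrete examples of when to choose `none`**: show the LLM apologies, instructions, and short fragments as examples
2. **Make it verbalize its thought process**: the first question is always "Is there an insight that can be used next time?"
3. **Include `reasoning` in the JSON**: making the LLM write down its own rationale discourages lazy classification

This alone reduced misclassifications by more than 80%.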
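One detail in `CLASSIFY_PROMPT` is easy to miss: because it is filled in with `str.format`, the literal braces of the JSON template are doubled (`{{ ... }}`) while `{text}` stays single. A quick check of that behavior on a shortened stand-in template (not the full prompt):

```python
# Shortened stand-in for CLASSIFY_PROMPT: same placeholder and brace escaping
TEMPLATE = """Note entry:
---
{text}
---
Output (JSON only):
{{"reasoning": "...", "type": "lesson|idea|knowledge|profile|conclusion|none", "summary": "..."}}
"""

filled = TEMPLATE.format(text="Rate-Reset header is the only correct answer.")
# Doubled braces become single literal braces; {text} is substituted
print(filled.splitlines()[-1])

# Text passed in for {text} is not parsed again, so notes containing braces are safe
with_braces = TEMPLATE.format(text="config uses {x}")
```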
### 4.2 Pre-filter before handing off to the LLM
For the patterns that still slipped through, I added a pre-filter on the Python side.
```python
def classify_entry(text: str) -> dict:
    stripped = text.strip()

    # Filter 1: fragments that are too short are "none", no questions asked
    if len(stripped) < 50:
        return {"type": "none", "summary": ""}

    # Filter 2: typical openings of apologies, agreement, and procedure introductions
    apology_patterns = ("Sorry", "That's right", "As you said", "As you pointed out", "Continue as below")
    if stripped.startswith(apology_patterns):  # str.startswith accepts a tuple
        return {"type": "none", "summary": ""}

    # Only entries that get this far are sent to the LLM
    prompt = CLASSIFY_PROMPT.format(text=text[:600])
    # ...
```
Entries shorter than 50 characters, or starting with an apology-style pattern, are rejected before the LLM is ever called. This lowers the cost of calling Ollama and cuts off a major source of misclassification.
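The two filters can also be pulled out into a pure function, which makes them easy to unit-test without Ollama running. A sketch using the same 50-character threshold and English opening patterns as above; `prefilter` is a hypothetical helper that returns `None` when the entry should go on to the LLM.

```python
from typing import Optional

MIN_LENGTH = 50  # Same threshold as classify_entry above
SKIP_PREFIXES = ("Sorry", "That's right", "As you said", "As you pointed out", "Continue as below")


def prefilter(text: str) -> Optional[dict]:
    """Return a "none" result for entries the LLM should never see, else None."""
    stripped = text.strip()
    if len(stripped) < MIN_LENGTH:
        return {"type": "none", "summary": ""}
    if stripped.startswith(SKIP_PREFIXES):
        return {"type": "none", "summary": ""}
    return None  # Passed both filters: send it to the LLM
```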
### 4.3 Use JSON mode + a fallback regex
Even when the prompt asks for "JSON only", models served through Ollama (Llama-based ones included) behave differently: some preface the JSON with "Yes, I understand", others wrap it in a code block.
Solution: force JSON mode with Ollama's `format="json"` option.
```python
import ollama

resp = ollama.chat(
    model=LLM_MODEL,  # e.g. "microai/suzume-llama3"
    messages=[{"role": "user", "content": prompt}],
    options={"temperature": 0.1},  # Low temperature to suppress run-to-run variation
    format="json",                 # JSON mode: constrain the output to JSON
)
```
Even so, it occasionally fails, so I added a fallback.
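```python
import json
import re

# (Continues inside classify_entry, right after the ollama.chat call)
raw = resp["message"]["content"].strip()
try:
    result = json.loads(raw)
except json.JSONDecodeError:
    # Fallback: safety valve for the rare case where format="json" still fails.
    # [^{}]+ also matches newlines, so a multi-line object is found too.
    m = re.search(r"\{[^{}]+\}", raw)
    if not m:
        return {"type": "none", "summary": ""}
    try:
        result = json.loads(m.group())
    except json.JSONDecodeError:
        # The extracted fragment is still not valid JSON: treat it as none
        return {"type": "none", "summary": ""}
```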
**Don't expect perfection.** If you design on the premise that the LLM can fail, a fallback can rescue the result. I feel this is a golden rule when using LLMs in practical batch jobs.
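Putting the JSON parse, the regex fallback, and a category check together gives a function that never raises on bad model output. A self-contained sketch; `parse_classification` is a hypothetical name, and the allowed labels are the six categories defined earlier.

```python
import json
import re

VALID_TYPES = {"lesson", "idea", "knowledge", "profile", "conclusion", "none"}
NONE_RESULT = {"type": "none", "summary": ""}


def parse_classification(raw: str) -> dict:
    """Parse the model's reply; degrade to "none" instead of raising."""
    raw = raw.strip()
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: pick the first flat {...} object out of chatter or code fences
        m = re.search(r"\{[^{}]+\}", raw)
        if not m:
            return dict(NONE_RESULT)
        try:
            result = json.loads(m.group())
        except json.JSONDecodeError:
            return dict(NONE_RESULT)
    if not isinstance(result, dict):
        return dict(NONE_RESULT)
    label = str(result.get("type", "")).strip().lower()
    if label not in VALID_TYPES:
        return dict(NONE_RESULT)  # e.g. the template echoed back: "lesson|idea|..."
    summary = "" if label == "none" else str(result.get("summary", ""))
    return {"type": label, "summary": summary}
```

Returning a fresh `none` result on every failure path keeps the batch running; the worst case is one note that stays unpromoted.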
---
## 5. Promotion: append to category files + ingest into ChromaDB
Once the classification results are finalized, append them to the corresponding `data/<category>/YYYY-MM-DD.md`.
```python
# DATA_DIR, LABEL_BY_TYPE and ingest_file are defined elsewhere in the project
TYPE_DIRS = {
    "idea": DATA_DIR / "ideas",
    "lesson": DATA_DIR / "lessons",
    "knowledge": DATA_DIR / "knowledge",
    "profile": DATA_DIR / "profile",
    "conclusion": DATA_DIR / "conclusions",
}


def promote_entry(entry_type, time_str, summary, original, source_date, dry_run):
    # (dry_run is not used in this excerpt)
    target_dir = TYPE_DIRS.get(entry_type)
    if not target_dir:
        return  # none does not create a file

    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / f"{source_date}.md"
    if not target_file.exists():
        # First promoted entry for this date: write a title line
        target_file.write_text(
            f"# {source_date} {LABEL_BY_TYPE[entry_type]} (automatic extraction)\n\n",
            encoding="utf-8",
        )

    block = f"""
## {time_str} [auto-distilled]
{summary}

**Original text (excerpt)**:
{original[:400]}

---
"""
    with target_file.open("a", encoding="utf-8") as f:
        f.write(block)

    # Ingest into ChromaDB at the same time
    ingest_file(str(target_file), entry_type, tags=f"{source_date},auto-distilled")
```
Three design points:

1. Promote into a separate file from the original note: the original note stays untouched, so a human can review it later
2. Record both the summary and an excerpt of the original text: even if the LLM gets the summary wrong, you can judge by reading the original
3. Ingest into ChromaDB at promotion time: the `auto-distilled` tag makes it possible to filter for automatically classified entries later
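To try the append logic without ChromaDB, here is a stripped-down sketch that writes into any directory you pass in. It keeps the file layout above (`<category dir>/<date>.md`, a title line on first write, one `[auto-distilled]` block per entry) but drops the `LABEL_BY_TYPE` label and the `ingest_file` call; `append_promoted` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

TYPE_DIR_NAMES = {
    "idea": "ideas",
    "lesson": "lessons",
    "knowledge": "knowledge",
    "profile": "profile",
    "conclusion": "conclusions",
}


def append_promoted(data_dir: Path, entry_type: str, time_str: str,
                    summary: str, original: str, source_date: str) -> Optional[Path]:
    """Append one promoted entry; return the file written, or None for "none"."""
    dir_name = TYPE_DIR_NAMES.get(entry_type)
    if dir_name is None:
        return None  # "none" (or any unknown type) never creates a file
    target_dir = data_dir / dir_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / f"{source_date}.md"
    if not target_file.exists():
        # Title line only on the first write for this date
        target_file.write_text(f"# {source_date} {entry_type} (automatic extraction)\n\n", encoding="utf-8")
    block = (
        f"## {time_str} [auto-distilled]\n"
        f"{summary}\n\n"
        "**Original text (excerpt)**:\n"
        f"{original[:400]}\n\n"
        "---\n\n"
    )
    with target_file.open("a", encoding="utf-8") as f:
        f.write(block)
    return target_file
```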
## 6. Preventing reprocessing: state file + mtime check
Distill is a heavy process. It queries Ollama one entry at a time, so 100 notes take anywhere from several tens of seconds to several minutes.
Reprocessing every file on every run would be wasteful, so I added a mechanism that records processed files in a state file.
```python
import json

STATE_FILE = DATA_DIR / ".distill_state.json"


def load_state() -> dict:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text(encoding="utf-8"))
    return {"processed": {}}  # First run: nothing processed yet


def save_state(state: dict) -> None:
    STATE_FILE.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
```
`processed` records `{file name: mtime at processing time}`. On the next run, each file is checked like this:
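```python
from datetime import datetime

# diary_files, all_mode and processed (the dict under the state's "processed" key)
# come from the surrounding code
for md_path in diary_files:
    file_key = md_path.name
    mtime = datetime.fromtimestamp(md_path.stat().st_mtime).isoformat()

    # Already processed and the file has not been updated since -> skip
    # (ISO 8601 timestamps in the same format compare correctly as strings)
    if not all_mode and file_key in processed and processed[file_key] >= mtime:
        continue

    # New or updated -> process
    ...
```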
The key is to check not only the file name but also the mtime. If a new note entry is added to the same file, the mtime will be updated and it will be reprocessed.
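Here is the same state-file logic as one self-contained sketch, so the skip condition can be tested against a temporary directory. It follows the design above (`{file name: mtime ISO string}` under `processed`, with `--all` bypassing the check) but takes the state path as a parameter; `mtime_iso`, `needs_processing`, and `mark_processed` are hypothetical names.

```python
import json
from datetime import datetime
from pathlib import Path


def load_state(state_file: Path) -> dict:
    if state_file.exists():
        return json.loads(state_file.read_text(encoding="utf-8"))
    return {"processed": {}}


def save_state(state_file: Path, state: dict) -> None:
    state_file.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")


def mtime_iso(path: Path) -> str:
    return datetime.fromtimestamp(path.stat().st_mtime).isoformat()


def needs_processing(path: Path, processed: dict, all_mode: bool = False) -> bool:
    """True if the file is new, changed since the last run, or --all was given."""
    if all_mode:
        return True
    recorded = processed.get(path.name)
    return recorded is None or recorded < mtime_iso(path)


def mark_processed(path: Path, processed: dict) -> None:
    processed[path.name] = mtime_iso(path)
```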
You can also force a full reprocess with the `--all` flag:

```bash
rag distill            # unprocessed files only
rag distill --all      # reprocess everything (ignore the state file)
rag distill --dry-run  # show classification results only; do not save
```
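The article shows only the command line, not how the flags are parsed. If you are building a similar CLI, a minimal `argparse` sketch of the `distill` subcommand could look like this; the program name `rag` comes from the commands above, while the `all_mode` / `dry_run` destinations are assumptions mirroring the variable names used in the code excerpts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rag")
    sub = parser.add_subparsers(dest="command", required=True)

    distill = sub.add_parser("distill", help="classify diary notes and promote them")
    # --all: ignore the state file and reprocess every diary file
    distill.add_argument("--all", dest="all_mode", action="store_true")
    # --dry-run: display classification results only, do not save
    distill.add_argument("--dry-run", dest="dry_run", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["distill", "--all"])` yields `all_mode=True` and `dry_run=False`.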
---
## 7. Trying it out: what changed
This is my impression after using it for a month.
### 7.1 Write-and-forget notes almost disappeared
Notes that used to stay exactly as written are now automatically sorted into meaningful categories and made searchable.
The number of entries in `data/lessons/` has steadily grown, and I can now search across lessons learned alone.
### 7.2 Improved RAG search quality
If you narrow the search with `type=lesson`, only "rules learned from past mistakes" come back. This worked well as context to pass to Copilot / Claude Code: compared with a broad search over everything, it gives the AI more precise context.
### 7.3 I now have more moments where I think, “Oh, I should record this.”
As a side effect, when I write notes I now catch myself thinking "this will be classified as a lesson" or "this is knowledge." The writer develops **category awareness** too, which set off a chain reaction that improved the quality of my notes.
---
## 8. Caveats and limitations
### 8.1 LLM judgment is not perfect
If you don't check the `--dry-run` output regularly, **strange classifications occasionally slip in**. About once every six months, I need to go through `data/lessons/` and manually remove entries that should have been `none`.
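For that manual review, a small helper that lists every `[auto-distilled]` heading with its summary line can speed things up. A sketch assuming the block layout written by `promote_entry` (the heading, then the summary on the next non-empty line); `list_auto_distilled` is a hypothetical name.

```python
import re
from pathlib import Path

HEADING = re.compile(r"^## (\d{2}:\d{2}) \[auto-distilled\]$")


def list_auto_distilled(category_dir: Path) -> list[tuple[str, str, str]]:
    """Return (file name, time, summary) for every auto-distilled block."""
    rows = []
    for md_path in sorted(category_dir.glob("*.md")):
        lines = md_path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            m = HEADING.match(line)
            if m:
                # The summary is the first non-empty line after the heading
                summary = next((c for c in lines[i + 1:] if c.strip()), "")
                rows.append((md_path.name, m.group(1), summary))
    return rows
```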
### 8.2 The same note may fall under multiple categories
"Qiita 429's Rate-Reset lesson" is both a lesson and knowledge. The current implementation assigns **one category per entry**, but multi-label classification may be more appropriate. This is a point for future improvement.
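If the output schema moved to multiple labels, one possible shape is a `types` list in the JSON, filtered against the promotable categories and deduplicated, with each remaining label promoted separately. This is a hypothetical sketch of that future direction, not something the current pipeline does; `parse_multi_label` is an invented name.

```python
import json

PROMOTABLE = ("lesson", "idea", "knowledge", "profile", "conclusion")


def parse_multi_label(raw: str) -> list[str]:
    """Parse {"types": [...]} into promotable labels, in order, without duplicates."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict) or not isinstance(data.get("types"), list):
        return []
    labels = []
    for item in data["types"]:
        label = str(item).strip().lower()
        # Drop "none", unknown labels, and repeats
        if label in PROMOTABLE and label not in labels:
            labels.append(label)
    return labels
```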
### 8.3 Changing the LLM model changes the results
I am using `microai/suzume-llama3` and have observed that the classification trend changes when switching to another model. Prompts require tuning for each model. Before switching, you should check the quality with a sample.
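One way to do that sample check is to classify the same entries with both models and compare the labels. A minimal sketch of the comparison step only: it takes two `{entry_id: label}` dictionaries you have already collected (how you run each model is up to you), and `compare_labels` is a hypothetical name.

```python
def compare_labels(old: dict[str, str], new: dict[str, str]) -> tuple[float, list[tuple[str, str, str]]]:
    """Return (agreement rate, [(entry_id, old_label, new_label), ...]) over shared ids."""
    shared = sorted(old.keys() & new.keys())
    if not shared:
        return 0.0, []
    diffs = [(k, old[k], new[k]) for k in shared if old[k] != new[k]]
    agreement = 1 - len(diffs) / len(shared)
    return agreement, diffs
```

Reading the `diffs` list is usually more informative than the rate alone, since it shows which kinds of notes the new model handles differently.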
### 8.4 Local LLM resource consumption
Because Ollama has to be running, it noticeably eats into memory and CPU if you are doing other work on your laptop at the same time. If you want to process everything in one batch, running it before going to bed is a reasonable choice.
---

## 9. Summary
- Anyone operating a personal RAG is likely to run into the "I write notes but can't keep up with classifying them" problem. **Automatic LLM classification** can solve it
- A naive implementation leads to accidents like **"I'm sorry" being classified as a lesson**. Counter them with **an explicit `none` category** + **a prompt that embeds a thought process** + **a pre-filter**
- Receive LLM output safely with **JSON mode + a fallback regex**. Design on the assumption that it can fail, rather than expecting perfection
- Prevent reprocessing with a **state file + mtime check**. This is a basic technique for running heavy LLM batches efficiently
- After a month of use, both my notes and my search results improved. A nice side effect is that **even the writer's mindset changes**
Related articles:
- [Input the conversation history of Copilot Chat and Claude Code into ChromaDB and search for your "past self"](/blog/ai-chat-transcript-ingestion): the mechanism, upstream of distill, that accumulates notes and conversation history in RAG
- [Mechanism to prevent Copilot from making the same mistake twice: Design to have "memory of discussion" with RAG + MCP](/blog/copilot-memory-rag-mcp): an MCP server implementation in which Copilot references the lessons extracted by distill in real time
The notes you keep writing turn into lessons that will save you tomorrow. **That, I feel, is the real value of the distill pipeline.**