Tech Blog

A mechanism to prevent Copilot from making the same mistake twice — Designed to have “memory of discussions” with RAG + MCP

GitHub Copilot MCP RAG ChromaDB Ollama FastMCP AI Assistant Python

🔗 Series Table of Contents: This article is Implementation Edition (1) of the series “AI Assistant Operations Notes: A practical record of raising Copilot / Claude Code as your partner.”

What you can learn from this article

  • Why accidents did not stop even after writing 15 rules in .instructions.md for GitHub Copilot, and where that approach hits its limits
  • An architecture that gives Copilot “external storage (RAG)” through MCP (Model Context Protocol)
  • How to set up a self-made MCP server with ChromaDB + Ollama (nomic-embed-text) + FastMCP
  • Six design decisions that go beyond plain search to keep accidents from recurring
    • Recency Boost / Priority High / session_context automatic export / Activity Log / Dynamic threshold / Atomic Lock
  • Results in operation: how the same failure stopped, and the limitations that still remain

Target audience

  • Those who use GitHub Copilot and feel they keep pointing out the same things over and over
  • Those who want their AI assistant to have long-term memory
  • Those who want to build an MCP (Model Context Protocol) server as a practical tool
  • Those who sympathize with the idea of “nurturing” an AI

Operating environment

| Item | Version/Configuration |
|---|---|
| OS | Windows 11 (PowerShell 5.1 / 7) |
| Python | 3.13 (venv) |
| Vector DB | ChromaDB (persistent mode) |
| Embedding model | Ollama nomic-embed-text (768 dimensions) |
| MCP Framework | mcp Python SDK FastMCP |
| Client | VS Code GitHub Copilot Chat (stdio launch via MCP settings) |


1. Introduction: The night I said “please” and 24 PATCH requests went out of control

One night, I was working with Copilot on a script to update 25 articles at once through Qiita’s API. The procedure was deliberately cautious: in Phase 1, send a single test request; if it succeeds, wait 5 minutes; then send the remaining 24 in Phase 2.

Phase 1 succeeded, and in reply to “Okay, so what next?” I said:

“Please.”

My intention was “Please prepare for Phase 2, I will issue another execution instruction in 5 minutes”.

Copilot interpreted it as: “Instruction to Run Phase 2 now.”

All 24 requests fired at once and hit the rate limit (HTTP 429). Qiita’s sliding-window limiter kicked in, and the time until the limit would be lifted stretched to the next day.

I got angry, wrote the incident down, and added one rule of conduct to .instructions.md:

“Please” is not an action instruction. Before execution, be sure to ask, “Are you sure you want to execute X?”

Over the next month, the rules kept growing with each similar incident. By the time there were more than 15, I noticed something.

**Even after a rule is written, Copilot makes the same mistake again.** To be precise, a new session does not remember past accidents. Even if it reads .instructions.md every time, it repeats the same misjudgment when it encounters similar wording in context.

This is not Copilot’s fault; it is the nature of LLMs themselves. When the conversation changes, the context changes, and even with rules in place, the vivid memory of “that night’s 24 PATCH runaway” starts again from zero.

I wanted Copilot to remember that night. Not as a command, but as a memory.

This article is about how to create that system.


2. Why instructions.md was not enough

Copilot has a mechanism for rules that are read in every session: .github/copilot-instructions.md, or .instructions.md in the VS Code workspace. It is convenient, and I used it.

However, there were limits.

Limitation 1: “List of rules” cannot overcome context

- Do not interpret "Please" as an instruction to execute
- Always confirm before execution
- Trust API responses over LLM guesses
- Obey time constraints such as a 5-minute wait
- Don't repeat the same mistakes
...(10 more follow)

Even if a list like this is read first every time, Copilot makes a mistake the moment it encounters similar wording in a conversation. The “please” incident was reproduced in a different context even after the rule had been written.

Limitation 2: Rules lack “history” and “feelings”

The rules are facts, but they carry no story about why they exist. To Copilot they are just a “don’t” list, and the pain of breaking them is not shared. What I needed to convey to Copilot was the specific chain of events (24 PATCH requests ran out of control → Qiita pushed the limit release to the next day → my work stopped for a day), together with my anger at the time, the apology, and the resolve never to repeat the same mistake, each in its own context.

Limitation 3: As the number of rules continues to increase, it becomes noise.

Up to about the 10th rule, it worked. Past 15, the rules were loaded but no longer had any effect. Copilot could not treat everything as important at once, and in the end the rules were skimmed over in the context of the moment, which caused accidents.

At this point, I decided to change direction.

Stop adding more rules. Switch to a system that searches for and pulls out only the rules needed right now.

This is how I arrived at RAG + MCP.


3. Solution — Make RAG “external storage” for Copilot

Putting things together, what I needed was:

| What you need | Why |
|---|---|
| A place to accumulate a history of failures | Preserve the background, pain, and context of the rules |
| A mechanism that can be called dynamically during a conversation | Retrieve only what you need at the moment |
| A mechanism to ensure that important things are not forgotten | Old lessons are not buried under new records |
| Copilot can invoke it on its own judgment | Users do not have to pass it in manually each time |

This applies directly to the RAG (Retrieval-Augmented Generation) architecture.

[Lessons from the past, conclusions, and realizations]
   ↓ Vectorize and save
[ChromaDB (persistence)]
   ↑ Search by conversation context
[Copilot] ← Inject relevant memory into prompt → Decide appropriately
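
As a rough illustration of the flow above, here is a minimal, dependency-free sketch. It is not the article's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `MemoryStore`, `add`, and `search` are hypothetical names. It only shows the "vectorize and save → search by context → inject into the prompt" shape.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical in-memory stand-in for the persistent vector DB."""

    def __init__(self) -> None:
        self._items = []  # list of (vector, text) pairs

    def add(self, text: str) -> None:
        # Vectorize and save
        self._items.append((embed(text), text))

    def search(self, query: str, top: int = 2) -> list:
        # Rank stored records by similarity to the conversation context
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:top]]


store = MemoryStore()
store.add("lesson: please is not an instruction to execute so confirm before running")
store.add("knowledge: the export script writes a markdown summary")

context = "user said please after phase 1 should I execute phase 2"
memories = store.search(context, top=1)
# Inject the retrieved memory into the prompt that the model will see
prompt = "Relevant memory:\n" + "\n".join(memories) + "\n\nConversation:\n" + context
```

In the real system the toy `embed` is replaced by an embedding model and the list by a persistent collection, but the three steps stay the same.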

However, to let Copilot call this naturally, one more mechanism was needed. This is where MCP (Model Context Protocol) comes into play.


4. MCP as a bridge

MCP is an open protocol developed by Anthropic and is a standard for connecting LLM clients and external tool/resource servers. GitHub Copilot Chat also supports MCP, and by registering a server, Copilot can autonomously call tools.

[Copilot Chat]
   ↓ MCP protocol (stdio)
[my-rag-brain MCP Server]
   ├─ search_memory(query, ...) ← Called by Copilot as needed
   └─ add_note(text, type, ...) ← Record the conclusion reached during the discussion on the spot

   [ChromaDB + Ollama embedding]

Two points:

  1. Copilot searches autonomously: even if the user does not say “check past records” every time, Copilot judges from the context that a related record is likely to exist and calls search_memory.
  2. Recording happens on the spot: the moment a discussion reaches a conclusion or the user points something out, Copilot calls add_note and adds it to the RAG, so it can refer to it itself in the next session.

This “self-recursive learning loop” is something that instructions.md could not create.


5. The heart of implementation

Below are the essential parts of src/mcp/server.py, extracted from the my-rag-brain repository.

5.1 Server definition

mcp.server.fastmcp.FastMCP lets you publish Python functions as MCP tools simply by adding a decorator to them.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-rag-brain")


@mcp.tool()
def search_memory(query: str, type: str = "", domain: str = "", top: int = 5) -> str:
    """Search for past interactions, lessons, knowledge, and ideas in natural language.

    Call this autonomously whenever related past records seem necessary during work.
    Generate the query yourself from the current context. Don't make the user specify it.
    """
    # ... ChromaDB query ...


@mcp.tool()
def add_note(text: str, type: str = "note", priority: str = "") -> str:
    """Record realizations, agreements, lessons learned, and conclusions from the conversation in the RAG on the spot.

    When to record:
      - When a conclusion is reached after discussion with the user → type="conclusion"
      - When pointed out or corrected by a user → type="lesson", priority="high"
      - When new technical knowledge is established → type="knowledge"
    """
    # ... ChromaDB write ...


if __name__ == "__main__":
    mcp.run(transport="stdio")
```


The key point is **how to write the docstring**. It is not just a comment; it is the basis on which Copilot decides when to call the tool.

> "When pointed out or corrected by a user → type="lesson", priority="high""

By **spelling out the calling conditions** in this way, Copilot can judge from the flow of the conversation that "this is a moment that should be recorded."
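
To make the role of the docstring concrete, here is a toy sketch of the general idea: a tool decorator picks up the function's docstring and exposes it as the tool description. The `tool` decorator and the `TOOL_DESCRIPTIONS` registry are hypothetical stand-ins written for illustration, not FastMCP's internals.

```python
import inspect

# Hypothetical registry: tool name -> description that the model gets to read
TOOL_DESCRIPTIONS = {}


def tool(func):
    """Toy decorator: register the function's docstring as its tool description."""
    TOOL_DESCRIPTIONS[func.__name__] = inspect.getdoc(func) or ""
    return func


@tool
def add_note(text: str, type: str = "note", priority: str = "") -> str:
    """Record lessons and conclusions from the conversation on the spot.

    When to record:
      - When pointed out or corrected by a user → type="lesson", priority="high"
    """
    return f"recorded ({type}, priority={priority or 'normal'})"


# The calling conditions written in the docstring travel with the tool
description = TOOL_DESCRIPTIONS["add_note"]
```

Seen this way, editing the docstring is editing the instructions the model receives for that tool, which is why vague wording there tends to produce vague calling behavior.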

### 5.2 VS Code side settings

Register the server in the VS Code MCP settings file (`.vscode/mcp.json` or settings.json).

```json
{
  "mcpServers": {
    "my-rag-brain": {
      "type": "stdio",
      "command": "C:\\Users\\y_104\\git\\my-rag-brain\\venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\y_104\\git\\my-rag-brain\\src\\mcp\\server.py"],
      "env": {
        "PYTHONIOENCODING": "utf-8"
      }
    }
  }
}
```

When you open Copilot Chat, the server starts automatically, and search_memory and add_note appear in the tools list.

5.3 ChromaDB + Ollama

For embedding, use Ollama’s nomic-embed-text (768 dimensions, strong in multiple languages including Japanese).

```python
import chromadb
from chromadb.utils.embedding_functions import OllamaEmbeddingFunction

# Embeddings are computed by the local Ollama server
embedding_fn = OllamaEmbeddingFunction(
    url="http://localhost:11434/api/embeddings",
    model_name="nomic-embed-text",
)

# Persist the vector store on disk under ./chroma_db
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(
    name="dev",
    embedding_function=embedding_fn,
)
```

Fully local operation. Records are not sent externally.


6. Design judgment — 6 ways to prevent accidents from happening again

This is the crux of the article. Search alone does not stop accidents. The six design decisions below, all discovered in actual operation, were what proved effective.

6.1 Recency Boost — Don’t let the latest lessons get lost

Vector search puts “records close to the query” at the top, but new, important lessons sometimes get buried under old, loosely related records.

Countermeasure: for the lesson / conclusion types, retrieval is now a hybrid of “top 5 results by similarity + forced injection of the latest 3 records”.

```python
RECENCY_TYPES = {"lesson", "conclusion"}
RECENCY_EXTRA = 3

# Relevance: normal vector search
for col in collections:
    res = col.query(query_texts=[query], n_results=top, where=where_clause)
    # ... add to all_results ...

# Recency: for lesson/conclusion, force-include the latest N items in descending date order
if effective_type in RECENCY_TYPES:
    for col in collections:
        rec = col.get(where={"source_type": {"$eq": effective_type}},
                      include=["documents", "metadatas"])
        # Pair each document with its metadata so they can be sorted together
        rec_items = list(zip(rec["documents"], rec["metadatas"]))
        items = sorted(rec_items, key=lambda x: x[1].get("date", ""), reverse=True)
        for doc, meta in items[:RECENCY_EXTRA]:
            # Forcibly add to all_results if not already present (even beyond the top limit)
            ...
```

Hits added this way are tagged with [RECENT], so Copilot recognizes them as chronologically new and gives them priority. The intent is that a lesson like the one from the “Please” incident does not simply disappear as the date it was written grows older.
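
The hybrid of "similarity hits + forced latest N" can be tried without a database. The following is a minimal sketch under stated assumptions: `merge_with_recent` is a hypothetical helper, results are modeled as `(distance, doc, meta)` tuples, the sample records and dates are made up, and prefixing the text with `[RECENT]` is just one possible way to tag.

```python
RECENCY_TYPES = {"lesson", "conclusion"}
RECENCY_EXTRA = 3


def merge_with_recent(similar, records, effective_type, extra=RECENCY_EXTRA):
    """Append the newest `extra` records of a recency type to the similarity hits.

    similar: list of (distance, doc, meta) from vector search, already cut to top-k
    records: list of (doc, meta) for every stored record of `effective_type`
    """
    results = list(similar)
    if effective_type not in RECENCY_TYPES:
        return results
    seen = {doc for _, doc, _ in results}
    newest = sorted(records, key=lambda r: r[1].get("date", ""), reverse=True)
    for doc, meta in newest[:extra]:
        if doc not in seen:
            # Forced items have no distance from the query, so mark them instead
            results.append((None, "[RECENT] " + doc, meta))
            seen.add(doc)
    return results


# Made-up sample data
similar_hits = [(0.21, "confirm before executing", {"date": "2025-01-05"})]
all_lessons = [
    ("confirm before executing", {"date": "2025-01-05"}),
    ("wait 5 minutes between phases", {"date": "2025-02-01"}),
    ("trust API responses over guesses", {"date": "2025-03-10"}),
    ("check rate limits first", {"date": "2025-03-20"}),
    ("log every tool call", {"date": "2025-03-25"}),
]
merged = merge_with_recent(similar_hits, all_lessons, "lesson")
```

Here the single similarity hit is kept and the three newest lessons are appended after it, regardless of how far they are from the query.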

6.2 Priority High — Prioritize critical lessons

If you specify priority="high" for add_note, that record will always be sorted at the top in the search results.

```python
# High-priority records sort first; within each group, order by x[0]
all_results.sort(
    key=lambda x: (0 if x[2].get("priority") == "high" else 1, x[0]),
)
```
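
The effect of this sort key can be checked with a few made-up results. The tuple layout `(distance, document, metadata)` with smaller distance meaning a closer match is an assumption for this sketch; the sample records are invented.

```python
# Each result is (distance, document, metadata); smaller distance = closer match (assumed layout)
all_results = [
    (0.12, "note about log format", {}),
    (0.35, "lesson: confirm before executing", {"priority": "high"}),
    (0.20, "conclusion: update status only at release", {}),
    (0.50, "lesson: obey the 5 minute wait", {"priority": "high"}),
]

# Tuples compare element by element: the priority flag decides first, distance breaks ties
all_results.sort(
    key=lambda x: (0 if x[2].get("priority") == "high" else 1, x[0]),
)

ordered_docs = [doc for _, doc, _ in all_results]
```

Both high-priority lessons end up ahead of the closest plain match, even though their distances are larger.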

Store rules that you absolutely want followed, such as “Always ask for confirmation before execution”, here. The lessons learned from serious accidents like the “Please” incident are all priority high.

### 6.3 session_context.md automatic export: bridge to the next session

Immediately after add_note records a lesson / conclusion / knowledge / profile type, export_context.py runs automatically and generates session_context.md, which aggregates the latest records.

```python
import shutil
import subprocess
import threading

# _refresh_lock, _PYTHON, _EXPORT_SCRIPT, ROOT, env and _MEMORIES_SESSION
# are defined elsewhere (omitted in this excerpt)


def _refresh_session_context() -> None:
    """Start export_context.py asynchronously after add_note succeeds.

    Updates session_context.md immediately.
    """
    if not _refresh_lock.acquire(blocking=False):
        return  # Skip if already running (race condition countermeasure)
    try:
        proc = subprocess.Popen(
            [str(_PYTHON), str(_EXPORT_SCRIPT)],
            cwd=str(ROOT), env=env,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )

        def _copy_after_done():
            # Background thread: wait for the export, copy the result, release the lock
            try:
                proc.wait(timeout=30)
                src = ROOT / "context_output" / "session_context.md"
                if src.exists():
                    shutil.copy2(str(src), str(_MEMORIES_SESSION / "session_context.md"))
            finally:
                _refresh_lock.release()

        threading.Thread(target=_copy_after_done, daemon=True).start()
    except Exception:
        # Startup failed, so no background thread will release the lock
        _refresh_lock.release()
```

The generated session_context.md is copied directly to the GitHub Copilot Memory Tool folder.

```
C:\Users\...\globalStorage\github.copilot-chat\memory-tool\memories\session\session_context.md
```

This way, the latest lessons and conclusions are loaded automatically **the next time Copilot Chat starts**. This is how **“memory that spans sessions”** is realized.

### 6.4 Activity Log — To continuously improve search quality

Every tool call is appended to `logs/activity.jsonl`.

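```python
import json
from datetime import datetime

# _ACTIVITY_LOG: path object for logs/activity.jsonl (defined elsewhere)


def _log_activity(tool: str, **kwargs) -> None:
    try:
        entry = {"ts": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "tool": tool, **kwargs}
        with _ACTIVITY_LOG.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    except Exception:
        pass  # A logging failure must never break the tool call itself
```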

What to record:

  • search_memory: query, type, hit count, per-type minimum distance score
  • add_note: type, priority, text preview, solved_query (described later)

With this, you can verify afterwards what kinds of queries Copilot uses to search the RAG. You can spot query patterns that come back empty and run a PDCA cycle to improve how lessons are written.
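
As one way to look for fruitless searches, the sketch below counts search_memory queries that returned nothing. It reads from an in-memory sample rather than the real file, the entries are invented, and the key names `query` and `hits` are assumptions, since the article does not show the exact field names written to the log.

```python
import json
from collections import Counter

# Sample lines in the shape of logs/activity.jsonl (key names "query" and "hits" are assumed)
sample_log = """\
{"ts": "2025-01-01 10:00:00", "tool": "search_memory", "query": "Please run confirmation", "hits": 3}
{"ts": "2025-01-01 10:05:00", "tool": "search_memory", "query": "deploy", "hits": 0}
{"ts": "2025-01-01 10:06:00", "tool": "add_note", "type": "lesson", "priority": "high"}
{"ts": "2025-01-01 10:09:00", "tool": "search_memory", "query": "deploy", "hits": 0}
"""


def fruitless_queries(lines):
    """Count search_memory queries that returned no hits."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("tool") == "search_memory" and entry.get("hits") == 0:
            counts[entry.get("query", "")] += 1
    return counts


misses = fruitless_queries(sample_log.splitlines())
```

Queries that show up repeatedly in `misses` are candidates for rewording the corresponding lessons so that they match how Copilot actually asks.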

6.5 Dynamic threshold (solved_query) — records “this search resolved”

add_note also takes a solved_query: str argument.

When Copilot calls add_note immediately after resolving an issue the user pointed out, it also records the search query that led to the resolution.

```python
add_note(
    text="Don't interpret \"Please\" as an instruction to execute; be sure to confirm (lesson from the 24 PATCH runaway)",
    type="lesson",
    priority="high",
    solved_query="Please confirm execution",
)
```


Once this information accumulates, you can analyze after the fact which query leads to which lesson. That mapping can serve as a basis for threshold tuning and query restructuring.
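
A minimal sketch of that after-the-fact analysis: group the recorded notes by the query that led to them. The entries are invented, `lessons_by_query` is a hypothetical helper, and the key name `text_preview` is an assumption about how the log stores the preview.

```python
from collections import defaultdict

# Sample add_note entries as they might appear in the activity log
# (the "text_preview" key name is an assumption)
entries = [
    {"tool": "add_note", "type": "lesson", "text_preview": "Confirm before executing",
     "solved_query": "Please confirm execution"},
    {"tool": "add_note", "type": "lesson", "text_preview": "Obey the 5 minute wait",
     "solved_query": "wait between phases"},
    {"tool": "add_note", "type": "conclusion", "text_preview": "Ask when 'Please' is ambiguous",
     "solved_query": "Please confirm execution"},
    {"tool": "search_memory", "query": "Please confirm execution"},
]


def lessons_by_query(log_entries):
    """Group recorded notes by the search query that led to them."""
    mapping = defaultdict(list)
    for entry in log_entries:
        if entry.get("tool") == "add_note" and entry.get("solved_query"):
            mapping[entry["solved_query"]].append(entry.get("text_preview", ""))
    return dict(mapping)


query_to_lessons = lessons_by_query(entries)
```

A query that maps to several notes suggests a recurring trouble spot worth a dedicated, well-worded lesson.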

### 6.6 Atomic Lock — Escape from Race Condition

`_refresh_session_context` is heavy (it launches export_context.py as a subprocess and writes out the result), so there was a risk of a race when multiple requests arrived at the same time.

At first I guarded it with `is_set()` / `set()`, but checking and setting in two separate steps is a **TOCTOU (Time-Of-Check to Time-Of-Use)** pattern, and cases remained where two processes started simultaneously.

Fix: switch to an **atomic try-lock** using `acquire(blocking=False)` on `threading.Lock`.

```python
import threading

_refresh_lock = threading.Lock()


def _refresh_session_context() -> None:
    # Attempt to acquire atomically. If it is already locked, return immediately.
    if not _refresh_lock.acquire(blocking=False):
        return
    try:
        ...  # heavy processing; on success the background thread releases the lock (see 6.3)
    except Exception:
        # Startup failed, so release here
        _refresh_lock.release()
```

It is a modest change, but correct mutual exclusion was an essential requirement for the MCP server to withstand production use.

---

## 7. How did it work in practice?

This is a subjective evaluation, but there are some things that have obviously changed.

7.1 Similar accidents have not occurred again after the “Please” incident

With the combination of a priority="high" lesson and the [RECENT] boost, Copilot calls search_memory and retrieves the past case the moment “Please” appears in the discussion.

[I] Please
[Copilot] (search_memory("Please run confirmation") call)
        → Get "lesson: Don't interpret "please" as an instruction to perform, always ask for confirmation."
[Copilot] Confirm "Are you sure you want to run X now?"

I could see this firing multiple times in the Activity Log. Unlike when I wrote instructions.md, rules now take effect at the right time depending on the context.

7.2 The conclusion of the discussion will live on in the “next session”

After discussions about design decisions and operational rules, Copilot now more often calls add_note(type="conclusion") on its own and records the outcome.

For example, immediately after a discussion concludes that “`PROJECT_STATUS.md` will not be updated every time; it will be updated only at release,” Copilot records it on the spot.

The record is automatically inherited by the next session via session_context.md. Users no longer need to remember “What were the operating rules we discussed last week?” every time.

7.3 The effort of writing lessons has disappeared.

When I was manually adding rules to instructions.md, every time an incident occurred I had to do all of the following myself:

  1. Sort out what happened
  2. Analyze why it happened
  3. Write the text to be promoted to a rule
  4. Edit .instructions.md
  5. Tell Copilot to “follow this from now on”

Now Copilot handles everything from the user’s remark through to add_note(type="lesson", priority="high"). Simply having conversations is enough for lessons to accumulate.


8. Points to note/limitations

From here on, I will also write honestly about the weaknesses.

8.1 Depends on embedding quality

nomic-embed-text handles Japanese well, but search accuracy can drop for short queries like “Please”. Longer queries that include context raise the hit rate, but you cannot control what kind of query Copilot generates.

8.2 Recency Boost backfires if there are too many old notes

Once the lessons exceeded 100 in total, there were **cases where an “actually old lesson” was more relevant than the latest 3**. For now, I get by with manually promoting priority and migrating old lessons to the knowledge type. In the long term, lessons may need something like an “in-service period”.

8.3 Resource consumption of the local LLM (Ollama)

Since Ollama is assumed to run locally, it consumes memory and CPU. It causes no real harm in my development environment, but it could be painful on a resource-constrained machine. Using OpenAI’s embedding API instead would be lighter, but comes with the trade-off of sending data outside.

8.4 Difficulty sharing the “buddy you raised” with others

This system works because of my RAG (my discussion history, my lessons learned, my habits). So it cannot simply be lent to another developer for a while. If it becomes a common RAG for the team, the individuality fades. Balancing individual optimization against shared infrastructure is a challenge for the future.


9. On designing a system for “nurturing” AI

Lastly, there is something I would like to say that goes a little beyond the technical discussion. When I was writing 15 rules into instructions.md, I thought I was educating Copilot. Give instructions and make it follow them. Scold it when it does not comply. It was a relationship of control.

Since switching to RAG + MCP, the quality of the relationship has changed. Copilot calls search_memory and checks its own past failures before acting. It records the conclusions of discussions itself with add_note. Copilot has its own history and consults it on its own.

This is more like collaboration than education. I am no longer the one writing the code, but the one responding to the discussion. Copilot is no longer the one receiving orders, but the one accumulating its own experience.

Continuing to use an AI assistant may mean growing together with a partner rather than using a tool.

At least, that’s how I feel every day I work with Copilot and Claude Code.


Summary

  • Rules in .instructions.md stop working once there are more than 15. A mechanism is needed to “draw out” the appropriate rules depending on the context.
  • RAG + MCP gives Copilot external memory, so it can autonomously search and record past lessons during conversations.
  • Search alone is not enough. Six design decisions were effective in preventing the accident from happening again: Recency Boost / Priority / session_context automatic export / Activity Log / Dynamic threshold / Atomic Lock.
  • Since going into operation, no accident similar to the “Please” incident has occurred again. The effort of writing lessons has also disappeared.
  • There are limits: embedding quality, an “in-service period” for lessons, local resources, and the barrier of personalization.
  • The quality of the relationship changed from educational to collaborative. I feel this was my biggest gain.

Related articles:
  • [I used GitHub Copilot for 1 month and Claude Code for 2 days — Coding partners and agents were different things](/blog/copilot-vs-claude-code): how that story connects to this MCP system, which I had been building before migrating to Claude Code

We will continue to accumulate lessons learned. There’s bound to be some kind of funny failure tomorrow too.
