Nullius in verba. — Take no one's word for it. Including ours. Four voices reviewing the same diff against the same falsifiers, then synthesising. Plain English. Specific evidence. Each load-bearing claim cites the file and line it rests on.
Hi @jpbreda, @willchen96, and the Mike OSS contributors —
This review is a co-authored technical pass by the GRIP / CodeTonight team. Joseph's PR landed on our radar through Craig Miller's conversation with him, and we wanted to share a structured review now that we have read the diff end-to-end.
What follows is a four-voice review (the AGORA pattern we use internally), each grounded in specific files in the diff. Each voice applies the same critical-thinking discipline: name the evidence, scope the claim, state what would falsify it. Plain-language and ELI5 layers follow each voice's verdict so non-technical readers — lawyers, partners, compliance — can follow the substance without the jargon. Synthesis at the end. A brief note about a complementary piece we are shipping this week sits at the bottom.
A permanent version of this review is available at donnaoss.com/agora/ for anyone who prefers to read, share, or cite it outside this thread.
The vLLM choice is technically mature. An OpenAI-compatible endpoint keeps the adapter layer
minimal — the getClient() override in
backend/src/lib/llm/openai.ts is clean and does exactly what it needs to.
vLLM's continuous batching gives better GPU utilisation than naive per-request invocations,
which matters once you have eight attorneys hitting the same server during morning prep.
Server-side env for VLLM_BASE_URL is the right primitive. The cloud-API-key
dance is a liability in legal: keys rotate, rate limits spike during depositions, and you
are dependent on third-party uptime during a filing deadline. Shared inference server, no
per-user auth complexity, no key in every browser session — that is the correct shape.
The dispatch table in backend/src/lib/llm/models.ts is clean.
providerForModel() is a single lookup function with three branches and a
hard throw on unknown — that is the correct shape for a routing primitive that will grow
over time.
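For readers who have not opened the diff, a minimal sketch of the shape being praised here; apart from providerForModel and VLLM_BASE_URL, the names and branch conditions are illustrative assumptions rather than the PR's actual code:

```ts
// Sketch only, not the PR's exact code: illustrating the routing shape described above.
import OpenAI from "openai";

type Provider = "claude" | "gemini" | "localllm";

// One lookup, three branches, hard throw on unknown: routing stays in one place.
export function providerForModel(model: string): Provider {
  if (model.startsWith("claude")) return "claude";
  if (model.startsWith("gemini")) return "gemini";
  if (model.startsWith("localllm")) return "localllm";
  throw new Error(`Unknown model: ${model}`);
}

// The localllm path is just an OpenAI-compatible client pointed at the firm's vLLM server;
// the cloud providers keep whatever clients they already had.
export function getLocalClient(): OpenAI {
  return new OpenAI({
    baseURL: process.env.VLLM_BASE_URL, // server-side env only; never reaches the browser
    apiKey: "unused",                   // vLLM does not require a real key by default
  });
}
```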
vLLM is a server you run inside the firm's own walls instead of phoning a remote AI
service every time a lawyer asks a question. Joseph wired Mike to talk to it the same
way Mike already talks to ChatGPT or Claude — same protocol, different address. The
piece of code that picks which AI to call (providerForModel() in
models.ts) is small and clean, which means adding more AI options later
will not require rewriting anything. The architecture is ready for the next two
upgrades — backup AI in the cloud, and tiny AI on a laptop — without further surgery.
Imagine the firm has its own AI living in a closet downstairs instead of borrowing one from a company in California. Joseph taught Mike how to talk to the AI in the closet. The closet AI is faster, more private, and doesn't get bored when too many lawyers ask it questions at once.
For lawyers actually working under data residency requirements, this PR closes a real, recurring problem. White-collar criminal defence under active DoJ investigation. Russian clients in Zurich post-sanctions. M&A diligence with confidentiality undertakings that explicitly preclude third-party cloud processing. In every one of those scenarios, the local-inference path is not a nice-to-have — it is the only legal way to use AI assistance at all.
The cloud-LLM rate-limit problem is its own quiet failure mode. Most lawyers learn it the hard way the first time their LLM stops responding mid-deposition prep. A self-hosted vLLM endpoint takes the rate-limit failure mode off the critical path entirely.
There are real cases where the firm cannot legally send client data to an outside AI service — DoJ investigations, sanctioned-jurisdiction clients, M&A diligence under strict confidentiality contracts. Without local-AI support, Mike is unusable in those engagements regardless of how good the AI is. Joseph's PR closes that. Separately, public AI services rate-limit you when you need them most (e.g. mid-deposition prep). Local AI in the firm's own server has neither problem.
ELI5: Some lawyers' clients say "you cannot show our secrets to anyone outside the firm — including AI." Without Joseph's change, Mike could not help those lawyers at all. Now it can.
Three open questions — each grounded in a specific file in the diff. None block merge; all matter for legal-grade deployment.
1. Model version in the audit chain
VLLM_MAIN_MODEL=BredaAI is stored as "localllm-main" in the
chat record. In eighteen months, if an attorney needs to reproduce what the model said
in a matter, "localllm-main" does not answer the question: which version,
which quantisation, which checkpoint. A running vLLM server returns its loaded model
name on /v1/models. Capturing that at session open — one GET, cache for
the session lifetime — and binding it to the chat record costs one additional DB column
and closes the reproducibility question. The difference between "localllm-main"
and "BredaAI-v3-Q5_K_M-2026-04-15" is the difference between a note and
an audit trail.
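A hedged sketch of that capture, written against the OpenAI-compatible client; the column name and persistence helper are assumptions, not Mike's current schema:

```ts
// Sketch: resolve the concrete model identity once per session and bind it to the chat record.
import OpenAI from "openai";

let cachedModelVersion: string | undefined;

async function resolveLocalModelVersion(client: OpenAI): Promise<string> {
  if (!cachedModelVersion) {
    // vLLM serves the OpenAI-compatible model listing; the id is the loaded model name.
    const models = await client.models.list();
    cachedModelVersion = models.data[0]?.id ?? "unknown-local-model";
  }
  return cachedModelVersion;
}

// Hypothetical persistence helper: one extra column on the chat record.
declare function saveChatModelVersion(chatId: string, modelVersion: string): Promise<void>;

async function openChatSession(client: OpenAI, chatId: string): Promise<void> {
  // e.g. "BredaAI-v3-Q5_K_M-2026-04-15" instead of the generic "localllm-main" alias
  const modelVersion = await resolveLocalModelVersion(client);
  await saveChatModelVersion(chatId, modelVersion);
}
```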
2. Tool-argument parse failure and the silent empty-object path
From backend/src/lib/chatTools.ts:
```ts
let args: Record<string, unknown> = {};
try {
  args = JSON.parse(tc.function.arguments || "{}");
} catch {
  /* ignore */
}
```
The empty-object fallback prevents a hard crash on malformed arguments — fine. The
failure mode, though, is invisible. A malformed tool call becomes a call with no
arguments. Downstream, read_document receives doc_id: undefined,
which either fails at the label-resolution step or reads the wrong document silently.
The model then sees a tool result (possibly an error, possibly wrong content) with no
signal that its argument generation was malformed. It cannot retry because it does not
know it failed. A failure that is invisible is, sub silentio, the
worst kind. Treating parse failure as a tool error — returning structured error content to the model — lets the model self-correct in the next turn.
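A minimal sketch of the alternative, reusing the tc shape from the snippet above; the tool-result format is an assumption about Mike's message plumbing, not its actual types:

```ts
// Sketch: surface the parse failure to the model instead of silently proceeding with {}.
type ToolCall = { id: string; function: { arguments: string } };

function parseToolArguments(tc: ToolCall):
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; toolResult: { role: "tool"; tool_call_id: string; content: string } } {
  try {
    return { ok: true, args: JSON.parse(tc.function.arguments || "{}") };
  } catch (err) {
    // Structured error result: the model sees that its own argument generation
    // failed and can retry with valid JSON on the next turn.
    return {
      ok: false,
      toolResult: {
        role: "tool",
        tool_call_id: tc.id,
        content: JSON.stringify({
          error: "malformed_tool_arguments",
          detail: err instanceof Error ? err.message : String(err),
          received: tc.function.arguments,
        }),
      },
    };
  }
}
```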
3. Supabase + R2 as hard dependencies against the local-first thesis
This PR makes the model inference layer local — the right call. But the README still lists Supabase Auth, Supabase Postgres, and Cloudflare R2 as required services. For a firm operating in an air-gapped environment or under data residency requirements that preclude shared cloud infrastructure, the deployment story is now: local inference, cloud persistence. The security boundary is still outside the firm's perimeter. A SQLite + local filesystem backend behind the same storage interface would complete the local-first thesis. Not a quick fix — but worth naming explicitly in the deployment docs.
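To make that suggestion concrete, a hypothetical shape for the seam; none of these names exist in Mike today, and the real storage interface may look different:

```ts
// Hypothetical seam: Supabase/R2 (current) and SQLite + local filesystem could both satisfy it.
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

export interface DocumentStore {
  put(matterId: string, name: string, bytes: Uint8Array): Promise<string>; // returns a document id
  get(docId: string): Promise<Uint8Array>;
}

// A local-filesystem backend for air-gapped or strict data-residency deployments.
export class LocalFileStore implements DocumentStore {
  constructor(private rootDir: string) {}

  async put(matterId: string, name: string, bytes: Uint8Array): Promise<string> {
    await mkdir(join(this.rootDir, matterId), { recursive: true });
    await writeFile(join(this.rootDir, matterId, name), bytes);
    return `${matterId}/${name}`;
  }

  async get(docId: string): Promise<Uint8Array> {
    return readFile(join(this.rootDir, docId));
  }
}
```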
Three small concerns. First, when Mike saves a chat record showing what the AI said,
it stores a generic name like "localllm-main" instead of the AI's actual
version. In an audit eighteen months later, the generic name is not enough — different
AI versions answer the same question differently. One extra column in the database fixes
this. Second, when the AI sends a malformed request to a tool, Mike currently treats it
like an empty request and proceeds silently. The AI never finds out it made a mistake,
so it cannot correct itself. Treating the malformed request as an error (and telling the
AI) lets the AI try again. Third, while Joseph made the AI live inside the firm's walls,
the rest of Mike (the database, the document storage) still lives in third-party cloud
services. For some firms under heavy data-residency rules, that gap is the difference
between "we can use this" and "we cannot." A local-database backend would close that
gap — meaningful work, not a quick fix.
Mike forgets which AI version answered the lawyer's question. Mike also does not tell the AI when the AI sends garbled instructions. And Mike still keeps the lawyer's notes on a cloud service even after Joseph moved the AI into the firm's closet. All three are fixable.
@willchen96 — Mike was the catalyst. The 2,481 stars and 702 forks in eight days speak for themselves. You opened a category that the closed-source incumbents had quietly priced out of reach for solo practitioners and sub-50-lawyer firms. That trajectory is the proof the category was under-served.
@jpbreda — a clean diff, tested against your own vLLM endpoint at bredaai.com, and submitted with merge conflicts that are a function of Mike's velocity rather than of your work. That is the contribution shape Mike's main branch needs more of.
Mike was the first serious open-source legal AI. Joseph contributed the most-needed missing piece (private AI). The category is now real and moving faster than the closed-source competitors can respond. The combination — Mike's documents, Joseph's privacy, the wave of contributions in French and Dutch — means the open-source legal stack is no longer a thought-experiment.
ELI5: A few years ago, every legal AI was a paid product owned by one company. Now there are open ones anyone can read, run, and improve. Mike started this. Joseph kept it going.
Synthesis
Specific, sequenced actions to land this PR:
1. Merge conflicts resolved. We have rebased the branch against current willchen96/mike main and submitted a follow-up PR back to Joseph's fork. Merging that into feature/localllm-provider-support flips this PR's mergeable flag to true. Original authorship preserved via Co-authored-by: trailers.
2. Capture model version at session open. GET /v1/models once, cache for the session lifetime, bind to the chat record. One DB column. Closes Voice 3 question 1.
3. Treat tool-argument parse failure as a tool error. Return a structured error to the model. Lets the model self-correct. Closes Voice 3 question 2.
4. The audit-chain primitive Voice 3 names is generic and reusable. decision_id + model version + inputs + outputs + confidence + previous_hash is the same shape regardless of the task (drafting, time entry, summarisation). We have shipped this primitive (IDR, HMAC-SHA256) in Donna; the protocol is in happi.md v1.1 for anyone who wants to read or reuse it. Suggested for Mike's roadmap; not required for this PR.
5. Evolve the Provider type into a discriminated union: Provider = "claude" | "gemini" | { kind: "localllm"; baseURL: string; modelName: string }. Logs and telemetry become self-describing once the ladder grows past three rungs. Polish, not blocker. A sketch follows after this list.
6. Document Supabase + R2 versus local-first deployment in the README. Either a SQLite + local-FS backend behind the same storage interface, or a clear note that the current deployment story is "local inference, cloud persistence." Honest framing.
Item 1 is done. Items 2 and 3 are this-week scope. Items 4, 5, and 6 are follow-up scope.
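A minimal sketch of item 5; the helper function is illustrative only:

```ts
// Sketch for item 5: the Provider ladder as a discriminated union.
type Provider =
  | "claude"
  | "gemini"
  | { kind: "localllm"; baseURL: string; modelName: string };

// Logs and telemetry become self-describing: the local rung carries its own identity.
function describeProvider(p: Provider): string {
  if (typeof p === "string") return p; // cloud rungs stay plain strings
  return `${p.kind}:${p.modelName}@${p.baseURL}`;
}
```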
Donna — complementary layer, same OSS spirit
We are shipping a complementary tool this week: Donna — Decision-Oriented Network Notarisation for Attorneys. Different layer of the stack from Mike, same OSS spirit (AGPL-3.0). Where Mike is the document layer, Donna is the operations layer — voice-first task delegation, matter summary, and an immutable audit record for every delegated decision.
The technical primitive worth naming for this thread: every Donna decision is bound into
an IDR audit chain (Intent Decision Record) —
decision_id, model version, inputs, outputs, confidence,
previous_hash. HMAC-SHA256. Tamper-evident. Replayable. The protocol is open
in happi.md v1.1;
the implementation is the proprietary substrate of our NEXUS tier.
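For readers who want the shape without opening happi.md, a generic sketch of a record chained this way; this is not Donna's implementation, which the paragraph above describes as proprietary:

```ts
// Generic sketch of a tamper-evident, HMAC-chained decision record.
import { createHmac } from "node:crypto";

interface DecisionRecord {
  decision_id: string;
  model_version: string;
  inputs: unknown;
  outputs: unknown;
  confidence: number;
  previous_hash: string; // hash of the preceding record; chains the log
  hash: string;          // HMAC-SHA256 over this record's contents
}

function signRecord(key: string, record: Omit<DecisionRecord, "hash">): DecisionRecord {
  const payload = JSON.stringify(record);
  const hash = createHmac("sha256", key).update(payload).digest("hex");
  return { ...record, hash };
}

// Verification replays the chain: editing any record breaks every hash that follows it.
```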
Verifiable decision trails are the sine qua non of legal-grade AI. The same primitive answers Voice 3's audit-chain gap above. Same code path serves a solo lawyer doing time entry and a regulated firm under DoJ investigation.
Donna is the only legal AI that listens like a partner and signs like a notary. Donna probat. donnaoss.com · github.com/chiefofstaff-legal/donna · about.grip-web.com
Sine ira et studio. No ask. Just notes.
— The DONNA team · V>> + Craig Miller (CC+|) · 9 May 2026