Motivation / problem
A grounded RAG system still has two attack surfaces that retrieval quality alone does not close. On the way in, a user prompt can carry a prompt-injection or jailbreak payload, exfiltration instructions, or policy-violating content, and the model will act on it before a single chunk is retrieved. On the way out, the model’s answer can smuggle data through a crafted markdown link, echo an injected instruction, or render unsafe HTML into the chat surface. AskMyDocs historically had no dedicated control for either: the only line of defence was the RAG prompt template and the model’s own alignment.
AskMyDocs answers this with padosoft/laravel-ai-guardrails (plus the companion React console padosoft/laravel-ai-guardrails-admin): four offline-first controls (Tool Firewall, Input Screening with an append-only audit, Output Handler/sanitizer, and Human-in-the-Loop (HITL)), wired to enforce on every live chat turn, with a 14-endpoint management API and an 8-screen operator console, all behind admin RBAC.
Theory & background
Guardrails are a defence-in-depth layer, not a replacement for grounding. The package models this as a pair of screens around the model call:
- Input screening runs before retrieval and the model. It classifies the prompt against a configurable rule set (offline heuristics first, so it works with no external service) and returns a verdict: allow, flag, or block. A blocked prompt never reaches the model; it becomes a refusal. Every screened prompt writes an append-only audit row, the compliance record of what was asked and what the firewall decided.
- Output handling runs after the model, before the answer reaches the client. It can sanitize HTML, redact PII, and neutralize markdown (defanging exfiltration links and unsafe embeds), and it records an output-stat row.
Each control runs in one of three modes: enforce (act on the verdict: block input / rewrite output), monitor (record the verdict but let traffic through, the observe-first posture), or off. This is the standard roll-out path: ship in monitor, read the audit, then flip to enforce.
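The verdict-plus-mode logic described above can be sketched in plain PHP. This is a minimal illustration, not the package’s implementation: the enum names, the `screenPrompt()` and `shouldRefuse()` helpers, the first-match-wins ordering, and the two regex rules are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout; none of these come from the package.
enum Verdict: string { case Allow = 'allow'; case Flag = 'flag'; case Block = 'block'; }
enum Mode: string { case Enforce = 'enforce'; case Monitor = 'monitor'; case Off = 'off'; }

/**
 * Offline heuristic screen: the first rule whose pattern matches decides the verdict.
 * (First-match-wins is an assumption made for this sketch.)
 *
 * @param array<string, array{pattern: string, verdict: Verdict}> $rules
 * @return array{verdict: Verdict, rule_id: ?string}
 */
function screenPrompt(string $prompt, array $rules): array
{
    foreach ($rules as $ruleId => $rule) {
        if (preg_match($rule['pattern'], $prompt) === 1) {
            return ['verdict' => $rule['verdict'], 'rule_id' => $ruleId];
        }
    }
    return ['verdict' => Verdict::Allow, 'rule_id' => null];
}

/** Only enforce mode turns a block verdict into an actual refusal. */
function shouldRefuse(Verdict $verdict, Mode $mode): bool
{
    return $mode === Mode::Enforce && $verdict === Verdict::Block;
}

// Two illustrative rules; a real rule set would be far larger.
$rules = [
    'ignore-previous' => [
        'pattern' => '/ignore (all )?previous instructions/i',
        'verdict' => Verdict::Block,
    ],
    'system-prompt-probe' => [
        'pattern' => '/system prompt/i',
        'verdict' => Verdict::Flag,
    ],
];
```

In monitor mode the same verdict is still computed and can be recorded, but `shouldRefuse()` returns false, which is what makes the observe-first roll-out possible without changing the rules.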
The package is built natively on laravel/ai ^0.8 as a set of AGENT
middlewares. AskMyDocs’s chat path is not an agent loop (it is a single
retrieve-then-generate turn), so the host does not consume the middlewares
directly — it drives the same package controls through a thin host adapter
(below). This is the platform’s standard sister-package pattern: one shared core,
adapted to the host’s actual execution shape.
Design
The host wires enforcement through App\Services\Guardrails\ChatGuardrails, a thin adapter over the package controls, invoked from App\Http\Controllers\Api\KbChatController on every chat turn:
- screenInput() runs before KbSearchService retrieval. It calls the package input-screening control and writes the audit row. On a block verdict (mode enforce) the controller short-circuits to a refusal in the existing refusal-not-error shape (R26): a normal 200 response carrying refusal_reason: 'blocked_by_guardrails' and a localized body (en + it), never a 500. Retrieval and the model call never run.
- sanitizeOutput() runs after AiManager::chat() returns, before the answer is serialized to the client. It passes the answer through the package output handler (markdown-neutralize on, defanging exfil links) and records the output stat.
Both steps wrap their telemetry writes in try/catch + Log (the ChatLogManager discipline): a guardrails telemetry failure never breaks a chat turn. A degraded audit store still refuses a blocked prompt; it just logs that it could not persist the record.
The whole adapter is mode-aware and flag-gated (R43): with
AI_GUARDRAILS_ENABLED=false the adapter is a pass-through and the chat turn is
byte-identical to a pre-v8.19 turn; with it on but the package in monitor,
verdicts are recorded but traffic flows. Tool-firewall + HITL are registered but
default-OFF.
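The control flow of one chat turn can be sketched as a single function. This is an illustration under stated assumptions, not the real ChatGuardrails code: `handleChatTurn()` is a hypothetical name, the callables stand in for the package controls, KbSearchService retrieval and AiManager::chat(), and their signatures are invented for the sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical chat-turn flow; every parameter is a stand-in.
 *
 * @param callable(string): string       $screen     returns 'allow' | 'flag' | 'block'
 * @param callable(string, string): void $writeAudit persists the audit row, may throw
 * @param callable(string): string       $answer     retrieval + model call
 * @param callable(string): string       $sanitize   output handler
 */
function handleChatTurn(
    string $prompt,
    bool $enabled,   // stand-in for AI_GUARDRAILS_ENABLED
    string $mode,    // 'enforce' | 'monitor' | 'off'
    callable $screen,
    callable $writeAudit,
    callable $answer,
    callable $sanitize,
): array {
    // Flag off (or mode off): pure pass-through, guardrails never run.
    if (!$enabled || $mode === 'off') {
        return ['status' => 200, 'answer' => $answer($prompt)];
    }

    $verdict = $screen($prompt);

    // Telemetry must never break the turn: log the failure and carry on.
    try {
        $writeAudit($prompt, $verdict);
    } catch (Throwable $e) {
        error_log('guardrails audit write failed: ' . $e->getMessage());
    }

    // Refusal-not-error: a blocked prompt is a normal 200 with a reason.
    if ($verdict === 'block' && $mode === 'enforce') {
        return ['status' => 200, 'refusal_reason' => 'blocked_by_guardrails', 'answer' => null];
    }

    $raw = $answer($prompt);

    // Monitor mode records verdicts but leaves the answer untouched.
    return ['status' => 200, 'answer' => $mode === 'enforce' ? $sanitize($raw) : $raw];
}
```

Note that the refusal decision is taken after the try/catch, so a failing audit store cannot turn a block into an allow.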
Data model / contract
The package owns its tables under the ai_guardrails_ prefix (created by php artisan migrate after install; 7 migrations). The load-bearing ones:
| Table | Holds |
|---|---|
| ai_guardrails_injection_audit | Append-only input-screening record: the full prompt (text), a blocked boolean, the matched rule_id + ruleset_version, errored_rule_ids (json), the match_start/match_end offsets, principal_id, and occurred_at. No updated_at; rows are never modified in place. The compliance trail. |
| ai_guardrails_firewall_rejections | Tool-firewall rejections (registered but OFF by default in AskMyDocs). |
| ai_guardrails_output_stats | One row per sanitized answer: what the output handler changed (links defanged, HTML stripped, …). |
| ai_guardrails_settings | Per-control configuration + mode (enforce/monitor/off). |
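To make the audit columns concrete, here is a hedged sketch of how one insert-only row could be assembled. Column names follow the table above; the `buildAuditRow()` helper, the value types, and the offset convention (byte offsets, end exclusive) are assumptions, and the sketch treats every match as a block rule for brevity.

```php
<?php
declare(strict_types=1);

/**
 * Build one audit row for a single rule check (hypothetical helper).
 * There is deliberately no updated_at: rows are inserted once and never changed.
 */
function buildAuditRow(
    string $prompt,
    string $ruleId,
    string $pattern,
    string $rulesetVersion,
    ?string $principalId,
): array {
    // PREG_OFFSET_CAPTURE yields [matched text, byte offset] for each group.
    $matched = preg_match($pattern, $prompt, $m, PREG_OFFSET_CAPTURE) === 1;
    $start = $matched ? $m[0][1] : null;
    $end = $matched ? $start + strlen($m[0][0]) : null;

    return [
        'prompt' => $prompt,
        'blocked' => $matched,                 // simplification: match means block
        'rule_id' => $matched ? $ruleId : null,
        'ruleset_version' => $rulesetVersion,
        'errored_rule_ids' => json_encode([]), // no rule failed to evaluate here
        'match_start' => $start,
        'match_end' => $end,
        'principal_id' => $principalId,
        'occurred_at' => gmdate('Y-m-d H:i:s'),
    ];
}
```

Storing the offsets alongside the full prompt lets a reviewer see exactly which span triggered the rule without re-running the rule set.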
The
ai_guardrails_* tables are global security infrastructure — they carry
no tenant_id, in the same spirit as embedding_cache. A
firewall rule or an audit record is a deployment-wide safety concern, not
tenant-scoped data; isolation is provided by admin RBAC (only privileged roles
read them). The tables are package-owned and have no host Eloquent model under
app/Models/, so they fall outside the model-enumeration scope of
TenantIdMandatoryTest entirely (unlike embedding_cache, which has a model and so
carries an explicit exclusion there). This is a deliberate decision, not an omission
of R30/R31.
Entry points:
- PHP: the package’s ai-guardrails:* Artisan commands (screen / sanitize / audit / purge) plus the ChatGuardrails adapter.
- HTTP: the package’s 14-endpoint management API mounted under api/admin/ai-guardrails (audit log, firewall log, output stats, settings) behind the host’s authenticated admin stack.
- MCP: App\Mcp\Tools\KbGuardrailsInsightsTool on the enterprise-kb server (read-only, OFF-path safe, R43): returns the current guardrails posture (mode + recent block/flag counts) for an agent.
config/ai-guardrails.php (host override): input screening and the output handler both default to enforce. The output handler is tuned for a markdown RAG answer: sanitize_html=false (the FE markdown renderer is the XSS boundary), redact_pii=false (AskMyDocs owns its own PII layer), and neutralize_markdown=true (defang exfil links). The management API is on but behind the authenticated admin stack.
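As a rough picture of what such a host override might look like, the fragment below arranges the settings named above into a Laravel config array. The option names sanitize_html, redact_pii, neutralize_markdown and api.enabled, the env variable, and the middleware names come from this page; the nesting, the remaining key names, and the env default are assumptions, so check the package’s published config before copying anything.

```php
<?php
// config/ai-guardrails.php (illustrative layout only; real key names may differ)

return [
    // Master flag; the default shown here is an assumption.
    'enabled' => (bool) env('AI_GUARDRAILS_ENABLED', false),

    'input_screening' => [
        'mode' => 'enforce', // enforce | monitor | off
    ],

    'output_handler' => [
        'mode' => 'enforce',
        'sanitize_html' => false,      // the FE markdown renderer is the XSS boundary
        'redact_pii' => false,         // PII is handled by a separate layer
        'neutralize_markdown' => true, // defang exfiltration links
    ],

    'api' => [
        'enabled' => true,
        // Host-supplied stack replacing the package's unauthenticated default.
        'middleware' => ['auth:sanctum', 'tenant.authorize', 'guardrails.authorize'],
    ],
];
```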
Security & flags (R32 / R30 / R43)
- Method-aware authorization. Every privileged route sits behind App\Http\Middleware\GuardrailsAuthorize: safe methods (GET/HEAD) require the viewAiGuardrails gate (super-admin + admin); mutating methods (POST/PUT/PATCH/DELETE, e.g. changing a control’s mode) require manageAiGuardrails (super-admin only). The boundary is regression-locked in AdminAuthorizationMatrixTest (R32), including the write-method boundary (admin 403 / super-admin pass on PUT .../settings).
- Secure-by-host override. The package mounts its management API and its admin SPA catch-all with no auth by default. The host config/ai-guardrails.php replaces the API stack with auth:sanctum + tenant.authorize + guardrails.authorize, and the admin SPA route stack is guardrails-admin.enabled,web,auth,can:viewAiGuardrails. Without these host overrides the package would expose security telemetry unauthenticated; the override is load-bearing (R32).
- Global-infra isolation (R30/R31). The ai_guardrails_* tables are intentionally not tenant-aware (see the data-model note); they are package-owned with no host Eloquent model, so they are outside TenantIdMandatoryTest’s model enumeration, and access is gated by admin RBAC.
- Default-OFF surfaces (R43), both states tested. AI_GUARDRAILS_ENABLED=false makes the chat path byte-identical to pre-v8.19 (the adapter is a pass-through); AI_GUARDRAILS_ADMIN_ENABLED=false degrades the console to a clean 404 via the host GuardrailsAdminEnabled middleware (the package mounts its route unconditionally, so the host owns the flag). Both the OFF and ON branches are covered by tests: a feature flag is verified in both states, never just enabled.
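The method-aware rule can be reduced to a small decision table. The sketch below is framework-free and hypothetical: `requiredGate()`, `authorizeStatus()` and the `GATE_ROLES` constant are illustrative names, not the real GuardrailsAuthorize middleware, and treating every method other than GET/HEAD as mutating is a fail-closed assumption of this sketch.

```php
<?php
declare(strict_types=1);

/** Map an HTTP method to the gate it must pass (gate names from the text above). */
function requiredGate(string $method): string
{
    // Anything that is not explicitly safe is treated as a write (assumption).
    return in_array(strtoupper($method), ['GET', 'HEAD'], true)
        ? 'viewAiGuardrails'
        : 'manageAiGuardrails';
}

/** Hypothetical role-to-gate table mirroring the roles described above. */
const GATE_ROLES = [
    'viewAiGuardrails' => ['super-admin', 'admin'],
    'manageAiGuardrails' => ['super-admin'],
];

/** Returns the HTTP status a request with this role and method would receive. */
function authorizeStatus(?string $role, string $method): int
{
    if ($role === null) {
        return 401; // unauthenticated
    }
    return in_array($role, GATE_ROLES[requiredGate($method)], true) ? 200 : 403;
}
```

A matrix test in the spirit of AdminAuthorizationMatrixTest would iterate every role against every method and compare against this table.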
Decision rationale (ADR-style)
- Adapter, not middleware consumption. The package controls are laravel/ai AGENT middlewares, but the host chat path is a single retrieve-then-generate turn, not an agent loop. Forcing the chat path into an agent shape to reuse the middlewares would have been a large, risky refactor of the hot path; the ChatGuardrails adapter drives the same package controls (screen + audit / sanitize + stat) the way the package CLI does, over one shared core. See architecture decisions.
- Refusal-not-error for a blocked prompt (R26/R27). A blocked input is a product outcome, not a server fault. It returns the existing refusal shape (200 + refusal_reason), so the chat UI renders it like any other no-answer case and every client keeps working; emitting a 500 would have turned a successful safety decision into an error toast.
- Global tables, RBAC isolation, not forced tenant scope. A firewall rule and an audit record are deployment-wide safety concerns. Bolting a tenant_id onto them would imply per-tenant firewalls (a different product) and would weaken the audit trail; the documented embedding_cache-style exclusion + admin RBAC is the correct boundary.
- Enforce on the live path, observe-first available. AskMyDocs ships with enforcement on (the user locked input + output enforcement), but the package’s monitor mode + the master flag give operators a clean roll-back / observe-first path without a code change.
Worked example
A benign question flows straight through: screened (allowed), retrieved, answered, sanitized.
A prompt-injection attempt is blocked in enforce mode: no retrieval, no model call, an audit row written.
The management API is role-gated: a viewer returns 403, unauthenticated 401.
An agent can connect to the enterprise-kb MCP server to call KbGuardrailsInsightsTool; it returns the current mode + recent block/flag counts. Turn the console on with AI_GUARDRAILS_ADMIN_ENABLED=true + php artisan vendor:publish --tag=ai-guardrails-admin-assets --force, then open /admin/ai-guardrails.
Gotchas & operations
- Enforcement is on; monitor is the safety valve. AskMyDocs ships input + output enforcement on. To observe without acting, set the control mode to monitor in config/ai-guardrails.php (or flip AI_GUARDRAILS_ENABLED=false for a full pass-through); both leave the chat turn working.
- HTML sanitization is intentionally off. The output handler runs with sanitize_html=false because the FE markdown renderer is the XSS boundary for a RAG answer; neutralize_markdown=true is what defangs exfil links. Don’t turn on HTML stripping expecting it to be the XSS guard: it would also mangle legitimate rendered markdown.
- PII redaction is off here by design. AskMyDocs owns its own PII layer (padosoft/laravel-pii-redactor, see PII & compliance); the guardrails output handler runs with redact_pii=false to avoid two overlapping redaction passes.
- The admin console is a pure API consumer. The -admin SPA holds no business logic; it reads the core API. If the console shows “unavailable”, check AI_GUARDRAILS_ADMIN_ENABLED and that the core api.enabled stack resolves (the package fail-closes the API when api.enabled && middleware empty).
- Tables are global. Don’t filter ai_guardrails_* by tenant_id; there isn’t one. Read access is the RBAC gate, not a tenant scope.
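To illustrate what "defanging exfil links" can mean in practice, here is a minimal sketch of a markdown neutralizer. It is not the package’s algorithm: `neutralizeMarkdown()`, the host allow-list, and the replacement strings are assumptions. It keeps links whose host is explicitly allowed and reduces every other inline link or image to inert text, returning a count that could feed an output-stat row.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical markdown neutralizer: links and images pointing at hosts that
 * are not on the allow-list lose their URL, so no data can ride out on them.
 *
 * @param list<string> $allowedHosts lower-case host names that may stay clickable
 * @return array{text: string, links_defanged: int}
 */
function neutralizeMarkdown(string $markdown, array $allowedHosts = []): array
{
    $changed = 0;

    // Matches [label](url "title") and ![alt](url); group 1 is the optional "!".
    $pattern = '/(!?)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/';

    $text = preg_replace_callback(
        $pattern,
        function (array $m) use ($allowedHosts, &$changed): string {
            $host = parse_url($m[3], PHP_URL_HOST);
            if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
                return $m[0]; // trusted host: leave the markdown untouched
            }
            $changed++;
            // Images would fetch the URL on render, so drop them entirely;
            // for links keep only the visible label.
            return $m[1] === '!' ? '[image removed]' : $m[2];
        },
        $markdown
    );

    return ['text' => (string) $text, 'links_defanged' => $changed];
}
```

Relative and scheme-only URLs (for example javascript: links) have no parseable host, so this sketch defangs them too; a real deployment would need an explicit policy for in-app relative links.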
Chat & retrieval
The chat turn guardrails screen on input and sanitize on output.
PII & compliance
The complementary PII redaction layer AskMyDocs owns separately.
Multi-tenant isolation
Why the guardrails tables are global infra (like embedding_cache), RBAC-isolated.
MCP server
The KbGuardrailsInsightsTool posture surface on the enterprise-kb server.