Citt Safety Standard

Safety is the product, not an afterthought.

Mental health AI carries real stakes. Citt.ai was built with crisis detection, human oversight, and clinical accountability at its foundation, not bolted on later.

Measured against concrete targets

Crisis detection sensitivity: 95% (target)
False positive rate: ≤5% (target)
Jailbreak resistance: 100% (adversarial test target)
F1 score: 0.92 (target)

Targets evaluated via automated adversarial test suite (CEP v2).
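
For readers checking how these targets relate to each other, here is a minimal sketch of the standard definitions of sensitivity, false positive rate, and F1, computed from confusion-matrix counts. The type names, function names, and the example counts are illustrative assumptions, not CEP v2 code or Citt.ai results.

```typescript
// Hypothetical confusion-matrix counts from a labelled evaluation run.
interface ConfusionCounts {
  truePositives: number;  // crisis messages correctly flagged
  falsePositives: number; // non-crisis messages flagged by mistake
  trueNegatives: number;  // non-crisis messages correctly passed
  falseNegatives: number; // crisis messages that were missed
}

interface SafetyMetrics {
  sensitivity: number;
  falsePositiveRate: number;
  precision: number;
  f1: number;
}

function computeSafetyMetrics(c: ConfusionCounts): SafetyMetrics {
  // Guard against empty denominators so an empty run yields 0 rather than NaN.
  const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);
  const sensitivity = ratio(c.truePositives, c.truePositives + c.falseNegatives);
  const falsePositiveRate = ratio(c.falsePositives, c.falsePositives + c.trueNegatives);
  const precision = ratio(c.truePositives, c.truePositives + c.falsePositives);
  const f1 = ratio(2 * precision * sensitivity, precision + sensitivity);
  return { sensitivity, falsePositiveRate, precision, f1 };
}

// Target values taken from the figures stated on this page.
const TARGETS = { minSensitivity: 0.95, maxFalsePositiveRate: 0.05, minF1: 0.92 };

function meetsTargets(m: SafetyMetrics): boolean {
  return (
    m.sensitivity >= TARGETS.minSensitivity &&
    m.falsePositiveRate <= TARGETS.maxFalsePositiveRate &&
    m.f1 >= TARGETS.minF1
  );
}

// Illustrative counts only: 100 crisis messages and 200 non-crisis messages.
const example = computeSafetyMetrics({
  truePositives: 95,
  falsePositives: 10,
  trueNegatives: 190,
  falseNegatives: 5,
});
```

With these made-up counts, sensitivity is 0.95, the false positive rate is 0.05, and F1 is about 0.927, so `meetsTargets(example)` returns true. F1 also depends on how many non-crisis messages are in the test set, so the three targets have to be read against the same evaluation data.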

The Citt Safety Architecture

Four layers that work together to ensure no patient in distress falls through the cracks.

Crisis Detection on Every Message

quickRiskCheck() runs on every patient message (web chat, WhatsApp, and multi-agent paths). 200+ crisis signals covering suicidal ideation, self-harm, abuse, and obfuscation attempts.

Every message path (web, WhatsApp, multi-agent)
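
As a rough illustration of what "every message path" can mean in practice, the sketch below routes all channels through one shared entry point that always runs a risk check first. Only the name quickRiskCheck() comes from this page. The function `quickRiskCheckSketch`, its signature, the types, and the three-pattern signal list are assumptions made for the example, and a real signal set of 200+ entries would be far broader than simple phrase matching.

```typescript
type RiskCategory = "suicidal_ideation" | "self_harm" | "abuse";
type Channel = "web" | "whatsapp" | "multi_agent";

interface RiskSignal {
  category: RiskCategory;
  pattern: RegExp;
}

interface RiskResult {
  flagged: boolean;
  categories: RiskCategory[];
}

// Tiny illustrative subset of signals (hypothetical, not the production list).
const SIGNALS: RiskSignal[] = [
  { category: "suicidal_ideation", pattern: /\b(kill myself|end my life|want to die)\b/ },
  { category: "self_harm", pattern: /\b(hurt myself|cut myself)\b/ },
  { category: "abuse", pattern: /\b(hits me|afraid to go home)\b/ },
];

// Stand-in for quickRiskCheck(); the real signature is not documented here.
function quickRiskCheckSketch(message: string): RiskResult {
  const text = message.toLowerCase();
  const categories = SIGNALS
    .filter((signal) => signal.pattern.test(text))
    .map((signal) => signal.category);
  return { flagged: categories.length > 0, categories };
}

// One shared entry point, so no channel can reach the model without the check.
function handleIncomingMessage(
  channel: Channel,
  message: string
): { channel: Channel; risk: RiskResult } {
  const risk = quickRiskCheckSketch(message);
  return { channel, risk };
}
```

The design point is the single choke point: if web chat, WhatsApp, and multi-agent flows all call the same handler, adding a new channel cannot silently bypass detection.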

Human in the Loop

Every AI output carries provenance and confidence scores. Therapists review, approve, or override AI-generated clinical content. Crisis events trigger immediate therapist notification.

Therapist approval workflow on all AI clinical outputs
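
One way to picture the approval workflow is as a small state machine: AI output starts as pending, and only a therapist decision moves it to approved or overridden. The sketch below assumes hypothetical field names (`provenance`, `confidence`, `status`) and a hypothetical `applyReview` helper; it is not Citt.ai's data model.

```typescript
type ReviewStatus = "pending_review" | "approved" | "overridden";

interface AiClinicalOutput {
  id: string;
  content: string;
  provenance: { model: string; generatedAt: string }; // where the text came from
  confidence: number; // 0..1, as reported by the generating system
  status: ReviewStatus;
  reviewedBy?: string;
}

type ReviewDecision =
  | { action: "approve" }
  | { action: "override"; replacement: string };

// Returns a new record; the original draft is left untouched.
function applyReview(
  output: AiClinicalOutput,
  therapistId: string,
  decision: ReviewDecision
): AiClinicalOutput {
  if (output.status !== "pending_review") {
    throw new Error(`Output ${output.id} has already been reviewed`);
  }
  if (decision.action === "approve") {
    return { ...output, status: "approved", reviewedBy: therapistId };
  }
  return {
    ...output,
    content: decision.replacement,
    status: "overridden",
    reviewedBy: therapistId,
  };
}

// Nothing reaches the clinical record until a therapist has acted on it.
function isReleasable(output: AiClinicalOutput): boolean {
  return output.status !== "pending_review";
}

// Illustrative values only.
const draft: AiClinicalOutput = {
  id: "out-1",
  content: "Draft session summary",
  provenance: { model: "hypothetical-model-v1", generatedAt: "2024-01-01T10:00:00Z" },
  confidence: 0.81,
  status: "pending_review",
};
const approved = applyReview(draft, "therapist-42", { action: "approve" });
```

Here `isReleasable(draft)` is false and `isReleasable(approved)` is true, and a second review of the same output throws rather than silently overwriting the first decision.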

Full Audit Trail

Every safety decision, crisis event, and clinical action is logged with timestamps, user IDs, and context. Immutable audit log for clinical accountability and regulatory review.

Audit logging active on all clinical actions
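
This page does not say how immutability is achieved. One common technique for making tampering detectable is an append-only log in which each entry stores a hash of the previous one. The sketch below shows that idea under stated assumptions: all names are hypothetical, and `toyHash` is a simple non-cryptographic checksum used only to keep the example self-contained. A real log would use a cryptographic hash such as SHA-256.

```typescript
interface AuditEvent {
  timestamp: string;
  userId: string;
  action: string;
  context: string;
}

interface AuditEntry extends AuditEvent {
  prevHash: string; // hash of the previous entry, or "genesis" for the first
  hash: string;     // hash over prevHash plus this entry's fields
}

// Toy checksum (djb2 variant). NOT cryptographically secure; illustration only.
function toyHash(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = (h * 33 + input.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
  }
  return h.toString(16);
}

function entryHash(prevHash: string, e: AuditEvent): string {
  return toyHash([prevHash, e.timestamp, e.userId, e.action, e.context].join("|"));
}

// Appending returns a new array; existing entries are never modified.
function appendEvent(log: ReadonlyArray<AuditEntry>, event: AuditEvent): AuditEntry[] {
  const prevHash = log.length === 0 ? "genesis" : log[log.length - 1].hash;
  const entry: AuditEntry = { ...event, prevHash, hash: entryHash(prevHash, event) };
  return log.concat([entry]);
}

// Recomputes every hash; any edited or reordered entry breaks the chain.
function verifyChain(log: ReadonlyArray<AuditEntry>): boolean {
  return log.every((entry, i) => {
    const expectedPrev = i === 0 ? "genesis" : log[i - 1].hash;
    return entry.prevHash === expectedPrev && entry.hash === entryHash(expectedPrev, entry);
  });
}
```

Because each hash covers the one before it, changing an old entry invalidates that entry and everything after it, which is what makes after-the-fact edits visible to a reviewer.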

HIPAA-Compliant Data Handling

PHI inventory across all tables. Row-level security on every data access. Service-role isolation for admin operations. Data deletion on patient request with therapist review window.

RLS enforced on all patient data tables
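
Row-level security is normally enforced by the database itself, so the following is only an in-memory model of the kind of rule such a policy expresses. The roles, field names, and the assumption that the service role is reserved for admin operations and sees all rows are illustrative, not a description of Citt.ai's schema.

```typescript
type Role = "patient" | "therapist" | "service";

interface Caller {
  userId: string;
  role: Role;
}

interface PatientRow {
  patientId: string;
  therapistId: string;
  note: string;
}

// Hypothetical policy: each caller sees only rows tied to their own identity.
function canRead(caller: Caller, row: PatientRow): boolean {
  if (caller.role === "patient") {
    return row.patientId === caller.userId;
  }
  if (caller.role === "therapist") {
    return row.therapistId === caller.userId;
  }
  // Assumption: the service role is isolated to admin operations.
  return true;
}

// Every read goes through the filter, mirroring "RLS on every data access".
function selectRows(caller: Caller, table: PatientRow[]): PatientRow[] {
  return table.filter((row) => canRead(caller, row));
}
```

The value of pushing this rule into the data layer is that application code cannot forget to apply it: a query without the right identity simply returns no rows.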

Adversarial evaluation

Tested against attacks designed to defeat it

Our CEP v2 evaluation framework includes dedicated test cases for the failure modes that have caused harm elsewhere: obfuscated crisis language, jailbreak attempts, and gradual escalation patterns designed to slip past keyword filters.

Obfuscation resistance: 90%. Detects crisis in disguised language.
Jailbreak resistance: 100%. Resists prompt injection attacks.
Gradual escalation detection: 85%. Catches slow-build crisis patterns.
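
To make the per-category figures concrete, here is a minimal sketch of how an adversarial harness can score a detector by attack category. Everything in it is hypothetical: the case format, the `passRate` function, and the deliberately simple `toyDetector`, which undoes basic character substitutions but has no notion of conversation history. It is not CEP v2.

```typescript
type AttackCategory = "obfuscation" | "jailbreak" | "gradual_escalation";

interface AdversarialCase {
  category: AttackCategory;
  messages: string[];  // one or more turns, oldest first
  expectFlag: boolean; // true if a crisis should be detected
}

type Detector = (messages: string[]) => boolean;

// Undo common character swaps and strip separators before matching.
const LEET: { [ch: string]: string } = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s",
};

function normalize(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => LEET[ch] || ch)
    .join("")
    .replace(/[^a-z ]/g, "")
    .replace(/ +/g, " ")
    .trim();
}

// Toy detector: single-phrase match on normalized text, no escalation logic.
const toyDetector: Detector = (messages) =>
  normalize(messages.join(" ")).indexOf("kill myself") !== -1;

// Fraction of cases in one category where the detector matched expectations.
function passRate(cases: AdversarialCase[], category: AttackCategory, detect: Detector): number {
  const inCategory = cases.filter((c) => c.category === category);
  if (inCategory.length === 0) {
    throw new Error(`No test cases for category ${category}`);
  }
  const passed = inCategory.filter((c) => detect(c.messages) === c.expectFlag).length;
  return passed / inCategory.length;
}

// Illustrative cases only.
const CASES: AdversarialCase[] = [
  { category: "obfuscation", messages: ["i want to k1ll mys3lf"], expectFlag: true },
  { category: "obfuscation", messages: ["k.i.l.l m.y.s.e.l.f"], expectFlag: true },
  { category: "obfuscation", messages: ["The k1ller whale documentary was great"], expectFlag: false },
  {
    category: "gradual_escalation",
    messages: [
      "i have been so tired lately",
      "nothing seems to matter anymore",
      "everyone would be better off without me",
    ],
    expectFlag: true,
  },
];
```

On these cases the toy detector passes every obfuscation case but misses the gradual escalation case, since no single phrase matches. Reporting results per category is what exposes that kind of gap, which a single overall accuracy number would hide.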

Evaluating Citt.ai for your organisation?

We provide full technical documentation, safety architecture whitepapers, and can arrange a clinical review for health system procurement teams.