Our technology is grounded in rigorous internal bench evaluation, evidence-based therapeutic approaches backed by decades of clinical literature, and active research contributions to the field of safe clinical AI.
Lilo Engine performance from internal bench evaluation (February 2026)
Zero false negatives across 3,720 internal test scenarios spanning 15 clinical phenotypes
Roughly 30 ms detection latency, about 1,000x faster than the 30-second Crisis Now/URAC regulatory benchmark
Across 11 therapeutic categories with zero crisis-to-non-crisis misclassifications
End-to-end scenario evaluation with zero clinical anti-patterns detected
These results reflect internal bench evaluation. Manuscripts describing the methodology and results in detail are currently under peer review at leading journals. Results will be updated as external validation progresses.
We are actively contributing to the scientific literature on safe clinical AI
Manuscript under peer review at a leading digital medicine journal (Springer Nature)
Full details will be available upon publication. A preprint may be posted on Research Square pending journal quality review.
Manuscript under peer review at a leading health policy journal
Content under journal embargo. Details available upon publication.
Decades of published research demonstrate fundamental limitations of human oversight in safety-critical systems — limitations that inform our architectural approach.
Felisberto et al. (2024) meta-analysis, 95% CI: 85–95%
Mackworth (1948), confirmed by Frontiers in Psychology review (2025)
Parasuraman & Manzey (2010); Rosbach et al. (2024) pathology study
Boston Globe investigation (2005–2010); FDA MAUDE database
These findings — from independent, peer-reviewed sources — are why we designed Lilo Engine as a deterministic pipeline with structural safety invariants, rather than relying on human oversight or agentic conventions.
Deterministic pipeline architectures provide a class of safety guarantees that agentic orchestration patterns structurally cannot achieve.
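The OR-composition property can be sketched in a few lines. This is an illustrative sketch of the general technique, not Lilo Engine's actual gates: the gate names, features, and thresholds below are invented for the example. The point is structural: every gate is a deterministic predicate, every gate always runs, and escalation is a plain `any()` over the results, so no orchestration or model-mediated routing step can suppress a positive gate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class GateResult:
    name: str
    fired: bool

# Illustrative gates: each is a pure predicate over precomputed features.
def keyword_gate(f: Dict) -> bool:
    return f.get("crisis_keyword_count", 0) > 0

def embedding_gate(f: Dict) -> bool:
    # Similarity against a fixed crisis-exemplar centroid (hypothetical cutoff).
    return f.get("crisis_similarity", 0.0) >= 0.85

def instrument_gate(f: Dict) -> bool:
    # A screening-item score crossing a clinical cutoff (hypothetical).
    return f.get("instrument_score", 0) >= 3

def trajectory_gate(f: Dict) -> bool:
    return f.get("mood_slope", 0.0) <= -0.5

GATES: List[Callable[[Dict], bool]] = [
    keyword_gate, embedding_gate, instrument_gate, trajectory_gate,
]

def detect_crisis(features: Dict) -> Tuple[bool, List[GateResult]]:
    """Fail-safe OR: escalate if ANY gate fires. All gates are always
    evaluated so each verdict can be audit-logged independently."""
    results = [GateResult(g.__name__, g(features)) for g in GATES]
    return any(r.fired for r in results), results
```

Because the composition is a fixed `any()` over independent predicates, a false negative requires every gate to miss simultaneously; there is no downstream component with the authority to override a positive.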
Three-tier validated assessment framework integrated into the therapeutic pipeline — the most comprehensive clinical instrument integration in any AI therapeutic system
Tier 1: Scheduled at fixed intervals for every resident; casts a broad net to detect early signals.
Tier 2: Administered on clinical indication; deeper assessment that feeds the crisis detection gates directly.
Tier 3: Baseline plus 90- and 180-day follow-ups; tracks slow-moving clinical trajectories over time.
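The tier cadences reduce to simple date arithmetic. A minimal sketch: the baseline + 90/180-day cadence is taken from the tier description above, while the 14-day screening interval is an illustrative assumption, not a documented Lilo parameter.

```python
from datetime import date, timedelta

# Longitudinal cadence from the tier description: baseline, then 90 and 180 days.
LONGITUDINAL_OFFSETS = [timedelta(days=0), timedelta(days=90), timedelta(days=180)]

def longitudinal_due_dates(baseline: date) -> list:
    """All follow-up dates for the longitudinal tier, given a baseline date."""
    return [baseline + off for off in LONGITUDINAL_OFFSETS]

def screening_due(last_screen: date, today: date, interval_days: int = 14) -> bool:
    """Fixed-interval screening check (interval is a hypothetical default)."""
    return (today - last_screen).days >= interval_days
```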
Addressing gaps that no existing AI therapeutic system covers
| Capability | Woebot | Wysa | ElliQ | Lilo Solace |
|---|---|---|---|---|
| Target Population | College students | General adults | Elderly (loneliness) | Elderly assisted living |
| Voice Interaction | ✗ | ✗ | ✓ | ✓ Senior-optimized |
| Crisis Detection | ✗ | Basic | ✗ | 4-gate OR, 100% recall |
| Clinical Instruments | ✗ | Limited | ✗ | 13 instruments, 3 tiers |
| Architectural Safety | ✗ | ✗ | ✗ | Deterministic pipeline |
| On-Premise / HIPAA | Cloud only | Cloud only | Partial | Full on-premise, §164.312 |
| Evidence-Based Therapies | CBT only | CBT + others | Social only | 5 peer-reviewed skills |
| Deployment Cost | Per-user SaaS | Per-user SaaS | $250/mo + device | $590 hardware, runs locally |
Comparison based on published capabilities as of February 2026. Sources: Fitzpatrick et al. (2017), Inkster et al. (2018), Intuition Robotics Impact Report (2023).
All safety-critical AI processing runs locally. No patient data ever leaves the device.
All safety-critical models (BGE embedding + SLM generation) are co-located on every device. This is a non-negotiable safety, compliance, and reliability requirement — not an optimization. If the internet goes down, the device continues all operations autonomously. Any device with <32GB RAM is disqualified from production deployment.
Minisforum UM890 Pro barebones ($479) + 32GB DDR5 ($60) + 1TB NVMe ($50). Silent, 45-65W, 24/7 operation.
Crisis detection, embeddings, and all 11 ML models run on-device. Models ~7-8GB + services ~4-6GB + OS ~3-4GB = 14-18GB used, 14-18GB free.
TLS 1.3 in transit, AES-256 at rest, immutable 7-year audit logs. PII redacted locally before any network transmission.
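One standard way to make an audit log tamper-evident is a hash chain, where each entry's digest covers the previous entry's digest. This sketch illustrates that general technique only; it is not a claim about Lilo's actual log implementation.

```python
import hashlib
import json
from typing import Dict, List

def append_entry(chain: List[Dict], event: Dict) -> None:
    """Append an event whose hash covers the previous entry's hash,
    so any retroactive edit breaks every later link in the chain."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"event": event, "prev": prev, "hash": digest})

def verify(chain: List[Dict]) -> bool:
    """Recompute every link; any mutation anywhere fails verification."""
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

Pairing a structure like this with write-once storage is what lets a log be both queryable and credibly immutable over a multi-year retention window.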
At-Home (~$590) → Small Facility ($590–$1,180, 1-2 units) → Large Facility ($3K–$8K, Dell PowerEdge R760). Same pipeline at every scale.
Cross-platform: GGUF model weights are portable across Metal (dev), CUDA (facility GPU), Vulkan (edge AMD), and ROCm backends. Same deterministic pipeline, same safety guarantees, any hardware.
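Backend selection for the same GGUF weights can be reduced to simple host detection. A hedged sketch: the detection heuristics below (probing for vendor CLI tools) are illustrative assumptions; real GGUF runtimes in the llama.cpp family select the compute backend at build or load time rather than like this.

```python
import platform
import shutil

def pick_backend() -> str:
    """Map the current host to one of the four backends named above.
    Heuristics are illustrative, not how production runtimes detect GPUs."""
    if platform.system() == "Darwin":
        return "metal"              # dev laptops
    if shutil.which("nvidia-smi"):
        return "cuda"               # facility GPU servers
    if shutil.which("rocminfo"):
        return "rocm"               # AMD datacenter GPUs
    return "vulkan"                 # edge AMD APUs and generic fallback
```

The weights file never changes across these branches; only the compute backend does, which is what keeps the pipeline's behavior identical at every deployment scale.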
We welcome collaboration with researchers, clinicians, and policy makers advancing safe clinical AI.