Research-Driven Development

Built on Science.
Validated by Evidence.

Our technology is grounded in rigorous internal bench evaluation, evidence-based therapeutic approaches backed by decades of clinical literature, and active research contributions to the field of safe clinical AI.

Internal Benchmark Results

Lilo Engine performance from internal bench evaluation (February 2026)

100%
Crisis Recall

Zero false negatives across 3,720 internal test scenarios spanning 15 clinical phenotypes

28.7ms
Detection Latency

Over 1,000x faster than the 30-second Crisis Now/URAC regulatory benchmark

96.4%
Intent Classification

Across 11 therapeutic categories with zero crisis-to-non-crisis misclassifications

98.4%
Generation Quality

End-to-end scenario evaluation with zero clinical anti-patterns detected

These results reflect internal bench evaluation. Manuscripts describing the methodology and results in detail are currently under peer review at leading journals. Results will be updated as external validation progresses.

Active Research

We are actively contributing to the scientific literature on safe clinical AI

Architectural Safety Guarantees for Clinical AI Policy

Aejaz Sheriff — PragLogic AI

Manuscript under peer review at a leading health policy journal

  • Examines limitations of human-in-the-loop oversight in clinical AI
  • Proposes engineering alternatives grounded in published safety science

Content under journal embargo. Details available upon publication.

Why Architectural Safety Matters
Evidence from Published Research

Decades of published research demonstrate fundamental limitations of human oversight in safety-critical systems — limitations that inform our architectural approach.

90–96%
Medication alert override rate

Felisberto et al. (2024) meta-analysis, 95% CI: 85–95%

10–15%
Vigilance accuracy drop within 30 minutes

Mackworth (1948), confirmed by Frontiers in Psychology review (2025)

93%
Peak automation bias rate

Parasuraman & Manzey (2010); Rosbach et al. (2024) pathology study

216+
Patient deaths linked to alarm fatigue

Boston Globe investigation (2005–2010); FDA MAUDE database

These findings — from independent, peer-reviewed sources — are why we designed Lilo Engine as a deterministic pipeline with structural safety invariants, rather than relying on human oversight or agentic conventions.

Our Approach
Why Pipelines, Not Agents

Deterministic pipeline architectures provide a class of safety guarantees that agentic orchestration patterns structurally cannot achieve.

                      Agentic Orchestration      Deterministic Pipeline (Lilo)
Crisis detection      Convention (bypassable)    Structural invariant
Execution paths       7+ (variable)              Exactly 2 (normal + crisis)
LLM calls/request     1–3+ (variable)            Exactly 1 (Layer 4 only)
Audit trail           Non-deterministic          Deterministic (L1→L5)
Safety independence   Shared LLM                 Independent ML models
Failure mode          Silent reasoning errors    Explicit stage failures
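The two-path structure above can be sketched in a few lines. This is an illustrative sketch only: the stage names (L1, L4, L5), the keyword detector, and the generation stub are hypothetical stand-ins, not Lilo Engine internals.

```python
from dataclasses import dataclass, field
from typing import List

CRISIS_KEYWORDS = ("hurt myself", "end my life")  # toy stand-in detector


@dataclass
class Context:
    text: str
    crisis: bool = False
    reply: str = ""
    trail: List[str] = field(default_factory=list)


def l1_safety_gate(ctx: Context) -> Context:
    # Crisis detection runs first and cannot be skipped: a structural
    # invariant rather than an orchestration convention.
    ctx.crisis = any(k in ctx.text.lower() for k in CRISIS_KEYWORDS)
    ctx.trail.append("L1:crisis" if ctx.crisis else "L1:clear")
    return ctx


def l4_generate(ctx: Context) -> Context:
    # The pipeline's single LLM call (stubbed here).
    ctx.reply = f"[generated reply to {ctx.text!r}]"
    ctx.trail.append("L4")
    return ctx


def run_pipeline(text: str) -> Context:
    """Exactly two execution paths: normal and crisis."""
    ctx = l1_safety_gate(Context(text))
    if ctx.crisis:
        # Crisis path: fixed escalation response, no LLM involved.
        ctx.reply = "Connecting you with a human responder now."
        ctx.trail.append("L5:escalate")
    else:
        # Normal path: the only route that reaches the LLM.
        ctx = l4_generate(ctx)
        ctx.trail.append("L5:deliver")
    return ctx
```

Because the stage order is fixed and each stage appends to the trail, identical inputs always yield identical audit trails, which is what makes the audit deterministic.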

13-Instrument Clinical Framework

A three-tier framework of validated assessment instruments integrated into the therapeutic pipeline: to our knowledge, the most comprehensive clinical instrument integration in any AI therapeutic system

Tier 1 Universal Screening

Scheduled at fixed intervals for every resident. Casts a broad net to detect early signals.

  • GDS-15 — Geriatric Depression Scale
  • GAD-7 — Generalized Anxiety Disorder
  • UCLA-3 — Loneliness Scale
  • WHO-5 — Well-Being Index
  • C-SSRS — Suicide Severity Rating (Screener)
Tier 2 Triggered / Adaptive

Administered on clinical indication. Deeper assessments that feed the crisis detection gates directly.

  • PHQ-9 — Patient Health Questionnaire → feeds Gate 3
  • ISI — Insomnia Severity Index
  • PG-13 — Prolonged Grief → feeds Gate 4
  • CAM — Confusion Assessment Method
Tier 3 Longitudinal / Clinical

Administered at baseline and at 90/180 days. Tracks slow-moving clinical trajectories over time.

  • MoCA — Montreal Cognitive Assessment
  • Katz ADL — Activities of Daily Living
  • EQ-5D-5L — Quality of Life
  • LSNS-6 — Social Network Scale
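The three tiers above can be represented as a simple lookup table. The instrument codes and the two gate links (PHQ-9 to Gate 3, PG-13 to Gate 4) come from the lists; the record shape itself is an illustrative assumption, not the production schema.

```python
# Sketch of the 13-instrument, three-tier framework as data.
# Each record: (code, full name, crisis-detection gate it feeds or None).
FRAMEWORK = {
    "tier1_universal": [
        ("GDS-15", "Geriatric Depression Scale", None),
        ("GAD-7", "Generalized Anxiety Disorder", None),
        ("UCLA-3", "Loneliness Scale", None),
        ("WHO-5", "Well-Being Index", None),
        ("C-SSRS", "Suicide Severity Rating (Screener)", None),
    ],
    "tier2_triggered": [
        ("PHQ-9", "Patient Health Questionnaire", "Gate 3"),
        ("ISI", "Insomnia Severity Index", None),
        ("PG-13", "Prolonged Grief", "Gate 4"),
        ("CAM", "Confusion Assessment Method", None),
    ],
    "tier3_longitudinal": [
        ("MoCA", "Montreal Cognitive Assessment", None),
        ("Katz ADL", "Activities of Daily Living", None),
        ("EQ-5D-5L", "Quality of Life", None),
        ("LSNS-6", "Social Network Scale", None),
    ],
}


def instruments_feeding_gates() -> dict:
    """Map each crisis-detection gate to the instruments that feed it."""
    gates = {}
    for tier in FRAMEWORK.values():
        for code, _name, gate in tier:
            if gate is not None:
                gates.setdefault(gate, []).append(code)
    return gates
```

A registry like this makes the 13-instrument count and the gate wiring checkable at startup rather than implicit in scattered scheduling code.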

How Lilo Solace Compares

Addressing gaps that no existing AI therapeutic system covers

Capability                 Woebot            Wysa              ElliQ                  Lilo Solace
Target Population          College students  General adults    Elderly (loneliness)   Elderly assisted living
Voice Interaction          —                 —                 ✓                      Senior-optimized
Crisis Detection           Basic             —                 —                      4-gate OR, 100% recall
Clinical Instruments       Limited           —                 —                      13 instruments, 3 tiers
Architectural Safety       —                 —                 —                      Deterministic pipeline
On-Premise / HIPAA         Cloud only        Cloud only        Partial                Full on-premise, §164.312
Evidence-Based Therapies   CBT only          CBT + others      Social only            5 peer-reviewed skills
Deployment Cost            Per-user SaaS     Per-user SaaS     $250/mo + device       $590 hardware, runs locally

Comparison based on published capabilities as of February 2026. Sources: Fitzpatrick et al. (2017), Inkster et al. (2018), Intuition Robotics Impact Report (2023).

On-Premise by Design

All safety-critical AI processing runs locally. No patient data ever leaves the device.

Architectural Invariant: Model Co-Location

All safety-critical models (BGE embedding + SLM generation) are co-located on every device. This is a non-negotiable safety, compliance, and reliability requirement — not an optimization. If the internet goes down, the device continues all operations autonomously. Any device with <32GB RAM is disqualified from production deployment.
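A minimal sketch of how the co-location invariant could be enforced as a hard deployment gate. The function and parameter names here are hypothetical, chosen only to illustrate the rule stated above.

```python
MIN_RAM_GB = 32  # devices below this are disqualified from production


def deployment_eligible(ram_gb: int, models_colocated: bool) -> bool:
    # Both conditions are hard requirements, not tunable preferences:
    # every device must hold the BGE embedding and SLM generation
    # models locally so it can keep operating with no internet at all.
    return ram_gb >= MIN_RAM_GB and models_colocated
```

Treating the check as a boolean gate, rather than a score or a warning, is what makes it an invariant: there is no configuration in which an under-provisioned device ships.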

$590 Hardware

Minisforum UM890 Pro barebones ($479) + 32GB DDR5 ($60) + 1TB NVMe ($50). Silent, 45–65 W, 24/7 operation.

Zero Cloud Dependency

Crisis detection, embeddings, and all 11 ML models run on-device. Models ~7–8 GB + services ~4–6 GB + OS ~3–4 GB = 14–18 GB used, leaving 14–18 GB free of the 32 GB minimum.
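The memory budget above is simple interval arithmetic on the document's own component estimates; a quick sanity check:

```python
TOTAL_RAM_GB = 32  # minimum production configuration

budget_gb = {            # (low, high) estimates from the text
    "models":   (7, 8),  # BGE embedding + SLM + other ML models
    "services": (4, 6),
    "os":       (3, 4),
}

used_lo = sum(lo for lo, _ in budget_gb.values())  # 14 GB
used_hi = sum(hi for _, hi in budget_gb.values())  # 18 GB
free_lo = TOTAL_RAM_GB - used_hi                   # 14 GB free at worst
free_hi = TOTAL_RAM_GB - used_lo                   # 18 GB free at best
```

So even at the high end of every estimate, roughly 14 GB of headroom remains for caches and spikes on a 32 GB device.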

HIPAA by Architecture

TLS 1.3 in transit, AES-256 at rest, immutable 7-year audit logs. PII redacted locally before any network transmission.

Scales to Enterprise

At-Home (~$590) → Small Facility ($590–$1,180, 1–2 units) → Large Facility ($3K–$8K, Dell PowerEdge R760). Same pipeline at every scale.

Cross-platform: GGUF model weights are portable across Metal (dev), CUDA (facility GPU), Vulkan (edge AMD), and ROCm backends. Same deterministic pipeline, same safety guarantees, any hardware.

Interested in Our Research?

We welcome collaboration with researchers, clinicians, and policy makers advancing safe clinical AI.