Our technology is grounded in rigorous internal bench evaluation, evidence-based therapeutic approaches backed by decades of clinical literature, and active research contributions to the field of safe clinical AI.
We publish methods, document invariants, and caveat what we haven't proven. Lilo Solace is our active clinical research track today — architectural safety, deterministic crisis detection, and the clinical-instrument framework — with manuscripts in the publication pipeline. EmbedIQ's research posture lives on its roadmap: an evaluation framework that will score configuration quality against golden reference configs, replayable and reproducible. We'll publish that work when it lands. For now, the clinical-safety research below reflects what's furthest along.
Lilo Engine — engineering validation from internal bench evaluation
On our internal 456-test safety suite (165-example crisis training set: 80 crisis + 85 non-crisis). Clinical effectiveness will be measured in the pilot.
Measured on GCP L4 cloud and M1 edge. The 30-second Crisis Now / URAC regulatory benchmark sets the bar we engineer against.
Across 11 therapeutic categories, with zero crisis-to-non-crisis misclassifications on the test set.
On our internal therapeutic evaluation suite (target was 93.3). End-to-end scenarios with zero clinical anti-patterns detected.
These results reflect engineering validation on internal test suites, not clinical outcomes in real residents. Product-generated clinical evidence will come from the feasibility pilot (n=20, IRB-targeted Jun 2026, enrollment Q3 2026). See the validation plan.
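The "zero crisis-to-non-crisis misclassifications" figure above is a statement about false negatives: if no crisis example is labeled non-crisis, recall for the crisis class is 1.0 on that set. A minimal sketch of the calculation follows. The function name and the toy labels are hypothetical and are not drawn from the internal suite.

```python
def crisis_recall(y_true, y_pred):
    """Recall for the crisis class: TP / (TP + FN). True means 'crisis'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp + fn == 0:
        raise ValueError("no crisis examples in y_true")
    return tp / (tp + fn)


# Hypothetical toy labels, for illustration only.
y_true = [True, True, True, False, False]
y_pred = [True, True, True, True, False]  # one false positive, no false negatives
recall = crisis_recall(y_true, y_pred)
```

Note that recall says nothing about false positives: the toy prediction above over-flags one non-crisis example and still has perfect recall.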
Lilo clinical-safety research — active manuscripts
Available as a medRxiv preprint. Revised manuscript in preparation for journal resubmission.
Preprint available on medRxiv through February 17, 2027. Full citation and DOI will be added when the revised submission lands.
Manuscript under peer review at a leading health policy journal
Content under journal embargo. Details available upon publication.
Academic and industry background informing the research above.
Decades of published research demonstrate fundamental limitations of human oversight in safety-critical systems — limitations that inform our architectural approach.
Felisberto et al. (2024) meta-analysis, 95% CI: 85–95%
Mackworth (1948), confirmed by Frontiers in Psychology review (2025)
Parasuraman & Manzey (2010); Rosbach et al. (2024) pathology study
Boston Globe investigation (2005–2010); FDA MAUDE database
These findings — from independent, peer-reviewed sources — are why we designed Lilo Engine as a deterministic pipeline with structural safety invariants, rather than relying on human oversight or agentic conventions.
Deterministic pipeline architectures provide a class of safety guarantees — across clinical crisis detection and enterprise configuration generation — that agentic orchestration structurally cannot achieve.
Configuration for AI coding agents can be produced two ways. One is to ask an LLM to generate a CLAUDE.md, a rules file, and a set of hooks from a natural-language prompt; the output is non-reproducible, not auditable, and only as good as the prompt. The other is a deterministic pipeline that runs typed generators over a structured profile built from a 71-question interview. EmbedIQ takes the second path: byte-for-byte reproducible configurations, zero runtime LLM calls, and a validation pass before any file is written. The comparison above reads as a Lilo vs. agentic-AI argument for clinical safety, but the same architectural posture is what makes EmbedIQ audit-defensible for teams covered by HIPAA, PCI-DSS, or FERPA.
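To make the second path concrete, here is a minimal sketch of a typed generator over a structured profile. It is not EmbedIQ's code: the `Profile` fields, `generate_rules`, `validate`, and `sha256_of` are all hypothetical names chosen for illustration. The point is only that a pure function of the profile, with no LLM call, clock, or randomness, yields identical bytes on every run, which a hash comparison can confirm.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical fields; the real profile comes from the 71-question interview.
    project_name: str
    language: str
    compliance: tuple = ()


def generate_rules(profile: Profile) -> str:
    """Pure generator: the output depends only on the profile."""
    lines = [f"# Rules for {profile.project_name}", f"language: {profile.language}"]
    # Sort so that answer order never changes the emitted bytes.
    lines += [f"compliance: {name}" for name in sorted(profile.compliance)]
    return "\n".join(lines) + "\n"


def validate(text: str) -> None:
    """Validation pass that runs before anything would be written to disk."""
    if not text.startswith("# Rules for ") or not text.endswith("\n"):
        raise ValueError("generated rules file is malformed")


def sha256_of(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


profile = Profile("demo-service", "python", ("HIPAA", "FERPA"))
first = generate_rules(profile)
second = generate_rules(profile)
validate(first)
same_bytes = sha256_of(first) == sha256_of(second)
```

Storing the digest alongside the profile gives an auditor a simple check: regenerate from the same profile and compare hashes.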
Three-tier validated assessment framework integrated into the Lilo Solace therapeutic pipeline.
Scheduled at fixed intervals for every resident. Casts a broad net to detect signals.
On clinical indication. Deeper assessment that feeds crisis detection gates directly.
Baseline, then at 90 and 180 days. Tracks slow-moving clinical trajectories over time.
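The baseline plus 90- and 180-day cadence of the third tier can be sketched as a small date calculation. This assumes the offsets are counted in calendar days from the baseline assessment; the constant and function names are hypothetical.

```python
from datetime import date, timedelta

# Baseline, then 90 and 180 days later, per the cadence described above.
FOLLOW_UP_OFFSETS_DAYS = (0, 90, 180)


def longitudinal_due_dates(baseline: date) -> list[date]:
    """Return the due dates for the baseline and both follow-up assessments."""
    return [baseline + timedelta(days=offset) for offset in FOLLOW_UP_OFFSETS_DAYS]


due = longitudinal_due_dates(date(2026, 1, 1))
```

For a baseline of 1 January 2026, this yields 1 January, 1 April, and 30 June 2026.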
Addressing gaps that published AI therapeutic systems haven't yet covered.
| Capability | Woebot | Wysa | ElliQ | Lilo Solace |
|---|---|---|---|---|
| Target Population | College students | General adults | Elderly (loneliness) | Elderly assisted living |
| Voice Interaction | ✗ | ✗ | ✓ | ✓ Senior-optimized |
| Crisis Detection | ✗ | Basic | ✗ | 4-gate OR, 100% recall on internal test set |
| Clinical Instruments | ✗ | Limited | ✗ | 13 instruments, 3 tiers |
| Architectural Safety | ✗ | ✗ | ✗ | Deterministic pipeline |
| On-Premise / HIPAA | Cloud only | Cloud only | Partial | Full on-premise, §164.312 |
| Evidence-Based Therapies | CBT only | CBT + others | Social only | 5 peer-reviewed skills |
| Deployment Cost | Per-user SaaS | Per-user SaaS | $250/mo + device | $590 hardware, runs locally |
Comparison based on published capabilities through early 2026. Sources: Fitzpatrick et al. (2017), Inkster et al. (2018), Intuition Robotics Impact Report (2023).
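The "4-gate OR" entry in the table describes a combination rule: a message is escalated if any one of four independent gates fires. The sketch below illustrates only that rule. The four gates, their signal names, and their thresholds are hypothetical placeholders, not the gates Lilo Solace uses.

```python
from typing import Callable, Mapping, Sequence

Signals = Mapping[str, float]
Gate = Callable[[Signals], bool]

# Four placeholder gates over precomputed signals (all names and thresholds assumed).
GATES: Sequence[Gate] = (
    lambda s: s.get("keyword_hit", 0.0) >= 1.0,
    lambda s: s.get("embedding_similarity", 0.0) >= 0.80,
    lambda s: s.get("instrument_score", 0.0) >= 10.0,
    lambda s: s.get("pattern_hit", 0.0) >= 1.0,
)


def is_crisis(signals: Signals, gates: Sequence[Gate] = GATES) -> bool:
    """OR-combination: escalate as soon as any single gate fires."""
    return any(gate(signals) for gate in gates)


flagged = is_crisis({"embedding_similarity": 0.91})  # only one gate fires
quiet = is_crisis({"embedding_similarity": 0.10})    # no gate fires
```

An OR rule favors recall: a crisis is missed only if every gate misses it, at the cost of more false positives than any single gate would produce alone.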
All safety-critical AI processing runs locally. No patient data ever leaves the device.
All safety-critical models (BGE embedding + SLM generation) are co-located on every device. This is a non-negotiable safety, compliance, and reliability requirement, not an optimization. If the internet goes down, the device continues all operations autonomously. Any device with less than 32GB of RAM is disqualified from production deployment.
Minisforum UM890 Pro barebones ($479) + 32GB DDR5 ($60) + 1TB NVMe ($50) = $589, or about $590 per unit. Silent, 45-65W, 24/7 operation.
Crisis detection, embeddings, and all 11 ML models run on-device. Models ~7-8GB + services ~4-6GB + OS ~3-4GB = 14-18GB used, leaving 14-18GB of the 32GB free.
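The memory budget above can be checked with a few lines of arithmetic. The ranges are the approximate figures from the text, and 32GB is the installed RAM of the reference unit; the variable names are illustrative.

```python
TOTAL_RAM_GB = 32

# (low, high) estimates in GB, taken from the figures above.
budget_gb = {"models": (7, 8), "services": (4, 6), "os": (3, 4)}

used_low = sum(low for low, _ in budget_gb.values())
used_high = sum(high for _, high in budget_gb.values())

# The worst-case footprint leaves the least headroom, and vice versa.
free_low = TOTAL_RAM_GB - used_high
free_high = TOTAL_RAM_GB - used_low
```

The result is 14-18GB used and 14-18GB free, so even the worst-case footprint leaves roughly 44% of the 32GB as headroom.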
TLS 1.3 in transit, AES-256 at rest, immutable 7-year audit logs. PII redacted locally before any network transmission.
At-Home (~$590) → Small Facility ($590–$1,180, 1-2 units) → Large Facility ($3K–$8K, Dell PowerEdge R760). Same pipeline at every scale.
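The small-deployment price points follow from the per-unit parts list. A quick check, assuming the quoted prices and a rounded unit price of $590 (the constant and function names are illustrative):

```python
# Component prices in USD, as quoted for the reference unit.
UNIT_PARTS_USD = {
    "Minisforum UM890 Pro barebones": 479,
    "32GB DDR5": 60,
    "1TB NVMe": 50,
}

exact_unit_cost = sum(UNIT_PARTS_USD.values())
ROUNDED_UNIT_COST = 590  # the rounded figure used in the tier pricing


def small_facility_cost(units: int) -> int:
    """Hardware cost for a 1-2 unit deployment at the rounded unit price."""
    if units not in (1, 2):
        raise ValueError("the small-facility tier covers 1-2 units")
    return units * ROUNDED_UNIT_COST


cost_range = (small_facility_cost(1), small_facility_cost(2))
```

The parts sum to $589, and one to two units at the rounded price give the $590–$1,180 range. The large-facility tier uses different hardware, so it is not derived from this unit price.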
Cross-platform: GGUF model weights are portable across Metal (dev), CUDA (facility GPU), Vulkan (edge AMD), and ROCm backends. Same deterministic pipeline, same safety guarantees, any hardware.
We welcome conversations with clinicians, cognitive-safety researchers, regulatory experts, and enterprise engineering teams working in compliance-sensitive domains.