Research-Driven Development

Built on Science.
Validated by Evidence.

Our technology is grounded in rigorous internal bench evaluation, evidence-based therapeutic approaches backed by decades of clinical literature, and active research contributions to the field of safe clinical AI.

Our research posture.

We publish methods, document invariants, and caveat what we haven't proven. Lilo Solace is our active clinical research track today, covering architectural safety, deterministic crisis detection, and the clinical-instrument framework, with manuscripts in the publication pipeline. EmbedIQ's research is still on its roadmap: an evaluation framework that will score configuration quality against golden reference configs in a replayable, reproducible way. We'll publish that work when it lands. For now, the clinical-safety research below reflects what is furthest along.

Internal Benchmark Results

Lilo Engine — engineering validation from internal bench evaluation

100%
Crisis Recall

On our internal 456-test safety suite (165-example crisis training set: 80 crisis + 85 non-crisis). Clinical effectiveness will be measured in the pilot.

<1s
Detection Latency

Measured on GCP L4 cloud and M1 edge. The 30-second Crisis Now / URAC regulatory benchmark sets the bar we engineer against.

96.4%
Intent Classification

Across 11 therapeutic categories, with zero crisis-to-non-crisis misclassifications on the test set.

98.8
Therapeutic Quality

On our internal therapeutic evaluation suite (target was 93.3). End-to-end scenarios with zero clinical anti-patterns detected.

These results reflect engineering validation on internal test suites, not clinical outcomes in real residents. Product-generated clinical evidence will come from the feasibility pilot (n=20, IRB-targeted Jun 2026, enrollment Q3 2026). See the validation plan.
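
For readers who want to check how figures like crisis recall and crisis-to-non-crisis misclassifications are defined, here is a minimal sketch. The label names and the toy data are illustrative assumptions, not our evaluation harness or our test set.

```python
# Minimal sketch of the two metrics named above. Label names and the toy
# data in the usage note are hypothetical, not the internal suite.
CRISIS = "crisis"  # hypothetical label for the crisis class


def crisis_recall(y_true, y_pred):
    """Recall for the crisis class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == CRISIS and p == CRISIS)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == CRISIS and p != CRISIS)
    if tp + fn == 0:
        raise ValueError("no crisis examples in y_true")
    return tp / (tp + fn)


def crisis_misses(y_true, y_pred):
    """Count crisis examples that were predicted as any non-crisis label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == CRISIS and p != CRISIS)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    if not y_true:
        raise ValueError("empty evaluation set")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

With toy labels such as `["crisis", "crisis", "sleep"]` against predictions `["crisis", "crisis", "grief"]`, recall is 1.0 while accuracy is 2/3, which is why the two figures are reported separately.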

Active Research

Lilo clinical-safety research — active manuscripts

Architectural Safety Guarantees for Clinical AI Policy

Aejaz Sheriff — Praglogic

Manuscript under peer review at a leading health policy journal

Examines limitations of human-in-the-loop oversight in clinical AI
Proposes engineering alternatives grounded in published safety science

Content under journal embargo. Details available upon publication.

About the author(s)

Academic and industry background informing the research above.

Praglogic · Principal Architect & Research Lead

Aejaz Sheriff

Industry experience

Twenty-nine years in enterprise healthcare technology, including fifteen years at a major US health insurer delivering claims adjudication, benefits administration, and provider-network platforms for more than eleven million members, and ten years at an enterprise software lab on platform architecture.

Professional certifications

Google Cloud Professional Cloud Architect; AWS Solutions Architect; TOGAF; HIPAA Security Officer; HL7 FHIR; ITIL 4.

Research focus

Deterministic safety architectures for clinical AI; structural invariants for crisis-detection pipelines; multimodal emotion and trajectory analysis for geriatric mental-health contexts; regulatory pathways for AI-mediated therapeutic software.

Author contact and correspondence: contact@pragmaticlogic.ai

Why Architectural Safety Matters
Evidence from Published Research

Decades of published research demonstrate fundamental limitations of human oversight in safety-critical systems — limitations that inform our architectural approach.

90–96%
Medication alert override rate

Felisberto et al. (2024) meta-analysis, 95% CI: 85–95%

10–15%
Vigilance accuracy drop within 30 minutes

Mackworth (1948), confirmed by Frontiers in Psychology review (2025)

93%
Peak automation bias rate

Parasuraman & Manzey (2010); Rosbach et al. (2024) pathology study

216+
Patient deaths linked to alarm fatigue

Boston Globe investigation (2005–2010); FDA MAUDE database

These findings, drawn from independent peer-reviewed studies, an investigative report, and FDA data, are why we designed Lilo Engine as a deterministic pipeline with structural safety invariants rather than relying on human oversight or agentic conventions.

Our Approach
Why Pipelines, Not Agents

Deterministic pipeline architectures provide a class of safety guarantees — across clinical crisis detection and enterprise configuration generation — that agentic orchestration structurally cannot achieve.

Dimension | Agentic Orchestration | Deterministic Pipeline (Lilo)
Crisis detection | Convention (bypassable) | Structural invariant
Execution paths | 7+ variable | Exactly 2 (normal + crisis)
LLM calls/request | 1–3+ (variable) | Exactly 1 (Layer 4 only)
Audit trail | Non-deterministic | Deterministic (L1→L5)
Safety independence | Shared LLM | Independent ML models
Failure mode | Silent reasoning errors | Explicit stage failures
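
To make the comparison concrete, here is a minimal sketch of a fixed five-stage pipeline with two paths, one generator call, an ordered audit trail, and explicit stage failures. It is not the Lilo Engine code: what each layer does, the stand-in detector, the stub generator, and the choice to call the generator once on both paths are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class StageError(Exception):
    """Raised when a stage fails, so failures are explicit rather than silent."""


@dataclass
class Request:
    text: str
    path: str = "normal"  # "normal" or "crisis": the only two paths
    reply: str = ""
    audit: List[str] = field(default_factory=list)


def run_pipeline(
    text: str,
    detect_crisis: Callable[[str], bool],
    generate: Callable[[str, str], str],
) -> Request:
    req = Request(text=text)

    def l1_normalize() -> None:
        req.text = req.text.strip()
        if not req.text:
            raise ValueError("empty input")

    def l2_crisis_check() -> None:
        # Independent detector: it never consults the generator.
        req.path = "crisis" if detect_crisis(req.text) else "normal"

    def l3_prepare() -> None:
        # Hypothetical placeholder for context assembly.
        pass

    def l4_generate() -> None:
        # The single generator call per request.
        req.reply = generate(req.text, req.path)

    def l5_validate() -> None:
        if not req.reply:
            raise ValueError("empty reply")

    stages = [
        ("L1", l1_normalize),
        ("L2", l2_crisis_check),
        ("L3", l3_prepare),
        ("L4", l4_generate),
        ("L5", l5_validate),
    ]
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            raise StageError(f"{name} failed: {exc}") from exc
        req.audit.append(name)  # stage order is fixed, so the trail is too
    return req


def keyword_detector(text: str) -> bool:
    # Stand-in for an independent ML classifier (hypothetical).
    return "i am not safe" in text.lower()


calls: List[str] = []


def stub_generate(text: str, path: str) -> str:
    # Stand-in for the Layer 4 model; records each call for inspection.
    calls.append(path)
    return f"[{path}] reply"
```

Calling `run_pipeline("I feel fine today", keyword_detector, stub_generate)` runs the stages in order and appends exactly one entry to `calls`. A blank input raises `StageError` naming L1 instead of producing a reply.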

The same principle applies to EmbedIQ.

Configuration for AI coding agents can be produced two ways. One is to ask an LLM to generate a CLAUDE.md, a rules file, and a set of hooks from a natural-language prompt — the output is non-reproducible, not auditable, and only as good as the prompt. The other is a deterministic pipeline that runs typed generators over a structured profile built from a 71-question interview. EmbedIQ takes the second path: byte-for-byte reproducible configurations, zero runtime LLM calls, and a validation pass before any file is written. The comparison above reads as a Lilo vs. agentic-AI argument for clinical safety, but the same architectural posture is what makes EmbedIQ audit-defensible for HIPAA, PCI-DSS, and FERPA-covered teams.
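
The sketch below illustrates what "byte-for-byte reproducible" and "validation before any file is written" can mean in practice. It is an assumption-laden toy, not EmbedIQ's generators: the profile fields (`project`, `compliance`), the `rules.json` file name, and the validation checks are hypothetical.

```python
import hashlib
import json
from typing import Dict


def generate_claude_md(profile: dict) -> str:
    # Sort anything whose order could vary so output depends only on content.
    lines = [f"# {profile['project']}", ""]
    for framework in sorted(profile["compliance"]):
        lines.append(f"- Follow {framework} handling rules")
    return "\n".join(lines) + "\n"


def generate_rules(profile: dict) -> str:
    # sort_keys and a fixed indent keep the serialized form stable.
    rules = {"compliance": sorted(profile["compliance"])}
    return json.dumps(rules, indent=2, sort_keys=True) + "\n"


def validate(files: Dict[str, str]) -> None:
    # Hypothetical checks; a real validation pass would be stricter.
    for name, body in files.items():
        if not body.strip():
            raise ValueError(f"{name} is empty")
        if not body.endswith("\n"):
            raise ValueError(f"{name} lacks a trailing newline")


def build(profile: dict) -> Dict[str, bytes]:
    """Render every file in memory and validate before returning anything."""
    files = {
        "CLAUDE.md": generate_claude_md(profile),
        "rules.json": generate_rules(profile),
    }
    validate(files)  # nothing is written to disk unless this passes
    return {name: files[name].encode("utf-8") for name in sorted(files)}


def digest(files: Dict[str, bytes]) -> str:
    """SHA-256 over file names and contents, for comparing two builds."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0" + files[name] + b"\0")
    return h.hexdigest()
```

Two builds from equivalent profiles, for example with the compliance list in a different order, should yield the same digest. That equality is what makes a generated configuration checkable after the fact.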

13-Instrument Clinical Framework

Three-tier validated assessment framework integrated into the Lilo Solace therapeutic pipeline.

Tier 1 Universal Screening

Scheduled at fixed intervals for every resident. Casts a broad net to detect signals.

  • GDS-15 — Geriatric Depression Scale
  • GAD-7 — Generalized Anxiety Disorder
  • UCLA-3 — Loneliness Scale
  • WHO-5 — Well-Being Index
  • C-SSRS — Suicide Severity Rating (Screener)

Tier 2 Triggered / Adaptive

On clinical indication. Deeper assessment that feeds crisis detection gates directly.

  • PHQ-9 — Patient Health Questionnaire → feeds Gate 3
  • ISI — Insomnia Severity Index
  • PG-13 — Prolonged Grief → feeds Gate 4
  • CAM — Confusion Assessment Method

Tier 3 Longitudinal / Clinical

Baseline + 90/180 days. Tracks slow-moving clinical trajectories over time.

  • MoCA — Montreal Cognitive Assessment
  • Katz ADL — Activities of Daily Living
  • EQ-5D-5L — Quality of Life
  • LSNS-6 — Social Network Scale
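
The Tier 2 list notes that PHQ-9 feeds Gate 3 and PG-13 feeds Gate 4, and the comparison table on this page describes crisis detection as a "4-gate OR". A minimal sketch of that OR combination follows. It rests on stated assumptions: this page does not say what Gates 1 and 2 evaluate, which instrument values the gates read, or what the cutoffs are, so those appear here as opaque booleans and caller-supplied placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateInputs:
    gate1: bool  # hypothetical: the page does not describe Gate 1
    gate2: bool  # hypothetical: the page does not describe Gate 2
    phq9_total: Optional[int] = None  # PHQ-9 total (0-27); None if not given
    pg13_score: Optional[int] = None  # PG-13 score; None if not given


def crisis_detected(inp: GateInputs, phq9_cutoff: int, pg13_cutoff: int) -> bool:
    """Return True if any one gate fires (logical OR across four gates).

    The cutoffs are placeholders supplied by the caller, not clinical
    thresholds taken from this page.
    """
    gate3 = inp.phq9_total is not None and inp.phq9_total >= phq9_cutoff
    gate4 = inp.pg13_score is not None and inp.pg13_score >= pg13_cutoff
    return inp.gate1 or inp.gate2 or gate3 or gate4
```

The point of the OR is that gates cannot veto each other: a single firing gate is sufficient, and an instrument that was not administered simply contributes nothing.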

How Lilo Solace Compares

Addressing gaps that published AI therapeutic systems haven't yet covered.

Capability Woebot Wysa ElliQ Lilo Solace
Target Population College students General adults Elderly (loneliness) Elderly assisted living
Voice Interaction ✓ Senior-optimized
Crisis Detection Basic 4-gate OR, 100% recall
Clinical Instruments Limited 13 instruments, 3 tiers
Architectural Safety Deterministic pipeline
On-Premise / HIPAA Cloud only Cloud only Partial Full on-premise, §164.312
Evidence-Based Therapies CBT only CBT + others Social only 5 peer-reviewed skills
Deployment Cost Per-user SaaS Per-user SaaS $250/mo + device $590 hardware, runs locally

Comparison based on published capabilities through early 2026. Sources: Fitzpatrick et al. (2017), Inkster et al. (2018), Intuition Robotics Impact Report (2023).

Lilo Solace — On-Premise by Design

All safety-critical AI processing runs locally. No patient data ever leaves the device.

Architectural Invariant: Model Co-Location

All safety-critical models (BGE embedding + SLM generation) are co-located on every device. This is a non-negotiable safety, compliance, and reliability requirement — not an optimization. If the internet goes down, the device continues all operations autonomously. Any device with <32GB RAM is disqualified from production deployment.

$590 Hardware

Minisforum UM890 Pro barebones ($479) + 32GB DDR5 ($60) + 1TB NVMe ($50). Silent, 45-65W, 24/7 operation.

Zero Cloud Dependency

Crisis detection, embeddings, and all 11 ML models run on-device. Models (~7-8GB) + services (~4-6GB) + OS (~3-4GB) = 14-18GB used, leaving 14-18GB of the 32GB free.
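
The memory budget above is simple interval arithmetic and can be checked in a few lines. The function and constant names are illustrative. The figures come from this page: 32GB total, and approximate ranges for models, services, and OS.

```python
TOTAL_RAM_GB = 32
MIN_PRODUCTION_RAM_GB = 32  # devices below this are disqualified

# (low, high) estimates in GB, as quoted above
COMPONENTS_GB = {
    "models": (7, 8),
    "services": (4, 6),
    "os": (3, 4),
}


def used_range(components):
    """Sum the low and high estimates across all components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


def free_range(total_gb, components):
    """Free memory is total minus used; the worst case uses the high estimate."""
    low_used, high_used = used_range(components)
    return total_gb - high_used, total_gb - low_used


def qualifies_for_production(ram_gb):
    return ram_gb >= MIN_PRODUCTION_RAM_GB


print(used_range(COMPONENTS_GB))                # (14, 18)
print(free_range(TOTAL_RAM_GB, COMPONENTS_GB))  # (14, 18)
```

The used and free ranges coincide only because the total is 32GB; on a 16GB device the same workload would leave between -2GB and 2GB free, which is consistent with the 32GB minimum.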

HIPAA by Architecture

TLS 1.3 in transit, AES-256 at rest, immutable 7-year audit logs. PII redacted locally before any network transmission.
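
Two of these controls are easy to illustrate with the Python standard library. The sketch below pins a client TLS context to version 1.3 and shows one common way to make an audit log tamper-evident, a hash chain. The hash chain is an assumption about technique, not a description of how Lilo Solace stores its logs. AES-256 encryption at rest and the 7-year retention are not shown.

```python
import hashlib
import json
import ssl

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def tls13_client_context() -> ssl.SSLContext:
    """Client context that refuses to negotiate anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
    log.append({"prev": prev, "event": body, "hash": entry_hash})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        expected = hashlib.sha256((prev + entry["event"]).encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

A hash chain does not prevent modification by itself. It makes modification detectable, which is the property an auditor typically needs from an "immutable" log.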

Scales to Enterprise

At-Home (~$590) → Small Facility ($590–$1,180, 1-2 units) → Large Facility ($3K–$8K, Dell PowerEdge R760). Same pipeline at every scale.

Cross-platform: GGUF model weights are portable across Metal (dev), CUDA (facility GPU), Vulkan (edge AMD), and ROCm backends. Same deterministic pipeline, same safety guarantees, any hardware.

Collaborate with us on the research.

We welcome conversations with clinicians, cognitive-safety researchers, regulatory experts, and enterprise engineering teams working in compliance-sensitive domains.