Making AI Feel Like Someone's Actually There
Three Seconds Is an Eternity in Conversation
Senior Experience Engineer · Soul Machines · 2022–2025
Slow conversational AI is a voice interface with loading screens. I mapped the full response pipeline, then introduced streaming, TTS prediction, and parallelization to cut typical latency to ~2 seconds, with optimized configurations reaching ~1 second. Then I added vision so agents could actually see you.
Response latency: ~2 s typical (~1 s optimized)
The Problem
Multi-second latency accumulated across the STT/NLU/TTS/animation pipeline, and zero environmental awareness.
What I Did
Mapped the entire pipeline stage by stage, restructured the architecture to stream, predict, and parallelize, and integrated vision LLMs.
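The stage-by-stage mapping can be sketched as per-stage instrumentation: wrap each pipeline stage in a timer so latency can be attributed before optimizing. This is a minimal illustration, not the production code; the stage names and sleep durations are stand-ins.

```python
import time
from contextlib import contextmanager

# Per-stage latency attribution. Stage names are illustrative stand-ins
# for the real STT/NLU/TTS/animation pipeline stages.
timings = {}

@contextmanager
def stage(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with stage("stt"):
    time.sleep(0.01)  # stand-in for speech-to-text
with stage("nlu"):
    time.sleep(0.01)  # stand-in for language understanding
with stage("tts"):
    time.sleep(0.01)  # stand-in for speech synthesis

total = sum(timings.values())
print({k: round(v, 3) for k, v in timings.items()}, round(total, 3))
```

Once every stage reports its own wall-clock cost, it becomes obvious which stages can be streamed or overlapped rather than run back to back.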
What Was Built
- Pipeline instrumentation
- Streaming responses
- TTS prediction
- Vision integration
- Multimodal analysis
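The streaming and parallelization items above can be sketched as: start synthesizing speech at each sentence boundary while the language model is still generating, instead of waiting for the full reply. A minimal asyncio sketch, with `fake_llm_stream` and `fake_tts` as hypothetical stand-ins for the real providers:

```python
import asyncio

async def fake_llm_stream():
    # Stand-in for a token-streaming LLM response.
    for chunk in ["Hello there. ", "Nice to ", "see you today."]:
        await asyncio.sleep(0.01)  # simulated token latency
        yield chunk

async def fake_tts(sentence):
    # Stand-in for a text-to-speech call.
    await asyncio.sleep(0.01)  # simulated synthesis time
    return f"audio({sentence.strip()})"

async def respond():
    buffer, tasks = "", []
    async for chunk in fake_llm_stream():
        buffer += chunk
        # Flush to TTS at sentence boundaries so playback starts
        # before the model has finished the full reply.
        while "." in buffer:
            sentence, _, buffer = buffer.partition(".")
            tasks.append(asyncio.create_task(fake_tts(sentence + ".")))
    if buffer.strip():
        tasks.append(asyncio.create_task(fake_tts(buffer)))
    return await asyncio.gather(*tasks)

clips = asyncio.run(respond())
print(clips)
```

The user hears the first sentence while the rest is still being generated and synthesized, so perceived latency is bounded by the first sentence, not the whole reply.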
Outcome
- Typical response latency reduced to ~2 seconds
- Optimized configurations reaching ~1 second (varies by provider/config)
- Vision creates unprompted "magic moments"
Every millisecond saved compounds across millions of interactions. Latency is the difference between demo and product.
View full portfolio →