Making AI Feel Like Someone's Actually There
Three Seconds Is an Eternity in Conversation
Senior Experience Engineer · Soul Machines · 2022–2025
Slow conversational AI is a voice interface with loading screens. I mapped the full response pipeline, then introduced streaming, TTS prediction, and parallelization to cut typical latency to ~2 seconds, with optimized configurations reaching ~1 second. Then I added vision so agents could actually see you.
Response latency: ~2 s typical (~1 s optimized)
The Problem
Multi-second latency accumulated across the STT/NLU/TTS/animation pipeline, and zero environmental awareness.
What I Did
Mapped the entire pipeline stage by stage, restructured the architecture to stream, predict, and parallelize, and integrated vision LLMs.
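The stage-by-stage mapping can be sketched as per-stage instrumentation: wrap each pipeline stage in a timer so latency can be attributed before optimizing. This is a minimal illustration, not the production code; the stage names and sleep durations are stand-ins.

```python
import time
from contextlib import contextmanager

# Per-stage latency attribution. Stage names are illustrative stand-ins
# for the real STT/NLU/TTS/animation pipeline stages.
timings = {}

@contextmanager
def stage(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with stage("stt"):
    time.sleep(0.01)  # stand-in for speech-to-text
with stage("nlu"):
    time.sleep(0.01)  # stand-in for language understanding
with stage("tts"):
    time.sleep(0.01)  # stand-in for speech synthesis

total = sum(timings.values())
print({k: round(v, 3) for k, v in timings.items()}, round(total, 3))
```

Once every stage reports its own wall-clock cost, it becomes obvious which stages can be streamed or overlapped rather than run back to back.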
What Was Built
- Pipeline instrumentation
- Streaming responses
- TTS prediction
- Vision integration
- Multimodal analysis
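The streaming and parallelization items above can be sketched as: start synthesizing speech at each sentence boundary while the language model is still generating, instead of waiting for the full reply. A minimal asyncio sketch, with `fake_llm_stream` and `fake_tts` as hypothetical stand-ins for the real providers:

```python
import asyncio

async def fake_llm_stream():
    # Stand-in for a token-streaming LLM response.
    for chunk in ["Hello there. ", "Nice to ", "see you today."]:
        await asyncio.sleep(0.01)  # simulated token latency
        yield chunk

async def fake_tts(sentence):
    # Stand-in for a text-to-speech call.
    await asyncio.sleep(0.01)  # simulated synthesis time
    return f"audio({sentence.strip()})"

async def respond():
    buffer, tasks = "", []
    async for chunk in fake_llm_stream():
        buffer += chunk
        # Flush to TTS at sentence boundaries so playback starts
        # before the model has finished the full reply.
        while "." in buffer:
            sentence, _, buffer = buffer.partition(".")
            tasks.append(asyncio.create_task(fake_tts(sentence + ".")))
    if buffer.strip():
        tasks.append(asyncio.create_task(fake_tts(buffer)))
    return await asyncio.gather(*tasks)

clips = asyncio.run(respond())
print(clips)
```

The user hears the first sentence while the rest is still being generated and synthesized, so perceived latency is bounded by the first sentence, not the whole reply.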
Outcome
- Typical response latency reduced to ~2 seconds
- Optimized configurations reaching ~1 second (varies by provider/config)
- Vision creates unprompted "magic moments"
Every millisecond saved compounds across millions of interactions. Latency is the difference between demo and product.
View full portfolio →