
PRODUCTION · 2024-2025 · Multilingual conversational AI

EAD Mynd - Avatar Conversational AI.

Data Scientist & AI/ML Engineer
Cairo, Egypt
Source: project brief + year-end review 2025

A production multilingual avatar platform for Environment Agency - Abu Dhabi, where I owned the backend architecture, grounded retrieval stack, speech-to-avatar pipeline, and production engineering needed to launch a real-time public experience.

Interactions

3,700+

Served successfully on launch

Knowledge Base

141,922

Indexed chunks from 1,600+ documents

Latency

<4s

Time-to-first-byte for live avatar responses

Reliability

99.9%

Uptime with zero post-launch production errors

Architecture

9 agents

Specialized orchestration across RAG, speech, and avatar control

// 02 — CHALLENGE

WHY IT MATTERED

The platform had to deliver fact-grounded conversations in both Arabic and English for a public-facing launch, while staying fast enough to feel like a live, polished avatar experience rather than a delayed chatbot.

The hard part was not only answer quality. The system had to synchronize grounded retrieval, session memory, speech-to-text, text-to-speech, lip-sync timing, and avatar behavior under concurrent usage, with enough reliability to survive a national exhibition without operator intervention.

Infrastructure constraints made the engineering bar higher. I had to design for CPU-oriented deployment, security controls, and maintainable operations from day one, so the solution could run economically without sacrificing latency, safety, or production resilience.

// 03 — APPROACH

HOW I BUILT IT

I owned the backend architecture end to end: microservices, session management, retrieval, knowledge-base operations, specialized agents, speech processing, avatar-alignment services, deployment, observability, and production hardening.

At the core, I built a parallelized, memory-enabled RAG system with dynamic source selection, chat-history integration, integrity validation, response regeneration, and a full knowledge-base management API for document ingestion, OCR extraction, vectorization, search, deletion, rebuilds, and health monitoring.
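As a rough illustration of how chat-history integration and dynamic source selection interact, here is a minimal sketch. It is not the production code: the `Chunk` type, the origin labels, the sample texts, and the token-overlap scoring are all assumptions made for this example, and the overlap score is a stand-in for real vector search.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    origin: str  # hypothetical source label, e.g. "reports" or "faq"
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a toy stand-in for an embedding model."""
    return set(re.findall(r"\w+", text.lower()))


def contextualize(question: str, history: list[str], max_turns: int = 2) -> str:
    """Fold the most recent user turns into the query so follow-ups keep their subject."""
    recent = history[-max_turns:] if max_turns > 0 else []
    return " ".join(recent + [question])


def retrieve(question: str, history: list[str], chunks: list[Chunk],
             allowed_origins: set[str], k: int = 2) -> list[Chunk]:
    """Rank chunks from the selected origins by token overlap with the contextualized query."""
    query_tokens = tokenize(contextualize(question, history))
    candidates = [c for c in chunks if c.origin in allowed_origins]
    scored = [(len(query_tokens & tokenize(c.text)), c) for c in candidates]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Invented sample data, for illustration only.
CHUNKS = [
    Chunk("reports", "Mangrove restoration expanded along the coast."),
    Chunk("faq", "Opening hours for the visitor centre are listed online."),
    Chunk("news", "Mangrove planting event announced."),
]
HISTORY = ["Tell me about mangrove restoration"]

with_history = retrieve("Where did it expand?", HISTORY, CHUNKS, {"reports", "faq"})
without_history = retrieve("Where did it expand?", [], CHUNKS, {"reports", "faq"})
```

With the earlier turn folded in, the follow-up question finds the "reports" chunk. Without history it matches nothing, and the "news" chunk is never considered because its origin was not selected.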

I then designed the real-time agent chain around the avatar experience itself: query contextualization and routing, structured-data extraction, OCR correction, separate QA behavior for avatar and chatbot modes, Arabic and English TTS text preparation, SSML generation, translation, answer classification, and streaming viseme alignment for lip-sync.
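The SSML generation step can be pictured with a small sketch. It emits generic W3C SSML (`speak`, `s`, `break`) using only the standard library. The function names and the default pause length are assumptions for illustration, and a given TTS service may require extra elements, such as a voice selection, that are not shown.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def split_sentences(text: str) -> list[str]:
    """Split after Latin or Arabic sentence punctuation, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?\u061f])\s+", text.strip())
    return [p for p in parts if p]


def build_ssml(text: str, lang: str, pause_ms: int = 250) -> str:
    """Wrap each sentence in <s> and separate sentences with a fixed pause."""
    sentences = split_sentences(text)
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() protects &, < and > so user-facing text cannot break the markup.
    body = pause.join(f"<s>{escape(s)}</s>" for s in sentences)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>{body}</speak>"
    )


ssml = build_ssml("Air & water. Next?", "en-US")
```

Escaping matters here because answer text flows from retrieval into markup. An unescaped `&` or `<` would make the SSML invalid.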

To make that design production-safe, I combined CPU-first inference optimization with explicit security engineering: prompt-injection testing, Azure content filtering on both queries and responses, request validation, abuse simulation, IP-restricted admin endpoints, TLS-secured traffic, and centralized logging, so the system could launch confidently and remain operable after handover.

// 04 — KEY DECISIONS

WHAT I CHOSE & WHY

Decision · 01

Build around specialized agents, not one overloaded prompt

I separated contextualization, retrieval preparation, OCR correction, QA behavior, TTS formatting, SSML generation, answer classification, and viseme alignment into focused services because the product had multiple real-time jobs to do at once. That kept the system debuggable, tunable, and easier to harden for production.
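One way to see why separate stages are easier to debug and tune is a tiny pipeline runner that times each stage on its own. This is a sketch under stated assumptions: the three stage functions are placeholders invented for the example and call no model or service.

```python
import time
from typing import Callable

Stage = Callable[[dict], dict]


def contextualize_stage(state: dict) -> dict:
    # Placeholder: a real stage would rewrite the query using session history.
    return {**state, "query": state["user_text"].strip()}


def answer_stage(state: dict) -> dict:
    # Placeholder for retrieval plus QA; echoes the query instead of calling a model.
    return {**state, "answer": f"Answer to: {state['query']}"}


def tts_prep_stage(state: dict) -> dict:
    # Placeholder text preparation: make a symbol speakable.
    return {**state, "tts_text": state["answer"].replace("&", " and ")}


def run_pipeline(state: dict, stages: list[tuple[str, Stage]]) -> tuple[dict, dict]:
    """Run stages in order and record per-stage wall time in milliseconds."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        state = stage(state)
        timings[name] = (time.perf_counter() - start) * 1000
    return state, timings


STAGES = [
    ("contextualize", contextualize_stage),
    ("answer", answer_stage),
    ("tts_prep", tts_prep_stage),
]

final_state, stage_timings = run_pipeline({"user_text": "  hello "}, STAGES)
```

Because each stage has a name and its own timing, a slow or misbehaving step can be identified and replaced without touching the others.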

Decision · 02

Make grounded retrieval a controllable infrastructure layer

I treated the knowledge base as an operational system, not a one-time embedding script. The platform supports ingestion, OCR, correction, vectorstore rebuilds, origin selection, health checks, and hash validation so the team can trust what the avatar says and maintain it after launch.
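Hash validation for a knowledge base can be sketched as a content-hash manifest: hash every document, compare against the hashes recorded at index time, and rebuild only what differs. The manifest layout and function names below are assumptions for illustration, not the platform's actual scheme.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(documents: dict[str, bytes]) -> dict[str, str]:
    """Map each document id to the hash of its current content."""
    return {doc_id: content_hash(data) for doc_id, data in documents.items()}


def diff_manifest(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report which documents were added, removed, or changed since indexing."""
    shared = current.keys() & indexed.keys()
    return {
        "added": sorted(current.keys() - indexed.keys()),
        "removed": sorted(indexed.keys() - current.keys()),
        "changed": sorted(k for k in shared if current[k] != indexed[k]),
    }


# Hypothetical document ids and contents.
indexed = build_manifest({"a.pdf": b"v1", "b.pdf": b"v1"})
current = build_manifest({"a.pdf": b"v1", "b.pdf": b"v2", "c.pdf": b"v1"})
changes = diff_manifest(indexed, current)
```

An empty diff is a cheap health signal that the index still matches its sources. A non-empty diff tells the operator exactly which documents need re-ingestion.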

Decision · 03

Optimize for CPU-first streaming instead of depending on GPUs

Rather than treating infrastructure limits as a blocker, I reworked the serving path around quantized models, async chunking, warm starts, dual Redis caches, OpenVINO and ONNX-style runtimes, and careful Nginx tuning. That decision made the system materially cheaper while still hitting real-time interaction targets.
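The async chunking idea can be illustrated with a sentence-level streamer: instead of waiting for a full answer, each sentence is released downstream as soon as its closing punctuation arrives. This is a sketch, not the production serving path. `fake_tokens` is a hypothetical stand-in for a streaming model response.

```python
import asyncio
import re
from typing import AsyncIterator

SENTENCE_END = re.compile(r"[.!?\u061f]\s")  # Latin and Arabic sentence punctuation


async def sentence_chunks(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield a sentence as soon as its closing punctuation has been received."""
    buffer = ""
    async for token in tokens:
        buffer += token
        while (match := SENTENCE_END.search(buffer)) is not None:
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush any trailing text without final punctuation


async def fake_tokens(text: str) -> AsyncIterator[str]:
    # Hypothetical token source: emits one word at a time.
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop between tokens
        yield word + " "


async def collect(text: str) -> list[str]:
    return [chunk async for chunk in sentence_chunks(fake_tokens(text))]


chunks = asyncio.run(collect("Hello there. How are you? Fine"))
```

Releasing the first sentence early is what lets speech synthesis start before the rest of the answer exists, which is where most of the perceived latency is saved.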

Decision · 04

Treat model safety and endpoint security as launch blockers

Because this was a public-facing assistant, I did not leave safety as a policy footnote. I added prompt-injection testing, query and response content filtering, strict request validation, integrity hashing, IP-restricted admin routes, and abuse simulation so unsafe or malformed traffic was filtered before it could become a production incident.
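Two of these controls, strict request validation and IP-restricted admin routes, are simple enough to sketch with the standard library. The length limit, the payload shape, and the allowlisted network below are hypothetical values chosen for the example, not the deployed configuration.

```python
import ipaddress

MAX_QUERY_CHARS = 500  # hypothetical limit
ADMIN_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # hypothetical allowlist


def validate_query(payload: object) -> str:
    """Return the cleaned query text or raise ValueError for malformed input."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    text = payload.get("query")
    if not isinstance(text, str):
        raise ValueError("query must be a string")
    text = text.strip()
    if not text:
        raise ValueError("query must not be empty")
    if len(text) > MAX_QUERY_CHARS:
        raise ValueError("query too long")
    if any(ord(ch) < 32 and ch not in "\n\t" for ch in text):
        raise ValueError("query contains control characters")
    return text


def is_admin_allowed(client_ip: str, networks=ADMIN_NETWORKS) -> bool:
    """Allow admin access only from addresses inside an allowlisted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable addresses are rejected, never trusted
    return any(address in network for network in networks)
```

Rejecting malformed traffic at the edge keeps it from ever reaching retrieval or the model, so the later filters only have to handle well-formed input.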

// 05 — ARCHITECTURE

HOW IT FITS TOGETHER

The production flow starts with multilingual user input and agency knowledge sources, routes through OCR and structured-data preparation into a memory-enabled retrieval layer, then hands context to specialized QA and speech agents that generate grounded text, SSML, audio alignment, viseme events, and response-classification signals. Around that core, I built the operational scaffolding: secure session state, hash-validated exchanges, content filtering, restricted admin surfaces, cache layers, deployment workflows, monitoring, backups, and restartable services.

// FIG. SYSTEM DIAGRAM

SCALE 1:N


// 06 — HIGHLIGHTS

KEY TAKEAWAYS

▸ END-TO-END PLATFORM OWNERSHIP

Owned the system beyond model prompting: backend services, RAG, knowledge infrastructure, speech processing, avatar timing, deployment, logging, backups, and handover readiness.

▸ PUBLIC LAUNCH UNDER REAL LOAD

The platform served all 3,700+ avatar interactions during the ADIHEX exhibition, with no post-launch production failures.

▸ LARGE-SCALE GROUNDED KNOWLEDGE LAYER

Built and tuned the retrieval stack behind 141,922 indexed chunks from 1,600+ documents, with document lifecycle controls instead of a one-off ingestion script.

▸ ARABIC-FIRST SPEECH QUALITY WORK

Added Arabic diacritization, spoken-number normalization, SSML shaping, and alignment-aware timing so the avatar felt natural rather than merely translated.

▸ CPU-OPTIMIZED PRODUCTION ENGINEERING

Quantization, streaming inference, dual caching, and runtime optimization delivered real-time behavior on CPU-oriented infrastructure while reducing serving cost relative to GPU-heavy designs.

▸ EXPLICIT MODEL AND API SAFETY CONTROLS

Production hardening included prompt-injection testing, Azure content filters on both input and output, abuse simulation, integrity hashing, and locked-down administrative endpoints.
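The spoken-number normalization mentioned under the Arabic-first speech work above can be sketched in two steps: map Arabic-Indic digits to ASCII digits, then spell the numbers out so the TTS engine does not have to guess. The spelling step below is in English and limited to 0 to 999, purely as a stand-in. Arabic number words involve gender agreement and other grammar that this sketch does not attempt.

```python
import re

# Arabic-Indic digits U+0660..U+0669 mapped to ASCII 0..9.
ARABIC_INDIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_number(n: int) -> str:
    """Spell an integer from 0 to 999 in English words."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + spell_number(rest) if rest else "")


def normalize_numbers(text: str) -> str:
    """Unify digit scripts, then replace small integers with their spoken form."""
    text = text.translate(ARABIC_INDIC_TO_ASCII)

    def replace(match: re.Match) -> str:
        value = int(match.group())
        # Larger numbers are left as digits rather than guessed at.
        return spell_number(value) if value <= 999 else match.group()

    return re.sub(r"[0-9]+", replace, text)


spoken = normalize_numbers("Open \u0662\u0664 hours, 141 sites")
```

Here the Arabic-Indic "24" and the ASCII "141" both end up as words, while anything outside the supported range passes through unchanged.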

// 07 — OUTCOMES

RESULTS AND LESSONS

→ Delivered a fully multimodal Arabic and English avatar experience that combined grounded answers, speech synthesis, and live lip-sync in one production pipeline.
→ Hit real-time response expectations with sub-4-second time-to-first-byte and 99.9% uptime under production conditions.
→ Made the knowledge layer maintainable by shipping upload, search, count, delete, rebuild, backup, and vectorstore health capabilities as operational APIs.
→ Reduced operational and security risk through CI/CD gating, prompt-injection testing, abuse simulation, secure endpoint controls, content filtering, integrity validation, health checks, and disaster-recovery-friendly backups.
→ Left the project in a sustainable state with full service documentation, handover material, and onboarding support for the engineer taking over the stack.

// 08 — STACK

THE TOOLS

LLM

Azure OpenAI · Agent orchestration · Prompt engineering · Response classification

Retrieval

RAG · FAISS · OCR pipelines · Structured data ETL · Knowledge-base APIs

Speech

STT · TTS · SSML · Arabic diacritization · Viseme alignment

Runtime

Python · Redis · Nginx · CPU-optimized inference

Ops

GitHub Actions · Multi-branch releases · Structured logging · Prompt-injection testing · Health checks · Backups

// 09 — LINKS

SOURCE TRAIL

LinkedIn launch post · LinkedIn profile