AMR.ALFAYOUMY
PRODUCTION · 2024-2025 · Multilingual conversational AI

EAD Mynd - Avatar Conversational AI.

Data Scientist & AI/ML Engineer
Cairo, Egypt
Source: project brief + year-end review 2025

A production multilingual avatar platform for Environment Agency - Abu Dhabi, where I owned the backend architecture, grounded retrieval stack, speech-to-avatar pipeline, and production engineering needed to launch a real-time public experience.

Interactions · 3,700+ · Served successfully on launch
Knowledge Base · 141,922 · Indexed chunks from 1,600+ documents
Latency · <4s · Time-to-first-byte for live avatar responses
Reliability · 99.9% · Uptime with zero post-launch production errors
Architecture · 9 agents · Specialized orchestration across RAG, speech, and avatar control

// 02 — CHALLENGE
WHY IT MATTERED

The platform had to deliver fact-grounded conversations in both Arabic and English for a public-facing launch, while staying fast enough to feel like a live, polished avatar experience rather than a delayed chatbot.

The hard part was not only answer quality. The system had to synchronize grounded retrieval, session memory, speech-to-text, text-to-speech, lip-sync timing, and avatar behavior under concurrent usage, with enough reliability to survive a national exhibition without operator intervention.

Infrastructure constraints made the engineering bar higher. I had to design for CPU-oriented deployment, security controls, and maintainable operations from day one, so the solution could run economically without sacrificing latency, safety, or production resilience.

// 03 — APPROACH
HOW I BUILT IT

I owned the backend architecture end to end: microservices, session management, retrieval, knowledge-base operations, specialized agents, speech processing, avatar-alignment services, deployment, observability, and production hardening.

At the core, I built a parallelized, memory-enabled RAG system with dynamic source selection, chat-history integration, integrity validation, response regeneration, and a full knowledge-base management API for document ingestion, OCR extraction, vectorization, search, deletion, rebuilds, and health monitoring.
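
To illustrate dynamic source selection, here is a minimal sketch that filters chunks by origin before ranking them by cosine similarity. It is not the production code: the `Chunk` fields, the origin labels, the sample texts, and the toy two-dimensional vectors are all assumptions made for the example, and a real deployment would query a vector index rather than scan a list.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    origin: str           # hypothetical label, e.g. "public" or "internal"
    vector: list          # embedding; toy 2-d vectors in this sketch


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, allowed_origins, top_k=2):
    """Filter by origin first, then rank the survivors by similarity."""
    candidates = [c for c in chunks if c.origin in allowed_origins]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:top_k]


# Made-up sample data, used only to exercise the function.
CHUNKS = [
    Chunk("Mangrove restoration programme", "public", [1.0, 0.0]),
    Chunk("Internal permit workflow", "internal", [0.9, 0.1]),
    Chunk("Air quality monitoring", "public", [0.0, 1.0]),
]

hits = retrieve([1.0, 0.1], CHUNKS, allowed_origins={"public"})
print([h.text for h in hits])
```

Filtering before ranking means an excluded origin can never reach the answer, even when its chunks are the closest semantic match.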

I then designed the real-time agent chain around the avatar experience itself: query contextualization and routing, structured-data extraction, OCR correction, separate QA behavior for avatar and chatbot modes, Arabic and English TTS text preparation, SSML generation, translation, answer classification, and streaming viseme alignment for lip-sync.
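
The viseme step can be pictured as a mapping from timed phonemes to mouth-shape events. The sketch below is only an illustration under stated assumptions: the phoneme-to-viseme table, the viseme names, and the `(phoneme, start_ms, end_ms)` input shape are hypothetical, and real avatar rigs define their own viseme sets.

```python
from dataclasses import dataclass

# Hypothetical mapping; an actual rig would supply its own viseme IDs.
PHONEME_TO_VISEME = {
    "m": "closed", "b": "closed", "p": "closed",
    "a": "open", "o": "round", "u": "round",
    "f": "teeth_lip", "v": "teeth_lip",
}


@dataclass
class VisemeEvent:
    viseme: str
    start_ms: int
    end_ms: int


def stream_visemes(phonemes):
    """Yield viseme events as phonemes arrive, merging back-to-back
    phonemes that share a mouth shape into a single event."""
    current = None
    for phoneme, start, end in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if current and current.viseme == viseme and current.end_ms == start:
            current.end_ms = end          # extend the open event
            continue
        if current:
            yield current                 # mouth shape changed: emit the previous event
        current = VisemeEvent(viseme, start, end)
    if current:
        yield current


timeline = [("m", 0, 80), ("a", 80, 200), ("b", 200, 260), ("p", 260, 300), ("u", 300, 420)]
events = list(stream_visemes(timeline))
```

Because the function is a generator, a consumer can start animating the first mouth shape before the rest of the timeline has been produced.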

To make that design production-safe, I combined CPU-first inference optimization with explicit security engineering: prompt-injection testing, Azure content filtering on both queries and responses, request validation, abuse simulation, IP-restricted admin endpoints, TLS-secured traffic, and centralized logging, so the system could launch confidently and remain operable after handover.

// 04 — KEY DECISIONS
WHAT I CHOSE & WHY
Decision · 01

Build around specialized agents, not one overloaded prompt

I separated contextualization, retrieval preparation, OCR correction, QA behavior, TTS formatting, SSML generation, answer classification, and viseme alignment into focused services because the product had multiple real-time jobs to do at once. That kept the system debuggable, tunable, and easier to harden for production.
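
One way to picture the separation is a chain of small functions that each read and extend a shared state. This is a simplified sketch, not the deployed services: the agent names, the state keys, and the rules inside each function are placeholders chosen for illustration.

```python
def contextualize(state):
    # Hypothetical rule: treat very short queries as follow-ups and
    # prepend the previous user turn to make them standalone.
    history = state.get("history", [])
    query = state["query"]
    if history and len(query.split()) < 4:
        query = f"{history[-1]} {query}"
    return {**state, "standalone_query": query}


def classify_answer(state):
    # Placeholder classifier: an answer with no retrieved context is a fallback.
    grounded = bool(state.get("context"))
    return {**state, "answer_class": "grounded" if grounded else "fallback"}


def prepare_tts(state):
    # Collapse whitespace so the speech stage receives clean text.
    return {**state, "tts_text": " ".join(state.get("answer", "").split())}


def run_chain(state, agents):
    """Run agents in order and record which ones ran, for easier debugging."""
    trace = []
    for agent in agents:
        state = agent(state)
        trace.append(agent.__name__)
    return {**state, "trace": trace}


result = run_chain(
    {
        "query": "and in 2024?",
        "history": ["How many sites were monitored?"],
        "context": ["chunk-1"],
        "answer": "  Details are   in the report. ",
    },
    [contextualize, classify_answer, prepare_tts],
)
```

Each step can be tested, timed, and replaced on its own, which is the practical benefit the decision above is aiming for.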

Decision · 02

Make grounded retrieval a controllable infrastructure layer

I treated the knowledge base as an operational system, not a one-time embedding script. The platform supports ingestion, OCR, correction, vectorstore rebuilds, origin selection, health checks, and hash validation so the team can trust what the avatar says and maintain it after launch.
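
As a rough illustration of hash validation, the sketch below stores one digest per document and checks later copies against it. The choice of SHA-256, the `HashRegistry` class, and the in-memory storage are assumptions; the source does not say which algorithm or store the platform uses.

```python
import hashlib
import hmac


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw document bytes."""
    return hashlib.sha256(data).hexdigest()


class HashRegistry:
    """Tracks one digest per document ID (in memory for this sketch)."""

    def __init__(self):
        self._digests = {}

    def register(self, doc_id, data):
        self._digests[doc_id] = content_hash(data)

    def verify(self, doc_id, data):
        """True only if the document is known and its bytes are unchanged."""
        expected = self._digests.get(doc_id)
        if expected is None:
            return False
        return hmac.compare_digest(expected, content_hash(data))


registry = HashRegistry()
registry.register("doc-1", b"original text")
print(registry.verify("doc-1", b"original text"))   # unchanged copy
print(registry.verify("doc-1", b"tampered text"))   # modified copy
```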

Decision · 03

Optimize for CPU-first streaming instead of depending on GPUs

Rather than treating infrastructure limits as a blocker, I reworked the serving path around quantized models, async chunking, warm starts, dual Redis caches, OpenVINO and ONNX-style runtimes, and careful Nginx tuning. That decision made the system materially cheaper while still hitting real-time interaction targets.
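
Async chunking can be sketched with `asyncio`: split the answer into sentences, start synthesis for all of them, and yield audio in order as each piece finishes. Everything here is illustrative. `synthesize` is a stand-in that only sleeps, and the sentence splitter is a simple regex rather than whatever the production text agent does.

```python
import asyncio
import re


def split_sentences(text):
    """Split after sentence-ending punctuation, including the Arabic question mark."""
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


async def synthesize(chunk):
    # Stand-in for a TTS call; the sleep only simulates latency.
    await asyncio.sleep(0.01)
    return f"<audio:{chunk}>"


async def stream_audio(text):
    """Yield audio per sentence, in order, while later sentences
    are already being synthesized in the background."""
    tasks = [asyncio.create_task(synthesize(s)) for s in split_sentences(text)]
    for task in tasks:
        yield await task


async def main():
    text = "Hello there. How can I help? Ask me anything."
    return [audio async for audio in stream_audio(text)]


chunks = asyncio.run(main())
```

The first sentence can be played as soon as it is ready, so perceived latency depends on the first chunk rather than on the full answer.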

Decision · 04

Treat model safety and endpoint security as launch blockers

Because this was a public-facing assistant, I did not leave safety as a policy footnote. I added prompt-injection testing, query and response content filtering, strict request validation, integrity hashing, IP-restricted admin routes, and abuse simulation so unsafe or malformed traffic was filtered before it could become a production incident.
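
Two of these controls, request validation and IP-restricted admin routes, can be sketched with the standard library alone. The networks, the length limit, and the payload fields below are hypothetical values chosen for the example, not the platform's actual configuration.

```python
import ipaddress

# Hypothetical allowlist and limits; the real values are not given in the source.
ADMIN_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]
MAX_QUERY_CHARS = 500
ALLOWED_LANGUAGES = {"ar", "en"}


def is_admin_ip(client_ip):
    """True if the caller's address falls inside an allowlisted network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False                      # malformed addresses are rejected
    return any(addr in net for net in ADMIN_NETWORKS)


def validate_request(payload):
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")
    elif len(query) > MAX_QUERY_CHARS:
        errors.append("query too long")
    if payload.get("language") not in ALLOWED_LANGUAGES:
        errors.append("language must be 'ar' or 'en'")
    return errors


print(is_admin_ip("10.1.2.3"), is_admin_ip("8.8.8.8"))
print(validate_request({"query": "", "language": "fr"}))
```

Rejecting malformed input at the edge keeps it away from the retrieval and generation stages, where failures are harder to diagnose.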

// 05 — ARCHITECTURE
HOW IT FITS TOGETHER

The production flow starts with multilingual user input and agency knowledge sources, routes through OCR and structured-data preparation into a memory-enabled retrieval layer, then hands context to specialized QA and speech agents that generate grounded text, SSML, audio alignment, viseme events, and response-classification signals. Around that core, I built the operational scaffolding: secure session state, hash-validated exchanges, content filtering, restricted admin surfaces, cache layers, deployment workflows, monitoring, backups, and restartable services.

// FIG. SYSTEM DIAGRAM
EAD Mynd avatar conversational AI high-level architecture: user speech and knowledge sources feed ingestion, grounded retrieval, specialized agents, speech and viseme generation, and production operations services.
  • INPUTS + SOURCES: user voice / text query (Arabic and English live user turns); document and data sources (PDFs, DOCX, TXT; internal and public data origins)
  • INGESTION: OCR, text extraction, OCR correction, KB management API (upload, processing, vectorization, search, deletion, rebuilds, integrity checks)
  • RETRIEVAL + MEMORY: memory-enabled RAG with dynamic origin selection (FAISS retrieval, chat history, recency and semantic ranking, response regeneration, hash validation)
  • SPECIALIZED AGENTS: query routing agents (contextualization, intent, sanitization, structured-data injection); QA agents (avatar mode, chatbot mode, Arabic and English answers); control agents (translation, answer classification, behavior selection)
  • SPEECH + AVATAR: TTS text agent (diacritization, number normalization, chunking for streaming); SSML + TTS + alignment (prosody control, audio generation, word and phoneme timestamps); viseme events (streaming lip-sync, avatar behavior timing)
  • PRODUCTION OPS: serving + cache (Nginx, dual Redis, session state); deployment + security (GitHub workflows, TLS, prompt-injection safety, restricted admin endpoints); observability + recovery (logs, health checks, backups, restart-on-failure); CI/CD (stage, stable prod release flow)
Flow labels: live query path, knowledge ingestion, grounded context prep.

// 06 — HIGHLIGHTS
KEY TAKEAWAYS
▸ END-TO-END PLATFORM OWNERSHIP

Owned the system beyond model prompting: backend services, RAG, knowledge infrastructure, speech processing, avatar timing, deployment, logging, backups, and handover readiness.

▸ PUBLIC LAUNCH UNDER REAL LOAD

The platform successfully served every one of more than 3,700 avatar interactions during the ADIHEX exhibition, with no post-launch production failures.

▸ LARGE-SCALE GROUNDED KNOWLEDGE LAYER

Built and tuned the retrieval stack behind 141,922 indexed chunks from 1,600+ documents, with document lifecycle controls instead of a one-off ingestion script.

▸ ARABIC-FIRST SPEECH QUALITY WORK

Added Arabic diacritization, spoken-number normalization, SSML shaping, and alignment-aware timing so the avatar felt natural rather than merely translated.
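
SSML shaping can be illustrated with a small builder that escapes text, wraps it in a `prosody` element, and inserts pauses between sentences. The function name, the default `ar-AE` locale, the rate, and the pause length are assumptions for this sketch; diacritization and number normalization would happen before this step and are not shown.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, lang="ar-AE", rate="medium", pause_ms=250):
    """Wrap sentences in <speak>/<prosody> with a short <break> between them."""
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(escape(s) for s in sentences)   # escape &, <, > in the text
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
        "</speak>"
    )


ssml = build_ssml(["Air & water quality", "Welcome"])
print(ssml)
```

Escaping the text before embedding it keeps the markup well-formed even when an answer contains characters such as `&` or `<`.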

▸ CPU-OPTIMIZED PRODUCTION ENGINEERING

Quantization, streaming inference, dual caching, and runtime optimization delivered real-time behavior on CPU-oriented infrastructure while reducing serving cost relative to GPU-heavy designs.

▸ EXPLICIT MODEL AND API SAFETY CONTROLS

Production hardening included prompt-injection testing, Azure content filters on both input and output, abuse simulation, integrity hashing, and locked-down administrative endpoints.

// 07 — OUTCOMES
RESULTS AND LESSONS
  • Delivered a fully multimodal Arabic and English avatar experience that combined grounded answers, speech synthesis, and live lip-sync in one production pipeline.
  • Hit real-time response expectations with sub-4-second time-to-first-byte and 99.9% uptime under production conditions.
  • Made the knowledge layer maintainable by shipping upload, search, count, delete, rebuild, backup, and vectorstore health capabilities as operational APIs.
  • Reduced operational and security risk through CI/CD gating, prompt-injection testing, abuse simulation, secure endpoint controls, content filtering, integrity validation, health checks, and disaster-recovery-friendly backups.
  • Left the project in a sustainable state with full service documentation, handover material, and onboarding support for the engineer taking over the stack.
// 08 — STACK
THE TOOLS
LLM
Azure OpenAI · Agent orchestration · Prompt engineering · Response classification
Retrieval
RAG · FAISS · OCR pipelines · Structured data ETL · Knowledge-base APIs
Speech
STT · TTS · SSML · Arabic diacritization · Viseme alignment
Runtime
Python · Redis · Nginx · CPU-optimized inference
Ops
GitHub Actions · Multi-branch releases · Structured logging · Prompt-injection testing · Health checks · Backups