AMR.ALFAYOUMY
ACTIVE DELIVERY · Planned go-live mid-July 2026 · Fraud analytics

CRDB Bank - ML Fraud Detection (Tanzania)

Lead Data Engineer / ML Fraud Owner
Cairo, Egypt
Source: model audit documentation

I owned the data and model engineering foundation for a fraud anomaly detection platform at CRDB Bank PLC, Tanzania's largest bank. That meant building the Oracle feature pipeline, a governed scoring workflow, score reproducibility controls, and a supervised fraud-learning runway for a planned mid-July 2026 production launch.

Transactions · 84M/month · Production transaction-level scale after migration
Customers · 9M+ · Population coverage
Channels · 30+ · Digital and traditional fraud surfaces
Model · 350 · Trees in the current anomaly model build
Threshold · 99.5th percentile · Training-derived anomaly cutoff for current scoring
// 02 — CHALLENGE
WHY IT MATTERED

CRDB Bank operated at national scale as Tanzania's largest bank, serving more than 9 million customers and processing tens of millions of transactions every month. The fraud platform had to identify unusual behavior across a very large population without turning into a noisy generic-alert engine.

The risk surface was broad: mobile money, ATM/POS, digital banking, agency banking, SWIFT and TISS wires, TIPS, cheques, trade finance, and account-opening activity each carried different fraud patterns, latency expectations, and data-quality constraints.

A major source migration replaced the original data with a transaction-level schema of roughly 84M records per month whose fields only partially aligned with the previous source. I treated that as an ownership problem, not a mapping exercise: the feature logic, ledger flattening, preprocessing, validation evidence, lineage, and score-time assumptions all had to be rebuilt and governed.

// 03 — APPROACH
HOW I BUILT IT

I led the model-ready data architecture from raw Oracle staging through transaction base tables, ledger flattening, FX normalization, behavioral features, model-prep tables, final model input, and lineage maps. The pipeline produced one analytical row per customer transaction while keeping operational identifiers and labels outside the score-time feature set.

The feature layer captured transaction amount and direction, USD-normalized monetary values, cyclical time signals, posting lag, interarrival timing, balance context, channel and currency behavior, novelty flags, velocity windows, amount deviation, ledger complexity, profile completeness, and data-quality indicators.
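To make two of these feature families concrete, here is a minimal Python sketch of a cyclical time signal and an interarrival-timing feature. It is illustrative only: the production features were built in Oracle SQL, and the function names and the choice of hour-of-day as the cycle are assumptions made for this example.

```python
import math
from datetime import datetime
from typing import List, Optional, Tuple


def cyclical_hour(ts: datetime) -> Tuple[float, float]:
    """Encode hour-of-day on the unit circle so 23:59 and 00:00 land close together."""
    fraction_of_day = (ts.hour + ts.minute / 60) / 24
    angle = 2 * math.pi * fraction_of_day
    return math.sin(angle), math.cos(angle)


def interarrival_seconds(timestamps: List[datetime]) -> List[Optional[float]]:
    """Seconds since the previous transaction of one customer; None for the first."""
    ordered = sorted(timestamps)
    gaps: List[Optional[float]] = [None]
    for previous, current in zip(ordered, ordered[1:]):
        gaps.append((current - previous).total_seconds())
    return gaps
```

The sine/cosine pair avoids the artificial jump that a raw 0-23 hour value has at midnight, which matters for distance- or split-based anomaly models.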

For the current model, I implemented a governed anomaly-detection pipeline with reproducible preprocessing, persisted median/IQR-style transformation parameters, an ASTORE model artifact, and a training-derived anomaly threshold. The model was intentionally positioned as anomaly detection and alert candidate generation, not a final calibrated fraud probability.
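The idea of a training-derived threshold can be sketched in a few lines of Python. This is an assumption-laden illustration, not the SAS Viya scoring code: it assumes higher scores mean more anomalous, uses a nearest-rank percentile, and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def percentile_cutoff(train_scores: Sequence[float], pct: float = 99.5) -> float:
    """Nearest-rank percentile of the training scores (assumed cutoff method)."""
    if not train_scores:
        raise ValueError("need at least one training score")
    ordered = sorted(train_scores)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def flag_candidates(scores: Iterable[float], cutoff: float) -> List[bool]:
    """Flag scores strictly above the stored cutoff; the cutoff is never refit here."""
    return [score > cutoff for score in scores]
```

Usage: compute `cutoff = percentile_cutoff(training_scores)` once at training time, persist it alongside the model artifact, and pass the stored value to `flag_candidates` for every scoring batch. Flags produced this way are alert candidates, not fraud probabilities.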

In parallel, I built the positive-fraud feature path from confirmed fraud records into a schema-compatible FINAL_MDL_INPUT_PF table. That gave the project the supervised-learning bridge needed for the next phase without contaminating the unsupervised model input.

// 04 — KEY DECISIONS
WHAT I CHOSE & WHY
Decision · 01

Keep model features separate from labels and lineage

I designed FINAL_MDL_INPUT as the numerical score-time feature contract and kept labels, source identifiers, and investigation lineage in separate mapping tables. That prevented leakage and made the model auditable from raw transaction to scored output.
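A contract like this can be enforced with a simple guard before scoring. The Python sketch below is a hypothetical illustration: the column names in `NON_FEATURE_COLUMNS` are invented for the example, and the real contract and mapping tables are not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical label and identifier names, used only for illustration.
NON_FEATURE_COLUMNS = {"FRAUD_LABEL", "TXN_ID", "CUSTOMER_ID", "CASE_ID"}


def check_feature_contract(columns: Iterable[str],
                           forbidden: Iterable[str] = NON_FEATURE_COLUMNS) -> List[str]:
    """Return the columns unchanged, or raise if a label or identifier slipped in."""
    columns = list(columns)
    leaked = sorted({c.upper() for c in columns} & set(forbidden))
    if leaked:
        raise ValueError(f"non-feature columns in score-time input: {leaked}")
    return columns


def split_row(row: Dict[str, object],
              feature_names: Iterable[str]) -> Tuple[dict, dict]:
    """Separate one source row into (score-time features, lineage/label fields)."""
    names = set(feature_names)
    features = {k: v for k, v in row.items() if k in names}
    lineage = {k: v for k, v in row.items() if k not in names}
    return features, lineage
```

Keeping the lineage half in its own mapping lets a scored row be traced back to the raw transaction without the model ever seeing an identifier or label.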

Decision · 02

Make preprocessing a versioned scoring dependency

Training learned the imputation and robust-scaling parameters once, stored them in FMI_PREPROCESS_PARAMS, and reused them during production scoring. No batch score was allowed to quietly relearn medians, IQRs, or thresholds at score time.
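The fit-once, reuse-at-score-time pattern can be sketched in Python as follows. This is a simplified stand-in under stated assumptions: median imputation plus median/IQR scaling, with a JSON string playing the role of the FMI_PREPROCESS_PARAMS table. Function names and the exact quantile method are hypothetical.

```python
import json
import statistics
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[float]]


def fit_params(rows: List[Row], features: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Learn median and IQR per feature from training rows only (needs 2+ values each)."""
    params = {}
    for name in features:
        observed = [r[name] for r in rows if r.get(name) is not None]
        q1, _, q3 = statistics.quantiles(observed, n=4)
        iqr = q3 - q1
        # Guard against a zero IQR on constant features.
        params[name] = {"median": statistics.median(observed), "iqr": iqr or 1.0}
    return params


def apply_params(row: Row, params: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Impute and scale one row with stored parameters; nothing is relearned here."""
    scaled = {}
    for name, p in params.items():
        value = row.get(name)
        if value is None:
            value = p["median"]
        scaled[name] = (value - p["median"]) / p["iqr"]
    return scaled


def save_params(params: dict) -> str:
    """Serialize parameters; a JSON string stands in for the persisted table."""
    return json.dumps(params, sort_keys=True)
```

At score time the batch job would load the serialized parameters and call `apply_params` only, so two runs over the same input produce the same scaled features regardless of what else is in the batch.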

Decision · 03

Start with a governed anomaly-detection champion

Confirmed fraud labels were still being integrated, so I used a governed anomaly-detection approach as the initial champion for unusual transaction behavior. It produced anomaly scores, candidate alerts, and a reusable signal for the future hybrid fraud model.

Decision · 04

Build supervised learning without polluting anomaly training

I built the positive-fraud pipeline in parallel, producing schema-compatible fraud examples for the supervised phase while keeping confirmed fraud rows out of the unsupervised training table until a formal labeled assembly was ready.
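Schema compatibility between the two tables is the kind of property that can be checked mechanically. The Python sketch below compares two column listings; it is a hypothetical illustration, and the `(name, dtype)` pairs would in practice come from the database catalog rather than hand-written lists.

```python
from typing import Dict, List, Sequence, Tuple

Schema = Sequence[Tuple[str, str]]  # (column name, data type) pairs


def schema_diff(reference: Schema, candidate: Schema) -> Dict[str, List[str]]:
    """Report columns missing from, extra in, or typed differently in the candidate."""
    ref, cand = dict(reference), dict(candidate)
    return {
        "missing": sorted(ref.keys() - cand.keys()),
        "extra": sorted(cand.keys() - ref.keys()),
        "type_mismatch": sorted(
            name for name in ref.keys() & cand.keys() if ref[name] != cand[name]
        ),
    }


def is_schema_compatible(reference: Schema, candidate: Schema) -> bool:
    """True only when the candidate matches the reference column for column."""
    return not any(schema_diff(reference, candidate).values())
```

Running a check like `is_schema_compatible(final_mdl_input_schema, final_mdl_input_pf_schema)` before any labeled assembly would surface drift between the unsupervised input and the positive-fraud table early, while the two stay physically separate.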

// 05 — ARCHITECTURE
HOW IT FITS TOGETHER

The architecture was a transaction-level fraud-scoring platform with a governed handoff between Oracle and SAS Viya. Source transactions were shaped into base and ledger-flattened analytical tables, converted into model-ready feature contracts, preprocessed with persisted training parameters, and scored through an anomaly-model ASTORE artifact. The scored output was then prepared for SFM case-management integration, with tiering, reason codes, and supervised challengers planned before the mid-July 2026 launch.

// FIG. SYSTEM DIAGRAM
CRDB Bank fraud anomaly detection model architecture: Oracle transaction sources feed ledger flattening, final model input, SAS Viya preprocessing, anomaly-model ASTORE scoring, thresholding, and planned SFM delivery.
Lanes: SOURCE SYSTEMS · ORACLE ETL · FEATURE CONTRACT · SAS VIYA MODEL · SCORE CONTROLS · OPERATIONS ROADMAP
  • 30+ CHANNEL TRANSACTION FEEDS: mobile money · cards · agency · wires · TIPS · cheques · trade finance; 84M monthly records
  • BASE TABLES + LEDGER FLATTENING: STG_TXN_BASE · LEDGER_FLATTENED_ABT · source reconciliation; schema-aware validation
  • FINAL_MDL_INPUT + LINEAGE MAPS: behavior · velocity · amount deviation · FX · positive-fraud parity
  • SAS VIYA ANOMALY MODEL: 350 trees · max depth 13 · sample size 8192 · ASTORE artifact
  • _ANOMALY_: raw model score
  • THRESHOLD: 99.5 percentile flag
  • ALERT CANDIDATES: analyst review input
  • REASON CODES: planned explainability
  • TIERING: planned T1/T2/T3 routing
  • SAS SFM: mid-July 2026 go-live target
  • CONTROL: model risk, artifact versions, monitoring and validation loop
// 06 — HIGHLIGHTS
KEY TAKEAWAYS
▸ LARGEST-BANK SCALE

Built for CRDB Bank PLC, Tanzania's largest bank, including a migrated transaction-level schema with roughly 84M monthly records.

▸ 30+ CHANNEL FRAUD COVERAGE

Covered mobile money, ATM/POS, digital banking, agency banking, wires, TIPS, cheques, trade finance, and account-opening surfaces.

▸ ORACLE-FIRST FEATURE FACTORY

Implemented base-table creation, ledger flattening, FX conversion, behavior windows, encoding, model-prep, final input, and lineage outputs.

▸ GOVERNED ANOMALY MODEL

Trained the SAS Viya anomaly-detection workflow, persisted the ASTORE model artifact, and stored the threshold separately for production scoring.

▸ SUPERVISED MODEL RUNWAY

Built the positive-fraud feature pipeline so confirmed fraud examples could be appended into a future supervised and hybrid champion build.

// 07 — OUTCOMES
RESULTS AND LESSONS
  • Owned the Oracle-to-SAS model engineering path from source staging through feature creation, lineage, reproducible preprocessing, model training, and batch scoring scripts.
  • Delivered the current unsupervised anomaly detection foundation as an analyst triage and alert-candidate generator, with production launch planned for mid-July 2026 after validation and integration hardening.
  • Separated model features, labels, lineage, and positive-fraud examples so the architecture could pass model-risk review and mature into supervised learning without leakage.
  • Implemented reusable controls around schema parity, feature existence, row-count validation, train-time preprocessing parameters, ASTORE reuse, and score-time thresholding.
  • Defined the next production architecture: reason codes, tiering, false-positive suppression, SFM payload mapping, monitoring dashboards, and a hybrid supervised-plus-unsupervised champion strategy.
// 08 — STACK
THE TOOLS
Data
Oracle SQL · Ledger flattening · FINAL_MDL_INPUT · Lineage maps · Schema validation
Features
Velocity windows · Amount deviation · Ledger complexity · FX diagnostics · Data-quality flags
Models
SAS Viya · Unsupervised anomaly detection · ASTORE · Percentile thresholding
Delivery
Batch scoring · SAS Fraud Management plan · Reason-code roadmap · Tiering roadmap · Model monitoring