CRDB Bank - ML Fraud Detection (Tanzania).
Cairo, Egypt
Source: model audit documentation
I owned the data and model engineering foundation for a fraud anomaly detection platform at CRDB Bank PLC, Tanzania's largest bank: the Oracle feature pipeline, a governed scoring workflow, score reproducibility controls, and a supervised fraud-learning runway, all built toward a planned mid-July 2026 production launch.
CRDB Bank operated at national scale, serving more than 9 million customers and processing tens of millions of transactions every month. The fraud platform had to identify unusual behavior across that population without turning into a noisy, generic alert engine.
The risk surface was broad: mobile money, ATM/POS, digital banking, agency banking, SWIFT and TISS wires, TIPS, cheques, trade finance, and account-opening activity each carried different fraud patterns, latency expectations, and data-quality constraints.
A major source migration replaced the original data source with a transaction-level schema of roughly 84M records per month whose fields only partially aligned with the old ones. I treated that as an ownership problem, not a mapping exercise: the feature logic, ledger flattening, preprocessing, validation evidence, lineage, and score-time assumptions all had to be rebuilt and governed.
I led the model-ready data architecture from raw Oracle staging through transaction base tables, ledger flattening, FX normalization, behavioral features, model-prep tables, final model input, and lineage maps. The pipeline produced one analytical row per customer transaction while keeping operational identifiers and labels outside the score-time feature set.
The feature layer captured transaction amount and direction, USD-normalized monetary values, cyclical time signals, posting lag, interarrival timing, balance context, channel and currency behavior, novelty flags, velocity windows, amount deviation, ledger complexity, profile completeness, and data-quality indicators.
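A few of these features can be sketched in plain Python to show the idea. This is an illustration only, not the project's Oracle SQL: the function names and the one-hour window are assumptions chosen for the example.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta
from typing import Optional


def cyclical_hour(ts: datetime) -> tuple[float, float]:
    """Encode time of day on the unit circle so 23:59 and 00:00 stay close."""
    angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    return math.sin(angle), math.cos(angle)


def velocity_count(history: list[datetime], now: datetime, window: timedelta) -> int:
    """Count a customer's prior transactions inside the trailing window."""
    return sum(1 for t in history if now - window <= t < now)


def interarrival_seconds(history: list[datetime], now: datetime) -> Optional[float]:
    """Seconds since the most recent prior transaction, or None if there is none."""
    prior = [t for t in history if t < now]
    return (now - max(prior)).total_seconds() if prior else None
```

Only transactions strictly before the current one are counted, so a row never contributes to its own velocity or interarrival value.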
For the current model, I implemented a governed anomaly-detection pipeline with reproducible preprocessing, persisted median/IQR-style transformation parameters, an ASTORE model artifact, and a training-derived anomaly threshold. The model was intentionally positioned as anomaly detection and alert candidate generation, not a final calibrated fraud probability.
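The idea of a training-derived threshold can be sketched as follows. The source does not say how the threshold was computed, so the nearest-rank quantile rule and the function names here are assumptions for illustration.

```python
import math


def derive_threshold(train_scores, quantile):
    """Pick the nearest-rank quantile of the training anomaly scores.

    Assumption: a quantile rule stands in for whatever rule the
    production workflow actually used.
    """
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def flag_candidates(scores, threshold):
    """Mark scores strictly above the stored threshold as alert candidates."""
    return [score > threshold for score in scores]
```

The threshold is computed once from training scores and passed in at score time, so a scoring batch cannot shift its own cut-off.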
In parallel, I built the positive-fraud feature path from confirmed fraud records into a schema-compatible FINAL_MDL_INPUT_PF table. That gave the project the supervised-learning bridge needed for the next phase without contaminating the unsupervised model input.
Keep model features separate from labels and lineage
I designed FINAL_MDL_INPUT as the numerical score-time feature contract and kept labels, source identifiers, and investigation lineage in separate mapping tables. That prevented leakage and made the model auditable from raw transaction to scored output.
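A minimal sketch of that separation, assuming hypothetical column names and a surrogate `row_key` that ties a feature row back to its lineage record (neither is taken from the real FINAL_MDL_INPUT schema):

```python
# Hypothetical column names, for illustration only.
FEATURE_COLUMNS = ("amount_usd", "hour_sin", "hour_cos", "velocity_1h")
LINEAGE_COLUMNS = ("txn_id", "customer_id", "is_fraud")


def split_contract(row, row_key):
    """Split one source row into a numeric feature row and a lineage record."""
    features = {col: float(row[col]) for col in FEATURE_COLUMNS}
    lineage = {"row_key": row_key, **{col: row[col] for col in LINEAGE_COLUMNS}}
    return features, lineage


def assert_no_leakage(feature_row):
    """Fail loudly if a label or identifier column reached the feature set."""
    leaked = set(feature_row) & set(LINEAGE_COLUMNS)
    if leaked:
        raise ValueError(f"label or identifier columns in features: {sorted(leaked)}")
```

The model only ever sees the feature row; the lineage record carries the identifiers and label needed to trace a score back to its source transaction.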
Make preprocessing a versioned scoring dependency
Training learned the imputation and robust-scaling parameters once, stored them in FMI_PREPROCESS_PARAMS, and reused them during production scoring. No batch score was allowed to quietly relearn medians, IQRs, or thresholds at score time.
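The fit-once, reuse-at-score-time pattern can be sketched in Python. This is an illustration under assumptions: a JSON-serializable dict stands in for the FMI_PREPROCESS_PARAMS table, and median imputation plus median/IQR scaling is assumed as the concrete transformation.

```python
import json
import statistics


def fit_params(train_columns):
    """Learn per-feature median and IQR from training data only."""
    params = {}
    for name, values in train_columns.items():
        observed = [v for v in values if v is not None]
        q1, _, q3 = statistics.quantiles(observed, n=4)
        iqr = q3 - q1
        # Guard against a zero IQR so scaling never divides by zero.
        params[name] = {"median": statistics.median(observed),
                        "iqr": iqr if iqr > 0 else 1.0}
    return params


def apply_params(row, params):
    """Impute and scale one row using stored parameters; nothing is relearned."""
    scaled = {}
    for name, p in params.items():
        value = row.get(name)
        if value is None:
            value = p["median"]  # impute with the training median
        scaled[name] = (value - p["median"]) / p["iqr"]
    return scaled


# Persisting the parameters (here as JSON) makes them a versioned artifact.
stored = json.dumps(fit_params({"amount_usd": [1, 2, 3, 4, 5, None]}))
scored_row = apply_params({"amount_usd": 6}, json.loads(stored))
```

Because `apply_params` takes the parameters as input and has no access to the scoring batch as a whole, it cannot recompute medians or IQRs from production data.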
Start with a governed anomaly-detection champion
Confirmed fraud labels were still being integrated, so I used a governed anomaly-detection approach as the initial champion for unusual transaction behavior. It produced anomaly scores, candidate alerts, and a reusable signal for the future hybrid fraud model.
Build supervised learning without polluting anomaly training
I built the positive-fraud pipeline in parallel, producing schema-compatible fraud examples for the supervised phase while keeping confirmed fraud rows out of the unsupervised training table until a formal labeled assembly was ready.
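Keeping confirmed fraud rows out of the unsupervised training table amounts to an anti-join. A minimal sketch, assuming a hypothetical `row_key` field shared by the candidate rows and the confirmed-fraud records:

```python
def exclude_confirmed_fraud(candidate_rows, confirmed_fraud_keys):
    """Return training rows whose key is not in the confirmed-fraud set."""
    blocked = set(confirmed_fraud_keys)
    return [row for row in candidate_rows if row["row_key"] not in blocked]


def partition_by_label(candidate_rows, confirmed_fraud_keys):
    """Split rows into (unsupervised training rows, positive-fraud rows)."""
    blocked = set(confirmed_fraud_keys)
    unlabeled = [r for r in candidate_rows if r["row_key"] not in blocked]
    positives = [r for r in candidate_rows if r["row_key"] in blocked]
    return unlabeled, positives
```

Every row lands in exactly one of the two outputs, so the combined counts can be reconciled against the input row count.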
The architecture was a transaction-level fraud-scoring platform with a governed handoff between Oracle and SAS Viya. Source transactions were shaped into base and ledger-flattened analytical tables, converted into model-ready feature contracts, preprocessed with persisted training parameters, and scored through an anomaly-model ASTORE artifact. The scored output was then prepared for SFM case-management integration, with tiering, reason codes, and supervised challengers planned before the mid-July 2026 launch.
Built for CRDB Bank PLC, Tanzania's largest bank, on a migrated transaction-level schema with roughly 84M monthly records.
Covered mobile money, ATM/POS, digital banking, agency banking, wires, TIPS, cheques, trade finance, and account-opening surfaces.
Implemented base-table creation, ledger flattening, FX conversion, behavior windows, encoding, model-prep, final input, and lineage outputs.
Trained the SAS Viya anomaly-detection workflow, persisted the ASTORE model artifact, and stored the threshold separately for production scoring.
Built the positive-fraud feature pipeline so confirmed fraud examples could be appended into a future supervised and hybrid champion build.
- Owned the Oracle-to-SAS model engineering path from source staging through feature creation, lineage, reproducible preprocessing, model training, and batch scoring scripts.
- Delivered the current unsupervised anomaly detection foundation as an analyst triage and alert-candidate generator, with production launch planned for mid-July 2026 after validation and integration hardening.
- Separated model features, labels, lineage, and positive-fraud examples so the architecture could pass model-risk review and mature into supervised learning without leakage.
- Implemented reusable controls around schema parity, feature existence, row-count validation, train-time preprocessing parameters, ASTORE reuse, and score-time thresholding.
- Defined the next production architecture: reason codes, tiering, false-positive suppression, SFM payload mapping, monitoring dashboards, and a hybrid supervised-plus-unsupervised champion strategy.
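The schema-parity and row-count controls listed above can be sketched as small pre-scoring checks. The function names and the zero-row tolerance default are assumptions for illustration, not the project's actual validation scripts.

```python
def check_schema_parity(expected, actual):
    """Compare column lists and return a list of human-readable problems."""
    problems = []
    missing = [col for col in expected if col not in actual]
    extra = [col for col in actual if col not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and list(expected) != list(actual):
        problems.append("column order differs")
    return problems


def check_row_count(source_rows, output_rows, tolerance=0):
    """True when the output row count is within tolerance of the source."""
    return abs(source_rows - output_rows) <= tolerance
```

A scoring job would typically run such checks first and stop when `check_schema_parity` returns any problems or `check_row_count` returns `False`.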