An international banking group managed its HR data across fragmented systems spanning 150 subsidiaries on 4 continents. No shared reference framework, large-scale duplication, and 700 users without unified BI tooling. The challenge: build the foundations of an HR data lake from scratch.
The Challenge
The group operated with heterogeneous information systems across each of its geographic regions: North and South America, Africa, Asia-Pacific. HR data from its subsidiaries was neither consolidated, nor cleansed, nor governed. Duplicate employee identifiers distorted analyses at every level. No common definition of what constituted a "unique employee" had ever been established.
HR data fragmented across incompatible systems spanning 150 entities in 4 geographic regions
Massive employee identifier duplication making any group-level analysis impossible
No data flow cataloguing or mapping of HR data sources
700 users across 150 subsidiaries with no common BI tool or adequate training
No data governance and no identified owner for HR reference datasets
The Approach
Before starting any migration or deploying any analytical tool, the project began with what most data projects skip: defining what needed to be unified, mapping what already existed, and establishing the governance rules that would keep the data reliable over time. This upfront rigour is what made the results possible.
Comprehensive inventory of HR data sources across 150 entities, mapping of inter-system flows, creation of a group-wide data catalogue enabling origin and quality tracking for every data point.
Development of a probability-based algorithm to identify and unify duplicate employee identifiers at group scale, without relying on exact matches between source systems. 100% workforce coverage achieved.
Creation of HR reporting for subsidiaries, data lake architecture implementation, and rollout of a training programme for 700 group BI tool users across 150 subsidiaries.
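To make the idea of probability-based identifier matching more concrete, here is a minimal Python sketch in the style of Fellegi-Sunter record linkage. It is not the algorithm used on the project, whose details are not described here. The field names, the m/u probabilities, the similarity cut-off, the match threshold and the sample records are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class EmployeeRecord:
    source_id: str   # identifier in the source system (hypothetical format)
    first_name: str
    last_name: str
    birth_date: str  # ISO date string


# Assumed probabilities per field: m = P(fields agree | same person),
# u = P(fields agree | different people). In practice these would be
# estimated from the data, not hard-coded.
FIELD_PROBS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "birth_date": (0.98, 0.003),
}


def agrees(field: str, a: str, b: str, min_ratio: float = 0.85) -> bool:
    """Exact comparison for dates, fuzzy comparison for names."""
    a, b = a.strip().lower(), b.strip().lower()
    if field == "birth_date":
        return a == b
    return SequenceMatcher(None, a, b).ratio() >= min_ratio


def match_weight(r1: EmployeeRecord, r2: EmployeeRecord) -> float:
    """Sum of log-likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if agrees(field, getattr(r1, field), getattr(r2, field)):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def unify(records: list[EmployeeRecord], threshold: float = 8.0) -> list[list[str]]:
    """Group source identifiers that probably refer to the same person."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Naive all-pairs comparison; see the note below about scale.
    for i, j in combinations(range(len(records)), 2):
        if match_weight(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec.source_id)
    return list(groups.values())


# Made-up sample data: the first two records differ only in case and spelling.
records = [
    EmployeeRecord("US-001", "Maria", "Gonzalez", "1985-03-12"),
    EmployeeRecord("BR-7781", "MARIA", "Gonzales", "1985-03-12"),
    EmployeeRecord("ZA-0042", "Thabo", "Nkosi", "1990-11-02"),
]
print(unify(records))
```

With this sample data, the first two records end up in the same group and the third stays on its own. Comparing every pair of records grows quadratically, so at the scale of 100,000 employees a real implementation would first restrict comparisons to candidate blocks (for example, records sharing a birth year) and would usually route borderline scores to manual review.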
Results
Beyond the metrics, this project produced something the previous systems had never delivered: a shared, operational definition of what a group employee actually is. HR data accuracy improved by 20% and cross-system mismatches fell by 25%. It is these foundations, not the tools, that finally made group-level analysis reliable.
"The real problem wasn't technical. It's that no one had ever defined what a unique identifier was before starting. Twelve per cent duplicates across 100,000 people, and no one saw it — because no one was looking."
Technologies & Methods
Work with Wysegen
Let's talk about your data challenges — no commitment, no jargon.