Data Lineage & Governance Analyst

Full time, Information Technology, Telecommunications, SQL, Oracle, Python

Role Description

The Technical Analyst - Data Lineage will support a Data Governance, Controls, and Reporting program for a top-tier banking client.
Responsible for establishing and validating end-to-end lineage across critical datasets used in operational and regulatory reporting.
Translate governance and reporting requirements into actionable lineage deliverables (source-to-target mapping, lineage diagrams, metadata standards, and audit evidence).
Work with Data Platform, Data Engineering, Architecture, Risk, Compliance, and Security teams to define lineage standards, metadata capture, and control points.
Maintain lineage artifacts for Critical Data Elements (CDEs), key reports, and priority data products.
Support control design to ensure traceability from source systems through transformations and curated layers to consumption (dashboards/reports/APIs).
Actively participate from discovery workshops through to implementation and continuous improvement.
Ensure traceability from data definition through transformation logic and lineage evidence to audit readiness.

Location

The role supports one of our top-tier banking clients in London (Canary Wharf) and requires a minimum of three days on-site presence.
This is a permanent position based in the UK. We will only consider applicants who are eligible to work in the UK. For this role we do NOT offer visa sponsorship.

Experience Requirements & Qualifications

Minimum 3 years of relevant experience in data analytics, data quality, reporting controls, or data transformation programs (preferably in financial services).

Core Skills & Experience

Minimum 3 years of relevant experience in data governance, lineage, metadata management, or controls programs within finance/banking.
Strong understanding of data lineage concepts: technical lineage, business lineage, column-level lineage, impact analysis, and provenance.
Hands-on experience with data lineage / metadata tooling in enterprise environments (e.g., Collibra, Alation, Informatica EDC/IDMC, IBM InfoSphere, Microsoft Purview, Apache Atlas, Amundsen, DataHub or similar).
Proven ability to build lineage for complex platforms: data lakes, warehouses, marts, and distributed processing (Spark-based pipelines).
Strong proficiency in SQL for tracing transformations and validating mappings across layers.
Working knowledge of ETL/ELT patterns, data modeling (dimensional + normalized), and batch scheduling dependencies.
Ability to interpret data transformation logic from pipelines (Spark SQL / PySpark / Hive queries / orchestration configs).
Strong documentation capability: source-to-target mappings, lineage diagrams, data dictionaries, metadata standards, and control evidence packs.

Technical Skills

Strong proficiency in Python (data analysis/automation for metadata extraction, validation scripts, rule checks).
Hands-on experience with PySpark and Spark SQL in production environments.
Solid knowledge of Hive, Impala, HDFS, and Parquet.
Advanced SQL skills; experience with Oracle databases is preferred.
Working knowledge of AutoSys and Apache Airflow.
Experience with CI/CD tools (Git, Harness, UrbanCode Deploy (UCD), Red Hat OpenShift).
Familiarity with AWS S3 for large-scale data storage.
Exposure to Tableau (understanding data sources, extracts, dependencies) is a plus.

Nice-to-Have

Experience with regulatory reporting data domains (risk, liquidity, capital, finance, BCBS 239 alignment, etc.).
Knowledge of data governance operating models: CDEs, data ownership, stewardship, data quality dimensions.
Experience creating audit-ready documentation and participating in audit walkthroughs.
Experience working in Agile/Scrum delivery models.
Familiarity with monitoring and alerting tools for data pipelines.

Key Responsibilities

Conduct discovery workshops to identify priority reports, data products, and Critical Data Elements (CDEs).
Build and maintain end-to-end lineage across systems, including column-level mappings where required.
Produce and maintain Source-to-Target Mapping (STTM) documentation and metadata standards.
Validate lineage accuracy by tracing logic through SQL/Spark transformations and pipeline configurations.
Support impact analysis for proposed changes (upstream/downstream dependencies, report impact, control impact).
Partner with engineers and platform teams to improve metadata capture and lineage automation (where possible).
Define lineage-related control points and produce audit-ready evidence (diagrams, mappings, query proofs, run evidence).
Support UAT by validating that reported numbers can be traced and explained back to trusted sources.
Maintain the lineage backlog and track changes across releases to ensure artifacts remain current.