Data Lineage & Governance Analyst

  • Boundaryless
  • 08/06/2026
Full time | Information Technology | Telecommunications | SQL | Oracle | Python

Job Description

Role Description
  • The Technical Analyst - Data Lineage will support a Data Governance, Controls, and Reporting program for a top-tier banking client.
  • Responsible for establishing and validating end-to-end lineage across critical datasets used in operational and regulatory reporting.
  • Translate governance and reporting requirements into actionable lineage deliverables (source-to-target mapping, lineage diagrams, metadata standards, and audit evidence).
  • Work with Data Platform, Data Engineering, Architecture, Risk, Compliance, and Security teams to define lineage standards, metadata capture, and control points.
  • Maintain lineage artifacts for Critical Data Elements (CDEs), key reports, and priority data products.
  • Support control design to ensure traceability from source systems → transformations → curated layers → consumption (dashboards/reports/APIs).
  • Actively participate from discovery workshops through to implementation and continuous improvement.
  • Ensure traceability from data definition → transformation logic → lineage evidence → audit readiness.
Location
  • The role supports one of our top-tier banking clients in London (Canary Wharf) and requires a minimum of three days on-site presence.
  • This is a permanent position based in the UK. We will only consider applicants who are eligible to work in the UK. We do NOT offer visa sponsorship for this role.
Experience Requirements & Qualifications
  • Minimum 3 years of relevant experience in data analytics, data quality, reporting controls, or data transformation programs (preferably in financial services).
Core Skills & Experience
  • Minimum 3 years of relevant experience in data governance, lineage, metadata management, or controls programs within finance/banking.
  • Strong understanding of data lineage concepts: technical lineage, business lineage, column-level lineage, impact analysis, and provenance.
  • Hands-on experience with data lineage / metadata tooling in enterprise environments (e.g., Collibra, Alation, Informatica EDC/IDMC, IBM InfoSphere, Microsoft Purview, Apache Atlas, Amundsen, DataHub, or similar).
  • Proven ability to build lineage for complex platforms: data lakes, warehouses, marts, and distributed processing (Spark-based pipelines).
  • Strong proficiency in SQL for tracing transformations and validating mappings across layers.
  • Working knowledge of ETL/ELT patterns, data modeling (dimensional + normalized), and batch scheduling dependencies.
  • Ability to interpret data transformation logic from pipelines (Spark SQL / PySpark / Hive queries / orchestration configs).
  • Strong documentation capability: source-to-target mappings, lineage diagrams, data dictionaries, metadata standards, and control evidence packs.
Technical Skills
  • Strong proficiency in Python (data analysis/automation for metadata extraction, validation scripts, rule checks).
  • Hands-on experience with PySpark and Spark SQL in production environments.
  • Solid knowledge of Hive, Impala, HDFS, and Parquet.
  • Advanced SQL skills; experience with Oracle databases is preferred.
  • Working knowledge of AutoSys and Apache Airflow.
  • Experience with CI/CD tools (Git, Harness, UrbanCode Deploy (UCD), Red Hat OpenShift).
  • Familiarity with AWS S3 for large-scale data storage.
  • Exposure to Tableau (understanding data sources, extracts, dependencies) is a plus.
Nice-to-Have
  • Experience with regulatory reporting data domains (risk, liquidity, capital, finance) and BCBS 239 alignment.
  • Knowledge of data governance operating models: CDEs, data ownership, stewardship, data quality dimensions.
  • Experience creating audit-ready documentation and participating in audit walkthroughs.
  • Experience working in Agile/Scrum delivery models.
  • Familiarity with monitoring and alerting tools for data pipelines.
Key Responsibilities
  • Conduct discovery workshops to identify priority reports, data products, and Critical Data Elements (CDEs).
  • Build and maintain end-to-end lineage across systems, including column-level mappings where required.
  • Produce and maintain Source-to-Target Mapping (STTM) documentation and metadata standards.
  • Validate lineage accuracy by tracing logic through SQL/Spark transformations and pipeline configurations.
  • Support impact analysis for proposed changes (upstream/downstream dependencies, report impact, control impact).
  • Partner with engineers and platform teams to improve metadata capture and lineage automation (where possible).
  • Define lineage-related control points and produce audit-ready evidence (diagrams, mappings, query proofs, run evidence).
  • Support UAT by validating that reported numbers can be traced and explained back to trusted sources.
  • Maintain the lineage backlog and track changes across releases to ensure artifacts remain current.