Job Overview
Design, develop, and improve software that provides business, platform, and technology capabilities for our customers and colleagues. Deliver high-quality solutions using industry-aligned programming languages, frameworks, and tools while ensuring code is scalable, maintainable, and optimized for performance.
Responsibilities
- Collaborate cross-functionally with product managers, designers, and engineers to define requirements, devise solution strategies, and ensure seamless integration aligned with business objectives.
- Participate in code reviews, promote a culture of code quality and knowledge sharing, and adhere to secure coding practices to mitigate vulnerabilities.
- Implement effective unit testing practices to support proper code design, readability, and reliability.
- Stay informed of industry technology trends and innovations, actively contributing to the organisation's technology communities to foster technical excellence.
- Serve as a Site Reliability Engineer (SRE): define SLOs and error budgets, maintain uptime, support incident response for multi-layered distributed systems, and drive reliability improvements.
- Operate on cross-region cloud platforms (AWS, Azure, GCP), leveraging observability tools (Prometheus, CloudWatch) and automation frameworks to reduce toil.
- Use infrastructure-as-code tools (Terraform, CloudFormation) to build self-service platforms, tooling, and deployment pipelines for engineering teams.
- Plan capacity, tune performance, optimise cost, and manage production systems at significant scale.
- Ensure security through encryption, secrets management, and authentication and authorisation (OAuth, SAML, API key management).
- Automate monitoring and metrics collection with Python, employing modern testing approaches including chaos and performance testing.
- Mentor SREs, run post-mortems, and drive reliability improvements across engineering teams.
Qualifications
- Bachelor's degree in computer science or a related field with depth in systems engineering, networking, and distributed systems design.
- Experience with Kubernetes, service mesh technologies, and microservices reliability patterns in production environments.
- Knowledge of compliance requirements and experience working with audit/regulatory frameworks.
- A solid understanding of large language models (LLMs) and agentic frameworks.
Additional Skills and Experience
- Strong collaborative skills and the ability to lead cross-functional and multi-year assignments.
- Experience in incident response, root cause analysis, and driving continuous improvement.
- Strategic thinking and business acumen to influence functional and cross-functional areas of impact.
- Demonstrated ability to build and maintain trusting relationships with internal and external stakeholders.
Location
Glasgow / Knutsford