Windows Site Reliability Engineer £110K Hybrid London

Overview:
A leading global investment bank is seeking a skilled Windows Site Reliability Engineer to join their London-based Technology team. This is a senior-level leadership role focused on driving platform reliability, automation strategy, and infrastructure modernisation across mission-critical banking systems. This is an excellent opportunity to join an investment banking powerhouse.

Role and Responsibilities:
- Lead Platform Engineering and SRE strategy across the organisation
- Build scalable, self-service, and resilient infrastructure platforms
- Drive Infrastructure as Code and CI/CD adoption, automating manual tasks
- Define SLOs and optimise system performance, latency, and scalability
- Lead root cause analysis and remediate technical debt
- Partner with development teams to improve testing, releases, and deployments
- Shape cloud and hybrid migration strategies
- Ensure regulatory, audit, and security compliance
- Act as senior escalation point for high-severity incidents
- Identify cost and efficiency improvements; deputise for senior leadership

Skills and Experience:
- Deep expertise in Microsoft Windows Server internals
- Advanced knowledge of Active Directory, DNS, DHCP, LDAP, and Kerberos
- Extensive experience tuning low-latency, enterprise-scale systems
- Strong background in SRE, DevOps, CI/CD, and Infrastructure as Code principles
- Advanced scripting and development capability (PowerShell, Python, C#)
- Clustering, high-availability, replication, and disaster recovery expertise
- Experience with Git, Terraform, Ansible, Jenkins, and TeamCity
- Strong understanding of security hardening (CIS) within regulated environments
- Deep knowledge of performance monitoring, system internals, and optimisation techniques
- Experience working within financial services or other regulated industries

Desirable:
- Kubernetes and Docker orchestration
- Major cloud platforms (Azure, AWS, GCP)
- Nutanix HCI and VMware ESX
- Enterprise databases (SQL Server, Oracle, Sybase ASE, MongoDB, Snowflake)
- Enterprise monitoring tools (Splunk, Netcool)

Package:
- Circa £110,000 + Excellent Package
- London / Hybrid, 3 days onsite
- Flexible working considered
12/03/2026
Full time
Senior DevOps Engineer 6-month contract London/Remote SC Clearance Inside IR35

My Financial Customer is looking for a Senior DevOps Engineer to join a growing technology team responsible for maintaining and evolving a complex on-premise platform. This role will play a key part in ensuring the reliability, performance, and continuous improvement of the organisation's technical estate while supporting the delivery of new services.

This Senior DevOps Engineer is required to have experience with the following: Cloud Platform (Azure, AWS, or GCP), Kubernetes, IaC using Terraform and CI/CD pipelines.

Skills & Experience required from the Senior DevOps Engineer:
- Active SC Clearance
- Cloud Platforms (Azure, AWS, or GCP)
- Strong experience supporting production environments, including on-call support and release management
- Hands-on experience with Kubernetes or OpenShift in on-premise environments
- Proven background in Linux system administration
- Experience implementing Infrastructure as Code, ideally using Terraform, integrated with CI/CD pipelines such as Jenkins
- Observability platforms including logging, monitoring, and alerting tools such as ELK, Splunk, Prometheus, or Grafana
- Experience improving DevOps tooling and contributing to technology roadmaps
- Strong knowledge of Agile methodologies and modern DevOps practices
- Experience working in the financial industry would be beneficial

Key Responsibilities of the Senior DevOps Engineer:
- Support, maintain and enhance the organisation's core platform and supporting infrastructure
- Manage the promotion of code across environments, ensuring safe and controlled releases from pre-production through to live
- Champion continuous improvement and modern DevOps practices, helping teams adopt automation-first approaches across build, deployment, and release processes
- Provide technical mentorship and guidance to engineers, sharing best practices and supporting professional development within the team
- Collaborate with architects and solution designers to align technical delivery with long-term product and technology roadmaps
- Lead incident management activities, coordinating major incident responses and ensuring effective communication with stakeholders
- Support a 24/7 production environment, including participation in an on-call rota and out-of-hours release support when required

Apply now to speak with VIQU IT in confidence. Or reach out to Connor Smal via the VIQU IT website. Do you know someone great? We'll thank you with up to £1,000 if your referral is successful (terms apply). For more exciting roles and opportunities like this, please follow us on IT Recruitment.
10/03/2026
Contractor
Site Reliability Engineer / SRE / Systems Engineer

A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you've worked in any of the following roles, we'd also like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer

SALARY: up to £70,000 per annum (depending on experience) + Benefits
LOCATION: Remote and Hybrid Working Options Available. You can either work remotely or, if you prefer, work hybrid between home and the office in Altrincham, Greater Manchester, North West England
JOB TYPE: Full-Time, Permanent

JOB OVERVIEW
We have a fantastic new job opportunity for a Site Reliability Engineer / Systems Engineer to join a growing technology team focused on delivering reliable, scalable and resilient platforms and services. As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments. This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.

APPLY TODAY
Ready to make your next career move? Apply Now for our Recruitment Team to review.

DUTIES
Your duties as the Site Reliability Engineer / Systems Engineer include:
- Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover
- System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services
- Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
- Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience
- Automation and Resilience: Supporting automation, incident response and continuous improvement practices
- New Service Support: Ensuring new products and features are operable, reliable and scalable from day one
- Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues
- Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports
- Incident Prioritisation: Balancing customer impact with long-term system health and stability
- Security and Compliance: Supporting compliance with security, availability and regulatory frameworks

CANDIDATE REQUIREMENTS
ESSENTIAL
- Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
- Experience supporting production services at scale within a DevOps or SRE environment
- Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
- Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
- Hands-on experience with containerisation and orchestration using Docker and Kubernetes
- Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
- Strong Linux administration skills with scripting capability in Bash, Python or similar
- Familiarity with CI/CD pipelines and source control tools such as GitHub Actions
- Understanding of security frameworks and operational resilience best practices

DESIRABLE
- Experience within ISP, MSP or telecommunications environments
- Familiarity with enterprise IT architectures including OSS and BSS systems
- Knowledge of information security frameworks such as ISO27001, NIST or GDPR
- Experience with infrastructure automation tools such as Terraform or Ansible

BENEFITS
- Smart casual dress code
- Free access to gym facilities
- Access to a financial wellbeing platform (on successful completion of probationary period)
- Access to an employee assistance programme, Virtual GP and Elderly Care support (on successful completion of probationary period)
- Access to cycle to work, childcare, and electric vehicle schemes after six months
- Brand new office with excellent transport links
- Supportive team culture, growth and career progression

HOW TO APPLY
To be considered for this job vacancy, please submit your CV to our Recruitment Team who will review your details. CVs of Job Applicants meeting this requirement will be submitted to our Client for consideration. By submitting your job application to us you are hereby giving us your express consent to submit your details to our Client for this purpose.

JOB REF: AWDO-P14376

Full-Time, Permanent Jobs, Careers and Vacancies. Find a new job and work in Altrincham, Greater Manchester, North West England. Multi-Job Board Advertising and CV Sourcing Recruitment Services provided by AWD online. AWD online specialise in sourcing candidates and advertising vacancies on multiple job boards for companies on a non-commission basis. AWD online operates as an employment agency.
10/03/2026
Full time
Senior Infrastructure & Directory Services Engineer - Defence Programme

Location: Caerphilly (with travel to Defence/Ministry of Defence sites as required)
Clearance: SC (must hold or be eligible); DV highly desirable
Package: £65,000 + Bonus + comprehensive benefits
Working Pattern: Hybrid (Mon & Fri optional WFH)

This is a fantastic opportunity for you, a Senior Infrastructure and Directory Services Engineer or similar, to join a long-established, financially robust technology provider with comprehensive experience delivering secure IT solutions across the UK public sector.

The Opportunity
A major Defence customer is undertaking an extensive programme to build a secure, on-premise, multi-domain virtualised infrastructure that underpins mission-critical and operational capabilities. As a Senior Infrastructure & Directory Services Engineer, you will design, implement, and support enterprise services across several secure domains - including Active Directory, identity services, Windows Server, DNS/DHCP, PKI, authentication platforms, and related Microsoft ecosystem components.

Key Responsibilities
- Architect, manage, and maintain multi-site Active Directory environments
- Administer DNS, DHCP, and PKI in secure operational environments
- Design and enforce Group Policy Objects (GPOs)
- Manage Web Application Proxy (WAP), RDS, and federation services
- Ensure alignment with MOD security standards
- Participate in design reviews and Change Advisory Boards (CAB)
- Work closely with security, cloud, networking, Linux, and virtualisation teams
- Monitor performance using Splunk, event logging, and PowerShell automation
- Automate tasks using PowerShell (essential) or Ansible (desirable)

Essential Skills & Experience
- Multi-site Active Directory support
- Windows Server, DNS, DHCP
- GPO design and troubleshooting
- PKI and certificate services
- Web Application Proxy & RDS
- Strong PowerShell scripting
- Working in a Defence or Government environment

Desirable Skills
- Defence sector experience
- VMware vSphere, vCenter, ESXi; awareness of NSX-T
- SIEM/monitoring experience
- Ansible or other automation tools
- Linux/Unix integration with AD
- Secure WAN technologies (BGP/MPLS/VPN)
- Existing SC/DV clearance

Qualifications
- Microsoft Identity & Access Administrator (or equivalent)
- ITIL v4 Foundation or higher
- Degree in Computer Science or equivalent experience

Personal Attributes
- Customer-focused and committed to service excellence
- Strong ownership and reliability
- Willingness to challenge established practices
- Driven, proactive, and collaborative

If you are a Senior Infrastructure and Directory Services Engineer or similar, with a background in Defence, then please send your CV to me today as this will go quickly.
04/03/2026
Full time
Network Technical Integration Lead

Location: Hybrid (60% onsite / 40% remote)
Duration: Contract through to November 2026
Rate: 550 - 604 per day - Inside IR35

Role Overview
We are seeking an experienced Network Technical Integration Lead to drive engineering execution, ensure quality delivery, and lead safe, controlled change into production environments. This is a senior technical leadership role focused on resilience, automation-first engineering, and end-to-end design accountability. You will coach engineers, embed SRE best practices, and ensure non-functional requirements (NFRs) are engineered into every solution.

Key Responsibilities
- End-to-End Design Ownership: Lead integration design including non-functional requirements, resilience patterns, integration contracts, and rollback strategies.
- Change & Release Engineering: Define and execute structured change rehearsals, smoke testing, soak testing, and production rollbacks.
- Automation & Infrastructure as Code: Conduct code reviews across IaC and automation frameworks (e.g., Ansible, Terraform, Git-based pipelines). Drive automation as the default engineering approach.
- Site Reliability Engineering (SRE) Practices: Implement error budgets, reduce toil, and enhance reliability engineering standards.
- Operational Excellence & Continuous Improvement: Lead defect analytics and remediation sprints. Improve change models and expand automation coverage. Foster a disciplined post-incident review (PIR) culture.

Essential Experience
- 8-10+ years' experience across network and security engineering (design, build, operate)
- Strong troubleshooting across multiple layers: Network, Identity, Endpoint, Proxy, SIEM / SOAR
- Hands-on automation experience (e.g., Ansible, Terraform, GitHub, Azure DevOps)
- Proven ITIL change leadership experience
- Major Incident Management (MIM) exposure

Desirable Certifications
- CCNP / CCIE
- Zscaler Professional
- Fortinet NSE 4+
- Splunk Admin / Enterprise Security
- ITIL 4 Managing Professional (or equivalent)

Success Metrics
- Reduced change failure rate
- Improved MTTR (Mean Time to Resolution)
- Increased automated test coverage
- Higher percentage of changes moved to standard/automated models
- Measurable reduction in repeat incidents

This role is ideal for a technically strong, automation-driven engineering leader who thrives in complex enterprise environments and is passionate about building resilient, production-ready network platforms.
23/02/2026
Contractor
CBSbutler Holdings Limited trading as CBSbutler
Knutsford, Cheshire
Role: Senior DevOps Integration Engineer
Location: Hybrid - 60% onsite / 40% remote - Cheshire
Contract Length: Until 30/11/2026
Rate: 600 to 630 per day

Role Overview
We're looking for a Senior DevOps Integration Engineer to take ownership of designing, automating, and integrating modern CI/CD pipelines, cloud infrastructure, and platform tooling. This is a hands-on technical role focused on building scalable, resilient DevOps solutions that improve deployment speed, platform reliability, and engineering efficiency across multiple teams. If you like solving complex automation problems and making pipelines purr rather than scream, you'll feel at home here.

Key Responsibilities
- Design, build, and maintain enterprise-scale CI/CD pipelines across multi-service environments
- Integrate build, test, security scanning, and deployment workflows
- Automate cloud infrastructure using Infrastructure as Code (Terraform preferred)
- Build and manage container platforms using Docker and Kubernetes (AKS/EKS/GKE)
- Implement monitoring, logging, and alerting integrations
- Collaborate with engineering, SRE, and security teams to embed DevOps best practices
- Troubleshoot pipeline failures, environment inconsistencies, and integration issues
- Drive continuous improvement, performance optimisation, and platform resilience
- Produce clear documentation, automation standards, and reusable tooling

Required Skills & Experience
- Strong hands-on experience building CI/CD pipelines (Azure DevOps, GitHub Actions, GitLab CI, Jenkins)
- Solid cloud engineering experience (Azure, AWS, or GCP)
- Deep Infrastructure as Code expertise (Terraform highly preferred)
- Kubernetes and container orchestration experience
- Strong scripting skills (Python, Bash, PowerShell)
- Good understanding of DevSecOps practices (SAST, DAST, secrets management, code scanning)
- Excellent troubleshooting and systems integration skills

Nice to Have
- Cloud / DevOps certifications (Azure DevOps Engineer, AWS DevOps Pro, CKA)
- Experience with API integration, service automation, or event-driven pipelines
- Familiarity with monitoring stacks (Prometheus, Grafana, ELK, Splunk)
- Experience with GitOps tooling (ArgoCD, Flux)
20/02/2026
Contractor
CBSbutler Holdings Limited trading as CBSbutler
City, Sheffield
Role Title: Lead Data Engineer
Location: Sheffield/hybrid (3 days on site)
Duration: 9 months
Rate: 430 per day inside IR35
We are seeking a Lead Data Engineering Consultant with proven experience in leading and developing data engineering platforms.
Experience required:
Extensive enterprise experience with Hadoop, Spark, and Splunk.
Proficiency in object-oriented and functional scripting, particularly in Python.
Skilled in handling raw, structured, semi-structured, and unstructured data (SQL and NoSQL).
Experience integrating large, disparate datasets using modern tools and frameworks.
Strong background in building and optimizing ETL/ELT data pipelines.
Familiarity with source control and implementing Continuous Integration, Delivery, and Deployment via CI/CD pipelines.
Experience supporting and collaborating with BI and Analytics teams in fast-paced environments.
Ability to pair program and work effectively with other engineers.
Excellent analytical and problem-solving abilities.
Knowledge of agile methodologies such as Scrum or Kanban is a plus.
Comfortable representing the team in standups and problem-solving sessions.
Capable of driving the creation of technical test plans and maintaining records, including unit and integration tests, within automated test environments to ensure high code quality.
Promote SRE (Site Reliability Engineering) culture by addressing challenges through data engineering.
Ensure service resilience, sustainability, and adherence to recovery time objectives for all delivered software solutions.
Soft Skills (Consultant):
Demonstrated ability and enthusiasm for enhancing team performance.
Strong active listening and effective communication skills.
Self-mastery, with a focus on positive mindsets and professional behaviours.
Maintains up-to-date expertise in current tools, technologies, and key areas such as cybersecurity, data privacy, consent, and data residency regulations.
Engages with industry groups and external vendors to represent and advance HSBC's interests and influence.
Takes accountability for ensuring control and compliance throughout the engineering process.
Champions innovation and the adoption of advanced technologies and best practices within the domain.
If you are interested in this role or wish to apply, please feel free to submit your CV.
19/02/2026
Contractor
Windows Site Reliability Engineer £110K Hybrid London
Overview:
A leading global investment bank is seeking a skilled Windows Site Reliability Engineer to join their London-based Technology team. This is a senior-level leadership role focused on driving platform reliability, automation strategy, and infrastructure modernisation across mission-critical banking systems. This is an excellent opportunity to join an investment banking powerhouse.
Role and Responsibilities:
Lead Platform Engineering and SRE strategy across the organisation
Build scalable, self-service, and resilient infrastructure platforms
Drive Infrastructure as Code and CI/CD adoption, automating manual tasks
Define SLOs and optimise system performance, latency, and scalability
Lead root cause analysis and remediate technical debt
Partner with development teams to improve testing, releases, and deployments
Shape cloud and hybrid migration strategies
Ensure regulatory, audit, and security compliance
Act as senior escalation point for high-severity incidents
Identify cost and efficiency improvements; deputise for senior leadership
Skills and Experience:
Deep expertise in Microsoft Windows Server internals
Advanced knowledge of Active Directory, DNS, DHCP, LDAP, and Kerberos
Extensive experience tuning low-latency, enterprise-scale systems
Strong background in SRE, DevOps, CI/CD, and Infrastructure as Code principles
Advanced scripting and development capability (PowerShell, Python, C#)
Clustering, high-availability, replication, and disaster recovery expertise
Experience with Git, Terraform, Ansible, Jenkins, and TeamCity
Strong understanding of security hardening (CIS) within regulated environments
Deep knowledge of performance monitoring, system internals, and optimisation techniques
Experience working within financial services or other regulated industries
Desirable:
Kubernetes and Docker orchestration
Major cloud platforms (Azure, AWS, GCP)
Nutanix HCI and VMware ESX
Enterprise databases (SQL Server, Oracle, Sybase ASE, MongoDB, Snowflake)
Enterprise monitoring tools (Splunk, Netcool)
Package
Circa £110,000 + Excellent Package
London / Hybrid x3 days onsite
Flexible working considered
Windows Site Reliability Engineer £110K Hybrid London
16/02/2026
Full time
My client, a large global financial services brand, is looking for an experienced Mobile QA Engineer (Manual) on an initial contract basis. The role is Inside IR35 and Hybrid in London (3 days based onsite).
We're looking for a Mobile QA Engineer (Manual) to work on award-winning mobile applications that will be used by millions of customers worldwide. We want someone with strong technical skills and creativity who enjoys solving tough problems and working with new technologies. You should not be shy about sharing your ideas and should be obsessive about user experience and high quality. You'll be part of the Mobile Engineering team, whose mandate is to develop new products and platforms for customers. Mobile Engineering's aim is to build interactive experiences at all touch points of a consumer's journey, whether before, at, or after the time of purchase.
Responsibilities
Collaborate with Product, Design and Development teams to understand product requirements and create comprehensive test plans and test cases.
Execute functional and automated tests to verify the accuracy, completeness, and reliability of functionality.
Contribute to the development and enhancement of UI automated testing frameworks built on Espresso (Android) and XCUITest (iOS).
Analyse requirements and determine technical feasibility for automation.
Integrate automated tests into CI to identify issues during the development cycle.
Contribute to PR reviews, submit PRs, and contribute to the goal of 100% regression automation readiness.
Develop and maintain robust, scalable, reusable automated test scripts across applications.
Identify, document, and track defects, working closely with development teams to ensure timely resolution and retesting.
Improve QA delivery and quality by defining test strategy, improving processes, and coordinating with multiple back-end teams.
Work with the development team to define and implement mechanisms to inject testing earlier into the software development process via a mocking strategy.
Prioritise competing demands, manage multiple concurrent tasks, and adapt to changing priorities.
Participate in regression testing to validate that new enhancements don't negatively impact existing functionality.
Continuously improve the QA process and contribute to the development of testing best practices.
Qualifications:
Minimum of 7 years of technical experience with a bachelor's or master's degree in science (preferably Computer Science, Engineering, or other related disciplines).
Must have hands-on testing experience on iOS and Android mobile platforms using various functional and automated tools.
Minimum of 3 years of mobile app automation experience with tools like MonkeyTalk, Selendroid, Appium, Katalon, etc.
Deep knowledge of Functional, Integration, Regression, Exploratory, End to End, Compatibility, GUI, Web Services and Accessibility testing.
Good understanding of Swift, Kotlin or a similar functional programming language.
Strong programming abilities and debugging skills.
Excellent API testing experience using Postman, IntelliJ HTTP Client, or similar tools.
Strong experience with debugging tools like Charles Proxy, Splunk, Sentry, Console or similar.
Excellent communication skills and a team player.
Experience with full life cycle software deployment using Agile practices.
Strong attention to detail and ability to work in a fast-paced environment.
02/10/2025
Contractor
Job Title: Splunk Site Reliability Engineer/Migration Specialist (Contract)
Location: Birmingham (Hybrid/On-site, required 3 days per week)
Contract Type: Contract
Duration: 3 months rolling
Job Summary:
We are seeking an experienced Splunk SME/Migration Specialist to lead and support the migration of observability workloads from Splunk to Elasticsearch (ELK Stack). The ideal candidate will bring hands-on expertise in Splunk architecture, data ingestion, alerting, and dashboarding, along with experience migrating workloads to Elasticsearch. In addition to migration duties, the candidate will maintain and enhance existing Splunk infrastructure, provide incident support, manage upgrades, and ensure observability platforms remain secure and performant. This role demands a technically strong individual with excellent stakeholder communication and problem-solving skills.
Key Responsibilities:
Migration:
Develop and implement a comprehensive migration strategy from Splunk to Elasticsearch (ELK Stack).
Assess existing Splunk configurations (dashboards, alerts, saved searches, data models) and recreate them in Kibana.
Collaborate with Elastic teams to configure alerting and monitoring using Kibana, Elasticsearch Watcher, or third-party tools.
Ensure migration plans include validation, rollback procedures, and knowledge transfer.
Platform Operations & Incident Response:
Maintain Splunk infrastructure in both Production and Non-Production environments.
Support Splunk SRE and Application teams in incident investigation and resolution.
Proactively monitor system health and performance metrics.
Upgrades and Change Management:
Plan and execute upgrades to Splunk components.
Perform pre- and post-upgrade checks and validations.
Prepare documentation and submit Change Requests following organizational procedures.
Security and Compliance:
Work with Puppet and other automation tools to ensure timely patching of vulnerabilities.
Implement and verify security best practices for observability platforms.
Support compliance initiatives and audits.
Documentation and Knowledge Sharing:
Maintain accurate and up-to-date technical documentation, including architecture diagrams, configurations, procedures, and troubleshooting guides.
Review and update support articles and take ownership of relevant assets.
Support knowledge transfer across teams as needed.
Troubleshooting and Support:
Identify and resolve issues in Splunk and ELK environments.
Assist teams with Splunk-related queries and optimization efforts.
Skills and Qualifications:
Essential:
Proven expertise with Splunk architecture, data ingestion, dashboarding, alerting, and administration.
Experience migrating Splunk workloads to Elasticsearch (ELK Stack).
Solid understanding of Kibana, Elasticsearch Watcher, and observability tooling.
Proficiency in Linux/Unix systems and networking protocols.
Hands-on experience with scripting (e.g., Python, Shell/Bash).
Experience supporting or working alongside DevOps/SRE teams.
Strong analytical, troubleshooting, and communication skills.
Desirable:
Experience with containerized environments such as Docker or Kubernetes.
Industry certifications such as Splunk Certified Power User/Admin/Architect.
Knowledge of automation tools (e.g., Puppet, Ansible).
Bachelor's degree in Computer Science, Information Systems, or related field.
Key Attributes:
Independent and proactive problem-solver.
Collaborative and able to work cross-functionally with infrastructure, security, and application teams.
Able to work under pressure and prioritize tasks effectively.
Strong communicator, both written and verbal.
04/09/2025
Contractor
Site Reliability Engineer
Southampton HQ - 2 Times a week in Office
Cloud, SaaS, AWS
Please be advised Security Clearance is required for this position
We are working alongside one of our longstanding clients in helping them recruit a Site Reliability Engineer. The company deliver cutting-edge enterprise software solutions across both cloud and on-premises environments, empowering organisations to enhance customer experiences, maintain regulatory compliance, and proactively fight fraud. The company are trusted by businesses worldwide to drive seamless, intelligent customer interactions.
In this role, you'll oversee the production environment by ensuring system availability and maintaining a comprehensive perspective on overall health. You'll develop tools and software to support and streamline the management of platform infrastructure and key applications. A major focus will be enhancing the dependability, performance, and delivery speed of our software products. You'll also be responsible for analysing and fine-tuning system performance to anticipate user demands and drive innovation. Additionally, you'll take the lead in providing operational support and technical oversight for several large-scale distributed applications.
How You'll Contribute:
Monitor and interpret system and application metrics to fine-tune performance and troubleshoot issues effectively
Collaborate closely with developers to enhance service quality through thorough testing and structured release practices
Engage in architectural discussions, manage platform operations, and contribute to capacity forecasting
Design and implement automated solutions to build resilient, scalable systems
Maintain a strong focus on delivering new features while ensuring stability and adherence to service level goals
You'll Stand Out If You Have:
Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo
Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck
Experience using configuration management platforms like Ansible, Puppet, or Chef
Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer, or similar credentials
Do You Have What It Takes?
3-6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability
Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in Bash or PowerShell
Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints
Practical experience using infrastructure-as-code tools like CloudFormation or Terraform
In-depth knowledge of CI/CD principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI
Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture
Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch
Excellent analytical and troubleshooting abilities, especially within complex distributed systems
Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during critical outages
Benefits
Life Insurance - 4 x Annual Salary
Private Medical Insurance
Employee Assistance Programme
Hybrid Working - 3 Days from Home
GP Online Assistance Portal.
+ Much More
Please click the "Apply" button to state your interest in this position.
Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy
01/06/2025
Site Reliability Engineer
Southampton HQ - 2 Times a week in Office
Cloud, SaaS, AWS,
Please be advised Security Clearance is required for this position
We are working alongside one of our longstanding clients in helping them recruit a Site Reliability Engineer. The company deliver cutting-edge enterprise software solutions across both cloud and on-premises environments, empowering organisations to enhance customer experiences, maintain regulatory compliance, and proactively fight fraud. The company are trusted by businesses worldwide to drive seamless, intelligent customer interactions.
In this role, you'll oversee the production environment by ensuring system availability and maintaining a comprehensive perspective on overall health. You'll develop tools and software to support and streamline the management of platform infrastructure and key applications. A major focus will be enhancing the dependability, performance, and delivery speed of our software products. You'll also be responsible for analysing and fine-tuning system performance to anticipate user demands and drive innovation. Additionally, you'll take the lead in providing operational support and technical oversight for several large-scale distributed applications.
How You'll Contribute:
Monitor and interpret system and application metrics to fine-tune performance and troubleshoot issues effectively
Collaborate closely with developers to enhance service quality through thorough testing and structured release practices
Engage in architectural discussions, manage platform operations, and contribute to capacity forecasting
Design and implement automated solutions to build resilient, scalable systems
Maintain a strong focus on delivering new features while ensuring stability and adherence to service level goalsYou'll Stand Out If You Have:
Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus
Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo
Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck
Experience using configuration management platforms like Ansible, Puppet, or Chef
Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer, or similar credentialsDo You Have What It Takes?
3-6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability
Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in Bash or PowerShell
Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints
Practical experience using infrastructure-as-code tools like CloudFormation or Terraform
In-depth knowledge of CI/CD principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI
Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture
Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch
Excellent analytical and troubleshooting abilities, especially within complex distributed systems
Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during critical outages
Benefits
Life Insurance - 4 x Annual Salary
Private Medical Insurance
Employee Assistance Programme
Hybrid Working - 3 Days from Home
GP Online Assistance Portal.
+ Much More
Please click the "Apply" button to state your interest in this position.
Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy
Site Reliability Engineer
Southampton HQ - 2 Times a week in Office
Cloud, SaaS, AWS
Please be advised Security Clearance is required for this position
01/06/2025
NBC Sports Next is where sports and technology intersect. We're a subdivision of NBC Sports and home to all NBCUniversal digital applications in sports and technology within our three groups: Youth & Recreational Sports; Golf; and Betting, Gaming & Emerging Media. At NBC Sports Next, we make playing sports better through innovative technology and immersive experiences for athletes, coaches, players and fans. We equip more than 30MM players, coaches, athletes, sports administrators and fans in 40 countries with more than 25 sports solution products, including SportsEngine, the largest youth sports club, league and team management platform; GolfNow, the leading online tee time marketplace and provider of golf course operations technology; GolfPass, the ultimate golf membership that connects golfers to exclusive content, tee time credits, coaching and tips; TeamUnify, swim team management services; GoMotion, sports and fitness business software solutions; and NBC Sports Edge, a leading platform for fantasy sports information and betting-focused tools. At NBC Sports Next, we're fueled by our mission to innovate, create larger-than-life events and connect with sports fans through technology that provides the ultimate in immersive experiences. This role is part of our Youth & Recreational Sports group, comprised of technology platforms such as SportsEngine, GoMotion, TourneyMachine, and TeamUnify. We enable athletes, parents, coaches and team administrators in the youth and recreational space to manage their organizations, collect payments, share schedules, find programs to participate in and connect with other families. Additionally, NCSI enables leagues and organizations to properly screen and train coaches in an effort to keep kids safe. Come join us as we work together as one team to innovate and deliver what's Next.
Job Description
Based out of our Belfast offices or working remotely within the UK or Ireland, the Senior Platform Operations Engineer will be a key member of our Platform Operations Team, helping to build and support the core infrastructure of the SportsEngine Platform services and products through activities and key responsibilities that include:
• Contributing to efforts that ensure the continuous and smooth running of the SportsEngine platform while serving a large volume of traffic.
• Leveraging Amazon Web Services to build highly available services for the SportsEngine infrastructure platform built on top of EKS, RDS and EC2.
• Developing Infrastructure as Code using tools like Terraform.
• Helping to foster a culture of cooperation, coordination, and continuous learning within the Platform Operations Team and with other Product Development teams throughout SportsEngine.
• Working closely with the SportsEngine Cyber Security Team to maintain and improve the security of the SportsEngine Platform.
• Contributing to and using our GitHub Pull Request-centered development pipeline as we continuously deliver value to our customers.
• Using tools such as New Relic, Splunk and Datadog to monitor the health of the SportsEngine platform.
• Being an advocate for quality code and engineering practices that enable Continuous Delivery.
• Participating in a sustainable on-call schedule.
Qualifications
• 5 or more years of experience in the field of Software Engineering, including operating web applications in a Site Reliability Engineering, Web Operations, or Cloud Engineering capacity.
• A strong foundation in modern infrastructure practices and the ability to deploy and operate maintainable, scalable, secure infrastructure.
• Ability to write quality, modular, maintainable, secure, and testable infrastructure automation.
• A team-oriented attitude and seemingly endless intellectual curiosity.
• Excellent verbal and written communication skills.
Desired skills & experience
• AWS Experience
Experience in the following areas of AWS:
- EC2
- VPC - Subnets, Security Groups, NAT Gateways, Transit Gateways, ELB/ALB/NLB etc.
- IAM
- S3
- Managed data tiers - RDS/ElastiCache etc.
• Experience in production with:
- EKS
- OpsWorks
- Lambda
- DynamoDB
• Kubernetes
- Production experience of running services in Kubernetes
- Ability to take a VM-based application and migrate it to Kubernetes
• CI/CD
- Experience with CI/CD pipelines, assisting developers in delivering changes on a daily cadence
- Experience with TravisCI, Jenkins, GitLab CI, GitHub Actions or similar technologies
• Automation
- Ability to script automation in one of Ruby, Python, Go etc.
• Infrastructure as Code
- Terraform: ability to author Terraform at a proficient level; ability to break out reusable, opinionated and standardized actions into Terraform modules
- Chef/Ansible
Additional Information
NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law. NBCUniversal will consider for employment qualified applicants with criminal histories in a manner consistent with relevant legal requirements, including the City of Los Angeles Fair Chance Initiative For Hiring Ordinance, where applicable. If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access as a result of your disability. You can request reasonable accommodations in the US by calling 1- and in the UK by calling .
24/09/2022
Full time
Company Description
NBC Sports Next is where sports and technology intersect. We're a subdivision of NBC Sports and home to all NBCUniversal digital applications in sports and technology within our three groups: Youth & Recreational Sports; Golf; and Betting, Gaming & Emerging Media. At NBC Sports Next, we make playing sports better through innovative technology and immersive experiences for athletes, coaches, players and fans. We equip more than 30MM players, coaches, athletes, sports administrators and fans in 40 countries with more than 25 sports solution products, including SportsEngine, the largest youth sports club, league and team management platform; GolfNow, the leading online tee time marketplace and provider of golf course operations technology; GolfPass, the ultimate golf membership that connects golfers to exclusive content, tee time credits, coaching and tips; TeamUnify, swim team management services; GoMotion, sports and fitness business software solutions; and NBC Sports Edge, a leading platform for fantasy sports information and betting-focused tools. At NBC Sports Next, we're fueled by our mission to innovate, create larger-than-life events and connect with sports fans through technology that provides the ultimate in immersive experiences. This role is part of our Youth & Recreational Sports group, comprised of technology platforms such as SportsEngine, GoMotion, TourneyMachine, and TeamUnify. We enable athletes, parents, coaches and team administrators in the youth and recreational space to manage their organizations, collect payments, share schedules, find programs to participate in and connect with other families. Additionally, NCSI enables leagues and organizations to properly screen and train coaches in an effort to keep kids safe. Come join us as we work together as one team to innovate and deliver what's Next.
Job Description
Based out of our Belfast offices or working remotely within the UK or Ireland, the Platform Operations Engineer II will be a key member of our Platform Operations Team, helping to build and support the core infrastructure of the SportsEngine Platform services and products through activities and key responsibilities that include:
• Contributing to efforts that ensure the continuous and smooth running of the SportsEngine platform while serving a large volume of traffic.
• Leveraging Amazon Web Services to build highly available services for the SportsEngine infrastructure platform built on top of EKS, RDS and EC2.
• Developing Infrastructure as Code using tools like Terraform.
• Helping to foster a culture of cooperation, coordination, and continuous learning within the Platform Operations Team and with other Product Development teams throughout SportsEngine.
• Working closely with the SportsEngine Cyber Security Team to maintain and improve the security of the SportsEngine Platform.
• Contributing to and using our GitHub Pull Request-centered development pipeline as we continuously deliver value to our customers.
• Using tools such as New Relic, Splunk and Datadog to monitor the health of the SportsEngine platform.
• Being an advocate for quality code and engineering practices that enable Continuous Delivery.
• Participating in a sustainable on-call schedule.
Qualifications
• 2 or more years of experience operating web applications in a Site Reliability Engineering, Web Operations, or Cloud Engineering capacity.
• A strong foundation in modern infrastructure practices and the ability to deploy and operate maintainable, scalable, secure infrastructure.
• A team-oriented attitude and seemingly endless intellectual curiosity.
• Excellent verbal and written communication skills.
Additional desirable skills & experience:
• Cloud Experience
- Experience with AWS, GCP or Azure (AWS preferred)
- Deploying and managing public cloud-based applications
• Kubernetes
• CI/CD
- Experience operating a CI/CD pipeline
• Automation
- Ability to script automation in one of Ruby, Python, Go, Bash etc.
• Infrastructure as Code
- Some exposure to one or all of Terraform/Ansible/Chef
Additional Information
NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law. NBCUniversal will consider for employment qualified applicants with criminal histories in a manner consistent with relevant legal requirements, including the City of Los Angeles Fair Chance Initiative For Hiring Ordinance, where applicable. If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access as a result of your disability. You can request reasonable accommodations in the US by calling 1- and in the UK by calling .
24/09/2022
Full time
Site Reliability Engineer

Our client, a leading global supplier of IT services, requires a Site Reliability Engineer - Virtualisation SME based at their client's offices in London. You may be able to work some days remotely. This is a 1 year temporary contract to start ASAP.

Day rate: Competitive market rate

We are looking for a Site Reliability Engineer - Virtualisation SME with 10+ years of experience and excellent knowledge of VMware ESX and/or Nutanix HCI and of container orchestration platforms such as Docker and Kubernetes.

Key Responsibilities
• Responsible for the reliability and efficiency of virtualisation infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil the OS and DB Platform Operations team must perform.
• Responsible for writing software to make the virtualisation infrastructure self-managing and self-service.
• Responsible for automation and continuous service improvement by developing Infrastructure as Code.
• Responsible for elimination of manual, repetitive, automatable, tactical tasks that are devoid of value.
• Responsible for availability, latency, performance, efficiency, change management, monitoring and capacity planning.
• Responsible for improving system performance, making effective use of resources, distributing load and reducing latency.
• Responsible for identifying SLOs (Service Level Objectives) that align the team to meet availability and latency objectives.
• Responsible for developing proactive monitoring solutions that alert on symptoms and not just on outages.
• Responsible for performing detailed root cause analysis (RCA) on incidents and outages to prevent future occurrence.
• Responsible for partnering with development teams to improve services via rigorous testing and release procedures.
• Responsible for actively sharing knowledge and best practices across the organisation.
• Responsible for identifying technical debt and partnering with application teams to build remediation plans.
• Responsible for developing standard operational procedures and producing effective documentation.
• Responsible for analysing workloads and devising suitable cloud migration strategies where appropriate.
• Responsible for participating in on-call rotation, triaging and addressing production issues as they arise.
• Responsible for performing the OS Platform Operations function as and when required.
• Responsible for mentoring and developing less experienced SAs and SREs.
• Responsible for identifying cost saving and optimisation opportunities within the customer business.
• Responsible for building strong relationships across the customer functions and business areas, underpinned by trust and the core values of the customer.

Key Skills

Essential:
• Excellent knowledge of VMware ESX and/or Nutanix HCI.
• Excellent knowledge of Windows Server 2008/2012/2016/2019.
• Excellent knowledge of Windows OS tuning utilities and commands.
• Excellent knowledge of configuring Windows OS systems for optimal performance.
• Excellent knowledge of Windows clustering and high-availability solutions.
• Excellent knowledge of Microsoft Active Directory, LDAP and Kerberos.
• Excellent knowledge of TCP/IP Networking Protocols.
• Excellent knowledge of networking, storage, database and virtualization layers.
• Excellent knowledge of container orchestration platforms such as Docker and Kubernetes.
• Excellent knowledge of version control software such as GitHub and Subversion.
• Excellent knowledge of configuration management software such as Chef, Puppet, Ansible, Terraform and SaltStack.
• Excellent knowledge of "Infrastructure as Code" principles and practices.
• Excellent knowledge of continuous integration (CI) and continuous delivery (CD) principles and practices.
• Excellent knowledge of application development using Agile and DevOps best practices.
• Excellent knowledge of operating system security and auditing methods.
• Excellent knowledge of security hardening principles in line with CIS industry benchmarks.
• Excellent knowledge of data security governance and regulations such as GDPR and SOX.
• Excellent knowledge of cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle.

Desirable:
• Good working knowledge of Red Hat Enterprise Linux (6.x, 7.x, 8.x) and Solaris (10.x and 11.x).
• Good working knowledge of Unix/Linux OS tuning utilities and commands.
• Good working knowledge of Unix/Linux system internals and Kernel tuning for optimal performance.
• Good working knowledge of Red Hat Satellite.
• Good working knowledge of Anti-Virus software such as McAfee and Sophos.
• Good working knowledge of Ivanti LANDESK and Symantec Altiris.
• Good working knowledge of ThinPrint and EquiTrack (Follow-Me Printing).
• Good working knowledge of Rubrik.
• Good working knowledge of EMC, HDS and Pure storage arrays.
• Good working knowledge of Dell PowerEdge, IBM xSeries and Cisco UCS hardware.
• Good working knowledge of EMC Networker, Data Domain and IBM Tivoli Storage Manager.
• Good working knowledge of Infoblox DNS.
• Good working knowledge of Icinga 2 and OpManager.
• Good working knowledge of IBM Tivoli and Netcool.
• Good working knowledge of GitHub, Subversion and TeamCity.
• Good working knowledge of BMC Control-M.
• Good working knowledge of CyberArk.
• Good working knowledge of Splunk and IBM QRadar.
• Good working knowledge of Qualys.
• Good working knowledge of SharePoint, JIRA and Confluence.
• Good working knowledge of ServiceNow and Serena Business Manager.

Candidate Specifications
• Excellent communication and interpersonal skills
• Ability to handle pressure during outages and systematically resolve issues
• Excellent problem-solving skills
• Results driven, with a strong sense of accountability
• A proactive, motivated approach
• The ability to operate with urgency and prioritise work accordingly
• A structured and logical approach to work
• Attention to detail and accuracy
• Ability to perform well in a pressurised environment
• Ability to manage constructive conflict effectively
• The ability to manage large workloads and tight deadlines
• Able to communicate complex technical concepts to non-technical persons at all levels
23/09/2022
Contractor
Senior Application Support / SRE
Hybrid Working: Mix of Home Working / London EMEA HQ
Permanent, Full Time

As a trusted and preferred recruitment partner to this leading global provider of cloud-based solutions to the global financial sector, we have been asked to assist in the hire of a Senior Application Support Engineer to take responsibility for the availability and reliability of services used by over 23,000 customers across 90 countries (including 22 of the world's top 25 banks). In this role you will ensure all services exceed availability targets, have in-depth monitoring and are proactively managed.

Already benefitting from a dominance in the North American finance industry, our client is expanding its London operations to better serve the UK and EU markets. This is an exciting time to join, and you will have the opportunity to work a mix of remotely and within their state-of-the-art EMEA HQ in London.

Your Job
* Service Reliability: Proactively identifying risks to service and remediating them. Reducing risk from deployments through improved use of resilience and appropriate testing of releases pre and post deployment. Providing support and troubleshooting when service incidents occur. Improving time to recover from service-impacting incidents. Identifying trends and root causes to reduce the volume of incidents.
* Automation: Identifying and delivering on opportunities to use automation to increase efficiency, reduce toil and drive service availability. Using automation and orchestration techniques to provide repeatable solutions and reduce the risk of mis-operations.
* Observability: Monitoring and ensuring smooth operation of all production services. Identifying gaps in coverage and improving observability of production services. Ensuring appropriate events are generated for service failure or degradation scenarios. Responding to events and alerts in a timely manner and managing them through to resolution.
* Knowledge management: Continuously improving the knowledge of the Application Support team to become subject matter experts on the Product and the technology that runs it. Collaborating with other teams to understand how underpinning services support the Products. Identifying opportunities to share knowledge and decrease the time it takes to resolve customer-related incidents.

Tech Stacks:
Platform and Database Tech: Linux, Cassandra, Kafka, ArangoDB
Containerisation/Virtualisation: Kubernetes/OpenShift, VMware
Instrumentation and Monitoring: Splunk, Zabbix, Prometheus, Grafana
Scripting: PowerShell, Python

Your Skills
* Experience as a Site Reliability Engineer, Application Support Engineer or similar running highly available critical services (ideally SaaS)
* Scripting abilities in PowerShell / Python
* Understanding of networking, firewalls, protocols, databases and more
* Java Debugging - ability to complete thread dumps and analysis
* Experience with monitoring solutions
* Splunk Experience - creating dashboards, events and analysis
* CI/CD Delivery Practices
* Troubleshooting connectivity issues: TCP/IP, DNS, Telnet, Trace Route, TCP dump and analysis
* Awareness of Load Balancing Technologies such as HAProxy, Nginx, F5
* Experience of collaboration technologies - email, archiving, instant messaging
* Exposure to supporting Voice / SMS Tech (nice to have)

Alongside a competitive salary, you will receive a benefits package which includes 25 Days Holiday (increases with service), Private Medical Cover, Bupa Dental Cover, Life Insurance, Income Protection, Secondment Opportunities to Global HQ in Vancouver, Pension Scheme (increases with service up to 7% employer contribution), Bonus Scheme (up to 8% dependent on revenues and team performance).

This role would be suitable for those who have held the following job roles: Site Reliability Engineer, Senior SRE, Site Availability Engineer, Application Support Engineer, Senior Site Reliability Engineer, Senior Application Support Engineer, Lead SRE, Lead Site Reliability Engineer, Lead Application Support.

Deerfoot IT Resources Ltd is one of the UK's leading IT Recruitment Agencies, trusted by many of the UK's leading employers. Established in 1997, we have over twenty years of experience as IT Recruitment Specialists. We will never send your CV anywhere without your authorisation and only after you have seen the complete details on this opportunity. Deerfoot is acting as an employment agency in relation to this vacancy. Each time Deerfoot sends a CV to a recruiting client we donate £1 to The Born Free Foundation.
04/11/2021
Full time