Browse IT Jobs | IT Job Board

VIQU IT City, London

Senior DevOps Engineer, 6-month contract, London/Remote, SC Clearance, Inside IR35.

My Financial Customer is looking for a Senior DevOps Engineer to join a growing technology team responsible for maintaining and evolving a complex on-premise platform. This role will play a key part in ensuring the reliability, performance, and continuous improvement of the organisation's technical estate while supporting the delivery of new services. This Senior DevOps Engineer is required to have experience with the following: Cloud Platform (Azure, AWS, or GCP), Kubernetes, IaC using Terraform and CI/CD pipelines.

Skills & Experience required from the Senior DevOps Engineer: Active SC Clearance. Cloud Platforms (Azure, AWS, or GCP). Strong experience supporting production environments, including on-call support and release management. Hands-on experience with Kubernetes or OpenShift in on-premise environments. Proven background in Linux system administration. Experience implementing Infrastructure as Code, ideally using Terraform, integrated with CI/CD pipelines such as Jenkins. Observability platforms including logging, monitoring, and alerting tools such as ELK, Splunk, Prometheus, or Grafana. Experience improving DevOps tooling and contributing to technology roadmaps. Strong knowledge of Agile methodologies and modern DevOps practices. Experience working in the financial industry would be beneficial.

Key Responsibilities of the Senior DevOps Engineer: Support, maintain and enhance the organisation's core platform and supporting infrastructure. Manage the promotion of code across environments, ensuring safe and controlled releases from pre-production through to live. Champion continuous improvement and modern DevOps practices, helping teams adopt automation-first approaches across build, deployment, and release processes. Provide technical mentorship and guidance to engineers, sharing best practices and supporting professional development within the team. Collaborate with architects and solution designers to align technical delivery with long-term product and technology roadmaps. Lead incident management activities, coordinating major incident responses and ensuring effective communication with stakeholders. Support a 24/7 production environment, including participation in an on-call rota and out-of-hours release support when required.

Apply now to speak with VIQU IT in confidence. Or reach out to Connor Smal via the VIQU IT website. Do you know someone great? We'll thank you with up to £1,000 if your referral is successful (terms apply). For more exciting roles and opportunities like this, please follow us on IT Recruitment.

31/03/2026

Contractor

Mobile QA Engineer (Manual)

Experis

My Client, a large global financial services brand, is looking for an experienced Mobile QA Engineer (Manual) on an initial contract basis. The role is Inside IR35 and Hybrid in London (3 days based onsite).

We're looking for a Mobile QA Engineer (Manual) to work on award-winning mobile applications that will be used by millions of customers worldwide. We want someone with strong technical skills and creativity. You should enjoy solving tough problems and working with new technologies. You should not be shy about sharing your ideas and should be obsessive about user experience and high quality. You'll be part of the Mobile Engineering team, whose mandate is to develop new products and platforms for customers. Mobile Engineering's aim is to build interactive experiences at all touch points of a consumer's journey, whether before, at, or after the time of purchase.

Responsibilities: Collaborate with Product, Design and Development teams to understand product requirements and create comprehensive test plans and test cases. Execute functional and automated tests to verify the accuracy, completeness, and reliability of functionality. Contribute to the development and enhancement of UI automated testing frameworks built on Espresso (Android) and XCUITest (iOS). Analyse requirements and determine technical feasibility for automation. Integrate automated tests into CI to identify issues during the development cycle. Contribute to PR reviews, submit PRs, and contribute to the goal of 100% regression automation readiness. Develop and maintain robust, scalable, reusable automated test scripts across applications. Identify, document, and track defects, working closely with development teams to ensure timely resolution and retesting. Improve QA delivery and quality through defining test strategy, process improvements, and coordination with multiple back-end teams. Work with the development team to define and implement mechanisms to inject testing earlier into the software development process via a mocking strategy. Prioritise competing demands, manage multiple concurrent tasks, and adapt to changing priorities. Participate in regression testing to validate that new enhancements don't negatively impact existing functionality. Continuously improve the QA process and contribute to the development of testing best practices.

Qualifications: Minimum of 7 years of technical experience with a bachelor's or master's degree in science (preferably Computer Science, Engineering, or other related disciplines). Must have hands-on testing experience on iOS and Android mobile platforms, leveraging various functional and automated tools. Minimum of 3 years of mobile app automation experience with tools like MonkeyTalk, Selendroid, Appium, Katalon etc. Deep knowledge of Functional, Integration, Regression, Exploratory, End to End, Compatibility, GUI, Web Services and Accessibility testing. Good understanding of Swift, Kotlin or a similar functional programming language. Strong programming abilities and debugging skills. Excellent API testing experience using Postman, IntelliJ HTTP Client, or similar tools. Strong experience with debugging tools like Charles Proxy, Splunk, Sentry, Console or similar. Excellent communicator and team player. Experience with full life cycle software deployment using Agile practices. Strong attention to detail and ability to work in a fast-paced environment.

02/10/2025

Contractor

Splunk Site Reliability Engineer

Flint UK Technology Services

Job Title: Splunk Site Reliability Engineer/Migration Specialist (Contract)
Location: Birmingham (Hybrid/On-site, required 3 days per week)
Contract Type: Contract
Duration: 3 months rolling

Job Summary: We are seeking an experienced Splunk SME/Migration Specialist to lead and support the migration of observability workloads from Splunk to Elasticsearch (ELK Stack). The ideal candidate will bring hands-on expertise in Splunk architecture, data ingestion, alerting, and dashboarding, along with experience migrating workloads to Elasticsearch. In addition to migration duties, the candidate will maintain and enhance existing Splunk infrastructure, provide incident support, manage upgrades, and ensure observability platforms remain secure and performant. This role demands a technically strong individual with excellent stakeholder communication and problem-solving skills.

Key Responsibilities:

Migration: Develop and implement a comprehensive migration strategy from Splunk to Elasticsearch (ELK Stack). Assess existing Splunk configurations (dashboards, alerts, saved searches, data models) and recreate them in Kibana. Collaborate with Elastic teams to configure alerting and monitoring using Kibana, Elasticsearch Watcher, or third-party tools. Ensure migration plans include validation, rollback procedures, and knowledge transfer.

Platform Operations & Incident Response: Maintain Splunk infrastructure in both Production and Non-Production environments. Support Splunk SRE and Application teams in incident investigation and resolution. Proactively monitor system health and performance metrics.

Upgrades and Change Management: Plan and execute upgrades to Splunk components. Perform pre- and post-upgrade checks and validations. Prepare documentation and submit Change Requests following organizational procedures.

Security and Compliance: Work with Puppet and other automation tools to ensure timely patching of vulnerabilities. Implement and verify security best practices for observability platforms. Support compliance initiatives and audits.

Documentation and Knowledge Sharing: Maintain accurate and up-to-date technical documentation, including architecture diagrams, configurations, procedures, and troubleshooting guides. Review and update support articles and take ownership of relevant assets. Support knowledge transfer across teams as needed.

Troubleshooting and Support: Identify and resolve issues in Splunk and ELK environments. Assist teams with Splunk-related queries and optimization efforts.

Skills and Qualifications:

Essential: Proven expertise with Splunk architecture, data ingestion, dashboarding, alerting, and administration. Experience migrating Splunk workloads to Elasticsearch (ELK Stack). Solid understanding of Kibana, Elasticsearch Watcher, and observability tooling. Proficiency in Linux/Unix systems and networking protocols. Hands-on experience with scripting (e.g., Python, Shell/Bash). Experience supporting or working alongside DevOps/SRE teams. Strong analytical, troubleshooting, and communication skills.

Desirable: Experience with containerized environments such as Docker or Kubernetes. Industry certifications such as Splunk Certified Power User/Admin/Architect. Knowledge of automation tools (e.g., Puppet, Ansible). Bachelor's degree in Computer Science, Information Systems, or related field.

Key Attributes: Independent and proactive problem-solver. Collaborative and able to work cross-functionally with infrastructure, security, and application teams. Able to work under pressure and prioritize tasks effectively. Strong communicator, both written and verbal.

04/09/2025

Contractor

Site Reliability Engineer

IT Jobs Southampton, Hampshire

Site Reliability Engineer. Southampton HQ, 2 times a week in office. Cloud, SaaS, AWS.

Please be advised Security Clearance is required for this position.

We are working alongside one of our longstanding clients in helping them recruit a Site Reliability Engineer. The company deliver cutting-edge enterprise software solutions across both cloud and on-premises environments, empowering organisations to enhance customer experiences, maintain regulatory compliance, and proactively fight fraud. The company are trusted by businesses worldwide to drive seamless, intelligent customer interactions.

In this role, you'll oversee the production environment by ensuring system availability and maintaining a comprehensive perspective on overall health. You'll develop tools and software to support and streamline the management of platform infrastructure and key applications. A major focus will be enhancing the dependability, performance, and delivery speed of our software products. You'll also be responsible for analysing and fine-tuning system performance to anticipate user demands and drive innovation. Additionally, you'll take the lead in providing operational support and technical oversight for several large-scale distributed applications.

How You'll Contribute: Monitor and interpret system and application metrics to fine-tune performance and troubleshoot issues effectively. Collaborate closely with developers to enhance service quality through thorough testing and structured release practices. Engage in architectural discussions, manage platform operations, and contribute to capacity forecasting. Design and implement automated solutions to build resilient, scalable systems. Maintain a strong focus on delivering new features while ensuring stability and adherence to service level goals.

You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet, or Chef. Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer, or similar credentials.

Do You Have What It Takes? 3-6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability. Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in Bash or PowerShell. Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints. Practical experience using infrastructure-as-code tools like CloudFormation or Terraform. In-depth knowledge of CI/CD principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI. Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture. Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch. Excellent analytical and troubleshooting abilities, especially within complex distributed systems. Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during critical outages.

Benefits: Life Insurance - 4 x Annual Salary. Private Medical Insurance. Employee Assistance Programme. Hybrid Working - 3 Days from Home. GP Online Assistance Portal. + Much More.

Please click the "Apply" button to state your interest in this position. Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.

01/06/2025

Site Reliability Engineer

IT Jobs Southampton, Hampshire

Site Reliability Engineer Southampton HQ - 2 Times a week in Office Cloud, SaaS, AWS, Please be advised Security Clearance is required for this position We are working alongside one of our longstanding clients in helping them recruit a Site Reliability Engineer. The company deliver cutting-edge enterprise software solutions across both cloud and on-premises environments, empowering organisations to enhance customer experiences, maintain regulatory compliance, and proactively fight fraud. The company are trusted by businesses worldwide to drive seamless, intelligent customer interactions. In this role, you'll oversee the production environment by ensuring system availability and maintaining a comprehensive perspective on overall health. You'll develop tools and software to support and streamline the management of platform infrastructure and key applications. A major focus will be enhancing the dependability, performance, and delivery speed of our software products. You'll also be responsible for analysing and fine-tuning system performance to anticipate user demands and drive innovation. Additionally, you'll take the lead in providing operational support and technical oversight for several large-scale distributed applications. 
How You'll Contribute: Monitor and interpret system and application metrics to fine-tune performance and troubleshoot issues effectively Collaborate closely with developers to enhance service quality through thorough testing and structured release practices Engage in architectural discussions, manage platform operations, and contribute to capacity forecasting Design and implement automated solutions to build resilient, scalable systems Maintain a strong focus on delivering new features while ensuring stability and adherence to service level goalsYou'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management platforms like Ansible, Puppet, or Chef Professional certifications in cloud DevOps, such as AWS Certified DevOps Engineer or Google Cloud Professional DevOps Engineer, or similar credentialsDo You Have What It Takes? 
3-6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in Bash or PowerShell Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints Practical experience using infrastructure-as-code tools like CloudFormation or Terraform In-depth knowledge of CI/CD principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless postmortems, including leading cross-functional teams through resolution and communication during critical outagesBenefits Life Insurance - 4 x Annual Salary Private Medical Insurance Employee Assistance Programme Hybrid Working - 3 Days from Home GP Online Assistance Portal. + Much MorePlease click the "Apply" button to state your interest in this position. Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy

01/06/2025


Senior Platform Operations Engineer

NBCUniversal

NBC Sports Next is where sports and technology intersect. We're a subdivision of NBC Sports and home to all NBCUniversal digital applications in sports and technology within our three groups: Youth & Recreational Sports; Golf; and Betting, Gaming & Emerging Media. At NBC Sports Next, we make playing sports better through innovative technology and immersive experiences for athletes, coaches, players and fans. We equip more than 30MM players, coaches, athletes, sports administrators and fans in 40 countries with more than 25 sports solution products, including SportsEngine, the largest youth sports club, league and team management platform; GolfNow, the leading online tee time marketplace and provider of golf course operations technology; GolfPass, the ultimate golf membership that connects golfers to exclusive content, tee time credits, coaching and tips; TeamUnify, swim team management services; GoMotion, sports and fitness business software solutions; and NBC Sports Edge, a leading platform for fantasy sports information and betting-focused tools. At NBC Sports Next we're fueled by our mission to innovate, create larger-than-life events and connect with sports fans through technology that provides the ultimate in immersive experiences. This role is part of our Youth & Recreational Sports group, comprised of technology platforms such as SportsEngine, GoMotion, TourneyMachine, and TeamUnify. We enable athletes, parents, coaches and team administrators in the youth and recreational space to manage their organizations, collect payments, share schedules, find programs to participate in and connect with other families. Additionally, NCSI enables leagues and organizations to properly screen and train coaches in an effort to keep kids safe. Come join us as we work together as one team to innovate and deliver what's Next.
Job Description
Based out of our Belfast offices or working remotely within the UK or Ireland, the Senior Platform Operations Engineer will be a key member of our Platform Operations Team, helping to build and support the core infrastructure of the SportsEngine Platform services and products through activities and key responsibilities that include:
• Contributing to efforts that ensure the continuous and smooth running of the SportsEngine platform while serving a large volume of traffic.
• Leveraging Amazon Web Services to build highly available services for the SportsEngine infrastructure platform built on top of EKS, RDS and EC2.
• Developing Infrastructure as Code using tools like Terraform.
• Helping to foster a culture of cooperation, coordination, and continuous learning within the Platform Operations Team and with other Product Development teams throughout SportsEngine.
• Working closely with the SportsEngine Cyber Security Team to maintain and improve the security of the SportsEngine Platform.
• Contributing to and using our GitHub Pull Request-centered development pipeline as we continuously deliver value to our customers.
• Using tools such as NewRelic, Splunk and Datadog to monitor the health of the SportsEngine platform.
• Being an advocate for quality code and engineering practices that enable Continuous Delivery.
• Participating in a sustainable on-call schedule.
Qualifications
• 5 or more years of experience in the field of Software Engineering, operating web applications in a Site Reliability Engineering, Web Operations, or Cloud Engineering capacity.
• A strong foundation in modern infrastructure practices and the ability to deploy and operate maintainable, scalable, secure infrastructure.
• Ability to write quality, modular, maintainable, secure, and testable infrastructure automation.
• A team-oriented attitude and seemingly endless intellectual curiosity.
• Excellent verbal and written communication skills.
Desired skills & experience
• AWS Experience
Experience in the following areas of AWS:
- EC2
- VPC: Subnets, Security Groups, NAT Gateways, Transit Gateways, ELB/ALB/NLB etc.
- IAM
- S3
- Managed data tiers: RDS/ElastiCache etc.
• Experience in production with:
- EKS
- OpsWorks
- Lambda
- DynamoDB
• Kubernetes
- Production experience of running services in Kubernetes
- Ability to take a VM-based application and migrate it to Kubernetes
• CI/CD
- Experience with CI/CD pipelines, assisting developers in delivering changes on a daily cadence
- Experience with TravisCI, Jenkins, GitLab CI, GitHub Actions or similar technologies
• Automation
- Ability to script automation in one of Ruby, Python, Go etc.
• Infrastructure as Code
- Terraform: ability to author Terraform at a proficient level, and to break out reusable, opinionated and standardized actions into Terraform modules
- Chef/Ansible
Additional Information
NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law. NBCUniversal will consider for employment qualified applicants with criminal histories in a manner consistent with relevant legal requirements, including the City of Los Angeles Fair Chance Initiative For Hiring Ordinance, where applicable. If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access as a result of your disability. You can request reasonable accommodations in the US by calling 1- and in the UK by calling .

24/09/2022

Full time


Platform Operations Engineer II

NBCUniversal

Company Description
NBC Sports Next is where sports and technology intersect. We're a subdivision of NBC Sports and home to all NBCUniversal digital applications in sports and technology within our three groups: Youth & Recreational Sports; Golf; and Betting, Gaming & Emerging Media. At NBC Sports Next, we make playing sports better through innovative technology and immersive experiences for athletes, coaches, players and fans. We equip more than 30MM players, coaches, athletes, sports administrators and fans in 40 countries with more than 25 sports solution products, including SportsEngine, the largest youth sports club, league and team management platform; GolfNow, the leading online tee time marketplace and provider of golf course operations technology; GolfPass, the ultimate golf membership that connects golfers to exclusive content, tee time credits, coaching and tips; TeamUnify, swim team management services; GoMotion, sports and fitness business software solutions; and NBC Sports Edge, a leading platform for fantasy sports information and betting-focused tools. At NBC Sports Next we're fueled by our mission to innovate, create larger-than-life events and connect with sports fans through technology that provides the ultimate in immersive experiences. This role is part of our Youth & Recreational Sports group, comprised of technology platforms such as SportsEngine, GoMotion, TourneyMachine, and TeamUnify. We enable athletes, parents, coaches and team administrators in the youth and recreational space to manage their organizations, collect payments, share schedules, find programs to participate in and connect with other families. Additionally, NCSI enables leagues and organizations to properly screen and train coaches in an effort to keep kids safe. Come join us as we work together as one team to innovate and deliver what's Next.
Job Description
Based out of our Belfast offices or working remotely within the UK or Ireland, the Platform Operations Engineer II will be a key member of our Platform Operations Team, helping to build and support the core infrastructure of the SportsEngine Platform services and products through activities and key responsibilities that include:
• Contributing to efforts that ensure the continuous and smooth running of the SportsEngine platform while serving a large volume of traffic.
• Leveraging Amazon Web Services to build highly available services for the SportsEngine infrastructure platform built on top of EKS, RDS and EC2.
• Developing Infrastructure as Code using tools like Terraform.
• Helping to foster a culture of cooperation, coordination, and continuous learning within the Platform Operations Team and with other Product Development teams throughout SportsEngine.
• Working closely with the SportsEngine Cyber Security Team to maintain and improve the security of the SportsEngine Platform.
• Contributing to and using our GitHub Pull Request-centered development pipeline as we continuously deliver value to our customers.
• Using tools such as NewRelic, Splunk and Datadog to monitor the health of the SportsEngine platform.
• Being an advocate for quality code and engineering practices that enable Continuous Delivery.
• Participating in a sustainable on-call schedule.
Qualifications
• 2 or more years of experience operating web applications in a Site Reliability Engineering, Web Operations, or Cloud Engineering capacity.
• A strong foundation in modern infrastructure practices and the ability to deploy and operate maintainable, scalable, secure infrastructure.
• A team-oriented attitude and seemingly endless intellectual curiosity.
• Excellent verbal and written communication skills.
Additional desirable skills & experience:
• Cloud Experience
- Experience with either AWS, GCP or Azure (AWS preferred)
- Deploying and managing public cloud-based applications
• Kubernetes
• CI/CD
- Experience operating a CI/CD pipeline
• Automation
- Ability to script automation in one of Ruby, Python, Go, Bash etc.
• Infrastructure as Code
- Some exposure to one or all of Terraform/Ansible/Chef
Additional Information
NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law. NBCUniversal will consider for employment qualified applicants with criminal histories in a manner consistent with relevant legal requirements, including the City of Los Angeles Fair Chance Initiative For Hiring Ordinance, where applicable. If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access as a result of your disability. You can request reasonable accommodations in the US by calling 1- and in the UK by calling .

24/09/2022

Full time


Site Reliability Engineer

Project Recruit

Site Reliability Engineer
Our client, a leading global supplier of IT services, requires a Site Reliability Engineer - Virtualisation SME based at their client's offices in London. You may be able to work some days remotely. This is a 1-year temporary contract to start ASAP.
Day rate: Competitive market rate
We are looking for a Site Reliability Engineer - Virtualisation SME with 10+ years of experience and excellent knowledge of VMware ESX and/or Nutanix HCI and of container orchestration platforms such as Docker and Kubernetes.
Key Responsibilities
Responsible for the reliability and efficiency of virtualisation infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil the OS and DB Platform Operations team must perform.
Responsible for writing software to make the virtualisation infrastructure self-managing and self-service.
Responsible for automation and continuous service improvement by developing Infrastructure as Code.
Responsible for elimination of manual, repetitive, automatable, tactical tasks that are devoid of value.
Responsible for availability, latency, performance, efficiency, change management, monitoring and capacity planning.
Responsible for improving system performance, making effective use of resources, distributing load and reducing latency.
Responsible for identifying SLOs (Service Level Objectives) that align the team to meet availability and latency objectives.
Responsible for developing proactive monitoring solutions that alert on symptoms and not just on outages.
Responsible for performing detailed root cause analyses (RCAs) on incidents and outages to prevent future occurrence.
Responsible for partnering with development teams to improve services via rigorous testing and release procedures.
Responsible for actively sharing knowledge and best practices across the organisation.
Responsible for identifying technical debt and partnering with application teams to build remediation plans.
Responsible for developing standard operational procedures and producing effective documentation.
Responsible for analysing workloads and devising suitable cloud migration strategies where appropriate.
Responsible for participating in on-call rotation, triaging and addressing production issues as they arise.
Responsible for performing the OS Platform Operations function as and when required.
Responsible for mentoring and developing less experienced SAs and SREs.
Responsible for identifying cost saving and optimisation opportunities within the customer business.
Responsible for building strong relationships across the customer functions and business areas, underpinned by trust and the core values of the customer.
Key Skills
Essential:
Excellent knowledge of VMware ESX and/or Nutanix HCI.
Excellent knowledge of Windows Server 2008/2012/2016/2019.
Excellent knowledge of Windows OS tuning utilities and commands.
Excellent knowledge of configuring Windows OS systems for optimal performance.
Excellent knowledge of Windows clustering and high-availability solutions.
Excellent knowledge of Microsoft Active Directory, LDAP and Kerberos.
Excellent knowledge of TCP/IP Networking Protocols.
Excellent knowledge of networking, storage, database and virtualization layers.
Excellent knowledge of container orchestration platforms such as Docker and Kubernetes.
Excellent knowledge of version control software such as GitHub and Subversion.
Excellent knowledge of configuration management software such as Chef, Puppet, Ansible, Terraform and SaltStack.
Excellent knowledge of "Infrastructure as Code" principles and practices.
Excellent knowledge of continuous integration (CI) and continuous delivery (CD) principles and practices.
Excellent knowledge of applications development using Agile and DevOps best practices.
Excellent knowledge of operating system security and auditing methods.
Excellent knowledge of security hardening principles in line with CIS industry benchmarks.
Excellent knowledge of data security governance and regulations such as GDPR and SOX.
Excellent knowledge of cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle.
Desirable:
Good working knowledge of Red Hat Enterprise Linux (6.x, 7.x, 8.x) and Solaris (10.x and 11.x).
Good working knowledge of Unix/Linux OS tuning utilities and commands.
Good working knowledge of Unix/Linux system internals and kernel tuning for optimal performance.
Good working knowledge of Red Hat Satellite.
Good working knowledge of anti-virus software such as McAfee and Sophos.
Good working knowledge of Ivanti LANDESK and Symantec Altiris.
Good working knowledge of ThinPrint and EquiTrack (Follow-Me Printing).
Good working knowledge of Rubrik.
Good working knowledge of EMC, HDS and Pure storage arrays.
Good working knowledge of Dell PowerEdge, IBM xSeries and Cisco UCS hardware.
Good working knowledge of EMC Networker, Data Domain and IBM Tivoli Storage Manager.
Good working knowledge of Infoblox DNS.
Good working knowledge of Icinga 2 and OpManager.
Good working knowledge of IBM Tivoli and Netcool.
Good working knowledge of GitHub, Subversion and TeamCity.
Good working knowledge of BMC Control-M.
Good working knowledge of CyberArk.
Good working knowledge of Splunk and IBM QRadar.
Good working knowledge of Qualys.
Good working knowledge of SharePoint, JIRA and Confluence.
Good working knowledge of ServiceNow and Serena Business Manager.
Candidate Specifications
Excellent communication and interpersonal skills
Ability to handle pressure during outages and systematically resolve issues
Excellent problem-solving skills
Results driven, with a strong sense of accountability
A proactive, motivated approach
The ability to operate with urgency and prioritise work accordingly
A structured and logical approach to work
Attention to detail and accuracy
Ability to perform well in a pressurised environment
Ability to manage constructive conflict effectively
The ability to manage large workloads and tight deadlines
Able to communicate complex technical concepts to non-technical persons at all levels

23/09/2022

Contractor


Senior Application Support / SRE

Deerfoot IT Resources Ltd

Senior Application Support / SRE
Hybrid Working: Mix of Home Working / London EMEA HQ
Permanent, Full Time

As a trusted and preferred recruitment partner to this leading global provider of cloud-based solutions to the global financial sector, we have been asked to assist in the hire of a Senior Application Support Engineer to take responsibility for the availability and reliability of services used by over 23,000 customers across 90 countries (including 22 of the world's top 25 banks). In this role you will ensure all services exceed availability targets, have in-depth monitoring and are proactively managed.

Already benefitting from a dominance in the North American finance industry, our client is expanding its London operations to better serve the UK and EU markets. This is an exciting time to join, and you will have the opportunity to split your time between remote working and their state-of-the-art EMEA HQ in London.

Your Job

*Service Reliability: Proactively identify risks to service and remediate them. Reduce risk from deployments through improved use of resilience and appropriate testing of releases pre- and post-deployment. Provide support and troubleshooting when service incidents occur. Improve time to recover from service-impacting incidents. Identify trends and root causes to reduce the volume of incidents.
*Automation: Identify and deliver on opportunities to use automation to increase efficiency, reduce toil and drive service availability. Use automation and orchestration techniques to provide repeatable solutions and reduce the risk of mis-operations.
*Observability: Monitor and ensure smooth operation of all production services. Identify gaps in coverage and improve observability of production services. Ensure appropriate events are generated for service failure or degradation scenarios. Respond to events and alerts in a timely manner, managing them through to resolution.
*Knowledge management: Continuously improve the knowledge of the Application Support team so that its members become subject matter experts on the Product and the technology that runs it. Collaborate with other teams to understand how underpinning services support the Products. Identify opportunities to share knowledge and decrease the time it takes to resolve customer-related incidents.

Tech Stacks

Platform and Database Tech: Linux, Cassandra, Kafka, ArangoDB
Containerisation/Virtualisation: Kubernetes/OpenShift, VMware
Instrumentation and Monitoring: Splunk, Zabbix, Prometheus, Grafana
Scripting: PowerShell, Python

Your Skills

*Experience as a Site Reliability Engineer, Application Support Engineer or similar, running highly available critical services (ideally SaaS)
*Scripting abilities in PowerShell / Python
*Understanding of networking, firewalls, protocols, databases and more
*Java debugging: ability to take and analyse thread dumps
*Experience with monitoring solutions
*Splunk experience: creating dashboards, events and analysis
*CI/CD delivery practices
*Troubleshooting connectivity issues: TCP/IP, DNS, Telnet, traceroute, tcpdump and analysis
*Awareness of load balancing technologies such as HAProxy, Nginx, F5
*Experience of collaboration technologies: email, archiving, instant messaging
*Exposure to supporting Voice / SMS tech is nice to have

Alongside a competitive salary, you will receive a benefits package which includes 25 Days Holiday (increases with service), Private Medical Cover, Bupa Dental Cover, Life Insurance, Income Protection, Secondment Opportunities to Global HQ in Vancouver, Pension Scheme (increases with service up to 7% employer contribution), Bonus Scheme (up to 8% dependent on revenues and team performance).

This role would be suitable for those who have held the following job roles: Site Reliability Engineer, Senior SRE, Site Availability Engineer, Application Support Engineer, Senior Site Reliability Engineer, Senior Application Support Engineer, Lead SRE, Lead Site Reliability Engineer, Lead Application Support.

Deerfoot IT Resources Ltd is one of the UK's leading IT Recruitment Agencies, trusted by many of the UK's leading employers. Established in 1997, we have over twenty years of experience as an IT recruitment specialist. We will never send your CV anywhere without your authorisation and only after you have seen the complete details on this opportunity. Deerfoot is acting as an employment agency in relation to this vacancy. Each time Deerfoot sends a CV to a recruiting client we donate £1 to The Born Free Foundation.

04/11/2021

Full time


9 jobs found
