Job Title:
OpenShift Platform Lead - Virtualization Services
Job Summary
We are seeking an experienced OpenShift Platform Lead to own and manage our OpenShift-based virtualization platform, which delivers enterprise VM hosting services. This role is responsible for the complete lifecycle management of the platform, including design, architecture, business-as-usual (BAU) operations, patching, upgrades, incident response, and platform stability improvements.
You will lead the platform implementation, work closely with SRE and operations teams, and enable seamless VM migration from legacy infrastructure. This is a hands-on technical leadership role that requires deep OpenShift expertise and the ability to balance operational excellence with strategic platform evolution.
Key Responsibilities
Platform Leadership & Strategy
- Own the technical strategy and roadmap for the OpenShift Virtualization platform
- Define platform architecture, design patterns, and technical standards
- Lead platform lifecycle management, including minor-version and z-stream (patch) upgrades and Red Hat CoreOS updates
- Drive platform stability improvements and performance optimization initiatives
- Establish platform governance, compliance, and security policies
- Build relationships with Red Hat Support and make effective use of the Red Hat Technical Account Manager (TAM)
Lifecycle & Operations Management
- Manage the complete platform lifecycle, from installation through upgrades to decommissioning
- Plan and execute OpenShift platform upgrades (4.x releases) with zero/minimal downtime
- Coordinate quarterly/monthly Red Hat CoreOS (RHCOS) patching cycles
- Oversee OpenShift Virtualization operator upgrades and feature enablement
- Maintain platform health through proactive monitoring and capacity planning
- Ensure platform meets defined SLAs and availability targets (99.9%+)
- Lead Major Incident response for platform-level issues (Sev 1/2)
- Perform root cause analysis (RCA) and implement preventive measures
- Collaborate with SRE team on incident postmortems and improvement plans
- Manage planned platform events, including maintenance windows
- Coordinate emergency changes and rollback procedures
- Participate in the on-call rotation for critical platform escalations
Change Implementation & Release Management
- Review and approve platform changes through Change Advisory Board (CAB)
- Plan and execute complex platform changes with risk assessment
- Implement infrastructure as code (IaC) practices using Ansible and Terraform
- Drive GitOps adoption for platform configuration management
- Coordinate release windows for platform updates with business stakeholders
- Ensure change documentation and runbook accuracy
VM Services & Migration
- Lead VM migration strategy from VMware/legacy platforms to OpenShift Virtualization
- Design VM migration runbooks and automation workflows
- Create and maintain VM templates, golden images, and standardized configurations
- Enable application teams to provision VMs through self-service
- Troubleshoot VM performance, networking, and storage issues
- Optimize VM placement, resource allocation, and cluster balancing
Platform Stability & Performance
- Define and monitor key performance indicators (KPIs) for platform health
- Tune OpenShift control plane and worker node performance
- Optimize storage performance (ODF/Ceph) for VM workloads
- Configure network policies and OVN-Kubernetes for optimal VM networking
- Drive continuous improvement initiatives based on operational metrics
Required Qualifications
Must-Have Skills & Experience
Experience Requirements
- 8-12 years of overall IT infrastructure experience
- 5+ years of hands-on experience with Red Hat OpenShift Container Platform (4.x)
- 3+ years of experience with OpenShift Virtualization (KubeVirt) or similar VM hosting platforms
- 3+ years of experience in platform/infrastructure leadership roles
- 2+ years of experience with Red Hat Enterprise Linux (RHEL 7/8/9) and Red Hat CoreOS (RHCOS)
Technical Skills
- Advanced OpenShift Virtualization knowledge (VMs, DataVolumes, CDI, live migration)
- Advanced Red Hat CoreOS and Machine Config Operator (MCO) experience
- Advanced Linux administration and troubleshooting (RHEL-based)
- Advanced storage management (ODF/Ceph, Storage Classes, PV/PVC, CSI drivers)
- Advanced networking (OVN-Kubernetes, Multus, Network Policies, SDN concepts)
- Advanced automation skills (Ansible, Bash scripting, Python)
- Intermediate Infrastructure as Code skills (Terraform, GitOps tools such as Argo CD/Flux)
- Intermediate experience with observability platforms (Prometheus, Grafana, Alertmanager)
Platform Operations
- Proven experience managing platform lifecycle (installation, upgrades, patching)
- Strong incident management and major incident response experience
- Experience with change management processes and release coordination
- Demonstrated ability to perform root cause analysis and implement preventive measures
- Experience with capacity planning and performance tuning
- Track record of driving platform stability improvements
Certifications Required (one or more)
- Red Hat Certified Specialist in OpenShift Administration
Highly Desirable
- Red Hat Certified Architect (RHCA) certification
- Red Hat Certified Specialist in OpenShift Virtualization
- Experience with Red Hat Advanced Cluster Management (RHACM)
- Experience with Red Hat Advanced Cluster Security (RHACS/StackRox)
- Experience with OpenShift on multiple infrastructures (bare metal, VMware, AWS, Azure)
Nice to Have
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)
- Experience with multi-tenancy and namespace isolation strategies
- Knowledge of compliance frameworks (PCI DSS, HIPAA, SOC 2, ISO 27001)
- Experience with backup solutions (Kasten K10, Veeam, Commvault)
- Programming skills in Go, Python, or Java
- Experience with hybrid/multi-cloud architectures
Key Success Metrics
- Successful upgrade completion rate: 100% with zero unplanned rollbacks
- Incident MTTR:
- VM migration velocity: Target VMs per month with
- Platform capacity utilization: % optimal range
- Change success rate: >98% first-time success
Working Conditions
- Some evening and weekend work is required for maintenance windows