Job Description
Icehouseventures is seeking a Staff Cloud Site Reliability Engineer to shape the reliability of large-scale AI systems and GPU compute infrastructure. This founding role involves building and scaling the reliability foundations of the AI cloud platform and keeping the underlying cloud infrastructure resilient. Responsibilities include operationalizing service level objectives (SLOs), improving incident response, and automating operations. The position is hybrid: collaboration in the London office is encouraged, and remote work is allowed.