WHAT YOU'LL DO
- Develop dashboards that support our incident response capabilities
- Develop SLO-based alerts that allow the team to respond to issues quickly
- Review the logging setup so that engineers can easily follow the product flow
- Benchmark cloud disk performance and understand the tradeoffs between filesystems
- Review security group and network configurations to ensure that only the least privilege required is being requested
- Build features in the development codebase from time to time
- Review and update the application code's logging, metrics, tracing, auth, etc.
- Review infrastructure costs and look for potential savings
- Write public-facing Terraform and CloudFormation for AWS, GCP and Azure
- Work with product development teams to support their IAM needs

We operate on a hybrid working model, with three days per week (Tuesdays, Wednesdays and Thursdays) in our impressive Custom House Square offices and the remaining days working remotely.