Job description
At GSK, we want to supercharge our data capability to better understand our patients and accelerate our ability to discover vaccines and medicines. The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step-change in our ability to leverage data, knowledge, and prediction to find new medicines.
Our Compute Platform Engineering team builds a first-in-class platform of toolchains and workflows that accelerate application development, scale up computational experiments, and integrate all computation with project metadata, logs, experiment configuration, and performance tracking over abstractions that encompass Cloud and High Performance Computing (HPC). This metadata-forward, CI/CD-driven platform represents and enables the entire application and analysis lifecycle, including interactive development and exploration (notebooks), large-scale batch processing, observability, and production application deployments.
Key Responsibilities
- Design, build, and operate tools, services, and workflows that deliver high value by solving key business problems.
- Develop key components of a hybrid on-prem/cloud compute platform for both interactive and scalable batch computing, and establish processes and workflows to transition existing HPC users and teams to this platform.
- Manage code-driven environments, applications, and container/image builds, as well as CI/CD-driven application deployments.
- Advise science users on scaling applications to petabytes (PBs) of data, drawing on a deep understanding of software engineering, algorithms, and the underlying hardware infrastructure.
- Optimize the design and execution of complex solutions within large-scale distributed computing environments.
- Produce well-engineered software, including automated test suites, technical documentation, and operational strategy.
- Ensure consistent application of platform abstractions to maintain quality and consistency with respect to logging and lineage.
- Adhere to coding best practices, participate in code reviews, and partner with colleagues to improve team standards.
- Follow the QMS framework and CI/CD best practices, guiding continual improvements.
Basic Qualifications
- Bachelor's degree in Data Engineering, Computer Science, Software Engineering or related field.
- 4+ years of professional experience.
- Experience with Python.
- Experience with Cloud.
- Experience with High Performance Computing (HPC).
Preferred Qualifications
- Knowledge and use of at least one common programming language: Python, Go, C++, Scala, Java, including toolchains for documentation, testing and operations/observability.
- Expertise in modern software development tools and ways of working (e.g., git/GitHub, DevOps tools, metrics, monitoring).
- Cloud expertise (AWS, Google Cloud, Azure), including infrastructure-as-code tools and scalable compute technologies such as Google Cloud Batch and Vertex AI.
- Experience with CI/CD implementations using git and a common CI/CD stack (Azure DevOps, Cloud Build, Jenkins, CircleCI, GitLab).
- Expertise with Docker, Kubernetes and the larger CNCF ecosystem, including Helm.
- Experience with low-level application build tools (make, CMake) and automated build systems such as Spack or EasyBuild.
- Experience in workflow orchestration with tools such as Argo Workflows, Airflow, Nextflow, Snakemake, VisTrails, or Cromwell.
- Experience with application performance tuning and optimization, including parallel and distributed computing paradigms and communication libraries such as MPI, OpenMP, and Gloo.
- Demonstrated excellence with agile software development environments using Jira and Confluence.
- Familiarity with tools, techniques, and optimizations in the high-performance applications space, including engagement with the open source community.
GSK is an Equal Opportunity Employer. This ensures that all qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information (including family medical history), military service or any basis prohibited under federal, state or local law.