We are recruiting an SC-cleared Enterprise Observability Consultant for a leading IT service provider based in Hatfield.
We are looking for an Enterprise Observability Consultant with an in-depth understanding of observability platforms and technologies, ranging from vendor-specific products (e.g. Dynatrace, Splunk, Grafana, Cribl) to open-source observability projects (e.g. OpenTelemetry, Prometheus, Grafana OSS).
You will provide observability platform expertise across advisory, design and implementation services that meet our customers' business requirements within their overall observability strategy. Working within the Enterprise Observability Practice, you will also be expected to stay at the forefront of new technologies and vendors.
Main Responsibilities
- Observability Strategy & Advisory
- Lead discovery workshops to assess observability maturity and define tailored roadmaps aligned to business and IT objectives
- Assess the current monitoring and observability maturity of enterprise organisations and recommend tooling strategies, often leveraging platforms such as Dynatrace for full-stack visibility
- Translate business and technical requirements into actionable observability use cases to support change management and enablement initiatives
- Advise on tools, platforms and best practices (e.g. OpenTelemetry, SIEM vs observability, telemetry management, SRE principles)
- Architecture & Solution Design
- Design end-to-end observability architectures covering logs, metrics, traces and profiles; distributed tracing frameworks and APM tooling; infrastructure and cloud monitoring; and synthetic and real-user monitoring
- Create telemetry data pipelines and instrumentation strategies
- Ensure scalable, secure, and cost-efficient observability patterns
- Tooling Implementation
- Deploy and configure observability platforms such as Dynatrace, Splunk, Grafana Cloud, Cribl and Elastic
- Implement OpenTelemetry collectors, agents, and SDK instrumentation strategies
- Build dashboards, alerts, and automation workflows
- Integrate observability platforms with ITSM, AIOps and event management platforms
- Troubleshooting & Performance Engineering
- Analyse application, infrastructure, and network performance issues.
- Lead root-cause analysis and performance optimisation initiatives.
- Enable proactive detection through anomaly detection and alert tuning.
Skills Required
- Expertise in observability frameworks, telemetry pipelines, and service mesh integrations.
- Deep understanding of observability pillars: metrics, logs, traces, and user experience.
- Expert-level familiarity with products such as Dynatrace, Splunk, Grafana Cloud and Cribl (experience with at least two product sets).
- Strong understanding of observability platform architecture, including telemetry storage, OpenTelemetry support and cloud integrations.
- Experience with Dynatrace/Splunk/Grafana APIs, tagging strategies, and problem detection workflows.
- Proficiency in scripting (Python, Bash) and automation tools (Terraform, Ansible).
- Strong stakeholder engagement and communication skills.
- 10+ years' experience in consulting, enterprise design and implementation roles.
Desirable
- Professional-level certifications in observability products, or the OpenTelemetry Certified Associate or Prometheus Certified Associate certifications
- Familiarity with DevOps and platform engineering ways of working and their associated tools (e.g. CI/CD, Git, automation)
- Working-level understanding of cloud and cloud-native observability technologies (e.g. AWS CloudWatch, Azure Monitor, eBPF, Prometheus)
- Good understanding of the networking principles behind observability protocols (e.g. Syslog, SNMP, OTLP)
- Experience integrating Observability platforms with ITSM and alerting platforms
- Cloud/CNCF certifications