About Me
Hello 👋 I’m Dhan V Sagar.
I live in Waterloo, Ontario, Canada.
Accomplished Cloud and Site Reliability Engineer with over 10 years of experience in Cloud Native technologies, DevOps, and Observability. Expertise in infrastructure automation, Kubernetes administration, and cloud operations on AWS using Terraform and Helm. Proficient in CI/CD, GitOps practices, and monitoring solutions such as Prometheus, OpenTelemetry, Grafana, and the LGTM stack. Passionate about delivering scalable, reliable, and efficient cloud platforms.
I also had the chance to work on IPTV/OTT video products at my previous company, Nokia Corporation.
In my free time, I enjoy capturing landscapes and telling stories through photography.
Experience
- Architected and implemented Metrics 2.0 solutions, enabling Observability as a Service using Prometheus, Grafana Mimir, and Grafana.
- Migrated 35M active metric series from legacy systems (Circonus, InfluxDB) to Prometheus/Mimir with zero downtime.
- Led the modernization of logging and observability systems for enhanced service reliability.
- Deployed and maintained scalable Kubernetes applications using Helm and Terraform.
- Automated CI/CD pipelines with GitOps tools like ArgoCD and Terraform Enterprise.
- Conducted incident management and post-mortem analysis to improve system reliability.
- Built proofs of concept (PoCs) for intelligent agents using LangChain and RAG (Retrieval-Augmented Generation), integrating internal documentation to enhance team productivity.
- Modernized infrastructure monitoring using OpenTelemetry (OTel).
- Explored and architected distributed tracing solutions with OpenTelemetry and Tempo.
- Designed logging use cases leveraging Apache Iceberg, Apache Pinot, and AWS Glue for advanced analytics.
- Built AWS environments (sandbox and production) and automated deployments using Terraform and Ansible.
- Streamlined Kubernetes lifecycle management via Helm charts and Jenkins-based CI/CD.
- Developed automated testing frameworks and containerized test environments using Docker and Kubernetes.
- Containerized multimedia applications using Docker and performed end-to-end validation on cloud-based systems.
- Automated performance analysis of video/audio traffic with Python scripting.
- Automated CI/CD pipelines using GitLab CI, Docker, and Python.
- Led network configuration and VPN integrations for production environments.
- Enhanced service reliability through functional and performance testing on Linux platforms.
- Performed end-to-end integration and validation of IPTV/OTT products.
Education
Amrita University
M.Tech Computer Science
2013 - 2015
Graduate Coursework:
Machine Learning; Agent-Based Intelligent Systems; Statistics; Soft Computing; Computational Theory.
Undergraduate Coursework:
Data Structures & Algorithms; Operating Systems; Computer Networks; Software Engineering Methods.
Research
2013 - 2015
- Best Paper Award, IEEE-ICCIC (2014): Author of the research paper titled “Random Forest and Change Point Detection for Root Cause Localization in Large Scale Systems.”
- Research Paper in Springer (2015): Co-author of the paper titled “Forecasting the Stability of Data Center Based on Real Time Data of Batch Workload Using Time Series Models.”