We’re seeking a Senior DevOps Engineer to lead our cloud-native transformation and platform engineering initiatives. You’ll architect and implement scalable AWS infrastructure, build modern CI/CD pipelines, and establish comprehensive observability practices. This role requires deep expertise in Kubernetes and cloud-native architecture, along with a passion for automation and operational excellence.
Key Responsibilities
Cloud Infrastructure & Architecture
- Design and implement highly available, scalable AWS infrastructure using IaC principles
- Architect and manage production-grade EKS clusters, including components such as Karpenter, Cluster Autoscaler, and the Amazon VPC CNI
- Implement multi-account AWS strategies using AWS Organizations, Control Tower, and Landing Zones
- Design and deploy secure network architectures (VPC, Transit Gateway, PrivateLink, VPN)
- Optimize AWS costs and implement FinOps practices for cloud resource management
- Lead infrastructure security reviews and implement AWS security best practices
- Implement disaster recovery and business continuity strategies
Kubernetes & Container Orchestration
- Design and manage enterprise-grade Kubernetes clusters across multiple environments
- Implement GitOps workflows using ArgoCD, FluxCD, or similar tools
- Develop and maintain Helm charts for application deployments
- Implement Pod Security Standards (via Pod Security Admission), network policies, and RBAC configurations
- Manage service mesh implementations (Istio, Linkerd, or AWS App Mesh)
- Optimize container resource utilization and implement autoscaling strategies (HPA, VPA, KEDA)
- Troubleshoot complex Kubernetes networking and storage issues
- Implement multi-tenancy and namespace isolation strategies
Modern CI/CD Pipelines
- Design and implement cloud-native CI/CD pipelines using modern tools (GitHub Actions, GitLab CI, AWS CodePipeline, Jenkins X, Tekton)
- Build progressive delivery strategies (canary deployments, blue-green, feature flags)
- Implement automated testing frameworks (unit, integration, E2E, performance)
- Integrate comprehensive security scanning throughout the pipeline (SAST, DAST, SCA, container scanning)
- Establish artifact management and promotion strategies
- Implement automated rollback and self-healing mechanisms
- Design build optimization strategies for faster deployment cycles
Observability & Monitoring
- Implement a comprehensive observability stack covering metrics, logs, traces, and events
- Deploy and manage Prometheus, Grafana, and Alertmanager for metrics monitoring
- Implement centralized logging with the ELK/EFK stack or Amazon CloudWatch Logs and Logs Insights
- Set up distributed tracing with Jaeger, Tempo, or AWS X-Ray
- Design custom dashboards and alerting rules for proactive monitoring
- Implement SLIs, SLOs, and error budgets for reliability engineering
- Build observability for infrastructure, applications, and business metrics
- Establish on-call rotations and incident response procedures
DevSecOps & Security
- Integrate security scanning tools into CI/CD pipelines (Trivy, Aqua, Snyk, Prisma Cloud)
- Implement secrets management using AWS Secrets Manager, HashiCorp Vault, or External Secrets Operator
- Conduct regular security audits and vulnerability assessments
- Implement compliance-as-code and policy enforcement (OPA, Kyverno, AWS Config)
- Manage SSL/TLS certificates and implement certificate automation
- Implement security monitoring and threat detection (GuardDuty, Security Hub)
- Ensure infrastructure meets HIPAA and other healthcare compliance requirements
Platform Engineering & Developer Experience
- Build and maintain internal developer platforms and golden paths
- Create self-service portals for infrastructure provisioning
- Develop and maintain platform documentation and runbooks
- Implement developer productivity tools and workflows
- Provide technical mentorship to junior DevOps engineers
- Conduct training sessions on cloud-native best practices
Required Qualifications
Technical Expertise
- AWS: 5+ years of hands-on AWS experience across multiple services (EC2, ECS, EKS, RDS, S3, Lambda, CloudFormation, etc.)
- Kubernetes: 3+ years of production Kubernetes experience, including cluster management and troubleshooting
- Infrastructure as Code: Expert-level Terraform or CloudFormation experience
- CI/CD: Deep experience with modern CI/CD tools and GitOps workflows
- Observability: Proven experience implementing full observability stacks
- Scripting: Strong programming skills in Python, Go, or a similar language for automation
- Containers: Deep Docker expertise and solid knowledge of container security best practices
Certifications (Preferred)
- AWS Certified Solutions Architect – Professional
- AWS Certified DevOps Engineer – Professional
- Certified Kubernetes Administrator (CKA)
- Certified Kubernetes Application Developer (CKAD)
- AWS Certified Security – Specialty (plus)
Professional Skills
- 6+ years of DevOps/SRE/Platform Engineering experience
- Strong understanding of microservices architectures and distributed systems
- Experience with high-availability and disaster recovery design
- Proven track record of leading technical initiatives and infrastructure migrations
- Excellent troubleshooting and performance optimization skills
- Strong communication skills and ability to explain complex technical concepts
- Experience working in Agile/Scrum environments
Nice to Have
- Experience with multi-cloud environments (Azure, GCP)
- Service mesh implementation experience
- Experience with Backstage or similar developer portals
- Knowledge of ML/AI infrastructure and MLOps
- Experience with chaos engineering practices
- Contribution to open-source projects
- Public speaking or technical blogging experience
- Previous experience in healthcare IT or regulated industries
- Experience with FHIR standards and healthcare interoperability
Healthcare Sector Requirements
- Understanding of HIPAA compliance and healthcare data security
- Experience with PHI (Protected Health Information) handling
- Knowledge of regulatory and compliance frameworks relevant to healthcare data (e.g., GDPR, HITRUST CSF)
- Experience building SOC 2 or ISO 27001 compliant infrastructure
- Understanding of healthcare application architecture patterns
What We Offer
- Opportunity to build cutting-edge cloud infrastructure for healthcare
- Work with modern tools and technologies
- Influence technical direction and architecture decisions
- Collaborative environment with skilled engineers
- Professional development budget for certifications and training
- Conference attendance opportunities
- Flexible work arrangements
- Competitive compensation package
Work Environment
- Fast-paced, innovative environment
- Focus on automation and operational excellence
- Emphasis on security and compliance
- Culture of continuous learning and improvement
- Cross-functional collaboration