[Remote] Site Reliability Engineer
Note: This is a remote position open to candidates in the USA.
Arctiq is a global, intelligence-driven technology services company delivering professional and managed services across various domains. The Site Reliability Engineer will implement and maintain reliability engineering practices for mission-critical government systems, bridging the gap between development and operations through automation and monitoring.
Responsibilities
- Implement and maintain dashboards and alerting rules using Prometheus, Grafana, or ELK Stack. Support the identification of Service Level Indicators (SLIs)
- Develop and maintain Infrastructure as Code (IaC) scripts using Terraform and Ansible to ensure repeatable, error-free deployments
- Maintain automated deployment pipelines, ensuring security scans and automated tests are integrated into the workflow
- Participate in on-call rotations and assist in troubleshooting system outages. Contribute to blameless post-mortem reports to drive continuous improvement
- Identify repetitive manual tasks and develop automation to reduce "toil," allowing the team to focus on high-value engineering
Skills
- 3–5 years of experience in SRE, DevOps, or Systems Engineering roles
- Proficiency in scripting languages (Python, Go, or Bash)
- Hands-on experience with containerization (Docker, Kubernetes) and cloud platforms (AWS, Azure, or GCP)
- Familiarity with NIST SP 800-53 security controls
- Education: Bachelor's degree in Computer Science or a related technical field
Company Overview