DevOps Engineer / SRE

Overview

Estimated Salary

8.000.000 - 12.000.000

Industry

Job Type

Hybrid

Work Duration

12 Months

Description

About the Role

We are looking for a DevOps / Site Reliability Engineer (SRE) to join our Managed Services team. In this role, you will support clients with large-scale infrastructure environments, ensuring system reliability, availability, and performance.

Your responsibilities will include managing virtual machine and containerized environments, performing disaster recovery activities, maintaining operating systems and applications, and troubleshooting infrastructure-related issues.

Requirements

Preferred Qualifications

Hands-on experience administering Linux servers in production environments (RHEL, Rocky Linux, Ubuntu Server).
Strong understanding of Linux storage management and troubleshooting.
Experience troubleshooting Linux boot issues, including GRUB problems, dracut rescue, and initramfs rebuilds.
Ability to analyze and troubleshoot system logs using tools such as journalctl and dmesg, as well as the log files under /var/log.
Familiarity with Disaster Recovery (DR) concepts, including main site vs. DR site architecture, failover processes, RPO, and RTO.
Experience managing virtual machines, including provisioning, snapshots, lifecycle management, and troubleshooting.
Hands-on experience with on-premises private cloud, virtualization platforms, or public cloud environments.
Solid understanding of networking fundamentals, including IP addressing, routing, DNS, VLANs, and firewalls.
Familiarity with Kubernetes or Docker Swarm, including workload deployment and basic troubleshooting.
Experience working with CI/CD pipelines and executing existing deployment workflows.
Understanding of end-to-end application request flow (e.g., DNS → Proxy → Application → Database).
Strong communication and collaboration skills, with a positive attitude and sense of ownership.
Excellent analytical and problem-solving skills, with a willingness to learn and take on new challenges.

Nice to Have

Experience with enterprise backup solutions such as Acronis, Veeam, or similar tools.
Scripting and automation experience using Python, Bash, or Ansible.
Experience with monitoring and observability tools such as Prometheus, Grafana, or Zabbix.
Windows Server administration experience, including user management, RDP/WinRM, and event log analysis.
Basic database administration skills, including backup and restore operations for MySQL or PostgreSQL.
Understanding of web application architecture and the ability to diagnose application-related issues.
Experience performing application testing, validation, and integrity checks.