Overview
Estimated Salary
8.000.000 - 12.000.000
Industry
Job Type
Hybrid
Work Duration
12 Months
Tags
Description
About the Role
We are looking for a DevOps / Site Reliability Engineer (SRE) to join our Managed Services team. In this role, you will support clients with large-scale infrastructure environments, ensuring system reliability, availability, and performance.
Your responsibilities will include managing virtual machine and containerized environments, performing disaster recovery activities, maintaining operating systems and applications, and troubleshooting infrastructure-related issues.
Requirements
Preferred Qualifications
Hands-on experience administering Linux servers in production environments (RHEL, Rocky Linux, Ubuntu Server).
Strong understanding of Linux storage management and troubleshooting.
Experience troubleshooting Linux boot issues, including GRUB, dracut rescue, and initramfs rebuild.
Ability to analyze and troubleshoot system logs using tools such as journalctl, dmesg, and /var/log.
Familiarity with Disaster Recovery (DR) concepts, including main site vs. DR site architecture, failover processes, RPO, and RTO.
Experience managing virtual machines, including provisioning, snapshots, lifecycle management, and troubleshooting.
Hands-on experience with on-premises private cloud, virtualization platforms, or public cloud environments.
Solid understanding of networking fundamentals, including IP addressing, routing, DNS, VLANs, and firewalls.
Familiarity with Kubernetes or Docker Swarm, including workload deployment and basic troubleshooting.
Experience working with CI/CD pipelines and executing existing deployment workflows.
Understanding of end-to-end application request flow (e.g., DNS → Proxy → Application → Database).
Strong communication and collaboration skills, with a positive attitude and sense of ownership.
Excellent analytical and problem-solving skills, with a willingness to learn and take on new challenges.
Nice to Have
Experience with enterprise backup solutions such as Acronis, Veeam, or similar tools.
Scripting and automation experience using Python, Bash, or Ansible.
Experience with monitoring and observability tools such as Prometheus, Grafana, or Zabbix.
Windows Server administration experience, including user management, RDP/WinRM, and event log analysis.
Basic database administration skills, including backup and restore operations for MySQL or PostgreSQL.
Understanding of web application architecture and the ability to diagnose application-related issues.
Experience performing application testing, validation, and integrity checks.
Key Competencies
Infrastructure Operations
Linux Administration
Virtualization & Cloud Platforms
Containerization & Orchestration
Disaster Recovery & Business Continuity
Monitoring & Observability
Automation & Scripting
Troubleshooting & Incident Response
Communication & Team Collaboration
How to Become a KAZOKKU IT Talent for This Project
Apply by registering
Selection process by KAZOKKU
Selection process by the end client
Contract signing and start of work for the client