Site Reliability Engineer – Operations

RealPage

  • Manila City, Metro Manila; Pasig City, Metro Manila
  • Permanent
  • Full-time
  • 1 day ago

Overview

The SRE Ops Engineer reports to the Sr. Director of Reliability Engineering and is responsible for ensuring product stability, operational excellence, and a strong customer experience across critical platforms, with a primary focus on Windows-based environments. This role partners closely with Engineering, CloudOps, InfoSec, and QA to reduce incidents, improve system reliability, and drive operational rigor through automation, monitoring, and incident management.

Responsibilities

Primary Responsibilities
  • Manage and support Windows-based production environments, including IIS, Windows Services, Active Directory, and related infrastructure
  • Build, maintain, and enhance monitoring, alerting, and observability frameworks using ELK or equivalent platforms
  • Lead incident response, troubleshooting, and root cause analysis (RCA) for customer-impacting issues
  • Improve system reliability by reducing critical incidents and driving down Mean Time to Resolution (MTTR)
  • Develop and maintain automation using scripting tools such as PowerShell, Python, or similar technologies
  • Support high-availability, high-performance production systems and participate in on-call rotations
  • Collaborate with cross-functional teams to ensure platform stability, security, and reliability
  • Contribute to platform upgrades, patching, modernization initiatives, and operational best practices
  • Create and maintain runbooks, operational standards, and documentation

Qualifications

Required Knowledge & Skills
  • 5+ years of experience in Windows Server environments, including IIS and Windows Services
  • 5+ years of experience with monitoring and observability tools (ELK stack or equivalent)
  • Strong experience with incident management, troubleshooting, and root cause analysis
  • Hands-on experience with automation and scripting (PowerShell, Python, etc.)
  • Working knowledge of Linux systems for basic administration and troubleshooting
  • Strong understanding of system performance, scalability, and operational best practices
  • Experience supporting production systems with high availability requirements
  • Familiarity with cloud platforms (AWS, GCP, Azure) is a plus
  • Exposure to CI/CD tools and DevOps practices
  • Strong communication and collaboration skills, with an ownership mindset
  • Ability to operate effectively in a fast-paced, production-focused environment
