
Platform Reliability Engineer
- Quezon City, Metro Manila
- Permanent
- Full-time
- Support our Product Line Engineering (PLE) organizations to develop resilient and highly scalable applications, working closely with our operations staff to support these applications while maintaining a strong focus on application reliability
- Champion, promote, and deploy the effective use of innovative application monitoring and AIOps-based machine learning tools
- Facilitate monitoring and custom instrumentation across our business-critical applications, including consulting on and managing agent-based, agentless, synthetic and scripted monitoring technologies
- Address incidents and problems within the platforms, with rotational accountability for on-call support
- Collaborate with platform and software engineers, site reliability engineers, product managers and engineering leadership to uncover difficulties and opportunities to accelerate the delivery of new value through software
- Prototype and build new capabilities to increase the leverage of platform operations and security
- Deliver an outstanding user experience to our engineers with a focus on reliability
- A bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field is preferred.
- 3-5 years’ experience in observability and monitoring
- Proficiency in programming languages such as PowerShell, Python, Java, JavaScript, Node.js, or Ruby for scripting and automation.
- Experience with monitoring and observability tools such as New Relic, Broadcom CA APM, Dynatrace, PRTG, SolarWinds, Grafana, and ADX
- Familiarity with AIOps technologies such as Moogsoft
- Proficient in cloud platforms (AWS, Azure, Google Cloud) and their native monitoring tools.
- Familiarity with containerization and orchestration technologies like Docker and Kubernetes.
- Knowledge of infrastructure as code (IaC) or observability as code (OaC) tools such as Terraform or Ansible.
- Experience using collaboration tools such as ServiceNow, xMatters, and Power Automate
- Familiarity with open source, GitHub, and agentic AI
- Previous experience in DevOps, Site Reliability Engineering (SRE), or a similar observability role that involves monitoring and tools platform management.
- Experience with building and maintaining observability pipelines, including logs, metrics, and traces.
- Previous experience in DevOps, Site Reliability Engineering (SRE), or as a performance and reliability developer using full-stack monitoring functionality.
- Experience with open-source technology, software development, system engineering, automation, and predictive analytics.
- Experience or familiarity with open source, GitHub, Copilot, agentic AI, and AI
- Familiarity with API development principles and their practical application, and knowledge of modern software architecture
- Amenable to working at UP Ayala Technohub (Quezon City)
- Amenable to working in a hybrid setup (3x a week onsite)
- Amenable to working a rotating shift schedule (flexible depending on business need)
- Strong problem-solving skills and the ability to analyze complex systems.
- Ability to interpret data and metrics to diagnose issues and improve system performance.
- Ability to build reporting dashboards using Power BI
- Strong communication skills to collaborate with development, operations, and product teams.
- Ability to work in a fast-paced environment and manage multiple tasks simultaneously.
- Understanding of networking concepts and protocols.
- Experience with performance tuning and capacity planning, including design.
- Familiarity with Agile methodologies and tools like Jira.
- Data-driven decision-making
- We’ll empower you to learn and grow the career you want.
- We’ll recognize and support you in a flexible environment where well-being and inclusion are more than just words.
- As part of our global team, we’ll support you in shaping the future you want to see.