Production Support Analysts
UST
- Taguig City, Metro Manila
- Permanent
- Full-time
- Monitor system health across applications, infrastructure, APIs, and services using Splunk, Dynatrace, and Grafana
- Review and respond to alerts, dashboards, and metrics in real time
- Create, expand, and maintain dashboards and alerting to improve observability coverage
- Perform initial triage using logs, traces, and metrics; identify symptoms and potential root causes
- Execute runbooks/SOPs for common production issues (restarts, validation checks, health checks, etc.)
- Create incidents, document findings, and escalate to L2/L3 engineering teams
- Coordinate with on-call responders during high-priority events
- Perform routine health checks and proactive monitoring tasks every shift
- Provide clear communication and shift-to-shift handoff notes
- Operate within a 24/7 shift rotation, including nights/weekends/holidays
- 5+ years of experience in Production Support / NOC / L1 Ops roles
- Strong hands-on experience with Splunk, Dynatrace, Grafana
- Ability to analyze logs, metrics, and traces to identify first-level issues
- Experience executing operational runbooks
- Knowledge of ITSM tools like ServiceNow
- Strong communication and documentation skills
- Ability to work in fast-paced production environments
- Amenable to working in 24/7 rotational shifts
- Eagerness to learn new tools and technologies
- Good communication skills
- High ownership, reliability, urgency
- Strong attention to detail
- Collaborative mindset
- Basic understanding of AWS
- Familiarity with microservices, APIs, Linux basics, networking
- Exposure to Prometheus, CloudWatch
- Understanding of incident management and SRE/DevOps culture