What you will be working on
The AI Performance Engineer is a new role within the AI Enablement Office, a team dedicated to helping CPF Board—and Government as a whole—get the most out of AI.
As AI adoption accelerates, we’ve found that building AI systems is only half the challenge. The other half is making them work well. Today, most teams know how to build and test traditional software, but AI systems behave differently. They produce non-deterministic outputs, fail in subtle and context-dependent ways, and require fundamentally different approaches to evaluation and improvement. This role exists to close that gap.
You will work hands-on to evaluate, diagnose, and improve GenAI and agentic AI systems, while also contributing to the organisation’s growing capability in this space. Think of this as the AI equivalent of what Site Reliability Engineering did for infrastructure: a disciplined practice focused on making AI systems actually perform in production.
This is a new function—not just at CPF Board, but across the broader ecosystem. The discipline of AI performance engineering is still being defined. You should be comfortable with ambiguity, excited about shaping something new, and ready to figure things out alongside the team.
In this role, you will:
Evaluating and Improving AI Systems (~70% of the role)
- Design and execute evaluation frameworks for GenAI and agentic AI systems, measuring performance against real-world use cases using techniques such as LLM-as-judge, human evaluation protocols, and automated test suites.
- Systematically diagnose failure modes in AI systems—understanding why an agent selected the wrong tool, why a retrieval step missed relevant context, or why a prompt produces inconsistent outputs—and implement targeted fixes.
- Iterate on prompts, tool definitions, agent workflows, context engineering, and orchestration logic to improve system outputs. Frameworks in use include LangGraph and N8N.
- Build regression testing and benchmarking pipelines to ensure AI systems maintain or improve performance over time, drawing on emerging practices in AI evaluation and observability.
- Collaborate with product and engineering teams to define what “good” looks like for AI outputs in specific business contexts—translating domain requirements into measurable evaluation criteria.
Building Organisation Capability (~30% of the role)
- Contribute to playbooks, guides, and reusable templates on AI evaluation, prompt engineering, and performance improvement for IT teams across CPF Board.
- Support workshops and knowledge-sharing sessions to help other teams adopt best practices for testing and tuning AI systems.
- Document patterns, anti-patterns, and lessons learned to build institutional knowledge in a rapidly evolving field.
What we are looking for
We value the diverse talents and experiences that each individual brings to the table. You do not need to master every requirement; familiarity or expertise in some of the following areas will position you for success within this team.
- Some software development experience, with the ability to write and debug code effectively. Fresh graduates with strong technical foundations and demonstrated curiosity are welcome.
- Familiarity with Large Language Model (LLM) APIs and core concepts (tokens, context windows, temperature, tool use). Hands-on experience with frameworks like LangChain, LangGraph, or LlamaIndex is advantageous but not essential.
- Understanding of, or strong interest in, AI evaluation methods, prompt engineering, and agentic AI patterns (e.g. ReAct, tool calling, multi-step workflows).
- Strong analytical and problem-solving skills. You should be comfortable reasoning about why an AI system is underperforming and forming hypotheses to test.
- Excellent written communication skills. Much of this work involves writing prompts, evaluation criteria, documentation, and playbooks, so clarity of expression matters.
- Intellectual curiosity and a genuine interest in how AI systems work under the hood.
- Comfort with ambiguity and willingness to help define a new function from the ground up. There is no established playbook for this role; you will help write it.
- Proactive, self-driven attitude with the ability to work both independently and collaboratively.
- Experience with AI evaluation and observability platforms (e.g. LangSmith, Braintrust, or similar).
- Familiarity with workflow orchestration tools such as LangGraph, N8N, or similar low-code automation platforms.
- Cloud-native development experience.
- Desire and aptitude to be full-stack, comfortable spanning from infrastructure to UX when needed, with interest in broadening toward platform work over time.
The seniority of appointment and actual corporate job title will be commensurate with individual work experience.
The position is a 2-year full-time contract directly on CPF Board's payroll, with an option to renew contingent on confirmation and organisational needs. There is also potential for emplacement into a permanent position.