Building a National-Scale Design System for AI-Driven Early Childhood Assessment

Published on:

May 20, 2026

achieved task completion

70%

improved customer satisfaction

80%

reduced support queries

90%

staff satisfaction score

ACER is a Global Education Research Organisation

Product Design Lead

Position

$150,000

Budget

7 months

Duration

ACER (Australian Council for Educational Research) is a globally recognised educational research organisation, trusted by governments, schools, and international bodies to design and evaluate assessments, research programmes, and educational tools. ACER develops complex, data-heavy digital platforms used by educators, students, administrators, and policy makers across Australia and internationally. The organisation's reputation is built on rigour, evidence, and accuracy — the same standards the design practice needed to reflect but hadn't yet operationalised.

Problem & Challenges

The Preschool Outcomes Measure (POM) was developed as a new national digital assessment tool to support educators in observing and understanding preschool children’s learning and development. The challenge was to design a product that translated complex educational research into an intuitive, trusted experience for educators across diverse contexts, while ensuring accessibility, ethical data use, and readiness for scale.

I was primarily involved in the Small Scale Trial (SST) and the National Applied Trial (NAT), contributing to the delivery of a validated formative assessment tool aligned with learning progressions.

Aligning Diverse Educational Needs - Translating complex pedagogy and assessment theory into an intuitive user experience.
Designing an adaptable mobile and web tool that fits both structured and exploratory classroom environments.
Creating terminology, workflows, and support resources that feel natural to educators.

The core problem:

ACER had strong design intent but no design infrastructure. Designers across squads were rebuilding the same UI components from scratch. The result was growing UX debt and a patchwork of inconsistent interfaces across 8+ product surfaces, with unreliable WCAG compliance, a real risk in ACER's regulated education context.

There was no shared system, no research operations framework, and no AI-integrated workflow to keep pace with delivery expectations. I was brought in to build all three.

Users & Audience

ACER's products serve multiple distinct audiences across its education and fintech platforms:

Educators, school administrators and Federal Government

Teachers and school leaders using ACER's assessment platforms to track student progress. Needs: clear data presentation, low-friction workflows, and interfaces that don't require training to use effectively.

Students

Assessment participants, ranging from primary through to tertiary. Needs: accessible, focused interfaces that minimise anxiety and maximise performance conditions.

Policy makers and researchers

Data-heavy users interpreting ACER research outputs. Needs: complex data made legible, with strong information hierarchy and reliable accessibility across assistive technologies.

Internal design team

A growing team of product designers who were the primary audience for the design system and research operations framework I built. Their ability to adopt and maintain what I built was the real measure of success.

Roles & Responsibilities

I was the Lead Product Designer on the Preschool Outcomes Measure (POM) project, holding end-to-end UX and product design ownership across two major trial phases, the Small Scale Trial (SST) and the National Applied Trial (NAT). The team was small and cross-functional: I worked directly alongside a Product Manager, technical leads, developers, and subject matter experts in early childhood education research. There was no separate research lead. I owned the research programme alongside the design work.

The project ran from March to October 2025, operating in design sprints within an agile delivery model. Given the tight timeline and the complexity of the domain (two major assessment domains, ten subdomains, and seven progressive competency levels), I had to move between strategic design decisions and hands-on delivery simultaneously.

What I personally owned:

Led end-to-end UX strategy, defining the research approach, design principles, and usability benchmarks before any screens were designed
Conducted all primary user research: semi-structured interviews with educators and facilitators, focus groups, competitor analysis (including StoryPark and other assessment platforms across the early childhood education space), and internal discovery workshops
Translated complex psychometric and pedagogical frameworks into progressive user journeys and interaction models that non-expert users could navigate with confidence
Established Design Ops practices from scratch — built a scalable component library in Figma using variables, tokens, and auto-layout, with WCAG accessibility compliance built in
Designed and iterated across the full fidelity spectrum — from low-fidelity wireframes and user flows through to high-fidelity, clickable prototypes used in both trial phases
Led two rounds of usability testing — Round 1 during the Small Scale Trial, Round 2 during the National Applied Trial — using task success rates, System Usability Scale (SUS) scoring, and qualitative behavioural observation
Synthesised research findings using Dovetail with global tagging, ensuring insights were structured and comparable across trial cohorts
Collaborated with engineering to integrate AI-driven decision support — implementing logic based on validated psychometric scales to recommend next steps aligned to each child's competency level, and enabling automated report generation and intelligent data pre-filling
Managed design-to-development handoff through detailed Figma documentation, design gap meetings, and UAT support
Conducted design QA on staged builds to catch and resolve inconsistencies before launch
Established post-launch feedback loops (NAT survey, passive observation, and a CX feedback cadence) to inform continuous improvement

Process & What I did

1. Phase 1 - Discovery
Competitive analysis of early childhood assessment platforms, including StoryPark, plus internal workshops with the PM, developers, and education researchers to map the assessment framework before designing anything.

2. Phase 2 - User research
Semi-structured interviews and focus groups with educators across diverse classroom contexts. Built a Dovetail tagging taxonomy before sessions ran so findings were comparable across cohorts and usable directly in sprint planning.

3. Phase 3 - Translating complexity
Two domains, ten subdomains, and seven competency levels had to become flows a busy educator could navigate without training. Designed the IA around the educator's natural workflow, not the framework's structure. Validated simultaneously with educators and subject matter experts, facilitating where those two groups disagreed.

4. Phase 4 - Design system
Component library in Figma using variables, tokens, and auto-layout. WCAG-compliant from the component level. Architecture followed the assessment domain structure so components were reused systematically rather than rebuilt per flow.

5. Phase 5 - AI decision support
Integrated AI logic based on validated psychometric scales: real-time performance analysis, next-step recommendations, automated report generation, and intelligent data pre-filling. Designed the recommendation UI in educator language: prompts that build professional confidence, not directives that replace judgement.

6. Phase 6 - Usability testing
Two rounds. Round 1 (Small Scale Trial): task success rate, SUS scoring, behavioural observation. Round 2 (National Applied Trial): validation at scale with facilitator debrief interviews to surface gaps between self-reported ease and actual friction.

7. Phase 7 - Delivery
Weekly dev reviews, design gap meetings, UAT support, and design QA on staged builds. Post-launch feedback infrastructure was built into the delivery plan from day one: NAT survey, Mouseflow passive observation, and a CX feedback loop.
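As a rough illustration of the Phase 5 decision support, the sketch below turns an observed competency level into a next-step prompt worded as a suggestion rather than an instruction. Only the seven-level shape of the framework comes from the project. The names `Observation` and `next_step_prompt`, the subdomain label, and the prompt wording are hypothetical, and the real logic rests on validated psychometric scales that are not reproduced here.

```python
from dataclasses import dataclass

MAX_LEVEL = 7  # the framework defines seven progressive competency levels


@dataclass(frozen=True)
class Observation:
    """One educator observation (hypothetical shape, for illustration only)."""
    child_id: str
    subdomain: str  # one of the ten subdomains; the label used below is a placeholder
    level: int      # observed competency level, 1..MAX_LEVEL


def next_step_prompt(obs: Observation) -> str:
    """Return a suggestion in educator language, never a directive."""
    if not 1 <= obs.level <= MAX_LEVEL:
        raise ValueError(f"level must be between 1 and {MAX_LEVEL}")
    if obs.level == MAX_LEVEL:
        return (f"{obs.subdomain}: the highest level has been observed. "
                "You might consider ways to extend this learning.")
    return (f"{obs.subdomain}: level {obs.level} observed. "
            f"You might look for opportunities to observe level {obs.level + 1}.")


example = Observation(child_id="child-001", subdomain="Subdomain A", level=3)
print(next_step_prompt(example))
```

Keeping the wording in one function makes it easy to review every prompt with educators and subject matter experts before it reaches the interface.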

Solution

Key Solutions Delivered:

Responsive web and mobile platform optimised for touch interaction and classroom use conditions.
Progressive assessment flows translating a complex framework (2 domains, 10 subdomains, 7 competency levels) into intuitive educator journeys requiring no prior training.
AI-powered decision support: real-time next-step recommendations, automated report generation, and intelligent data pre-filling based on validated psychometric scales.
Role-based dashboards and workflows for educators, facilitators, and administrators, each tailored to their context and permissions.
Observation capture interfaces built around natural classroom moments: record, tag, and submit without disrupting teaching flow.
WCAG 2.1 AA-compliant design system across all web and mobile surfaces, with scalable components, tokens, and auto-layout in Figma.
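One way to keep colour tokens honest against WCAG 2.1 AA is an automated contrast check. The sketch below applies the WCAG 2.1 relative luminance and contrast ratio formulas with the 4.5:1 minimum for normal-size text. The token names and hex values are hypothetical, since the project's real palette is not shown here, and this is not necessarily how the Figma library was validated.

```python
AA_NORMAL_TEXT = 4.5  # WCAG 2.1 AA minimum contrast ratio for normal-size text

# Hypothetical token values, for illustration only.
TOKENS = {
    "color.text.primary": "#1A1A1A",
    "color.text.muted": "#9A9A9A",
    "color.surface.default": "#FFFFFF",
}


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB colour, per the WCAG 2.1 definition."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearise each gamma-encoded channel before weighting.
    r, g, b = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
               for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def failing_pairs(tokens, pairs, minimum=AA_NORMAL_TEXT):
    """Return the (foreground, background) token pairs below the minimum ratio."""
    return [(fg, bg) for fg, bg in pairs
            if contrast_ratio(tokens[fg], tokens[bg]) < minimum]


pairs = [
    ("color.text.primary", "color.surface.default"),
    ("color.text.muted", "color.surface.default"),
]
print(failing_pairs(TOKENS, pairs))
```

Running a check like this whenever tokens change catches contrast problems once at the token level, so they are not rediscovered screen by screen.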

Additional Enhancements:

Onboarding flow validated at 80% task success rate during the National Applied Trial; educators self-onboarded with minimal facilitation.
Post-launch feedback infrastructure built into delivery from day one: NAT survey, Mouseflow passive observation, and a CX feedback loop.
Dovetail tagging system designed before research began, so findings were comparable across cohorts and usable directly in sprint planning.

Result

70+ System Usability Scale (SUS) score
0 -> 55 NPS for new digital assessment
80% task success rate for Assessment Flow during the National Applied Trial
90% internal satisfaction review
25% faster solution delivery vs prior quarter
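The headline figures above rest on three standard usability metrics. The sketch below shows how each is conventionally calculated, assuming the standard SUS scoring rule (ten items rated 1 to 5, scaled to 0-100), the usual NPS definition (promoters rate 9-10, detractors 0-6), and a simple pass/fail task outcome. The function names and sample data are illustrative, not the trial's own tooling or results.

```python
def sus_score(responses):
    """Score one 10-item SUS questionnaire (each response 1..5) on a 0..100 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def nps(ratings):
    """Net Promoter Score from 0..10 ratings: % promoters minus % detractors."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def task_success_rate(outcomes):
    """Percentage of task attempts completed successfully."""
    if not outcomes:
        raise ValueError("at least one task attempt is required")
    return 100 * sum(1 for ok in outcomes if ok) / len(outcomes)


# Made-up sample data, not the trial results.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
print(nps([10, 9, 9, 8, 7, 6, 10, 9, 3, 9]))
print(task_success_rate([True, True, False, True, True]))
```

A cohort-level SUS figure is the mean of the per-respondent scores, so it sits on a 0-100 scale rather than being a percentage.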

AI-assisted prototyping shifts validation earlier; reduces late-stage rework
Single governed Figma system; 30% fewer UI inconsistencies
Strengthened brand perception as a modern, education-focused brand
3 -> 0.5 days: AI-assisted Dovetail + structured taxonomy + prompt libraries

Lessons learned

• AI tools are most valuable when you design the workflow around them, not drop them into existing processes. The taxonomy and prompt libraries I built were as important as the tools themselves.

• Design system adoption is a change management problem, not a design problem. I had to make the new system demonstrably better for individual designers immediately, not just better for the organisation long-term, or it wouldn't be used.

• Building while delivering is hard but necessary. Waiting for a dedicated infrastructure sprint that never comes means teams carry technical design debt indefinitely. The parallel track approach worked because I made both tracks visible to leadership from day one.