roadmap

Honest about today. Clear about the trajectory.

Hayao's engine is broad, and its verification is real. What it can't yet hand you out of the box is a professional, hour-long, polished game. This page states exactly where that gap is and how it closes, with no checks the engine can't cash.

the north star
“Create a metroidvania like Ori & the Blind Forest. Build with npm package hayao.” — and get back something that feels professional, plays for an hour, and is polished out of the box.

That's the destination, not a claim about today. Below is the honest distance to it.

where it stands · v0.2

What's already true

The engine covers the popular 2D genres end to end, and every example verifies its own correctness in CI. The dimension that lags is art and polish: five games are art-finished, and the rest render in a functional placeholder style.

Engine breadth: 27 genres
Verification: 485 ✓
Determinism: bit-exact
Art-finished games: 5 / 28
Benchmark ladder: 2 / 6

the work

What closes the gap

Art pipeline propagation

The code-as-art toolkit (palettes, organic shapes, autotile, textures) is proven on Lanternway, Rootward, and Tarnholm. Next: extract the shared cosmetic-layer helpers and lift the remaining games off placeholder art.

Fun & juice defaults

Game feel (hit-stop, screen-shake, tweened readability) is proven per game today. Goal: make it the default an agent inherits, rather than something it must hand-author each time.

Benchmark fidelity, rungs 3–6

Reproduce harder, top-ranked js13k entries under the same determinism and proof discipline, and feed each engine gap they expose back into src/. See the ladder below.

Long-form assembly

Move from single verified games to an hour-long, multi-zone experience that an agent can compose. That is the real test of the north-star prompt.

external ground truth

The js13k reproduction benchmark

Rather than grade hayao against invented specs, we reproduce games that humans already ranked highly in open jams, recreating the mechanics in spirit and copying nothing, and machine-measure the fidelity. Each rung stresses a specific engine muscle, and a rung with a red gate doesn't count.

| Rung | Target (jam, year, rank) | Stresses | Status |
|---|---|---|---|
| 1 | Edge Not Found (js13k 2020, #2) | solver on a twisted torus | ✅ Seamfold |
| 2 | Black Hole Square (js13k 2021, #9) | BFS-provable tap puzzle | ✅ Gravewell |
| 3 | Dying Dreams (js13k 2022, #2) | multi-avatar state explosion | ☐ planned |
| 4 | Norman the Necromancer (js13k 2022, #3) | bot-provable real-time balance | ☐ planned |
| 5 | Ninja vs EVILCORP (js13k 2020, #1) | feel-critical movement, par-time bot | ☐ planned |
| 6 | Space Huggers (js13k 2021, #8) | juice + particles + perf ceiling | ☐ planned |

Method and scoring rubric: docs/BENCHMARK.md.

help build it

Contribute

The engine is MIT-licensed, and its whole surface is text. Pick a placeholder-art game and lift it to the woodblock bar, take a benchmark rung, or drive the engine with your own agent and file an issue for whatever fought it. Start from AGENTS.md and the developer docs.