roadmap

Honest about today. Clear about the trajectory.

Hayao's engine is broad and its verification is real. What it can't yet hand you out of the box is a professional, hour-long, polished game. This page says exactly where that gap is and how it closes, and it writes no checks the engine can't cash.

the north star

“Create a metroidvania like Ori & the Blind Forest. Build with npm package hayao.” — and get back something that feels professional, plays for an hour, and is polished out of the box.

That's the destination, not a claim about today. Below is the honest distance to it.

where it stands · v0.2

What's already true

The engine covers 27 popular 2D genres end to end, and every example proves its own correctness in CI. The dimension that lags is art and polish: five of the 28 games are art-finished, and the rest still render in functional placeholder style.

Metric	Today
Engine breadth	27 genres
Verification	485 ✓
Determinism	bit-exact
Art-finished games	5 / 28
Benchmark ladder	2 / 6

the work

What closes the gap

Art pipeline propagation

The code-as-art toolkit (palettes, organic shapes, autotile, textures) is proven on Lanternway, Rootward, and Tarnholm. Next: extract the shared cosmetic-layer helpers and lift the remaining games off placeholder art.

Fun & juice defaults

Game feel (hit-stop, screen shake, tweened readability) is proven per game today. Goal: make it the default an agent inherits, not something it must hand-author each time.
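As a rough illustration of "the default an agent inherits", here is a minimal TypeScript sketch. Every name in it (`JuiceConfig`, `DEFAULT_JUICE`, `withJuice`, `Juice`) and every number is a hypothetical stand-in, not Hayao's API. The idea: the engine ships defaults, a game overrides only the fields it cares about, and shake offsets come from a seeded PRNG (mulberry32) instead of `Math.random()` so juice never breaks replay determinism.

```typescript
// Hypothetical shapes for illustration only; not Hayao's actual API.
interface JuiceConfig {
  hitStopFrames: number;  // simulation frames to freeze after a hit
  shakeMagnitude: number; // max camera offset in pixels at full trauma
  shakeDecay: number;     // trauma removed per frame
}

// Assumed engine-wide defaults (numbers are placeholders).
const DEFAULT_JUICE: JuiceConfig = {
  hitStopFrames: 4,
  shakeMagnitude: 6,
  shakeDecay: 0.05,
};

// A game inherits every default and overrides only what it names.
function withJuice(overrides: Partial<JuiceConfig> = {}): JuiceConfig {
  return { ...DEFAULT_JUICE, ...overrides };
}

// mulberry32: a small seeded PRNG, so the same seed replays the same shake.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

class Juice {
  private hitStop = 0;
  private trauma = 0;
  private readonly cfg: JuiceConfig;
  private readonly rand: () => number;

  constructor(cfg: JuiceConfig, rand: () => number) {
    this.cfg = cfg;
    this.rand = rand;
  }

  // Called on impact: start a freeze and add trauma (capped at 1).
  onHit(): void {
    this.hitStop = this.cfg.hitStopFrames;
    this.trauma = Math.min(1, this.trauma + 0.5);
  }

  // Per frame: should gameplay advance, and how far to offset the camera.
  tick(): { simulate: boolean; dx: number; dy: number } {
    const simulate = this.hitStop === 0;
    if (this.hitStop > 0) this.hitStop--;
    const amp = this.cfg.shakeMagnitude * this.trauma * this.trauma;
    const dx = (this.rand() * 2 - 1) * amp;
    const dy = (this.rand() * 2 - 1) * amp;
    this.trauma = Math.max(0, this.trauma - this.cfg.shakeDecay);
    return { simulate, dx, dy };
  }
}
```

A game would call `onHit()` on impact and, each frame, skip its gameplay step while `simulate` is false but still offset the camera by `(dx, dy)`. Scaling shake by trauma squared is one common choice that keeps small hits subtle and big hits punchy. It is an assumption of this sketch, not documented Hayao behavior.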

Benchmark fidelity, rungs 3–6

Reproduce harder, human-ranked js13k entries under the same determinism and proof discipline, feeding each engine gap they expose back into src/. See the ladder below.

Long-form assembly

Move from single verified games to an hour-long, multi-zone experience an agent can compose. That is the true test of the north-star prompt.

external ground truth

The js13k reproduction benchmark

Rather than grading Hayao against invented specs, we reproduce games that humans already ranked highly in open jams, recreating their mechanics in spirit and copying nothing, and we machine-measure the fidelity. Each rung stresses a specific engine muscle, and a rung with any red gate doesn't count.

Rung	Target (jam, year, rank)	Stresses	Status
1	Edge Not Found — js13k 2020, #2	solver on a twisted torus	✅ Seamfold
2	Black Hole Square — js13k 2021, #9	BFS-provable tap puzzle	✅ Gravewell
3	Dying Dreams — js13k 2022, #2	multi-avatar state explosion	☐ planned
4	Norman the Necromancer — js13k 2022, #3	bot-provable real-time balance	☐ planned
5	Ninja vs EVILCORP — js13k 2020, #1	feel-critical movement, par-time bot	☐ planned
6	Space Huggers — js13k 2021, #8	juice + particles + perf ceiling	☐ planned

Method and scoring rubric: docs/BENCHMARK.md.
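One plausible reading of "BFS-provable" on rung 2 is that solvability is settled by exhaustive breadth-first search. BFS either finds a shortest solution or exhausts every reachable state, which proves that no solution exists. The sketch below shows that general idea in TypeScript. The `Puzzle` interface, `solve`, and the toy `tapRow` puzzle are hypothetical illustrations, not Hayao's API or Gravewell's actual rules. The real scoring method is whatever docs/BENCHMARK.md specifies.

```typescript
// Hypothetical generic interface: any puzzle with hashable states and moves.
interface Puzzle<S, M> {
  start: S;
  key(s: S): string; // canonical encoding used to detect repeated states
  moves(s: S): M[];
  apply(s: S, m: M): S;
  solved(s: S): boolean;
}

// Breadth-first search: returns a shortest move list, or null once the whole
// reachable state space is exhausted without a solution (a proof of unsolvability).
function solve<S, M>(p: Puzzle<S, M>, maxStates = 1000000): M[] | null {
  const seen = new Set<string>([p.key(p.start)]);
  const queue: { s: S; path: M[] }[] = [{ s: p.start, path: [] }];
  for (let head = 0; head < queue.length; head++) {
    const { s, path } = queue[head];
    if (p.solved(s)) return path;
    for (const m of p.moves(s)) {
      const next = p.apply(s, m);
      const k = p.key(next);
      if (seen.has(k)) continue;
      if (seen.size >= maxStates) throw new Error("state cap reached: no proof either way");
      seen.add(k);
      queue.push({ s: next, path: [...path, m] });
    }
  }
  return null;
}

// Toy tap puzzle (hypothetical): a row of n lights stored as a bitmask;
// tapping light i toggles it and its immediate neighbours.
function tapRow(n: number, target: number): Puzzle<number, number> {
  return {
    start: 0,
    key: (s) => String(s),
    moves: () => Array.from({ length: n }, (_, i) => i),
    apply: (s, i) => {
      let mask = 1 << i;
      if (i > 0) mask |= 1 << (i - 1);
      if (i < n - 1) mask |= 1 << (i + 1);
      return s ^ mask;
    },
    solved: (s) => s === target,
  };
}

console.log(solve(tapRow(3, 0b111))); // returns [1]: tapping the middle light solves it
console.log(solve(tapRow(2, 0b01)));  // returns null: unreachable, proven by exhaustion
```

The state cap is a deliberate choice in this sketch. Hitting it throws rather than returning `null`, because an unfinished search proves nothing either way.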

help build it

Contribute

The engine is MIT-licensed and its whole surface is text. Pick a placeholder-art game and lift it to the woodblock bar, take a benchmark rung, or drive the engine with your own agent and file an issue about whatever fought it. Start from AGENTS.md and the developer docs.