TUI program performing over 60 perfect clears a second

How I made the fastest perfect clear solver (kinda)

CodeProject page

This blog post is meant to be an updated write-up of an old Reddit post I wrote at 5AM to explain Perfect Tetris.

If you're not familiar with modern Tetris or perfect clears, you might want to check out these links before reading:

Perfect clear (aka PC)Rotation system

Background

This project is part of my (now somewhat abandoned) goal of making the strongest Tetris bot.

To assist the bot, I made a dedicated PC solver. The initial version was a brute-force backtracking solver, which worked but wasn't super fast. I wrote another solver based on the exact cover problem, but this required complicated post-checks, making it buggy and slow.

I wanted a solver that was agnostic to the rotation system. I settled on writing yet another backtracking solver due to their flexibility, and tried to reduce the branching factor as much as possible and added a heuristic to reduce the time to first solution.

To answer some "why didn't I do X?" questions, I wanted these constraints on the solver:

  1. Returns the first solution (which solution comes first can be tuned using the heuristic)
  2. Is single threaded (more CPU available to the bot)
  3. Has no large pre-computed tables (final binary size should be <2MB)
  4. Minimises memory usage (this correlates with speed)

The bot eventually got good enough that it would occassionally PC without any additional guidance, lol.

First, there was DFS

Solve time: EXPTIME, EXPSPACE, and everything bad

I started off by re-writing my first solver from C# into Zig. It was a basic depth-first backtracking solver. The only optimisations so far were representing the 10x40 playfield as 40 u16 bitfields (same as my Tetris engine) and memoizing previous states to avoid redundant branches. The result was similarly slow.

In the header of each section, I'll write down the mean solve time of the same 100 random 4-line PCs, to give a concrete sense of my progress. Due to a bug accidentally speeding the initial version up, I don't have any fair performance numbers, but it was slow.

Pruning the walled gardens (mod 4)

Solve time: 754.2ms

First, let's reduce the branching factor. Early on in the search, it's not obvious if a setup does not lead to a solution. But as the playfield gets more filled up, many candidates end up being dead ends, and we can detect these only by looking at the playfield.

⬛⬛⬛⬛🟩🟩⬛⬛⬛⬛
⬛⬛⬛🟩🟩🟪🟥🟥⬛⬛
🟨🟨🟦🟦🟦🟪🟪🟥🟥⬛
🟨🟨⬛⬛🟦🟪🟫🟫🟫🟫
The worst PC opener possible. Also, why is there no cyan square emoji?

Look at the setup above. Give yourself any number of any tetromino you like. A 4-line PC is completely impossible!

0️⃣0️⃣0️⃣0️⃣🟩🟩1️⃣1️⃣1️⃣1️⃣
0️⃣0️⃣0️⃣🟩🟩🟪🟥🟥1️⃣1️⃣
🟨🟨🟦🟦🟦🟪🟪🟥🟥1️⃣
🟨🟨0️⃣0️⃣🟦🟪🟫🟫🟫🟫
The worst PC opener possible, annotated.

Notice that the 2 columns in the middle form a solid wall. Pieces that you place can "go through" solid rows (because of cleared lines), but not

🟥🟥🟫🟫🟫🟫🟧🟨🟨🟦
🟪🟥🟥🟨🟨🟪🟧🟨🟨🟦
🟪🟪🟩🟩🟪🟪🟧🟧🟦🟦
🟪🟩🟩🟨🟨🟪🟫🟫🟫🟫
Average PC. Yellow O piece "goes through" a solid row.

through solid columns. This effectively splits the empty area into 2 zones that we have to fill separately. Each piece occupies 4 cells, so for a PC to be possible, the area of both zones must be a multiple of 4. (zone0 = 9, zone1 = 7, so no PC here)

Furthermore, solid "walls" can span 2 columns, even diagonally.

🟫0️⃣0️⃣0️⃣🟨🟨1️⃣1️⃣🟩🟩
🟫0️⃣0️⃣0️⃣🟨🟨1️⃣🟩🟩🟪
🟫0️⃣0️⃣🟧1️⃣🟦🟦🟦🟪🟪
🟫🟧🟧🟧1️⃣1️⃣1️⃣🟦2️⃣🟪
The worst-er PC opener possible, annotated.

Following the "no through-columns" rule, the setup above is split into 3 zones. If you play around with this setup, you'll realise that the rule can apply to walls spanning 3 columns, further splitting the top and bottom of zone1 into 2 zones.

In fact, there can be valid walls that span any number of columns, but detecting them becomes increasingly complicated for deminishing returns. I capped the code at 2 columns, as the 2-column case can be collapsed into the 1-column case with a single line of code. The exact details of the bit hacks used are left as an exercise to the reader.

No, not that kind of AI!

Solve time: 206.1ms

I tokenmaxxed and the AI overlords made my code 4x faster 💸💨💨💨💨. Yeah, no lol.

Remember that Tetris bot I mentioned at the start of this post? The bot actually has a tiny AI to evaluate playfield states. What if we reused it as a heuristic for choosing which moves to try out first?

At first, it didn't make much difference. After including holds in the set of moves to rank, there was a surprisingly large speedup. This idea shouldn't have worked, but it did somehow.

I also had to re-write the feature extration code to work on the bitfields directly to be fast. If only my Python notebooks could do feature extraction in 43 nanoseconds.

Using a heuristic also meant that it could solve much larger PCs, like say, 20 lines in 23ms. Larger PCs should technically be easier to solve since the number of possible solutions grows exponentially, but if it takes a bad branch early on it gets stuck for a loooong time. I don't know of any other software that can solve PCs of more than 6 lines in a reasonable amount of time.

Save your cache💸

Solve time: 92.059ms

Inspired by xkcd 1732:

⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
🟥🟥⬛⬛⬛⬛🟧🟨🟨🟦
🟪🟥🟥⬛⬛⬛🟧🟨🟨🟦
🟪🟪🟩🟩⬛⬛🟧🟧🟦🟦
🟪🟩🟩⬛⬛⬛🟫🟫🟫🟫
Typical 10x40 playfield.

So far the solver has been working with 10x40 playfields, as that is the size mandated by the Tetris Guideline. But when finding a PC, usually only the first 4 or 6 rows are ever touched.

The playfield and held piece combined are enough to determine if we've seen a state before, so let's cap the solver at 6 rows and store them in a u64. That leaves 4 bits free to store the held piece type. If a solve requires more rows, we switch back to the old playfield representation.

This is a 90% reduction in the only place that allocates heap memory. Most solves use about as much memory as there is L2 cache on a single CPU core!

AI again?

Solve time: 39.823ms

What if, instead of using a pre-existing AI heuristic not trained specifically for solving PCs, we trained an AI specifically for solving PCs?

Yeah, I don't know why I didn't think of it earlier.

The AI was trained with a genetic algorithm called NEAT, so after porting my old NEAT implementation over, I changed the goal to "solve PCs as quickly as possible" and boom.

The AI could have had any number of layers but chose a linear equation to speed itself up lol.

Geddan

Solve time: 30.893ms

My code to find all valid placements was pretty lenient to ensure that it found all the possible placements. A bit too lenient, in fact. Sometimes a wall kick would send the current piece entirely above the playfield, and it would burn cycles doing this until it explored the entire invalid area above.

Adding a check to stop it once it got too high sped things up a bit.

Parity checks

Solve time: 17.215 25.327ms

After reading up more about Perfect Clear theory, I came across the idea of column and checkerboard parity.

To clear the entire playfield and perform a perfect clear, both parities must end at 0. Checkerboard parity is especially interesting as it is only affected by T pieces. By looking at the piece queue, we can determine the maximum amount the parity can change. If the current parity is greater than that, we prune out the candidate solution.

...is what I thought would work. Unfortunely, I forgot to account for line clears, which also affect checkerboard parity. Column parity is unaffected by line clears, but so many pieces can affect it that it's not very useful.

This bug was present in version 1.0.0 and it wasn't until I tested the solver on all 4-line PC queues (yes, all 79,516,080 of them) that I realised I was missing a solution in 0.01% of them.

In a last ditch effort, I addded checkerboard and column parity to the list of features fed into the heuristic AI, and got a slight speedup.

O-spins are (usually not) real

Solve time: 22.665ms

I wanted my solver to be agnostic to the rotation system, which meant trying to rotate O pieces just in case there was an O-spin available.

To my surprise, most mainstream rotation systems that people care about (SRS, SRS+, Tetr.io, etc.) do not have any O kicks, making O-spins impossible. To optimise for this case, I added an initial check through the kick table for O kicks, and if there are none, don't bother rotating O pieces.

Pre-computed collisions

Solve time: 11.452ms

There's a golden rule in optimisation: if something is cheap to compute, don't bother storing it. Detecting if the current piece collides with the playfield takes only a few logical and arithmatic operations on integers, which sounds extremely cheap.

I ran the program through a profiler, and found that this cheap routine was called so often that it made up a large part of the runtime.

When searching for possible placements, a piece is usually checked against the same spot in the playfield for collisions multiple times, so what if we just computed the collisions for all positions once and reused that?

Well, turns out my bit hacks were not as fast as I thought they were and this lead to a pretty dramatic speedup.

Benchmarks

I made a few extra small optimisations that don't warrant their own section:

Here's the final benchmark result, aggregated over 200 consecutive PCs:

7-bag randomiser, SRS+
PC heightMean timeMax timeMean memoryMax memory
410.697ms237.6ms245.4KiB9.455MiB
69.253ms217.8ms173.8KiB2.430MiB

As far as I know, this makes Perfect Tetris the fastest single-solution PC solver. Though, it was the only single-solution PC solver I could find.

No benchmark section would be complete without a comparison to other software:

4-line PC, SRS
SolverTime (ms)ThreadsSolutions found
Queue ZLSJIOTZTIS
Tetra tools202611824
pcf31.6016187
Perfect Tetris9.76311
Perfect Tetris (BFS)3422611824
Queue JIOSTLZTLIS
Tetra tools423019197
pcf70.19162316
Perfect Tetris1.04411
Perfect Tetris (BFS)5693219189
Queue OIZSJLTZTIJ
Tetra tools168714199
pcf62.07161669
Perfect Tetris.739111
Perfect Tetris (BFS)3242314198
  • Tetra tools runs on the browser and returns all solutions. It uses a 23MB pre-computed table.
  • pcf runs natively and returns some solutions. It does not use a pre-computed table.
  • Perfect Tetris runs natively and returns the first solution. It does not use a pre-computed table.
  • Perfect Tetris (BFS) is a slight re-write of Perfect Tetris to perform a breadth-first search for all solutions, to provide a fair comparision to other solvers. No extra optimisations were made. (The no. of solutions found being lower than Tetra tool's is likely a bug as this version was not written with much thought)

There are definitely a lot of optimisations to be made to find multiple solutions faster, but when it comes to finding a single solution, Perfect Tetris is over an order of magnitude faster than pcf, which is written by the same person who wrote Cold Clear, perhaps one of the strongest Tetris bots ever made.