Have you seen War Games? You know, the 1983 movie where Matthew Broderick’s character almost starts a nuclear war? Devastation is prevented because the “WOPR” super-computer determines that nuclear war is a no-win scenario. WOPR discovers that nuclear war is kind of the same as tic-tac-toe; if you play it right every game is a draw. I personally don’t consider wiping out all of humanity a draw, but I can see what WOPR was thinking.
WOPR took a ponderously-long 72 seconds to reach this conclusion about tic-tac-toe. I wondered at the time I saw the movie if that was a realistic amount of compute time. Every few years it would cross my mind again. I’d think about it for a bit, then get back to my life. I didn’t actually try to answer that question until 38 years after seeing the movie.
I wondered what optimizations WOPR would likely have used. Surely a super-computer trusted with the fate of humanity wouldn’t create a simplistic brute-force implementation. First, tic-tac-toe is highly symmetric. There are really just 3 opening moves; the rest are just rotations of corner, outside middle, and middle. The same reflection argument can be made for subsequent moves, but that starts to add complexity and provides more limited optimizations.
Second, computational game theory was well-established decades before War Games. MiniMax trees with alpha-beta pruning go back at least to 1956. A minimax tree is simple: just assume each player in turn takes their best move available. Try all possible moves in turn until an end-state is reached, win, lose, or draw. Alpha-beta pruning just eliminates looking at moves someone would never take because a better move has already been considered.
Given those two optimizations, there are just 6,493 total moves to consider. 1,903 come from the opening corner move, 2,458 from the outside center move, and 2,132 from the center move. Without alpha-beta pruning, there are 24,385 moves. Without either optimization, there are 94,977 moves. These two efficiency improvements make the app 14.6 times faster.
For the third optimization, I rely on the likelihood that WOPR had symmetric multi-processing. Even early versions of the Michigan Terminal System (MTS) had dual processors in the 1960s. This expanded to 4 CPUs by the time I used MTS in college in the mid 1980s. Personal computers have had multiple cores for decades, along with smartphones going back at least to 2011. Tic-tac-toe is easy to optimize for multiple cores. My home PC has 32 cores, but to be fair to WOPR, I chose to use at most 3 cores — one for each of the 3 unique opening moves.
To get started, I wrote an app in C# on my PC. Using just one core, it runs in 0.0808 milliseconds. Using 3 cores, it runs in 0.0307 milliseconds. That’s more than 1/3 the time of one core, because it takes a different amount of time to compute each of the 3 possible starting moves. Two cores complete and wait for the third to finish. I ran the code in a loop 10,000 times to get an accurate reading. So this app runs 2,345,277 times faster than WOPR.
But that’s not fair, since computers are a lot faster now than then. I have an old TRS-80 Model 100, which was brought to market in 1983 — the same year that War Games was released. Its CPU is an 80C85. The first iteration of the 8085 came out in 1976. So at the time it was a 7 year old CPU with a single core. But it’s probably a closer comparison to WOPR than a modern PC. That TRS-80 Model 100 contains the N82 version of Microsoft BASIC 80, the last piece of significant software Bill Gates wrote for Microsoft. I rewrote the app in BASIC, and even after optimizing it a bit it took 20 minutes and 35 seconds to run. That’s 17.2 times slower than WOPR. I was worried that maybe WOPR was pretty impressive.
The Cray 1 supercomputer was released in 1976. In my mind, that’s the type of machine WOPR was emulating. The Cray 1 ran at 80Mhz. The 80C85 in my TRS-80 ran at just 2.4Mhz. If Bill Gates had ported BASIC to the Cray 1, I’m guessing my app would have scaled linearly based on the clock speed to 37.05 seconds. There you have it. War Games is bullshit. WOPR took 2x as long as the 7 year older Cray 1 would have run on a single core.
I wanted more tangible evidence, so I rewrote the app in 8085 assembly language. I ran that on the TRS-80, and it ran in 4.8 seconds. A battery-powered, single-core, underclocked machine with a 7 year old CPU bested WOPR by 15 times. No hand-waving Cray-Bill-Gates-BASIC-fever-dream required.
At this point, I could (should!) have stopped. But there were two tangents I needed to pursue. First, in the mid-80s I was obsessed with Turbo Pascal. This compiler was magic; there was nothing else like it. That MTS mainframe I used in college had strict time limits per semester. Mistakenly create one infinite loop and you could blow your budget and fail all of your CS classes. Many of the classes I took used Pascal, so I could do my homework on my 1982 Kaypro running CP/M on a Z80 using Turbo Pascal. Once the assignment worked, I just had to upload it to MTS and run it once to turn it in. (Embedded tangent: on MTS you could “run *pizzadelivery” to order Domino’s pizza, something the internet wouldn’t support for many years). Anyway, I wanted to see how fast an app Turbo Pascal could produce. I used a Kaypro emulator, since mine (which I still have) no longer boots. The emulator runs at 2.5Mhz, same as the actual machine. I ported the BASIC and assembly (the Z80 has a superset of the 8085 instructions) apps to CP/M from the TRS-80, and those ran in times almost identical to the TRS-80. The Turbo Pascal version ran in 33 seconds. Yes, one more kick in the face of WOPR. A 1982 Kaypro with Turbo Pascal is 2.2x faster. No need to write assembly to be faster than that mythical super-computer.
My second tangent was around how my main PC’s AMD 5950x compared to the M1 processor in Apple’s MacBook Air. I rewrote the app in C++, and used the GNU compiler for each platform along with OpenMP for parallelism. The M1 is slightly faster than the 5950x on 1 core (0.039ms vs. 0.0408ms), slightly slower on 3 cores. (0.0189ms vs. 0.0153ms). It’s astonishing to me that the 5950x using 3 cores is 4,705,882 times faster than WOPR. It’s also astonishing to me that Apple’s new CPUs are so good.
The Microsoft Visual Studio C++ compiler produces worse code than GNU on my PC: 0.0589ms for single core and 0.0218ms for 3 cores. I used PPL for multiple cores since OpenMP generally isn’t as efficient with MSVS as it is with GNU.
The Go compiler on my PC produced results only slightly worse than C++ with MSVC: 0.0673ms for 1 core and 0.0261ms for 3. I am really impressed with Go as a language and its implementation.
Swift on the Mac produced decent results, but much slower than C++ and C#: 0.229 for a single core. I didn’t investigate parallelism in Swift.
I also wrote an assembly version for x64. I’m not a good assembly programmer, but for 1 core it was faster than the C# app at 0.0772ms.
I am a little embarrassed I wrote so many lines of code for tic-tac-toe. But I learned some things along the way and I’ll never again stand in the shower looking off in the distance wondering if WOPR’s speed was just for dramatic effect. It was.
Note: The code I referenced can be found here: davidly/ttt: tic-tac-toe and its applicability to nuclear war and WOPR (github.com)