October 2024

TRAVEL

I had a fantastic time in Taiwan. First I got to see the country solo, though not as much of it as I had planned. I landed on the northern end of the island at the same time a typhoon was busy pummeling the southern end. My side trip to see the old city of Tainan was therefore canceled; instead I stuck to the northern third, where the storm turned out to be a nothingburger despite two canceled school days. But a train ride down to Hualien and a car tour of Jiufen and Shifen on the east coast satisfied my standing urge to escape the city. And Taipei is marvelously comfortable for an American.

After the solo portion, I was wrapped in the over-the-top hospitality of the parents of the Taipei American School debate program, who wined, dined and regaled me with about 81,102 toasts, though I confess I didn’t count. For whatever reason, Taiwan had never been on my priority list of places to visit. But now it’s on my list of places to return to.

One thing I did manage while there was to spend a whole day writing travel blog posts about earlier trips, so my sequence about last year’s Japan adventure will post once a week until done. Let’s see if I can get more written between now and the end of that.

No more big journeys for me until Thanksgiving, when I launch on a European Gallivant.

TABROOM

We had a brief downtime on Saturday, which thankfully happened just as I was waking up in Taipei anyway. The downtime apparently set off a fair fury of speculative debugging around the socials, because the error messages indicated that a disk was out of space. But the lesson here is that error messages can be misleading if you don’t have the full picture.

Some myths, debunked: Tabroom does indeed run in the cloud, not on a server we run ourselves. That fact isn’t as magical as you might expect: “the cloud” just means someone else’s server. Cloud services are still subject to the resource limits any other server has. In particular, database servers are tricky to parallelize, that is, to run across multiple machines so that we aren’t vulnerable to a single instance going down. So while our web servers run 2 instances during the week and anywhere from 4 to 16 instances on weekends, our primary database server remains singular. That limitation comes from the core technology and is not specific to Tabroom, so we are stuck with it.

The root cause of this downtime was not insufficient disk space. Tabroom’s database takes up 36 GB of space; that’s all your registered entries, ballot comments, and event descriptions rolled up into one mess of data. The database has its own dedicated disk, separate from the operating system and general server it runs on. That disk currently has 128 GB of space; not the largest, but we pay for fast instead of big here, and it’s still about 3.5 times what Tabroom’s data needs. It’ll do fine for a decade, and we can expand it whenever we need to.

That was not the full disk you were seeing errors about.

Instead, the culprit was a badly written query, created years ago for a rarely used results page that suddenly surged in popularity this weekend. That query failed to limit its scope: to calculate its output, it pulled every ballot and ballot score in Tabroom. In 2016, when it was written, that made it slow but not particularly noteworthy. In 2024, every time someone loaded that page, the server created a 23 GB temporary file on its disk to run that one query.
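For the curious, here is a minimal sketch of what “failing to limit its scope” means, written in Python with its built-in sqlite3 module purely for illustration. The table, columns, and function names are hypothetical; Tabroom’s real database, schema, and query are different. The unscoped version aggregates every ballot ever recorded and only then throws most of it away; the scoped version narrows to one tournament before doing any work.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real Tabroom schema differs.
SCHEMA = """
CREATE TABLE ballots (
    id INTEGER PRIMARY KEY,
    tournament INTEGER NOT NULL,
    entry INTEGER NOT NULL,
    win INTEGER NOT NULL  -- 1 for a win, 0 for a loss
);
CREATE INDEX ballots_tournament ON ballots (tournament);
"""

# Unscoped: groups every ballot in the database, for every tournament ever.
UNSCOPED = """
SELECT tournament, entry, SUM(win) AS wins
FROM ballots
GROUP BY tournament, entry
"""

# Scoped: the WHERE clause narrows the rows before any grouping happens.
SCOPED = """
SELECT entry, SUM(win) AS wins
FROM ballots
WHERE tournament = ?
GROUP BY entry
ORDER BY entry
"""

def wins_unscoped(conn, tournament):
    # Fetch everything, then filter in application code.
    rows = conn.execute(UNSCOPED).fetchall()
    return sorted((entry, wins) for t, entry, wins in rows if t == tournament)

def wins_scoped(conn, tournament):
    # Let the database fetch only the one tournament's ballots.
    return conn.execute(SCOPED, (tournament,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO ballots (tournament, entry, win) VALUES (?, ?, ?)",
    [(t, e, (t + e) % 2) for t in range(1, 101) for e in range(1, 21)],
)
print(wins_scoped(conn, 7) == wins_unscoped(conn, 7))
```

Both functions return the same answer on a small dataset, which is exactly how a query like the first can sit unnoticed for years: the cost only shows up as the table grows.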

At that point, it was only a matter of time: no server can indefinitely handle several hundred 23 GB files being dumped on it. At 4:30 PM CST, the disk hit its limit. Kaboom.

Fortunately, once I woke up, it was a simple matter to clear the disk, fix the query, and kick the server. That’s sometimes life when you’re the sole maintainer of a project. I simply cannot go back and test every one of the thousands of queries that Tabroom regularly runs. When I do confront something like this, I review it and put up guardrails against this exact thing happening again, but that only solves for the problems I’m aware of. Are there other ticking time bombs in the code? Probably! Is this true of every other online service on earth? Definitely!
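One kind of guardrail is a hard budget on how much work any single query may do. The sketch below uses the progress handler in Python’s sqlite3 module as a stand-in; it is not how Tabroom does it, and the helper name and budget numbers are made up for illustration. Server databases typically offer their own per-statement limits, configured in their own way.

```python
import sqlite3

def run_with_budget(conn, sql, params=(), budget=1_000_000, every=1_000):
    """Run one query, aborting it once it exceeds roughly `budget` VM steps.

    Hypothetical guardrail: the budget and check interval are illustrative.
    """
    steps = 0

    def handler():
        nonlocal steps
        steps += every
        # Returning a non-zero value tells SQLite to interrupt the statement.
        return 1 if steps > budget else 0

    # SQLite calls the handler every `every` virtual-machine instructions.
    conn.set_progress_handler(handler, every)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Remove the handler so it does not affect later queries.
        conn.set_progress_handler(None, every)

conn = sqlite3.connect(":memory:")
print(run_with_budget(conn, "SELECT 1"))

# A deliberately runaway query: counts to ten million via a recursive CTE.
RUNAWAY = """
WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 10000000)
SELECT COUNT(*) FROM c
"""
aborted = False
try:
    run_with_budget(conn, RUNAWAY)
except sqlite3.OperationalError as err:
    # An interrupted statement surfaces as an OperationalError.
    aborted = True
    print("query aborted:", err)
```

The design choice here is to fail one expensive request loudly rather than let it quietly consume a shared resource, such as disk, that every other request depends on.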

But I promise you the issue is not that we haven’t found a good enough deal on enterprise disks, or that the NSDA is being cheap about the hosting provider. We’ve actually been pretty lavish with server resources this year. But this particular problem would have blown up no matter how much overkill we’d built into our hosting setup; it was simply the result of a terribly common human error. That’s the nature of my job. Consider: your typos can, at worst, insult someone or temporarily hurt a student’s grade. Mine can bring down most of speech & debate. No amount of paranoid care can entirely prevent that, even though I take quite a bit of it.

The worst part of it? That results page doesn’t actually work properly anyway; its formatting is broken. And since people are for some reason now fascinated by this page, they won’t stop emailing us about it. I’m going to put up a notice about that, or perhaps I’ll just take the page down. There’s little value in spending a week fixing this bad spaghetti code when I’ll have to rewrite it soon anyway; instead I’ll just move it up the list of things to rewrite early.

The rewrite proceeds apace. Right now I’m standing up a testing framework, which should help a great deal in catching bad queries before they blow up in production. Having a proper testing framework from the beginning of the rewrite reduces the chance that future changes will break existing code without my knowing about it. But it also means I have to slim down the 36 GB of data y’all have created over the years into a set that is complete enough to test every scenario but doesn’t take 2 hours to load into the testing database. This work is drudgery, but invaluable, which is the worst kind of drudgery. Then again, given the jet lag, it’s probably about what my brain is up for right now.
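As a rough illustration of the kind of check a testing framework can run, the sketch below asks SQLite for its query plan against a tiny fixture database and fails if the results query would read an entire table. The schema, query, and helper names are hypothetical, and SQLite again stands in for whatever database the rewrite actually uses.

```python
import sqlite3

# Hypothetical fixture schema; a tiny stand-in for slimmed-down production data.
SCHEMA = """
CREATE TABLE ballots (id INTEGER PRIMARY KEY, tournament INTEGER, entry INTEGER, win INTEGER);
CREATE INDEX ballots_tournament ON ballots (tournament);
"""

RESULTS_QUERY = "SELECT entry, SUM(win) FROM ballots WHERE tournament = ? GROUP BY entry"

def make_fixture():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # A handful of rows is enough to exercise the query plan.
    conn.executemany(
        "INSERT INTO ballots (tournament, entry, win) VALUES (?, ?, ?)",
        [(t, e, e % 2) for t in (1, 2) for e in range(1, 6)],
    )
    return conn

def full_scans(conn, sql, params=()):
    """Return the plan steps where SQLite intends to read a whole table."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail); the detail text starts
    # with "SCAN" for a full scan and "SEARCH" when an index narrows the rows.
    return [row[3] for row in plan if row[3].startswith("SCAN")]

def test_results_query_is_scoped():
    conn = make_fixture()
    assert full_scans(conn, RESULTS_QUERY, (1,)) == []

def test_unscoped_query_is_flagged():
    conn = make_fixture()
    assert full_scans(conn, "SELECT entry, SUM(win) FROM ballots GROUP BY entry")
```

The test functions follow pytest’s naming convention, so pytest would collect them, but they also work as plain function calls. A plan check like this runs in milliseconds on a small fixture, which is the point of keeping the test data slim.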

OTHERWISE

Fall is here! It’s the best time of the year in New England, except that I’m allergic to it and still jet lagged from Taiwan. The days are also growing sadly shorter, which isn’t helping with the lag. I’m hoping to get up into the north country this weekend and spend some time in the crisp outdoors; I hope it won’t be a total tourist mob scene in the White Mountains, but we’re likely past peak foliage up there anyhow.

This blog is, and long has been, hosted on WordPress. But recently the WordPress project has decided to set itself on fire, thanks to an apparent hissy fit by its founder. He runs both the nonprofit that owns the open source code and update servers and a for-profit hosting company, Automattic, built on the same software. He claims that a competing hosting company, WPEngine, isn’t giving enough back to the community and is somehow abusing trademarks in a nebulous way. But WPEngine isn’t required to give anything back at all, and the trademark claims seem spurious to me; that was enough to raise my antennae. Then the founder leveraged his control over the nonprofit to cut WPEngine off from the open source code and security updates. The project also took over a plugin created and maintained by WPEngine and pushed out its own changes to it, renaming it in the process, under the guise of “security.” That update would have auto-installed on thousands of blogs without their administrators consenting to the change or even being aware of it.

That final bit crossed the Rubicon in my book; I no longer trust WordPress, and will soon move this site to another platform. Honestly, WordPress was never a perfect fit for me anyway, and because it is so common on the web, it also requires a lot of security filtering; even my little blog suffers near-constant hacking attempts. The most obvious alternative appears to be Ghost, which has the virtue of integrating email subscriptions, so if you are one of the three people who like to keep up with my blather, you’ll have that option soon.

I don’t really want or need an additional side project, but so it goes sometimes in the world of open source.