
Answering innumeracy with data

He still doesn’t understand my point about increased mutuality, but I’ll write that up when it’s not 12:30 AM on a Tuesday.

For now:

Percentage of 1-off judges at Lexington: 8.46%
Percentage of 1-offs at Columbia: 8.33%

Man, really blew mutuality to shreds with those 9 tiers at Lex, didn’t we?

At Columbia, 12.5% of the VLD pool did not pref; at Lexington it was only 9%, thus making the job harder at Lexington to boot. As we say in the business, “No Link.”

Potshot #2

Menick is reaching a flawed conclusion, in my opinion, not because his reasoning is unsound (try not to faint of shock) but because his underlying assumptions are.

Assumption #1: My pref sheet is based on paradigms.

Hah, as if. The nether regions of the pref sheet are sometimes based on a reading, or the mere existence, of a paradigm. The meat of the pref sheet, however, is based on first hand experience with judges. We keep a written log of our RFDs in our team Dropbox; I read those regularly to adjust our pref sheets, because that’s actual data, and not just random assertions and opinions like the paradigm represents. I’ve never read the paradigms of our top judges; I rely instead on being in the room when they say “I vote on theory” and ignore their assertion that they never will.

As such I find judges, the ones I prefer anyway, rather predictable. When my debaters lose a round, they get an L; when they win one, they get a W, and we can talk about how to make the former turn into the latter. 99 times out of 100 I can figure out why they lost a given judge and most of those times, it’s because they ignored something I told them to do beforehand. The other 1 time, the pref sheet probably changes. My pref sheet isn’t about converting Ls into Ws; it’s about winning when we do win the debate, and being able to coach beforehand.

Remember, señor Menick, that the folks you’re hearing from are those coming into tab suggesting that you find them a better panel, as if you hadn’t already thought to try. It takes someone rather unfamiliar with the way things work to imagine that tab rooms put out horrendous panels and withhold good ones until asked. In short, you’re hearing only from people who don’t know what they’re doing, and drawing general conclusions from that data. Talk about your flimsy evidence.

Assumption #2: Good pref sheets can win all the rounds!!!

A good chunk of rounds are yours no matter the panel; a good chunk of rounds likewise are impossible to win. If you’re a senior with ten bids against a sophomore with ten cards, you’re going to win unless you do something tragic, even in front of a 1-5 judge. The pref sheet isn’t about that; it’s about helping you in the marginal debates. Which debates are marginal depends on what your level is; a younger team should have different prefs than an older.

Assumption #3: The W/L is the only concern of the pref sheet

This assumption is the most important of the set. The number I hang on a judge isn’t entirely about the likelihood they’ll vote for us. It’s just as much informed by the type of debate that judge would like to hear. If you don’t like debating theory, then you de-pref the theoriest of the theory judges, even if one or two of them is likely to vote for you anyway. The aim here is not to win rounds you would have lost otherwise, but to have debates you find enjoyable and are prepared for.

NStar was a mighty LARPer, and was most at home with DAs and CPs and such. If some framework-happy sophomore in her first varsity tournament came along, and hit him round 2 in front of a 1-2 judge in her favor, a judge who positively loves philosophical framework debate, she’d still be toast. But she could argue the kind of debate she wanted, and he could not, despite being better at it. So even with the W, a harm is caused. It’s sometimes an unavoidable harm, but it’s one the pref system is designed to minimize; and it’s one that blunt, imprecise tiers minimize less well.

Note that it is not a harm that NStar had to debate framework; it’s simply relatively unfair that one debater got to steer the debate into her own home turf and the other didn’t. A better outcome is a judge who likes yet a third style of debate, and so both debaters have to adapt equally. That is, after all, the idea behind mutuality, and an argument for the maximal mutuality possible.

At a wider level, too, I’ll pref differently for younger debaters. Some judges are not as good for us stylistically, but they’re great educators and can give excellent feedback. I won’t stake a junior or senior’s last bid round on being able to adapt to them in front of their favoritest debaters; but I might take the chance to get a good post-round for a student who is going to lose those rounds anyway.

Assumption #4: Teams’ opinions don’t matter

Take away everything else, denounce it as the foul lies of a dirty Papist or what have you; and you’re still left with a final thought: the perception of fairness sometimes matters as much as the reality. If a rating is entirely about unfounded imprecise opinions (which I would assert might be true for some people’s pref sheets but isn’t of mine), those opinions still matter. A kid walking into a debate where she feels she has no shot to win because of the judge, likely will not, even if she actually had a decent shot after all.

There are a lot of reasons to de-prefer a judge; there’s the judge who might actually like your style a great deal but freaks you out, or the college freshman judge who you have a crush on and can’t string two words together in front of. There are a host of considerations that go into a pref sheet, and some of them are opaque to the tabber; if we’re going to have the tool at all, it may as well be as good as you can make it.

RFD

Finally, to steal a page from debate: there is no offense in the round for less precise categories. I’ve given a half dozen or so positive reasons for more precision; thus far all I’ve heard in reply is doubt whether the precision achieves a real effect. But without any affirmative reason to prefer blunter categories, who cares? Why is a tournament better for having 4 categories instead of 8 or 12 or 16? All I’ve heard argued is defense: doubt that the 8 category tournament is better than the 4 category one. There’s no argument on my flow of anything being harmed by having smaller, more numerous categories. Sure, an 8 tier system will have more 1-2 matchups, but if it makes you feel better, all those 1-2 matchups would be 1-1s in a 4 tier system; and there’ll be fewer of them on the pairing than there would be hiding underneath those 1-1s. Better for the debaters and the judges both; though perhaps worse for perfectionist tabbers. Speaking as a perfectionist tabber, surely my sensibilities are an unimportant factor here.

In absence of offense for the larger category pref sheet then, I’m sticking to, and advocating for, more of ’em.

I’ll have my tiers and eat them too

Menick contends that fewer tiers are fine, thanks. I’m definitely in the more-is-better school with categories. Fewer categories needlessly throw away information that tab could use to make more mutual pairings, is why.

I think the point of difference is that he defines a “mutual” matchup as a matchup where the two judges are in the same category, and anything else as imperfect. But the real picture is murkier than that; simply making the categories larger and thereby forcing me to rate more judges in a category does not magically make all those judges equally preferred to my debaters.

Suppose one tournament has 4 categories and another 8; and the first delivers 100% mutual matchups, and the second has a number of one-offs on the pairing. The second tournament will have delivered the more mutual judging. The 4-category tournament will have many more matchups that are just as non-mutual as those 8 category one-offs. They will only nominally be mutual to the eye of the tabber. There will be more of them, too, as the tab system no longer knows which matchups would have been 1-2s in an 8 category system, and so it can do nothing to minimize them.
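To put that in terms a data person would use, here is a minimal Python sketch of how folding 8 tiers into 4 hides one-offs rather than removing them. The judge names and ratings are invented for illustration, and the coarse_tier mapping (pairing adjacent tiers) is only one assumed way a sheet might collapse; it is not taken from any real tab software.

```python
def coarse_tier(fine_tier: int) -> int:
    """Collapse an 8-tier rating (1-8) into a 4-tier rating (1-4).

    Assumption: adjacent tiers are merged, so 1 and 2 become 1,
    3 and 4 become 2, and so on.
    """
    return (fine_tier + 1) // 2


def hidden_one_offs(judges):
    """Return names of judges who look mutual on the 4-tier sheet
    but are actually one or more tiers apart on the 8-tier sheet."""
    hidden = []
    for name, aff, neg in judges:
        fine_gap = abs(aff - neg)
        coarse_gap = abs(coarse_tier(aff) - coarse_tier(neg))
        if coarse_gap == 0 and fine_gap > 0:
            hidden.append(name)
    return hidden


# Hypothetical judges: (name, aff's 8-tier rating, neg's 8-tier rating)
judges = [
    ("Judge A", 1, 2),  # 1-2 in 8 tiers, shows as 1-1 in 4 tiers
    ("Judge B", 2, 2),  # mutual either way
    ("Judge C", 3, 4),  # 3-4 in 8 tiers, shows as 2-2 in 4 tiers
    ("Judge D", 2, 3),  # off by one in both systems
]

print(hidden_one_offs(judges))
```

In this made-up pool, Judge A and Judge C are the matchups that a 4-tier pairing would report as perfectly mutual even though the coaches involved would not rate them equally.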

When you place a 1-2 judge in an 8 category tournament, you know what you’re doing and you know there’s no better choice. If that same tournament used 4 tiers, then you might place that same judge into the same debate, despite there being a more mutual option which is concealed by the broader, less precise categories. The pairing looks prettier, but at the expense of the missing data which might make it fairer. Choosing tier sizes should not be about satisfying the OCD of tab staff.
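A rough Python sketch of that scenario, under stated assumptions: the pick_judge rule (smallest rating gap first, then lowest combined rating) is a stand-in I made up for illustration, not how any particular tab program actually chooses, and the two judges are hypothetical.

```python
def coarse_tier(fine_tier: int) -> int:
    """Assumed mapping from 8 tiers to 4: adjacent tiers are merged."""
    return (fine_tier + 1) // 2


def pick_judge(candidates, rate):
    """Pick the candidate with the smallest rating gap between the two
    debaters, breaking ties by the lowest combined rating.

    `rate` converts each coach's 8-tier rating into whatever scale the
    tournament actually collects. Remaining ties go to whichever judge
    happens to come first in the list, since min() keeps the first
    minimal item.
    """
    def key(judge):
        _, aff, neg = judge
        a, b = rate(aff), rate(neg)
        return (abs(a - b), a + b)

    return min(candidates, key=key)[0]


# Hypothetical candidates: (name, aff's 8-tier rating, neg's 8-tier rating)
candidates = [("Judge A", 1, 2), ("Judge B", 2, 2)]

with_8_tiers = pick_judge(candidates, lambda tier: tier)
with_4_tiers = pick_judge(candidates, coarse_tier)

print("8 tiers:", with_8_tiers)
print("4 tiers:", with_4_tiers)
```

With 8 tiers the sketch can see that Judge B is the mutual option. With 4 tiers both judges look like identical 1-1s, so the choice falls to list order and the less mutual Judge A gets placed.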

Pairing mutual matchups is becoming an automatic process, and there’s little reason to deprive this process of additional data. The point of expanding to more categories while necessarily growing more permissive of one-off matchups is not to increase the numbers of those matchups, but to minimize the number of hidden ones. So from a tabbing perspective there is little defense in my mind to using blunter, less precise categories.

It may be there’s a limit past which coaches are unable to make distinctions in the judging pool. That limit is higher than 9, for me: I certainly could have filled out a pref sheet at Lex with clear distinctions between tiers; all my 1s would have been preferred to the 2s, all 2s over the 3s, etc. The distinctions would have carried real information well down my sheet, and if I were going to the tournament, I’d want the tab staff to have that information in pairing my judging.

College debate has operated for some years under the premise that the limit of reasonable distinctions does not exist, and rates judging ordinally at most tournaments. I think the high school community largely hasn’t followed suit only because our software wouldn’t permit it, not for any inherent reason. At worst, finer distinctions do no harm; if you really don’t have any point of difference between 6s and 7s on your sheet, then just randomly assign them between the two ratings. Given that I am a coach who makes those distinctions, I don’t see why I should sacrifice them because another coach does not.

So in short, I’m a computer programmer, and a data person, and I don’t see much value in sacrificing more data for less.

The Massacre of the Novii

On July 1st of each year, I have a ritual I call the Massacre of the Novii. Today I go through the database on Tabroom.com and change every student listed as a novice to not be a novice anymore. I also this year went through and automatically marked any student with a grad year 2012 or before as “retired”. So your team rosters will be considerably smaller; and *sniff* our little babies are all grown up now into the vicious argumentative hellions we’ve trained them to be. Papa’s so proud.
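For the curious, the ritual amounts to two bulk updates. Below is a minimal sketch in Python against an in-memory SQLite database; the student table, its column names, and the sample students are all hypothetical stand-ins, and this is not Tabroom’s actual schema or code.

```python
import sqlite3


def massacre_of_the_novii(conn, cutoff_grad_year):
    """Clear every novice flag, then retire anyone whose grad year is
    at or before the cutoff. Table and columns are assumed names."""
    conn.execute("UPDATE student SET novice = 0 WHERE novice = 1")
    conn.execute(
        "UPDATE student SET retired = 1 WHERE grad_year <= ?",
        (cutoff_grad_year,),
    )
    conn.commit()


# Build a throwaway roster to run the sketch against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "name TEXT, grad_year INTEGER, novice INTEGER, "
    "retired INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO student (name, grad_year, novice) VALUES (?, ?, ?)",
    [
        ("Alex", 2014, 1),   # novice, still in school
        ("Blair", 2012, 0),  # varsity, graduated
        ("Casey", 2011, 1),  # stale novice record, graduated
        ("Drew", 2013, 0),   # varsity, still in school
    ],
)

massacre_of_the_novii(conn, 2012)

rows = conn.execute(
    "SELECT name, novice, retired FROM student ORDER BY name"
).fetchall()
print(rows)
```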

I’ve been working feverishly on Tabroom.com this summer, mostly doing boring behind the scenes work to prepare it to function much more smoothly with debate events, particularly international debate events. This work is supported by a grant from the Open Society Foundation, which is George Soros’s main philanthropic effort, and IDEA, the International Debate Education Association. The plan is for Tabroom to become the integrated web front end for debate tournaments worldwide, working together seamlessly with the CAT/debateresults.com system developed by Jon Bruschke of CSU Fullerton, who’s been a great hippy Californian partner in arms in this effort. Mostly, I’m doing the web stuff, he’s doing the desktop client.

This is not a black UN helicopter taking over Tabroom; I’m still going to be in the thick of it, and the software itself, by OSF mandate, must be open sourced. This effort on OSF/IDEA’s part is about expanding their services and therefore their own profile in debate, and also attempting to cross-pollinate good ideas from abroad and the US. It’s not about seizing control of anything. There are also plans afoot to integrate this tabulation and results system into a global honor society, in which debaters can be recognized for their entire careers, high school, college and coaching, worldwide. All of which I think is very exciting, and I’m glad IDEA is stepping in to fill these needs.

The programming itself is unspeakably boring, because it mostly consists of me correcting some fundamentally flawed assumptions and mistakes that I made back in the beginning of Tabroom 2.0, which was released more or less in 2004. (Tabroom 1.0 was 2000-2003, but nobody ever used it except for me.) Tabroom 3.0 features a professional graphical design based on the new IDEA website, which is spiffier than anything I could come up with; I can design for clean, but not quite for “shiny”.

But I’m also working on some cool new features; I don’t want to overpromise, but I expect that Tabroom will support texting/email of pairings, team management features where your students can sign up for tournaments directly on Tabroom and need only a coach’s approval, the ability for judges to enter their ballots and results directly online by computer or phone, more varied ways of displaying results (a carryover from debateresults.com), and a few new surprises that I’m cooking up. It’ll support US formats, together with various global formats, such as 4 team British Parliamentary debate and more.

So that’s the future of Tabroom.com. Launch is August 1st for registration, Sept 1st for tabbing/pairing features. And brave new worlds shall be upon us.

Survival

Yale survived and even flourished without me, which is good for both its future and my own. I got September back this year, and it turns out to be a lovely month, with all kinds of nice cooling weather. Though today it’s raining like hell, but ah well. They even refrained from doing anything embarrassing in the awards ceremony, which I appreciate. I was also sick all day Saturday of Yale, which, had it happened when things depended on me there, would have been an adventure. I survived, though recovery has been slow. So that’s all to the good.

Now we have a lull prior to Bronx. Bronx will be about four weeks long, so that’s good. I’m running IEs, so there’s going to be a curious beat to the weekend, where I’m round-robinning for the first day, then this magical Sushi tour that Cruz is so enraptured with, then a full day off apart from coaching, since there’s no speech on Friday, which will give us time to panel and arrange. Then we’ll be over at Fordham on Saturday for IEs, then back to Bronx for bubbles coaching presumably, then IE and debate elims together back at Bronx on Saturday. Woo boy. It also means we give Cruz another awards ceremony, and this one likely large and well attended, as IE people are more into that kind of thing than debaters are.

We’ve failed to record a TVFT after about three times trying. That’s par for the course. Timezones and too many people, it happens. I’m also thinking of the best way to commit my Standard Economics Lectures to media. I have given them at extemp camp and practices many many times at this point, and wouldn’t mind getting them down in a form where people can digest them at leisure. But I don’t know that my style of teaching is suited to either video or audio-only. I do have a good outline going though.

And I’m blogging. Perhaps regularly. We’ll see.