The 4-2 Screw by Breann Smith

It is a major tournament, and you have hit fierce competitors, including Round Robin participants. You are 4-2 and have a good chance of breaking. Then the breaks are posted. You rush up to the list and scan it eagerly for your name. After checking and double-checking, you look at the speaker point cutoff. You realize you have missed breaks. By six-tenths of a speaker point.

At major tournaments, most 4-2s are expected to break, but what if a debater’s speaks are not high enough? This happened, for example, to six 4-2s at the Greenhill tournament, where the average 4-2 missed advancing by merely six-tenths of a speaker point.1 This phenomenon, known as the “4-2 Screw,” affects all tournaments, national and local alike. Judges evaluate speaker points in many ways, and each method brings its own set of variables.

How “Pretty” You Speak

Some judges evaluate rounds based on how “pretty” the debater speaks. While this is consistent with what most people think of when they hear the term “speaker points,” it has a few fundamental flaws. First, it advantages those whose speech patterns are closest to the judge’s own. If the debater speaks the way the judge does, the judge is more likely to think they speak well and reward them with higher speaks. Conversely, if the debater speaks in a way the judge finds peculiar, higher speaker points seem unlikely. Second, this method is not ideal because some debaters may have an accent the judge finds cacophonous, or English may not be their primary language. In both cases, the debater is at a disadvantage because the judge may not think they spoke well and thus give them lower speaks. Both of these points are supported by a study at the University of California,2 which found that we strongly prefer voices that sound similar to our own and that we prefer “breathier” tones as opposed to more “raspy” ones. The third and most important problem with evaluating how “pretty” a debater speaks is that it disadvantages those with speech impediments, such as stutters or lisps. This puts the debater in a double bind of sorts. They can either a) tell the judge about the impediment or b) not tell the judge about it. With the first option, the judge could overcompensate or undercompensate with their speaker points, depending on that particular judge’s evaluative tendencies. With the second option, the debater risks receiving poor speaker points due to something they could not prevent. Each of these options leaves much to be desired.

Strategic Decisions

The second mechanism for evaluating speaker points, strategic decisions, seems even more arbitrary than how well one speaks. First, it encourages judges to come up with their own strategies for how they would win the round. This is harmful because they may not pay full attention to the arguments presented in the round while they are thinking of their own strategy, and it rewards debaters for thinking like the judge does. Second, judges may not see the strategy the debater was going for. While thinking of their own strategy, judges might not see, or might even completely dismiss, the strategies actually present in the round. This is unique to speaker points because judges have empirically voted off arguments they do not like, and the same could go for strategies. Because more variables go into determining speaker points, the resulting points are less likely to be very high.

Out Round Odds

Odds of being in out rounds are a third method of deciding speaker points. This has two main problems, the first of which is the competition level of the particular round. If it is a competitive round, and the opponent is a nationally ranked debater, a good local debater might not show their best. That one round may not reflect their true chances of reaching out rounds, yet it is reflected in their speaker points and so affects their chances of breaking. The second main problem is that the debater’s odds of being in out rounds are often determined by those very speaker points. If, on this scale, a judge views a 28 as “good and should clear” but the speaker point cutoff is around 29, the debater that “should clear” may miss clearing because of those speaks.

Entertainment

Yet another way to evaluate speaker points is how entertaining the debater is in round. Unless the judge puts exactly what they find funny on their paradigm, debaters will not know how to “be humorous” according to what the judge likes. This is not Humorous Interp, and debaters often make arguments that would be inappropriate when paired with humor. For example, if an affirmative ran a Revenge Porn AC on this topic, it would be hard to crack a joke somewhere in there without being plain offensive. Further, if a debater is not funny according to the judge, they either will not get good speaks or the judge will decide their speaks in some other arbitrary manner. Another problem with this method is how one separates the categories of “humor.” If a 28 means “You were kind of funny, but I didn’t laugh,” then what about a 29, or a 30? This seems to be an arbitrary and ineffective measure of who should break at a tournament.

Point Fairies & Tenths-of-a-Point

We all know and love those judges that give out 30s, but is that really a measure of who should break? Under this paradigm, you can be a decent or mediocre debater and still get high speaks. While this is encouraging to the debater who receives the 30, it creates an imbalance within the tournament pool. If a debater who goes 4-2, hits good competition, and struggles for 28s and 28.5s does not break, while a debater under the same conditions who received a 30 does, the pool has effectively been skewed despite similar debating skill. Another point system that skews the breaks (although somewhat less drastically) is the use of tenths of a point. With the aforementioned inflation, some judges have started using halves or even tenths of a point to differentiate debaters. While the intention behind this was good, it only creates more arbitrary distinctions. Is there really a difference between a 28.9 and a 29? Odds are no, but judges insist on doing this to offset inflation. This only creates frustration when, for example, a debater misses breaking at the Greenhill tournament by .3 speaks. This clearly shows how important arbitrary tenths of a speaker point really are in the grand scheme of breaks.

The Alternative

Times may seem bleak for 4-2s everywhere, but do not fret: there is an alternative! Instead of the traditional “Record-Speaks-Opponent Wins-Opponent Speaks” format, tournaments should utilize opponent wins BEFORE speaker points. If power matching and breaks were determined by opponent wins, the quality of rounds would increase. Say, for example, two TOC level debaters hit in round one, and two novice debaters also hit in round one. Due to Mutual Preference Judging, the national circuit debaters would most likely pref a judge that does not hand out high speaker points freely, because that judge is constantly judging that level of competition. On the flip side, the novice debaters at that same tournament may get new judges or point fairies. The result would be one TOC level loser and one novice winner, seemingly undifferentiated according to speaker points. By using opponent wins, the TOC level debater would be rewarded with higher chances of breaking for hitting good competition, while the novice level debater would be less likely to break due to the quality of their competition. Using opponent wins would also increase the quality of rounds by weeding out the “easy draws.” This is because matching based on opponent wins pairs debaters who have hit similar levels of competition.
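To make the difference between the two orderings concrete, here is a minimal Python sketch of both as sort keys. The traditional order comes from the format named above; the order used for the alternative (Record, Opponent Wins, Speaks, Opponent Speaks) is an assumed reading, since the article only says opponent wins should come before speaker points. The `Entry` class, the function names, and both debaters' numbers are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    wins: int          # prelim wins
    speaks: float      # total speaker points
    opp_wins: int      # combined wins of the opponents faced
    opp_speaks: float  # combined speaker points of the opponents faced


def traditional_key(e: Entry):
    # Record, then Speaks, then Opponent Wins, then Opponent Speaks.
    return (e.wins, e.speaks, e.opp_wins, e.opp_speaks)


def opp_wins_first_key(e: Entry):
    # Assumed reading of the proposal: Record, Opponent Wins, Speaks, Opponent Speaks.
    return (e.wins, e.opp_wins, e.speaks, e.opp_speaks)


def seed(entries, key):
    """Return names from highest seed to lowest under the given tiebreak key."""
    return [e.name for e in sorted(entries, key=key, reverse=True)]


# Hypothetical 4-2 debaters; every number here is invented for illustration.
FIELD = [
    Entry("Debater A (hard draws)", 4, 170.5, 22, 1020.0),
    Entry("Debater B (easy draws)", 4, 173.0, 14, 990.0),
]

if __name__ == "__main__":
    print("Traditional:", seed(FIELD, traditional_key))
    print("Opponent wins first:", seed(FIELD, opp_wins_first_key))
```

With these made-up numbers, the debater with higher speaks seeds first under the traditional order, and the debater with the harder draws seeds first once opponent wins are compared before speaks.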

Once written down, this alternative does not seem all that crazy. In fact, when asked “How should breaks and power matching be determined,” 72.4% of the debate community members polled were in favor of opponent wins, while only 27.6% were in favor of continuing to use speaker points.3 Compared to arbitrarily decided speaker points, using opponent wins before speaker points seems to be a well-supported and viable option for determining breaks and power matching.

Notes And Methodology

  1. For each competitor, it was calculated how many points they were from breaking. Those points were then averaged to get the final result of .6, meaning that the average 4-2 missed breaking by six-tenths of a speaker point.
  2. McGuire, Grant. [Professor at the University of California Department of Linguistics http://people.ucsc.edu/~gmcguir1/] “Why Some Voices Sound More Attractive.” Phys.Org. American Institute of Physics, 5 Nov. 2010. Web. 30 Nov. 2014. <http://phys.org/news/2010-11-voices.html>.
  3. The question “How should breaks and power matching be determined” was presented to five different debate-centered Facebook groups over a time span of one week [October 22, 2014-October 29, 2014]: (i) NSDA Student Leadership Committee, (ii) High School Policy, (iii) Debate Girls 2016, (iv) YAAS Lab, (v) Finktank. Anyone who was in multiple groups only got one vote. If the same person voted one way in one group and another way in a different group, their vote(s) were voided.

NSDA SLC:
Total Members: 173
Number of Respondents: 54
Opponent Wins: 47
Speaker Points: 7

High School Policy:
Total Members: 2,059
Number of Respondents: 95
Opponent Wins: 61
Speaker Points: 34

Debate Girls 2016:
Total Members: 82
Number of Respondents: 15
Opponent Wins: 8
Speaker Points: 7

YAAS Lab:
Total Members: 21
Number of Respondents: 9
Opponent Wins: 8
Speaker Points: 1

Finktank:
Total Members: 16
Number of Respondents: 8
Opponent Wins: 7
Speaker Points: 1

Total Respondents: 181
Number For Opponent Wins: 131
Number For Speaker Points: 50
Percentage for Opponent Wins: 72.4%
Percentage for Speaker Points: 27.6%
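
The totals and percentages can be rechecked from the per-group counts. The following is a minimal Python sketch, not part of the original methodology: it simply sums the votes listed for each group and derives each option's share. The names `POLL` and `tally` are hypothetical, and the sketch assumes the listed per-group counts already reflect the one-vote-per-person rule described in note 3.

```python
# Per-group tallies copied from the methodology note:
# (group name, votes for opponent wins, votes for speaker points)
POLL = [
    ("NSDA SLC", 47, 7),
    ("High School Policy", 61, 34),
    ("Debate Girls 2016", 8, 7),
    ("YAAS Lab", 8, 1),
    ("Finktank", 7, 1),
]


def tally(poll):
    """Sum votes across groups and return totals plus percentage shares."""
    opp_wins = sum(opp for _, opp, _ in poll)
    speaks = sum(spk for _, _, spk in poll)
    total = opp_wins + speaks
    return {
        "opponent_wins": opp_wins,
        "speaker_points": speaks,
        "total": total,
        "pct_opponent_wins": round(100 * opp_wins / total, 1),
        "pct_speaker_points": round(100 * speaks / total, 1),
    }


if __name__ == "__main__":
    for key, value in tally(POLL).items():
        print(f"{key}: {value}")
```

Summed this way, the counts reproduce the 72.4% and 27.6% shares reported in the article.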

  • Breann Smith

    UPDATE: The purpose of this article was to start a conversation about how flawed and arbitrary speaker points are, and possibly come up with alternatives. I agree opponent wins does not seem ideal; it was simply the best system that I was aware of at the time. Since the publishing of this article, Chris Theis released his article about the issue (http://vbriefly.com/2015/01/12/opponent-adjusted-performance-score-a-proposed-alternative-to-speaker-points-as-the-first-tie-breaker/) and it looks to be an excellent alternative.

  • Bpoop

    I think a much better solution would be a better speaker point allocation process. In the status quo, almost every judge gives speaker points between 25 and 30. This makes no sense.
    1. It means a lot of new judges (especially at tournaments like Berkeley where flow and lay overlap) mess up and think “hey, that debater did a great job. A 27 out of 30 would be a great score to reward such prowess” while simultaneously, more educated judges are giving 27’s to mediocre debaters.
    2. Anything below a 25 is usually regarded as unacceptably low unless a debater behaved offensively. This gives judges a total of 5 points to use to differentiate debaters. Again, no sense.

    A 0-10 scale with a lay-friendly protocol makes a lot more sense.

  • Grant

    I think that we should consider using judge variance as the tiebreaker after wins, at least at local and regional tournaments. For those that haven’t encountered j var before, it’s pretty simple in its effect: you get a high j var by getting a high number of speaks relative to how many the judge gave out over the tournament.

    While I’m not sure if such a system would work well on the circuit (given that, in my personal experiences, judges were more consistent in their scores), I do believe it should be more commonly used in more local and regional tournaments.

    One of the main issues with local and regional tournaments is the high number of judges who either a) don’t know how many speaks to give or b) give out way too many or too few. J var solves this, at least to a decent extent, by controlling for these variables and rewarding debaters based upon how well they did relative to their competition that was faced with the same judge.

    However, there are issues with j var that definitely need to be evaluated, the most obvious stemming from sample size:
    1) A judge may have rounds off, which makes the j var an inconsistent measure of how good a debater was relative to others
    2) A judge may just get a draw that is particularly strong or particularly weak, reducing the ability of the judge to influence who breaks/places

    Although I haven’t personally had too many issues with getting the screw and I agree with others who have argued that good debaters can avoid it most of the time, I do think that replacing speaks with j var as a tiebreaker is an attractive option for tournaments with less experienced judging pools.
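
The judge variance idea in the comment above can be sketched in a few lines of Python. This is a simplified stand-in and not the formula any particular tabulation program uses: it assumes j var is the sum, over a debater's rounds, of the points they received minus the average that judge awarded across the tournament. Real implementations may also scale by the judge's spread. The function name and all ballots are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ballots: (judge, debater, speaker points). Not real data.
# Judge A is a point fairy; Judge B scores low.
BALLOTS = [
    ("Judge A", "Debater 1", 29.5),
    ("Judge A", "Debater 2", 30.0),
    ("Judge A", "Debater 3", 29.0),
    ("Judge B", "Debater 1", 27.0),
    ("Judge B", "Debater 4", 26.0),
    ("Judge B", "Debater 2", 25.0),
]


def judge_relative_scores(ballots):
    """Sum each debater's points relative to each judge's tournament average."""
    by_judge = defaultdict(list)
    for judge, _, points in ballots:
        by_judge[judge].append(points)
    judge_mean = {judge: mean(points) for judge, points in by_judge.items()}

    totals = defaultdict(float)
    for judge, debater, points in ballots:
        # Assumed definition: raw points minus that judge's mean.
        totals[debater] += points - judge_mean[judge]
    return dict(totals)


if __name__ == "__main__":
    for debater, score in sorted(judge_relative_scores(BALLOTS).items()):
        print(f"{debater}: {score:+.2f}")
```

In this made-up pool, a 26.0 from the low-scoring judge sits exactly at that judge's average, while a 29.0 from the point fairy falls below hers, so the relative measure ranks the 26.0 higher even though the raw number is lower.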

  • Nathan Mostow

    As someone who once suffered a 4-2 screw by a 0.1-point margin, I still think debaters only have themselves to blame in the event of a screw. It’s worth pointing out that the screw is rarely a problem for the most successful debaters on the circuit; even if a couple of judges might give a 4-2 debater some unfairly low speaks while others may unfairly inflate another debater’s points, good debaters, even if they go 4-2, will still manage to avoid the screw. This is because a good debater will be able to consistently get speaks that are high enough to compensate for any arbitrary point assignments; if you consistently score in the 29’s range at most tournaments, then a few arbitrary point allocations aren’t going to hurt you. Points-based seeding thus does not let bad debaters advance over great debaters; it just lets decent debaters advance over slightly less decent ones. Thus, debaters who are truly determined not to suffer a 4-2 screw can and should avoid one simply through hard work and practice.

    • akskdkf

      Could not agree more, which is why I think the polling sample in the article may be misleading. The debaters who would probably want to use speaks would most likely be the debaters at the top of the LD national circuit, as it’s incontrovertible that they are generally the beneficiaries of good speaks.

      Judging by the polling sample, though, over half (95 / 181) of the respondents were from a high school policy Facebook group, which, in my opinion, is definitely not a good representation of the LD community as a whole (different circuit norms, multiple speakers in rounds, etc).

      The remainder of the respondents include two groups of (at least in my knowledge, correct me if I’m wrong) intermediate-level debate camp labs and the NSDA leadership committee (which consists of very few LDers, and even fewer who debate on the national circuit). The last group, Debate Girls 2016, is also only juniors — even so, it’s around 50/50 in terms of its feedback.

      The important thing to remember is that the speaker points system does favor the best of the best, which is why any polling sample about this issue should also include those debaters to avoid these kinds of biases. In terms of how fair this is, I agree with Nathan — it’s those same best of the best, the ones who work the hardest and practice the most, who probably should be breaking at these tournaments.

  • Guest

    There’s an old debate adage: “You didn’t miss on speaker points, you missed on wins.”

  • Chris Kymn

    While speaker points aren’t perfect, I’m not convinced that prioritizing opponent wins in seeding is the right solution, given the way prelims are paired.

    First, debaters can’t control who they hit in presets. Due to power matching, this randomness has a disproportionate effect. For two debaters with identical speaker points and records, the only difference in their opp wins would be from presets, excepting pullup situations.

    It also seems to overweight the scale of difficulty of presets for those who are breaking or close to breaking. For example, a debater who consistently goes 5-1 at tournaments is very likely to beat a debater who usually goes 3-3 or 0-6. The marginal increase in difficulty for this debater is not reflected by the significant difference in opp wins.

    Second, opp wins seem to harm debaters who have very high speaker points. In round 3, the 2-0 with highest speaker points will hit the 2-0 with the lowest speaker points, who may have merely gotten lucky draws and is not likely to have the best record. Even if speaker points are somewhat subjective, it seems disvaluable to indirectly lower the seed of debaters who are rated well.

    • Ben Ulene

      Agreed, although I think the author’s intent was to propose opp wins as a way to pair prelims in addition to determining breaks. Chip hits the nail on the head with the second point as to what would happen if it were only used for determining breaks.

      Using opp wins for both prelims and breaks, though, seems to generate some unintuitive scenarios. Most tournaments use speaker points with high/low dropped as the first tiebreaker. This creates a weird double-bind scenario for using opp wins:

      A- If high/low opp wins aren’t dropped, it would seem to completely screw over anyone against whom a “bad / unfair” decision is made (as it would arbitrarily put them into a lower bracket for prelim pairings, giving them fewer opp wins).

      B- If high/low opp wins are dropped, it would mean that people are complete victims of their presets — one really hard draw in presets can leave someone with low opp wins for the rest of the tournament (same scenario of being in a lower bracket for power pairing), but it wouldn’t be reflected in the tiebreaker.

      I think that while the current system isn’t perfect, it definitely is somewhat self-correcting when it comes to opp wins — at least in my experience, judges generally give higher speaks overall in rounds with two really good debaters than in rounds where one debater is destroying another.

  • Shrey Desai

    I am sympathetic to the 4-2 and the 5-2 screw because I have had some good friends who have been affected by it, but I don’t think that a couple of the factors that you talk about in the article are relevant or unique to a person that is part of this screw since everyone is inevitably affected.

    Your first argument is about how “pretty” you speak, but I think judges would have a preconceived notion that, as educators and adults, they cannot insert their bias into the round so heavily that they give 26s or 27s to debaters who have a raspy voice. Your study says that most people prefer “breathier” tones, but the study doesn’t say that judges are more likely to give lower speaker points to people who have speech impediments or unusual speaking styles. This claim seems merely speculative. I am not sure whether anyone has been affected by this in a case where the judge readily admitted that a debater got a 27 because of their speaking style, and without evidence that this happens, I’m not sure it can be considered. Also, many judges do outline in their paradigm what they base speaker points on, and they pretty much follow that. If a debater with a raspy voice makes good strategic decisions, they would probably be awarded high speaker points because doing so accords with the judge’s paradigm. Finally, even if there is evidence that judges are blindly enforcing a rule where they give lower speaker points to people with “raspy” voices, this situation is not unique to 4-2s. All debaters who are judged by that particular judge would be affected, whether they are 6-0, 5-1, 4-2, 3-3, or lower. Also, if this judge is recognized in the community as exemplifying such behavior, they could be struck at circuit tournaments, which solves any potential disadvantage.

    The second argument is about strategic decisions, and this seems problematic because it overlooks the whole idea of judge adaptation. If there are two different judges, for example a traditional judge and a progressive judge, they are obviously going to have different views on debate and on what counts as a strategic decision. This is one of the main reasons why paradigms exist: debaters should adhere and adapt to the paradigm of their particular judge. If a judge is well-versed in critical philosophy, reading analytical philosophy would be a bad strategy and would therefore be awarded lower speaker points.

    The third argument puts the onus completely on the debater who is debating. Even if a local debater is debating a nationally ranked debater, they would still have to formulate a strategy to counteract the arguments presented in that round. If the strategy is not up to par with the judge’s expectations, then it would be awarded lower speaker points.

    The fourth argument is about entertainment but again, this is probably not the main factor that goes into a judge’s assignment of speaker points. In the instance where the affirmative read a Revenge Porn AC, the judge would have to be mature enough to realize that this is a serious environment. Also, even if the affirmative read a Revenge Porn AC, they could still make jokes based on the negative’s strategy, for example, if there was some topic literature argument on the topicality shell that made no sense. And, even if the judge were to assign speaker points based wholly off of entertainment, that would inevitably affect everyone and not just the 4-2.

    The fifth argument is about point fairies but I think instead of damaging 4-2s, this would probably help them. A 4-2 that just made the bracket would be on the top of the tenths placement and therefore the tenths-of-a-point idea would help them. Regardless, debaters can always pref judges who are point fairies if they are concerned about getting high speaker points. The debater who struggled for 28s and 28.5s probably did not do their prefs accordingly if they are preffing judges who give low speaks.

    Finally, the alternative is an interesting concept. Tournaments do use opponent wins in their evaluating of breaks but speaker points are a higher priority. I’m interested in seeing if there are any tournament directors or tabroom officials that would be interested in employing such a format.

    • Breann Smith

      Just as a foreword: I am aware most judges use many and not just one of the categories when deciding speaks, however for the purposes of the article I viewed each category in a vacuum. (For example: a judge may use tenths-of-a-point and strategic decisions, but the article isolates one of them and evaluates that.)

      Section One:
      Off the “preconceived notion as an educator”: I completely agree that judges know their roles and try not to be biased, however there will always be some personal preference seeing as no one can be completely tab. I am also not saying judges prefer certain tones on purpose. It is highly unlikely someone will sit there and think, “Hmm, they had a breathier tone, so I’ll give them high speaks.” The point is that such preferences are unintentional and due simply to our nature.

      Off “the judge readily admitting” speaker points: How often have judges had to explain their reasoning behind speaker points? I think if they did it would be much harder for them to pinpoint why exactly that number was chosen. While this point may be speculative, it does logically make sense.

      Off “not unique to 4-2s”: This is exactly the point. Everyone is affected by it, not just the 4-2s. I merely used that record as an example as that is usually the record that it takes to break at national tournaments.

      Off “striked at tournaments”: Sure, maybe some debaters would strike them, but others would still pref them because doing so would be to their advantage.

      Section Two:
      I agree that paradigms are helpful, and debaters should adapt to their judges; however, there are still some issues. First, the judge may say “plans are cool, that’s what I read.” Even within the realm of plans there are a lot of different ways they could be formatted or leveraged, and the judge may not like the position or how you debated it.

      Section Three:
      The example I provided may not have been the best one to illustrate the point. The argument behind that section was that if a judge has a speaker point scale on their paradigm, and a 28 means “pretty good, should clear,” but the speaker point cutoff was, say, 28.5 or 29, the debater that should have broken according to the judge’s scale won’t break. This is often because it is hard to predict where the cutoff will actually be, so the judge doesn’t know exactly how to adjust their points.

      Section Four:
      This is one of those “evaluating in a vacuum” points. I agree that most judges probably won’t assign speaks solely off of entertainment, but for the article it is assumed that is the only metric. Also, this doesn’t really address the fact that it is really hard to know what the judge will find humorous, and even harder for the judge to put a number on how funny you are. (How does one separate the categories of “humor”? If a 28 means “You were kind of funny, but I didn’t laugh,” then what about a 29, or a 30?)

      Section Five:
      This is kind of the point. Not all the 4-2s would get that judge, so only the debaters that did would be advantaged, causing the aforementioned inflation of speaks. Also, debaters may not want to pref those judges even if it means higher speaks, because their styles may be drastically different.

  • Yoloswolo

    I think this system has a major flaw, as seeding becomes arbitrary. Debaters cannot control who they debate, and basing breaks on the draw seems to punish good debaters whose early opponents have low win records. With speaker points, by contrast, individuals have more control and can adapt to a judge to get higher speaks.