Monday, September 7, 2015

What, precisely, is going on with the Hugo Awards data?

For those who recall the recent Hugo Awards controversy, including the massive votes for 'No Award' in five categories (doubling, in a single year, the number of 'No Award' results in the entire history of the Awards), there's some very . . . strange news.

Back at Sasquan, the BM passed a non-binding resolution to request that Sasquan provide anonymized nomination data from the 2015 Hugo Awards.  I stood before the BM and said, as its official representative, that we would comply with such requests.  However, new information has come in which has caused us to reverse that decision.  Specifically, upon review, the administration team believes it may not be possible to anonymize the nominating data sufficiently to allow for a public release.  We are investigating alternatives.

Thank you for your patience in this matter.  While we truly wish to comply with the resolution and fundamentally believe in transparent processes, we must hold the privacy of our members paramount and I hope that you understand this set of priorities.


Glenn Glazer
Vice-Chair, Business and Finance
Sasquan, the 73rd World Science Fiction Convention

Vox Day has more commentary at the link (and his readers provide their own thoughts below his, many of which are worth reading).

Folks, back in the 1980s I was a Systems Engineer at IBM.  I've had well over a decade in the commercial information technology and computer systems business, in positions ranging from Operator to Project Manager, from Programmer to End-User Computing Analyst to a directorship in a small IT company.  Speaking from that background, let me assure you:  I can 'anonymize' almost any data set in a couple of hours, no matter how complicated it may be.  To allege that 'it may not be possible to anonymize the nominating data sufficiently to allow for a public release' is complete and utter BULL.  Period.  End of story.

What the hell is going on at Sasquan (this year's Worldcon, which hosted the Hugo Awards)?  What are they not saying?  What are they trying to hide?



Brad J (Kazrak) said...

Anonymizing data is harder than you think, if your goal is to actually make it truly anonymous. See what happened when AOL tried to anonymize search results, or when Netflix tried to anonymize movie recommendations.

This Ars Technica article is six years old as of tomorrow, and metadata analysis hasn't exactly gotten worse since then.

lpdbw said...

Well, I think we need to apply a saying from the witch hunts against Republicans in the '70s here:

Where there's smoke, there's fire. They're hiding something, and the only question is "What are they hiding?"

My guess is that while Brad and Larry really, really wanted to give the benefit of the doubt to the committee that they wouldn't diddle the votes, or allow them to be diddled, Brad and Larry were too optimistic.

Rolf said...

It is at least two of the three laws of SJWs in action: lying and doubling down. They suddenly realized they may get caught. Time for a full-on audit, methinks.

Anonymous said...

" Brad J (Kazrak) said... Anonymizing data is harder than you think, if your goal is to actually make it truly anonymous."

Not if the data set is (1) unique and (2) simple. In this case it consists of (1) a voting list from each voting Sasquan member and (2) a number to replace all other PII or non-PII data. No cross-reference list linking back to the original voter data is required; it can be disposed of immediately. This is a trivially easy case for the data in question.

Human research deals with this on a daily basis, and while it's growing easier to backtrack data to its source(s), you need data patterns that link to users to do it. Those patterns would not exist in the requested 2015 voting records once they had been anonymized for release.

Sasquan would be wise to be as transparent as possible from this point forward for the sake of their reputation and that of Worldcon.
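The scheme in the anonymous comment above (keep each member's nomination list, replace every other field with an arbitrary number, keep no cross-reference) can be sketched as follows. This is a minimal illustration only: the `Ballot` record layout and the `anonymize` function are hypothetical, since the actual Sasquan nomination data format has not been published.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical ballot layout for illustration only; the actual Sasquan
# nomination data format has not been published.
@dataclass
class Ballot:
    member_name: str
    email: str
    postal_address: str
    nominations: Dict[str, List[str]]  # category -> nominees listed


def anonymize(ballots: List[Ballot], seed: Optional[int] = None) -> List[dict]:
    """Keep only each ballot's nominations, labelled with an arbitrary number.

    The identifying fields are simply not copied, and no table linking the
    new numbers back to members is ever built. The list is shuffled first so
    the numbering does not echo the original order of the records.
    """
    rng = random.Random(seed)
    shuffled = list(ballots)
    rng.shuffle(shuffled)
    return [
        {
            "ballot_id": i,
            "nominations": {cat: list(noms) for cat, noms in b.nominations.items()},
        }
        for i, b in enumerate(shuffled, start=1)
    ]
```

This removes direct identifiers only. Whether the nomination lists that remain could still be matched to particular people is the separate question Brad J raised with the AOL and Netflix examples.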

J Melcher said...

There were about 300 more nominating ballots in total than ballots with a Best Novel nomination. That is to say, about 300 ballots left "Best Novel" blank but named a nominee as 2014's best in some other category, for instance "Fan Artist".

So, if (purely hypothetically) the anonymous data shows a cluster of 100 or so people who all nominated their buddy in a minor category while leaving the major category (or many major categories) blank, then that might be taken to indicate collusion on a "slate".

If the data also shows the date ballots were cast, and if a cluster of nominations for a minor category all arrived on or about the same date, there again might be comparisons drawn to a blog post or an email list posting, and accusations made regarding collusion and slate actions.

Individual privacy would not be the issue. The issue is whether or not clusters of the form hypothesized show up and betray the preferences and actions and organizations of some sort of "Journo-List" style effort.

J Van Stry said...

The only thing that could complicate it is if there is evidence of wrongdoing that they do not wish to expose.

jaed said...

One of the first things anyone would do with the anonymized data is to partition it: that is, group all the exactly-matching ballots together, to see how many groups of exactly-matching ballots there were and how many ballots in each group.

It was mentioned that the admins are working with the EPH proposers (the Tor-backed proposal to change the way nominations are counted) to see how EPH would have changed the list of nominated works this year.

My guess is that during this analysis, someone figured out that there weren't all that many nomination ballots that exactly matched the Sad Puppies or Rabid Puppies lists - that people used them as recommendation lists, adding and removing according to their judgment. And perhaps that there were other matching groups of nomination ballots, indicating the presence of some other, non-public list or lists. Either of these conditions would blow up the "But SLAAAAATE!" narrative. If they are both true, and if there were more ballots corresponding exactly to non-public slates than to either Puppies list, well... some people might end up very embarrassed.

And we can't have that.

(I normally try not to indulge in conspiracy theorizing, but if this memo is authentic, though I don't think Vox said where he got it, then I'm left with little else. The "we can't anonymize it" excuse makes no sense, and the "we've been working with the EPH proposers" one makes all too much.)
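The partitioning jaed describes (group all exactly-matching ballots together, then count the groups and the ballots in each) is a straightforward counting job. A minimal sketch, assuming a hypothetical ballot layout of category mapped to a list of nominees; treating nominee order within a category and blank categories as irrelevant to "exactly matching" is also an assumption, not something the comment specifies.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Ballot = Dict[str, List[str]]  # category -> nominees (hypothetical layout)
BallotKey = FrozenSet[Tuple[str, FrozenSet[str]]]


def ballot_key(ballot: Ballot) -> BallotKey:
    """Canonical form of a ballot: nominee order and blank categories are ignored."""
    return frozenset(
        (category, frozenset(nominees))
        for category, nominees in ballot.items()
        if nominees
    )


def partition_ballots(ballots: List[Ballot]) -> List[Tuple[BallotKey, int]]:
    """Group exactly-matching ballots; return (ballot, group size), largest first."""
    return Counter(ballot_key(b) for b in ballots).most_common()
```

`len(partition_ballots(ballots))` is the number of distinct ballots, and the second element of each pair is how many ballots fell into that group. Whether any large group matches a published list is a further comparison this sketch does not attempt.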

BB said...

So, if (purely hypothetically) the anonymous data shows a cluster of 100 or so people who all nominated their buddy in a minor category while leaving the major category (or many major categories) blank, then that might be taken to indicate collusion on a "slate".

Yes, a slate or slates, other than the evil puppy slates, which we are assured never happens. Not this year and not any of the previous years either. Everyone knows that only the puppies created and voted slates. And puppies are evil.

richard mcenroe said...

I smell books being cooked. You?

Cubist said...

Maybe Sasquan's concom are worried about the possibility that some nasty person or persons might use insufficiently-anonymized data to identify nominators and/or voters, so that said nasty persons could harass or SWAT those nominators and/or voters?

richard mcenroe said...

The only thing worse than the crime is the coverup.

richard mcenroe
Today at 12:34 AM

To Sasquan Info

NO ONE with any data experience is going to believe your excuse. The only rational explanation that can be drawn is that you are deliberately suppressing evidence of collusion and malfeasance in the Hugo Awards.

Please consider this e-mail as authorization to release my voting information to assist in any investigation.


Richard McEnroe

richard mcenroe said...

Cubist: You mean again, right?

Mark said...

@richard mcenroe

No need to wait for the Hugo Admins to release your nominating data, you can do it right here and now.

bmq215 said...

Brad's right, anonymizing this sort of data can be a huge challenge and is nigh impossible with some data structures. Sasquan knows that, regardless of what the data show, it will be picked apart with a fine-toothed comb by both sides. If one or both sides think they can use it to attack the other, they will. Any potential clues of voter identity will be exploited.

Sasquan's in a tough spot because if they don't release the data it will be taken as evidence of a conspiracy. If they do release it, however, chances are fair someone will figure out how to link it up with voters and thereby violate the assumption of privacy in the voting process. If it's not dead already, that would kill it.

Think about it; if this was a conspiracy by Sasquan they could just release a bunch of fake data and none would be the wiser.

Total said...

I was a teenager in the early 1980s, so I'm sure I can pontificate loudly on what teenagers think and do today, right?

Yeesh. As a number of people have pointed out, anonymizing data isn't easy. Netflix found that out pretty quickly.

Peter said...

To all those saying it's hard to properly anonymize the data: I'm sorry, but I can't agree, provided that you have someone competent doing the anonymizing. I've had to anonymize several datasets professionally, and had to do so again as part of the statistics required for my master's degree in management. It's not rocket science. If the data is restructured during the process of anonymization, it's almost impossible to reconstruct it in its original form.

Unfortunately, many data anonymizers today are doing it by rote, following a series of steps someone wrote down for them without understanding what they're doing or why they're doing it. That way lies trouble.

Bruce said...

I'm with Peter. With over 21 years in IT, I'd say anonymizing this data set is not only possible, but likely relatively easy.

richard mcenroe said...

Mark, that's true, I could.

David Lang said...

Anonymizing data sets that contain information about the voter is hard.

But if you don't care about the voter information, then it's rather easy.

If you strip out all info about who submitted the nomination (including all information about where they live) so that all that's left is the date of submission and the list of nominees, what's to tie it to individuals?

Now, you may be able to extract all sorts of interesting things about voting patterns, cliques, etc. But isn't that just what people are interested in doing here?
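David Lang's question (once only the submission date and the nominee list remain, what ties a record to an individual?) can be tested rather than argued, using k-anonymity, a standard privacy measure: a release is k-anonymous if every record shares its identifying values with at least k-1 others. The sketch below assumes an outsider knows a ballot's submission date plus the nominees in some categories (say, from picks a member posted publicly); `known_categories` and the record layout are hypothetical choices. A ballot that is alone in its group could be singled out, revealing its nominations in the remaining categories.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical released record: (submission date as an ISO string,
# category -> nominees). The real data layout is an assumption.
Record = Tuple[str, Dict[str, List[str]]]


def class_sizes(records: List[Record], known_categories: Iterable[str]) -> Counter:
    """Group records by what an outsider is assumed to know: the submission
    date plus the nominees in `known_categories` (absent counts as blank)."""
    known = tuple(known_categories)

    def key(record: Record):
        submitted, nominations = record
        return (
            submitted,
            frozenset((cat, frozenset(nominations.get(cat, []))) for cat in known),
        )

    return Counter(key(r) for r in records)


def k_anonymity(records: List[Record], known_categories: Iterable[str]) -> int:
    """Size of the smallest group: every record hides among at least k records."""
    sizes = class_sizes(records, known_categories)
    return min(sizes.values()) if sizes else 0


def unique_records(records: List[Record], known_categories: Iterable[str]) -> int:
    """Number of records that share their known values with no other record."""
    return sum(1 for n in class_sizes(records, known_categories).values() if n == 1)
```

Assuming more categories are known can only split groups, never merge them, so k falls (or stays level) as the assumed outside knowledge grows. Which categories to treat as "known" is a judgment call for whoever runs the check.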

richard mcenroe said...

Dear fans,

We're sorry, we'd like to help you, but that would be betraying the trust of the individual voters whose trust we've already betrayed and whose money we took under false pretenses.


Jason Rennie said...

I suspect the Torlings are screeching at Sasquan behind the scenes to bury the data because it will show up their perfidy and demonstrate Larry was right.

A more concerning possibility: the Sasquan committee is corrupt, and it isn't possible to reconstruct the results from the voting data because they falsified the results as they saw fit. Whoever agreed to release the data was not aware of this, has since been apprised, and is now in damage-control mode.

Anonymous said...

If the information being suppressed were harmful or embarrassing to the Puppies, then they wouldn't be suppressing it. QED.

And frankly, after the Wellstone funeral they made of this year's Hugo ceremonies, Sasquan doesn't deserve the benefit of the doubt.

--Wes S.

Anonymous said...

Releasing data to some people (the proponents of EPH) is so much worse than releasing the data to no one that viewing their efforts as honest becomes harder.

Timbo said...

Oh what a tangled web we weave, When first we practise to deceive.