Netflix's "Brokeback" problem could get messy

It looks like Netflix might be spending more than $1 million on a recent campaign to improve its recommendation engine. The movie rental company recently held a contest that successfully improved its recommendations by more than 10%. But now a closeted lesbian is suing the company for invasion of privacy, saying that she could have been outed because Netflix shared data that wasn't quite so anonymous.

While her claims may be spurious, this could have legal implications for the ways user information is shared and stored online.

The Doe v. Netflix lawsuit alleges that film preferences are personal information that Netflix does not properly protect. From the filing:

"Jane Doe, a lesbian, who does not want her sexuality nor interests in gay and lesbian themed films broadcast to the world, seeks anonymity in this action.

To some, renting a movie such as “Brokeback Mountain” or even “The Passion of the Christ” can be a personal issue that they would not want published to the world.

The Brokeback Mountain Factor is described thusly: Our secrets, great or small, can now without our knowledge hurtle around the globe at the speed of light, preserved indefinitely for future recall in the electronic limbo of computer memories. These technological and economic changes in turn have made legal barriers more essential to the preservation of our privacy."

The problem arose earlier this year, when Netflix sent customer rating data to contestants competing to improve its recommendation engine and win $1 million.

According to Wired:

"In order to get a better movie recommendation algorithm, the online DVD rental company gave more than 50,000 Netflix Prize contestants two massive datasets. The first included 100 million movie ratings, along with the date of the rating, a unique ID number for the subscriber, and the movie info. Based on this data from 480,000 customers, contestants had to come up with a recommendation algorithm that could predict 10 percent better than Netflix how those same subscribers rated other movies."

But given that information, it turns out to be pretty easy to find out about Netflix's users. A few weeks into the contest, two University of Texas researchers — Arvind Narayanan and Vitaly Shmatikov — were able to identify Netflix customers (and their political leanings and sexual orientation) by comparing supposedly anonymous Netflix reviews with ones posted on IMDB.
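
To make the idea concrete, here is a minimal sketch of that kind of cross-referencing. It is not the researchers' actual method: the data, the `jane_public` handle, the function names and the three-day date window are all invented for illustration. It simply counts how many public reviews line up with each "anonymous" record on movie, rating and approximate date.

```python
from datetime import date

# Toy "anonymized" release (hypothetical data): subscriber ID -> {movie: (rating, date rated)}.
anonymized = {
    1001: {"Movie A": (5, date(2005, 3, 1)),
           "Movie B": (2, date(2005, 3, 9)),
           "Movie C": (4, date(2005, 4, 2))},
    1002: {"Movie A": (3, date(2005, 1, 5)),
           "Movie D": (5, date(2005, 2, 11))},
    1003: {"Movie B": (2, date(2005, 6, 20)),
           "Movie C": (1, date(2005, 6, 21)),
           "Movie E": (4, date(2005, 7, 1))},
}

# Toy public reviews posted under a recognizable handle (also hypothetical).
public_reviews = {
    "jane_public": {"Movie A": (5, date(2005, 3, 2)),
                    "Movie C": (4, date(2005, 4, 4))},
}


def match_score(private, public, max_days=3):
    """Count movies where the ratings agree and the dates fall within max_days."""
    score = 0
    for movie, (rating, when) in public.items():
        if movie in private:
            p_rating, p_when = private[movie]
            if p_rating == rating and abs((p_when - when).days) <= max_days:
                score += 1
    return score


def best_match(public, released):
    """Return (subscriber_id, score) for the highest-scoring released record."""
    scores = {sid: match_score(rec, public) for sid, rec in released.items()}
    sid = max(scores, key=scores.get)
    return sid, scores[sid]


candidate, score = best_match(public_reviews["jane_public"], anonymized)
print(candidate, score)
```

In this toy data, two matching reviews are enough to single out one subscriber, whose other ratings in the released set are then attached to a public identity.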

According to the lawsuit, "the Brokeback Mountain factor" could be combined with other user information that can be used by marketing companies to target and categorize consumers.

The suit seeks more than $2,500 in damages for each of more than 2 million Netflix customers, which could break Netflix's contest dreams pretty quickly.

In addition to the monetary request, the suit wants to stop Netflix from launching a second contest to improve its recommendation engine. For this contest, the company is set to release customer information such as ZIP codes, ages and genders along with movie ratings. Although user names will be replaced with ID numbers, it won't be hard to identify individuals.

Paul Ohm, writing on the blog of Princeton's Center for Information Technology Policy, identifies the problem like this:

"Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87% according to Latanya Sweeney's famous study.) True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of "information entropy": even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach."

While this specific lawsuit may be seen as overblown or opportunistic, it's not hard to see how other supposedly anonymous data may get companies into trouble online.
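
Ohm's "simple arithmetic" can be roughed out in a few lines. This is a back-of-the-envelope sketch, not his calculation: the ZIP code populations, the 80-year age span and the assumption that residents are spread evenly across genders and ages are all illustrative guesses.

```python
def group_size(zip_population, genders=2, age_years=80):
    """Rough number of people sharing one ZIP code, gender and age.

    Assumes residents are spread evenly across genders and ages,
    which is a simplification made purely for illustration.
    """
    return zip_population / (genders * age_years)


# Hypothetical ZIP code populations, small to large.
for population in (5_000, 30_000, 100_000):
    print(population, group_size(population))

# Swapping age for full birthdate multiplies the number of buckets by
# roughly 365, which is why birthdate is so much more identifying.
print(group_size(30_000, age_years=80 * 365))
```

Under these assumptions, even a large ZIP code leaves only a few hundred people in each gender-and-age bucket, and a small one leaves a few dozen.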

The problem lies not really in revealing what movies people watch online (though whoever helped Crash become the most popular Netflix film should be embarrassed), but in how different data sets, when combined, can reveal much more than individuals would like to share on the internet.

Based in New York, Meghan Keane is US Editor of Econsultancy. You can follow her on Twitter: @keanesian.


Reader comments (9)

  1. Matthew Curry

    Head of Ecommerce at Lovehoney

    7:56AM on 18th December 2009

    I'm not entirely certain of the relevance of knockoff shoes to this debate.

    Really econsultancy moderator types, there must be a way of stopping all this comment spam.

    Anyway, I have to ask, what on earth does renting Brokeback have to do with being lesbian?

  2. Mike

    3:32PM on 18th December 2009

    "...though whoever helped Crash become the most popular Netflix film should be embarassed..."

    Haha!

  3. Meghan Keane

    US Editor at Econsultancy

    5:04PM on 18th December 2009

    Matthew, (Deleted the shoes comment.) I think the issue is that movie queues are personally identifiable at all. I'm not clear that "Jane Doe" can be identified as a lesbian, but you can know a lot about someone if you can compile enough of their online choices, and that's especially worrisome when companies claim that their data is anonymous and it turns out not to be.

  4. Patrick Clarkson

    offical at new york post

    6:08AM on 21st December 2009

    jetfuellatte, it’s kinda pointless to argue that it can’t happen since the researchers have already done it with Jane Doe’s information.

    http://www.topnflnews.com/

  5. chinese wholesalers

    3:27AM on 29th December 2009

    These technological and economic changes in turn have made legal barriers more essential to the preservation of our privacy

  6. Netflix

    11:03AM on 6th January 2010

    I think there is no problem in such type of movies online. There should not be problem in that.

  7. m65

    9:57PM on 30th January 2010

    wow everyone is just looking for a reason to sue nowadays

  8. Air Shoes

    6:23AM on 23rd March 2010

    t about someone if you can compile enough of their online choices, and that's especially worrisome when companies claim that their data is anonymous

  9. Violet Petran

    9:59PM on 12th April 2010

    This case is an interesting interplay between culture, law, and business. I just don't feel like this suit holds any muster and I think that people cannot expect the same amount of privacy they may be accustomed to when the information is online.

    I really liked your article and mentioned it in my blog! http://lawblog.legalmatch.com/2010/01/14/netflix-bringing-its-customers-out-of-the-closet/
