Build Algorithms Like You Give a Damn

WrangleConf opening. Source: Sam Charrington via Twitter

For the second year in a row, WrangleConf did not disappoint. The conversation picked up right where last year’s left off: on the ethics of our craft. Last year the focus was on the humans building algorithms and the humans whom algorithms affect. This year, the discussion expanded in scope to consider the growing number of people who interact with data science teams.

With an eye toward the increasing presence of data science in our daily lives, the speakers were more focused than ever on strategies to build and maintain trust: opening communication, recognizing bias, and, well, giving a damn.

To make algorithms effective, we need effective communication

We all have expectations. They’re usually based on some form of data, even if they’re not based on explicit data analysis. The problem with our expectations is that they often carry bias. As consumers (of data science), this bias reduces our trust in recommendations that don’t jibe with our priors. We’re quick to dismiss the result as wrong.

To bring this issue to life, Moritz Sudhof of Kanjoya highlighted a number of biases inherent in employee performance management. For instance, managers typically remember only the most recent events or they seek to confirm things they “already know” about an employee.

Imagine a manager conducting a review for one of her employees, James. If other employees rate James in a way that doesn’t square with the manager’s view of his performance, it’s going to be harder for her to trust the reviews. It’s easy for the manager to brush them off by saying that the algorithm that produced the highlighted reviews is wrong. The in-product experience of recommendations can make or break their usefulness. As data scientists, we need to partner with product people to present algorithms effectively.

It’s not enough to just make recommendations. Kirstin Aschbacher of Jawbone illustrated how the language, timing, and focus of recommendations matter greatly.

Jawbone has many different ways to tell Jawbone UP users that they should eat better. It turns out that recommending healthy foods that people already like has a more positive effect on outcomes than discouraging people from eating unhealthy foods. The goal of both approaches is the same: in most cases, get people to eat fewer, better things. The difference—and much of the effectiveness—is in the positioning.

Both of these examples incorporate partnership with people outside the realm of data science. Model fit matters a lot less than real-world results. Throughout the day we heard from data scientists who ship increasingly useful algorithms when they collaborate with people who are not traditionally involved in algorithm development.

Keep an eye on digital vulnerability

The scale of modern data products’ use and impact can introduce severe consequences, especially for marginalized populations. Chris Diehl of The Data Guild calls this Digital Vulnerability: anyone may be a victim of unwitting disclosure of personal information or of a biased prediction algorithm.

We see exploitations of this vulnerability in the news all the time.

Abe Gong of Aspire Health later drove this point home when he said that “Algorithms are de facto gatekeepers to opportunity.” COMPAS, a proprietary algorithm used to predict recidivism and inform parole decisions, may be deeply unfair to Black defendants. This algorithm literally determines if someone can be released from prison.

Compare this to an algorithm that determines retargeting for ads. We can opt out of disclosing our personal information to this algorithm. We have a choice. Those evaluated by COMPAS do not. As our personal data is increasingly used to determine if we’ll be good homeowners, if we’ll be healthy enough for low-cost insurance, or if we’ll be successful employees, we need to monitor the bias of algorithms all the more carefully.
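
One way to make that monitoring concrete is to compare error rates across groups. The Python sketch below is illustrative only: the function name, the record layout, and the toy data are all assumptions invented here, not anything presented at the conference, and false positive rate is just one of several fairness measures a team might track.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false positive rate for each group.

    Each record is a hypothetical (group, predicted_positive, actual_positive)
    tuple. Only records whose actual outcome is negative count toward the rate.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    # Groups with no actual negatives never appear in `negatives`,
    # so there is no division by zero here.
    return {group: false_positives[group] / count
            for group, count in negatives.items()}


# Toy data, made up purely for illustration.
records = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

rates = false_positive_rates(records)
# The gap between the best- and worst-treated group is one number to watch.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

In this toy data, group B is wrongly flagged twice as often as group A. A gap like that is a prompt to investigate, not proof of unfairness on its own.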

Handling the impact

Pete Skomoroch said it best: “If we don’t figure out how to handle these things better, they will be handled for us in ways we don’t like.”

Diehl suggests that we need “trusted implementations” that are open source and vetted by the widest community possible. Gong was a little more specific—he presented the idea of a data ethics review and challenged the audience to participate.

Both of them created public documents to keep the discussion ongoing.

Josh Wills of Slack pointed to Chuck Klosterman’s I Wear the Black Hat for a trait to keep an eye on: “The villain is the person who knows the most but cares the least.” In data science, this is the person or company who has the most data and doesn’t care about how its use impacts people. And there are people who don’t care. The most indifferent person in the industry will always set the standard. It’s up to us as data scientists not to be that person.

Looking at the #WrangleConf Twitter stream, it’s clear that this year’s ethics discussions fueled a lot of excitement. Don’t let that fire go out. As data scientists, let’s challenge ourselves to second-guess assumptions, identify and eliminate biases, and, above all, keep the conversation going.