Jim DeGraw, Ropes & Gray IP transactions partner and co-leader of the firm's digital health initiative, discusses the importance of being discriminating when using big datasets.


Hi, I'm Jim DeGraw. I'm a partner in Ropes & Gray's San Francisco office. My work is in the technology area, and I work with clients an awful lot on data and data issues. I also co-lead our digital health initiative here at the firm. Today, I want to talk to you about the importance of being discriminating when you're using big datasets.

Misuse of data is going to get you in trouble if you don't understand the fundamental points about what your data is doing, or more importantly, what your data is saying, and where your data came from. Before you even think of a dataset, I want you to think of two books. In fact, I want you to read these two books. The first is Factfulness by Hans Rosling. One of the things that I want you to keep in mind is that he says that data naturally, in many environments, curves. That is, there's really no such thing as a straight line of facts; straight lines are pretty rare. And oftentimes we get trapped into thinking that if something's going in a particular way, it will always go in that particular way. So think of your child's growth charts. Did your child keep growing at the same pace as they did in the first three years of their life? Mine didn't; they're still my height, right? But, you know, they would be 10-20 feet tall if they had kept growing at the pace they did when they were infants, right? If you look at only some of the points at which something was measured, you might see a straight line for a piece, but that doesn't mean it's always going to be a straight line. So, the first question you have to ask yourself is: Have I looked at all the data points, and do I understand the full range of elements that are being fed into this particular piece of data or particular result that I'm looking at?
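The growth-chart point can be checked with a little arithmetic. The sketch below is illustrative only: the heights are hypothetical round numbers chosen for the example (they do not come from the talk or from any growth chart), and `linear_extrapolation` is a made-up helper name. It simply assumes the average growth rate of the first three years continues unchanged until age 18.

```python
CM_PER_FOOT = 30.48


def linear_extrapolation(ages, heights_cm, target_age):
    """Project height at target_age, assuming the average early
    growth rate (first to last measurement) never changes."""
    rate = (heights_cm[-1] - heights_cm[0]) / (ages[-1] - ages[0])
    return heights_cm[-1] + rate * (target_age - ages[-1])


# Hypothetical measurements for ages 0 to 3 (illustrative values only).
ages = [0, 1, 2, 3]
heights_cm = [50, 75, 87, 96]

projected_cm = linear_extrapolation(ages, heights_cm, target_age=18)
projected_ft = projected_cm / CM_PER_FOOT

print(f"Straight-line projection at 18: {projected_cm:.0f} cm "
      f"({projected_ft:.1f} ft)")
```

With these assumed inputs the straight-line projection lands above ten feet, which is the trap being described: a trend that holds over a short early window says little about the shape of the full curve.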

The second book I want you to read, and this is one of my favorites, is Michael Lewis's The Undoing Project. It's the history of behavioral economics. It's a great read. The important takeaway there is, again, about your assumptions, but there it's that how you ask the question matters a lot. People have inherent biases, they just do. If you ask someone the same question two different ways, you can get, and often do get, two different answers. And so it's really important to think about your assumptions, right? Are you looking at the full dataset, and have you asked how the question's been asked? The number one thing that drives the consumer economy, the consumer-lending economy, is your FICO score, your credit score. The higher it is, the better off you are; the cheaper your credit is, the more credit is available to you. Well, it turns out if you map your FICO score to the year in which you were born, it's almost, despite what I just said about Hans Rosling's book, a straight line up. The older you are, the higher your credit score is. Why? It has nothing to do with the data itself; it has to do with the sources of the data. The question that should be asked is: What are we basing this on? We're basing it upon someone's credit history, and that's an inherent bias in the system itself. So, what does that mean from a legal perspective? Well, there's a number of people who are actually looking at this and they're changing the model. The CFPB has allowed one company to try a new lending model that's looking at different behavioral patterns to see whether or not they're good indicators of future creditworthiness. That's awesome!
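One simple way to ask "what are we basing this on?" of a dataset is to check how strongly an output tracks an attribute it is not supposed to depend on. The sketch below is a minimal illustration under stated assumptions: the ages and scores are invented for the example (they are not real credit data and say nothing about how FICO scores are computed), and the 0.8 threshold and the names `pearson` and `flag_possible_proxy` are hypothetical choices, not a legal or statistical standard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_possible_proxy(attribute, outcome, threshold=0.8):
    """Return (r, flagged): flagged is True when the outcome tracks the
    attribute closely enough to deserve a closer look.
    The threshold is an arbitrary illustrative cutoff."""
    r = pearson(attribute, outcome)
    return r, abs(r) >= threshold


# Invented example data: applicant ages and the scores a model assigned.
ages = [22, 30, 38, 46, 54, 62]
scores = [610, 655, 680, 720, 735, 770]

r, flagged = flag_possible_proxy(ages, scores)
print(f"correlation between age and score: {r:.2f}, review needed: {flagged}")
```

A high correlation here does not by itself show unlawful bias; it only signals that the inputs behind the score (such as length of credit history) may be acting as a stand-in for age and should be examined.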

The overall story here is that you should be discriminating about the data you're using, to make sure that there's no inherent bias in it or, if there is, that you're correcting for it, because you don't want to be on the receiving end of the discrimination suits that we've already seen plaintiff class action attorneys launch over the use of certain datasets. Make sure you understand which question you're asking and how it's being asked, and blow away the assumptions about how the world exists, because those are sometimes echo effects of what we saw in the past that we're trying to change with our new business models.
