On this episode of Ropes & Gray's Insights Lab's multi-part Multidimensional Data Reversion podcast series, Shannon Capone Kirk and David Yanofsky discuss how artificial intelligence and machine learning are being applied to legal investigations and document reviews. They explore the evolution from traditional search term methods to advanced techniques like predictive coding and continuous active learning, as well as the emerging role of generative AI ("GenAI"), while demystifying what these techniques are actually doing with your data. The conversation highlights the importance of using plain language when describing these technologies, the critical role of human expertise in refining AI tools, and the practical challenges and efficiencies gained when integrating AI into internal investigations and privilege reviews. Tune in to gain insight into how legal teams are balancing innovation, accuracy, and defensibility as they adopt new data-driven approaches.
Transcript:
David Yanofsky: Hello, and welcome to Multidimensional Data Reversion, a show where we are digging into where data analysis intersects with the law. I'm David Yanofsky, director of data insights, analysis, and visualization in R&G Insights Lab.
Shannon Capone Kirk: And I'm Shannon Capone Kirk, managing principal and global head of Ropes & Gray's advanced e-discovery and AI strategy group.
David Yanofsky: I think about the history of AI—we were talking about sentiment analysis, various machine learning techniques, natural language processing—and the thing that always sticks in my head is that one of the biggest corpuses of free-text language training data that exists out there is from the Enron scandal, all of Enron's emails. And so, I think in the legal context, the perfect thing to use some sort of AI analysis on is going to be accounting fraud. First off, is that right or wrong? And secondly, where else do you see the opportunities in terms of the types of investigations where AI is appropriate?
Shannon Capone Kirk: To your point, we, as lawyers in litigation in this realm, have used the Enron dataset for a variety of tests when we were first doing predictive coding. It's the primary dataset that people test against because it's public and we know the results. So, what are the types of investigations that GenAI is good for? We are usually in the business of doing internal investigations for any host of corporate compliance issues—that could be accounting fraud, bribes, FCPA or FCA violations, a culture of MeToo problems, diversity issues, etc. We have been piloting a number of GenAI tools for data review over the last year and a half. We have now come to a place where we are actively using GenAI, where the client consents and it's appropriate, on internal investigations, as opposed to—and this is an important distinction—production reviews. That's where we have an obligation to give a complete set of responsive material to law enforcement, a regulatory body, or opposing counsel.
David Yanofsky: Any document missed is a huge problem.
Shannon Capone Kirk: It could be—there is less of a risk of that with internal investigations, because you're not looking for every responsive document, like you are in production reviews.
David Yanofsky: There are also aspects of internal investigations where the question you're trying to answer is, "Is there a cultural problem here?" It's not, "Who is the bad actor? And let's nail them." It's "We need to understand what's going on." We don't need every single document to answer that question—we just need a lot of them.
Shannon Capone Kirk: Or you need—to put a finer point on internal investigations, when I'm looking at an event or an allegation as opposed to the wider cultural question—just the answer to the question. Did this bribe occur? Was an inappropriate MeToo statement or action taken by this person or not? You don't need every document for that usually—you just need the answer to the question. We are focusing our expanded use of GenAI, beyond just piloting, on internal investigations where the client consents and it's a fairly confined scope of data and custodians. It's not just accounting fraud—it's "What is the investigative question that we want to answer?"
David Yanofsky: Were there sales in Iran? Did we send a product to a country that it shouldn't have gone to? Was a government official at the luxury resort with our sales team?
Shannon Capone Kirk: This particular regional manager claims that he had no knowledge of X, Y, Z whistleblowing allegation. Is that true or not? Well, you can pretty quickly determine, "That's not true," if you find just one email communication. Beyond those internal investigations, we are not using GenAI just yet, for all the reasons that I said at the start—we are now starting to layer in portions of GenAI in very select cases, again, where clients consent, in production reviews. It's not anything that is prime time or widespread at all, in any way, shape, or form—really just more of a back-end expansion of our pilots to pressure test things.
David Yanofsky: But there are some machine learning strategies that are prime time, right?
Shannon Capone Kirk: Yes, for years, we have been using what is now called TAR 2.0 or continuous active learning, which is predictive coding. I want to talk a little bit about the terminology here, because, to me, the legal world around e-discovery has, in some segments, done a disservice by using highfalutin' fancy words. And that troubles me, because it is something I feel really strongly about: we should use plain language so that we have wider adoption, because these tools are better than just using search terms alone with no cohesive batching of documents. So, when I say things like "continuous active learning," "predictive coding," "TAR 2.0," don't those sound intimidating, and won't they shut people down?
David Yanofsky: They are jargon. One of the most effective things that we can do as communicators is to not use jargon and to use plain language to describe what we're doing. Now, of course, some of these things are, like, brand names. Is "Coca-Cola" jargon? So, being cognizant of when you're using jargon, I think, is really, really important, and so is not taking for granted, for these tools and these methods, which have specific names, that people actually understand what the product is when you talk about it.
Shannon Capone Kirk: When I step into a corporate environment, and I sit with a cross-functional team of people, and they start talking, using their own esoteric corporate acronyms—and they all know what they're talking about; they're using their jargon—inevitably, I'm the one in the room who interrupts the flow, and I'm like, "I have no idea what you're talking about. If all your other legal service providers are sitting here pretending like they know what you're talking about, they're lying to you. Nobody knows what you mean by the 'CCL department.'" So, that is why I would prefer to talk in plain terms, because, frankly, machine learning, when applied to the law right now, is a really simple concept, David.
David Yanofsky: What is the simplest way to describe the types of machine learning strategies whose results courts are comfortable accepting as evidence?
Shannon Capone Kirk: Explaining it in simple terms is what I mean. Now, does it require expertise? Does it require specialized project managers and good technology? Absolutely—all of that is true. That's why you have specialists involved. I don't mean to undercut the importance of all of that, but when you're explaining it to a client or a court, using plain terms and no jargon is critical, and that takes on even more importance now that we are going to start using GenAI. What we've seen over the last year is folks just calling everything "AI," and you know that is one of my huge pet peeves, because while that may be fine outside of the law, it really trips a lot of people up. What we want to talk about are the nuances, in plain terms, between machine learning, data analytics, and true GenAI.
David Yanofsky: Absolutely. I think that there are people who consider linear regressions to be AI or machine learning tactics, and those linear regressions are used at times in algorithms that are considered AI. It is one technique that is used, and there are dozens, if not more, of techniques that are also used within what we call AI. And so, being very specific about whether the thing is a calculation or a tactic, or a combination of activities, methods, tactics, or calculations, is important, despite the fact that it is harder to talk about than just saying, "This is AI." Not every implementation of a type of AI is the same. The type of AI that is used to analyze what animal is in a photograph, or even create a picture based on a text prompt, is different from, "Identify a cluster of documents that is similar to this cluster of documents." We talk about all of those things as AI, but they are all using different tactics, different techniques, and different calculations to achieve that result.
Shannon Capone Kirk: When we talk about predictive coding, TAR 2.0, or continuous active learning—all the same thing—all we're really doing there is machine learning. The simplest way to analogize it—we talked about this—is Pandora. And it's the same thing—the human is interacting with the data; the human's interacting with the music. In the review world, you are coding a document as responsive, a machine learning tool will then go score the rest of the documents, or the songs, and it really just organizes everything for you. It prioritizes the rest of the songs, or the data, in an order where the highest scores are the documents that are most likely relevant, because you, as the human, have told it, "This is the type of document that is relevant." What the tool is doing there is really linguistic-based in large part. It's looking at words and the context around words, and it is a symbiotic relationship between the subject matter expert human being and the powerful tool that runs on algorithms, etc., and it gets better and better if you're doing continuous active learning. It's continuous—you are continually improving and working with that computer so it delivers and serves you documents that are high-scored, meaning most likely to be relevant.
David Yanofsky: The continuous aspect of the algorithm that we're talking about is this: we have scored a bunch of documents as responsive or not, this binary thing. We then use a tool that looks at all the words in all of the documents that were responsive and all the words in all the documents that were not responsive, and it tries to guess, based on those words, or combinations of them, or combinations of the words and other information that we have about the documents, which of those combinations makes something responsive and which makes something not responsive. Let's give this other set of documents a score based on that first association between words and responsiveness, which ones we think are responsive and which ones we don't, and let's rank them. Now, let's go through them with a human and do the exercise again. Say, "Yes, algorithm, you were correct," or, "No, algorithm, you were wrong." We ask the algorithm to say, "Now, look at this set again. We've corrected you." Let's refine continuously, through this process, which associations between words, between features, between metadata are responsive and which are not.
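To make that loop concrete, here is a minimal sketch of the scoring-and-reranking cycle described above, written in Python with scikit-learn. It is purely illustrative: the function name, the simple TF-IDF and logistic regression model, and the batch sizes are assumptions made for the example, not the workings of any particular TAR 2.0 product.

```python
# Illustrative sketch of a continuous active learning (CAL) loop, assuming
# `documents` is a list of extracted text and `label_fn` is the human reviewer
# step (returns 1 for responsive, 0 for not). Not any vendor's actual tool.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def cal_review(documents, label_fn, seed_size=50, batch_size=100, rounds=10):
    # Turn every document into a bag-of-words feature vector.
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(documents)

    # Seed round: a human codes an initial sample before any scoring happens.
    # (The seed should contain both responsive and non-responsive examples.)
    labels = {i: label_fn(documents[i]) for i in range(min(seed_size, len(documents)))}

    for _ in range(rounds):
        unreviewed = [i for i in range(len(documents)) if i not in labels]
        if not unreviewed:
            break

        # Learn which word patterns the human-coded documents associate
        # with "responsive" versus "not responsive."
        reviewed = sorted(labels)
        model = LogisticRegression(max_iter=1000)
        model.fit(X[reviewed], [labels[i] for i in reviewed])

        # Score everything not yet reviewed and rank it, highest score first.
        scores = model.predict_proba(X[unreviewed])[:, 1]
        ranked = [unreviewed[j] for j in np.argsort(scores)[::-1]]

        # Serve the top of the ranking to the human, fold those decisions back
        # in, and retrain on the next pass: the "continuous" part.
        for i in ranked[:batch_size]:
            labels[i] = label_fn(documents[i])

    return labels
```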
Shannon Capone Kirk: Look at this progression that we've talked about over the course of our podcast for document review. It has gone from literal search term hits—you collect a million documents, you run search terms—but really, all you review is the output of items that hit exactly on those search terms or phrases. That's your world for review. Then, we move to where we are now, which is what we're talking about: continuous active learning, where it's machine learning, and, yes, it's still largely linguistic-based, but it rests not necessarily on literal search term hits and instead on context, metadata, and this continual improvement through the relationship between the human subject matter expert and the computer. And so, that gets us away from being so confined to literal search term hits. Now, we're at GenAI, which is an even further step away from that confined search-term-hit world.
In months of pilots, we have learned that it is critical to have a subject matter expert human being involved at every single stage of refining and perfecting that prompt or series of prompts. It takes work. It takes thought. It takes judgment. It takes a person who really knows the case and the documents. No computer can do that yet. And that person is also doing sampling and validation. So, that's still there. What's powerful and exciting in the latest pilot that we're conducting is that we worked with a subject matter expert on getting to a place where the prompt was performing well compared to the review done without GenAI, and we were getting good precision and recall numbers, etc., coming out of it. I said to the data expert, the software expert, and the subject matter expert, "You've gone through a few rounds of these, and one of the rounds of improvements was where you fed into the prompt all of the names of the lawyers involved," because we were testing for privilege. Could GenAI detect privileged documents in a bunch of unstructured emails and documents? The most expensive part of document reviews is privilege review, so everybody is focused on trying to find a way that GenAI works to streamline that really critical and expensive process.
After they fed in all of the lawyers' names—and this is how the report came to me—the metrics were much better and showed a pretty good correlation to the prior privilege review that was accurate (that's what we were judging against) and that didn't have GenAI. And so, I said, "Hold up. Is the tool just literally flagging any document with these lawyers' names on it and saying that it's privileged or potentially privileged?" Because, to me, that just sounds like running a search term. And they said, "No, here's the full prompt." And this is what was exciting to me. Yes, it had all the lawyers' names, but then, the prompt went on, and it actually said, "Just because a lawyer's name is on a document does not mean it's privileged. It would have to have these additional factors," and the prompt goes on and on. Through the various rounds in this pilot, that's what they came to—that was the conclusion, the best result. It's a really important demonstration, in my mind—and I'm getting back to the language that we use—of why it is critical, I think, that we in the legal community not just call everything "AI," because you can see there are very big differences between that predictive coding back-and-forth activity, which is accepted by the courts and which we know how to run metrics on, and these early days of figuring out how to build prompts with subject matter experts. And then, how do we test those to demonstrate to our opponents and the courts that those results are accurate?
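For readers trying to picture what such a prompt might look like, here is a purely hypothetical sketch of the structure Shannon describes: a list of attorney names plus explicit caveats. The wording, the names, and the variable name are invented for illustration; this is not the prompt used in the pilot or any vendor's template.

```python
# Purely hypothetical sketch of a privilege-screening prompt of the kind
# described above. The attorney names and wording are invented examples.
PRIVILEGE_PROMPT = """
You are assisting with a privilege review. The following individuals are
attorneys on this matter: Jane Roe, John Doe, Alex Counsel.

For each document, assess whether it is likely privileged. Keep in mind:
- The mere presence of an attorney's name does NOT make a document privileged.
- Look for a request for, or the provision of, legal advice, or material
  prepared in anticipation of litigation.
- Communications that merely copy an attorney on routine business discussions
  are generally not privileged.

Answer "privileged," "potentially privileged," or "not privileged," and quote
the specific language that supports your answer.
"""
```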
David Yanofsky: Absolutely critical not to call everything "AI," and absolutely critical, to your point, that we show metrics, that we show that we can test and study as we go, that GenAI is at least as good as human reviewers, or perhaps at least as good as bad human reviewers. There are discovery processes out there where people are not as careful as perhaps we are, and that yields a result that perhaps GenAI could beat in easy ways, but being able to quantify that is important: taking a sample of documents or, like you said, having examples where we have done the same thing two ways and being able to compare and contrast the differences.
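As a concrete illustration of the kind of metrics being discussed, here is a minimal sketch of computing precision and recall for a GenAI privilege call against a prior, trusted review of the same documents, the comparison Shannon mentioned above. The function and argument names are assumptions made for the example.

```python
# Hypothetical sketch: scoring a GenAI privilege call against a prior, trusted
# privilege review of the same documents. Names here are invented for the example.
def precision_recall(genai_calls, trusted_calls):
    """Each argument maps a document ID to True (privileged) or False (not)."""
    flagged = {d for d, is_priv in genai_calls.items() if is_priv}
    actually_privileged = {d for d, is_priv in trusted_calls.items() if is_priv}
    true_positives = flagged & actually_privileged

    # Precision: of the documents the tool flagged, how many were truly privileged.
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    # Recall: of the truly privileged documents, how many the tool found.
    recall = len(true_positives) / len(actually_privileged) if actually_privileged else 0.0
    return precision, recall


# Example: the tool flags docs 1 and 2; the trusted review says 1 and 3 are privileged.
# precision_recall({1: True, 2: True, 3: False}, {1: True, 2: False, 3: True})
# -> (0.5, 0.5)
```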
You've mentioned before how having more people, more reviewers, can sometimes take longer than having fewer reviewers, and you talked about with GenAI just now how there are still humans involved. Let's get into the details a little bit around how more reviewers doesn't necessarily mean faster work, but also how new tools like GenAI can bring efficiency to the process. What are the stages of a typical review, and where can we add or subtract people?
Shannon Capone Kirk: First is just getting the collection right and the right scope. That doesn't require a lot of people—it just requires people who know how to collect data in a defensible way. The next stage is what we call first-level review. When it comes down to it, it's a rinse-repeat process. You've got millions of documents, and you've got to get through a good chunk of them in a certain time frame. So, you triage it, and your first rinse-repeat is usually the contract attorneys, in our litigation. That's the area where you want to make sure that you are calibrating the right number of contract attorneys that you really need and not overstaffing. It's at that stage where overstaffing can lead to inconsistencies and, therefore, more need for QC to fix errors. It also creates a bottleneck or chokepoint for the QC team, which is inevitably always going to be much smaller.
David Yanofsky: So, collect, first level, then QC. If you have too much inconsistency in first level, your QC just...
Shannon Capone Kirk: It slows way down because then you have to essentially go back to the drawing board, retrain the first-level reviewers, and so, it's this cycle. Ideally, you would have the right size contract attorney team, they'd have great metrics on errors—we do keep those—and then, you would almost simultaneously be doing QC anyway with your QC team.
David Yanofsky: It's not just a longer time frame—you also end up with duplicative work, right, if you have to retrain?
Shannon Capone Kirk: You get duplicative work, and inevitably, the truth is, no matter what anybody says, the more contract attorneys you have, the more you ultimately have to find the folks who have the highest error rates, weed them out, find new people, and retrain them, so it's just more management, more downtime, etc. That's what I meant about adding more people. The QC team nowadays is usually some combination of firm attorneys and the highest-performing contract attorneys, some mixture or one or the other, and then you get it out the door. QC's always going to be there. I cannot imagine a world where we just allow GenAI to do the whole review and there's no QC—that makes no sense.
Where we definitely see cost savings is in layering in GenAI for triaging and helping to organize us in an amplified way compared to how machine learning triages and organizes for us. So now, in addition to the machine learning's highest and lowest scores on the documents, we would also use refining prompts to identify privileged material, and then you QC that material, and you get comfortable. "The computer was correct in this segment of the population. We're comfortable with that. We'll set that aside." It won't be all the potentially privileged documents, but now, your world of QC and review for privilege is that much smaller. Instead of, for example, 10,000 documents that might hit on attorney names and other privileged terms and concepts, you've now used prompts, like the one I just explained, to look at that 10,000 and identify within it which documents are truly privileged. You then do some statistical sampling of that, log it, and then you deal with the remainder that you're not so comfortable with in a more traditional way. So, it's just a way to further reduce the amount of human QC time the subject matter experts have to spend moving through documents.
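The statistical sampling step mentioned above can be pictured as something like the following sketch: draw a random sample from the set the prompt flagged as privileged, have a human check the sample, and estimate how reliable the flag is before setting that segment aside. The sample size, function names, and the simple normal-approximation margin of error are assumptions for illustration, not a prescribed validation protocol.

```python
# Hypothetical sketch of validating a GenAI-flagged privilege set by sampling.
# The default sample size and the normal-approximation margin are illustrative only.
import math
import random


def sample_and_validate(flagged_doc_ids, human_check, sample_size=400, z=1.96):
    """`human_check(doc_id)` is the reviewer step: True if the doc really is privileged."""
    population = list(flagged_doc_ids)
    sample = random.sample(population, min(sample_size, len(population)))

    # Agreement rate: how often the human reviewer confirms the GenAI call.
    agreed = sum(1 for doc_id in sample if human_check(doc_id))
    rate = agreed / len(sample)

    # Rough 95% margin of error on that rate (normal approximation).
    margin = z * math.sqrt(rate * (1 - rate) / len(sample))
    return rate, margin
```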
David Yanofsky: Yes, it's a tool like any other. A new tool has come into the world. We are going to end up with people who are particularly adept at using that tool, just like there are people particularly adept at using a hammer, saw, pencil, or paintbrush. And those people are going to be able to use this tool to great effect, while other people, just like an untrained person or someone who hasn't practiced using a paintbrush, crayon, pencil, or hammer, will not get the same results as a carpenter, or an artist, or a printmaker, or whomever.
Shannon Capone Kirk: Such a great analogy—I really like it.
David Yanofsky: That's going to be it for Multidimensional Data Reversion for today, so be sure to subscribe wherever you get your podcasts. I'm David Yanofsky.
Shannon Capone Kirk: And I'm Shannon Capone Kirk. Thank you for listening.