In high-stakes litigation, the amount of data and data analysis required to develop and support expert testimony is growing rapidly. At the same time, there has been an explosion in the number of database platforms and software tools used in big data analytics. Because expert discovery timelines have not expanded to reflect this increased size and complexity, these trends have important implications for the computing resources and expertise required. They put a premium on the ability of experts and consultants to analyze enormous databases from varied sources efficiently and flexibly.

The Evolution of Big Data Use in Litigation

Over the last five years, companies in virtually every industry have amassed increasing amounts of data in the normal course of business. The sheer volume of this data, and the need to manage and use it effectively, have given rise to the emerging fields of data analytics and data science. As big data has spilled into the litigation sphere, attorneys increasingly need economics and finance experts and consultants who can manage, produce and analyze large datasets efficiently, securely and accurately.

To put things in perspective, a decade ago, complex, high-stakes matters typically involved analyzing datasets containing millions of records at most. Today, datasets with billions of records are commonly used in empirical analysis of the economic and financial issues that arise in litigation and investigations.

Historically, it has been common practice for researchers to conserve computing resources by analyzing only a sample of the available data. For example, an expert in a health care provider fraud or False Claims Act matter might draw and analyze a statistically valid sample of the relevant claims data and extrapolate the findings from that sample to the remainder of the data. As available tools have increasingly made it possible to analyze all of the relevant data, some courts have rejected statistical extrapolation methods, especially in liability analysis or at the class certification stage.
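
The mechanics of such an extrapolation are simple in principle, even if the sampling design and audit work behind it are not. The Python sketch below illustrates one common approach under simplifying assumptions: a simple random sample of claims is audited, and the sample mean overpayment is projected to the full population with a normal-approximation confidence interval. The claims table and its "overpayment" column are hypothetical.

```python
import numpy as np
import pandas as pd

def extrapolate_overpayment(audited_sample: pd.DataFrame, population_size: int):
    """Extrapolate total overpayment from an audited simple random sample of claims.

    Assumes a hypothetical 'overpayment' column holding the audited overpayment
    amount for each sampled claim (zero where the claim is proper).
    """
    n = len(audited_sample)
    mean = audited_sample["overpayment"].mean()
    se = audited_sample["overpayment"].std(ddof=1) / np.sqrt(n)

    # Project the per-claim mean to the full population of claims.
    point_estimate = mean * population_size
    # 95% confidence interval for the extrapolated total (normal approximation).
    margin = 1.96 * se * population_size
    return point_estimate, (point_estimate - margin, point_estimate + margin)
```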

Navigating Big Data in Litigation

The data used to support expert consulting and testimony are frequently derived from both public and private sources. Managing the discovery process for data produced by the parties often requires a collaborative effort among clients, counsel, experts and consultants. Litigators can improve the effectiveness of this process by keeping the big picture in mind, setting clear goals, involving inside counsel and facilitating communication between the consulting experts and the client. Because the questions relevant to litigation may not arise in the normal course of business, companies often do not organize data collection in a manner that matches the needs of expert testimony. For a company to respond effectively in the discovery process, and for an expert to make effective use of the relevant information, data often must be pulled from multiple client platforms using customized code. At this stage, it is important to minimize business disruption and cost to the client, maintain the security of client information, and ensure the data are accurate and responsive. Using secure computing facilities and disciplined data-handling procedures is one way to mitigate these risks.

In addition to being sourced from various platforms, relevant datasets are frequently compiled from disparate sources and parties and maintained in incompatible formats. Producing client data therefore now requires experienced data analytics consultants who can also integrate and structure the data for analysis.

Increasingly, the complexity and scale of these datasets also require a change in the technology that experts and consultants use to analyze and store them. The tools required are evolving from traditional database management systems and desktop statistics packages to sophisticated, high-performance and high-throughput hardware and software. Using these computing resources and techniques to analyze billions of records efficiently and accurately requires specialized skill in big data technologies, platforms and programming tools. Because expert discovery deadlines generally have not expanded to allow for more intensive and complex data analysis, access to a team of dedicated data analytics specialists with the resources necessary to meet these deadlines has become essential in many litigation contexts.

For example, a number of recent, prominent cases involving allegations of price fixing or collusion in financial markets have required analysis of massive databases of trades, prices and quotes. These cases have included matters related to equity, fixed income, options, credit default swaps and interest rate benchmarks. The analysis of terabytes of data for these types of cases requires using massively parallel-processing techniques across dozens or even hundreds of computers, as opposed to a single server. Similarly, matters relating to various aspects of high-frequency and low-latency trading or evaluation of customer order execution have required investigation of complex intraday trading and quote data as well as forensic market reconstructions.
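
At its core, this kind of analysis follows a map-and-combine pattern: each partition of the trade data is summarized independently, and the partial results are then merged. The Python sketch below illustrates the idea using a single machine's worker pool; in practice the same pattern is distributed across a cluster, and the file layout and column names shown here are hypothetical.

```python
from multiprocessing import Pool
from pathlib import Path
import pandas as pd

def daily_totals(path: Path) -> pd.DataFrame:
    """Summarize one partition of trade records: notional and volume by trade date."""
    trades = pd.read_parquet(path, columns=["trade_date", "price", "quantity"])
    trades["notional"] = trades["price"] * trades["quantity"]
    return trades.groupby("trade_date")[["notional", "quantity"]].sum()

if __name__ == "__main__":
    # Hypothetical directory of partitioned trade files, e.g., one per trading day or venue.
    partitions = sorted(Path("trades/").glob("*.parquet"))

    # Map: summarize each partition in parallel. Combine: merge the partial results.
    with Pool(processes=16) as pool:
        partials = pool.map(daily_totals, partitions)
    combined = pd.concat(partials).groupby("trade_date").sum()

    # Volume-weighted average price per day across all partitions.
    combined["vwap"] = combined["notional"] / combined["quantity"]
    print(combined["vwap"].head())
```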

Data Analytics/Science Experts

The increased reliance on big data has also led to increased use of experts with more specialized skills in data science, statistics or marketing science, in addition to economics or finance. Such experts come from a range of disciplines and bring focused expertise in particular functional areas (e.g., structural modeling, text analytics or sampling) or industries (e.g., health care claims data, financial data or web data).

Justin McCrary is a professor of law at the University of California, Berkeley. He is also the founding director of the University of California, Berkeley Social Sciences Data Laboratory. The lab supports the use of big data in social science research by providing physical computing infrastructure; archiving public and confidential data collected by government, industry and the academy; and training on software and research methods. As an expert at the intersection of law and economics, Professor McCrary has a front row seat for the changes sweeping through expert testimony and support.

"The possibilities for showing relevant facts and testing the important questions have increased in academia and litigation alike, and it goes hand in hand with the availability of more information created and maintained in businesses," said Professor McCrary. "To make effective use of the opportunities created by this new information, you must play on a new field in terms of the advanced computing platforms and techniques required, as well as the need for more specialists and support teams experienced with these technologies."

One area where this is playing out is the increased use of text analytics and content analysis in intellectual property and consumer protection litigation. These methods lie at the intersection of data science and marketing science and have already become widespread in the e-discovery world for assisting or automating document review. Their use in expert work opens up new avenues for analysis of public press and user-generated content.

Anindya Ghose is a professor of information, operations and management sciences and a professor of marketing at New York University's Leonard N. Stern School of Business. According to Professor Ghose, "Text analytics or text mining has wide applicability in litigation, ranging from assessing the potential exposure to and significance of information or product features, the volume and tone of messaging and whether a disclosure is sufficient or not."
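
As a simple illustration of the idea, the Python sketch below scores the tone of short passages of user-generated content against small positive and negative word lists. The lexicons, the example reviews and the scoring rule are purely hypothetical; a real engagement would rely on validated dictionaries or trained classifiers.

```python
import re
from collections import Counter

# Illustrative tone lexicons; these word lists are hypothetical stand-ins for
# the validated dictionaries or trained models used in practice.
POSITIVE = {"great", "reliable", "effective", "recommend", "satisfied"}
NEGATIVE = {"defective", "misleading", "broken", "refund", "complaint"}

def tone_score(text: str) -> float:
    """Return a crude tone score in [-1, 1] from positive/negative word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

reviews = [
    "Great product, very reliable and I would recommend it.",
    "Arrived defective and the ad was misleading; I want a refund.",
]
for review in reviews:
    print(round(tone_score(review), 2), review)
```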

Methods, Platforms and Software Tools

Big data is not just big; it is also complex, as more detailed and nuanced information is available than ever before. This has created an opportunity for experts and consultants to bring new, effective and sophisticated empirical modeling techniques to bear in analyzing complex business and financial data in the litigation setting. Taking advantage of this opportunity involves applying evolving statistical and econometric methods that include regression analysis, time series analysis, forecasting, simulations and transactional analysis. Increasingly, it also requires high-performance computing that leverages dozens, hundreds or even thousands of processors, as well as structural modeling, machine learning and text analytics.
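
As a minimal illustration of the regression side of this toolkit, the Python sketch below fits a reduced-form price regression on simulated transaction data, with an indicator for an alleged conduct period standing in for the question of interest. The variable names, the built-in 5% overcharge and the specification are entirely hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000

# Simulated transactions: an input cost, an indicator for the alleged conduct
# period, and a price that embeds a hypothetical 5% overcharge in that period.
df = pd.DataFrame({
    "cost": rng.normal(100, 10, n),
    "conduct_period": rng.integers(0, 2, n),
})
df["price"] = 1.2 * df["cost"] * (1 + 0.05 * df["conduct_period"]) + rng.normal(0, 3, n)

# Log-log regression of price on cost with a conduct-period dummy; the dummy's
# coefficient approximates the proportional overcharge built into the simulation.
model = smf.ols("np.log(price) ~ np.log(cost) + conduct_period", data=df).fit()
print(model.params["conduct_period"])  # roughly 0.05
```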

Not surprisingly, the size of data collections is not the only thing growing rapidly in this space. If larger datasets are analyzed with the same tools designed for smaller ones, computing time grows dramatically. This fact has driven a rapid expansion in the number of platforms and software tools used to analyze data. Whereas in prior decades most economic experts and consultants used a handful of desktop statistical packages for all data management and analysis, big data combined with the time-sensitive nature of litigation has led experts and consultants to seek out new solutions. This environment continues to change rapidly and to grow more varied.

This rapid change has also created challenges in the litigation context. Given the number of new and evolving technologies, there is no practical way to anticipate which platforms and software tools will be used in a case, or to know, before reports are served, which tools the opposing expert is using. This underscores the need for dedicated data analytics consultants trained in a range of new technologies, and it helps avoid the unwelcome scenario in which an inexperienced consulting team must learn new platforms and statistical packages on a very tight time frame.

Originally published by Law360, New York.
