Technology Assisted Review (TAR), or predictive coding, is an alternative to the traditional manual review of documents. It combines:
- Manual review by senior members of a legal team of a small subset of documents, to identify whether they belong to certain categories (relevant, not relevant, privileged, etc.)
- Computer analysis that applies the characteristics of the subset to the full population of documents, grouping them into the same categories. The resulting document set is consistently categorised using a process which is both auditable and repeatable.
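As a toy illustration of this two-step process (the article names no particular algorithm, so the word-overlap "classifier" below is purely a stand-in for the real machine-learning step):

```python
# Illustrative sketch only: a hand-rolled stand-in for the TAR algorithm.
# It learns word 'characteristics' from the reviewer-coded subset, then
# applies them consistently to the full population.

def train(seed):
    """Collect the word characteristics of each category from coded docs."""
    profiles = {}
    for text, label in seed:
        profiles.setdefault(label, set()).update(text.lower().split())
    return profiles

def categorise(profiles, text):
    """Assign the category whose profile shares the most words with the doc."""
    words = set(text.lower().split())
    return max(profiles, key=lambda label: len(words & profiles[label]))

# Small subset coded by senior reviewers.
seed = [
    ("contract breach damages claim", "relevant"),
    ("invoice dispute payment schedule", "relevant"),
    ("office party catering menu", "not relevant"),
    ("holiday roster staff leave", "not relevant"),
]
profiles = train(seed)

# The same characteristics applied to the full population.
corpus = ["claim for damages under the contract",
          "catering menu for the office party"]
results = [categorise(profiles, doc) for doc in corpus]
```

A real TAR product would use a statistical classifier rather than raw word overlap, but the shape of the workflow, train on a coded subset and then categorise everything else, is the same.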
Computer Assisted Review (CAR) and Technology Assisted Review (TAR) are interchangeable terms.
To some it is bewildering that TAR has not been more widely adopted. Cynics might suggest that lawyers have fees to lose. But we believe it's not as simple as that. The complex statistics involved in some of the TAR protocols used, together with a natural level of inertia and fear of the unknown, have combined to slow the acceptance and use of TAR. Another key factor is that TAR requires an acceptance that any review process is imperfect.
At KordaMentha Forensic we maintain the position that a more intuitive, non-statistical TAR protocol will always be more palatable. Recent studies show that such intuitive protocols are both more workable and more efficient.
A recent article[1] by Maura Grossman and Gordon Cormack on the current state of TAR in the electronic document review marketplace points out that:
- There are a number of protocols which can be used when undertaking TAR, and newly developed protocols can be even more efficient in reviewing documents.
- Different TAR products use different underlying algorithms to perform TAR, and the effectiveness of these algorithms varies.
Grossman and Cormack are leading the research into TAR and have published a number of ground-breaking studies on document review. Their latest study[2], published in late 2014, compares the effectiveness of the protocols currently available.
2 The adoption of TAR in the United States has been slow
Grossman and Cormack lament that the adoption of TAR has been very slow in the USA, despite strong judicial support and its significant cost and time advantages. They suggest that the most commonly used protocols – such as Simple Active Learning (SAL) and Simple Passive Learning (SPL) – are wrapped in complex statistical vocabulary and rituals which dissuade practitioners from using TAR, even though none of these rituals is essential. They argue that a new, simpler protocol, more closely resembling a web-search methodology, will encourage greater adoption of the technology.
Continuous Active Learning (CAL) is more efficient
The CAL protocol removes the complexity of statistical control sets, random samples and the like. Instead, it relies on the ongoing stream of documents coded by reviewers: the TAR algorithm uses this coding to continually refine the set of documents presented for review and further coding, until the legal team is comfortable that it has identified and reviewed the potentially relevant documents.
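The CAL loop described above can be sketched as follows. The scoring function, batch size and stopping rule here are illustrative assumptions, not part of any specific TAR product; in practice the scoring step is the product's machine-learning model and the stopping decision rests with the legal team.

```python
# A minimal sketch of the Continuous Active Learning loop: score, review
# the top-ranked batch, re-score with the new coding, repeat.

def cal_review(documents, score, code, batch_size=2, max_rounds=10):
    """Repeatedly present the highest-scoring unreviewed documents for coding."""
    coded = {}  # document -> True/False, as coded by the reviewers
    for _ in range(max_rounds):
        unreviewed = [d for d in documents if d not in coded]
        if not unreviewed:
            break
        # Re-rank the remaining documents using all coding received so far.
        unreviewed.sort(key=lambda d: score(d, coded), reverse=True)
        batch = unreviewed[:batch_size]
        for doc in batch:
            coded[doc] = code(doc)  # the reviewer tags the document
        # Stand-in stopping rule: stop once a whole batch comes back
        # with nothing relevant (the team is 'comfortable').
        if not any(coded[d] for d in batch):
            break
    return coded

# Toy usage: pretend relevant documents mention the word 'contract'.
docs = ["contract A", "contract B", "menu", "roster", "contract C"]
score = lambda d, coded: 1.0 if "contract" in d else 0.0
coded = cal_review(docs, score, code=lambda d: "contract" in d)
```

Note there is no control set and no random sample anywhere in the loop: the only inputs are the reviewers' ongoing coding decisions.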
The research by Grossman and Cormack showed that CAL produces a much more efficient form of TAR. Using CAL, the volume of manual review was lower, yet the number of relevant documents found was higher. The average saving in manually reviewed documents was 5%, representing an average of 36,250 documents per case: the equivalent of 72.5 lawyer review days[3]. These savings are over and above the significant savings that can be achieved by moving from traditional manual review to the earlier protocols of TAR (often referred to as TAR version 1.0).
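Restating the arithmetic behind those figures: at the footnoted rate of 500 documents per reviewer per day, the per-case document saving converts to review days as follows.

```python
# Converting the per-case document saving into lawyer review days,
# using the review rate given in the footnotes.
docs_saved = 36_250
docs_per_reviewer_day = 500  # per the footnoted assumption
review_days = docs_saved / docs_per_reviewer_day
```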
3 How does CAL differ from other forms of TAR?
Using the CAL protocol both reduces and simplifies the steps involved in the process when compared to SAL or SPL: see the Appendix.
As well as being more efficient, CAL has a number of other benefits:
| Benefit | Explanation |
| --- | --- |
| CAL is more flexible when introducing new documents to a corpus. | The control set for SAL or SPL needs to be a statistical representation of the corpus. If the corpus changes, a new control set is needed to represent the new corpus. |
| CAL is more flexible if the criteria for a relevant document change during the legal proceedings. | If the criteria change, a new control set must be created from scratch. |
| The legal team does not need to pre-determine an acceptable level of risk. Following the CAL process, the legal team continues to review documents and train the algorithm until they are comfortable that they have reviewed the potentially relevant documents. | The SAL and SPL protocols require the legal team to determine an F-Score: a measure combining recall[4] and precision[5], and in essence a measure of how much error (not finding relevant documents) is acceptable. This is traditionally something that legal teams have struggled with. |
| No need to create a control set. | Control sets often encounter problems: for example, a control set which turns out not to contain even one relevant document is useless for the SAL protocol. In our experience it is common to create many control sets which fail, destroying much of the benefit of using TAR. |
| No need to create random samples. | As with control sets, random samples from a large corpus will often not include any relevant documents. Further training of the algorithm can still occur, but it is inefficient unless relevant documents are included in the re-training. |
| CAL gives the legal team much more control over the process. | Rather than a statistical formula telling the legal team when to stop reviewing documents, the decision is made by the legal team. This allows reviewers to quickly identify legally significant documents, and to adapt the process when new documents are added or new issues or interpretations arise. |
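The F-Score referred to above combines the recall and precision defined in the footnotes. A short worked example, with illustrative numbers:

```python
# Recall, precision and the F1 score (the harmonic mean of the two),
# the kind of measure SAL and SPL ask legal teams to pre-commit to.

def recall(found_relevant, total_relevant):
    """Fraction of truly relevant documents that the review identified."""
    return found_relevant / total_relevant

def precision(found_relevant, total_retrieved):
    """Fraction of documents identified as relevant that really are."""
    return found_relevant / total_retrieved

def f1(r, p):
    """Harmonic mean of recall and precision."""
    return 2 * r * p / (r + p)

# Toy review: 80 of 100 truly relevant documents found,
# among 160 documents the review flagged as relevant.
r = recall(80, 100)     # 0.8
p = precision(80, 160)  # 0.5
score = f1(r, p)
```

The difficulty the table describes is not computing this number but deciding, in advance, what value of it amounts to an acceptable level of error.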
KordaMentha Forensic's experience using TAR is consistent with Grossman's and Cormack's findings. In practice, random sampling in large corpuses of data becomes very inefficient, especially if there are few relevant documents in the corpus. Often random samples will contain no relevant documents at all with which to further train the algorithm.
Interestingly, the CAL protocol follows generally accepted methods of implementing artificial intelligence (AI). 'Deep learning' AI algorithms work by a human telling the algorithm what he or she thinks is correct or important, based on a small set of documents. The algorithm uses this input to analyse all of the data and determine what is, and is not, correct or important. This is an iterative process similar to CAL.
4 KordaMentha Forensic's Input to the CAL process
KordaMentha Forensic has been using a form of CAL, which we call Continuous Review, when implementing TAR. As part of our Continuous Review protocol, we use four key criteria to determine the next sets of documents to review in the ongoing training process, identifying documents which:
- Are categorised as 'highly relevant' by the software.
- Are on the threshold of being categorised relevant or not relevant by the software.
- Have been tagged as 'non-relevant' by a reviewer, but which, based on analytics, appear to contain concept and textual similarities to documents which were tagged as relevant by a reviewer.
- Based on analytics, show volatility in categorisation over a number of training rounds: for example, where a document moves from being categorised as relevant to not relevant and back again over successive rounds.
Reviewing these types of documents will improve the accuracy of the results from the algorithm and allow the legal team to see the documents being identified as most likely to be relevant by the algorithms, and the issues that these documents raise.
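The four criteria above might be sketched as a batch-selection rule like the one below. The score fields, thresholds and data layout are illustrative assumptions for the sketch, not KordaMentha Forensic's actual parameters.

```python
# Hypothetical batch selection for the next Continuous Review round.
# Each document carries an algorithm score (0..1), any reviewer tag,
# an analytics flag for similarity to relevant documents, and its
# score history over previous training rounds.

def volatility(history, threshold=0.5):
    """Count flips across the relevance threshold over training rounds."""
    calls = [s >= threshold for s in history]
    return sum(a != b for a, b in zip(calls, calls[1:]))

def next_batch(docs, high=0.9, low=0.45, band=0.55):
    """Pick documents meeting any of the four selection criteria."""
    batch = []
    for d in docs:
        highly_relevant = d["score"] >= high                 # criterion 1
        near_threshold = low <= d["score"] <= band           # criterion 2
        conflicting = (d["reviewer_tag"] == "not relevant"   # criterion 3
                       and d["similar_to_relevant"])
        volatile = volatility(d["score_history"]) >= 2       # criterion 4
        if highly_relevant or near_threshold or conflicting or volatile:
            batch.append(d["id"])
    return batch

docs = [
    {"id": 1, "score": 0.95, "reviewer_tag": None,
     "similar_to_relevant": False, "score_history": [0.9, 0.95]},
    {"id": 2, "score": 0.50, "reviewer_tag": None,
     "similar_to_relevant": False, "score_history": [0.5, 0.5]},
    {"id": 3, "score": 0.30, "reviewer_tag": "not relevant",
     "similar_to_relevant": True, "score_history": [0.3, 0.3]},
    {"id": 4, "score": 0.40, "reviewer_tag": None,
     "similar_to_relevant": False, "score_history": [0.6, 0.4, 0.7]},
    {"id": 5, "score": 0.10, "reviewer_tag": None,
     "similar_to_relevant": False, "score_history": [0.1, 0.1]},
]
batch = next_batch(docs)
```

Each document in the sketch is selected by exactly one criterion (documents 1 to 4 in order), while document 5, a stable low scorer, is left out of the training round.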
5 Not all TAR algorithms are the same
Different eDiscovery tools use different underlying algorithms to perform TAR. Grossman and Cormack also compare the effectiveness of the different types of algorithm.
We believe that simplified and intuitive CAL protocols and workflows, such as our Continuous Review, will help to remove many of the current barriers – real or perceived – to the legal profession embracing TAR. Ongoing cost pressures from general counsel will also encourage litigators to consider TAR. Further, the Australian judiciary is showing increasing interest in the use of these sorts of technologies to ensure that discovery/disclosure is undertaken in a proportionate manner. We believe that a successful Australian test case on TAR is unlikely to be far away as the eDiscovery revolution continues.
[1] Grossman, Maura R. and Cormack, Gordon V., 'Continuous Active Learning for TAR', E-Discovery Bulletin, April/May 2016.
[2] Grossman, Maura R. and Cormack, Gordon V., 'Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery', Proceedings of the 37th Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (2014), 153-162.
[3] Based on a reviewer reviewing 500 documents per day.
[4] Recall: the fraction of relevant documents that are identified as relevant by a search or review effort.
[5] Precision: the fraction of documents identified as relevant by a search or review effort that are in fact relevant.
[6] Based on Grossman, Maura R. and Cormack, Gordon V., 'Continuous Active Learning for TAR', E-Discovery Bulletin, April/May 2016.
[7] Irish Bank Resolution Corporation Ltd & Ors v Quinn & Ors, IEHC 175.
[8] Da Silva Moore v. Publicis Groupe, Case No. 1:11-cv-01279.
[9] Overturns are documents which the algorithm predicted as relevant but which, after another round of training, are re-predicted as not relevant.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.