E-disclosure predictive coding

The use of technology in litigation disclosure looks set to become more common following a landmark judgment, writes Susan Monty.

In Pyrrho Investments Ltd and another v MWB Property Ltd and others [2016] EWHC 256 (Ch), the use of predictive coding in electronic disclosure was judicially approved for the first time in a reported UK decision. This heralds the increased use of advanced analytical techniques in litigation disclosure.

Disclosure is expensive. In this case, with two claimants and five defendants, there were potentially more than 17.6m documents in electronic format. Even after de-duplication, this number was whittled down only to 3.1m.

Following months of correspondence, the parties reached a case management agreement, subject to court approval, providing for the use of keywords and predictive coding. While this type of agreement is common in the US (Moore v Publicis Groupe [2012]) and was recently approved by the Irish High Court in Irish Bank Resolution Corporation Ltd and others v Sean Quinn and others [2015], Master Matthews noted that in the UK ‘there is not a great deal by way of guidance, and nothing by way of authority, on the use of such software as part of the disclosure process’.

Electronic disclosure

Disclosure is governed by the Civil Procedure Rules (part 31) and practice directions PD31A and B. These must be applied in accordance with the ‘overriding objective’ as set out in CPR 1.1(1) ‘enabling the court to deal with cases justly and at a proportionate cost’.

Rule 31.7 requires each party to make a ‘reasonable search’ for ‘disclosable documents’. A ‘document’ includes a computer file. E-disclosure is the disclosure of electronically stored information.

The factors relevant in deciding the reasonableness of a search (CPR 31.7(2)) include: (a) the number of documents involved; (b) the nature and complexity of the proceedings; (c) the ease and expense of retrieval of any particular document; and (d) the significance of any document which is likely to be located during the search.

The master noted that what matters most in the disclosure process is the scope and quality of the search itself, as opposed to the listing and production for inspection of the relevant documents discovered.

The question of how the search should be carried out is addressed in paragraph 25 of PD31B, which provides that it may be reasonable to search for electronic documents ‘by means of keyword searches or other automated methods of searching’ if a full manual review would be ‘unreasonable’ in the circumstances. PD31B recognises that keyword searches may be unsuitable if they find excessive quantities of irrelevant documents (for example, by duplication of documents in email and ‘cc’ chains), or fail to find important documents which ought to be disclosed (paragraph 26). In such circumstances the parties should consider supplementing automated searches with ‘additional techniques’ (such as individual review of key documents), and taking ‘such other steps as may be required to justify the selection to the court’ (paragraph 27).

Predictive coding

Predictive coding is a process in which computer algorithms review documents and return those likely to be relevant, based on a selection of relevant documents made by a human subject expert. Because the algorithm does most of the reviewing, an increase in the number of documents does not necessarily increase costs, as it would with a manual review. To achieve this:

1. The parties agree a predictive coding protocol defining data size, reviewers, margin of error and criteria for inclusion of documents (date range, custodians, key words).
2. A party uploads the electronic documents on to an electronic review platform, which excludes incompatible documents (image files, audio, password-protected and corrupt files).
3. A representative sample is selected to ‘train’ the software.
4. Documents produced from the sample are reviewed manually by a case lawyer and categorised for relevance.
5. The sample is analysed by the predictive coding software, which ‘scores’ documents for relevance to the issues in the case (for example, by common concepts and language used).
6. The software then runs additional statistical sampling checks to validate the sample for quality assurance.
7. Based on the training that the software has received, it then proceeds to review and categorise each individual document in the entire document set as either relevant or not relevant.
8. The results of the categorisation exercise are then validated through several quality assurance exercises based on statistical sampling.
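
The train, score and validate steps above can be illustrated with a short Python sketch. It is not the software used in Pyrrho: it swaps in a simple naive Bayes word-count classifier as a stand-in, and every function name, document identifier and sample text is hypothetical. It also assumes the reviewed sample contains at least one relevant and one irrelevant document. The last function uses the standard normal-approximation formula for the margin of error of a sample proportion, which is one common way to size a quality-assurance sample.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_sample):
    """Count words per class from (text, is_relevant) pairs coded by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labelled_sample:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocabulary


def relevance_score(text, model):
    """Return the estimated probability (0 to 1) that a document is relevant."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        # Prior: share of the reviewed sample carrying this label.
        log_prob[label] = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                smoothed = word_counts[label][word] + 1
                log_prob[label] += math.log(smoothed / (total_words + len(vocabulary)))
    # Turn the two log scores into a probability of relevance.
    highest = max(log_prob.values())
    weight = {label: math.exp(value - highest) for label, value in log_prob.items()}
    return weight[True] / (weight[True] + weight[False])


def margin_of_error(sample_size, observed_rate, z=1.96):
    """Normal-approximation margin of error; z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(observed_rate * (1 - observed_rate) / sample_size)


# Hypothetical sample coded by a case lawyer (True = relevant).
sample = [
    ("lease assignment approved by the board", True),
    ("board minutes on the lease payment", True),
    ("lunch menu for friday", False),
    ("office party on friday", False),
]
model = train(sample)

# Hypothetical unreviewed documents scored by the trained model.
unreviewed = {
    "DOC-1": "payment under the lease was approved",
    "DOC-2": "friday lunch party",
}
scores = {doc_id: relevance_score(text, model) for doc_id, text in unreviewed.items()}
relevant = [doc_id for doc_id, score in scores.items() if score >= 0.5]
```

Here the 0.5 cut-off is an arbitrary choice for illustration; in practice the threshold and the acceptable margin of error would be among the matters settled in the agreed protocol.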

Application to Pyrrho

Master Matthews listed 10 reasons that predictive coding was beneficial and found ‘no factors of any weight pointing in the opposite direction’.

1. ‘Experience in other jurisdictions, while so far limited, has been that predictive coding software can be useful in appropriate cases.’

2. ‘There is no evidence to show that the use of predictive coding software leads to less accurate disclosure being given than manual review alone or keyword searches and manual review combined.’

3. ‘There will be greater consistency in using the computer to apply the approach of a senior lawyer towards the initial sample (as refined) to the whole document set, than in using dozens, perhaps hundreds, of lower-grade fee-earners, each seeking independently to apply the relevant criteria in relation to individual documents.’

4. ‘There is nothing in the CPR or practice directions to prohibit the use of such software.’

5. ‘The number of electronic documents which must be considered for relevance and possible disclosure in the present case is huge, over 3m.’

6. ‘The cost of manually searching these documents would be enormous, amounting to several million pounds at least. In my judgment, therefore, a full manual review of each document would be “unreasonable” within paragraph 25 of practice direction B to part 31, at least where a suitable automated alternative exists at a lower cost.’

7. ‘The costs of using predictive coding software would depend on various factors, including importantly whether the number of documents is reduced by keyword searches… of course there may be additional costs if manual reviews still need to be carried out when the software has done its best.’

8. ‘The “value” of the claims made in this litigation is in the tens of millions of pounds. In my judgment the estimated costs of using the software are proportionate.’

9. ‘The trial in the present case is not until June 2017, so there would be plenty of time to consider other disclosure methods if for any reason the predictive software route turned out to be unsatisfactory.’

10. ‘The parties have agreed on the use of the software, and also how to use it, subject only to the approval of the court.’ Significant agreement had been reached between the parties as to key components of disclosure: date ranges, key words and relevant custodians of data.

Conclusions

It will now be more difficult for parties to hide behind the provisions of CPR 31.6 (which limits the amount of disclosure to be given by one party) on the basis that it would be disproportionate to engage in an expensive trawl through large amounts of data.

The courts may need to adapt to allow increased use of e-platforms, although it may be some time before we can dispense with core bundles and lever arch files.

For law firms, it is about establishing a good relationship with litigation partners able to provide a full range of e-disclosure services, including predictive coding. Finding a partner with the best software and support staff is critical.

Lawyers must be trained as subject experts capable of reviewing the sample documents produced by the initial predictive coding trawl.

The likely cost of hosting, processing and reviewing documents produced by predictive coding must be considered and assessed at the outset of litigation as part of any budgeting exercise. For any party likely to face a large e-disclosure exercise, cashflow is important. Although predictive coding is likely to be considerably less expensive than a manual review, now that predictive coding has been given the green light it is much more likely that litigants will be faced with large disclosure exercises which they may not have faced previously.

The question of privileged documents was not addressed in the judgment, but in theory such documents could be excluded by an initial trawl using the predictive coding software for potentially privileged domain names and email addresses of individuals or lawyers. Documents responding to such terms could then be ringfenced for addressing between the parties as usual.
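
As a rough illustration of that kind of initial trawl, the Python sketch below separates documents whose sender or recipients match a list of potentially privileged domains or email addresses. It is a plain metadata filter rather than predictive coding, and the function name, document structure, domains and addresses are all hypothetical.

```python
def split_privileged(documents, privileged_domains, privileged_addresses):
    """Return (ringfenced_ids, remaining_ids) based on sender and recipient matches."""
    domains = {domain.lower() for domain in privileged_domains}
    addresses = {address.lower() for address in privileged_addresses}
    ringfenced, remaining = [], []
    for doc in documents:
        parties = [doc["sender"], *doc["recipients"]]
        # A hit is an exact address match or a match on the part after the "@".
        hit = any(
            party.lower() in addresses
            or party.lower().rsplit("@", 1)[-1] in domains
            for party in parties
        )
        if hit:
            ringfenced.append(doc["id"])
        else:
            remaining.append(doc["id"])
    return ringfenced, remaining


# Hypothetical email metadata; ".test" domains are used so no real firm is named.
documents = [
    {"id": "DOC-1", "sender": "partner@examplelaw.test", "recipients": ["ceo@client.test"]},
    {"id": "DOC-2", "sender": "ceo@client.test", "recipients": ["cfo@client.test"]},
    {"id": "DOC-3", "sender": "cfo@client.test", "recipients": ["Counsel@Client.test"]},
]
ringfenced, remaining = split_privileged(
    documents,
    privileged_domains={"examplelaw.test"},
    privileged_addresses={"counsel@client.test"},
)
```

A match only flags a document as potentially privileged; whether privilege actually applies would still be a matter for lawyer review.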

Pyrrho was particularly suitable for the application of predictive coding, since the size of the claim versus the cost of disclosure fitted the Jackson principles of a balance between proportionality and cost management.

Susan Monty is a partner at Simons Muirhead & Burton, which is acting for one of the defendants in Pyrrho
