AI tools may be reaching the stage where they can be used as a cross-check for lawyers working on a legal issue, a new report has concluded.
Testing by magic circle firm Linklaters found marked improvements in how large language models (LLMs) answered questions about different areas of legal practice.
Despite these advances, the responses were still not always right and lacked nuance. The firm’s report on its findings said AI tools should not be used for English law legal advice without expert human supervision.
But the report added: ‘If that expert supervision is available, they are getting to the stage where they could be useful, for example by creating a first draft or as a cross-check.
‘This is particularly the case for tasks that involve summarising relatively well-known areas of law. In contrast, their ability to apply the law to the facts or interpret clauses is less good.’
Researchers asked questions that would require advice from a competent mid-level lawyer (two years’ post-qualification experience) specialised in the relevant practice area.
The LLMs’ answers were marked out of 10 by senior Linklaters lawyers: five marks for substance, three for whether the answer was supported by relevant statute or case law, and two for clarity.
The previous benchmarking exercise, in October 2023, had revealed major flaws in the models tested, with mostly wrong answers and some fictional citations. Linklaters said it had changed its methodology in response to feedback, adopting more sophisticated prompt engineering.
In the latest testing, both Gemini 2.0 and OpenAI o1 scored at least six out of 10 and showed material increases in the scores for substance and the accuracy of citations. GPT-4 scored just 3.2 out of 10, recording one mark out of five for the substance of its answers.
Lawyers noted that it felt like the AI tools had ‘tried too hard’, in many cases producing the right answer but alongside a lot of extra and duplicative material. The report said one potential problem was that the ‘eagerness’ of the models to provide a clear answer led them to overstate the confidence of their advice.
The report concluded that even if the flaws in AI technology are ironed out, this would not necessarily remove the need for human involvement.
It added: ‘Breaking the client’s requirements down into a series of legal steps that will achieve the client’s aim with the minimum effort, expense and uncertainty is the interesting and creative part of being a lawyer. Answering nutshell questions is the easy bit.’