Leveraging AI for legal discovery

E-discovery is a legal procedure for collecting and analyzing electronic documents. Large litigation cases can involve more documents than human experts can efficiently review on their own.

Discovery costs account for 20 percent to 50 percent of all costs in federal litigation cases, and the total cost of discovery in federal and state litigation cases amounts to more than $26.3 billion per year.

E-discovery technologies leverage advances in big data and artificial intelligence (AI) to automate many aspects of legal discovery. Unlike AI developments in areas like autonomous vehicles, these advances complement rather than replace humans.

Many AI support tools use natural language processing techniques, which analyze the structure and meaning of language, to add intelligence to more conventional information retrieval technologies. These techniques enable some of the basic building blocks of e-discovery, like extracting important entities in a sentence and identifying the topic of a document.

Information retrieval technologies, like keyword searching and document databases, are also used in e-discovery but do not fall under the AI category.

Imagine you have just been tasked with reviewing a collection of hundreds of thousands of documents. Where do you begin?

You might start by creating a high-level overview of the different categories of documents. Document clustering techniques can help with this by grouping documents that have similar content. For example, emails, product design documents, and related marketing material about a specific product would be placed in the same group.
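To make the grouping idea concrete, here is a minimal sketch in Python using only the standard library. It assumes a simple word-count representation and a greedy, single-pass grouping rule; the sample documents, the "Falcon" product name, the stopword list, and the 0.25 similarity threshold are all hypothetical choices for illustration, not what any particular e-discovery product does.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "for", "is", "and", "by", "of", "a", "to"}

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.25):
    """Greedy clustering: join the most similar existing cluster, or start a new one."""
    clusters = []
    for index, text in enumerate(docs):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": vec, "members": [index]})
        else:
            best["centroid"].update(vec)  # add this document's counts to the cluster
            best["members"].append(index)
    return [c["members"] for c in clusters]

# Hypothetical documents: three about one product, two about payroll.
DOCS = [
    "Email: the Falcon drone launch date is confirmed for spring",
    "Design document for the Falcon drone battery and rotor assembly",
    "Marketing brochure draft for the Falcon drone launch",
    "Quarterly payroll report for the accounting department",
    "Payroll tax filing prepared by the accounting department",
]

print(cluster(DOCS))  # [[0, 1, 2], [3, 4]]
```

The email, design document, and brochure land in one group because they share product vocabulary, while the payroll documents form a second group. Production systems typically use weighted terms and more robust algorithms, but the underlying idea of grouping by content similarity is the same.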

Hierarchical clustering can further organize these clusters into a hierarchical tree structure, which may help to navigate large document collections. Once related documents are clustered, topic modeling can be used to identify key concepts across the clustered groups and rank documents according to the importance of each concept to that document.
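The tree-building idea can be sketched in a few lines of Python. This is a simplified agglomerative scheme under stated assumptions: each document is reduced to a hypothetical set of keywords, similarity is measured with Jaccard overlap, and the two most similar nodes are merged repeatedly until one tree remains. The document names and keyword sets are invented for illustration.

```python
def jaccard(a, b):
    """Share of keywords two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_tree(docs):
    """Repeatedly merge the two most similar nodes into a nested tuple."""
    # Each node is (label_or_subtree, keyword_set).
    nodes = [(name, set(words)) for name, words in docs.items()]
    while len(nodes) > 1:
        pairs = (
            (i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
        )
        i, j = max(pairs, key=lambda p: jaccard(nodes[p[0]][1], nodes[p[1]][1]))
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] | nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

# Hypothetical keyword sets, one per document.
DOC_KEYWORDS = {
    "email": {"falcon", "drone", "launch", "date"},
    "design": {"falcon", "drone", "battery", "rotor"},
    "brochure": {"falcon", "drone", "launch", "marketing"},
    "payroll": {"payroll", "accounting", "report"},
    "tax": {"payroll", "accounting", "tax"},
}

print(build_tree(DOC_KEYWORDS))
# (('payroll', 'tax'), ('design', ('email', 'brochure')))
```

The nested result reads like a folder hierarchy: the payroll documents sit in one branch, and within the product branch the email and brochure are closest to each other, with the design document joining them one level up.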

A cluster of documents about a meeting between two companies discussing a partnership might include topics such as product development, sales, accounting, markets, and strategy. From there, a document describing a proposed marketing campaign would rank high on concepts such as markets and strategy and low on accounting. Another document that discusses the cost of goods sold and margins would rank high on sales and accounting topics.
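The ranking described above can be illustrated with a deliberately simplified Python sketch. Real topic models learn topics from the documents themselves; here, as a loudly stated assumption, each topic is a hand-picked keyword list and a document's score for a topic is the fraction of its words that appear in that list. The topic names come from the example above, while the keyword lists and sample texts are hypothetical.

```python
import re

# Hand-picked keyword lists standing in for learned topics (an assumption).
TOPICS = {
    "markets": {"market", "markets", "campaign", "customers", "audience"},
    "strategy": {"strategy", "positioning", "plan", "roadmap"},
    "sales": {"sales", "sold", "revenue", "margins"},
    "accounting": {"accounting", "cost", "goods", "margins", "ledger"},
}

def topic_scores(text):
    """Score each topic by the fraction of the document's words it covers."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {topic: 0.0 for topic in TOPICS}
    return {
        topic: sum(t in keywords for t in tokens) / len(tokens)
        for topic, keywords in TOPICS.items()
    }

def rank_documents(docs, topic):
    """Order document names from most to least relevant to one topic."""
    return sorted(docs, key=lambda name: topic_scores(docs[name])[topic], reverse=True)

# Hypothetical documents mirroring the two examples in the text.
DOCS = {
    "campaign": "Proposed marketing campaign: our strategy is to reach new "
                "markets and position the product for a younger audience",
    "finance": "Cost of goods sold rose this quarter while sales margins narrowed",
}

print(rank_documents(DOCS, "markets"))     # ['campaign', 'finance']
print(rank_documents(DOCS, "accounting"))  # ['finance', 'campaign']
```

The campaign document scores above zero on markets and strategy and zero on accounting, while the finance document scores on sales and accounting, which matches the pattern in the example.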

The third step in e-discovery analysis using natural language processing is named entity recognition (NER). This is the process of identifying nouns and noun phrases and then categorizing them into a small set of types, such as person, location, monetary amount, date, and organization. Like topic modeling and clustering, NER maps unstructured content to a more structured form, such as a list of entities that appear in a document.

This allows for more precise searching than keyword matching because it is less subject to problems with words that are spelled the same but have different meanings. One could search for organizations that are banks, for instance, without retrieving emails in which someone refers to “banking on” a customer to commit to a sale.
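The difference between the two kinds of search can be shown with a small Python sketch. It assumes the entities have already been extracted and typed by some NER tool; the `entities` lists, the type label `ORGANIZATION`, and the sample emails are hypothetical and hand-written here rather than produced by a real tagger.

```python
# Hypothetical emails with hand-written entity annotations,
# standing in for the output of an NER tool.
DOCS = [
    {
        "id": "email-1",
        "text": "We are banking on the customer to commit to the sale.",
        "entities": [],
    },
    {
        "id": "email-2",
        "text": "First National Bank approved the loan on March 3.",
        "entities": [
            ("First National Bank", "ORGANIZATION"),
            ("March 3", "DATE"),
        ],
    },
]

def keyword_search(docs, keyword):
    """Match the keyword anywhere in the raw text."""
    return [d["id"] for d in docs if keyword.lower() in d["text"].lower()]

def entity_search(docs, keyword, entity_type):
    """Match the keyword only inside entities of the requested type."""
    return [
        d["id"]
        for d in docs
        if any(
            etype == entity_type and keyword.lower() in text.lower()
            for text, etype in d["entities"]
        )
    ]

print(keyword_search(DOCS, "bank"))                  # ['email-1', 'email-2']
print(entity_search(DOCS, "bank", "ORGANIZATION"))   # ['email-2']
```

The keyword search returns the "banking on" email as a false hit, while the entity search returns only the email that mentions a bank as an organization.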

NER can also help improve search when acronyms are used. For example, “Bank of the United States” and “BoUS” could both be tagged as entities of type organization. This would require specialized tools for matching acronyms to their full names, such as those used in biomedical literature search.
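As a rough illustration of acronym matching, the Python sketch below uses a simple heuristic: an acronym is treated as a candidate match if its letters appear, in order, among the initials of the full name and the first letters agree. This rule is an assumption made for illustration; the specialized tools mentioned above are considerably more sophisticated, and the function names here are hypothetical.

```python
def initials(name):
    """First letter of each word, lowercased: 'Bank of the United States' -> 'botus'."""
    return "".join(word[0].lower() for word in name.split())

def is_subsequence(short, long):
    """True if the characters of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)

def matches_acronym(acronym, full_name):
    """Heuristic check that an acronym could stand for a full name."""
    letters = acronym.lower()
    first_letters = initials(full_name)
    if not letters or not first_letters:
        return False
    return letters[0] == first_letters[0] and is_subsequence(letters, first_letters)

def resolve(acronym, known_names):
    """Return every known organization name the acronym could refer to."""
    return [name for name in known_names if matches_acronym(acronym, name)]

# Hypothetical list of organization entities already found in a collection.
ORGANIZATIONS = ["Bank of the United States", "Bureau of Land Management"]

print(resolve("BoUS", ORGANIZATIONS))  # ['Bank of the United States']
```

Allowing skipped initials is what lets "BoUS" match even though the acronym drops "the". The trade-off is that a loose rule like this can produce false matches in a large collection, which is one reason dedicated tools exist.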

Natural language processing techniques provide multiple levels of support for legal professionals. Clustering can help organize large document collections and provide something akin to a map of the domain. Topic modeling gives some insight into the key concepts of each document while NER extracts fine-grained details about important people, organizations, and events.

NexLP has created an e-discovery platform that leverages AI and natural language processing to interact with legal professionals. Automated techniques map from unstructured content, such as emails and documents, to a more structured organization that lends itself to descriptive and analytic techniques. A key differentiator in NexLP’s platform, called Story Engine, is that it uses metadata about documents to construct a context for the text that is analyzed.

The DISCO e-discovery platform is a cloud-based solution designed to support the full lifecycle of e-discovery. The solution addresses tasks ranging from the more mundane, such as rapidly ingesting files, to the more advanced, such as representing the meaning of words. DISCO uses convolutional and recurrent neural networks that are suitable for multiple kinds of investigations.

AI does not have to function in isolation. Companies like NexLP and DISCO are creating e-discovery systems that offer a hybrid analysis workflow that leverages machine learning and natural language processing while enabling human experts to participate in the workflow.

The cost of legal discovery reaches into the tens of billions of dollars per year. Legal researchers are increasingly dependent on natural language processing tools, such as document clustering, topic modeling and named entity recognition. The most effective solutions leverage human expertise and AI to improve the efficiency and effectiveness of e-discovery.
