Search tools and methodologies have numerous applications during the e-discovery phase of the litigation life cycle. The following real-life example illustrates the processes and challenges involved in using search, and shows how those challenges can be mitigated.
Attorney Jack Reacher is working on a new case involving a motorcycle accident. The plaintiff is claiming that his local garage failed to spot the leaking brake fluid from his BMW 1150 when he was at the garage for maintenance, which caused a mechanical failure leading to the accident. Attorney Reacher, who is representing the defendant garage, has a database containing thousands of documents, including email to and from the plaintiff and the defendant, email from a mailing list for motorcycle aficionados that both plaintiff and defendant participated in, and OCR’d documents including maintenance records and receipts from the garage.
Attorney Reacher wishes to find the responsive documents as quickly as possible, without having to read each of the thousands of documents in the database. In addition, he wants to be sure he does not fail to mark any privileged documents appropriately. The attorney has access to INDICA eDiscovery, so he decides to use it. Being familiar with Google, he enters the following search terms:
motorcycle maintenance records
This type of search, which uses a list of keywords, is known as a keyword search. Every document returned contains at least one of the words in his list of search terms. This set of documents is his result set.
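The "at least one of the words" behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Indica works internally: the function names and the four sample documents are hypothetical, and tokenization is reduced to lowercase alphanumeric words.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(query, documents):
    """Return the IDs of documents containing at least one query term."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in documents.items()
            if terms & set(tokenize(text))]

# Hypothetical sample documents, invented for this sketch.
documents = {
    "d1": "Motorcycle maintenance records for the BMW 1150",
    "d2": "Maintenance records for a customer's car",
    "d3": "Selling my old vinyl records",
    "d4": "Lunch on Friday?",
}

keyword_results = keyword_search("motorcycle maintenance records", documents)
print(keyword_results)
```

Because a single shared word is enough, the car maintenance document and the music records document both land in the result set alongside the motorcycle one.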
Indica highlights the words that matched his search when he looks at the documents. He notices that the results contain maintenance records for cars as well as motorcycles, and in a couple of cases contain email about music records, not maintenance records. He changes his search terms to:
motorcycle AND “maintenance records”
Putting in the quotation marks means that his result set will only contain matches for the entire phrase within the quotation marks, not for the individual words. By adding the word AND to his search query, he creates what is known as a Boolean search. A Boolean search uses the operators AND, OR, or NOT to tell the search engine more about what you mean when you enter your terms. In Attorney Reacher’s case, this means his result set will contain only documents that have both the word motorcycle and the phrase “maintenance records”.
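A minimal sketch of what the phrase and the AND operator each contribute, again using hypothetical function names and invented sample documents rather than anything taken from Indica: the phrase must appear as consecutive words in order, and both conditions must hold for a document to match.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively, in order, within tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

def and_search(word, phrase, documents):
    """Match documents containing both the single word AND the exact phrase."""
    word = word.lower()
    phrase_tokens = tokenize(phrase)
    hits = []
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        if word in tokens and contains_phrase(tokens, phrase_tokens):
            hits.append(doc_id)
    return hits

# Hypothetical sample documents, invented for this sketch.
documents = {
    "d1": "Motorcycle maintenance records for the BMW 1150",
    "d2": "Maintenance records for a customer's car",
    "d3": "Motorcycle club members trading vinyl records; maintenance tips inside",
}

boolean_results = and_search("motorcycle", "maintenance records", documents)
print(boolean_results)
```

The third document contains all three individual words, but not the phrase, so it is excluded; the second contains the phrase but not the word motorcycle.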
The documents in this result set include maintenance records for all sorts of different motorcycles, so Attorney Reacher makes his search terms more specific. He also limits the search so that it runs only against the current subset of documents. This is called a subset search. This time he enters:
“BMW 1150” AND brakes
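The idea of a subset search can be sketched as running a new test only over the document IDs returned by an earlier search. The function name, the sample documents, and the simple substring test below are all assumptions made for illustration.

```python
def subset_search(predicate, documents, subset_ids):
    """Apply predicate only to documents whose IDs are in the earlier result set."""
    return [doc_id for doc_id in subset_ids if predicate(documents[doc_id])]

def bmw_and_brakes(text):
    """Simplified stand-in for the query: phrase 'BMW 1150' AND the word 'brakes'."""
    lowered = text.lower()
    return "bmw 1150" in lowered and "brakes" in lowered

# Hypothetical sample documents, invented for this sketch.
documents = {
    "d1": "BMW 1150 service: brakes inspected",
    "d2": "BMW 1150 brakes recall notice",
    "d3": "Honda maintenance records, brakes replaced",
}

# Pretend the previous search returned only d1 and d3.
previous_result_set = ["d1", "d3"]

subset_results = subset_search(bmw_and_brakes, documents, previous_result_set)
print(subset_results)
```

Note that d2 would satisfy the query but is never examined, because it was not in the earlier result set. That is the trade-off of a subset search: it is narrower and faster, but it inherits any omissions from the search before it.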
Attorney Reacher realizes from looking at previous documents that entering just “brakes” won’t cover enough variations, so he turns on the stemming option and runs his search again. With stemming turned on, his search matches not only documents containing the word brakes but also documents containing brake, braking, and braked.
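To show the principle, here is a deliberately naive suffix-stripping stemmer. It is a toy written for this example, not the algorithm any particular product uses; production search engines typically rely on more careful stemmers. The suffix list and function names are assumptions.

```python
import re

# Suffixes are checked in this order; the first match is stripped.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def stemmed_match(term, text):
    """True if any word in text shares a stem with the search term."""
    target = naive_stem(term)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(naive_stem(token) == target for token in tokens)

stems = {word: naive_stem(word) for word in ("brakes", "brake", "braking", "braked")}
print(stems)
print(stemmed_match("brakes", "He braked hard before the corner"))
```

All four forms reduce to the same stem, so a search for brakes also finds a document that only says braked. A stemmer this crude will also conflate unrelated words in real data, which is one reason stemming is usually offered as an option rather than always on.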
Attorney Reacher is pleased with the results he obtains from this search, but he soon realizes that he doesn’t see any of the actual maintenance records from his client’s garage. These were paper copies, so they were first scanned, then run through an OCR process to convert the text into a machine-readable form, before being loaded into Indica for searching. The problem with OCR text taken from a scanned document is that the OCR sometimes misreads or drops a letter or two, so an exact search for a word may not match it.
Attorney Reacher runs his search again, this time turning on the fuzzy search option. Fuzzy search allows a search term to match terms that are not identical to it but differ by a letter or two. This way, if the word brakes was misread as brokes or even blokes, it would still match the search term.
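One common way to measure "off by a letter or two" is edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below assumes that measure and a threshold of two; whether Indica uses this exact method is not stated in the source, and the function names are hypothetical.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(term, text, max_distance=2):
    """True if any word in text is within max_distance edits of term."""
    term = term.lower()
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(edit_distance(term, token) <= max_distance for token in tokens)

print(edit_distance("brakes", "brokes"))
print(edit_distance("brakes", "blokes"))
print(fuzzy_match("brakes", "rear brokes worn, replaced pads"))
```

The same tolerance that rescues OCR errors also lets in genuine but unrelated words (blokes is a real word), so fuzzy results generally need a closer look than exact matches.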
Finally, Attorney Reacher has a collection of documents that contains what he needs, and he is ready to prepare review sets for his team to begin reviewing. However, he decides to run one more type of search just to be sure. This search is called a concept search, and it uses statistics to match not just the exact keywords entered but also concepts that are similar to them. Concept search can also be used without keywords, by simply finding all the major concepts and how they are clustered within the set of documents.
This time Attorney Reacher runs a concept search using the keywords BMW 1150, brakes, accident, and maintenance. As he scrolls through the results he doesn’t see anything new, until he sees the word stoppies, which is unfamiliar to him. A little digging in the result set lets him discover that stoppies are a maneuver similar to wheelies that can result in damaged brakes. The documents containing this word reveal that the plaintiff frequently engaged in this dangerous behavior. Attorney Reacher now has the ammunition he needs to win his case, thanks to a concept he did not know existed in advance.
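How an unfamiliar term like stoppies can surface from statistics alone can be sketched with simple co-occurrence counting: tally which other words appear most often in documents that mention the seed keywords. This is a crude stand-in chosen for brevity, not the statistical model a real concept search engine uses, and the function name, stopword list, and sample messages are all invented for illustration.

```python
import re
from collections import Counter

# Small hypothetical stopword list so common words do not dominate the counts.
STOPWORDS = {"a", "an", "the", "and", "then", "all", "on", "did", "more", "now"}

def related_terms(seed_terms, documents, top_n=3):
    """Rank words by how many seed-matching documents they appear in."""
    seeds = {term.lower() for term in seed_terms}
    counts = Counter()
    for text in documents:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        if tokens & seeds:
            # Count each co-occurring word once per matching document.
            counts.update(tokens - seeds - STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

# Hypothetical mailing-list messages, invented for this sketch.
documents = [
    "Did stoppies all weekend, brakes feel soft now",
    "More stoppies today; front brakes squeal",
    "Stoppies again, then an accident scare",
    "Club lunch on Friday",
]

top_related = related_terms(["brakes", "accident"], documents, top_n=1)
print(top_related)
```

The searcher never typed stoppies, yet it ranks first because it keeps turning up in the same documents as the seed keywords. That is the essential appeal of concept search: it can point to vocabulary the reviewer did not know to ask for.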