Tuesday, December 13, 2011

Beware Of Dupes



http://ow.ly/7Yfu1

An article by Jim Shook posted on the emcsourceone.com website.

This article discusses eDiscovery techniques for removing duplicate files, a process commonly referred to as de-duplication.

The article states, "The temptation in eDiscovery is to quickly get rid of all of these extra copies – to de-duplicate the information so that we can spend less in collecting, processing and reviewing that information. We commonly see duplication rates of 15 to 30% depending upon the source data and the environment."

The article goes on to analyze the content of specific documents, breaking them into their various parts, to shed light on how the de-duplication process actually works. The author further comments, "Most frequently, de-duplication is performed by using the hash value or “fingerprint” of a file. (For a thorough discussion of hash values in eDiscovery, see Ralph Losey’s excellent e-Discovery Team blog entry and Ralph’s related Law Review article). The hash of a file is normally based upon its contents..." Mr. Shook provides a link to the referenced Law Review article by Ralph Losey.
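To make the hash-based approach concrete, here is a minimal sketch of grouping files by a hash of their contents. It is an illustration only, not the method of any particular eDiscovery tool: the function names (content_hash, group_duplicates), the sample file names, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex digest (the "fingerprint") of the raw file contents."""
    # SHA-256 is an assumption here; the article does not name an algorithm.
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents hash to the same value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[content_hash(data)].append(name)
    # Keep only groups with more than one member, i.e. actual duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample data: two byte-identical files and one near-copy.
sample = {
    "memo.txt": b"Quarterly results attached.",
    "copy of memo.txt": b"Quarterly results attached.",
    "memo_v2.txt": b"Quarterly results attached!",
}

duplicates = group_duplicates(sample)
print(duplicates)
```

In this sketch only the contents are hashed, so the two identically worded memos match despite having different names, while the version that differs by a single character does not. What goes into the hash decides what counts as a duplicate, which is why the article urges care.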

The author concludes with some wise advice: "So when you’re looking to save money with de-duplication, do your homework and be careful what you ask for. The same rule applies when the other side is providing you with de-duplicated information: understand what you are getting – and if it matters in your case."
