Monday, September 5, 2011

De-NISTing: De-FECTive




http://ow.ly/6lWgW

An article by Craig Ball, Esq. on his blog Ball in Your Court.

This article takes the National Institute of Standards and Technology (NIST) to task.  NIST publishes four updates a year to the NIST list, a set of hash values for known files that is maintained by the National Software Reference Library.  eDiscovery service providers use the list to remove system files and other common program and "noise" files that are unlikely to be relevant in typical litigation.  This process is referred to as De-NISTing.
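
To make the mechanics concrete, here is a minimal sketch of hash-based filtering in Python. It assumes the reference hashes have already been loaded into a set of uppercase SHA-1 hex strings (SHA-1 is one of the digests the NSRL publishes). The function names `sha1_of` and `de_nist` are illustrative only and are not taken from any particular eDiscovery tool.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def de_nist(paths, known_hashes):
    """Split paths into (kept, removed) by membership in known_hashes.

    known_hashes is assumed to be a set of uppercase SHA-1 hex strings.
    """
    kept, removed = [], []
    for path in paths:
        if sha1_of(path) in known_hashes:
            removed.append(path)  # known file: treated as noise
        else:
            kept.append(path)     # unknown file: stays in the review set
    return kept, removed
```

Any file whose hash is absent from the set stays in the review population, so the result is only as complete as the reference list itself.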

The author points out major flaws in the De-NISTing process, particularly as it pertains to the Windows 7 operating system, and provides some startling statistics: "I created a pristine install of Windows 7 on a sterile hard drive. The pristine install consisted of 47,690 files, and everything on the drive that wasn’t fashioned on the fly as part of the install process came straight off the Windows installation disk.
But, do you know how many of those 47,690 files were on the latest NIST list? Just 7,277! That’s right, the NIST list misses 85% of the files in a pristine Windows 7 installation."
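
The 85% figure follows directly from the two counts quoted above. As a quick check (the helper name `nist_miss_rate` is hypothetical and used only for illustration):

```python
def nist_miss_rate(total_files, matched_files):
    """Return (missed_count, missed_percent) for a reference hash list."""
    missed = total_files - matched_files
    return missed, 100 * missed / total_files


# Counts quoted from the article: 47,690 files installed, 7,277 on the list.
missed, missed_pct = nist_miss_rate(47_690, 7_277)
print(f"{missed:,} of 47,690 files missed ({missed_pct:.1f}%)")
# prints "40,413 of 47,690 files missed (84.7%)"
```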



The article goes on to state, "I did some exploring and found that one reason the NIST list missed so many noise files is because NIST hasn’t yet processed Windows 7 for addition to the list. More than 350 million machines run Windows 7, but apparently none at NIST. Arrrgh! What’s more, the NIST list doesn’t include the components of Microsoft Office 2010 either. Only 100 million machines run Office 2010."

Hence, eDiscovery service providers and those performing De-NISTing in-house need to be wary of this gap.  The writers of this blog have discussed this issue in the past and are troubled that NIST regularly seems to lag behind technology that is already in widespread public use.



4 comments:

  1. This is exactly what is wrong in the world of Service Providers offering electronic discovery processing services. First, the NIST list can be obtained for free. What would one expect for that price? I have not heard of anyone wanting to pay for this type of library; service providers don't want to pay for anything. Second, if there was truly an understanding of what is involved in "De-Nisting," this issue is easily overcome. All that needs to be done is to make a hash set or library of files that you want excluded and add those values to your comparative lists. Not a hard task, yet it is overwhelmingly clear that there is not a true understanding of what these lists contain. De-Nisting is not defective, since obviously the files identified in the NIST list are excluded when the exclusion process is run. It works. De-Nisting is customizable, and you get a result based on the effort you put into defining your comparative set. Service providers are the first ones to take something for free, try to make a dollar and then complain that it's not complete even though the process can be upgraded at no cost. Just a little time, understanding and knowledge will get you where you need to be.

    Michael Bean - Affirm Discovery

  2. Dear Michael:

    Thank you for the comment, and you are exactly right. SRM Legal does customize its own lists and doesn't rely on the NIST list alone; rather, it is used as a starting point. However, there are many providers out there using it as if it were an accurate and complete standard, and it is troubling that there is such a perception, especially since NIST, due to budget limitations as pointed out in Craig Ball's article, fails to provide updates based on the most current technologies.

    What I like about this article is that many individuals seem to perceive the NIST list as being a current standard, and it is certainly far from complete.

  3. In My Humble Opinion, the problem Craig points out is valid but points more to a general misperception by clients and, in some cases, an overrepresentation by some service providers of what the NIST list can be used for. First, anyone who knows how the NIST list is generated, what it was/is used for and how it impacts data reduction would know that you can't rely on it solely as some sort of magic wand to eliminate a high percentage of so-called system files. Second, our industry, with many crossing over from Forensics to eDiscovery, from eDiscovery to Forensics and from all other electronic services disciplines, contributes to this mess by stating as fact that DeNisting IS a magic wand. It is no more magic than deduplication or keyword filtering is when any one of them is used by itself. The fact that Windows 7 files are not denisted to the level one MIGHT expect is not huge, because honestly, if we are not doing some level of additional file and folder analysis after DeNisting, we are not serving our clients well. My $0.02

  4. Dear Rich:

    Thanks for the comment, and well said. I fully agree and thanks for the insight.
