Why do tools show different results?

Written by Amber Schroader

April 22, 2026

Since I started working in the DFIR space many years ago, I have always remembered the rule of two tools. That rule, although often stated, is not always followed by every examiner. With the rising cost of DFIR tools, many organizations fund only one commercial tool for their teams and rely on community tools to cross-check when needed. There are a lot of complications with that approach, but it does literally follow the rule of two tools. So, where do you go when your two tools show very different results?

Due to the nature of this post, I will not include screenshots of differing results from specific tools, so that no tool feels “picked on” in any way. The point is that you should validate your tools before using them, and if you see a difference in results, you should take additional steps to determine why.

Question: Why do we dual-tool in the first place?

Digital forensic tools and methods must be reliable, repeatable, and scientifically valid if the results may be used in court. Admissibility standards for digital evidence expect tools to produce repeatable and reproducible results. We see this outlined in the NIST guidelines as well as by legal authorities.

Think of the Daubert Standard as the court’s “BS detector” for science. Essentially, before a judge lets forensic evidence into a trial, they need to know it’s based on real science, not just a hunch or a fancy-looking piece of software.

What the Courts are Actually Checking

When a judge looks at a forensic method, they’re basically asking five questions:

  1. Can we test this? Is it a repeatable process or just a one-off?
  2. Has it been peer-reviewed? Have other experts poked holes in it and agreed it works?
  3. What’s the “Oops” factor? We need to know the known or potential error rate.
  4. Are there rules? Is there a standard way to use the tool, or is everyone just winging it?
  5. Is it “Normal”? Does the broader scientific community trust this thing?

For forensic software, this usually means having a paper trail of validation tests that prove the tool actually does what it claims to do. This is where you must avoid tools whose vendors say the method is proprietary and cannot be disclosed, as if it were “magic.” That does not work in the validation process and falls squarely into the “black box” risk.

What This Means for the Tools We Use

Here’s the thing: a forensic tool doesn’t have to be perfect. No software is. But it does have to be defensible.

As examiners, we need to prove that if we ran the same test tomorrow, we’d get the exact same result. We must be honest about what the tool can’t do and show that our findings came from a specific, documented method, not just a lucky guess.

This is exactly why we spend so much time on validation and why we’ll often use a second, different tool to double-check our work. It’s all about making sure the evidence stands up to scrutiny when the pressure is on. 
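As a rough illustration of what “the exact same result” can mean in practice, the sketch below fingerprints a set of parsed records so that two runs can be compared. The record fields (`id`, `body`) are hypothetical, and it assumes you have already exported each run's findings into plain records. Hashing the report file itself is often less useful, because many reports embed run-specific details such as a generation time.

```python
import hashlib
import json


def result_fingerprint(records):
    """Hash a list of parsed records in a way that ignores their order."""
    # Serialize each record with sorted keys so field order cannot change the hash.
    lines = sorted(json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records)
    digest = hashlib.sha256("\n".join(lines).encode("utf-8"))
    return digest.hexdigest()


# Hypothetical findings from the same tool run on two different days.
run_monday = [
    {"id": 1, "body": "On my way"},
    {"id": 2, "body": "See you soon"},
]
run_tuesday = [
    {"body": "See you soon", "id": 2},
    {"id": 1, "body": "On my way"},
]

same = result_fingerprint(run_monday) == result_fingerprint(run_tuesday)
print("repeatable" if same else "results differ, investigate")
```

If the fingerprints differ, that is not proof of a fault, only a prompt to find out which records changed and why.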

Question: How do we explain discrepancies in our reports?

This is where we go back to the point that tools are not perfect. Two tools that report a different number of recovered messages are the perfect reason to run a case through more than one tool.

The difference between the two can often be explained by the different research methods used by each company. When the results are very different, a third tool should be used to cross-validate the discrepancy.
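Before explaining a discrepancy, it helps to know exactly which records differ rather than only the totals. The sketch below is one possible way to do that, assuming both tools can export messages and that you have mapped them to the same three fields. The field names (`timestamp_utc`, `sender`, `body`) and the sample data are hypothetical.

```python
def normalize(record):
    """Reduce a message record to the fields both tools should agree on."""
    # Assumed field names; real exports will need their own mapping.
    return (
        record["timestamp_utc"].strip(),
        record["sender"].strip().lower(),
        " ".join(record["body"].split()),  # collapse whitespace differences
    )


def compare_tools(tool_a_records, tool_b_records):
    """Return the records both tools reported and those only one reported."""
    a = {normalize(r) for r in tool_a_records}
    b = {normalize(r) for r in tool_b_records}
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


tool_a = [
    {"timestamp_utc": "2026-01-05T10:00:00Z", "sender": "Alice", "body": "On my way"},
    {"timestamp_utc": "2026-01-05T10:02:00Z", "sender": "Bob", "body": "See you soon"},
]
tool_b = [
    {"timestamp_utc": "2026-01-05T10:00:00Z", "sender": "alice", "body": "On  my way"},
    {"timestamp_utc": "2026-01-05T10:05:00Z", "sender": "Bob", "body": "Running late"},
]

result = compare_tools(tool_a, tool_b)
print(len(result["both"]), len(result["only_a"]), len(result["only_b"]))
```

The records in `only_a` and `only_b` are the ones worth tracing back to the source data, and they are also the natural input for a third tool.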

Question: How far do we go to verify the results? Are we cross-matching databases, or comparing file system views?

The short answer is as far as it takes. Cross-reference and recheck the data, and look at databases with other tools designed just to review databases, until you understand why you see different results. That part of the process is on the examiner.

A lot of the parsing and recovery processes are proprietary to each tool, so results will often differ. One tool might do a great job at SQLite parsing and a poor job at plist parsing, or do better with one file system than another. Some tools will image a device well but then parse it poorly. Create a system that produces the best results, even if it involves extra steps and more than one tool.

Create a workflow instead of just a process with one tool. An example of a workflow would be as follows:

The suspect device is an Android phone. One tool can only perform an ADB backup of the device, while another can get root and acquire the device physically. Image with both tools and compare the results. The logical ADB backup might help guide you in parsing the physical image. Either way, understand the strengths and weaknesses of each tool so you can select which tool to use with each type of evidence. My example is with mobile devices, but this also rings true for file system evidence and artifacts.

Question: How do you verify what’s accurate?

This is a tough question, and the hardest one for people to understand and put into practice. To verify accuracy, you need to write test plans and validate the tool’s capabilities. This is not a short process by any means, and not everyone has the time to do it.

There are two types of test plans. The first checks whether the tool runs and functions within the parameters it was designed for; commercial vendors do this when they test a release for launch. The second is a validation plan: DFIR vendors providing commercial tools might also confirm that the data the tool processes matches their validation images. This is an additional step that does not exist in the rest of the software world. It requires designing and documenting a variety of test images based on what the tool supports. Some government organizations in the US and UK also do validation testing with specific images created to ensure that the data is seen by the tools being used.
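A validation check against a test image can be pictured as comparing what the tool reported with what you know was placed in the image. The sketch below is a minimal version of that comparison, assuming you have documented the image's contents as a list of identifiers. The identifiers and the `recovery_rate` measure are illustrative, not part of any standard.

```python
def score_against_ground_truth(expected_ids, reported_ids):
    """Compare a tool's output with the documented contents of a test image."""
    expected = set(expected_ids)
    reported = set(reported_ids)
    found = expected & reported
    return {
        "found": len(found),
        "missed": sorted(expected - reported),      # known data the tool did not show
        "unexpected": sorted(reported - expected),  # output that is not in the image notes
        "recovery_rate": len(found) / len(expected) if expected else None,
    }


# Hypothetical test image documented as containing four messages.
expected = ["msg-001", "msg-002", "msg-003", "msg-004"]
reported = ["msg-001", "msg-002", "msg-004", "msg-999"]

score = score_against_ground_truth(expected, reported)
print(score)
```

Both lists matter: `missed` shows what the tool cannot see, and `unexpected` flags output that needs an explanation before it is trusted.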

What to watch for is that a tool created by the community does not go through the same test plans and validation plans each time. This is a side effect of free tools: the burden falls on the end user not only to test for functionality, but also to validate for accuracy.

Question: What do you include in your reporting when results don’t match… without overcomplicating your report?

When I use two tools in a case, I often provide the results from each tool for every data point that was reviewed, and then a combined summary of the results. That means three reports in total represent the work that was performed. I also maintain notes that show which tools I used, their version numbers, and the reasons for choosing one tool over another based on the workflow design used in the lab.
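Notes like these are easier to keep consistent when every entry has the same fields. The sketch below is one possible layout, not a required format. The tool names, versions, and reasons are placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ToolUse:
    """One line of lab notes about a tool used on an evidence item."""
    tool: str
    version: str
    evidence_item: str
    purpose: str
    reason_chosen: str


# Placeholder entries; real notes would name the actual tools and versions.
notes = [
    ToolUse("Tool A", "1.2.3", "Item 001", "acquisition",
            "supports physical acquisition of this model"),
    ToolUse("Tool B", "4.5", "Item 001", "message parsing",
            "validated in the lab against SQLite test images"),
]

notes_json = json.dumps([asdict(n) for n in notes], indent=2)
print(notes_json)
```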

In the end, choosing to use multiple tools leads not only to better overall results, but also to a fair amount of extra paperwork that needs to be maintained in your lab processes. I have seen tools do everything from failing to produce hundreds of messages that another tool did produce, to adding data to evidence and leaving it there. Knowing both how a tool works and how accurate its output is remains the key to doing the best possible investigation, and it is well worth the extra paperwork and workflow involved.

 
