February 19, 2024

How Did My Audit Go? - A Framework for Evaluating Audit Effectiveness

In the rapidly evolving space of Web3 & blockchain technology, the importance of robust security measures has never been more critical. As blockchain systems become increasingly complex and integral to various sectors, the role of thorough and effective security reviews is paramount.

Right now the process of choosing and recommending audit firms in the blockchain space is opaque. Sticking to the values of Web3, we want to demystify the selection process and introduce quantifiable metrics for audit effectiveness, thereby enhancing trust and reliability within the blockchain ecosystem.

In this article we want to shed some light on the current state of the Web3 auditing space and propose a framework for evaluating audits.

Current State

Right now, protocol teams looking for a security review have to navigate a labyrinth of providers without an effective way to evaluate their performance against each other. This leads to an opaque, recommendation-based decision process in a space that aims for transparency, decentralization and fact-based decision-making. There must be a better way, right? What if we could compare these audit providers based on a set of fair metrics instead?

We believe that a Sherlock Contest is the most thorough audit one can get. This is why we worked on a framework to measure audit effectiveness.

Challenges in Evaluating Audits

Of course, there are several challenges when it comes to evaluating different security reviews. Every codebase is different, so there is not much use in comparing audits of completely different protocols.

So we have to stick to protocols that received multiple security reviews, but even then many other factors come into play. The best-case scenario for a comparison would be if both audits had the same:

  • severity criteria
  • start date
  • scope
  • commit hash
  • duration & cost

Unfortunately, this dream scenario rarely occurs, which is why we need a way to handle situations where one or more of these factors differ.
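To make these factors concrete, here is a minimal sketch, in Python, of the metadata the framework would need to record for each review. The field names are purely illustrative assumptions, not something prescribed by the framework itself.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class AuditReport:
        """Hypothetical record of a single security review, capturing the
        comparison factors listed above (all field names are illustrative)."""
        provider: str
        severity_criteria: str        # e.g. the provider's own judging standard
        start_date: date
        scope: set[str]               # files/contracts that were in scope
        commit_hash: str              # commit the review was performed against
        duration_weeks: float
        cost_usd: float
        findings: list[dict] = field(default_factory=list)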

1. Different Severity Criteria

It is not uncommon in our space for the criteria used to decide the severity of an issue to differ between audit providers. This ranges from the available categories themselves (INFO, LOW, MEDIUM, HIGH, CRITICAL, etc.) to the rules for deciding which category an issue falls into.

To mitigate this we have to choose a consistent severity standard and re-judge all issues according to it.
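As an illustration only (this is not Sherlock's actual judging procedure), the first step of such re-judging could be translating each provider's native labels onto one shared scale before manually re-judging each issue against the chosen standard's rules. The mapping below is a hypothetical Python sketch.

    # Hypothetical translation of provider-specific severity labels onto one
    # shared scale. Real re-judging would additionally apply the chosen
    # standard's rules to each issue by hand, not just rename the label.
    LABEL_MAP = {
        "informational": "info",
        "info": "info",
        "low": "low",
        "medium": "medium",
        "high": "high",
        "critical": "high",  # assumption: fold CRITICAL into HIGH if the
                             # chosen standard has no critical tier
    }

    def normalize_severity(raw_label: str) -> str:
        """Map a provider-specific severity label to the common scale."""
        label = raw_label.strip().lower()
        if label not in LABEL_MAP:
            raise ValueError(f"unknown severity label: {raw_label!r}")
        return LABEL_MAP[label]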

2. Different Start Date

If the security reviews did not start at the same time, we must evaluate them sequentially. This is important because the audit that started later probably already knows about the issues reported by the one that started earlier.

3. Different Scope

Sometimes the scope of the security reviews differs. In that case, the evaluation has to be restricted to the overlapping scope: the framework ignores findings that relate to a part of the codebase that was not in scope for the preceding or following audit.
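A minimal sketch of that restriction, assuming each finding records the file it relates to (the field name is an assumption):

    def shared_scope(scope_a: set[str], scope_b: set[str]) -> set[str]:
        """Files that were in scope for both security reviews."""
        return scope_a & scope_b

    def findings_in_shared_scope(findings: list[dict], shared: set[str]) -> list[dict]:
        """Drop findings that touch code only one of the two audits reviewed."""
        return [f for f in findings if f["file"] in shared]

A finding in a contract that only one of the two audits was asked to review would therefore not count for or against the other.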

4. Different Commit Hash

As with the start date, it is quite common for protocol teams to conduct audits one after another and apply fixes to the reported issues in between. This naturally means the code under review differs between the audits. For an effective and fair assessment we have to identify the overlapping lines of code that are still identical for both reviews. This lets us separate issues that were already present during the earlier audit (and therefore missed) from issues introduced by new lines of code (which the earlier reviewer could not have found). The former are a good indicator of the effectiveness of the earlier audit.
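One way to approximate those overlapping lines is a plain line-level diff between the two commits: a finding reported by the later audit whose affected lines already existed unchanged at the earlier commit was, in principle, findable back then. A rough sketch using Python's standard difflib (the finding's "lines" field is an assumption):

    import difflib

    def lines_already_present(old_source: str, new_source: str) -> set[int]:
        """Line numbers in the NEW file whose content is unchanged from the
        version the earlier audit reviewed."""
        matcher = difflib.SequenceMatcher(
            None, old_source.splitlines(), new_source.splitlines()
        )
        stable = set()
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                stable.update(range(j1 + 1, j2 + 1))  # 1-based line numbers
        return stable

    def missed_by_earlier_review(finding: dict, old_source: str, new_source: str) -> bool:
        """A later finding counts as missed by the earlier audit only if every
        line it points at already existed, unchanged, in the earlier commit."""
        stable = lines_already_present(old_source, new_source)
        return all(line in stable for line in finding["lines"])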

5. Different Cost & Length

When it comes to the length and cost of an audit, we have to make some assumptions. Different kinds of audits naturally have different pricing models, but it is fair to assume that every provider is a professional and will not underscope the review to avoid responsibility.

Still, these factors can be very important for protocol teams in terms of budget and deadlines.

Evaluating Multiple Consecutive Audits

When comparing multiple audits that have been performed sequentially, another problem arises. Using the described mechanics, every audit can only be evaluated against the ones that follow it.

Example:

Imagine a protocol had 5 separate audits.

  • Audit #1 can be evaluated against all of the following: #2, #3, #4, #5
  • Audit #4 can only be evaluated against Audit #5
  • The last audit (Audit #5) cannot be fully evaluated

A possible solution for this could be to only take into account the overlapping lines of code for consecutive audits.
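Putting these pieces together, a sketch of the sequential, pairwise evaluation could look like the following. The findable_by predicate stands in for the scope and commit-overlap checks sketched above, and all field names are assumptions:

    def findable_by(finding: dict, earlier: dict, later: dict) -> bool:
        """Placeholder for the overlap checks: the finding must sit in code
        that was in scope for both audits and already present (unchanged)
        when the earlier audit took place."""
        in_both_scopes = finding["file"] in (set(earlier["scope"]) & set(later["scope"]))
        return in_both_scopes and finding.get("already_present", False)

    def evaluate_sequential(audits: list[dict]) -> dict[str, int]:
        """For each audit, count issues that later audits reported on code the
        earlier audit had already reviewed, i.e. issues it arguably missed."""
        ordered = sorted(audits, key=lambda a: a["start_date"])
        missed = {a["name"]: 0 for a in ordered}
        for i, earlier in enumerate(ordered):
            for later in ordered[i + 1:]:
                for finding in later["findings"]:
                    if findable_by(finding, earlier, later):
                        missed[earlier["name"]] += 1
        return missed

Note that the last audit in the sequence ends up with a count of zero by construction; that mirrors the limitation described above and is not evidence of a flawless review.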

In Conclusion

Implementing a standardized framework to evaluate audit effectiveness represents a significant step towards transparency and quantifiable security in the blockchain space. By adjusting for timing, scope, code commits, severity criteria, and cost differences, we can fairly compare audits on a level playing field. This data-driven approach aligns with the ethos of Web3 - basing decisions on facts rather than opaque reputation.

While challenges remain, this framework aims to kickstart a meaningful conversation on objectively measuring audit quality. Sherlock believes that delivering robust evidence of superior methodology is essential for auditors and protocol teams. Only through transparency and continuous improvement can we elevate the calibre of smart contract security to match the increasing complexity of blockchain systems.

The stakes have never been higher. As adoption accelerates, the need for bulletproof auditing reaches new urgency. By collectively building better frameworks, we can empower teams to confidently launch, knowing their code received the most rigorous inspection. The result will be more secure, resilient and innovative blockchain applications benefitting users across industries.

There is still much work to be done, but Sherlock is committed to driving progress through research and open collaboration. 

If you would like to learn more about this topic, you can also watch the talk that Jack Sanford, one of Sherlock’s co-founders, gave at TrustX 2023.

This is part 1 of our series of blog posts on audit evaluation. Next up, we want to apply these techniques to an actual series of audits.