Code Comprehension and its role in code review

In the previous blog post, we covered what Code Comprehension is and discussed some examples of its presence in the software development life cycle. Now we turn to the process of code review: its motivations, its challenges, and how code comprehension research can be useful.

Modern Code Review

Peer code review is a widespread software engineering practice that is performed with different motivations. Usually, it is used to find defects in the code but can also be used for other purposes, such as knowledge transfer and refactoring.

It can be performed at many stages of a software project and in many forms. For example, reviewers can choose to review just the piece of code they are interested in or do a full source code review. They can wait until the software or some component is complete, or review continuously, at every commit or pull request.

Bacchelli and Bird’s paper [1] describes modern code review as “(1) informal (in contrast to Fagan-style), (2) tool-based, and that (3) occurs regularly in practice nowadays, for example at companies such as Microsoft, Google, Facebook, and in other companies and OSS projects”. This kind of review is commonly practiced nowadays, whether by the developers themselves or by quality assurance or security professionals. Once a new change is committed to the source repository, it becomes available for review. This process is usually supported by a tool that displays the changes in a readable format and allows the reviewer to enter comments.

Motivations and obstacles in Code Review

In the article “Expectations, Outcomes, and Challenges Of Modern Code Review” [1], the authors conducted a study using the analysis of previous studies, observations, interviews, and questionnaires. Developers, code reviewers, and managers provided some data that helped the authors to map the motivations, expectations, and challenges of modern code review.

According to this study, the most prominent motivations for code review are:

  • Finding Defects – to find bugs;
  • Code Improvement – to improve code consistency, readability, etc.;
  • Alternative Solutions – to find a better implementation;
  • Knowledge Transfer – for learning purposes;
  • Team Awareness and Transparency – to make the team aware of code evolution and make the code changes transparent;
  • Share Code Ownership – related to “Team Awareness and Transparency”, but with the connotation of collaboration.

The results showed that expectations are not always met. Comments about bugs (finding bugs was one of the main motivations) were few and superficial. Developers and managers expected more in-depth comments on conceptual and design issues.

During this study, many comments appeared regarding the challenges in code review. The main factor in an effective code review is understanding. Even when the changes to a codebase are small, the reviewer needs to read other snippets of code to get all the context necessary to understand what is going on. It takes time to understand code that you’re not familiar with, but once that understanding is acquired, we believe the review may be more productive, both in the time spent on the process and in the quality of the feedback.

Another study [6] investigated issues related to understanding code changes. The authors wanted to know in which scenarios code understanding is required, how often this happens, what information is needed, and what could be done to make understanding code changes more effective and efficient.

The paper lists the following scenarios in which code understanding is required:

  • Reviewing others’ changes;
  • Fixing bugs;
  • Developing new features;
  • Reviewing your own changes;
  • Writing/updating test cases;
  • Refactoring;
  • Resolving merge conflicts.

As for frequency, the study showed that most participants needed to understand code changes daily.

According to the data collected, the three most needed pieces of information for understanding code changes are:

  • Rationale – What is the rationale behind this code change?
  • Correctness – Is this change correct? Does it work as expected?
  • Risk – Does this change break any code elsewhere? How?

The study also collected which information is the most difficult to acquire. The top three are:

  • Consistency – Are there any other places that need similar changes?
  • Risk – Does this change break any code elsewhere? How?
  • Behavior – How does this change alter the program’s dynamic behavior?

As for what could be done to make understanding code changes more effective and efficient, the authors mapped two approaches:

  • Determining a change’s risk – According to most participants, testing (e.g., unit tests) and code review are the two main practices for determining a change’s risk. Two features were proposed to improve these practices: 1) a feature that detects the code and the test cases affected by the change and notifies the tester about the need for a retest; 2) a feature that performs static analysis (e.g., go to definition) on the diff view;
  • Decomposing a change – Sometimes a change is big and affects many files; other times a change implements several features or bug fixes that are not necessarily related. This makes code understanding harder. To mitigate this, a feature was proposed to decompose a change into smaller ones grouped by their relationship.
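
To make the idea of decomposing a change more concrete, here is a minimal sketch in Python. It is not the technique proposed in the papers cited here: it simply assumes that a list of "related" file pairs is already available (for example, because one file imports the other) and groups the changed files into connected components. The function name, the file names, and the relation list are all hypothetical.

```python
from collections import defaultdict


def decompose_change(changed_files, related_pairs):
    """Group changed files so that related files end up in the same group.

    `related_pairs` is an assumed input: pairs of files that some earlier
    analysis considers related (e.g., one imports the other).
    """
    parent = {f: f for f in changed_files}

    def find(f):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in related_pairs:
        # Ignore relations that point outside the change under review.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in changed_files:
        groups[find(f)].append(f)
    # Sort files and groups so the result is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical commit that mixes two unrelated fixes and a docs update.
changed = [
    "auth/login.py",
    "auth/session.py",
    "docs/README.md",
    "billing/invoice.py",
    "billing/tax.py",
]
related = [
    ("auth/login.py", "auth/session.py"),
    ("billing/invoice.py", "billing/tax.py"),
]

groups = decompose_change(changed, related)
for number, group in enumerate(groups, start=1):
    print(f"Sub-change {number}: {group}")
```

With this input, the sketch splits the commit into three sub-changes (authentication, billing, and documentation) that could be reviewed one at a time. A real tool would need a much richer notion of relationship than a hand-written list of pairs.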

Challenges

From both papers [1] and [6], we can extract comments that illustrate some of the challenges faced by those doing code review:

“the most difficult thing when doing a code review is understanding the reason of the change” [1]

“big-picture impact analysis requires contextual understanding. When reviewing a small, unfamiliar change, it is often necessary to read through much more code than that being reviewed.” [1]

“It takes a lot longer to understand unknown code, but even then understanding isn’t very deep.” [1]

“I usually would need to manually find portions of the code which are using changed portion and figure out how this change affects callers. Sometimes it’s not obvious from the code itself, I have to actually step through the code with debugger to understand it.” [6]

“It is hard to evaluate impacts on other components, unless there is clear interface between this component and others. Very frequently, other components have some assumptions on this component, while these assumptions are not documented.” [6]

Code review is a time-consuming task that requires some context about the code being reviewed. Even for small code changes, the reviewer usually needs to read more code, often across many files. Sometimes a code change is big and bundles unrelated modifications (e.g., different bug fixes in the same commit). Besides the main challenges extracted from the papers, we would add lack of documentation, unclear commit messages, and inconsistent code style as other obstacles to a more productive code review.

How does Code Comprehension help?

As we saw in the previous section, code understanding is considered the most important aspect of successful code reviews. When we consider the modern code review process, it mostly means context and change understanding. In a recent study [3] about why security defects are not detected by code reviews, the researchers discovered that vulnerabilities that do not require the understanding of many lines of the code context have a higher chance of being detected, while vulnerabilities that require the understanding of a larger code context have higher chances of remaining undetected.

As described in the first post of this series [7], modern integrated development environments (IDEs) already incorporate and provide a lot of code comprehension supporting functionalities, such as call hierarchy views, code browsing, find all references, go to definition and split view. This kind of tool sometimes supports code review activities and thus such functionalities can be used to improve code understanding. However, it is common for code reviews to be conducted in Web tools that may lack such functionalities and provide just a syntax highlighted diff of the code change, which could make it harder for the reviewer to achieve context and change understanding.
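
As a rough illustration of what a "find all references" feature does for a reviewer, the sketch below uses Python's standard `ast` module to list the lines where a changed function is called. It is only an approximation under stated assumptions: it matches calls by name, so a method with the same name on an unrelated object is also reported, and the sample source and function names are made up for the example.

```python
import ast


def find_call_sites(source, function_name):
    """Return the sorted line numbers where `function_name` is called.

    Matching is by name only, so this over-approximates: any call written
    as name(...) or obj.name(...) is reported.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                called = func.id  # plain call: name(...)
            else:
                # attribute call: obj.name(...); None for anything else
                called = getattr(func, "attr", None)
            if called == function_name:
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical module in which `validate` was changed by the commit.
SAMPLE = '''\
def validate(token):
    return bool(token)

def login(token):
    if validate(token):
        return "ok"

def refresh(session):
    return session.validate(session.token)
'''

call_sites = find_call_sites(SAMPLE, "validate")
print(call_sites)
```

For the sample above, the sketch reports lines 5 and 9: the direct call inside `login` and the method call inside `refresh`. The second one may or may not be the same function, which is exactly the kind of question a reviewer has to answer by reading more context.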

Other research goes one step further and explores more advanced topics, such as decomposition of changesets for code review [2], the relation of mental load and working memory capacity with code review effectiveness [4], the influence of the ordering of the code changes presented to the reviewer [4] and reducing the cognitive load of the reviewer [5].

Taking advantage of it as a reviewer

One piece of advice for reviewers is to give preference to tools that support at least basic code comprehension functionalities. This will help in the process of understanding the code context and how the changes impact the existing code base. This is even more important in situations where the reviewer has little to no previous knowledge about the code involved in the changes, as is the case with developers reviewing code for a component they have not been working on or third-party security professionals reviewing code changes for vulnerabilities, for example.

Another point is how to preserve the knowledge obtained during a code review and make it accessible to subsequent reviews. Nowadays it is common for reviewers to leave comments only on the code review screens, but could that be improved? What could make it easier for the reviewer to gain an understanding of some code changes? That is some food for thought.

Conclusion

In this post, we have covered the modern code review process, its challenges, and key requirements for successful reviews. We saw some research [1, 6] that describes code understanding as being the most important element in code reviews. We also saw how Code Comprehension studies relate to this area.

This is the second blog post in a series about Code Comprehension. Stay tuned for more content on the topic.

References

  1. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ICSE202013-codereview.pdf
  2. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/barnett2015hdh.pdf
  3. http://amiangshu.com/papers/paul-ICSE-2021.pdf
  4. http://tobias-baum.de/rp/memoryCodeOrderAndReview.pdf
  5. https://www.repo.uni-hannover.de/bitstream/handle/123456789/9217/diss.pdf?sequence=1&isAllowed=y
  6. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.261.2263&rep=rep1&type=pdf
  7. https://blog.convisoappsec.com/en/code-comprehension-what-is-it/

Authors
Gabriel Quadros – Senior Information Security Analyst
Ricardo Silva – Information Security Analyst
