Before discussing Code Comprehension, it is worth saying a few words about Software Engineering. Several definitions help frame this important area of Computing. A well-known one comes from professor and software engineer Barry Boehm, who defines Software Engineering as:
“The practical application of scientific knowledge in the design and construction of computer programs and the associated documentation required to develop, operate, and maintain them.”
We are increasingly connected and live in a world where software controls critical systems in hospitals, transportation, finance, power plants, and more, so this discipline has become essential not only for Computing but also for society. Work on software does not end when it is released. Software evolves over time: features are added or removed according to users’ needs, and bugs appear that cause operational or even security issues. For these reasons, software needs to be properly maintained. In large codebases where many people write code or are responsible for components, a basic understanding of the entire ecosystem is essential for maintenance.
The field that studies techniques for extracting information that helps people understand software is known as “Code Comprehension” or “Program Comprehension”. It is important enough to have a dedicated conference where researchers present their work on code comprehension, including the human activities involved and the tools that might aid the process. The IEEE/ACM International Conference on Program Comprehension (ICPC) describes itself as “a quality forum for researchers and practitioners from academia, industry, and government to present and to discuss state-of-the-art results and best practices in the field of program comprehension”.
The main goal of Code Comprehension is to extract information that can be used to understand certain aspects of a system. Some of these aspects, as described in the literature, are:
- the structure (components and their interrelationships)
- the functionality (what operations are performed on what components)
- the dynamic behavior (how input is transformed to output)
- the rationale (how the design process went and what decisions were made)
- the construction, modules, documentation, and test suites
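To make the first aspect (structure) a little more concrete, here is a minimal sketch of how structural information could be pulled out of source code automatically. It uses Python's standard `ast` module. The `SAMPLE` program and the `summarize_structure` helper are hypothetical names made up for this illustration, and real comprehension tools go much further than this.

```python
import ast

# Hypothetical program to analyze; any Python source string would do.
SAMPLE = '''
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

def describe(shape):
    return f"area={shape.area():.2f}"
'''

def summarize_structure(source: str) -> dict:
    """Return the imports, classes (with their methods), and top-level functions."""
    tree = ast.parse(source)
    summary = {"imports": [], "classes": {}, "functions": []}
    for node in tree.body:  # only top-level statements
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            summary["imports"].append(node.module or "")
        elif isinstance(node, ast.ClassDef):
            summary["classes"][node.name] = [
                item.name for item in node.body if isinstance(item, ast.FunctionDef)
            ]
        elif isinstance(node, ast.FunctionDef):
            summary["functions"].append(node.name)
    return summary

if __name__ == "__main__":
    print(summarize_structure(SAMPLE))
```

A summary like this only answers "what components exist and where do they live". The other aspects, such as dynamic behavior or rationale, usually need different sources of information (execution traces, documentation, commit history).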
Classic Code Comprehension models can be classified as top-down, bottom-up, and integrated.
Top-down models are used when the programmer knows the program’s domain. The programmer starts with a general hypothesis about how the program works and refines it while digging into the details, until reaching a low-level understanding of how the code works. During this refinement, the programmer looks for beacons: pieces of information that are expected in certain situations and that serve as cues for reconstructing that understanding (e.g., a variable identifier that hints at what the analyzed code does).
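As a rough illustration of the identifier example, the sketch below collects the names used in a piece of code and splits them into words, which is one simple way to surface candidate beacons. It is only a heuristic under our own assumptions, not a method taken from the comprehension literature. `SAMPLE`, `WORD`, and `identifier_words` are hypothetical names introduced for this example.

```python
import ast
import re
from collections import Counter

# Hypothetical code under analysis: the swap inside a loop hints at sorting.
SAMPLE = '''
def sort_items(items):
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items
'''

# Splits snake_case and camelCase identifiers into their component words.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")

def identifier_words(source: str) -> Counter:
    """Count the words that appear in function, class, argument, and variable names."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Name):
            names.append(node.id)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
    return Counter(word.lower() for name in names for word in WORD.findall(name))

if __name__ == "__main__":
    print(identifier_words(SAMPLE).most_common(5))
```

Words such as `sort` and `swapped` are the kind of hint a reader with a sorting hypothesis in mind would look for to confirm it.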
When knowledge about the domain is insufficient, bottom-up models are used. In these cases, the programmer needs to examine the code in detail to build hypotheses about the general purpose of the program. This bottom-up approach is supported by the creation of chunks. Chunking is the process of grouping statements and structures that belong together semantically.
The integrated model applies when top-down and bottom-up are used together. A programmer who knows a program’s domain can formulate a hypothesis about its purpose and, on finding parts that go beyond that knowledge, switch to the bottom-up approach for those parts.
While there are tools built specifically for code comprehension, modern integrated development environments (IDEs) already provide many features that support it. Some of them are described in the paper by Fekete and Porkoláb:
- control-flow views (e.g., call graph);
- code browsing;
- references finder;
- definitions finder;
- code completion;
- multiple files view;
- diagrams (e.g., UML).
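To give a feel for what sits behind features such as the definitions finder, the references finder, and the call graph, here is a minimal sketch built on Python's standard `ast` module. It is a simplification under our own assumptions and not how any particular IDE implements these features. `SAMPLE`, `find_definitions`, `find_references`, and `build_call_graph` are hypothetical names used only here.

```python
import ast

# Hypothetical program to analyze.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    words = parse(load(path))
    print(len(words))
'''

def find_definitions(tree: ast.AST) -> dict:
    """Map each function name to the line where it is defined."""
    return {
        node.name: node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }

def find_references(tree: ast.AST, name: str) -> list:
    """Return the sorted line numbers where `name` is used as a plain identifier."""
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == name
    )

def build_call_graph(tree: ast.AST) -> dict:
    """Map each function to the sorted names it calls directly (method calls are ignored)."""
    graph = {}
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        callees = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callees.add(node.func.id)
        graph[func.name] = sorted(callees)
    return graph

if __name__ == "__main__":
    tree = ast.parse(SAMPLE)
    print(find_definitions(tree))
    print(find_references(tree, "parse"))
    print(build_call_graph(tree))
```

This sketch works on a single file and matches names purely by text, so it cannot tell apart two functions with the same name in different scopes. Production tools resolve symbols across files and scopes, which is where most of the complexity lies.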
Besides these common features, some modern IDEs also support plugins that third-party developers can write to meet specific needs.
The images below show some of these functionalities implemented in a popular IDE (VSCode):
Figure: Peek and navigate to definition
Figure: Go to definition
In this post, we have covered the Code Comprehension topic as a research field within the Software Engineering area. It is an interesting and important topic that plays a big role in software development and assessment activities.
This is the first blog post in a series about Code Comprehension. Stay tuned for more content about the topic, its challenges, and its relation to secure code reviews.
Gabriel Quadros – Senior Information Security Analyst
Ricardo Silva – Information Security Analyst