Tips & Tricks for API Pentest

In order to enable communication between different platforms, the use of APIs (Application Programming Interface) is becoming increasingly common in all types of environments, especially in contexts where the expansion of infrastructure is a relevant factor.

You can also listen to the audio version of this article:

Thus, the need to guarantee the security of the data involved in these interactions and, consequently, the demand for security services focused on these components follow this growth curve.

With that in mind, the purpose of this article is to address some specific issues that are frequently observed in analyzes that have this focus, seeking to provide an overview of some errors and/or techniques that generate constant results in this kind of API pentest service, primarily, demonstrating some actions that can be added to an existing methodology for optimizing the results, reproducing in a test API some real cases already verified previously.

Fuzzing

Fuzzing is one step in the active recon process for the vast majority of web pentests, this reality is no different when the target is an API. The purpose of this topic is to introduce some techniques that can be added to a base methodology, if they are not already present, which can make a big difference in discovering valid endpoints during the assessment.

Context

First, the wordlist used to fuzz must be carefully chosen or elaborated, taking into account as many contextual factors as possible. For example, the language in which the already known endpoints are named, its formats and patterns. To illustrate contextual fuzzing, when looking at a request like the following:

We can see that its name contains the prefix “server_”. That way, we can assume that other API endpoints use this same nomenclature (“server_” prefix followed by an English word) and perform directed fuzzing using a wordlist with relevant English words prefixed by “server_”, in search for other valid endpoints invisible to the user, thus discovering the endpoint “/api/v1/server_access” in this case, which returns an access token:

Depth

The fuzzing depth can be the difference between finding or not hidden features in the target environment. In this context, depth refers to the server’s directories. Some important points need to be observed so that the results are always optimized in everything that is within the analyst’s reach. In terms of depth, the first step is to fuzz all observed directories, therefore, upon seeing a request like this:

We notice at least two additional directories besides the root, “/api/” and “/api/v1/”, fuzzing these directories as well could bring good results. With this in mind, it is recommended to execute this process in all directories observed throughout the test, not limited to the server root or default ones. This way, we can guarantee that our content discovery is applied in a systemic way to all directories we already know.

However, valid directories that do not appear in the API’s documentation or when using a component that consumes its resources (web application, mobile app, and so on) will not necessarily be found that way. Therefore, the fuzzing depth range must also be taken into account. That is, fuzzing not only endpoints in the known directories but also directories themselves. For this, we must use at least double depth when fuzzing for this purpose (at least two segments of the URL, for example, https://example.com/api/v1/fuzz1/fuzz2). The difference between recursive and multi-depth fuzzing is that recursive fuzzing relies on valid directory enumeration to discover new endpoints, while multi-depth fuzzing sacrifices resources for more assertiveness and efficiency. To illustrate this case, the test API has the endpoint “ /api/v1/configuration/certificate”, which returns an SSL certificate formatted in JSON. However, when sending a request to the “/api/v1/configuration” directory, the server responds with “404 File not found”:

That is, standard and recursive fuzzing will not discover valid endpoints of this directory, as its behavior will generate a false negative. Multi-depth fuzzing, on the other hand, is capable of discovering these hidden endpoints by requesting them directly, eliminating the dependency on valid directory enumeration.

HTTP Verbs

HTTP methods or verbs can also be the difference between discovering valid endpoints present on the server or not. The reason for this is that in many cases, some features only respond to requests that use specific methods according to their implementation, returning for example “404” when receiving a non-standard request. To illustrate a case like this, the test API has the endpoint “/api/v1/active”, which only responds to “POST” requests and returns “404” upon receiving any other method:

Therefore, it is recommended that we also use different methods when fuzzing, such as “PUT”, “POST”, “PATCH”, “DELETE”, etc., expanding the results even further.

Authorization

Access control vulnerabilities are among the most relevant in API tests, and the “Insecure direct object references” (IDOR) type is quite common and often leads to large impacts in certain contexts. This type of vulnerability arises when a service uses user-supplied data to access objects directly. There are several ways to identify and exploit this type of vulnerability and many things must be considered so that the analysis produces the best result in each case. In this article, the focus is to demonstrate some specific ways to interact directly with objects, commonly seen in API assessments.In the test API, the endpoint “/api/v1/users/me” returns information about the logged in user:

In this example, the API returns an object that represents the authenticated user who submitted the request. One of the first access control validations that can be done is to send a request directly to the endpoint “/api/v1/users”, in order to access all objects referring to users:

In this case, the API returns all its objects, and each of them has a unique ID, different from the numeric ID seen earlier. What can be done next is to test direct access to an object, referencing it by its ID in the request path, instead of the word “me”:

Then, we can try to change its data by sending a “PUT” request containing the object in its body, including the desired changes:

In some cases, it is also possible to insert new arbitrary data into the object, changing its default format, which can cause a variety of unpredictable impacts:

Often, the server implements a certain level of control, preventing direct access to objects referenced in the request path, but direct access referenced in other ways is not correctly evaluated. For example, when referencing the object in the “path” of the request, the API returns “401 Unauthorized”:

However, when sending a “PUT” request to the endpoint “/api/v1/users”, containing the object in the request body, including the ID that references it, the implemented access control does not prevent its modification:

Resource Consumption

A very common type of vulnerability in API testing is excessive resource consumption. These vulnerabilities usually arise when the user manages to make the amount of data processed by the API much larger than expected in its regular usage, introducing the possibility of exhausting server resources in certain situations. There have been cases in which a single “malicious” request was enough to knock services down, making it unavailable to legitimate users, which can have a major impact in some cases.

Vulnerabilities like this can present themselves in different contexts and forms, it often requires attention and critical thinking to identify them. This topic will cover some examples of this weakness, which is constantly exploited.

Often, pagination mechanisms can introduce this flaw. The purpose of pagination is precisely to separate the content from the response into several pages, making each request obtain results referring to a page with a pre-defined number of “records”. For example, assuming there are 100 records on the endpoint, a limit of 20 records per page can be set. In this way, the content will be divided into 5 pages, each containing 20 items. Thus, the amount of data returned per request is greatly reduced and the rest of the data will be gradually requested as the user accesses the following pages.

In this scenario, the problem arises when the user manages to control these values arbitrarily, even though this possibility is not directly available on the front end. Normally, this can be done by sending a parameter in the request containing the number of items returned per page, which is often present by default in libraries or frameworks used to implement this functionality.

The parameter can have a variety of names such as “limit”, “max”, “per_page”, “page_size” among others, often being found through fuzzing, when there is no knowledge of the technology used in the backend.
To illustrate this scenario, the endpoint “/api/v1/get_items” was created in the test API, containing a total of 10,000 items and using a pagination mechanism that applies a limit of 20 items returned per request:

However, when the “limit” parameter is sent in the request, its value is interpreted and used by the backend to define the number of items returned per page:

Because of this, upon discovering this parameter’s existence, the user is able to arbitrarily increase the amount of data processed and returned by the API, being able to “dump” all records in a single request if there are no controls implemented to avoid it. Depending on the amount of data accessed through this endpoint, the result of malicious exploitation of this behavior can cause total service unavailability, even in attacks with extremely limited use of resources.

This possibility is often unknown by API developers, usually because it is a standard feature of the third-party solution/tool used, whose documentation was not sufficiently evaluated during development.

The vulnerability can also arise from logical flaws in the implementation of server-side code. To illustrate some examples, two real scenarios found in some pentests I ran were reproduced. In the first scenario, the application sent a search filter on the request, through the “filter” parameter. The filter had the following format: search=operator:value. The field represented by “search” refers to what is being searched, the field “operator” refers to a comparison operator, and the field “value” represents the content of the search, for example: ?id= eq:1

In the example above, the backend will search for the item with an ID equal to 1, returning it in the response. This type of approach is used to make the API’s search functionality more flexible, enabling more specific content filtering. However, when this flexibility is under free influence of the end-user, without implementing controls, several possibilities can be introduced, including processing a large amount of data. In this same example, the user could change the filter sent by the application in order to maximize the number of results returned, one of the ways to do it would be “?id=ne:0”:

With the above filter, the API will search for items whose ID is different from 0 (ne = not equal), causing it to process and send all items in the database. The details of exploitations like this are usually contextual, requiring a case-by-case analysis, however, they end up following the same logic.

In the second scenario, the API includes a functionality to correlate keywords with objects, that is, upon receiving a predefined keyword in a parameter of the search request, the API returns the entirety of the object referenced by it. In cases like this, when there are no controls implemented, this type of functionality can introduce several vulnerabilities. In the example reproduced, the keyword “template1” could be included in the parameter “s” in the search request, causing the API to return the related object:

One of the issues that can arise from this is the absence of a repetition control, allowing the user to send multiple occurrences of the same keyword separated by a comma, for example. Again, exploitation details are often contextual according to implementation but sharing the same core idea. In this case, the backend will send an object for each of the keywords that reference it, causing it to process and send a large amount of data:

Even though cases like this are somehow “limited” by the implementation of the HTTP protocol on web servers, due to request size, it is still very likely that its exploitation will have great impact, especially in situations where the attacker can identify keywords that refer to objects which naturally have a large amount of data, or contain values that can be modified/included by him.

Conclusion

These are just a few example approaches that produce constant results in API pentest. Just like every cybersecurity feature, there is a universe of possibilities to be explored, whose known limits are constantly expanding. The most important thing is to constantly develop the ability and habit to fully visualize the path traveled by the information, idealize and create subversions for any asset involved in this logic flow.

Tips & Tricks for API Pentest