Introduction to Fuzzing Android Native Components: Strategies for Harness Creation

In the previous article, we covered the Android application market, explored basic fuzzing concepts, discussed how native methods work in Android applications, and introduced the creation of a simple harness to demonstrate the basic functionality of AFL++. If you missed the content, the article is available through this link. In this new article, we will explore a real-world application and discuss some of the strategies adopted during the harness construction.

For this article, we initially developed a script to download several APK files, which served as the foundation for our tests. The chosen application was a simple image converter, selected based on specific criteria. We highlight the following reasons for our choice:

First, creating a harness for the target function in the dynamic library was relatively simple, allowing us to condense the entire process into a single article.
Second, the target can be fuzzed without a valid JNIEnv. In the last article, we discussed how JNI methods take two required parameters: JNIEnv* and jobject/jclass, as in the example below. The Android Runtime initializes these structures, which makes creating a harness more complex: the harness would have to instantiate the JVM and keep it in a consistent state, because the pointers inside JNIEnv are only valid in that state, and the harness would fail if it dereferenced invalid ones. To avoid this scenario, we selected as fuzzing targets the native functions called by the JNI methods and recreated, within the harness, the steps the JNI method executes. Although this requires some extra work, it eliminates the dependency on valid JNIEnv pointers. The next article addresses and overcomes this limitation.

JNIEXPORT jstring JNICALL
Java_com_conviso_example_jni_HelloJni_stringFromJNI(
     JNIEnv* env,
     jobject thiz
)
{
     return (*env)->NewStringUTF(env, "Hello from JNI!");
}

The motivation for writing this article was to create a sort of logbook, where we document the challenges faced and the strategies adopted during the development of a harness. The goal is to thoroughly document the steps taken to identify a potential vulnerability through fuzzing, providing a clear view of the process, the difficulties encountered, and the solutions implemented along the way. So, let’s get started.

1. Obtaining Initial Data from the Java Layer

After extracting the APK file of the application, check for the presence of the lib directory in the folder structure. In this case, we found a file named libimagemagick.so, which allowed us to proceed with the analysis. If a dynamic library exists, open the APK in a Java decompiler such as JADX and inspect the classes containing static initialization blocks, as discussed in the previous article. One strategy is to search for calls to System.loadLibrary(), which identify the classes that use native methods. Using this approach, we found that the Magick class declares three native methods.

Continuing with the application code analysis, we identified the presence of the MagickImage class, which inherits from the Magick class and declares additional native methods.

Good fuzzing targets are typically native functions that process complex or obscure data structures, such as file parsers. Ideally, these functions should handle inputs that can be controlled by an attacker, such as data received through untrusted channels like sockets or files. These functions often deal with data formatting or validation, making them prone to failures when processing unexpected or malformed inputs.

During the search for native methods for fuzzing, we identified the readImage method, which appears to be an excellent candidate for analysis. The next step will be to examine the code of this method in a disassembler to understand its behavior.

The identification of native methods called by the Java layer can be done statically, by searching for methods with the native keyword, or dynamically, using tools like jnitrace (based on Frida) to trace calls at runtime.

2. Native Code Analysis with Disassembler

After identifying a native method in the Java layer, the next step is to open the library in a disassembler, such as Ghidra or IDA Pro, and analyze the JNI method code. In this article, we will use Ghidra. As discussed in the previous article, a JNI method is typically identified by the prefix Java_, followed by the Java package, the class, and the method name, all separated by underscores. The symbol we will search for in Ghidra is Java_magick_MagickImage_readImage. Upon decompiling the code, we found the following snippet:

When performing reverse engineering, a better understanding of the data and structures used by a function makes the code easier to read and navigate, which increases the accuracy of the analysis. It also facilitates automating the process and identifying vulnerabilities. Disassemblers like IDA Pro and Ghidra allow the import of new data types, ranging from simple types to complex structures like JNIEnv, used by JNI methods. In IDA Pro, simply importing the jni.h header adds the necessary types. In Ghidra, GDT (Ghidra Data Type) files store definitions for custom types. To analyze JNI methods, we can import the jni_all.gdt file and start using the defined types. After importing, it is possible to modify the JNI method signature to include the parameters JNIEnv *env, jobject thiz, and imageInfo, the latter obtained through the analysis of the Java code. Although the decompiler output did not change significantly for the method under analysis, in more complex cases, where the function pointers in JNIEnv are used intensively, importing the types can considerably improve the understanding of the code.

We can observe that the JNI method retrieves the ImageInfo structure and directly passes it to the ReadImage function. With this information in hand, let’s further investigate the ImageMagick library to gain a deeper understanding of how ReadImage works, thereby facilitating the development of the harness.
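The naming convention used to locate the JNI method in this section can be sketched in a few lines of C. This is a minimal illustration, not part of the original tooling: the function names jni_symbol_name and append_mangled are hypothetical, and only two mangling rules are handled ('.' or '/' becomes '_', and a literal '_' becomes "_1"). Overloaded native methods, which carry an extra signature suffix, and non-ASCII characters are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends s to out, applying the two mangling rules covered by this sketch.
 * Returns 0 on success, -1 if out is too small. */
static int append_mangled(char *out, size_t cap, size_t *pos, const char *s) {
    for (; *s != '\0'; s++) {
        char single[2] = { *s, '\0' };
        const char *piece;
        if (*s == '.' || *s == '/') piece = "_";   /* package separator */
        else if (*s == '_')         piece = "_1";  /* escaped underscore */
        else                        piece = single;
        size_t n = strlen(piece);
        if (*pos + n >= cap) return -1;            /* keep room for '\0' */
        memcpy(out + *pos, piece, n);
        *pos += n;
    }
    out[*pos] = '\0';
    return 0;
}

/* Builds the short JNI symbol name, for example
 * ("magick.MagickImage", "readImage") -> "Java_magick_MagickImage_readImage".
 * Returns 0 on success, -1 if out is too small. */
int jni_symbol_name(const char *class_name, const char *method,
                    char *out, size_t cap) {
    static const char prefix[] = "Java_";
    size_t pos = sizeof(prefix) - 1;

    assert(class_name != NULL && method != NULL && out != NULL);
    if (cap <= pos) return -1;
    memcpy(out, prefix, pos);
    out[pos] = '\0';

    if (append_mangled(out, cap, &pos, class_name) != 0) return -1;
    if (pos + 1 >= cap) return -1;
    out[pos++] = '_';                              /* class/method separator */
    out[pos] = '\0';
    return append_mangled(out, cap, &pos, method);
}
```

Calling jni_symbol_name("magick.MagickImage", "readImage", buf, sizeof buf) fills buf with the symbol searched for in Ghidra above, which can be handy when generating a list of candidate symbols from the native methods found in the Java layer.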

3. Information in the Documentation and Test Directories of the Projects

If the analyzed library is open source, we can explore its documentation, which typically includes information about its functionalities, data structures, and execution flows. Documentation is essential for understanding the expected behavior of the library and identifying potential entry points for fuzzing. Additionally, it can provide details about configurations, dependencies, and usage examples, making it easier to create more effective test cases and analyze results during the fuzzing process. For further information, we recommend the excellent post written by Salim Largo: Harnessing Libraries for Effective Fuzzing.

The project repository may include directories dedicated to test cases, where you can find examples of harnesses, input corpora, and other useful artifacts for the fuzzing process. Additionally, these directories may contain execution configurations designed to exercise different parts of the library, along with ready-to-use examples of how to integrate the harness with the analyzed library.

If the repository does not contain this information, an alternative is to search the internet for examples of the library’s usage or consult tutorials and discussions in specialized forums. If necessary, reverse engineering can also be employed to gain a better understanding of the library’s behavior by analyzing its binary code or using debugging tools.

After conducting research on the ImageMagick library, the code for the developed harness is provided below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <magick/MagickCore.h>

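int main(int argc, char **argv) {
    /* The fuzzer passes the path of the input file as the only argument. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* The library must be initialized before any other call. */
    InitializeMagick(*argv);

    ExceptionInfo *exception = AcquireExceptionInfo();
    ImageInfo *image_info = CloneImageInfo(NULL);
    /* filename is a fixed-size buffer, so use a bounded copy. */
    snprintf(image_info->filename, sizeof(image_info->filename), "%s", argv[1]);

    int status = EXIT_SUCCESS;
    Image *image = ReadImage(image_info, exception);
    if (image == NULL || exception->severity != UndefinedException) {
        /* Report the problem; the cleanup below still runs. */
        CatchException(exception);
        status = EXIT_FAILURE;
    }

    if (image != NULL) {
        printf("Image width: %lu\n", (unsigned long)image->columns);
        printf("Image height: %lu\n", (unsigned long)image->rows);
        image = DestroyImage(image);
    }

    image_info = DestroyImageInfo(image_info);
    DestroyExceptionInfo(exception);

    DestroyMagick();

    return status;
}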

4. Initialization Functions in Android Libraries

During the analysis of libraries, it is important to look for initialization functions, such as lame_init() in libmp3lame or FPDF_InitLibrary() in PDFium, as executing these functions is essential for the proper functioning of the library. If these initialization functions are not called before invoking other native methods, the harness may fail or even cause a crash.

In the Java layer of the Android application, the constructors of classes that have native methods can indicate the necessary initialization functions to ensure that execution occurs correctly.

In the case of the ImageMagick library, the InitializeMagick function must be called before invoking the ReadImage function.

InitializeMagick(*argv);
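The initialization requirement can be illustrated with a small, self-contained sketch. Everything here is hypothetical: demo_init() and demo_parse() stand in for a library's initialization and parsing entry points (they are not ImageMagick functions), and harness_run() shows one way a harness might guarantee that initialization happens exactly once before the first call to the target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library state: demo_parse() only works after demo_init(). */
static bool demo_ready = false;

void demo_init(void) {
    demo_ready = true;
}

int demo_parse(const unsigned char *data, size_t len) {
    if (!demo_ready) {
        return -1;  /* a real library might crash here instead */
    }
    return (len > 0 && data[0] == 'P') ? 1 : 0;
}

/* Harness entry point: runs the one-time initialization before the
 * first call to the target function. */
int harness_run(const unsigned char *data, size_t len) {
    static bool initialized = false;
    if (!initialized) {
        demo_init();
        initialized = true;
    }
    return demo_parse(data, len);
}
```

In this sketch a missing initialization only produces an error code. With a real library the equivalent mistake usually shows up as a crash on the very first input, which can be mistaken for a finding if the harness is not checked with a known-good file first.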

5. Header Versions and Structure Offsets in Harness Creation

If the library is open source, it is essential to ensure that the library headers used by the harness match exactly the version of the Android library used in the application. Inconsistent data, such as different versions of complex structures, can lead to unexpected results or failures in the execution of the harness.

For example, the figure below illustrates a structure that had its fields modified after an update. If the function copy_transaction(DataTransfer*) in the Android library expects the structure to be passed as described in the CorrectHeader.h file, but we provide the version from WrongHeader.h, the byte calculation in the memcpy operation will be incorrect. Specifically, when the function tries to access the byte count (represented by the num field in the DataTransfer structure), based on the offset calculation [x8 + 16] (where x8 is the base address of the structure), the value of new_field will be interpreted as the number of bytes to be copied. This can lead to memory corruption and result in a failure in the execution of the harness.
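The same mismatch can be reproduced in a few lines of C. The sketch below is an assumption-laden illustration: only the names DataTransfer, num, and new_field come from the example above, while the src and dst fields and the helper num_seen_by_library() are hypothetical. It reads the struct the way the library would, using the offset of num from the correct header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout the library was compiled against (CorrectHeader.h). */
typedef struct {
    void  *src;
    void  *dst;
    size_t num;        /* byte count used by the library's memcpy */
} DataTransferCorrect;

/* Hypothetical layout from a mismatched header (WrongHeader.h):
 * a field was inserted before num, shifting num to a higher offset. */
typedef struct {
    void  *src;
    void  *dst;
    size_t new_field;
    size_t num;
} DataTransferWrong;

/* Returns the value the library would treat as num when the harness
 * hands it a structure built from the wrong header. */
size_t num_seen_by_library(const DataTransferWrong *t) {
    size_t seen;
    assert(t != NULL);
    memcpy(&seen,
           (const unsigned char *)t + offsetof(DataTransferCorrect, num),
           sizeof seen);
    return seen;
}
```

If the harness sets num to 16 and new_field to some unrelated value, num_seen_by_library() returns that unrelated value, which is exactly the number of bytes the library would then try to copy.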

The commands below outline the steps for compiling the harness and running tests using afl-qemu-trace. It is important to note that the ImageMagick headers are from the latest version, which, as will be demonstrated below, does not correspond to the version of the library used by the Android application.

$ aarch64-linux-android35-clang -o harness harness.c

The image below illustrates the result of running afl-qemu-trace. It is evident that, due to the incorrect offsets, the process is unable to complete its execution and enters a blocking state.

To obtain information about the library version, we can consult the ELF file, search for specific strings, or use functions that directly return the version. In the case of our tests, the ImageMagick library provides the GetMagickVersion() function, which can be used to retrieve the library’s version. By temporarily modifying the harness code, we can obtain the library version.

#include <stdio.h>
#include <stdlib.h>
#include <magick/MagickCore.h>

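int main(void) {
    /* GetMagickVersion() returns the version string and stores the numeric
       library version (not a string length) through its argument. */
    size_t version_number;
    const char *version = GetMagickVersion(&version_number);
    printf("ImageMagick version: %s (0x%lx)\n", version, (unsigned long)version_number);
    return EXIT_SUCCESS;
}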

The output from the executable revealed that the version of ImageMagick used in the Android library was 6.7.3-0, an outdated release that does not match the headers used to build the harness.
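A check like this can also be automated so that a header mismatch is caught before any fuzzing time is spent. The helper below is a sketch: banner_matches_headers() is a hypothetical name, and it assumes the run-time banner is a space-separated string that contains the version as one whole token, as in the sample strings used to exercise it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when the version banner reported by the library at run time
 * contains, as a whole space-delimited token, the version the harness
 * headers were written for. */
bool banner_matches_headers(const char *banner, const char *expected) {
    assert(banner != NULL && expected != NULL);

    size_t n = strlen(expected);
    if (n == 0) {
        return false;
    }
    for (const char *p = strstr(banner, expected); p != NULL;
         p = strstr(p + 1, expected)) {
        bool start_ok = (p == banner) || (p[-1] == ' ');
        bool end_ok = (p[n] == '\0') || (p[n] == ' ');
        if (start_ok && end_ok) {
            return true;   /* exact token match, e.g. "6.7.3-0" */
        }
    }
    return false;          /* "6.7.3-1" must not match "6.7.3-10" */
}
```

In a harness, the first argument would be the string returned by GetMagickVersion() and the second the version of the headers on the include path. If the check fails, the harness can abort immediately instead of hanging on mismatched structure offsets.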

With the library version in hand, we can download the correct headers, restore the previous harness code, and recompile it using the commands below.

$ aarch64-linux-android35-clang -o harness harness.c -I/home/thiago/Conviso/Android-ImageMagick/jni/ImageMagick-6.7.3-0/ -DMAGICKCORE_HDRI_ENABLE=1 -DMAGICKCORE_QUANTUM_DEPTH=16 -limagemagick -L .
$ export QEMU_SET_ENV=LD_LIBRARY_PATH="/home/thiago/Conviso/Fuzzing"
$ export QEMU_LD_PREFIX="/home/thiago/Conviso/qiling/examples/rootfs/arm64_android"
$ afl-qemu-trace ./harness ./afl_in/apple.png

After running afl-qemu-trace, we can confirm that the harness is working correctly.

When the library is not open source and the function works with complex structures, it is necessary to verify the offsets of the structure fields used by the function through reverse engineering. If the field offsets do not follow the natural alignment of the architecture, it is essential to apply the __attribute__((packed)) attribute to the structure declaration in the harness code. This attribute instructs the compiler not to add padding between the structure members (padding that the compiler normally inserts to satisfy alignment requirements and keep memory accesses efficient), thus preventing incorrect offset calculations.

For example, suppose the function we want to test with fuzzing receives a complex structure as a parameter, but only uses the content located 58 bytes after the base address of the structure. In this case, we can ignore the preceding bytes and focus our attention solely on the field that represents the data. The structure would be defined as follows:

typedef struct {
    char dummy[58];
    void *ptr;
} RandomStruct;

To verify how the compiler will define the structure, let’s create a simple example and examine the code of a function f(RandomStruct*), which receives a pointer to RandomStruct as a parameter and returns the ptr field of the structure, as shown below.

void *f(RandomStruct *r) {
    return r->ptr;
}

The figure below illustrates the difference in the generated code when accessing the ptr field, with and without the __attribute__((packed)) attribute set on the structure. When the attribute is omitted, the compiler pads the structure so that ptr satisfies its 8-byte alignment requirement, which places it 64 bytes from the base address. When the attribute is applied, the offset of ptr becomes 58 bytes from the base address, as desired.
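The two offsets can also be checked without a disassembler by asking the compiler directly. The sketch below declares both variants of the structure under hypothetical names (RandomStructAligned and RandomStructPacked) and exposes the offset of ptr through offsetof. It assumes GCC or Clang, since __attribute__((packed)) is a compiler extension.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as RandomStruct, laid out with natural alignment. */
typedef struct {
    char  dummy[58];
    void *ptr;
} RandomStructAligned;

/* Same fields with padding disabled (GCC/Clang extension). */
typedef struct __attribute__((packed)) {
    char  dummy[58];
    void *ptr;
} RandomStructPacked;

/* With packing, ptr sits immediately after the 58 dummy bytes. */
static_assert(offsetof(RandomStructPacked, ptr) == 58,
              "packed layout must place ptr at offset 58");

/* Offset chosen by the compiler when padding is allowed
 * (64 on a target where pointers are 8-byte aligned). */
size_t aligned_ptr_offset(void) {
    return offsetof(RandomStructAligned, ptr);
}

/* Offset when padding is disabled. */
size_t packed_ptr_offset(void) {
    return offsetof(RandomStructPacked, ptr);
}
```

Placing a static_assert like this next to the structure declaration in the harness turns a silent layout mismatch into a compile-time error.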

6. Patience is a Virtue

It is essential not to interrupt the fuzzing process prematurely, especially while AFL++ continues to explore new code paths. In the case of the tested application, we configured the corpus directory with a few small images and let the fuzzer run for 5 hours. During this period, AFL++ was able to identify two crashes.

Below are the commands used to start fuzzing with AFL++. If any command needs to be reviewed, refer to the previous article.

$ export QEMU_SET_ENV=LD_LIBRARY_PATH="/home/thiago/Conviso/Fuzzing"
$ export QEMU_LD_PREFIX="/home/thiago/Conviso/qiling/examples/rootfs/arm64_android"
$ AFL_INST_LIBS=1 AFL_QEMU_FORCE_DFL=1 afl-fuzz -Q -i afl_in/ -o afl_out/ -- ./harness @@

7. System Logs and Debugging

After AFL++ identifies crashes in the harness, we can confirm the failure using afl-qemu-trace, passing the input that caused the crash as a parameter. In the image below, it is possible to verify that this input resulted in a segmentation fault in the harness.

When testing the input in the actual application, we can rely on system logs through the logcat tool. Logcat is an Android tool that displays system and application logs, helping with error debugging and real-time event monitoring.

To test the input in the application, we renamed the file to a common image extension, such as .PNG, and requested that the application convert the image to another format. When the conversion started, the application restarted. By analyzing the output in logcat, we were able to confirm that a crash occurred in the application.

In the backtrace output, we can see that the input caused an issue in memcpy, called by the ReadImage function, which is the function we selected as the target for AFL++.

Conclusion

The tips presented throughout this article are helpful for building effective harnesses for fuzzing native code in Android applications, covering everything from code preparation to result analysis. By combining AFL++ with QEMU mode, it is possible to fuzz an application's native libraries directly as binaries and adopt a more comprehensive approach to Android app security analysis. Additionally, debugging techniques, such as using logcat and tracing tools, allow for a more detailed investigation of crashes, helping to identify critical vulnerabilities.

In the next article, we will address issues related to JVM initialization, consistent access to JNIEnv pointers, and communication with the Java layer in the context of fuzzing harnesses. These solutions will help eliminate the need to recreate the JNI method behavior in the harness code, as was necessary in this article.

About the author

Thiago Peixoto
