Reversing Tips: (Almost) Automatically renaming functions with Ghidra

Often, when reversing a binary, we find that it has no symbols and end up with many unnamed functions in the disassembler. To make sense of them we usually need to reverse engineer each function to understand what it does before we can rename it, especially when dealing with a closed-source, proprietary binary.

However, there are cases where the binary was compiled with debug logging functionality that exposes the function names in error messages. In this post we show a practical example of leveraging this to automatically rename functions with their actual names using a Ghidra [2] script.

1. Case study: soap_serverd from Netgear RAX30

1.1 Obtaining the binary

We are going to use the binary soap_serverd from the Netgear RAX30 firmware [1] for our practical example. This binary implements a SOAP server in the router.

To obtain the binary we have to execute the following steps:

Download the firmware [1];
Extract the files using binwalk [3];
Find the binary at squashfs-root/bin/soap_serverd.

1.2 Loading the binary on Ghidra

Once we have obtained the file, we can load it up on Ghidra and start taking a look. Looking at the recognized functions, we can see their names were assigned by the disassembler and don’t represent their actual names.

1.3 Identifying the logging functionality

Now, if we navigate to some of these functions, we can see calls to a log_log function whose second argument is a string that appears to be the name of the calling function.

2. Creating a script

2.1 Ghidra scripts

To begin writing a Ghidra script, we first need Ghidra installed and set up on our system. We then create a new script in the Ghidra scripting environment: open Ghidra, click “Script Manager” in the “Window” menu, then click “New”. Finally, we choose Python as the language and pick a name for our script.

2.2 The idea

The idea is pretty simple: find all functions calling log_log, extract the second argument of each call, and rename the calling function accordingly. We’re going to break this idea into functions.
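The idea can be tried outside Ghidra on toy data before any API is involved. The sketch below is plain Python, not part of the original script: the helper pick_names, the second function FUN_00015000 and the names parse_header and send_reply are hypothetical, made up only to show a conflicting case.

```python
def pick_names(calls_by_function, arg_num):
    """Map each function to the single name found at position arg_num of its log calls."""
    renames = {}
    for func, calls in calls_by_function.items():
        # Collect the distinct values seen at the expected argument position.
        candidates = {args[arg_num] for args in calls if len(args) > arg_num}
        # Rename only when every log call agrees on one name.
        if len(candidates) == 1:
            renames[func] = candidates.pop()
    return renames


# Each function maps to the argument tuples of its (hypothetical) log_log calls.
toy_calls = {
    "FUN_00014fb4": [(7, "update_client_token", 901, "update_client_token %s %s")],
    "FUN_00015000": [(7, "parse_header", 10, "a"), (7, "send_reply", 22, "b")],
}

renames = pick_names(toy_calls, 1)
```

Only the first function ends up in renames; the second one has two conflicting candidates and is left alone, which is the behaviour the real script aims for.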

2.2.1 Imports and definitions

Here we import the necessary code and create some global definitions that we are going to use during the script.

import ghidra.app.script.GhidraScript as GhidraScript
import ghidra.program.model.symbol.RefType as RefType
import ghidra.program.model.symbol.SymbolType as SymbolType
from ghidra.app.decompiler import DecompileOptions
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

current_program = getCurrentProgram()   # Gets the current program
monitor = ConsoleTaskMonitor()          # Handles monitor output to console
options = DecompileOptions()            # Configuration options for the decompiler
ifc = DecompInterface()                 # Interface to a single decompile process

ifc.setOptions(options)
ifc.openProgram(current_program)

2.2.2 Get the target function

The first function we need is one that will find our target function (log_log), so we can use it later to find references.

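def get_target_function(name):
    symbol = current_program.symbolTable.getExternalSymbol(name)
    if not symbol:
        # No external symbol: the function is defined in the binary itself.
        return getFunction(name)
    thunk_addresses = symbol.object.functionThunkAddresses
    if not thunk_addresses:
        # External symbol without a thunk: nothing to follow.
        return None
    # Find the thunk function that makes the computed call to the external function.
    for ref in getReferencesTo(thunk_addresses[0]):
        if ref.getReferenceType() == RefType.COMPUTED_CALL:
            return getFunctionContaining(ref.getFromAddress())
    return None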

We use the symbol table to look for an external symbol with the given name. If no such symbol exists, that is a hint that the target function is defined in the binary itself, and we can simply call getFunction to get it. That is not the case for this binary.

In this binary log_log is an external function, so we will get a symbol. It is common for external functions to be called through thunk functions (intermediary functions that redirect calls to the real function).

The code finds the reference that makes the call to the real function, then gets the function containing that call. This is our target function!

2.2.3 Get all callers

Next we need to get a list of all functions that call log_log.

def get_callers(function):
    address = function.getEntryPoint()
    callers = set()  # a set, so a function that logs several times is listed once
    for ref in getReferencesTo(address):
        if ref.getReferenceType().isCall():
            caller = getFunctionContaining(ref.getFromAddress())
            # References from code outside any defined function have no caller.
            if caller is not None:
                callers.add(caller)
    return list(callers)

This code is straightforward. It gets the address of the target function and finds all references that are calls, then identifies the functions where these calls are located.
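To see why a set is used and why non-call references are filtered out, here is a small model in plain Python. Ref is a hypothetical stand-in for a Ghidra reference, and the function names are invented for the illustration; none of this touches the Ghidra API.

```python
from collections import namedtuple

# Hypothetical stand-in for a reference: the function it comes from
# (None when the source address is outside any function) and whether it is a call.
Ref = namedtuple("Ref", ["from_function", "is_call"])


def unique_callers(refs):
    """Return the distinct functions that reference the target through a call."""
    callers = set()
    for ref in refs:
        if ref.is_call and ref.from_function is not None:
            callers.add(ref.from_function)
    return sorted(callers)


refs = [
    Ref("FUN_00014fb4", True),   # first log call in the function
    Ref("FUN_00014fb4", True),   # second log call in the same function
    Ref("FUN_00015000", False),  # data reference, not a call
    Ref(None, True),             # call from code outside any function
    Ref("FUN_00016000", True),
]

callers = unique_callers(refs)
```

The function that logs twice appears once in the result, and the data reference and the orphan call are dropped.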

2.2.4 Get all calls from all callers

This is actually done in two functions for the sake of simplicity. The function get_calls_from_all_callers is the starting point of the process…

def get_calls_from_all_callers(callers, callee):
    callers_info = []
    for caller in callers:
        caller_info = {
            'caller': {
                'name': caller.getName(),
                'address': caller.getEntryPoint()
            },
            'calls': get_calls_from_caller(caller, callee)
        }
        callers_info.append(caller_info)
    return callers_info

Then we reach a bigger, more important function.

def get_calls_from_caller(caller, callee):
    calls = []
    # Decompile the caller (60-second timeout) to obtain its high-level P-code.
    res = ifc.decompileFunction(caller, 60, monitor)
    high_func = res.getHighFunction()
    if not high_func:
        return calls  # decompilation failed or timed out
    callee_entry = callee.getEntryPoint()
    opiter = high_func.getPcodeOps()
    while opiter.hasNext():
        op = opiter.next()
        if str(op.getMnemonic()) != "CALL":
            continue
        inputs = op.getInputs()
        # Input 0 of a CALL is the destination; the remaining inputs are the arguments.
        if inputs[0].getAddress() == callee_entry:
            calls.append({
                'location': op.getSeqnum().getTarget(),
                'callee': {
                    'name': callee.getName(),
                    'address': callee_entry
                },
                'args': resolve_args(inputs[1:])
            })
    return calls

The first thing we do in this function is use the DecompInterface to decompile the caller and get a high-level abstraction that we can use in our analysis. This abstraction uses the P-code language.

“P-code is a register transfer language designed for reverse engineering applications. The language is general enough to model the behavior of many different processors. By modeling in this way, the analysis of different processors is put into a common framework, facilitating the development of retargetable analysis algorithms and applications” [4].

We iterate over the P-code operations of the function and, when we find a CALL operation, check whether it targets our function. If so, we extract the arguments (represented as Varnodes in P-code). This code is based on previous work documenting this process [5].
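The shape of that loop can be modelled without Ghidra. In the sketch below, Op is a hypothetical stand-in for a P-code operation and the addresses are made up; the only point it illustrates is that the first input of a CALL is the destination and the remaining inputs are the arguments.

```python
from collections import namedtuple

# Hypothetical stand-in for a P-code operation: a mnemonic plus its inputs.
Op = namedtuple("Op", ["mnemonic", "inputs"])


def find_call_args(ops, target_address):
    """Return the argument list of every CALL whose destination is target_address."""
    found = []
    for op in ops:
        if op.mnemonic != "CALL":
            continue
        # Input 0 is the call destination; inputs 1..n are the arguments.
        destination, args = op.inputs[0], op.inputs[1:]
        if destination == target_address:
            found.append(list(args))
    return found


LOG_LOG = 0x11F5C  # made-up address for the logging function

ops = [
    Op("COPY", [0x1000]),
    Op("CALL", [LOG_LOG, 7, "update_client_token"]),
    Op("CALL", [0x22000, 1]),  # call to some other function, ignored
]

matches = find_call_args(ops, LOG_LOG)
```

In the real script the same filtering is done on the operations returned by the decompiler, with Varnode objects in place of the plain values used here.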

2.2.5 Resolve args

A keen reader may have noticed a call to resolve_args. This function is responsible for resolving the varnodes into something meaningful.

“A varnode is a generalization of either a register or a memory location. It is represented by the formal triple: an address space, an offset into the space, and a size. Intuitively, a varnode is a contiguous sequence of bytes in some address space that can be treated as a single value. All manipulation of data by p-code operations occurs on varnodes” [4].

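def resolve_args(args):
    resolveds = []
    for arg in args:
        resolved = None
        if arg.isConstant():
            # Constant varnodes carry their value in the offset field.
            resolved = arg.getOffset()
        elif arg.isUnique():
            # Temporary varnode: follow its defining P-code op to the constant
            # address it was built from, then read the data stored there.
            the_def = arg.getDef()
            if the_def is not None:
                constant_offset = the_def.getInput(0).getOffset()
                data = getDataContaining(toAddr(constant_offset))
                if data:
                    resolved = data.getValue()
        else:
            # Registers, stack slots, etc.: fall back to the decompiler's variable name.
            high = arg.getHigh()
            if high is not None:
                resolved = high.getName()
        resolveds.append(resolved)
    return resolveds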

This code handles multiple types of varnodes; for our context the important branch is the one dealing with varnodes of unique type. In this branch we get the P-code operation that defines the varnode using Varnode.getDef. From its first input we obtain the address of the string, and then the final data using getDataContaining and Data.getValue.

For instance, in the function FUN_00014fb4 (shown above) the call to log_log has these raw arguments as Varnodes:

[
  (const, 0x7, 4), 
  (unique, 0x1000004c, 4), 
  (const, 0x385, 4), 
  (unique, 0x10000048, 4), 
  (register, 0x20, 4), 
  (register, 0x24, 4), 
  (stack, 0xffffffffffffffe0, 4)
]

The arguments after being resolved:

[
  7L, 
  u'update_client_token', 
  901L, 
  u'update_client_token %s %s', 
  u'param_1', 
  u'param_2', 
  u'param_3'
]
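Two of the raw triples above can be decoded by hand, which helps when checking the resolved output. The sketch below is plain Python; the Varnode tuple and the signed_offset helper are hypothetical illustrations, and the stack decoding assumes the offset is printed as an unsigned 64-bit value, as its 16 hex digits suggest.

```python
from collections import namedtuple

# The formal triple: address space, offset into the space, size in bytes.
Varnode = namedtuple("Varnode", ["space", "offset", "size"])


def signed_offset(offset, bits=64):
    """Interpret an unsigned offset as a two's-complement signed value."""
    if offset >= 1 << (bits - 1):
        return offset - (1 << bits)
    return offset


line_number = Varnode("const", 0x385, 4)
stack_slot = Varnode("stack", 0xFFFFFFFFFFFFFFE0, 4)

# For a constant varnode the offset is the value itself.
line_value = line_number.offset

# For a stack varnode the offset is relative to the stack pointer on entry.
stack_delta = signed_offset(stack_slot.offset)
```

Here line_value is 901, matching the third resolved argument, and stack_delta is -32, that is, a 4-byte slot 0x20 bytes below the reference point of the stack space.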

2.2.6 Get candidates

After resolving the arguments to log_log for each function that calls it, we create a list of candidate names that could be used to rename the original function. The arg_num is the position where the caller’s name is expected to appear in the call.

def get_real_name_candidates(callers_info, arg_num):
    callers_candidates = []
    for info in callers_info:
        names_candidates = set()
        for call in info['calls']:
            args = call['args']
            # Skip calls where the decompiler recovered fewer arguments than expected.
            if arg_num < len(args):
                names_candidates.add(args[arg_num])
        callers_candidates.append({
            'caller': info['caller'],
            'candidates': list(names_candidates)
        })
    return callers_candidates

2.2.7 Rename the functions

Finally, we can rename the functions using function.setName. Note that we only rename functions that have exactly one candidate, leaving the conflicting cases for the user to decide.

def rename_all(callers_candidates):
    # Imported here so the function does not depend on how the script's globals are set up.
    from ghidra.program.model.symbol import SourceType
    total = len(callers_candidates)
    count = 0
    for callers_candidate in callers_candidates:
        current_name = callers_candidate['caller']['name']
        address = callers_candidate['caller']['address']
        candidates = callers_candidate['candidates']
        # Only touch functions that still carry Ghidra's default name.
        if not current_name.startswith('FUN_'):
            continue
        if len(candidates) != 1:
            msg = "ERROR   - {} - expected 1 candidate, found {} - {}"
            print(msg.format(current_name, len(candidates), str(candidates)))
            continue
        new_name = candidates[0]
        if not new_name:
            msg = "ERROR   - {} - candidate is None"
            print(msg.format(current_name))
            continue
        # Look the function up by entry point, which is unambiguous.
        function = getFunctionAt(address)
        function.setName(new_name, SourceType.USER_DEFINED)
        print("SUCCESS - {} renamed to {}".format(current_name, new_name))
        count += 1
    # Guard against an empty caller list to avoid dividing by zero.
    perc = (float(count) / total) * 100.0 if total else 0.0
    print("From {} functions {} were renamed - {}% ".format(total, count, perc))
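Since the resolved argument is not guaranteed to be a function name (it can be None, a number, or a format string if arg_num points at the wrong position), one possible extra guard is to check that a candidate looks like an identifier before renaming. This is not part of the original script; is_plausible_name is a hypothetical helper, and the rule it applies (C-style identifiers only) is an assumption that may be too strict for some binaries, for example those with C++ names.

```python
import re

# A C-style identifier: a letter or underscore, then letters, digits or underscores.
_IDENTIFIER = re.compile(r"\A[A-Za-z_][A-Za-z0-9_]*\Z")


def is_plausible_name(candidate):
    """Return True if candidate is a string shaped like a C identifier."""
    try:
        return bool(_IDENTIFIER.match(candidate))
    except TypeError:
        # Not a string at all (None, an integer constant, ...).
        return False


checks = [
    is_plausible_name("update_client_token"),
    is_plausible_name("update_client_token %s %s"),
    is_plausible_name(None),
    is_plausible_name(901),
]
```

Under this assumption, a caller could skip any candidate for which is_plausible_name returns False and report it alongside the other errors.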

2.2.8 The final part

With almost all the functions defined, we reach the final part. This function accepts the name of the logging function and the position where the caller’s name is expected to appear in the call, then starts the process.

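def rename_from_logging_function(function_name, arg_num):
    callee = get_target_function(function_name)
    if callee is None:
        # Nothing to do if the logging function cannot be located.
        print("ERROR   - function {} not found".format(function_name))
        return
    callers = get_callers(callee)
    callers_info = get_calls_from_all_callers(callers, callee)
    callers_candidates = get_real_name_candidates(callers_info, arg_num)
    rename_all(callers_candidates)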

And this is our main function, along with the usual Python entry-point check.

def main():
    rename_from_logging_function('log_log', 1)

if __name__ == '__main__':
    main()

3. Demo

As we can see from the demo, 96% of the functions calling log_log were renamed.

The full script with additional comments can be found at:

https://github.com/convisolabs/reversing_scripts/blob/main/ghidra/rename_functions.py

4. Conclusion

In this post we saw how to leverage Ghidra's scripting capabilities to automate function renaming when reverse engineering a binary. We applied this approach to a real-world binary as a practical example. This is just a simple example of how to take advantage of programming to make your reverse engineering work easier.

5. References

Authors

Gabriel Quadros – Information Security Specialist
Ricardo Silva – Information Security Specialist
