Stories by +Ch0pin🕷️ on Medium

Fuzzing Android binaries using AFL++ Frida Mode

+Ch0pin️ — Tue, 14 May 2024 10:01:02 GMT

You might find this to be a fitting prologue to my earlier post on Creating and using JVM instances in Android C/C++ applications… and you are right !! Well, consider this my way of enticing you by presenting the end goal you’ll eventually reach. After all, it’s not uncommon to wade through various write-ups without a clear understanding of their objectives.

With that said, if you’re interested in or considering exploring fuzzing, this serves as a step-by-step guide on configuring AFL++ and using it to fuzz Android binaries. I’ll try to keep it short and avoid boring paragraphs of the how-to-set-up-your-Android-pentest-lab type. After all, if you don’t know what fuzzing or AFL is, there are thousands of write-ups out there to get you started.

I followed this step-by-step guide to set up AFL++ (Frida mode) on my MacBook Pro M1 running Sonoma v. 14.4.1, but I doubt you’ll encounter many challenges with your system.

Setting up AFL++

1. Download the latest release here: https://github.com/AFLplusplus/AFLplusplus/releases/ and extract the compressed files.
2. Install the Android NDK using brew:

$brew install --cask android-ndk

$export ANDROID_NDK_HOME="/opt/homebrew/share/android-ndk"

Set the ANDROID_NDK_HOME persistently, so you won’t need to redefine it every time you start a shell session. Depending on your OS and shell, you may add the line export ANDROID_NDK_HOME='/opt/homebrew/share/android-ndk' to your shell configuration file (e.g. ~/.zshrc in case you are using zsh).

3. Download the following CMAKE file and save it under the directory you extracted AFL (in step 1):

https://github.com/Ch0pin/android-fuzzing/blob/main/AFLplusplus/CMakeLists.txt

If you face any issue with the above, you might have to change this part:

execute_process(
  COMMAND
  bash -c "echo 'unsigned char api_js[] = {' > ${API_C}; \
  xxd -p -c 12 ${API_JS} | sed -e \"s/\\([0-9a-f]\\{2\\}\\)/0x\\1, /g\" \
                         | sed -e \"s/^/  /\" >> ${API_C}; \
  echo '};' >> ${API_C}; \
  echo \"unsigned int api_js_len = $(stat --printf='%s' ${API_JS});\" \
     >> ${API_C}"
 )

As follows:

execute_process(
  COMMAND
  bash -c "echo 'unsigned char api_js[] = {' > ${API_C}; \
  xxd -p -c 12 ${API_JS} | sed -e \"s/\\([0-9a-f]\\{2\\}\\)/0x\\1, /g\" \
                         | sed -e \"s/^/  /\" >> ${API_C}; \
  echo '};' >> ${API_C}; \
  echo \"unsigned int api_js_len = $(stat ${API_JS} | cut -d ' ' -f 8);\" \
     >> ${API_C}"
 )

4. Save the following script under the directory you downloaded AFL and run it in order to compile the afl-fuzz and afl-frida-trace.so:

mkdir build && cd build
cmake -DANDROID_PLATFORM=31 \
      -DCMAKE_TOOLCHAIN_FILE=/opt/android-ndk-r25c/build/cmake/android.toolchain.cmake \
      -DANDROID_ABI=arm64-v8a ..
make

You may need to change the -DCMAKE_TOOLCHAIN_FILE value to match the location of your NDK. If it was installed with Homebrew, this path will be under /opt/homebrew/Caskroom/android-ndk.

If everything worked as expected, you will find afl-fuzz and afl-frida-trace.so under the ./build path. Use adb to push these binaries to /data/local/tmp:

$adb push afl* /data/local/tmp

Give execute access to afl-fuzz. If you followed my guide, you probably know what to fuzz. As an indication, assuming that the binary you want to fuzz is called ‘fuzz’ :) you may start with:

./afl-fuzz -O -G 256 -i in -o out ./fuzz

Ghost files in the shared preferences

+Ch0pin️ — Sun, 18 Feb 2024 16:01:08 GMT

Have you ever encountered an exceptionally clever bug, only to be thwarted by an unforeseen obstacle just moments before exploiting it? Perhaps a check, initially designed for another purpose, now inadvertently blocks you from leveraging this significant bug you’ve discovered?

That’s precisely what this post aims to explore….

The bug ? what bug 🕷️?

Numerous bug categories can enable you to circumvent WRITE restrictions in an application’s home directory, sparking considerable excitement due to the typically high impact of such attacks. For example, achieving code execution through overwriting a native library can lead to significant repercussions, among other potential exploits.

Even in the absence of a native library, numerous avenues remain for exploiting such a bug, with the shared preferences directory being a prime target. Android applications often store connection settings within the shared_prefs directory. Should you succeed in overwriting a trusted domain within this space, you could redirect the application to communicate with a malicious server. This could lead to the unintended transmission of user tokens and other sensitive information directly to an attacker-controlled environment.

Write is not always … Overwrite

Many apps implement checks prior to overwriting a file; it’s quite typical, for instance, to verify the file’s existence. If the file exists, they might either halt the operation altogether or prompt for user consent before overwriting it. This requirement for user interaction changes the User Interaction metric in the CVSS vector of the vulnerability and consequently reduces the overall score of the CVE (Common Vulnerabilities and Exposures) entry, impacting the perceived severity of the vulnerability.

This is exactly the obstacle that I encountered in numerous cases while trying to overwrite a file…

if(file.exists()) abort();

And this is how it feels….

…and I might have given up, if it weren’t for this great file-monitoring Medusa module called file_write. Without any direct intervention on my part, I observed the appearance of unusual .bak files within the shared preferences directory. The answer to the purpose of these mysterious .bak files lies within the implementation of the shared preferences class:

    SharedPreferencesImpl(File file, int mode) {
        mFile = file;
        mBackupFile = makeBackupFile(file);
        mMode = mode;
        mLoaded = false;
        mMap = null;
        mThrowable = null;
        startLoadFromDisk();
    }

Notice the makeBackupFile, which simply returns a new file with the .bak extension:

    static File makeBackupFile(File prefsFile) {
        return new File(prefsFile.getPath() + ".bak");
    }

Let’s say you are using the example.xml file; then this one will return example.xml.bak.

The startLoadFromDisk call at the end of the constructor invokes loadFromDisk, which checks whether example.xml.bak exists and, if it does, deletes example.xml and renames example.xml.bak to example.xml.
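The restore step described above can be modelled outside Android. What follows is only a minimal sketch in C, assuming that plain POSIX file calls stand in for the Java File API; the function name restore_from_backup and the fixed-size path buffer are my own illustrative choices and do not come from the Android sources:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical model of the backup handling described above:
 * if <prefs>.bak exists, drop <prefs> and promote the backup.
 * Returns 1 if the backup was promoted, 0 if there was none, -1 on error. */
int restore_from_backup(const char *prefs_path)
{
    char bak_path[4096];
    int n = snprintf(bak_path, sizeof bak_path, "%s.bak", prefs_path);
    if (n < 0 || (size_t)n >= sizeof bak_path)
        return -1;                      /* path too long for the buffer */

    if (access(bak_path, F_OK) != 0)
        return 0;                       /* no backup: nothing to do */

    unlink(prefs_path);                 /* stands in for mFile.delete(); the file may not exist */
    if (rename(bak_path, prefs_path) != 0)
        return -1;
    return 1;
}
```

The point of the sketch is that the content of the .bak file always wins: whoever manages to create example.xml.bak decides what example.xml contains after the next load.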

Write is always … Overwrite (when it comes to shared_prefs)

I guess, it’s clear where this is leading… Suppose you have WRITE but not OVERWRITE in the shared preferences directory. Instead of attempting to write the file directly, you could simply create a .bak file and allow the behavior described previously to work in your favor. This approach leverages the inherent handling of .bak files by the shared preferences mechanism to indirectly achieve file modification, circumventing the restriction on direct overwriting ;)

Creating and using JVM instances in Android C/C++ applications

+Ch0pin️ — Wed, 30 Aug 2023 11:35:09 GMT

Considering the reader’s interest in this post, it’s reasonable to assume a certain level of familiarity with JNI and its usage. For those who stumbled upon this content by chance, a brief introduction to the subject is recommended. I invite you to explore the topic further by reading THIS article which provides a foundational understanding.

While the typical perception of JNI involves utilising native code in Java applications, this post explores exactly the opposite. We’re about to dive into crafting a pure native Android app and we will try to use Java features that a native app normally wouldn’t support.

Why might we want to do such a thing?

In addition to the advantages related to software development, when trying to test, fuzz, or broadly speaking, reverse-engineer native code that interacts with Java objects, you’ll inevitably encounter a juncture where you need to isolate particular segments of code, and this is what this post is all about.

So, what are we going to do ?

Our task is to call a java method from an apk using a C/C++ Android application. I chose com.whatsapp version 2.23.16.76, from which we are going to call the following method (which can be found in the X.2ts class):

public static X.2ts A01(byte[] bArr)

This method gets as an argument a byte array returned by the native method:

WebpUtils.fetchWebpMetadata(file.getAbsolutePath())

This functionality is natively implemented in the libwhatsapp.so library. Given a file path that points to a webp file, this function returns the file's metadata as a byte array. Subsequently, the A01 method utilises this data to initialize an object of the X.2ts class, encapsulating the metadata information. Finally, we invoke the toString method of the X.2ts class to display the metadata of the webp file as interpreted by WhatsApp.

Our final “deliverable” is an Android binary (let’s call it caller) which gets a file path to a webp file from the command line and prints its metadata.

Calling WhatsApps get webp metadata java method

The Invocation API

The first step towards achieving our goal is to create a Java Virtual Machine (JVM) and utilize it within our native application to execute our Java compiled code. The Invocation API enables software vendors to integrate the Java VM into any native application [1]. This API offers a range of functions, including the creation, attachment, detachment, and destruction of the JVM. Among these functions, the JNI_CreateJavaVM stands out as one of the most important. It initializes a Java VM and provides a pointer to the JNI interface pointer (JNIEnv):

JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);

The third parameter contains a set of arguments which are used during the VM’s initialisation. Conveniently, the java native interface provides a structure called JavaVMInitArgs which can be used for this reason:

typedef struct JavaVMInitArgs {
   jint version;
   jint nOptions;
   JavaVMOption *options;
   jboolean ignoreUnrecognized;
} JavaVMInitArgs;

typedef struct JavaVMOption {
    char *optionString;
    void *extraInfo;
} JavaVMOption;

We are going to use the JavaVMOption in order to define the path where our compiled Java code resides.
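To make the role of these two structures more concrete, here is a small sketch in C that fills them with a single class path option. The typedefs are stand-ins that mirror the definitions quoted above so the snippet compiles without the NDK (real code should include jni.h instead), and make_classpath_args is a hypothetical helper name, not part of JNI:

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins mirroring the structures quoted above, so that the sketch
 * compiles without the NDK; real code should include <jni.h> instead. */
typedef int32_t jint;
typedef uint8_t jboolean;
typedef struct JavaVMOption {
    char *optionString;
    void *extraInfo;
} JavaVMOption;
typedef struct JavaVMInitArgs {
    jint version;
    jint nOptions;
    JavaVMOption *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgs;
#define JNI_VERSION_1_6 0x00010006

/* Hypothetical helper: fill `args` with one -Djava.class.path option
 * pointing at `apk_path`. `opt` and `buf` are caller-owned storage that
 * must outlive `args`. Returns 0 on success, -1 if `buf` is too small. */
int make_classpath_args(JavaVMInitArgs *args, JavaVMOption *opt,
                        char *buf, size_t buf_len, const char *apk_path)
{
    int n = snprintf(buf, buf_len, "-Djava.class.path=%s", apk_path);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    opt->optionString = buf;
    opt->extraInfo = NULL;
    args->version = JNI_VERSION_1_6;
    args->nOptions = 1;
    args->options = opt;
    args->ignoreUnrecognized = 0;       /* JNI_FALSE */
    return 0;
}
```

The resulting structure is what would be handed over as the third parameter of JNI_CreateJavaVM.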

Implementation

While I was writing this post, I came across various implementations of the JVM creation process, including:

One used in THIS post by Quarkslab
Caleb Fenton’s post: Calling JNI Functions with Java Object Arguments from the Command Line
This gist by tewilove

Unfortunately, none of these worked for me (for various reasons in each case), so I decided to use the libnativehelper approach, which did the trick. Apart from that, the code is very similar to the one from Quarkslab, and you can find it here.

Our project consists of the following files:

caller.c (which corresponds to our native app)
jnihelper.c (a library that we are going to use to create the JVM)
include/jenv.h (header file for our jnihelper library)
lib/libwhatsapp.so (the whatsapp library extracted from the whatsapp apk)

The jnihelper library (jnihelper.c)

Before we proceed with compiling the library as well as the native app that uses it, let’s take a look at a few things. First of all, the method that we are going to use to create JVMs is the following (jnihelper.c):

int initialize_java_environment(JavaCTX *ctx, char **jvm_options, uint8_t jvm_nb_options)

This method returns JNI_OK on success or JNI_ERR otherwise and takes the three following parameters:

JavaCTX *ctx, is a pointer to a structure of type JavaCTX which holds context and configuration information related to the Java environment.
char **jvm_options, is a pointer to an array of pointers to characters (strings). We will use it to pass an array of Java Virtual Machine (JVM) options or/and configuration settings to the function.
uint8_t jvm_nb_options represents an unsigned 8-bit integer which we will use to indicate the number of JVM options provided in the jvm_options array.

After a successful invocation of JNI_CreateJVM, ctx->vm and ctx->env will point to the JVM and the JNIEnv respectively:

...

jint status = JNI_CreateJVM(&ctx->vm, &ctx->env, &args);

if (status == JNI_ERR){
        printf("[!] Can't create java vm/env \n");
        return JNI_ERR;
    }
    printf("[+] Initialization completed successfully.\n \
    [+]Java VM pointer: %p\n \
    [+]Java env pointer: %p\n",ctx->vm, ctx->env);
....
....

We are going to use the initialize_java_environment from our caller native program, in order to be able to call java methods from our DEX/APK file.

The native caller (caller.c)

Starting with the main, we have the following:

JavaCTX ctx;

int main(int argc, char **argv)
{
    int status; 
    if(argc < 2){
        printf("Usage: ./caller webp_file.webp");
        return 1;
    }
    char *jvmoptions = "-Djava.class.path=/data/local/tmp/JNIhelper/base.apk";
    if((status = initialize_java_environment(&ctx,&jvmoptions,1)) != 0)
        return status;
    
    wrapper(argv[1]);
    if(cleanup_java_env(&ctx)!=0)
        return -1;
    return 0;
}

While the code is pretty much self-explanatory, a few points worth mentioning are:

The jvmoptions points to the whatsapp apk, which we push under the: /data/local/tmp/JNIhelper/base.apk
The call to the initialize_java_environment in order to create our JVM and initialise our java context (JavaCTX).
The wrapper depicted below:

int wrapper(const char *path){

jclass X_2ts = (*ctx.env)->FindClass(ctx.env, "X/2ts");
    if (X_2ts == NULL) {
        printf("Can't find class X/2ts\n");
        return -1;
    }
    jmethodID A01 = (*ctx.env)->GetStaticMethodID(ctx.env, X_2ts, "A01", "([B)LX/2ts;");
    if (A01 == NULL) {
        printf("Can't find method A01\n");
        return -1;
    }

    jobject X_2ts_obj = (*ctx.env)->CallStaticObjectMethod(ctx.env,X_2ts,A01,Java_com_whatsapp_stickers_WebpUtils_fetchWebpMetadata(ctx.env,NULL,(*ctx.env)->NewStringUTF(ctx.env,path)));
    if(X_2ts_obj==NULL) {
        printf("Can't create X_2ts_obj object\n");
        return -1;
    }
        
    jmethodID toString = (*ctx.env)->GetMethodID(ctx.env,X_2ts,"toString","()Ljava/lang/String;");
    if(toString==NULL){
        printf("Can't find toString method id\n");
        return -1;
    }
    jstring describe = (*ctx.env)->CallObjectMethod(ctx.env,X_2ts_obj,toString);
    if(describe==NULL){
        return -1;
    }
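    const char *descr = (*ctx.env)->GetStringUTFChars(ctx.env, describe, NULL);
    int ret = -1;
    if(descr!=NULL){
        printf("%s",descr);
        // Release the UTF-8 buffer obtained from GetStringUTFChars
        (*ctx.env)->ReleaseStringUTFChars(ctx.env, describe, descr);
        ret = 0;
    }
    // Drop the local references on both the success and the failure path
    (*ctx.env)->DeleteLocalRef(ctx.env, X_2ts_obj);
    (*ctx.env)->DeleteLocalRef(ctx.env, describe);
    return ret;
}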

Notice how we use our JVM in order to call the JNI methods, having loaded what we need from the whatsapp apk.

Compile

Assuming that you have downloaded and installed the Android NDK, make sure to modify the build.sh in order to point to your toolchain file:

mkdir build && cd build
cmake -DANDROID_PLATFORM=31 \
        -DCMAKE_TOOLCHAIN_FILE=$HOME/Library/Android/sdk/ndk/25.2.9519653/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a ..
make

Running

Push the compiled binaries (build/caller and build/libjenv.so), the whatsapp apk (as base.apk) and the a.webp under /data/local/tmp.
Chmod the caller to +x
Install the whatsapp apk (we need a couple more dependencies to resolve)
Set the LD_LIBRARY_PATH to ./:/data/data/com.whatsapp/files/decompressed/libs.spk.zst/

Run with ./caller a.webp

Project git directory: https://github.com/Ch0pin/JNIInvocation

References:

Wireless pairing and device mirroring in Android Studio

+Ch0pin️ — Wed, 01 Mar 2023 10:06:12 GMT

Having your mobile devices cable-connected can be challenging sometimes.

Thankfully, the latest Android studio versions provide a convenient way to take control of them remotely, including mirroring, debugging and file browsing.

Wireless Pairing

To pair a device wirelessly, follow the steps below:

1. Open the Android Studio device manager:

2. Click on ‘Pair using Wi-fi’:

3. Navigate to the developer options in your device:

And simply scan the QR code:

You can now access the device using the Device Manager’s options:

Physical device mirroring

Like many of you, I have been using the excellent tool https://github.com/Genymobile/scrcpy to perform physical device mirroring. The newest Android Studio version, though, includes an option that can be used to perform the same task (although it is still experimental).

To enable it, navigate to:

And then:

This is it…. You can now view your device by clicking on the “Running Devices” menu:

Practical ARM64 (Subroutines)

+Ch0pin️ — Fri, 26 Aug 2022 14:12:20 GMT

Calling subroutines in higher-level programming languages is trivial: the developer simply has to reference the name of a subroutine, give some arguments (if any) and handle the result. Doing the same in assembly language can sometimes be overwhelming, as the developer has to take care of a lot of details and comply with the calling conventions of each processor family.

A calling convention defines how arguments are passed to subroutines and how the results are returned. These “rules” are not enforced by hardware, but they must be followed during the development process in order for the product to be available to other programmers.

When it comes to AArch64 the rules of calling a subroutine are the following:

Up to eight parameters are stored in registers x0-x7:
Any additional parameter must be passed in the stack in reverse order
The subroutine’s result (if there is any), should be stored in the x0 register

Marshalling is the process of placing arguments in the corresponding location:

1st argument → x0, 2nd argument → x1, …, 8th argument → x7

Additionally, there are volatile (caller-saved) and non-volatile (callee-saved) registers. Simply put, when you store data in a volatile register, don’t assume that it will survive a subroutine call. Conversely, a subroutine must save the contents of a non-volatile register before using it and restore them afterwards. With respect to AArch64, we have the following conventions:

x0-x7 are volatile while X0 is used to store the result of a subroutine
x8-x18 are also volatile, while during a system call, X8 stores the (linux) system call number
x19-x28 are non-volatile
x29, x30, sp correspond to the Frame Pointer (FP), Link Register (LR) and Stack Pointer.

Volatile and non volatile registers

Calling a subroutine

Let’s first see the steps that we should take when calling a subroutine.

Arguments to registers

Let’s start with a simple case where we have only up to eight arguments that we have to take care of. In the example below, we are calling the printf function passing the format string to x0 (line 10), and the rest of the parameters to w1-w7 registers (lines 12–15):

nstack.s

Compiling and running the program ($as nstack.s -o nstack.o && ld nstack.o -o nstack -lc)

As we discussed in the previous posts, the bl instruction will store the return address (the address of the instruction that follows the bl) in the link register (lr) and set the program counter (pc) to the address of the first instruction of the subroutine that we are calling. According to printf’s manual, this subroutine expects the format string as its first argument and the displayed values as the 2nd, 3rd and so on:

Since we comply with the calling convention, printf executes as expected, printing the given values to the standard output.

Arguments to the stack

When calling a subroutine that takes more than eight arguments, the extra ones must be stored on the stack. Pushing values to and popping values from the stack takes place in two steps:

First the developer has to allocate space in the stack by modifying the value of the stack pointer (sp).
Then, store or recover a value to or from the memory address where sp points to.

Allocating space

This is done by subtracting the space that we need, in bytes, from the value of the stack pointer, while taking care of the stack alignment. In AArch64 the stack pointer must always be 16-byte aligned.

Although this seems confusing, think of the stack as the pile depicted below:

AArch64 16 Bytes alignment requirement

In order to store 16 bytes the Stack TOP must be placed one position lower, for 32 bytes two positions and so on. To store values which are not multiples of 16 we need to find the closest 16 byte multiple boundary and set the Stack TOP to this value. This means that in order to store 8 bytes the stack top should still be placed one position lower, for 24 bytes two positions … and so on.
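The rounding rule above is a one-liner once written down. The following is a small sketch in C (align16 is just an illustrative name) that computes by how many bytes sp has to be lowered in order to hold a given amount of data while staying 16-byte aligned:

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 16, i.e. the amount by
 * which sp has to be lowered to hold `nbytes` and stay 16-byte aligned. */
size_t align16(size_t nbytes)
{
    return (nbytes + 15u) & ~(size_t)15u;
}
```

For 8 or 16 bytes this gives 16 (one position lower), and for 24 or 32 bytes it gives 32 (two positions lower), which matches the pile described above.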

In the example below, we need to store 24 bytes in total (8 for each register):

The instructions at lines 7 and 8 will modify the stack as follows:

More specifically, for the sake of simplicity, assume that sp points to 0 when entering the main function. The [sp, #-32]! will set sp equal to sp −32 and X29 will be stored at sp[-32:-25] and X30 at sp[-24:-17]. Finally X19 is stored at sp + 16 (the sp value is not modified).

Now that (hopefully) this step is clear, let’s see an example which makes use of these concepts. We will use the notation sp[a:b] to indicate the stack offsets and start by storing an array of 8 integers on the stack:

Compile and load the program above in gdb and set a breakpoint at *main+0. Then step in to each instruction in order to observe the changes in the stack:

sp is set to sp-32, x29 will be stored at 0x7..ffb10 → 0x7..ffb17 and X30 at 0x7..ffb18

sp is not modified and X19 is stored at 0x7..ffb20

sp is set to sp-32

sp is restored to the entry value

Notice that the instruction at line 25 allocates a 32-byte space and the next stp instructions push the array elements onto the stack. The instruction at line 33 restores the stack pointer to its state before saving the array elements, and lines 34 and 35 recover the rest of the values.

In the next example, we are calling printf once again, passing more than 8 arguments this time:

Few things to notice:

At line 12, we store x19 as we are going to use it and it is non volatile
printf will take 12 arguments in total, including the format string, this means that 4 arguments have to be pushed to the stack
At lines 25, 26 x11 will be stored at sp[16:23], x12 at sp[24:31], x9 at sp[0:7] and x10 at sp[8:15]
Although the extra arguments are 4 bytes each, we store them as 8-byte values on the stack before the call to printf
In line 28 we set the return value to 0 and restore sp (line 29)
Finally, we restore x19, x29 and x30 from the stack and return to the address indicated by x30

Implementing a subroutine

We saw the steps that we should take when calling a subroutine and now it is time to see the conventions from the perspective of the called subroutine. From what has been discussed so far, you must have already figured out that:

We can safely assume that up to 8 arguments must be stored in the registers x0 to x7 and the extra ones in the stack.
The returned value must be stored in x0

Additionally:

When using a non volatile register we must save its value before we use it and restore it before exiting
Volatile registers can be used without need to restore their value
The link register (x30) and frame pointer (x29) must also be saved when entering a subroutine and restored before exiting.

Example u-itoa

Reaching the end of this post, we are going to write a program which converts an unsigned integer to a null-terminated string using a specified base and prints the result to the screen. More specifically, our main function calls scanf to get an unsigned value and a base. It then calls our subroutine uitoa, which does the conversion, and prints the returned result to the screen. We are going to break our program into 3 parts, in order to make it easier to understand.

The first part which is the simplest one, asks the user to enter an unsigned integer and a base and then calls the standard scanf function to get the input. It then calls our subroutine uitoa which gets three arguments: the address where it should save the result, the integer to be converted and the conversion base:

Our simplified version of itoa checks if the base is between 2 and 16 and the input is greater than 0. It then runs a loop where it divides the input by the base and stores the remainder of every iteration at position result[i]:

When this function exits, the result will be in reverse order at the memory address where x0 points to, while the length of the result is stored in the x1 register. Finally we print the result in reverse order:
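For readers who prefer to see the logic before the assembly, the conversion can be sketched in C as follows. This is only an approximation of the routine described above: the names uitoa_rev and reverse_into are mine, the use of lowercase digits is an assumption, and the length is returned as a value instead of being left in x1:

```c
#include <stddef.h>

/* Convert `value` to digits in `base`, least significant digit first.
 * Returns the number of digits written, or 0 if the base is outside
 * 2..16 or the value is 0 (the same checks the text describes). */
size_t uitoa_rev(char *result, unsigned long value, unsigned base)
{
    static const char digits[] = "0123456789abcdef";
    size_t i = 0;

    if (base < 2 || base > 16 || value == 0)
        return 0;
    while (value != 0) {
        result[i++] = digits[value % base];   /* the remainder is the next digit */
        value /= base;
    }
    return i;
}

/* Copy the reversed digits into `out` in reading order and terminate it.
 * `out` must have room for len + 1 bytes. */
void reverse_into(char *out, const char *rev, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = rev[len - 1 - i];
    out[len] = '\0';
}
```

Converting 10 with base 2, for example, leaves the digits 0, 1, 0, 1 in the result buffer, and reading them backwards gives 1010.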

The overall program structure is as follows:

Compile and run:

Practical ARM64 (selections and loops)

+Ch0pin️ — Tue, 16 Aug 2022 08:26:33 GMT

So far we went through the most important instructions of the AArch64 instruction set and it is time to move to something more practical. In this series of posts we are going to talk about structured programming in arm64. For better understanding, we are going to use C statements and try to “translate” them to their corresponding arm64 ones.

Selections

Simple if — then C statements, can be easily implemented in ARM by combining compare and branch instructions. The following examples are pretty straightforward:

Similarly, an if x ≥ 10 statement can be written as follows:

An if-then-else C statement can be achieved by adding an additional branch instruction in order to jump to the else branch, in case the if fails:

As mentioned, the compare and branch instructions can be combined to effectively simulate any if-then-else statement:

Conditional operations can also be used to implement more complex selection structures like if-elif-else.

Let’s see an example:

ARM to C equivalent

Few things about the example above:

Starting in line 15, we load the nums address to the x0 register.
In line 16, the integers 2,1 will be stored in w1 and w2 respectively and x0 will be advanced by 8 bytes in order to point to the next integer 3. At this point w1=2, w2=1, w3 = 3.
The statement in line 20 compares w1 with w2 and, if w1 < w2, redirects the execution to the elseif label. Otherwise, which means that w1 ≥ w2, the statement at line 22 will be executed and w1 will be compared with w3.
The statement in line 23 completes the if(a≥b && a≥c) then max = a, since the csel statement will maintain the value of w1 if w1 is greater than or equal to w3, or it will set w1 = w3 if w3 is greater than w1.
Line 24 uses a branch statement to jump to the else label and prepare the call to printf after storing the address of the first parameter “%d\n” to x0 and the second parameter (the max value) to x1 (remember R0 to R7 store argument values passed to and results returned from a subroutine).
Lines 26 and 27 cover the if(b≥c) then max = b.
Arriving at line 29, we make sure that the max number is stored in x1, thus the call to printf will yield the number 3.
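The selection logic walked through above corresponds roughly to the following C function. It is a sketch written from the description (max3 is an illustrative name), so treat it as an equivalent in spirit and not as the exact source the assembly was derived from:

```c
/* Largest of three integers, following the if / else-if / else
 * structure described in the walkthrough above. */
int max3(int a, int b, int c)
{
    int max;

    if (a >= b && a >= c)
        max = a;
    else if (b >= c)
        max = b;
    else
        max = c;
    return max;
}
```

With the values from the example (2, 1 and 3) the first two conditions fail and the else branch selects 3.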

You can compile the above with the following oneliner:

Although we are going to discuss the prologue and epilogue of a function in a dedicated post, line 14 saves the frame pointer and link register (which stores the return address to the _start function) to the stack, while line 31 restores these values, thus the execution returns to the calling function.

Loops

Depending upon the position of a control statement in a program, loop statements in ARM are classified into two types:

pre-test (for and while)
post-test (do-while )

Let’s see some examples:

Example 1: Reverse String

Reverse string, pre-test loop

A few things to notice about the above: in lines 17–18 we call strlen, which will return the length of the string entered by the user in the w0 register. Assume, for example, that the user entered “example” as input; then w0 will be equal to 7. As we want to print the last char, which is equal to inpt[6], we subtract #1 from w0, store the result on the stack (line 19) and subsequently load this value back into w1 (this is because we need w0 to be available for our call to putchar in line 27).

The startloop label literally implements our while loop: the comparison in line 23 will test the value of w1 and will end the loop if w1 is less than 0. The body of the loop is implemented in lines 24–29. More specifically, we store the address of inpt in the x2 register and load the value at x2+x1 into x0. As x1 is decremented in each iteration, x0 will hold the values [x2+6], [x2+5],…,[x2+0] in turn, thus putchar will print the values “e”, “l”, “p”,…,“e” respectively.

Compile and run the program with

$as revstr.s -o revstr.o && ld revstr.o -o revstr -lc
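As a reference point for the assembly, the same pre-test loop can be sketched in C. To keep the snippet easy to check, it collects the characters in an output buffer instead of calling putchar on each one; reverse_string is an illustrative name and the buffer-based interface is my own assumption:

```c
#include <stddef.h>
#include <string.h>

/* Write the characters of `in` into `out` in reverse order.
 * `out` must have room for strlen(in) + 1 bytes. */
void reverse_string(char *out, const char *in)
{
    ptrdiff_t i = (ptrdiff_t)strlen(in) - 1;   /* index of the last character */
    size_t k = 0;

    while (i >= 0) {                           /* pre-test: checked before every pass */
        out[k++] = in[i];
        i--;
    }
    out[k] = '\0';
}
```

Because the condition is tested before the body runs, an empty input never enters the loop, which is exactly the property that distinguishes a pre-test loop from a post-test one.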

Example 2: Print decimal to binary

In this program we are going to ask the user to enter an unsigned integer value and print its binary form by performing short division by two with remainder. Let the decimal number be N and divide it by 2, since 2 is the base of the binary number system. Note down the remainder, which will be either 0 or 1. Keep dividing the quotient by 2, noting the remainder at every step, until the quotient becomes 0. Then write the remainders from bottom to top (that is, in reverse order) to obtain the binary equivalent of the given decimal number.

Compile with $as printbin.s -o printbin.o && ld printbin.o -o printbin -lc
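The short division procedure can also be sketched in C, which may help when reading the assembly version. The function name to_binary and the choice to return the digits in a buffer (instead of printing them) are assumptions made for this sketch:

```c
#include <stddef.h>

/* Write the binary form of `n` into `out`, most significant bit first.
 * `out` needs room for sizeof(unsigned int) * 8 + 1 bytes.
 * Returns the number of binary digits written. */
size_t to_binary(char *out, unsigned int n)
{
    char rev[sizeof(unsigned int) * 8];
    size_t len = 0;

    do {                                   /* post-test loop, so that 0 yields "0" */
        rev[len++] = (char)('0' + (n % 2u));   /* note down the remainder */
        n /= 2u;                               /* continue with the quotient */
    } while (n != 0);

    for (size_t i = 0; i < len; i++)       /* read the remainders bottom to top */
        out[i] = rev[len - 1 - i];
    out[len] = '\0';
    return len;
}
```

For N = 10 the remainders are noted as 0, 1, 0, 1 and reading them in reverse order gives 1010.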

ARM 64 Assembly Series — Data Processing (Part 2)

+Ch0pin️ — Thu, 04 Aug 2022 10:34:42 GMT


Previous posts: Basic definitions and registers, lab setup, offset and addressing modes, Load And Store, Branch, Data Processing Part 1

In the first part of the data processing instruction set we talked about arithmetic, logical, move and shift operations. Continuing on the same track, in this part we are going to discuss multiplication and division, as well as the rest of the most important operations of the A64 instruction set, including compare, conditional and special instructions.

Multiplication and Division

The mul, madd, msub and mneg can be used to multiply two 32bit or 64bit registers and get 32bit and 64bit results respectively:

https://medium.com/media/d56263c60473b14fb4f447596228a77a/href

When forming 64bit results from 32bit registers we have the following:

https://medium.com/media/ebe07a6cfad0abdaf415b843d4485fd3/href

Replacing the s in the beginning of the mnemonic with a u denotes unsigned multiplication (umull, umaddl, umsubl, umnegl). For 128bit results, smulh and umulh can be used to calculate the upper 64 bits and mul can be used for the lower 64 bits, for example:

smulh Xd, Xm, Xn means Xd = upper 64 bits of (Xm × Xn)

Xd will hold the upper 64 bits of the result and the s denotes that Xm and Xn are treated as signed values. For the corresponding unsigned operation we have the following:

umulh Xd, Xm, Xn means Xd = upper 64 bits of (Xm × Xn)

When it comes to division, we have sdiv and udiv for signed and unsigned division. For example:

sdiv Rd, Rm, Rn means Rd = Rm ÷ Rn
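To see what the "upper 64 bits" actually are, the high and low halves of a 128bit product can be modelled in C. The sketch below relies on the __int128 type, which is a GCC/Clang extension and not standard C, and on the arithmetic right shift those compilers perform on negative values; the function names are illustrative:

```c
#include <stdint.h>

/* Upper 64 bits of the unsigned 128-bit product (what umulh computes). */
uint64_t umulh64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Upper 64 bits of the signed 128-bit product (what smulh computes). */
int64_t smulh64(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}

/* Lower 64 bits of the product (what mul computes). */
uint64_t mul_lo64(uint64_t a, uint64_t b)
{
    return a * b;
}
```

Pairing the value returned by umulh64 (or smulh64) with the one returned by mul_lo64 gives the full 128bit result.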

Comparison

The comparison operators are used to set the PSTATE flags and don’t have any further effect as the result is discarded. Their general syntax is as follows:

op Rn, operand2

https://comp.anu.edu.au/courses/comp2300/resources/ARM_cheat_sheet

For example, the statement cmp X0, #0x40 will subtract 0x40 from X0 and, if the result is negative, it will set the N flag of the PSTATE register. Similarly, cmn will add the first and second operand in order to set the flags accordingly. The TST instruction performs a bitwise AND operation on the value in Rn and the value of Operand2. This is the same as an ANDS instruction, except that the result is discarded. The TEQ instruction performs a bitwise Exclusive OR operation on the value in Rn and the value of Operand2. This is the same as an EORS instruction, except that the result is discarded; note that TEQ and EORS belong to the 32-bit ARM instruction set and have no A64 encoding.
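The effect of cmp and cmn on the four condition flags can be modelled in a few lines of C. This is a sketch for 64bit operands only; the Nzcv structure and the function names are hypothetical helpers introduced for the illustration:

```c
#include <stdint.h>

/* Hypothetical holder for the four condition flags. */
typedef struct { int n, z, c, v; } Nzcv;

/* Flags produced by cmp a, b (a - b is computed, the result is discarded). */
Nzcv flags_cmp(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    Nzcv f;
    f.n = (int)(r >> 63);                        /* result is negative */
    f.z = (r == 0);                              /* result is zero */
    f.c = (a >= b);                              /* carry set means no borrow */
    f.v = (int)(((a ^ b) & (a ^ r)) >> 63);      /* signed overflow */
    return f;
}

/* Flags produced by cmn a, b (a + b is computed, the result is discarded). */
Nzcv flags_cmn(uint64_t a, uint64_t b)
{
    uint64_t r = a + b;
    Nzcv f;
    f.n = (int)(r >> 63);
    f.z = (r == 0);
    f.c = (r < a);                               /* unsigned carry out */
    f.v = (int)((~(a ^ b) & (a ^ r)) >> 63);     /* signed overflow */
    return f;
}
```

Comparing 0x30 with 0x40, for instance, gives a negative result, so N is set while Z, C and V stay clear.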

Using compare and branch instructions to construct loops is straightforward:

Conditional operations

This set of instructions is used to set the destination register equal to the first or second operand, depending on a condition. Their syntax is as follows:

op Rd, Rn, Rm, cond           (1)
op Rd, Rn, cond               (2)
op Rd, cond                   (3)
op Rn, R_or_imm, #nzcv, cond  (4)

The cond can be any of the condition modifiers (eq, ne, mi and so on).

In the first case (1) op can be either csel, csinc, csinv or csneg and can be interpreted as follows:

csel Rd, Rn, Rm, cond means if cond then Rd = Rn else Rd = Rm

csinc Rd, Rn, Rm, cond means if cond then Rd = Rn else Rd = Rm+1

csinv Rd, Rn, Rm, cond means if cond then Rd = Rn else Rd = ~Rm

csneg Rd, Rn, Rm, cond means if cond then Rd = Rn else Rd = ~Rm+1 (that is, -Rm)

In the second case (2) op can be either cinc, cinv or cneg and can be interpreted as follows:

cinc Rd, Rn, cond means if cond then Rd = Rn+1 else Rd = Rn

cinv Rd, Rn, cond means if cond then Rd = ~Rn else Rd = Rn

cneg Rd, Rn, cond means if cond then Rd = ~Rn+1 (that is, -Rn) else Rd = Rn

In the third case (3) op can be either cset or csetm and can be interpreted as follows:

cset Rd, cond means if cond then Rd =1 else Rd=0

csetm Rd, cond means if cond then Rd =0xffff..fff else Rd=0x0000..000
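The two- and one-operand forms are aliases of the three-operand ones with the condition inverted, for example cinc Rd, Rn, cond is csinc Rd, Rn, Rn with the opposite condition. The Python sketch below models the whole family on 64-bit values; cond is passed as a plain boolean, which is a simplification made for this example.

```python
MASK64 = (1 << 64) - 1

def csel(cond, rn, rm):
    return rn if cond else rm

def csinc(cond, rn, rm):
    return rn if cond else (rm + 1) & MASK64

def csinv(cond, rn, rm):
    return rn if cond else ~rm & MASK64

def csneg(cond, rn, rm):
    return rn if cond else -rm & MASK64

# Aliases: same operations with both operands equal and the condition inverted.
def cinc(cond, rn):
    return csinc(not cond, rn, rn)

def cinv(cond, rn):
    return csinv(not cond, rn, rn)

def cneg(cond, rn):
    return csneg(not cond, rn, rn)

# cset and csetm use the zero register for both operands.
def cset(cond):
    return csinc(not cond, 0, 0)

def csetm(cond):
    return csinv(not cond, 0, 0)

print(cinc(True, 5), cinc(False, 5))  # 6 5
print(cset(True), hex(csetm(True)))   # 1 0xffffffffffffffff
```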

Finally, in case (4) op can be either ccmp or ccmn. The conditional compare ccmp Rn, R_or_imm, #nzcv, cond can be interpreted as follows:

if cond then
    cmp Rn with R_or_imm and set the nzcv flags accordingly

else 
    nzcv = #nzcv

R_or_imm can be a register or an immediate

As mentioned before, while cmp a, b executes a-b and sets the PSTATE flags accordingly, cmn executes a+b. The conditional compare negative ccmn Rn, R_or_imm, #nzcv, cond can therefore be interpreted as follows:

if cond then
    cmn Rn with R_or_imm and set the nzcv flags accordingly

else 
    nzcv = #nzcv

R_or_imm can be a register or an immediate
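A typical use of ccmp is chaining two tests without a branch in between, such as checking a == 0 and b == 5. The Python sketch below models that pattern; flags are kept as a 4-bit NZCV integer, and both_match is a hypothetical helper that stands for the sequence cmp a, #0 followed by ccmp b, #5, #0, eq.

```python
MASK64 = (1 << 64) - 1
Z_BIT = 0b0100  # position of Z inside the 4-bit NZCV value

def cmp_nzcv(a, b):
    """Return the 4-bit NZCV value produced by cmp a, b on 64-bit registers."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    n = result >> 63
    z = int(result == 0)
    c = int(a >= b)
    v = int((a >> 63) != (b >> 63) and (result >> 63) != (a >> 63))
    return (n << 3) | (z << 2) | (c << 1) | v

def ccmp(cond_holds, rn, op2, nzcv):
    """If the condition holds compare normally, otherwise load the immediate flags."""
    return cmp_nzcv(rn, op2) if cond_holds else nzcv & 0xF

def both_match(a, b):
    """Models: cmp a, #0 ; ccmp b, #5, #0, eq ; then test the eq condition."""
    flags = cmp_nzcv(a, 0)
    flags = ccmp(bool(flags & Z_BIT), b, 5, 0b0000)
    return bool(flags & Z_BIT)

print(both_match(0, 5), both_match(1, 5), both_match(0, 4))  # True False False
```

Choosing #nzcv = 0 makes the final eq test fail whenever the first comparison already failed, which is what gives the sequence its logical AND behaviour.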

Special instructions

Count leading zeros: clz Rn, Rm counts the leading zeros of Rm and stores the result to Rn:
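The result of clz depends on the register width, so the same value gives different counts in a w and in an x register. A minimal Python model, with the width passed as an illustrative parameter:

```python
def clz(value, width=64):
    """Count the zero bits above the most significant set bit."""
    value &= (1 << width) - 1
    return width - value.bit_length()

print(clz(1), clz(1, width=32), clz(0))  # 63 31 64
```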

Move Status to Register or register to status: mrs Rn, status or msr status, Rn:

Supervisor call: svc system_call_number is used to perform a system call. Depending on the operating system, each call has a specific id as it is depicted in the figure below:

full list here: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md

The system_call_number in Linux is always set to 0, while the actual id is stored in x8 and up to six parameters can be passed in x0-x5:

The system call id is stored in x8, x0-x2 stores the three arguments (“/bin/sh”,null,null) and svc 0 will perform the system call

No operation: nop which does nothing, other than advance the value of the program counter by 4. This instruction can be used for instruction alignment purposes.

ARM 64 Assembly Series — Data Processing (Part 1)

+Ch0pin️ — Mon, 01 Aug 2022 09:38:43 GMT

Previous posts: Basic definitions and registers, lab setup, offset and addressing modes, Load And Store, Branch

So far we talked about load, store and branch instructions and it is time to discuss a (long) set of instructions that can be used to process data. To quickly refresh your memory on what has been discussed so far, you can refer to the table below or you can simply navigate to the previous posts by following the links in the subtitle section:

https://comp.anu.edu.au/courses/comp2300/resources/ARM_cheat_sheet/

Condition modifiers

General format and operands

In mathematics, an operand is the object of a mathematical operation. For example, in the addition y + x = a, y and x are the operands and a is the result. Similarly, in ARM assembly most of the data processing instructions require two operands and a destination register:

op Rd, Rn, Rm

op defines the type of the operation, Rd is the result destination register, Rn is the first operand and Rm is the second.

As you probably noticed in the figure above, the first operand enters the ALU directly while the second one may be processed before entering (e.g. shifted as in the example above). Indeed, this flexible second operand, also known as operand2, can be one of the following:

A register with an optional shift or extend operation (LSL, LSR, ASR etc.)
A 12bit immediate value or 13bit pattern (used only for logical instructions)

Let's see some examples:

On the right: operand2 as a pattern for the logical operations AND and ORR. On the left: operand2 as a register to an add instructions and as an immediate to a sub instruction (examples source)

add x1, x4, x5           // x1 = x4 + x5
sub x0, x0, #1           // x0 = x0 - 1
neg x3, x4, lsl #3       // x3 = -(x4 << 3)

We will go into details regarding the shift and extend operations, but for now and for the sake of simplicity we will refer to them as shift_op and extend_op. That being said, here is a summary of the operand2 formats:

https://medium.com/media/dd153dd42f9b366b980a63ae2a0fcdcb/href

Arithmetic operations

The basic arithmetic instructions are add, sub and neg, corresponding to addition, subtraction and negation. In addition to those, we have adc, sbc and ngc, which also take the carry bit of the PSTATE register into account. These (carry) instructions can be used only with unshifted/unextended operands. Finally, an s can be appended to each instruction (adds, subs and so on) in order to affect the flags of the PSTATE:
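One common reason to pair adds with adc is arithmetic on values wider than a register. The Python sketch below models a 128-bit addition built from two 64-bit halves; add128 and the lo/hi naming are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1

def adds(a, b):
    """64-bit add that also returns the carry flag it would set."""
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total >> 64

def adc(a, b, carry):
    """64-bit add with the incoming carry flag included."""
    return (a + b + carry) & MASK64

def add128(a_lo, a_hi, b_lo, b_hi):
    """Models: adds lo, a_lo, b_lo followed by adc hi, a_hi, b_hi."""
    lo, carry = adds(a_lo, b_lo)
    hi = adc(a_hi, b_hi, carry)
    return lo, hi

print(add128(MASK64, 0, 1, 0))  # (0, 1): the carry moves into the upper half
```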

Logical operations

Similarly to the arithmetic operations, the general syntax of the logical operations is op Rd, Rn, operand2 except the bitwise-not which uses only the operand2 and the destination register. As before, appending an s to the instruction can affect the PSTATE flags. Here are the most basic logical operations and their usage:

Examples:

https://medium.com/media/0ff6eb6305456d837e0b877e999f4aa7/href

More on the role of s and the PSTATE

Contrary to AArch32, AArch64 does not allow instructions other than branches to be conditionally executed.

Need to know (n2k): Conditional execution controls whether or not the core will execute an instruction. Most instructions have a condition attribute that determines if the core will execute it based on the setting of the condition flags. Prior to execution, the processor compares the condition attribute with the condition flags in the cpsr. If they match, then the instruction is executed; otherwise the instruction is ignored.

As mentioned before, appending an s to the mnemonic of the operation can be used to set the flags of the PSTATE register, which in conjunction with a branch instruction can be used to modify the flow of a program. Let’s see some examples:

https://cpulator.01xz.net/?sys=arm

As depicted above, since the result of the subtraction is negative, the N (Negative) flag of the program status register (cpsr in this case) will be set to 1. This will make the bmi (branch if minus) instruction set the program counter to the address of the neg label. Z (Zero) denotes another flag of the status register, which is set when a result is equal to zero:

The V (oVerflow) flag is set if the result of a signed addition or subtraction does not fit in the range of the destination. The flag will be set to 1 if overflow occurs and 0 if not:

The C (Carry) flag is used to indicate whether the result of an unsigned operation is not representable. For example, adding 1 to 4294967295 would normally give 4294967296. However, if the destination register can hold only 32 bits, then the result will be zero with the carry flag set:
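The difference between carry and overflow is easiest to see side by side. The following Python sketch models a 32-bit adds and reports all four flags; adds32 is a name made up for this example.

```python
MASK32 = 0xFFFFFFFF

def adds32(a, b):
    """32-bit add returning the result and the N, Z, C, V flags it would set."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(total > MASK32),  # unsigned result not representable
        "V": int((a >> 31) == (b >> 31) and (result >> 31) != (a >> 31)),
    }
    return result, flags

print(adds32(4294967295, 1))  # result 0 with C and Z set, V clear
print(adds32(0x7FFFFFFF, 1))  # result 0x80000000 with N and V set, C clear
```

The first call wraps around as an unsigned value (carry), while the second one flips the sign of a signed value (overflow).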

Move and Shift Operations

The move operations are used to copy data from a register to a register or from an immediate to a register. The particular set includes the following instructions:

https://medium.com/media/a6c47540dbab9d74a23f2059f7bc3593/href

The shift operations are used to shift or rotate the contents of a register. They can be used either as standalone instructions or for flexible second operands similar to the ones we saw above. The standalone syntax is as follows:

op Rd, Rn, Rm

Where op can be either lsl, lsr, asr or ror:

source: https://armkeil.blob.core.windows.net/developer/Files/pdf/graphics-and-multimedia/ARMv8_InstructionSetOverview.pdf

As depicted above, the lsl instruction shifts the contents to the left while padding the shifted positions with 0. A single shift to the left is like multiplying by 2, a double shift is like multiplying by 4 and so on:

Similarly to lsl, the lsr instruction shifts the contents to the right while padding the shifted positions with 0. A single shift to the right is like dividing an unsigned value by 2, a double shift is like dividing by 4 and so on:

The ror instruction will rotate the contents to the right, moving each bit that is shifted out of the least significant position back into the most significant bit of the register:

Rotating 10 bits to right

Finally, the asr instruction will shift a number of bits to the right, padding the vacated positions with copies of the sign bit so that the sign of the value is maintained:
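The four shift operations can be compared quickly with a small Python model. The width parameter is an assumption added so that the examples can use 8-bit values, which are easier to read than full 64-bit patterns.

```python
def lsl(x, n, width=64):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & ((1 << width) - 1)

def lsr(x, n, width=64):
    """Logical shift right: zeros enter from the left."""
    return (x & ((1 << width) - 1)) >> n

def asr(x, n, width=64):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    mask = (1 << width) - 1
    x &= mask
    signed = x - (1 << width) if x >> (width - 1) else x
    return (signed >> n) & mask

def ror(x, n, width=64):
    """Rotate right: bits leaving on the right re-enter on the left."""
    mask = (1 << width) - 1
    x &= mask
    n %= width
    return ((x >> n) | (x << (width - n))) & mask

print(bin(lsl(0b00000011, 2, 8)))  # 0b1100
print(bin(lsr(0b10000000, 1, 8)))  # 0b1000000
print(bin(asr(0b10000000, 1, 8)))  # 0b11000000
print(bin(ror(0b00000001, 1, 8)))  # 0b10000000
```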

That’s all for now, I hope to see you in part 2 of the data processing instructions.

ARM 64 Assembly Series — Branch

+Ch0pin️ — Thu, 21 Jul 2022 10:27:57 GMT

Previous posts: Basic definitions and registers, lab setup, offset and addressing modes, Load And Store

In the previous post we talked about the ldr and str instructions which can be used to transfer data bidirectionally between a memory address and a register (or pair of registers):

Appending b, h or w to the instruction mnemonic indicates an unsigned byte, a half word or a word respectively. Adding an s in front of these letters (sb, sh, sw) will force the cpu to handle the data as signed.

In this post we are going to talk about branch instructions and how they can be used in order to change the address of the next instruction that will be executed.

Branch

Branching is one of the most important concepts in programming as it allows the developer to define alternative code paths depending on various conditions. In high level languages, these conditions are evaluated using control flow statements like the if, for, while or even goto. Similarly, in low level languages there are special instructions that may be used in order to achieve the same result and route the code flow to a different path.

AArch64 defines a set of branch instructions which can be used to perform conditional or unconditional jumps within a function (branch) or calls to other functions (branch and link). Let’s see the most important of them as well as their usage.

Conditional and Unconditional Branches

Starting with the simplest case, a conditional or unconditional branch instruction looks as follows:

b.cond label

It can be interpreted as if cond then pc = new_address. If the cond parameter is omitted (e.g. b label), then it simply sets pc = new_address. The label is an immediate which is encoded as a relative offset to the program counter. This immediate will be sign extended and multiplied by 4 in order to calculate the offset that will be added to the current address of the program counter:

offset = immX * 4

Where X will be 19 bits for conditional branches and 26 bits for unconditional ones. Finally, the cond symbol is a mnemonic which denotes the state of one or more flags of the PSTATE register. The possible values of cond, as well as their meaning in regard to the PSTATE flags, are depicted below:

Table 1: Condition modifiers for AArch64
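The standard A64 condition codes can also be written down as a small lookup, which may be handy when reading disassembly. This is a Python sketch for illustration; condition_holds is a hypothetical helper name and the flags are passed as 0 or 1.

```python
def condition_holds(cond, n, z, c, v):
    """Evaluate an A64 condition code against the N, Z, C and V flags."""
    table = {
        "eq": z == 1,                 # equal
        "ne": z == 0,                 # not equal
        "cs": c == 1,                 # carry set (also written hs)
        "cc": c == 0,                 # carry clear (also written lo)
        "mi": n == 1,                 # minus, negative
        "pl": n == 0,                 # plus, positive or zero
        "vs": v == 1,                 # overflow set
        "vc": v == 0,                 # overflow clear
        "hi": c == 1 and z == 0,      # unsigned higher
        "ls": not (c == 1 and z == 0),  # unsigned lower or same
        "ge": n == v,                 # signed greater than or equal
        "lt": n != v,                 # signed less than
        "gt": z == 0 and n == v,      # signed greater than
        "le": not (z == 0 and n == v),  # signed less than or equal
        "al": True,                   # always
    }
    return table[cond]

# After cmp 1, 2 the flags are N=1, Z=0, C=0, V=0:
print(condition_holds("lt", 1, 0, 0, 0), condition_holds("ge", 1, 0, 0, 0))  # True False
```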

That being said, the instruction bvs checks the overflow flag V in order to decide whether to follow a new code path, while bne checks that the Zero flag is not equal to 1. In the following example, the cmp instruction at line 5 will set the Z flag to 1 if w1 is equal to zero. As a result, the beq will succeed and execution will continue at the address indicated by the foo label. In case w1 is not equal to zero, execution will continue up to line 9, where the unconditional branch will redirect the code flow to the address indicated by the label bar:

https://medium.com/media/2e336e3cd354d7531091ca2d3af2d916/href

Branch to register

In case the address of the next instruction is fetched from a register, the branch instruction has the following forms:

br   Rn     //meaning that pc will be set to Rn
ret  {Rn}   //meaning that pc = Rn, or pc = lr if Rn is omitted

Although the instructions above are self explanatory, it is worth clarifying that in the case of ret the Rn parameter is optional and, if it is omitted, the value will be fetched from the link register.

Branch and link

The main difference from the previous cases is that, before taking the new branch, the address of the next instruction (the current address plus 4) will be copied to the link register:

bl    label   //meaning that lr = pc+4 and pc = new_address*
blr   Rn      //meaning that lr = pc+4 and pc = Rn

*in this case the immediate is 26 bits and multiplied by 4

Compare and branch

These are conditional branches where the decision to continue the execution from a new address depends on the value of the register which is given as parameter. Their general form is depicted below:

cbz  Rn, label           //if Rn == 0 then pc = new_address*
cbnz Rn, label           //if Rn != 0 then pc = new_address*
tbz  Rn, #imm6, label    //if Rn[#imm6] == 0 then pc = new_address** 
tbnz Rn, #imm6, label    //if Rn[#imm6] != 0 then pc = new_address**

*the immediate is 19 bits and multiplied by 4

**the immediate is 14 bits and multiplied by 4
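The immediate widths translate directly into how far each kind of branch can reach. The Python sketch below sign extends an encoded immediate, scales it by 4 and computes the reachable range; all function names are illustrative.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(pc, imm, bits):
    """Target of a pc-relative branch whose immediate field is `bits` wide."""
    return pc + sign_extend(imm, bits) * 4

def branch_range(bits):
    """Smallest and largest byte offset reachable with a `bits`-wide immediate."""
    return -(1 << (bits - 1)) * 4, ((1 << (bits - 1)) - 1) * 4

print(hex(branch_target(0x1000, 0x3FFFFFF, 26)))  # 0xffc: an offset of -4
for bits in (14, 19, 26):                          # tbz/tbnz, b.cond/cbz/cbnz, b/bl
    print(bits, branch_range(bits))
```

This gives roughly ±32KB for tbz and tbnz, ±1MB for conditional and compare branches, and ±128MB for b and bl.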

The #imm6 is an integer ranging from 0 to 63, indicating a specific bit of the register which is given as a parameter. For example, the following instruction checks whether the value contained in X0 is even (bit 0 is clear) and, if so, branches to the address indicated by the label even:

tbz X0, #0, even         //if X0 % 2==0 then pc = even

PC relative address calculation

The adr and adrp instructions can be used to calculate an address associated with a label and store the result to a general register which is given as a parameter. Their general form is as follows:

adr Rn, label and adrp Rn, label

In the first case a 21bit signed immediate is used, resulting in a range of ±1MB around the current address, while in the second case the address has a range of ±4GB and is aligned to a 4KB page, as the 21bit immediate is shifted left by 12 bits and the 12 least significant bits are padded with zeros. As mentioned, the result in both cases is stored to the general purpose register which is given as a parameter.
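The difference between the two instructions comes down to two lines of arithmetic. Here is a Python sketch of the address calculation, assuming the immediate has already been extracted from the instruction encoding; the example addresses are made up.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def adr(pc, imm21):
    """adr: byte offset relative to the current instruction."""
    return pc + sign_extend(imm21, 21)

def adrp(pc, imm21):
    """adrp: page offset relative to the 4KB page of the current instruction."""
    return (pc & ~0xFFF) + (sign_extend(imm21, 21) << 12)

print(hex(adr(0x400123, 0x10)))  # 0x400133
print(hex(adrp(0x400123, 1)))    # 0x401000
```

Because adrp only produces page-aligned addresses, compilers usually follow it with an add or a load that supplies the low 12 bits.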

Synopsis

Here is a summary table to help you keep track of what has been discussed so far in regard to the branch instructions:

Examples

Here is a simple loop and its arm equivalent:

x = 3;

while (x > 17) {
   ++x;
}

And here is a simple C program which makes use of the concepts that we discussed so far:

https://medium.com/media/ed5c929586bf306405dd6bc08a267c34/href

After compiling it, load it to gdb and disassemble its main function:

We will go through each line explaining what the corresponding command is doing:

+0 stp x29, x30, [sp, #-32]!

Push the frame pointer (fp) and link register (lr) to the stack. Before executing this instruction, the stack pointer (sp) points to 0x7ffffff9f0. The instruction will be completed in the following steps:

sp -= 0x20 => sp = 0x7ffffff9d0
Store x29 at 0x7ffffff9d0
Store x30 at 0x7ffffff9d8

+8 mov w1, #0x1 and +12 mov w0, #0xa

The instructions above will prepare the call to the looper function by storing its parameters looper(10,1) to w0 and w1.

+16 bl 0x5555550774

Notice that before branching to 0x5555550774 the program counter points to 0x…07cc:

The bl instruction will first store the return address to the link register thus

lr = pc + 4 => lr= 0x…7d0

And finally take the branch:

Inside the looper function we have the following:

The instructions:

<+0> sub sp, sp, #0x20, <+4> str w0, [sp, #12] and <+8> str w1, [sp, #8]

will set up the stack and push the function’s parameters to it. Similarly, str wzr, [sp, #28] will push the value zero to the stack, which after this instruction will be as follows:

Next comes the actual loop. The w1 register will hold our integer variable i, and so we have the following:

What the green block does is increase the second parameter of the function by one (b++). Indeed, the value at address sp+8 contains this parameter, which is then loaded to w0, increased by one (at +24) and stored back to sp+8. The yellow block does exactly the same thing for the local variable i. Finally, at offsets 44 → 56 the incremented value is loaded to w1, the threshold is loaded to w0, these values are compared and if w1 < w0 the loop continues. When w1 becomes equal to w0 the blt is not taken and the code continues in order to store the return value to w0, restore the sp and use ret to set the program counter to the value stored in the link register:

The last instruction will bring us back to main:

Not much different than the previous call, our printf takes two parameters:

printf("%d\n",k);

As before, the address of "%d\n" will be stored to x0 and the result from looper will be stored to w1:

Finally, after returning from printf and then returning from main, we have the call to exit:

ARM 64 Assembly Series — Load and Store

+Ch0pin️ — Thu, 14 Jul 2022 19:22:38 GMT

Previous posts: Basic definitions and registers, lab setup, offset and addressing modes

As we discussed in the previous post:

The AArch64 architecture supports a single instruction set called A64 which consists of fixed-length 32 bit instructions that can be used to: Load and store data, change the address of the next instruction to be executed, perform arithmetic or logical operations, perform a special operation
AArch64 is a load-store architecture, which means that only load and store instructions can access the memory.
The load register ldr and store register str instructions are used to transfer: bytes (8 bits), half-words (16 bits), words (32 bits) and double words (64 bits) from a memory address to registers or from registers to a memory address.

In this post we are going to cover the load and store instructions and, most importantly, we are going to see how they can be formed in order to carry information about the size of the data that they are operating on. This, in conjunction with the offset and addressing syntax, might seem a little bit confusing in the beginning, but hopefully by the end of this article you will be able to fully understand these concepts.

Loading and Storing Data

The ldr and str instructions can be used to load or store one or a pair of registers at a time. Let’s see the corresponding syntax in each case:

Single register

As the title implies, in this case a single register is used as a source or a destination during a data transfer from or to memory. The basic syntax is as follows:

op{size} Rn, address

The op refers to the instruction mnemonic, which can be ldr or str (capitalisation is optional)
The size refers to the size of the data to be transferred (see below)
The Rn refers to the source or destination register
The address refers to the memory address to or from which the data will be transferred

When the size parameter is omitted, the size of the data to be moved is determined by the symbol which is used to refer to the register (remember, x implies 64 bits and w implies 32 bits).

Let’s see an example to clarify this case:

ldr x1, address    //load 64 bits from address to x1
str x1, address    //store 64 bits from x1 to address

-----------------

ldr w1, address    //load 32 bits from address to w1
str w1, address    //store 32 bits from w1 to address

The size parameter can be used to force a size different from the default. This parameter can be either b, h or w and indicates an unsigned byte, a half word or a word respectively. Finally, adding an s in front of these letters (sb, sh, sw) will force the cpu to handle the data as signed.

Let’s see some examples:

ldrb w1,[x2]       //load a byte from *x2 to w1 (the upper bits of w1 are zeroed)

strh w1,[x2],#3    //store a half word (2 bytes) from w1 to *x2 and set x2 = x2 + 3

ldrsh w0,[x3]      //load a half word (2 bytes) from *x3 to w0 and sign extend it

By sign-extend we mean that the upper bits of the destination are filled with copies of the sign bit of the transferred data, so the value keeps its sign when it gets stored to the destination:

https://armkeil.blob.core.windows.net/developer/Files/pdf/graphics-and-multimedia/ARMv8_InstructionSetOverview.pdf

In the first case (see figure above), the byte 0x8A will be loaded to the w4 (32bits) register and the remaining 3 bytes will be modified in order to indicate that the number is signed. Exactly the same happens in the second case, with the only difference that x4 refers to 64 bits, thus 7 bytes are going to be sign extended. Omitting the s extension (last case) will pad the remaining destination bytes with 0.
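The effect of the s extension on the loaded byte 0x8A can be reproduced with a few lines of Python. This is only a model of the extension step; the width parameter stands for the size of the destination register (32 for w4, 64 for x4).

```python
def ldrb(byte):
    """Zero extension: the upper bits of the destination are cleared."""
    return byte & 0xFF

def ldrsb(byte, width):
    """Sign extension: the upper bits are filled with copies of bit 7."""
    value = byte & 0xFF
    if value & 0x80:
        value |= ((1 << width) - 1) & ~0xFF
    return value

print(hex(ldrsb(0x8A, 32)))  # 0xffffff8a
print(hex(ldrsb(0x8A, 64)))  # 0xffffffffffffff8a
print(hex(ldrb(0x8A)))       # 0x8a
```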

Pair of registers

The ldp and stp instructions can move twice as much data as ldr and str, since they use a pair of registers each time. The general syntax is as follows:

op Rn, Rm, address

This operation can be broken down into the following steps:

Load or store Rn from or to address
Increase address according to the size of Rn (4 bytes for a 32 bit transfer or 8 for a 64 bit transfer)
Load or store the second register from or to the (increased) address

Other than that, the remaining parts of the instruction have the same meaning as in the previous case, so let’s go straight to the examples:

Example 1: *x2 will be loaded to w0 and *(x2 + 4) will be loaded to w1

ldp w0, w1, [x2]

Example 2: sp (the stack pointer) will be set to sp -16 bytes, then x29 will be stored to the address indicated by the sp and x30 will be stored to sp + 8bytes

stp x29,x30, [sp, #-16]!

Example 3: the value stored at the memory address that sp points to will be loaded to x29, the value stored at sp+8bytes will be loaded to x30 and finally sp will be modified to sp+16bytes

ldp x29,x30, [sp], #16

If you ever used a disassembler in the past, then the last two examples may seem familiar, as they are used to save and restore registers on the stack at the start and end of a function:

Disassembling a function with Ghidra
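The push and pop behaviour of stp with pre-indexing and ldp with post-indexing can be modelled with a dictionary acting as memory. The Machine class, its method names and the starting value of sp are all assumptions made for this sketch.

```python
class Machine:
    """Tiny model of a stack pointer plus 64-bit memory cells."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}  # address -> 64-bit value

    def stp_pre(self, first, second, offset):
        """Models: stp first, second, [sp, #offset]!"""
        self.sp += offset               # pre-index: sp is updated first
        self.mem[self.sp] = first
        self.mem[self.sp + 8] = second

    def ldp_post(self, offset):
        """Models: ldp first, second, [sp], #offset"""
        first = self.mem[self.sp]
        second = self.mem[self.sp + 8]
        self.sp += offset               # post-index: sp is updated last
        return first, second

m = Machine(sp=0x7FFFFFFB30)
m.stp_pre(10, 20, -16)
print(hex(m.sp), m.mem[m.sp], m.mem[m.sp + 8])  # 0x7ffffffb20 10 20
print(m.ldp_post(16), hex(m.sp))                # (10, 20) 0x7ffffffb30
```

After the pair of calls, sp is back where it started, which is exactly what a function prologue and epilogue rely on.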

Example

Let us now write a program that demonstrates the instructions we discussed so far. If you have set up your lab, use the following oneliner to start the vm:

qemu-system-aarch64 -m 1024 -M raspi3b -kernel kernel8.img -dtb bcm2710-rpi-3-b-plus.dtb -sd 2022-01-28-raspios-bullseye-arm64.img -append "console=ttyAMA0 root=/dev/mmcblk0p2 rw rootwait rootfstype=ext4" -nographic -device usb-net,netdev=net0 -netdev user,id=net0,hostfwd=tcp::5555-:22

If you haven’t set up your lab yet, you can use this link to do your experiments (unfortunately it doesn’t support ArmV8 yet but it can be very helpful for simple examples). Next, copy the following code:

https://medium.com/media/616722144e96398d0064963b1f465377/href

And compile it with:

$as filename.s -o filename.o && ld filename.o -o filename

In line 2, you see what is called a label, which is something like a function for higher level languages. The _start defines the entry point of the program while the .global is a way to export a function. The instructions at lines 9,10 form a system call (or syscall in short):

syscall() is a small library function that invokes the system
call whose assembly language interface has the specified number
with the specified arguments. Employing syscall() is useful, for
example, when invoking a system call that has no wrapper function
in the C library.

Simply said a system call is like requesting a task from the kernel. These tasks are indexed and identified by an integer which is passed through a special register followed up by a software interrupt instruction, indicated by the svc #0 mnemonic (in the case of AArch64).

syscall conventions depending on the architecture

In our example above, the exit system call for AArch64 is indicated by number 93, so in our case we first mov this value to w8 (the special register we were talking about) and then use svc #0 to perform the call. Let’s load the program to gdb, set a breakpoint at the beginning of the function _start and hit run:

The mov instructions will store the values 10 and 20 to the registers x29 and x30 respectively, so after they get executed you will see the following:

Also, notice that sp is pointing to 0x7ffffffb30, which brings us right to the next instruction, which will first subtract 16 from sp and then store the values 10 and 20 to the stack:

0x7ffffffb20: 0x000000000000000a, 0x7ffffffb28: 0x0000000000000014

The next two instructions will store the values 16 and 11 to x29 and x30:

Next is ldp which, as we said, will restore the previous values of x29 and x30 from the stack and set sp back to 0x7ffffffb30:

Finally, the b exit will branch the execution to the exit function and finish our program:

Food for thought

To make these posts more interactive, here is a challenge until the next post:

Assume the following C statements:

int x[] = {1,2,3,4,5};

x[0] = 6;
x[1] = x[2];
x[3] = x[0];

Write the arm version of it using only ldr, str and mov.

.global _start

_start:
     ldr r0, =x

     @ write your program here

.data 
x: .word 1,2,3,4,5