General Prerequisites

When analyzing binaries, it is important to be able to put what is observed into context. For example, how can CPU instructions be differentiated from data in a binary with a non-standard format? This requires some background knowledge of computer systems in general. I would argue that before any attempt at reverse engineering firmware is made, at least basic familiarity with the following concepts is required:

  • Computer architecture / computer system organization
    • CPU design and function (e.g. registers, the instruction pointer, memory access)
    • memory and the memory hierarchy
    • instruction sets, assembly, opcodes, addressing modes, syntax, mnemonics
    • information representation (binary, hex, endianness)
  • Operating system concepts
    • Virtual memory
    • user mode vs. kernel mode, the kernel, the kernel interface (system calls)
    • process layout in memory – stack, heap, data, instructions
    • executable formats
    • application binary interfaces
    • program entry points
  • Source code to object code transformation
    • compilation, assembly, linking
    • C/C++ programming
    • Assembly programming
    • source-to-assembly construct correlation (e.g. recognition of loop, switch constructs in assembly)
    • disassembly vs decompilation

My advice is the following:

  • read as much as you can: technical specifications, assembly/disassembly, answers to firmware RE questions, research papers, tutorials, blogs, textbooks, manual pages
  • emulate/copy the methodologies employed and approaches taken by pros
  • gain experience as quickly as possible: look at and experiment with many different types of files (executables, image files, compressed files, firmware, etc.), program in assembly to get a feel for it, disassemble many executables

Firmware RE Resources

“Intro to Embedded Reverse Engineering for PC reversers” by Igor Skochinsky provides an overview of what is involved in reversing firmware, and in “Embedded Devices Security: Firmware Reverse Engineering” Jonas Zaddach and Andrei Costin outline a general methodology for reversing firmware, beginning on slide 31.

Look at answers given by pros:

These may be useful or interesting:

devttys0’s blog

ea’s blog

Igor’s blog

IOActive Labs Research blog

sviehb’s blog

Embedded systems often use MIPS or ARM processors, and by extension MIPS or ARM instruction sets. This means that being familiar with MIPS and ARM assembly will be very helpful when analyzing firmware for these systems.

Analyzing the binary

Part 1: Identification of the target device’s architecture

We cannot rely on hearsay to obtain the information required to analyze the firmware. Validity of information about the firmware must be proven by using empirical evidence. It is not enough to have a binary blob from a second-hand source and a processor name from a different question.

1. Identify the target device

Fortunately, in this case it is easy to at least get the device name: SMOK X Cube II. Examining the vendor’s firmware and tools support page shows that there is a real device with that name. The .hex file is bundled with an upgrade tool called “NuMicro ISP Programming Tool” from the Taiwanese semiconductor manufacturer Nuvoton:

~/firmware/e-cig/XCUBE II upgrading tool $ file *
config.ini:                                            ASCII text, with CRLF line terminators
NuMicro ISP Programming Tool.exe:                      PE32 executable (GUI) Intel 80386, for MS Windows
NuMicro ISP Programming Tool User's Guide.pdf:         PDF document, version 1.5
XCUBE II-VIVI-52 (160616)V.1.098(checksum=0x28F9).hex: ASCII text, with CRLF line terminators

This hex file is straight from the manufacturer of the device processor rather than from a second-hand source. It is also a newer version – v1.098 rather than v1.07. I decided to analyze the older firmware version (v1.07) since this is the version of the binary in the question.

[Image: upgrade tool, screenshot 1]

2. Identify the processor

There are some interesting things in the pictures used to describe the upgrade process: the name NuMicro and the acronym ISP in the tool name, the term DataFlash, a reference to something called APROM, and most importantly, the part number: NUC220LE3AN. What “part” is this a number for? A Nuvoton-developed microcontroller based on ARM’s Cortex-M0 processor.

3. Identify the instruction set architecture

Nuvoton is kind enough to freely share technical documentation for the NuMicro NUC220 series, including the datasheet and the technical reference manual, along with various software tools and training materials (click the “Resources” tab at the top of the NUC220LE3AN product page).

From the datasheet, Section 1: “General Description”, page 7 (emphasis mine):

The NuMicro NUC200 Series 32-bit microcontrollers is embedded with the newest ARM® Cortex™-M0 core with a cost equivalent to traditional 8-bit MCU for industrial control and applications requiring rich communication interfaces. The NuMicro NUC200 Series includes NUC200 and NUC220 product lines.

Is this enough information to conclude that the code in the firmware binary consists of 32-bit ARM instructions? No, it is not. Let us look closely at the functional description of the processor (Chapter 6: Functional Description, section 1: ARM Cortex-M0 Core, page 48):

[Images: processor functional description; processor instruction set]

Let us take special note of the following information:

  • The processor can execute Thumb code and is compatible with other Cortex®-M profile processor.

  • ARMv6-M Thumb® instruction set

  • Thumb-2 technology

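Note that the processor is an ARM Cortex-M0 core, not a Cortex-M0+ core. Both implement the ARMv6-M Thumb instruction set, but they are different designs (the Cortex-M0+ has a shorter pipeline and optional features such as an MPU), so it is worth being precise about which one the part uses.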

From ARM’s Cortex-M0 technical reference manual:

The processor implements the ARMv6-M Thumb instruction set, including a number of 32-bit instructions that use Thumb-2 technology. The ARMv6-M instruction set comprises:

  • all of the 16-bit Thumb instructions from ARMv7-M excluding CBZ, CBNZ and IT
  • the 32-bit Thumb instructions BL, DMB, DSB, ISB, MRS and MSR.

What is “Thumb code” and the “Thumb instruction set”?

From “Introduction to ARM thumb” by Joe Lemieux (emphasis mine):

The Thumb instruction set consists of 16-bit instructions that act as a compact shorthand for a subset of the 32-bit instructions of the standard ARM. Every Thumb instruction could instead be executed via the equivalent 32-bit ARM instruction. However, not all ARM instructions are available in the Thumb subset; for example, there’s no way to access status or coprocessor registers. Also, some functions that can be accomplished in a single ARM instruction can only be simulated with a sequence of Thumb instructions.

At this point, you may ask why have two instruction sets in the same CPU? But really the ARM contains only one instruction set: the 32-bit set. When it’s operating in the Thumb state, the processor simply expands the smaller shorthand instructions fetched from memory into their 32-bit equivalents.

The difference between two equivalent instructions lies in how the instructions are fetched and interpreted prior to execution, not in how they function. Since the expansion from 16-bit to 32-bit instruction is accomplished via dedicated hardware within the chip, it doesn’t slow execution even a bit. But the narrower 16-bit instructions do offer memory advantages.

The Thumb instruction set provides most of the functionality required in a typical application. Arithmetic and logical operations, load/store data movements, and conditional and unconditional branches are supported. Based upon the available instruction set, any code written in C could be executed successfully in Thumb state. However, device drivers and exception handlers must often be written at least partly in ARM state.

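Keep in mind that this article describes ARM cores that can switch between an ARM state and a Thumb state. Cortex-M processors, including the Cortex-M0, execute only Thumb instructions and have no ARM state, so on this device exception handlers are Thumb code as well. Here is a good explanation from Stack Overflow: “ARM, Thumb and Thumb 2 instructions confusion”.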

From the ARMv6-M Architecture Reference Manual, Chapter A5: The Thumb Instruction Set Encoding, section 1: Thumb instruction set encoding, page 82:

[Image: Thumb instruction set encoding]
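The encoding rule in that section is simple enough to turn into a tiny length decoder: a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The Python sketch below is only an illustration of that rule, not part of any tool used in this post. It does a naive linear sweep, so it assumes the input starts on an instruction boundary and contains no embedded data. The sample is the first 16 bytes of the hexdump at offset 0x2ed0 shown later in this post.

```python
import struct

# Values of bits [15:11] that mark the first halfword of a 32-bit Thumb
# instruction; any other halfword is a complete 16-bit instruction.
WIDE_PREFIXES = {0b11101, 0b11110, 0b11111}


def thumb_instruction_lengths(code: bytes) -> list[tuple[int, int]]:
    """Split little-endian Thumb code into (offset, length) pairs.

    Naive linear sweep: assumes `code` starts on an instruction boundary
    and contains no embedded data (literal pools, strings).
    """
    result = []
    offset = 0
    while offset + 2 <= len(code):
        (halfword,) = struct.unpack_from("<H", code, offset)  # little-endian
        if (halfword >> 11) in WIDE_PREFIXES:
            if offset + 4 > len(code):
                break  # truncated 32-bit instruction at the end of the buffer
            length = 4
        else:
            length = 2
        result.append((offset, length))
        offset += length
    return result


# First 16 bytes of the hexdump at offset 0x2ed0.
BASE = 0x2ED0
sample = bytes.fromhex("01211b20fdf76efe2146386a09f016fd")
for off, length in thumb_instruction_lengths(sample):
    chunk = sample[off:off + length].hex(" ")
    print(f"{BASE + off:08x}: {chunk:<12} ({length * 8}-bit)")
```

In this sample the two 32-bit instructions (fd f7 6e fe and 09 f0 16 fd) decode as BL (branch with link), so the two MOVS instructions before the first one (r1 = 1, r0 = 0x1b) plausibly set up arguments for a function call.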


The NuMicro NUC200 Series only supports little-endian data format.
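To see what little-endian means for the raw bytes, the Python sketch below reads a few 4-byte words copied from the firmware hexdumps later in this post (offsets 0x2f34, 0x2fd4, 0x2fe0 and 0x2fe4) in both byte orders. Read little-endian, three of them become round decimal numbers (10000000, 100000 and 10000, plausibly constants used when formatting numbers for the display, though that is only a guess) and the fourth becomes 0x200001ac, an address in the 0x2000_0000 region that ARMv6-M reserves for SRAM. Read big-endian, none of them make obvious sense.

```python
# 4-byte words copied from the hexdumps in this post, keyed by file offset.
WORDS = {
    0x2F34: bytes.fromhex("ac010020"),
    0x2FD4: bytes.fromhex("80969800"),
    0x2FE0: bytes.fromhex("a0860100"),
    0x2FE4: bytes.fromhex("10270000"),
}


def read_u32(raw: bytes, little_endian: bool = True) -> int:
    """Interpret 4 raw bytes as an unsigned 32-bit integer."""
    return int.from_bytes(raw, "little" if little_endian else "big")


for offset, raw in sorted(WORDS.items()):
    le, be = read_u32(raw), read_u32(raw, little_endian=False)
    print(f"{offset:#06x}  {raw.hex(' ')}  LE={le:#010x} ({le})  BE={be:#010x}")
```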

[Image: system memory map]

To summarize: the code in the firmware binary will consist of little-endian Thumb instructions, mostly 16 bits wide plus a handful of 32-bit Thumb-2 instructions (BL, DMB, DSB, ISB, MRS and MSR), executed by a 32-bit ARM Cortex-M0 processor that implements the ARMv6-M Thumb instruction set.

4. Identify the device’s memory layout

Access to the technical reference manual allows us to determine what APROM and ISP are. From Chapter 6: Functional Description, section 4.4.1: Flash Memory Organization, page 191:

The NuMicro NUC200 Series flash memory consists of program memory (APROM), Data Flash, ISP loader program memory (LDROM), and user configuration. Program memory is main memory for user applications and called APROM. User can write their application to APROM and set system to boot from APROM.

ISP loader program memory is designed for a loader to implement In-System-Programming function. LDROM is independent to APROM and system can also be set to boot from LDROM. Therefore, user can user LDROM to avoid system boot fail when code of APROM was corrupted.

And from Chapter 6: Functional Description, section 4.4.5: In-System-Programming (ISP), page 199:

ISP provides the ability to update system firmware on board. Various peripheral interfaces let ISP loader in LDROM to receive new program code easily. The most common method to perform ISP is via UART along with the ISP loader in LDROM. General speaking, PC transfers the new APROM code through serial port. Then ISP loader receives it and re-programs into APROM through ISP commands.

According to the config.ini file bundled with the NuMicro ISP Programming Tool, the APROM segment of flash memory is 128 KB:

$ cat config.ini | grep NUC200LE3AN -B2 -A3


Here is a diagram of the flash memory address map:

[Image: flash memory address map]

We know that the space from 0x0000_0000 to 0x0001_FFFF spans 0x20000 = 131072 bytes, which is 128 KB, and this is the region to which the upgrade tool will flash the binary from the hex file. Above that there is a block of memory from 0x0002_0000 to 0x0010_0000 which is labeled “Reserved for Further Used”. The size of this “Reserved” space is 0x0010_0000 – 0x0002_0000 = 0xE0000, or 917504 bytes (896 KB), which is almost 1 megabyte of reserved space. The 128 KB reserved for APROM makes up just 12.5% of the address space between 0x0000_0000 and 0x0010_0000, yet the diagram draws it larger than the ~1 MB “Reserved” block, so the diagram is clearly not drawn to scale. There is also no documentation of this reserved block anywhere in the technical reference manual that I could find. If one had physical access to the device, perhaps the contents of flash memory could be dumped and analyzed to find out what lies in this region.
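Whether the image in the .hex file really stays inside the APROM range can be checked rather than assumed. Below is a minimal Intel HEX reader in Python, a sketch rather than a complete implementation: it handles data (00), end-of-file (01), extended segment address (02) and extended linear address (04) records, verifies each record's checksum, and skips start-address records (03, 05). APROM_SIZE is the 128 KB figure from config.ini; filling gaps with 0xFF (the value of erased flash) is an assumption.

```python
APROM_SIZE = 128 * 1024  # 0x0000_0000-0x0001_FFFF, per config.ini


def parse_intel_hex(text: str) -> dict[int, int]:
    """Return a sparse {address: byte} map built from Intel HEX records."""
    memory: dict[int, int] = {}
    base = 0  # upper address bits set by type 02/04 records
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':' start code")
        record = bytes.fromhex(line[1:])
        if len(record) < 5 or len(record) != record[0] + 5:
            raise ValueError(f"line {lineno}: bad record length")
        if sum(record) & 0xFF:  # all bytes incl. checksum must sum to 0 mod 256
            raise ValueError(f"line {lineno}: checksum mismatch")
        offset = int.from_bytes(record[1:3], "big")
        rtype = record[3]
        data = record[4:-1]
        if rtype == 0x00:  # data record
            for i, value in enumerate(data):
                memory[base + offset + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address (paragraph << 4)
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address (upper 16 bits)
            base = int.from_bytes(data, "big") << 16
        # 0x03 and 0x05 (start address) carry no image bytes
    return memory


def to_image(memory: dict[int, int], fill: int = 0xFF) -> tuple[int, bytes]:
    """Flatten the sparse map into (start_address, contiguous bytes)."""
    if not memory:
        return 0, b""
    start, end = min(memory), max(memory) + 1
    image = bytearray([fill]) * (end - start)
    for address, value in memory.items():
        image[address - start] = value
    return start, bytes(image)


def fits_in_aprom(memory: dict[int, int]) -> bool:
    """True if every byte lands in the 128 KB APROM window."""
    return all(0 <= address < APROM_SIZE for address in memory)
```

For the firmware discussed here, something like `mem = parse_intel_hex(open("SMOK_X_CUBE_II_firmware_v1.07.hex").read())` followed by `hex(min(mem)), hex(max(mem)), fits_in_aprom(mem)` would show where the image starts and ends and whether it stays below 0x0002_0000; no output is reproduced here.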

Since the firmware binary is written to space in flash memory reserved for user applications, it seems unlikely that the firmware binary contains kernel code, bootloader code or a filesystem. This is different from router firmware, which tends to at the very least contain kernel code.
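One more empirical check is possible on the raw image. On ARMv6-M processors such as the Cortex-M0, the vector table lives at address 0x0000_0000: its first word is the initial value of the main stack pointer and its second word is the address of the reset handler, with bit 0 set to indicate Thumb state. If the binary really is a bare-metal application linked to run from the start of APROM, its first eight bytes should look like that. The plausibility rules in this Python sketch (stack pointer inside the 0x2000_0000 SRAM region, reset handler inside the 128 KB APROM) are heuristics, not requirements taken from the Nuvoton documentation, and the test values are made up.

```python
import struct

SRAM_START, SRAM_END = 0x2000_0000, 0x4000_0000  # ARMv6-M SRAM region
APROM_SIZE = 128 * 1024                          # from config.ini


def looks_like_vector_table(image: bytes) -> bool:
    """Heuristic check of the first two words of a Cortex-M0 image."""
    if len(image) < 8:
        return False
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    # The stack grows down, so the initial SP may equal the end of SRAM.
    sp_ok = SRAM_START < initial_sp <= SRAM_END and initial_sp % 4 == 0
    # Bit 0 set = Thumb state; the remaining bits must point into APROM.
    reset_ok = (reset_vector & 1) == 1 and (reset_vector & ~1) < APROM_SIZE
    return sp_ok and reset_ok
```

A True result for the start of the flattened firmware image would support the idea that the file is an application linked at 0x0000_0000; a False result would point to a different load address or some kind of header in front of the code.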

Part 2: Direct analysis of the binary

Quick recap of what we know at this point:

  • The device name – SMOK X Cube II
  • The processor – a Nuvoton NuMicro NUC220LE3AN microcontroller, built around an ARM Cortex-M0 core
  • The instruction set architecture – little-endian ARMv6-M Thumb (mostly 16-bit instructions plus a few 32-bit Thumb-2 instructions)
  • The location in flash memory to which the firmware will be written – the 128 KB APROM region for user applications (in other words, not kernel or bootloader code)
  • Nuvoton, the company behind NuMicro, is based in Taiwan. We will see why this is potentially relevant shortly.
  • The binwalk entropy plot included in the question suggests that there are no encrypted or compressed regions in the firmware
  • Based on information included in the question, there exist ASCII strings embedded in the file that appear to be related to the functionality of the device
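binwalk's entropy plot (`binwalk -E`) is essentially Shannon entropy computed over fixed-size blocks. A rough Python equivalent is sketched below so the check can be repeated without binwalk; the 1024-byte block size and the 7.5 bits-per-byte threshold for flagging a block as "looks compressed or encrypted" are illustrative assumptions, not binwalk's exact settings.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 = constant, 8.0 = maximum)."""
    if not block:
        return 0.0
    total = len(block)
    h = -sum((n / total) * math.log2(n / total) for n in Counter(block).values())
    return max(0.0, h)  # avoid reporting -0.0 for constant blocks


def entropy_profile(data: bytes, block_size: int = 1024, threshold: float = 7.5):
    """Yield (offset, entropy, above_threshold) for each block of data."""
    for offset in range(0, len(data), block_size):
        h = shannon_entropy(data[offset:offset + block_size])
        yield offset, h, h > threshold
```

Compressed or encrypted data approaches 8 bits per byte, while machine code and text usually score noticeably lower, which is why a plot without a long plateau near the top is taken as a sign that nothing is packed.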

Potential complications:

  • Firmware binaries do not necessarily have a standard format the way executables (e.g. ELF or PE files) do.
  • Data may be intermingled with code/instructions within the binary. If so, data such as strings may be disassembled as instructions, resulting in an incorrect representation of the firmware’s code.

Preliminary analysis

1. strings and hexdump

The output of strings can serve as a quick heuristic for determining whether the firmware is encrypted or compressed: if there are no strings in the output, that is a good indicator that the entire file is obfuscated somehow. hexdump with the -C argument can provide some context for the strings, i.e., where in the binary they are relative to code and relative to each other. In other words, are the strings packed together in a single block, or are they scattered throughout the binary? The answer can provide clues about the layout of the firmware.
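GNU strings can print the offset of each string itself (`strings -t x`), but the clustering question is also easy to answer with a few lines of Python. The sketch below finds runs of at least four printable ASCII bytes (similar to strings' default minimum length) and groups runs that start within a small gap of the previous one's end; the 64-byte gap is an arbitrary choice for illustration.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")  # 4+ printable ASCII bytes


def find_strings(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, text) for each printable ASCII run."""
    return [(m.start(), m.group().decode("ascii")) for m in PRINTABLE_RUN.finditer(data)]


def cluster(strings: list[tuple[int, str]], max_gap: int = 64) -> list[list[tuple[int, str]]]:
    """Group strings that start within max_gap bytes of the previous string's end."""
    groups: list[list[tuple[int, str]]] = []
    prev_end = None
    for offset, text in strings:
        if prev_end is None or offset - prev_end > max_gap:
            groups.append([])  # gap too large: start a new cluster
        groups[-1].append((offset, text))
        prev_end = offset + len(text)
    return groups
```

Printing `len(group)` and the first offset of each group gives a quick picture of whether the strings sit in one block or in many small islands.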

Using hexdump, we see that the ASCII strings are intermingled with what might be code:

00002ed0  01 21 1b 20 fd f7 6e fe  21 46 38 6a 09 f0 16 fd  |.!. ..n.!F8j....|
00002ee0  64 21 09 f0 13 fd 08 46  0a 21 09 f0 0f fd 10 30  |d!.....F.!.....0|
00002ef0  14 21 48 43 42 19 01 21  25 20 fd f7 5b fe 73 e0  |.!HCB..!% ..[.s.|
00002f00  68 e2 88 e0 57 41 54 54  0a 00 00 00 4d 4f 44 45  |h...WATT....MODE|
00002f10  0a 00 00 00 7c db 00 00  88 db 00 00 54 45 4d 50  |....|.......TEMP|
00002f20  0a 00 00 00 4d 45 4d 4f  52 59 0a 00 20 4d 4f 44  |....MEMORY.. MOD|
00002f30  45 20 0a 00 ac 01 00 20  53 54 52 45 4e 47 54 48  |E ..... STRENGTH|
00002f40  0a 00 00 00 3c 0b 00 20  20 4d 49 4e 20 0a 00 00  |....<..  MIN ...|
00002f50  53 4f 46 54 0a 00 00 00  4e 4f 52 4d 0a 00 00 00  |SOFT....NORM....|
00002f60  48 41 52 44 0a 00 00 00  20 4d 41 58 20 0a 00 00  |HARD.... MAX ...|
00002f70  ea cf 00 00 42 4c 55 45  54 4f 4f 54 48 0a 00 00  |....BLUETOOTH...|
00002f80  20 20 20 4f 4e 20 20 20  20 0a 00 00 20 20 20 4f  |   ON    ...   O|
00002f90  46 46 20 20 20 0a 00 00  ea d0 00 00 20 20 20 4c  |FF   .......   L|
00002fa0  45 44 20 20 20 0a 00 00  6a d1 00 00 53 54 45 41  |ED   ...j...STEA|
00002fb0  4c 54 48 0a 00 00 00 00  20 4f 46 46 20 20 0a 00  |LTH..... OFF  ..|
00002fc0  20 20 4f 4e 20 20 0a 00  20 20 54 4f 44 41 59 20  |  ON  ..  TODAY |
00002fd0  20 0a 00 00 80 96 98 00  f6 e1 00 00 83 e5 00 00  | ...............|
00002fe0  a0 86 01 00 10 27 00 00  21 46 38 6a 09 f0 8e fc  |.....'..!F8j....|
00002ff0  0a 21 09 f0 8b fc 10 31  14 20 41 43 4a 19 01 21  |.!.....1. ACJ..!|

another group of ASCII strings elsewhere in the binary:

00004f70  84 e0 04 f0 40 fe 00 28  13 d0 00 20 03 f0 ec ff  |....@..(... ....|
00004f80  1e 49 80 31 08 69 88 61  35 4a 90 42 00 d3 8c 61  |.I.1.i.a5J.B...a|
00004f90  88 69 08 62 33 48 06 23  04 22 00 90 19 46 00 20  |.i.b3H.#."...F. |
00004fa0  62 e0 6b e0 20 43 48 45  43 4b 20 20 0a 00 00 00  |b.k. CHECK  ....|
00004fb0  41 54 4f 4d 49 5a 45 52  0a 00 00 00 f6 e0 00 00  |ATOMIZER........|
00004fc0  28 03 00 20 ac 01 00 20  7a e0 00 00 20 20 43 48  |(.. ... z...  CH|
00004fd0  45 43 4b 20 20 0a 00 00  10 4b 00 00 ba e0 00 00  |ECK  ....K......|
00004fe0  44 4f 4e 27 54 0a 00 00  41 42 55 53 45 0a 00 00  |DON'T...ABUSE...|
00004ff0  50 52 4f 54 45 43 54 53  21 0a 00 00 3c 0b 00 20  |PROTECTS!...<.. |
00005000  20 57 41 54 54 20 0a 00  2c 2f 00 00 60 ea 00 00  | WATT ..,/..`...|
00005010  36 e1 00 00 2d 53 48 4f  52 54 2d 20 0a 00 00 00  |6...-SHORT- ....|
00005020  b2 eb 00 00 88 13 00 00  20 53 48 4f 52 54 20 20  |........ SHORT  |
00005030  0a 00 00 00 81 0b 00 00  49 53 20 4e 45 57 0a 00  |........IS NEW..|
00005040  43 4f 49 4c 3f 20 0a 00  59 0a 00 00 4e 0a 00 00  |COIL? ..Y...N...|
00005050  7c db 00 00 88 db 00 00  dc 05 00 00 a0 db 00 00  ||...............|
00005060  0f 27 00 00 94 db 00 00  fb f7 e0 fd 28 46 fd f7  |.'..........(F..|
00005070  a1 f8 fb f7 f0 fe 07 20  fd f7 08 fb af 20 fb f7  |....... ..... ..|
00005080  2f ff 00 20 fb f7 30 ff  38 bd ff 49 08 60 70 47  |/.. ..0.8..I.`pG|
00005090  fe 49 88 72 70 47 fd 48  80 7a 70 47 10 b5 13 24  |.I.rpG.H.zpG...$|

more ASCII strings elsewhere:

00005490  44 2f 00 00 34 0c 00 20  a0 db 00 00 88 db 00 00  |D/..4.. ........|
000054a0  94 db 00 00 7c db 00 00  ea d5 00 00 36 0a 00 00  |....|.......6...|
000054b0  2e 0a 00 00 50 4f 57 45  52 0a 00 00 20 4f 46 46  |....POWER... OFF|
000054c0  20 0a 00 00 20 20 4f 4e  20 0a 00 00 e7 03 00 00  | ...  ON .......|
000054d0  0f 27 00 00 9f 86 01 00  33 08 00 00 5f db 00 00  |.'......3..._...|
000054e0  fb f7 a4 fb fd 49 20 68  07 f0 10 fa 7d 27 08 46  |.....I h....}'.F|
000054f0  ff 00 39 46 07 f0 0a fa  f9 4e 00 01 80 19 01 22  |..9F.....N....."|

There are several more such clusters of ASCII strings in different parts of the file. Some of the ASCII strings are mentioned in the product manual:

[Image: strings in the product manual]

However, many of the ASCII strings in the binary are not mentioned in the manual, such as these:

00009d00  21 b0 f0 bd 00 01 00 50  00 ff 01 00 b4 ed 00 00  |!......P........|
00009d10  43 12 67 00 45 52 52 4f  52 3a 20 20 20 0a 00 00  |C.g.ERROR:   ...|
00009d20  4e 4f 20 53 45 43 52 45  54 0a 00 00 2d 4b 45 59  |NO SECRET...-KEY|
00009d30  21 20 20 20 20 0a 00 00  ef 48 00 68 c0 07 c0 0f  |!    ....H.h....|

Visualization of the binary also shows that byte sequences that fall within the ASCII range are scattered throughout the binary (blue is ASCII):

[Image: binary visualization by byte class]

2. Taking the locale the firmware was developed in into consideration

The firmware, the upgrade tool and the microcontroller are all developed by Nuvoton, a Taiwanese company. Perhaps there are sequences of traditional Chinese characters in the binary as well.

By default, strings searches for ASCII character sequences, and the -C option of hexdump prints bytes within the ASCII range as ASCII characters. But what if the binary contains Unicode-encoded strings in addition to ASCII-encoded ones? Radare2 can search for strings in the hex file directly rather than relying on the output of a different tool (hexdump is pretty flexible, but radare2 is faster for this). The izz command searches for strings throughout the entire binary:

$ r2 ihex://SMOK_X_CUBE_II_firmware_v1.07.hex
 -- I am Pentium of Borg. Division is futile. You will be approximated.
[0x00000000]> izz
Do you want to print 1444 lines? (y/N)   <--- enter "y", obviously

This has some potentially interesting results:

vaddr=0x0000aa95 paddr=0x0000aa95 ordinal=1093 sz=28 len=13 section=unknown type=wide string=h(胐恇ԇӕ栠だi(胐⁇ԇ
vaddr=0x0000aab5 paddr=0x0000aab5 ordinal=1094 sz=54 len=26 section=unknown type=wide string=i(胐ⱇ潩ᄆHhШ⣐ࡉ⡀ѡ⣠ũड蠅⡃灡h(胐
vaddr=0x0000aaef paddr=0x0000aaef ordinal=1095 sz=10 len=4 section=unknown type=wide string=Hh̨⣐
vaddr=0x0000ab07 paddr=0x0000ab07 ordinal=1096 sz=62 len=30 section=unknown type=wide string=h(胐ᄆ탕HhШ棐칩ࡉ桀ѡ棠ũड蠅桃灡i(胐༂웕Hh̨棐
vaddr=0x0000ab53 paddr=0x0000ab53 ordinal=1097 sz=70 len=34 section=unknown type=wide string=i(胐삵汍쁨ԇǐ栠だh(胐ꁇԇ˕栠끠h(胐恇ԇӕ栠だi(胐⁇ԇ
vaddr=0x0000ab9d paddr=0x0000ab9d ordinal=1098 sz=58 len=28 section=unknown type=wide string=i(胐ⱇ潩ᄆ꧕HhШ⣐ꡩࡉ⡀ѡ⣠ũड蠅⡃灡h(胐ꈂཌ
vaddr=0x0000abd7 paddr=0x0000abd7 ordinal=1099 sz=10 len=4 section=unknown type=wide string=Hh̨⣐
vaddr=0x0000abef paddr=0x0000abef ordinal=1100 sz=62 len=30 section=unknown type=wide string=h(胐ᄆ雕HhШ棐鑩ࡉ桀ѡ棠ũड蠅桃灡i(胐༂賕Hh̨棐
vaddr=0x0000ac3b paddr=0x0000ac3b ordinal=1101 sz=22 len=10 section=unknown type=wide string=i(胐袽腈ཨሢᄅ腃

I cannot read these characters, so I do not know what language they are from. Maybe it is just gibberish.
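One plausible explanation is that these are not text at all. izz labels them type=wide, meaning r2 read pairs of bytes as 16-bit (UTF-16LE) characters. Thumb halfwords in the range 0x3400–0x9FFF (which covers ADDS/SUBS with an 8-bit immediate, register data-processing instructions, and most single-register loads and stores) decode as characters in the CJK Extension A (U+3400–U+4DBF) and CJK Unified Ideographs (U+4E00–U+9FFF) blocks, apart from the small Yijing Hexagram Symbols block at U+4DC0–U+4DFF. The Python sketch below, an illustration rather than r2's actual algorithm, reinterprets 16 bytes of what appears to be Thumb code from the hexdump at offset 0x4f80 as UTF-16LE code units and measures how many fall in those two blocks.

```python
import struct

CJK_RANGES = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF)]  # CJK Ext. A + Unified Ideographs


def as_utf16le_chars(data: bytes) -> str:
    """Treat each little-endian halfword as one BMP code point (surrogate
    pairs are not combined, which is enough for this illustration)."""
    usable = len(data) - len(data) % 2
    return "".join(chr(h) for (h,) in struct.iter_unpack("<H", data[:usable]))


def cjk_share(data: bytes) -> float:
    """Fraction of 16-bit code units that land in the CJK blocks above."""
    chars = as_utf16le_chars(data)
    if not chars:
        return 0.0
    hits = sum(any(lo <= ord(c) <= hi for lo, hi in CJK_RANGES) for c in chars)
    return hits / len(chars)


# Bytes from the hexdump at offset 0x4f80 (mostly LDR/STR instructions).
code = bytes.fromhex("1e498031086988613" "54a904200d38c61")
print(" ".join(f"U+{ord(c):04X}" for c in as_utf16le_chars(code)))
print(f"CJK share: {cjk_share(code):.0%}")  # 6 of 8 code units -> 75%
```

The same thing happens to ordinary ASCII: the bytes of “WATT” (57 41 54 54) read as UTF-16LE are U+4157 and U+5454, both ideographs.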

3. Using a hex editor

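A hex editor with a GUI can be used to quickly search for patterns in the data. For example, the byte 0A looks like it is used as a terminating character for ASCII strings (each string in the hexdumps above ends in 0A, followed by 00 bytes that pad it out to a 4-byte boundary):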

[Image: 0A as the ASCII string terminating character]
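That layout (printable text, a 0A byte, then 00 bytes up to the next 4-byte boundary) is regular enough to search for directly. The Python sketch below is a heuristic based only on the pattern visible in the hexdumps above: at each 4-byte-aligned offset it tries to match printable ASCII followed by `0A 00`, and returns (start, end, text) with `end` including the padding, so the ranges could later be treated as data. The 64-character upper limit is an arbitrary assumption. The test data is the 80 bytes at 0x2f00–0x2f4f from the first hexdump.

```python
import re

# Printable ASCII, then a line feed, then a NUL (the pattern seen in the dumps).
TERMINATED = re.compile(rb"[\x20-\x7e]{1,64}\n\x00")


def aligned_strings(data: bytes, alignment: int = 4) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for aligned, '\\n\\0'-terminated strings."""
    found = []
    pos = 0
    while pos < len(data):
        m = TERMINATED.match(data, pos)
        if m is None:
            pos += alignment  # strings in these dumps start on 4-byte boundaries
            continue
        end = m.end()
        while end % alignment and end < len(data) and data[end] == 0:
            end += 1  # swallow NUL padding up to the next boundary
        found.append((pos, end, m.group()[:-2].decode("ascii")))
        pos = (end + alignment - 1) // alignment * alignment
    return found
```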


So how should the binary be disassembled using r2? Are there any special arguments or commands for 16-bit ARM Thumb instructions plus some 32-bit Thumb-2 instructions?

From “How to disassemble to ARM UAL?”:

-b16 is asumed for thumb, not because the instruction size or the register size. Its an exception to make things simpler. Because its just a mode of the cpu.

-b16 sets thumb2 mode in capstone disassembler (as well as in gnu). Thumb2 contains 2 byte and 4 byte instruction lengths. Thumb was only 2. But thumb and thumb2 are binarynl compatible, so it makes sense to use thumb2 here, unless the cpu doesnt supports it.

From what i understand from ual is that this ist just a syntax, and this symtax should be ready in capstone.

Capstone knows nothing about code or data. It just disassembles.

To disassemble the file properly, it is critical to specify the correct architecture and bit width:

-b bits force asm.bits (16, 32, 64)

For this firmware binary, -b 16 should be used, not -b 32:

$ r2 -a arm -b 16 ihex://SMOK_X_CUBE_II_firmware_v1.07.hex

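If -b 32 is used, r2 decodes the bytes as 32-bit ARM (A32) instructions, which the Cortex-M0 cannot execute at all, so quite a few byte sequences come out as invalid or nonsensical instructions:

[Image: invalid disassembly]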

For reference, here is the disassembly beginning at the same offset, 0x1e8, decoded as 16-bit Thumb (-b 16):

[Image: less invalid disassembly]

Obviously this is totally different.

It is important to emphasize that the entire binary will be disassembled as executable code, including data such as the ASCII and Unicode byte sequences. This must be taken into consideration when analyzing the disassembled output.
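One way to take this into account is to decide up front which ranges are data and treat only the gaps between them as code. The Python sketch below is a hypothetical helper, not an r2 feature: given data ranges (for example, string ranges found by searching for the 0A 00 pattern), it returns the remaining half-open ranges, trimmed to 2-byte boundaries because Thumb instructions are halfword-aligned. Literal pools, such as the 32-bit constants between the strings, would still have to be identified separately.

```python
def code_ranges(total_size: int, data_ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return half-open [start, end) ranges not covered by data_ranges.

    Starts are rounded up and ends rounded down to even offsets, since
    Thumb instructions are always 2-byte aligned.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(data_ranges):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping/adjacent data ranges merge
    if cursor < total_size:
        gaps.append((cursor, total_size))
    aligned = []
    for start, end in gaps:
        start, end = start + (start & 1), end & ~1
        if end > start:
            aligned.append((start, end))
    return aligned
```

Each resulting range could then be disassembled on its own, for example with r2's `pD` command, which disassembles a given number of bytes (`pD <bytes> @ <offset>`).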

To analyze the disassembled code, one must be familiar with ARM assembly.

Additional Considerations

  • The ISP upgrade tool is an MS Windows PE32 executable. It could be reverse engineered to determine how the flashing process takes place.
  • Physical access to the microcontroller could be useful. The entire contents of flash memory could be dumped and analyzed, which would also show exactly how everything is laid out in flash memory.
  • If known-good blocks of code can be isolated, it may be possible to decompile them.


Hopefully the approach used here proves useful for your future firmware RE endeavors. Analyzing firmware poses its own set of challenges because of the close relationship between firmware and the hardware it is designed to be embedded in. Since the design and architecture of the device determine the layout and content of the firmware, firmware sometimes cannot be reversed without access to the device, or at the very least without knowing the device's instruction set architecture.

#infomagnum #cyberfit #reverseengineering #re
