Introduction to Assembly Language
Hello friends, let's continue our tutorial on reverse engineering. Today I will teach you the assembly language basics that are necessary for learning reverse engineering. As we all know, assembly language is very important for reverse engineering: we must know what registers are, which register serves which purpose, how assembly language instructions work, and how we can relate them to ordinary high-level language code (C, Java, VB, etc.) to hack any software.
What is Assembly language?
Assembly language is a low-level language that is a human-readable representation of machine language: each assembly instruction generally maps to one machine instruction. Assembly language is specific to the processor architecture; for example, x86 assembly is different from SPARC assembly. Assembly code is built from assembly instructions that operate on CPU registers. Assembly language is too big a topic to cover completely, so I will tell you only what you need for reverse engineering, starting with CPU registers.
CPU registers - Brief Introduction:
First of all, what are registers? Most computer engineering and electronics engineering students know about them, but for everyone else: registers are small storage locations inside the CPU that are used to hold temporary data. Some registers have specific functions; others are just used for general data storage. I am assuming that you are all using x86 machines. There are two types of processors, 32-bit and 64-bit. In a 32-bit processor, each general-purpose register can hold 32 bits of data; in a 64-bit processor, each can hold 64 bits. This tutorial assumes a 32-bit processor. I will explain the same for 64 bits in later classes on hackguide4u and hackingloops.
There are several kinds of registers, but for reverse engineering we are mainly interested in nine of them: the eight general-purpose registers plus the instruction pointer (EIP):
EAX
EBX
ECX
EDX
ESI
EDI
ESP
EBP
EIP
All these registers serve different purposes, so I will explain them one by one for a clearer and more accurate understanding of register concepts. I am putting extra stress on them because these registers are the heart of reverse engineering.
EAX is the accumulator register, which is used to store the results of calculations. If a function returns a value, it is stored in the EAX register, so the calling code reads EAX after the call to retrieve the return value.
Note: the EAX register can also hold ordinary values that have nothing to do with calculations.
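EDX is the data register. It is basically an extension of EAX that assists it in storing extra data for complex operations; for example, the 32-bit mul instruction stores the high 32 bits of its 64-bit product in EDX and the low 32 bits in EAX (written EDX:EAX). It can also be used for general-purpose data storage.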
ECX, also called the count register, holds the counter for looping operations. The repeated operations could be storing a string or counting numbers.
ESI and EDI are relied upon by loops that process data. ESI is the source index: it holds the location of the input data stream. EDI is the destination index: it points to the location where the result of the data operation is stored.
ESP is the stack pointer and EBP is the base pointer. These registers are used for managing function calls and stack operations. When a function is called, its arguments are pushed onto the stack, followed by the return address. ESP points to the very top of the stack, so immediately after the call it points to the return address. EBP points to the base of the current function's stack frame, giving the function a fixed reference point for accessing its arguments and local variables.
EBX, the base register, has no dedicated role in most instructions, so it is commonly used for extra storage.
EIP is the instruction pointer: it holds the address of the next instruction to be executed. As the CPU moves through the binary executing code, EIP is updated to reflect where execution is occurring.
The 'E' at the beginning of each register name stands for Extended. When a register is referred to by its extended name, all 32 bits of the register are being addressed. An interesting thing about registers is that they can be broken down into smaller subsets of themselves: the lower sixteen bits of each register can be referenced by simply removing the 'E' from the name. For example, if you wanted to manipulate only the lower sixteen bits of the EAX register, you would refer to it as the AX register. Additionally, the registers AX, BX, CX and DX can each be further broken down into two eight-bit parts. So, if you wanted to manipulate only the low eight bits (bits 0-7) of the AX register, you would refer to the register as AL; if you wanted to manipulate the high eight bits (bits 8-15) of the AX register, you would refer to it as AH ('L' standing for Low and 'H' standing for High).
Introduction to Memory and Stacks:
There are three main sections of memory:
1. Stack Section - Where the stack is located, stores local variables and function arguments.
2. Data Section - Where the heap is located, stores static and dynamic variables.
3. Code Section - Where the actual program instructions are located.
The stack section starts at the high memory addresses and grows downwards, towards the lower memory addresses; conversely, the data section (heap) starts at the lower memory addresses and grows upwards, towards the high memory addresses. Therefore, the stack and the heap grow towards each other as more variables are placed in each of those sections, as shown in the figure below.
High Memory Addresses (0xFFFFFFFF)
-----------------------  <----- Bottom of the stack
|                     |
|                     |   |
|        Stack        |   |  Stack grows down
|                     |   v
|                     |
|---------------------|  <----- Top of the stack (ESP points here)
|                     |
|                     |
|                     |
|                     |
|                     |
|---------------------|  <----- Top of the heap
|                     |
|                     |   ^
|        Heap         |   |  Heap grows up
|                     |   |
|                     |
|---------------------|  <----- Bottom of the heap
|                     |
|    Instructions     |
|                     |
|                     |
-----------------------
Low Memory Addresses (0x00000000)
Some Essential Assembly Instructions for Reverse Engineering:
Instruction | Example | Description
push | push eax | Pushes the value stored in EAX onto the stack
pop | pop eax | Pops a value off the stack and stores it in EAX
call | call 0x08abcdef | Calls a function located at 0x08abcdef
mov | mov eax,0x5 | Moves the value 5 into the EAX register
sub | sub eax,0x4 | Subtracts 4 from the value in the EAX register
add | add eax,0x1 | Adds 1 to the value in the EAX register
inc | inc eax | Increases the value stored in EAX by one
dec | dec eax | Decreases the value stored in EAX by one
cmp | cmp eax,edx | Compares the values in EAX and EDX by subtracting EDX from EAX without storing the result; if they are equal, sets the zero flag* to 1
test | test eax,edx | Performs an AND operation on the values in EAX and EDX without storing the result; if the result is zero, sets the zero flag to 1
jmp | jmp 0x08abcde | Jumps to the instruction located at 0x08abcde
jnz | jnz 0x08ffff01 | Jumps to 0x08ffff01 if the zero flag is 0 (the last result was not zero)
jne | jne 0x08ffff01 | Jumps to 0x08ffff01 if a comparison is not equal (the same condition as jnz: zero flag is 0)
and | and eax,ebx | Performs a bitwise AND operation on the values stored in EAX and EBX; the result is saved in EAX
or | or eax,ebx | Performs a bitwise OR operation on the values stored in EAX and EBX; the result is saved in EAX
xor | xor eax,eax | Performs a bitwise XOR of EAX with itself; the result, always 0, is saved in EAX (a common way to zero a register)
leave | leave | Cleans up the current stack frame before returning (equivalent to mov esp,ebp followed by pop ebp)
ret | ret | Returns to the parent function by popping the return address into EIP
nop | nop | No operation (a 'do nothing' instruction)

*The zero flag (ZF) is a 1-bit indicator that records whether the result of an operation, such as a cmp or test instruction, was zero.
Each instruction performs one specific task and can deal directly with registers, memory addresses, and their contents. The easiest way to understand exactly what these instructions are used for is to see them in the context of a simple hello world program and relate the assembly language to a high-level language such as C.
Here is a simple C program that displays Hello World:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}
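Save this program as helloworld.c and compile it with 'gcc -o helloworld helloworld.c'; run the resulting binary and it should print "Hello World!" on the screen and exit. Aha... it looks quite simple. Now let's see what it looks like in assembly language. (The listing below comes from a 32-bit build; on a 64-bit system, gcc's -m32 option produces comparable 32-bit code, although the exact instructions vary between compiler versions.)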
0x8048384 push ebp <--- Save the EBP value on the stack
0x8048385 mov ebp,esp <--- Create a new EBP value for this function
0x8048387 sub esp,0x8 <--- Allocate 8 bytes on the stack for local variables
0x804838a and esp,0xfffffff0 <--- Clear the lowest 4 bits of ESP, aligning the stack down to a 16-byte boundary
0x804838d mov eax,0x0 <--- Place a zero in the EAX register
0x8048392 sub esp,eax <--- Subtract EAX (0) from the value in ESP
0x8048394 mov DWORD PTR [esp],0x80484c4 <--- Place our argument for printf() (the address of the "Hello World!" string, 0x080484c4) onto the stack
0x804839b call 0x80482b0 <_init+56> <--- Call printf()
0x80483a0 mov eax,0x0 <--- Put our return value (0) into EAX
0x80483a5 leave <--- Clean up the local variables and restore the EBP value
0x80483a6 ret <--- Pop the saved EIP value (the return address) back into the EIP register
As you can easily figure out, these instructions follow the same flow as the C program. Of course it is the same, since this is the assembly code of the binary (exe) obtained by compiling the C program above.
I hope you all like it. We will continue our discussion tomorrow, when I will explain how to analyze the assembly code of binaries whose high-level source code we don't have.
A quick tip for everyone on how to learn assembly language better: pick some ready-made code, compile it into a binary or exe file, then obtain the assembly code of that binary and try to relate the assembly code to the high-level code. I guarantee that this will help you understand better; this is how I have always learned things like this.