# Welcome to this tutorial notebook!

This notebook contains the code necessary to demonstrate interprocess taint analysis using the API.

Executing all the cells will perform several taints and filter their results.

You can execute the cells that contain Python code using the Ctrl+Enter (execute and stay in the same cell) or Shift+Enter (execute and go to the next cell) shortcuts.

Please execute the cells in order, as they depend on each other.

If this is the first time that you are using the REVEN API, we recommend you start with our [guided tour](./guided_tour.ipynb) notebook.

In [None]:
# API imports
import reven2  # analysis API
import reven2.preview  # access taint
import reven2.arch.x64 as regs  # shortcut when reading registers
from reven2.address import LogicalAddress  # shortcut when reading addresses
import reven2.types as types   # shortcut when reading a specific type

# various useful helpers for HTML display etc
from taint_tokio_chat.utils import display_table, table_line, read_tainted_memory, get_ret_ctx

In [None]:
# connect to the server
server = reven2.RevenServer("127.0.0.1", 1337)
trace = server.trace
print(server)
print(trace.transition_count)  # check total number of transitions in the trace

# A simple taint

Let's look at the framebuffer towards the end of the trace (at transition `#16810210`):

![Framebuffer](img/framebuffer_tokio_chat.png)

* There are 3 `cmd` windows, 2 of them running a `chat_client.exe`, one of them running what appears to be a server.
* Let's try to see how the `Bob: Hello!` string received by the `Alice` client traveled during the trace by using a backward taint!

In [None]:
# Find the last occurrence of "Bob: Hello!" in the trace
last_string = list(trace.strings("Bob: Hello!"))[-1]
print(last_string)
# Memory location occupied by the string
mem_to_taint = reven2.preview.taint.TaintedMemories(last_string.address, 
                                                    last_string.size)
print(mem_to_taint)

# Contexts between which we will perform the backward taint: from the first access
# to the string to the beginning of the trace (0) 
first_context = last_string.first_access.context_before()
print(first_context)
last_context = trace.context_before(0)
print(last_context)
is_forward_taint = False

# start the backward taint
from reven2.preview.taint import Tainter
tainter = Tainter(trace=trace)
taint = tainter.simple_taint(tag0=mem_to_taint, 
                             is_forward=is_forward_taint, 
                             from_context=last_context, to_context=first_context)
print(taint)

In [None]:
# display taint result each time we change to a different process
process = None
table = ""
# iterate over all changes in tainted data
for change in taint.accesses(changes_only=True).all():
    ctx = change.transition.context_before()
    new_process = ctx.ossi.process() # get current process
    
    if process is None or new_process.pid != process.pid: # we changed process
        table += table_line(["#{}".format(change.transition.id),
                             new_process, read_tainted_memory(change)])
        process = new_process
        
display_table(title="Process changes for the backward taint of 'Bob: Hello!'", 
              headers=["Transition", "Process", "Tainted memory"],
              html_lines=table)

The backward taint informs us of what happened to the message, in reverse chronological order:

1. The `Bob: Hello!` message ends up in the `conhost.exe (2704)` process, where it is displayed.
2. The message was received by the `chat_client.exe (2832)` process.
3. The message was received by the `chat_server.exe (648)` process.
4. The `Hello!` message was sent by the `chat_client.exe (2816)` process along with the `Bob` nickname.
5. The `Hello!` message and the `Bob` nickname were originally typed in the `conhost.exe (2788)` process.

We got a good idea of where the message comes from and what processes it went through.
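The process-change filtering done in the cell above can be looked at in isolation from the API. The following is a minimal sketch that runs on made-up `(transition, pid)` pairs rather than on real taint results; the `process_changes` helper and the transition numbers are hypothetical, and only the PIDs are borrowed from the list above.

```python
from typing import Iterable, Iterator, Tuple


def process_changes(accesses: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (transition_id, pid) each time the pid differs from the previous one."""
    previous_pid = None
    for transition_id, pid in accesses:
        if pid != previous_pid:  # first entry, or we changed process
            yield transition_id, pid
            previous_pid = pid


# Synthetic accesses in reverse chronological order (illustrative values only)
sample = [(900, 2704), (890, 2704), (700, 2832), (650, 2832), (400, 648), (100, 2816)]
changes = list(process_changes(sample))
print(changes)
```

Only the first access seen in each process is kept, which is why the table stays short even when the taint reports many accesses.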

## We can also attempt the taint in the forward direction to see the events in chronological order

In [None]:
# Find the first occurrence of "Hello!" in the trace
first_string = list(trace.strings("Hello!"))[0]
print(first_string)
# Memory location occupied by the string
mem_to_taint = reven2.preview.taint.TaintedMemories(first_string.address, 
                                                    first_string.size)
print(mem_to_taint)

# Contexts between which we will perform the forward taint: from the first access to the
# string to the end of the trace
first_context = first_string.first_access.context_before()
print(first_context)
last_context = trace.transition(trace.transition_count - 1).context_after()
print(last_context)
is_forward_taint = True

# start the forward taint
from reven2.preview.taint import Tainter
tainter = Tainter(trace=trace)
taint = tainter.simple_taint(tag0=mem_to_taint, is_forward=is_forward_taint,
                             from_context=first_context,
                             to_context=last_context)
print(taint)

In [None]:
# Taint is now running in background. You can access its current progress status by executing
# this cell, or you can execute the next code cell to collect all its results in a blocking
# manner.
print(taint.progress())

Because we are using the `all()` method, the call to
`taint.accesses(changes_only=True).all()` in the next code cell blocks until the taint
has finished, so that it can return all results.

If you are running a very long taint, you may prefer to get the first available
results without blocking. To do so, you can use the `taint.accesses(changes_only=True).available()` method,
which only returns the results that are available at the time of the call.

Please refer to the documentation of the taint module (execute `reven2.preview.taint?` in a
cell) for more information, or access the
[online documentation](http://doc.tetrane.com/professional/latest/python-doc/reven2.preview.taint.html) for
[available()](http://doc.tetrane.com/professional/latest/python-doc/reven2.preview.taint.TaintResultView.html#available) and
[all()](http://doc.tetrane.com/professional/latest/python-doc/reven2.preview.taint.TaintResultView.html#all).
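To make the difference between the two retrieval styles concrete, here is a toy model written with the standard library only. It does not use or reproduce the REVEN classes: `ToyResults` is a hypothetical stand-in whose `available()` and `all()` methods merely mimic the non-blocking and blocking behaviors described above.

```python
import threading
import time


class ToyResults:
    """Toy stand-in for a result view fed by a background computation."""

    def __init__(self, total):
        self._items = []
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._produce, args=(total,))
        self._worker.start()

    def _produce(self, total):
        for i in range(total):
            time.sleep(0.01)  # simulate some work per result
            with self._lock:
                self._items.append(i)

    def available(self):
        """Return a snapshot of the results produced so far, without waiting."""
        with self._lock:
            return list(self._items)

    def all(self):
        """Wait for the background work to finish, then return every result."""
        self._worker.join()
        with self._lock:
            return list(self._items)


results = ToyResults(total=5)
partial = results.available()  # may be empty or incomplete
complete = results.all()       # blocks until the 5 results exist
print(len(partial), len(complete))
```

In this model, the snapshot returned by `available()` is always a prefix of what `all()` eventually returns, so polling it is a reasonable way to display early results during a long computation.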

In [None]:
# display taint result each time we change to a different process
process = None
table = ""
# iterate over all changes in tainted data
for change in taint.accesses(changes_only=True).all():
    new_process = change.transition.context_before().ossi.process() # get current process
    
    if process is None or new_process.pid != process.pid: # we changed process
        table += table_line(["#{}".format(change.transition.id),
                             new_process, read_tainted_memory(change)])
        process = new_process
        
display_table(title="Process changes for the forward taint of 'Hello!'", 
              headers=["Transition", "Process", "Tainted memory"],
              html_lines=table)

# Analyzing communication between clients and server 

## Sent messages

To analyze sent messages, we will look for calls to the `WSASend` symbol of `ws2_32.dll` using the search API.

For each of these calls, we will then look at the parameters of the `WSASend` function call to find what was sent.

The prototype of `WSASend` is the following:

```C
int WSAAPI WSASend(
  SOCKET                             s,                   // rcx
  LPWSABUF                           lpBuffers,           // rdx
  DWORD                              dwBufferCount,       // r8
  LPDWORD                            lpNumberOfBytesSent, // r9
  DWORD                              dwFlags,
  LPWSAOVERLAPPED                    lpOverlapped,
  LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
);
```

We will need to look at the sent content by reading `rdx`, which holds the `lpBuffers` pointer.
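`lpBuffers` points to an array of `WSABUF` structures, each made of a `ULONG len` field followed by a `CHAR *buf` pointer. The sketch below decodes one such entry with the standard `struct` module, assuming the usual x64 layout (4-byte length, 4 bytes of padding, 8-byte pointer). The `parse_wsabuf` helper and the sample bytes are hypothetical and only serve to illustrate the offsets.

```python
import struct

# Assumed x64 layout of WSABUF: ULONG len (4 bytes), 4 padding bytes, CHAR *buf (8 bytes)
WSABUF_FORMAT = "<I4xQ"


def parse_wsabuf(raw: bytes):
    """Return (len, buf pointer) decoded from the 16 raw bytes of one WSABUF entry."""
    length, pointer = struct.unpack(WSABUF_FORMAT, raw)
    return length, pointer


# Hypothetical entry: len = 11, arbitrary padding bytes, buf = 0x1122334455
raw = struct.pack("<I", 11) + b"\xcc" * 4 + struct.pack("<Q", 0x1122334455)
length, pointer = parse_wsabuf(raw)
print(length, hex(pointer))

# Reading the first 8 bytes as a single 64-bit integer also picks up the padding
as_u64 = struct.unpack("<Q", raw[:8])[0]
print(as_u64 == length)
```

The padding bytes are not guaranteed to be zero, which is why the length is best read as a 32-bit value and the pointer at offset 8.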

In [None]:
wsasend_symbol = next(server.ossi.symbols("^WSASend$", binary_hint="ws2_32"))
print("List of WSASend calls:", list(trace.search.symbol(wsasend_symbol)))

call = next(trace.search.symbol(wsasend_symbol))

lpBuffers = call.read(regs.rdx, types.Pointer(types.USize))
print(lpBuffers)
# WSABUF.len is a 32-bit ULONG; the buf pointer follows at offset 8, after 4 bytes of padding
buf0_size = call.read(lpBuffers, types.U32)
print(buf0_size)
buf0_buf = call.read(lpBuffers + 8, types.Pointer(types.USize))
print(buf0_buf)
buf = call.read(buf0_buf, buf0_size, raw=True)
print(buf)

In [None]:
wsasend_symbol = next(server.ossi.symbols("^WSASend$", binary_hint="ws2_32"))
# Let's do this for all calls and print the results as an HTML table
table = ""

for call in trace.search.symbol(wsasend_symbol):
    lpBuffers = call.read(regs.rdx, types.Pointer(types.USize))
    buf0_size = call.read(lpBuffers, types.U32)  # WSABUF.len is a 32-bit ULONG
    buf0_buf = call.read(lpBuffers + 8, types.Pointer(types.USize))
    buf = call.read(buf0_buf, buf0_size, raw=True)  # only the first WSABUF entry is read
    call_transition = call.transition_after()
    table += table_line(["#{}".format(call_transition.id), call.ossi.process(), buf])
    
display_table("WSASend calls", ["Call Transition", "Process", "Sent buffer"], table)

## Received messages

To analyze received messages, we will look at the `recv` function of `ws2_32.dll`.

Its prototype is the following:

```C
int recv(
   SOCKET s,    // rcx
   char   *buf, // rdx
   int    len,  // r8
   int    flags // r9
 );
```

This time, the content of `buf` is only filled in by the time the function returns, so we will need memory history to reach the end of the function from its beginning.


```
 0x57fcf7    call  0x5c1dec ($+0x420f5) 
 #6216920 ---- __adddf3+0x2bc - chat_server.exe <- Context of the call
 0x5c1dec    jmp   qword ptr [rip + 0xb0d72] 
 #6216921 ---- recv - ws2_32.dll
 0x7ffb6cb6dd90 mov   qword ptr [rsp + 8], rbx
``` 

We will need to get the `call 0x5c1dec` instruction from the context of the call, and then use memory history on the return address to find
the corresponding `ret` instruction.
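The idea can be illustrated without the API: a `call` writes the return address to a stack slot, and the matching `ret` is the next instruction that reads that slot. The sketch below applies this to a hand-written access history. The `Access` record, the `find_ret_transition` helper and the sample values are all hypothetical, and the actual `get_ret_ctx` helper used in this notebook may work differently.

```python
from typing import List, NamedTuple, Optional


class Access(NamedTuple):
    transition: int  # transition at which the access happens
    address: int     # accessed memory address
    kind: str        # "write" or "read"


def find_ret_transition(history: List[Access], call_transition: int,
                        slot: int) -> Optional[int]:
    """Return the first transition after the call that reads the return-address slot."""
    for access in sorted(history):  # sorted by transition first
        if (access.transition > call_transition
                and access.address == slot
                and access.kind == "read"):
            return access.transition
    return None


# Made-up history: the call at transition 10 pushes its return address at 0x7ff0,
# a nested call uses the slot at 0x7fe8, and the outer function returns at transition 25.
history = [
    Access(10, 0x7FF0, "write"),
    Access(12, 0x7FE8, "write"),
    Access(20, 0x7FE8, "read"),
    Access(25, 0x7FF0, "read"),
]
ret_transition = find_ret_transition(history, call_transition=10, slot=0x7FF0)
print(ret_transition)
```

Matching on the stack slot rather than on the next `ret` in the trace keeps nested calls from being mistaken for the end of the function. This simplified model assumes nothing else reads the slot before the function returns.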
    

In [None]:
recv_symbol = next(server.ossi.symbols("^recv$", binary_hint="ws2_32"))

table = ""

for call_ctx in trace.search.symbol(recv_symbol):
    buf_addr = call_ctx.read(regs.rdx, types.Pointer(types.USize))

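    # go to the end of the function
    ret_ctx = get_ret_ctx(trace, call_ctx)
    actually_recvd = ret_ctx.read(regs.rax)
    # recv returns SOCKET_ERROR (-1, seen here as 0xffffffff) on failure: skip those calls
    if actually_recvd == 0xffffffff:
        continue
    buf = ret_ctx.read(buf_addr, actually_recvd, raw=True)
    # use this call's own transition (same convention as in the WSASend cell)
    call_transition = call_ctx.transition_after()
    table += table_line(["#{}".format(call_transition.id), call_ctx.ossi.process(), buf])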

display_table("recv calls", ["Call Transition", "Process", "Received buffer"], table)
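The `0xffffffff` comparison in the cell above relies on `recv` returning a 32-bit signed `int`, whose failure value `SOCKET_ERROR` is `-1`. The following small sketch shows the conversion explicitly; the `as_signed32` helper is hypothetical and not part of the REVEN API.

```python
SOCKET_ERROR = -1  # value returned by recv on failure


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(as_signed32(0xFFFFFFFF) == SOCKET_ERROR)
print(as_signed32(12))
```

Converting the register value this way makes it possible to compare against `SOCKET_ERROR` by name instead of against a raw hexadecimal constant.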

This is the end of this notebook.

Thank you for following along!