# Proposal for tests architecture (#96)

Closed; wanted to merge 1 commit.
File: doc/tests-updates.md (381 additions, 0 deletions)
# Updating test framework


## Requirements

There are several types of tests (some of them are only planned and not
implemented yet):

- Message integrity. In these tests it's important to validate that the
message received from Tempesta is equal to the expected one. This applies to both
forwarded requests and responses. There should be an option to split messages sent
to Tempesta into every possible set of skbs.

- Invalid message processing. Tests to validate behaviour on invalid message
parsing/processing. Tempesta may send a response to the sender or close the
connection.

- Scheduler tests. It's important which server connection was used to serve
the request.

- Cache tests. It's important whether a server connection was used to serve
the request or not.

- Request-response chain integrity. Need to validate that every client receives
its responses in the correct order.

- Fault injections. Based on other types of tests, but with random errors
emulated using SystemTap or eBPF.

- Fuzzing. Generating random requests and responses; Tempesta must stay
operational and serve as many requests as possible.

- Workload testing. Generate a fairly high request rate to confirm data safety
under highly concurrent code execution. The more CPUs used for Tempesta,
the better, but CPU utilisation must stay below 80% to ensure that no extreme
conditions occur.

- Stress testing. Testing under extreme conditions, when the request rate is higher
than Tempesta can serve.

- Low level TCP testing. Some features can affect TCP behaviour, so we need to
validate that Tempesta conforms to the RFCs and doesn't break kernel behaviour.

- TLS testing. Having an in-house TLS implementation gives great power and great
responsibility. We must assure message integrity and encryption integrity of the
TLS support. These tests differ from the others, since it's important to work with
the raw data lying under HTTP on the client and server side.

- Tempesta-originated requests. Pretty specific tests, since Tempesta may
generate requests on its own, e.g. background cache revalidation or health
monitor requests. These tests differ from the others, since it's important to
validate the supporting TCP messages.


There are also some requirements on the testing process:

- Full, reduced and extra-stability sets of tests. The reduced set is for developers
to run before making pull requests, the full set is run against pull requests, and
the extra-stability set is for nightly testing to assure stability under unpredictable
loads.

- Easy new test addition.

- Easy test case failure analysis.

- Generate several tests based on a single test description. E.g. in a fuzzing test
it's required to know how Tempesta would react to a message and what
should be treated as an error. The 'splitting into different skbs' is actually a
fuzzing test feature, not a functional one. Usually it also makes sense to generate
a workload test based on a functional profile.


## Problems of current implementation

The main problem is having two frameworks at the same time. The old one is
feature-rich but too sophisticated: one has to understand 5-6 files to understand
what a test actually does. The new one is friendlier, but incomplete and lacking
the most important features. The transition from the old framework to the new one
wasn't completed, and the development of both was rather unorganized, so the
new framework is becoming more and more complex and is losing its advantages.

In both frameworks every test type is implemented as a unique test, so it's
required to write message integrity, fuzzing, workload and many other test
cases for the same situation.
> **Comment (Contributor):** I still have no idea how to easily make the test framework split a message into different sets of skbs. Probably we can use TCP_NODELAY and write the message in different parts, but I'm not sure about buffering on the Python side. However, I believe the task is solvable in a generic way.
>
> I suppose the message integrity checks and probably the skb splitting can be done on the deproxy side. If we need the same for Nginx tests, then we can just implement the logic as a separate class/module.
>
> All in all, with https://www.youtube.com/watch?v=oO-FMAdjY68 and the requirements above in mind, I believe the test framework should use relatively complex OO class inheritance for the helper functionality, while the tests should just switch the necessary flags on/off for message integrity calculation, sending in different skbs and other things. I.e. the helper framework must provide as declarative an API as possible, and it's not so important how complex it is internally.


Heavy dependencies between tests. When a new feature is added, it's added to the
test framework core and affects a lot of tests, sometimes unpredictably and
badly.
> **Comment (Contributor):** Yes, we had them. We need to discuss this in chat considering particular cases. There are different possibilities for what happens in this situation: either we grow the framework so that it can run many different things in the future, or we just keep case-specific logic in the framework. Let's talk through the cases in the chat; they should make the framework design problems clear.



## Proposed design

The proposed design improves the new test framework and intends to turn it into
a full-featured test tool.

The whole test case is described in a single file; the configuration of each entity
is listed as a plain-text template with some parameters. There are no procedurally
generated configs like in the old framework.

Most test cases should use the following description:
```

class SomeTest(BaseTestClass):

    backends = [
        # For all tests except stress tests:
        {
            'type'     : 'deproxy',
            'id'       : 'id1',
            'response' : gen_response_func,
        },
        # OR:
        {
            'type'   : 'wrk',
            'id'     : 'id1',
            'script' : script_name,
            # Other wrk-specific options.
        },
    ]
    clients = [
        # For all tests except stress tests:
        {
            'type'     : 'deproxy',
            'id'       : 'id1',
            'request'  : gen_request_func,
            'response' : response_cb,
        },
        # OR:
        {
            'type'   : 'wrk',
            'id'     : 'id1',
            'script' : script_name,
            # Other wrk-specific options.
        },
    ]
    tempesta = {
        'config' : '...'
    }
    test_variations = ['functional', 'workload', 'fuzzing']

    def test(self):
        # < ... test procedure steps ... >
        pass
```
> **Comment (Contributor):** `gen_response_func` and `gen_request_func` are nice and should be quite useful, but it seems they can also be done as an extension (feature) of the current framework.

> **Comment (Contributor):** **Update:** It seems the callbacks are only needed to generate different requests and responses for a single deproxy/tempesta run. Actually test_malformed_headers.py and test_tls_integrity.py use the current API for the same things: we have set_response() to dynamically change the deproxy response, and each response is trivially checked.
>
> Please review test_tls_integrity in #103 and let's discuss there whether the architecture can benefit from the callbacks.

> **Comment (Author):** The set_response() callback in the new framework is much closer to the behaviour I want, but it still has a lot of room for enhancement (as I see it). Tests in the current framework are not scalable: you can't run the same test on one or 1,000 connections simultaneously.
>
> How the current tests in the new framework run:
>
> - you start all servers/clients;
> - generate requests and responses and assign them to the client/server (the set_response() callback mentioned above);
> - press the Run button: wait until all server connections are established, start sending requests, and wait until all messages are sent;
> - run asserts on message integrity and message content, e.g. md5 checks against the received body in the test_tls_integrity test. The tests contain no header validation by default, and it must be implemented.
>
> How to make a test scalable:
>
> - configure a number of callback functions for requests and responses;
> - start servers and clients, each in its own process;
> - press the Run button:
>   - the client makes a callback to generate a request (or to choose one of the predefined ones) and sends it to the server;
>   - the server receives the request, runs the callback to validate it, compares it with the expected one, and generates a response for it. The server sends the response;
>   - the client calls a callback function to validate the received response and compare it with the expected one;
>   - probably callbacks on connection closes are also required.
> - Just finish the test: all assertions are done in client or server callbacks.
>
> The second case is more scalable; the test itself is tolerant to the number of concurrent connections. Maybe it's my misbelief, but I think that scalability from one connection to thousands is important. Issue #107 is very tightly connected here. It's a huge amount of work while we have plenty of other tasks, I understand that. But I don't know more suitable tools here.
>
> Sure, some special changes will be required to make tests scalable, e.g. in cache tests each client must request unique URIs; in scheduler tests we would like to see the backend server ID in the response; anyway, it's better than checking request counters across all backends.


It doesn't look very different from the current description of tests in the
new framework, but it's completely different under the hood, so test procedures
will change significantly.


### Client and Server Mocks

Each client and server is started in a separate process. This is required since
Python has no real multithreading capabilities (see the Global Interpreter Lock).
There are already multiple processes in the new framework, but the final assertions
are done in the main process, which makes it difficult to check the connections,
clients and servers used, and their order.

With the proposed solution each mock client and server becomes a separate
process with a configured request-response sequence, and all assertions are done
inside those mock processes. With this approach writing tests will be closer to
programming mock clients and servers than to programming test routines.
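As a rough illustration of this architecture, the framework could spawn the mocks
and aggregate their verdicts as in the sketch below; the script names and command
line flags are hypothetical, not part of the proposal:
```
import subprocess

def run_mocks(server_cmds, client_cmds):
    """Start mock servers, then mock clients, each as a separate process.

    Every mock performs its own assertions and reports the verdict via its
    exit code, so the framework only aggregates results.
    """
    servers = [subprocess.Popen(cmd) for cmd in server_cmds]
    clients = [subprocess.Popen(cmd) for cmd in client_cmds]
    failed = [c.args for c in clients if c.wait() != 0]
    for srv in servers:
        srv.terminate()          # mock servers are stopped by TERM
        srv.wait()
    return failed                # empty list means the test case passed

# Hypothetical usage:
# failed = run_mocks(
#     [['python3', 'mock_server.py', '--id', 'srv1']],
#     [['python3', 'mock_client.py', '--id', 'id1', '--conns', '100']])
```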


#### Mock Client

A separate process started by the test framework. It has the following arguments:

- Script: filename with the request/response functions; its content is
described later.

- ID: client id.

- Server address: address of Tempesta (IP and port).

- Interfaces: interfaces the client sockets will be bound to
(using the `SO_BINDTODEVICE` socket option).

- Concurrent connections: number of concurrent connections for each defined
interface.

- Debug file: filename to write debug information to.

- Repeats: number of repeats of the request/response sequence before the connection
is closed.

- RPS: maximum requests per second.

- Split: boolean value meaning that requests must be split into multiple skbs.
How to implement this isn't known yet.
> **Comment (Contributor):** Please have a look at pv(1). TCP_NODELAY could also be an answer.
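A minimal sketch of that idea, assuming a plain client socket; the chunk size and
the helper name are illustrative only, and there is still no hard guarantee about
the resulting skb layout:
```
import socket
import time

def send_in_chunks(sock, data, chunk_size=16):
    """Push `data` in small pieces, hoping for separate TCP segments.

    TCP_NODELAY disables Nagle's algorithm so each send() goes out
    immediately; the short sleep makes kernel-side coalescing less likely.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for i in range(0, len(data), chunk_size):
        sock.sendall(data[i:i + chunk_size])
        time.sleep(0.001)
```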


The client has two structures to contain the required information: `client_data` for
global client data and `conn_data` for per-connection data.

First, the client waits on a POSIX barrier until all mock servers are ready.
Then for every connection it runs the following algorithm:

1. Open a new connection and bind it to the interface. Set `conn_data.request_id`
to `0`.

2. Call `gen_request_func()` from the script to generate the request.

3. Copy the generated request to the send buffer. Push the request to
`conn_data.request_queue` and increment `conn_data.request_id`.

4. If `conn_data.pipeline` is not set, go to step `5`, otherwise go to step `2`.

5. Send the current buffer to Tempesta. Start the `conn_data.req_timeout` timer.

6. Receive as much data as possible. If `conn_data.req_timeout` has expired, go to `err`.

7. Try to parse a response from the buffer. If a full response has been received, go to `8`,
else go to `6`.

8. For the complete response run `response_cb()` from the script. If an error happens,
go to `err`. Remove the request from `conn_data.request_queue`. If there are
unreplied requests, go to `6`; otherwise stop the `conn_data.req_timeout` timer.

9. If there are unsent requests, go to `2`.

10. Close the connection. If `conn_data.repeats` remain, go to `1`.

11. Exit.

`err.` Build an error message, close the connection and exit.
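A condensed sketch of steps 1-11 for a single connection; `try_parse_response()`
and the `script` object are assumed helpers, repeats are omitted, and the timer is
approximated with a simple deadline:
```
import time

def run_connection(client_data, conn_data, script, sock):
    """One pass over the per-connection algorithm (repeats omitted)."""
    conn_data.request_id = 0                        # step 1
    while True:
        # Steps 2-4: generate requests, keep generating while pipelining.
        request = script.gen_request_func(client_data, conn_data)
        if request is None:
            break                                   # step 9: nothing left to send
        conn_data.request_queue.append(request)     # step 3
        conn_data.request_id += 1
        if conn_data.pipeline:
            continue

        # Step 5: send everything queued so far.
        sock.sendall(b''.join(r.raw for r in conn_data.request_queue))

        # Steps 6-8: read until every queued request is answered.
        deadline = time.monotonic() + conn_data.req_timeout
        buf = b''
        while conn_data.request_queue:
            if time.monotonic() > deadline:
                raise AssertionError('err: request timed out')
            buf += sock.recv(4096)                  # step 6
            response, buf = try_parse_response(buf)  # step 7, assumed helper
            if response is None:
                continue                            # need more data
            if not script.response_cb(client_data, conn_data,
                                      response, closed=False):
                raise AssertionError('err: unexpected response')
            conn_data.request_queue.pop(0)          # step 8

    sock.close()                                    # step 10
```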


The algorithm mentions two functions: `gen_request_func()` and `response_cb()`.
Let's take a closer look.

`def gen_request_func(client_data, conn_data)`. Returns a `request` if there is one, or `None`.
A generic implementation should build the following request:
```
GET / HTTP/1.1
T-Client-Id: %client_data.client_id%
T-Conn-Id: %conn_data.conn_id%
T-Req-Id: %conn_data.request_id%

```
At the same time the function can modify the following variables:

- `conn_data.pipeline` - if set, the next request must be pipelined; `false` by default.
- `conn_data.expect_close` - the connection should be closed by the peer after the response to this request; `false` by default.
- `conn_data.expect_reset` - the connection should be closed immediately; `false` by default.
- `conn_data.close_timer` - the timer is set if the connection is to be closed.

If a response is expected for the connection, `request.expected_response` must
be created.

Other variables may be added or set in `conn_data` or `client_data`.
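A sketch of a generic `gen_request_func()` following the template above; the
`Request` container class and the `requests_per_conn` limit are illustrative
assumptions, not part of the proposal:
```
class Request:
    """Minimal container for a generated request (illustrative)."""
    def __init__(self, raw, expected_response=None):
        self.raw = raw
        self.expected_response = expected_response

def gen_request_func(client_data, conn_data):
    # Stop after the configured number of requests on this connection.
    if conn_data.request_id >= conn_data.requests_per_conn:
        return None

    ids = (client_data.client_id, conn_data.conn_id, conn_data.request_id)
    raw = ('GET / HTTP/1.1\r\n'
           'T-Client-Id: %s\r\nT-Conn-Id: %s\r\nT-Req-Id: %s\r\n\r\n'
           % ids).encode()
    expected = ('HTTP/1.1 200 OK\r\n'
                'T-Client-Id: %s\r\nT-Conn-Id: %s\r\nT-Req-Id: %s\r\n'
                % ids)

    conn_data.pipeline = False       # defaults described above
    conn_data.expect_close = False
    conn_data.expect_reset = False
    return Request(raw, expected_response=expected)
```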

`def response_cb(client_data, conn_data, response, closed)`. Returns a boolean success code.
The client receives a response of the following form:
```
HTTP/1.1 200 OK
T-Client-Id: %client_data.client_id%
T-Conn-Id: %conn_data.conn_id%
T-Req-Id: %conn_data.request_id%
< ... Standard Tempesta's headers ... >

```
All headers including `T-Client-Id`, `T-Conn-Id` and `T-Req-Id` must match
`conn_data.requests[0].expected_response`. Not all headers can be matched
exactly, especially `Date`, `Age` and other time headers; for those, headers can't
have values smaller than expected. The `T-Srv-Id` and `T-Srv-Conn-Id` headers may be
ignored in most tests.
> **Comment (Contributor):** Can't we do the same client/connection/request tracking in the current deproxy?


The boolean `closed` argument is set to true if the connection was closed by the peer.
The function is also called if the connection was reset by the peer but no response
was received; in this case `response` is set to `None`.
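A sketch of `response_cb()` applying the rules above; the pending request is taken
from `conn_data.request_queue` as in the algorithm above, `response.headers` is
assumed to be a parsed header dictionary, and the "not smaller than expected" check
for time headers is omitted for brevity:
```
def response_cb(client_data, conn_data, response, closed):
    # The connection was closed/reset before any response arrived.
    if response is None:
        return closed and conn_data.expect_reset

    expected = conn_data.request_queue[0].expected_response
    # Compare only the headers listed in the expected response;
    # T-Srv-Id / T-Srv-Conn-Id are simply not listed there.
    for line in expected.splitlines():
        if ':' not in line:
            continue                       # skip the status line
        name, value = line.split(':', 1)
        if name.strip() in ('Date', 'Age', 'Expires'):
            continue                       # time headers: exact match skipped
        got = response.headers.get(name.strip(), '')
        if got.strip() != value.strip():
            return False
    return True
```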

If the received response doesn't match the expected one, an error message is generated:
```
Error on request-response processing!
For request:
---
<Request>
---
Response was expected:
---
<Expected response>
---
But received response was:
---
<Received response>
---
```
An error should also be generated if the connection was closed unexpectedly or
wasn't closed when it was expected to be.


If debug is enabled, all activity on the connection is written to the file
`debug_filename_%connid%` with timestamps: received bytes, sent bytes, and parsed messages.


#### Mock Server

Follows the same concept as the mock client. Arguments:

- Script: filename with the request/response functions; its content is
described later.

- ID: server id.

- Interface:conns: interfaces the server sockets will listen on and the number
of Tempesta's connections for each.

- Split: boolean value meaning that messages must be split into multiple skbs.
How to implement this isn't known yet.

- Debug file: filename to write debug information to.

The server is stopped by a `TERM` signal from the framework.

The script contains a queue of expected requests, `server_data.requests[]`. If all
requests have been received but a new request arrives, the client is working in
repeat mode, so `conn_data.expected_request_num` must be reset to `0`.

The `def gen_response_func(server_data, conn_data, request)` function is used to
give a response to the client. It returns a `response` or `None`. The received
request is compared with `server_data.requests[T-Req-Id]`,
but note that the `T-Client-Id`, `T-Conn-Id` and `T-Req-Id` values may be missing in the
expected request.

The generated response **must** contain headers copied from the request:
`T-Client-Id`, `T-Conn-Id`, `T-Req-Id`. `T-Srv-Id` and `T-Srv-Conn-Id` must
also be generated.

If the server needs to close the connection in response to a request, it sets the
`conn_data.close` variable. The `conn_data.expect_close` variable can also be
set if the generated response is invalid and Tempesta will reset the connection.
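A sketch of `gen_response_func()` that copies the tracking headers as required;
`request.headers`, the `request_matches()` helper and `server_data.srv_id` are
illustrative assumptions:
```
def gen_response_func(server_data, conn_data, request):
    req_id = int(request.headers.get('T-Req-Id', '0'))

    # Compare with the expected request; tracking headers may be absent
    # in the expectation and are then ignored by the comparison helper.
    if not request_matches(server_data.requests[req_id], request):
        return None

    body = 'response %d' % req_id
    return ('HTTP/1.1 200 OK\r\n'
            'Content-Length: %d\r\n'
            'T-Client-Id: %s\r\n'
            'T-Conn-Id: %s\r\n'
            'T-Req-Id: %s\r\n'
            'T-Srv-Id: %s\r\n'
            'T-Srv-Conn-Id: %s\r\n\r\n%s'
            % (len(body),
               request.headers['T-Client-Id'],
               request.headers['T-Conn-Id'],
               request.headers['T-Req-Id'],
               server_data.srv_id,
               conn_data.conn_id,
               body)).encode()
```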


### Framework

Tests are now defined not as lists of actions but as HTTP flows on the client
and server sides. This allows replacing Tempesta with other HTTP ADC software and
easily comparing behaviour between them.
> **Comment (Contributor):** Each (or almost each) test case must configure an ADC, and the configuration language and features are very different for different ADCs. Nginx already has its own and pretty large test suite, but we can't just use it, mostly because of differences in configurations and features.


This also allows creating multiple tests from the same description:

- Message integrity, Invalid message processing, Cache tests: single client,
single client connection.

- Scheduler, Workload, Request-response chain integrity tests: multiple clients
from many source interfaces, multiple concurrent connections for each client.

- Fuzzing tests: workload tests, but every generated message is split into a
different number of skbs.
> **Comment (Contributor):** It seems to me that these things can be done in deproxy.


When the reduced set of tests is executed, only the first group of tests runs;
the full set covers the first and second groups. All groups are executed in the
background stability tests.
> **Comment (Contributor):** Makes sense for the generic case, but not so much if a developer changes low-level logic without changing any features and is curious about network operation. I propose to be able to set this in the config as well as for particular tests. This way you can run, for example, a particular TLS test with message integrity checking and/or fuzzing (strictly speaking, skb splitting isn't real fuzzing), while at the same time some tests oriented on the low-level logic only make sense with one of a few options enabled.


At first sight cache tests can't be executed multiple times.
But if the `URI` and/or `Host:` header are mutated between different client connections
and repeats, it becomes possible. Thus cache workload testing should be fine.
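For example, the request generator could derive a unique URI from the tracking ids
(a sketch; the function name is illustrative):
```
def unique_uri(client_data, conn_data):
    # Each client connection and repeat requests a distinct resource,
    # so cache hit/miss expectations stay valid under heavy concurrency.
    return '/cache-test/%s/%s/%s' % (client_data.client_id,
                                     conn_data.conn_id,
                                     conn_data.request_id)
```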

Fault injection tests are very similar to message integrity tests, but a few
prior actions to inject errors are required before starting the client.


#### Prepare to start

Copy the functions defined in the `clients` and `backends` arrays of the test class
to a separate file to start clients and servers as distinct processes. Probably other
Python interpreters can be used for that: `PyPy` is extremely fast compared to the
standard interpreter, which would allow a pretty good RPS in workload tests.

The framework must create all required interfaces before start.

A POSIX barrier is used to synchronize clients and servers. Clients shouldn't
start generating requests before all server connections are established.
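If the framework forks the mocks itself (rather than exec'ing independent
interpreters such as PyPy), the synchronization could be as simple as this
`multiprocessing.Barrier` sketch:
```
import multiprocessing as mp

def server_proc(barrier):
    # ... open listening sockets and wait for Tempesta's connections ...
    barrier.wait()            # report readiness, then keep serving

def client_proc(barrier):
    barrier.wait()            # don't send anything until all servers are ready
    # ... open connections to Tempesta and start the request loop ...

if __name__ == '__main__':
    n_servers, n_clients = 2, 4
    barrier = mp.Barrier(n_servers + n_clients)
    procs = ([mp.Process(target=server_proc, args=(barrier,))
              for _ in range(n_servers)] +
             [mp.Process(target=client_proc, args=(barrier,))
              for _ in range(n_clients)])
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```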


## TBD

TLS tests - ?

Low level TCP testing - ?
> **Comment (Contributor):** With the recent TLS changes affecting the TCP transmission process we have to test TCP operation (@i-rinat has questions regarding the TCP segment sizes generated by Tempesta). The question will be even harder for HTTP/3. Probably Scapy isn't a good option for this, because it'd be too tricky to generate the TCP flow by hand. Probably iptables mangling and/or eBPF with usual TCP options like Nagle will suit best. We already have something in helpers/analyzer.py and helpers/flaky.py.


Code coverage - ?

There may be many things to discuss, feel free to raise questions.