-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Azure ORM and deprecate last_dates.json file #12849
Conversation
182eaa0
to
b5470ad
Compare
c9ec631
to
7c9c56f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a couple of improvements that I think it would be great to add to your PR -apart from the ones I suggested directly at the code`:
- It would be great if you stored the query used for each service along with the md5, even if you don't use it later. It could be useful for debugging purposes.
- Try to add the subscription ID to each Log Analytics row and use it along with the md5 hash as the primary key of the DB. This way, you will allow different accounts to make the same queries without the hassle of manually configuring them as
<query> | where SubscriptionId == '<SubscriptionId-1>'
. - I think it would have been better to use exceptions to handle ORM errors instead of returning
False
when something goes wrong. It isn't pythonic at all, which would be fine if it had any clear advantages, but I don't think it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of things that you should review.
0f23dd8
to
cf0e08b
Compare
cf0e08b
to
3c30dae
Compare
This was already addressed and included in this PR.
This will be addressed in #13051
This was replaced with a custom |
Description
This PR adds an ORM for the Azure module to allow us to use a real database instead of a plain text
last_dates.json
file. The development includes the following changes:orm.py
: The core ORM logic.orm
:test_orm.py
.azure/tests/data
path with severallast_dates.json
files used by the unit tests.azure-logs.log
to use the neworm
functionality and deprecate thelast_dates.json
file.ORM details
ORM
design highlights:SQLAlchemy
, similar to how the RBAC ORM works in the Framework.Graph
,LogAnalytics
andStorage
.md5
: the md5 value of the query to process, used as a key. The md5 has been kept to avoid breaking compatibility with the old module when migrating the last_dates.json file.: the text string of the query to execute. It can be used in future versions to replace the
md5` field if necessary. Until then, it will serve as additional debug info when debugging the module.min
: The smallest date processed. This already existed inlast_dates.json
.max
: The highest date processed. This already existed inlast_dates.json
.Considerations
min
andmax
colums store datetimes as strings instead of Python datetime objects. This is due to Azure's use of datetime to indicate the activity date for the events. As we pointed out in the original Azure issue, in some cases the Azure API returns a value for the activity date whose value for microseconds has 7 decimal places. This is incompatible with the python datetime standard, which requires 6 digits. The database had been created to store Datetime objects, so it was missing this last digit, losing precision which in the end translates into the possibility of duplicate events.md5
as a key to identify a query in the database has been retained. This is to maintain compatibility or else duplicate logs could be processed during the first run after a migration.New Azure Unit Tests
New unit tests for the
orm.py
have been added. Here are the results of the new tests:Please note that we currently don't have unit test for the
azure-logs-py
and they are out of the scope of this issue, hence no coverage for this module.Manual testing
The following tests must be performed in sequential order. No data must be modified or removed unless it was specified for the test.
The tests assume there is a valid authentication file located at
{wazuh_path}/wodles/azure/credentials
.A test is considered valid if the following conditions are met:
analysisd
.{wazuh_path}/logs/azure-logs.log
file.Test results
Everything worked as expected
Graph tests
Run the integration using
query 'auditLogs/directoryAudits'
andtime_offset 2d
Remove the database file and run the integration using
-graph_query 'auditLogs/directoryAudits'
and--offset 2d
. Only 4 logs should be processed. Thelast_dates.json
should be updated as expected.Command used
azure-logs output
azure.db contents
Repeat the test to ensure there are no duplicates. Keep the database file from the previous test and run the integration.
Command used
azure-logs output
azure.db contents
Conclusions
The module worked as expected.
Log Analytic tests
Run the integration with
time_offset 30d
Run the command using
--la_time_offset 30d
. 50 logs should be processed.Command used
Command output
azure.db contents
Repeat the test to ensure there are no duplicates. Keep the database file from the previous test and run the integration.
Command used
Command output
azure.db contents
Conclusions
The module worked as expected.
Storage tests
First execution specifying an time_offset value set in the past
Run the integration using
--storage_time_offset 120d
.Only the files contained in the
test_folder
path should be processed as the other ones have an olderlast_modified
datetime.Command used
Command output
azure.db contents
Keep the database file and run the same command again to ensure no duplicate logs have been processed.
Command used
Command output
azure.db contents
Conclusions
The module worked as expected.
Test last_dates.json migration
Having a
last_dates.json
file with the following contents:last_dates.json
Remove the database file and run the module and check the database status after the execution to ensure the data was successfully migrated
Command used
azure-logs output
azure.db contents