Automation and Alerts overview #6
Comments
Yes that sounds great looking forward to it!! |
Thanks for the troubleshooting tips! I will try that out. So a script check created via a policy would be something like:
There is a many-to-many relationship between policies and clients, sites, and agents. This way the same policies can be applied anywhere. This integrates well with the system already in place for checks and I can reuse the Checks Tab UI within the policy manager which is a plus! |
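A minimal sketch of how that many-to-many layout could resolve to the checks an agent ends up with. It uses plain dataclasses rather than the project's Django models, and every name in it (`Policy`, `Target`, `effective_checks`) is hypothetical. The order (client, then site, then agent) and the de-duplication are assumptions, not something the thread specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    checks: list[str] = field(default_factory=list)


@dataclass
class Target:
    """A client or site that policies can be attached to."""
    name: str
    policies: list[Policy] = field(default_factory=list)


@dataclass
class Agent:
    hostname: str
    client: Target
    site: Target
    policies: list[Policy] = field(default_factory=list)


def effective_checks(agent: Agent) -> list[str]:
    """Collect checks from policies on the agent's client, its site, and the agent itself."""
    seen: set[str] = set()
    result: list[str] = []
    for policy in agent.client.policies + agent.site.policies + agent.policies:
        for check in policy.checks:
            if check not in seen:  # the same policy may be attached at several levels
                seen.add(check)
                result.append(check)
    return result


disk = Policy("Disk space", ["diskspace C:"])
ping = Policy("Connectivity", ["ping gateway"])
client = Target("Acme", [disk])
site = Target("HQ", [disk, ping])
agent = Agent("pc-01", client, site, [ping])
print(effective_checks(agent))  # ['diskspace C:', 'ping gateway']
```

The same `Policy` object is attached at three levels here, which is the point of the many-to-many design: nothing is copied until checks are resolved for a specific agent.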
I saw the recent commits. That's a lot of code! I merged them and pushed my local changes to my fork. Here is what I have so far with the policies. I'm moving pretty slow because I have zero XP with Python/Django, but I'm learning as I go! Assigning checks to a policy works and you can assign policies to Clients, Sites, and Agents. I have the overview working mostly. I need to allow previewing the checks and tasks on the right side. I also need to modify the rest of the checks modals to support both agent ids and policy ids. To quickly view the status of the Policy checks, I think I can create a list of the affected agents with the check output. I see the tasks tab for the agents. I'll add that to the policies and work that in. I also noticed that the DiskChecks and the WinService Checks require a list from the agents. I was thinking for DiskChecks to assign checks based on drive letters, and for Windows Service to be able to choose from a list of default Windows Services and also be able to type them in. Let me know if you see any issues with the above. Thanks! |
I also expanded the "developing with Docker" section of the docker readme. I have been moving back and forth from a Windows PC and a Chromebook and haven't had any issues. |
Awesome, am testing out the fork and looks good! You are doing great for no django xp lol. Yea currently I am getting only available disks and services from the agent but I see now that won't work when trying to create a generic policy. I'll work on adding disks and generic services to choose or manually type in, and then change the agent to first make sure the disk / service exists locally before attempting to run the check. Please submit a PR when you can so I can get the changes merged. Also I just realized I added all the code for the new automated tasks in the automation app that you created, I probably should have created a new django app called tasks, hope I didn't mess anything up for you. Let me know if you want me to move it into a separate app. |
#14 A few improvements. I have a question on the use of the right-click context menus. Do you want those to be used throughout the application? In the Automation Manager there are buttons along the top for operations and the check and task tables below have the context menu. I was thinking to either convert everything to context menu or to remove the menu on the checks table and add the buttons on the top. While testing, I also found that if the client and site are named the same, the site won't show in the main Tree. It also doesn't show up in the Policy Overview, so it might not be getting saved to the database. Next, I plan on modifying the task and checks table within Automation Manager to work with policies. |
I prefer context menus. To be honest, I am shit with design and UI and therefore have been copying the design from the solarwinds RMM we use at work lol, so whatever it looks like there I just make it match. That's why most things use context menu but script manager doesn't, literally same exact design as the solarwinds one. But I think going forward we should use context menu for everything else. I'm not able to replicate the client and site same name issue. Check this screenshot, seems like they are showing up in both places. Maybe try creating a new database and running migrations fresh? Cuz I've recently done that so maybe bug is with an old migration, I made some breaking changes recently which require fresh db. Awesome work so far. Right now the agent is just getting data from the ChecksSerializer, so whenever you are done working on all the policy stuff we can just combine regular checks and policy checks into one big serializer that way won't have to make any changes in the agent's code. |
Wow I just tested on my setup and the duplicate names are not an issue. Not sure what that was all about! It was right after I flushed the DB and did the initial site setup. Maybe I'm just crazy. I also noticed from the screenshot that I mixed up the icons for the client and site in the Policy Overview. I can swap those around. I can get the Automation Manager setup with context menus. I think it makes the UI cleaner. I like the serializer idea. That should allow for minimal UI changes. |
lol, I think your icons are correct (client: business, site: apartment), I liked them more than the original ones i had for client and site in the main tree, so when you first added the policy overview, I changed them to match yours but looks like I put them in backwards lmao |
Should there be an edit modal for the automated tasks? Also love the work on the agent exe download! Definitely makes it easier. |
Thanks! Yea there is an edit button but doesn't do anything yet lol Im gonna work on adding it now. |
No problem! I was just making sure I didn't delete it or something |
Can you check this change I made. I have no idea why one works and the other doesn't. |
That is strange! I was able to add a policy task, so I thought everything was working. I'm good with keeping the computed property. Maybe the map state doesn't support a complex callback function. |
Ok thanks. Yea i didn't notice it at first but if you try to add a task and choose check failure as the trigger, the console is full of errors saying policypk not defined. I've been working on adding the edit auto tasks, which has been painful due to how i structured the model using computed relationships instead of actual relationships, which im trying to change and now realizing that I designed the Checks models poorly and am going to be rewriting them to use model inheritance, which is gonna break a lot of shit but needs to be done lol especially since I plan on adding a lot more checks and there is too much duplicate code so just gonna get worse. just fyi xD |
Neat! So with the inheritance change would we be able to pull all checks with a db query to the base model? I think that will definitely make things more scalable. Let me know if you need a hand refactoring! |
yep exactly that's the plan. using multi table inheritance https://docs.djangoproject.com/en/3.0/topics/db/models/#multi-table-inheritance edit: or not lol, reading a lot that multi table inheritance is considered bad practice and im already seeing why just trying to write serializers lol. prolly gonna go with abstract base models https://docs.djangoproject.com/en/3.0/topics/db/models/#abstract-base-classes |
It might make sense in this case to use a single checks table with a JSONField column for the differing columns. It would definitely simplify the checks as a whole, but you may run into issues with field validation within the JSONField. Not sure how that would play out, but it is an option. The way I see it, there are 5 main types of checks: Threshold (wmi) Checks, Script Checks, Service Checks, Ping Checks, and EventLog Checks. Cpuload, Memory, DiskSpace, etc. are basically the same except they compare a different field to a user defined threshold. If you derive from an abstract Checks model, going with the above might simplify that and even allow for user defined "Threshold" checks where they can compare any data point that the agent returns to a defined threshold. Just a few ideas, looking forward to seeing the branch! Meanwhile I'm still struggling getting the vue tests to run in the Policy Checks Table, so I'm going to refactor it again. lol |
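To make the user-defined "Threshold" idea concrete, here is a rough sketch in plain Python. The data point names (`cpu_load`, `mem_used_pct`, `disk_free_pct`), the comparator strings, and the report layout are all made up for illustration; the thread does not define what the agent actually returns.

```python
import operator
from dataclasses import dataclass

COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class ThresholdCheck:
    data_point: str   # key in the agent's report, e.g. "cpu_load" (hypothetical)
    comparator: str   # one of the keys in COMPARATORS
    threshold: float

    def is_failing(self, report: dict[str, float]) -> bool:
        """Return True when the reported value crosses the threshold."""
        if self.data_point not in report:
            raise KeyError(f"agent did not report {self.data_point!r}")
        return COMPARATORS[self.comparator](report[self.data_point], self.threshold)


report = {"cpu_load": 93.0, "mem_used_pct": 41.5, "disk_free_pct": 8.0}
checks = [
    ThresholdCheck("cpu_load", ">", 85),
    ThresholdCheck("mem_used_pct", ">", 90),
    ThresholdCheck("disk_free_pct", "<", 10),
]
failing = [c.data_point for c in checks if c.is_failing(report)]
print(failing)  # ['cpu_load', 'disk_free_pct']
```

Under this shape, Cpuload, Memory, and DiskSpace differ only in which key they read and which direction they compare, which is the duplication the comment above is pointing at.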
That's a great idea! I like the single checks table more than using django's abstract inheritance, which doesn't really change much from now since we still gonna have a table for each check. Yea the problem with jsonfield is that it's a lot harder to do validation, which is also a reason i wanted to refactor checks: to follow django's best practice of "fat models and serializers, thin views" so should be very little logic in the views, just basically validate the serializer and then call save on it. So I think im gonna just make one big check model and have a bunch of null fields for all the check specific columns, and I'll also add a jsonfield called extra_details or something just as a place to store extra info in case ever need it. |
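A sketch of what "one big check model with nullable columns" plus serializer-style validation might look like, written with a dataclass instead of a Django model so it runs on its own. The column names, the check type strings, and the threshold range are assumptions; only `extra_details` comes from the comment above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Which type-specific columns each check type needs (illustrative only).
REQUIRED_FIELDS = {
    "diskspace": ("disk", "threshold"),
    "cpuload": ("threshold",),
    "ping": ("ip",),
    "winsvc": ("svc_name",),
    "script": ("script_name",),
}


@dataclass
class Check:
    check_type: str
    disk: Optional[str] = None
    threshold: Optional[int] = None
    ip: Optional[str] = None
    svc_name: Optional[str] = None
    script_name: Optional[str] = None
    extra_details: dict[str, Any] = field(default_factory=dict)


def validate(check: Check) -> list[str]:
    """Return a list of error messages; an empty list means the check is valid."""
    if check.check_type not in REQUIRED_FIELDS:
        return [f"unknown check type: {check.check_type}"]
    errors = [
        f"{name} is required for {check.check_type} checks"
        for name in REQUIRED_FIELDS[check.check_type]
        if getattr(check, name) is None
    ]
    if check.threshold is not None and not 0 < check.threshold < 100:
        errors.append("threshold must be between 1 and 99")
    return errors


print(validate(Check("diskspace", threshold=25)))
# ['disk is required for diskspace checks']
```

Keeping the rules next to the model this way is what lets the view shrink to "validate, then save", and real columns stay easier to validate than keys buried inside a JSON blob.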
Ha! Nice catch. Ok great ya I'm liking this refactor, waaaaay less code so far. |
Me too! I also combined the PolicyForm when I was setting up the tests and was able to leverage the dynamic component loading with a single q-dialog in the PolicyCheckTab component. It might be a good fit for the AgentChecks Tab as well. I also find it is way easier to test that a vuex action/mutation are fired from a component versus mocking the axios object. I have been adding most things to the main store.js file, but might split that up to make it more readable. It is also nice because the api endpoint url only need to be changed in one spot! |
I just created a pull request including the rework to the policies components and API. I did move the loadpolicychecks route to the automation app to match the policy automated tasks. Feel free to move it back to checks since it is basically just pulling a list of checks now. Awesome work! It is crazy how a small change like that can slim down the views! I also need to figure out the validators. They seem way more efficient when done in the serializers. The only thing I noticed was the Disk Check add with the policies isn't working. I took a swing at it, but wasn't sure what was going on. Here is the error message. I am guessing it is trying to pull from the agent disks and that is why it is showing NoneType.
|
Merged, thank you! I fixed the diskcheck error, it was also broken for agent checks not just policy, was a dumb validation error lol ya am still figuring it out but really like validating in the serializer. Nah that's fine you had the right idea, I also want to move some agent specific checks views into the agents app instead of checks app. I think only generic check stuff should be in the checks app, if its agent checks then agent app, and policy checks then automation app. |
merged everything into develop and released a new agent. am sure there are more bugs but will fix in develop. ty for helping me with the refactor! |
No problem! I will get the new agent installed and check it out. I was thinking about also adding context menus to the Site and Client Tree items to allow actions to be performed quicker. Was thinking items like (Edit, Add to Policy, etc). Also adding a context menu item for the Agent for "Add to Policy". Let me know if that sounds good and I can get that going. |
Ha I was just doing work on cleaning up the client/site views, I just pushed my changes but feel free to change them. I actually wanted to do the edit client/site tree as a context menu, attempted it once but couldn't really figure it out so just moved on lol but yes please if you are able to i would much rather have context menu's for the main tree. And yea the add to policy sounds great too. Thanks! |
When you get a chance can you please look at the new branch I made called quasarcli. Want to move over to using quasar's cli instead of regular npm for building so can start using the full features of quasar, esp the new build mode which i tested significantly reduces the bundle size and compile time. I got everything working except tests, hopefully won't be too hard to switch over. |
No problem! I'll check it out. I think they actually have a testing library that wraps vue-test-utils. |
Great work on the policy checks! I've been testing it out. Was not seeing checks being added under the agent checks table, it seems they are only added once the agent checks in, so if an agent is offline (which happened with some of my agents since we lost internet at a site) I guess the checks won't be added until the next time the agent checks in. Is this intended? Or should the checks be added as soon as an agent is added to a policy, and then show up as "awaiting first synchronization" like they currently do with regular checks. Also I cannot seem to delete a policy, error says policy with PK is not found, even though it's the correct PK so something weird going on. Here's the traceback ` During handling of the above exception, another exception occurred: Traceback (most recent call last): |
Thanks for testing! It looks like the post_delete hook wasn't appropriate for the policy deletion since the policy needs to exist for it to work. I changed it to pre_delete and I think it is good now. I created a pull request. I was a little stumped about how to implement the agent policy checks. When I started digging into how the checks worked, I found that it stored the status and result information in the check itself. So I figured each agent that a policy check affected needed a separate policy check to be able to report the status. And it also needed to be synchronized whenever there was a change to a policy, policy check, and also agent check additions/deletes. I see what you are saying about the agents that aren't checking in. You could move the policy check generation logic directly into the signal, and they would be created immediately. One reason why I thought to separate the policy check creation from the request was that it could potentially affect 1000's of agents, so it might take a long time to run a simple policy edit request. You could also fire off a celery task to create the checks asynchronously, so the request is a little more snappy. |
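The pre_delete versus post_delete difference can be illustrated without Django. The toy store below is not Django's signal API; `PolicyStore` and `cleanup_policy_checks` are hypothetical stand-ins, and the failure mode (a lookup by PK after the row is gone) is one plausible reading of the "policy with PK is not found" error, not a confirmed diagnosis.

```python
class PolicyStore:
    """Tiny in-memory stand-in for a table with pre/post delete hooks."""

    def __init__(self):
        self.rows = {}
        self.pre_delete = []
        self.post_delete = []

    def delete(self, pk):
        for handler in self.pre_delete:
            handler(self, pk)   # the row still exists here
        del self.rows[pk]
        for handler in self.post_delete:
            handler(self, pk)   # the row is gone here


removed_checks = []


def cleanup_policy_checks(store, pk):
    """Looks the policy up by PK, so it has to run before the row is removed."""
    policy = store.rows[pk]     # raises KeyError once the row has been deleted
    removed_checks.extend(policy["checks"])


store = PolicyStore()
store.rows[1] = {"name": "Servers", "checks": ["ping", "diskspace"]}
store.pre_delete.append(cleanup_policy_checks)
store.delete(1)
print(removed_checks)  # ['ping', 'diskspace']
```

Registering the same handler on `post_delete` instead makes the lookup fail with `KeyError`, which mirrors the symptom reported above.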
Cool ty. And yea definitely don't want to do that much creating/editing in the view, celery task would be perfect. If you're able to move it to that would be awesome but if too much work don't worry about it. |
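As a rough picture of "keep the view thin, do the bulk work in a task", here is a sketch that uses a standard-library queue and a worker thread as a stand-in for Celery. It is not Celery code, and `edit_policy_view` and `generate_policy_checks` are hypothetical names. The only point is the shape: the request handler enqueues the work and returns, and the per-agent generation happens elsewhere.

```python
import queue
import threading

jobs = queue.Queue()
generated = []  # (policy_id, agent_id) pairs, standing in for created check rows


def generate_policy_checks(policy_id, agent_ids):
    """The slow part: one policy check copy per affected agent."""
    for agent_id in agent_ids:
        generated.append((policy_id, agent_id))


def worker():
    """Background loop that plays the role of the task worker."""
    while True:
        policy_id, agent_ids = jobs.get()
        try:
            generate_policy_checks(policy_id, agent_ids)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def edit_policy_view(policy_id, agent_ids):
    """Request handler: enqueue the work and return immediately."""
    jobs.put((policy_id, agent_ids))
    return "ok"
```

Calling `edit_policy_view(7, [1, 2, 3])` returns right away; `jobs.join()` blocks until the worker has caught up, which is handy in tests. With thousands of agents, the request time then no longer depends on how many checks have to be written.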
I was able to get the agent policy check generation into a task. It worked, but I merged your latest commits and it says that it can't find uninstall_pending on the agent. I am guessing if I update the agent to 0.9.1 it will fix that problem. Also a few things I need to add yet,
I also wanted to change when the policy checks are generated and when checks are overwritten. Currently, whenever certain checks and policies are edited, the policy checks are deleted and recreated. Would like to add some tasks that actually update the check contents, versus just deleting them. |
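One way the "update instead of delete and recreate" idea could look, sketched with plain dicts. The check keys (such as `"diskspace:C:"`), the `status` field, and its `"pending"` value are hypothetical. The sketch assumes each agent's copy of a policy check carries its own status, as described earlier in the thread, and that this status should survive a settings change.

```python
def sync_agent_checks(existing: dict[str, dict], desired: dict[str, dict]) -> dict[str, list[str]]:
    """Bring an agent's policy checks in line with the policy, in place.

    Both dicts map a check key to its settings. Status and result fields on
    untouched or updated checks are preserved.
    """
    changes = {"created": [], "updated": [], "deleted": []}
    for key in list(existing):
        if key not in desired:
            del existing[key]
            changes["deleted"].append(key)
    for key, settings in desired.items():
        if key not in existing:
            existing[key] = {**settings, "status": "pending"}
            changes["created"].append(key)
        elif any(existing[key].get(k) != v for k, v in settings.items()):
            existing[key].update(settings)  # keeps "status" and other result fields
            changes["updated"].append(key)
    return changes


agent_checks = {
    "diskspace:C:": {"threshold": 10, "status": "passing"},
    "ping:gw": {"ip": "10.0.0.1", "status": "failing"},
}
policy_checks = {"diskspace:C:": {"threshold": 25}, "cpuload": {"threshold": 85}}
print(sync_agent_checks(agent_checks, policy_checks))
# {'created': ['cpuload'], 'updated': ['diskspace:C:'], 'deleted': ['ping:gw']}
```

After the call, the disk check keeps its `passing` status with the new threshold, so editing a policy does not reset every agent's results.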
Nah, agent 0.9.1 will not fix that. I think migrations are out of sync cuz if I merge your PR there will be 2 migrations that start with 0006_ in agents/migrations I don't think that's supposed to happen. My 0006 migration file was where uninstall_pending and uninstall_inprogress fields were removed. Can you login to your postgres database and see if maybe those 2 fields still show up? They should not be there. |
oh I see the problem! I removed the policies_pending field on the agent and it created another migration file with the same id as yours. It told me to run a makemigrations --merge, so I did that and thought it was good. I'll try to rebase the commit and run the makemigrations after that. |
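A small helper along these lines can flag the situation before a PR is merged. It is only a filename heuristic: Django itself decides that migrations conflict from the dependency graph (more than one leaf node in an app), not from the numeric prefix, and the file names in the example are invented.

```python
from collections import defaultdict


def duplicate_migration_numbers(filenames: list[str]) -> dict[str, list[str]]:
    """Group migration files by numeric prefix and return prefixes used more than once."""
    by_number = defaultdict(list)
    for name in filenames:
        prefix, sep, _ = name.partition("_")
        if sep and prefix.isdigit():
            by_number[prefix].append(name)
    return {num: sorted(names) for num, names in by_number.items() if len(names) > 1}


# Hypothetical contents of agents/migrations after merging both branches.
files = [
    "__init__.py",
    "0005_auto.py",
    "0006_remove_uninstall_fields.py",
    "0006_remove_policies_pending.py",
]
print(duplicate_migration_numbers(files))
# {'0006': ['0006_remove_policies_pending.py', '0006_remove_uninstall_fields.py']}
```

Two files sharing a number is not an error on its own, for example after `makemigrations --merge`, but it is a cheap signal that two branches each generated a migration from the same parent.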
can you check the new docker setup please it seems to be working for me I tested it on a fresh server but lmk if I did something wrong. |
Sounds good! I'll spin up the env and test it out. |
I did some tests on the mesh setup within docker. The user account creation is working. A few things I am having issues with:
I can get the login token, but getting it to the API is a bit awkward. I currently have a volume mapped to both mesh central and the api container and am saving the token in a file. The API can see this file and could import it once, or can just pull from this file whenever it needs to authenticate to mesh central. The issue with creating the group and getting the EXE is that I can't start mesh central and run commands afterward within the container. I was thinking that I can install a websockets package in the api container and create a custom django command that will create the group and get the mesh exe. Is it possible to get the mesh exe from mesh central and save it in the correct folder without having to upload it? I'll keep working on this. |
For the login token yea instead of having it set in local_settings.py, can just create a new field in the CoreSettings database and just add some code to the existing initial_db_setup.py management command to read from that file and put in db. I don't think it's possible to get the mesh exe that way without changing mesh's source code. Even with the changes I made when installing using install.sh, you still have to open the link generated at the end of the install script, click on it to download and then upload it. If you go to mesh.yoursite.com/meshagents you can actually see all the binaries and they are also super easy to get from the filesystem. The problem is those do not have any metadata so pretty useless on their own. You can't get an exe without first creating a mesh device group which somehow bundles that into the exe, plus the download link changes every single time you click on it. It's like meshcentral is compiling a new exe on the fly every time you download it if that makes sense. I don't think it's worth it lol. It's just a one time thing anyway to upload the mesh agent, I was just trying to make the initial setup a bit easier. |
Have you used the policies in production at all? I know there might need to be some improvements on the UI. Also was going to finish up the alerts. I was thinking about creating an alerts app. The alerts can be triggered from a failing check or task and can then be resolved, snoozed, or ignored by the user. What are your thoughts on that? |
So actually for the past week I have been mass deploying in production, I am at around 400 agents right now and another 300 or so to go so been seeing how it scales and making changes to handle the load. I have not started adding any checks or policies yet but once I get all deployed I will test out the policy stuff and let you know. Yea that sounds great for alerts! |
Ok here's what I found for policy checks. Haven't tried tasks yet. I commented out 2 lines that were causing celery tasks to throw errors and fail in commit ebb200f, since it doesn't look like they were being used anyway. No policy checks are being created on the agents when I either apply a policy to a client or a site from the Client/Site tree context menu. Deleting a check from a policy does not remove it from the agent. I have a lot of sites with the same name, like almost all our clients have a site called HQ, and so when showing policy relations, it's showing the policy as being applied to all agents across all clients that have an HQ site, instead of just that one specific client. |
@sadnub got a discord up if you want to join https://discord.gg/upGTkWp |
Strange, I didn't get any email alerts for these messages. I'll start digging into these issues. I might rework it a little bit also. What might help is adding a foreign key on the agent for the site and client. |
Yes this is my bad I have no idea why i didn't make the client and site foreign keys, I did realize it at some point but never got around to fixing it. |
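The "HQ" problem and the foreign key fix can be shown side by side. This sketch uses dataclasses rather than Django models; `Site`, `Agent`, and `site_id` are hypothetical and only illustrate why matching on a site's name alone leaks across clients while matching on the site row's key does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    id: int
    client: str
    name: str


@dataclass
class Agent:
    hostname: str
    client: str   # stored as plain text, as described in the thread
    site: str     # stored as plain text, as described in the thread
    site_id: int  # what a foreign key to the site would add


def agents_by_site_name(agents, site):
    """Name-only match: picks up every client's site with the same name."""
    return [a.hostname for a in agents if a.site == site.name]


def agents_by_site_id(agents, site):
    """Key-based match: only agents that point at this exact site row."""
    return [a.hostname for a in agents if a.site_id == site.id]


acme_hq = Site(1, "Acme", "HQ")
globex_hq = Site(2, "Globex", "HQ")
agents = [
    Agent("acme-pc", "Acme", "HQ", site_id=1),
    Agent("globex-pc", "Globex", "HQ", site_id=2),
]
print(agents_by_site_name(agents, acme_hq))  # ['acme-pc', 'globex-pc']
print(agents_by_site_id(agents, acme_hq))    # ['acme-pc']
```

Filtering on both client name and site name would also avoid the mix-up, but a key avoids it without relying on names staying unique or unchanged.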
I was looking into ways to apply checks globally at the client, site, or agent level. I thought of having an Automation Manager component. Something like this added to the Settings FileBar.
You can add policies that include multiple checks. Then you can apply these policies at clients, sites, and agents. Similar to a GPO. The policies can be run manually, on a scheduled basis, or triggered by an external event.
I was also thinking of a way to see all active alerts. Possibly located in a notification icon by the logout menu.
Let me know your thoughts and I can work on it. I have the frontend built out mostly just need to add the routes server-side.