Handle TimeoutException During Agent Replacement in agent configuration #5210

Merged
rishabhmalikMS merged 11 commits into master from users/rishabhalikMS/agentConfigTimeoutFix on May 29, 2025

Conversation

rishabhmalikMS
Contributor

@rishabhmalikMS rishabhmalikMS commented May 14, 2025

Context

This PR addresses an issue where TimeoutException thrown during agent replacement was not being caught in unattended mode, causing unhandled exceptions and unclear error reporting. The change ensures that timeouts are always handled gracefully, regardless of the configuration mode.
ICM: https://portal.microsofticm.com/imp/v5/incidents/details/614374419/summary

Work Item

AB#2277929

Description

Updated the exception handling logic in ConfigurationManager to always catch TimeoutException during agent replacement, even in unattended mode.


Risk Assessment (Low / Medium / High)

Low
Reason: This change adds a retry mechanism for the agent update call in the unattended agent configuration process. It does not alter the existing flow; the update only makes unattended agent configuration more resilient.


Unit Tests Added or Updated (Yes / No)

No new unit tests were added. The change is in error handling and user messaging.


Additional Testing Performed

Manual testing was done by configuring an agent with a delayed call to updateAgent.

@rishabhmalikMS rishabhmalikMS requested review from a team as code owners May 14, 2025 03:24
@rishabhmalikMS
Contributor Author

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@rishabhmalikMS rishabhmalikMS changed the title adding timeout check in update agent async call during agent configur… Handle TimeoutException During Agent Replacement during agent configuration May 14, 2025
@rishabhmalikMS rishabhmalikMS changed the title Handle TimeoutException During Agent Replacement during agent configuration Handle TimeoutException During Agent Replacement in agent configuration May 14, 2025

Contributor

@tarunramsinghani tarunramsinghani left a comment

Please make the changes as suggested

@AdityaMankal-MS
Contributor

Please update the PR description and select a risk assessment (Low / Medium / High).
This is essential for this sprint's SafeFly.

@rishabhmalikMS
Contributor Author

> Please update the PR description and select a risk assessment (Low / Medium / High). This is essential for this sprint's SafeFly.

Updated the risk section with a reason.

Contributor

@sanjuyadav24 sanjuyadav24 left a comment

Hi @rishabhmalikMS,
these changes look good. Could you please add confirmation that these exceptions are handled properly when this is used with the VMSS script?

@rishabhmalikMS
Contributor Author

> Hi @rishabhmalikMS, these changes look good. Could you please add confirmation that these exceptions are handled properly when this is used with the VMSS script?

Tested with the script as well, for unattended configuration.
[image]

Here are the sample diagnostic logs generated with the error information:
Agent_20250521-104151-utc.log

@sanjuyadav24 sanjuyadav24 requested a review from Copilot May 22, 2025 12:54
@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the agent replacement process by adding retry logic for TimeoutException and related cancellations, ensuring failures are handled gracefully in both attended and unattended modes.

  • Added a localized retry message for agent replacement attempts.
  • Introduced UpdateAgentWithRetryAsync with exponential backoff and retry limits.
  • Refactored agent update calls in both configure and re-auth flows to use the new retry wrapper.
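
The retry-with-backoff pattern summarized in this overview can be sketched roughly as follows. This is a minimal illustration written in Python, not the C# added to ConfigurationManager.cs and not the PR's UpdateAgentWithRetryAsync. The names `call_with_retry`, `base_delay`, and the injectable `sleep` hook are assumptions made for the example, as is the choice to count the limit as total attempts; only the value 3 comes from the `_maxRetries` constant quoted in this review.

```python
import time

MAX_RETRIES = 3  # mirrors the _maxRetries constant quoted in the review


def call_with_retry(operation, base_delay=1.0, max_retries=MAX_RETRIES, sleep=time.sleep):
    """Call operation(); on TimeoutError, retry with exponential backoff.

    Hypothetical helper for illustration only. At most max_retries attempts
    are made in total. The delay doubles after each failed attempt
    (base_delay, 2 * base_delay, ...). The final failure is re-raised so the
    caller can report it and exit with an error.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_retries:
                raise  # attempts exhausted: surface the timeout to the caller
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} timed out; retrying in {delay:g}s")
            sleep(delay)
```

A caller would wrap the slow call, for example `call_with_retry(lambda: update_agent(agent))`, where `update_agent` is a hypothetical stand-in for the agent update request. Passing `sleep` as a parameter keeps the backoff testable without real waiting.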

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File: Description
  • src/Misc/layoutbin/en-US/strings.json: Added a new "RetryingReplaceAgent" string for retry attempts notification.
  • src/Agent.Listener/Configuration/ConfigurationManager.cs: Added retry helper, constants, and updated update calls to handle timeouts.

Comments suppressed due to low confidence (1)

src/Agent.Listener/Configuration/ConfigurationManager.cs:49

  • [nitpick] Private constants typically use PascalCase (e.g., MaxRetries) or an s_ prefix according to C# naming conventions; consider renaming _maxRetries accordingly.
private const int _maxRetries = 3;

@sanjuyadav24
Contributor

> > Hi @rishabhmalikMS, these changes look good. Could you please add confirmation that these exceptions are handled properly when this is used with the VMSS script?
>
> Tested with the script as well, for unattended configuration. [image]
>
> Here are the sample diagnostic logs generated with the error information: Agent_20250521-104151-utc.log

These logs go into the agent log, so this will not fail the original script.
Could you please check how we should modify the original script so that the enableagent script fails if the agent config command reports any errors?

@rishabhmalikMS
Contributor Author

A further code update is required in the TFS script so that it reads the result of the agent configuration command and fails if an exception is thrown.
Right now the output from Start-Process is ignored, so control never reaches the catch block.
This can be achieved by updating the script to read the exit code and act accordingly.
[image]

Sample output:
[image]
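
The exit-code check proposed in this comment can be sketched as follows. The actual script is PowerShell and uses Start-Process; this Python sketch only illustrates the general idea of reading the child process's exit code and failing the caller when it is non-zero. The function name `run_agent_config` and the shape of `command` are hypothetical, not taken from the TFS script.

```python
import subprocess


def run_agent_config(command):
    """Run the configuration command and fail loudly on a non-zero exit code.

    Hypothetical wrapper for illustration; `command` is a list of arguments
    (executable first). Returns the captured standard output on success.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Propagate the failure instead of silently ignoring the output,
        # so the calling script's error handling is actually triggered.
        raise RuntimeError(
            f"agent configuration failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Raising on a non-zero exit code is what moves control into the caller's error path; a wrapper that ignores the return code would report success even when configuration failed.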

@rishabhmalikMS rishabhmalikMS merged commit 5453c1a into master May 29, 2025
22 checks passed
@rishabhmalikMS rishabhmalikMS deleted the users/rishabhalikMS/agentConfigTimeoutFix branch May 29, 2025 08:05