Skip to content

Lightweight backups for cloud storage based tasks #9535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 36 commits into
base: develop
Choose a base branch
from

Conversation

Eldies
Copy link
Contributor

@Eldies Eldies commented Jun 16, 2025

Motivation and context

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Copy link
Contributor

@Marishka17 Marishka17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I do not think that we need to mark a data storage as "missing" when a user does not select a cloud storage from the list but clicks next to the field. It looks better to me to show the original cloud storage.
    image
  • We should check on the server that the newly attached storage has "available" status (bucket still exists and credentials are valid)
  • If I open a task created from a lightweight backup file, then there will be an exception notification:
    image
  • I think now we need to change the exception raised on the server when preparing a data chunk with unlinked CS. Now there will be a 500 status code when fetching a chunk or a task preview - let's return more appropriate codes (probably a 409 status code or at least a better message)

@@ -214,6 +229,7 @@ def _init_callback_with_params(self):
Exporter,
logger,
self.job_result_ttl,
self.export_args.make_lightweight_backup,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change affects already created rq jobs. We need to remove them or add a redis migration. Additionally, I do not really like this approach because this option can only be used when task data is linked with a cloud storage; for other storage types, this option is useless. Probably we can use the rq job meta or introduce another exporter class that inherits TaskExporter.
@SpecLad, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that this parameter has a similar role to the format/save_images parameters of dataset exports, so it should be handled in a similar way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would only make sense for me if:

  1. this option is handled for all storage types
  2. there is an option in the UI to attach some data to the created empty tasks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this option is handled for all storage types

Well... couldn't it be? I don't think there's anything actually preventing us from creating lightweight backups regardless of the current data storage type. We don't have to actually implement this right now, but we might as well structure the code so that it could be implemented in the future.

there is an option in the UI to attach some data to the created empty tasks

Sorry, I don't understand. This part of the code creates the backup, whereas this sounds like something you'd do when restoring the backup. How is it relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make lightweight parameter into a kwarg for create_backup. That way, existing rq backup jobs still can execute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way, existing rq backup jobs still can execute

But with another behavior since it's now True :)

cvat/schema.yml Outdated
@@ -9466,6 +9476,10 @@ components:
allOf:
- $ref: '#/components/schemas/StorageRequest'
nullable: true
data_storage:
allOf:
- $ref: '#/components/schemas/TaskDataStorageRequest'
Copy link
Contributor

@SpecLad SpecLad Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the same type as source_storage/target_storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have the id field, which is not needed here. Removed it, using other way to change data storage

cvat/schema.yml Outdated
@@ -10791,6 +10824,10 @@ components:
description: |
The number of consensus replica jobs for each annotation job.
Configured at task creation
data_storage:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this field need to be here? AFAIK, TaskWriteRequest is only used to create blank tasks, and the data storage for those is only set via the POST /api/tasks/<id>/data request.

Copy link
Contributor Author

@Eldies Eldies Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got rid of this in favor of using GET/PATCH /api/tasks/<id>/data/meta

cvat/schema.yml Outdated
name: lightweight
schema:
type: boolean
description: Make lightweight backup for cloud based tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It would be useful to explain what "lightweight" means.
  2. I only skimmed the PR so I could've missed it, but I didn't see any code that would reject this parameter when the task is not cloud based. Please add it if it's not there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Tried to write a better description.
  2. I don't quite understand why it should be explicitly rejected instead of just doing nothing

@codecov-commenter
Copy link

codecov-commenter commented Jun 30, 2025

Codecov Report

Attention: Patch coverage is 71.02804% with 31 lines in your changes missing coverage. Please review.

Project coverage is 71.84%. Comparing base (ae32938) to head (bfd14f8).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9535      +/-   ##
===========================================
+ Coverage    71.74%   71.84%   +0.10%     
===========================================
  Files          441      441              
  Lines        46246    46326      +80     
  Branches      3946     3952       +6     
===========================================
+ Hits         33180    33284     +104     
+ Misses       13066    13042      -24     
Components Coverage Δ
cvat-ui 77.68% <43.58%> (-0.07%) ⬇️
cvat-server 67.22% <86.76%> (+0.22%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@@ -586,6 +587,11 @@
data=str(ex),
status=status.HTTP_500_INTERNAL_SERVER_ERROR,
)
except CloudStorageMissingError as ex:
return Response(
data=str(ex),

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 25 days ago

To fix the issue, the code should replace the detailed exception message (str(ex)) with a generic error message when responding to external users. The detailed exception message can still be logged on the server for debugging purposes. This approach ensures that sensitive information is not exposed while retaining the ability to diagnose issues internally.

Steps to implement the fix:

  1. Replace str(ex) in the response with a generic error message, such as "An internal error has occurred."
  2. Log the detailed exception message using the slogger instance (ServerLogManager) for internal debugging.
Suggested changeset 1
cvat/apps/engine/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cvat/apps/engine/views.py b/cvat/apps/engine/views.py
--- a/cvat/apps/engine/views.py
+++ b/cvat/apps/engine/views.py
@@ -590,4 +590,5 @@
         except CloudStorageMissingError as ex:
+            slogger.error(f"CloudStorageMissingError occurred: {str(ex)}")
             return Response(
-                data=str(ex),
+                data="An internal error has occurred.",
                 status=status.HTTP_409_CONFLICT,
EOF
@@ -590,4 +590,5 @@
except CloudStorageMissingError as ex:
slogger.error(f"CloudStorageMissingError occurred: {str(ex)}")
return Response(
data=str(ex),
data="An internal error has occurred.",
status=status.HTTP_409_CONFLICT,
Copilot is powered by AI and may make mistakes. Always verify output.
@@ -694,6 +700,11 @@
data=str(ex),
status=status.HTTP_500_INTERNAL_SERVER_ERROR,
)
except CloudStorageMissingError as ex:
return Response(
data=str(ex),

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 25 days ago

To fix the issue, the code should be modified to ensure that the exception message is not directly exposed to the user. Instead, a generic error message should be returned, while the detailed exception information is logged for debugging purposes. This approach ensures that sensitive information is not leaked to external users while still allowing developers to diagnose issues.

The changes required are:

  1. Replace str(ex) in the HTTP response with a generic error message, such as "An internal error occurred."
  2. Log the detailed exception message using the slogger logger for internal debugging.
Suggested changeset 1
cvat/apps/engine/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cvat/apps/engine/views.py b/cvat/apps/engine/views.py
--- a/cvat/apps/engine/views.py
+++ b/cvat/apps/engine/views.py
@@ -703,4 +703,5 @@
             except CloudStorageMissingError as ex:
+                slogger.error(f"CloudStorageMissingError occurred: {str(ex)}")
                 return Response(
-                    data=str(ex),
+                    data="An internal error occurred.",
                     status=status.HTTP_409_CONFLICT,
EOF
@@ -703,4 +703,5 @@
except CloudStorageMissingError as ex:
slogger.error(f"CloudStorageMissingError occurred: {str(ex)}")
return Response(
data=str(ex),
data="An internal error occurred.",
status=status.HTTP_409_CONFLICT,
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Contributor Author

@Eldies Eldies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that we need to mark a data storage as "missing" when a user does not select a cloud storage from the list but clicks next to the field. It looks better to me to show the original cloud storage.

Fixed

We should check on the server that the newly attached storage has "available" status (bucket still exists and credentials are valid)

doing it now

I think now we need to change the exception raised on the server when preparing a data chunk with unlinked CS. Now there will be a 500 status code when fetching a chunk or a task preview - let's return more appropriate codes (probably a 409 status code or at least a better message)

Made a better message and also 409 code.

cvat/schema.yml Outdated
name: lightweight
schema:
type: boolean
description: Make lightweight backup for cloud based tasks
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Tried to write a better description.
  2. I don't quite understand why it should be explicitly rejected instead of just doing nothing

cvat/schema.yml Outdated
@@ -10791,6 +10824,10 @@ components:
description: |
The number of consensus replica jobs for each annotation job.
Configured at task creation
data_storage:
Copy link
Contributor Author

@Eldies Eldies Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got rid of this in favor of using GET/PATCH /api/tasks/<id>/data/meta

cvat/schema.yml Outdated
@@ -9466,6 +9476,10 @@ components:
allOf:
- $ref: '#/components/schemas/StorageRequest'
nullable: true
data_storage:
allOf:
- $ref: '#/components/schemas/TaskDataStorageRequest'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have the id field, which is not needed here. Removed it, using other way to change data storage

Copy link

sonarqubecloud bot commented Jul 1, 2025


def init_request_args(self) -> None:
super().init_request_args()
lightweight = to_bool(self.request.query_params.get("lightweight", True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True by default may be unexpected behavior for users, as it changes the output result despite the fact that the usage of API parameters does not change on the user's side (in some scripts).

@@ -216,6 +229,9 @@ def _init_callback_with_params(self):
logger,
self.job_result_ttl,
)
self.callback_kwargs = dict(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using dictionary literal

Comment on lines +896 to +901
if data['client_files'] != [self.MEDIA_MANIFEST_FILENAME]:
raise ValidationError(f"Expected {self.MEDIA_MANIFEST_FILENAME} in backup files")

manifest = ImageManifestManager(os.path.join(self._db_task.data.get_upload_dirname(), self.MEDIA_MANIFEST_FILENAME))
data['server_files'] = list(manifest.data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 file types (client_files/server_files) that are passed to the task creation logic. I don't remember that we're using somewhere more than one data type when creating tasks and I rather think that this should not be allowed.

@@ -1100,6 +1118,8 @@ def create_backup(
Exporter: Type[ProjectExporter | TaskExporter],
logger: Logger,
cache_ttl: timedelta,
*,
lightweight: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think the lightweight should be True by default because it changes the logic for RQ jobs created before.

One more disadvantage of setting the default value in this func - the default value for this argument is defined 2 times now: here and when reading the request.query_params. IMHO, it's better to have only one place where the default value is defined (when handling query params). But I'm okay if it's okay for others.

@@ -971,13 +984,14 @@ def _prepare_project_meta(self, project):
class ProjectExporter(_ExporterBase, _ProjectBackupBase):
ModelClass: ClassVar[models.Project] = models.Project

def __init__(self, pk, version=Version.V1):
def __init__(self, pk, version=Version.V1, *, lightweight: bool = True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not defining the default value for the lightweight argument here (the same comment for the TaskExported class).

Suggested change
def __init__(self, pk, version=Version.V1, *, lightweight: bool = True):
def __init__(self, pk, *, lightweight: bool, version: Version = Version.V1):

): Promise<string | void> {
const { backendAPI } = config;
const params: Params = {
...enableOrganization(),
...configureStorage(targetStorage, useDefaultSettings),
...(fileName ? { filename: fileName } : {}),
...(!lightweight ? { lightweight } : {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks strange that you send the lightweight query param only when it's False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the lightweight param should be optional and we should send it only if its specified

...(typeof lightweight === 'boolean' ? { lightweight } : {}),

@@ -483,6 +483,8 @@ export interface SerializedFramesMetaData {
size: number;
start_frame: number;
stop_frame: number;
storage: StorageLocation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact it can be also "share" type

targetStorage: Storage,
useDefaultSettings: boolean,
fileName?: string,
lightweight: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lightweight: bool,
lightweight: boolean,

But I also think it should be optional

checked={lightweight}
onChange={setLightweight}
/>
<Text strong>Make light-weight backup</Text>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<Text strong>Make light-weight backup</Text>
<Text strong>Make a lightweight backup</Text>

@@ -35,6 +38,7 @@ const initialValues: FormValues = {
cloudStorageId: undefined,
},
useProjectTargetStorage: true,
lightweight: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we shouldn't show this option at least in cases where we know that it's useless (e.g. when a task has no cloud data). Was it agreed upon with others?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree here, in case there is no cs for task we shouldnt even show the form item for this

@bsekachev
Copy link
Member

bsekachev commented Jul 22, 2025

image

I do not think it makes sense to keep this switch for tasks, not created from a cloud storage

Let's add a tooltip (like "Use default settings" has), as it is not obviously for a user what is difference between light weight backup and regular

@bsekachev
Copy link
Member

bsekachev commented Jul 22, 2025

When I export regular backup and lightweight, I can't see both results on requests page, it seems they overwrite each other. Only the latest is visible. I believe it should work like for exports with different formats

image

@bsekachev
Copy link
Member

image

The layout here does not look well.

Are there any reasons why the field only gets rendered after fetching corresponding cloud storage?
May we show loading indicator instead while fetching?

@bsekachev
Copy link
Member

bsekachev commented Jul 22, 2025

image

UI for view/edit is different, may we just always show selector and do this field similar to "Assigned to"? (maybe even align it on the right).

In this form "Select cloud storage" has "*" mark, making this field obligatory, however I may left it empty. I do not think we need this mark

@bsekachev
Copy link
Member

For a project containing mixed cloud storage and non cloud storage data, the switcher "Make lightweight backup" seems a little uncertain. I would suggest to rename it, maybe like "Use lightweight backup whenever possible" (with corresponding hint)..? IDK, I would suggest to think on better message here

@bsekachev
Copy link
Member

@klakhov Could you please look at client changes here?

Copy link
Contributor

@klakhov klakhov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should update the style of the cloud storage selector to match the assignee selector.
We should also move it to the right to align with the assignee selector.

@@ -701,6 +715,22 @@ function saveJobMeta(meta: FramesMetaData, jobID: number): Promise<FramesMetaDat
return frameMetaCache[jobID];
}

function saveTaskMeta(meta: FramesMetaData, taskID: number): Promise<FramesMetaData> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is complete dublicate of what we have for saveJobMeta, I think we can rewrite it so we have only one function

@@ -949,6 +979,14 @@ export async function patchMeta(jobID: number): Promise<FramesMetaData> {
return newMeta;
}

export async function patchTaskMeta(taskID: number, meta: FramesMetaData): Promise<FramesMetaData> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for saveTaskMeta

targetStorage: Storage,
useDefaultSettings: boolean,
fileName?: string,
lightweight: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lightweight: bool,
lightweight: boolean,

Also we cant have a required parameter follow optional parameter fileName

@@ -995,12 +995,14 @@ async function backupTask(
targetStorage: Storage,
useDefaultSettings: boolean,
fileName?: string,
lightweight: boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, we cant have required param follow optional one

): Promise<string | void> {
const { backendAPI } = config;
const params: Params = {
...enableOrganization(),
...configureStorage(targetStorage, useDefaultSettings),
...(fileName ? { filename: fileName } : {}),
...(!lightweight ? { lightweight } : {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the lightweight param should be optional and we should send it only if its specified

...(typeof lightweight === 'boolean' ? { lightweight } : {}),

Comment on lines +35 to +38
const data = await getCore().cloudStorages.get({ id });
if (data.length === 1) {
return data[0];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const data = await getCore().cloudStorages.get({ id });
if (data.length === 1) {
return data[0];
}
const [data] = await core.projects.get({ id });
return data;

}
}, [meta]);

const updateCloudStorage = (_cloudStorage: CloudStorage | null): Promise<void> => (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updateCloudStorage function updates task metadata, which for the user essentially means that the task itself is being updated. This should trigger the loading spinner.

I recommend merging develop and checking how task fields are updated. We use the onUpdateTask callback, which dispatches the updateTaskAsync action.

I think we should create a dedicated updateTaskMetadataAsync action and an onUpdateTaskMetadata callback, instead of handling the update logic directly here.

The reason is that within updateTaskMetadataAsync, we can dispatch updateTask / updateTaskFailed, which will update the activities.updates fields in the reducer—these are responsible for showing the loading spinner on the task page.

Comment on lines +57 to +72
useEffect(() => {
task.meta.get().then((_meta) => {
setMeta(_meta);
});
}, []);

useEffect(() => {
if (meta && meta.cloudStorageId) {
getCloudStorageById(meta.cloudStorageId).then((_cloudStorage) => {
setCloudStorage(_cloudStorage);
setCloudStorageLoaded(true);
});
} else {
setCloudStorageLoaded(true);
}
}, [meta]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching task metadata and the corresponding cloud storage should be part of the receiveTask function for the task-page, which manages the loading spinner.
If we try to fetch this data here, the user will see the page before all the necessary data has been fully loaded.

meta && meta.storage === StorageLocation.CLOUD_STORAGE && (
<Row justify='space-between' align='middle'>
<Col span={12}>
<Row className='cvat-cloud-storage'>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is wrong with class names here

Suggested change
<Row className='cvat-cloud-storage'>
<Row className='cvat-task-cloud-storage'>

{ cloudStorageLoaded && cloudStorage && renderCloudStorageName(cloudStorage) }
{ cloudStorageLoaded && !cloudStorage && 'MISSING' }
<Text
className='cvat-issue-tracker-value'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
className='cvat-issue-tracker-value'
className='cvat-task-cloud-storage-?'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants