-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Lightweight backups for cloud storage based tasks #9535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…ange cloud storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I do not think that we need to mark a data storage as "missing" when a user does not select a cloud storage from the list but clicks next to the field. It looks better to me to show the original cloud storage.
- We should check on the server that the newly attached storage has "available" status (bucket still exists and credentials are valid)
- If I open a task created from a lightweight backup file, then there will be an exception notification:
- I think now we need to change the exception raised on the server when preparing a data chunk with unlinked CS. Now there will be a 500 status code when fetching a chunk or a task preview - let's return more appropriate codes (probably a 409 status code or at least a better message)
cvat/apps/engine/background.py
Outdated
@@ -214,6 +229,7 @@ def _init_callback_with_params(self): | |||
Exporter, | |||
logger, | |||
self.job_result_ttl, | |||
self.export_args.make_lightweight_backup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change affects already created rq jobs. We need to remove them or add a redis migration. Additionally, I do not really like this approach because this option can only be used when task data is linked with a cloud storage; for other storage types, this option is useless. Probably we can use the rq job meta or introduce another exporter class that inherits TaskExporter
.
@SpecLad, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that this parameter has a similar role to the format
/save_images
parameters of dataset exports, so it should be handled in a similar way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would only make sense for me if:
- this option is handled for all storage types
- there is an option in the UI to attach some data to the created empty tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this option is handled for all storage types
Well... couldn't it be? I don't think there's anything actually preventing us from creating lightweight backups regardless of the current data storage type. We don't have to actually implement this right now, but we might as well structure the code so that it could be implemented in the future.
there is an option in the UI to attach some data to the created empty tasks
Sorry, I don't understand. This part of the code creates the backup, whereas this sounds like something you'd do when restoring the backup. How is it relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make lightweight
parameter into a kwarg for create_backup
. That way, existing rq backup jobs still can execute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That way, existing rq backup jobs still can execute
But with another behavior since it's now True :)
cvat/schema.yml
Outdated
@@ -9466,6 +9476,10 @@ components: | |||
allOf: | |||
- $ref: '#/components/schemas/StorageRequest' | |||
nullable: true | |||
data_storage: | |||
allOf: | |||
- $ref: '#/components/schemas/TaskDataStorageRequest' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the same type as source_storage
/target_storage
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they have the id
field, which is not needed here. Removed it, using other way to change data storage
cvat/schema.yml
Outdated
@@ -10791,6 +10824,10 @@ components: | |||
description: | | |||
The number of consensus replica jobs for each annotation job. | |||
Configured at task creation | |||
data_storage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this field need to be here? AFAIK, TaskWriteRequest
is only used to create blank tasks, and the data storage for those is only set via the POST /api/tasks/<id>/data
request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got rid of this in favor of using GET/PATCH /api/tasks/<id>/data/meta
cvat/schema.yml
Outdated
name: lightweight | ||
schema: | ||
type: boolean | ||
description: Make lightweight backup for cloud based tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It would be useful to explain what "lightweight" means.
- I only skimmed the PR so I could've missed it, but I didn't see any code that would reject this parameter when the task is not cloud based. Please add it if it's not there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Tried to write a better description.
- I don't quite understand why it should be explicitly rejected instead of just doing nothing
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #9535 +/- ##
===========================================
+ Coverage 71.74% 71.84% +0.10%
===========================================
Files 441 441
Lines 46246 46326 +80
Branches 3946 3952 +6
===========================================
+ Hits 33180 33284 +104
+ Misses 13066 13042 -24
🚀 New features to boost your workflow:
|
@@ -586,6 +587,11 @@ | |||
data=str(ex), | |||
status=status.HTTP_500_INTERNAL_SERVER_ERROR, | |||
) | |||
except CloudStorageMissingError as ex: | |||
return Response( | |||
data=str(ex), |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 25 days ago
To fix the issue, the code should replace the detailed exception message (str(ex)
) with a generic error message when responding to external users. The detailed exception message can still be logged on the server for debugging purposes. This approach ensures that sensitive information is not exposed while retaining the ability to diagnose issues internally.
Steps to implement the fix:
- Replace
str(ex)
in the response with a generic error message, such as "An internal error has occurred." - Log the detailed exception message using the
slogger
instance (ServerLogManager
) for internal debugging.
-
Copy modified line R591 -
Copy modified line R593
@@ -590,4 +590,5 @@ | ||
except CloudStorageMissingError as ex: | ||
slogger.error(f"CloudStorageMissingError occurred: {str(ex)}") | ||
return Response( | ||
data=str(ex), | ||
data="An internal error has occurred.", | ||
status=status.HTTP_409_CONFLICT, |
@@ -694,6 +700,11 @@ | |||
data=str(ex), | |||
status=status.HTTP_500_INTERNAL_SERVER_ERROR, | |||
) | |||
except CloudStorageMissingError as ex: | |||
return Response( | |||
data=str(ex), |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 25 days ago
To fix the issue, the code should be modified to ensure that the exception message is not directly exposed to the user. Instead, a generic error message should be returned, while the detailed exception information is logged for debugging purposes. This approach ensures that sensitive information is not leaked to external users while still allowing developers to diagnose issues.
The changes required are:
- Replace
str(ex)
in the HTTP response with a generic error message, such as "An internal error occurred." - Log the detailed exception message using the
slogger
logger for internal debugging.
-
Copy modified line R704 -
Copy modified line R706
@@ -703,4 +703,5 @@ | ||
except CloudStorageMissingError as ex: | ||
slogger.error(f"CloudStorageMissingError occurred: {str(ex)}") | ||
return Response( | ||
data=str(ex), | ||
data="An internal error occurred.", | ||
status=status.HTTP_409_CONFLICT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that we need to mark a data storage as "missing" when a user does not select a cloud storage from the list but clicks next to the field. It looks better to me to show the original cloud storage.
Fixed
We should check on the server that the newly attached storage has "available" status (bucket still exists and credentials are valid)
doing it now
I think now we need to change the exception raised on the server when preparing a data chunk with unlinked CS. Now there will be a 500 status code when fetching a chunk or a task preview - let's return more appropriate codes (probably a 409 status code or at least a better message)
Made a better message and also 409 code.
cvat/schema.yml
Outdated
name: lightweight | ||
schema: | ||
type: boolean | ||
description: Make lightweight backup for cloud based tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Tried to write a better description.
- I don't quite understand why it should be explicitly rejected instead of just doing nothing
cvat/schema.yml
Outdated
@@ -10791,6 +10824,10 @@ components: | |||
description: | | |||
The number of consensus replica jobs for each annotation job. | |||
Configured at task creation | |||
data_storage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got rid of this in favor of using GET/PATCH /api/tasks/<id>/data/meta
cvat/schema.yml
Outdated
@@ -9466,6 +9476,10 @@ components: | |||
allOf: | |||
- $ref: '#/components/schemas/StorageRequest' | |||
nullable: true | |||
data_storage: | |||
allOf: | |||
- $ref: '#/components/schemas/TaskDataStorageRequest' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they have the id
field, which is not needed here. Removed it, using other way to change data storage
|
|
||
def init_request_args(self) -> None: | ||
super().init_request_args() | ||
lightweight = to_bool(self.request.query_params.get("lightweight", True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True by default may be unexpected behavior for users, as it changes the output result despite the fact that the usage of API parameters does not change on the user's side (in some scripts).
@@ -216,6 +229,9 @@ def _init_callback_with_params(self): | |||
logger, | |||
self.job_result_ttl, | |||
) | |||
self.callback_kwargs = dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using dictionary literal
if data['client_files'] != [self.MEDIA_MANIFEST_FILENAME]: | ||
raise ValidationError(f"Expected {self.MEDIA_MANIFEST_FILENAME} in backup files") | ||
|
||
manifest = ImageManifestManager(os.path.join(self._db_task.data.get_upload_dirname(), self.MEDIA_MANIFEST_FILENAME)) | ||
data['server_files'] = list(manifest.data) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 file types (client_files/server_files) that are passed to the task creation logic. I don't remember that we're using somewhere more than one data type when creating tasks and I rather think that this should not be allowed.
@@ -1100,6 +1118,8 @@ def create_backup( | |||
Exporter: Type[ProjectExporter | TaskExporter], | |||
logger: Logger, | |||
cache_ttl: timedelta, | |||
*, | |||
lightweight: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think the lightweight should be True by default because it changes the logic for RQ jobs created before.
One more disadvantage of setting the default value in this func - the default value for this argument is defined 2 times now: here and when reading the request.query_params. IMHO, it's better to have only one place where the default value is defined (when handling query params). But I'm okay if it's okay for others.
@@ -971,13 +984,14 @@ def _prepare_project_meta(self, project): | |||
class ProjectExporter(_ExporterBase, _ProjectBackupBase): | |||
ModelClass: ClassVar[models.Project] = models.Project | |||
|
|||
def __init__(self, pk, version=Version.V1): | |||
def __init__(self, pk, version=Version.V1, *, lightweight: bool = True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest not defining the default value for the lightweight argument here (the same comment for the TaskExported class).
def __init__(self, pk, version=Version.V1, *, lightweight: bool = True): | |
def __init__(self, pk, *, lightweight: bool, version: Version = Version.V1): |
): Promise<string | void> { | ||
const { backendAPI } = config; | ||
const params: Params = { | ||
...enableOrganization(), | ||
...configureStorage(targetStorage, useDefaultSettings), | ||
...(fileName ? { filename: fileName } : {}), | ||
...(!lightweight ? { lightweight } : {}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks strange that you send the lightweight query param only when it's False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the lightweight param should be optional and we should send it only if its specified
...(typeof lightweight === 'boolean' ? { lightweight } : {}),
@@ -483,6 +483,8 @@ export interface SerializedFramesMetaData { | |||
size: number; | |||
start_frame: number; | |||
stop_frame: number; | |||
storage: StorageLocation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact it can be also "share" type
targetStorage: Storage, | ||
useDefaultSettings: boolean, | ||
fileName?: string, | ||
lightweight: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lightweight: bool, | |
lightweight: boolean, |
But I also think it should be optional
checked={lightweight} | ||
onChange={setLightweight} | ||
/> | ||
<Text strong>Make light-weight backup</Text> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<Text strong>Make light-weight backup</Text> | |
<Text strong>Make a lightweight backup</Text> |
@@ -35,6 +38,7 @@ const initialValues: FormValues = { | |||
cloudStorageId: undefined, | |||
}, | |||
useProjectTargetStorage: true, | |||
lightweight: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we shouldn't show this option at least in cases where we know that it's useless (e.g. when a task has no cloud data). Was it agreed upon with others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree here, in case there is no cs
for task we shouldnt even show the form item for this
For a project containing mixed cloud storage and non cloud storage data, the switcher "Make lightweight backup" seems a little uncertain. I would suggest to rename it, maybe like "Use lightweight backup whenever possible" (with corresponding hint)..? IDK, I would suggest to think on better message here |
@klakhov Could you please look at client changes here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should update the style of the cloud storage selector to match the assignee selector.
We should also move it to the right to align with the assignee selector.
@@ -701,6 +715,22 @@ function saveJobMeta(meta: FramesMetaData, jobID: number): Promise<FramesMetaDat | |||
return frameMetaCache[jobID]; | |||
} | |||
|
|||
function saveTaskMeta(meta: FramesMetaData, taskID: number): Promise<FramesMetaData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is complete dublicate of what we have for saveJobMeta, I think we can rewrite it so we have only one function
@@ -949,6 +979,14 @@ export async function patchMeta(jobID: number): Promise<FramesMetaData> { | |||
return newMeta; | |||
} | |||
|
|||
export async function patchTaskMeta(taskID: number, meta: FramesMetaData): Promise<FramesMetaData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for saveTaskMeta
targetStorage: Storage, | ||
useDefaultSettings: boolean, | ||
fileName?: string, | ||
lightweight: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lightweight: bool, | |
lightweight: boolean, |
Also we cant have a required parameter follow optional parameter fileName
@@ -995,12 +995,14 @@ async function backupTask( | |||
targetStorage: Storage, | |||
useDefaultSettings: boolean, | |||
fileName?: string, | |||
lightweight: boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here, we cant have required param follow optional one
): Promise<string | void> { | ||
const { backendAPI } = config; | ||
const params: Params = { | ||
...enableOrganization(), | ||
...configureStorage(targetStorage, useDefaultSettings), | ||
...(fileName ? { filename: fileName } : {}), | ||
...(!lightweight ? { lightweight } : {}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the lightweight param should be optional and we should send it only if its specified
...(typeof lightweight === 'boolean' ? { lightweight } : {}),
const data = await getCore().cloudStorages.get({ id }); | ||
if (data.length === 1) { | ||
return data[0]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const data = await getCore().cloudStorages.get({ id }); | |
if (data.length === 1) { | |
return data[0]; | |
} | |
const [data] = await core.projects.get({ id }); | |
return data; |
} | ||
}, [meta]); | ||
|
||
const updateCloudStorage = (_cloudStorage: CloudStorage | null): Promise<void> => ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updateCloudStorage
function updates task metadata, which for the user essentially means that the task itself is being updated. This should trigger the loading spinner.
I recommend merging develop
and checking how task fields are updated. We use the onUpdateTask
callback, which dispatches the updateTaskAsync
action.
I think we should create a dedicated updateTaskMetadataAsync
action and an onUpdateTaskMetadata
callback, instead of handling the update logic directly here.
The reason is that within updateTaskMetadataAsync
, we can dispatch updateTask
/ updateTaskFailed
, which will update the activities.updates
fields in the reducer—these are responsible for showing the loading spinner on the task page.
useEffect(() => { | ||
task.meta.get().then((_meta) => { | ||
setMeta(_meta); | ||
}); | ||
}, []); | ||
|
||
useEffect(() => { | ||
if (meta && meta.cloudStorageId) { | ||
getCloudStorageById(meta.cloudStorageId).then((_cloudStorage) => { | ||
setCloudStorage(_cloudStorage); | ||
setCloudStorageLoaded(true); | ||
}); | ||
} else { | ||
setCloudStorageLoaded(true); | ||
} | ||
}, [meta]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fetching task metadata and the corresponding cloud storage should be part of the receiveTask
function for the task-page
, which manages the loading spinner.
If we try to fetch this data here, the user will see the page before all the necessary data has been fully loaded.
meta && meta.storage === StorageLocation.CLOUD_STORAGE && ( | ||
<Row justify='space-between' align='middle'> | ||
<Col span={12}> | ||
<Row className='cvat-cloud-storage'> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is wrong with class names here
<Row className='cvat-cloud-storage'> | |
<Row className='cvat-task-cloud-storage'> |
{ cloudStorageLoaded && cloudStorage && renderCloudStorageName(cloudStorage) } | ||
{ cloudStorageLoaded && !cloudStorage && 'MISSING' } | ||
<Text | ||
className='cvat-issue-tracker-value' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
className='cvat-issue-tracker-value' | |
className='cvat-task-cloud-storage-?' |
Motivation and context
How has this been tested?
Checklist
develop
branchLicense
Feel free to contact the maintainers if that's a concern.