-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 2.x #110
Version 2.x #110
Conversation
WalkthroughThe updates have significantly improved the Migro tool's functionality for migrating files to Uploadcare from various sources like AWS S3. These enhancements include the introduction of a database module for tracking upload attempts, expanded CLI functionalities with new commands, and the integration of support libraries like Changes
Recent Review DetailsConfiguration used: CodeRabbit UI Files selected for processing (1)
Additional comments not posted (2)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Out of diff range and nitpick comments (2)
migro/utils.py (1)
6-17
: The functionsave_result_to_csv
correctly handles the creation of directories and files for logging. Consider adding error handling for file operations to manage exceptions that may arise during file writing.migro/uploader/process_loop.py (1)
32-40
: The signal handling setup inprocess_loop.py
correctly handles UNIX signals. Consider adding a comment to clarify that Windows does not support these signals, as indicated by theNotImplementedError
handling.
@loookashow how do you like the review? |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Migrating files | ||
--------------- | ||
|
||
Migro supports migration from the following sources: | ||
|
||
- AWS S3: The tool scans the bucket, generates temporary signed URLs, and migrates all files. | ||
- File with URLs: The tool reads a file containing URLs and migrates all files listed. | ||
|
||
Each migration source requires the following arguments: | ||
|
||
``<PUBLIC_KEY>`` — your Uploadcare project `public key`_. | ||
|
||
``[<SECRET_KEY>]`` — your Uploadcare project secret key. | ||
This is optional and required only if the signed uploads feature is enabled in your project. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the requirements for secret keys.
The documentation should clearly state under what conditions the secret key is necessary. This will help users understand when they need to provide this additional piece of information.
README.rst
Outdated
How it works: | ||
1. Migro verifies the credentials provided and checks if the bucket policy is correct. | ||
2. The tool then scans the bucket and generates temporary signed URLs for all files. | ||
3. Migro proceeds to upload all files to Uploadcare. | ||
|
||
|
||
Set policy for a bucket | ||
~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To ensure proper functionality, set the following minimal permissions for your AWS S3 bucket policy: | ||
|
||
.. code-block:: | ||
|
||
{ | ||
"Version": "2012-10-17", | ||
"Statement": [ | ||
{ | ||
"Effect": "Allow", | ||
"Action": [ | ||
"s3:GetObject", | ||
"s3:ListBucket" | ||
], | ||
"Resource": [ | ||
"arn:aws:s3:::<YOUR BUCKET NAME>", | ||
"arn:aws:s3:::<YOUR BUCKET NAME>/*" | ||
] | ||
} | ||
] | ||
} | ||
|
||
Remember to replace <YOUR BUCKET NAME> with your actual bucket name. | ||
|
||
To initiate the migration, execute the following command: | ||
|
||
.. code-block:: console | ||
|
||
--output_file FILE Path to a Migro output file. [default: | ||
migro_result.txt] | ||
$ migro s3 <PUBLIC_KEY> [<SECRET_KEY>] [<OPTIONS>] | ||
|
||
--upload_base TEXT Base URL for uploads. [default: | ||
https://upload.uploadcare.com/] | ||
For example: | ||
|
||
--from_url_timeout FLOAT Number of seconds to wait till the file will be | ||
processed by `from_url` upload. [default: 30] | ||
.. code-block:: console | ||
|
||
$ migro s3 <PUBLIC_KEY> --s3_bucket_name <BUCKET_NAME> --s3_access_key_id <ACCESS_KEY_ID> --s3_secret_access_key <SECRET_ACCESS_KEY> --s3_region <REGION> | ||
|
||
|
||
Options: | ||
|
||
.. code-block:: | ||
|
||
|
||
-h, --help Show this help and quit. | ||
|
||
--s3_bucket_name STRING Name of the AWS S3 bucket to migrate files from. | ||
|
||
--s3_access_key_id STRING AWS Access Key ID for accessing the S3 bucket. | ||
|
||
--s3_secret_access_key STRING AWS Secret Access Key for accessing the S3 bucket. | ||
|
||
--max_uploads INTEGER Maximum number of upload requests running in | ||
'parallel'. [default: 20] | ||
--s3_region STRING AWS region where the S3 bucket is located. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review and update the AWS S3 bucket policy example.
Ensure that the bucket policy example provided is up-to-date and includes all necessary permissions for the operations that Migro performs. This is crucial for preventing access issues during migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
SOURCES = { | ||
'URLS': 'urls', | ||
'S3': 's3' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like these should be subclasses
S3Fetcher, URLFetcher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Co-authored-by: Dmitry Mukhin <dm+github@uploadcare.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
migro/cli.py
Outdated
def update_dotenv(key, value, env_file_path): | ||
with open(env_file_path, 'r') as file: | ||
lines = file.readlines() | ||
|
||
if sys.version_info < (3, 7): | ||
await asyncio.gather(*tasks) | ||
else: | ||
await asyncio.gather(*tasks, return_exceptions=True) | ||
key_exists = False | ||
for i, line in enumerate(lines): | ||
if line.startswith(f'{key}='): | ||
lines[i] = f'{key}={value}\n' | ||
key_exists = True | ||
break | ||
|
||
running_loop = asyncio.get_running_loop() if sys.version_info >= (3, 7) else asyncio.get_event_loop() | ||
running_loop.stop() | ||
if not key_exists: | ||
lines.append(f'{key}={value}\n') | ||
|
||
with open(env_file_path, 'w') as file: | ||
file.writelines(lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for file operations in update_dotenv
to prevent crashes due to I/O issues.
+ try:
with open(env_file_path, 'r') as file:
lines = file.readlines()
key_exists = False
for i, line in enumerate(lines):
if line.startswith(f'{key}='):
lines[i] = f'{key}={value}\n'
key_exists = True
break
if not key_exists:
lines.append(f'{key}={value}\n')
with open(env_file_path, 'w') as file:
file.writelines(lines)
+ except IOError as e:
+ click.secho(f"Failed to update .env file: {str(e)}", fg='red')
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
def update_dotenv(key, value, env_file_path): | |
with open(env_file_path, 'r') as file: | |
lines = file.readlines() | |
if sys.version_info < (3, 7): | |
await asyncio.gather(*tasks) | |
else: | |
await asyncio.gather(*tasks, return_exceptions=True) | |
key_exists = False | |
for i, line in enumerate(lines): | |
if line.startswith(f'{key}='): | |
lines[i] = f'{key}={value}\n' | |
key_exists = True | |
break | |
running_loop = asyncio.get_running_loop() if sys.version_info >= (3, 7) else asyncio.get_event_loop() | |
running_loop.stop() | |
if not key_exists: | |
lines.append(f'{key}={value}\n') | |
with open(env_file_path, 'w') as file: | |
file.writelines(lines) | |
def update_dotenv(key, value, env_file_path): | |
try: | |
with open(env_file_path, 'r') as file: | |
lines = file.readlines() | |
key_exists = False | |
for i, line in enumerate(lines): | |
if line.startswith(f'{key}='): | |
lines[i] = f'{key}={value}\n' | |
key_exists = True | |
break | |
if not key_exists: | |
lines.append(f'{key}={value}\n') | |
with open(env_file_path, 'w') as file: | |
file.writelines(lines) | |
except IOError as e: | |
click.secho(f"Failed to update .env file: {str(e)}", fg='red') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Out of diff range and nitpick comments (1)
tests/test_uploader.py (1)
Line range hint
26-38
: Consider using a fixture or context manager to temporarily setsettings.PUBLIC_KEY
during the test to avoid side effects in other tests.
Co-authored-by: Dmitry Mukhin <dm+github@uploadcare.com>
Co-authored-by: Dmitry Mukhin <dm+github@uploadcare.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Co-authored-by: Dmitry Mukhin <dm+github@uploadcare.com>
Co-authored-by: Dmitry Mukhin <dm+github@uploadcare.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Out of diff range and nitpick comments (1)
README.rst (1)
34-44
: The migration process is described succinctly. Consider enhancing this section with a diagram or flowchart for better visual understanding, especially for users new to such operations.
In order to install ``migro``, run: | ||
|
||
.. code-block:: console | ||
|
||
$ pip install uploadcare-migro | ||
|
||
|
||
Usage | ||
----- | ||
Get started | ||
----------- | ||
|
||
To begin using Migro, you must have your Uploadcare project `public key`_. | ||
|
||
Optionally, you can store the public key and other necessary credentials in environment variables. | ||
This approach is beneficial if you must migrate many files, allowing you to perform multiple migrations efficiently. | ||
|
||
.. code-block:: console | ||
|
||
$ migro <PUBLIC_KEY> <INPUT_FILE> | ||
$ migro init <OPTIONS> | ||
|
||
Where: | ||
After running the initialization command, your credentials will be saved in a local ``.env`` | ||
file, allowing you to use Migro without specifying them in the command line each time. | ||
Once the ``.env`` file is created, you can edit it manually if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installation and initialization instructions are clear. However, it's crucial to emphasize the security best practices for handling the .env
file. Consider adding a warning about securing this file against unauthorized access.
Once the migration is complete, execute the ``migro drop`` command to | ||
remove the `.env` file containing credentials, clear the local database, and the logs folder. | ||
|
||
.. code-block:: console | ||
|
||
$ migro drop | ||
|
||
|
||
Note for Windows users | ||
---------------------- | ||
|
||
Currently there is an issue with program termination by CTRL+C on Windows, so | ||
program can no be terminated correctly by this way. | ||
Currently, there is an issue with terminating the program using CTRL+C on Windows. | ||
As a result, the program cannot be terminated correctly using this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the termination issue on Windows by providing a workaround or potential fix. This will improve the user experience for Windows users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
from functools import wraps | ||
from pathlib import Path | ||
|
||
import click | ||
from dotenv import dotenv_values, find_dotenv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move module level imports to the top of the file.
+ import os
+ import sys
+ from pathlib import Path
+ from dotenv import dotenv_values, find_dotenv
+ import click
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
from functools import wraps | |
from pathlib import Path | |
import click | |
from dotenv import dotenv_values, find_dotenv | |
import os | |
import sys | |
from pathlib import Path | |
from dotenv import dotenv_values, find_dotenv | |
import click | |
from functools import wraps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
def update_dotenv(key, value, env_file_path): | ||
with open(env_file_path, 'r') as file: | ||
lines = file.readlines() | ||
|
||
[task.cancel() for task in tasks] | ||
key_exists = False | ||
for i, line in enumerate(lines): | ||
if line.startswith(f'{key}='): | ||
lines[i] = f'{key}={value}\n' | ||
key_exists = True | ||
break | ||
|
||
if sys.version_info < (3, 7): | ||
await asyncio.gather(*tasks) | ||
else: | ||
await asyncio.gather(*tasks, return_exceptions=True) | ||
if not key_exists: | ||
lines.append(f'{key}={value}\n') | ||
|
||
running_loop = asyncio.get_running_loop() if sys.version_info >= (3, 7) else asyncio.get_event_loop() | ||
running_loop.stop() | ||
with open(env_file_path, 'w') as file: | ||
file.writelines(lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for file operations in update_dotenv
to prevent crashes due to I/O issues.
+ try:
with open(env_file_path, 'r') as file:
lines = file.readlines()
key_exists = False
for i, line in enumerate(lines):
if line.startswith(f'{key}='):
lines[i] = f'{key}={value}\n'
key_exists = True
break
if not key_exists:
lines.append(f'{key}={value}\n')
with open(env_file_path, 'w') as file:
file.writelines(lines)
+ except IOError as e:
+ click.secho(f"Failed to update .env file: {str(e)}", fg='red')
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
def update_dotenv(key, value, env_file_path): | |
with open(env_file_path, 'r') as file: | |
lines = file.readlines() | |
[task.cancel() for task in tasks] | |
key_exists = False | |
for i, line in enumerate(lines): | |
if line.startswith(f'{key}='): | |
lines[i] = f'{key}={value}\n' | |
key_exists = True | |
break | |
if sys.version_info < (3, 7): | |
await asyncio.gather(*tasks) | |
else: | |
await asyncio.gather(*tasks, return_exceptions=True) | |
if not key_exists: | |
lines.append(f'{key}={value}\n') | |
running_loop = asyncio.get_running_loop() if sys.version_info >= (3, 7) else asyncio.get_event_loop() | |
running_loop.stop() | |
with open(env_file_path, 'w') as file: | |
file.writelines(lines) | |
def update_dotenv(key, value, env_file_path): | |
try: | |
with open(env_file_path, 'r') as file: | |
lines = file.readlines() | |
key_exists = False | |
for i, line in enumerate(lines): | |
if line.startswith(f'{key}='): | |
lines[i] = f'{key}={value}\n' | |
key_exists = True | |
break | |
if not key_exists: | |
lines.append(f'{key}={value}\n') | |
with open(env_file_path, 'w') as file: | |
file.writelines(lines) | |
except IOError as e: | |
click.secho(f"Failed to update .env file: {str(e)}", fg='red') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
To begin using Migro, you must have your Uploadcare project `public key`_. | ||
|
||
Optionally, you can store the public key and other necessary credentials in environment variables. | ||
This approach is beneficial if you must migrate many files, allowing you to perform multiple migrations efficiently. | ||
|
||
.. code-block:: console | ||
|
||
$ migro <PUBLIC_KEY> <INPUT_FILE> | ||
$ migro init <OPTIONS> | ||
|
||
Where: | ||
After running the initialization command, your credentials will be saved in a local ``.env`` | ||
file, allowing you to use Migro without specifying them in the command line each time. | ||
Once the ``.env`` file is created, you can edit it manually if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance the security advice for credential storage.
Consider adding a warning about securing the .env
file against unauthorized access, especially since it contains sensitive information.
How migration works | ||
------------------- | ||
|
||
Migro does not directly download any files. Instead, it utilizes | ||
the `Uploading API`_, specifically the ``From URL`` method_. | ||
The migration process is straightforward: you either provide a list of file | ||
URLs or set your AWS S3 credentials, and those files are uploaded to your Uploadcare project. | ||
|
||
Upon completion, you will receive a listing of all uploaded files. | ||
You will also receive details on the status of each processed file and any errors that may have occurred. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the migration process details.
Consider adding more details or a step-by-step breakdown of the migration process to enhance clarity for users who might be new to such operations.
How it works: | ||
1. Migro verifies the credentials provided and checks if the bucket policy is correct. | ||
2. The tool then scans the bucket and generates temporary signed URLs for all files. | ||
3. Migro proceeds to upload all files to Uploadcare. | ||
|
||
|
||
Set policy for a bucket | ||
~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To ensure proper functionality, set the following minimal permissions for your AWS S3 bucket policy: | ||
|
||
.. code-block:: | ||
|
||
{ | ||
"Version": "2012-10-17", | ||
"Statement": [ | ||
{ | ||
"Principal": "*", | ||
"Effect": "Allow", | ||
"Action": [ | ||
"s3:GetObject", | ||
"s3:ListBucket" | ||
], | ||
"Resource": [ | ||
"arn:aws:s3:::<YOUR BUCKET NAME>", | ||
"arn:aws:s3:::<YOUR BUCKET NAME>/*" | ||
] | ||
} | ||
] | ||
} | ||
|
||
Remember to replace <YOUR BUCKET NAME> with your actual bucket name. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review and update the AWS S3 bucket policy example.
Ensure that the bucket policy example provided is up-to-date and includes all necessary permissions for the operations that Migro performs. This is crucial for preventing access issues during migration.
Once the migration is complete, execute the ``migro drop`` command to | ||
remove the `.env` file containing credentials, clear the local database, and the logs folder. | ||
|
||
.. code-block:: console | ||
|
||
$ migro drop | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance the post-migration cleanup instructions.
Consider providing specific commands or steps for securely removing the .env
file, clearing the local database, and the logs folder to ensure that all sensitive data is properly handled.
Currently, there is an issue with terminating the program using CTRL+C on Windows. | ||
As a result, the program cannot be terminated correctly using this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the termination issue on Windows by providing a workaround or potential fix.
This will improve the user experience for Windows users. Consider suggesting alternative methods for terminating the program or addressing the underlying issue.
@@ -35,11 +35,11 @@ def common_options(func): | |||
@wraps(func) | |||
@click.option('--upload_base_url', help="Base URL for uploads.", type=str, default=env.get('UPLOAD_BASE')) | |||
@click.option('--upload_timeout', | |||
help="Number of seconds to wait till the file will be processed by `from_url` upload.", type=int, | |||
help="Number of seconds to wait till the file will be processed by `from_url` upload.", type=float, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and below,
why float?
there are some outstanding questions that we should address at some point @loookashow |
Overview:
More details:
Known Issues:
Occasionally, a notification like the following appears:
I haven't looked into what can be done about it yet, but it may require some code rewriting in uploader/utils.py to make it more up-to-date. However, this may require dropping support for Python versions prior to 3.8 (although the README already states that versions 3.8+ are supported).
Open Questions:
TBD: Should we keep the script in the library that collects links from Filestack? At the very least, it needs to be tested to see if it works or not. I haven't tested it yet.
Summary by CodeRabbit
New Features
Documentation
Refactor
.gitignore
to exclude database files and logs.Bug Fixes
Chores