Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] File Handling Overhaul #1031

Closed
swainn opened this issue Apr 5, 2024 · 11 comments
Closed

[FEATURE] File Handling Overhaul #1031

swainn opened this issue Apr 5, 2024 · 11 comments

Comments

@swainn
Copy link
Member

swainn commented Apr 5, 2024

Is your feature request related to a problem? Please describe.
Workspaces and static files sometimes get used for configuration files, but neither is really a great place for them, because you end up with syncing issues in production due to the sym-linking. In addition, getting dynamically created files publicly accessible because there isn't a file API for static files like there is for workspaces. Static files is also not a good place for user uploaded content, because static files are assumed to be trusted "source code" files, and uploaded content should be treated as suspect at best.

Describe the solution you'd like
We add additional file management and refactor the workspaces API as follows:

There should be 4 places for files:

  • public: for publicly accessible, runtime static, version controlled files (e.g. images, JavaScript, CSS)
  • resources: for private, runtime static, version controlled files (e.g. config files, data files, other non-python files)
  • workspaces: for private, runtime writable, unversioned files (e.g. files created by the app)
  • media: for publicly accessible, runtime writable, unversioned files (e.g. files uploaded by a user or files created by app, but need to be public)

In addition we overhaul how workspaces works so that symbolic linking is no longer used in production and remove the workspaces folder from the scaffold. The workspaces (and media) folders would be managed folders and users wouldn't need to know where the folders are.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@sdc50
Copy link
Member

sdc50 commented Apr 10, 2024

Similar to the Workspaces API I think there should be 3 ways to access these paths:

  1. controller decorator
  2. app class methods
  3. SDK getter functions

(note that the workspaces also as workspaces decorators that I think should be deprecated).

from tethys_sdk.paths import (
  get_app_workspace,
  get_user_workspace, 
  get_app_resources, 
  get_app_media, 
  get_user_media, 
  get_app_public,
)

Create a new class that is mostly the same as the TethysWorkspace class but update it to use Path objects by default.

class TethysPath:
    """
    Defines objects that represent paths (directories) for apps and users.

    Attributes:
      path(Path): The absolute path to the workspace directory. Cannot be overwritten.
    """

    def __init__(self, path, read_only=False):
        """
        Constructor
        """
        self._path = Path(path).resolve()
        assert not self._path.is_file()
        # Create the path if it doesn't already exist
        self._path.mkdir(parents=True, exists_okay=True)

        self._read_only = read_only

    def __repr__(self):
        """
        Rendering
        """
        return '<TethysPath path="{0}">'.format(self._path)

    @property
    def path(self):
        # Note that this is different from TethysWorkspace in that it now returns a python Path object
        return self._path

    @property
    def read_only(self):
        return self._read_only

    def files(self, names_only=False):
        """
        Return a list of files (as Path objects by default) that are in the workspace.

        Args:
          names_only(bool): Returns list of filenames as strings when True. Defaults to False.

        Returns:
          list: A list of files in the workspace.

        **Examples:**

        ::

            # List file names
            workspace.files()

            # List full path file names
            workspace.files(full_path=True)

        """
        path, dirs, files = next(os.walk(self.path))
        if names_only:
            return files
        return [self.path / f for f in files]

    def directories(self, names_only=False):
        """
        Return a list of directories (as Path objects by default) that are in the workspace.

        Args:
          names_only(bool): Returns list of directory names  as strings when True. Defaults to False.

        Returns:
          list: A list of directories in the workspace.

        **Examples:**

        ::

            # List directory names
            workspace.directories()

            # List full path directory names
            workspace.directories(full_path=True)

        """
        path, dirs, files = next(os.walk(self.path))
        if names_only:
            return dirs
        return [self.path / d for d in dirs]

    def clear(self, exclude=None, exclude_files=False, exclude_directories=False):
        """
        Remove all files and directories in the workspace.

        Args:
          exclude(iterable): A list or tuple of file and directory names to exclude from clearing operation.
          exclude_files(bool): Excludes all files from clearing operation when True. Defaults to False.
          exclude_directories(bool): Excludes all directories from clearing operation when True. Defaults to False.

        **Examples:**

        ::

            # Clear everything
            workspace.clear()

            # Clear directories only
            workspace.clear(exclude_files=True)

            # Clear files only
            workspace.clear(exclude_directories=True)

            # Clear all but specified files and directories
            workspace.clear(exclude=['file1.txt', '/full/path/to/directory1', 'directory2', '/full/path/to/file2.txt'])

        """
        if self.read_only:
            raise RuntimeError('Read only TethysPaths cannot be cleared')

        if exclude is None:
            exclude = list()

        files = self.files()

        if not exclude_files:
            for file in files:
                if file not in exclude and file.name not in exclude:
                    file.unlink()

        if not exclude_directories:
            directories = self.directories()
            for directory in directories:
                if directory not in exclude and directory.name not in exclude:
                    shutil.rmtree(directory)

    def remove(self, item):
        """
        Remove a file or directory from the workspace.

        Args:
          item(str): Name of the item to remove from the workspace.

        **Examples:**

        ::

            workspace.remove('file.txt')
            workspace.remove('/full/path/to/file.txt')
            workspace.remove('relative/path/to/file.txt')
            workspace.remove('directory')
            workspace.remove('/full/path/to/directory')
            workspace.remove('relative/path/to/directory')
            workspace.remove(path_object)

        **Note:** Though you can specify relative paths, the ``remove()`` method will not allow you to back into other directories using "../" or similar notation. Futhermore, absolute paths given must contain the path of the workspace to be valid.

        """  # noqa: E501
        if self.read_only:
            raise RuntimeError('Cannot remove files from read-only TethysPaths')        
        item = Path(item).resolve()

        assert item.relative_to(self.path).  #TODO add an if statement with a helpful error message

        if item.is_dir():
            shutil.rmtree(item)
        elif item.is_file():
            item.unlink()

    def get_size(self, units="b"):
        total_size = 0
        for file in self.files():
            total_size += os.path.getsize(file)

        if units.lower() == "b":
            conversion_factor = 1
        else:
            storage_units = _get_storage_units()
            conversion_factor = [
                item[0] for item in storage_units if units.upper() in item[1]
            ][0]

        return total_size / conversion_factor

@sdc50
Copy link
Member

sdc50 commented Apr 10, 2024

Here are some helper methods to implment:

def _resolve_app_class(app_class_or_request):
    """
    Returns and app class
    """
    pass


def _resolve_username(user_or_request):
    """
    Gets the username from user or request object
    (Also check quotas?)
    """
    

def _get_app_workspace(app):
    """
    Gets the root workspace directory for an app. Uses TETHYS_WORKSPACES_ROOT setting
    """
    return Path(settings.TETHYS_WORKSPACES_ROOT) / app.package

def get_app_workspace(app_or_request) -> TethysPath:
    if settings.USE_OLD_WORKSPACES_API:
        return _get_app_workspace_old(app_or_request)

    app = _resolve_app_class(app_or_request)
    return _get_app_workspace(app)


def _get_user_workspace(app, username):
    app_workspace = get_app_workspace(app)
    return TethysPath(app_workspace.path / username)


def get_user_workspace(app_class_or_request, user_or_request) -> TethysPath:
    if settings.USE_OLD_WORKSPACES_API:
        return _get_user_workspace_old(app_class_or_request, user_or_request)

    app = _resolve_app_class(app_class_or_request)
    username = _resolve_username(user_or_request)
    return _get_user_workspace(app, username)


def _get_app_media(app):
    """
    Gets the root media directory for an app. Uses MEDIA_ROOT setting.
    """
    return Path(settings.MEDIA_ROOT) / app.package


def get_app_media(app_or_request):
    app = _resolve_app_class(app_or_request)
    return _get_app_media(app)


def _get_user_media(app, username):
    app_media = get_app_media(app)
    return TethysPath(app_media.path / username)


def get_public_path(app_or_extension):
    """
    Gets the public directory of an app or extension as a read-only TethysPath
    """
    return app_or_extension.public_path


def get_resources_path(app_or_extension):
    """
    Gets the resources directory of an app or extension as a read-only TethysPath
    """
    return app_or_extension.resources_path

Basics of TethysBase

from importlib.resources import files.  # This is new in Python 3.9

class TethysBase(TethysBaseMixin):

    @property
    def _package_files(self):
        return files(f"{self.package_namespace}.{self.package}")

    @property
    def public_path(self):
        return TethysPath(self._package_files / 'public')

    @property
    def resources_path(self):
        return TethysPath(self._package_files / 'resources')

@sdc50
Copy link
Member

sdc50 commented Apr 11, 2024

Do we want to include arguments in the controller decorator to access all of these paths?

@controller(
    app_workspace=True,
    user_workspace=True,
    app_media_path=True,
    user_media_path=True,
    app_resource_path=True,
    app_public_path=True,
)
def my_controller(request, app_workspace, user_workspace, app_media_path, user_media_path, app_resource_path, app_public_path):
    ...

Maybe we also want them for the consumer decorator?

@sdc50
Copy link
Member

sdc50 commented Apr 11, 2024

Decide on naming convention:

get_app_media or get_app_media_path?

get_resources_path or get_app_resouces_path or get_resources?

@swainn
Copy link
Member Author

swainn commented Apr 11, 2024

Do we want to include arguments in the controller decorator to access all of these paths?

@controller(
    app_workspace=True,
    user_workspace=True,
    app_media_path=True,
    user_media_path=True,
    app_resource_path=True,
    app_public_path=True,
)
def my_controller(request, app_workspace, user_workspace, app_media_path, user_media_path, app_resource_path, app_public_path):
    ...

Maybe we also want them for the consumer decorator?

Yes, we definitely want them in the consumer decorator too. It would be nice if we could generalize the two decorators somehow so we don't have to implement everything twice. Or at least have a formal way to implement it twice.

@swainn
Copy link
Member Author

swainn commented Apr 11, 2024

Decide on naming convention:

get_app_media or get_app_media_path?

get_resources_path or get_app_resouces_path or get_resources?

I like this naming convention:

get_app_workspace
get_user_workspace
get_app_media
get_user_media
get_app_public
get_app_resources

@sdc50
Copy link
Member

sdc50 commented Apr 11, 2024

I like this naming convention:

get_app_workspace
get_user_workspace
get_app_media
get_user_media
get_app_public
get_app_resources

The way that I've implemented get_resources_path it will work with an TethysExtensionBase object, which is why I initially didn't have app in the name.

However, since it can accept a request object and derives the app_class from that, it may be best to call it as you suggested.

@sdc50
Copy link
Member

sdc50 commented Apr 12, 2024

Yes, we definitely want them in the consumer decorator too.

The consumer decorator is tricky since it doesn't have a request object. I think the easiest thing to do would be to piggy-back on the TethysAsyncWebsocketConsumerMixin class that @c-krew created and make methods to get the paths. So they wouldn't be arguments in the consumer decorator.

@swainn swainn added this to the Version 4.3 milestone Apr 12, 2024
@sdc50 sdc50 mentioned this issue Apr 15, 2024
7 tasks
@sdc50
Copy link
Member

sdc50 commented Apr 30, 2024

@swainn, @shawncrawley, @gagelarsen

Should we deprecate the app_workspace and user_workspace decorators, or should we add decorators to get the rest of the paths?

@swainn
Copy link
Member Author

swainn commented May 3, 2024

@swainn, @shawncrawley, @gagelarsen

Should we deprecate the app_workspace and user_workspace decorators, or should we add decorators to get the rest of the paths?

I think it is ok to deprecate these decorators, but we should maintain support for the workspace arguments in the controller decorator.

@swainn
Copy link
Member Author

swainn commented Jun 7, 2024

Implemented by #1037

@swainn swainn closed this as completed Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants