Skip to content
This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Allow abstracting resource I/O #2198

Closed

Conversation

tiberiusbrown
Copy link

This is the first time I've ever tried to make a pull request on anything... please let me know if I did it wrong.

For a while I've wished that I could abstract the I/O part of resource loading, so I could potentially load resources from a memory buffer, a network request, etc... These are some minimal, non-API-breaking changes that allow someone to do this by subclassing both File and ResourceCache.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 5, 2017

Just curious: how do you use such API now?

@tiberiusbrown
Copy link
Author

tiberiusbrown commented Dec 5, 2017

For example, a File subclass that uses PhysFS as its backend to read from an in-memory archive. Then, a ResourceCache subclass that overloads GetFile in order to use that new File subclass when loading resources. Finally, functionality that allows derived subsystems to be registered to the context using a base class's type so that existing code that calls GetSubsystem<ResourceCache>() can work correctly with the ResourceCache subclass.

@weitjong
Copy link
Contributor

weitjong commented Dec 6, 2017

I still don't see how it would work. Most of the subsystems are registered by the Engine class in its constructor. And it is done in a "hard-wired" fashion. If you would have new subsystem then no problem as it can be registered later elsewhere, but if you have a subclass of existing subsystems then how do you plan to inhibit Engine class from registering the original in the first place?

Also, I also cannot see why it is the ResourceCache being the one need to be subclass. Without looking at the code more closely, my intuition says it should be the FileSystem subclass.

@tiberiusbrown
Copy link
Author

That the subsystems are registered in the Engine constructor I was aware, but for the purpose of keeping the changes minimal, I did not want to modify more code than necessary. For my purposes, I can either forgo using Engine at all and register subsystems myself, or I can overwrite the existing subsystem registration for ResourceCache after Engine construction.

I had put some thought into how to go about this, and I wanted to keep the modifications minimal while not affecting all filesystem access, as it's used for stuff like logging and shader caching.... However, I don't have the understanding of Urho3D's design philosophy that you and other senior contributors do, so any advice towards achieving my goal of loosening the restriction of resource loading to the filesystem would be appreciated! I'd like to contribute the code to accomplish it without compromising the quality of the codebase.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 6, 2017

These are some minimal, non-API-breaking changes that allow someone to do this by subclassing both File and ResourceCache

If you break API, it become harder to use the library.
If you keep API untouched, it leads to overcomplicated design and makes future changes harder.
Your current design seems biased to the second extreme.
RegisterDerivedSubsystem itself is ok. Virtualized ResourceCache is not.
File isn't good candidate for interfacing, this class is full of implementation details.
Deriving from File is definetely a misdesign.

@tiberiusbrown
Copy link
Author

Thanks for the feedback, I appreciate it.

If you break API, it become harder to use the library.
If you keep API untouched, it leads to overcomplicated design and makes future changes harder.
Your current design seems biased to the second extreme.

You're right; I do consider breaking API to be the greater evil. I definitely would not want to make a contribution that breaks anyone's existing Urho projects. To that end, here is another solution that I had thought of that would be a bit more involved.

Since ResourceCache::GetFile returns a File pointer, it seems to me that, in order to not break API, we are forever restricted to returning either a File or a subclass of File from that method. It would be nice to return an AbstractFile instead, but AbstractFile does not inherit from RefCounted, so there would be some ownership details to be worked out, and existing code that does something like
SharedPtr<File> file(cache->GetFile("blah.txt")); would break. Using a new method also seems unhelpful to me, because calls to GetFile occur in most other parts of the Urho codebase, and probably in external Urho projects as well. So, do either of these actions seem appropriate:

  • Extend File somehow to handle other interfaces (any approach that I can think of right now to accomplish this doesn't seem very clean to me).
  • Modify File to allow clean inheritance, with the File class itself preserving its current behavior as a "default" behavior. This would mostly involve making all of its core functionality virtual (including Open(filename, mode), Close, Flush, IsOpen, GetHandle, and IsPackaged), just not pure virtual.

If neither of these options is acceptable, I can withdraw the pull request until a possible solution appears -- not breaking API and/or people's existing code is too important to me. Till now I've just been using a modified Urho codebase in my projects but I'd like to work with a clean pull.

As for not virtualizing ResourceCache, I think there are other solutions that would work, like extending the existing ResourceRouter interface to allow returning a custom Deserializer object for a given name, perhaps. I think the hardest issue will be working around File.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 6, 2017

It would be nice to return an AbstractFile instead, but AbstractFile does not inherit from RefCounted

Ohh, thanks, I missed this point.
Then, I suggest to expand the hierarchy.
Keep aggregate class for Deserializer and Serializer and make a new one that is the Object.
So, MemoryBuffer etc would derive from the first and file system things would use the second.

The hardest thing here is to come up with names.
______________Object--\
Deserializer->IOStream->AbstractFile->File
Serializer--/

…esourceAbstractSource objects, and ResourceCache::GetAbstractFile method.
@tiberiusbrown
Copy link
Author

Thanks for the suggestion! I like it.

I pushed my first stab at changing the hierarchy as you described. There is a new method, ResourceCache::GetAbstractFile that returns an AbstractFile object provided by a ResourceAbstractSource object, using a similar strategy to the existing ResourceRouter. Then, most of the Urho core code that uses ResourceCache::GetFile was changed to use this method, with the exception of cached binary shader loading (pretty sure that's always going to be from the filesystem?) and sample code. If no ResourceAbstractSource objects were added to the resource cache, or if none of them provide an AbstractFile, then ResourceCache::GetAbstractFile falls back to ResourceCache::GetFile, which behaves exactly as before.

So the API should remain compatible with existing projects, except for those that depend on the specific hierarchy of AbstractFile, which is probably rare.

I also left the Context::RegisterDerivedSubsystem as I think it could have potential use, even if it's not relevant to this situation anymore. I'll remove it if you'd like.

@weitjong
Copy link
Contributor

weitjong commented Dec 7, 2017

I like the idea of supporting PhysFS or any other FUSE-based FS (with compatible license) too. However, I feel a little off with your changes to expose the "Abstract" class all over the code base. Instead we should probably refactor the current File class to become an interface-like class with possible multiple impl classes. The user of the File class doesn't really need to know which impl is being used because the interface (i.e. API to open/read/seek/write) remains the same. Considering that currently our File & Filesystem classes are in fact already versatile enough to support actual file from file system or from package file or from memfs (web) or from APK (android) or from bundle (iOS), adding another one to the mix should not need a massive changes as what I feel you plan to embark. Perhaps I am over optimistic.

Speaking of which, do you also plan to share the pieces where you integrate the PhysFS into the engine? I think it would only be fair that after all the changes your proposed which makes our code base more complex than it is, then in the end of day we get the benefit back of having an extra option to use PhysFS for those who want to use it. In short, I hope you are not sharing only the "overhead" bits to us.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 7, 2017

However, I feel a little off with your changes to expose the "Abstract" class all over the code base

Y no? I mean, there are things to do, but the direction is fine.
Clients should work with interfaces, factories should provide implementations.

Instead we should probably refactor the current File class to become an interface-like class with possible multiple impl classes.

Are you talking about refactoring inside the class or splitting logic into separate classes?
File is already overcomplicated, this class shan't be subclassed.

@weitjong
Copy link
Contributor

weitjong commented Dec 7, 2017

Yes, don’t subclass the File class. I am suggesting to refactor it to have multiple impl similar to Graphics and DbConnection class having multiple impl, but only if PhysFS requires it. I have no issue if it just add another #ifdef in the existing File class. I believe this way we don’t have to confuse the library user to have to dealt with AbstractFile. They use File as it is now, whichever underlying implantation is being used. I hope I explain it clearly.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 7, 2017

I have the following vision of things:

  • FileSystem is the low-level physical File System of the OS. No more logic here needed, it works fine now.
  • File is the low-level physical File, probably located inside the package. No more logic here needed, it works fine now.
  • Other (user) file systems like PhysFS, archives and so on shall be separate classes derived from interface. User code shall initialize them and use directly when needed. Shall be injectable into ResourceCache.
  • So there shall be some File interface with almost the same interface as File has. Except creation methods.

I am suggesting to refactor it to have multiple impl similar to Graphics and DbConnection class having multiple impl, but only if PhysFS requires it.

I see. I don't think we should tie ourselves to anythings specific here like PhysFS tho. It's the question of architectural flexibility.

I believe this way we don’t have to confuse the library user to have to dealt with AbstractFile. They use File as it is now, whichever underlying implantation is being used.

If user want to create a new File, he has to create derived class anyway. Or use filesystem.
IMO it's just a matter of names and migration: whether the File is physical file implementation or just an interface without Open method.

@weitjong
Copy link
Contributor

weitjong commented Dec 7, 2017

At the moment the current File and Filesystem class are not the same as your view.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 7, 2017

At the moment the current File and Filesystem class are not the same as your view.

Ohh, that's true... Just checked the code.
FileSystem looks like a stash of different OS commands and operations.
Well, let's skip it for now. It works fine.

I meant that both File and FileSystem have their own area of responsibility, quite complex logic inside. They both works fine and don't need to be touched.

I want to have ResourceCache more customizable, so there should be some abstraction layer between resource management logic and physical file IO

@tiberiusbrown
Copy link
Author

Sorry I am late to the discussion.

@weitjong: I intended my mention of PhysFS to mean an example of how such an abstraction could be used. I don't think PhysFS is necessarily suitable for integration into Urho core (it maintains a lot of global state) but if you wish I could add a sample that uses PhysFS, e.g. 51_ResourcesFromPhysFS, that shows how to use the abstraction layer, whenever the final approach is decided.

I think abstraction would be more useful than adding another specific use case like PhysFS because it also opens other possibilities, like loading from network, or even IPC. For loading from archives like zip files someone also might want to integrate a lighter-weight library than PhysFS.

But again, I am far less familiar than both of you when it comes to Urho's design philosophy.

@weitjong
Copy link
Contributor

weitjong commented Dec 7, 2017

But currently we already can load from network and load from compress package file, and it can be achieved without those fancy abstraction layer.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 7, 2017

But currently we already can load from network and load from compress package file

Specific Urho's compressed package format.
I accept that one may need support of arbitrary package formats.

The only disadvantage I see is that users will have to replace File::Open with smth else.

@tiberiusbrown
Copy link
Author

tiberiusbrown commented Dec 7, 2017

I had no idea Urho could already load resources from network! That's really cool! I can't figure out how it is done, though. HttpRequest implements the Deserializer interface but how does it hook into resource loading?

Edit: Ah, I see it is through Connection. It requires running an Urho server to provide packages.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 7, 2017

I can't figure out how it is done, though

It is implemented on the level of Scene and Network.
https://github.com/urho3d/Urho3D/blob/master/Source/Urho3D/Network/Connection.h#L220

…e to avoid confusion with similar method in ResourceCache

Create ResourceCache::GetResourceAbstractSource method
Update a few comments.
@tiberiusbrown
Copy link
Author

@weitjong: Do you have any advice for a solution acceptable to you that allows users to provide their own resource loading solutions?

@weitjong
Copy link
Contributor

weitjong commented Dec 9, 2017

Without knowing what is the end goals (integrating PhysFS, or what's not), I think I will not be able to come up with any proposal so I will not take part in this. I am just expressing my personal opinion (dislike) on the new direction, but that's only personal opinion. If there are no other library users making noise (implying the majority of them like it) then you can just proceed. Having said that, when personally if/when in future I see too many things that I don't like that I may just leave the project. I only attracted to this project long ago because its elegance, pragmatic, clean and light hierarchy design approach.

@thesquib
Copy link

thesquib commented Dec 9, 2017

Personally, I think the essential goal of the changes could be useful. I have a need to load a custom resource package file, which is essentially a filesystem at some abstraction. The idea @weitjong has about multiple file implementations is more along the lines of what I was working towards on my local branch. But I've had very little time to work/think on this recently.

@eugeneko
Copy link
Contributor

eugeneko commented Dec 9, 2017

Without knowing what is the end goals (integrating PhysFS, or what's not), I think I will not be able to come up with any proposal so I will not take part in this

@weitjong I see at least two end goals.

  1. Be able to use arbitrary archives to store resources;
  2. Be able to inject custom resource loading routine.

Actualy I needed the second item some time ago.
I had some proceduaral content. So I wanted to generate it on demand and use it from the cache if it's already generated. I wanted this process to be completely transparent for users so this generated content could be used just as any other content.

@tiberiusbrown
Copy link
Author

tiberiusbrown commented Dec 9, 2017

I'm sorry -- I think I've been confusing with my intent, which is not to implement any new specific resource I/O backend. Hopefully I can explain better.

You already know this, but all resources are loaded through this method in Resource:

/// Load resource from stream. May be called from a worker thread. Return true if successful.
virtual bool BeginLoad(Deserializer& source);

Since this method takes a Deserializer object, which is an abstract interface, it doesn't dictate where the data comes from. I think this is very clean design and, like others have said, is part of why I like Urho.

However, the actual interface used to load resources (in ResourceCache) restricts itself to using the specialized File class and a few other specific use cases (covered in the above discussion), even though the infrastructure exists to load from any Deserializer implementation.

My intent is: to move towards lifting this restriction while maintaining backwards compatibility in the API. My opinion (which isn't as valuable as that of a senior contributor) is that this is a move towards a cleaner design. As far as I can tell, doing this would require two things: exposure of an abstract deserialization interface (e.g., AbstractFile), and a method to produce user-defined implementations of said interface (e.g., ResourceAbstractSource). However, this intent might be against Urho's design philosophy: perhaps the restriction on resource I/O is intentional?

My intent is not: to implement another specific use case to be available in ResourceCache.

public:
/// Construct.
ResourceRouter(Context* context) :
Object(context)
{
}
/// Destruct.
virtual ~ResourceRouter() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this unnecessary since Object has a virtual destructor?

@SirNate0
Copy link
Contributor

Personally, if we are refactoring classes and adding an abstract deserialization interface (above called AbstractFile) my preference would be to call the abstract interface File and the class that implements the features currently handled by File something like RealFile or PhysicalFile (or even FSFile, though that seems an ugly name). I think the Resource Cache should return the abstract interface when you call GetFile, which I feel is much less cluttered than changing GetFile to GetAbstractFile. In addition, for the most part that would not be an API breaking change (I most certainly do not want to change all of my code to use [Get]AbstractFile everywhere). There is also some precedent for it -- Texture and its derived classes.

Another potential use I see in this is that it should be possible to separate the packaged file's code from that of a regular file (though given the present setup works fine,so I don't know if this is necessary). A good test of your system might be to implement the PackageFile as a ResourceAbstractSource (may I suggest calling it just a ResourceSource or a FileSource) and move the packaged File behavior into a separate PackedFile class.

Also, while I don't have a better suggestion for the name, I disagree with the term IOStream, as it seems like std::iostream, yet it behaves significantly differently in terms of API (no << or >> operator overloads), and one would then think that to match, Deserializer and Serializer should be renamed IStream and OStream.

As a general comment, I like the idea of what you're doing -- lifting the restriction on sources for Files from just the file system and package files (and network requests, I suppose), but I don't like the present way in which you have done it, mostly because of the name changes and the change leaves the present way packed files are handled as just a special case of a normal file, rather than extracting it to a separate implementation that would serve as an example for new implementations (personally I think the present way File handles regular and packaged files feels rather hacked together, though it is certainly tested and effective). With that said, I do see potential benefits to adding these features, such as adding support for encrypted archives.

One other thing I would say about this -- I think you should make the system work such that custom File classes (like my suggested PackedFile can be successfully (and relatively easily) constructed outside of the ResourceCache the way we can at present (and the way we are required to if we want to write to a file).

I do have a question as well for anyone who knows -- is there any significant difference between a read-only File and a Deserializer, other than that File is a ref-counted Object? (I ask because ResourceCache returns read-only files).

@SirNate0 SirNate0 mentioned this pull request Dec 18, 2017
@tiberiusbrown
Copy link
Author

Sorry for the delay in replying -- I've been visiting relatives during the holiday season. Thank you so much for the detailed feedback @SirNate0!

Personally, if we are refactoring classes and adding an abstract deserialization interface (above called AbstractFile) my preference would be to call the abstract interface File and the class that implements the features currently handled by File something like RealFile or PhysicalFile (or even FSFile, though that seems an ugly name). I think the Resource Cache should return the abstract interface when you call GetFile, which I feel is much less cluttered than changing GetFile to GetAbstractFile. In addition, for the most part that would not be an API breaking change (I most certainly do not want to change all of my code to use [Get]AbstractFile everywhere). There is also some precedent for it -- Texture and its derived classes.

I agree with all of this! This approach does break backwards compatibility in the API however.

Another potential use I see in this is that it should be possible to separate the packaged file's code from that of a regular file (though given the present setup works fine,so I don't know if this is necessary). A good test of your system might be to implement the PackageFile as a ResourceAbstractSource (may I suggest calling it just a ResourceSource or a FileSource) and move the packaged File behavior into a separate PackedFile class.

Yes, that would be a beautiful application, and probably wouldn't break too much code as I imagine most projects don't use PackageFile directly, but only through ResourceCache.

Also, while I don't have a better suggestion for the name, I disagree with the term IOStream, as it seems like std::iostream, yet it behaves significantly differently in terms of API (no << or >> operator overloads), and one would then think that to match, Deserializer and Serializer should be renamed IStream and OStream.

That seems reasonable.

I see that you have a pull request of your own. I do think it is a much cleaner approach, but it does break API compatibility. If the Urho devs think that is an acceptable change, I'm all for it too, I suppose. Thanks again!

@github-actions
Copy link

Marking this stale since there has been no activity for 30 days.
It will be closed if there is no activity for another 15 days.

@github-actions github-actions bot added the stale label Mar 23, 2020
@github-actions github-actions bot closed this Apr 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants