RFC: Implementation of DiskArray interface #105
I think the underlying … But I'm not sure how bands should really be treated. Is a single-band raster 2D or still 3D? In GeoData.jl you always get a 3D array with a …
Thanks for the clarification. For now, I removed the tests that were marked as broken.
I tend to go for consistency as well: the number of dimensions of the array should always be predictable. Since I already introduced a new function name (maybe @yeesian can comment on whether this is OK), should we make a distinction between …
I tried to use this implementation as input to an ESDL data cube, see https://github.com/esa-esdl/ESDL.jl/pull/193. Could we implement a nicer way to get an interactive RasterDataset than AG.RasterDataset(AG.read("/path/to/file"))? I don't like the name readraster if it returns single bands. I would suggest the name readband, which would always expect an Integer indicating which band to read.
I updated the PR according to the comments. @yeesian, could you briefly comment on whether you would be interested in this PR?
To be more specific, AG.readraster("raster.tif") now returns a wrapped IDataset, and I added forwarding for a number of methods such as width, height, etc.
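A minimal sketch of the kind of forwarding described above, written against hypothetical toy types: ToyDataset and ToyRasterWrapper are illustrative stand-ins, not ArchGDAL's IDataset or RasterDataset, and the width, height and nraster functions here are local definitions rather than ArchGDAL's own:

```julia
# Hypothetical stand-in for a GDAL dataset handle (not ArchGDAL's IDataset).
struct ToyDataset
    width::Int
    height::Int
    nbands::Int
end

# Local accessor functions, named after (but not the same as) ArchGDAL's.
width(ds::ToyDataset) = ds.width
height(ds::ToyDataset) = ds.height
nraster(ds::ToyDataset) = ds.nbands

# Hypothetical wrapper, standing in for what readraster returns.
struct ToyRasterWrapper
    ds::ToyDataset
end

# Forward each accessor from the wrapper to the wrapped dataset.
for f in (:width, :height, :nraster)
    @eval $f(w::ToyRasterWrapper) = $f(w.ds)
end

w = ToyRasterWrapper(ToyDataset(100, 50, 3))
println((width(w), height(w), nraster(w)))  # (100, 50, 3)
```

Generating the methods in a loop keeps the list of forwarded functions in one place, which helps once that list grows.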
Sorry it took me so long to get around to this; I was thrown off by "WIP" in the title. I like the following decisions:
- the correspondence between a
  - RasterDataset
  - (GDAL) Dataset
  - 3D (Julia) Array
- the correspondence between a
  - RasterBand
  - DiskArray{T,2}
  - 2D (Julia) Array
- that read returns a GDALDataset while readraster returns a RasterDataset.
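The dimensional correspondence above can be illustrated with plain Julia arrays as in-memory stand-ins (no ArchGDAL involved). The (x, y, band) dimension order is an assumption for this sketch, consistent with the PermutedDimsArray(ds, (3,2,1)) example further down the thread:

```julia
# In-memory stand-in for a raster, with dimensions ordered (x, y, band).
nx, ny, nbands = 4, 3, 2
raster = reshape(collect(1:nx*ny*nbands), nx, ny, nbands)  # 4×3×2 array

# Selecting one band yields a 2D array, mirroring RasterBand <-> 2D Array.
band2 = raster[:, :, 2]
println(size(band2))      # (4, 3)

# Selecting a band range keeps the band dimension, even for a single band,
# so the number of dimensions stays predictable: a single-band raster is 3D.
single = raster[:, :, 1:1]
println(size(single))     # (4, 3, 1)
println(ndims(single))    # 3
```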
Since we're introducing RasterDataset as a new data type, it's worth documenting it well, so that we don't have to revisit these design decisions in future PRs. For that reason, it might also be worth putting some thought into the set of methods that we'd like to support (and not support) on it, and providing docstrings for the functions that are endorsed for use.
The comments I left are more of minor questions -- for that reason, I don't require that they be addressed before merging it. Nonetheless, I think it'll be good to have visr@ and rafaqz@ (and evetion@) weigh in on this PR too.
Thanks for the comments, @yeesian. I tried to better document the RasterDataset in the source code, which should address many of your comments. Later this week I will try to update https://github.com/yeesian/ArchGDAL.jl/blob/master/docs/src/rasters.md and add a paragraph about dealing with RasterDatasets. Maybe we'll get some more opinions from @rafaqz, @visr or @evetion on which methods should be forwarded for RasterDatasets, or on other design decisions that should be taken into account.
Great to see this coming together. I'll go over it later this week.
Some things that would still be good to address:
- Could you use a 4 space indent, for consistency with the rest of the ArchGDAL code?
- Some tests for the specific functions and types that you added would be good to have
- Doc updates: this probably needs some updates to the raster, dataset and windowed reads and writes pages.
For the forwarding of AbstractDataset functions, I quickly went over dataset.jl and came up with these functions:
- copywholeraster(source::AbstractDataset, dest::AbstractDataset; <keyword arguments>)
- copy(dataset::AbstractDataset; [filename, [driver, [<keyword arguments>]]])
- write(dataset::AbstractDataset, filename::AbstractString; kwargs...)
- getdriver(dataset::AbstractDataset)
- filelist(dataset::AbstractDataset)
- testcapability(dataset::AbstractDataset, capability::AbstractString)
- listcapability
- setgeotransform!
- ngcp(dataset::AbstractDataset)
- setproj!(dataset::AbstractDataset, projstring::AbstractString)
- buildoverviews!
Plus, from rasterio.jl, functions like rasterio!/read/write/read!/write!.
These are quite a lot, though. We could consider using a delegation/forwarding macro like the one discussed in this thread.
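As a rough idea of what such a macro could look like, here is a minimal sketch. The @forward macro below is hypothetical (it is not part of ArchGDAL or DiskArrays), as are the InnerHandle and ForwardingWrapper types and the describe/rename functions; it forwards positional arguments only:

```julia
# Hypothetical @forward macro: for every function name in `fs`, define a method
# taking a `T` as first argument that calls the same function on field `field`.
macro forward(T, field, fs...)
    defs = [:($f(x::$T, args...) = $f(getfield(x, $(QuoteNode(field))), args...))
            for f in fs]
    # Escape everything so the methods are defined in the caller's module.
    return esc(Expr(:block, defs...))
end

# Toy stand-ins for a dataset handle and a wrapper around it.
struct InnerHandle
    name::String
end
describe(h::InnerHandle) = "dataset " * h.name
rename(h::InnerHandle, suffix::AbstractString) = InnerHandle(h.name * suffix)

struct ForwardingWrapper
    ds::InnerHandle
end

# One line replaces a hand-written method per forwarded function.
@forward ForwardingWrapper ds describe rename

w = ForwardingWrapper(InnerHandle("world"))
println(describe(w))             # dataset world
println(rename(w, ".tif").name)  # world.tif
```

Keyword arguments could be forwarded the same way; they are left out to keep the sketch short.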
Although, since there are this many, can't we just make RasterDataset an AbstractDataset instead of an AbstractDiskArray{T,3}, and add the DiskArrays methods we would otherwise get? The DiskArrays README mentions that instead of subtyping you can also use interpret_indices_disk. Or is there another reason you didn't use that here?
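To make this trade-off concrete, here is a minimal sketch of a dataset-like type that supports array-style reads and writes without subtyping AbstractArray. AbstractToyDataset and IndexableRaster are hypothetical stand-ins (not ArchGDAL's AbstractDataset, and not using DiskArrays' interpret_indices_disk), and an in-memory Array plays the role of the on-disk data:

```julia
# Hypothetical abstract type playing the role of ArchGDAL's AbstractDataset.
abstract type AbstractToyDataset end

# Dataset-like type that is NOT an AbstractArray, but defines enough methods
# for A[i, j, k] reads and writes to work.
struct IndexableRaster{T} <: AbstractToyDataset
    data::Array{T,3}   # in-memory stand-in for data that would live on disk
end

Base.size(r::IndexableRaster) = size(r.data)
Base.ndims(::IndexableRaster) = 3
Base.getindex(r::IndexableRaster, I...) = r.data[I...]
Base.setindex!(r::IndexableRaster, v, I...) = (r.data[I...] = v; r)

r = IndexableRaster(reshape(collect(1.0:24.0), 4, 3, 2))
println(r[2, 3, 1])         # 10.0
r[1, 1, 2] = -1.0           # write through to the wrapped data
println(r.data[1, 1, 2])    # -1.0

# What subtyping would have provided for free is missing: this type is not
# an AbstractArray, so e.g. iteration, reductions and broadcasting would all
# need methods of their own.
println(r isa AbstractArray)  # false
```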
The … This is landing in GDAL 3.1.0, which will come out in a few days: …
I haven't dived much into that RFC yet, though.
Thanks a lot for the comments, @visr.
The main reason is that I would like a … Initially, I also intentionally did not implement the methods from rasterio.jl, because I see the RasterDataset as an alternative interface to a Dataset that uses more Julian getindex/setindex syntax. So my thinking was that a user either uses the classical …
OK, I see the advantages of subtyping …
Good, so I have added a paragraph to the documentation and additional unit tests. I think I have to check Coveralls to find out why there is still a decrease in coverage. Otherwise, I think I have addressed the major comments and this should be ready to go; please let me know if I forgot something.
Yes, I just created the PR on the registry: JuliaRegistries/General#19258
OK, the new DiskArrays version got tagged. I think @felixcremer wanted to look at the reduced test coverage and add a few tests for the WindowIterator; after that, test coverage should be fine again.
I made a pull request against your fork to add tests for WindowIterator. I am wondering whether we should rename RasterDataset to RasterArray, because it might be confusing that RasterDataset is not a subtype of AbstractDataset. Also, the display functions for RasterBand and RasterDataset fall back to the show methods of a DiskArray. I think it would be nice to fall back to the show method of the underlying dataset for RasterDataset, and to indicate that it is a RasterBand for the RasterBand.
(cherry picked from commit 9f731d5)
I just looked at the Coveralls results, and the missing lines in iterators.jl are tested but not picked up correctly by Coveralls.
Also upgrades to Documenter 0.25. See also yeesian#124
Thanks, I didn't see a PR but found the commit and put it on this branch with …
I see your point, but can also see the advantages of having Dataset in the name, since it wraps a Dataset. I'd say let's keep it like this.
Indeed. I don't quite understand why it hits those fallbacks, since in … I added a few commits. Some of the doc changes are unrelated, sorry for that, but they were needed to get the docs to build again. Since this changes the existing types, I don't think we can release this as a patch release, so it would be 0.5.0.
I think the problem is that DiskArrays defines a method of show with a MIME type, [1] show(io::IO, ::MIME{Symbol("text/plain")}, X::DiskArrays.AbstractDiskArray) in DiskArrays at src/DiskArrays.jl:210, and this is the method to which a plain show call from the REPL is dispatched. For the RasterDataset, we can include this line to use the show method of the underlying dataset: Base.show(io::IO, ::MIME"text/plain", raster::RasterDataset) = show(io, raster.ds)
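The dispatch problem described above can be reproduced with a small self-contained sketch. ToyAbstractDisk and ShowRaster are hypothetical stand-ins for DiskArrays.AbstractDiskArray and RasterDataset, and repr with a MIME argument is used to capture what REPL display would print:

```julia
# Abstract parent with its own text/plain show method, as DiskArrays has.
abstract type ToyAbstractDisk end
Base.show(io::IO, ::MIME"text/plain", ::ToyAbstractDisk) = print(io, "generic disk array")

struct ShowRaster <: ToyAbstractDisk
    name::String
end
# Only the 2-argument method is defined for the concrete type.
Base.show(io::IO, r::ShowRaster) = print(io, "RasterDataset(", r.name, ")")

r = ShowRaster("world.tif")
println(repr(r))                      # RasterDataset(world.tif)  (2-argument show, also used by print/string)
println(repr(MIME"text/plain"(), r))  # generic disk array  (REPL display hits the parent's method)

# Fix: add a text/plain method for the concrete type that delegates to the
# 2-argument method (the PR instead forwards to the wrapped dataset).
Base.show(io::IO, ::MIME"text/plain", r::ShowRaster) = show(io, r)
println(repr(MIME"text/plain"(), r))  # RasterDataset(world.tif)
```

Keeping the 2-argument method as the single source of truth means print, string and REPL display all agree.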
Co-authored-by: Felix Cremer <felix.cremer@uni-jena.de>
Thanks @felixcremer for pointing this out. I remember I had to do this for DiskArrays because otherwise the generic AbstractArray show method would sometimes still leak through, which would be very bad for a DiskArray. I cannot remember the exact reason why this was necessary; we might revisit this in DiskArrays.
Thanks. I still want to watch JuliaCon 2020 | Display, show and print -- how Julia's display system works | Fredrik Ekre. I think the same patch is needed for …
This avoids using the DiskArray show method for RasterBands. We need the version without a MIME type so that we can use the print function.
I added the MIME type patch to the RasterBand show function.
I just watched the video, and it looks like the preferred way is not to overload …
This is a very nice addition to ArchGDAL. Is everybody good to merge?
This is a great addition, thanks for seeing it through!
I appreciate the work it took to standardize on the methods for ArchGDAL across array.jl and rasterband.jl (away from the Base index methods).
This PR also makes it easier to convert multiple bands into an RGB image:
using ImageCore, ArchGDAL
ds = ArchGDAL.readraster("/vsicurl/https://github.com/yeesian/ArchGDALDatasets/raw/master/gdalworkshop/world.tif")
img = colorview(RGB, normedview(PermutedDimsArray(ds, (3,2,1))))
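The key step in that snippet is the lazy dimension permutation. Below is a minimal sketch with a plain in-memory array standing in for ds (no ArchGDAL or ImageCore needed, so normedview and colorview are not reproduced), assuming ds is indexed (x, y, band):

```julia
# In-memory stand-in for a 3-band raster indexed (x, y, band).
nx, ny = 4, 3
raster = reshape(UInt8.(1:nx*ny*3), nx, ny, 3)

# Lazily reorder to (band, y, x): channels first, then image rows and columns,
# which is the layout colorview(RGB, ...) expects.
perm = PermutedDimsArray(raster, (3, 2, 1))
println(size(perm))   # (3, 3, 4)

# No data is copied: perm[b, y, x] reads raster[x, y, b].
println(perm[2, 3, 4] == raster[4, 3, 2])   # true
println(parent(perm) === raster)            # true
```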
As discussed here, it was suggested to implement the DiskArray interface for the raster types in ArchGDAL. For a single RasterBand this was straightforward. However, I had to jump through some hoops to make it possible to treat a whole Dataset as an AbstractDiskArray. The major problems were: …
So I decided to make a new Dataset wrapper (RasterDataset), which does some checks during construction and then wraps a Dataset into a new AbstractDiskArray. I just tested this, and the following functionality would come for free (plus everything else we decide to put on top of DiskArrays.jl):
windows
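The construction-time checks are not listed in this thread, so the following is only a sketch under an assumption: that the wrapper must verify all bands share one element type and one size before it can present them as a single 3D array. CheckedRaster is a hypothetical type, and plain matrices stand in for GDAL raster bands:

```julia
# Hypothetical wrapper that validates its bands once, at construction time,
# so it can safely present them as one 3D array with a single element type.
struct CheckedRaster{T}
    bands::Vector{Matrix{T}}
    function CheckedRaster(bands::AbstractVector{<:AbstractMatrix})
        isempty(bands) && throw(ArgumentError("dataset has no raster bands"))
        E = eltype(first(bands))
        sz = size(first(bands))
        for b in bands
            eltype(b) == E || throw(ArgumentError("bands have different element types"))
            size(b) == sz || throw(ArgumentError("bands have different sizes"))
        end
        return new{E}(Matrix{E}[b for b in bands])
    end
end

# (x, y, band) size of the wrapped data.
Base.size(r::CheckedRaster) = (size(first(r.bands))..., length(r.bands))

r = CheckedRaster([rand(UInt8, 4, 3) for _ in 1:3])
println(size(r))   # (4, 3, 3)

# A typed AbstractMatrix[...] literal keeps the element types distinct;
# an untyped [...] literal might promote them to a common type.
try
    CheckedRaster(AbstractMatrix[zeros(Float32, 4, 3), zeros(Int16, 4, 3)])
catch err
    println(err.msg)   # bands have different element types
end
```

Doing the checks once in the constructor means later indexing code can rely on a single element type without re-validating on every read.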
Some examples that I tried: …
@rafaqz I was confused by a few tests in test_array.jl, because it looks like you allowed omitting the last index when accessing a Dataset, even if the number of bands is greater than one. I marked these tests as broken here; what was the rationale behind this behavior?