Integrate other large datasets. #340
Comments
Keep an eye on https://www.fruitpunch.ai/challenges/ai-for-trees. We have downloaded this dataset from Kenya, but currently cannot get the annotations to overlap the imagery. In orange: |
There may be data associated with
|
https://essd.copernicus.org/preprints/essd-2022-312/ Germany tree species |
Synthetic trees from ground view, useful in understanding generalization. |
https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13860 |
Is there any overlapping RGB data here? Will need to contact each individually. https://open-research-europe.ec.europa.eu/articles/3-32 |
@henrykironde can we download these datasets and let's start listing here. I'll do some and you can do some as we prepare for GSOC students. For each dataset:
Overall, we can start moving them to
In general my hope is to create a train and test split for each dataset where possible. |
Summary of https://lila.science/datasets/forest-damages-larch-casebearer/.
There are 60,000 training trees.
There are 41,148 test trees. That seems like too many test trees relative to train. We should reset that.
The resolution of the imagery is unknown, but looks to be about 10 cm. |
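A minimal sketch of how a split could be reset, assuming the annotations can be loaded as a list of rows that each carry an image path. The `image_path` key, the function name, and the 20% default are illustrative assumptions, not something the dataset or deepforest prescribes. Splitting by image rather than by tree keeps crowns from the same image out of both sets.

```python
import random
from collections import defaultdict


def split_by_image(annotations, test_fraction=0.2, seed=0):
    """Split annotation rows into train/test so no image appears in both.

    `annotations` is a list of dicts with an "image_path" key (a hypothetical
    layout; adapt the key to the dataset at hand). `test_fraction` applies to
    the number of images, not the number of trees.
    """
    by_image = defaultdict(list)
    for row in annotations:
        by_image[row["image_path"]].append(row)

    # Sort before shuffling so the result depends only on the seed.
    images = sorted(by_image)
    random.Random(seed).shuffle(images)

    n_test = max(1, round(len(images) * test_fraction))
    test_images = set(images[:n_test])

    train = [r for img in images if img not in test_images for r in by_image[img]]
    test = [r for img in images if img in test_images for r in by_image[img]]
    return train, test


# Share of trees currently in the test set, from the counts reported above.
current_test_share = 41148 / (60000 + 41148)
```

With the counts reported above, roughly 41% of the trees currently sit in the test set, which is the imbalance the comment flags.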
Here is a report for https://docs.google.com/document/d/16kKik2clGutKejU8uqZevNY6JALf4aVk2ELxLeR-msQ/edit I have sent the maintainers an email. Things didn't quite line up.
Hello,
The tree detection tool has become quite popular in the scientific community and we are looking to bolster it with images around the world. I've been collecting links to tree annotations and imagery and have a student this summer who will start training new models.
I went to inspect the dataset linked from https://docs.google.com/document/d/16kKik2clGutKejU8uqZevNY6JALf4aVk2ELxLeR-msQ/edit, but I became a bit confused. Looking around https://map.openaerialmap.org/#/-175.26188850402832,-21.1463924204947,13/square/20002233030/5a82999d5a9ef7cb5d5ae685?_k=ofo82l I found a geotiff that does overlap the area. However, zooming in, it looks as if there isn't strong georeferencing, the tree points don't correspond to trees. They seem more or less randomly distributed. I'm assuming that this is not the tile that was intended to be georeferenced to this shapefile.
All help appreciated.
There are 13,000 labeled trees. We may need to clean up non-coconut trees with new annotations. |
Looks like drone images from Turkey. |
https://zenodo.org/record/7528566 https://www.mdpi.com/2072-4292/15/5/1463 This is a private dataset and cannot be shared or re-used in any additional capacity besides improving the model baseline. I can see the images but haven't checked the .xml annotation overlap. The image quality looks dark but acceptable. |
Report on
This dataset is now on Figure is from paper, trained model, but gives a sense for the resolution and annotation format. Data is still downloading from zenodo. |
I wrote the corresponding author: https://www.sciencedirect.com/science/article/pii/S0303243422000903 A small number of polygon annotations from Central Park, NY. |
Here is a report on https://github.com/jonathanventura/urban-tree-detection-data from article https://arxiv.org/pdf/2208.10607.pdf Nice summary on the README. Excellent paper. https://github.com/jonathanventura/urban-tree-detection-data#readme Annotations are made in point format! This may be more trouble than it's worth, but boxes could be generated in a semi-supervised manner using the existing deepforest baseline. This could be a way of improving coverage of different resolutions and urban scenes. NAIP imagery is in high demand from deepforest users. At least it could be used in an evaluation benchmark against NAIP. |
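One way the semi-supervised idea could look, as a rough sketch: take boxes predicted by an existing model on the same image and keep only those that contain exactly one annotated point. The function name and input layout are hypothetical, the predicted boxes are assumed to already be in the same pixel coordinates as the points, and no deepforest API is called here.

```python
def boxes_from_points(points, predicted_boxes):
    """Keep predicted boxes that contain exactly one annotated point.

    points: list of (x, y) in pixel coordinates.
    predicted_boxes: list of (xmin, ymin, xmax, ymax) in the same coordinates.
    Returns a list of (box, point) pairs. Boxes with zero points (likely false
    positives) or several points (likely merged crowns) are dropped.
    """
    matched = []
    for box in predicted_boxes:
        xmin, ymin, xmax, ymax = box
        inside = [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]
        if len(inside) == 1:
            matched.append((box, inside[0]))
    return matched
```

This does not resolve overlapping predictions: a single point can still be matched by more than one box, so a further de-duplication step would likely be needed before training on the result.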
Exploring https://doi.pangaea.de/10.1594/PANGAEA.933263?format=html#download. Siberia trees from.
The files are too large to download, so I don't know which orthomosaics go with the crown polygons. I downloaded "Kruse_et_al_SiDroForest_RGB_Orthomosiac.zip", but there are two others. This image makes it seem like there are small image chips, not one giant tile, but I don't see that in the manifest. There are also synthetic trees that I have not investigated: https://doi.pangaea.de/10.1594/PANGAEA.932795 |
Deepforest was used to preprocess training data in Ecuador. Important to use only in train, since it is weakly annotated and already touched our system. https://arxiv.org/pdf/2201.11192.pdf |
We have very little data from South Asia. May be interested in annotating some tiles from https://www.mdpi.com/1999-4907/14/3/586 |
9,000 trees from the TEAK and SOAP NEON sites, with alive/dead labels: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022JG007234 Available on GitHub; data will need to be cropped. |
We’ve received approval by our field partner (LEAD Foundation) too that they are willing to share the datasets that we collected together.
We have drone imagery of 17 villages in Tanzania where we implement an agroforestry program. For 13 of these villages, we have data over 3 consecutive years (2018, 2019, 2020) in this cloud bucket: https://console.cloud.google.com/storage/browser/justdiggit-drone
Some of the data has been labeled. The file ‘justdiggit-drone/label_sample/Annotations_trees_only.json’ contains the most complete annotations. There is also another file 'label_sample_COCO_RDP - Tree annotation.json' which also contains a class 'messy vegetation' in addition to trees. However, there are very few annotations for this class compared to the trees.
We have some more recent data too, but that hasn’t been added to the cloud bucket yet. Let me know if you would need that too.
For some of the villages we also have 50 cm resolution satellite images (Planet SkySat) captured around the same time as we have the drone flights (as we want to use the detected trees on our drone imagery to detect trees on this satellite imagery too). They are in this bucket: https://console.cloud.google.com/storage/browser/justdiggit-skysat
Has been downloaded to
very large, over 300GB. |
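To check how sparse the 'messy vegetation' class is relative to trees, the annotations could be tallied per category. This is a sketch that assumes the labeled JSON files follow the standard COCO layout (top-level `categories` with `id` and `name`, and `annotations` with `category_id`); that has not been verified against the actual files, and the helper names are illustrative.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style annotation file into a dict."""
    with open(path) as f:
        return json.load(f)


def count_annotations_per_category(coco):
    """Return {category name: number of annotations} for a COCO-style dict."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    return dict(counts)
```

If the minority class turns out to be tiny, one option is to filter it out and treat the file as a trees-only dataset.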
https://ai-climate.berkeley.edu/scale-mae-website/ scale training and scale based pretraining |
I had a meeting with Josh Veitch Michaelis from ETH about https://restor.eco/?lat=26&lng=14.23&zoom=3. OpenStreetMap data, a lot of labels, not sure of the status. |
https://www.biorxiv.org/content/10.1101/2023.08.03.548604v1.full.pdf polygon annotations, 14 species
I emailed the authors. Nice paper. Dataset is out with RGB and LiDAR imagery. https://zenodo.org/records/8148479 |
Check back in on this preprint
Data Availability Statement: Crown data used for model training and all M.
polymorpha data products are openly available in Figshare at
xxxxx. Acknowledgments: The Global Airborne Observatory (GAO) is managed by
the Center for Global Discovery and Conservation Science at Arizona State
University. The GAO is made possible by support from private foundations,
visionary individuals, and Arizona State University.
…On Wed, Aug 9, 2023 at 2:40 PM Benjamin Weinstein ***@***.***> wrote:
https://www.biorxiv.org/content/10.1101/2023.08.03.548604v1.full.pdf
polygon annotations, 14 species
The dataset created here (23,000 segmented individual tree crowns) includes 13 species and genera,
which have an environmental and an economic importance in Northeastern North America.
Moreover, these trees are segmented on seven different dates, totalling almost 161,000 annotated
tree crowns. This dataset is available online for use by others.
I emailed the authors. Nice paper.
|
https://zenodo.org/record/8008028 https://www.mdpi.com/2072-4292/15/14/3599 Netherlands
|
Data was not made available, but we could probably ask the authors. Netherlands. https://www.mdpi.com/2072-4292/15/17/4128# |
There is a LiDAR dataset, but one figure has an ortho basemap. I emailed the authors. https://essd.copernicus.org/articles/14/2989/2022/essd-14-2989-2022.pdf |
An additional Siberia dataset? https://doi.pangaea.de/10.1594/PANGAEA.957253 |
Need to write to the authors of https://www.nature.com/articles/s41598-020-79653-9 |
France LiDAR paper, are there orthos? https://www.mdpi.com/2072-4292/14/5/1083 |
unlabeled https://datadryad.org/stash/dataset/doi:10.5061/dryad.21t1805 tropical forest liana |
If you still need help then you should change the CRS to EPSG:3857 for the .tif layer
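For reference, EPSG:3857 is the spherical Web Mercator projection. The sketch below shows how a longitude/latitude pair maps into it, which can help when sanity-checking whether point annotations and a raster are in the same coordinate system. It assumes the input coordinates are in degrees (EPSG:4326); it does not reproject a raster, which would be done in a GIS tool or library.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

Coordinates in the tens of thousands to millions of meters suggest a projected CRS like this one, while values within ±180 and ±90 suggest plain lon/lat.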
Thanks @rlrognstad. We got this one, and are in contact with the authors. I appreciate you looking with us! I'm compiling a list below. This issue is pretty old and pre-dates any formal attempt in pulling data together. They actually used DeepForest for a portion of their training structure. I suppose I can close this issue and point to the sheet. https://docs.google.com/spreadsheets/d/1-Q6ekQNE7TZBHQnrbjGl_2tcfh_X9154Kbzn3YFFruM/edit?usp=sharing Definitely let me know about absolutely anything you see! |
This issue is meant for cataloging data that should eventually go into a deepforest baseline. Happy to have contributions from the community on this issue.
https://arxiv.org/pdf/2208.10607.pdf
https://github.com/jonathanventura/urban-tree-detection-data
A portion of this could be used.
https://google.github.io/auto-arborist/
More forest data
https://lila.science/datasets/forest-damages-larch-casebearer/
Along with this issue we need a strategy for updating the baseline model and potential tradeoffs to new model weights.
nightonion/yosemite-tree-dataset#2
Roadmap