predict_tile() output is incorrect when multiple GPUs are present #646

Open
ethanwhite opened this issue Mar 31, 2024 · 1 comment

@ethanwhite

When predict_tile() is run on a machine with multiple GPUs, all of them are automatically detected and used. The output from these runs differs from single-GPU runs in both the number and the positions of the predicted boxes. This can be confirmed with the following reprex:

from deepforest import main
from deepforest import get_data

# Load the prebuilt release model
model = main.deepforest()
model.use_release()

# Predict on a sample tile bundled with the package and save the boxes
raster_path = get_data("OSBS_029.tif")
predicted_raster = model.predict_tile(raster_path, return_plot=False, patch_size=300, patch_overlap=0.25)
predicted_raster.to_csv("boxes.csv")

Running on an interactive instance with 1 GPU (on the HiPerGator) we get:

In [20]: one_gpu.sort(by = ['xmin', 'ymin'])
Out[20]: 
shape: (94, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 86  ┆ 0.0   ┆ 0.0   ┆ 16.0  ┆ 18.0  ┆ Tree  ┆ 0.27725  ┆ OSBS_029.tif │
│ 50  ┆ 0.0   ┆ 203.0 ┆ 7.0   ┆ 223.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 54  ┆ 0.0   ┆ 260.0 ┆ 21.0  ┆ 287.0 ┆ Tree  ┆ 0.422261 ┆ OSBS_029.tif │
│ 78  ┆ 0.0   ┆ 357.0 ┆ 11.0  ┆ 387.0 ┆ Tree  ┆ 0.297332 ┆ OSBS_029.tif │
│ 36  ┆ 2.0   ┆ 156.0 ┆ 42.0  ┆ 205.0 ┆ Tree  ┆ 0.543535 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 87  ┆ 386.0 ┆ 77.0  ┆ 400.0 ┆ 104.0 ┆ Tree  ┆ 0.274974 ┆ OSBS_029.tif │
│ 75  ┆ 387.0 ┆ 174.0 ┆ 400.0 ┆ 200.0 ┆ Tree  ┆ 0.309699 ┆ OSBS_029.tif │
│ 37  ┆ 388.0 ┆ 377.0 ┆ 400.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 43  ┆ 392.0 ┆ 135.0 ┆ 399.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 88  ┆ 393.0 ┆ 352.0 ┆ 400.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘

Running on an interactive instance with 2 GPUs we get:

In [21]: two_gpu.sort(by = ['xmin', 'ymin'])
Out[21]: 
shape: (86, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 45  ┆ 0.0   ┆ 103.0 ┆ 7.0   ┆ 123.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 55  ┆ 0.0   ┆ 200.0 ┆ 17.0  ┆ 226.0 ┆ Tree  ┆ 0.406657 ┆ OSBS_029.tif │
│ 64  ┆ 0.0   ┆ 250.0 ┆ 11.0  ┆ 275.0 ┆ Tree  ┆ 0.341502 ┆ OSBS_029.tif │
│ 67  ┆ 0.0   ┆ 279.0 ┆ 15.0  ┆ 312.0 ┆ Tree  ┆ 0.337557 ┆ OSBS_029.tif │
│ 84  ┆ 0.0   ┆ 367.0 ┆ 6.0   ┆ 389.0 ┆ Tree  ┆ 0.257209 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 39  ┆ 288.0 ┆ 38.0  ┆ 300.0 ┆ 57.0  ┆ Tree  ┆ 0.480631 ┆ OSBS_029.tif │
│ 32  ┆ 288.0 ┆ 377.0 ┆ 300.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 46  ┆ 289.0 ┆ 2.0   ┆ 300.0 ┆ 24.0  ┆ Tree  ┆ 0.428087 ┆ OSBS_029.tif │
│ 38  ┆ 292.0 ┆ 135.0 ┆ 299.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 81  ┆ 293.0 ┆ 352.0 ┆ 300.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘

This shows two issues:

  1. There are fewer boxes on 2 GPUs (86) than on 1 GPU (94).
  2. The boxes are shifted. Compare the last two rows of each table: they have the same scores and y coordinates, but the x coordinates are reduced by 100 when running on 2 GPUs. Row 2 on 1 GPU and row 1 on 2 GPUs share a score (0.430017) and are likely the same tree; there the y coordinates are reduced by 100 instead, which demonstrates a more complex shift in positions near 0,0.
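
The shift can also be checked programmatically. The following is a minimal sketch, not part of DeepForest: it pairs boxes from the two runs by identical score and reports the coordinate offset. The rows are copied from the tables above; the helper name `offsets_by_score` is hypothetical, and the pairing assumes each score is unique within a run.

```python
# Rows copied from the two tables above: (xmin, ymin, xmax, ymax, score).
ONE_GPU = [
    (0.0, 203.0, 7.0, 223.0, 0.430017),
    (388.0, 377.0, 400.0, 398.0, 0.529458),
    (392.0, 135.0, 399.0, 155.0, 0.482293),
    (393.0, 352.0, 400.0, 375.0, 0.274659),
]
TWO_GPU = [
    (0.0, 103.0, 7.0, 123.0, 0.430017),
    (288.0, 377.0, 300.0, 398.0, 0.529458),
    (292.0, 135.0, 299.0, 155.0, 0.482293),
    (293.0, 352.0, 300.0, 375.0, 0.274659),
]


def offsets_by_score(reference, candidate):
    """Pair boxes that share a score; return {score: (dx, dy)} as candidate minus reference.

    Assumption: scores are unique within each run, so an identical score
    identifies the same detection in both runs.
    """
    by_score = {row[4]: row for row in reference}
    offsets = {}
    for xmin, ymin, _xmax, _ymax, score in candidate:
        ref = by_score.get(score)
        if ref is not None:
            offsets[score] = (xmin - ref[0], ymin - ref[1])
    return offsets


if __name__ == "__main__":
    for score, (dx, dy) in offsets_by_score(ONE_GPU, TWO_GPU).items():
        print(f"score={score}: dx={dx}, dy={dy}")
```

For these rows the offsets are either 100 pixels in x or 100 pixels in y, never both. The same pairing could be applied to the full boxes.csv from each run to see which 1-GPU boxes have no 2-GPU counterpart.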

This issue was discovered by @henrykironde, I'm just helping by putting together the reprex.

@ethanwhite

This has been addressed by forcing a single GPU for prediction (#653), but if anyone is familiar enough with multi-GPU PyTorch Lightning inference to help make it work, we'd appreciate the help.
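
For anyone on a version without that change, one user-side workaround is to expose only one GPU to the process. The sketch below is an assumption about how to do this and is not necessarily what #653 does: it sets the standard `CUDA_VISIBLE_DEVICES` environment variable before anything initialises CUDA. The helper name `restrict_to_single_gpu` is hypothetical.

```python
import os


def restrict_to_single_gpu(device_index=0):
    """Expose a single CUDA device to this process.

    Must be called before torch (or deepforest, which imports torch)
    initialises CUDA; changing the variable afterwards has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this first, then import deepforest and run the reprex as usual.
restrict_to_single_gpu(0)
```

With only one device visible, automatic device detection has just one GPU to select, so prediction should follow the single-GPU path.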

@ethanwhite added the "help wanted" label on Apr 5, 2024