`predict_tile()` output is incorrect when multiple GPUs are present #646

ethanwhite · 2024-03-31T18:53:24Z

When `predict_tile()` runs on a machine with multiple GPUs, all of the GPUs are automatically detected and used. The output from these runs differs from single-GPU runs in both the number and the positions of the predicted boxes. This can be confirmed with the following reprex:

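```python
from deepforest import main
from deepforest import get_data

# Download and load the released pretrained tree crown model
model = main.deepforest()
model.use_release()

# Predict over the sample tile shipped with the package, in overlapping 300 px patches
raster_path = get_data("OSBS_029.tif")
predicted_raster = model.predict_tile(raster_path, return_plot=False, patch_size=300, patch_overlap=0.25)
predicted_raster.to_csv("boxes.csv")
```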

Running on an interactive HiPerGator instance with 1 GPU, we get:

```
In [20]: one_gpu.sort(by = ['xmin', 'ymin'])
Out[20]: 
shape: (94, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 86  ┆ 0.0   ┆ 0.0   ┆ 16.0  ┆ 18.0  ┆ Tree  ┆ 0.27725  ┆ OSBS_029.tif │
│ 50  ┆ 0.0   ┆ 203.0 ┆ 7.0   ┆ 223.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 54  ┆ 0.0   ┆ 260.0 ┆ 21.0  ┆ 287.0 ┆ Tree  ┆ 0.422261 ┆ OSBS_029.tif │
│ 78  ┆ 0.0   ┆ 357.0 ┆ 11.0  ┆ 387.0 ┆ Tree  ┆ 0.297332 ┆ OSBS_029.tif │
│ 36  ┆ 2.0   ┆ 156.0 ┆ 42.0  ┆ 205.0 ┆ Tree  ┆ 0.543535 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 87  ┆ 386.0 ┆ 77.0  ┆ 400.0 ┆ 104.0 ┆ Tree  ┆ 0.274974 ┆ OSBS_029.tif │
│ 75  ┆ 387.0 ┆ 174.0 ┆ 400.0 ┆ 200.0 ┆ Tree  ┆ 0.309699 ┆ OSBS_029.tif │
│ 37  ┆ 388.0 ┆ 377.0 ┆ 400.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 43  ┆ 392.0 ┆ 135.0 ┆ 399.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 88  ┆ 393.0 ┆ 352.0 ┆ 400.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘
```

Running on an interactive instance with 2 GPUs, we get:

```
In [21]: two_gpu.sort(by = ['xmin', 'ymin'])
Out[21]: 
shape: (86, 8)
┌─────┬───────┬───────┬───────┬───────┬───────┬──────────┬──────────────┐
│     ┆ xmin  ┆ ymin  ┆ xmax  ┆ ymax  ┆ label ┆ score    ┆ image_path   │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---      ┆ ---          │
│ i64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   ┆ str   ┆ f64      ┆ str          │
╞═════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪══════════════╡
│ 45  ┆ 0.0   ┆ 103.0 ┆ 7.0   ┆ 123.0 ┆ Tree  ┆ 0.430017 ┆ OSBS_029.tif │
│ 55  ┆ 0.0   ┆ 200.0 ┆ 17.0  ┆ 226.0 ┆ Tree  ┆ 0.406657 ┆ OSBS_029.tif │
│ 64  ┆ 0.0   ┆ 250.0 ┆ 11.0  ┆ 275.0 ┆ Tree  ┆ 0.341502 ┆ OSBS_029.tif │
│ 67  ┆ 0.0   ┆ 279.0 ┆ 15.0  ┆ 312.0 ┆ Tree  ┆ 0.337557 ┆ OSBS_029.tif │
│ 84  ┆ 0.0   ┆ 367.0 ┆ 6.0   ┆ 389.0 ┆ Tree  ┆ 0.257209 ┆ OSBS_029.tif │
│ …   ┆ …     ┆ …     ┆ …     ┆ …     ┆ …     ┆ …        ┆ …            │
│ 39  ┆ 288.0 ┆ 38.0  ┆ 300.0 ┆ 57.0  ┆ Tree  ┆ 0.480631 ┆ OSBS_029.tif │
│ 32  ┆ 288.0 ┆ 377.0 ┆ 300.0 ┆ 398.0 ┆ Tree  ┆ 0.529458 ┆ OSBS_029.tif │
│ 46  ┆ 289.0 ┆ 2.0   ┆ 300.0 ┆ 24.0  ┆ Tree  ┆ 0.428087 ┆ OSBS_029.tif │
│ 38  ┆ 292.0 ┆ 135.0 ┆ 299.0 ┆ 155.0 ┆ Tree  ┆ 0.482293 ┆ OSBS_029.tif │
│ 81  ┆ 293.0 ┆ 352.0 ┆ 300.0 ┆ 375.0 ┆ Tree  ┆ 0.274659 ┆ OSBS_029.tif │
└─────┴───────┴───────┴───────┴───────┴───────┴──────────┴──────────────┘
```
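One way to quantify the difference is to pair boxes across the two runs and compute their coordinate offsets. The sketch below is plain Python and assumes that boxes with an identical score are the same prediction, a heuristic that works for the rows shown here but could mis-pair tied scores. The rows are copied from the two tables above, and `offsets_by_score` is a hypothetical helper name; in practice the rows would come from the two `boxes.csv` files.

```python
# (xmin, ymin, xmax, ymax, score) rows copied from the two tables above
one_gpu = [
    (0.0, 203.0, 7.0, 223.0, 0.430017),
    (388.0, 377.0, 400.0, 398.0, 0.529458),
    (392.0, 135.0, 399.0, 155.0, 0.482293),
    (393.0, 352.0, 400.0, 375.0, 0.274659),
]
two_gpu = [
    (0.0, 103.0, 7.0, 123.0, 0.430017),
    (288.0, 377.0, 300.0, 398.0, 0.529458),
    (292.0, 135.0, 299.0, 155.0, 0.482293),
    (293.0, 352.0, 300.0, 375.0, 0.274659),
]


def offsets_by_score(a, b):
    """Return {score: (dx, dy)} for boxes in `a` whose exact score also appears in `b`."""
    b_by_score = {row[4]: row for row in b}
    shifts = {}
    for xmin, ymin, _, _, score in a:
        match = b_by_score.get(score)
        if match is not None:
            # Offset of the 2 GPU box relative to the 1 GPU box
            shifts[score] = (match[0] - xmin, match[1] - ymin)
    return shifts


shifts = offsets_by_score(one_gpu, two_gpu)
for score, (dx, dy) in shifts.items():
    print(f"score={score}: dx={dx:+.0f}, dy={dy:+.0f}")
```

For these rows every offset is exactly 100 pixels, along x for some boxes and along y for others.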

This shows two issues:

1. There are fewer boxes on 2 GPUs than on 1 GPU (86 vs. 94).
2. The boxes are shifted. Compare the last two rows: they have the same scores and y coordinates, but the x coordinates are 100 smaller when running on 2 GPUs. Row 2 on 1 GPU and row 1 on 2 GPUs are likely also the same tree (same score and x coordinates), but there the y coordinates are 100 smaller, so the shift in positions near 0,0 is more complex than a single offset in x.
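One hypothetical mechanism would produce both symptoms at once: if each GPU process predicts only its own shard of the patch windows, but those predictions are then paired positionally with the full list of window offsets, some patches are dropped and others are shifted by exactly one window stride. The toy sketch below illustrates that idea; it is not DeepForest's code. It assumes a 400 x 400 pixel tile (consistent with the maximum coordinates in the 1 GPU table), a stride of `patch_size - int(patch_size * patch_overlap)` plus a final window flush with the tile edge, x-major window order, and a strided split across ranks of the kind `torch.utils.data.DistributedSampler` performs with `shuffle=False`. All helper names are hypothetical, and the local box was chosen to reproduce one row from the tables, so this only shows that the mechanism is consistent with the symptoms, not that it is the cause.

```python
# Toy illustration of a hypothetical multi-GPU bug; NOT DeepForest's implementation.

def window_offsets(size, patch_size, overlap):
    """Top-left offsets along one axis under an assumed sliding-window scheme."""
    step = patch_size - int(patch_size * overlap)
    last = size - patch_size
    offsets = list(range(0, last + 1, step))
    if offsets[-1] != last:
        offsets.append(last)  # final window flush with the tile edge
    return offsets


def shard(indices, num_replicas, rank):
    """Strided split, like DistributedSampler(shuffle=False) when len is divisible."""
    return indices[rank::num_replicas]


def to_global(box, offset):
    """Shift a window-local (xmin, ymin, xmax, ymax) box by the window's (x, y) origin."""
    x0, y0 = offset
    xmin, ymin, xmax, ymax = box
    return (xmin + x0, ymin + y0, xmax + x0, ymax + y0)


xs = window_offsets(400, patch_size=300, overlap=0.25)  # [0, 100]
windows = [(x, y) for x in xs for y in xs]  # assumed x-major order

# One window-local box per window, chosen to match a row in the tables above
local_box = (292.0, 35.0, 299.0, 55.0)
local_preds = {i: local_box for i in range(len(windows))}

# Correct: each window's predictions are shifted by that window's own origin
correct = [to_global(local_preds[i], windows[i]) for i in range(len(windows))]

# Hypothetical bug: rank 1 of 2 predicts only its shard, but its results are
# zipped against the first len(shard) entries of the full window list
rank_indices = shard(list(range(len(windows))), num_replicas=2, rank=1)  # [1, 3]
buggy = [to_global(local_preds[i], windows[j]) for j, i in enumerate(rank_indices)]

print(correct[3])  # (392.0, 135.0, 399.0, 155.0)
print(buggy)       # [(292.0, 35.0, 299.0, 55.0), (292.0, 135.0, 299.0, 155.0)]
```

Only two of the four windows survive, and the box from window (100, 100) is placed using the (0, 100) origin, so its x coordinates come out 100 pixels too small, matching the pattern in the last rows of the tables.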

This issue was discovered by @henrykironde; I'm just helping by putting together the reprex.


ethanwhite · 2024-04-05T11:29:23Z

This has been addressed by forcing a single GPU for prediction (#653), but if anyone is familiar enough with multi-GPU PyTorch Lightning inference to help make it work, we'd appreciate the help.
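For anyone who needs single-GPU behaviour in an existing script, one general user-side workaround is to expose only one CUDA device to the process. This is a standard CUDA environment variable, not the change made in #653, and `restrict_to_one_gpu` is a hypothetical helper name. The variable has to be set before CUDA is initialized, so the safest place is before importing torch or deepforest. When building a PyTorch Lightning `Trainer` directly, passing `devices=1` restricts it to a single device in a similar way.

```python
import os


def restrict_to_one_gpu(index: int = 0) -> None:
    """Expose a single CUDA device to this process (hypothetical helper).

    CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so call this before
    torch touches the GPU; doing it before importing torch or deepforest is safest.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


restrict_to_one_gpu(0)

# Import DeepForest only after restricting devices, so Lightning sees one GPU:
# from deepforest import main
```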

ethanwhite assigned bw4sz and henrykironde Mar 31, 2024

bw4sz mentioned this issue Apr 1, 2024

[WIP] Multi gpu tests and predict_tile error #646 #649 (closed)

ethanwhite added the `help wanted` (Extra attention is needed) label Apr 5, 2024
