Code for our paper "Enhancing Visual Representation for Text-based Person Searching"
PyTorch 2.0.0
torchvision 0.15.0
easydict
tqdm
prettytable
The CUHK-PEDES dataset was proposed in the paper "Person Search with Natural Language Description" (Shuang Li, et al.) and can be downloaded from here. The ICFG-PEDES dataset can be found here, and the RSTPReid dataset is here.
|-- your dataset root dir/
|   |-- <CUHK-PEDES>/
|       |-- imgs
|            |-- cam_a
|            |-- cam_b
|            |-- ...
|       |-- reid_raw.json
|
|   |-- <ICFG-PEDES>/
|       |-- imgs
|            |-- test
|            |-- train
|       |-- ICFG_PEDES.json
|
|   |-- <RSTPReid>/
|       |-- imgs
|       |-- data_captions.json
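
Before training, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch, not part of this repository: the function name `find_missing` and the concrete folder names `CUHK-PEDES`, `ICFG-PEDES`, and `RSTPReid` are assumptions (the tree shows them only as placeholders in angle brackets), so adjust them to your local names.

```python
from pathlib import Path

# Expected entries per dataset, copied from the directory tree above.
# The top-level folder names are assumptions; rename them to match your setup.
EXPECTED_LAYOUT = {
    "CUHK-PEDES": ["imgs/cam_a", "imgs/cam_b", "reid_raw.json"],
    "ICFG-PEDES": ["imgs/test", "imgs/train", "ICFG_PEDES.json"],
    "RSTPReid": ["imgs", "data_captions.json"],
}


def find_missing(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for dataset, entries in EXPECTED_LAYOUT.items():
        for entry in entries:
            if not (root / dataset / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing
```

Calling `find_missing("/path/to/your/dataset/root")` returns an empty list when every expected file and folder is present, and otherwise lists what is missing.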
Run the "run.sh" script directly.
python test.py
We use a single NVIDIA RTX 3090 GPU (24 GB) for training and testing.

Results on CUHK-PEDES:

Method | Rank-1 | Rank-5 | Rank-10 | mAP |
---|---|---|---|---|
AXM-Net | 64.44 | 80.52 | 86.77 | 58.73 |
LGUR | 64.21 | 81.94 | 87.93 | - |
IVT | 65.59 | 83.11 | 89.21 | - |
CFine | 69.57 | 85.93 | 91.15 | - |
TP-TPS | 70.16 | 86.10 | 90.98 | 66.32 |
VGSG | 71.38 | 86.75 | 91.86 | - |
VFE-TPS (ours) | 72.47 | 88.24 | 93.24 | 64.26 |

Results on ICFG-PEDES:

Method | Rank-1 | Rank-5 | Rank-10 | mAP |
---|---|---|---|---|
LGUR | 59.02 | 75.32 | 81.56 | - |
MANet | 59.44 | 76.80 | 82.75 | - |
CFine | 60.83 | 76.55 | 82.42 | - |
TP-TPS | 60.64 | 75.97 | 81.76 | 42.78 |
VFE-TPS (ours) | 62.71 | 78.73 | 84.51 | 43.08 |

Results on RSTPReid:

Method | Rank-1 | Rank-5 | Rank-10 | mAP |
---|---|---|---|---|
LBUL | 45.55 | 68.20 | 77.85 | - |
IVT | 46.70 | 70.00 | 78.80 | - |
CFine | 50.55 | 72.50 | 81.60 | - |
TP-TPS | 50.65 | 72.45 | 81.20 | 43.11 |
VFE-TPS (ours) | 59.25 | 81.90 | 88.85 | 45.96 |
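
For readers unfamiliar with the columns in the tables above, the sketch below shows the standard definitions of Rank-k and mAP for text-to-image retrieval. It is an illustration only and not the evaluation code of this repository; the function name `rank_k_and_map` and its arguments are hypothetical, and it assumes a precomputed text-to-image similarity matrix with one person ID per query and per gallery image.

```python
def rank_k_and_map(similarity, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute Rank-k and mAP (both in percent).

    similarity[i][j] is the score between text query i and gallery image j.
    A gallery image is a correct match if it shares the query's person ID.
    """
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    for scores, qid in zip(similarity, query_ids):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        matches = [gallery_ids[j] == qid for j in order]

        # Rank-k: a hit if any of the top-k images is a correct match.
        for k in ks:
            if any(matches[:k]):
                hits[k] += 1

        # Average precision: mean of precision values at each correct match.
        # Queries with no correct match in the gallery contribute 0.
        num_correct = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                num_correct += 1
                precision_sum += num_correct / rank
        if num_correct:
            ap_sum += precision_sum / num_correct

    n = len(query_ids)
    rank_k = {k: 100.0 * hits[k] / n for k in ks}
    return rank_k, 100.0 * ap_sum / n
```

Under these definitions Rank-k only asks whether at least one correct image appears in the top k, while mAP rewards ranking all correct images highly, which is why a method can lead on Rank-1 without leading on mAP.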
Our code is partially based on CLIP and IRRA. We sincerely appreciate their contributions.