The only dependencies needed to run this notebook (reasonably recent versions):
- torch
- numpy
- omegaconf
- wandb
- python-dotenv
- timm
- ipykernel

The following setup works for us:
```
conda create -n dinov2eval python=3.10 --yes
conda activate dinov2eval
pip install torch numpy omegaconf wandb python-dotenv timm ipykernel
```

Only PyTorch, NumPy, and OmegaConf are actually used; the other packages still need to be installed because they are imported along the way.


Additionally, importing from `dinov2` needs to work. Options:
- add the repository root to the Python path:
    - in a Jupyter notebook: `os.chdir('/path/to/PanOpticOn')`
    - on the command line: `PYTHONPATH=/path/to/PanOpticOn/ python ...`
- install as a package: `pip install -e .` (note that this installs additional dependencies)
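
If changing the working directory is not desirable, the repository root can also be put on `sys.path` directly. This is a minimal sketch, assuming the repository was cloned to the placeholder path `/path/to/PanOpticOn`; the helper name `add_repo_to_path` is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path (only once) and return the resolved path."""
    root = str(Path(repo_root).expanduser().resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# placeholder path: replace with the location of your clone
repo_root = add_repo_to_path('/path/to/PanOpticOn')
```

After this call, `import dinov2` should resolve against the clone, provided the path is correct.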

In [1]:
import torch
import os
os.chdir('/home/lewaldm/code/PanOpticOn')

In [None]:
# load model
from dinov2.eval.setup import setup_logger, parse_model_obj

logger = setup_logger('dinov2', to_sysout=True, simple_prefix=True)
model = parse_model_obj(
    model_obj='path/to/downloaded/folder',
    return_with_wrapper=False)


23:41:04 __init__.py:59] Using student weights for teacher
23:41:04 vision_transformer.py:114] using MLP layer as FFN
23:41:04 vision_transformer.py:130] Embedding layer: ChnAttnPatchEmb
23:41:04 panopticon.py:25] ChnAttnPatchEmb: id_attn_block: ChnAttnBlockSimple
23:41:04 panopticon.py:101] ChnAttnBlockSimple: norm_input: False, skip_conn: True, norm_output: False, use_layer_scale: False
23:41:05 setup.py:179] Built model dinov2
23:41:05 utils.py:24] Processing pretrained weights from /data/panopticon/logs/dino_logs/ds4/+rgbhead+ibot/lr=1e-4_warmup=0_lrmul=backbone=0.2_wibot=0.1/model_final.pth


23:41:06 utils.py:33] Take key 'model' in provided checkpoint dict
23:41:06 utils.py:53] Applied prefix load map: {'teacher.backbone.': ''}
23:41:06 utils.py:55] From "/data/panopticon/logs/dino_logs/ds4/+rgbhead+ibot/lr=1e-4_warmup=0_lrmul=backbone=0.2_wibot=0.1/model_final.pth" selected keys are: ['cls_token', 'pos_embed', 'mask_token', 'patch_embed.patch_emb.proj.weight', 'patch_embed.patch_emb.proj.bias', 'patch_embed.chnattnblock.query', 'patch_embed.chnattnblock.norm1.weight', 'patch_embed.chnattnblock.norm1.bias', 'patch_embed.chnattnblock.attn.inproj_q.weight', 'patch_embed.chnattnblock.attn.inproj_q.bias', 'patch_embed.chnattnblock.attn.inproj_kv.weight', 'patch_embed.chnattnblock.attn.inproj_kv.bias', 'patch_embed.chnattnblock.attn.proj.weight', 'patch_embed.chnattnblock.attn.proj.bias', 'patch_embed.chnattnblock.ls1.gamma', 'patch_embed.chnattnblock.norm2.weight', 'patch_embed.chnattnblock.norm2.bias', 'patch_embed.chnattnblock.mlp.fc1.weight', 'patch_embed.chnattnblock.mlp.

The model expects its input as a dictionary with the following keys:
- imgs: tensor with the image data of shape (B, C, H, W)
- chn_ids: tensor of shape (B, C) containing an identifier for each channel. For optical satellites, this is the wavelength in nanometers (e.g. 664 for red). For SAR satellites, the negative indices {-1, ..., -12} are used as listed below.

SAR map:
| Index | Polarization   | Direction  |
|-------|----------------|------------|
| -1    | VV             | both       |
| -2    | VH             | both       |
| -3    | HH             | both       |
| -4    | HV             | both       |
| -5    | VV             | ascending  |
| -6    | VH             | ascending  |
| -7    | HH             | ascending  |
| -8    | HV             | ascending  |
| -9    | VV             | descending |
| -10   | VH             | descending |
| -11   | HH             | descending |
| -12   | HV             | descending |
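
The table follows a regular pattern: four polarizations repeated for each of the three directions. The lookup can therefore be sketched as a small helper. The names `sar_chn_id`, `SAR_POLARIZATIONS`, and `SAR_DIRECTIONS` are hypothetical and only illustrate the table; they are not part of the repository.

```python
# Order of entries mirrors the SAR map table above.
SAR_POLARIZATIONS = ('VV', 'VH', 'HH', 'HV')
SAR_DIRECTIONS = ('both', 'ascending', 'descending')


def sar_chn_id(polarization, direction='both'):
    """Return the negative channel id for a SAR polarization and direction."""
    p = SAR_POLARIZATIONS.index(polarization)  # raises ValueError if unknown
    d = SAR_DIRECTIONS.index(direction)        # raises ValueError if unknown
    return -(4 * d + p + 1)


# optical RGB wavelengths followed by VV-both and VH-descending
chn_ids_row = [664, 559, 492, sar_chn_id('VV'), sar_chn_id('VH', 'descending')]
print(chn_ids_row)  # [664, 559, 492, -1, -10]
```

A list like `chn_ids_row` can be wrapped in `torch.tensor([...])` and repeated per batch element to build the `chn_ids` entry.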


In [4]:
# create artificial input (optical RGB channels concatenated with VV-both and VH-desc)
x_dict = {
    'imgs': torch.randn(2, 5, 224, 224),  # B, C, H, W
    'chn_ids': torch.tensor([[664, 559, 492, -1, -10], 
                             [664, 559, 492, -1, -10]]) # B, C
}

model.eval()
model.cuda()

x_dict['imgs'] = x_dict['imgs'].cuda()
x_dict['chn_ids'] = x_dict['chn_ids'].cuda()

In [5]:
""" Get the output of any intermediate layers of the model. """
idx_blocks_to_return = [4, 6, 10]

outputs = model.get_intermediate_layers(x_dict, n=idx_blocks_to_return)

# outputs is a tuple of length len(idx_blocks_to_return). Each element is the output
# of the layer with the specified index in idx_blocks_to_return. All elements have
# the same shape.

assert isinstance(outputs, tuple)
assert len(outputs) == len(idx_blocks_to_return)
print('length of outputs: ', len(outputs))
print('patches shape:     ', tuple(outputs[0].shape))

# warning: negative block indices (e.g. n=[-1]) are not supported!

length of outputs:  3
patches shape:      (2, 256, 768)
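
The 256 patch tokens reported above are consistent with a 224 x 224 input split into 14 x 14 pixel patches (16 x 16 = 256). The patch size of 14 is an assumption inferred from this output, not read from the model config, and `patch_grid` is a hypothetical helper used only for illustration.

```python
def patch_grid(height, width, patch_size=14):
    """Return (rows, cols) of the patch grid for an image of the given size.

    patch_size=14 is an assumption; check the model config for the real value.
    """
    if height % patch_size or width % patch_size:
        raise ValueError('image size must be a multiple of the patch size')
    return height // patch_size, width // patch_size


rows, cols = patch_grid(224, 224)
print(rows, cols, rows * cols)  # 16 16 256
```

Under the same assumption, and assuming the tokens are stored in row-major order, patch tokens of shape (B, 256, 768) could be reshaped to (B, 16, 16, 768) for dense tasks.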


In [6]:
# classification
idx_blocks_to_return = [4, 6, 10]

outputs = model.get_intermediate_layers(x_dict, n=idx_blocks_to_return, return_class_token=True)

# outputs is a tuple of length len(idx_blocks_to_return). Each element is a tuple of two tensors:
# the first tensor is the patch output of the corresponding block, with shape (B, number of patches, embedding dimension)
# the second tensor is the class token of that block, with shape (B, embedding dimension)

assert isinstance(outputs, tuple)
assert len(outputs) == len(idx_blocks_to_return)
print('length of outputs: ', len(outputs))
print('patches shape:     ', tuple(outputs[0][0].shape))
print('class token shape: ', tuple(outputs[0][1].shape))

# warning: negative block indices (e.g. n=[-1]) are not supported!

length of outputs:  3
patches shape:      (2, 256, 768)
class token shape:  (2, 768)
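
One common way to use these outputs for classification is to concatenate the class tokens of the selected blocks and feed them to a linear head. The sketch below uses random stand-in tensors with the shapes printed above, so it runs without the model. The linear head and `num_classes = 10` are hypothetical choices, not something the repository prescribes.

```python
import torch

# Stand-in for the result of get_intermediate_layers(..., return_class_token=True):
# three (patch tokens, class token) pairs with the shapes shown above.
outputs = tuple((torch.randn(2, 256, 768), torch.randn(2, 768)) for _ in range(3))

# concatenate the class tokens of all returned blocks along the feature dimension
cls_feats = torch.cat([cls_token for _, cls_token in outputs], dim=-1)  # (B, 3 * 768)

num_classes = 10  # hypothetical number of target classes
head = torch.nn.Linear(cls_feats.shape[-1], num_classes)
logits = head(cls_feats)  # (B, num_classes)
```

With the real model, `outputs` would simply be the tuple returned in the cell above.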
