
How to generate scale.txt and tuples_dso_optimization_windows.txt files for custom data? #58

yahskapar opened this issue Sep 8, 2023 · 7 comments


@yahskapar

Hi folks,

I've been able to get some interesting results using this pipeline which I'm grateful that the authors were able to make publicly available. I have two questions that I've been scratching my head about, however:

  1. How does one generate a tuples_dso_optimization_windows.txt file per sequence for a given custom dataset, similar to the one included in the provided TANDEM-format Replica dataset? This file, which appears to be related to MVS configuration, seems to be important: even with the ScanNet pre-trained model evaluated on the Replica dataset, there's a significant drop in performance when generating tuples without it (absolute relative error goes from 0.0384 to 0.0706).

  2. How were the scale.txt files generated per set of depth images for a sequence? These scale.txt files seem to contain only a single floating point value, which is a bit confusing since I'd expect the depth scale to have some slight variation at least from sequence to sequence. Currently, I've been computing the depth scale using the camera pose information and ultimately taking the mean to also get a single floating point value, but I'm not sure if this is a reasonable approach given how bad my results ended up being after training with my custom dataset and testing on an eval split of it.

I'd appreciate any and all help regarding these two files if anyone has any pointers. Thanks!

@louie-luo

Excuse me, have you resolved these problems yet? I've encountered the same problems while trying to train on other datasets, and these two issues really confuse me a lot.

@yahskapar
Author

Hi @louie-luo,

I haven't fully resolved these problems yet, but I believe I've made some progress, and maybe sharing it here will 1) help others with their own efforts or 2) help others point out mistakes in mine. I should also note that a lot of what follows required small, scattered changes to the repo's code that I may not explicitly include below, simply because I've forgotten some of them at this point (this past week has been a blur). Regardless, if you run into something like that and post the error message, I should be able to help.

The tuples_dso_optimization_windows.txt and similarly named files

This one is a bit tricky and I definitely haven't fully resolved it yet. It should be noted that providing this file effectively takes the place of enabling and using the lines below in the .yaml training configs (and elsewhere):

# If TUPLES_DEFAULT_FLAG==True, the tuples will be made up of frames with this index distance.
TUPLES_DEFAULT_FLAG: False
TUPLES_DEFAULT_FRAME_DIST: 20
TUPLES_DEFAULT_FRAME_NUM: 3

If you don't have a tuples_dso_optimization_windows.txt file generated, you can instead just set TUPLES_DEFAULT_FLAG to True and then play with the TUPLES_DEFAULT_FRAME_DIST and TUPLES_DEFAULT_FRAME_NUM parameters depending on the number of frames in your data sequences. Now, let's say you don't want to do that and instead want to generate the tuples_dso_optimization_windows.txt file, which as far as I know can be done in two ways:

  • Generate the keyframes to be included in the tuples file using the poses_dso.txt file. You will additionally need a groundtruth_tum.txt file (basically your poses_gt.txt file in the same format as the TUM RGB-D evaluation tools expect) and the result.txt file you'd get from running TANDEM with the dataset preset, in order to generate that sequence-specific scale (which is different from the global depth scale in the scale.txt file).

On your sequences, run something like the following from the tracking_euroc.bash file already in the repo:

https://github.com/tum-vision/tandem/blob/f8816c7d9a92b29e84e3d9055c2d3e28056e4a37/tandem/scripts/tracking_euroc.bash#L20-L27

This will generate a poses_dso.txt file and a result.txt file (the latter will not be empty, assuming tracking was at least somewhat successful). Convert the poses_dso.txt file to the TUM RGB-D evaluation tools format (the expected format is timestamp tx ty tz qx qy qz qw, but I think you can also just have the frame index in place of the timestamp). Feel free to use this GitHub gist I made toward that effort: https://gist.github.com/yahskapar/86da0d0c7526dd4ba4c9e25c031ab382
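For reference, here's a minimal sketch of that conversion for a single 4x4 camera-to-world pose matrix, assuming you can parse your poses into NumPy arrays (the pose_to_tum_line helper name is mine; the gist above handles full files):

from scipy.spatial.transform import Rotation

def pose_to_tum_line(index, T):
    # T is a 4x4 camera-to-world pose as a NumPy array; a TUM line is
    # "timestamp tx ty tz qx qy qz qw", here with the frame index as timestamp.
    t = T[:3, 3]
    qx, qy, qz, qw = Rotation.from_matrix(T[:3, :3]).as_quat()
    return f"{index} {t[0]} {t[1]} {t[2]} {qx} {qy} {qz} {qw}"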

Your generated groundtruth_tum.txt file and the result.txt file from running TANDEM on your data can now be used with this file already in this repo to generate the sequence-specific scale you'd need (again, not to be confused with the scale.txt file and the global depth scale value in there). Alternatively, you can just use this other (secret, viewable by link only) gist I made to generate the final tuples files: https://gist.github.com/yahskapar/24a59b0a1099e053db9c79fad9e677cc. Please carefully read the code before using it out-of-the-box; it's very possible you'll have to modify some of it for your needs. Also, I'm not guaranteeing (yet) that it's even the correct way to do this!

  • Generate the keyframes to be included in the tuples file using the result.txt file. You will additionally need a groundtruth_tum.txt file (basically your poses_gt.txt file in the same format as the TUM RGB-D evaluation tools expect) and the same result.txt file you'd get from running TANDEM with the dataset preset, in order to generate that sequence-specific scale (which is different from the global depth scale in the scale.txt file).

Follow the same steps as before, except note the last few lines of the gist I pointed out above. You'll want generate_tuples(poses_result_path, None, scale=aligned_scale) to just use the result.txt file, but this won't work well if TANDEM ends up being unable to generate keyframes for your data (which is very possible if your data is quite different from what the authors showed the pipeline working on, for example medical data). If this doesn't work, I'd recommend just using the poses_dso.txt file, which will use all of the frames as keyframes (if the number of frames in a given sequence is divisible by 8; otherwise it will exclude up to 7 trailing frames). A rough sketch of that fallback windowing is below.
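As a rough illustration of that fallback, here's a minimal sketch assuming frames are simply grouped into fixed windows of 8 keyframes (the all_frame_windows name is mine; the linked gist is what actually handles result.txt parsing and scales):

def all_frame_windows(num_frames, window=8):
    # Use every frame as a keyframe, grouped into fixed windows; up to
    # window - 1 trailing frames are dropped so the count divides evenly.
    usable = num_frames - (num_frames % window)
    return [list(range(i, i + window)) for i in range(0, usable, window)]

print(all_frame_windows(20))  # [[0, ..., 7], [8, ..., 15]]; frames 16-19 are dropped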

The scale.txt file

This is a lot simpler, and I'm reasonably confident I've done it the correct way at this point, at least if you know the range of depth. Let's say you know the range of depth in your data (usually available somewhere in the corresponding public dataset's documentation) and it's 0 to 100mm (using the dataset I used, C3VD, as an example). In an 8-bit grayscale depth map, the maximum value is 255 and is loaded as such into TANDEM, so that maximum value should correspond to 100mm (or 0.1 meters). 0.1/255 yields the global depth scale value that should go into scale.txt in this case. In your .yaml config, DEPTH_MIN is given by 2 * that global depth scale value, while DEPTH_MAX is given by 255 (the maximum image intensity) multiplied by the global depth scale value.
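As a worked example of that arithmetic, using the C3VD numbers above:

# 8-bit depth map whose maximum intensity (255) corresponds to 100mm (0.1 m):
max_intensity = 255
depth_range_m = 0.1
scale = depth_range_m / max_intensity  # 0.00039215686... -> this goes into scale.txt
DEPTH_MIN = 2 * scale                  # ~0.00078
DEPTH_MAX = max_intensity * scale      # 0.1, i.e. the full depth range in meters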

Feel free to use the code below to write this value for the first time, or to change the scale value in existing scale.txt files, for a dataset whose sequences each contain a depths folder:

import os

# Write the global depth scale into <sequence>/depths/scale.txt,
# creating the file if it doesn't exist yet.
def modify_scale_file(folder_path):
    scale_file_path = os.path.join(folder_path, "depths", "scale.txt")
    new_scale_value = "0.00039215686"  # 0.1 m / 255 for the 8-bit, 0-100mm example above

    try:
        existed = os.path.exists(scale_file_path)
        with open(scale_file_path, "w") as scale_file:
            scale_file.write(new_scale_value)
        print(f"{'Modified' if existed else 'Created'} scale.txt in {folder_path}")
    except Exception as e:
        print(f"Error modifying scale.txt in {folder_path}: {e}")

# Specify the parent folder containing the sequences
parent_folder = "/playpen-nas-ssd/akshay/3D_medical_vision/datasets/C3VD_registered_videos_undistorted_V2"

# Iterate through all sequence folders within the parent folder
for sequence_folder in os.listdir(parent_folder):
    sequence_folder_path = os.path.join(parent_folder, sequence_folder)
    if os.path.isdir(sequence_folder_path):
        modify_scale_file(sequence_folder_path)

If you don't know your real-world range of depth (even if it's clamped, as it is in some medical datasets), I'm not sure of the best way to calculate the global depth scale, or how the authors even calculated it. You could try searching around to see if there's some method that gives you a real-world range of depth on self-captured data. In the past, I calculated the global depth scale using ground truth camera poses, which got me somewhat close to the range-of-depth method mentioned above, but it ended up being too inaccurate.

@yahskapar
Author

Also, please take everything in my previous reply with a grain of salt, especially since I haven't quite gotten an optimal baseline (as far as I know) with the data I'm working with (colonoscopy data). I will try to do a better job of documenting my final approach and distilling some of the above information once I achieve that optimal baseline.

@louie-luo

Thanks a lot for your detailed comments. There are a few questions I'd like to discuss with you about training on my own datasets. You say the scale.txt file is related to the range of depth. On my datasets, the range of depth is 0 to 80m, so my global depth scale value should be 80/255 = 0.3137, and, as you say, DEPTH_MIN and DEPTH_MAX should be 0.63 and 80 respectively. However, while trying to train on my datasets, the following error occurs:
File "/home/thu440-3080ti/LL/tandem/cva_mvsnet/models/tandem.py", line 43, in forward depth_max=batch['depth_max'], File "/home/thu440-3080ti/anaconda3/envs/mvsnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl return forward_call(*input, **kwargs) File "/home/thu440-3080ti/LL/tandem/cva_mvsnet/models/cva_mvsnet.py", line 162, in forward volume_gate=self.volume_gates[stage] if self.view_aggregation else None, File "/home/thu440-3080ti/LL/tandem/cva_mvsnet/models/module.py", line 1144, in depth_prediction assert_isfinite(depth_pred, "Depth should be finite") File "/home/thu440-3080ti/LL/tandem/cva_mvsnet/models/module.py", line 27, in assert_isfinite assert not (nans.any() or infs.any()), msg AssertionError: Depth should be finite; #Nan = 20480, #Inf = 0.
Then I just changed DEPTH_MAX from 80 to 1000, and the error was completely fixed! Training began successfully (although the loss is really high: 42 at epoch 20, 24 at epoch 50, and 18 at epoch 100).
This seems strange to me. I then looked at the dataset the TANDEM authors provide, the "tandem_replica" dataset. The value in its scale.txt is 0.00015259; however, in default.yaml, DEPTH_MIN and DEPTH_MAX are 0.01 and 10 respectively, which do not correspond to 2 * and 255 * the global depth scale value.
Why does this happen? I'm a little confused and hope to get comments from you. Also, thanks again for your help.

@yahskapar
Author

Hi @louie-luo,

Happy to try to help. A few things:

  1. Did you check the range of values of your depth map from within the TANDEM code's own depth-reading function, i.e., while using the -1 option to read the depth image as-is, like below?

depth = cv2.imread(fname, -1)

It's possible that the range of your depth map isn't actually 0 to 255, and that the code you use to establish the global depth scale value is reading the depth incorrectly. I would double-check this with the depth images in the tandem_replica dataset as well (a quick check is sketched below). It's possible my reasoning isn't sound and there's something more to this that isn't documented, but I've been able to get reasonable (albeit less frame-to-frame consistent) depth map results thus far with a much smaller range of depth (0 to 100mm) and 16-bit depth images (maximum possible value is 65535).
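A minimal sketch of that check (the fname path is a placeholder for one of your depth images):

import cv2

fname = "path/to/one/of/your/depth/images.png"  # placeholder
# -1 (cv2.IMREAD_UNCHANGED) loads the image as-is, without 8-bit conversion
depth = cv2.imread(fname, -1)
print(depth.dtype, depth.min(), depth.max())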

  2. I'm not sure if this is the only place where this is done in the CVA_MVSNet code, based on my recollection at least, but you may want to change some values that were unfortunately hard-coded if you have different input depth ranges. For example:

# TODO: Replace 10.0 by max depth
view_aggregation_features = torch.cat(
    [cos_baseline_angle, calc_depth_ref / 10.0, calc_depth_src / 10.0],
    dim=1)  # (B, 1, D, H, W)

Double-checking for any important hard-coded values gets even more complicated with the full TANDEM pipeline (which involves the C++ implementation), and it's something I haven't done exhaustively, for that matter.
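For what it's worth, one hedged way to address that particular TODO would be to normalize by the actual max depth rather than the literal 10.0; depth_max here is an assumed variable you'd have to thread through from the batch, not something the repo already provides at that spot:

# Sketch only: divide by the real max depth rather than the hard-coded 10.0
view_aggregation_features = torch.cat(
    [cos_baseline_angle, calc_depth_ref / depth_max, calc_depth_src / depth_max],
    dim=1)  # (B, 1, D, H, W)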

Sorry that I can't be of much help aside from the above; I'll check back into this thread if I have any more ideas or happen to revisit aspects of the TANDEM pipeline. Personally, I was interested in consistent depth prediction with this pipeline and ended up finding some monocular methods (U-Net, DPT-Hybrid) that performed much more consistently on my data (which has fairly few frames compared to datasets like Replica, as well as significantly less camera motion).

@louie-luo

louie-luo commented Oct 13, 2023

Hi @yahskapar ,

Thanks a lot for your detailed and useful comments!

Firstly, I rechecked my depth map and found that it's indeed not 8-bit but 16-bit, which is to say the range goes from 0 to 65535. (The tandem_replica dataset is the same: 16-bit.)
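That also explains the tandem_replica numbers I was confused about earlier; with a 16-bit range they become self-consistent:

scale = 10 / 65535          # = 0.000152590..., matching the 0.00015259 in scale.txt
DEPTH_MAX = 65535 * scale   # = 10.0, matching DEPTH_MAX in default.yaml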

Secondly, the values that have to be changed are mainly in module.py. Besides the one you mention above, I found three more: lines 1184, 1205, and 1221, all in module.py.

If you find other hard-coded values that need changing, please tell me.

Now my training goes much better, and the loss has dropped to a normal value. Thanks again for your help; I couldn't have trained on my data successfully without your comments!

@yahskapar
Author

yahskapar commented Oct 13, 2023

No problem, happy to help! I'll let you know if I find anything else that needs to be changed. I'm not working with this code at the moment, but I previously had trouble with TSDF volume initialization in this project's code on my 0 to 100mm depth-range data, so I'll have to revisit that sometime soon.
