Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a Median Filter for Non-Finite Values #566

Merged
merged 33 commits into from Nov 11, 2021

Conversation

WilliamJudge94
Copy link
Contributor

@WilliamJudge94 WilliamJudge94 commented Nov 1, 2021

The following code has been added to Tomopy to allow for easy removal of non-finite values inside of a 3D array. The code iterates through each projection, finds non-finite values (nan/inf), and replaces them with the median value of the surrounding pixel values based upon the kernel size set by the user. The median value does not include non-finite value in its calculation.

Closes #564

@pep8speaks
Copy link

pep8speaks commented Nov 1, 2021

Hello @WilliamJudge94! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 331:51: E127 continuation line over-indented for visual indent

Comment last updated at 2021-11-11 23:12:09 UTC

@WilliamJudge94
Copy link
Contributor Author

I took a look at some of the failing tests and they are all due to "no module named tqdm" import errors. I would prefer to have a loading bar for these iterations. I have found a few examples of reconstructions with contain 4mil non-finite values. Taking upwards of 1-2 min to replace all values. If users are not given a loading bar they may think the program has stalled on them.

@prjemian
Copy link
Contributor

prjemian commented Nov 1, 2021

Add tqdm to the environments (or other project requirements). It's a good addition that would provide such a progress bar.

@prjemian
Copy link
Contributor

prjemian commented Nov 1, 2021

Also, on the developer side install and use flake8 before each commit/push to ensure your contributions are consistent with PEP8.

@carterbox
Copy link
Member

Since tomopy is a library (not an application), it is out-of-scope to include and UI elements. If you want to provide feedback to the user, you may use the logging module from the standard library. Using the logging module allows end users to customize the verbosity and format of logging messages without editing tomopy's source code.

@WilliamJudge94
Copy link
Contributor Author

That makes sense. Although would you allow me to 1) add in a "disable" variable for the tqdm UI 2) Along with the tomopy standard logging feature?

Copy link
Member

@carterbox carterbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening your first PR!

  • As mentioned above, it is inappropriate to use tqdm in a library.
  • Please add a test of median_filter_nonfinite() in test/test_tomopy/test_misc/test_corr.py. The test should be a simple example which checks that your function does what it claims to do.
  • Add your function to __all__ at the top of the file
  • Add your name to the authors at the top of the file

source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
WilliamJudge94 and others added 5 commits November 1, 2021 17:43
Adding _ to the find_nonfinite_values_prj function since it is not intended to be used by the User

Co-authored-by: Daniel Ching <carterbox@users.noreply.github.com>
Adding _ to the determine_nonfinite_kernel_idxs function since it is not intended to be used by the User

Co-authored-by: Daniel Ching <carterbox@users.noreply.github.com>
@carterbox
Copy link
Member

would you allow me to add in a "disable" variable for the tqdm UI?

No.

The Why

If we included tqdm in our implementation, then tqdm would become a required dependency or lazy import when the end user could easily wrap this function in tqdm with the following:

for i in tqdm(range(len(data))):
    data[(i,)] = filter_median_nonfinite(data[(i,)])

while maintaining full control over the many formatting options that tqdm provides.

Alternative

filter_median_nonfinite can accept a callback function which is called once per iteration. This is very similar to what you have implemented in 040be55, but instead of passing a tqdm instance, you would pass a lambda function.

with tqdm(total=len(data)) as pbar:

    def callback():
        pbar.update(1)

    data = filter_median_nonfinite(data, callback=callback)

This keeps the tomopy codebase simpler and allows end-users to choose another framework besides tqdm.

@WilliamJudge94
Copy link
Contributor Author

WilliamJudge94 commented Nov 1, 2021

  1. Since I have two main loops inside this median_filter function would you like me to have two different callbacks? One for each loop? Or would the solution detailed in 2) be sufficient?

  2. Since I would need a way to pass an iterator_length and iterator_name
    into their callback would it be ok if I changed your callback function into a callback class? Something similar to this:

class CallbackClass():
    
    def callback(self):
        # Allow user to define this but have access to the iterator length and iterator_name
        
    def iterator_len(self):
        # pass in length
        
    def iterator_desc(self):
        # pass in name of loop

This allows me to define callback_loop1 = CallbackClass(), callback_loop2 = CallbackClass() for each loop defined by the function.

@WilliamJudge94
Copy link
Contributor Author

Assuming these commits pass testing, the following is an example of how to properly run the tqdm progress bar with the median_filter_nonfinite() function. I can add this function implementation to the tomopy docs assuming you approve the current implementation.

from tomopy.misc.corr import median_filter_nonfinite
from time import sleep
import numpy as np

data = np.ones(shape=(1000, 1000, 1000))

for i in range(5000):
    x = np.random.randint(0, 100)
    y = np.random.randint(0, 100)
    z = np.random.randint(0, 100)
    data[z, x, y] = np.inf

with tqdm() as pbar:

    def callback(self):
        pbar.total = self.iterator_len
        pbar.set_description(self.iterator_desc)
        pbar.update(1)

        # Callback function is too fast leading to display errors in jupyter
        sleep(0.003)

    data = median_filter_nonfinite(data, callback=callback)

@carterbox
Copy link
Member

carterbox commented Nov 2, 2021

Since I have two main loops inside this median_filter function would you like me to have two different callbacks? One for each loop? Or would the solution detailed in 2) be sufficient?

Why do we need two progress bars? Process one projection at a time so there is only one progress bar needed?

for p in projections:
    nonfinite_idx_store = _find_nonfinite_values_prj(p)
    for indices in zip(nonfinite_idx_store.transpose()):
        # apply filter
    pbar.update(1)

@carterbox
Copy link
Member

carterbox commented Nov 2, 2021

Since I would need a way to pass an iterator_length and iterator_name
into their callback would it be ok if I changed your callback function into a callback class? Something similar to this:

You could define the callback as:

def callback(total, description):
        pbar.total = total
        pbar.set_description(description)
        pbar.update(1)

Then you don't need a custom class?

source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
@WilliamJudge94
Copy link
Contributor Author

I have allowed the median_filter_nonfinite() function to run on a single progress bar instead of two. In addition, I have allowed users to define a callback function that does not depend on a CallbackClass. Below is the proper usage of the callback variable.

with tqdm() as pbar:

    def callback(total, description, unit):
        pbar.total = total
        pbar.set_description(description)
        pbar.unit = unit
        pbar.update(1)

    data = median_filter_nonfinite(data, callback=callback)

test/test_tomopy/test_misc/test_corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
source/tomopy/misc/corr.py Outdated Show resolved Hide resolved
test/test_tomopy/test_misc/test_corr.py Outdated Show resolved Hide resolved
WilliamJudge94 and others added 9 commits November 4, 2021 11:53
Modifies tests so that they force the correct ValueError. Ensure that
ValueError is always raised when there is a group of nonfinites larger
than or equal to the filter size. Without making a copy of the current
slice, filters would use already filtered values in the filter.
Copy link
Member

@carterbox carterbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work @WilliamJudge94!

@carterbox carterbox merged commit 75ad0f9 into tomopy:master Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Median Removal Of Nan/Inf Values
4 participants