Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: changes to label_components function #226

Merged
merged 1 commit into from
Nov 26, 2014

Conversation

meggart
Copy link
Contributor

@meggart meggart commented Nov 25, 2014

There are basically 2 changes in this PR.

  1. The function signature of label_components is extended to work with Albl::AbstractArray{Int}.
  2. In addition the Albl is initialized (zeroed) by the caller and not within the function anymore.

In my typical use cases most of the data is background, so a lot of memory is wasted by allocating the whole output array. I suggest these changes to be able to use other array-like containers like a sparse matrix, a memory-mapped array etc, to write the labels to. A small example would be in this gist https://gist.github.com/meggart/20503e86e18ad21c9bbd

@timholy
Copy link
Member

timholy commented Nov 25, 2014

I certainly support the spirit of this change, but there's an important detail. I was deliberately being a bit lazy here: I confined it to Arrays in part because I made use of linear indexing, and for anything that doesn't have efficient linear indexing (e.g., SubArrays) the performance will stink. So I guess the question is, in conjunction with this change do we need to switch to cartesian indexing?

@meggart
Copy link
Contributor Author

meggart commented Nov 26, 2014

I see, I didn't think of Subarrays. I was also mostly using and looking at the second version of the function for general connectivities which is already using Cartesian indexing in the main loop.

How large is actually the performance penalty when indexing linearly into a SubArray? Should I do some benchmarks? I could also try to change the function to use linear indexing.

What I actually wanted to achieve, is some behavior like Matlabs bwconncomp, which returns a cell array of vectors with indices for each component. I think this change would get me there, but I would also be glad to hear other ideas how to do this more efficiently.

@timholy
Copy link
Member

timholy commented Nov 26, 2014

How large is actually the performance penalty when indexing linearly into a SubArray?

Pretty big in 2d, worse in 3d. Example:

julia> A = rand(1000,1000);

julia> B = sub(A, 2:999, 2:999);

julia> function mysum(A)
           s = 0.0
           for i = 1:length(A)
               s += A[i]
           end
           s
       end
mysum (generic function with 1 method)

julia> mysum(A)
499876.9882984452

julia> mysum(B)
497909.8790759961

julia> @time mysum(A)
elapsed time: 0.004373402 seconds (13896 bytes allocated)
499876.9882984452

julia> @time mysum(B)
elapsed time: 0.028761602 seconds (96 bytes allocated)
497909.8790759961

julia> function mysum2(A::AbstractMatrix)
           s = 0.0
           for j = 1:size(A,2), i = 1:size(A,1)
               s += A[i,j]
           end
           s
       end
mysum2 (generic function with 1 method)

julia> mysum2(A)
499876.9882984452

julia> mysum2(B)
497909.8790759961

julia> @time mysum2(A)
elapsed time: 0.003726706 seconds (96 bytes allocated)
499876.9882984452

julia> @time mysum2(B)
elapsed time: 0.008171102 seconds (96 bytes allocated)
497909.8790759961

The penalty increases by another factor of 2 for 3d.

But now I understand your intention much better---for what you want to achieve, this works just fine, so let's not worry about the SubArray case until someone needs it.

timholy added a commit that referenced this pull request Nov 26, 2014
Suggestion: changes to label_components function
@timholy timholy merged commit bc6dd31 into JuliaImages:master Nov 26, 2014
@timholy
Copy link
Member

timholy commented Nov 26, 2014

If you want something more like bwconncomp, you could alternatively create an array type that reverses the typical meaning of setindex!:

type ComponentList <: AbstractVector{Vector{Int}}
    components::Vector{Int}
end

function setindex!(A::ComponentList, v::Int, i::Int)
    if v > length(A.components)
        sizehint(A.components, v)
        for i = length(A.components)+1:v
            push!(A.components, Int[])
        end
    end
    push!(A.components[v], i)
    A
end

@timholy
Copy link
Member

timholy commented Nov 26, 2014

Maybe we should add this to Images?

@meggart
Copy link
Contributor Author

meggart commented Nov 26, 2014

This would be great to have in Images. I just can't see yet how to write an efficient getindex function for this. Would one have run findfirst on all components?
This is why I thought of a Dict-based container...

@timholy
Copy link
Member

timholy commented Nov 26, 2014

You're saying you want something better than bwconncomp, then? 😄 (The representation returned by bwconncomp doesn't allow for an efficient getindex, either.)

There seem to be three options:

  • use a Dict as in your design. This makes getindex fairly efficient, but is not very efficient at finding all voxels with the same label.
  • use a container per label. This has the opposite performance characteristics. We could use a Set{Int} instead of a Vector{Int} for each component, so that searches are more efficient.
  • store both representations, so that both operations are efficient, at the cost of slower construction and larger storage.

@meggart
Copy link
Contributor Author

meggart commented Nov 26, 2014

I think I am going for option 3, using a Dict-based design during the construction and at the end build the container per label struct. Using option 2 significantly throws down performance when the array is accessed in line 43

newlabel = Albl[k-offsets_d]

while the performance penalty for the Dict is still acceptable compared to the memory gain which here is more important to me. I will make another PR once I have tested this and wrapped it into a function. Thank you very much for your help.

@timholy
Copy link
Member

timholy commented Nov 26, 2014

Oh, right; I had forgotten this needs to access the previously-written values. Yes, your approach seems reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants