Suggestion: changes to label_components function #226

meggart · 2014-11-25T16:19:11Z

There are basically 2 changes in this PR.

The function signature of label_components is extended to work with Albl::AbstractArray{Int}.
In addition the Albl is initialized (zeroed) by the caller and not within the function anymore.

In my typical use cases most of the data is background, so a lot of memory is wasted by allocating the whole output array. I suggest these changes to be able to use other array-like containers like a sparse matrix, a memory-mapped array etc, to write the labels to. A small example would be in this gist https://gist.github.com/meggart/20503e86e18ad21c9bbd

…nature

timholy · 2014-11-25T18:55:52Z

I certainly support the spirit of this change, but there's an important detail. I was deliberately being a bit lazy here: I confined it to Arrays in part because I made use of linear indexing, and for anything that doesn't have efficient linear indexing (e.g., SubArrays) the performance will stink. So I guess the question is, in conjunction with this change do we need to switch to cartesian indexing?

meggart · 2014-11-26T10:18:37Z

I see, I didn't think of Subarrays. I was also mostly using and looking at the second version of the function for general connectivities which is already using Cartesian indexing in the main loop.

How large is actually the performance penalty when indexing linearly into a SubArray? Should I do some benchmarks? I could also try to change the function to use linear indexing.

What I actually wanted to achieve, is some behavior like Matlabs bwconncomp, which returns a cell array of vectors with indices for each component. I think this change would get me there, but I would also be glad to hear other ideas how to do this more efficiently.

timholy · 2014-11-26T10:34:57Z

How large is actually the performance penalty when indexing linearly into a SubArray?

Pretty big in 2d, worse in 3d. Example:

julia> A = rand(1000,1000);

julia> B = sub(A, 2:999, 2:999);

julia> function mysum(A)
           s = 0.0
           for i = 1:length(A)
               s += A[i]
           end
           s
       end
mysum (generic function with 1 method)

julia> mysum(A)
499876.9882984452

julia> mysum(B)
497909.8790759961

julia> @time mysum(A)
elapsed time: 0.004373402 seconds (13896 bytes allocated)
499876.9882984452

julia> @time mysum(B)
elapsed time: 0.028761602 seconds (96 bytes allocated)
497909.8790759961

julia> function mysum2(A::AbstractMatrix)
           s = 0.0
           for j = 1:size(A,2), i = 1:size(A,1)
               s += A[i,j]
           end
           s
       end
mysum2 (generic function with 1 method)

julia> mysum2(A)
499876.9882984452

julia> mysum2(B)
497909.8790759961

julia> @time mysum2(A)
elapsed time: 0.003726706 seconds (96 bytes allocated)
499876.9882984452

julia> @time mysum2(B)
elapsed time: 0.008171102 seconds (96 bytes allocated)
497909.8790759961

The penalty increases by another factor of 2 for 3d.

But now I understand your intention much better---for what you want to achieve, this works just fine, so let's not worry about the SubArray case until someone needs it.

Suggestion: changes to label_components function

timholy · 2014-11-26T10:43:23Z

If you want something more like bwconncomp, you could alternatively create an array type that reverses the typical meaning of setindex!:

type ComponentList <: AbstractVector{Vector{Int}}
    components::Vector{Int}
end

function setindex!(A::ComponentList, v::Int, i::Int)
    if v > length(A.components)
        sizehint(A.components, v)
        for i = length(A.components)+1:v
            push!(A.components, Int[])
        end
    end
    push!(A.components[v], i)
    A
end

timholy · 2014-11-26T10:46:09Z

Maybe we should add this to Images?

meggart · 2014-11-26T11:17:09Z

This would be great to have in Images. I just can't see yet how to write an efficient getindex function for this. Would one have run findfirst on all components?
This is why I thought of a Dict-based container...

timholy · 2014-11-26T11:56:33Z

You're saying you want something better than bwconncomp, then? 😄 (The representation returned by bwconncomp doesn't allow for an efficient getindex, either.)

There seem to be three options:

use a Dict as in your design. This makes getindex fairly efficient, but is not very efficient at finding all voxels with the same label.
use a container per label. This has the opposite performance characteristics. We could use a Set{Int} instead of a Vector{Int} for each component, so that searches are more efficient.
store both representations, so that both operations are efficient, at the cost of slower construction and larger storage.

meggart · 2014-11-26T13:36:15Z

I think I am going for option 3, using a Dict-based design during the construction and at the end build the container per label struct. Using option 2 significantly throws down performance when the array is accessed in line 43

newlabel = Albl[k-offsets_d]

while the performance penalty for the Dict is still acceptable compared to the memory gain which here is more important to me. I will make another PR once I have tested this and wrapped it into a function. Thank you very much for your help.

timholy · 2014-11-26T13:51:53Z

Oh, right; I had forgotten this needs to access the previously-written values. Yes, your approach seems reasonable.

label_components initiate Albl with zeros and generalize function sig…

d6eabd6

…nature

timholy added a commit that referenced this pull request Nov 26, 2014

Merge pull request #226 from meggart/genconn

bc6dd31

Suggestion: changes to label_components function

timholy merged commit bc6dd31 into JuliaImages:master Nov 26, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggestion: changes to label_components function #226

Suggestion: changes to label_components function #226

meggart commented Nov 25, 2014

timholy commented Nov 25, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014

timholy commented Nov 26, 2014

timholy commented Nov 26, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014

Suggestion: changes to label_components function #226

Suggestion: changes to label_components function #226

Conversation

meggart commented Nov 25, 2014

timholy commented Nov 25, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014

timholy commented Nov 26, 2014

timholy commented Nov 26, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014

meggart commented Nov 26, 2014

timholy commented Nov 26, 2014