Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor indexing #54

Merged
merged 3 commits into from
Feb 28, 2016
Merged

Refactor indexing #54

merged 3 commits into from
Feb 28, 2016

Conversation

lokeshh
Copy link
Member

@lokeshh lokeshh commented Feb 19, 2016

Improve the current indexing infrastructure of Daru

# Assume the user is specifing values for index not keys
# Return index object having keys corresponding to values provided
Daru::Index.new key.map { |k| key k }
end
else
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@v0dro I am on my way to improve the indexing infrastructure. I have made a major change in Index behavior. I think an example will illustrate it best:

[3] pry(main)> i = Daru::Index.new([:a, :b, :c])
=> #<Daru::Index:0x00000003f5ca50
 @keys=[:a, :b, :c],
 @relation_hash={:a=>0, :b=>1, :c=>2},
 @size=3>
[4] pry(main)> i[0, 2]
=> #<Daru::Index:0x00000003ed14f0
 @keys=[:a, :c],
 @relation_hash={:a=>0, :c=>1},
 @size=2>

Earlier it used to be:

[4] pry(main)> i = Daru::Index.new([:a, :b, :c])
=> #<Daru::Index:0x000000030be4f8
 @keys=[:a, :b, :c],
 @relation_hash={:a=>0, :b=>1, :c=>2},
 @size=3>
[5] pry(main)> i[0, 2]
=> #<Daru::Index:0x00000003006560
 @keys=[0, 2],
 @relation_hash={0=>0, 2=>1},
 @size=2>

The motive behind this is to remove the frequent checking of type of index and moving the functionality of guessing that the user is perhaps giving the index values (not keys) from Vector class to Index class because it seems more inherent to index.

Are you good with this?

If you are good with this then I plan to the same with MultiIndex and finally remove the conditionals.

@v0dro
Copy link
Member

v0dro commented Feb 20, 2016

Hey could you please pull the latest master and make your changes on top of that?

@v0dro
Copy link
Member

v0dro commented Feb 20, 2016

I like your solution. But are you sure this won't break any other functionality?

Also, we need to make sure that data can be accessed by both index value and element position value. Ranges should be supported too.

If your solution fits into all this AND makes indexing more extensible and less fragile than it is now, do go ahead with the implementation.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 20, 2016

Yes, this is more extensible and less fragile than earlier. Look at this:

    def [](*indexes)
      # Get a proper index object
      indexes = @index[*indexes]

      # If one object is asked return it
      if indexes.is_a? Numeric
        return @data[indexes]
      end

      # Form a new Vector using indexes and return it
      Daru::Vector.new(
        indexes.map { |loc| @data[@index[loc]] },
        name: @name, index: indexes.factor_out, dtype: @dtype)
    end

This approach nicely encapsulates the general behavior. Any new index would simply need follow these rules and it's good to go. And this approach does no harm to access element by index value or element position value. The only thing now is that Index holds this responsibility of deciding whether user gave index value, element position value or range and taking suitable action.

@v0dro
Copy link
Member

v0dro commented Feb 22, 2016

So the indexes Array stores all indexes which the user wants.

The problem is that when you call #map on the indexes object, it again refers to @index to get the actual location in the data. So say for example indexes contains [0,1,5], you'll be passing that into @index, which would produce a wrong result.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 24, 2016

Since I am not sure what you mean, could you please provide an example which you think would produce a wrong result? That would be very helpful.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 24, 2016

If I get what you meant then I think you meant that for a vector v = Daru::Vector.new([1, 2, 3, 4, 5], index: [:a, :b, :c, :d, :e]), v[0, 1, 4] would give wrong output. But this is not the case. Here's the output:

[4] pry(main)> v[0, 1, 4]
=> 
#<Daru::Vector:22398300 @name = nil @size = 3 >
    nil
  a   1
  b   2
  e   5

The only place where this code falls short of I think is with differentiating whether [0, 1, 4] is position index or key values (and too when 0 is key in index). I think this is not hard to fix once I am clear about the desired functionality.

@v0dro
Copy link
Member

v0dro commented Feb 24, 2016

My bad. I had thought that Daru::Index returns actual element positions with the #[] operator.

Your PR looks good the way it is apart from factor_out. Are you done with this feature?

@@ -330,5 +341,11 @@ def values
def inspect
"Daru::MultiIndex:#{self.object_id} (levels: #{levels}\nlabels: #{labels})"
end

def factor_out
# Remove levels not needed for display
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this comment.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 26, 2016

I feel a couple things like following are left to do:

  • give a nice message when index specified is/are invalid. I think broke this functionality while improving [] method.
  • refactor code for #[]= like [].

Should I go on implementing it in this PR or would it be an overkill?

@@ -455,7 +455,7 @@
end

it "returns a Vector if the last level of MultiIndex is tracked" do
expect(@df_mi[:a, :one]).to eq(
expect(@df_mi[:a, :one, :bar]).to eq(
Daru::Vector.new(@vector_arry1, index: @multi_index))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't tracking the last level.

@v0dro
Copy link
Member

v0dro commented Feb 26, 2016

You can revamp the whole index infra for [] and []= in this PR itself. Just remember to keep one feature per commit so our working tree remains clean and readable.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 26, 2016

[]= falls in place every nicely. You would love this. Earlier it used to be:

[1] pry(main)> v = Daru::Vector.new([1,2,3,4], index: [:a, :b, :c, :d])
=> 
#<Daru::Vector:20516460 @name = nil @size = 4 >
    nil
  a   1
  b   2
  c   3
  d   4

[2] pry(main)> v[:a, :b, :c] = 20
=> 20
[3] pry(main)> v
=> 
#<Daru::Vector:20516460 @name = nil @size = 4 >
    nil
  a  20
  b   2
  c   3
  d   4

But now it's

[2] pry(main)> v[:a, :b, :c] = 20
=> 20
[3] pry(main)> v
=> 
#<Daru::Vector:11722400 @name = nil @size = 4 >
    nil
  a  20
  b  20
  c  20
  d   4

@lokeshh lokeshh changed the title [WIP] Refactor indexing Refactor indexing Feb 26, 2016
Daru::DataFrame.rows(
rows, index: @context.index[indexes], order: @context.vectors)
rows, index: new_index, order: @context.vectors)
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had no option but to do this because I made Index object return an error whenever invalid indexes are supplied. This really helps in keeping things simple. Moreover I think Index should have this responsibility to raise Exception when invalid indexes are supplied. This makes a lot of sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does. Good fix.

@lokeshh lokeshh changed the title Refactor indexing [WIP] Refactor indexing Feb 26, 2016
@lokeshh lokeshh changed the title [WIP] Refactor indexing Refactor indexing Feb 27, 2016
@v0dro
Copy link
Member

v0dro commented Feb 28, 2016

This looks great! Isn't it necessary to change anything in dataframe [] to reflect the changes?

Will merge once you confirm.

@lokeshh
Copy link
Member Author

lokeshh commented Feb 28, 2016

I don't think it's necessary to change anything in dataframe but a lot of things can be certainly improved. I am currently understanding how the dataframe works inorder to see how it can improved given the changes in the index. Apart from that there's very little chance that something would go wrong with dataframe as all the tests are passing but I haven't taken a thorough look since I do not yet fully understand how it works.

@v0dro
Copy link
Member

v0dro commented Feb 28, 2016

Alright. I'm merging these changes for now. Send a PR with your improvements when you're ready.

v0dro added a commit that referenced this pull request Feb 28, 2016
@v0dro v0dro merged commit bf846e8 into SciRuby:master Feb 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants