Refactor indexing #54
# Assume the user is specifying values for index not keys
# Return index object having keys corresponding to values provided
Daru::Index.new key.map { |k| key k }
end
else
@v0dro I am on my way to improve the indexing infrastructure. I have made a major change in Index behavior. I think an example will illustrate it best:
[3] pry(main)> i = Daru::Index.new([:a, :b, :c])
=> #<Daru::Index:0x00000003f5ca50
@keys=[:a, :b, :c],
@relation_hash={:a=>0, :b=>1, :c=>2},
@size=3>
[4] pry(main)> i[0, 2]
=> #<Daru::Index:0x00000003ed14f0
@keys=[:a, :c],
@relation_hash={:a=>0, :c=>1},
@size=2>
Earlier it used to be:
[4] pry(main)> i = Daru::Index.new([:a, :b, :c])
=> #<Daru::Index:0x000000030be4f8
@keys=[:a, :b, :c],
@relation_hash={:a=>0, :b=>1, :c=>2},
@size=3>
[5] pry(main)> i[0, 2]
=> #<Daru::Index:0x00000003006560
@keys=[0, 2],
@relation_hash={0=>0, 2=>1},
@size=2>
The motive behind this is to remove the frequent type checks on the index and to move the job of guessing that the user is perhaps giving index values (not keys) from the Vector class to the Index class, because it seems more inherent to the index.
Are you good with this?
If you are good with this, then I plan to do the same with MultiIndex and finally remove the conditionals.
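To make the proposed behavior concrete, here is a minimal sketch of an index that accepts either keys or positions and always answers in terms of its own keys. `SketchIndex`, `key_for` and `position_of` are hypothetical names invented for illustration; this is not the Daru::Index implementation, only an approximation of the behavior shown in the pry session above.

```ruby
# Hypothetical stand-in for Daru::Index; not the library's implementation.
class SketchIndex
  attr_reader :keys

  def initialize(keys)
    @keys = keys.dup
    # Map each key to its position, like @relation_hash in the session above.
    @relation_hash = @keys.each_with_index.to_h
  end

  # One argument returns its position; several return a new SketchIndex.
  # Arguments that are not keys are treated as positions.
  def [](*args)
    return position_of(args.first) if args.size == 1

    SketchIndex.new(args.map { |arg| key_for(arg) })
  end

  private

  # Resolve an argument (key or position) to a key of this index.
  def key_for(arg)
    return arg if @relation_hash.key?(arg)
    return @keys[arg] if arg.is_a?(Integer) && arg >= 0 && arg < @keys.size

    raise IndexError, "invalid index: #{arg.inspect}"
  end

  def position_of(arg)
    @relation_hash.fetch(key_for(arg))
  end
end

i = SketchIndex.new([:a, :b, :c])
p i[0, 2].keys  # prints [:a, :c]
p i[:c]         # prints 2
```

With the guessing kept inside the index, callers would not need to check what kind of argument they were given.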
Hey could you please pull the latest master and make your changes on top of that?
I like your solution. But are you sure this won't break any other functionality? Also, we need to make sure that data can be accessed by both index value and element position value. Ranges should be supported too. If your solution fits into all this AND makes indexing more extensible and less fragile than it is now, do go ahead with the implementation.
Yes, this is more extensible and less fragile than earlier. Look at this:
def [](*indexes)
  # Get a proper index object
  indexes = @index[*indexes]
  # If one object is asked return it
  if indexes.is_a? Numeric
    return @data[indexes]
  end
  # Form a new Vector using indexes and return it
  Daru::Vector.new(
    indexes.map { |loc| @data[@index[loc]] },
    name: @name, index: indexes.factor_out, dtype: @dtype)
end
This approach nicely encapsulates the general behavior. Any new index would simply need to follow these rules and it's good to go. And this approach does no harm to accessing elements by index value or element position value. The only thing now is that Index holds the responsibility of deciding whether the user gave an index value, an element position value, or a range, and taking suitable action.
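The delegation described above can be tried out in isolation with a rough, self-contained sketch. `TinyIndex` and `TinyVector` are hypothetical, simplified stand-ins (no ranges, no MultiIndex, no `factor_out`), not Daru's classes; the sketch only assumes that the index returns an Integer position for one argument and a new index object for several.

```ruby
# Hypothetical, simplified stand-ins; not Daru's classes.
class TinyIndex
  include Enumerable

  def initialize(keys)
    @keys = keys
    @positions = keys.each_with_index.to_h
  end

  # Iterate over the keys so Enumerable#map works on the index.
  def each(&block)
    @keys.each(&block)
  end

  # One argument -> Integer position; several -> a new TinyIndex.
  # Arguments that are not keys are looked up as positions.
  def [](*args)
    keys = args.map { |a| @positions.key?(a) ? a : @keys.fetch(a) }
    args.size == 1 ? @positions.fetch(keys.first) : TinyIndex.new(keys)
  end
end

class TinyVector
  attr_reader :data, :index

  def initialize(data, index:)
    @data = data
    @index = index.is_a?(TinyIndex) ? index : TinyIndex.new(index)
  end

  # The vector never inspects its arguments; the index decides what they mean.
  def [](*args)
    resolved = @index[*args]
    return @data[resolved] if resolved.is_a?(Integer)

    TinyVector.new(resolved.map { |key| @data[@index[key]] }, index: resolved)
  end
end

v = TinyVector.new([1, 2, 3, 4, 5], index: [:a, :b, :c, :d, :e])
p v[:b]            # prints 2
p v[0, 1, 4].data  # prints [1, 2, 5]
```

Under these assumptions, a new index type would only need to honor the same two return shapes for the vector code to keep working unchanged.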
So the The problem is that when you call |
Since I am not sure what you mean, could you please provide an example which you think would produce a wrong result? That would be very helpful.
If I get what you meant then I think you meant that for a vector [4] pry(main)> v[0, 1, 4]
=>
#<Daru::Vector:22398300 @name = nil @size = 3 >
nil
a 1
b 2
e 5
The only place where this code falls short, I think, is in differentiating whether
My bad. I had thought that Your PR looks good the way it is apart from |
@@ -330,5 +341,11 @@ def values
def inspect
"Daru::MultiIndex:#{self.object_id} (levels: #{levels}\nlabels: #{labels})"
end

def factor_out
# Remove levels not needed for display
I don't quite understand this comment.
I feel a couple of things like the following are left to do:
Should I go on implementing it in this PR, or would that be overkill?
@@ -455,7 +455,7 @@
end

it "returns a Vector if the last level of MultiIndex is tracked" do
- expect(@df_mi[:a, :one]).to eq(
+ expect(@df_mi[:a, :one, :bar]).to eq(
Daru::Vector.new(@vector_arry1, index: @multi_index))
It wasn't tracking the last level.
You can revamp the whole index infra for |
[1] pry(main)> v = Daru::Vector.new([1,2,3,4], index: [:a, :b, :c, :d])
=>
#<Daru::Vector:20516460 @name = nil @size = 4 >
nil
a 1
b 2
c 3
d 4
[2] pry(main)> v[:a, :b, :c] = 20
=> 20
[3] pry(main)> v
=>
#<Daru::Vector:20516460 @name = nil @size = 4 >
nil
a 20
b 2
c 3
d 4
But now it's
[2] pry(main)> v[:a, :b, :c] = 20
=> 20
[3] pry(main)> v
=>
#<Daru::Vector:11722400 @name = nil @size = 4 >
nil
a 20
b 20
c 20
d 4
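The multi-key assignment shown in the second session can be sketched as follows. `AssignableVector` is a hypothetical stand-in written for illustration, not Daru::Vector; it assumes every key passed to `[]=` already exists in the index.

```ruby
# Hypothetical stand-in, not Daru::Vector.
class AssignableVector
  def initialize(data, index:)
    @data = data.dup
    @positions = index.each_with_index.to_h
  end

  # Ruby passes the assigned value as the last argument, so the
  # splat collects every key given inside the brackets.
  def []=(*keys, value)
    keys.each { |key| @data[@positions.fetch(key)] = value }
  end

  def to_a
    @data.dup
  end
end

v = AssignableVector.new([1, 2, 3, 4], index: [:a, :b, :c, :d])
v[:a, :b, :c] = 20
p v.to_a  # prints [20, 20, 20, 4]
```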
Daru::DataFrame.rows(
- rows, index: @context.index[indexes], order: @context.vectors)
+ rows, index: new_index, order: @context.vectors)
end
I had no option but to do this, because I made the Index object raise an error whenever invalid indexes are supplied. This really helps in keeping things simple. Moreover, I think Index should be responsible for raising an exception when invalid indexes are supplied. This makes a lot of sense.
Yes it does. Good fix.
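As a rough illustration of putting the validation in the index, here is a sketch of a lookup that raises as soon as an argument is neither a key nor a valid position. `resolve_positions` and its error message are hypothetical and not part of Daru; the exception class the library actually raises may differ.

```ruby
# Hypothetical helper, not part of Daru.
# Resolve each requested item to a position, raising on anything invalid.
def resolve_positions(keys, requested)
  positions = keys.each_with_index.to_h
  requested.map do |item|
    positions.fetch(item) do
      # Not a key: accept it only if it is a valid position.
      unless item.is_a?(Integer) && item >= 0 && item < keys.size
        raise IndexError, "Specified index #{item.inspect} does not exist"
      end
      item
    end
  end
end

p resolve_positions([:a, :b, :c], [:a, 2])  # prints [0, 2]

begin
  resolve_positions([:a, :b, :c], [:z])
rescue IndexError => e
  puts e.message  # prints Specified index :z does not exist
end
```

Because the error surfaces in one place, callers such as the row accessor above can build the new index first and let any failure propagate.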
This looks great! Isn't it necessary to change anything in dataframe? Will merge once you confirm.
I don't think it's necessary to change anything in dataframe, but a lot of things can certainly be improved. I am currently working out how the dataframe works in order to see how it can be improved given the changes in the index. Apart from that, there's very little chance that something would go wrong with dataframe, as all the tests are passing, but I haven't taken a thorough look since I do not yet fully understand how it works.
Alright. I'm merging these changes for now. Send a PR with your improvements when you're ready.
Improve the current indexing infrastructure of Daru