Expression iterator #1800

Open
wants to merge 15 commits into base: master


xdssio
Collaborator

@xdssio xdssio commented Dec 24, 2021

I found myself needing to zip columns all the time.
This makes expressions look and feel a bit more like a list.

  1. Implement __len__
     • len(df.x) == len(df)
  2. Implement __iter__
     • I used chunks of 1000

Example


# This holds 1000 values in memory at a time
for x in df.x:
    pass  # x is a single value

# This brings all values into memory
for x in df.x.values.tolist():
    pass
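
To illustrate the two points above, here is a minimal sketch of chunked iteration in plain Python. It is not the code in this PR: `ChunkedColumn`, `evaluate` and `chunk_size` are hypothetical names, and the list `data` stands in for a column that would normally be evaluated lazily.

```python
class ChunkedColumn:
    """Hypothetical stand-in for an expression; not vaex's actual class."""

    def __init__(self, evaluate, length, chunk_size=1000):
        # evaluate(i1, i2) is assumed to return the values for rows [i1, i2)
        self._evaluate = evaluate
        self._length = length
        self._chunk_size = chunk_size

    def __len__(self):
        # Same length as the parent dataframe
        return self._length

    def __iter__(self):
        for start in range(0, self._length, self._chunk_size):
            stop = min(start + self._chunk_size, self._length)
            # Only this chunk is materialized; values are yielded one by one
            yield from self._evaluate(start, stop)


data = list(range(2500))
loaded = []  # records which chunks were requested


def evaluate(i1, i2):
    loaded.append((i1, i2))
    return data[i1:i2]


x = ChunkedColumn(evaluate, len(data))
total = sum(x)  # iterates value by value, one chunk at a time
```

With 2500 rows and a chunk size of 1000, `evaluate` is asked for three slices, so at most 1000 values are held at once.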

This makes zip work efficiently:

for x,y in zip(df.x,df.y):
   pass
 

@maartenbreddels
Member

This makes zip work efficiently:

for x,y in zip(df.x,df.y):
   pass
 

This will not be really efficient actually, since if you have two expressions that depend on a shared calculation, it will execute that calculation twice. It will be more efficient to use:

for i, row in df['x', 'y'].iterrows():
    print(row['x'], row['y'])

However, I really like the idea of using chunks! iterrows doesn't use that. Would it be an idea to merge the two approaches, and document them in the tutorial (@JovanVeljanoski?)
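
A rough sketch of what merging the two ideas might look like, in plain Python rather than vaex code. Everything here is hypothetical (`shared`, `column_x`, `column_y`, `iter_column`, `iter_rows_chunked`); the counter only serves to show how often the shared calculation runs when zipping two independent chunked iterators versus iterating rows chunk by chunk.

```python
calls = {"shared": 0}


def shared(i1, i2):
    """Hypothetical expensive calculation that both columns depend on."""
    calls["shared"] += 1
    return [i * i for i in range(i1, i2)]


def column_x(i1, i2):
    return [v + 1 for v in shared(i1, i2)]


def column_y(i1, i2):
    return [v * 2 for v in shared(i1, i2)]


def iter_column(evaluate, length, chunk_size=1000):
    # Independent chunked iterator over one column
    for start in range(0, length, chunk_size):
        yield from evaluate(start, min(start + chunk_size, length))


def iter_rows_chunked(length, chunk_size=1000):
    # Chunked row iterator: the shared part is evaluated once per chunk
    for start in range(0, length, chunk_size):
        stop = min(start + chunk_size, length)
        base = shared(start, stop)
        xs = [v + 1 for v in base]
        ys = [v * 2 for v in base]
        for offset, (x, y) in enumerate(zip(xs, ys)):
            yield start + offset, {"x": x, "y": y}


n = 3000

# zip over two independent iterators: each one recomputes the shared part
zipped = list(zip(iter_column(column_x, n), iter_column(column_y, n)))
calls_zip = calls["shared"]

# chunked row iteration: the shared part runs once per chunk
calls["shared"] = 0
rows = list(iter_rows_chunked(n))
calls_rows = calls["shared"]
```

In this toy setup both loops produce the same values, but the zip version evaluates the shared calculation for every chunk of every column, while the row version evaluates it once per chunk.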
