"joinLeft" returns dataframe that smaller than left_df #78

Open
GBJim opened this issue Nov 19, 2019 · 1 comment
GBJim commented Nov 19, 2019

Hi all:

I found that leftJoin produces a dataframe that is smaller than the left dataframe:

[In] [1]:  joined_flint = left_flint.leftJoin(right_flint, tolerance=tolerance, key=by)  
[In] [2]:  print (joined_flint.count() < left_flint.count())
True

I consider this an incorrect result, since a left join should not drop any rows from the left table.
Any explanation or suggestion?

@placeybordeaux

leftJoin doesn't have the same semantics as a SQL left join.

I thought this was really confusing as well.

futureLeftJoin: A function performs the temporal future left-join to the right TimeSeriesRDD, i.e. left-join using inexact timestamp matches. For each row in the left, appends the closest future row from the right at or after the same time.

This means that if there are no rows with a matching key within the tolerance, it won't return any rows for that instance.
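To make the difference concrete, here is a minimal plain-Python sketch of a future join with a tolerance. It is not Flint's implementation; the function name `future_left_join`, the `keep_unmatched` flag, and the sample rows are all hypothetical and only illustrate the two semantics discussed in this thread: SQL-style (unmatched left rows are kept with a null) versus the behavior described above (unmatched left rows are dropped).

```python
from bisect import bisect_left

def future_left_join(left, right, tolerance, keep_unmatched=True):
    """Hypothetical illustration only, not Flint's API.

    left, right: lists of (time, key, value) tuples.
    For each left row, find the closest right row with the same key
    whose time is at or after the left time and within `tolerance`.
    """
    # Group right rows by key, each group sorted by time.
    by_key = {}
    for t, k, v in sorted(right, key=lambda row: row[0]):
        by_key.setdefault(k, []).append((t, v))

    out = []
    for t, k, v in left:
        rows = by_key.get(k, [])
        times = [rt for rt, _ in rows]
        i = bisect_left(times, t)  # first right row at or after t
        if i < len(rows) and rows[i][0] - t <= tolerance:
            out.append((t, k, v, rows[i][1]))
        elif keep_unmatched:
            # SQL-style left join: keep the left row, null on the right.
            out.append((t, k, v, None))
        # Otherwise the left row is dropped, as described in this thread.
    return out

# Made-up sample data: only the first left row has a match within tolerance.
left = [(1, "a", 10), (5, "a", 11), (9, "b", 12)]
right = [(2, "a", 100), (20, "b", 200)]

kept = future_left_join(left, right, tolerance=3, keep_unmatched=True)
dropped = future_left_join(left, right, tolerance=3, keep_unmatched=False)
print(len(left), len(kept), len(dropped))
```

With `keep_unmatched=False` the result has fewer rows than `left`, which matches the symptom reported in the original post.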
