copyedit: Using Cross-Encoders as reranker #281

Merged: 1 commit, Dec 14, 2022

Conversation

dandv
Contributor

@dandv dandv commented Nov 21, 2022

Type of change:

  • Documentation updates (non-breaking change which updates documents)

https://weaviate.io/blog/2022/08/Using-Cross-Encoders-as-reranker-in-multistage-vector-search.html

Higher-level feedback on the article

  • would help to explain on the first use why "Bi-Encoder" and "Cross-Encoder" models are named that way
  • would it help to explain "pooling" from the Bi-Encoder diagram?
  • the article refers to a "Multistage search pipeline", but only two stages are listed. Would "two-stage" be more accurate, or can other stages be added, so the general term is preferred?

CC @laura-ham

@dandv dandv requested a review from laura-ham November 21, 2022 05:04
@sebawita
Collaborator

hey @laura-ham do you have any ideas for Dan's questions?

@laura-ham
Contributor

Thanks for the suggestions and improvements @dandv!

Here are my thoughts about your additional comments:

  1. would help to explain on the first use why "Bi-Encoder" and "Cross-Encoder" models are named that way

Sure! I don't know if it adds value, but if you do think so then let's add it :)

  2. would it help to explain "pooling" from the Bi-Encoder diagram?

Yeah, good point; let's add a line or two about that.

  3. the article refers to a "Multistage search pipeline", but only two stages are listed. Would "two-stage" be more accurate, or can other stages be added, so the general term is preferred?

Hmm, I'd keep it as "Multistage" for exactly that reason: the pipeline can (and will) grow longer in the future. So for consistency I would use "Multistage".

Regarding 1 and 2, do you feel comfortable adding that or do you want me to do that?

@dandv
Contributor Author

dandv commented Nov 22, 2022

Thanks for the quick reply!

I was asking about "bi-encoders" because the term is unintuitive to me, and the SBERT article on Bi-Encoder vs. Cross-Encoder doesn't explain it either. From asking around, I gather it also doesn't refer to transformer encoders attending to context both before and after a token. Does "bi" mean anything here (perhaps that, for scoring sentence similarity, bi-encoders work on pairs of inputs?), or are the names "bi-encoder" and "cross-encoder" just not that descriptive?

> Regarding 1 and 2, do you feel comfortable adding that or do you want me to do that?

Given my insufficient knowledge about the topic at the moment, I'd rather you make those additions :)

@dandv dandv changed the title from "Copyedit Cross-Encoders are reranker" to "Copyedit Using Cross-Encoders as reranker" Dec 6, 2022
@dandv dandv changed the title from "Copyedit Using Cross-Encoders as reranker" to "copyedit: Using Cross-Encoders as reranker" Dec 14, 2022
@databyjp
Contributor

databyjp commented Dec 14, 2022

Hi @dandv - I am merging the copyedit and opened a new issue #316 to capture your feedback & discussion.

@databyjp databyjp merged commit ac74987 into main Dec 14, 2022
@databyjp databyjp deleted the cross-encoders-copyedit branch December 14, 2022 12:23

4 participants