Commit
Representation Scorer (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features.
twitter-team committed Apr 28, 2023 (1 parent: 43cdcf2; commit 5edbbee)
Showing 41 changed files with 2,544 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Representation Scorer # | ||
|
||
**Representation Scorer** (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features. | ||
|
||
The Representation Scorer acquires user behavior data from the User Signal Service (USS) and extracts embeddings from the Representation Manager (RMS). It then calculates both pairwise and listwise features. These features are used at various stages, including candidate retrieval and ranking. |
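As a rough illustration of the pairwise and listwise features this README mentions, the sketch below scores one candidate against one signal entity (pairwise) and then aggregates those scores over a list of a user's recent signals (listwise). The sparse dict representation, the function names, and the choice of max and mean aggregations are all assumptions made for illustration; they are not taken from the RSX code.

```python
from typing import Callable, Dict, List

# Hypothetical sparse embedding: cluster id -> weight.
Embedding = Dict[int, float]


def dot(a: Embedding, b: Embedding) -> float:
    """Pairwise score: dot product over the clusters both embeddings share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller embedding
    return sum(w * b[c] for c, w in a.items() if c in b)


def listwise_features(
    candidate: Embedding,
    signals: List[Embedding],
    pairwise: Callable[[Embedding, Embedding], float] = dot,
) -> Dict[str, float]:
    """Aggregate pairwise scores between a candidate and a list of signal embeddings."""
    scores = [pairwise(candidate, s) for s in signals]
    if not scores:
        return {"max": 0.0, "mean": 0.0}
    return {"max": max(scores), "mean": sum(scores) / len(scores)}


if __name__ == "__main__":
    # Made-up embeddings for one tweet and three recently faved tweets.
    tweet = {1: 0.5, 7: 0.5}
    recent_faves = [{1: 1.0}, {7: 0.5, 9: 0.5}, {3: 1.0}]
    print(listwise_features(tweet, recent_faves))
```

Passing the pairwise function as a parameter keeps the aggregation independent of the similarity measure, so the same listwise code could be reused with cosine similarity.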
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#!/bin/bash | ||
|
||
export CANARY_CHECK_ROLE="representation-scorer" | ||
export CANARY_CHECK_NAME="representation-scorer" | ||
export CANARY_CHECK_INSTANCES="0-19" | ||
|
||
python3 relevance-platform/tools/canary_check.py "$@" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/usr/bin/env bash | ||
|
||
JOB=representation-scorer bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress \ | ||
//relevance-platform/src/main/python/deploy -- "$@" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
#!/bin/bash | ||
|
||
set -o nounset | ||
set -eu | ||
|
||
DC="atla" | ||
ROLE="$USER" | ||
SERVICE="representation-scorer" | ||
INSTANCE="0" | ||
KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE" | ||
|
||
while test $# -gt 0; do | ||
case "$1" in | ||
-h|--help) | ||
echo "$0 Set up an ssh tunnel for $SERVICE remote debugging and disable aurora health checks" | ||
echo " " | ||
echo "See representation-scorer/README.md for details of how to use this script, and go/remote-debug for" | ||
echo "general information about remote debugging in Aurora" | ||
echo " " | ||
echo "Default instance if called with no args:" | ||
echo " $KEY" | ||
echo " " | ||
echo "Positional args:" | ||
echo " $0 [datacentre] [role] [service_name] [instance]" | ||
echo " " | ||
echo "Options:" | ||
echo " -h, --help show brief help" | ||
exit 0 | ||
;; | ||
*) | ||
break | ||
;; | ||
esac | ||
done | ||
|
||
if [ -n "${1-}" ]; then | ||
DC="$1" | ||
fi | ||
|
||
if [ -n "${2-}" ]; then | ||
ROLE="$2" | ||
fi | ||
|
||
if [ -n "${3-}" ]; then | ||
SERVICE="$3" | ||
fi | ||
|
||
if [ -n "${4-}" ]; then | ||
INSTANCE="$4" | ||
fi | ||
|
||
KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE" | ||
read -p "Set up remote debugger tunnel for $KEY? (y/n) " -r CONFIRM | ||
if [[ ! $CONFIRM =~ ^[Yy]$ ]]; then | ||
echo "Exiting, tunnel not created" | ||
exit 1 | ||
fi | ||
|
||
echo "Disabling health check and opening tunnel. Exit with control-c when you're finished" | ||
CMD="aurora task ssh $KEY -c 'touch .healthchecksnooze' && aurora task ssh $KEY -L '5005:debug' --ssh-options '-N -S none -v '" | ||
|
||
echo "Running $CMD" | ||
eval "$CMD" | ||
|
||
|
||
|
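To check which Aurora job key a given invocation of the script above resolves to, its positional-argument handling can be mirrored in a few lines of Python. This is a sketch only: `build_key` and `confirmed` are hypothetical helper names, and the defaults simply copy the values set at the top of the script.

```python
import re
from typing import Sequence


def build_key(args: Sequence[str], user: str) -> str:
    """Resolve [datacentre] [role] [service_name] [instance] into a job key.

    Missing or empty positional arguments fall back to the script's defaults.
    """
    defaults = ["atla", user, "representation-scorer", "0"]
    resolved = [
        args[i] if i < len(args) and args[i] else default
        for i, default in enumerate(defaults)
    ]
    dc, role, service, instance = resolved
    return f"{dc}/{role}/devel/{service}/{instance}"


def confirmed(answer: str) -> bool:
    """Mirror the script's check: only a single 'y' or 'Y' confirms."""
    return re.fullmatch(r"[Yy]", answer) is not None


if __name__ == "__main__":
    print(build_key([], "alice"))
    print(build_key(["pdxa", "", "", "3"], "alice"))
```

As in the script, an empty string for a positional argument keeps the default, and any answer other than a bare `y` or `Y` (including `yes`) is treated as a refusal.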
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
Representation Scorer (RSX) | ||
########################### | ||
|
||
Overview | ||
======== | ||
|
||
Representation Scorer (RSX) is a StratoFed service which serves scores for pairs of entities (User, Tweet, Topic...) based on some representation of those entities. For example, it serves User-Tweet scores based on the cosine similarity of SimClusters embeddings for each of these. It aims to provide these with low latency and at high scale, to support applications such as scoring for ANN candidate generation and feature hydration via feature store. | ||
|
||
|
||
Current use cases | ||
----------------- | ||
|
||
RSX currently serves traffic for the following use cases: | ||
|
||
- User-Tweet similarity scores for Home ranking, using SimClusters embedding dot product | ||
- Topic-Tweet similarity scores for topical tweet candidate generation and topic social proof, using SimClusters embedding cosine similarity and CERTO scores | ||
- Tweet-Tweet and User-Tweet similarity scores for ANN candidate generation, using SimClusters embedding cosine similarity | ||
- (in development) User-Tweet similarity scores for Home ranking, based on various aggregations of similarities with recent faves, retweets and follows performed by the user | ||
|
||
Getting Started | ||
=============== | ||
|
||
Fetching scores | ||
--------------- | ||
|
||
Scores are served from the recommendations/representation_scorer/score column. | ||
|
||
Using RSX for your application | ||
------------------------------ | ||
|
||
RSX may be a good fit for your application if you need scores based on combinations of SimClusters embeddings for core nouns. We also plan to support other embeddings and scoring approaches in the future. | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:hidden: | ||
|
||
index | ||
|
||
|
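The use cases in this document refer to both dot product and cosine similarity of SimClusters embeddings. The sketch below shows how the two differ on sparse embeddings. It assumes an embedding can be modelled as a dict from cluster id to weight; the type alias and function names are hypothetical and do not come from the RSX code.

```python
import math
from typing import Dict

Embedding = Dict[int, float]  # hypothetical: cluster id -> weight


def dot_product(a: Embedding, b: Embedding) -> float:
    """Sum of weight products over the clusters present in both embeddings."""
    return sum(w * b[c] for c, w in a.items() if c in b)


def l2_norm(a: Embedding) -> float:
    """Euclidean length of the embedding's weight vector."""
    return math.sqrt(sum(w * w for w in a.values()))


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Dot product normalized by both norms; 0.0 if either embedding is empty."""
    denom = l2_norm(a) * l2_norm(b)
    return dot_product(a, b) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    user = {1: 3.0, 2: 4.0}
    tweet = {1: 6.0, 2: 8.0}  # same direction as `user`, twice the magnitude
    print(dot_product(user, tweet), cosine_similarity(user, tweet))
```

The dot product grows with the magnitude of the embeddings, while cosine similarity depends only on their direction. That difference is one possible reason a ranking use case and a candidate generation use case might choose different measures.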
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
jvm_binary( | ||
name = "bin", | ||
basename = "representation-scorer", | ||
main = "com.twitter.representationscorer.RepresentationScorerFedServerMain", | ||
platform = "java8", | ||
tags = ["bazel-compatible"], | ||
dependencies = [ | ||
"finatra/inject/inject-logback/src/main/scala", | ||
"loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", | ||
"representation-scorer/server/src/main/resources", | ||
"representation-scorer/server/src/main/scala/com/twitter/representationscorer", | ||
"twitter-server/logback-classic/src/main/scala", | ||
], | ||
) | ||
|
||
# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app | ||
jvm_app( | ||
name = "representation-scorer-app", | ||
archive = "zip", | ||
binary = ":bin", | ||
tags = ["bazel-compatible"], | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
resources( | ||
sources = [ | ||
"*.xml", | ||
"*.yml", | ||
"com/twitter/slo/slo.json", | ||
"config/*.yml", | ||
], | ||
tags = ["bazel-compatible"], | ||
) |
55 changes: 55 additions & 0 deletions
representation-scorer/server/src/main/resources/com/twitter/slo/slo.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
{ | ||
"servers": [ | ||
{ | ||
"name": "strato", | ||
"indicators": [ | ||
{ | ||
"id": "success_rate_3m", | ||
"indicator_type": "SuccessRateIndicator", | ||
"duration": 3, | ||
"duration_unit": "MINUTES" | ||
}, { | ||
"id": "latency_3m_p99", | ||
"indicator_type": "LatencyIndicator", | ||
"duration": 3, | ||
"duration_unit": "MINUTES", | ||
"percentile": 0.99 | ||
} | ||
], | ||
"objectives": [ | ||
{ | ||
"indicator": "success_rate_3m", | ||
"objective_type": "SuccessRateObjective", | ||
"operator": ">=", | ||
"threshold": 0.995 | ||
}, | ||
{ | ||
"indicator": "latency_3m_p99", | ||
"objective_type": "LatencyObjective", | ||
"operator": "<=", | ||
"threshold": 50 | ||
} | ||
], | ||
"long_term_objectives": [ | ||
{ | ||
"id": "success_rate_28_days", | ||
"objective_type": "SuccessRateObjective", | ||
"operator": ">=", | ||
"threshold": 0.993, | ||
"duration": 28, | ||
"duration_unit": "DAYS" | ||
}, | ||
{ | ||
"id": "latency_p99_28_days", | ||
"objective_type": "LatencyObjective", | ||
"operator": "<=", | ||
"threshold": 60, | ||
"duration": 28, | ||
"duration_unit": "DAYS", | ||
"percentile": 0.99 | ||
} | ||
] | ||
} | ||
], | ||
"@version": 1 | ||
} |
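To make the objective entries in slo.json concrete, the sketch below evaluates the two short-term objectives against a set of measured indicator values. The evaluation function, the measured numbers, and the reading of the latency threshold as milliseconds are assumptions for illustration. The file itself does not state a unit, and this is not the actual SLO tooling.

```python
import operator
from typing import Dict, List

# Map the operator strings used in slo.json to Python comparison functions.
OPERATORS = {">=": operator.ge, "<=": operator.le}

# Short-term objectives copied from the "strato" server entry.
OBJECTIVES = [
    {"indicator": "success_rate_3m", "operator": ">=", "threshold": 0.995},
    {"indicator": "latency_3m_p99", "operator": "<=", "threshold": 50},
]


def evaluate(objectives: List[dict], measured: Dict[str, float]) -> Dict[str, bool]:
    """Return, per indicator id, whether the measured value meets its objective."""
    return {
        o["indicator"]: OPERATORS[o["operator"]](measured[o["indicator"]], o["threshold"])
        for o in objectives
    }


if __name__ == "__main__":
    # Made-up measurements: success rate is fine, p99 latency is over budget.
    measured = {"success_rate_3m": 0.997, "latency_3m_p99": 62.0}
    print(evaluate(OBJECTIVES, measured))
```

Note that the 28-day objectives in the file are looser than the 3-minute ones (0.993 versus 0.995 for success rate, 60 versus 50 for p99 latency), so a window that passes the short-term check would also pass the long-term thresholds.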
155 changes: 155 additions & 0 deletions
representation-scorer/server/src/main/resources/config/decider.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore: | ||
comment: "Enable to use the non-empty store for logFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore (from 0% to 100%). 0 means use EMPTY readable store for all requests." | ||
default_availability: 0 | ||
|
||
enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore: | ||
comment: "Enable to use the non-empty store for logFavBasedApeEntity20M145K2020EmbeddingCachedStore (from 0% to 100%). 0 means use EMPTY readable store for all requests." | ||
default_availability: 0 | ||
|
||
representation-scorer_forward_dark_traffic: | ||
comment: "Defines the percentage of traffic to forward to diffy-proxy. Set to 0 to disable dark traffic forwarding" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_non_prod_callers": | ||
comment: "Discard traffic from all non-prod callers" | ||
default_availability: 0 | ||
|
||
enable_log_fav_based_tweet_embedding_20m145k2020_timeouts: | ||
comment: "If enabled, set a timeout on calls to the logFavBased20M145K2020TweetEmbeddingStore" | ||
default_availability: 0 | ||
|
||
log_fav_based_tweet_embedding_20m145k2020_timeout_value_millis: | ||
comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the logFavBased20M145K2020TweetEmbeddingStore, e.g. 1.50% is 150ms. Only applied if enable_log_fav_based_tweet_embedding_20m145k2020_timeouts is true" | ||
default_availability: 2000 | ||
|
||
enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts: | ||
comment: "If enabled, set a timeout on calls to the logFavBased20M145KUpdatedTweetEmbeddingStore" | ||
default_availability: 0 | ||
|
||
log_fav_based_tweet_embedding_20m145kUpdated_timeout_value_millis: | ||
comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the logFavBased20M145KUpdatedTweetEmbeddingStore, e.g. 1.50% is 150ms. Only applied if enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts is true" | ||
default_availability: 2000 | ||
|
||
enable_cluster_tweet_index_store_timeouts: | ||
comment: "If enabled, set a timeout on calls to the ClusterTweetIndexStore" | ||
default_availability: 0 | ||
|
||
cluster_tweet_index_store_timeout_value_millis: | ||
comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the ClusterTweetIndexStore, e.g. 1.50% is 150ms. Only applied if enable_cluster_tweet_index_store_timeouts is true" | ||
default_availability: 2000 | ||
|
||
representation_scorer_fetch_signal_share: | ||
comment: "If enabled, fetches share signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_reply: | ||
comment: "If enabled, fetches reply signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_original_tweet: | ||
comment: "If enabled, fetches original tweet signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_video_playback: | ||
comment: "If enabled, fetches video playback signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_block: | ||
comment: "If enabled, fetches account block signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_mute: | ||
comment: "If enabled, fetches account mute signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_report: | ||
comment: "If enabled, fetches tweet report signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_dont_like: | ||
comment: "If enabled, fetches tweet don't like signals from USS" | ||
default_availability: 0 | ||
|
||
representation_scorer_fetch_signal_see_fewer: | ||
comment: "If enabled, fetches tweet see fewer signals from USS" | ||
default_availability: 0 | ||
|
||
# To create a new decider, add here with the same format and caller's details : "representation-scorer_load_shed_by_caller_id_twtr:{{role}}:{{name}}:{{environment}}:{{cluster}}" | ||
# All the deciders below are generated by this script - ./strato/bin/fed deciders ./ --service-role=representation-scorer --service-name=representation-scorer | ||
# If you need to run the script and paste the output, add only the prod deciders here. Non-prod ones are being taken care of by representation-scorer_load_shed_non_prod_callers | ||
|
||
"representation-scorer_load_shed_by_caller_id_all": | ||
comment: "Reject all traffic from caller id: all" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-send:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-send:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoapi:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoapi:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:atla": | ||
comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:atla" | ||
default_availability: 0 | ||
|
||
"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa": | ||
comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa" | ||
default_availability: 0 | ||
|
||
"enable_sim_clusters_embedding_store_timeouts": | ||
comment: "If enabled, set a timeout on calls to the SimClustersEmbeddingStore" | ||
default_availability: 10000 | ||
|
||
sim_clusters_embedding_store_timeout_value_millis: | ||
comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the SimClustersEmbeddingStore, e.g. 1.50% is 150ms. Only applied if enable_sim_clusters_embedding_store_timeouts is true" | ||
default_availability: 2000 |
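The decider comments above imply that availability values are expressed on a 0 to 10000 scale (1.50% corresponds to 150, and the enabled-by-default timeout decider is set to 10000), and that the timeout deciders reuse the raw value as milliseconds. The sketch below illustrates that reading. The helper names are hypothetical, the scale is inferred from the comments rather than confirmed by this file, and the random gate is a simplification of whatever sampling the real decider library performs.

```python
import random
from typing import Optional

MAX_AVAILABILITY = 10000  # assumed scale: 10000 == 100.00%


def availability_percent(availability: int) -> float:
    """Convert a raw decider value to a percentage, e.g. 150 -> 1.5."""
    return availability / 100.0


def timeout_millis(availability: int) -> int:
    """Timeout deciders reuse the raw availability value as milliseconds."""
    return availability


def is_available(availability: int, rng: Optional[random.Random] = None) -> bool:
    """Gate one request: passes with probability availability / 10000."""
    rng = rng or random.Random()
    return rng.randrange(MAX_AVAILABILITY) < availability


def load_shed_key(caller_id: str, service: str = "representation-scorer") -> str:
    """Build a per-caller load-shed decider name in the format used above."""
    return f"{service}_load_shed_by_caller_id_{caller_id}"


if __name__ == "__main__":
    print(availability_percent(2000), timeout_millis(2000))
    print(load_shed_key("twtr:svc:home-scorer:home-scorer:prod:atla"))
```

Under this reading, the default of 2000 on each `*_timeout_value_millis` decider corresponds to a 2000 ms timeout, and a load-shed decider left at 0 rejects no traffic.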