Origins of Salient Terms
The term 'salient terms' first appears in the information retrieval literature in an almost-uncited conference paper from September 2002, Using Salient Words to Perform Categorization of Web Sites [1]. Although it dates from the same era as PageRank [2], the paper has received only 1 citation as of this writing. Later the same year, Cooper et al. from IBM Watson published Detecting Similar Documents Using Salient Terms [3].
Despite these humble origins, salient terms has become a widely known family of algorithms within industry: term-weighting algorithms whose goal is to upweight the terms most central to the subject of a document [4].
For example, this article references both 'salient-terms' and 'page-rank'. But if a user searched for 'page-rank' this document would not be relevant — PageRank is only tangentially involved. The key subject, and most salient term, is 'salient-terms'.
Popularity Salient Terms
Popularity has long been an embarrassingly effective signal, as shown by Cañamares and Castells (2018), winners of the SIGIR 2018 Best Paper Award [5].
The Salient Terms algorithm I'd like to bring forward is Popularity Salient Terms. It maintains a count of interactions at a per-item, per-term level. Using an inverted index, this count can then be used to retrieve the most-interacted items for each term within a query.
Popularity Salient Terms
key_db // our distributed keystore database of (term, item_id) : val
key_db.increment(term,item_id) --> increment val by 1
query // user search query
query_terms // user search query split into a list of terms
item_id // unique item ID of item interacted with
// when user interacts with a search item for a given query
...
for term in query_terms:
key_db.increment(term, item_id)
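As a minimal sketch, here is the counting logic in plain Python, with an in-memory dict standing in for the distributed keystore (the names record_interaction and top_items_for_term are illustrative, not from any production system):

```python
from collections import defaultdict

# In-memory stand-in for the distributed keystore of (term, item_id) : val.
key_db: dict[tuple[str, str], int] = defaultdict(int)

def record_interaction(query: str, item_id: str) -> None:
    """Increment the per-(term, item) count for every term in the query."""
    for term in query.lower().split():
        key_db[(term, item_id)] += 1

def top_items_for_term(term: str, k: int = 10) -> list[tuple[str, int]]:
    """Retrieve the most-interacted items for a single query term."""
    counts = [(item, n) for (t, item), n in key_db.items() if t == term]
    return sorted(counts, key=lambda x: x[1], reverse=True)[:k]

record_interaction("salient terms", "doc_a")
record_interaction("salient embeddings", "doc_b")
record_interaction("salient terms", "doc_a")
```

A real deployment would replace the dict with atomic increments in a distributed keystore, and the linear scan in top_items_for_term with an inverted-index lookup.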
We can modify this if we want popularity to reflect a moving window:
Moving Window Popularity Salient Terms
event_db // our event logging database
event_db.salient_terms_events // an event table with columns(query_terms, item_id, timestamp)
event_db.salient_terms_events
.insert(query_terms, item_id, timestamp) --> insert a record into salient_terms_events
event_db.salient_terms_extract // a table where we store the daily extract
key_db // our distributed keystore database of (term, item_id) : val
key_db.set(term,item_id,new_val) --> set (term,item_id) to new_val
key_db.reset() --> drop records or reset val to 0
query // user search query
query_terms // user search query split into a list of terms
item_id // unique item ID of item interacted with
N // number of days in our moving window
<TODAY-N> , <TODAY> // SQL Macros for the relevant dates
// when user interacts with a search item for a given query
...
event_db.salient_terms_events.insert(query_terms, item_id, timestamp)
// in a daily sql-pipeline
INSERT INTO salient_terms_extract
(
SELECT <TODAY> AS extract_date, exploded_terms.term, item_id, COUNT(*) AS val
FROM salient_terms_events
LATERAL VIEW EXPLODE(query_terms) exploded_terms AS term
WHERE salient_terms_events.timestamp
BETWEEN <TODAY-N> AND <TODAY>
GROUP BY 1, 2, 3
)
//extract salient_terms_extract into key_db using some glue code
event_db.read("SELECT * FROM salient_terms_extract WHERE extract_date=<TODAY>")
.map( record -> key_db.set(record.term, record.item_id, record.val))
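To make the daily pipeline concrete, here is an in-process Python sketch of the same window-and-load logic, with a list of events standing in for event_db and a plain dict for key_db (build_extract and load_into_key_db are hypothetical names):

```python
from collections import Counter
from datetime import date, timedelta

def build_extract(events: list, today: date, n_days: int) -> Counter:
    """Aggregate (term, item_id) interaction counts over the last n_days."""
    window_start = today - timedelta(days=n_days)
    extract: Counter = Counter()
    for query_terms, item_id, ts in events:
        if window_start <= ts <= today:
            for term in query_terms:
                extract[(term, item_id)] += 1
    return extract

def load_into_key_db(key_db: dict, extract: Counter) -> None:
    """Replace keystore contents with the fresh extract."""
    key_db.clear()            # key_db.reset()
    for key, val in extract.items():
        key_db[key] = val     # key_db.set(term, item_id, val)

events = [
    (["espresso", "machine"], "doc_a", date(2025, 1, 10)),
    (["espresso"], "doc_a", date(2025, 1, 12)),
    (["espresso"], "doc_b", date(2024, 12, 1)),  # falls outside the window
]
key_db: dict = {}
load_into_key_db(key_db, build_extract(events, today=date(2025, 1, 14), n_days=7))
```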
Future Directions for Salient Terms: Salient Embeddings
Salient Terms already has a meaningful overlap with embedding vector search as a retrieval approach — both aim to capture the subject of a document and surface it for relevant queries. Salient Terms' main weakness is that it treats both query and document as a bag-of-words. Its main advantage is that it learns quickly from user feedback.
We can consider an algorithm that combines some of these strengths, and call it Salient Embeddings.
Imagine an internet-scale web search engine. Each website already has an embedding vector produced by some base encoder model, frozen, static, derived from content alone. A user searches for "best espresso machines under $200", clicks a result, and dwells on it. That interaction is a signal: this document was relevant to this query. In a traditional setup, you'd log that signal and eventually feed it into a fine-tuning run. This is manual, slow, delayed by days or weeks, and bottlenecked by embedding model capacity.
Salient Embeddings proposes something simpler: directly nudge the item's embedding toward the query embedding at interaction time — in some ways a streaming variant of word2vec.
Salient Embeddings Update Rule
embedding_db // vector store mapping item_id -> embedding vector
query_embedding // dense vector representation of user's search query
item_embedding // current embedding of interacted item
α // learning rate (small, e.g. 0.01)
// when user positively interacts with item for a given query
item_embedding_new = normalize(item_embedding + α * query_embedding)
embedding_db.set(item_id, item_embedding_new)
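As a sketch of the update rule in plain Python (stdlib only; a production system would do this inside a vector store, and nudge is an illustrative name):

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def nudge(item_embedding: list[float], query_embedding: list[float],
          alpha: float = 0.01) -> list[float]:
    """Move the item embedding a small step toward the query embedding,
    then renormalize."""
    return normalize([i + alpha * q
                      for i, q in zip(item_embedding, query_embedding)])

# Toy 2-d example: an item orthogonal to the query drifts toward it.
item = [1.0, 0.0]
query = [0.0, 1.0]
updated = nudge(item, query, alpha=0.1)
```

After the update, the cosine similarity between item and query has increased from 0 to roughly 0.1, while the vector stays unit-length.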
Over many interactions, a document that consistently satisfies queries about espresso machines drifts — in embedding space — closer to that cluster of queries, even if its text never used those exact terms. The encoder weights are untouched; the learned signal lives in the stored vectors themselves. Decay can be applied by periodically blending back toward the original content embedding, analogous to the moving window in Popularity Salient Terms, and negative interactions can push the embedding in the opposite direction.
Salient Embeddings with Decay
embedding_original // frozen content embedding from base model
embedding_learned // current learned embedding (initialized to embedding_original)
α // learning rate
β // decay rate toward original embedding (applied periodically)
// on positive interaction
embedding_learned = normalize(embedding_learned + α * query_embedding)
// on periodic decay step (e.g., daily)
embedding_learned = normalize((1 - β) * embedding_learned + β * embedding_original)
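The decay step admits an equally small sketch (stdlib Python again; decay_step is an illustrative name). With no further interactions, repeated decay converges back to the original content embedding:

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def decay_step(learned: list[float], original: list[float],
               beta: float = 0.1) -> list[float]:
    """Blend the learned embedding back toward the frozen content embedding."""
    return normalize([(1 - beta) * l + beta * o
                      for l, o in zip(learned, original)])

embedding_original = [1.0, 0.0]
embedding_learned = [0.0, 1.0]   # fully drifted away from the content embedding
for _ in range(200):             # e.g. 200 daily decay steps, no new interactions
    embedding_learned = decay_step(embedding_learned, embedding_original)
```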
The tradeoffs are real. This isn't far from what contrastive fine-tuning already does — the difference is that fine-tuning updates the encoder weights globally, while this updates individual item vectors locally. The local approach is cheaper and more immediate, but the learned signal doesn't generalize the same way: a nudge toward "espresso machines under $200" doesn't automatically help for "affordable coffee gear" unless those query embeddings are already nearby in the base space. For complex or multi-topic search intents, shifting an item's embedding toward one query may not improve, and may even hurt, performance for other queries. Further, the topics of 'head' queries will dominate the embeddings, starving 'tail' queries of relevant results.
The operational cost is also non-trivial: embeddings are no longer static artifacts derivable from content. They become stateful, per-item, and potentially require backups at the scale your index operates.
However, the core appeal holds: Salient Embeddings inherits the semantic richness of dense vector search while learning directly from user behavior — no training pipeline, no labeled dataset, no redeployment. Each interaction is a lightweight update that immediately shifts retrieval, combining the fast feedback loop of Popularity Salient Terms with the query-understanding of embedding-based retrieval.
How Salient Terms Fits into the Ranking/Recommendations Stack
In the age of deep learning, why do we still use such simple algorithms? If you're asking this question, you're probably not familiar with web-scale information retrieval in practice.
Across state-of-the-art search and recommendation systems, the query-understanding -> query-expansion -> retrieval -> point-wise ranking -> conditional reranking architecture is well established [4][6]. This architecture is also relevant for LLM applications, where RAG pipelines often reinvent this established architecture for search and recommendations.
Within the retrieval layer, we need to 'retrieve' a high-recall set of up to several thousand items from a pool of hundreds of millions to trillions of items, within the 250ms users expect from search and recommendations applications [7][8]. Typically, retrieval is achieved by combining a variety of heuristics, collaborative filtering, and dense vector retrieval that can scale to web-scale problems [4][6]. At this stage, we cannot individually evaluate every item and can only leverage more scalable algorithms; the primary deep-learning-based one is dense vector retrieval, and even it typically isn't the highest-performing retrieval algorithm [4]. Salient terms fits neatly into the retrieval layer as another performant heuristic algorithm for modelling query<->item relevance.
Salient terms can be used to retrieve the highest weighted documents via a lookup index. Term weights can also feed into heuristic algorithms (e.g. PageRank) or serve as features in downstream ranking models.
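As a sketch of that retrieval path in Python, assuming key_db holds the (term, item_id) : val weights built by Popularity Salient Terms (retrieve and term_index are illustrative names):

```python
from collections import defaultdict

# Hypothetical weights produced by Popularity Salient Terms.
key_db = {
    ("salient", "doc_a"): 9, ("terms", "doc_a"): 7,
    ("salient", "doc_b"): 2, ("pagerank", "doc_c"): 5,
}

# Inverted index built once from the keystore: term -> {item_id: weight}.
term_index: dict[str, dict[str, int]] = defaultdict(dict)
for (term, item_id), weight in key_db.items():
    term_index[term][item_id] = weight

def retrieve(query: str, k: int = 100) -> list[str]:
    """Return the top-k items by summed salient-term weight over query terms."""
    scores: dict[str, int] = defaultdict(int)
    for term in query.lower().split():
        for item_id, weight in term_index.get(term, {}).items():
            scores[item_id] += weight
    return [item for item, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

Under this scheme, a query for 'salient terms' surfaces the document weighted highest on both terms, while a tangential mention like 'pagerank' retrieves only documents for which that term is actually salient.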
Conclusions
It's an unfortunate reality that information retrieval has diverged between academia and industry, with the most significant research and innovation hidden behind the walls of a handful of Silicon Valley companies.
That gap is unlikely to close. Search hasn't been this competitive since the 2000s, with startups like you.com and Perplexity building LLM-first search engines, and ChatGPT shifting how users seek information away from search altogether [9][10][11]. Google's global search market share has dropped to below 90% for the first time since 2015 [12].
Algorithms have never been the strongest competitive moat, especially after the Silicon Valley exodus of 2023 shuffled employees between tech giants and sent them into new markets across the world [4].
Bibliography
[1] Trabalka, Marek & Bielikova, Maria. (2002). Using Salient Words to Perform Categorization of Web Sites. 2448. 130-154. 10.1007/3-540-46154-X_9.
[2] Page, Lawrence & Brin, Sergey & Motwani, Rajeev & Winograd, Terry. (1998). The PageRank Citation Ranking: Bringing Order to the Web.
[3] Cooper, James & Coden, Anni & Brown, Eric. (2002). Detecting similar documents using salient terms. 245-251. 10.1145/584792.584835.
[4] Bruh just trust me
[5] Cañamares, Rocío & Castells, Pablo. (2018). Should I Follow the Crowd?: A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. 415-424. 10.1145/3209978.3210014.
[6] Delgado, J., & Greyson, P. (2024, March 27). From structured search to learning-to-rank-and-retrieve. Amazon Science. https://www.amazon.science/blog/from-structured-search-to-learning-to-rank-and-retrieve
[7] Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Fall Joint Computer Conference Vol. 33, 267-277.
[8] Brutlag, Jake & Hutchinson, Hilary & Stone, Maria (Google, Inc.). User Preference and Search Engine Latency.
[9] https://you.com/
[10] https://www.perplexity.ai/
[11] https://openai.com/chatgpt
[12] Goodwin, D. (2025, January 20). Google’s search market share drops below 90% for first time since 2015. Search Engine Land. https://searchengineland.com/google-search-market-share-drops-2024-450497