NIH Research Festival
With more than 27 million articles in MEDLINE, retrieving and ranking the most relevant papers for a given query is increasingly challenging. Starting in the 2000s, the machine learning community focused on document ranking and created learning-to-rank (L2R) algorithms, demonstrating that robust and accurate relevance models can be built from various relevance signals and large training datasets. Recently, this technology has matured enough to scale up to real-world applications.

For L2R to learn a ranking model, it needs a gold standard to target. We built one from actual PubMed queries, using the anonymized queries stored in the logs, as well as any actions users subsequently took. We consider two main user actions to indicate that a document is relevant. One is the abstract click, when a user clicks on a document in the list of results matching their query. The other is the full-text click, which occurs when a user requests the full text after having clicked on an abstract. We collected about a year and a half of logs and assigned each document a relevance score for each query, based on its numbers of abstract and full-text clicks. The gold standard consists of the queries and the corresponding documents, ordered by descending relevance.

Finally, we designed a set of more than 150 features that capture the relatedness between the query and the document (e.g., the number of matches), document properties (e.g., its publication type), and query properties (e.g., the query length). The objective for L2R is to correctly predict the relevance score of each document in the gold standard based on this set of features only.

We manually analyzed the output of our system by submitting it to experts in various domains. Their encouraging conclusions motivated us to implement the system in production. We optimized the pipeline, as it needed to comply with PubMed's load requirements.
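The gold-standard construction described above can be sketched as follows. The click weights, event format, and function name are illustrative assumptions (the abstract does not specify how abstract and full-text clicks are weighted); the sketch only shows the aggregation idea: score each (query, document) pair from its clicks, then rank documents per query by descending score.

```python
from collections import defaultdict

# Assumed weights: a full-text click is treated as a stronger
# relevance signal than an abstract click. These values are
# hypothetical, not the production ones.
ABSTRACT_CLICK_WEIGHT = 1.0
FULLTEXT_CLICK_WEIGHT = 3.0

def build_gold_standard(log_events):
    """Aggregate click-log events into per-query document rankings.

    log_events: iterable of (query, doc_id, action) tuples, where
    action is "abstract_click" or "fulltext_click".
    Returns {query: [doc_id, ...]} with documents ordered by
    descending relevance score.
    """
    scores = defaultdict(float)
    for query, doc_id, action in log_events:
        if action == "abstract_click":
            scores[(query, doc_id)] += ABSTRACT_CLICK_WEIGHT
        elif action == "fulltext_click":
            scores[(query, doc_id)] += FULLTEXT_CLICK_WEIGHT

    # Group scored documents by query, then sort each group.
    per_query = defaultdict(list)
    for (query, doc_id), score in scores.items():
        per_query[query].append((score, doc_id))
    return {
        query: [doc_id for _, doc_id in sorted(docs, reverse=True)]
        for query, docs in per_query.items()
    }
```

A feature vector (query/document relatedness, document properties, query properties) would then be computed for each (query, document) pair, and the L2R model trained to reproduce these rankings from the features alone.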
The system now processes about a thousand queries per second, at an average of 100 ms per query. We measured the performance of our approach against then-current PubMed by calculating the click-through rate for each, that is, the proportion of queries where users click at least once on the first page of results. Solr-L2R showed a 10.8% improvement in click-through rate over PubMed. This new PubMed relevance search algorithm has been deployed in the PubMed production system and is used when the ‘Best Match’ sort order is selected.
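The evaluation metric above is simple to state precisely. A minimal sketch, assuming per-query session records that count first-page clicks (the input format and function name are illustrative, not from the production logging system):

```python
def click_through_rate(sessions):
    """Proportion of queries where the user clicked at least once
    on the first page of results.

    sessions: iterable of (query, first_page_clicks) pairs, where
    first_page_clicks counts clicks on page-one results for that query.
    """
    sessions = list(sessions)
    if not sessions:
        return 0.0
    clicked = sum(1 for _, clicks in sessions if clicks > 0)
    return clicked / len(sessions)
```

Computing this rate separately over traffic served by the baseline ranking and by Solr-L2R gives the two numbers being compared; the abstract reports the latter as 10.8% better.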
Scientific Focus Area: Computational Biology
This page was last updated on Friday, March 26, 2021