Introduction to Search Relevance Models
Information Retrieval with Term Frequency and TF-IDF Models
One of the core tasks in information retrieval is searching. Anyone who deals with large amounts of text data (and that’s almost all of us) knows how difficult this seemingly simple task can be. If your search term is too broad, you may find yourself sifting through an impossible quantity of documents. And if your search term is too narrow, you could be missing out on relevant results. So how do we decide which documents are the most relevant to our search?
Search relevance is a difficult problem, and modern search engines employ highly sophisticated (and proprietary) algorithms to deal with it. We won’t delve into those algorithms, but let’s look at some simple strategies that you might employ in your own information retrieval applications.
If you want to follow along with the full code and dataset for this article, check out the companion notebook, which includes functions for loading, manipulating, and analyzing term-document matrices and term frequency-inverse document frequency (TF-IDF) matrices. And if you want to learn more about information retrieval, Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze should be your first stop.