What is TF-IDF
TF-IDF (term frequency-inverse document frequency) is a measure of how important a term is to a document, weighted by how uncommon that term is across a large dataset. For a search engine like Google, the large dataset is the Google index. So if Google finds that a given keyword, let's say "non-alcoholic cocktail", appears in relatively few of the documents in its index, then that keyword might be given greater weight on a page in which it does appear.
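For reference, one common textbook form of the score is below. This is only a sketch of the general idea: the symbols (t for the term, d for the page, N for the number of documents in the dataset, df(t) for the number of documents containing t) are my own notation, and how Google actually weights terms is not public.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
```

Here tf(t, d) is how often t appears in d. The fewer documents contain t, the larger the logarithm, so the same number of occurrences counts for more.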
The same page that contains our example keyword, "non-alcoholic cocktail", might also contain another keyword, for example "cocktail", just as many times and even in the same elements, such as headings and the meta description. You might expect the two keywords to have equal importance in the eyes of the search engine, but due to TF-IDF this isn't necessarily true: "non-alcoholic cocktail" would be perceived as more important because it is scarcer relative to the greater dataset of the entire index.
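To make the comparison concrete, here is a minimal Python sketch using the textbook version of TF-IDF. The four-page "index", the `tf_idf` function name, and the choice to treat each keyword as a single token are all made up for illustration; this is not how Google computes anything, just a toy that shows the effect.

```python
import math

def tf_idf(term, doc, corpus):
    """Textbook TF-IDF for one term in one document.

    Each document is a list of tokens; a multi-word keyword is
    treated as a single token to keep the example short.
    """
    tf = doc.count(term) / len(doc)            # share of the page taken by the term
    df = sum(1 for d in corpus if term in d)   # number of pages containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical toy index: "cocktail" is on 3 of 4 pages,
# "non-alcoholic cocktail" is on only 1.
corpus = [
    ["cocktail", "non-alcoholic cocktail", "recipe"],
    ["cocktail", "bar", "menu"],
    ["cocktail", "party", "ideas"],
    ["garden", "tools", "review"],
]
page = corpus[0]

common = tf_idf("cocktail", page, corpus)
rare = tf_idf("non-alcoholic cocktail", page, corpus)
print(f"cocktail: {common:.3f}, non-alcoholic cocktail: {rare:.3f}")
```

Both keywords appear once on the page, so their term frequency is identical, but the rarer one ends up with the higher score because fewer pages in the toy index contain it.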