Stemming

Stemming is a feature that shows you more search results that might be relevant to you. For example, if you search for “print”, it also finds documents containing “prints”, “printer”, “printed”, “printers”, or “printing”. 

Of course, you might just want pages containing the exact word “print”, in which case you can simply uncheck the Stemming checkbox!

Some supermarket websites include this feature. When you search for “apples”, they often show you pages for apple juice, apple cider, apple cake, and apple pies before you get to the page where you can buy a bag of apples 🤦

TARILIO includes stemming for 30 languages.

Technically, stemming improves search recall, meaning it will return more search results that might be relevant to you. 
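
As a rough illustration of what recall means, the sketch below compares an exact-match search with a stemmed search. The document names and hit sets are invented for the example and do not come from TARILIO; the recall function is simply the usual definition (relevant documents found divided by all relevant documents).

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that the search actually returned."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical data: four documents are relevant to a search for "print".
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Exact matching only finds the document containing the word "print" itself.
exact_hits = {"doc1"}

# Stemming also finds documents containing "prints", "printing", and so on.
# doc5 is a hit that is not actually relevant (like the apple juice page).
stemmed_hits = {"doc1", "doc2", "doc3", "doc5"}

exact_recall = recall(exact_hits, relevant)      # 1 of 4 relevant documents
stemmed_recall = recall(stemmed_hits, relevant)  # 3 of 4 relevant documents
```

In this made-up case recall rises from 0.25 to 0.75 with stemming, although one of the extra hits is not relevant.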

Stemming rule files are plain text files containing a list of several types of rule:

Suffix exceptions: e.g. ss -> ss, or news -> news
The first rule never removes ss from a word, and the second ensures news remains intact. Both rules prevent certain words from being stripped of their final s by the rule 3+s -> s, which is designed to change plurals into the singular form. These types of rules are placed early in the list.

Suffix stripping: e.g. 4+ing ->
This rule removes ing from the end of a word with seven or more letters, that is, at least four letters followed by ing.

Suffix substitution: e.g. 3+ies -> y
This rule converts applies into apply, for example.

The rules are tried in order until one is found that applies. After a rule is applied, the process is repeated until the word no longer changes.

Another way of improving search recall is by using synonyms.