The descriptors are called tags, and the automatic assignment of
descriptors to the given tokens is called tagging.
The process of assigning a part of speech to each word is called
part-of-speech tagging, commonly referred to as POS tagging. Parts of speech
include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their
sub-categories.
A tagger (POS tagger) is a piece of software that reads text and assigns a
part of speech, such as noun, verb, or adjective, to each word (and other
token). It draws on different kinds of information, such as dictionaries,
lexicons, and rules. A dictionary lists the category or categories of each
word, so a word may belong to more than one category. For example, "run" is
both a noun and a verb; to resolve this ambiguity, taggers can use
probabilistic information.
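
As a minimal sketch of the dictionary lookup described above, the Python
snippet below maps each word to the set of categories it may belong to and
reports which tokens remain ambiguous. The lexicon entries, the tag names
(DET, NOUN, VERB, ADJ, UNK) and the function names are assumptions made for
illustration; they are not taken from any particular tagger.

```python
# Toy lexicon: each word maps to the set of categories it may belong to.
# Entries and tag names are illustrative assumptions, not a real resource.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "dogs": {"NOUN"},
    "long": {"ADJ"},
    "run": {"NOUN", "VERB"},  # ambiguous: both noun and verb
}


def candidate_tags(tokens, lexicon=LEXICON, unknown="UNK"):
    """Return (token, sorted candidate tags) pairs using dictionary lookup only."""
    return [(tok, sorted(lexicon.get(tok.lower(), {unknown}))) for tok in tokens]


def ambiguous_tokens(tokens, lexicon=LEXICON):
    """Return the tokens whose lexicon entry lists more than one category."""
    return [tok for tok in tokens if len(lexicon.get(tok.lower(), ())) > 1]


if __name__ == "__main__":
    tokens = "The dogs run".split()
    print(candidate_tags(tokens))
    # [('The', ['DET']), ('dogs', ['NOUN']), ('run', ['NOUN', 'VERB'])]
    print(ambiguous_tokens(tokens))
    # ['run']
```

Lookup alone leaves "run" with two candidate tags, which is why a tagger
needs an additional source of information to pick one.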
There are mainly
two types of taggers:
Rule-based – Uses
hand-written rules to resolve tag ambiguity.
Stochastic – Taggers are either HMM-based, choosing the tag sequence that
maximizes the product of word likelihood and tag sequence probability, or
cue-based, using decision trees or maximum entropy models to combine
probabilistic features.
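
The two approaches can be contrasted on a toy example. The Python sketch below
is illustrative only: the lexicon, the single hand-written rule, and all
probability values are made-up assumptions, and the names rule_based_tag and
viterbi are hypothetical. The rule-based function resolves ambiguity with one
contextual rule, while the HMM function uses the Viterbi algorithm to find the
tag sequence maximizing the product of P(word | tag) (word likelihood) and
P(tag | previous tag) (tag sequence probability).

```python
import math

TAGS = ["DET", "NOUN", "VERB"]

# Hypothetical lexicon; "run" is ambiguous between NOUN and VERB.
LEXICON = {"the": {"DET"}, "dogs": {"NOUN"}, "run": {"NOUN", "VERB"}}

# Made-up HMM parameters, chosen only to illustrate the computation.
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}            # P(tag at position 0)
TRANS = {                                                 # P(tag | previous tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {                                                  # P(word | tag)
    "DET": {"the": 1.0},
    "NOUN": {"dogs": 0.7, "run": 0.3},
    "VERB": {"run": 1.0},
}


def rule_based_tag(tokens):
    """Tag with one hand-written rule: after a determiner an ambiguous
    word is a NOUN; otherwise prefer VERB when it is a candidate."""
    tags = []
    for tok in tokens:
        options = LEXICON.get(tok.lower(), {"NOUN"})  # unknown words default to NOUN
        if len(options) == 1:
            tag = next(iter(options))
        elif tags and tags[-1] == "DET" and "NOUN" in options:
            tag = "NOUN"
        elif "VERB" in options:
            tag = "VERB"
        else:
            tag = sorted(options)[0]
        tags.append(tag)
    return tags


def _log(p):
    """Log of a probability, with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the tag sequence with the highest product of word
    likelihoods and tag transition probabilities (computed in log space)."""
    if not tokens:
        return []
    words = [t.lower() for t in tokens]
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (_log(START[t]) + _log(EMIT[t].get(words[0], 0.0)), None) for t in TAGS}]
    for w in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[-1][p][0] + _log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + _log(EMIT[t].get(w, 0.0)), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(rule_based_tag("the dogs run".split()))  # ['DET', 'NOUN', 'VERB']
    print(viterbi("the dogs run".split()))         # ['DET', 'NOUN', 'VERB']
    print(viterbi("the run".split()))              # ['DET', 'NOUN']
```

With these assumed numbers, both functions tag "run" as a verb after "dogs"
and as a noun directly after "the". In a real tagger the probabilities would
be estimated from a tagged corpus rather than written by hand.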