[TEXTOM Manual] TF-IDF value calculation formula

The TF-IDF calculation formula provided by TEXTOM is as follows:

TF: Term Frequency - the frequency of the word

ln: Natural logarithm
D: Total number of documents
DF: Number of documents containing the word
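
Taken together, these symbols match the common form of the measure. As an assumption about how TEXTOM combines them (the text defines only the individual symbols), the formula can be written as:

$$
\text{TF-IDF} = TF \times \ln\left(\frac{D}{DF}\right)
$$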

TF-IDF (Term Frequency – Inverse Document Frequency) is a statistical measure of how important a word is to a specific document within a set of documents. It is often used in conjunction with morphological analysis.

It assigns a score to every word used in a document. The more often a word appears in a specific document, and the fewer documents in the entire set contain that word, the higher its TF-IDF value. Using this value, common words that appear in all documents can therefore be filtered out, and the key terms of each document can be extracted.
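
As a worked illustration, assume the score is computed as TF × ln(D / DF), an assumption consistent with the symbols in the legend. Take a hypothetical set of D = 10 documents in which two words each appear 5 times in one document:

$$
\begin{aligned}
\text{word A } (DF = 10):&\quad 5 \times \ln(10/10) = 5 \times 0 = 0 \\
\text{word B } (DF = 2):&\quad 5 \times \ln(10/2) \approx 5 \times 1.609 \approx 8.05
\end{aligned}
$$

Word A appears in every document and scores 0, so it is filtered out. Word B is concentrated in only a few documents and stands out as a key term.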

- TF (Term Frequency): Represents how often the word appears in a specific document.

- IDF (Inverse Document Frequency): Based on the inverse of DF, i.e., the total number of documents divided by the number of documents in which the word appears. DF shows how commonly a word appears across the document set: a word that appears in many documents is common, so its IDF is low.
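
The two components above can be combined in a few lines of Python. This is a minimal sketch, not TEXTOM's implementation: it assumes the score is TF × ln(D / DF) with TF as a raw count, and it assumes the documents are already tokenized (for example, by a morphological analyzer). The function name `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs: list of token lists (already tokenized).
    Assumption: score = TF * ln(D / DF), with TF as a raw count.
    """
    total_docs = len(docs)  # D: total number of documents

    # DF: number of documents containing each word.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)  # TF: occurrences within this document
        scores.append({
            word: count * math.log(total_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores


# Hypothetical toy corpus of three tokenized documents.
docs = [
    ["data", "text", "mining", "text"],
    ["data", "network", "analysis"],
    ["data", "text", "report"],
]
scores = tf_idf(docs)
```

In this toy corpus, "data" occurs in all three documents, so its score is 0 in each of them, while "mining" occurs only in the first document and receives that document's highest score.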

Posted by TEXTOM
