[TEXTOM Manual] TF-IDF value calculation formula

The TF-IDF calculation formula provided by TEXTOM is as follows:

$$\text{TF-IDF} = \text{TF} \times \ln\left(\frac{D}{\text{DF}}\right)$$

TF: Term Frequency - the frequency of the word
ln: Natural logarithm
D: Total number of documents
DF: Number of documents containing the word
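
As a quick check of these definitions, the sketch below evaluates TF × ln(D / DF) for a single word. The function name `tf_idf` and the sample numbers are hypothetical. It assumes TF, D, and DF have already been counted and that DF is at least 1, since a word that appears in no document has no defined IDF.

```python
import math


def tf_idf(tf: int, d: int, df: int) -> float:
    """Return TF x ln(D / DF) for one word (hypothetical helper, not TEXTOM code).

    tf: how often the word occurs (TF)
    d:  total number of documents (D)
    df: number of documents containing the word (DF), 1 <= df <= d
    """
    if not 1 <= df <= d:
        raise ValueError("DF must be between 1 and D")
    # math.log with a single argument is the natural logarithm (ln)
    return tf * math.log(d / df)


# A word seen 5 times that appears in 10 of 100 documents: 5 * ln(10)
print(round(tf_idf(5, 100, 10), 4))  # prints 11.5129
# A word that appears in every document: ln(100 / 100) = ln(1) = 0
print(tf_idf(3, 100, 100))  # prints 0.0
```

The second call shows why words present in every document drop out: their ratio D / DF is 1, and ln(1) is 0 regardless of TF.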

TF-IDF (Term Frequency – Inverse Document Frequency) is a statistical measure indicating how important a word is to a specific document within a set of documents. It is often used together with morphological analysis.

  • It is an algorithm that assigns a score to every word used in a document. The more often a word appears in a specific document, and the fewer documents in the entire set contain that word, the higher its TF-IDF value. Using this value, common words that appear in all documents can therefore be filtered out, and the key terms of each document can be extracted.
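
The filtering and keyword-extraction idea can be sketched as follows. This is a minimal illustration, not TEXTOM's implementation: it assumes each document has already been split into words (for example by morphological analysis) and that TF is counted within each document, as in the general description above. `top_keywords` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def top_keywords(docs: list[list[str]], k: int = 3) -> list[list[tuple[str, float]]]:
    """For each tokenized document, return up to k (word, TF-IDF) pairs, highest first.

    TF is counted within each document; DF is the number of documents containing
    the word. Words found in every document score TF * ln(D / D) = 0 and are dropped.
    """
    d = len(docs)
    # DF: count each word once per document that contains it
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: tf[w] * math.log(d / df[w]) for w in tf}
        # Drop zero-scoring (all-document) words, then sort by score descending
        ranked = sorted(
            ((w, s) for w, s in scores.items() if s > 0),
            key=lambda pair: (-pair[1], pair[0]),
        )
        results.append(ranked[:k])
    return results


docs = [
    ["price", "coffee", "coffee", "review"],
    ["price", "tea", "review", "tea"],
    ["price", "coffee", "cake"],
]
for keywords in top_keywords(docs, k=2):
    print([word for word, _ in keywords])
```

Here "price" occurs in all three documents, so it never appears as a keyword, while "coffee", "tea", and "cake" rank first in the documents where they are frequent or rare across the set.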

- TF (Term Frequency): Represents how often the word appears in the entire document.

- IDF (Inverse Document Frequency): Based on the inverse of DF, i.e., the total number of documents divided by the number of documents in which the word appears; the natural logarithm (ln) of this ratio is used. DF shows how widely a word is used across the document set: a word that appears in many documents is a common word, so its IDF is low.
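
To show how IDF separates common words from rare ones, the sketch below computes DF, the ratio D / DF, and its natural logarithm for every word in a small, made-up document set. Each document is represented as a set of words because only presence matters for DF. `idf_table` is a hypothetical helper; taking ln of the ratio follows the symbol definitions listed at the top of this page.

```python
import math


def idf_table(docs: list[set[str]]) -> dict[str, tuple[float, float]]:
    """Map each word to (D / DF, ln(D / DF)) over a collection of documents."""
    d = len(docs)
    vocab = set().union(*docs)
    table = {}
    for word in sorted(vocab):
        df = sum(1 for doc in docs if word in doc)  # documents containing the word
        ratio = d / df                              # total documents divided by DF
        table[word] = (ratio, math.log(ratio))      # IDF as ln(D / DF)
    return table


docs = [
    {"service", "delivery", "fast"},
    {"service", "price"},
    {"service", "delivery"},
    {"service", "refund"},
]
for word, (ratio, idf) in idf_table(docs).items():
    print(f"{word:10} D/DF={ratio:.2f}  ln(D/DF)={idf:.3f}")
```

"service" appears in all four documents, so its IDF is ln(1) = 0; "delivery" (two documents) gets ln(2), and words found in only one document get the highest value, ln(4).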
