Improve Term Weighting for Text Classification

Authors

  • Abdulmohsen Algarni

Keywords:

Feature Selection, Classification, Term Weighting

Abstract

A large number of features can be extracted from text documents. The extracted features are a mix of positive, negative, and noise features, so improving their quality is a challenging task. The feature selection technique commonly used in text classification is the term-based approach. Term Frequency-Inverse Document Frequency (TF-IDF) is widely used to extract all features from text documents. Based on the weights of the extracted features, the most important features can be selected. However, this selection depends only on the frequency of a term in the documents, regardless of the importance of the feature. In this paper, we propose a new method based on TF-IDF that improves the quality of the extracted features and revises their weights. The terms extracted from the text are first classified by importance. The weight of each feature is then revised according to the class the feature belongs to. The proposed model shows a significant improvement in the evaluation measures, averaging 3.6 in F-measure.
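The general idea described in the abstract can be illustrated with a minimal sketch. It computes standard TF-IDF weights and then rescales each weight by a factor tied to the term's importance class. The abstract does not say how terms are classified or how the weights are revised, so the class labels, the multipliers in `CLASS_FACTOR`, and the function names `tf_idf` and `revise` are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the textbook form tf * log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical multipliers per importance class (assumed, not from the paper).
CLASS_FACTOR = {"positive": 1.5, "negative": 0.5, "noise": 0.0}


def revise(weights, term_class):
    """Rescale each TF-IDF weight by the factor of the term's class.

    Terms without a class label keep their original weight.
    """
    return [
        {
            term: w * CLASS_FACTOR.get(term_class.get(term), 1.0)
            for term, w in doc_weights.items()
        }
        for doc_weights in weights
    ]


docs = [["a", "b"], ["a", "c"]]
term_class = {"b": "positive", "c": "noise"}  # assumed labels
revised = revise(tf_idf(docs), term_class)
```

In this sketch a term that occurs in every document receives a TF-IDF weight of zero before any revision, and a term labeled as noise is zeroed regardless of its frequency. How the classes are actually assigned is left open here.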

 

Published

2023-11-14

Section

Articles