A new metric for feature selection on short text datasets

dc.authorid0000-0002-7820-413X
dc.authorid0000-0002-4057-934X
dc.contributor.authorCekik, Rasim
dc.contributor.authorUysal, Alper Kursat
dc.date.accessioned2026-01-24T12:30:49Z
dc.date.available2026-01-24T12:30:49Z
dc.date.issued2022
dc.departmentAlanya Alaaddin Keykubat Üniversitesi
dc.description.abstractIn recent years, short texts have become ubiquitous, especially on social media networks. Short text classification is an essential task for various applications that operate on short text documents. In many cases, using the entire feature set causes a high-dimensionality problem in short text data. This problem makes classification time-consuming and degrades the performance of classifiers. This study presents an effective feature selection algorithm, called the XY method, which represents the features on an XY line and calculates the distance of each feature to that line. In addition, a value named lambda is calculated. Based on this value, the terms are divided into different regions, namely negative, positive, and third, to determine their discrimination capability. The novel XY method aims to select as few terms as possible from the negative region. The proposed method is evaluated on four different short text datasets using the Macro-F1 success measure. Comparisons with existing feature selection algorithms, such as chi-square, information gain, deviation from Poisson distribution, the recently proposed max-min ratio, and distinguishing feature selector, demonstrate that the XY method achieves either better or competitive performance at various, significantly reduced feature sizes.
dc.description.sponsorshipEskisehir Technical University, Fund of Scientific Research Projects [20DRP040]
dc.identifier.doi10.1002/cpe.6909
dc.identifier.issn1532-0626
dc.identifier.issn1532-0634
dc.identifier.issue13
dc.identifier.scopus2-s2.0-85126295552
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1002/cpe.6909
dc.identifier.urihttps://hdl.handle.net/20.500.12868/5448
dc.identifier.volume34
dc.identifier.wosWOS:000769577400001
dc.identifier.wosqualityQ3
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherWiley
dc.relation.ispartofConcurrency and Computation-Practice & Experience
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_WoS_20260121
dc.subjectfeature selection
dc.subjectshort text classification
dc.subjecttext mining
dc.titleA new metric for feature selection on short text datasets
dc.typeArticle
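The abstract describes scoring each term by its distance to an "XY line" but does not define the coordinates. The sketch below is purely illustrative and rests on stated assumptions, not on the authors' published formulas: it assumes a term is plotted at (x, y), where x is the fraction of target-class documents containing the term and y is the fraction of other-class documents containing it, and that the XY line is the diagonal y = x, on which a term is equally frequent in both groups and therefore non-discriminative. The helper names (`doc_freq_rates`, `distance_to_diagonal`, `score_terms`) are hypothetical. The lambda value and the negative/positive/third region rules are not reproduced, because the abstract does not specify them.

```python
import math


def doc_freq_rates(docs, labels, term, target):
    """Hypothetical coordinates (assumption, not the paper's definition):
    x = share of target-class documents containing the term,
    y = share of all other documents containing the term."""
    in_class = [d for d, lab in zip(docs, labels) if lab == target]
    out_class = [d for d, lab in zip(docs, labels) if lab != target]
    x = sum(term in d for d in in_class) / len(in_class) if in_class else 0.0
    y = sum(term in d for d in out_class) / len(out_class) if out_class else 0.0
    return x, y


def distance_to_diagonal(x, y):
    # Perpendicular distance from the point (x, y) to the line y = x,
    # written as x - y = 0, is |x - y| / sqrt(2).
    return abs(x - y) / math.sqrt(2)


def score_terms(docs, labels, target):
    """Rank every vocabulary term by its distance to the assumed diagonal,
    largest distance (most class-skewed term) first."""
    vocab = sorted(set().union(*docs))  # sorted so ties keep alphabetical order
    scores = {
        t: distance_to_diagonal(*doc_freq_rates(docs, labels, t, target))
        for t in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus: each short text is a set of tokens.
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "election"}, {"vote", "team"}]
    labels = ["sport", "sport", "politics", "politics"]
    for term, score in score_terms(docs, labels, "sport"):
        print(f"{term:10s} {score:.4f}")
```

In this toy corpus, "team" appears in half of each class and lands exactly on the diagonal with distance 0, while "goal" and "vote" are farthest from it. The absolute distance does not say which side of the line a term falls on; the sign of x - y does, and a side-aware rule is the kind of thing the paper's lambda and region split would address, which this sketch does not attempt.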
