A Static Dictionary-Based Approach To Compressing Short Texts
Date
2021
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
This study proposes Static Dictionary Compression (SDC), a method developed for compressing short texts. The word-based static dictionaries it uses are obtained from clusters produced by running a clustering method repeatedly until certain criteria are met. A short text is compressed with the dictionary that shares the largest number of words with it. Tests on datasets of short texts in 6 different languages show that the proposed method compresses better than the general-purpose compression methods Gzip, Bzip2, Zstd and PPMd. Tests on a dataset containing only English short texts show that SDC can also compress better than smaz, shoco and b64pack, which are designed for short texts, and better than Brotli, which performs well on short texts because it uses a static dictionary. © 2021 IEEE
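The dictionary-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample dictionaries, and the token format (an integer index for a dictionary word, the literal string otherwise) are assumptions made for illustration, and the abstract does not specify how SDC actually encodes its output.

```python
def select_dictionary(text, dictionaries):
    """Return the index of the dictionary sharing the most distinct words with text.

    Ties are resolved in favour of the earliest dictionary (an assumption).
    """
    words = set(text.split())
    overlaps = [len(words & set(d)) for d in dictionaries]
    return max(range(len(dictionaries)), key=overlaps.__getitem__)


def encode(text, dictionaries):
    """Replace each word found in the chosen dictionary by its index.

    Words missing from the dictionary are kept as literal strings
    (hypothetical token format, chosen only for illustration).
    """
    idx = select_dictionary(text, dictionaries)
    lookup = {word: i for i, word in enumerate(dictionaries[idx])}
    tokens = [lookup.get(word, word) for word in text.split()]
    return idx, tokens


def decode(idx, tokens, dictionaries):
    """Invert encode() for text whose words are separated by single spaces."""
    dictionary = dictionaries[idx]
    return " ".join(dictionary[t] if isinstance(t, int) else t for t in tokens)


# Hypothetical word-based dictionaries, standing in for the cluster-derived ones.
DICTIONARIES = [
    ["the", "weather", "is", "sunny", "today"],
    ["goal", "match", "team", "score", "the"],
]

idx, tokens = encode("the team won the match today", DICTIONARIES)
print(idx, tokens)
print(decode(idx, tokens, DICTIONARIES))
```

A real compressor would additionally pack the dictionary index and the tokens into a compact binary form; that step is left out here because the abstract gives no details about it.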
Description
6th International Conference on Computer Science and Engineering, UBMK 2021 -- 15 September 2021 through 17 September 2021 -- 176826
Keywords
K-Means; K-Means Clustering; Machine Learning; SDC; Short Text Compression; Short Texts; Static Dictionary Compression; Dictionary Compression; Statistical Tests; Clustering Methods; Compression Methods; Text Compressions
Source
Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021
WoS Q Value
Scopus Q Value
N/A