A Static Dictionary-Based Approach To Compressing Short Texts

Date

2021

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This study proposes Static Dictionary Compression (SDC), a method for compressing short texts. The word-based static dictionaries it uses are built from the clusters obtained by running a clustering method repeatedly until certain criteria are met. Each short text is then compressed with the dictionary that shares the largest number of words with it. Tests on datasets of short texts in six different languages show that the proposed method compresses better than the general-purpose compressors Gzip, Bzip2, Zstd and PPMd. Tests on a dataset containing only English short texts further show that SDC can compress better than smaz, shoco and b64pack, which are designed for short texts, and better than Brotli, which performs well on short texts because it uses a static dictionary. © 2021 IEEE
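The dictionary-selection idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy dictionaries, the space-based tokenization, the limit of 255 words per dictionary, and the escape-byte format are all assumptions made for illustration, and the clustering step that builds the dictionaries is not shown. The sketch picks the dictionary sharing the most distinct words with the text, writes its index as a one-byte header, replaces each dictionary word with a one-byte code, and stores any other word as an escaped literal.

```python
# Hypothetical sketch of static-dictionary short-text compression.
# Assumptions (not from the paper): tokens are split on single spaces,
# at most 256 dictionaries, codes 0..254 address dictionary words, and
# byte 255 escapes a literal token stored as <length byte><UTF-8 bytes>.

ESCAPE = 255  # marks a literal (out-of-dictionary) token


def select_dictionary(text: str, dictionaries: list[list[str]]) -> int:
    """Return the index of the dictionary sharing the most distinct words with text."""
    words = set(text.split(" "))
    return max(range(len(dictionaries)),
               key=lambda i: len(words & set(dictionaries[i])))


def compress(text: str, dictionaries: list[list[str]]) -> bytes:
    if len(dictionaries) > 256:
        raise ValueError("the one-byte header allows at most 256 dictionaries")
    d = select_dictionary(text, dictionaries)
    index = {w: i for i, w in enumerate(dictionaries[d])}
    out = bytearray([d])  # header: which dictionary was used
    for token in text.split(" "):
        i = index.get(token)
        if i is not None and i < ESCAPE:
            out.append(i)  # dictionary word: one byte
        else:
            raw = token.encode("utf-8")
            if len(raw) > 255:
                raise ValueError("token too long for this sketch")
            out += bytes([ESCAPE, len(raw)]) + raw  # escaped literal
    return bytes(out)


def decompress(data: bytes, dictionaries: list[list[str]]) -> str:
    words = dictionaries[data[0]]
    tokens, pos = [], 1
    while pos < len(data):
        code = data[pos]
        if code == ESCAPE:
            n = data[pos + 1]
            tokens.append(data[pos + 2:pos + 2 + n].decode("utf-8"))
            pos += 2 + n
        else:
            tokens.append(words[code])
            pos += 1
    return " ".join(tokens)  # single-space split/join round-trips exactly


# Usage with two toy dictionaries (hypothetical data).
dictionaries = [
    ["the", "cat", "sat", "on", "mat"],
    ["see", "you", "at", "noon", "tomorrow"],
]
msg = "see you tomorrow at the station"
blob = compress(msg, dictionaries)
assert decompress(blob, dictionaries) == msg
print(len(msg.encode("utf-8")), len(blob))  # 31 19
```

Here the second dictionary is chosen because it shares four words with the message, while the first shares only "the". Words outside the chosen dictionary cost two bytes more than their raw length, so in this sketch the gain depends entirely on how well the selected dictionary matches the text.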

Description

6th International Conference on Computer Science and Engineering, UBMK 2021 -- 15 September 2021 through 17 September 2021 -- 176826

Keywords

K-Means; K-Means Clustering; Machine Learning; SDC; Short Text Compression; Short Texts; Static Dictionary Compression; Statistical Tests; Clustering Methods; Compression Methods; Dictionary Compression; Text Compression

Source

Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021

Scopus Q Value

N/A