Multi-Stream Word-Based Compression Algorithm
Date
2017
Publisher
IEEE
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
In this article, we present a novel word-based lossless compression algorithm for text files that uses a semi-static model. We named it the Multi-stream Word-based Compression Algorithm (MWCA) because it stores the compressed forms of words in three separate streams according to their frequencies in the text. It also stores two dictionaries and a bit vector as side information. In our experiments, MWCA achieves an average compression ratio of 3.23 bits per character (bpc), and 2.88 bpc on files larger than 50 MB. If a variable-length encoder such as Huffman coding is applied after MWCA, these ratios fall to 2.63 and 2.44 bpc, respectively. Thanks to its multi-stream structure, MWCA could be a good solution, especially for storing and searching large text data.
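The abstract does not spell out how MWCA assigns words to its three streams, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a semi-static model in which a first pass ranks tokens by frequency and a second pass writes each token's rank as a 1-, 2- or 3-byte code into one of three streams, depending on how large the rank is. The names `LIMITS`, `tokenize`, `build_dictionary`, `encode` and `decode` are hypothetical, and the paper's two dictionaries and bit vector are replaced here by a single ranked token list plus a per-token stream selector list.

```python
import re
from collections import Counter

# Hypothetical thresholds (not from the paper): ranks below 2**8 use 1-byte
# codes in stream 0, below 2**16 use 2-byte codes in stream 1, and below
# 2**24 use 3-byte codes in stream 2.
LIMITS = (1 << 8, 1 << 16, 1 << 24)


def tokenize(text):
    """Split text into alternating word / non-word tokens (lossless)."""
    return re.findall(r"\w+|\W+", text)


def build_dictionary(tokens):
    """First pass of the semi-static model: rank tokens by descending frequency."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    rank_of = {tok: rank for rank, tok in enumerate(ranked)}
    return rank_of, ranked


def encode(text):
    """Second pass: route each token's rank to a stream by its size."""
    tokens = tokenize(text)
    rank_of, ranked = build_dictionary(tokens)
    streams = [bytearray(), bytearray(), bytearray()]
    selectors = []  # hypothetical side information: stream id of each token
    for tok in tokens:
        rank = rank_of[tok]
        for sid, limit in enumerate(LIMITS):
            if rank < limit:
                break
        else:
            raise ValueError("dictionary too large for this sketch")
        streams[sid] += rank.to_bytes(sid + 1, "big")
        selectors.append(sid)
    return streams, selectors, ranked


def decode(streams, selectors, ranked):
    """Rebuild the text by reading each token's code from its stream."""
    pos = [0, 0, 0]
    out = []
    for sid in selectors:
        width = sid + 1
        code = streams[sid][pos[sid]:pos[sid] + width]
        pos[sid] += width
        out.append(ranked[int.from_bytes(code, "big")])
    return "".join(out)
```

For example, `streams, selectors, ranked = encode(text)` followed by `decode(streams, selectors, ranked) == text` should hold for any string, and `8 * sum(len(s) for s in streams) / len(text)` gives a rough bits-per-character figure for the streams alone (ignoring the dictionary and selector overhead, which a real scheme must also count). Keeping frequent words in a stream of short, fixed-width codes is one plausible reason a multi-stream layout could help searching, since a word's code can be looked up without decoding the other streams.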
Description
2017 International Conference on Computer Science and Engineering (UBMK) -- OCT 05-08, 2017 -- Antalya, TURKEY
Keywords
Data Compression, Text Compression, Natural-Language Text
Source
2017 International Conference on Computer Science and Engineering (UBMK)
WoS Quartile
N/A
Scopus Quartile
N/A