Multi-Stream Word-Based Compression Algorithm

Date

2017

Publisher

IEEE

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this article, we present a novel word-based lossless compression algorithm for text files that uses a semi-static model. We named it the Multi-stream Word-based Compression Algorithm (MWCA) because it stores the compressed forms of the words in three separate streams, depending on their frequencies in the text. It also stores two dictionaries and a bit vector as side information. In our experiments, MWCA achieves a compression ratio of 3.23 bits per character (bpc) on average and 2.88 bpc on files larger than 50 MB. If a variable-length encoder such as Huffman coding is applied after MWCA, these ratios improve to 2.63 and 2.44 bpc, respectively. Thanks to its multi-stream structure, MWCA could be a good solution, especially for storing and searching large volumes of text data.
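The abstract does not give the exact stream layout, so the following is only a minimal sketch of the general idea: a semi-static, word-based coder that routes tokens to three streams by frequency rank. The tier sizes (254 and 65,536), the escape bytes 254 and 255, and the names `encode` and `decode` are illustrative assumptions, not details of MWCA. The bit vector mentioned in the abstract is omitted, and the third stream is left uncompressed for simplicity.

```python
import re
from collections import Counter

# Assumption: a token is a maximal run of word or non-word characters,
# so concatenating the tokens reproduces the text exactly.
TOKEN = re.compile(r"\w+|\W+")

def encode(text):
    """First pass counts token frequencies, second pass emits codes."""
    tokens = TOKEN.findall(text)
    ranked = [t for t, _ in Counter(tokens).most_common()]
    dict1 = ranked[:254]               # most frequent tokens: one-byte codes
    dict2 = ranked[254:254 + 65536]    # less frequent tokens: two-byte codes
    code1 = {t: i for i, t in enumerate(dict1)}
    code2 = {t: i for i, t in enumerate(dict2)}
    s1, s2, s3 = bytearray(), bytearray(), []
    for t in tokens:
        if t in code1:
            s1.append(code1[t])
        elif t in code2:
            s1.append(254)                       # escape: read stream 2
            s2 += code2[t].to_bytes(2, "big")
        else:
            s1.append(255)                       # escape: read stream 3
            s3.append(t)                         # rare token kept verbatim
    return dict1, dict2, bytes(s1), bytes(s2), s3

def decode(dict1, dict2, s1, s2, s3):
    """Rebuild the text by walking stream 1 and following the escapes."""
    out, i2, i3 = [], 0, 0
    for b in s1:
        if b < 254:
            out.append(dict1[b])
        elif b == 254:
            out.append(dict2[int.from_bytes(s2[i2:i2 + 2], "big")])
            i2 += 2
        else:
            out.append(s3[i3])
            i3 += 1
    return "".join(out)
```

Calling `decode(*encode(text))` should return `text` unchanged. Because the streams are byte-aligned in this sketch, each could be passed separately to a variable-length coder such as Huffman coding, which is the kind of second stage the abstract describes.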

Description

2017 International Conference on Computer Science and Engineering (UBMK), October 5-8, 2017, Antalya, Turkey

Keywords

Data Compression, Text Compression, Natural-Language Text

Source

2017 International Conference on Computer Science and Engineering (UBMK)

WoS Q Value

N/A

Scopus Q Value

N/A
