Multi-Stream Word-Based Compression Algorithm

dc.authorid: Öztürk, Emir/0000-0002-3734-5171
dc.authorid: Mesut, Altan/0000-0002-1477-3093
dc.authorwosid: Öztürk, Emir/Z-1726-2018
dc.authorwosid: Mesut, Altan/AAE-8734-2019
dc.authorwosid: Diri, Banu/AAA-1020-2021
dc.contributor.author: Ozturk, Emir
dc.contributor.author: Mesut, Altan
dc.contributor.author: Diri, Banu
dc.date.accessioned: 2024-06-12T10:59:09Z
dc.date.available: 2024-06-12T10:59:09Z
dc.date.issued: 2017
dc.department: Trakya Üniversitesi
dc.description: 2017 International Conference on Computer Science and Engineering (UBMK), Oct 05-08, 2017, Antalya, Turkey
dc.description.abstract: In this article, we present a novel word-based lossless compression algorithm for text files that uses a semi-static model. We named it the Multi-stream Word-based Compression Algorithm (MWCA) because it stores the compressed forms of words in three separate streams according to their frequencies in the text. It also stores two dictionaries and a bit vector as side information. In our experiments, MWCA achieves a compression rate of 3.23 bpc (bits per character) on average and 2.88 bpc on files larger than 50 MB. If a variable-length encoder such as Huffman coding is applied after MWCA, these rates drop to 2.63 and 2.44 bpc, respectively. Thanks to its multi-stream structure, MWCA may be a good solution, especially for storing and searching large text data.
dc.description.sponsorship: IEEE Adv Technol Human, Istanbul Teknik Univ, Gazi Univ, Atilim Univ, TBV, Akdeniz Univ, Tmmob Bilgisayar Muhendisleri Odasi
dc.identifier.endpage: 37
dc.identifier.isbn: 978-1-5386-0930-9
dc.identifier.scopus: 2-s2.0-85040605764
dc.identifier.scopusquality: N/A
dc.identifier.startpage: 34
dc.identifier.uri: https://hdl.handle.net/20.500.14551/20337
dc.identifier.wos: WOS:000426856900007
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: tr
dc.publisher: IEEE
dc.relation.ispartof: 2017 International Conference on Computer Science and Engineering (UBMK)
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Data Compression
dc.subject: Text Compression
dc.subject: Natural-Language Text
dc.title: Multi-Stream Word-Based Compression Algorithm
dc.type: Conference Object
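The abstract reports its results in bits per character (bpc), where lower is better. The minimal sketch below shows how such a figure is usually computed from file sizes. It assumes the original text is measured in characters of one byte each and the compressed output in bytes; the function names and the example sizes are hypothetical and are not taken from the paper or its implementation.

```python
def bits_per_character(compressed_bytes: int, original_chars: int) -> float:
    """Compression rate in bits per character (bpc).

    Hypothetical helper: assumes the compressed size is given in bytes
    (8 bits each) and the original size in characters. Lower is better.
    """
    if original_chars <= 0:
        raise ValueError("original_chars must be positive")
    return 8 * compressed_bytes / original_chars


def size_fraction_from_bpc(bpc: float, bits_per_original_char: int = 8) -> float:
    """Fraction of the original size that remains after compression.

    Assumes every original character occupied bits_per_original_char bits
    (8 for single-byte text encodings).
    """
    return bpc / bits_per_original_char


if __name__ == "__main__":
    # Hypothetical sizes: a 1,000,000-character file compressed to 403,750 bytes.
    print(bits_per_character(403_750, 1_000_000))  # 3.23
    print(size_fraction_from_bpc(3.23))
```

Under the same one-byte-per-character assumption, 3.23 bpc corresponds to roughly 40.4% of the original size, and 2.44 bpc to about 30.5%.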
