Deep Learning-Based Context-Aware Video Content Analysis on IoT Devices

dc.authorid: Cengiz, Korhan/0000-0001-6594-8861
dc.authorid: Mokhtar, Bassem/0000-0002-7138-4721
dc.authorid: Gad, Gad/0000-0001-9177-9950
dc.authorid: Gad, Eyad/0000-0003-0982-3065
dc.authorwosid: Cengiz, Korhan/HTN-8060-2023
dc.authorwosid: Mokhtar, Bassem/HOA-9402-2023
dc.authorwosid: Mokhtar, Bassem/B-3798-2016
dc.contributor.author: Gad, Gad
dc.contributor.author: Gad, Eyad
dc.contributor.author: Cengiz, Korhan
dc.contributor.author: Fadlullah, Zubair
dc.contributor.author: Mokhtar, Bassem
dc.date.accessioned: 2024-06-12T11:17:06Z
dc.date.available: 2024-06-12T11:17:06Z
dc.date.issued: 2022
dc.department: Trakya Üniversitesi
dc.description.abstract: Integrating machine learning with the Internet of Things (IoT) enables many useful applications. For IoT applications that incorporate video content analysis (VCA), deep learning models are typically used because of their capacity to encode the high-dimensional spatial and temporal representations of videos. However, limited energy and computation resources present a major challenge. Video captioning is a type of VCA that describes a video with one or more sentences. This work proposes an IoT-oriented deep learning framework for video captioning that can (1) mine large open-domain video-to-text datasets to extract video-caption pairs belonging to a particular domain; (2) preprocess the selected video-caption pairs, including reducing the complexity of the captions' language model to improve performance; and (3) propose two deep learning models: a transformer-based model and an LSTM-based model. Hyperparameter tuning is performed to select the best hyperparameters, and the models are evaluated in terms of accuracy and inference time on different platforms. The presented framework generates captions in standard sentence templates to facilitate extracting information in later stages of the analysis. The two developed deep learning models offer a trade-off between accuracy and speed: while the transformer-based model yields a high accuracy of 97%, the LSTM-based model achieves near-real-time inference.
dc.description.sponsorship: Vector Institute through the VI Scholarship in AI
dc.description.sponsorship: This research is partially funded by the Vector Institute through the VI Scholarship in AI.
dc.identifier.doi: 10.3390/electronics11111785
dc.identifier.issn: 2079-9292
dc.identifier.issue: 11
dc.identifier.scopus: 2-s2.0-85131188064
dc.identifier.scopusquality: Q2
dc.identifier.uri: https://doi.org/10.3390/electronics11111785
dc.identifier.uri: https://hdl.handle.net/20.500.14551/24579
dc.identifier.volume: 11
dc.identifier.wos: WOS:000808827100001
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: MDPI
dc.relation.ispartof: Electronics
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Video Content Analysis
dc.subject: LSTM
dc.subject: Transformer-Based Model
dc.subject: Video Captioning
dc.subject: Internet of Things (IoT)
dc.title: Deep Learning-Based Context-Aware Video Content Analysis on IoT Devices
dc.type: Article

Files