Multimedia

Authors and titles for recent submissions

[ total of 26 entries; showing entries 1-25 ]

Fri, 24 May 2024

[1] arXiv:2405.14040 [pdf, other]: Title: Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin

Comments: 15 pages, 13 figures

Subjects: Multimedia (cs.MM)
[2] arXiv:2405.14709 (cross-list from cs.CV) [pdf, other]: Title: OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance

Authors: Shuheng Ge, Haoyu Xing, Li Zhang, Xiangqian Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[3] arXiv:2405.14598 (cross-list from cs.CV) [pdf, other]: Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2405.14556 (cross-list from cs.CV) [pdf, ps, other]: Title: Deep Learning Classification of Photoplethysmogram Signal for Hypertension Levels

Authors: Nida Nasir, Mustafa Sameer, Feras Barneih, Omar Alshaltone, Muneeb Ahmed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5] arXiv:2405.14312 (cross-list from cs.CV) [pdf, other]: Title: Improving Gloss-free Sign Language Translation by Reducing Representation Density

Authors: Jinhui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong

Comments: Representation Density and Performance Drop

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[6] arXiv:2405.14225 (cross-list from q-bio.QM) [pdf, other]: Title: ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

Comments: ACL 2024 Findings, 9 pages

Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[7] arXiv:2405.13984 (cross-list from cs.CL) [pdf, other]: Title: Feedback-aligned Mixed LLMs for Machine Language-Molecule Translation

Authors: Dimitris Gkoumas, Maria Liakata

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[8] arXiv:2405.13389 (cross-list from cs.CV) [pdf, other]: Title: HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera

Authors: Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong

Comments: 30 pages, 20 figures, 8 tables. This work was submitted for review in the second half of 2023. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO)
[9] arXiv:2405.13127 (cross-list from cs.CV) [pdf, other]: Title: Towards Retrieval-Augmented Architectures for Image Captioning

Authors: Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

Comments: ACM Transactions on Multimedia Computing, Communications and Applications (2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[10] arXiv:2405.13049 (cross-list from cs.CL) [pdf, other]: Title: SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

Authors: Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

Comments: 12 pages, 3 figures, 4 Tables

Journal-ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

Wed, 22 May 2024

[11] arXiv:2405.12775 [pdf, other]: Title: Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances

Authors: Hanlei Zhang, Hua Xu, Fei Long, Xin Wang, Kai Gao

Comments: Accepted by ACL 2024, Main Conference, Long Paper

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[12] arXiv:2405.12847 (cross-list from cs.IR) [pdf, other]: Title: A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Authors: Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:2405.12564 (cross-list from q-bio.QM) [pdf, other]: Title: ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

Authors: Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

Comments: ACL 2024, 9 pages

Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[14] arXiv:2405.12540 (cross-list from cs.CV) [pdf, other]: Title: Context-Enhanced Video Moment Retrieval with Large Language Models

Authors: Weijia Liu, Bo Miao, Jiuxin Cao, Xuelin Zhu, Bo Liu, Mehwish Nasim, Ajmal Mian

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[15] arXiv:2405.12512 (cross-list from cs.CV) [pdf, other]: Title: Rethink Predicting the Optical Flow with the Kinetics Perspective

Authors: Yuhao Cheng, Siru Zhang, Yiqiang Yan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[16] arXiv:2405.12336 (cross-list from cs.CR) [pdf, ps, other]: Title: Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography

Authors: John C. Simmons, Joseph M. Winograd

Comments: 17 pages, 9 figures. Submitted to IBC2024 Technical Papers Programme

Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)

Tue, 21 May 2024

[17] arXiv:2405.11742 [pdf, other]: Title: Universal Organizer of SAM for Unsupervised Semantic Segmentation

Authors: Tingting Li, Gensheng Pei, Xinhao Cai, Huafeng Liu, Qiong Wang, Yazhou Yao

Comments: accepted by IEEE International Conference on Multimedia & Expo

Subjects: Multimedia (cs.MM)
[18] arXiv:2405.12221 (cross-list from cs.CV) [pdf, other]: Title: Images that Sound: Composing Images and Sounds on a Single Canvas

Authors: Ziyang Chen, Daniel Geng, Andrew Owens

Comments: Project site: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.12126 (cross-list from cs.CV) [pdf, other]: Title: Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models

Authors: Nida Nasir, Muneeb Ahmed, Neda Afreen, Mustafa Sameer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Multimedia (cs.MM)
[20] arXiv:2405.11295 (cross-list from eess.IV) [pdf, ps, other]: Title: Medical Image Analysis for Detection, Treatment and Planning of Disease using Artificial Intelligence Approaches

Authors: Nand Lal Yadav, Satyendra Singh, Rajesh Kumar, Sudhakar Singh

Comments: 10 pages, 3 figures

Journal-ref: International Journal of Microsystems and IoT, Vol. 1, Issue 5, pp. 278-287, 2023

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[21] arXiv:2405.11273 (cross-list from cs.AI) [pdf, other]: Title: Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

Comments: 22 pages, 13 figures. Project Website: this https URL. Working in progress

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[22] arXiv:2405.11145 (cross-list from cs.CV) [pdf, other]: Title: Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

Authors: Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[23] arXiv:2405.11093 (cross-list from eess.AS) [pdf, other]: Title: AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

Authors: David Xu

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)

Mon, 20 May 2024

[24] arXiv:2405.10497 [pdf, other]: Title: SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

Authors: Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo

Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Fri, 17 May 2024 (showing first 1 of 2 entries)

[25] arXiv:2405.10029 [pdf, other]: Title: AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion

Authors: Ziyu Gong, Chengcheng Mai, Yihua Huang

Comments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024

Subjects: Multimedia (cs.MM)
