Multimedia

Authors and titles for cs.MM in Mar 2024

[ total of 114 entries: 1-114 ]

[1] arXiv:2403.00752 [pdf, other]: Title: An Experimental Study of Low-Latency Video Streaming over 5G

Authors: Imran Khan, Tuyen X. Tran, Matti Hiltunen, Theodore Karagioules, Dimitrios Koutsonikolas

Comments: 6 Pages

Subjects: Multimedia (cs.MM); Performance (cs.PF)
[2] arXiv:2403.01087 [pdf, other]: Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Authors: Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri

Comments: 8 pages of content, 1 page of references and 4 figures

Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:2403.02693 [pdf, other]: Title: Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming

Authors: Lei Zhang, Tao Long, Weizhen Xu, Laizhong Cui, Jiangchuan Liu

Comments: 14 pages

Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[4] arXiv:2403.02905 [pdf, other]: Title: MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

Subjects: Multimedia (cs.MM)
[5] arXiv:2403.03170 [pdf, other]: Title: SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Authors: Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee

Comments: To appear in CVPR 2024

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[6] arXiv:2403.05060 [pdf, other]: Title: Multimodal Infusion Tuning for Large Models

Authors: Hao Sun, Yu Song, Jihong Hu, Xinyao Yu, Jiaqing Liu, Yen-Wei Chen, Lanfen Lin

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[7] arXiv:2403.05427 [pdf, other]: Title: Reply with Sticker: New Dataset and Model for Sticker Retrieval

Authors: Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Kam-Fai Wong, Ruifeng Xu

Subjects: Multimedia (cs.MM)
[8] arXiv:2403.05428 [pdf, other]: Title: Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

Authors: Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Ruifeng Xu

Subjects: Multimedia (cs.MM)
[9] arXiv:2403.05628 [pdf, other]: Title: AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking

Authors: Saeed Ranjbar Alvar, Mohammad Akbari, David (Ming Xuan) Yue, Lingyang Chu, Yong Zhang

Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
[10] arXiv:2403.05834 [pdf, other]: Title: Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

Authors: Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2403.05851 [pdf, other]: Title: Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks

Authors: Baojie Fu, Tong Tang, Dapeng Wu, Ruyan Wang

Subjects: Multimedia (cs.MM); Emerging Technologies (cs.ET)
[12] arXiv:2403.06660 [pdf, other]: Title: FashionReGen: LLM-Empowered Fashion Report Generation

Authors: Yujuan Ding, Yunshan Ma, Wenqi Fan, Yige Yao, Tat-Seng Chua, Qing Li

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[13] arXiv:2403.10406 [pdf, other]: Title: Deep Bi-directional Attention Network for Image Super-Resolution Quality Assessment

Authors: Yixiao Li, Xiaoyuan Yang, Jun Fu, Guanghui Yue, Wei Zhou

Comments: 7 pages, 3 figures, published to 2024 IEEE International Conference on Multimedia and Expo (ICME)

Subjects: Multimedia (cs.MM)
[14] arXiv:2403.10943 [pdf, other]: Title: MIntRec 2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

Authors: Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen

Comments: Accepted by ICLR 2024, Long Paper; The abstract is slightly modified due to the length limitation

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[15] arXiv:2403.10976 [pdf, other]: Title: Quality-Aware Dynamic Resolution Adaptation Framework for Adaptive Video Streaming

Authors: Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon, Adam Wieckowski, Benjamin Bross, Detlev Marpe

Comments: ACM MMSys '24 | Open-Source Software and Dataset. arXiv admin note: substantial text overlap with arXiv:2401.15346

Subjects: Multimedia (cs.MM)
[16] arXiv:2403.11241 [pdf, other]: Title: Fidelity-preserving Learning-Based Image Compression: Loss Function and Subjective Evaluation Methodology

Authors: Shima Mohammadi, Yaojun Wu, João Ascenso

Comments: 5 pages, 6 figures. In 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP)

Subjects: Multimedia (cs.MM)
[17] arXiv:2403.11700 [pdf, other]: Title: Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing

Authors: Juan Zhang, Jiahao Chen, Cheng Wang, Zhiwang Yu, Tangquan Qi, Can Liu, Di Wu

Subjects: Multimedia (cs.MM)
[18] arXiv:2403.11757 [pdf, other]: Title: Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2403.12053 [src]: Title: PiGW: A Plug-in Generative Watermarking Framework

Authors: Rui Ma, Mengxi Guo, Li Yuming, Hengyuan Zhang, Cong Ma, Yuan Li, Xiaodong Xie, Shanghang Zhang

Comments: Improve experimental content

Subjects: Multimedia (cs.MM)
[20] arXiv:2403.12618 [pdf, other]: Title: NewsCaption: Named-Entity aware Captioning for Out-of-Context Media

Authors: Anurag Singh, Shivangi Aneja

Subjects: Multimedia (cs.MM); Social and Information Networks (cs.SI)
[21] arXiv:2403.12667 [pdf, other]: Title: ICE: Interactive 3D Game Character Editing via Dialogue

Authors: Haoqian Wu, Yunjie Wu, Zhipeng Hu, Lincheng Li, Weijie Chen, Rui Zhao, Changjie Fan, Xin Yu

Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[22] arXiv:2403.15226 [pdf, other]: Title: Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models

Authors: Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[23] arXiv:2403.15256 [pdf, other]: Title: Experimental Studies of Metaverse Streaming

Authors: Haopeng Wang, Roberto Martinez-Velazquez, Haiwei Dong, Abdulmotaleb El Saddik

Comments: Accepted by IEEE Consumer Electronics Magazine

Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[24] arXiv:2403.16951 [pdf, other]: Title: Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, SDN, and MEC

Authors: Reza Farahani

Comments: PhD thesis defended on 22.08.2023 (this https URL)

Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[25] arXiv:2403.16985 [pdf, other]: Title: Towards Low-Latency and Energy-Efficient Hybrid P2P-CDN Live Video Streaming

Authors: Reza Farahani, Christian Timmerer, Hermann Hellwagner

Comments: 6 pages, 3 figures, Special Issue on Sustainable Multimedia Communications and Services, IEEE MMTC Communications

Subjects: Multimedia (cs.MM)
[26] arXiv:2403.17420 [pdf, other]: Title: Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

Authors: Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2403.19002 [pdf, other]: Title: Robust Active Speaker Detection in Noisy Environments

Authors: Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

Comments: 15 pages, 5 figures

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2403.20194 [pdf, other]: Title: ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models

Authors: Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, Kaipeng Zhang

Subjects: Multimedia (cs.MM)
[29] arXiv:2403.00781 (cross-list from cs.IR) [pdf, other]: Title: ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework

Authors: Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Abbasian, Iman Azimi, Ramesh Jain, Amir M. Rahmani

Comments: Accepted by The IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 2024

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[30] arXiv:2403.01700 (cross-list from cs.SD) [pdf, other]: Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[31] arXiv:2403.02707 (cross-list from cs.CV) [pdf, other]: Title: Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation

Authors: Gang Liu, Hongyang Li, Zerui He, Shenjun Zhong

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[32] arXiv:2403.03095 (cross-list from cs.CV) [pdf, other]: Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Comments: Accepted To ICASSP2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2403.03145 (cross-list from cs.CV) [pdf, other]: Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Comments: Accepted to NeurIPS2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2403.03740 (cross-list from cs.CV) [pdf, other]: Title: Self-supervised Photographic Image Layout Representation Learning

Authors: Zhaoran Zhao, Peng Lu, Xujun Peng, Wenhao Guo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[35] arXiv:2403.04245 (cross-list from cs.SD) [pdf, other]: Title: A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

Comments: Accepted by CVPR 2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[36] arXiv:2403.04321 (cross-list from cs.CV) [pdf, other]: Title: Discriminative Probing and Tuning for Text-to-Image Generation

Authors: Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua

Comments: CVPR 2024; project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[37] arXiv:2403.04523 (cross-list from cs.CV) [pdf, other]: Title: T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

Authors: Mariano V. Ntrougkas, Nikolaos Gkalelis, Vasileios Mezaris

Comments: Under review

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[38] arXiv:2403.05050 (cross-list from cs.CV) [pdf, other]: Title: DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception

Authors: Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu

Comments: Project: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[39] arXiv:2403.05105 (cross-list from cs.CV) [pdf, other]: Title: Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[40] arXiv:2403.05192 (cross-list from cs.NI) [pdf, other]: Title: An End-to-End Pipeline Perspective on Video Streaming in Best-Effort Networks: A Survey and Tutorial

Authors: Leonardo Peroni, Sergey Gorinsky

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[41] arXiv:2403.05261 (cross-list from cs.CV) [pdf, other]: Title: Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Authors: Hailang Huang, Zhijie Nie, Ziqiao Wang, Ziyu Shang

Comments: 9 pages, Accepted by AAAI2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42] arXiv:2403.05658 (cross-list from cs.CV) [pdf, other]: Title: Feature CAM: Interpretable AI in Image Classification

Authors: Frincy Clement, Ji Yang, Irene Cheng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[43] arXiv:2403.05768 (cross-list from cs.CV) [pdf, other]: Title: Deep Contrastive Multi-view Clustering under Semantic Feature Guidance

Authors: Siwen Liu, Jinyan Liu, Hanning Yuan, Qi Li, Jing Geng, Ziqiang Yuan, Huaxu Han

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[44] arXiv:2403.06324 (cross-list from cs.NI) [pdf, other]: Title: ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

Authors: Sami Khairy, Gabriel Mittag, Vishak Gopal, Francis Y. Yan, Zhixiong Niu, Ezra Ameri, Scott Inglis, Mehrsa Golestaneh, Ross Cutler

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[45] arXiv:2403.06497 (cross-list from cs.CV) [pdf, other]: Title: QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

Authors: Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[46] arXiv:2403.06776 (cross-list from cs.HC) [pdf, other]: Title: Born to Run, Programmed to Play: Mapping the Extended Reality Exergames Landscape

Authors: Sukran Karaosmanoglu, Sebastian Cmentowski, Lennart E. Nacke, Frank Steinicke

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[47] arXiv:2403.07338 (cross-list from cs.IT) [pdf, ps, other]: Title: D$^2$-JSCC: Digital Deep Joint Source-channel Coding for Semantic Communications

Authors: Jianhao Huang, Kai Yuan, Chuan Huang, Kaibin Huang

Subjects: Information Theory (cs.IT); Multimedia (cs.MM); Signal Processing (eess.SP)
[48] arXiv:2403.07613 (cross-list from cs.HC) [pdf, other]: Title: Imagine a dragon made of seaweed: How images enhance learning in Wikipedia

Authors: Anita Silva, Maria Tracy, Katharina Reinecke, Eytan Adar, Miriam Redi

Comments: 16 pages, 10 figures

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[49] arXiv:2403.07839 (cross-list from cs.CV) [pdf, other]: Title: MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Authors: Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun

Comments: 18 pages, 8 figures, Published in CVPR2024

Journal-ref: In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[50] arXiv:2403.07938 (cross-list from cs.SD) [pdf, other]: Title: Text-to-Audio Generation Synchronized with Videos

Authors: Shentong Mo, Jing Shi, Yapeng Tian

Comments: arXiv admin note: text overlap with arXiv:2305.12903

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[51] arXiv:2403.07952 (cross-list from cs.CV) [pdf, other]: Title: AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

Comments: 22 pages, 13 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[52] arXiv:2403.08580 (cross-list from cs.CV) [pdf, other]: Title: Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

Authors: Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen

Comments: 5 pages, 5 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2309.07361

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[53] arXiv:2403.08773 (cross-list from cs.CV) [pdf, other]: Title: Veagle: Advancements in Multimodal Representation Learning

Authors: Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[54] arXiv:2403.08806 (cross-list from cs.CV) [pdf, other]: Title: Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning

Authors: Sarwar Khan

Comments: MMM 2024 Accepted

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[55] arXiv:2403.08824 (cross-list from cs.HC) [pdf, other]: Title: Measuring Non-Typical Emotions for Mental Health: A Survey of Computational Approaches

Authors: Puneet Kumar, Alexander Vedernikov, Xiaobai Li

Comments: Under review in IEEE Transactions on Affective Computing

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[56] arXiv:2403.09407 (cross-list from cs.SD) [pdf, other]: Title: LM2D: Lyrics- and Music-Driven Dance Synthesis

Authors: Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[57] arXiv:2403.09451 (cross-list from cs.CV) [pdf, other]: Title: M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Authors: Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro

Journal-ref: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2 VISAPP: VISAPP, 869-876, 2024, Rome, Italy

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58] arXiv:2403.09502 (cross-list from cs.LG) [pdf, other]: Title: EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung

Comments: 14 pages, 3 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[59] arXiv:2403.10020 (cross-list from cs.CL) [pdf, other]: Title: Lost in Overlap: Exploring Watermark Collision in LLMs

Authors: Yiyang Luo, Ke Lin, Chao Gu

Comments: Short Paper, 4 pages

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[60] arXiv:2403.10024 (cross-list from cs.SD) [pdf, other]: Title: MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

Authors: Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[61] arXiv:2403.10061 (cross-list from cs.CV) [pdf, other]: Title: PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment

Authors: Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Shan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[62] arXiv:2403.10066 (cross-list from cs.CV) [pdf, other]: Title: Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Authors: Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, Shan Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[63] arXiv:2403.10107 (cross-list from cs.CV) [pdf, other]: Title: Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning

Authors: Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[64] arXiv:2403.10254 (cross-list from cs.CV) [pdf, other]: Title: Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Authors: Pingping Zhang, Yuhao Wang, Yang Liu, Zhengzheng Tu, Huchuan Lu

Comments: This work is accepted by CVPR2024. More modifications may be performed

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[65] arXiv:2403.10667 (cross-list from cs.IR) [pdf, other]: Title: Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

Authors: Tianxin Wei, Bowen Jin, Ruirui Li, Hansi Zeng, Zhengyang Wang, Jianhui Sun, Qingyu Yin, Hanqing Lu, Suhang Wang, Jingrui He, Xianfeng Tang

Comments: ICLR 2024

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[66] arXiv:2403.10851 (cross-list from cs.HC) [pdf, ps, other]: Title: GustosonicSense: Towards understanding the design of playful gustosonic eating experiences

Authors: Yan Wang, Humphrey O. Obie, Zhuying Li, Flora D. Salim, John Grundy, Florian 'Floyd' Mueller

Comments: To appear at CHI'24: The ACM Conference on Human Factors in Computing Systems (CHI), Honolulu, Hawaii, 2024

Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[67] arXiv:2403.10883 (cross-list from cs.CV) [pdf, other]: Title: Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Multimedia (cs.MM)
[68] arXiv:2403.10897 (cross-list from cs.CV) [pdf, other]: Title: Rethinking Multi-view Representation Learning via Distilled Disentangling

Authors: Guanzhou Ke, Bo Wang, Xiaoli Wang, Shengfeng He

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[69] arXiv:2403.11074 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Segmentation via Unlabeled Frame Exploitation

Authors: Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2403.11311 (cross-list from cs.CL) [pdf, other]: Title: Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding

Authors: Zichen Wu, Hsiu-Yuan Huang, Fanyi Qu, Yunfang Wu

Comments: LREC-COLING 2024, Long Paper

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[71] arXiv:2403.11572 (cross-list from cs.CV) [pdf, other]: Title: Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes

Authors: Chih-Chung Hsu, Chia-Ming Lee, Ming-Shyen Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[72] arXiv:2403.11576 (cross-list from cs.CV) [pdf, other]: Title: MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation

Authors: Chih-Chung Hsu, Chia-Ming Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[73] arXiv:2403.11626 (cross-list from cs.GR) [pdf, other]: Title: QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation

Authors: Zhizhen Zhou, Yejing Huo, Guoheng Huang, An Zeng, Xuhang Chen, Lian Huang, Zinuo Li

Comments: Accepted by The Visual Computer Journal

Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2403.11959 (cross-list from cs.CV) [pdf, other]: Title: IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

Authors: Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Comments: Source code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[75] arXiv:2403.11999 (cross-list from cs.CV) [pdf, other]: Title: HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs

Authors: Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[76] arXiv:2403.12686 (cross-list from cs.CV) [pdf, other]: Title: WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

Authors: Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

Comments: 10 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO)
[77] arXiv:2403.13480 (cross-list from cs.CV) [pdf, other]: Title: A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels

Authors: Haochen Han, Minnan Luo, Huan Liu, Fang Nan

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[78] arXiv:2403.13501 (cross-list from cs.CV) [pdf, other]: Title: VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Authors: Yumeng Li, William Beluch, Margret Keuper, Dan Zhang, Anna Khoreva

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[79] arXiv:2403.13667 (cross-list from cs.CV) [pdf, other]: Title: DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Authors: Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[80] arXiv:2403.14449 (cross-list from cs.RO) [pdf, other]: Title: Bringing Robots Home: The Rise of AI Robots in Consumer Electronics

Authors: Haiwei Dong, Yang Liu, Ted Chu, Abdulmotaleb El Saddik

Comments: Accepted by IEEE Consumer Electronics Magazine

Subjects: Robotics (cs.RO); Multimedia (cs.MM)
[81] arXiv:2403.14468 (cross-list from cs.CV) [pdf, other]: Title: AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Authors: Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

Comments: preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[82] arXiv:2403.14652 (cross-list from cs.CY) [pdf, other]: Title: MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation

Authors: Han Wang, Roy Ka-Wei Lee

Comments: 8 pages, 7 figures, ACM MM 2024

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[83] arXiv:2403.14773 (cross-list from cs.CV) [pdf, other]: Title: StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Authors: Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[84] arXiv:2403.14972 (cross-list from cs.AI) [pdf, other]: Title: A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

Comments: Work in progress

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[85] arXiv:2403.15048 (cross-list from cs.CV) [pdf, other]: Title: Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning

Authors: Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Sanghyun Seo

Comments: 11 pages, 12 figures, 1 table, Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[86] arXiv:2403.15679 (cross-list from cs.CV) [pdf, other]: Title: DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

Authors: Hao Yan, Zhihui Ke, Xiaobo Zhou, Tie Qiu, Xidong Shi, Dadong Jiang

Comments: CVPR 2024. Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[87] arXiv:2403.15694 (cross-list from cs.LG) [pdf, other]: Title: Group Benefits Instances Selection for Data Purification

Authors: Zhenhuang Cai, Chuanyi Zhang, Dan Huang, Yuanbo Chen, Xiuyun Guan, Yazhou Yao

Comments: accepted by IEEE Intelligent Systems

Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[88] arXiv:2403.16071 (cross-list from cs.AI) [pdf, other]: Title: Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization

Authors: Linzhi Wu, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Tiejun Liu, Liang Xie, Ye Yan, Erwei Yin

Comments: To appear in LREC-COLING 2024

Journal-ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[89] arXiv:2403.17000 (cross-list from cs.CV) [pdf, other]: Title: Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Authors: Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[90] arXiv:2403.17001 (cross-list from cs.CV) [pdf, other]: Title: VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Authors: Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei

Comments: CVPR 2024; Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[91] arXiv:2403.17004 (cross-list from cs.CV) [pdf, other]: Title: SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Authors: Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[92] arXiv:2403.17005 (cross-list from cs.CV) [pdf, other]: Title: TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

Authors: Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

Comments: CVPR 2024; Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[93] arXiv:2403.17589 (cross-list from cs.CV) [pdf, other]: Title: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Authors: Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

Comments: CVPR 2024; code is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[94] arXiv:2403.17708 (cross-list from cs.CV) [pdf, other]: Title: Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

Authors: Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao

Comments: 7 pages, accepted at ACM MMSys '24

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[95] arXiv:2403.17727 (cross-list from cs.CV) [pdf, other]: Title: FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts

Authors: Kazuki Kawamura, Jun Rekimoto

Journal-ref: AHs '24: Proceedings of the Augmented Humans International Conference 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[96] arXiv:2403.17837 (cross-list from cs.CV) [pdf, other]: Title: GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction

Authors: Hrishav Bakul Barua, Kalin Stefanov, KokSheik Wong, Abhinav Dhall, Ganesh Krishnasamy

Comments: Submitted to IEEE

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[97] arXiv:2403.17870 (cross-list from cs.CV) [pdf, other]: Title: Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

Authors: Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[98] arXiv:2403.18063 (cross-list from cs.CV) [pdf, other]: Title: Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer

Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay S. Agneeswaran

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[99] arXiv:2403.18252 (cross-list from cs.CV) [pdf, other]: Title: Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models

Authors: Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[100] arXiv:2403.18323 (cross-list from cs.NI) [pdf, other]: Title: How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme

Authors: Zhe Zhang, Marc St-Hilaire, Xin Wei, Haiwei Dong, Abdulmotaleb El Saddik

Journal-ref: IEEE Transactions on Multimedia (Early Access), 2024

Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[101] arXiv:2403.18714 (cross-list from cs.CV) [pdf, other]: Title: Bringing Textual Prompt to AI-Generated Image Quality Assessment

Authors: Bowen Qu, Haohui Li, Wei Gao

Comments: 6 pages, 3 figures, accepted by ICME2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[102] arXiv:2403.18715 (cross-list from cs.CV) [pdf, other]: Title: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding

Authors: Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[103] arXiv:2403.18821 (cross-list from cs.SD) [pdf, other]: Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

Comments: Accepted to CVPR 2024. Project site: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[104] arXiv:2403.19456 (cross-list from cs.CV) [pdf, other]: Title: Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization

Authors: Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Oliver Deussen, Weiming Dong, Jintao Li, Tong-Yee Lee

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[105] arXiv:2403.19651 (cross-list from cs.CV) [pdf, other]: Title: MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

Comments: Work in progress

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multimedia (cs.MM)
[106] arXiv:2403.19723 (cross-list from cs.CL) [pdf, other]: Title: HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

Authors: Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM)
[107] arXiv:2403.19763 (cross-list from cs.SD) [pdf, other]: Title: Creating Aesthetic Sonifications on the Web with SIREN

Authors: Tristan Peng, Hongchan Choi, Jonathan Berger

Comments: 7 pages, 1 figure, 5 listings, submitted to the Web Audio Conference 2024

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[108] arXiv:2403.04804 (cross-list from eess.AS) [pdf, other]: Title: AttentionStitch: How Attention Solves the Speech Editing Problem

Authors: Antonios Alexos, Pierre Baldi

Comments: Accepted at the Machine Learning for Audio workshop at NeurIPS 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[109] arXiv:2403.08505 (cross-list from eess.IV) [pdf, other]: Title: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[110] arXiv:2403.08551 (cross-list from eess.IV) [pdf, other]: Title: GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Authors: Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[111] arXiv:2403.10936 (cross-list from eess.IV) [pdf, ps, other]: Title: Channel-wise Feature Decorrelation for Enhanced Learned Image Compression

Authors: Farhad Pakdaman, Moncef Gabbouj

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[112] arXiv:2403.11155 (cross-list from eess.IV) [pdf, other]: Title: Interactive $360^{\circ}$ Video Streaming Using FoV-Adaptive Coding with Temporal Prediction

Authors: Yixiang Mao, Liyang Sun, Yong Liu, Yao Wang

Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[113] arXiv:2403.15336 (cross-list from eess.AS) [pdf, other]: Title: Dialogue Understandability: Why are we streaming movies with subtitles?

Authors: Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[114] arXiv:2403.16143 (cross-list from eess.IV) [pdf, other]: Title: CFAT: Unleashing Triangular Windows for Image Super-resolution

Authors: Abhisek Ray, Gaurav Kumar, Maheshkumar H. Kolekar

Comments: Accepted to CVPR 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
