Multimedia

Authors and titles for recent submissions, skipping first 34

[ total of 24 entries: 1-24 ]

Fri, 17 May 2024

[1]  arXiv:2405.10029 [pdf, other]
Title: AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion
Subjects: Multimedia (cs.MM)
[2]  arXiv:2405.10121 (cross-list from cs.CL) [pdf, other]
Title: Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation
Comments: Under Review
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)

Thu, 16 May 2024

[3]  arXiv:2405.09286 [pdf, other]
Title: MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
[4]  arXiv:2405.09539 (cross-list from eess.IV) [pdf, ps, other]
Title: MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer
Comments: Early accepted to MICCAI 2024 (6/6/5)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2405.09321 (cross-list from cs.CV) [pdf, other]
Title: ReconBoost: Boosting Can Achieve Modality Reconcilement
Comments: This paper has been accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[6]  arXiv:2405.09266 (cross-list from cs.CV) [pdf, other]
Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Comments: 11 pages, 6 figures, demo page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2405.09191 (cross-list from cs.CR) [pdf, other]
Title: QMedShield: A Novel Quantum Chaos-based Image Encryption Scheme for Secure Medical Image Storage in the Cloud
Comments: 20 pages, 17 Figures, 9 Tables
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
[8]  arXiv:2405.09152 (cross-list from cs.CV) [pdf, other]
Title: Scalable Image Coding for Humans and Machines Using Feature Fusion Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Wed, 15 May 2024

[9]  arXiv:2405.08813 (cross-list from cs.CV) [pdf, other]
Title: CinePile: A Long Video Question Answering Dataset and Benchmark
Comments: Project page with all the artifacts - this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[10]  arXiv:2405.08745 (cross-list from eess.IV) [pdf, other]
Title: Enhancing Blind Video Quality Assessment with Rich Quality-aware Features
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[11]  arXiv:2405.08619 (cross-list from cs.CL) [pdf, other]
Title: ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation
Authors: Dimitris Gkoumas
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[12]  arXiv:2405.08555 (cross-list from cs.CV) [pdf, other]
Title: Dual-Branch Network for Portrait Image Quality Assessment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[13]  arXiv:2405.08465 (cross-list from cs.IR) [pdf, other]
Title: How to Surprisingly Consider Recommendations? A Knowledge-Graph-based Approach Relying on Complex Network Metrics
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Social and Information Networks (cs.SI)

Tue, 14 May 2024

[14]  arXiv:2405.07930 [pdf, other]
Title: Improving Multimodal Learning with Multi-Loss Gradient Modulation
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2405.07827 [pdf, other]
Title: Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor
Comments: Accepted at CVPRw 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[16]  arXiv:2405.07759 [pdf, other]
Title: MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction
Comments: Accepted by IEEE Internet of Things Journal
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Image and Video Processing (eess.IV)
[17]  arXiv:2405.07689 [pdf, other]
Title: Quality of Experience Optimization for Real-time XR Video Transmission with Energy Constraints
Comments: 6 pages, 5 figures
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
[18]  arXiv:2405.07229 [pdf, other]
Title: MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks
Comments: Under review, the new version of MM-BigBench: arXiv:2310.09036
Subjects: Multimedia (cs.MM)
[19]  arXiv:2405.07682 (cross-list from cs.SD) [pdf, other]
Title: FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Comments: IJCAI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20]  arXiv:2405.07354 (cross-list from cs.SD) [pdf, other]
Title: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2405.07202 (cross-list from cs.CV) [pdf, other]
Title: Unified Video-Language Pre-training with Synchronized Audio
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2405.06995 (cross-list from cs.SD) [pdf, other]
Title: Benchmarking Cross-Domain Audio-Visual Deception Detection
Comments: 10 pages
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Mon, 13 May 2024

[23]  arXiv:2405.06217 (cross-list from cs.CV) [pdf, other]
Title: DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
Comments: Accepted by ICME 2024 (Oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[24]  arXiv:2405.06143 (cross-list from cs.CV) [pdf, other]
Title: Perceptual Crack Detection for Rendered 3D Textured Meshes
Comments: Accepted by IEEE QoMEX 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Multimedia (cs.MM)