Multimedia

Authors and titles for recent submissions

[ total of 26 entries: 1-25 | 26 ]

Fri, 24 May 2024

[1]  arXiv:2405.14040 [pdf, other]
Title: Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
Comments: 15 pages, 13 figures
Subjects: Multimedia (cs.MM)
[2]  arXiv:2405.14709 (cross-list from cs.CV) [pdf, other]
Title: OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[3]  arXiv:2405.14598 (cross-list from cs.CV) [pdf, other]
Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2405.14556 (cross-list from cs.CV) [pdf, ps, other]
Title: Deep Learning Classification of Photoplethysmogram Signal for Hypertension Levels
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[5]  arXiv:2405.14312 (cross-list from cs.CV) [pdf, other]
Title: Improving Gloss-free Sign Language Translation by Reducing Representation Density
Comments: Representation Density and Performance Drop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
[6]  arXiv:2405.14225 (cross-list from q-bio.QM) [pdf, other]
Title: ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining
Comments: ACL 2024 Findings, 9 pages
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[7]  arXiv:2405.13984 (cross-list from cs.CL) [pdf, other]
Title: Feedback-aligned Mixed LLMs for Machine Language-Molecule Translation
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[8]  arXiv:2405.13389 (cross-list from cs.CV) [pdf, other]
Title: HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera
Comments: 30 pages, 20 figures, 8 tables. This work was submitted for review in the second half of 2023. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO)
[9]  arXiv:2405.13127 (cross-list from cs.CV) [pdf, other]
Title: Towards Retrieval-Augmented Architectures for Image Captioning
Comments: ACM Transactions on Multimedia Computing, Communications and Applications (2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[10]  arXiv:2405.13049 (cross-list from cs.CL) [pdf, other]
Title: SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
Comments: 12 pages, 3 figures, 4 Tables
Journal-ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

Wed, 22 May 2024

[11]  arXiv:2405.12775 [pdf, other]
Title: Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
Comments: Accepted by ACL 2024, Main Conference, Long Paper
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[12]  arXiv:2405.12847 (cross-list from cs.IR) [pdf, other]
Title: A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:2405.12564 (cross-list from q-bio.QM) [pdf, other]
Title: ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Comments: ACL 2024, 9 pages
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Multimedia (cs.MM)
[14]  arXiv:2405.12540 (cross-list from cs.CV) [pdf, other]
Title: Context-Enhanced Video Moment Retrieval with Large Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[15]  arXiv:2405.12512 (cross-list from cs.CV) [pdf, other]
Title: Rethink Predicting the Optical Flow with the Kinetics Perspective
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[16]  arXiv:2405.12336 (cross-list from cs.CR) [pdf, ps, other]
Title: Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography
Comments: 17 pages, 9 figures. Submitted to IBC2024 Technical Papers Programme
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)

Tue, 21 May 2024

[17]  arXiv:2405.11742 [pdf, other]
Title: Universal Organizer of SAM for Unsupervised Semantic Segmentation
Comments: accepted by IEEE International Conference on Multimedia & Expo
Subjects: Multimedia (cs.MM)
[18]  arXiv:2405.12221 (cross-list from cs.CV) [pdf, other]
Title: Images that Sound: Composing Images and Sounds on a Single Canvas
Comments: Project site: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2405.12126 (cross-list from cs.CV) [pdf, other]
Title: Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Multimedia (cs.MM)
[20]  arXiv:2405.11295 (cross-list from eess.IV) [pdf, ps, other]
Title: Medical Image Analysis for Detection, Treatment and Planning of Disease using Artificial Intelligence Approaches
Comments: 10 pages, 3 figures
Journal-ref: International Journal of Microsystems and IoT, Vol. 1, Issue 5, pp. 278-287, 2023
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[21]  arXiv:2405.11273 (cross-list from cs.AI) [pdf, other]
Title: Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Comments: 22 pages, 13 figures. Project Website: this https URL. Work in progress
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[22]  arXiv:2405.11145 (cross-list from cs.CV) [pdf, other]
Title: Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[23]  arXiv:2405.11093 (cross-list from eess.AS) [pdf, other]
Title: AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
Authors: David Xu
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)

Mon, 20 May 2024

[24]  arXiv:2405.10497 [pdf, other]
Title: SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge
Comments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Fri, 17 May 2024 (showing first 1 of 2 entries)

[25]  arXiv:2405.10029 [pdf, other]
Title: AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion
Comments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024
Subjects: Multimedia (cs.MM)
