Multimedia

Authors and titles for cs.MM in Mar 2024

[ total of 114 entries: 1-114 ]
[1]  arXiv:2403.00752 [pdf, other]
Title: An Experimental Study of Low-Latency Video Streaming over 5G
Comments: 6 Pages
Subjects: Multimedia (cs.MM); Performance (cs.PF)
[2]  arXiv:2403.01087 [pdf, other]
Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Comments: 8 pages of content, 1 page of references and 4 figures
Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3]  arXiv:2403.02693 [pdf, other]
Title: Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming
Comments: 14 pages
Subjects: Multimedia (cs.MM); Image and Video Processing (eess.IV)
[4]  arXiv:2403.02905 [pdf, other]
Title: MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
Subjects: Multimedia (cs.MM)
[5]  arXiv:2403.03170 [pdf, other]
Title: SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Comments: To appear in CVPR 2024
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
[6]  arXiv:2403.05060 [pdf, other]
Title: Multimodal Infusion Tuning for Large Models
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[7]  arXiv:2403.05427 [pdf, other]
Title: Reply with Sticker: New Dataset and Model for Sticker Retrieval
Subjects: Multimedia (cs.MM)
[8]  arXiv:2403.05428 [pdf, other]
Title: Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition
Subjects: Multimedia (cs.MM)
[9]  arXiv:2403.05628 [pdf, other]
Title: AMUSE: Adaptive Multi-Segment Encoding for Dataset Watermarking
Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR)
[10]  arXiv:2403.05834 [pdf, other]
Title: Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:2403.05851 [pdf, other]
Title: Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks
Subjects: Multimedia (cs.MM); Emerging Technologies (cs.ET)
[12]  arXiv:2403.06660 [pdf, other]
Title: FashionReGen: LLM-Empowered Fashion Report Generation
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[13]  arXiv:2403.10406 [pdf, other]
Title: Deep Bi-directional Attention Network for Image Super-Resolution Quality Assessment
Comments: 7 pages, 3 figures, published at the 2024 IEEE International Conference on Multimedia and Expo (ICME)
Subjects: Multimedia (cs.MM)
[14]  arXiv:2403.10943 [pdf, other]
Title: MIntRec 2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Comments: Accepted by ICLR 2024, Long Paper; The abstract is slightly modified due to the length limitation
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[15]  arXiv:2403.10976 [pdf, other]
Title: Quality-Aware Dynamic Resolution Adaptation Framework for Adaptive Video Streaming
Comments: ACM MMSys '24 | Open-Source Software and Dataset. arXiv admin note: substantial text overlap with arXiv:2401.15346
Subjects: Multimedia (cs.MM)
[16]  arXiv:2403.11241 [pdf, other]
Title: Fidelity-preserving Learning-Based Image Compression: Loss Function and Subjective Evaluation Methodology
Comments: 5 pages, 6 figures. In 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Subjects: Multimedia (cs.MM)
[17]  arXiv:2403.11700 [pdf, other]
Title: Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
Subjects: Multimedia (cs.MM)
[18]  arXiv:2403.11757 [pdf, other]
Title: Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2403.12053 [src]
Title: PiGW: A Plug-in Generative Watermarking Framework
Comments: Improve experimental content
Subjects: Multimedia (cs.MM)
[20]  arXiv:2403.12618 [pdf, other]
Title: NewsCaption: Named-Entity aware Captioning for Out-of-Context Media
Subjects: Multimedia (cs.MM); Social and Information Networks (cs.SI)
[21]  arXiv:2403.12667 [pdf, other]
Title: ICE: Interactive 3D Game Character Editing via Dialogue
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC)
[22]  arXiv:2403.15226 [pdf, other]
Title: Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL)
[23]  arXiv:2403.15256 [pdf, other]
Title: Experimental Studies of Metaverse Streaming
Comments: Accepted by IEEE Consumer Electronics Magazine
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[24]  arXiv:2403.16951 [pdf, other]
Title: Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, SDN, and MEC
Authors: Reza Farahani
Comments: PhD thesis defended on 22.08.2023 (this https URL)
Subjects: Multimedia (cs.MM); Networking and Internet Architecture (cs.NI)
[25]  arXiv:2403.16985 [pdf, other]
Title: Towards Low-Latency and Energy-Efficient Hybrid P2P-CDN Live Video Streaming
Comments: 6 pages, 3 figures, Special Issue on Sustainable Multimedia Communications and Services, IEEE MMTC Communications
Subjects: Multimedia (cs.MM)
[26]  arXiv:2403.17420 [pdf, other]
Title: Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27]  arXiv:2403.19002 [pdf, other]
Title: Robust Active Speaker Detection in Noisy Environments
Comments: 15 pages, 5 figures
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28]  arXiv:2403.20194 [pdf, other]
Title: ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
Subjects: Multimedia (cs.MM)
[29]  arXiv:2403.00781 (cross-list from cs.IR) [pdf, other]
Title: ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework
Comments: Accepted by The IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[30]  arXiv:2403.01700 (cross-list from cs.SD) [pdf, other]
Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[31]  arXiv:2403.02707 (cross-list from cs.CV) [pdf, other]
Title: Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[32]  arXiv:2403.03095 (cross-list from cs.CV) [pdf, other]
Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Comments: Accepted To ICASSP2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2403.03145 (cross-list from cs.CV) [pdf, other]
Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Comments: Accepted to NeurIPS2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2403.03740 (cross-list from cs.CV) [pdf, other]
Title: Self-supervised Photographic Image Layout Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[35]  arXiv:2403.04245 (cross-list from cs.SD) [pdf, other]
Title: A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Comments: the paper is accepted by CVPR2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[36]  arXiv:2403.04321 (cross-list from cs.CV) [pdf, other]
Title: Discriminative Probing and Tuning for Text-to-Image Generation
Comments: CVPR 2024; project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[37]  arXiv:2403.04523 (cross-list from cs.CV) [pdf, other]
Title: T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[38]  arXiv:2403.05050 (cross-list from cs.CV) [pdf, other]
Title: DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception
Comments: Project: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[39]  arXiv:2403.05105 (cross-list from cs.CV) [pdf, other]
Title: Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[40]  arXiv:2403.05192 (cross-list from cs.NI) [pdf, other]
Title: An End-to-End Pipeline Perspective on Video Streaming in Best-Effort Networks: A Survey and Tutorial
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[41]  arXiv:2403.05261 (cross-list from cs.CV) [pdf, other]
Title: Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Comments: 9 pages, Accepted by AAAI2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[42]  arXiv:2403.05658 (cross-list from cs.CV) [pdf, other]
Title: Feature CAM: Interpretable AI in Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[43]  arXiv:2403.05768 (cross-list from cs.CV) [pdf, other]
Title: Deep Contrastive Multi-view Clustering under Semantic Feature Guidance
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[44]  arXiv:2403.06324 (cross-list from cs.NI) [pdf, other]
Title: ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[45]  arXiv:2403.06497 (cross-list from cs.CV) [pdf, other]
Title: QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[46]  arXiv:2403.06776 (cross-list from cs.HC) [pdf, other]
Title: Born to Run, Programmed to Play: Mapping the Extended Reality Exergames Landscape
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[47]  arXiv:2403.07338 (cross-list from cs.IT) [pdf, ps, other]
Title: D$^2$-JSCC: Digital Deep Joint Source-channel Coding for Semantic Communications
Subjects: Information Theory (cs.IT); Multimedia (cs.MM); Signal Processing (eess.SP)
[48]  arXiv:2403.07613 (cross-list from cs.HC) [pdf, other]
Title: Imagine a dragon made of seaweed: How images enhance learning in Wikipedia
Comments: 16 pages, 10 figures
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[49]  arXiv:2403.07839 (cross-list from cs.CV) [pdf, other]
Title: MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Comments: 18 pages, 8 figures, Published in CVPR2024
Journal-ref: In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[50]  arXiv:2403.07938 (cross-list from cs.SD) [pdf, other]
Title: Text-to-Audio Generation Synchronized with Videos
Comments: arXiv admin note: text overlap with arXiv:2305.12903
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[51]  arXiv:2403.07952 (cross-list from cs.CV) [pdf, other]
Title: AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Comments: 22 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[52]  arXiv:2403.08580 (cross-list from cs.CV) [pdf, other]
Title: Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification
Comments: 5 pages, 5 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2309.07361
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[53]  arXiv:2403.08773 (cross-list from cs.CV) [pdf, other]
Title: Veagle: Advancements in Multimodal Representation Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[54]  arXiv:2403.08806 (cross-list from cs.CV) [pdf, other]
Title: Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning
Authors: Sarwar Khan
Comments: MMM 2024 Accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[55]  arXiv:2403.08824 (cross-list from cs.HC) [pdf, other]
Title: Measuring Non-Typical Emotions for Mental Health: A Survey of Computational Approaches
Comments: Under review in IEEE Transactions on Affective Computing
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[56]  arXiv:2403.09407 (cross-list from cs.SD) [pdf, other]
Title: LM2D: Lyrics- and Music-Driven Dance Synthesis
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[57]  arXiv:2403.09451 (cross-list from cs.CV) [pdf, other]
Title: M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment
Journal-ref: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2 VISAPP: VISAPP, 869-876, 2024, Rome, Italy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58]  arXiv:2403.09502 (cross-list from cs.LG) [pdf, other]
Title: EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Comments: 14 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[59]  arXiv:2403.10020 (cross-list from cs.CL) [pdf, other]
Title: Lost in Overlap: Exploring Watermark Collision in LLMs
Comments: Short Paper, 4 pages
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[60]  arXiv:2403.10024 (cross-list from cs.SD) [pdf, other]
Title: MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[61]  arXiv:2403.10061 (cross-list from cs.CV) [pdf, other]
Title: PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[62]  arXiv:2403.10066 (cross-list from cs.CV) [pdf, other]
Title: Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[63]  arXiv:2403.10107 (cross-list from cs.CV) [pdf, other]
Title: Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[64]  arXiv:2403.10254 (cross-list from cs.CV) [pdf, other]
Title: Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Comments: This work is accepted by CVPR2024. More modifications may be performed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[65]  arXiv:2403.10667 (cross-list from cs.IR) [pdf, other]
Title: Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Comments: ICLR 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[66]  arXiv:2403.10851 (cross-list from cs.HC) [pdf, ps, other]
Title: GustosonicSense: Towards understanding the design of playful gustosonic eating experiences
Comments: To appear at CHI'24: The ACM Conference on Human Factors in Computing Systems (CHI), Honolulu, Hawaii, 2024
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[67]  arXiv:2403.10883 (cross-list from cs.CV) [pdf, other]
Title: Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Multimedia (cs.MM)
[68]  arXiv:2403.10897 (cross-list from cs.CV) [pdf, other]
Title: Rethinking Multi-view Representation Learning via Distilled Disentangling
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[69]  arXiv:2403.11074 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Segmentation via Unlabeled Frame Exploitation
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70]  arXiv:2403.11311 (cross-list from cs.CL) [pdf, other]
Title: Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
Comments: LREC-COLING 2024, Long Paper
Subjects: Computation and Language (cs.CL); Multimedia (cs.MM)
[71]  arXiv:2403.11572 (cross-list from cs.CV) [pdf, other]
Title: Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[72]  arXiv:2403.11576 (cross-list from cs.CV) [pdf, other]
Title: MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[73]  arXiv:2403.11626 (cross-list from cs.GR) [pdf, other]
Title: QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation
Comments: Accepted by The Visual Computer Journal
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2403.11959 (cross-list from cs.CV) [pdf, other]
Title: IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting
Comments: Source code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[75]  arXiv:2403.11999 (cross-list from cs.CV) [pdf, other]
Title: HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[76]  arXiv:2403.12686 (cross-list from cs.CV) [pdf, other]
Title: WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar
Comments: 10 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO)
[77]  arXiv:2403.13480 (cross-list from cs.CV) [pdf, other]
Title: A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
[78]  arXiv:2403.13501 (cross-list from cs.CV) [pdf, other]
Title: VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[79]  arXiv:2403.13667 (cross-list from cs.CV) [pdf, other]
Title: DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[80]  arXiv:2403.14449 (cross-list from cs.RO) [pdf, other]
Title: Bringing Robots Home: The Rise of AI Robots in Consumer Electronics
Comments: Accepted by IEEE Consumer Electronics Magazine
Subjects: Robotics (cs.RO); Multimedia (cs.MM)
[81]  arXiv:2403.14468 (cross-list from cs.CV) [pdf, other]
Title: AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
Comments: preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[82]  arXiv:2403.14652 (cross-list from cs.CY) [pdf, other]
Title: MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
Comments: 8 pages, 7 figures, ACM MM 2024
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[83]  arXiv:2403.14773 (cross-list from cs.CV) [pdf, other]
Title: StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[84]  arXiv:2403.14972 (cross-list from cs.AI) [pdf, other]
Title: A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
Comments: Work in progress
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Multimedia (cs.MM)
[85]  arXiv:2403.15048 (cross-list from cs.CV) [pdf, other]
Title: Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning
Comments: 11 pages, 12 figures, 1 table, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[86]  arXiv:2403.15679 (cross-list from cs.CV) [pdf, other]
Title: DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Comments: CVPR 2024. Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[87]  arXiv:2403.15694 (cross-list from cs.LG) [pdf, other]
Title: Group Benefits Instances Selection for Data Purification
Comments: accepted by IEEE Intelligent Systems
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)
[88]  arXiv:2403.16071 (cross-list from cs.AI) [pdf, other]
Title: Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Comments: To appear in LREC-COLING 2024
Journal-ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[89]  arXiv:2403.17000 (cross-list from cs.CV) [pdf, other]
Title: Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[90]  arXiv:2403.17001 (cross-list from cs.CV) [pdf, other]
Title: VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Comments: CVPR 2024; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[91]  arXiv:2403.17004 (cross-list from cs.CV) [pdf, other]
Title: SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[92]  arXiv:2403.17005 (cross-list from cs.CV) [pdf, other]
Title: TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Comments: CVPR 2024; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[93]  arXiv:2403.17589 (cross-list from cs.CV) [pdf, other]
Title: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Comments: CVPR2024; Codes are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[94]  arXiv:2403.17708 (cross-list from cs.CV) [pdf, other]
Title: Panonut360: A Head and Eye Tracking Dataset for Panoramic Video
Comments: 7 pages, ACM MMSys'24 accepted
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[95]  arXiv:2403.17727 (cross-list from cs.CV) [pdf, other]
Title: FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
Journal-ref: AHs '24: Proceedings of the Augmented Humans International Conference 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
[96]  arXiv:2403.17837 (cross-list from cs.CV) [pdf, other]
Title: GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction
Comments: Submitted to IEEE
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
[97]  arXiv:2403.17870 (cross-list from cs.CV) [pdf, other]
Title: Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[98]  arXiv:2403.18063 (cross-list from cs.CV) [pdf, other]
Title: Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[99]  arXiv:2403.18252 (cross-list from cs.CV) [pdf, other]
Title: Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[100]  arXiv:2403.18323 (cross-list from cs.NI) [pdf, other]
Title: How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme
Journal-ref: IEEE Transactions on Multimedia (Early Access), 2024
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
[101]  arXiv:2403.18714 (cross-list from cs.CV) [pdf, other]
Title: Bringing Textual Prompt to AI-Generated Image Quality Assessment
Comments: 6 pages, 3 figures, accepted by ICME2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[102]  arXiv:2403.18715 (cross-list from cs.CV) [pdf, other]
Title: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[103]  arXiv:2403.18821 (cross-list from cs.SD) [pdf, other]
Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Comments: Accepted to CVPR 2024. Project site: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[104]  arXiv:2403.19456 (cross-list from cs.CV) [pdf, other]
Title: Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
[105]  arXiv:2403.19651 (cross-list from cs.CV) [pdf, other]
Title: MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Comments: Work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multimedia (cs.MM)
[106]  arXiv:2403.19723 (cross-list from cs.CL) [pdf, other]
Title: HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Multimedia (cs.MM)
[107]  arXiv:2403.19763 (cross-list from cs.SD) [pdf, other]
Title: Creating Aesthetic Sonifications on the Web with SIREN
Comments: 7 pages, 1 figure, 5 listings, submitted to the Web Audio Conference 2024
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[108]  arXiv:2403.04804 (cross-list from eess.AS) [pdf, other]
Title: AttentionStitch: How Attention Solves the Speech Editing Problem
Comments: Accepted in Machine Learning for Audio workshop in NeurIPS 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[109]  arXiv:2403.08505 (cross-list from eess.IV) [pdf, other]
Title: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[110]  arXiv:2403.08551 (cross-list from eess.IV) [pdf, other]
Title: GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[111]  arXiv:2403.10936 (cross-list from eess.IV) [pdf, ps, other]
Title: Channel-wise Feature Decorrelation for Enhanced Learned Image Compression
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[112]  arXiv:2403.11155 (cross-list from eess.IV) [pdf, other]
Title: Interactive $360^{\circ}$ Video Streaming Using FoV-Adaptive Coding with Temporal Prediction
Subjects: Image and Video Processing (eess.IV); Multimedia (cs.MM)
[113]  arXiv:2403.15336 (cross-list from eess.AS) [pdf, other]
Title: Dialogue Understandability: Why are we streaming movies with subtitles?
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[114]  arXiv:2403.16143 (cross-list from eess.IV) [pdf, other]
Title: CFAT: Unleashing TriangularWindows for Image Super-resolution
Comments: Accepted to CVPR 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
