Title: MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a topic in high demand. However, at ultra-low bitrates, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality. In recent years, the rapid development of Large Multimodal Models (LMMs) has made it possible to balance these two goals. This paper therefore proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to those semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information. Experimental results show that the proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs). It can achieve optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication. The code will be released on this https URL
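
To make the bitrate accounting concrete, the following is a minimal sketch of how the outputs of the three encoders named in the abstract could be bundled and measured in bits per pixel. The class `SemanticPayload`, the function `bits_per_pixel`, the field names, and every number below are hypothetical and chosen only for illustration; they are not taken from the paper or its code, and the text is assumed to be sent as plain UTF-8.

```python
from dataclasses import dataclass


@dataclass
class SemanticPayload:
    """Hypothetical container for what the three encoders would transmit."""
    description: str   # text from the LMM encoder (semantic information)
    region_map: bytes  # output of the map encoder (where the semantics sit)
    image_bits: bytes  # extremely compressed bitstream from the image encoder

    def total_bits(self) -> int:
        # Assumption: the description is sent as UTF-8 with no further coding.
        text_bytes = len(self.description.encode("utf-8"))
        return 8 * (text_bytes + len(self.region_map) + len(self.image_bits))


def bits_per_pixel(payload: SemanticPayload, width: int, height: int) -> float:
    """Bitrate of the whole payload in bits per pixel (bpp)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return payload.total_bits() / (width * height)


# Made-up sizes, purely to exercise the arithmetic.
payload = SemanticPayload(
    description="a red car parked by a lake",
    region_map=bytes(64),
    image_bits=bytes(512),
)
bpp = bits_per_pixel(payload, width=512, height=512)
```

Under this accounting, a claim such as "saving 50% bitrate" would correspond to halving `bpp` at comparable reconstruction quality; how the paper measures bitrate may differ.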
Comments: 13 pages, 11 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Cite as: arXiv:2402.16749 [cs.CV]
  (or arXiv:2402.16749v3 [cs.CV] for this version)

Submission history

From: Chunyi Li
[v1] Mon, 26 Feb 2024 17:11:11 GMT (45904kb,D)
[v2] Thu, 29 Feb 2024 16:53:20 GMT (48616kb,D)
[v3] Wed, 17 Apr 2024 14:06:28 GMT (44713kb,D)