Streaming Algorithms for Diversity Maximization with Fairness Constraints

Wang, Yanhao; Fabbri, Francesco; Mathioudakis, Michael

doi:10.1109/ICDE53745.2022.00008

(Submitted on 30 Jul 2022)

Abstract: Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, the problem asks for a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum diversity while satisfying the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific to $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
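
The two quantities the abstract defines, the max-min diversity of a selected subset and the per-group fairness constraint, can be illustrated with a minimal Python sketch. This is not the paper's streaming algorithm; it only evaluates a given candidate subset. The function names `max_min_diversity` and `is_fair`, the use of Euclidean distance, and the dictionary `quotas` (mapping each group $i$ to $k_i$) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist


def max_min_diversity(points):
    """Minimum pairwise distance among the selected points.

    Euclidean distance is an assumption; any dissimilarity measure
    could be substituted. Requires at least two points.
    """
    return min(dist(p, q) for p, q in combinations(points, 2))


def is_fair(groups, quotas):
    """Check that the selection has exactly quotas[g] elements per group g.

    `groups` lists the group label of each selected element;
    `quotas` is a hypothetical dict mapping group label -> k_i.
    """
    counts = Counter(groups)
    no_unknown_groups = set(counts) <= set(quotas)
    return no_unknown_groups and all(counts[g] == k for g, k in quotas.items())


# A candidate subset S with k = 3 elements drawn from m = 2 groups.
S = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
S_groups = [0, 1, 0]
quotas = {0: 2, 1: 1}

diversity = max_min_diversity(S)
fair = is_fair(S_groups, quotas)
```

Here the closest pairs in `S` are at distance 5, so `diversity` is 5.0, and the selection meets the quotas, so `fair` is `True`. A fair diversity maximization algorithm searches for the subset that maximizes the first quantity among all subsets for which the second check holds.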

Comments:	13 pages, 11 figures; published in ICDE 2022
Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
DOI:	10.1109/ICDE53745.2022.00008
Cite as:	arXiv:2208.00194 [cs.DS]
	(or arXiv:2208.00194v1 [cs.DS] for this version)

Submission history

From: Yanhao Wang
[v1] Sat, 30 Jul 2022 11:47:31 GMT (3014kb,D)
