Title: SCHENO: Measuring Schema vs. Noise in Graphs

Abstract: Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e., decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represents the original graph data. We visually demonstrate what this metric prioritizes in small graphs, then show that if SCHENO is used as the fitness function for a simple optimization strategy, we can uncover a wide variety of patterns. Finally, we evaluate several well-known graph mining algorithms with this metric; we find that although they produce patterns, those patterns are not always the best representation of the input data.
Subjects: Databases (cs.DB)
MSC classes: 68R10, 68T10, 08A35
Cite as: arXiv:2404.13489 [cs.DB]
  (or arXiv:2404.13489v2 [cs.DB] for this version)

Submission history

From: Justus Hibshman
[v1] Sat, 20 Apr 2024 23:54:52 GMT (6322kb,D)
[v2] Thu, 25 Apr 2024 00:59:20 GMT (6323kb,D)
