Computer Science > Databases
Title: SCHENO: Measuring Schema vs. Noise in Graphs
(Submitted on 20 Apr 2024 (this version), latest version 25 Apr 2024 (v2))
Abstract: Real-world data is typically a noisy manifestation of a core pattern ("schema"), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e., "decomposing") the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is, and how well the combination of the two represents the original graph data. We visually demonstrate what our metric prioritizes in small graphs, then show that if SCHENO is used as the fitness function for a simple genetic algorithm, we can uncover a wide variety of patterns. Finally, we evaluate several famous graph mining algorithms with our metric, finding that although they produce patterns, those patterns do not always represent the input data.
Submission history
From: Justus Hibshman
[v1] Sat, 20 Apr 2024 23:54:52 GMT (6322kb,D)
[v2] Thu, 25 Apr 2024 00:59:20 GMT (6323kb,D)