An exploratory analysis of methods for real-time data deduplication in streaming processes
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Modern stream processing systems typically need to ingest and correlate data from multiple sources. However, these sources lie outside the system's control and are prone to software errors and unavailability, causing data anomalies that must be remedied before the data is processed. In this context, anomalies such as data duplication are among the most prominent challenges of stream processing. Data duplication can hinder real-time analysis of data for decision making. This paper investigates these challenges and presents an experimental analysis of operators and auxiliary tools that help with data deduplication. The results show that using external mechanisms increases data delivery time. However, these mechanisms are essential for an ingestion process to guarantee that no data is lost and that no duplicates are persisted.
Original language | English |
---|---|
Title | DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-Based Systems |
Publisher | Association for Computing Machinery |
Publication date | 27 Jun 2023 |
Pages | 91–102 |
ISBN (Electronic) | 9798400701221 |
DOI | |
Status | Published - 27 Jun 2023 |
Event | 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23 - Neuchatel, Switzerland. Duration: 27 Jun 2023 → 30 Jun 2023 |
Conference
Conference | 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23 |
---|---|
Country | Switzerland |
City | Neuchatel |
Period | 27/06/2023 → 30/06/2023 |
ID: 359260915