An exploratory analysis of methods for real-time data deduplication in streaming processes

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Modern stream processing systems typically need to ingest and correlate data from multiple sources. However, these sources are outside the system's control and are prone to software errors and unavailability, causing data anomalies that must be remedied before the data is processed. In this context, anomalies such as data duplication are among the most prominent challenges of stream processing. Data duplication can hinder real-time analysis of data for decision making. This paper investigates these challenges and performs an experimental analysis of operators and auxiliary tools that help with data deduplication. The results show that data delivery time increases when external mechanisms are used. However, these mechanisms are essential for an ingestion process to guarantee that no data is lost and that no duplicates are persisted.
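To make the idea of a deduplication operator concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes each event carries a unique identifier and a non-decreasing timestamp. The names `TtlDeduplicator`, `deduplicate`, and `ttl_seconds`, as well as the `(event_id, timestamp, payload)` event shape, are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict
from typing import Any, Hashable, Iterable, Iterator, Tuple

# Hypothetical event shape: (event_id, timestamp_in_seconds, payload).
Event = Tuple[Hashable, float, Any]


class TtlDeduplicator:
    """Drops events whose id was already seen within the last ttl_seconds.

    Assumes timestamps arrive in non-decreasing order, so the oldest
    remembered id is always at the front of the ordered dict.
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # Maps event id -> timestamp of its first sighting, oldest first.
        self._seen: "OrderedDict[Hashable, float]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Forget ids older than the TTL so the state stays bounded.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.ttl_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, event_id: Hashable, now: float) -> bool:
        """Returns True if the event is new and should be forwarded."""
        self._evict_expired(now)
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        return True


def deduplicate(events: Iterable[Event], ttl_seconds: float) -> Iterator[Event]:
    """Yields only the events that are not duplicates within the TTL."""
    dedup = TtlDeduplicator(ttl_seconds)
    for event in events:
        event_id, timestamp, _payload = event
        if dedup.accept(event_id, timestamp):
            yield event
```

A sketch like this keeps its state in memory only, so a duplicate arriving after the TTL, or after a restart of the process, would pass through. Closing those gaps typically means keeping the seen-id state in an external store, at the cost of extra latency per event.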
Original language: English
Title: DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-Based Systems
Publisher: Association for Computing Machinery
Publication date: 27 Jun 2023
Pages: 91–102
ISBN (Electronic): 9798400701221
DOI
Status: Published - 27 Jun 2023
Event: 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23 - Neuchatel, Switzerland
Duration: 27 Jun 2023 to 30 Jun 2023

Conference

Conference: 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23
Country: Switzerland
City: Neuchatel
Period: 27/06/2023 to 30/06/2023

ID: 359260915