Research on Cassandra Data Compaction Strategies for Time-Series Data


Research on Cassandra Data Compaction Strategies for Time-Series Data is a scholarly work, published in 2016 in ''Journal of Computers''. The main subjects of the publication include grid computing, compaction, stream processing, cloud computing, series, and computer science.

Storage and analysis of time-series data is a subject of intense interest in current international database research. Time-series data, a sequence of data points collected at fixed time intervals, is an important basis for business analysis and prediction. As a NoSQL database, Cassandra is often used to store time-series data because of the characteristics of its data model. In real applications, the life cycle of time-series data is typically managed by setting a TTL (time to live); expired data is not deleted immediately but is removed during compaction. This paper examines how different compaction strategies affect time-series data storage, studying three Cassandra strategies: Size-Tiered Compaction Strategy, Leveled Compaction Strategy, and Date-Tiered Compaction Strategy. It compares them in tests based on stable data storage, recording speed, the number of sorted string table (SSTable) files, and so on. Finally, the compaction strategies suitable for time-series application scenarios are identified through experiments.
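To make the setup concrete, the following is a minimal sketch of how a time-series table might be declared with a table-level TTL and each of the three compaction strategies named above. It only builds CQL text and does not connect to a cluster. The keyspace, table, and column names (`metrics.readings`, `sensor_id`, `ts`, `value`) and the helper `create_table_cql` are hypothetical and are not taken from the paper; the compaction class names and the `default_time_to_live` table option are standard Cassandra settings, although whether `DateTieredCompactionStrategy` is available depends on the Cassandra version in use.

```python
# Sketch only: generates CQL text for comparison; nothing here is the
# paper's actual test schema. Table and column names are hypothetical.
STRATEGIES = (
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
)


def create_table_cql(table: str, strategy: str, ttl_seconds: int) -> str:
    """Build a CREATE TABLE statement for a time-series table.

    The table-level TTL marks rows as expired after ttl_seconds; the
    expired data is physically removed later, during compaction.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown compaction strategy: {strategy}")
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return (
        f"CREATE TABLE {table} (\n"
        "    sensor_id text,\n"
        "    ts timestamp,\n"
        "    value double,\n"
        "    PRIMARY KEY (sensor_id, ts)\n"
        ") WITH CLUSTERING ORDER BY (ts DESC)\n"
        f"  AND compaction = {{'class': '{strategy}'}}\n"
        f"  AND default_time_to_live = {ttl_seconds};"
    )


if __name__ == "__main__":
    # One statement per strategy, each with an assumed one-day TTL.
    for name in STRATEGIES:
        print(create_table_cql(f"metrics.readings_{name.lower()}", name, 86400))
        print()
```

Keeping the schema and TTL identical and varying only the compaction class is one plausible way to isolate the effect of the strategy, which is the kind of comparison the abstract describes.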