[IOTDB-5443] Implement Chimp encoding in IoTDB #8766
Hi, this is your first pull request in IoTDB project. Thanks for your contribution! IoTDB will be better because of you.
Some checks are failing but I don't think my PR has anything to do with the failures.
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/decoder/DoublePrecisionChimpDecoder.java
Hi Panagiotis, thanks for the contribution! Please update the UserGuide so that users can see this encoding: docs/UserGuide/Data-Concept/Encoding.md. You can just copy the English version into docs/zh/UserGuide/Data-Concept/Encoding.md, and we can translate it during the review.
Hi, more code needs to be updated so that Chimp can be used in IoTDB: server/src/main/java/org/apache/iotdb/db/utils/SchemaUtils.java
I used the above example to test 4 different configurations on the first 1,000 values of the basel-wind dataset (see the figure in the PR text). The results are: Chimp (with snappy): 4695 bytes
Also, some timing comparisons (in milliseconds) with three different datasets:
5,000,000 values (Stocks-Germany): GORILLA encoding time: 1163
2,905,887 values (city-temperature): GORILLA encoding time: 855
8,927 values (SSD-benchmark): GORILLA encoding time: 28
These timings cover only encoding the values to a byte array and decoding them in Java. If writing the data to disk is involved, the speedup that Chimp can offer will be more evident, as it usually needs to write and read far fewer bytes to and from disk.
Co-authored-by: Haonan <[email protected]>
Description
This PR adds the Chimp compression algorithm for double and single precision floating point data.
Algorithm.
Chimp was recently presented in VLDB 2022 (https://www.vldb.org/pvldb/vol15/p3058-liakos.pdf):
Panagiotis Liakos, Katia Papakonstantinopoulou, Yannis Kotidis:
Chimp: Efficient Lossless Floating Point Compression for Time Series Databases. Proc. VLDB Endow. 15(11): 3058-3070 (2022)
The algorithm focuses exclusively on floating point data and takes advantage of more than one earlier value to significantly outperform the state-of-the-art Gorilla algorithm in terms of compression ratio, while preserving its speed.
The implementations provided here focus on the Chimp128 variation for double precision, which uses the 128 previous values, and Chimp64 for single precision, which uses the 64 previous values.
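To make the idea of "using more than one earlier value" concrete, here is a minimal Java sketch. It is not the code in this PR and not the exact selection rule from the paper: it simply scans a small window of earlier values and picks the one whose XOR with the current value has the most trailing zero bits, which is the property a Chimp-style encoder tries to exploit. The class name `ChimpReferenceSketch`, the method `bestReferenceIndex`, and the brute-force scan are all assumptions made for illustration.

```java
/** Illustrative sketch only; names and the linear scan are hypothetical, not taken from the PR. */
final class ChimpReferenceSketch {

  /**
   * Returns the index in window[0..count) whose XOR with valueBits has the most
   * trailing zero bits. Ties keep the most recent value. Requires count >= 1.
   */
  static int bestReferenceIndex(long[] window, int count, long valueBits) {
    // Start from the immediately preceding value, which is what Gorilla always uses.
    int best = count - 1;
    int bestTrailing = Long.numberOfTrailingZeros(window[best] ^ valueBits);
    for (int i = count - 2; i >= 0; i--) {
      int trailing = Long.numberOfTrailingZeros(window[i] ^ valueBits);
      if (trailing > bestTrailing) {
        best = i;
        bestTrailing = trailing;
      }
    }
    return best;
  }

  /** Converts doubles to their raw IEEE 754 bit patterns. */
  static long[] toBits(double[] values) {
    long[] bits = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      bits[i] = Double.doubleToRawLongBits(values[i]);
    }
    return bits;
  }

  public static void main(String[] args) {
    long[] window = toBits(new double[] {21.5, 17.25, 21.75});
    long current = Double.doubleToRawLongBits(21.5);
    int index = bestReferenceIndex(window, window.length, current);
    // An identical earlier value gives an all-zero XOR, the cheapest case to encode.
    System.out.println("best reference index: " + index);
  }
}
```

In this toy window the value 21.5 already appeared two steps back, so referencing it yields an all-zero XOR, whereas the immediately preceding value 21.75 would not. A linear scan like this would be too slow for a 128-value window in practice; it only illustrates why a wider window can help.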
Indicative results for many different time series datasets highlight the significant space savings expected from adopting Chimp:
[Figure: indicative compression results across time series datasets]
Adoption
Chimp is already used in the latest releases of DuckDB (duckdb/duckdb#4878).
Implementation
The implementation reuses code from GorillaEncoderV2.java and GorillaDecoderV2.java, and the Long, DoublePrecision, Int and Float versions are built on top of it. Method organization, design, and naming follow the respective Gorilla classes.
Testing
A new class named ChimpDecoderTest executes all tests implemented in the GorillaDecoderV2Test class, ensuring the threshold for code coverage is met.
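As a rough illustration of the kind of property such decoder tests check, the sketch below verifies a lossless round trip for XOR-based coding of doubles. It is not the test code from this PR: `XorRoundTripSketch` and its methods are hypothetical, and the "encoding" is only the previous-value XOR step without any bit packing.

```java
/** Illustrative sketch only; a stand-in for the XOR step, not the PR's encoder or tests. */
final class XorRoundTripSketch {

  /** XORs each value's raw bits with the previous value's bits (the first value is XORed with 0). */
  static long[] encode(double[] values) {
    long[] xors = new long[values.length];
    long prev = 0L;
    for (int i = 0; i < values.length; i++) {
      long bits = Double.doubleToRawLongBits(values[i]);
      xors[i] = bits ^ prev;
      prev = bits;
    }
    return xors;
  }

  /** Reverses encode() by XORing each stored word with the previously decoded bits. */
  static double[] decode(long[] xors) {
    double[] values = new double[xors.length];
    long prev = 0L;
    for (int i = 0; i < xors.length; i++) {
      long bits = xors[i] ^ prev;
      values[i] = Double.longBitsToDouble(bits);
      prev = bits;
    }
    return values;
  }

  /** Compares bit patterns rather than using ==, so that -0.0 and NaN are checked exactly. */
  static boolean roundTrips(double[] values) {
    double[] decoded = decode(encode(values));
    if (decoded.length != values.length) {
      return false;
    }
    for (int i = 0; i < values.length; i++) {
      if (Double.doubleToRawLongBits(decoded[i]) != Double.doubleToRawLongBits(values[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    double[] sample = {1.5, 1.5, -0.0, 0.0, Double.NaN, Double.MAX_VALUE, Double.MIN_VALUE};
    System.out.println("lossless round trip: " + roundTrips(sample));
  }
}
```

Comparing raw bit patterns instead of using `==` matters here because `-0.0 == 0.0` is true and `NaN == NaN` is false in Java, so a value-level comparison could hide or falsely report a mismatch.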
This PR has:
for an unfamiliar reader.
for code coverage.
Key changed/added classes (or packages if there are too many classes) in this PR
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/encoder/IntChimpEncoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/encoder/LongChimpEncoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/encoder/SinglePrecisionChimpEncoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/encoder/DoublePrecisionChimpEncoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/decoder/IntChimpDecoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/decoder/LongChimpDecoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/decoder/SinglePrecisionChimpDecoder.java
tsfile/src/main/java/org/apache/iotdb/tsfile/encoding/decoder/DoublePrecisionChimpDecoder.java