Codebase list c-blosc / a0f483f
New upstream version 1.21.1+ds1 Håvard Flaget Aasen 2 years ago
9 changed file(s) with 410 addition(s) and 258 deletion(s).
00 name: CIFuzz
1 on: [pull_request]
1 on: [push, pull_request]
22 jobs:
33 Fuzzing:
44 runs-on: ubuntu-latest
00 ===============================================================
1 Announcing C-Blosc 1.20.1
1 Announcing C-Blosc 1.21.1
22 A blocking, shuffling and lossless compression library for C
33 ===============================================================
44
55 What is new?
66 ============
77
8 This is a maintenance release. Vendored zlib 1.2.8 is now compatible
9 with Python 3.8 in recent Mac OSX. For details, see:
10 https://github.com/Blosc/python-blosc/issues/229
8 This is a maintenance release. Fix pthread flag when linking on ppc64le.
9 Vendored BloscLZ, Zlib and Zstd codecs have been updated to their latest
10 versions too; this can bring important performance improvements, so if
11 speed is a priority to you, an upgrade is recommended.
1112
1213 For more info, please see the release notes in:
1314
0 C-Blosc libraries come with Python-Blosc wheels
1 ===============================================
2
3 Starting on version 1.21.0, C-Blosc binary libraries can easily be installed from Python-Blosc (>= 1.10) wheels:
4
5 .. code-block:: console
6
7 $ pip install blosc
8 Collecting blosc
9 Downloading blosc-1.10.0-cp37-cp37m-macosx_10_9_x86_64.whl (2.2 MB)
10 |████████████████████████████████| 2.2 MB 4.7 MB/s
11 Installing collected packages: blosc
12 Attempting uninstall: blosc
13 Found existing installation: blosc 1.10.0
14 Uninstalling blosc-1.10.0:
15 Successfully uninstalled blosc-1.10.0
16 Successfully installed blosc-1.10.0
17
18 As a result, one can easily update to the latest version of the C-Blosc binaries without having to compile them manually. The following sections explain how to use the libraries shipped in the wheels on each platform.
19
20
21 Compiling C files with Blosc wheels on Windows
22 ----------------------------------------------
23
24 - The wheels for Windows have been produced with the Microsoft MSVC compiler, so we recommend that you use it too. You can get it for free at: https://visualstudio.microsoft.com/es/downloads/.
25
26 - In order to check that the MSVC command line is set up correctly, enter ``cl`` in the command prompt window and verify that the output looks something like this:
27
28 .. code-block:: console
29
30 > cl
31 Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24245 for x64
32 Copyright (C) Microsoft Corporation. All rights reserved.
33
34 usage: cl [ option... ] filename... [ /link linkoption... ]
35
36 - Now, install the wheels:
37
38 .. code-block:: console
39
40 > pip install blosc
41 Collecting blosc
42 Using cached blosc-1.10.0-cp37-cp37m-win_amd64.whl (1.5 MB)
43 Installing collected packages: blosc
44 Successfully installed blosc-1.10.0
45
46 - Make the compiler available. It is typically installed in `C:\\Program files (x86)\\Microsoft Visual Studio`, so change your current directory there. Then, to set up the build architecture environment, open a command prompt window in the `VC\\Auxiliary\\Build` subdirectory and execute `vcvarsall.bat x64` if your architecture is 64-bit or `vcvarsall.bat x86` if it is 32-bit.
47
48 - You will need to know the path where the Blosc wheel has installed its files. For this we will use the `dir /s` command (but you can use your preferred location method):
49
50 .. code-block:: console
51
52 > dir /s c:\blosc.lib
53 Volume in drive C is OS
54 Volume Serial Number is 7A21-A5D5
55
56 Directory of c:\Users\user\miniconda3\Lib
57
58 14/12/2020 09:56 7.022 blosc.lib
59 1 File(s) 7.022 bytes
60
61 Total list files:
62 1 File(s) 7.022 bytes
63 0 dirs 38.981.902.336 free bytes
64
65 - The output shows the path of blosc.lib in your system, but we are rather interested in the parent one:
66
67 .. code-block:: console
68
69 > set WHEEL_DIR=c:\Users\user\miniconda3
70
71 - Next, copy the library `blosc.dll` to the C:\\Windows\\System32 directory so that the executable can find it at run time.
72
73 - Finally, to compile C files using Blosc libraries, enter this command:
74
75 .. code-block:: console
76
77 > cl <file_name>.c <path_of_blosc.lib> /Ox /Fe<file_name>.exe /I<path_of_blosc.h> /MT /link/NODEFAULTLIB:MSVCRT
78
79 - For instance, in the case of blosc "examples/simple.c":
80
81 .. code-block:: console
82
83 > cl simple.c %WHEEL_DIR%\lib\blosc.lib /Ox /Fesimple.exe /I%WHEEL_DIR%\include /MT /link/NODEFAULTLIB:MSVCRT
84
85 Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25017 for x86
86 Copyright (C) Microsoft Corporation. All rights reserved.
87
88 simple.c
89 Microsoft (R) Incremental Linker Version 14.10.25017.0
90 Copyright (C) Microsoft Corporation. All rights reserved.
91
92 /out:simple.exe
93 simple.obj
94 /NODEFAULTLIB:MSVCRT
95 .\miniconda3\lib\blosc.lib
96
97 - And you can run your program:
98
99 .. code-block:: console
100
101 > simple
102 Blosc version info: 1.20.1 ($Date:: 2020-09-08 #$)
103 Compression: 4000000 -> 37816 (105.8x)
104 Decompression succesful!
105 Succesful roundtrip!
106
107 - Rejoice!
108
109
110 Compiling C files with Blosc wheels on Linux
111 --------------------------------------------
112
113 - Install the wheels:
114
115 .. code-block:: console
116
117 $ pip install blosc
118 Collecting blosc
119 Using cached blosc-1.10.0-cp37-cp37m-manylinux2010_x86_64.whl (2.2 MB)
120 Installing collected packages: blosc
121 Successfully installed blosc-1.10.0
122
123 - Find the path where blosc wheel has installed its files:
124
125 .. code-block:: console
126
127 $ find / -name libblosc.so 2>/dev/null
128 /home/soscar/miniconda3/lib/libblosc.so
129
130 - The output shows the path of libblosc.so, but we are rather interested in the parent one:
131
132 .. code-block:: console
133
134 $ WHEEL_DIR=/home/soscar/miniconda3
135
136 - To compile C files using blosc you only need to enter the commands:
137
138 .. code-block:: console
139
140 $ export LD_LIBRARY_PATH=<path_of_libblosc.so>
141 $ gcc <file_name>.c -I<path_of_blosc.h> -o <file_name> -L<path_of_libblosc.so> -lblosc
142
143 - For instance, let's compile blosc's "examples/many_compressors.c":
144
145 .. code-block:: console
146
147 $ export LD_LIBRARY_PATH=$WHEEL_DIR/lib # note that you need the LD_LIBRARY_PATH env variable
148 $ gcc many_compressors.c -I$WHEEL_DIR/include -o many_compressors -L$WHEEL_DIR/lib -lblosc
149
150 - Run your program:
151
152 .. code-block:: console
153
154 $ ./many_compressors
155 Blosc version info: 1.20.1 ($Date:: 2020-09-08 #$)
156 Using 4 threads (previously using 1)
157 Using blosclz compressor
158 Compression: 4000000 -> 37816 (105.8x)
159 Succesful roundtrip!
160 Using lz4 compressor
161 Compression: 4000000 -> 37938 (105.4x)
162 Succesful roundtrip!
163 Using lz4hc compressor
164 Compression: 4000000 -> 27165 (147.2x)
165 Succesful roundtrip!
166
167 - Rejoice!
168
169
170 Compiling C files with Blosc wheels on MacOS
171 --------------------------------------------
172
173 - Install the wheels:
174
175 .. code-block:: console
176
177 $ pip install blosc
178 Collecting blosc
179 Downloading blosc-1.10.0-cp37-cp37m-macosx_10_9_x86_64.whl (2.2 MB)
180 |████████████████████████████████| 2.2 MB 4.7 MB/s
181 Installing collected packages: blosc
182 Attempting uninstall: blosc
183 Found existing installation: blosc 1.10.0
184 Uninstalling blosc-1.10.0:
185 Successfully uninstalled blosc-1.10.0
186 Successfully installed blosc-1.10.0
187
188 - Find the path where blosc wheel has installed its files:
189
190 .. code-block:: console
191
192 $ find / -name libblosc.dylib 2>/dev/null
193 /home/soscar/miniconda3/lib/libblosc.dylib
194
195 - The output shows the path of libblosc.dylib, but we are rather interested in the parent one:
196
197 .. code-block:: console
198
199 $ WHEEL_DIR=/home/soscar/miniconda3
200
201 - To compile C files using blosc you only need to enter the commands:
202
203 .. code-block:: console
204
205 $ export LD_LIBRARY_PATH=<path_of_libblosc.dylib>
206 $ clang <file_name>.c -I<path_of_blosc.h> -o <file_name> -L<path_of_libblosc.dylib> -lblosc
207
208 - For instance, let's compile blosc's "examples/many_compressors.c":
209
210 .. code-block:: console
211
212 $ export LD_LIBRARY_PATH=$WHEEL_DIR/lib # note that you need the LD_LIBRARY_PATH env variable
213 $ clang many_compressors.c -I$WHEEL_DIR/include -o many_compressors -L$WHEEL_DIR/lib -lblosc
214
215 - Run your program:
216
217 .. code-block:: console
218
219 $ ./many_compressors
220 Blosc version info: 1.20.1 ($Date:: 2020-09-08 #$)
221 Using 4 threads (previously using 1)
222 Using blosclz compressor
223 Compression: 4000000 -> 37816 (105.8x)
224 Succesful roundtrip!
225 Using lz4 compressor
226 Compression: 4000000 -> 37938 (105.4x)
227 Succesful roundtrip!
228 Using lz4hc compressor
229 Compression: 4000000 -> 27165 (147.2x)
230 Succesful roundtrip!
231
232 - Rejoice!
00 ===========================
11 Release notes for C-Blosc
22 ===========================
3
4 Changes from 1.21.0 to 1.21.1
5 =============================
6
7 * Fix pthread flag when linking on ppc64le. See #318. Thanks to Axel Huebl.
8
9 * Updates in codecs (some bring important performance improvements):
10 * BloscLZ updated to 2.5.1.
11 * Zlib updated to 1.2.11
12 * Zstd updated to 1.5.0
13
14
15 Changes from 1.20.1 to 1.21.0
16 =============================
17
18 * Updated zstd codec to 1.4.8.
19
20 * Updated lz4 codec to 1.9.3.
21
22 * New instructions on how to use the libraries in python-blosc wheels
23 so as to compile C-Blosc applications. See:
24 https://github.com/Blosc/c-blosc/blob/master/COMPILING_WITH_WHEELS.rst
25
326
427 Changes from 1.20.0 to 1.20.1
528 =============================
730 * Added `<unistd.h>` in vendored zlib 1.2.8 for compatibility with Python 3.8
831 in recent Mac OSX. For details, see:
932 https://github.com/Blosc/python-blosc/issues/229
33
1034
1135 Changes from 1.19.1 to 1.20.0
1236 =============================
1313 if (LZ4_FOUND)
1414 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${LZ4_INCLUDE_DIR})
1515 else(LZ4_FOUND)
16 set(LZ4_LOCAL_DIR ${INTERNAL_LIBS}/lz4-1.9.2)
16 set(LZ4_LOCAL_DIR ${INTERNAL_LIBS}/lz4-1.9.3)
1717 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${LZ4_LOCAL_DIR})
1818 endif(LZ4_FOUND)
1919 endif(NOT DEACTIVATE_LZ4)
3131 if (ZLIB_FOUND)
3232 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${ZLIB_INCLUDE_DIR})
3333 else(ZLIB_FOUND)
34 set(ZLIB_LOCAL_DIR ${INTERNAL_LIBS}/zlib-1.2.8)
34 set(ZLIB_LOCAL_DIR ${INTERNAL_LIBS}/zlib-1.2.11)
3535 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${ZLIB_LOCAL_DIR})
3636 endif(ZLIB_FOUND)
3737 endif(NOT DEACTIVATE_ZLIB)
4040 if (ZSTD_FOUND)
4141 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${ZSTD_INCLUDE_DIR})
4242 else (ZSTD_FOUND)
43 set(ZSTD_LOCAL_DIR ${INTERNAL_LIBS}/zstd-1.4.5)
43 set(ZSTD_LOCAL_DIR ${INTERNAL_LIBS}/zstd-1.5.0)
4444 set(BLOSC_INCLUDE_DIRS ${BLOSC_INCLUDE_DIRS} ${ZSTD_LOCAL_DIR} ${ZSTD_LOCAL_DIR}/common)
4545 endif (ZSTD_FOUND)
4646 endif (NOT DEACTIVATE_ZSTD)
6464 set(lib_dir lib${LIB_SUFFIX})
6565 set(version_string ${BLOSC_VERSION_MAJOR}.${BLOSC_VERSION_MINOR}.${BLOSC_VERSION_PATCH})
6666
67 set(CMAKE_THREAD_PREFER_PTHREAD TRUE)
67 set(CMAKE_THREAD_PREFER_PTHREAD TRUE) # pre 3.1
68 set(THREADS_PREFER_PTHREAD_FLAG TRUE) # CMake 3.1+
6869 if(WIN32)
6970 # try to use the system library
7071 find_package(Threads)
7273 message(STATUS "using the internal pthread library for win32 systems.")
7374 set(SOURCES ${SOURCES} win32/pthread.c)
7475 else(NOT Threads_FOUND)
75 set(LIBS ${LIBS} ${CMAKE_THREAD_LIBS_INIT})
76 if(CMAKE_VERSION VERSION_LESS 3.1)
77 set(LIBS ${LIBS} ${CMAKE_THREAD_LIBS_INIT})
78 else()
79 set(LIBS ${LIBS} Threads::Threads)
80 endif()
7681 endif(NOT Threads_FOUND)
7782 else(WIN32)
7883 find_package(Threads REQUIRED)
79 set(LIBS ${LIBS} ${CMAKE_THREAD_LIBS_INIT})
84 if(CMAKE_VERSION VERSION_LESS 3.1)
85 set(LIBS ${LIBS} ${CMAKE_THREAD_LIBS_INIT})
86 else()
87 set(LIBS ${LIBS} Threads::Threads)
88 endif()
8089 endif(WIN32)
8190
8291 if(NOT DEACTIVATE_LZ4)
185194 set_property(
186195 TARGET blosc_shared_testing
187196 APPEND PROPERTY COMPILE_DEFINITIONS BLOSC_TESTING)
188 # TEMP : CMake doesn't automatically add -lpthread here like it does
189 # for the blosc_shared target. Force it for now.
190 if(UNIX)
191 set_property(
192 TARGET blosc_shared_testing
193 APPEND PROPERTY LINK_FLAGS "-lpthread")
194 endif()
195197 endif()
196198
197199 if (BUILD_SHARED)
650650 }
651651 if (context->compcode == BLOSC_BLOSCLZ) {
652652 cbytes = blosclz_compress(context->clevel, _tmp+j*neblock, neblock,
653 dest, maxout);
653 dest, maxout, !dont_split);
654654 }
655655 #if defined(HAVE_LZ4)
656656 else if (context->compcode == BLOSC_LZ4) {
1717
1818 /* Version numbers */
1919 #define BLOSC_VERSION_MAJOR 1 /* for major interface/format changes */
20 #define BLOSC_VERSION_MINOR 20 /* for minor interface/format changes */
20 #define BLOSC_VERSION_MINOR 21 /* for minor interface/format changes */
2121 #define BLOSC_VERSION_RELEASE 1 /* for tweaks, bug-fixes, or development */
2222
23 #define BLOSC_VERSION_STRING "1.20.1" /* string version. Sync with above! */
23 #define BLOSC_VERSION_STRING "1.21.1" /* string version. Sync with above! */
2424 #define BLOSC_VERSION_REVISION "$Rev$" /* revision version */
25 #define BLOSC_VERSION_DATE "$Date:: 2020-09-08 #$" /* date version */
26
27 #define BLOSCLZ_VERSION_STRING "2.3.0" /* the internal compressor version */
25 #define BLOSC_VERSION_DATE "$Date:: 2021-10-06 #$" /* date version */
26
27 #define BLOSCLZ_VERSION_STRING "2.5.1" /* the internal compressor version */
2828
2929 /* The *_FORMAT symbols should be just 1-byte long */
3030 #define BLOSC_VERSION_FORMAT 2 /* Blosc format version, starting at 1 */
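Client code that depends on behaviour introduced in a given release may want to compare against the version macros bumped above. The following is a minimal sketch, not part of the commit: the three macros are redefined locally as stand-ins so the snippet compiles on its own, and `blosc_version_at_least` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the values in blosc.h after this change; real code would
   include <blosc.h> instead of defining them here. */
#define BLOSC_VERSION_MAJOR 1
#define BLOSC_VERSION_MINOR 21
#define BLOSC_VERSION_RELEASE 1

/* Hypothetical helper: true if the header version is at least
   major.minor.release, compared field by field. */
static bool blosc_version_at_least(int major, int minor, int release) {
  assert(major >= 0 && minor >= 0 && release >= 0);
  if (BLOSC_VERSION_MAJOR != major) return BLOSC_VERSION_MAJOR > major;
  if (BLOSC_VERSION_MINOR != minor) return BLOSC_VERSION_MINOR > minor;
  return BLOSC_VERSION_RELEASE >= release;
}
```

Comparing the numeric macros avoids parsing `BLOSC_VERSION_STRING`, which the header asks maintainers to keep in sync by hand.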
00 /*********************************************************************
11 Blosc - Blocked Shuffling and Compression Library
22
3 Author: Francesc Alted <francesc@blosc.org>
4 Creation date: 2009-05-20
3 Copyright (C) 2021 The Blosc Developers <blosc@blosc.org>
4 https://blosc.org
5 License: BSD 3-Clause (see LICENSE.txt)
56
67 See LICENSE.txt for details about copyright and rights to use.
78 **********************************************************************/
4243 #define MAX_FARDISTANCE (65535 + MAX_DISTANCE - 1)
4344
4445 #ifdef BLOSC_STRICT_ALIGN
45 #define BLOSCLZ_READU16(p) ((p)[0] | (p)[1]<<8)
46 #define BLOSCLZ_READU16(p) ((p)[0] | (p)[1]<<8)
4647 #define BLOSCLZ_READU32(p) ((p)[0] | (p)[1]<<8 | (p)[2]<<16 | (p)[3]<<24)
4748 #else
48 #define BLOSCLZ_READU16(p) *((const uint16_t*)(p))
49 #define BLOSCLZ_READU32(p) *((const uint32_t*)(p))
50 #endif
51
52 #define HASH_LOG (12U)
49 #define BLOSCLZ_READU16(p) *((const uint16_t*)(p))
50 #define BLOSCLZ_READU32(p) *((const uint32_t*)(p))
51 #endif
52
53 #define HASH_LOG (14U)
54 #define HASH_LOG2 (12U)
5355
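The two branches of `BLOSCLZ_READU32` above trade portability for speed: the `BLOSC_STRICT_ALIGN` branch assembles the value byte by byte, the other dereferences a possibly unaligned pointer. As an illustration only, the sketch below shows the byte-wise read as a function, plus a `memcpy`-based alternative that is also safe on strict-alignment targets. Both function names are hypothetical and neither appears in the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian read, mirroring the BLOSC_STRICT_ALIGN branch.
   Each byte is widened to uint32_t before shifting so that the shift by 24
   never touches the sign bit of an int. */
static uint32_t read_u32_bytewise(const uint8_t *p) {
  assert(p != NULL);
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Alternative (not what the commit does): memcpy into a local, which reads
   in host byte order and has no alignment requirement on p. */
static uint32_t read_u32_memcpy(const uint8_t *p) {
  uint32_t v;
  assert(p != NULL);
  memcpy(&v, p, sizeof v);
  return v;
}
```

On a little-endian host both functions return the same value; on a big-endian host only the byte-wise version keeps the little-endian interpretation.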
5456 // This is used in LZ4 and seems to work pretty well here too
5557 #define HASH_FUNCTION(v, s, h) { \
6062 #if defined(__AVX2__)
6163 static uint8_t *get_run_32(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
6264 uint8_t x = ip[-1];
63 /* safe because the outer check against ip limit */
64 if (ip < (ip_bound - sizeof(int64_t))) {
65 int64_t value, value2;
66 /* Broadcast the value for every byte in a 64-bit register */
67 memset(&value, x, 8);
68 #if defined(BLOSC_STRICT_ALIGN)
69 memcpy(&value2, ref, 8);
70 #else
71 value2 = ((int64_t*)ref)[0];
72 #endif
73 if (value != value2) {
74 /* Return the byte that starts to differ */
75 while (*ref++ == x) ip++;
76 return ip;
77 }
78 else {
79 ip += 8;
80 ref += 8;
81 }
82 }
83 if (ip < (ip_bound - sizeof(__m128i))) {
84 __m128i value, value2, cmp;
85 /* Broadcast the value for every byte in a 128-bit register */
86 memset(&value, x, sizeof(__m128i));
87 value2 = _mm_loadu_si128((__m128i *) ref);
88 cmp = _mm_cmpeq_epi32(value, value2);
89 if (_mm_movemask_epi8(cmp) != 0xFFFF) {
90 /* Return the byte that starts to differ */
91 while (*ref++ == x) ip++;
92 return ip;
93 } else {
94 ip += sizeof(__m128i);
95 ref += sizeof(__m128i);
96 }
97 }
65
9866 while (ip < (ip_bound - (sizeof(__m256i)))) {
9967 __m256i value, value2, cmp;
10068 /* Broadcast the value for every byte in a 256-bit register */
11583 while ((ip < ip_bound) && (*ref++ == x)) ip++;
11684 return ip;
11785 }
118
119 #elif defined(__SSE2__)
120
86 #endif
87
88 #if defined(__SSE2__)
12189 static uint8_t *get_run_16(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
12290 uint8_t x = ip[-1];
12391
124 if (ip < (ip_bound - sizeof(int64_t))) {
125 int64_t value, value2;
126 /* Broadcast the value for every byte in a 64-bit register */
127 memset(&value, x, 8);
128 #if defined(BLOSC_STRICT_ALIGN)
129 memcpy(&value2, ref, 8);
130 #else
131 value2 = ((int64_t*)ref)[0];
132 #endif
133 if (value != value2) {
134 /* Return the byte that starts to differ */
135 while (*ref++ == x) ip++;
136 return ip;
137 }
138 else {
139 ip += 8;
140 ref += 8;
141 }
142 }
143 /* safe because the outer check against ip limit */
14492 while (ip < (ip_bound - sizeof(__m128i))) {
14593 __m128i value, value2, cmp;
14694 /* Broadcast the value for every byte in a 128-bit register */
162110 return ip;
163111 }
164112
165 #else
113 #endif
114
166115
167116 static uint8_t *get_run(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
168117 uint8_t x = ip[-1];
191140 return ip;
192141 }
193142
194 #endif
195
196143
197144 /* Return the byte that starts to differ */
198145 static uint8_t *get_match(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
219166 static uint8_t *get_match_16(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
220167 __m128i value, value2, cmp;
221168
222 if (ip < (ip_bound - sizeof(int64_t))) {
223 if (*(int64_t *) ref != *(int64_t *) ip) {
224 /* Return the byte that starts to differ */
225 while (*ref++ == *ip++) {}
226 return ip;
227 } else {
228 ip += sizeof(int64_t);
229 ref += sizeof(int64_t);
230 }
231 }
232169 while (ip < (ip_bound - sizeof(__m128i))) {
233170 value = _mm_loadu_si128((__m128i *) ip);
234171 value2 = _mm_loadu_si128((__m128i *) ref);
235172 cmp = _mm_cmpeq_epi32(value, value2);
236173 if (_mm_movemask_epi8(cmp) != 0xFFFF) {
237174 /* Return the byte that starts to differ */
238 return get_match(ip, ip_bound, ref);
175 while (*ref++ == *ip++) {}
176 return ip;
239177 }
240178 else {
241179 ip += sizeof(__m128i);
252190 #if defined(__AVX2__)
253191 static uint8_t *get_match_32(uint8_t *ip, const uint8_t *ip_bound, const uint8_t *ref) {
254192
255 if (ip < (ip_bound - sizeof(int64_t))) {
256 if (*(int64_t *) ref != *(int64_t *) ip) {
257 /* Return the byte that starts to differ */
258 while (*ref++ == *ip++) {}
259 return ip;
260 } else {
261 ip += sizeof(int64_t);
262 ref += sizeof(int64_t);
263 }
264 }
265 if (ip < (ip_bound - sizeof(__m128i))) {
266 __m128i value, value2, cmp;
267 value = _mm_loadu_si128((__m128i *) ip);
268 value2 = _mm_loadu_si128((__m128i *) ref);
269 cmp = _mm_cmpeq_epi32(value, value2);
270 if (_mm_movemask_epi8(cmp) != 0xFFFF) {
271 /* Return the byte that starts to differ */
272 return get_match_16(ip, ip_bound, ref);
273 }
274 else {
275 ip += sizeof(__m128i);
276 ref += sizeof(__m128i);
277 }
278 }
279193 while (ip < (ip_bound - sizeof(__m256i))) {
280194 __m256i value, value2, cmp;
281195 value = _mm256_loadu_si256((__m256i *) ip);
298212 #endif
299213
300214
301 static uint8_t* get_run_or_match(uint8_t* ip, uint8_t* ip_bound, const uint8_t* ref, bool run) {
215 static uint8_t* get_run_or_match(uint8_t* ip, const uint8_t* ip_bound, const uint8_t* ref, bool run) {
302216 if (BLOSCLZ_UNLIKELY(run)) {
303217 #if defined(__AVX2__)
304 ip = get_run_32(ip, ip_bound, ref);
218 // Extensive experiments on AMD Ryzen3 say that regular get_run is faster
219 // ip = get_run_32(ip, ip_bound, ref);
220 ip = get_run(ip, ip_bound, ref);
305221 #elif defined(__SSE2__)
306 ip = get_run_16(ip, ip_bound, ref);
222 // Extensive experiments on AMD Ryzen3 say that regular get_run is faster
223 // ip = get_run_16(ip, ip_bound, ref);
224 ip = get_run(ip, ip_bound, ref);
307225 #else
308226 ip = get_run(ip, ip_bound, ref);
309227 #endif
310228 }
311229 else {
312230 #if defined(__AVX2__)
313 ip = get_match_32(ip, ip_bound, ref);
231 // Extensive experiments on AMD Ryzen3 say that regular get_match_16 is faster
232 // ip = get_match_32(ip, ip_bound, ref);
233 ip = get_match_16(ip, ip_bound, ref);
314234 #elif defined(__SSE2__)
315235 ip = get_match_16(ip, ip_bound, ref);
316236 #else
334254 } \
335255 }
336256
337 #define LITERAL2(ip, oc, anchor, copy) { \
257 #define LITERAL2(ip, anchor, copy) { \
338258 oc++; anchor++; \
339259 ip = anchor; \
340260 copy++; \
344264 } \
345265 }
346266
347 #define DISTANCE_SHORT(op, op_limit, len, distance) { \
267 #define MATCH_SHORT(op, op_limit, len, distance) { \
348268 if (BLOSCLZ_UNLIKELY(op + 2 > op_limit)) \
349269 goto out; \
350270 *op++ = (uint8_t)((len << 5U) + (distance >> 8U)); \
351271 *op++ = (uint8_t)((distance & 255U)); \
352272 }
353273
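To make the byte layout written by `MATCH_SHORT` easier to follow, here is the same encoding sketched as a function. The name `encode_match_short` and the `NULL` return on overflow are hypothetical (the macro uses `goto out` instead), and the 13-bit bound on `distance` is an assumption inferred from the bit layout rather than something stated in this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function form of MATCH_SHORT. Writes two bytes:
     byte 0: 3 bits of length, then the top 5 bits of the distance
     byte 1: the low 8 bits of the distance
   Returns the advanced output pointer, or NULL if there is no room. */
static uint8_t *encode_match_short(uint8_t *op, const uint8_t *op_limit,
                                   unsigned len, unsigned distance) {
  assert(len < 7);          /* lengths of 7 and above take the long form */
  assert(distance < 8192);  /* assumption: distance fits in 13 bits */
  if (op + 2 > op_limit) return NULL;
  *op++ = (uint8_t)((len << 5U) + (distance >> 8U));
  *op++ = (uint8_t)(distance & 255U);
  return op;
}
```

For example, a match of length 3 at distance 0x123 is stored as the bytes `(3 << 5) + 1` and `0x23`.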
354 #define DISTANCE_LONG(op, op_limit, len, distance) { \
274 #define MATCH_LONG(op, op_limit, len, distance) { \
355275 if (BLOSCLZ_UNLIKELY(op + 1 > op_limit)) \
356276 goto out; \
357277 *op++ = (uint8_t)((7U << 5U) + (distance >> 8U)); \
366286 *op++ = (uint8_t)((distance & 255U)); \
367287 }
368288
369 #define DISTANCE_SHORT_FAR(op, op_limit, len, distance) { \
289 #define MATCH_SHORT_FAR(op, op_limit, len, distance) { \
370290 if (BLOSCLZ_UNLIKELY(op + 4 > op_limit)) \
371291 goto out; \
372292 *op++ = (uint8_t)((len << 5U) + 31); \
375295 *op++ = (uint8_t)(distance & 255U); \
376296 }
377297
378 #define DISTANCE_LONG_FAR(op, op_limit, len, distance) { \
298 #define MATCH_LONG_FAR(op, op_limit, len, distance) { \
379299 if (BLOSCLZ_UNLIKELY(op + 1 > op_limit)) \
380300 goto out; \
381301 *op++ = (7U << 5U) + 31; \
393313 }
394314
395315
396 // Get the compressed size of a buffer. Useful for testing compression ratios for high clevels.
397 static int get_csize(uint8_t* ibase, int maxlen, bool force_3b_shift) {
316 // Get a guess for the compression ratio of a buffer
317 static double get_cratio(uint8_t* ibase, int maxlen, int minlen, int ipshift) {
398318 uint8_t* ip = ibase;
399319 int32_t oc = 0;
400 uint8_t* ip_bound = ibase + maxlen - 1;
401 uint8_t* ip_limit = ibase + maxlen - 12;
402 uint32_t htab[1U << (uint8_t)HASH_LOG];
320 const uint16_t hashlen = (1U << (uint8_t)HASH_LOG2);
321 uint16_t htab[1U << (uint8_t)HASH_LOG2];
403322 uint32_t hval;
404323 uint32_t seq;
405324 uint8_t copy;
325 // Make a tradeoff between testing too much and too little
326 uint16_t limit = (maxlen > hashlen) ? hashlen : maxlen;
327 uint8_t* ip_bound = ibase + limit - 1;
328 uint8_t* ip_limit = ibase + limit - 12;
406329
407330 // Initialize the hash table to distances of 0
408 for (unsigned i = 0; i < (1U << HASH_LOG); i++) {
409 htab[i] = 0;
410 }
331 memset(htab, 0, hashlen * sizeof(uint16_t));
411332
412333 /* we start with literal copy */
413334 copy = 4;
421342
422343 /* find potential match */
423344 seq = BLOSCLZ_READU32(ip);
424 HASH_FUNCTION(hval, seq, HASH_LOG)
345 HASH_FUNCTION(hval, seq, HASH_LOG2)
425346 ref = ibase + htab[hval];
426347
427348 /* calculate distance to the match */
428 distance = anchor - ref;
349 distance = (unsigned int)(anchor - ref);
429350
430351 /* update hash table */
431 htab[hval] = (uint32_t) (anchor - ibase);
352 htab[hval] = (uint16_t) (anchor - ibase);
432353
433354 if (distance == 0 || (distance >= MAX_FARDISTANCE)) {
434 LITERAL2(ip, oc, anchor, copy)
355 LITERAL2(ip, anchor, copy)
435356 continue;
436357 }
437358
438359 /* is this a match? check the first 4 bytes */
439 if (BLOSCLZ_UNLIKELY(BLOSCLZ_READU32(ref) == BLOSCLZ_READU32(ip))) {
360 if (BLOSCLZ_READU32(ref) == BLOSCLZ_READU32(ip)) {
440361 ref += 4;
441362 }
442363 else {
443364 /* no luck, copy as a literal */
444 LITERAL2(ip, oc, anchor, copy)
365 LITERAL2(ip, anchor, copy)
445366 continue;
446367 }
447368
454375 /* get runs or matches; zero distance means a run */
455376 ip = get_run_or_match(ip, ip_bound, ref, !distance);
456377
457 ip -= force_3b_shift ? 3 : 4;
378 ip -= ipshift;
458379 unsigned len = (int)(ip - anchor);
459 // If match is close, let's reduce the minimum length to encode it
460 unsigned minlen = (distance < MAX_DISTANCE) ? 3 : 4;
461 // Encoding short lengths is expensive during decompression
462380 if (len < minlen) {
463 LITERAL2(ip, oc, anchor, copy)
381 LITERAL2(ip, anchor, copy)
464382 continue;
465383 }
466384
467 /* if we have'nt copied anything, adjust the output counter */
385 /* if we haven't copied anything, adjust the output counter */
468386 if (!copy)
469387 oc--;
470388 /* reset literal counter */
487405
488406 /* update the hash at match boundary */
489407 seq = BLOSCLZ_READU32(ip);
490 HASH_FUNCTION(hval, seq, HASH_LOG)
491 htab[hval] = (uint32_t) (ip++ - ibase);
492 seq >>= 8U;
493 HASH_FUNCTION(hval, seq, HASH_LOG)
494 htab[hval] = (uint32_t) (ip++ - ibase);
408 HASH_FUNCTION(hval, seq, HASH_LOG2)
409 htab[hval] = (uint16_t)(ip++ - ibase);
410 ip++;
495411 /* assuming literal copy */
496412 oc++;
497
498 }
499
500 /* if we have copied something, adjust the copy length */
501 if (!copy)
502 oc--;
503
504 return (int)oc;
413 }
414
415 double ic;
416 ic = (double)(ip - ibase);
417 return ic / (double)oc;
505418 }
506419
507420
508421 int blosclz_compress(const int clevel, const void* input, int length,
509 void* output, int maxout) {
422 void* output, int maxout, const int split_block) {
510423 uint8_t* ibase = (uint8_t*)input;
511 uint8_t* ip = ibase;
512 uint8_t* ip_bound = ibase + length - 1;
513 uint8_t* ip_limit = ibase + length - 12;
514 uint8_t* op = (uint8_t*)output;
515 uint8_t* op_limit;
516 uint32_t htab[1U << (uint8_t)HASH_LOG];
517 uint32_t hval;
518 uint32_t seq;
519 uint8_t copy;
520
521 op_limit = op + maxout;
522
523 // Minimum lengths for encoding
524 unsigned minlen_[10] = {0, 12, 12, 11, 10, 9, 8, 7, 6, 5};
525
526 // Minimum compression ratios for initiate encoding
527 double cratio_[10] = {0, 2, 2, 2, 2, 1.8, 1.6, 1.4, 1.2, 1.1};
528
529 uint8_t hashlog_[10] = {0, HASH_LOG - 2, HASH_LOG - 1, HASH_LOG, HASH_LOG,
530 HASH_LOG, HASH_LOG, HASH_LOG, HASH_LOG, HASH_LOG};
531 uint8_t hashlog = hashlog_[clevel];
532 // Initialize the hash table to distances of 0
533 for (unsigned i = 0; i < (1U << hashlog); i++) {
534 htab[i] = 0;
535 }
536
537 /* input and output buffer cannot be less than 16 and 66 bytes or we can get into trouble */
538 if (length < 16 || maxout < 66) {
539 return 0;
424
425 // Experiments say that checking 1/4 of the buffer is enough to figure out approx cratio
426 int maxlen = length / 4;
427 // Start probing somewhere inside the buffer
428 int shift = length - maxlen;
429 // Actual entropy probing!
430 double cratio = get_cratio(ibase + shift, maxlen, 3, 3);
431 // discard probes with small compression ratios (too expensive)
432 double cratio_[10] = {0, 2, 1.5, 1.2, 1.2, 1.2, 1.2, 1.15, 1.1, 1.0};
433 if (cratio < cratio_[clevel]) {
434 goto out;
540435 }
541436
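The probing step added above can be summarised in two small pieces: which part of the buffer is examined, and when the estimated ratio is considered too low. The sketch below restates that arithmetic under the values visible in this diff (`HASH_LOG2` of 12 and the new `cratio_` table). The type `probe_window` and both function names are hypothetical.

```c
#include <assert.h>

#define HASH_LOG2 (12U)  /* value taken from this diff */

/* Hypothetical description of the region handed to the probe */
typedef struct {
  int offset;  /* where probing starts inside the input buffer */
  int len;     /* how many bytes the probe actually examines */
} probe_window;

/* The last quarter of the buffer is probed, and get_cratio() further caps
   the examined length at 1 << HASH_LOG2 bytes. */
static probe_window probe_window_for(int length) {
  assert(length >= 0);
  int maxlen = length / 4;
  int hashlen = (int)(1U << HASH_LOG2);
  probe_window w;
  w.offset = length - maxlen;
  w.len = (maxlen > hashlen) ? hashlen : maxlen;
  return w;
}

/* Minimum estimated ratio per compression level, as in the new cratio_ table */
static const double min_cratio[10] = {0, 2, 1.5, 1.2, 1.2, 1.2, 1.2, 1.15, 1.1, 1.0};

/* Nonzero when the probe result justifies running the full compressor */
static int worth_compressing(int clevel, double cratio) {
  assert(clevel >= 0 && clevel <= 9);
  return cratio >= min_cratio[clevel];
}
```

With a 32768-byte block, for instance, probing starts at offset 24576 and looks at 4096 bytes; at clevel 1 an estimated ratio of 1.9 would be rejected.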
542437 /* When we go back in a match (shift), we obtain quite different compression properties.
543438 * It looks like 4 is more useful in combination with bitshuffle and small typesizes
544 * (compress better and faster in e.g. `b2bench blosclz bitshuffle single 6 6291456 1 19`).
545 * Fallback to 4 because it provides more consistent results on small itemsizes.
439 * Fallback to 4 because it provides more consistent results for large cratios.
546440 *
547441 * In this block we also check cratios for the beginning of the buffers and
548442 * eventually discard those that are small (take too long to decompress).
549443 * This process is called _entropy probing_.
550444 */
551 int ipshift = 4;
552 int maxlen; // maximum length for entropy probing
553 int csize_3b;
554 int csize_4b;
555 double cratio = 0;
556 switch (clevel) {
557 case 1:
558 case 2:
559 case 3:
560 maxlen = length / 8;
561 csize_4b = get_csize(ibase, maxlen, false);
562 cratio = (double)maxlen / csize_4b;
563 break;
564 case 4:
565 case 5:
566 case 6:
567 case 7:
568 case 8:
569 maxlen = length / 8;
570 csize_4b = get_csize(ibase, maxlen, false);
571 cratio = (double)maxlen / csize_4b;
572 break;
573 case 9:
574 // case 9 is special. we need to asses the optimal shift
575 maxlen = length / 8;
576 csize_3b = get_csize(ibase, maxlen, true);
577 csize_4b = get_csize(ibase, maxlen, false);
578 ipshift = (csize_3b < csize_4b) ? 3 : 4;
579 cratio = (csize_3b < csize_4b) ? ((double)maxlen / csize_3b) : ((double)maxlen / csize_4b);
580 break;
581 default:
582 break;
583 }
584 // discard probes with small compression ratios (too expensive)
585 if (cratio < cratio_ [clevel]) {
586 goto out;
587 }
445 unsigned ipshift = 4;
446 // Compute optimal shift and minimum lengths for encoding
447 // Use 4 by default, except for low entropy data, where we should do a best effort
448 unsigned minlen = 4;
449 // BloscLZ works better with splits mostly, so when data is not split, do a best effort
450 // The cratio < 4 threshold is based on experiments with low and high entropy data
451 if (!split_block || cratio < 4) {
452 ipshift = 3;
453 minlen = 3;
454 }
455
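The selection of `ipshift` and `minlen` above reduces to a single decision. As a sketch, with the hypothetical names `match_params` and `choose_match_params`, and assuming `split_block` is passed as 0 or 1 (the caller in this diff passes `!dont_split`):

```c
#include <assert.h>

/* Hypothetical pair holding the two tunables chosen before the main loop */
typedef struct {
  unsigned ipshift;  /* how far ip is moved back after a match */
  unsigned minlen;   /* shortest match length worth encoding */
} match_params;

/* Default to 4/4; fall back to 3/3 (best effort) when the block is not
   split or the probe estimated a compression ratio below 4. */
static match_params choose_match_params(int split_block, double cratio) {
  assert(split_block == 0 || split_block == 1);  /* assumption of this sketch */
  match_params p = {4, 4};
  if (!split_block || cratio < 4) {
    p.ipshift = 3;
    p.minlen = 3;
  }
  return p;
}
```

Note that a split block still gets the 3/3 setting whenever its estimated ratio is below 4, so the 4/4 default only applies to split blocks that probe as highly compressible.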
456 uint8_t hashlog_[10] = {0, HASH_LOG - 2, HASH_LOG - 1, HASH_LOG, HASH_LOG,
457 HASH_LOG, HASH_LOG, HASH_LOG, HASH_LOG, HASH_LOG};
458 uint8_t hashlog = hashlog_[clevel];
459
460 uint8_t* ip = ibase;
461 const uint8_t* ip_bound = ibase + length - 1;
462 const uint8_t* ip_limit = ibase + length - 12;
463 uint8_t* op = (uint8_t*)output;
464 const uint8_t* op_limit = op + maxout;
465
466 /* input and output buffer cannot be less than 16 and 66 bytes or we can get into trouble */
467 if (length < 16 || maxout < 66) {
468 return 0;
469 }
470
471 // Initialize the hash table
472 uint32_t htab[1U << (uint8_t)HASH_LOG];
473 memset(htab, 0, (1U << hashlog) * sizeof(uint32_t));
588474
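Although `htab` is always declared with `1U << HASH_LOG` entries, only the first `1U << hashlog` of them are cleared and used, and `hashlog` depends on the compression level. A small sketch of that mapping, using the `HASH_LOG` value of 14 from this diff and a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_LOG (14U)  /* value taken from this diff */

/* Number of hash table entries actually used for a given clevel,
   following the hashlog_ table in blosclz_compress(). */
static size_t hash_entries_for_clevel(int clevel) {
  static const uint8_t hashlog_[10] = {0, HASH_LOG - 2, HASH_LOG - 1, HASH_LOG,
                                       HASH_LOG, HASH_LOG, HASH_LOG, HASH_LOG,
                                       HASH_LOG, HASH_LOG};
  assert(clevel >= 0 && clevel <= 9);
  return (size_t)1 << hashlog_[clevel];
}
```

Levels 1 and 2 therefore use 4096 and 8192 entries, while levels 3 through 9 use the full 16384.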
589475 /* we start with literal copy */
590 copy = 4;
476 uint8_t copy = 4;
591477 *op++ = MAX_COPY - 1;
592478 *op++ = *ip++;
593479 *op++ = *ip++;
601487 uint8_t* anchor = ip; /* comparison starting-point */
602488
603489 /* find potential match */
604 seq = BLOSCLZ_READU32(ip);
490 uint32_t seq = BLOSCLZ_READU32(ip);
491 uint32_t hval;
605492 HASH_FUNCTION(hval, seq, hashlog)
606493 ref = ibase + htab[hval];
607494
608495 /* calculate distance to the match */
609 distance = anchor - ref;
496 distance = (unsigned int)(anchor - ref);
610497
611498 /* update hash table */
612499 htab[hval] = (uint32_t) (anchor - ibase);
638525 ip -= ipshift;
639526
640527 unsigned len = (int)(ip - anchor);
641 // If match is close, let's reduce the minimum length to encode it
642 unsigned minlen = (clevel == 9) ? ipshift : minlen_[clevel];
643528
644529 // Encoding short lengths is expensive during decompression
645 // Encode only for reasonable lengths (extensive experiments done)
646530 if (len < minlen || (len <= 5 && distance >= MAX_DISTANCE)) {
647531 LITERAL(ip, op, op_limit, anchor, copy)
648532 continue;
661545 /* encode the match */
662546 if (distance < MAX_DISTANCE) {
663547 if (len < 7) {
664 DISTANCE_SHORT(op, op_limit, len, distance)
548 MATCH_SHORT(op, op_limit, len, distance)
665549 } else {
666 DISTANCE_LONG(op, op_limit, len, distance)
550 MATCH_LONG(op, op_limit, len, distance)
667551 }
668552 } else {
669553 /* far away, but not yet in the another galaxy... */
670554 distance -= MAX_DISTANCE;
671555 if (len < 7) {
672 DISTANCE_SHORT_FAR(op, op_limit, len, distance)
556 MATCH_SHORT_FAR(op, op_limit, len, distance)
673557 } else {
674 DISTANCE_LONG_FAR(op, op_limit, len, distance)
558 MATCH_LONG_FAR(op, op_limit, len, distance)
675559 }
676560 }
677561
679563 seq = BLOSCLZ_READU32(ip);
680564 HASH_FUNCTION(hval, seq, hashlog)
681565 htab[hval] = (uint32_t) (ip++ - ibase);
682 seq >>= 8U;
683 HASH_FUNCTION(hval, seq, hashlog)
684 htab[hval] = (uint32_t) (ip++ - ibase);
685 /* assuming literal copy */
566 if (clevel == 9) {
567 // In some situations, including a second hash proves to be useful,
568 // but not in others. Activating here in max clevel only.
569 seq >>= 8U;
570 HASH_FUNCTION(hval, seq, hashlog)
571 htab[hval] = (uint32_t) (ip++ - ibase);
572 }
573 else {
574 ip++;
575 }
686576
687577 if (BLOSCLZ_UNLIKELY(op + 1 > op_limit))
688578 goto out;
579
580 /* assuming literal copy */
689581 *op++ = MAX_COPY - 1;
690582 }
691583
716608 }
717609
718610 // See https://habr.com/en/company/yandex/blog/457612/
719 #ifdef __AVX2__
611 #if defined(__AVX2__)
720612
721613 #if defined(_MSC_VER)
722614 #define ALIGNED_(x) __declspec(align(x))
852744 }
853745 else {
854746 // general copy with any overlap
855 #ifdef __AVX2__
747 #if defined(__AVX2__)
856748 if (op - ref <= 16) {
857749 // This is not faster on a combination of compilers (clang, gcc, icc) or machines, but
858750 // it is not slower either. Let's activate here for experimentation.
860752 }
861753 else {
862754 #endif
863 op = copy_match(op, ref, (unsigned) len);
864 #ifdef __AVX2__
755 op = copy_match(op, ref, (unsigned) len);
756 #if defined(__AVX2__)
865757 }
866758 #endif
867759 }
4141 */
4242
4343 int blosclz_compress(int opt_level, const void* input, int length,
44 void* output, int maxout);
44 void* output, int maxout, int split_block);
4545
4646 /**
4747 Decompress a block of compressed data and returns the size of the