Clarify comment on unrolling that this is now the default and where the macrofied version can be found in the history.
Also fix a small typo.
frankw
6 years ago
@@ -39,16 +39,16 @@

 Below is the speed in MB/s for a single core (ranked fast to slow) as well as the factor of improvement over `crypto/sha256` (when applicable).

-| Processor | Package | Speed | Improvement |
-| --------------------------------- | ---------------------------- | -----------:| -----------:|
-| 1.2 GHz ARM Cortex-A53 | minio/sha256-simd (ARM64) | 638.2 MB/s | 105x |
-| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (AVX2) (*) | 355.0 MB/s | 1.88x |
-| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (AVX) | 306.0 MB/s | 1.62x |
-| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (SSE) | 298.7 MB/s | 1.58x |
-| 2.4 GHz Intel Xeon CPU E5-2620 v3 | crypto/sha256 | 189.2 MB/s | |
-| 1.2 GHz ARM Cortex-A53 | crypto/sha256 | 6.1 MB/s | |
+| Processor | Package | Speed | Improvement |
+| --------------------------------- | ------------------------- | -----------:| -----------:|
+| 1.2 GHz ARM Cortex-A53 | minio/sha256-simd (ARM64) | 638.2 MB/s | 105x |
+| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (AVX2) | 355.0 MB/s | 1.88x |
+| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (AVX) | 306.0 MB/s | 1.62x |
+| 2.4 GHz Intel Xeon CPU E5-2620 v3 | minio/sha256-simd (SSE) | 298.7 MB/s | 1.58x |
+| 2.4 GHz Intel Xeon CPU E5-2620 v3 | crypto/sha256 | 189.2 MB/s | |
+| 1.2 GHz ARM Cortex-A53 | crypto/sha256 | 6.1 MB/s | |

-(*) Measured with the "unrolled"/"demacro-ed" AVX2 version. Due to some Golang assembly restrictions the AVX2 version that uses `defines` loses about 15% performance. The optimized version is contained in the git history so for maximum speed you want to do this after getting: `git cat-file blob 586b6e > sha256blockAvx2_amd64.s` (or vendor it for your project; see [here](https://github.com/minio/sha256-simd/blob/13b11bdf9b0580a756a111492d2ae382bab7ec79/sha256blockAvx2_amd64.s) to view it in its full glory).
+Note that the AVX2 version is measured with the "unrolled"/"demacro-ed" version. Due to some Golang assembly restrictions the AVX2 version that uses `defines` loses about 15% performance (you can see the macrofied version, which is a little bit easier to read, here https://github.com/minio/sha256-simd/blob/e1b0a493b71bb31e3f1bf82d3b8cbd0d6960dfa6/sha256blockAvx2_amd64.s).

 See further down for detailed performance.

@@ -18,7 +18,7 @@
 //

 //
-// Based on implementaion as found in https://github.com/jocover/sha256-armv8
+// Based on implementation as found in https://github.com/jocover/sha256-armv8
 //
 // Use github.com/minio/asm2plan9s on this file to assemble ARM instructions to
 // their Plan9 equivalents
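
As a quick sanity check on the Improvement column in the README table touched by this commit, each factor appears to be the package's speed divided by the `crypto/sha256` speed on the same processor. The sketch below only reproduces that arithmetic; the `improvement` helper is a hypothetical name used for illustration and is not part of `minio/sha256-simd`, and the figures are copied from the table rather than re-measured.

```go
package main

import "fmt"

// improvement returns how many times faster a measured speed is than a
// baseline speed. Both values must use the same unit (MB/s here).
// Hypothetical helper for illustration; not part of minio/sha256-simd.
func improvement(speed, baseline float64) float64 {
	return speed / baseline
}

func main() {
	// Baselines copied from the README table: crypto/sha256 per processor.
	const xeonBaseline = 189.2 // 2.4 GHz Intel Xeon CPU E5-2620 v3
	const armBaseline = 6.1    // 1.2 GHz ARM Cortex-A53

	fmt.Printf("AVX2:  %.2fx\n", improvement(355.0, xeonBaseline)) // 1.88x
	fmt.Printf("AVX:   %.2fx\n", improvement(306.0, xeonBaseline)) // 1.62x
	fmt.Printf("SSE:   %.2fx\n", improvement(298.7, xeonBaseline)) // 1.58x
	fmt.Printf("ARM64: %.0fx\n", improvement(638.2, armBaseline))  // 105x
}
```

The rounded results match the factors listed in the table, which suggests the column is a plain speed ratio against the standard library on the same hardware.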