golang-github-minio-sha256-simd / 8e7e59e
Update upstream source from tag 'upstream/0.1.1'
Update to upstream version '0.1.1' with Debian dir 72f781ec53ea739cecaa211db66396d9b2d05a6f
Dmitry Smirnov, 3 years ago
31 changed file(s) with 3067 addition(s) and 2526 deletion(s).
0 *.test
33
44 os:
55 - linux
6 - osx
7
8 osx_image: xcode7.2
96
107 go:
11 - 1.6
12 - 1.5
8 - tip
9 - 1.12.x
1310
1411 env:
1512 - ARCH=x86_64
1613 - ARCH=i686
1714
15 matrix:
16 fast_finish: true
17 allow_failures:
18 - go: tip
19
1820 script:
1921 - diff -au <(gofmt -d .) <(printf "")
2022 - go test -race -v ./...
23 - go vet -asmdecl .
24 - ./test-architectures.sh
00 # sha256-simd
11
2 Accelerate SHA256 computations in pure Go using AVX512 and AVX2 for Intel and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core) in comparison to AVX2.
2 Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions and AVX2 for Intel and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core) in comparison to AVX2. SHA Extensions give a performance boost of close to 4x over AVX2.
33
44 ## Introduction
55
77
88 This package uses Golang assembly. The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
99
10 ## New: Support for AVX512
10 ## New: Support for Intel SHA Extensions
11
12 Support for the Intel SHA Extensions has been added by Kristofer Peterson (@svenski123), originally developed for spacemeshos [here](https://github.com/spacemeshos/POET/issues/23). On CPUs that support it (known thus far: Intel Celeron J3455 and AMD Ryzen) it gives a significant boost in performance (with thanks to @AudriusButkevicius for reporting the results; full results [here](https://github.com/minio/sha256-simd/pull/37#issuecomment-451607827)).
13
14 ```
15 $ benchcmp avx2.txt sha-ext.txt
16 benchmark AVX2 MB/s SHA Ext MB/s speedup
17 BenchmarkHash5M 514.40 1975.17 3.84x
18 ```
19
20 Thanks to Kristofer Peterson, we also added further performance changes, such as optimized padding and endian conversions, which sped up all implementations: the Intel SHA implementation alone doubled performance for small sizes, while the other changes increased everything by roughly 50%.
21
22 ## Support for AVX512
1123
1224 We have added support for AVX512 which results in an up to 8x performance improvement over AVX2 (3.0 GHz Xeon Platinum 8124M CPU):
1325
2335
2436 Whereas the original Intel C implementation requires some sort of explicit scheduling of messages to be processed in parallel, for Golang it makes sense to take advantage of channels in order to group messages together and use channels as well for sending back the results (thereby effectively decoupling the calculations). We have implemented a fairly simple scheduling mechanism that seems to work well in practice.
2537
26 Due to this differrent way of scheduling, we decided to use an explicit method to instantiate the AVX512 version. Essentially one or more AVX512 processing servers ([`Avx512Server`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L294)) have to be created whereby each server can hash over 3 GB/s on a single core. An `hash.Hash` object ([`Avx512Digest`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L45)) is then instantiated using one of these servers and used in the regular fashion:
38 Due to this different way of scheduling, we decided to use an explicit method to instantiate the AVX512 version. Essentially one or more AVX512 processing servers ([`Avx512Server`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L294)) have to be created whereby each server can hash over 3 GB/s on a single core. A `hash.Hash` object ([`Avx512Digest`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L45)) is then instantiated using one of these servers and used in the regular fashion:
2739
2840 ```go
2941 import "github.com/minio/sha256-simd"
6577 | Processor | SIMD | Speed (MB/s) |
6678 | --------------------------------- | ------- | ------------:|
6779 | 3.0 GHz Intel Xeon Platinum 8124M | AVX512 | 3498 |
80 | 3.7 GHz AMD Ryzen 7 2700X | SHA Ext | 1979 |
6881 | 1.2 GHz ARM Cortex-A53 | ARM64 | 638 |
6982 | 3.0 GHz Intel Xeon Platinum 8124M | AVX2 | 449 |
7083 | 3.1 GHz Intel Core i7 | AVX | 362 |
8396 ## ARM SHA Extensions
8497
8598 The 64-bit ARMv8 core has introduced new instructions for SHA1 and SHA2 acceleration as part of the [Cryptography Extensions](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0501f/CHDFJBCJ.html). Below you can see a small excerpt highlighting one of the rounds as is done for the SHA256 calculation process (for full code see [sha256block_arm64.s](https://github.com/minio/sha256-simd/blob/master/sha256block_arm64.s)).
86
99
87100 ```
88101 sha256h q2, q3, v9.4s
89102 sha256h2 q3, q4, v9.4s
99112
100113 ### Detailed benchmarks
101114
102 Benchmarks generated on a 1.2 Ghz Quad-Core ARM Cortex A53 equipped [Pine64](https://www.pine64.com/).
115 Benchmarks generated on a 1.2 Ghz Quad-Core ARM Cortex A53 equipped [Pine64](https://www.pine64.com/).
103116
104117 ```
105118 minio@minio-arm:$ benchcmp golang.txt arm64.txt
1515 package sha256
1616
1717 // True when SIMD instructions are available.
18 var avx512 = haveAVX512()
19 var avx2 = haveAVX2()
20 var avx = haveAVX()
21 var ssse3 = haveSSSE3()
18 var avx512 bool
19 var avx2 bool
20 var avx bool
21 var sse bool
22 var sse2 bool
23 var sse3 bool
24 var ssse3 bool
25 var sse41 bool
26 var sse42 bool
27 var popcnt bool
28 var sha bool
2229 var armSha = haveArmSha()
2330
24 // haveAVX returns true when there is AVX support
25 func haveAVX() bool {
26 _, _, c, _ := cpuid(1)
31 func init() {
32 var _xsave bool
33 var _osxsave bool
34 var _avx bool
35 var _avx2 bool
36 var _avx512f bool
37 var _avx512dq bool
38 // var _avx512pf bool
39 // var _avx512er bool
40 // var _avx512cd bool
41 var _avx512bw bool
42 var _avx512vl bool
43 var _sseState bool
44 var _avxState bool
45 var _opmaskState bool
46 var _zmmHI256State bool
47 var _hi16ZmmState bool
2748
28 // Check XGETBV, OXSAVE and AVX bits
29 if c&(1<<26) != 0 && c&(1<<27) != 0 && c&(1<<28) != 0 {
30 // Check for OS support
31 eax, _ := xgetbv(0)
32 return (eax & 0x6) == 0x6
33 }
34 return false
35 }
36
37 // haveAVX2 returns true when there is AVX2 support
38 func haveAVX2() bool {
3949 mfi, _, _, _ := cpuid(0)
4050
41 // Check AVX2, AVX2 requires OS support, but BMI1/2 don't.
42 if mfi >= 7 && haveAVX() {
43 _, ebx, _, _ := cpuidex(7, 0)
44 return (ebx & 0x00000020) != 0
51 if mfi >= 1 {
52 _, _, c, d := cpuid(1)
53
54 sse = (d & (1 << 25)) != 0
55 sse2 = (d & (1 << 26)) != 0
56 sse3 = (c & (1 << 0)) != 0
57 ssse3 = (c & (1 << 9)) != 0
58 sse41 = (c & (1 << 19)) != 0
59 sse42 = (c & (1 << 20)) != 0
60 popcnt = (c & (1 << 23)) != 0
61 _xsave = (c & (1 << 26)) != 0
62 _osxsave = (c & (1 << 27)) != 0
63 _avx = (c & (1 << 28)) != 0
4564 }
46 return false
65
66 if mfi >= 7 {
67 _, b, _, _ := cpuid(7)
68
69 _avx2 = (b & (1 << 5)) != 0
70 _avx512f = (b & (1 << 16)) != 0
71 _avx512dq = (b & (1 << 17)) != 0
72 // _avx512pf = (b & (1 << 26)) != 0
73 // _avx512er = (b & (1 << 27)) != 0
74 // _avx512cd = (b & (1 << 28)) != 0
75 _avx512bw = (b & (1 << 30)) != 0
76 _avx512vl = (b & (1 << 31)) != 0
77 sha = (b & (1 << 29)) != 0
78 }
79
80 // Stop here if XSAVE unsupported or not enabled
81 if !_xsave || !_osxsave {
82 return
83 }
84
85 if _xsave && _osxsave {
86 a, _ := xgetbv(0)
87
88 _sseState = (a & (1 << 1)) != 0
89 _avxState = (a & (1 << 2)) != 0
90 _opmaskState = (a & (1 << 5)) != 0
91 _zmmHI256State = (a & (1 << 6)) != 0
92 _hi16ZmmState = (a & (1 << 7)) != 0
93 } else {
94 _sseState = true
95 }
96
97 // Very unlikely that OS would enable XSAVE and then disable SSE
98 if !_sseState {
99 sse = false
100 sse2 = false
101 sse3 = false
102 ssse3 = false
103 sse41 = false
104 sse42 = false
105 }
106
107 if _avxState {
108 avx = _avx
109 avx2 = _avx2
110 }
111
112 if _opmaskState && _zmmHI256State && _hi16ZmmState {
113 avx512 = (_avx512f &&
114 _avx512dq &&
115 _avx512bw &&
116 _avx512vl)
117 }
47118 }
48
49 // haveAVX512 returns true when there is AVX512 support
50 func haveAVX512() bool {
51 mfi, _, _, _ := cpuid(0)
52
53 // Check AVX2, AVX2 requires OS support, but BMI1/2 don't.
54 if mfi >= 7 {
55 _, _, c, _ := cpuid(1)
56
57 // Only detect AVX-512 features if XGETBV is supported
58 if c&((1<<26)|(1<<27)) == (1<<26)|(1<<27) {
59 // Check for OS support
60 eax, _ := xgetbv(0)
61 _, ebx, _, _ := cpuidex(7, 0)
62
63 // Verify that XCR0[7:5] = ‘111b’ (OPMASK state, upper 256-bit of ZMM0-ZMM15 and
64 // ZMM16-ZMM31 state are enabled by OS)
65 /// and that XCR0[2:1] = ‘11b’ (XMM state and YMM state are enabled by OS).
66 if (eax>>5)&7 == 7 && (eax>>1)&3 == 3 {
67 if ebx&(1<<16) == 0 {
68 return false // no AVX512F
69 }
70 if ebx&(1<<17) == 0 {
71 return false // no AVX512DQ
72 }
73 if ebx&(1<<30) == 0 {
74 return false // no AVX512BW
75 }
76 if ebx&(1<<31) == 0 {
77 return false // no AVX512VL
78 }
79 return true
80 }
81 }
82 }
83 return false
84 }
85
86 // haveSSSE3 returns true when there is SSSE3 support
87 func haveSSSE3() bool {
88
89 _, _, c, _ := cpuid(1)
90
91 return (c & 0x00000200) != 0
92 }
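The bit tests in the new `init()` can be exercised without executing CPUID by treating the register values as plain inputs. The sketch below assumes that approach: `features` and `decode` are hypothetical names, and only a subset of the flags is decoded. The bit positions and the OS-state gating follow the added lines above.

```go
package main

import "fmt"

// features holds the subset of flags decoded in this sketch.
type features struct {
	AVX, AVX2, AVX512, SHA bool
}

// decode interprets raw register values:
//   ecx1 - ECX from CPUID leaf 1
//   ebx7 - EBX from CPUID leaf 7
//   xcr0 - low 32 bits of XCR0 as returned by XGETBV(0)
// Real code obtains these from the cpuid/xgetbv assembly helpers.
func decode(ecx1, ebx7, xcr0 uint32) features {
	bit := func(v uint32, n uint) bool { return v&(1<<n) != 0 }

	var f features
	f.SHA = bit(ebx7, 29) // SHA extensions need no extra OS state

	// Without XSAVE and OSXSAVE the OS cannot have enabled YMM/ZMM state.
	if !bit(ecx1, 26) || !bit(ecx1, 27) {
		return f
	}

	if bit(xcr0, 2) { // YMM state enabled by the OS
		f.AVX = bit(ecx1, 28)
		f.AVX2 = bit(ebx7, 5)
	}

	// Opmask, upper halves of ZMM0-15, and ZMM16-31 must all be enabled.
	if bit(xcr0, 5) && bit(xcr0, 6) && bit(xcr0, 7) {
		f.AVX512 = bit(ebx7, 16) && // AVX512F
			bit(ebx7, 17) && // AVX512DQ
			bit(ebx7, 30) && // AVX512BW
			bit(ebx7, 31) // AVX512VL
	}
	return f
}

func main() {
	// A made-up CPU: XSAVE, OSXSAVE, AVX, AVX2 and SHA set, YMM state enabled.
	f := decode(1<<26|1<<27|1<<28, 1<<5|1<<29, 0x6)
	fmt.Printf("%+v\n", f)
}
```

Keeping the decoding in a pure function makes the gating rules easy to unit test on any architecture.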
2323
2424 // func cpuid(op uint32) (eax, ebx, ecx, edx uint32)
2525 TEXT ·cpuid(SB), 7, $0
26 XORL CX, CX
27 MOVL op+0(FP), AX
28 CPUID
29 MOVL AX, eax+4(FP)
30 MOVL BX, ebx+8(FP)
31 MOVL CX, ecx+12(FP)
32 MOVL DX, edx+16(FP)
33 RET
26 XORL CX, CX
27 MOVL op+0(FP), AX
28 CPUID
29 MOVL AX, eax+4(FP)
30 MOVL BX, ebx+8(FP)
31 MOVL CX, ecx+12(FP)
32 MOVL DX, edx+16(FP)
33 RET
3434
3535 // func cpuidex(op, op2 uint32) (eax, ebx, ecx, edx uint32)
3636 TEXT ·cpuidex(SB), 7, $0
37 MOVL op+0(FP), AX
38 MOVL op2+4(FP), CX
39 CPUID
40 MOVL AX, eax+8(FP)
41 MOVL BX, ebx+12(FP)
42 MOVL CX, ecx+16(FP)
43 MOVL DX, edx+20(FP)
44 RET
37 MOVL op+0(FP), AX
38 MOVL op2+4(FP), CX
39 CPUID
40 MOVL AX, eax+8(FP)
41 MOVL BX, ebx+12(FP)
42 MOVL CX, ecx+16(FP)
43 MOVL DX, edx+20(FP)
44 RET
4545
4646 // func xgetbv(index uint32) (eax, edx uint32)
4747 TEXT ·xgetbv(SB), 7, $0
48 MOVL index+0(FP), CX
49 BYTE $0x0f; BYTE $0x01; BYTE $0xd0 // XGETBV
50 MOVL AX, eax+4(FP)
51 MOVL DX, edx+8(FP)
52 RET
48 MOVL index+0(FP), CX
49 BYTE $0x0f; BYTE $0x01; BYTE $0xd0 // XGETBV
50 MOVL AX, eax+4(FP)
51 MOVL DX, edx+8(FP)
52 RET
2323
2424 // func cpuid(op uint32) (eax, ebx, ecx, edx uint32)
2525 TEXT ·cpuid(SB), 7, $0
26 XORQ CX, CX
27 MOVL op+0(FP), AX
28 CPUID
29 MOVL AX, eax+8(FP)
30 MOVL BX, ebx+12(FP)
31 MOVL CX, ecx+16(FP)
32 MOVL DX, edx+20(FP)
33 RET
34
26 XORQ CX, CX
27 MOVL op+0(FP), AX
28 CPUID
29 MOVL AX, eax+8(FP)
30 MOVL BX, ebx+12(FP)
31 MOVL CX, ecx+16(FP)
32 MOVL DX, edx+20(FP)
33 RET
3534
3635 // func cpuidex(op, op2 uint32) (eax, ebx, ecx, edx uint32)
3736 TEXT ·cpuidex(SB), 7, $0
38 MOVL op+0(FP), AX
39 MOVL op2+4(FP), CX
40 CPUID
41 MOVL AX, eax+8(FP)
42 MOVL BX, ebx+12(FP)
43 MOVL CX, ecx+16(FP)
44 MOVL DX, edx+20(FP)
45 RET
37 MOVL op+0(FP), AX
38 MOVL op2+4(FP), CX
39 CPUID
40 MOVL AX, eax+8(FP)
41 MOVL BX, ebx+12(FP)
42 MOVL CX, ecx+16(FP)
43 MOVL DX, edx+20(FP)
44 RET
4645
4746 // func xgetbv(index uint32) (eax, edx uint32)
4847 TEXT ·xgetbv(SB), 7, $0
49 MOVL index+0(FP), CX
50 BYTE $0x0f; BYTE $0x01; BYTE $0xd0 // XGETBV
51 MOVL AX, eax+8(FP)
52 MOVL DX, edx+12(FP)
53 RET
48 MOVL index+0(FP), CX
49 BYTE $0x0f; BYTE $0x01; BYTE $0xd0 // XGETBV
50 MOVL AX, eax+8(FP)
51 MOVL DX, edx+12(FP)
52 RET
1212 // limitations under the License.
1313 //
1414
15 // +build ppc64 ppc64le mips mipsle mips64 mips64le s390x
15 // +build !386,!amd64,!arm,!arm64 arm64,!linux
1616
1717 package sha256
1818
+0
-35
cpuid_others_arm64.go
0 // +build arm64,!linux
1
2 // Minio Cloud Storage, (C) 2016 Minio, Inc.
3 //
4 // Licensed under the Apache License, Version 2.0 (the "License");
5 // you may not use this file except in compliance with the License.
6 // You may obtain a copy of the License at
7 //
8 // http://www.apache.org/licenses/LICENSE-2.0
9 //
10 // Unless required by applicable law or agreed to in writing, software
11 // distributed under the License is distributed on an "AS IS" BASIS,
12 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 // See the License for the specific language governing permissions and
14 // limitations under the License.
15 //
16
17 package sha256
18
19 func cpuid(op uint32) (eax, ebx, ecx, edx uint32) {
20 return 0, 0, 0, 0
21 }
22
23 func cpuidex(op, op2 uint32) (eax, ebx, ecx, edx uint32) {
24 return 0, 0, 0, 0
25 }
26
27 func xgetbv(index uint32) (eax, edx uint32) {
28 return 0, 0
29 }
30
31 // Check for sha2 instruction flag.
32 func haveArmSha() bool {
33 return false
34 }
0 module github.com/minio/sha256-simd
1
2 go 1.12
1717
1818 import (
1919 "crypto/sha256"
20 "encoding/binary"
2021 "hash"
2122 "runtime"
2223 )
2829 const BlockSize = 64
2930
3031 const (
31 chunk = 64
32 chunk = BlockSize
3233 init0 = 0x6A09E667
3334 init1 = 0xBB67AE85
3435 init2 = 0x3C6EF372
6162 d.len = 0
6263 }
6364
64 func block(dig *digest, p []byte) {
65 type blockfuncType int
66
67 const (
68 blockfuncGeneric blockfuncType = iota
69 blockfuncAvx512 blockfuncType = iota
70 blockfuncAvx2 blockfuncType = iota
71 blockfuncAvx blockfuncType = iota
72 blockfuncSsse blockfuncType = iota
73 blockfuncSha blockfuncType = iota
74 blockfuncArm blockfuncType = iota
75 )
76
77 var blockfunc blockfuncType
78
79 func init() {
6580 is386bit := runtime.GOARCH == "386"
6681 isARM := runtime.GOARCH == "arm"
67 if is386bit || isARM {
68 blockGeneric(dig, p)
69 }
70 switch !is386bit && !isARM {
82 switch {
83 case is386bit || isARM:
84 blockfunc = blockfuncGeneric
85 case sha && ssse3 && sse41:
86 blockfunc = blockfuncSha
7187 case avx2:
72 blockAvx2Go(dig, p)
88 blockfunc = blockfuncAvx2
7389 case avx:
74 blockAvxGo(dig, p)
90 blockfunc = blockfuncAvx
7591 case ssse3:
76 blockSsseGo(dig, p)
92 blockfunc = blockfuncSsse
7793 case armSha:
78 blockArmGo(dig, p)
94 blockfunc = blockfuncArm
7995 default:
80 blockGeneric(dig, p)
96 blockfunc = blockfuncGeneric
8197 }
8298 }
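The priority order of the `switch` above can be expressed as a pure function, which makes it easy to check in isolation. The following is a sketch under that assumption: `impl`, `cpuFlags` and `choose` are hypothetical names, not identifiers from the package.

```go
package main

import "fmt"

// impl mirrors the idea of blockfuncType: one value per implementation.
type impl int

const (
	implGeneric impl = iota
	implSha
	implAvx2
	implAvx
	implSsse
	implArm
)

func (i impl) String() string {
	return [...]string{"generic", "sha", "avx2", "avx", "ssse3", "arm"}[i]
}

// cpuFlags carries the detection results the selection depends on.
type cpuFlags struct {
	sha, sse41, ssse3, avx, avx2, armSha bool
}

// choose returns the first matching implementation in priority order:
// 32-bit targets always use the generic code, then SHA extensions,
// AVX2, AVX, SSSE3, ARM SHA, and finally the generic fallback.
func choose(goarch string, c cpuFlags) impl {
	switch {
	case goarch == "386" || goarch == "arm":
		return implGeneric
	case c.sha && c.ssse3 && c.sse41:
		return implSha
	case c.avx2:
		return implAvx2
	case c.avx:
		return implAvx
	case c.ssse3:
		return implSsse
	case c.armSha:
		return implArm
	default:
		return implGeneric
	}
}

func main() {
	fmt.Println(choose("amd64", cpuFlags{sha: true, ssse3: true, sse41: true, avx2: true}))
	fmt.Println(choose("amd64", cpuFlags{avx: true, ssse3: true}))
	fmt.Println(choose("386", cpuFlags{avx2: true}))
}
```

Deciding once at start-up means the per-block dispatch only has to compare a small integer.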
8399
84100 // New returns a new hash.Hash computing the SHA256 checksum.
85101 func New() hash.Hash {
86 if avx2 || avx || ssse3 || armSha {
102 if blockfunc != blockfuncGeneric {
87103 d := new(digest)
88104 d.Reset()
89105 return d
94110 }
95111
96112 // Sum256 - single caller sha256 helper
97 func Sum256(data []byte) [Size]byte {
113 func Sum256(data []byte) (result [Size]byte) {
98114 var d digest
99115 d.Reset()
100116 d.Write(data)
101 return d.checkSum()
117 result = d.checkSum()
118 return
102119 }
103120
104121 // Return size of checksum
140157 }
141158
142159 // Intermediate checksum function
143 func (d *digest) checkSum() [Size]byte {
144 len := d.len
145 // Padding. Add a 1 bit and 0 bits until 56 bytes mod 64.
146 var tmp [64]byte
147 tmp[0] = 0x80
148 if len%64 < 56 {
149 d.Write(tmp[0 : 56-len%64])
150 } else {
151 d.Write(tmp[0 : 64+56-len%64])
152 }
153
154 // Length in bits.
155 len <<= 3
156 for i := uint(0); i < 8; i++ {
157 tmp[i] = byte(len >> (56 - 8*i))
158 }
159 d.Write(tmp[0:8])
160
161 if d.nx != 0 {
162 panic("d.nx != 0")
163 }
164
165 h := d.h[:]
166
167 var digest [Size]byte
168 for i, s := range h {
169 digest[i*4] = byte(s >> 24)
170 digest[i*4+1] = byte(s >> 16)
171 digest[i*4+2] = byte(s >> 8)
172 digest[i*4+3] = byte(s)
173 }
174
175 return digest
176 }
160 func (d *digest) checkSum() (digest [Size]byte) {
161 n := d.nx
162
163 var k [64]byte
164 copy(k[:], d.x[:n])
165
166 k[n] = 0x80
167
168 if n >= 56 {
169 block(d, k[:])
170
171 // clear block buffer - go compiles this to optimal 1x xorps + 4x movups
172 // unfortunately expressing this more succinctly results in much worse code
173 k[0] = 0
174 k[1] = 0
175 k[2] = 0
176 k[3] = 0
177 k[4] = 0
178 k[5] = 0
179 k[6] = 0
180 k[7] = 0
181 k[8] = 0
182 k[9] = 0
183 k[10] = 0
184 k[11] = 0
185 k[12] = 0
186 k[13] = 0
187 k[14] = 0
188 k[15] = 0
189 k[16] = 0
190 k[17] = 0
191 k[18] = 0
192 k[19] = 0
193 k[20] = 0
194 k[21] = 0
195 k[22] = 0
196 k[23] = 0
197 k[24] = 0
198 k[25] = 0
199 k[26] = 0
200 k[27] = 0
201 k[28] = 0
202 k[29] = 0
203 k[30] = 0
204 k[31] = 0
205 k[32] = 0
206 k[33] = 0
207 k[34] = 0
208 k[35] = 0
209 k[36] = 0
210 k[37] = 0
211 k[38] = 0
212 k[39] = 0
213 k[40] = 0
214 k[41] = 0
215 k[42] = 0
216 k[43] = 0
217 k[44] = 0
218 k[45] = 0
219 k[46] = 0
220 k[47] = 0
221 k[48] = 0
222 k[49] = 0
223 k[50] = 0
224 k[51] = 0
225 k[52] = 0
226 k[53] = 0
227 k[54] = 0
228 k[55] = 0
229 k[56] = 0
230 k[57] = 0
231 k[58] = 0
232 k[59] = 0
233 k[60] = 0
234 k[61] = 0
235 k[62] = 0
236 k[63] = 0
237 }
238 binary.BigEndian.PutUint64(k[56:64], uint64(d.len)<<3)
239 block(d, k[:])
240
241 {
242 const i = 0
243 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
244 }
245 {
246 const i = 1
247 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
248 }
249 {
250 const i = 2
251 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
252 }
253 {
254 const i = 3
255 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
256 }
257 {
258 const i = 4
259 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
260 }
261 {
262 const i = 5
263 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
264 }
265 {
266 const i = 6
267 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
268 }
269 {
270 const i = 7
271 binary.BigEndian.PutUint32(digest[i*4:i*4+4], d.h[i])
272 }
273
274 return
275 }
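The padding rule that the rewritten `checkSum` applies can be seen in isolation with a small helper. This is a sketch, not the package's code: `finalBlocks` is a hypothetical name, and it allocates a slice rather than reusing a fixed 64-byte buffer as the function above does.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// finalBlocks builds the last block(s) fed to the compression function.
// tail is the unprocessed remainder of the message (fewer than 64 bytes)
// and total is the full message length in bytes.
func finalBlocks(tail []byte, total uint64) []byte {
	n := len(tail)
	size := 64
	if n >= 56 {
		// No room for the 0x80 marker plus the 8-byte length:
		// a second block is required.
		size = 128
	}
	out := make([]byte, size) // zero-filled, so the padding zeros are implicit
	copy(out, tail)
	out[n] = 0x80                                      // the single 1 bit
	binary.BigEndian.PutUint64(out[size-8:], total<<3) // length in bits
	return out
}

func main() {
	short := finalBlocks([]byte("abc"), 3)
	long := finalBlocks(make([]byte, 56), 56)
	fmt.Println(len(short), len(long))
	fmt.Printf("% x\n", short[56:])
}
```

A remainder of 55 bytes or fewer fits in one block; 56 to 63 bytes spill into a second block, which is the `n >= 56` branch in the function above.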
276
277 func block(dig *digest, p []byte) {
278 if blockfunc == blockfuncSha {
279 blockShaGo(dig, p)
280 } else if blockfunc == blockfuncAvx2 {
281 blockAvx2Go(dig, p)
282 } else if blockfunc == blockfuncAvx {
283 blockAvxGo(dig, p)
284 } else if blockfunc == blockfuncSsse {
285 blockSsseGo(dig, p)
286 } else if blockfunc == blockfuncArm {
287 blockArmGo(dig, p)
288 } else if blockfunc == blockfuncGeneric {
289 blockGeneric(dig, p)
290 }
291 }
292
293 func blockGeneric(dig *digest, p []byte) {
294 var w [64]uint32
295 h0, h1, h2, h3, h4, h5, h6, h7 := dig.h[0], dig.h[1], dig.h[2], dig.h[3], dig.h[4], dig.h[5], dig.h[6], dig.h[7]
296 for len(p) >= chunk {
297 // Can interlace the computation of w with the
298 // rounds below if needed for speed.
299 for i := 0; i < 16; i++ {
300 j := i * 4
301 w[i] = uint32(p[j])<<24 | uint32(p[j+1])<<16 | uint32(p[j+2])<<8 | uint32(p[j+3])
302 }
303 for i := 16; i < 64; i++ {
304 v1 := w[i-2]
305 t1 := (v1>>17 | v1<<(32-17)) ^ (v1>>19 | v1<<(32-19)) ^ (v1 >> 10)
306 v2 := w[i-15]
307 t2 := (v2>>7 | v2<<(32-7)) ^ (v2>>18 | v2<<(32-18)) ^ (v2 >> 3)
308 w[i] = t1 + w[i-7] + t2 + w[i-16]
309 }
310
311 a, b, c, d, e, f, g, h := h0, h1, h2, h3, h4, h5, h6, h7
312
313 for i := 0; i < 64; i++ {
314 t1 := h + ((e>>6 | e<<(32-6)) ^ (e>>11 | e<<(32-11)) ^ (e>>25 | e<<(32-25))) + ((e & f) ^ (^e & g)) + _K[i] + w[i]
315
316 t2 := ((a>>2 | a<<(32-2)) ^ (a>>13 | a<<(32-13)) ^ (a>>22 | a<<(32-22))) + ((a & b) ^ (a & c) ^ (b & c))
317
318 h = g
319 g = f
320 f = e
321 e = d + t1
322 d = c
323 c = b
324 b = a
325 a = t1 + t2
326 }
327
328 h0 += a
329 h1 += b
330 h2 += c
331 h3 += d
332 h4 += e
333 h5 += f
334 h6 += g
335 h7 += h
336
337 p = p[chunk:]
338 }
339
340 dig.h[0], dig.h[1], dig.h[2], dig.h[3], dig.h[4], dig.h[5], dig.h[6], dig.h[7] = h0, h1, h2, h3, h4, h5, h6, h7
341 }
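The shift-and-or expressions in `blockGeneric` are 32-bit right rotations. A sketch of the same round functions written with `math/bits` is shown below; the helper names (`rotr`, `smallSigma0`, and so on) are illustrative and do not appear in the package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotr rotates x right by n bits; RotateLeft32 with a negative count rotates right.
func rotr(x uint32, n int) uint32 { return bits.RotateLeft32(x, -n) }

// Message schedule functions (t2 and t1 in the schedule loop).
func smallSigma0(x uint32) uint32 { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3) }
func smallSigma1(x uint32) uint32 { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10) }

// Round functions applied to the working variables a and e.
func bigSigma0(a uint32) uint32 { return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22) }
func bigSigma1(e uint32) uint32 { return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25) }

// ch selects bits from f or g depending on e; maj is the bitwise majority.
func ch(e, f, g uint32) uint32  { return (e & f) ^ (^e & g) }
func maj(a, b, c uint32) uint32 { return (a & b) ^ (a & c) ^ (b & c) }

func main() {
	x := uint32(0x6a09e667)
	// The explicit shift form used in blockGeneric, for comparison.
	shiftForm := (x>>7 | x<<(32-7)) ^ (x>>18 | x<<(32-18)) ^ (x >> 3)
	fmt.Println(smallSigma0(x) == shiftForm)
}
```

Both spellings compute the same values; which one compiles to better code depends on the Go version and is not claimed here.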
342
343 var _K = []uint32{
344 0x428a2f98,
345 0x71374491,
346 0xb5c0fbcf,
347 0xe9b5dba5,
348 0x3956c25b,
349 0x59f111f1,
350 0x923f82a4,
351 0xab1c5ed5,
352 0xd807aa98,
353 0x12835b01,
354 0x243185be,
355 0x550c7dc3,
356 0x72be5d74,
357 0x80deb1fe,
358 0x9bdc06a7,
359 0xc19bf174,
360 0xe49b69c1,
361 0xefbe4786,
362 0x0fc19dc6,
363 0x240ca1cc,
364 0x2de92c6f,
365 0x4a7484aa,
366 0x5cb0a9dc,
367 0x76f988da,
368 0x983e5152,
369 0xa831c66d,
370 0xb00327c8,
371 0xbf597fc7,
372 0xc6e00bf3,
373 0xd5a79147,
374 0x06ca6351,
375 0x14292967,
376 0x27b70a85,
377 0x2e1b2138,
378 0x4d2c6dfc,
379 0x53380d13,
380 0x650a7354,
381 0x766a0abb,
382 0x81c2c92e,
383 0x92722c85,
384 0xa2bfe8a1,
385 0xa81a664b,
386 0xc24b8b70,
387 0xc76c51a3,
388 0xd192e819,
389 0xd6990624,
390 0xf40e3585,
391 0x106aa070,
392 0x19a4c116,
393 0x1e376c08,
394 0x2748774c,
395 0x34b0bcb5,
396 0x391c0cb3,
397 0x4ed8aa4a,
398 0x5b9cca4f,
399 0x682e6ff3,
400 0x748f82ee,
401 0x78a5636f,
402 0x84c87814,
403 0x8cc70208,
404 0x90befffa,
405 0xa4506ceb,
406 0xbef9a3f7,
407 0xc67178f2,
408 }
5151 import (
5252 "encoding/hex"
5353 "fmt"
54 "runtime"
5455 "strings"
5556 "testing"
5657 )
22072208 }
22082209
22092210 func TestGolden(t *testing.T) {
2211 blockfuncSaved := blockfunc
2212
2213 defer func() {
2214 blockfunc = blockfuncSaved
2215 }()
2216
2217 if true {
2218 blockfunc = blockfuncGeneric
2219 for _, g := range golden {
2220 s := fmt.Sprintf("%x", Sum256([]byte(g.in)))
2221 if Sum256([]byte(g.in)) != g.out {
2222 t.Fatalf("Generic: Sum256 function: sha256(%s) = %s want %s", g.in, s, hex.EncodeToString(g.out[:]))
2223 }
2224 }
2225 }
2226
2227 if runtime.GOARCH == "386" || runtime.GOARCH == "arm" {
2228 // doesn't support anything but the generic version.
2229 return
2230 }
2231
2232 if sha && ssse3 && sse41 {
2233 blockfunc = blockfuncSha
2234 for _, g := range golden {
2235 s := fmt.Sprintf("%x", Sum256([]byte(g.in)))
2236 if Sum256([]byte(g.in)) != g.out {
2237 t.Fatalf("SHA: Sum256 function: sha256(%s) = %s want %s", g.in, s, hex.EncodeToString(g.out[:]))
2238 }
2239 }
2240 }
22102241 if avx2 {
2242 blockfunc = blockfuncAvx2
22112243 for _, g := range golden {
22122244 s := fmt.Sprintf("%x", Sum256([]byte(g.in)))
22132245 if Sum256([]byte(g.in)) != g.out {
22142246 t.Fatalf("AVX2: Sum256 function: sha256(%s) = %s want %s", g.in, s, hex.EncodeToString(g.out[:]))
22152247 }
22162248 }
2217 avx2 = false
22182249 }
22192250 if avx {
2251 blockfunc = blockfuncAvx
22202252 for _, g := range golden {
22212253 s := fmt.Sprintf("%x", Sum256([]byte(g.in)))
22222254 if Sum256([]byte(g.in)) != g.out {
22232255 t.Fatalf("AVX: Sum256 function: sha256(%s) = %s want %s", g.in, s, hex.EncodeToString(g.out[:]))
22242256 }
22252257 }
2226 avx = false
22272258 }
22282259 if ssse3 {
2260 blockfunc = blockfuncSsse
22292261 for _, g := range golden {
22302262 s := fmt.Sprintf("%x", Sum256([]byte(g.in)))
22312263 if Sum256([]byte(g.in)) != g.out {
22542286 var buf = make([]byte, size)
22552287 b.SetBytes(int64(size))
22562288 sum := make([]byte, bench.Size())
2289 b.ResetTimer()
22572290 for i := 0; i < b.N; i++ {
22582291 bench.Reset()
22592292 bench.Write(buf[:size])
22612294 }
22622295 }
22632296
2264 func BenchmarkHash8Bytes(b *testing.B) { benchmarkSize(b, 8) }
2265 func BenchmarkHash1K(b *testing.B) { benchmarkSize(b, 1024) }
2266 func BenchmarkHash8K(b *testing.B) { benchmarkSize(b, 8192) }
2267 func BenchmarkHash1MAvx2(b *testing.B) { benchmarkSize(b, 1024*1024) }
2268 func BenchmarkHash5MAvx2(b *testing.B) { benchmarkSize(b, 5*1024*1024) }
2269 func BenchmarkHash10MAvx2(b *testing.B) { benchmarkSize(b, 10*1024*1024) }
2297 func BenchmarkHash(b *testing.B) {
2298 algos := []struct {
2299 n string
2300 t blockfuncType
2301 f bool
2302 }{
2303 {"SHA_", blockfuncSha, sha && sse41 && ssse3},
2304 {"AVX2", blockfuncAvx2, avx2},
2305 {"AVX_", blockfuncAvx, avx},
2306 {"SSSE", blockfuncSsse, ssse3},
2307 {"GEN_", blockfuncGeneric, true},
2308 }
2309
2310 sizes := []struct {
2311 n string
2312 f func(*testing.B, int)
2313 s int
2314 }{
2315 {"8Bytes", benchmarkSize, 1 << 3},
2316 {"1K", benchmarkSize, 1 << 10},
2317 {"8K", benchmarkSize, 1 << 13},
2318 {"1M", benchmarkSize, 1 << 20},
2319 {"5M", benchmarkSize, 5 << 20},
2320 {"10M", benchmarkSize, 5 << 21},
2321 }
2322
2323 for _, a := range algos {
2324 if a.f {
2325 blockfuncSaved := blockfunc
2326 blockfunc = a.t
2327 for _, y := range sizes {
2328 s := a.n + "/" + y.n
2329 b.Run(s, func(b *testing.B) { y.f(b, y.s) })
2330 }
2331 blockfunc = blockfuncSaved
2332 }
2333 }
2334 }
0 //+build !noasm
0 //+build !noasm,!appengine
11
22 /*
33 * Minio Cloud Storage, (C) 2016 Minio, Inc.
0 //+build !noasm !appengine
0 //+build !noasm,!appengine
11
22 // SHA256 implementation for AVX2
33
3030 // github.com/minio/asm2plan9s to assemble Intel instructions to their Plan9
3131 // equivalents
3232 //
33
34 #include "textflag.h"
3533
3634 DATA K256<>+0x000(SB)/8, $0x71374491428a2f98
3735 DATA K256<>+0x008(SB)/8, $0xe9b5dba5b5c0fbcf
113111
114112 GLOBL K256<>(SB), 8, $608
115113
114 // We need 0x220 stack space aligned on a 512 boundary, so for the
115 // worstcase-aligned SP we need twice this amount, being 1088 (=0x440)
116 //
117 // SP aligned end-aligned stacksize
118 // 100013d0 10001400 10001620 592
119 // 100013d8 10001400 10001620 584
120 // 100013e0 10001600 10001820 1088
121 // 100013e8 10001600 10001820 1080
122
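The arithmetic behind the table in the comment above can be reproduced with a few lines of Go. This sketch assumes one reading of the columns: "aligned" is `(SP + 0x220)` rounded down to a 512-byte boundary, "end-aligned" is that address plus `0x220`, and "stacksize" is the distance from the original SP to the end. The function name `frame` is hypothetical.

```go
package main

import "fmt"

const (
	need  = 0x220 // bytes the routine needs at an aligned address
	align = 0x200 // 512-byte alignment boundary
)

// frame returns the aligned start, the end of the aligned region, and the
// total number of bytes consumed above sp.
func frame(sp uint64) (aligned, end, size uint64) {
	aligned = (sp + need) &^ (align - 1) // add first, then round down
	end = aligned + need
	size = end - sp
	return
}

func main() {
	for _, sp := range []uint64{0x100013d0, 0x100013d8, 0x100013e0, 0x100013e8} {
		aligned, end, size := frame(sp)
		fmt.Printf("%x %x %x %d\n", sp, aligned, end, size)
	}
}
```

The worst case occurs when `sp + 0x220` is already aligned, so nothing is lost to rounding and the full `2 * 0x220 = 0x440 = 1088` bytes are consumed, which matches the frame size declared on the `TEXT` line.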
116123 // func blockAvx2(h []uint32, message []uint8)
117 TEXT ·blockAvx2(SB), 7, $0
118
119 MOVQ ctx+0(FP), DI // DI: &h
120 MOVQ inp+24(FP), SI // SI: &message
121 MOVQ inplength+32(FP), DX // len(message)
122 ADDQ SI, DX // end pointer of input
123 MOVQ SP, R11 // copy stack pointer
124 SUBQ $0x220, SP // sp -= 0x220
125 ANDQ $0xfffffffffffffc00, SP // align stack frame
126 ADDQ $0x1c0, SP
127 MOVQ DI, 0x40(SP) // save ctx
128 MOVQ SI, 0x48(SP) // save input
129 MOVQ DX, 0x50(SP) // save end pointer
130 MOVQ R11, 0x58(SP) // save copy of stack pointer
131
132 WORD $0xf8c5; BYTE $0x77 // vzeroupper
133 ADDQ $0x40, SI // input++
134 MOVL (DI), AX
135 MOVQ SI, R12 // borrow $T1
136 MOVL 4(DI), BX
137 CMPQ SI, DX // $_end
138 MOVL 8(DI), CX
139 LONG $0xe4440f4c // cmove r12,rsp /* next block or random data */
140 MOVL 12(DI), DX
141 MOVL 16(DI), R8
142 MOVL 20(DI), R9
143 MOVL 24(DI), R10
144 MOVL 28(DI), R11
145
146 LEAQ K256<>(SB), BP
147 LONG $0x856f7dc5; LONG $0x00000220 // VMOVDQA YMM8, 0x220[rbp] /* vmovdqa ymm8,YMMWORD PTR [rip+0x220] */
148 LONG $0x8d6f7dc5; LONG $0x00000240 // VMOVDQA YMM9, 0x240[rbp] /* vmovdqa ymm9,YMMWORD PTR [rip+0x240] */
149 LONG $0x956f7dc5; LONG $0x00000200 // VMOVDQA YMM10, 0x200[rbp] /* vmovdqa ymm7,YMMWORD PTR [rip+0x200] */
124 TEXT ·blockAvx2(SB),$1088-48
125
126 MOVQ h+0(FP), DI // DI: &h
127 MOVQ message_base+24(FP), SI // SI: &message
128 MOVQ message_len+32(FP), DX // len(message)
129 ADDQ SI, DX // end pointer of input
130 MOVQ SP, R11 // copy stack pointer
131 ADDQ $0x220, SP // sp += 0x220
132 ANDQ $0xfffffffffffffe00, SP // align stack frame
133 ADDQ $0x1c0, SP
134 MOVQ DI, 0x40(SP) // save ctx
135 MOVQ SI, 0x48(SP) // save input
136 MOVQ DX, 0x50(SP) // save end pointer
137 MOVQ R11, 0x58(SP) // save copy of stack pointer
138
139 WORD $0xf8c5; BYTE $0x77 // vzeroupper
140 ADDQ $0x40, SI // input++
141 MOVL (DI), AX
142 MOVQ SI, R12 // borrow $T1
143 MOVL 4(DI), BX
144 CMPQ SI, DX // $_end
145 MOVL 8(DI), CX
146 LONG $0xe4440f4c // cmove r12,rsp /* next block or random data */
147 MOVL 12(DI), DX
148 MOVL 16(DI), R8
149 MOVL 20(DI), R9
150 MOVL 24(DI), R10
151 MOVL 28(DI), R11
152
153 LEAQ K256<>(SB), BP
154 LONG $0x856f7dc5; LONG $0x00000220 // VMOVDQA YMM8, 0x220[rbp] /* vmovdqa ymm8,YMMWORD PTR [rip+0x220] */
155 LONG $0x8d6f7dc5; LONG $0x00000240 // VMOVDQA YMM9, 0x240[rbp] /* vmovdqa ymm9,YMMWORD PTR [rip+0x240] */
156 LONG $0x956f7dc5; LONG $0x00000200 // VMOVDQA YMM10, 0x200[rbp] /* vmovdqa ymm7,YMMWORD PTR [rip+0x200] */
150157
151158 loop0:
152 LONG $0x6f7dc1c4; BYTE $0xfa // VMOVDQA YMM7, YMM10
153
154 // Load first 16 dwords from two blocks
155 MOVOU -64(SI), X0 // vmovdqu xmm0,XMMWORD PTR [rsi-0x40]
156 MOVOU -48(SI), X1 // vmovdqu xmm1,XMMWORD PTR [rsi-0x30]
157 MOVOU -32(SI), X2 // vmovdqu xmm2,XMMWORD PTR [rsi-0x20]
158 MOVOU -16(SI), X3 // vmovdqu xmm3,XMMWORD PTR [rsi-0x10]
159
160 // Byte swap data and transpose data into high/low
161 LONG $0x387dc3c4; WORD $0x2404; BYTE $0x01 // vinserti128 ymm0,ymm0,[r12],0x1
162 LONG $0x3875c3c4; LONG $0x0110244c // vinserti128 ymm1,ymm1,0x10[r12],0x1
163 LONG $0x007de2c4; BYTE $0xc7 // vpshufb ymm0,ymm0,ymm7
164 LONG $0x386dc3c4; LONG $0x01202454 // vinserti128 ymm2,ymm2,0x20[r12],0x1
165 LONG $0x0075e2c4; BYTE $0xcf // vpshufb ymm1,ymm1,ymm7
166 LONG $0x3865c3c4; LONG $0x0130245c // vinserti128 ymm3,ymm3,0x30[r12],0x1
167
168 LEAQ K256<>(SB), BP
169 LONG $0x006de2c4; BYTE $0xd7 // vpshufb ymm2,ymm2,ymm7
170 LONG $0x65fefdc5; BYTE $0x00 // vpaddd ymm4,ymm0,[rbp]
171 LONG $0x0065e2c4; BYTE $0xdf // vpshufb ymm3,ymm3,ymm7
172 LONG $0x6dfef5c5; BYTE $0x20 // vpaddd ymm5,ymm1,0x20[rbp]
173 LONG $0x75feedc5; BYTE $0x40 // vpaddd ymm6,ymm2,0x40[rbp]
174 LONG $0x7dfee5c5; BYTE $0x60 // vpaddd ymm7,ymm3,0x60[rbp]
175
176 LONG $0x247ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm4
177 XORQ R14, R14
178 LONG $0x6c7ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm5
179
180 ADDQ $-0x40, SP
181 MOVQ BX, DI
182 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
183 XORQ CX, DI // magic
184 LONG $0x7c7ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm7
185 MOVQ R9, R12
186 ADDQ $0x80,BP
159 LONG $0x6f7dc1c4; BYTE $0xfa // VMOVDQA YMM7, YMM10
160
161 // Load first 16 dwords from two blocks
162 MOVOU -64(SI), X0 // vmovdqu xmm0,XMMWORD PTR [rsi-0x40]
163 MOVOU -48(SI), X1 // vmovdqu xmm1,XMMWORD PTR [rsi-0x30]
164 MOVOU -32(SI), X2 // vmovdqu xmm2,XMMWORD PTR [rsi-0x20]
165 MOVOU -16(SI), X3 // vmovdqu xmm3,XMMWORD PTR [rsi-0x10]
166
167 // Byte swap data and transpose data into high/low
168 LONG $0x387dc3c4; WORD $0x2404; BYTE $0x01 // vinserti128 ymm0,ymm0,[r12],0x1
169 LONG $0x3875c3c4; LONG $0x0110244c // vinserti128 ymm1,ymm1,0x10[r12],0x1
170 LONG $0x007de2c4; BYTE $0xc7 // vpshufb ymm0,ymm0,ymm7
171 LONG $0x386dc3c4; LONG $0x01202454 // vinserti128 ymm2,ymm2,0x20[r12],0x1
172 LONG $0x0075e2c4; BYTE $0xcf // vpshufb ymm1,ymm1,ymm7
173 LONG $0x3865c3c4; LONG $0x0130245c // vinserti128 ymm3,ymm3,0x30[r12],0x1
174
175 LEAQ K256<>(SB), BP
176 LONG $0x006de2c4; BYTE $0xd7 // vpshufb ymm2,ymm2,ymm7
177 LONG $0x65fefdc5; BYTE $0x00 // vpaddd ymm4,ymm0,[rbp]
178 LONG $0x0065e2c4; BYTE $0xdf // vpshufb ymm3,ymm3,ymm7
179 LONG $0x6dfef5c5; BYTE $0x20 // vpaddd ymm5,ymm1,0x20[rbp]
180 LONG $0x75feedc5; BYTE $0x40 // vpaddd ymm6,ymm2,0x40[rbp]
181 LONG $0x7dfee5c5; BYTE $0x60 // vpaddd ymm7,ymm3,0x60[rbp]
182
183 LONG $0x247ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm4
184 XORQ R14, R14
185 LONG $0x6c7ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm5
186
187 ADDQ $-0x40, SP
188 MOVQ BX, DI
189 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
190 XORQ CX, DI // magic
191 LONG $0x7c7ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm7
192 MOVQ R9, R12
193 ADDQ $0x80, BP
187194
188195 loop1:
189 // Schedule 48 input dwords, by doing 3 rounds of 12 each
190 // Note: SIMD instructions are interleaved with the SHA calculations
191 ADDQ $-0x40, SP
192 LONG $0x0f75e3c4; WORD $0x04e0 // vpalignr ymm4,ymm1,ymm0,0x4
193
194 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x80)
195 LONG $0x249c0344; LONG $0x00000080 // add r11d,[rsp+0x80]
196 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
197 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
198 LONG $0x0f65e3c4; WORD $0x04fa // vpalignr ymm7,ymm3,ymm2,0x4
199 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
200 LONG $0x30048d42 // lea eax,[rax+r14*1]
201 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
202 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
203 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
204 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
205 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
206 LONG $0xc7fefdc5 // vpaddd ymm0,ymm0,ymm7
207 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
208 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
209 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
210 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
211 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
212 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
213 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
214 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
215 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
216 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
217 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
218 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
219 WORD $0x2144; BYTE $0xff // and edi,r15d
220 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
221 WORD $0xdf31 // xor edi,ebx
222 LONG $0xfb70fdc5; BYTE $0xfa // vpshufd ymm7,ymm3,0xfa
223 WORD $0x3145; BYTE $0xee // xor r14d,r13d
224 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
225 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
226 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
227
228 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x84)
229 LONG $0x24940344; LONG $0x00000084 // add r10d,[rsp+0x84]
230 WORD $0x2141; BYTE $0xd4 // and r12d,edx
231 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
232 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
233 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
234 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
235 LONG $0x22148d47 // lea r10d,[r10+r12*1]
236 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
237 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
238 WORD $0x3141; BYTE $0xfd // xor r13d,edi
239 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
240 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
241 LONG $0x22148d47 // lea r10d,[r10+r12*1]
242 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
243 WORD $0x8944; BYTE $0xdf // mov edi,r11d
244 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
245 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
246 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
247 WORD $0xc731 // xor edi,eax
248 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
249 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
250 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
251 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
252 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
253 WORD $0x2141; BYTE $0xff // and r15d,edi
254 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
255 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
256 LONG $0xc4fefdc5 // vpaddd ymm0,ymm0,ymm4
257 WORD $0x3145; BYTE $0xee // xor r14d,r13d
258 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
259 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
260 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
261
262 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x88)
263 LONG $0x248c0344; LONG $0x00000088 // add r9d,[rsp+0x88]
264 WORD $0x2141; BYTE $0xcc // and r12d,ecx
265 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
266 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
267 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
268 LONG $0x32148d47 // lea r10d,[r10+r14*1]
269 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
270 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
271 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
272 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
273 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
274 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
275 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
276 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
277 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
278 LONG $0xc6fefdc5 // vpaddd ymm0,ymm0,ymm6
279 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
280 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
281 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
282 LONG $0xf870fdc5; BYTE $0x50 // vpshufd ymm7,ymm0,0x50
283 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
284 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
285 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
286 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
287 WORD $0x2144; BYTE $0xff // and edi,r15d
288 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
289 WORD $0x3144; BYTE $0xdf // xor edi,r11d
290 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
291 WORD $0x3145; BYTE $0xee // xor r14d,r13d
292 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
293 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
294 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
295
296 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x8c)
297 LONG $0x24840344; LONG $0x0000008c // add r8d,[rsp+0x8c]
298 WORD $0x2141; BYTE $0xdc // and r12d,ebx
299 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
300 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
301 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
302 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
303 LONG $0x20048d47 // lea r8d,[r8+r12*1]
304 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
305 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
306 WORD $0x3141; BYTE $0xfd // xor r13d,edi
307 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
308 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
309 LONG $0x20048d47 // lea r8d,[r8+r12*1]
310 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
311 WORD $0x8944; BYTE $0xcf // mov edi,r9d
312 LONG $0xc6fefdc5 // vpaddd ymm0,ymm0,ymm6
313 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
314 LONG $0x28048d47 // lea r8d,[r8+r13*1]
315 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
316 LONG $0x75fefdc5; BYTE $0x00 // vpaddd ymm6,ymm0,[rbp+0x0]
317 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
318 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
319 LONG $0x00048d42 // lea eax,[rax+r8*1]
320 WORD $0x2141; BYTE $0xff // and r15d,edi
321 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
322 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
323 WORD $0x3145; BYTE $0xee // xor r14d,r13d
324 LONG $0x38048d47 // lea r8d,[r8+r15*1]
325 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
326
327 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
328 LONG $0x0f6de3c4; WORD $0x04e1 // vpalignr ymm4,ymm2,ymm1,0x4
329
330 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0xa0)
331 LONG $0xa0249403; WORD $0x0000; BYTE $0x00 // add edx,[rsp+0xa0]
332 WORD $0x2141; BYTE $0xc4 // and r12d,eax
333 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
334 LONG $0x0f7de3c4; WORD $0x04fb // vpalignr ymm7,ymm0,ymm3,0x4
335 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
336 LONG $0x30048d47 // lea r8d,[r8+r14*1]
337 LONG $0x22148d42 // lea edx,[rdx+r12*1]
338 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
339 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
340 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
341 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
342 LONG $0xcffef5c5 // vpaddd ymm1,ymm1,ymm7
343 LONG $0x22148d42 // lea edx,[rdx+r12*1]
344 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
345 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
346 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
347 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
348 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
349 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
350 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
351 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
352 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
353 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
354 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
355 WORD $0x2144; BYTE $0xff // and edi,r15d
356 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
357 WORD $0x3144; BYTE $0xcf // xor edi,r9d
358 LONG $0xf870fdc5; BYTE $0xfa // vpshufd ymm7,ymm0,0xfa
359 WORD $0x3145; BYTE $0xee // xor r14d,r13d
360 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
361 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
362 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
363
364 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0xa4)
365 LONG $0xa4248c03; WORD $0x0000; BYTE $0x00 // add ecx,[rsp+0xa4]
366 WORD $0x2145; BYTE $0xdc // and r12d,r11d
367 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
368 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
369 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
370 LONG $0x32148d42 // lea edx,[rdx+r14*1]
371 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
372 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
373 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
374 WORD $0x3141; BYTE $0xfd // xor r13d,edi
375 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
376 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
377 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
378 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
379 WORD $0xd789 // mov edi,edx
380 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
381 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
382 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
383 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
384 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
385 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
386 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
387 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
388 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
389 WORD $0x2141; BYTE $0xff // and r15d,edi
390 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
391 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
392 LONG $0xccfef5c5 // vpaddd ymm1,ymm1,ymm4
393 WORD $0x3145; BYTE $0xee // xor r14d,r13d
394 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
395 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
396 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
397
398 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0xa8)
399 LONG $0xa8249c03; WORD $0x0000; BYTE $0x00 // add ebx,[rsp+0xa8]
400 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
401 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
402 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
403 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
404 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
405 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
406 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
407 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
408 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
409 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
410 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
411 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
412 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
413 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
414 LONG $0xcefef5c5 // vpaddd ymm1,ymm1,ymm6
415 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
416 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
417 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
418 LONG $0xf970fdc5; BYTE $0x50 // vpshufd ymm7,ymm1,0x50
419 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
420 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
421 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
422 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
423 WORD $0x2144; BYTE $0xff // and edi,r15d
424 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
425 WORD $0xd731 // xor edi,edx
426 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
427 WORD $0x3145; BYTE $0xee // xor r14d,r13d
428 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
429 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
430 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
431
432 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0xac)
433 LONG $0xac248403; WORD $0x0000; BYTE $0x00 // add eax,[rsp+0xac]
434 WORD $0x2145; BYTE $0xcc // and r12d,r9d
435 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
436 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
437 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
438 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
439 LONG $0x20048d42 // lea eax,[rax+r12*1]
440 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
441 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
442 WORD $0x3141; BYTE $0xfd // xor r13d,edi
443 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
444 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
445 LONG $0x20048d42 // lea eax,[rax+r12*1]
446 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
447 WORD $0xdf89 // mov edi,ebx
448 LONG $0xcefef5c5 // vpaddd ymm1,ymm1,ymm6
449 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
450 LONG $0x28048d42 // lea eax,[rax+r13*1]
451 WORD $0xcf31 // xor edi,ecx
452 LONG $0x75fef5c5; BYTE $0x20 // vpaddd ymm6,ymm1,[rbp+0x20]
453 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
454 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
455 LONG $0x00048d45 // lea r8d,[r8+rax*1]
456 WORD $0x2141; BYTE $0xff // and r15d,edi
457 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
458 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
459 WORD $0x3145; BYTE $0xee // xor r14d,r13d
460 LONG $0x38048d42 // lea eax,[rax+r15*1]
461 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
462
463 LONG $0x747ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm6
464
465 LONG $0x24648d48; BYTE $0xc0 // lea rsp,[rsp-0x40]
466 LONG $0x0f65e3c4; WORD $0x04e2 // vpalignr ymm4,ymm3,ymm2,0x4
467
468 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x80)
469 LONG $0x249c0344; LONG $0x00000080 // add r11d,[rsp+0x80]
470 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
471 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
472 LONG $0x0f75e3c4; WORD $0x04f8 // vpalignr ymm7,ymm1,ymm0,0x4
473 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
474 LONG $0x30048d42 // lea eax,[rax+r14*1]
475 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
476 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
477 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
478 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
479 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
480 LONG $0xd7feedc5 // vpaddd ymm2,ymm2,ymm7
481 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
482 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
483 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
484 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
485 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
486 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
487 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
488 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
489 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
490 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
491 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
492 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
493 WORD $0x2144; BYTE $0xff // and edi,r15d
494 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
495 WORD $0xdf31 // xor edi,ebx
496 LONG $0xf970fdc5; BYTE $0xfa // vpshufd ymm7,ymm1,0xfa
497 WORD $0x3145; BYTE $0xee // xor r14d,r13d
498 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
499 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
500 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
501
502 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x84)
503 LONG $0x24940344; LONG $0x00000084 // add r10d,[rsp+0x84]
504 WORD $0x2141; BYTE $0xd4 // and r12d,edx
505 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
506 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
507 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
508 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
509 LONG $0x22148d47 // lea r10d,[r10+r12*1]
510 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
511 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
512 WORD $0x3141; BYTE $0xfd // xor r13d,edi
513 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
514 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
515 LONG $0x22148d47 // lea r10d,[r10+r12*1]
516 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
517 WORD $0x8944; BYTE $0xdf // mov edi,r11d
518 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
519 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
520 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
521 WORD $0xc731 // xor edi,eax
522 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
523 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
524 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
525 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
526 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
527 WORD $0x2141; BYTE $0xff // and r15d,edi
528 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
529 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
530 LONG $0xd4feedc5 // vpaddd ymm2,ymm2,ymm4
531 WORD $0x3145; BYTE $0xee // xor r14d,r13d
532 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
533 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
534 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
535
536 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x88)
537 LONG $0x248c0344; LONG $0x00000088 // add r9d,[rsp+0x88]
538 WORD $0x2141; BYTE $0xcc // and r12d,ecx
539 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
540 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
541 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
542 LONG $0x32148d47 // lea r10d,[r10+r14*1]
543 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
544 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
545 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
546 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
547 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
548 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
549 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
550 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
551 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
552 LONG $0xd6feedc5 // vpaddd ymm2,ymm2,ymm6
553 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
554 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
555 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
556 LONG $0xfa70fdc5; BYTE $0x50 // vpshufd ymm7,ymm2,0x50
557 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
558 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
559 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
560 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
561 WORD $0x2144; BYTE $0xff // and edi,r15d
562 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
563 WORD $0x3144; BYTE $0xdf // xor edi,r11d
564 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
565 WORD $0x3145; BYTE $0xee // xor r14d,r13d
566 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
567 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
568 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
569
570 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x8c)
571 LONG $0x24840344; LONG $0x0000008c // add r8d,[rsp+0x8c]
572 WORD $0x2141; BYTE $0xdc // and r12d,ebx
573 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
574 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
575 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
576 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
577 LONG $0x20048d47 // lea r8d,[r8+r12*1]
578 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
579 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
580 WORD $0x3141; BYTE $0xfd // xor r13d,edi
581 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
582 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
583 LONG $0x20048d47 // lea r8d,[r8+r12*1]
584 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
585 WORD $0x8944; BYTE $0xcf // mov edi,r9d
586 LONG $0xd6feedc5 // vpaddd ymm2,ymm2,ymm6
587 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
588 LONG $0x28048d47 // lea r8d,[r8+r13*1]
589 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
590 LONG $0x75feedc5; BYTE $0x40 // vpaddd ymm6,ymm2,[rbp+0x40]
591 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
592 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
593 LONG $0x00048d42 // lea eax,[rax+r8*1]
594 WORD $0x2141; BYTE $0xff // and r15d,edi
595 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
596 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
597 WORD $0x3145; BYTE $0xee // xor r14d,r13d
598 LONG $0x38048d47 // lea r8d,[r8+r15*1]
599 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
600
601 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
602 LONG $0x0f7de3c4; WORD $0x04e3 // vpalignr ymm4,ymm0,ymm3,0x4
603
604 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0xa0)
605 LONG $0xa0249403; WORD $0x0000; BYTE $0x00 // add edx,[rsp+0xa0]
606 WORD $0x2141; BYTE $0xc4 // and r12d,eax
607 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
608 LONG $0x0f6de3c4; WORD $0x04f9 // vpalignr ymm7,ymm2,ymm1,0x4
609 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
610 LONG $0x30048d47 // lea r8d,[r8+r14*1]
611 LONG $0x22148d42 // lea edx,[rdx+r12*1]
612 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
613 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
614 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
615 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
616 LONG $0xdffee5c5 // vpaddd ymm3,ymm3,ymm7
617 LONG $0x22148d42 // lea edx,[rdx+r12*1]
618 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
619 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
620 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
621 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
622 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
623 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
624 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
625 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
626 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
627 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
628 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
629 WORD $0x2144; BYTE $0xff // and edi,r15d
630 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
631 WORD $0x3144; BYTE $0xcf // xor edi,r9d
632 LONG $0xfa70fdc5; BYTE $0xfa // vpshufd ymm7,ymm2,0xfa
633 WORD $0x3145; BYTE $0xee // xor r14d,r13d
634 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
635 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
636 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
637
638 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0xa4)
639 LONG $0xa4248c03; WORD $0x0000; BYTE $0x00 // add ecx,[rsp+0xa4]
640 WORD $0x2145; BYTE $0xdc // and r12d,r11d
641 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
642 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
643 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
644 LONG $0x32148d42 // lea edx,[rdx+r14*1]
645 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
646 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
647 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
648 WORD $0x3141; BYTE $0xfd // xor r13d,edi
649 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
650 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
651 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
652 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
653 WORD $0xd789 // mov edi,edx
654 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
655 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
656 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
657 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
658 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
659 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
660 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
661 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
662 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
663 WORD $0x2141; BYTE $0xff // and r15d,edi
664 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
665 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
666 LONG $0xdcfee5c5 // vpaddd ymm3,ymm3,ymm4
667 WORD $0x3145; BYTE $0xee // xor r14d,r13d
668 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
669 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
670 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
671
672 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0xa8)
673 LONG $0xa8249c03; WORD $0x0000; BYTE $0x00 // add ebx,[rsp+0xa8]
674 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
675 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
676 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
677 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
678 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
679 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
680 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
681 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
682 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
683 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
684 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
685 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
686 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
687 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
688 LONG $0xdefee5c5 // vpaddd ymm3,ymm3,ymm6
689 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
690 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
691 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
692 LONG $0xfb70fdc5; BYTE $0x50 // vpshufd ymm7,ymm3,0x50
693 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
694 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
695 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
696 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
697 WORD $0x2144; BYTE $0xff // and edi,r15d
698 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
699 WORD $0xd731 // xor edi,edx
700 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
701 WORD $0x3145; BYTE $0xee // xor r14d,r13d
702 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
703 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
704 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
705
706 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0xac)
707 LONG $0xac248403; WORD $0x0000; BYTE $0x00 // add eax,[rsp+0xac]
708 WORD $0x2145; BYTE $0xcc // and r12d,r9d
709 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
710 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
711 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
712 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
713 LONG $0x20048d42 // lea eax,[rax+r12*1]
714 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
715 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
716 WORD $0x3141; BYTE $0xfd // xor r13d,edi
717 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
718 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
719 LONG $0x20048d42 // lea eax,[rax+r12*1]
720 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
721 WORD $0xdf89 // mov edi,ebx
722 LONG $0xdefee5c5 // vpaddd ymm3,ymm3,ymm6
723 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
724 LONG $0x28048d42 // lea eax,[rax+r13*1]
725 WORD $0xcf31 // xor edi,ecx
726 LONG $0x75fee5c5; BYTE $0x60 // vpaddd ymm6,ymm3,[rbp+0x60]
727 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
728 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
729 LONG $0x00048d45 // lea r8d,[r8+rax*1]
730 WORD $0x2141; BYTE $0xff // and r15d,edi
731 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
732 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
733 WORD $0x3145; BYTE $0xee // xor r14d,r13d
734 LONG $0x38048d42 // lea eax,[rax+r15*1]
735 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
736
737 LONG $0x747ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm6
738 ADDQ $0x80, BP
739
740 CMPB 0x3(BP),$0x0
741 JNE loop1
742
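// The loop above interleaves the message schedule with the rounds. As a reading aid,
// the following Go sketch mirrors what the SIMD instructions appear to compute per
// 32-bit lane. It is an illustration under stated assumptions, not the package's code:
// all function names are hypothetical, and the mapping to instructions is inferred
// from the disassembly comments (vpsrld/vpslld shift pairs for sigma0, the
// vpshufd + vpsrlq duplicated-dword trick for sigma1).

```go
package main

import (
	"fmt"
	"math/bits"
)

// sigma0Ref and sigma1Ref are the textbook FIPS 180-4 small sigma functions.
func sigma0Ref(x uint32) uint32 {
	return bits.RotateLeft32(x, -7) ^ bits.RotateLeft32(x, -18) ^ (x >> 3)
}

func sigma1Ref(x uint32) uint32 {
	return bits.RotateLeft32(x, -17) ^ bits.RotateLeft32(x, -19) ^ (x >> 10)
}

// sigma0Shifts builds each rotate from a right shift and a left shift,
// in the same order as the vpsrld/vpslld/vpxor sequence (hypothetical name).
func sigma0Shifts(x uint32) uint32 {
	t6 := x >> 7  // vpsrld ymm6,ymm4,0x7
	t7 := x >> 3  // vpsrld ymm7,ymm4,0x3
	t5 := x << 14 // vpslld ymm5,ymm4,0xe
	t4 := t7 ^ t6 // vpxor  ymm4,ymm7,ymm6
	t6 >>= 11     // x >> 18
	t4 ^= t5
	t5 <<= 11 // x << 25
	t4 ^= t6
	t4 ^= t5
	return t4
}

// sigma1Qword duplicates the dword into both halves of a 64-bit value so that
// a 64-bit right shift yields a 32-bit rotate in the low half (hypothetical name).
func sigma1Qword(x uint32) uint32 {
	d := uint64(x)<<32 | uint64(x) // vpshufd duplicates the dword
	t6 := uint64(x >> 10)          // vpsrld 0xa (plain 32-bit shift)
	d >>= 17                       // vpsrlq 0x11: low half is now rotr(x, 17)
	t6 ^= d
	d >>= 2 // vpsrlq 0x2: low half is now rotr(x, 19)
	t6 ^= d
	return uint32(t6) // vpshufb keeps only the low dwords
}

// expandSchedule fills w[16..63] from w[0..15]: 48 dwords, i.e. 3 passes of 16.
func expandSchedule(w *[64]uint32) {
	for t := 16; t < 64; t++ {
		w[t] = sigma1Qword(w[t-2]) + w[t-7] + sigma0Shifts(w[t-15]) + w[t-16]
	}
}

func main() {
	var w [64]uint32
	w[0] = 0x61626380 // first word of the padded message "abc"
	w[15] = 24        // message length in bits
	expandSchedule(&w)
	fmt.Printf("w[16] = %08x\n", w[16])
}
```

// The shift-based forms exist because AVX2 has no packed 32-bit rotate; the sketch
// only checks that they agree with the textbook definitions.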
743 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x40)
744 LONG $0x245c0344; BYTE $0x40 // add r11d,[rsp+0x40]
745 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
746 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
747 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
748 LONG $0x30048d42 // lea eax,[rax+r14*1]
749 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
750 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
751 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
752 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
753 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
754 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
755 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
756 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
757 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
758 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
759 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
760 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
761 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
762 WORD $0x2144; BYTE $0xff // and edi,r15d
763 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
764 WORD $0xdf31 // xor edi,ebx
765 WORD $0x3145; BYTE $0xee // xor r14d,r13d
766 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
767 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
768
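// The ROUND block above contains no interleaved SIMD work, so it is a convenient
// place to read off the scalar round function. The Go sketch below restates what
// the rorx/and/andn/xor/lea sequence appears to compute. It is an assumption-laden
// illustration, not the package's code: the names are hypothetical, kw stands for
// the precomputed K[t]+W[t] value that the assembly loads from the stack, and the
// sketch ignores the assembly's deferral of part of the sum into the next round.

```go
package main

import (
	"fmt"
	"math/bits"
)

func rotr(x uint32, n int) uint32 { return bits.RotateLeft32(x, -n) }

// Textbook FIPS 180-4 definitions, kept for comparison.
func chRef(e, f, g uint32) uint32  { return (e & f) ^ (^e & g) }
func majRef(a, b, c uint32) uint32 { return (a & b) ^ (a & c) ^ (b & c) }

// chSum adds the two halves of Ch instead of XORing them. The terms (e & f) and
// (^e & g) never share a set bit, so addition and XOR agree. This matches the
// and / andn results being folded in with lea (hypothetical name).
func chSum(e, f, g uint32) uint32 { return (e & f) + (^e & g) }

// majTrick computes Maj as ((a ^ b) & (b ^ c)) ^ b. The (b ^ c) term is what the
// "magic" XOR before the loop seeds, and each round's (a ^ b) becomes the next
// round's (b ^ c) (hypothetical name).
func majTrick(a, b, c uint32) uint32 { return ((a ^ b) & (b ^ c)) ^ b }

// round applies one SHA-256 round to the state a..h and returns the new state.
func round(s [8]uint32, kw uint32) [8]uint32 {
	a, b, c, d, e, f, g, h := s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7]
	sigma1 := rotr(e, 25) ^ rotr(e, 11) ^ rotr(e, 6) // rorx 0x19, 0xb, 0x6
	t1 := h + kw + chSum(e, f, g) + sigma1
	sigma0 := rotr(a, 22) ^ rotr(a, 13) ^ rotr(a, 2) // rorx 0x16, 0xd, 0x2
	t2 := sigma0 + majTrick(a, b, c)
	return [8]uint32{t1 + t2, a, b, c, d + t1, e, f, g}
}

func main() {
	iv := [8]uint32{
		0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
		0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
	}
	fmt.Printf("%08x\n", round(iv, 0x428a2f98))
}
```

// Note that the assembly never moves the state: each ROUND comment lists the same
// eight registers rotated by one position, which plays the role of the array
// shuffle in the return statement of round.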
769 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x44)
770 LONG $0x24540344; BYTE $0x44 // add r10d,[rsp+0x44]
771 WORD $0x2141; BYTE $0xd4 // and r12d,edx
772 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
773 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
774 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
775 LONG $0x22148d47 // lea r10d,[r10+r12*1]
776 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
777 WORD $0x3141; BYTE $0xfd // xor r13d,edi
778 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
779 LONG $0x22148d47 // lea r10d,[r10+r12*1]
780 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
781 WORD $0x8944; BYTE $0xdf // mov edi,r11d
782 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
783 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
784 WORD $0xc731 // xor edi,eax
785 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
786 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
787 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
788 WORD $0x2141; BYTE $0xff // and r15d,edi
789 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
790 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
791 WORD $0x3145; BYTE $0xee // xor r14d,r13d
792 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
793 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
794
795 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x48)
796 LONG $0x244c0344; BYTE $0x48 // add r9d,[rsp+0x48]
797 WORD $0x2141; BYTE $0xcc // and r12d,ecx
798 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
799 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
800 LONG $0x32148d47 // lea r10d,[r10+r14*1]
801 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
802 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
803 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
804 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
805 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
806 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
807 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
808 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
809 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
810 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
811 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
812 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
813 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
814 WORD $0x2144; BYTE $0xff // and edi,r15d
815 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
816 WORD $0x3144; BYTE $0xdf // xor edi,r11d
817 WORD $0x3145; BYTE $0xee // xor r14d,r13d
818 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
819 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
820
821 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x4c)
822 LONG $0x24440344; BYTE $0x4c // add r8d,[rsp+0x4c]
823 WORD $0x2141; BYTE $0xdc // and r12d,ebx
824 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
825 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
826 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
827 LONG $0x20048d47 // lea r8d,[r8+r12*1]
828 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
829 WORD $0x3141; BYTE $0xfd // xor r13d,edi
830 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
831 LONG $0x20048d47 // lea r8d,[r8+r12*1]
832 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
833 WORD $0x8944; BYTE $0xcf // mov edi,r9d
834 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
835 LONG $0x28048d47 // lea r8d,[r8+r13*1]
836 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
837 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
838 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
839 LONG $0x00048d42 // lea eax,[rax+r8*1]
840 WORD $0x2141; BYTE $0xff // and r15d,edi
841 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
842 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
843 WORD $0x3145; BYTE $0xee // xor r14d,r13d
844 LONG $0x38048d47 // lea r8d,[r8+r15*1]
845 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
846
847 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0x60)
848 LONG $0x60245403 // add edx,[rsp+0x60]
849 WORD $0x2141; BYTE $0xc4 // and r12d,eax
850 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
851 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
852 LONG $0x30048d47 // lea r8d,[r8+r14*1]
853 LONG $0x22148d42 // lea edx,[rdx+r12*1]
854 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
855 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
856 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
857 LONG $0x22148d42 // lea edx,[rdx+r12*1]
858 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
859 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
860 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
861 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
862 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
863 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
864 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
865 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
866 WORD $0x2144; BYTE $0xff // and edi,r15d
867 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
868 WORD $0x3144; BYTE $0xcf // xor edi,r9d
869 WORD $0x3145; BYTE $0xee // xor r14d,r13d
870 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
871 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
872
873 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0x64)
874 LONG $0x64244c03 // add ecx,[rsp+0x64]
875 WORD $0x2145; BYTE $0xdc // and r12d,r11d
876 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
877 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
878 LONG $0x32148d42 // lea edx,[rdx+r14*1]
879 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
880 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
881 WORD $0x3141; BYTE $0xfd // xor r13d,edi
882 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
883 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
884 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
885 WORD $0xd789 // mov edi,edx
886 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
887 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
888 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
889 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
890 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
891 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
892 WORD $0x2141; BYTE $0xff // and r15d,edi
893 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
894 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
895 WORD $0x3145; BYTE $0xee // xor r14d,r13d
896 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
897 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
898
899 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0x68)
900 LONG $0x68245c03 // add ebx,[rsp+0x68]
901 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
902 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
903 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
904 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
905 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
906 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
907 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
908 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
909 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
910 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
911 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
912 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
913 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
914 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
915 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
916 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
917 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
918 WORD $0x2144; BYTE $0xff // and edi,r15d
919 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
920 WORD $0xd731 // xor edi,edx
921 WORD $0x3145; BYTE $0xee // xor r14d,r13d
922 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
923 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
924
925 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0x6c)
926 LONG $0x6c244403 // add eax,[rsp+0x6c]
927 WORD $0x2145; BYTE $0xcc // and r12d,r9d
928 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
929 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
930 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
931 LONG $0x20048d42 // lea eax,[rax+r12*1]
932 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
933 WORD $0x3141; BYTE $0xfd // xor r13d,edi
934 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
935 LONG $0x20048d42 // lea eax,[rax+r12*1]
936 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
937 WORD $0xdf89 // mov edi,ebx
938 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
939 LONG $0x28048d42 // lea eax,[rax+r13*1]
940 WORD $0xcf31 // xor edi,ecx
941 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
942 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
943 LONG $0x00048d45 // lea r8d,[r8+rax*1]
944 WORD $0x2141; BYTE $0xff // and r15d,edi
945 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
946 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
947 WORD $0x3145; BYTE $0xee // xor r14d,r13d
948 LONG $0x38048d42 // lea eax,[rax+r15*1]
949 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
950
951 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x00)
952 LONG $0x241c0344 // add r11d,[rsp]
953 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
954 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
955 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
956 LONG $0x30048d42 // lea eax,[rax+r14*1]
957 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
958 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
959 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
960 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
961 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
962 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
963 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
964 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
965 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
966 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
967 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
968 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
969 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
970 WORD $0x2144; BYTE $0xff // and edi,r15d
971 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
972 WORD $0xdf31 // xor edi,ebx
973 WORD $0x3145; BYTE $0xee // xor r14d,r13d
974 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
975 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
976
977 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x04)
978 LONG $0x24540344; BYTE $0x04 // add r10d,[rsp+0x4]
979 WORD $0x2141; BYTE $0xd4 // and r12d,edx
980 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
981 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
982 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
983 LONG $0x22148d47 // lea r10d,[r10+r12*1]
984 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
985 WORD $0x3141; BYTE $0xfd // xor r13d,edi
986 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
987 LONG $0x22148d47 // lea r10d,[r10+r12*1]
988 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
989 WORD $0x8944; BYTE $0xdf // mov edi,r11d
990 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
991 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
992 WORD $0xc731 // xor edi,eax
993 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
994 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
995 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
996 WORD $0x2141; BYTE $0xff // and r15d,edi
997 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
998 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
999 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1000 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
1001 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
1002
1003 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x08)
1004 LONG $0x244c0344; BYTE $0x08 // add r9d,[rsp+0x8]
1005 WORD $0x2141; BYTE $0xcc // and r12d,ecx
1006 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
1007 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
1008 LONG $0x32148d47 // lea r10d,[r10+r14*1]
1009 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1010 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
1011 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1012 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
1013 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1014 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1015 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
1016 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
1017 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
1018 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
1019 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
1020 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
1021 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
1022 WORD $0x2144; BYTE $0xff // and edi,r15d
1023 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1024 WORD $0x3144; BYTE $0xdf // xor edi,r11d
1025 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1026 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
1027 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
1028
1029 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x0c)
1030 LONG $0x24440344; BYTE $0x0c // add r8d,[rsp+0xc]
1031 WORD $0x2141; BYTE $0xdc // and r12d,ebx
1032 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
1033 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
1034 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
1035 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1036 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
1037 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1038 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
1039 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1040 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1041 WORD $0x8944; BYTE $0xcf // mov edi,r9d
1042 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
1043 LONG $0x28048d47 // lea r8d,[r8+r13*1]
1044 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
1045 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
1046 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
1047 LONG $0x00048d42 // lea eax,[rax+r8*1]
1048 WORD $0x2141; BYTE $0xff // and r15d,edi
1049 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1050 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
1051 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1052 LONG $0x38048d47 // lea r8d,[r8+r15*1]
1053 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
1054
1055 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0x20)
1056 LONG $0x20245403 // add edx,[rsp+0x20]
1057 WORD $0x2141; BYTE $0xc4 // and r12d,eax
1058 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
1059 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
1060 LONG $0x30048d47 // lea r8d,[r8+r14*1]
1061 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1062 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
1063 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1064 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
1065 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1066 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1067 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
1068 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
1069 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
1070 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
1071 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
1072 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
1073 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
1074 WORD $0x2144; BYTE $0xff // and edi,r15d
1075 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1076 WORD $0x3144; BYTE $0xcf // xor edi,r9d
1077 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1078 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
1079 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
1080
1081 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0x24)
1082 LONG $0x24244c03 // add ecx,[rsp+0x24]
1083 WORD $0x2145; BYTE $0xdc // and r12d,r11d
1084 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
1085 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
1086 LONG $0x32148d42 // lea edx,[rdx+r14*1]
1087 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
1088 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
1089 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1090 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
1091 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
1092 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1093 WORD $0xd789 // mov edi,edx
1094 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
1095 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
1096 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
1097 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
1098 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
1099 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
1100 WORD $0x2141; BYTE $0xff // and r15d,edi
1101 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1102 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
1103 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1104 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
1105 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
1106
1107 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0x28)
1108 LONG $0x28245c03 // add ebx,[rsp+0x28]
1109 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
1110 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
1111 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
1112 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
1113 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
1114 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
1115 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1116 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
1117 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
1118 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1119 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
1120 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
1121 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
1122 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
1123 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
1124 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
1125 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
1126 WORD $0x2144; BYTE $0xff // and edi,r15d
1127 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1128 WORD $0xd731 // xor edi,edx
1129 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1130 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
1131 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
1132
1133 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0x2c)
1134 LONG $0x2c244403 // add eax,[rsp+0x2c]
1135 WORD $0x2145; BYTE $0xcc // and r12d,r9d
1136 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
1137 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
1138 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
1139 LONG $0x20048d42 // lea eax,[rax+r12*1]
1140 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
1141 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1142 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
1143 LONG $0x20048d42 // lea eax,[rax+r12*1]
1144 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1145 WORD $0xdf89 // mov edi,ebx
1146 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
1147 LONG $0x28048d42 // lea eax,[rax+r13*1]
1148 WORD $0xcf31 // xor edi,ecx
1149 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
1150 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
1151 LONG $0x00048d45 // lea r8d,[r8+rax*1]
1152 WORD $0x2141; BYTE $0xff // and r15d,edi
1153 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1154 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
1155 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1156 LONG $0x38048d42 // lea eax,[rax+r15*1]
1157 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
1158
1159 MOVQ 0x200(SP), DI // $_ctx
1160 ADDQ R14, AX // add the last round's pending Sigma0 term to the new a
1161
1162 LEAQ 0x1c0(SP), BP
1163
1164 ADDL (DI), AX // add the eight state words stored in the context to a..h
1165 ADDL 4(DI), BX
1166 ADDL 8(DI), CX
1167 ADDL 12(DI), DX
1168 ADDL 16(DI), R8
1169 ADDL 20(DI), R9
1170 ADDL 24(DI), R10
1171 ADDL 28(DI), R11
1172
1173 MOVL AX, (DI) // write the updated state words back to the context
1174 MOVL BX, 4(DI)
1175 MOVL CX, 8(DI)
1176 MOVL DX, 12(DI)
1177 MOVL R8, 16(DI)
1178 MOVL R9, 20(DI)
1179 MOVL R10, 24(DI)
1180 MOVL R11, 28(DI)
1181
1182 CMPQ SI, 0x50(BP) // $_end
1183 JE done
1184
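1185 XORQ R14, R14 // no pending Sigma0 term to fold into a
1186 MOVQ BX, DI
1187 XORQ CX, DI // DI = b ^ c, consumed by Maj(a, b, c) in the first round
1188 MOVQ R9, R12 // R12 = f, consumed by Ch(e, f, g) in the first round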
196 // Schedule 48 input dwords, by doing 3 iterations of 16 rounds each
197 // Note: SIMD instructions are interleaved with the SHA calculations
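// Editorial note (inferred from the shift counts, not part of the upstream
// source): the interleaved vector instructions appear to implement the
// standard SHA-256 message schedule
//   W[t] = W[t-16] + sigma0(W[t-15]) + W[t-7] + sigma1(W[t-2])
//   sigma0(x) = ror(x, 7) ^ ror(x, 18) ^ (x >> 3)    // vpsrld 7 and 3, vpslld 14, then both shifted by a further 11
//   sigma1(x) = ror(x, 17) ^ ror(x, 19) ^ (x >> 10)  // vpshufd duplicates each dword, so vpsrlq by 17 and then 2 more acts as a rotate
// Each group of four rounds extends the schedule by four dwords, adds the
// round constants from [rbp+...] and stores W+K to the stack for later rounds.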
198 ADDQ $-0x40, SP
199 LONG $0x0f75e3c4; WORD $0x04e0 // vpalignr ymm4,ymm1,ymm0,0x4
200
201 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x80)
202 LONG $0x249c0344; LONG $0x00000080 // add r11d,[rsp+0x80]
203 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
204 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
205 LONG $0x0f65e3c4; WORD $0x04fa // vpalignr ymm7,ymm3,ymm2,0x4
206 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
207 LONG $0x30048d42 // lea eax,[rax+r14*1]
208 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
209 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
210 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
211 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
212 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
213 LONG $0xc7fefdc5 // vpaddd ymm0,ymm0,ymm7
214 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
215 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
216 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
217 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
218 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
219 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
220 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
221 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
222 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
223 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
224 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
225 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
226 WORD $0x2144; BYTE $0xff // and edi,r15d
227 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
228 WORD $0xdf31 // xor edi,ebx
229 LONG $0xfb70fdc5; BYTE $0xfa // vpshufd ymm7,ymm3,0xfa
230 WORD $0x3145; BYTE $0xee // xor r14d,r13d
231 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
232 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
233 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
234
235 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x84)
236 LONG $0x24940344; LONG $0x00000084 // add r10d,[rsp+0x84]
237 WORD $0x2141; BYTE $0xd4 // and r12d,edx
238 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
239 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
240 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
241 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
242 LONG $0x22148d47 // lea r10d,[r10+r12*1]
243 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
244 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
245 WORD $0x3141; BYTE $0xfd // xor r13d,edi
246 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
247 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
248 LONG $0x22148d47 // lea r10d,[r10+r12*1]
249 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
250 WORD $0x8944; BYTE $0xdf // mov edi,r11d
251 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
252 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
253 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
254 WORD $0xc731 // xor edi,eax
255 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
256 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
257 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
258 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
259 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
260 WORD $0x2141; BYTE $0xff // and r15d,edi
261 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
262 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
263 LONG $0xc4fefdc5 // vpaddd ymm0,ymm0,ymm4
264 WORD $0x3145; BYTE $0xee // xor r14d,r13d
265 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
266 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
267 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
268
269 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x88)
270 LONG $0x248c0344; LONG $0x00000088 // add r9d,[rsp+0x88]
271 WORD $0x2141; BYTE $0xcc // and r12d,ecx
272 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
273 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
274 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
275 LONG $0x32148d47 // lea r10d,[r10+r14*1]
276 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
277 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
278 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
279 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
280 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
281 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
282 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
283 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
284 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
285 LONG $0xc6fefdc5 // vpaddd ymm0,ymm0,ymm6
286 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
287 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
288 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
289 LONG $0xf870fdc5; BYTE $0x50 // vpshufd ymm7,ymm0,0x50
290 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
291 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
292 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
293 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
294 WORD $0x2144; BYTE $0xff // and edi,r15d
295 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
296 WORD $0x3144; BYTE $0xdf // xor edi,r11d
297 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
298 WORD $0x3145; BYTE $0xee // xor r14d,r13d
299 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
300 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
301 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
302
303 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x8c)
304 LONG $0x24840344; LONG $0x0000008c // add r8d,[rsp+0x8c]
305 WORD $0x2141; BYTE $0xdc // and r12d,ebx
306 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
307 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
308 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
309 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
310 LONG $0x20048d47 // lea r8d,[r8+r12*1]
311 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
312 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
313 WORD $0x3141; BYTE $0xfd // xor r13d,edi
314 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
315 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
316 LONG $0x20048d47 // lea r8d,[r8+r12*1]
317 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
318 WORD $0x8944; BYTE $0xcf // mov edi,r9d
319 LONG $0xc6fefdc5 // vpaddd ymm0,ymm0,ymm6
320 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
321 LONG $0x28048d47 // lea r8d,[r8+r13*1]
322 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
323 LONG $0x75fefdc5; BYTE $0x00 // vpaddd ymm6,ymm0,[rbp+0x0]
324 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
325 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
326 LONG $0x00048d42 // lea eax,[rax+r8*1]
327 WORD $0x2141; BYTE $0xff // and r15d,edi
328 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
329 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
330 WORD $0x3145; BYTE $0xee // xor r14d,r13d
331 LONG $0x38048d47 // lea r8d,[r8+r15*1]
332 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
333
334 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
335 LONG $0x0f6de3c4; WORD $0x04e1 // vpalignr ymm4,ymm2,ymm1,0x4
336
337 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0xa0)
338 LONG $0xa0249403; WORD $0x0000; BYTE $0x00 // add edx,[rsp+0xa0]
339 WORD $0x2141; BYTE $0xc4 // and r12d,eax
340 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
341 LONG $0x0f7de3c4; WORD $0x04fb // vpalignr ymm7,ymm0,ymm3,0x4
342 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
343 LONG $0x30048d47 // lea r8d,[r8+r14*1]
344 LONG $0x22148d42 // lea edx,[rdx+r12*1]
345 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
346 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
347 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
348 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
349 LONG $0xcffef5c5 // vpaddd ymm1,ymm1,ymm7
350 LONG $0x22148d42 // lea edx,[rdx+r12*1]
351 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
352 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
353 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
354 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
355 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
356 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
357 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
358 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
359 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
360 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
361 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
362 WORD $0x2144; BYTE $0xff // and edi,r15d
363 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
364 WORD $0x3144; BYTE $0xcf // xor edi,r9d
365 LONG $0xf870fdc5; BYTE $0xfa // vpshufd ymm7,ymm0,0xfa
366 WORD $0x3145; BYTE $0xee // xor r14d,r13d
367 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
368 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
369 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
370
371 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0xa4)
372 LONG $0xa4248c03; WORD $0x0000; BYTE $0x00 // add ecx,[rsp+0xa4]
373 WORD $0x2145; BYTE $0xdc // and r12d,r11d
374 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
375 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
376 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
377 LONG $0x32148d42 // lea edx,[rdx+r14*1]
378 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
379 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
380 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
381 WORD $0x3141; BYTE $0xfd // xor r13d,edi
382 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
383 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
384 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
385 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
386 WORD $0xd789 // mov edi,edx
387 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
388 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
389 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
390 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
391 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
392 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
393 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
394 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
395 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
396 WORD $0x2141; BYTE $0xff // and r15d,edi
397 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
398 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
399 LONG $0xccfef5c5 // vpaddd ymm1,ymm1,ymm4
400 WORD $0x3145; BYTE $0xee // xor r14d,r13d
401 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
402 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
403 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
404
405 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0xa8)
406 LONG $0xa8249c03; WORD $0x0000; BYTE $0x00 // add ebx,[rsp+0xa8]
407 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
408 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
409 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
410 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
411 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
412 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
413 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
414 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
415 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
416 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
417 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
418 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
419 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
420 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
421 LONG $0xcefef5c5 // vpaddd ymm1,ymm1,ymm6
422 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
423 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
424 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
425 LONG $0xf970fdc5; BYTE $0x50 // vpshufd ymm7,ymm1,0x50
426 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
427 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
428 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
429 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
430 WORD $0x2144; BYTE $0xff // and edi,r15d
431 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
432 WORD $0xd731 // xor edi,edx
433 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
434 WORD $0x3145; BYTE $0xee // xor r14d,r13d
435 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
436 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
437 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
438
439 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0xac)
440 LONG $0xac248403; WORD $0x0000; BYTE $0x00 // add eax,[rsp+0xac]
441 WORD $0x2145; BYTE $0xcc // and r12d,r9d
442 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
443 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
444 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
445 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
446 LONG $0x20048d42 // lea eax,[rax+r12*1]
447 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
448 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
449 WORD $0x3141; BYTE $0xfd // xor r13d,edi
450 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
451 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
452 LONG $0x20048d42 // lea eax,[rax+r12*1]
453 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
454 WORD $0xdf89 // mov edi,ebx
455 LONG $0xcefef5c5 // vpaddd ymm1,ymm1,ymm6
456 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
457 LONG $0x28048d42 // lea eax,[rax+r13*1]
458 WORD $0xcf31 // xor edi,ecx
459 LONG $0x75fef5c5; BYTE $0x20 // vpaddd ymm6,ymm1,[rbp+0x20]
460 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
461 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
462 LONG $0x00048d45 // lea r8d,[r8+rax*1]
463 WORD $0x2141; BYTE $0xff // and r15d,edi
464 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
465 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
466 WORD $0x3145; BYTE $0xee // xor r14d,r13d
467 LONG $0x38048d42 // lea eax,[rax+r15*1]
468 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
469
470 LONG $0x747ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm6
471
472 LONG $0x24648d48; BYTE $0xc0 // lea rsp,[rsp-0x40]
473 LONG $0x0f65e3c4; WORD $0x04e2 // vpalignr ymm4,ymm3,ymm2,0x4
474
475 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x80)
476 LONG $0x249c0344; LONG $0x00000080 // add r11d,[rsp+0x80]
477 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
478 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
479 LONG $0x0f75e3c4; WORD $0x04f8 // vpalignr ymm7,ymm1,ymm0,0x4
480 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
481 LONG $0x30048d42 // lea eax,[rax+r14*1]
482 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
483 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
484 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
485 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
486 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
487 LONG $0xd7feedc5 // vpaddd ymm2,ymm2,ymm7
488 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
489 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
490 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
491 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
492 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
493 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
494 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
495 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
496 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
497 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
498 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
499 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
500 WORD $0x2144; BYTE $0xff // and edi,r15d
501 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
502 WORD $0xdf31 // xor edi,ebx
503 LONG $0xf970fdc5; BYTE $0xfa // vpshufd ymm7,ymm1,0xfa
504 WORD $0x3145; BYTE $0xee // xor r14d,r13d
505 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
506 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
507 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
508
509 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x84)
510 LONG $0x24940344; LONG $0x00000084 // add r10d,[rsp+0x84]
511 WORD $0x2141; BYTE $0xd4 // and r12d,edx
512 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
513 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
514 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
515 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
516 LONG $0x22148d47 // lea r10d,[r10+r12*1]
517 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
518 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
519 WORD $0x3141; BYTE $0xfd // xor r13d,edi
520 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
521 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
522 LONG $0x22148d47 // lea r10d,[r10+r12*1]
523 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
524 WORD $0x8944; BYTE $0xdf // mov edi,r11d
525 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
526 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
527 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
528 WORD $0xc731 // xor edi,eax
529 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
530 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
531 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
532 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
533 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
534 WORD $0x2141; BYTE $0xff // and r15d,edi
535 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
536 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
537 LONG $0xd4feedc5 // vpaddd ymm2,ymm2,ymm4
538 WORD $0x3145; BYTE $0xee // xor r14d,r13d
539 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
540 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
541 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
542
543 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x88)
544 LONG $0x248c0344; LONG $0x00000088 // add r9d,[rsp+0x88]
545 WORD $0x2141; BYTE $0xcc // and r12d,ecx
546 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
547 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
548 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
549 LONG $0x32148d47 // lea r10d,[r10+r14*1]
550 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
551 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
552 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
553 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
554 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
555 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
556 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
557 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
558 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
559 LONG $0xd6feedc5 // vpaddd ymm2,ymm2,ymm6
560 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
561 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
562 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
563 LONG $0xfa70fdc5; BYTE $0x50 // vpshufd ymm7,ymm2,0x50
564 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
565 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
566 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
567 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
568 WORD $0x2144; BYTE $0xff // and edi,r15d
569 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
570 WORD $0x3144; BYTE $0xdf // xor edi,r11d
571 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
572 WORD $0x3145; BYTE $0xee // xor r14d,r13d
573 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
574 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
575 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
576
577 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x8c)
578 LONG $0x24840344; LONG $0x0000008c // add r8d,[rsp+0x8c]
579 WORD $0x2141; BYTE $0xdc // and r12d,ebx
580 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
581 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
582 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
583 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
584 LONG $0x20048d47 // lea r8d,[r8+r12*1]
585 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
586 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
587 WORD $0x3141; BYTE $0xfd // xor r13d,edi
588 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
589 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
590 LONG $0x20048d47 // lea r8d,[r8+r12*1]
591 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
592 WORD $0x8944; BYTE $0xcf // mov edi,r9d
593 LONG $0xd6feedc5 // vpaddd ymm2,ymm2,ymm6
594 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
595 LONG $0x28048d47 // lea r8d,[r8+r13*1]
596 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
597 LONG $0x75feedc5; BYTE $0x40 // vpaddd ymm6,ymm2,[rbp+0x40]
598 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
599 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
600 LONG $0x00048d42 // lea eax,[rax+r8*1]
601 WORD $0x2141; BYTE $0xff // and r15d,edi
602 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
603 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
604 WORD $0x3145; BYTE $0xee // xor r14d,r13d
605 LONG $0x38048d47 // lea r8d,[r8+r15*1]
606 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
607
608 LONG $0x347ffdc5; BYTE $0x24 // vmovdqa [rsp],ymm6
609 LONG $0x0f7de3c4; WORD $0x04e3 // vpalignr ymm4,ymm0,ymm3,0x4
610
611 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0xa0)
612 LONG $0xa0249403; WORD $0x0000; BYTE $0x00 // add edx,[rsp+0xa0]
613 WORD $0x2141; BYTE $0xc4 // and r12d,eax
614 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
615 LONG $0x0f6de3c4; WORD $0x04f9 // vpalignr ymm7,ymm2,ymm1,0x4
616 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
617 LONG $0x30048d47 // lea r8d,[r8+r14*1]
618 LONG $0x22148d42 // lea edx,[rdx+r12*1]
619 LONG $0xd472cdc5; BYTE $0x07 // vpsrld ymm6,ymm4,0x7
620 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
621 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
622 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
623 LONG $0xdffee5c5 // vpaddd ymm3,ymm3,ymm7
624 LONG $0x22148d42 // lea edx,[rdx+r12*1]
625 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
626 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
627 LONG $0xd472c5c5; BYTE $0x03 // vpsrld ymm7,ymm4,0x3
628 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
629 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
630 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
631 LONG $0xf472d5c5; BYTE $0x0e // vpslld ymm5,ymm4,0xe
632 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
633 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
634 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
635 LONG $0xe6efc5c5 // vpxor ymm4,ymm7,ymm6
636 WORD $0x2144; BYTE $0xff // and edi,r15d
637 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
638 WORD $0x3144; BYTE $0xcf // xor edi,r9d
639 LONG $0xfa70fdc5; BYTE $0xfa // vpshufd ymm7,ymm2,0xfa
640 WORD $0x3145; BYTE $0xee // xor r14d,r13d
641 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
642 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
643 LONG $0xd672cdc5; BYTE $0x0b // vpsrld ymm6,ymm6,0xb
644
645 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0xa4)
646 LONG $0xa4248c03; WORD $0x0000; BYTE $0x00 // add ecx,[rsp+0xa4]
647 WORD $0x2145; BYTE $0xdc // and r12d,r11d
648 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
649 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
650 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
651 LONG $0x32148d42 // lea edx,[rdx+r14*1]
652 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
653 LONG $0xf572d5c5; BYTE $0x0b // vpslld ymm5,ymm5,0xb
654 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
655 WORD $0x3141; BYTE $0xfd // xor r13d,edi
656 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
657 LONG $0xe6efddc5 // vpxor ymm4,ymm4,ymm6
658 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
659 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
660 WORD $0xd789 // mov edi,edx
661 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
662 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
663 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
664 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
665 LONG $0xe5efddc5 // vpxor ymm4,ymm4,ymm5
666 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
667 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
668 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
669 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
670 WORD $0x2141; BYTE $0xff // and r15d,edi
671 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
672 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
673 LONG $0xdcfee5c5 // vpaddd ymm3,ymm3,ymm4
674 WORD $0x3145; BYTE $0xee // xor r14d,r13d
675 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
676 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
677 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
678
679 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0xa8)
680 LONG $0xa8249c03; WORD $0x0000; BYTE $0x00 // add ebx,[rsp+0xa8]
681 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
682 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
683 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
684 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
685 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
686 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
687 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
688 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
689 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
690 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
691 LONG $0x004dc2c4; BYTE $0xf0 // vpshufb ymm6,ymm6,ymm8
692 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
693 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
694 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
695 LONG $0xdefee5c5 // vpaddd ymm3,ymm3,ymm6
696 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
697 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
698 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
699 LONG $0xfb70fdc5; BYTE $0x50 // vpshufd ymm7,ymm3,0x50
700 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
701 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
702 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
703 LONG $0xd772cdc5; BYTE $0x0a // vpsrld ymm6,ymm7,0xa
704 WORD $0x2144; BYTE $0xff // and edi,r15d
705 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
706 WORD $0xd731 // xor edi,edx
707 LONG $0xd773c5c5; BYTE $0x11 // vpsrlq ymm7,ymm7,0x11
708 WORD $0x3145; BYTE $0xee // xor r14d,r13d
709 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
710 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
711 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
712
713 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0xac)
714 LONG $0xac248403; WORD $0x0000; BYTE $0x00 // add eax,[rsp+0xac]
715 WORD $0x2145; BYTE $0xcc // and r12d,r9d
716 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
717 LONG $0xd773c5c5; BYTE $0x02 // vpsrlq ymm7,ymm7,0x2
718 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
719 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
720 LONG $0x20048d42 // lea eax,[rax+r12*1]
721 LONG $0xf7efcdc5 // vpxor ymm6,ymm6,ymm7
722 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
723 WORD $0x3141; BYTE $0xfd // xor r13d,edi
724 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
725 LONG $0x004dc2c4; BYTE $0xf1 // vpshufb ymm6,ymm6,ymm9
726 LONG $0x20048d42 // lea eax,[rax+r12*1]
727 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
728 WORD $0xdf89 // mov edi,ebx
729 LONG $0xdefee5c5 // vpaddd ymm3,ymm3,ymm6
730 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
731 LONG $0x28048d42 // lea eax,[rax+r13*1]
732 WORD $0xcf31 // xor edi,ecx
733 LONG $0x75fee5c5; BYTE $0x60 // vpaddd ymm6,ymm3,[rbp+0x60]
734 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
735 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
736 LONG $0x00048d45 // lea r8d,[r8+rax*1]
737 WORD $0x2141; BYTE $0xff // and r15d,edi
738 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
739 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
740 WORD $0x3145; BYTE $0xee // xor r14d,r13d
741 LONG $0x38048d42 // lea eax,[rax+r15*1]
742 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
743
744 LONG $0x747ffdc5; WORD $0x2024 // vmovdqa [rsp+0x20],ymm6
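745 ADDQ $0x80, BP // advance the round-constant pointer by 128 bytes (four 32-byte rows)
746
747 CMPB 0x3(BP), $0x0 // loop until byte 3 at BP is zero (assumed to mark the end of the constants)
748 JNE loop1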
749
750 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x40)
751 LONG $0x245c0344; BYTE $0x40 // add r11d,[rsp+0x40]
752 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
753 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
754 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
755 LONG $0x30048d42 // lea eax,[rax+r14*1]
756 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
757 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
758 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
759 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
760 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
761 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
762 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
763 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
764 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
765 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
766 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
767 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
768 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
769 WORD $0x2144; BYTE $0xff // and edi,r15d
770 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
771 WORD $0xdf31 // xor edi,ebx
772 WORD $0x3145; BYTE $0xee // xor r14d,r13d
773 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
774 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
775
776 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x44)
777 LONG $0x24540344; BYTE $0x44 // add r10d,[rsp+0x44]
778 WORD $0x2141; BYTE $0xd4 // and r12d,edx
779 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
780 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
781 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
782 LONG $0x22148d47 // lea r10d,[r10+r12*1]
783 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
784 WORD $0x3141; BYTE $0xfd // xor r13d,edi
785 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
786 LONG $0x22148d47 // lea r10d,[r10+r12*1]
787 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
788 WORD $0x8944; BYTE $0xdf // mov edi,r11d
789 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
790 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
791 WORD $0xc731 // xor edi,eax
792 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
793 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
794 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
795 WORD $0x2141; BYTE $0xff // and r15d,edi
796 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
797 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
798 WORD $0x3145; BYTE $0xee // xor r14d,r13d
799 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
800 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
801
802 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x48)
803 LONG $0x244c0344; BYTE $0x48 // add r9d,[rsp+0x48]
804 WORD $0x2141; BYTE $0xcc // and r12d,ecx
805 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
806 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
807 LONG $0x32148d47 // lea r10d,[r10+r14*1]
808 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
809 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
810 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
811 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
812 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
813 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
814 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
815 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
816 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
817 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
818 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
819 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
820 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
821 WORD $0x2144; BYTE $0xff // and edi,r15d
822 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
823 WORD $0x3144; BYTE $0xdf // xor edi,r11d
824 WORD $0x3145; BYTE $0xee // xor r14d,r13d
825 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
826 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
827
828 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x4c)
829 LONG $0x24440344; BYTE $0x4c // add r8d,[rsp+0x4c]
830 WORD $0x2141; BYTE $0xdc // and r12d,ebx
831 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
832 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
833 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
834 LONG $0x20048d47 // lea r8d,[r8+r12*1]
835 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
836 WORD $0x3141; BYTE $0xfd // xor r13d,edi
837 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
838 LONG $0x20048d47 // lea r8d,[r8+r12*1]
839 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
840 WORD $0x8944; BYTE $0xcf // mov edi,r9d
841 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
842 LONG $0x28048d47 // lea r8d,[r8+r13*1]
843 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
844 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
845 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
846 LONG $0x00048d42 // lea eax,[rax+r8*1]
847 WORD $0x2141; BYTE $0xff // and r15d,edi
848 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
849 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
850 WORD $0x3145; BYTE $0xee // xor r14d,r13d
851 LONG $0x38048d47 // lea r8d,[r8+r15*1]
852 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
853
854 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0x60)
855 LONG $0x60245403 // add edx,[rsp+0x60]
856 WORD $0x2141; BYTE $0xc4 // and r12d,eax
857 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
858 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
859 LONG $0x30048d47 // lea r8d,[r8+r14*1]
860 LONG $0x22148d42 // lea edx,[rdx+r12*1]
861 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
862 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
863 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
864 LONG $0x22148d42 // lea edx,[rdx+r12*1]
865 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
866 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
867 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
868 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
869 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
870 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
871 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
872 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
873 WORD $0x2144; BYTE $0xff // and edi,r15d
874 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
875 WORD $0x3144; BYTE $0xcf // xor edi,r9d
876 WORD $0x3145; BYTE $0xee // xor r14d,r13d
877 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
878 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
879
880 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0x64)
881 LONG $0x64244c03 // add ecx,[rsp+0x64]
882 WORD $0x2145; BYTE $0xdc // and r12d,r11d
883 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
884 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
885 LONG $0x32148d42 // lea edx,[rdx+r14*1]
886 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
887 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
888 WORD $0x3141; BYTE $0xfd // xor r13d,edi
889 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
890 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
891 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
892 WORD $0xd789 // mov edi,edx
893 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
894 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
895 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
896 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
897 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
898 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
899 WORD $0x2141; BYTE $0xff // and r15d,edi
900 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
901 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
902 WORD $0x3145; BYTE $0xee // xor r14d,r13d
903 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
904 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
905
906 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0x68)
907 LONG $0x68245c03 // add ebx,[rsp+0x68]
908 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
909 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
910 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
911 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
912 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
913 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
914 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
915 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
916 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
917 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
918 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
919 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
920 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
921 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
922 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
923 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
924 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
925 WORD $0x2144; BYTE $0xff // and edi,r15d
926 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
927 WORD $0xd731 // xor edi,edx
928 WORD $0x3145; BYTE $0xee // xor r14d,r13d
929 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
930 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
931
932 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0x6c)
933 LONG $0x6c244403 // add eax,[rsp+0x6c]
934 WORD $0x2145; BYTE $0xcc // and r12d,r9d
935 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
936 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
937 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
938 LONG $0x20048d42 // lea eax,[rax+r12*1]
939 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
940 WORD $0x3141; BYTE $0xfd // xor r13d,edi
941 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
942 LONG $0x20048d42 // lea eax,[rax+r12*1]
943 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
944 WORD $0xdf89 // mov edi,ebx
945 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
946 LONG $0x28048d42 // lea eax,[rax+r13*1]
947 WORD $0xcf31 // xor edi,ecx
948 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
949 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
950 LONG $0x00048d45 // lea r8d,[r8+rax*1]
951 WORD $0x2141; BYTE $0xff // and r15d,edi
952 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
953 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
954 WORD $0x3145; BYTE $0xee // xor r14d,r13d
955 LONG $0x38048d42 // lea eax,[rax+r15*1]
956 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
957
958 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, SP, 0x00)
959 LONG $0x241c0344 // add r11d,[rsp]
960 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
961 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
962 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
963 LONG $0x30048d42 // lea eax,[rax+r14*1]
964 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
965 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
966 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
967 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
968 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
969 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
970 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
971 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
972 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
973 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
974 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
975 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
976 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
977 WORD $0x2144; BYTE $0xff // and edi,r15d
978 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
979 WORD $0xdf31 // xor edi,ebx
980 WORD $0x3145; BYTE $0xee // xor r14d,r13d
981 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
982 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
983
984 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, SP, 0x04)
985 LONG $0x24540344; BYTE $0x04 // add r10d,[rsp+0x4]
986 WORD $0x2141; BYTE $0xd4 // and r12d,edx
987 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
988 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
989 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
990 LONG $0x22148d47 // lea r10d,[r10+r12*1]
991 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
992 WORD $0x3141; BYTE $0xfd // xor r13d,edi
993 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
994 LONG $0x22148d47 // lea r10d,[r10+r12*1]
995 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
996 WORD $0x8944; BYTE $0xdf // mov edi,r11d
997 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
998 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
999 WORD $0xc731 // xor edi,eax
1000 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
1001 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
1002 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
1003 WORD $0x2141; BYTE $0xff // and r15d,edi
1004 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1005 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
1006 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1007 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
1008 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
1009
1010 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, SP, 0x08)
1011 LONG $0x244c0344; BYTE $0x08 // add r9d,[rsp+0x8]
1012 WORD $0x2141; BYTE $0xcc // and r12d,ecx
1013 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
1014 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
1015 LONG $0x32148d47 // lea r10d,[r10+r14*1]
1016 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1017 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
1018 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1019 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
1020 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1021 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1022 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
1023 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
1024 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
1025 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
1026 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
1027 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
1028 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
1029 WORD $0x2144; BYTE $0xff // and edi,r15d
1030 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1031 WORD $0x3144; BYTE $0xdf // xor edi,r11d
1032 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1033 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
1034 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
1035
1036 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, SP, 0x0c)
1037 LONG $0x24440344; BYTE $0x0c // add r8d,[rsp+0xc]
1038 WORD $0x2141; BYTE $0xdc // and r12d,ebx
1039 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
1040 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
1041 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
1042 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1043 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
1044 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1045 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
1046 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1047 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1048 WORD $0x8944; BYTE $0xcf // mov edi,r9d
1049 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
1050 LONG $0x28048d47 // lea r8d,[r8+r13*1]
1051 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
1052 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
1053 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
1054 LONG $0x00048d42 // lea eax,[rax+r8*1]
1055 WORD $0x2141; BYTE $0xff // and r15d,edi
1056 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1057 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
1058 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1059 LONG $0x38048d47 // lea r8d,[r8+r15*1]
1060 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
1061
1062 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, SP, 0x20)
1063 LONG $0x20245403 // add edx,[rsp+0x20]
1064 WORD $0x2141; BYTE $0xc4 // and r12d,eax
1065 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
1066 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
1067 LONG $0x30048d47 // lea r8d,[r8+r14*1]
1068 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1069 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
1070 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1071 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
1072 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1073 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1074 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
1075 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
1076 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
1077 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
1078 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
1079 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
1080 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
1081 WORD $0x2144; BYTE $0xff // and edi,r15d
1082 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1083 WORD $0x3144; BYTE $0xcf // xor edi,r9d
1084 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1085 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
1086 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
1087
1088 // ROUND(DX, R8, R9, R10, R11, AX, BX, CX, R12, R13, R14, DI, R15, SP, 0x24)
1089 LONG $0x24244c03 // add ecx,[rsp+0x24]
1090 WORD $0x2145; BYTE $0xdc // and r12d,r11d
1091 LONG $0xf07b43c4; WORD $0x19eb // rorx r13d,r11d,0x19
1092 LONG $0xf07bc3c4; WORD $0x0bfb // rorx edi,r11d,0xb
1093 LONG $0x32148d42 // lea edx,[rdx+r14*1]
1094 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
1095 LONG $0xf22062c4; BYTE $0xe3 // andn r12d,r11d,ebx
1096 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1097 LONG $0xf07b43c4; WORD $0x06f3 // rorx r14d,r11d,0x6
1098 LONG $0x210c8d42 // lea ecx,[rcx+r12*1]
1099 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1100 WORD $0xd789 // mov edi,edx
1101 LONG $0xf07b63c4; WORD $0x16e2 // rorx r12d,edx,0x16
1102 LONG $0x290c8d42 // lea ecx,[rcx+r13*1]
1103 WORD $0x3144; BYTE $0xc7 // xor edi,r8d
1104 LONG $0xf07b63c4; WORD $0x0df2 // rorx r14d,edx,0xd
1105 LONG $0xf07b63c4; WORD $0x02ea // rorx r13d,edx,0x2
1106 LONG $0x0a148d45 // lea r10d,[r10+rcx*1]
1107 WORD $0x2141; BYTE $0xff // and r15d,edi
1108 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1109 WORD $0x3145; BYTE $0xc7 // xor r15d,r8d
1110 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1111 LONG $0x390c8d42 // lea ecx,[rcx+r15*1]
1112 WORD $0x8945; BYTE $0xdc // mov r12d,r11d
1113
1114 // ROUND(CX, DX, R8, R9, R10, R11, AX, BX, R12, R13, R14, R15, DI, SP, 0x28)
1115 LONG $0x28245c03 // add ebx,[rsp+0x28]
1116 WORD $0x2145; BYTE $0xd4 // and r12d,r10d
1117 LONG $0xf07b43c4; WORD $0x19ea // rorx r13d,r10d,0x19
1118 LONG $0xf07b43c4; WORD $0x0bfa // rorx r15d,r10d,0xb
1119 LONG $0x310c8d42 // lea ecx,[rcx+r14*1]
1120 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
1121 LONG $0xf22862c4; BYTE $0xe0 // andn r12d,r10d,eax
1122 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1123 LONG $0xf07b43c4; WORD $0x06f2 // rorx r14d,r10d,0x6
1124 LONG $0x231c8d42 // lea ebx,[rbx+r12*1]
1125 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1126 WORD $0x8941; BYTE $0xcf // mov r15d,ecx
1127 LONG $0xf07b63c4; WORD $0x16e1 // rorx r12d,ecx,0x16
1128 LONG $0x2b1c8d42 // lea ebx,[rbx+r13*1]
1129 WORD $0x3141; BYTE $0xd7 // xor r15d,edx
1130 LONG $0xf07b63c4; WORD $0x0df1 // rorx r14d,ecx,0xd
1131 LONG $0xf07b63c4; WORD $0x02e9 // rorx r13d,ecx,0x2
1132 LONG $0x190c8d45 // lea r9d,[r9+rbx*1]
1133 WORD $0x2144; BYTE $0xff // and edi,r15d
1134 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1135 WORD $0xd731 // xor edi,edx
1136 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1137 WORD $0x1c8d; BYTE $0x3b // lea ebx,[rbx+rdi*1]
1138 WORD $0x8945; BYTE $0xd4 // mov r12d,r10d
1139
1140 // ROUND(BX, CX, DX, R8, R9, R10, R11, AX, R12, R13, R14, DI, R15, SP, 0x2c)
1141 LONG $0x2c244403 // add eax,[rsp+0x2c]
1142 WORD $0x2145; BYTE $0xcc // and r12d,r9d
1143 LONG $0xf07b43c4; WORD $0x19e9 // rorx r13d,r9d,0x19
1144 LONG $0xf07bc3c4; WORD $0x0bf9 // rorx edi,r9d,0xb
1145 LONG $0x331c8d42 // lea ebx,[rbx+r14*1]
1146 LONG $0x20048d42 // lea eax,[rax+r12*1]
1147 LONG $0xf23042c4; BYTE $0xe3 // andn r12d,r9d,r11d
1148 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1149 LONG $0xf07b43c4; WORD $0x06f1 // rorx r14d,r9d,0x6
1150 LONG $0x20048d42 // lea eax,[rax+r12*1]
1151 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1152 WORD $0xdf89 // mov edi,ebx
1153 LONG $0xf07b63c4; WORD $0x16e3 // rorx r12d,ebx,0x16
1154 LONG $0x28048d42 // lea eax,[rax+r13*1]
1155 WORD $0xcf31 // xor edi,ecx
1156 LONG $0xf07b63c4; WORD $0x0df3 // rorx r14d,ebx,0xd
1157 LONG $0xf07b63c4; WORD $0x02eb // rorx r13d,ebx,0x2
1158 LONG $0x00048d45 // lea r8d,[r8+rax*1]
1159 WORD $0x2141; BYTE $0xff // and r15d,edi
1160 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1161 WORD $0x3141; BYTE $0xcf // xor r15d,ecx
1162 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1163 LONG $0x38048d42 // lea eax,[rax+r15*1]
1164 WORD $0x8945; BYTE $0xcc // mov r12d,r9d
1165
1166 MOVQ 0x200(SP), DI // $_ctx: pointer to the eight 32-bit state words
1167 ADDQ R14, AX // fold in the Sigma0 term deferred from the last round
1168
1169 LEAQ 0x1c0(SP), BP
1170
1171 ADDL (DI), AX // feed-forward: add the previous state words to a..h
1172 ADDL 4(DI), BX
1173 ADDL 8(DI), CX
1174 ADDL 12(DI), DX
1175 ADDL 16(DI), R8
1176 ADDL 20(DI), R9
1177 ADDL 24(DI), R10
1178 ADDL 28(DI), R11
1179
1180 MOVL AX, (DI) // write the updated state back to the context
1181 MOVL BX, 4(DI)
1182 MOVL CX, 8(DI)
1183 MOVL DX, 12(DI)
1184 MOVL R8, 16(DI)
1185 MOVL R9, 20(DI)
1186 MOVL R10, 24(DI)
1187 MOVL R11, 28(DI)
1188
1189 CMPQ SI, 0x50(BP) // $_end: compare SI (assumed input pointer) with the saved end pointer
1190 JE done
1191
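1192 XORQ R14, R14 // no deferred Sigma0 term before the first round
1193 MOVQ BX, DI
1194 XORQ CX, DI // magic: DI = b ^ c, used for Maj(a,b,c) = ((a^b) & (b^c)) ^ b
1195 MOVQ R9, R12 // R12 = f, first operand of Ch(e,f,g)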
1189 1196
1190 1197 loop2:
1191 // ROUND(AX, BX, CX, DX, R8, R9, R10, R11, R12, R13, R14, R15, DI, BP, 0x10)
1192 LONG $0x105d0344 // add r11d,[rbp+0x10]
1193 WORD $0x2145; BYTE $0xc4 // and r12d,r8d
1194 LONG $0xf07b43c4; WORD $0x19e8 // rorx r13d,r8d,0x19
1195 LONG $0xf07b43c4; WORD $0x0bf8 // rorx r15d,r8d,0xb
1196 LONG $0x30048d42 // lea eax,[rax+r14*1]
1197 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
1198 LONG $0xf23842c4; BYTE $0xe2 // andn r12d,r8d,r10d
1199 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1200 LONG $0xf07b43c4; WORD $0x06f0 // rorx r14d,r8d,0x6
1201 LONG $0x231c8d47 // lea r11d,[r11+r12*1]
1202 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1203 WORD $0x8941; BYTE $0xc7 // mov r15d,eax
1204 LONG $0xf07b63c4; WORD $0x16e0 // rorx r12d,eax,0x16
1205 LONG $0x2b1c8d47 // lea r11d,[r11+r13*1]
1206 WORD $0x3141; BYTE $0xdf // xor r15d,ebx
1207 LONG $0xf07b63c4; WORD $0x0df0 // rorx r14d,eax,0xd
1208 LONG $0xf07b63c4; WORD $0x02e8 // rorx r13d,eax,0x2
1209 LONG $0x1a148d42 // lea edx,[rdx+r11*1]
1210 WORD $0x2144; BYTE $0xff // and edi,r15d
1211 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1212 WORD $0xdf31 // xor edi,ebx
1213 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1214 LONG $0x3b1c8d45 // lea r11d,[r11+rdi*1]
1215 WORD $0x8945; BYTE $0xc4 // mov r12d,r8d
1216
1217 // ROUND(R11, AX, BX, CX, DX, R8, R9, R10, R12, R13, R14, DI, R15, BP, 0x14)
1218 LONG $0x14550344 // add r10d,[rbp+0x14]
1219 WORD $0x2141; BYTE $0xd4 // and r12d,edx
1220 LONG $0xf07b63c4; WORD $0x19ea // rorx r13d,edx,0x19
1221 LONG $0xf07be3c4; WORD $0x0bfa // rorx edi,edx,0xb
1222 LONG $0x331c8d47 // lea r11d,[r11+r14*1]
1223 LONG $0x22148d47 // lea r10d,[r10+r12*1]
1224 LONG $0xf26842c4; BYTE $0xe1 // andn r12d,edx,r9d
1225 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1226 LONG $0xf07b63c4; WORD $0x06f2 // rorx r14d,edx,0x6
1227 LONG $0x22148d47 // lea r10d,[r10+r12*1]
1228 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1229 WORD $0x8944; BYTE $0xdf // mov edi,r11d
1230 LONG $0xf07b43c4; WORD $0x16e3 // rorx r12d,r11d,0x16
1231 LONG $0x2a148d47 // lea r10d,[r10+r13*1]
1232 WORD $0xc731 // xor edi,eax
1233 LONG $0xf07b43c4; WORD $0x0df3 // rorx r14d,r11d,0xd
1234 LONG $0xf07b43c4; WORD $0x02eb // rorx r13d,r11d,0x2
1235 LONG $0x110c8d42 // lea ecx,[rcx+r10*1]
1236 WORD $0x2141; BYTE $0xff // and r15d,edi
1237 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1238 WORD $0x3141; BYTE $0xc7 // xor r15d,eax
1239 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1240 LONG $0x3a148d47 // lea r10d,[r10+r15*1]
1241 WORD $0x8941; BYTE $0xd4 // mov r12d,edx
1242
1243 // ROUND(R10, R11, AX, BX, CX, DX, R8, R9, R12, R13, R14, R15, DI, BP, 0x18)
1244 LONG $0x184d0344 // add r9d,[rbp+0x18]
1245 WORD $0x2141; BYTE $0xcc // and r12d,ecx
1246 LONG $0xf07b63c4; WORD $0x19e9 // rorx r13d,ecx,0x19
1247 LONG $0xf07b63c4; WORD $0x0bf9 // rorx r15d,ecx,0xb
1248 LONG $0x32148d47 // lea r10d,[r10+r14*1]
1249 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1250 LONG $0xf27042c4; BYTE $0xe0 // andn r12d,ecx,r8d
1251 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1252 LONG $0xf07b63c4; WORD $0x06f1 // rorx r14d,ecx,0x6
1253 LONG $0x210c8d47 // lea r9d,[r9+r12*1]
1254 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1255 WORD $0x8945; BYTE $0xd7 // mov r15d,r10d
1256 LONG $0xf07b43c4; WORD $0x16e2 // rorx r12d,r10d,0x16
1257 LONG $0x290c8d47 // lea r9d,[r9+r13*1]
1258 WORD $0x3145; BYTE $0xdf // xor r15d,r11d
1259 LONG $0xf07b43c4; WORD $0x0df2 // rorx r14d,r10d,0xd
1260 LONG $0xf07b43c4; WORD $0x02ea // rorx r13d,r10d,0x2
1261 LONG $0x0b1c8d42 // lea ebx,[rbx+r9*1]
1262 WORD $0x2144; BYTE $0xff // and edi,r15d
1263 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1264 WORD $0x3144; BYTE $0xdf // xor edi,r11d
1265 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1266 LONG $0x390c8d45 // lea r9d,[r9+rdi*1]
1267 WORD $0x8941; BYTE $0xcc // mov r12d,ecx
1268
1269 // ROUND(R9, R10, R11, AX, BX, CX, DX, R8, R12, R13, R14, DI, R15, BP, 0x1c)
1270 LONG $0x1c450344 // add r8d,[rbp+0x1c]
1271 WORD $0x2141; BYTE $0xdc // and r12d,ebx
1272 LONG $0xf07b63c4; WORD $0x19eb // rorx r13d,ebx,0x19
1273 LONG $0xf07be3c4; WORD $0x0bfb // rorx edi,ebx,0xb
1274 LONG $0x310c8d47 // lea r9d,[r9+r14*1]
1275 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1276 LONG $0xf26062c4; BYTE $0xe2 // andn r12d,ebx,edx
1277 WORD $0x3141; BYTE $0xfd // xor r13d,edi
1278 LONG $0xf07b63c4; WORD $0x06f3 // rorx r14d,ebx,0x6
1279 LONG $0x20048d47 // lea r8d,[r8+r12*1]
1280 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1281 WORD $0x8944; BYTE $0xcf // mov edi,r9d
1282 LONG $0xf07b43c4; WORD $0x16e1 // rorx r12d,r9d,0x16
1283 LONG $0x28048d47 // lea r8d,[r8+r13*1]
1284 WORD $0x3144; BYTE $0xd7 // xor edi,r10d
1285 LONG $0xf07b43c4; WORD $0x0df1 // rorx r14d,r9d,0xd
1286 LONG $0xf07b43c4; WORD $0x02e9 // rorx r13d,r9d,0x2
1287 LONG $0x00048d42 // lea eax,[rax+r8*1]
1288 WORD $0x2141; BYTE $0xff // and r15d,edi
1289 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1290 WORD $0x3145; BYTE $0xd7 // xor r15d,r10d
1291 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1292 LONG $0x38048d47 // lea r8d,[r8+r15*1]
1293 WORD $0x8941; BYTE $0xdc // mov r12d,ebx
1294
1295 // ROUND(R8, R9, R10, R11, AX, BX, CX, DX, R12, R13, R14, R15, DI, BP, 0x30)
1296 WORD $0x5503; BYTE $0x30 // add edx,[rbp+0x30]
1297 WORD $0x2141; BYTE $0xc4 // and r12d,eax
1298 LONG $0xf07b63c4; WORD $0x19e8 // rorx r13d,eax,0x19
1299 LONG $0xf07b63c4; WORD $0x0bf8 // rorx r15d,eax,0xb
1300 LONG $0x30048d47 // lea r8d,[r8+r14*1]
1301 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1302 LONG $0xf27862c4; BYTE $0xe1 // andn r12d,eax,ecx
1303 WORD $0x3145; BYTE $0xfd // xor r13d,r15d
1304 LONG $0xf07b63c4; WORD $0x06f0 // rorx r14d,eax,0x6
1305 LONG $0x22148d42 // lea edx,[rdx+r12*1]
1306 WORD $0x3145; BYTE $0xf5 // xor r13d,r14d
1307 WORD $0x8945; BYTE $0xc7 // mov r15d,r8d
1308 LONG $0xf07b43c4; WORD $0x16e0 // rorx r12d,r8d,0x16
1309 LONG $0x2a148d42 // lea edx,[rdx+r13*1]
1310 WORD $0x3145; BYTE $0xcf // xor r15d,r9d
1311 LONG $0xf07b43c4; WORD $0x0df0 // rorx r14d,r8d,0xd
1312 LONG $0xf07b43c4; WORD $0x02e8 // rorx r13d,r8d,0x2
1313 LONG $0x131c8d45 // lea r11d,[r11+rdx*1]
1314 WORD $0x2144; BYTE $0xff // and edi,r15d
1315 WORD $0x3145; BYTE $0xe6 // xor r14d,r12d
1316 WORD $0x3144; BYTE $0xcf // xor edi,r9d
1317 WORD $0x3145; BYTE $0xee // xor r14d,r13d
1318 WORD $0x148d; BYTE $0x3a // lea edx,[rdx+rdi*1]
1319 WORD $0x8941; BYTE $0xc4 // mov r12d,eax
1320