openssl / 053fa39
Conversion to UTF-8 where needed

This leaves behind files with names ending with '.iso-8859-1'. These should be safe to remove. If something went wrong when re-encoding, there will be some files with names ending with '.utf8' left behind.

Reviewed-by: Rich Salz <rsalz@openssl.org>

Richard Levitte 8 years ago
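The commit re-encodes ISO-8859-1 (Latin-1) files as UTF-8. Below is a minimal sketch of the byte-level mapping, assuming the input really is Latin-1. The function name `latin1_to_utf8` is hypothetical; it is not part of OpenSSL or of whatever tool was used for this commit. Every Latin-1 byte equals the Unicode code point of the same value, so ASCII bytes pass through unchanged and bytes 0x80 to 0xFF become two-byte UTF-8 sequences. For example, the "ä" in "Käsper" (0xE4) becomes 0xC3 0xA4.

```c
#include <stddef.h>
#include <string.h>

/* Convert an ISO-8859-1 byte string to UTF-8.
 * Bytes below 0x80 are copied as-is; bytes 0x80..0xFF become
 * 110000xx 10xxxxxx (two bytes).
 * `out` must have room for 2 * in_len + 1 bytes.
 * Returns the output length, excluding the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    out[o] = '\0';
    return o;
}
```

Running this over a file that is already UTF-8 would double-encode it (0xC3 0xA4 would grow to four bytes), which is one reason a conversion like this is applied only to the files that need it.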
28 changed file(s) with 176 addition(s) and 176 deletion(s).
99 *) Remove SSL_OP_TLS_BLOCK_PADDING_BUG. This is SSLeay legacy, we're
1010 not aware of clients that still exhibit this bug, and the workaround
1111 hasn't been working properly for a while.
12 [Emilia Käsper]
1313
1414 *) The return type of BIO_number_read() and BIO_number_written() as well as
1515 the corresponding num_read and num_write members in the BIO structure has
400400 This parameter will be set to 1 or 0 depending on the ciphersuite selected
401401 by the SSL/TLS server library, indicating whether it can provide forward
402402 security.
403 [Emilia Käsper <emilia.kasper@esat.kuleuven.be> (Google)]
404404
405405 *) New -verify_name option in command line utilities to set verification
406406 parameters by name.
487487 callbacks.
488488
489489 This issue was reported to OpenSSL by Robert Swiecki (Google), and
490 independently by Hanno Böck.
491491 (CVE-2015-1789)
492 [Emilia Käsper]
493493
494494 *) PKCS7 crash with missing EnvelopedContent
495495
503503
504504 This issue was reported to OpenSSL by Michal Zalewski (Google).
505505 (CVE-2015-1790)
506 [Emilia Käsper]
507507
508508 *) CMS verify infinite loop with unknown hash function
509509
622622
623623 This issue was reported to OpenSSL by Michal Zalewski (Google).
624624 (CVE-2015-0289)
625 [Emilia Käsper]
626626
627627 *) DoS via reachable assert in SSLv2 servers fix
628628
630630 servers that both support SSLv2 and enable export cipher suites by sending
631631 a specially crafted SSLv2 CLIENT-MASTER-KEY message.
632632
633 This issue was discovered by Sean Burford (Google) and Emilia Käsper
634634 (OpenSSL development team).
635635 (CVE-2015-0293)
636 [Emilia Käsper]
637637
638638 *) Empty CKE with client auth and DHE fix
639639
11381138 version does not match the session's version. Resuming with a different
11391139 version, while not strictly forbidden by the RFC, is of questionable
11401140 sanity and breaks all known clients.
1141 [David Benjamin, Emilia Käsper]
11421142
11431143 *) Tighten handling of the ChangeCipherSpec (CCS) message: reject
11441144 early CCS messages during renegotiation. (Note that because
11451145 renegotiation is encrypted, this early CCS was not exploitable.)
1146 [Emilia Käsper]
11471147
11481148 *) Tighten client-side session ticket handling during renegotiation:
11491149 ensure that the client only accepts a session ticket if the server sends
11541154 Similarly, ensure that the client requires a session ticket if one
11551155 was advertised in the ServerHello. Previously, a TLS client would
11561156 ignore a missing NewSessionTicket message.
1157 [Emilia Käsper]
11581158
11591159 Changes between 1.0.1i and 1.0.1j [15 Oct 2014]
11601160
12341234 with a null pointer dereference (read) by specifying an anonymous (EC)DH
12351235 ciphersuite and sending carefully crafted handshake messages.
12361236
1237 Thanks to Felix Gröbert (Google) for discovering and researching this
12381238 issue.
12391239 (CVE-2014-3510)
1240 [Emilia Käsper]
12411241
12421242 *) By sending carefully crafted DTLS packets an attacker could cause openssl
12431243 to leak memory. This can be exploited through a Denial of Service attack.
12741274 properly negotiated with the client. This can be exploited through a
12751275 Denial of Service attack.
12761276
1277 Thanks to Joonas Kuorilehto and Riku Hietamäki (Codenomicon) for
12781278 discovering and researching this issue.
12791279 (CVE-2014-5139)
12801280 [Steve Henson]
12861286
12871287 Thanks to Ivan Fratric (Google) for discovering this issue.
12881288 (CVE-2014-3508)
1289 [Emilia Käsper, and Steve Henson]
12901290
12911291 *) Fix ec_GFp_simple_points_make_affine (thus, EC_POINTs_mul etc.)
12921292 for corner cases. (Certain input points at infinity could lead to
13161316 client or server. This is potentially exploitable to run arbitrary
13171317 code on a vulnerable client or server.
13181318
1319 Thanks to Jüri Aedla for reporting this issue. (CVE-2014-0195)
1320 [Jüri Aedla, Steve Henson]
13211321
13221322 *) Fix bug in TLS code where clients enable anonymous ECDH ciphersuites
13231323 are subject to a denial of service attack.
13241324
1325 Thanks to Felix Gröbert and Ivan Fratric at Google for discovering
13261326 this issue. (CVE-2014-3470)
1327 [Felix Gröbert, Ivan Fratric, Steve Henson]
13281328
13291329 *) Harmonize version and its documentation. -f flag is used to display
13301330 compilation flags.
14031403 Thanks go to Nadhem Alfardan and Kenny Paterson of the Information
14041404 Security Group at Royal Holloway, University of London
14051405 (www.isg.rhul.ac.uk) for discovering this flaw and Adam Langley and
1406 Emilia Käsper for the initial patch.
14071407 (CVE-2013-0169)
1408 [Emilia Käsper, Adam Langley, Ben Laurie, Andy Polyakov, Steve Henson]
14091409
14101410 *) Fix flaw in AESNI handling of TLS 1.2 and 1.1 records for CBC mode
14111411 ciphersuites which can be exploited in a denial of service attack.
15801580 EC_GROUP_new_by_curve_name() will automatically use these (while
15811581 EC_GROUP_new_curve_GFp() currently prefers the more flexible
15821582 implementations).
1583 [Emilia Käsper, Adam Langley, Bodo Moeller (Google)]
15841584
15851585 *) Use type ossl_ssize_t instead of ssize_t which isn't available on
15861586 all platforms. Move ssize_t definition from e_os.h to the public
18561856 [Adam Langley (Google)]
18571857
18581858 *) Fix spurious failures in ecdsatest.c.
1859 [Emilia Käsper (Google)]
18601860
18611861 *) Fix the BIO_f_buffer() implementation (which was mixing different
18621862 interpretations of the '..._len' fields).
18701870 lock to call BN_BLINDING_invert_ex, and avoids one use of
18711871 BN_BLINDING_update for each BN_BLINDING structure (previously,
18721872 the last update always remained unused).
1873 [Emilia Käsper (Google)]
18741874
18751875 *) In ssl3_clear, preserve s3->init_extra along with s3->rbuf.
18761876 [Bob Buckholz (Google)]
26792679
26802680 *) Add RFC 3161 compliant time stamp request creation, response generation
26812681 and response verification functionality.
2682 [Zoltán Glózik <zglozik@opentsa.org>, The OpenTSA Project]
26832683
26842684 *) Add initial support for TLS extensions, specifically for the server_name
26852685 extension so far. The SSL_SESSION, SSL_CTX, and SSL data structures now
38473847
38483848 *) BN_CTX_get() should return zero-valued bignums, providing the same
38493849 initialised value as BN_new().
3850 [Geoff Thorpe, suggested by Ulf Möller]
38513851
38523852 *) Support for inhibitAnyPolicy certificate extension.
38533853 [Steve Henson]
38663866 some point, these tighter rules will become openssl's default to improve
38673867 maintainability, though the assert()s and other overheads will remain only
38683868 in debugging configurations. See bn.h for more details.
3869 [Geoff Thorpe, Nils Larsch, Ulf Möller]
38703870
38713871 *) BN_CTX_init() has been deprecated, as BN_CTX is an opaque structure
38723872 that can only be obtained through BN_CTX_new() (which implicitly
39333933 [Douglas Stebila (Sun Microsystems Laboratories)]
39343934
39353935 *) Add the possibility to load symbols globally with DSO.
3936 [Götz Babin-Ebell <babin-ebell@trustcenter.de> via Richard Levitte]
39373937
39383938 *) Add the functions ERR_set_mark() and ERR_pop_to_mark() for better
39393939 control of the error stack.
46484648 [Steve Henson]
46494649
46504650 *) Undo Cygwin change.
4651 [Ulf Möller]
46524652
46534653 *) Added support for proxy certificates according to RFC 3820.
46544654 Because they may be a security threat to unaware applications,
46814681 [Stephen Henson, reported by UK NISCC]
46824682
46834683 *) Use Windows randomness collection on Cygwin.
4684 [Ulf Möller]
46854685
46864686 *) Fix hang in EGD/PRNGD query when communication socket is closed
46874687 prematurely by EGD/PRNGD.
4688 [Darren Tucker <dtucker@zip.com.au> via Lutz Jänicke, resolves #1014]
46894689
46904690 *) Prompt for pass phrases when appropriate for PKCS12 input format.
46914691 [Steve Henson]
51475147 pointers passed to them whenever necessary. Otherwise it is possible
51485148 the caller may have overwritten (or deallocated) the original string
51495149 data when a later ENGINE operation tries to use the stored values.
5150 [Götz Babin-Ebell <babinebell@trustcenter.de>]
51515151
51525152 *) Improve diagnostics in file reading and command-line digests.
51535153 [Ben Laurie aided and abetted by Solar Designer <solar@openwall.com>]
72527252 [Bodo Moeller]
72537253
72547254 *) BN_sqr() bug fix.
7255 [Ulf Möller, reported by Jim Ellis <jim.ellis@cavium.com>]
72567256
72577257 *) Rabin-Miller test analyses assume uniformly distributed witnesses,
72587258 so use BN_pseudo_rand_range() instead of using BN_pseudo_rand()
74127412 [Bodo Moeller]
74137413
74147414 *) Fix OAEP check.
7415 [Ulf Möller, Bodo Möller]
74167416
74177417 *) The countermeasure against Bleichenbacher's attack on PKCS #1 v1.5
74187418 RSA encryption was accidentally removed in s3_srvr.c in OpenSSL 0.9.5
76747674 [Bodo Moeller]
76757675
76767676 *) Use better test patterns in bntest.
7677 [Ulf Möller]
76787678
76797679 *) rand_win.c fix for Borland C.
7680 [Ulf Möller]
76817681
76827682 *) BN_rshift bugfix for n == 0.
76837683 [Bodo Moeller]
78227822
78237823 *) New BIO_shutdown_wr macro, which invokes the BIO_C_SHUTDOWN_WR
78247824 BIO_ctrl (for BIO pairs).
7825 [Bodo Möller]
78267826
78277827 *) Add DSO method for VMS.
78287828 [Richard Levitte]
78297829
78307830 *) Bug fix: Montgomery multiplication could produce results with the
78317831 wrong sign.
7832 [Ulf Möller]
78337833
78347834 *) Add RPM specification openssl.spec and modify it to build three
78357835 packages. The default package contains applications, application
78477847
78487848 *) Don't set the two most significant bits to one when generating a
78497849 random number < q in the DSA library.
7850 [Ulf Möller]
78517851
78527852 *) New SSL API mode 'SSL_MODE_AUTO_RETRY'. This disables the default
78537853 behaviour that SSL_read may result in SSL_ERROR_WANT_READ (even if
81138113 *) Randomness polling function for Win9x, as described in:
81148114 Peter Gutmann, Software Generation of Practically Strong
81158115 Random Numbers.
8116 [Ulf Möller]
81178117
81188118 *) Fix so PRNG is seeded in req if using an already existing
81198119 DSA key.
83338333 [Steve Henson]
83348334
83358335 *) Eliminate non-ANSI declarations in crypto.h and stack.h.
8336 [Ulf Möller]
83378337
83388338 *) Fix for SSL server purpose checking. Server checking was
83398339 rejecting certificates which had extended key usage present
83658365 [Bodo Moeller]
83668366
83678367 *) Bugfix for linux-elf makefile.one.
8368 [Ulf Möller]
83698369
83708370 *) RSA_get_default_method() will now cause a default
83718371 RSA_METHOD to be chosen if one doesn't exist already.
84548454 [Steve Henson]
84558455
84568456 *) des_quad_cksum() byte order bug fix.
8457 [Ulf Möller, using the problem description in krb4-0.9.7, where
84588458 the solution is attributed to Derrick J Brashear <shadow@DEMENTIA.ORG>]
84598459
84608460 *) Fix so V_ASN1_APP_CHOOSE works again: however its use is strongly
85558555 [Rolf Haberrecker <rolf@suse.de>]
85568556
85578557 *) Assembler module support for Mingw32.
8558 [Ulf Möller]
85598559
85608560 *) Shared library support for HPUX (in shlib/).
85618561 [Lutz Jaenicke <Lutz.Jaenicke@aet.TU-Cottbus.DE> and Anonymous]
85748574
85758575 *) BN_mul bugfix: In bn_mul_part_recursion() only the a>a[n] && b>b[n]
85768576 case was implemented. This caused BN_div_recp() to fail occasionally.
8577 [Ulf Möller]
85788578
85798579 *) Add an optional second argument to the set_label() in the perl
85808580 assembly language builder. If this argument exists and is set
86048604 [Steve Henson]
86058605
86068606 *) Fix potential buffer overrun problem in BIO_printf().
8607 [Ulf Möller, using public domain code by Patrick Powell; problem
86088608 pointed out by David Sacerdote <das33@cornell.edu>]
86098609
86108610 *) Support EGD <http://www.lothar.com/tech/crypto/>. New functions
86118611 RAND_egd() and RAND_status(). In the command line application,
86128612 the EGD socket can be specified like a seed file using RANDFILE
86138613 or -rand.
8614 [Ulf Möller]
86158615
86168616 *) Allow the string CERTIFICATE to be tolerated in PKCS#7 structures.
86178617 Some CAs (e.g. Verisign) distribute certificates in this form.
86448644 #define OPENSSL_ALGORITHM_DEFINES
86458645 #include <openssl/opensslconf.h>
86468646 defines all pertinent NO_<algo> symbols, such as NO_IDEA, NO_RSA, etc.
8647 [Richard Levitte, Ulf and Bodo Möller]
86488648
86498649 *) Bugfix: Tolerate fragmentation and interleaving in the SSL 3/TLS
86508650 record layer.
86958695
86968696 *) Bug fix for BN_div_recp() for numerators with an even number of
86978697 bits.
8698 [Ulf Möller]
86998699
87008700 *) More tests in bntest.c, and changed test_bn output.
8701 [Ulf Möller]
87028702
87038703 *) ./config recognizes MacOS X now.
87048704 [Andy Polyakov]
87058705
87068706 *) Bug fix for BN_div() when the first words of num and divisor are
87078707 equal (it gave wrong results if (rem=(n1-q*d0)&BN_MASK2) < d0).
8708 [Ulf Möller]
87098709
87108710 *) Add support for various broken PKCS#8 formats, and command line
87118711 options to produce them.
87138713
87148714 *) New functions BN_CTX_start(), BN_CTX_get() and BN_CTX_end() to
87158715 get temporary BIGNUMs from a BN_CTX.
8716 [Ulf Möller]
87178717
87188718 *) Correct return values in BN_mod_exp_mont() and BN_mod_exp2_mont()
87198719 for p == 0.
8720 [Ulf Möller]
87218721
87228722 *) Change the SSLeay_add_all_*() functions to OpenSSL_add_all_*() and
87238723 include a #define from the old name to the new. The original intent
87418741
87428742 *) Source code cleanups: use const where appropriate, eliminate casts,
87438743 use void * instead of char * in lhash.
8744 [Ulf Möller]
87458745
87468746 *) Bugfix: ssl3_send_server_key_exchange was not restartable
87478747 (the state was not changed to SSL3_ST_SW_KEY_EXCH_B, and because of
87868786 [Steve Henson]
87878787
87888788 *) New function BN_pseudo_rand().
8789 [Ulf Möller]
87908790
87918791 *) Clean up BN_mod_mul_montgomery(): replace the broken (and unreadable)
87928792 bignum version of BN_from_montgomery() with the working code from
87938793 SSLeay 0.9.0 (the word based version is faster anyway), and clean up
87948794 the comments.
8795 [Ulf Möller]
87968796
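The BN_mod_mul_montgomery() entry above concerns Montgomery reduction (REDC). A minimal single-word sketch follows, assuming an odd modulus n below 2^31 and R = 2^32. `mont_ctx`, `redc` and the other names are hypothetical and far simpler than OpenSSL's multi-word BN_MONT_CTX code. REDC adds a multiple of n that makes the low 32 bits zero, shifts right by 32, and finishes with one conditional subtraction so the result stays in [0, n).

```c
#include <stdint.h>

/* Hypothetical single-word Montgomery context (odd n < 2^31, R = 2^32). */
typedef struct {
    uint32_t n;    /* modulus */
    uint32_t ninv; /* -n^{-1} mod 2^32 */
    uint32_t r2;   /* R^2 mod n, used to enter Montgomery form */
} mont_ctx;

/* -n^{-1} mod 2^32 via Newton iteration; each step doubles the
 * number of correct low bits (1 -> 2 -> 4 -> 8 -> 16 -> 32). */
static uint32_t neg_inverse(uint32_t n)
{
    uint32_t x = 1;

    for (int i = 0; i < 5; i++)
        x *= 2 - n * x;
    return (uint32_t)0 - x;
}

/* REDC: returns t * R^{-1} mod n, for t < n * R. */
static uint32_t redc(const mont_ctx *c, uint64_t t)
{
    uint32_t m = (uint32_t)t * c->ninv;            /* makes t + m*n divisible by R */
    uint64_t u = (t + (uint64_t)m * c->n) >> 32;   /* u < 2n, no overflow for n < 2^31 */
    return u >= c->n ? (uint32_t)(u - c->n) : (uint32_t)u;
}

void mont_init(mont_ctx *c, uint32_t n)
{
    uint64_t r = ((uint64_t)1 << 32) % n;          /* R mod n */

    c->n = n;
    c->ninv = neg_inverse(n);
    c->r2 = (uint32_t)(r * r % n);
}

uint32_t to_mont(const mont_ctx *c, uint32_t a)
{
    return redc(c, (uint64_t)(a % c->n) * c->r2);  /* a * R mod n */
}

uint32_t from_mont(const mont_ctx *c, uint32_t a)
{
    return redc(c, a);                             /* a * R^{-1} mod n */
}

/* (aR)(bR)R^{-1} = abR mod n: the product stays in Montgomery form. */
uint32_t mont_mul(const mont_ctx *c, uint32_t a, uint32_t b)
{
    return redc(c, (uint64_t)a * b);
}
```

Converting in and out costs one REDC each, so the technique pays off when many multiplications share one modulus, as in modular exponentiation.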
87978797 *) Avoid a race condition in s2_clnt.c (function get_server_hello) that
87988798 made it impossible to use the same SSL_SESSION data structure in
88028802 *) The return value of RAND_load_file() no longer counts bytes obtained
88038803 by stat(). RAND_load_file(..., -1) is new and uses the complete file
88048804 to seed the PRNG (previously an explicit byte count was required).
8805 [Ulf Möller, Bodo Möller]
88068806
88078807 *) Clean up CRYPTO_EX_DATA functions, some of these didn't have prototypes
88088808 used (char *) instead of (void *) and had casts all over the place.
88098809 [Steve Henson]
88108810
88118811 *) Make BN_generate_prime() return NULL on error if ret!=NULL.
8812 [Ulf Möller]
88138813
88148814 *) Retain source code compatibility for BN_prime_checks macro:
88158815 BN_is_prime(..., BN_prime_checks, ...) now uses
88168816 BN_prime_checks_for_size to determine the appropriate number of
88178817 Rabin-Miller iterations.
8818 [Ulf Möller]
88198819
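Several entries in this log concern the Rabin-Miller test: witnesses should be uniformly distributed (hence BN_pseudo_rand_range()), and the iteration count is chosen by BN_prime_checks_for_size. The following is a 64-bit sketch of that idea, not OpenSSL's BN_is_prime(). It assumes a compiler with `unsigned __int128` (GCC or Clang), uses `rand()` only as a stand-in random source, and all function names are hypothetical. Each witness is drawn by rejection sampling so that every value in [2, n-2] is equally likely.

```c
#include <stdint.h>
#include <stdlib.h>

/* (a * b) mod m without overflow (GCC/Clang 128-bit extension). */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

/* Right-to-left square-and-multiply: base^exp mod m. */
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = mulmod(result, base, m);
        base = mulmod(base, base, m);
        exp >>= 1;
    }
    return result;
}

/* Uniform value in [lo, hi]. Draws below 2^64 mod span are rejected,
 * so the remaining range is an exact multiple of span (no modulo bias).
 * rand() is a placeholder for a real RNG. */
static uint64_t uniform_range(uint64_t lo, uint64_t hi)
{
    uint64_t span = hi - lo + 1;
    uint64_t threshold = (0 - span) % span;   /* 2^64 mod span */
    uint64_t r;

    do {
        r = 0;
        for (int i = 0; i < 5; i++)            /* 5 x 15 bits >= 64 bits */
            r = (r << 15) ^ (uint64_t)(rand() & 0x7FFF);
    } while (r < threshold);
    return lo + r % span;
}

/* Returns 1 if n is probably prime, 0 if n is certainly composite. */
int is_probable_prime(uint64_t n, int rounds)
{
    if (n < 4)
        return n == 2 || n == 3;
    if (n % 2 == 0)
        return 0;

    uint64_t d = n - 1;                       /* n - 1 = d * 2^s, d odd */
    int s = 0;
    while ((d & 1) == 0) {
        d >>= 1;
        s++;
    }

    for (int i = 0; i < rounds; i++) {
        uint64_t x = powmod(uniform_range(2, n - 2), d, n);

        if (x == 1 || x == n - 1)
            continue;
        int composite = 1;
        for (int r = 1; r < s; r++) {
            x = mulmod(x, x, n);
            if (x == n - 1) {
                composite = 0;
                break;
            }
        }
        if (composite)
            return 0;                         /* witness found */
    }
    return 1;
}
```

A prime always passes. Each round lets a composite through with probability at most 1/4, so `rounds` plays the part that BN_prime_checks_for_size plays for BIGNUMs.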
88208820 *) Diffie-Hellman uses "safe" primes: DH_check() return code renamed to
88218821 DH_CHECK_P_NOT_SAFE_PRIME.
88228822 (Check if this is true? OpenPGP calls them "strong".)
8823 [Ulf Möller]
88248824
88258825 *) Merge the functionality of "dh" and "gendh" programs into a new program
88268826 "dhparam". The old programs are retained for now but will handle DH keys
88768876 *) Add missing #ifndefs that caused missing symbols when building libssl
88778877 as a shared library without RSA. Use #ifndef NO_SSL2 instead of
88788878 NO_RSA in ssl/s2*.c.
8879 [Kris Kennaway <kris@hub.freebsd.org>, modified by Ulf Möller]
88808880
88818881 *) Precautions against using the PRNG uninitialized: RAND_bytes() now
88828882 has a return value which indicates the quality of the random data
88858885 guaranteed to be unique but not unpredictable. RAND_add is like
88868886 RAND_seed, but takes an extra argument for an entropy estimate
88878887 (RAND_seed always assumes full entropy).
8888 [Ulf Möller]
88898889
88908890 *) Do more iterations of Rabin-Miller probable prime test (specifically,
88918891 3 for 1024-bit primes, 6 for 512-bit primes, 12 for 256-bit primes
89158915 [Steve Henson]
89168916
89178917 *) Honor the no-xxx Configure options when creating .DEF files.
8918 [Ulf Möller]
89198919
89208920 *) Add PKCS#10 attributes to field table: challengePassword,
89218921 unstructuredName and unstructuredAddress. These are taken from
97499749
97509750 *) More DES library cleanups: remove references to srand/rand and
97519751 delete an unused file.
9752 [Ulf Möller]
97539753
97549754 *) Add support for the free Netwide assembler (NASM) under Win32,
97559755 since not many people have MASM (ml) and it can be hard to obtain.
98389838 worked.
98399839
98409840 *) Fix problems with no-hmac etc.
9841 [Ulf Möller, pointed out by Brian Wellington <bwelling@tislabs.com>]
98429842
98439843 *) New functions RSA_get_default_method(), RSA_set_method() and
98449844 RSA_get_method(). These allow replacement of RSA_METHODs without having
99559955 [Ben Laurie]
99569956
99579957 *) DES library cleanups.
9958 [Ulf Möller]
99599959
99609960 *) Add support for PKCS#5 v2.0 PBE algorithms. This will permit PKCS#8 to be
99619961 used with any cipher unlike PKCS#5 v1.5 which can at most handle 64 bit
99989998 [Christian Forster <fo@hawo.stw.uni-erlangen.de>]
99999999
1000010000 *) config now generates no-xxx options for missing ciphers.
10001 [Ulf Möller]
1000210002
1000310003 *) Support the EBCDIC character set (work in progress).
1000410004 File ebcdic.c not yet included because it has a different license.
1011110111 [Bodo Moeller]
1011210112
1011310113 *) Move openssl.cnf out of lib/.
10114 [Ulf Möller]
1011510115
1011610116 *) Fix various things to let OpenSSL even pass ``egcc -pipe -O2 -Wall
1011710117 -Wshadow -Wpointer-arith -Wcast-align -Wmissing-prototypes
1016810168 [Ben Laurie]
1016910169
1017010170 *) Support Borland C++ builder.
10171 [Janez Jere <jj@void.si>, modified by Ulf Möller]
1017210172
1017310173 *) Support Mingw32.
10174 [Ulf Möller]
1017510175
1017610176 *) SHA-1 cleanups and performance enhancements.
1017710177 [Andy Polyakov <appro@fy.chalmers.se>]
1018010180 [Andy Polyakov <appro@fy.chalmers.se>]
1018110181
1018210182 *) Accept any -xxx and +xxx compiler options in Configure.
10183 [Ulf Möller]
1018410184
1018510185 *) Update HPUX configuration.
1018610186 [Anonymous]
1021310213 [Bodo Moeller]
1021410214
1021510215 *) OAEP decoding bug fix.
10216 [Ulf Möller]
1021710217
1021810218 *) Support INSTALL_PREFIX for package builders, as proposed by
1021910219 David Harris.
1023610236 [Niels Poppe <niels@netbox.org>]
1023710237
1023810238 *) New Configure option no-<cipher> (rsa, idea, rc5, ...).
10239 [Ulf Möller]
1024010240
1024110241 *) Add the PKCS#12 API documentation to openssl.txt. Preliminary support for
1024210242 extension adding in x509 utility.
1024310243 [Steve Henson]
1024410244
1024510245 *) Remove NOPROTO sections and error code comments.
10246 [Ulf Möller]
1024710247
1024810248 *) Partial rewrite of the DEF file generator to now parse the ANSI
1024910249 prototypes.
1025010250 [Steve Henson]
1025110251
1025210252 *) New Configure options --prefix=DIR and --openssldir=DIR.
10253 [Ulf Möller]
1025410254
1025510255 *) Complete rewrite of the error code script(s). It is all now handled
1025610256 by one script at the top level which handles error code gathering,
1027910279 [Steve Henson]
1028010280
1028110281 *) Move the autogenerated header file parts to crypto/opensslconf.h.
10282 [Ulf Möller]
1028310283
1028410284 *) Fix new 56-bit DES export ciphersuites: they were using 7 bytes instead of
1028510285 8 of keying material. Merlin has also confirmed interop with this fix
1029710297 [Andy Polyakov <appro@fy.chalmers.se>]
1029810298
1029910299 *) Change functions to ANSI C.
10300 [Ulf Möller]
1030110301
1030210302 *) Fix typos in error codes.
10303 [Martin Kraemer <Martin.Kraemer@MchP.Siemens.De>, Ulf Möller]
1030410304
1030510305 *) Remove defunct assembler files from Configure.
10306 [Ulf Möller]
1030710307
1030810308 *) SPARC v8 assembler BIGNUM implementation.
1030910309 [Andy Polyakov <appro@fy.chalmers.se>]
1034010340 [Steve Henson]
1034110341
1034210342 *) New Configure option "rsaref".
10343 [Ulf Möller]
1034410344
1034510345 *) Don't auto-generate pem.h.
1034610346 [Bodo Moeller]
1038810388
1038910389 *) New functions DSA_do_sign and DSA_do_verify to provide access to
1039010390 the raw DSA values prior to ASN.1 encoding.
10391 [Ulf Möller]
1039210392
1039310393 *) Tweaks to Configure
1039410394 [Niels Poppe <niels@netbox.org>]
1039810398 [Steve Henson]
1039910399
1040010400 *) New variables $(RANLIB) and $(PERL) in the Makefiles.
10401 [Ulf Möller]
1040210402
1040310403 *) New config option to avoid instructions that are illegal on the 80386.
1040410404 The default code is faster, but requires at least a 486.
10405 [Ulf Möller]
1040610406
1040710407 *) Got rid of old SSL2_CLIENT_VERSION (inconsistently used) and
1040810408 SSL2_SERVER_VERSION (not used at all) macros, which are now the
1094110941 Hagino <itojun@kame.net>]
1094210942
1094310943 *) File was opened incorrectly in randfile.c.
10944 [Ulf Möller <ulf@fitug.de>]
1094510945
1094610946 *) Beginning of support for GeneralizedTime. d2i, i2d, check and print
1094710947 functions. Also ASN1_TIME suite which is a CHOICE of UTCTime or
1095110951 [Steve Henson]
1095210952
1095310953 *) Correct Linux 1 recognition in config.
10954 [Ulf Möller <ulf@fitug.de>]
1095510955
1095610956 *) Remove pointless MD5 hash when using DSA keys in ca.
1095710957 [Anonymous <nobody@replay.com>]
1109811098
1109911099 *) Fix the RSA header declarations that hid a bug I fixed in 0.9.0b but
1110011100 was already fixed by Eric for 0.9.1 it seems.
11101 [Ben Laurie - pointed out by Ulf Möller <ulf@fitug.de>]
1110211102
1110311103 *) Autodetect FreeBSD3.
1110411104 [Ben Laurie]
4444 # the undertaken effort was that it appeared that in tight IA-32
4545 # register window little-endian flavor could achieve slightly higher
4646 # Instruction Level Parallelism, and it indeed resulted in up to 15%
47 # better performance on most recent µ-archs...
4848 #
4949 # Third version adds AES_cbc_encrypt implementation, which resulted in
5050 # up to 40% performance improvement of CBC benchmark results. 40% was
223223 $speed_limit=512; # chunks smaller than $speed_limit are
224224 # processed with compact routine in CBC mode
225225 $small_footprint=1; # $small_footprint=1 code is ~5% slower [on
226 # recent µ-archs], but ~5 times smaller!
227227 # I favor compact code to minimize cache
228228 # contention and in hope to "collect" 5% back
229229 # in real-life applications...
564564 # Performance is not actually extraordinary in comparison to pure
565565 # x86 code. In particular encrypt performance is virtually the same.
566566 # Decrypt performance on the other hand is 15-20% better on newer
567 # µ-archs [but we're thankful for *any* improvement here], and ~50%
568568 # better on PIII:-) And additionally on the pros side this code
569569 # eliminates redundant references to stack and thus relieves/
570570 # minimizes the pressure on the memory bus.
890890 MVC B0,ILC
891891 || SUB B0,1,B0
892892
893 GMPY4 $K[0],A24,$Kx9[0] ; ·0x09
894894 || GMPY4 $K[1],A24,$Kx9[1]
895895 || MVK 0x00000D0D,A25
896896 || MVK 0x00000E0E,B25
899899 || MVKH 0x0D0D0000,A25
900900 || MVKH 0x0E0E0000,B25
901901
902 GMPY4 $K[0],B24,$KxB[0] ; ·0x0B
903903 || GMPY4 $K[1],B24,$KxB[1]
904904 GMPY4 $K[2],B24,$KxB[2]
905905 || GMPY4 $K[3],B24,$KxB[3]
906906
907907 SPLOOP 11 ; InvMixColumns
908908 ;;====================================================================
909 GMPY4 $K[0],A25,$KxD[0] ; ·0x0D
910910 || GMPY4 $K[1],A25,$KxD[1]
911911 || SWAP2 $Kx9[0],$Kx9[0] ; rotate by 16
912912 || SWAP2 $Kx9[1],$Kx9[1]
923923 || [B0] LDW *${KPA}[6],$K[2]
924924 || [B0] LDW *${KPB}[7],$K[3]
925925
926 GMPY4 $s[0],B25,$KxE[0] ; ·0x0E
927927 || GMPY4 $s[1],B25,$KxE[1]
928928 || XOR $Kx9[0],$KxB[0],$KxB[0]
929929 || XOR $Kx9[1],$KxB[1],$KxB[1]
943943
944944 XOR $KxE[0],$KxD[0],$KxE[0]
945945 || XOR $KxE[1],$KxD[1],$KxE[1]
946 || [B0] GMPY4 $K[0],A24,$Kx9[0] ; ·0x09
947947 || [B0] GMPY4 $K[1],A24,$Kx9[1]
948948 || ADDAW $KPA,4,$KPA
949949 XOR $KxE[2],$KxD[2],$KxE[2]
954954
955955 XOR $KxB[0],$KxE[0],$KxE[0]
956956 || XOR $KxB[1],$KxE[1],$KxE[1]
957 || [B0] GMPY4 $K[0],B24,$KxB[0] ; ·0x0B
958958 || [B0] GMPY4 $K[1],B24,$KxB[1]
959959 XOR $KxB[2],$KxE[2],$KxE[2]
960960 || XOR $KxB[3],$KxE[3],$KxE[3]
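The C64x+ hunk above uses GMPY4 (a packed four-way GF(2^8) multiply) to form the ·0x09, ·0x0B, ·0x0D and ·0x0E products, which are the coefficients of AES InvMixColumns. Below is a portable byte-at-a-time sketch of the same arithmetic, assuming the AES reduction polynomial x^8 + x^4 + x^3 + x + 1. The helper names are hypothetical and the code does not mirror the register scheduling of the assembly.

```c
#include <stdint.h>

/* Multiply by x (0x02) in GF(2^8), reducing by 0x11B. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* General GF(2^8) multiply: shift-and-add, where "add" is XOR. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;

    while (b) {
        if (b & 1)
            p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

/* InvMixColumns on one 4-byte column: multiply by the circulant
 * matrix whose first row is {0x0E, 0x0B, 0x0D, 0x09}. */
void inv_mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];

    col[0] = gmul(a0, 0x0E) ^ gmul(a1, 0x0B) ^ gmul(a2, 0x0D) ^ gmul(a3, 0x09);
    col[1] = gmul(a0, 0x09) ^ gmul(a1, 0x0E) ^ gmul(a2, 0x0B) ^ gmul(a3, 0x0D);
    col[2] = gmul(a0, 0x0D) ^ gmul(a1, 0x09) ^ gmul(a2, 0x0E) ^ gmul(a3, 0x0B);
    col[3] = gmul(a0, 0x0B) ^ gmul(a1, 0x0D) ^ gmul(a2, 0x09) ^ gmul(a3, 0x0E);
}
```

The FIPS-197 example {57}·{83} = {c1} is a quick sanity check for gmul(). The column 8e 4d a1 bc maps back to db 13 53 45, the inverse of the well-known MixColumns test vector.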
2626 # referred below, which improves ECDH and ECDSA verify benchmarks
2727 # by 18-40%.
2828 #
29 # Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
3030 # Polynomial Multiplication on ARM Processors using the NEON Engine.
3131 #
3232 # http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
147147 ################
148148 # void bn_GF2m_mul_2x2(BN_ULONG *r,
149149 # BN_ULONG a1,BN_ULONG a0,
150 # BN_ULONG b1,BN_ULONG b0); # r[3..0]=a1a0·b1b0
151151 {
152152 $code.=<<___;
153153 .global bn_GF2m_mul_2x2
170170 mov $mask,#7<<2
171171 sub sp,sp,#32 @ allocate tab[8]
172172
173 bl mul_1x1_ialu @ a1·b1
174174 str $lo,[$ret,#8]
175175 str $hi,[$ret,#12]
176176
180180 eor r2,r2,$a
181181 eor $b,$b,r3
182182 eor $a,$a,r2
183 bl mul_1x1_ialu @ a0·b0
184184 str $lo,[$ret]
185185 str $hi,[$ret,#4]
186186
187187 eor $a,$a,r2
188188 eor $b,$b,r3
189 bl mul_1x1_ialu @ (a1+a0)·(b1+b0)
190190 ___
191191 @r=map("r$_",(6..9));
192192 $code.=<<___;
119119 .asmfunc
120120 MVK 0xFF,$xFF
121121 ___
122 &mul_1x1_upper($a0,$b0); # a0·b0
123123 $code.=<<___;
124124 || MV $b1,$B
125125 MV $a1,$A
126126 ___
127 &mul_1x1_merged("A28","B28",$A,$B); # a0·b0/a1·b1
128128 $code.=<<___;
129129 || XOR $b0,$b1,$B
130130 XOR $a0,$a1,$A
131131 ___
132 &mul_1x1_merged("A31","B31",$A,$B); # a1·b1/(a0+a1)·(b0+b1)
133133 $code.=<<___;
134134 XOR A28,A31,A29
135 || XOR B28,B31,B29 ; a0·b0+a1·b1
136136 ___
137 &mul_1x1_lower("A30","B30"); # (a0+a1)·(b0+b1)
138138 $code.=<<___;
139139 || BNOP B3
140140 XOR A29,A30,A30
141 || XOR B29,B30,B30 ; (a0+a1)·(b0+b1)-a0·b0-a1·b1
142142 XOR B28,A30,A30
143143 || STW A28,*${rp}[0]
144144 XOR B30,A31,A31
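The GF(2)[x] multiplication hunks in this commit compute a two-word product r[3..0] = a1a0·b1b0 from three one-word products, as their comments show: a1·b1, a0·b0 and (a0+a1)·(b0+b1), with the middle term recovered by subtracting the other two. Over GF(2), addition and subtraction are both XOR. Below is a portable sketch of that Karatsuba step, assuming 32-bit words (BN_ULONG is 64-bit on some targets) and a naive shift-and-XOR `clmul32` standing in for the assembly mul_1x1 routines. Both function names are hypothetical.

```c
#include <stdint.h>

/* Carry-less (GF(2)[x]) product of two 32-bit polynomials. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            r ^= (uint64_t)a << i;
    return r;
}

/* r[3..0] = (a1:a0) * (b1:b0) over GF(2)[x], with three 1x1 products. */
void gf2m_mul_2x2(uint32_t r[4], uint32_t a1, uint32_t a0,
                  uint32_t b1, uint32_t b0)
{
    uint64_t hi  = clmul32(a1, b1);                      /* a1*b1 */
    uint64_t lo  = clmul32(a0, b0);                      /* a0*b0 */
    uint64_t mid = clmul32(a0 ^ a1, b0 ^ b1) ^ hi ^ lo;  /* a1*b0 + a0*b1 */

    /* Result = hi * x^64 + mid * x^32 + lo. */
    r[0] = (uint32_t)lo;
    r[1] = (uint32_t)(lo >> 32) ^ (uint32_t)mid;
    r[2] = (uint32_t)hi ^ (uint32_t)(mid >> 32);
    r[3] = (uint32_t)(hi >> 32);
}
```

Karatsuba saves one of the four one-word multiplications that the schoolbook method needs, at the cost of a few extra XORs. For example, (x^32 + 1)^2 = x^64 + 1 over GF(2), so gf2m_mul_2x2(r, 1, 1, 1, 1) yields r = {1, 0, 1, 0}.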
567567 // I've estimated this routine to run in ~120 ticks, but in reality
568568 // (i.e. according to ar.itc) it takes ~160 ticks. Are those extra
569569 // cycles consumed for instructions fetch? Or did I misinterpret some
570 // clause in Itanium µ-architecture manual? Comments are welcomed and
571571 // highly appreciated.
572572 //
573573 // On Itanium 2 it takes ~190 ticks. This is because of stalls on
171171 if ($SIZE_T==8) {
172172 my @r=map("%r$_",(6..9));
173173 $code.=<<___;
174 bras $ra,_mul_1x1 # a1·b1
175175 stmg $lo,$hi,16($rp)
176176
177177 lg $a,`$stdframe+128+4*$SIZE_T`($sp)
178178 lg $b,`$stdframe+128+6*$SIZE_T`($sp)
179 bras $ra,_mul_1x1 # a0·b0
180180 stmg $lo,$hi,0($rp)
181181
182182 lg $a,`$stdframe+128+3*$SIZE_T`($sp)
183183 lg $b,`$stdframe+128+5*$SIZE_T`($sp)
184184 xg $a,`$stdframe+128+4*$SIZE_T`($sp)
185185 xg $b,`$stdframe+128+6*$SIZE_T`($sp)
186 bras $ra,_mul_1x1 # (a0+a1)·(b0+b1)
187187 lmg @r[0],@r[3],0($rp)
188188
189189 xgr $lo,$hi
1313 # the time being... Except that it has three code paths: pure integer
1414 # code suitable for any x86 CPU, MMX code suitable for PIII and later
1515 # and PCLMULQDQ suitable for Westmere and later. Improvement varies
16 # from one benchmark and µ-arch to another. Below are interval values
16 # from one benchmark and µ-arch to another. Below are interval values
1717 # for 163- and 571-bit ECDH benchmarks relative to compiler-generated
1818 # code:
1919 #
225225 &push ("edi");
226226 &mov ($a,&wparam(1));
227227 &mov ($b,&wparam(3));
228 &call ("_mul_1x1_mmx"); # a1·b1
228 &call ("_mul_1x1_mmx"); # a1·b1
229229 &movq ("mm7",$R);
230230
231231 &mov ($a,&wparam(2));
232232 &mov ($b,&wparam(4));
233 &call ("_mul_1x1_mmx"); # a0·b0
233 &call ("_mul_1x1_mmx"); # a0·b0
234234 &movq ("mm6",$R);
235235
236236 &mov ($a,&wparam(1));
237237 &mov ($b,&wparam(3));
238238 &xor ($a,&wparam(2));
239239 &xor ($b,&wparam(4));
240 &call ("_mul_1x1_mmx"); # (a0+a1)·(b0+b1)
240 &call ("_mul_1x1_mmx"); # (a0+a1)·(b0+b1)
241241 &pxor ($R,"mm7");
242242 &mov ($a,&wparam(0));
243 &pxor ($R,"mm6"); # (a0+a1)·(b0+b1)-a1·b1-a0·b0
243 &pxor ($R,"mm6"); # (a0+a1)·(b0+b1)-a1·b1-a0·b0
244244
245245 &movq ($A,$R);
246246 &psllq ($R,32);
265265
266266 &mov ($a,&wparam(1));
267267 &mov ($b,&wparam(3));
268 &call ("_mul_1x1_ialu"); # a1·b1
268 &call ("_mul_1x1_ialu"); # a1·b1
269269 &mov (&DWP(8,"esp"),$lo);
270270 &mov (&DWP(12,"esp"),$hi);
271271
272272 &mov ($a,&wparam(2));
273273 &mov ($b,&wparam(4));
274 &call ("_mul_1x1_ialu"); # a0·b0
274 &call ("_mul_1x1_ialu"); # a0·b0
275275 &mov (&DWP(0,"esp"),$lo);
276276 &mov (&DWP(4,"esp"),$hi);
277277
279279 &mov ($b,&wparam(3));
280280 &xor ($a,&wparam(2));
281281 &xor ($b,&wparam(4));
282 &call ("_mul_1x1_ialu"); # (a0+a1)·(b0+b1)
282 &call ("_mul_1x1_ialu"); # (a0+a1)·(b0+b1)
283283
284284 &mov ("ebp",&wparam(0));
285285 @r=("ebx","ecx","edi","esi");
6464 # undef mul_add
6565
6666 /*-
67 * "m"(a), "+m"(r) is the way to favor DirectPath µ-code;
67 * "m"(a), "+m"(r) is the way to favor DirectPath µ-code;
6868 * "g"(0) let the compiler to decide where does it
6969 * want to keep the value of zero;
7070 */
1212 # in bn_gf2m.c. It's kind of low-hanging mechanical port from C for
1313 # the time being... Except that it has two code paths: code suitable
1414 # for any x86_64 CPU and PCLMULQDQ one suitable for Westmere and
15 # later. Improvement varies from one benchmark and µ-arch to another.
15 # later. Improvement varies from one benchmark and µ-arch to another.
1616 # Vanilla code path is at most 20% faster than compiler-generated code
1717 # [not very impressive], while PCLMULQDQ - whole 85%-160% better on
1818 # 163- and 571-bit ECDH benchmarks on Intel CPUs. Keep in mind that
183183 $code.=<<___;
184184 movdqa %xmm0,%xmm4
185185 movdqa %xmm1,%xmm5
186 pclmulqdq \$0,%xmm1,%xmm0 # a1·b1
186 pclmulqdq \$0,%xmm1,%xmm0 # a1·b1
187187 pxor %xmm2,%xmm4
188188 pxor %xmm3,%xmm5
189 pclmulqdq \$0,%xmm3,%xmm2 # a0·b0
190 pclmulqdq \$0,%xmm5,%xmm4 # (a0+a1)·(b0+b1)
189 pclmulqdq \$0,%xmm3,%xmm2 # a0·b0
190 pclmulqdq \$0,%xmm5,%xmm4 # (a0+a1)·(b0+b1)
191191 xorps %xmm0,%xmm4
192 xorps %xmm2,%xmm4 # (a0+a1)·(b0+b1)-a0·b0-a1·b1
192 xorps %xmm2,%xmm4 # (a0+a1)·(b0+b1)-a0·b0-a1·b1
193193 movdqa %xmm4,%xmm5
194194 pslldq \$8,%xmm4
195195 psrldq \$8,%xmm5
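The `# a1·b1`, `# a0·b0` and `# (a0+a1)·(b0+b1)-a0·b0-a1·b1` comments in these hunks all describe one Karatsuba step over GF(2)[x]. Addition and subtraction are both XOR there, so three 64×64 carry-less products replace the four of the schoolbook method. Below is a minimal portable C sketch, assuming 64-bit words. The bit-by-bit helper `clmul64` stands in for PCLMULQDQ. `clmul64`, `mul_2x2_karatsuba` and `mul_2x2_schoolbook` are hypothetical illustration names, not OpenSSL functions.

```c
#include <stdint.h>

/* Hypothetical helper (not OpenSSL API): 64x64 -> 128-bit carry-less
 * multiply by shift-and-XOR, standing in for PCLMULQDQ. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            if (i != 0)              /* avoid the undefined shift by 64 */
                h ^= a >> (64 - i);
        }
    }
    *hi = h;
    *lo = l;
}

/* Hypothetical: r[3..0] = (a1:a0)·(b1:b0) over GF(2)[x], three products. */
static void mul_2x2_karatsuba(uint64_t r[4], uint64_t a1, uint64_t a0,
                              uint64_t b1, uint64_t b0)
{
    uint64_t hh_hi, hh_lo, ll_hi, ll_lo, mm_hi, mm_lo;

    clmul64(a1, b1, &hh_hi, &hh_lo);            /* a1·b1 */
    clmul64(a0, b0, &ll_hi, &ll_lo);            /* a0·b0 */
    clmul64(a0 ^ a1, b0 ^ b1, &mm_hi, &mm_lo);  /* (a0+a1)·(b0+b1) */

    /* (a0+a1)·(b0+b1)-a0·b0-a1·b1: subtraction is XOR in GF(2) */
    mm_hi ^= hh_hi ^ ll_hi;
    mm_lo ^= hh_lo ^ ll_lo;

    /* the middle term lands 64 bits up in the 256-bit result */
    r[0] = ll_lo;
    r[1] = ll_hi ^ mm_lo;
    r[2] = hh_lo ^ mm_hi;
    r[3] = hh_hi;
}

/* Hypothetical schoolbook reference using four products, for comparison. */
static void mul_2x2_schoolbook(uint64_t r[4], uint64_t a1, uint64_t a0,
                               uint64_t b1, uint64_t b0)
{
    uint64_t h, l;
    r[0] = r[1] = r[2] = r[3] = 0;
    clmul64(a0, b0, &h, &l); r[0] ^= l; r[1] ^= h;
    clmul64(a0, b1, &h, &l); r[1] ^= l; r[2] ^= h;
    clmul64(a1, b0, &h, &l); r[1] ^= l; r[2] ^= h;
    clmul64(a1, b1, &h, &l); r[2] ^= l; r[3] ^= h;
}
```

Both routines produce the same four words for any input. The saving is one 64×64 multiplication per 128×128 product, paid for with a few extra XORs. The assembly makes the same trade-off.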
224224 mov \$0xf,$mask
225225 mov $a1,$a
226226 mov $b1,$b
227 call _mul_1x1 # a1·b1
227 call _mul_1x1 # a1·b1
228228 mov $lo,16(%rsp)
229229 mov $hi,24(%rsp)
230230
231231 mov 48(%rsp),$a
232232 mov 64(%rsp),$b
233 call _mul_1x1 # a0·b0
233 call _mul_1x1 # a0·b0
234234 mov $lo,0(%rsp)
235235 mov $hi,8(%rsp)
236236
238238 mov 56(%rsp),$b
239239 xor 48(%rsp),$a
240240 xor 64(%rsp),$b
241 call _mul_1x1 # (a0+a1)·(b0+b1)
241 call _mul_1x1 # (a0+a1)·(b0+b1)
242242 ___
243243 @r=("%rbx","%rcx","%rdi","%rsi");
244244 $code.=<<___;
4444 # processes one byte in 8.45 cycles, A9 - in 10.2, A15 - in 7.63,
4545 # Snapdragon S4 - in 9.33.
4646 #
47 # Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
47 # Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
4848 # Polynomial Multiplication on ARM Processors using the NEON Engine.
4949 #
5050 # http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
448448 veor $IN,$Xl @ inp^=Xi
449449 .Lgmult_neon:
450450 ___
451 &clmul64x64 ($Xl,$Hlo,"$IN#lo"); # H.lo·Xi.lo
451 &clmul64x64 ($Xl,$Hlo,"$IN#lo"); # H.lo·Xi.lo
452452 $code.=<<___;
453453 veor $IN#lo,$IN#lo,$IN#hi @ Karatsuba pre-processing
454454 ___
455 &clmul64x64 ($Xm,$Hhl,"$IN#lo"); # (H.lo+H.hi)·(Xi.lo+Xi.hi)
456 &clmul64x64 ($Xh,$Hhi,"$IN#hi"); # H.hi·Xi.hi
455 &clmul64x64 ($Xm,$Hhl,"$IN#lo"); # (H.lo+H.hi)·(Xi.lo+Xi.hi)
456 &clmul64x64 ($Xh,$Hhi,"$IN#hi"); # H.hi·Xi.hi
457457 $code.=<<___;
458458 veor $Xm,$Xm,$Xl @ Karatsuba post-processing
459459 veor $Xm,$Xm,$Xh
152152 # 8/2 S1 L1x S2 | ....
153153 #####... ................|............
154154 $code.=<<___;
155 XORMPY $H0,$xia,$H0x ; 0 ; H·(Xi[i]<<1)
155 XORMPY $H0,$xia,$H0x ; 0 ; H·(Xi[i]<<1)
156156 || XORMPY $H01u,$xib,$H01y
157157 || [A0] LDBU *--${xip},$x0
158158 XORMPY $H1,$xia,$H1x ; 1
161161 XORMPY $H3,$xia,$H3x ; 3
162162 || XORMPY $H3u,$xib,$H3y
163163 ||[!A0] MVK.D 15,A0 ; *--${xip} counter
164 XOR.L $H0x,$Z0,$Z0 ; 4 ; Z^=H·(Xi[i]<<1)
164 XOR.L $H0x,$Z0,$Z0 ; 4 ; Z^=H·(Xi[i]<<1)
165165 || [A0] SUB.S A0,1,A0
166166 XOR.L $H1x,$Z1,$Z1 ; 5
167167 || AND.D $H01y,$FF000000,$H0z
378378 or $V,%lo(0xA0406080),$V
379379 or %l0,%lo(0x20C0E000),%l0
380380 sllx $V,32,$V
381 or %l0,$V,$V ! (0xE0·i)&0xff=0xA040608020C0E000
381 or %l0,$V,$V ! (0xE0·i)&0xff=0xA040608020C0E000
382382 stx $V,[%i0+16]
383383
384384 ret
398398
399399 mov 0xE1,%l7
400400 sllx %l7,57,$xE1 ! 57 is not a typo
401 ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
401 ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
402402
403403 xor $Hhi,$Hlo,$Hhl ! Karatsuba pre-processing
404404 xmulx $Xlo,$Hlo,$C0
410410 xmulx $Xhi,$Hhi,$Xhi
411411
412412 sll $C0,3,$sqr
413 srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
413 srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
414414 xor $C0,$sqr,$sqr
415 sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
415 sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
416416
417417 xor $C0,$C1,$C1 ! Karatsuba post-processing
418418 xor $Xlo,$C2,$C2
422422 xor $Xhi,$C2,$C2
423423 xor $Xhi,$C1,$C1
424424
425 xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
425 xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
426426 xor $C0,$C2,$C2
427427 xmulx $C1,$xE1,$C0
428428 xor $C1,$C3,$C3
452452
453453 mov 0xE1,%l7
454454 sllx %l7,57,$xE1 ! 57 is not a typo
455 ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
455 ldx [$Htable+16],$V ! (0xE0·i)&0xff=0xA040608020C0E000
456456
457457 and $inp,7,$shl
458458 andn $inp,7,$inp
489489 xmulx $Xhi,$Hhi,$Xhi
490490
491491 sll $C0,3,$sqr
492 srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
492 srlx $V,$sqr,$sqr ! ·0xE0 [implicit &(7<<3)]
493493 xor $C0,$sqr,$sqr
494 sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
494 sllx $sqr,57,$sqr ! ($C0·0xE1)<<1<<56 [implicit &0x7f]
495495
496496 xor $C0,$C1,$C1 ! Karatsuba post-processing
497497 xor $Xlo,$C2,$C2
501501 xor $Xhi,$C2,$C2
502502 xor $Xhi,$C1,$C1
503503
504 xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
504 xmulxhi $C0,$xE1,$Xlo ! ·0xE1<<1<<56
505505 xor $C0,$C2,$C2
506506 xmulx $C1,$xE1,$C0
507507 xor $C1,$C3,$C3
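The constant 0xA040608020C0E000 in these SPARC hunks, also used in the x86_64 GHASH hunks below, is annotated `(0xE0·i)&0xff`. Read "·" as carry-less multiplication, which is the reading that reproduces the constant. Byte lane i then holds the low byte of i·0xE0 for i = 0..7. The `sll $C0,3` / `srlx $V,$sqr,$sqr` pair selects lane (C0&7) with no table in memory, and the x86_64 code does the same job with `pshufb`. The C sketch below rebuilds the constant and performs that lookup. `clmul_small`, `build_e0_table` and `lookup_e0` are hypothetical illustration names.

```c
#include <stdint.h>

/* Hypothetical helper: carry-less (GF(2)[x]) product of small values. */
static uint32_t clmul_small(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((a >> i) & 1)
            r ^= b << i;
    return r;
}

/* Hypothetical: pack the low byte of i·0xE0 into byte lane i, i = 0..7. */
static uint64_t build_e0_table(void)
{
    uint64_t v = 0;
    for (uint64_t i = 0; i < 8; i++)
        v |= (uint64_t)(clmul_small((uint32_t)i, 0xE0) & 0xff) << (8 * i);
    return v;
}

/* Hypothetical lookup mirroring "sll $C0,3" + "srlx $V,$sqr,$sqr".
 * SPARC srlx uses only the low 6 bits of the shift count, which is the
 * "implicit &(7<<3)"; portable C needs the explicit "& 7" below. */
static uint8_t lookup_e0(uint64_t table, uint64_t c)
{
    return (uint8_t)(table >> ((c & 7) << 3));
}
```

Ordinary integer multiplication would not give this value. For example, 3·0xE0 is 0x2A0 as an integer but 0x120 carry-less, and the constant's lane 3 holds 0x20.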
357357 # effective address calculation and finally merge of value to Z.hi.
358358 # Reference to rem_4bit is scheduled so late that I had to >>4
359359 # rem_4bit elements. This resulted in 20-45% procent improvement
360 # on contemporary µ-archs.
360 # on contemporary µ-archs.
361361 {
362362 my $cnt;
363363 my $rem_4bit = "eax";
575575 # experimental alternative. special thing about is that there
576576 # no dependency between the two multiplications...
577577 mov \$`0xE1<<1`,%eax
578 mov \$0xA040608020C0E000,%r10 # ((7..0)·0xE0)&0xff
578 mov \$0xA040608020C0E000,%r10 # ((7..0)·0xE0)&0xff
579579 mov \$0x07,%r11d
580580 movq %rax,$T1
581581 movq %r10,$T2
582582 movq %r11,$T3 # borrow $T3
583583 pand $Xi,$T3
584 pshufb $T3,$T2 # ($Xi&7)·0xE0
584 pshufb $T3,$T2 # ($Xi&7)·0xE0
585585 movq %rax,$T3
586 pclmulqdq \$0x00,$Xi,$T1 # ·(0xE1<<1)
586 pclmulqdq \$0x00,$Xi,$T1 # ·(0xE1<<1)
587587 pxor $Xi,$T2
588588 pslldq \$15,$T2
589589 paddd $T2,$T2 # <<(64+56+1)
656656 je .Lskip4x
657657
658658 sub \$0x30,$len
659 mov \$0xA040608020C0E000,%rax # ((7..0)·0xE0)&0xff
659 mov \$0xA040608020C0E000,%rax # ((7..0)·0xE0)&0xff
660660 movdqu 0x30($Htbl),$Hkey3
661661 movdqu 0x40($Htbl),$Hkey4
662662
117117 le?vperm $IN,$IN,$IN,$lemask
118118 vxor $zero,$zero,$zero
119119
120 vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
121 vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
122 vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
120 vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
121 vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
122 vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
123123
124124 vpmsumd $t2,$Xl,$xC2 # 1st phase
125125
177177 .align 5
178178 Loop:
179179 subic $len,$len,16
180 vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
180 vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
181181 subfe. r0,r0,r0 # borrow?-1:0
182 vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
182 vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
183183 and r0,r0,$len
184 vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
184 vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
185185 add $inp,$inp,r0
186186
187187 vpmsumd $t2,$Xl,$xC2 # 1st phase
143143 #endif
144144 vext.8 $IN,$t1,$t1,#8
145145
146 vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
146 vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
147147 veor $t1,$t1,$IN @ Karatsuba pre-processing
148 vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
149 vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
148 vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
149 vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
150150
151151 vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
152152 veor $t2,$Xl,$Xh
234234 #endif
235235 vext.8 $In,$t1,$t1,#8
236236 veor $IN,$IN,$Xl @ I[i]^=Xi
237 vpmull.p64 $Xln,$H,$In @ H·Ii+1
237 vpmull.p64 $Xln,$H,$In @ H·Ii+1
238238 veor $t1,$t1,$In @ Karatsuba pre-processing
239239 vpmull2.p64 $Xhn,$H,$In
240240 b .Loop_mod2x_v8
243243 .Loop_mod2x_v8:
244244 vext.8 $t2,$IN,$IN,#8
245245 subs $len,$len,#32 @ is there more data?
246 vpmull.p64 $Xl,$H2,$IN @ H^2.lo·Xi.lo
246 vpmull.p64 $Xl,$H2,$IN @ H^2.lo·Xi.lo
247247 cclr $inc,lo @ is it time to zero $inc?
248248
249249 vpmull.p64 $Xmn,$Hhl,$t1
250250 veor $t2,$t2,$IN @ Karatsuba pre-processing
251 vpmull2.p64 $Xh,$H2,$IN @ H^2.hi·Xi.hi
251 vpmull2.p64 $Xh,$H2,$IN @ H^2.hi·Xi.hi
252252 veor $Xl,$Xl,$Xln @ accumulate
253 vpmull2.p64 $Xm,$Hhl,$t2 @ (H^2.lo+H^2.hi)·(Xi.lo+Xi.hi)
253 vpmull2.p64 $Xm,$Hhl,$t2 @ (H^2.lo+H^2.hi)·(Xi.lo+Xi.hi)
254254 vld1.64 {$t0},[$inp],$inc @ load [rotated] I[i+2]
255255
256256 veor $Xh,$Xh,$Xhn
275275 vext.8 $In,$t1,$t1,#8
276276 vext.8 $IN,$t0,$t0,#8
277277 veor $Xl,$Xm,$t2
278 vpmull.p64 $Xln,$H,$In @ H·Ii+1
278 vpmull.p64 $Xln,$H,$In @ H·Ii+1
279279 veor $IN,$IN,$Xh @ accumulate $IN early
280280
281281 vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase of reduction
299299 veor $IN,$IN,$Xl @ inp^=Xi
300300 veor $t1,$t0,$t2 @ $t1 is rotated inp^Xi
301301
302 vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
302 vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
303303 veor $t1,$t1,$IN @ Karatsuba pre-processing
304 vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
305 vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
304 vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
305 vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
306306
307307 vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
308308 veor $t2,$Xl,$Xh
4343 # Sandy Bridge 5.0/+8%
4444 # Atom 12.6/+6%
4545 # VIA Nano 6.4/+9%
46 # Ivy Bridge 4.9/±0%
46 # Ivy Bridge 4.9/±0%
4747 # Bulldozer 4.9/+15%
4848 #
4949 # (*) PIII can actually deliver 6.6 cycles per byte with MMX code,
5555 # achieves respectful 432MBps on 2.8GHz processor now. For reference.
5656 # If executed on Xeon, current RC4_CHAR code-path is 2.7x faster than
5757 # RC4_INT code-path. While if executed on Opteron, it's only 25%
58 # slower than the RC4_INT one [meaning that if CPU µ-arch detection
58 # slower than the RC4_INT one [meaning that if CPU µ-arch detection
5959 # is not implemented, then this final RC4_CHAR code-path should be
6060 # preferred, as it provides better *all-round* performance].
6161
6565 # switch to AVX alone improves performance by as little as 4% in
6666 # comparison to SSSE3 code path. But below result doesn't look like
6767 # 4% improvement... Trouble is that Sandy Bridge decodes 'ro[rl]' as
68 # pair of µ-ops, and it's the additional µ-ops, two per round, that
68 # pair of µ-ops, and it's the additional µ-ops, two per round, that
6969 # make it run slower than Core2 and Westmere. But 'sh[rl]d' is decoded
70 # as single µ-op by Sandy Bridge and it's replacing 'ro[rl]' with
70 # as single µ-op by Sandy Bridge and it's replacing 'ro[rl]' with
7171 # equivalent 'sh[rl]d' that is responsible for the impressive 5.1
7272 # cycles per processed byte. But 'sh[rl]d' is not something that used
7373 # to be fast, nor does it appear to be fast in upcoming Bulldozer
99 # SHA256 block transform for x86. September 2007.
1010 #
1111 # Performance improvement over compiler generated code varies from
12 # 10% to 40% [see below]. Not very impressive on some µ-archs, but
12 # 10% to 40% [see below]. Not very impressive on some µ-archs, but
1313 # it's 5 times smaller and optimizies amount of writes.
1414 #
1515 # May 2012.
3636 #
3737 # IALU code-path is optimized for elder Pentiums. On vanilla Pentium
3838 # performance improvement over compiler generated code reaches ~60%,
39 # while on PIII - ~35%. On newer µ-archs improvement varies from 15%
39 # while on PIII - ~35%. On newer µ-archs improvement varies from 15%
4040 # to 50%, but it's less important as they are expected to execute SSE2
4141 # code-path, which is commonly ~2-3x faster [than compiler generated
4242 # code]. SSE2 code-path is as fast as original sha512-sse2.pl, even
126126 fmovs %f1,%f3
127127 fmovs %f0,%f2
128128
129 add %fp,BIAS,%i0 ! return pointer to caller´s top of stack
129 add %fp,BIAS,%i0 ! return pointer to caller´s top of stack
130130
131131 ret
132132 restore
1515 # table]. I stick to value of 2 for two reasons: 1. smaller table
1616 # minimizes cache trashing and thus mitigates the hazard of side-
1717 # channel leakage similar to AES cache-timing one; 2. performance
18 # gap among different µ-archs is smaller.
18 # gap among different µ-archs is smaller.
1919 #
2020 # Performance table lists rounded amounts of CPU cycles spent by
2121 # whirlpool_block_mmx routine on single 64 byte input block, i.e.
22 * Contributed to the OpenSSL Project 2004 by Richard Levitte
33 * (richard@levitte.org)
44 */
5 /* Copyright (c) 2004 Kungliga Tekniska Högskolan
5 /* Copyright (c) 2004 Kungliga Tekniska Högskolan
66 * (Royal Institute of Technology, Stockholm, Sweden).
77 * All rights reserved.
88 *
22 * Contributed to the OpenSSL Project 2004 by Richard Levitte
33 * (richard@levitte.org)
44 */
5 /* Copyright (c) 2004 Kungliga Tekniska Högskolan
5 /* Copyright (c) 2004 Kungliga Tekniska Högskolan
66 * (Royal Institute of Technology, Stockholm, Sweden).
77 * All rights reserved.
88 *
6161 day, which means that future revisions will not be fully compatible to
6262 the current version.
6363
64 Bodo Möller <bodo@openssl.org>
64 Bodo Möller <bodo@openssl.org>
5656 VALUE "ProductVersion", "$version\\0"
5757 // Optional:
5858 //VALUE "Comments", "\\0"
59 VALUE "LegalCopyright", "Copyright © 1998-2006 The OpenSSL Project. Copyright © 1995-1998 Eric A. Young, Tim J. Hudson. All rights reserved.\\0"
59 VALUE "LegalCopyright", "Copyright © 1998-2006 The OpenSSL Project. Copyright © 1995-1998 Eric A. Young, Tim J. Hudson. All rights reserved.\\0"
6060 //VALUE "LegalTrademarks", "\\0"
6161 //VALUE "PrivateBuild", "\\0"
6262 //VALUE "SpecialBuild", "\\0"