diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-06-21 23:36:36 +0200 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-06-21 23:42:26 +0200 |
commit | a89a14ef5da44684a16b204e7a70460cc8c4922a (patch) | |
tree | b23b4c6b155977909ef508fdae2f48d33d802813 /vendor/gmp-6.3.0/mpn/x86_64/README | |
parent | 1db63fcedab0b288820d66e100b1877b1a5a8851 (diff) |
Basic constant folding implementation
Diffstat (limited to 'vendor/gmp-6.3.0/mpn/x86_64/README')
-rw-r--r-- | vendor/gmp-6.3.0/mpn/x86_64/README | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/vendor/gmp-6.3.0/mpn/x86_64/README b/vendor/gmp-6.3.0/mpn/x86_64/README new file mode 100644 index 0000000..9c8a586 --- /dev/null +++ b/vendor/gmp-6.3.0/mpn/x86_64/README @@ -0,0 +1,74 @@ +Copyright 2003, 2004, 2006, 2008 Free Software Foundation, Inc. + +This file is part of the GNU MP Library. + +The GNU MP Library is free software; you can redistribute it and/or modify +it under the terms of either: + + * the GNU Lesser General Public License as published by the Free + Software Foundation; either version 3 of the License, or (at your + option) any later version. + +or + + * the GNU General Public License as published by the Free Software + Foundation; either version 2 of the License, or (at your option) any + later version. + +or both in parallel, as here. + +The GNU MP Library is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received copies of the GNU General Public License and the +GNU Lesser General Public License along with the GNU MP Library. If not, +see https://www.gnu.org/licenses/. + + + + + + AMD64 MPN SUBROUTINES + + +This directory contains mpn functions for AMD64 chips. It is also useful +for 64-bit Pentiums, and "Core 2". + + + RELEVANT OPTIMIZATION ISSUES + +The Opteron and Athlon64 can sustain up to 3 instructions per cycle, but in +practice that is only possible for integer instructions. But almost any +three integer instructions can issue simultaneously, including any 3 ALU +operations, including shifts. Up to two memory operations can issue each +cycle. + +Scheduling typically requires that load-use instructions are split into +separate load and use instructions. That requires more decode resources, +and it is rarely a win. Opteron/Athlon64 have deep out-of-order core. + + +Optimizing for 64-bit Pentium4 is probably a waste of time, as the most +critical instructions are very poorly implemented here. Perhaps we could +save a cycle or two, but the most common loops now run at between 10 and 22 +cycles, so a saved cycle isn't too exciting. + + +The new spin of the venerable P6 core, the "Core 2" is much better than the +Pentium4 for the GMP loops. Its integer pipeline is somewhat similar to to +the Opteron/Athlon64 pipeline, except that the GMP favourites ADC/SBB and +MUL are slower. Furthermore, an INC/DEC followed by ADC/SBB incur a +pipeline stall of around 10 cycles. The default mpn_add_n and mpn_sub_n +code suffers badly from the stall. The code in the core2 subdirectory uses +the almost forgotten instruction JRCXZ for loop control, and updates the +induction variable using LEA. + + + +REFERENCES + +"System V Application Binary Interface AMD64 Architecture Processor +Supplement", draft version 0.99, December 2007. +http://www.x86-64.org/documentation/abi.pdf |