万万没想到,除了香农计划,Python3.11竟还有这么多性能提升!( 三 )


对于 math 库:优化了 comb(n, k) 与 perm(n, k=None)Python 3.8 在math 标准库中增加了 comb(n, k) 和 perm(n, k=None) 函数 。两者都用于计算从 n 个无重复的元素中选择 k 个元素的方法数,comb 返回无序计算的结果,而perm 返回有序计算的结果 。(译注:即一个求组合数,一个求排列数)
【万万没想到,除了香农计划,Python3.11竟还有这么多性能提升!】3.11 的优化由多个较小的改进组成,比如使用分治算法来实现 Karatsuba 大数乘法,以及尽可能用 C 语言unsigned long long 类型而不是 Python 整数进行comb计算(python/cpython/pull/29090#issue-1031333783" rel="external nofollow noreferrer">*) 。
另外一项改进是针对较小的 k 值(0 <= k <= n <= 67):
(译注:以下两段费解,暂跳过)

对于 0 <= k <= n <= 67, comb(n, k) always fits into a uint64_t. We compute it as comb_odd_part << shift where 2 ** shift is the largest power of two dividing comb(n, k) and comb_odd_part is comb(n, k) >> shift. comb_odd_part can be calculated efficiently via arithmetic modulo 2 ** 64, using three lookups and two uint64_t multiplications, while the necessary shift can be computed via Kummer's theorem: it's the number of carries when adding k to n - k in binary, which in turn is the number of set bits of n ^ k ^ (n - k). python/blob/03dccc557adf39db0150410e7c448ff3164e7022/Modules/mathmodule.c#L3583" rel="external nofollow noreferrer">*
One more improvement is that the previous popcount-based code for computing the largest power of two dividing math.comb(n, k) (for small n) got replaced with a more direct method based on counting trailing zeros of the factorials involved. (python/cpython/pull/30313#issue-1091542983" rel="external nofollow noreferrer">*).
Python 3.10:
$ python -m pyperf timeit -s \'import math' -- 'math.comb(100, 55)'.....................Mean +- std dev: 3.72 us +- 0.07 us# ---$ python -m pyperf timeit -s \'import math' -- 'math.comb(10000, 5500)'.....................Mean +- std dev: 11.9 ms +- 0.1 msPython 3.11:
$ python -m pyperf timeit -s \'import math' -- 'math.comb(100, 55)'.....................Mean +- std dev: 476 ns +- 20 ns# ---$ python -m pyperf timeit -s \'import math' -- 'math.comb(10000, 5500)'.....................Mean +- std dev: 2.28 ms +- 0.10 ms对于 statistics 库:优化了 mean(data)、variance(data, xbar=None) 与 stdev(data, xbar=None)3.11 优化了statistics模块中的 meanvariancestdev 函数 。如果入参是一个迭代器,则会直接用于计算,而不是先将其转换为列表 。这种python/blob/208abcd8f1726646f8d86306616b0db802d8064c/Lib/statistics.py#L205" rel="external nofollow noreferrer">计算方法 的速度比之前的快了一倍 。python.org/3/whatsnew/changelog.html" rel="external nofollow noreferrer">*
Python 3.10:
# Mean$ python -m pyperf timeit -s \'import statistics' -- 'statistics.mean(range(1_000))'.....................Mean +- std dev: 255 us +- 11 us# Variance$ python -m pyperf timeit -s \'import statistics' -- 'statistics.variance((x * 0.1 for x in range(0, 10)))'.....................Mean +- std dev: 77.0 us +- 2.9 us# Sample standard deviation (stdev)$ python -m pyperf timeit -s \'import statistics' -- 'statistics.stdev((x * 0.1 for x in range(0, 10)))'.....................Mean +- std dev: 78.0 us +- 2.2 usPython 3.11:
# Mean$ python -m pyperf timeit -s \'import statistics' -- 'statistics.mean(range(1_000))'.....................Mean +- std dev: 193 us +- 7 us# Variance$ python -m pyperf timeit -s \'import statistics' -- 'statistics.variance((x * 0.1 for x in range(0, 10)))'.....................Mean +- std dev: 56.1 us +- 2.3 us# Sample standard deviation (stdev)$ python -m pyperf timeit -s \'import statistics' -- 'statistics.stdev((x * 0.1 for x in range(0, 10)))'.....................Mean +- std dev: 59.4 us +- 2.6 us

经验总结扩展阅读