Javascript性能:在递减循环中负数的模运算减慢了代码的100%以上

问题描述:

我正在经历Eloquent JavaScript(再次)并且遇到练习"Chess Board" of Chapter 2。当我第一次阅读它的时候,我有一个像样的解决方案写回来,另外一个版本的解决方案在Elequent Javascript website提供。我是其中一个新手想要超级高效的程序员只有一个问题在他们的头上:“我可以让它工作更快或更小吗?Javascript性能:在递减循环中负数的模运算减慢了代码的100%以上

所以,几个月前,我在网络上搜索时,我碰到a question来到堆栈溢出,关于for循环VS while环路上的业绩奠定了基础。因为在那个线程中有人提到for循环比while慢,并且使用递减迭代器的循环速度更快,所以我重写了代码以获得更好的性能。

这里是新版本forwhile和条件,编辑对于递减代替:

console.time("looping"); 
 
var gridSize = 5000, str = '', i = gridSize, j; 
 
while (i--) { 
 
    j = gridSize; 
 
    while (j--) { 
 
    if ((i - j) % 2 === 0) 
 
     str += " "; 
 
    else 
 
     str += "#"; 
 
    } 
 
    str += "\n"; 
 
} 
 

 
//console.log(str); 
 
console.timeEnd("looping");

但让我吃惊的代码有性能更差。然后,过了一段时间我发现if ((i - j) % 2 === 0)是主要的罪魁祸首,与更换减号加上减少执行的总时间〜750ms之间

//Checked on NODE.js = v6.11.2 
Book version of code   --> 893.76 ms 
while loop with subtraction --> 1562.43 ms 
while loop with addition  --> 749.62 ms 


//firefox Benchmark v54.0 (64-bit) (OS Ubuntu 16.04) 
Book version of code   --> 725.10 ms 
while loop with subtraction --> 1565.29 ms 
while loop with addition  --> 601.12 ms 

为什么正在对总如此巨大的影响减法执行时间处理时间?看着@jaromandaX答案

编辑01 6:30 AM(北京时间)8月8日

后,我敢肯定,这不是减法,多数民众赞成放缓这个循环中,负数的其模。 再次我想知道是什么让负数慢

+0

必须是一个V8的东西,我猜你在nodejs以及Chrome浏览器尝试 - 因为Firefox没有显示任何区别 - 但它远慢得多! - 即在我的PC上,它比Chrome慢10倍! –

+0

@ jaromanda-x与V8或spiderMonkey无关,它们都是最先进的,在某些事情上取代了其他,但这种性能上的主要差异只是疯狂的,50-60毫秒是可以接受的,但它超过了一倍,而且我在mozilla上检查了它,并上传了这个结果。 –

+1

查看XOR/AND组合产生的结果会很有趣:'if((i^j)&1 === 0)' – SBS

这远不是一个完整的答案,需要进一步调查(或从别人谁知道V8的实现细节的见解)。不过,这里是我的发现:

Sidenode:结果聚集运行使用下面的命令行的Node.js:

节点--expose-GC --print码--code-评论--print-OPT-代码--trace氢--redirect码迹线--redirect码迹线到= code.asm --trace_representation --trace-deopt --trace-OPT 1.js

和一点展望V8源代码。

1.性能差异来自于在这种情况下生成不同机器码的事实。对于+%代码

    ;;; <@134,#123> add-i 
00000151A32DD74B 395 03c2   addl rax,rdx 
00000151A32DD74D 397 0f807a030000 jo 1293 (00000151A32DDACD) 
        ;;; <@136,#126> mod-by-power-of-2-i 
00000151A32DD753 403 85c0   testl rax,rax 
00000151A32DD755 405 790f   jns 422 (00000151A32DD766) 
00000151A32DD757 407 f7d8   negl rax 
00000151A32DD759 409 83e001   andl rax,0x1 
00000151A32DD75C 412 f7d8   negl rax 
00000151A32DD75E 414 0f846e030000 jz 1298 (00000151A32DDAD2) 
00000151A32DD764 420 eb03   jmp 425 (00000151A32DD769) 
00000151A32DD766 422 83e001   andl rax,0x1 
        ;;; <@138,#200> smi-tag 
00000151A32DD769 425 8bd8   movl rbx,rax 
00000151A32DD76B 427 48c1e320  REX.W shlq rbx, 32 
        ;;; <@140,#130> gap 
00000151A32DD76F 431 488bc2   REX.W movq rax,rdx 

-代码要复杂得多:

    ;;; <@136,#123> sub-i 
00000151A32E57E1 417 412bc3   subl rax,r11 
00000151A32E57E4 420 0f8039040000 jo 1507 (00000151A32E5C23) 
        ;;; <@138,#200> int32-to-double 
00000151A32E57EA 426 c5f957c0  vxorpd xmm0,xmm0,xmm0 
00000151A32E57EE 430 c5fb2ac0  vcvtlsi2sd xmm0,xmm0,rax 
        ;;; <@139,#200> gap 
00000151A32E57F2 434 c5f928ca  vmovapd xmm1,xmm2 
        ;;; <@140,#126> mod-d 
00000151A32E57F6 438 4989e2   REX.W movq r10,rsp 
00000151A32E57F9 441 4883ec28  REX.W subq rsp,0x28 
00000151A32E57FD 445 4883e4f0  REX.W andq rsp,0xf0 
00000151A32E5801 449 4c89542420  REX.W movq [rsp+0x20],r10 
00000151A32E5806 454 48b830d4124001000000 REX.W movq rax,000000014012D430 
00000151A32E5810 464 ffd0   call rax 
00000151A32E5812 466 488b642420  REX.W movq rsp,[rsp+0x20] 
        ;;; <@142,#126> lazy-bailout 
        ;;; <@144,#202> number-tag-d 
00000151A32E5817 471 498b9dc06f0400 REX.W movq rbx,[r13+0x46fc0] 
00000151A32E581E 478 488bc3   REX.W movq rax,rbx 
00000151A32E5821 481 4883c010  REX.W addq rax,0x10 
00000151A32E5825 485 493b85c86f0400 REX.W cmpq rax,[r13+0x46fc8] 
00000151A32E582C 492 0f878f030000 ja 1409 (00000151A32E5BC1) 
00000151A32E5832 498 498985c06f0400 REX.W movq [r13+0x46fc0],rax 
00000151A32E5839 505 48ffc3   REX.W incq rbx 
00000151A32E583C 508 4d8b5560  REX.W movq r10,[r13+0x60] 
00000151A32E5840 512 4c8953ff  REX.W movq [rbx-0x1],r10 
00000151A32E5844 516 c5fb114307  vmovsd [rbx+0x7],xmm0 
        ;;; <@146,#130> gap 
00000151A32E5849 521 488b45a0  REX.W movq rax,[rbp-0x60] 
00000151A32E584D 525 488b7db8  REX.W movq rdi,[rbp-0x48] 
00000151A32E5851 529 488b75c0  REX.W movq rsi,[rbp-0x40] 
00000151A32E5855 533 488b4dc8  REX.W movq rcx,[rbp-0x38] 
00000151A32E5859 537 488b55b0  REX.W movq rdx,[rbp-0x50] 
00000151A32E585D 541 4c8b4da8  REX.W movq r9,[rbp-0x58] 
00000151A32E5861 545 4c8b4598  REX.W movq r8,[rbp-0x68] 
00000151A32E5865 549 c5fb109578ffffff vmovsd xmm2,[rbp-0x88] 

总之不同的是,对于 “加” 的情况下MOD(%使用高度专业化的mod-by-power-of-2-i机器代码但它使用mod-d其基于浮点算术implemen所做的“减”情况下,进行)动作塔季翁。

还要注意mod-by-power-of-2-i机器代码不支持负值。可以粗略地将其重写为:

if (rax < 0) { 
    rax = -rax; 
    rax = rax & 1; 
    rax = -rax; 
} 
else { 
    rax = rax & 1; 
} 

因此,这不是仅针对正值的优化机器码的情况。

2.生成的代码的差异似乎来自类型推断工作不同的事实。通过--trace_representation生产日志显示的简化程序“加”和“减”的情况之间存在以下区别:

var f_minus = function(log) { 
    var str = '', i = gridSize, j; 
    while (i--) { 
     j = gridSize; 
     while (j--) { 
     var ttt = (i - j) % 2 
     } 
    } 

    if(log) { 
    if(ttt == -1) 
     console.log(t); 
    } 
} 


var f_plus = function(log) { 
    var str = '', i = gridSize, j; 
    while (i--) { 
     j = gridSize; 
     while (j--) { 
     var ttt = (i + j) % 2 
     } 
    } 

    if(log){ 
    if(ttt == -1) 
     console.log(t); 
    } 
} 

比较

[marking 00000025D4303E91 <JS Function f_minus (SharedFunctionInfo 00000278933F61C1)> for optimized recompilation, reason: small function, ICs with typeinfo: 8/12 (66%), generic ICs: 0/12 (0%)] 
[compiling method 00000025D4303E91 <JS Function f_minus (SharedFunctionInfo 00000278933F61C1)> using Crankshaft OSR] 
#37 Phi is used by real #110 Branch as v 
#38 Phi is used by real #58 Add as t 
#38 Phi is used by real #69 StackCheck as v 
#38 Phi is used by real #70 LoadContextSlot as v 
#38 Phi is used by real #122 CompareGeneric as t 
#38 Phi is used by real #132 LoadGlobalGeneric as v 
#38 Phi is used by real #134 LoadNamedGeneric as v 
#38 Phi is used by real #136 LoadGlobalGeneric as v 
#38 Phi is used by real #141 CallWithDescriptor as v 
#38 Phi is used by real #156 Return as v 
#38 Phi is used by real #101 Mod as t 
#38 Phi is used by real #98 Sub as t 
#38 Phi is used by real #95 StackCheck as v 
#38 Phi is used by real #84 Add as t 
#47 Phi is used by real #56 ForceRepresentation as s 
#49 Phi is used by real #122 CompareGeneric as t 
#77 Phi is used by real #83 ForceRepresentation as s 
generalizing use representation 'v' of #40 Phi with uses of #47 Phi 's' 
generalizing use representation 'v' of #42 Phi with uses of #49 Phi 't' 
generalizing use representation 't' of #42 Phi with uses of #78 Phi 'v' 
generalizing use representation 'v' of #49 Phi with uses of #78 Phi 'v' 
generalizing use representation 'v' of #78 Phi with uses of #49 Phi 't' 
Changing #101 Mod representation v -> i based on inputs 
Changing #101 Mod representation i -> d based on output 
Changing #98 Sub representation v -> s based on inputs 
Changing #98 Sub representation s -> i based on use requirements 
Changing #84 Add representation v -> i based on inputs 
... 

这个

[marking 000002C17CAAB341 <JS Function f_plus (SharedFunctionInfo 00000278933F6289)> for optimized recompilation, reason: small function, ICs with typeinfo: 8/12 (66%), generic ICs: 0/12 (0%)] 
[compiling method 000002C17CAAB341 <JS Function f_plus (SharedFunctionInfo 00000278933F6289)> using Crankshaft OSR] 
#37 Phi is used by real #110 Branch as v 
#38 Phi is used by real #58 Add as t 
#38 Phi is used by real #69 StackCheck as v 
#38 Phi is used by real #70 LoadContextSlot as v 
#38 Phi is used by real #122 CompareGeneric as t 
#38 Phi is used by real #132 LoadGlobalGeneric as v 
#38 Phi is used by real #134 LoadNamedGeneric as v 
#38 Phi is used by real #136 LoadGlobalGeneric as v 
#38 Phi is used by real #141 CallWithDescriptor as v 
#38 Phi is used by real #156 Return as v 
#38 Phi is used by real #101 Mod as t 
#38 Phi is used by real #98 Add as t 
#38 Phi is used by real #95 StackCheck as v 
#38 Phi is used by real #84 Add as t 
#47 Phi is used by real #56 ForceRepresentation as s 
#49 Phi is used by real #122 CompareGeneric as t 
#77 Phi is used by real #83 ForceRepresentation as s 
generalizing use representation 'v' of #40 Phi with uses of #47 Phi 's' 
generalizing use representation 'v' of #42 Phi with uses of #49 Phi 't' 
generalizing use representation 't' of #42 Phi with uses of #78 Phi 'v' 
generalizing use representation 'v' of #49 Phi with uses of #78 Phi 'v' 
generalizing use representation 'v' of #78 Phi with uses of #49 Phi 't' 
Changing #101 Mod representation v -> i based on inputs 
Changing #98 Add representation v -> s based on inputs 
Changing #98 Add representation s -> i based on use requirements 
Changing #84 Add representation v -> i based on inputs 
... 

有趣的差别该行

Changing #101 Mod representation i -> d based on output 

只存在于f_minus中,但不存在f_plus的情况。出于某种原因,编译器认为在f_minus中,case类型应该是Double而不是Integer,以猜测输出值。有趣的是,如果我改变行

 var ttt = (i - j) % 2 

 var ttt = (i - j + gridSize) % 2; 

再次开始产生快速mod-by-power-of-2-i代码。所以是的,它很可能是输出值影响优化编译器。但目前还不清楚为什么会在这种特殊情况下发生。

乍一看,这种行为看起来像是优化编译器中的一个错误,它没有注意到mod-by-power-of-2-i也可以处理“负”情况。我无法追踪为什么会发生这种情况,只是看了一下源代码,因此欢迎进一步的输入。

+0

很好的答案!然而,似乎'(-i-j)%3'而不是'(+ i + j)%3'会导致相当的性能下降。我想,专门的“2-i-mod-by-power-power”的效率只是战斗的一半。也许保留divals符号的必要性('-1%2'是'-1',你知道)强制实现基于浮点的实现或者双rax = -rax;(它有它的价格)... –

+0

@斯坦尼斯拉夫·卡拉林的关键区别显然不在于2-i-mod-by-power-of-2-i中,这有一些额外的好处。重要的区别是整数与浮点运算。对于'+3'来说''%3'生成的mod-by-const-i'代码比'mod-by-power-of-2-i'效率低一点,但仍然比'mod -d'。而这个(整数与浮点数)就是我在我的(未完成的)调查的后半部分指出的。 – SergGr

+0

@StanislavKralin,至于保留符号,我在我的回答中明确提到'mod-by-power-of-2-i'可以处理负值(参见** 2。**以上的段落)。所以问题不在于那个代码,而是在类型推断阶段有一些更高级别的逻辑错误地猜测。 – SergGr

的模数,而不是使用昂贵的模块化操作

((i - j) % 2 === 0) 

可以使用位运算

(((i-j)&1) === 0) 

正如意见建议的SBS你也应该尝试

(((i^j)&1) === 0) 
+0

不错的建议,兄弟。但任何想法为什么是减法搞砸? –

+0

尝试一下建议并测量它的收益。然后我们可以进一步分析。 – SBS

+0

谢谢很多我不知道这个诀窍,最后加快了程序605毫秒mozilla @sbs –

我的测试(avera 5 GE运行的每个)在的NodeJS显示

(i - j) % 2 // 1170ms 
(i + j) % 2 // 720ms 
Math.abs(i - j) % 2 // 720ms 
Math.abs(i + j) % 2 // 720ms 
(gridSize + i + j) % 2 // 715ms 
(gridSize + i - j) % 2 // 710ms 
(-i - j) % 2 // 1500ms 

有些奇怪那里,大惊喜呼吁Math.absi + j情况下几乎为零的效果,但更令人惊讶的是,加入gridSize使gridSize + i - j情况下,最快的! !

但我可以从它的主要问题是

(i - j) % 2 

许多(i - j)< 0(其中一半?)

随着(-i - j)ALL< 0

结论:当面对负数的模操作时,性能d ecreases显著

笔记,你应该能够在浏览器中使用

console.time("looping"); 
... 
console.timeEnd("looping"); 

为好,这样你就可以运行相同的代码,而无需在浏览器中

不能确定使用performance.now()这个“基准”有多浓厚,但是

console.time("positive"); 
 
(function() { 
 
    var size = 100000; 
 
    var v = 0; 
 
    while (size--) { 
 
     v+=(+size)%2 
 
    } 
 
})(); 
 
console.timeEnd("positive"); 
 

 
console.time("negative"); 
 
(function() { 
 
    var size = 100000; 
 
    var v = 0; 
 
    while (size--) { 
 
     v+=(-size)%2 
 
    } 
 
})(); 
 
console.timeEnd("negative");

+0

我觉得这就是它 –

片段中最耗时的操作是字符串concat。所以我决定删除它并使用一个实际的数组。下面是2个实现,其中一个来自本书,另外一个来自您的代码。我的机器上的书本版本快两倍。此外,使这种变化加快约6倍的代码。也许数组大小在这里也有效果,我们应该测试一个没有数组的版本。

我的结论:问题实际上可能不是负模量,但其他一些部分可能会影响基准。

console.time("looping"); 
 

 
var board = []; 
 

 
var size = 5000; 
 

 
for (var y = 0; y < size; y++) { 
 
    board[y]=[]; 
 
    for (var x = 0; x < size; x++) { 
 
    if ((x + y) % 2 === 0) 
 
     board[y][x] = " "; 
 
    else 
 
     board[y][x] = "#"; 
 
    } 
 
    
 
} 
 

 
    
 
console.timeEnd("looping"); 
 

console.time("looping2"); 
 
    var gridSize = 5000, str = [], i = gridSize, j; 
 
    while (i--) { 
 
     str[i]=[]; 
 
     j = gridSize; 
 
     while (j--) { 
 
     if ((i - j) % 2 === 0) 
 
      str[i][j] = " "; 
 
     else 
 
      str[i][j] = "#"; 
 
     } 
 
    } 
 

 
    console.timeEnd("looping2");

你的主要问题是嵌套循环。尽量避免他们:

console.time("looping"); 
 
var gridSize = 5000; 
 
var board = ""; 
 
var firstLine = ""; 
 
var secondLine = ""; 
 

 
var counter = 0; 
 
var isBlack = true; 
 

 
while (counter < gridSize) { 
 
    firstLine += isBlack ? "#" : " "; 
 
    secondLine += isBlack ? " " : "#"; 
 
    isBlack = !isBlack ; 
 
    counter++ ; 
 
} 
 

 
var counter = 0; 
 
var isBlack = false; 
 
while (counter < gridSize) { 
 
    board = board + (isBlack ? firstLine : secondLine) + "\n"; 
 
    isBlack = !isBlack; 
 
    counter++; 
 
} 
 
// console.log(board); 
 
console.timeEnd("looping");

存在着更高效的算法,以填补大串短重复的模式。但是,代码太复杂了。我希望这个算法用于padStart()的实现。 (Windows 7中,Firefox的54.0 64位)

console.time("looping"); 
 
var gridSize = 5000; 
 
var firstLine = "".padStart(gridSize, " #"); 
 
var secondLine = "".padStart(gridSize, "# "); 
 
var board = "".padStart(gridSize * (gridSize + 1), firstLine + "\n" + secondLine + "\n"); 
 
// console.log(board); 
 
console.timeEnd("looping");

我的机器上结果:

+------------------------+---------+ 
|   Method   | Time | 
+------------------------+---------+ 
| nested loops, (-i-j)%2 | 1500 ms | 
| nested loops, (i+j)%2 | 500 ms | 
| nested loops, (i^j)&1 | 500 ms | 
| manual filling   | 2 ms | 
| padStart()    | 1 ms | 
+------------------------+---------+ 

顺便说一句,国际棋联规则says

将棋盘放置在玩家之间,使得玩家右边角落附近的 为白色。