CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

试图还原老师讲课的思路。

Switch statements

一个例子：

很多人会认为switch是decision trees，实际上二者不一样，决策树是一个一个case判断执行；switch当条件不满足时就不会执行，直到遇到一个满足条件的case就开始顺序往下执行，即使不匹配的case也会被执行。

也有人认为switch和if是等价的，其实不然。case后面是只能跟常量（const value）的,编译时会对 switch 进行优化，根据 case 标签后面的常量值，生成跳转表，只经过少数次数的比较，就可以跳到对应标签下面。所以，标签也是不能重复的。如果允许变量，switch 就退化成一连串的 if else, 对于一个数据也只能从头到尾地进行比较，也就失去了 switch 的意义。跳转表和逐个比较，这两种方式的复杂度差很多。

然后我们来看上面的例子，这里面包含了case的几种情形：
1.顺序往下，都有break：1,2,3

2.多个标签：5&6

3.落空的case：2

4.缺失的case：4

跳转表jump table

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

跳转表其实是一个指针数组。Code block与跳转表里的指针一一对应。

对于情形1：顺序往下，都有break：自然是一个指针对应一个code block，执行完return。

对于情形2：多个标签：多个指针指向同一个code block。

对于情形3：落空的case：会执行它对应的code block，但是会继续执行后面。

对于情形4：缺失的case：跳转表里没有其指针，当然也没有相应的code block。

那么这些case是怎样和指针数组的index相对应的呢？

还是说上面的例子，case 1，2 ，3分别是index 0, 1,2，case 4缺失，就指向default case，case 5，6重复了，就都是index3，default就是default标签(可能有误)。

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

case最高是6，这样只要x的值大于6的话就直接进入default。一个指针是4字节，4*%eax就是所需指针在的地址，

下面我们看看跳转表的结构：

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

每一个target需要4字节；基地址是.L4。

1.直接跳转：jmp .L2 直接跳转到label .L2

2.非直接跳转：jmp *.L4(,%eax,4) 跳转表的头是.L4，标签与头的距离时4的倍数，有效地址是.L4 + eax*4 （0 ≤ x ≤ 6）

对应关系如下：

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

从这张图我们可以看出x的值是从0开始的，缺失的0和4都是指向default的，标签是.L8；case 1 2 3是正常的，分别对应标签.L3,.L5.L9；case 5 6都指向标签.L7。另外，switch完成则跳转到标签.L2。

然后我们看看code block的内容：

x==1:

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

这里先取y，然后取z并计算y*z，然后跳到.L2

x==2,x==3:

处理Fall-Through，

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

可见，编译器是聪明的，case 2没有用到w的初始值，所以这里先计算w=y/z，然后跳到merge，也就是case 3 的主体部分。而case 3用到了w,所以先得到w的初始值，然后计算w的新值。

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

注：cltd指令：将%eax寄存器的值符号扩展32位到%edx寄存器，也就是说，如果%eax寄存器的二进制序列的最高位为0，则cltd指令将把%edx置为32个0，相反，如果%eax寄存器的二进制序列最高位为1，则cltd指令将会自从填充%edx寄存器为32个1。

这段汇编也很好理解了。

x==5,x=6:

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

同样好理解。

结束：

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

最后在提几句：

1.跳转表通过cases避免了顺序执行，它执行是Constant time，而不是linear time

2.可以利用跳转表处理tags缺失或者重复的情况

3.利用顺序执行或者跳转表处理fall-through

4.尽量别去初始化w，除非不可避免

x86-64 Switch Implementation

64位和32位本质上是一样的。不过指针变成8字节了。

总结一下，上节课和这节课所学的：

¢ C Control

§ if-then-else

§ do-while

§ while, for

§ switch

¢ Assembler Control

§ Conditional jump

§ Conditional move

§ Indirect jump

§ Compiler generates code sequence to implement more complex control

¢ Standard Techniques

§ Loops converted to do-while form

§ Large switch statements use jump tables

§ Sparse switch statements may use decision trees

IA 32 Procedures

IA32 Stack

理解栈就能理解function call，包括递归。

栈其实就是一块内存，一块遵循栈规则的内存。栈总是朝着低地址的方向生长。

栈的结构：

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

寄存器%esp，成为栈指针（Stack Pointer），存放着栈的最低地址也就是栈顶（Stack “Top”）。

PUSH操作：pushl Src

执行这句话的流程是，得到操作数src，然后将%esp的存放的地址值减4，最后把操作数写到当前%esp给的地址处。

POP操作：popl Dest

执行这句话的流程是，从%esp给的地址处读取值，然后将%esp加4，最后把值存到dest中。

利用栈完成call:

§ Push return address on stack

§ Jump to label

注：return address——Address of the next instruction right after call

利用栈完成return：

§ Pop address from stack

§ Jump to address

Stack Structure

栈帧结构：

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

栈有两个极其重要的寄存器：

Base pointer:栈帧的最高地址

Stack pointer:栈帧的最低地址（也就是栈生长的方向）

Base pointer位于当前栈帧和和调用它的栈帧之间

Stack pointer位于当前栈帧和未分配的内存区域（growth area）之间

理解帧结构：

首先我们要理解function call的caller（function that doing the calling）和callee（function that being called）是如何工作的。他们俩有各自的stack frame。caller开始调用callee时，首先要set up,然后caller要给callee操作数，所以操作数一定是在caller的栈帧里的，它要先把这些parameters记下来；最后，当callee完成时，它要return，return到它执行前的地方，继续执行下面的指令（instruction），那么谁知道下一个指令在哪里呢，答案是caller，等callee返回，caller继续完成下面的任务。因此，arguments（实参）和return address都在caller的栈帧里。而且顺序必须是arguments-->return address，因为caller要先提供操作数，不然callee不知道加减乘除什么东西，然后caller要提供return address，不然callee不知道要返回到什么地方。如果先写下返回地址，然后才写下arguments，那么返回时将读取 arguments，又调用一遍函数。

接下来就是调用函数了。并不是所有函数都会用到栈，比如一个加法函数，a+b返回结果，这个函数就只从caller那里读取a,b的值，但本身其栈的size为0，因为它没有存储任何东西，我只需要移动base pointer，然后pop base pointer。栈的维护（stack maintenance）由callee完成而不是由caller完成，caller不能替callee堆叠栈，因为caller不知道callee需要多少空间。（存疑？？）

caller（caller自然是其父辈的callee）运行时保存parents base pointer，保存后移动parents base pointer直到parents stack pointer.此时，caller就是一个size为0的栈帧了，如果它需要space，例如需要local variables,，只需将stack pointer往下移就可以了。

到这里我们就能理解为什么我们的第一个参数总是从地址8开始，一个是old base pointer，一个是return address。

看一个调用swap函数的例子

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

先存变量2，再存变量1.

第一句的subl是怎么来的？

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

这里就不解释了，有了第六节课的基础这里就能看懂。

知乎上一位老哥的举例详解：https://www.zhihu.com/search?type=content&q=%E5%87%BD%E6%95%B0%20%E6%A0%88

Calling Conventions

处理器里只有为数不多的register（8086 16个寄存器），但是使用频率非常高，register相当于硬件的变量。在call 之前所有寄存器随便用，但是一旦开始call了，就要把callee需要的寄存器释放出来(也就是把里面的数据都存到其他地方去)；callee也同样的，如果你要用caller的寄存器，可以，先把original value 存起来，等return时再恢复。

Conventions

§ “Caller Save”

§ Caller saves temporary values in its frame before the call

§ “Callee Save”

§ Callee saves temporary values in its frame before using

IA32/Linux+Windows Register Usage

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

%eax, %edx, %ecx：caller的寄存器，如果后续需要用哪些值，caller要在call之前把它们存起来；%eax也被用来返回整数值

%ebx, %esi, %edi：callee的寄存器，callee如果想用也是得先存起来

%esp, %ebp：callee保存的特殊值，退出程序时恢复为原始值

Illustrations of Recursion & Pointers

Recursion

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

pcount_r:

pushl %ebp #保存old base pointer

movl%esp, %ebp #将stack pointer赋值给base pointer

pushl %ebx #%ebx是callee的寄存器，所以用之前先保存下原始值

subl$20, %esp #---为递归call分配空间----（中间是有未使用的栈空间的）

movl8(%ebp), %ebx # 得到x的值，存在%ebx

movl$0, %eax #%eax赋为0

testl %ebx, %ebx #相与 & ,这里应该是testl %ebx, %eax 吧

je .L3 #=0的话跳转到标签.L3

movl%ebx, %eax #将x的值赋给%eax

shrl%eax #将x右移1位

movl%eax, (%esp) #将移位后的结果存到%esp指向的空间（传递参数）

call pcount_r #递归调用pcount_r

movl%ebx, %edx #将x赋给%edx

andl$1, %edx #x&1

addl%edx, %eax #相加，结果存在%eax中

.L3:

addl$20, %esp #---恢复%esp--

popl%ebx #恢复%ebx

popl%ebp #恢复%ebp

ret

关于递归Recursion：

1.处理起来没什么特别的，每一个function call 都有自己的私有存储（registers & local variables/return pointer）；遵循saving conventions；Last-In, First-Out

2.混合递归也是一样，P calls Q; Q calls P

Pointer

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

add3:

pushl%ebp

movl %esp, %ebp

subl $24, %esp # Alloc. 24 bytes

movl 8(%ebp), %eax

movl %eax, -4(%ebp)# Set localx to x

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

movl $3, 4(%esp) # 2nd arg = 3

leal -4(%ebp), %eax # &localx 用lea指令计算变量localsx的地址

movl %eax, (%esp) # 1st arg = &localx

call incrk

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

movl -4(%ebp), %eax # Return val= localx

leave

ret

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

IA32 Optimization: Inlining

将两个函数合并

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

add3:

pushl %ebp

movl %esp, %ebp

movl 8(%ebp), %eax

addl $3, %eax

popl %ebp

ret

Summary

1.call/return都是由stack来完成

2.Recursion也是遵循正常的调用规则

3.Pointers是value的地址

CMU15213_Lecture_07_Machine Level Programming III：Switch Statements and IA32 Procedures

Switch statements

IA 32 Procedures

Stack Structure

Calling Conventions

Illustrations of Recursion & Pointers

Summary

相关推荐