OSTEP Note: Ch2-10, Intro & CPU Virtualization

These are my reading notes for Operating Systems: Three Easy Pieces; the full book is available at:
http://pages.cs.wisc.edu/~remzi/OSTEP/

This post covers chapters 2 through 10, which explain how the OS runs multiple processes on a single CPU. The book approaches the problem from two angles:

  • Mechanism: Limited Direct Execution
    The basic machinery behind CPU virtualization: how to share the CPU while the OS keeps absolute control (so that every process gets a fair share of resources and no process can read another process's private data).
  • Policy: the various scheduling algorithms
    How the OS decides who gets the CPU: starting from the idealized STCF (shortest time-to-completion first) algorithm, then the MLFQ proposed by Turing Award winner Corbato, the Linux CFS, and scheduling on multiprocessors.

The notes follow:

Intro

What is OS

  • Virtualization: physical resources -> virtual resources
    • CPU/ Memory
  • APIs: System call, allow users to use virtual machine
  • Resource Manager

Question

  • Virtualization: How does OS attain virtualization?
    • policy, efficiency, hardware support
  • Concurrency: How to write program?
    • OS mechanism/ hardware support
  • Persistence: Data is easily lost; the OS provides the file system
    • mechanism/ performance/ reliability

CPU Virtualization

The Abstraction: Process

RoadMap

  • process is a running program
  • Many processes share the CPU
    • Mechanism(low-level): time-sharing, with context switch
    • Policy(high-level): scheduling algorithm

Process in detail

a running program, summarized by the pieces of the system it accesses, such as:

  • Memory -> address space
  • Registers: Program counter, stack pointer, frame pointer
  • I/O information: a list of files the process opened

Process API

create, destroy, wait, miscellaneous control, status (how long it has run for, what state it is in)

  1. Creation

    Step 1. Load code and static data from disk into memory

    Modern OSes load lazily (only when the data is needed)

    Step 2. Allocate run-time stack, initialize args(argc, argv)

    Step 3. Allocate heap

    Step 4. Initialize file descriptors (stdin, stdout, stderr)

    Step 5. Jump to main(), context switch to new process

  2. Process States

    Running, Ready, Blocked(waiting for I/O)

Example: Process creation in UNIX (Ch.5)

  1. fork() create new process

    The child process is an almost exact copy of its parent: it gets its own address space (a copy of the parent's), and it inherits the files the parent opened; essentially everything else is the same!

  2. wait()

    Returns the pid of a child that has exited (blocking until one does); if the calling process has no children, the return value is -1.

  3. exec() execute another program

    overwrite the code segment of the current process with the target executable

    -> a successful exec() never returns

    Variants - p: search $PATH for the executable; e: pass an environment explicitly; l: pass args as a list of parameters (vs. v: args in an array)

    TIP: `man 2 pipe` shows the system-call (section 2) man page!

Q: Why separate fork() and exec()? Why not a single system call that creates a new process running another program?

A: It lets the shell modify the child's environment after fork() and before exec(), e.g. for I/O redirection.

  4. Process Control and Users

    Process control is achieved by sending a signal (e.g. via kill()) to a process or a process group.

Process Control Block

  • Process list - PCB(Process Control Block)
    • ready processes, blocked processes
  • About each process
    • Context (physical registers)
    • Process state(initial(Embryo), ready, running, blocked, final(zombie, exit but not cleaned up by parent))
    • PID, parent process, opened files
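A toy PCB as a Python dataclass; the field names are illustrative (loosely modeled on xv6's `struct proc`), not taken from any real kernel:

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    """Per-process bookkeeping the OS keeps on the process list."""
    pid: int
    parent: "PCB | None"
    state: str = "EMBRYO"        # EMBRYO -> READY -> RUNNING -> BLOCKED / ZOMBIE
    context: dict = field(default_factory=dict)    # saved registers (PC, SP, ...)
    open_files: list = field(default_factory=list)

init = PCB(pid=1, parent=None)
child = PCB(pid=2, parent=init)
child.state = "READY"
print(child.pid, child.state, child.parent.pid)
```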

Mechanism: Limited Direct Execution

Problem:

  1. Performance: implement virtualization without adding excessive overhead.
  2. Control: the OS must retain control and stop a process from executing unwanted code.

Problem 1: Restricted Operations

Process may need restricted resources: I/O devices, CPU, memory

System call: (in assembly) put the system-call number and arguments into designated registers, then execute the trap instruction.

  1. Executing the trap instruction raises the privilege level and saves the process state (PC, flags, …) on the kernel stack.

  2. Trap table: holds the trap handlers for hardware interrupts, keyboard interrupts, and system calls.

    Each system call is assigned a system-call number.

    Q1. Is kernel code executed in the kernel address space? Is it another process?

A: Yes, it runs in the kernel address space, but it is not a separate process: the kernel code runs in the context of the process that made the system call.

Q2. What if you could install your own trap table? Could you take over the machine?

A: Essentially yes: installing your own handlers would let any trap jump to arbitrary code running at kernel privilege.

Q3. What is kernel stack and kernel register?

A: During a trap the user stack can't be used, because a malicious process could have corrupted it. The kernel stack lives in the kernel address space, is used only by kernel code, and cannot be accessed by the user, so it is safe.

Problem2: Switching Between Processes

How does OS regain control?

Solution 1: Wait for system call.

Solution 2: Timer interrupt. (Similar to system call, but called from time to time)

What about an interrupt arriving while a handler is running? The OS may disable interrupts for the duration of the handler.

lmbench can be used to measure the cost of a context switch.


Timer interrupt: some registers are saved by the hardware itself, since the interrupt is raised by hardware rather than by an instruction.

Q1. What happens while interrupts are disabled: are interrupts lost?

A. Typically the hardware latches a pending interrupt and services it once interrupts are re-enabled, but the exact behavior depends on the hardware.
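A user-level analogy for the timer interrupt (not how the kernel does it, just an illustration): `setitimer` delivers SIGALRM periodically, asynchronously interrupting a busy loop, much as a hardware timer lets the OS regain control from a spinning process.

```python
import signal
import time

ticks = 0

def on_tick(signum, frame):
    """Plays the role of the OS's timer-interrupt handler."""
    global ticks
    ticks += 1

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)   # fire every 10 ms

deadline = time.time() + 0.1
while time.time() < deadline:
    pass                                           # "user code" busy-looping

signal.setitimer(signal.ITIMER_REAL, 0, 0)         # disarm the timer
print(ticks > 0)
```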

Scheduling

Metrics

Performance:
T_{turnaround} = T_{completion} - T_{arrival}
T_{response} = T_{first\ run} - T_{arrival}
Fairness
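The two metrics above, directly in code, with a small worked example (the job lengths are made up):

```python
def turnaround(arrival, completion):
    """Time from arrival to completion."""
    return completion - arrival

def response(arrival, first_run):
    """Time from arrival until the job first runs."""
    return first_run - arrival

# Three equal jobs arriving at t=0, each running 10 time units under FIFO:
# completions at 10, 20, 30 -> average turnaround (10+20+30)/3 = 20.
avg = sum(turnaround(0, c) for c in (10, 20, 30)) / 3
print(avg)
```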

Basic Algorithm

FIFO/ FCFS

Advantage:

  1. Simple, easy to implement.
  2. Low overhead
  3. No starvation
  4. Optimal average turnaround time for same-sized jobs

Disadvantage:

  1. Convoy effect: a heavyweight job that arrives first makes everyone queue behind it, giving poor average turnaround for mixed-size jobs (i.e. the result is sensitive to arrival order).
  2. Not responsive to interactive tasks

Shortest job first(SJF)

Advantage:

  1. Optimal average turnaround time when all jobs arrive simultaneously.

Disadvantage:

  1. If a short task arrives slightly late, the convoy effect still occurs.
  2. Bad for response time
  3. Starve long jobs
  4. Needs estimate of execution time

Shortest time-to-completion first (STCF)

Preemptive

Disadvantage:

  1. Needs estimate of execution time

Round Robin

Advantages:

  1. Decrease response time
  2. No starvation

Disadvantage:

  1. Increases turnaround time (especially for simultaneous, equal-length jobs).
  2. Extra overhead.

Summary

  1. There is a tradeoff between turnaround time and response time; each basic algorithm is good at one or the other. Balancing the two requires a more advanced algorithm.

  2. When I/O is considered, the scheduler should overlap one job's I/O with another job's computation to hide the I/O time.

  3. Without an oracle that knows job lengths, SJF/STCF cannot be applied directly.
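Toy numbers for the tradeoff above: all jobs arrive at t=0, and only the run lengths matter (the lengths are made up; no real scheduler is consulted):

```python
def fifo_stats(lengths):
    """Average (turnaround, response) when jobs run to completion in list order."""
    t, turnarounds, responses = 0, [], []
    for length in lengths:
        responses.append(t)        # job first runs when the queue reaches it
        t += length
        turnarounds.append(t)      # arrival = 0, so completion == turnaround
    n = len(lengths)
    return sum(turnarounds) / n, sum(responses) / n

jobs = [100, 10, 10]               # a long job arrives first: convoy effect
print(fifo_stats(jobs))            # (110.0, 70.0)
print(fifo_stats(sorted(jobs)))    # SJF order: both metrics improve here
```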

The Multi-Level Feedback Queue(MLFQ)

Target: Short turnaround time + responsive + without oracle

Solution: Priority + Change priority based on history

Workload: short-running interactive jobs + long-running CPU-bound jobs(response time not important)

Rule 1. Higher priority runs first.

Rule 2. Jobs of equal priority run in Round Robin.

Change priority

  • Move down

    Rule 3. A new job starts at the highest priority.

    Rule 4a. If a job uses up an entire time slice, reduce its priority. -> CPU-bound

    Rule 4b. If a job gives up the CPU before its slice ends, keep its priority. -> interactive

    Intent: approximate SJF. A new job is assumed to be short; if it is actually long, it slowly moves down the queues. Interactive jobs, which give up the CPU before the slice expires, are not penalized.

    Problem:

  1. Starvation

  2. User can game the scheduler

  3. Program change its behavior (CPU bound->interactive)

  • Priority Boost

    Solves Problems 1 and 3.

    Rule 5. After some time period S, move all the jobs to the topmost queue.

  • Better Accounting

    Solves Problem 2.

    Rule 4. Once a job uses up its time allotment, reduce priority.
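A sketch of Rule 4's anti-gaming accounting plus Rule 5's boost. The constants and class structure are illustrative, not from any real scheduler:

```python
TOP, BOTTOM = 2, 0
ALLOTMENT = 10          # total CPU a job may use at one level before demotion

class Job:
    def __init__(self):
        self.priority = TOP          # Rule 3: start at the top priority
        self.used = 0                # CPU used at the current level

    def ran_for(self, t):
        self.used += t               # counted even if the job yields early,
        if self.used >= ALLOTMENT:   # so the scheduler can't be gamed (Rule 4)
            self.priority = max(BOTTOM, self.priority - 1)
            self.used = 0

    def boost(self):                 # Rule 5: periodic boost fixes starvation
        self.priority, self.used = TOP, 0

j = Job()
for _ in range(4):
    j.ran_for(9); j.ran_for(1)       # "sneaky" job yields just before each slice ends
print(j.priority)                    # demoted anyway, thanks to the allotment
j.boost()
print(j.priority)
```

Under the old Rule 4a/4b, yielding at 99% of each slice would keep the job at top priority forever; with the allotment it is demoted regardless.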

Tuning MLFQ and Other Issues

  1. #queues
  2. Time slice per queue (shorter time slice for higher priority)
  3. Priority boost time S

User advice is needed!

Proportional Share

Lottery Scheduling

Each job holds some tickets; its share of the tickets determines its share of CPU time. Example with 100 tickets:

A: tickets 0-74

B: tickets 75-99

Scheduling is just drawing a random number in 0-99 and running the job that holds the winning ticket.

  • User assign ticket currency
  • Ticket can be transferred between processes
  • Ticket inflation(get a boost in ticket) in cooperative environment

Tip: keep the job list sorted with the highest ticket counts first, so the search for the winner terminates sooner.
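A minimal lottery pick over (job, tickets) pairs, with the biggest ticket holder first as the tip above suggests; the ticket values are made up:

```python
import random

jobs = [("A", 75), ("B", 25)]           # 100 tickets total

def pick(jobs, total=100):
    """Draw a winning ticket and walk the list until we pass it."""
    winner = random.randrange(total)
    counter = 0
    for name, tickets in jobs:
        counter += tickets
        if winner < counter:
            return name

random.seed(0)
wins = sum(pick(jobs) == "A" for _ in range(10_000))
print(wins / 10_000)                    # should be close to A's share, 0.75
```

No per-job state is needed between decisions, which is what makes the lottery easier to implement than stride scheduling.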

Q1. How to assign ticket?

A. Let the user decide? This problem actually remains open.

Q2. Why not deterministic?

A. Compared with stride scheduling, lottery scheduling has no global state and is easier for implementation.

Stride Scheduling

Proportional share with a deterministic scheduler: each job gets a stride inversely proportional to its tickets and a pass counter; always run the job with the lowest pass, then advance its pass by its stride.
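A stride-scheduling sketch (the big constant and ticket counts are made-up example values):

```python
BIG = 10_000
tickets = {"A": 100, "B": 50, "C": 250}
stride = {j: BIG // t for j, t in tickets.items()}   # fewer tickets -> bigger stride
passes = {j: 0 for j in tickets}

schedule = []
for _ in range(7):
    j = min(passes, key=passes.get)     # deterministic, unlike the lottery
    schedule.append(j)
    passes[j] += stride[j]

print(schedule)                          # C (most tickets) runs most often
```

The cost of determinism is the global per-job pass state, which (for example) makes it awkward to add a new job mid-run.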

The Linux Completely Fair Scheduler (CFS)

fair-share scheduling, efficient, scalable

Idea: Count vruntime of each process. Pick the process with lowest vruntime to run. (RR if there is a tie)

Control Parameters

  1. sched_latency: total schedule latency, with the goal of giving each process at least one chance to run in that cycle

    time slice = sched_latency / (number of processes)

    min_granularity: a lower bound on the time slice, to guarantee it never gets too small

  2. Weighting (Niceness)

    adjust priority of the process

    nice varies from -20 to +19, can be mapped to weight
    time\_slice_k = \dfrac{weight_k}{\sum_{i=0}^{n-1} weight_i} \cdot sched\_latency
    vruntime_i = vruntime_i + \dfrac{weight_0}{weight_i} \cdot runtime_i
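Plugging numbers into the formulas above; the weights come from the kernel's nice-to-weight table (nice 0 -> 1024, nice 5 -> 335), and sched_latency = 48 ms is just an example value:

```python
sched_latency = 48.0                      # ms
weights = {"A": 1024, "B": 335}           # A: nice 0, B: nice 5
total = sum(weights.values())

# time_slice_k = weight_k / sum(weights) * sched_latency
time_slice = {k: sched_latency * w / total for k, w in weights.items()}

# vruntime after each job runs for its own slice, scaled by weight_0/weight_i
# (weight_0 is the nice-0 weight, 1024):
vruntime = {k: (1024 / w) * time_slice[k] for k, w in weights.items()}

print({k: round(v, 1) for k, v in time_slice.items()})   # A gets ~3x B's slice
print({k: round(v, 1) for k, v in vruntime.items()})     # equal vruntime: fair
```

The higher-priority job gets a longer real-time slice, yet both accumulate vruntime at the same effective rate, which is exactly what lets CFS keep picking the minimum-vruntime process.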

Red-Black Trees

  • Stores the runnable processes, with vruntime as the key

  • Makes finding the lowest-vruntime process and reinserting it efficient (O(log n))

Avoid Starvation

  • Set the vruntime of a newly woken process to the minimum vruntime in the tree, so a long sleep does not let it monopolize the CPU afterwards

Multiprocessor Scheduling(Advanced)

How can we extend the single-CPU scheduling ideas to work on multiple processors?

Cache Affinity

Keep the process on the same CPU if possible.

Single-Queue Multiprocessor Scheduling (SQMS)

All CPUs pull the next process from a single shared queue.

Advantage:

  1. Simple, reuse the knowledge from single processor

Disadvantage:

  1. Locking is needed on every queue access, introducing synchronization overhead
  2. No cache affinity (jobs bounce between CPUs)

Multi-Queue Multiprocessor Scheduling (MQMS)

Each CPU has its own queue, and each process sticks to one queue.

Advantage:

  1. cache affinity
  2. scalable

Disadvantage:

  1. Load imbalance causes problems -> jobs must be migrated between queues, called work stealing
    • How often to steal is itself a tuning decision.
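A toy version of the work-stealing check between per-CPU queues; the queue contents and threshold are made up:

```python
queues = {0: ["A", "B", "C", "D"], 1: []}

def steal_if_imbalanced(queues, me, peer, threshold=2):
    """If the peer's queue is longer by `threshold`, migrate one job to us."""
    if len(queues[peer]) - len(queues[me]) >= threshold:
        queues[me].append(queues[peer].pop())   # migration costs cache affinity
        return True
    return False

print(steal_if_imbalanced(queues, me=1, peer=0))   # CPU 1 steals a job
print(queues)
```

Checking too often reintroduces SQMS-style synchronization overhead; checking too rarely leaves CPUs idle, hence the tuning decision noted above.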

Linux Multiprocessor Scheduler

  • O(1)
    • MLFQ with multiple queue
  • CFS with multiple queue
  • BF Scheduler
    • Single queue
    • Earliest Eligible Virtual Deadline First