整理自CSE 444 Database Internals, Spring 2019 的课程Lectures，课程地址：https://courses.cs.washington.edu/courses/cse444/19sp/

Lecture5-6 Indexing

Searching in a Heap File

CSE444: Database Systems Internals notes3

Heap File Search Example

10,000 students

10 student records per page

Total number of pages: 1,000 pages

Find student whose sid is 80 – Must read on average 500 pages

Find all students older than 20 – Must read all 1,000 pages

Can we do better?

Sequential File

CSE444: Database Systems Internals notes3

Sequential File Example

Total number of pages: 1,000 pages

Find student whose sid is 80 – Could do binary search, read log2(1,000) ≈ 10 pages

Find all students older than 20 – Must still read all 1,000 pages

Can we do even better?

Note: Sorted files are inefficient for inserts/deletes

Creating Indexes in SQL

CSE444: Database Systems Internals notes3

Outline

CSE444: Database Systems Internals notes3

Indexes

Index: data structure that organizes data records on disk to optimize selections on the search key fields for the index

An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given search key value k

Indexes are also access methods!

So they provide the same API as we have seen for Heap Files
And efficiently support scans over tuples matching predicate on search key

CSE444: Database Systems Internals notes3

Search key = can be any set of fields

not the same as the primary key, nor a key

Index = collection of data entries

Data entry for key k can be:

(k, RID)
(k, list-of-RIDs)
The actual record with key k: In this case, the index is also a special file organization,Called: “indexed file organization”

Page Format Approach 2

CSE444: Database Systems Internals notes3

Different Types of Files

For the data inside base relations:

Heap file (tuples stored without any order)
Sequential file (tuples sorted on some attribute(s))
Indexed file (tuples organized following an index)

Then we can have additional index files that store (key,rid) pairs

Index can also be a “covering index”

Index contains (search key + other attributes, rid)
Index suffices to answer some queries

Primary Index

CSE444: Database Systems Internals notes3

Primary Index with Duplicate Keys

CSE444: Database Systems Internals notes3

Secondary Indexes

CSE444: Database Systems Internals notes3

Clustered vs. Unclustered Index

CSE444: Database Systems Internals notes3

Clustered/Unclustered

Primary index = clustered by definition

Secondary indexes = usually unclustered

Index Classification Summary

CSE444: Database Systems Internals notes3

Large Indexes

What if index does not fit in memory?

Would like to index the index itself

Hash-based index
Tree-based index

Hash-Based Index

CSE444: Database Systems Internals notes3

Tree-Based Index

How many index levels do we need?

Can we create them automatically? Yes!

Can do something even more powerful!

B+ Trees

Search trees

Idea in B Trees

Make 1 node = 1 page (= 1 block)

• Idea in B+ Trees

Keep tree balanced in height – dynamic rather than static
Make leaves into a linked list : facilitates range queries

CSE444: Database Systems Internals notes3

B+ Trees Basics

Parameter d = the degree

Each node has d <= m <= 2d keys (except root)

Each node also has m+1 pointers

CSE444: Database Systems Internals notes3

Each leaf has d <= m <= 2d keys:

CSE444: Database Systems Internals notes3

B+ Trees Properties

For each node except the root, maintain 50% occupancy of keys

Insert and delete must rebalance to maintain constraints

Searching a B+ Tree

CSE444: Database Systems Internals notes3

B+ Tree Example

CSE444: Database Systems Internals notes3

B+ Tree Design

CSE444: Database Systems Internals notes3

B+ Trees in Practice

CSE444: Database Systems Internals notes3

Insertion in a B+ Tree

CSE444: Database Systems Internals notes3

Deletion in a B+ Tree

CSE444: Database Systems Internals notes3

Summary on B+ Trees

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

Lecture5-6 Indexing

Searching in a Heap File

Heap File Search Example

Sequential File

Sequential File Example

Creating Indexes in SQL

Outline

Indexes

Page Format Approach 2

Different Types of Files

Primary Index

Primary Index with Duplicate Keys

Secondary Indexes

Clustered vs. Unclustered Index

Clustered/Unclustered

Index Classification Summary

Large Indexes

Hash-Based Index

Tree-Based Index

B+ Trees

B+ Trees Basics

B+ Trees Properties

Searching a B+ Tree

B+ Tree Example

B+ Tree Design

B+ Trees in Practice

Insertion in a B+ Tree

Deletion in a B+ Tree

Summary on B+ Trees

相关推荐