CSE444: Database Systems Internals notes3

整理自CSE 444 Database Internals, Spring 2019 的课程Lectures,课程地址:https://courses.cs.washington.edu/courses/cse444/19sp/

 

Lecture5-6 Indexing


Searching in a Heap File

CSE444: Database Systems Internals notes3

Heap File Search Example

10,000 students

10 student records per page

Total number of pages: 1,000 pages

Find student whose sid is 80 – Must read on average 500 pages

Find all students older than 20 – Must read all 1,000 pages

Can we do better?

Sequential File

CSE444: Database Systems Internals notes3

Sequential File Example

Total number of pages: 1,000 pages

Find student whose sid is 80 – Could do binary search, read log2(1,000) ≈ 10 pages

Find all students older than 20 – Must still read all 1,000 pages

Can we do even better?

Note: Sorted files are inefficient for inserts/deletes

 

Creating Indexes in SQL

CSE444: Database Systems Internals notes3

Outline

CSE444: Database Systems Internals notes3

Indexes

 

Index: data structure that organizes data records on disk to optimize selections on the search key fields for the index

An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given search key value k

Indexes are also access methods!

  • So they provide the same API as we have seen for Heap Files
  • And efficiently support scans over tuples matching predicate on search key

CSE444: Database Systems Internals notes3

Search key = can be any set of fields

  • not the same as the primary key, nor a key

Index = collection of data entries

Data entry for key k can be:

  • (k, RID)
  • (k, list-of-RIDs)
  • The actual record with key k: In this case, the index is also a special file organization,Called: “indexed file organization”

 

Page Format Approach 2

CSE444: Database Systems Internals notes3

 

Different Types of Files

For the data inside base relations:

  • Heap file (tuples stored without any order)
  • Sequential file (tuples sorted on some attribute(s))
  • Indexed file (tuples organized following an index)

Then we can have additional index files that store (key,rid) pairs

Index can also be a “covering index”

  • Index contains (search key + other attributes, rid)
  • Index suffices to answer some queries

 

Primary Index

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

Primary Index with Duplicate Keys

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

Secondary Indexes

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

Clustered vs. Unclustered Index

 

CSE444: Database Systems Internals notes3

Clustered/Unclustered

Primary index = clustered by definition

Secondary indexes = usually unclustered

 

Index Classification Summary

CSE444: Database Systems Internals notes3

 

Large Indexes

What if index does not fit in memory?

Would like to index the index itself

  • Hash-based index
  • Tree-based index

Hash-Based Index

CSE444: Database Systems Internals notes3

 

Tree-Based Index

How many index levels do we need?

Can we create them automatically? Yes!

Can do something even more powerful!

 

B+ Trees

Search trees

Idea in B Trees

  • Make 1 node = 1 page (= 1 block)

• Idea in B+ Trees

  • Keep tree balanced in height – dynamic rather than static
  • Make leaves into a linked list : facilitates range queries

CSE444: Database Systems Internals notes3

 

B+ Trees Basics

Parameter d = the degree

Each node has d <= m <= 2d keys (except root)

Each node also has m+1 pointers

CSE444: Database Systems Internals notes3

Each leaf has d <= m <= 2d keys:

CSE444: Database Systems Internals notes3

 

B+ Trees Properties

For each node except the root, maintain 50% occupancy of keys

Insert and delete must rebalance to maintain constraints

 

Searching a B+ Tree

CSE444: Database Systems Internals notes3

 

B+ Tree Example

 

CSE444: Database Systems Internals notes3

B+ Tree Design

CSE444: Database Systems Internals notes3

B+ Trees in Practice

CSE444: Database Systems Internals notes3

 

Insertion in a B+ Tree

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

Deletion in a B+ Tree

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

 

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

CSE444: Database Systems Internals notes3

 

Summary on B+ Trees

CSE444: Database Systems Internals notes3