Skip to content

02_On Disk B Tree Design

youngSSS edited this page Oct 30, 2020 · 2 revisions

Disk File Layout

1. File Format

Data file's offset 0 - 4095 is always 'header_page'

All page's size is fixed as 4096 bytes.

So, the start offset of page in data file is 4096 * pagenum.

image

Page Layout

All page sizes are 4096 bytes. 4 kinds of pages.

1. Header Page

image

2. Free Page

image

3. Internal Page

image

4. Leaf Page

image

5. Page Header

Internal Page Header

image

Leaf Page Header

image

Page Implementation

union is used to integrate each page structure.

typedef struct page_t {
	union {
		headerPage h;
		freePage f;
		page p;
	};
} page_t;

Each page is structured in different way.

1. Header Page

typedef struct headerPage {
	pagenum_t free_pagenum;
	pagenum_t root_pagenum;
	uint64_t num_pages;
	// HEADER_PAGE_RESERVED == 4096 - 24
	char reserved[HEADER_PAGE_RESERVED];
} headerPage;

2. Free Page

typedef struct freePage {
	pagenum_t next_free_pagenum;
	// FREE_PAGE_RESERVED == 4096 - 8
	char reserved[FREE_PAGE_RESERVED];
} freePage;

3. Internal Page & Leaf Page

Internal & Leaf page are structured in same struct.

There are 2 differences between Internal page and Leaf page.

  1. one_more_pagenum (8 bytes) & right_sibling_pagenum (8 bytes)
  2. 248 i_records (3968 bytes) & 31 l_records (3968 bytes)

union is used to integrate them.

typedef struct page {
	// Start of page header
	pagenum_t parent_pagenum;
	uint32_t is_leaf;
	uint32_t num_keys;
	// PAGE_HEADER_RESERVED == 128 - 24
	char reserved[PAGE_HEADER_RESERVED];
	union {
		// Internal page
		pagenum_t one_more_pagenum;
		// Leaf page
		pagenum_t right_sibling_pagenum;
	};
	// End of page header

	union {
		// Internal page : 248 key-pagenum pairs
		internalRecord i_records[NUM_INTERNAL_RECORD];
		// Leaf page : 31 key-value pairs
		leafRecord l_records[NUM_LEAF_RECORD];
	};
} page;

Record Layout

2 kinds of record.

1. Internal Record

image

2. Leaf Record

image

Record Implemetation

1. Internal Record

typedef struct internalRecord {
	int64_t key;
	pagenum_t pagenum;
} internalRecord;

2. Leaf Record

typedef struct leafRecord {
	int64_t key;
	// VALUE_SIZE == 120
	char value[VALUE_SIZE];
} leafRecord;

I/O Timing

3 cases

1. Open

When opening the data file from disk, I/O is happening.

2. Read

When reading page from the data file, I/O is happening.

(Read page, when the content of page is needed)

3. Write

When writing page to the data file, I/O is happening.

(Write page, when the content of page is changed)

Disk Manager

disk_manager.c is a Disk Manager.

'Only' Disk Manager calls system call.

image

File Manager

file.c is a File Manager.

File Manager manages the data file.

File Manager calls Disk Manager to access data file.

image

Index Manager

bpt.c is Index Manager.

Index Manager manages index of data file by maintaining b+ tree structure.

image

Index Manager calls File Manager in below 6 cases.

Case 1 : To get new page

→ During Insertion, if new page is needed, get usable new page number by calling File Manager (file_alloc_page()).

Case 2 : To free an page

→ If page gets empty during a deletion, free a page by calling File Manager (file_free_page(pagenum)).

Case 3 : To read from page

→ If page reading is needed, call the File Manger(file_read_page(pagenum, dest)) .

Case 4 : To write to page

→ If page writing is needed, call the File Manger (file_write_page(pagenum, src)).

Case 5 : To open the data file

→ Open the data file by calling the File Manager (open_file(pathname)).

DB APIs

db_api.c is DB APIs.

DB APIs are used as interface between user and data file.

User can open, insert, delete, find or print data file through below DB APIs.

DB APIs calls Index Manager.

image

Flow Summary

image

Delayed Merge

Do not merge page until page's number of keys become 0.

2 cases

/* Case : key_page is leaf page */

if (key_page->p.is_leaf)
    return coalesce_nodes(parent, key_page, neighbor, neighbor_flag, k_prime);

/* Case : key_page is internal page */

else {
    if (neighbor->p.num_keys == internal_order - 1)
        return redistribute_nodes(parent, key_page, neighbor, neighbor_flag, k_prime_index, k_prime);

    else
        return coalesce_nodes(parent, key_page, neighbor, neighbor_flag, k_prime);
}

Case 1 : key_page is Leaf Page

Only do the coalesce.

Because, leaf page can coalesce regardless of neighbor page's number of keys.

Case 2 : key_page is Internal Page

2 cases

  1. Do the redistribute, when neighbor page's number of keys is internal_order - 1.

    It means neighbor has no room for page number in key_page's one_more_pagenum. So, page number can not move to neighbor page.

    In this case, redistribute is happened.

    Neighbor page gives half of record to key_page.

  2. Do the coalesce, when neighbor page's number of keys is less than internal_order - 1.

    It means neighbor has room for page number in key_page's one_more_pagenum.