# Task No. 4: Index Maintenance

1. Upload the remaining 20% of the dataset from Task 1.
2. Check index fragmentation (for the indexes created in Task 3).
3. Perform index maintenance (REBUILD/REORGANIZE).
4. Check index fragmentation again.
5. Draw a conclusion.


#### Student: Jingyu Yan


## Solution

Because of differences between systems, I used MySQL instead of SQL Server for this experiment, so some of the data and operations differ from the SQL Server procedure described in the task.

### 1. Upload the Remaining Dataset

The remaining, previously unused data from the Task 1 dataset was uploaded. The main table involved is `events`, because it is the central transactional table that gives this database its OLTP workload.


Unlike SQL Server, where adding data alone can produce fragmentation (for example through page splits), MySQL required a different approach here: the data was inserted and then a random subset of rows was deleted to create significant fragmentation. This is necessary because MySQL storage engines, particularly InnoDB, allocate and reuse space efficiently; rows appended in primary-key order fill pages sequentially, so inserts alone leave little unused space to measure. Random deletions leave gaps and unused space behind, which lets the experiment simulate a high level of fragmentation.
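The insert-then-delete approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in this experiment: the helper names `plan_random_deletes` and `delete_statement` are hypothetical, the primary-key column is assumed to be called `id`, and the `%s` placeholders assume a DB-API driver with that parameter style (for example PyMySQL).

```python
import random


def plan_random_deletes(ids, fraction, batch_size=1000, seed=42):
    """Randomly pick `fraction` of `ids` for deletion and split them into batches.

    A fixed seed makes the induced fragmentation reproducible between runs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    pool = list(ids)
    rng = random.Random(seed)
    # Rows are chosen at random; sorting only makes the batches easier to log.
    victims = sorted(rng.sample(pool, k=round(len(pool) * fraction)))
    return [victims[i:i + batch_size] for i in range(0, len(victims), batch_size)]


def delete_statement(table, batch):
    """Build a parameterized DELETE for one batch (assumes the key column is `id`)."""
    if not batch:
        raise ValueError("batch must not be empty")
    quoted = "`" + table.replace("`", "``") + "`"  # escape backticks in the name
    placeholders = ", ".join(["%s"] * len(batch))
    return f"DELETE FROM {quoted} WHERE id IN ({placeholders})", list(batch)


# Usage sketch with a DB-API cursor; `cursor` and `event_ids` are assumptions:
# for batch in plan_random_deletes(event_ids, fraction=0.2):
#     cursor.execute(*delete_statement("events", batch))
```

Deleting in batches keeps each transaction small, and the seed means the same rows are removed if the experiment is repeated.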

### 2. Check index fragmentation

Because of architectural differences, MySQL and SQL Server vary significantly in some features. For instance, MySQL offers no direct way to view an index fragmentation percentage, unlike SQL Server's `sys.dm_db_index_physical_stats` with its `avg_fragmentation_in_percent` column. This is largely because MySQL manages data storage differently: InnoDB handles space allocation automatically and reuses freed space, which tends to limit fragmentation. Fragmentation in MySQL is therefore measured indirectly, typically as the ratio of free space to the total space used by the table (DATA_FREE versus DATA_LENGTH + INDEX_LENGTH).


Therefore, some parts of this experiment had to be implemented manually: calculating the fragmentation rate by hand, and disabling MySQL's automatic optimization of fragmented indexes so that it would not affect the experimental data.
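As a sketch of the manual calculation, the same size columns are also exposed per table in MySQL's `information_schema.TABLES`, so several tables can be ranked at once. The helper names and the `FRAGMENTATION_SQL` constant are hypothetical, and the `%s` placeholder assumes a DB-API driver such as PyMySQL. In MySQL 8.0 these statistics may be cached (see the `information_schema_stats_expiry` variable), so freshly changed tables may need `ANALYZE TABLE` first.

```python
# Hypothetical query: information_schema.TABLES exposes the same size columns
# as SHOW TABLE STATUS, for every table in a schema at once.
FRAGMENTATION_SQL = (
    "SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE "
    "FROM information_schema.TABLES WHERE TABLE_SCHEMA = %s"
)


def fragmentation_pct(data_length, index_length, data_free):
    """DATA_FREE / (DATA_LENGTH + INDEX_LENGTH) as a percentage; NULLs count as 0."""
    total = (data_length or 0) + (index_length or 0)
    return 100.0 * (data_free or 0) / total if total else 0.0


def rank_tables(rows):
    """rows: (table_name, data_length, index_length, data_free) tuples.

    Returns (table_name, percent) pairs, most fragmented first.
    """
    report = [(name, round(fragmentation_pct(dl, il, df), 2)) for name, dl, il, df in rows]
    return sorted(report, key=lambda item: item[1], reverse=True)


# The sizes SHOW TABLE STATUS reports for `events` in the output below
print(rank_tables([("events", 21991216, 36086784, 19404624)]))  # [('events', 33.41)]
```

With a live connection, the rows would come from something like `cursor.execute(FRAGMENTATION_SQL, ("your_schema",))` followed by `rank_tables(cursor.fetchall())`, where `your_schema` is a placeholder.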

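```python
from setting import get_engine  # get_engine() comes from the project's own setting module
from sqlalchemy import text
from sqlalchemy.orm import sessionmaker

# Create the engine and a session
engine = get_engine()
Session = sessionmaker(bind=engine)
session = Session()

# SHOW TABLE STATUS reports size statistics for the table
result = session.execute(text("SHOW TABLE STATUS LIKE 'events'"))
# Convert the row to a dict so columns can be read by name on any SQLAlchemy version
table_status = dict(zip(result.keys(), result.fetchone()))

# Print data
print("Table Name:", table_status["Name"])
print("Data Length:", table_status["Data_length"])
print("Index Length:", table_status["Index_length"])
print("Data Free:", table_status["Data_free"])

session.close()
```

```
Table Name: events
Data Length: 21991216
Index Length: 36086784
Data Free: 19404624
```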


- **Data Length:** The total size of the data stored in the table, in bytes. For InnoDB this is the approximate space allocated for the clustered index, which holds the row data.

- **Index Length:** The total size of the table's indexes, in bytes (for InnoDB, the secondary indexes). Indexes speed up data retrieval, but they also consume space.

- **Data Free:** The amount of allocated but currently unused space in the table, in bytes, often left behind by deleted or moved rows. This space can be reused for future data, and a higher value can indicate more fragmentation.

The fragmentation rate is therefore calculated as: `FRAGMENTATION_RATE = DATA_FREE / (DATA_LENGTH + INDEX_LENGTH)`

```python
# Calculate the fragmentation rate manually
total_length = table_status["Data_length"] + table_status["Index_length"]
data_free = table_status["Data_free"]
fragmentation_percentage = (data_free / total_length) * 100 if total_length else 0

print(f"fragmentation: {round(fragmentation_percentage, 2)}%")
```

```
fragmentation: 33.41%
```


The fragmentation rate of the `events` table (data and indexes combined) is 33.41%.
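As a worked check, substituting the values reported by `SHOW TABLE STATUS` into the formula gives the same result:

$$
\text{FRAGMENTATION\_RATE} = \frac{\text{DATA\_FREE}}{\text{DATA\_LENGTH} + \text{INDEX\_LENGTH}} = \frac{19404624}{21991216 + 36086784} = \frac{19404624}{58078000} \approx 0.3341 = 33.41\%
$$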

### 3. Do index maintenance

In MySQL, the equivalents of SQL Server's REBUILD and REORGANIZE index operations are as follows:

- **REBUILD:** Use OPTIMIZE TABLE, which works with both MyISAM and InnoDB. It rebuilds the table and its indexes, which can reduce fragmentation. Its exact behavior depends on the engine: for InnoDB it is mapped to `ALTER TABLE ... FORCE`, which recreates the table, while for MyISAM it repairs deleted or split rows and sorts the index pages.

- **REORGANIZE:** MySQL has no direct equivalent for tables (`ALTER TABLE ... REORGANIZE PARTITION` exists, but it redistributes partitions rather than defragmenting index pages). InnoDB manages and optimizes storage automatically, for example by merging underfilled index pages (controlled by `MERGE_THRESHOLD`), which typically makes a separate REORGANIZE operation unnecessary.
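The mapping above can be expressed as a small decision helper. This is a hypothetical sketch: the function name is illustrative, and the 5% / 30% thresholds are an assumption borrowed from commonly cited SQL Server guidance (reorganize for moderate fragmentation, rebuild above 30%). Because MySQL has no REORGANIZE, both cases map to the same OPTIMIZE TABLE statement; the SQL Server action is returned only for comparison.

```python
def plan_maintenance(table, frag_pct, reorganize_at=5.0, rebuild_at=30.0):
    """Map a fragmentation percentage to (SQL Server action, MySQL statement).

    The default thresholds are assumptions; tune them for your workload.
    Returns None when fragmentation is low enough to skip maintenance.
    """
    if frag_pct <= reorganize_at:
        return None  # low fragmentation: no maintenance needed
    action = "REORGANIZE" if frag_pct <= rebuild_at else "REBUILD"
    # MySQL has no REORGANIZE, so both cases map to a full table rebuild.
    quoted = "`" + table.replace("`", "``") + "`"  # escape backticks in the name
    return action, f"OPTIMIZE TABLE {quoted}"


print(plan_maintenance("events", 33.41))  # ('REBUILD', 'OPTIMIZE TABLE `events`')
```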

For index maintenance in MySQL, we therefore use the OPTIMIZE TABLE command. It defragments the table and can improve performance: in MyISAM it restructures the data file and sorts the indexes, and in InnoDB it rebuilds the table, compacting the data and indexes so that space is used efficiently.
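OPTIMIZE TABLE returns a result set with the columns `Table`, `Op`, `Msg_type` and `Msg_text`, which can be checked instead of assuming the command succeeded. The helper below is a hypothetical sketch; the example rows mirror the shape InnoDB typically returns (a note that it recreates and analyzes the table, then a status of `OK`), and the schema name `mydb` is a placeholder, not output captured from this run.

```python
def optimize_succeeded(rows):
    """Return True if OPTIMIZE TABLE result rows report success.

    Each row is assumed to be (Table, Op, Msg_type, Msg_text).
    """
    rows = list(rows)
    # Any error row means the operation failed for at least one table.
    if any(msg_type.lower() == "error" for _, _, msg_type, _ in rows):
        return False
    # Success is reported as a "status" row whose text is "OK".
    return any(msg_type.lower() == "status" and msg_text == "OK"
               for _, _, msg_type, msg_text in rows)


# Illustrative rows in the shape InnoDB typically returns
innodb_rows = [
    ("mydb.events", "optimize", "note",
     "Table does not support optimize, doing recreate + analyze instead"),
    ("mydb.events", "optimize", "status", "OK"),
]
print(optimize_succeeded(innodb_rows))  # True
```

In the maintenance step, this could be applied as `optimize_succeeded(result.fetchall())` on the result of the OPTIMIZE TABLE statement.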

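```python
from sqlalchemy import text

Session = sessionmaker(bind=engine)
session = Session()

# Rebuild the table and its indexes to remove fragmentation
result = session.execute(text("OPTIMIZE TABLE events"))
# Each returned row has the columns Table, Op, Msg_type, Msg_text
for row in result.fetchall():
    print(tuple(row))

session.close()
```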

### 4. Check index fragmentation again.

After optimizing the table, we check the fragmentation again, using the same query as in the previous steps.

```python
from sqlalchemy import text

Session = sessionmaker(bind=engine)
session = Session()

# Re-read the table statistics after OPTIMIZE TABLE
result = session.execute(text("SHOW TABLE STATUS LIKE 'events'"))
table_status = dict(zip(result.keys(), result.fetchone()))

# Calculate the fragmentation rate manually, as before
total_length = table_status["Data_length"] + table_status["Index_length"]
data_free = table_status["Data_free"]
fragmentation_percentage = (data_free / total_length) * 100 if total_length else 0

print(f"fragmentation: {round(fragmentation_percentage, 2)}%")

session.close()
```

```
fragmentation: 0.0%
```


After optimization, the fragmentation rate dropped to 0%, indicating that the table was effectively reorganized and that its space is used efficiently, with no allocated but unused space reported.

### 5. Make a conclusion.

- **Effective Optimization:** The experiment demonstrated that the OPTIMIZE TABLE command in MySQL effectively reduces fragmentation: the fragmentation rate of the `events` table dropped from 33.41% to 0%, indicating a successful reorganization of the table data and indexes.


- **Efficient Space Utilization:** A fragmentation rate of 0% means no allocated but unused space was reported after the rebuild, so the table's space is now used more efficiently, with fewer gaps and a more compact data structure.


- **Performance Improvement:** Query performance was not measured in this experiment, but lower fragmentation typically correlates with better performance, especially for read-intensive workloads, so the optimized table is likely to retrieve data more efficiently.


- **Database Health:** Regular optimization, as shown in this experiment, can be a crucial aspect of maintaining database health, particularly for tables subject to frequent insertions, deletions, and updates.


- **Adaptability of MySQL:** The results also highlight how efficiently MySQL manages storage and indexing: fragmentation had to be induced deliberately through random deletions, and a single OPTIMIZE TABLE removed it.