# Project 2

## Dataset 1: MDM Inventory
ABC Inc. allows employees to use both organization-owned devices and personal devices (BYOD, bring your own device) for work. Before employees can use a personal device for work, the device must be enrolled in, and compliant with, ABC's MDM (mobile device management) tools. The following dataset records the devices that have been enrolled in ABC Inc.'s MDM.

The objective of the script is to reshape the dataset into clean, meaningful information so that the IT department can keep track of the number, models, operating system versions, and compliance status of company and BYOD devices.

In [1]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import devices csv
devices = pd.read_csv("dummy_devices.csv", encoding='utf-8')
devices.head(20)

Unnamed: 0,Device Name,Device Type,Operating Systems,Owner,Username,MDM Compliance,Registration Time,Device Model
0,Laptop_05,Laptop,Windows 10.0.19044,Ruby Brewer,rb.wr@icloud.org,True,3/3/2022 15:40,Lenovo ThinkPad E16 Gen 2
1,Desktop_66,Desktop,Windows 10.0.19045,Fritz Spence,velit@yahoo.couk,True,6/13/2022 17:07,HP Z2 Mini G9
2,Griffin’s iPhones,Smartphone,iOS 17.5.1,Griffin Lambert,ac.feugiat@hotmail.com,True,9/23/2022 15:05,iPhone 11 Pro
3,Jade’s iPhone,Smartphone,iOS 17.6.1,Jade Rowe,eu.metus@protonmail.couk,True,9/23/2022 19:06,iPhone 12 Pro Max
4,Desktop_32,Desktop,Windows 11.0.26100,Salvador Nash,dignissim@yahoo.couk,True,12/8/2022 20:40,HP Z2 Mini G9
5,Desktop_01,Desktop,Windows 10.0.19045,,,,12/11/2022 18:25,Lenovo ThinkCentre M710
6,Desktop_87,Desktop,Windows 11.0.22631,Simon English,sed.turpis@google.ca,True,12/12/2022 12:45,Dell OptiPlex 7000 Micro PC
7,Laptop_19,Laptop,Windows 10.0.19045,Mia Ayala,aenean@hotmail.couk,False,12/13/2022 13:15,Lenovo ThinkPad X1 Yoga Gen 8
8,Laptop_34,Laptop,Windows 10.0.19044,,,,12/18/2022 0:01,
9,Teagan’s iPhone,Smartphone,iOS 17.5.1,Teagan Serrano,facilisi.sed.neque@yahoo.edu,True,1/31/2023 16:24,iPhone 11


1. Drop any device without a username.

In [2]:
devices.dropna(subset=["Username"], inplace=True)
devices.head(20)

Unnamed: 0,Device Name,Device Type,Operating Systems,Owner,Username,MDM Compliance,Registration Time,Device Model
0,Laptop_05,Laptop,Windows 10.0.19044,Ruby Brewer,rb.wr@icloud.org,True,3/3/2022 15:40,Lenovo ThinkPad E16 Gen 2
1,Desktop_66,Desktop,Windows 10.0.19045,Fritz Spence,velit@yahoo.couk,True,6/13/2022 17:07,HP Z2 Mini G9
2,Griffin’s iPhones,Smartphone,iOS 17.5.1,Griffin Lambert,ac.feugiat@hotmail.com,True,9/23/2022 15:05,iPhone 11 Pro
3,Jade’s iPhone,Smartphone,iOS 17.6.1,Jade Rowe,eu.metus@protonmail.couk,True,9/23/2022 19:06,iPhone 12 Pro Max
4,Desktop_32,Desktop,Windows 11.0.26100,Salvador Nash,dignissim@yahoo.couk,True,12/8/2022 20:40,HP Z2 Mini G9
6,Desktop_87,Desktop,Windows 11.0.22631,Simon English,sed.turpis@google.ca,True,12/12/2022 12:45,Dell OptiPlex 7000 Micro PC
7,Laptop_19,Laptop,Windows 10.0.19045,Mia Ayala,aenean@hotmail.couk,False,12/13/2022 13:15,Lenovo ThinkPad X1 Yoga Gen 8
9,Teagan’s iPhone,Smartphone,iOS 17.5.1,Teagan Serrano,facilisi.sed.neque@yahoo.edu,True,1/31/2023 16:24,iPhone 11
10,Kermit’s iPhone,Smartphone,iOS 17.6.1,Kermit Knapp,a.nunc.in@google.net,True,2/23/2023 17:59,iPhone 13 Pro Max
12,Ella’s iPad,Tablet,iPadOS 16.3.1,Ruby Brewer,rb.wr@icloud.org,False,3/2/2023 18:49,iPad Mini 3rd


2. Separate the Operating Systems column into two variables: OS Type and OS Version. Recording them separately makes it easier to manage which operating systems the MDM supports and to set different requirements for different versions of each OS.

In [3]:
# split Operating Systems into OS Type and OS Version at the first space
# (n=1 guarantees exactly two columns even if a value contains extra spaces)
devices[["OS Type","OS Version"]] = devices["Operating Systems"].str.split(' ', n=1, expand=True)

devices.head()

Unnamed: 0,Device Name,Device Type,Operating Systems,Owner,Username,MDM Compliance,Registration Time,Device Model,OS Type,OS Version
0,Laptop_05,Laptop,Windows 10.0.19044,Ruby Brewer,rb.wr@icloud.org,True,3/3/2022 15:40,Lenovo ThinkPad E16 Gen 2,Windows,10.0.19044
1,Desktop_66,Desktop,Windows 10.0.19045,Fritz Spence,velit@yahoo.couk,True,6/13/2022 17:07,HP Z2 Mini G9,Windows,10.0.19045
2,Griffin’s iPhones,Smartphone,iOS 17.5.1,Griffin Lambert,ac.feugiat@hotmail.com,True,9/23/2022 15:05,iPhone 11 Pro,iOS,17.5.1
3,Jade’s iPhone,Smartphone,iOS 17.6.1,Jade Rowe,eu.metus@protonmail.couk,True,9/23/2022 19:06,iPhone 12 Pro Max,iOS,17.6.1
4,Desktop_32,Desktop,Windows 11.0.26100,Salvador Nash,dignissim@yahoo.couk,True,12/8/2022 20:40,HP Z2 Mini G9,Windows,11.0.26100


3. Do the same for the device models: split the Device Model column into brand and model at the first occurrence of a space.

In [4]:
# split the Device Models into brand and model
devices[["Brand","Model"]] = devices["Device Model"].str.split(' ', n=1, expand=True)

devices.head()

Unnamed: 0,Device Name,Device Type,Operating Systems,Owner,Username,MDM Compliance,Registration Time,Device Model,OS Type,OS Version,Brand,Model
0,Laptop_05,Laptop,Windows 10.0.19044,Ruby Brewer,rb.wr@icloud.org,True,3/3/2022 15:40,Lenovo ThinkPad E16 Gen 2,Windows,10.0.19044,Lenovo,ThinkPad E16 Gen 2
1,Desktop_66,Desktop,Windows 10.0.19045,Fritz Spence,velit@yahoo.couk,True,6/13/2022 17:07,HP Z2 Mini G9,Windows,10.0.19045,HP,Z2 Mini G9
2,Griffin’s iPhones,Smartphone,iOS 17.5.1,Griffin Lambert,ac.feugiat@hotmail.com,True,9/23/2022 15:05,iPhone 11 Pro,iOS,17.5.1,iPhone,11 Pro
3,Jade’s iPhone,Smartphone,iOS 17.6.1,Jade Rowe,eu.metus@protonmail.couk,True,9/23/2022 19:06,iPhone 12 Pro Max,iOS,17.6.1,iPhone,12 Pro Max
4,Desktop_32,Desktop,Windows 11.0.26100,Salvador Nash,dignissim@yahoo.couk,True,12/8/2022 20:40,HP Z2 Mini G9,Windows,11.0.26100,HP,Z2 Mini G9
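
In the output above, splitting at the first space gives Apple phones a Brand of "iPhone" rather than a manufacturer name. The following is only a sketch of one possible refinement, not part of the original cleaning steps. The `APPLE_PREFIXES` tuple, the `split_brand_model` helper, and the sample rows are all hypothetical, and the sketch assumes that a model starting with "iPhone" or "iPad" is an Apple device.

```python
import pandas as pd

# Hypothetical sample mirroring the "Device Model" values shown above
sample = pd.DataFrame({
    "Device Model": [
        "Lenovo ThinkPad E16 Gen 2",
        "HP Z2 Mini G9",
        "iPhone 11 Pro",
        "iPad Mini 3rd",
    ]
})

# Assumption: these product-line prefixes identify Apple devices
APPLE_PREFIXES = ("iPhone", "iPad")

def split_brand_model(device_model):
    """Return (brand, model); Apple product lines keep their full model name."""
    if device_model.startswith(APPLE_PREFIXES):
        return "Apple", device_model
    # Otherwise split at the first space, as in the cell above
    brand, _, model = device_model.partition(" ")
    return brand, model

pairs = [split_brand_model(m) for m in sample["Device Model"]]
sample[["Brand", "Model"]] = pd.DataFrame(pairs, index=sample.index)
print(sample)
```

Whether "Apple" is the right label depends on how the IT department wants to group devices, so the mapping would need to be agreed on before use.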


4. Clean up the table by removing the original multi-variable columns.

In [5]:
# drop the old Operating Systems and Device Model columns
devices.drop(columns=["Operating Systems", "Device Model"], inplace=True)
devices.head()

Unnamed: 0,Device Name,Device Type,Owner,Username,MDM Compliance,Registration Time,OS Type,OS Version,Brand,Model
0,Laptop_05,Laptop,Ruby Brewer,rb.wr@icloud.org,True,3/3/2022 15:40,Windows,10.0.19044,Lenovo,ThinkPad E16 Gen 2
1,Desktop_66,Desktop,Fritz Spence,velit@yahoo.couk,True,6/13/2022 17:07,Windows,10.0.19045,HP,Z2 Mini G9
2,Griffin’s iPhones,Smartphone,Griffin Lambert,ac.feugiat@hotmail.com,True,9/23/2022 15:05,iOS,17.5.1,iPhone,11 Pro
3,Jade’s iPhone,Smartphone,Jade Rowe,eu.metus@protonmail.couk,True,9/23/2022 19:06,iOS,17.6.1,iPhone,12 Pro Max
4,Desktop_32,Desktop,Salvador Nash,dignissim@yahoo.couk,True,12/8/2022 20:40,Windows,11.0.26100,HP,Z2 Mini G9


1) Count the number of devices of each device type.

In [6]:
devices['Device Type'].value_counts(ascending=True)

Device Type
Laptop         5
Desktop        5
Tablet         7
Smartphone    19
Name: count, dtype: int64

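2) What is the minimum OS version that the mobile device management accepted for each OS type?

First, I will filter to keep the devices that are MDM compliant (MDM Compliance == True).

Next, I will list the minimum OS version for each OS type. Because OS Version is stored as a string, this minimum is a character-by-character (lexicographic) comparison rather than a numeric one.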

In [7]:
compliant_devices = devices.loc[devices['MDM Compliance'] == True]
compliant_devices.groupby('OS Type')['OS Version'].min()

OS Type
Android            12
Windows    10.0.19044
iOS            17.5.1
iPadOS           17.4
Name: OS Version, dtype: object
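
Since the OS Version column has dtype object, `min()` compares the strings lexicographically, which can differ from a numeric comparison when versions have different numbers of digits (for example, "9" sorts after "17"). The sketch below shows one way to compare versions numerically. It assumes every version consists only of dot-separated integers. The sample rows and the `version_key` helper are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample; the notebook would use compliant_devices instead
sample = pd.DataFrame({
    "OS Type": ["iOS", "iOS", "Android", "Android"],
    "OS Version": ["17.5.1", "9.3.5", "12", "9"],
})

def version_key(version):
    """Turn '17.5.1' into (17, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Lexicographic minimum: "17.5.1" sorts before "9.3.5" because "1" < "9"
string_min = sample.groupby("OS Type")["OS Version"].min()

# Numeric minimum: compare the parsed tuples, but return the original string
numeric_min = sample.groupby("OS Type")["OS Version"].agg(
    lambda versions: min(versions, key=version_key)
)

print(string_min)
print(numeric_min)
```

For the values shown in the output above, both approaches may well agree; the difference only matters once versions with different digit counts appear within the same OS type.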

3) Which users have more than one device?

In [10]:
devices['Owner'].value_counts().loc[lambda x: x>1].index.tolist()

['Ruby Brewer', 'Jade Rowe', 'Simon English', 'Kermit Knapp']

Ruby Brewer, Jade Rowe, Simon English, and Kermit Knapp each have more than one device enrolled in the mobile device management system.
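
Counting by Owner assumes that no two employees share the same name. As a hedged alternative, the sketch below groups by Username, which is more likely to be unique per account, and keeps the owner name for display. The sample rows are hypothetical; only the column names come from the table above.

```python
import pandas as pd

# Hypothetical sample with the same column names as the devices table
sample = pd.DataFrame({
    "Owner": ["Ruby Brewer", "Ruby Brewer", "Jade Rowe", "Mia Ayala"],
    "Username": [
        "rb.wr@icloud.org",
        "rb.wr@icloud.org",
        "eu.metus@protonmail.couk",
        "aenean@hotmail.couk",
    ],
    "Device Type": ["Laptop", "Tablet", "Smartphone", "Laptop"],
})

# Count devices per account, keeping the owner name alongside the username
device_counts = (
    sample.groupby(["Username", "Owner"])
    .size()
    .reset_index(name="Device Count")
)

# Keep only the accounts with more than one enrolled device
multi_device_users = device_counts[device_counts["Device Count"] > 1]
print(multi_device_users)
```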


It took me about two hours to create the dummy dataset and one hour to clean it. To make the dataset easy to work with, I split each column that stored multiple variables into two separate columns. For example, I separated Operating Systems into two variables: OS Type and OS Version. Recording them separately makes it easier to manage which operating systems the MDM supports and to set different requirements for different versions of each OS. If I had not used Python to tidy the data, I would have used Excel. Excel lets users write custom functions, display graphs based on the data, and use replace and filter to tidy data. However, splitting the columns that store multiple variables into separate columns would take a long time.

## Dataset 2: Salary

The following dataset shows the name, position, job location, age, start date, and salary of individuals, taken from Datatables.net.

source: https://datatables.net/

By tidying the data, we can analyze job information such as the lowest, highest, and average salary based on position, job location, age, years of work experience, etc.

In [11]:
# Import salary csv
salary_df = pd.read_csv("salary.csv", encoding='utf-8')
salary_df.head(5)

Unnamed: 0,Name,Position,Office,Age,Start date,Salary
0,Airi Satou,Accountant,Tokyo,33,11/27/2008,"$162,700"
1,Angelica Ramos,Chief Executive Officer (CEO),London,47,10/8/2009,"$1,200,000"
2,Ashton Cox,Junior Technical Author,San Francisco,66,1/11/2009,"$86,000"
3,Bradley Greer,Software Engineer,London,41,10/12/2012,"$132,000"
4,Brenden Wagner,Software Engineer,San Francisco,28,6/6/2011,"$206,850"


The first thing I notice is that the start date would be more meaningful if I transformed it into the number of years worked in the current position.

I will get the current date, then find the difference in days between the start date and the current date. Next, I will find the years of employment by dividing the number of days by 365 and rounding down to a whole year.

In [12]:
# set current date
current_date = pd.to_datetime('now')

# calculate the Years of Employment
duration = (current_date - pd.to_datetime(salary_df['Start date']))
duration_years = np.floor(duration.dt.days/365).astype(int)

# Add the Years of Employment to the dataset
salary_df.insert(5, 'Years of Employment', duration_years)
salary_df.head()

Unnamed: 0,Name,Position,Office,Age,Start date,Years of Employment,Salary
0,Airi Satou,Accountant,Tokyo,33,11/27/2008,15,"$162,700"
1,Angelica Ramos,Chief Executive Officer (CEO),London,47,10/8/2009,15,"$1,200,000"
2,Ashton Cox,Junior Technical Author,San Francisco,66,1/11/2009,15,"$86,000"
3,Bradley Greer,Software Engineer,London,41,10/12/2012,12,"$132,000"
4,Brenden Wagner,Software Engineer,San Francisco,28,6/6/2011,13,"$206,850"
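
Dividing by 365 ignores leap days, so the result can be off by one for a start date close to its anniversary. The sketch below counts completed calendar years instead. The `completed_years` helper and the fixed `as_of` date are assumptions made so the example is reproducible; the notebook itself uses the current date.

```python
import pandas as pd

def completed_years(start_dates, as_of):
    """Count full calendar years between each start date and as_of."""
    start = pd.to_datetime(start_dates, format="%m/%d/%Y")
    years = as_of.year - start.dt.year
    # Subtract one year when the anniversary has not yet occurred in as_of's year
    before_anniversary = (start.dt.month > as_of.month) | (
        (start.dt.month == as_of.month) & (start.dt.day > as_of.day)
    )
    return years - before_anniversary.astype(int)

# Fixed reference date (an assumption) so the result does not change over time
as_of = pd.Timestamp("2024-10-20")
start_dates = pd.Series(["11/27/2008", "10/8/2009", "10/12/2012"])

years_of_employment = completed_years(start_dates, as_of)
print(years_of_employment.tolist())  # [15, 15, 12]
```

Using a fixed reference date also makes the Years of Employment column stable between runs, which the current-date approach does not.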


Next, the salary values should be stored as numbers so that it is easier to perform calculations.

The current salary values are strings, because they include symbols such as '$' and ','.

I will convert the salary column to a numeric format.

In [13]:
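# convert values in the salary column to int
# regex=False treats "$" as a literal character rather than a regex anchor
salary_df["Salary"] = salary_df["Salary"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)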

# rename the Salary column to include the $ as the currency unit
salary_df.rename(columns = {'Salary':'Salary ($)'}, inplace = True)
salary_df.head()

Unnamed: 0,Name,Position,Office,Age,Start date,Years of Employment,Salary ($)
0,Airi Satou,Accountant,Tokyo,33,11/27/2008,15,162700
1,Angelica Ramos,Chief Executive Officer (CEO),London,47,10/8/2009,15,1200000
2,Ashton Cox,Junior Technical Author,San Francisco,66,1/11/2009,15,86000
3,Bradley Greer,Software Engineer,London,41,10/12/2012,12,132000
4,Brenden Wagner,Software Engineer,San Francisco,28,6/6/2011,13,206850
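
The conversion above raises an error if any value cannot be parsed as an integer. As a hedged alternative, the sketch below strips every character that is not a digit or a decimal point and uses `pd.to_numeric` with `errors="coerce"`, so unparseable entries become NaN instead of stopping the cell. The "n/a" entry is hypothetical and does not appear in the dataset.

```python
import pandas as pd

# Sample salary strings; "n/a" is a hypothetical bad value
raw_salary = pd.Series(["$162,700", "$1,200,000", "$86,000", "n/a"])

# Remove everything except digits and the decimal point
cleaned = raw_salary.str.replace(r"[^\d.]", "", regex=True)

# Convert to numbers; values that cannot be parsed become NaN
salary = pd.to_numeric(cleaned, errors="coerce")
print(salary)
```

Note that this pattern would also strip a leading minus sign, which is acceptable for salaries but not for columns that can hold negative amounts.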


Now we can analyze the salary dataset.

1) Which office (job location) has the highest average salary?

In [14]:
salary_df.groupby('Office')['Salary ($)'].mean().round(2).sort_values(ascending=False)

Office
New York         326190.00
London           278271.67
San Francisco    235248.21
Edinburgh        232470.56
Singapore        228431.25
Tokyo            192785.00
Sydney            90500.00
Name: Salary ($), dtype: float64

The location with the highest average salary is New York.
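
The lowest, highest, and average salary mentioned in the introduction can be computed in a single pass. The following sketch uses `groupby(...).agg(...)` on a few hypothetical rows shaped like the cleaned salary table; only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical rows shaped like the cleaned salary table
sample = pd.DataFrame({
    "Office": ["Tokyo", "London", "London", "San Francisco", "San Francisco"],
    "Salary ($)": [162700, 1200000, 132000, 86000, 206850],
})

# One row per office with the lowest, highest, and average salary
summary = (
    sample.groupby("Office")["Salary ($)"]
    .agg(["min", "max", "mean"])
    .round(2)
    .sort_values("mean", ascending=False)
)
print(summary)
```

Showing the minimum and maximum next to the mean makes it easier to spot offices where a single very high salary dominates the average.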

2) Which job position has the highest average salary?

In [15]:
salary_df.groupby('Position')['Salary ($)'].mean().round(2).sort_values(ascending=False)

Position
Chief Executive Officer (CEO)    1200000.00
Chief Operating Officer (COO)     850000.00
Chief Financial Officer (CFO)     725000.00
Chief Marketing Officer (CMO)     675000.00
Director                          645750.00
Financial Controller              452500.00
Senior Javascript Developer       433060.00
Regional Director                 350650.00
Development Lead                  345000.00
Support Lead                      342000.00
Post-Sales support                324050.00
System Architect                  320800.00
Senior Marketing Designer         313500.00
Integration Specialist            265100.00
Team Leader                       235500.00
Personnel Lead                    217500.00
Javascript Developer              194250.00
Systems Administrator             170500.00
Accountant                        166725.00
Regional Marketing                163000.00
Support Engineer                  153858.33
Technical Author                  145000.00
Marketing Designer     

The position with the highest average salary is Chief Executive Officer (CEO).

3) Which position in the dataset has the most employees?

In [16]:
salary_df['Position'].value_counts()

Position
Software Engineer                6
Regional Director                5
Developer                        4
Support Engineer                 3
Sales Assistant                  3
Office Manager                   3
Integration Specialist           3
Accountant                       2
Javascript Developer             2
Marketing Designer               2
Systems Administrator            2
Senior Javascript Developer      1
Pre-Sales Support                1
Customer Support                 1
Personnel Lead                   1
Junior Technical Author          1
Chief Executive Officer (CEO)    1
Chief Operating Officer (COO)    1
Team Leader                      1
Secretary                        1
Senior Marketing Designer        1
Director                         1
Development Lead                 1
Junior Javascript Developer      1
Post-Sales support               1
Chief Financial Officer (CFO)    1
Technical Author                 1
Support Lead                     1
Data Coordi

Software Engineer is the most common position in this dataset.

4) What is the maximum number of years of employment for a Software Engineer in this dataset?

In [17]:
# filter by position to keep only the software engineer salary information
SE_salary_df = salary_df[salary_df['Position'] == 'Software Engineer']

# find the longest employment among software engineers
max_SE_years_of_employment = SE_salary_df['Years of Employment'].max()
print(f"The maximum years of employment for Software Engineer in this dataset is {max_SE_years_of_employment} years.")

The maximum years of employment for Software Engineer in this dataset is 15 years.




I found the data through the discussion that the professor provided. It took me about 15 minutes to copy the data and convert it to a CSV file. Then it took me about an hour to tidy the dataset into a meaningful form.


I transformed the start date into years of employment. From the start date alone, it is difficult to tell how many years an employee has worked in the current position. Changing it to years of employment makes the data more meaningful. For example, we can analyze the relationship between years of employment and salary.

Next, the original salary values were stored as strings, because they include symbols such as '$' and ','. I converted the salary column to a numeric format so that it is easier to perform calculations.

If I had not used Python to tidy the data, I would have used Excel. Excel lets users write custom functions, display graphs based on the data, and perform calculations such as average, min, and max. I would also use filters to choose which column values to display. Because this salary dataset does not have many columns or rows, and Excel has good built-in support for currency calculations, I would prefer Excel for this dataset.

## Dataset 3: Shows
The following dataset comes from the Streaming Originals Charts and covers the top ten shows for Oct. 4 – 10, 2024, along with their figures for the previous week.

Source: https://variety.com/h/most-watched-streaming-originals-movies-tv-shows/

In [18]:
# Import shows csv
shows_df = pd.read_csv("shows.csv", encoding='utf-8')
shows_df

Unnamed: 0,Title,Season,Creator/Showrunner,Platform,Est. Min. Watched,Previous Est. Min. Watched,Percentage Change
0,Nobody Wants This,1,Erin Foster,Netflix,1.2B,1.7B,-37%
1,Monsters: The Lyle and Erik Menendez Story,1,"Ian Brennan, Ryan Murphy",Netflix,1.2B,1.7B,-33%
2,Mr. McMahon,1,,Netflix,297.5M,765.6M,-64%
3,Tulsa King,2,Taylor Sheridan,Paramount+,647.4M,568.5M,14%
4,The Perfect Couple,1,Jenna Lamia,Netflix,340.2M,523.8M,-36%
5,The Lord of the Rings: The Rings of Power,2,"John D. Payne, Patrick McKay",Prime Video,354.2M,358.5M,1%
6,Tulsa King,1,Taylor Sheridan,Paramount+,350.8M,228.5M,54%
7,Bad Monkey,1,Bill Lawrence,Apple TV+,272.1M,227.1M,5%
8,Love is Blind,7,Chris Coelen,Netflix,1.3B,204M,459%
9,The Great British Baking Show,12,"Anna Beattie, Richard McKerrow",Netflix,236.5M,171.8M,36%


The dataset is presented in a wide format. However, if the dataset were extended to include the previous five weeks of estimated minutes watched, it would become messy, with multiple (previous x N) Est. Min. Watched columns.

I will remove the Percentage Change column, since the percentage change between Est. Min. Watched and Previous Est. Min. Watched can be calculated separately as part of any analysis of show performance.

I will also change the dataset to a long format.

In [19]:
# drop column
shows_df.drop('Percentage Change', axis=1, inplace=True)

# rename the column variable Est. Min. Watched to this week's date range 10/04/2024 - 10/10/2024.
# rename the column variable Previous Est. Min. Watched to previous week's date range 09/27/2024 - 10/03/2024.
shows_df.rename(columns={'Est. Min. Watched':'10/04/2024 - 10/10/2024','Previous Est. Min. Watched':'09/27/2024 - 10/03/2024'},inplace=True)

shows_df

Unnamed: 0,Title,Season,Creator/Showrunner,Platform,10/04/2024 - 10/10/2024,09/27/2024 - 10/03/2024
0,Nobody Wants This,1,Erin Foster,Netflix,1.2B,1.7B
1,Monsters: The Lyle and Erik Menendez Story,1,"Ian Brennan, Ryan Murphy",Netflix,1.2B,1.7B
2,Mr. McMahon,1,,Netflix,297.5M,765.6M
3,Tulsa King,2,Taylor Sheridan,Paramount+,647.4M,568.5M
4,The Perfect Couple,1,Jenna Lamia,Netflix,340.2M,523.8M
5,The Lord of the Rings: The Rings of Power,2,"John D. Payne, Patrick McKay",Prime Video,354.2M,358.5M
6,Tulsa King,1,Taylor Sheridan,Paramount+,350.8M,228.5M
7,Bad Monkey,1,Bill Lawrence,Apple TV+,272.1M,227.1M
8,Love is Blind,7,Chris Coelen,Netflix,1.3B,204M
9,The Great British Baking Show,12,"Anna Beattie, Richard McKerrow",Netflix,236.5M,171.8M


In [20]:
# converting dataset to a long table
long_shows_df = pd.melt(shows_df, id_vars=['Title', 'Season', 'Creator/Showrunner', 'Platform'], var_name="Week", value_name="Est. Min. Watched")
long_shows_df

Unnamed: 0,Title,Season,Creator/Showrunner,Platform,Week,Est. Min. Watched
0,Nobody Wants This,1,Erin Foster,Netflix,10/04/2024 - 10/10/2024,1.2B
1,Monsters: The Lyle and Erik Menendez Story,1,"Ian Brennan, Ryan Murphy",Netflix,10/04/2024 - 10/10/2024,1.2B
2,Mr. McMahon,1,,Netflix,10/04/2024 - 10/10/2024,297.5M
3,Tulsa King,2,Taylor Sheridan,Paramount+,10/04/2024 - 10/10/2024,647.4M
4,The Perfect Couple,1,Jenna Lamia,Netflix,10/04/2024 - 10/10/2024,340.2M
5,The Lord of the Rings: The Rings of Power,2,"John D. Payne, Patrick McKay",Prime Video,10/04/2024 - 10/10/2024,354.2M
6,Tulsa King,1,Taylor Sheridan,Paramount+,10/04/2024 - 10/10/2024,350.8M
7,Bad Monkey,1,Bill Lawrence,Apple TV+,10/04/2024 - 10/10/2024,272.1M
8,Love is Blind,7,Chris Coelen,Netflix,10/04/2024 - 10/10/2024,1.3B
9,The Great British Baking Show,12,"Anna Beattie, Richard McKerrow",Netflix,10/04/2024 - 10/10/2024,236.5M


However, the estimated minutes watched (Est. Min. Watched) are stored as strings with different units: B represents billions and M represents millions. This format is inconvenient for calculations. I will convert this column so that every value is in millions (M).

In [21]:
# convert billions to millions and drop the unit suffix
# x[:-1] removes the trailing unit letter; values ending in 'B' are multiplied by 1000
long_shows_df['Est. Min. Watched (Million)'] = long_shows_df['Est. Min. Watched'].apply(lambda x: float(x[:-1]) * 1000 if x.endswith('B') else float(x[:-1]))

# remove the old Est. Min. Watched column
long_shows_df.drop('Est. Min. Watched', axis=1, inplace=True)
long_shows_df

Unnamed: 0,Title,Season,Creator/Showrunner,Platform,Week,Est. Min. Watched (Million)
0,Nobody Wants This,1,Erin Foster,Netflix,10/04/2024 - 10/10/2024,1200.0
1,Monsters: The Lyle and Erik Menendez Story,1,"Ian Brennan, Ryan Murphy",Netflix,10/04/2024 - 10/10/2024,1200.0
2,Mr. McMahon,1,,Netflix,10/04/2024 - 10/10/2024,297.5
3,Tulsa King,2,Taylor Sheridan,Paramount+,10/04/2024 - 10/10/2024,647.4
4,The Perfect Couple,1,Jenna Lamia,Netflix,10/04/2024 - 10/10/2024,340.2
5,The Lord of the Rings: The Rings of Power,2,"John D. Payne, Patrick McKay",Prime Video,10/04/2024 - 10/10/2024,354.2
6,Tulsa King,1,Taylor Sheridan,Paramount+,10/04/2024 - 10/10/2024,350.8
7,Bad Monkey,1,Bill Lawrence,Apple TV+,10/04/2024 - 10/10/2024,272.1
8,Love is Blind,7,Chris Coelen,Netflix,10/04/2024 - 10/10/2024,1300.0
9,The Great British Baking Show,12,"Anna Beattie, Richard McKerrow",Netflix,10/04/2024 - 10/10/2024,236.5
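
With every value in millions, the Percentage Change column that was dropped earlier can be recomputed from the long table. The sketch below pivots a small sample back to one column per week and applies the usual formula, (current - previous) / previous * 100. The sample rows reuse two shows from the table above, but the variable names are assumptions. Because the published minutes are rounded, the results may differ slightly from the percentages in the original chart.

```python
import pandas as pd

current_week = "10/04/2024 - 10/10/2024"
previous_week = "09/27/2024 - 10/03/2024"

# Small long-format sample using the column names from the table above
long_sample = pd.DataFrame({
    "Title": ["Tulsa King", "Tulsa King", "The Perfect Couple", "The Perfect Couple"],
    "Season": [2, 2, 1, 1],
    "Week": [current_week, previous_week, current_week, previous_week],
    "Est. Min. Watched (Million)": [647.4, 568.5, 340.2, 523.8],
})

# One row per show and season, one column per week
wide = long_sample.pivot(
    index=["Title", "Season"],
    columns="Week",
    values="Est. Min. Watched (Million)",
)

# Percentage change from the previous week to the current week
pct_change = (
    (wide[current_week] - wide[previous_week]) / wide[previous_week] * 100
).round(1)
print(pct_change)
```

Keeping the data in long format and deriving the change on demand means additional weeks can be appended as new rows without adding more columns.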


It took me about 5 minutes to convert the original shows dataset into a CSV file. It took about two hours to tidy and reshape the dataset into a meaningful dataset that is easy to analyze.

I renamed the current week's and previous week's estimate columns to their specific date ranges, then used the melt function to pivot those columns into a long format. I also converted the values given in billions into millions, so that the units for Est. Min. Watched are consistent for calculation.

For the shows dataset, I believe using Python and pandas is easier than using Excel, because it is difficult to pivot the data in Excel without accidentally changing the original values in the cells. In pandas, the original source file is never modified unless the user explicitly exports the data and saves it under the same file name.