The Ushichka dataset consists of several types of files spread across multiple hard disks. This notebook compares the disks and checks how many unique files there are.

Ushichka was split over multiple hard disks for two broad reasons: 1) back-up and 2) space limitations on the first hard disk. The dataset has two 'fieldseason' phases, 1 and 2. Phase 1 covers everything before the Lund visit, while Phase 2 covers everything after it.

The four disks are named:

1. Ushichka dataset harddisk #1
1. Ushichka dataset harddisk #2
1. Ushichka dataset harddisk #3
1. Ushichka dataset harddisk #4

...and are referred to below as #1-4.

Here I discuss only the data collected in the 'main' recording volume on the following dates (see Beleyur & Goerlitz 2021 (in prep) or Beleyur PhD Thesis, Uni. Konstanz):

* 2018-06-19
* 2018-06-21
* 2018-06-22
* 2018-07-14
* 2018-07-21
* 2018-07-25
* 2018-07-28
* 2018-08-02
* 2018-08-14
* 2018-08-17
* 2018-08-18
* 2018-08-19

Aside from this data, a *lot* more data of free-flying bats was collected with the audio-video array system at various field sites in and outside of caves.




The folders containing raw data on the drives are typically organised as shown below.

### Typical Audio and Video raw data organisation

```
MAIN DISK:
    * fieldwork_2018_001
        * actrackdata
            * wav
                # one folder per starting date and session number. 
                # the 'starting date' remains the same past midnight even though the actual date
                # has changed.
                * YYYY-MM-DD_sss
                    * FILE1.WAV
                    * FILE2.WAV
                    * ..........
            * weather
        * video
            * YYYY-MM-DD
                # one folder per camera
                * K1
                    # first folder made on switching is always P0000000
                    # all following folders are +1
                    * P0000000
                        # same folder numbering system . First recording triggered is always
                        # stored as 00000000.TMC and +1 for every new file triggered.
                        * 00000000.TMC
                        * 00000001.TMC
                        * .........
                    * P0000001
                    * ......
                    
                * K2
                * K3
        * notebook_scans
            * YYYY-MM-DD
                * notebook photos, scans, additional notes/observations
```
In addition to the *raw* data, there are also some folders in ```actrackdata/video/yyyy-mm-dd/``` which have the DLTdv wand calibrations, the .avi exported videos, and even a few bat trajectories. 
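
The naming rule for session folders (the 'starting date' stays fixed past midnight) can be illustrated with a minimal sketch. The helper names `parse_session_folder` and `recording_datetime` are hypothetical, and the file name layout `<prefix>_YYYY-MM-DD_HH-MM-SS_<epoch>.WAV` is an assumption based on the file names listed further down in this notebook.

```python
from datetime import date, datetime

def parse_session_folder(name):
    """Split a 'YYYY-MM-DD_sss' folder name into (starting date, session number)."""
    date_part, session_part = name.split('_')
    return date.fromisoformat(date_part), int(session_part)

def recording_datetime(filename):
    """Read the timestamp from a name like 'MULTIWAV_YYYY-MM-DD_HH-MM-SS_<epoch>.WAV'.

    Assumes exactly four underscore-separated fields (hypothetical layout).
    """
    _, day, clock, _ = filename.split('_')
    return datetime.strptime(f'{day} {clock}', '%Y-%m-%d %H-%M-%S')

# A file recorded after midnight still sits in the folder of the starting date
start_date, session = parse_session_folder('2018-06-14_002')
recorded = recording_datetime('MULTIWAV_2018-06-15_04-51-02_1529027462.WAV')
print(start_date, session, recorded)
```

The recording date in a file name can therefore be one day later than the date in its session folder.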


### LiDAR data

In addition to the audio and video data, the LiDAR scans are stored on disk #2. The whole folder is 6.72 GB.

```
MAIN DISK:
    * Orlova_Chuka_LiDAR
        * Exports
            * various formats of the LiDAR scan - .ply, .dxf and a compressed folder 'OrlovaChukaTotalPointCloud.rar'
        # zipped folder containing a Word file explaining how the raw data was processed. 
        * Re _Orlova_Chuka_data.zip
```
Asparuh Kamburov also has the original data and can be contacted for a copy if necessary.


### Dates for ``` fieldwork_2018_001\```

```\actrackdata\wav```

|Disk #|  Disk #1  |  Disk #2   |   Disk #3    |   Disk #4    |
|------|-----------|------------|--------------|--------------|
|Start |2018-06-19 | 2018-06-19 | non-Ushichka |  2018-06-19  |
|  End |2018-07-25 | 2018-07-28 | non-Ushichka |  2018-07-28  |

```\video```

|Disk #|  Disk #1  |  Disk #2   | Disk #3   |   Disk #4    |
|------|-----------|------------|-----------|--------------|
|Start |2018-06-19 | 2018-04-09 |2018-04-09 |non-Ushichka     |
|  End | 2018-07-25| 2018-07-25 |2018-05-01 | non-Ushichka    | 



### Dates for ``` fieldwork_2018_002\```

```\actrackdata\wav```

|Disk #|  Disk #1  |  Disk #2   | Disk #3    |   Disk #4    |
|------|-----------|------------|------------|--------------|
|Start | No data   | 2018-07-28 | 2018-07-28 |  2018-07-28  |
|  End | No data   | 2018-08-17 | 2018-08-19 |  2018-08-19  | 

```\video```

|Disk #|   Disk #1 |  Disk #2  |    Disk #3   |   Disk #4    |
|------|-----------|-----------|--------------|--------------|
| Start|  No data  |2018-07-28 |  2018-07-28  |  2018-07-28  |
|  End |  No data  |2018-08-19 |  2018-08-19  |  2018-08-19  | 
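
Start and End entries like those in the tables above could in principle be derived from the session folder names on each disk. The sketch below is one hedged way to do that, not necessarily how the tables were made. `session_date_range` is a hypothetical helper, and it assumes every folder name follows the `YYYY-MM-DD_sss` pattern.

```python
def session_date_range(session_folders):
    """Return (first, last) starting date among 'YYYY-MM-DD_sss' folder names.

    ISO-formatted dates sort correctly as plain strings, so no date parsing
    is needed. Returns None when no folders are given (e.g. 'No data').
    """
    dates = sorted({name.split('_')[0] for name in session_folders})
    return (dates[0], dates[-1]) if dates else None

# Folder names as they might be listed with os.listdir on one disk
example_folders = ['2018-07-28_001', '2018-06-19_002', '2018-06-19_001']
print(session_date_range(example_folders))
```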


In [39]:
import os
import glob
import sys 

import numpy as np

In [None]:
drives = ['D:/','E:/','F:/','G:/']

In [13]:
# fieldseason 01: all wav files on the first two drives
fieldseason1_path = 'fieldwork_2018_001/actrackdata/wav/*/*.wav'
all_wav_files = [glob.glob(drive+fieldseason1_path) for drive in drives[:2]]
# number of wav files found on each drive
print([len(each) for each in all_wav_files])

[5276, 5437]


In [25]:
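# check the difference in the files (on the second drive but not on the first)
# NOTE: these are full paths including the drive letter, so a path under 'E:/'
# never equals one under 'D:/'; compare drive-relative paths to find truly missing files
wav_fieldwork01_diff = list(set(all_wav_files[1]).difference(set(all_wav_files[0])))
print(wav_fieldwork01_diff[:20])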

['E:/fieldwork_2018_001/actrackdata/wav\\2018-06-14_002\\MULTIWAV_2018-06-15_04-51-02_1529027462.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-06-14_001\\MULTIWAV_2018-06-14_22-06-38_1529003198.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-05-30_001\\MULTIWAV_2018-05-30_03-19-09_1527643149.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-06-14_002\\MULTIWAV_2018-06-15_04-23-04_1529025784.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-07-14_002\\MULTIWAV_2018-07-15_05-13-41_1531620821.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-05-20_003\\Mic02_2018-05-20_22-34-35_1526848475.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-05-31_001\\MULTIWAV_2018-05-31_22-13-04_1527793984.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-07-28_001\\MULTIWAV_2018-07-29_00-22-20_1532812940.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-07-25_001\\MULTIWAV_2018-07-25_23-37-29_1532551049.WAV', 'E:/fieldwork_2018_001/actrackdata/wav\\2018-05-30_001\\MULTIWAV_2018-05-30_03-31-41

('E:/fieldwork_2018_001/actrackdata/wav', '2018-06-14_002')
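
Because the set difference above works on full path strings, the drive letter is part of every comparison. A hedged alternative is to compare drive-relative paths instead, as sketched below. `drive_relative` and `only_in_second` are hypothetical helpers. Lower-casing assumes a case-insensitive file system, as is the default on Windows.

```python
from pathlib import PureWindowsPath

def drive_relative(path):
    """Return the path without its drive prefix, normalised for comparison."""
    p = PureWindowsPath(path)
    # as_posix() unifies '/' and '\\' separators; lower() ignores case differences
    return p.relative_to(p.anchor).as_posix().lower()

def only_in_second(first_paths, second_paths):
    """Paths from second_paths whose drive-relative form is absent from first_paths."""
    seen = {drive_relative(p) for p in first_paths}
    return sorted(p for p in second_paths if drive_relative(p) not in seen)
```

With the lists from this notebook, `only_in_second(all_wav_files[0], all_wav_files[1])` would return only the files on the second drive that have no counterpart at the same relative location on the first. Matching names do not guarantee matching content.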

In [29]:
# which files are these -- get the session dates on them 
session_dates_uniquefiles = []
for each in wav_fieldwork01_diff:
    folder,file = os.path.split(each)
    _, session_folder = os.path.split(folder)
    session_dates_uniquefiles.append(session_folder)
    

In [43]:
# np.unique already returns the unique values in sorted order
unique_sessions_f01 = np.unique(session_dates_uniquefiles).tolist()
unique_sessions_f01

['2018-04-09_001',
 '2018-04-14_001',
 '2018-04-17_001',
 '2018-04-19_001',
 '2018-04-19_002',
 '2018-04-21_001',
 '2018-04-22_001',
 '2018-04-22_002',
 '2018-04-25_001',
 '2018-04-26_001',
 '2018-04-26_002',
 '2018-05-01_001',
 '2018-05-01_002',
 '2018-05-20_001',
 '2018-05-20_002',
 '2018-05-20_003',
 '2018-05-24_001',
 '2018-05-24_002',
 '2018-05-24_003',
 '2018-05-24_004',
 '2018-05-28_001',
 '2018-05-30_001',
 '2018-05-30_003',
 '2018-05-31_001',
 '2018-05-31_002',
 '2018-05-31_003',
 '2018-05-31_004',
 '2018-06-07_001',
 '2018-06-07_002',
 '2018-06-07_003',
 '2018-06-12_001',
 '2018-06-12_002',
 '2018-06-14_001',
 '2018-06-14_002',
 '2018-06-19_001',
 '2018-06-19_002',
 '2018-06-21_001',
 '2018-06-21_002',
 '2018-06-22_001',
 '2018-06-22_002',
 '2018-06-22_003',
 '2018-07-14_001',
 '2018-07-14_002',
 '2018-07-14_003',
 '2018-07-21_001',
 '2018-07-21_002',
 '2018-07-21_003',
 '2018-07-21_004',
 '2018-07-25_001',
 '2018-07-25_002',
 '2018-07-25_003',
 '2018-07-28_001',
 '2018-07-28

We see that most of 

### There are files in #2 that need to be added on top of the wav files already in #1. 

In [34]:
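drives = ['D:/','E:/','F:/','G:/']
# fieldseason 01: all .TMC video files on the fourth drive
# NOTE: in the folder layout above the .TMC files sit inside K*/P*/ folders, one level
# deeper than this pattern looks, so it may need to be 'video/*/K*/P*/*.TMC'
fieldseason1_video_path = 'fieldwork_2018_001/video/*/K*/*.TMC'
all_tmc_files = glob.glob(drives[3]+fieldseason1_video_path)
print(len(all_tmc_files))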

0
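
A count of raw video files that follows the documented layout (`video/YYYY-MM-DD/K*/P*/*.TMC`) could look like the sketch below. `count_tmc_per_camera` is a hypothetical helper, and `root` is assumed to be a fieldwork folder such as `fieldwork_2018_001` on one of the drives.

```python
import glob
import os

def count_tmc_per_camera(root):
    """Count .TMC files per camera folder under root/video/<date>/K*/P*/."""
    pattern = os.path.join(root, 'video', '*', 'K*', 'P*', '*.TMC')
    counts = {}
    for path in glob.glob(pattern):
        # path ends in .../<camera>/<P-folder>/<file>, so the camera is two levels up
        camera = os.path.basename(os.path.dirname(os.path.dirname(path)))
        counts[camera] = counts.get(camera, 0) + 1
    return counts
```

An empty dictionary means no files matched, either because the drive holds no video for that field season or because the folders deviate from the expected layout.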
