## Intro

I will be analysing seismic waveform data from the Southern California Earthquake Data Center (SCEDC). This is part of an ongoing research project of mine, and I am thinking of using some machine learning techniques to help me analyse the collected data further.

Here is an example of what the data looks like when I plot it.
The figure has three panels, each showing the seismic wave recorded on a different component: Z (vertical), radial and transverse. Put simply, you can think of this as looking at the same wave from three different directions. The line labeled "P" marks the P-wave arrival; the waveforms are aligned on it so that they all line up, which makes them easier to analyse.
![image.png](attachment:image.png)
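The memo does not say how the radial and transverse traces were produced. A common route, assumed here, is to rotate the horizontal north and east traces using the back azimuth from the station to the event. ObsPy can do this with `Stream.rotate("NE->RT", back_azimuth=...)`; the plain-Python sketch below uses the same sign convention as `obspy.signal.rotate.rotate_ne_rt`, and the function name `rotate_ne_to_rt` is hypothetical.

```python
import math


def rotate_ne_to_rt(north, east, back_azimuth_deg):
    """Rotate north/east samples into radial/transverse components.

    Same sign convention as ObsPy's rotate_ne_rt: the radial direction
    points away from the source along the great circle.
    """
    if len(north) != len(east):
        raise ValueError("north and east must have the same number of samples")
    ba = math.radians(back_azimuth_deg)
    # Project each (north, east) sample onto the radial and transverse axes.
    radial = [-e * math.sin(ba) - n * math.cos(ba) for n, e in zip(north, east)]
    transverse = [-e * math.cos(ba) + n * math.sin(ba) for n, e in zip(north, east)]
    return radial, transverse
```

For an event due north of the station (back azimuth 0°), the radial trace is simply the negative of the north trace, because the radial direction points away from the source, i.e. to the south.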


## How am I collecting the data?

I used the Python package ObsPy to download all the data. ObsPy retrieves the data through the web service API provided by SCEDC.
The documentation for ObsPy can be found at: https://docs.obspy.org/

The data I downloaded span January 1st 2022 to July 1st 2022.
They cover latitudes 32° to 37° and longitudes -122° to -114°.
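As a rough illustration of the download step, the sketch below asks the SCEDC FDSN event service for events inside the time window and latitude/longitude box stated above, using ObsPy's `Client("SCEDC")` and `Client.get_events`. Any other criteria I applied (magnitude limits, station selection) are not listed in this memo, so they are left out; the constant and function names are hypothetical, and the raw event count will not match the 514 events that remained after filtering.

```python
# Search window taken from this memo: 2022-01-01 to 2022-07-01,
# latitude 32 to 37, longitude -122 to -114.
SEARCH_REGION = {
    "minlatitude": 32.0,
    "maxlatitude": 37.0,
    "minlongitude": -122.0,
    "maxlongitude": -114.0,
}
START = "2022-01-01T00:00:00"
END = "2022-07-01T00:00:00"


def fetch_event_catalog():
    """Query the SCEDC FDSN event service for events in the study window."""
    # Imported here so the search parameters above can be inspected
    # on a machine without ObsPy installed.
    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("SCEDC")
    return client.get_events(
        starttime=UTCDateTime(START),
        endtime=UTCDateTime(END),
        **SEARCH_REGION,
    )
```

Calling `catalog = fetch_event_catalog()` returns an ObsPy `Catalog`; the waveforms for each event would then be requested separately, for example with `Client.get_waveforms`.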

## How many observations do I have?

I have downloaded 514 seismic events, which give roughly 33,000 waveforms in total after some filtering. The filtering process is rather complex: it involves many steps, such as creating directories, specifying seismic station channels and converting file types, which I think are beyond the scope of this data memo. However, I will describe the whole process in my final project write-up.

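The waveforms are time series data: each raw waveform is a sequence of ground-motion amplitudes recorded over time, so apart from the amplitude itself, time is the only variable.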
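Because each waveform is just amplitude sampled at a fixed rate, the only coordinate that has to be reconstructed is time. The sketch below builds a time axis measured relative to the P arrival (the alignment used by the "P" line in the first figure) and cuts a window from it. It assumes a constant sampling rate and that the P arrival time is already known; the function names are hypothetical, and in ObsPy the equivalents would be `Trace.stats.sampling_rate`, `Trace.times()` and `Trace.slice()`.

```python
def time_axis_relative_to_p(n_samples, sampling_rate_hz, start_offset_s):
    """Return sample times in seconds relative to the P arrival.

    start_offset_s is the time of the first sample minus the P arrival
    time (negative when the record starts before P).
    """
    dt = 1.0 / sampling_rate_hz
    return [start_offset_s + i * dt for i in range(n_samples)]


def cut_window(samples, sampling_rate_hz, start_offset_s, t0, t1):
    """Keep the samples whose P-relative time lies in [t0, t1)."""
    times = time_axis_relative_to_p(len(samples), sampling_rate_hz, start_offset_s)
    return [x for x, t in zip(samples, times) if t0 <= t < t1]
```

Aligning every trace on P this way means a given phase, such as sP, should appear at a similar relative time across nearby stations, which is what makes the aligned plots easier to compare.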

## Overview of research questions

The goal of my research is to use machine learning to find a specific signal in the waveforms, called the sP phase. These phases are useful for locating earthquakes, in particular for constraining how deep an earthquake occurred. In other words, if these phases are correctly found and labeled, seismologists can use them to locate where an earthquake happened.
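In a simplified picture with a uniform layer above the source (P speed $\alpha$, S speed $\beta$) and a ray parameter $p$ shared by both rays, the sP phase leaves the source upward as an S wave, converts to P at the free surface and then follows nearly the same path as the direct P wave. Its delay after P is then approximately $t_{sP} - t_P \approx h\,(\eta_\alpha + \eta_\beta)$, where $\eta_v = \sqrt{1/v^2 - p^2}$ and $h$ is the focal depth, so inverting this relation turns a picked sP arrival into a depth estimate. The sketch below assumes this flat, homogeneous model (real studies use layered velocity models); the function names and velocity values are illustrative only.

```python
import math


def vertical_slowness(velocity_km_s, ray_param_s_km):
    """eta = sqrt(1/v^2 - p^2) in s/km; requires p < 1/v."""
    value = 1.0 / velocity_km_s**2 - ray_param_s_km**2
    if value <= 0.0:
        raise ValueError("ray parameter too large for this velocity")
    return math.sqrt(value)


def sp_minus_p_delay(depth_km, vp_km_s, vs_km_s, ray_param_s_km):
    """Approximate sP - P delay (s) for a homogeneous layer above the source."""
    eta_sum = vertical_slowness(vp_km_s, ray_param_s_km) + vertical_slowness(vs_km_s, ray_param_s_km)
    return depth_km * eta_sum


def depth_from_sp_delay(delay_s, vp_km_s, vs_km_s, ray_param_s_km):
    """Invert the same relation: focal depth (km) from an observed sP - P delay."""
    eta_sum = vertical_slowness(vp_km_s, ray_param_s_km) + vertical_slowness(vs_km_s, ray_param_s_km)
    return delay_s / eta_sum
```

For example, with $\alpha = 6$ km/s, $\beta = 3.5$ km/s and vertical incidence ($p = 0$), a 10 km deep event gives an sP-P delay of about 4.5 s, so a timing error of even half a second in the pick translates into roughly a kilometre of depth error.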

Here is an example of a waveform that I have labeled previously:
In this picture the waveform data form a time series: the amplitude of the seismic wave (y-axis) changes over time (x-axis). The three colored vertical lines are predictions of approximately where the phase should be, and the colored regions are where the phase actually is. The predictions come from a simplified model of the Earth's structure. The colored regions were labeled manually by performing a number of tests and by judging, from experience, how the waveform compares with other similar waveforms. This process is usually tedious and time-consuming. In addition, there is a significant difference between the predicted lines and the actual regions. My goal is to use machine learning to automate this process in a way that improves both the efficiency and the accuracy of finding and labeling sP phases.
![image-2.png](attachment:image-2.png)
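To put a number on the difference between the predicted lines and the labeled regions, each manual label can be stored as a small record and compared with its prediction. The record layout and field names below are hypothetical, and taking the centre of the labeled region as the pick time is an assumption. The mean absolute residual of the model predictions would then be a baseline that a neural network has to beat.

```python
from dataclasses import dataclass


@dataclass
class PhaseLabel:
    """One manually labeled sP window (hypothetical record layout)."""
    event_id: str
    station: str
    predicted_time_s: float  # arrival predicted by the simplified Earth model
    window_start_s: float    # start of the manually labeled region
    window_end_s: float      # end of the manually labeled region

    @property
    def picked_time_s(self) -> float:
        # Use the centre of the labeled region as a single pick time.
        return 0.5 * (self.window_start_s + self.window_end_s)

    @property
    def residual_s(self) -> float:
        # Positive when the labeled phase arrives after the prediction.
        return self.picked_time_s - self.predicted_time_s


def mean_absolute_residual(labels):
    """Average size of the prediction error over a list of PhaseLabel records."""
    if not labels:
        raise ValueError("no labels given")
    return sum(abs(lab.residual_s) for lab in labels) / len(labels)
```

All times are assumed to share the same reference, for example seconds after the P arrival, so that residuals from different events can be averaged.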

## Proposed plan of the project

My planned approach for this project is to use a neural network to find these sP phases. The method will be similar to the neural network described in this paper for identifying PmP phases (a different but related seismic phase): https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2021JB023830

The idea is to first manually create a training database of about 100 identified and labeled signals, like the one shown above. This database will then serve as the training set for a neural network that takes the waveform time series as input and outputs a probability and a time value. The probability represents the likelihood that an sP signal is present in the waveform, and the time value gives where the sP signal occurs.
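A network with these two outputs needs a training objective that handles both. One common choice, assumed here and not necessarily what the PmP paper uses, is binary cross-entropy on the probability plus a timing error that only counts for waveforms that actually contain an sP phase. The function below sketches this for a single waveform in plain Python; its name and the `time_weight` parameter are hypothetical.

```python
import math
from typing import Optional


def detection_and_timing_loss(prob: float, pred_time_s: float, has_sp: bool,
                              true_time_s: Optional[float], time_weight: float = 1.0) -> float:
    """Binary cross-entropy on the sP probability plus an absolute timing error.

    The timing term is skipped when the waveform has no sP phase, because
    there is no true arrival time to compare against.
    """
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    target = 1.0 if has_sp else 0.0
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    if not has_sp:
        return bce
    if true_time_s is None:
        raise ValueError("a labeled sP waveform needs a true arrival time")
    return bce + time_weight * abs(pred_time_s - true_time_s)
```

Masking the timing term this way lets waveforms without an sP phase still be used for training: they teach the network when to output a low probability without pushing the time output toward an arbitrary value.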

## Proposed timeline of the project

The data is already downloaded and my workflow for manually labeling waveforms is already implemented. 

01/20 - 02/01 Manually label waveforms and create a database of over 100 labeled signals

02/01 - 02/15 Try the different machine learning models that we have learnt

02/15 - 03/01 Implement a neural network by modifying the code from the paper above

03/01 - 03/24 Write the project report

## Potential problems 

It is possible that sP signals simply cannot be reliably identified by a neural network.