# Geomechanical Injection Scenario Toolkit (GIST)

# Disclaimer
GIST aims to give the _gist_ of a wide range of potential scenarios and aid collective decision making when responding to seismicity.

The results of GIST are entirely dependent upon the inputs provided, which may be incomplete or inaccurate.

There are other potentially plausible inducement scenarios that are not considered, including fluid migration into the basement, 
out-of-zone poroelastic stressing, or hydraulic fracturing.

None of the individual models produced by GIST accurately represents what happens in the subsurface, and none can be credibly used 
to assign liability or responsibility for seismicity.

"All models are wrong, but some are useful" - George Box, 1976

## Prerequisites

This notebook assumes that InjectionSQLScheduled has completed successfully and that the injection data are sampled uniformly in time.
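
The uniform-sampling assumption can be checked before running the forecast. The following is a minimal sketch, not part of GIST: the column names `ID`, `Date` and `BPD` and the helper `is_uniformly_sampled` are hypothetical and should be adapted to the actual injection table.

```python
import pandas as pd

def is_uniformly_sampled(dates: pd.Series) -> bool:
    """Return True if the sorted timestamps are separated by one constant step."""
    steps = pd.to_datetime(dates).sort_values().diff().dropna()
    return steps.nunique() <= 1

# Hypothetical injection table: one well sampled every 10 days.
inj = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["2024-01-01", "2024-01-11", "2024-01-21", "2024-01-31"]),
    "BPD": [5000.0, 5200.0, 4800.0, 5100.0],
})

# One boolean per well; any False flags a well with irregular sampling.
uniform_by_well = inj.groupby("ID")["Date"].apply(is_uniformly_sampled)
print(uniform_by_well)
```

Checking per well rather than on the whole table avoids false alarms when wells start on different dates but share the same step.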

## Install Dependencies
- geopandas
- gistMC.py
- eqSQL.py
- gistPlots.py
- numpy
- scipy
- pandas


In [0]:
%restart_python

In [0]:
%run "/Workspace/_utils/Utility_Functions"

In [0]:
!pip install geopandas
!pip install geodatasets
!pip install contextily
#! pip install folium matplotlib mapclassify contextily

Collecting geopandas
  Downloading geopandas-1.0.1-py3-none-any.whl (323 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 323.6/323.6 kB 5.0 MB/s eta 0:00:00
Collecting pyogrio>=0.7.2
  Downloading pyogrio-0.10.0-cp310-cp310-manylinux_2_28_x86_64.whl (23.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.9/23.9 MB 64.3 MB/s eta 0:00:00
Collecting shapely>=2.0.0
  Downloading shapely-2.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 88.2 MB/s eta 0:00:00
Collecting pyproj>=3.3.0
  Downloading pyproj-3.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.2/9.2 MB 98.3 MB/s eta 0:00:00
Installing collected packages: shapely, pyproj, pyogrio, geopandas
Successfully installed geopandas-1.0.1 pyogrio-0.10.0 pyproj-3.7.0 shapely-2.0.7
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use 

## Paths

In [0]:
# Paths
homePath='/Workspace/Users/bill.curry@exxonmobil.com/'
# Injection data path 
injPath=homePath+'injection/WeeklyRun/ScheduledOutput/'
# GIST library path
gistPath=homePath+'GIST/'

## Libraries

- numpy
- scipy
- pandas
- matplotlib
- geopandas
- pyspark


In [0]:
import sys
sys.path.append(gistPath+'lib')

In [0]:
#Databricks-specific
#import dataBricksConfig as db
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("eqSQL").getOrCreate()

In [0]:
import numpy as np
import pandas as pd
import os
import gistMC as gi
import eqSQL as es
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas
#import contextily as cx



In [0]:
import gistPlots as gp

# Restore state from Step 4

In [0]:
eventID='texnet2024oqfb'
deepOrShallow='Deep'
verb=0
forecastYears=5.
runPath=gistPath+'/runs/'+eventID+'/'
runIntervalPath=runPath+deepOrShallow+'/'
initialRunIntervalPath=runIntervalPath+'initialRun/'
updatedRunIntervalPath=runIntervalPath+'udpatedRun/'
forecastRunIntervalPath=runIntervalPath+'forecastRun/'
disposalPath=runIntervalPath+'updatedDisposal/'

In [0]:
import pickle
with open(updatedRunIntervalPath+'gist.pkl', 'rb') as file:
  gist2=pickle.load(file)
EQ=pd.read_csv(runPath+'EQ.csv').loc[0]
EQ['Origin Date']=pd.to_datetime(EQ['Origin Date'])

In [0]:
#selectedWellsDF=pd.read_csv(disposalPath+'forecastSelectedWells.csv')
#ignoredWellsDF=pd.read_csv(disposalPath+'forecastIgnoredWells.csv')
allWellsFile=disposalPath+'allInZoneWells.csv'
injFile=disposalPath+'inj.csv'
allInZoneWellsDF=pd.read_csv(disposalPath+'allInZoneWells.csv')
#allOutOfZoneWellsDF=pd.read_csv(runIntervalPath+'updatedDisposal/allOutOfZoneWells.csv')
#injDF=pd.read_csv(disposalPath+'inj.csv')

# 5. Forecast

## 5.1 Input Forecast Times

In [0]:
# End year of the time series forecast (the forecast date below is set by forecastYears, not EndYear)
EndYear=2030
# Rerun selected wells with the earthquake origin time shifted forward by forecastYears
future=EQ.copy()
future['Origin Date']=future['Origin Date']+pd.Timedelta(forecastYears*365.25,'D')
# Save the shifted event as a single row (forecastEQT.csv) and as a single column (forecastEQ.csv)
future.to_frame().T.to_csv(forecastRunIntervalPath+'/forecastEQT.csv')
future.to_frame().to_csv(forecastRunIntervalPath+'/forecastEQ.csv')
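
The date shift and the two CSV layouts can be illustrated in isolation. This is a small sketch with a made-up origin date and magnitude (not the real event); it shows that adding `forecastYears*365.25` days lands a few hours off the calendar anniversary, and how `to_frame()` and `to_frame().T` differ in shape.

```python
import pandas as pd

forecast_years = 5.0
origin = pd.Timestamp("2024-07-01")  # hypothetical origin date

# Same arithmetic as the cell above: a fixed number of days.
horizon = origin + pd.Timedelta(forecast_years * 365.25, "D")

# Calendar-exact alternative, shown only for comparison.
calendar_horizon = origin + pd.DateOffset(years=int(forecast_years))
print(horizon, calendar_horizon)

# A one-event Series becomes a single column, or a single row once transposed.
event = pd.Series({"EventID": "example", "Origin Date": horizon, "Magnitude": 3.0})
column_layout = event.to_frame()    # shape (3, 1)
row_layout = event.to_frame().T     # shape (1, 3)
print(column_layout.shape, row_layout.shape)
```

The row layout is the one that reads back naturally with `pd.read_csv(...).loc[0]`, which is how the event is restored earlier in this notebook.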


## 5.2 Distributions of rates for each well

In [0]:
gist2.addWells(userWellFile=allWellsFile,userInjFile=injFile,verbose=2)

 gistMC.addWells: user wells and injection provided, no default wells/injection
 gistMC.addWells: no user wells and injection provided, only default wells/injection
 gistMC.addWells: well file added with  2863  wells
 gistMC.addWells: well columns: Index(['Unnamed: 0', 'ID', 'InjectionWellId', 'UniqueWellIdentifier',
       'UICNumber', 'APINumber', 'LeaseName', 'Operator', 'OperatorType',
       'OperatorPrincipalCompany', 'OperatorPrincipalCompanyType', 'WellName',
       'WellNumber', 'State', 'Basin', 'County', 'District', 'SRAOrSIR',
       'B3InjectionType', 'B3InjectionStatus', 'RegulatoryInjectionType',
       'PermittedMaxLiquidBPD', 'PermittedMaxLiquidPSIG',
       'PermittedMaxGasMCFPerDay', 'PermittedMaxGasPSIG',
       'PermittedCommercialStatus', 'PermittedIntervalTopFt',
       'PermittedIntervalBottomFt', 'InjectionClass', 'PermitStage',
       'PermittedWellDepthClassification', 'DaysApplicationHasBeenInReview',
       'DaysToPermitApproval', 'PermitIsAmendment',
     

In [0]:

forecastSelectedWellsDF,forecastIgnoredWellsDF,forecastInitialInjDF=gist2.findWells(future,PE=False,verbose=verb)
# Merge updated well dataframe with prior unselected wells
#addedWellsDF=forecastSelectedWellsDF[~forecastSelectedWellsDF['InjectionWellId'].isin(updatedSelectedWellsDF['InjectionWellId'])]
#print(addedWellsDF)
#forecastSelectedWellsDF['Added']=False
#forecastSelectedWellsDF.loc[forecastSelectedWellsDF['InjectionWellId'].isin(addedWellsDF['InjectionWellId']),'Added']=True
# Re-generate an R-T plot here. Should it have two sets of curves?
# We need a column for wells that are newly added - 




In [0]:
# I need new code here that calculates the time derivatives at a future point
# TDTD0DF= gist.getTHTD0(newSelectedWellsDF,injDF,future,verb)

In [0]:
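# Get ordered well IDs from orderedWellList[:-1]
# read_csv(squeeze=True) was removed in pandas 2.0; squeeze the single column after reading instead
updatedOrderedWellList=pd.read_csv(updatedRunIntervalPath+'wellOrder.csv',index_col=0).squeeze('columns')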
print(updatedOrderedWellList[:-1])
forecastWellIDList=[]
for wellName in updatedOrderedWellList[:-1]:
  #print('wellName:',wellName)
  wellID=forecastSelectedWellsDF[forecastSelectedWellsDF['WellName']==wellName]['ID'].to_list()[0]
  print(wellName,' ID:',wellID)
  forecastWellIDList.append(wellID)





0         MARIENFELD 13 1D
1            ANNALEA SWD 1
2             PALO VERDE 1
3               FAUDREE 1D
4            NEPTUNE SWD 1
5          DICKENSON 20 8D
6              CORFU SWD 1
7        SALE RANCH 27 1DD
8                PAT, K. 3
9          SALT LAKE SWD 1
10      LIMEQUEST 6 SWD 1D
11     MCMORRIES 18 SWD 2D
12    PENROSE-OLDHAM SWD 2
13             BERRY SWD 1
14    DAGGER LAKE SWD 08SD
15           MARBILL SWD 1
16    NAIL RANCH \"36\" 1D
17             BROWN SWD 1
Name: Name, dtype: object
MARIENFELD 13 1D  ID: 2121094
ANNALEA SWD 1  ID: 2121271
PALO VERDE 1  ID: 2123938
FAUDREE 1D  ID: 2117885
NEPTUNE SWD 1  ID: 2111946
DICKENSON 20 8D  ID: 2109094
CORFU SWD 1  ID: 2116501
SALE RANCH 27 1DD  ID: 2112288
PAT, K. 3  ID: 2081404
SALT LAKE SWD 1  ID: 2114817
LIMEQUEST 6 SWD 1D  ID: 2112104
MCMORRIES 18 SWD 2D  ID: 2111358
PENROSE-OLDHAM SWD 2  ID: 2056720
BERRY SWD 1  ID: 2121191
DAGGER LAKE SWD 08SD  ID: 2113898
MARBILL SWD 1  ID: 2111690
NAIL RANCH \"36\" 1D  ID: 211899
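
The name-to-ID loop above raises an `IndexError` if a well name from the saved order is absent from the selected wells. A more defensive lookup might look like the sketch below; the well names and IDs are invented for illustration, and only the `WellName` and `ID` column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical selected-wells table.
wells = pd.DataFrame({
    "WellName": ["ALPHA SWD 1", "BRAVO 2D", "CHARLIE SWD 3"],
    "ID": [101, 102, 103],
})
# Hypothetical saved ordering; "DELTA 4" is deliberately absent from `wells`.
ordered_names = ["CHARLIE SWD 3", "ALPHA SWD 1", "DELTA 4"]

# Build a name -> ID lookup once, keeping the first row for any repeated name.
name_to_id = wells.drop_duplicates("WellName").set_index("WellName")["ID"]

# reindex preserves the requested order and marks unknown names as NaN.
matched = name_to_id.reindex(ordered_names)
missing = matched[matched.isna()].index.tolist()
well_id_list = matched.dropna().astype(int).tolist()

print("IDs in order:", well_id_list)
print("names without a match:", missing)
```

Reporting the unmatched names instead of failing makes it easier to see when the forecast well selection differs from the earlier run.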

In [0]:
# Put in new rates: every well gets 10,000 bbl/day
rateDict=dict(zip(forecastWellIDList,[10000.]*len(forecastWellIDList)))
print(rateDict)

{2121094: 10000.0, 2121271: 10000.0, 2123938: 10000.0, 2117885: 10000.0, 2111946: 10000.0, 2109094: 10000.0, 2116501: 10000.0, 2112288: 10000.0, 2081404: 10000.0, 2114817: 10000.0, 2112104: 10000.0, 2111358: 10000.0, 2056720: 10000.0, 2121191: 10000.0, 2113898: 10000.0, 2111690: 10000.0, 2118993: 10000.0, 2093925: 10000.0}
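
The cell above assigns the same rate to every well. If a scenario needs different rates for a few wells, one option is to start from a default and override individual entries, as in this sketch. The IDs and rates are invented, and how `gi.extendDisposal` treats a rate of zero is an assumption that should be checked against the library.

```python
# Hypothetical well IDs and a default future rate in bbl/day.
well_ids = [101, 102, 103]
default_rate = 10000.0

# Every well starts at the default rate.
rate_dict = dict.fromkeys(well_ids, default_rate)

# Override selected wells, for example a shut-in and a reduced-rate well.
rate_dict[102] = 0.0
rate_dict[103] = 2500.0

print(rate_dict)
```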


In [0]:
startDate=pd.to_datetime('2025-01-15')
print(gi.__dir__())
forecastInjDF=gi.extendDisposal(forecastInitialInjDF,startDate,future['Origin Date'],rateDict,dDays=gist2.injDT,verbose=1)

 gist.extendDisposal - lastDay:  20100
 gist.extendDisposal - future time horizon:  20110.0 21767.0 10.0
 extendDisposal - wellIDList:  [2078938 2078251 2080714 2079858 2079878 2077198 2081262 2081279 2077335
 2084245 2083389 2083417 2083672 2085423 2084091 2086897 2086098 2087824
 2088381 2086294 2086664 2082680 2090790 2090834 2093611 2091330 2100538
 2098949 2099704 2099183 2100310 2105399 2105922 2106015 2106155 2106820
 2108311 2108259 2105036 2111004 2110195 2112042 2112104 2112591 2111864
 2112175 2112172 2113672 2112198 2111946 2111968 2112249 2111988 2112227
 2112288 2113898 2114930 2114823 2114949 2109286 2114597 2117363 2117616
 2118060 2116406 2117633 2119505 2117806 2117915 2120827 2120859 2120458
 2117885 2119978 2120624 2120563 2118820 2120964 2120979 2120980 2118993
 2121001 2121323 2121013 2121310 2119799 2121094 2121097 2121129 2121091
 2119049 2121177 2121190 2121203 2121181 2121206 2121213 2121617 2121255
 2121715 2119358 2126182 2118517 2119465 2121854 2120225 2120

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pastInjDF['Type']='Original'
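
The log above reports times as days since 1970-01-01 on a 10-day grid. As a rough illustration of what a constant-rate extension could look like, here is a minimal sketch. It is not the `gistMC.extendDisposal` implementation: the function name, the `ID`, `Days`, `BPD` and `Type` columns, and the grid handling are all assumptions. It also takes an explicit copy before adding a column, which is the usual way to avoid the `SettingWithCopyWarning` shown above.

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")

def extend_constant_rates(past, end_date, rates, d_days=10.0):
    """Append constant-rate rows for each well in `rates` up to end_date.

    `past` needs columns ID, Days (days since EPOCH) and BPD. These names are
    assumptions for this sketch, not the gistMC schema.
    """
    past = past.copy()  # explicit copy, so adding a column cannot touch a view
    past["Type"] = "Original"
    last_day = float(past["Days"].max())
    end_day = (pd.Timestamp(end_date) - EPOCH) / pd.Timedelta(1, "D")
    # Grid continues from the last sampled day; it stops at the first grid day
    # at or after end_day.
    future_days = np.arange(last_day + d_days, end_day + d_days, d_days)
    rows = [(well, day, rate) for well, rate in rates.items() for day in future_days]
    future = pd.DataFrame(rows, columns=["ID", "Days", "BPD"])
    future["Type"] = "Forecast"
    return pd.concat([past, future], ignore_index=True)

# Hypothetical history: two wells, last sample on day 20100.
past = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Days": [20090.0, 20100.0, 20090.0, 20100.0],
    "BPD": [8000.0, 8200.0, 3000.0, 2900.0],
})
extended = extend_constant_rates(
    past, EPOCH + pd.Timedelta(20130, "D"), {1: 10000.0, 2: 5000.0}
)
print(extended)
```

In this sketch, wells that are missing from `rates` simply receive no forecast rows; a real workflow would need an explicit rule for them.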


In [0]:
forecastRTDF,forecastMergedWellsDF = gi.prepRTPlot(forecastSelectedWellsDF,forecastIgnoredWellsDF,minYear='1985',diffRange=(gist2.diffPPVec.min(),gist2.diffPPVec.max()),eq=future,clipYear=False)


In [0]:
forecastMergedWellsDF.to_csv(forecastRunIntervalPath+'forecastRTwells.csv')
forecastRTDF.to_csv(forecastRunIntervalPath+'forecastRTDF.csv')
forecastInitialInjDF.to_csv(forecastRunIntervalPath+'forecastInitialInjDF.csv')

In [0]:
forecastScenarioDF=gist2.runPressureScenarios(future,forecastSelectedWellsDF,forecastInjDF,verbose=verb)
forecastFilteredDF,forecastOrderedWellList=gi.summarizePPResults(forecastScenarioDF,forecastSelectedWellsDF,threshold=1.,nOrder=40,verbose=verb)
forecastDisaggregationDF=gi.prepDisaggregationPlot(forecastFilteredDF,forecastOrderedWellList,jitter=0.1,verbose=1)
forecastWinWellsDF,forecastWinInjDF=gi.getWinWells(forecastFilteredDF,forecastSelectedWellsDF,forecastInjDF)

 prepDisaggregationPlot:  21  wells in disaggregation plot with  1000  realizations
 prepDisaggregationPlot:  1000  rows for  MARIENFELD 13 1D
 prepDisaggregationPlot:  1000  rows for  ANNALEA SWD 1
 prepDisaggregationPlot:  1000  rows for  PALO VERDE 1
 prepDisaggregationPlot:  1000  rows for  SALT LAKE SWD 1
 prepDisaggregationPlot:  1000  rows for  FAUDREE 1D
 prepDisaggregationPlot:  1000  rows for  SALE RANCH 27 1DD
 prepDisaggregationPlot:  1000  rows for  CORFU SWD 1
 prepDisaggregationPlot:  1000  rows for  NEPTUNE SWD 1
 prepDisaggregationPlot:  1000  rows for  DICKENSON 20 8D
 prepDisaggregationPlot:  1000  rows for  LIMEQUEST 6 SWD 1D
 prepDisaggregationPlot:  1000  rows for  DAGGER LAKE SWD 08SD
 prepDisaggregationPlot:  1000  rows for  MCMORRIES 18 SWD 2D
 prepDisaggregationPlot:  1000  rows for  JAKE THE SNAKE SWD 1
 prepDisaggregationPlot:  1000  rows for  MARBILL SWD 1
 prepDisaggregationPlot:  1000  rows for  BERRY SWD 1
 prepDisaggregationPlot:  1000  rows for  PENROS

In [0]:
forecastScenarioTSRDF,forecastdPTimeSeriesR,forecastWellIDsR,forecastDayVecR = gist2.runPressureScenariosTimeSeries(future,forecastWinWellsDF,forecastWinInjDF,verbose=verb)

In [0]:
forecastTotalPPQuantilesDF=gi.prepTotalPressureTimeSeriesPlot(forecastdPTimeSeriesR,forecastDayVecR,nQuantiles=11,epoch=pd.to_datetime('1970-01-01'),verbose=1)
forecastTotalPPSpaghettiDF=gi.prepTotalPressureTimeSeriesSpaghettiPlot(forecastdPTimeSeriesR,forecastDayVecR,gist2.diffPPVec,epoch=pd.to_datetime('1970-01-01'),verbose=1)

prepTotalPressureTimeSeriesPlot: deltaPP.shape= (20, 1000, 1578)  dayVec.shape= (1578,)
prepTotalPressureTimeSeriesPlot: totalDeltaPP.shape= (1000, 1578)
prepTotalPressureTimeSeriesPlot: quantiles: [0.0, 10.0, 20.0, 30.0, 40.0, 50.1, 60.0, 70.0, 80.0, 90.0, 100.0]
prepTotalPressureTimeSeriesSpaghettiPlot: deltaPP.shape= (20, 1000, 1578)  dayVec.shape= (1578,)
prepTotalPressureTimeSeriesSpaghettiPlot: totalDeltaPP.shape= (1000, 1578)
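
The shapes in the log above suggest that the per-well array (wells, realizations, times) is reduced over the well axis before quantiles are taken across realizations. The sketch below reproduces that reduction on synthetic data. Summing over wells and using evenly spaced percentiles are assumptions about what `prepTotalPressureTimeSeriesPlot` does, not a description of its code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wells, n_realizations, n_times = 3, 200, 5

# Synthetic pressure changes with shape (wells, realizations, times).
delta_pp = rng.gamma(shape=2.0, scale=0.5, size=(n_wells, n_realizations, n_times))

# Total pressure change per realization and time step: (realizations, times).
total_delta_pp = delta_pp.sum(axis=0)

# Eleven evenly spaced percentile levels from 0 to 100.
levels = np.linspace(0.0, 100.0, 11)

# Percentiles across realizations at each time step: (levels, times).
quantiles = np.percentile(total_delta_pp, levels, axis=0)

print(total_delta_pp.shape, quantiles.shape)
```

Each row of `quantiles` is one curve in a quantile fan plot, with the first and last rows bounding every realization.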


In [0]:
#forecastAllPPQuantilesDF=gi.getPerWellPressureTimeSeriesQuantiles(forecastdPTimeSeriesR,forecastDayVecR,forecastWellIDsR,nQuantiles=11,epoch=pd.to_datetime('01-01-1970'))
forecastAllPPQuantilesDF,forecastAllPPSpaghettiDF=gi.getPerWellPressureTimeSeriesSpaghettiAndQuantiles(forecastdPTimeSeriesR,forecastDayVecR,gist2.diffPPVec,forecastWellIDsR,nQuantiles=11,epoch=pd.to_datetime('01-01-1970'))

In [0]:
#forecastWellPressureDict=gi.prepPressureAndDisposalTimeSeriesPlots(forecastAllPPQuantilesDF,forecastWinWellsDF,forecastWinInjDF,forecastOrderedWellList[:-1],verbose=0)

forecastWellPressureDict=gi.prepPressureAndDisposalTimeSeriesPlots(forecastAllPPQuantilesDF,forecastAllPPSpaghettiDF,forecastWinWellsDF,forecastWinInjDF,forecastOrderedWellList[:-1],verbose=0)

In [0]:
forecastMergedWellsDF.to_csv(forecastRunIntervalPath+'RTwells.csv')
forecastRTDF.to_csv(forecastRunIntervalPath+'RTDF.csv')
forecastInjDF.to_csv(forecastRunIntervalPath+'forecastInjDF.csv')
forecastSelectedWellsDF.to_csv(forecastRunIntervalPath+'updatedSelectedWells.csv')
forecastIgnoredWellsDF.to_csv(forecastRunIntervalPath+'updatedIgnoredWells.csv')
#futureInjDF.to_csv(forecastRunIntervalPath+'futureInj.csv')
pd.Series(data=forecastOrderedWellList).to_csv(forecastRunIntervalPath+'wellOrder.csv')
forecastScenarioDF.to_csv(forecastRunIntervalPath+'fullScenarios.csv')
forecastFilteredDF.to_csv(forecastRunIntervalPath+'filteredScenarios.csv')
pd.Series(data=forecastOrderedWellList).to_csv(forecastRunIntervalPath+'forecastWellOrder.csv')
forecastDisaggregationDF.to_csv(forecastRunIntervalPath+'forecastDisaggregation.csv')
forecastTotalPPQuantilesDF.to_csv(forecastRunIntervalPath+'forecastTotalPPQuantiles.csv')
forecastTotalPPSpaghettiDF.to_csv(forecastRunIntervalPath+'forecastTotalPPSpaghetti.csv')

In [0]:
# Write per-well outputs; files are numbered in the iteration order of forecastWellPressureDict
for i, wellDictValue in enumerate(forecastWellPressureDict.values()):
  wellID=wellDictValue['WellInfo']['ID'].to_list()[0]
  futureWellFilePrefix='/perWell/well_'+str(i)+'_'
  wellDictValue['PPQuantiles'].to_csv(forecastRunIntervalPath+futureWellFilePrefix+'PPQuantiles.csv')
  wellDictValue['Disposal'].to_csv(forecastRunIntervalPath+futureWellFilePrefix+'Disposal.csv')
  wellDictValue['WellInfo'].to_csv(forecastRunIntervalPath+futureWellFilePrefix+'WellInfo.csv')
  wellDictValue['Spaghetti'].to_csv(forecastRunIntervalPath+futureWellFilePrefix+'Spaghetti.csv')
  print('well',i,', ID:',wellID,' completed')

well 0 , ID: 2121094  completed
well 1 , ID: 2121271  completed
well 2 , ID: 2123938  completed
well 3 , ID: 2114817  completed
well 4 , ID: 2117885  completed
well 5 , ID: 2112288  completed
well 6 , ID: 2116501  completed
well 7 , ID: 2111946  completed
well 8 , ID: 2109094  completed
well 9 , ID: 2112104  completed
well 10 , ID: 2113898  completed
well 11 , ID: 2111358  completed
well 12 , ID: 2117633  completed
well 13 , ID: 2111690  completed
well 14 , ID: 2121191  completed
well 15 , ID: 2056720  completed
well 16 , ID: 2121195  completed
well 17 , ID: 2123953  completed
well 18 , ID: 2111988  completed
well 19 , ID: 2081404  completed


In [0]:
forecastScenarioTSRDF.to_csv(forecastRunIntervalPath+'materialScenariosR.csv')
np.savez_compressed(forecastRunIntervalPath+'forecastTimeSeriesR.npz', deltaPP=forecastdPTimeSeriesR,dayVec=forecastDayVecR,wellIDs=forecastWellIDsR)
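
A later step can read the compressed archive back with `np.load`, using the same keyword names passed to `np.savez_compressed` above (`deltaPP`, `dayVec`, `wellIDs`). The sketch below round-trips small placeholder arrays through a temporary directory; the array contents are invented.

```python
import os
import tempfile

import numpy as np

# Placeholder arrays standing in for the forecast outputs.
delta_pp = np.zeros((2, 4, 3))                     # (wells, realizations, times)
day_vec = np.array([20110.0, 20120.0, 20130.0])    # days since 1970-01-01
well_ids = np.array([101, 102])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "forecastTimeSeriesR.npz")
    np.savez_compressed(path, deltaPP=delta_pp, dayVec=day_vec, wellIDs=well_ids)

    # np.load returns a lazy archive; copy the arrays out before it is closed.
    with np.load(path) as archive:
        loaded = {key: archive[key] for key in archive.files}

print({key: value.shape for key, value in loaded.items()})
```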