In [33]:


import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)



import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



# Description of Dataset 
 **Context**
 
Historically, gold has been used as a form of currency in various parts of the world, including the USA. Today, precious metals like gold are held by the central banks of all countries to guarantee repayment of foreign debts and to control inflation, so gold reserves reflect the financial strength of a country. Recently, emerging world economies such as China, Russia, and India have been big buyers of gold, whereas the USA, South Africa, and Australia are among the big sellers of gold.

Forecasting the rise and fall of daily gold rates can help investors decide when to buy (or sell) the commodity. However, gold prices depend on many factors, such as the prices of other precious metals, crude oil prices, stock exchange performance, bond prices, and currency exchange rates.

The challenge of this project is to accurately predict the future adjusted closing price of the Gold ETF over a given period. This is a regression problem because the output, the adjusted closing price, is a continuous value.
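One possible way to frame this as a supervised regression task is to pair each row with the adjusted close a fixed number of trading days later. The sketch below uses synthetic data; the `horizon` value and the `target` column name are assumptions made for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the gold DataFrame (values are made up).
dates = pd.bdate_range("2018-01-01", periods=10)
demo = pd.DataFrame({"Date": dates, "Adj Close": np.arange(100.0, 110.0)})

horizon = 3  # hypothetical forecast horizon, in trading days

# The target for each row is the adjusted close `horizon` rows later.
demo["target"] = demo["Adj Close"].shift(-horizon)

# The last `horizon` rows have no future value yet, so they are dropped.
supervised = demo.dropna(subset=["target"]).reset_index(drop=True)
print(supervised.head())
```

Shifting by a negative number of rows keeps the features and the future price on the same row, which is the shape most regression models expect.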

**Dataset Content**


The data for this study was collected from November 18th 2011 to January 1st 2019 from various sources. It has 1718 rows and 80 columns in total. Data was gathered for attributes such as the oil price, the Standard and Poor’s (S&P) 500 index, the Dow Jones Index, US bond rates (10 years), Euro-USD exchange rates, prices of the precious metals silver and platinum and of other metals such as palladium and rhodium, the US Dollar Index, Eldorado Gold Corporation, and the Gold Miners ETF.

**Features**

1. Gold ETF :- Date, Open, High, Low, Close and Volume.
2. S&P 500 Index :- 'SP_open', 'SP_high', 'SP_low', 'SP_close', 'SP_Ajclose', 'SP_volume'
3. Dow Jones Index :- 'DJ_open','DJ_high', 'DJ_low', 'DJ_close', 'DJ_Ajclose', 'DJ_volume'
4. Eldorado Gold Corporation (EGO) :- 'EG_open', 'EG_high', 'EG_low', 'EG_close', 'EG_Ajclose', 'EG_volume'
5. EURO - USD Exchange Rate :- 'EU_Price','EU_open', 'EU_high', 'EU_low', 'EU_Trend'
6. Brent Crude Oil Futures :- 'OF_Price', 'OF_Open', 'OF_High', 'OF_Low', 'OF_Volume', 'OF_Trend'
7. Crude Oil WTI USD :- 'OS_Price', 'OS_Open', 'OS_High', 'OS_Low', 'OS_Trend'
8. Silver Futures :- 'SF_Price', 'SF_Open', 'SF_High', 'SF_Low', 'SF_Volume', 'SF_Trend'
9. US Bond Rate (10 years) :- 'USB_Price', 'USB_Open', 'USB_High','USB_Low', 'USB_Trend'
10. Platinum Price :- 'PLT_Price', 'PLT_Open', 'PLT_High', 'PLT_Low','PLT_Trend'
11. Palladium Price :- 'PLD_Price', 'PLD_Open', 'PLD_High', 'PLD_Low','PLD_Trend'
12. Rhodium Prices :- 'RHO_PRICE'
13. US Dollar Index :- 'USDI_Price', 'USDI_Open', 'USDI_High','USDI_Low', 'USDI_Volume', 'USDI_Trend'
14. Gold Miners ETF :- 'GDX_Open', 'GDX_High', 'GDX_Low', 'GDX_Close', 'GDX_Adj Close', 'GDX_Volume'
15. Oil ETF USO :- 'USO_Open','USO_High', 'USO_Low', 'USO_Close', 'USO_Adj Close', 'USO_Volume'

**Target**

Adjusted Close

In [34]:
gold=pd.read_csv('../input/gold-price-prediction-dataset/FINAL_USO.csv')
gold.head()

In [35]:
gold['Date'] = pd.to_datetime(gold['Date'])

In [36]:
gold.tail(3)

In [37]:
gold.columns

In [38]:
gold.isnull().any().value_counts()

In [39]:
gold.isnull().sum()
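With around 80 columns, the full output of `isnull().sum()` is long. As a rough sketch, a small helper can list only the columns that actually contain missing values. The helper name `missing_report` and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the gold data (values are made up).
demo = pd.DataFrame({
    "Adj Close": [120.1, np.nan, 121.3, 119.8],
    "SP_close": [2700.0, 2710.5, np.nan, np.nan],
    "USO_Close": [12.1, 12.3, 12.2, 12.0],
})

def missing_report(df):
    """Return count and percentage of missing values for columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

report = missing_report(demo)
print(report)
```

An empty result means no column has missing values, which is easier to read than scanning one line per column.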

# Exploratory Data Analysis

The major aims of this EDA are to:
1. Gain a better understanding of the dataset.
2. Discover relationships between variables.
3. Extract important variables.
4. Summarize the main characteristics of the data.

So we are basically looking for the features that have the most impact on the gold price.

# **EDA**

1. Statistical Analysis
2. Correlation Analysis
3. Technical Analysis

# **1. Statistical Analysis**

In [40]:
gold.describe()

# **2. Correlation Analysis**

In [41]:

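# 'Date' is a datetime column, so restrict the correlation to numeric columns
# (recent pandas versions no longer drop non-numeric columns silently).
gold_corr = gold.select_dtypes('number').corr()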

In [42]:
gold_corr

In [43]:
import seaborn as sns

In [44]:
import matplotlib.pyplot as plt
import matplotlib
import matplotlib.dates as mdates
plt.figure(figsize = (20,15))
sns.heatmap(gold_corr, annot = True)

We can see that a conventional heatmap is too crowded to read because the data has 81 features. We therefore need another visualization method to understand the correlations.
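One alternative to the full heatmap is to flatten the correlation matrix and list only the most strongly correlated pairs. This is a minimal sketch on synthetic data; the helper name `top_correlated_pairs` and the toy values are assumptions, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: a random walk plus a few related and unrelated columns.
rng = np.random.default_rng(0)
base = rng.normal(size=200).cumsum()
demo = pd.DataFrame({
    "Adj Close": base,
    "High": base + 0.5,                                  # perfectly correlated
    "SF_Price": base + rng.normal(scale=5.0, size=200),
    "USDI_Price": -base + rng.normal(scale=5.0, size=200),
    "noise": rng.normal(size=200),
})

def top_correlated_pairs(df, n=5):
    """Return the n column pairs with the largest absolute correlation."""
    corr = df.select_dtypes("number").corr()
    # Keep the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

top = top_correlated_pairs(demo, n=3)
print(top)
```

The result is a Series indexed by column pairs, so it stays readable however many features the data has.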

In [45]:
# 'Adj Close' and 'Close' are nearly identical (correlation of 1), so both are dropped here;
# see the introduction to the dataset for further details.
Y=gold.drop(['Adj Close','Close'],axis=1)
Y.head(1)

In [46]:
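# Use numeric columns only: 'Date' is a datetime column and cannot be correlated.
Y.select_dtypes('number').corrwith(gold['Adj Close']).plot.bar(title = 'Correlation with Adj Close',
                                          rot = 90, grid = True, figsize = (20,15),fontsize = 15)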

In the plot above, you can see that the gold price moves with several other market prices. We can therefore say that they have a relative impact on one another.
 

In [47]:
# Correlate numeric columns only ('Date' is a datetime column).
correlation_matrix = gold.select_dtypes('number').corr()
coeff = correlation_matrix['Adj Close'].sort_values(ascending = False)

In [48]:
posi_corr = coeff[coeff > 0]
posi_corr

In [49]:
# Features that are negatively correlated with 'Adj Close'
nega_corr = coeff[coeff < 0]
nega_corr

Note that we will not drop any column permanently: we will still do feature engineering and modeling with all features, then with only the highly correlated features, and compare the accuracy of the two.
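The "selected features" set could be built, for example, by keeping the columns whose absolute correlation with the target exceeds a threshold. The sketch below uses synthetic data; the helper name `select_by_correlation`, the threshold of 0.5, and the toy column names are assumptions and not values used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Deterministic toy data standing in for the gold DataFrame.
t = np.arange(100, dtype=float)
adj = 100 + 0.5 * t
demo = pd.DataFrame({
    "Adj Close": adj,
    "Close": adj.copy(),                  # duplicate of the target
    "feat_pos": adj + 5 * np.sin(t),      # strongly positively correlated
    "feat_neg": -adj + 50,                # perfectly negatively correlated
    "feat_flat": np.tile([1.0, -1.0], 50) # essentially uncorrelated
})

def select_by_correlation(df, target, threshold=0.5, exclude=()):
    """Return correlations with `target` whose absolute value is >= threshold."""
    numeric = df.select_dtypes("number")
    features = numeric.drop(columns=[target, *exclude])
    corr = features.corrwith(numeric[target])
    strong = corr[corr.abs() >= threshold]
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)

selected = select_by_correlation(demo, "Adj Close", threshold=0.5, exclude=["Close"])
print(selected)
```

Filtering on the absolute value keeps strongly negative features as well, since they can be as informative to a model as strongly positive ones.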

In [50]:
# datetime formatting
gold["Date"]=pd.to_datetime(gold.Date,format="%Y-%m-%d")

sns.set(rc={"figure.figsize":(20, 10)})
ax = sns.lineplot(x="Date", y="Adj Close", data=gold,color='blue',label='Adj Close').set(title="Gold's Adjusted Daily Close Price")
ax1= sns.lineplot(x='Date', y='Low', data=gold, color='red',label='Low')
ax2= sns.lineplot(x='Date', y='High', data=gold, color='green',label='High')

In the graph above, the Adj Close, Low, and High values are hard to tell apart. We can see, however, that the three lines rise and fall together. This is due to the strong positive correlation between these features.
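The visual overlap can also be checked numerically. As a rough sketch on synthetic data (the values below are made up, not taken from the dataset), we can compute the pairwise correlation of the three series and the daily High-Low spread as a percentage of the adjusted close.

```python
import numpy as np
import pandas as pd

# Synthetic prices: Low and High sit slightly below and above Adj Close.
t = np.arange(250, dtype=float)
adj = 120 + 10 * np.sin(t / 20)
demo = pd.DataFrame({"Adj Close": adj, "Low": adj - 0.6, "High": adj + 0.7})

# Pairwise correlation between the three series.
trio_corr = demo[["Adj Close", "Low", "High"]].corr()

# Daily High-Low spread as a percentage of the adjusted close.
spread_pct = 100 * (demo["High"] - demo["Low"]) / demo["Adj Close"]

print(trio_corr.round(4))
print(f"Largest daily spread: {spread_pct.max():.2f}% of Adj Close")
```

When the spread is only a small fraction of the price level, the three lines will overlap at the scale of a multi-year plot.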

In [51]:
gold.head()

In [52]:
gold.loc[0,'Date'].day_name()

**Taking a section of the plot**

In [53]:
filt=(gold['Date']<'2013')
Date_2012=gold.loc[filt]

In [54]:
sns.set(rc={"figure.figsize":(20, 10)})
ax = sns.lineplot(x="Date", y="Adj Close", data=Date_2012,color='blue',label='Adj Close').set(title="Gold's Adjusted Daily Close Price 2012")
ax1= sns.lineplot(x='Date', y='Low', data=Date_2012, color='red',label='Low')
ax2= sns.lineplot(x='Date', y='High', data=Date_2012, color='green',label='High')

In [55]:
filt=(gold['Date']>'2015') & (gold['Date']<'2017')
Date_2015_2016=gold.loc[filt]

In [56]:
sns.set(rc={"figure.figsize":(20, 10)})
ax = sns.lineplot(x="Date", y="Adj Close", data=Date_2015_2016,color='blue',label='Adj Close').set(title="Gold's Adjusted Daily Close Price 2015-2016")
ax1= sns.lineplot(x='Date', y='Low', data=Date_2015_2016, color='red',label='Low')
ax2= sns.lineplot(x='Date', y='High', data=Date_2015_2016, color='green',label='High')
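The date filters above repeat the same pattern, so they could be wrapped in a small helper. This is a sketch on synthetic data; `date_window` is a hypothetical helper name, and unlike the strict `>` used above it treats the start date as inclusive.

```python
import numpy as np
import pandas as pd

# Synthetic business-day data standing in for the gold DataFrame.
dates = pd.bdate_range("2014-06-01", "2017-06-30")
demo = pd.DataFrame({"Date": dates, "Adj Close": np.linspace(110.0, 125.0, len(dates))})

def date_window(df, start, end, date_col="Date"):
    """Return rows with start <= date < end (half-open interval)."""
    mask = (df[date_col] >= pd.Timestamp(start)) & (df[date_col] < pd.Timestamp(end))
    return df.loc[mask]

window = date_window(demo, "2015", "2017")
print(window["Date"].min(), window["Date"].max(), len(window))
```

The returned frame can be passed to `sns.lineplot` as `data=` in the same way as `Date_2012` or `Date_2015_2016`.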

# Visualization of the Minimum Adjusted Close

In [57]:
Adj_Close_min=gold[gold['Adj Close']== gold['Adj Close'].min()]

In [58]:
Adj_Close_min

In [59]:
filt=(gold['Date']>'2015-10') & (gold['Date']<'2016-04')
Date_min=gold.loc[filt]

In [60]:
sns.set(rc={"figure.figsize":(20, 10)})
ax = sns.lineplot(x="Date", y="Adj Close", data=Date_min,color='blue',label='Adj Close').set(title="Minimum Gold's Adjusted Daily Close Price 2015")
ax1= sns.lineplot(x='Date', y='Low', data=Date_min, color='red',label='Low')
ax2= sns.lineplot(x='Date', y='High', data=Date_min, color='green',label='High')

# Visualization of the Maximum Adjusted Close

In [61]:
Adj_Close_max=gold[gold['Adj Close']== gold['Adj Close'].max()]
Adj_Close_max

In [62]:
filt=(gold['Date']>'2012-09') & (gold['Date']<'2012-11')
Date_max=gold.loc[filt]

In [63]:
sns.set(rc={"figure.figsize":(20, 10)})
ax = sns.lineplot(x="Date", y="Adj Close", data=Date_max,color='blue',label='Adj Close').set(title="Maximum Gold's Adjusted Daily Close Price 2012")
ax1= sns.lineplot(x='Date', y='Low', data=Date_max, color='red',label='Low')
ax2= sns.lineplot(x='Date', y='High', data=Date_max, color='green',label='High')
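The windows around the minimum and maximum were typed in by hand after inspecting the rows. As an alternative sketch, the window can be derived from the date of the extreme value itself. The helper name `window_around_extreme`, the `days` parameter, and the synthetic prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices with a single minimum at position 200.
dates = pd.date_range("2015-01-01", periods=365, freq="D")
t = np.arange(365, dtype=float)
demo = pd.DataFrame({"Date": dates, "Adj Close": 120 + 0.01 * (t - 200) ** 2})

def window_around_extreme(df, column="Adj Close", kind="min", days=90, date_col="Date"):
    """Return the date of the min/max of `column` and the rows within +/- `days` of it."""
    idx = df[column].idxmin() if kind == "min" else df[column].idxmax()
    centre = df.loc[idx, date_col]
    delta = pd.Timedelta(days=days)
    mask = df[date_col].between(centre - delta, centre + delta)
    return centre, df.loc[mask]

min_date, min_window = window_around_extreme(demo, kind="min", days=30)
max_date, max_window = window_around_extreme(demo, kind="max", days=30)
print(min_date.date(), len(min_window))
print(max_date.date(), len(max_window))
```

If the extreme lies near the start or end of the data, the window is simply cut off at the available dates.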

# **Technical Indicators**

Below is a comprehensive list of technical indicators that are widely used by professionals and scholars, and that I believe are most beneficial in automated trading. The indicators are:
1. Simple Moving Average (Fast and Slow)
2. Average True Range
3. Average Directional Index (Fast and Slow)
4. Stochastic Oscillators (Fast and Slow)
5. Relative Strength Index (Fast and Slow)
6. Moving Average Convergence Divergence
7. Bollinger Bands
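To give a feel for how such indicators are derived from a price series, here is a minimal sketch of two of them, a simple moving average and Bollinger Bands, on synthetic prices. The helper name `add_sma_bollinger`, the window length, and the band width of two standard deviations are illustrative assumptions; the sketch uses pandas' default sample standard deviation, whereas some definitions use the population standard deviation.

```python
import numpy as np
import pandas as pd

# Synthetic price series standing in for the gold 'Adj Close' column.
t = np.arange(30, dtype=float)
demo = pd.DataFrame({"Adj Close": 120 + 3 * np.sin(t / 4)})

def add_sma_bollinger(df, column="Adj Close", window=20, num_std=2.0):
    """Return a copy of df with SMA and Bollinger Band columns added."""
    out = df.copy()
    rolling = out[column].rolling(window=window)
    out["SMA"] = rolling.mean()   # simple moving average
    std = rolling.std()           # rolling sample standard deviation
    out["BB_upper"] = out["SMA"] + num_std * std
    out["BB_lower"] = out["SMA"] - num_std * std
    return out

indicators = add_sma_bollinger(demo, window=5)
print(indicators.tail())
```

The first `window - 1` rows are NaN because a full window of prices is not yet available.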

**The description below is taken from the following article, which is also recommended for further reading:** https://towardsdatascience.com/building-a-comprehensive-set-of-technical-indicators-in-python-for-quantitative-trading-8d98751b5fb

Predicting asset price movements has been a widely researched area aimed at developing alpha-generating trading strategies that capture these asset price movements “accurately”. I say accurately with a pinch of salt given the stochastic nature of most asset prices which, by definition, is random in nature. The idea thus focuses on performing some sort of analysis to capture, with some degree of confidence, the movement of this stochastic element. Among the multitude of methods used to predict this movement, technical indicators have been around for quite some time (reportedly used since the 1800s) as one of the methods used in forming an opinion of a potential move.
Until the widespread adoption of algorithmic trading, technical indicators were primarily used by traders who would look at these indicators on their trading screen to make a buy/sell decision. Even though this is still very prevalent, technical analysis has made its way into automated trading given the ability of Machine-Learning and other statistical tools to analyze this data in a fraction of time and the computational ability of computers to back-test with multiple decades of data.
Even though this article does not argue for or against use of Technical analysis, the technical indicators below can be used to perform various back-tests and come up with an opinion on their prediction power.