<center><img src='img/ai4eo_logos.jpg' alt='Logos AI4EO MOOC' width='80%'></img></center>

<hr>

<br>

<a href='https://www.futurelearn.com/courses/artificial-intelligence-for-earth-monitoring/1/steps/1291928' target='_blank'><< Back to FutureLearn</a><br>

# A Chatbot for Questions about the Earth 

<i>by Anna-Lena Erdmann, EUMETSAT, Darmstadt, Germany</i>

<hr>

## Introduction



*Imagine you can ask any question about the Earth - and get a data-driven answer*

By the end of this notebook, you will learn how WEkEO tools can be used to create a chatbot that can answer user questions about sea surface temperatures by analyzing the underlying Earth Observation (EO) data: 

<left><img src='img/knoweo_webinterface.png' alt='knoweo Web Interface' width='50%'></img></left>


This is possible by **combining EO data analysis tools of WEkEO with Large Language Models**. 


**Large Language Models** (LLMs) and applications built on them have seen an immense boost in popularity since the launch of OpenAI's <a href="https://chatgpt.com/">ChatGPT</a> in late 2022.
LLMs are artificial intelligence foundation models with billions of parameters that have been trained on massive amounts of data. Because they are trained on large amounts of text, LLMs can process and generate text and generalize to multiple tasks [[Naveed et al. 2023]](#Naveedetal2023). 

When it comes to **specialized** tasks that require information they have not been trained on, LLMs soon reach their limits. Examples include: 
 - accessing recent information from after they have been trained
 - performing complex mathematical computations
 - writing code for a poorly documented programming language 

Accessing and analyzing EO data is a highly specialized task that requires expert knowledge of EO data formats, access APIs, processing tools, and visualization tools. For non-experts, it is challenging to analyze Earth Observation data to gain insights about the Earth. 

<a href="https://wekeo.eu/">WEkEO</a> provides **harmonized data access** to all data from the Copernicus programme. Data access is simplified through a number of tools, such as the HDA API and climetlab plugins. Being able to access and process many different Earth Observation datasets through one platform reduces the complexity of EO data access and analysis. 

This notebook explores the **capability of LLMs to access and analyze EO data** to answer user questions about the Earth. Equipped with the right tools from WEkEO, an LLM can generate data-driven answers to natural-language questions about the Earth from anyone. 

## What this Notebook will cover

* [1 - Introduction to LLMs and LangChain](#intro)
* [2 - Creating a Sea Surface Temperature Chatbot](#sstchatbot)
     * [2.1 - Defining the LLM](#sstagent)
     * [2.2 - Creating the EO toolbox](#ssttoolbox)
     * [2.3 - Initializing the Agent](#iniagent)
     * [2.4 - Asking questions and receiving answers](#sstqa)
* [3 - Conclusions](#conclusions)
* [4 - Additional Resources](#resources)

<hr>

## <a id='intro'></a> 1 - Introduction to LLMs

### LLMs: 

Large Language Models are artificial intelligence foundation models that can process and generate text. Their capabilities, and with them their popularity, have recently increased, mainly because of advances in transformer networks, increased computational capabilities, and the availability of large-scale training data [[Naveed et al. 2023]](#Naveedetal2023). 

LLMs are transformer neural networks, an architecture first described in the paper "Attention Is All You Need" [[Vaswani et al. 2017]](#Vasvanietal2017). Transformers are sequence-to-sequence networks designed to predict an output sequence based on an input sequence. They are applied to tasks such as translation and text completion. In text completion, the LLM predicts the probability distribution of the next word (or token) based on an input sequence.  

Given a prompt, e.g. a task description or a user question, as input sequence, the LLM predicts the next word. This word is appended to the input sequence, which then serves as the new input for predicting the subsequent word, and so on. This is repeated until the answer is complete and the user finally receives a ChatGPT-like answer from the LLM. 

<left><img src='img/token_prediction.png' alt='token prediction' width='50%'></img></left>

Image source: [Mishra 2024][]

[Mishra 2024]:https://www.linkedin.com/pulse/how-do-language-modelsllm-work-we-call-chatgpt-mishra-fdqsc/
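
The generation loop described above can be sketched with a toy example. The probability table below is invented purely for illustration, and, unlike a real LLM, this toy model only looks at the last word instead of the whole input sequence. It always picks the most probable next word (greedy decoding), which is only one of several possible selection strategies.

```python
# Toy "language model": for each word, a probability distribution over the next word.
# The table is made up for illustration and is not derived from any real LLM.
NEXT_WORD_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"sea": 0.6, "ocean": 0.4},
    "sea": {"is": 0.9, "<end>": 0.1},
    "ocean": {"is": 1.0},
    "is": {"warm": 0.7, "cold": 0.3},
    "warm": {"<end>": 1.0},
    "cold": {"<end>": 1.0},
}


def generate(prompt, max_new_tokens=10):
    """Append the most probable next word until <end> or the token limit is reached."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        distribution = NEXT_WORD_PROBS[tokens[-1]]
        # Greedy decoding: take the word with the highest probability
        next_token = max(distribution, key=distribution.get)
        if next_token == "<end>":
            break
        # The predicted word becomes part of the input for the next prediction
        tokens.append(next_token)
    return tokens


completion = generate(["<start>"])
print(" ".join(completion[1:]))  # the sea is warm
```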



### Augmented LLMs: 

While pre-trained LLMs are masters of generalization and practically all-round talents, they soon reach their limits for specialized tasks. There are different ways of specializing pre-trained LLMs [[Naveed et al. 2023]](#Naveedetal2023): 

- *fine-tuning*: re-open the "box" of weights of the LLM and **retrain it on specialized training data** that covers the specialized task. Fine-tuning usually has a high demand for computational resources. 

- *LLM augmentation*: augmentation means to equip the LLM with **external resources** that help the LLM to solve the required task. Only new information is added without altering the original LLM. This method is quite efficient for specializing LLMs without getting too deep into the process of training and validation of large neural network architectures. 

Ways of augmenting LLMs: 

*Retrieval-augmented generation (RAG):* This technique adds a so-called retriever next to the LLM, which retrieves information and provides it to the LLM. The LLM thus has more context specifically related to the user question and can rely on external sources of information in addition to the information it was trained on. This external information can be, for example, a database with specific documentation or a web search. 


*Tool-augmented LLMs:* While the RAG technique relies on the retriever to provide information to the LLM, tool augmentation relies on the LLM's inherent reasoning capabilities. The LLM can divide tasks into sub-tasks and call the right tools to solve them. One example is RestGPT, which provides an LLM with prompted API documentation so that the LLM can create queries for REST APIs based on their documentation [[Song et al. 2023]](#Songetal2023). 

<left><img src='img/rag.png' alt='RAG' width='35%'></img></left>
<left><img src='img/tool.png' alt='Tool' width='32%'></img></left>

Image source: [[Naveed et al. 2023]](#Naveedetal2023)
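
To make the RAG idea more concrete, here is a minimal sketch of a retriever that ranks a few documents by word overlap with the question and prepends the best match to the prompt. The documents, the scoring rule, and the function names `retrieve` and `build_prompt` are assumptions made for this illustration. Real retrievers typically use embedding-based similarity search instead of word overlap.

```python
# Tiny illustrative "knowledge base"; the texts are placeholders for real documentation.
DOCUMENTS = [
    "The GLB-SST product contains global sea surface temperature fields.",
    "WEkEO offers harmonized access to Copernicus data.",
    "xarray is a Python package for labelled multi-dimensional arrays.",
]


def words(text):
    """Lower-case a text and split it into a set of words without punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, documents, top_k=1):
    """Return the top_k documents that share the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Prepend the retrieved context to the user question."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "What is sea surface temperature data?"
print(build_prompt(question, DOCUMENTS))
```

The LLM itself is unchanged in this setup: only the prompt it receives is enriched with the retrieved context.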



### AI Agents 

[Xi et al. 2023](#Xietal2023) describe AI agents as "**artificial entities that sense their environment, make decisions, and take actions**." While earlier AI agents were rule-based, the capabilities of LLMs make them well suited to serve as the flexible and adaptable **"brain" of agents**. 
As the brain of the agent, the LLM steers the workflow from the user question to the solution by dividing the task, selecting the appropriate tools, and reacting properly if something goes wrong. An agent can also consist of several LLMs, one of which is the brain, while the others are "personas" focusing on one specific task, such as code generation. 

<left><img src='img/agent_framework.png' alt='Agent' width='45%'></img></left>

Using frameworks such as ReAct [[Yao et al. 2023]](#Yaoetal2023), the LLM goes through a chain of thought, action, and evaluation. The image above shows an example of the ReAct framework and how it prompts the LLM to act as the brain of the agent.

This notebook uses the ReAct framework to create an AI agent with OpenAI's GPT-4 as its LLM "brain". WEkEO tools are combined with additional tools, such as Python executors and code templates, into a toolbox that enables the LLM to solve geospatial analysis tasks on Earth Observation data.
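
The thought, action, and observation cycle can be sketched in a few lines of plain Python. In this simplified illustration, the function `scripted_llm` is a hard-coded stand-in for a real LLM, and the tools `lookup` and `mean` are hypothetical. It is not how LangChain implements agents internally, only a picture of the control flow.

```python
def scripted_llm(question, observations):
    """Stand-in for an LLM: chooses the next step from the observations made so far."""
    if len(observations) == 0:
        return {"thought": "I need the values first.",
                "action": "lookup", "action_input": "values"}
    if len(observations) == 1:
        return {"thought": "Now I can average them.",
                "action": "mean", "action_input": observations[0]}
    return {"thought": "I have the result.",
            "action": "Final Answer",
            "action_input": f"The mean is {observations[-1]}"}


# Hypothetical toolbox: each tool is a plain function
TOOLS = {
    "lookup": lambda _: [290.0, 291.0],
    "mean": lambda values: sum(values) / len(values),
}


def run_agent(question, llm, tools, max_iterations=5):
    """Alternate between LLM decisions and tool calls until a final answer is given."""
    observations = []
    for _ in range(max_iterations):
        step = llm(question, observations)
        if step["action"] == "Final Answer":
            return step["action_input"]
        # Execute the chosen tool and feed the result back as an observation
        observations.append(tools[step["action"]](step["action_input"]))
    return "Agent stopped: iteration limit reached"


print(run_agent("What is the mean value?", scripted_llm, TOOLS))  # The mean is 290.5
```

The iteration limit plays the same role as a safety net in a real agent: it keeps the loop from running forever if the LLM never arrives at a final answer.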

### LangChain 🦜️🔗 

To build the AI agent and create our LLM-powered application, this notebook uses LangChain. <a href="https://github.com/langchain-ai/langchain">LangChain</a> is a powerful open-source framework that simplifies the modular build-up of LLM applications. Through the LangChain Expression Language and abstraction modules between the LLM and the developer, LangChain enables simple and fast augmentation of LLMs with additional data sources and even the creation of LLM-powered agents. The full documentation of LangChain is available <a href="https://python.langchain.com/v0.2/docs/introduction/">here</a>.

## <a id='sstchatbot'></a> 2 - Creating a Sea Surface Temperature Chatbot

The goal of this notebook is to provide an example of how data access tools of WEkEO can be used to create LLM applications for data analysis tasks. Because the notebook is an experiment and example, the scope of the AI agent is limited as follows: 

- this notebook only develops the LLM agent - **not the web interface**
- it is only possible to analyze **one dataset** on the topic of sea surface temperature
- the data for analysis is downloaded in the background: execution will fail for user questions that require **massive amounts of data**, so questions should be limited to specific days (max. one week)
- the EO workflow consists of **recurring modules**: 
   - identifying which data is needed to answer the question
   - downloading the data
   - understanding the data and data variables
   - analyzing the data
- the result is a **specialized chatbot** that knows this procedure and has information on the questions that can be answered with the dataset. For remaining questions, the chatbot can use its internal knowledge. 

### <a id='sstsetup'></a> 2.0 - Setting up the Python Environment

Install the packages required for executing this notebook. 

If you have already set up the environment using the **environment.yaml** file provided with the notebook, or have launched this notebook inside the **WEkEO JupyterHub**, you can skip this step. 

In [1]:
%pip -q install langchain
%pip -q install climetlab-wekeo-source
%pip -q install climetlab-wekeo-datasets
%pip -q install climetlab
%pip install -q zarr
%pip install -q xarray[complete]
%pip install -q langchain-experimental
%pip install -q openai==0.27.4

Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-text-splitters 0.0.1 requires langchain-core<0.2.0,>=0.1.28, but you have langchain-core 0.0.13 which is incompatible.


Import the required packages:

In [6]:
import os
import json

# LLM + Langchain packages
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.agents import Tool
from langchain.tools import BaseTool
from langchain_experimental.tools.python.tool import PythonAstREPLTool

# data analysis packages
import climetlab as cml
from climetlab_wekeo_datasets import hda2cml
import xarray as xr


We are using OpenAI's GPT-4 for this notebook, which requires an API key. Set your OpenAI key as an environment variable. If you don't have an OpenAI API key, you can create a user account [here](https://openai.com/). 

In [7]:
os.environ["OPENAI_API_KEY"] = ""

### <a id='sstagent'></a> 2.1 - Defining the LLM

We are using OpenAI's GPT-4 model for this example. You can also substitute another LLM for this model if preferred. 

In [8]:
# Set up the LLM - OpenAI's GPT-4
turbo_llm = ChatOpenAI(
    temperature=0,
    model_name='gpt-4'
)

### <a id='ssttoolbox'></a> 2.2 - Creating the EO toolbox

As described in Section 1, an AI agent needs access to **specific tools** for problem solving. In this section we define the necessary tools for data access and analysis. Each tool consists of two parts: 

1. The **Description**: The description is exposed to the LLM. Based on the description of the tool, the LLM determines whether the tool is suitable for the specific sub-task. It is important that the description states which input the tool requires and which output can be expected after the tool is executed. 

2. The **Function Definition**: The function defines the actual processes that are triggered when the tool is executed by the LLM. 
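
The two parts can be pictured with a small, framework-free sketch. The class `SimpleTool` and the helper `render_tool_list` are hypothetical names used only for this illustration, and the example tool is deliberately trivial. The point is that the LLM only ever sees the rendered text, while the function stays on the Python side.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical minimal tool: a description for the LLM plus a function to run."""
    name: str
    description: str
    func: Callable[[str], str]


def render_tool_list(tools: List[SimpleTool]) -> str:
    """Build the text block that would be shown to the LLM inside its prompt."""
    return "\n".join(f"> {tool.name}: {tool.description}" for tool in tools)


shout = SimpleTool(
    name="shout",
    description="Takes a text and returns it in upper case.",
    func=str.upper,
)

print(render_tool_list([shout]))              # what the LLM reads
print(shout.func("sea surface temperature"))  # what runs when the tool is chosen
```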

#### Tool 1: API Request Creator

In [9]:
class APICreator(BaseTool):
    name = "Create API Request"
    description = "Generates an API request for satellite data given a dictionary of 'start_time' in the format \"YYYY-MM-DDT00:00:00.000Z\", 'end_time' in the format \"YYYY-MM-DDT00:00:00.000Z\" and 'dataset_id'. Please wrap the json in a string"
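    def _run(self, input: str):
        # The agent passes its input as a JSON string; parse it into a dict
        input_dict = json.loads(input)

        # Fixed request template: the dataset is hard-coded here, so a
        # 'dataset_id' supplied by the LLM is not used
        api_template = '''{
                      "dataset_id": "EO:EUM:DAT:METOP:GLB-SST-NC",
                      "dtstart": {start_time},
                      "dtend": {end_time}
                    }'''
        # Fill in the quoted start and end times
        api_request =  api_template.replace("{start_time}", '"'+input_dict['start_time']+'"').replace("{end_time}", '"'+input_dict['end_time']+'"')
        # Return the request with every curly brace doubled
        return api_request.replace('{', '{{').replace('}', '}}')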


    def _arun(self, input: list):
        raise NotImplementedError("This tool does not support async")

api_creator = APICreator()
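
As an aside, the same kind of request could also be assembled with `json.dumps` instead of string replacement, which makes it easy to validate the timestamps first. The following is only a sketch of that alternative, not the tool used in this notebook: `build_request` is a hypothetical helper, and, unlike the tool above, it does not double the curly braces in its output.

```python
import json
from datetime import datetime

DATASET_ID = "EO:EUM:DAT:METOP:GLB-SST-NC"  # dataset id used in this notebook
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"        # matches "YYYY-MM-DDT00:00:00.000Z"


def build_request(tool_input: str) -> str:
    """Hypothetical helper: turn the agent's JSON input into a request string."""
    params = json.loads(tool_input)
    for key in ("start_time", "end_time"):
        # strptime raises a ValueError if a timestamp has the wrong format
        datetime.strptime(params[key], TIME_FORMAT)
    request = {
        "dataset_id": DATASET_ID,
        "dtstart": params["start_time"],
        "dtend": params["end_time"],
    }
    return json.dumps(request)


example = build_request(
    '{"start_time": "2023-12-24T00:00:00.000Z", "end_time": "2023-12-24T23:59:59.000Z"}'
)
print(example)
```

Failing early on a malformed timestamp gives the agent a clear error message to react to, instead of a failed download later in the chain.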

#### Tool 2: Data Downloader

In [10]:
class DataDownloader(BaseTool):
    name = "Download Data and get Shape"
    description = "The input to the tool is an API request created by the APICreator tool! Given this API request as a json wrapped in a string, this tool downloads the data, converts it to an xarray and gives back the shape of the data. This tool is useful if you want to find out which python code is needed next to analyze the data."
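    def _run(self, input: str):
        # Remember the raw request so that the Data Analyzer tool can reuse it later
        global api_request_glob
        api_request_glob = input
        input_dict = json.loads(input)
        # Convert the request into a climetlab dataset id and arguments, then load the data
        dsid, args = hda2cml(input_dict)
        cml_ds = cml.load_dataset(dsid, **args)
        ds_local = cml_ds.to_xarray()
        # Report coordinates and variable names (as a tuple) so the LLM learns the data layout
        output = "Coordinates of xarray Dataset: ", ds_local.coords, " Variables: ", list(ds_local.keys())
        return output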


    def _arun(self, input: list):
        raise NotImplementedError("This tool does not support async")

downloader = DataDownloader()

#### Tool 3: Python Executor

In [11]:
# Python REPL that executes code passed in as a string
python_exe = PythonAstREPLTool()

# Wrap the REPL as a tool with a description the LLM can read
python_tool = Tool(
        name = "python executer",
        func=python_exe.run,
        description="takes python code as input and executes it. all packages have to be imported inside the python code. make sure that relevant findings are printed to be further processed."
    )

#### Tool 4: Data Analyzer

In [12]:
analysis_cmlcode = '''import climetlab as cml
import xarray as xr
import json
from climetlab_wekeo_datasets import hda2cml
dsid, args = hda2cml(api_request)
cml_ds = cml.load_dataset(dsid, **args)
ds_local = cml_ds.to_xarray()

'''

class DataAnalyzer(BaseTool):
    name = "Climetlab Data Analyzer"
    description = "The input to the tool is valid python code to make data analysis. This tool is useful after downloading the data from an API to answer the user question. The data is a xarray dataset and is stored in the variable ds_local. It is most likely a das-sliced xarray, so always use the python package xarray to make the calculations and  the function.compute()in the end! Print the final result using the print() statement."
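    def _run(self, input: str):
        # Insert the stored API request text in place of the name `api_request`
        # in the template; this requires the Data Downloader tool to have run before
        code = analysis_cmlcode.replace("api_request", api_request_glob)
        print(code)
        # Run the data loading preamble followed by the LLM-generated analysis code
        return  python_exe.run(code+input)

    def _arun(self, input: list):
        raise NotImplementedError("This tool does not support async")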


analysis_tool = DataAnalyzer()
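
The pattern behind this tool, a fixed preamble followed by generated code, can be illustrated without any EO libraries. The sketch below is a simplified stand-in and not the tool above: `run_analysis` is a hypothetical function that passes the request through a namespace instead of text substitution and captures whatever the code prints. Executing generated code with `exec` is unsafe for untrusted input and is shown here only to explain the mechanism.

```python
import contextlib
import io
import json

# Fixed preamble that runs before the generated code (here it only parses the request)
PREAMBLE = "request = json.loads(request_json)\n"


def run_analysis(request_json: str, generated_code: str) -> str:
    """Run preamble + generated code in a fresh namespace and return the printed output."""
    namespace = {"json": json, "request_json": request_json}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(PREAMBLE + generated_code, namespace)
    return buffer.getvalue()


result = run_analysis(
    '{"dtstart": "2023-12-24T00:00:00.000Z"}',
    'print(request["dtstart"][:10])',
)
print(result)
```

This also shows why the tool description insists on `print()`: only printed results find their way back to the LLM as an observation.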

### <a id='iniagent'></a> 2.3 - Setting up the Agent 

Now that we have defined the tools, we can initialize the agent and equip it with the tools from the EO toolbox. 

Memory is added to the agent so that it can have a conversational interaction with the user and answer follow-up questions from the context of the conversation. The number of interactions the LLM can remember is set by the parameter ``k``, here 3. 

The conversational agent is initialized through the function ``initialize_agent`` with the following parameters: 

- ``agent`` defines the agent prompt. We use the ReAct description. 
- ``tools`` equips the agent with the tools from the EO toolbox we created.
- ``llm`` sets the LLM to use.
- ``verbose`` decides whether we want to see the output of the LLM during task solving.
- ``max_iterations`` limits the LLM to 5 iterations between thought and action, which prevents an infinite loop of thoughts and reactions.
- ``memory`` sets the conversational memory.



In [13]:
from langchain.prompts.chat import SystemMessagePromptTemplate
from langchain.agents import initialize_agent
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# conversational agent memory
memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=3,
    return_messages=True
)

tools = [downloader, api_creator,  analysis_tool, python_tool]

conversational_agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=turbo_llm,
    verbose=True,
    max_iterations=5,
    memory=memory
)
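
The effect of a window size of 3 can be pictured with a small stand-alone sketch. The class `WindowMemory` below is a hypothetical simplification built on `collections.deque`, not LangChain's implementation: once more than `k` exchanges have been stored, the oldest one is dropped.

```python
from collections import deque


class WindowMemory:
    """Hypothetical sliding-window memory that keeps only the last k exchanges."""

    def __init__(self, k=3):
        # A deque with maxlen discards the oldest entry automatically
        self.turns = deque(maxlen=k)

    def save(self, user_message, ai_message):
        self.turns.append((user_message, ai_message))

    def history(self):
        return list(self.turns)


window = WindowMemory(k=3)
for i in range(1, 5):
    window.save(f"question {i}", f"answer {i}")

# Only exchanges 2, 3 and 4 are still available as context
print(window.history())
```

A small window keeps the prompt short, at the price that the agent forgets anything said more than `k` exchanges ago.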

### Prompt Engineering: Telling the LLM which tasks it has to solve

Each LLM application has a prompt that defines the "persona" of the LLM. More precisely, the prompt determines the context in which the LLM acts. The prompt is part of the input sequence that determines which word the LLM predicts next. It is therefore very important to adapt the prompt to the LLM's task. 

First, explore the default prompt of the LLM:

In [14]:
conversational_agent.agent.llm_chain.prompt.messages[0]

SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Assistant is a large language model trained by OpenAI.\n\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topi

Next, we define our custom prompt, which fits the task the LLM should solve, and overwrite the default prompt with our custom persona prompt:

In [15]:
fixed_prompt = '''Assistant is a large language model trained by OpenAI.

Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions on specific datasets of Earth Observation Data. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant doesn't know anything about API requests and how they are built up.

Assistant also doesn't know information about the shape of datasets or of sea surface temperature.

Assistant knows that the usual way of earth observation data analysis follows the three steps: generating an API request, download the data using the API request, analysing data using the climetlab_python tool.

Overall, Assistant is a powerful system that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.'''


In [16]:
conversational_agent.agent.llm_chain.prompt.messages[0].prompt.template = fixed_prompt

We also adjust the prompt describing the available tools and required outputs to make the agent less prone to formatting errors when generating input to the tools. 

In [17]:
fixed_human_prompt = '''TOOLS\n------\nAssistant can ask the user to use tools to look up information that may be helpful in answering the users original question. The tools the human can use are:\n\n> Download Data and get Shape: The input to the tool is an API request created by the APICreator tool! Given this API request as a json wrapped in a string, this tool downloads the data, converts it to an xarray and gives back the shape of the data. This tool is useful if you want to find out which python code is needed next to analyze the data.\n> Create API Request: Generates an API request for satellite data given a dictionary of \'start_time\' in the format "YYYY-MM-DDT00:00:00.000Z", \'end_time\' in the format "YYYY-MM-DDT00:00:00.000Z" and \'dataset_id\'. Please wrap the json in a string\n> Climetlab Data Analyzer: The input to the tool is valid python code to make data analysis. This tool is useful after downloading the data from an API to answer the user question. The data is a xarray dataset and is stored in the variable ds_local. It is most likely a das-sliced xarray, so always use the python package xarray to make the calculations and  the function.compute()in the end! Print the final result using the print() statement.\n> python executer: takes python code as input and executes it. all packages have to be imported inside the python code. make sure that relevant findings are printed to be further processed.\n\nRESPONSE FORMAT INSTRUCTIONS\n----------------------------\n\nWhen responding to me, please output a response in one of two formats:\n\n**Option 1:**\nUse this if you want the human to use a tool.\nMarkdown code snippet formatted in the following schema:\n\n```json\n{{\n    "action": string, \\\\ The action to take. 
Must be one of Download Data and get Shape, Create API Request, Climetlab Data Analyzer, python executer\n    "action_input": string \\\\ The input to the action\n}}\n```\n\n**Option #2:**\nUse this if you want to respond directly to the human. Markdown code snippet formatted in the following schema:\n\n```json\n{{\n    "action": "Final Answer",\n    "action_input": string \\\\ You should put what you want to return to use here\n}}\n``` Under no cicumstances use more than one markdown snippet in one response, under no circumstances print python code in '```' markdown snippets. \n\nUSER\'S INPUT\n--------------------\nHere is the user\'s input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):\n\n{input}'''

conversational_agent.agent.llm_chain.prompt.messages[2].prompt.template = fixed_human_prompt
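
Note the doubled curly braces around the JSON schema in the prompt above, next to the single braces of `{input}`. Assuming the template is filled in with Python format-style substitution, doubled braces are rendered as literal braces, while single braces mark a placeholder. The following stand-alone sketch uses plain `str.format` and a shortened, made-up template to show the difference.

```python
# Doubled braces survive as literal braces; {input} is the only placeholder
template = (
    "Respond with a JSON blob like this:\n"
    '{{"action": "Final Answer", "action_input": string}}\n\n'
    "User input: {input}"
)
rendered = template.format(input="How warm is the sea?")
print(rendered)

# Without escaping, the JSON part is mistaken for a placeholder and formatting fails
unescaped = '{"action": "Final Answer"} {input}'
try:
    unescaped.format(input="How warm is the sea?")
    failed = False
except (KeyError, ValueError, IndexError):
    failed = True
print("Unescaped template failed:", failed)
```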

### <a id='sstqa'></a> 2.4 - Asking questions and receiving answers

Now we are ready to ask the agent questions about sea surface temperature. The output illustrates the chain of thought the agent goes through while solving the task.

In [18]:
conversational_agent.run("What was the sea surface temperature over the mediterranean sea for Christmas eve last year (2023)?")



> Entering new AgentExecutor chain...
To answer this question, we first need to generate an API request to fetch the sea surface temperature data for the Mediterranean Sea on Christmas Eve, 2023. We will use the "Create API Request" tool for this. The dataset_id will depend on the specific dataset you are using for sea surface temperature data. For example, if you are using the Copernicus Marine Environment Monitoring Service (CMEMS) dataset, the dataset_id might be something like "SST_MED_SST_L4_NRT_OBSERVATIONS_010_004". Please replace "dataset_id" with the correct id for your dataset.

```json
{
    "action": "Create API Request",
    "action_input": "{\"start_time\": \"2023-12-24T00:00:00.000Z\", \"end_time\": \"2023-12-24T23:59:59.000Z\", \"dataset_id\": \"dataset_id\"}"
}
```
Observation: {{
                      "dataset_id": "EO:EUM:DAT:METOP:GLB-SST-NC",
                      "dtstart": "2023-12-24T00:00:00.000Z",
                      "

                                                    


Observation: [36;1m[1;3m('Coordinates of xarray Dataset: ', Coordinates:
  * time     (time) datetime64[ns] 24B 2023-12-24 2023-12-24T12:00:00 2023-12-25
  * lat      (lat) float32 14kB -89.97 -89.92 -89.88 ... 89.88 89.92 89.97
  * lon      (lon) float32 29kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0, ' Variables: ', ['sea_surface_temperature', 'sst_dtime', 'sses_bias', 'sses_standard_deviation', 'dt_analysis', 'wind_speed', 'sea_ice_fraction', 'aerosol_dynamic_indicator', 'adi_dtime_from_sst', 'sources_of_adi', 'l2p_flags', 'quality_level', 'satellite_zenith_angle', 'solar_zenith_angle'])
Thought: The data for the sea surface temperature over the Mediterranean Sea for Christmas Eve, 2023 has been successfully downloaded. The data is in the form of an xarray Dataset with the following coordinates and variables:

Coordinates:
- time: 2023-12-24, 2023-12-24T12:00:00, 2023-12-25
- lat: Ranges from -89.97 to 89.97
- lon: Ranges from -180.0 to 180.0

Variables include '

'The average sea surface temperature over the Mediterranean Sea for Christmas Eve, 2023, was approximately 290.42 Kelvin.'

The LLM successfully chains together the defined tools for API request creation, data download, data exploration, and data analysis. With the code templates given to the LLM in the tools and its inherent knowledge of modules such as xarray, the LLM successfully calculates the average sea surface temperature for the given day. 

Our application's response to the user: 

**'The average sea surface temperature over the Mediterranean Sea for Christmas Eve, 2023, was approximately 290.42 Kelvin.'**
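
The answer is given in Kelvin. For readers who prefer degrees Celsius, the conversion is a simple subtraction of 273.15, shown here as a small worked example (the function name is chosen for illustration):

```python
def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


# 290.42 K corresponds to about 17.27 degrees Celsius
print(round(kelvin_to_celsius(290.42), 2))  # 17.27
```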

## <a id='conclusions'></a> 3 - Conclusions

Using a **pre-trained LLM and Earth Observation data access tools**, this notebook creates an application capable of giving data-driven answers to questions about the Earth. While it shows great potential for future extensions, some **limitations** and potential **improvements** need to be discussed:

1. The scope of the application is **limited to one dataset only**. Therefore, the application does not contain a module that can select specific datasets for a user question. This can get tricky when there is more than one dataset available for the same topic, or when the datasets are not ideally documented. To extend this application with a **"data selection"** module, the next step could be to add RAG retrieval to the agent, supplying the LLM with information on the candidate datasets along with the user question. 

2. **Usage of tokens**. The ReAct agent framework is really powerful for self-contained task solving by the LLM. However, one must keep in mind that it causes a high number of input and context tokens (~7000 for one user question) and several rounds of LLM predictions (one for each new thought). This makes the presented application **costly**. Using the OpenAI gpt-4 model API, each user question adds up to around $0.25. This cost can be reduced by shortening the input prompts, which can come at the expense of the LLM's abilities. Substituting the (relatively costly) OpenAI gpt-4 API with the gpt-3.5-turbo API or APIs for open-source models could further reduce the cost. 

3. **Code generation**. The custom tools rely heavily on **fixed code templates** for data download using the climetlab plugin. This has the advantage of leaving no room for hallucinations by the LLM when it comes to code generation. On the downside, it also reduces the flexibility of the solution. In case other datasets require different data access methods, the "Data Downloader" tool has plenty of room for optimization. 
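
To get a feeling for the token costs mentioned in point 2, a rough estimate can be computed as follows. The prices per 1000 tokens and the number of output tokens are assumptions made for this sketch and should be checked against the provider's current price list. Only the figure of roughly 7000 input tokens per question comes from this notebook.

```python
def estimate_cost(input_tokens, output_tokens, usd_per_1k_input, usd_per_1k_output):
    """Rough cost of one user question in US dollars."""
    return (input_tokens / 1000 * usd_per_1k_input
            + output_tokens / 1000 * usd_per_1k_output)


# Assumed values: 600 output tokens, $0.03 per 1000 input tokens
# and $0.06 per 1000 output tokens
cost = estimate_cost(
    input_tokens=7000,
    output_tokens=600,
    usd_per_1k_input=0.03,
    usd_per_1k_output=0.06,
)
print(f"Estimated cost per question: ${cost:.2f}")
```

Under these assumptions the estimate lands close to the observed cost per question, and it shows that the input tokens dominate the bill.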

This notebook provides a **starting point** for LLM-powered application development. It shows that it is possible to interact with and analyze Earth Observation data using LLMs and is intended to be an **inspiration for more user projects** in this area. 


If you have been working on similar projects, want to share feedback, or have questions, please do not hesitate to contact me (annalena.erdmann@eumetsat.int). 


## <a id='resources'></a> 4 - References and Additional Resources

### 4.1  References of this notebook 

<a id="Naveedetal2023"></a>
Naveed, Humza, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. “A Comprehensive Overview of Large Language Models.” arXiv, April 9, 2024. http://arxiv.org/abs/2307.06435.


<a id="Songetal2023"></a>
Song, Yifan, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, et al. “RestGPT: Connecting Large Language Models with Real-World RESTful APIs.” arXiv, August 26, 2023. http://arxiv.org/abs/2306.06624.


<a id="Vasvanietal2017"></a>
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need.” arXiv, August 1, 2023. http://arxiv.org/abs/1706.03762.


<a id="Yaoetal2023"></a>
Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv, March 9, 2023. http://arxiv.org/abs/2210.03629.

<a id="Xietal2023"></a>
Xi, Zhiheng, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, et al. “The Rise and Potential of Large Language Model Based Agents: A Survey.” arXiv, September 19, 2023. http://arxiv.org/abs/2309.07864.


### 4.2 Additional Resources

More resources of LLM-powered applications for geospatial data:

- Roberts, Jonathan, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie. “GPT4GEO: How a Language Model Sees the World’s Geography.” arXiv, May 30, 2023. http://arxiv.org/abs/2306.00020.

- Zhang, Yifan, Cheng Wei, Shangyou Wu, Zhengting He, and Wenhao Yu. “GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT.” arXiv, July 15, 2023. http://arxiv.org/abs/2307.07930.

- Li, Zhenlong, and Huan Ning. “Autonomous GIS: The next-Generation AI-Powered GIS.” International Journal of Digital Earth 16, no. 2 (December 8, 2023): 4668–86. https://doi.org/10.1080/17538947.2023.2278895.


Valuable YouTube tutorials on LangChain and LLM application development: 

- <a href="https://www.youtube.com/watch?v=J_0qvRt4LNk&t=3s" title="Langchain1">LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab</a>
- <a href="https://www.youtube.com/watch?v=ziu87EXZVUE&list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ&index=11" title="Langchainagents">LangChain Agents - Joining Tools and Chains with Decisions</a>
- <a href="https://www.youtube.com/watch?v=biS8G8x8DdA&list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ&index=20" title="Langchaincustom">Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)</a>

More information on WEkEO, WEkEO Data Access Tools and WEkEO datasets: 

- <a href="https://wekeo.eu/">WEkEO Website</a>
- <a href="https://help.wekeo.eu/en/articles/8490164-how-to-use-the-climetlab-wekeo-plugin">WEkEO climetlab Plugins</a>
- <a href="https://www.wekeo.eu/data?view=catalogue">WEkEO Data Catalogue</a>







<a href='https://www.futurelearn.com/courses/artificial-intelligence-for-earth-monitoring/1/steps/1291928' target='_blank'><< Back to FutureLearn</a><br>

<hr>

<img src='./img/copernicus_logo.png' alt='Copernicus logo' align='left' width='20%'></img>

Course developed for <a href='https://www.eumetsat.int/' target='_blank'> EUMETSAT</a>, <a href='https://www.ecmwf.int/' target='_blank'> ECMWF</a> and <a href='https://www.mercator-ocean.fr/en/' target='_blank'> Mercator Ocean International</a> in support of the <a href='https://www.copernicus.eu/en' target='_blank'> EU's Copernicus Programme</a> and the <a href='https://wekeo.eu/' target='_blank'> WEkEO platform</a>.
