## Question1
- Using the requests library, send a GET request to a specified URL.
- Print the response headers.
- Determine if the content is CSV or XLSX based on the Content-Type header.
- Load the content into a pandas DataFrame:
  - Utilize io.BytesIO to create a buffer from r.content.
  - Employ either read_csv or read_excel accordingly.
- Demonstrate the usage of r.headers, r.content, and the buffer.

In [None]:
import io
import requests
import pandas as pd

url = "REPLACE_WITH_DATA_URL"
r = requests.get(url, timeout=30)
print(r.headers)
ct = r.headers.get("Content-Type", "").lower()
buf = io.BytesIO(r.content)

if "csv" in ct or ct.startswith("text/"):
    # If needed, decode and use StringIO
    text = r.content.decode("utf-8", errors="ignore")
    df = pd.read_csv(io.StringIO(text))
else:
    # Many data portals serve Excel as application/octet-stream
    df = pd.read_excel(buf)

print(df.head())
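
The Content-Type check above can be made a little more robust by also looking at the first bytes of the payload. The following is a minimal sketch, not part of the original exercise: `detect_format` is a hypothetical helper, and it relies on the fact that XLSX files are ZIP archives, which begin with the bytes `PK\x03\x04`. It uses only the standard library, so it can be tried without a network connection.

```python
def detect_format(content_type: str, content: bytes) -> str:
    """Classify a downloaded payload as "xlsx", "csv", or "unknown".

    Hypothetical helper: combines the Content-Type header with a
    check of the leading bytes of the body.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    ct = content_type.split(";")[0].strip().lower()

    # XLSX files are ZIP archives, so the body starts with "PK\x03\x04"
    # even when the server labels it application/octet-stream.
    if content.startswith(b"PK\x03\x04") or "spreadsheetml" in ct:
        return "xlsx"

    # Treat explicit CSV types and plain text as CSV.
    if "csv" in ct or ct == "text/plain":
        return "csv"

    return "unknown"


print(detect_format("text/csv; charset=utf-8", b"a,b\n1,2\n"))
print(detect_format("application/octet-stream", b"PK\x03\x04..."))
print(detect_format("text/html", b"<html>"))
```

In the cell above, the result could decide between `pd.read_csv` and `pd.read_excel` by calling `detect_format(r.headers.get("Content-Type", ""), r.content)`.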

In [None]:
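url = 'example_url'
headers = {'User-Agent': 'example_header'}  # headers must be a dict, not a string
payload = {'key': 'value'}
# The keyword arguments are `headers` and `params` (both plural)
r = requests.get(url, headers=headers, params=payload, timeout=30)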
print(r.headers)


## Question2
Using requests, pypdf, and io.BytesIO, extract and return a list of dictionaries, each containing the page number and text, for all pages from a PDF URL that include the word “anchovies” (case-insensitive).

In [None]:
import io

import requests
from pypdf import PdfReader

url = "REPLACE_WITH_PDF_URL"
pdf_data = io.BytesIO(requests.get(url, timeout=30).content)
reader = PdfReader(pdf_data)
out = []
for i, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""
    # Match case-insensitively, but keep the original text in the result
    if "anchovies" in text.lower():
        out.append({"page": i, "text": text})
print(out)
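
The filtering step can be separated from the PDF download so that it is easy to check on its own. This is a minimal sketch under that assumption: `pages_mentioning` is a hypothetical helper that takes a list of already extracted page texts (for example, one string per page) and returns the list of dictionaries the question asks for.

```python
def pages_mentioning(page_texts, word):
    """Return [{"page": n, "text": t}, ...] for pages whose text contains `word`.

    Hypothetical helper: pages are numbered from 1 and the match is
    case-insensitive. Missing text (None) is treated as an empty page.
    """
    needle = word.casefold()
    matches = []
    for number, text in enumerate(page_texts, start=1):
        text = text or ""  # guard against pages with no extractable text
        if needle in text.casefold():
            matches.append({"page": number, "text": text})
    return matches


# Made-up sample pages, used only to illustrate the behaviour.
sample = ["Pizza with ANCHOVIES.", "No fish here.", None, "anchovies again"]
print(pages_mentioning(sample, "anchovies"))
```

With a real PDF, the input list could be built as `[page.extract_text() for page in reader.pages]`.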

## Question3
- To read the "fish" sheet of a private XLSX file from Google Cloud Storage (GCS) into a DataFrame using a service account key, follow these steps:
- Retrieve GCP_SERVICE_ACCOUNT_KEY, GCP_PROJECT_ID, and GCP_BUCKET_NAME from your .env file.
- Authenticate using google.oauth2.service_account and google.cloud.storage.
- Download the file's bytes.
- Load the "fish" sheet into a pandas DataFrame using pandas.read_excel.

In [None]:
import io
import os

import pandas as pd
from dotenv import load_dotenv
from google.oauth2 import service_account
from google.cloud import storage

load_dotenv()
key_path = os.getenv("GCP_SERVICE_ACCOUNT_KEY")
project = os.getenv("GCP_PROJECT_ID")
bucket_name = os.getenv("GCP_BUCKET_NAME")
blob_name = "2020-09-11_microparticledata.xlsx"

creds = service_account.Credentials.from_service_account_file(key_path)
client = storage.Client(project=project, credentials=creds)
blob = client.bucket(bucket_name).blob(blob_name)
data = blob.download_as_bytes()
df = pd.read_excel(io.BytesIO(data), sheet_name="fish")
print(df.head())
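
Because `os.getenv` returns `None` for a variable that is not set, a missing entry in the .env file only surfaces later as a less obvious error during authentication. The sketch below shows one way to fail early. `require_env` is a hypothetical helper, not part of any library, and the demo variable names are made up for illustration.

```python
import os


def require_env(*names):
    """Return the values of the named environment variables, in order.

    Hypothetical helper: raises RuntimeError listing every variable
    that is unset or empty, instead of silently returning None.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [os.environ[name] for name in names]


# Demo with made-up variable names so the example runs anywhere.
os.environ["DEMO_BUCKET_NAME"] = "my-bucket"
os.environ.pop("DEMO_PROJECT_ID", None)

print(require_env("DEMO_BUCKET_NAME"))
try:
    require_env("DEMO_BUCKET_NAME", "DEMO_PROJECT_ID")
except RuntimeError as exc:
    print(exc)
```

In the cell above, this could replace the three `os.getenv` calls with `key_path, project, bucket_name = require_env("GCP_SERVICE_ACCOUNT_KEY", "GCP_PROJECT_ID", "GCP_BUCKET_NAME")`, called after `load_dotenv()`.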

## Question4
Generate a list of 100 random integers, convert it to bytes using pickle, and upload the resulting ex02.pickle file to GCS. Ensure the same service account authentication flow is used for the upload.

In [None]:
import os
import pickle
import random

from dotenv import load_dotenv
from google.oauth2 import service_account
from google.cloud import storage

load_dotenv()
key_path = os.getenv("GCP_SERVICE_ACCOUNT_KEY")
project = os.getenv("GCP_PROJECT_ID")
bucket_name = os.getenv("GCP_BUCKET_NAME")
blob_name = "ex02.pickle"

# Same service account authentication flow as in Question3
creds = service_account.Credentials.from_service_account_file(key_path)
client = storage.Client(project=project, credentials=creds)
blob = client.bucket(bucket_name).blob(blob_name)

# 100 random integers; the 0-1000 range is an arbitrary choice
numbers = [random.randint(0, 1000) for _ in range(100)]
data = pickle.dumps(numbers)  # serialize the list to bytes
blob.upload_from_string(data, content_type="application/octet-stream")
print(f"Uploaded {len(data)} bytes to gs://{bucket_name}/{blob_name}")
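
The pickling step from the task description can be checked locally before anything is uploaded. This is a minimal sketch using only the standard library: `make_payload` is a hypothetical helper, and the value range and the seed are arbitrary choices made for illustration.

```python
import pickle
import random


def make_payload(n=100, low=0, high=1000, seed=None):
    """Return (numbers, payload): n random integers and their pickled bytes.

    Hypothetical helper; the range [low, high] and the optional seed
    are assumptions, not requirements of the exercise.
    """
    rng = random.Random(seed)
    numbers = [rng.randint(low, high) for _ in range(n)]
    return numbers, pickle.dumps(numbers)


numbers, payload = make_payload(seed=0)

# Round-trip check: unpickling the bytes should give back the same list.
restored = pickle.loads(payload)
print(type(payload).__name__, len(numbers), restored == numbers)
```

The resulting `payload` is a `bytes` object, which is the form that would be handed to the GCS client for upload as ex02.pickle, for example through the blob's `upload_from_string` method.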


## Question5
Develop a Git workflow that includes creating a feature/search branch, making a single commit, merging it into main, performing an initial push with an upstream branch, and finally deleting the remote branch. (No actual repository execution is necessary.)

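git branch feature/search                 # create the branch
git checkout feature/search               # switch to it
# ... make changes ...
git add .
git commit -m "feat: add search example"  # the single commit
git checkout main
git merge feature/search                  # merge the feature branch into main
git push -u origin feature/search         # first push, sets the upstream branch
git push origin --delete feature/search   # delete the remote branch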
