<a href="https://colab.research.google.com/github/YorkJong/news-digest/blob/main/notebooks/news_clip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

這支 notebook 是為了選取剪輯 news-digest 的部分新聞分類，以利後續轉貼
執行步驟如下：
1. 執行 Functions
2. 根據需要，填表執行後面的兩個 Forms

### Functions

In [104]:
import requests
import re


# Assign GitHub repository and path of "journals" folder
repo = "YorkJong/news-digest"
path = "journals"

def get_latest_fn(path):
    '''Returns the filename of latest date.
    '''
    api_url = f"https://api.github.com/repos/{repo}/contents/{path}"

    # Request to the GitHub API to get all the archives under the journal folder
    response = requests.get(api_url)
    if response.status_code == 200:
        content = response.json()
        pattern = r'^\d{4}_\d{2}_\d{2}\.md$'
        date_list = [f['name'] for f in content if re.match(pattern, f['name'])]
    
        # Find the filename with the latest date
        latest_date = max(date_list)
        #latest_date = sorted(date_list)[-1]
        return latest_date
    else:
        print(f"Error {response.status_code}: {response.reason}")
        return ""


def get_file(fn):
    '''Returns a file content of the news-digest site.
    '''
    file_url = f"https://raw.githubusercontent.com/{repo}/main/{path}/{fn}"

    response = requests.get(file_url)
    if response.status_code == 200:
        return response.content.decode('utf-8')
    else:
        print(f"Error {response.status_code}: {response.reason}")
    return ""


def get_categories(content):
    '''Returns category names of the content of news-digest.
    '''
    lines = content.split('\n')
    categorys = []
    tag = '### '
    for line in lines:
        if line.startswith(tag):
            categorys += [line[len(tag):]]
    return categorys


def get_links_of_category(kind_name, content, remove_hashtags=True):
    '''Returns links of given kind.
    '''
    text = content

    header = kind_name
    if not header.startswith("### "):
        header = f'### {header}'

    # Remove hashtags after each link
    if remove_hashtags:
        text = '\n'.join(
            [re.sub(r'#[\w/]+', '', line) for line in text.split('\n')])
        
    trigger = False
    lines = []
    for line in text.split('\n'):
        if header in line:
            trigger = True
            continue
        if line.startswith("###"):
            if trigger:
                break
        if trigger:
            lines += [line]
    return '\n'.join(lines)


def get_sublist(all, first, last):
    '''Get a subset of a list of given range.
    '''
    ret = []
    trigger = False
    for item in all: 
        if item == first:
            trigger = True
        if trigger:
            ret += [item]
        if item == last:
            break
    return ret

### Form to list links of given category

In [107]:
#@title  { run: "auto" }
#@title  { run: "auto", vertical-output: true }
#@title
category = "Tech Titans" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
remove_hashtags = True #@param {type:"boolean"}

fn = get_latest_fn(path)
content = get_file(fn)

lines = get_links_of_category(category, content, remove_hashtags)
print(f'### {category}') 
print(lines)

# Copy text to clipboard in Python using pandas module
#import pandas as pd
#df=pd.DataFrame(['Text to copy'])
#df.to_clipboard(index=False, header=False)

### Tech Titans
- [Google Glass 停售 但 Project Iris 虛擬視覺項目仍進行中](https://www.cool3c.com/article/190829) 
- [外電報導指出微軟計畫 2024 年推出手機遊戲商店 與 Apple、Google 競爭市場](https://gnn.gamer.com.tw/detail.php?sn=246899)   
- [GPT-4 讓微軟 Office 也有了「iPhone 時刻」](https://technews.tw/2023/03/20/microsoft-365-copilot/)  
- [AI搶走了元宇宙的熱度？祖克柏所期待的元宇宙還有更多挑戰](https://www.techbang.com/posts/104627-metaverse-dead)   
- [輝達GTC登場 黃仁勳點名4大AI最夯](https://ctee.com.tw/news/tech/828010.html)   
- [AMD市佔漲不動了 分析師稱Intel的麻煩已結束](https://news.xfastest.com/intel/125467/analyst-intel/)   
-


In [105]:
#@title  { run: "auto" }
#@title  { run: "auto", vertical-output: true }
#@title
first_category = "Tesla & SpaceX; Vehicle" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
last_category = "Finance" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
remove_hashtags = True #@param {type:"boolean"}

fn = get_latest_fn(path)
content = get_file(fn)
categories = get_categories(content)
categories = get_sublist(categories, first_category, last_category)

for category in categories:
    lines = get_links_of_category(category, content, remove_hashtags)
    print(f'### {category}') 
    print(lines)


### Tesla & SpaceX; Vehicle
- [特斯拉視覺測距機能來了！會顯示障礙物距離也有聲音警示，但台灣車主還要再等等](https://www.ddcar.com.tw/article/34663)    
- [小鵬飛行汽車外形專利曝光　四螺旋槳加四輪體積龐大](https://unwire.hk/2023/03/18/xpeng-flying-car/life-tech/auto/)  
-
- ### Tech Titans
- [Google Glass 停售 但 Project Iris 虛擬視覺項目仍進行中](https://www.cool3c.com/article/190829) 
- [外電報導指出微軟計畫 2024 年推出手機遊戲商店 與 Apple、Google 競爭市場](https://gnn.gamer.com.tw/detail.php?sn=246899)   
- [GPT-4 讓微軟 Office 也有了「iPhone 時刻」](https://technews.tw/2023/03/20/microsoft-365-copilot/)  
- [AI搶走了元宇宙的熱度？祖克柏所期待的元宇宙還有更多挑戰](https://www.techbang.com/posts/104627-metaverse-dead)   
- [輝達GTC登場 黃仁勳點名4大AI最夯](https://ctee.com.tw/news/tech/828010.html)   
- [AMD市佔漲不動了 分析師稱Intel的麻煩已結束](https://news.xfastest.com/intel/125467/analyst-intel/)   
-
### Finance
- [美國金融業吵著要糖！中小銀行聯盟要FDIC「給全部存款保險2年」避免擠兌爆發](https://www.blocktempo.com/us-midsize-banks-seek-fdic-insurance/) 
- [本週財經大事彙整 週四FOMC利率決議即將登場](https://news.cnyes.com/news/id/5117411) 
- [美元展望∶銀行業危機後聯準會將審慎加息?美元或看跌](https://www.dailyfxasi

### Test

In [83]:
get_sublist(list('abcdefg'), 'b', 'd')

['b', 'c', 'd']

In [74]:
fn = get_latest_fn(path)
content = get_file(fn)
print(get_categories(content)) 

['Tesla & SpaceX; Vehicle', ' Tech Titans', 'Finance', 'Taiwan', 'Crypto', 'Technology', 'AI']


In [None]:
fn = get_latest_fn(path)
print(fn)

content = get_file(fn)
print(content)