<a href="https://colab.research.google.com/github/YorkJong/news-digest/blob/main/notebooks/news_clip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

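This notebook selects and clips some of the news categories from news-digest so they can be reposted elsewhere.
Steps:
1. Run the Functions cell.
2. Fill in and run either of the two forms below, as needed.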

### Functions

In [10]:
import requests
import re


# Assign GitHub repository and path of "journals" folder
repo = "YorkJong/news-digest"
path = "journals"

def get_latest_fn(path):
    '''Returns the filename with the latest date, or "" on failure.
    '''
    api_url = f"https://api.github.com/repos/{repo}/contents/{path}"

    # Ask the GitHub API for all the files under the journals folder
    response = requests.get(api_url)
    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.reason}")
        return ""

    content = response.json()
    pattern = r'^\d{4}_\d{2}_\d{2}\.md$'
    date_list = [f['name'] for f in content if re.match(pattern, f['name'])]

    # Zero-padded YYYY_MM_DD names sort chronologically as plain strings,
    # so max() finds the latest; default="" avoids a ValueError when no
    # filename matches the pattern
    return max(date_list, default="")


def get_file(fn):
    '''Returns the content of a journal file of news-digest, or "" on failure.
    '''
    file_url = f"https://raw.githubusercontent.com/{repo}/main/{path}/{fn}"

    response = requests.get(file_url)
    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.reason}")
        return ""
    return response.content.decode('utf-8')


def get_categories(content):
    '''Returns the category names found in the news-digest content.
    '''
    lines = content.split('\n')
    categories = []
    tag = '### '
    for line in lines:
        # Each category is a level-3 heading; keep the text after the tag
        if line.startswith(tag):
            categories.append(line[len(tag):])
    return categories


def get_links_of_category(kind_name, content, remove_hashtags=True):
    '''Returns the links listed under the given category heading.
    '''
    text = content

    header = kind_name
    if not header.startswith("### "):
        header = f'### {header}'

    # Remove hashtags after each link
    if remove_hashtags:
        lines = text.split('\n')
        lines = [re.sub(r'\s*#\[\[\S+\]\]', '', line) for line in lines]
        lines = [re.sub(r'\s*#[\w/]+', '', line) for line in lines]
        text = '\n'.join(lines)
        
    trigger = False
    lines = []
    for line in text.split('\n'):
        if header in line:
            trigger = True
            continue
        if line.startswith("###"):
            if trigger:
                break
        if trigger:
            lines += [line]
    return '\n'.join(lines)


def get_sublist(items, first, last):
    '''Returns the items from `first` through `last`, both inclusive.
    '''
    ret = []
    trigger = False
    for item in items:
        if item == first:
            trigger = True
        if trigger:
            ret.append(item)
        if item == last:
            break
    return ret
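
The hashtag removal in `get_links_of_category` can be tried offline. The sketch below applies the same two regular expressions to a made-up link line. The helper name `strip_hashtags` and the sample URLs are assumptions for illustration only; they are not part of the notebook.

```python
import re

def strip_hashtags(line):
    # Hypothetical helper wrapping the two substitutions used in
    # get_links_of_category.
    # First pass: wiki-style tags such as #[[SpaceX]]
    line = re.sub(r'\s*#\[\[\S+\]\]', '', line)
    # Second pass: plain tags such as #EV or #Tesla/Model3
    return re.sub(r'\s*#[\w/]+', '', line)

# Made-up link line in the style of the journal files
sample = "- [Sample title](https://example.com/news/1) #EV #Tesla/Model3 #[[SpaceX]]"
print(strip_hashtags(sample))

# Heading lines survive, because "###" is not followed by a word character
print(strip_hashtags("### Finance"))

# Caveat: a URL fragment looks like a hashtag and is removed as well
print(strip_hashtags("- [T](https://example.com/a#top)"))
```

The last call shows a limitation worth keeping in mind: a link whose URL contains a `#fragment` would lose that fragment when `remove_hashtags` is enabled.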

### Form to list links of given category

In [12]:
#@title  { run: "auto" }
#@title  { run: "auto", vertical-output: true }
#@title
category = "Tesla & SpaceX; Vehicle" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
remove_hashtags = True #@param {type:"boolean"}

fn = get_latest_fn(path)
content = get_file(fn)

lines = get_links_of_category(category, content, remove_hashtags)
print(f'### {category}') 
print(lines)

# Copy text to clipboard in Python using pandas module
#import pandas as pd
#df=pd.DataFrame(['Text to copy'])
#df.to_clipboard(index=False, header=False)

### Tesla & SpaceX; Vehicle
- [豐田宣布引進Mirai氫能車！預告現身台灣智慧移動展-國內車訊|8891汽車](https://c.8891.com.tw/news/16578)
- [賓士將砸數十億美元，投資電動車廠](https://technews.tw/2023/03/20/benz-to-spend-billions-investing-in-electric-car-factory/)
- [麥拉倫切入電動車產業，但不是靠車子](https://finance.technews.tw/2023/03/20/mclaren-inverter-ev-industry/)
-


In [11]:
#@title  { run: "auto" }
#@title  { run: "auto", vertical-output: true }
#@title
first_category = "Tesla & SpaceX; Vehicle" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
last_category = "Finance" #@param ["Tesla & SpaceX; Vehicle", "Tech Titans", "Finance", "Taiwan", "Crypto", "Technology", "AI"] {allow-input: true}
remove_hashtags = True #@param {type:"boolean"}

fn = get_latest_fn(path)
content = get_file(fn)
categories = get_categories(content)
categories = get_sublist(categories, first_category, last_category)

for category in categories:
    lines = get_links_of_category(category, content, remove_hashtags)
    print(f'### {category}') 
    print(lines)


### Tesla & SpaceX; Vehicle
- [豐田宣布引進Mirai氫能車！預告現身台灣智慧移動展-國內車訊|8891汽車](https://c.8891.com.tw/news/16578)
- [賓士將砸數十億美元，投資電動車廠](https://technews.tw/2023/03/20/benz-to-spend-billions-investing-in-electric-car-factory/)
- [麥拉倫切入電動車產業，但不是靠車子](https://finance.technews.tw/2023/03/20/mclaren-inverter-ev-industry/)
-
### Tech Titans
- [微軟Edge瀏覽器也推VSR影片增強技術，NV、AMD顯卡都能提升阿公級低解析度影片](https://www.techbang.com/posts/104435-microsoft-edge-rescues-low-definition-old-videos-and-becomes)
- [內建加密貨幣交易功能，傳 Microsoft Edge 祕密測試加密錢包](https://technews.tw/2023/03/20/testing-a-built-in-crypto-wallet-in-microsoft-edge/)
- [路透：微軟給歐盟的反壟斷補救措施僅針對雲端串流媒體競爭對手](https://m.cnyes.com/news/id/5117533)
- [傳蘋果開發大型語言模型提升 Siri 體驗](https://technews.tw/2023/03/21/apple-is-reportedly-experimenting-with-language-generating-ai/)
- [蘋果 CarPlay 危機將至？Google 布局已成](https://technews.tw/2023/03/21/apple-carplay-vs-google-android-automotive/)
- [ChatGPT有多強？對手Google實測可以當「年薪500萬」工程師，Bard怎麼比拼？](https://www.bnext.com.tw/article/74494/google-testin

### Test

In [83]:
get_sublist(list('abcdefg'), 'b', 'd')

['b', 'c', 'd']
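
A few more cases may help when choosing `first_category` and `last_category` in the form. The sketch below uses a copy of the `get_sublist` logic so that it runs on its own. The category list is a shortened, hand-written sample, not fetched data.

```python
def get_sublist(items, first, last):
    # Copy of the notebook's logic, repeated here so the example is standalone
    ret = []
    trigger = False
    for item in items:
        if item == first:
            trigger = True
        if trigger:
            ret.append(item)
        if item == last:
            break
    return ret

# Hand-written sample list (an assumption for illustration)
cats = ['Tesla & SpaceX; Vehicle', 'Tech Titans', 'Finance', 'Taiwan']

normal = get_sublist(cats, 'Tech Titans', 'Finance')    # inclusive range
open_ended = get_sublist(cats, 'Finance', 'Crypto')     # last not in the list
reversed_range = get_sublist(cats, 'Finance', 'Tech Titans')  # last comes first
missing_first = get_sublist(cats, 'Crypto', 'Finance')  # first not in the list

print(normal)
print(open_ended)
print(reversed_range)
print(missing_first)
```

When `last` is absent the result runs to the end of the list. When `last` appears before `first`, or `first` is absent and `last` is present, the loop stops at `last` and the result is empty.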

In [74]:
fn = get_latest_fn(path)
content = get_file(fn)
print(get_categories(content)) 

['Tesla & SpaceX; Vehicle', ' Tech Titans', 'Finance', 'Taiwan', 'Crypto', 'Technology', 'AI']
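
The output above contains `' Tech Titans'` with a leading space, which suggests that heading has extra whitespace after `### ` in the source file. One possible way to normalize the names is sketched below. The function `parse_categories` and the sample text are hypothetical and are not part of the notebook.

```python
import re

def parse_categories(content):
    # Hypothetical variant of get_categories: match level-3 headings and
    # trim surrounding whitespace from each name
    matches = re.finditer(r'^###[ \t]+(.+)$', content, flags=re.MULTILINE)
    return [m.group(1).strip() for m in matches]

# Made-up content with an extra space before "Tech Titans"
sample = "### Tesla & SpaceX; Vehicle\n- link\n###  Tech Titans\n- link\n"
names = parse_categories(sample)
print(names)
```

Note that `get_links_of_category` looks for the literal text `### <name>` in each line, so trimming the names alone would make that lookup miss a heading with extra spaces. The same normalization would have to be applied there too.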


In [None]:
fn = get_latest_fn(path)
print(fn)

content = get_file(fn)
print(content)
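
The filename that `get_latest_fn` returns depends on the names being zero-padded `YYYY_MM_DD.md`, so that plain string comparison matches date order. A minimal offline sketch of that selection step follows. The `names` list is a made-up directory listing standing in for the GitHub API response.

```python
import re

# Hypothetical listing; in the notebook the names come from the GitHub API
names = ["README.md", "2023_03_19.md", "2023_03_21.md", "2023_03_20.md", "2023_3_22.md"]

# Same pattern as get_latest_fn: only zero-padded dates are accepted
pattern = r'^\d{4}_\d{2}_\d{2}\.md$'
dated = [n for n in names if re.match(pattern, n)]

# Lexicographic max equals the chronologically latest name for this format
latest = max(dated, default="")
print(latest)  # 2023_03_21.md
```

The name `2023_3_22.md` is skipped because its month is not zero-padded. Without that filter it would compare greater than every `2023_03_*` name and be picked as the latest, even though the check exists only to exclude non-journal files.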