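# Law Database Analysis
- Data source: the downloadable Chinese-language law data files collected in the Laws & Regulations Database of the Republic of China (全國法規資料庫): https://data.gov.tw/dataset/18289
- Download the zip; after extraction it contains `FalV.xml`, made up of `<法規>` records like the following: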
```
  <法規>
    <法規性質>憲法</法規性質>
    <法規名稱>中華民國憲法</法規名稱>
    <法規網址>https://law.moj.gov.tw/LawClass/LawAll.aspx?pcode=A0000001</法規網址>
    <法規類別>憲法</法規類別>
    <最新異動日期>19470101</最新異動日期>
    <生效日期>
    </生效日期>
    <生效內容><![CDATA[]]></生效內容>
    <廢止註記>
    </廢止註記>
    <是否英譯註記>Y</是否英譯註記>
    <英文法規名稱>Constitution of the Republic of China (Taiwan)</英文法規名稱>
    <附件 />
    <沿革內容><![CDATA[1.中華民國三十六年一月一日國民政府令公布
  中華民國三十六年十二月二十五日施行
  中華民國三十五年十二月二十五日國民大會通過]]></沿革內容>
    <前言><![CDATA[中華民國國民大會受全體國民之付託，依據孫中山先生創立中華民國之遺
教，為鞏固國權，保障民權，奠定社會安寧，增進人民福利，制定本憲法
，頒行全國，永矢咸遵。]]></前言>
    <法規內容>
      <編章節>   第 一 章 總綱</編章節>
      <條文>
        <條號>第 1 條</條號>
        <條文內容><![CDATA[中華民國基於三民主義，為民有民治民享之民主共和國。]]></條文內容>
      </條文>
    </法規內容>

  </法規>
  <法規>
```
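
A minimal sketch of reading one such record with Python's standard `xml.etree.ElementTree`. It assumes the records sit under a single root element; the `<LAWS>` root tag and the `read_laws` helper are placeholders for illustration, not taken from the file. CDATA sections surface as ordinary element text.

```python
import xml.etree.ElementTree as ET

# Trimmed sample record; the <LAWS> root tag is a placeholder assumption.
SAMPLE = """<LAWS>
  <法規>
    <法規名稱>中華民國憲法</法規名稱>
    <法規網址>https://law.moj.gov.tw/LawClass/LawAll.aspx?pcode=A0000001</法規網址>
    <法規內容>
      <編章節>   第 一 章 總綱</編章節>
      <條文>
        <條號>第 1 條</條號>
        <條文內容><![CDATA[中華民國基於三民主義，為民有民治民享之民主共和國。]]></條文內容>
      </條文>
    </法規內容>
  </法規>
</LAWS>"""

def read_laws(xml_text):
    """Return a list of (law name, [(article number, article text), ...])."""
    root = ET.fromstring(xml_text)
    laws = []
    for law in root.iter("法規"):
        name = law.findtext("法規名稱")
        articles = [
            (a.findtext("條號"), a.findtext("條文內容"))
            for a in law.findall("法規內容/條文")
        ]
        laws.append((name, articles))
    return laws

for name, articles in read_laws(SAMPLE):
    print(name, len(articles))
```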


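The `法規類別` (category) field encodes a hierarchy separated by the full-width `＞` character, for example `<法規類別>行政＞勞動部＞組織目</法規類別>`.

It later turned out that a separate file of orders (命令, whose legal rank is "order") can also be downloaded.
Its content format appears to be identical.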

## PostgreSQL Database Integration

This notebook can synchronize the parsed law data with a PostgreSQL database. Key features:

*   **Connection**: Ability to connect to a user-configured PostgreSQL instance.
*   **Schema**: Designed to work with the schema defined in `law_meta_db_v0.2.sql`. Please ensure this schema is applied to your target database.
*   **Data Synchronization**: When processing XML law files, the notebook can now insert new laws and their articles into the database, or update existing ones based on their PCode.
*   **Metadata Preservation**: During updates from XML, any existing data in fields not directly derived from the XML (e.g., `llm_summary`, `llm_keywords`, `law_metadata` in the `laws` table) will be preserved.
*   **Manual Operations**: Sections are provided for manual database synchronization tasks and direct database querying.
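
The insert-or-update behaviour with metadata preservation described above can be sketched as follows. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the three-column `laws` table and the `upsert_law` helper are simplified assumptions, not the schema in `law_meta_db_v0.2.sql`.

```python
import sqlite3

# Simplified stand-in table (assumption); the real schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE laws (pcode TEXT PRIMARY KEY, xml_law_name TEXT, llm_summary TEXT)")
conn.execute("INSERT INTO laws VALUES ('A0000001', 'old name', 'hand-written summary')")

def upsert_law(conn, pcode, xml_fields):
    """Insert a law, or update only the XML-derived columns if the PCode exists.

    Column names come from the trusted keys of xml_fields; values are bound
    as named parameters.
    """
    exists = conn.execute("SELECT 1 FROM laws WHERE pcode = ?", (pcode,)).fetchone()
    params = {**xml_fields, "pcode": pcode}
    if exists:
        # Only columns named in xml_fields appear in SET, so llm_summary is untouched.
        set_clause = ", ".join(f"{col} = :{col}" for col in xml_fields)
        conn.execute(f"UPDATE laws SET {set_clause} WHERE pcode = :pcode", params)
    else:
        cols = ["pcode", *xml_fields]
        placeholders = ", ".join(f":{c}" for c in cols)
        conn.execute(f"INSERT INTO laws ({', '.join(cols)}) VALUES ({placeholders})", params)

upsert_law(conn, "A0000001", {"xml_law_name": "中華民國憲法"})
print(conn.execute("SELECT * FROM laws").fetchall())
```

Because the `SET` clause is built only from the XML-derived keys, any column absent from that mapping keeps its stored value.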

**Important**: Before using database functionalities, please configure your connection details in the 'PostgreSQL Connection Configuration' cell below.

In [None]:
# --- PostgreSQL Connection Configuration ---
# Please fill in your PostgreSQL connection details below.
# For security, especially in shared or production environments,
# it is STRONGLY recommended to use environment variables for sensitive data like passwords,
# rather than hardcoding them here. The connection functions will prioritize environment variables
# (e.g., os.getenv("DB_PASSWORD")) if available, falling back to these direct assignments.

DB_HOST = "localhost"  # Or os.getenv("DB_HOST", "localhost")
DB_PORT = "5432"       # Or os.getenv("DB_PORT", "5432")
DB_USER = "postgres"  # Or os.getenv("DB_USER", "your_user")
DB_PASSWORD = "postgres" # Or os.getenv("DB_PASSWORD", "your_password")
DB_NAME = "test_law_db_agent"     # Or os.getenv("DB_NAME", "law_db")

# Placeholder for the database connection object
db_connection = None
db_cursor = None

print("Database configuration loaded.")
print(f"Target DB: {DB_USER}@{DB_HOST}:{DB_PORT}/{DB_NAME}")

In [None]:
import pandas as pd
import re
import psycopg2
import os

# (Global db_connection and db_cursor are defined in the config cell)

def get_db_params():
    params = {
        'host': os.getenv("DB_HOST", DB_HOST),
        'port': os.getenv("DB_PORT", DB_PORT),
        'user': os.getenv("DB_USER", DB_USER),
        'password': os.getenv("DB_PASSWORD", DB_PASSWORD),
        'dbname': os.getenv("DB_NAME", DB_NAME)
    }
    return params

def connect_db():
    global db_connection, db_cursor
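    # Reuse the connection only if it is still open. A closed connection object
    # is still truthy, so check .closed explicitly to allow a real reconnect.
    if db_connection is not None and db_connection.closed == 0:
        print("Already connected to the database.")
        return True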

    params = get_db_params()
    try:
        print(f"Connecting to PostgreSQL: {params['user']}@{params['host']}:{params['port']}/{params['dbname']}...")
        db_connection = psycopg2.connect(**params)
        db_cursor = db_connection.cursor()
        print("Successfully connected to PostgreSQL.")
        return True
    except psycopg2.Error as e:
        db_connection = None
        db_cursor = None
        print(f"Error connecting to PostgreSQL: {e}")
        return False

def disconnect_db():
    global db_connection, db_cursor
    if db_cursor:
        db_cursor.close()
        db_cursor = None
    if db_connection:
        db_connection.close()
        db_connection = None
        print("Disconnected from PostgreSQL.")
    else:
        print("Not connected to any database.")

def check_db_connection():
    global db_connection, db_cursor
    if not db_connection or db_connection.closed != 0:
        print("No active database connection.")
        return False
    try:
        db_cursor.execute("SELECT 1;")
        db_cursor.fetchone()
        print("Database connection is active.")
        return True
    except psycopg2.Error as e:
        print(f"Database connection check failed: {e}")
        # No automatic reconnect is attempted here; the caller decides how to recover
        return False

# Example usage / test call (optional, can be in a separate cell)
# if connect_db():
#     check_db_connection()
#     disconnect_db()

In [None]:
# --- Helper Functions for Data Conversion and Laws Table Operations ---
# These functions prepare data for SQL or execute specific parts of DB operations.
# They do NOT handle transactions (commit/rollback) themselves; that's managed by the calling function (e.g., synchronize_lawmgr_with_db).

def str_to_date_for_sql(date_str): # Helper for YYYYMMDD string to be used with TO_DATE
    """Return date_str unchanged if it is exactly eight ASCII digits, else None."""
    if isinstance(date_str, str) and len(date_str) == 8 and date_str.isascii() and date_str.isdigit():
        return date_str
    return None

def str_to_bool(yn_str): # Helper for 'Y'/'N'
    if yn_str == 'Y':
        return True
    if yn_str == 'N':
        return False
    return None

def get_law_from_db(pcode_val):
    global db_cursor, db_connection
    if not db_cursor or not db_connection or (hasattr(db_connection, 'closed') and db_connection.closed != 0):
        return None
    try:
        db_cursor.execute("SELECT * FROM laws WHERE pcode = %s;", (pcode_val,))
        colnames = [desc[0] for desc in db_cursor.description]
        row = db_cursor.fetchone()
        if row:
            return dict(zip(colnames, row))
        return None
    except Exception as e:
        print(f"Error fetching law {pcode_val} from DB: {e}")
        return None

def insert_law_to_db(law_tags):
    """Inserts a law record. Does not commit or rollback. Returns True on successful execution, False otherwise."""
    global db_cursor
    if not db_cursor: return False
    
    pcode = law_tags.get('PCode')
    if not pcode: # PCode is essential as it's a primary key or unique identifier.
        print("Cannot insert law: PCode is missing from law_tags.")
        return False

    # Fields mapped directly from XML. Other fields like llm_summary, keywords, etc., are NOT set here.
    data_for_insert = {
        'pcode': pcode,
        'xml_law_nature': law_tags.get('法規性質'),
        'xml_law_name': law_tags.get('法規名稱'),
        'xml_law_url': law_tags.get('法規網址'),
        'xml_law_category': law_tags.get('法規類別'),
        'xml_latest_change_date': str_to_date_for_sql(law_tags.get('最新異動日期')),
        'xml_effective_date': law_tags.get('生效日期'), # Might be empty or need parsing
        'xml_effective_content': law_tags.get('生效內容'),
        'xml_abolition_mark': law_tags.get('廢止註記'), # Might be empty
        'xml_is_english_translated': str_to_bool(law_tags.get('是否英譯註記')),
        'xml_english_law_name': law_tags.get('英文法規名稱'),
        'xml_attachment': law_tags.get('附件'), # Might be empty
        'xml_history_content': law_tags.get('沿革內容'),
        'xml_preamble': law_tags.get('前言') # Might be empty
    }

    sql = """
    INSERT INTO laws (
        pcode, xml_law_nature, xml_law_name, xml_law_url, xml_law_category,
        xml_latest_change_date, xml_effective_date, xml_effective_content,
        xml_abolition_mark, xml_is_english_translated, xml_english_law_name,
        xml_attachment, xml_history_content, xml_preamble
    ) VALUES (
        %(pcode)s, %(xml_law_nature)s, %(xml_law_name)s, %(xml_law_url)s, %(xml_law_category)s,
        CASE WHEN %(xml_latest_change_date)s IS NOT NULL THEN TO_DATE(%(xml_latest_change_date)s, 'YYYYMMDD') ELSE NULL END,
        %(xml_effective_date)s, %(xml_effective_content)s,
        %(xml_abolition_mark)s, %(xml_is_english_translated)s, %(xml_english_law_name)s,
        %(xml_attachment)s, %(xml_history_content)s, %(xml_preamble)s
    );
    """
    try:
        db_cursor.execute(sql, data_for_insert)
        return True
    except Exception as e:
        print(f"Error during insert_law_to_db for {pcode}: {e}")
        return False

def update_law_in_db(pcode_val, law_tags):
    """Updates a law record's XML-derived fields. Does not commit or rollback. Returns True on successful execution, False otherwise."""
    global db_cursor
    if not db_cursor: return False

    update_fields_sql = []
    update_values_dict = {}
    # Map of XML tags to DB columns and whether they are dates needing TO_DATE conversion.
    # LLM-generated fields or manually edited fields are NOT part of this map and thus are preserved.
    field_map = {
        '法規性質': ('xml_law_nature', False),
        '法規名稱': ('xml_law_name', False),
        '法規網址': ('xml_law_url', False),
        '法規類別': ('xml_law_category', False),
        '最新異動日期': ('xml_latest_change_date', True), 
        '生效日期': ('xml_effective_date', False),
        '生效內容': ('xml_effective_content', False),
        '廢止註記': ('xml_abolition_mark', False),
        '是否英譯註記': ('xml_is_english_translated', False), # Needs str_to_bool conversion
        '英文法規名稱': ('xml_english_law_name', False),
        '附件': ('xml_attachment', False),
        '沿革內容': ('xml_history_content', False),
        '前言': ('xml_preamble', False)
    }

    for tag_key, (db_col, is_date) in field_map.items():
        if tag_key in law_tags: # Only update if the tag is present in the source XML data
            value = law_tags[tag_key]
            placeholder = db_col # Use db_col name as placeholder key for clarity

            if is_date:
                sql_val = str_to_date_for_sql(value)
                update_fields_sql.append(f"{db_col} = CASE WHEN %({placeholder})s IS NOT NULL THEN TO_DATE(%({placeholder})s, 'YYYYMMDD') ELSE NULL END")
                update_values_dict[placeholder] = sql_val
            elif db_col == 'xml_is_english_translated': # Special handling for boolean conversion
                sql_val = str_to_bool(value)
                update_fields_sql.append(f"{db_col} = %({placeholder})s")
                update_values_dict[placeholder] = sql_val
            else:
                update_fields_sql.append(f"{db_col} = %({placeholder})s")
                update_values_dict[placeholder] = value
    
    if not update_fields_sql: # No XML-derived fields to update
        return True 

    sql_set_clause = ", ".join(update_fields_sql)
    sql = f"UPDATE laws SET {sql_set_clause} WHERE pcode = %(pcode_where_val)s;"
    update_values_dict['pcode_where_val'] = pcode_val

    try:
        db_cursor.execute(sql, update_values_dict)
        return True
    except Exception as e:
        print(f"Error during update_law_in_db for {pcode_val}: {e}")
        return False

In [None]:
# --- Articles Table Operations ---
# These functions operate on the 'articles' table. 
# They do NOT handle transactions (commit/rollback) themselves.

def insert_articles_to_db(law_id_val, articles_list):
    """Inserts a list of articles for a given law_id. 
       Does not commit or rollback. Raises an exception if any article fails to insert,
       allowing the caller to handle the transaction for the entire set of articles.
       Returns (True, inserted_count) on success, (False, 0) if initial checks fail.
    """
    global db_cursor
    if not db_cursor: return False, 0

    inserted_count = 0
    sql = """
    INSERT INTO articles (
        law_id, xml_chapter_section, xml_article_number, xml_article_content
    ) VALUES (
        %(law_id)s, %(xml_chapter_section)s, %(xml_article_number)s, %(xml_article_content)s
    );
    """
    
    for article_data in articles_list:
        # Individual article insertion is part of a larger transaction managed by the caller.
        # If one article fails, an exception is raised to ensure atomicity for the law's articles.
        data_for_insert = {
            'law_id': law_id_val,
            'xml_chapter_section': article_data.get('編章節'),
            'xml_article_number': article_data.get('條號'),
            'xml_article_content': article_data.get('條文內容')
        }
        if not data_for_insert['xml_article_number']:
            print(f"Skipping article due to missing '條號' for law_id {law_id_val}")
            continue
        try:
            db_cursor.execute(sql, data_for_insert)
            inserted_count += 1
        except Exception as e:
            # Re-raise to be caught by the transaction-managing function (synchronize_lawmgr_with_db)
            print(f"Error inserting article {article_data.get('條號')} for law_id {law_id_val}: {e}")
            raise 
    return True, inserted_count

def delete_articles_for_law(law_id_val):
    """Deletes all articles for a given law_id. Does not commit or rollback. Returns True on success, False otherwise."""
    global db_cursor
    if not db_cursor: return False
    
    sql = "DELETE FROM articles WHERE law_id = %s;"
    try:
        db_cursor.execute(sql, (law_id_val,))
        return True
    except Exception as e:
        print(f"Error during delete_articles_for_law for law_id {law_id_val}: {e}")
        return False

In [None]:
# --- Database Synchronization Function ---
# This function orchestrates the synchronization of LawMgr data with the database,
# managing transactions on a per-law basis.

def synchronize_lawmgr_with_db(lawmgr_instance):
    """Synchronizes all laws and their articles from a LawMgr instance to the database.
       Manages transactions on a per-law basis: all changes for a single law (law record + its articles)
       are committed together, or rolled back if any part of the process fails for that law.
    """
    global db_connection # Needed for commit/rollback
    if not check_db_connection(): # Verifies db_cursor and db_connection are active
        print("Database not connected. Skipping DB synchronization.")
        return

    print(f"Starting DB synchronization for {len(lawmgr_instance.laws)} laws in LawMgr...")
    synced_laws_count = 0

    for law_name, law_obj_from_mgr in lawmgr_instance.laws.items():
        law_tags = law_obj_from_mgr.tags 
        pcode = law_tags.get('PCode')

        if not pcode:
            print(f"Warning: PCode missing for law '{law_name}' in LawMgr. Skipping DB sync for this law.")
            continue
        
        # --- Start Transaction for current law ---
        try:
            # 1. Synchronize Law Record (Insert or Update)
            existing_law_db = get_law_from_db(pcode)
            db_law_id = None
            law_op_success = False

            if existing_law_db:
                # print(f"Law {pcode} ({law_name}) found in DB. Updating XML-derived fields.") # Verbose
                if update_law_in_db(pcode, law_tags):
                    db_law_id = existing_law_db['id'] # Get ID of existing law
                    law_op_success = True
                else:
                    raise Exception(f"Update operation failed for law {pcode}") # Error already printed by sub-function
            else:
                # print(f"Law {pcode} ({law_name}) not in DB. Inserting.") # Verbose
                if insert_law_to_db(law_tags):
                    # Fetch the new law's ID for article insertion
                    newly_inserted_law_db = get_law_from_db(pcode) 
                    if newly_inserted_law_db:
                        db_law_id = newly_inserted_law_db['id']
                        law_op_success = True
                    else:
                        raise Exception(f"Failed to retrieve ID for newly inserted law {pcode}")
                else:
                    raise Exception(f"Insert operation failed for law {pcode}") # Error already printed by sub-function
            
            if not law_op_success or not db_law_id:
                raise Exception(f"Law record synchronization failed or DB ID not obtained for {pcode}")

            # 2. Synchronize Articles: Delete existing then insert current ones
            # print(f"Synchronizing articles for law ID {db_law_id} (PCode: {pcode})...") # Verbose
            if not delete_articles_for_law(db_law_id):
                raise Exception(f"Failed to delete old articles for law ID {db_law_id}") # Error printed by sub-function
            
            articles_for_current_law = lawmgr_instance.law_items.get(law_name, [])
            if articles_for_current_law:
                # print(f"Found {len(articles_for_current_law)} articles in LawMgr for {law_name}. Inserting...") # Verbose
                articles_op_success, num_inserted = insert_articles_to_db(db_law_id, articles_for_current_law)
                if not articles_op_success:
                    # Error details (which article failed) should be printed by insert_articles_to_db before it raises exception
                    raise Exception(f"Failed to insert one or more articles for law ID {db_law_id}")
                # print(f"Inserted {num_inserted} articles for law ID {db_law_id}.") # Verbose
            # else: # Verbose
                # print(f"No articles found in LawMgr for {law_name} to insert.")
            
            # If all operations for this law are successful, commit the transaction
            if db_connection and not db_connection.closed:
                db_connection.commit()
                # print(f"Successfully committed changes for law {pcode}.") # Verbose
                synced_laws_count +=1
            else:
                # This case should ideally not be reached if check_db_connection passed initially
                # and no external factors closed the connection.
                print(f"CRITICAL: DB connection lost before committing for law {pcode}. Data might be in an inconsistent state if previous operations were partially executed by DB before implicit commit/rollback by connection loss.")
                raise Exception("DB connection lost before commit")

        except Exception as e:
            print(f"Transaction failed for law PCode {pcode} (Name: {law_name}): {e}")
            if db_connection and not db_connection.closed:
                try:
                    db_connection.rollback()
                    print(f"Rolled back changes for law PCode {pcode}.")
                except Exception as rb_e:
                    # This indicates a severe issue with the DB connection or session state.
                    print(f"CRITICAL: Error during rollback for law PCode {pcode}: {rb_e}. Database state might be inconsistent.")
            else:
                print(f"DB connection was not available for rollback for law PCode {pcode}. Previous operations might be persisted if auto-commit was on at any point or connection was lost.")
            # Continue to the next law, transaction for this one is now handled (rolled back or error noted).
            
    print(f"DB synchronization process finished. Successfully synced and committed {synced_laws_count} laws.")
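
The per-law commit/rollback pattern used by `synchronize_lawmgr_with_db` can be seen in isolation in the sketch below. It substitutes the standard-library `sqlite3` module for PostgreSQL, and the `articles` table layout and the `sync_all` helper are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (law TEXT NOT NULL, number TEXT NOT NULL, content TEXT)")
conn.commit()

def sync_all(conn, laws):
    """Replace each law's articles in its own transaction; return the names that committed."""
    committed = []
    for law_name, articles in laws.items():
        try:
            conn.execute("DELETE FROM articles WHERE law = ?", (law_name,))
            for number, content in articles:
                conn.execute("INSERT INTO articles VALUES (?, ?, ?)", (law_name, number, content))
            conn.commit()      # all changes for this law are committed together
            committed.append(law_name)
        except sqlite3.Error as exc:
            conn.rollback()    # undo the partial work for this law only
            print(f"Rolled back {law_name}: {exc}")
    return committed

laws = {
    "good law": [("第 1 條", "a"), ("第 2 條", "b")],
    "bad law": [("第 1 條", "a"), (None, "missing number violates NOT NULL")],
}
print(sync_all(conn, laws))
```

A failure in one law leaves earlier commits intact, and the loop simply moves on to the next law.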

### Import Summaries

In [None]:
# --- Function to Import Law Summaries ---
import re # For parsing summary file
import psycopg2 # Needed for the psycopg2 error classes checked in the except block below

def import_law_summaries(summary_filepath):
    global db_connection, db_cursor # Use existing global connection variables

    print(f"Starting summary import from: {summary_filepath}")
    law_name = None
    current_summary_lines = []
    summaries_to_process = []

    try:
        with open(summary_filepath, 'r', encoding='utf-8') as f:
            for line in f:
                match = re.match(r"----- File: (.*)\.txt -----", line)
                if match:
                    # If we have a current law name and summary, store it
                    if law_name and current_summary_lines:
                        summaries_to_process.append((law_name, "".join(current_summary_lines).strip()))
                    
                    law_name = match.group(1)
                    current_summary_lines = [] # Reset for the new law
                elif law_name: # Only collect lines if we are inside a law's section
                    current_summary_lines.append(line)
            
            # Add the last processed summary after EOF
            if law_name and current_summary_lines:
                summaries_to_process.append((law_name, "".join(current_summary_lines).strip()))

    except FileNotFoundError:
        print(f"Error: Summary file not found at {summary_filepath}")
        return
    except Exception as e:
        print(f"Error reading or parsing summary file: {e}")
        return

    if not summaries_to_process:
        print("No summaries found in the file or file format incorrect.")
        return

    print(f"Found {len(summaries_to_process)} summaries to process.")

    if not connect_db(): # Ensure DB is connected
        print("Database connection failed. Cannot import summaries.")
        return

    successful_updates = 0
    failed_updates = 0

    for name, summary_text in summaries_to_process:
        print(f"Processing summary for law: {name}")
        if not summary_text:
            print(f"Skipping law '{name}' due to empty summary.")
            failed_updates +=1
            continue
        try:
            # Ensure cursor is valid
            if not db_cursor or (db_connection and db_connection.closed != 0):
                print("Database cursor is not available. Attempting to reconnect...")
                if not connect_db(): # try to connect again
                     print(f"Failed to reconnect to DB. Skipping update for {name}.")
                     failed_updates +=1
                     continue
                # if connect_db did not raise, cursor should be good now

            sql = "UPDATE laws SET llm_summary = %s WHERE xml_law_name = %s;"
            db_cursor.execute(sql, (summary_text, name))
            
            if db_cursor.rowcount > 0:
                db_connection.commit()
                print(f"Successfully updated summary for law: {name}")
                successful_updates += 1
            else:
                db_connection.rollback() # Nothing to commit: no row matched this law name
                print(f"Warning: Law '{name}' not found in database. No update made.")
                failed_updates += 1
        except Exception as e:
            if db_connection and not db_connection.closed:
                db_connection.rollback()
            print(f"Error updating summary for law '{name}': {e}")
            failed_updates += 1
            # Attempt to re-establish connection for next item if connection seems lost
            if isinstance(e, (psycopg2.InterfaceError, psycopg2.OperationalError)): # Check specific psycopg2 errors
                print("Connection lost, attempting to reconnect...")
                if not connect_db():
                    print("Failed to reconnect. Aborting further summary imports.")
                    break 
    
    print(f"Summary import finished. Successful updates: {successful_updates}, Failed/Skipped updates: {failed_updates}")
    # disconnect_db() # Decide if to disconnect here or let user manage globally

# Example Usage (commented out, to be placed in a separate cell later)
# summary_file = "/path/to/your/summaries.txt" # User needs to set this path
# import_law_summaries(summary_file)
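
The summary file format that `import_law_summaries` expects can be illustrated with a small standalone sketch. The `split_summaries` helper and the sample lines are made up for illustration; only the `----- File: <law name>.txt -----` delimiter comes from the function above.

```python
import re

HEADER = re.compile(r"----- File: (.*)\.txt -----")

def split_summaries(lines):
    """Group lines under each '----- File: <name>.txt -----' header into (name, text) pairs."""
    results, name, buf = [], None, []
    for line in lines:
        m = HEADER.match(line)
        if m:
            if name and buf:            # flush the previous law's summary
                results.append((name, "".join(buf).strip()))
            name, buf = m.group(1), []
        elif name:                      # ignore anything before the first header
            buf.append(line)
    if name and buf:                    # flush the last summary at end of input
        results.append((name, "".join(buf).strip()))
    return results

# Hypothetical sample content
sample = [
    "----- File: 民法.txt -----\n",
    "Summary of the Civil Code.\n",
    "\n",
    "----- File: 公司法.txt -----\n",
    "Summary of the Company Act.\n",
]
print(split_summaries(sample))
```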

### Import Keywords

In [None]:
# --- Function to Import Law Keywords from CSV ---
import pandas as pd
import re # Already imported for summaries, but good to note if this cell were standalone
import psycopg2 # Needed for the psycopg2 error classes checked in the except block below

def import_law_keywords(keyword_csv_filepath):
    global db_connection, db_cursor # Use existing global connection variables

    print(f"Starting keyword import from CSV: {keyword_csv_filepath}")

    try:
        df = pd.read_csv(keyword_csv_filepath)
        if 'filename' not in df.columns or 'keywords' not in df.columns:
            print("Error: CSV file must contain 'filename' and 'keywords' columns.")
            return
    except FileNotFoundError:
        print(f"Error: Keyword CSV file not found at {keyword_csv_filepath}")
        return
    except Exception as e:
        print(f"Error reading or parsing keyword CSV file: {e}")
        return

    # Extract law name from filename (e.g., "法規A.txt" -> "法規A")
    try:
        df['law_name'] = df['filename'].apply(lambda x: re.sub(r'\.txt$', '', str(x)))
    except Exception as e:
        print(f"Error processing 'filename' column: {e}")
        return
        
    # Group by the extracted law name and aggregate keywords
    # Drop missing values before casting: astype(str) would turn NaN into the string 'nan'
    keywords_by_law = df.groupby('law_name')['keywords'].apply(lambda x: '|'.join(x.dropna().astype(str)))
    
    if keywords_by_law.empty:
        print("No keywords found or processed from the CSV file.")
        return

    print(f"Found {len(keywords_by_law)} laws with keywords to process.")

    if not connect_db(): # Ensure DB is connected
        print("Database connection failed. Cannot import keywords.")
        return

    successful_updates = 0
    failed_updates = 0

    for law_name, combined_keywords in keywords_by_law.items():
        print(f"Processing keywords for law: {law_name}")
        if not combined_keywords:
            print(f"Skipping law '{law_name}' due to empty combined keywords.")
            failed_updates +=1
            continue
        try:
            if not db_cursor or (db_connection and db_connection.closed != 0):
                print("Database cursor is not available. Attempting to reconnect...")
                if not connect_db():
                     print(f"Failed to reconnect to DB. Skipping update for {law_name}.")
                     failed_updates +=1
                     continue

            sql = "UPDATE laws SET llm_keywords = %s WHERE xml_law_name = %s;"
            db_cursor.execute(sql, (combined_keywords, law_name))

            if db_cursor.rowcount > 0:
                db_connection.commit()
                print(f"Successfully updated keywords for law: {law_name}")
                successful_updates += 1
            else:
                db_connection.rollback() # Nothing to commit: no row matched this law name
                print(f"Warning: Law '{law_name}' not found in database. No update made.")
                failed_updates += 1
        except Exception as e:
            if db_connection and not db_connection.closed:
                db_connection.rollback()
            print(f"Error updating keywords for law '{law_name}': {e}")
            failed_updates += 1
            if isinstance(e, (psycopg2.InterfaceError, psycopg2.OperationalError)):
                print("Connection lost, attempting to reconnect...")
                if not connect_db():
                    print("Failed to reconnect. Aborting further keyword imports.")
                    break
    
    print(f"Keyword import finished. Successful updates: {successful_updates}, Failed/Skipped updates: {failed_updates}")
    # disconnect_db() # Decide if to disconnect here

# Example Usage (commented out, to be placed in a separate cell later)
# keyword_file = "/path/to/your/keywords.csv" # User needs to set this path
# import_law_keywords(keyword_file)
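
The grouping step in `import_law_keywords` can be illustrated without pandas. The sketch below swaps in the standard-library `csv` module. The `group_keywords` helper and the sample rows are invented for illustration; only the `filename` and `keywords` column names come from the function above.

```python
import csv
import io
import re
from collections import defaultdict

def group_keywords(csv_text):
    """Map each law name (filename minus '.txt') to its keywords joined by '|'."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        law_name = re.sub(r"\.txt$", "", row["filename"])
        keyword = (row.get("keywords") or "").strip()
        if keyword:                      # skip rows with no keyword
            grouped[law_name].append(keyword)
    return {name: "|".join(words) for name, words in grouped.items()}

# Hypothetical sample content
sample_csv = """filename,keywords
民法.txt,契約
民法.txt,物權
公司法.txt,股東
公司法.txt,
"""
print(group_keywords(sample_csv))
```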

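## Basic Listing and Querying
- List the names of all laws
- List the articles of a specified law
- Custom search

This part builds no management object; it simply acts on each record while parsing.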
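
Acting on each record during parsing can also be done in a streaming fashion. The sketch below uses `xml.etree.ElementTree.iterparse`, which is a different technique from loading the whole tree with `ET.parse`; the `iter_law_names` helper, the `<LAWS>` root tag, and the sample records are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

def iter_law_names(source, category_prefix=""):
    """Yield law names one record at a time, optionally filtered by category prefix."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "法規":
            continue
        category = elem.findtext("法規類別") or ""
        if category.startswith(category_prefix):
            yield elem.findtext("法規名稱")
        elem.clear()  # drop the finished record's children to limit memory use

# Hypothetical two-record sample
sample = io.BytesIO("""<LAWS>
  <法規><法規名稱>中華民國憲法</法規名稱><法規類別>憲法</法規類別></法規>
  <法規><法規名稱>勞動部組織法</法規名稱><法規類別>行政＞勞動部＞組織目</法規類別></法規>
</LAWS>""".encode("utf-8"))

print(list(iter_law_names(sample, "行政＞勞動部")))
```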

In [None]:
import xml.etree.ElementTree as ET
import os

def parse_xml(xml_file, filter_key="", law_name=None): # law_name: list of law names to keep; None or empty lists every law
    tree = ET.parse(xml_file)
    root = tree.getroot()
    cnt = 0
    for 法規 in root:
        法規性質 = 法規.find('法規性質').text
        法規名稱 = 法規.find('法規名稱').text
        法規網址 = 法規.find('法規網址').text
        法規類別 = 法規.find('法規類別').text
        # ... other fields follow the same pattern
        
        if not law_name:
          print(f"法規名稱: {法規名稱}")  
          #print(f"法規類別: {法規類別}")
          cnt+=1
        else:
          if filter_key=="class":
            if "行政＞國家科學及技術委員會" not in 法規類別: # other candidates: 教育部, 勞動部, 經濟部, 數位發展部
              continue 
          elif 法規名稱 not in law_name:
            continue
          cnt+=1
          print(f"----- 法規名稱: {法規名稱} -----") 
          #print(f"法規類別: {法規類別}")
      
        if 1: # show articles
          法規內容 = []
          for 條文 in 法規.find('法規內容').findall('條文'):
            條號 = 條文.find('條號').text
            條文內容 = 條文.find('條文內容').text
            法規內容.append({ "條號": 條號, "條文內容": 條文內容})
            print(f"{條號}：\n{條文內容}") 
          if 1: # write to file
            output_path = "/tmp/output"
            # Build up to three nested directories from the category path,
            # e.g. "行政＞勞動部＞組織目" -> /tmp/output/行政/勞動部/組織目
            cols = 法規類別.split("＞")
            dir_path = os.path.join(output_path, *cols[:3])
            os.makedirs(dir_path, exist_ok=True)
            # The with-block closes the file, so no explicit close() is needed
            with open(os.path.join(dir_path, 法規名稱 + ".txt"), 'w', encoding='utf-8') as file:
              for 條文 in 法規內容:
                file.write(f"{條文['條號']}：\n{條文['條文內容']}\n")
            #print(f"法規內容: {法規內容}")
        #print(f"法規性質: {法規性質}")
        #print(f"法規名稱: {法規名稱}")
        #print(f"法規網址: {法規網址}")
        # ... other fields follow the same pattern
        #print(f"法規內容: {法規內容}")
    #print(f"共有 {cnt} 筆資料")


filepath_law ="/Volumes/D2024/data/prj/公文模型/工具/法規/FalV/FalV.xml" # laws (法規)
filepath_cmd ="/Volumes/D2024/data/prj/公文模型/工具/命令/MingLing/MingLing.xml" # orders (命令)

if 0:
  law_str="中華民國憲法、民法、中華民國刑法、行政程序法、民事訴訟法、刑事訴訟法、行政訴訟法、公司法、勞動基準法、社會福利基本法、地方制度法、國家安全法、公平交易法、稅捐稽徵法、著作權法、個人資料保護法、消費者保護法、環境基本法"
  cols = law_str.split("、")
  parse_xml(filepath_law ,"",cols)  # class,勞動基準法

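## Generalizing into LawMgr
- The underlying construction is generic and can be reused
- A separate Mgr object is built on top of it for intuitive, application-specific management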

In [None]:
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

def extract_pcode_from_url(url_string):
    """Extracts PCode from the law's URL string."""
    if not url_string:
        return None
    try:
        parsed_url = urlparse(url_string)
        query_params = parse_qs(parsed_url.query)
        pcode_list = query_params.get('pcode', [])
        if pcode_list:
            return pcode_list[0]
        return None
    except Exception: # Catch any parsing errors
        return None

class XMLElement:
    def __init__(self, tag, attrib, text, children):
        self.tag = tag
        self.attrib = attrib
        self.text = text
        self.children = children
        self.tags = {} # key: tag, value: text (populated by LawMgr)

    def __repr__(self):
        children_repr = ", ".join(repr(child) for child in self.children)
        return f"XMLElement(tag={self.tag}, attrib={self.attrib}, text={self.text}, children=[{children_repr}])"


                
def parse_xml_withobj(xml_file, filter_key="", law_name=None): # generic parse into laws, preserving the XML structure
    def element_to_object(element):
        children = [element_to_object(child) for child in element]
        return XMLElement(
            tag=element.tag,
            attrib=element.attrib,
            text=element.text.strip() if element.text else None,
            children=children
        )

    tree = ET.parse(xml_file)
    root = tree.getroot()
    laws = {}
    cnt = 0

    for 法規 in root:
        法規名稱 = 法規.find('法規名稱').text
        #print(f"法規名稱: {法規名稱}")  
        law = element_to_object(法規)
        #print(law)
        #print(law.attrib.get('法規名稱', ''))
        
        if not law_name:
            #print(law)
            cnt += 1
        else:
            if filter_key == "class":
                # 法規類別 is a child element, not an XML attribute, so read its text
                if "行政＞勞動部" not in (法規.findtext('法規類別') or ''):
                    continue
            elif 法規名稱 not in law_name:
                continue
            #print(law)
            cnt += 1
        
        laws[法規名稱]=law
    
    return laws


def remove_chars(target_str, chars_to_remove):
    # Build a translation table that maps every character to remove to None
    translation_table = str.maketrans('', '', chars_to_remove)
    # Apply the table with translate() to strip those characters
    return target_str.translate(translation_table)

class LawMgr():
    def __init__(self,laws):
        self.laws = laws # key: law name (法規名稱), value: law object (XMLElement)
        self.law_items ={} # key: law name, value: list of article dicts {'編章節': ..., '條號': ..., '條文內容': ...}
        self.law_related = {} # key: law name, value: {key: article_number, value: list of related_law_names}
        
        for law_name in laws.keys():
            self.law_related[law_name]={}
            law_xml_element = laws[law_name] # This is an XMLElement
            
            # Populate .tags attribute of the XMLElement, including PCode extraction
            law_xml_element.tags['PCode'] = None # Initialize PCode in the XMLElement's tags
            for child_node in law_xml_element.children:
                law_xml_element.tags[child_node.tag] = child_node.text
                if child_node.tag == '法規網址' and child_node.text:
                    pcode = extract_pcode_from_url(child_node.text)
                    if pcode:
                        law_xml_element.tags['PCode'] = pcode
            
            # Process and store articles in self.law_items
            current_chapter_section = None
            law_items_list = []
            法規內容_node = next((c for c in law_xml_element.children if c.tag == '法規內容'), None)
            
            if 法規內容_node:
                for content_child in 法規內容_node.children: 
                    if content_child.tag == '編章節':
                        current_chapter_section = content_child.text
                    elif content_child.tag == '條文':
                        條號_node = next((item for item in content_child.children if item.tag == '條號'), None)
                        條文內容_node = next((item for item in content_child.children if item.tag == '條文內容'), None)
                        
                        條號 = 條號_node.text if 條號_node and 條號_node.text else None
                        條文內容 = 條文內容_node.text if 條文內容_node and 條文內容_node.text else None

                        if 條號:
                            law_items_list.append({
                                "編章節": current_chapter_section,
                                "條號": 條號,
                                "條文內容": 條文內容 if 條文內容 else ""
                            })
            self.law_items[law_name] = law_items_list

        # Rebuild law_related based on the now populated self.law_items
        for law_name in laws.keys():
            self.get_law_related(law_name, saved=True)

    def get_child_txt(self,law_name,tag):
        law = self.laws[law_name]
        for child in law.children:
            if child.tag == tag:
                return child.text
    def find_all(self,node,nodes,tag):
        if node.tag == tag:
            nodes.append(node)
        for child in node.children:
            self.find_all(child,nodes,tag)
    
    def get_law_related(self, law_name, saved=False):
        """Identifies laws mentioned in the articles of the given law_name.
           If saved=True, updates self.law_related[law_name].
           Otherwise, prints findings and returns a dictionary of relations.
        """
        articles_list = self.law_items.get(law_name, [])
        all_known_law_names = self.laws.keys()
        
        relations_found_this_call = {} # For non-saved mode or internal aggregation before saving

        for article_dict in articles_list:
            article_content = article_dict.get('條文內容', '')
            article_number = article_dict.get('條號')

            if not article_number or not article_content:
                continue

            for referred_law_name in all_known_law_names:
                if referred_law_name == law_name: # A law cannot refer to itself in this context
                    continue
                if article_content.find(referred_law_name) != -1:
                    if article_number not in relations_found_this_call:
                        relations_found_this_call[article_number] = []
                    if referred_law_name not in relations_found_this_call[article_number]:
                        relations_found_this_call[article_number].append(referred_law_name)
                        if not saved:
                            print(f"{law_name} - Article '{article_number}' refers to '{referred_law_name}'. Preview: {article_content[:100].strip()}...")
        
        if saved:
            self.law_related[law_name] = relations_found_this_call
        else:
            return relations_found_this_call
                
    def is_law(self,law_name):
        return law_name in self.laws.keys()

    def show_law(self,law_name,format="txt"):
        lines = []
        if not self.is_law(law_name):
            return []
        
        law_data = self.laws[law_name]
        articles_list = self.law_items.get(law_name, [])

        if format=="txt":
            # Display tags from the XMLElement, which were populated in __init__
            for tag_key, tag_value in law_data.tags.items():
                lines.append(f"{tag_key}: {tag_value}")
            lines.append("\n--- Articles ---")
            current_chap_sec_disp = None
            for article_dict in articles_list:
                chapter_section = article_dict.get("編章節")
                if chapter_section and chapter_section != current_chap_sec_disp:
                    lines.append(f"\n編章節: {chapter_section}")
                    current_chap_sec_disp = chapter_section
                article_num = article_dict.get("條號", "N/A")
                article_content = article_dict.get("條文內容", "")
                lines.append(f"{article_num}:\n{article_content}")
        else: # "json" or structured format
            lines.append(law_data.tags) 
            lines.append(articles_list) 
        return lines

    def show_related(self,law_name): # Prints Mermaid syntax; returns the related law names
        ret = []
        if law_name in self.law_related and self.law_related[law_name]:
            rchars = ' （）'
            processed_law_name = remove_chars(law_name, rchars)
            for article_num, related_laws_list in self.law_related[law_name].items():
                disp_article_name = f"{processed_law_name}_{remove_chars(article_num,rchars)}"
                print(f"{processed_law_name} --> {disp_article_name}")
                
                for related_law_name_item in related_laws_list:
                    processed_related_law_name = remove_chars(related_law_name_item, rchars)
                    print(f"{disp_article_name} --> {processed_related_law_name}")
                    if processed_related_law_name not in ret:
                        ret.append(processed_related_law_name)
        return ret
          


## Loading the Mgr (載入 Mgr)
- Loading both law and cmd takes about 13 s


In [None]:
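# There are two datasets: laws (法規) and orders (命令). The formats appear identical so far, and the same code runs on both.
filepath_law ="/Volumes/D2024/data/prj/公文模型/工具/法規/FalV/FalV.xml" # laws
filepath_cmd ="/Volumes/D2024/data/prj/公文模型/工具/命令/MingLing/MingLing.xml" # orders
obj_laws = parse_xml_withobj(filepath_law, filter_key="", law_name=[]) # alternatives: filter_key="class", or law_name=["勞動檢查法"]
#obj_cmds = parse_xml_withobj(filepath_cmd, filter_key="", law_name=[]) #
#lawmgr = LawMgr({**obj_laws,**obj_cmds}) # Wrap in the manager object; load one dataset or merge both as needed
lawmgr = LawMgr({**obj_laws}) # Wrap in the manager object; load one dataset or merge both as needed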
print("LawMgr initialized.")

# --- DB Synchronization Block ---
# Ensure DB utility functions (connect_db, etc.) and synchronize_lawmgr_with_db 
# are defined in preceding cells.

# Attempt to connect to the database
if connect_db(): 
    if 0:
        print("Attempting to synchronize LawMgr data with the database...")
        synchronize_lawmgr_with_db(lawmgr) # Pass the initialized LawMgr instance
    # disconnect_db() # Optional: disconnect after sync. Useful for long-running notebooks.
else:
    print("Failed to connect to the database. DB synchronization will be skipped.")


In [None]:
# --- Import Law Summaries from Text File ---
# 1. Set the 'summary_file_path' variable below to the full path of your summary text file.
#    The file should be formatted with '----- File: [Law Name].txt -----' headers.
# 2. Ensure your database connection parameters are correctly set in the 'PostgreSQL Connection Configuration' cell.
# 3. Run this cell to import the summaries.

summary_file_path = "./data/summary_dir_law.md"  # <--- USER ACTION: UPDATE THIS PATH
#summary_file_path = "./data/summary_dir_cmd.md" # both 7984 total
if 'import_law_summaries' in globals() and callable(import_law_summaries):
    if summary_file_path == "/path/to/your/summaries.txt":
        print("INFO: Please update 'summary_file_path' with the actual path to your summary file before running.")
    else:
        print(f"Attempting to import summaries from: {summary_file_path}")
        # connect_db() # Ensure connection is attempted before calling, or rely on function's internal call
        import_law_summaries(summary_file_path)
        # disconnect_db() # Optional: uncomment if you want to disconnect after this operation
else:
    print("Error: The function 'import_law_summaries' is not defined. Please ensure the cell defining it has been run.")
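
The summary file layout described above (`----- File: [Law Name].txt -----` headers followed by free text) can be split with a small state machine. The sketch below is an assumption about how such a file might be read, not the body of `import_law_summaries`; `split_summaries` and the sample lines are hypothetical.

```python
import re

# Header line that starts a new summary; the law name is the captured group
HEADER = re.compile(r"^----- File: (.*)\.txt -----\s*$")

def split_summaries(lines):
    """Return {law_name: summary_text} from an iterable of text lines."""
    summaries = {}
    name, buffer = None, []
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A new header closes the previous summary, if there was one
            if name is not None and buffer:
                summaries[name] = "".join(buffer).strip()
            name, buffer = match.group(1), []
        elif name is not None:
            buffer.append(line)
    # Flush the last summary in the file
    if name is not None and buffer:
        summaries[name] = "".join(buffer).strip()
    return summaries

sample_lines = [
    "----- File: 民法.txt -----\n",
    "Summary of the first law.\n",
    "\n",
    "----- File: 公司法.txt -----\n",
    "Summary of the second law,\n",
    "continued on a second line.\n",
]
result = split_summaries(sample_lines)
print(result)
```

Text that appears before the first header is ignored, because no law name is known at that point.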

In [None]:
# --- Import Law Keywords from CSV File ---
# 1. Set the 'keyword_csv_path' variable below to the full path of your keyword CSV file.
#    The CSV should have 'filename' (e.g., 'Law Name.txt') and 'keyword' columns.
# 2. Ensure your database connection parameters are correctly set in the 'PostgreSQL Connection Configuration' cell.
# 3. Run this cell to import the keywords.

keyword_csv_path = "./data/keywords_law.csv"  # <--- USER ACTION: UPDATE THIS PATH
#keyword_csv_path = "./data/keywords_cmd.csv" # both 7423 total

if 'import_law_keywords' in globals() and callable(import_law_keywords):
    if keyword_csv_path == "/path/to/your/keywords.csv":
        print("INFO: Please update 'keyword_csv_path' with the actual path to your keyword CSV file before running.")
    else:
        print(f"Attempting to import keywords from: {keyword_csv_path}")
        # connect_db() # Ensure connection is attempted, or rely on function's internal call
        import_law_keywords(keyword_csv_path)
        # disconnect_db() # Optional: uncomment if you want to disconnect after this operation
else:
    print("Error: The function 'import_law_keywords' is not defined. Please ensure the cell defining it has been run.")
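
For the keyword CSV, one keyword per row can be grouped into a single `|`-separated string per law. This is a sketch using only the standard library, not the notebook's `import_law_keywords`; `group_keywords` is a hypothetical helper, and the keyword column name is a parameter because the cell comment above calls it `keyword` while the real file may differ.

```python
import csv
import io
from collections import defaultdict

def group_keywords(csv_text, keyword_column="keyword"):
    """Return {law_name: 'kw1|kw2|...'} from CSV text with a 'filename' column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        law_name = row["filename"]
        if law_name.endswith(".txt"):
            law_name = law_name[: -len(".txt")]
        keyword = (row.get(keyword_column) or "").strip()
        # Skip empty cells and repeated keywords
        if keyword and keyword not in grouped[law_name]:
            grouped[law_name].append(keyword)
    return {name: "|".join(words) for name, words in grouped.items()}

# Hypothetical sample rows
sample_csv = (
    "filename,keyword\n"
    "民法.txt,契約\n"
    "民法.txt,物權\n"
    "公司法.txt,股東\n"
    "民法.txt,\n"
)
grouped = group_keywords(sample_csv)
print(grouped)
```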

## Usage examples: laws and orders
- The two datasets share the same format; one holds laws (法規), the other orders (命令)
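
Because both datasets parse into dicts keyed by name, they can be merged with `{**obj_laws, **obj_cmds}`. In such a merge, the later dict silently wins when a name occurs in both. The sketch below reports such collisions first; `merge_law_dicts` and the sample entries are hypothetical, and the duplicate `民法` entry exists only to show the behavior.

```python
def merge_law_dicts(laws, orders):
    """Merge two name-keyed dicts and report names present in both."""
    collisions = sorted(set(laws) & set(orders))
    merged = {**laws, **orders}  # entries from `orders` overwrite same-named laws
    return merged, collisions

# Placeholder values stand in for the parsed XMLElement objects
laws = {"勞動基準法": "law-object", "民法": "law-object"}
orders = {"勞動基準法施行細則": "order-object", "民法": "order-object"}

merged, collisions = merge_law_dicts(laws, orders)
print(len(merged), collisions)
```

An empty `collisions` list means the merge lost nothing.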

In [None]:
# Examples: law: 勞動檢查法, cmd: 勞動基準法施行細則


# Show the number of laws
#print(len(lawmgr.laws))

# Show one law, including its articles
#print(lawmgr.show_law('勞動基準法施行細則',format="txt"))

# Show one field of a law
#print(lawmgr.get_child_txt('勞動基準法','法規網址'))

# Show a law's basic fields
#print(lawmgr.laws['勞動基準法施行細則'].tags)

# List all laws
#print(lawmgr.laws.keys())

# Show the laws related to a given law
#print(lawmgr.law_related['就業保險法'])
if 0: # Count laws by 法規性質; observed result: {'憲法': 9, '法律': 1317, '命令': 10225}
    kind = {}
    for law in lawmgr.laws.keys(): # every law
        法規性質=lawmgr.laws[law].tags['法規性質']
        if 法規性質 not in kind.keys():
            kind[法規性質]=0
        kind[法規性質]+=1
    print(kind)
if 0: # Show relations expanded two levels (Mermaid syntax); Mermaid has a 500-line limit
    cnt = 0 
    law_list = ['貪污治罪條例']
    print("graph TD")
    for law in law_list:
        ret1 = lawmgr.show_related(law)
        #print(f"ret={ret1}")

    for law in ret1:
        ret2 = lawmgr.show_related(law)
        #print(f"ret={ret2}")    
if 0: # Show the articles that link to a given law
    rchars = ' （）'
    law_list = ['貪污治罪條例']  
    rlaws = {}
    print("graph TD")
    for law in lawmgr.laws.keys(): # every law
        for k in lawmgr.law_related[law]: # every article that has references
            for v1 in lawmgr.law_related[law][k]: # every referenced law
                if v1 in law_list: # the target law
                    
                    v1 = remove_chars(v1,rchars)
                    law_d= remove_chars(law,rchars)
                    rlaws[law_d]=v1
                    disp_name = f"{k}"
                    disp_name = remove_chars(disp_name,rchars)
                    print(f"{disp_name}-->{law_d}")
    for law in rlaws.keys():
        print(f"{law}-->{rlaws[law]}")    
                
if 0:  # Show one article of a law
    law_name = "就業保險法"
    item_cnt = "第 2 條"
    # law_items[law_name] is a list of article dicts, so look the article up by its 條號
    matches = [a['條文內容'] for a in lawmgr.law_items[law_name] if a['條號'] == item_cnt]
    print(f"{law_name} {item_cnt} :\n{matches[0] if matches else '(not found)'}")
if 0:
    #law_name = "消費者保護法"
    law_names = ['勞動基準法', '民法典', '消費者保護法','公司法','環境保護法','道路交通安全法', '著作權法', '勞動合同法','食品安全法','稅收征收管理法']
    for law_name in law_names:
        print(f"{law_name} is law?{lawmgr.is_law(law_name)}") 
if 0:
    for law in lawmgr.laws.keys():
        print(law)

if 1: # Check whether each law in the list exists (prints line and character counts)
    law_str="中華民國憲法、民法、中華民國刑法、行政程序法、民事訴訟法、刑事訴訟法、行政訴訟法、公司法、勞動基準法、社會福利基本法、地方制度法、國家安全法、公平交易法、稅捐稽徵法、著作權法、個人資料保護法、消費者保護法、環境基本法"
    cols = law_str.split("、")
    for col in cols:
        #print(f"{col}:{lawmgr.is_law(col)}")
        #lawmgr.is_law(col)
        lines = lawmgr.show_law(col)
        chars ="\n".join(lines)
        print(f"{col}:{len(lines)},{len(chars)}")
    
        



### Check Database Connection Status
This cell checks the current status of the database connection using the `check_db_connection()` function. Ensure that the database connection parameters are set correctly in the configuration cell and that `connect_db()` has been run if you intend to be connected.

In [None]:
# Check current database connection status
print("Checking database connection...")
if 'check_db_connection' in globals() and callable(check_db_connection):
    check_db_connection()
else:
    print("Error: check_db_connection function not found or not callable. Ensure previous cells defining it are run.")

### Manual Full XML Import/Update to DB
This cell allows for a full import and synchronization of laws and their articles from a specified XML file into the database. 

1.  **Edit `xml_filepath_manual_full`**: Set this variable to the full path of the XML file you want to process (e.g., `FalV.xml` or `MingLing.xml`).
2.  **Run the Cell**: It will parse the XML, initialize a temporary `LawMgr` with its content, connect to the database, and then synchronize all data from the XML with the database using `synchronize_lawmgr_with_db`.

In [None]:
# Define the path to the XML file for manual full import
xml_filepath_manual_full = "/Volumes/D2024/data/prj/公文模型/工具/法規/FalV/FalV.xml" # USER ACTION: Update this path as needed
print(f"Preparing for full manual import/update from: {xml_filepath_manual_full}")

if 'parse_xml_withobj' not in globals() or 'LawMgr' not in globals() or \
   'connect_db' not in globals() or 'synchronize_lawmgr_with_db' not in globals():
    print("Error: One or more required functions (parse_xml_withobj, LawMgr, connect_db, synchronize_lawmgr_with_db) not found.")
    print("Please ensure all preceding cells, especially class/function definitions, have been run.")
else:
    try:
        print(f"Parsing XML file: {xml_filepath_manual_full}...")
        manual_obj_laws = parse_xml_withobj(xml_filepath_manual_full, filter_key="", law_name=[])
        if not manual_obj_laws:
            print("No laws parsed from the XML file. Aborting.")
        else:
            print(f"Parsed {len(manual_obj_laws)} laws. Initializing temporary LawMgr...")
            manual_lawmgr = LawMgr(manual_obj_laws)
            print("Temporary LawMgr initialized.")
            
            if connect_db():
                print("Connected to DB. Starting synchronization for the manually loaded XML...")
                synchronize_lawmgr_with_db(manual_lawmgr)
                print("Manual full XML import/update process complete.")
                # disconnect_db() # Optional: Disconnect after operation if desired
            else:
                print("Failed to connect to DB. Manual import/update aborted.")
    except FileNotFoundError:
        print(f"Error: XML file not found at {xml_filepath_manual_full}")
    except Exception as e:
        print(f"An error occurred during manual full import/update: {e}")

### Manual Selective Law Import/Update to DB
This cell allows for a selective import or update of specific laws (and their articles) from an XML file into the database. 

1.  **Edit `xml_filepath_manual_selective`**: Set this to the path of the source XML file.
2.  **Edit `target_laws_identifiers`**: Provide a Python list of law identifiers (PCode or Law Name).
3.  **Edit `identifier_type`**: Specify if the identifiers are "PCODE", "NAME", or "PCODE_OR_NAME".
4.  **Run the Cell**: It parses the entire XML to build context, then filters for the specified laws, creates a temporary `LawMgr` for them, and synchronizes only these selected laws with the database.
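
The matching rule for the three identifier types can be isolated in one small function. This is a sketch: `matches_identifier` is a hypothetical name, and apart from `A0000001` (the constitution's PCode shown in the XML sample) the PCodes below are placeholders.

```python
def matches_identifier(pcode, law_name, targets, identifier_type):
    """Return True if a law matches the target list under the given mode."""
    targets = set(targets)
    if identifier_type == "PCODE":
        return pcode in targets
    if identifier_type == "NAME":
        return law_name in targets
    if identifier_type == "PCODE_OR_NAME":
        return pcode in targets or law_name in targets
    # Unlike a silent no-match, fail loudly on a misspelled mode
    raise ValueError(f"Unknown identifier_type: {identifier_type!r}")

# name -> PCode; the X... codes are placeholders, not real PCodes
catalog = {
    "中華民國憲法": "A0000001",
    "民法": "X0000001",
    "公司法": "X0000002",
}
targets = ["A0000001", "民法"]

selected = [
    name for name, pcode in catalog.items()
    if matches_identifier(pcode, name, targets, "PCODE_OR_NAME")
]
print(selected)
```

Raising on an unknown mode is a design choice of this sketch; it surfaces typos such as `"PCODE_NAME"` immediately instead of reporting that no laws matched.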

In [None]:
# Define parameters for selective manual import
xml_filepath_manual_selective = "/Volumes/D2024/data/prj/公文模型/工具/法規/FalV/FalV.xml" # USER ACTION: Update this path
target_laws_identifiers = ["A0000001", "民法"] # USER ACTION: List of PCodes or Law Names
identifier_type = "PCODE_OR_NAME"  # USER ACTION: Choose "PCODE", "NAME", or "PCODE_OR_NAME"

print(f"Preparing for selective manual import/update from: {xml_filepath_manual_selective}")
print(f"Target laws/pcodes: {target_laws_identifiers} (type: {identifier_type})")

if 'parse_xml_withobj' not in globals() or 'LawMgr' not in globals() or \
   'connect_db' not in globals() or 'synchronize_lawmgr_with_db' not in globals() or \
   'extract_pcode_from_url' not in globals(): # extract_pcode_from_url is used by LawMgr
    print("Error: One or more required functions not found. Please ensure all preceding cells are run.")
else:
    try:
        print("Parsing the entire XML to create a comprehensive LawMgr for context...")
        full_obj_laws = parse_xml_withobj(xml_filepath_manual_selective, filter_key="", law_name=[])
        
        if not full_obj_laws:
            print("No laws parsed from the XML. Aborting selective update.")
        else:
            print(f"Parsed {len(full_obj_laws)} total laws. Creating full LawMgr instance for selection...")
            # LawMgr is needed to correctly extract PCode and structure articles via its __init__
            full_lawmgr_for_selection = LawMgr(full_obj_laws)
            print("Full LawMgr for selection created.")

            selected_law_elements = {}
            for law_name_from_mgr, law_xml_element in full_lawmgr_for_selection.laws.items():
                pcode_from_mgr = law_xml_element.tags.get('PCode') # PCode is in tags after LawMgr init
                
                match = False
                if identifier_type == "PCODE":
                    if pcode_from_mgr in target_laws_identifiers:
                        match = True
                elif identifier_type == "NAME":
                    if law_name_from_mgr in target_laws_identifiers:
                        match = True
                elif identifier_type == "PCODE_OR_NAME":
                    if pcode_from_mgr in target_laws_identifiers or law_name_from_mgr in target_laws_identifiers:
                        match = True
                
                if match:
                    selected_law_elements[law_name_from_mgr] = law_xml_element
            
            if not selected_law_elements:
                print(f"No laws matched the specified identifiers: {target_laws_identifiers}. Aborting.")
            else:
                print(f"Found {len(selected_law_elements)} matching laws. Creating temporary LawMgr for these selected laws...")
                # Create a new LawMgr instance containing only the selected XMLElements.
                # LawMgr's __init__ will correctly process these to populate its internal structures like .tags and .law_items.
                selective_lawmgr = LawMgr(selected_law_elements)
                print(f"Temporary LawMgr for {len(selective_lawmgr.laws)} selected laws initialized.")

                if connect_db():
                    print("Connected to DB. Starting synchronization for the selected laws...")
                    synchronize_lawmgr_with_db(selective_lawmgr) # Pass the LawMgr with only selected laws
                    print("Manual selective law import/update process complete.")
                    # disconnect_db() # Optional
                else:
                    print("Failed to connect to DB. Manual selective import/update aborted.")
    except FileNotFoundError:
        print(f"Error: XML file not found at {xml_filepath_manual_selective}")
    except Exception as e:
        print(f"An error occurred during manual selective import/update: {e}")

### List All Laws in Database
This cell connects to the database and retrieves a list of all laws currently stored in the `laws` table, displaying their PCode, Name, Category, and Last Changed Date.
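
A display note: format specs such as `{:<50}` pad by character count, while CJK characters usually occupy two columns in a monospace font, so columns containing law names can drift. The sketch below pads by display width instead. It assumes a font where East Asian wide and fullwidth characters take two columns; `display_width` and `pad_display` are hypothetical helpers, and the second sample row is a placeholder.

```python
import unicodedata

def display_width(text):
    """Approximate terminal width: wide/fullwidth characters count as 2 columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

def pad_display(text, width):
    """Left-align text by padding with spaces up to the given display width."""
    return text + " " * max(0, width - display_width(text))

rows = [("A0000001", "中華民國憲法"), ("X0000001", "Sample Act")]
for pcode, name in rows:
    print(f"{pcode:<12} | {pad_display(name, 20)} | end")
```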

In [None]:
# List all laws from the database
print("Listing all laws from the database...")
if 'connect_db' in globals() and callable(connect_db) and connect_db():
    try:
        db_cursor.execute("SELECT pcode, xml_law_name, xml_law_category, xml_latest_change_date FROM laws ORDER BY pcode;")
        all_laws_from_db = db_cursor.fetchall()
        if all_laws_from_db:
            print(f"Found {len(all_laws_from_db)} laws in the database:")
            print("{:<12} | {:<50} | {:<30} | {:<15}".format("PCode", "Law Name", "Category", "Last Changed"))
            print("-" * 110)
            for law_row in all_laws_from_db:
                pcode, name, category, last_changed = law_row
                last_changed_str = last_changed.strftime('%Y-%m-%d') if last_changed else 'N/A'
                print("{:<12} | {:<50} | {:<30} | {:<15}".format(pcode, name if name else 'N/A', category if category else 'N/A', last_changed_str))
        else:
            print("No laws found in the database.")
    except Exception as e:
        print(f"Error querying laws from DB: {e}")
    # finally:
    #     disconnect_db() # Optional: disconnect after query
else:
    print("Database not connected. Cannot query laws.")

### Get Specific Law Details from Database (by PCode)
This cell retrieves and displays detailed information for a specific law (identified by its PCode) from the `laws` table, along with all its associated articles from the `articles` table.

1.  **Edit `target_pcode_db_query`**: Set this variable to the PCode of the law you want to query.

In [None]:
# Get specific law details from DB by PCode
target_pcode_db_query = "A0030057"  # USER ACTION: Update this PCode as needed
print(f"Querying details for PCode '{target_pcode_db_query}' from database...")

if 'connect_db' in globals() and callable(connect_db) and connect_db():
    try:
        # Fetch law details
        db_cursor.execute("SELECT * FROM laws WHERE pcode = %s;", (target_pcode_db_query,))
        law_record = db_cursor.fetchone()
        if law_record:
            law_colnames = [desc[0] for desc in db_cursor.description]
            law_dict = dict(zip(law_colnames, law_record))
            print("\n--- Law Details ---")
            for col, val in law_dict.items():
                print(f"{col}: {val}")
            
            law_db_id = law_dict.get('id')
            if law_db_id:
                # Fetch articles
                db_cursor.execute("SELECT xml_chapter_section, xml_article_number, xml_article_content FROM articles WHERE law_id = %s ORDER BY id;", (law_db_id,))
                articles_records = db_cursor.fetchall()
                print("\n--- Articles ---")
                if articles_records:
                    for art_row in articles_records:
                        print(f"Chapter/Section: {art_row[0] if art_row[0] else '(N/A)'}")
                        print(f"Article Number: {art_row[1]}")
                        print(f"Content:\n{art_row[2]}\n")
                else:
                    print("No articles found for this law in the database.")
        else:
            print(f"No law found with PCode '{target_pcode_db_query}' in the database.")
    except Exception as e:
        print(f"Error querying law details from DB: {e}")
    # finally:
    #     disconnect_db() # Optional
else:
    print("Database not connected. Cannot query law details.")

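### Rebuilding Specific Laws in the DB

This feature lets developers rebuild the DB content of specific laws. Use it when the parsing logic, summaries, or keywords for a law have been updated.

The rebuild proceeds as follows:
1.  **Specify the laws**: list the names of the laws to rebuild in `laws_to_rebuild`.
2.  **Delete the old data**:
    *   The program first deletes the records in `legal_concepts` related to the law.
    *   It then deletes the main record in `laws`, which triggers the database's cascading delete (ON DELETE CASCADE) and automatically removes the related rows in `articles`, `law_hierarchy_relationships`, and `law_relationships`.
3.  **Re-import the data**:
    *   The law's basic data and articles are re-imported from LawMgr.
    *   The matching summary and keywords are read from the specified files and written to the `laws` table.
4.  **Transaction handling**: the delete and re-import for each law run inside one database transaction, which keeps the data consistent. If an error occurs partway through, all changes for that law are rolled back.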
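
The delete, cascade, re-import, and rollback sequence can be tried without a PostgreSQL server. The sketch below uses the standard-library `sqlite3` module as a stand-in, so it illustrates the transaction pattern only; the two-table schema, the `rebuild` helper, and the sample rows are simplified assumptions and do not reproduce `law_meta_db_v0.2.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces ON DELETE CASCADE only when this is on
conn.execute("CREATE TABLE laws (id INTEGER PRIMARY KEY, pcode TEXT UNIQUE, name TEXT)")
conn.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " law_id INTEGER REFERENCES laws(id) ON DELETE CASCADE,"
    " number TEXT, content TEXT)"
)
conn.execute("INSERT INTO laws (id, pcode, name) VALUES (1, 'X0000001', 'Sample Act')")
conn.execute("INSERT INTO articles (law_id, number, content) VALUES (1, '第 1 條', 'old text')")
conn.commit()

def rebuild(conn, pcode, name, articles, fail=False):
    """Delete a law (cascading to its articles) and re-insert it in one transaction."""
    try:
        conn.execute("DELETE FROM laws WHERE pcode = ?", (pcode,))
        cursor = conn.execute("INSERT INTO laws (pcode, name) VALUES (?, ?)", (pcode, name))
        new_id = cursor.lastrowid
        if fail:
            raise RuntimeError("simulated failure midway")
        conn.executemany(
            "INSERT INTO articles (law_id, number, content) VALUES (?, ?, ?)",
            [(new_id, number, content) for number, content in articles],
        )
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # restores the deleted law and its cascaded articles
        return False

# A failed rebuild leaves the old data untouched
print(rebuild(conn, "X0000001", "Sample Act", [("第 1 條", "new text")], fail=True))
print(conn.execute("SELECT content FROM articles").fetchall())

# A successful rebuild replaces it
print(rebuild(conn, "X0000001", "Sample Act", [("第 1 條", "new text"), ("第 2 條", "more text")]))
print(conn.execute("SELECT number, content FROM articles ORDER BY id").fetchall())
```

Committing once per law means a failure in one law does not undo laws that were already rebuilt.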


In [None]:
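import json
import csv
import os
import re
import pandas as pd
import psycopg2.errors  # rebuild_laws_in_db catches psycopg2.errors.UndefinedTable

def rebuild_laws_in_db(law_names_to_rebuild, lawmgr_instance, summary_filepath, keyword_csv_filepath):
    """
    Rebuild all database content for the given laws.
    Deletes the existing data, then re-imports from LawMgr and the given summary/keyword files.
    Each law is processed in its own database transaction.
    """
    global db_connection, db_cursor # use the global DB connection defined in the notebook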

    if not check_db_connection():
        print("Database is not connected. Aborting rebuild.")
        return

    # 1. Preload all summaries
    all_summaries = {}
    if os.path.exists(summary_filepath):
        print(f"Loading summaries from {summary_filepath}...")
        try:
            with open(summary_filepath, 'r', encoding='utf-8') as f:
                law_name_from_summary = None
                current_summary_lines = []
                for line in f:
                    match = re.match(r"----- File: (.*)\.txt -----", line)
                    if match:
                        if law_name_from_summary and current_summary_lines:
                            all_summaries[law_name_from_summary] = "".join(current_summary_lines).strip()
                        law_name_from_summary = match.group(1)
                        current_summary_lines = []
                    elif law_name_from_summary:
                        current_summary_lines.append(line)
                if law_name_from_summary and current_summary_lines:
                    all_summaries[law_name_from_summary] = "".join(current_summary_lines).strip()
            print(f"Loaded {len(all_summaries)} summaries.")
        except Exception as e:
            print(f"Error while loading the summary file: {e}")
            return

    # 2. Preload all keywords
    all_keywords = {}
    if os.path.exists(keyword_csv_filepath):
        print(f"Loading keywords from {keyword_csv_filepath}...")
        try:
            df = pd.read_csv(keyword_csv_filepath)
            df['law_name'] = df['filename'].apply(lambda x: re.sub(r'\.txt$', '', str(x)))
            # Drop missing values before casting to str; otherwise NaN becomes the literal 'nan'
            keywords_by_law = df.groupby('law_name')['keywords'].apply(lambda x: '|'.join(x.dropna().astype(str)))
            all_keywords = keywords_by_law.to_dict()
            
            print(f"Loaded keywords for {len(all_keywords)} laws.")
        except Exception as e:
            print(f"Error while loading the keyword file: {e}")
            return
    #return all_keywords
    # 3. Process each law to rebuild
    for law_name in law_names_to_rebuild:
        print(f"\n--- Rebuilding: {law_name} ---")
        
        law_obj = lawmgr_instance.laws.get(law_name)
        if not law_obj:
            print(f"Error: law '{law_name}' not found in LawMgr. Skipping.")
            continue
        
        pcode = law_obj.tags.get('PCode')
        if not pcode:
            print(f"Error: no PCode found for '{law_name}' in LawMgr. Skipping.")
            continue

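        try:
            # a. Look up the existing law_id so it can be deleted
            db_cursor.execute("SELECT id FROM laws WHERE pcode = %s", (pcode,))
            result = db_cursor.fetchone()
            
            if result:
                law_id_to_delete = result[0]
                print(f"  - Found existing law in the database, ID: {law_id_to_delete}. Deleting...")
                # Delete rows from legal_concepts. A failed statement aborts the whole
                # PostgreSQL transaction, so guard this optional step with a savepoint
                # (assumes autocommit is off, as the commit/rollback below implies).
                db_cursor.execute("SAVEPOINT before_legal_concepts")
                try:
                    db_cursor.execute("DELETE FROM legal_concepts WHERE law_id = %s", (law_id_to_delete,))
                    print(f"  - Deleted {db_cursor.rowcount} rows from legal_concepts.")
                except psycopg2.errors.UndefinedTable:
                    db_cursor.execute("ROLLBACK TO SAVEPOINT before_legal_concepts")
                    print("  - Table 'legal_concepts' does not exist; skipping.")
                
                # Delete the row from laws (cascades to articles and related tables)
                db_cursor.execute("DELETE FROM laws WHERE id = %s", (law_id_to_delete,))
                print(f"  - Deleted {db_cursor.rowcount} laws row(s) (cascading).")
            else:
                print(f"  - Law '{law_name}' (PCode: {pcode}) not found in the database. Inserting as new.")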

            # b. Re-import the law using the existing global helper
            if not insert_law_to_db(law_obj.tags):
                raise Exception("Failed to re-import the law record (insert_law_to_db).")
            
            # c. Fetch the new law's database ID
            new_law_db_record = get_law_from_db(pcode)
            if not new_law_db_record:
                raise Exception("Could not find the newly imported law record to read its ID.")
            new_law_id = new_law_db_record['id']
            print(f"  - Law re-imported; new database ID: {new_law_id}.")

            # d. Re-import the articles
            articles_list = lawmgr_instance.law_items.get(law_name, [])
            if articles_list:
                success, count = insert_articles_to_db(new_law_id, articles_list)
                if not success:
                    raise Exception("Failed to re-import articles (insert_articles_to_db).")
                print(f"  - Re-imported {count} articles.")

            # e. Update the summary and keywords
            summary_text = all_summaries.get(law_name)
            keywords_text = all_keywords.get(law_name)
            
            if summary_text or keywords_text:
                update_parts = []
                params = []
                if summary_text:
                    update_parts.append("llm_summary = %s")
                    params.append(summary_text)
                if keywords_text:
                    update_parts.append("llm_keywords = %s")
                    params.append(keywords_text)
                #print(f"update_parts={update_parts}, params={params}")
                
                params.append(new_law_id)
                update_sql = f"UPDATE laws SET {', '.join(update_parts)} WHERE id = %s"
                db_cursor.execute(update_sql, tuple(params))
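                print("  - Updated summary/keywords.")

            # All steps for this law succeeded, so commit the transaction
            db_connection.commit()
            print(f"--- Rebuilt successfully: {law_name} ---")

        except Exception as e:
            print(f"Error while rebuilding '{law_name}': {e}")
            print("Rolling back the changes for this law.")
            db_connection.rollback()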


In [None]:
# --- Rebuild Specific Laws in DB ---
# 1. Define the list of law names you want to rebuild.
#    These names must match the '法規名稱' in the XML and LawMgr.
laws_to_rebuild = [
    "預算法"
    #"政府採購法"
]

# 2. Specify the paths to your summary and keyword files.
#    These should be the same paths used in the import cells above.
summary_file_for_rebuild = "./data/summary_dir_law.md"
keyword_file_for_rebuild = "./data/keywords_law.csv"

# 3. Ensure the LawMgr instance ('lawmgr') is initialized and contains the laws you want to rebuild.
# 4. Run the function.

print(f"Starting database rebuild for {len(laws_to_rebuild)} laws: {laws_to_rebuild}")

if 'lawmgr' in globals():
    if connect_db():
        try:
            rebuild_laws_in_db(
                law_names_to_rebuild=laws_to_rebuild,
                lawmgr_instance=lawmgr,
                summary_filepath=summary_file_for_rebuild,
                keyword_csv_filepath=keyword_file_for_rebuild
            )
        finally:
            # You can choose to disconnect here or leave the connection open
            # disconnect_db() 
            print("Rebuild script finished. Connection left open.")
    else:
        print("Could not connect to database. Rebuild aborted.")
else:
    print("Error: `lawmgr` is not defined. Please initialize it by running the cells under '載入 Mgr' first.")
