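# 📘 Python List and Pandas Column-Selection Techniques
This notebook collects Python list techniques and pandas column-selection tricks that come up often in feature engineering and data analysis. They are useful for data cleaning, feature construction, and preparing model inputs.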

## 1️⃣ List concatenation

In [None]:
quantitative = ['GrLivArea', 'OverallQual']
all_features = quantitative + ['SalePrice']
print(all_features)  # ['GrLivArea', 'OverallQual', 'SalePrice']
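Besides `+`, Python offers a couple of other ways to combine column-name lists. The sketch below compares them on the same two lists. The names `target`, `unpacked`, and `extended` are illustrative only. The key difference is that `+` and `*`-unpacking build a new list, while `list.extend` mutates the list it is called on.

```python
# Three ways to combine column-name lists; `target` is an illustrative name.
quantitative = ['GrLivArea', 'OverallQual']
target = ['SalePrice']

# 1) `+` builds a brand-new list; both inputs stay unchanged.
all_features = quantitative + target

# 2) Unpacking with * also builds a new list and accepts any iterables.
unpacked = [*quantitative, *target]

# 3) list.extend mutates in place (and returns None),
#    so extend a copy if the original list must be preserved.
extended = quantitative.copy()
extended.extend(target)

print(all_features)  # ['GrLivArea', 'OverallQual', 'SalePrice']
print(quantitative)  # ['GrLivArea', 'OverallQual']  (unchanged)
```

Calling `quantitative.extend(target)` directly would silently change `quantitative` for every later cell that uses it. That is why the sketch extends a copy.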

## 2️⃣ List comprehension: selecting columns by dtype

In [None]:
import pandas as pd
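train = pd.read_csv('train.csv')  # sample data; supply your own file
# is_numeric_dtype keeps int/float (and bool) columns; a plain `!= 'object'` test would also keep datetime columns
numeric_feats = [col for col in train.columns if pd.api.types.is_numeric_dtype(train[col])]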
print(numeric_feats[:5])
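"Not object" and "numeric" are not the same filter. The sketch below uses a small hypothetical DataFrame (column names borrowed from the House Prices data, values made up) to compare three dtype-based selections. The text column is created with an explicit `object` dtype so the comparison does not depend on the pandas version's default string dtype.

```python
import pandas as pd

# Hypothetical toy frame standing in for train.csv.
df = pd.DataFrame({
    'LotArea': [8450, 9600],                                    # int64
    'LotFrontage': [65.0, None],                                # float64 with NaN
    'Street': pd.Series(['Pave', 'Grvl'], dtype=object),        # explicit object dtype
    'HasPool': [True, False],                                   # bool
    'SoldDate': pd.to_datetime(['2008-02-01', '2007-05-01']),   # datetime64
})

# "Not object" also keeps the bool and datetime columns.
not_object = [c for c in df.columns if df[c].dtype != 'object']

# is_numeric_dtype counts bool as numeric but rejects datetime.
numeric = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]

# select_dtypes(include='number') keeps only int/float here (bool and datetime excluded).
numeric_only = df.select_dtypes(include='number').columns.tolist()

print(not_object)
print(numeric)
print(numeric_only)
```

Which filter fits depends on the downstream model. If boolean flags should be treated as features, `is_numeric_dtype` is a reasonable default. `select_dtypes` is the more compact option when only true numeric columns are wanted.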

## 3️⃣ Matching column names with regular expressions

In [None]:
import re
# re.match only matches at the start of the name, so this keeps columns that begin with "Garage"
garage_cols = [col for col in train.columns if re.match(r'Garage.*', col)]
print(garage_cols)
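`re.match` is anchored at the start of the string, while `re.search` scans the whole string. This difference decides which columns a pattern like `Garage` picks up. The sketch below uses a hypothetical list of column names; `HasGarage` is invented to show a mid-string hit.

```python
import re

# Hypothetical column names; 'HasGarage' contains "Garage" but not at the start.
cols = ['GarageType', 'GarageCars', 'LotArea', 'HasGarage']

# re.match only tries position 0, so it behaves like the pattern ^Garage.
starts = [c for c in cols if re.match(r'Garage', c)]

# re.search looks anywhere in the string, so it also finds 'HasGarage'.
anywhere = [c for c in cols if re.search(r'Garage', c)]

# Precompiling gives the same result and reads well when the pattern is reused.
pattern = re.compile(r'Garage')
compiled = [c for c in cols if pattern.match(c)]

print(starts)    # ['GarageType', 'GarageCars']
print(anywhere)  # ['GarageType', 'GarageCars', 'HasGarage']
```

Because `re.match` is already anchored, the trailing `.*` in `r'Garage.*'` does not change which columns are selected.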

## 4️⃣ Matching column names with `filter`

In [None]:
# filter(like=...) keeps columns whose name contains 'Area' (case-sensitive) and returns a DataFrame
area_df = train.filter(like='Area', axis=1)
print(area_df.columns.tolist())
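`DataFrame.filter` accepts either `like=` (a plain substring test) or `regex=` (`re.search` on each label). The sketch below contrasts the two on a hypothetical one-row frame. The lowercase `area_code` column is invented to show case sensitivity. It also shows a boolean mask on the column `Index`, which is handy when only the names are needed.

```python
import pandas as pd

# Hypothetical toy frame; 'area_code' is invented to show case sensitivity.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=['GrLivArea', 'LotArea', 'area_code', 'OverallQual'],
)

# like= is a case-sensitive substring test, so 'area_code' is not matched.
like_cols = df.filter(like='Area', axis=1).columns.tolist()

# regex= applies re.search to each label; the (?i) flag ignores case.
regex_cols = df.filter(regex=r'(?i)area', axis=1).columns.tolist()

# filter returns a DataFrame; a boolean mask on the Index yields just the names.
mask_cols = df.columns[df.columns.str.contains('Area')].tolist()

print(like_cols)   # ['GrLivArea', 'LotArea']
print(regex_cols)  # ['GrLivArea', 'LotArea', 'area_code']
```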

## 5️⃣ Building new features with zip and tuple unpacking

In [None]:
pairs = list(zip(['YearBuilt', 'GarageYrBlt'], ['OverallQual', 'OverallCond']))  # element-wise column pairs
for col1, col2 in pairs:
    # Product interaction feature; NaNs (e.g. a missing GarageYrBlt) propagate to the result
    train[f'{col1}_{col2}'] = train[col1] * train[col2]
train[[f'{col1}_{col2}' for col1, col2 in pairs]].head()
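Two details of `zip` matter when it drives feature construction. First, it pairs items by position, not as a cross product. Second, it stops at the shorter input without warning. The sketch below shows both on a hypothetical two-row frame (values made up) and also shows that a missing `GarageYrBlt` turns the product into `NaN`.

```python
import pandas as pd

# Hypothetical toy frame; the second house has no garage year (NaN).
df = pd.DataFrame({
    'YearBuilt': [2003, 1976],
    'GarageYrBlt': [2003.0, None],
    'OverallQual': [7, 6],
    'OverallCond': [5, 8],
})

left = ['YearBuilt', 'GarageYrBlt']
right = ['OverallQual', 'OverallCond']

# zip pairs position i with position i; it does not form every combination.
pairs = list(zip(left, right))

# zip stops at the shorter input, so the extra name 'LotArea' is silently dropped.
truncated = list(zip(left + ['LotArea'], right))

new_cols = []
for col1, col2 in pairs:          # tuple unpacking of each pair
    name = f'{col1}_{col2}'
    df[name] = df[col1] * df[col2]  # NaN * number -> NaN
    new_cols.append(name)

print(df[new_cols])
```

On Python 3.10 and later, `zip(left, right, strict=True)` raises a `ValueError` on a length mismatch instead of truncating. This can catch a typo in one of the name lists.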

## 6️⃣ Pandas Index operations: intersection and difference

In [None]:
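test = pd.read_csv('test.csv')  # sample data; supply your own file
shared = train.columns.intersection(test.columns)    # columns present in both
train_only = train.columns.difference(test.columns)  # columns in train but not in test
print('Shared columns:', list(shared)[:5])
print('Train-only columns:', list(train_only)[:5])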
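`difference` is one-sided. To see mismatches in both directions, `symmetric_difference` collects names that appear in exactly one of the two indexes. The sketch below uses hypothetical empty frames named `train_demo` and `test_demo` so it does not overwrite the real `train`/`test`. The column `TestOnlyCol` is invented to show a test-only column.

```python
import pandas as pd

# Hypothetical frames: test lacks the target; 'TestOnlyCol' is invented for illustration.
train_demo = pd.DataFrame(columns=['Id', 'LotArea', 'GrLivArea', 'SalePrice'])
test_demo = pd.DataFrame(columns=['Id', 'LotArea', 'GrLivArea', 'TestOnlyCol'])

# Columns present in both frames.
shared = train_demo.columns.intersection(test_demo.columns)

# One-sided: in train_demo but not in test_demo.
only_train = train_demo.columns.difference(test_demo.columns)

# Two-sided: in exactly one of the two frames.
either_only = train_demo.columns.symmetric_difference(test_demo.columns)

# A common follow-up: restrict the test frame to the shared columns.
X_test = test_demo[shared]

print(list(only_train))   # ['SalePrice']
print(set(either_only))
```

Index set operations may sort their result (for example, `difference` attempts to sort by default). If column order matters downstream, compare the results as sets or reindex explicitly.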