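Lab Notes: September 2024

Reposted from the WeChat official account 機器學習與人工智慧AI (Machine Learning and Artificial Intelligence AI)

Original article link: https://mp.weixin.qq.com/s/4njH2Uc96-zmkh1cWq41Dw

Today's roundup of core Python operations centers on the data science ecosystem: NumPy, Pandas, and machine learning libraries such as sklearn, PyTorch, and TensorFlow.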

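1. Import libraries and set default options

Introduction:

Import the libraries commonly used for data science in Python and set a few defaults, such as showing every column and turning off scientific notation.

Example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option('display.max_columns', None)  # show all columns
pd.set_option('display.float_format', lambda x: '%.3f' % x)  # fixed 3 decimals instead of scientific notation
sns.set(style="whitegrid")  # set the default Seaborn style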

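2. Create a multidimensional NumPy array and inspect its attributes

Introduction:

Create a 2x3 NumPy array and check its shape, number of dimensions, and data type.

Example:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2
print(arr.dtype)  # int64 on most platforms (the default integer type is platform-dependent)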

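3. Basic operations on NumPy arrays

Introduction:

Perform basic arithmetic on NumPy arrays, such as addition, subtraction, multiplication, and division.

Example:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

sum_arr = arr1 + arr2  # element-wise addition [5, 7, 9]
mul_arr = arr1 * arr2  # element-wise multiplication [4, 10, 18]
exp_arr = np.exp(arr1)  # exponential [2.718, 7.389, 20.086]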
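
As a small extension of the example above, the sketch below covers the remaining arithmetic operators, scalar broadcasting, and a dot product. It reuses the same illustrative arrays; the names diff_arr, div_arr, scaled, and dot are introduced here and are not part of the original article.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

diff_arr = arr2 - arr1   # element-wise subtraction
div_arr = arr2 / arr1    # element-wise true division, always a float result
scaled = arr1 * 10       # the scalar is broadcast to every element
dot = arr1 @ arr2        # dot product: 1*4 + 2*5 + 3*6

print(diff_arr, div_arr, scaled, dot)
```

Note that `*` is element-wise; use `@` (or np.dot) when a dot product or matrix product is intended.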

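4. Generate a random matrix

Introduction:

Generate a 3x3 matrix of random integers in a chosen range (here 1 to 99, because the upper bound of randint is exclusive).

Example: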

random_matrix = np.random.randint(1, 100, size=(3, 3))
print(random_matrix)

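5. Create a Pandas DataFrame and view basic information

Introduction:

Create a DataFrame and look at its first few rows, data types, and summary statistics.

Example:

data = {'Name': ['Tom', 'Jerry', 'Spike'],
        'Age': [25, 30, 22],
        'Score': [85.5, 90.1, 78.3]}
df = pd.DataFrame(data)

print(df.head())  # first few rows
df.info()  # data types and non-null counts (info() prints directly and returns None)
print(df.describe())  # summary statistics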

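6. Read a CSV file and handle missing values

Introduction:

Read data from a CSV file and handle missing values by filling them in or dropping the affected rows.

Example:

df = pd.read_csv('data.csv')

# Count missing values per column
print(df.isnull().sum())

# Fill missing values with the column mean (assigning back avoids chained in-place modification)
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

# Or drop the rows that contain missing values
df = df.dropna()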
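
Since data.csv is not included with the article, here is a minimal sketch of the same steps on a small in-memory DataFrame. The frame df_demo and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for data.csv
df_demo = pd.DataFrame({
    'column_name': [1.0, np.nan, 3.0, np.nan],
    'other': ['a', 'b', None, 'd'],
})

# Missing values per column
missing_counts = df_demo.isnull().sum()

# Option 1: fill the numeric column with its mean (the mean ignores NaN)
filled = df_demo.copy()
filled['column_name'] = filled['column_name'].fillna(filled['column_name'].mean())

# Option 2: drop every row that has at least one missing value
dropped = df_demo.dropna()

print(missing_counts)
print(filled)
print(dropped)
```

Filling keeps all rows but shrinks the column's variance, while dropping keeps only fully observed rows, so the better choice depends on how much data is missing.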

7. Filtering data in Pandas

Introduction:

Filter the rows of a DataFrame with a condition.

Example:

df_filtered = df[df['Age'] > 25]  # keep rows where Age is greater than 25
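
Filters often combine several conditions. The sketch below, which rebuilds the small DataFrame from item 5 under the name df_demo, shows how conditions might be combined with & and |, and the equivalent query() form.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Spike'],
    'Age': [25, 30, 22],
    'Score': [85.5, 90.1, 78.3],
})

# Each condition needs its own parentheses because & and | bind tighter than comparisons
both = df_demo[(df_demo['Age'] > 21) & (df_demo['Score'] > 80)]
either = df_demo[(df_demo['Age'] > 25) | (df_demo['Score'] < 80)]

# The same "and" filter written as a query string
by_query = df_demo.query('Age > 21 and Score > 80')

print(both['Name'].tolist(), either['Name'].tolist())
```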

8. Grouping in Pandas

Introduction:

Group a DataFrame by a column, typically to compute aggregate statistics.

Example:

grouped = df.groupby('Category')
mean_scores = grouped['Score'].mean()  # average score for each category
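
The snippet above assumes df has a Category column. A self-contained sketch with made-up data, computing several aggregates at once with agg, might look like this:

```python
import pandas as pd

# Hypothetical data with a 'Category' column
df_demo = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Score': [80.0, 90.0, 70.0, 100.0, 90.0],
})

# One row per category, one column per aggregate
summary = df_demo.groupby('Category')['Score'].agg(['mean', 'count', 'max'])
print(summary)
```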

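9. Pivot tables in Pandas

Introduction:

Create a pivot table to summarize and analyze data.

Example:

pivot_table = pd.pivot_table(df, values='Score', index='Category', columns='Gender', aggfunc='mean')  # the string name avoids the deprecation warning for np.mean in recent pandas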
print(pivot_table)
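
To make the shape of the result concrete, here is a minimal sketch on made-up data. Combinations that never occur in the data (category B with gender M below) come out as NaN.

```python
import pandas as pd

# Hypothetical data; there is no (B, M) row on purpose
df_demo = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Gender': ['F', 'M', 'F', 'F'],
    'Score': [80.0, 70.0, 90.0, 100.0],
})

pivot_demo = pd.pivot_table(df_demo, values='Score', index='Category',
                            columns='Gender', aggfunc='mean')
print(pivot_demo)
```

Passing fill_value=0 to pivot_table is one way to replace those NaN cells when a default makes sense.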

10. Data visualization: basic Matplotlib plotting

Introduction:

Draw a simple line chart with Matplotlib.

Example:

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

11. Seaborn data visualization: linear regression plot

Introduction:

Draw a scatter plot with a fitted regression line using Seaborn.

Example:

sns.lmplot(x='Age', y='Score', data=df, height=6, aspect=1.5)
plt.show()

12. Multiple subplots with Matplotlib

Introduction:

Draw several subplots on the same figure.

Example:

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

axes[0, 0].plot(x, y)
axes[0, 1].plot(x, np.cos(x))
axes[1, 0].plot(x, np.tan(x))
axes[1, 1].plot(x, -y)

plt.show()

13. Visualizing a data distribution with Seaborn

Introduction:

Plot the distribution of the data to show its shape at a glance.

Example:

sns.histplot(df['Score'], kde=True)
plt.show()

14. Handling dates in Pandas

Introduction:

Convert strings to datetime values and work with their components.

Example:

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

15. Merging DataFrames in Pandas

Introduction:

Combine two DataFrames with merge, similar to a JOIN in SQL.

Example:

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})

merged_df = pd.merge(df1, df2, on='key', how='inner')  # inner join
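
Because both frames have a column named value, the merged result carries suffixed copies of it. The sketch below uses the same two frames to show explicit suffixes and an outer join with the indicator flag; the suffix strings are arbitrary choices made here.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames; name the overlapping columns explicitly
inner = pd.merge(df1, df2, on='key', how='inner', suffixes=('_left', '_right'))

# Outer join: all keys from either frame; '_merge' records where each row came from
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```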

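16. Pandas pivot tables and hierarchical indexes

Introduction:

Use a pivot table to aggregate data under a hierarchical (multi-level) index.

Example: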

pivot = pd.pivot_table(df, values='Sales', index=['Region', 'Product'], columns='Year', aggfunc='sum')

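17. Handling categorical variables

Introduction:

Convert categorical variables to numeric columns (for example, with dummy variables).

Example: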

df = pd.get_dummies(df, columns=['Category'], drop_first=True)
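
A minimal sketch of what drop_first does, on a made-up three-level category: the first level in sorted order (blue here) is dropped and becomes the implicit baseline.

```python
import pandas as pd

# Hypothetical data with a three-level categorical column
df_demo = pd.DataFrame({
    'Category': ['red', 'green', 'blue', 'green'],
    'Score': [1, 2, 3, 4],
})

encoded = pd.get_dummies(df_demo, columns=['Category'], drop_first=True)
print(encoded)
```

A row with all dummy columns equal to 0 therefore means "blue". Dropping one level avoids perfectly collinear columns in linear models.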

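18. Correlation matrix and heatmap

Introduction:

Compute the pairwise correlations of a DataFrame and draw a heatmap to show the linear relationships between variables.

Example:

corr_matrix = df.corr(numeric_only=True)  # skip non-numeric columns (parameter available in pandas >= 1.5)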
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()

19. Splitting into training and test sets

Introduction:

Use sklearn to split a dataset into a training set and a test set.

Example:

from sklearn.model_selection import train_test_split

X = df[['Age', 'Score']]
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

20. Building a linear regression model

Introduction:

Create and train a linear regression model with sklearn.

Example:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)

21. Model evaluation: mean squared error

Introduction:

Compute the model's mean squared error (MSE) to evaluate its performance.

Example:

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.3f}')
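
MSE is the mean of the squared differences between true and predicted values. The sketch below checks a manual computation against sklearn on four made-up values; y_true and y_hat are illustrative names.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Errors are 0.5, -0.5, 0.0, -1.0, so the squared errors average to 0.375
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

# RMSE is in the same units as the target, which is often easier to interpret
rmse = np.sqrt(mse_manual)

print(mse_manual, mse_sklearn, rmse)
```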

22. Cross-validation

Introduction:

Use cross-validation to assess the model's stability and generalization performance.

Example:

from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'Cross-validated MSE: {-scores.mean():.3f}')

23. Standardizing data

Introduction:

Standardize features so that they share a common scale.

Example:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
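
When the data is later split for evaluation, a common practice is to fit the scaler on the training split only and reuse its statistics on the test split, so that no test information leaks into preprocessing. A minimal sketch on synthetic data (X_demo and the split names are assumptions made here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features: 100 samples, 2 columns
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 2))

X_tr, X_te = train_test_split(X_demo, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from the training split only
X_te_scaled = scaler.transform(X_te)      # apply the same statistics to the test split

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
```

The test split will generally not have exactly zero mean and unit variance after this, which is expected.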

24. Decision tree model

Introduction:

Build a decision tree classifier with sklearn.

Example:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

25. Random forest model

Introduction:

Classify with the random forest algorithm.

Example:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

accuracy = rf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

26. Feature importance

Introduction:

Extract feature importances from the random forest.

Example:

feature_importances = rf.feature_importances_
print(feature_importances)

27. Principal component analysis (PCA)

Introduction:

Use PCA to reduce the dimensionality of the data.

Example:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

28. K-Means clustering

Introduction:

Use the K-Means algorithm, an unsupervised learning method, for cluster analysis.

Example:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

labels = kmeans.labels_

29. Evaluating clustering results

Introduction:

Compute the silhouette score to evaluate clustering quality.

Example:

from sklearn.metrics import silhouette_score

score = silhouette_score(X, labels)
print(f'Silhouette Score: {score:.3f}')
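
One common use of the silhouette score is choosing the number of clusters: fit K-Means for several values of k and keep the one with the highest score. The sketch below does this on synthetic blobs with three well-separated centers; the data, the centers, and the range of k are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs
X_demo, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [5, 9]],
                       cluster_std=0.6, random_state=42)

scores_by_k = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    scores_by_k[k] = silhouette_score(X_demo, km.fit_predict(X_demo))

# The k with the highest average silhouette
best_k = max(scores_by_k, key=scores_by_k.get)
print(scores_by_k, best_k)
```

The score ranges from -1 to 1; values near 1 indicate compact, well-separated clusters.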

30. Logistic regression model

Introduction:

Build a logistic regression model for classification.

Example:

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

accuracy = log_reg.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

31. Grid Search

Introduction:

Tune model hyperparameters with grid search to find the best parameter combination.

Example:

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f'Best parameters: {grid_search.best_params_}')
print(f'Best score: {grid_search.best_score_}')

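32. Randomized Search

Introduction:

Randomized search evaluates a fixed number of sampled candidates instead of every combination. It is usually faster than grid search and suits large search spaces.

Example:

from sklearn.model_selection import RandomizedSearchCV

param_dist = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=5, cv=5, random_state=42)  # sample 5 of the 9 possible combinations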
random_search.fit(X_train, y_train)

print(f'Best parameters: {random_search.best_params_}')

33. XGBoost model

Introduction:

Gradient-boosted classification with XGBoost.

Example:

from xgboost import XGBClassifier

xgb_model = XGBClassifier(n_estimators=100, random_state=42)
xgb_model.fit(X_train, y_train)

accuracy = xgb_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

34. LightGBM model

Introduction:

Fast gradient-boosted classification with LightGBM.

Example:

import lightgbm as lgb

lgb_model = lgb.LGBMClassifier(n_estimators=100, random_state=42)
lgb_model.fit(X_train, y_train)

accuracy = lgb_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

35. CatBoost model

Introduction:

Use CatBoost, a gradient boosting model designed to handle categorical features.

Example:

from catboost import CatBoostClassifier

cat_model = CatBoostClassifier(n_estimators=100, random_state=42, verbose=0)
cat_model.fit(X_train, y_train)

accuracy = cat_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

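36. Support vector machine (SVM) classification

Introduction:

Use an SVM for binary classification; SVMs work well on high-dimensional data.

Example: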

from sklearn.svm import SVC

svm_model = SVC(kernel='linear', C=1)
svm_model.fit(X_train, y_train)

accuracy = svm_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

37. K-nearest neighbors (KNN) classification

Introduction:

Classify with the KNN algorithm.

Example:

from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

accuracy = knn_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

38. Polynomial regression

Introduction:

Model nonlinear relationships with polynomial regression.

Example:

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

39. Ridge regression (L2 regularization)

Introduction:

Use ridge regression (L2 regularization) to guard against overfitting.

Example:

from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

40. Lasso regression (L1 regularization)

Introduction:

Use Lasso regression (L1 regularization) for feature selection.

Example:

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)

41. ElasticNet regression

Introduction:

ElasticNet regression combines L1 and L2 regularization.

Example:

from sklearn.linear_model import ElasticNet

enet_model = ElasticNet(alpha=0.1, l1_ratio=0.7)
enet_model.fit(X_train, y_train)

42. Stochastic Gradient Descent (SGD) classification

Introduction:

Use SGD for large-scale linear classification.

Example:

from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier(max_iter=1000, tol=1e-3)
sgd_model.fit(X_train, y_train)

43. DBSCAN density-based clustering

Introduction:

Use DBSCAN for density-based clustering, which suits clusters with non-convex shapes.

Example:

from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)

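44. Hierarchical clustering

Introduction:

Use hierarchical clustering for unsupervised learning and visualize the cluster hierarchy as a dendrogram.

Example: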

from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(X, method='ward')
dendrogram(linked)
plt.show()

45. Isolation forest (anomaly detection)

Introduction:

Detect anomalies with an isolation forest.

Example:

from sklearn.ensemble import IsolationForest

iso_forest = IsolationForest(contamination=0.1)
iso_forest.fit(X)

anomalies = iso_forest.predict(X)

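46. Principal component analysis (PCA) visualization

Introduction:

Visualize the PCA result to show how the data is distributed after dimensionality reduction.

Example: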

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title('PCA Visualization')
plt.show()

47. t-SNE visualization

Introduction:

Use t-SNE to reduce dimensionality and visualize the distribution of high-dimensional data.

Example:

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
plt.title('t-SNE Visualization')
plt.show()

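48. Plotting the ROC curve

Introduction:

Plot the Receiver Operating Characteristic (ROC) curve to evaluate a binary classifier.

Example:

from sklearn.metrics import roc_curve, auc

# predict_proba requires a classifier, so use the logistic regression from item 30
y_prob = log_reg.predict_proba(X_test)[:, 1]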
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

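49. Confusion matrix

Introduction:

Use a confusion matrix to evaluate a classification model.

Example:

from sklearn.metrics import confusion_matrix

y_pred = log_reg.predict(X_test)  # a fitted classifier, e.g. the logistic regression from item 30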
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.show()

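50. Precision, recall, and F1 score

Introduction:

Compute a classifier's precision, recall, and F1 score to evaluate its performance.

Example:

from sklearn.metrics import precision_score, recall_score, f1_score

y_pred = log_reg.predict(X_test)  # a fitted classifier, e.g. the logistic regression from item 30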
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'F1 Score: {f1:.3f}')
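
To show where these numbers come from, the sketch below derives precision, recall, and F1 from raw counts on ten made-up binary labels and compares them with sklearn's default binary averaging. All names and values are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up labels: 4 positives, 6 negatives
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # true positives
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # false negatives

precision_manual = tp / (tp + fp)  # of the predicted positives, how many are correct
recall_manual = tp / (tp + fn)     # of the actual positives, how many were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(precision_manual, recall_manual, f1_manual)
print(precision_score(y_true, y_hat), recall_score(y_true, y_hat), f1_score(y_true, y_hat))
```

With average='weighted', as in the example above, the same per-class quantities are computed for every class and averaged by class frequency.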

51. Feature selection: model-based selection

Introduction:

Select features based on a model's feature importances.

Example:

from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(rf, threshold='mean')
X_selected = selector.fit_transform(X_train, y_train)

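52. Cross-validation: stratified K-fold

Introduction:

Use stratified K-fold cross-validation so that every fold keeps roughly the same class proportions.

Example:

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    # X and y are pandas objects here, so select rows by position with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]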
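
A quick way to see the effect of stratification is to count the minority class in each test fold. The sketch below uses a made-up imbalanced label vector (15 zeros, 5 ones); with 5 folds, every test fold should get exactly one positive.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up data: 20 samples, 75% class 0 and 25% class 1
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 15 + [1] * 5)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
positives_per_fold = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    fold_sizes.append(len(test_idx))
    positives_per_fold.append(int(y_demo[test_idx].sum()))

print(fold_sizes, positives_per_fold)
```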

53. Standardization and normalization

Introduction:

Standardize data (z-score standardization) and normalize it (min-max scaling).

Example:

from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Normalization
minmax_scaler = MinMaxScaler()
X_normalized = minmax_scaler.fit_transform(X)
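
Both transforms are simple per-column formulas: z = (x - mean) / std for standardization and (x - min) / (max - min) for min-max scaling. The sketch below writes them out with NumPy on a tiny made-up matrix and checks them against the sklearn scalers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up matrix: 3 samples, 2 features on very different scales
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])

# z-score: StandardScaler uses the population standard deviation (ddof=0), as np.std does by default
z_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# min-max: maps each column onto [0, 1]
mm_manual = (X_demo - X_demo.min(axis=0)) / (X_demo.max(axis=0) - X_demo.min(axis=0))

z_sklearn = StandardScaler().fit_transform(X_demo)
mm_sklearn = MinMaxScaler().fit_transform(X_demo)

print(z_manual)
print(mm_manual)
```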

54. Data splitting: custom split

Introduction:

Split a dataset according to a custom condition.

Example:

train_df = df[df['Year'] < 2020]
test_df = df[df['Year'] >= 2020]

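55. Time series analysis: autocorrelation plot

Introduction:

Plot the autocorrelation function to see how a time series correlates with its own lags.

Example: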

from statsmodels.graphics.tsaplots import plot_acf

plot_acf(df['value'])
plt.show()

56. Time series analysis: rolling mean

Introduction:

Compute and plot a rolling mean to smooth time series data.

Example:

df['Rolling_Mean'] = df['value'].rolling(window=12).mean()
df[['value', 'Rolling_Mean']].plot()
plt.show()
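
On a short made-up series the behavior of the window is easy to verify by hand. The sketch below also shows min_periods, which controls whether the first incomplete windows produce NaN.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: the first window-1 positions are NaN because the window is not yet full
rolling = s.rolling(window=3).mean()

# min_periods=1: average whatever is available at the start of the series
rolling_min1 = s.rolling(window=3, min_periods=1).mean()

print(rolling.tolist())
print(rolling_min1.tolist())
```

With window=12, as in the example above, the first 11 values of Rolling_Mean are therefore NaN unless min_periods is lowered.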

57. Data processing: applying a function

Introduction:

Apply a custom function to a column of a Pandas DataFrame.

Example:

def custom_function(x):
    return x * 2

df['new_column'] = df['column_name'].apply(custom_function)

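58. Data processing: aggregation functions in pivot tables

Introduction:

Use a pivot table for more complex aggregations, with a different function per column.

Example:

pivot_table = pd.pivot_table(df, values=['Sales', 'Profit'], index='Region', columns='Product', aggfunc={'Sales': 'sum', 'Profit': 'mean'})  # every key in aggfunc must also be listed in values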

59. Cross-tabulation

Introduction:

Build a cross-tabulation to analyze the relationship between categorical variables.

Example:

crosstab = pd.crosstab(df['Category'], df['Outcome'])

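60. Data processing: data cleaning

Introduction:

Handle duplicate rows and outliers.

Example:

df = df.drop_duplicates()  # drop duplicate rows
df = df[df['column_name'] < threshold]  # filter out outliers; threshold is a cutoff you define beforehand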

61. Distribution fitting: normal distribution

Introduction:

Fit data to a normal distribution with scipy.

Example:

from scipy import stats

mu, std = stats.norm.fit(df['value'])

62. Linear models: polynomial regression

Introduction:

Extend a linear model to handle nonlinear data.

Example:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)

63. Deep learning: TensorFlow basics

Introduction:

Build a basic deep learning model with TensorFlow.

Example:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5)

64. Deep learning: Keras basics

Introduction:

Build and train a deep learning model with Keras.

Example:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(128, activation='relu', input_shape=(input_dim,)),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)

65. Saving and loading models

Introduction:

Save and load a deep learning model.

Example:

model.save('my_model.h5')  # save the model
loaded_model = tf.keras.models.load_model('my_model.h5')  # load the model

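66. Model evaluation: confusion matrix and classification report

Introduction:

Evaluate model performance and generate a classification report.

Example:

from sklearn.metrics import classification_report

# The Keras softmax model returns class probabilities, so take the argmax to get labels;
# for an sklearn classifier, model.predict(X_test) already returns labels.
y_pred = model.predict(X_test).argmax(axis=1)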
print(classification_report(y_test, y_pred))

67. Hyperparameter tuning: Bayesian optimization

Introduction:

Tune hyperparameters with Bayesian optimization.

Example:

from skopt import BayesSearchCV

bayes_search = BayesSearchCV(estimator=rf, search_spaces={'n_estimators': (50, 200), 'max_depth': (5, 30)}, n_iter=50)
bayes_search.fit(X_train, y_train)

print(f'Best parameters: {bayes_search.best_params_}')

68. Time series: seasonal decomposition

Introduction:

Decompose a time series into trend, seasonal, and residual components.

Example:

from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(df['value'], model='additive', period=12)
decomposition.plot()
plt.show()

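69. Time series: ARIMA model

Introduction:

Forecast a time series with an ARIMA model.

Example:

from statsmodels.tsa.arima.model import ARIMA  # the older statsmodels.tsa.arima_model module was removed in statsmodels 0.13

model = ARIMA(df['value'], order=(5, 1, 0))
model_fit = model.fit()

forecast = model_fit.forecast(steps=10)  # the next 10 predicted values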

70. Anomaly detection: LOF (Local Outlier Factor)

Introduction:

Detect anomalies with LOF.

Example:

from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
outliers = lof.fit_predict(X)

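71. Covariance matrix

Introduction:

Compute the covariance matrix of the data to understand the linear relationships between variables.

Example: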

covariance_matrix = np.cov(df[['x1', 'x2']].T)
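
np.cov treats each row as a variable by default, which is why the example transposes the DataFrame. The sketch below, on made-up columns x1 and x2, checks that call against the sample covariance formula (centered data, divided by n - 1) and against DataFrame.cov.

```python
import numpy as np
import pandas as pd

# Made-up data with two numeric columns
df_demo = pd.DataFrame({'x1': [1.0, 2.0, 3.0, 4.0],
                        'x2': [2.0, 4.0, 6.0, 9.0]})

# Same call as above: after transposing, each row is one variable
cov_np = np.cov(df_demo[['x1', 'x2']].T)

# Sample covariance by hand: center each column, then divide by n - 1
values = df_demo[['x1', 'x2']].to_numpy()
centered = values - values.mean(axis=0)
cov_manual = centered.T @ centered / (len(values) - 1)

# pandas computes the same matrix directly
cov_pd = df_demo.cov().to_numpy()

print(cov_np)
```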

72. Computing conditional probabilities

Introduction:

Compute conditional probabilities for categorical variables.

Example:

conditional_prob = pd.crosstab(df['Category'], df['Outcome'], normalize='index')
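
With normalize='index', every row of the crosstab sums to 1, so each cell is P(Outcome | Category). A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data: category A has outcomes 1, 0, 1 and category B has outcomes 0, 0
df_demo = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B'],
    'Outcome': [1, 0, 1, 0, 0],
})

cond = pd.crosstab(df_demo['Category'], df_demo['Outcome'], normalize='index')

# P(Outcome = 1 | Category = A) is 2 out of 3
p_1_given_a = cond.loc['A', 1]
print(cond)
```

Using normalize='columns' instead would give P(Category | Outcome).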

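73. Computing information gain

Introduction:

Compute information gain (estimated here as the mutual information between each feature and the target) for feature selection.

Example: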

from sklearn.feature_selection import mutual_info_classif

mi = mutual_info_classif(X, y)

74. Normality test: Shapiro-Wilk test

Introduction:

Use the Shapiro-Wilk test to check whether data follows a normal distribution.

Example:

from scipy.stats import shapiro

stat, p_value = shapiro(df['value'])

75. Analysis of variance (ANOVA)

Introduction:

Run an analysis of variance to compare the means of different groups.

Example:

from scipy.stats import f_oneway

f_stat, p_value = f_oneway(df['group1'], df['group2'], df['group3'])

76. Bootstrapping

Introduction:

Use the bootstrap for model evaluation and uncertainty estimation.

Example:

from sklearn.utils import resample

bootstrapped_samples = resample(df, n_samples=1000, random_state=42)
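
A single resample is only one bootstrap replicate. For uncertainty estimation the resampling is typically repeated many times. The sketch below estimates a 95% percentile interval for the mean of synthetic data; the data, the 1000 replicates, and the percentile method are choices made here for illustration.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic sample of 200 observations
rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Resample with replacement many times and record the statistic of interest
boot_means = np.array([
    resample(data, replace=True, n_samples=len(data), random_state=i).mean()
    for i in range(1000)
])

# Percentile bootstrap interval for the mean
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f'mean = {data.mean():.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})')
```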

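77. Bayesian networks

Introduction:

Use a Bayesian network for probabilistic inference.

Example:

from pomegranate import BayesianNetwork  # this import and from_samples follow the pre-1.0 pomegranate API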

model = BayesianNetwork.from_samples(X, algorithm='chow-liu')

78. Decision tree visualization

Introduction:

Visualize a decision tree to understand how the model makes decisions.

Example:

from sklearn.tree import export_graphviz
import graphviz

dot_data = export_graphviz(clf, out_file=None, feature_names=list(X.columns), class_names=['0', '1'], filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.render('decision_tree')

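79. Heatmap

Introduction:

Use a heatmap to show correlations or frequencies in the data.

Example:

import seaborn as sns

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # numeric_only skips non-numeric columns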
plt.show()

80. 3D scatter plot

Introduction:

Draw a 3D scatter plot to visualize three-dimensional data.

Example:

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['x1'], df['x2'], df['x3'])
plt.show()

81. Violin plot

Introduction:

Use a violin plot to show the density of a data distribution.

Example:

sns.violinplot(x='Category', y='Value', data=df)
plt.show()

82. Box plot

Introduction:

Use a box plot to show the distribution of the data and its outliers.

Example:

sns.boxplot(x='Category', y='Value', data=df)
plt.show()

83. Histogram

Introduction:

Draw a histogram to show the distribution of the data.

Example:

df['value'].hist(bins=30)
plt.show()

84. KDE (kernel density estimation)

Introduction:

Draw a KDE plot to estimate the probability density function of the data.

Example:

sns.kdeplot(df['value'])
plt.show()

85. Plotting model performance

Introduction:

Use plots such as a learning curve to show model performance.

Example:

from sklearn.model_selection import learning_curve

train_sizes, train_scores, test_scores = learning_curve(model, X, y, cv=5)

plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', label='Training score')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-', label='Test score')
plt.xlabel('Training examples')
plt.ylabel('Score')
plt.title('Learning Curve')
plt.legend()
plt.show()

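86. Visualizing model coefficients

Introduction:

Visualize a linear model's coefficients to understand how each feature affects the prediction.

Example: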

coef = model.coef_
plt.bar(range(len(coef)), coef)
plt.xlabel('Feature index')
plt.ylabel('Coefficient value')
plt.title('Model Coefficients')
plt.show()

87. RNN basics

Introduction:

Build a simple recurrent neural network (RNN) for sequence prediction.

Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(50, input_shape=(timesteps, features)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10)

88. LSTM network

Introduction:

Use a long short-term memory (LSTM) network to predict sequence data.

Example:

from tensorflow.keras.layers import LSTM

model = Sequential([
    LSTM(50, input_shape=(timesteps, features)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10)

89. Data augmentation

Introduction:

Apply data augmentation to image data for model training.

Example:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

datagen.fit(X_train)

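90. Graph neural network (GNN) basics

Introduction:

Use a graph neural network to process graph-structured data.

Example:

import torch
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        # Sizes are passed in instead of being read from undefined globals
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)  # input features -> 64 hidden channels
        self.conv2 = GCNConv(64, num_classes)   # hidden channels -> class scores

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        return x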

91. Autoencoder

Introduction:

Build an autoencoder for dimensionality reduction and feature learning.

Example:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_layer = Input(shape=(input_dim,))
encoded = Dense(64, activation='relu')(input_layer)
decoded = Dense(input_dim, activation='sigmoid')(encoded)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(X_train, X_train, epochs=50)

92. Generative adversarial network (GAN)

Introduction:

Use a GAN to generate new data samples.

Example:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Generator
noise = Input(shape=(100,))
x = Dense(128, activation='relu')(noise)
generated_image = Dense(784, activation='sigmoid')(x)

generator = Model(noise, generated_image)

# Discriminator
image = Input(shape=(784,))
x = Dense(128, activation='relu')(image)
validity = Dense(1, activation='sigmoid')(x)

discriminator = Model(image, validity)

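93. Image classification: convolutional neural network (CNN)

Introduction:

Build a convolutional neural network for image classification.

Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),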
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)

94. Anomaly detection: Isolation Forest

Introduction:

Detect anomalies with Isolation Forest.

Example:

from sklearn.ensemble import IsolationForest

iso_forest = IsolationForest(contamination=0.1)
outliers = iso_forest.fit_predict(X)

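95. Model ensembling: random forest and gradient boosting

Introduction:

Combine a random forest and a gradient boosting model into an ensemble.

Example: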

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier

rf = RandomForestClassifier(n_estimators=100)
gb = GradientBoostingClassifier(n_estimators=100)

ensemble_model = VotingClassifier(estimators=[('rf', rf), ('gb', gb)], voting='soft')
ensemble_model.fit(X_train, y_train)

96. Principal component analysis (PCA) visualization

Introduction:

Reduce dimensionality with PCA and visualize the data.

Example:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.title('PCA of dataset')
plt.show()

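97. Visualizing feature importance

Introduction:

Visualize the feature importance scores.

Example:

feature_importances = model.feature_importances_  # model must be a fitted tree-based estimator, such as the random forest from item 25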
plt.bar(range(len(feature_importances)), feature_importances)
plt.xlabel('Feature index')
plt.ylabel('Importance')
plt.title('Feature Importances')
plt.show()

98. Hyperparameter search: grid search

Introduction:

Optimize hyperparameters with grid search.

Example:

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [10, 20, 30]}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f'Best parameters: {grid_search.best_params_}')

99. Time series forecasting: SARIMA

Introduction:

Forecast a seasonal time series with SARIMA.

Example:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(df['value'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
model_fit = model.fit(disp=0)

forecast = model_fit.forecast(steps=10)

100. Text data: word cloud

Introduction:

Use a word cloud to visualize the keywords in text data.

Example:

from wordcloud import WordCloud

text = ' '.join(df['text_column'])
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Sunday, September 15, 2024

[Repost] The ultimate summary! 100 core Python operations!
