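The multiple linear regression model is expressed as:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$
where $Y$ is the dependent variable, $X_1, \dots, X_k$ are the independent variables, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_k$ are the regression coefficients, and $\epsilon$ is the error term.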
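To make the model concrete, here is a minimal sketch on synthetic data. The coefficient values, sample size, and noise level are made up for illustration; the fit uses NumPy's least-squares solver rather than statsmodels.

```python
import numpy as np

# Synthetic data: the "true" coefficients below are illustrative assumptions
rng = np.random.default_rng(0)
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
eps = rng.normal(scale=0.1, size=n)          # error term
beta_true = np.array([1.0, 2.0, -0.5])       # beta_0, beta_1, beta_2
Y = beta_true[0] + beta_true[1] * X1 + beta_true[2] * X2 + eps

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), X1, X2])

# Ordinary least squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("Estimated coefficients:", beta_hat)
```

With this little noise, the estimates should land close to the assumed true values.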
# Squared-error example
import numpy as np

Y_actual = np.array([1, 3.5, 4, 8, 12])
Y_pred = np.array([1, 3, 5, 7, 9])
print("Sum of squared errors:", np.sum((Y_pred - Y_actual)**2))
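Ordinary least squares picks the line that makes this sum as small as possible. The sketch below compares the predictions above with a least-squares fit. It assumes the inputs were `X = [0, 1, 2, 3, 4]`, which the original does not state; under that assumption the predictions correspond to the line `1 + 2x`.

```python
import numpy as np

X = np.array([0, 1, 2, 3, 4])            # assumed inputs, not given in the text
Y_actual = np.array([1, 3.5, 4, 8, 12])
Y_guess = 1 + 2 * X                      # reproduces the predictions [1, 3, 5, 7, 9]

# Least-squares line; np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(X, Y_actual, deg=1)
Y_ols = intercept + slope * X

sse_guess = np.sum((Y_guess - Y_actual) ** 2)
sse_ols = np.sum((Y_ols - Y_actual) ** 2)
print("SSE of the guessed line:", sse_guess)
print("SSE of the OLS line:   ", sse_ols)
```

The fitted line has a smaller sum of squared errors than the guessed one, which is exactly the criterion the regression optimizes.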
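import pandas as pd
import yfinance as yf

# Download S&P 500 (SPY) and single-stock (AAPL) data
start = '2014-01-01'
end = '2015-01-01'
# squeeze() turns a one-column DataFrame (which some yfinance versions may return) into a Series
spy = yf.download('SPY', start=start, end=end)['Close'].squeeze()
aapl = yf.download('AAPL', start=start, end=end)['Close'].squeeze()
# Preprocessing: align both series by date and drop rows with missing values
data = pd.DataFrame({'SPY': spy, 'AAPL': aapl}).dropna()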
import statsmodels.api as sm
# Add a constant (intercept) term
X = sm.add_constant(data['SPY'])
model = sm.OLS(data['AAPL'], X).fit()
print(model.summary())
Interpreting the key output:
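The headline numbers in an OLS summary (coef, std err, t, R-squared) can be reproduced by hand, which helps when reading the table. The following is a minimal sketch on synthetic data; the slope, intercept, and noise level are illustrative assumptions, and the arithmetic uses plain NumPy instead of statsmodels.

```python
import numpy as np

# Synthetic single-factor data (all parameter values are illustrative assumptions)
rng = np.random.default_rng(1)
n = 250
x = rng.normal(size=n)
y = 0.5 + 1.2 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])          # constant + regressor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # "coef" column
resid = y - X @ beta

# R-squared: share of the variance in y explained by the model
ssr = np.sum(resid ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ssr / sst

# Standard errors and t-statistics of the coefficients
k = X.shape[1]
sigma2 = ssr / (n - k)                        # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))               # "std err" column
t_stats = beta / std_err                      # "t" column

print("coef:     ", beta)
print("std err:  ", std_err)
print("t:        ", t_stats)
print("R-squared:", r_squared)
```

A large absolute t-statistic means the coefficient is estimated precisely relative to its size; R-squared close to 1 means the regressors account for most of the variation in the dependent variable.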
# Plot regression diagnostics
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 8))
sm.graphics.plot_regress_exog(model, 'SPY', fig=fig)
plt.show()
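# Download data for several assets
assets = ['AAPL', 'MSFT', 'SPY']
# Drop rows with missing prices so that OLS receives complete observations
data = yf.download(assets, start=start, end=end)['Close'].dropna()
# Set the independent and dependent variables
X = sm.add_constant(data[['MSFT', 'SPY']])
y = data['AAPL']
# Fit the model
multi_model = sm.OLS(y, X).fit()
print(multi_model.summary())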
| Model | β_MSFT | β_SPY | R-squared |
| --- | --- | --- | --- |
| Single-variable | 0.85 | - | 0.72 |
| Multi-variable | 0.32 | 0.61 | 0.81 |
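After SPY is added, the MSFT coefficient falls sharply (from 0.85 to 0.32), which indicates that part of the apparent relationship with MSFT is actually explained by the market factor.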
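This effect can be reproduced on synthetic data. In the sketch below, `market`, `peer`, and `target` are hypothetical stand-ins for SPY, MSFT, and AAPL returns, and every loading is an assumption chosen for illustration. Because `peer` is itself driven by `market`, leaving `market` out of the regression inflates the coefficient on `peer`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
market = rng.normal(size=n)                              # stand-in for SPY
peer = 0.8 * market + rng.normal(scale=0.6, size=n)      # stand-in for MSFT
target = 0.3 * peer + 0.6 * market + rng.normal(scale=0.5, size=n)  # stand-in for AAPL

def ols(columns, y):
    """Return OLS coefficients (intercept first) for the given regressor columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_single = ols([peer], target)          # market factor omitted
beta_multi = ols([peer, market], target)   # market factor included

print("peer coefficient, single-variable:", beta_single[1])
print("peer coefficient, multi-variable: ", beta_multi[1])
```

The single-variable coefficient absorbs the market exposure that `peer` and `target` share; once `market` enters the model, the `peer` coefficient drops back toward its assumed direct effect of 0.3.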
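Variables can be added or removed iteratively to optimize a selection criterion such as AIC. Note that the scikit-learn example here performs forward selection scored by cross-validated negative mean squared error rather than AIC: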
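from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# LinearRegression fits its own intercept, so drop the statsmodels constant column
features = X.drop(columns='const')
selector = SequentialFeatureSelector(
    estimator=LinearRegression(),
    direction='forward',
    scoring='neg_mean_squared_error'
)
selector.fit(features, y)
print(features.columns[selector.get_support()])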
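For selection that actually targets AIC, a greedy forward search is easy to write by hand. The sketch below is one possible implementation, not a library routine: `aic` and `forward_select_aic` are hypothetical helper names, the data are synthetic, and AIC is computed for a Gaussian linear model up to an additive constant as `n * ln(SSR / n) + 2k`.

```python
import numpy as np

def aic(X, y):
    """AIC of an OLS fit (Gaussian errors, additive constant dropped)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    return n * np.log(ssr / n) + 2 * X.shape[1]

def forward_select_aic(features, y):
    """Greedily add the column that lowers AIC most; stop when none does."""
    n = len(y)
    const = np.ones((n, 1))
    selected = []
    remaining = list(range(features.shape[1]))
    best = aic(const, y)                      # intercept-only baseline
    while remaining:
        scores = {
            j: aic(np.hstack([const, features[:, selected + [j]]]), y)
            for j in remaining
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:            # no candidate improves AIC
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

# Synthetic example: only columns 0 and 1 drive y; columns 2 and 3 are noise
rng = np.random.default_rng(3)
n = 400
features = rng.normal(size=(n, 4))
y = 2.0 * features[:, 0] - 1.0 * features[:, 1] + rng.normal(scale=0.5, size=n)

selected, best_aic = forward_select_aic(features, y)
print("Selected columns (in order added):", selected)
print("Final AIC:", best_aic)
```

The `2k` penalty is what keeps the search from adding every column: a new variable is accepted only if it reduces the residual sum of squares by enough to offset the extra parameter.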
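# Reference code skeleton for the exercise
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumed split: keep time order and hold out the last 20% of rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Initialize the model
lr = LinearRegression()
# Fit on the training data
lr.fit(X_train, y_train)
# Predict on the test data
predictions = lr.predict(X_test)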
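Tip: in financial applications, analysis is usually run on returns rather than raw prices. Simple returns can be computed with returns = data.pct_change().dropna(); for log returns (log differences), use np.log(data).diff().dropna() instead.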
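The two return definitions are closely related, as the short sketch below shows. The price table is made-up illustrative data with hypothetical column names `A` and `B`.

```python
import numpy as np
import pandas as pd

# Made-up prices for two hypothetical assets
prices = pd.DataFrame({
    "A": [100.0, 102.0, 101.0, 105.0],
    "B": [50.0, 49.0, 51.0, 52.0],
})

# Simple returns: P_t / P_{t-1} - 1
simple_returns = prices.pct_change().dropna()

# Log returns: ln(P_t) - ln(P_{t-1})
log_returns = np.log(prices).diff().dropna()

print(simple_returns)
print(log_returns)

# The two are linked by simple = exp(log) - 1
print(np.allclose(np.exp(log_returns.to_numpy()) - 1, simple_returns.to_numpy()))
```

For small daily moves the two series are nearly identical; log returns have the convenience of adding up across time periods.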
In this tutorial, you have learned:
Suggested next steps for further study: