Given Fahrenheit and Celsius temperature values, fit a linear regression model relating the two.
Data preparation:
fahrenheit = [-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32, 122, 212, 302, 392, 482, 572, 662, 752, 842, 932]
celsius = [-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
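Answer skeleton:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

def linreg(X, Y):
    # Add an intercept column, then fit ordinary least squares
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()
    # Evaluate the fitted line on an even grid spanning the observed X range
    X_values = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
    Y_pred = model.params[0] + model.params[1] * X_values
    # Plot the raw observations and the fitted line
    plt.scatter(X[:, 1], Y, alpha=0.3)
    plt.plot(X_values, Y_pred, 'r')
    plt.xlabel('Celsius')
    plt.ylabel('Fahrenheit')
    return model.summary()

linreg(celsius, fahrenheit)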
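As a cross-check on the statsmodels fit, the intercept and slope of a one-variable regression can also be computed in closed form (slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x)). The sketch below uses only the standard library; the helper name `simple_ols` is hypothetical and not part of the exercise.

```python
# Temperature data copied from the exercise above.
fahrenheit = [-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32,
              122, 212, 302, 392, 482, 572, 662, 752, 842, 932]
celsius = [-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0,
           50, 100, 150, 200, 250, 300, 350, 400, 450, 500]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered cross-product and centered sum of squares
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = simple_ols(celsius, fahrenheit)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
# prints: intercept = 32.1905, slope = 1.7998
```

The estimates sit slightly off the exact conversion F = 1.8 C + 32 because the data lists -144 at -100 °C, whereas the exact conversion gives -148.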
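Use seaborn to plot the regression line together with its 95% confidence interval.
Data preparation:
import yfinance as yf
start = '2014-01-01'
end = '2015-01-01'
# Daily closing prices. Some yfinance versions return a one-column DataFrame
# here; .squeeze() reduces it to a Series and leaves a Series unchanged.
ko = yf.download('KO', start=start, end=end)['Close'].squeeze()
pep = yf.download('PEP', start=start, end=end)['Close'].squeeze()
# Simple daily returns; the first row is NaN and is dropped
returns_ko = ko.pct_change().dropna()
returns_pep = pep.pct_change().dropna()
Answer skeleton: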
import seaborn as sns
sns.regplot(x=returns_ko, y=returns_pep, ci=95)
plt.xlabel('KO Returns')
plt.ylabel('PEP Returns')
plt.show()
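To my understanding, seaborn's `regplot` estimates its confidence band by bootstrap resampling rather than from a closed-form formula. The sketch below applies the same idea to the slope alone, using only the standard library and synthetic data. The helper names, the seeds, and the generated returns are all assumptions for illustration; they are not KO/PEP data and not seaborn's actual implementation.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_mean - slope * x_mean, slope


def bootstrap_slope_ci(x, y, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the slope (resampling (x, y) pairs)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        slopes.append(fit_line([x[i] for i in idx], [y[i] for i in idx])[1])
    slopes.sort()
    k = round(n_boot * (100 - ci) / 200)  # number of draws cut from each tail
    return slopes[k], slopes[n_boot - 1 - k]


# Synthetic "returns": true slope 0.5 plus Gaussian noise (illustrative only)
data_rng = random.Random(42)
x = [data_rng.gauss(0, 0.01) for _ in range(250)]
y = [0.5 * xi + data_rng.gauss(0, 0.005) for xi in x]

_, slope_hat = fit_line(x, y)
lower, upper = bootstrap_slope_ci(x, y)
print(f"slope = {slope_hat:.3f}, 95% bootstrap CI = ({lower:.3f}, {upper:.3f})")
```

Resampling whole (x, y) pairs keeps each observation's noise attached to its regressor value, which is why the interval stays meaningful even if the noise variance is not constant.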
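Manually compute the 95% confidence interval for the regression parameter.
Answer skeleton:
import numpy as np
import scipy.stats
model = sm.OLS(returns_pep, sm.add_constant(returns_ko)).fit()
# One row per regressor, in the same order as add_constant: [const, KO returns]
X = np.vstack([np.ones(len(returns_ko)), returns_ko])
# Coefficient covariance matrix: residual variance times (X X^T)^{-1}
C = np.linalg.inv(X @ X.T) * model.mse_resid
SE = np.sqrt(np.diag(C))
# Residual degrees of freedom: n minus the number of estimated parameters
dof = model.nobs - model.df_model - 1
t_crit = scipy.stats.t.ppf(0.975, dof)
# Index 1 is the slope on KO returns in both params and SE (index 0 is the constant)
beta = model.params.iloc[1]
se_beta = SE[1]
print(f'Beta confidence interval: ({beta - t_crit*se_beta:.3f}, {beta + t_crit*se_beta:.3f})')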
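The computation above follows the standard OLS interval formulas, written out here for reference. The symbols are assumptions of this note: $X$ is the $n \times 2$ design matrix (a column of ones and the KO returns), $\hat{\varepsilon}_i$ are the fitted residuals, and $j$ indexes the coefficients. The code stores the design matrix transposed (one row per regressor), so `X @ X.T` there corresponds to $X^\top X$ here, and `model.mse_resid` corresponds to $\hat{\sigma}^2$.

```latex
\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{\varepsilon}_i^{\,2},
\qquad
\widehat{\operatorname{Cov}}(\hat{\beta}) = \hat{\sigma}^2\,(X^\top X)^{-1},
\qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{\widehat{\operatorname{Cov}}(\hat{\beta})_{jj}}

\hat{\beta}_j \;\pm\; t_{0.975,\;n-2}\,\operatorname{SE}(\hat{\beta}_j)
```

If the manual result is correct, it should agree with the interval statsmodels reports through `model.conf_int()`.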
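Using the model from Exercise 1 (the temperature regression), manually compute the coefficient of determination R².
Answer skeleton:
import numpy as np
# Fitted values from the Exercise 1 coefficients (rounded to four decimals)
y_pred = 32.1905 + 1.7998 * np.array(celsius)
# Residual sum of squares and total sum of squares
ss_res = sum((np.array(fahrenheit) - y_pred)**2)
ss_tot = sum((np.array(fahrenheit) - np.mean(fahrenheit))**2)
r_squared = 1 - (ss_res / ss_tot)
print(f'R² value: {r_squared:.6f}')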
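For a regression with a single explanatory variable and an intercept, R² equals the squared Pearson correlation between the two variables, which gives an independent check on the manual calculation. The sketch below is standard-library only; the helper name `pearson_r` is hypothetical.

```python
import math

# Temperature data copied from Exercise 1.
fahrenheit = [-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32,
              122, 212, 302, 392, 482, 572, 662, 752, 842, 932]
celsius = [-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0,
           50, 100, 150, 200, 250, 300, 350, 400, 450, 500]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Route 1: squared correlation
r2_from_corr = pearson_r(celsius, fahrenheit) ** 2

# Route 2: 1 - SS_res / SS_tot with the rounded Exercise 1 coefficients
y_mean = sum(fahrenheit) / len(fahrenheit)
ss_res = sum((f - (32.1905 + 1.7998 * c)) ** 2 for c, f in zip(celsius, fahrenheit))
ss_tot = sum((f - y_mean) ** 2 for f in fahrenheit)
r2_from_residuals = 1 - ss_res / ss_tot

print(f"from correlation: {r2_from_corr:.8f}")
print(f"from residuals:   {r2_from_residuals:.8f}")
```

The two routes differ only by the rounding of the coefficients to four decimals, so any visible gap between them points to a mistake in one of the calculations.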
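Analyze the residuals of a regression between SPY and GS returns.
Data preparation:
# .squeeze() reduces a one-column DataFrame (returned by some yfinance versions) to a Series
spy = yf.download('SPY', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
gs = yf.download('GS', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
returns_spy = spy.pct_change().dropna()
returns_gs = gs.pct_change().dropna()
Answer skeleton: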
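from statsmodels.stats.diagnostic import het_breuschpagan
# Regress SPY returns on GS returns (with intercept)
model = sm.OLS(returns_spy, sm.add_constant(returns_gs)).fit()
# Residuals against fitted values: a fan or funnel shape suggests heteroskedasticity
plt.scatter(model.predict(), model.resid)
plt.xlabel('Predicted Returns')
plt.ylabel('Residuals')
plt.title('Residual Scatter Plot')
plt.show()
# het_breuschpagan returns (LM statistic, LM p-value, F statistic, F p-value)
_, p_val, _, _ = het_breuschpagan(model.resid, model.model.exog)
print(f'Breusch-Pagan test p-value: {p_val:.3f}')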
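To make the test less of a black box: in its studentized (Koenker) form, the Breusch-Pagan test regresses the squared residuals on the regressors and uses LM = n · R² of that auxiliary regression, which is asymptotically chi-square with one degree of freedom per non-constant regressor. The sketch below implements that form for one regressor with the standard library only. The helper names and the deterministic toy data are assumptions for illustration, and exact agreement with statsmodels' defaults is not guaranteed.

```python
import math


def ols_fit(x, y):
    """Return (intercept, slope, residuals) for the line y = a + b * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, resid


def breusch_pagan_lm(x, resid):
    """Koenker-form LM statistic and p-value for a single regressor x."""
    n = len(x)
    squared = [e * e for e in resid]
    # Auxiliary regression: squared residuals on x (with intercept)
    _, _, aux_resid = ols_fit(x, squared)
    sq_mean = sum(squared) / n
    ss_tot = sum((s - sq_mean) ** 2 for s in squared)
    ss_res = sum(r * r for r in aux_resid)
    lm = max(n * (1 - ss_res / ss_tot), 0.0)
    # Chi-square survival function with 1 degree of freedom
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Toy data whose noise amplitude grows with x (heteroskedastic by construction)
x = list(range(1, 101))
y = [1 + 2 * xi + 0.1 * xi * (-1) ** xi for xi in x]

_, _, resid = ols_fit(x, y)
lm, p_value = breusch_pagan_lm(x, resid)
print(f"LM = {lm:.2f}, p-value = {p_value:.3g}")
```

A small p-value rejects the null hypothesis of constant residual variance, which is the same reading applied to the `p_val` printed in the exercise.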
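Analyze the residual pattern of a regression between SPY and XLF returns.
Data preparation:
# .squeeze() reduces a one-column DataFrame (returned by some yfinance versions) to a Series
xlf = yf.download('XLF', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
returns_xlf = xlf.pct_change().dropna()
Answer skeleton: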
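# Regress SPY returns on XLF returns (with intercept)
model = sm.OLS(returns_spy, sm.add_constant(returns_xlf)).fit()
# Residuals against fitted values: look for structure rather than a shapeless cloud
plt.scatter(model.predict(), model.resid)
plt.xlabel('Predicted Returns')
plt.ylabel('Residuals')
plt.title('Non-random Residual Pattern')
plt.show()
# Second returned value is the p-value of the LM statistic
_, p_val, _, _ = het_breuschpagan(model.resid, model.model.exog)
print(f'Heteroskedasticity p-value: {p_val:.4f}')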