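Exercises

Exercise 1: Temperature Conversion

Given paired Fahrenheit and Celsius temperature values, fit a linear regression model relating the two.

Data: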

fahrenheit = [-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32, 122, 212, 302, 392, 482, 572, 662, 752, 842, 932]
celsius = [-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

Answer template:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

def linreg(X, Y):
    # Add an intercept column, then fit ordinary least squares
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()
    # Evaluate the fitted line on an even grid spanning the observed X range
    X_values = np.linspace(min(X[:,1]), max(X[:,1]), 100)
    Y_pred = model.params[0] + model.params[1] * X_values
    # Overlay the fitted line (red) on the raw data
    plt.scatter(X[:,1], Y, alpha=0.3)
    plt.plot(X_values, Y_pred, 'r')
    plt.xlabel('Celsius')
    plt.ylabel('Fahrenheit')
    return model.summary()

linreg(celsius, fahrenheit)
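
As a cross-check on the statsmodels output, the slope and intercept of a one-variable least-squares fit can also be computed in closed form: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The sketch below applies those formulas to the data above using only the standard library. The helper name `closed_form_fit` is hypothetical and not part of the exercise.

```python
def closed_form_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and centered sum of squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Same data as in the exercise
fahrenheit = [-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32,
              122, 212, 302, 392, 482, 572, 662, 752, 842, 932]
celsius = [-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0,
           50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

intercept, slope = closed_form_fit(celsius, fahrenheit)
print(f'intercept = {intercept:.4f}, slope = {slope:.4f}')
```

The result differs slightly from the exact conversion F = 1.8 C + 32 because of the data itself: the list pairs -100 °C with -144 °F, whereas the exact conversion gives -148 °F.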

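Exercise 2: Confidence Intervals

a. Visualizing the confidence interval

Use seaborn to plot the regression line together with its 95% confidence band.

Data: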

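import yfinance as yf

start = '2014-01-01'
end = '2015-01-01'
# Recent yfinance releases may return a one-column DataFrame here; squeeze() reduces it to a Series
ko = yf.download('KO', start=start, end=end)['Close'].squeeze()
pep = yf.download('PEP', start=start, end=end)['Close'].squeeze()
# Daily simple returns; the first row is NaN and is dropped
returns_ko = ko.pct_change().dropna()
returns_pep = pep.pct_change().dropna()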

Answer template:

import seaborn as sns

sns.regplot(x=returns_ko, y=returns_pep, ci=95)
plt.xlabel('KO Returns')
plt.ylabel('PEP Returns')
plt.show()

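b. Computing parameter confidence intervals

Compute the 95% confidence interval for the regression parameters by hand.

Answer template: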

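import numpy as np
import scipy.stats

model = sm.OLS(returns_pep, sm.add_constant(returns_ko)).fit()
# Rows ordered (constant, slope) so the indices match the parameter order from sm.add_constant
X = np.vstack([np.ones(len(returns_ko)), np.asarray(returns_ko).ravel()])
# Parameter covariance: residual variance times the inverse Gram matrix (X is 2 x n here)
C = np.linalg.inv(X @ X.T) * model.mse_resid
SE = np.sqrt(np.diag(C))

# Residual degrees of freedom: n minus the two estimated parameters
dof = model.nobs - model.df_model - 1
t_crit = scipy.stats.t.ppf(0.975, dof)

beta = model.params.iloc[1]  # slope on KO returns
se_beta = SE[1]              # standard error of that slope
print(f'95% confidence interval for beta: ({beta - t_crit*se_beta:.3f}, {beta + t_crit*se_beta:.3f})')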

Exercise 3: Computing R² by Hand

Using the model from Exercise 1, compute the coefficient of determination R² by hand.

Answer template:

# Intercept and slope taken from the Exercise 1 fit
y_pred = 32.1905 + 1.7998 * np.array(celsius)
ss_res = sum((np.array(fahrenheit) - y_pred)**2)  # residual sum of squares
ss_tot = sum((np.array(fahrenheit) - np.mean(fahrenheit))**2)  # total sum of squares
r_squared = 1 - (ss_res / ss_tot)
print(f'R²: {r_squared:.6f}')
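
For a regression with a single predictor and an intercept, R² equals the squared Pearson correlation between the two variables, which gives an independent check on the sums-of-squares calculation. The sketch below is one possible way to verify this with NumPy alone; it refits the line with `np.polyfit` rather than reusing rounded coefficients.

```python
import numpy as np

celsius = np.array([-500, -450, -400, -350, -300, -250, -200, -150, -100, -50, 0,
                    50, 100, 150, 200, 250, 300, 350, 400, 450, 500], dtype=float)
fahrenheit = np.array([-868, -778, -688, -598, -508, -418, -328, -238, -144, -58, 32,
                       122, 212, 302, 392, 482, 572, 662, 752, 842, 932], dtype=float)

# np.polyfit returns coefficients from the highest degree down: (slope, intercept)
slope, intercept = np.polyfit(celsius, fahrenheit, 1)

y_pred = intercept + slope * celsius
ss_res = np.sum((fahrenheit - y_pred) ** 2)             # unexplained variation
ss_tot = np.sum((fahrenheit - fahrenheit.mean()) ** 2)  # total variation
r_squared = 1 - ss_res / ss_tot

# Squared Pearson correlation between predictor and response
r_corr_sq = np.corrcoef(celsius, fahrenheit)[0, 1] ** 2

print(f'R² from sums of squares: {r_squared:.8f}')
print(f'Squared correlation:     {r_corr_sq:.8f}')
```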

Exercise 4: Residual Analysis

a. Residual analysis, case 1

Analyze the residuals from a regression of SPY returns on GS returns.

Data:

# squeeze() guards against yfinance returning a one-column DataFrame instead of a Series
spy = yf.download('SPY', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
gs = yf.download('GS', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
returns_spy = spy.pct_change().dropna()
returns_gs = gs.pct_change().dropna()

Answer template:

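from statsmodels.stats.diagnostic import het_breuschpagan

model = sm.OLS(returns_spy, sm.add_constant(returns_gs)).fit()
# Residuals vs. fitted values: a structureless cloud is consistent with constant variance
plt.scatter(model.predict(), model.resid)
plt.xlabel('Predicted Returns')
plt.ylabel('Residuals')
plt.title('Residual Scatter Plot')
plt.show()

# Breusch-Pagan test; the second return value is the LM-test p-value
_, p_val, _, _ = het_breuschpagan(model.resid, model.model.exog)
print(f'Breusch-Pagan p-value: {p_val:.3f}')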

b. Residual analysis, case 2

Analyze the residual pattern from a regression of SPY returns on XLF returns.

Data:

# squeeze() guards against yfinance returning a one-column DataFrame instead of a Series
xlf = yf.download('XLF', start='2005-01-01', end='2010-01-01')['Close'].squeeze()
returns_xlf = xlf.pct_change().dropna()

Answer template:

model = sm.OLS(returns_spy, sm.add_constant(returns_xlf)).fit()
# Look for structure (fanning, curvature, clusters) in residuals vs. fitted values
plt.scatter(model.predict(), model.resid)
plt.xlabel('Predicted Returns')
plt.ylabel('Residuals')
plt.title('Non-Random Residual Pattern')
plt.show()

_, p_val, _, _ = het_breuschpagan(model.resid, model.model.exog)
print(f'Heteroskedasticity p-value: {p_val:.4f}')