Lecture 43: Integration, Cointegration, and Stationarity


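I. Stationarity and Non-Stationarity

Stationarity is a core concept in time series analysis. A stationary series has statistical properties (mean, variance, autocorrelation) that do not change over time. Formally, strict stationarity requires the joint distribution of any set of observations to be unchanged by a shift in time, while weak (covariance) stationarity only requires the first and second moments to be time-invariant: a constant mean, a constant finite variance, and an autocovariance that depends only on the lag.

The statistical properties of a non-stationary series change over time. Common types include:

Trend non-stationarity (e.g. a steadily rising stock price)

Seasonal non-stationarity (e.g. temperature)

Series with structural breaks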

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Generate a stationary example: Gaussian white noise
np.random.seed(42)
stationary_series = pd.Series(np.random.normal(0, 1, 100), name='stationary')

# Generate a non-stationary example: linear trend plus noise
trend = np.linspace(0, 5, 100)
non_stationary_series = pd.Series(trend + np.random.normal(0, 1, 100), name='non_stationary')

# Plot the two series for comparison
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
stationary_series.plot(ax=ax[0], title='Stationary series')
non_stationary_series.plot(ax=ax[1], title='Non-stationary series')
plt.tight_layout()
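Each type of non-stationarity listed above can be simulated in a few lines. The sketch below is purely illustrative: the series, the seed, the 12-step cycle, and the helper `half_means` are assumptions made for this example and are not part of the lecture's data. It compares the mean over the first and second half of each series, and for the seasonal series the mean at each position in the cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
t = np.arange(n)
noise = rng.normal(0, 1, n)

# One simulated series per type (all parameters are illustrative assumptions)
series = {
    'white noise': noise,
    'trend': 0.02 * t + noise,
    'seasonal': 3 * np.sin(2 * np.pi * t / 12) + noise,
    'structural break': np.where(t < n // 2, 0.0, 4.0) + noise,
}

def half_means(x):
    """Return the mean of the first half and of the second half of x."""
    half = len(x) // 2
    return x[:half].mean(), x[half:].mean()

for name, x in series.items():
    first, second = half_means(x)
    print(f'{name:>16}: first-half mean {first:6.2f}, second-half mean {second:6.2f}')

# A seasonal series can have equal half-sample means while its mean
# still depends on the position within the cycle
seasonal = series['seasonal']
phase_means = np.array([seasonal[t % 12 == k].mean() for k in range(12)])
print('seasonal mean by position in the 12-step cycle:', np.round(phase_means, 2))
```

For white noise the two half-sample means are close; for the trend and break series they differ by roughly the size of the drift or jump, which is the sense in which the mean "changes over time".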

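1.1 How the ADF Test Works

The Augmented Dickey-Fuller (ADF) test checks for the presence of a unit root:

Null hypothesis (H0): the series has a unit root (non-stationary)

Alternative hypothesis (H1): the series has no unit root (stationary)

The ADF regression equation is:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t$$

where:

$\Delta y_t$ is the first difference of the series

$\alpha$ is the constant term

$\beta t$ is the time trend term

$\gamma y_{t-1}$ is the lagged level term; the test is on $\gamma$, with H0: $\gamma = 0$ against H1: $\gamma < 0$

$\sum_{i=1}^p \delta_i \Delta y_{t-i}$ is the sum of lagged difference terms

$\epsilon_t$ is the random error term

$p$ is the lag order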
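To make the regression above concrete, here is a minimal sketch that fits it by ordinary least squares with NumPy and reports the estimate of $\gamma$ and its t-statistic. The function name `adf_regression`, the default `p=1`, and the simulated inputs are assumptions for illustration; this is not how `statsmodels` implements the test, and the t-statistic has to be compared against Dickey-Fuller critical values rather than the usual Student-t table.

```python
import numpy as np

def adf_regression(y, p=1):
    """OLS fit of dy_t = a + b*t + g*y_{t-1} + sum_i d_i*dy_{t-i} + e_t.

    Returns (g_hat, t_stat) for the coefficient on the lagged level.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                       # usable observations
    target = dy[p:]                       # left-hand side: dy_t
    cols = [np.ones(n),                   # constant
            np.arange(n, dtype=float),    # time trend
            y[p:-1]]                      # lagged level y_{t-1}
    for i in range(1, p + 1):
        cols.append(dy[p - i:len(dy) - i])    # lagged difference dy_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    gamma = coef[2]
    return gamma, gamma / np.sqrt(cov[2, 2])

rng = np.random.default_rng(1)
noise = rng.normal(0, 1, 500)             # stationary: gamma should be near -1
walk = np.cumsum(rng.normal(0, 1, 500))   # unit root: gamma should be near 0

gamma_noise, t_noise = adf_regression(noise)
gamma_walk, t_walk = adf_regression(walk)
print(f'white noise: gamma = {gamma_noise:.3f}, t = {t_noise:.2f}')
print(f'random walk: gamma = {gamma_walk:.3f}, t = {t_walk:.2f}')
```

A strongly negative t-statistic is evidence against a unit root; a value near zero is what a random walk typically produces.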

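def adf_test(series, signif=0.05):
    # adfuller defaults to regression='c' (constant only);
    # pass regression='ct' to include the time trend term from the equation above
    result = adfuller(series, autolag='AIC')
    print(f'ADF statistic: {result[0]:.3f}')
    print(f'p-value: {result[1]:.3f}')
    print('Critical values:')
    for k, v in result[4].items():
        print(f'  {k}: {v:.3f}')
    # Failing to reject H0 does not prove a unit root; the data just cannot rule one out
    if result[1] > signif:
        print('Conclusion: cannot reject a unit root (treat as non-stationary)')
    else:
        print('Conclusion: unit root rejected (treat as stationary)')

adf_test(stationary_series)
adf_test(non_stationary_series)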

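1.2 Hands-On Practice

Try testing the S&P 500 index for stationarity:

# Download S&P 500 index data
# .squeeze() turns a one-column DataFrame into a Series, since some yfinance
# versions return a DataFrame even for a single ticker
sp500 = yf.download('^GSPC', start='2010-01-01', end='2020-01-01')['Close'].squeeze().dropna()

# Compute daily returns from prices
returns = sp500.pct_change().dropna()

# Plot the price series and the return series
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
sp500.plot(ax=ax[0], title='S&P 500 price series')
returns.plot(ax=ax[1], title='S&P 500 return series')
plt.tight_layout()

# Run the ADF test
print("Price series test result:")
adf_test(sp500)

print("\nReturn series test result:")
adf_test(returns)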

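II. Order of Integration

I(0): a stationary series; no differencing is needed

I(1): a series that becomes stationary after one difference (e.g. a random walk)

I(d): a series that must be differenced d times to become stationary

In mathematical form:

$$y_t \sim I(d) \Leftrightarrow \Delta^d y_t \sim I(0)$$

where:

$\Delta^d$ is the d-th order difference operator

$\sim$ means the series has the stated order of integration

$\Leftrightarrow$ denotes a necessary and sufficient condition

In words: a time series $y_t$ is integrated of order d if and only if differencing it d times, and no fewer, yields a stationary series.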
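The relationship between integration and differencing can be checked directly on simulated data. In this sketch (the seed, sample size, and variable names are assumptions for illustration), white noise is cumulated once to build an I(1) random walk and twice to build an I(2) series; differencing the same number of times recovers the original shocks.

```python
import numpy as np

rng = np.random.default_rng(0)
shocks = rng.normal(0, 1, 1000)   # I(0): white noise
walk = np.cumsum(shocks)          # I(1): one cumulative sum of the shocks
double = np.cumsum(walk)          # I(2): two cumulative sums of the shocks

# One difference of the walk gives back the shocks (the first value is lost)
print(np.allclose(np.diff(walk), shocks[1:]))           # True

# Two differences of the I(2) series give back the shocks (two values lost)
print(np.allclose(np.diff(double, n=2), shocks[2:]))    # True
```

Differencing is therefore the inverse of the cumulative sum that created the unit root, which is why an I(d) series needs exactly d differences.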

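2.1 Procedure for Determining the Order

1. Run the ADF test on the original series

2. If the null hypothesis is rejected, the series is I(0)

3. Otherwise, take the first difference and test again, repeating until the series is stationary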

def determine_integration_order(series, max_order=3, signif=0.05):
    current_order = 0
    current_series = series.copy()

    # Difference repeatedly until the ADF test rejects a unit root
    while current_order <= max_order:
        p_value = adfuller(current_series.dropna())[1]
        if p_value < signif:
            print(f'Series is stationary after {current_order} difference(s)')
            return current_order
        current_series = current_series.diff().dropna()
        current_order += 1
    print(f'Series is still not stationary after {max_order} difference(s)')
    return None

# Test the non-stationary series
determine_integration_order(non_stationary_series)

# Test the stationary series
determine_integration_order(stationary_series)

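2.2 Characteristics of Financial Data

Financial price series are typically I(1), while return series are typically I(0):

# Download Apple stock prices
# .squeeze() turns a one-column DataFrame into a Series if yfinance returns one
aapl = yf.download('AAPL', start='2015-01-01', end='2020-01-01')['Close'].squeeze().dropna()

# Determine the order of integration
print("Order of integration of the price series:")
determine_integration_order(aapl)

print("\nOrder of integration of the return series:")
determine_integration_order(aapl.pct_change().dropna())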

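III. Cointegration Analysis

The mathematical definition of cointegration:

$$X_t \sim I(1),\ Y_t \sim I(1),\ \text{and } \exists\, \beta \text{ such that } Y_t - \beta X_t \sim I(0)$$

The cointegrating regression is:

$$Y_t = \alpha + \beta X_t + \epsilon_t$$

where:

$X_t$ and $Y_t$ are the time series under test

$\alpha$ is the constant term

$\beta$ is the cointegration coefficient

$\epsilon_t$ is the residual term

$I(1)$ denotes a series integrated of order one

$I(0)$ denotes a stationary series

$\sim$ means the series has the stated order of integration

$\exists$ means "there exists"

The test needs to verify that:

$X_t$ and $Y_t$ are both $I(1)$ series

The residual $\epsilon_t$ is an $I(0)$ series

3.1 Cointegration Test Steps

1. Verify that both series are I(1)

2. Fit the regression model: $Y_t = \alpha + \beta X_t + \epsilon_t$

3. Test the residual series $\epsilon_t$ for stationarity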

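from statsmodels.tsa.stattools import coint

# Example pair: two highly correlated ETFs
symbols = ['SPY', 'IVV']
data = yf.download(symbols, start='2015-01-01', end='2020-01-01')['Close'].dropna()
x, y = data[symbols[0]], data[symbols[1]]

# Engle-Granger cointegration test (the first argument is the dependent series)
score, pvalue, _ = coint(y, x)
print(f'Cointegration test p-value: {pvalue:.4f}')
if pvalue < 0.05:
    print("Null hypothesis rejected: the series appear cointegrated")
else:
    print("No significant cointegration found")

# Estimate beta by an OLS regression of y on x rather than hard-coding it
beta, alpha = np.polyfit(x, y, 1)
spread = y - beta * x

# Plot the price series and the spread
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
data.plot(ax=ax[0], title='Price series')
spread.plot(ax=ax[1], title='Spread')
plt.tight_layout()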

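3.2 Basics of a Pairs Trading Strategy

1. Find a cointegrated pair of assets

2. Compute the historical mean and standard deviation of the spread

3. Open a position when the spread deviates from its mean by more than a threshold

4. Close the position when the spread reverts to its mean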

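# Compute the Z-score of the spread
# Note: full-sample mean and std use future data, so this is for illustration, not live trading
mean_spread = spread.mean()
std_spread = spread.std()
zscore = (spread - mean_spread) / std_spread

# Plot the Z-score with the trading thresholds
plt.figure(figsize=(10, 4))
zscore.plot(label='Z-score')
plt.axhline(2, color='r', linestyle='--')
plt.axhline(-2, color='g', linestyle='--')
plt.legend()
plt.title('Spread Z-score and trading signals')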
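The open and close rules from the strategy steps can be turned into a position series with a small state machine. This is a minimal sketch under assumed rules: enter when |Z| exceeds 2, exit when Z crosses back through 0. The function name `zscore_positions` and both thresholds are hypothetical choices for illustration, not values prescribed by the lecture.

```python
import pandas as pd

def zscore_positions(zscore, entry=2.0, exit_level=0.0):
    """Map a Z-score series to positions: +1 long the spread, -1 short, 0 flat."""
    position = 0
    out = []
    for z in zscore:
        if position == 0:
            if z > entry:
                position = -1          # spread is rich: short it
            elif z < -entry:
                position = 1           # spread is cheap: go long
        elif position == -1 and z <= exit_level:
            position = 0               # short closed once the spread has reverted
        elif position == 1 and z >= -exit_level:
            position = 0               # long closed once the spread has reverted
        out.append(position)
    return pd.Series(out, index=zscore.index, name='position')

demo = pd.Series([0.5, 2.3, 1.0, -0.1, -2.5, -1.0, 0.2, 0.0])
positions = zscore_positions(demo)
print(positions.tolist())   # [0, -1, -1, 0, 1, 1, 0, 0]
```

Holding the position between entry and exit, rather than trading only on the bars where a threshold is crossed, is what separates a signal plot from an actual position path.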

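Advanced Comparison

Test	Use case	Key assumption	How to read the output
ADF test	Stationarity of a single series	Linear trend, no structural breaks	p < 0.05 rejects the null of non-stationarity
KPSS test	Complementary stationarity check	Trend stationarity	p < 0.05 rejects the null of stationarity
Johansen test	Cointegration among multiple variables	Several cointegrating relations are possible	Trace statistic determines the number of cointegrating vectors
Engle-Granger	Cointegration between two variables	A single cointegrating relation	Stationary residuals indicate cointegration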

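Summary of Key Formulas

ADF test model:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^p \delta_i \Delta y_{t-i} + \epsilon_t$$

Cointegrating regression:

$$Y_t = \alpha + \beta X_t + \epsilon_t$$

Spread Z-score:

$$Z_t = \frac{\text{spread}_t - \mu_{\text{spread}}}{\sigma_{\text{spread}}}$$

where:

$\text{spread}_t$ is the spread at time t

$\mu_{\text{spread}}$ is the mean of the spread

$\sigma_{\text{spread}}$ is the standard deviation of the spread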
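When the mean and standard deviation in the Z-score formula are taken over the whole sample, each value of $Z_t$ depends on data that was not yet available at time t. One common alternative, sketched here as an assumption rather than the lecture's method, is to estimate both over a trailing window. The function name `rolling_zscore` and the window length are hypothetical.

```python
import pandas as pd

def rolling_zscore(spread, window=60):
    """Z-score using the mean and std of the trailing `window` observations."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()     # sample std (ddof=1), as in Series.std()
    return (spread - mean) / std

demo = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
z = rolling_zscore(demo, window=3)
print(z.round(4).tolist())
```

The first `window - 1` values are NaN because the window is not yet full. In the demo, the third value is exactly 1.0: the window is [1, 2, 3], with mean 2 and sample standard deviation 1.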

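Comprehensive Exercise

1. Choose three sector ETFs (e.g. XLE for energy, XLF for financials, XLV for health care)

2. Test each pair of ETFs for cointegration

3. Compute the spread series for the pairs that are cointegrated

4. Design a simple mean-reversion trading strategy

5. Plot the strategy signals and the theoretical equity curve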

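# Example code skeleton
etfs = ['XLE', 'XLF', 'XLV']
data = yf.download(etfs, start='2015-01-01')['Close']

# Loop over every pair of ETFs
from itertools import combinations

for pair in combinations(etfs, 2):
    # Run the cointegration test
    # Compute the spread
    # Generate trading signals
    # Backtest a simple strategy
    pass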
