Price matching between the HS300 Index and its constituent stocks


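If you watch the HS300 (CSI 300) index close, you will notice that the closing prices of most individual constituents do not move in step with the index. The program below offers one way to find the constituents whose closing prices match the HS300 closing level relatively well. The main idea is to first estimate the linear coefficient between the index close and each stock's close, then compute the residuals of that linear relationship and run an ADF (Augmented Dickey-Fuller) test on them. If the residuals are stationary, the stock's price path stays close to that of the HS300 index.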


The index and stock data come from tushare: http://tushare.org/trading.html

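The sample period is 2014-10-1 to 2017-10-1. Only 15 stocks were found whose price trend is close to the HS300 index. The codes of these highly synchronized stocks are: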

['002024', '600038', '002415', '601333', '000060', '600406', '600332', '000540', '000402', '601988', '601998', '601328', '601111', '600048', '000069']

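Three of the price charts are shown below. In some of them the size of the rises and falls does not match the index particularly well; the third chart in particular matches poorly over some periods.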

[Chart 1: Price matching between the HS300 Index and its constituent stocks]

[Chart 2: Price matching between the HS300 Index and its constituent stocks]

[Chart 3: Price matching between the HS300 Index and its constituent stocks]
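The charts plot the index and the stock on separate axes, so how closely the sizes of the moves match is judged by eye. One possible way to check this more directly, sketched here as an assumption rather than part of the original method, is to rebase both closes to 1.0 on the first date and to correlate their daily percentage changes. The helper name `amplitude_check` and the toy prices are hypothetical.

```python
import pandas as pd


def amplitude_check(df, stock, index='hs300'):
    """Hypothetical helper: rebase both closes to 1.0 and correlate daily returns.

    df -- DataFrame of closes sorted by ascending date, one column per code.
    Returns (rebased DataFrame, correlation of daily percentage changes).
    """
    closes = df[[stock, index]]
    rebased = closes / closes.iloc[0]  # every column now starts at 1.0
    returns = closes.pct_change().dropna()  # daily percentage changes
    return rebased, returns[stock].corr(returns[index])


# Made-up closes in which the stock moves by the same percentages as the index
prices = pd.DataFrame(
    {"600048": [10.0, 11.0, 9.0, 9.9], "hs300": [3000.0, 3300.0, 2700.0, 2970.0]},
    index=pd.date_range("2017-09-25", periods=4, freq="D"),
)
rebased, corr = amplitude_check(prices, "600048")
print(rebased)
print("daily-return correlation: %.3f" % corr)
```

With rebased series, both lines can share one axis, so a stock whose moves are consistently larger or smaller than the index's stands out immediately.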

Here is the Python program:

# -*- coding: utf-8 -*-
'''
Version: V17.1.0
Date: 2017-11-5
@Author: Cheney
'''

# Fetch data from tushare and, over 2014.10-2017.10, find the HS300 constituents
# whose closing-price trend is similar to the HS300 index

# Part I
import os
import datetime
import traceback

import numpy as np
import pandas as pd
import tushare as ts
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as sts
import statsmodels.api as sm


t = datetime.datetime.now()
print('Program is starting ... %s' % t)

PLOT_DIR = 'hs300index_pair_stock_plot'


def plot_price_relation(df, start, end, st_a, st_b='hs300'):
    '''
    Draw the HS300 Index vs. stock closing-price plot
    df -- DataFrame, index is date, columns are stock and hs300 index close
    start and end -- start and end date ('YYYY-MM-DD') of the comparison window
    st_a, st_b -- stock code and hs300 code or label
    '''
    df = df.loc[start:end]
    # sharex=True aligns the dates and hides the tick labels of the top panel
    fig, (ax, bx) = plt.subplots(nrows=2, sharex=True)
    ax.plot(df.index, df[st_b], label=st_b, c='g')
    ax.set_title("%s index and stock %s daily prices relation" % (st_b, st_a))
    ax.set_ylabel("HS300 Index")
    ax.grid(True)
    ax.legend(loc='best')

    bx.plot(df.index, df[st_a], label=st_a, c='b')
    bx.set_xlabel("Year/Month")
    bx.set_ylabel("Stock Price")
    bx.grid(True)
    bx.legend(loc='best')
    fig.autofmt_xdate()
    # Save figures in a folder (created if missing); use plt.show() to display instead
    os.makedirs(PLOT_DIR, exist_ok=True)
    fig.savefig(os.path.join(PLOT_DIR, '%s+%s.png' % (st_b, st_a)))
    plt.close(fig)  # free memory when plotting many pairs


def get_df_close(stocka, stockb, start=None, end=None):
    # Download both securities and return their aligned daily closes as a DataFrame
    # stocka, stockb -- stock or index codes, like '600036' or 'hs300'
    # start, end -- optional 'YYYY-MM-DD' bounds passed to tushare
    sta = ts.get_hist_data(stocka, start=start, end=end)
    stb = ts.get_hist_data(stockb, start=start, end=end)
    # Keep only the close columns, aligned on the trading date
    df = pd.concat([sta['close'], stb['close']], axis=1, keys=[stocka, stockb])
    # tushare returns newest-first string dates: sort ascending before forward-filling
    # (e.g. suspension days), then drop rows that are still missing
    df.index = pd.to_datetime(df.index)
    df = df.sort_index().ffill().dropna()
    return df


# Part II
if __name__ == "__main__":
    start = '2014-10-01'
    end = '2017-10-01'

    # Get the HS300 constituent code list
    hs_name = 'hs300'
    hs = ts.get_hs300s()
    hs_list = hs['code']

    stockADF = {}
    for code in hs_list:
        try:
            # Get the stock and hs300 index close data
            df = get_df_close(code, hs_name, start, end)
            # Fit index = a + b * stock by OLS, with an intercept
            x = sm.add_constant(df[code])
            res = sm.OLS(df[hs_name], x).fit()
            betaCoef = res.params[code]
            if np.isnan(betaCoef):
                raise ValueError('OLS slope is NaN')
        except Exception:
            print("Can't fit the linear model for stock %s and %s" % (code, hs_name))
            traceback.print_exc()
            continue

        # ADF test on the regression residuals (index - a - b * stock)
        df['res'] = res.resid
        tempStockADF = sts.adfuller(df['res'])
        # Save the ADF statistic, its 1% critical value and the data for plotting
        stockADF[code] = (tempStockADF[0], tempStockADF[4]['1%'], df)

    # An ADF statistic below the 1% critical value suggests stationary residuals
    for code, (adf_stat, crit_1pct, df) in stockADF.items():
        if adf_stat < crit_1pct:
            print("The best pairs stocks %s+%s, ADF value %s and 1%% critical value %s"
                  % (code, hs_name, adf_stat, crit_1pct))
            plot_price_relation(df, start, end, code, hs_name)

    print('Program total running time is %s' % (datetime.datetime.now() - t))

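The above is a small piece of knowledge I have built up while learning quantitative trading. If there are shortcomings, I would welcome guidance from more experienced practitioners.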