R语言如何估计结构方程模型

薛英杰 / 2020-11-29

论文改完扔出去一个多月了，一直想把估计结构方程模型中踩过的坑说说，但忙于其他事情并未动笔，今天就以论文中结构方程估计为例，为大家介绍一下结构方程的估计和路径图绘制中注意的事项。

我们用结构方程模型联合检验ETF持股对定价效率的影响路径。在我们的模型中，内生变量为价格延迟程度（Delayr）和噪声交易者参与度（NoiseTrade），潜变量为基础资产定价效率，中介变量为交易成本（Iliquidity）、卖空限制（Shortratio）和分析师覆盖（Analaysiscover），外生变量为ETF持股比例（ETFOwnership）。具体模型如下：

测量模型：

\[\ \begin{pmatrix} \ Delayr\\ \ NoiseTrade \end{pmatrix}= \begin{pmatrix} \ \lambda_{11}\\ \ \lambda_{12} \end{pmatrix} Pefficiency+ \begin{pmatrix} \ \delta_{1}\\ \ \delta_{2} \end{pmatrix}\ \]

\[\ \begin{pmatrix} \ Iliquidity\\ \ Shortratio \\ \ Analysiscover \end{pmatrix}= \begin{pmatrix} \ \lambda_{21}\\ \ \lambda_{22} \\ \ \lambda_{32} \end{pmatrix} ETFOwnership+ \begin{pmatrix} \ \delta_{3}\\ \ \delta_{4} \\ \ \delta_{5} \end{pmatrix}\ \]

结构模型：

\[\ Pefficiency=\begin{pmatrix} \ \lambda_{21}\\ \ \lambda_{22} \\ \ \lambda_{32} \end{pmatrix}^T \begin{pmatrix} \ Iliquidity\\ \ Shortratio \\ \ Analysiscover \end{pmatrix}+\epsilon \ \]

在这个模型中，测量方程描述了各观测变量之间以及潜变量与其观测变量的关系，结构方程描述了潜变量与其他观测变量之间的关系。上述模型中，只有定价效率（Pefficiency）为潜变量，其他变量都为可观测变量，潜变量Pefficiency的测量变量分别为价格延迟程度（Delayr）和噪声交易者参与度（NoiseTrade）。我们可以使用R软件lavaan包进行参数估计，并绘制路径图。具体代码如下：

pacman::p_load(lavaan,semPlot)             #加载结构方程估计包和路径图绘制包
setwd("D:\\科研\\开题准备\\analsisold")    #设置数据所在路径
sf<-read.csv("structuredata.csv")          #读入研究数据

tofit.model <-'
Priceeff=~Delay_coef+NoiseTrad1
Priceeff~Illiqd+rqratio+follownum
Illiqd~ETFownership_1
rqratio~ETFownership_1
follownum~ETFownership_1
'
## =~代表潜变与其观测变量的关系  ~表示观测变量之间及潜变量与其他观测变量的关系

tofit.model_m <- sem(tofit.model, sf)      #结构方程模型估计

summary(tofit.model_m,rsquar=TRUE,fit.measures = TRUE) #结构方程模型结果输出

## lavaan 0.6-7 ended normally after 208 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         13
##                                                       
##                                                   Used       Total
##   Number of observations                         97939      137255
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                              3145.772
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             20290.087
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.845
##   Tucker-Lewis Index (TLI)                       0.668
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             656696.778
##   Loglikelihood unrestricted model (H1)     658269.664
##                                                       
##   Akaike (AIC)                            -1313367.556
##   Bayesian (BIC)                          -1313244.159
##   Sample-size adjusted Bayesian (BIC)     -1313285.473
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.068
##   90 Percent confidence interval - lower         0.066
##   90 Percent confidence interval - upper         0.070
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.037
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Priceeff =~                                         
##     Delay_coef        1.000                           
##     NoiseTrad1        0.033    0.014    2.348    0.019
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Priceeff ~                                          
##     Illiqd            0.164    0.028    5.911    0.000
##     rqratio         -11.175    0.265  -42.157    0.000
##     follownum        -1.771    0.231   -7.667    0.000
##   Illiqd ~                                            
##     ETFownership_1   -2.458    0.070  -35.111    0.000
##   rqratio ~                                           
##     ETFownership_1    0.740    0.007  105.300    0.000
##   follownum ~                                         
##     ETFownership_1    0.488    0.008   58.456    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Delay_coef        0.454    0.037   12.325    0.000
##    .NoiseTrad1        0.147    0.001  220.892    0.000
##    .Illiqd            0.005    0.000  221.291    0.000
##    .rqratio           0.000    0.000  221.291    0.000
##    .follownum         0.000    0.000  221.291    0.000
##    .Priceeff         -0.067    0.037   -1.825    0.068
## 
## R-Square:
##                    Estimate
##     Delay_coef       -0.151
##     NoiseTrad1       -0.000
##     Illiqd            0.012
##     rqratio           0.102
##     follownum         0.034
##     Priceeff             NA

结构方程路径图绘制时，应该注意节点布局，图形样式、图片旋转和配色，尤其时节点布局，可能需要根据我们的需要来设置。

节点布局

节点布局有5种样式，分别为：1、tree; 2、circle; 3、spring; 4、tree2; 5、circle2,可以使用参数layout设置，我们分别看一下各种样式

1、tree

semPaths(object = tofit.model_m,
         whatLabels = "std",
         edge.label.cex = 1,
         layout = "tree")

2、circle

semPaths(object = tofit.model_m,
         whatLabels = "std",
         edge.label.cex = 1,
         layout = "circle")

3、spring

semPaths(object = tofit.model_m,
         whatLabels = "std",
         edge.label.cex = 1,
         layout = "spring")

4、tree2

semPaths(object = tofit.model_m,
         whatLabels = "std",
         edge.label.cex = 1,
         layout = "tree2")

5、circle2

semPaths(object = tofit.model_m,
         whatLabels = "std",
         edge.label.cex = 1,
         layout = "circle2")

但这些布局样式都不满足我们的需求时，怎么办？不用怕，有万能的布局设置，一般人还不知道，就需要设置每一个节点的位置，在二维平面内，两个坐标确定唯一的一个节点，因此，7个节点应该用2乘7的矩阵来确定所有节点的位置，具体如下：

ly<-matrix(c(-0.5, 2.5,
             0.5,2.5,
             0.7, 1.5,
             -0.7,1.5,
             0,1.5,
             0,0.8,
             0,2), ncol=2, byrow=TRUE) #设置每个节点的位置

semPaths(tofit.model_m, "path",label.prop = 0.5,residuals=F, fade = FALSE,
         layout =ly,sizeInt = 1, intStyle="single",
         sizeLat = 10,nodeLabels=c("Delayr","NoiseTrde","Iliquidity","Shortratio","Analyscover","ETFOwnership","Pefficency"),
         edge.color="black",rotation=3,whatLabels="est",label.cex=1.8,node.width=1.4,node.height=0.2,
         edge.label.cex=1,sizeMan=8)

图片旋转

参数rotation 可以用来设置图片旋转角度，旋转可选的值有：1，2，3，4，分别是默认、逆时针旋转90°、180°以及270° 在设置节点位置后，旋转参数就不会起作用了

semPaths(tofit.model_m, "path",label.prop = 0.5,residuals=F, fade = FALSE,
         sizeInt = 1, intStyle="single",
         sizeLat = 10,nodeLabels=c("Delayr","NoiseTrde","Iliquidity","Shortratio","Analyscover","ETFOwnership","Pefficency"),
         edge.color="black",rotation=1,whatLabels="est",label.cex=1.8,node.width=1.4,node.height=0.2,
         edge.label.cex=1,sizeMan=8) #旋转270度

配色我们可以给节点和路径线条配色，节点配色用color参数，线条配色用edge.color参数，如果标出系数正负不同的路径，可以使用posCol和negCol参数来定义分别的路径颜色，比如节点配成浅蓝lightblue，正系数路径配成红色，负系数路径配成绿色，具体如下：

semPaths(tofit.model_m, "path",layout =ly,label.prop = 0.5,residuals=F, fade = FALSE,posCol="red",negCol="green",color = "lightblue",
         sizeInt = 1, intStyle="single",
         sizeLat = 10,nodeLabels=c("Delayr","NoiseTrde","Iliquidity","Shortratio","Analyscover","ETFOwnership","Pefficency"),
         edge.color=c("green","red"),rotation=3,whatLabels="est",label.cex=1.8,node.width=1.4,node.height=0.2,
         edge.label.cex=1,sizeMan=8)

节点参数设置

edge.label.cex是设置edge（路径、线）上的字体大小，默认0.8

label.cex 节点标签的大小设置

node.width和node.height分别用来设置节点的长和宽