DUCK Rumour Detection: "DUCK: Rumour Detection on Social Media by Modelling User and Comment Propagation Networks" (Part 2)

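Longformer has an architecture similar to the one-tier transformer, but uses a sparser attention pattern to process longer sequences more efficiently. We use a pre-trained Longformer and follow the same approach as before to model the comment chain: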
$z_{c c}=\mathrm{LF}\left(\operatorname{emb}\left([C L S], c_{0},[S E P], c_{1}, \ldots, c_{m^{\prime \prime}}\right)\right)$
where $m^{\prime \prime} \approx m$.
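3.2.3 Two-tier transformer
Another way to get around the sequence-length limit is to model the comment chain with two tiers of transformers: one tier processes each post independently, and the other processes the sequence of posts using the representations produced by the first tier.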
$\begin{array}{rcl}h_{i} &=& \operatorname{BERT}\left(\operatorname{emb}_{1}\left([CLS], c_{i}\right)\right) \\ z_{cc} &=& \operatorname{transformer}\left(\operatorname{emb}_{2}([CLS]), h_{0}, h_{1}, \ldots, h_{m}\right)\end{array}$
where BERT and transformer denote the first-tier and second-tier transformers, respectively. The second-tier transformer has an architecture similar to BERT but only 2 layers, and its parameters are randomly initialised.
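The two-tier data flow can be illustrated with a toy NumPy sketch. Everything here is a stand-in chosen for illustration: `first_tier` averages token embeddings instead of running BERT, `second_tier` is a single self-attention layer instead of a 2-layer transformer, and reading $z_{cc}$ from the [CLS] position is an assumption based on common practice. Only the structure (encode each post independently, then encode the sequence of post vectors) follows the equations above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def first_tier(post_tokens, token_emb):
    # Stand-in for BERT: encode ONE post as the mean of its token embeddings.
    return token_emb[post_tokens].mean(axis=0)

def second_tier(cls_vec, post_vecs, Wq, Wk, Wv):
    # Stand-in for the second-tier transformer: one self-attention layer
    # over the sequence [CLS], h_0, ..., h_m.
    seq = np.vstack([cls_vec, post_vecs])
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))
    # Assumption: the output at the [CLS] position is used as z_cc.
    return (attn @ v)[0]

rng = np.random.default_rng(0)
dim = 8
token_emb = rng.normal(size=(10, dim))    # toy vocabulary of 10 tokens
posts = [[1, 2, 3], [4, 5], [6]]          # c_0, c_1, c_2 as token ids

# Tier 1: every post is encoded independently of the others.
h = np.stack([first_tier(np.array(p), token_emb) for p in posts])

# Tier 2: encode the sequence of post vectors (weights randomly initialised).
cls_vec = rng.normal(size=dim)
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
z_cc = second_tier(cls_vec, h, Wq, Wk, Wv)
```

Because each post passes through the first tier on its own, the length limit of the first-tier encoder applies per post rather than to the whole chain.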
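3.3 User Tree
We explore three methods for modelling the user network, all based on GAT, and aggregate the node encodings by mean-pooling over all nodes to produce the graph representation: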
$z_{u t}=\frac{1}{m+1} \sum\limits_{i=0}^{m} h_{i}^{(L)}$
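As a rough illustration, the following NumPy sketch applies one single-head graph attention layer and then mean-pools the node encodings into $z_{ut}$. The adjacency matrix, dimensions, and random weights are made up for the example, and the layer uses the standard GAT attention formulation with self-loops; the number of layers and heads actually used is not stated here, so this is a sketch under those assumptions rather than the authors' configuration.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention layer (standard GAT formulation)."""
    wh = h @ W                                   # project node features
    d_out = wh.shape[1]
    # e[i, j] = LeakyReLU(a^T [wh_i || wh_j]), split into the two halves of a
    e = (wh @ a[:d_out])[:, None] + (wh @ a[d_out:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Each node attends only to its neighbours and to itself.
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ wh

rng = np.random.default_rng(0)
# Toy user tree with 4 users (m = 3), stored as a symmetric adjacency matrix.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
h0 = rng.normal(size=(4, 5))      # placeholder initial node features
W = rng.normal(size=(5, 3))
a = rng.normal(size=6)

h1 = gat_layer(h0, adj, W, a)     # node encodings after one layer
z_ut = h1.mean(axis=0)            # mean-pool over all m + 1 nodes
```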
The main difference between the three methods lies in how they initialise the user nodes $\left(h_{i}^{(0)}\right)$:
The first, $\mathbf{GAT}_{\text{rnd}}$: user nodes are initialised with random vectors.
$h_{i}^{(0)}=\operatorname{random}\left[v_{1}, v_{2}, \ldots, v_{d}\right]$
The second, $\mathbf{GAT}_{\text{prf}}$: user nodes are initialised from their user profiles: username, user screen name, user description, user account age, etc. A static user node $h_{i}^{(0)}$ is therefore given by $v_{i} \in \mathbb{R}^{k}$:
$h_{i}^{(0)}=\left[v_{1}, v_{2}, \ldots, v_{k}\right]$
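The third, $\mathbf{GAT}_{\text{prf+rel}}$: this method initialises the user node representations with a variational graph auto-encoder (GAE), based on the user features (user profiles) and their social relations (based on "follow" relationships). The former captures the users who engage with the source post, while the latter is the network of users who follow each other.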
Given a social graph $G_{s}$ constructed from the training data, we can derive an adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $n$ is the number of users. Let $X=\left[x_{1}, x_{2}, \ldots, x_{n}\right]$, $x_{i} \in \mathbb{R}^{k}$, be the input node features. Our goal is to learn a transformation matrix $Z \in \mathbb{R}^{n \times d}$ that maps the users into a latent space of dimension $d$. We use a two-layer GCN as the encoder: it takes the adjacency matrix $A$ and the feature matrix $X$ as input and produces the latent variables $Z$ as output. The decoder is defined by the inner product between the latent variables $Z$, and its output is a reconstructed adjacency matrix $\hat{A}$. Formally:
$\begin{array}{rl}Z &= \operatorname{enc}(X, A)=\operatorname{GCN}\left(f\left(\operatorname{GCN}\left(A, X ; \theta_{1}\right)\right) ; \theta_{2}\right) \\ \hat{A} &= \operatorname{dec}\left(Z, Z^{\top}\right)=\sigma\left(Z Z^{\top}\right)\end{array}$
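The encoder and decoder equations can be sketched in NumPy as follows. Several choices are assumptions made for illustration: the GCN layers use the usual symmetric normalisation with self-loops, $f$ is taken to be ReLU, the weights are random and untrained, and the sketch follows the deterministic equations above rather than a variational formulation. The toy graph and all dimensions are invented.

```python
import numpy as np

def normalise_adj(A):
    # Symmetric normalisation with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(X, A, theta1, theta2):
    # Z = GCN(f(GCN(A, X; theta1)); theta2), with f = ReLU (assumption)
    A_norm = normalise_adj(A)
    hidden = np.maximum(A_norm @ X @ theta1, 0.0)
    return A_norm @ hidden @ theta2

def decode(Z):
    # A_hat = sigmoid(Z Z^T): inner-product decoder
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

rng = np.random.default_rng(0)
n, k, hidden_dim, d = 5, 4, 8, 3          # toy sizes
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)   # toy "follow" graph
X = rng.normal(size=(n, k))               # toy user-profile features
theta1 = rng.normal(scale=0.1, size=(k, hidden_dim))
theta2 = rng.normal(scale=0.1, size=(hidden_dim, d))

Z = encode(X, A, theta1, theta2)          # latent user representations, (n, d)
A_hat = decode(Z)                         # reconstructed adjacency, (n, n)
```

In training, $\theta_{1}$ and $\theta_{2}$ would be fitted so that $\hat{A}$ reconstructs $A$; the training loop is omitted here.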
$h_{i}^{(0)} \in \mathbb{R}^{d}$ is then computed as follows:
$h_{i}^{(0)}=\left\{\begin{array}{ll}\operatorname{ReLU}\left(W \cdot\left[v_{1}, \ldots, v_{k}\right]\right), & \text { if } \operatorname{user}_{i} \notin G_{s} \\Z_{i}, & \text { if } \operatorname{user}_{i} \in G_{s}\end{array}\right.$
where $W$ denotes the parameters of a fully connected layer and $v_{i} \in \mathbb{R}^{k}$ is the user profile.
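A minimal sketch of this piecewise initialisation, assuming a hypothetical `graph_index` dictionary that maps the users present in $G_{s}$ to their rows in $Z$ (the names, sizes, and random values are illustrative only):

```python
import numpy as np

def init_user_node(user_id, profile, W, Z, graph_index):
    """Return h_i^(0) for one user.

    graph_index is a hypothetical dict mapping users in the social graph
    G_s to their row in the latent matrix Z.
    """
    if user_id in graph_index:
        return Z[graph_index[user_id]]        # user in G_s: use Z_i
    return np.maximum(W @ profile, 0.0)       # otherwise: ReLU(W . profile)

rng = np.random.default_rng(0)
k, d = 4, 3
Z = rng.normal(size=(2, d))                   # latent vectors for 2 known users
W = rng.normal(size=(d, k))                   # fully connected layer, R^k -> R^d
graph_index = {"alice": 0, "bob": 1}

seen = init_user_node("alice", rng.normal(size=k), W, Z, graph_index)
unseen = init_user_node("carol", rng.normal(size=k), W, Z, graph_index)
```

Both branches return a vector in $\mathbb{R}^{d}$, so users inside and outside the social graph can be fed to the same GAT.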
3.4 Rumour Classifier
The graph representations $z_{c t}$, $z_{c c}$ and $z_{u t}$, produced by the comment tree, comment chain and user tree respectively, are used for rumour classification:
$\begin{array}{l}z=z_{c t} \oplus z_{c c} \oplus z_{u t} \\\hat{y}=\operatorname{softmax}\left(W_{c} z+b_{c}\right) \\\mathcal{L}=-\sum\limits_{i=1}^{n} y_{i} \log \left(\hat{y}_{i}\right)\end{array}$
where $n$ is the number of training instances.
4 Experiments and Results
4.1 Datasets
The dataset statistics are as follows:

[Figure: dataset statistics]
We report the average performance based on 5-fold cross-validation.
We reserve 20% of the data as the test set, split the rest in a 4:1 ratio into training and development partitions, and report the average test performance over 5 runs (initialised with different random seeds).
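The hold-out protocol can be sketched with the standard library alone. The function name and the use of a shuffle seed are illustrative assumptions; in particular, the text does not say whether the split itself is fixed across the 5 runs or only the model initialisation changes.

```python
import random

def split_dataset(items, seed=0):
    """Hold out 20% as test, then split the remainder 4:1 into train/dev."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 5               # 20% of all data
    test_set, rest = shuffled[:n_test], shuffled[n_test:]
    n_dev = len(rest) // 5                    # 1 part in 5 of the remainder
    dev_set, train_set = rest[:n_dev], rest[n_dev:]
    return train_set, dev_set, test_set

train_set, dev_set, test_set = split_dataset(range(100), seed=0)
sizes = (len(train_set), len(dev_set), len(test_set))
```

With 100 instances this yields 64 training, 16 development and 20 test instances, i.e. 64% / 16% / 20% of the data.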
4.2 Results
The experiments mainly aim to answer the following questions: