# Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning

https://arxiv.org/pdf/2009.07111

Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning, AAAI, 2020

# 1. Introduction

## 1.1 Abstract

Graph-based Semi-Supervised Learning (SSL) aims to transfer the labels of a handful of labeled data to the remaining massive unlabeled data via a graph. As one of the most popular graph-based SSL approaches, the recently proposed Graph Convolutional Networks (GCNs) have gained remarkable progress by combining the sound expressiveness of neural networks with graph structure. Nevertheless, the existing graph-based methods do not directly address the core problem of SSL, i.e., the shortage of supervision, and thus their performances are still very limited. To accommodate this issue, a novel GCN-based SSL algorithm is presented in this paper to enrich the supervision signals by utilizing both data similarities and graph structure. Firstly, by designing a semi-supervised contrastive loss, improved node representations can be generated via maximizing the agreement between different views of the same data or the data from the same class. Therefore, the rich unlabeled data and the scarce yet valuable labeled data can jointly provide abundant supervision information for learning discriminative node representations, which helps improve the subsequent classification result. Secondly, the underlying determinative relationship between the data features and input graph topology is extracted as supplementary supervision signals for SSL via using a graph generative loss related to the input features. Intensive experimental results on a variety of real-world datasets firmly verify the effectiveness of our algorithm compared with other state-of-the-art methods.

# 2. Method

## 2.1 Constructing the Contrastive Views

• Local view: the authors adopt a two-layer GCN as the backbone of this branch:

$\mathbf{H}^{\phi_{1}}=\hat{\mathbf{A}} \sigma\left(\hat{\mathbf{A}} \mathbf{X} \mathbf{W}^{(0)}\right) \mathbf{W}^{(1)}$

where $\hat{\mathbf{A}}=\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$, $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}$, $\tilde{\mathbf{D}}_{i i}=\sum_{j} \tilde{\mathbf{A}}_{i j}$, and $\mathbf{H}^{\phi_{1}}$ denotes the node representations learned under the local view.
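As a concrete illustration, the local-view propagation can be sketched in NumPy (a minimal sketch; the function names are my own, $\sigma$ is assumed to be ReLU, and a real implementation would use sparse matrices and a deep-learning framework):

```python
import numpy as np

def normalize_adj(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}: renormalized adjacency with self-loops."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_local_view(A, X, W0, W1):
    """Two-layer GCN: H^{phi_1} = A_hat * sigma(A_hat * X * W0) * W1."""
    A_hat = normalize_adj(A)
    return A_hat @ np.maximum(A_hat @ X @ W0, 0.0) @ W1   # sigma = ReLU
```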

• Global view: a hierarchical graph convolutional network (HGCN) is used, and $\mathbf{H}^{\phi_{2}}$ denotes the node representations learned under the global view.

## 2.2 Contrastive Loss

• View 1 as the anchor, contrasted against view 2:

$\mathcal{L}_{u c}^{\phi_{1}}\left(\mathbf{x}_{i}\right)=-\log \frac{\exp \left(\left\langle\mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{i}^{\phi_{2}}\right\rangle\right)}{\sum_{j=1}^{n} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{j}^{\phi_{2}}\right\rangle\right)}$

• View 2 as the anchor, contrasted against view 1:

$\mathcal{L}_{u c}^{\phi_{2}}\left(\mathbf{x}_{i}\right)=-\log \frac{\exp \left(\left\langle\mathbf{h}_{i}^{\phi_{2}}, \mathbf{h}_{i}^{\phi_{1}}\right\rangle\right)}{\sum_{j=1}^{n} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{2}}, \mathbf{h}_{j}^{\phi_{1}}\right\rangle\right)}$
Averaging both directions over all $n$ nodes yields the unsupervised contrastive loss:

$\mathcal{L}_{u c}=\frac{1}{2 n} \sum_{i=1}^{n}\left(\mathcal{L}_{u c}^{\phi_{1}}\left(\mathbf{x}_{i}\right)+\mathcal{L}_{u c}^{\phi_{2}}\left(\mathbf{x}_{i}\right)\right)$

For the $l$ labeled nodes, a supervised contrastive loss is defined analogously, treating all cross-view pairs from the same class as positives:

$\mathcal{L}_{s c}=\frac{1}{2 l} \sum_{i=1}^{l}\left(\mathcal{L}_{s c}^{\phi_{1}}\left(\mathbf{x}_{i}\right)+\mathcal{L}_{s c}^{\phi_{2}}\left(\mathbf{x}_{i}\right)\right)$

$\begin{array}{c} \mathcal{L}_{s c}^{\phi_{1}}\left(\mathbf{x}_{i}\right)=-\log \frac{\sum_{k=1}^{l} \mathbb{1}_{\left[y_{i}=y_{k}\right]} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{k}^{\phi_{2}}\right\rangle\right)}{\sum_{j=1}^{l} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{j}^{\phi_{2}}\right\rangle\right)} \\ \mathcal{L}_{s c}^{\phi_{2}}\left(\mathbf{x}_{i}\right)=-\log \frac{\sum_{k=1}^{l} \mathbb{1}_{\left[y_{i}=y_{k}\right]} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{2}}, \mathbf{h}_{k}^{\phi_{1}}\right\rangle\right)}{\sum_{j=1}^{l} \exp \left(\left\langle\mathbf{h}_{i}^{\phi_{2}}, \mathbf{h}_{j}^{\phi_{1}}\right\rangle\right)} \end{array}$

The overall semi-supervised contrastive loss combines the two terms:

$\mathcal{L}_{s s c}=\mathcal{L}_{u c}+\mathcal{L}_{s c}$
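The contrastive losses above can be sketched directly from their definitions (a minimal NumPy sketch; the function names are my own, similarities are taken as plain inner products, and the labeled nodes are assumed to come first in the embedding matrices):

```python
import numpy as np

def log_softmax_sim(H_anchor, H_other, i):
    """Log-softmax of the similarities <h_i^anchor, h_j^other> over all j."""
    sims = H_other @ H_anchor[i]
    sims = sims - sims.max()              # numerical stability
    return sims - np.log(np.exp(sims).sum())

def unsup_contrastive(H1, H2):
    """L_uc: each node's embedding in one view should match its own embedding
    in the other view, with all other nodes acting as negatives."""
    n = H1.shape[0]
    total = 0.0
    for i in range(n):
        total -= log_softmax_sim(H1, H2, i)[i]    # view 1 as anchor
        total -= log_softmax_sim(H2, H1, i)[i]    # view 2 as anchor
    return total / (2 * n)

def sup_contrastive(H1, H2, y):
    """L_sc over the l labeled nodes: all cross-view same-class pairs are positives."""
    y = np.asarray(y)
    l = len(y)
    total = 0.0
    for i in range(l):
        for Ha, Hb in ((H1, H2), (H2, H1)):
            sims = Hb[:l] @ Ha[i]
            e = np.exp(sims - sims.max())
            total -= np.log(e[y == y[i]].sum() / e.sum())
    return total / (2 * l)
```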

## 2.3 Generative Loss

Assuming that edges are conditionally independent given the learned node representations, the likelihood of the input graph factorizes over node pairs:

$p\left(\mathcal{G} \mid \mathbf{H}^{\phi_{1}}, \mathbf{H}^{\phi_{2}}\right)=\prod_{i, j} p\left(e_{i j} \mid \mathbf{H}^{\phi_{1}}, \mathbf{H}^{\phi_{2}}\right)$

Each edge probability is further assumed to depend only on the embeddings of its two endpoints:

$p\left(\mathcal{G} \mid \mathbf{H}^{\phi_{1}}, \mathbf{H}^{\phi_{2}}\right)=\prod_{i, j} p\left(e_{i j} \mid \mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{j}^{\phi_{2}}\right)=\prod_{i, j} \delta\left(\left[\mathbf{h}_{i}^{\phi_{1}}, \mathbf{h}_{j}^{\phi_{2}}\right] \mathbf{w}\right)$

where $[\cdot,\cdot]$ denotes concatenation, $\mathbf{w}$ is a learnable weight vector, and $\delta(\cdot)$ maps the score to an edge probability. The graph generative loss $\mathcal{L}_{g^{2}}$ is the negative log-likelihood $-\log p\left(\mathcal{G} \mid \mathbf{H}^{\phi_{1}}, \mathbf{H}^{\phi_{2}}\right)$.
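Under this factorization the generative loss can be sketched as a Bernoulli negative log-likelihood over node pairs (a sketch under the assumptions that $\delta$ is a sigmoid and that non-edges contribute $\log(1-p)$; a practical implementation would likely subsample pairs rather than loop over all $n^2$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generative_loss(H1, H2, A, w):
    """Negative log-likelihood of the graph: every pair (i, j) is a Bernoulli
    with p(e_ij) = sigmoid([h_i^phi1 ; h_j^phi2] . w)."""
    n = H1.shape[0]
    nll = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(np.concatenate([H1[i], H2[j]]) @ w)
            nll -= np.log(p) if A[i, j] > 0 else np.log(1.0 - p)
    return nll
```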

## 2.4 Model Training

The final prediction fuses the two views' outputs with a weighting coefficient $\lambda^{\phi_{1}}$:

$\mathbf{O}=\lambda^{\phi_{1}} \mathbf{H}^{\phi_{1}}+\left(1-\lambda^{\phi_{1}}\right) \mathbf{H}^{\phi_{2}}$

The classification loss is the cross-entropy over the $l$ labeled nodes, where $c$ is the number of classes:

$\mathcal{L}_{c e}=-\sum_{i=1}^{l} \sum_{j=1}^{c} \mathbf{Y}_{i j} \ln \mathbf{O}_{i j}$

The overall objective combines the three terms, with $\lambda_{s s c}$ and $\lambda_{g^{2}}$ balancing the contrastive and generative losses:

$\mathcal{L}=\mathcal{L}_{c e}+\lambda_{s s c} \mathcal{L}_{s s c}+\lambda_{g^{2}} \mathcal{L}_{g^{2}}$
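The training objective can be sketched as follows (assumptions: each view's representation is turned into class probabilities with a softmax before fusion, the labeled nodes come first, and $\mathcal{L}_{ssc}$, $\mathcal{L}_{g^2}$ are supplied as precomputed scalars):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def total_loss(H1, H2, Y, lam_view, lam_ssc, lam_gg, L_ssc, L_gg):
    """L = L_ce + lam_ssc * L_ssc + lam_gg * L_gg, where the fused prediction is
    O = lam_view * softmax(H1) + (1 - lam_view) * softmax(H2)."""
    O = lam_view * softmax(H1) + (1.0 - lam_view) * softmax(H2)
    l = Y.shape[0]                       # the first l rows are the labeled nodes
    L_ce = -(Y * np.log(O[:l])).sum()    # cross-entropy on labeled nodes
    return L_ce + lam_ssc * L_ssc + lam_gg * L_gg
```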

# 3. Experiments

## 3.2 Ablation Study

The $\text{CG}^3$ model involves three types of loss functions. The authors conducted an ablation study over these losses; the results are shown in Table 6.
