# SimGRACE: A Simple Framework for Graph Contrastive Learning without Data Augmentation

https://arxiv.org/pdf/2202.03104

https://github.com/junxia97/simgrace

# 1. Introduction

## 1.1 Abstract

Graph contrastive learning (GCL) has emerged as a dominant technique for graph representation learning which maximizes the mutual information between paired graph augmentations that share the same semantics. Unfortunately, it is difficult to preserve semantics well during augmentations in view of the diverse nature of graph data. Currently, data augmentations in GCL that are designed to preserve semantics broadly fall into three unsatisfactory ways. First, the augmentations can be manually picked per dataset by trial-and-errors. Second, the augmentations can be selected via cumbersome search. Third, the augmentations can be obtained by introducing expensive domain-specific knowledge as guidance. All of these limit the efficiency and more general applicability of existing GCL methods. To circumvent these crucial issues, we propose a Simple framework for GRAph Contrastive lEarning, SimGRACE for brevity, which does not require data augmentations. Specifically, we take original graph as input and GNN model with its perturbed version as two encoders to obtain two correlated views for contrast. SimGRACE is inspired by the observation that graph data can preserve their semantics well during encoder perturbations while not requiring manual trial-and-errors, cumbersome search or expensive domain knowledge for augmentations selection. Also, we explain why SimGRACE can succeed. Furthermore, we devise adversarial training scheme, dubbed AT-SimGRACE, to enhance the robustness of graph contrastive learning and theoretically explain the reasons. Albeit simple, we show that SimGRACE can yield competitive or better performance compared with state-of-the-art methods in terms of generalizability, transferability and robustness, while enjoying unprecedented degree of flexibility and efficiency.

# 2. Method

## Model details

1. Encoder perturbation

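$f(\cdot;\theta)$ denotes the GNN encoder and $f(\cdot;\theta')$ its perturbed version; $h$ and $h'$ denote the embeddings learned by the two encoders. The parameters of the perturbed encoder are computed as: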

$\theta_{l}^{\prime}=\theta_{l}+\eta \cdot \Delta \theta_{l} ; \quad \Delta \theta_{l} \sim \mathcal{N}\left(0, \sigma_{l}^{2}\right)$

In other words, Gaussian noise, scaled by the coefficient $\eta$, is added to the parameters $\theta_l$ of each GNN layer.
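The perturbation rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `perturb_parameters`, the dict-of-lists parameter layout, and the choice to take $\sigma_l$ as the standard deviation of the layer's own weights are all assumptions made for the example.

```python
import math
import random

def perturb_parameters(params, eta, rng):
    """Return a perturbed copy: theta'_l = theta_l + eta * delta_l,
    with delta_l ~ N(0, sigma_l^2).

    `params` maps a layer name to a flat list of weights (hypothetical layout).
    sigma_l is taken as the standard deviation of that layer's own weights,
    which is an assumption made for this sketch.
    """
    perturbed = {}
    for name, weights in params.items():
        mean = sum(weights) / len(weights)
        sigma = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
        # One independent Gaussian draw per weight; the original is left untouched.
        perturbed[name] = [w + eta * rng.gauss(0.0, sigma) for w in weights]
    return perturbed

rng = random.Random(0)
theta = {"layer1": [0.5, -0.2, 0.1, 0.3], "layer2": [1.0, -1.0]}
theta_prime = perturb_parameters(theta, eta=0.1, rng=rng)
```

With `eta=0` the perturbed encoder coincides with the original one, so $\eta$ directly controls how far apart the two views are.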

2. Projection head

This is a standard step: a nonlinear function $g(\cdot)$ maps the embeddings into the contrastive space.

$z=g(\mathrm{h}), \quad z^{\prime}=g\left(\mathrm{h}^{\prime}\right)$

3. Contrastive loss

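This is also standard and no different from other GCL methods. For a minibatch of $N$ samples, with similarity function $\operatorname{sim}(\cdot,\cdot)$ and temperature $\tau$: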

$\ell_{n}=-\log \frac{\exp \left(\operatorname{sim}\left(z_{n}, z_{n}^{\prime}\right) / \tau\right)}{\sum_{n^{\prime}=1, n^{\prime} \neq n}^{N} \exp \left(\operatorname{sim}\left(z_{n}, z_{n^{\prime}}\right) / \tau\right)}$
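As a rough illustration, the per-sample loss can be computed directly from the formula. The sketch below assumes cosine similarity for $\operatorname{sim}$ and follows the formula as written in these notes, with negatives $z_{n'}$ taken from the unperturbed view; an actual implementation may instead draw negatives from the other view. The names `cosine` and `contrastive_loss` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumed choice of sim)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(z, z_prime, tau=0.5):
    """Per-sample losses l_n for projected embeddings z and z'."""
    n_samples = len(z)
    losses = []
    for n in range(n_samples):
        # Numerator: similarity between the two views of sample n.
        positive = math.exp(cosine(z[n], z_prime[n]) / tau)
        # Denominator: similarities to every other sample in the batch.
        negatives = sum(
            math.exp(cosine(z[n], z[m]) / tau)
            for m in range(n_samples)
            if m != n
        )
        losses.append(-math.log(positive / negatives))
    return losses

z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = contrastive_loss(z, z)                      # views agree
shuffled = contrastive_loss(z, z[1:] + z[:1])         # views mismatched
```

Comparing `aligned` with `shuffled` shows the intended behaviour: the loss is lower when each sample's two views agree.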

## Theoretical justification

• alignment: the distance between positive pairs

$\ell_{\text{align}}(f ; \alpha) \triangleq \underset{(x, y) \sim p_{\text{pos}}}{\mathbb{E}}\left[\|f(x)-f(y)\|_{2}^{\alpha}\right], \quad \alpha>0$

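where $p_{pos}$ denotes the distribution over all positive pairs. This metric corresponds directly to one goal of contrastive learning: positive samples should lie close together in the embedding space.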

In SimGRACE, the alignment metric can be rewritten as:

$\ell_{\text{align}}(f ; \alpha) \triangleq \mathbb{E}_{x \sim p_{\text{data}}}\left[\left\|f(x ; \theta)-f\left(x ; \theta^{\prime}\right)\right\|_{2}^{\alpha}\right], \quad \alpha>0$

where $p_{data}$ denotes the data distribution, which in practice is simply the set of input graphs.
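The SimGRACE form of alignment is easy to estimate from a batch of paired embeddings. The following is a minimal sketch that assumes `h` and `h_prime` are lists of embedding vectors produced by the original and the perturbed encoder for the same inputs; the function name `alignment` is hypothetical.

```python
import math

def alignment(h, h_prime, alpha=2.0):
    """Mean of ||h_i - h'_i||_2^alpha over paired embeddings."""
    total = 0.0
    for u, v in zip(h, h_prime):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        total += dist ** alpha
    return total / len(h)

h = [[0.0, 0.0], [1.0, 1.0]]
h_prime = [[3.0, 4.0], [1.0, 1.0]]
score = alignment(h, h_prime, alpha=2.0)
```

Lower values mean the perturbed encoder keeps each sample close to its original embedding.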

• uniformity: the logarithm of the average pairwise Gaussian potential

$\ell_{\text{uniform}}(f ; t) \triangleq \log \underset{x, y \overset{\text{i.i.d.}}{\sim} p_{\text{data}}}{\mathbb{E}}\left[e^{-t\|f(x ; \theta)-f(y ; \theta)\|_{2}^{2}}\right], \quad t>0$

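Uniformity corresponds to the other goal of contrastive learning: the embeddings of random samples should be scattered across the embedding space.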
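Uniformity can be estimated in the same spirit by averaging the Gaussian potential over all distinct pairs in a batch. This is a sketch under that assumption (a finite batch standing in for i.i.d. draws from $p_{data}$); the function name `uniformity` is hypothetical.

```python
import math
from itertools import combinations

def uniformity(h, t=2.0):
    """Log of the mean Gaussian potential exp(-t * ||h_i - h_j||^2)
    over all distinct pairs of embeddings in the batch."""
    potentials = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(u, v)))
        for u, v in combinations(h, 2)
    ]
    return math.log(sum(potentials) / len(potentials))

collapsed = uniformity([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
spread = uniformity([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
```

The metric is at its maximum of zero when all embeddings collapse to one point and becomes more negative as they spread out, so lower is better.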

## AT-SimGRACE

The standard adversarial training (AT) objective is:

$\min _{\theta} \mathcal{L}^{\prime}(\theta), \quad \text{where} \quad \mathcal{L}^{\prime}(\theta)=\frac{1}{n} \sum_{i=1}^{n} \max _{\left\|\mathbf{x}_{i}^{\prime}-\mathbf{x}_{i}\right\|_{p} \leq \epsilon} \ell_{i}^{\prime}\left(f\left(\mathbf{x}_{i}^{\prime} ; \theta\right), y_{i}\right)$

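This objective cannot be applied to GCL directly, for two reasons:

1. AT requires label information, which is not available in GCL.
2. Perturbing every graph in the dataset is too computationally expensive, a problem already pointed out in GROC.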

AT-SimGRACE perturbs the encoder parameters instead. Define the norm ball of radius $\epsilon$ centered at $\mathrm{w}$ in parameter space:

$\mathrm{R}(\mathrm{w} ; \epsilon):=\{\theta \in \Theta:\|\theta-\mathrm{w}\| \leq \epsilon\}$

The AT-SimGRACE objective is then:

$\min _{\theta} \mathcal{L}(\theta+\Delta), \quad \text{where} \quad \mathcal{L}(\theta+\Delta)=\frac{1}{M} \sum_{i=1}^{M} \max _{\Delta \in \mathrm{R}(0 ; \epsilon)} \ell_{i}\left(f\left(\mathcal{G}_{i} ; \theta+\Delta\right), f\left(\mathcal{G}_{i} ; \theta\right)\right)$
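One common way to approximate an inner maximization over a norm ball is projected gradient ascent. The sketch below is a generic illustration of that idea, not necessarily the procedure used in the paper: it assumes an L2 norm, uses a toy quadratic loss in place of the contrastive loss, and all names (`project_to_ball`, `inner_maximize`, `toy_loss`, `toy_grad`) are hypothetical.

```python
import math

def project_to_ball(delta, epsilon):
    """Project delta onto R(0; epsilon) = {d : ||d||_2 <= epsilon}."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= epsilon:
        return list(delta)
    return [d * epsilon / norm for d in delta]

def inner_maximize(grad_fn, theta, epsilon, steps=10, step_size=0.1):
    """Approximate the delta in R(0; epsilon) that maximizes loss(theta + delta)."""
    delta = [0.0] * len(theta)
    for _ in range(steps):
        # Gradient of the loss at the currently perturbed parameters.
        grad = grad_fn([t + d for t, d in zip(theta, delta)])
        # Ascent step, then projection back into the norm ball.
        delta = [d + step_size * g for d, g in zip(delta, grad)]
        delta = project_to_ball(delta, epsilon)
    return delta

def toy_loss(theta):
    """Toy stand-in for the contrastive loss (assumption): sum of squares."""
    return sum(t * t for t in theta)

def toy_grad(theta):
    """Gradient of toy_loss."""
    return [2.0 * t for t in theta]

theta = [1.0, 0.0]
delta = inner_maximize(toy_grad, theta, epsilon=0.05)
worst_case = toy_loss([t + d for t, d in zip(theta, delta)])
```

The outer minimization would then update $\theta$ against the worst-case loss found in the ball, which is what distinguishes this scheme from perturbing each input graph.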

$\mathbb{E}_{\left\{\mathcal{G}_{i}\right\}_{i=1}^{M}, \Delta}[\mathcal{L}(\theta+\Delta)] \leq \mathbb{E}_{\Delta}[\mathcal{L}(\theta+\Delta)]+4 \sqrt{\frac{\mathrm{KL}(\theta+\Delta \| P)+\ln \frac{2 M}{\delta}}{M}}$

$\begin{aligned} \mathbb{E}_{\left\{\mathcal{G}_{i}\right\}_{i=1}^{M}, \Delta}[\mathcal{L}(\theta+\Delta)] \leq \mathcal{L}(\theta) &+\underbrace{\left\{\mathbb{E}_{\Delta}[\mathcal{L}(\theta+\Delta)]-\mathcal{L}(\theta)\right\}}_{\text{Expected sharpness}} \\ &+4 \sqrt{\frac{1}{M}\left(\frac{1}{2 \alpha}+\ln \frac{2 M}{\delta}\right)} \end{aligned}$
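The expected sharpness term in the bound above can be estimated by Monte Carlo sampling. The sketch below assumes $\Delta$ is drawn from an isotropic Gaussian and again uses a toy quadratic loss as a stand-in for the contrastive loss; both choices, and the names `toy_loss` and `expected_sharpness`, are assumptions for illustration only.

```python
import random

def toy_loss(theta):
    """Toy stand-in for the contrastive loss (assumption): sum of squares."""
    return sum(t * t for t in theta)

def expected_sharpness(loss_fn, theta, sigma, n_samples, rng):
    """Monte Carlo estimate of E_delta[L(theta + delta)] - L(theta),
    with delta ~ N(0, sigma^2 I) (the Gaussian choice is an assumption)."""
    base = loss_fn(theta)
    total = 0.0
    for _ in range(n_samples):
        perturbed = [t + rng.gauss(0.0, sigma) for t in theta]
        total += loss_fn(perturbed) - base
    return total / n_samples

rng = random.Random(0)
sharpness = expected_sharpness(toy_loss, [0.0, 0.0], sigma=0.1, n_samples=5000, rng=rng)
```

For this toy loss the exact value is $d\sigma^{2}$ for a $d$-dimensional $\theta$, so the estimate can be checked against it. A smaller sharpness term means the loss changes little under parameter perturbation, which tightens the bound.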

## Experiments

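1. Generalizability: does the model outperform competitors on downstream tasks in the unsupervised and semi-supervised settings?
2. Transferability: does SimGRACE outperform competitors in the pre-training setting?
3. Robustness: does SimGRACE outperform competitors under different adversarial attacks?
4. Efficiency: does it use less time and memory than competitors?
5. Hyperparameter sensitivity: perturbation magnitude, number of epochs, batch size, and so on.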

