# Disentangled Contrastive Learning on Graphs

# 1. Introduction

## 1.1 Abstract

Recently, self-supervised learning for graph neural networks (GNNs) has attracted considerable attention because of its notable successes in learning representations of graph-structured data. However, the formation of a real-world graph typically arises from the highly complex interaction of many latent factors. Existing self-supervised learning methods for GNNs are inherently holistic and neglect the entanglement of the latent factors, which leaves the learned representations suboptimal for downstream tasks and difficult to interpret. Learning disentangled graph representations with self-supervised learning poses great challenges and remains largely ignored by the existing literature. In this paper, we introduce the Disentangled Graph Contrastive Learning (DGCL) method, which is able to learn disentangled graph-level representations with self-supervision. In particular, we first identify the latent factors of the input graph and derive its factorized representations. Each of the factorized representations describes a latent and disentangled aspect pertinent to a specific latent factor of the graph. Then we propose a novel factor-wise discrimination objective in a contrastive learning manner, which can force the factorized representations to independently reflect the expressive information from different latent factors. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of our method against several state-of-the-art baselines.

# 2. Method

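In the DGCL architecture figure, the blue and yellow boxes mark the two core components of the model: (1) the disentangled graph encoder, i.e. how to learn the graph representations; (2) the contrastive learning component, i.e. how to design the contrastive objective when each graph has multiple disentangled representations.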

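## 2.1 Disentangled graph encoder

The goal of the disentangled graph encoder is to produce, for each graph $G_i\in\mathbf G$, a factorized graph representation $\left[\mathbf{z}_{i, 1}, \mathbf{z}_{i, 2}, \ldots, \mathbf{z}_{i, K}\right]$. Recall that a GNN encoder updates node embeddings by aggregating information from each node's neighborhood: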

$h_v^l=\operatorname{COMBINE}^l\left(h_v^{l-1}, \operatorname{AGGREGATE}^l\left(\left\{h_u^{l-1}: u \in \mathcal{N}(v)\right\}\right)\right)$

$\mathbf{H}^l=\left\{h_v^l \mid v \in V\right\}$
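The update rule above can be illustrated with a minimal NumPy sketch. It assumes sum aggregation over neighbors and a linear-plus-ReLU COMBINE step; these choices, the function name `gnn_layer`, and the toy graph are illustrative assumptions and not necessarily the encoder used in DGCL.

```python
import numpy as np

def gnn_layer(H, adj, W_self, W_neigh):
    """One message-passing step on node embeddings H of shape (num_nodes, d_in)."""
    # AGGREGATE (assumed: sum). adj[v, u] = 1 iff u is in N(v),
    # so row v of adj @ H is the sum of the neighbours' embeddings.
    agg = adj @ H
    # COMBINE (assumed: linear maps followed by ReLU).
    return np.maximum(0.0, H @ W_self + agg @ W_neigh)

# Toy path graph 0 - 1 - 2 with 2-dimensional node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
H0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
W_self = np.eye(2)   # identity weights keep the arithmetic easy to follow
W_neigh = np.eye(2)

H1 = gnn_layer(H0, adj, W_self, W_neigh)  # embeddings after one layer
```

With identity weights, each row of `H1` is simply the node's own feature vector plus the sum of its neighbors' vectors.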

## 2.2 Model optimization

$p_\theta\left(y_i \mid x_i\right)=\frac{\exp \phi\left(\mathrm{v}_i, \mathrm{v}_{y_i}^{\prime}\right)}{\sum_{j=1}^N \exp \phi\left(\mathrm{v}_i, \mathrm{v}_{y_j}^{\prime}\right)}$

$p_\theta\left(y_i \mid G_i\right)=\mathbb{E}_{p_\theta\left(k \mid G_i\right)}\left[p_\theta\left(y_i \mid G_i, k\right)\right]$

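1. $p_\theta(k|G_i)$ is computed in a prototype-based way. Let $\left\{\mathbf{c}_k\right\}_{k=1}^K$ denote the $K$ latent factor prototypes and let $\mathbf{z}_i=\left[\mathbf{z}_{i, 1}, \ldots, \mathbf{z}_{i, K}\right]$ denote the output of the graph encoder. Then: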

$p_\theta\left(k \mid G_i\right)=\frac{\exp \phi\left(\mathbf{z}_{i, k}, \mathbf{c}_k\right)}{\sum_{k=1}^K \exp \phi\left(\mathbf{z}_{i, k}, \mathbf{c}_k\right)}$

where $\phi(\cdot,\cdot)$ is a similarity function.

2. $p_\theta(y_i|G_i,k)$ is the same as in standard graph contrastive learning (GCL), except that the similarity is computed on the representation of the $k$-th factor:

$p_\theta\left(y_i \mid G_i, k\right)=\frac{\exp \phi\left(\mathbf{z}_{i, k}, \mathbf{z}_{y_i, k}^{\prime}\right)}{\sum_{j=1}^N \exp \phi\left(\mathbf{z}_{i, k}, \mathbf{z}_{y_j, k}^{\prime}\right)},$

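where $y_i$ is simply the instance id (here, the id of graph $G_i$), and $\mathbf{z}_{i,k}$ and $\mathbf{z}_{y_i,k}'$ are the $k$-th factor representations of the same instance in two different views.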
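The two distributions above, and the mixture $p_\theta(y_i \mid G_i)$ they define, can be sketched in NumPy as follows. The sketch assumes that $\phi$ is cosine similarity divided by a temperature `tau`, and it uses random toy embeddings; the function names and the value of `tau` are hypothetical choices made for illustration only.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def factor_prior(Z, C, tau=0.5):
    """p(k | G_i): softmax over k of phi(z_{i,k}, c_k). Z: (N, K, d), C: (K, d)."""
    sim = np.sum(l2_normalize(Z) * l2_normalize(C)[None], axis=-1) / tau  # (N, K)
    return softmax(sim, axis=1)

def factor_contrast(Z, Z_aug, tau=0.5):
    """p(y_i | G_i, k): softmax over candidates j of phi(z_{i,k}, z'_{j,k})."""
    sim = np.einsum("ikd,jkd->kij", l2_normalize(Z), l2_normalize(Z_aug)) / tau
    probs = softmax(sim, axis=2)                    # (K, N, N), normalised over j
    return np.diagonal(probs, axis1=1, axis2=2).T   # keep positives j = i -> (N, K)

rng = np.random.default_rng(0)
N, K, d = 4, 3, 8                                   # graphs, factors, dim per factor
Z = rng.normal(size=(N, K, d))                      # factorized representations
Z_aug = Z + 0.1 * rng.normal(size=(N, K, d))        # stand-in for an augmented view
C = rng.normal(size=(K, d))                         # factor prototypes

prior = factor_prior(Z, C)         # p(k | G_i), each row sums to 1
cond = factor_contrast(Z, Z_aug)   # p(y_i | G_i, k)
p_y = (prior * cond).sum(axis=1)   # p(y_i | G_i) = E_{p(k|G_i)}[p(y_i | G_i, k)]
```

Each entry of `p_y` is a convex combination of the per-factor probabilities in the corresponding row of `cond`, weighted by `prior`.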

$\theta^*=\underset{\theta}{\arg \max } \sum_{i=1}^N \log p_\theta\left(y_i \mid G_i\right)=\underset{\theta}{\arg \max } \sum_{i=1}^N \log \mathbb{E}_{p_\theta\left(k \mid G_i\right)}\left[p_\theta\left(y_i \mid G_i, k\right)\right]$

$\begin{aligned} &\log p_\theta\left(y_i \mid G_i\right) \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log p_\theta\left(y_i \mid G_i\right)\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{p_\theta\left(y_i, k \mid G_i\right)}{p_\theta\left(k \mid G_i, y_i\right)}\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{p_\theta\left(y_i, k \mid G_i\right)}{q_\theta\left(k \mid G_i, y_i\right)} \frac{q_\theta\left(k \mid G_i, y_i\right)}{p_\theta\left(k \mid G_i, y_i\right)}\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{p_\theta\left(y_i, k \mid G_i\right)}{q_\theta\left(k \mid G_i, y_i\right)}\right]+\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{q_\theta\left(k \mid G_i, y_i\right)}{p_\theta\left(k \mid G_i, y_i\right)}\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{p_\theta\left(y_i, k \mid G_i\right)}{q_\theta\left(k \mid G_i, y_i\right)}\right]+D_{K L}\left(q_\theta\left(k \mid G_i, y_i\right) \| p_\theta\left(k \mid G_i, y_i\right)\right) \\ &\geq \mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log \frac{p_\theta\left(y_i, k \mid G_i\right)}{q_\theta\left(k \mid G_i, y_i\right)}\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log p_\theta\left(y_i \mid G_i, k\right) \frac{p_\theta\left(k \mid G_i\right)}{q_\theta\left(k \mid G_i, y_i\right)}\right] \\ &=\mathbb{E}_{q_\theta\left(k \mid G_i, y_i\right)}\left[\log p_\theta\left(y_i \mid G_i, k\right)\right]-D_{K L}\left(q_\theta\left(k \mid G_i, y_i\right) \| p_\theta\left(k \mid G_i\right)\right) \\ &=\mathcal{L}(\theta, i) . \end{aligned}$

$p_\theta\left(k \mid G_i, y_i\right)=\frac{p_\theta\left(k \mid G_i\right) p_\theta\left(y_i \mid G_i, k\right)}{\sum_{k=1}^K p_\theta\left(k \mid G_i\right) p_\theta\left(y_i \mid G_i, k\right)}$

$q_\theta\left(k \mid G_i, y_i\right)=\frac{p_\theta\left(k \mid G_i\right) \hat{p}_\theta\left(y_i \mid G_i, k\right)}{\sum_{k=1}^K p_\theta\left(k \mid G_i\right) \hat{p}_\theta\left(y_i \mid G_i, k\right)}$

$\hat{p}_\theta\left(y_i \mid G_i, k\right)=\frac{\exp \phi\left(\mathbf{z}_{i, k}, \mathbf{z}_{i, k}^{\prime}\right)}{\sum_{j \in \mathcal{B}} \exp \phi\left(\mathbf{z}_{i, k}, \mathbf{z}_{j, k}^{\prime}\right)} .$

$\mathcal{L}(\theta, \mathcal{B})=\sum_{i \in \mathcal{B}} \mathcal{L}(\theta, i)$
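To make the minibatch objective concrete, the sketch below evaluates $q_\theta(k \mid G_i, y_i)$ and $\mathcal{L}(\theta, i)$ from given values of $p_\theta(k \mid G_i)$ and $\hat{p}_\theta(y_i \mid G_i, k)$. It assumes that $\hat{p}_\theta$ also stands in for $p_\theta(y_i \mid G_i, k)$ inside the expectation; the function name `elbo_terms` and all numbers are made up for illustration and are not results from the paper.

```python
import numpy as np

def elbo_terms(prior, p_hat, eps=1e-12):
    """prior: p(k | G_i), shape (B, K); p_hat: in-batch p(y_i | G_i, k), shape (B, K)."""
    # Variational posterior q(k | G_i, y_i): prior reweighted by p_hat, then normalised.
    joint = prior * p_hat
    q = joint / joint.sum(axis=1, keepdims=True)
    # First term of L(theta, i): E_q[log p(y_i | G_i, k)].
    expected_ll = np.sum(q * np.log(p_hat + eps), axis=1)
    # Second term: KL(q(k | G_i, y_i) || p(k | G_i)).
    kl = np.sum(q * (np.log(q + eps) - np.log(prior + eps)), axis=1)
    return q, expected_ll - kl

# Hypothetical values for a batch of 2 graphs and K = 3 factors.
prior = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])
p_hat = np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.3, 0.8]])

q, elbo = elbo_terms(prior, p_hat)
batch_objective = elbo.sum()  # L(theta, B): sum of per-graph bounds, to be maximised
```

In this toy setting `q` is the exact posterior under `p_hat`, so the bound is tight: each entry of `elbo` equals $\log \sum_k p_\theta(k \mid G_i)\, \hat{p}_\theta(y_i \mid G_i, k)$ up to the small `eps` term. In the second graph, the posterior shifts mass toward factor 3 even though its prior favors factor 2, because factor 3 discriminates that graph best.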

# 3. Experiments

1. Standard datasets

2. Synthetic datasets

## 3.2 Ablation study

