PSIN Rumor Detection: "Divide-and-Conquer: Post-User Interaction Network for Fake News Detection on Social Media"

Paper information

Paper title: Divide-and-Conquer: Post-User Interaction Network for Fake News Detection on Social Media
Authors: Erxue Min, Yu Rong, Yatao Bian, Tingyang Xu, Peilin Zhao, Junzhou Huang, Sophia Ananiadou
Source: WWW 2022
Paper link: download
Code link: download
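Background
Challenges:
(1) Rumor detection involves many types of entities and relations, so methods are needed to model this heterogeneity;
(2) Topics on social media undergo distribution shift, which significantly degrades the performance of fake news detection;
(3) Existing fake news datasets usually lack large scale, topic diversity, and users' social relations.
Text-based rumor detection methods have the following two problems:
(1) First, the information in the social context of news is complex and heterogeneous;
(2) Second, there is the distribution shift problem: the training distribution differs from the test distribution.
Example of distribution shift: a fake news classifier is trained on labeled data covering common topics such as politics, sports, and entertainment, but new topics such as "black swan events" appear in the test set.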
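To make the distribution shift concrete, here is a minimal sketch of two ways a topic-labeled dataset can be split. The record layout, the topic names, and the helpers `in_topic_split` and `out_of_topic_split` are assumptions made for illustration; they are not taken from the paper's code.

```python
import random
from collections import defaultdict

def in_topic_split(events, test_ratio=0.2, seed=0):
    """Split every topic separately, so train and test share the same topics."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for event in events:
        by_topic[event["topic"]].append(event)
    train, test = [], []
    for topic in sorted(by_topic):
        group = by_topic[topic][:]
        rng.shuffle(group)
        n_test = max(1, int(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def out_of_topic_split(events, held_out_topics):
    """Hold out whole topics, so the test topics never appear in training."""
    held_out = set(held_out_topics)
    train = [e for e in events if e["topic"] not in held_out]
    test = [e for e in events if e["topic"] in held_out]
    return train, test

# Toy data: hypothetical topics, 10 events each.
TOPICS = ["politics", "sports", "entertainment", "health"]
events = [
    {"id": f"{t}-{i}", "topic": t, "label": i % 2}
    for t in TOPICS
    for i in range(10)
]

in_train, in_test = in_topic_split(events)
out_train, out_test = out_of_topic_split(events, ["health"])
```

In the second split the classifier never sees a "health" event during training, which mirrors the case where a new topic appears only at test time.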
Contributions:


    • We construct and publicize a new fake news dataset with social context named MC-Fake, which contains 27,155 news events in 5 topics, and their social context composed of 5 million posts, 2 million users and an induced social graph with 0.2 billion edges.
    • We propose a novel Post-User Interaction Network (PSIN), which applies a divide-and-conquer strategy to model the heterogeneous relations. Specifically, we integrate the post-post, user-user and post-user subgraphs with three variants of Graph Attention Networks based on their intrinsic characteristics. Additionally, we employ an adversarial topic discriminator to learn topic-agnostic features for veracity classification.
    • We evaluate our proposed model on the curated dataset in two settings: in-topic split and out-of-topic split. The superior results of our model in both settings reveal the effectiveness of the proposed method.
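The divide-and-conquer idea in the second contribution can be sketched roughly as follows: the edges of one heterogeneous graph are routed into post-post, user-user and post-user subgraphs, each of which could then be passed to its own encoder. The node-id prefixes, the `split_edges` helper and the toy edges are hypothetical, and the attention networks themselves are not shown.

```python
from collections import defaultdict

def node_type(node_id):
    """Assumed convention: ids start with 'p' for posts and 'u' for users."""
    if node_id.startswith("p"):
        return "post"
    if node_id.startswith("u"):
        return "user"
    raise ValueError(f"unknown node type for {node_id!r}")

def split_edges(edges):
    """Route each edge into one of three subgraphs by its endpoint types."""
    subgraphs = defaultdict(list)
    for src, dst in edges:
        kinds = {node_type(src), node_type(dst)}
        if kinds == {"post"}:
            subgraphs["post-post"].append((src, dst))
        elif kinds == {"user"}:
            subgraphs["user-user"].append((src, dst))
        else:
            # Mixed endpoints: store as (post, user) regardless of direction.
            post, user = (src, dst) if node_type(src) == "post" else (dst, src)
            subgraphs["post-user"].append((post, user))
    return dict(subgraphs)

# Toy heterogeneous edge list (hypothetical ids).
edges = [
    ("p1", "p0"),  # a reply to the source post
    ("p2", "p1"),
    ("u1", "u2"),  # a follow relation
    ("u1", "p1"),  # authorship
    ("p2", "u2"),
]
subgraphs = split_edges(edges)
```

Keeping the three edge sets separate is what would let each relation type get a model suited to its structure, instead of forcing one encoder to handle all of them.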
2 Related work
2.1 Fake News Datasets
  • BuzzFeedNews specializes in political news published on Facebook during the 2016 U.S. Presidential Election.
  • LIAR collects 12.8K short statements with manual labels from the political fact-checking website PolitiFact.
  • FA-KES consists of 804 articles about the Syrian war.
  • CREDBANK contains about 1000 news events and 60 million tweets, labeled by Amazon Mechanical Turk.
  • Twitter15 contains 778 reported events between March 2015 and December 2015, with 1 million posts from 500k users.
  • FakeNewsNet is a data repository with news content and related posts, containing political news and entertainment news which are checked by PolitiFact and GossipCop.
  • FakeHealth is collected from the healthcare information review website Health News Review. It contains over 2000 news articles, 500k posts and 27k user profiles, along with user networks.
  • CoAID collects 1,896 news articles, 183,654 related user engagements, 516 social platform posts about COVID-19, and ground truth labels.
  • FakeCovid is a multilingual cross-domain dataset of 5,182 fact-checked news articles about COVID-19 from 92 different fact-checking websites.
  • MM-COVID is a multilingual and multidimensional COVID-19 fake news data repository, containing 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in 6 different languages.
2.2 Social Context-based Fake News Detection
Existing methods fall into three categories:


    • Sequential Modeling [20, 24, 30, 52]
    • Explicit responding path modeling [4, 19, 26, 47]
    • Implicit attention modeling
3 Problem statement
The fake news dataset is defined as: $\mathbf{D}=\left\{\mathbf{T}, G^{U}, G^{U P}\right\}$
