Paper Recommendation for January 5 (with download link)

Paper title: Sequences of Sets

Authors

Austin R. Benson (Cornell University)

Ravi Kumar (Google)

Andrew Tomkins (Google)

Why we recommend it

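"Sequences of Sets" is a paper by Austin Benson, a young professor at Cornell University and a former student of Jure Leskovec, written together with two senior researchers at Google, Ravi Kumar and Andrew Tomkins. The problem it studies is a very basic one in data mining. We are given a sequence of sets, that is, a sequence in which every element is itself a set, for example the actions of users in a social network. Two consecutive sets may be identical, or they may be very different. The central task for sequences of sets is to automatically discover the patterns hidden in them. The paper proposes a stochastic model for mining these time-dependent patterns. The model captures two kinds of structure: the correlation between adjacent sets in the sequence, and recency, through recency parameters that let the model better describe the most recent information.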

Abstract

Sequential behavior such as sending emails, gathering in groups, tagging posts, or authoring academic papers may be characterized by a set of recipients, attendees, tags, or coauthors respectively. Such "sequences of sets" show complex repetition behavior, sometimes repeating prior sets wholesale, and sometimes creating new sets from partial copies or partial merges of earlier sets.

In this paper, we provide a stochastic model to capture these patterns. The model has two classes of parameters. First, a correlation parameter determines how much of an earlier set will contribute to a future set. Second, a vector of recency parameters captures the fact that a set in a sequence is more similar to recent sets than more distant ones. Comparing against a strong baseline, we find that modeling both correlation and recency structures are required for high accuracy. We also find that both parameter classes vary widely across domains, so must be optimized on a per-dataset basis. We present the model in detail, provide a theoretical examination of its asymptotic behavior, and perform a set of detailed experiments on its predictive performance.
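To make the two parameter classes in the abstract more concrete, here is a toy sketch in Python. It is not the paper's actual model or code. It only illustrates one plausible reading of the abstract: recency weights decide which earlier set to copy from, and a correlation parameter decides how much of that set is copied. The function name `sample_next_set` and its parameters `size`, `p`, and `recency_weights` are all assumptions made for this illustration.

```python
import random


def sample_next_set(history, size, p, recency_weights, rng):
    """Toy generator for the next set in a sequence (illustrative assumption,
    not the procedure from the paper).

    history         -- list of earlier sets, oldest first
    size            -- desired size of the new set
    p               -- correlation parameter in (0, 1]: chance of copying each item
    recency_weights -- recency_weights[0] applies to the most recent set,
                       recency_weights[1] to the one before it, and so on
    rng             -- a random.Random instance, passed in for reproducibility
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")

    # Pair each earlier set (most recent first) with its weight and drop
    # sets that can never be chosen because their weight is zero.
    pairs = [(s, w) for s, w in zip(reversed(history), recency_weights) if w > 0]
    if not pairs:
        return set()
    sources = [s for s, _ in pairs]
    weights = [w for _, w in pairs]

    # We can only copy items that occur in a reachable earlier set.
    available = set().union(*sources)
    target = min(size, len(available))

    result = set()
    while len(result) < target:
        # Recency: pick an earlier set, favoring recent ones.
        source = rng.choices(sources, weights=weights, k=1)[0]
        items = sorted(source)
        rng.shuffle(items)
        # Correlation: copy each item of the chosen set with probability p.
        for item in items:
            if len(result) >= target:
                break
            if rng.random() < p:
                result.add(item)
    return result


history = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
new_set = sample_next_set(history, 3, 0.5, [0.6, 0.3, 0.1], random.Random(0))
print(new_set)
```

In this sketch, `p = 1` with all weight on the most recent set reproduces that set wholesale, while a smaller `p` with spread-out weights yields partial copies and merges of several earlier sets, which matches the repetition behaviors the abstract describes.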
