Identity Linkage: Learning, Sharing, and Discussion. Chen Kai (陈凯), 2016/5/9
CATALOG Introduction Methodology Achievements My Idea
Introduction
Methodology
User Attribute: Textual Attributes, Visual Attributes
User Topic: Content Genre Distribution, Interest & Skill Distribution
User Behavior: Sentiment Pattern Distribution, Writing Style, Mobile Trajectory and Location Information, Multimedia Content Generation and Sharing, ...
User Attribute: Textual Attributes. Username, password, name, gender, age, nationality, profession, education, email account. Missing Information
User Attribute Visual Attributes
User Topic Content Genre Distribution sports/ music/ entertainment/ society/ history/ science/ art/ high-tech/ commercial/ politics/ geography/ traveling/ fashions/ digital game/ industry/ luxury/ violence Latent Semantic Analysis
User Topic Interest & Skill Distribution java / c++ / python / machine-learning / ide / ... Tagging
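As a minimal sketch of how tags could be turned into an interest and skill distribution, the snippet below normalizes a user's tag counts and compares two users with cosine similarity. The function names `tag_distribution` and `cosine_similarity` are illustrative assumptions, not something named on the slide, and cosine similarity is only one possible way to compare the distributions.

```python
import math
from collections import Counter


def tag_distribution(tags):
    """Return a normalized frequency distribution over a user's tags."""
    counts = Counter(t.strip().lower() for t in tags if t.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag distributions (dicts)."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical tags collected from two accounts on different sites.
account_a = tag_distribution(["java", "Java", "python", "ide"])
account_b = tag_distribution(["python", "java", "machine-learning"])
similarity = cosine_similarity(account_a, account_b)
```

A higher similarity suggests, but does not prove, that the two accounts share the same interests and skills.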
User Behavior Sentiment Pattern Distribution happy / fear / sad / neutral Sentiment Vocabulary Matching
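Sentiment vocabulary matching can be sketched as a lexicon lookup: each post is labeled by the sentiment word lists it hits, and the per-user label frequencies form the sentiment pattern distribution. The tiny `SENTIMENT_LEXICON` below is a made-up placeholder; a real system would assume a much larger, curated vocabulary.

```python
import re
from collections import Counter

LABELS = ["happy", "fear", "sad", "neutral"]

# Hypothetical toy lexicon, for illustration only.
SENTIMENT_LEXICON = {
    "happy": {"glad", "great", "love", "awesome"},
    "fear": {"afraid", "scared", "worried"},
    "sad": {"sad", "cry", "miss", "lonely"},
}


def sentiment_distribution(posts):
    """Return the share of each sentiment label across a user's posts."""
    counts = Counter()
    for post in posts:
        tokens = set(re.findall(r"[a-z']+", post.lower()))
        matched = [label for label, vocab in SENTIMENT_LEXICON.items()
                   if tokens & vocab]
        if matched:
            counts.update(matched)      # a post may hit several labels
        else:
            counts["neutral"] += 1      # no lexicon word found
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


posts = ["I love this awesome movie", "I am so scared", "Going to the store"]
distribution = sentiment_distribution(posts)
```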
User Behavior Writing Style Frequent terms: Oh my God / Jesus / 讲道理 / ... Punctuation: 。。 / !! / ~~ / ... Special characters: ^_^ / @_@ / ... Counting Top-K
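The Top-K counting step could look roughly like the sketch below: count habitual phrases, repeated punctuation, and emoticons across a user's posts, then keep the K most frequent as a writing-style signature. The phrase list, the two regular expressions, and the name `style_profile` are assumptions chosen to mirror the examples on the slide.

```python
import re
from collections import Counter

# Runs of repeated punctuation; longer runs are folded into the two-character form.
PUNCT_RUN = re.compile(r"。{2,}|!{2,}|~{2,}")
# The two emoticons used as examples on the slide.
EMOTICON = re.compile(r"\^_\^|@_@")


def style_profile(posts, phrases, k=3):
    """Return the k most frequent style features as (feature, count) pairs."""
    counts = Counter()
    for post in posts:
        lowered = post.lower()
        for phrase in phrases:
            counts[phrase] += lowered.count(phrase.lower())
        counts.update(run[:2] for run in PUNCT_RUN.findall(post))
        counts.update(EMOTICON.findall(post))
    return [(feature, n) for feature, n in counts.most_common(k) if n > 0]


posts = ["Oh my God!! That was great ^_^", "oh my god, again!!", "ok~~ ^_^ ^_^"]
profile = style_profile(posts, phrases=["Oh my God", "Jesus", "讲道理"])
```

Two accounts whose Top-K profiles overlap heavily are candidates for the same author; the overlap measure itself is left open here.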
User Behavior Mobile Trajectory and Location Information
User Behavior Multimedia Content Generation and Sharing Article Sharing Music Sharing Link Sharing
Achievements Studying User Footprints in Different Online Social Networks This paper uses only a few shallow features (USERID, NAME, LOCATION, etc.) for binary classification; it is fairly basic.
Achievements Mining Email Social Networks Identifies which email names belong to the same user, mainly by using mailing lists as links and converting them into a social network.
Achievements Inferring Anchor Links across Multiple Heterogeneous Social Networks Links accounts of the same user across Foursquare and Twitter using three kinds of features: Location, Temporal Activity, and Text Content.
Achievements Identifying User Across Social Tagging Systems Identifies the same user using USERNAME + TAG as features.
Achievements Who's who in GNOME: using LSA to merge software repository identities Applies Latent Semantic Analysis (LSA) to email content in order to identify which email names belong to the same user.
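Achievements What’s in a Name? An Unsupervised Approach to Link Users across Communities This paper starts from one observation: few people use their real name as a username, and common usernames are more likely to be used by different people, yet most existing identity linkage methods rely on the username as a key feature. The paper disambiguates accounts that share the same username, because the authors found that 56% of identical usernames do not belong to the same natural person. The authors use three features: User Meta Data, Social Relationship, and Post Content.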
My Idea
Username features with a uniqueness check:
john123 (1000) - john1234 (2000)
VoyageCK_github (4) - VoyageCK_stack (3)
Topic model with temporal order:
movie, food, sport - sport, movie, food
movie, food, sport - movie, sport
movie->food->sport - sport->movie->food
movie->food->sport - movie->null->sport
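Both ideas can be sketched in a few lines, under stated assumptions. For usernames, the sketch assumes the number in parentheses is how many accounts use that name, so a match between rare names counts as stronger evidence than a match between common ones. For topics, it contrasts an order-blind comparison (Jaccard) with an order-aware one (longest common subsequence). The function names and the specific weighting formula are illustrative choices, not taken from the slides.

```python
import math
from difflib import SequenceMatcher


def username_evidence(name_a, count_a, name_b, count_b):
    """String similarity of two usernames, down-weighted by how common they are."""
    similarity = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    # Assumed weighting: the more accounts share a name, the weaker the evidence.
    rarity = 1.0 / math.log2(2 + max(count_a, count_b))
    return similarity * rarity


def jaccard(topics_a, topics_b):
    """Order-blind overlap between two topic lists."""
    set_a, set_b = set(topics_a), set(topics_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(seq_b) + 1)
    for x in seq_a:
        cur = [0]
        for j, y in enumerate(seq_b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ordered_similarity(seq_a, seq_b):
    """Order-aware similarity: shared subsequence length over the longer sequence."""
    longest = max(len(seq_a), len(seq_b))
    return lcs_length(seq_a, seq_b) / longest if longest else 1.0


common = username_evidence("john123", 1000, "john1234", 2000)
rare = username_evidence("VoyageCK_github", 4, "VoyageCK_stack", 3)

a = ["movie", "food", "sport"]
b = ["sport", "movie", "food"]
same_set = jaccard(a, b)              # ignores order
same_order = ordered_similarity(a, b) # penalizes the reordering
```

With these toy inputs, the rare pair scores higher than the common pair, and the two topic lists that look identical as sets no longer look identical once order is taken into account.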
My Idea
Thank you! Q&A