1
Survey of the State of the Art in Data Summarization
Preliminary Thoughts on Context-Aware Summarization
Xu Danyun (徐丹云)
2
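Taxonomy of data summarization
Entity summarization: query-oriented summaries, context-aware summaries, general summaries
Ontology summarization: term extraction, sentence extraction
RDF graph summarization: pattern extraction, vertex clustering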
3
Entity summarization - query-oriented
Criterion: relevance to the query
Bai X, Delbru R, Tummarello G. RDF snippets for Semantic Web search engines
  Topic-related nodes + query-related nodes + heuristic ranking
Cheng G, Qu Y. Searching linked objects with falcons: Approach, implementation and evaluation
  Term vectors + cosine similarity
Zhang L, Zhang Y, Chen Y. Summarizing highly structured documents for effective search interaction
  Machine learning to compute the relevance between facet-values and the query
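The "term vectors + cosine similarity" idea can be sketched as follows. This is a minimal illustration, not the Falcons implementation: the whitespace tokenizer, the raw term-frequency weighting, and the function names `term_vector`, `cosine_similarity`, and `rank_by_query` are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumed: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_query(query, descriptions):
    """Sort textual descriptions by decreasing similarity to the query."""
    q = term_vector(query)
    return sorted(
        descriptions,
        key=lambda d: cosine_similarity(q, term_vector(d)),
        reverse=True,
    )
```

A query-oriented summary would then keep only the top-ranked descriptions of an entity for a given query.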
4
Entity summarization - context-aware
Tonon A, Catasta M, Demartini G, et al. TRank: Ranking Entity Types Using the Web of Data
  Frequency with which a type occurs
  Entities mentioned in the context
  Hierarchy of types
5
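Entity summarization - general
Single-step summarization
Cheng G, Tran T, Qu Y. RELIN: relatedness and informativeness-based centrality for entity summarization
  Random surfer model
  Relatedness between two features (number of co-occurrences returned by a search engine)
  Informativeness of a feature (number of occurrences in the dataset)
Thalhammer A, Toma I, Roa-Valverde A, et al. Leveraging usage data for linked data movie entity summarization
  k-nearest neighbors
  Feature score computation (features shared with the k nearest neighbors)
  Select the top n features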
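The k-nearest-neighbor scoring step can be sketched roughly as below. This is an illustration only: it assumes the k neighbors have already been determined elsewhere, represents features as plain strings, and the names `score_features` and `summarize` are hypothetical rather than taken from the paper.

```python
from collections import Counter


def score_features(entity_features, neighbor_feature_sets):
    """Score each feature of the entity by how many neighbors share it.

    entity_features: set of feature strings of the entity to summarize.
    neighbor_feature_sets: list of feature sets, one per nearest neighbor
    (assumed to be given; how neighbors are found is out of scope here).
    """
    scores = Counter({f: 0 for f in entity_features})
    for neighbor in neighbor_feature_sets:
        for f in entity_features & neighbor:
            scores[f] += 1
    return scores


def summarize(entity_features, neighbor_feature_sets, n):
    """Return the n highest-scoring features of the entity."""
    scores = score_features(entity_features, neighbor_feature_sets)
    # Descending score; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:n]
```

For example, with an entity described by `{"genre=sci-fi", "director=X", "year=2010"}`, a feature shared by two of three neighbors outranks one shared by a single neighbor.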
6
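Entity summarization - general
Multi-step summarization
Fakas G J, Cai Z, Mamoulis N. Size-l object summaries for relational keyword
  Formulated as an optimization problem (dynamic programming, greedy algorithms)
  Assign an importance to each tuple and maximize the total
Sydow M, Pikuła M, Schenkel R. The notion of diversity in graphical entity summarisation on semantic knowledge graphs
  relevance + importance + popularity + diversity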
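A greedy approach to picking a size-l summary can be sketched as below. This is a simple heuristic written for illustration, not the algorithm from the paper: it assumes the tuples form a tree rooted at the entity, that each tuple already has an importance score, and that the summary must stay connected to the root. The name `greedy_size_l` and the input layout are assumptions.

```python
import heapq


def greedy_size_l(children, importance, root, l):
    """Greedily pick up to l connected nodes (l >= 1), always including root.

    children: dict mapping a node to the list of its child nodes.
    importance: dict mapping every node to a numeric importance score.
    """
    selected = [root]
    # heapq is a min-heap, so importance is negated to pop the largest first.
    frontier = [(-importance[c], c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(selected) < l:
        _, node = heapq.heappop(frontier)
        selected.append(node)
        # Children of a selected node become eligible for selection.
        for c in children.get(node, []):
            heapq.heappush(frontier, (-importance[c], c))
    return selected
```

The greedy choice is fast but not always optimal: an important tuple hidden behind an unimportant parent can be missed, which is one reason a dynamic-programming formulation is also considered.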
7
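Entity summarization - summary
Statistical information: informativeness, popularity, etc.
Application-specific information: relevance
Importance in the graph structure: graph algorithms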
8
Ontology summarization - term extraction
Zhang X, Li H, Qu Y. Finding important vocabulary within ontology
  Vocabulary Dependency Graph + similarity between vocabulary and ontology + Double Focused PageRank algorithm (concepts + relations)
Wu G, Li J, Feng L, et al. Identifying potentially important concepts and relations in an ontology
  Four principles of the CARRank algorithm, applied iteratively (concepts + relations):
  A concept is more important if there are more relations starting from the concept
  A concept is more important if there is a relation starting from the concept to a more important concept
  A concept is more important if it has a higher relation weight to any other concept
  A relation weight is higher if it starts from a more important concept
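The mutual reinforcement expressed by the four principles can be illustrated with a small iteration. This is only a sketch in the spirit of those principles, not the published CARRank formulas: the update rules, the damping value, the fixed iteration count, and the name `rank_concepts` are all assumptions.

```python
def rank_concepts(relations, concepts, damping=0.85, iterations=30):
    """Iteratively score concepts from directed (source, target) relations.

    Assumed update rules (illustrative only):
    - a relation's weight is the source's share of the importance of all
      sources pointing at the same target (principle 4);
    - a concept collects weight * importance of every concept it points to,
      so more relations, heavier relations, and more important targets all
      raise its score (principles 1 to 3).
    """
    n = len(concepts)
    score = {c: 1.0 / n for c in concepts}
    for _ in range(iterations):
        # Total importance of the sources pointing at each target.
        incoming = {c: 0.0 for c in concepts}
        for s, t in relations:
            incoming[t] += score[s]
        new_score = {c: (1 - damping) / n for c in concepts}
        for s, t in relations:
            weight = score[s] / incoming[t]
            new_score[s] += damping * weight * score[t]
        total = sum(new_score.values())
        score = {c: v / total for c, v in new_score.items()}  # renormalize
    return score
```

With relations A to B, A to C, and B to C, concept A ends up ranked first because it starts the most relations, and C last because it starts none.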
9
Ontology summarization - sentence extraction
Sentences describe concepts and the relations between concepts
Zhang X, Cheng G, Qu Y. Ontology summarization based on rdf sentence graph
  RDF Sentence Graph + "centrality" of RDF sentences
  Degree centrality
  Betweenness centrality
  Eigenvector centrality (PageRank, HITS)
Zhang X, Cheng G, Ge W Y, et al. Summarizing vocabularies in the global semantic web
  Expanded Bipartite Graph + weighted HITS algorithm (structural importance)
  Average importance of the terms a sentence contains (pragmatic importance)
Cheng G, Ji F, Luo S, et al. BipRank: ranking and summarizing RDF vocabulary descriptions
  A bipartite graph (sentence-item graph)
  Random walk
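Two of the centrality measures named above can be sketched on a small directed graph whose nodes stand for RDF sentences. The graph encoding (a dict from each node to its list of successors, with every node present as a key), the damping factor, and the function names are assumptions for illustration; how sentences are linked is left to the cited papers.

```python
def degree_centrality(graph):
    """In-degree plus out-degree of every node."""
    degree = {v: len(successors) for v, successors in graph.items()}
    for successors in graph.values():
        for u in successors:
            degree[u] += 1
    return degree


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for u in graph[v]:
                # Each node splits its rank evenly among its successors.
                new_rank[u] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank
```

A summary would then keep the sentences with the highest centrality, possibly after re-ranking for coherence or coverage.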
10
Ontology summarization - summary
Construct a graph
Apply graph algorithms
11
RDF graph summarization - pattern extraction
Basse A, Gandon F, Mirbel I, et al. DFS-based frequent graph pattern extraction to characterize the content of RDF Triple Stores
  DFS codes represent RDF graph patterns + recursion
Presutti V, Aroyo L, Adamou A, et al. Extracting core knowledge from Linked Data
  Dataset knowledge architecture + statistical analysis + betweenness centrality
12
RDF graph summarization - vertex clustering
13
RDF graph summarization - vertex clustering
Equivalence relation, partition
Tian Y, Hankins R A, Patel J M. Efficient aggregation for graph summarization
  Grouping by user-selected attributes and relationships
Campinas S, Perry T E, Ceccarelli D, et al. Introducing rdf graph summary with application to assisted sparql formulation
  Same type + similar attributes
Campinas S, Delbru R, Tummarello G. Efficiency and precision trade-offs in graph summary algorithms
  At least one type in common; all types identical; same attributes
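Clustering vertices through an equivalence relation can be sketched as grouping nodes by a signature. The node layout (a dict with `types` and `properties` lists) and the names `partition`, `by_all_types`, `by_properties`, and `summary_edges` are assumptions for illustration, not the cited systems' data model.

```python
from collections import defaultdict


def partition(nodes, signature):
    """Group node ids whose signatures are equal.

    Equality of signatures is an equivalence relation, so the resulting
    blocks form a partition of the node set.
    """
    blocks = defaultdict(set)
    for node_id, description in nodes.items():
        blocks[signature(description)].add(node_id)
    return dict(blocks)


def by_all_types(description):
    """Two nodes are equivalent when their type sets are identical."""
    return frozenset(description["types"])


def by_properties(description):
    """Two nodes are equivalent when they use the same set of properties."""
    return frozenset(description["properties"])


def summary_edges(edges, nodes, signature):
    """Lift (subject, predicate, object) edges to edges between blocks."""
    return {(signature(nodes[s]), p, signature(nodes[o])) for s, p, o in edges}
```

Note that "at least one type in common" is not transitive, so it cannot be expressed as a single signature like the two above; a node may then fall into several groups.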
14
Next-stage work
Context-aware summarization
15
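Context-aware summarization
Existing work: TRank; Leveraging usage...
Observations: only the ranking of types is considered; only the entities in the context are used; neighboring entities
Candidate techniques: graph algorithms; natural language processing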
16
Thanks