Efficient Query Relaxation for Complex Relationship Search on Graph Data 李舒馨 2018.09.21.

Slides:

Advertisements

Similar presentations

期末考试作文讲解 % 的同学赞成住校 30% 的学生反对住校 1. 有利于培养我们良好的学习和生活习惯； 1. 学生住校不利于了解外界信息； 2 可与老师及同学充分交流有利于共同进步。 2. 和家人交流少。在寄宿制高中，大部分学生住校，但仍有一部分学生选择走读。你校就就此开展了一次问卷调查，主题为.

Advertisements

研究生大進擊盧永豐

Course 1 演算法: 效率、分析與量級 Algorithms: Efficiency, Analysis, and Order

2007年8月龙星课程周源源老师课程体会包云岗中科院计算所

A Force from Empty Space The Casimir Effect

《高级人工智能》参考书目马少平，朱小燕。人工智能。清华大学出版社。蔡自兴，徐光祐。人工智能及其应用。清华大学出版社。

Classification of Web Query Intent Using Encyclopedia 基于百科知识的查询意图获取

华东师范大学软件学院王科强 (第一作者), 王晓玲

Performance Evaluation

Chaoping Li, Zhejiang University

沐阳老年社区.

XI. Hilbert Huang Transform (HHT)

Semantic-Synaptic Web Mining: A Novel Model for Improving the Web Mining 報告者：陳宜樺報告日期：2015/9/25.

Minimum Spanning Trees

An Adaptive Cross-Layer Multi-Path Routing Protocol for Urban VANET

Some Effective Techniques for Naive Bayes Text Classification

Thinking of Instrumentation Survivability Under Severe Accident

指導教授：許子衡教授報告學生：翁偉傑 Qiangyuan Yu , Geert Heijenk

Population proportion and sample proportion

NLP Group, Dept. of CS&T, Tsinghua University

模式识别 Pattern Recognition

计算机问题求解 – 论题算法的效率 2018年03月14日.

計算方法設計與分析 Design and Analysis of Algorithms 唐傳義

The Empirical Study on the Correlation between Equity Incentive and Enterprise Performance for Listed Companies 上市公司股权激励与企业绩效相关性的实证研究汇报人：白欣蓉学号：

SAT and max-sat Qi-Zhi Cai.

Journal Citation Reports® 期刊引文分析報告的使用和檢索

Ch2 Infinite-horizon and Overlapping- generations Models (无限期与跨期模型)

非财务人员的财务管理 -基础财务辛怀军 QQ:

Course 9 NP Theory序論 An Introduction to the Theory of NP

Simulated Annealing 報告者:李怡緯 OPLAB in NTUIM.

Journal of High Speed Networks 15（2006）

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi

The role of leverage in cross-border mergers and acquisitions

Interval Estimation區間估計

增强型MR可解决临床放射成像的多供应商互操作性问题

ZEEV ZEITIN Delft University of Technology, Netherlands

Advisor : Prof. Frank Y.S. Lin Presented by Yen-Yi, Hsu

Chp.4 The Discount Factor

数据摘要现状调研报告上下文摘要初步思考徐丹云.

Version Control System Based DSNs

研究技巧與論文撰寫方法中央大學資管系陳彥良.

高性能计算与天文技术联合实验室智能与计算学部天津大学

Maintaining Frequent Itemsets over High-Speed Data Streams

Guide to a successful PowerPoint design – simple is best

Total Review of Data Structures

Chp.4 The Discount Factor

中国科学技术大学计算机系陈香兰 2013Fall 第七讲存储器管理中国科学技术大学计算机系陈香兰 2013Fall.

VRP工具or-tools调研王羚宇

Learn Question Focus and Dependency Relations from Web Search Results for Question Classification 各位老師大家好,這是我今天要報告的論文題目,…… 那在題目上的括號是因為,前陣子我們有投airs的paper,那有reviewer對model的名稱產生意見.

Representation Learning of Knowledge Graphs with Hierarchical Types

Google Local Search API Research and Implementation

计算机问题求解 – 论题算法方法 2016年11月28日.

Highlight in cooperation-branch breakthrough I&S Branch 财年行业突破—冶金行业

Inter-band calibration for atmosphere

A Data Mining Algorithm for Generalized Web Prefetching

Chp.4 The Discount Factor

Outline Overview of this paper Motivation and Initialization

LOGO 2018 企业公司年会庆典PPT模板 SOME ENTERPRISE COMPANY ANNUAL MEETING PPT TEMPLATE.

赵才荣同济大学，电子与信息工程学院，智信馆410室

MGT 213 System Management Server的昨天，今天和明天

句子成分的省略（3）.

簡單迴歸分析與相關分析莊文忠副教授世新大學行政管理學系計量分析一(莊文忠副教授) 2019/8/3.

Principle and application of optical information technology

之前都是分类的蒸馏很简单。然后从分类到分割也是一样，下一篇是检测的蒸馏

Experimental Analysis of Distributed Graph Systems

A Trie-based Approach to Fast Flow Recognition for OpenFlow

Self-Attention huitr

Gaussian Process Ruohua Shi Meeting

Hybrid fractal zerotree wavelet image coding

Presentation transcript:

Efficient Query Relaxation for Complex Relationship Search on Graph Data 李舒馨 2018.09.21

Outline Motivation Algorithms Experiments Conclusion Review

Motivation The need to search on a large graph for semantic association between a group of entities Empty results -> disappoint user Query relax find maximum successful sub-query to minimize the loss. 在许多领域需要在很大的实体关联图上搜索一组查询实体间的复杂关联，搜索结果成为语义关联（semantic association），包含所有查询实体并且有大小限制。现有算法可能由于查询实体间距离较远而返回空的结果，给用户造成了不好的体验。于是，我们提出了查询松弛，为了减少损失，需要保留尽可能多的有关联的实体。

Motivation If diameter=3，the query result will be empty. Dist(Alice, Bob) = 4 > 3

Motivation Solution Result: {Alice, Dan} or {Bob, Dan} Increase the diameter constraint -> result in performance issues Remove some query entities (our solution) Result: {Alice, Dan} or {Bob, Dan}

Algorithms Baseline CertQR CertQR+ Heuristics

Algorithms —— Baseline Consider all possible sub-queries in non-increasing order of the number of query entities. For each sub-query, run existing algorithm for SA search and check whether it returns non-empty results. Low efficiency. Single run of SA search is exponential in the diameter constraint, and the number of sub-queries to be considered is exponential in the number of query entities. 考虑查询实体从多到少的所有组合，分别调用已有的搜关联的算法，直到搜到关联返回对应的子查询，否则返回空集。效率低下。单次关联搜索的时间随着直径指数级增长，子查询的数量随着查询实体的数量指数级增长。实验中先搜完路径，然后按照从多到少的组合判断是否有关联。

Algorithms —— CertQR Search for a certificate which can prove the existence of a SA rather than actually find such a SA. Certificate (c) : the distance between it and every query entity is not more than ceil(D/2) D=4, c=ICDE2019 dist(ICDE2019, Alice/Bob/Dan)<=ceil(D/2) 可以不用搜出实际的关联，而是找到一个certificate能够证明有关联存在。提出一个条件：该certificate到每个查询实体的距离不超过ceil(D/2)。 D为偶数时，该条件已经充分。

Algorithms —— CertQR D=3, the condition is still satisfied but it violate diameter constraint. critical query entity exactly ceil(D/2) distance Another condition c has a neighbor c’, the distance between c’ and each critical query entity is ceil(D/2)-1 Q={Dan, Erin, Frank} c = ICDE2019 c’ = Paper02 D为奇数时，只满足刚才的条件可能会违背直径约束，需要再加一个条件。可能造成违背直径约束的是那些到c刚好ceil(D/2)距离的实体，称为关键查询实体 c有一个邻居c’，它和那些到c有ceil(D/2)步的查询实体距离为ceil(D/2)-1。即保证了有一条边是公用的，也就不会违背直径约束了。例如查询Q={Dan, Erin, Frank}

Algorithms —— CertQR If certificate exists, it is not more than ceil(D/2) hops away from each entity in result query. Search and check all the entities within the scope in a breadth-first manner, return the maximum successful sub-query. 如果certificate存在，那么它到结果中的每个实体的距离一定不超过ceil(D/2)。穷举该范围内的所有点，返回最大的成功的子查询。

Algorithms —— CertQR+ Based on the idea of CertQR and improved. Best-first search Consider fewer entities, ensure that unvisited entities cannot be a certificate of any larger successful sub-query. Compute priority of entity, which estimates the number of query entities that is likely to obey the distance constrain through the certificate, is the upper bound of the result size. Use priority queue, if |Qmax|>=priority, return. 基于CertQR提出了改进的算法。 Best-first search，考虑更少的实体，保证没被访问过的实体不可能是更大的成功子查询的certificate。计算搜索到的实体的优先级，优先级估计了可能能够遵循距离约束的查询实体数量，是该实体关联的查询实体数量的一个上界。使用优先队列，如果当前实体的优先级<=当前最大的成功子查询的实体数量，就可直接返回。

Algorithms —— Heuristics dg: Degree-enhanced ds: Distance-enhanced dgs: Combined 启发式（为priority增加小数部分）基于度数：大度的点在扩展邻居和计算priority时较耗时，因此优先考虑小度点。基于距离：离可能的certificate近的点应该优先访问。两者结合。

Experiments Datasets Queries (D=3, 4, 5, 6) DBpedia LinkedMDB Mondial Sparse query For each dataset, each number of query entities n in the range of 2-6, we construct 100 queries. Dense query DBpedia: n 2-6, 92 queries. LinkedMDB: n 2-6, 39 queries. Mondial: n 2-6, 57 queries.

Experiments 原查询找不到关联的比例，有必要进行松弛。

Experiments Baseline CertQR CertQR+求不同直径的平均时间，可以看出我们的方法有效。三个启发式对于时间的提升。Combined效果总体最好

Conclusion Necessity of studying query relaxation for SA search Transform the problem into finding a certificate Depend on fast distance calculation, use distance oracle to achieve a trade-off between time and space. 证明了研究查询松弛的必要性。将实际找到关联转化为找certificate的问题，并据此提出了有效的best-first算法，以及两个启发式。但是算法依赖于快速的距离计算，使用distance oracle在时间和空间取得平衡。

Review Unfair baseline algorithm is set. The analysis of the experimental results is insufficient. Querying by entities directly is not convincing. Usually, users give keywords and matched entities need to be found. It is not user-friend to discard entities, better to relax diameter threshold. 1. Baseline设置不公平，BSL算法不仅返回了最大子查询，还计算了关联，CertQR和CertQR+没有考虑计算关联的时间。 2. 直接使用实体查询不令人信服，通常是给定关键词。 3. 实验结果分析不充分。 4. 移除实体对用户不友好，更好的方式是增加直径。

Q&A