Download presentation
Presentation is loading. Please wait.
Published byΆριστόδημος Παπαδάκης Modified 6年之前
1
SView0.2-排序 (续) Rank aggregation Two rank aggregation method
Kemeny method Spearman's Footrule method Improve Folksorting How to do experiments and evalution
2
1. Ranking aggregation This is a classical problem from social choice and voting theory, in which each voter gives a preference on a set of alternatives, and the system outputs a single preference order on the set of alternatives based on the voters' preferences
3
Election Winner Two such schemes. One was invented by the mathematician Charles Dodgson (better known as Lewis Carroll), and the other was suggested by J. Kemeny. We show that under a either a Dodgson election or a Kemeny election, it is NP-hard (that is, at least as hard as an NP-complete problem) to determine whether any particular candidate has won!
4
a. Dodgson Election Dodgson score is NP-complete.
Now we contrive an election for which determining the winner entails solving Exact Cover by 3-Sets (X3C), which is known to be NP-complete.
5
b. Kemeny Election Kemeny score is NP-complete, and Kemeny ranking and Kemeny winner are NP-hard. Proof. Kemeny score is in NP since the score of any candidate can be computed in polynomial time. Kemeny ranking and Kemeny winner is finding a minimum feedback arc set
6
Feedback Arc Set 给定一个有向图G=( V, A) , V={v1 , v2 , ⋯, vn }, A={a1 , a2 , ⋯, am}, 其中V 为顶点集合, A 为弧集合, n 为顶点个数,m为弧个数, 要求找到一个最小的子集A*A, 使得G*=( V, A-A*) 无环。
7
2. Rank aggregation method
Kemeny method Spearman's Footrule method
8
Definition U: a finite set of elements. {x1,x2,…,xn}. By assignig a unique identifier to each element in U, assume U={1,2,…,|U|} Τ: ordered list of U. Τ(i): denote position of i in T.( i U) πk: the kth user preference orders on U. σ: is closest to a given set of input lists {πk }, called consensus order.
9
Kemeny method 距离度量:2个序列π₁, π₂的Kendall-tau距离为:
K(π₁, π₂) = |{(i, j)|i<j , π₁(i) < π₁( j) ,but π₂(i) > π₂(j)}| counts the number of pairwise disagreements between two lists Kemeny 排序方法对由k 个排序π₁, π₂,…, πk计算最优的合成排序σ ,使得Kendall-tau距离总和 K(σ; π₁, π₂,…, πk) = 最小。 The Kendall tau distance counts the number of pairwise disagreements between two lists; 成对分歧 that is, the distance between two full lists σ and τ is K(σ; τ) =|{(i; j)| i < j; σ(i) <σ(j); but τ(i) >τ(j)}|. Dividing this number by the maximum possible value 从|S|选2 ; we obtain a normalized version of the Kendall distance. The Kendall distance for full lists is the `bubble sort' distance, i.e., the number of pairwise adjacent transpositions needed to transform from one list to the other. The Kendall distance between two lists of length n can be computed in nlogn time using simple data structures. K(σ, πl)
10
A weighted complete digraph G=(V ,E, W)
Kemeny optimal aggregation is equivalent to find a minimum feedback arc sets(FAS) in digraph. A weighted complete digraph G=(V ,E, W) V: the set of elements. Here, V=U. E: the set of arcs. <xi, xj>: prefer xi to xj W: the total K-distance (from the πi‘s) of a ranking that among elements. weight W(xi, xj)= |{<xi, xj>|xi< xj , π (xi) < π (xj)| x1 x2 x3 x4
11
This problem can be cast as a special case of weighted minimum feedback arc sets in tournaments.
x1 x2 x4 x3 The minimum feedback arc set problem can be approximated to within O(logn loglog n) in general graphs and has (at least) the same approximation hardness as the vertex cover problem, which is 1.36. Even 等将带权有向图FAS 问题规约为有向循环网络( circular networks) 中最小容量多路分割问题( minimum capacity multicut) , 提出了一个近似度为O( log2|X| ) 的算法, 其中X 是给定的顶点集合, 从图中删除FAS 中边将使得图中没有圈与X 中点相交
12
computational complexity
Total = Distance computed+ kemeny optimal aggregation Distance computed: Two lists :K(π₁ , π₂ ) is O(nlogn) W(xi, xj): O(knlogn); W: O((n*(n-1)/2) *knlogn)) kemeny optimal aggregation: minimum feedback arc sets
13
Spearman's Footrule method
距离度量:2个序列π₁, π₂ 的Spearman's Footrule distance F(π₁ , π₂ ) = the sum of the absolute difference between the rank of i according to the two lists. Spearman排序方法对由k 个排序π₁, π₂,…, πk计算最优的合成排序σ,使得Spearman's Footrule距离总和 最小。 | π₁(i)-π₂(i) | The Spearman footrule distance is the sum, over all elements i ∈S, of the absolute difference between the rank of i according to the two lists. 元素i所在不同序列中的位置绝对差 Formally, given two full lists σ and τ , the distance is given by F(σ; τ) After dividing this number by the maximum value |S|2/2, one can obtain a normalized value of the footrule distance,which is always between 0 and 1. The footrule distance between two lists can be computed in linear time. F(σ, πl)
14
footrule optimal aggregation is equivalent to find a minimum cost perfect matching in a bipartite graph. A weighted complete bipartite graph G=(C ,P, W) C: the set of elements. Here, C=U. P: the set of available positions. PN+ W: the total F-distance (from the πi's) of a ranking that places element c at position p. W(c,p)= |πi(c)-p | … 1 2 n P: C: x1 x2 xn W:
15
computational complexity
16
summary Above two method according to full list
Two method for partial lists is more problematic. footrule optimal aggregation to be equivalent to the NP-hard problem of computing the minimum number of edges to delete to convert a directed graph into a directed cyclic graph(DAG). Distance measure: scaled footrule distance; induced footrule distance.
17
3. Improve Folksorting Definition
U: a finite set of elements. {x1,x2,…,xn}. By assigning a unique identifier to each element in U, assume U={1,2,…,|U|} Τ: ordered list of U. Τ(i): denote position of i in T.( i U) Τ∣S : supposed SU, a partial list of T that contains only elements from S. Wi: Given i=1,2,…,k users, Wi U, represent the scope of the ith user’s focus on U, all of the elements involved in the user’s movement
18
πi: the ith user rearrangement the list Τ∣Wi, finally obtain a preference order. πi denote a permutation of Τ∣Wi σ: called consensus permutation, is closest to a given set U of input permutations{πi}, finally obtain a consensus order of U. d(π₁ , π₂ )= D(σ; π₁, π₂,…, πk)= d(σ∣πi , πi)= costF(σ∣πi(x) , πl (x)) = | π₁(i)π₂(i) | d(σ∣πi , πi) costF(σ∣πi(x) , πi (x))
19
① Use for reference from Kemeny optimal aggregation
equivalent to find a maximum weight directed cycle in a weighted digraph. A weighted directed graph G=(V ,E, W) V: the set of elements. Here, V= Wi. E: the set of edges, with respect to edges of permutation graph of {πi}, and delete duplicate edge. W(xi, xj): the number of duplicate edge in edges of permutation graph of {πi}
20
permutation graph Majority graph x1 x2 x5 x3 x4 2 x1 x2 Target:
原始序列 (x1 x2 x3 x4) ;用户移动后序列(x4 x1 x2 x3) ;置换为πi =(x1 x4 x3 x2) x1 x2 x5 x3 x4 2 Majority graph 原始序列 (x1 x2 x3 x4 x5) ; 3个用户移动后序列分别(x4 x1 x2 x3 x5) , (x1 x3 x4 x5x2), (x1 x2 x4 x5 x3 ), 对应π1=(x1 x4 x3 x2) π2=(x2 x3 x4 x5) π3=(x3 x4 x5) x1 x2 Target: maximum weight directed cycle 2 x3 x5 2 2 x4
21
② Use for reference from Footrule optimal aggregation
equivalent to find a maximum weight perfect matching in a bipartite graph. x1 x2 x5 x3 x4 2 最终得到:每个元素对应全序列中的位置; 可能存在不同的结果集 T(x1) T(x2) T(x3) T(x4) T(x5) x1 x2 x5 x3 x4 2 2 x1 x2 x3 x4 x5
22
summary 借鉴前面的两种方法,利用图论中问题来解决。 不同点
输入:kemeny和footrule是lists,folksorting是置换; kemeny和footrule的图中边的权重是距离,folksorting是边出现重复次数 计算复杂度,folksorting边权重计算:W(xi, xj): is O(k),W: O(nk); kemeny和footrule是O((n*(n-1)/2) *knlogn)) 和O(n2k)。 优化算法的复杂度一样O(n2.5). [但nfolksorting nfootrule ] 现提出改进方法,边的权重计算没有完全体现出hamming距离,总目标可以体现出 现没有对 涉及用户的非关注点的元素的边权重进行类似induced,scaled的计算 现选择边时,除权重外,没加入启发式规则
23
谢谢 请提出宝贵意见
Similar presentations