Applied Human Computer Interaction Lecture 10 Yan Ke


Clustering (Applied Human Computer Interaction, Lecture 10, Yan Ke)

Basic Clustering Methods: K-means, Expectation Maximization (EM)

Some Data. This could easily be modeled by a Gaussian Mixture (with 5 components). But let's look at a satisfying, friendly and infinitely popular alternative… Copyright © 2001, 2004, Andrew W. Moore

Lossy Compression
Suppose you transmit the coordinates of points drawn randomly from this dataset. You can install decoding software at the receiver. You're only allowed to send two bits per point, so it'll have to be a "lossy transmission". Loss = sum squared error between decoded coords and original coords. What encoder/decoder will lose the least information?
Copyright © 2001, 2004, Andrew W. Moore

Idea One
Break into a grid; decode each bit-pair as the middle of each grid-cell (cells 00, 01, 10, 11). Any better ideas?
Copyright © 2001, 2004, Andrew W. Moore

Idea Two
Break into a grid; decode each bit-pair as the centroid of all the data in that grid-cell (cells 00, 01, 11, 10). Any further ideas?
Copyright © 2001, 2004, Andrew W. Moore
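The two ideas are easy to compare in code. Below is a minimal sketch (our own, not from the lecture) that generates a made-up stand-in for the pictured dataset, encodes every point into two bits via a 2x2 grid, and measures the sum-squared-error loss of each decoder; the per-cell centroids of Idea Two can never lose more than the fixed cell middles of Idea One.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D stand-in for the slide's dataset: three Gaussian blobs
# inside the unit square (an assumption; the real data isn't given).
centers = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.8]])
data = np.vstack([c + 0.05 * rng.standard_normal((200, 2)) for c in centers])

# Two bits per point -> a 2x2 grid over the unit square (cells 00,01,10,11).
cell = (data[:, 0] > 0.5).astype(int) * 2 + (data[:, 1] > 0.5).astype(int)

# Idea One: decode each bit-pair as the middle of its grid-cell.
grid_mid = np.array([[x, y] for x in (0.25, 0.75) for y in (0.25, 0.75)])
loss_one = np.sum((data - grid_mid[cell]) ** 2)

# Idea Two: decode each bit-pair as the centroid of the data in that cell.
cell_centroids = np.array([data[cell == c].mean(axis=0) for c in range(4)])
loss_two = np.sum((data - cell_centroids[cell]) ** 2)

print(f"Idea One (cell middles)   SSE: {loss_one:.2f}")
print(f"Idea Two (cell centroids) SSE: {loss_two:.2f}")  # always <= Idea One
```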

K-means
1. Ask user how many clusters they'd like. (e.g. k = 5)
2. Randomly guess k cluster Center locations.
3. Each datapoint finds out which Center it's closest to. (Thus each Center "owns" a set of datapoints.)
4. Each Center finds the centroid of the points it owns…
5. …and jumps there.
6. Repeat until terminated!
Copyright © 2001, 2004, Andrew W. Moore
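As a concrete reference, here is a from-scratch sketch of the loop above (our own code, assuming Euclidean distance; the names kmeans, owner, etc. are ours, not the lecture's):

```python
import numpy as np

def kmeans(data, k, rng, n_iters=100):
    """Lloyd's algorithm: assign points to the nearest Center, move Centers."""
    # Randomly guess k cluster Center locations (here: k random datapoints).
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # Each datapoint finds out which Center it's closest to.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        owner = dists.argmin(axis=1)
        # Each Center jumps to the centroid of the points it owns
        # (a Center that owns no points simply stays put).
        new_centers = np.array([
            data[owner == j].mean(axis=0) if np.any(owner == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # no Center moved: terminated
            break
        centers = new_centers
    return centers, owner

# Demo on five made-up blobs, matching the slide's k = 5 example.
rng = np.random.default_rng(1)
data = np.vstack([m + rng.standard_normal((100, 2))
                  for m in ([0, 0], [5, 5], [0, 5], [5, 0], [2.5, 2.5])])
centers, owner = kmeans(data, k=5, rng=rng)
print(centers)
```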

K-means Start. Advance apologies: in black and white this example will deteriorate. Example generated by Dan Pelleg's super-duper fast K-means system: Dan Pelleg and Andrew Moore, "Accelerating Exact k-means Algorithms with Geometric Reasoning", Proc. Conference on Knowledge Discovery in Databases (KDD 1999), available at www.autonlab.org/pap.html. Copyright © 2001, 2004, Andrew W. Moore

K-means continues… (several more iterations of the assign-to-nearest-Center and move-to-centroid steps) Copyright © 2001, 2004, Andrew W. Moore

K-means terminates Copyright © 2001, 2004, Andrew W. Moore

K-means Questions
- What is it trying to optimize?
- Are we sure it will terminate?
- Are we sure it will find an optimal clustering?
- How should we start it?
- How could we automatically choose the number of centers?
…we'll deal with these questions over the next few slides.
Copyright © 2001, 2004, Andrew W. Moore
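The answer slides for these questions are not part of this transcript, but the quantity K-means tries to optimize is conventionally called the distortion: the total squared distance from each datapoint to the Center that owns it. In our notation (a standard formulation, not copied from the deck):

```latex
% Distortion: the sum of squared distances from each of the n datapoints
% x_i to the cluster Center that currently owns it. Each of K-means' two
% steps (re-assign owners, move Centers to centroids) never increases it,
% which is why the algorithm terminates.
\mathrm{Distortion} \;=\; \sum_{i=1}^{n} \left\lVert \mathbf{x}_i - \mathbf{c}_{\mathrm{owner}(i)} \right\rVert^{2}
```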

Will we find the optimal configuration? Not necessarily. Can you invent a configuration that has converged, but does not have the minimum distortion? (Hint: try a fiendish k=3 configuration here…) Copyright © 2001, 2004, Andrew W. Moore


Trying to find good optima
- Idea 1: Be careful about where you start.
- Idea 2: Do many runs of k-means, each from a different random start configuration.
- Many other ideas floating around.
Copyright © 2001, 2004, Andrew W. Moore
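Idea 2 is a one-loop change in code. A sketch using scikit-learn (assumed available; KMeans and its inertia_ attribute are real scikit-learn API, the restart loop is ours):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = np.vstack([m + rng.standard_normal((100, 2))
                  for m in ([0, 0], [5, 5], [0, 5], [5, 0], [2.5, 2.5])])

best = None
for seed in range(10):  # 10 restarts is an arbitrary choice
    # n_init=1: one single run per seed, so each restart is explicit here.
    km = KMeans(n_clusters=5, n_init=1, random_state=seed).fit(data)
    if best is None or km.inertia_ < best.inertia_:  # inertia_ = distortion
        best = km
print(f"best distortion over 10 restarts: {best.inertia_:.2f}")
```

In practice scikit-learn's n_init parameter performs exactly this restart-and-keep-best procedure internally.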

Common uses of K-means
- Often used as an exploratory data analysis tool
- In one dimension, a good way to quantize real-valued variables into k non-uniform buckets
- Used on acoustic data in speech understanding to convert waveforms into one of k categories (known as Vector Quantization)
- Also used for choosing color palettes on old-fashioned graphical display devices!
Copyright © 2001, 2004, Andrew W. Moore
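For instance, the one-dimensional quantization use fits in a few lines. A hedged sketch (scikit-learn assumed available; the lognormal "incomes" variable is a made-up example):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# A skewed real-valued variable: uniform-width buckets would waste most
# of their resolution on the long tail.
incomes = rng.lognormal(mean=10, sigma=0.8, size=5000).reshape(-1, 1)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(incomes)
buckets = km.predict(incomes)                 # bucket index for each value
print(np.sort(km.cluster_centers_.ravel()))   # k non-uniform bucket centres
```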

Single Linkage Hierarchical Clustering
1. Say "every point is its own cluster".
2. Find the "most similar" pair of clusters.
3. Merge them into a parent cluster.
4. Repeat… until you've merged the whole dataset into one cluster.
How do we define similarity between clusters?
- Minimum distance between points in the clusters (in which case we're simply doing Euclidean Minimum Spanning Trees)
- Maximum distance between points in the clusters
- Average distance between points in the clusters
You're left with a nice dendrogram, or taxonomy, or hierarchy of datapoints (not shown here).
Copyright © 2001, 2004, Andrew W. Moore
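A sketch of the whole procedure with SciPy (assumed available; not code from the lecture): linkage builds the merge hierarchy bottom-up exactly as described, and fcluster cuts the dendrogram into k groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Three made-up blobs so the hierarchy has an obvious 3-group cut.
data = np.vstack([m + 0.3 * rng.standard_normal((30, 2))
                  for m in ([0, 0], [3, 3], [0, 3])])

# "most similar" = minimum inter-point distance (single linkage).
Z = linkage(data, method="single")
labels = fcluster(Z, t=3, criterion="maxclust")  # cut tree into k = 3 groups
print(np.bincount(labels)[1:])                   # cluster sizes

# method="complete" and method="average" give the other two similarity
# definitions listed above (maximum / average inter-point distance).
```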

Single Linkage Comments
- Also known in the trade as Hierarchical Agglomerative Clustering (note the acronym).
- It's nice that you get a hierarchy instead of an amorphous collection of groups.
- If you want k groups, just cut the (k-1) longest links.
- There's no real statistical or information-theoretic foundation to this. Makes your lecturer feel a bit queasy.
Copyright © 2001, 2004, Andrew W. Moore

What you should know
- All the details of K-means.
- The theory behind K-means as an optimization algorithm.
- How K-means can get stuck.
- The outline of hierarchical clustering.
- Be able to contrast which problems would be relatively well or poorly suited to K-means vs. Gaussian Mixtures vs. hierarchical clustering.
Copyright © 2001, 2004, Andrew W. Moore

Exercise 1: Group these cards into three groups using the K-means method.

Hierarchical Clustering Methods (the slides that follow were originally in Chinese and are translated here)

Overview of Hierarchical Clustering Methods. Hierarchical clustering decomposes a given dataset level by level until some termination condition is satisfied. There are two variants:
- Agglomerative hierarchical clustering: a bottom-up strategy. Start with each object as its own cluster, then merge these atomic clusters into larger and larger clusters until a termination condition is met.
- Divisive hierarchical clustering: a top-down strategy. Start with all objects in a single cluster, then split it into smaller and smaller clusters until a termination condition is met.
AGNES is the representative agglomerative algorithm; DIANA is the representative divisive algorithm. (DMKD slides by MAO)

The AGNES Algorithm. AGNES (AGglomerative NESting) initially treats each object as a cluster; clusters are then merged step by step according to some criterion. The similarity between two clusters is determined by the similarity of the closest pair of data points, one from each cluster. Merging repeats until the desired number of clusters is reached.
Algorithm 5-3: AGNES (bottom-up agglomerative)
Input: a database of n objects; termination condition: target number of clusters k.
Output: k clusters, meeting the termination condition.
(1) Treat each object as an initial cluster;
(2) REPEAT
(3) Find the two closest clusters, according to the nearest pair of data points in different clusters;
(4) Merge the two clusters to produce a new set of clusters;
(5) UNTIL the defined number of clusters is reached.
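A minimal from-scratch sketch of Algorithm 5-3 (our own code, using the nearest-pair cluster distance defined above); running it on the 8-point dataset of the worked example on the next slide reproduces the same final two clusters:

```python
import numpy as np

def agnes(points, k):
    clusters = [[i] for i in range(len(points))]  # (1) each object is a cluster
    while len(clusters) > k:                      # (2)-(5) merge until k remain
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # (3) cluster distance = nearest pair of points across clusters
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best                            # (4) merge the closest pair
        # Tie-breaking may differ from the example table's merge order,
        # but the final clusters come out the same.
        print(f"merge {clusters[a]} + {clusters[b]} at distance {d:.3f}")
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

pts = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 4], [3, 5], [4, 4], [4, 5]],
               dtype=float)
print(agnes(pts, k=2))  # -> {1,2,3,4} and {5,6,7,8} (0-based indices here)
```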

AGNES Example. Dataset (8 points):

Point  Attr1  Attr2
1      1      1
2      1      2
3      2      1
4      2      2
5      3      4
6      3      5
7      4      4
8      4      5

Step 1: compute the distance between every pair of initial clusters and pick (at random among ties) a pair at the minimum distance, which is 1; points 1 and 2 merge into one cluster.
Step 2: recompute the inter-cluster distances after the merge and merge the closest pair; points 3 and 4 become one cluster.
Step 3: repeat step 2; points 5 and 6 become one cluster.
Step 4: repeat step 2; points 7 and 8 become one cluster.
Step 5: merge {1,2} and {3,4} into one cluster of four points.
Step 6: merge {5,6} and {7,8}; the number of clusters has reached the user's termination condition, so the algorithm ends.

Step  Min cluster distance  Closest clusters  New set of clusters
1     1                     {1},{2}           {1,2},{3},{4},{5},{6},{7},{8}
2     1                     {3},{4}           {1,2},{3,4},{5},{6},{7},{8}
3     1                     {5},{6}           {1,2},{3,4},{5,6},{7},{8}
4     1                     {7},{8}           {1,2},{3,4},{5,6},{7,8}
5     1                     {1,2},{3,4}       {1,2,3,4},{5,6},{7,8}
6     1                     {5,6},{7,8}       {1,2,3,4},{5,6,7,8} (end)

AGNES Performance Analysis. AGNES is fairly simple, but it often runs into difficulty when choosing merge points. Once a group of objects has been merged, the next step works on the newly formed clusters: merges cannot be undone, and objects cannot be exchanged between clusters. A poor merge decision at any step can therefore lead to low-quality clustering results. The method also does not scale well, because every merge decision requires examining and estimating a large number of objects or clusters. Suppose there are n clusters at the start and 1 at the end; the main loop then runs n times, and in the i-th iteration we must find the closest pair among n-i+1 clusters. The algorithm must also compute the distances between all pairs of objects, so its complexity is O(n²), which makes it unsuitable for large n.

The DIANA Algorithm. DIANA (DIvisive ANAlysis) is the typical divisive clustering method. The user can define the desired number of clusters as a termination condition. It uses two measures:
- Cluster diameter: the maximum distance between any two data points in a cluster.
- Average dissimilarity (average distance) of a point: the average of its distances to all the other points in its cluster.
Algorithm 5-4: DIANA (top-down divisive)
Input: a database of n objects; termination condition: target number of clusters k.
Output: k clusters, meeting the termination condition.
(1) Treat all objects as one initial cluster;
(2) FOR (i = 1; i ≠ k; i++) DO BEGIN
(3) Among all clusters, pick the cluster C with the largest diameter;
(4) Find the point p in C with the largest average dissimilarity to the other points; put p into the splinter group and the remaining points into the old party;
(5) REPEAT
(6) In the old party, find a point whose distance to the nearest point in the splinter group is not greater than its distance to the nearest point in the old party, and add that point to the splinter group;
(7) UNTIL no new old-party point is assigned to the splinter group;
(8) The splinter group and the old party are the two clusters into which the selected cluster has been split; together with the other clusters they form the new set of clusters.
(9) END.

DIANA Example. Dataset (the same 8 points):

Point  Attr1  Attr2
1      1      1
2      1      2
3      2      1
4      2      2
5      3      4
6      3      5
7      4      4
8      4      5

Step 1: find the cluster with the largest diameter and compute the average dissimilarity of each point in it (using Euclidean distance). Average distance of point 1: (1 + 1 + 1.414 + 3.6 + 4.24 + 4.47 + 5) / 7 = 2.96. Similarly, point 2: 2.526; point 3: 2.68; point 4: 2.18; point 5: 2.18; point 6: 2.68; point 7: 2.526; point 8: 2.96. Pick a point with the largest average dissimilarity, point 1, and put it into the splinter group; the remaining points form the old party.
Step 2: in the old party, find a point whose distance to the nearest splinter-group point is not greater than its distance to the nearest old-party point, and move it to the splinter group; that point is 2.
Step 3: repeat step 2; point 3 joins the splinter group.
Step 4: repeat step 2; point 4 joins the splinter group.
Step 5: no more old-party points can join the splinter group, and the termination condition (k = 2) is reached, so the program ends. If the termination condition had not been reached, we would pick the cluster with the largest diameter among the resulting clusters and continue splitting.

Step  Cluster with largest diameter  Splinter group  Old party
1     {1,2,3,4,5,6,7,8}              {1}             {2,3,4,5,6,7,8}
2     {1,2,3,4,5,6,7,8}              {1,2}           {3,4,5,6,7,8}
3     {1,2,3,4,5,6,7,8}              {1,2,3}         {4,5,6,7,8}
4     {1,2,3,4,5,6,7,8}              {1,2,3,4}       {5,6,7,8}
5     {1,2,3,4,5,6,7,8}              {1,2,3,4}       {5,6,7,8} (end)
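For reference, here is a minimal sketch of DIANA's split step, (4)-(7) of Algorithm 5-4 (our own code); on the same 8 points it reproduces the splinter group {1,2,3,4} from the table above (the code uses 0-based indices):

```python
import numpy as np

def dist(points, i, j):
    return float(np.linalg.norm(points[i] - points[j]))

def diana_split(points, cluster):
    """Split one cluster into a splinter group and an 'old party'."""
    # (4) seed the splinter group with the point of max average dissimilarity
    avg = {i: np.mean([dist(points, i, j) for j in cluster if j != i])
           for i in cluster}
    seed = max(avg, key=avg.get)
    splinter, old = [seed], [i for i in cluster if i != seed]
    moved = True
    while moved:                      # (5)-(7) keep moving points across
        moved = False
        for i in list(old):
            d_spl = min(dist(points, i, j) for j in splinter)
            d_old = min((dist(points, i, j) for j in old if j != i),
                        default=float("inf"))
            if d_spl <= d_old:        # (6) no farther from splinter group: move
                splinter.append(i)
                old.remove(i)
                moved = True
    return splinter, old

pts = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 4], [3, 5], [4, 4], [4, 5]],
               dtype=float)
print(diana_split(pts, list(range(8))))  # -> ([0, 1, 2, 3], [4, 5, 6, 7])
```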

Exercise 2: Group the credit card applicants into two groups (using AGNES or DIANA)

Stock Index Forecasting (Applied Human Computer Interaction, Lecture 11, Yan Ke)

Stock indices

Stock Index Data

Day    1st min  2nd min  …  Label
day1   …        …        …  Class1
day2   …        …        …  Class2
day3   …        …        …  Class3

Averaged curves