Chapter 10 Graphs
Outline Introduction to Graphs Searches(DFS,BFS) Minimum Spanning Trees(MST 最小生成树) Topological Sorting with Directed Graphs(有向图的拓扑排序) Connectivity in Directed Graphs(有向图的连通性)
Graphs 是程序设计中的多面手。它解决的问题跟之前学过的数据结构大不相同。 The sorts of problems that it can solve are generally quite different from those we’ve dealt with before.
In next Chapter In this Chapter Unweighted graphs Weighted graphs
Introduction to Graphs A tree is a kind of graph. dictated by(取决于) the algorithms used on them. Binary Trees The shape of Graphs dictated by(取决于) a physical or abstract problem. Such as: Flight routes between the cities. or AOE.
Definitions vertices edges Adjacency Paths Connected Graphs At least one path from every vertex to every other vertex vertices edges Adjacency Paths Connected Graphs I J E D G F A H B C
Introduction to Graphs A non-connected graph consists of several connected components. A B C D A B D C a) Connected Graph b) Non-connected Graph
Introduction to Graphs Directed and Weighted Graphs A A 20 55 B B 40 D D 10 C C Directed Graphs Weighted Graphs We’re going to begin this chapter by discussing simple undirected, unweighted graphs; later we’ll explore directed unweighted graphs.
Introduction to Graphs Leonhard Euler Historical Note He solved a famous problem dealing with the bridges in the town of Konigsberg(今为加里宁格勒). To find a way to walk across all seven bridges without recrossing any of them. The key to his solution was to represent the problem as a graph
Introduction to Graphs Representing a Graph in a Program Vertices A B C D We’ll store them in an array called vertexList. It has no relevance to how the vertices are connected by edges. For this, we need another mechanism. class Vertex{ public char label; // label (e.g. ‘A’) public boolean wasVisited; public Vertex(char lab) // constructor { label = lab; wasVisited = false; } } // end class Vertex
Introduction to Graphs Edges Two methods are commonly used for graphs: the adjacency matrix(邻接矩阵) and the adjacency list(邻接表). A B C D The Adjacency Matrix
Introduction to Graphs B C D The Adjacency List Don’t confuse the contents of adjacency lists with paths!
Introduction to Graphs Adding Vertices to a Graph A nVerts B D vertexList F A B C D F C vertexList[nVerts++] = new Vertex(‘F’); A B C D Adding Edges to a Graph F 1 A B C D 1 adjMat[1][4] = 1; adjMat[4][1] = 1;
The Graph Class class Graph{ private final int MAX_VERTS = 20; private Vertex vertexList[]; // array of vertices private int adjMat[][]; // adjacency matrix private int nVerts; // current number of vertices public Graph() // constructor { vertexList = new Vertex[MAX_VERTS]; // adjacency matrix adjMat = new int[MAX_VERTS][MAX_VERTS]; nVerts = 0; for(int j=0; j<MAX_VERTS; j++) // set adjacency for(int k=0; k<MAX_VERTS; k++) // matrix to 0 adjMat[j][k] = 0; } // end constructor // continued…
The Graph Class public void addVertex(char lab) // argument is label { vertexList[nVerts++] = new Vertex(lab); } public void addEdge(int start, int end) adjMat[start][end] = 1; adjMat[end][start] = 1; public void displayVertex(int v) System.out.print(vertexList[v].label); } // end class Graph
Searches Fundamental operations: E.g. 某地出发乘火车旅行,设计电路板,…。 To find all vertices can be reached from a specified vertex. E.g. 某地出发乘火车旅行,设计电路板,…。 There are two common approaches to searching a graph: depth-first search (DFS) and breadth-first search (BFS). Both will eventually reach all connected vertices.
Searches Depth-First Search(深度优先查找) It uses a stack to remember where it should go when it reaches a dead end(死胡同). 1.Pick a starting point (A) (1)visit this vertex, (2)push it onto a stack (3)mark it 2.Go to any vertex adjacent to A that hasn’t yet been visited. Repeat 1(1)~(3)
Searches RULE 1 If possible, visit an adjacent unvisited vertex, mark it, and push it on the stack. RULE 2 If you can’t follow Rule 1, then, if possible, pop a vertex off the stack. RULE 3 If you can’t follow Rule 1 or Rule 2, you’re done.
Searches The GraphN Workshop Applet and DFS
Searches An Analogy The algorithm likes to get as far away from the starting point as quickly as possible and returns only when it reaches a dead end. An Analogy
Searches Java Code 设计 DFS algorithm 的关键:找到指定Vertex的邻接、而且未访问的Vertex. // returns an unvisited vertex adjacent to v public int getAdjUnvisitedVertex(int v) { for(int j=0; j<nVerts; j++) if(adjMat[v][j]==1 && vertexList[j].wasVisited==false) return j; // return first such vertex return -1; // no such vertices } // end getAdjUnvisitedVertex()
Searches 深度优先搜索算法(DFS) LISTING 13.1 The dfs.java Program 起点入栈S; while(栈S不空){ 1.用栈S的peek()方法检查栈顶的顶点; 2.试图找到这个顶点还未访问的邻接顶点; 3.如果没有找到,出栈; 4.如果找到这样的顶点,访问这个顶点,并把它放入栈。 } LISTING 13.1 The dfs.java Program
Searches Depth-First Search and Game Simulations 1997年,由1名国际特级大师, 4名电脑专家组成的“深蓝”小组研究开发出“更深的蓝”,它具有更加高级的“大脑”,通过安装在RS/6000S大型计算机上的256个专用处理芯片, 可以在每秒钟计算2亿步,并且存储了百年来世界顶尖棋手的10亿套棋谱,最后“超级深蓝”以3.5比2.5击败了卡斯帕罗夫. 卡斯帕罗夫要求重赛,但没有得到回应. Searches Depth-First Search and Game Simulations 9*8*7*6*5*4*3*2*1 paths 64!Or 1089 paths
基于博弈树的人机对战算法 博弈树上的每一个节点都代表一个棋局,棋手就是要在众多的叶子节点上挑选一个最佳的局面作为自己的选择, 从而反推到当前的着法
Searches Breadth-First Search(广度优先搜索) The algorithm likes to stay as close as possible to the starting point. This kind of search is implemented using a queue instead of a stack.
Searches An Example
Searches RULE 1 访问与当前Vertex邻接的、且未访问的Vertex, mark it, and insert it into the queue. RULE 2 若因为没有未访问的Vertex,而无法执行 RULE 1, remove a vertex from the queue (if possible) and make it the current vertex. RULE 3 If you can’t carry out Rule 2 because the queue is empty, you’re done.
Searches The GraphN Workshop Applet and BFS Java Code 起点入队Q while( Q不空 ) // until queue empty, { 队头出队v1; while( v1有未访问的邻接顶点v2) { // get one, 访问v2; v2入队 } LISTING 13.2 The bfs.java Program
扩展知识-网络爬虫(Web crawler) 如何有效地提取并利用网页信息成为一个巨大的挑战。搜索引擎(Search Engine) 作为一个辅助人们检索信息的工具成为用户访问WWW的入口。通用性搜索引擎的局限性: (1) 搜索引擎所返回的结果包含大量用户不关心的网页。 (2)通用搜索引擎的目标是尽可能大的网络覆盖率,有限的搜索引擎服务器资源与无限的网络数据资源之间的矛盾将进一步加深。 (3) 对这些信息含量密集且具有一定结构的数据无能为力,不能很好地发现和获取。 (4)大多只是基于关键字的检索,难以支持根据语义信息提出的查询
扩展知识-网络爬虫(Web crawler) 为了解决上述问题,定向抓取相关网页资源的聚焦爬虫应运而生。 聚焦爬虫是一个自动下载网页的程序,它根据既定的抓取目标,有选择的访问网页与相关的链接,获取所需要的信息。 它并不追求大的覆盖,而将目标定为抓取与某一特定主题内容相关的网页,为面向主题的用户查询准备数据资源。
扩展知识-网络爬虫(Web crawler) 网络爬虫的组成 控制器:主要工作是负责给多线程中的各个爬虫线程分配工作任务。 解析器:主要工作是下载网页,进行页面的处理,主要是将一些JS脚本标签、CSS代码内容、空格字符、HTML标签等内容处理掉,爬虫的基本工作是由解析器完成。 资源库:用来存放下载到的网页资源,一般都采用大型的数据库存储,如Oracle数据库,并对其建立索引。
扩展知识-网络爬虫(Web crawler) 深度优先算法实现的网络爬虫 Webcrawler.java
扩展知识-网络爬虫(Web crawler) 聚焦爬虫工作原理以及关键技术概述 传统爬虫从一个或若干初始网页的URL开始,获得初始网页上的URL,在抓取网页的过程中,不断从当前页面上抽取新的URL放入队列,直到满足系统的一定停止条件。 聚焦爬虫还需要根据网页分析算法过滤与主题无关的链接,保留有用的链接并将其放入等待抓取的URL队列。然后,它将根据一定的搜索策略从队列中选择下一步要抓取的网页URL,并重复上述过程,直到达到系统的某一条件时停止。 网页的抓取策略:深度优先、广度优先和最佳优先。深度优先在很多情况下会导致爬虫的陷入(trapped)问题,目前常见的是广度优先和最佳优先方法。
Minimum Spanning Trees(最小生成树) Graph with the minimum number of edges necessary to connect the vertices. A A B C B C D E D E a) Extra Edges b) Minimum Number of Edges
Minimum Spanning Trees(最小生成树) 在MST中,边数和顶点数的关系为: E = V – 1 GraphN Workshop Applet
Minimum Spanning Trees(最小生成树) Java Code for the Minimum Spanning Tree 代码跟深度优先查找很类似.区别在于: 在while内部的else 子句中,当前vertex和下一个未访问的邻接vertex将被输出。 These two vertices define the edge that the algorithm is currently traveling to get to a new vertex, and it’s these edges that make up the minimum spanning tree. The mst.java Program
Topological Sorting with Directed Graphs An Example: Course Prerequisites(先修课程)
Topological Sorting with Directed Graphs public void addEdge(int start, int end) // directed graph { adjMat[start][end] = 1; }
Topological Sorting with Directed Graphs A list of all the courses necessary for your degree BAEDGCFH 按这种方式得到的排列,称之为图的拓扑排序。 other possible orderings CFBAEDGH
Model other situations besides course prerequisites.
Topological Sorting with Directed Graphs The GraphD Workshop Applet STEP 1 找到一个没有后继的顶点v。 STEP 2 从图中删除v,在要输出的列表sortedArray前面插入v。 重复STEP1和STEP2,直到所有顶点都从图中删除。这时,列表显示的顶点顺序就是拓扑排序的结果。
Topological Sorting with Directed Graphs Cycles and Trees More general than binary and multiway trees What’s a cycle? It’s a path that ends up where it started. A A graph with no cycles is called a tree. B C Q:如何判断无向图中是否有环? D A:如果有N个顶点的图有超过N-1条边, 那么它必定存在环。
Topological Sorting with Directed Graphs Call noSuccessors() to find any vertex with no successors. Java Code 有未处理的 Vertex T vertex is found the graph must have a cycle F (1)put the vertex label at the end of sortedArray[] (2)delete the vertex from graph T Output sortedArray F
Topological Sorting with Directed Graphs The noSuccessors() method 按行扫描整个邻接矩阵 如果某行全为0.则查找成功,返回其行号。 如果所有行都不满足全为0,就返回一1。 Delete a vertex 顶点从vertexList[]数组删除,后面的顶点向前移动填补空位。同样,顶点的行、列从邻接矩阵中删除,后继的行、列移动来填补空位。这些任务由deleteVertex(), moveRowUp()和moveCoILeft()方法来完成。对于这个算法,用邻接表表示图效率更高,但要使用更多空间。
Connectivity in Directed Graphs The Connectivity Table(连通性表) What vertices can you reach if you start on a particular vertex? AC BACE C DEC EC Use dfs.java program to start the search on each vertex. We get
Connectivity in Directed Graphs Warshall’s Algorithm 在有些应用中,需要快速地找出是否一个顶点可以从其他顶点到达。例如想沿Hubris航线从Athens到Murinansk,如果不考虑中间要转多少次站,那么可能到达吗? 一种方法是检索连通性表,但是那要查看指定行的所有表项,需要O(N)的时间(这里N是指定顶点可到达的顶点数目平均值)。 可否有更快的方式?
Connectivity in Directed Graphs public class Warshall { /** * @param args */ public static void main(String[] args) { // TODO Auto-generated method stub int[][] adjMatrix = { { 0, 1, 0, 0 }, { 0, 0, 0, 1 }, { 0, 0, 0, 0 }, { 1, 0, 1, 0 } }; System.out.println("邻接矩阵:"); display(adjMatrix); int[][] bibao = Warshall(adjMatrix); System.out.println("传递闭包的矩阵:"); display(bibao); } private static void display(int[][] BiBao) { for (int i = 0; i < BiBao.length; i++) { for (int j = 0; j < BiBao.length; j++) System.out.print(BiBao[i][j] + " "); System.out.println(); public static int[][] Warshall(int[][] adjMatrix) { // 接受一个图的邻接矩阵为参数,返回表达它的传递闭包的矩阵 int[][] result = adjMatrix; // 初始结果R0 int n = adjMatrix.length; for (int k = 0; k < n; k++) // R0到Rn,做n步变换 { int[][] temp = new int[n][n]; // 每循环一下求下次结果 for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) { if (result[i][j] == 1) temp[i][j] = 1; else if (result[i][k] == 1 && result[k][j] == 1) else temp[i][j] = 0; result = temp; return result; 用 Warshall’s algorithm 将邻接矩阵变为图的传递闭包. 所有顶点间的路径可达情况,该矩阵就为所要求的传递闭包矩阵 A可到C B可到C B可到A
Connectivity in Directed Graphs
Summary 图包含由边连接的顶点。 图可以表示许多真实世界的情况,包括飞机航线、电子线路和工作调度。 搜索算法以一种系统的方式访问图中的每个顶点。搜索是其他行为的基础。 两个主要的搜索算法是深度优先搜索(DFS )和广度优先搜索(BFS)。 深度优先搜索通过栈实现;广度优先搜索通过队列实现。 最小生成树(MST)包含连接图中所有顶点所需要的最少数量的边。 在不带权的图中,简单修改深度优先搜索算法就可以生成它的最小生成树。
在有向图中,边有方向(通常用箭头表示)。 拓扑排序算法创建这样一个顶点的排列列表:如果从A到B有一条边,那么在列表中顶点A在顶点B的前面。 拓扑排序只能在DAG中执行,DAG表示有向无环(没有环存在)图。 拓扑排序的典型应用是复杂项目的调度,它包含了任务和任务之间的前后关系。 Warshall算法能找到任意顶点和另外顶点之间是否有连接,不管是通过一步还是任意步到达。
The End