Presentation is loading. Please wait.

Presentation is loading. Please wait.

1. Introduction, Notation

Similar presentations


Presentation on theme: "1. Introduction, Notation"— Presentation transcript:

1 1. Introduction, Notation
Chapter 7 SEARCHING 1. Introduction, Notation 2. Sequential Search 3. Binary Search 4. Comparison Trees 5. Lower Bounds 6. Asymptotic 7. Pointers and Pitfalls

2 7.1 search:Introduction and notation
◆We are given a list of records, where each record is associated with one piece of information, which we shall call a key. ◆We are given one key, called the target, and are asked to search the list to nd the record(s) (if any) whose key is the same as the target. ◆There may be more than one record with the same key, or there may be no record with a given key.

3 ◆We often ask how times one key is compared with another during a search. This gives us a good measure of the total amount of work that the algorithm will do. ◆Internal searching means that all the records are kept in high-speed memory. In external searching, most of the records are kept in disk les. We study only internal searching. ◆For now, we consider only contiguous lists of records. We shall consider linked structures in Chapter 10.

4 Records and Keys in C++ The records (from a class Record) that are stored in the list being searched (generally called the list) must conform to the following minimal standards: ◆Every Record is associated to a key (of a type or class called Key). ◆Key objects can be compared with the standard operators == , != , <, >, <= , >= . ◆There is a conversion operation to turn any Record into its associated Key.

5 ◆Every Record is associated to a key (of a type or class called Key).
The records (from a class Record) that are stored in the list being searched (generally called the list) must conform to the following minimal standards: ◆Every Record is associated to a key (of a type or class called Key). ◆Key objects can be compared with the standard operators == , != , <, >, <= , >= . ◆There is a conversion operation to turn any Record into its associated Key. ◆Record objects can be compared to each other or to a Key by first converting the Record objects to their

6 associated keys. ◆Examples of the conversion operation: ◆A method of the class Record, with the declaration operator Key( ) const; ◆A constructor for the class Key, with declaration Key(const Record &); ◆If the classes Key and Record are identical, no conversion needs to be defined, since any Record is automatically a Key. ◆We do not assume that a Record has a Key object as a data member, although often it does. We merely assume that the compiler can turn a Record into its corresponding Key.

7 Parameters for Search Functions
Each searching function has two input parameters: ◆First is the list to be searched; ◆Second is the target key for which we are searching. Each searching function will also have an output parameter and a returned value: ◆The returned value has type Error_code and indicates whether or not the search is successful in finding an entry with the target key. ◆If the search is successful, then the returned value is success,and the output parameter called position will locate the target within the list.

8 ◆If the search is unsuccessful, then the value not present is returned,and the output parameter may have an undefined value or a value that will differ from one method to another. Key Definition in C++ To select existing types as records and keys, a client could use type definition statements such as: typedef int Key; typedef int Record; A client can design new classes that display appropriate behavior based on the following skeletons:

9 在本章我们使用的类声明如下: typedef Key Record; class Key { public:
Definition of a Key class 在本章我们使用的类声明如下: class Key { public: static int comparisons; Key(int x=0) { key = x; } int the_key( ) const { return key; } Key& operator=(const int x) { key = x; return *this; } private: int key; char *other; }; static data number 由于只关心查找key, 不关心Record中的 其它内容,因此将 Record定义为key typedef Key Record;

10 7.2 Sequential Search 1. Algorithm and procedure Begin at one end of the list and scan down it until the desired key is found or the other end is reached: 我们在第6章中实现的find方法: int Find(const List_entry &x); // 查找元素x,找不到返回-1,否则为x在表中的位置 就是顺序查找方法,与本节书上的方法区别仅仅是: 返回了类型是Error_code,位置由引用参数带回。

11 Error_code sequential_search( List<Record> & the_list,
const int &target, int &position) /* Post: If an entry in the list has key equal to target, then return success and the output parameter position locates such an entry within the list. Otherwise return not_present and position becomes invalid. */ { for( position=0; position<the_list.size( ); position++) { Record x; the_list.retrieve(position, x); if(x==target) return success; } return not_present; // 脱离for,则检索不成功

12 To analyze the behavior of an algorithm that makes comparisons of keys, we shall use the count of these key comparisons as our measure of the work done. 2. Analysis The number of comparisons of keys done in sequential search of a list of length n is ◆ Unsuccessful search: n comparisons. ◆Successful search, best case: 1 comparison. ◆Successful search, worst case: n comparisons. ◆Successful search, average case: comparisons.

13 3. Testing Program ◆For test purposes, use integer keys, and do not store any data other than a key in a record. Accordingly, we define typedef Key Record; ◆Keep a count of all key comparison operations, by modifying the overloaded key comparison operations to increment a counter whenever they are called. ◆This counter must be available to all Key objects: Thus, it should be declared as a static class member. In C++, static class members provide data objects that are shared by every instance of the class.

14 ◆The method the key inspects a copy of a key's value.
◆The static counter comparisons is incremented by any call to a Key comparison operator. For example: bool operator== (const Key &x, const Key &y) { Key::comparisons++; return x.the_key( ) == y.the_key( ); } ◆Static data members must be defined and initialized outside of a class definition. Accordingly, the following statement is included in the Key implementation file key.c: int Key :: comparisons = 0; ◆Use the class Timer from Appendix C to provide CPU timing information.

15 Choice of test data ◆Most later searching methods require the data to be ordered, so use a list with integer keys in increasing order. ◆To test both successful and unsuccessful searches, insert only keys containing odd integers into the list. ◆If n denotes the number of entries in the list, then the targets for successful searches will be 1, 3, 5, …, 2n – 1 . ◆For unsuccessful searches, the targets will be 0, 2, 4, 6, …, 2n. ◆In this way we test all possible failures, including targets less than the smallest key in the list, between each pair, and greater than the largest.

16 7.3 Binary Search 二分查找/搜索 (折半查找/搜索) 1. Ordered Lists DEFINITION An ordered list is a list in which each entry contains a key, such that the keys are in order. That is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j . ◆All List operations except insert and replace apply without modification to an ordered list. ◆Methods insert and replace must fail when they would otherwise disturb the order of a list.

17 ◆We implement an ordered list as a class derived from a contiguous List. In this derived class, we shall override the methods insert and replace with new implementations. 重载的插入方法 覆盖父类的插入方法 该查找方法的 应用条件 typedef Key Record; class Ordered_list: public List<Record> { public: Ordered_list( ); Error_code insert(const Record &data); Error_code insert(int position, const Record &data); Error_code replace(int position, const Record &data); };

18 带两个参数的重载父类的插入方法,请看 Pg.280
Error_code Ordered_list::insert(const Record &data) { // 按要插入的data.key值插入到合适的位置,以保持有序 for(int i=0; i<size() && data >= entry[i]; i++); return List<Record>::insert(i, data); } 带两个参数的重载父类的插入方法,请看 Pg.280 Note: The overridden method replaces a method of the base class by one with a matching name and parameter list. The overloaded method matches in name but has a different parameter list.

19 2. Algorithm Development
Idea: In searching an ordered list, first compare the target to the key in the center of the list. If it is smaller, restrict the search to the left half; otherwise restrict the search to the right half, and repeat. In this way, at each step we reduce the length of the list to be searched by half. Keep: two indices, top and bottom, that will bracket the part of the list still to be searched. The target key, provided it is present, will be found between the indices bottom and top, inclusive. Initialization: Set bottom = 0; top = the list.size( ) - 1;

20 Compare target with the Record at the midpoint,
Change the appropriate index top or bottom to restrict the search to the appropriate half of the list. Loop terminates when top ≤ bottom, if it has not terminated earlier by finding the target. Make progress toward termination by ensuring that the number of items remaining to be searched, top - bottom + 1, strictly decreases at each iteration of the process.

21 2. Non-recursive Version Pg.283 - Pg.285
Please see: 1. recursive Version Pg.282 2. Non-recursive Version Pg Pg.285 下面我们先给出有序表类的两个方法的实现算法,然后给出顺序查找与非递归二分查找算法 (Pg.285 binary_search_2)及测试上述算法的主程序:

22 Error_code Ordered_list::insert(const Record &data)
{ // 按要插入的data.key值插入到合适的位置,以保持有序 for(int i=0; i<size( ) && data >= entry[i]; i++); return List<Record>::insert(i, data); } Error_code Ordered_list::replace(int position, const Record &data) // overridden replace { // 为了保持有序,先删除序号position的记录,然后插入 Record list_data; if((remove(position, list_data))!=success) return fail; return insert(data);

23 Error_code sequential_search(Ordered_list& the_list,
const int &target, int &position) // 顺序查找算法 { for( position=0; position<the_list.size( ); position++) { Record x; the_list.retrieve(position, x); if(x==target) return success; } return not_present; // 脱离for,则检索不成功

24 { int bottom=0, top=the_list.size()-1; // 置区间初值 while(bottom<=top)
Error_code binary_search(Ordered_list& the_list, const int &target, int &position) // 二分查找算法 { int bottom=0, top=the_list.size()-1; // 置区间初值 while(bottom<=top) { int mid=(bottom+top)/2; Record x; the_list.retrieve(mid,x); if ( target==x ) // 检索成功 { position = mid; return success; } if (target>x) bottom=mid+1; // 修改下界 bottom 值 else top=mid-1; // 修改上界top值 } return not_present; // 脱离while,则 bottom>top 检索不成功 See Pg.285 binary_search_2

25 #include <stdlib.h>
#include <time.h> #include "Search.h" void main() { Ordered_list L; int n, i, j, pos; Record x; char answer; time_t t; srand((unsigned)time(&t)); //初始化随机数种子 cout<<" Please enter List Size=?(<=30) "; cin>>n; cout<<" Generate Random sequence is: \n{ "; for(i=0; i<n; ) // 产生n个互不相等的整数,插入表,使有序 { j = rand() % 1000; if (binary_search(L, j,pos)==not_present) // 产生的整数不在表中,插入 { x=j; L.insert(x); i++; cout<<j<<' '; } } cout<<"}\n Ordered_list L is:\n { "; L.traverse(visit);

26 cout<<"}\n"; do { cout<<"\n Please enter search key : "; cin>>i; Key::comparisons = 0; if(binary_search(L, i,pos)==success) // 用二分查找 cout<<"\n Recorg Position ="<<pos<<"\t binary_search comparisons="<<Key::comparisons; else cout<<"\n Recorg not present"<<"\t binary_search if(sequential_search(L, i,pos)==success) // 用顺序查找 cout<<"\n Recorg Position ="<<pos<<"\t sequential_search comparisons="<<pos+1; else cout<<"\n Recorg not present"<<"\t sequential_search comparisons="<<pos; cout<<"\n\n Continue search (y/n) ? "; cin>>answer; }while(answer=='y' || answer=='Y');

27 上面有两个do循环,第1个:每次输入一个查找关键码,分别用两种方法查找,输出查找结果及比较次数。
do{ cout<<"\n Please enter replaced position key : "; cin>>i>>j; x=j; if(L.replace(i,x)==success) { cout<<" Ordered_list L is:\n { "; L.traverse(visit); cout<<"}\n"; } else cout<<" fail: position error!\n"; cout<<"\n\n Continue replace (y/n) ? "; cin>>answer; }while(answer=='y' || answer=='Y'); cout<<endl; } 上面有两个do循环,第1个:每次输入一个查找关键码,分别用两种方法查找,输出查找结果及比较次数。 第2个每次输入一对值,分别是欲修改记录的位置及要重新设置的记录值。

28 Algorithm Verification
◆ The division of the list into sublists is described in the following diagram: ◆ Only entries strictly less than target appear in the first part of the list, but the last part contains entries greater than or equal to target. ◆When the middle part of the list is reduced to size 1, it will be guaranteed to be the first occurrence of the target if it appears more than once in the list.

29 Recognizing Equality in Binary Search
◆We must prove carefully that the search makes progress towards termination. This requires checking the calculations with indices to make sure that the size of the remaining sublist strictly decreases at each iteration. It is also necessary to check that the comparison of keys corresponds to the division into sublists in the above diagram. Recognizing Equality in Binary Search ◆Method: Check at each stage to see if the target has been found.

30 Example: Search process of 15 and 60
Search 15 && 60 step 1 Search 15 step success

31 Search 60 step 2 Search 60 step 3 Search 60 step not_present

32 7.4 Comparison Trees The comparison tree (also called decision tree or search tree) of an algorithm is obtained by tracing the action of the algorithm, representing each comparison of keys by a vertex of the tree (which we draw as a circle). Inside the circle we put the index of the key against which we are comparing the target key. 由于本书这一部分内容比较晦涩,我们先用中文教材中的方式讲解,之后再解释本书内容。

33 中文书通常称为:折半查找判定树 n=10的判定树
◆The comparison ( decision/search ) tree of Example 中文书通常称为:折半查找判定树 查找成功,关键字 x == List [7] . key 查找失败 查找关键字x 介于 List [1] . key和 List [2] . key之间 n=10的判定树

34 在判定树中所有叶结点均为方形结点,称为判定树的外部结点(称圆形结点为内部结点),那么查找不成功的过程就是走了一条从根结点到外部结点的路径,和关键字进行比较的次数等于该路径的内部结点个数,例如查找 60 的过程是走了从根到结点 8-9 的路径。 internal vertices external vertices n个内部结点构成的判定树,其高度不会大于log2n ,因此,二分查找在查找不成功时和关键字比较的次数最多不超过 log2n +1(外部结点)。

35 平均查找长度:为了确定数据元素在查找表中的位置,需要和给定值进行比较的关键字个数的数学期望值,称为查找算法在查找成功时的平均查找长度。对于长度为n的查找表,查找成功的平均查找长度为:
其中pi为查找第i个数据元素的概率,ci为找到查找表中第i个元素时,进行比较的次数。 Average Search Length

36 那么,二分查找的平均查找长度是多少呢? 假定表的长度为 n=2h-1,则描述二分查找的判定树是深度(高度)为h的满二叉树。树中层次为1的结点有1个,层次为2的结点有2个,…,层次为h的结点有2h-1个。又假设表中每个元素的查找概率相等(Pi=1/n),则二分查找的平均查找长度为: height level

37 ◆Branches (lines) drawn down from the circle represent the possible outcomes of the comparison. When the algorithm terminates, we put either F (for failure) or the location where the target is found at the end of the appropriate branch, which we call a leaf, and draw as a square. ◆The remaining vertices are called the internal vertices of the tree. The number of comparisons done by an algorithm in a particular search is the number of internal vertices traversed in going from the top of the tree, called its root, down the appropriate path to a leaf. ◆The number of branches traversed to reach a vertex from the root is called the level of the vertex. Thus the

38 root itself has level 0, the vertices immediately below it have level 1, and so on. The largest level that occurs is called the height of the tree. ◆We call the vertices immediately below a vertex v the children of v and the vertex immediately above v the parent of v. ◆The external path length of a tree is the sum of the number of branches traversed in going from the root once to every leaf in the tree. ◆The internal path length is defined to be the sum, over all vertices that are not leaves, of the number of branches from the root to the vertex.

39 Lemma 7.1 The number of vertices on each level of a 2-
◆ As 2-tree is a tree in which every vertex except the leaves has exactly two children. 严格二叉树 Lemma 7.1 The number of vertices on each level of a 2- tree is at most twice the number on the level immediately above. Hence, in a 2-tree, the number of vertices on level t is at most 2t for t ≥ 0. Lemma 7.2 If a 2-tree has k vertices on level t , then t ≥ lg k, where lg denotes a logarithm with base 2.

40 Unless stated otherwise, all logarithms will be taken with base 2.
Conventions Unless stated otherwise, all logarithms will be taken with base 2. The symbol lg denotes a logarithm with base 2, and the symbol ln denotes a natural logarithm. When the base for logarithms is not specified (or is not important), then the symbol log will be used. The floor of a real number x is the largest integer less than or equal to x, and the ceiling of x is the smallest integer greater than or equal to x. We denote the floor of x by and the ceiling of x by . 约定

41 注意:本书与中文教材在这儿相差一个常数因子2。
Binary Search Analysis The number of comparisons done in an unsuccessful search by binary_search_2 is approximately 2 lg(n+ 1). In a successful search of a list of n entries, binary search does approximately comparisons of keys. 注意:本书与中文教材在这儿相差一个常数因子2。 本书是精确的用和“关键字比较次数”来计算的,而 中文教材用 “进行了比较的关键字的个数”来计算的。 (每进行一个关键字比较,若查找成功则比了一次,否则还要 比一次,以确定待查字到底大于当前关键字否,并依此选择 继续在前半段查找还是后半段查找。)

42 7.5 Lower Bounds See Pg 详细分析了二分查找的算法复杂度,结论:从理论上说binary_search_ 1(Pg.281 )的上界最小。 以10个结点的二分查找树( Pg Fig.7.2-Fig.7.4 )为例,介绍分析算法复杂度的方法。

43 The Big-O and Other Notations
7.6 Asymptotics: The Big-O and Other Notations ◆算法的复杂性是算法运行所需要的计算机资源的量,需要的时间资源的量称作时间复杂性,需要的空间(即存储器)资源的量称作空间复杂性。这个量应该集中反映算法中所采用的方法的效率,而从运行该算法的实际计算机中抽象出来。换句话说,这个量应该是只依赖于算法要解的问题的规模、算法的输入和算法本身的函数。 如果分别用N、I和A来表示算法要解问题的规模、算法的输入和算法本身,用C表示算法的复杂性,那么应该有:C =F(N, I, A) 其中F(N, I, A)是 N, I, A的一个确定的三元函数。

44 ◆如果把时间复杂性和空间复杂性分开,并分别用
T 和 S来表示,那么应该有: T = T( N, I, A ) 和 S = S( N, I, A ) ◆通常,我们让A隐含在复杂性函数名当中,因而将 T 和S 分别简写为: T = T( N, I ) 和 S = S( N, I ) ◆由于时间复杂性和空间复杂性概念类同,计算方法相似,且空间复杂性分析相对地简单些,所以我们将主要地讨论时间复杂性。

45 ◆根据T(N, I )的概念,它应该是算法在一台抽象的计算机上运行所需的时间。设此抽象的计算机所提供的元运算有k 种,他们分别记为 O1,O2 , … ,Ok;再设这些元运算每执行一次所需要的时间分别为 t1, t2, … , tk 。对于给定的算法A,设经过统计,用到元运算Oi的次数为ei,i=1,2,..,k ,很明显,对于每一个i,1≦i≦k,ei是N 和I 的函数,即ei=ei(N,I)。则: 其中ti,i=1,2,..,k,是与N,I无关的常数。 显然,我们不可能对规模N的每一种合法的输入I都去统计ei(N,I),i=1,2,…,k。因此T(N,I)的表达式还得进一步简化,或者说,我们只能在规模为N的某些或某类有代表性的合法输入中统计相应的ei , i=1,2,…,k,评价时间复杂性。

46 下面我们只考虑三种情况的复杂性,即最坏情况、最好情况和平均情况下的时间复杂性,并分别记为Tmax(N)、Tmin(N)和Tavg(N )。在数学上有:
其中,DN是规模为N的合法输入的集合;I*是DN中一个使T(N,I *)达到Tmax(N)的合法输入, 是DN中一个使T(N,) 达到Tmin(N)的合法输入;而P(I)是在算法的应用中出现输入I 的概率。

47 复杂性的渐近性态及其阶 设T(N)是在前面所定义的关于算法A的复杂性函数。一般说来,当N单调增加且趋于∞时,T(N)也将单调增并趋于∞。对于T(N),如果存在 ,使得当N→∞时有: 那么,我们就说 是 T(N) 当 N→∞时的渐近性态,或叫 为算法A当N→∞时的 渐近复杂性而与T(N)相区别,因为在数学上, 是T(N)当N→∞时的渐近(asymptotic)表达式。 直观上, 是T(N)中略去低阶项所留下的主项。所以它无疑比T(N)来得简单。

48 渐近意义下的记号(asymptotic notation):
例: 当T(N)=3N2 + 4Nlog2N + 7 时, = 3N2,因为这时有: 显然, 3N2 比 3N2 + 4Nlog2N + 7 简单得多。 渐近意义下的记号(asymptotic notation): The Big-O and Related Notations

49 ◆大写Ο符号 定义 设f(N)和g(N)是定义在正数集上的正函数。如果存在正的常数C和自然数N0,使得当N ≥N0时有f(N)≤Cg(N)。则称函数f(N)当N充分大时上有界,且g(N)是它的一个上界,记为f(N)=Ο(g(N))。这时我们还说f(N)的阶不高于g(N)的阶。 > ◆小写o 符号 定义 设f(N)和g(N)是定义在正数集上的正函数。如果对于任意给定的ε> 0,都存在正的常数C和自然数N0,使得当N ≥N0时有 f(N)/g(N) <ε。则称函数f(N)当N充分大时的阶比g(N) 低, 记为 f(N) = o((g(N))。

50 ◆ 符号(与大Ο符号类似,它用来估算函数f的下限值。)
◆ 符号(与大Ο符号类似,它用来估算函数f的下限值。) 定义 设f(N)和g(N)是定义在正数集上的正函数。如果存在正的常数C和自然数N0,使得当N ≥N0时有f(N) ≥ Cg(N)。则称函数f(N)当N充分大时下有界,且g(N)是它的一个下界,记为f(N)= (g(N))。这时我们还说 f(N)的阶不低于g(N)的阶。 ◆θ符号 (同一函数 g 既是 f 的上限也是 f 的下限。) 定义 f(N) =θ(g(N))当且仅当 f(N)=Ο(g(N))且 f(N) = (g(N))。 一般,在算法分析中常常只用Ο符号,即只关心算法A的时间复杂性f(N)的上界(最坏情况) 。下面我们就举例说明Ο符号的使用。

51 See Pg. 310 举例:(略去低阶项和常数因子,留下主项,记为Ο ) ⑴因为对所有N ≥1有3N ≤4N ,我们说3N =Ο(N );
因为若不然,则存在正的常数C和自然数N0,使得当N ≥N0时 有N3≤C N 2,即N≤C 。显然当取N = max(N0,[C]+l)时这个不等式不成立,所以N3≠Ο(N 2)。 See Pg. 310

52 Pointers and Pitfalls 仔细检查 ◆In designing algorithms be very careful of the extreme cases, such as empty lists, lists with only one item, or full lists (in the contiguous case). ◆Be sure that all your variables are properly initialized. ◆Double check the termination conditions for your loops, and make sure that progress toward termination always occurs.

53 ◆In case of difficulty, formulate statements that will be correct both before and after each iteration of a loop, and verify that they hold. ◆Avoid sophistication for sophistication's sake. Whenever a simple method is adequate for your application, use it. ◆Don't reinvent the wheel. If a ready-made function is adequate for your application, use it.

54 ◆Sequential search is slow but robust
◆Sequential search is slow but robust. Use it for short lists or if there is any doubt that the keys in the list are properly ordered. ◆Be extremely careful if you must reprogram binary search. Verify that your algorithm is correct and test it on all the extreme cases. ◆Drawing trees is an excellent way both to trace the action of an algorithm and to analyze its behavior. ◆Rely on the big-O analysis of algorithms for large applications but not for small applications.


Download ppt "1. Introduction, Notation"

Similar presentations


Ads by Google