数据结构第4章串.

Slides:

Advertisements

Similar presentations

動畫與遊戲設計 Data structure and artificial intelligent

Advertisements

Memory Pool ACM Yanqing Peng.

TQC+ 物件導向程式認證-JAVA.

Performance Evaluation

資料庫設計 Database Design.

補充: Input from a text file

第8章字串與陣列 8-1 字串處理 8-2 一維陣列的處理 8-3 建立多維陣列 8-4 不規則陣列與參數傳遞 8-5 陣列排序與搜尋.

Chapter 7 Search.

Tree（樹）什麼是「樹」？「樹」的範例「樹」的定義「樹」的表示法.

Linked List（串列） Why Linked List? Pointer Dynamic Allocation

第4章串串的基本概念和C语言的串函数串的存储结构动态数组实现的顺序串串的模式匹配算法——BF算法主要知识点.

数据结构 Data Structure 主讲人：王国军，郑瑾中南大学中南大学信息院计科系

Chapter 4 歸納(Induction)與遞迴(Recursion)

Zn 模式匹配与KMP算法 Zn

佇列 (Queue).

Derived Class 前言衍生類別的定義單一繼承 public, protected, 和 privated 基底類別

Chap 3 堆疊與佇列 Stack and Queue.

Chapter 6 Graph Chang Chi-Chung

刘胥影东南大学计算机学院面向对象程序设计1 2011~2012第3学期刘胥影东南大学计算机学院.

程序设计II 第三讲字符串处理.

C 程式設計— 語言簡介台大資訊工程學系資訊系統訓練班.

Last Lecture Revisited

1 巨集 2 資料型態 3 物件、屬性、方法與事件 4 陳述式與副函式 5 其他注意事項 6 範例

·线性表的定义及ADT ·线性表的顺序存储结构 ·线性表的链接存储结构 · 单向循环链表 · 双链表、双向循环链表 · 一元多项式的加法

JAVA程序设计第5章深入理解JAVA语言----补充.

C++ 程式設計— 語言簡介台大資訊工程學系資訊系統訓練班.

导入 STL的概念与组成 Iterator（迭代器） Container（容器） Algorithm（算法） Adaptors（配接器）

授课老师：龚涛信息科学与技术学院 2018年3月教材：《Visual C++程序员成长攻略》《C++ Builder程序员成长攻略》

第三章 C++中的C 面向对象程序设计(C++).

Lecture 1 STL Primer.

【STL標準樣版函式庫】 STL（Standard Template Library），即標準樣版函式庫，是一個具有工業標準、高效率的C++函式庫。它包含於C++標準函式庫（C++ Standard Library）中，是ANSI/ISO C++標準中，最新的、也是極具革命性的一部分。STL包含了諸多在電腦科學領域裏所常用的基本資料結構和基本演算法。為廣大C++程式師們提供了一個可擴展的應用框架，高度實現了軟體的可複用性。這種現象有些類似於Microsoft.

第5章堆疊（Stacks） 5-1 堆疊的基礎 5-2 堆疊的表示法 5-3 堆疊的應用 - 運算式的計算與轉換

第3章變數、資料型別與運算子 3-1 變數與資料型別的基礎 3-2 變數的命名與宣告 3-3 資料型別 3-4 運算式與運算子

重點資料結構之選定會影響演算法選擇對的資料結構讓您上天堂程式.

数据结构 Ch.4 串计算机学院肖明军

五、链表 1、单链表 2、双向链表 3、链表应用.

4.1 串的抽象数据类型的定义 4.2 串的表示和实现 4.3 串的模式匹配算法.

String Matching Michael Tsai 2012/06/04.

第四章串 4.1 串类型定义 4.2 串的表示和实现 4.3 串的模式匹配算法 4.4 串操作应用举例.

樹 2 Michael Tsai 2013/3/26.

第三章链表单链表 (Singly Linked List) 循环链表 (Circular List) 多项式及其相加

Chapter 5 Recursion.

Classes (2) Lecture 7.

第五章串和数组 5.1 串的定义和操作 5.2 串的表示和实现 5.3 字符串应用 5.4 字符串匹配算法 5.5 数组

第三章链表单链表循环链表多项式及其相加双向链表稀疏矩阵.

常宝宝北京大学计算机科学与技术系数据结构（三）常宝宝北京大学计算机科学与技术系

第五章递归与广义表递归的概念递归过程与递归工作栈递归与回溯广义表.

在Microsoft Access 下建立資料庫

Speaker: Liu Yu-Jiun Date: 2009/4/29

第二章数组作为抽象数据类型的数组顺序表多项式抽象数据类型稀疏矩阵字符串小结.

第五章串和数组 5.1 串的定义和操作 5.2 串的表示和实现 5.3 字符串应用 5.4 字符串匹配算法 5.5 数组

String Matching Michael Tsai 2013/05/28.

Inheritance -II.

字符串 (String) 字符串是 n (  0 ) 个字符的有限序列，记作 S = “c1c2c3…cn” 其中，S 是串名字

计算机问题求解 – 论题1-5 - 数据与数据结构 2018年10月16日.

第六讲栈和队列（一）.

第 8 章标准模板库STL 陈哲副教授南京航空航天大学计算机科学与技术学院.

第 3 章类的基础部分陈哲副教授南京航空航天大学计算机科学与技术学院.

第六章类属B树索引技术对基于树的索引方法给出一种通用算法。该算法是建立在类属B树的概念之上开发的。它将类型系统开放，使系统能支持用户自定义的数据类型、函数和某些特殊的查询谓词的集合。并且，将新的数据类型、函数、查询谓词等登记到数据库管理系统中，

#include <iostream.h>

進階 WWW 程式設計 -- PHP Array 靜宜大學資訊管理學系蔡奇偉副教授

2012 程式設計比賽 Openfind 天使帝國 v2.0 (蓋亞的紋章).

Oop7 字串 String.

Introduction to Computer Security and Cryptography

第六章复合数据类型指针的声明与使用数组的声明与使用指针与数组的相互引用字符串及相关库函数 new与delete

When using opening and closing presentation slides, use the masterbrand logo at the correct size and in the right position. This slide meets both needs.

Presentation transcript:

数据结构第4章串

第4章串 4.1 串类型的定义 4.2 串的表示和实现 4.3 串的模式匹配算法 4.4 串操作应用举例

4.1 串类型的定义 In mathematical logic, more precisely in the theory of formal languages, and in computer science, a string is a sequence of symbols that are chosen from a set or alphabet In computer programming, a string is, essentially, a sequence of characters. A string is generally understood as a data type storing a sequence of data values, usually bytes, in which elements usually stand for characters according to a character encoding, which differentiates it from the more general array data type. In this context, the terms binary string and byte string are used to suggest strings in which the stored data does not (necessarily) represent text. http://en.wikipedia.org/wiki/String_(computer_science)

4.1 串类型的定义串(string)是由零个或多个字符组成的有限序列串中字符的数目称为串的长度零个字符的串称为空串(null string)，其长度为0 串中任意连续的字符组成的子序列称为该串的子串称字符在序列中的序号为该字符在串中的位置

4.1 串类型的定义串的抽象数据类型的定义如下： ADT String { 数据对象： D＝{ ai |ai∈CharacterSet, i=1,2,...,n; n≥0 } 数据关系： R1＝{ < ai-1, ai > | ai-1, ai ∈D, i=2,...,n } ...

4.1 串类型的定义基本操作： StrAssign (&T, chars) DestroyString(&S) StrCopy (&T, S) StrLength(S) StrCompare (S, T) Concat (&T, S1, S2) StrEmpty (S)

4.1 串类型的定义 } ADT String SubString (&Sub, S, pos, len) ClearString (&S) Index (S, T, pos) Replace (&S, T, V) StrInsert (&S, pos, T) StrDelete (&S, pos, len) } ADT String

4.1 串类型的定义 StrAssign (&T, chars) 初始条件：chars 是字符串常量。操作结果：把 chars 赋为 T 的值。

4.1 串类型的定义 StrCopy (&T, S) 初始条件：串 S 存在。操作结果：由串 S 复制得串 T。

4.1 串类型的定义 DestroyString (&S) 初始条件：串 S 存在。操作结果：串 S 被销毁。

4.1 串类型的定义 StrEmpty (S) 初始条件：串S存在。操作结果：若 S 为空串，则返回TRUE，否则返回 FALSE。 “”表示空串，空串的长度为零。

4.1 串类型的定义 StrCompare (S, T) 初始条件：串 S 和 T 存在。操作结果：若S > T，则返回值 > 0；若S < T，则返回值 < 0；若S = T，则返回值 = 0。例如： StrCompare( “data”, “state”) < 0 StrCompare( “cat”, “case”) > 0

4.1 串类型的定义 StrLength (S) 初始条件：串 S 存在。操作结果：返回 S 的元素个数，称为串的长度。

4.1 串类型的定义 Concat (&T, S1, S2) 初始条件：串 S1 和 S2 存在。例如： Concate( T, “man”, “kind”) 求得 T = “mankind”

4.1 串类型的定义 SubString (&Sub, S, pos, len) 初始条件：串 S 存在，1≤pos≤StrLength(S)，且0≤len≤StrLength(S)-pos+1。操作结果：用 Sub 返回串 S 的第 pos 个字符起长度为 len 的子串。例如： SubString( sub, "commander", 4, 3) 求得 sub = "man" ; SubString( sub, "commander" , 1, 9) 求得 sub = "commander" SubString( sub, "commander", 9, 1) 求得 sub = "r" SubString( sub, "commander", 4, 7) sub = ? SubString( sub, "beijing", 7, 2) sub = ? SubString("student", 5, 0) = "" 长度为 0 的子串为“合法”串

4.1 串类型的定义 Index (S, T, pos) 初始条件：串S和T存在，T是非空串，1≤pos≤StrLength(S)。操作结果：若主串 S 中存在和串 T 值相同的子串, 则返回它在主串 S 中第pos个字符之后第一次出现的位置; 否则函数值为0。 “子串在主串中的位置”意指子串中的第一个字符在主串中的位序。假设 S = "abcaabcaaabc", T = "bca" Index(S, T, 1) = 2; Index(S, T, 3) = 6; Index(S, T, 8) = 0;

4.1 串类型的定义例如：假设 S = "abcaabcaaabca", T = "bca" Replace (&S, T, V) 初始条件：串S, T和 V 均已存在，且 T 是非空串。操作结果：用V替换主串S中出现的所有与（模式串）T 相等的不重叠的子串。例如：假设 S = "abcaabcaaabca", T = "bca" 若 V = "x", 则经置换后得到 S = "axaxaax" 若 V = "bc", 则经置换后得到 S = "abcabcaabc"

4.1 串类型的定义 StrInsert (&S, pos, T) 初始条件：串S和T存在， 1≤pos≤StrLength(S)＋1。例如： S = “chater ”，T = “rac ”，则执行 StrInsert(S, 4, T)之后得到 S = "character "

4.1 串类型的定义 StrDelete (&S, pos, len) 初始条件：串S存在 1≤pos≤StrLength(S)-len+1。操作结果：从串S中删除第pos个字符起长度为len的子串。

4.1 串类型的定义 ClearString (&S) 初始条件：串S存在。操作结果：将S清为空串。

4.1 串类型的定义对于串的基本操作集可以有不同的定义方法，在使用高级程序设计语言中的串类型时，应以该语言的参考手册为准。两个例子： MFC 中的 CStringT http://msdn.microsoft.com/en-us/library/5bzxfsea.aspx stl 中的 string http://msdn2.microsoft.com/en-us/library/hd5zecz6(VS.80).aspx

4.1 串类型的定义在上述抽象数据类型定义的操作中，五种操作构成串类型的最小操作子集。即：这些操作不可能利用其他串操作来实现，串赋值StrAssign 串比较StrCompare 求串长StrLength 串联接Concat 求子串SubString 五种操作构成串类型的最小操作子集。即：这些操作不可能利用其他串操作来实现，反之，其他串操作（除串清除ClearString和串销毁DestroyString外）可在这个最小操作子集上实现。

4.1 串类型的定义 i 例如，可利用串比较、求串长和求子串等操作实现定位函数Index( S, T, pos)。算法的基本思想为： StrCompare(SubString(S, i, StrLength(T)),T ) ? 0 i S 串 pos T 串 n-m+1 T 串

4.1 串类型的定义 int Index (String S, String T, int pos) { // T为非空串。若主串S中第pos个字符之后存在与 T相等的子串 // 则返回第一个这样的子串在S中的位置，否则返回0 if (pos > 0) { n = StrLength(S); m = StrLength(T); i = pos; while ( i <= n-m+1) { SubString (sub, S, i, m); if (StrCompare(sub,T) != 0) ++i ; else return i ; } // while } // if return 0; // S中不存在与T相等的子串 } // Index

4.1 串类型的定义又如串的置换函数 news 串 pos pos = i+m i pos S 串 sub n-pos+1 T 串 V 串

4.1 串类型的定义 void replace(String& S, String T, String V) { n=StrLength(S); m=StrLength(T); pos = 1; StrAssign(news, ""); i=1; while ( pos < n-m+1 && i) { i=Index(S, T, pos); if (i!=0) { SubString(sub, S, pos, i-pos); Concat(news, news, sub, V); pos = i+m; } SubString(sub, S, pos, n-pos+1); Concat( S, news, sub );

4.1 串类型的定义串的逻辑结构和线性表极为相似，区别仅在于串的数据对象约束为字符集。串的基本操作和线性表有很大差别：在线性表的基本操作中，大多以“单个元素”作为操作对象；在串的基本操作中，通常以“串的整体”作为操作对象。

4.2 串的表示和实现 4.2.1 串的定长顺序存储表示 4.2.2 串的堆分配存储表示 4.2.3 串的块链存储表示

4.2.1 串的定长顺序存储表示类似于线性表的顺序存储结构，可用一组地址连续的存储单元存储串值的字符序列。例如C和C++语言中串不是预定义的数据类型，而是以字符数组来表示串。如声明 char str[10]; 表明 str 是一个串变量。C语言中还规定了一个"串的结束标志 '\0'"（字符 '\0'称为空终结符），即数组中在该结束标志之前的字符是串变量的有效字符，但结束标志本身要占一个字符的空间，因此串变量 str 的值（字符序列）的实际长度可在这个定义范围内随意，但最大不能超过9。

4.2.1 串的定长顺序存储表示在这种表示方法下，实现串操作的基本操作是“字符序列的复制”。如下列算法为串的联接。 void Concat( char S1[ ], char S2[ ], char T[ ]){ 　　// 以T返回由S1和S2联接而成的新串　　j=0; k=0; 　　while ( S1[j] != '\0') T[k++] = S1[j++]; 　　// 复制串 S1 　　j = 0; 　　while ( S2[j] != '\0') T[k++] = S2[j++]; //紧接着复制串S2 　　T[k] = '\0';　　　　　　　 // 置结果串T的结束标志　} // Concat_Sq 实际实现时请注意变量T存储区的准备

4.2.1 串的定长顺序存储表示另外一种定长顺序存储表示： #define MAXSTRLEN 255 // 用户可在255以内定义最大串长 typedef unsigned char Sstring[MAXSTRLEN + 1]; // 0号单元存放串的长度

4.2.1 串的定长顺序存储表示对应的联接操作： Status Concat(SString S1, SString S2, SString &T) { // 用T返回由S1和S2联接而成的新串。 // 若未截断, 则返回TRUE，否则FALSE。 if (S1[0]+S2[0] <= MAXSTRLEN) {// 未截断 T[1..S1[0]] = S1[1..S1[0]]; T[S1[0]+1..S1[0]+S2[0]] = S2[1..S2[0]]; T[0] = S1[0]+S2[0]; uncut = TRUE; }else if (S1[0] <MAXSTRSIZE) { // 截断 ……

4.2.1 串的定长顺序存储表示对应的联接操作： …… T[1..S1[0]] = S1[1..S1[0]]; T[S1[0]+1..MAXSTRLEN] = S2[1..MAXSTRLEN－S1[0]]; T[0] = MAXSTRLEN; uncut = FALSE; }else { // 截断(仅取S1) T[0..MAXSTRLEN] = S1[0..MAXSTRLEN]; // T[0] == S1[0] == MAXSTRLEN } return uncut; } // Concat

4.2.2 串的堆分配存储表示串的堆分配存储的特点是，程序中出现的所有串变量的存储空间都是在程序执行过程中动态分配而得的。堆分配存储结构的串定义如下：　　typedef struct{ 　　　char *ch; 　　　int length; 　　} HString; 由于串仍然是以数组存储的字符序列表示，因此串的操作仍基于"字符序列的复制"实现。

4.2.2 串的堆分配存储表示 "插入"操作算法如下： bool StrInsert (Hstring& S, int pos, Hstring T){ // 若1≤pos≤StrLength(S)＋1,则改变串S,在串S的第pos个 // 字符之前插入串T，并返回TRUE,否则串S不变，并返回FALSE 　　if (pos < 1 || pos > S.length+1) 　　　return FALSE; 　　// 插入位置不合法　　char S1[S.length] ; // S1 作为辅助串空间用于暂存 S.ch 　　if (T.length){ 　// T 非空，则为S重新分配空间并插入 T …… 　　} // if 　　return TRUE; 　} // StrInsert

4.2.2 串的堆分配存储表示 if (T.length){ // T 非空，则为S重新分配空间并插入 T p=S.ch; i=0; 　while (i < S.length) 　　S1[i++] = *(p+i); 　　　　 // 暂存串S 　S.ch = new char[S.length + T.length ]; // 为S重新分配串值存储空间　for ( i=0, k=0; i<pos-1; i++) 　　S.ch[k++] = S1[i]; 　　　 // 保留插入位置之前的子串　j = 0; 　while (j<T.length) 　　S.ch[k++] = T.ch[j++]; 　　// 插入 T 　while ( i<S.length) 　　S.ch[k++] = S1[i++];　　　 // 复制插入位置之后的子串　S.length+=T.length; 　　　　// 置串 S 的长度 } // if

4.2.3 串的块链存储表示和线性表的链式存储结构类似，也可用链表来存储串值。但由于串的结构的特殊性，即串的数据元素是一个字符，它只有8位二进制数，因此用链表存储串值时通常采用的办法是在一个结点中存放多个字符（如下图所示），因此称它为"块链"存储表示。图中的 "结点大小" CHUNKSIZE=4。

4.2.3 串的块链存储表示 const CHUNKSIZE = 80; // 可由用户定义的块（结点）大小　　typedef struct Chunk {　　// 结点结构　　　char ch[CUNKSIZE]; 　　　struct Chunk *next; 　　} Chunk; 　　typedef struct { 　　　　　// 串的链表结构　　　Chunk *head, *tail; 　　// 串的头指针和尾指针　　　int curlen; 　　　　　　　 // 串的当前长度　　} LString;

4.3 串的模式匹配算法在正文串中，查询有没有和一个"给定的串"相同的子串，返回这个子串的起始位置。这个操作即为串的定位操作，通常称为正文模式匹配。若 S =“concatenation”， T =“cat”，则称主串S中存在和模式串T相同的子串，起始位置为4，即 Index(S,T,1)=4 。

4.3.1 串的模式匹配的简单算法将主串S中某个位置i起始的子串和模式串T相比较。即从 j=0 起比较 S[i+j] 与 T[j]，若相等，则在主串 S 中存在以 i 为起始位置匹配成功的可能性，继续往后探索( j逐步增1 )，直至T串中最后一个字符比较相等为止，否则改从主串的下一个字符起重新开始进行下一轮的"匹配"，即将串T向后滑动一位，即 i 增1，而 j 退回至0，重新开始新一轮的匹配。

4.3.1 串的模式匹配的简单算法 int Index_BF ( char S [ ], char T [ ], int pos ){ // 若串 S 中从第pos(1≤pos≤StrLength(S))个字符起存在和 // 串 T 相同的子串，则称匹配成功，返回第一个这样的子串 // 在串 S 中的位置，否则返回 0 　　i = pos-1; j = 0; 　　while ( S[i+j] != '\0'&& T[j] != '\0') 　　　if ( S[i+j] == T[j] ) j++; 　　// 继续比较后一字符　　　else { i ++; j = 0; } 　　　　　// 重新开始新的一轮匹配　　if ( T[j] == '\0') return (i+1); 　// 匹配成功　　else return 0; 　// 串S中(第pos个字符起)不存在和串T相同的子串　} // Index_BF 容易看出，此算法的时间复杂度为O(m * n)，其中 m 和 n 分别为S串和T串的长度。

4.3.1 串的模式匹配的简单算法 int Index_BF1 ( char S [ ], char T [ ], int pos ) { 　// 若串 S 中从第pos(1≤pos≤StrLength(S))个字符起存在　// 和串 T 相同的子串，则称匹配成功，返回第一个这样的子串　// 在串 S 中的位置，否则返回 0 　　i = pos-1; j = 0; 　　while ( S[i] != '\0'&& T[j] != '\0') 　　　if ( S[i] == T[j] ) 　　　　{ i++; j ++; } 　　　　　　　// 继续比较后一字符　　　else { i = i-j+1; j = 0; } 　　　// 重新开始新的一轮匹配　　if ( T[j] == '\0') return (i-j); 　　　// 匹配成功　　else return 0; 　 // 串S中(第pos个字符起)不存在和串T相同的子串　} // Index_BF1

4.3.2 串的模式匹配的改进算法 ababcabcacbab abc a abcac 简单算法的匹配过程：从主串ababcabcacbab 中找模式串 abcac ababcabcacbab abc a abcac

4.3.2 串的模式匹配的改进算法 ababcabcacbab abc abcac 改进：（KMP算法）改进算法的匹配过程：从主串 ababcabcacbab 中找模式串 abcac ababcabcacbab abc abcac 改进：（KMP算法）当匹配过程中出现字符不等时，无需回溯主串指针，而是利用已经得到的部分匹配信息，将模式串向右滑动尽可能远的一段距离后，继续进行比较。

4.3.2 串的模式匹配的改进算法前提是：‘p1p2…pk-1’ == ‘pj-k+1pj-k+2…pj-1’，pk!=pj KMP(D.E.Knuth, V.R.Pratt, J.H.Morris) 算法 SS...SS...SSSS...SSSSS..SSSS. PP...PPP....PPPP..PPP.. PPP.....PPP..P.... 主串 i 匹配过程中的模式串 Si与Pj为首次出现的不匹配 j 1 j-k+1 k 1 主串不回溯，假设模式串可以右移到位置k处再开始比较Si=?=Pk 前提是：‘p1p2…pk-1’ == ‘pj-k+1pj-k+2…pj-1’，pk!=pj

4.3.2 串的模式匹配的改进算法令 next( j )=k 为模式串中第j个与主串对应位置失配时，在模式中需要重新和主串中该字符进行比较的字符位置，它可以这样定义： next[j]==0意味着主串的下一个位置要和模式串的第一个位置开始进行比较

4.3.2 串的模式匹配的改进算法 int Index_KMP( SString S, SString T, int pos){ 假设 next 的值已知，可以用来实现匹配算法 int Index_KMP( SString S, SString T, int pos){ i = pos; j=1; while( i<=S[0] && j<=T[0]){ if( j==0 || S[i]==T[j]) { i++; j++} else j = next[j]; } if( j>T[0]) return i-T[0]; else return 0;

4.3.2 串的模式匹配的改进算法求 next 函数的过程是一个递推的过程：首先由定义得 next[1]=0; 假设已知 next[j]=k( i.e. p1…pk-1 = pj-k+1…pj-1) 若：p[j] = p[k]，则显然有 next[j+1]=k+1；若：p[j] !=p[k]，则令 k’ = next[k]，检查p[j]=?=p[k’] 若相等，则next[j+1] = next[k]+1 = k’+1 若不等，令k=k’，继续 ...... j k’ k

4.3.2 串的模式匹配的改进算法 void get_next( SString T, int next[]) { 　　// 求模式串T的next函数值并存入数组 next。　　i = 1; next[1] = 0; j = 0; 　　while ( i<T[0] ) { 　　 if ( j == 0 || T[i] == T[j]){ 　　　 ++i; ++j; next[i] = j; } 　　else j = next[j]; 　　} 　} // get_next

但是还有一种特殊情况需要考虑，　　例如： S = 'aaabaaabaaabaaabaaab' T = ‘aaaab’，next[]={0,1,2,3,4} aaaab

4.3.2 串的模式匹配的改进算法 S = 'aaabaaabaaabaaabaaab' T = 'aaaab' 　　此时因为T串的next函数值依次为0,1,2,3,4, 虽然匹配过程中指针 i 没有回溯，但对i=3、7、11和15重复进行了多次的'b'是否等于'a'的操作。换句话说，如果 next[j]=k，那么此时应该看一下T[j]是否等于 T[k]，如果不等，则 next[j]=k，否则next[j]就应该取值next[k]。例如上述T串的 next 函数值依次为0,0,0,0,4。由此改写求模式串的 next 函数值的算法如下。

4.3.2 串的模式匹配的改进算法 void get_nextval( SString T[], int nextval[]) { 　　// 求模式串T的next函数修正值并存入数组 nextval。　　i=1; j = 0; next[1] = 0; 　　while ( i<T[0] ) { 　　　if (j == 0 || T[i] == T[j]) { 　　　　++i; ++j; 　　　　if (T[i]!=T[j]) nextval[i] = j; 　　　　else nextval[i] = nextval[j]; 　　　}else j = nextval[j]; 　　}　// while 　}　// get_nextval

4.3.2 串的模式匹配的改进算法虽然简单的匹配算法的时间复杂度是O(n*m)，但一般情况下实际的执行时间近似于O(n+m)

KMP from wiki Worked example of the search algorithm 1 2 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456

KMP from wiki Worked example of the search algorithm 1 2 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456

KMP from wiki Worked example of the search algorithm 1 2 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456

KMP from wiki Worked example of the search algorithm 1 2 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456

KMP from wiki Worked example of the search algorithm 1 2 1 2 m: 01234567890123456789012 S: ABC ABCDAB ABCDABCDABDE W: ABCDABD i: 0123456

KMP from wiki algorithm kmp_search: input: an array of characters, S (the text to be searched) an array of characters, W (the word sought) output: an integer (the zero-based position in S at which W is found) define variables: an integer, m ← 0 (the beginning of the current match in S) an integer, i ← 0 (the position of the current character in W) an array of integers, T (the table, computed elsewhere) while m + i is less than the length of S, do: if W[i] = S[m + i], let i ← i + 1 if i equals the length of W, return m otherwise, let m ← m + i - T[i], if T[i] is greater than -1, let i ← T[i] else let i ← 0 (if we reach here, we have searched all of S unsuccessfully) return the length of S

KMP from wiki algorithm kmp_search: input: an array of characters, S (the text to be searched) an array of characters, W (the word sought) output: an integer (the zero-based position in S at which W is found) define variables: an integer, m ← 0 (the beginning of the current match in S) an integer, i ← 0 (the position of the current character in W) an array of integers, T (the table, computed elsewhere)

KMP from wiki while m + i is less than the length of S, do: if W[i] = S[m + i], let i ← i + 1 if i equals the length of W, return m otherwise, let m ← m + i - T[i], if T[i] is greater than -1, let i ← T[i] else let i ← 0 (if we reach here, we have searched all of S unsuccessfully) return the length of S

KMP from wiki algorithm kmp_table: input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table) define variables: an integer, pos ← 2 (the current position we are computing in T) an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring) (the first few values are fixed but different from what the algorithm might suggest) let T[0] ← -1, T[1] ← 0 while pos is less than the length of W, do: (first case: the substring continues) if W[pos - 1] = W[cnd], let T[pos] ← cnd + 1, pos ← pos + 1, cnd ← cnd + 1 (second case: it doesn't, but we can fall back) otherwise, if cnd > 0, let cnd ← T[cnd] (third case: we have run out of candidates. Note cnd = 0) otherwise, let T[pos] ← 0, pos ← pos + 1

KMP from wiki algorithm kmp_table: input: an array of characters, W (the word to be analyzed) an array of integers, T (the table to be filled) output: nothing (but during operation, it populates the table) define variables: an integer, pos ← 2 (the current position we are computing in T) an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring) (the first few values are fixed but different from what the algorithm might suggest) let T[0] ← -1, T[1] ← 0

KMP from wiki while pos is less than the length of W, do: (first case: the substring continues) if W[pos - 1] = W[cnd], let T[pos] ← cnd + 1, pos ← pos + 1, cnd ← cnd + 1 (second case: it doesn't, but we can fall back) otherwise, if cnd > 0, let cnd ← T[cnd] (third case: we have run out of candidates. Note cnd = 0) otherwise, let T[pos] ← 0, pos ← pos + 1

4.4 串操作应用举例正文编辑程序是一个面向用户的系统服务程序，广泛用于源程序的输入和修改，甚至用于报刊和书籍的编辑排版以及办公室的公文书信的起草和润色等。正文编辑的实质是修改字符数据的形式或格式。无论是 Microsoft word 还是 WPS，其工作的基础原理都是正文编辑。虽然各种正文编辑程序的功能强弱不同，但其基本功能大致相同，一般都包括串的查找、插入、删除和修改等基本操作。

string in STL #include <string> String objects are a special type of container, specifically designed to operate with sequences of characters. Unlike traditional c-strings, which are mere sequences of characters in a memory array, C++ string objects belong to a class with many built-in features to operate with strings in a more intuitive way and with some additional useful features common to C++ containers. The string class is an instantiation of the basic_string class template, defined in <string> as: typedef basic_string<char> string; From: http://www.cplusplus.com/reference/string/string/

string in STL A set of global functions provide some additional functionality for strings to interact either with other string objects or with objects of other types, mainly through the overloading of operators: operator+ Add strings (function) swap Swap contents of two strings (function) comparison operators String comparison operators (function, ==, !=, >, >=, <, <=)

string in STL The header also declares some functions that extend the functionality of streams (iostream library) to string objects: getline Get line from stream (function) operator<< Insert string into stream (function) operator>> Extract string from istream (function)

string in STL Member functions (constructor) Construct string object (constructor member) operator= String assignment (public member function) Iterators: begin Return iterator to beginning (public member function) end Return iterator to end (public member function) rbegin Return reverse iterator to reverse beginning (public member function) rend Return reverse iterator to reverse end (public member function)

string in STL Capacity: size Return length of string (public member function) length Return length of string (public member function) max_size Return maximum size of string (public member function) resize Resize string (public member function) capacity Return size of allocated storage (public member function) reserve Request a change in capacity (public member function) clear Clear string (public member function) empty Test if string is empty (public member function)

string in STL Element access: operator[] Get character in string (public member function) at Get character in string (public member function)

string in STL Modifiers: operator+= Append to string (public member function) append Append to string (public member function) push_back Append character to string (public member function) assign Assign content to string (public member function) insert Insert into string (public member function) erase Erase characters from string (public member function) replace Replace part of string (public member function) copy Copy sequence of characters from string (public member function) swap Swap contents with another string (public member function)

string in STL String operations: c_str Get C string equivalent (public member function) data Get string data (public member function) get_allocator Get allocator (public member function) find Find content in string (public member function) rfind Find last occurrence of content in string (public member function) find_first_of Find character in string (public member function) find_last_of Find character in string from the end (public member function) find_first_not_of Find absence of character in string find_last_not_of Find absence of character in string from the end (public member function) substr Generate substring (public member function) compare Compare strings (public member function)