Medical Information Retrieval and Utilization (《医学信息检索与利用》) March 28, 2007 Huang Lihui (黄利辉)
Overview: information retrieval systems. The US biomedical literature retrieval systems: Medline & PubMed
Definition of information retrieval: Information retrieval is the science and practice of identification and efficient use of recorded media: biomedical literature, multimedia publishing, chemical structures, cartographic materials, genes and protein sequences, video clippings, etc.
Development of medical information retrieval (MIR): 1879 Index Medicus; 1966 Medical Literature Analysis and Retrieval System (MEDLARS); 1971 MEDLARS Online; 1980s full-text databases; 1990s World Wide Web; 1997 PubMed
The information retrieval process: indexing; query formulation; retrieval; evaluation of search results (Gerard Salton, 1983)
The information retrieval process [Diagram labels: Content, Indexing, Database (content plus index), Information need, Query formulation, Query, Retrieval, Results, Evaluation and refinement]
Information content. Content: media developed to communicate information or knowledge. Types: original content (primary literature), synoptic content, bibliographic content, full-text content, evidence-based medicine
Original & synoptic content. Original content (primary literature) is developed through new observations and analysis of the world (medical journal articles, conference proceedings, white papers, etc.); it is subject to peer review. Authors develop synoptic content by extracting important observations and principles from sources of original content, as well as from personal experience: textbooks, practice guidelines, drug monographs, review articles, etc.
Bibliographic & full-text. Bibliographic content is the information abstracted from the original source. Full-text content: citation information, the complete body text, multimedia content
Evidence-based medicine: diagnosis, etiology, prognosis, treatment/prevention (randomized controlled trials). Sources: original literature, review articles, systematic reviews (Cochrane Collaboration)
Indexing. IR database = content + index. Index: items and item attributes (inverted index)
Item          Attribute (document No.)
Aspirin       1, 5, 6, 9
Attack        3, 6, 7, 8
Heart         4, 6, 7, 10
Prevention    1, 6, 9
Indexing: how to formulate a query. Query formulation: the process of stating information needs in terms of queries. An information need is the searcher's expression, in her own language, of the information that she desires, e.g. “Should middle-aged men be given a daily dose of aspirin to prevent heart attack?” Query: Aspirin AND prevention AND heart AND attack
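The lookup behind the query Aspirin AND prevention AND heart AND attack can be sketched as a set intersection over the inverted index from the indexing slide. This is a minimal illustration, not how any production system is implemented; the function name `boolean_and` is an assumption introduced here.

```python
# Toy inverted index copied from the slide: term -> set of document numbers.
inverted_index = {
    "aspirin": {1, 5, 6, 9},
    "attack": {3, 6, 7, 8},
    "heart": {4, 6, 7, 10},
    "prevention": {1, 6, 9},
}


def boolean_and(index, terms):
    """Return the documents that contain every term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


result = boolean_and(inverted_index, ["Aspirin", "prevention", "heart", "attack"])
print(sorted(result))  # [6]
```

Only document 6 appears in all four posting lists, so it is the only hit. A term that is missing from the index contributes an empty set and therefore empties the whole AND result.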
Building the index. The goal of indexing is to produce the smallest, most efficient representation of the original content that will facilitate high-quality retrieval. Index items are units of information suitable for matching with a query. Index attributes describe facets of the item.
Capture of content structure with indexes: content structuring, markup, semantic regions. Example: “Woods” as an author versus woods & forests as a subject
Indexing of bibliographic information (Medline). Information abstracted from the publication, such as the authors' names, article title, article source, publication date, and authors' abstract. Information added by a human indexer, such as subject headings and publication types
PMID- 16931225
IS  - 1523-6838 (Electronic)
TI  - A case of "pure" preeclampsia with nephrotic syndrome before 15 weeks of gestation in a patient whose renal biopsy showed glomerular capillary endotheliosis.
AB  - A 35-year-old Japanese woman for whom a previous health checkup showed normal blood pressure and urinalysis results without serological
AD  - Department of Internal Medicine and Division of Immunopathology, Clinical Hospital, Chuoh, Japan. timasawa@yahoo.co.jp
AU  - Joh K
LA  - eng
PT  - Case Reports
TA  - Am J Kidney Dis
JT  - American journal of kidney diseases : the official journal of the National Kidney Foundation.
MH  - Kidney Diseases/*etiology
MH  - Kidney Glomerulus/*pathology
MH  - Nephrotic Syndrome
MH  - *Pre-Eclampsia
MH  - Pregnancy
MH  - Pregnancy Trimester, Second
EDAT- 2006/08/26 09:00
SO  - Am J Kidney Dis. 2006 Sep;48(3):495-501.
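A tagged record like the one above is easy to read by program. The sketch below assumes the usual MEDLINE display layout: a tag of up to four characters, a hyphen in the fifth column, and continuation lines that start with spaces. The function name `parse_medline_record` and the abridged sample are illustrative only.

```python
from collections import defaultdict


def parse_medline_record(text):
    """Parse one MEDLINE-format record into {tag: [values]}."""
    fields = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # A new field starts with a tag in columns 1-4 and "-" in column 5.
        if len(line) > 5 and line[4] == "-" and line[:4].strip():
            tag = line[:4].strip()
            fields[tag].append(line[5:].strip())
        elif tag is not None:
            # Continuation line: append to the most recent value.
            fields[tag][-1] += " " + line.strip()
    return dict(fields)


# Abridged copy of the record on the slide (only some fields are kept).
sample = """PMID- 16931225
TI  - A case of "pure" preeclampsia with nephrotic syndrome before 15 weeks of
      gestation in a patient whose renal biopsy showed glomerular capillary
      endotheliosis.
PT  - Case Reports
MH  - Kidney Diseases/*etiology
MH  - Nephrotic Syndrome
MH  - *Pre-Eclampsia
"""

record = parse_medline_record(sample)
# An asterisk marks a heading or subheading that is a major topic of the article.
major_topics = [mh for mh in record["MH"] if "*" in mh]
print(record["PMID"], major_topics)
```

Repeatable fields such as MH and AU are kept as lists, which is why every tag maps to a list even when it occurs once.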
Medical Subject Headings (MeSH). MeSH was developed by the NLM to represent important concepts in biomedicine. 18,000 subject headings, each grouped into one of 15 trees. Example branch:
Diseases Category
  Cardiovascular Diseases
    Vascular Diseases
      Hypertension
        Hypertension, Malignant
        Hypertension, Pregnancy-Induced
        Hypertension, Renal
        Hypertension, Renovascular
Why Medline records carry manually assigned controlled-vocabulary terms: they give a more thorough representation of the main concepts found in the paper; they facilitate retrieval by concept. Publication type: more detailed PT values since 1991
Information need. Query formulation is the process by which information needs are translated into queries suitable for searching. Needs depend on the person's role in the healthcare process: medical researchers; clinicians (generalist physician, specialist)
Query formulation: semiautomated; Boolean queries; natural language queries
Boolean Queries
Field qualification: a designation of which index or field should be searched. Text-word searching. Wildcard characters: * #
Retrieval: matching queries against the index; ranking or sorting the output by some criteria; displaying the results to the user
Retrieval: matching, ranking, display. Matching: queries are compared against the index, and a result set is created. Ranking: the original result set is sorted or ranked by criteria (chronology, alphabetic ranking, relevance ranking). Display: the final result set is shown to the user
Evaluation of results and feedback: evaluation, refinement
Evaluation
Recall = (number of documents retrieved and relevant) / (number of relevant documents in database)
Precision = (number of documents retrieved and relevant) / (number of documents retrieved)
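A small worked example may help fix the two evaluation measures. Recall divides the number of documents that are both retrieved and relevant by the number of relevant documents in the database; precision, in its standard definition, divides the same numerator by the number of documents retrieved. The document numbers below are invented purely for illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents both retrieved and relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 4 documents retrieved, 5 relevant documents in the database.
recall, precision = recall_precision(retrieved=[1, 6, 9, 12], relevant=[1, 6, 7, 9, 10])
print(recall, precision)  # 0.6 0.75
```

Three of the five relevant documents were found (recall 0.6), and three of the four retrieved documents were relevant (precision 0.75).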
Recall and precision of clinician searchers at McMaster University (1990)
Medline & Pubmed
1. History and background INDEX MEDICUS MEDLARS ONLINE MEDLINE CD MEDLINE WEB
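The US medical literature retrieval system. US National Library of Medicine (NLM), http://www.nlm.nih.gov/. Its predecessor was the Library of the Surgeon General's Office of the US Army, founded in the 1830s. It was renamed the Army Medical Library in 1922 and the Armed Forces Medical Library in 1952. In 1956 the US President signed into law the bill to “establish a national library of medicine, advance medical progress, and raise the nation's level of health and welfare”, and it formally became one of the three US national libraries (the Library of Congress, the National Library of Medicine, and the National Agricultural Library). It belongs to the National Institutes of Health (NIH).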
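Index Medicus (IM). John Shaw Billings, an army surgeon as of 1865, combined bibliography with his own medical expertise and in 1879 compiled a guide to the periodical literature of the health sciences: Index Medicus (IM), today the world's best-known citation-style medical literature index. Since the 1920s it has used subject-heading indexing (MeSH). IM currently covers 3,419 biomedical and medicine-related scientific journals published in 44 languages in the world's major countries and regions (57 Chinese-language journals were included in 1999), about 300,000 citations per year, 88% of them in English.
William H. Welch, a leading figure of American medicine, once said that the United States made four great contributions to medicine in the nineteenth century: the development of anesthesia, the discovery that insects transmit disease, the establishment of the modern public health laboratory, and the development of the Surgeon General's Library together with the compilation of its Index-Catalogue. He considered the last the most important of the four.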
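MEDLARS: Medical Literature Analysis and Retrieval System. The US National Library of Medicine formally completed it in 1963 as the world's first computerized medical literature retrieval system; it is today the world's most authoritative medical literature database retrieval system.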
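MEDLINE. In 1971 MEDLARS developed into the online retrieval system MEDLINE. It currently covers 4,300 journals, with more than 10 million records from 1966 to the present. In 1997 the Clinton administration announced free Internet searching: PubMed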
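Features of Web MEDLINE searching: citations and abstracts are free; links to sites that provide full text; automatic mapping of search terms; simple and fast to use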
MEDLINE Web (FREE): PubMed (http://www.ncbi.nlm.nih.gov/PubMed); Internet Grateful Med (http://www.grateful.com); BioMedNet Evaluated Medline (http://www.bmn.com); Healthgate Medline (http://www.healthgate.com)
About NCBI: The National Center for Biotechnology Information, Bethesda. Created in 1988 as a part of the National Library of Medicine at NIH. Establishes public databases; conducts research in computational biology; develops software tools for sequence analysis; disseminates biomedical information
WWW Access Entrez & BLAST
NCBI Web Traffic [Figure: users per day, 1998 to 2005, axis from 100,000 to 600,000; series labelled US Internet Users and World; dips marked at Christmas and New Year's Days]
The Entrez System: Text Searches
The (ever expanding) Entrez System. Databases: PopSet, Structure, PubMed, Books, 3D Domains, Taxonomy, GEO/GDS, UniGene, Nucleotide, Protein, Genome, OMIM, CDD/CDART, Journals, SNP, UniSTS, PubMed Central. Searching is based on keywords (MeSH terms, author names, gene names, accession or gi numbers, or just recognized patterns in the records). 15 databases are included.
Entrez: Neighboring and Hard Links [Diagram labels: PubMed abstracts (word weight), 3-D Structure (MMDB, VAST), Nucleotide sequences (BLAST), Protein sequences (BLAST), Genomes, Taxonomy (phylogeny)]
Literature Databases
A part of the NCBI Bookshelf. Part 1. The Databases. Part 2. Data Flow and Processing. Part 3. Querying and Linking the Data. Part 4. User Support
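PubMed. Built by: the US National Center for Biotechnology Information (NCBI). Data types: journal articles, reviews, and links to other data resources.
Data coverage. MEDLINE: more than 4,300 biomedical journals covering medicine, nursing, dentistry, veterinary medicine, health care systems, preclinical sciences, and related fields. The journals come from the United States and more than 70 other countries and regions. About 11 million records, going back to 1966.
In-process citations: records that have not yet been through MEDLINE's standardized indexing. Once MeSH terms are assigned, they are added to MEDLINE. The records carry the tag [record in process].
Publisher-supplied citations: publishers submit electronic records directly to PubMed; these include some records not covered by MEDLINE.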
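Differences between PubMed and MEDLINE. Broader coverage: PubMed includes non-medical journals (physics, astronomy, chemistry, etc.) from which MEDLINE indexes only some life-science-related articles. More complete access to documents: links to electronic full text (some of it free)
Search bar, features bar, sidebar
PubMed basic search functions. Automatic term mapping: the system checks the query, in order, against the MeSH translation table, the journal title table, the phrase list, and the author index. If a phrase finds no match, it is broken into individual words. Each word is looked up in the same four tables and is then searched in all fields.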
Example: entering placenta growth oxygen is translated by the system into: ((("placenta"[MeSH Terms] OR placenta[Text Word]) AND (("growth and development"[Subheading] OR "growth"[MeSH Terms]) OR growth[Text Word])) AND ((("oxygen"[MeSH Terms] OR "oxygen inhalation therapy"[MeSH Terms]) OR "Oxygen"[MeSH Terms]) OR oxygen[Text Word]))
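The shape of the placenta growth oxygen translation can be imitated with a very small sketch: each word is ORed with whatever controlled terms it maps to, plus a text-word search, and the word groups are ANDed. The table `MESH_TABLE` and the function `map_terms` are hypothetical stand-ins introduced here; the real PubMed translation tables and matching rules are far richer than this.

```python
# Hypothetical, hand-made translation table: word -> mapped entries.
# The rows only mirror part of the example on the slide.
MESH_TABLE = {
    "placenta": ['"placenta"[MeSH Terms]'],
    "oxygen": ['"oxygen"[MeSH Terms]', '"oxygen inhalation therapy"[MeSH Terms]'],
}


def map_terms(query, table):
    """OR each word's mapped entries with a text-word search, then AND the groups."""
    groups = []
    for word in query.lower().split():
        alternatives = table.get(word, []) + [f"{word}[Text Word]"]
        groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


print(map_terms("placenta oxygen", MESH_TABLE))
```

A word with no entry in the table falls back to a plain text-word search, which loosely mirrors the last step of the mapping sequence described for automatic term mapping.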
Truncation search: treat*. Forced phrase search: “brca 1” (automatic term mapping and explosion are switched off)
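Right-hand truncation such as treat* amounts to a prefix match against the index vocabulary. The sketch below shows only that idea; the vocabulary is made up, and `expand_truncation` is a name introduced for illustration.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a right-truncated term (trailing *) to the matching index terms."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return sorted(t for t in vocabulary if t.lower().startswith(stem))
    # No wildcard: only an exact (case-insensitive) match counts.
    return [t for t in vocabulary if t.lower() == pattern.lower()]


# Hypothetical index vocabulary.
vocabulary = ["retreat", "treat", "treated", "treating", "treatment", "tremor"]
print(expand_truncation("treat*", vocabulary))
# ['treat', 'treated', 'treating', 'treatment']
```

The expanded terms would then be ORed together before matching against the index; "retreat" and "tremor" are not included because they do not begin with the stem.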
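Automatic explosion: the system automatically explodes subject headings and subheadings. For example, on entering “hypertension therapy” the system also covers narrower subheadings such as drug therapy and diet therapy of hypertension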
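Explosion of a subject heading can be pictured as collecting the heading together with everything below it in the tree. The sketch uses the hypertension branch from the MeSH slide; the parent-child nesting is inferred from that flattened list and should be treated as an assumption rather than the authoritative MeSH hierarchy. The names `mesh_children` and `explode` are illustrative.

```python
# Parent -> children, inferred from the branch shown on the MeSH slide.
# Assumption: the four narrower hypertension headings are direct children.
mesh_children = {
    "Cardiovascular Diseases": ["Vascular Diseases"],
    "Vascular Diseases": ["Hypertension"],
    "Hypertension": [
        "Hypertension, Malignant",
        "Hypertension, Pregnancy-Induced",
        "Hypertension, Renal",
        "Hypertension, Renovascular",
    ],
}


def explode(heading, children):
    """Return the heading plus all of its descendants, depth-first."""
    terms = [heading]
    for child in children.get(heading, []):
        terms.extend(explode(child, children))
    return terms


print(explode("Hypertension", mesh_children))
```

Searching on the exploded list retrieves articles indexed under any of the narrower headings, not just under "Hypertension" itself.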
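PubMed features bar: Limits (restricting a search). Field limits: author, journal title, article title, etc. Entry date: by default a search goes back to 1966; limits range from 30 days to 10 years. 7 publication-type limits. 7 languages. 12 subsets (2 added in 2001: Space Life Sciences and Bioethics)
Preview/Index (previewing the search strategy): view search statements; modify search statements (statement numbers can be combined, e.g. #1 OR #2); browse the index
Index (alphabetical index list): choose the field to limit to; enter a search term (word stem); pick search terms from the alphabetical list; build the search statement with Boolean operators; click Go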
History: review search statements; edit search statements, e.g. #3 AND child
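Combining numbered statements such as #3 AND child or #1 OR #2 can be thought of as substituting each #n with the earlier search it stands for. The sketch below illustrates that substitution only; the history entries are invented, and `expand_history` is a hypothetical helper, not part of PubMed.

```python
import re


def expand_history(query, history):
    """Replace each #n reference with the n-th earlier search, parenthesized."""
    def substitute(match):
        number = int(match.group(1))
        # Earlier statements may themselves contain #n references.
        return "(" + expand_history(history[number], history) + ")"
    return re.sub(r"#(\d+)", substitute, query)


# Hypothetical search history: statement number -> search statement.
history = {1: "hypertension", 2: "aspirin", 3: "#1 OR #2"}
print(expand_history("#3 AND child", history))
# ((hypertension) OR (aspirin)) AND child
```

The parentheses preserve the grouping of each earlier statement, so the OR in statement 3 is evaluated before the new AND. The sketch assumes every referenced number exists and that no statement refers to itself.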
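Modifying the search strategy: 1. Edit it at any time in the search box. 2. Use the Limits, Preview/Index, History, and Details controls.
Output of search results: display results; save results; save the search strategy; print results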
Display format (Display). Abstract report: source journal, title, authors, author address, record status, publication type, errata, comments, PMID or UI, abstract. Citation report: the above plus fields such as MeSH descriptors, chemical substance names, and grant numbers. MEDLINE report.
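Choosing which records to display. Single record: click the author names. All records: choose a format and click Display. Selected records: tick the check boxes.
Saving search results: Clipboard (Clip). Holds up to 500 citations (raised to 1,000 in 2001). Select citations (or abstracts) and click Clip Add; the results of several searches can then be saved together.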
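Save: PubMed allows up to 5,000 citations to be saved (in the system default display format); use the same format throughout where possible. For multiple searches, use the Clipboard. Notes: the PubMed Save button saves all retrieved records; add .txt to the file name (e.g. china.txt) so the file opens in WordPad or Word; the browser's File > Save As saves only the current page.
Saving the search strategy. The URL in Details puts the detailed, mapped strategy into the search box; that search page can be stored in the browser's bookmarks. MyNCBI: a function button in the gray bar; strategies are stored in a personal profile in the database.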
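Printing search results: 1. Print from the browser (the currently displayed page). 2. Use the Show control to increase the number of records displayed per page. 3. Save first, then print everything together (e.g. from Word).
Obtaining full text and related resources: 1. Get the full text online (hyperlinks in the abstract). 2. Order the full text from NLM (Order); the library places the order with member libraries on the user's behalf. 3. Find related literature (Related Articles). 4. Other database resources (bioinformatics, etc.)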
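Other functions on the sidebar. MeSH Database functions: identify controlled search terms; view scope notes and the tree structure. Simple display: subheadings cannot be limited and no other qualifiers can be applied. Detailed display: subheadings, explosion, major topic. Example: drug therapy of systemic scleroderma
Additional search tools: Citation Matcher, Journal Browser, Clinical Queries, Clinical Alert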
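Citation Matcher: enter citation details to find a specific article and fill in missing information. The journal title must be exact and abbreviations must be standard. Author names are not case-sensitive. Example: find the other authors of the article by Chang PY published in 1975 in Taiwan I Hsueh Hui Chih (Taiwan yi xue za zhi)
Journal Browser (Journal Database): browse journal literature by entering the journal title, its abbreviation, etc. Provides hyperlinks to electronic full text.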
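Clinical Queries: searches specifically for clinical research methodology literature in four categories (therapy, diagnosis, etiology, and prognosis), with a choice of emphasis: sensitivity (emphasizing recall) or specificity (emphasizing precision).
Clinical Alert: since 1991, the NIH (National Institutes of Health) has released, at irregular intervals, important news releases on clinical trial findings that could significantly affect mortality and morbidity.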
PubMed Spell Checker: phenylthalien?