PNX Format and Normalization Rules (NR). Li Zhen, Ex Libris Ltd. Beijing Representative Office. CCEU Training, June 25-27, 2014
Outline: PNX Viewer; PNX Sections; Normalization Rules (NR); NR configuration examples. Ex Libris Ltd., 2014 Internal and Confidential
PNX Viewer
PNX Viewer > Viewing individual fields
PNX Sections
PNX (Primo Normalized XML) Format
Sections: Control, Display, Links, Search, Facets, Sort, Dedup, FRBR, Delivery & Scoping, Ranking, Enrichment, Additional Data.
This is an example of a PNX record. The PNX record is divided into the sections listed on the slide, and each section serves a different functional purpose. Data required by more than one section is duplicated, so each section's copy can be manipulated independently (note how the title "Wall Street Journal" appears in both the Display and Search sections). A detailed description of all PNX sections and fields is in the Primo Technical Guide. Later in this session we focus on the fields that require customization during implementation. To view the PNX from the front end, add &showPnx=true to the URL of a full record display. Note: if possible, show a demo of a PNX record via the PNX Viewer.
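As a small illustration of the showPnx tip, the sketch below adds the parameter to a full-record URL. The host and path in the usage example are hypothetical; only the parameter name and value come from the notes above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_show_pnx(full_record_url: str) -> str:
    """Return the URL with showPnx=true present exactly once in its query string."""
    parts = urlsplit(full_record_url)
    # Keep the existing parameters, dropping any earlier showPnx value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "showPnx"]
    query.append(("showPnx", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical full-record URL, for illustration only.
example = with_show_pnx("http://primo.example.org/record?doc=demo123")
```

Rebuilding the query string rather than concatenating text avoids a stray "&" when the URL has no query yet.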
PNX Compared with Source Formats: PNX and CNMARC
PNX Compared with Source Formats: PNX and DC
PNX Sections > Control
Control section: used internally by Primo for control purposes; it holds information about the data source (original format, system, record number, and so on). Fields include: SourceID; Source-RecordID; RecordID = SourceID + Source-RecordID; Source format; Original SourceID; Source System.
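The RecordID formula can be written out as a one-line sketch. The sample identifier values are hypothetical, and plain concatenation without a separator is an assumption read directly from "SourceID + Source-RecordID".

```python
def build_record_id(source_id: str, source_record_id: str) -> str:
    """RecordID = SourceID + Source-RecordID (assumed: plain concatenation)."""
    return f"{source_id}{source_record_id}"


# Hypothetical values, for illustration only.
record_id = build_record_id("demo_aleph", "000123456")
```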
PNX Sections > Display
Display section: fields used in the brief and full record displays. Fields include: Type, Title, Creator, Contributor, Edition, Publisher, Creationdate, Format, Identifier, Subject, Language, Availlibrary, Source.
PNX Sections > Display: Type
The resource type can also be used for conditional filtering or to limit the search scope. In the results list, each record is preceded by an icon for its resource type. Resource types: Book, Journal, Article, Text Resource, Image, Database, Video, Audio CD, Map, Score, Website, Other.
PNX Sections > Links
Links section: links related to the record.
Delivery links (GetIt!), such as: OpenURL; OpenURL_fulltext; LinktoResource (full-text link to a digital resource); LinktoHoldings (link to the OPAC holdings record); LinktoRequest (link to the OPAC hold request).
Additional links: Thumbnail; Link to TOC (table of contents); Link to Abstract; Link to Item in Amazon / WorldCat; AdditionalLinks (other links for the record).
There are two types of links: calculated links based on a template, where the field holds the template name, and static links, where the field holds the URL. More links can appear; those on the slide are only examples.
PNX Sections > Links
Advanced Configuration > All Mapping Tables > delivery, template aleph_backlink: {{ils_base}}?func=direct&local_base={{control/originalsourceid}}&doc_number={{control/sourcerecordid}}
Advanced Configuration > Full Normalization Rule Configuration > cnmarc > Links, field links:backlink: $$Taleph_backlink$$D查看书目记录 (display text: "View bibliographic record")
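To make the idea of a calculated link concrete, the following sketch fills the {{...}} placeholders of the backlink template from a dictionary. It is not Primo's implementation; the function name and the sample values (base URL, local base, record number) are hypothetical.

```python
import re
from typing import Mapping

# Matches {{name}} placeholders such as {{ils_base}} or {{control/sourcerecordid}}.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def expand_template(template: str, values: Mapping[str, str]) -> str:
    """Replace every {{name}} in the template with values[name]."""
    def substitute(match: "re.Match[str]") -> str:
        # A missing name raises KeyError instead of producing a broken link.
        return values[match.group(1).strip()]
    return PLACEHOLDER.sub(substitute, template)


template = ("{{ils_base}}?func=direct"
            "&local_base={{control/originalsourceid}}"
            "&doc_number={{control/sourcerecordid}}")

# Hypothetical values, for illustration only.
url = expand_template(template, {
    "ils_base": "http://opac.example.org/F",
    "control/originalsourceid": "DEMO01",
    "control/sourcerecordid": "000123456",
})
```

A static link needs no such step, since the PNX field already holds the final URL.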
PNX Sections > Search
Search section: fields used for indexing and searching. Fields include: Creators/Contributors, Titles, Subjects, Creation Date, ISSN/ISBN, Full-text, Resource Type, Search Scope, RecordID, SourceID.
The Search section holds the data, both metadata and unstructured text, that the search engine indexes. The data is grouped into fields for several reasons: to enable qualified searching in the user interface; for internal Primo purposes (for example, qualified searching on the record ID); and to boost ranking.
PNX Sections > Facets
Facets section: facet fields, used to refine the result set further. A record can contain several facet fields, and a facet can repeat. Examples: Creator/Contributor, Creation date, Topic, Physical format, ClassificationLCC, Resource type, "Top-level" facets, Prefilter.
Resource type: based on the resource type from the Display section.
Creator/Contributor: personal names are normalized; only the first initial of the first name is used.
Language.
Creation date range.
Topics: up to three topic levels can be created. For MARC21, for example, these are based on the subdivisions: Japan; Japan – economics; Japan – economics – taxation.
Physical format.
ClassificationLCC: used to create a browsable subject list based on the LCC classification. An enrichment routine converts the classification code to text.
Collection: a physical, digital, or logical collection to which the record belongs.
"Top-level" facets: the facets displayed above the results list ("Show only"). The defaults are Online resources and Available.
"Pre-filter": the qualified search by resource type.
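The three-level topic example can be sketched as follows. This only illustrates how the cumulative levels relate to the subdivisions; the function name is hypothetical and the separator simply mirrors the slide.

```python
from typing import List, Sequence


def topic_levels(subdivisions: Sequence[str],
                 separator: str = " – ",
                 max_levels: int = 3) -> List[str]:
    """Build cumulative topic facets: level 1, level 1-2, level 1-2-3."""
    depth = min(len(subdivisions), max_levels)
    return [separator.join(subdivisions[:level]) for level in range(1, depth + 1)]


# Subdivisions from the slide example.
levels = topic_levels(["Japan", "economics", "taxation"])
```

Capping the depth at three follows the note that up to three topic levels can be created; deeper subdivisions are ignored in this sketch.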
PNX Sections > Sort
Sort section: fields used to sort records. Examples: Creation Date (newest first), Author, Title.
In the front end, search results can be sorted by date (newest first), relevance, or popularity. Of these, only creation date actually appears as a field in the Sort section of the PNX.
PNX Sections > Dedup
Dedup section: fields used to decide whether records are duplicates that should be merged. A de-duplication vector is generated for each record, and records identified as the same are assigned the same MatchID.
c1...c4: candidate vector. f1...f11: matching vector.
Duplicate records normally come from different repositories.
PNX Sections > FRBR
FRBR section: fields used to group records by title and author. One or more keys are generated for each record, and records that share a key are grouped together.
k1...kn: matching vector.
Primo's mapping for the grouping section is based on the principles published by IFLA in the study Functional Requirements for Bibliographic Records: Final Report (IFLA Study Group on the Functional Requirements for Bibliographic Records). The algorithm constructs the primary key from the normalized primary author and title, and then additional keys from the other titles and authors in the record. The keys represent the work.
FRBR example
Record 1: K1 carroll lewis 1832-1898 $$AA; K3 alices adventures in wonderland $$AT; K3 alice in wonderland $$AT. Two keys are generated: "carroll lewis 1832-1898 alices adventures in wonderland" and "carroll lewis 1832-1898 alice in wonderland".
Record 2: K3 alices adventures in wonderland $$AT; K3 allibillilokamloamayikatha $$AT. Key: "carroll lewis 1832-1898 allibillilokamloamayikatha".
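The key generation in Record 1 can be sketched as pairing every K1 (author) value with every K3 (title) value. This is an illustration of the example, not Primo's algorithm; the function name and the single-space join are assumptions.

```python
from itertools import product
from typing import List, Sequence


def frbr_keys(k1_values: Sequence[str], k3_values: Sequence[str]) -> List[str]:
    """Combine each author key (K1) with each title key (K3)."""
    return [f"{author} {title}" for author, title in product(k1_values, k3_values)]


# Values of Record 1 from the example above.
keys = frbr_keys(
    ["carroll lewis 1832-1898"],
    ["alices adventures in wonderland", "alice in wonderland"],
)
```

Two records that produce at least one identical key are then candidates for the same FRBR group.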
PNX Sections > Delivery
Delivery section: information about the institution a record belongs to and how the item can be obtained (GetIt!: view full text or get holdings). Fields include: Institution, Delivery category, Restricted Delivery Scope.
Explain: the Delivery section records which institution the item belongs to, so that when an end user wants to get the item, Primo can indicate either that it is held by the user's library or that it is held elsewhere. (The institution here is used for the backlink only, while the institution in the avail_library field is used for GetIt!.) Likewise, the delcategory field indicates whether the record is a physical item, an online resource, and so on. This matters both for selecting a scope and for the GetIt! functionality.
PNX Sections > Additional PNX Sections
Ranking: two fields used to weight relevance ranking. Booster1 can influence a record's position; Booster2 is not in use.
Enrichment: fields needed during the data enrichment stage.
Additional data: fields that Primo needs but that are not extracted into other sections, e.g. data elements for the OpenURL.
Normalization Rules (NR)
NR (Normalization Rules)
Normalization rules bring data of different formats and types into a single form, which makes unified, fast, and effective searching possible.
Input records (MARC, Generic XML, Dublin Core, Digital Entity) -> normalization rules -> PNX.
These rules define how to convert the source data into the PNX format.
NR (Normalization Rules)
The default Primo installation ships with several predefined normalization rule templates for common data formats. Each normalization rule defines which source fields to look for and how to convert them to PNX. Each rule can combine multiple conditions with AND and OR.
The normalization rules in each normalization mapping set are listed by PNX section and PNX field. Most of the standard normalization rules are hard-coded; for example, the title and author fields in the original source are converted automatically into the appropriate PNX fields. Many other normalization rules can be configured according to the institution's preference. We will soon see a demo in the BO.
NR (Normalization Rules)
Predefined rule templates can be selected and copied through the web interface.
NR (Normalization Rules)
A normalization rule has three parts. Source: the field before conversion, usually taken from the original record. PNX field (or "target"): the PNX section and field after conversion. Conversion: the processing applied, such as removing unwanted characters, extracting part of a field, or splitting a field on a delimiter.
NR (Normalization Rules): defining a single rule
Rules Configuration. A normalization rule has four main sections:
1) Source: the part of the source data that the rule normalizes. We define the type (MARC, XML, constant, etc.) and the field and subfield to look for in the source data.
2) Conditions: optional. Use them to apply a rule only when the original source data fulfills certain conditions. Complex conditions can be built, as the following slides show.
3) Transformations: the conversions the rule performs. The simplest transformation is "Copy as is". In this window there are two transformations: the first converts the field according to the mapping table named "ILS Institution Code"; the second adds "$$I" to the beginning of the string.
4) Action: there are three types. ADD: a new PNX field is added for every additional source field. OR: only one PNX field is created; once the field has been created, the system stops checking the remaining rules. MERGE: all occurrences of the source field are merged into a single PNX field; MERGE requires a delimiter.
The "Enabled" box must be checked for the rule to take effect.
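The interplay of transformations and actions can be modeled with a short sketch. It is a simplified reading of the notes above, not Primo's engine: the Rule class, run_rules function, field tags, and mapping-table contents are all hypothetical, and the handling of OR (take the first value if the target is still empty, then stop) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    source: str                                   # source field tag (hypothetical notation)
    transformations: List[Callable[[str], str]]   # applied in order to each value
    action: str = "ADD"                           # ADD, OR, or MERGE
    delimiter: str = "; "                         # used by MERGE only
    enabled: bool = True


def run_rules(rules: List[Rule], record: Dict[str, List[str]]) -> List[str]:
    """Return the values written to one PNX target field."""
    target: List[str] = []
    for rule in rules:
        if not rule.enabled:
            continue
        values = []
        for value in record.get(rule.source, []):
            for transform in rule.transformations:
                value = transform(value)
            values.append(value)
        if not values:
            continue  # source field absent: try the next rule
        if rule.action == "ADD":
            target.extend(values)                       # one PNX field per occurrence
        elif rule.action == "MERGE":
            target.append(rule.delimiter.join(values))  # single merged PNX field
        elif rule.action == "OR":
            if not target:
                target.append(values[0])                # only one PNX field is created
            break                                       # stop checking remaining rules
        else:
            raise ValueError(f"unknown action: {rule.action}")
    return target


# Hypothetical stand-in for the "ILS Institution Code" mapping table.
institution_codes = {"DEMO01": "DEMO"}
institution_rule = Rule(
    source="852b",
    transformations=[lambda v: institution_codes.get(v, v), lambda v: "$$I" + v],
)
result = run_rules([institution_rule], {"852b": ["DEMO01"]})
```

The two lambdas correspond to the two transformations on the slide: a mapping-table lookup followed by prefixing "$$I".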
NR (Normalization Rules): joining multiple transformations with OR
Rules Configuration. This is a rule with one condition. The Conditions logic setting defines whether the condition counts as fulfilled when it is true or when it is false. The Routine defines the check and its parameter. In this rule, if the condition is true, i.e. the MARC 007 field begins with "cr", then "Online Resources" is written to the target PNX field (here, section Delivery, field delcategory).
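A single-condition rule like this one can be sketched in a few lines. The function names are hypothetical, and treating the Conditions logic as a simple comparison against the routine's result is an assumption based on the description above.

```python
from typing import Optional


def condition_met(routine_result: bool, conditions_logic: bool = True) -> bool:
    """The condition is fulfilled when the routine's result equals the configured logic."""
    return routine_result == conditions_logic


def delcategory_for(marc_007: Optional[str]) -> Optional[str]:
    """Return "Online Resources" when MARC 007 begins with "cr", otherwise nothing."""
    begins_with_cr = bool(marc_007) and marc_007.startswith("cr")
    if condition_met(begins_with_cr, conditions_logic=True):
        return "Online Resources"
    return None
```

Setting conditions_logic to False would invert the rule, so it fires for records whose 007 field does not begin with "cr".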
NR (Normalization Rules): defining multiple conditions
When a rule has more than one condition, the Conditions logic has several settings to fill in, which allows complex conditions. Two settings in the general Conditions section apply to the entire rule and all of its conditions: Conditions logic (True or False) and Conditions relation (And or Or). In addition, each individual condition (Condition 1, Condition 2) has its own logic, True or False. The next slide shows what the various combinations mean.
NR (Normalization Rules): defining multiple conditions
Conditions logic | Conditions relation | Condition 1 logic | Condition 2 logic | The condition is met if...
True | Or | True | True | At least one condition is true
True | And | True | True | Both conditions are true
True | Or | True | False | 1st condition is true or 2nd condition is false
True | And | True | False | 1st condition is true and 2nd condition is false
 |  |  |  | At least one condition is false
 |  |  |  | Both conditions are false
 |  |  |  | Both conditions are false
 |  |  |  | 1st condition is false and 2nd condition is true
Explain, for example, the third row: if the general Conditions logic is True and the relation is Or, and within the conditions Condition 1 is True and Condition 2 is False, the condition is fulfilled if either the 1st condition is true or the 2nd condition is false.
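The combinations can be checked with a small evaluator. This is a sketch under stated assumptions, not Primo's code: each condition matches when its routine result equals its own logic setting, the matches are combined with And or Or, and a general Conditions logic of False is assumed to negate the combined result. The function name is hypothetical.

```python
from typing import Sequence


def conditions_met(results: Sequence[bool],
                   condition_logics: Sequence[bool],
                   relation: str = "And",
                   conditions_logic: bool = True) -> bool:
    """Evaluate a rule's conditions.

    results          -- what each routine returned for the record
    condition_logics -- the True/False setting of each individual condition
    relation         -- "And" or "Or", applied across all conditions
    conditions_logic -- general setting; False is assumed to invert the outcome
    """
    if relation not in ("And", "Or"):
        raise ValueError("relation must be 'And' or 'Or'")
    matched = [result == logic for result, logic in zip(results, condition_logics)]
    combined = all(matched) if relation == "And" else any(matched)
    return combined == conditions_logic


# Third row of the table: True / Or / True / False.
# Met when the 1st condition is true or the 2nd condition is false.
third_row = conditions_met([False, False], [True, False], relation="Or")
```

Under these assumptions, a general logic of False with relation Or and both condition logics True is met only when both conditions are false, which matches one of the outcomes listed on the slide.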
NR (Normalization Rules): creating a new rule
Click Edit next to the rule you wish to change or map (as seen in slide 30). Note the message "This PNX field is not yet mapped". To create a new rule, click Create; all the fields that need to be filled in then appear. By default you get a Basic rule, which lacks the Conditions and Action (add, or, merge) options. To see the advanced options, click Advanced.
To check that the rule works as intended, go to the Testing area, select a record from the dropdown, and click Test. The original XML source appears on one side of the page and the data converted into the PNX field on the other, which helps you fix the rule if necessary.
You can use your own records in the Test utility. Extract them in XML format and place them under /exlibris/primo/p1_x/ng/primo/home/profile/publish/demo_data/normalization_test/APPROPRIATE FORMAT. For data in MARC format, for example, the directory is /exlibris/primo/p1_x/ng/primo/home/profile/publish/demo_data/normalization_test/marc.
NR (Normalization Rules): example
When field 540 of the source record contains THESIS, set the scope to THESIS.
Continuing from the previous screen in the BO, we demonstrate how to fill in this normalization rule. We fill out the Source and the Condition 1 Source, select "Check that string exists" from the Routine dropdown, and enter the parameter THESIS. We then select "Copy as is" from the Transformations dropdown, since we want the value THESIS copied into this PNX field.
NR Configuration Examples
National Bibliographic Center: data conversion approach. CNMARC data: display, search, and facet fields are defined separately for each major type (books, journals/newspapers, audiovisual and electronic resources, doctoral dissertations) to achieve a consistent presentation. MARC21 data: minor modifications on top of the system default template. Special FRBR rules were defined to suit the characteristics of the holdings' bibliographic records and achieve the desired grouping.
National Bibliographic Center: normalization rules. Display section (PNX/Display)
National Bibliographic Center: record display results. PNX format; detailed record display
National Bibliographic Center: record display examples. Foreign-language books
Journals
Audiovisual and electronic resources
National Bibliographic Center: normalization rules. Facets section (PNX/Facet)
National Bibliographic Center: facet display results. PNX format
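National Bibliographic Center: record grouping rules. Grouping section (PNX/FRBR)
k1 holds the responsible party, i.e. the author (A); k3 holds the title (T); k2 holds a title-only key (TO). The system combines all k1 and k3 values of a record to generate multiple keys, while k2 generates a key on its own; these keys are used for FRBR processing. The source fields for K1, K2, and K3 are defined separately for each bibliographic type.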
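The rules above can be sketched as two steps: build a record's keys from k1, k2, and k3, then group records that share any key. This is an illustration only; the function names and sample records are hypothetical, and transitive grouping (A shares a key with B, B with C, so all three group together) is an assumption.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Set


def record_keys(k1: Sequence[str], k2: Sequence[str], k3: Sequence[str]) -> Set[str]:
    """Every k1 (author) x k3 (title) combination, plus each k2 (title-only) key."""
    keys = {f"{author} {title}" for author, title in product(k1, k3)}
    keys.update(k2)
    return keys


def group_records(keys_by_record: Dict[str, Set[str]]) -> List[List[str]]:
    """Group record IDs that share at least one key (simple union-find)."""
    parent = {record_id: record_id for record_id in keys_by_record}

    def find(record_id: str) -> str:
        while parent[record_id] != record_id:
            parent[record_id] = parent[parent[record_id]]  # path halving
            record_id = parent[record_id]
        return record_id

    first_owner: Dict[str, str] = {}
    for record_id, keys in keys_by_record.items():
        for key in keys:
            if key in first_owner:
                parent[find(record_id)] = find(first_owner[key])
            else:
                first_owner[key] = record_id

    groups = defaultdict(list)
    for record_id in keys_by_record:
        groups[find(record_id)].append(record_id)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical records, for illustration only.
groups = group_records({
    "r1": record_keys(["carroll lewis"], [], ["alices adventures in wonderland", "alice in wonderland"]),
    "r2": record_keys(["carroll lewis"], [], ["alice in wonderland"]),
    "r3": record_keys([], ["demo statistical yearbook"], []),
    "r4": record_keys([], ["demo statistical yearbook"], []),
})
```

The title-only k2 key lets records without a usable author, such as the yearbook-style records in this sketch, still fall into one group.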
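National Bibliographic Center: record grouping results
National Bibliographic Center: grouping of ordinary books
National Bibliographic Center: grouping of books with audiovisual and electronic resources
National Bibliographic Center: grouping of multi-volume works
National Bibliographic Center: grouping of yearbooks
National Bibliographic Center: grouping of journals and newspapers
National Bibliographic Center: external link rules. Links section (PNX/Links). Static URL. Generated dynamically from a template
National Bibliographic Center: external link results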
National Bibliographic Center: digital object links. PNX: <linktorsrc> $$Uhttp://mylib.nlc.gov.cn/system/application/search/display/metaDataDisplayRedirectPage.jsp?metaData.id=1071966&metaData.lId=1076511&IdLib=40283415347ed8bd013483174ef60002&sysid=wenjin $$Enlc_digitalres </linktorsrc>
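Link fields such as linktorsrc pack several values into one string using $$ subfield codes. The sketch below splits such a string into its parts. The parsing approach and function name are assumptions for illustration, and the first sample value uses a hypothetical URL.

```python
import re
from typing import Dict

# A subfield is "$$" + one code character, followed by text up to the next subfield.
SUBFIELD = re.compile(r"\$\$([A-Za-z0-9])(.*?)(?=\$\$[A-Za-z0-9]|\Z)", re.S)


def parse_subfields(value: str) -> Dict[str, str]:
    """Split a PNX link value into {code: text}; a repeated code keeps its last value."""
    return {code: text.strip() for code, text in SUBFIELD.findall(value.strip())}


# Static link: the field carries the URL itself (hypothetical URL).
static_link = parse_subfields("$$Uhttp://example.org/item?id=1&lib=2 $$Enlc_digitalres")

# Template-based link: the field carries a template name, as in the backlink slide.
template_link = parse_subfields("$$Taleph_backlink$$D查看书目记录")
```

In the slides, $$U is followed by a URL and $$T by a template name, which corresponds to the static and template-based link types.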
Outline: PNX Viewer; PNX Sections; Normalization Rules (NR); NR configuration examples
Thank you! zhen.li@exlibrisgroup.com