Unified Data Modeling for RDBMS & NoSQL
Allen Wang (王琤), Development Director, CA Technologies
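Allen Wang (王琤)
Joined CA in 2006 and currently leads global R&D for ERwin (the No. 1 product in the data modeling market), while also handling product management for ERwin NoSQL Modeler.
Member of the standards body OMG and of the industry association DAMA.
Member of the CA CTE, advising the board as a data management technology expert.
Since 2012 has led the CA and Tsinghua University big data research collaboration, which helps enterprises enter the big data market; this talk is one of the project's results, and patents have been filed for the related technology.
In 2014 taught the graduate course "Big Data and Cloud Computing" at Fudan University.
Other experience: CA Technologies, information governance: CAMM (CA Message Manager) & CARM (CA Record Manager), enterprise email compliance and records data lifecycle management. Microsoft: MDM (master data management); PetroChina ERP project.
Role: enterprise software R&D, product management, market development.
Areas: data management, data lifecycle management, data modeling, databases, emerging NoSQL technologies, data mining, data warehousing.
Contents of this presentation are confidential and are being provided pursuant to the NDA signed between CA and Buyer as well as the terms and conditions set forth in the Confidential Information Memorandum issued by CA.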
Survey
Role? - Developer / DBA / Architect / Product Manager / Project Manager / CxO
NoSQL? - Using now / Considering in the near term / No plans
Mixing NoSQL & RDBMS? - Yes / No
Big Data Technology Maturity
Big Data Management: The Wild West
The NoSQL honeymoon is coming to an end, and it's time to start balancing our enthusiasm with some gimlet-eyed hard truths. "Big Data" and NoSQL data stores are just another enterprise data asset to the business.
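The commercial Internet brought demands for scalability and availability that a Swiss Army knife like the RDBMS can no longer meet. Adding horizontal scaling and redundancy to data stores increases system complexity and makes ACID harder to guarantee, forcing us to make trade-offs according to the CAP theorem and creating many interesting opportunities for optimization and specialization.
As many of you probably know, relational database (RDB) technology has existed since the 1970s and was the de facto standard for structured storage until the late 1990s. For decades RDBs have excelled at highly consistent transactional workloads, and they remain strong. Over time this venerable technology gained new capabilities in response to customer needs, such as BLOB storage, XML/document storage, full-text search, in-database code execution, data warehousing with star schemas, and geospatial extensions. As long as everything could be squeezed into a relational structure and fit on a single machine, it could be implemented in a relational database.
Then the Internet was commercialized, and that changed everything: relational databases could no longer satisfy every storage need. Availability, performance, and scale became as important as consistency, and sometimes more important.
Performance has always mattered, but what changed with the commercial Internet was scale. It turned out that achieving performance at scale required techniques and technologies that would have been unacceptable in the pre-Internet era. Relational databases are built around ACID (Atomicity, Consistency, Isolation, Durability), and the simplest way to achieve ACID is to keep everything on a single machine. The traditional way to scale an RDB is therefore vertical scaling (scale up), which in plain terms means buying a bigger machine.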
Traditional relational databases
Normal form: 3NF
ACID: Atomicity, Consistency, Isolation, Durability
Relational; ACID transactions; typically row-oriented; joins; operational and/or analytical workloads
May 16, 2010 Copyright © 2010 CA
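The traditional RDBMS properties listed above (normalized tables, joins, ACID transactions) can be seen in a few lines of Python using the standard-library sqlite3 module, chosen only as a convenient stand-in since the slide names no specific database. The film/film_actor schema is hypothetical. The sketch joins two normalized tables and shows atomicity: if any insert inside a transaction fails, every insert in it is rolled back.

```python
import sqlite3
from typing import List, Optional, Sequence


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE film_actor (
            film_id    INTEGER NOT NULL REFERENCES film(film_id),
            actor_name TEXT NOT NULL
        );
        INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR');
        INSERT INTO film_actor VALUES (1, 'JULIA MCQUEEN');
        """
    )
    return conn


def actors_of(conn: sqlite3.Connection, film_id: int) -> List[str]:
    # Join: data that lives in two tables is combined at query time.
    rows = conn.execute(
        "SELECT a.actor_name FROM film AS f "
        "JOIN film_actor AS a ON f.film_id = a.film_id "
        "WHERE f.film_id = ? ORDER BY a.actor_name",
        (film_id,),
    )
    return [name for (name,) in rows]


def add_film_atomically(conn: sqlite3.Connection, film_id: int, title: str,
                        actors: Sequence[Optional[str]]) -> bool:
    """Insert a film and its actors in one transaction: either all rows or none."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("INSERT INTO film VALUES (?, ?)", (film_id, title))
            for name in actors:
                conn.execute("INSERT INTO film_actor VALUES (?, ?)", (film_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `add_film_atomically(conn, 2, "SECOND FILM", ["A", None])` violates the NOT NULL constraint on the second actor row, so the film row inserted earlier in the same transaction disappears too.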
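CAP: R + W > N
The CAP theorem extended the discussion of data stores beyond ACID and inspired the creation of many non-relational database technologies. More than ten years after proposing CAP, Dr. Brewer published a statement clarifying that his original "pick two of three" framing was a drastic simplification, intended to provoke discussion and help move beyond ACID. That simplification, however, led to countless misreadings and misunderstandings. In a more nuanced reading of CAP, all three dimensions should be understood as ranges rather than Booleans. It should also be understood that distributed systems operate in non-partitioned mode most of the time, where the trade-off is between consistency and performance/latency. Only in the rare case that a partition actually occurs must the system choose between consistency and availability. 5/1/2019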
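The slide's R + W > N condition is the usual quorum rule for replicated stores: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to share at least one replica. A minimal sketch, our own illustration rather than any particular product's code, checks the rule and confirms it by brute force over all replica subsets.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Quorum rule: every read set of size r meets every write set of size w iff r + w > n."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")
    return r + w > n


def brute_force_overlap(n: int, r: int, w: int) -> bool:
    """Exhaustively check that every r-subset of replicas intersects every w-subset."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

With N = 3, choosing R = 2 and W = 2 means every read consults at least one replica holding the latest acknowledged write; R = 1 and W = 1 gives up that guarantee for lower latency, which is the consistency versus latency trade-off described above.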
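Development Agility through flexible schema vs. Database Manageability
Flexible => lacks governance; you have to read the code to understand the data.
Data models were once the key to a software project's success.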
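Schemaless
This is hardly a topic for debate anymore.
A deep chill: a data management disaster, with multiple applications each reading and writing data in their own way.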
Schema on read
The lack of schema on write in the NoSQL world makes it inherently difficult to manage.
[Figure: a data warehouse flow (Extract, Cleanse, Normalize, Aggregate, Code, Load, Analyze, Utilize, built around a Data Model) compared with a Hadoop / NoSQL flow (Load, Code, Cleanse, Analyze, Utilize, with no data model)]
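Schema on read means raw records are stored as-is and each reader imposes structure at the moment it reads. A minimal sketch, assuming JSON-lines input; the schema, the `read_with_schema` helper, and the second record are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical read-time schema: field name -> converter applied while reading.
FILM_SCHEMA: Dict[str, Callable[[Any], Any]] = {
    "film_id": int,
    "title": str,
    "release_year": int,
}


def read_with_schema(lines: Iterable[str],
                     schema: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Optional[Any]]]:
    """Parse raw JSON lines and impose the reader's schema on each record."""
    records = []
    for line in lines:
        raw = json.loads(line)
        # Missing fields become None; fields outside the schema are ignored.
        records.append({
            name: convert(raw[name]) if raw.get(name) is not None else None
            for name, convert in schema.items()
        })
    return records


# Raw data as written, with no schema enforced (hypothetical records).
RAW = [
    '{"film_id": "1", "title": "ACADEMY DINOSAUR", "release_year": "2005"}',
    '{"film_id": 2, "title": "SECOND FILM", "rating": "G"}',
]
```

Because nothing was enforced on write, the second record silently lacks `release_year` and carries an unexpected `rating` field; only the reader discovers this, which is the manageability problem the slide describes.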
NoSQL data management
Traditionally, enterprises design, build, standardize, integrate, and manage their data assets through data models.
In NoSQL, the schema is hidden inside Map-Reduce programs, and multiple Map-Reduce applications each read and write data in their own way.
NoSQL data needs to be managed by data models.
Current data modeling tools need to be extended to support NoSQL.
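To illustrate how the schema ends up hidden in Map-Reduce code, here is a minimal in-process map/reduce sketch in plain Python (not Hadoop's API). Nothing declares that each record has a `category` list; that assumption lives only inside the hypothetical `map_category` function, so a record written by another application under a different field name is silently skipped. The records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_category(record: dict) -> Iterator[Tuple[str, int]]:
    # Implicit schema: this mapper assumes a "category" key holding a list of names.
    for name in record.get("category", []):
        yield name, 1


def reduce_count(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def run_map_reduce(records: Iterable[dict]) -> Dict[str, int]:
    """Map every record, group (shuffle) by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_category(record):
            groups[key].append(value)
    return dict(reduce_count(k, v) for k, v in groups.items())


# Hypothetical records written by two different applications.
RECORDS = [
    {"title": "ACADEMY DINOSAUR", "category": ["Drama"]},
    {"title": "SECOND FILM", "category": ["Horror", "Drama"]},
    {"title": "THIRD FILM", "genre": "Horror"},  # another writer, another field name
]
```

Running `run_map_reduce(RECORDS)` counts only the first two records; the third one's `genre` is invisible to this mapper.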
Unified Data Model & ERwin NoSQL Modeler
[Diagram: Enterprise Data Architecture. RDBMS and NoSQL DB (each with data and schema) connect to a Data Model through bidirectional round trip (Reverse Engineering & Forward Engineering); the Data Modeler performs Data Model design.]
The innovative unified model can describe both structured and unstructured data and can transform data schemas in both directions between big data stores and relational databases. A traditional ER diagram cannot express NoSQL structures; the unified model is a hierarchical object model that captures data characteristics.
Extract schema from both RDBMS and NoSQL.
Easily migrate between RDBMS and NoSQL.
Guided by the CAP principle, easily transform between relational DB and NoSQL.
Comparing relational and NoSQL
RDBMS: ER (Entity Relationship); schema predefined; vertical scaling; ACID.
NoSQL: four main types (document, column-oriented, key-value, graph); variety of data types; schema on read; horizontal scaling; performance prioritized based on the CAP principle.
Structure: in an RDBMS, individual records are stored as rows in a table, with each column storing a specific piece of data about that record; when data is needed from more than one table, the tables are joined together. The RDBMS schema is predefined, so storing information in a new property requires modifying the structure before the data can be added. NoSQL schemas are typically dynamic, so data can be added freely without …
Scaling: an RDBMS runs on a single server that must be upgraded to handle increased demand; NoSQL expands onto inexpensive servers or cloud instances.
Focus: RDBMS, ACID; NoSQL, high availability and performance, soft state, eventual consistency.
Comparing the logical model, ER diagram, and NoSQL model
Logical model | ER diagram (physical model) | NoSQL (physical model)
Entity | Table | Document / Column family / Graph
Entity instance | Row | Collection / Row
Attribute | Column name | Key
Attribute value | Column value | Field value
Domain | Data type |
Relationship | Constraint | Reference, Embedded, Additional table
Key group | Index | Index, Additional table, Reference
Primary key | Primary key | Row key
Examples of the four NoSQL types: ER diagram
[Figure: ER diagram; tabs: Document oriented | Column oriented | Key-value pair | Graph]
Examples of the four NoSQL types: Document oriented
For example, instead of storing film, actor, and category in three distinct relational tables, the objects and their properties are aggregated into a single document. This mirrors a search for a particular film, where title, actor, and category information appear together: we don't have to join separate places to get everything we need. The document-oriented model is much more application-focused, whereas the table-oriented model is more data-focused.
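The aggregation described above can be sketched in Python: relational-style rows (the film, actor, and category tables plus the film_actor link table a many-to-many relationship needs) are folded into one film document, so a single lookup returns title, actors, and category together. The film and actor values reuse the key-value slide in this deck; the category value and ids are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical normalized rows: film, actor, category, and a film_actor link table.
FILMS: List[Dict[str, Any]] = [{"film_id": 1, "title": "ACADEMY DINOSAUR", "category_id": 7}]
ACTORS: List[Dict[str, Any]] = [{"actor_id": 27, "first_name": "JULIA", "last_name": "MCQUEEN"}]
FILM_ACTOR: List[Dict[str, int]] = [{"film_id": 1, "actor_id": 27}]
CATEGORIES: List[Dict[str, Any]] = [{"category_id": 7, "name": "Drama"}]


def to_film_document(film_id: int) -> Dict[str, Any]:
    """Aggregate a film with its actors and category into one self-contained document."""
    film = next(f for f in FILMS if f["film_id"] == film_id)
    actor_ids = {fa["actor_id"] for fa in FILM_ACTOR if fa["film_id"] == film_id}
    actors = [
        {"first_name": a["first_name"], "last_name": a["last_name"]}
        for a in ACTORS
        if a["actor_id"] in actor_ids
    ]
    category = next(c["name"] for c in CATEGORIES if c["category_id"] == film["category_id"])
    return {"_id": film_id, "title": film["title"], "category": category, "actors": actors}
```

The joins happen once, when the document is built, instead of on every read.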
Examples of the four NoSQL types: Column oriented
Closest to the RDBMS, with a similar way of looking at data as rows and values. The difference is that an RDBMS works with a predefined structure and simple data types, whereas a column-oriented store can work with more complex data types.
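A column-oriented (wide-column) layout can be modeled as row key, then column family, then column, then value, where each row may carry a different set of columns. The sketch below is a plain-Python illustration of that data structure, not the API of any particular store; the row keys, family names, and column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

# row key -> column family -> column qualifier -> value
Table = Dict[str, Dict[str, Dict[str, str]]]


def new_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))


def put(table: Table, row_key: str, family: str, column: str, value: str) -> None:
    table[row_key][family][column] = value


def get(table: Table, row_key: str, family: str, column: str) -> Optional[str]:
    # .get() never creates entries, so missing rows/families/columns return None.
    return table.get(row_key, {}).get(family, {}).get(column)


t = new_table()
put(t, "film#1", "info", "title", "ACADEMY DINOSAUR")
put(t, "film#1", "actors", "27", "JULIA MCQUEEN")  # column qualifiers can be data
put(t, "film#2", "info", "title", "SECOND FILM")   # this row has no actors columns
```

Rows in the same table need not share columns, which is where this model departs from a predefined relational structure.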
Examples of the four NoSQL types: Key-value
Key | Value
"Film_id:1:title" | "ACADEMY DINOSAUR"
"Film_id:1:description" | "A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies"
"Film_id:1:release_year" | "2005-12-31"
"Actor:27:First_name" | "JULIA"
"Actor:27:Last_name" | "MCQUEEN"
"Film_id:1:actor" | "27,31,66"
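The key-value layout above encodes entity, id, and attribute into a single string key. A minimal sketch with a Python dict standing in for the store; the `kv_key` and `actor_names` helpers are hypothetical. Because the store has no joins, resolving the comma-separated actor list takes one extra lookup per actor in application code.

```python
from typing import Dict, List


def kv_key(entity: str, entity_id: int, attribute: str) -> str:
    # Composite key in the slide's "Entity:id:attribute" form.
    return f"{entity}:{entity_id}:{attribute}"


STORE: Dict[str, str] = {
    kv_key("Film_id", 1, "title"): "ACADEMY DINOSAUR",
    kv_key("Film_id", 1, "release_year"): "2005-12-31",
    kv_key("Actor", 27, "First_name"): "JULIA",
    kv_key("Actor", 27, "Last_name"): "MCQUEEN",
    kv_key("Film_id", 1, "actor"): "27,31,66",
}


def actor_names(store: Dict[str, str], film_id: int) -> List[str]:
    """Follow the comma-separated actor ids; the application performs the 'join'."""
    names = []
    for actor_id in store.get(kv_key("Film_id", film_id, "actor"), "").split(","):
        if not actor_id:
            continue
        first = store.get(kv_key("Actor", int(actor_id), "First_name"))
        last = store.get(kv_key("Actor", int(actor_id), "Last_name"))
        if first is not None and last is not None:
            names.append(f"{first} {last}")
    return names
```

Only actor 27 appears on the slide, so the lookups for 31 and 66 find nothing and are skipped.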
Examples of the four NoSQL types: Graph
Nodes & relationships. Works best for social relations, public transport links, or road maps; typical queries find shortest routes, nearest neighbors, etc.
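Shortest-route queries are the canonical graph workload mentioned above. A minimal sketch using breadth-first search over an adjacency list, which finds the route with the fewest hops on unweighted edges (a road map with distances would call for Dijkstra's algorithm instead). The station names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def shortest_route(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Return the path with the fewest hops from start to goal, or None if unreachable."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical transport links (undirected, so each edge is listed both ways).
LINES: Graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
```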
Schema Inference
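Schema inference derives a schema from the stored data itself. A minimal sketch, assuming documents arrive as Python dicts (for example, parsed JSON): it walks every document, records the set of type names seen at each field path, and counts how many documents contain each path so optional fields can be told apart from required ones. This is our own illustration, not ERwin NoSQL Modeler's method; the sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set, Tuple


def _walk(value: Any, path: str, out: Dict[str, Set[str]]) -> None:
    """Record the type name at `path`, recursing into nested dicts and list items."""
    out[path].add(type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            _walk(child, f"{path}.{key}", out)
    elif isinstance(value, list):
        for item in value:
            _walk(item, f"{path}[]", out)


def infer_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Tuple[Set[str], int]]:
    """Map each field path to (observed type names, number of documents containing it)."""
    types: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for doc in docs:
        seen: Dict[str, Set[str]] = defaultdict(set)
        for key, value in doc.items():
            _walk(value, key, seen)
        for path, names in seen.items():
            types[path] |= names
            counts[path] += 1
    return {path: (types[path], counts[path]) for path in types}


DOCS = [
    {"title": "ACADEMY DINOSAUR", "release_year": 2005, "actors": [{"first_name": "JULIA"}]},
    {"title": "SECOND FILM", "release_year": "2006"},
]
```

A path seen in fewer documents than the total (`actors` here) is optional, and a path with more than one type (`release_year` holds both int and str) flags inconsistent writers.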
Unified Data Model & ERwin NoSQL Modeler
[Diagram: Enterprise Data Architecture. RDBMS and NoSQL DB (each with data and schema) connect through bidirectional round trip (Reverse Engineering & Forward Engineering) to a data modeling layer made of Physical model, Normalize / Denormalize, and Unified model; the Data Modeler performs Data Model design.]
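The Normalize / Denormalize step in this diagram can be illustrated in the relational direction: a nested document is split into a parent row plus child rows that carry the parent key as a foreign key. A minimal sketch; the document shape, the `normalize` helper, and the `<parent>_<field>` table-naming rule are hypothetical, and a real tool would also derive keys and data types from the model.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def normalize(doc: Dict[str, Any], parent_table: str, key: str) -> Dict[str, List[Row]]:
    """Scalar fields go to the parent row; each list of embedded objects becomes a child table."""
    tables: Dict[str, List[Row]] = {parent_table: []}
    parent: Row = {}
    for field, value in doc.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            child_table = f"{parent_table}_{field}"
            # Each embedded item becomes a child row carrying the parent key as a foreign key.
            tables[child_table] = [{key: doc[key], **item} for item in value]
        else:
            parent[field] = value
    tables[parent_table].append(parent)
    return tables


FILM_DOC = {
    "film_id": 1,
    "title": "ACADEMY DINOSAUR",
    "actors": [{"actor_id": 27, "first_name": "JULIA"}],
}
```

This yields a one-to-many child table; for a true many-to-many relationship, the actors would also be deduplicated into their own table with a link table between them.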
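NoSQL data modeling
How is the data used (query patterns)?
Modeling strategy between entities: embed? reference? separate table?
Identify aggregate entities: strong dependency (composite); one-to-one / one-to-many / many-to-many; read-only / append / update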
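The embed-versus-reference question can be shown on a single relationship. The sketch below builds both shapes for the same hypothetical data: embedding copies actor data into the film aggregate, so one read returns everything but the copy can go stale; referencing stores only ids, so reads need an application-side lookup but always see the current actor. The helper names are hypothetical, and the third option (a separate link table) is not shown.

```python
from typing import Any, Dict, List

# Hypothetical shared actor data.
ACTORS: Dict[int, Dict[str, Any]] = {27: {"first_name": "JULIA", "last_name": "MCQUEEN"}}


def embedded_film(film_id: int, title: str, actor_ids: List[int]) -> Dict[str, Any]:
    # Embed: copy actor data into the film document (one read, duplicated data).
    return {"_id": film_id, "title": title, "actors": [dict(ACTORS[a]) for a in actor_ids]}


def referenced_film(film_id: int, title: str, actor_ids: List[int]) -> Dict[str, Any]:
    # Reference: store only actor ids (no duplication, extra lookups on read).
    return {"_id": film_id, "title": title, "actor_ids": list(actor_ids)}


def resolve(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Application-side join for the referenced shape."""
    return [ACTORS[a] for a in doc["actor_ids"]]
```

If an actor's name changes after both documents are written, the embedded copy keeps the old value while `resolve` returns the new one; which behavior is acceptable depends on whether the data is read-only, append-only, or updated.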
NoSQL data modeling: data characteristics
Strong consistency / eventual consistency
Transactional validation
Performance optimization
NoSQL data modeling: Aggregate
A collection of related objects treated as a single unit
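NoSQL data modeling: Consistency
Strong consistency. Advantage: high data trustworthiness. Use cases: high accuracy required; no data loss.
Eventual consistency. Advantages: scalable; reads and writes decoupled; high performance. Use cases: a small amount of data loss is acceptable; relatively unstable network.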
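The trade-off on this slide can be simulated with a toy primary/replica store. In the sketch, the `ReplicatedStore` class is hypothetical and is not any product's replication protocol: a strongly consistent write updates every replica before returning, while an eventually consistent write returns after updating the primary and queues the change, so a replica read can be stale until `sync()` runs.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    def __init__(self, replicas: int, strong: bool) -> None:
        self.nodes: List[Dict[str, str]] = [dict() for _ in range(replicas)]
        self.strong = strong
        self.pending: List[Tuple[str, str]] = []  # updates not yet applied to replicas

    def write(self, key: str, value: str) -> None:
        self.nodes[0][key] = value  # node 0 acts as the primary
        if self.strong:
            for node in self.nodes[1:]:
                node[key] = value  # synchronous: slower write, no stale reads
        else:
            self.pending.append((key, value))  # asynchronous: fast write, stale reads possible

    def read(self, key: str, node: int) -> Optional[str]:
        return self.nodes[node].get(key)

    def sync(self) -> None:
        """Propagate queued updates in order; afterwards all replicas converge."""
        for key, value in self.pending:
            for node in self.nodes[1:]:
                node[key] = value
        self.pending.clear()
```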
Semantics
DS (data space)
Semantic tags (ontology)
Data interoperability (credit reporting)
Cognitive science
Unified Data Model & ERwin NoSQL Modeler
[Diagram: Enterprise Data Architecture. RDBMS and NoSQL DB (each with data and schema) connect through bidirectional round trip (Reverse Engineering & Forward Engineering) to a data modeling layer made of Physical model, Normalize / Denormalize, and Unified model, informed by Business Concept Model, Production Data Characteristics, Query Pattern, and Semantic Context; the Data Modeler performs Data Model design.]
ERwin NoSQL Modeler: a lightweight big data solution based on the unified model
ODBC data source
Standard SQL access
Supports queries across multiple data sources
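Querying across data sources with standard SQL can be illustrated without ODBC using SQLite's ATTACH DATABASE, which lets one SQL statement join tables stored in two separate database files. This is only an analogy for the cross-source query idea on the slide; it does not show ERwin NoSQL Modeler's ODBC driver, and the file names, tables, and rating value are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


def cross_source_query() -> List[Tuple[str, int]]:
    """Join tables that live in two separate database files with one SQL statement."""
    with tempfile.TemporaryDirectory() as workdir:
        films_path = os.path.join(workdir, "films.db")      # hypothetical source 1
        ratings_path = os.path.join(workdir, "ratings.db")  # hypothetical source 2

        other = sqlite3.connect(ratings_path)
        other.execute("CREATE TABLE rating (film_id INTEGER, score INTEGER)")
        other.execute("INSERT INTO rating VALUES (1, 5)")
        other.commit()
        other.close()

        conn = sqlite3.connect(films_path)
        try:
            conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT)")
            conn.execute("INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR')")
            conn.commit()
            # Make the second source visible under the schema name "src2".
            conn.execute("ATTACH DATABASE ? AS src2", (ratings_path,))
            return conn.execute(
                "SELECT f.title, r.score FROM film AS f "
                "JOIN src2.rating AS r ON f.film_id = r.film_id"
            ).fetchall()
        finally:
            conn.close()  # close before the temporary directory is removed
```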
Unified Data Model & ERwin NoSQL Modeler
[Diagram: Enterprise Data Architecture. RDBMS and NoSQL DB (each with data and schema) connect through bidirectional round trip (Reverse Engineering & Forward Engineering) to a data modeling layer made of Physical model, Normalize / Denormalize, and Unified model. Added elements: ODBC, Schema Extraction, Schema Inference, Data Visualization, Data Discovery, Business Domain Model, Production Data Characteristics, Query Pattern, Semantic Context, Data Governance, and several Applications; the Data Modeler performs Data Model design.]
NoSQL data management based on the unified model
1. NoSQL data modeling
2. Data integration between relational databases and NoSQL: relational database to NoSQL / ETL / building a data warehouse from NoSQL production data
3. Unified data query and management (UDBC): data visualization; semantic tags and data interoperability
Demo
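Conclusion
Big data management is the wild west.
Schema on read.
NoSQL data model visualization, forward engineering & reverse engineering.
Data migration.
UDBC: standard SQL access to heterogeneous data sources.
ERwin NoSQL Modeler: a lightweight big data solution based on the unified model. Industry solutions are all heavyweight: fully managed and suited to brand-new users.
Considerations / trade-offs: existing investment (data platform, software and hardware, an in-house R&D team); cost of migration; data volume; long-term cost-effectiveness.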