Enabling a Sentry for Hadoop - Sentry's Generic Authorization Model


1 Enabling a Sentry for Hadoop - Sentry's Generic Authorization Model
Hao Hao - Anne Yu - Beijing Strata + Hadoop World, Aug

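2 About Us
Software engineers at Cloudera (Silicon Valley, California, USA); Apache Sentry PMC members and committers
Hao previously worked on the Search Backend team at eBay
Anne previously worked on the Search Backend team at Amazon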

3 Agenda
Sentry overview
  Introduction
  Architecture
Sentry generic authorization model
  Motivation: provide a mechanism that lets any big-data processing engine use Sentry's fine-grained access control (easy integration with Apache data engines, even third-party data applications)
  Successfully integrated with Apache Hive, Impala, Solr, Kafka and Sqoop2
  Walkthrough of the successful integrations
Other important Sentry features

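4 Sentry Overview: Authorization Service
Sentry provides role-based, fine-grained authorization for data on Hadoop (it enforces role-based access control, RBAC).
Data security management can meet enterprise standards; Sentry is already deployed at enterprises, banks, and similar institutions.
Provides one unified authorization platform for the various components on Hadoop.
Offers pluggable interfaces and is highly modular.
The current code base has built-in integrations with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Apache Kafka, Apache Sqoop, and Apache Impala. Download and build it to use it.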

5 Sentry Architecture: Server Client Model
[Diagram: Hive, Solr, and Sqoop, each with a Hook; Sentry Client Plugin; Thrift APIs; Sentry Server (Policy Metadata Store)]
Thrift client APIs: get privileges; grant/revoke role; grant/revoke privileges; list roles

6 Sentry Architecture
[Diagram, top to bottom: Hive, Solr, Sqoop; Access Binding Layer; Authorization Provider; DB Policy Engine, Solr Policy Engine, Sqoop Policy Engine; Provider Backend; Policy Metadata Store (Local File/HDFS, Sentry Database)]

7 Sentry Architecture: Binding Layer
Binding layer: takes authorization requests in the native format of each requestor and converts them into an authorization request, based on the authorization data model, that the Sentry authorization provider can handle.
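To make the binding layer's job concrete, here is a minimal sketch of converting native requests into one generic request shape. The class `AuthzRequest`, the functions `bind_hive` and `bind_solr`, and the operation names in the lookup tables are all hypothetical; they are not Sentry's actual classes or mappings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthzRequest:
    """Generic request: who wants to do what on which resource path."""
    user: str
    component: str
    resource: tuple  # ((type, name), ...) ordered from coarse to fine
    action: str

def bind_hive(user, database, table, operation):
    # Hypothetical mapping from a Hive-native operation to a generic action.
    action = {"QUERY": "select", "LOAD": "insert"}[operation]
    return AuthzRequest(user, "hive", (("database", database), ("table", table)), action)

def bind_solr(user, collection, operation):
    # Hypothetical mapping from a Solr-native operation to a generic action.
    action = {"SEARCH": "read", "INDEX": "write"}[operation]
    return AuthzRequest(user, "solr", (("collection", collection),), action)

hive_request = bind_hive("alice", "db2", "tab1", "QUERY")
solr_request = bind_solr("alice", "c1", "SEARCH")
```

Both requests end up in the same shape, so the layers below never need to know which component produced them.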

8 Sentry Architecture: Authorization Provider
Authorization provider: an abstraction that makes the authorization decision for the request coming from the binding layer. Currently it supplies an RBAC authorization model implementation.

9 Sentry Architecture: Policy Engine
Policy engine: gets the requested privileges from the binding layer and the required privileges from the provider layer. It compares the requested and required privileges and decides whether the action should be allowed.
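The comparison step can be sketched as an "implies" check over privilege strings, written in the key=value->key=value notation used in this deck's sentryShell example. The matching rules here are assumptions for illustration, not Sentry's actual engine: a privilege on a coarser resource covers finer ones, the last element is always the action, and `all` or `*` covers every action.

```python
def parse(privilege):
    """Split 'k1=v1->k2=v2' into a list of (key, value) pairs."""
    return [tuple(part.split("=", 1)) for part in privilege.split("->")]

def implies(granted, requested):
    """True if the granted privilege covers the requested one."""
    # Assumption: the last element of every privilege is 'action=...'.
    *granted_path, (_, granted_action) = parse(granted)
    *requested_path, (_, requested_action) = parse(requested)
    # The granted resource path must be a prefix of the requested path.
    if requested_path[:len(granted_path)] != granted_path:
        return False
    # Assumed rule: 'all' or '*' covers every action.
    return granted_action in ("all", "*") or granted_action == requested_action

def is_allowed(granted_privileges, requested):
    return any(implies(granted, requested) for granted in granted_privileges)

GRANTED = ["server=server1->db=db2->action=select"]
REQUEST = "server=server1->db=db2->table=tab1->action=select"
print(is_allowed(GRANTED, REQUEST))
```

A database-level `select` grant covers the table-level request here, while a request on another database or for another action would be denied.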

10 Sentry Architecture: Provider Backend
Provider backend: makes the authorization metadata available to the policy engine. It allows the metadata to be pulled out of the underlying repository independently of how that metadata is stored.
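One way to picture this storage independence is a small interface with interchangeable implementations. The class names below are hypothetical, and the INI-style layout with `[groups]` and `[roles]` sections is an assumption about what a file-based policy could look like, not a specification of Sentry's file format.

```python
import configparser
from abc import ABC, abstractmethod

class ProviderBackend(ABC):
    """What the policy engine needs, independent of where policies live."""

    @abstractmethod
    def privileges_for(self, group):
        """Return the sorted privilege strings reachable from a group."""

class DictBackend(ProviderBackend):
    def __init__(self, group_roles, role_privileges):
        self._groups, self._roles = group_roles, role_privileges

    def privileges_for(self, group):
        return sorted(p for role in self._groups.get(group, [])
                      for p in self._roles.get(role, []))

class IniTextBackend(DictBackend):
    """Loads the same mappings from INI-style text (assumed layout)."""

    def __init__(self, text):
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # keep role and group names case-sensitive
        parser.read_string(text)
        split = lambda value: [item.strip() for item in value.split(",")]
        super().__init__(
            {g: split(v) for g, v in parser["groups"].items()},
            {r: split(v) for r, v in parser["roles"].items()},
        )

POLICY_TEXT = """
[groups]
analysts = analyst

[roles]
analyst = server=server1->db=db2->table=tab1->action=select
"""

file_backend = IniTextBackend(POLICY_TEXT)
dict_backend = DictBackend(
    {"analysts": ["analyst"]},
    {"analyst": ["server=server1->db=db2->table=tab1->action=select"]},
)
```

A caller that only uses `privileges_for` gets the same answer from either backend, which is the property the slide describes.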

11 Sentry Architecture: Policy Store and Sentry Service
Sentry policy store and Sentry service: persist the role-to-privilege and group-to-role mappings in an RDBMS and provide programmatic APIs to create, query, update, and delete them. This lets various Sentry clients retrieve and modify the privileges concurrently and securely.
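The two mappings can be sketched as two relational tables. This uses SQLite from the Python standard library as a stand-in RDBMS; the table names, column names, and function names are hypothetical and do not reflect Sentry's real schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE role_privilege (role TEXT, privilege TEXT, UNIQUE (role, privilege));
    CREATE TABLE group_role (grp TEXT, role TEXT, UNIQUE (grp, role));
""")

def grant_privilege(role, privilege):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO role_privilege VALUES (?, ?)",
                     (role, privilege))

def revoke_privilege(role, privilege):
    with conn:
        conn.execute("DELETE FROM role_privilege WHERE role = ? AND privilege = ?",
                     (role, privilege))

def add_role_to_group(grp, role):
    with conn:
        conn.execute("INSERT OR IGNORE INTO group_role VALUES (?, ?)", (grp, role))

def privileges_for_group(grp):
    # Follow group -> role -> privilege with a single join.
    rows = conn.execute(
        "SELECT rp.privilege FROM group_role gr "
        "JOIN role_privilege rp ON rp.role = gr.role "
        "WHERE gr.grp = ? ORDER BY rp.privilege", (grp,))
    return [row[0] for row in rows]

add_role_to_group("analysts", "analyst")
grant_privilege("analyst", "server=server1->db=db2->table=tab1->action=select")
```

Wrapping each change in a transaction is one simple way to keep concurrent clients from seeing half-applied updates.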

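12 Authorization Model for SQL Engines
The current code is already integrated with the Apache SQL engines, Apache Hive and Impala (incubating); download and build it to use it.
Purpose-built access control for all DML and DDL operations of the Apache SQL engines.
Purpose-built access control for all of their data objects: databases, tables, views, and columns.
Purpose-built client interface.
DB policy engine.
Extending this purpose-built model to other components would require a large amount of development work.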

13 Generic Authorization Model
Motivation
  Supports various data applications out of the box
  Easy integration with any new component
  Very little implementation work
A more flexible design
  Generic authorization data model for defining sensitive resources
  Generic policy engine that includes access actions and a privilege abstraction that can be interpreted by various engines

14 Generic Authorization Model
[Diagram, top to bottom: Hive, Solr, Kafka; Access Binding Layer; Authorization Provider; Policy Engine; Provider Backend; Policy Metadata Store (Local File/HDFS, Sentry Database)]

15 Server Client Model
Thrift client APIs: get privilege; grant/revoke role; grant/revoke privilege; list roles
SentryCLI for managing policies and metadata:
sentryShell --grant_role_privilege --role analyst --privilege server=server1->db=db2->table=tab1->action=select --conf sentry-site.xml
RESTful client APIs (ongoing): use HTTP requests to manage roles and privileges
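As a small illustration, the privilege string and the command line from this slide can be assembled programmatically. This sketch uses only the flags that appear in the slide; the helper names `privilege_string` and `grant_command` are hypothetical. Since `->` contains a shell redirection character, `shlex.join` quotes that argument, which is probably what you want when pasting the command into a shell.

```python
import shlex

def privilege_string(scope, action):
    """scope is an ordered list of (key, value) pairs, coarse to fine."""
    parts = [f"{key}={value}" for key, value in scope] + [f"action={action}"]
    return "->".join(parts)

def grant_command(role, privilege, conf="sentry-site.xml"):
    argv = ["sentryShell", "--grant_role_privilege", "--role", role,
            "--privilege", privilege, "--conf", conf]
    return shlex.join(argv)  # quotes arguments that are unsafe for a shell

priv = privilege_string(
    [("server", "server1"), ("db", "db2"), ("table", "tab1")], "select")
print(grant_command("analyst", priv))
```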

16 Generic Authorization Model
[Diagram, top to bottom: Hive, Solr, Kafka; Access Binding Layer; Authorization Provider; Policy Engine; Provider Backend; Sentry Shell, REST API; Policy Metadata Store (Local File/HDFS, Sentry Database)]

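17 How Generic Authorization Is Implemented (1)
Sentry defines a generic data model internally
Defines a generic, URL-style representation of each component's resources
  For example Solr:///collection=c1/field=f1; Hive:///database=db/table=tbl/column=cl
  A component's resources can be server, database, table, view, column, or connection, link, and so on
An internal mechanism translates each component's operations into actions and the authorization policies built on them
  For example, Solr lets an authorized user "search through a collection"; the policy can be a read action on the collection resource
  For example, Hive lets an authorized user run "show databases"; the policy can be a read action on the database resource
  Other internal actions can be create, insert, select, write, update
  Other authorization policies can be create database, insert into table, select column, etc.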
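A minimal sketch of how such a resource URL could be split into an ordered list of (type, name) pairs, plus a lookup from native operations to generic actions. The function `parse_resource` and the table `OPERATION_MAP` are hypothetical; only the two mappings shown are taken from the slide.

```python
def parse_resource(url):
    """'Solr:///collection=c1/field=f1' -> ('solr', [('collection', 'c1'), ('field', 'f1')])"""
    component, separator, path = url.partition(":///")
    if not separator:
        raise ValueError(f"not a component resource URL: {url!r}")
    authorizables = [tuple(segment.split("=", 1))
                     for segment in path.split("/") if segment]
    return component.lower(), authorizables

# Native operation -> (resource type, generic action), as described on the slide.
OPERATION_MAP = {
    ("solr", "search"): ("collection", "read"),
    ("hive", "show databases"): ("database", "read"),
}

solr_resource = parse_resource("Solr:///collection=c1/field=f1")
hive_resource = parse_resource("Hive:///database=db/table=tbl/column=cl")
```

Because every component reduces to the same pair list, one policy engine can compare resources without component-specific code.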

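18 How Generic Authorization Is Implemented (2)
Core generic authorization engine (policy engine/policy provider)
  Can parse each component's resources and authorization policies and understands their operations
Core generic binding layer and pluggable generic interface (pluggable hook)
  Provides generic hook functions that can be plugged into different stages of each component's compile or execute phases
  For example, HS2 integrates the hook into the build stage; HMS integrates the hook into a pre-metadata-change event
Redesigned internal system database
  Rebuilt component data
  Rebuilt component operations and policies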
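The pluggable-hook idea can be sketched as a registry of callbacks keyed by stage. Everything here is hypothetical (`HookRegistry`, `make_authz_hook`, the stage name `"compile"`, and the context keys); it only illustrates how an authorization check can be attached to a stage without changing the component's own code path.

```python
class AccessDenied(Exception):
    pass

class HookRegistry:
    """Lets a component run whatever hooks were plugged into a stage."""

    def __init__(self):
        self._hooks = {}

    def register(self, stage, hook):
        self._hooks.setdefault(stage, []).append(hook)

    def run(self, stage, context):
        for hook in self._hooks.get(stage, []):
            hook(context)  # a hook vetoes the stage by raising

def make_authz_hook(is_allowed):
    """Wrap any decision function as a hook that raises on denial."""
    def hook(context):
        if not is_allowed(context["user"], context["action"], context["resource"]):
            raise AccessDenied(f"{context['user']} may not {context['action']} "
                               f"{context['resource']}")
    return hook

registry = HookRegistry()
# Toy decision function: only 'alice' is allowed anything.
registry.register("compile", make_authz_hook(lambda user, action, resource: user == "alice"))
registry.run("compile", {"user": "alice", "action": "select", "resource": "db2.tab1"})
```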

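19 Apache Sentry: Latest Achievements
Sentry supports authorization audit logs (for example, Cloudera Navigator)
Sentry supports importing and exporting authorization policies between clusters, for example export/import to dump or load Sentry metadata
Sentry currently supports data security management for Apache Hive, Impala, Kafka, Solr, and Sqoop2
Sentry's generic authorization model simplifies integration with other Hadoop ecosystem components
Sentry HDFS sync (Cloudera CDH 5.3+)
Column-level privileges (Cloudera CDH 5.5+)
Support for transactional data on Amazon S3 (Cloudera CDH 5.7+)
Sentry's system and user data can be created on the Amazon RDS engine (Cloudera CDH 5.8+)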

20 Sentry Upstream Releases
Integrate Sqoop2 with Sentry by using the generic authorization model
Sentry 1.7.0 released in June 2016
Sentry integrated with Hive v2
Sentry generic authorization model integrated with Kafka
Sentry generic authorization model integrated with Solr

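21 Sentry HDFS sync
Sentry privileges and policies can be synchronized into ACLs on HDFS data
The NameNode maintains an in-memory database of the authorization policies
Currently supports privileges on database/table/partition for the Hive/Impala query engines, but does not support HMS HA
For example: Grant select on database database_name to role role_name;
Users who hold this role can be given the HDFS ACL group:user:r-x; they can read the database's directory on the HDFS filesystem and the files beneath it.
hdfs dfs -ls -R /user/hive/warehouse/database.db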
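A rough sketch of the translation from granted privileges to HDFS-style ACL entries. The `select` to `r-x` mapping comes from the slide; the `insert` and `all` rows, the function name `acl_entries`, and the idea of merging permissions per group are assumptions made for illustration.

```python
# 'select' -> 'r-x' is from the slide; 'insert' and 'all' are assumed mappings.
PRIVILEGE_TO_PERMS = {"select": "r-x", "insert": "-wx", "all": "rwx"}

def acl_entries(grants, role_groups):
    """grants: list of (role, action); role_groups: role -> groups holding it."""
    perms = {}
    for role, action in grants:
        new = PRIVILEGE_TO_PERMS[action.lower()]
        for group in role_groups.get(role, []):
            old = perms.get(group, "---")
            # Union of the two permission triples, position by position.
            perms[group] = "".join(n if n != "-" else o for o, n in zip(old, new))
    return [f"group:{group}:{perm}" for group, perm in sorted(perms.items())]

print(acl_entries([("analyst", "select")], {"analyst": ["analysts"]}))
```

Entries in the `group:<name>:<permissions>` form are what `hdfs dfs -getfacl` shows for a path with ACLs.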

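22 Column level privileges
Supports fine-grained, column-level privilege management for tables in the query engines
Before this, users created large numbers of views to secure table columns
For example:
Grant select (column_name) on table table_name to role role_name;
Revoke select (column_name) on table table_name from role role_name;
A user granted this privilege can see the column's data:
Select column_name from table_name
Desc table_name and Show columns: show only the columns the user has been granted
Show grant command: shows the column privileges
Currently supported in Apache Hive/Impala/Hue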
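The effect of column-level grants can be sketched as two small set operations: a query is allowed only if every referenced column is granted, and listing commands show only the granted columns. The function names are hypothetical, and the `has_table_select` shortcut (a table-level select covering every column) is an assumption.

```python
def can_select(requested_columns, granted_columns, has_table_select=False):
    """A query passes only if every referenced column is covered."""
    return has_table_select or set(requested_columns) <= set(granted_columns)

def visible_columns(table_columns, granted_columns, has_table_select=False):
    """What a 'show columns' style listing would return for this user."""
    if has_table_select:
        return list(table_columns)
    return [column for column in table_columns if column in granted_columns]

TABLE = ["userid", "page_url", "ip", "country"]
GRANTED = {"page_url", "country"}
print(visible_columns(TABLE, GRANTED))
```

This replaces the older workaround of creating one view per permitted column set.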

23 Sentry support for HDFS data on Amazon S3
HDFS data can be stored in the S3 cloud.
<property><name>fs.s3a.access.key</name><value>your-access-key</value></property>
<property><name>fs.s3a.secret.key</name><value>your-secret-key</value></property>
Hive user data can likewise be stored in S3, and the Hive warehouse can also be created on S3. For example:
CREATE EXTERNAL TABLE my_s3_table (
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User',
  country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION 's3a://sentry-s3/cdh-sentry/db/tbl';
Grant all on uri 's3a://sentry-s3/cdh-sentry/db/tbl' to role role_name
If the user has no URI privilege on S3, Sentry rejects the user's create-table operation.
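The URI check that gates the create-table operation can be pictured as a path-prefix test. This is only a sketch under the assumption that a grant on a URI also covers everything beneath it; the function `uri_covers` is hypothetical and is not Sentry's implementation.

```python
def uri_covers(granted_uri, requested_uri):
    """True if requested_uri equals granted_uri or lies beneath it."""
    granted = granted_uri.rstrip("/")
    requested = requested_uri.rstrip("/")
    # Compare on a path boundary so '.../tbl' does not cover '.../tbl2'.
    return requested == granted or requested.startswith(granted + "/")

GRANT = "s3a://sentry-s3/cdh-sentry/db/tbl"
print(uri_covers(GRANT, "s3a://sentry-s3/cdh-sentry/db/tbl"))
```

Matching on a path boundary rather than a raw string prefix avoids accidentally granting access to sibling locations with a shared name prefix.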

24 Creating Sentry on the Amazon RDS engine
Users can create an Amazon RDS MySQL instance and configure it as Sentry's metadata database, so that the data is stored on RDS rather than on the HDFS filesystem. For example:
<property> <name>sentry.store.jdbc.url</name> <value>jdbc:mysql://rdsname.us-west-1.rds.amazonaws.com:3306/sentryserver f56d1932dd2d7bce2d171e?useUnicode=true&amp;characterEncoding=UTF-8</value> </property>
Likewise, the Hive metadata can also be placed on Amazon RDS:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://rdsname.us-west-1.rds.amazonaws.com:3306/hive1?useUnicode=true&amp;characterEncoding=UTF-8</value> </property>

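25 Sentry: Latest Progress
Sentry became an Apache incubator project in 2013. After two and a half years of development the community has grown quickly, with many organizations contributing code; there are now more than 50 contributors, 31 of whom have become committers.
In 2016 Sentry became an Apache top-level open source project.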

26 Future Work
Sentry HA (high availability), able to support HMS HA and HDFS ACL sync
RESTful client APIs for generic authorization management
Attribute-Based Access Control (ABAC)

27 Reference
Apache Sentry:
Integrating with Sentry New Universal Authorization Model:

28 Questions?

