In this section we discuss how to add or remove nodes in a Canopy cluster, and how to handle node failures. To make it easier to move shards across nodes, or to re-replicate shards from a failed node, Canopy supports fully online shard rebalancing. We briefly cover the relevant features of the shard rebalancer in the sections below. Canopy is also applicable to SaaS and analytics systems.
Sharding
Choosing the shard count for each distributed table is a balance between the flexibility of having more shards and the overhead of query planning and cross-shard execution. If you decide to change a table's shard count after distributing it, you can use the alter_distributed_table function.
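For instance, a minimal sketch of changing a shard count after the fact. The table name and target count here are hypothetical, and the shard_count parameter name follows the Citus-style signature; verify it against your Canopy version:

```sql
-- Hypothetical example: move my_table to 64 shards.
-- alter_distributed_table rewrites the table's shards, so expect
-- data movement proportional to the table size.
SELECT alter_distributed_table('my_table', shard_count := 64);
```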
Multi-Tenant SaaS Use Case
The best choice varies with your data access patterns. For example, in the Multi-Tenant Database use case we recommend choosing between 32 and 128 shards. For smaller workloads, say under 100 GB, you could start with 32 shards, and for larger workloads choose 64 or 128. This gives you room to scale from 32 to 128 worker machines.
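A hedged sketch of setting the shard count up front. This assumes Canopy exposes a Citus-style canopy.shard_count GUC (an assumption; check your version), and the table and column names are hypothetical:

```sql
-- Assumption: canopy.shard_count mirrors the Citus-style GUC that
-- controls the shard count of subsequently distributed tables.
SET canopy.shard_count = 32;

CREATE TABLE tenants_data (tenant_id int, payload text);
SELECT create_distributed_table('tenants_data', 'tenant_id');
```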
Real-Time Analytics Use Case
In the Real-Time Analytics use case, the shard count should be related to the total number of cores on the workers. To ensure maximum parallelism, you should create enough shards on each node that there is at least one shard per CPU core. We typically recommend creating a generous number of initial shards, e.g. 2x or 4x the current number of CPU cores, which leaves room to add worker nodes and CPU cores later. However, keep in mind that for each query Canopy opens one database connection per shard, and connections are limited (they cannot exceed the operating system's file descriptor limit). Systems with many tables and high concurrency should keep the shard count as small as possible, so that distributed queries do not have to wait for connections too often. In other words, the number of connections required (max concurrent queries x shard count) should generally not exceed the total connections available in the system (number of worker nodes x max connections per worker node).
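As a worked illustration of the connection bound above, with assumed numbers (8 workers, max_connections of 100 each, 32-shard tables, 25 concurrent multi-shard queries — all hypothetical):

```sql
-- Capacity check: required connections must not exceed available ones.
SELECT (8 * 100)            AS total_connections,     -- workers x max_connections
       (25 * 32)            AS required_connections,  -- concurrent queries x shards
       (8 * 100) >= (25 * 32) AS within_capacity;
```

Here 25 concurrent 32-shard queries need exactly the 800 connections the cluster can provide; any more concurrency would queue on connections.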
Scaling the Cluster
Canopy's logical-sharding architecture allows you to scale the cluster without downtime. This section describes how to add more nodes to a Canopy cluster to improve query performance and scalability.

Adding a Worker Node

Canopy stores all data of distributed tables on the worker nodes. If you want to scale your cluster by adding more compute power, you can do so by adding a worker. To add a new node to the cluster, you first need to register its host and port (a host running LightDB) in the pg_dist_node catalog table. You can do this with canopy_add_node. For example:
create database test;  -- run on every instance
-- the commands below are run only on the coordinator (CN)
\c test
SELECT * from canopy_add_node('10.20.30.10', 5432);
SELECT * from canopy_add_node('10.20.30.11', 5432);
SELECT * from canopy_add_node('10.20.30.12', 5432);
lightdb@test=# select * from pg_dist_node;
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
5 | 5 | 10.20.30.11 | 5432 | default | t | t | primary | default | t | t
6 | 6 | 10.20.30.12 | 5432 | default | t | t | primary | default | t | t
(2 rows)
The new nodes are available for shards of new distributed tables. Without a rebalance operation, newly added worker nodes will not improve the performance of the cluster as a whole.

If your cluster has very large reference tables, they will slow down the addition of a node. In that case, consider the canopy.replicate_reference_tables_on_activate (boolean) GUC.

Canopy worker nodes use encrypted communication by default. A new node will refuse to communicate with other workers that do not have SSL enabled. When adding a node to a cluster without encrypted communication, you must reconfigure the new node before creating the Canopy extension. First, from the coordinator node, check whether the other workers use SSL:
lightdb@test=# SELECT run_command_on_workers('show ssl');
run_command_on_workers
--------------------------
(10.20.30.11,5432,t,off)
(10.20.30.12,5432,t,off)
(2 rows)
If they do not, then connect to the new node and permit it to communicate over plaintext if necessary:
ALTER SYSTEM SET canopy.node_conninfo TO 'sslmode=prefer';
SELECT pg_reload_conf();
Rebalancing Without Downtime
Canopy supports fully online rebalancing. If you want to move existing shards to a newly added worker node, Canopy provides the rebalance_table_shards function. This function moves the shards of a given table so that they are distributed evenly among the worker nodes. The function can be configured to rebalance shards according to a number of strategies, to best match your database workload; see the function reference to learn which strategy to choose. Here is an example of rebalancing shards using the default strategy:

SELECT rebalance_table_shards();
Many products, such as multi-tenant SaaS applications, cannot tolerate downtime, and on our managed service rebalancing meets this requirement on LightDB. This means that reads and writes from the application can continue while data is being moved, with minimal disruption to the business.
How It Works
Canopy's shard rebalancer uses LightDB logical replication to move data from the old shard (called the "publisher" in replication terms) to the new shard (the "subscriber"). Logical replication allows application reads and writes to continue uninterrupted while the shard data is copied. Canopy takes a brief write lock on a shard only while updating metadata to promote the subscriber shard to active.

As the LightDB documentation explains, the source needs a replica identity configured:

A published table must have a "replica identity" configured in order to be able to replicate UPDATE and DELETE operations, so that appropriate rows to update or delete can be identified on the subscriber side. By default, this is the primary key, if there is one. Another unique index (with certain additional requirements) can also be set to be the replica identity.

In other words, if your distributed table has a primary key defined, it is ready for shard rebalancing with no extra work. However, if it has neither a primary key nor an explicitly defined replica identity, attempting to rebalance it results in an error. For example:
-- create the following table without REPLICA IDENTITY or PRIMARY KEY
CREATE TABLE test_table (key int not null, value text not null);
SELECT create_distributed_table('test_table', 'key');

-- add a new worker node to simulate the need for
-- shard rebalancing

-- run the shard rebalancer with default behavior
SELECT rebalance_table_shards('test_table');
/*
NOTICE: Moving shard 102040 from localhost:9701 to localhost:9700 ...
ERROR: cannot use logical replication to transfer shards of the
relation test_table since it doesn't have a REPLICA IDENTITY or
PRIMARY KEY
DETAIL: UPDATE and DELETE commands on the shard will error out during
logical replication unless there is a REPLICA IDENTITY or PRIMARY KEY.
HINT: If you wish to continue without a replica identity set the
shard_transfer_mode to 'force_logical' or 'block_writes'.
*/
Here is how to resolve this.

First, see whether the table has a unique constraint. If the table to be replicated already has a unique index that includes the distribution column, choose that index as the replica identity:
-- supposing my_table has a unique index my_table_idx
-- which includes the distribution column
ALTER TABLE my_table REPLICA IDENTITY USING INDEX my_table_idx;
While REPLICA IDENTITY USING INDEX is fine, we recommend against adding REPLICA IDENTITY FULL to a table. That setting would cause each update/delete to do a full table scan on the subscriber side to find the tuples for those rows. In our testing, we found this performs worse than the force_logical option described below.

Next, see whether a primary key can be added to the table. If the desired key happens to be the distribution column, this is easy: just add the constraint. Otherwise, a primary key on a non-distribution column must be composite and also contain the distribution column.

If neither a unique constraint nor a primary key can be added: if the distributed table has no primary key or replica identity, and adding one would be ambiguous or undesirable, you can still force logical replication on LightDB. It is OK to do this on a table that receives only reads and inserts (no deletes or updates). Include the optional shard_transfer_mode argument of rebalance_table_shards:
SELECT rebalance_table_shards(
  'test_table',
  shard_transfer_mode => 'force_logical');
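Where a primary key can be added instead (the second option above), a minimal sketch follows. The table and column names are hypothetical; the key on a non-distribution column must be composite and include the distribution column:

```sql
-- hypothetical table distributed by tenant_id; the desired key is id,
-- so the primary key must be composite and include tenant_id
ALTER TABLE my_table ADD PRIMARY KEY (tenant_id, id);
```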
With force_logical, if the application does try an update or delete during the copy, the request simply returns an error. Deletes and writes become possible again once replication completes.

Adding a Coordinator Node

The Canopy coordinator stores only metadata about table shards; it stores no data itself. This means all computation is pushed down to the workers, and the coordinator performs only the final aggregation of the workers' results. The coordinator is therefore unlikely to become a bottleneck for read performance, and it is easy to boost the coordinator by moving it to a more powerful machine. However, in some write-heavy use cases where the coordinator does become a performance bottleneck, users can add another coordinator. Because the metadata tables are small (typically a few MB), the metadata can be copied to another node and synced periodically. Once that is done, users can send their queries to either coordinator and scale out performance.
Dealing With Node Failures
In this subsection we discuss how to handle node failures without causing downtime for the Canopy cluster.
Worker Node Failures
Canopy uses LightDB streaming replication, which allows it to tolerate worker node failures. This option replicates an entire worker node by continuously streaming its WAL records to a standby. You can configure streaming replication yourself by consulting the LightDB replication documentation: https://www.hs.net/lightdb/docs/html/warm-standby.html#STREAMING-REPLICATION

Coordinator Node Failures

The Canopy coordinator maintains metadata tables to track all cluster nodes and the locations of the database shards on those nodes. The metadata tables are small (typically a few MB) and do not change very often. This means they can be replicated and quickly restored if the node ever experiences a failure. There are several options for how users can deal with coordinator failures.
- Use LightDB streaming replication: you can use LightDB's streaming replication feature to create a hot standby of the coordinator. Then, if the primary coordinator node fails, the standby can automatically be promoted to primary to serve queries for your cluster.
- Use backup tools: since the metadata tables are small, users can use EBS volumes or LightDB backup tools to back up the metadata. They can then easily copy the metadata onto a new node to resume operation.

Tenant Isolation

Canopy now includes tenant isolation! Canopy places table rows into worker shards based on the hashed value of the row's distribution column. Multiple distribution column values often fall into the same shard. In the Canopy multi-tenant use case this means that tenants often share shards.

However, sharing shards can cause resource contention when tenants differ greatly in size. This is a common situation for systems with a large number of tenants: we have observed that as the number of tenants increases, the size of tenant data tends to follow a Zipfian distribution. This means there are a few very large tenants and many smaller ones. To improve resource allocation and guarantee tenant QoS, it is worthwhile to move large tenants to dedicated nodes.

Canopy provides the tools to isolate a tenant on a specific node. This happens in two phases: 1) isolate the tenant's data to a new dedicated shard, then 2) move the shard to the desired node. To understand the process, it helps to know exactly how rows of data are assigned to shards. Every shard is marked in Canopy metadata with the range of hash values it contains (more info in the reference for pg_dist_shard). The Canopy UDF isolate_tenant_to_new_shard(table_name, tenant_id) moves a tenant into a dedicated shard in three steps:
- Creates a new shard for table_name which (a) includes rows whose distribution column has the value tenant_id, and (b) excludes all other rows.
- Moves the relevant rows from their current shard to the new shard.
- Splits the old shard into two, with hash ranges that abut the excision above and below.

Additionally, the UDF takes a CASCADE option which isolates the tenant's rows not just for table_name, but for all tables co-located with it. Here is an example:
-- This query creates an isolated shard for the given tenant_id and
-- returns the new shard id.

-- General form:
SELECT isolate_tenant_to_new_shard('table_name', tenant_id);

-- Specific example:
SELECT isolate_tenant_to_new_shard('lineitem', 135);

-- If the given table has co-located tables, the query above errors out and
-- advises to use the CASCADE option
SELECT isolate_tenant_to_new_shard('lineitem', 135, 'CASCADE');
┌─────────────────────────────┐
│ isolate_tenant_to_new_shard │
├─────────────────────────────┤
│ 102240 │
└─────────────────────────────┘
The new shard(s) are created on the same node as the shard(s) from which the tenant was removed. For true hardware isolation they can be moved to a separate node in the Canopy cluster. As mentioned, the isolate_tenant_to_new_shard function returns the newly created shard id, and this id can be used to move the shard:

-- find the node currently holding the new shard
SELECT nodename, nodeport
  FROM canopy_shards
 WHERE shardid = 102240;

-- list the available worker nodes that could hold the shard
SELECT * FROM master_get_active_worker_nodes();

-- move the shard to your choice of worker
-- (it will also move any shards created with the CASCADE option)
SELECT canopy_move_shard_placement(
  102240,
  'source_host', source_port,
  'dest_host', dest_port);
Note that canopy_move_shard_placement will also move any shards which are co-located with the specified one, to preserve their co-location.

Viewing Query Statistics

Canopy now includes the canopy_stat_statements view! When administering a Canopy cluster, it is useful to know what queries users are running, which nodes are involved, and which execution method Canopy uses for each query. Canopy records query statistics in a metadata view called canopy_stat_statements, named analogously to LightDB's lt_stat_statements. Whereas lt_stat_statements stores information about query duration and I/O, canopy_stat_statements stores information about Canopy execution methods and shard partition keys (when applicable).

Canopy requires the lt_stat_statements extension in order to track query statistics. This extension comes pre-activated in LightDB.

Let's see how this works. Assume we have a table called foo that is hash-distributed by its id column.
-- create and populate distributed table
create table foo ( id int );
select create_distributed_table('foo', 'id');
insert into foo select generate_series(1,100);
We will run two more queries, and canopy_stat_statements will show how Canopy chooses to execute them.
-- counting all rows executes on all nodes, and sums
-- the results on the coordinator
SELECT count(*) FROM foo;

-- specifying a row by the distribution column routes
-- execution to an individual node
SELECT * FROM foo WHERE id = 42;
To learn how these queries were executed, query the statistics view:
SELECT * FROM canopy_stat_statements;

-[ RECORD 1 ]-+----------------------------------------------
queryid       | -6844578505338488014
userid        | 10
dbid          | 13340
query         | SELECT count(*) FROM foo;
executor      | adaptive
partition_key |
calls         | 1
-[ RECORD 2 ]-+----------------------------------------------
queryid       | 185453597994293667
userid        | 10
dbid          | 13340
query         | insert into foo select generate_series($1,$2)
executor      | insert-select
partition_key |
calls         | 1
-[ RECORD 3 ]-+----------------------------------------------
queryid       | 1301170733886649828
userid        | 10
dbid          | 13340
query         | SELECT * FROM foo WHERE id = $1
executor      | adaptive
partition_key | 42
calls         | 1
We can see that Canopy most often uses the adaptive executor to run queries. That executor fragments the query into constituent queries to run on the relevant nodes, and combines the results on the coordinator node. In the case of the second query (filtering by the distribution column, id = $1), Canopy determined that it needed data from just one node. Lastly, we can see that the insert into foo select… statement ran with the insert-select executor, which provides the flexibility to run this kind of query.

So far the information in this view gives us nothing we could not already learn by running EXPLAIN on a given query. However, beyond getting information about individual queries, the canopy_stat_statements view also lets us answer questions such as "what percentage of queries in the cluster target a single tenant?"

SELECT sum(calls),
       partition_key IS NOT NULL AS single_tenant
  FROM canopy_stat_statements
 GROUP BY 2;

 sum | single_tenant
-----+---------------
   2 | f
   1 | t
In a multi-tenant database, for instance, we would expect the vast majority of queries to be single tenant. Seeing too many multi-tenant queries may indicate that queries do not have the proper filters to match a tenant, and are using unnecessary resources. We can also find which partition_ids are the most frequent targets. In a multi-tenant application these would be the busiest tenants.
SELECT partition_key, sum(calls) as total_queries
  FROM canopy_stat_statements
 WHERE coalesce(partition_key, '') <> ''
 GROUP BY partition_key
 ORDER BY total_queries desc
 LIMIT 10;

┌───────────────┬───────────────┐
│ partition_key │ total_queries │
├───────────────┼───────────────┤
│ 42            │ 1             │
└───────────────┴───────────────┘
Statistics Expiration
The lt_stat_statements view limits the number of statements it tracks and the duration of its records. Because canopy_stat_statements tracks a strict subset of the queries in lt_stat_statements, choosing equal limits for the two views would cause a mismatch in their data retention. Mismatched records can cause unpredictable behavior in joins between the views.
There are three ways to help synchronize the views, and all three can be used together.
- Have the maintenance daemon periodically sync the canopy and lt stats. The GUC canopy.stat_statements_purge_interval sets time in seconds for the sync. A value of 0 disables periodic syncs.
- Adjust the number of entries in canopy_stat_statements. The canopy.stat_statements_max GUC removes old entries when new ones cross the threshold. The default value is 50K, and the highest allowable value is 10M. Note that each entry costs about 140 bytes in shared memory so set the value wisely.
- Increase lt_stat_statements.max. Its default value is 5000, and it could be increased to 10K, 20K or even 50K without much overhead. This is most beneficial when there is more local (i.e. coordinator) query workload.

备注: Changing lt_stat_statements.max or canopy.stat_statements_max requires restarting the LightDB service. Changing canopy.stat_statements_purge_interval, on the other hand, takes effect with a call to pg_reload_conf().

Resource Conservation

Limiting Long-Running Queries

Long-running queries can hold locks, queue up WAL, or simply consume a lot of system resources, so in a production environment it is best to prevent them from running too long. You can set the statement_timeout parameter on the coordinator and workers to cancel queries that run too long.
-- limit queries to five minutes
ALTER DATABASE canopy SET statement_timeout TO 300000;
SELECT run_command_on_workers($cmd$
  ALTER DATABASE canopy SET statement_timeout TO 300000;
$cmd$);

The timeout is specified in milliseconds. To customize the timeout per query, use SET LOCAL in a transaction:

BEGIN;
-- this limit applies to just the current transaction
SET LOCAL statement_timeout TO 300000;
-- ...
COMMIT;
Security
Connection Management

备注: The traffic between the different nodes in the cluster is encrypted for NEW installations. This is done by using TLS with self-signed certificates. This means that it does not protect against Man-In-The-Middle attacks; it only protects against passive eavesdropping on the network.

When Canopy nodes communicate with one another they consult a table with connection credentials. This gives the database administrator flexibility to adjust parameters for security and efficiency. To set non-sensitive libpq connection parameters to be used for all node connections, update the canopy.node_conninfo GUC:

-- key=value pairs separated by spaces.
-- For example, ssl options:
ALTER SYSTEM SET canopy.node_conninfo =
  'sslrootcert=/path/to/canopy-ca.crt sslcrl=/path/to/canopy-ca.crl sslmode=verify-full';
There is a whitelist of parameters that the GUC accepts; see the node_conninfo reference for details. The default value for node_conninfo is sslmode=require, which prevents unencrypted communication between nodes. After changing this setting it is important to reload the LightDB configuration. Even though the changed setting might be visible in all sessions, the setting is only consulted by Canopy when new connections are established. When a reload signal is received, Canopy marks all existing connections to be closed, which causes a reconnect after running transactions have completed.

SELECT pg_reload_conf();

-- only superusers can access this table
-- add a password for user jdoe
INSERT INTO pg_dist_authinfo (nodeid, rolename, authinfo)
VALUES (123, 'jdoe', 'password=abc123');

After this INSERT, any query needing to connect to node 123 as the user jdoe will use the supplied password. The documentation for pg_dist_authinfo has more info.

-- update user jdoe to use certificate authentication
UPDATE pg_dist_authinfo
   SET authinfo = 'sslcert=/path/to/user.crt sslkey=/path/to/user.key'
 WHERE nodeid = 123 AND rolename = 'jdoe';
This changes the user from using a password to using a certificate and keyfile while connecting to node 123. Make sure the user certificate is signed by a certificate that is trusted by the worker you are connecting to, and that authentication settings on the worker allow certificate-based authentication. Full documentation on how to use client certificates can be found in the LightDB libpq documentation. Changing pg_dist_authinfo does not force any existing connection to reconnect.

Setup Certificate Authority signed certificates

This section assumes you have a trusted Certificate Authority that can issue server certificates to you for all nodes in your cluster. It is recommended to work with the security department in your organization to prevent key material from being handled incorrectly. This guide covers only the Canopy-specific configuration that needs to be applied, not best practices for PKI management.

For all nodes in the cluster you need to get a valid certificate signed by the same Certificate Authority. The following machine-specific files are assumed to be available on every machine:

• /path/to/server.key: Server Private Key
• /path/to/server.crt: Server Certificate or Certificate Chain for Server Key, signed by the trusted Certificate Authority.

Next to these machine-specific files, you need these cluster- or CA-wide files available:

• /path/to/ca.crt: Certificate of the Certificate Authority
• /path/to/ca.crl: Certificate Revocation List of the Certificate Authority

备注: The Certificate Revocation List is likely to change over time. Work with your security department to set up a mechanism to update the revocation list on all nodes in the cluster in a timely manner. A reload of every node in the cluster is required after the revocation list has been updated.

Once all files are in place on the nodes, the following settings need to be configured in the LightDB configuration file:
# the following settings allow the lightdb server to enable ssl, and
# configure the server to present the certificate to clients when
# connecting over tls/ssl
ssl = on
ssl_key_file = '/path/to/server.key'
ssl_cert_file = '/path/to/server.crt'

# this will tell canopy to verify the certificate of the server it is connecting to
canopy.node_conninfo = 'sslmode=verify-full sslrootcert=/path/to/ca.crt sslcrl=/path/to/ca.crl'
After changing these settings, either restart the database or reload the configuration to apply them. Also, adjusting canopy.local_hostname (text) may be required for proper functioning with sslmode=verify-full.

Depending on the policy of the Certificate Authority used, you might need or want to change sslmode=verify-full in canopy.node_conninfo to sslmode=verify-ca. For the difference between the two settings, please consult the official LightDB documentation.

Lastly, to prevent any user from connecting via an unencrypted connection, changes need to be made to lt_hba.conf. Many LightDB installations have entries allowing host connections, which permit SSL/TLS connections as well as plain TCP connections. By replacing all host entries with hostssl entries, only encrypted connections will be allowed to authenticate to LightDB. For full documentation on these settings, see the lt_hba.conf documentation in the official LightDB documentation.

备注: When a trusted Certificate Authority is not available, you can create your own via a self-signed root certificate. This is non-trivial and the developer or operator should seek guidance from their security team when doing so.

To verify that the connections from the coordinator to the workers are encrypted, you can run the following query. It shows the SSL/TLS version used to encrypt the connection that the coordinator uses to talk to each worker:
SELECT run_command_on_workers($$
  SELECT version FROM pg_stat_ssl WHERE pid = pg_backend_pid()
$$);

┌────────────────────────────┐
│   run_command_on_workers   │
├────────────────────────────┤
│ (localhost,9701,t,TLSv1.2) │
│ (localhost,9702,t,TLSv1.2) │
└────────────────────────────┘
(2 rows)
Increasing Worker Security

For your convenience getting started, our multi-node installation instructions direct you to set up lt_hba.conf on the workers with its authentication method set to "trust" for local network connections. However, you might desire more security. To require that all connections supply a hashed password, update the LightDB lt_hba.conf on every worker node with something like this:
# Require password access and a ssl/tls connection to nodes in the local
# network. The following ranges correspond to 24, 20, and 16-bit blocks
# in Private IPv4 address spaces.
hostssl    all    all    10.0.0.0/8        md5
hostssl    all    all    172.16.0.0/12     md5
hostssl    all    all    192.168.0.0/16    md5

# Require passwords and ssl/tls connections when the host connects to
# itself as well.
hostssl    all    all    127.0.0.1/32      md5
hostssl    all    all    ::1/128           md5
The coordinator node needs to know roles' passwords in order to communicate with the workers. In Canopy the authentication information has to be maintained in a .pgpass file. Edit .pgpass in the lightdb user's home directory, with a line for each combination of worker address and role:

hostname:port:database:username:password
Sometimes workers need to connect to one another, such as during repartition joins. Thus each worker node requires a copy of the .pgpass file as well.
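For instance, hypothetical .pgpass entries for the workers added earlier (the role name and password below are placeholders, not values from this cluster):

```
10.20.30.11:5432:*:lightdb:mysecret
10.20.30.12:5432:*:lightdb:mysecret
```

The file must have permissions 0600 (chmod 600 ~/.pgpass) or LightDB's libpq will ignore it.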
Row-Level Security
Canopy now supports row-level security for distributed tables. LightDB row-level security policies restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data-modification commands. This is especially useful in a multi-tenant Canopy cluster because it allows individual tenants to have full SQL access to the database while hiding each tenant's information from other tenants.

We can implement the separation of tenant data by using a naming convention for database roles that ties into table row-level security policies. We assign each tenant a database role in numbered order: tenant_1, tenant_2, etc. Tenants connect to Canopy using these separate roles. Row-level security policies can compare the role name to values in the tenant_id distribution column to decide whether to allow access.

Here is how to apply the approach on a simplified events table distributed by tenant_id. First create the roles tenant_1 and tenant_2. Then run the following as an administrator:
CREATE TABLE events(
  tenant_id int,
  id int,
  type text
);

SELECT create_distributed_table('events','tenant_id');

INSERT INTO events VALUES (1,1,'foo'), (2,2,'bar');

-- assumes that roles tenant_1 and tenant_2 exist
GRANT select, update, insert, delete
  ON events TO tenant_1, tenant_2;
As it stands, anyone with select permission on this table can see both rows. Users from either tenant can view and update the other tenant's row. We can solve this with row-level table security policies.

Each policy consists of two clauses: USING and WITH CHECK. When a user tries to read or write rows, the database evaluates each row against these clauses. Existing table rows are checked against the expression specified in USING, while rows that would be created via INSERT or UPDATE are checked against the expression in WITH CHECK.
-- first a policy for the system admin "canopy" user
CREATE POLICY admin_all ON events
  TO canopy           -- apply to this role
  USING (true)        -- read any existing row
  WITH CHECK (true);  -- insert or update any row

-- next a policy which allows role "tenant_<n>" to
-- access rows where tenant_id = <n>
CREATE POLICY user_mod ON events
  USING (current_user = 'tenant_' || tenant_id::text);
  -- lack of CHECK means same condition as USING

-- enforce the policies
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
Now roles tenant_1 and tenant_2 get different results for their queries.

Connected as tenant_1:

SELECT * FROM events;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│         1 │  1 │ foo  │
└───────────┴────┴──────┘

Connected as tenant_2:

SELECT * FROM events;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│         2 │  2 │ bar  │
└───────────┴────┴──────┘

INSERT INTO events VALUES (3,3,'surprise');
/* ERROR: new row violates row-level security policy for table "events_102055" */
