Cassandra nested query

Problem description:

I am new to Cassandra. I am trying to store employees, each with a list of laptops, as shown below; 'laptoplist' is a UDT.

cqlsh:sourceutilization> SELECT * from employee ; 

 id | laptoplist                                                                        | name      | type
----+-----------------------------------------------------------------------------------+-----------+------------
  5 | [{laptopid: 5, cpu: 9, memory: 18, networkutilization: 25, diskutilization: 85}]  | testname5 | staffType5
  1 | [{laptopid: 1, cpu: 94, memory: 36, networkutilization: 13, diskutilization: 66}] | testname1 | staffType1
  8 | [{laptopid: 8, cpu: 64, memory: 1, networkutilization: 15, diskutilization: 71}]  | testname8 | staffType8
  0 | [{laptopid: 0, cpu: 4, memory: 95, networkutilization: 20, diskutilization: 16}]  | testname0 | staffType0
  2 | [{laptopid: 2, cpu: 49, memory: 37, networkutilization: 20, diskutilization: 88}] | testname2 | staffType2
  4 | [{laptopid: 4, cpu: 13, memory: 67, networkutilization: 67, diskutilization: 10}] | testname4 | staffType4
  7 | [{laptopid: 7, cpu: 11, memory: 75, networkutilization: 75, diskutilization: 97}] | testname7 | staffType7
  6 | [{laptopid: 6, cpu: 27, memory: 34, networkutilization: 2, diskutilization: 92}]  | testname6 | staffType6
  9 | [{laptopid: 9, cpu: 12, memory: 10, networkutilization: 19, diskutilization: 73}] | testname9 | staffType9
  3 | [{laptopid: 3, cpu: 47, memory: 13, networkutilization: 72, diskutilization: 54}] | testname3 | staffType3

Now I want to run a query like the one below. How can I do that?

select * from employee where laptoplist.networkutilization > 50;

FYI, I am using Cassandra version 3.1.

Thanks in advance, Harry

Possible duplicate of [Cassandra - where clause with non primary key disadvantages](https://stackoverflow.com/questions/35524516/cassandra-where-clause-with-non-primary-key-disadvantages) – muru

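Answer

This isn't going to work well as-is. A few changes are needed to get what you want here. There are two things that usually help with Cassandra.

If you are having trouble with a data model, ask yourself what it would look like as a time series.

With Cassandra's distributed, append-only storage engine, use cases like time series and event tracking are an easy fit. Sometimes a data model makes more sense (from Cassandra's perspective) when it is adjusted to that angle.

Build your tables to match your query patterns.

I see what is probably a primary key on id. But what I don't see (above, at least) is any query that filters on id. I can tell that things like employees and laptops are important and probably unique. But unique keys don't always make the best filters for information.

The main question to ask is: what are you trying to get here?

To me, it looks like you want to see users who are experiencing high network utilization. High network utilization is (hopefully) a temporary thing, so why don't we add a time component (checkpoint_time)? IMO, it makes sense to track the utilization of computing resources over time. With those points in mind, I came up with a data model like this: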

cqlsh:stackoverflow> CREATE TABLE employee_laptop__by_network_utilization (
         timebucket text, 
         checkpoint_time timestamp, 
         employee_id bigint, 
         name text, 
         type text, 
         laptop_id bigint, 
         cpu bigint, 
         memory bigint, 
         network_utilization bigint, 
         disk_utilization bigint, 
         PRIMARY KEY ((timebucket),network_utilization, 
          checkpoint_time,employee_id,laptop_id) 
        ) WITH CLUSTERING ORDER by 
          (network_utilization ASC, checkpoint_time DESC, 
          employee_id ASC, laptop_id ASC);

After inserting some rows, I can now query for the employee/laptop combinations that experienced network utilization > 50 on October 12, 2017:

cqlsh:stackoverflow> SELECT * FROM employee_laptop__by_network_utilization 
    WHERE timebucket='20171012' AND network_utilization > 50; 

 timebucket | network_utilization | checkpoint_time                 | employee_id | laptop_id | cpu | disk_utilization | memory | name     | type
------------+---------------------+---------------------------------+-------------+-----------+-----+------------------+--------+----------+-----------
   20171012 |                  55 | 2017-10-12 12:30:00.000000+0000 |           1 |         1 |   4 |               62 |     19 | Jebediah |     Pilot
   20171012 |                  55 | 2017-10-12 12:15:00.000000+0000 |           1 |         1 |  19 |               62 |     18 | Jebediah |     Pilot
   20171012 |                  72 | 2017-10-12 12:00:00.000000+0000 |           3 |         3 |  47 |               54 |     13 |      Bob | Scientist

(3 rows)

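First, I needed a good partition key, one that makes sense for the query and keeps my partitions from unbounded growth. So I chose a "date bucket" called timebucket. This way, I can isolate my queries to a single day and be sure that each query is served by a single node.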
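As a rough illustration of the date bucket idea, the application could derive timebucket from each checkpoint time before writing a row. This is only a sketch: the helper name timebucket_for is hypothetical, and bucketing by UTC day (and treating naive datetimes as UTC) is an assumption, not something stated in the answer.

```python
from datetime import datetime, timezone


def timebucket_for(checkpoint_time: datetime) -> str:
    """Return a 'YYYYMMDD' day bucket for a checkpoint timestamp.

    Assumption: buckets are UTC days; naive datetimes are treated as UTC.
    """
    if checkpoint_time.tzinfo is None:
        checkpoint_time = checkpoint_time.replace(tzinfo=timezone.utc)
    return checkpoint_time.astimezone(timezone.utc).strftime("%Y%m%d")


# Every checkpoint taken on the same UTC day lands in the same partition.
bucket = timebucket_for(datetime(2017, 10, 12, 12, 30, tzinfo=timezone.utc))
```

If one day of checkpoints turned out to be too much data for a single partition, a finer bucket (for example one per hour) would follow the same pattern.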

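Next, I clustered on network_utilization, as that is the column this model is chiefly concerned with. It is the first clustering column, because we don't want to have to provide much more in the way of filtering columns in our query.

checkpoint_time is the next column in the PRIMARY KEY, mainly because rows with the same timebucket and network_utilization will probably make more sense sorted by time (DESCending).

Finally, I added employee_id for uniqueness, followed by laptop_id, because an employee may have more than one laptop.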
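To see why this key layout makes the range query cheap, here is a toy in-memory model (not how Cassandra is implemented, just a sketch of the idea): rows are grouped by partition key and kept sorted by the clustering columns, so "network_utilization > 50" becomes one contiguous slice. The Table class and the row helper are hypothetical; the first three rows reuse values from the result above, and the fourth row (utilization 20) is invented only to show something being filtered out.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timezone


def clustering_key(row):
    # Mirrors CLUSTERING ORDER BY (network_utilization ASC, checkpoint_time DESC,
    # employee_id ASC, laptop_id ASC); negating the timestamp gives DESC order.
    return (
        row["network_utilization"],
        -row["checkpoint_time"].timestamp(),
        row["employee_id"],
        row["laptop_id"],
    )


class Table:
    def __init__(self):
        self.partitions = defaultdict(list)  # timebucket -> rows kept sorted

    def insert(self, row):
        rows = self.partitions[row["timebucket"]]
        rows.append(row)
        rows.sort(key=clustering_key)

    def network_utilization_gt(self, timebucket, threshold):
        # One partition lookup, then a binary search and a contiguous slice.
        rows = self.partitions.get(timebucket, [])
        values = [r["network_utilization"] for r in rows]
        return rows[bisect_right(values, threshold):]


def row(hour, minute, employee_id, laptop_id, network_utilization):
    return {
        "timebucket": "20171012",
        "checkpoint_time": datetime(2017, 10, 12, hour, minute, tzinfo=timezone.utc),
        "employee_id": employee_id,
        "laptop_id": laptop_id,
        "network_utilization": network_utilization,
    }


table = Table()
table.insert(row(12, 0, 3, 3, 72))
table.insert(row(12, 15, 1, 1, 55))
table.insert(row(12, 30, 1, 1, 55))
table.insert(row(12, 45, 2, 2, 20))  # hypothetical row below the threshold

hits = table.network_utilization_gt("20171012", 50)
```

The slice comes back in clustering order: the two rows at 55 (newest first), then the row at 72, which matches the order of the cqlsh output.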

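Now, I'm sure you'll find ways in which my solution doesn't exactly fit your use case. That's because data modeling in Cassandra is very use-case centric. Often a good solution for one use case doesn't fit another. But this is one way to get the data you're after.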

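Answer

You cannot just run a range query on any column. Cassandra has some restrictions.

Before creating any schema in Cassandra, you have to be specific about how you want to query it; otherwise, most of the time your schema will not work.

To run a range query, such as greater than, greater than or equal to, less than, or less than or equal to, you need to specify a clustering column in the schema.

You cannot declare a clustering column on its own in Cassandra. Every Cassandra schema must declare a partition key.

To query on a clustering column, you must pass values for all the preceding primary key columns in the query.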
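The rules above can be sketched as a small checker. This is a simplified model, not Cassandra's actual validation logic: it ignores ALLOW FILTERING, secondary indexes, and IN restrictions, and the function name can_query is hypothetical.

```python
def can_query(partition_key, clustering, restrictions):
    """Check a WHERE clause against a simplified version of the rules above.

    restrictions maps column name -> operator, e.g. {"timebucket": "="}.
    """
    # Every partition key column needs an equality restriction.
    if any(restrictions.get(col) != "=" for col in partition_key):
        return False
    # Columns outside the primary key cannot be restricted in this model.
    known = set(partition_key) | set(clustering)
    if any(col not in known for col in restrictions):
        return False
    gap_seen = False    # an earlier clustering column was left unrestricted
    range_seen = False  # an earlier clustering column used a range operator
    for col in clustering:
        op = restrictions.get(col)
        if op is None:
            gap_seen = True
            continue
        if gap_seen or range_seen:
            return False
        if op != "=":
            range_seen = True
    return True


pk = ["timebucket"]
ck = ["network_utilization", "checkpoint_time", "employee_id", "laptop_id"]

ok = can_query(pk, ck, {"timebucket": "=", "network_utilization": ">"})
missing_partition = can_query(pk, ck, {"network_utilization": ">"})
skipped_column = can_query(pk, ck, {"timebucket": "=", "checkpoint_time": ">"})
```

Here ok is True, while the other two are False: one omits the partition key, and the other skips network_utilization before restricting checkpoint_time.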
