Columnstore Index in SQL Server

Prerequisites

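The discussion of columnstore indexes in SQL Server 2012 is best explained through both theory and practice. For the practical part, I will use the AdventureWorksDW2012 sample database. The remaining prerequisites for fully understanding this topic are as follows: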
  • Conceptual understanding of clustered and non-clustered indexing in SQL Server.
    • Although columnstore indexing is a new feature of SQL Server 2012, indexing as a concept was already well established in earlier versions of SQL Server, as it is in many database platforms. To fully appreciate columnstore indexing, I suggest you first familiarise yourself with SQL Server's indexing capabilities.
  • A running instance of SQL Server 2012 with the AdventureWorksDW2012 sample database.

Introducing Columnstore Index in SQL Server 2012

Columnstore index is an enterprise feature intended to improve the response time of T-SQL queries against OLAP systems. It was introduced in SQL Server 2012 as an alternative to the traditional row-based indexing that was dominant in earlier versions of SQL Server. It must be noted, however, that the notion of indexing the columns of a given object did not originate with Microsoft; it was largely pioneered by SAP's Sybase IQ database technology.

Getting started with Columnstore Index in SQL Server 2012

If you are familiar with creating clustered or non-clustered indexes, you should be able to find your way around creating a columnstore index. Figure 1 shows the complete syntax for creating a SQL Server 2012 columnstore index.

Figure 1
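As a rough sketch of the general shape of the statement, as I understand it for SQL Server 2012, here it is applied to a throwaway table. The table dbo.ColumnstoreSyntaxDemo, its columns, the index name, and the option values are hypothetical and exist only for illustration.

```sql
-- Hypothetical demo table; it is not part of AdventureWorksDW2012.
CREATE TABLE dbo.ColumnstoreSyntaxDemo
(
    DateKey    INT   NOT NULL,
    ProductKey INT   NOT NULL,
    Quantity   INT   NOT NULL,
    Amount     MONEY NOT NULL
);
GO

-- General shape:
--   CREATE [NONCLUSTERED] COLUMNSTORE INDEX <index name>
--       ON <table> (<column list>)
--       [WITH (<options>)]
--       [ON <filegroup or partition scheme>]
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_ColumnstoreSyntaxDemo
    ON dbo.ColumnstoreSyntaxDemo (DateKey, ProductKey, Quantity, Amount)
    WITH (DROP_EXISTING = OFF, MAXDOP = 2)  -- both options are optional
    ON [PRIMARY];                           -- the filegroup clause is optional too
GO
```

The WITH and ON clauses can be left out entirely; they are included here only to show where they would go.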

Whilst the arguments of columnstore index creation are explained further on the TechNet site, it must be noted that including the NONCLUSTERED argument makes no difference when creating a columnstore index in SQL Server 2012. The reason is that clustered columnstore indexes are not supported in SQL Server 2012, so every SQL Server 2012 columnstore index is created as a NONCLUSTERED columnstore index, regardless of whether you specify the NONCLUSTERED argument in your CREATE COLUMNSTORE INDEX statement.

Thus, Figure 2 and Figure 3 show the respective creation of columnstore indexes against dbo.FactProductInventory_ApexSQL_CC1 and dbo.FactProductInventory tables.

Figure 2

Figure 3
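The two scripts might look roughly like the sketch below. The index names are hypothetical, and I am assuming that dbo.FactProductInventory_ApexSQL_CC1 has the same columns as the sample dbo.FactProductInventory table.

```sql
USE AdventureWorksDW2012;
GO

-- Without the NONCLUSTERED argument (the Figure 2 variant).
-- The index name is hypothetical.
CREATE COLUMNSTORE INDEX CSI_FactProductInventory_ApexSQL_CC1
    ON dbo.FactProductInventory_ApexSQL_CC1
       (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance);
GO

-- With the NONCLUSTERED argument (the Figure 3 variant).
-- Note: in SQL Server 2012 the table can no longer be updated
-- while the columnstore index is in place.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_FactProductInventory
    ON dbo.FactProductInventory
       (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance);
GO
```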

Although the script in Figure 2 differs from the one in Figure 3 by omitting the NONCLUSTERED argument, executing either script results in the creation of a NONCLUSTERED index, as shown in Figure 4.

Figure 4
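The same check can be made without the user interface by querying the sys.indexes catalog view. This is a minimal sketch that assumes both tables from the previous scripts exist in the current database; each columnstore index should be reported with a type description of NONCLUSTERED COLUMNSTORE.

```sql
USE AdventureWorksDW2012;
GO

-- List the columnstore indexes on the two tables and how SQL Server classifies them.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.type_desc              AS index_type
FROM sys.indexes AS i
WHERE i.object_id IN (OBJECT_ID(N'dbo.FactProductInventory_ApexSQL_CC1'),
                      OBJECT_ID(N'dbo.FactProductInventory'))
  AND i.type_desc LIKE N'%COLUMNSTORE%';
GO
```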

In addition to the syntax for creating a columnstore index, you will notice a new icon in the SQL Server 2012 Object Explorer that indicates the presence of a columnstore index on a given object, as shown in Figure 5.

Figure 5

Finally, as you continue getting familiar with columnstore index, take note of the list of data types that are permitted in a create columnstore index statement. Table 1 outlines a list of supported data types as per SQL Server Data Type Category.

Table 1

| Character Strings | Unicode Character Strings | Exact Numerics | Approximate Numerics | Date and Time |
| --- | --- | --- | --- | --- |
| CHAR | NCHAR | INT | FLOAT | DATE |
| VARCHAR (except MAX) | NVARCHAR (except MAX) | BIGINT | REAL | DATETIME |
| | | SMALLINT | | DATETIME2 |
| | | TINYINT | | SMALLDATETIME |
| | | BIT | | TIME |
| | | MONEY | | DATETIMEOFFSET (scale of 2 or less) |
| | | SMALLMONEY | | |
| | | DECIMAL | | |
| | | NUMERIC | | |

Limitations of Columnstore Index in SQL Server 2012

Unfortunately, notwithstanding the potential gains of columnstore indexing, there are several restrictions. In this section we will take a look at the limitations of columnstore index usage in SQL Server 2012. Because SQL Server 2014 had already been released at the time of writing this article, it must be noted that some of the columnstore index restrictions applicable to SQL Server 2012 have since been resolved in SQL Server 2014; for example, creating a clustered columnstore index is now supported in SQL Server 2014. Nevertheless, as of SQL Server 2012, the limitations of columnstore indexes were as follows:

  1. Restrictions on Column-Store Indexes:
    1. Columnstore index is not allowed to be created against a sparse column
    2. Columnstore index is not allowed to have more than 1024 columns
    3. Columnstore index is not allowed to be used as a primary key in a given object
    4. Columnstore index is not allowed to be used as a foreign key in a given object
    5. During a create columnstore index statement, you are not allowed to specify sorting keywords (i.e. ASC, DESC) on the columns that are used as part of the columnstore index
    6. Columnstore index is not allowed to be created against a view or indexed view
    7. A create columnstore index statement is not allowed to contain the INCLUDE keyword
    8. You are not allowed to modify the columnstore index using the ALTER INDEX statement
    9. A table that has a columnstore index becomes read-only, so the table should be populated before the columnstore index is created
    10. You are restricted to creating a single columnstore index per given object
    11. You are not allowed to update the data of an object that uses columnstore indexing
    12. Although you are allowed to add columns to a table that uses columnstore index, you are not allowed to drop any of the columns that are referenced in a columnstore index

  2. SQL Server 2012 Features that do not support Column-Store Indexes:
    1. SQL Server Replication
    2. Change Tracking
    3. Change Data Capture
    4. Filestream
    5. Compression
      1. Page
      2. Row
      3. SQL Server Vardecimal Storage Format

  3. SQL Server 2012 Data Types that do not support Column-Store Indexes:

    Table 2

    | Character Strings | Unicode Character Strings | Exact Numerics | Binary Strings | Date and Time | Other Data Types |
    | --- | --- | --- | --- | --- | --- |
    | VARCHAR (MAX) | NVARCHAR (MAX) | DECIMAL (precision greater than 18 digits) | BINARY | DATETIMEOFFSET (scale greater than 2) | XML |
    | | | NUMERIC (precision greater than 18 digits) | VARBINARY | | HIERARCHYID |
    | | | | IMAGE | | Spatial types |
    | | | | | | TIMESTAMP |
    | | | | | | SQL_VARIANT |
    | | | | | | UNIQUEIDENTIFIER |
    | | | | | | ROWVERSION |
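Before creating a columnstore index, it can help to check whether a table contains any column that falls under the restrictions above. The query below is only a sketch of such a check: it encodes the unsupported types from Table 2 plus the sparse column restriction, and it uses dbo.FactProductInventory purely as an example target. Columns that use alias (user-defined) types are matched by their alias name, so they would need separate handling.

```sql
USE AdventureWorksDW2012;
GO

-- Columns that would block a SQL Server 2012 columnstore index on the table.
-- An empty result suggests that every column is eligible.
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length,
       c.precision,
       c.scale,
       c.is_sparse
FROM sys.columns AS c
JOIN sys.types   AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.FactProductInventory')
  AND (
          -- rowversion is reported as timestamp in sys.types
          t.name IN (N'binary', N'varbinary', N'image', N'xml', N'hierarchyid',
                     N'geometry', N'geography', N'timestamp',
                     N'sql_variant', N'uniqueidentifier')
       OR (t.name IN (N'varchar', N'nvarchar') AND c.max_length = -1)  -- (MAX) columns
       OR (t.name IN (N'decimal', N'numeric')  AND c.precision > 18)
       OR (t.name = N'datetimeoffset'          AND c.scale > 2)
       OR c.is_sparse = 1
      );
GO
```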

Tutorial: Row-based Index vs Columnstore Index in SQL Server 2012

In this section we take a closer look at some of the benefits that can be gained by using a columnstore index in a data warehouse environment. We will use the AdventureWorksDW2012.dbo.FactProductInventory table to test the regular index part of this tutorial. I have chosen this table because it has the largest record count (776286) of all the objects that ship with the AdventureWorksDW2012 sample database.

Figure 6 shows the entity relationship diagram of the dbo.FactProductInventory table. The clustered index and primary key use a combination of the ProductKey and DateKey fields.

Figure 6

For testing the columnstore index part, I have done the following:

  1. Created a new fact table called dbo.FactProductInventory_ApexSQL, which is based on the definition of the sample dbo.FactProductInventory table, and then

  2. Created a nonclustered columnstore index called CIX_FactProductInventory_ApexSQL on the dbo.FactProductInventory_ApexSQL table (Figure 7 shows the definition of the CIX_FactProductInventory_ApexSQL columnstore index).

    Figure 7
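The two steps above can be sketched as follows. Using SELECT ... INTO to copy the table is my assumption; the original scripts may have created and loaded the copy differently. The last part shows one commonly described way of living with the read-only restriction in SQL Server 2012: disable the columnstore index, change the data, then rebuild the index. As far as I know, disabling and rebuilding are permitted even though the index definition itself cannot be altered.

```sql
USE AdventureWorksDW2012;
GO

-- Step 1: copy the structure and the rows of the sample table (assumed approach).
SELECT ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance
INTO dbo.FactProductInventory_ApexSQL
FROM dbo.FactProductInventory;
GO

-- Step 2: create the nonclustered columnstore index on the populated copy.
CREATE NONCLUSTERED COLUMNSTORE INDEX CIX_FactProductInventory_ApexSQL
    ON dbo.FactProductInventory_ApexSQL
       (ProductKey, DateKey, MovementDate, UnitCost, UnitsIn, UnitsOut, UnitsBalance);
GO

-- Later data changes: the table is read-only while the index is enabled,
-- so disable the index first, apply the changes, then rebuild it.
ALTER INDEX CIX_FactProductInventory_ApexSQL
    ON dbo.FactProductInventory_ApexSQL DISABLE;
GO

-- ... INSERT / UPDATE / DELETE statements would go here ...

ALTER INDEX CIX_FactProductInventory_ApexSQL
    ON dbo.FactProductInventory_ApexSQL REBUILD;
GO
```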

Now for the purposes of testing both scenarios (row based vs columnstore indexes), I am going to apply the same sample user story against dbo.FactProductInventory and dbo.FactProductInventory_ApexSQL. The user story is as follows:

“As a user, I would like to see a report that shows the totals of Unit Costs, Units In, Units Out, and Units Balance per Product”

Figure 8 and Figure 9 show respective T-SQL queries for retrieving the data to satisfy the user story against dbo.FactProductInventory and dbo.FactProductInventory_ApexSQL tables.

Figure 8

Figure 9
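A pair of queries along these lines would satisfy the user story. This is a sketch rather than the exact text of the figures; the column aliases are my own, and the only difference between the two queries is the table they read from.

```sql
USE AdventureWorksDW2012;
GO

-- Row-based table (clustered index on ProductKey, DateKey).
SELECT ProductKey,
       SUM(UnitCost)     AS TotalUnitCost,
       SUM(UnitsIn)      AS TotalUnitsIn,
       SUM(UnitsOut)     AS TotalUnitsOut,
       SUM(UnitsBalance) AS TotalUnitsBalance
FROM dbo.FactProductInventory
GROUP BY ProductKey
ORDER BY ProductKey;
GO

-- Copy of the table that carries the columnstore index.
SELECT ProductKey,
       SUM(UnitCost)     AS TotalUnitCost,
       SUM(UnitsIn)      AS TotalUnitsIn,
       SUM(UnitsOut)     AS TotalUnitsOut,
       SUM(UnitsBalance) AS TotalUnitsBalance
FROM dbo.FactProductInventory_ApexSQL
GROUP BY ProductKey
ORDER BY ProductKey;
GO
```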

Figure 10 and Figure 11 show IO Statistics for the respective execution of T-SQL queries shown in Figure 8 and Figure 9. Just by comparing the statistics in terms of logical reads and object scan count, it can be seen that there are significant performance benefits for the table that uses a columnstore index.

Figure 10

Figure 11
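To collect comparable IO statistics yourself, SET STATISTICS IO can be switched on for the session before running each query. The sketch below uses a shortened form of the report query; the scan count and logical reads appear in the Messages tab of Management Studio, and the actual numbers will depend on your instance.

```sql
USE AdventureWorksDW2012;
GO

SET STATISTICS IO ON;
GO

-- Row-based table.
SELECT ProductKey, SUM(UnitsIn) AS TotalUnitsIn, SUM(UnitsOut) AS TotalUnitsOut
FROM dbo.FactProductInventory
GROUP BY ProductKey;
GO

-- Table with the columnstore index.
SELECT ProductKey, SUM(UnitsIn) AS TotalUnitsIn, SUM(UnitsOut) AS TotalUnitsOut
FROM dbo.FactProductInventory_ApexSQL
GROUP BY ProductKey;
GO

SET STATISTICS IO OFF;
GO
```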

Finally, as might be expected after looking at the IO statistics, the execution plan of the T-SQL query that uses the columnstore index (shown in Figure 13) is much simpler and less costly than that of the T-SQL query that uses a regular row-based index (shown in Figure 12).

Figure 12

Figure 13
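If you prefer to compare the plans outside the graphical view, one option is to request the estimated plans as XML. This is only a sketch, again with a shortened form of the report query: while SHOWPLAN_XML is on, the queries are compiled but not executed, and each one returns its plan as an XML document.

```sql
USE AdventureWorksDW2012;
GO

-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO

-- Estimated plan for the row-based table.
SELECT ProductKey, SUM(UnitsBalance) AS TotalUnitsBalance
FROM dbo.FactProductInventory
GROUP BY ProductKey;
GO

-- Estimated plan for the table with the columnstore index.
SELECT ProductKey, SUM(UnitsBalance) AS TotalUnitsBalance
FROM dbo.FactProductInventory_ApexSQL
GROUP BY ProductKey;
GO

SET SHOWPLAN_XML OFF;
GO
```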

Conclusion

The concept of indexing columns newly introduced in SQL Server 2012, known as the columnstore index, can significantly improve the response to T-SQL query requests. However, in spite of such improvements, there are several restrictions on columnstore index usage that may force one to give up SQL Server 2012 features such as replication and change tracking in exchange for the benefits of columnstore indexing.

Translated from: https://www.sqlshack.com/columnstore-index-sql-server/