integration_Integration Services性能最佳实践–写目标

integration

With this article, we continue part 1 of common best practices to optimize the performance of Integration Services packages. As mentioned in the previous article “Integration Services (SSIS) Performance Best Practices – Data Flow Optimization“, it’s not an exhaustive list of all possible performance improvements for SSIS packages. It merely represents a set of best practices that will guide you through the most common development patterns.

在本文中,我们继续了常见最佳实践的第1部分,以优化Integration Services软件包的性能。 如前一篇文章“ Integration Services(SSIS)性能最佳实践–数据流优化 ”中所述,它不是SSIS包所有可能的性能改进的详尽列表。 它仅代表了一组最佳实践,可以指导您完成最常见的开发模式。

#6使用FastLoad选项 (#6 Use the FastLoad option)

This tip only applies when you are using the OLE DB Destination and if your destination supports the fast load option. It’s very common though to have SQL Server as the destination. In that case, use the fast load option. If not, you are inserting records row by row which has a devastating impact on the transaction log.

仅当您使用OLE DB目标并且目标支持快速加载选项时,此技巧才适用。 尽管将SQL Server作为目标是非常普遍的。 在这种情况下,请使用快速加载选项。 如果不是这样,您将逐行插入记录,这对事务日志具有破坏性的影响。

integration_Integration Services性能最佳实践–写目标

The default settings work fine in general. You can tweak them a bit for a specific need though. A good use case is when loading data to a clustered column store index. Here you want that only batches of at least 102,400 rows (preferably over a million) are sent to SQL Server. In this case, the batches end up immediately in a compressed row group instead of the delta store (which is row based and thus slower). More info can be found at Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”) and Loading Data into Clustered Columnstore Index with SSIS.

通常,默认设置可以正常工作。 不过,您可以针对特定需求对其进行一些调整。 一个好的用例是将数据加载到群集列存储索引中。 在这里,您只希望将至少102,400行(最好超过一百万)的批次发送到SQL Server。 在这种情况下,批次将立即以压缩的行组而不是增量存储(基于行,因此速度较慢)结束。 有关更多信息,请参见群集列存储索引–第51部分(“ SSIS,DataFlow和最大缓冲内存”)使用SSIS将数据加载到群集列存储索引中

In most cases, the OLE DB Destination – when writing to SQL Server using the Native Client – outperforms all other alternatives. The SQL Server Destination is supposed to be faster (at least in 2005), but it has some limitations and nowadays the OLE DB Destination is the best option. OLE DB is deprecated though by Microsoft. This means you can still use the OLE DB connection managers, but there are no new features being released. For example, AlwaysEncrypted is supported by ADO.NET but not by OLE DB.

在大多数情况下,使用本机客户端写入SQL Server时,OLE DB目标的性能优于所有其他选择。 SQL Server Destination应该更快(至少在2005年),但是它有一些局限性,如今,OLE DB Destination是最好的选择。 虽然Microsoft不推荐使用OLE DB。 这意味着您仍然可以使用OLE DB连接管理器,但是没有发布任何新功能。 例如,ADO.NET支持AlwaysEncrypted,但OLE DB不支持。

#7准备目的地 (#7 Prepare your Destination)

Just as you need to prepare your source (for example by putting the right indexes into place to speed up reading), you need to prepare your destination. Here are some examples of how you can make your destination go faster (most apply when the destination is SQL Server):

正如您需要准备源文件(例如,通过放置正确的索引以加快阅读速度)一样,您也需要准备目的地。 以下是一些有关如何使目标运行速度更快的示例(当目标是SQL Server时最适用):

  • Put the destination on a fast disk. SSDs are preferred.

    将目标放在快速磁盘上。 SSD是首选。

  • For large loads, you can disable constraints on the table and re-enable them after the load. The same applies for indexes: you can drop them first and recreate them after the SSIS package has finished. This avoids fragmentation of your indexes.

    对于较大的负载,您可以禁用表上的约束,然后在负载后重新启用它们。 同样适用于索引:您可以先删除它们,然后在SSIS包完成后重新创建它们。 这样可以避免索引碎片。

  • If you write to a database, make sure the database files have the appropriate size. You don’t want them to be automatically growing all the time (especially not with the default settings). Estimate what your total data volume is and how it is going to grow in the coming months. Then size your files accordingly.

    integration_Integration Services性能最佳实践–写目标

    如果写入数据库,请确保数据库文件具有适当的大小。 您不希望它们一直自动增长(尤其是不使用默认设置)。 估计您的总数据量以及未来几个月的数据量将如何增长。 然后相应地调整文件大小。

  • If possible, try to load in parallel. You’ll probably need to take precautions to avoid locking and blocking. Try to avoid a “hot spot” in your clustered index: all the load processes try to insert at the last page of the clustered index at the same time. The Balanced Data Distributor can help you to parallelize your data flow. This component is a separate download for SSIS 2012 and 2014, but is built-in into SSIS 2016.

    integration_Integration Services性能最佳实践–写目标

    如果可能,请尝试并行加载。 您可能需要采取预防措施以避免锁定和阻塞。 尝试避免在聚簇索引中出现“热点”:所有加载过程都尝试同时插入聚簇索引的最后一页。 平衡数据分配器可以帮助您并行化数据流。 该组件是针对SSIS 2012和2014的单独下载,但内置于SSIS 2016中。

  • Go for as minimal logging as possible. Set the recovery model to Simple, if your disaster recovery policy allows it. Do bulk inserts (using the fast load option) and configure them to be minimally logged. You can check the Data Loading Performance Guide on how to do this. The guide is from SQL Server 2008, but most information is still applicable today. As with data files, size your log files appropriately.

    尽可能减少日志记录。 如果您的灾难恢复策略允许,请将恢复模型设置为Simple 。 进行大容量插入(使用快速加载选项)并将其配置为最少记录。 您可以查看《 数据加载性能指南》中有关如何执行此操作的信息。 该指南来自SQL Server 2008,但是大多数信息仍然适用。 与数据文件一样,适当调整日志文件的大小。

Remember if your destination is slow, your source will be slow as well due to backpressure in the data flow. In other words, if your destination can’t keep up with the source, SSIS will tell the source to slow down. You can easily test this by removing the destination and running the package. If suddenly your data flow is very fast, the destination is the problem.

请记住,如果目的地很慢,那么由于数据流中的背压,您的来源也会很慢。 换句话说,如果您的目的地跟不上源,SSIS会告诉源放慢速度。 您可以通过删除目标并运行程序包来轻松测试。 如果突然您的数据流非常快,那么目的地就是问题。

#8了解ETL和ELT之间的区别 (#8 Know the difference between ETL and ELT)

Integration Services is a great tool and if designed correctly, performance can be extraordinary. However, you don’t need to approach all problems with a hammer. For example, if the source and the destination are on the same destination, ELT (Extract – Load – Transform) probably will outperform SSIS. With ELT, the data is loaded into the destination (with SSIS or with stored procedures) and the transformations are done with stored procedures. Especially when SSIS resides on another server, ELT will be preferred because it will save you the network round trip between the servers. When you use the SSIS data flow, all data needs to be read into the memory of the server where the package is running. If the server is another one than the database server, you need to transfer all the data over the network. Twice.

Integration Services是一个很好的工具,如果设计正确,性能可能会非常出色。 但是,您不需要用锤子解决所有问题。 例如,如果源和目标在同一目标上,则ELT(提取-加载-转换)可能会胜过SSIS。 使用ELT,将数据加载到目标中(使用SSIS或存储过程),并使用存储过程完成转换。 特别是当SSIS驻留在另一台服务器上时,将首选ELT,因为它可以节省服务器之间的网络往返时间。 当您使用SSIS数据流时,所有数据都需要读入运行包的服务器的内存中。 如果服务器是数据库服务器以外的服务器,则需要通过网络传输所有数据。 两次。

Integration Services excels in transferring data from server A to server B. However, once the data is in the destination, ELT probably is the best option to move forward. This doesn’t mean you don’t need to use SSIS anymore. The SSIS control flow is ideal for visualizing and orchestrating the flows of your data. Furthermore, it has the additional advantage of easily running tasks in parallel (which is not so easy in a stored procedure).

Integration Services擅长将数据从服务器A传输到服务器B。但是,一旦数据到达目标位置,ELT可能是前进的最佳选择。 这并不意味着您不再需要使用SSIS。 SSIS控制流是可视化和编排数据流的理想选择。 此外,它还具有易于并行运行任务的附加优点(在存储过程中并不是那么容易)。

Aside from data movement, some tasks are better done using SQL statements instead of using transformations in a data flow. The most obvious examples are sorting and aggregating data, as discussed in the section about blocking components. Updating data is also better done outside the data flow in an Execute SQL Task. Think about the transformations you want to do and where they will be the most efficient: in the data flow or in the database engine.

除了数据移动外,某些任务最好使用SQL语句完成,而不要使用数据流中的转换。 最明显的示例是对数据进行排序和聚合,如有关阻塞组件的部分所述。 在Execute SQL Task中,最好在数据流之外进行数据更新。 考虑一下您要执行的转换以及它们将在哪里最有效:数据流或数据库引擎中的转换。

结论 (Conclusion)

If you follow a couple of design principles, Integration Services packages will perform just fine in most scenarios. This article presented you these design principles along with some extra performance tips. As mentioned, it’s not an exhaustive list – I’m sure there are other performance tricks out there – but rather a collection of the top practices on my checklist when I audit packages.

如果您遵循几个设计原则,那么Integration Services软件包在大多数情况下都可以正常运行。 本文向您介绍了这些设计原则以及一些其他性能提示。 如前所述,这并不是一个详尽的清单-我敢肯定还有其他性能技巧-而是在审核软件包时清单上的一些*实践。

参考链接 (Reference links)

翻译自: https://www.sqlshack.com/integration-services-performance-best-practices-writing-destination/

integration