Overview of the Data Profiling Task in SSIS
The Data Profiling task in SSIS is an important task for assessing the quality of data sources. Unfortunately, many business intelligence developers rarely use it.
In this article, we give a brief overview of data profiling and of the Data Profiling task in SSIS. We also cover some of the task's limitations and the alternatives to it.
Why is Data Profiling important?
Before using any data source, the best practice is to assess its data quality and determine whether the data source is usable in a specific context.
Many factors determine data quality, such as completeness, consistency, uniqueness, and timeliness. Some of these factors require combining the data with other sources or performing complex operations. The first step, however, is to analyze the data itself (the ratio of NULL values, the lengths of values, and similar measurements), since this requires no reference data or external aggregation.
As an example, assume that you are building a reference database of the world's countries from an online data source. If the data source contains many missing values, it cannot be used in this context. It is therefore very important to check the ratio of NULL to non-NULL values before deciding whether to use the data source.
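To make the NULL ratio check concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the data source; the `countries` table, its columns, and the `null_ratios` helper are made up for illustration and are not part of SSIS.

```python
import sqlite3


def null_ratios(conn, table, columns):
    """Return {column: fraction of rows in which the column is NULL}."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    ratios = {}
    for col in columns:
        # COUNT(col) skips NULLs, so total - COUNT(col) is the NULL count.
        non_null = conn.execute(
            f'SELECT COUNT("{col}") FROM "{table}"'
        ).fetchone()[0]
        ratios[col] = (total - non_null) / total if total else 0.0
    return ratios


# Hypothetical sample data standing in for the online source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT, iso_code TEXT, capital TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("France", "FR", "Paris"),
        ("Lebanon", "LB", None),
        ("Japan", None, "Tokyo"),
        ("Brazil", "BR", None),
    ],
)

ratios = null_ratios(conn, "countries", ["name", "iso_code", "capital"])
print(ratios)  # {'name': 0.0, 'iso_code': 0.25, 'capital': 0.5}
```

A column such as `capital`, with half of its values missing, would be a strong signal that this source is not suitable as a reference dataset.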
In my experience, data profiling can also be used to build a knowledge base that stores quality information for each data source and specifies the contexts in which that source can be used.
This article is not intended to describe data quality and its importance, so refer to the following article for a brief introduction to data quality: An Introduction to Data Quality
Data Profiling Task in SSIS
Because data profiling is so important, Microsoft included the Data Profiling task in SSIS. As described in the documentation, this task is used to identify data quality problems:

In this section, we describe the Data Profiling task in SSIS, the Data Profile Viewer, the Quick Profile option, and the profiles that can be created with this task.
When we open the task editor, it shows three tab pages:
- General
- Profile Requests
- Expressions
In the General tab page, the user can specify the timeout period for each profile request and choose where to store the result: in a variable (Record Set) or in an external file through a File connection. The file is saved as XML:

Besides the general configuration, the General tab page also lets you open the Profile Viewer, a standalone tool that reads a previously exported XML file:

To learn more about Data Profile Viewer, refer to the following documentation: Data Profile Viewer
In the Profile Requests tab page, you can add multiple profile requests for multiple connections (server, database, table), but you have to add them one by one. We describe this in more detail later. If you need to add several profile requests for a single connection and table, you can simply use the Quick Profile option on the General tab page, as shown in the image below:

For more information about the Quick Profile, refer to the following documentation: Single Table Quick Profile Form (Data Profiling Task)
The second tab page (Profile Requests) is where the user selects the profile requests needed. As shown in the image below, you add a request in the grid at the top of the form, while the configuration of the selected profile request appears in the other grid:

There are different types of profile requests. Refer to the official documentation for more information about each profile request type:
- Candidate Key Profile Request
- Column Length Distribution Profile Request
- Column Null Ratio Profile Request
- Column Pattern Profile Request
- Column Statistics Profile Request
- Column Value Distribution Profile Request
- Functional Dependency Profile Request
- Value Inclusion Profile Request
Each profile request has its own connection manager, table, and column settings, which must be specified in the configuration grid.
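Several of these profile types come down to simple aggregate queries. The sketch below approximates two of them, a column length distribution and a candidate key check, in Python against an in-memory SQLite table. The `customers` table and the helper names are hypothetical, and the queries only illustrate the idea; they are not the statements that the Data Profiling task itself runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, postal_code TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "10001", "New York"),
        (2, "10001", "New York"),
        (3, "9021", "Beverly Hills"),   # suspiciously short postal code
        (4, "90210", "Beverly Hills"),
        (5, None, "Boston"),
    ],
)


def length_distribution(conn, table, column):
    """Map each value length to the number of rows with that length (NULLs skipped)."""
    rows = conn.execute(
        f'SELECT LENGTH("{column}"), COUNT(*) FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL GROUP BY LENGTH("{column}")'
    ).fetchall()
    return dict(rows)


def uniqueness_ratio(conn, table, column):
    """Distinct non-NULL values divided by row count; 1.0 suggests a candidate key."""
    total, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return distinct / total if total else 0.0


lengths = length_distribution(conn, "customers", "postal_code")
id_ratio = uniqueness_ratio(conn, "customers", "id")
postal_ratio = uniqueness_ratio(conn, "customers", "postal_code")
print(lengths, id_ratio, postal_ratio)
```

Here the single four-character postal code stands out in the length distribution, and only `id` reaches a uniqueness ratio of 1.0, so it is the only plausible key among the columns checked.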
The last tab page, Expressions, is used to assign an expression to any property of the Data Profiling task in SSIS.
Limitations
There are three main limitations of the Data Profiling task in SSIS.
1. Only ADO.NET connections are allowed
The Data Profiling task in SSIS only accepts an ADO.NET connection that uses the SqlClient provider, which supports only SQL Server databases. This means that you cannot run profile requests against other database systems such as Oracle, SQLite, or MySQL:

2. No custom profile requests can be added
Although the Data Profiling task in SSIS offers many types of profile requests, you may need a custom profile request for a specific requirement, and the task does not allow this.
3. Limited profiling destination types
Storing the results of profile requests in a database is very useful for quality assessment, but the task offers only two destinations: a file connection manager or an object variable.
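One possible workaround is to save the output to a file and load it into a table in a later step. The sketch below shows the general idea in Python: it flattens an XML document into (path, value) rows and stores them in SQLite. The sample XML is a simplified, invented stand-in and does not follow the schema of the file that the task actually produces; the `profile_results` table and both helper functions are assumptions as well. Attributes are ignored to keep the example short.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an exported profile file.
# The real output of the Data Profiling task has its own schema and namespace.
SAMPLE_XML = """
<ProfileOutput>
  <NullRatio table="countries" column="capital">
    <NullCount>2</NullCount>
    <RowCount>4</RowCount>
  </NullRatio>
</ProfileOutput>
"""


def flatten(element, path=""):
    """Yield (path, text) for every leaf element that has non-empty text."""
    tag = element.tag.split("}")[-1]  # drop an XML namespace prefix if present
    current = f"{path}/{tag}"
    children = list(element)
    if not children and element.text and element.text.strip():
        yield current, element.text.strip()
    for child in children:
        yield from flatten(child, current)


def store_profile(conn, source_name, xml_text):
    """Insert the flattened XML into profile_results and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profile_results "
        "(source TEXT, path TEXT, value TEXT)"
    )
    rows = [(source_name, p, v) for p, v in flatten(ET.fromstring(xml_text))]
    conn.executemany("INSERT INTO profile_results VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
stored = store_profile(conn, "countries_feed", SAMPLE_XML)
for row in conn.execute("SELECT path, value FROM profile_results"):
    print(row)
```

A generic path/value layout avoids hard-coding element names, at the cost of pushing the interpretation of each path to whoever queries the table later.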
Alternatives
As mentioned above, no custom profile requests can be added to the Data Profiling task in SSIS, but you can write your own profile request with an Execute SQL Task and store the output in a result set. See the following answer on Stack Overflow for an example: Data profiling Task – custom Profile Request.
In addition, the Execute SQL Task supports a wider range of connection types and can run SQL commands against different data providers such as Microsoft Access, Excel, Oracle, SQLite, and MySQL.
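As a rough illustration of such a custom profile request, the sketch below counts the rows that break a business rule (an ISO code must be exactly two uppercase letters). Python and an in-memory SQLite database stand in for the Execute SQL Task and its connection; the `countries` table, the rule, and the query are invented for this example. Note that `GLOB` is SQLite-specific, so other providers would need their own pattern syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT, iso_code TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("France", "FR"),
        ("Japan", "JP"),
        ("Lebanon", "lb"),   # wrong case
        ("Brazil", None),    # missing code
    ],
)

# Custom rule: iso_code must be exactly two uppercase ASCII letters.
# A NULL code makes the GLOB test evaluate to NULL, so the CASE falls
# through to ELSE and the row is counted as invalid.
CUSTOM_PROFILE_SQL = """
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN iso_code GLOB '[A-Z][A-Z]' THEN 0 ELSE 1 END) AS invalid_rows
FROM countries
"""

total_rows, invalid_rows = conn.execute(CUSTOM_PROFILE_SQL).fetchone()
print(f"{invalid_rows} of {total_rows} rows fail the ISO code rule")
```

In a package, the same kind of statement would go into the Execute SQL Task, with the returned row mapped to package variables.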
How does it work?
In this section, we illustrate how profile requests are sent to the database engine. To do that, we use SQL Profiler to check which commands are sent to the SQL Server instance.
To run this experiment, we created an SSIS package with a Data Profiling task and added five profile requests to it using the Quick Profile option. We then created a trace in SQL Profiler and executed the package.
As shown in the image below, a batch of SQL commands is sent to the database engine to perform the data profiling. A profile request is not executed as a single SQL command; instead, several commands are executed to compute each one. We can also see that both the target database and tempdb are used:
Conclusion
In this article, we gave an overview of the Data Profiling task in SSIS. We covered what the task is, why data profiling is important, how to use the task, what its main limitations and alternatives are, and how it executes profile requests against the SQL Server database engine.
Additional information about the profile requests and the use cases for each one is available in the official documentation.
Translated from: https://www.sqlshack.com/overview-of-data-profiling-task-in-ssis/