Simon Worth’s SQL Server Weblog

SQL Server and its many complexities

Archive for the ‘SSIS’ Category

SSIS Event Handler Variables – Propagate

with 6 comments

The Propagate imagesystem variable in SSIS is used to determine whether an event is propagated to a higher level event handler.  This allows package execution to continue once an event is handled in the container that has generated an Error.  For example, if you have a sequence container that contains an Execute Package Task and a Dataflow Task – but you don’t need the results of the Execute Package Task to stop the container from continuing you can use this variable in the event handler to log the issue and continue on to the Data Flow Task without the package stopping.  This can be handled as well using precedence constraints in your package – but using the Propagate variable allows you to simply log the error (or handle it in any other way fit) using an event handler and continuing on as if nothing happened. There may be times when information is not available at the time of execution, a network error occurs, or some other issue takes place that will not adversely affect the rest of the execution of the package, and can be retrieved at a later execution – the Propagate system variable can help by forcing the package to continue without adding in logic to handle the failure inside the control flow – but instead handle it within the event handler and move to the next task.

This would not be a best practice in many Production or Test situations – because of the unknowns involved with this type of handling – but it can come in handy when developing packages that have some unknowns during the beginning of the development cycle – and later handled properly using precedence constraints when the unknowns have been defined or eliminated.

To find the Propagate variable, select the section you wish to add an event handler to and click the Event Handlers tab within the package designer (Figure 1 – arrow 1).

 image
Figure 1 – Finding the Propagate System variable

Once you are in the Event Handlers section of the package designer, open the variables window (View > Other Windows > Variables) and select the “Show System Variables” button on the top of the Variables window (Figure 1 – arrow 2).  This will show all the system variables related to the event handler.  Scroll through the many variables that have been exposed by selecting the system variables and locate the “Propagate” variable entry (Figure 1 – arrow 3).  This variable is scoped to “OnError” and has a Data Type of Boolean.  Setting the value of this variable to “False” (Figure 1 – arrow 4) tells SSIS not to propagate the error encountered up through the rest of the package chain – and allows the package to continue execution if an error occurs on the section you have defined an Event Handler for.

Important Note

If the package that you have configured with an Event Handler is being executed from a parent package, this will not return a successful execution to the parent.  The Propagate variable is local to the package only – and the outcome of the package is bubbled up to the parent as a failure.  In order to handle the error so that the parent package ignores a failure – you must set the “DisableEventHandlers” property of the Execute Package Task within the parent to “True”. 

Now we are getting into pretty risky territory by setting that property on the Execute Package Task.  If you have unhandled failures within the child package (IE – the propagate variable is set to true and no handlers are defined) – then they will be ignored by the parent package regardless of you handling the issue or not. 

I do not recommend this approach to package development and deployment – this takes too much control away from the packages built in Error reporting and control – but this can be handy when first developing new control flow routines – to find issues within the package, and then fix them using rules and control flow.

Advertisements

Written by Simon Worth

November 11, 2009 at 11:00 am

SSIS Package Properties – CheckPoints

with one comment

This post will explain the properties used by SSIS in to make checkpoints work when executing packages.  There may be times when a package executes and fails for some reason.  In order to circumvent re-doing all the tasks that completed successfully you can use Checkpoints to start the package from where it last failed.  This post covers checkpoint files and other checkpoint properties used to successfully restart a failed package from the point of failure in order to not duplicate efforts already completed during the last run of the package.

image

The reason for using checkpoints in packages is to allow the package to start at the point of failure instead of from the beginning.  For instance, if you have 2 dataflow tasks in a package that take an hour each to execute – it is not beneficial to start the package from the beginning if the first dataflow task completed successfully and the second dataflow returned an error.  Using a checkpoint file and other checkpoint properties allows the package to start at the second dataflow task where it failed previously and skip over the first long running dataflow (which was already executed successfully) – saving time, resources, and logic to rollback what has been already committed.

How Checkpoints Work

SSIS Checkpoints are comprised of a Checkpoint file, a usage property, save checkpoints property, and the FailPackageOnFailure property.  These properties all work together in order to define how packages are restarted if a failure occurs.  The checkpoint file captures the information of the package required to restart properly from the point of failure, the usage property tells the package if checkpoints are being used and when to use them, and the save checkpoints property indicates if the package actually saves checkpoints to the checkpoint file.  The FailPackageOnFailure property is set at the container level for those containers you wish to enable restarts on failure.

Information captured in a checkpoint file

If you enable checkpoints in your package by configuring various checkpoint properties, at runtime SSIS will capture the restart position for the package within the checkpoint file.  It will capture the container that fails, variable values at time of failure (except the object variable type), transactions, and configured values at runtime.

How to use Checkpoints

Checkpoints within a package are captured at the host level of the task that fails.  For example, if there is an Execute SQL Task within a ForEach Loop Task and the Execute SQL Task fails – the package will be restarted at the ForEach Loop instead of the Execute SQL Task.  Any work completed before the failure within the ForEach Loop will be run again when the package is restarted.  All efforts are taken to rollback all transactions within the container if a failure occurs during execution.  This does not pertain to non managed code outside of the package execution environment however – such as in the case of using an Execute SQL Task that calls a stored procedure.  If the stored procedure is committing transactions within the SQL engine – outside the scope of the package – these transactions will not be rolled back by the package checkpoint system – and any work completed in the stored procedure will be duplicated the next time the package restarts.

Another thing worth noting when using checkpoints within packages, is that configuration changes between the time the error in the package took place, and re-running the package will not be reflected at restart.  This is part of the checkpoint process.  The runtime configuration values are captured within the checkpoint file and re-loaded upon execution in order to keep continuity with the last unsuccessful execution.  Once the package completes successfully, the checkpoint file is deleted from the file system and configuration changes will be reflected in the package upon next execution.

Explaining the Checkpoint Properties

There are 4 properties used for implementing checkpoints within SSIS packages.

  1. CheckpointFileName
  2. CheckpointUsage
  3. SaveCheckpoints
  4. FailPackageOnFailure

CheckpointFileName

As indicated above, the CheckpointFileName property is used to specify the path and file name that will be used to capture the checkpoint information of the package at run time.  As stated earlier – this file will be deleted from the file system when the package executes successfully.  This file captures the unique identifier of the package as well, so if you change the unique identifier after a failure, the checkpoint system will not work because the GUID’s no longer match.  This is done so that a package cannot execute using the checkpoint file of a different package.  System and user defined variable values are captured in the checkpoint file (except for object type variables), as well configuration values.

CheckpointUsage

This property simply indicates whether or not checkpoints are used within the package.  The available values for this property are:

  • Never – If the value is set to Never then the package ignores any checkpoint files defined and restarting the package starts it from the beginning.
  • Always – this value indicates that the package always uses the checkpoint file defined and the package always starts from the point of failure – if a failure occurs.  With this option set, the package will fail if a checkpoint files does not exist.
  • IfExists – This option indicates that if there is a checkpoint file for the package – it will be used to restart from failures, otherwise, the package is run from the beginning regardless of failures.

SaveCheckpoints

This property indicates whether the package will save checkpoints when the package is executed.  This property must be set to True in order for the package to restart at point of failure.

FailPackageOnError

This property must be set to True for all containers in your packages – where you want to be able to restart the package from in case of error.   If you have not set any container’s FailPackageOnError property to True when the package executes – it will not recover from where it left off.

Please see BOL for more information on this subject.

This feature in SSIS has some known issues – so if you have any experience with it – good or bad – please feel free to leave a comment.

Written by Simon Worth

November 8, 2009 at 10:47 am

SSIS – Using Stored Procedures as a Data Source

with 10 comments

I have been looking into the methods for using stored procedures as a data source within a data flow component in SSIS.  Most of the information I found in blog posts and technical write ups say using stored procedures as a data source is possible – however, can be difficult to configure and troubleshoot.  There are a few things to keep in mind when using SP’s as your data source in packages.

I’ll use a simple example of the system stored procedure “sp_SpaceUsed”.  This procedure returns data size information about the current database, or – if a table name is passed as a parameter – the size information about the table.

If you set up an OLE DB data source within a data flow task, and change the data access mode to SQL Command you can type into the SQL Command Text area something like the following

EXEC sp_SpaceUsed ‘Person.Address’

I’m using the AdventureWorks database as my example here.  If you hit the preview button you can see the results that this command returns.  There are 6 columns returned when a table name is provided to the stored procedure.

image

But if you go to the Columns section of the Source component the columns in this section don’t match the preview we just did.

imageHere we can see that the columns section is returning the default result set of the stored procedure when parameters are not supplied – it is returning database_name, database_size, and unallocated_space.  The stored procedure actually returns 2 result sets when no parameters are supplied, and the columns shown here are from the first result set – so the second isn’t even being evaluated.

Before we get into why we’re not seeing the proper columns lets take a look at what is going on behind the scenes of SSIS and maybe this will shed some light onto why the preview window and the output column windows are not in sync.

I’m going to use Profiler to capture commands issued to the database.  I use the standard template and connect to my server.  Then open the data source component and hit the preview button.  My preview comes up with the correct information and columns.  Now go to the columns section of the component and view that the columns coming back are still incorrect.  If I go back to Profiler I can see where the command was issued to get the preview data – and find an entry for “EXEC sp_spaceUsed ‘Person.Address’” as expected.  Scrolling down further I can find multiple prepare statements and an execution of the stored procedure.  However, before the statement is SET FMTONLY ON.  This explains a lot actually.

FMTONLY is a command that is used pretty heavily by SSIS (and many other apps for that matter) to gather the meta data of the command being executed without actually executing it.  This provides fast response times since nothing is being executed, but in our case this is unhelpful.  Here’s why.

image 
I executed the command found in my data source using SSMS.  The only thing I added was SET FMTONLY ON and then closed it with SET FMTONLY OFF.  Looking at the meta data returned there are now 3 record sets returned instead of 1.  The reason the 3 were returned is because SET FMTONLY isn’t actually executing the command, it’s just getting the “Format Only” of the command results.  Since nothing is executed this returns 3 legitimate result sets – the first one being the one we are seeing within the columns section of the data source component.  So SSIS is taking the first result and mapping the meta data of the columns and discarding the other 2 result sets.

So how do I get around this.  I want my columns to be properly mapped and I don’t want to write a stored procedure or query myself to replicate what has already been written.

By adding SET FMTONLY OFF to the beginning of my statement in the data source we can get around this.  This is somewhat sneaky and definitely not good practice, but it does the job for this case.  image The reason adding that statement to beginning of our command works is because we are manually overriding the behavior of SSIS.  If we think back to the profiler output, SSIS is appending SET FMTONLY ON to beginning of our statement and then running it.  If we override it with SET FMTONLY OFF we will receive a proper list of columns as illustrated below – because the statement is in fact being executed.

image

If we were to run profiler again we see something like this now:
SET FMTONLY ON
SET FMTONLY OFF
EXEC sp_spaceused ‘Person.Address’
SET FMTONLY OFF

The second command is actually the first of our commands.  We overrode what SSIS was trying to accomplish by turning off FMTONLY right after the SSIS engine turned it on.  This can be a very bad thing however for developing packages.  Lets say you have a stored procedure that takes 65 seconds to return the result set, by setting FMTONLY OFF, you are running that stored procedure twice – or 130 seconds.  This would become very time consuming in the development phase waiting for SSIS to get the results before you could start working.  Also at run time this would take twice as long as well – the pre-validate phase as well as the execution phase would take the same amount of time. 
Really, what it comes down to is Stored Procedures probably aren’t a very good choice for a data source because you have to work around the default behavior of SSIS to get them to work properly or create your own stored procedures that take into account the behavior of SSIS in the Pre-validate and execution phases. 

Another issue you may encounter is “noise” from the proc making it impossible to get any results at all.  This can happen when SET NOCOUNT ON is not used in a procedure you write.  It’s a best practice to always set NOCOUNT to ON when writing procedures, but sometimes we can forget.  The added communication between the client and the server of “N Row(s) affected” (where N is the record count) is very confusing and will cause issues in the end.

Be careful when you are experimenting with SET FMTONLY.  It can become very confusing very quickly if you have set FMTONLY to on and try to do other tasks.  You cannot delete records for example from a table when FMTONLY is set to ON, you can’t truncate a table, or do DDL statements.  The reason is because they commands aren’t actually being executed, they are just gathering the meta data from the commands.

So in closing, if you are going to use stored procedures as a data source then make sure you understand what the stored procedure is doing, how long it takes to execute, and make sure SET NOCOUNT ON is inside your stored procedure as well.  It would be best to write your own procedures with only 1 recordset being returned.  This allows you to not override what SSIS is doing in the background by adding SET FMTONLY OFF to the beginning of the statement – and avoids double executions of the statements.

Written by Simon Worth

August 14, 2009 at 9:03 pm

Posted in SSIS

SSIS Configuration Tables and Tracking Changes

with 2 comments

A typical scenario when developing SSIS packages is to use configurations for initialization of packages, connections, variable values, etc.  There are few different types of configurations available through SSIS including XML files, parent package values, environment variables, and SQL Server.

In the case of SQL Server, the configurations you specify are created, updated, and queried from a table.  The table can be named what ever you want (as long as it doesn’t break the naming rules of objects) and placed in what ever schema is required.  As long as the correct column names exist in the table (with the proper data types) SSIS can use this table to hold configuration values.

Figure 1 – Package Configuration Wizard

image

If you run a trace on SQL Server while adding a SQL Server configuration in SSIS (Figure 1) returns the following query executed by BIDS to determine if a configuration table candidate exists for the specified connection.

Script 1 – Query from BIDS to find configuration tables

SELECT
       TABLE_SCHEMA
      ,TABLE_NAME
  FROM
       INFORMATION_SCHEMA.COLUMNS
 WHERE
       COLUMN_NAME = 'ConfiguredValue'
            OR COLUMN_NAME = 'PackagePath'
                  OR COLUMN_NAME = 'ConfiguredValueType'
                        OR COLUMN_NAME = 'ConfigurationFilter'
 GROUP BY
       TABLE_SCHEMA
      ,TABLE_NAME
HAVING
       COUNT(1) = 4

 

The above query looks for any table within the specified connection that has the 4 required column names (ConfiguredValue, PackagePath, ConfiguredValueType, and ConfigurationFilter).  If those 4 column names exist in a table on the specified connection then the table name will be returned to the “Configuration Table” section of the configurations wizard dropdown box.  If there is no candidate in the current connection then the dropdown will be empty – and your only option within BIDS is to click the “New” button in order to create a table.  You can also create the table using SSMS, and the next time BIDS queries the specified connection, the table will show up in the dropdown – as long as the correct column names exist in the object.

Script 2 shows the default DDL that BIDS generates for creating a configuration table.  As you can see the default code places the table in the “dbo” schema and names the table “SSIS Configurations”.  I’m not a fan of object names that have spaces in them so when I create a configuration table in SSMS I generally remove the space.  But I’m also not a fan of having the table name as SSISConfigurations or a fan of the table being in the dbo schema.

Script 2 – DDL generated by BIDS for creating a configuration table

CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue NVARCHAR(255) NULL,
    PackagePath NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20) NOT NULL
)

 

Script 3 is the code I usually use for a basic configuration table.  There are some enhancements I like to add as well to the table, but this script mirrors the original BIDS script for basic configurations.

Script 3 – My DDL to create a configuration table

CREATE TABLE [Config].[ETLConfigurations](
       [ConfigurationFilter] NVARCHAR(255) NOT NULL
      ,[ConfiguredValue]     NVARCHAR(255) NULL
      ,[PackagePath]         NVARCHAR(255) NOT NULL
      ,[ConfiguredValueType] NVARCHAR(20) NOT NULL)

 

So far the only differences between the SSIS generated DDL and the DDL I use in SSMS are the use of the Config schema rather than the dbo schema and I have changed to the name to “[ETLConfigurations]” rather than using the provided “[SSIS Configurations]” name.

One thing I like to see on a configuration table – or any other table for that matter that is tracking meta data or lookup information – is the use of audit columns.  These columns can be as simple (and usually are) as the name of the user who inserted the data and a date and time for when the data was inserted.  Script 4 demonstrates adding on 3 audit columns to the configurations table to track who inserted the record and when it was inserted.

Script 4 – Alter config table to add 3 new audit columns

ALTER TABLE
       Config.ETLConfigurations
  ADD
       InsertUser NVARCHAR(100) NOT NULL
            CONSTRAINT DF_ETLConfigurations_InsertUser DEFAULT SUSER_SNAME()
      ,InsertDate DATE NOT NULL
            CONSTRAINT DF_ETLConfigurations_InsertDate DEFAULT GETDATE()
      ,InsertTime TIME(7) NOT NULL
            CONSTRAINT DF_ETLConfigurations_InsertTime DEFAULT GETDATE()

 

The column InsertUser captures the full login name of the user that added the record – and InsertDate and InsertTime capture the date and time (using the new to 2008 DATE and TIME data types) of transaction.  If you’re using SQL Server 2005 then these 2 data types won’t be available to you – so you can just have 1 column instead called InsertDate with the data type DATETIME and this will capture the date and the time.  All 3 columns have defaults assigned to them so the data will be automatically added when a record is inserted and there is no need for a trigger to manage this information as new records are added.

These 3 new columns are a good beginning for auditing and data change logging – but they do not help with updates and deletes to the configuration table.

Why would we care what changes take place within the configuration table?  Does it really matter who touched what, when, and with what application?  Well sometimes it does matter.  Generally the configuration table will be feeding information to many packages – not just one – and these packages could span multiple projects and data sources.  Reusing a configuration value is a common practice in developing packages, so if a value changes it can affect many packages.  If a value in the configuration table gets changed or removed by someone or something by accident (or deliberately) this change could affect a very large set of enterprise solutions and some sort of tracking is a good idea to find out what happened and why.

Now we have 3 audit columns on the configuration table to track insert information and at this point we have the option to add a few more columns to the configuration table – but instead of InsertUser, InsertDate, and InsertTime we could add UpdateUser, UpdateDate, and UpdateTime.  These columns would be NULL when data is inserted, and a trigger could be used to populate the columns when data is changed.  However, this doesn’t help with data being deleted – only updated, and the update columns would only reflect the last change to the specific row, not all the changes for that row.

Another approach is to create a table that will capture the history of the changes that take place in the configuration table – both updates and deletes.  This can be accomplished by creating a new table and adding a trigger to the original configuration table.

Script 5 – History table for configuration changes

CREATE TABLE [Config].[ETLConfigurationsHistory] (
       [ConfigurationFilter]    NVARCHAR(255) NOT NULL
      ,[OldConfiguredValue]     NVARCHAR(255) NULL
      ,[NewConfiguredValue]     NVARCHAR(255) NULL
      ,[OldConfiguredValueType] NVARCHAR(20)  NOT NULL
      ,[NewConfiguredValueType] NVARCHAR(20)  NOT NULL 
      ,[ChangeUser]             NVARCHAR(100) NOT NULL
      ,[ChangeDate]             DATE          NOT NULL
      ,[ChangeTime]             TIME          NOT NULL
      ,[ChangeType]             NVARCHAR(10)  NOT NULL
      ,[ChangeApplicationName]  NVARCHAR(255) NOT NULL)

 

This history table will capture the previous value and new value for ConfiguredValue and ConfiguredValueType as well as the user that made the change, the date and time the change was made, the type of change (IE – Insert, Delete, Update), and the application name that made the change.  To capture all this audit information we create a trigger for UPDATE and DELETE on the original configuration table.  It is important to note that when you alter a configuration value within BIDS, the current configured value is deleted from the table and a new row is inserted with the new information.  So, on a system that is not being touched manually by users you will only see a record deleted and then a record inserted.

Script 6 – Trigger on configuration table to capture changes

CREATE TRIGGER 
       Config.tr_ETLConfigurations 
    ON 
       Config.ETLConfigurations 
 AFTER INSERT, UPDATE, DELETE AS 

-- SSIS deletes and inserts a row when a configuration value changes rather than
-- updating the row itself.  So if a row is updated - then it was outside of the
-- BIDS environment.
DECLARE @Inserted INT, @Deleted INT

SELECT @Inserted = COUNT(1) FROM Inserted
SELECT @Deleted = COUNT(1) FROM Deleted

--if inserted and deleted have record counts then this transaction is an update
IF (@Inserted > 0 AND @Deleted > 0)
      BEGIN
            INSERT INTO [Config].[ETLConfigurationsHistory]
                       ([ConfigurationFilter]
                       ,[OldConfiguredValue]
                       ,[NewConfiguredValue]
                       ,[OldConfiguredValueType]
                       ,[NewConfiguredValueType]
                       ,[PackagePath]
                       ,[ChangeUser]
                       ,[ChangeDate]
                       ,[ChangeTime]
                       ,[ChangeType]
                       ,[ChangeApplicationName])
            SELECT
                   ec.ConfigurationFilter
                  ,d.ConfiguredValue AS OldConfiguredValue
                  ,i.ConfiguredValue AS NewConfiguredValue
                  ,d.ConfiguredValueType AS OldConfiguredValueType
                  ,i.ConfiguredValueType AS NewConfiguredValueType
                  ,ec.PackagePath
                  ,SUSER_SNAME() AS ChangeUser
                  ,CAST(GETDATE() AS DATE) AS ChangeDate
                  ,CAST(GETDATE() AS TIME) AS ChangeTime
                  ,'Updated' AS ChangeType
                  ,APP_NAME() AS ChangeApplicationName
              FROM
                   [Config].[ETLConfigurations] ec
             INNER JOIN
                   Inserted i
                ON i.ConfigurationFilter = ec.ConfigurationFilter
               AND i.PackagePath = ec.PackagePath
             INNER JOIN
                   Deleted d
                ON d.ConfigurationFilter = ec.ConfigurationFilter
               AND d.PackagePath = ec.PackagePath
      RETURN
      END

IF (@Inserted > 0 AND @Deleted = 0)
      BEGIN
            INSERT INTO [Config].[ETLConfigurationsHistory]
                       ([ConfigurationFilter]
                       ,[OldConfiguredValue]
                       ,[NewConfiguredValue]
                       ,[OldConfiguredValueType]
                       ,[NewConfiguredValueType]
                       ,[PackagePath]
                       ,[ChangeUser]
                       ,[ChangeDate]
                       ,[ChangeTime]
                       ,[ChangeType]
                       ,[ChangeApplicationName])
            SELECT
                   i.ConfigurationFilter
                  ,NULL AS OldConfiguredValue
                  ,i.ConfiguredValue AS NewConfiguredValue
                  ,NULL AS OldConfiguredValueType
                  ,i.ConfiguredValueType AS NewConfiguredValueType
                  ,i.PackagePath AS PackagePath
                  ,SUSER_SNAME() AS ChangeUser
                  ,CAST(GETDATE() AS DATE) AS ChangeDate
                  ,CAST(GETDATE() AS TIME) AS ChangeTime
                  ,'Inserted' AS ChangeType
                  ,APP_NAME() AS ChangeApplicationName
              FROM
                   Inserted i
      RETURN
      END

--a delete took place
IF (@Inserted = 0 AND @Deleted > 0)
      BEGIN
            INSERT INTO [Config].[ETLConfigurationsHistory]
                       ([ConfigurationFilter]
                       ,[OldConfiguredValue]
                       ,[NewConfiguredValue]
                       ,[OldConfiguredValueType]
                       ,[NewConfiguredValueType]
                       ,[PackagePath]
                       ,[ChangeUser]
                       ,[ChangeDate]
                       ,[ChangeTime]
                       ,[ChangeType]
                       ,[ChangeApplicationName])
            SELECT
                   d.ConfigurationFilter
                  ,d.ConfiguredValue AS OldConfiguredValue
                  ,NULL AS NewConfiguredValue
                  ,d.ConfiguredValueType AS OldConfiguredValueType
                  ,NULL AS NewConfiguredValueType
                  ,d.PackagePath AS PackagePath
                  ,SUSER_SNAME() AS ChangeUser
                  ,CAST(GETDATE() AS DATE) AS ChangeDate
                  ,CAST(GETDATE() AS TIME) AS ChangeTime
                  ,'Deleted' AS ChangeType
                  ,APP_NAME() AS ChangeApplicationName
              FROM
                   Deleted d
      RETURN
      END

 

If we apply the trigger to the configuration table and start making some changes – either manually using DML in SSMS or through the configurations wizard in SSIS the changes will now be captured to the history table.  As I said earlier, BIDS deletes a row then inserts it again if modifications are made so with any luck you will only see deletes and inserts in the history table.  But if you do encounter an update then you know that the data has been manually changed by a user or some other process outside of BIDS.

Written by Simon Worth

February 20, 2009 at 7:37 pm