Microsoft SQL Server Development Customer Advisory Team
The Propagate system variable in SSIS determines whether an event is propagated to a higher-level event handler. This allows package execution to continue once an event has been handled in the container that generated the error. For example, suppose a Sequence Container holds an Execute Package Task and a Data Flow Task, and a failure of the Execute Package Task should not stop the container: you can use this variable in the event handler to log the issue and continue on to the Data Flow Task without stopping the package. The same outcome can be achieved with precedence constraints, but the Propagate variable lets you simply log the error (or handle it in any other way you see fit) in an event handler and carry on as if nothing happened. There may be times when information is unavailable at execution time, a network error occurs, or some other issue arises that does not adversely affect the rest of the package and can be retried in a later run. In those cases the Propagate system variable forces the package to continue without adding failure-handling logic to the control flow; the failure is handled in the event handler instead, and execution moves to the next task.
This would not be a best practice in many production or test situations, because of the unknowns involved in this type of handling, but it can come in handy when developing packages that have some unknowns early in the development cycle. Those cases can later be handled properly with precedence constraints once the unknowns have been defined or eliminated.
To find the Propagate variable, select the task or container you wish to add an event handler to and click the Event Handlers tab within the package designer (Figure 1 – arrow 1).
Figure 1 – Finding the Propagate System variable
Once you are in the Event Handlers section of the package designer, open the Variables window (View > Other Windows > Variables) and select the “Show System Variables” button at the top of the window (Figure 1 – arrow 2). This displays all the system variables related to the event handler. Scroll through the variables now exposed and locate the “Propagate” entry (Figure 1 – arrow 3). This variable is scoped to “OnError” and has a data type of Boolean. Setting its value to “False” (Figure 1 – arrow 4) tells SSIS not to propagate the error up through the rest of the package, allowing execution to continue when an error occurs in the section you have defined an event handler for.
If the package you have configured with an event handler is executed from a parent package, this will not return a successful execution to the parent. The Propagate variable is local to the child package, and the outcome of the package bubbles up to the parent as a failure. For the parent package to ignore the failure, you must set the “DisableEventHandlers” property of the Execute Package Task in the parent to “True”.
Now we are getting into pretty risky territory by setting that property on the Execute Package Task. If there are unhandled failures within the child package (i.e., the Propagate variable is set to True and no handlers are defined), the parent package will ignore them regardless of whether you handled the issue.
I do not recommend this approach for package development and deployment, since it takes too much control away from the package’s built-in error reporting, but it can be handy when first developing new control flow routines: find the issues within the package, then fix them using rules and control flow.
This post will explain the properties used by SSIS to make checkpoints work when executing packages. There may be times when a package executes and fails for some reason. To avoid re-doing all the tasks that completed successfully, you can use checkpoints to start the package from where it last failed. This post covers checkpoint files and the other checkpoint properties used to restart a failed package from the point of failure, so that work already completed during the last run is not duplicated.
The reason for using checkpoints in packages is to allow the package to start at the point of failure instead of from the beginning. For instance, if you have 2 dataflow tasks in a package that take an hour each to execute – it is not beneficial to start the package from the beginning if the first dataflow task completed successfully and the second dataflow returned an error. Using a checkpoint file and other checkpoint properties allows the package to start at the second dataflow task where it failed previously and skip over the first long running dataflow (which was already executed successfully) – saving time, resources, and logic to rollback what has been already committed.
How Checkpoints Work
SSIS Checkpoints are comprised of a Checkpoint file, a usage property, save checkpoints property, and the FailPackageOnFailure property. These properties all work together in order to define how packages are restarted if a failure occurs. The checkpoint file captures the information of the package required to restart properly from the point of failure, the usage property tells the package if checkpoints are being used and when to use them, and the save checkpoints property indicates if the package actually saves checkpoints to the checkpoint file. The FailPackageOnFailure property is set at the container level for those containers you wish to enable restarts on failure.
Information captured in a checkpoint file
If you enable checkpoints in your package by configuring the various checkpoint properties, at runtime SSIS will capture the restart position for the package within the checkpoint file. It captures the container that fails, variable values at the time of failure (except Object-typed variables), transactions, and configured values at runtime.
How to use Checkpoints
Checkpoints within a package are captured at the level of the container hosting the task that fails. For example, if an Execute SQL Task inside a ForEach Loop container fails, the package restarts at the ForEach Loop rather than at the Execute SQL Task, and any work completed inside the loop before the failure runs again when the package is restarted. Every effort is made to roll back transactions within the container when a failure occurs during execution. This does not apply, however, to work done outside the package execution environment, such as an Execute SQL Task that calls a stored procedure. If the stored procedure commits transactions within the SQL engine, outside the scope of the package, the package checkpoint system will not roll them back, and any work completed in the stored procedure will be duplicated the next time the package restarts.
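As a sketch of that hazard (the procedure and table names here are hypothetical), consider a stored procedure that commits its own transaction inside the SQL engine:

```sql
-- Hypothetical procedure: commits its own transaction inside the SQL engine,
-- outside the scope of any SSIS package transaction or checkpoint.
CREATE PROCEDURE dbo.LoadDailyTotals
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
        INSERT INTO dbo.DailyTotals (LoadDate, Total)
        SELECT CAST(GETDATE() AS date), SUM(Amount)
        FROM dbo.Orders;
    -- Committed here; a package checkpoint cannot roll this back.
    COMMIT TRANSACTION;
END;
```

If the Execute SQL Task that calls this procedure succeeds but a later task in the same container fails, restarting from the checkpoint re-enters the container and re-runs the insert, duplicating the rows.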
Another thing worth noting when using checkpoints is that configuration changes made between the time the package failed and the time it is re-run are not reflected at restart. This is part of the checkpoint process: the runtime configuration values are captured in the checkpoint file and reloaded on execution to keep continuity with the last unsuccessful run. Once the package completes successfully, the checkpoint file is deleted from the file system, and configuration changes are reflected on the next execution.
Explaining the Checkpoint Properties
There are four properties used for implementing checkpoints within SSIS packages: CheckpointFileName, CheckpointUsage, SaveCheckpoints, and FailPackageOnFailure.
CheckpointFileName
As indicated above, the CheckpointFileName property specifies the path and file name used to capture the checkpoint information of the package at run time. As stated earlier, this file is deleted from the file system when the package completes successfully. The file also captures the unique identifier of the package, so if you change the identifier after a failure, the checkpoint system will not work because the GUIDs no longer match. This prevents a package from executing with the checkpoint file of a different package. System and user-defined variable values are captured in the checkpoint file (except for Object-typed variables), as well as configuration values.
CheckpointUsage
This property indicates whether or not checkpoints are used within the package. The available values for this property are:
- Never – If the value is set to Never, the package ignores any checkpoint file defined, and restarting the package starts it from the beginning.
- Always – The package always uses the checkpoint file defined and always starts from the point of failure if a failure occurred. With this option set, the package will fail if the checkpoint file does not exist.
- IfExists – If a checkpoint file exists for the package, it is used to restart from the failure; otherwise, the package runs from the beginning.
SaveCheckpoints
This property indicates whether the package saves checkpoints when it is executed. It must be set to True for the package to restart at the point of failure.
FailPackageOnFailure
This property must be set to True for every container in your package from which you want to be able to restart after an error. If no container’s FailPackageOnFailure property is set to True when the package executes, it will not recover from where it left off.
Please see BOL for more information on this subject.
This feature in SSIS has some known issues – so if you have any experience with it – good or bad – please feel free to leave a comment.
I have been looking into the methods for using stored procedures as a data source within a data flow component in SSIS. Most of the information I found in blog posts and technical write-ups says that using stored procedures as a data source is possible, but can be difficult to configure and troubleshoot. There are a few things to keep in mind when using stored procedures as your data source in packages.
I’ll use a simple example with the system stored procedure “sp_spaceused”. This procedure returns size information about the current database, or, if a table name is passed as a parameter, size information about that table.
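For reference (using the AdventureWorks sample database, as the rest of this example does), the two ways of calling the procedure look like this:

```sql
USE AdventureWorks;
GO

-- No parameter: returns two result sets describing the current database
EXEC sp_spaceused;

-- With an object name: returns a single six-column result set for that table
EXEC sp_spaceused 'Person.Address';
```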
If you set up an OLE DB source within a data flow task and change the data access mode to SQL Command, you can type something like the following into the SQL command text area:
EXEC sp_spaceused 'Person.Address'
I’m using the AdventureWorks database as my example here. If you hit the preview button you can see the results that this command returns. There are 6 columns returned when a table name is provided to the stored procedure.
But if you go to the Columns section of the source component, the columns listed there don’t match the preview we just did.
Here we can see that the columns section is returning the default result set of the stored procedure when parameters are not supplied – it is returning database_name, database_size, and unallocated_space. The stored procedure actually returns 2 result sets when no parameters are supplied, and the columns shown here are from the first result set – so the second isn’t even being evaluated.
Before we get into why we’re not seeing the proper columns, let’s take a look at what is going on behind the scenes of SSIS; maybe this will shed some light on why the preview window and the output columns window are not in sync.
I’m going to use Profiler to capture the commands issued to the database. I use the standard template and connect to my server, then open the data source component and hit the Preview button. The preview comes up with the correct information and columns. Going to the Columns section of the component, the columns coming back are still incorrect. Back in Profiler, I can see where the command was issued to get the preview data and find an entry for “EXEC sp_spaceused ‘Person.Address’” as expected. Scrolling down further, I find multiple prepare statements and an execution of the stored procedure. However, the statement is preceded by SET FMTONLY ON. This explains a lot, actually.
FMTONLY is a command used pretty heavily by SSIS (and many other apps, for that matter) to gather the metadata of a command without actually executing it. This gives fast response times, since nothing is executed, but in our case it is unhelpful. Here’s why.
I executed the command from my data source in SSMS, adding only SET FMTONLY ON before it and SET FMTONLY OFF after it. Looking at the metadata returned, there are now three result sets instead of one. Three come back because SET FMTONLY doesn’t actually execute the command; it just returns the “format only” of the command’s results, so every code path in the procedure contributes a legitimate result-set shape. The first of these is the one we see in the Columns section of the data source component. SSIS takes that first result set, maps the column metadata, and discards the other two.
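The SSMS experiment described above looks like this:

```sql
-- Ask SQL Server for metadata only; the procedure body is not actually run,
-- so empty result-set shapes come back instead of one populated result set.
SET FMTONLY ON;
EXEC sp_spaceused 'Person.Address';
SET FMTONLY OFF;
```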
So how do I get around this? I want my columns properly mapped, and I don’t want to write a stored procedure or query myself to replicate what has already been written.
We can get around this by adding SET FMTONLY OFF to the beginning of the statement in the data source. This is somewhat sneaky and definitely not good practice, but it does the job for this case. Adding that statement works because we are manually overriding the behavior of SSIS: thinking back to the Profiler output, SSIS prepends SET FMTONLY ON to our statement and then runs it. If we override that with SET FMTONLY OFF, we receive the proper list of columns, as illustrated below, because the statement is in fact being executed.
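Concretely, the command text in the OLE DB source becomes:

```sql
-- Override SSIS's SET FMTONLY ON so the command really executes
-- and the source component sees the actual columns
SET FMTONLY OFF;
EXEC sp_spaceused 'Person.Address';
```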
If we were to run Profiler again, we would see something like this:
SET FMTONLY ON
SET FMTONLY OFF
EXEC sp_spaceused 'Person.Address'
SET FMTONLY OFF
The second command is actually the first of ours; we overrode what SSIS was trying to accomplish by turning FMTONLY off right after the SSIS engine turned it on. This can be a very bad thing for package development, however. Let’s say you have a stored procedure that takes 65 seconds to return its result set: with FMTONLY OFF, you are running that procedure twice, or 130 seconds. That becomes very time consuming in the development phase, waiting for SSIS to get the results before you can start working. At run time it takes twice as long as well, since the pre-validate phase and the execution phase each execute the statement.
Really, what it comes down to is that stored procedures probably aren’t a very good choice for a data source, because you either have to work around the default behavior of SSIS to get them to work properly, or write your own stored procedures that take the pre-validate and execution phases of SSIS into account.
Another issue you may encounter is “noise” from the procedure making it impossible to get any results at all. This can happen when SET NOCOUNT ON is not used in a procedure you write. It’s a best practice to always set NOCOUNT to ON when writing procedures, but sometimes we forget. The extra “N row(s) affected” messages sent from the server to the client (where N is the record count) confuse the data source and will cause issues in the end.
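A minimal sketch of that best practice (the procedure name here is hypothetical):

```sql
-- Hypothetical procedure: SET NOCOUNT ON suppresses the
-- "N row(s) affected" messages that confuse the SSIS source component
CREATE PROCEDURE dbo.GetAddressCountsByCity
AS
BEGIN
    SET NOCOUNT ON;
    SELECT City, COUNT(*) AS AddressCount
    FROM Person.Address
    GROUP BY City;
END;
```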
Be careful when experimenting with SET FMTONLY. It can become very confusing very quickly if you have FMTONLY set to ON and try to do other tasks. With FMTONLY ON, you cannot delete records from a table, truncate a table, or run DDL statements. The reason is that the commands aren’t actually executed; only their metadata is gathered.
So, in closing: if you are going to use stored procedures as a data source, make sure you understand what the stored procedure does, how long it takes to execute, and that SET NOCOUNT ON is inside it. It is best to write your own procedures that return only one result set. That way you don’t have to override what SSIS is doing in the background by adding SET FMTONLY OFF to the beginning of the statement, and you avoid executing the statement twice.
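One way to follow that advice is a thin wrapper procedure (the procedure name and the table-variable column types here are assumptions) that captures the output of sp_spaceused and returns exactly one result set:

```sql
-- Hypothetical wrapper: exposes a single, stable result set to SSIS
CREATE PROCEDURE dbo.GetTableSpaceUsed
    @TableName sysname
AS
BEGIN
    SET NOCOUNT ON;
    -- Column types are approximations of what sp_spaceused returns
    DECLARE @Space TABLE (
        name       sysname,
        [rows]     varchar(20),
        reserved   varchar(20),
        data       varchar(20),
        index_size varchar(20),
        unused     varchar(20));
    INSERT INTO @Space
        EXEC sp_spaceused @TableName;
    -- Exactly one result set, with a fixed column list
    SELECT name, [rows], reserved, data, index_size, unused
    FROM @Space;
END;
```

Because the wrapper has a single code path with one SELECT, FMTONLY returns one result-set shape, and the source component’s columns match the preview without any overrides.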
Quick post related to another post I am currently working on.
In SSMS, if you have a query – such as
SET FMTONLY ON
GO
SELECT * FROM sys.columns
Then execute the code – you’ll get a similar resultset to the figure shown below
Now, if you hit CTRL + L to see the estimated execution plan – you will only get
(0 Row(s) affected)
Fine, that makes sense: FMTONLY ON only gets metadata, so there is no estimated execution plan to show. But now, if you hit F5 to execute the commands, you will get this
Notice anything weird? Exactly: it returned 2 query results instead of 1. Now hit CTRL + L a couple of times in a row and hit F5 again. However many times you hit CTRL + L before running the query, that’s how many result sets you will get back.
No idea if this is a bug or not. I haven’t looked through Connect yet to see if it’s there, but will at some point this weekend.
I can probably come up with a half-baked answer as to why it is doing this with FMTONLY ON, but if anyone else actually knows why, I’d much rather hear from them.