Simon Worth’s SQL Server Weblog

SQL Server and its many complexities

SSIS – Using Stored Procedures as a Data Source

with 10 comments

I have been looking into the methods for using stored procedures as a data source within a data flow component in SSIS.  Most of the information I found in blog posts and technical write ups say using stored procedures as a data source is possible – however, can be difficult to configure and troubleshoot.  There are a few things to keep in mind when using SP’s as your data source in packages.

I’ll use a simple example of the system stored procedure “sp_SpaceUsed”.  This procedure returns data size information about the current database, or – if a table name is passed as a parameter – the size information about the table.

If you set up an OLE DB data source within a data flow task, and change the data access mode to SQL Command you can type into the SQL Command Text area something like the following

EXEC sp_SpaceUsed ‘Person.Address’

I’m using the AdventureWorks database as my example here.  If you hit the preview button you can see the results that this command returns.  There are 6 columns returned when a table name is provided to the stored procedure.

image

But if you go to the Columns section of the Source component the columns in this section don’t match the preview we just did.

imageHere we can see that the columns section is returning the default result set of the stored procedure when parameters are not supplied – it is returning database_name, database_size, and unallocated_space.  The stored procedure actually returns 2 result sets when no parameters are supplied, and the columns shown here are from the first result set – so the second isn’t even being evaluated.

Before we get into why we’re not seeing the proper columns lets take a look at what is going on behind the scenes of SSIS and maybe this will shed some light onto why the preview window and the output column windows are not in sync.

I’m going to use Profiler to capture commands issued to the database.  I use the standard template and connect to my server.  Then open the data source component and hit the preview button.  My preview comes up with the correct information and columns.  Now go to the columns section of the component and view that the columns coming back are still incorrect.  If I go back to Profiler I can see where the command was issued to get the preview data – and find an entry for “EXEC sp_spaceUsed ‘Person.Address’” as expected.  Scrolling down further I can find multiple prepare statements and an execution of the stored procedure.  However, before the statement is SET FMTONLY ON.  This explains a lot actually.

FMTONLY is a command that is used pretty heavily by SSIS (and many other apps for that matter) to gather the meta data of the command being executed without actually executing it.  This provides fast response times since nothing is being executed, but in our case this is unhelpful.  Here’s why.

image 
I executed the command found in my data source using SSMS.  The only thing I added was SET FMTONLY ON and then closed it with SET FMTONLY OFF.  Looking at the meta data returned there are now 3 record sets returned instead of 1.  The reason the 3 were returned is because SET FMTONLY isn’t actually executing the command, it’s just getting the “Format Only” of the command results.  Since nothing is executed this returns 3 legitimate result sets – the first one being the one we are seeing within the columns section of the data source component.  So SSIS is taking the first result and mapping the meta data of the columns and discarding the other 2 result sets.

So how do I get around this.  I want my columns to be properly mapped and I don’t want to write a stored procedure or query myself to replicate what has already been written.

By adding SET FMTONLY OFF to the beginning of my statement in the data source we can get around this.  This is somewhat sneaky and definitely not good practice, but it does the job for this case.  image The reason adding that statement to beginning of our command works is because we are manually overriding the behavior of SSIS.  If we think back to the profiler output, SSIS is appending SET FMTONLY ON to beginning of our statement and then running it.  If we override it with SET FMTONLY OFF we will receive a proper list of columns as illustrated below – because the statement is in fact being executed.

image

If we were to run profiler again we see something like this now:
SET FMTONLY ON
SET FMTONLY OFF
EXEC sp_spaceused ‘Person.Address’
SET FMTONLY OFF

The second command is actually the first of our commands.  We overrode what SSIS was trying to accomplish by turning off FMTONLY right after the SSIS engine turned it on.  This can be a very bad thing however for developing packages.  Lets say you have a stored procedure that takes 65 seconds to return the result set, by setting FMTONLY OFF, you are running that stored procedure twice – or 130 seconds.  This would become very time consuming in the development phase waiting for SSIS to get the results before you could start working.  Also at run time this would take twice as long as well – the pre-validate phase as well as the execution phase would take the same amount of time. 
Really, what it comes down to is Stored Procedures probably aren’t a very good choice for a data source because you have to work around the default behavior of SSIS to get them to work properly or create your own stored procedures that take into account the behavior of SSIS in the Pre-validate and execution phases. 

Another issue you may encounter is “noise” from the proc making it impossible to get any results at all.  This can happen when SET NOCOUNT ON is not used in a procedure you write.  It’s a best practice to always set NOCOUNT to ON when writing procedures, but sometimes we can forget.  The added communication between the client and the server of “N Row(s) affected” (where N is the record count) is very confusing and will cause issues in the end.

Be careful when you are experimenting with SET FMTONLY.  It can become very confusing very quickly if you have set FMTONLY to on and try to do other tasks.  You cannot delete records for example from a table when FMTONLY is set to ON, you can’t truncate a table, or do DDL statements.  The reason is because they commands aren’t actually being executed, they are just gathering the meta data from the commands.

So in closing, if you are going to use stored procedures as a data source then make sure you understand what the stored procedure is doing, how long it takes to execute, and make sure SET NOCOUNT ON is inside your stored procedure as well.  It would be best to write your own procedures with only 1 recordset being returned.  This allows you to not override what SSIS is doing in the background by adding SET FMTONLY OFF to the beginning of the statement – and avoids double executions of the statements.

Advertisements

Written by Simon Worth

August 14, 2009 at 9:03 pm

Posted in SSIS

10 Responses

Subscribe to comments with RSS.

  1. Simon,

    Great, wonderful and an extra-ordinary job…
    So clear and fantastic demo.

    I was finding it difficult to understand the FMT concept. But you made it really easy…

    Thanks a lot!

    Keep your good work going…

    Regards
    Venkat…

    Venkatraman

    August 5, 2010 at 4:57 am

    • No problem at all. Glad you found it helpful. Let me know if you find out anything else about executing stored procedures in SSIS. I’m always happy to learn something new.

      Simon Worth

      August 5, 2010 at 6:57 am

  2. Very nice blog! It just helped me in my dataflow with a complicated stored procedure. Good job Simon and well written. This was easy to follow and you did a great job explaining what you were doing! I will be pointing my blog back here for a link!

    Tony Volner

    September 8, 2010 at 8:11 am

    • Glad you found it useful. Let me know if you find any other gotchas while using stored procedures as data sources.

      Simon Worth

      September 9, 2010 at 8:28 am

  3. Hi Simon

    I am using a DB2 Stored Proc which is returning me the result set via cursor. When I try to execute the Proc result set is coming but i am not able to find any columns? Any work around in DB2? I don;t think there is any SET FMT property???

    Quick reply would be really appreciated.. Thanks

    Rahul

    July 12, 2011 at 5:38 pm

    • I’m not sure of a way to do the same in DB2, but you may want to run a DB2 trace and see what commands are being issued, and from there figure out what the correct SQL would be to accomplish the same thing.

      Simon Worth

      July 12, 2011 at 6:15 pm

    • A friend pointed me to this post about using SSIS and DB2 stored procedures. Doesn’t look promising for you – but read it over and perhaps you can come up with alternate solution.
      http://lqqsql.wordpress.com/2010/10/22/ssis-db2-and-stored-procedures/

      Simon Worth

      July 13, 2011 at 8:09 am

  4. […] I’ve made this change because it was convenient for me to work in the BIDS GUI as this allowed me to fetch the column names and edit them if required before presenting them to the downstream package components. To know more about how this can make a difference, please go through the fantastic post here. […]

  5. rajk2888

    February 26, 2015 at 9:41 am

  6. What a fabulous find this is! Works like a charm! Thank you so much!

    Kimberly

    September 4, 2015 at 4:19 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: