Tuesday, March 20, 2012

run sql query on multiple databases

I need to run a same query on 15 different databases and union the results.

Does any of the experts know how to do this with good performance?

Any help is appreciated.

Phil....the only reason i posted in T-Sql is because i will be using this query in reporting services :)

Thanks

Nice to see you over here...

The only reason I invited you over here is that using the power of SSIS may be better suited than doing a large cross-database query. Here's what I'd suggest...

Build a package in SSIS that connects to all of your various databases, and then using the union all component, you can store the results in a staging table to be used in SSRS. Or do you need the results real-time?|||The results will be updated every week.|||

RookieDBA wrote:

The results will be updated every week.

There you go... Using concepts from the data warehouse world, this is your best bet anyhow. You're performance may be faster obtaining the results through SSIS (versus transact-sql -- only you can test that), but certainly when users run reports in SSRS against this "reporting" table, they'll be most appreciative that they aren't waiting for data to be assembled.|||If all source tables have same estructure; you could have a dataflow with only one Source/destination component and then call the package 15 times; passing each time a difrent connection for the source component; that would help incase more sources are added down the road...|||

Yeah its fifteen different databases with five same tables and same sql code.

I will try using SSIS and see the performance.

Thanks

|||

Rafael, could you tell me what toolbox item to use for "call the package 15 times, passing each time a different connection for the source..........."

Thanks

|||

There is not an specific task for that. You would need to create an extra package; a master package that will call the child package that performs the extract/load. The trick would be to use a ForEach loop that iterates through the 15 set of connections and put the connection strings in variables (they can be in a table/file etc); inside of the ForEach Loop you will have an Execute Package task. Then the child package would need to use package configuration (from parent variable) to get the proper connection string on each iteration.

Does this make sense?

The only problem I see with this approach, it is that each iteration has to wait until the previous one is completed; so parallelization is not possible. I used a different approach to achieve parallel execution some time ago; but to be honest I did not like it at all; as you have to use a execute package task for every single execution you need (in your case would be 15!). I explained here in case you want to take a look:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=816000&SiteID=1

No comments:

Post a Comment