SSIS Server Architecture Depends on Package Design

A common question that my clients ask when they purchase hardware for the ETL Layer of a Business Intelligence architecture using SQL Server Integration Services (SSIS) is usually, “does this server need to be attached to SAN disk or a dedicated external disk, or will the server’s local disk suffice?”   The answer is “it depends” and I will explain why.

Disclaimer: I usually recommend a dedicated SSIS server for exclusively running ETL packages for larger environments, or for smaller environments that have the potential to become large. For throughput, recovery, etc. SAN disk or a dedicated external disk is usually a better option than local server disk. With budgets being tight in this economy it is not always an option so I won’t make this a “SAN or not” article, because “it also depends”.

I find that there are two schools of thought regarding SSIS package design in the last few years.  The first perspective is usually from the Database Engineers – DBAs who think that anything related to data should use execute SQL Tasks for the most part, utilizing the database engine with optimized transact SQL (T-SQL).  The second perspective is from the SSIS Developer that believes packages should use lookups, data flow tasks, script tasks, raw data files, etc., and execute sql tasks should be used as little as possible.

Another Disclaimer: I can relate to both perspectives as I have filled both roles on BI projects and this is a debate that I have seen at more than one client and am sure others have this discussion as well, so why not blog about it and hopefully it can help.

Drum roll please – the answer is both the Database Engineers – DBAs and SSIS Developers are correct.  From the SSIS Developers point of view everything is done and contained within the SSIS packages and local disk will suffice since most operations are done in memory so there is no need for SAN or external dedicated disk.  From the Database Engineers – DBAs point of view, the SSIS server can be configured to have the database engine present with attached SAN or external dedicated disk and use staging database(s) that run optimized T-SQL within execute SQL tasks.

The point is that it is not a clear cut question and answer and it really depends on a lot of factors. If you are the person making the architectural decisions for the SSIS server, you should understand how the packages will be designed first to answer this question correctly.

Advertisements
This entry was posted in SSIS and tagged , , , , , , , . Bookmark the permalink.

One Response to SSIS Server Architecture Depends on Package Design

  1. Oscar Zamora says:

    I believe that SSIS has evolved to a point where it can handle multiple and complex transformation in memory. ETL packages shall be database agnostic and should minimize the need for adding any special procedure or feature to the source, other than granting proper rights to read it. This makes the packages transportable and easy to migrate from 1 SSIS server to another, as well as easy to let the datasource migrate from one place to another.

    Now, in certain conditions where the data extracted requires several tables joined, and the output is for several million rows, the sort operation can benefit from the database engine as it is optimized to deal with such a task in the most efficient way.

    So I agree that it depends. But I tend to have as much as possible on the SSIS side as the database is not designed to perform any transformation in an efficient way. The RDBMs engine is optimize to store and retrieve data and keep it in a very relaible state.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s