A common question that my clients ask when they purchase hardware for the ETL Layer of a Business Intelligence architecture using SQL Server Integration Services (SSIS) is usually, “does this server need to be attached to SAN disk or a dedicated external disk, or will the server’s local disk suffice?” The answer is “it depends” and I will explain why.
Disclaimer: I usually recommend a dedicated SSIS server for exclusively running ETL packages for larger environments, or for smaller environments that have the potential to become large. For throughput, recovery, etc. SAN disk or a dedicated external disk is usually a better option than local server disk. With budgets being tight in this economy it is not always an option so I won’t make this a “SAN or not” article, because “it also depends”.
I find that there are two schools of thought regarding SSIS package design in the last few years. The first perspective is usually from the Database Engineers – DBAs who think that anything related to data should use execute SQL Tasks for the most part, utilizing the database engine with optimized transact SQL (T-SQL). The second perspective is from the SSIS Developer that believes packages should use lookups, data flow tasks, script tasks, raw data files, etc., and execute sql tasks should be used as little as possible.
Another Disclaimer: I can relate to both perspectives as I have filled both roles on BI projects and this is a debate that I have seen at more than one client and am sure others have this discussion as well, so why not blog about it and hopefully it can help.
Drum roll please – the answer is both the Database Engineers – DBAs and SSIS Developers are correct. From the SSIS Developers point of view everything is done and contained within the SSIS packages and local disk will suffice since most operations are done in memory so there is no need for SAN or external dedicated disk. From the Database Engineers – DBAs point of view, the SSIS server can be configured to have the database engine present with attached SAN or external dedicated disk and use staging database(s) that run optimized T-SQL within execute SQL tasks.
The point is that it is not a clear cut question and answer and it really depends on a lot of factors. If you are the person making the architectural decisions for the SSIS server, you should understand how the packages will be designed first to answer this question correctly.