Vision Binary

work hard, be nice, amazing things will happen


SSIS-easier maintain your package by using Stored procedure smartly in SQL statement

As developer, you may need to develop and maintain many SSIS packages included the SQL script. and it can be a frustrated job if you need to frequently trouble shooting the problems which are caused by different data quality issues. Those package has been running well and suddenly it failed. So you always have to open the BIDS, scratch your head, looking for which part is wrong .

It occurred to me that most times of the failure, were cause by our original data. and after loaded into the staging tables, those bad data failed in certain sql statement. Once the statement failed to execute, the whole package failed.

so how should our developers develop a package more smartly, try to avoid so much effort to go to production server, copy packages, trouble shooting the issues and then paste the new shinning package back into production environment and re add back to SSIS server? All of this is so time wasting!

Base on my personal experiences (well I personally don’t really like using SSIS, most tasks I handled by using t-sql, and made them as stored procedures, and it works like a charm), when you are developing a SSIS packages, you may have variables, you will have control flows, data flows, you main job is to extract from other data source, transfer the it into desirable  format, load it into final destination, most time is database. there are steps we all do even we just do it differently. and how to make your package easier to maintain, more efficient?

So you very much likely have a staging place firstly to hold your data, I personally don’t suggest developers load data directly into their production tables. Because you really want to confirm that no data is missing, no bad format is introduced into the tables. And bad data can cause many troubles for an organization, especially when it want to generate reports, for example:  finance reports, market reports, forecasting, etc.  All your figures, visualizations, analysis, summaries come from your database, the data quality is the foundation, the information inside the data is the material, both are indispensable for building models, reports, even just simply analysis. You want all the names, addresses, transaction records, categories, etc,etc, to be accurate , all format to be consistent, you also wish you have the historical data in a certain period to help  you  build a good report to reflect how things were, are, and will be going on.

I believe it’s the data engineer or whoever work on integrating data’s responsibility, to help a company establish a healthy data environment.  working as a consultant I see too many contractors or bad developers, simply not care, maybe just know one or two things then they start working as business intelligence developer. and the package they created just so hard to maintain and very time consuming to be fixed.

okay, let me keep introduce my idea. After the data is loaded into a staging area, certain joins or updates the some column may needed. for example, you may need to verify your data, or transform into another value after join certain table, or base on the all entities there is an application calculate them and out put a value for each row. You can use merge,  you can use union all, but merge needed to be sort, and there are always metadata issues, you probably need to do data conversion against certain column, and that’s frustrating, merge just join, union all just union all in sql, why you can use a script, you have to drag a tool? I believe a SSIS developer would know advanced SQL, if not, then I kind of question how they get the job.  SSIS may have some advantages in variables, containers, managements, but once if something related to data, SQL script will always be a better choice.

So a lot of steps, you can save your time by creating sql statements, transform, calculating, filtering, then just do it in sql statement. the trick is, make sure you create those statement into stored procedure. Why? for example, I have some mapping rules changed for my raw data, so I need to change the clauses on join and where, to simplify the process, if you created all the statement in one/ several stored procedure, you don’t need to go back to your package, open them, edit, and redeploy. All you need just go to the database which ever had the sp, modify your stored procedure. This is truly life saving.

This is how I fixed stuff and even my past bosses were all agreed I  got things done in an amazing pace.



Leave a comment