Vision Binary

work hard, be nice, amazing things will happen


A good practise–SSIS MANAGEMENT FRAMEWORK DESIGN

Problem

Whether you are creating an extraction, transformation, and loading (ETL) package, a maintenance package, or a simple process workflow, certain requirements are the same for every package. As a developer, you want to decrease your development and maintenance times. The rest of your team also has requirements for the package, which includes the following:

  • A support person must know how, when, and why a package failed.

  • A system administrator must see when that package executed and how long it took.

  • A change control operator must promote the package through environments with as little change as possible.

It comes down to you to put all of these requirements into every package that you create.

Challenges of Not Having a Framework Implementation

You will face many hurdles when coming up with an approach to handle the aforementioned requirements. This section discusses some of the problems you might encounter in order to show you how to avoid them in your final design.

 Different Development Methods

The first hurdle you encounter is different packages following different development methods. Having no standard approach (that is, a configuration schema or deployment strategy) makes modifying and maintaining the package after the initial development difficult for either another developer, or maybe even you. To make any changes, the new developer must understand the configuration, logging, and error-handling methods that the original developer used. Only after that understanding can the new developer make modifications, which will result in a longer maintenance cycle.

Using different development methods can also result in confusion about where to make the change. If a new database must be added to the package, should the connection string be added directly to the package? Or should it be included in an external configuration location? If so, should the configuration be stored as a full connection string, or as a server/database combination?

Choosing different development methods will not only cause confusion, it could also result in bad deployments or extended test cycles. In the event that a logging database server name changes, checking that all packages have the new database server name would be necessary. If any packages used text-based logging instead of database logging, the package would either miss the change, or need to be modified separately.

By having packages with different development methods for these base requirements, you’ll also end up with an inconsistent problem-solving routine. Each time you start to research an issue on a new package, you’ll need to look in an entirely different place than you did the last time you looked at a package. This could lead to missing what should have been an easy solution, and causing general frustration with the state of the packages.

 Changing Metadata

Another hurdle you’ll face is how to handle changing metadata. This issue was briefly touched upon earlier in this chapter in the discussion of how to store a connection string, but it is not just isolated to databases. Changing metadata could refer to file locations, control variables that indicate whether a section of the package should run, and other numerous facets of a package. The metadata could need to be changed on a regular basis, or more intermittently.

If the metadata change is overarching across all packages, changing, testing, and redeploying every package that contains that metadata would take a lot of effort. A good example of this would be renaming a shared network folder in which multiple packages store files to adhere to new corporate standards. Modifying the variable that contains the shared network folder name in multiple packages would take a long time, and be a good candidate for mistyping just one of the packages and finding the mistake late in the test cycle.

One change that will happen every time you create a package is modifying the name of a server location while moving a package through your environments, from development to quality assurance (QA) to production, or something similar. Every time you need to open and change the package itself, you open yourself up to the possibility of error.

Getting the Right Information to the Right People at the Right Time

The last hurdle to discuss is getting the right information to the right people at the right time. The type of information needed includes (but is not limited to) the following:The time the package started and ended its execution

  • Whether the package succeeded or failed

  • What events occurred during the execution

Depending on your scenario, you may also need to know what variables were modified or who executed the package. If you realize you need this information after the package has already run, you’re already too late. You must ensure that all of this information is recorded every time every package runs.

After a package has run and the information has been recorded, you must ensure that other developers or the support staff have access to that information in a readable format. They must also have the capability to go to one centralized location to find that readable information. You don’t want to spend your time writing reporting queries off your information when you could be spending it creating more packages!

Reusability

Although SSIS offers a variety of methods out of the box to help assist with these concerns, a few limitations exist to using these standard methods. In general, one issue that arises is that you must set up all the options every time you create a package. Not only does this slow down development time, it also opens up the possibility of a fellow developer mistyping something, or choosing a different method than you have used. These potential problems can make maintaining and supporting the package a nightmare for someone who did not initially create the package.

Framework Implementation

So, how do you overcome all of these issues that happen when developing SSIS packages? You create your own SSIS management framework, of course!

To design an SSIS management framework, you must solve all the previously mentioned issues. The framework must be easy to use, and help improve the SSIS package experience for all parties involved. Several pieces must work together fluidly to create the full framework.

Configuration Management Scheme

The first component, a configuration management scheme, eliminates making any modifications directly to the package, including changes needed to move a package through environments, and adjustments to file or server names. The configuration should manage variable and connection changes to reduce package execution errors caused by wrong information. You want to eliminate storing redundant configuration information, and create the capability to adjust subsets of configurations at a package level, or adjust all configurations at a system level.

Logging and Auditing

The next component provides a way to log events and errors that occur during execution, and to audit package objects during execution. To begin, you record all information in one central location that can be made accessible only through the security needed by your organization. You must keep track of any changes to a package to enable you to determine whether a new package version affected execution times. Speaking of execution times, you also want to record the start and end times of all package executions. To see the hierarchy of what package calls other packages, also include batching information.

The package objects that you want to audit are variables and tasks. You record the start and end times of the tasks with task identification information. Whenever a variable’s value changes, you record that change with the new value. All log entries and audit information should be linked together to promote easy querying and investigation into the data.

Template Package

The final component is a template package that all developers will use to start their packages. The components already described in this section should be incorporated into this package to enable you to start developing the logic of the package immediately. Because all logging and configurations are already in place, you don’t need to worry about having multiple versions, or accidentally mistyping a value.

 Framework Benefits and Value

By implementing a framework, you can overcome the challenges previously discussed. The framework components come together to create an easy-to-use template for all developers to use. This provides a consistent method for logging and configurations. You have a central repository for configurations that can be managed in one location. Development and maintenance times will be much lower with a framework than with having a variety of logging and configuration methods. You are also tracking all the execution and audit information for all packages, providing a transparent look into the system for all members of your team.

Although implementing and deploying the framework has an initial start-up cost, the benefit will greatly outweigh any delays over the long run. Figure 2-1 shows an idea of how the total cost of ownership will actually be lower over time using an SSIS management framework implementation.

                Figure 2-1

 

To summarize, you need to create an SSIS management framework that promotes standardization, configurations, and centralized monitoring. The overall goal includes reducing development and management time by creating a configuration management schema, logging and auditing mechanism, and a template package.

The following section discusses the design of the SSIS management framework.

 

 

 

Design

As discussed in the “Problem” section, you must create three components in order to build an SSIS management framework. This section takes a further look at the following components:

  • Configuration management

  • Logging and auditing mechanism

  • Template package

These pieces will fit together to create an overall SSIS architecture that can be used for all packages. Figure 2-2shows a diagram of this architecture.

Figure 2-2

To fully appreciate the rest of this chapter, you must understand how package configurations, events, variables, and package execution works. For a more detailed description of these topics, see the book Professional Microsoft SQL Server 2008 Integration Services by Brian Knight, Erik Veerman, Grant Dickinson, Douglas Hinson, and Darren Herbold (Indianapolis: Wrox, 2008).

Configuration Management

Often, in an application, you have a piece of data that you would like to change at runtime. This data could be as simple as the value of a variable to change the number of iterations of a loop, or as advanced as an argument that directs a program on which logical flow it should take. By extracting that information out to a different location, you can make modifications to an external source without touching your program. Called configurations, these are usually assigned in name-value pairs, where the name is what property you want to affect and the valueis the new piece of data you provide to the program.

Configurations in SSIS are called package configurations, and they allow you to set the value of package and task properties, variables, connection managers, and event handlers. You can set the configurations using a SQL Server table, an XML file, an environment variable, a registry entry, or a variable from the calling parent package. Details of all methods are not provided here, as this chapter only uses a few of them for framework implementations.

Overall Design

This configuration design employs a three-pass strategy that contains moving packages through environments and creating configuration levels. By using multiple passes, developers can assign configurations applicable not only at a system level, but also at a package level. Following are the three passes:

  • An environment variable that points to an XML configuration file that points to a SQL Server database

  • A SQL Server configuration that is filtered at the system level

  • A SQL Server configuration that is filtered at the package level

The purpose of the first pass is to tell the SSIS package where to look for its configurations. You obviously can’t store that information in the same location as the rest of the configurations because the package would never know where to start! Instead, you use an environment variable configuration to tell the package where to look for the XML configuration file. Then, the XML configuration file will configure the connection string value of a SQL Server connection manager.

The second pass applies the system-level configurations by using the SQL Server connection manager that the XML configuration file just modified to make any necessary run-time changes to the package. The third pass uses the same pattern, but uses a different filter to only apply the package specific configurations.

Environment Variable Configuration

The first thing the SSIS package uses is a system environment variable. Applications typically use environment variables as a configuration avenue. Stored as name-value pairs on the server, environment variables are accessed using the variable name and cannot be moved to a different location.

Because of the nature of environment variables, they make a perfect constant location from which to pull your initial configuration. Having different drive letters or folder structures between servers across environments is common. If you were to use an XML configuration file directly to pull your SQL Server information from, you would have to worry about updating the package when you moved the package between environments. With an environment variable, you can reference the same name in the package, but the value of the variable can be different on each server.

The environment variable should be created once when the server has been designated as part of the SSIS system. Keep in mind, however, that you’ll need to add the creation of the environment variable to your Disaster Recovery (DR) procedure in case of a server failure.

XML Configuration File

The value of the environment variable points to the location of the XML file, which contains just one configuration. The configuration replaces the connection string of a SQL Server connection manager in the package.

Why can’t you just use the connection string of the SQL Server database as the value of the environment variable? The main reason is for flexibility. Because you have an extra layer, you can easily change which SQL Server the packages should use. This may come in handy during your development if you need to work in both your development and QA environments. You can modify the XML configuration file to point to the new server, and your packages will instantly pick up the change, rather than having to modify the environment variable and need to restart your development application. In some organizations, you may need to add multiple databases to the XML configuration file, or other checks that the template package needs. Because the environment variable only allows you to set one property at a time, this is another place where the XML configuration file would be helpful.

If you have multiple SSIS servers per environment, it may behoove you to create a shared network folder to store a single copy of the XML configuration file. In the event that the SQL Server moves to a different server, you will only need to make that corresponding XML configuration file change in one place, rather than on every server.

 SQL Server Configuration

The final two configuration passes use SQL Server configurations that utilize the connection manager that the XML configuration just modified. These configurations hold the dynamic values needed to execute the package. The first configuration uses a system-wide filter for all packages. These types of configurations could include connection strings or file share names needed across the organization. The second configuration uses a filter that is specific to the package and includes applicable information.

Essentially, you use SQL Server as your main repository of dynamic information. Because you use a database rather than a text- or system-based configuration method, you can include this database into your normal database maintenance routine. This allows for easy backups, restores, and other maintenance. You will most likely already have a process in place to manage change control and promotions, so this system will easily fall into place. By using SQL Server, you also have the capability to secure your tables and database against any who should not have access. This also applies for encrypting certain values within the data itself, in case you need to configure passwords. Finally, you have an easy method to update and insert multiple records or query for information.

Now that you’ve set up all of your run-time values, you must record all the execution information of the package.

2.2.2. Logging and Auditing Mechanism

SSIS packages often run automatically in the middle of the night without any assistance on a server. Because no one is there to watch the package, you must record the information of the execution. This recording of package executions and events is known as logging, and is an important part of any SSIS architecture.

Although SSIS comes with its own logging mechanism, ferreting out the exact record you desire can often be difficult. For example, not all records contain a critical piece of data: the name of the package. Even though creating queries that work around this shortcoming is possible, wouldn’t having that information readily available to you be nice?

To assist in the overall logging design, you can circumvent the standard logging mechanism and create one of your own. This also enables you to add auditing of variable changes and events. You can also record any errors that occur during package execution.

The following sections take a look at what objects are needed to store the data and how the log entries get loaded.

Storage

For this example, you store log entries in a SQL Server database — although modifying the statements to use an alternative relational database management system (RDBMS) should be a trivial task. By using SQL Server, you can use the already-created configuration connection manager to point to your new logging database. Using a database gives a variety of benefits. Similar to configurations, you can back up and restore the database in the event of a server failure or unexpected data wipe. In fact, including this database in your normal database maintenance routine is expected. Additionally, you can easily delete historical log entries by running a simple delete statement filtered on the date column. Finally, this logging database gives you a queryable location to get aggregated or detailed information.

Tables

This logging design starts with a set of custom tables. These tables hold the log entries for package execution events, variable values changing, and any errors that occur. These entries also contain information to associate the entry with a package and time of execution.

Now to take a closer look at these tables:

  • dbo.Package — This table is where it all starts. This table contains general information, including the package name, who created it, and when it first entered the system.

  • dbo.PackageVersion — This table links to the Package table to tell an observer when someone modified the package. Each time a change is made to an existing package or a new package is created, a new package version will be created.

  • dbo.PackageLog — This table contains the information about the executions of packages. Every execution should have one record in this table. This is where you can find the start and end times of the execution, as well as whether the package succeeded, failed, or is still running.

  • dbo.BatchLog — This table is linked to by the PackageLog table. A batch consists of all children packages that are called from a master package. Each new master package starts a new batch and a new record in this table. This allows reporting of execution times and statuses at a higher level than the package.

  • dbo.PackageErrorLog — As its name implies, this table stores error information. It also links to thePackage table and stores the time the error occurred and what task caused the error.

  • dbo.PackageTaskLog — This table records the audit trail of what tasks executed during the execution of a package. It links to the Package table, and has start and end times of the tasks” executions. This table completes the three-level drill-down view from batch to package to task.

  • dbo.PackageVariableLog — This table contains the name of the variable, the new value of the variable, and the time that the value changed. It also links to the Package table.

Figure 2-3 shows how the tables are related. Later in the “Solution” section of this chapter, you will learn more about the definition and descriptions of the columns.

Figure 2-3

Stored Procedures

The next set of objects in the framework contains several management stored procedures, which are the only way to write to the previously mentioned tables. These stored procedures contain the business logic needed to insert and update the log entries. You place the calls to these stored procedures in either the control flow of the package, or the event handlers, depending on the type of event.

Here is a look at each stored procedure individually:

  • dbo.LogPackageStart — This stored procedure records general information about the package, and records an entry associated with the beginning of a package execution. You should place this stored procedure in the control flow preceding any other control flow tasks.

  • dbo.LogPackageEnd — This stored procedure updates the execution of the package and batch to show that the package completed successfully. You should place this stored procedure in the control flow following all other control flow tasks.

  • dbo.LogPackageError — This stored procedure inserts a record into the error table with its associated execution time and error information. This stored procedure also updates the status information in the main tables to show that the execution of the package and batch failed. You should place this stored procedure in the OnError event handler, scoped to the package level.

  • dbo.LogTaskPreExecute and dbo.LogTaskPostExecute — This pair of stored procedures records the start and end timing information of each of the tasks within the package. You place them in the OnPreExecute andOnPostExecute event handlers scoped to the package level, respectively.

  • dbo.LogVariableValueChanged — You place this stored procedure in the event handler forOnVariableValueChanged, scoped to the package level. It records the variable information into the variable log table.

At this point, you’ve completed a good portion of the framework design, including your configuration schema and your logging implementation. The final step is putting it all together in a template package for developers to use.

2.2.3. Template Package

The template package (the artifact of the management framework that developers must understand) is a normal SSIS package where the framework settings have already been set up for the developer. This feature enables you to jump right into developing the business logic of the package. The package should contain package configurations, connection managers, variables, and logging helpers.

The package configurations should already be configured in the package, which is possible because the configurations should be the same for all packages, and follow the previously described order. This also ensures the connection manager that the XML configuration sets will be available and named properly.

Ideally, all connections that you think you need will be included in the template package, to prevent a developer from worrying about creating the correct configurations for all environments. Developers won’t even need to name the connection managers correctly, as they will already be available!

The configuration modifies a variable that matches a connection manager. An expression utilizing the variable then sets the connection manager in the package.

In SSIS 2005, using a configuration set that contained connection manager configurations that were not included in the package was possible. As of SSIS 2008 SP1, this action causes an error that fails the execution of the package. To work with this issue, use a configuration to set a variable within the package. You can then use an expression to set the appropriate property on the connection manager.

The template package includes any system-level variables, including the ones that set the connection managers. One example of a variable used across packages is the number of days to retrieve data from a source system. If all packages must pull data that has been modified in the past 100 days, you would want to create a variable for this value and set it to be configurable, in case it must be changed in the future to 150 days.

The template package should not include any values specific to a particular package. These variables can be included on a case-by-case basis during development.

The package also contains preconfigured tasks that call the management stored procedures. The tasks use the same connection manager that the configuration used. All log entries end up in one central logging entity, as long as the developer uses the provided template.

After the template has been created, you can install it on all developers” machines. They can then add the template package using the project menus within Business Intelligence Development Studio (BIDS). They first need to generate a new package ID, and then they can begin their development.

Implementation Guidelines

Now that you’ve laid out the design, you can use SQL Server and SSIS to put it in motion. To review, you want to include the following features:

  • Multipass configuration management schema

  • SQL Server tables that contain log events, audit information, and error information

  • Template package that contains a development base

 



One response to “A good practise–SSIS MANAGEMENT FRAMEWORK DESIGN”

  1. Hi Victoria. Very interesting and powerful. It seems to be designed to work with ‘old’ package deployment model, do you have any pointers for adapting such a framework for project deployment model in SSIS in SQL Server 2016 and onwards?

Leave a comment