Moving data from one db to another using SSIS

ssis

I am very new to SSIS and need to work out how to set up a transfer from table A in database A to table B in database B.

Table A contains many more fields than I need in table B, so the process needs SQL so that I can specify which fields to take. (I can't just lift the table from database A and drop it into database B.)

If I were doing it in SQL, I would just select field1, field2, field3 from table A and update table B with the results.

Can anyone offer me any pointers on how to do this in SSIS? I have had a good google around but can't find a definitive answer.

Many thanks

Best Answer

Create a regular data flow with two components: an OLE DB Source and an OLE DB Destination (I assume you are using MS SQL Server; in general, use whatever components your company uses to connect to the DB).

With two DBs, create two connection managers, one pointing to each DB. Point the OLE DB Source to the connection manager for the source DB, and the OLE DB Destination to the connection manager for the destination DB.

Now point the OLE DB Source at the source table in the source DB and leave all the fields intact. Connect the source and destination components with the green arrow that comes out of the source component. Then point the OLE DB Destination at the destination table in the target DB. Double-click the destination, go to Mappings, and make sure the mappings are correct (SSIS tries to map columns automatically using strict name matching); where the names differ, connect the source and destination fields manually. That's it: simply leave unmapped any source fields that the destination table cannot accommodate.

Alternatively, you can leave out the columns you don't need at the source component: double-click it, go to Columns, and uncheck the columns you don't need.
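
Since the question asks about using SQL to pick the fields, a third option may be worth noting: the OLE DB Source editor has a data access mode called "SQL command" that accepts a query instead of a table name. A minimal sketch, assuming the source table lives in the dbo schema and reusing the placeholder names field1, field2, field3 and TableA from the question (none of these are real object names):

```sql
-- Hypothetical names: dbo.TableA and field1..field3 stand in for the
-- real source table and columns; the dbo schema is an assumption.
-- Paste this as the SQL command text of the OLE DB Source.
SELECT field1,
       field2,
       field3
FROM   dbo.TableA;
```

With a query like this, only the selected columns should show up as available inputs on the destination's Mappings page, so there is nothing extra to uncheck or leave unmapped.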

Related Solutions

Sql-server – SSIS how to parse column with erratic data

For something like this the optimal solution of course is to control your input. That said, the reality is you have to parse the supplied input.

For something as complex as your parsing, I'd skip the Derived Column transformation and go straight for a Script transformation. I select my source column, Input, and create three output columns: number, trash and Interval. number and Interval will hold the parsed values, while trash will only be populated when the script can't make heads or tails of the input.

I use two member variables, numbersRegex and periodDomain. periodDomain is just a list with the acceptable values. For string comparisons, I force everything to lowercase and hope for English. numbersRegex is a regular expression that is used to identify digits in a string.

For every row that comes in, the script splits the Input value on whitespace. For each of those tokens, I test whether the token has a digit in it. If it does, we call the GetBiggestNumber method; otherwise, we call the ValidatePeriodDomain method. Once all the tokens have been processed, it's important to make certain both values have been set.

GetBiggestNumber looks at all the groups of digits in a token and returns the largest value.

ValidatePeriodDomain attempts to compare the current value to a known list of acceptable values.

using System.Text.RegularExpressions;
using System.Collections.Generic;

/// <summary>
/// The amazing script transformation
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    Regex numbersRegex;
    List<string> periodDomain;

    /// <summary>
    /// Initialize member variables
    /// </summary>
    public override void PreExecute()
    {
        base.PreExecute();
        // match consecutive digits
        this.numbersRegex = new Regex(@"\d+", RegexOptions.Compiled );
        this.periodDomain = new List<string>(){ "year", "month" };
    }

    /// <summary>
    /// Parse the incoming data
    /// </summary>
    /// <param name="Row">The row that is currently passing through the component</param>
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        string[] parts = Row.Input.Split();
        string period = string.Empty;
        int? foo = null;

        foreach (string token in parts)
        {
            // Try to do something with the token:
            // if it has a digit in it, extract the largest number it contains;
            // if it has no digits, keep it when it matches our period domain
            // (a later matching token overwrites an earlier one)
            if (this.numbersRegex.IsMatch(token))
            {
                foo = GetBiggestNumber(token);
            }
            else
            {
                if (ValidatePeriodDomain(token))
                {
                    period = token;
                }
            }
        }

        // at this point, we've processed the input data
        // If either local variable is still in its initial state, then we didn't
        // find everything we need and must populate the Row.trash column instead.
        // Local variables are used because the values can't be read back from Row.
        if (period == string.Empty || (foo == null))
        {
            Row.trash = Row.Input;
        }
        else
        {
            Row.number = foo.Value;
            Row.Interval = period;
        }
    }

    private bool ValidatePeriodDomain(string token)
    {
        return (this.periodDomain.Contains(token.ToLower()));
    }

    private int? GetBiggestNumber(string token)
    {
        int? bigOne = null;
        int? current = null;
        // Get all the groups of numbers and compare them
        foreach (Match item in this.numbersRegex.Matches(token))
        {
            current = int.Parse(item.Value);
            if (!bigOne.HasValue)
            {
                bigOne = current;
            }

            if (current.Value > bigOne.Value)
            {
                bigOne = current;
            }
        }

        return bigOne;
    }
}

Using the above script, you can see how it slices and dices the Input data. I made a minor change between the code that generated the below screenshot and what's posted. I observed that the input value 9000 was assigned to Row.number but as that Row never had an Interval assigned, I deferred the actual Row population to the end of the script (it was in the

[Screenshot: dataflow]

Sql-server – How to load more than one MS-Access-db tables with data in SQL-Server-DB

In SSMS, you can right-click your target database, select Tasks > Import Data, then choose your Access database as the Source and your target SQL Server database as the Destination.
