SQL Server 2008 R2 – How to Pivot Multiple Columns with Variable Number of Rows

pivotsql serversql-server-2008-r2

I have a table that looks like this:

RECIPE     VERSION_ID     INGREDIENT    PERCENTAGE
4000       100            Ing_1          23,0
4000       100            Ing_100         0,1
4000       200            Ing_1          20,0
4000       200            Ing_100         0,7
4000       300            Ing_1          22,3
4000       300            Ing_100         0,9
4001       900            Ing_1           8,3
4001       900            Ing_100        72,4
4001       901            Ing_1           9,3
4001       901            Ing_100        70,5
5012       871            Ing_1          45,1
5012       871            Ing_100         0,9
5012       877            Ing_1          47,2
5012       877            Ing_100         0,8
5012       879            Ing_1          46,6
5012       879            Ing_100         0,9
5012       880            Ing_1          43,6
5012       880            Ing_100         1,2

There are 100 ingredients per recipe/version. I'd like to display the data from this table like this:

RECIPE     INGREDIENT_Vxxx     PERCENTAGE_Vxxx     INGREDIENT_Vyyy     INGREDIENT_Vyyy (ETC)
4000       Ing_1               23,0                Ing_1               20,0
4000       Ing_100             0,1                 Ing_100              0,7

Because in different versions of recipes, ingredients could be removed or added, I'd like to display both ingredient and percentage per version per recipe. There's also the difficulty that different recipes have different number of versions.

I'm not even sure if this is possible at all or where to begin. Maybe with the PIVOT function?

Could anyone please point me in the right direction?

Best Answer

The problem here seems to me to be largely more of a scoping issue - you are likely having difficulty solving this problem on account of the requirements not being defined well enough. With the description and sample data provided, there are at least three partial solutions, none of which may be applicable to your particular use cases. With the test data set up as follows,

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'Recipe'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.Recipe;
    CREATE TABLE dbo.Recipe
    (
        Recipe          INTEGER NOT NULL,
        VersionID       INTEGER NOT NULL,
        Ingredient      VARCHAR( 8 ) NOT NULL,
        Percentage      DECIMAL( 5, 2 )
    );

    INSERT INTO dbo.Recipe ( Recipe, VersionID, Ingredient, Percentage )
                SELECT  4000, 100, 'Ing_1', 23.0
    UNION ALL   SELECT  4000, 100, 'Ing_100', 0.1
    UNION ALL   SELECT  4000, 200, 'Ing_1', 20.0
    UNION ALL   SELECT  4000, 200, 'Ing_100', 0.7
    UNION ALL   SELECT  4000, 300, 'Ing_1', 22.3
    UNION ALL   SELECT  4000, 300, 'Ing_100', 0.9
    UNION ALL   SELECT  4001, 900, 'Ing_1', 8.3
    UNION ALL   SELECT  4001, 900, 'Ing_100', 72.4
    UNION ALL   SELECT  4001, 901, 'Ing_1', 9.3
    UNION ALL   SELECT  4001, 901, 'Ing_100', 70.5
    UNION ALL   SELECT  5012, 871, 'Ing_1', 45.1
    UNION ALL   SELECT  5012, 871, 'Ing_100', 0.9
    UNION ALL   SELECT  5012, 877, 'Ing_1', 47.2
    UNION ALL   SELECT  5012, 877, 'Ing_100', 0.8
    UNION ALL   SELECT  5012, 879, 'Ing_1', 46.6
    UNION ALL   SELECT  5012, 879, 'Ing_100', 0.9
    UNION ALL   SELECT  5012, 880, 'Ing_1', 43.6
    UNION ALL   SELECT  5012, 880, 'Ing_100', 1.2;

    ALTER TABLE dbo.Recipe
    ADD CONSTRAINT PK__Recipe
        PRIMARY KEY CLUSTERED ( Recipe, VersionID, Ingredient );

    CREATE NONCLUSTERED INDEX IX__Recipe__Recipe__VersionID
        ON  dbo.Recipe ( Recipe, VersionID )
    INCLUDE ( Percentage );
END;
GO

we can use our new table to explore some possible solutions. Expanding on the sample output, we add the next recipe into the result set to illustrate the difficulty with the question.

RECIPE --- INGREDIENT_V100 --- PERCENTAGE_V100 --- INGREDIENT_V200 --- INGREDIENT_V200 
4000       Ing_1               23,0                Ing_1               20,0
4000       Ing_100              0,1                Ing_100              0,7
4001       Ing_1                8,3                Ing_1                9,3
4001       Ing_100             72,4                Ing_100             70,5

Columns %_V100 and %_V200 make perfect sense in the case of the 4000 recipe, but quickly lose their meaning as additional recipes are added. The 4001 recipe would need new and separate columns to properly label the data by version, but since the version numbers differ for each and every recipe, that path leads us to a very sparse result set which would be downright annoying to use, or we must alias the columns, losing the version number data.

Solution 1:

Starting with what I feel is the absolute worst way of going about this, let's look at the sparse result set. For the sample data, we would be attempting to generate a query which would look like something along the following lines:

SELECT  p.Recipe,
        [Ingredient_v100] = CASE WHEN p.[100] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v100] = p.[100], 
        [Ingredient_v200] = CASE WHEN p.[200] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v200] = p.[200], 
        [Ingredient_v300] = CASE WHEN p.[300] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v300] = p.[300], 
        [Ingredient_v871] = CASE WHEN p.[871] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v871] = p.[871], 
        [Ingredient_v877] = CASE WHEN p.[877] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v877] = p.[877], 
        [Ingredient_v879] = CASE WHEN p.[879] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v879] = p.[879], 
        [Ingredient_v880] = CASE WHEN p.[880] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v880] = p.[880], 
        [Ingredient_v900] = CASE WHEN p.[900] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v900] = p.[900], 
        [Ingredient_v901] = CASE WHEN p.[901] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v901] = p.[901]
FROM (  SELECT  r.Recipe, 
                r.VersionID, 
                r.Ingredient,
                r.Percentage 
        FROM    dbo.Recipe r ) s
PIVOT ( MAX( s.Percentage )
        FOR s.VersionID IN ( [100], [200], [300], [871], [877], [879], [880], [900], [901] ) ) p
ORDER BY p.Recipe;

On account of the variable number of versions, we can use some dynamic SQL to generate and execute the query.

DECLARE @Piv            NVARCHAR( MAX ),
        @Col            NVARCHAR( MAX ),
        @SQL            NVARCHAR( MAX );

SELECT  @Piv = LEFT( b.Piv, LEN( b.Piv ) - 1 )
FROM (  SELECT  N'[' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '], '
        FROM (  SELECT  DISTINCT r.VersionID 
                FROM    dbo.Recipe r ) a
        ORDER BY a.VersionID
        FOR XML PATH ( '' ) ) b ( Piv );

SELECT  @Col = LEFT( b.Piv, LEN( b.Piv ) - 1 )
FROM (  SELECT  N'[Ingredient_v' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] = CASE'
                    + ' WHEN p.[' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] IS NULL THEN NULL'
                    + ' ELSE p.[Ingredient] END, ' 
                    + '[Percentage_v' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] = p.[' 
                    + CONVERT( VARCHAR( 8 ), a.VersionID ) + '], ' 
        FROM (  SELECT  DISTINCT r.VersionID 
                FROM    dbo.Recipe r ) a
        ORDER BY a.VersionID
        FOR XML PATH ( '' ) ) b ( Piv );

SET @SQL = N'
        SELECT  p.Recipe, ' + @Col + '
        FROM (  SELECT  r.Recipe, 
                        r.VersionID, 
                        r.Ingredient,
                        r.Percentage 
                FROM    dbo.Recipe r ) s
        PIVOT ( MAX( s.Percentage )
                FOR s.VersionID IN ( ' + @Piv + ' ) ) p
        ORDER BY p.Recipe;';
EXECUTE dbo.sp_executesql @statement = @SQL;
GO

This result set plainly sucks. Here's an SQL Fiddle which displays the results, take a peek and let's move on.

Solution 2:

With the sparse result set proving to be next to useless, we can accept losing the version numbers of the recipes and simply order them in ascending version number. For the purpose of the example we'll alias alphabetically, so that versions 100, 200 and 300 of recipe 4000 will receive A, B and C designations, whereas versions 900 and 901 will receive just A and B. The query we would like to generate for this should look somewhat like the following:

SELECT  p.Recipe, 
        [Ingredient_vA] = p.[Ingredient], [Percentage_vA] = ISNULL( p.[Percentage_vA], 0 ),
        [Ingredient_vB] = p.[Ingredient], [Percentage_vB] = ISNULL( p.[Percentage_vB], 0 ),
        [Ingredient_vC] = p.[Ingredient], [Percentage_vC] = ISNULL( p.[Percentage_vC], 0 ),
        [Ingredient_vD] = p.[Ingredient], [Percentage_vD] = ISNULL( p.[Percentage_vD], 0 )
FROM (  SELECT  Lvl = 'Percentage_v' + CHAR( 64 + 
                    DENSE_RANK() OVER ( 
                        PARTITION BY r.Recipe
                        ORDER BY r.VersionID ) ), 
                r.Recipe, 
                r.Ingredient,
                r.Percentage 
        FROM    dbo.Recipe r ) s
PIVOT ( MAX( s.Percentage )
        FOR s.Lvl IN ( [Percentage_vA], [Percentage_vB], [Percentage_vC], [Percentage_vD] ) ) p
ORDER BY p.Recipe;

Similarly to solution one, dynamic SQL can be leveraged to accomplish this.

DECLARE @Piv            NVARCHAR( MAX ),
        @Col            NVARCHAR( MAX ),
        @SQL            NVARCHAR( MAX );

SELECT  @Piv = LEFT( b.Piv, LEN( b.Piv ) - 1 )
FROM (  SELECT  N'[Percentage_v' + CHAR( 64 + a.Lvl ) + '], '
        FROM (  SELECT  DISTINCT Lvl = DENSE_RANK() 
                            OVER (  PARTITION BY r.Recipe
                                    ORDER BY r.VersionID )
                FROM    dbo.Recipe r ) a
        ORDER BY a.Lvl
        FOR XML PATH ( '' ) ) b ( Piv );

SELECT  @Col = LEFT( b.Col, LEN( b.Col ) - 1 )
FROM (  SELECT  N'[Ingredient_v' + CHAR( 64 + a.Lvl ) + '] = p.[Ingredient], '
                    + '[Percentage_v' + CHAR( 64 + a.Lvl ) + '] = ISNULL( p.[Percentage_v'
                    + CHAR( 64 + a.Lvl ) + '], 0 ),'
        FROM (  SELECT  DISTINCT Lvl = DENSE_RANK() 
                            OVER (  PARTITION BY r.Recipe
                                    ORDER BY r.VersionID )
                FROM    dbo.Recipe r ) a
        ORDER BY a.Lvl
        FOR XML PATH ( '' ) ) b ( Col );

SET @SQL = N'
        SELECT  p.Recipe, ' + @Col + '
        FROM (  SELECT  Lvl = ''Percentage_v'' + CHAR( 64 + 
                            DENSE_RANK() OVER ( 
                                PARTITION BY r.Recipe
                                ORDER BY r.VersionID ) ), 
                        r.Recipe, 
                        r.Ingredient,
                        r.Percentage 
                FROM    dbo.Recipe r ) s
        PIVOT ( MAX( s.Percentage )
                FOR s.Lvl IN ( ' + @Piv + ' ) ) p
        ORDER BY p.Recipe;';
EXECUTE dbo.sp_executesql @statement = @SQL;
GO

This ends up in a much prettier result set, as can be seen in this SQL Fiddle, despite the loss of the specific version numbers of each recipe.

Solution 3:

If the loss of the version numbers can not be tolerated, a hybrid approach can be implemented, though the results from each call would be limited to only a single recipe. In effect, our goal SQL would be similar to the first solution, but with an expressly defined Recipe number.

SELECT  p.Recipe, 
        [Ingredient_v100] = CASE WHEN p.[100] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v100] = p.[100], 
        [Ingredient_v200] = CASE WHEN p.[200] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v200] = p.[200], 
        [Ingredient_v300] = CASE WHEN p.[300] IS NULL THEN NULL ELSE p.[Ingredient] END, [Percentage_v300] = p.[300]
FROM (  SELECT  r.Recipe, 
                r.VersionID, 
                r.Ingredient,
                r.Percentage 
        FROM    dbo.Recipe r
        WHERE   r.Recipe = @Recipe ) s
PIVOT ( MAX( s.Percentage )
        FOR s.VersionID IN ( [100], [200], [300] ) ) p
ORDER BY p.Recipe;

Generation could be handled as follows:

DECLARE @Piv            NVARCHAR( MAX ),
        @Col            NVARCHAR( MAX ),
        @Param          NVARCHAR( MAX ),
        @SQL            NVARCHAR( MAX ),
        @Recipe         INTEGER = 4000;

SELECT  @Piv = LEFT( b.Piv, LEN( b.Piv ) - 1 )
FROM (  SELECT  N'[' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '], '
        FROM (  SELECT  DISTINCT r.VersionID 
                FROM    dbo.Recipe r
                WHERE   Recipe = @Recipe ) a
        ORDER BY a.VersionID
        FOR XML PATH ( '' ) ) b ( Piv );

SELECT  @Col = LEFT( b.Piv, LEN( b.Piv ) - 1 )
FROM (  SELECT  N'[Ingredient_v' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] = CASE'
                + ' WHEN p.[' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] IS NULL THEN NULL'
                + ' ELSE p.[Ingredient] END, ' 
                + '[Percentage_v' + CONVERT( VARCHAR( 8 ), a.VersionID ) + '] = p.[' 
                    + CONVERT( VARCHAR( 8 ), a.VersionID ) + '], ' 
        FROM (  SELECT  DISTINCT r.VersionID 
                FROM    dbo.Recipe r
                WHERE   Recipe = @Recipe ) a
        ORDER BY a.VersionID
        FOR XML PATH ( '' ) ) b ( Piv );

SET @Param = N'@Recipe  INTEGER';

SET @SQL = N'
        SELECT  p.Recipe, ' + @Col + '
        FROM (  SELECT  r.Recipe, 
                        r.VersionID, 
                        r.Ingredient,
                        r.Percentage 
                FROM    dbo.Recipe r
                WHERE   r.Recipe = @Recipe ) s
        PIVOT ( MAX( s.Percentage )
                FOR s.VersionID IN ( ' + @Piv + ' ) ) p
        ORDER BY p.Recipe;';
EXECUTE dbo.sp_executesql @statement = @SQL, @param = @Param, @Recipe = @Recipe;
GO

The results could then be accessed per recipe, as shown in this SQL Fiddle or this one.