Tuesday, September 6, 2011

Concatenating values from multiple rows using T-SQL


This is one of the challenges taken from BeyondRelational.

Below is the problem statement.

The challenge is all about merging the data based on certain conditions.


Sample Input Data

ID             Name

-------------  ----------------------
1              Deepak Kumar Goyal
2              Niladri Biswas
2              Pratik Shaw
3              Sumi Girijan
3              Suresh Beldar
3              Jeeva Baby


The challenge is that if the Id’s are exactly two in number (e.g. count of ID = 2 is exactly two ) then the names should be concatenated with 'OR'. When Ids are more than two (e.g. Id # 3 has a count of 3) , the values should be concatenated with 'AND'

Expected Output


ID      Name
------- --------------------------------------------------
1       Deepak Kumar Goyal
2       Niladri Biswas OR Pratik Shaw
3       Sumi Girijan AND Suresh Beldar AND Jeeva Baby




Script to generate the data


DECLARE @t TABLE(
ID INT,
NAME VARCHAR(MAX)
)

INSERT INTO @t (ID,[NAME]) SELECT 1, 'Deepak Kumar Goyal'
INSERT INTO @t (ID,[NAME]) SELECT 2, 'Niladri Biswas'
INSERT INTO @t (ID,[NAME]) SELECT 2, 'Pratik Shaw'
INSERT INTO @t (ID,[NAME]) SELECT 3, 'Sumi Girijan'
INSERT INTO @t (ID,[NAME]) SELECT 3, 'Suresh Beldar'
INSERT INTO @t (ID,[NAME]) SELECT 3, 'Jeeva Baby'

Also, below are the other Restrictions or notes that should be considered.



  • The program should run in SQL SERVER 2005+.
  • Column names should respect the desired output shown
  • There should be exactly 1 space before and after the OR/AND clause.
  • Output should be in the order of Ascending Id
  • Names should not be repeated even if it is given in the sample input i.e. if X is appearing more than once in the input for id = 10, the desired output will have only one such record.
  • Names cannot be blank e.g. for id = 10 if the Name field is ‘’(blank), the program should ignore that.
  • The program has to be done by a single query and should begin either with a SELECT or WITH statement with no variables, temporary table, table variables permitted.
  • You cannot use RBAR, cursors, loops etc. in your program. 



  • Solution


    • The CTE 'DistinctNames' gets the distinct Ids and Names excluding the Names that are blank or NULL.
    • Next what I do is get the count of all the distinct Names by Id which will help me to figure out whether to add AND or OR in between Names.
    • Then, we add row number to the data which is obtained by joining the above two data sets.
    • The last CTE is the recursive CTE which is used to traverse data row by row for each Id.


    ;WITH DistinctNames

    AS
    (

          SELECT DISTINCT t.ID,T.NAME
          FROM @t t
          WHERE t.NAME IS NOT NULL AND ISNULL(t.NAME,'') <> ''                   
    ),
    GetTotals
    AS
    (
          SELECT ID,COUNT(Distinct NAME) AS total FROM DistinctNames
          GROUP BY ID
    ),
    Result
    AS
    (    
          SELECT d.ID,d.NAME, g.total, ROW_NUMBER() OVER(PARTITION BY d.ID ORDER BY d.ID) as row_num
          FROM DistinctNames d
                INNER JOIN GetTotals g
                      ON d.ID = g.ID
    ),
    Result2
    AS
    (
          SELECT ID, NAME, row_num
          FROM Result
          WHERE row_num = 1
         
          UNION ALL
         
          SELECT r2.ID,
                   CASE WHEN r.total = 2 THEN r2.NAME + ' AND '+r.NAME     
                            WHEN r.total > 2 THEN r2.NAME + ' OR '+r.NAME
                      END as NAME,
                      r.row_num                                      
          FROM Result2 r2
                INNER JOIN Result r
                      ON r2.ID = r.ID AND (r2.row_num + 1) =  r.row_num
    )
    SELECT r.ID,r.NAME FROM Result2 r
    WHERE r.row_num =
    (
          SELECT MAX(r2.row_num) FROM Result2 r2 WHERE r2.ID = r.ID
    )
    ORDER BY ID


    You can find other solutions for the same problem at BeyondRelational

    No comments:

    Post a Comment