Pivot with out aggregate

Pivot without Aggregating with consistent and inconsistent formats:

One of the most common tasks which I came across at least once in almost every project is to convert the structure of the table data into a PIVOT form. But, I don’t have anything to aggregate in the table; all I want is convert the structure as below.

Image-1

image

Before SQL Server 2005, in the earlier versions we used to perform this through using cursors and temp tables which takes a lot of I/O. Starting with SQL Server 2005 which provides a new operator called PIVOT which actually does this, but the main purpose of using PIVOT is to rotate the table data from vertical to horizontal as shown in above picture, and performs AGGREGATIONS on required columns which will be shown in the final data set.

As per the syntax PIVOT only works with an aggregate:

SELECT <non-pivoted column>,

[first pivoted column] AS <column name>,

[second pivoted column] AS <column name>,

...

[last pivoted column] AS <column name>

FROM

(<SELECT query that produces the data>)

AS <alias for the source query>

PIVOT

(

<aggregation function>(<column being aggregated>)

FOR

[<column that contains the values that will become column headers>]

IN ( [first pivoted column], [second pivoted column],

... [last pivoted column])

) AS <alias for the pivot table>

<optional ORDER BY clause>;

But I don’t want to do any aggregate operation as all I want is to transpose the data into a tabular format.

There are 2 approaches to do this based on the requirements:

1. If the column names are consistent across the data then follow the 1st approach

2. If the column names are NOT consistent then the 2nd approach will solve the purpose using a schema table.

Let’s create a table with the data as shown in the Image-1

Approach- 1 :- columns (NAME,ADDRESS,PHONE ,EMAIL) is consistent across all the sets

create table customers ( IID INT IDENTITY(1,1),col0 varchar(90),col1 varchar(90))

GO

-- 1st set

insert into customers values ( 'NAME','A_NM')

insert into customers values ( 'ADDRESS','A_ADR')

insert into customers values ( 'PHONE','A_PH')

insert into customers values ( 'EMAIL','A_EMAIL')

-- 2nd set

insert into customers values ( 'NAME','B_BM')

insert into customers values ( 'ADDRESS','B_ADR')

insert into customers values ( 'PHONE','B_XYZ')

insert into customers values ( 'EMAIL','B_PH')

-- 3rd set

insert into customers values ( 'NAME','C_BM')

insert into customers values ( 'ADDRESS','C_ADR')

insert into customers values ( 'PHONE','C_XYZ')

insert into customers values ( 'EMAIL','C_EMAIL')

To transpose the data into a pivot report, I have used the partition function and included the results in a CTE and select the data in the CTE using PIVOT operator
script –A1

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM CUST_CTE)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

Output:

clip_image004

Explanation: the key to the whole solution is to partition the sets in the data, the column RID has the partitioned values which actually works with the PIVOT. To make it more clear, look at the output of the CTE. You can see the data sets are partitioned with a column in RID. So when you execute the script A1, the RID makes the rows to unique.

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT * FROM CUST_CTE ORDER BY 1,4

clip_image006

Approach- 2:- considering the column NAME acts as a primary key and present across all the sets , but rest of the columns (ADDRESS, PHONE, and EMAIL) are NOT consistent across all the sets.
Lets create the table with the sample set to simulate the problem.

TRUNCATE TABLE CUSTOMERS

GO

-- 1st set

insert into customers values ( 'NAME','A_NM')

insert into customers values ( 'ADDRESS','A_ADR')

insert into customers values ( 'PHONE','A_PH')

insert into customers values ( 'EMAIL','A_EMAIL')

-- 2nd set

insert into customers values ( 'NAME','B_BM')

insert into customers values ( 'ADDRESS','B_ADR')

--insert into customers values ( 'PHONE','B_XYZ')

insert into customers values ( 'EMAIL','B_PH')

-- 3rd set

insert into customers values ( 'NAME','C_BM')

insert into customers values ( 'ADDRESS','C_ADR')

insert into customers values ( 'PHONE','C_XYZ')

insert into customers values ( 'EMAIL','C_EMAIL')

GO

SELECT * FROM CUSTOMERS

When you execute the Script-A, you will see an incorrect results in the transposed data.
Script –A2

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM CUST_CTE)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

Output:

clip_image008

The value “C_XYZ should belong to C_BM , but its moved to B_BM.

As I mentioned, the key is the partition column RID, let’s look at what is there in the RID by executing only the statement in CTE

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT * FROM CUST_CTE ORDER BY 1,4

clip_image008[1]clip_image009                  clip_image011

So, the partitioned value RID for C_XYZ should be 3 as it belongs to 3rd set (look at the insert statement above) , but here its 2 , because I had partitioned the col0 order by IID.
So, if I can partition the data based on the sets, then it will resolve the problem, but there is no such function exists in SQL Server to partition based on defined sets. So I followed the below technique to partition the data based on the column value (NAME).

with cte_group(RID,IID,COL0,COL1)

AS

(

select row_number() over (ORDER BY IID )AS RID,* from

--customers ~ select statement with adding one row at the end with value Name

(SELECT * FROM customers UNION SELECT MAX(IID)+1,'NAME','' FROM customers)A

where col0 = 'NAME'

)

-- sent data to a temp table

SELECT * INTO #GROUPS FROM CUSTOMERS a

join (SELECT A.RID, A.IID as strt ,B.IID as ends FROM cte_group A JOIN cte_group B ON B.RID - A.RID = 1) b

on a.iid >= b.strt and a.iid <b.ends

select * from #GROUPS

Execute the below script which shows the partitioned column based on sets without using partition function

select from #GROUPS

Output:

clip_image013

Now, you will get the desired output as the RID is able to show the correct data based on the sets by using the below script.

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM #GROUPS)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

OUTPUT:

clip_image015

Post a Comment