Friday, October 9, 2009

Pivot with out aggregate

Pivot without Aggregating with consistent and inconsistent formats:

One of the most common tasks which I came across at least once in almost every project is to convert the structure of the table data into a PIVOT form. But, I don’t have anything to aggregate in the table; all I want is convert the structure as below.

Image-1

image

Before SQL Server 2005, in the earlier versions we used to perform this through using cursors and temp tables which takes a lot of I/O. Starting with SQL Server 2005 which provides a new operator called PIVOT which actually does this, but the main purpose of using PIVOT is to rotate the table data from vertical to horizontal as shown in above picture, and performs AGGREGATIONS on required columns which will be shown in the final data set.

As per the syntax PIVOT only works with an aggregate:

SELECT <non-pivoted column>,

[first pivoted column] AS <column name>,

[second pivoted column] AS <column name>,

...

[last pivoted column] AS <column name>

FROM

(<SELECT query that produces the data>)

AS <alias for the source query>

PIVOT

(

<aggregation function>(<column being aggregated>)

FOR

[<column that contains the values that will become column headers>]

IN ( [first pivoted column], [second pivoted column],

... [last pivoted column])

) AS <alias for the pivot table>

<optional ORDER BY clause>;

But I don’t want to do any aggregate operation as all I want is to transpose the data into a tabular format.

There are 2 approaches to do this based on the requirements:

1. If the column names are consistent across the data then follow the 1st approach

2. If the column names are NOT consistent then the 2nd approach will solve the purpose using a schema table.

Let’s create a table with the data as shown in the Image-1

Approach- 1 :- columns (NAME,ADDRESS,PHONE ,EMAIL) is consistent across all the sets

create table customers ( IID INT IDENTITY(1,1),col0 varchar(90),col1 varchar(90))

GO

-- 1st set

insert into customers values ( 'NAME','A_NM')

insert into customers values ( 'ADDRESS','A_ADR')

insert into customers values ( 'PHONE','A_PH')

insert into customers values ( 'EMAIL','A_EMAIL')

-- 2nd set

insert into customers values ( 'NAME','B_BM')

insert into customers values ( 'ADDRESS','B_ADR')

insert into customers values ( 'PHONE','B_XYZ')

insert into customers values ( 'EMAIL','B_PH')

-- 3rd set

insert into customers values ( 'NAME','C_BM')

insert into customers values ( 'ADDRESS','C_ADR')

insert into customers values ( 'PHONE','C_XYZ')

insert into customers values ( 'EMAIL','C_EMAIL')

To transpose the data into a pivot report, I have used the partition function and included the results in a CTE and select the data in the CTE using PIVOT operator
script –A1

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM CUST_CTE)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

Output:

clip_image004

Explanation: the key to the whole solution is to partition the sets in the data, the column RID has the partitioned values which actually works with the PIVOT. To make it more clear, look at the output of the CTE. You can see the data sets are partitioned with a column in RID. So when you execute the script A1, the RID makes the rows to unique.

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT * FROM CUST_CTE ORDER BY 1,4

clip_image006

Approach- 2:- considering the column NAME acts as a primary key and present across all the sets , but rest of the columns (ADDRESS, PHONE, and EMAIL) are NOT consistent across all the sets.
Lets create the table with the sample set to simulate the problem.

TRUNCATE TABLE CUSTOMERS

GO

-- 1st set

insert into customers values ( 'NAME','A_NM')

insert into customers values ( 'ADDRESS','A_ADR')

insert into customers values ( 'PHONE','A_PH')

insert into customers values ( 'EMAIL','A_EMAIL')

-- 2nd set

insert into customers values ( 'NAME','B_BM')

insert into customers values ( 'ADDRESS','B_ADR')

--insert into customers values ( 'PHONE','B_XYZ')

insert into customers values ( 'EMAIL','B_PH')

-- 3rd set

insert into customers values ( 'NAME','C_BM')

insert into customers values ( 'ADDRESS','C_ADR')

insert into customers values ( 'PHONE','C_XYZ')

insert into customers values ( 'EMAIL','C_EMAIL')

GO

SELECT * FROM CUSTOMERS

When you execute the Script-A, you will see an incorrect results in the transposed data.
Script –A2

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM CUST_CTE)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

Output:

clip_image008

The value “C_XYZ should belong to C_BM , but its moved to B_BM.

As I mentioned, the key is the partition column RID, let’s look at what is there in the RID by executing only the statement in CTE

WITH CUST_CTE(iid, COL0,COL1,RID)

AS

(

SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS

)

SELECT * FROM CUST_CTE ORDER BY 1,4

clip_image008[1]clip_image009                  clip_image011

So, the partitioned value RID for C_XYZ should be 3 as it belongs to 3rd set (look at the insert statement above) , but here its 2 , because I had partitioned the col0 order by IID.
So, if I can partition the data based on the sets, then it will resolve the problem, but there is no such function exists in SQL Server to partition based on defined sets. So I followed the below technique to partition the data based on the column value (NAME).

with cte_group(RID,IID,COL0,COL1)

AS

(

select row_number() over (ORDER BY IID )AS RID,* from

--customers ~ select statement with adding one row at the end with value Name

(SELECT * FROM customers UNION SELECT MAX(IID)+1,'NAME','' FROM customers)A

where col0 = 'NAME'

)

-- sent data to a temp table

SELECT * INTO #GROUPS FROM CUSTOMERS a

join (SELECT A.RID, A.IID as strt ,B.IID as ends FROM cte_group A JOIN cte_group B ON B.RID - A.RID = 1) b

on a.iid >= b.strt and a.iid <b.ends

select * from #GROUPS

Execute the below script which shows the partitioned column based on sets without using partition function

select from #GROUPS

Output:

clip_image013

Now, you will get the desired output as the RID is able to show the correct data based on the sets by using the below script.

SELECT NAME,ADDRESS,PHONE,EMAIL

FROM

(SELECT COL0,COL1,RID

FROM #GROUPS)C

PIVOT

(

max(COL1)

FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL)

) AS PivotTable;

OUTPUT:

clip_image015

22 comments:

Deepika and Tarun's blog said...

WHAT IF IT IS NOT KNOWN HOW MANY COLUMNS WOULD BE THERE. SO FOR EXAMPLE THERE MAY BE MORE VALUES OTHER THEN PHONE, NAME ETC WHICH IS NOT KNOWN AT COMPILE TIME

calsql said...

i will try this as well..

Shalabh said...

How to use Pivot in this scenerio? I want to avoid using aggregate function in my pivot statement. Is there any workaround.

calsql said...

you cant use the PIVOT function with out using an aggregate and again its all depends on your exact requirement. post your exact requirement , we will look at it.

Anonymous said...

I tried it on my tables, I'm getting an error:
Incorrect syntax near the keyword 'with'.
This is what I've tried:
select * from IdentityProperties i, Properties p where i.IdentityID = p.ID
with CTE (T_ID, ID, Name, Value)
AS
(
SELECT
row_number() OVER(PARTITION BY Name ORDER BY ID) AS T_ID, ID,Name , Value
FROM dbo.IdentityProperties i inner join dbo.Properties p on i.PropertyID = p.ID
)

calsql said...

-- the line before the with must contain semicolon(;), use the below script

select * from IdentityProperties i, Properties p where i.IdentityID = p.ID;
with CTE (T_ID, ID, Name, Value)
AS
(
SELECT
row_number() OVER(PARTITION BY Name ORDER BY ID) AS T_ID, ID,Name , Value
FROM dbo.IdentityProperties i inner join dbo.Properties p on i.PropertyID = p.ID
)

SidC said...

I need to use your example to populate 49 columns for diagnoses and another 49 columns for proceedures. The common field will be account. Is this feasible?

Anonymous said...

Hi,

Can you please give me how to use pivot when you don't know the number of columns and with out aggregate function?
can we use dynamic sql to do this?

chandra pathivada said...

yes, you can use dynamic pivot. checkout the article by madhavan at the below link.
http://sqlblogcasts.com/blogs/madhivanan/archive/2008/08/27/dynamic-pivot-in-sql-server-2005.aspx

Anonymous said...

If your data set in the first instance has data which has an ID which forces grouping on it, you dont need to use ROW_NUMBER() PARTITION etc. The use of the CTE is unnecessary and adds complexity here it is simply be used as an Alias, when a a few more brackets would do the trick.


chandra sekhar pathivada said...

We still need the CTE to generate TID(partition)

ValonK said...

Hi

This is a simple solution, for the case when you don't know the number of columns

http://valonkastrati.wordpress.com/2013/05/29/sql-rows-into-columns-without-using-pivot/

Happy coding!

chandra sekhar said...

Hi Valonk,
I like your post , but i think you need to modify a bit as its missing the format still.
if you take the table in this poast as an example "create table customers..." then you will see the difference.
Happy coding.

Anonymous said...

Check this link for pivot without aggregate function::

http://sandipgsql.blogspot.in/2013/06/pivot-without-aggregate-function-in-sql.html

chandra sekhar said...

purpose of pivot function is to eliminate cursor. can you compare performance results from both the approaches

Anonymous said...

Nice explanation.. good job...

Anonymous said...

suppose the pivoted column has say 5 different valuesand we need all of those(and not just the max or min value).

chandra sekhar said...

by default it works with 5 different values

frigate said...

Thank a lot - your post saved me hours of work! I used Approach 1 (By using LEFT OUTER JOIN I managed to compensate MISSING values) along with user-defined table type (instead of CTE) and dynamic SQL. Now I have a "GENERIC" stored procedure that could return a recordset with "OPEN SCHEMA" with normal representation of row. Finally tamed EAV "beast"!

Anonymous said...

o happy day! thanks1

Anonymous said...

Thanks for the post. Very useful.

Anonymous said...

Thanks - this helped a lot!

Featured Post

SQL Server AWS Migration BCP

stored procedure to generate BCP scritps to migrate the SQL Server database. Developed this stored procedure on my labs to simulate the Mi...

Contributors