Pivot without Aggregating with consistent and inconsistent formats:
One of the most common tasks which I came across at least once in almost every project is to convert the structure of the table data into a PIVOT form. But, I don’t have anything to aggregate in the table; all I want is convert the structure as below.
Image-1
Before SQL Server 2005, in the earlier versions we used to perform this through using cursors and temp tables which takes a lot of I/O. Starting with SQL Server 2005 which provides a new operator called PIVOT which actually does this, but the main purpose of using PIVOT is to rotate the table data from vertical to horizontal as shown in above picture, and performs AGGREGATIONS on required columns which will be shown in the final data set.
As per the syntax PIVOT only works with an aggregate:
SELECT <non-pivoted column>, [first pivoted column] AS <column name>, [second pivoted column] AS <column name>, ... [last pivoted column] AS <column name> FROM (<SELECT query that produces the data>) AS <alias for the source query> PIVOT ( <aggregation function>(<column being aggregated>) FOR [<column that contains the values that will become column headers>] IN ( [first pivoted column], [second pivoted column], ... [last pivoted column]) ) AS <alias for the pivot table> <optional ORDER BY clause>; |
But I don’t want to do any aggregate operation as all I want is to transpose the data into a tabular format.
There are 2 approaches to do this based on the requirements:
1. If the column names are consistent across the data then follow the 1st approach
2. If the column names are NOT consistent then the 2nd approach will solve the purpose using a schema table.
Let’s create a table with the data as shown in the Image-1
Approach- 1 :- columns (NAME,ADDRESS,PHONE ,EMAIL) is consistent across all the sets
create table customers ( IID INT IDENTITY(1,1),col0 varchar(90),col1 varchar(90)) GO -- 1st set insert into customers values ( 'NAME','A_NM') insert into customers values ( 'ADDRESS','A_ADR') insert into customers values ( 'PHONE','A_PH') insert into customers values ( 'EMAIL','A_EMAIL') -- 2nd set insert into customers values ( 'NAME','B_BM') insert into customers values ( 'ADDRESS','B_ADR') insert into customers values ( 'PHONE','B_XYZ') insert into customers values ( 'EMAIL','B_PH') -- 3rd set insert into customers values ( 'NAME','C_BM') insert into customers values ( 'ADDRESS','C_ADR') insert into customers values ( 'PHONE','C_XYZ') insert into customers values ( 'EMAIL','C_EMAIL') |
To transpose the data into a pivot report, I have used the partition function and included the results in a CTE and select the data in the CTE using PIVOT operator
script –A1
WITH CUST_CTE(iid, COL0,COL1,RID) AS ( SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS ) SELECT NAME,ADDRESS,PHONE,EMAIL FROM (SELECT COL0,COL1,RID FROM CUST_CTE)C PIVOT ( max(COL1) FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL) ) AS PivotTable; |
Output:
Explanation: the key to the whole solution is to partition the sets in the data, the column RID has the partitioned values which actually works with the PIVOT. To make it more clear, look at the output of the CTE. You can see the data sets are partitioned with a column in RID. So when you execute the script A1, the RID makes the rows to unique.
WITH CUST_CTE(iid, COL0,COL1,RID) AS ( SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS ) SELECT * FROM CUST_CTE ORDER BY 1,4 |
Approach- 2:- considering the column NAME acts as a primary key and present across all the sets , but rest of the columns (ADDRESS, PHONE, and EMAIL) are NOT consistent across all the sets.
Lets create the table with the sample set to simulate the problem.
TRUNCATE TABLE CUSTOMERS GO -- 1st set insert into customers values ( 'NAME','A_NM') insert into customers values ( 'ADDRESS','A_ADR') insert into customers values ( 'PHONE','A_PH') insert into customers values ( 'EMAIL','A_EMAIL') -- 2nd set insert into customers values ( 'NAME','B_BM') insert into customers values ( 'ADDRESS','B_ADR') --insert into customers values ( 'PHONE','B_XYZ') insert into customers values ( 'EMAIL','B_PH') -- 3rd set insert into customers values ( 'NAME','C_BM') insert into customers values ( 'ADDRESS','C_ADR') insert into customers values ( 'PHONE','C_XYZ') insert into customers values ( 'EMAIL','C_EMAIL') GO SELECT * FROM CUSTOMERS |
When you execute the Script-A, you will see an incorrect results in the transposed data.
Script –A2
WITH CUST_CTE(iid, COL0,COL1,RID) AS ( SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS ) SELECT NAME,ADDRESS,PHONE,EMAIL FROM (SELECT COL0,COL1,RID FROM CUST_CTE)C PIVOT ( max(COL1) FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL) ) AS PivotTable; |
Output:
The value “C_XYZ should belong to C_BM , but its moved to B_BM.
As I mentioned, the key is the partition column RID, let’s look at what is there in the RID by executing only the statement in CTE
WITH CUST_CTE(iid, COL0,COL1,RID) AS ( SELECT iid, COL0,COL1 , ROW_NUMBER() OVER (PARTITION BY (COL0)ORDER BY IID) AS RID FROM CUSTOMERS ) SELECT * FROM CUST_CTE ORDER BY 1,4 |
So, the partitioned value RID for C_XYZ should be 3 as it belongs to 3rd set (look at the insert statement above) , but here its 2 , because I had partitioned the col0 order by IID.
So, if I can partition the data based on the sets, then it will resolve the problem, but there is no such function exists in SQL Server to partition based on defined sets. So I followed the below technique to partition the data based on the column value (NAME).
with cte_group(RID,IID,COL0,COL1) AS ( select row_number() over (ORDER BY IID )AS RID,* from --customers ~ select statement with adding one row at the end with value Name (SELECT * FROM customers UNION SELECT MAX(IID)+1,'NAME','' FROM customers)A where col0 = 'NAME' ) -- sent data to a temp table SELECT * INTO #GROUPS FROM CUSTOMERS a join (SELECT A.RID, A.IID as strt ,B.IID as ends FROM cte_group A JOIN cte_group B ON B.RID - A.RID = 1) b on a.iid >= b.strt and a.iid <b.ends select * from #GROUPS |
Execute the below script which shows the partitioned column based on sets without using partition function
select from #GROUPS |
Output:
Now, you will get the desired output as the RID is able to show the correct data based on the sets by using the below script.
SELECT NAME,ADDRESS,PHONE,EMAIL FROM (SELECT COL0,COL1,RID FROM #GROUPS)C PIVOT ( max(COL1) FOR COL0 IN (NAME,[ADDRESS],PHONE,EMAIL) ) AS PivotTable; |
OUTPUT:
22 comments:
WHAT IF IT IS NOT KNOWN HOW MANY COLUMNS WOULD BE THERE. SO FOR EXAMPLE THERE MAY BE MORE VALUES OTHER THEN PHONE, NAME ETC WHICH IS NOT KNOWN AT COMPILE TIME
i will try this as well..
How to use Pivot in this scenerio? I want to avoid using aggregate function in my pivot statement. Is there any workaround.
you cant use the PIVOT function with out using an aggregate and again its all depends on your exact requirement. post your exact requirement , we will look at it.
I tried it on my tables, I'm getting an error:
Incorrect syntax near the keyword 'with'.
This is what I've tried:
select * from IdentityProperties i, Properties p where i.IdentityID = p.ID
with CTE (T_ID, ID, Name, Value)
AS
(
SELECT
row_number() OVER(PARTITION BY Name ORDER BY ID) AS T_ID, ID,Name , Value
FROM dbo.IdentityProperties i inner join dbo.Properties p on i.PropertyID = p.ID
)
-- the line before the with must contain semicolon(;), use the below script
select * from IdentityProperties i, Properties p where i.IdentityID = p.ID;
with CTE (T_ID, ID, Name, Value)
AS
(
SELECT
row_number() OVER(PARTITION BY Name ORDER BY ID) AS T_ID, ID,Name , Value
FROM dbo.IdentityProperties i inner join dbo.Properties p on i.PropertyID = p.ID
)
I need to use your example to populate 49 columns for diagnoses and another 49 columns for proceedures. The common field will be account. Is this feasible?
Hi,
Can you please give me how to use pivot when you don't know the number of columns and with out aggregate function?
can we use dynamic sql to do this?
yes, you can use dynamic pivot. checkout the article by madhavan at the below link.
http://sqlblogcasts.com/blogs/madhivanan/archive/2008/08/27/dynamic-pivot-in-sql-server-2005.aspx
If your data set in the first instance has data which has an ID which forces grouping on it, you dont need to use ROW_NUMBER() PARTITION etc. The use of the CTE is unnecessary and adds complexity here it is simply be used as an Alias, when a a few more brackets would do the trick.
We still need the CTE to generate TID(partition)
Hi
This is a simple solution, for the case when you don't know the number of columns
http://valonkastrati.wordpress.com/2013/05/29/sql-rows-into-columns-without-using-pivot/
Happy coding!
Hi Valonk,
I like your post , but i think you need to modify a bit as its missing the format still.
if you take the table in this poast as an example "create table customers..." then you will see the difference.
Happy coding.
Check this link for pivot without aggregate function::
http://sandipgsql.blogspot.in/2013/06/pivot-without-aggregate-function-in-sql.html
purpose of pivot function is to eliminate cursor. can you compare performance results from both the approaches
Nice explanation.. good job...
suppose the pivoted column has say 5 different valuesand we need all of those(and not just the max or min value).
by default it works with 5 different values
Thank a lot - your post saved me hours of work! I used Approach 1 (By using LEFT OUTER JOIN I managed to compensate MISSING values) along with user-defined table type (instead of CTE) and dynamic SQL. Now I have a "GENERIC" stored procedure that could return a recordset with "OPEN SCHEMA" with normal representation of row. Finally tamed EAV "beast"!
o happy day! thanks1
Thanks for the post. Very useful.
Thanks - this helped a lot!
Post a Comment