[SQL] Is this query optimized? - Tilted Forum Project Discussion Community

Stompy · 04-15-2004, 10:11 AM

I'll try to keep this simple and easy to follow

Let's say I have a system where I run reports on Sales Agents. I return the data once a day and tally up the number of meetings they attended along with a transaction (sales) summary.

The data is to be presented on ONE line.

To show this example, I have 3 tables: Agents, Meetings, and Transactions.

Agents contains:

Code:

AgentID     Name                                               
----------- -------------------------------------------------- 
1           Bob
2           Bill

Meetings contains:

Code:

MeetingID   AgentID     MeetingDate                                            
----------- ----------- ------------------------------------------------------ 
1           1           2004-04-15 11:00:00.000
2           1           2004-04-15 12:00:00.000
3           1           2004-04-15 13:00:00.000
4           2           2004-04-15 09:00:00.000

Transactions contains:

Code:

TransID     AgentID     TransTypeID Amount      
----------- ----------- ----------- ----------- 
1           1           1           5
2           1           2           10
3           1           3           15
4           1           1           10
5           2           3           50

I need the results returned ONE agent per line, total meetings, and each transaction type (with amount sum) displayed as shown in the following:

Code:

Name                                               Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3        
-------------------------------------------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 
Bill                                               1           0           0           0           0           1           50
Bob                                                3           2           15          1           10          1           15

The query I'm using for this is:

Code:

SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type1 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Sum1 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Type2 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Sum2 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name

The question I have is... is there a more efficient way to get the results I want? It seems highly inefficient to be querying against the same table six times for every record returned.

In the real world situation I'm working with, there are 6 different types with corresponding amounts that need to be displayed, so that'd be 12 queries per row. There could be tens of thousands of agents with hundreds of thousands of transaction records, so doing something like this would seem overkill on the server.

I COULD query it like so:

Code:

SELECT
	dbo.Agents.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	dbo.Transactions.TransTypeID, 
	SUM(DISTINCT dbo.Transactions.Amount) AS TransAmount
FROM         
	dbo.Agents LEFT OUTER JOIN dbo.Transactions ON dbo.Agents.AgentID = dbo.Transactions.AgentID 
		LEFT OUTER JOIN dbo.Meetings ON dbo.Agents.AgentID = dbo.Meetings.AgentID
GROUP BY 
	dbo.Agents.Name, 
	dbo.Transactions.TransTypeID
ORDER BY 
	dbo.Agents.Name

which will return the following results:

Code:

Name                                               Meetings    TransTypeID TransAmount 
-------------------------------------------------- ----------- ----------- ----------- 
Bill                                               1           3           50
Bob                                                3           1           15
Bob                                                3           2           10
Bob                                                3           3           15

...which would require me to loop through EACH record, check the TransType, then put it in its proper position in the table in HTML, but this is done in ASP.NET and it's a headache to add items "manually" like that.
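A caveat on the grouped query above: joining both Transactions and Meetings to Agents pairs every transaction with every meeting, and SUM(DISTINCT ...) only hides that duplication while an agent's amounts within a type all differ. Two type 1 transactions of 10 each would be reported as 10, not 20. A minimal sketch of one way around this, assuming the same three tables, is to total each table per agent in a derived table before joining (the aliases and the TransCount column are illustrative):

```sql
-- Each derived table is aggregated on its own, so the join to Agents
-- never multiplies transaction rows by meeting rows.
SELECT
	A.Name,
	ISNULL(M.Meetings, 0) AS Meetings,
	T.TransTypeID,
	T.TransCount,
	T.TransAmount
FROM
	dbo.Agents A
	LEFT OUTER JOIN (
		-- one row per agent
		SELECT AgentID, COUNT(*) AS Meetings
		FROM dbo.Meetings
		GROUP BY AgentID
	) M ON M.AgentID = A.AgentID
	LEFT OUTER JOIN (
		-- one row per agent and transaction type
		SELECT AgentID, TransTypeID, COUNT(*) AS TransCount, SUM(Amount) AS TransAmount
		FROM dbo.Transactions
		GROUP BY AgentID, TransTypeID
	) T ON T.AgentID = A.AgentID
ORDER BY
	A.Name,
	T.TransTypeID
```

This still returns one row per agent per type, so it does not remove the need to pivot the rows in the page code.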

[edit]
Not to mention it would throw off paging. If the above dataset was limited to 2 records per page, instead of showing one for Bill and one for Bob, you'd end up with one for Bill, and 1/3 of the data for Bob.

However, if that's my only option, then I guess I have no choice

Just wanted other opinions on what I should do, or the most recommended route to tackle this. Thanks
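One commonly used alternative to the six correlated subqueries is conditional aggregation: read Transactions once and let CASE expressions route each row into the right column. The following is a sketch only, assuming the tables and columns from the example above and that MeetingID is unique; the aliases M and T are illustrative.

```sql
SELECT
	A.Name,
	ISNULL(M.Meetings, 0) AS Meetings,
	ISNULL(T.Type1, 0) AS Type1,
	ISNULL(T.Sum1, 0) AS Sum1,
	ISNULL(T.Type2, 0) AS Type2,
	ISNULL(T.Sum2, 0) AS Sum2,
	ISNULL(T.Type3, 0) AS Type3,
	ISNULL(T.Sum3, 0) AS Sum3
FROM
	dbo.Agents A
	LEFT OUTER JOIN (
		-- one row per agent: number of meetings
		SELECT AgentID, COUNT(*) AS Meetings
		FROM dbo.Meetings
		GROUP BY AgentID
	) M ON M.AgentID = A.AgentID
	LEFT OUTER JOIN (
		-- one row per agent: a count and a total for each type, in a single pass
		SELECT
			AgentID,
			SUM(CASE WHEN TransTypeID = 1 THEN 1 ELSE 0 END) AS Type1,
			SUM(CASE WHEN TransTypeID = 1 THEN Amount ELSE 0 END) AS Sum1,
			SUM(CASE WHEN TransTypeID = 2 THEN 1 ELSE 0 END) AS Type2,
			SUM(CASE WHEN TransTypeID = 2 THEN Amount ELSE 0 END) AS Sum2,
			SUM(CASE WHEN TransTypeID = 3 THEN 1 ELSE 0 END) AS Type3,
			SUM(CASE WHEN TransTypeID = 3 THEN Amount ELSE 0 END) AS Sum3
		FROM dbo.Transactions
		GROUP BY AgentID
	) T ON T.AgentID = A.AgentID
ORDER BY
	A.Name
```

The result keeps one row per agent, so paging is unaffected. Whether it actually runs faster than the subquery version depends on the indexes and data volume, so it is worth comparing both in the execution plan.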

Stompy · 04-15-2004, 11:43 AM

**UPDATE**

The 6 types I mentioned... turns out there can be an infinite number. The types are user defined.

Which brings the question... is there a way to dynamically add columns in a SQL statement, or would I just have to rebuild the stored procedure each time a type is added?

For example, if someone decided to add a type 4 to the transaction table I gave in my example, I'd have to take the existing stored procedure and, through code, add:

Code:

	Type4 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),
	Sum4 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),

..to the existing code that selects those rows.

Also, another reason I can't really use the last GROUP BY method shown in the post above this one (the one with the "SUM(DISTINCT") is that it will throw off paging. If the page size is 10 and there are 10 different types, page 1 would hold only one agent's record as opposed to 10 unique agent records.
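On adding columns dynamically: one option is to build the statement as a string inside a single stored procedure and run it with EXEC, instead of rebuilding the procedure whenever a type is added. This is a rough sketch, not a tested implementation. It assumes the set of types can be read from the TransTypeID values present in dbo.Transactions (a separate types table, if one exists, would be the better source), and all variable names are made up for illustration.

```sql
DECLARE @id int, @n varchar(10), @cols varchar(8000), @sql varchar(8000)
SET @cols = ''

-- Walk the distinct type IDs in ascending order.
SELECT @id = MIN(TransTypeID) FROM dbo.Transactions
WHILE @id IS NOT NULL
BEGIN
	SET @n = CAST(@id AS varchar(10))
	-- Append a count column and a sum column for this type.
	SET @cols = @cols
		+ ', SUM(CASE WHEN T.TransTypeID = ' + @n + ' THEN 1 ELSE 0 END) AS Type' + @n
		+ ', SUM(CASE WHEN T.TransTypeID = ' + @n + ' THEN T.Amount ELSE 0 END) AS Sum' + @n
	-- Next higher type ID; MIN over no rows yields NULL, which ends the loop.
	SELECT @id = MIN(TransTypeID) FROM dbo.Transactions WHERE TransTypeID > @id
END

-- Meetings are counted per agent first so they do not multiply transaction rows.
SET @sql = 'SELECT A.Name, ISNULL(M.Meetings, 0) AS Meetings' + @cols + '
FROM dbo.Agents A
	LEFT OUTER JOIN (SELECT AgentID, COUNT(*) AS Meetings FROM dbo.Meetings GROUP BY AgentID) M
		ON M.AgentID = A.AgentID
	LEFT OUTER JOIN dbo.Transactions T ON T.AgentID = A.AgentID
GROUP BY A.AgentID, A.Name, M.Meetings
ORDER BY A.Name'

EXEC (@sql)
```

The type IDs here are integers read from the table, so nothing user-typed is concatenated into the string. If the list of types to show comes from user input instead, it should be validated before it is added. A varchar(8000) variable also caps how many types fit in one statement.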

nothingx · 04-15-2004, 07:36 PM

First of all, man, that must be the longest question I've seen posted...

Second, you said that your data is collected and tallied once per day, right? So, what difference does it make if the query is optimized or not? Kick off the tally procedure at 12AM and let it run for 4 hours, if need be. Also, I don't know a whole lot about SQL, but I'm pretty sure it's all interpreted. Perhaps the interpreter is "smart" enough to make some kind of optimizations on your query or perhaps do some intelligent caching so it isn't too inefficient.

Have you actually implemented this system and noticed a performance issue? Just curious... and sorry I don't have a better answer to your question.

Stompy · 04-16-2004, 09:15 PM

What I meant was.. it needs to tally it for a particular date range, not necessarily "once per day".

The reason I can't pull data every 15 minutes, or hour or whatever is because these are dynamic reports.

Based on different criteria the user selects (columns to display, data to filter on, etc), I need to rebuild the stored procedures. There's a base query, for example, the Agent Meeting/Transaction report above, but the user can also choose to filter that data based on a certain product or other various information.

When the user selects one or more filters to add to the type of report, I take all those filters and apply them to the base query. If a product filter is added, I append a join to the products table and the appropriate where clause, and repeat this for each additional filter.

The problem lies in the Transaction Types. These types are user defined, and the user can pick & choose which types to show.

Say the base report query looks like this:

Code:

SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name

...then they decide "oh, I wanna add type 3 and type 4 columns to display."

I need to take that query and change it (rebuild the stored proc.) so it looks like:

Code:

SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Type4 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),
	Sum4 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name

So far, I haven't run into any issues because the data I'm working with is relatively small. I was thinking about filling it with a few million dummy records and testing it out, though.

Overall, I'm just wondering if my method of achieving the following results using the query shown is as good as I can get. In other words, is there a better, more efficient way that I should be doing this?

And given the additional info I provided (dynamic reports, infinite types/columns that can be selected by the user, rebuilding stored procs), is there another way I should be looking at this?

Code:

These results:
Name                                               Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3        
-------------------------------------------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 
Bill                                               1           0           0           0           0           1           50
Bob                                                3           2           15          1           10          1           15

Using this Query:
SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type1 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Sum1 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Type2 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Sum2 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name

twister002 · 04-16-2004, 10:12 PM

So you never specified what RDBMS you are using but this line

Code:

dbo.Meetings.MeetingID

tells me that it's probably SQL Server.

One thing you can do is create a view based on the query and select from that. That can speed up execution some. Create a view based on the join tables and index the view. Since you want to include aggregate values in your result set you can't use an indexed view for your final query.

Since you want to include the types based on user input you probably won't be able to use a stored procedure. Stored Procedures can speed things up some because you get the benefit of a pre-compiled (meaning determined) execution path. You *CAN* add columns dynamically, but that kind of programming in T-SQL makes my teeth hurt so I'm not going to suggest that. I'll only give a small piece of advice, select your final result set into a table variable rather than a temp table if you are using SQL Server 2000. Table variables are much more efficient than temp tables in SQL 2K.

The big thing you need to do, and didn't mention, is starting up the SQL profiler and running a trace against the database when you run that query. That'll tell you if it's efficient or not. Start up Query Analyzer, run your query and look at the execution path. That'll tell you where the query takes the most time while it's executing. Make sure your tables are properly indexed.

FYI I run almost exactly the same kind of query against a patient database, counting the number of labs, etc.. that each one has, on a weekly basis and I've got about a quarter of a million rows in one of the tables in my query and there are 7 tables that I hit. It takes a little under 20 seconds to run.
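The measuring and indexing advice above could look something like this in Query Analyzer. It is only a sketch: SET STATISTICS IO and SET STATISTICS TIME are standard T-SQL session options, but the index name is made up for illustration, and whether this particular index helps depends on the real data and the plan the optimizer picks.

```sql
-- Hypothetical index name. The key columns are chosen so the per-agent,
-- per-type lookups in the report can be answered from the index alone.
CREATE INDEX IX_Transactions_Agent_Type
	ON dbo.Transactions (AgentID, TransTypeID, Amount)
GO

-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- The query being measured (here, one of the correlated subqueries on its own).
SELECT
	A.Name,
	Type1 = (SELECT COUNT(*) FROM dbo.Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 1)
FROM
	dbo.Agents A
ORDER BY
	A.Name
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO
```

Comparing the logical reads before and after creating the index, with the same data, gives a more reliable picture than elapsed time alone.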

AxelF · 04-17-2004, 04:56 AM

Shouldn't this report be redesigned? Does the person who asked for this realise how hard it will be to look at the result with infinite rows and infinite columns? Have you challenged the need for this exact layout?
Just your example with 3 transaction types is tough enough to read. Seldom can 3 or more dimensions of data be successfully displayed in only 2 dimensions.

So I would redesign it so that you get the result on, let's say, one page per salesman.

Stompy · 04-28-2004, 03:40 PM

Thanks for the replies.

Yes, this is SQL Server. I knew about the Query Analyzer and the execution path and all that, but was curious as to whether or not there's a better method for querying out data in the manner I described.

Yeah, THIS report could definitely be redesigned (I'll prob. have a talk w/ my project manager about it), but there are times when I had to do ONLY 2 or 3 columns of summed up data.

I've always sat back and wondered if it was good practice or not to query data like that.. basically doing the:

select field1, field2= (SELECT SUM(Amount) FROM tbl WHERE x=1), field3 = (SELECT SUM(Amount) FROM tbl WHERE x=2) from tbl ... etc

Another question I have for you, twister, is that you mentioned I should use a view. I've always used stored procs for EVERYTHING: inserting, updating, and selecting. How does a view differ from a stored proc? I've always used a view as a visual query designer of sorts, then took the SQL it made and copied & pasted to a stored proc. I know you can select FROM a view, but I'm unaware of any advantages it has to offer over stored procs.

twister002 · 04-29-2004, 02:58 PM

I think in this case the biggest advantage is probably the ability to index the view. You can create an index combining all of the joins that you do by hand, but it's a little easier to maintain if you just index the view. IMO.

If you're just selecting there probably isn't much of a difference except that using a view makes it easier to create an ad-hoc reporting system. Well, a somewhat ad-hoc reporting system anyway. You define the columns that the user can select from, but the user selects the columns to query against.
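For what indexing a view might look like here: the full report query uses outer joins, which an indexed view cannot contain, but a per-agent, per-type summary of Transactions alone may qualify. The sketch below makes several assumptions. The view and index names are invented, Amount is assumed to be an exact numeric type such as int, and the base table is assumed to have been created with the required SET options in effect.

```sql
-- Session settings SQL Server requires when creating an indexed view.
SET NUMERIC_ROUNDABORT OFF
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON
GO

-- Hypothetical view: one row per agent and transaction type.
-- SCHEMABINDING and COUNT_BIG(*) are required for an indexed view with GROUP BY.
-- ISNULL keeps the SUM argument non-nullable.
CREATE VIEW dbo.vAgentTransTotals
WITH SCHEMABINDING
AS
SELECT
	AgentID,
	TransTypeID,
	COUNT_BIG(*) AS TransCount,
	SUM(ISNULL(Amount, 0)) AS TransAmount
FROM dbo.Transactions
GROUP BY AgentID, TransTypeID
GO

-- The first index on a view must be unique and clustered; this is what materializes it.
CREATE UNIQUE CLUSTERED INDEX IX_vAgentTransTotals
	ON dbo.vAgentTransTotals (AgentID, TransTypeID)
GO
```

The report could then join Agents to this view instead of the raw Transactions table. Depending on the edition of SQL Server, the query may need to reference the view with the NOEXPAND hint for the index to be used, and the index adds some cost to every insert or update on Transactions.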
