04-15-2004, 10:11 AM | #1
Banned from being Banned
Location: Donkey
[SQL] Is this query optimized?
I'll try to keep this simple and easy to follow.
Let's say I have a system where I run reports on sales agents. I return the data once a day and tally up the number of meetings each agent attended, along with a transaction (sales) summary. The data for each agent has to be presented on ONE line. For this example, I have 3 tables: Agents, Meetings, and Transactions. Agents contains:
Code:
AgentID     Name
----------- ----
1           Bob
2           Bill

Meetings contains:
Code:
MeetingID   AgentID     MeetingDate
----------- ----------- -----------------------
1           1           2004-04-15 11:00:00.000
2           1           2004-04-15 12:00:00.000
3           1           2004-04-15 13:00:00.000
4           2           2004-04-15 09:00:00.000

Transactions contains:
Code:
TransID     AgentID     TransTypeID Amount
----------- ----------- ----------- -----------
1           1           1           5
2           1           2           10
3           1           3           15
4           1           1           10
5           2           3           50

I need the results returned with ONE agent per line: the name, the total number of meetings, and, for each transaction type, the number of transactions and the sum of their amounts, as shown here:
Code:
Name   Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3
------ ----------- ----------- ----------- ----------- ----------- ----------- -----------
Bill   1           0           0           0           0           1           50
Bob    3           2           15          1           10          1           15

This is the query I'm using:
Code:
SELECT A.Name,
       COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings,
       Type1 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 1),
       Sum1  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 1),
       Type2 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 2),
       Sum2  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 2),
       Type3 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3),
       Sum3  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3)
FROM dbo.Agents A
LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY A.Name, A.AgentID
ORDER BY A.Name

In the real-world situation I'm working with, there are 6 different transaction types, each with a corresponding amount, so that would be 12 subqueries per row. There could be tens of thousands of agents with hundreds of thousands of transaction records, so doing something like this seems like overkill for the server. I COULD query it like so:
Code:
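SELECT dbo.Agents.Name,
       COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings,
       dbo.Transactions.TransTypeID,
       -- Caution: SUM(DISTINCT ...) adds each distinct amount in a group only once.
       -- That hides the row duplication caused by also joining Meetings, but it
       -- would also collapse two same-type transactions with equal amounts.
       SUM(DISTINCT dbo.Transactions.Amount) AS TransAmount
FROM dbo.Agents
LEFT OUTER JOIN dbo.Transactions ON dbo.Agents.AgentID = dbo.Transactions.AgentID
LEFT OUTER JOIN dbo.Meetings ON dbo.Agents.AgentID = dbo.Meetings.AgentID
GROUP BY dbo.Agents.Name, dbo.Transactions.TransTypeID
ORDER BY dbo.Agents.Name
Code: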
Name   Meetings    TransTypeID TransAmount
------ ----------- ----------- -----------
Bill   1           3           50
Bob    3           1           15
Bob    3           2           10
Bob    3           3           15

[edit] Not to mention it would throw off paging. If the result set above were limited to 2 records per page, then instead of one row for Bill and one for Bob, you'd end up with one for Bill and 1/3 of the data for Bob. However, if that's my only option, then I guess I have no choice. I just wanted other opinions on what I should do, or on the recommended way to tackle this. Thanks
Last edited by Stompy; 04-15-2004 at 11:48 AM.
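To see the difference between the two queries on the sample data above, here is a minimal sketch that uses SQLite through Python's standard sqlite3 module as a stand-in for SQL Server (an assumption: the dbo. prefixes are dropped and ISNULL is replaced with the portable COALESCE). The correlated-subquery query yields one row per agent, while the GROUP BY query yields one row per agent and type, which is what breaks paging. The last query drops the DISTINCT to show why SUM(DISTINCT ...) was needed at all: joining Meetings repeats every transaction once per meeting of the same agent.

```python
import sqlite3

# Sample tables and rows from the first post; SQLite stands in for SQL Server.
SETUP = """
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Meetings (MeetingID INTEGER PRIMARY KEY, AgentID INTEGER, MeetingDate TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Meetings VALUES (1, 1, '2004-04-15 11:00:00'), (2, 1, '2004-04-15 12:00:00'),
                            (3, 1, '2004-04-15 13:00:00'), (4, 2, '2004-04-15 09:00:00');
INSERT INTO Transactions VALUES (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15),
                                (4, 1, 1, 10), (5, 2, 3, 50);
"""

# First query: two correlated subqueries per type (COALESCE replaces ISNULL).
SUBQUERIES = ",\n".join(
    f"  (SELECT COUNT(*) FROM Transactions T"
    f" WHERE T.AgentID = A.AgentID AND T.TransTypeID = {n}) AS Type{n},\n"
    f"  (SELECT COALESCE(SUM(T.Amount), 0) FROM Transactions T"
    f" WHERE T.AgentID = A.AgentID AND T.TransTypeID = {n}) AS Sum{n}"
    for n in (1, 2, 3)
)
PER_AGENT = f"""
SELECT A.Name, COUNT(DISTINCT M.MeetingID) AS Meetings,
{SUBQUERIES}
FROM Agents A LEFT OUTER JOIN Meetings M ON A.AgentID = M.AgentID
GROUP BY A.Name, A.AgentID
ORDER BY A.Name
"""


def per_type_sql(sum_expr):
    """Second query: one row per (agent, type); TransTypeID added to ORDER BY."""
    return f"""
SELECT A.Name, COUNT(DISTINCT M.MeetingID) AS Meetings, T.TransTypeID,
       {sum_expr} AS TransAmount
FROM Agents A
LEFT OUTER JOIN Transactions T ON A.AgentID = T.AgentID
LEFT OUTER JOIN Meetings M ON A.AgentID = M.AgentID
GROUP BY A.Name, T.TransTypeID
ORDER BY A.Name, T.TransTypeID
"""


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
per_agent = conn.execute(PER_AGENT).fetchall()
per_type = conn.execute(per_type_sql("SUM(DISTINCT T.Amount)")).fetchall()
per_type_plain = conn.execute(per_type_sql("SUM(T.Amount)")).fetchall()

print(per_agent)          # 2 rows: one per agent, so paging works per agent
print(per_type)           # 4 rows: Bob is split across three of them
print(per_type_plain[1])  # ('Bob', 3, 1, 45): each type-1 amount counted 3 times
```

The per-agent result matches the table in the first post; the per-type result matches the second table, including Bob's three separate rows.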
04-15-2004, 11:43 AM | #2
Banned from being Banned
Location: Donkey
**UPDATE**
The 6 types I mentioned... it turns out there can be any number of them, because the types are user defined. Which raises the question: is there a way to add columns dynamically in a SQL statement, or would I just have to rebuild the stored procedure each time a type is added? For example, if someone added a type 4 to the Transactions table in my example, I'd have to take the existing stored procedure and, through code, add:
Code:
Type4 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 4),
Sum4  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 4),

Also, another reason I can't really use the last GROUP BY method shown in the post above (the one with "SUM(DISTINCT") is that it would throw off paging. If the page size is 10 and there are 10 different types, page 1 could contain a single agent instead of 10 unique agent records.
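One way to picture the "add it through code" step is to generate the SELECT list from whatever type IDs the user picked, instead of editing a stored procedure by hand. The sketch below is a minimal illustration, assuming SQLite (via Python's sqlite3) in place of SQL Server and COALESCE in place of ISNULL; the helper names type_columns and build_report_sql are hypothetical. Type IDs are forced through int() before they are spliced into the SQL text, since column aliases such as Type4 cannot be passed as query parameters.

```python
import sqlite3

# Sample data from the first post (SQLite stands in for SQL Server).
SETUP = """
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Meetings (MeetingID INTEGER PRIMARY KEY, AgentID INTEGER, MeetingDate TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Meetings VALUES (1, 1, '2004-04-15 11:00:00'), (2, 1, '2004-04-15 12:00:00'),
                            (3, 1, '2004-04-15 13:00:00'), (4, 2, '2004-04-15 09:00:00');
INSERT INTO Transactions VALUES (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15),
                                (4, 1, 1, 10), (5, 2, 3, 50);
"""


def type_columns(type_id):
    """Hypothetical helper: the Type<n>/Sum<n> subquery pair for one type."""
    n = int(type_id)  # only integers are ever spliced into the SQL text
    return (
        f"(SELECT COUNT(*) FROM Transactions T"
        f" WHERE T.AgentID = A.AgentID AND T.TransTypeID = {n}) AS Type{n},\n"
        f"       (SELECT COALESCE(SUM(T.Amount), 0) FROM Transactions T"
        f" WHERE T.AgentID = A.AgentID AND T.TransTypeID = {n}) AS Sum{n}"
    )


def build_report_sql(type_ids):
    """Hypothetical helper: append one column pair per selected type to the base query."""
    extra = "".join(",\n       " + type_columns(t) for t in type_ids)
    return (
        "SELECT A.Name AS Name, COUNT(DISTINCT M.MeetingID) AS Meetings"
        + extra
        + "\nFROM Agents A LEFT OUTER JOIN Meetings M ON A.AgentID = M.AgentID"
        + "\nGROUP BY A.Name, A.AgentID\nORDER BY A.Name"
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
conn.execute("INSERT INTO Transactions VALUES (6, 2, 4, 20)")  # a new user-defined type 4

cur = conn.execute(build_report_sql([3, 4]))  # the user picked types 3 and 4
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['Name', 'Meetings', 'Type3', 'Sum3', 'Type4', 'Sum4']
print(rows)
```

This only automates the rebuild; each selected type still adds two correlated subqueries, so the cost concern from the first post is unchanged.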
__________________
I love lamp.
04-15-2004, 07:36 PM | #3
undead
Location: nihilistic freedom
First of all, man, that must be the longest question I've seen posted...
Second, you said that your data is collected and tallied once per day, right? So what difference does it make whether the query is optimized or not? Kick off the tally procedure at 12 AM and let it run for 4 hours, if need be. Also, I don't know a whole lot about SQL, but I'm pretty sure it's all interpreted. Perhaps the interpreter is "smart" enough to make some kind of optimizations to your query, or to do some intelligent caching, so that it isn't too inefficient. Have you actually implemented this system and noticed a performance issue? Just curious... and sorry I don't have a better answer to your question.
04-16-2004, 09:15 PM | #4
Banned from being Banned
Location: Donkey
What I meant was that it needs to tally the data for a particular date range, not necessarily "once per day".
The reason I can't just pull the data every 15 minutes, every hour, or whatever, is that these are dynamic reports. Based on the criteria the user selects (columns to display, data to filter on, etc.), I need to rebuild the stored procedures. There's a base query, for example the Agent Meeting/Transaction report above, but the user can also choose to filter that data by a certain product or by various other information. When the user adds one or more filters to a report type, I take all those filters and apply them to the base query: if a product filter is added, I append a join to the products table and the appropriate WHERE clause, and I repeat this for each additional filter. The problem lies in the transaction types. These types are user defined, and the user can pick and choose which types to show. Say the base report query looks like this:
Code:
SELECT A.Name,
       COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings
FROM dbo.Agents A
LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY A.Name, A.AgentID
ORDER BY A.Name

I need to take that query and change it (rebuild the stored proc) so it looks like this:
Code:
SELECT A.Name,
       COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings,
       Type3 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3),
       Sum3  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3),
       Type4 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 4),
       Sum4  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 4)
FROM dbo.Agents A
LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY A.Name, A.AgentID
ORDER BY A.Name

Overall, I'm just wondering whether my way of producing the results below with the query shown is as good as it gets, or in other words, whether there is a better, more efficient way I should be doing this. And given the additional info I provided (dynamic reports, an unlimited number of types/columns the user can select, rebuilding stored procs), is there another way I should be looking at this?
Code:
These results:
Name   Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3
------ ----------- ----------- ----------- ----------- ----------- ----------- -----------
Bill   1           0           0           0           0           1           50
Bob    3           2           15          1           10          1           15

Using this query:
SELECT A.Name,
       COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings,
       Type1 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 1),
       Sum1  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 1),
       Type2 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 2),
       Sum2  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 2),
       Type3 = (SELECT COUNT(*) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3),
       Sum3  = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE T.AgentID = A.AgentID AND T.TransTypeID = 3)
FROM dbo.Agents A
LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY A.Name, A.AgentID
ORDER BY A.Name
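A commonly used alternative to one correlated subquery per column (a different technique from the one in the posts) is conditional aggregation: group Transactions once per agent with a SUM(CASE ...) pair per type, group Meetings once per agent, and join both derived tables to Agents. Because each table is aggregated before the join, the row multiplication that SUM(DISTINCT ...) papered over never happens, and there is still exactly one row per agent for paging. A minimal sketch, assuming SQLite (via Python's sqlite3) in place of SQL Server; the helper name conditional_report_sql is hypothetical, and the date range is applied only to Meetings because the sample Transactions table has no date column.

```python
import sqlite3

# Sample data from the first post (SQLite stands in for SQL Server).
SETUP = """
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Meetings (MeetingID INTEGER PRIMARY KEY, AgentID INTEGER, MeetingDate TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Meetings VALUES (1, 1, '2004-04-15 11:00:00'), (2, 1, '2004-04-15 12:00:00'),
                            (3, 1, '2004-04-15 13:00:00'), (4, 2, '2004-04-15 09:00:00');
INSERT INTO Transactions VALUES (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15),
                                (4, 1, 1, 10), (5, 2, 3, 50);
"""


def conditional_report_sql(type_ids):
    """Hypothetical helper: one row per agent, Transactions grouped only once."""
    ids = [int(t) for t in type_ids]  # only integers reach the SQL text
    inner = "".join(
        f",\n         SUM(CASE WHEN TransTypeID = {n} THEN 1 ELSE 0 END) AS Type{n}"
        f",\n         SUM(CASE WHEN TransTypeID = {n} THEN Amount ELSE 0 END) AS Sum{n}"
        for n in ids
    )
    # Agents with no meetings or no transactions get 0 instead of NULL.
    outer = "".join(
        f", COALESCE(T.Type{n}, 0) AS Type{n}, COALESCE(T.Sum{n}, 0) AS Sum{n}"
        for n in ids
    )
    return f"""
SELECT A.Name AS Name, COALESCE(M.Meetings, 0) AS Meetings{outer}
FROM Agents A
LEFT OUTER JOIN (SELECT AgentID, COUNT(*) AS Meetings
                 FROM Meetings
                 WHERE MeetingDate >= ? AND MeetingDate < ?
                 GROUP BY AgentID) M ON M.AgentID = A.AgentID
LEFT OUTER JOIN (SELECT AgentID{inner}
                 FROM Transactions
                 GROUP BY AgentID) T ON T.AgentID = A.AgentID
ORDER BY A.Name
"""


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
sql = conditional_report_sql([1, 2, 3])
whole_day = conn.execute(sql, ("2004-04-15 00:00:00", "2004-04-16 00:00:00")).fetchall()
morning = conn.execute(sql, ("2004-04-15 00:00:00", "2004-04-15 12:00:00")).fetchall()
print(whole_day)  # same two rows as the correlated-subquery version
print(morning)    # Bob's 12:00 and 13:00 meetings fall outside the range
```

The same shape (CASE, COALESCE, derived tables) is also valid T-SQL, with named parameters (for example a hypothetical @StartDate) in place of the ? placeholders. Whether it actually beats the subquery version on the real data is something the execution plan has to confirm.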
Last edited by Stompy; 04-16-2004 at 09:19 PM.
04-16-2004, 10:12 PM | #5
Crazy
You never specified which RDBMS you are using, but this line
Code:
dbo.Meetings.MeetingID
suggests SQL Server.

One thing you can do is create a view based on the query and select from that. That can speed up execution some. Create a view based on the joined tables and index the view. Since you want to include aggregate values in your result set, you can't use an indexed view for your final query.

Since you want to include the types based on user input, you probably won't be able to use a stored procedure. Stored procedures can speed things up some because you get the benefit of a pre-compiled (that is, already determined) execution plan. You *CAN* add columns dynamically, but that kind of programming in T-SQL makes my teeth hurt, so I'm not going to suggest it. I'll give only one small piece of advice: if you are using SQL Server 2000, select your final result set into a table variable rather than a temp table. Table variables are much more efficient than temp tables in SQL 2K.

The big thing you need to do, and didn't mention, is start up SQL Profiler and run a trace against the database while you run that query. That'll tell you whether it's efficient or not. Start up Query Analyzer, run your query, and look at the execution plan. That'll tell you where the query spends the most time while it's executing. Make sure your tables are properly indexed.

FYI, I run almost exactly the same kind of query against a patient database on a weekly basis, counting the number of labs, etc. that each patient has. I've got about a quarter of a million rows in one of the tables in my query, and there are 7 tables that I hit. It takes a little under 20 seconds to run.
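The "make sure your tables are properly indexed" and "look at the execution plan" advice can be tried in miniature. This sketch assumes SQLite (via Python's sqlite3) in place of SQL Server and Query Analyzer: it checks the plan for one of the per-type subqueries before and after adding a composite index on Transactions (AgentID, TransTypeID, Amount). The index name IX_Trans_Agent_Type is hypothetical, and SQLite's EXPLAIN QUERY PLAN text differs from SQL Server's graphical plan, so only the idea carries over.

```python
import sqlite3

# One Type/Sum subquery from the report, with the agent and type as parameters.
SUBQUERY = ("SELECT COUNT(*), COALESCE(SUM(Amount), 0) FROM Transactions "
            "WHERE AgentID = ? AND TransTypeID = ?")


def plan_details(conn, sql, params):
    """Return the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER, "
             "TransTypeID INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50)])

before = plan_details(conn, SUBQUERY, (1, 1))  # no index: full scan of Transactions
conn.execute("CREATE INDEX IX_Trans_Agent_Type "
             "ON Transactions (AgentID, TransTypeID, Amount)")
after = plan_details(conn, SUBQUERY, (1, 1))   # index search on (AgentID, TransTypeID)
print(before)
print(after)
print(conn.execute(SUBQUERY, (1, 1)).fetchone())  # Bob's type 1: (2, 15)
```

Without the index, every Type/Sum subquery scans the whole Transactions table once per agent; with it, each pair can be answered from the index alone.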
04-17-2004, 04:56 AM | #6
Crazy
Location: Europe
Shouldn't this report be redesigned? Does the person who asked for it realise how hard it will be to read the result with infinite rows and infinite columns? Have you challenged the need for this exact layout?
Your example with just 3 transaction types is tough enough to read. Three or more dimensions of data can seldom be displayed successfully in only 2 dimensions. So I would redesign it so that you get the result on, let's say, one page per salesman.
__________________
Coffee
04-28-2004, 03:40 PM | #7
Banned from being Banned
Location: Donkey
Thanks for the replies.
Yes, this is SQL Server. I knew about Query Analyzer, the execution plan and all that, but was curious whether there's a better method for querying data in the manner I described. Yeah, THIS report could definitely be redesigned (I'll probably have a talk with my project manager about it), but there have been times when I had to produce ONLY 2 or 3 columns of summed-up data. I've always sat back and wondered whether it was good practice to query data like that, basically doing:
Code:
SELECT field1,
       field2 = (SELECT SUM(Amount) FROM tbl WHERE x = 1),
       field3 = (SELECT SUM(Amount) FROM tbl WHERE x = 2)
FROM tbl ... etc.
Another question I have for you, twister: you mentioned I should use a view. I've always used stored procs for EVERYTHING: inserting, updating, and selecting. How does a view differ from a stored proc? I've always used views as a visual query designer of sorts, then copied and pasted the SQL it generated into a stored proc. I know you can select FROM a view, but I'm unaware of any advantages it has over stored procs.
04-29-2004, 02:58 PM | #8
Crazy
I think in this case the biggest advantage is probably the ability to index the view. You can create an index covering all of the joins that you do by hand, but IMO it's a little easier to maintain if you just index the view.
If you're just selecting, there probably isn't much of a difference, except that using a view makes it easier to create an ad-hoc reporting system. Well, a somewhat ad-hoc reporting system anyway: you define the columns that the user can select from, but the user selects the columns to query against.
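A minimal sketch of that "somewhat ad-hoc" setup, assuming SQLite (via Python's sqlite3) in place of SQL Server. The view name AgentActivity and the ALLOWED_COLUMNS whitelist are hypothetical: the view writes the joins once, the user's column choices are checked against the whitelist before they are spliced into the query, and filter values are passed as parameters. SQLite cannot index a view, so the indexed-view part of the advice is not shown.

```python
import sqlite3

SETUP = """
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Transactions VALUES (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15),
                                (4, 1, 1, 10), (5, 2, 3, 50);
-- Hypothetical view: the joins are written once and reports select from it.
CREATE VIEW AgentActivity AS
SELECT A.AgentID, A.Name, T.TransTypeID, T.Amount
FROM Agents A JOIN Transactions T ON T.AgentID = A.AgentID;
"""

# Columns the report designer exposes to users (hypothetical whitelist).
ALLOWED_COLUMNS = ("Name", "TransTypeID")


def adhoc_report(conn, columns, type_id=None):
    """Group the view by user-chosen columns, optionally filtered by type."""
    if not columns or any(c not in ALLOWED_COLUMNS for c in columns):
        raise ValueError(f"columns must be a non-empty subset of {ALLOWED_COLUMNS}")
    col_list = ", ".join(columns)  # safe: every name came from the whitelist
    sql = f"SELECT {col_list}, COUNT(*) AS Trans, SUM(Amount) AS Total FROM AgentActivity"
    params = []
    if type_id is not None:
        sql += " WHERE TransTypeID = ?"  # filter values stay query parameters
        params.append(type_id)
    sql += f" GROUP BY {col_list} ORDER BY {col_list}"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(adhoc_report(conn, ["Name"]))              # totals per agent
print(adhoc_report(conn, ["TransTypeID"]))       # totals per type
print(adhoc_report(conn, ["Name"], type_id=3))   # per agent, type 3 only
```

The design choice is that users only ever pick from names the application already knows, so the dynamic part of the SQL never contains user-typed text.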