Tilted Forum Project Discussion Community  

04-15-2004, 10:11 AM   #1
Banned from being Banned
 
Location: Donkey
[SQL] Is this query optimized?

I'll try to keep this simple and easy to follow.

Let's say I have a system where I run reports on Sales Agents. I pull the data once a day and tally up the number of meetings each agent attended, along with a summary of their transactions (sales).

The data is to be presented on ONE line.

To show this example, I have 3 tables: Agents, Meetings, and Transactions.

Agents contains:

Code:
AgentID     Name                                               
----------- -------------------------------------------------- 
1           Bob
2           Bill
Meetings contains:

Code:
MeetingID   AgentID     MeetingDate                                            
----------- ----------- ------------------------------------------------------ 
1           1           2004-04-15 11:00:00.000
2           1           2004-04-15 12:00:00.000
3           1           2004-04-15 13:00:00.000
4           2           2004-04-15 09:00:00.000
Transactions contains:

Code:
TransID     AgentID     TransTypeID Amount      
----------- ----------- ----------- ----------- 
1           1           1           5
2           1           2           10
3           1           3           15
4           1           1           10
5           2           3           50

I need the results returned with ONE agent per line, showing the total meetings and, for each transaction type, the number of transactions and the sum of their amounts, as shown in the following:

Code:
Name                                               Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3        
-------------------------------------------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 
Bill                                               1           0           0           0           0           1           50
Bob                                                3           2           15          1           10          1           15
The query I'm using for this is:

Code:
SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type1 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Sum1 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Type2 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Sum2 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name
My question is: is there a more efficient way to get the results I want? It seems highly inefficient to query the same table six times for every record returned.

In the real-world situation I'm working with, there are 6 different types with corresponding amounts that need to be displayed, so that'd be 12 subqueries per row. There could be tens of thousands of agents with hundreds of thousands of transaction records, so doing something like this seems like overkill for the server.
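One commonly suggested alternative, which is not what the query above does, is conditional aggregation: aggregate Transactions once per agent and type in a derived table, then spread the types across columns with SUM(CASE ...). Below is a minimal sketch of that idea against the sample tables from this post. It uses Python's built-in sqlite3 module so it can run anywhere, which means the SQL is SQLite-flavoured; COALESCE is standard SQL and SQL Server accepts it as well as ISNULL. Whether it actually beats the correlated subqueries on SQL Server is something to confirm from the execution plan on real data.

```python
import sqlite3

# Sample data copied from the Agents, Meetings and Transactions tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Meetings (MeetingID INTEGER PRIMARY KEY, AgentID INTEGER, MeetingDate TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Meetings VALUES
  (1, 1, '2004-04-15 11:00:00'), (2, 1, '2004-04-15 12:00:00'),
  (3, 1, '2004-04-15 13:00:00'), (4, 2, '2004-04-15 09:00:00');
INSERT INTO Transactions VALUES
  (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50);
""")

# Meetings and Transactions are each grouped once in a derived table, so the
# meeting count and the transaction sums cannot multiply each other in the join.
# The CASE inside each SUM routes a (agent, type) row to its own column.
PIVOT_SQL = """
SELECT A.Name,
       COALESCE(M.Meetings, 0) AS Meetings,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 1 THEN T.Cnt END), 0) AS Type1,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 1 THEN T.Total END), 0) AS Sum1,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 2 THEN T.Cnt END), 0) AS Type2,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 2 THEN T.Total END), 0) AS Sum2,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 3 THEN T.Cnt END), 0) AS Type3,
       COALESCE(SUM(CASE WHEN T.TransTypeID = 3 THEN T.Total END), 0) AS Sum3
FROM Agents A
LEFT JOIN (SELECT AgentID, COUNT(*) AS Meetings
           FROM Meetings GROUP BY AgentID) M ON M.AgentID = A.AgentID
LEFT JOIN (SELECT AgentID, TransTypeID, COUNT(*) AS Cnt, SUM(Amount) AS Total
           FROM Transactions GROUP BY AgentID, TransTypeID) T ON T.AgentID = A.AgentID
GROUP BY A.AgentID, A.Name, M.Meetings
ORDER BY A.Name
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Because each table is grouped before the join, the meeting count no longer needs COUNT(DISTINCT ...) to undo a join fan-out, and Transactions is read once rather than once per column.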

I COULD query it like so:

Code:
SELECT
	dbo.Agents.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	dbo.Transactions.TransTypeID, 
	SUM(DISTINCT dbo.Transactions.Amount) AS TransAmount -- caution: DISTINCT also merges equal amounts (two 10s would sum to 10)
FROM         
	dbo.Agents LEFT OUTER JOIN dbo.Transactions ON dbo.Agents.AgentID = dbo.Transactions.AgentID 
		LEFT OUTER JOIN dbo.Meetings ON dbo.Agents.AgentID = dbo.Meetings.AgentID
GROUP BY 
	dbo.Agents.Name, 
	dbo.Transactions.TransTypeID
ORDER BY 
	dbo.Agents.Name
which will return the following results:

Code:
Name                                               Meetings    TransTypeID TransAmount 
-------------------------------------------------- ----------- ----------- ----------- 
Bill                                               1           3           50
Bob                                                3           1           15
Bob                                                3           2           10
Bob                                                3           3           15
...which would require me to loop through EACH record, check the TransType, and then put it in its proper position in the HTML table. This is done in ASP.NET, and it's a headache to add items "manually" like that.
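If the one-row-per-type result is kept, the regrouping can at least happen in one loop instead of placing cells by hand. A minimal sketch in Python, standing in for the ASP.NET code; `pivot_rows` and `type_ids` are hypothetical names, and the input rows are the sample output above. It does not fix paging by itself, since the database still pages over the long rows.

```python
def pivot_rows(long_rows, type_ids):
    """Fold (name, meetings, type_id, amount) rows into one dict per agent.

    Hypothetical helper. Rows for the same agent need not be adjacent; it is
    keyed on name only because the sample result has no AgentID column.
    """
    agents = {}  # dicts keep insertion order (Python 3.7+), so query order survives
    for name, meetings, type_id, amount in long_rows:
        row = agents.get(name)
        if row is None:
            row = {"Name": name, "Meetings": meetings}
            row.update({f"Sum{t}": 0 for t in type_ids})  # default every requested type to 0
            agents[name] = row
        if type_id is not None:  # a LEFT JOIN yields a NULL type for agents without sales
            key = f"Sum{type_id}"
            row[key] = row.get(key, 0) + amount
    return list(agents.values())


long_rows = [
    ("Bill", 1, 3, 50),
    ("Bob", 3, 1, 15),
    ("Bob", 3, 2, 10),
    ("Bob", 3, 3, 15),
]
for wide in pivot_rows(long_rows, type_ids=[1, 2, 3]):
    print(wide)
```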

[edit]
Not to mention it would throw off paging. If the above dataset were limited to 2 records per page, then instead of one row for Bill and one for Bob, you'd end up with Bill's row and only 1/3 of Bob's data.
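One way around the paging problem, sketched here under assumptions, is to page over agents first and only then join and group their transactions, so a page boundary always falls between agents. The sketch uses SQLite's LIMIT/OFFSET; SQL Server 2000 has no such clause, so there the agent page would have to be picked another way (for example with TOP). `fetch_page` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Transactions VALUES
  (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50);
""")


def fetch_page(conn, page, page_size):
    """Hypothetical helper: every type row for the agents on one page.

    LIMIT/OFFSET is applied to Agents inside the derived table, so a page
    boundary can never split one agent's rows.
    """
    return conn.execute("""
        SELECT A.Name, T.TransTypeID, SUM(T.Amount)
        FROM (SELECT AgentID, Name FROM Agents
              ORDER BY Name LIMIT ? OFFSET ?) A
        LEFT JOIN Transactions T ON T.AgentID = A.AgentID
        GROUP BY A.AgentID, A.Name, T.TransTypeID
        ORDER BY A.Name, T.TransTypeID
    """, (page_size, (page - 1) * page_size)).fetchall()


print(fetch_page(conn, page=1, page_size=1))
print(fetch_page(conn, page=2, page_size=1))
```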

However, if that's my only option, then I guess I have no choice.

I just wanted other opinions on what I should do, or the recommended route to tackle this. Thanks.

Last edited by Stompy; 04-15-2004 at 11:48 AM.
Stompy
04-15-2004, 11:43 AM   #2
Banned from being Banned
 
Location: Donkey
**UPDATE**

The 6 types I mentioned... it turns out there can be an unlimited number of them. The types are user-defined.

Which brings the question... is there a way to dynamically add columns in a SQL statement, or would I just have to rebuild the stored procedure each time a type is added?

For example, if someone decided to add a type 4 to the transaction table I gave in my example, I'd have to take the existing stored procedure and, through code, add:

Code:
	Type4 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),
	Sum4 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),
...to the existing code that selects those columns.
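Rather than splicing new subqueries into the stored procedure text, the column list can be generated at run time from whichever type IDs are selected. On SQL Server that would mean dynamic SQL (for example via sp_executesql, which accepts parameters); the sketch below shows only the string-building side in Python against SQLite, using conditional aggregation with SUM(CASE ...) (a swapped-in technique, not the query from this thread). `build_report_sql` is a hypothetical helper, and the type 4 row in the sample data is invented for the example.

```python
import sqlite3


def build_report_sql(type_ids):
    """Hypothetical helper: (sql, params) with a TypeN/SumN pair per type.

    The IDs come from the user, so they are forced through int(): that keeps
    the generated TypeN/SumN aliases safe, and the values themselves are still
    passed as bound parameters rather than pasted into the SQL.
    """
    columns, params = ["A.Name AS Name"], []
    for t in (int(t) for t in type_ids):
        columns.append(f"COALESCE(SUM(CASE WHEN T.TransTypeID = ? THEN T.Cnt END), 0) AS Type{t}")
        columns.append(f"COALESCE(SUM(CASE WHEN T.TransTypeID = ? THEN T.Total END), 0) AS Sum{t}")
        params += [t, t]  # one value per placeholder, in column order
    sql = (
        "SELECT " + ", ".join(columns) + " "
        "FROM Agents A LEFT JOIN ("
        "  SELECT AgentID, TransTypeID, COUNT(*) AS Cnt, SUM(Amount) AS Total"
        "  FROM Transactions GROUP BY AgentID, TransTypeID"
        ") T ON T.AgentID = A.AgentID "
        "GROUP BY A.AgentID, A.Name ORDER BY A.Name"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Transactions VALUES
  (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50),
  (6, 2, 4, 20);  -- a made-up type 4 sale, added only for this example
""")

sql, params = build_report_sql(["3", "4"])  # e.g. IDs posted from a web form
cursor = conn.execute(sql, params)
print([d[0] for d in cursor.description])
print(cursor.fetchall())
```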

Also, another reason I can't really use the last GROUP BY method shown in the post above (the one with the "SUM(DISTINCT") is that it will throw off paging. If the page size is 10 and there are 10 different types, page 1 could show only one agent instead of 10 unique agent records.
__________________
I love lamp.
Stompy
04-15-2004, 07:36 PM   #3
undead
 
Location: nihilistic freedom
First of all, man, that must be the longest question I've seen posted...

Second, you said that your data is collected and tallied once per day, right? So, what difference does it make if the query is optimized or not? Kick off the tally procedure at 12AM and let it run for 4 hours, if need be. Also, I don't know a whole lot about SQL, but I'm pretty sure it's all interpreted. Perhaps the interpreter is "smart" enough to make some kind of optimizations on your query, or to do some intelligent caching, so it isn't too inefficient.

Have you actually implemented this system and noticed a performance issue? Just curious... and sorry I don't have a better answer to your question.
nothingx
04-16-2004, 09:15 PM   #4
Banned from being Banned
 
Location: Donkey
What I meant was that it needs to tally the data for a particular date range, not necessarily "once per day".

The reason I can't just pull the data every 15 minutes (or every hour, or whatever) is that these are dynamic reports.

Based on different criteria the user selects (columns to display, data to filter on, etc), I need to rebuild the stored procedures. There's a base query, for example, the Agent Meeting/Transaction report above, but the user can also choose to filter that data based on a certain product or other various information.

When the user selects one or more filters to add to the type of report, I take all those filters and apply them to the base query. If a product filter is added, I append a join to the products table and the appropriate where clause, and repeat this for each additional filter.
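The filter-appending step described above can be kept manageable with a fixed catalogue of filters, each contributing an optional JOIN and one parameterized WHERE fragment. A minimal sketch, assuming a hypothetical `Products` table, a hypothetical `ProductID` column on Transactions, and made-up filter names; it runs against SQLite, not SQL Server.

```python
import sqlite3

# Hypothetical filter catalogue: name -> (extra JOIN or "", WHERE fragment with one ?).
FILTERS = {
    "product": ("JOIN Products P ON P.ProductID = T.ProductID", "P.Name = ?"),
    "min_amount": ("", "T.Amount >= ?"),
}

BASE_SELECT = ("SELECT A.Name, COUNT(*) AS Sales, SUM(T.Amount) AS Total "
               "FROM Agents A JOIN Transactions T ON T.AgentID = A.AgentID")


def build_query(selected):
    """Append one JOIN/WHERE pair per selected filter; unknown names raise KeyError."""
    joins, wheres, params = [], [], []
    for name, value in selected.items():
        join, where = FILTERS[name]
        if join and join not in joins:  # never add the same join twice
            joins.append(join)
        wheres.append(where)
        params.append(value)
    sql = " ".join([BASE_SELECT] + joins)
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    sql += " GROUP BY A.AgentID, A.Name ORDER BY A.Name"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           ProductID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Products VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO Transactions VALUES (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 1, 15), (4, 2, 2, 50);
""")

print(conn.execute(*build_query({"product": "Widget"})).fetchall())
print(conn.execute(*build_query({"product": "Gadget", "min_amount": 20})).fetchall())
```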

The problem lies in the Transaction Types. These types are user defined, and the user can pick & choose which types to show.

Say the base report query looks like this:

Code:
SELECT
	A.Name,
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings
FROM
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY
	A.Name,
	A.AgentID
ORDER BY
	A.Name
...then they decide "oh, I wanna add type 3 and type 4 columns to display."

I need to take that query and change it (rebuild the stored proc.) so it looks like:

Code:
SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Type4 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4),
	Sum4 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 4)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name
So far, I haven't run into any issues because the data I'm working with is relatively small. I was thinking about filling it with a few million dummy records and testing it out, though.
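For the dummy-data test, a small generator like the one below can fill the tables with random rows. This is a sketch assuming an in-memory SQLite database and a fixed random seed; `load_dummy_data` and its parameters are hypothetical, and the counts are kept small so it runs quickly (raise them toward a few million for a real test). Timings from SQLite will not carry over to SQL Server, so treat it mainly as a way to produce test data; the same row generator could feed a bulk load into SQL Server.

```python
import random
import sqlite3
import time


def load_dummy_data(conn, n_agents, n_transactions, n_types, seed=0):
    """Hypothetical helper: create Agents/Transactions and fill them randomly."""
    rng = random.Random(seed)  # fixed seed so every run produces the same data
    conn.executescript("""
    CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                               TransTypeID INTEGER, Amount INTEGER);
    """)
    conn.executemany("INSERT INTO Agents VALUES (?, ?)",
                     ((i, f"Agent {i:06d}") for i in range(1, n_agents + 1)))
    # Generators keep memory flat even when n_transactions is in the millions.
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)",
                     ((i, rng.randint(1, n_agents), rng.randint(1, n_types),
                       rng.randint(1, 500))
                      for i in range(1, n_transactions + 1)))
    conn.commit()


conn = sqlite3.connect(":memory:")
load_dummy_data(conn, n_agents=1_000, n_transactions=50_000, n_types=6)

start = time.perf_counter()
rows = conn.execute("""
    SELECT AgentID, TransTypeID, COUNT(*), SUM(Amount)
    FROM Transactions GROUP BY AgentID, TransTypeID
""").fetchall()
print(f"{len(rows)} agent/type groups in {time.perf_counter() - start:.3f}s")
```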

Overall, I'm just wondering whether my method of getting the following results with the query shown is as good as it gets, or in other words, whether there's a better, more efficient way I should be doing this.

And given the additional info I provided (dynamic reports, an unlimited number of user-defined types/columns that can be selected, rebuilding stored procs), is there another way I should be looking at this?

Code:
These results:
Name                                               Meetings    Type1       Sum1        Type2       Sum2        Type3       Sum3        
-------------------------------------------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- 
Bill                                               1           0           0           0           0           1           50
Bob                                                3           2           15          1           10          1           15

Using this Query:
SELECT
	A.Name, 
	COUNT(DISTINCT dbo.Meetings.MeetingID) AS Meetings, 
	Type1 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Sum1 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 1),
	Type2 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Sum2 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 2),
	Type3 = (SELECT COUNT(*) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3),
	Sum3 = (SELECT ISNULL(SUM(Amount), 0) FROM Transactions T WHERE AgentID = A.AgentID AND T.TransTypeID = 3)
FROM         
	dbo.Agents A LEFT OUTER JOIN dbo.Meetings ON A.AgentID = dbo.Meetings.AgentID
GROUP BY 
	A.Name,
	A.AgentID
ORDER BY 
	A.Name

Last edited by Stompy; 04-16-2004 at 09:19 PM.
Stompy
04-16-2004, 10:12 PM   #5
Crazy
 
So you never specified what RDBMS you are using, but this line
Code:
dbo.Meetings.MeetingID
tells me that it's probably SQL Server.

One thing you can do is create a view based on the query and select from that. That can speed up execution some. Create a view based on the join tables and index the view. Since your final query relies on outer joins, subqueries and COUNT(DISTINCT ...), none of which SQL Server allows in an indexed view, you can't use an indexed view for the final query itself.

Since you want to include the types based on user input, you probably won't be able to use a stored procedure. Stored procedures can speed things up some because you get the benefit of a precompiled (meaning already determined) execution plan. You *CAN* add columns dynamically (with dynamic SQL), but that kind of programming in T-SQL makes my teeth hurt, so I'm not going to suggest it. I'll only give one small piece of advice: if you are using SQL Server 2000, select your final result set into a table variable rather than a temp table. Table variables are much more efficient than temp tables in SQL 2K.

The big thing you need to do, and didn't mention, is to start up SQL Profiler and run a trace against the database while you run that query. That'll tell you whether it's efficient or not. Start up Query Analyzer, run your query, and look at the execution plan. That'll tell you where the query spends the most time while it's executing. Make sure your tables are properly indexed.
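As a rough analogue of checking the execution plan in Query Analyzer, SQLite's EXPLAIN QUERY PLAN shows whether a query can use an index. The sketch below, with a hypothetical index name, checks that a per-agent, per-type lookup on Transactions is served by a composite index on (AgentID, TransTypeID). The plan text differs between SQLite versions, and SQL Server's graphical plans look different again, so this only illustrates the habit of checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
-- Hypothetical composite index matching the WHERE clause of each subquery.
CREATE INDEX idx_trans_agent_type ON Transactions (AgentID, TransTypeID);
""")

query = ("SELECT COUNT(*), SUM(Amount) FROM Transactions "
         "WHERE AgentID = ? AND TransTypeID = ?")

# Each plan row has four columns; the last one (index 3) is the readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1, 3)).fetchall()
for row in plan:
    print(row[3])
```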

FYI, I run almost exactly the same kind of query against a patient database on a weekly basis, counting the number of labs, etc. that each patient has. One of the tables in my query has about a quarter of a million rows, and the query hits 7 tables. It takes a little under 20 seconds to run.
twister002
04-17-2004, 04:56 AM   #6
Crazy
 
Location: Europe
Shouldn't this report be redesigned? Does the person who asked for this realise how hard it will be to read a result with infinite rows and infinite columns? Have you challenged the need for this exact layout?
Just your example with 3 transaction types is tough enough to read. Three or more dimensions of data can seldom be displayed successfully in only two.

So I would redesign it so that you get the result on, let's say, one page per salesman.
__________________
Coffee
AxelF
04-28-2004, 03:40 PM   #7
Banned from being Banned
 
Location: Donkey
Thanks for the replies.

Yes, this is SQL Server. I knew about the Query Analyzer and the execution path and all that, but was curious as to whether or not there's a better method for querying out data in the manner which I described.

Yeah, THIS report could definitely be redesigned (I'll prob. have a talk w/ my project manager about it), but there are times when I have to do ONLY 2 or 3 columns of summed-up data.

I've always sat back and wondered whether it was good practice to query data like that, basically doing this:

select field1, field2= (SELECT SUM(Amount) FROM tbl WHERE x=1), field3 = (SELECT SUM(Amount) FROM tbl WHERE x=2) from tbl ... etc
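One way to test whether that subquery-per-column pattern can be replaced safely is to compute the same columns both ways and compare. A small sketch using the sample Transactions data from the first post and SQLite; the single-pass form puts a CASE inside each SUM (conditional aggregation), so the table is scanned once, whereas each scalar subquery is evaluated separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Transactions VALUES
  (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50);
""")

# One scalar subquery per output column, as in the pattern above.
subquery_form = conn.execute("""
    SELECT (SELECT COALESCE(SUM(Amount), 0) FROM Transactions WHERE TransTypeID = 1),
           (SELECT COALESCE(SUM(Amount), 0) FROM Transactions WHERE TransTypeID = 2),
           (SELECT COALESCE(SUM(Amount), 0) FROM Transactions WHERE TransTypeID = 3)
""").fetchone()

# Same columns from a single scan: CASE sends each row's Amount to one column.
single_pass_form = conn.execute("""
    SELECT COALESCE(SUM(CASE WHEN TransTypeID = 1 THEN Amount END), 0),
           COALESCE(SUM(CASE WHEN TransTypeID = 2 THEN Amount END), 0),
           COALESCE(SUM(CASE WHEN TransTypeID = 3 THEN Amount END), 0)
    FROM Transactions
""").fetchone()

print(subquery_form, single_pass_form)
```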

Another question I have for you, twister, is that you mentioned I should use a view. I've always used stored procs for EVERYTHING: inserting, updating, and selecting. How does a view differ from a stored proc? I've always used a view as a visual query designer of sorts, then took the SQL it made and copied & pasted to a stored proc. I know you can select FROM a view, but I'm unaware of any advantages it has to offer over stored procs.
Stompy
04-29-2004, 02:58 PM   #8
Crazy
 
I think in this case the biggest advantage is probably the ability to index the view. You can create an index combining all of the joins that you do by hand, but it's a little easier to maintain if you just index the view. IMO.

If you're just selecting, there probably isn't much of a difference, except that using a view makes it easier to create an ad-hoc reporting system. Well, a somewhat ad-hoc reporting system anyway: you define the columns that the user can select from, but the user selects the columns to query against.
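A sketch of that kind of view-based, somewhat ad-hoc report: the view fixes the joins and the set of reportable columns, and user-picked column names are checked against a whitelist before being placed in the SELECT. It uses SQLite, which has no indexed views, so it only illustrates the "define the columns in a view" half of the idea; `AgentSummary`, `ALLOWED_COLUMNS` and `run_report` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (AgentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Meetings (MeetingID INTEGER PRIMARY KEY, AgentID INTEGER);
CREATE TABLE Transactions (TransID INTEGER PRIMARY KEY, AgentID INTEGER,
                           TransTypeID INTEGER, Amount INTEGER);
INSERT INTO Agents VALUES (1, 'Bob'), (2, 'Bill');
INSERT INTO Meetings VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO Transactions VALUES
  (1, 1, 1, 5), (2, 1, 2, 10), (3, 1, 3, 15), (4, 1, 1, 10), (5, 2, 3, 50);

-- Hypothetical view: fixes the joins and defines which columns exist.
CREATE VIEW AgentSummary AS
SELECT A.AgentID, A.Name,
       COALESCE(M.Meetings, 0) AS Meetings,
       COALESCE(T.Sales, 0) AS Sales,
       COALESCE(T.Total, 0) AS Total
FROM Agents A
LEFT JOIN (SELECT AgentID, COUNT(*) AS Meetings FROM Meetings GROUP BY AgentID) M
       ON M.AgentID = A.AgentID
LEFT JOIN (SELECT AgentID, COUNT(*) AS Sales, SUM(Amount) AS Total
           FROM Transactions GROUP BY AgentID) T
       ON T.AgentID = A.AgentID;
""")

# Report columns the user may pick; anything else is rejected.
ALLOWED_COLUMNS = ("Name", "Meetings", "Sales", "Total")


def run_report(conn, columns):
    """Hypothetical helper: SELECT the whitelisted columns from the view."""
    bad = [c for c in columns if c not in ALLOWED_COLUMNS]
    if bad or not columns:
        raise ValueError(f"unknown or empty column selection: {bad}")
    sql = f"SELECT {', '.join(columns)} FROM AgentSummary ORDER BY Name"
    return conn.execute(sql).fetchall()


print(run_report(conn, ["Name", "Total"]))
```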
twister002

All times are GMT -8.