03-01-2004, 03:57 AM | #1 (permalink) |
Addict
Location: Grey Britain
|
Databases, Optimisation and Search Algorithms
I need to write a program that searches a very large database with a view to optimising the value of a particular variable (or perhaps I should say field). Most of the fields in the database will be numerical and inter-related, but some are not. For this reason, I have been thinking of using a Genetic Algorithm or an Annealing Algorithm of some description, but neither would guarantee an absolute optimum value. I'm wondering if anyone has any good ideas on how to ensure an absolute optimum without having to perform an exhaustive search.
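Roughly what I had in mind for the annealing side is sketched below; `records`, `fitness` and `neighbour` are just placeholders for whatever I end up defining over the fields, and as far as I can see it only promises a good value, not a guaranteed best one, which is exactly the problem:

[code]
import math
import random

def anneal(records, fitness, neighbour, steps=10000, start_temp=1.0, cooling=0.999):
    """Simulated annealing over candidate records; returns the best record seen."""
    current = random.choice(records)
    best = current
    temp = start_temp
    for _ in range(steps):
        candidate = neighbour(current)        # pick a 'nearby' record to try next
        delta = fitness(candidate) - fitness(current)
        # Always accept an improvement; accept a worse move with a probability
        # that shrinks as the temperature cools.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if fitness(current) > fitness(best):
            best = current
        temp *= cooling
    return best
[/code]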
__________________
"No one was behaving from very Buddhist motives. Then, thought Pigsy, he was hardly a Buddha, nor was he a monkey. Presently, he was a pig spirit changed into a little girl pretending to be a little boy to be offered to a water monster. It was all very simple to a pig spirit." |
03-01-2004, 09:19 AM | #2 (permalink) |
WARNING: FLAMMABLE
Location: Ask Acetylene
|
Without knowing anything about the nature of the problem space I can't say much.

Short answer: you can't. If you want the perfect answer there is no substitute for IDA* and an admissible heuristic.

Long answer: it depends how effective your heuristics are and whether the problem space has many local maxima on the path to most good solutions. How big is this database, and is it a single value you're trying to maximize or a set of values you wish to maximize?

The real bottom line comes down to the heuristic you develop. Start with subsets (REPRESENTATIVE subsets) of the data and compare the performance of various heuristics, whether you use a GA, annealing, or local beam search (you should implement all three and see which works best with each heuristic). Pretty printing is surprisingly useful in developing heuristics: the more information you have about what is going on during the actual search, the more intuition you will have available for developing heuristics.

Also keep an eye on performance. A cleaner program will expand states faster and allow you to gather results more quickly. With careful optimization you can frequently make a simulation that runs in hours run on the order of minutes. This isn't very important when you're running the simulation for real, but when you're developing and need to run it many times to compare heuristics it can be a big deal.

More than anything else, your choice of representation will dictate the performance of expansions. It's the usual balance between space and time, so if you have the space... use it! It will save you many hours of twiddling your thumbs.
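A rough sketch of the kind of comparison harness I mean is below; the search functions and heuristics you pass in are your own, `fitness` is whatever you are actually maximizing, and this just scores and times every pairing on a representative sample:

[code]
import itertools
import random
import time

def compare(searches, heuristics, records, fitness, sample_size=1000, trials=5):
    """Run every (search, heuristic) pair on a representative sample and
    report the mean score and mean wall-clock time per trial."""
    sample = random.sample(records, min(sample_size, len(records)))
    results = {}
    for (s_name, search), (h_name, h) in itertools.product(searches.items(),
                                                           heuristics.items()):
        scores, times = [], []
        for _ in range(trials):
            start = time.time()
            found = search(sample, h)       # each search returns its best record
            times.append(time.time() - start)
            scores.append(fitness(found))
        results[(s_name, h_name)] = (sum(scores) / trials, sum(times) / trials)
    return results
[/code]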
__________________
"It better be funny" Last edited by kel; 03-01-2004 at 10:14 AM.. |
03-20-2004, 02:34 PM | #3 (permalink) |
Once upon a time...
|
Optimality cannot be achieved in a guaranteed fashion through stochastic methods.

IDA* is a good idea, and kel's answer is pretty comprehensive. But I find it hard to think of a goal test unless you can test for optimality arbitrarily, so you might be looking at B&B (branch and bound).

It's difficult to suggest something without knowing more about the problem, but you might be able to pre-calculate some predictor. Try examining statistical methods before you do anything else. Ideally, you could cook up some correlation factor and just extract the best record by that measure.

If you are going down the annealing / genetic line, you must have a fitness function. Why not just apply the fitness function to the data? It means a comprehensive search, but you might be able to subdivide the data by sorting it and bounding it.
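A sketch of the sorting-and-bounding idea is below; the `upper_bound` function is a hypothetical cheap over-estimate of the fitness, and if it can ever under-estimate then the early exit is no longer exact:

[code]
def best_by_bounding(records, fitness, upper_bound):
    """Exact optimum via sort-and-prune (a simple branch and bound).

    upper_bound(r) must be a cheap over-estimate, i.e. upper_bound(r) >= fitness(r)
    for every record, or the early exit below can discard the true optimum.
    """
    ordered = sorted(records, key=upper_bound, reverse=True)
    best, best_score = None, float("-inf")
    for r in ordered:
        if upper_bound(r) <= best_score:
            break                       # nothing later can beat the incumbent
        score = fitness(r)
        if score > best_score:
            best, best_score = r, score
    return best
[/code]

If the bound is reasonably tight you only ever apply the real fitness function to a fraction of the records, but the result is still the true optimum.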
__________________
-- Man Alone ======= Abstainer: a weak person who yields to the temptation of denying himself a pleasure. Ambrose Bierce, The Devil's Dictionary. Last edited by manalone; 03-20-2004 at 02:37 PM.. |
Tags |
algorithms, databases, optimisation, search |