MS SQL Server Performance/Redundancy? - Tilted Forum Project Discussion Community

asshopo · 10-13-2004, 06:06 AM

Ok, a little background. We have about 1000 clients that hit our servers hard. We have 7 or so webservers and 4 database servers. Right now, if a database server has to be rebooted, all clients on that machine are down for the length of the reboot. Even scarier, if anything on any of the machines dies, all the clients on that database server is down until we fix the server! Other times, a single server for a group of clients gets overloaded and we currently move their db to another server. We need something that can let the extra load spill over to another, less busy machine.

We need redundancy somehow. I want to be able to upgrade hardware on the machine, install MS updates, or know that if a cpu dies that our clients won't be left in the dark. What can help this? Clusters? Replication?

Will replication work? If so, how would you tell the scripts on the webservers to look at another database server if 1 went down?

I know clusters can be super expensive if you get a SAN, but is there another way that will work with less money? A machine with raid 0+1 for 500gb hd redundant space that does nothing but filesharing with gigabit ethernet to each machine?

I'm not limiting the possible solution to clusters or replication. If you have any other suggestions, please, throw them out.

Thanks.

bendsley · 10-13-2004, 08:07 AM

Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms.

Also, if you build machines that are there strictly for replication, you should just be able to change your router to point to the new machine, assuming it's replicating data on the fly.

This link might help you some too.

I do have one question for you though, what RAID are you using? We run a quad xeon machine with MS SQL on it in RAID 1 over 6 harddrives in three arrays. We have found that for SQL, RAID 1 is actually faster than the RAID 5 alternative. Everything else we have uses the RAID 5.

asshopo · 10-13-2004, 08:24 AM

Quote:

Originally Posted by bendsley

Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms.

Also, if you build machines that are there strictly for replication, you should just be able to change your router to point to the new machine, assuming it's replicating data on the fly.

This link might help you some too.

I do have one question for you though, what RAID are you using? We run a quad xeon machine with MS SQL on it in RAID 1 over 6 harddrives in three arrays. We have found that for SQL, RAID 1 is actually faster than the RAID 5 alternative. Everything else we have uses the RAID 5.

Awesome link, thanks! Some of our database servers don't even have raid, and I believe our others have raid 0+1. But also, as I said, sometimes a single database server overloaded and would be nice to let the excess load spill over to another, less active machine. Can this only be done thru clustering?

A buddy made a suggestion of having a load balancer split up the work between, say 4 machines, and have them all replicate. Should 1 machine go down, the load balancer would see this and not use that machine until it sees that it is up again. Would this work?

bendsley · 10-14-2004, 05:41 AM

Yeah, thats really what load balancing is. You would have to spend some money, possibly setup some more drives and machines, but the load balancer would detect faults coming through and route them accordingly to the fastest machine. Once the machine that was the fastest has a lot more traffic coming to it, it's obviously not the fastest machine anymore and the traffic gets routed elsewhere.

I would also hope that the machines you have running Raid 0 have tape backups for them? That would be nice if they aren't in place.

What might need to happen is one night you tell your customers that you need to save their sql tables and bring down the servers. Cluster all of the machines and let them replicate on the fly. Enable load-balancing and go from there. I believe that MS SQL can be load balanced, in that it doesn't use shared memory on one machine (ie. they would use memory on the local machine).

Another suggesting might be that you actually just build a large NAS through Fibre Channel connections and store all of your data on there. Have all of the traffic balanced on several machines and have MS SQL point to the NAS. If you do this, your NAS will generate considerable heat considering the drives would be moving most all the time (if you're really getting hit a lot). At least this way you would gain redundancy and have speed through Fibre Channel.

asshopo · 10-14-2004, 08:15 AM

Any machines we have that have raid 0 also have raid 1 (0+1) in that there is a mirror of the raid 0 array.

As for the on the fly replication.. what happens if the same record is updated on 2 machines at the same time. For example, we have a counter field to see how many times an image has been loaded. Is the lag of waitng for the replication going to affect the data? I know the replication is on the fly, but say you have 4 machines load balanced and all are at 80% useage.. its not going to be able to instantly replicate data between all the servers and there is a possiblity of something not getting updated properly.

Thanks for the help, there doesn't seem to be alot of help out there on this subject.

bendsley · 10-14-2004, 11:01 AM

What would happen is that all of the servers would be replicated first. All would have the same data. They need to have the same amount of space on them for this to work effectively. Anyway, all servers would have the same data and when something is added, all of the servers have the same access to the same information. The load-balancing comes in to where the "balancer" tells the application which server to route to.

What I would suggest is putting all of the data in one central location, instead of on several different servers. Then, on each of your servers, you have SQL loaded and tell it where the data store is located (NAS or what-have you). The SQL on every server will always pull the same info from the same place, but the SQL server is being load-balanced for how to handle the information.

Data storage would be easy since all you're putting into it is text from SQL. What is power hungry is how what you do with the SQL info, or how MS SQL uses the info. MS SQL is quite a power hungry/memory hungry program.

asshopo · 10-14-2004, 12:09 PM

Sounds like an idea. Only problem that might occur is file locking. Doesnt MS SQL lock the data and log files? If it does, how do the other machines connect to those data/log files? I did a bit of research and they say that multiple machines can not read/write to the same mdf file, however, reading would be fine. This wont work for us as we need to load balance primarily writing/updating of data.

bendsley · 10-14-2004, 12:17 PM

You might end up just having to buy larger machines then to handle the load. Ours is running Quad Xeon 3.2Ghz with 2mb of cache on each processor and 8gigs of ram.

Are you machines comparable to this?

littlebighead · 10-16-2004, 01:44 AM

I'm not in the IT dept, but I believe the strategy my company uses is something like this:
1. cluster two or four servers all sharing the same SAN disk array and with load-balancer
2. 2PC replication to a standby cluster
3. log-shipping to a remote standby server cluster every five min
4. daily full-backups to remote disk
5. daily full/diff backups to tape

Sounds like overkill, but we haven't had any major problems yet.

10-13-2004, 06:06 AM	#1 (permalink)
asshopo Insane Location: Michigan	MS SQL Server Performance/Redundancy? Ok, a little background. We have about 1000 clients that hit our servers hard. We have 7 or so webservers and 4 database servers. Right now, if a database server has to be rebooted, all clients on that machine are down for the length of the reboot. Even scarier, if anything on any of the machines dies, all the clients on that database server is down until we fix the server! Other times, a single server for a group of clients gets overloaded and we currently move their db to another server. We need something that can let the extra load spill over to another, less busy machine. We need redundancy somehow. I want to be able to upgrade hardware on the machine, install MS updates, or know that if a cpu dies that our clients won't be left in the dark. What can help this? Clusters? Replication? Will replication work? If so, how would you tell the scripts on the webservers to look at another database server if 1 went down? I know clusters can be super expensive if you get a SAN, but is there another way that will work with less money? A machine with raid 0+1 for 500gb hd redundant space that does nothing but filesharing with gigabit ethernet to each machine? I'm not limiting the possible solution to clusters or replication. If you have any other suggestions, please, throw them out. Thanks.

10-13-2004, 08:07 AM	#2 (permalink)
bendsley Professional Loafer Location: texas	Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms. Also, if you build machines that are there strictly for replication, you should just be able to change your router to point to the new machine, assuming it's replicating data on the fly. This link might help you some too. I do have one question for you though, what RAID are you using? We run a quad xeon machine with MS SQL on it in RAID 1 over 6 harddrives in three arrays. We have found that for SQL, RAID 1 is actually faster than the RAID 5 alternative. Everything else we have uses the RAID 5. __________________ "You hear the one about the fella who died, went to the pearly gates? St. Peter let him in. Sees a guy in a suit making a closing argument. Says, "Who's that?" St. Peter says, "Oh, that's God. Thinks he's Denny Crane." Last edited by bendsley; 10-13-2004 at 08:10 AM..

10-14-2004, 05:41 AM	#4 (permalink)
bendsley Professional Loafer Location: texas	Yeah, thats really what load balancing is. You would have to spend some money, possibly setup some more drives and machines, but the load balancer would detect faults coming through and route them accordingly to the fastest machine. Once the machine that was the fastest has a lot more traffic coming to it, it's obviously not the fastest machine anymore and the traffic gets routed elsewhere. I would also hope that the machines you have running Raid 0 have tape backups for them? That would be nice if they aren't in place. What might need to happen is one night you tell your customers that you need to save their sql tables and bring down the servers. Cluster all of the machines and let them replicate on the fly. Enable load-balancing and go from there. I believe that MS SQL can be load balanced, in that it doesn't use shared memory on one machine (ie. they would use memory on the local machine). Another suggesting might be that you actually just build a large NAS through Fibre Channel connections and store all of your data on there. Have all of the traffic balanced on several machines and have MS SQL point to the NAS. If you do this, your NAS will generate considerable heat considering the drives would be moving most all the time (if you're really getting hit a lot). At least this way you would gain redundancy and have speed through Fibre Channel. __________________ "You hear the one about the fella who died, went to the pearly gates? St. Peter let him in. Sees a guy in a suit making a closing argument. Says, "Who's that?" St. Peter says, "Oh, that's God. Thinks he's Denny Crane."

10-14-2004, 11:01 AM	#6 (permalink)
bendsley Professional Loafer Location: texas	What would happen is that all of the servers would be replicated first. All would have the same data. They need to have the same amount of space on them for this to work effectively. Anyway, all servers would have the same data and when something is added, all of the servers have the same access to the same information. The load-balancing comes in to where the "balancer" tells the application which server to route to. What I would suggest is putting all of the data in one central location, instead of on several different servers. Then, on each of your servers, you have SQL loaded and tell it where the data store is located (NAS or what-have you). The SQL on every server will always pull the same info from the same place, but the SQL server is being load-balanced for how to handle the information. Data storage would be easy since all you're putting into it is text from SQL. What is power hungry is how what you do with the SQL info, or how MS SQL uses the info. MS SQL is quite a power hungry/memory hungry program. __________________ "You hear the one about the fella who died, went to the pearly gates? St. Peter let him in. Sees a guy in a suit making a closing argument. Says, "Who's that?" St. Peter says, "Oh, that's God. Thinks he's Denny Crane."

10-14-2004, 12:17 PM	#8 (permalink)
bendsley Professional Loafer Location: texas	You might end up just having to buy larger machines then to handle the load. Ours is running Quad Xeon 3.2Ghz with 2mb of cache on each processor and 8gigs of ram. Are you machines comparable to this? __________________ "You hear the one about the fella who died, went to the pearly gates? St. Peter let him in. Sees a guy in a suit making a closing argument. Says, "Who's that?" St. Peter says, "Oh, that's God. Thinks he's Denny Crane."

10-14-2004, 08:15 AM	#5 (permalink)
asshopo Insane Location: Michigan	Any machines we have that have raid 0 also have raid 1 (0+1) in that there is a mirror of the raid 0 array. As for the on the fly replication.. what happens if the same record is updated on 2 machines at the same time. For example, we have a counter field to see how many times an image has been loaded. Is the lag of waitng for the replication going to affect the data? I know the replication is on the fly, but say you have 4 machines load balanced and all are at 80% useage.. its not going to be able to instantly replicate data between all the servers and there is a possiblity of something not getting updated properly. Thanks for the help, there doesn't seem to be alot of help out there on this subject.

10-14-2004, 12:09 PM	#7 (permalink)
asshopo Insane Location: Michigan	Sounds like an idea. Only problem that might occur is file locking. Doesnt MS SQL lock the data and log files? If it does, how do the other machines connect to those data/log files? I did a bit of research and they say that multiple machines can not read/write to the same mdf file, however, reading would be fine. This wont work for us as we need to load balance primarily writing/updating of data.

10-16-2004, 01:44 AM	#9 (permalink)
littlebighead Upright Location: Berkeley, CA	I'm not in the IT dept, but I believe the strategy my company uses is something like this: 1. cluster two or four servers all sharing the same SAN disk array and with load-balancer 2. 2PC replication to a standby cluster 3. log-shipping to a remote standby server cluster every five min 4. daily full-backups to remote disk 5. daily full/diff backups to tape Sounds like overkill, but we haven't had any major problems yet.