Yeah, it looks like the XOR version is a couple of cycles slower per swap, unless there's some crazy x86 instruction I don't know about that does it better.
Code:
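mov eax, x      ; eax = x
mov ebx, y      ; ebx = y
xor eax, ebx    ; eax = x ^ y
xor ebx, eax    ; ebx = y ^ (x ^ y) = old x
xor eax, ebx    ; eax = (x ^ y) ^ old x = old y
mov x, eax      ; x = old y
mov y, ebx      ; y = old x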
vs.
Code:
mov eax, x      ; eax = x
mov ebx, y      ; ebx = y
mov y, eax      ; y = old x
mov x, ebx      ; x = old y
But these are, of course, the hand-written versions; the Visual Studio compiler generates 9 instructions (xors and movs) for the first one and 6 for the second.
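For anyone who wants to try this themselves, the two variants presumably correspond to C source along these lines. This is only a sketch: the function names `swap_xor` and `swap_tmp` and the use of `uint32_t` pointers are my assumptions, not the code that was actually compiled.

```c
#include <stdint.h>

/* XOR swap: three xors, no temporary variable in the source.
   Hypothetical name; assumes 32-bit unsigned operands. */
static void swap_xor(uint32_t *x, uint32_t *y)
{
    if (x == y)
        return;     /* x ^ x is 0, so aliased pointers would zero the value */
    *x ^= *y;       /* *x = x ^ y */
    *y ^= *x;       /* *y = y ^ (x ^ y) = old x */
    *x ^= *y;       /* *x = (x ^ y) ^ old x = old y */
}

/* Plain swap through a temporary. Hypothetical name. */
static void swap_tmp(uint32_t *x, uint32_t *y)
{
    uint32_t tmp = *x;
    *x = *y;
    *y = tmp;
}
```

Note that the aliasing check in `swap_xor` is extra work the temporary version doesn't need, so it would only widen the gap. The instruction counts you get will depend on the compiler and optimization settings.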