I have an assignment that requires reading from a file that is too large to handle with RandomAccessFile. Instead, I'm reading it through a MappedByteBuffer obtained from a FileChannel so that memory doesn't get overloaded. The first method, extractWords, extracts a word from the input file. The second method, findNextLetter, advances to the next letter, where a letter is anything in [a-zA-Z].
Code:
private static String extractWords(MappedByteBuffer readBuffer) {
    StringBuffer word = new StringBuffer();
    byte nextByte = readBuffer.get();
    while((nextByte > 64 && nextByte < 91) || (nextByte > 96 && nextByte < 123)) {
        word.append((char)nextByte);
        nextByte = readBuffer.get();
    }
    if(readBuffer.hasRemaining())
        findNextLetter(readBuffer);
    return word.toString();
}

private static void findNextLetter(MappedByteBuffer readBuffer) {
    byte nextByte = readBuffer.get();
    while((nextByte > 64 && nextByte < 91) || !(nextByte > 96 && nextByte < 123)) {
        nextByte = readBuffer.get();
    }
    readBuffer.position(readBuffer.position() - 1);
}
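The code that creates the buffer isn't shown above. As a minimal sketch of how a MappedByteBuffer could be obtained from a FileChannel, assuming a read-only mapping of the whole file (the class name MapFileSketch and the method mapReadOnly are placeholders, not part of the assignment):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; the names are illustrative only.
class MapFileSketch {
    // Maps the entire file read-only and returns the buffer.
    // A single mapping cannot exceed Integer.MAX_VALUE bytes, so a larger
    // file would have to be mapped in several regions instead.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The returned buffer starts at position 0, so it can be passed straight to extractWords.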
I used an article from CNN as a test file. The first word was "WASHINGTON" and it printed out just fine, but from that point on, all capitalized letters were ignored. Here's what the test file looks like:
Code:
WASHINGTON (CNN) -- North Korea's claims of a nuclear test establish Pyongyang as a "threat to international peace," President Bush said Wednesday as he pledged to defend U.S. allies and interests in the region.
And it would print out something like this:
Code:
WASHINGTON
orth
orea
s
claims
of
a
nuclear
test
establish
yongyang
as
a
threat
to
international
peace
resident
ush
said
ednesday
as
he
pledged
to
defend
allies
and
interests
in
the
I did a byte printout of each letter and found that the capitalized letters still fall within the 65-90 range and the lowercase ones within 97-122. Any idea why?
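One possible explanation, offered as a sketch rather than a confirmed fix: the loop in findNextLetter keeps skipping while the byte is uppercase OR not lowercase, so every capital letter is stepped over together with the punctuation. "WASHINGTON" survives only because extractWords reads it before findNextLetter is ever called. Negating the whole letter test should stop the skip at any letter. The version below assumes plain ASCII input, takes a ByteBuffer (MappedByteBuffer is a subclass), and adds hasRemaining() checks so that get() cannot throw BufferUnderflowException at the end of the file. The class name LetterScanSketch and the helpers isLetter and words are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; the names are illustrative only.
class LetterScanSketch {
    // True for ASCII A-Z (65-90) or a-z (97-122).
    static boolean isLetter(byte b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
    }

    // Reads letters from the current position until a non-letter
    // or the end of the buffer is reached.
    static String extractWord(ByteBuffer buf) {
        StringBuilder word = new StringBuilder();
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (!isLetter(b)) {
                break; // the non-letter byte is consumed, which is harmless
            }
            word.append((char) b);
        }
        return word.toString();
    }

    // Skips non-letters only, leaving the position on the next letter
    // (or at the limit if no letter is left).
    static void findNextLetter(ByteBuffer buf) {
        while (buf.hasRemaining()) {
            if (isLetter(buf.get())) {
                buf.position(buf.position() - 1); // step back onto the letter
                return;
            }
        }
    }

    // Collects every word in the buffer, in order.
    static List<String> words(ByteBuffer buf) {
        List<String> out = new ArrayList<>();
        findNextLetter(buf);
        while (buf.hasRemaining()) {
            out.add(extractWord(buf));
            findNextLetter(buf);
        }
        return out;
    }
}
```

With this condition, all-capital tokens such as "CNN" and the letters of "U.S." would also come out as words, since nothing uppercase is skipped any more.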