Tilted Forum Project Discussion Community  

Old 05-04-2007, 05:13 PM   #1 (permalink)
Poo-tee-weet?
 
JStrider's Avatar
 
Location: The Woodlands, TX
[Perl]Text file manipulation help

I'm doing a data mining project and am trying to get the data processed into a more usable format. I'm using some AOL search data that was released a few months ago. My professor recommended that I take all the queries for one user ID, group them into one field, and delete all data except the user IDs and queries.

My prof whipped up a quick Perl script that works, but it includes the empty queries, which show up as "-". I would like to remove the queries that are dashes, and remove the users whose queries are all dashes.

So basically it needs to delete the rows that have a "-" in Query. It is all right if I use another script to process the original data, deleting those rows, and then run the script my prof gave me to group it all together.

If you look at the data below, you can see user 217 has several dashes.

If it's not clear, let me know and I'll try to explain better.

This is the script my professor wrote.
Code:
#!/usr/bin/perl

$id = 0;
$i = 0;
$j = 0;
$q = "";
open (IN, "test");
open (OUT, ">testresult");
while (<IN>){
  chomp;
  @a = split /\t/;
  if ($id == $a[0]){
	$q = $q . "; $a[1]";
	$i++;	
  }
  else{
	$j++;
	print OUT "$id\t$q\t$i\n";
	$id = $a[0];
	$q = $a[1];
	$i = 1;
  }

}
close(IN);
close(OUT);


This is a small part of the dataset. The columns are:
Code:
AnonID	Query	QueryTime	ItemRank	ClickURL
Code:
142	rentdirect.com	2006-03-01 07:17:12		
142	www.prescriptionfortime.com	2006-03-12 12:31:06		
142	staple.com	2006-03-17 21:19:29		
142	staple.com	2006-03-17 21:19:45		
142	www.newyorklawyersite.com	2006-03-18 08:02:58		
142	www.newyorklawyersite.com	2006-03-18 08:03:09		
142	westchester.gov	2006-03-20 03:55:57	1	http://www.westchestergov.com
142	space.comhttp	2006-03-24 20:51:24		
142	dfdf	2006-03-24 22:23:07		
142	dfdf	2006-03-24 22:23:14		
142	vaniqa.comh	2006-03-25 23:27:12		
142	www.collegeucla.edu	2006-04-03 21:12:14		
142	www.elaorg	2006-04-03 21:25:20		
142	207 ad2d 530	2006-04-08 01:31:04		
142	207 ad2d 530	2006-04-08 01:31:14	1	http://www.courts.state.ny.us
142	broadway.vera.org	2006-04-08 08:38:23		
142	broadway.vera.org	2006-04-08 08:38:31		
142	vera.org	2006-04-08 08:38:42	1	http://www.vera.org
142	broadway.vera.org	2006-04-08 08:39:30		
142	frankmellace.com	2006-04-09 02:19:24		
142	ucs.ljx.com	2006-04-09 02:20:44		
142	attornyleslie.com	2006-04-13 00:25:27		
142	merit release appearance	2006-04-22 23:51:18		
142	www.bonsai.wbff.org	2006-05-06 08:49:34		
142	loislaw.com	2006-05-12 22:43:36		
142	rapny.com	2006-05-18 09:21:57		
142	whitepages.com	2006-05-19 19:36:31		
217	lottery	2006-03-01 11:58:51	1	http://www.calottery.com
217	lottery	2006-03-01 11:58:51	1	http://www.calottery.com
217	ameriprise.com	2006-03-01 14:06:23	1	http://www.ameriprise.com
217	susheme	2006-03-02 12:31:08		
217	united.com	2006-03-03 14:54:13		
217	mizuno.com	2006-03-07 22:41:17	1	http://www.mizuno.com
217	p; .; p;' p; ' ;' ;';	2006-03-09 12:09:27		
217	p; .; p;' p; ' ;' ;';	2006-03-09 12:09:35		
217	asiansexygoddess.com	2006-03-16 14:31:36	1	http://www.asiansexygoddess.com
217	buddylis	2006-03-16 15:23:33		
217	bestasiancompany.com	2006-03-20 15:15:43	1	http://www.bestasiancompany.com
217	lottery	2006-03-27 14:10:38	1	http://www.calottery.com
217	lottery	2006-03-27 16:34:59	1	http://www.calottery.com
217	ask.com	2006-03-31 14:31:10	1	http://www.ask.com
217	weather.com	2006-03-31 18:00:56		
217	wellsfargo.com	2006-04-03 16:57:54		
217	www.tabiecummings.com	2006-05-04 17:45:57		
217	wanttickets.com	2006-05-16 15:44:38		
217	yahoo.com	2006-05-16 16:35:31		
217	-	2006-05-18 18:20:10	1	http://www.theonering.net
217	www.ngo-quen.org	2006-05-22 15:49:47		
217	-	2006-05-22 16:48:42		
217	vietnam	2006-05-22 17:43:42		
217	vietnam	2006-05-22 17:43:42		
217	vietnam	2006-05-22 17:43:44		
217	vietnam	2006-05-22 18:03:24		
217	vietnam	2006-05-22 18:03:24		
217	vietnam	2006-05-22 18:03:27		
217	-	2006-05-23 15:41:48		
993	myspace.co	2006-03-01 12:13:36		
993	myspace.com	2006-03-01 12:13:41		
993	googl	2006-03-01 15:03:25		
993	chasebadkids.net	2006-03-03 16:55:48	1	http://www.chasebadkids.net
1268	ozark horse blankets	2006-03-01 17:39:28	8	http://www.blanketsnmore.com
1268	www.ghostrockranch.com	2006-03-04 13:58:23		
1268	openrangeht.zachsairforce.com	2006-03-09 22:38:45		
1268	sstack.com	2006-03-11 00:17:09		
1268	www.mecab.org	2006-03-12 18:59:26		
1268	www.raindanceexpress.com	2006-03-18 20:13:01		
1268	www.victoriacostumiere.com	2006-03-19 00:26:51		
1268	osteen-schaztberg.com	2006-03-21 17:55:25		
1268	osteen-schatzberg.com	2006-03-21 17:55:42	1	http://www.osteen-schatzberg.com
1268	osteen-schatzberg.com	2006-03-21 17:55:42	2	http://www.osteen-schatzberg.com
1268	www.buckmountianestates.com	2006-03-24 18:53:10		
1268	idx.techsolsc.com	2006-05-07 00:58:21		
1268	www.bridleandbit.com	2006-05-09 21:34:23		
1268	gall stones	2006-05-11 02:12:51		
1268	gallstones	2006-05-11 02:13:02	1	http://www.niddk.nih.gov
1268	http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00	2006-05-12 00:09:54		
1268	http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00	2006-05-12 00:10:26		
1268	href a href alt a http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00	2006-05-12 01:28:30		
1268	http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00	2006-05-12 10:41:27		
1268	www.acevedoarabians.com	2006-05-21 21:28:21		
1268	adbuyer3.lycos.com	2006-05-31 14:10:52		
1268	www.pinerplantation.com	2006-05-31 21:24:08		
1268	www.pinerplantation.com	2006-05-31 21:24:30		
1268	www.pinerplantation.com	2006-05-31 21:24:56		
1326	files	2006-03-01 17:36:08		
1326	www.kmcwheel.com	2006-03-06 17:31:55		
1326	dellcomputers	2006-03-06 20:09:58		
1326	www.ameicaneaglewheel.com	2006-03-09 19:09:52		
1326	cascadefamilymedical	2006-03-14 11:36:57		
1326	cascadefamilymedical.com	2006-03-14 11:39:49		
1326	milaniwheel.com	2006-03-14 12:37:30		
1326	www.ameicaneaglewheel.com	2006-03-14 18:53:20		
1326	www.ameicaneaglewheel.com	2006-03-15 12:27:48		
1326	pop up adds	2006-03-15 20:07:38		
1326	pop up adds	2006-03-15 20:08:29		
1326	the childs wonderland company	2006-03-21 11:50:10		
1326	the child's wonderland company	2006-03-21 11:59:03	6	http://www.wonderlandtheatre.com
1326	the child's wonderland company	2006-03-21 12:00:55		
1326	the child's wonderland company grand rapids michigan	2006-03-21 12:01:24		
1326	the child's wonderland company grand rapids michigan	2006-03-21 12:01:59		
1326	the childs wonderland co.	2006-03-21 21:20:42		
1326	the child's wonderland co.	2006-03-21 21:22:16		
1326	www.ameicaneaglewheel.com	2006-03-22 12:23:07		
1326	www.budget rentals.com	2006-03-24 18:26:10		
1326	budget truck rental	2006-03-24 18:27:07		
1326	adr wheels	2006-03-28 12:53:39		
1326	adr wheels	2006-03-28 12:57:04		
1326	holiday mansion houseboat	2006-03-29 17:14:01	5	http://www.everyboat.com
1326	back to the future	2006-04-01 17:59:28	1	http://www.imdb.com
1326	holiday mansion houseboat	2006-04-06 20:20:43	1	http://www.iboats.com
1326	www.ameicaneaglewheel.com	2006-04-10 14:04:49		
1326	www.ameicaneaglewheel.com	2006-04-10 14:05:15		
1326	the childs wonderland company	2006-04-11 17:25:27		
1326	konig wheels	2006-04-18 13:29:52	2	http://www.konigwheels.com
1326	konig wheels	2006-04-18 13:29:52	1	http://www.konigwheels.com
1326	jet blue airlines	2006-04-27 15:29:05		
1326	coats tire equipment	2006-04-28 15:53:18		
1326	coats tire equipment	2006-05-03 19:15:01		
1326	verizon wireless	2006-05-09 00:09:22		
1326	www.crazyradiodeals.com	2006-05-23 18:00:30		
1337	uslandrecords.com	2006-03-01 11:50:34	1	http://www.seda-cog.org
1337	titlesourcein.com	2006-03-14 15:45:07		
1337	titlesourceinc	2006-03-14 15:45:55	1	http://www.titlesourceinc.com
1337	select business services	2006-03-14 15:51:41		
1337	select business services title	2006-03-14 15:52:10		
1337	cbc companies	2006-03-14 15:52:44	2	http://www.cbc-companies.com
1337	cbc companies	2006-03-14 15:52:44	3	http://www.cbc-companies.com
1337	cbc companies	2006-03-14 15:52:44	4	http://www.mktgservices.com
1337	national real estate settlement services	2006-03-14 15:59:13	1	http://www.realtms.com
1337	national real estate settlement services	2006-03-14 15:59:13	7	http://dmoz.org
1337	pennsylvania real estate settlement services	2006-03-14 16:04:40		
1337	pennsylvania real estate settlement services	2006-03-14 16:05:11		
1337	sunbury pennsylvania real estate settlement services	2006-03-14 16:05:47		
1337	sunbury pennsylvania real estate settlement services	2006-03-14 16:06:28	14	http://pa.optimuslaw.com
1337	atm corporation	2006-03-15 13:46:55	1	http://www.atmprof.com
1337	cheasapeake appraisal and settlement services	2006-03-15 13:50:56	1	http://www.johnkvaluation.com
1337	chesapeake appraisal and settlement services	2006-03-15 13:51:52	10	http://www.citigroup.com
1337	pauslandrecords.com	2006-03-20 09:40:50		
1337	pa.uslandrecords.com	2006-03-20 09:41:08	2	http://www.seda-cog.org
1337	first american lenders advantage	2006-03-22 16:05:56	1	http://www.firstam.com
1337	first american chesapeake	2006-03-22 16:11:31		
1337	first american chesapeake title services	2006-03-22 16:11:50	2	http://www.tavma.com
1337	www.national-reis.com	2006-03-22 16:16:56		
1337	www.americantitleinc.com	2006-03-22 16:19:23		
1337	www.aculinkms.com	2006-03-22 16:19:31		
1337	united one resources	2006-03-22 17:47:11	1	http://www.unitedoneresources.com
1337	credit plus solutions group	2006-03-22 17:52:53		
1337	credit plus solutions group	2006-03-22 17:54:09	1	http://www.cpsg.com
1337	security search and abstract	2006-03-22 17:56:19	1	http://www.securitysearchabstract.com
1337	searchtec	2006-03-22 17:58:46	1	http://www.searchtec.com
1337	searchtec	2006-03-22 17:58:46	1	http://www.searchtec.com
1337	fiserv	2006-03-24 14:05:01	1	http://www.fiserv.com
1337	fiserv	2006-03-24 14:05:01	3	http://www.fiservlendingsolutions.com
1337	fiserv	2006-03-24 14:05:01	2	http://www.fiservinsurance.com
1337	fiserv	2006-03-24 14:05:01	3	http://www.fiservlendingsolutions.com
1337	integrated real estate	2006-03-27 14:52:29	1	http://www.integratedreal.com
1337	integrated real estate	2006-03-27 14:52:29	2	http://www.irisnet.net

This is the output, with the dashes still in it:
Code:
0		0
142	rentdirect.com; www.prescriptionfortime.com; staple.com; staple.com; www.newyorklawyersite.com; www.newyorklawyersite.com; westchester.gov; space.comhttp; dfdf; dfdf; vaniqa.comh; www.collegeucla.edu; www.elaorg; 207 ad2d 530; 207 ad2d 530; broadway.vera.org; broadway.vera.org; vera.org; broadway.vera.org; frankmellace.com; ucs.ljx.com; attornyleslie.com; merit release appearance; www.bonsai.wbff.org; loislaw.com; rapny.com; whitepages.com	27
217	lottery; lottery; ameriprise.com; susheme; united.com; mizuno.com; p; .; p;' p; ' ;' ;';; p; .; p;' p; ' ;' ;';; asiansexygoddess.com; buddylis; bestasiancompany.com; lottery; lottery; ask.com; weather.com; wellsfargo.com; www.tabiecummings.com; wanttickets.com; yahoo.com; -; www.ngo-quen.org; -; vietnam; vietnam; vietnam; vietnam; vietnam; vietnam; -	29
993	myspace.co; myspace.com; googl; chasebadkids.net	4
1268	ozark horse blankets; www.ghostrockranch.com; openrangeht.zachsairforce.com; sstack.com; www.mecab.org; www.raindanceexpress.com; www.victoriacostumiere.com; osteen-schaztberg.com; osteen-schatzberg.com; osteen-schatzberg.com; www.buckmountianestates.com; idx.techsolsc.com; www.bridleandbit.com; gall stones; gallstones; http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00; http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00; href a href alt a http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00; http www.flickr.com photos 88145967 n00 24368586 in pool-32148876 n00; www.acevedoarabians.com; adbuyer3.lycos.com; www.pinerplantation.com; www.pinerplantation.com; www.pinerplantation.com	24
1326	files; www.kmcwheel.com; dellcomputers; www.ameicaneaglewheel.com; cascadefamilymedical; cascadefamilymedical.com; milaniwheel.com; www.ameicaneaglewheel.com; www.ameicaneaglewheel.com; pop up adds; pop up adds; the childs wonderland company; the child's wonderland company; the child's wonderland company; the child's wonderland company grand rapids michigan; the child's wonderland company grand rapids michigan; the childs wonderland co.; the child's wonderland co.; www.ameicaneaglewheel.com; www.budget rentals.com; budget truck rental; adr wheels; adr wheels; holiday mansion houseboat; back to the future; holiday mansion houseboat; www.ameicaneaglewheel.com; www.ameicaneaglewheel.com; the childs wonderland company; konig wheels; konig wheels; jet blue airlines; coats tire equipment; coats tire equipment; verizon wireless; www.crazyradiodeals.com	36
Thanks for any help, you guys.
__________________
-=JStrider=-

~Clatto Verata Nicto
JStrider is offline  
Old 05-05-2007, 10:59 AM   #2 (permalink)
Psycho
 
Are you a CS major? Do you understand what your prof wrote? You should understand it and try to solve it yourself; otherwise the later Perl scripts your prof tells you to write will only get harder.

What your prof wrote assumes each user ID occurs in one consecutive block. If an ID occurs in several separate blocks, each block will be printed on its own line.

Here is my solution. I haven't tested it, but it should work:

Code:
if ($id == $a[0]){
  if ($a[1] != '\-') {   # skip the dashes
    $q = $q . "; $a[1]";
    $i++;
  }
}
else{
  $j++;
  if ($i != 0) {   # does not print if 0 occurrences of id (i.e. all dashes)
    print OUT "$id\t$q\t$i\n";
  }
  $id = $a[0];
  if ($a[1] == '\-') {
    $q = "";
    $i = 0;
  }
  else {
    $q = $a[1];
    $i = 1;
  }
}
match000 is offline  
Old 05-05-2007, 12:45 PM   #3 (permalink)
Darth Papa
 
ratbastid's Avatar
 
Location: Yonder
There are all sorts of tests you might add to the condition that decides whether we save the query. As match000 points out, if a user's records are intermingled with other users', this script doesn't quite solve the problem.

Personally, I'd use a hash to store up all the queries for each userid, and dump them all at once. Assuming the data set is small enough to fit in available memory, of course...
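
A minimal sketch of the hash approach described above, assuming the whole input fits in memory. The sub name group_queries and the sample rows are illustrative, not from the thread. Each user ID maps to an array of that user's queries, so it does not matter whether a user's rows are contiguous.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group tab-separated "AnonID<TAB>Query<TAB>..." lines by user ID,
# skipping rows whose query is exactly "-".
# Returns a hash ref: id => array ref of queries.
sub group_queries {
    my @lines = @_;
    my %queries_for;
    for my $line (@lines) {
        chomp $line;
        my ($id, $query) = split /\t/, $line;
        next unless defined $query;     # malformed or blank row
        next if $query eq '-';          # empty-query placeholder
        push @{ $queries_for{$id} }, $query;
    }
    return \%queries_for;
}

# Sample rows in the thread's format; user 217 is deliberately split
# into two blocks to show that ordering does not matter here.
my @sample = (
    "217\tlottery\t2006-03-01 11:58:51\t1\thttp://www.calottery.com\n",
    "993\tmyspace.com\t2006-03-01 12:13:41\t\t\n",
    "217\t-\t2006-05-22 16:48:42\t\t\n",
    "217\tvietnam\t2006-05-22 17:43:42\t\t\n",
);

my $grouped = group_queries(@sample);
for my $id (sort { $a <=> $b } keys %$grouped) {
    my @q = @{ $grouped->{$id} };
    print join("\t", $id, join('; ', @q), scalar @q), "\n";
}
```

A user whose rows are all dashes never gets a hash entry, so that user drops out of the output without any extra check. The cost is memory: every kept query is held until the end.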
ratbastid is offline  
Old 05-05-2007, 01:23 PM   #4 (permalink)
Poo-tee-weet?
 
JStrider's Avatar
 
Location: The Woodlands, TX
I'm not a CS major. This is for a data mining class, and this is just part of the data preprocessing before we can start analyzing the data, so it's not critical what method we use to process the data, just that we do process it.

The data is sorted by user ID and then by date, so it should be fine to assume all the user IDs are in consecutive blocks.

As far as loading everything into a hash, the dataset is 2.1 GB, so that probably wouldn't work.

My partner just figured one out that seems to work on a small part of the dataset.

Code:
#!/usr/bin/perl

$id = 0;
#$i = 0;
#$j = 0;
$q = "";

open (IN, "Test");
open (OUT, ">TestResult");
while (<IN>){
  chomp;               #removes the trailing newline from the current line
  @a = split /\t/;     #splits the current line into its tab-separated fields

if ($id == $a[0]){
       if (!($a[1] =~ m/[\-]$/)){      #regex FTW!
       $q = $q . "; $a[1]";
       }
        #$i++;
  }
  else{
       #$j++;
       print OUT "$id\t$q\t\n";
       #modify this print statement to display the number of terms this user
       #searched for

       $id = $a[0];
       $q = $a[1];
       #$i = 1;
  }
}
close(IN);
close(OUT);

But for some reason I'm getting "bash: ./group_ID.pl: /usr/bin/perl: bad interpreter: Permission denied" when I try to run it. I just reinstalled Ubuntu this morning, so it must have something to do with that.
JStrider is offline  
Old 05-05-2007, 01:35 PM   #5 (permalink)
Psycho
 
Your partner's solution does not handle the case where a user ID has a single occurrence AND its query is '-'.

My solution offers one way of taking care of this.

Also, for my solution, move $j++ inside the if block that has the print statement. I am assuming $j counts the number of user ID blocks printed.
match000 is offline  
Old 05-05-2007, 02:09 PM   #6 (permalink)
Poo-tee-weet?
 
JStrider's Avatar
 
Location: The Woodlands, TX
match000, when I run your script on the small dataset I just get this output:

"142 ; 207 ad2d 530; 207 ad2d 530 2"

Thanks for all the help, it's really appreciated. Do you know of any good online Perl tutorials that I can check out? I can follow the logic well enough, but the syntax just seems weird to me.

It looks like my partner's script has problems in some places where a user ID has only one row, and it doesn't remove all the dashes.

Last edited by JStrider; 05-05-2007 at 02:21 PM.. Reason: Automerged Doublepost
JStrider is offline  
Old 05-05-2007, 03:20 PM   #7 (permalink)
Psycho
 
Dunno. It could be a logical error, or more likely it's the way I do the comparison with the dash character ('-').

$a[1] == '\-' might not be correct. I am assuming - is a special character, so you have to use '\-', but maybe you just use '-' if it's not a special character.

Also, I don't know if you can do == '\-' in Perl. You might have to use a regexp like your partner did, something like:

$a[1] =~ /\s*\-\s*/ (not sure if this is correct, but you get the picture)
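
On the comparison question: in Perl, == and != compare numerically, while eq and ne compare strings. A single-quoted '\-' is the two characters backslash and dash, and both it and most queries convert to the number 0. So the != test only lets through queries that start with a nonzero number, which is consistent with the "207 ad2d 530" output reported above. A dash is not special outside a character class, so a regex needs no backslash, but it does need anchors; otherwise it also matches queries that merely contain a dash. A small sketch comparing the three tests (the helper name is_dash is made up for illustration):

```perl
use strict;
use warnings;

# True when the query field is the "-" placeholder, ignoring surrounding spaces.
sub is_dash {
    my ($query) = @_;
    return $query =~ /^\s*-\s*$/ ? 1 : 0;
}

for my $query ('-', 'rentdirect.com', '207 ad2d 530', 'osteen-schatzberg.com') {
    no warnings 'numeric';   # silence "isn't numeric" so the demo prints cleanly
    my $numeric_test = ($query != '\-') ? 'kept' : 'skipped';   # numeric comparison, as in the snippet above
    my $string_test  = ($query ne '-')  ? 'kept' : 'skipped';   # exact string comparison
    my $regex_test   = is_dash($query)  ? 'skipped' : 'kept';   # anchored regex
    print "$query: != gives $numeric_test, ne gives $string_test, regex gives $regex_test\n";
}
```

If the field is always exactly "-", the plain ne '-' test is the simplest choice; the anchored regex only adds tolerance for stray whitespace.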

Hope this helps.

For tutorials, google "perl tutorial". The first hit is very outdated, but I read it anyway (in January) to learn Perl. Also google "stanford perl tutorial"; I think that gives a good one.
match000 is offline  
Old 05-05-2007, 05:31 PM   #8 (permalink)
Junkie
 
Location: San Antonio, TX
Well. First things first.

Add:


use warnings;
use strict;


Right below the #!/usr/bin/perl line in your script. Do this for all your Perl scripts, great or small. Tell your professor I told him to do this as well.

Also, I noticed that $j is never used. Weird, but whatever.

Anyway, the test I decided to use was /\w\w/, i.e. 'does the supposed URL contain two adjacent 'word' characters (letters, digits, or underscore)?' This rules out a single dash, and a bunch of other possible bogus data.
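
A quick sketch of what the /\w\w/ test keeps and drops. In Perl, \w matches letters, digits, and underscore, so a lone dash never matches. The sub name looks_like_query is made up for illustration; the punctuation-only query is taken from user 217 in the sample data.

```perl
use strict;
use warnings;

# The test from the post: true when the query has two adjacent word characters.
sub looks_like_query {
    my ($query) = @_;
    return $query =~ /\w\w/ ? 1 : 0;
}

for my $query ('-', 'a', 'x_', "p; .; p;' p; ' ;' ;';", 'vietnam') {
    printf "%-25s %s\n", "'$query'", looks_like_query($query) ? 'kept' : 'dropped';
}
```

Note that this is stricter than removing dashes: it also drops one-character queries and rows like the keyboard-mash one from user 217, which may or may not be wanted for the analysis.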

As an aside, apparently tfproject's software doesn't properly escape '>' and '<', so you need to change them to '&gt;' and '&lt;' respectively when you post code, even inside a 'code' block. This is what caused the 'while' line to not show up correctly.

Code:
#!/usr/bin/perl

use warnings;
use strict;

my $id = 0;
my $i = 0;
my $j = 0;
my $q = "";

open (IN, "test");
open (OUT, ">testresult");

while (<IN>){
  chomp;
  my @a = split /\t/;

  if ($id == $a[0]) {
    if ($a[1] =~ /\w\w/) {
      $q = $q . "; $a[1]";
      $i++;
    }
  }
  else {
    $j++;
    print OUT "$id\t$q\t$i\n";
    $id = $a[0];
    $q = $a[1];
    $i = 1;
  }

}
close(IN);
close(OUT);
Oh, and as far as your error running the script goes, it sounds like something is wrong with the perl at /usr/bin/perl. Try running 'which perl' to see if it's available, and 'perl -v' to make sure it really runs. If it isn't there, install it. Or install a real distro. Ubuntu sux. (I kid. ;-))
robot_parade is offline  
Old 05-05-2007, 07:01 PM   #9 (permalink)
Poo-tee-weet?
 
JStrider's Avatar
 
Location: The Woodlands, TX
robot_parade, I got the scripts running. What do use warnings and use strict do?

Your script is very close. The only problem I see is that when the row with the '-' is the first one for that ID, it still gets included. This wouldn't have shown up using the little example dataset I provided.

I'm thinking that the way to go may be to have two different scripts: one that just deletes the rows with '-' in the query field, and then the one that my prof wrote, run afterwards, to group it all together.
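
The first of those two scripts could look roughly like this. It is only a sketch: the sub name keep_row is made up, and it assumes the Query field is the second tab-separated column and that an empty query is exactly "-".

```perl
use strict;
use warnings;

# Returns true when a tab-separated row should be kept, i.e. its
# Query field (second column) is present and is not the "-" placeholder.
sub keep_row {
    my ($line) = @_;
    my (undef, $query) = split /\t/, $line;
    return 0 unless defined $query;   # no Query column at all
    $query =~ s/\s+\z//;              # trim a trailing newline or spaces
    return $query eq '-' ? 0 : 1;
}

# Rows copied from user 217 in the sample data.
my @sample = (
    "217\tyahoo.com\t2006-05-16 16:35:31\t\t\n",
    "217\t-\t2006-05-18 18:20:10\t1\thttp://www.theonering.net\n",
    "217\twww.ngo-quen.org\t2006-05-22 15:49:47\t\t\n",
    "217\t-\t2006-05-22 16:48:42\t\t\n",
);
print grep { keep_row($_) } @sample;
```

For the full 2.1 GB file, the same function could be driven from a while (my $line = <STDIN>) { print $line if keep_row($line) } loop, so only the current line is ever held in memory. Since users with nothing but dashes lose all their rows, the grouping script then never sees them.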

Here's a link to where I originally downloaded the datasets: http://www.gregsadetsky.com/aol-data/ It's a really interesting dataset just to open up and look through to see what people are searching for.
JStrider is offline  
Old 05-05-2007, 07:12 PM   #10 (permalink)
Psycho
 
Quote:
Originally Posted by JStrider
Your script is very close. The only problem I see is that when the row with the '-' is the first one for that ID, it still gets included. This wouldn't have shown up using the little example dataset I provided.
That's the bug I mentioned in my post above. You need an if statement around the print statement as well; do something similar to what I have in my code to fix this.

Basically, you are not catching the case where a user ID has only ONE entry, and that one entry is a '-'.
match000 is offline  
Old 05-05-2007, 08:53 PM   #11 (permalink)
Darth Papa
 
ratbastid's Avatar
 
Location: Yonder
Quote:
Originally Posted by robot_parade
use warnings;
use strict;
Indeed.

Quote:
Originally Posted by robot_parade
Anyway, the test I decided to use was /\w\w/, i.e. 'does the supposed URL contain two adjacent 'word' characters (letters, digits, or underscore)?' This rules out a single dash, and a bunch of other possible bogus data.
That's exactly the test I was planning on proposing.
ratbastid is offline  
Old 05-06-2007, 09:51 AM   #12 (permalink)
Junkie
 
Location: San Antonio, TX
Quote:
Originally Posted by match000
That's the bug I mentioned in my post above. You need an if statement around the print statement as well; do something similar to what I have in my code to fix this.

Basically, you are not catching the case where a user ID has only ONE entry, and that one entry is a '-'.

Oops - good catch. Duplicating the if() with the regex is probably the right way to go here.

-RN

Quote:
Originally Posted by JStrider
robot_parade, I got the scripts running. What do use warnings and use strict do?

Your script is very close. The only problem I see is that when the row with the '-' is the first one for that ID, it still gets included. This wouldn't have shown up using the little example dataset I provided.

I'm thinking that the way to go may be to have two different scripts: one that just deletes the rows with '-' in the query field, and then the one that my prof wrote, run afterwards, to group it all together.

Here's a link to where I originally downloaded the datasets: http://www.gregsadetsky.com/aol-data/ It's a really interesting dataset just to open up and look through to see what people are searching for.
Run 'perldoc warnings' and 'perldoc strict' for an in-depth explanation. Basically, together they warn you if you're doing 'unwise' things, like not declaring your variables with 'my', for instance. Experienced Perl coders pretty much always use them. They encourage better style and fewer bugs.
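
To make that concrete, here is a small sketch of the two most common catches: a misspelled variable name (strict) and use of a value that was never set (warnings). The variable names are invented for the demo. The snippet with the typo is compiled inside a string eval so the error can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile a snippet with a misspelled variable; under strict this is a
# compile-time error instead of a silently created global.
my $ok = eval q{
    use strict;
    my $count = 0;
    $cuont = $count + 1;   # typo: $cuont was never declared with 'my'
    1;
};
my $strict_error = $ok ? '' : $@;
print "strict says: $strict_error";

# Capture what 'use warnings' reports when an undefined value is used.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $query;                      # declared but never assigned
    my $joined = "query: " . $query;
}
print "warnings says: $_" for @warnings;
```

Without strict, the typo would quietly create a second variable and the counter would never change, which is exactly the kind of bug that is hard to spot in a short data-munging script.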

I think match000's solution of testing for a '-' in the 'else' clause too is probably the best way to go there.
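
Putting the pieces of the thread together, one possible single-pass version is sketched below. It is a variant, not anyone's posted script: instead of duplicating the dash test in both branches, it applies the test once to every row, and it only prints a group that has at least one real query. It also writes out the last user after the loop. The original script prints only when the ID changes, which is why the output posted at the top of the thread starts with a stray "0 0" row and stops at user 1326 even though the sample data runs through user 1337. The sub name group_sorted and user 500 in the sample are made up; the input is assumed to be sorted by user ID, as stated earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Streaming grouper: reads tab-separated rows from $in (sorted by user ID)
# and writes "id<TAB>q1; q2; ...<TAB>count" lines to $out.
# Only one user's queries are held in memory at a time.
sub group_sorted {
    my ($in, $out) = @_;
    my ($id, @queries);

    # Write the current user's group, unless every query was a dash.
    my $flush = sub {
        print {$out} join("\t", $id, join('; ', @queries), scalar @queries), "\n"
            if defined $id && @queries;
    };

    while (my $line = <$in>) {
        chomp $line;
        my ($row_id, $query) = split /\t/, $line;
        next unless defined $query;         # blank or malformed row
        if (!defined $id || $row_id ne $id) {
            $flush->();                     # user changed: write previous group
            ($id, @queries) = ($row_id);    # start a new, empty group
        }
        push @queries, $query unless $query eq '-';
    }
    $flush->();                             # write the last user's group
    return;
}

# Demo on an in-memory file; user 500 is a hypothetical dash-only user.
my $sample = join '', map { "$_\n" } (
    "142\tdfdf\t2006-03-24 22:23:07",
    "217\t-\t2006-05-18 18:20:10\t1\thttp://www.theonering.net",
    "217\tvietnam\t2006-05-22 17:43:42",
    "500\t-\t2006-05-23 15:41:48",
    "993\tgoogl\t2006-03-01 15:03:25",
);
open my $in, '<', \$sample or die "cannot open in-memory sample: $!";
group_sorted($in, \*STDOUT);
close $in;
```

For the real data, the two handles would come from something like open my $in, '<', 'test' and open my $out, '>', 'testresult', each followed by or die so a missing file is reported instead of silently producing empty output.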

Last edited by robot_parade; 05-06-2007 at 09:55 AM.. Reason: Automerged Doublepost
robot_parade is offline  
 

Tags
file, manipulation, perltext

