July 3rd, 2003

Illuminati

Searching

Having spent far too long searching journals by hand, I created my first .NET program - which took a copy of my friends list (pasted in from my userinfo), removed all the commas and spaces and turned it into an array of names, and then opened a copy of the correct day for each of those names.

Which gave me a copy of Mozilla with 103 tabs in it. A tad slow, to say the least. I wonder if I can get Mozilla to overflow when it hits 30 people.

Still, I learned a fair bit putting my reading into practice.

Oh, for the sake of the archives - my first C# program:
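
The C# listing itself isn't included in this copy, so what follows is only a rough sketch of the steps described above, written in Python rather than C#. The URL pattern, the example names and the 30-tab batch size are all assumptions for illustration, not what the original program used.

```python
import re
import webbrowser

# Placeholder URL pattern (an assumption): swap in whatever address the
# journal site really uses for "one person's entries on one day".
DAY_URL = "http://example.com/users/{name}/{year:04d}/{month:02d}/{day:02d}/"


def parse_friends(pasted):
    """Turn a pasted, comma-separated friends list into a list of names."""
    # Split on any run of commas and whitespace, dropping empty leftovers.
    return [name for name in re.split(r"[,\s]+", pasted) if name]


def day_urls(names, year, month, day):
    """Build one day-view URL per name."""
    return [DAY_URL.format(name=n, year=year, month=month, day=day) for n in names]


def batches(items, size):
    """Yield chunks of at most `size` items, e.g. one chunk per browser window."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def open_in_tabs(urls):
    """Open every URL in a new tab of the default browser."""
    for url in urls:
        webbrowser.open_new_tab(url)
```

Splitting the URLs with batches(urls, 30) and opening one batch at a time would be one way to avoid ending up with 103 tabs at once.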

One small step

The great leap is a Guardian article about the journey out of Africa and the change that happened in the species around that time.

By looking at a combination of clues from the human genome and archaeology we can trace two routes - one along the southern coast of Asia, which reached Australia around 50,000 years ago. Another route, inland via the Middle East, would lead to the settlement of Europe by around 35,000 years ago and to the Americas (via the Siberian arctic) 20,000 years later.


Fascinating stuff. Anyone recommend a good book on the subject?

Anti-Spam

Having seen numerous articles on Bayesian Spam Filtering, I decided to take a leap and see what it could do for me.

I snagged a copy of POPFile, followed the incredibly simple install instructions and downloaded my mail. During installation I'd given it the names of four "buckets" I wanted mail put into - spam, friends, livejournal and general.

I also told it not to put the name of the bucket at the start of the subject line (which would give every spam mail a subject starting with [spam]), but instead to put the bucket into the email headers (where it wouldn't be seen). My email reader, The Bat, happily filters based on header information and I didn't see why other people should have to put up with subject lines being mangled in replies.
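
For anyone wondering what filtering on a header looks like in practice, here is a minimal sketch in Python. It assumes the bucket arrives in a header called X-Text-Classification (the name I believe POPFile uses, but check your own mail), and the folder names are made up for the example. The Bat does the equivalent with its own filter rules, not with code like this.

```python
from email import message_from_string

# Assumption: the classifier writes its verdict into this header.
BUCKET_HEADER = "X-Text-Classification"

# Hypothetical mapping from bucket name to mail folder.
FOLDERS = {
    "spam": "Spam",
    "friends": "Friends",
    "livejournal": "LiveJournal",
    "general": "Inbox",
}


def folder_for(raw_message, default="Inbox"):
    """Pick a folder from the bucket header, leaving the subject untouched."""
    msg = message_from_string(raw_message)
    bucket = (msg.get(BUCKET_HEADER) or "").strip().lower()
    return FOLDERS.get(bucket, default)


# A made-up message: the verdict is in the headers, the subject is unmangled.
sample = (
    "From: someone@example.com\n"
    "Subject: Hello\n"
    "X-Text-Classification: spam\n"
    "\n"
    "Body text\n"
)
```

Because the verdict lives in a header, anyone replying to the mail sees the original subject line, which is the whole point.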

Of course, the first time I ran the software it had no idea where the emails should go, as I hadn't trained it at all. It therefore filed them all in bucket number one, which happened to be "spam". I then opened the User Interface (web based and very easy to use) and went through all of the non-spam emails, telling it what group they should belong in. I also set up filters in The Bat to put emails from each bucket into their own folders.

After I'd trained it on about 20 emails, it managed to spot livejournal posts perfectly (they obviously have enough in common to be easily spotted), spam near perfectly (no false negatives, only the occasional false positive) and friends fairly well (Joe and Mike are now both recognised instantly as friends, and it'll learn the rest of you as I get emails from you).
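
To give a feel for what is going on under the hood, here is a toy naive Bayes classifier with the same four buckets, in Python. It is a generic textbook sketch with add-one smoothing, not POPFile's actual code, and the class name, tokeniser and training sentences are all invented for the example.

```python
import math
import re
from collections import Counter


class BucketClassifier:
    """Toy multi-bucket naive Bayes text classifier (illustrative only)."""

    def __init__(self, buckets):
        self.buckets = list(buckets)
        self.word_counts = {b: Counter() for b in self.buckets}
        self.message_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Lower-case the text and keep runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, bucket):
        """Record that this message belongs in this bucket."""
        self.word_counts[bucket].update(self.tokenize(text))
        self.message_counts[bucket] += 1

    def score(self, text, bucket):
        """Log of (smoothed prior) times (smoothed word likelihoods)."""
        vocabulary = set().union(*self.word_counts.values())
        total_messages = sum(self.message_counts.values())
        log_p = math.log(
            (self.message_counts[bucket] + 1) / (total_messages + len(self.buckets))
        )
        counts = self.word_counts[bucket]
        # The extra +1 reserves a little probability for never-seen words.
        denominator = sum(counts.values()) + len(vocabulary) + 1
        for word in self.tokenize(text):
            log_p += math.log((counts[word] + 1) / denominator)
        return log_p

    def classify(self, text):
        # max() keeps the first bucket on a tie, so an untrained classifier
        # files everything in bucket number one.
        return max(self.buckets, key=lambda b: self.score(text, b))


classifier = BucketClassifier(["spam", "friends", "livejournal", "general"])
untrained_guess = classifier.classify("anything at all")

# Made-up training messages standing in for corrections made in the UI.
classifier.train("cheap pills buy now", "spam")
classifier.train("new journal entry posted comment", "livejournal")
classifier.train("pub tonight see you there", "friends")
```

Before any training every bucket scores the same and the first one wins, which matches everything landing in "spam" on the first run. Each correction is just another call to train.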

It's not perfect (yet), but it's doing a darn good job and I'm confident that as I feed it more email it'll get better and better. Apparently I should only touch it when it categorises something incorrectly, so I happily predict that by this time next week I'll pretty much have forgotten it's there.

Oh, and for all you weirdos that don't use Windows - it's Perl based and installs on Macs and Linux...

Highly recommended