Email Problem?

I’ve got a file containing all my sent email for the past decade, in date order. I want to pull out the messages from 2011 and 2012. My bash (or perl?) scripting skills are pretty rusty, but here is the pseudocode for what I need to do:

******************************

Open original text file read only

Open new file for writing

Set Printflag to “false”

For each line of file

   If match regular expression “Date: .* 2011” set Printflag to “true.”

   If match regular expression “Date: .* 2013” set Printflag to “false.”

   If Printflag, copy line to new file

Close both files.

******************************
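
For readers who want something runnable, here is a minimal Python sketch of the pseudocode above. It is an illustration under assumptions, not the solution I ended up using: the names `extract_years` and `copy_range` are made up, the patterns are anchored at the start of the line (an addition, so a quoted “Date:” in a message body is less likely to match), and it stops reading at the first 2013 header on the assumption that the file really is in date order.

```python
import re
from typing import Iterable, Iterator

# Assumed patterns, following the pseudocode but anchored at line start.
START = re.compile(r"^Date: .* 2011\b")
STOP = re.compile(r"^Date: .* 2013\b")


def extract_years(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines from the first 2011 Date header up to (not including)
    the first 2013 Date header."""
    printing = False
    for line in lines:
        if START.match(line):
            printing = True
        elif STOP.match(line):
            # File is in date order, so nothing after this point is wanted.
            return
        if printing:
            yield line


def copy_range(old_path: str, new_path: str) -> None:
    """Open the original read-only, write the selected lines to a new file."""
    with open(old_path, "r", encoding="utf-8", errors="replace") as src, \
            open(new_path, "w", encoding="utf-8") as dst:
        dst.writelines(extract_years(src))
```

Like the pseudocode, this starts copying at the first 2011 “Date:” line, so any headers of that first message that come before its “Date:” line are not copied.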

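Note that the regular expression may have to be a little more specific (like maybe include a match for (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) between “Date:” and the year; that has to be a group in parentheses, since square brackets would match single characters rather than month names). The format for a full line would be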

“Date: Fri, 02 Oct 2009 14:41:31 -0400”
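
One way the more specific match might look, sketched in Python. The pattern assumes every header follows the sample line above (optional weekday, two-digit day, three-letter month, four-digit year); `DATE_HEADER` and `header_year` are hypothetical names for illustration.

```python
import re
from typing import Optional

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Anchored at line start; the year is captured as group 1.
DATE_HEADER = re.compile(
    r"^Date: (?:[A-Z][a-z]{2}, )?\d{1,2} (?:" + MONTHS + r") (\d{4}) "
)


def header_year(line: str) -> Optional[int]:
    """Return the year of a Date header line, or None if the line is not one."""
    m = DATE_HEADER.match(line)
    return int(m.group(1)) if m else None
```

With the year pulled out as a number, the start and stop tests become plain comparisons (year >= 2011, year >= 2013) instead of two separate regular expressions.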

The idea is to scan the file until 2011 begins, start echoing lines to the new file, and stop doing it when 2013 begins. Maybe exit at that point so it doesn’t waste time or cycles going through the rest of it.

[Update a while later]

Never mind, I got the answer from Twitter.

[Saturday-morning update]

For those curious, here’s the solution to the problem:

$ perl -ne 'print if /Date:.*2011/../Date:.*2013/' $oldfile >> $newfile
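
The one-liner behaves slightly differently from the pseudocode: Perl’s range operator stays true through the line that matches its right-hand side, so the first 2013 “Date:” line is printed too, and the script keeps reading to the end of the file. A rough Python emulation of that behavior, with a hypothetical `flip_flop` helper, makes the difference visible:

```python
import re
from typing import Iterable, Iterator, Pattern


def flip_flop(lines: Iterable[str], start: Pattern[str],
              stop: Pattern[str]) -> Iterator[str]:
    """Emulate `print if /start/../stop/`: the range is inclusive at both ends."""
    active = False
    for line in lines:
        if not active and start.search(line):
            active = True
        if active:
            yield line
            # The closing line is emitted before the range switches off.
            if stop.search(line):
                active = False


# Same unanchored patterns as the one-liner.
START = re.compile(r"Date:.*2011")
STOP = re.compile(r"Date:.*2013")
```

For my purposes one stray header line at the end of the new file is harmless, but it is worth knowing it is there.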

8 thoughts on “Email Problem?”

  1. Sounds like a job for grep.

    Don’t ask me the command-line options for grep to treat the file by blocks rather than lines.

  2. You’d have to pipe grep into another grep to do what Rand wants because grep is stateless. Awk could do this too, but nobody in their right mind wants to deal with awk syntax.

    Perl is the ideal language to do this in; it was designed specifically for these kinds of tasks. I thought about writing a script for Rand just for the fun of it, but he already has a solution and I didn’t want to deal with debugging it.

  3. I do find it strange to talk about low-level file manipulation using the techniques of 40 years ago. Why not just open it in a word processor, cut and paste to another file, then print it from said word processor? If the file is small enough, you could even use Notepad or something.

    1. Probably because the file is enormous.

      I’ve written the code; it’s 16 lines of Perl, with about 32 lines of test data included in the comments. I’ve emailed it to Rand because I can’t post code in WordPress comments: I can’t seem to get WordPress to stop reformatting and re-indenting the comments section, which truncates parts of the code, removes the proper indentation, etc.

      1. I’m curious how big the file was. The text portion of a Word file can only be 32 MB, but that’s a lot of text (close to six million words in English). Some of the custom workgroup software I developed used code-generated server-to-server email messaging that, to meet legal requirements, could wind up in huge archives no one would ever see (because money and controlled substances were involved).
