Python Problem

Anyone have any idea why the following script doesn’t work?

*************************************
#!/usr/bin/env python3

# import modules used here — sys is a very standard one
import sys
#import re

# Gather our code in a main() function
def main():
infile = open(‘File1’, ‘r’)
outfile = open(‘File2’, ‘w’)
for line in infile:
line.replace(“

“,”test”)
#print(“Found string”)
outfile.write(line)

# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == ‘__main__’:
main()
*************************************

When I run it, it simply copies the old file to the new one, without doing the replacement. I know that it’s seeing the pattern, because if I run the replace function in an if statement, it says that it found it.

29 thoughts on “Python Problem”

      1. Each time through the loop, the variable “line” will be ended by a single newline.

        But it looks like you are trying to replace a sequence of 3 newlines with the word “test”.

        I can’t see how the variable “line” will ever hold something with a sequence of 3 newlines.
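
To illustrate the point about line endings: iterating over a file yields one line at a time, and each line ends in at most one newline, so a pattern that spans several newlines can never match inside a single line. A minimal sketch, assuming an in-memory io.StringIO as a stand-in for File1 (the sample text is made up):

```python
import io

# Hypothetical stand-in for File1; io.StringIO iterates like an open text file.
sample = io.StringIO("first\n\n\nsecond\n")

# Each element is one line, including its trailing newline.
lines = list(sample)
print(lines)

# No single line ever contains two consecutive newlines.
multi_newline_hits = [line for line in lines if "\n\n" in line]
print(multi_newline_hits)
```

Blank lines in the file show up as separate one-character lines, not as part of their neighbours.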

        1. My goal is to remove from the file each and every instance of the string “<p class=”P1″> </p>”

          I don’t understand what you mean by “a sequence of three newlines.”

          1. Replace returns the string, which (if it is non-empty, at least) evaluates as true in a boolean context, so the if statement passes whether or not anything was replaced. Probably the replace just isn’t working.

            I’d try it in interactive mode. Also NB; if you want to define a string containing quotes, either wrap it in the other quote type or escape it (example to follow in subsequent comment).
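
A short sketch of what str.replace() does with its result, which can be pasted into interactive mode. The sample line is made up; the pattern is the one quoted earlier in the thread.

```python
line = 'keep <p class="P1"> </p> this\n'

# Strings are immutable: replace() returns a new string and leaves `line` alone.
result = line.replace('<p class="P1"> </p>', "test")
print(repr(line))
print(repr(result))

# A non-empty string is truthy, so `if line.replace(...)` passes even when
# nothing matched. Testing with `in` checks for the pattern itself.
no_match = line.replace("not present", "test")
print(bool(no_match), "not present" in line)
```

Unless the return value is assigned back (line = line.replace(...)), the original text is what gets written out.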

          2. OK, probably html will eat this, but hopefully enough survives:

            [mearl@localhost ~]$ python
            Python 2.6.6 (r266:84292, Nov 22 2013, 12:16:22)
            [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
            Type “help”, “copyright”, “credits” or “license” for more information.
            >>> “blah blah blah blarg blah”.replace(“blarg”,”blart”)
            ‘blah blah blah blart blah’
            >>> ‘stuff ” stuff’.replace(”,””)
            ‘stuff ” stuff’
            >>> ‘stuff ” stuff’.replace(”,””)
            ‘stuff ” stuff’
            >>> quit()
            [mearl@localhost ~]$

          3. And it ate my HTML tags as expected. I think Peter below has it right, though. (Alternately you can wrap the string in single-quotes and then there’s no need to escape double-quotes).

            I do recommend using the interactive mode for this kinda stuff, it’s surprising how often you find you missed some silly thing about library usage that way.
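
Since the blog ate the quoting example, here is a small sketch of the options for a literal that contains double quotes. The variable names are only for illustration.

```python
# Three equivalent ways to write a string literal containing double quotes.
escaped = "<p class=\"P1\"> </p>"          # backslash-escaped
single_quoted = '<p class="P1"> </p>'      # wrapped in the other quote type
triple_quoted = """<p class="P1"> </p>"""  # triple-quoted

print(escaped == single_quoted == triple_quoted)
print(escaped)
```

All three produce the same string, so the choice is purely about readability.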

          4. It wasn’t obvious to me how to do file operations in interactive mode, but I have almost zero experience with the language. I’m basically coming to it from perl.

  1. I’m struggling a bit because of the formatting, but it looks like you want to replace newlines with “test”? That’s going to be trouble, as the “for line in infile” is breaking the lines by newline. The other possibility is you are replacing “p” tags and the page HTML is confusing it. In which case, are you using ? Could it be that it has the same number of characters as “test” and it’s working?

  2. The other thing I might try is:
    old_line = line
    line = line.replace(whatever)
    if old_line != line:
      print("Replacing %s with %s" % (old_line, line))

    Make sure the pattern works in place, not just in general.

  3. // This is working for me

    #!/usr/bin/env python3

    import sys
    #import re

    # Gather our code in a main() function
    def main():
        infile = open('File1', 'r')
        outfile = open('File2', 'w')
        for line in infile:
            line = line.replace("text", "bar")
            print("Found string")
            outfile.write(line)
        infile.close()
        outfile.close()

    # Standard boilerplate to call the main() function to begin
    # the program.
    if __name__ == '__main__':
        main()

    changed indentation, especially for writing line
    switched to straight quotes (“” and ‘ ‘ )
    closed files
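
On the "closed files" point: a with block closes both files automatically, even if an exception is raised partway through. This is a sketch under assumptions, not the commenter's code: the function name replace_in_file and its parameters are hypothetical, and the demonstration writes made-up sample data to a temporary directory instead of touching a real File1.

```python
import os
import tempfile

def replace_in_file(src_path, dst_path, old, new):
    """Copy src_path to dst_path, replacing `old` with `new` on every line."""
    # Both files are closed when the with block exits, even on error.
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        for line in infile:
            outfile.write(line.replace(old, new))

# Small demonstration in a temporary directory (made-up sample data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "File1")
    dst = os.path.join(tmp, "File2")
    with open(src, "w") as f:
        f.write("some text here\nno match\n")
    replace_in_file(src, dst, "text", "bar")
    with open(dst) as f:
        output = f.read()

print(output)
```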

  4. Replace the underscores with spaces or tabs as desired:

    def__main():
    ________infile__=__open('File1',__'r')
    ________outfile__=__open('File2',__'w')
    ________for__line__in__infile:
    ________________line__=__line.replace("text",__"bar")
    ________________print("Found__string")
    ________________outfile.write(line)
    ________infile.close()
    ________outfile.close()
    ________#__Standard__boilerplate__to__call__the__main()__function__to__begin
    ________#__the__program.
    if____name____==__'__main__':
    ________main()

      1. I indented “outfile.write(line)” to be part of the for loop, so each replaced line will be written after it’s checked.

        If this is not working, next step is checking the data to see if something is being greedy or converting to invisibles

  5. WordPress is an atrocious format for trying to send you Python code since it thinks it is free to screw with indentation as it likes. I’ll send you an email of something I worked up quickly using Emacs.

    It’s not exactly what you are asking for but it works with files of simple characters that you should be able to mung to suit.

    Also I only have Python 2.6.6 here, but should be ok for you.

    Dave

  6. I see, your first string in replace() must have been mangled by the blog software. The quote characters in the string literal will need to be escaped. Try the following (it worked for me after a brief test):


    #!/usr/bin/env python3

    # import modules used here — sys is a very standard one
    import sys
    #import re

    # Gather our code in a main() function
    def main():
        infile = open('File1', 'r')
        outfile = open('File2', 'w')
        for line in infile:
            line = line.replace("<p class=\"P1\"> </p>","test")
            # print("Found string")
            outfile.write(line)

    # Standard boilerplate to call the main() function to begin
    # the program.
    if __name__ == "__main__":
        main()

    If you need to handle the case where there is arbitrary whitespace (rather than exactly one space) between the <p> and </p> tags in your files, you’ll need a regular expression.
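
A possible shape for that regular expression, as a sketch only: the pattern assumes the opening tag is always written exactly as <p class="P1">, and the names EMPTY_P1 and strip_empty_paragraphs are made up for illustration.

```python
import re

# Assumption: any run of whitespace (including none) may sit between the tags.
EMPTY_P1 = re.compile(r'<p class="P1">\s*</p>')

def strip_empty_paragraphs(text):
    """Remove every empty P1 paragraph from text."""
    return EMPTY_P1.sub("", text)

# Made-up sample: two empty paragraphs and one that should be kept.
sample = 'a<p class="P1"> </p>b<p class="P1">\t  </p>c<p class="P1">keep</p>'
cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

If the whitespace between the tags can include a line break, this would need to run on the whole file contents (infile.read()) rather than line by line.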

    1. Is this missing the indentation? I still don’t understand the difference from mine. And I did escape the quotes with a backslash. I didn’t bother with regex because it does seem to be a single space. The string is an artifact of exporting a word document into HTML in Libre Office, and I’m trying to declutter the code.

      1. Conceivably the quotes in either your code or the files you’re working with are not the usual ASCII ones but other versions from the Unicode zoo. Did my version run okay after cutting and pasting both the code and File1 as a test? If so, squint at the quotes in your version. Just a guess.
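
One way to squint at the quotes mechanically is to list every non-ASCII character in a suspect line along with its Unicode name. A minimal sketch; the helper name non_ascii_chars and the sample string are hypothetical.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Made-up sample: the same tag typed with curly quotes instead of straight ones.
suspect = "<p class=\u201cP1\u201d> </p>"
for hit in non_ascii_chars(suspect):
    print(hit)
```

Running the same check on a line read from File1 and on the pattern in the script would show whether the two actually use the same quote characters.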

  7. Check the time stamp on your outfile to make sure you’re actually still writing to it. Depending on how Python handles opening an already existing file for output, it might be returning an error code instead of writing to the file.
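
For reference, Python's built-in open() with mode "w" truncates an existing file, and failures surface as an exception (an OSError subclass) rather than an error code. A small sketch that checks both behaviours in a temporary directory; the paths are made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "File2")

    # Create the file with some old content.
    with open(path, "w") as f:
        f.write("old content that is fairly long\n")

    # Reopening with "w" empties the file before anything is written.
    with open(path, "w") as f:
        f.write("new\n")

    with open(path) as f:
        contents = f.read()

    # Failure shows up as an exception, e.g. when the directory does not exist.
    try:
        open(os.path.join(tmp, "missing_dir", "File2"), "w")
        failed = False
    except OSError:
        failed = True

print(repr(contents), failed)
```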

  8. You do know, don’t you, that Python was named after “Monty Python”? It’s true. So I regard it as a gag language.

  9. Python can do some really stupid things if you don’t get whitespace exactly right, such as mixing spaces and tabs, or not continuing indent whitespace correctly when you insert a blank line for readability. If your code is visibly identical to code samples reported to work, but your code has problems, this would be my first suspect. A font (like monoid) that makes whitespace characters visible can save a lot of headaches there.
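
Since the script in question targets Python 3, it is worth knowing that Python 3 refuses an ambiguous mix of tabs and spaces with a TabError instead of guessing. A minimal sketch that checks a source string without running it; the sample source is made up:

```python
# Source with one line indented by a tab and the next by eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

try:
    # compile() parses the source without executing it.
    compile(mixed, "<example>", "exec")
    outcome = "compiled"
except TabError as exc:
    outcome = "TabError"
    print("TabError:", exc)

print(outcome)
```

The same compile() call on the contents of a real script is a quick way to rule out invisible indentation problems.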

  10. WordPress is messing with the indentation of the code people are pasting here so it is impossible to know if that is ok.
    As Peter Monta said, str.replace() returns the changed string, so you need to use that return value, not the unchanged string. Other than that it’s impossible to know without having an input file to test the code.
    Are you sure you are passing the right string you want to replace in the str.replace() arguments?
