Friday, September 4, 2015

Remove duplicate lines from a file using python


If you have a file "input.txt" that contains duplicate lines and you want to remove them, writing the result to "output.txt", run the following Python script. Be careful to keep the same indentation (spaces), because Python uses indentation to define blocks:


lines_seen = set()  # holds lines already seen
# "with" closes both files automatically when the block ends
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
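The same approach can be wrapped in a reusable function. This is a minimal sketch, assuming you want to pass the file names as arguments; the function name remove_duplicate_lines is hypothetical and not part of the original post. It also strips the newline before comparing. The script above compares lines including their trailing "\n", so if the last line of the file has no newline it will not be recognized as a duplicate of an earlier identical line.

```python
def remove_duplicate_lines(input_path, output_path):
    """Copy input_path to output_path, keeping only the first occurrence of each line.

    Returns the number of unique lines written.
    """
    lines_seen = set()  # line contents (without newline) already written
    with open(input_path, "r") as infile, open(output_path, "w") as outfile:
        for line in infile:
            key = line.rstrip("\n")  # so a final line without "\n" still matches
            if key not in lines_seen:  # first time we see this line
                outfile.write(key + "\n")
                lines_seen.add(key)
    return len(lines_seen)
```

For example, remove_duplicate_lines("input.txt", "output.txt") does the same job as the script above. One difference is that every output line, including the last one, ends with a newline.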

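The script reads the file only once, and checking whether a line is already in a set takes constant time on average. The run time therefore grows roughly in proportion to the file size, and even large files are processed quickly. Keep in mind that every distinct line is held in memory. Have a nice day.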
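If memory becomes a problem with very large files, one option (not from the original post) is to remember a fixed-size SHA-256 digest of each line instead of the line itself. This is only a sketch. It helps only when lines are long compared with the 32-byte digest. A digest collision would wrongly drop a line, although with SHA-256 this is vanishingly unlikely. The name unique_lines_by_hash is hypothetical.

```python
import hashlib


def unique_lines_by_hash(lines):
    """Yield each line the first time it appears, storing only a SHA-256 digest per line."""
    digests_seen = set()  # 32-byte digests instead of the full line text
    for line in lines:
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest not in digests_seen:  # not a duplicate
            digests_seen.add(digest)
            yield line
```

To use it, open "input.txt" and "output.txt" as in the script above and call outfile.writelines(unique_lines_by_hash(infile)).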

4 comments:

  3. You may want to work with lists: lines_seen = [].
