
Remove duplicate entries from multiple text files in Perl?

Question:

I am new to this site and need help removing duplicate entries from multiple text files (in a loop). I tried the code below, but it does not remove duplicates when there are multiple files, although it works for a single file.

Code:

```perl
my $file    = "$Log_dir/File_listing.txt";
my $outfile = "$Log_dir/Remove_duplicate.txt";;
open (IN, "<$file") or die "Couldn't open input file: $!";
open (OUT, ">$outfile") or die "Couldn't open output file: $!";
my %seen = ();
{
    my @ARGV = ($file);
    # local $^I = '.bac';
    while(<IN>){
        print OUT $seen{$_}++;
        next if $seen{$_} > 1;
        print OUT ;
    }
}
```

Thanks, arts

Answer1:

The errors in your script:

- You overwrite (a new copy of) @ARGV with $file, so it can never contain any other file arguments.
- ...which doesn't matter anyway, because you open the file handle before you assign to @ARGV. Also, you never loop over the arguments: the bare block { ... } around the code serves no purpose.
- %seen will contain dedupe data for all the files you open unless you reset it.
- You print the count $seen{$_} to the output file (print OUT $seen{$_}++ writes a number for every input line), which I am sure you don't need.

You could use the diamond operator, which implicitly opens each file named in @ARGV, but since you (probably) need a separate output file name for each input file, that approach brings unwanted complications.

```perl
use strict;
use warnings;                        # always use these

for my $file (@ARGV) {               # loop over all file names
    my $out = "$file.deduped";       # create output file name
    open my $infh,  "<", $file or die "$file: $!";
    open my $outfh, ">", $out  or die "$out: $!";
    my %seen;
    while (<$infh>) {
        print $outfh $_ if !$seen{$_}++;   # print if a line is never seen before
    }
}
```
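The key line in the loop above is print $outfh $_ if !$seen{$_}++. Post-increment returns the old value, and an undefined hash entry counts as 0, so !$seen{$_}++ is true only the first time a given line shows up. The sketch below isolates that idiom in a hypothetical helper, uniq_lines (a name chosen for illustration, not part of the answer), working on a plain list instead of a file:

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the first occurrence of each line and
# preserves the original order.
sub uniq_lines {
    my %seen;
    # $seen{$_}++ returns the OLD count: 0 the first time a line is seen
    # (undef is treated as 0), 1 or more afterwards. So !$seen{$_}++ is
    # true exactly once per distinct line.
    return grep { !$seen{$_}++ } @_;
}

my @unique = uniq_lines("b\n", "a\n", "b\n", "c\n", "a\n");
print @unique;    # prints b, a and c, one per line
```

If your List::Util is new enough (version 1.45 or later), its uniq function does the same job for a list, but the explicit hash is what lets the file loop decide whether %seen is reset per file or shared.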

Note that declaring %seen inside the for loop (a lexically scoped variable) makes the script check for duplicates within each individual file. If you move the declaration outside the for loop, the script will check for duplicates across *all* files. I am not sure which you prefer.
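A minimal sketch of the second option, assuming the same "$file.deduped" output naming as the loop above: %seen is declared once, before the loop, so a line that already appeared in an earlier file is also dropped from later files. The wrapper name dedupe_across_files is hypothetical, used only so the loop can be called and tested.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's loop, with %seen shared by all files.
sub dedupe_across_files {
    my @files = @_;
    my %seen;    # declared OUTSIDE the loop: duplicates are tracked across files
    for my $file (@files) {
        my $out = "$file.deduped";
        open my $infh,  '<', $file or die "$file: $!";
        open my $outfh, '>', $out  or die "$out: $!";
        while (my $line = <$infh>) {
            # Print only lines not seen in this file or any earlier one.
            print $outfh $line if !$seen{$line}++;
        }
        close $outfh or die "$out: $!";
    }
}

dedupe_across_files(@ARGV);
```

Lines are compared including their trailing newline, so a last line without a newline is not treated as equal to the same text elsewhere; adding chomp (and printing "$line\n") would change that if it matters for your files.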

Answer2:

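I think your File_listing.txt contains lines, some of which occur more than once? If that's the case, you can just use the standard sort utility from the shell (note that this also sorts the lines, so their original order is lost):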

```sh
sort --unique <File_listing.txt >Remove_duplicate.txt
```

Or, if you prefer Perl (this keeps the lines in their original order):

```sh
perl -lne '$seen{$_}++ and next or print;' <File_listing.txt >Remove_duplicate.txt
```
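Both commands above handle a single file through redirection. Since the question is about several files, here is a minimal sketch (an assumption on my part, not part of either answer) that applies the same %seen idea to each file in place, using Perl's in-place edit variable $^I, the one commented out in the question's code. It overwrites the input files and keeps each original as a .bak copy, so it only fits if that is acceptable. As a command line, perl -i.bak -ne 'print unless $seen{$_}++; %seen = () if eof' file1.txt file2.txt should behave the same way. The helper name dedupe_in_place is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: removes duplicate lines from each file IN PLACE,
# keeping the original as "<name>.bak". Duplicates are tracked per file.
sub dedupe_in_place {
    my @files = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN instead

    # Setting $^I turns on in-place editing for <>; print() without a
    # filehandle then writes to the new copy of the current file.
    local ($^I, @ARGV) = ('.bak', @files);

    my %seen;
    while (my $line = <>) {
        print $line unless $seen{$line}++;
        # eof without parentheses is true at the end of EACH file read
        # through <>, so the duplicate table starts fresh for the next file.
        %seen = () if eof;
    }
}

dedupe_in_place(@ARGV);
```

Dropping the %seen = () if eof line would make the check span all files instead, mirroring the choice described for the first answer's loop.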
