Challenging way of counting entries of a file dynamically


I am facing a strange problem: despite many attempts, I have not been able to work out the logic or the code to solve it.

I have a file in the format below:

    aa:bb:cc dd:ee:ff 100 ---------->line1
    aa:bb:cc dd:ee:ff 101 ---------->line2
    dd:ee:ff aa:bb:cc 230 ---------->line3
    dd:ee:ff aa:bb:cc 231 ---------->line4
    dd:ee:ff aa:bb:cc 232 ---------->line5
    aa:bb:cc dd:ee:ff 102 ---------->line6
    aa:bb:cc dd:ee:ff 103 ---------->line7
    aa:bb:cc dd:ee:ff 108 ---------->line8
    dd:ee:ff aa:bb:cc 233 ---------->line9
    gg:hh:ii jj:kk:ll 450 ---------->line10
    jj:kk:ll gg:hh:ii 600 ---------->line11

My program should read the file line by line. In the first and second lines, the column 1 and column 2 values are equal. The third column is a sequence number, which is never the same for any two lines.

Since line 1 and line 2 are the same except that their sequence numbers differ by exactly 1, I should read those two lines together and write their count, 2, to an output file. Lines 6 and 7 have the same first two columns as lines 1 and 2 and also have consecutive sequence numbers, but lines 3, 4 and 5, which have different column 1 and column 2 entries, come in between them. Hence lines 1 and 2 should not be grouped together with lines 6 and 7. One more thing: the sequence numbers of lines 7 and 8 differ by more than 1, so line 8 should be counted as a separate entry, not together with lines 6 and 7, even though lines 6, 7 and 8 have the same first two columns. So the output file should contain: 2 3 2 1 1 1 1.

I hope the question is clear. If not, I will clarify anything about it.

As you can see, this is a complicated problem for me. I tried using a dictionary, as that is the only data structure I know, but none of my attempts worked. Please help me solve this problem.


    with open("abc") as f:
        # read the first line and set the number from it as the value of `prev`
        num, col4 = next(f).rsplit(None, 2)[-2:]  # use `str.rsplit` for minimum splits
        prev = int(num)
        col4_prev = col4
        count = 1                      # initialize `count` to 1
        # note: only the sequence number is compared, not the first two columns
        for lin in f:
            num, col4 = lin.rsplit(None, 2)[-2:]
            num = int(num)
            if num - prev == 1:        # if current `num` - `prev` == 1
                count += 1             # increment `count`
                prev = num             # set `prev` = `num`
            else:
                print count, col4_prev # else print `count` or write it to a file
                count = 1              # reset `count` to 1
                prev = num             # set `prev` = `num`
                col4_prev = col4
        # the last group is never printed inside the loop, so print it here
        print count, col4_prev


    2 400
    3 600
    2 400
    1 111
    1 500
    1 999
    1 888

Where 'abc' contains:

    aa:bb:cc dd:ee:ff 100 400
    aa:bb:cc dd:ee:ff 101 400
    dd:ee:ff aa:bb:cc 230 600
    dd:ee:ff aa:bb:cc 231 600
    dd:ee:ff aa:bb:cc 232 600
    aa:bb:cc dd:ee:ff 102 400
    aa:bb:cc dd:ee:ff 103 400
    aa:bb:cc dd:ee:ff 108 111
    dd:ee:ff aa:bb:cc 233 500
    gg:hh:ii jj:kk:ll 450 999
    jj:kk:ll gg:hh:ii 600 888


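    from collections import defaultdict

    # note: this counts every occurrence of each column1/column2 pair in the
    # whole file; it ignores line order and the sequence numbers
    results = defaultdict(int)
    with open("input_file.txt", "r") as input_file:
        for line in input_file:
            columns = line.split()
            key = " ".join(columns[:2])
            results[key] += 1

    with open("output_file.txt", "w") as output_file:
        for key, count in results.items():
            output_file.write("{0} -> {1}\n".format(key, count))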


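    # note: whole lines are compared, so lines that differ only in their
    # sequence number are never counted together
    entries = open('filename.txt', 'r')
    prevLine = None
    count = 0
    for line in entries:
        if prevLine is None or line == prevLine:
            count += 1
        else:
            print count
            count = 1
        prevLine = line
    if prevLine is not None:
        print count   # the last group is not printed inside the loop
    entries.close()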

That should do it. Here's an explanation: first you open the file, then you loop over each line of the file. For each line, you compare it to the previous one. If it is the same as the previous one, you add one to the matches counter; if it is not the same, you print the count and reset the counter. At the end of each iteration, you save the current line as the previous line.
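The steps above can be sketched as follows, with one change so that they fit the question: instead of comparing whole lines, this variant compares the first two columns and checks that the sequence number is exactly one higher than the previous one. It is a minimal sketch written for Python 3; the function name `count_runs` and the `sample` variable are made up for illustration, and skipping lines with fewer than three columns is an assumption.

```python
def count_runs(lines):
    """Return the length of each run of consecutive matching lines.

    A line extends the current run only if its first two columns equal
    the previous line's and its sequence number is exactly one higher.
    """
    counts = []
    prev_key = None   # (column1, column2) of the previous line
    prev_seq = None   # sequence number of the previous line
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: skip blank or malformed lines
        key, seq = (fields[0], fields[1]), int(fields[2])
        if counts and key == prev_key and seq == prev_seq + 1:
            counts[-1] += 1      # same pair, next sequence number
        else:
            counts.append(1)     # anything else starts a new run
        prev_key, prev_seq = key, seq
    return counts


# The eleven lines from the question, without the line markers.
sample = """\
aa:bb:cc dd:ee:ff 100
aa:bb:cc dd:ee:ff 101
dd:ee:ff aa:bb:cc 230
dd:ee:ff aa:bb:cc 231
dd:ee:ff aa:bb:cc 232
aa:bb:cc dd:ee:ff 102
aa:bb:cc dd:ee:ff 103
aa:bb:cc dd:ee:ff 108
dd:ee:ff aa:bb:cc 233
gg:hh:ii jj:kk:ll 450
jj:kk:ll gg:hh:ii 600
""".splitlines()

print(count_runs(sample))  # [2, 3, 2, 1, 1, 1, 1]
```

Since an open file is iterable line by line, the same function should also accept a file object directly, and the returned list can then be written to the output file in whatever layout is needed.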


You could use itertools.groupby() (http://docs.python.org/2/library/itertools.html#itertools.groupby)...

    from cStringIO import StringIO
    import itertools

    data = 'aa:bb:cc dd:ee:ff 100\n' \
           'aa:bb:cc dd:ee:ff 101\n' \
           'dd:ee:ff aa:bb:cc 230\n' \
           'dd:ee:ff aa:bb:cc 231\n' \
           'dd:ee:ff aa:bb:cc 232\n' \
           'aa:bb:cc dd:ee:ff 102\n' \
           'aa:bb:cc dd:ee:ff 103\n' \
           'aa:bb:cc dd:ee:ff 108\n' \
           'dd:ee:ff aa:bb:cc 233\n' \
           'gg:hh:ii jj:kk:ll 450\n' \
           'jj:kk:ll gg:hh:ii 600\n'

    sio = StringIO(data)
    # The key is (first two columns, sequence number - line index); the second
    # part stays constant for as long as the sequence numbers are consecutive.
    # The slices assume three-digit sequence numbers.
    print [len(list(g)) for k, g in itertools.groupby(
        sio, key=lambda x, c=itertools.count(): (x[:-5], int(x[-4:-1]) - next(c)))]

...which prints...

[2, 3, 2, 1, 1, 1, 1]
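Why this works: within a run of consecutive sequence numbers, both the sequence number and the line index go up by one per line, so their difference stays constant, and `groupby` puts adjacent items with equal keys into one group. Below is a hedged Python 3 variant of the same idea that parses each line with `split()` instead of fixed-width slices, so it does not assume three-digit sequence numbers. The names `run_lengths` and `sample` are made up for illustration, and the sketch assumes every non-blank line has at least three columns.

```python
from itertools import groupby


def run_lengths(lines):
    """Group adjacent lines by (column1, column2, sequence - index)."""
    # Assumption: every non-blank line has at least three columns.
    rows = [line.split() for line in lines if line.strip()]

    def key(item):
        index, row = item
        # For consecutive sequence numbers, sequence - index is constant.
        return (row[0], row[1], int(row[2]) - index)

    return [len(list(group)) for _, group in groupby(enumerate(rows), key=key)]


# The eleven lines from the question, without the line markers.
sample = """\
aa:bb:cc dd:ee:ff 100
aa:bb:cc dd:ee:ff 101
dd:ee:ff aa:bb:cc 230
dd:ee:ff aa:bb:cc 231
dd:ee:ff aa:bb:cc 232
aa:bb:cc dd:ee:ff 102
aa:bb:cc dd:ee:ff 103
aa:bb:cc dd:ee:ff 108
dd:ee:ff aa:bb:cc 233
gg:hh:ii jj:kk:ll 450
jj:kk:ll gg:hh:ii 600
""".splitlines()

print(run_lengths(sample))  # [2, 3, 2, 1, 1, 1, 1]
```

Using `enumerate` rather than a counter hidden in a default argument keeps the index explicit, at the cost of reading all rows into a list first.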

