Read through sentences separated by line breaks and parse them


I have tokenized sentences separated by a line break, with two columns giving the actual and the predicted tag for each token. I want to loop through the tokens and find the wrong predictions, i.e. the tokens whose actual tag is not equal to the predicted tag, and report each sentence as correct or incorrect.

    #word actual predicted
    James PERSON PERSON
    Washington PERSON LOCATION
    went O O
    home O LOCATION

    He O O
    took O O
    Elsie PERSON PERSON
    along O O

    >James Washington went home: Incorrect
    >He took Elsie along: Correct


In addition to my <a href="https://stackoverflow.com/a/23084050/846892" rel="nofollow">previous answer</a> I am using <a href="https://docs.python.org/2/library/functions.html#all" rel="nofollow">all()</a> and a list comprehension here:

    from itertools import groupby

    d = {True: 'Correct', False: 'Incorrect'}

    with open('text1.txt') as f:
        for k, g in groupby(f, key=str.isspace):
            if not k:
                # Split each line in the current group at whitespaces
                data = [line.split() for line in g]
                # If for each line the second column is equal to third then `all()` will
                # return True.
                predicts_matched = all(line[1] == line[2] for line in data)
                print('{}: {}'.format(' '.join(x[0] for x in data), d[predicts_matched]))


    James Washington went home: Incorrect
    He took Elsie along: Correct
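If you also want to know which tokens were mispredicted, the same `groupby` idea can be wrapped in a function. This is only a minimal sketch: the names `check_sentences` and `sample` are made up for illustration, and it assumes every non-blank line has exactly three whitespace-separated fields and that there is no `#word actual predicted` header line. The key uses `not line.strip()` rather than `str.isspace`, so that empty strings (for example from `splitlines()`) also count as separators.

```python
from itertools import groupby


def check_sentences(lines):
    """Return (sentence, is_correct, wrong_tokens) for each blank-line-separated block."""
    results = []
    # Consecutive lines are grouped by whether they are blank or not.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if is_blank:
            continue  # blank lines only separate sentences
        rows = [line.split() for line in group]
        # Collect the words whose actual tag differs from the predicted tag.
        wrong = [word for word, actual, predicted in rows if actual != predicted]
        sentence = ' '.join(word for word, _, _ in rows)
        results.append((sentence, not wrong, wrong))
    return results


sample = """James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
"""

for sentence, ok, wrong in check_sentences(sample.splitlines()):
    print('{}: {}'.format(sentence, 'Correct' if ok else 'Incorrect'))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample.splitlines()`.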


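Python strings have useful parsing methods, such as `split()` and `strip()`, that you can use here. I did this using Python 3.3; it should work on other Python 3 versions as well (under Python 2 the `print` call would need adjusting, since it passes several arguments).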

    thistext = '''James PERSON PERSON
    Washington PERSON LOCATION
    went O O
    home O LOCATION

    He O O
    took O O
    Elsie PERSON PERSON
    along O O
    '''

    def check_text(text):
        lines = text.split('\n')
        correct = [True]  # a bool wrapped in a list, so we can modify it from a nested function
        words = []

        def print_result():
            # Print the collected sentence (if any) and reset the state
            if words:
                print(' '.join(words), ": ", "Correct" if correct[0] else "Incorrect")
                #words.clear()
                del words[:]
                correct[0] = True

        for line in lines:
            if line.strip():  # the line is not blank: it holds a token and its two tags
                word, a, b = line.split()
                if a != b:
                    correct[0] = False
                words.append(word)
            else:  # a blank line ends the current sentence
                print_result()

        # Flush the last sentence in case the text does not end with a blank line
        print_result()

    check_text(thistext)


