
Question:
I have tokenized sentences separated by a blank line, with two columns holding the actual and predicted tag for each token. I want to loop through the tokens and find the wrong predictions, i.e. tokens whose actual tag is not equal to the predicted tag, and report each sentence as Correct or Incorrect:
#word actual predicted
James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
>James Washington went home: Incorrect
>He took Elsie along: Correct
Answer1: In addition to my <a href="https://stackoverflow.com/a/23084050/846892" rel="nofollow">previous answer</a>, I am using <a href="https://docs.python.org/2/library/functions.html#all" rel="nofollow">all()</a> and a list comprehension here:
    from itertools import groupby

    d = {True: 'Correct', False: 'Incorrect'}

    with open('text1.txt') as f:
        # Group consecutive lines by whether they are blank (whitespace-only)
        for k, g in groupby(f, key=str.isspace):
            if not k:
                # Split each line in the current group at whitespace
                data = [line.split() for line in g]
                # all() returns True only if, for every line, the second
                # column equals the third.
                predicts_matched = all(line[1] == line[2] for line in data)
                print('{}: {}'.format(' '.join(x[0] for x in data), d[predicts_matched]))
<strong>Output:</strong>
James Washington went home: Incorrect
He took Elsie along: Correct
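The same `groupby` idea can be wrapped in a function that accepts any iterable of lines, which makes it easy to try without a file. This is only a sketch: the name `sentence_results` and the `sample` list are illustrative, and skipping lines that start with `#` is an assumption made so that the `#word actual predicted` header from the question is not treated as a token.

```python
from itertools import groupby


def sentence_results(lines):
    """Yield (sentence, is_correct) for each blank-line-separated block.

    Hypothetical helper, not part of the original answer.
    """
    # Consecutive lines are grouped by whether they are blank.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if is_blank:
            continue
        # Assumption: lines starting with '#' are header comments, not tokens.
        rows = [line.split() for line in group if not line.startswith('#')]
        if not rows:
            continue
        sentence = ' '.join(row[0] for row in rows)
        # The sentence is correct only if every actual tag equals its prediction.
        yield sentence, all(row[1] == row[2] for row in rows)


sample = [
    '#word actual predicted\n',
    'James PERSON PERSON\n',
    'Washington PERSON LOCATION\n',
    'went O O\n',
    'home O LOCATION\n',
    '\n',
    'He O O\n',
    'took O O\n',
    'Elsie PERSON PERSON\n',
    'along O O\n',
]

for sentence, ok in sentence_results(sample):
    print('{}: {}'.format(sentence, 'Correct' if ok else 'Incorrect'))
```

An open file object can be passed in place of `sample`, since iterating over a file also yields one line at a time.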
Answer2: Python strings have powerful parsing methods you can use here. I wrote this with Python 3.3, but it should work with other versions as well.
thistext = '''James PERSON PERSON
Washington PERSON LOCATION
went O O
home O LOCATION

He O O
took O O
Elsie PERSON PERSON
along O O
'''
def check_text(text):
    lines = text.split('\n')
    correct = [True]  # a bool wrapped in a list, so the nested function can modify it
    words = []

    def print_result():
        if words:
            # A single formatted string prints the same way on Python 2 and 3
            print('{}: {}'.format(' '.join(words), 'Correct' if correct[0] else 'Incorrect'))
            # words.clear() would also work on Python 3.3+
            del words[:]
        correct[0] = True

    for line in lines:
        if line.strip():  # the line is not empty
            word, a, b = line.split()
            if a != b:
                correct[0] = False
            words.append(word)
        else:
            # A blank line ends the current sentence
            print_result()

    print_result()  # flush the last sentence

check_text(thistext)
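On Python 3 the list-wrapped flag is not strictly needed, because `nonlocal` lets a nested function rebind a variable of the enclosing function. The following is a sketch under that assumption; `check_text_py3` and `flush` are illustrative names, and the function returns the result lines instead of printing them so that they are easy to check.

```python
def check_text_py3(text):
    """Return one 'sentence: Correct/Incorrect' string per sentence.

    Python 3 only; a hypothetical variant of check_text.
    """
    results = []
    words = []
    correct = True

    def flush():
        # nonlocal lets this nested function rebind the outer 'correct'.
        nonlocal correct
        if words:
            verdict = 'Correct' if correct else 'Incorrect'
            results.append('{}: {}'.format(' '.join(words), verdict))
            words.clear()
        correct = True

    for line in text.split('\n'):
        if line.strip():
            word, actual, predicted = line.split()
            if actual != predicted:
                correct = False
            words.append(word)
        else:
            # A blank line ends the current sentence.
            flush()
    flush()  # handle a final sentence with no trailing blank line
    return results


sample_text = (
    'James PERSON PERSON\n'
    'Washington PERSON LOCATION\n'
    'went O O\n'
    'home O LOCATION\n'
    '\n'
    'He O O\n'
    'took O O\n'
    'Elsie PERSON PERSON\n'
    'along O O\n'
)

for result in check_text_py3(sample_text):
    print(result)
```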