Merge list of objects into consistent list based on common matching attribute in Python

I have a list of objects which I want to "compress" into a smaller list of objects based on a matching attribute (id) and optional class parameters.

    class Case:
        def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
            self.case_id = id
            self.form_type = formtype
            self.age = age
            self.fever = fever
            self.cough = cough
            self.gender = gender

    caselist = [
        Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
        Case(id="12345", formtype="B", age=12, cough=0),
        Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
        Case(id="67890", formtype="B", age=34, cough=1),
        Case(id="75321", formtype="A", age=2, fever=0, gender="male")
    ]

How do I get a new list that looks like this? It should choose formtype="B" over formtype="A".

    compressed = [
        Case("12345", "B", 12, 1, 1, "female"),
        Case("67890", "B", 34, 0, 1, "male"),
        Case("75321", "A", 2, 0, "male")
    ]

I tried to compress it with a dict with no luck:

    compressed = [Case(id=case.case_id, formtype=None, age=case.age) for case in caselist if case.form_type == 'A']

Answer1:

Group by id and, for ids that have both forms, keep the object whose form_type is "B"; otherwise leave the object as is. If you also want the attributes that are not set in "B", iterate over the attribute names and use getattr and setattr to copy any values that are still unset in B. You cannot hard-code what to set and what not to set unless you know in advance what is set in A and what is set in B:

    class Case:
        def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
            self.case_id = id
            self.form_type = formtype
            self.age = age
            self.fever = fever
            self.cough = cough
            self.gender = gender

        def __iter__(self):
            # Iterating over a Case yields its attribute names
            for ele in ["case_id", "form_type", "age", "fever", "cough", "gender"]:
                yield ele

    caselist = [
        Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
        Case(id="12345", formtype="B", age=12, cough=0),
        Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
        Case(id="67890", formtype="B", age=34, cough=1),
        Case(id="75321", formtype="A", age=2, fever=0, gender="male")
    ]

    d = {}
    for c in caselist:
        if c.case_id not in d:
            d[c.case_id] = c
        elif d[c.case_id].form_type != "B" and c.form_type == "B":
            tmp = d[c.case_id]
            # Copy over anything the B form leaves unset
            for attr in c:
                if getattr(c, attr) is None:
                    setattr(c, attr, getattr(tmp, attr))
            d[c.case_id] = c

    caselist[:] = d.values()
    print(caselist)
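The loop above only merges when the A form is seen before the B form. As a rough sketch of the same getattr/setattr idea that does not depend on input order, something like the following could work. The function name `merge_cases` and its `preferred` parameter are made up for this illustration, and `vars()` is used in place of the `__iter__` method; note that it modifies the surviving objects in place.

```python
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender


def merge_cases(cases, preferred="B"):
    """Hypothetical helper: merge cases that share a case_id.

    The object with the preferred form type survives; any attribute it
    leaves as None is filled in from the other object.
    """
    merged = {}
    for case in cases:
        kept = merged.get(case.case_id)
        if kept is None:
            merged[case.case_id] = case
            continue
        # Decide which object survives: the preferred form type wins.
        if case.form_type == preferred and kept.form_type != preferred:
            winner, other = case, kept
        else:
            winner, other = kept, case
        # Fill only attributes that are None, so a recorded 0 is kept.
        for attr, value in vars(other).items():
            if getattr(winner, attr) is None:
                setattr(winner, attr, value)
        merged[case.case_id] = winner
    return list(merged.values())


# The B form comes first here, which the original loop would not merge.
caselist = [
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male"),
]
result = merge_cases(caselist)
```

With this input, `result` should hold two cases: the "12345" B form with fever and gender filled in from the A form, and the untouched "75321" A form.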

Answer2:

This is quite a bit longer than what you were going for, but it works. It creates separate lists of A forms and B forms, then loops over the B forms and looks for a matching A form. If it finds a match, it copies the A values that are missing from the B form onto the B form.

    def merge(acases, bcases):
        newlist = []
        for b in bcases:
            for a in acases[:]:  # iterate over a copy, since acases is modified
                if b.case_id == a.case_id:
                    # Compare against None so that a recorded 0 is not overwritten
                    if b.cough is None:
                        b.cough = a.cough
                    if b.fever is None:
                        b.fever = a.fever
                    if b.gender is None:
                        b.gender = a.gender
                    acases.remove(a)
            newlist.append(b)  # keep the B form even if no A form matched
        newlist += acases  # A forms that have no B form
        return newlist

    caselist = [
        Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
        Case(id="12345", formtype="B", age=12, cough=0),
        Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
        Case(id="67890", formtype="B", age=34, cough=1),
        Case(id="75321", formtype="A", age=2, fever=0, gender="male")
    ]

    acases = [case for case in caselist if case.form_type == 'A']
    bcases = [case for case in caselist if case.form_type == 'B']
    caselist = merge(acases, bcases)
    for i in caselist:
        print('{0} {1} {2} {3} {4} {5}'.format(i.case_id, i.form_type, i.age, i.cough, i.fever, i.gender))

Output:

    12345 B 12 0 1 female
    67890 B 34 1 0 male
    75321 A 2 None 0 male

Here is another way to do it that is more efficient than my previous answer, but not as efficient as @LeartS's answer. Both of these answers can handle different form layouts as well.

    def check_val(av, bv):
        # Fall back to A's value only when B has none; a recorded 0 is kept
        if bv is None:
            return av
        return bv

    caselist = [
        Case(id="12345", formtype="A", age=12, cough=0, gender="female"),
        Case(id="12345", formtype="B", age=12, fever=10),
        Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
        Case(id="67890", formtype="B", age=34, cough=1),
        Case(id="75321", formtype="A", age=2, fever=0, gender="male")
    ]

    d = {}
    # Sort so that B forms are processed before A forms
    caselist.sort(key=lambda x: x.form_type, reverse=True)
    for case in caselist:
        if case.case_id not in d and case.form_type == 'B':
            d[case.case_id] = case
        elif case.form_type == 'A' and case.case_id in d:
            # A form with a stored B form: fill the gaps in the B form
            b = d[case.case_id]
            b.cough = check_val(case.cough, b.cough)
            b.fever = check_val(case.fever, b.fever)
            b.gender = check_val(case.gender, b.gender)
        else:
            d[case.case_id] = case

Answer3:

Sometimes I think the trivial, explicit approach is also the best one. I would simply go with this:

    compressed_cases_dict = {}
    for case in caselist:
        if case.case_id not in compressed_cases_dict:
            compressed_cases_dict[case.case_id] = case
        else:
            if case.form_type == 'B':
                compressed_cases_dict[case.case_id].form_type = 'B'
                compressed_cases_dict[case.case_id].cough = case.cough
            else:
                compressed_cases_dict[case.case_id].fever = case.fever
                compressed_cases_dict[case.case_id].gender = case.gender

    # if we really want just a list (in Python 3, values() returns a view)
    cases = list(compressed_cases_dict.values())

Which, with your input, gives the output (after having defined a __str__ function for the Case class):

    In [1]: [str(c) for c in cases]
    Out[1]:
    ['case_id: 67890, form_type: B, age: 34, fever: 0, cough: 1, gender: male',
     'case_id: 12345, form_type: B, age: 12, fever: 1, cough: 0, gender: female',
     'case_id: 75321, form_type: A, age: 2, fever: 0, cough: None, gender: male']
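The `__str__` method itself is not shown in the answer. A minimal sketch that would produce strings in that format, assuming the Case class from the question with the `case_id` and `form_type` attribute names (the `FIELDS` tuple is a name introduced here purely for illustration):

```python
class Case:
    # Hypothetical helper constant: attribute names in display order.
    FIELDS = ("case_id", "form_type", "age", "fever", "cough", "gender")

    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __str__(self):
        # Render each attribute as "name: value", joined by commas.
        return ", ".join(
            "{}: {}".format(name, getattr(self, name)) for name in self.FIELDS
        )


text = str(Case(id="75321", formtype="A", age=2, fever=0, gender="male"))
print(text)
# case_id: 75321, form_type: A, age: 2, fever: 0, cough: None, gender: male
```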

Note how for id 75321 it has cough None instead of 0, which I think is better because you don't have any information on the cough parameter for that id. (Also for id 12345 the correct cough parameter is 0, not 1. I assume it's a typo in your example output)

It also iterates over the original caselist only once and uses a dictionary to get O(1) lookup by id.
