Splitting a list by matching a regex to an element

Question:

I have a list that has some specific elements in it. I would like to split that list into 'sublists' or different lists based on those elements. For example:

test_list = ['a and b, 123','1','2','x','y','Foo and Bar, gibberish','123','321','June','July','August','Bonnie and Clyde, foobar','today','tomorrow','yesterday']

I would like to split into sublists if an element matches 'something and something':

new_list = [['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

So far I can accomplish this only if there is a fixed number of items after the specific element. For example:

import re
element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')
new_list = [test_list[i:(i+4)] for i, x in enumerate(test_list) if element_regex.match(x)]

This is almost there, but there aren't always exactly three elements following the element of interest. Is there a better way than just looping over every single item?

Answer1:

If you want a one-liner,

from functools import reduce  # required on Python 3; also works on Python 2.6+
new_list = reduce(lambda a, b: a[:-1] + [a[-1] + [b]] if not element_regex.match(b) or not a[0] else a + [[b]], test_list, [[]])

will do. The Pythonic way (https://www.python.org/dev/peps/pep-0020/) would, however, be to use a more verbose variant.

I did some speed measurements on a 4-core i7 @ 2.1 GHz. The timeit module ran this code 1,000,000 times and needed 11.38 s for that. Using groupby from the itertools module (Kasra's variant from the other answer) takes 9.92 s. The fastest variant is the verbose version I suggested, taking only 5.66 s:

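new_list = []
for i in test_list:
    # Start a new sublist at every matching element (and for the very first
    # element, so that no empty leading sublist is produced).
    if element_regex.match(i) or not new_list:
        new_list.append([])
    new_list[-1].append(i)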
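To reproduce a comparison like this on your own machine, something along the following lines could be used. It is only a sketch: the function names split_loop and split_reduce are hypothetical, the run count is reduced so the script finishes quickly, and the `or not result` guard in the loop is an assumption added so that no empty leading sublist is produced. Absolute timings will differ from the figures quoted above.

```python
import re
import timeit
from functools import reduce

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')

test_list = ['a and b, 123', '1', '2', 'x', 'y',
             'Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August',
             'Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']


def split_loop(items):
    """Explicit loop: open a new sublist at every matching element."""
    result = []
    for item in items:
        if element_regex.match(item) or not result:
            result.append([])
        result[-1].append(item)
    return result


def split_reduce(items):
    """The reduce one-liner, wrapped in a function for timing."""
    return reduce(
        lambda a, b: a[:-1] + [a[-1] + [b]]
        if not element_regex.match(b) or not a[0]
        else a + [[b]],
        items,
        [[]],
    )


# Both variants should agree on the sample data before being timed.
assert split_loop(test_list) == split_reduce(test_list)

runs = 10000  # far fewer than the 1,000,000 runs quoted above
for func in (split_loop, split_reduce):
    seconds = timeit.timeit(lambda: func(test_list), number=runs)
    print("%s: %.3f s for %d runs" % (func.__name__, seconds, runs))
```

Wrapping each variant in a function keeps the timed statement identical apart from the algorithm itself.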

Answer2:

You don't need a regex for that; just use itertools.groupby (https://docs.python.org/2/library/itertools.html#itertools.groupby):

>>> from itertools import groupby
>>> from operator import add
>>> g_list = [list(g) for k, g in groupby(test_list, lambda i: 'and' in i)]
>>> [add(*g_list[i:i+2]) for i in range(0, len(g_list), 2)]
[['a and b, 123', '1', '2', 'x', 'y'], ['Foo and Bar, gibberish', '123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar', 'today', 'tomorrow', 'yesterday']]

First we group the list with the key function lambda i: 'and' in i, which flags the elements that contain "and". That gives:

>>> g_list
[['a and b, 123'], ['1', '2', 'x', 'y'], ['Foo and Bar, gibberish'], ['123', '321', 'June', 'July', 'August'], ['Bonnie and Clyde, foobar'], ['today', 'tomorrow', 'yesterday']]

Then we concatenate each pair of adjacent lists, using operator.add and a list comprehension.
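Note that the pairing step relies on "and" groups and other groups strictly alternating: two adjacent "and" elements end up in the same group, and the slices of two no longer line up. A minimal sketch of a groupby variant that avoids the pairing step by grouping on a running count of matching elements is given below. The function name split_on_headers is hypothetical, and it uses the regex from the question rather than the 'and' in i test.

```python
import re
from itertools import accumulate, groupby

element_regex = re.compile(r'[A-Z a-z]+ and [A-Z a-z]+')


def split_on_headers(items):
    """Group items so that each matching element starts a new sublist."""
    # Running count of matching elements seen so far; every item inherits
    # the count of the most recent match, which serves as its group id.
    group_ids = accumulate(1 if element_regex.match(x) else 0 for x in items)
    pairs = zip(group_ids, items)
    return [[item for _, item in group]
            for _, group in groupby(pairs, key=lambda pair: pair[0])]


# Two matching elements in a row still produce separate sublists.
print(split_on_headers(['a and b, x', 'c and d, y', '1']))
# → [['a and b, x'], ['c and d, y', '1']]
```

Any items that appear before the first matching element get the count 0 and form their own leading sublist.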
