finding gappy sublists within a larger list


Let's say I have a list like this:

[['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'lawer'], ['she', 'is', 'a', 'great', 'student'], ['i', 'am', 'a', 'teacher'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]

Now I have a list like this:

['she', 'is', 'student']

I want to query the larger list with this one, and return all the lists that contain the words within the query list in the same order. There might be gaps, but the order should be the same. How can I do that? I tried using the in operator but I don't get the desired output.


If all that you care about is that the words appear in order somehwere in the array, you can use a <a href="https://docs.python.org/3/library/collections.html#collections.deque" rel="nofollow">collections.deque</a> and <a href="https://docs.python.org/3/library/collections.html#collections.deque.popleft" rel="nofollow">popleft</a> to iterate through the list, and if the deque is emptied, you have found a valid match:

from collections import deque def find_gappy(arr, m): dq = deque(m) for word in arr: if word == dq[0]: dq.popleft() if not dq: return True return False

By comparing each word in arr with the first element of dq, we know that when we find a match, it has been found in the correct order, and then we popleft, so we now are comparing with the next element in the deque.

To filter your initial list, you can use a simple list comprehension that filters based on the result of find_gappy:

matches = ['she', 'is', 'student'] x = [i for i in x if find_gappy(i, matches)] # [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]


You can compare two lists, with a function like this one. The way it works is it loops through your shorter list, and every time it finds the next word in the long list, cuts off the first part of the longer list at that point. If it can't find the word it returns false.

def is_sub_sequence(long_list, short_list): for word in short_list: if word in long_list: i = long_list.index(word) long_list = long_list[i+1:] else: return False return True

Now you have a function to tell you if the list is the desired type, you can filter out all the lists you need from the 'list of lists' using a list comprehension like the following:

a = [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'lawer'], ['she', 'is', 'a', 'great', 'student'], ['i', 'am', 'a', 'teacher'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']] b = ['she', 'is', 'student'] filtered = [x for x in a if is_sub_sequence(x,b)]

The list filtered will include only the lists of the desired type.


  • Friendly URL in MODX redirects to home page
  • c# - Calculations - RNG Game
  • When to use React “componentDidUpdate” method?
  • running two interdependent while loops in python?
  • Infinite loop code in C
  • Why toDataURL Does Not Get Canvas Content on Mobile?
  • While using ConcurrentQueue, trying to dequeue while looping through in parallel
  • Stream webcam via. Wowza streaming server
  • is uninitialized_copy/fill(In first, In last, For dest, A &a) an oversight in the c++ standard?
  • Proper way to add unescaped text from a field to a regex in postgres?
  • how to put include(phpfile) to variable?
  • jquery draggable stop event
  • undefined reference error due to use of static variables [duplicate]
  • Crafting a LINQ based solution to determine if a set of predicates are satisfied for a pair of colle
  • how to remove a div with same ids but display='block' and display='none' in JAVa
  • LIBRTMP Delphi: mapping of the DLL
  • Add spaces between words in spaceless string
  • How to use animated gif in Firemonkey?
  • matching similar elements in between two lists
  • Trying to string.Join an IList
  • Configure Spring's MappingJacksonHttpMessageConverter
  • How to use Windows Media Foundation with UWP without a topology
  • jQuery: How to AJAXify WordPress Search?
  • msbuild create itemgroup from property group
  • Updating both a ConcurrentHashMap and an AtomicInteger safely
  • Can't delete or rename original file after resizing
  • Mysql - How to search for 26 records that each begins with the letter of the alphabet?
  • Unable to decode certificate at client new X509Certificate2()
  • Declaring variable dynamically in VB.net
  • Extracting HTML between tags
  • MongoDB in PHP using aggregate to group by _id is null not working
  • Why HTML5 Canvas with a larger size stretch a drawn line?
  • Apache 2.4 and php-fpm does not trigger apache http basic auth for php pages
  • Join two tables and save into third-sql
  • Release, debug version and Authorization Google?
  • Rearranging Cells in UITableView Bug & Saving Changes
  • Is there a mandatory requirement to switch app.yaml?
  • Benchmarking RAM performance - UWP and C#
  • Angular 2 constructor injection vs direct access
  • IndexOutOfRangeException on multidimensional array despite using GetLength check