How to make a list of lists in Python when it has multiple separators?


The sample file looks like this (all on one line, wrapped for legibility):

['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n', '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n', '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n', '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n', '$$$\n', '\n', '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n', '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n', '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n', '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n', '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n', '>B42\n', 'TT-GTGGGTATC\n']

The $$$ separates the two sets. I need to strip the trailing \n from each item (using .strip()) and remove all the "headers" (the items starting with >).

I need to make a list of lists, with one inner list per set, and replace "-" with "Z".



Here is a variation of Moses Koledoye's answer which examines the first character for > and discards any matches as well as any empty elements. I also included replacing "-" with "Z".

    lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
           '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
           '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
           '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n',
           '\n', '$$$\n', '\n',
           '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
           '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
           '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
           '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
           '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
           '>B42\n', 'TT-GTGGGTATC\n']

    result = [[]]
    for x in lst:
        if x.startswith('>'):
            continue
        if x.startswith('$$$'):
            result.append([])
            continue
        x = x.strip()
        if x:
            result[-1].append(x.replace("-", "Z"))
    print(result)

This avoids assigning any particular significance to the length of any element.
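If the same grouping is needed in more than one place, the loop can be wrapped in a function. This is only a sketch: the function name `split_sequences` and its parameters are made up for illustration, and the short `sample` list is a cut-down version of the data in the question.

```python
def split_sequences(lines, separator="$$$", header_prefix=">",
                    gap="-", gap_replacement="Z"):
    """Group sequence lines into sublists, starting a new one at each separator.

    All names here are hypothetical; the logic mirrors the loop above.
    """
    groups = [[]]
    for line in lines:
        text = line.strip()
        if not text or text.startswith(header_prefix):
            continue  # skip blank items and headers
        if text.startswith(separator):
            groups.append([])  # separator reached: start the next set
            continue
        groups[-1].append(text.replace(gap, gap_replacement))
    return groups


# Cut-down sample in the same shape as the question's data.
sample = ['>1\n', 'TCCGGGGGTATC\n', '\n', '$$$\n', '\n', '>B1\n', 'TT-GTGGGAATC\n']
print(split_sequences(sample))
# [['TCCGGGGGTATC'], ['TTZGTGGGAATC']]
```

Stripping each item before the checks means a header or separator with stray leading whitespace would still be recognised, which the loop above does not guarantee.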


You can exploit the smaller length of the headers (and other unwanted items) as the criterion to filter them out. You start by creating a list containing one list and appending the items that pass the length test to the inner list.

A new sublist is added to the resulting list when the separator '$$$' is reached, and the length test is again used to add the remaining items to this new sublist:

    lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
           '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
           '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
           '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n',
           '\n', '$$$\n', '\n',
           '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
           '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
           '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
           '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
           '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
           '>B42\n', 'TT-GTGGGTATC\n']

    result = [[]]
    for x in lst:
        if len(x) > 6:
            result[-1].append(x.strip())
        if x.startswith('$$$'):
            result.append([])
    print(result)
    # [['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC',
    #   'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC'],
    #  ['ATCGGGGGTATT', 'TT-GTGGGAATC', 'TTCGTGGGAATC', 'TT-GTGGGTATC',
    #   'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TT-GTGGGTATC', 'TTCGGGGGAATC',
    #   'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TT-GTGGGTATC']]
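The length-based filter leaves the "-" characters in place, as the printed output shows. A minimal sketch of the same loop with the replacement the question asks for, run on a shortened list for brevity (the threshold of 6 still assumes every header is shorter than every sequence):

```python
# Shortened input in the same shape as the question's data.
lst = ['>1\n', 'TCCGGGGGTATC\n', '$$$\n', '>B2\n', 'TT-GTGGGAATC\n']

result = [[]]
for x in lst:
    if len(x) > 6:
        # Long enough to be a sequence: strip the newline, swap "-" for "Z".
        result[-1].append(x.strip().replace("-", "Z"))
    if x.startswith('$$$'):
        result.append([])  # separator reached: start the next set
print(result)
# [['TCCGGGGGTATC'], ['TTZGTGGGAATC']]
```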

