
How to make a list of lists in Python when it has multiple separators?

Question:

The sample file looks like this (all on one line, wrapped for legibility):

['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n', '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n', '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n', '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n', '$$$\n', '\n', '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n', '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n', '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n', '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n', '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n', '>B42\n', 'TT-GTGGGTATC\n']

The $$$ separates the two sets. I need to use the .strip() function to remove the \n, and drop all the "headers" (the elements starting with >).

I need to make a list of lists (as below) and replace "-" with Z (again, all on one line; wrapped here for legibility):

[['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC'], ['ATCGGGGGTATT', 'TT-GTGGGAATC', 'TTCGTGGGAATC', 'TT-GTGGGTATC', 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TT-GTGGGTATC', 'TTCGGGGGAATC', 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TT-GTGGGTATC']]

Answer1:

Here is a variation of Moses Koledoye's answer which examines the first character for > and discards any matches as well as any empty elements. I also included replacing "-" with "Z".

    lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
           '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
           '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
           '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n',
           '\n', '$$$\n', '\n',
           '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
           '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
           '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
           '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
           '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
           '>B42\n', 'TT-GTGGGTATC\n']

    result = [[]]
    for x in lst:
        if x.startswith('>'):      # header: skip it
            continue
        if x.startswith('$$$'):    # separator: start a new sublist
            result.append([])
            continue
        x = x.strip()
        if x:                      # skip elements that were only whitespace
            result[-1].append(x.replace("-", "Z"))
    print(result)

This avoids assigning any particular significance to the length of any element.
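
To illustrate the point about length, here is a minimal sketch that wraps the same loop in a function and feeds it a sequence that is shorter than a header. The function name split_records and the sample data are made up for illustration; they are not part of the original question.

```python
def split_records(lines):
    """Group sequence lines into sublists, starting a new sublist at each '$$$'."""
    result = [[]]
    for line in lines:
        if line.startswith('>'):      # header line: skip
            continue
        if line.startswith('$$$'):    # separator: start a new group
            result.append([])
            continue
        line = line.strip()
        if line:                      # ignore blank lines
            result[-1].append(line.replace('-', 'Z'))
    return result


# Hypothetical data: 'AC-T' is shorter than the header '>LONGHEADER'.
sample = ['>LONGHEADER\n', 'AC-T\n', '\n', '$$$\n', '\n', '>B1\n', 'GG\n']
print(split_records(sample))
# [['ACZT'], ['GG']]
```

A filter based on length would have to guess a threshold that separates headers from sequences; here the short sequences are kept and the long header is dropped because only the leading character is checked.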

Answer2:

You can exploit the smaller length of the headers (and other unwanted items) as the criterion to filter them out. You start by creating a list containing one list and appending the items that pass the length test to the inner list.

A new sublist is added to the resulting list when the separator '$$$' is reached, and the length test is again used to add the remaining items to this new sublist:

    lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
           '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
           '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
           '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n',
           '\n', '$$$\n', '\n',
           '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
           '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
           '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
           '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
           '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
           '>B42\n', 'TT-GTGGGTATC\n']

    result = [[]]
    for x in lst:
        if len(x) > 6:               # only the sequences are longer than 6 characters
            result[-1].append(x.strip())
        if x.startswith('$$$'):      # separator: start a new sublist
            result.append([])
    print(result)
    # [['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC',
    #   'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC'],
    #  ['ATCGGGGGTATT', 'TT-GTGGGAATC', 'TTCGTGGGAATC', 'TT-GTGGGTATC',
    #   'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TT-GTGGGTATC', 'TTCGGGGGAATC',
    #   'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TT-GTGGGTATC']]
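
The output above still contains "-", while the question also asks for "-" to be replaced with Z. One way to finish the job is a nested list comprehension over the result. This is a minimal sketch; the result list here is a shortened stand-in for the full output, and the name replaced is just for illustration.

```python
# Shortened stand-in for the `result` produced by the loop above.
result = [['TCCGGGGGTATC', 'TCCGTGGGTATC'], ['ATCGGGGGTATT', 'TT-GTGGGAATC']]

# Replace every '-' with 'Z' in each sequence, keeping the grouping intact.
replaced = [[seq.replace('-', 'Z') for seq in group] for group in result]
print(replaced)
# [['TCCGGGGGTATC', 'TCCGTGGGTATC'], ['ATCGGGGGTATT', 'TTZGTGGGAATC']]
```

Alternatively, the replacement could be done inside the loop, at the point where each item is appended.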
