Split file by number of lines and pattern in awk/perl

I need to split a file into chunks based on an approximate number of lines (e.g. ~4 in the example, but thousands in reality). Each output file has to start with a pattern that also occurs many times within each chunk.

Each block must start with START, must not end with START, and must be more than 3 lines long.

Input file:

START
LINE
LINE
START
LINE
LINE
START
LINE
LINE
LINE
START
LINE
START
LINE

Desired output files:

File 1

START
LINE
LINE
START
LINE
LINE

File 2

START
LINE
LINE
LINE

File 3

START
LINE
START
LINE

The problem with the following code is that the 2nd occurrence of /^START/ is included at the end of file 1, when it should be at the start of file 2. I can't work out how to get the file to output when the next record is /^START/. There is no end pattern that I can use.

awk '/^START/{f=1} f{ print $0 > "file_"n ; c++} c>3 && /^START/ { n++; c=1; close("file_"n) }' c=1 n=1 file

An awk or perl solution would be much appreciated!

Answer1:

This produces the output that you want:

awk -v out=1 'NR>1 && ++i>3 && /^START/ {++out; i=0} {print > "file" out}' file

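The first block runs only when all three conditions hold: this is not the first line of the input (NR>1), the current chunk already holds more than three lines (++i>3), and the current line starts with START. It then increments out, which is the numeric suffix of the output filename, and resets the counter i. Because this test runs before the print, the matching START line is written as the first line of the new file rather than the last line of the old one.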
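The same idea can be spelled out in a longer form, which may be easier to adapt when the real chunk size is in the thousands. This is only a sketch: the names input.txt, chunk_, min, prefix, n and count are made up for illustration and are not from the original answer. It also calls close() on each finished file, which the one-liner does not do; that is an addition here, intended to keep the number of open files down when many chunks are written.

```shell
# Hypothetical sample input matching the question's example.
printf '%s\n' START LINE LINE START LINE LINE START LINE LINE LINE \
              START LINE START LINE > input.txt

# min    = a chunk must hold more than this many lines before it may be split
# prefix = output filename prefix; files are named chunk_1, chunk_2, ...
awk -v min=3 -v prefix=chunk_ -v n=1 '
    # Switch to a new file only on a START line, and only once the
    # current chunk already holds more than "min" lines.
    /^START/ && count > min {
        close(prefix n)   # release the finished file before opening the next
        n++
        count = 0
    }
    # Every line (including the START that triggered the switch) is
    # written to the current file.
    {
        print > (prefix n)
        count++
    }
' input.txt
```

For the real data, min would be set to the approximate chunk size. Any lines that appear before the first START end up in the first file, as they do with the one-liner.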

Output:

$ cat file1
0 START
1 LINE
2 LINE
3 START
4 LINE
5 LINE
$ cat file2
6 START
7 LINE
8 LINE
9 LINE
$ cat file3
10 START
11 LINE
12 START
13 LINE
