Replace string that contains CRLF?


I'm reformatting a file, and I want to perform the following steps:

<ol><li>Replace double CRLFs with a temporary character sequence ($CRLF$ or something).</li> <li>Remove all remaining CRLFs in the whole file.</li> <li>Go back and replace the temporary sequence with double CRLFs again.</li> </ol>

So input like this:

This is a paragraph of text that has
been manually fitted into a certain
column width.

This is another paragraph of text that
is the same.

Will become

This is a paragraph of text that has been manually fitted into a certain column width.

This is another paragraph of text that is the same.

It seems this should be possible by piping the input through a few simple sed programs, but I'm not sure how to refer to CRLF in sed (to use in sed 's/<CRLF><CRLF>/$CRLF$/'). Or maybe there's a better way of doing this?
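In GNU sed a carriage return can be written as `\r` and a newline as `\n`, and the `-z` option makes sed read the whole file into the pattern space (as long as it contains no NUL bytes), so a CRLF can be matched directly. Below is a minimal sketch of the three steps above, assuming a GNU sed recent enough to support `-z`; the function name `reflow_crlf` is hypothetical. Single CRLFs are replaced with a space rather than deleted, so words on either side of a wrap point don't run together.

```shell
# Sketch only: assumes GNU sed (-z, plus \r and \n in regex and replacement)
# and input without NUL bytes, so sed -z sees the whole file at once.
# reflow_crlf is a hypothetical helper; it reads the files given as
# arguments, or stdin if there are none.
#   1. protect paragraph breaks (double CRLF) with the $CRLF$ placeholder
#   2. drop the final line ending so it does not become a trailing space
#   3. join the remaining wrapped lines with a space
#   4. turn the placeholder back into a double CRLF
#   5. end the output with a single CRLF
reflow_crlf() {
  sed -z '
    s/\r\n\r\n/$CRLF$/g
    s/\r\n$//
    s/\r\n/ /g
    s/\$CRLF\$/\r\n\r\n/g
    s/$/\r\n/
  ' "$@"
}
```

Usage would be along the lines of `reflow_crlf input.txt > output.txt`. Runs of more than one blank line are not handled specially.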


You can use sed to append a <CRLF> marker to the end of every line:

sed 's/$/<CRLF>/'

then delete every \r and \n character with tr:

| tr -d "\r\n"

then replace each double marker with a newline (the \n in the replacement requires GNU sed):

| sed 's/<CRLF><CRLF>/\n/g'

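and finally replace the leftover single <CRLF> markers with a space, so that words on either side of a wrap point don't run together.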

There was a sed one-liner that did all of this in a single pass, but I can't seem to find it now.
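Put together, the marker steps above can be sketched as one function. This assumes GNU sed (for `\n` in the replacement text) and uses the hypothetical name `reflow_markers`. It turns each double marker into two newlines, so a blank line still separates the paragraphs, and it replaces the remaining single markers with a space.

```shell
# Sketch only: assumes GNU sed (\n in the replacement text) and tr.
# reflow_markers is a hypothetical helper; reads files or stdin.
reflow_markers() {
  # 1. tag the end of every line with a <CRLF> marker
  sed 's/$/<CRLF>/' "$@" |
    # 2. delete every CR and LF, leaving one long line
    tr -d '\r\n' |
    # 3. drop the final marker, turn double markers into a paragraph break,
    #    join the remaining wrapped lines with a space, end with a newline
    sed 's/<CRLF>$//; s/<CRLF><CRLF>/\n\n/g; s/<CRLF>/ /g; s/$/\n/'
}
```

Because tr removes CR and LF alike, the same function handles both CRLF and LF-only input, and its output uses LF line endings.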


Try the below:

sed 's/$/ /;s/^[[:space:]]*$/CRLF/' file.txt | tr -d '\r\n' | sed 's/CRLF/\r\n/g'

That's not quite the method you described; here is what it does:

<ol><li>Add a space to the end of each line.</li> <li>Replace any line that contains only whitespace (i.e., blank lines) with "CRLF".</li> <li>Delete all line-breaking characters (both CR and LF).</li> <li>Replace every occurrence of the string "CRLF" with a Windows-style line break.</li> </ol>

This works on Cygwin bash for me.
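To see exactly what this pipeline produces, here it is wrapped in a function, with a `g` flag on the last substitution (so every placeholder is replaced, not just the first) and `[[:space:]]` in the blank-line test (so a line holding only a CR also counts as blank). GNU sed is assumed for `\r\n` in the replacement, and `reflow_blank` is a hypothetical name. Note that each joined line keeps the space added in step 1, paragraphs end up separated by a single CRLF rather than a blank line, and any literal "CRLF" already in the text would also become a line break.

```shell
# Sketch only: assumes GNU sed (\r and \n in the replacement text).
# reflow_blank is a hypothetical name for the one-liner; reads files or stdin.
reflow_blank() {
  # append a space to every line, then replace whitespace-only lines
  # (including a lone CR) with the literal string CRLF
  sed 's/$/ /;s/^[[:space:]]*$/CRLF/' "$@" |
    # remove every CR and LF
    tr -d '\r\n' |
    # turn every CRLF placeholder into a real CR LF line break
    sed 's/CRLF/\r\n/g'
}
```

For example, `printf 'a b\r\nc\r\n\r\nd\r\ne\r\n' | reflow_blank` yields `a b c `, a CR LF, then `d e ` with no final line break; piping it through `od -c` makes the bytes visible.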


<h3>Redefine the Problem</h3>

It looks like what you're <em>really</em> trying to do is reflow your paragraphs and single-space your lines. There are a number of ways you can do this.

<h3>A Non-Sed Solution</h3>

If you don't mind installing packages outside coreutils, a few additional shell utilities make this as easy as:

dos2unix /tmp/foo
fmt -w0 /tmp/foo | cat --squeeze-blank | sponge /tmp/foo
unix2dos /tmp/foo

Sponge is from the <em>moreutils</em> package and lets you write to the same file you're reading from. The <em>dos2unix</em> package (or alternatively <em>tofrodos</em>) lets you convert line endings back and forth, for easier integration with tools that expect Unix-style line endings.


This might work for you (GNU sed):

sed ':a;$!{N;/\n\r\?$/{p;d};s/\r\?\n/ /;ba}' file
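The one-liner loops with `N` (append the next line) and `ba` (branch back to label `a`). Spelled out one command per line, and with the blank-line test written as `\n\r\?$` so that a CRLF blank line (a lone CR) also counts as a paragraph break, it could look like the sketch below. GNU sed is assumed (for `\r` and `\?`), and `reflow_gnu` is a hypothetical name.

```shell
# Sketch only: assumes GNU sed. reflow_gnu is a hypothetical name;
# it reads files or stdin.
#   :a          label to loop back to
#   $!{ ... }   on every line except the last:
#     N           append the next input line to the pattern space
#     /\n\r\?$/   if that new line is blank (empty, or a lone CR):
#       p; d        print the joined paragraph plus the blank line, start over
#     s/\r\?\n/ / otherwise replace the line break just added with a space
#     ba          and loop to read another line
#   On the last line the loop ends and sed prints whatever is left.
reflow_gnu() {
  sed '
    :a
    $!{
      N
      /\n\r\?$/{
        p
        d
      }
      s/\r\?\n/ /
      ba
    }
  ' "$@"
}
```

Since CRs at the end of joined lines are consumed by `s/\r\?\n/ /`, the output keeps the input's line-ending style only at paragraph ends and on the blank separator lines.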


Am I missing why this is not easier?



sed 's/[[:space:]]*$/\r/' < index.html > index_CRLF.html


To remove the CRLFs again and go back to Unix line endings:


sed 's/[[:space:]]*$//' < index_CRLF.html > index.html



