Please note that this question is similar to this one, but different enough that those answers won't solve my problem:
<ul> <li>For insertion of control characters such as \x08, it seems that I have to use double quotes ".</li> <li>All spaces need to be preserved exactly as given. For line breaks I explicitly use \n.</li> </ul>
I have some string data which I need to store in YAML, e.g.:
<ul> <li>" This is my quite long string data "</li> <li>"This is my quite long string data"</li> <li>"Sting data\nwhich\x08contains control characters"</li> </ul>
and need it in YAML as something like this:
Key: " This is my" +
     " quite long " +
     " string data "
This is no problem as long as I stay on a single line, but I don't know how to spread the string content over multiple lines.
YAML block scalar styles (|) won't help here, because they don't allow escaping and they even do some whitespace stripping and newline/space substitution, which is useless for my case.
It looks like the only way is to use double quoting (") and backslashes (\), like this:
Key: "\
  This is \
  my quite \
  long string data\
  "
Trying this in an online YAML parser yields "This is my quite long string data", as expected.
Unfortunately, it fails if one of the "sub-lines" has a leading space, like this:
Key: "\
  This is \
  my quite\
   long st\
  ring data\
  "
This results in "This is my quitelong string data": the space between the words quite and long in this example is removed. The only thing that comes to mind to solve this is to replace the first leading space of each sub-line with \x20, like this:
Key: "\
  This is \
  my quite\
  \x20long st\
  ring data\
  "
Since I chose YAML to get the most human-readable format possible, I find \x20 a rather ugly solution. Does anyone know a better approach?
To keep the data human-readable, I also don't want to use !!binary for this.
Instead of \x20, you can simply escape the first non-indentation space on the line:
Key: "\
  This is \
  my quite\
  \ long st\
  ring data\
  "
This also works with multiple spaces; you only need to escape the first one.
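If you generate such files from code, the rule above can be applied mechanically. The following is only a sketch in standard-library Python: the names `escape_double_quoted` and `emit_multiline` are made up for this example, only ASCII control characters are handled, and it assumes you decide yourself where the string is split into sub-lines.

```python
from __future__ import annotations


def escape_double_quoted(text: str) -> str:
    """Escape text for a YAML double-quoted scalar (ASCII control chars only)."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


def emit_multiline(key: str, pieces: list[str], indent: int = 2) -> str:
    """Write the pieces as one double-quoted scalar, one piece per line.

    Each line ends with a backslash so the line break is not part of the
    content. A piece that starts with a space gets that first space escaped
    as "\\ " so it is not mistaken for indentation.
    """
    lines = [f'{key}: "\\']
    for piece in pieces:
        escaped = escape_double_quoted(piece)
        if escaped.startswith(" "):
            escaped = "\\" + escaped
        lines.append(" " * indent + escaped + "\\")
    lines.append(" " * indent + '"')
    return "\n".join(lines)


print(emit_multiline("Key", ["This is ", "my quite", " long st", "ring data"]))
```

Trailing spaces of a piece need no special treatment here, because the backslash that ends the line comes after them.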
You are right in your observation that control characters can only be represented in double quoted scalars.
However, the parser doesn't <strong>fail</strong> if the sub-lines (in YAML speak: continuation lines) have a leading space. It is your interpretation of the YAML standard that is incorrect. The standard explicitly states that for multi-line double quoted scalars:
All leading and trailing white space characters are excluded from the content.
So you can put as many spaces as you want before long; it will not make a difference.
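To make the rule concrete, here is a toy decoder for just the kind of scalar used in this thread. It is not a YAML parser: `decode_toy` is a hypothetical helper that understands only a handful of escapes and requires every line break to be escaped with a backslash. It is meant to show why unescaped leading spaces disappear while an escaped one survives.

```python
# Escapes this toy understands besides "\xNN" and backslash + line break.
SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t", " ": " "}


def decode_toy(body: str) -> str:
    """Decode the text between the double quotes (toy subset only)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\n":
            raise ValueError("unescaped line breaks (folding) are not handled")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt == "\n":
            # Escaped line break: drop it, then drop the leading white space
            # of the continuation line, since it counts as indentation.
            i += 2
            while i < len(body) and body[i] in " \t":
                i += 1
        elif nxt == "x":
            out.append(chr(int(body[i + 2:i + 4], 16)))
            i += 4
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            raise ValueError(f"escape not handled by this toy: {nxt!r}")
    return "".join(out)


stripped = decode_toy("\\\n  This is \\\n  my quite\\\n   long st\\\n  ring data\\\n  ")
kept = decode_toy("\\\n  This is \\\n  my quite\\\n  \\ long st\\\n  ring data\\\n  ")
print(repr(stripped))  # 'This is my quitelong string data'
print(repr(kept))      # 'This is my quite long string data'
```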
The representer for double quoted scalars for Python (both in ruamel.yaml and PyYAML) always represents newlines as \n. I am not aware of YAML representers in other languages where you have more control over this (and e.g. get double newlines to represent \n in your double quoted scalars). So you probably have to write your own representer.
While writing a representer you can try to make the line breaking smart, so that it minimizes the number of escaped spaces (by putting them between words on the same line). But especially on strings with a high double-space-to-word ratio, combined with a small width to operate in, it will be hard (if not impossible) to do without escaped spaces.
Such a representer should IMO first check if double quoting is necessary (i.e. there are control characters apart from newlines). If not, and there are newlines, you are probably better off representing the string as a block style literal scalar (for which spaces at the beginning or end of a line are not excluded).
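That check could look roughly like the sketch below. `needs_double_quotes` and `choose_style` are hypothetical names, and the third case (no control characters and no newlines) is not covered by the reasoning above, so the sketch just returns `None` there and leaves the decision to whatever default your representer has. The returned characters are the usual YAML style indicators, `"` for double quoted and `|` for literal block.

```python
from __future__ import annotations


def needs_double_quotes(text: str) -> bool:
    """True if text contains an ASCII control character other than newline."""
    return any(
        (ord(ch) < 0x20 or ord(ch) == 0x7F) and ch != "\n" for ch in text
    )


def choose_style(text: str) -> str | None:
    """Pick a scalar style following the two checks described above."""
    if needs_double_quotes(text):
        return '"'   # only double quoted scalars can carry escapes
    if "\n" in text:
        return "|"   # literal block scalar keeps line content as written
    return None      # assumption: fall back to the representer's default


print(choose_style("Sting data\nwhich\x08contains control characters"))
print(choose_style("first line  \n  second line"))
```

Note that this treats a tab as a control character too, which is stricter than necessary but keeps the check simple.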