How to break a string over multiple lines and preserve spaces in YAML?

Please note that this question is similar to this one, but different enough that those answers won't solve my problem:

    • To insert control characters such as \x08, it seems that I have to use double quotes (").
    • All spaces need to be preserved exactly as given. For line breaks I explicitly use \n.

    I have some string data which I need to store in YAML, e.g.:

      • " This is my quite long string data "
      • "This is my quite long string data"
      • "This_is_my_quite_long_string_data"
      • "Sting data\nwhich\x08contains control characters"

      and need it in YAML as something like this:

      Key: " This is my" +
           " quite long " +
           " string data "

      This is no problem as long as I stay on a single line, but I don't know how to spread the string content over multiple lines.

      YAML block scalar styles (>, |) won't help here, because they don't allow escaping, and they also strip some whitespace and substitute newlines and spaces, which is useless for my case.

      It looks like the only way is to use double quotes (") and backslashes (\), like this:

      Key: "\
        This is \
        my quite \
        long string data\
        "

      Trying this in an online YAML parser yields "This is my quite long string data", as expected.

      Unfortunately, it fails if one of the "sub-lines" has a leading space, like this:

      Key: "\
        This is \
        my quite\
         long st\
        ring data\
        "

      This results in "This is my quitelong string data": the space between the words "quite" and "long" has been removed. The only solution that comes to mind is to replace the first leading space of each sub-line with \x20, like this:

      Key: "\
        This is \
        my quite\
        \x20long st\
        ring data\
        "

      Since I chose YAML to get the most human-readable format possible, I find \x20 a rather ugly solution. Maybe someone knows a better approach?

      To keep it human-readable, I also don't want to use !!binary for this.

      Answer1:

      Instead of \x20, you can simply escape the first non-indentation space on the line:

      Key: "\
        This is \
        my quite\
        \ long st\
        ring data\
        "

      This also works with multiple spaces; you only need to escape the first one.

      Answer2:

      You are right in your observation that control characters can only be represented in double quoted scalars.

      However, the parser doesn't fail if the sub-lines (in YAML speak: continuation lines) have a leading space. It is your interpretation of the YAML standard that is incorrect. The standard explicitly states that for multi-line double quoted scalars:

      All leading and trailing white space characters are excluded from the content.

      So you can put as many spaces as you want before "long"; it will not make a difference.
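To make the rule above concrete, here is a minimal Python sketch of how a parser reads such a scalar. It is a simplified model, not a YAML parser: it assumes every physical line ends with an escaped line break, and it only knows the escapes used in this question. The function name `read_double_quoted` is made up for this illustration.

```python
import re

# Simplified model of reading a multi-line double-quoted YAML scalar.
# Assumptions (not a full YAML implementation):
#   * every physical line ends with an escaped line break ("\" + newline)
#   * only the escapes "\ ", "\\", "\"", "\n" and "\xNN" are supported;
#     any other escape raises KeyError
_ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]{2}|[\s\S])')
_SIMPLE = {"n": "\n", " ": " ", "\\": "\\", '"': '"', "\n": ""}


def read_double_quoted(body: str) -> str:
    """`body` is the text between the quotes, physical line breaks included."""
    # Leading white space on continuation lines is not part of the content.
    body = re.sub(r"\n[ \t]*", "\n", body)

    def unescape(match):
        code = match.group(1)
        if len(code) == 3:                  # "\xNN"
            return chr(int(code[1:], 16))
        return _SIMPLE[code]                # an escaped line break maps to ""

    return _ESCAPE.sub(unescape, body)


unescaped_space = r'''\
  This is \
  my quite\
   long st\
  ring data\
  '''
escaped_space = r'''\
  This is \
  my quite\
  \ long st\
  ring data\
  '''
print(read_double_quoted(unescaped_space))  # This is my quitelong string data
print(read_double_quoted(escaped_space))    # This is my quite long string data
```

The unescaped space before "long" is indistinguishable from indentation and is dropped, while the escaped one is read as content.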

      The Python representers for double quoted scalars (in both ruamel.yaml and PyYAML) always represent newlines as \n. I am not aware of YAML representers in other languages that give you more control over this (e.g. letting a double newline represent \n in your double quoted scalars). So you will probably have to write your own representer.

      While writing a representer you can try to make the line breaking smart, so that it minimizes the number of escaped spaces (by keeping the spaces between words on the same line). But especially for strings with a high ratio of double spaces to words, combined with a small width to operate in, it will be hard (if not impossible) to do without escaped spaces.
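As a rough starting point for such a representer, the following Python sketch emits the text of a double quoted scalar directly, without going through ruamel.yaml or PyYAML. It breaks at a fixed width and escapes a leading space wherever a continuation line happens to start with one; it does not attempt the "smart" breaking described above. The names `escape_char` and `dump_double_quoted`, the default width and the two-space indent are all assumptions made for this example.

```python
def escape_char(ch: str) -> str:
    """Escape one character for use inside a YAML double-quoted scalar."""
    if ch == "\\":
        return "\\\\"
    if ch == '"':
        return '\\"'
    if ch == "\n":
        return "\\n"
    if ord(ch) < 0x20 or ord(ch) == 0x7F:
        return "\\x%02x" % ord(ch)          # e.g. backspace becomes \x08
    return ch


def dump_double_quoted(key: str, text: str, width: int = 12, indent: str = "  ") -> str:
    """Emit `key: "..."` spread over several lines using escaped line breaks.

    Naive sketch: breaks every `width` source characters, so escaped
    characters can make an output line longer than `width`.
    """
    lines = [f'{key}: "\\']
    for start in range(0, len(text), width):
        chunk = text[start:start + width]
        body = "".join(escape_char(c) for c in chunk)
        if chunk.startswith(" "):
            # Escape the first space so it is not mistaken for indentation.
            body = "\\" + body
        lines.append(f"{indent}{body}\\")
    lines.append(f'{indent}"')
    return "\n".join(lines)


print(dump_double_quoted("Key", "This is my quite long string data", width=8))
```

A smarter version could move each break point so that runs of spaces end up at the end of a line, where they are preserved before an escaped line break, and fall back to the escape only when a run of spaces is longer than the available width.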

      Such a representer should IMO first check if double quoting is necessary (i.e. there are control characters apart from newlines). If not, and there are newlines, you are probably better off representing the string as a block style literal scalar (for which spaces at the beginning or end of a line are not excluded).
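The check described above could look roughly like this in Python. The function names are hypothetical, and the fallback to double quotes for strings without newlines is an assumption of this sketch, chosen because leading and trailing spaces have to survive.

```python
def needs_double_quotes(text: str) -> bool:
    """True if `text` contains a control character other than a newline."""
    return any(
        (ord(c) < 0x20 or ord(c) == 0x7F) and c != "\n"
        for c in text
    )


def choose_style(text: str) -> str:
    """Return '"' for double quoted style or '|' for literal block style."""
    if needs_double_quotes(text):
        return '"'          # escapes such as \x08 only exist in double quotes
    if "\n" in text:
        return "|"          # newlines only: literal block style is enough
    # Assumption of this sketch: fall back to double quotes so that
    # leading and trailing spaces are kept exactly as given.
    return '"'


print(choose_style("Sting data\nwhich\x08contains control characters"))  # "
print(choose_style("line one\nline two\n"))                              # |
```

In PyYAML, the returned value could be passed as the `style` argument of `represent_scalar` inside a custom string representer, although the stock emitter would still decide where to break double quoted lines.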
