Loading and dumping multiple yaml files with ruamel.yaml (python)


Using python 2 (atm) and ruamel.yaml 0.13.14 (RedHat EPEL)

I'm currently writing some code to load YAML definitions, but they are split across multiple files. The user-editable part contains, e.g.:

    users:
      xxxx1:
        timestamp: '2018-10-22 11:38:28.541810'
        << : *userdefaults
      xxxx2:
        << : *userdefaults
        timestamp: '2018-10-22 11:38:28.541810'

the defaults are stored in another file, which is not editable:

    userdefaults: &userdefaults
      # Default values for user settings
      fileCountQuota: 1000
      diskSizeQuota: "300g"

I can process these together by loading both, concatenating the strings, and then running them through merged_data = list(yaml.load_all("{}\n{}".format(defaults_data, user_data), Loader=yaml.RoundTripLoader)), which correctly resolves everything. (When not using RoundTripLoader I get errors that the references cannot be resolved, which is normal.)

Now, I want to do some updates via Python code (e.g. update the timestamp), and for that I need to write back just the user part. And that's where things get hairy. So far I haven't found a way to write only that YAML document, not both.


First of all, unless there are multiple documents in your defaults file, you don't have to use load_all, as you are not concatenating two documents into a multiple-document stream. If you had, by using a format string with a document-end marker ("{}\n...\n{}") or with a directives-end marker ("{}\n---\n{}"), your aliases would not carry over from one document to another, as per the YAML specification:


It is an error for an alias node to use an anchor that does not previously occur in the document.


The anchor has to be in the document, not just in the stream (which can consist of multiple documents).
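The per-document rule can be illustrated without a YAML library. The following is only a rough sketch: the helper name unresolved_aliases is made up for this example, and it relies on simplistic regular expressions that ignore quoting, comments and block scalars, so it is not a substitute for a real YAML parser. It splits a stream at document markers and lists, per document, the aliases that have no earlier anchor in that same document.

```python
import re

# Simplistic patterns (assumption: no '&' or '*' inside quoted strings or comments).
TOKEN = re.compile(r'([&*])([A-Za-z0-9_-]+)')
SEPARATOR = re.compile(r'^(?:---|\.\.\.)[ \t]*$', re.MULTILINE)


def unresolved_aliases(stream):
    """Return one list per document with the aliases lacking an earlier anchor."""
    result = []
    for document in SEPARATOR.split(stream):
        seen = set()      # anchors defined so far in this document
        missing = []      # aliases used before any matching anchor
        for kind, name in TOKEN.findall(document):
            if kind == '&':
                seen.add(name)
            elif name not in seen:
                missing.append(name)
        result.append(missing)
    return result


defaults_data = 'userdefaults: &userdefaults\n  fileCountQuota: 1000\n'
user_data = 'users:\n  xxxx1:\n    <<: *userdefaults\n'

# One document: the alias follows its anchor, so nothing is reported.
single = unresolved_aliases('{}\n{}'.format(defaults_data, user_data))
# Two documents: the alias in the second document has no anchor there.
double = unresolved_aliases('{}\n---\n{}'.format(defaults_data, user_data))
```

With the plain "{}\n{}" format string everything stays in one document, which is why the alias resolves when loading.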


I tried some hocus pocus, pre-populating the already represented dictionary of anchored nodes:

    import sys
    import datetime
    from ruamel import yaml

    def load():
        with open('defaults.yaml') as fp:
            defaults_data = fp.read()
        with open('user.yaml') as fp:
            user_data = fp.read()
        merged_data = yaml.load("{}\n{}".format(defaults_data, user_data),
                                Loader=yaml.RoundTripLoader)
        return merged_data

    class MyRTDGen(object):
        class MyRTD(yaml.RoundTripDumper):
            def __init__(self, *args, **kw):
                pps = kw.pop('pre_populate', None)
                yaml.RoundTripDumper.__init__(self, *args, **kw)
                if pps is not None:
                    for pp in pps:
                        try:
                            anchor = pp.yaml_anchor()
                        except AttributeError:
                            anchor = None
                        node = yaml.nodes.MappingNode(
                            u'tag:yaml.org,2002:map', [], flow_style=None, anchor=anchor)
                        self.represented_objects[id(pp)] = node

        def __init__(self, pre_populate=None):
            assert isinstance(pre_populate, list)
            self._pre_populate = pre_populate

        def __call__(self, *args, **kw):
            kw1 = kw.copy()
            kw1['pre_populate'] = self._pre_populate
            myrtd = self.MyRTD(*args, **kw1)
            return myrtd

    def update(md, file_name):
        ud = md.pop('userdefaults')
        MyRTD = MyRTDGen([ud])
        yaml.dump(md, sys.stdout, Dumper=MyRTD)
        with open(file_name, 'w') as fp:
            yaml.dump(md, fp, Dumper=MyRTD)

    md = load()
    md['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())
    update(md, 'user.yaml')

Since the PyYAML-based API requires a class instead of an object, you need to use a class generator that adds the data elements to pre-populate on the fly from within yaml.dump().

But this doesn't work, as a node only gets written out with an anchor once it is determined that the anchor is used (i.e. there is a second reference). So actually the first merge key gets written out as an anchor. And although I am quite familiar with the code base, I could not get this to work properly in a reasonable amount of time.

So instead, I would just rely on the fact that, in the dump of the combined updated file, only one root-level key matches the first key of user.yaml, and strip everything before it:

    import sys
    import datetime
    from ruamel import yaml

    with open('defaults.yaml') as fp:
        defaults_data = fp.read()
    with open('user.yaml') as fp:
        user_data = fp.read()
    merged_data = yaml.load("{}\n{}".format(defaults_data, user_data),
                            Loader=yaml.RoundTripLoader)

    # find the key
    for line in user_data.splitlines():
        line = line.split('# ')[0].rstrip()  # end of line comment, not checking for strings
        if line and line[-1] == ':' and line[0] != ' ':
            split_key = line
            break

    merged_data['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())
    buf = yaml.compat.StringIO()
    yaml.dump(merged_data, buf, Dumper=yaml.RoundTripDumper)
    document = split_key + buf.getvalue().split('\n' + split_key)[1]
    sys.stdout.write(document)

which gives:

    users:
      xxxx1:
        <<: *userdefaults
        timestamp: '2018-10-22 11:38:28.541810'
      xxxx2:
        <<: *userdefaults
        timestamp: '2018-10-23 09:59:13.829978'
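The key-finding and stripping steps can also be packaged as two small functions that work on plain strings. This is a minimal sketch under the same assumptions as the script (the user file starts with a single root-level key and end-of-line comments are introduced by '# '). The names find_root_key and user_part are hypothetical, and the use of str.partition plus the explicit errors are additions for illustration, not part of the original script.

```python
def find_root_key(user_text):
    """Return the first unindented 'key:' line of user_text, or None."""
    for line in user_text.splitlines():
        # Drop an end-of-line comment; quoted strings are not checked.
        line = line.split('# ')[0].rstrip()
        if line and line.endswith(':') and not line[0].isspace():
            return line
    return None


def user_part(dumped, user_text):
    """Return the part of dumped that starts at the root key of user_text."""
    key = find_root_key(user_text)
    if key is None:
        raise ValueError('no root-level key found in the user data')
    if dumped.startswith(key):
        return dumped  # nothing precedes the user part
    # Keep everything after the first root-level occurrence of the key.
    head, marker, tail = dumped.partition('\n' + key)
    if not marker:
        raise ValueError('root key {!r} not found in the dump'.format(key))
    return key + tail


dumped = ('userdefaults: &userdefaults\n'
          '  fileCountQuota: 1000\n'
          'users:\n'
          '  xxxx1:\n'
          '    <<: *userdefaults\n')
user_text = '# user settings\nusers:\n  xxxx1:\n    <<: *userdefaults\n'
document = user_part(dumped, user_text)
```

Raising an error when the key is missing avoids silently writing an empty or truncated user file.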

I had to make a virtualenv to make sure I could run the above with ruamel.yaml==0.13.14. That version is from the time I was still young (I won't claim to have been innocent). There have been over 85 releases of the library since then.

I can understand that you might not be able to run anything but Python 2 at the moment and cannot compile/use a newer version. But what you really should do is install virtualenv (this can be done using EPEL, but also without further "polluting" your system installation), make a virtualenv for the code you are developing, and install the latest version of ruamel.yaml (and your other libraries) in there. You can also do that if you need to distribute your software to other systems: just install virtualenv there as well.

I have all my utilities under /opt/util, and manage them with virtualenvutils (https://pypi.org/project/virtualenvutils/), a wrapper around virtualenv.


For writing the user part, you will have to manually split the multi-file output of yaml.dump() and write the appropriate part back to the users.yaml file.

    import datetime
    import StringIO
    import ruamel.yaml

    yaml = ruamel.yaml.YAML(typ='rt')

    data = None
    with open('defaults.yaml', 'r') as defaults:
        with open('users.yaml', 'r') as users:
            raw = "{}\n{}".format(''.join(defaults.readlines()), ''.join(users.readlines()))
            data = list(yaml.load_all(raw))

    data[0]['users']['xxxx1']['timestamp'] = datetime.datetime.now().isoformat()

    with open('users.yaml', 'w') as outfile:
        sio = StringIO.StringIO()
        yaml.dump(data[0], sio)
        out = sio.getvalue()
        # write the second part here as this is the contents of users.yaml
        outfile.write(out.split('\n\n')[1])

