In Python, is there a way to extract an embedded JSON string?


I'm parsing a really big log file with some embedded JSON.

I'll see lines like this:

foo="{my_object:foo, bar:baz}" a=b c=d

The problem is that the embedded JSON can contain spaces, but outside of the JSON, spaces act as tuple delimiters (except where they have unquoted strings; huzzah for whatever idiot thought that was a good idea). I'm not sure how to figure out where the JSON string ends without reimplementing large portions of a JSON parser.

Is there a JSON parser for Python where I can give it '{"my_object":"foo", "bar":"baz"} asdfasdf' and have it return ({'my_object': 'foo', 'bar': 'baz'}, 'asdfasdf'), or am I going to have to reimplement the JSON parser by hand?


Found a really cool answer: use json.JSONDecoder's scan_once function. It takes the string and a start index, and returns the parsed value together with the index where parsing stopped.

    In [30]: import json
    In [31]: d = json.JSONDecoder()
    In [32]: my_string = 'key="{"foo":"bar"}"more_gibberish'
    In [33]: d.scan_once(my_string, 5)
    Out[33]: ({u'foo': u'bar'}, 18)
    In [37]: my_string[18:]
    Out[37]: '"more_gibberish'
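For what it's worth, the documented counterpart of scan_once is JSONDecoder.raw_decode, which takes the same string and start index and likewise returns the value and the end index. A minimal sketch, assuming Python 3; the helper name split_json_prefix is made up for illustration:

```python
import json

_decoder = json.JSONDecoder()

def split_json_prefix(text, start=0):
    """Parse one JSON value beginning at text[start]; return (value, remainder)."""
    # raw_decode stops at the end of the first complete JSON value instead of
    # raising on trailing data the way json.loads does.
    value, end = _decoder.raw_decode(text, start)
    # Stripping leading whitespace from the remainder is a choice made for
    # this sketch, to match the output asked for in the question.
    return value, text[end:].lstrip()

print(split_json_prefix('{"my_object":"foo", "bar":"baz"} asdfasdf'))
# ({'my_object': 'foo', 'bar': 'baz'}, 'asdfasdf')
```

Note that raw_decode does not skip leading whitespace, so start has to point at the first character of the JSON value.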

Just be careful with the start index. If it is off by one, you get a different value:

    In [38]: d.scan_once(my_string, 6)
    Out[38]: (u'foo', 11)
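One way to avoid hand-counting the offset is to compute it from the key name. This is only a sketch, assuming Python 3 and assuming the JSON always starts right after key="; the function name extract_json_after_key is hypothetical:

```python
import json

def extract_json_after_key(line, key):
    """Return (value, end_index) for the JSON that follows key=" in line."""
    marker = key + '="'
    pos = line.find(marker)
    if pos == -1:
        raise ValueError("key %r not found" % key)
    start = pos + len(marker)  # first character after the opening quote
    # raw_decode is the documented JSONDecoder method that parses from an index.
    return json.JSONDecoder().raw_decode(line, start)

my_string = 'key="{"foo":"bar"}"more_gibberish'
value, end = extract_json_after_key(my_string, "key")
print(value, end)  # {'foo': 'bar'} 18
```

A plain str.find will also match the marker if it happens to occur earlier in the line, so a real log format may need a stricter search.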

Match everything around it.

    >>> import re
    >>> re.search('^foo="(.*)" a=.+ c=.+$', 'foo="{my_object:foo, bar:baz}" a=b c=d').group(1)
    '{my_object:foo, bar:baz}'
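If the captured text is valid JSON, it can be handed straight to json.loads. A rough sketch, assuming Python 3 and assuming every line has exactly the fields foo, a and c in that order; the pattern, the group names and parse_fixed_line are illustrative, not part of the answer above:

```python
import json
import re

# Hypothetical fixed layout: foo="<json>" followed by a=<word> c=<word>.
LINE_RE = re.compile(r'^foo="(?P<payload>.*)" a=(?P<a>\S+) c=(?P<c>\S+)$')

def parse_fixed_line(line):
    """Return a dict of the three fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The greedy .* backtracks until the trailing fields match, so quotes and
    # spaces inside the JSON do not end the capture early.
    fields["payload"] = json.loads(fields["payload"])
    return fields

result = parse_fixed_line('foo="{"my_object": "foo", "bar": "baz"}" a=b c=d')
print(result)
```

This only works while the fields around the JSON are known in advance; a line with a different set of keys simply fails to match.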

Use shlex and json.

Something like:

    import shlex
    import json

    def decode_line(line):
        decoded = {}
        fields = shlex.split(line)
        for f in fields:
            k, v = f.split('=', 1)
            if k == "foo":
                v = json.loads(v)
            decoded[k] = v
        return decoded

This does assume that the JSON inside the quotes is quoted properly.

Here's a short example program that uses the above:

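    import shlex
    import json

    # Assumes decode_line() from above is already defined.
    # shlex.quote is the Python 3 spelling of the old pipes.quote.
    testdict = {"hello": "world", "foo": "bar"}
    line = 'foo=' + shlex.quote(json.dumps(testdict)) + ' a=b c=d'
    print(line)
    print(decode_line(line))

With output (Python 3.7+):

    foo='{"hello": "world", "foo": "bar"}' a=b c=d
    {'foo': {'hello': 'world', 'foo': 'bar'}, 'a': 'b', 'c': 'd'}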
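The caveat about proper quoting matters in practice. As a small illustration (assuming Python 3; the two sample lines are invented), compare a value wrapped in single quotes with one wrapped in double quotes, as in the question's log format:

```python
import json
import shlex

good = """foo='{"a": "b"}' c=d"""   # JSON wrapped in single quotes
bad = 'foo="{"a": "b"}" c=d'        # JSON wrapped in double quotes

good_fields = shlex.split(good)
bad_fields = shlex.split(bad)
print(good_fields)  # ['foo={"a": "b"}', 'c=d']
print(bad_fields)   # ['foo={a: b}', 'c=d']

try:
    json.loads(bad_fields[0].split('=', 1)[1])
    bad_parsed = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    bad_parsed = False
print(bad_parsed)   # False
```

In the second case shlex treats the inner double quotes as shell quoting and strips them, so what reaches json.loads is no longer valid JSON.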


