python regex in pyparsing

How can I use the regex below in pyparsing? It should return the list of tokens that the regex captures.

Any help would be greatly appreciated! Thank you!

<strong>python regex example in the shell:</strong>

>>> import re
>>> re.split(r"(\w+)(lab)(\d+)", "abclab1", maxsplit=3)
['', 'abc', 'lab', '1', '']

I tried this in pyparsing, but I can't figure out how to get it right because the first match is greedy, i.e., the first token will be 'abclab' instead of the two tokens 'abc' and 'lab'.

<strong>pyparsing example (high level, i.e. non-working code):</strong>

from pyparsing import CaselessLiteral, Word, alphas, nums

name = 'abclab1'
location = Word(alphas).setResultsName('location')
lab = CaselessLiteral('lab').setResultsName('environment')
identifier = Word(nums).setResultsName('identifier')
expr = location + lab + identifier
# Word(alphas) consumes 'abclab', so scanString finds no match here
match, start, end = next(expr.scanString(name))
print(match.asDict())

Answer1:

Pyparsing's classes are pretty much left-to-right, with lookahead implemented using explicit expressions like FollowedBy (for positive lookahead) and NotAny or the '~' operator (for negative lookahead). This allows you to detect a terminator which would normally match an item that is being repeated. For instance, OneOrMore(Word(alphas)) + Literal('end') will never find a match in strings like "start blah blah end", because the terminating 'end' will get swallowed up in the repetition expression in OneOrMore. The fix is to add negative lookahead in the expression being repeated: OneOrMore(~Literal('end') + Word(alphas)) + Literal('end') - that is, before reading another word composed of alphas, first make sure it is not the word 'end'.
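The difference between the two expressions can be seen in a small sketch like the one below. It assumes pyparsing is installed; the variable names (`greedy`, `guarded`, `greedy_matched`) are only illustrative and do not come from the answer.

```python
from pyparsing import Literal, OneOrMore, ParseException, Word, alphas

text = "start blah blah end"

# Greedy repetition: Word(alphas) also consumes the trailing "end",
# so the final Literal("end") has nothing left to match.
greedy = OneOrMore(Word(alphas)) + Literal("end")
try:
    greedy.parseString(text)
    greedy_matched = True
except ParseException:
    greedy_matched = False

# Negative lookahead: refuse to read another word when "end" comes next.
guarded = OneOrMore(~Literal("end") + Word(alphas)) + Literal("end")
tokens = guarded.parseString(text).asList()

print(greedy_matched)  # False
print(tokens)          # ['start', 'blah', 'blah', 'end']
```

Note that `~Literal("end")` also rejects any word that merely starts with "end", which is acceptable for this input but may need a keyword-style match in other grammars.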

This breaks down when the repetition is within a pyparsing class, like Word. Word(alphas) will continue to read alpha characters as long as there is no whitespace to stop the word. You would have to break into this repetition using something very expensive, like Combine(OneOrMore(~Literal('lab') + Word(alphas, exact=1))) - I say expensive because composition of simple tokens using complex Combine expressions will make for a slow parser.
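For illustration, here is a rough sketch of that character-by-character approach applied to the question's input. It reuses the question's results names and assumes pyparsing is installed. It shows the idea only and is not a recommended design, since the answer considers this form slow.

```python
from pyparsing import (CaselessLiteral, Combine, Literal, OneOrMore,
                       Word, alphas, nums)

# Read one letter at a time, stopping as soon as "lab" is next;
# Combine glues the single letters back into one token.
location = Combine(
    OneOrMore(~Literal("lab") + Word(alphas, exact=1))
)("location")
lab = CaselessLiteral("lab")("environment")
identifier = Word(nums)("identifier")
expr = location + lab + identifier

result = expr.parseString("abclab1")
print(result.asList())  # ['abc', 'lab', '1']
print(result.asDict())
```

The lookahead here uses the case-sensitive `Literal('lab')` from the answer, so an input such as "abcLAB1" would need a caseless lookahead instead.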

You might be able to compromise by using a regex wrapped in a pyparsing Regex object:

>>> from pyparsing import Regex
>>> labword = Regex(r'(\w+)(lab)(\d+)')
>>> print(labword.parseString("abclab1").dump())
['abclab1']

This does the right kind of grouping and detection, but does not expose the groups themselves. To do that, add names to each group - pyparsing will treat these like results names, and give you access to the individual fields, just as if you had called setResultsName:

>>> labword = Regex(r'(?P<locn>\w+)(?P<env>lab)(?P<identifier>\d+)')
>>> print(labword.parseString("abclab1").dump())
['abclab1']
- env: lab
- identifier: 1
- locn: abc
>>> print(labword.parseString("abclab1").asDict())
{'identifier': '1', 'locn': 'abc', 'env': 'lab'}

The only other non-regex approach I can think of would be to define an expression to read the whole string, and then break up the parts in a parse action.
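A minimal sketch of what such a parse action might look like follows. The helper name `split_lab_name` and its splitting rules (trailing digits form the identifier, and the text before them must end in "lab") are assumptions made for illustration; the answer does not spell out an implementation.

```python
from pyparsing import ParseException, Word, alphanums

def split_lab_name(s, loc, tokens):
    """Hypothetical parse action: split '<location>lab<digits>' into parts."""
    whole = tokens[0]
    head = whole.rstrip("0123456789")   # everything before the trailing digits
    identifier = whole[len(head):]      # the trailing digits themselves
    # Require digits, a head ending in "lab", and a non-empty location.
    if not identifier or len(head) <= 3 or not head.lower().endswith("lab"):
        raise ParseException(s, loc, "expected <location>lab<digits>")
    # Returning a list replaces the single matched token with three tokens.
    return [head[:-3], head[-3:], identifier]

# Read the whole string as one word, then break it up afterwards.
labword = Word(alphanums).setParseAction(split_lab_name)

tokens = labword.parseString("abclab1").asList()
print(tokens)  # ['abc', 'lab', '1']
```

Unlike the `\w+` in the regex, this sketch does not allow underscores in the location, so the two approaches are not strictly equivalent.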

Answer2:

If you strip the subgroup signs (the parentheses), you'll get the right answer :)

>>> re.split(r"\w+lab\d+", "abclab1")
['', '']
