Regular expression extracting number dimension

Question:

I'm using Python regular expressions to extract dimensional information from a database. The entries in that column look like this:

23 cm
43 1/2 cm
20cm
15 cm x 30 cm

What I need from this is only the width of the entry (so for the entries with an 'x', only the first number), but as you can see the values are all over the place.

From what I understood in the <a href="https://docs.python.org/3/library/re.html" rel="nofollow" title="documentation">documentation</a>, you can access the groups in a match using their position, so I was thinking I could determine the type of the entry based on how many groups are returned and what is found at each index.

The expression I have used so far is ^(\d{2})\s?(x\s?(\d{2}))?(\d+/\d+)?$. However, it's not perfect and it returns a number of useless groups. Is there something more efficient and appropriate?

<strong>Edit</strong>: I need the number from every line. When there is only one number, it is implied that only the width was measured (including any fractional component, as in line 2). When there are two numbers, the height was also measured, but I only need the width, which is the first number (as in the last line).

Answer1:

Try the regex below. It captures the first run of digits, plus an optional fraction that follows it, before the first 'cm'.

    import re

    # This works for all of your example data:
    # lazily capture from the first digits up to the first 'cm'
    regex = re.compile(r'(\d+.*?)\s?cm')

    # Or, stricter: whatever follows the first digit group
    # must be a fractional number only
    regex = re.compile(r'(\d+(?:\s+\d+/\d+)?)\s?cm')

    >>> regex.match('23 cm').group(1)
    '23'
    >>> regex.match('43 1/2 cm').group(1)
    '43 1/2'
    >>> regex.match('20cm').group(1)
    '20'
    >>> regex.match('15 cm x 30 cm').group(1)
    '15'

<a href="https://regex101.com/r/FTZOIb/1" rel="nofollow">regex101 demo</a>
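
To apply this to every entry and get a usable number out, something like the following might work. This is only a sketch: `extract_width` and `width_as_number` are hypothetical helper names, not part of the answer, and it assumes a fraction is always written as a whole number followed by a space and `n/d`.

```python
import re
from fractions import Fraction

# Stricter pattern from the answer: leading digits, optional fraction, then 'cm'.
WIDTH_RE = re.compile(r'(\d+(?:\s+\d+/\d+)?)\s?cm')

def extract_width(entry):
    """Return the width text of an entry, or None if it does not match (hypothetical helper)."""
    m = WIDTH_RE.match(entry.strip())
    return m.group(1) if m else None

def width_as_number(entry):
    """Turn '43 1/2' into an exact Fraction by summing its parts (hypothetical helper)."""
    text = extract_width(entry)
    if text is None:
        return None
    return sum(Fraction(part) for part in text.split())

entries = ['23 cm', '43 1/2 cm', '20cm', '15 cm x 30 cm']
widths = [extract_width(e) for e in entries]
print(widths)
print([float(width_as_number(e)) for e in entries])
```

Returning `None` for entries that do not match keeps malformed rows visible instead of raising an `AttributeError` on `.group(1)`.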

Answer2:

Here's a sample of how to do it from a text file. It works for the provided data.

    # Print the part before the 'x' for every line that contains one
    with open("textfile.txt", 'r') as f:
        for line in f:
            if 'x' in line:
                iposition = line.find('x')
                print(line[:iposition])
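
If lines without an 'x' should be kept as well, a variation using `str.partition` could be sketched like this. It assumes the entries are already in a list rather than a file, `width_part` is a hypothetical name, and the unit is left in the result.

```python
def width_part(line):
    """Return everything before the first 'x', or the whole line if there is none."""
    # str.partition returns the full string as its first element when 'x' is absent
    before_x, _, _ = line.partition('x')
    return before_x.strip()

lines = ['23 cm', '43 1/2 cm', '20cm', '15 cm x 30 cm']
parts = [width_part(line) for line in lines]
print(parts)  # ['23 cm', '43 1/2 cm', '20cm', '15 cm']
```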

Answer3:

This regex should work (<a href="https://regex101.com/r/LhJl2V/1" rel="nofollow">Live Demo</a>)

^(\d+)(?:\s*cm\s+[xX])

Explanation

<ul><li>^(\d+) - capture one or more digits at the beginning of the line</li> <li>(?: - start non-capturing group</li> <li>\s* - followed by zero or more whitespace characters</li> <li>cm - followed by a literal c and m</li> <li>\s+ - followed by one or more whitespace characters</li> <li>[xX] - followed by a literal x or X</li> <li>) - end non-capturing group</li> </ul>

You shouldn't need to bother matching the rest of the line.
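
A quick way to check this pattern against the sample entries from the question is sketched below. Note that, as written, it only matches entries that contain an x, so single-number entries would need to be handled separately.

```python
import re

# Pattern from this answer: leading digits followed by 'cm' and an x or X.
pattern = re.compile(r'^(\d+)(?:\s*cm\s+[xX])')

entries = ['23 cm', '43 1/2 cm', '20cm', '15 cm x 30 cm']
results = []
for entry in entries:
    m = pattern.match(entry)
    # group(1) is the width; None marks entries the pattern does not cover
    results.append(m.group(1) if m else None)

print(results)  # [None, None, None, '15']
```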
