Regex to extract mentions in Twitter

Question:

I need to write a regex in Python to extract mentions from tweets.

My attempt:

import re
regex = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)")

It works fine for a mention like @mickey. However, for mentions with underscores, like @mickey_mouse, it only extracts @mickey.

How can I modify the regex for it to work in both cases?

Thank you

Answer1:

Add an underscore to the last character class, like this:

(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)

Regex101 demo: https://regex101.com/r/IVBuug/1
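
As a quick check in Python, the corrected pattern can be tried with re.findall. This is only a sketch: the sample tweet and the name MENTION_RE are made up for illustration and are not part of the question.

```python
import re

# Pattern from the question with "_" added to the final character class.
MENTION_RE = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)")

# Hypothetical sample tweet, used only for illustration.
tweet = "@mickey_mouse say hi to @minnie, not to me@example.com"

# With one capture group, findall returns the group contents (the name without "@").
mentions = MENTION_RE.findall(tweet)
print(mentions)  # ['mickey_mouse', 'minnie']
```

The "@" inside me@example.com is skipped because the lookbehind rejects an "@" that directly follows a letter, digit, hyphen, underscore, or dot.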

On a side note, the Twitter handle rules (https://support.twitter.com/articles/101299) also allow usernames that start with digits or underscores. So a regex to extract Twitter handles could be as simple as @\w{1,15} (this allows letters, digits, and underscores and enforces the 15-character limit). You will need some additional lookaheads/lookbehinds depending on where the regex is used.
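
One possible way to wrap the simpler @\w{1,15} pattern in Python is sketched below. The surrounding lookbehind and lookahead, the re.ASCII flag, and the names HANDLE_RE and extract_handles are assumptions chosen for illustration; adjust them to the text you are actually parsing. Note that in Python 3, \w also matches non-ASCII word characters on str patterns unless re.ASCII is set.

```python
import re

# Assumption: handles are ASCII letters, digits, and underscores, 1 to 15 characters.
# re.ASCII restricts \w to [A-Za-z0-9_].
# (?<![\w@.-]) skips an "@" glued to preceding text, such as an e-mail address.
# (?!\w) rejects names longer than 15 characters instead of truncating them.
HANDLE_RE = re.compile(r"(?<![\w@.-])@(\w{1,15})(?!\w)", re.ASCII)

def extract_handles(text):
    """Return the handles (without the leading "@") found in text."""
    return HANDLE_RE.findall(text)

# Hypothetical sample text, used only for illustration.
handles = extract_handles("@_donald and @123go met user@example.com")
print(handles)  # ['_donald', '123go']
```

Whether over-long names should be rejected or truncated is a design choice; drop the trailing (?!\w) if truncating to the first 15 characters is acceptable.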
