
Python save file to csv

Question:

I have the following code that reads tweets from a file, processes them, and should then save the result to a new file.

This is the code:

#import regex
import re

#start process_tweet
def processTweet(tweet):
    # process the tweets
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet
#end

#Read the tweets one by one and process it
input = open('withoutEmptylines.csv', 'rb')
output = open('editedTweets.csv','wb')
line = input.readline()
while line:
    processedTweet = processTweet(line)
    print (processedTweet)
    output.write(processedTweet)
    line = input.readline()
input.close()
output.close()

My data in the input file looks like this, with one tweet per line:

She wants to ride my BMW the go for a ride in my BMW lol http://t.co/FeoNg48AQZ
BMW Sees U.S. As Top Market For 2015 i8 http://t.co/kkFyiBDcaP

My function works fine, but I am not happy with the output, which looks like this:

she wants to ride my bmw the go for a ride in my bmw lol URL rt AT_USER Ðun bmw es mucho? yo: bmw. -AT_USER veeergaaa!. hahahahahahahahaha nos hiciste la noche caray!

So it puts everything on one row instead of one tweet per row, as in the input file.

Does anyone have an idea how to get each tweet on its own line?

Answer1:

With an example file like this:

tweet number one
tweet number two
tweet number three

This code:

file = open('tweets.txt')
for line in file:
    print line

Produces this output:

tweet number one

tweet number two

tweet number three

Python is reading the line endings just fine, but your script is replacing them via regular expression substitution.

This regex substitution:

tweet = re.sub('[\s]+', ' ', tweet)

is converting all of your whitespace characters (e.g. tabs and newlines) into single spaces.
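A quick check illustrates this. The sample string below is made up for the demonstration; the pattern is the one from the question.

```python
import re

# Hypothetical sample line, as readline() would return it
# (the trailing newline is still attached).
line = "BMW  Sees\tU.S. http://t.co/kkFyiBDcaP\n"

# Same substitution as in the question: every run of whitespace,
# including the final "\n", becomes a single space.
collapsed = re.sub(r'[\s]+', ' ', line)

print(repr(collapsed))           # 'BMW Sees U.S. http://t.co/kkFyiBDcaP '
print(collapsed.endswith('\n'))  # False
```

Because the newline is gone by the time the line reaches output.write(), all tweets end up on one row.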

Either append a newline to the tweet before you write it out, or modify your regex so that it does not substitute newlines, like so:

tweet = re.sub('[ ]+', ' ', tweet)

EDIT: I had put my test substitution command in there. The suggestion has been fixed.
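Both options can be sketched side by side. This is only a minimal illustration: the function names and the sample lines are hypothetical, and only the whitespace step of the original function is shown.

```python
import re

def collapse_and_terminate(line):
    # Option 1 (hypothetical helper): collapse all whitespace as before,
    # drop the leading/trailing space, then re-append one newline.
    return re.sub(r'\s+', ' ', line).strip() + '\n'

def collapse_spaces_only(line):
    # Option 2 (hypothetical helper): collapse runs of plain spaces only,
    # so the newline read from the input file is left untouched.
    return re.sub(r'[ ]+', ' ', line)

# Made-up input lines, each ending in a newline as readline() returns them.
sample = ["tweet   number one\n", "tweet number   two\n"]

out1 = ''.join(collapse_and_terminate(l) for l in sample)
out2 = ''.join(collapse_spaces_only(l) for l in sample)

print(repr(out1))    # 'tweet number one\ntweet number two\n'
print(out1 == out2)  # True
```

Note that option 2 no longer collapses tabs, while option 1 still does, so the choice depends on whether tabs can occur in the input.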
