Python save file to csv


I have the following code that reads in Twitter tweets, processes the data, and should then save the result into a new file.

This is the code:

#import regex
import re

#start process_tweet
def processTweet(tweet):
    # process the tweets

    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet
#end

#Read the tweets one by one and process it
input = open('withoutEmptylines.csv', 'rb')
output = open('editedTweets.csv','wb')

line = input.readline()

while line:
    processedTweet = processTweet(line)
    print (processedTweet)
    output.write(processedTweet)
    line = input.readline()

input.close()
output.close()

The data in my input file looks like this, with each tweet on its own line:

She wants to ride my BMW the go for a ride in my BMW lol http://t.co/FeoNg48AQZ
BMW Sees U.S. As Top Market For 2015 i8 http://t.co/kkFyiBDcaP

My function works fine, but I am not happy with the output, which looks like this:

she wants to ride my bmw the go for a ride in my bmw lol URL rt AT_USER Ðun bmw es mucho? yo: bmw. -AT_USER veeergaaa!. hahahahahahahahaha nos hiciste la noche caray!

So it puts everything on one line instead of keeping each tweet on its own line, as in the input file.

Does anyone have an idea how to get each tweet on its own line?


With an example file like this:

tweet number one
tweet number two
tweet number three

This code:

file = open('tweets.txt')
for line in file:
    # each line still ends with its newline character
    print(line)

Produces this output:

tweet number one
tweet number two
tweet number three

Python is reading in the newlines just fine, but your script is replacing them via regular expression substitution.

This regex substitution:

tweet = re.sub('[\s]+', ' ', tweet)

is converting every run of whitespace characters (e.g. tabs and newlines) into a single space, so the newline at the end of each tweet becomes a space.
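To see the effect in isolation, here is a small sketch comparing the two character classes on a made-up sample line (the sample string and variable names are only for illustration):

```python
import re

# Illustrative sample: two spaces, a tab, and a trailing newline.
line = "tweet  number\tone\n"

# \s matches spaces, tabs and newlines, so the trailing "\n" is replaced too.
collapsed_all = re.sub(r'[\s]+', ' ', line)

# A class containing only a space leaves tabs and newlines untouched.
collapsed_spaces = re.sub(r'[ ]+', ' ', line)

print(repr(collapsed_all))     # 'tweet number one '
print(repr(collapsed_spaces))  # 'tweet number\tone\n'
```

Note that the space-only version no longer collapses tabs; whether that matters depends on your data.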

Either add a newline onto the tweet before you output it, or modify your regex so that it does not substitute newlines, like so:

tweet = re.sub('[ ]+', ' ', tweet)

EDIT: I had put my test substitution command in there. The suggestion has been fixed.
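For the first option (adding the newline back when writing), a minimal sketch could look like the following. It is not the asker's exact script: the names `process_tweet` and `write_processed` are hypothetical, the extra `strip()` call is my addition to drop the trailing space left by the whitespace substitution, and the `www` pattern is written with `[^\s]+`, which I assume is what was intended. In-memory `io.StringIO` objects stand in for the real input and output files, and the sample tweets (including `@someone`) are made up.

```python
import io
import re

def process_tweet(tweet):
    """Normalise one tweet; the result contains no newline characters."""
    tweet = tweet.lower()
    # Assumption: [^\s]+ after "www." is what the original pattern intended.
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    tweet = re.sub(r'@[^\s]+', 'AT_USER', tweet)
    # Collapses all whitespace, including the trailing newline, into spaces.
    tweet = re.sub(r'\s+', ' ', tweet)
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Addition: strip the leftover trailing space, then surrounding quotes.
    return tweet.strip().strip('\'"')

def write_processed(lines, out):
    """Write each processed tweet followed by exactly one newline."""
    for line in lines:
        out.write(process_tweet(line) + '\n')

# In-memory stand-ins for the input and output files.
source = io.StringIO(u"Lol http://t.co/FeoNg48AQZ\nRT @someone #BMW  i8\n")
target = io.StringIO()
write_processed(source, target)
result = target.getvalue()
print(result)
```

With real files, the same `write_processed` function should work on objects opened in text mode, since iterating over a file yields one line at a time.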


