Importing text to MySQL: strange format

Question:

I'm importing some data from a .txt file into a MySQL database table, using mysqlimport. It seems to import OK (no error messages) but looks very odd when displayed, and can't be searched as expected.

Here are the details. The original text file is saved in UTF-8, with records that look (in a text editor) like this. The second field includes line breaks:

WAR-16,52 ~~~~~ Lorem ipsum dolor sit. Lorem ipsum dolor sit. ~~~~~ ENDOFRECORD WAR-16,53~~~~~Lorem ipsum dolor sit. Lorem ipsum dolor sit. Lorem ipsum dolor sit. Lorem ipsum dolor sit. ~~~~~ ENDOFRECORD

The database table into which I am importing is very simple:

+-------+---------------+------+-----+---------+-------+
| Field | Type          | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| id    | varchar(100)  | YES  |     | NULL    |       |
| text  | varchar(5000) | YES  |     | NULL    |       |
+-------+---------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

When I import the file, here's the command I use:

$ mysqlimport -u root db textfile.txt --fields-terminated-by="~~~~~" --lines-terminated-by="ENDOFTHELINE" --default-character-set='utf8'
db.records_list: Records: 18778 Deleted: 0 Skipped: 0 Warnings: 18787

Here's what I see if I then ask MySQL to display the records:

mysql> select * from textfile;
| W A R - 1 6 , 5 2 | L o r e m i p s u m d o l o r s i t . L o r e m i p s u m d o l o r s i t . (etc)

So, it looks as though spaces, or some strange encoding extras, are being added to the text.

And here's the problem with the database query:

mysql> select * from textfile where id like "%WAR%";

returns nothing; nor does adding spaces:

mysql> select * from textfile where id like "%W A R%";

Only this command returns anything:

mysql> select * from textfile where id like "%W%";

Can anyone guess what might be happening? I feel like it must be an encoding problem, but I can't work it out.

------ UPDATE --------

OK, I've checked the database and connection encoding.

mysql> show variables like "character_set_%";
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                                 |
| character_set_connection | latin1                                 |
| character_set_database   | latin1                                 |
| character_set_filesystem | binary                                 |
| character_set_results    | latin1                                 |
| character_set_server     | latin1                                 |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)

And show table status says the table is latin1_swedish_ci.

I have re-saved the text file in "Western (Windows Latin 1)" (using TextEdit on Snow Leopard) and tried to import it using the same command as above. However I still have the same encoding problem.

I also tried, again with no luck:

  • creating a new table with UTF-8 and importing the existing file
  • copying & pasting the text into another text file that I've previously imported fine, and trying to import that

Still totally baffled :(((

Answer1:

As noted in your duplicate question, make sure that your table and connection are both using UTF-8. That is one of the common sources of this issue.

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html
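
As a rough sketch of what "table and connection both using UTF-8" could look like in practice, the statements below assume the table is the `textfile` table queried in the question and that the `utf8` character set with the `utf8_general_ci` collation is acceptable; adjust the names to your setup. Note that converting the table does not repair rows that were already stored incorrectly, so the file would most likely need to be re-imported afterwards.

```sql
-- Check which character set the table and its columns currently use.
SHOW CREATE TABLE textfile;

-- Convert the table and its text columns to UTF-8
-- (assumption: utf8 / utf8_general_ci is what you want).
ALTER TABLE textfile CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Switch the current client connection to UTF-8 as well.
SET NAMES utf8;

-- Confirm that client, connection and results now report utf8.
SHOW VARIABLES LIKE 'character_set_%';

-- After re-importing, look at the raw bytes of one stored value.
-- Unexpected bytes between the letters (for example 00) would suggest
-- the file was not really saved as UTF-8.
SELECT id, HEX(id), LENGTH(id), CHAR_LENGTH(id) FROM textfile LIMIT 1;
```

`SET NAMES` only affects the session it is run in, so the `--default-character-set` option on `mysqlimport` (already used in the question) is what covers the import itself.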
