
Unicode and Python issue (access to Unicode code charts)

Question:

Yesterday I wrote the following function to convert an integer to Persian:

def integerToPersian(number):
    listedPersian = ['۰','۱','۲','۳','۴','۵','۶','۷','۸','۹']
    listedEnglish = ['0','1','2','3','4','5','6','7','8','9']
    returnList = list()
    listedTmpString = list(str(number))
    for i in listedTmpString:
        returnList.append(listedPersian[listedEnglish.index(i)])
    return ''.join(returnList)

When you call it, for example integerToPersian(3455), it returns ۳۴۵۵, which is the equivalent of 3455 in Persian and Arabic. This function is very useful when you read a number from somewhere such as a database and want to show it in a widget.

I downloaded the Unicode code charts from <a href="http://unicode.org" rel="nofollow">http://unicode.org</a> because I need to write PersianToInteger('unicodeString'). As I understand it, it should take UTF-8 as its parameter, and UTF-8 stores these characters in 2 bytes. Also, I'm a newbie in Python.

<strong>My questions</strong> are: how can I store 2 bytes? How does UTF-8 store characters? How can I split a Unicode string into another format? How can I use the Unicode code charts?

<strong>Notes:</strong> I found the int() built-in function, but I couldn't get it to work. Maybe you can.

Answer1:

You need to read the Python Unicode HOWTO for either Python <a href="http://docs.python.org/2/howto/unicode.html" rel="nofollow">2.x</a> or <a href="http://docs.python.org/3/howto/unicode.html" rel="nofollow">3.x</a>, as appropriate. But I can give you brief answers to your questions.

<blockquote>

My questions are, how can store 2bytes? how can utf8 store , how can split an unicode string to another format ?

</blockquote>

A unicode object holds characters; a bytes object holds bytes.

Note that in Python 2.x, str is the same thing as bytes; in 3.x, it's the same thing as unicode. And in both versions, a literal with neither a u nor a b prefix is a str. Since you didn't tell us whether you're using Python 2 or 3, I'll use explicit unicode and bytes, and u and b prefixes, everywhere.

You convert between them by picking an encoding (in this case, UTF-8) and using the encode and decode methods. For example:

>>> my_str = u'۰۱'
>>> my_bytes = b'\xdb\xb0\xdb\xb1'
>>> my_str.encode('utf-8') == my_bytes
True
>>> my_bytes.decode('utf-8') == my_str
True

If you have a UTF-8 bytes object, you should decode it to unicode as early as possible, and do all your work with it in Unicode. Then you don't have to worry about how many bytes something takes, just treat each character as a character. If you need UTF-8 output, encode back as late as possible.
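To make the character/byte distinction concrete, here is a small sketch (the variable names are only illustrative, and the Persian digits are written as escapes so the code points are unambiguous). Each of these digits is one character but two bytes in UTF-8, which is why the two lengths differ once you encode:

```python
# A minimal sketch: characters versus UTF-8 bytes (names are illustrative).
persian_text = u'\u06f3\u06f4\u06f5\u06f5'   # "3455" written with Persian digits
utf8_bytes = persian_text.encode('utf-8')    # encode as late as possible

char_count = len(persian_text)   # counts characters
byte_count = len(utf8_bytes)     # counts bytes; each of these digits takes 2 in UTF-8

round_tripped = utf8_bytes.decode('utf-8')   # decode as early as possible
```

Working on the decoded text means the 4-versus-8 difference never matters to your own logic.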

(Very occasionally, the performance cost of decoding and encoding is too high, and you need to deal with UTF-8 directly. But unless that really is a bottleneck in your code, don't do it.)

So, let's say you wanted to adapt your integerToPersian to take a UTF-8 English digit string instead of an integer, and to return a UTF-8 Persian digit string instead of a Unicode one. (I'm assuming Python 3 for the purposes of this example.) All you need to do is change str(number) to number.decode('utf-8'), and change return ''.join(returnList) to return ''.join(returnList).encode('utf-8'), and that's it.
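As a rough sketch of that adaptation (assuming Python 3, as above; the function name english_utf8_to_persian_utf8 and the constant names are hypothetical, not from the original code):

```python
# Sketch only: takes UTF-8 English digits, returns UTF-8 Persian digits.
PERSIAN_DIGITS = u'\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9'
ENGLISH_DIGITS = u'0123456789'

def english_utf8_to_persian_utf8(number_bytes):
    text = number_bytes.decode('utf-8')  # bytes -> unicode, as early as possible
    # index() raises ValueError if a character is not an English digit.
    converted = [PERSIAN_DIGITS[ENGLISH_DIGITS.index(ch)] for ch in text]
    return u''.join(converted).encode('utf-8')  # unicode -> bytes, as late as possible

result = english_utf8_to_persian_utf8(b'3455')
```

The body of the conversion is unchanged; only the two ends of the function deal with bytes.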

<blockquote>

how can use unicode code charts?

</blockquote>

Python already comes with the Unicode code charts (and the right ones to match your version of Python) compiled into the <a href="http://docs.python.org/3/library/unicodedata.html" rel="nofollow">unicodedata</a> module, so usually it's a lot easier to just use those than to try to use the charts yourself. For example:

>>> import unicodedata
>>> unicodedata.digit(u'۱')
1
<hr /><blockquote>

… i need to wrote PersianToInteger('unicodeString')

</blockquote>

You really shouldn't need to. Unless you're using a very old Python, int should do it for you. For example, in 2.6:

>>> int(u'۱۱')
11

If it's not working for you, unicodedata is the easiest solution:

>>> numeral = u'۱۱'
>>> [unicodedata.digit(ch) for ch in numeral]
[1, 1]
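That gives you a list of digit values rather than a single number. One way to fold them into an integer might look like this (digits_to_int is a hypothetical helper name, not part of unicodedata):

```python
import unicodedata

def digits_to_int(numeral):
    # Hypothetical helper: fold per-character digit values into one integer.
    value = 0
    for ch in numeral:
        # unicodedata.digit raises ValueError for characters with no digit value.
        value = value * 10 + unicodedata.digit(ch)
    return value

eleven = digits_to_int(u'\u06f1\u06f1')  # two Persian "one" digits
```

This sketch does not handle signs or empty strings specially; an empty string simply yields 0.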

However, either of these will convert digits in <em>any</em> script to a number, not just Persian. And there's nothing in the Unicode charts that will directly tell you that a digit is Persian; the best you can do is parse the name:

>>> all('ARABIC-INDIC DIGIT' in unicodedata.name(ch) for ch in numeral)
True
>>> all('ARABIC-INDIC DIGIT' in unicodedata.name(ch) for ch in u'123')
False
<hr />
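One caveat with the name check above: the substring 'ARABIC-INDIC DIGIT' matches both the Arabic digits (U+0660 to U+0669, named ARABIC-INDIC DIGIT ...) and the Persian ones (U+06F0 to U+06F9, named EXTENDED ARABIC-INDIC DIGIT ...). If you need to accept only the Persian forms, a stricter sketch could test for the longer prefix (is_persian_numeral is a hypothetical helper name):

```python
import unicodedata

def is_persian_numeral(text):
    # Hypothetical helper: true only if text is non-empty and every
    # character is named EXTENDED ARABIC-INDIC DIGIT ... (U+06F0..U+06F9).
    return len(text) > 0 and all(
        # The second argument is a default returned for unnamed characters.
        unicodedata.name(ch, u'').startswith(u'EXTENDED ARABIC-INDIC DIGIT')
        for ch in text
    )

persian_ok = is_persian_numeral(u'\u06f1\u06f1')   # Persian digits
arabic_ok = is_persian_numeral(u'\u0661\u0661')    # Arabic digits
```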

If you really want to do things in either direction by mapping digits from one script to another, here's a better solution:

listedPersian = [u'۰',u'۱',u'۲',u'۳',u'۴',u'۵',u'۶',u'۷',u'۸',u'۹']
listedEnglish = [u'0',u'1',u'2',u'3',u'4',u'5',u'6',u'7',u'8',u'9']
persianToEnglishMap = dict(zip(listedPersian, listedEnglish))
englishToPersianMap = dict(zip(listedEnglish, listedPersian))

def persianToNumber(persian_numeral):
    # Map each Persian digit to its English counterpart, then parse as usual.
    english_numeral = u''.join(persianToEnglishMap[digit] for digit in persian_numeral)
    return int(english_numeral)
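The opposite direction can be sketched the same way with the second map. The block below rebuilds the map so it runs on its own; numberToPersian is a hypothetical name, and it assumes a non-negative integer, since a minus sign has no entry in the map:

```python
# Self-contained sketch of the reverse mapping (names are illustrative).
listedPersian = list(u'\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9')
listedEnglish = list(u'0123456789')
englishToPersianMap = dict(zip(listedEnglish, listedPersian))

def numberToPersian(number):
    # Format the integer as English digits, then map each digit across.
    # A negative number would raise KeyError on the '-' character.
    return u''.join(englishToPersianMap[digit] for digit in u'%d' % number)

persian_3455 = numberToPersian(3455)
```

A dict lookup per digit avoids the repeated list.index() scan in the original integerToPersian.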
