How to get phrase tables from word alignments?


The output of my word alignment looks like this:

I wish to say with regard to the initiative of the Portuguese Presidency that we support the spirit and the political intention behind it .
In bezug auf die Initiative der portugiesischen Präsidentschaft möchte ich zum Ausdruck bringen , daß wir den Geist und die politische Absicht , die dahinter stehen , unterstützen .
0-0 5-1 5-2 2-3 8-4 7-5 11-6 12-7 1-8 0-9 9-10 3-11 10-12 13-13 13-14 14-15 16-16 17-17 18-18 16-19 20-20 21-21 19-22 19-23 22-24 22-25 23-26 15-27 24-28

It may not be an ideal initiative in terms of its structure but we accept Mr President-in-Office , that it is rooted in idealism and for that reason we are inclined to support it .
Von der Struktur her ist es vielleicht keine ideale Initiative , aber , Herr amtierender Ratspräsident , wir akzeptieren , daß sie auf Idealismus fußt , und sind deshalb geneigt , sie mitzutragen .
0-0 11-2 8-3 0-4 3-5 1-6 2-7 5-8 6-9 12-11 17-12 15-13 16-14 16-15 17-16 13-17 14-18 17-19 18-20 19-21 21-22 23-23 21-24 26-25 24-26 29-27 27-28 30-29 31-30 33-31 32-32 34-33

How can I produce the phrase tables used by Moses from this output?

This PDF explains consistent phrase extraction (slides 16-21): <a href="http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/phrase-model.pdf" rel="nofollow">http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/phrase-model.pdf</a> but <strong>what is the algorithm for extracting the phrases</strong>?


To build a phrase table, first extract the phrase pairs with the following algorithm from Philipp Koehn's Statistical Machine Translation book, p. 133:

<img alt="Phrase extraction algorithm (Koehn, Statistical Machine Translation, p. 133)" src="https://i.stack.imgur.com/1ffLD.png" />
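As a rough illustration of the algorithm pictured above, here is a minimal Python sketch. It assumes each alignment line holds 0-based `source-target` index pairs, as in the question's output. The names `parse_alignment` and `extract_phrases` and the `max_len` cap are illustrative choices, not part of Moses, and the consistency check follows the corrected (errata) version of the pseudocode as I understand it.

```python
def parse_alignment(line):
    """Turn '0-0 5-1 5-2' into [(0, 0), (5, 1), (5, 2)].

    Assumption: each link is 'sourceIndex-targetIndex', 0-based.
    """
    return [tuple(int(i) for i in link.split("-")) for link in line.split()]


def extract_phrases(src, tgt, alignment, max_len=7):
    """Return the set of phrase pairs consistent with the word alignment.

    src, tgt  : lists of tokens
    alignment : iterable of (src_index, tgt_index) links
    max_len   : hypothetical cap on phrase length, applied to both sides
    """
    aligned_tgt = {t for _, t in alignment}
    phrases = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Minimal target span covering every link from the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a phrase pair needs at least one alignment point
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word inside the span may be linked
            # to a source word outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            # Also emit variants that absorb unaligned target words
            # on either side of the minimal span.
            ts = t_start
            while True:
                te = t_end
                while True:
                    if te - ts < max_len:
                        phrases.add((" ".join(src[s_start:s_end + 1]),
                                     " ".join(tgt[ts:te + 1])))
                    te += 1
                    if te >= len(tgt) or te in aligned_tgt:
                        break
                ts -= 1
                if ts < 0 or ts in aligned_tgt:
                    break
    return phrases


if __name__ == "__main__":
    src = "the house is small".split()
    tgt = "das Haus ist klein".split()
    links = parse_alignment("0-0 1-1 2-2 3-3")
    for pair in sorted(extract_phrases(src, tgt, links)):
        print(pair)
```

To process a file like the one in the question, you would read it three lines at a time (source sentence, target sentence, alignment), split the sentences on whitespace, and collect the phrase pairs from every sentence pair into one list, keeping duplicates so they can be counted later.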

Then estimate the probabilities for the phrases with their relative frequencies, i.e.

<img alt="Relative frequency estimate of the phrase translation probabilities" src="https://i.stack.imgur.com/ux4wm.png" />
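The relative-frequency estimate can be sketched in a few lines of Python: the probability of a target phrase given a source phrase is the count of that pair divided by the total count of all pairs sharing the same source phrase. The function name `score_phrases` and the input shape (a list of `(source_phrase, target_phrase)` tuples collected over the whole corpus, duplicates included) are assumptions made for this example. This only covers one translation direction; a real Moses phrase table carries additional scores that this sketch does not compute.

```python
from collections import Counter


def score_phrases(extracted_pairs):
    """Estimate P(tgt | src) = count(src, tgt) / sum over t' of count(src, t').

    extracted_pairs: iterable of (src_phrase, tgt_phrase) tuples, one entry
    per extraction, so repeated pairs appear repeatedly (assumed input shape).
    """
    pair_counts = Counter(extracted_pairs)
    # Total number of extractions for each source phrase.
    src_counts = Counter()
    for (src, _tgt), n in pair_counts.items():
        src_counts[src] += n
    return {(src, tgt): n / src_counts[src]
            for (src, tgt), n in pair_counts.items()}


if __name__ == "__main__":
    pairs = [
        ("the house", "das Haus"),
        ("the house", "das Haus"),
        ("the house", "dem Haus"),
        ("house", "Haus"),
    ]
    for (src, tgt), p in sorted(score_phrases(pairs).items()):
        print(f"{src} ||| {tgt} ||| {p:.4f}")
```

Swapping the two elements of each tuple before calling the function gives the estimate for the opposite direction.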

Note that the original printed version of the book contains an error on line 4 of the extract() function; it is corrected in the book's errata.

Also see <a href="https://stackoverflow.com/questions/25109001/phrase-extraction-algorithm-for-statistical-machine-translation" rel="nofollow">Phrase extraction algorithm for statistical machine translation</a> for the details.

