
In LanguageTool, how do you create a dictionary and use it for spell checking?

How do you create a dictionary for spell checking with LanguageTool? I'm not a Java programmer, and this is my first time using LT.

Answer1:

Hello, here is my experience creating a spell-checking dictionary with LanguageTool. I hope it helps!

Part 1: <strong>How to create the dictionary</strong>

You need:

• A .txt file with the dictionary inside

• An .info file describing the dictionary's properties (one may already be present in the LT directory for your language)

• LanguageTool standalone version

• Java 8

At the end of this section, you will have:

• a .dict file, i.e. your dictionary compiled into a form LT can read

<ol>
<li>Install the latest version of LT: https://languagetool.org/download/snapshots/?C=M;O=D</li>
<li>Make sure your .txt file has the right format (a) and encoding (b): a. one word per line; b. UTF-8 encoding</li>
<li>From the LT directory that contains languagetool.jar, run on the command line: <code>java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR -i [path of the .txt file] -info [path of the .info file] -o [path of the output .dict file]</code></li>
</ol>

where:

i. fr_FR is the code of the dictionary's language

ii. -i is the parameter for the input file (your .txt)

iii. -info is the parameter for the .info file that belongs to the dictionary. You can create it by following these instructions (http://wiki.languagetool.org/hunspell-support, "Configuring the dictionary" section) or use the existing .info file, if there is one, in \org\languagetool\resource\yourlanguage

iv. -o is the parameter specifying where to save the output .dict file
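If you prefer to prepare the input from Java, the format rules from step 2 and the command from step 3 can be sketched as below. This is a minimal sketch, not part of LT: the class DictInputPrep, its methods, and the file names fr_FR.txt, fr_FR.info and fr_FR.dict are hypothetical, and dropping blank lines and duplicates is only a precaution, not a documented requirement. Only the class name org.languagetool.tools.SpellDictionaryBuilder and its arguments come from the command above, and the sketch assumes languagetool.jar sits in the working directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DictInputPrep {

    // Trim each entry and drop blank lines and duplicates, keeping first-seen order.
    public static List<String> normalize(List<String> rawWords) {
        Set<String> unique = new LinkedHashSet<>();
        for (String word : rawWords) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    // Write the words one per line, encoded as UTF-8 (step 2).
    public static void writeWordList(List<String> words, Path out) throws IOException {
        Files.write(out, words, StandardCharsets.UTF_8);
    }

    // Assemble the SpellDictionaryBuilder invocation from step 3 as an argument list.
    public static List<String> builderCommand(String langCode, Path input, Path info, Path output) {
        return Arrays.asList(
                "java", "-cp", "languagetool.jar",
                "org.languagetool.tools.SpellDictionaryBuilder",
                langCode,
                "-i", input.toString(),
                "-info", info.toString(),
                "-o", output.toString());
    }

    public static void main(String[] args) throws IOException {
        List<String> words = normalize(Arrays.asList("bonjour", "  maison ", "", "bonjour"));
        Path txt = Paths.get("fr_FR.txt"); // hypothetical file, written to the working directory
        writeWordList(words, txt);
        List<String> cmd = builderCommand("fr_FR", txt, Paths.get("fr_FR.info"), Paths.get("fr_FR.dict"));
        System.out.println(String.join(" ", cmd));
        // To run it directly instead of copying the printed line:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Printing the command first lets you compare it with the one in step 3 before running it; the commented-out ProcessBuilder line would run it directly.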

<hr>

Part 2: <strong>How to integrate the dictionary into LT for spell checking</strong>

You need:

• JDK 1.8 (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

• Maven (https://maven.apache.org/download.cgi)

• A Java IDE (IntelliJ IDEA, Eclipse, etc.)

• The .info and .dict files (see Part 1)

• GitHub LanguageTool project (https://github.com/languagetool-org/languagetool)

<ol>
<li>Add the JDK and Maven bin directories to your PATH (more info: https://maven.apache.org/install.html)</li>
<li>Copy the .info and .dict files created in Part 1 to \languagetool-master\languagetool-language-modules\YourLanguage\src\main\resources\org\languagetool\resource\YourLanguage\hunspell</li>
<li>In your IDE, open the Java file named after the language of your dictionary (e.g. French.java):</li>
</ol>

a. In YourLanguage.java, replace HunspellNoSuggestionRule with MorfologikYourLanguageSpellerRule in getRelevantRules(). For French, the method then looks like this:

```java
@Override
public List<Rule> getRelevantRules(ResourceBundle messages) throws IOException {
  return Arrays.asList(
      new CommaWhitespaceRule(messages),
      new DoublePunctuationRule(messages),
      new GenericUnpairedBracketsRule(messages,
          Arrays.asList("[", "(", "{" /*"«", "‘"*/),
          Arrays.asList("]", ")", "}"
              /*"»", French dialog can contain multiple sentences. */
              /*"’" used in "d’arm" and many other words */)),
      // replaces the former HunspellNoSuggestionRule
      new MorfologikYourLanguageSpellerRule(messages, this),
      new UppercaseSentenceStartRule(messages, this),
      new MultipleWhitespaceRule(messages, this),
      new SentenceWhitespaceRule(messages),
      // specific to French:
      new CompoundRule(messages),
      new QuestionWhitespaceRule(messages)
  );
}
```

b. Create a new file, MorfologikYourLanguageSpellerRule.java, in \languagetool-master\languagetool-language-modules\YourLanguage\src\main\java\org\languagetool\rules\YourLanguage:

```java
/* LanguageTool, a natural language style checker
 * Copyright (C) 2012 Marcin Miłkowski (http://www.languagetool.org)
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
 * USA
 */
package org.languagetool.rules.fr; // use your own language's package

import java.io.IOException;
import java.util.ResourceBundle;

import org.languagetool.Language;
import org.languagetool.rules.spelling.morfologik.MorfologikSpellerRule;

public final class MorfologikYourLanguageSpellerRule extends MorfologikSpellerRule {

  // e.g. "MORFOLOGIK_RULE_FR_FR" for French
  public static final String RULE_ID = "MORFOLOGIK_RULE_CODEOFYOURLANGUAGE";

  // Path of your .dict file, e.g. "/fr/hunspell/fr_FR.dict"
  private static final String RESOURCE_FILENAME = "PATH TO YOUR .DICT FILE";

  // The constructor name must match the class name, otherwise the file does not compile
  public MorfologikYourLanguageSpellerRule(ResourceBundle messages, Language language) throws IOException {
    super(messages, language);
  }

  @Override
  public String getFileName() {
    return RESOURCE_FILENAME;
  }

  @Override
  public String getId() {
    return RULE_ID;
  }
}
```
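The two placeholders in the rule above, RULE_ID and RESOURCE_FILENAME, are the parts most likely to go wrong. The sketch below derives them from the language code, assuming (as LT's built-in Morfologik rules appear to do) that RESOURCE_FILENAME is resolved relative to org/languagetool/resource on the classpath, so a file copied to ...\resource\fr\hunspell\fr_FR.dict is referenced as "/fr/hunspell/fr_FR.dict". The class SpellerRulePaths and its methods are hypothetical helpers for illustration only, and the sketch also assumes the module folder and the resource folder use the same language code.

```java
import java.util.Locale;

public class SpellerRulePaths {

    // Folder (relative to languagetool-master) where the .info and .dict files are copied.
    // Assumption: moduleDir is the language module's folder name, e.g. "fr".
    public static String resourceDir(String moduleDir) {
        return String.join("/",
                "languagetool-language-modules", moduleDir,
                "src", "main", "resources",
                "org", "languagetool", "resource", moduleDir, "hunspell");
    }

    // Candidate value for RESOURCE_FILENAME, assuming it is resolved
    // relative to /org/languagetool/resource on the classpath.
    public static String resourceFileName(String moduleDir, String dictBaseName) {
        return "/" + moduleDir + "/hunspell/" + dictBaseName + ".dict";
    }

    // Candidate value for RULE_ID following the MORFOLOGIK_RULE_<CODE> pattern;
    // hyphens are turned into underscores so "en-GB" and "en_GB" give the same ID.
    public static String ruleId(String localeCode) {
        return "MORFOLOGIK_RULE_" + localeCode.toUpperCase(Locale.ROOT).replace('-', '_');
    }

    public static void main(String[] args) {
        System.out.println(resourceDir("fr"));
        System.out.println(resourceFileName("fr", "fr_FR"));
        System.out.println(ruleId("fr_FR"));
    }
}
```

For French, this prints the copy destination, "/fr/hunspell/fr_FR.dict" and "MORFOLOGIK_RULE_FR_FR"; the last two would go into RESOURCE_FILENAME and RULE_ID respectively.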

c. From the \languagetool-master\ directory, run on the command line: mvn package

d. You will find the built standalone version in \languagetool-master\languagetool-standalone\target\LanguageTool-3.4-SNAPSHOT\LanguageTool-3.4-SNAPSHOT (the version number depends on the source you checked out).
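To try the result, you can run LT's command-line checker from that directory on a UTF-8 text file containing a few misspelled words. The sketch below only assembles the command; it assumes the built directory contains languagetool-commandline.jar (the standalone distribution includes it) and that LT's -l option takes the language code, e.g. fr for French. TryBuiltSpeller and test.txt are hypothetical names, and the version folder must match your own build.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TryBuiltSpeller {

    // Command that runs LT's command-line checker from the built standalone folder.
    public static List<String> checkCommand(Path standaloneDir, String langCode, Path textFile) {
        Path jar = standaloneDir.resolve("languagetool-commandline.jar");
        return Arrays.asList("java", "-jar", jar.toString(), "-l", langCode, textFile.toString());
    }

    public static void main(String[] args) {
        // Adjust the version folder to match your build.
        Path dir = Paths.get("languagetool-standalone", "target",
                "LanguageTool-3.4-SNAPSHOT", "LanguageTool-3.4-SNAPSHOT");
        List<String> cmd = checkCommand(dir, "fr", Paths.get("test.txt")); // hypothetical input file
        System.out.println(String.join(" ", cmd));
        // To run it from Java instead of copying the printed line:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

If the new dictionary is picked up, misspelled words in test.txt should be reported under the rule ID you set in MorfologikYourLanguageSpellerRule.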
