
StanfordNLP - ArrayIndexOutOfBoundsException at TokensRegexNERAnnotator.readEntries(TokensRegexNERAn

Question:

I want to tag the following phrases as SKILL using Stanford NLP's TokensRegexNERAnnotator:

AREAS OF EXPERTISE
Areas of Knowledge
Computer Skills
Technical Experience
Technical Skills

There are many more text sequences like these.

Code -

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
String[] tests = {"Bachelor of Arts is a good degree.", "Technical Skill is a must have for Software Developer."};
List tokens = new ArrayList<>();

// Traverse each sentence in the array of sentences.
for (String txt : tests) {
    System.out.println("String is : " + txt);

    // Create an empty Annotation with just the given text.
    Annotation document = new Annotation(txt);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    // Go over the annotated sentences and extract the annotated words
    // using the CoreLabel object.
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            System.out.println("annotated coreMap sentences : " + token);

            // Extract the NER tag for the current token.
            String ne = token.get(NamedEntityTagAnnotation.class);
            String word = token.get(CoreAnnotations.TextAnnotation.class);
            System.out.println("Current Word : " + word + " POS :" + token.get(PartOfSpeechAnnotation.class));
            System.out.println("Lemma : " + token.get(LemmaAnnotation.class));
            System.out.println("Named Entity : " + ne);
        }
    }
}

My regex rule file is -

$SKILL_FIRST_KEYWORD = "/area of/|/areas of/|/technical/|/computer/|/professional/"
$SKILL_KEYWORD = "/knowledge/|/skill/|/skills/|/expertise/|/experience/"

tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

{ ruleType: "tokens", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: "SKILL" }

I am getting an ArrayIndexOutOfBoundsException. I suspect something is wrong with my rule file. Can somebody please point out where I am making a mistake?

Desired Output -

AREAS OF EXPERTISE - SKILL
Areas of Knowledge - SKILL
Computer Skills - SKILL

and so on.

Thanks in advance.

Answer1:

You should be using the TokensRegexAnnotator, not the TokensRegexNERAnnotator: the file you are loading is a TokensRegex rules file.

You should review these threads for more info:

TokensRegex rules to get correct output for Named Entities: https://stackoverflow.com/questions/43447585/tokensregex-rules-to-get-correct-output-for-named-entities/43532621#43532621

Getting output in the desired format using TokenRegex: https://stackoverflow.com/questions/43521697/getting-output-in-the-desired-format-using-tokenregex
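As far as I understand it, TokensRegexNERAnnotator reads a mapping file in which each line holds a pattern and an NER label separated by a tab (optionally followed by further columns), whereas the file in the question uses TokensRegex rule syntax with no tabs at all. The sketch below is not CoreNLP code; `MappingLineCheck` and `looksLikeMappingLine` are hypothetical names, and the check only assumes "at least two tab-separated columns" as the shape of a mapping line. It is meant as a quick way to see that the rule file does not have that shape, which is a plausible reason for the failure inside `readEntries`.

```java
/** Hypothetical helper, not part of CoreNLP: checks the tab-separated shape of a line. */
public class MappingLineCheck {

    /** Returns true if the line has at least two tab-separated columns. */
    public static boolean looksLikeMappingLine(String line) {
        String[] columns = line.split("\t");
        return columns.length >= 2;
    }

    public static void main(String[] args) {
        String[] lines = {
            // Mapping style (assumed format): pattern, TAB, NER label.
            "Bachelor of (Arts|Science)\tDEGREE",
            // A line taken from the rule file in the question: no tabs.
            "{ ruleType: \"tokens\", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: \"SKILL\" }"
        };
        for (String line : lines) {
            System.out.println(looksLikeMappingLine(line) + " <- " + line);
        }
    }
}
```

A parser that indexes the second column without a check like this would fail on every line of the rule file.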

Answer2:

The accepted answer above by @StanfordNLPHelp helped me solve this problem. All credit goes to them.

I am just summarizing what the final code looks like to get the output in the desired format, in the hope that it helps somebody.

First, I changed the variable definitions in the rule file:

$SKILL_FIRST_KEYWORD = "/area of|areas of|Technical|computer|professional/"
$SKILL_KEYWORD = "/knowledge|skill|skills|expertise|experience/"

Then, in the code:

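import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
import edu.stanford.nlp.ling.tokensregex.Env;
import edu.stanford.nlp.ling.tokensregex.MatchedExpression;
import edu.stanford.nlp.ling.tokensregex.NodePattern;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// Build the extractor once, with case-insensitive matching, instead of once per input string.
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
CoreMapExpressionExtractor extractor =
    CoreMapExpressionExtractor.createExtractorFromFiles(env, "test_degree.rules");

// tests is the same String[] as in the question.
for (String txt : tests) {
    System.out.println("String is : " + txt);

    // Create an empty Annotation with just the given text.
    Annotation document = new Annotation(txt);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        for (MatchedExpression phrase : matched) {
            // Print out the matched text and its value.
            System.out.println("MATCHED ENTITY: " + phrase.getText() + " VALUE: " + phrase.getValue().get());
        }
    }
}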
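The two flags set on `env` in the code above are what should let lowercase keywords in the rule file match tokens such as EXPERTISE or Skills. The effect of `Pattern.CASE_INSENSITIVE` can be seen in isolation with plain `java.util.regex`. This is only a sketch: it does not use CoreNLP, `CaseFlagDemo` and `matchesToken` are made-up names, and it assumes the keyword regex is tested against one whole token at a time.

```java
import java.util.regex.Pattern;

/** Illustration only: plain java.util.regex, no CoreNLP involved. */
public class CaseFlagDemo {

    // Same alternation as $SKILL_KEYWORD in the rule file, without the surrounding slashes.
    static final String SKILL_KEYWORD = "knowledge|skill|skills|expertise|experience";

    /** True if the whole token matches the keyword alternation under the given regex flags. */
    public static boolean matchesToken(String token, int flags) {
        return Pattern.compile(SKILL_KEYWORD, flags).matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"EXPERTISE", "Skills", "experience"}) {
            System.out.println(token
                + " default=" + matchesToken(token, 0)
                + " case-insensitive=" + matchesToken(token, Pattern.CASE_INSENSITIVE));
        }
    }
}
```

Without the flag, only the token that is already lowercase matches, which is why the headings in the question need case-insensitive matching.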
