How to define different styles within the same paragraph

Question:

I'm trying to convert HTML text into a Word table. It works pretty well, and the generated Word file is correct except for the character styles.

This is my first try with Apache POI.

So far, I have been able to detect line-break (<br>) tags in a text paragraph (see the code below). I would also like to handle a few other tags, such as <b>, <li> and <font>, and set the right run values for each part.

For example: <br /><b>This is my text <i> which now is in italic<b> but also in bold</b> depending on its importance</i></b>

I guess I should parse the text and apply a different run to each part, but I don't know how to do it.

    private static XWPFParagraph getTableParagraph(XWPFTableCell cell, String text) {
        int fontsize = 11;
        XWPFParagraph paragraph = cell.addParagraph();
        cell.removeParagraph(0);
        paragraph.setSpacingAfterLines(0);
        paragraph.setSpacingAfter(0);
        XWPFRun myRun1 = paragraph.createRun();
        if (text == null) {
            text = "";
        } else {
            while (true) {
                int x = text.indexOf("<br>");
                if (x < 0) break;
                String work = text.substring(0, x);
                text = text.substring(x + 4);
                myRun1.setText(work);
                myRun1.addBreak();
            }
        }
        myRun1.setText(text);
        myRun1.setFontSize(fontsize);
        return paragraph;
    }

Answer1:

When converting HTML text, one should never process the HTML with string methods alone. XML and HTML are markup languages: their content is markup, not just plain text. The markup has to be traversed to obtain the individual nodes together with their meaning. This traversal is never trivial, which is why dedicated libraries exist for it. Internally those libraries also rely on string methods, but they wrap them in useful methods for traversing the markup.
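To illustrate why plain string matching is fragile, here is a small JDK-only sketch. The class and method names (`LiteralBrSearch`, `splitOnLiteralBr`) are hypothetical and belong to neither POI nor jsoup; the method mimics the `indexOf("<br>")` loop from the question and shows that other common spellings of the same tag go undetected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of POI or jsoup.
class LiteralBrSearch {

    // Splits the text at every literal "<br>", the way the question's loop does.
    static List<String> splitOnLiteralBr(String text) {
        List<String> parts = new ArrayList<>();
        int x;
        while ((x = text.indexOf("<br>")) >= 0) {
            parts.add(text.substring(0, x));
            text = text.substring(x + 4);
        }
        parts.add(text);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnLiteralBr("one<br>two"));   // [one, two]
        System.out.println(splitOnLiteralBr("one<br/>two"));  // [one<br/>two]
        System.out.println(splitOnLiteralBr("one<br />two")); // [one<br />two]
        System.out.println(splitOnLiteralBr("one<BR>two"));   // [one<BR>two]
    }
}
```

Only the first input is split. A parser treats all four spellings as the same element, which is one reason to traverse parsed nodes rather than search the raw string.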

To traverse HTML, <a href="https://jsoup.org/" rel="nofollow">jsoup</a> can be used, for example. In particular, <a href="https://jsoup.org/apidocs/org/jsoup/select/NodeTraversor.html" rel="nofollow">NodeTraversor</a> together with <a href="https://jsoup.org/apidocs/org/jsoup/select/NodeVisitor.html" rel="nofollow">NodeVisitor</a> is useful for this.

My example creates a ParagraphNodeVisitor that implements NodeVisitor. This interface requires the method public void head(Node node, int depth), which is called every time the NodeTraversor first reaches a node, and the method public void tail(Node node, int depth), which is called every time the NodeTraversor leaves a node, that is, after all of its children have been visited. The handling of the individual nodes can be implemented in these methods. In our case, the main part of that handling is deciding whether a new XWPFRun is needed and which settings this run needs.
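The order in which head and tail are called can be sketched without jsoup. In the following JDK-only example, `SimpleNode`, `SimpleVisitor` and `traverse` are hypothetical stand-ins for jsoup's Node, NodeVisitor and NodeTraversor; they only mimic the depth-first calling order and are not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free illustration of the head/tail calling order.
class HeadTailOrderDemo {

    // Minimal node: a name plus child nodes.
    static class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, SimpleNode... kids) {
            this.name = name;
            for (SimpleNode k : kids) children.add(k);
        }
    }

    interface SimpleVisitor {
        void head(SimpleNode node, int depth);
        void tail(SimpleNode node, int depth);
    }

    // Depth-first: head on entering a node, tail after all of its children.
    static void traverse(SimpleVisitor v, SimpleNode node, int depth) {
        v.head(node, depth);
        for (SimpleNode c : node.children) traverse(v, c, depth + 1);
        v.tail(node, depth);
    }

    // Records the events for a tree shaped like <b>x<i>y</i></b>.
    static List<String> events() {
        SimpleNode tree = new SimpleNode("b",
                new SimpleNode("#text"),
                new SimpleNode("i", new SimpleNode("#text")));
        List<String> log = new ArrayList<>();
        traverse(new SimpleVisitor() {
            public void head(SimpleNode n, int depth) { log.add("head " + n.name); }
            public void tail(SimpleNode n, int depth) { log.add("tail " + n.name); }
        }, tree, 0);
        return log;
    }

    public static void main(String[] args) {
        events().forEach(System.out::println);
    }
}
```

The log shows that the text inside `<i>` is visited between "head i" and "tail i". This is why a visitor can switch a style on in head, switch it off in tail, and apply whatever is currently switched on to each text node it meets in between.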

Example:

    import java.io.FileOutputStream;

    import org.apache.poi.xwpf.usermodel.*;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.jsoup.select.NodeVisitor;
    import org.jsoup.select.NodeTraversor;

    public class HTMLtoDOCX {

        private XWPFDocument document;

        public HTMLtoDOCX(String html, String docxPath) throws Exception {
            this.document = new XWPFDocument();
            XWPFParagraph paragraph = null;
            Document htmlDocument = Jsoup.parse(html);
            // every HTML <p> element becomes one Word paragraph
            Elements htmlParagraphs = htmlDocument.select("p");
            for (Element htmlParagraph : htmlParagraphs) {
                System.out.println(htmlParagraph);
                paragraph = document.createParagraph();
                createParagraphFromHTML(paragraph, htmlParagraph);
            }
            FileOutputStream out = new FileOutputStream(docxPath);
            document.write(out);
            out.close();
            document.close();
        }

        void createParagraphFromHTML(XWPFParagraph paragraph, Element htmlParagraph) {
            ParagraphNodeVisitor nodeVisitor = new ParagraphNodeVisitor(paragraph);
            NodeTraversor.traverse(nodeVisitor, htmlParagraph);
        }

        private class ParagraphNodeVisitor implements NodeVisitor {

            String nodeName;
            boolean needNewRun;
            boolean isItalic;
            boolean isBold;
            boolean isUnderlined;
            int fontSize;
            String fontColor;
            XWPFParagraph paragraph;
            XWPFRun run;

            ParagraphNodeVisitor(XWPFParagraph paragraph) {
                this.paragraph = paragraph;
                this.run = paragraph.createRun();
                this.nodeName = "";
                this.needNewRun = false;
                this.isItalic = false;
                this.isBold = false;
                this.isUnderlined = false;
                this.fontSize = 11;
                this.fontColor = "000000";
            }

            @Override
            public void head(Node node, int depth) {
                nodeName = node.nodeName();
                System.out.println("Start " + nodeName + ": " + node);
                needNewRun = false;
                if ("#text".equals(nodeName)) {
                    run.setText(((TextNode)node).text());
                    needNewRun = true; //after setting the text in the run a new run is needed
                } else if ("i".equals(nodeName)) {
                    isItalic = true;
                } else if ("b".equals(nodeName)) {
                    isBold = true;
                } else if ("u".equals(nodeName)) {
                    isUnderlined = true;
                } else if ("br".equals(nodeName)) {
                    run.addBreak();
                } else if ("font".equals(nodeName)) {
                    // assumes a color of the form "#RRGGBB" and a numeric size
                    fontColor = (!"".equals(node.attr("color"))) ? node.attr("color").substring(1) : "000000";
                    fontSize = (!"".equals(node.attr("size"))) ? Integer.parseInt(node.attr("size")) : 11;
                }
                if (needNewRun) run = paragraph.createRun();
                needNewRun = false;
                // apply the current settings to the current run
                run.setItalic(isItalic);
                run.setBold(isBold);
                if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
                run.setColor(fontColor);
                run.setFontSize(fontSize);
            }

            @Override
            public void tail(Node node, int depth) {
                nodeName = node.nodeName();
                System.out.println("End " + nodeName);
                if ("i".equals(nodeName)) {
                    isItalic = false;
                } else if ("b".equals(nodeName)) {
                    isBold = false;
                } else if ("u".equals(nodeName)) {
                    isUnderlined = false;
                } else if ("font".equals(nodeName)) {
                    fontColor = "000000";
                    fontSize = 11;
                }
                if (needNewRun) run = paragraph.createRun();
                needNewRun = false;
                // apply the current settings to the current run
                run.setItalic(isItalic);
                run.setBold(isBold);
                if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
                run.setColor(fontColor);
                run.setFontSize(fontSize);
            }
        }

        public static void main(String[] args) throws Exception {

            String html =
                "<p><font size='32' color='#0000FF'><b>First paragraph.</font></b><br/>Just like a heading</p>"
                + "<p>This is my text <i>which now is in italic <b>but also in bold</b> depending on its <u>importance</u></i>.<br/>Now a <b><i><u>new</u></i></b> line starts <i>within <b>the same</b> paragraph</i>.</p>"
                + "<p><b>Last <u>paragraph <i>comes</u> here</b> finally</i>.</p>"
                + "<p>But yet <u><i><b>another</i></u></b> paragraph having <i><font size='22' color='#FF0000'>special <u>font</u> settings</font></i>. Now default font again.</p>";

            HTMLtoDOCX htmlToDOCX = new HTMLtoDOCX(html, "./CreateWordParagraphFromHTML.docx");
        }
    }

Result:

<a href="https://i.stack.imgur.com/HsYH8.png" rel="nofollow"><img alt="enter image description here" src="https://i.stack.imgur.com/HsYH8.png" /></a>

Disclaimer: This is a working draft that shows the principle. It is neither complete nor ready for use in production environments.
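One limitation of the draft is visible directly in the code: tail sets isBold = false on every closing b tag. If tags of the same kind are nested, as with the b inside b in the question's example, bold would be switched off at the inner closing tag, assuming the parser preserves that nesting. A possible fix, sketched here with JDK classes only, is to count the nesting depth per tag instead of using booleans. `StyleDepth` and its methods are hypothetical names, not part of POI or jsoup; inside the visitor one would presumably call `enter` from head, `leave` from tail, and pass `isActive("b")` to run.setBold(...).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks how many open tags of each name surround the current node.
class StyleDepth {

    private final Map<String, Integer> depth = new HashMap<>();

    // Call when an opening tag is reached (for example from head()).
    void enter(String tag) {
        depth.merge(tag, 1, Integer::sum);
    }

    // Call when the matching closing tag is reached (for example from tail()).
    // Returning null from the remapping function removes the entry.
    void leave(String tag) {
        depth.computeIfPresent(tag, (k, v) -> v > 1 ? v - 1 : null);
    }

    // True while at least one tag of this name is still open.
    boolean isActive(String tag) {
        return depth.getOrDefault(tag, 0) > 0;
    }

    public static void main(String[] args) {
        StyleDepth s = new StyleDepth();
        s.enter("b");                        // outer <b>
        s.enter("b");                        // nested <b>
        s.leave("b");                        // inner </b>
        System.out.println(s.isActive("b")); // true: the outer <b> is still open
        s.leave("b");                        // outer </b>
        System.out.println(s.isActive("b")); // false
    }
}
```

The same idea would apply to the font settings: a stack of size and color values, pushed in head and popped in tail, would restore the enclosing font instead of always falling back to the defaults.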
