
Parse Microsoft Office files in Node.js

Question:

I'm working on a web application where users can upload Microsoft Office document files. Our server runs Node.js with Express.js and is hosted on Heroku, so I don't think I can install programs such as abiword or catdoc. I can handle the file uploads, but I can't parse the contents of the documents.

How can I read the contents of the doc file? The information will then be put into a database. It'd be nice to preserve basic formatting (bold, italic, underline), but not essential.

Answer1:

While there doesn't seem to be anything on npm that handles Word files directly, you might be able to use a REST API to do the conversion through another cloud service. For example, <a href="http://saaspose.com/" rel="nofollow">Saaspose</a> (from the makers of the Aspose tools) has a public API for <a href="http://saaspose.com/api/words" rel="nofollow">Word</a>, <a href="http://saaspose.com/api/cells" rel="nofollow">Excel</a>, <a href="http://saaspose.com/api/pdf" rel="nofollow">PDF</a>, and others. They list Node.js, JavaScript, and Heroku support on their page.

EDIT:

I see that Saaspose is now called <a href="http://www.aspose.com/cloud/total-api.aspx" rel="nofollow">Aspose for Cloud</a>.

Another API that claims to do something similar is <a href="http://www.doxument.com/" rel="nofollow">Doxument</a>.

Answer2:

The <a href="http://github.com/dkiyatkin/node-office" rel="nofollow">office</a> package (npm install office) seems to provide at least part of the answer. I use it to read Excel files, but so far I have not tried it on any Word documents.

Answer3:

There doesn't seem to be any yet. See below for something that might help.

<a href="https://stackoverflow.com/questions/9038231/can-i-read-pdf-or-word-docs-with-node-js" rel="nofollow">Can I read PDF or Word Docs with Node.js?</a>

Answer4:

You can use mammoth (<a href="https://www.npmjs.com/package/mammoth" rel="nofollow">https://www.npmjs.com/package/mammoth</a>) to parse .docx files and xlsx (<a href="https://github.com/SheetJS/js-xlsx" rel="nofollow">https://github.com/SheetJS/js-xlsx</a>) to parse .xlsx files.
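
As a rough sketch of how these two packages could be wired into an upload handler: the snippet below assumes both are installed (npm install mammoth xlsx) and that the uploaded file is already available as a Buffer. The helper names detectKind and parseUpload are made up for illustration and are not part of either library. Note that mammoth reads .docx only, so legacy .doc files would still need another route.

```javascript
// Sketch only: route an uploaded Office file to a parser by its extension.
// Assumes `npm install mammoth xlsx`. detectKind and parseUpload are
// hypothetical helper names, not part of either library.
const path = require('path');

// Decide which parser applies, based on the file extension.
function detectKind(filename) {
  const ext = path.extname(filename).toLowerCase();
  if (ext === '.docx') return 'docx';
  if (ext === '.xlsx') return 'xlsx';
  return null; // legacy .doc/.xls and everything else are not handled here
}

// Parse an uploaded file held in memory as a Buffer.
async function parseUpload(filename, buffer) {
  const kind = detectKind(filename);

  if (kind === 'docx') {
    const mammoth = require('mammoth'); // loaded lazily, only when needed
    // convertToHtml resolves with { value: htmlString, messages: [...] }
    const result = await mammoth.convertToHtml({ buffer: buffer });
    return { kind: kind, html: result.value, warnings: result.messages };
  }

  if (kind === 'xlsx') {
    const XLSX = require('xlsx'); // loaded lazily, only when needed
    const workbook = XLSX.read(buffer, { type: 'buffer' });
    const sheets = {};
    for (const name of workbook.SheetNames) {
      // Each sheet becomes an array of row objects keyed by header cell.
      sheets[name] = XLSX.utils.sheet_to_json(workbook.Sheets[name]);
    }
    return { kind: kind, sheets: sheets };
  }

  throw new Error('Unsupported file type: ' + filename);
}
```

In an Express route you would call parseUpload(originalName, fileBuffer) and store the returned html or sheets in the database. With mammoth's default settings, bold and italic come through as strong and em tags; underline is ignored unless you supply a style map for it, so check the mammoth README if underline matters.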
