I'm working on a web application where users can upload Microsoft Office document files. Right now, our server runs Node.js with Express.js and is hosted on Heroku. Because of this, I don't think I can install programs such as abiword or catdoc. I can handle the file uploads, but I can't parse the contents of the documents.
How can I read the contents of the doc file? The information will then be put into a database. It'd be nice to preserve basic formatting (bold, italic, underline), but not essential.
Answer1:
I see that Saaspose is now called <a href="http://www.aspose.com/cloud/total-api.aspx" rel="nofollow">Aspose for Cloud</a>.
Another API that claims something similar is <a href="http://www.doxument.com/" rel="nofollow">Doxument</a>.
Answer2:
The <a href="http://github.com/dkiyatkin/node-office" rel="nofollow">Office</a> package (npm install office) seems to provide at least part of the answer. I use it to read Excel files, but so far I have not tried it on any Word docs.
Answer3:
There doesn't seem to be any library for this yet. The question linked below might help:
<a href="https://stackoverflow.com/questions/9038231/can-i-read-pdf-or-word-docs-with-node-js" rel="nofollow">Can I read PDF or Word Docs with Node.js?</a>
Answer4:
You can use mammoth to parse .docx files (<a href="https://www.npmjs.com/package/mammoth" rel="nofollow">https://www.npmjs.com/package/mammoth</a>) and xlsx to parse .xlsx files (<a href="https://github.com/SheetJS/js-xlsx" rel="nofollow">https://github.com/SheetJS/js-xlsx</a>).
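As a rough sketch of the mammoth route: the function below takes the uploaded file as an in-memory Buffer and returns HTML that could be stored in the database. It assumes mammoth has been installed with npm install mammoth. The names docxBufferToHtml and converter are made up for this example and are not part of mammoth. Note that mammoth reads .docx only, not the older binary .doc format. As far as I know, its default mapping keeps bold and italic (as strong and em tags) but ignores underline unless a custom style map is supplied.

```javascript
// Sketch only: convert a .docx held in memory to HTML using mammoth.
// `docxBufferToHtml` and `converter` are hypothetical names for this example.
// `converter` defaults to the real mammoth module; it is a parameter so the
// function can be exercised with a stand-in object that has the same method.
async function docxBufferToHtml(buffer, converter = require("mammoth")) {
  // mammoth.convertToHtml accepts { buffer } (or { path }) and resolves to
  // an object with `value` (the HTML string) and `messages` (warnings).
  const result = await converter.convertToHtml({ buffer });
  return {
    html: result.value,
    warnings: result.messages.map((m) => m.message),
  };
}
```

In an Express route this would be called with whatever Buffer the upload handling already produces (for example req.file.buffer if multer's memory storage is in use, which is an assumption about the setup), and the resulting html string saved to the database. Because nothing is installed at the system level, this should work within Heroku's constraints.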