
Question:
I'm working on a web application where users can upload Microsoft Office document files. Right now, our server runs Node.js with Express.js and is hosted on Heroku. Because of this, I don't think I can install programs such as abiword or catdoc. I can handle the file uploads, but I can't parse the contents of the document.
How can I read the contents of the doc file? The information will then be put into a database. It'd be nice to preserve basic formatting (bold, italic, underline), but not essential.
Answer1:While there doesn't seem to be anything on NPM that handles Word directly, you might be able to use a REST API to do the conversion through another cloud service. For example, <a href="http://saaspose.com/" rel="nofollow">Saaspose</a> (they of the famous Aspose tools) have public APIs for <a href="http://saaspose.com/api/words" rel="nofollow">Word</a>, <a href="http://saaspose.com/api/cells" rel="nofollow">Excel</a>, <a href="http://saaspose.com/api/pdf" rel="nofollow">PDF</a>, and others. They list Node.js, JavaScript, and Heroku support on their page.
EDIT:
I see that Saaspose is now called <a href="http://www.aspose.com/cloud/total-api.aspx" rel="nofollow">Aspose for Cloud</a>.
Another API that claims to do something similar is <a href="http://www.doxument.com/" rel="nofollow">Doxument</a>.
Answer2:The <a href="http://github.com/dkiyatkin/node-office" rel="nofollow">office</a> package (npm install office) seems to provide at least part of the answer. I use it to read Excel files, but so far I have not tried it on any Word docs.
Answer3:There doesn't seem to be a Node.js package for this yet. See the question below for something that might help.
<a href="https://stackoverflow.com/questions/9038231/can-i-read-pdf-or-word-docs-with-node-js" rel="nofollow">Can I read PDF or Word Docs with Node.js?</a>
Answer4:You can use mammoth (<a href="https://www.npmjs.com/package/mammoth" rel="nofollow">https://www.npmjs.com/package/mammoth</a>) to parse .docx files and xlsx (<a href="https://github.com/SheetJS/js-xlsx" rel="nofollow">https://github.com/SheetJS/js-xlsx</a>) to parse .xlsx files.
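As a rough sketch of the mammoth route, the snippet below converts an uploaded .docx buffer to HTML. The function name docxBufferToHtml and its optional converter parameter are made up for this example; convertToHtml and the buffer input are mammoth's documented Node.js API. The "u => u" style mapping is an assumption intended to keep underlines, which mammoth ignores by default. Note that mammoth reads .docx only, not the legacy binary .doc format.

```javascript
// Sketch only: convert an uploaded .docx file (as a Buffer) to HTML.
// `docxBufferToHtml` and `converter` are hypothetical names for this example.
async function docxBufferToHtml(buffer, converter) {
  // Load mammoth lazily (npm install mammoth) so a stub can be passed in tests.
  const mammoth = converter || require("mammoth");

  const result = await mammoth.convertToHtml(
    { buffer: buffer },
    // Assumption: map underlined runs to <u>; mammoth drops underline by default.
    { styleMap: ["u => u"] }
  );

  return {
    html: result.value, // generated HTML string
    warnings: result.messages.map((m) => m.message), // conversion warnings, if any
  };
}

module.exports = { docxBufferToHtml };
```

By default mammoth emits strong and em elements for bold and italic text, so the returned HTML string could be stored in the database as-is. If only plain text is needed, mammoth.extractRawText accepts the same input object.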