
Export tables from PDF to Excel?

How do I export only the table contents of a PDF to an Excel file in C#? I am currently extracting all the contents from PDFs using the PDFNet SDK, but I have not been able to read the tables as a tabular structure.

Answer1:

I have not used the SDK for this product, but I have used the standalone product. It reads the content of a PDF into a spreadsheet (with many export options).

The product is OmniPage by Nuance: http://australia.nuance.com/for-business/by-product/omnipage/index.htm

There is an SDK with a free evaluation.

Answer2:

I tried the solutions above but could not get them to work. There are many free SDKs and DLLs available, such as PDFNet, PDF Clown, iTextSharp, PDFBox and PDFlib. Finally I tried again with the PDFNet SDK, and now I am able to do it, provided the input PDF is a tagged PDF.

Answer3:

Using the ByteScout PDF Extractor SDK, we can extract the whole page as shown below:

    CSVExtractor extractor = new CSVExtractor();
    extractor.RegistrationName = "demo";
    extractor.RegistrationKey = "demo";

    TableDetector tdetector = new TableDetector();
    tdetector.RegistrationKey = "demo";
    tdetector.RegistrationName = "demo";

    // Load the document
    extractor.LoadDocumentFromFile("C:\\sample.pdf");
    tdetector.LoadDocumentFromFile("C:\\sample.pdf");

    int pageCount = tdetector.GetPageCount();

    for (int i = 1; i <= pageCount; i++)
    {
        int j = 1;

        do
        {
            extractor.SetExtractionArea(
                tdetector.GetPageRect_Left(i),
                tdetector.GetPageRect_Top(i),
                tdetector.GetPageRect_Width(i),
                tdetector.GetPageRect_Height(i)
            );

            // and finally save the table into CSV file
            extractor.SavePageCSVToFile(i, "C:\\page-" + i + "-table-" + j + ".csv");
            j++;
        } while (tdetector.FindNextTable()); // search next table
    }

Since this is an old post, I hope this helps others.

Answer4:

The answer above (John's) works and is really useful.

However, I use the ByteScout PDF Extractor SDK tools instead of writing code.

Note that the tool generates many sheets in one Excel file.

You can use the macro below in Excel to combine them into one sheet.

    Sub ConvertAsOne()
        Application.ScreenUpdating = False
        For j = 1 To Sheets.Count
            If Sheets(j).Name <> ActiveSheet.Name Then
                X = Range("A65536").End(xlUp).Row + 1
                Sheets(j).UsedRange.Copy Cells(X, 1)
            End If
        Next
        Range("B1").Select
        Application.ScreenUpdating = True
        MsgBox "succeed!", vbInformation, "note"
    End Sub
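
If you went the code route from Answer3 and ended up with one CSV file per table, the same "combine into one" step could probably be done in C# rather than with a macro. The following is only a rough sketch using standard System.IO calls; the class name CsvMerger, the method Merge and the output file name are made up for illustration and are not part of any SDK mentioned here. It assumes the per-table files follow the page-N-table-M.csv naming used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of any PDF SDK.
static class CsvMerger
{
    // Concatenates the lines of every file in 'folder' that matches 'pattern'
    // into a single file at 'outputPath'. Returns the number of lines written.
    public static int Merge(string folder, string pattern, string outputPath)
    {
        string outputFull = Path.GetFullPath(outputPath);
        var merged = new List<string>();

        string[] files = Directory.GetFiles(folder, pattern);
        // Plain string sort: "page-10-..." sorts before "page-2-...".
        Array.Sort(files, StringComparer.OrdinalIgnoreCase);

        foreach (string file in files)
        {
            // Skip the output file itself in case it also matches the pattern.
            if (string.Equals(Path.GetFullPath(file), outputFull,
                              StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            merged.AddRange(File.ReadAllLines(file));
        }

        File.WriteAllLines(outputFull, merged);
        return merged.Count;
    }
}
```

It could be called as CsvMerger.Merge("C:\\", "page-*-table-*.csv", "C:\\all-tables.csv"), and the resulting file opens in Excel as a single sheet. The rows are simply appended one after another, so tables with different column layouts will not line up.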
