How do I export tables from a PDF to Excel?

How do I export only the table contents of a PDF to an Excel file using C#? I am currently extracting all the content from PDFs with the PDFNet SDK, but I can't read the tables as a tabular structure.


I have not used the SDK for this product, but I have used the standalone product. It reads the content of a PDF into a spreadsheet (among many export options).

The product is OmniPage by Nuance: http://australia.nuance.com/for-business/by-product/omnipage/index.htm

There is an SDK with a free evaluation.


I tried the solutions above but couldn't get them to work. There are many SDKs and DLLs available, such as PDFNet, PDF Clown, iTextSharp, PDFBox, and PDFlib. I finally tried the PDFNet SDK again, and now I can do it, provided the input PDF is a tagged PDF.
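Tagged PDFs carry a logical structure tree in which tables are marked with the standard structure types `Table`, `THead`/`TBody`/`TFoot`, `TR`, `TH` and `TD`, which is why the row/cell layout is recoverable from them and not from untagged files. A minimal sketch, assuming you have already walked your SDK's structure tree into a simple SDK-neutral model: `TagNode` and `TaggedTableExport` are hypothetical names, and PDFNet's own structure classes are not shown. It turns one `Table` element into CSV text that Excel can open.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical model of a tagged-PDF structure element; fill it from your SDK's
// structure-tree API. Cell text is assumed to be already gathered into Text.
public sealed class TagNode
{
    public string Kind { get; }   // standard structure type, e.g. "Table", "TR", "TD"
    public string Text { get; }   // text content of a cell (empty for containers)
    public List<TagNode> Children { get; } = new List<TagNode>();

    public TagNode(string kind, string text = "", params TagNode[] children)
    {
        Kind = kind;
        Text = text;
        Children.AddRange(children);
    }
}

public static class TaggedTableExport
{
    // Yields every TR under a Table, descending through THead/TBody/TFoot groups.
    static IEnumerable<TagNode> Rows(TagNode node)
    {
        foreach (var child in node.Children)
        {
            if (child.Kind == "TR")
                yield return child;
            else if (child.Kind == "THead" || child.Kind == "TBody" || child.Kind == "TFoot")
                foreach (var row in Rows(child))
                    yield return row;
        }
    }

    // Quotes a field (RFC 4180 style) when it contains a comma, quote or line break.
    static string Escape(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    // Converts one Table element into CSV text, one line per TR.
    public static string ToCsv(TagNode table)
    {
        var sb = new StringBuilder();
        foreach (var row in Rows(table))
        {
            var cells = row.Children
                .Where(c => c.Kind == "TH" || c.Kind == "TD")
                .Select(c => Escape(c.Text));
            sb.Append(string.Join(",", cells)).Append("\r\n");
        }
        return sb.ToString();
    }
}
```

Writing the result with `File.WriteAllText(path, csv)` gives a file Excel opens directly. This sketch deliberately ignores merged cells (the `RowSpan`/`ColSpan` table attributes), so spanned cells would shift the columns of that row.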


Using the ByteScout PDF Extractor SDK, we can extract the whole page to CSV as below:

```csharp
using Bytescout.PDFExtractor;

// Demo registration credentials from the SDK's evaluation
CSVExtractor extractor = new CSVExtractor();
extractor.RegistrationName = "demo";
extractor.RegistrationKey = "demo";

TableDetector tdetector = new TableDetector();
tdetector.RegistrationKey = "demo";
tdetector.RegistrationName = "demo";

// Load the same document into both the extractor and the table detector
extractor.LoadDocumentFromFile("C:\\sample.pdf");
tdetector.LoadDocumentFromFile("C:\\sample.pdf");

int pageCount = tdetector.GetPageCount();
for (int i = 1; i <= pageCount; i++)
{
    int j = 1;
    do
    {
        // The extraction area is the page rectangle, so each CSV holds the whole page
        extractor.SetExtractionArea(
            tdetector.GetPageRect_Left(i),
            tdetector.GetPageRect_Top(i),
            tdetector.GetPageRect_Width(i),
            tdetector.GetPageRect_Height(i));

        // Save the extracted area into a CSV file, which Excel can open
        extractor.SavePageCSVToFile(i, "C:\\page-" + i + "-table-" + j + ".csv");
        j++;
    }
    while (tdetector.FindNextTable()); // search for the next table
}
```

Since this is an old post, I hope it helps others.


The answer above (John's) works; it is really useful.

But I use the ByteScout PDF Extractor SDK tools instead of code.

Note that the tool generates many sheets in one Excel file.

You can use the macro below in Excel to combine them into one sheet.

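```vb
Sub ConvertAsOne()
    ' Appends the used range of every other worksheet below the data on the active sheet
    Dim target As Worksheet
    Dim j As Long, nextRow As Long
    Set target = ActiveSheet
    Application.ScreenUpdating = False
    For j = 1 To Worksheets.Count
        If Worksheets(j).Name <> target.Name Then
            ' First empty row below the last value in column A (works beyond row 65536)
            nextRow = target.Cells(target.Rows.Count, 1).End(xlUp).Row + 1
            Worksheets(j).UsedRange.Copy target.Cells(nextRow, 1)
        End If
    Next j
    target.Range("B1").Select
    Application.ScreenUpdating = True
    MsgBox "Done!", vbInformation, "Note"
End Sub
```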
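If you save the tables with the C# loop instead of the tool, you get one CSV file per page and table (`page-<i>-table-<j>.csv`) rather than one workbook with many sheets. A minimal C# sketch of the same merge, assuming all those files sit in one folder and follow that naming: it concatenates them in page/table order into a single CSV with a blank row between tables. `CsvMerge` and `MergeTables` are illustrative names, not part of any SDK.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class CsvMerge
{
    // File naming produced by the extraction loop: page-<i>-table-<j>.csv
    static readonly Regex NamePattern =
        new Regex(@"^page-(\d+)-table-(\d+)\.csv$", RegexOptions.IgnoreCase);

    // Concatenates every matching CSV in `folder` (ordered by page number, then
    // table number) into `outputPath`, with one blank row between tables.
    // Returns the number of files merged.
    public static int MergeTables(string folder, string outputPath)
    {
        var files = Directory.GetFiles(folder, "page-*-table-*.csv")
            .Select(file => new { file, m = NamePattern.Match(Path.GetFileName(file)) })
            .Where(x => x.m.Success)
            .OrderBy(x => int.Parse(x.m.Groups[1].Value))
            .ThenBy(x => int.Parse(x.m.Groups[2].Value))
            .Select(x => x.file)
            .ToList();

        // UTF-8 with a byte-order mark so Excel recognizes the encoding
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(true)))
        {
            for (int k = 0; k < files.Count; k++)
            {
                if (k > 0) writer.WriteLine();   // blank separator row between tables
                foreach (var line in File.ReadLines(files[k]))
                    writer.WriteLine(line);
            }
        }
        return files.Count;
    }
}
```

Sorting numerically matters here: a plain string sort would put `page-10` before `page-2`. Lines are copied verbatim, so quoted fields that span several lines survive the merge.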

