I recently downloaded PDFBox and am trying to figure out how to parse PDF files with it in ASP.NET, to no avail. In fact, I don't even know where to start. Has anyone been able to do this in ASP.NET (C# preferably)?
Are there any other ways to parse text out of a PDF in ASP.NET?
-
This link might help: http://renditionprotocol.blogspot.com/2009/01/using-pdfbox-in-c.html
-
Check this out: http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx
-
Hi, that solution is very good, but when my PDF file has an embedded image, the code throws an exception. How can I solve this problem? Any ideas?
Thanks for your time and happy coding!
StreamWriter sw = new StreamWriter(txt_out, false);
PDDocument doc = null;
try
{
    sw.WriteLine();
    sw.WriteLine(DateTime.Now.ToString());
    // Load the PDF and extract its text into the output file
    doc = PDDocument.load(pdf_in);
    PDFTextStripper stripper = new PDFTextStripper();
    sw.Write(stripper.getText(doc));
}
catch (Exception ex)
{
    Response.Write(ex.Message);
}
finally
{
    // Release the PDF document and the output file
    if (doc != null) doc.close();
    sw.Close();
}