PDFOne .NET
Powerful all-in-one PDF library for .NET
Compatibility: VS 2008, VS 2005, CLR 2.0

Parsing PDF Page Elements Using PDFOne .NET v4

Learn to access PDF page elements such as text, images, shapes, and Form XObjects.
By V. Subhash
Version 4 of PDFOne .NET introduced a new method, GetPageElements(), in the PDFDocument class. It has the following overloads.

ArrayList
  GetPageElements(int pageNum,
                  PDFPageElementType elementTypes)

List
  GetPageElements(String pageRange,
                  PDFPageElementType elementTypes)

This method returns a list of PDFPageElement instances. PDFPageElement is the parent class of the individual element classes, namely PDFPageCompositeElement, PDFPageImageElement, PDFPagePathElement, and PDFPageTextElement. You can cast each item in the returned list to the derived class that matches the element type you requested.
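Because the returned list holds items typed as the parent class, it can help to see how an item is narrowed to a derived class. The following is a minimal sketch, not PDFOne code: PageElementStub, TextElementStub, and ImageElementStub are hypothetical stand-ins for the library classes, defined here only so that the example compiles without the library.

```csharp
using System;
using System.Collections;

// Hypothetical stand-ins for the PDFOne element classes, so the sketch
// compiles without the library. Only the parent/child relationship matters.
class PageElementStub { }

class TextElementStub : PageElementStub
{
    public string Text;
    public TextElementStub(string text) { Text = text; }
}

class ImageElementStub : PageElementStub
{
    public int ImageWidth;
    public int ImageHeight;
    public ImageElementStub(int width, int height)
    {
        ImageWidth = width;
        ImageHeight = height;
    }
}

static class ElementDispatchSketch
{
    // Describes one list item by testing it against each derived type.
    // "as" yields null rather than throwing when the type does not match.
    public static string Describe(object item)
    {
        TextElementStub text = item as TextElementStub;
        if (text != null)
        {
            return "Text: " + text.Text;
        }

        ImageElementStub image = item as ImageElementStub;
        if (image != null)
        {
            return "Image: " + image.ImageWidth + " x " + image.ImageHeight;
        }

        return "Other element";
    }

    static void Main()
    {
        // An untyped ArrayList holding a mix of element kinds.
        ArrayList elements = new ArrayList();
        elements.Add(new TextElementStub("Hello"));
        elements.Add(new ImageElementStub(640, 480));
        elements.Add(new PageElementStub());

        foreach (object item in elements)
        {
            Console.WriteLine(Describe(item));
        }
    }
}
```

A direct cast, such as (TextElementStub) item, is fine when the list was requested for a single element type. The "as" form is the safer choice when a list may contain several kinds of elements, since a mismatched direct cast throws InvalidCastException.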

The derived classes provide much more information about the retrieved page element. For example, from a PDFPageTextElement instance, you can obtain not only the text that the element represents but also its location, font, rotation (if any), and other details. The following code snippet shows how this is done.

using System;
using System.Collections;
using System.Drawing;
using System.Drawing.Imaging;
using Gnostice.PDFOne; // assumed PDFOne .NET namespace; adjust if yours differs

class Program {
  static void Main(string[] args) {
    PDFDocument PDFDocument1 = new PDFDocument("your-license-key");

    // Load PDF document
    PDFDocument1.Load("sample_doc.pdf");

    // Extract all image elements on page 1
    ArrayList ArrayList1 = PDFDocument1.GetPageElements(1, PDFPageElementType.IMAGE);

    // Save each image element to its own BMP file
    int n = ArrayList1.Count;
    for (int i = 0; i < n; i++) {
      PDFPageImageElement PDFPageImageElement1 = (PDFPageImageElement) ArrayList1[i];
      // Use one file name for both saving and reporting
      string fileName = "page1_image" + (i + 1) + ".bmp";
      Bitmap Bitmap1 = PDFPageImageElement1.GetImage();
      Bitmap1.Save(fileName, ImageFormat.Bmp);
      Console.WriteLine("Image Element #" + (i + 1) + " (" +
                        PDFPageImageElement1.ImageHeight + " x " +
                        PDFPageImageElement1.ImageWidth + ")" + " saved to: " +
                        fileName);
    }

    // Extract all text elements on page 1
    ArrayList ArrayList2 = PDFDocument1.GetPageElements(1, PDFPageElementType.TEXT);

    // Print each text element and the name of its font
    n = ArrayList2.Count;
    for (int i = 0; i < n; i++) {
      PDFPageTextElement PDFPageTextElement1 = (PDFPageTextElement) ArrayList2[i];
      Console.WriteLine("Text Element #" + (i + 1) + " \"" +
                        PDFPageTextElement1.Text + "\" uses font " +
                        PDFPageTextElement1.TextFontInfo.FontName);
    }

    // Close the document
    PDFDocument1.Close();
  }
}

This code snippet parses the text and image elements on page 1 of a document. The text elements are displayed in the console, while the image elements are saved to files. Here is the original document that was used to test this code.

Sample Document

Here is the output of the program when run against the above document. The output lists the image and text elements that were found on page 1 of the document.

Parsed Text and Images

Here is the image element after it was saved to a file.

Extracted Image

---o0O0o---

© 2002-2013 Gnostice Information Technologies Private Limited. All rights reserved.