This article was inspired by a query from one of our customers.
The customer, a PDFtoolkit VCL user, receives many PDF documents containing demographic data, the output of a process over which he has no control. He needed to extract the demographic data from these PDF files and feed it into another process.
The data was structured and always appeared at the same locations on the first page of every document. Given those locations, he wanted to know whether there was a way to extract the data.
The following is a slightly abridged version of the code snippet we sent to the customer.
// Hypothetical method name: the original snippet omitted the header.
// PDFDoc is assumed to be the PDFtoolkit document component
// (TgtPDFDocument) placed on the form.
function TForm1.GetTextAtLocation: string;
var
  PageElements: TgtPDFPageElementList;
  PageItem: TgtPDFTextElement;
  JI: Integer;
  XCord, YCord: Double;
begin
  Result := '';
  // Initialise so that the finally block is safe if LoadFromFile raises
  PageElements := nil;
  try
    PDFDoc.LoadFromFile('input.pdf');
    // Gets text elements from page 1, with coordinates in pixels
    PageElements := PDFDoc.GetPageElements(1, [etText], muPixels);
    // Parses the text elements in page 1
    for JI := 0 to PageElements.Count - 1 do
    begin
      PageItem := TgtPDFTextElement(PageElements.Items[JI]);
      // Retrieves coordinates of the text element
      XCord := TgtPDFPageElement(PageItem).XCordOrigin;
      YCord := TgtPDFPageElement(PageItem).YCordOrigin;
      // Checks if the text element is at (100, 250)
      if (Trunc(XCord) = 100) and (Trunc(YCord) = 250) then
      begin
        Result := PageItem.Text;
        Break;
      end;
    end;
  finally
    FreeAndNil(PageElements);
  end;
end;
This method extracts the text that occurs at coordinates (100, 250), measured in pixels, on page 1 of the PDF document input.pdf. It iterates over all text elements on page 1 and checks the coordinates of each. When an element's origin, truncated to whole pixels, matches (100, 250), the method returns the text string of that element. If no element matches, it returns an empty string.
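The Trunc comparison above accepts any origin in the range [100, 101) on the x-axis and [250, 251) on the y-axis, so an element whose origin is reported as, say, 99.9999 because of floating-point rounding could be missed. When several fields sit at known locations, as in the customer's case, one option is to keep a small table of field positions and match each one with a symmetric tolerance. The following is a minimal, self-contained sketch in plain Object Pascal (Delphi or Free Pascal), not a PDFtoolkit API. It assumes the following: TTextItem is a hypothetical stand-in for the PDFtoolkit text elements (only the origin and text are modelled), and the type names, field names, sample values, coordinates, and 0.5-unit tolerance are all hypothetical. In a real program, the Items array would be filled from the elements returned by GetPageElements.

```pascal
program FieldsByLocation;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-in for a text element: origin coordinates plus text
  TTextItem = record
    X, Y: Double;
    Text: string;
  end;

  // Hypothetical field definition: a field name and its expected origin
  TFieldPos = record
    Name: string;
    X, Y: Double;
  end;

function MakeItem(AX, AY: Double; const AText: string): TTextItem;
begin
  Result.X := AX;
  Result.Y := AY;
  Result.Text := AText;
end;

function MakeField(const AName: string; AX, AY: Double): TFieldPos;
begin
  Result.Name := AName;
  Result.X := AX;
  Result.Y := AY;
end;

// Returns the text of the first item whose origin lies within Tol units of
// (TargetX, TargetY) on both axes, or '' if no item does.
function TextNear(const Items: array of TTextItem;
  TargetX, TargetY, Tol: Double): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Items) to High(Items) do
    if (Abs(Items[I].X - TargetX) <= Tol) and
       (Abs(Items[I].Y - TargetY) <= Tol) then
    begin
      Result := Items[I].Text;
      Exit;
    end;
end;

var
  Items: array[0..2] of TTextItem;
  Fields: array[0..1] of TFieldPos;
  I: Integer;
begin
  // Made-up sample data; the first origin lies just below x = 100,
  // so Trunc-based equality would miss it (Trunc(99.9999) = 99)
  Items[0] := MakeItem(99.9999, 250.0, 'Jane Doe');
  Items[1] := MakeItem(100.0, 280.0, '1980-01-01');
  Items[2] := MakeItem(300.0, 250.0, 'unrelated');

  // Hypothetical layout: one entry per field at a known location
  Fields[0] := MakeField('Name', 100, 250);
  Fields[1] := MakeField('DateOfBirth', 100, 280);

  // Looks up each field with a symmetric tolerance of 0.5 units
  for I := Low(Fields) to High(Fields) do
    WriteLn(Fields[I].Name, ': ',
      TextNear(Items, Fields[I].X, Fields[I].Y, 0.5));
end.
```

Run as a console program, the sketch prints `Name: Jane Doe` followed by `DateOfBirth: 1980-01-01`. The tolerance should be smaller than half the distance between neighbouring fields, so that an element cannot match the wrong field.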
© 2002-2010 Gnostice Information Technologies Private Limited. All rights reserved.