Redacting text from PDF files in your web apps using Web APIs

Remove sensitive data from your PDF files
by Santosh Patil


If you are new to StarDocs, we suggest you read the introductory article and the getting started article first. This article builds on the steps explained in those foundational articles to avoid repetition.

The API reference documentation can be found here.

PDF files were designed for portability, so it is natural to share documents in this format. At times, however, a document contains sensitive data that you would like to redact (remove or erase) before sharing it. For example, there may be phone numbers, email addresses, or Social Security numbers that you do not want to disclose. StarDocs provides an easy-to-use API to redact such text. The search term can be either a literal string or a regular expression. The redacted region can be left blank, outlined and filled, or overwritten with replacement text. In this article, we'll show you how to redact US phone numbers from a PDF file using a regular expression. We will also outline the redacted region and fill it with a pattern.
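
Before sending a regular expression to the redact API, it can help to check locally what it would select. The sketch below applies the phone-number pattern used in this article to a sample string with JavaScript's built-in RegExp. The helper name `findPhoneNumbers` and the sample text are made up for illustration, and it is an assumption that the StarDocs server interprets the pattern the same way JavaScript does, so treat the result as a rough preview only.

```javascript
// The article's pattern, written as a JavaScript regex literal.
// Assumption: the StarDocs server matches it the same way RegExp does.
const US_PHONE_PATTERN =
  /(\+?1[.\s\-]?)?[\s\-\(]?\d{3}[.\s\-\)]{1}\s?\d{3}[.\s\-]{1}\d{4}/g;

// Hypothetical helper (not part of the StarDocs SDK): returns every
// substring the pattern selects. The optional leading [\s\-\(] can pull
// in the space in front of a number, so each match is trimmed.
function findPhoneNumbers(text) {
  const matches = text.match(US_PHONE_PATTERN) || [];
  return matches.map(function (m) { return m.trim(); });
}

const sample =
  'Store: (555) 123-4567, fax 555.123.4568, toll-free +1 800 555 0199';
console.log(findPhoneNumbers(sample));

// A number written without separators is not matched by this pattern.
console.log(findPhoneNumbers('5551234567'));
```

Note that the pattern requires a separator after the area code and after the exchange, so numbers written as ten consecutive digits would need an additional search term.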

The screenshot below shows a store receipt containing US telephone numbers, viewed in the StarDocs HTML viewer.

After you authenticate and upload the document that you want to redact, you get back the document URL. Pass this URL to the redact API as shown below.

JavaScript:

// Set up connection details
var starDocs = new Gnostice.StarDocs(
  new Gnostice.ConnectionInfo(
    'https://api.gnostice.com/stardocs/v1', 
    '<API Key>', 
    '<API Secret>'),
  new Preferences(
    // Whether to force full permissions on PDF files protected
    // with a permissions/owner/master password
    new DocPasswordSettings(true))
);

// Authenticate
starDocs.auth.loginApp()
  .done(function(response) {
    // Upload file
    var selectedFile = document.getElementById('input').files[0];
    starDocs.storage.upload(selectedFile) 
      .done(function(response) {
        var documentUrl = response.documents[0].url;
        // Redact text
        // Restrict search to given page range only
        var pageRangeSettings = {
          range: '-', // All pages. Other examples: '10,15-'
          subRangeMode: 'all' // 'all', 'even', 'odd'
        };
        var searchMode = 'regex'; // 'literal' or 'regex'
        // The text to be searched and redacted
        var searchText = [{
          // Regex for matching US telephone numbers
          text: '(\\+?1[.\\s\\-]?)?[\\s\\-\\(]?\\d{3}' + 
                '[.\\s\\-\\)]{1}\\s?\\d{3}[.\\s\\-]{1}\\d{4}',
          caseSensitive: true,
          wholeWord: true
        }];
        // Whether annotations associated with the redacted 
        // text (if any) should also be removed. Default is false.
        var removeAssociatedAnnotations = true;
        // Specify what needs to be filled in the redacted region
        var fillSettings = {
          // Put an outline
          outline: {
            color: '#FF0000', // 'none' (default) or '#RRGGBB'
            width: 1, // In points
            style: 'dash' // 'solid' (default), 'dash', 'dot', 'dashDot' and 'dashDotDot'.
          },
          // Fill the area with color, pattern and/or text
          fill: {
            color: '#000000', // 'none' (default) or '#RRGGBB'
            // 'solid' (default), 'forwardDiagonal', 'backwardDiagonal', 'cross', 
            // 'diagonalCross', 'horizontal' and 'vertical'
            pattern: 'forwardDiagonal',
          }
        };
        starDocs.docOperations.redactText(documentUrl, "password", 
            pageRangeSettings, searchMode, searchText, 
            removeAssociatedAnnotations, fillSettings)
          .done(function(response) {
            var redactedDocUrl = response.documents[0].url;
            // Do something with resultant document (redactedDocUrl)
          });
      });
  });

C#:

// Set up connection details
StarDocs starDocs = new StarDocs(
  new ConnectionInfo(
    new Uri("https://api.gnostice.com/stardocs/v1"),
    "<API Key>",
    "<API Secret>"), 
  new Preferences(
    // Force full permissions on PDF files protected
    // with a permissions/owner/master password
    new DocPasswordSettings(true))
);

// Authenticate
starDocs.Auth.LoginApp();

// Input file
FileObject fileObjectReceipt = new FileObject(@"C:\Documents\Receipt.pdf");

// Redact text
DocObject redactedFile = starDocs.DocOperations.RedactText(
  fileObjectReceipt, // Input file
  null, // Password
  null, // Page range settings (all pages)
  TextSearchMode.Regex,
  new List<SearchText>() {
    new SearchText(
      "(\\+?1[.\\s\\-]?)?[\\s\\-\\(]?\\d{3}[.\\s\\-\\)]{1}\\s?\\d{3}[.\\s\\-]{1}\\d{4}")
  },
  true, // Remove associated annotations
  new RedactFillSettings(
    new Outline(
      ColoringMode.UseColor, 
      new Pen(new Color(255, 0, 0), PenStyle.Dash, 1)
    ),
    new FillRect(
      ColoringMode.UseColor, 
      new Brush(
        new Color(0, 0, 0),
        BrushPattern.ForwardDiagonal
      )
    )
  )
);

// Do something with resultant document (redactedFile)
// ...

Java:

// Set up connection details
StarDocs starDocs = new StarDocs(
  new ConnectionInfo(
    new java.net.URI("https://api.gnostice.com/stardocs/v1"),
    "<API Key>",
    "<API Secret>"), 
  new Preferences(
    // Force full permissions on PDF files protected
    // with a permissions/owner/master password
    new DocPasswordSettings(true))
);

// Authenticate
starDocs.auth.loginApp();

// Input file
FileObject fileObjectReceipt = new FileObject("C:\\Documents\\Receipt.pdf");

// Redact text
DocObject redactedFile = starDocs.docOperations.redactText(
  fileObjectReceipt, // Input file
  null, // Password
  null, // Page range settings (all pages)
  TextSearchMode.Regex,
  // java.util.List is an interface, so build the list with Arrays.asList
  java.util.Arrays.asList(
    new SearchText(
      "(\\+?1[.\\s\\-]?)?[\\s\\-\\(]?\\d{3}[.\\s\\-\\)]{1}\\s?\\d{3}[.\\s\\-]{1}\\d{4}")
  ),
  true, // Remove associated annotations
  new RedactFillSettings(
    new Outline(
      ColoringMode.UseColor, 
      new Pen(new Color(255, 0, 0), PenStyle.Dash, 1)
    ),
    new FillRect(
      ColoringMode.UseColor, 
      new Brush(
        new Color(0, 0, 0),
        BrushPattern.ForwardDiagonal
      )
    )
  )
);

// Do something with resultant document (redactedFile)
// ...

Delphi:

var
  StarDocs: TgtStarDocsSDK;
  DocObjectReceipt: TgtDocObject;
  DocObjectReceiptRedacted: TgtDocObject;
  SearchTextList: TObjectList;
  SearchText: TgtSearchText;
begin
  StarDocs := nil;
  DocObjectReceipt := nil;
  DocObjectReceiptRedacted := nil;
  SearchTextList := nil;
  SearchText := nil;
  try
    // Set up connection details
    StarDocs := TgtStarDocsSDK.Create(nil);
    StarDocs.ConnectionInfo.ApiServerUri.URI :=
      'https://api.gnostice.com/stardocs/v1';
    StarDocs.ConnectionInfo.ApiKey := '<API Key>';
    StarDocs.ConnectionInfo.ApiSecret := '<API Secret>';
    // Force full permissions on PDF files protected
    // with a permissions/owner/master password
    StarDocs.Preferences.DocPasswordSettings.ForceFullPermission := True;

    // Authenticate
    StarDocs.Auth.loginApp;

    // Upload file
    DocObjectReceipt:= StarDocs.Storage.Upload(
      'C:\Documents\Receipt.pdf', 'password');

    // Redact text
    // Delphi string literals do not use backslash escapes,
    // so the regex is written with single backslashes
    SearchText := TgtSearchText.Create(
      '(\+?1[.\s\-]?)?[\s\-\(]?\d{3}[.\s\-\)]{1}\s?\d{3}[.\s\-]{1}\d{4}');
    SearchTextList := TObjectList.Create;
    SearchTextList.Add(SearchText);
    // Outline the redacted region with a dashed red pen
    StarDocs.DocOperations.RedactFillSettings.Outline
      .PenColoringMode := TgtColoringMode.cmUseColor;
    StarDocs.DocOperations.RedactFillSettings.Outline.Pen.Style := pstDash;
    StarDocs.DocOperations.RedactFillSettings.Outline.Pen.Color.Red := 255;
    StarDocs.DocOperations.RedactFillSettings.Outline.Pen.Color.Green := 0;
    StarDocs.DocOperations.RedactFillSettings.Outline.Pen.Color.Blue := 0;
    // Fill the redacted region with a black forward-diagonal pattern
    StarDocs.DocOperations.RedactFillSettings.FillRect
      .BrushColoringMode := TgtColoringMode.cmUseColor;
    StarDocs.DocOperations.RedactFillSettings.FillRect.Brush.Color.Red := 0;
    StarDocs.DocOperations.RedactFillSettings.FillRect.Brush.Color.Green := 0;
    StarDocs.DocOperations.RedactFillSettings.FillRect.Brush.Color.Blue := 0;
    StarDocs.DocOperations.RedactFillSettings.FillRect.Brush.Pattern := bptForwardDiagonal;
    DocObjectReceiptRedacted := StarDocs.DocOperations.RedactText(
      DocObjectReceipt,
      'password',
      nil, // Page range settings (all pages)
      tsmRegex,
      SearchTextList,
      True // Remove associated annotations
    );

    // Do something with resultant document (DocObjectReceiptRedacted)
    // ...
  finally
    // Free objects
    if Assigned(DocObjectReceipt) then
      FreeAndNil(DocObjectReceipt);
    if Assigned(DocObjectReceiptRedacted) then
      FreeAndNil(DocObjectReceiptRedacted);
    if Assigned(SearchTextList) then
      FreeAndNil(SearchTextList);
    if Assigned(StarDocs) then
      FreeAndNil(StarDocs);
  end;
end;
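
The outline and fill options in the snippets above accept a fixed set of values, which are listed in the comments of the JavaScript snippet. As a rough sketch, the check below catches a mistyped color, style, or pattern name before the request is sent. `validateFillSettings` and `isValidColor` are hypothetical helpers written for this illustration, not part of the StarDocs SDK, and the allowed values are copied from those comments rather than from the API reference.

```javascript
// Allowed values, copied from the comments in the JavaScript snippet above.
const OUTLINE_STYLES = ['solid', 'dash', 'dot', 'dashDot', 'dashDotDot'];
const FILL_PATTERNS = [
  'solid', 'forwardDiagonal', 'backwardDiagonal', 'cross',
  'diagonalCross', 'horizontal', 'vertical'
];

// Accepts 'none' or a '#RRGGBB' hex color.
function isValidColor(value) {
  return value === 'none' || /^#[0-9A-Fa-f]{6}$/.test(value);
}

// Hypothetical helper: returns a list of problems found in a fillSettings
// object. An empty list means no problem was detected.
function validateFillSettings(settings) {
  const problems = [];
  const outline = settings.outline || {};
  const fill = settings.fill || {};

  if (outline.color !== undefined && !isValidColor(outline.color)) {
    problems.push('outline.color must be "none" or "#RRGGBB"');
  }
  if (outline.style !== undefined && !OUTLINE_STYLES.includes(outline.style)) {
    problems.push('unknown outline.style: ' + outline.style);
  }
  if (fill.color !== undefined && !isValidColor(fill.color)) {
    problems.push('fill.color must be "none" or "#RRGGBB"');
  }
  if (fill.pattern !== undefined && !FILL_PATTERNS.includes(fill.pattern)) {
    problems.push('unknown fill.pattern: ' + fill.pattern);
  }
  return problems;
}

// Usage with the settings from this article
const problems = validateFillSettings({
  outline: { color: '#FF0000', width: 1, style: 'dash' },
  fill: { color: '#000000', pattern: 'forwardDiagonal' }
});
console.log(problems.length === 0 ? 'Fill settings look valid' : problems.join('\n'));
```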

The screenshot below shows the resulting document, viewed in the StarDocs HTML viewer.

That's it! This article showed how to use the Gnostice StarDocs API to redact text from a PDF document.

© 2002-2017 Gnostice Information Technologies Private Limited. All rights reserved.