PdfConnector Properties, Methods, and Events

Use the PdfConnector component in an automation to search for text, extract text and images, and annotate PDF files. You cannot use the PdfConnector component to view, create, or modify PDF files, other than to annotate them.

You can use this component to process PDF files with or without user interaction. For instance, you can use this component to process a PDF file before presenting it to the Runtime user.

Use this component with the PdfViewer component to bring annotations and highlights to the Runtime user’s attention.

Note: In PDF files, left and right coordinates are offsets from the left of the page. Bottom and top coordinates are offsets from the bottom of the page.
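
The coordinate convention in this note can be illustrated with a short calculation. The following Python sketch is not part of the PdfConnector API. The function name and the 792-point page height (US Letter, 11 inches at 72 points per inch) are assumptions chosen for illustration. It converts a bottom-based offset into a distance from the top of the page, which is the convention many screen coordinate systems use.

```python
POINTS_PER_INCH = 72  # one point is 1/72 of an inch


def distance_from_top(offset_from_bottom, page_height):
    """Convert a bottom-based PDF offset into a distance from the top edge.

    Both values are in points. Hypothetical helper for illustration only.
    """
    return page_height - offset_from_bottom


# Assumed example page: US Letter, 11 inches tall.
letter_height = 11 * POINTS_PER_INCH  # 792 points

# A Top coordinate of 720 points lies 72 points (1 inch) below the top edge.
print(distance_from_top(720, letter_height))  # 72
```

Left and right coordinates need no such conversion, because they are already measured from the left edge of the page.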

The PdfConnector component contains these properties, methods, and events.

Properties

Property

Description

FileName

Specify the name of the PDF file.

OutputName

Specify the file name to assign to the output PDF file. Be sure to specify an output file name to avoid overwriting the original PDF during the design phase.

HasFormFields

Indicates whether the open PDF file has form fields that can be written to.

To use this property, you must have version 8.0.1044 or later.

IsDocXfaFormat

(Read-only) Indicates if the opened PDF file is in XFA (XML Forms Architecture) format. Support for XFA format PDF files is limited. You can use the PdfViewer component to display XFA-formatted files but you cannot edit them.

To use this property, you must have version 8.0.1044 or later.

IsOpen

(Read-only) Indicates if the PDF file has been successfully opened.

HasSaved

(Read-only) Indicates that the PDF file has been saved.

Pages

(Read-only) Provides a list of the PdfPage objects. These objects represent the pages in the document.

PageCount

(Read-only) Indicates the number of pages in the document.

ImageCount

(Read-only) Indicates the number of images you can extract from the document.

AnnotationCount

(Read-only) Indicates the number of annotations currently in the document.

Text

(Read-only) Returns all of the text in the document as a single value. The system omits comments and annotation text.

LineThreshold

To determine whether two pieces of text are on the same line, the system compares the amount of white space between these points:

· The amount of white space above the top of the line

· The amount of white space below the bottom of the line

Your entry in this property sets the threshold. If the white space is less than or equal to your entry, the system considers the text to be on the same line. If it is more than your entry, it considers the text to be on different lines.

The default is 2.0 points, with a point being equal to 1/72 of an inch.

SegmentThreshold

To determine whether two pieces of text are part of the same segment, the system compares the amount of white space between these points:

· The amount of white space above the top of the segment

· The amount of white space below the bottom of the segment

Your entry in this property sets the threshold. If the white space is less than or equal to your entry, the system considers the text to be part of the same segment. If it is more than your entry, it considers the text to be in different segments.

The default is 10 points, with a point being equal to 1/72 of an inch.

WordThreshold

The system looks at the amount of white space between pieces of text to determine if the text comprises a single word or if the white space indicates there are two words.

Your entry in this property sets the threshold. If the space is less than or equal to your entry, the system considers the text to be part of the same word. If it is more than your entry, it considers the text to be different words.

The default is 2.2 points, with a point being equal to 1/72 of an inch.
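
The LineThreshold, SegmentThreshold, and WordThreshold descriptions share one rule: a gap at or below the threshold joins two pieces of text, and a larger gap separates them. The following Python sketch models that rule for WordThreshold only. It is not the component's implementation. The function names, the (left, right, text) tuple layout, and the sample coordinates are assumptions made for illustration.

```python
def same_group(gap, threshold):
    """Return True when a gap (in points) does not exceed the threshold."""
    return gap <= threshold


def group_into_words(pieces, word_threshold=2.2):
    """Join text pieces whose horizontal gap is within word_threshold.

    pieces: list of (left, right, text) tuples sorted by left position,
    all assumed to be on the same line.
    """
    words = []
    previous_right = None
    for left, right, text in pieces:
        gap_ok = (previous_right is not None
                  and same_group(left - previous_right, word_threshold))
        if gap_ok:
            words[-1] += text   # small gap: same word
        else:
            words.append(text)  # first piece or large gap: new word
        previous_right = right
    return words


# Hypothetical pieces: "In" and "voice" are 1 point apart, "42" is 5 points away.
pieces = [(10.0, 20.0, "In"), (21.0, 45.0, "voice"), (50.0, 60.0, "42")]
print(group_into_words(pieces))  # ['Invoice', '42']
```

In this model, raising word_threshold to 5.0 merges all three pieces into one word, which mirrors how a larger property value makes the grouping more permissive.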

Methods

Method

Description

Return type

Close()

Closes a PDF file.

bool

Save()

Saves a PDF file.

bool

FindPage(string searchFor, out int pageNumber)

Finds the first page that contains the text you specify.

bool

FindPage(string searchFor, int startPage, out int pageNumber)

Finds the first page that contains the text you specify. The system starts the search on the page number you specify.

bool

FindPage(string searchFor, int startPage, int endPage, out int pageNumber)

Finds the first page that contains the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindPage(string searchFor, PdfLine searchAfter, out int pageNumber)

Finds the first page that contains the text you specify. The system starts the search after the line you specify.

bool

FindPage(string searchFor, PdfSegment searchAfter, out int pageNumber)

Finds the first page that contains the text you specify. The system starts the search after the segment you specify.

bool

FindPage(string searchFor, PdfWord searchAfter, out int pageNumber)

Finds the first page that contains the text you specify. The system starts the search after the word you specify.

bool

FindPages(string searchFor, out int[] pageNumbers)

Finds all pages that contain the text you specify.

bool

FindPages(string searchFor, int startPage, out int[] pageNumbers)

Finds all pages that contain the text you specify, starting the search at the page number you specify.

bool

FindPages(string searchFor, int startPage, int endPage, out int[] pageNumbers)

Finds all pages that contain the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindPages(string searchFor, PdfLine searchAfter, out int[] pageNumbers)

Finds all pages that contain the text you specify. The system starts the search after the line you specify.

bool

FindPages(string searchFor, PdfSegment searchAfter, out int[] pageNumbers)

Finds all pages that contain the text you specify. The system starts the search after the segment you specify.

bool

FindPages(string searchFor, PdfWord searchAfter, out int[] pageNumbers)

Finds all pages that contain the text you specify. The system starts the search after the word you specify.

bool

FindLine(string searchFor, out PdfLine line)

Finds the first line that contains the text you specify.

bool

FindLine(string searchFor, int startPage, out PdfLine line)

Finds the first line that contains the text you specify, starting the search at the page number you specify.

bool

FindLine(string searchFor, int startPage, int endPage, out PdfLine line)

Finds the first line that contains the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindLine(string searchFor, PdfLine searchAfter, out PdfLine line)

Finds the first line that contains the text you specify. The system starts the search after the line you specify.

bool

FindLine(string searchFor, PdfSegment searchAfter, out PdfLine line)

Finds the first line that contains the text you specify. The system starts the search after the segment you specify.

bool

FindLine(string searchFor, PdfWord searchAfter, out PdfLine line)

Finds the first line that contains the text you specify. The system starts the search after the word you specify.

bool

FindLines(string searchFor, out PdfLine[] lines)

Finds all of the lines that contain the text you specify.

bool

FindLines(string searchFor, int startPage, out PdfLine[] lines)

Finds all of the lines that contain the text you specify, starting the search at the page number you specify.

bool

FindLines(string searchFor, int startPage, int endPage, out PdfLine[] lines)

Finds all of the lines that contain the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindLines(string searchFor, PdfLine searchAfter, out PdfLine[] lines)

Finds all of the lines that contain the text you specify. The system starts the search after the line you specify.

bool

FindLines(string searchFor, PdfSegment searchAfter, out PdfLine[] lines)

Finds all of the lines that contain the text you specify. The system starts the search after the segment you specify.

bool

FindLines(string searchFor, PdfWord searchAfter, out PdfLine[] lines)

Finds all of the lines that contain the text you specify. The system starts the search after the word you specify.

bool

FindRelativeLine(string searchFor, int occurrence, int relativeLineOffset, out PdfLine line)

Finds a specific occurrence of a line. The system returns a line relative to the line it finds.

bool

FindSegment(string searchFor, out PdfSegment segment)

Finds the first segment that contains the text you specify.

bool

FindSegment(string searchFor, int startPage, out PdfSegment segment)

Finds the first segment that contains the text you specify, starting the search at the page number you specify.

bool

FindSegment(string searchFor, int startPage, int endPage, out PdfSegment segment)

Finds the first segment that contains the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindSegment(string searchFor, PdfLine searchAfter, out PdfSegment segment)

Finds the first segment that contains the text you specify. The system starts the search after the line you specify.

bool

FindSegment(string searchFor, PdfSegment searchAfter, out PdfSegment segment)

Finds the first segment that contains the text you specify. The system starts the search after the segment you specify.

bool

FindSegment(string searchFor, PdfWord searchAfter, out PdfSegment segment)

Finds the first segment that contains the text you specify. The system starts the search after the word you specify.

bool

FindSegments(string searchFor, out PdfSegment[] segments)

Finds all segments that contain the text you specify.

bool

FindSegments(string searchFor, int startPage, out PdfSegment[] segments)

Finds all segments that contain the text you specify, starting the search at the page number you specify.

bool

FindSegments(string searchFor, int startPage, int endPage, out PdfSegment[] segments)

Finds all segments that contain the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindSegments(string searchFor, PdfLine searchAfter, out PdfSegment[] segments)

Finds all segments that contain the text you specify. The system starts the search after the line you specify.

bool

FindSegments(string searchFor, PdfSegment searchAfter, out PdfSegment[] segments)

Finds all segments that contain the text you specify. The system starts the search after the segment you specify.

bool

FindSegments(string searchFor, PdfWord searchAfter, out PdfSegment[] segments)

Finds all segments that contain the text you specify. The system starts the search after the word you specify.

bool

FindRelativeSegment(string searchFor, int occur, int relSegOffset, out PdfSegment seg)

Finds a specific occurrence of a segment. The system returns a segment relative to the segment it found.

bool

FindWord(string searchFor, out PdfWord word)

Finds the first word that contains the text you specify.

bool

FindWord(string searchFor, int startPage, out PdfWord word)

Finds the first word that contains the text you specify, starting the search at the page number you specify.

bool

FindWord(string searchFor, int startPage, int endPage, out PdfWord word)

Finds the first word that contains the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindWord(string searchFor, PdfLine searchAfter, out PdfWord word)

Finds the first word that contains the text you specify. The system starts the search after the line you specify.

bool

FindWord(string searchFor, PdfSegment searchAfter, out PdfWord word)

Finds the first word that contains the text you specify. The system starts the search after the segment you specify.

bool

FindWord(string searchFor, PdfWord searchAfter, out PdfWord word)

Finds the first word that contains the text you specify. The system starts the search after the word you specify.

bool

FindWords(string searchFor, out PdfWord[] words)

Finds all words that contain the text you specify.

bool

FindWords(string searchFor, int startPage, out PdfWord[] words)

Finds all words that contain the text you specify, starting the search at the page number you specify.

bool

FindWords(string searchFor, int startPage, int endPage, out PdfWord[] words)

Finds all words that contain the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindWords(string searchFor, PdfLine searchAfter, out PdfWord[] words)

Finds all words that contain the text you specify. The system starts the search after the line you specify.

bool

FindWords(string searchFor, PdfSegment searchAfter, out PdfWord[] words)

Finds all words that contain the text you specify. The system starts the search after the segment you specify.

bool

FindWords(string searchFor, PdfWord searchAfter, out PdfWord[] words)

Finds all words that contain the text you specify. The system starts the search after the word you specify.

bool

FindRelativeWord(string searchFor, int occur, int relativeWordOffset, out PdfWord word)

Finds a specific occurrence of a word. The system returns a word relative to the word the system finds.

bool

FindPhrase(string searchFor, out PdfPhrase phrase)

Finds the first occurrence of the text you specify.

bool

FindPhrase(string searchFor, int startPage, out PdfPhrase phrase)

Finds the first occurrence of the text you specify, starting the search at the page number you specify.

bool

FindPhrase(string searchFor, int startPage, int endPage, out PdfPhrase phrase)

Finds the first occurrence of the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

FindPhrases(string searchFor, out PdfPhrase[] phrases)

Finds all occurrences of the text you specify.

bool

FindPhrases(string searchFor, int startPage, out PdfPhrase[] phrases)

Finds all occurrences of the text you specify, starting the search at the page number you specify.

bool

FindPhrases(string searchFor, int startPage, int endPage, out PdfPhrase[] phrases)

Finds all occurrences of the text you specify. You must also specify the page numbers on which you want the search to start and end.

bool

GetImage(out Image image)

Extracts the first image the system finds.

bool

GetImage(int startPage, out Image image)

Extracts the first image the system finds, starting the search at the page number you specify.

bool

GetImage(int startPage, int endPage, out Image image)

Extracts the first image the system finds within a range of pages. You must specify the page numbers on which you want the search to start and end.

bool

GetImages(out Image[] images)

Extracts all images.

bool

GetImages(int startPage, out Image[] images)

Extracts all images, starting at the page number you specify.

bool

GetImages(int startPage, int endPage, out Image[] images)

Extracts all images. You must specify the page numbers on which you want the search to start and end.

bool

GetAnnotation(out PdfAnnotation annotation)

Gets the first annotation the system finds.

bool

GetAnnotation(int startPage, out PdfAnnotation annotation)

Gets the first annotation the system finds, starting the search at the page number you specify.

bool

GetAnnotation(int startPage, int endPage, out PdfAnnotation annotation)

Gets the first annotation the system finds within a range of pages. You must specify the page numbers on which you want the search to start and end.

bool

GetAnnotation(AnnotationType type, out PdfAnnotation annotation)

Gets the first annotation the system finds of the annotation type you specified.

bool

GetAnnotation(AnnotationType type, int startPage, out PdfAnnotation annotation)

Gets the first annotation the system finds of the annotation type you specified. The system starts the search at the page number you specify.

bool

GetAnnotation(AnnotationType type, int startPage, int endPage, out PdfAnnotation annot)

Gets the first annotation the system finds of the annotation type you specified within a range of pages. You must specify the page numbers on which you want the search to start and end.

bool

GetAnnotations(out PdfAnnotation[] annotations)

Gets all of the annotations in the PDF file.

bool

GetAnnotations(int startPage, out PdfAnnotation[] annotations)

Gets all of the annotations, starting at the page number you specify.

bool

GetAnnotations(int startPage, int endPage, out PdfAnnotation[] annotations)

Gets all of the annotations within a range of pages. You must specify the page numbers on which you want the search to start and end.

bool

GetAnnotations(AnnotationType type, out PdfAnnotation[] annotations)

Gets all of the annotations of the type you specify.

bool

GetAnnotations(AnnotationType type, int startPage, out PdfAnnotation[] annotations)

Gets all of the annotations of the type you specify, starting at the page number you specify.

bool

GetAnnotations(AnnotationType type, int start, int end, out PdfAnnotation[] annots)

Gets all of the annotations of the type you specify within a range of pages. You must specify the page numbers on which you want the search to start and end.

bool

Annotate(AnnotationType typ, int pg, string tx, float lf, float rt, float tp, float bt, Color clr)

Adds an annotation on the page and at the position you specify. You give the position as float values for the left, right, top, and bottom coordinates.

bool

Annotate(PdfLine line, AnnotationType type, string annotationText, Color color)

Adds an annotation based on the line you specify.

bool

Annotate(PdfSegment segment, AnnotationType type, string annotationText, Color color)

Adds an annotation based on the segment you specify.

bool

Annotate(PdfWord word, AnnotationType type, string annotationText, Color color)

Adds an annotation based on the word you specify.

bool

Annotate(PdfPhrase phrase, AnnotationType type, string annotationText, Color color)

Adds an annotation based on the phrase you specify.

bool

DeleteAnnotation(PdfAnnotation annotation)

Deletes the annotation you specify.

bool

GetPage(int pageNumber)

Gets the PdfPage object that corresponds to the page number you specify.

PdfPage
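
The FindRelativeLine, FindRelativeSegment, and FindRelativeWord methods combine two steps: locate an occurrence of the search text, then move a number of items away from it. The following Python sketch models that behavior on a plain list of strings. It is not the component's code. The function name is hypothetical, and the sketch assumes that occurrences are counted from 1, that a negative offset moves backward, and that the search fails when either step runs past the available lines; the descriptions above do not confirm these details. The (found, result) tuple stands in for the bool return value and the out parameter.

```python
def find_relative_line(lines, search_for, occurrence, relative_line_offset):
    """Model of a relative-line search over a list of strings.

    Returns (True, line) on success and (False, None) otherwise.
    """
    # Indexes of every line that contains the search text.
    matches = [i for i, line in enumerate(lines) if search_for in line]
    if occurrence < 1 or occurrence > len(matches):
        return False, None  # the requested occurrence does not exist

    target = matches[occurrence - 1] + relative_line_offset
    if target < 0 or target >= len(lines):
        return False, None  # the offset points outside the document

    return True, lines[target]


# Hypothetical document text, one string per line.
lines = ["Invoice", "Total: 10", "Paid", "Total: 25", "Due"]

# Second line containing "Total", then one line further down.
print(find_relative_line(lines, "Total", 2, 1))  # (True, 'Due')
```

Checking the first element of the result before using the second mirrors the pattern the method table implies: test the bool return value before reading the out parameter.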

Events

Event

Description

FileOpened

Occurs when a file is opened.

OutputSaved

Occurs when the PDF file is saved.

 

Related Types

PdfPage

The PdfPage type has these properties and methods.

Properties

Property

Description

Text

(Read-only) Returns all of the text on the page as a single value.

PdfLines

(Read-only) Returns a list of the lines on the page.

PdfSegments

(Read-only) Returns a list of the segments on the page.

PdfWords

(Read-only) Returns a list of the words on the page.

Images

(Read-only) Returns a list of the images on the page.

Annotations

(Read-only) Returns a list of the annotations on the page.

PageNumber

(Read-only) Returns the page number of the page.

LineCount

(Read-only) Returns the number of lines on the page.

SegmentCount

(Read-only) Returns the number of segments on the page.

WordCount

(Read-only) Returns the number of words on the page.

ImageCount

(Read-only) Returns the number of images on the page.

AnnotationCount

(Read-only) Returns the number of annotations on the page.

Methods

Method

Description

Return type

FindLine(string searchFor, out PdfLine line)

Finds the first line on the page that contains the text you specify.

bool

FindLines(string searchFor, out PdfLine[] lines)

Finds all of the lines on the page that contain the text you specify.

bool

FindSegment(string searchFor, out PdfSegment segment)

Finds the first segment on the page that contains the text you specify.

bool

FindSegments(string searchFor, out PdfSegment[] segments)

Finds all of the segments on the page that contain the text you specify.

bool

FindWord(string searchFor, out PdfWord word)

Finds the first word on the page that contains the text you specify.

bool

FindWords(string searchFor, out PdfWord[] words)

Finds all of the words on the page that contain the text you specify.

bool

GetLine(int lineNumber)

Gets the line on the page that corresponds to the line number you specify.

PdfLine

GetSegment(int segmentNumber)

Gets the segment on the page that corresponds to the segment number you specify.

PdfSegment

GetWord(int wordNumber)

Gets the word on the page that corresponds to the word number you specify.

PdfWord

GetImage(int imageNumber)

Gets the nth image on the page. You specify the ordinal number of the image you want.

Image

GetAnnotation(int annotationNumber)

Gets the nth annotation on the page. You specify the ordinal number of the annotation you want.

PdfAnnotation

Annotate(AnnotationType type, string tx, float lft, float rt, float tp, float bt, Color color)

Adds an annotation to the page. You specify the position of the annotation as float values for the left, right, top, and bottom coordinates.

void

GetRelativePage(int relativePageOffset, out PdfPage resultPage)

Gets a page relative to this page.

bool

 

PdfLine

The PdfLine type has these properties and methods.

Properties

Property

Description

Text

(Read-only) Returns the text of the line.

PageNumber

(Read-only) Returns the page number where the line is located.

LineNumber

(Read-only) Returns the line’s ordinal line number.

Left

(Read-only) Returns the line’s left position on the page.

Right

(Read-only) Returns the line’s right position on the page.

Top

(Read-only) Returns the line’s top position on the page.

Bottom

(Read-only) Returns the line’s bottom position on the page.

 

Methods

Method

Description

Return type

GetRelativeLine(int relativeLineOffset, out PdfLine resultLine)

Gets a line that is relative to this line.

bool

 

PdfSegment

The PdfSegment type has these properties and methods.

Properties

Property

Description

Text

(Read-only) Returns the text of the segment.

PageNumber

(Read-only) Returns the page number where the segment is located.

SegmentNumber

(Read-only) Returns the segment’s ordinal segment number.

Left

(Read-only) Returns the segment’s left position on the page.

Right

(Read-only) Returns the segment’s right position on the page.

Top

(Read-only) Returns the segment’s top position on the page.

Bottom

(Read-only) Returns the segment’s bottom position on the page.

Methods

Method

Description

Return type

GetRelativeSegment(int relativeSegmentOffset, out PdfSegment resultSegment)

Gets a segment relative to this segment.

bool

 

PdfWord

The PdfWord type has these properties and methods.

Properties

Property

Description

Text

(Read-only) Returns the text of the word.

PageNumber

(Read-only) Returns the page number where the word is located.

WordNumber

(Read-only) Returns the word’s ordinal word number.

Left

(Read-only) Returns the word’s left position on the page.

Right

(Read-only) Returns the word’s right position on the page.

Top

(Read-only) Returns the word’s top position on the page.

Bottom

(Read-only) Returns the word’s bottom position on the page.

Methods

Method

Description

Return type

GetRelativeWord(int relativeWordOffset, out PdfWord resultWord)

Gets a word relative to this word.

bool
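
The Left, Right, Top, and Bottom properties of a line, segment, or word describe a rectangle in points. The following Python sketch shows one way to derive a width and height from those four values. The Box class is a hypothetical stand-in for the position properties, not a type from the component, and the sample coordinates are invented. The sketch relies on Top being larger than Bottom, which follows from both being offsets from the bottom of the page.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for the position properties of a PdfWord."""
    left: float
    right: float
    top: float
    bottom: float


def size_in_points(box):
    """Return (width, height) of the rectangle in points."""
    return box.right - box.left, box.top - box.bottom


# Invented example: a word half an inch wide and 12 points tall.
word = Box(left=72.0, right=108.0, top=720.0, bottom=708.0)
print(size_in_points(word))  # (36.0, 12.0)
```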

 

PdfPhrase

The PdfPhrase type has these properties.

Properties

Property

Description

Text

(Read-only) Returns the text of the phrase.

PageNumber

(Read-only) Returns the page number where the phrase is located.

Left

(Read-only) Returns the phrase’s left position on the page.

Right

(Read-only) Returns the phrase’s right position on the page.

Top

(Read-only) Returns the phrase’s top position on the page.

Bottom

(Read-only) Returns the phrase’s bottom position on the page.

 

PdfAnnotation

The PdfAnnotation type has these properties.

Properties

Property

Description

AnnotationType

(Read-only) Returns the type of annotation. The AnnotationType property has these values:

Unknown – The type of annotation is unknown.

Text – PDF viewers typically show a Text annotation as an icon at a specified position. There is typically a way to show the reader the associated comment text, such as by clicking the icon.

Highlight – PDF viewers typically show a Highlight annotation as an area of the page highlighted in a specified color. You can choose the color by entering RGB values or color names predefined on the Windows color palette. A Highlight annotation can also include a comment.

Text

(Read-only) Returns the comment text for the annotation.

PageNumber

(Read-only) Returns the page number where the annotation is located.

Left

(Read-only) Returns the annotation’s left position on the page.

Right

(Read-only) Returns the annotation’s right position on the page.

Top

(Read-only) Returns the annotation’s top position on the page.

Bottom

(Read-only) Returns the annotation’s bottom position on the page.

Color

(Read-only) Returns the annotation’s color as a value of data type Color. You can store this value in a variable of data type Color, assign it to a property of data type Color (such as a button’s BackColor), pass it as a parameter to a method that expects a color, and so on.

 

 



Updated: 18 June 2020

© 2016 - 2020 Pegasystems Inc.  Cambridge, MA All rights reserved.

 
