Text Placement Algorithm

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and Imaging Tools and to provide the Developer with an expanding set of professional tools for Optical Character Recognition tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

jeffp
User
Posts: 914
Joined: Wed Sep 30, 2009 6:53 pm

Text Placement Algorithm

Post by jeffp »

Walter,

I'm using PXC_TextOutA in the DLL Lib to place the OCR text using the OCR word coordinates I receive. However, my text placement doesn't turn out as smooth as yours. Is it possible to share your algorithm for placing the OCR words into a PDF document so that they line up behind the image in a smooth way? That is, so the selected text appears to be all in one smooth line, instead of the jagged line my placement gives.

--Jeff
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text Placement Algorithm

Post by Walter-Tracker Supp »

We vertically align text to a common baseline ;)

Hope that helps.

-Walter
jeffp
User
Posts: 914
Joined: Wed Sep 30, 2009 6:53 pm

Re: Text Placement Algorithm

Post by jeffp »

How do you establish your common baseline among all the words in a line?
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text Placement Algorithm

Post by Walter-Tracker Supp »

Hi Jeff,

The baseline can come straight from OCR layout analysis, or you can calculate it yourself from the symbol or word coordinates using a simple average.

Using the baseline calculated by OCR, you can place symbols the same way we do with the PXC library function:

PDFXCLIB_API	HRESULT	PXC_API	PXC_TextOutExW(_PXCContent* content, LPCPXC_PointF origin, LPCWSTR lpwszText, LONG cbLen, const double* lpDX);

Omitting OCR details, error checking, etc., the code looks something like:

	WCHAR* symbolstring;
	double* symbolDXList;  // array of X-axis offsets, in pts, of each character from the first character
	int NumSymbols;
	PXC_PointF ptOrigin;   // origin of the first character: Y is the baseline, X is the start of the string of characters
	GetResults(...);       // fill symbolstring, symbolDXList, ptOrigin, and NumSymbols, i.e. place the OCR results in a couple of arrays

	_PXCContent* pContent = (_PXCContent*)pWritePage;  // pWritePage is _PXCPage type

	PXC_TextOptions topt;
	memset(&topt, 0, sizeof(topt));
	topt.fontSize = xx;  // size in pts
	topt.fontID = output_font_id;  // output font ID added to pdf already
	topt.nTextPosition = TextPosition_Baseline;

	// oldmode, oldcolour and told hold the previous state; their declarations
	// (and how told was saved) are omitted here
	PXC_TextRenderingMode rmode;
	rmode = TextRenderingMode_None;
	PXC_SetTextOptions(pContent, &topt);
	PXC_SetTextRMode(pContent, rmode, &oldmode);
	oldcolour = PXC_SetFillColor(pContent, RGB(0,0,0));
	PXC_TextOutExW(pContent, &ptOrigin, symbolstring, NumSymbols, symbolDXList);
	PXC_SetTextRMode(pContent, oldmode, NULL);  // revert to old mode - optional
	PXC_SetFillColor(pContent, oldcolour);      // revert to old colour - optional
	PXC_SetTextOptions(pContent, &told);        // revert to old options - optional
jeffp
User
Posts: 914
Joined: Wed Sep 30, 2009 6:53 pm

Re: Text Placement Algorithm

Post by jeffp »

As always, thanks much!
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Text Placement Algorithm

Post by Will - Tracker Supp »

Hi Jeff,

I'll pass the message along to Walter :(

Cheers,
Will
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com