iText 7 - Extract Exact Word(s) In A Region

I understand that in a PDF, characters and words can be positioned at arbitrary locations. I also understand this caveat:
(so part of the text may be outside rect, iText doesn't cut text snippets in pieces)
I also found this related question: "Extracting text from a rectangle using iText ( .Net ) does give me the entire line".
Based on that I put together the code below, but I still cannot extract only the exact word(s) inside my rectangle.
namespace iText
{
public static class Test
{
public static void Main(String[] args)
{
var reader = new PdfReader("mypdf.pdf");
PdfDocument pdfDoc = new PdfDocument(reader);
var addressRect = new Rectangle(0, 0, 0, 0);
var addressRegionFilter = new TextRegionEventFilter(addressRect);
var filterListener = new RectangleTextExtractionStrategy(new LocationTextExtractionStrategy(), addressRect);
var addressText = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(1), filterListener);
pdfDoc.Close();
}
}
public class RectangleTextExtractionStrategy : ITextExtractionStrategy
{
private ITextExtractionStrategy innerStrategy = null;
private Rectangle rectangle;
public RectangleTextExtractionStrategy(ITextExtractionStrategy strategy, Rectangle rectangle)
{
this.innerStrategy = strategy;
this.rectangle = rectangle;
}
public void EventOccurred(IEventData iEventData, EventType eventType)
{
if (eventType != EventType.RENDER_TEXT)
return;
TextRenderInfo tri = (TextRenderInfo)iEventData;
foreach (TextRenderInfo subTri in tri.GetCharacterRenderInfos())
{
Rectangle r2 = new CharacterRenderInfo(subTri).GetBoundingBox();
if (Intersects(r2))
innerStrategy.EventOccurred(subTri, EventType.RENDER_TEXT);
}
}
public string GetResultantText()
{
return innerStrategy.GetResultantText();
}
public ICollection<EventType> GetSupportedEvents()
{
return innerStrategy.GetSupportedEvents();
}
private bool Intersects(Rectangle rectangle)
{
var addressRect = new Rectangle(62, 20, 6, 7);
bool intersect = rectangle.Contains(addressRect);
if(intersect)
return true;
return false;
}
}
}
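Two details in the code above look like they could explain the result: the region passed in Main is `new Rectangle(0, 0, 0, 0)`, which has no area, and `Intersects` ignores the stored `rectangle` field and asks whether each character box contains a hard-coded 6 x 7 rectangle, which is the reverse of the intended test. A minimal sketch of the geometry, assuming the goal is "keep a character if its box overlaps (or lies inside) the region", is shown below. `Box` and `RegionFilter` are hypothetical stand-ins so the example runs without iText; they are not iText types.

```csharp
using System;

// Hypothetical stand-in for a PDF rectangle: lower-left corner plus size.
public struct Box
{
    public float X, Y, Width, Height;

    public Box(float x, float y, float width, float height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    public float Right { get { return X + Width; } }
    public float Top { get { return Y + Height; } }
}

public static class RegionFilter
{
    // True when the two boxes share some area (edges that only touch do not count).
    public static bool Overlaps(Box region, Box glyph)
    {
        return glyph.X < region.Right && glyph.Right > region.X
            && glyph.Y < region.Top && glyph.Top > region.Y;
    }

    // True when the glyph box lies entirely inside the region.
    public static bool ContainsFully(Box region, Box glyph)
    {
        return glyph.X >= region.X && glyph.Right <= region.Right
            && glyph.Y >= region.Y && glyph.Top <= region.Top;
    }
}
```

In the strategy above, the same comparison would be made between the `rectangle` field and each character's bounding box. Whether to use the overlap test or the stricter containment test depends on how characters straddling the border should be treated.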

Related

Find a string and location in PDF file using iTextSharp using ASP.Net C#

I am trying to find a string and its location in a PDF using iTextSharp in ASP.NET C#, so that I can edit it. With the help I could find online I have not managed to do it. The current code below reads the text chunk by chunk but does not find the required text. Any help is appreciated.
public class RectAndText
{
public iTextSharp.text.Rectangle Rect;
public String Text;
public RectAndText(iTextSharp.text.Rectangle rect, String text)
{
this.Rect = rect;
this.Text = text;
}
}
public class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
{
public List<RectAndText> myPoints = new List<RectAndText>();
public String TextToSearchFor { get; set; }
public System.Globalization.CompareOptions CompareOptions { get; set; }
public MyLocationTextExtractionStrategy(String textToSearchFor, System.Globalization.CompareOptions compareOptions = System.Globalization.CompareOptions.None)
{
this.TextToSearchFor = textToSearchFor;
this.CompareOptions = compareOptions;
}
public override void RenderText(TextRenderInfo renderInfo)
{
base.RenderText(renderInfo);
var startPosition = System.Globalization.CultureInfo.CurrentCulture.CompareInfo.IndexOf(renderInfo.GetText(), this.TextToSearchFor, this.CompareOptions);
if (startPosition < 0)
{
return;
}
var chars = renderInfo.GetCharacterRenderInfos().Skip(startPosition).Take(this.TextToSearchFor.Length).ToList();
var firstChar = chars.First();
var lastChar = chars.Last();
var bottomLeft = firstChar.GetDescentLine().GetStartPoint();
var topRight = lastChar.GetAscentLine().GetEndPoint();
var rect = new iTextSharp.text.Rectangle(
bottomLeft[Vector.I1],
bottomLeft[Vector.I2],
topRight[Vector.I1],
topRight[Vector.I2]
);
this.myPoints.Add(new RectAndText(rect, this.TextToSearchFor));
}
}
Calling code:
string thisDir = System.Web.Hosting.HostingEnvironment.MapPath("~/");
var testFile = thisDir + "example.pdf";
var t = new MyLocationTextExtractionStrategy("searchstring"); //need to search this searchstring
using (var r = new PdfReader(testFile))
{
var ex = PdfTextExtractor.GetTextFromPage(r, 1, t);
}
foreach (var p in t.myPoints)
{
Console.WriteLine(string.Format("Found text {0} at {1}x{2}", p.Text, p.Rect.Left, p.Rect.Bottom));
}

In iText 7 this can be handled easily with RegexBasedLocationExtractionStrategy.
The class is constructed from a regular expression and reports the locations of all text matching that expression. Even if you cannot switch to iText 7, you can still look at its source code to see how we implemented it.

Loading XML document loads the same group twice

Some classes to start, I'm writing them all so you can reproduce my problem:
public class PermissionObject
{
public string permissionName;
public string permissionObject;
public bool permissionGranted;
public PermissionObject()
{
permissionName = "";
permissionObject = "";
permissionGranted = true;
}
public PermissionObject(string name, string obj, bool granted)
{
permissionName = name;
permissionObject = obj;
permissionGranted = granted;
}
}
public class Config
{
public string cmsDataPath = "";
public string cmsIP = "";
public List<UserClass> usersCMS = new List<UserClass>();
static public string pathToConfig = @"E:\testconpcms.xml";
public string cardServerAddress = "";
public void Save()
{
XmlSerializer serializer = new XmlSerializer(typeof(Config));
using (Stream fileStream = new FileStream(pathToConfig, FileMode.Create))
{
serializer.Serialize(fileStream, this);
}
}
public static Config Load()
{
if (File.Exists(pathToConfig))
{
XmlSerializer serializer = new XmlSerializer(typeof(Config));
try
{
using (Stream fileStream = new FileStream(pathToConfig, FileMode.Open))
{
return (Config)serializer.Deserialize(fileStream);
}
}
catch (Exception ex)
{
return new Config();
}
}
else
{
return null;
}
}
}
public class UserClass
{
public string Name;
public string Login;
public string Password;
public PCMS2 PermissionsList; // OR new PCMS1, as I will explain in a bit
public UserClass()
{
this.Name = "Admin";
this.Login = "61-64-6D-69-6E";
this.Password = "61-64-6D-69-6E";
this.PermissionsList = new PCMS2(); // OR new PCMS1, as I will explain in a bit
}
}
The problematic bit: consider two implementations of PCMS class, PCMS1 and PCMS2:
public class PCMS1
{
public PermissionObject p1, p2;
public PCMS1()
{
p1 = new PermissionObject("ImportConfigCMS", "tsmiImportCMSConfigFile", true);
p2 = new PermissionObject("ExportConfigCMS", "tsmiExportCMSConfigFile", true);
}
}
public class PCMS2
{
public List<PermissionObject> listOfPermissions = new List<PermissionObject>();
public PCMS2()
{
listOfPermissions.Add(new PermissionObject("ImportConfigCMS", "tsmiImportCMSConfigFile", true));
listOfPermissions.Add(new PermissionObject("ExportConfigCMS", "tsmiExportCMSConfigFile", true));
}
}
And finally main class:
public partial class Form1 : Form
{
private Config Con;
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
Con = Config.Load();
if (Con == null)
{
Con = new Config();
Con.cmsDataPath = @"E:\testconpcms.xml";
Con.Save();
}
if (Con.usersCMS.Count == 0)
{
UserClass adminDefault = new UserClass();
Con.usersCMS.Add(adminDefault);
Con.Save();
}
}
}
Now, using either PCMS1 or PCMS2, the config file generates properly - one user with 2 permissions.
However, when config file is present, calling Con = Config.Load() in the main class gives different results.
Using PCMS1, the Con object is as expected - 1 user with 2 permissions.
However, using PCMS2, the Con object is 1 user with 4 (four) permissions: the list is doubled (it is basically p1, p2, p1, p2). Set a breakpoint after Load() to inspect Con.
I guess the list-based implementation (PCMS2) does something during loading that I'm not aware of, but I can't find the cause.

You create the permission objects in the constructor of PCMS2. You do the same in the constructor of PCMS1, but there they are stored in two fields, which the serializer simply overwrites.
In the case of PCMS2, the constructor adds two items to the list, and then the serializer adds the items it has deserialized to that same list.
I don't know your exact use case, but I would suggest moving the initialization of the permissions into a separate method:
public class PCMS1
{
public PermissionObject p1, p2;
public void Init()
{
p1 = new PermissionObject("ImportConfigCMS", "tsmiImportCMSConfigFile", true);
p2 = new PermissionObject("ExportConfigCMS", "tsmiExportCMSConfigFile", true);
}
}
public class PCMS2
{
public List<PermissionObject> listOfPermissions = new List<PermissionObject>();
public void Init()
{
listOfPermissions.Add(new PermissionObject("ImportConfigCMS", "tsmiImportCMSConfigFile", true));
listOfPermissions.Add(new PermissionObject("ExportConfigCMS", "tsmiExportCMSConfigFile", true));
}
}
After that you can call it whenever you want the initial settings:
if (Con.usersCMS.Count == 0)
{
UserClass adminDefault = new UserClass();
adminDefault.PermissionsList.Init();
Con.usersCMS.Add(adminDefault);
Con.Save();
}
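The doubling described in this answer can be reproduced in isolation. The sketch below is a reduced version of the PCMS2 case, using plain strings instead of PermissionObject; `Permissions` and `RoundTrip` are hypothetical names chosen for the example. The constructor fills the list with two defaults, and XmlSerializer then appends the two deserialized items to that already populated list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Permissions
{
    public List<string> Items = new List<string>();

    // Defaults added here are also present when XmlSerializer
    // creates the object during deserialization.
    public Permissions()
    {
        Items.Add("ImportConfigCMS");
        Items.Add("ExportConfigCMS");
    }
}

public static class RoundTrip
{
    // Serializes a fresh object to a string, reads it back,
    // and returns how many items the loaded list contains.
    public static int CountAfterRoundTrip()
    {
        var serializer = new XmlSerializer(typeof(Permissions));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new Permissions());
            using (var reader = new StringReader(writer.ToString()))
            {
                var loaded = (Permissions)serializer.Deserialize(reader);
                return loaded.Items.Count;
            }
        }
    }
}
```

Calling `RoundTrip.CountAfterRoundTrip()` returns 4 rather than 2, which matches the p1, p2, p1, p2 result in the question. Moving the defaults out of the constructor, as suggested above, removes the duplicates.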

C# Webbrowser control in Multi threading Environment

I am able to run the WebBrowser control in a single-threaded environment, but in a multithreaded environment it throws an access violation error.
I am trying to run multiple WebBrowser instances and take a screenshot of each.
Please have a look at the code below.
public class Image
{
public int id { get; set; }
public string url { get; set; }
}
internal class RenderMultipleImages
{
//Sample Urls which contains Ajax Calls
private static readonly List<Image> images = new List<Image>()
{
new Image(){id=1, url="www.abc1.com"},
new Image() {id=2,url= "www.abc2.com"},
new Image() {id=3,url= "www.abc3.com"},
new Image() {id=4,url= "www.abc4.com"},
new Image() {id=5, url="www.abc5.com"},
new Image() {id=6,url= "www.abc6.com"}
};
private static int _imageWidth = 1200;
private static int _imageHeight = 800;
public static void Render()
{
var startTime = DateTime.UtcNow;
var tasks = new List<Task>();
for (var i = 0; i < images.Count; i++)
{
var i1 = i;
var task = Task.Factory.StartNew(() => InvokeGenerateImage(images[i1].id, images[i1].url));
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine("Time taken to finish: {0}", (DateTime.UtcNow - startTime).TotalSeconds);
}
private static void InvokeGenerateImage(int instanceNumber, string url)
{
var thread = new Thread(() => GenerateImage(instanceNumber, url));
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
}
//[STAThread]
private static void GenerateImage(int instanceNumber, string url)
{
Console.WriteLine("Started instance number: {0}", instanceNumber);
var browser = new WebBrowser
{
ScrollBarsEnabled = false,
ScriptErrorsSuppressed = true,
Width = _imageWidth,
Height = _imageHeight
};
browser.Navigate(url);
while (browser.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
//As the url contains ajax calls am checking for the element id which tells us document loaded
while (browser.Document != null && browser.Document.GetElementById("chart-1") == null)
{
Application.DoEvents();
}
//Then screenshot
saveScreeshot(browser, instanceNumber);
browser.Dispose();
Console.WriteLine("Finished instance number: {0}", instanceNumber);
}
static void saveScreeshot(WebBrowser webb,int instance)
{
Bitmap bitmap = new Bitmap(1024, 768);
Rectangle bitmapRect = new Rectangle(0, 0, 1024, 768);
webb.Size = new Size(1024, 768);
webb.DrawToBitmap(bitmap, bitmapRect);
var name = instance + ".png";
bitmap.Save(name, ImageFormat.Png);
bitmap.Dispose();
}
}

c# itextsharp how to get digital signature image

Is it possible to get the image of any digital signatures in a pdf file with itextsharp using c# code?
PdfReader pdf = new PdfReader("location.pdf");
AcroFields acroFields = pdf.AcroFields;
List<string> names = acroFields.GetSignatureNames();
foreach (var name in names)
{
PdfDictionary dict = acroFields.GetSignatureDictionary(name);
}
With these few lines I can get the signature dictionaries, but I am not able to get the image content from those objects.
Can anyone help?

I am answering my own question in case it is useful to someone else.
I found a Java class that does what I was looking for and translated it to C#.
class XyzmoSignatureDataExtractor
{
private PdfReader reader;
public XyzmoSignatureDataExtractor(PdfReader reader)
{
this.reader = reader;
}
public PdfImageObject extractImage(String signatureName)
{
MyImageRenderListener listener = new MyImageRenderListener();
PdfDictionary sigFieldDic = reader.AcroFields.GetFieldItem(signatureName).GetMerged(0);
PdfDictionary appearancesDic = sigFieldDic.GetAsDict(PdfName.AP);
PdfStream normalAppearance = appearancesDic.GetAsStream(PdfName.N);
PdfDictionary resourcesDic = normalAppearance.GetAsDict(PdfName.RESOURCES);
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
processor.ProcessContent(ContentByteUtils.GetContentBytesFromContentObject(normalAppearance), resourcesDic);
return listener.image;
}
class MyImageRenderListener : IRenderListener
{
public void BeginTextBlock() { }
public void EndTextBlock() { }
public void RenderImage(ImageRenderInfo renderInfo)
{
try
{
image = renderInfo.GetImage();
}
catch (Exception e)
{
throw new Exception("Failure retrieving image", e);
}
}
public void RenderText(TextRenderInfo renderInfo) { }
public PdfImageObject image = null;
}
}
To use the class and save the image, I do this:
PdfReader reader = new PdfReader("location.pdf");
XyzmoSignatureDataExtractor extractor = new XyzmoSignatureDataExtractor(reader);
AcroFields acroFields = reader.AcroFields;
foreach (string name in acroFields.GetSignatureNames())
{
PdfImageObject image = extractor.extractImage(name);
var _image = image.GetDrawingImage();
string file_name = "sig." + image.GetFileType();
_image.Save(file_name);
}

Serialize class with structs of same type name

I'm trying to XML serialize a class that contains two structs with the same name:
public class MyClass
{
public System.Windows.Size WSize = new System.Windows.Size();
public System.Drawing.Size DSize = new System.Drawing.Size();
}
The resulting error:
Types 'System.Drawing.Size' and 'System.Windows.Size' both use the XML type name,
'Size', from namespace ''. Use XML attributes to specify a unique XML name and/or
namespace for the type.
Everything I've found so far involves decorating the Type with an XML attribute. I can't directly decorate either struct since they are not my code.
I feel like I'm missing something easy here...Is there an XML attribute that I can apply to the fields?
EDIT
I've added an answer using a couple of surrogate properties. I'm not happy with that particular implementation since it leaves public properties hanging out.
I've also considered DataContractSerialization but I'm hesitant to take that next step quite yet. Anyone else have something they can suggest?
EDIT 2
There may have been some confusion in my wording. I can modify and decorate MyClass, WSize and DSize. However, perhaps obviously, I cannot modify System.Windows.Size or System.Drawing.Size.

You can do it by proxy with custom XML serialization. I created this fully working example; a lot of error checking remains to be done, but it's a place to start.
public class MyClass
{
public System.Windows.Size WSize = new System.Windows.Size();
public System.Drawing.Size DSize = new System.Drawing.Size();
}
public class MyClassProxy : MyClass, IXmlSerializable
{
public new System.Windows.Size WSize { get { return base.WSize; } set { base.WSize = value; } }
public new System.Drawing.Size DSize { get { return base.DSize; } set { base.DSize = value; } }
public System.Xml.Schema.XmlSchema GetSchema()
{
return null;
}
public void ReadXml(System.Xml.XmlReader reader)
{
reader.MoveToContent();
reader.ReadStartElement();
string wheight = reader["height"];
string wwidth = reader["width"];
int w, h;
w = int.Parse(wwidth);
h = int.Parse(wheight);
WSize = new Size(w, h);
reader.ReadStartElement();
string dheight = reader["height"];
string dwidth = reader["width"];
w = int.Parse(dwidth);
h = int.Parse(dheight);
DSize = new System.Drawing.Size(w, h);
}
public void WriteXml(System.Xml.XmlWriter writer)
{
writer.WriteStartElement("MyClassProxy");
writer.WriteStartElement("WSize");
writer.WriteAttributeString("height", WSize.Height.ToString());
writer.WriteAttributeString("width", WSize.Width.ToString());
writer.WriteEndElement();
writer.WriteStartElement("DSize");
writer.WriteAttributeString("height", DSize.Height.ToString());
writer.WriteAttributeString("width", DSize.Width.ToString());
writer.WriteEndElement();
writer.WriteEndElement();
}
}
class Program
{
static void Main(string[] args)
{
MyClassProxy p = new MyClassProxy();
p.DSize = new System.Drawing.Size(100, 100);
p.WSize = new Size(400, 400);
string xml = "";
using (StringWriter sw = new StringWriter())
{
System.Xml.XmlWriter wr = System.Xml.XmlWriter.Create(sw);
p.WriteXml(wr);
wr.Close();
xml = sw.ToString();
}
MyClassProxy p2 = new MyClassProxy();
using (StringReader sr = new StringReader(xml))
{
System.Xml.XmlReader r = System.Xml.XmlReader.Create(sr);
p2.ReadXml(r);
}
MyClass baseClass = (MyClass)p2;
Print(baseClass);
Console.ReadKey();
}
static void Print(MyClass c)
{
Console.WriteLine(c.DSize.ToString());
Console.WriteLine(c.WSize.ToString());
}
}

Here's a possibility that I'm not terribly happy with (not very clean):
public class MyClass
{
public System.Windows.Size WSize = new System.Windows.Size();
[XmlIgnore]
public System.Drawing.Size DSize = new Size();
public int DSizeWidthForSerialization
{
get
{
return DSize.Width;
}
set
{
DSize.Width = value;
}
}
public int DSizeHeightForSerialization
{
get
{
return DSize.Height;
}
set
{
DSize.Height = value;
}
}
}

I ended up creating a new class to house System.Drawing.Size. Within that new class I created implicit operators and handled some of the constructors. This allowed me to serialize and not have to change any existing code:
public class MyClass
{
public System.Windows.Size WSize = new System.Windows.Size();
public MyDrawingSize DSize = new System.Drawing.Size();
public class MyDrawingSize
{
public int Height, Width;
public MyDrawingSize() { } //Needed for deserialization
public MyDrawingSize(int width, int height)
{
Width = width;
Height = height;
}
public static implicit operator System.Drawing.Size(MyDrawingSize size)
{
return new System.Drawing.Size(size.Width, size.Height);
}
public static implicit operator MyDrawingSize(System.Drawing.Size size)
{
return new MyDrawingSize() { Width = size.Width, Height = size.Height };
}
}
}
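The wrapper approach above can be checked with a small round trip. This is only a sketch under stated assumptions: `ThirdParty.Size` is a hypothetical stand-in for a struct you cannot modify (such as System.Drawing.Size), and `SizeSurrogate`, `Holder`, and `SurrogateDemo` are names invented for the example. The serializer only ever sees the surrogate type, which has its own unique XML type name, while calling code keeps assigning and reading the original struct through the implicit operators.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace ThirdParty
{
    // Hypothetical stand-in for a struct owned by someone else.
    public struct Size
    {
        public int Width;
        public int Height;
    }
}

// Serializable wrapper with a type name that does not clash.
public class SizeSurrogate
{
    public int Width;
    public int Height;

    public SizeSurrogate() { } // parameterless constructor needed by XmlSerializer

    public static implicit operator ThirdParty.Size(SizeSurrogate s)
    {
        return new ThirdParty.Size { Width = s.Width, Height = s.Height };
    }

    public static implicit operator SizeSurrogate(ThirdParty.Size s)
    {
        return new SizeSurrogate { Width = s.Width, Height = s.Height };
    }
}

public class Holder
{
    // Assigning the original struct works through the implicit operator.
    public SizeSurrogate DSize = new ThirdParty.Size();
}

public static class SurrogateDemo
{
    // Writes a Holder to XML, reads it back, and returns the size as the original struct.
    public static ThirdParty.Size RoundTrip(ThirdParty.Size input)
    {
        var holder = new Holder { DSize = input };
        var serializer = new XmlSerializer(typeof(Holder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, holder);
            using (var reader = new StringReader(writer.ToString()))
            {
                var loaded = (Holder)serializer.Deserialize(reader);
                return loaded.DSize;
            }
        }
    }
}
```

The trade-off is that the field's declared type changes to the surrogate, so code that relies on the struct's own members directly on the field (rather than through a conversion) would need adjusting.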
