I need to compare two Word documents in C#, both text and formatting. I found a third-party library, DevExpress, that can view and process the documents, so I downloaded the trial to check whether it can solve the problem.
For example, I have two Word documents:
1: This is a text example
2: This is not a text example
The only difference between the two texts above is the word "not".
My problem is: how can I detect the differences, including the formatting?
So far, this is my code for iterating over the contents of a document:
public void CompareEpub(string word)
{
try
{
using (DevExpress.XtraRichEdit.RichEditDocumentServer srv = new DevExpress.XtraRichEdit.RichEditDocumentServer())
{
srv.LoadDocument(word);
MyIterator visitor = new MyIterator();
DocumentIterator iterator = new DocumentIterator(srv.Document, true);
while (iterator.MoveNext())
{
iterator.Current.Accept(visitor);
}
foreach (var item in visitor.ListOfText)
{
Debug.WriteLine("text: " + item.Text + " b: " + item.IsBold + " u: " + item.IsUnderline + " i: " + item.IsItalic);
}
}
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
Debug.WriteLine(ex.StackTrace);
throw; // rethrow without resetting the stack trace
}
}
public class MyIterator : DocumentVisitorBase
{
public List<Model.HtmlContent> ListOfText { get; }
public MyIterator()
{
ListOfText= new List<Model.HtmlContent>();
}
public override void Visit(DocumentText text)
{
var m = new Model.HtmlContent
{
Text = text.Text,
IsBold = text.TextProperties.FontBold,
IsItalic = text.TextProperties.FontItalic,
IsUnderline = text.TextProperties.UnderlineWordsOnly
};
ListOfText.Add(m);
}
}
With the code above I can walk through the text and its formatting. But how can I use this for a text comparison?
Suppose I build two lists, one for each document.
How can I compare them?
If I compare the text in one list with the other list in a loop,
the result will say that only two words are equal.
Can anyone help me with this, or just suggest an idea for how to make it work?
I didn't post on the DevExpress forum because I feel this is a question about the approach, not a problem with the trial or the control I'm using. I also found out that the control has no built-in text comparison like the one in Microsoft Word.
Thank you.
Update:
Desired output:
This is (not) a text example
The text inside the () is not found in the first document.
The output I want is similar to that of Diff Match Patch:
https://github.com/pocketberserker/Diff.Match.Patch
But I can't work out how to extend it to check the formatting as well.
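One possible direction, as a minimal sketch only: treat each word together with its format flags as a single token, and run a longest-common-subsequence (LCS) comparison over the two token lists. The `FormattedDiff` class and its `Word` struct are hypothetical names invented for this illustration; they are not part of DevExpress or Diff Match Patch, and the visitor output would first have to be split into one token per word.

```csharp
using System;
using System.Collections.Generic;

public static class FormattedDiff
{
    // Hypothetical token: one word plus the format flags gathered by the visitor.
    public struct Word : IEquatable<Word>
    {
        public string Text;
        public bool Bold, Italic, Underline;

        public Word(string text, bool bold = false, bool italic = false, bool underline = false)
        {
            Text = text; Bold = bold; Italic = italic; Underline = underline;
        }

        // Two tokens are equal only if both the text and every format flag match.
        public bool Equals(Word o) =>
            Text == o.Text && Bold == o.Bold && Italic == o.Italic && Underline == o.Underline;
    }

    // Returns the words of 'b', wrapping in () every word that has no match in 'a'.
    public static string MarkInserted(IList<Word> a, IList<Word> b)
    {
        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                lcs[i, j] = a[i].Equals(b[j])
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var output = new List<string>();
        int x = 0, y = 0;
        while (y < b.Count)
        {
            if (x < a.Count && a[x].Equals(b[y]))
            {
                output.Add(b[y].Text);               // word present in both documents
                x++; y++;
            }
            else if (x < a.Count && lcs[x + 1, y] >= lcs[x, y + 1])
            {
                x++;                                 // word only in 'a': skip it
            }
            else
            {
                output.Add("(" + b[y].Text + ")");   // word only in 'b'
                y++;
            }
        }
        return string.Join(" ", output);
    }
}
```

With the two example sentences from the question as input, this yields `This is (not) a text example`. Because the format flags take part in the equality check, a word whose text is unchanged but whose formatting differs is also reported inside ().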
I am using the PdfSharp reference library to attempt to add functionality to my program that adds metadata tags. I am able to successfully add metadata tags to a document, but I am having an issue with updating the tags on existing custom properties. Whenever I attempt to use my method to update the custom properties, I receive the following exception:
"'System.Collections.Generic.KeyValuePair' does not contain a definition for 'Name'."
Could you tell me whether the if statement in the foreach loop below correctly loops through all of the custom elements in the PDF document to see whether the property exists and needs to be updated? Thanks.
public void AddMetaDataPDF(string property, string propertyValue, string path)
{
PdfDocument document = PdfReader.Open(path);
bool propertyFound = false;
try {
dynamic properties = document.Info.Elements;
foreach(dynamic p in properties)
{
//Check to see if the property exists. If it does, update value.
if(string.Equals(p.Name, property, StringComparison.InvariantCultureIgnoreCase))
{
document.Info.Elements.SetValue("/" + property, new PdfString(propertyValue));
}
}
// the property doesn't exist so add it
if(!propertyFound)
{
document.Info.Elements.Add(new KeyValuePair<String, PdfItem>("/"+ property, new PdfString(propertyValue)));
}
}
catch (Exception ex)
{
MessageBox.Show(path + "\n" + ex.Message);
document.Close();
}
finally
{
if(document != null)
{
document.Save(path);
document.Close();
}
}
}
I didn't try your code, but a common issue when working with this library is that you need to add a slash before the name of the property for it to be found. The code below should do the trick.
PdfDocument document = PdfReader.Open(path);
var properties = document.Info.Elements;
if (properties.ContainsKey("/" + propertyName))
{
properties.SetValue("/" + propertyName, new PdfString(propertyValue));
}
else
{
properties.Add(new KeyValuePair<String, PdfItem>("/" + propertyName, new PdfString(propertyValue)));
}
document.Save(path);
document.Close();
Also, the PDF file must not be write-protected. Otherwise you need to unlock the file with a tool before calling PdfSharp.
I'm using C# Code in Ranorex 5.4.2 to create a CSV file, have data gathered from an XML file and then have it write this into the CSV file. I've managed to get this process to work but I'm experiencing an issue where there are 12 blank lines created beneath the gathered data.
I have a file called CreateCSVFile which creates the CSV file and adds the headers in, the code looks like this:
writer.WriteLine("PolicyNumber,Surname,Postcode,HouseNumber,StreetName,CityName,CountyName,VehicleRegistrationPlate,VehicleMake,VehicleModel,VehicleType,DateRegistered,ABICode");
writer.WriteLine("");
writer.Flush();
writer.Close();
The next one to run is MineDataFromOutputXML. The program I am automating provides insurance quotes, and an output XML file is created containing the client's details. I've set up a mining process which has a variable declared at the top, like this:
string _PolicyHolderSurname = "";
[TestVariable("3E92E370-F960-477B-853A-0F61BEA62B7B")]
public string PolicyHolderSurname
{
get { return _PolicyHolderSurname; }
set { _PolicyHolderSurname = value; }
}
and then there is another section of code which gathers the information from the XML file:
var QuotePolicyHolderSurname = (XmlElement)xmlDoc.SelectSingleNode("//cipSurname");
string QuotePolicyHolderSurnameAsString = QuotePolicyHolderSurname.InnerText.ToString();
PolicyHolderSurname = QuotePolicyHolderSurnameAsString;
Report.Info( "Policy Holder Surname As String = " + QuotePolicyHolderSurnameAsString);
Report.Info( "Quote Policy Holder Surname = " + QuotePolicyHolderSurname.InnerText);
The final file is called SetDataSource and it puts the information into the CSV file, there is a variable declared at the top like this:
string _PolicyHolderSurname = "";
[TestVariable("222D47D2-6F66-4F05-BDAF-7D3B9D335647")]
public string PolicyHolderSurname
{
get { return _PolicyHolderSurname; }
set { _PolicyHolderSurname = value; }
}
This is then the code that adds it into the CSV file:
string Surname = PolicyHolderSurname;
Report.Info("Surname = " + Surname);
dataConn.Rows.Add(new string[] { Surname });
dataConn.Store();
There are multiple items in the Mine and SetDataSource files and the output looks like this in Notepad++:
[Image: the CSV file after the code has been run]
I believe the problem lies in the CreateCSVFile and the writer.WriteLine function. I have commented this region out but it then produces the CSV with just the headers showing.
I've asked some of the developers I work with but most don't know C# very well and no one has been able to solve this issue yet. If it makes a difference this is on Windows Server 2012r2.
Any questions about this please ask, I can provide the whole files if needed, they're just quite long and repetitive.
Thanks
Ben Jardine
I had to do exactly the same thing in Ranorex. Since the question is a bit old I didn't check your code, but here is what I did, and it works. I found an example (probably on Stack Overflow) that creates a CSV file in C#, so here is my adaptation for use in a Ranorex UserCodeCollection:
[UserCodeCollection]
public class UserCodeCollectionDemo
{
[UserCodeMethod]
public static void ConvertXmlToCsv()
{
System.IO.File.Delete("E:\\Ranorex_test.csv");
XDocument doc = XDocument.Load("E:\\lang.xml");
string csvOut = string.Empty;
StringBuilder sColumnString = new StringBuilder(50000);
StringBuilder sDataString = new StringBuilder(50000);
foreach (XElement node in doc.Descendants(GetServerLanguage()))
{
foreach (XElement categoryNode in node.Elements())
{
foreach (XElement innerNode in categoryNode.Elements())
{
//"{0}," gives you the output in comma-separated format.
string sNodePath = categoryNode.Name + "_" + innerNode.Name;
sColumnString.AppendFormat("{0},", sNodePath);
sDataString.AppendFormat("{0},", innerNode.Value);
}
}
}
if ((sColumnString.Length > 1) && (sDataString.Length > 1))
{
sColumnString.Remove(sColumnString.Length-1, 1);
sDataString.Remove(sDataString.Length-1, 1);
}
string[] lines = { sColumnString.ToString(), sDataString.ToString() };
System.IO.File.WriteAllLines(@"E:\Ranorex_test.csv", lines);
}
}
For your information, a simplified version of my XML looks like this:
<LANGUAGE>
<ENGLISH ID="1033">
<TEXT>
<IDS_TEXT_CANCEL>Cancel</IDS_TEXT_CANCEL>
<IDS_TEXT_WARNING>Warning</IDS_TEXT_WARNING>
</TEXT>
<LOGINCLASS>
<IDS_LOGC_DLGTITLE>Log In</IDS_LOGC_DLGTITLE>
</LOGINCLASS>
</ENGLISH>
<FRENCH ID="1036">
<TEXT>
<IDS_TEXT_CANCEL>Annuler</IDS_TEXT_CANCEL>
<IDS_TEXT_WARNING>Attention</IDS_TEXT_WARNING>
</TEXT>
<LOGINCLASS>
<IDS_LOGC_DLGTITLE>Connexion</IDS_LOGC_DLGTITLE>
</LOGINCLASS>
</FRENCH>
</LANGUAGE>
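To make the mapping from this XML to the two CSV lines concrete, here is a minimal, self-contained sketch of the same idea. It is not the Ranorex code above: the `LanguageCsv` class and its `language` parameter are assumptions made for this illustration (the parameter stands in for `GetServerLanguage()`), and `string.Join` replaces the trailing-comma trimming. Values that themselves contain commas would still need quoting.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class LanguageCsv
{
    // Builds the header line and the data line for one language element (e.g. "ENGLISH").
    public static string[] BuildLines(XDocument doc, string language)
    {
        var columns = new List<string>();
        var values = new List<string>();

        // First level below the language element: categories such as TEXT or LOGINCLASS.
        foreach (XElement category in doc.Descendants(language).Elements())
        {
            // Second level: the individual string resources.
            foreach (XElement item in category.Elements())
            {
                columns.Add(category.Name.LocalName + "_" + item.Name.LocalName);
                values.Add(item.Value);
            }
        }

        // string.Join puts commas only between items, so nothing has to be trimmed afterwards.
        return new[] { string.Join(",", columns), string.Join(",", values) };
    }
}
```

For the sample XML above and the language "ENGLISH", the header line is `TEXT_IDS_TEXT_CANCEL,TEXT_IDS_TEXT_WARNING,LOGINCLASS_IDS_LOGC_DLGTITLE` and the data line is `Cancel,Warning,Log In`.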
I have the following code, which reads data from a text file (so users can modify it easily) and should auto-format a paragraph based on the words in the text file plus variables in the form. The contents of the "body" file go into a field. My body text file contains the following:
"contents: " + contents
Based on that, I was hoping to get
contents: Item 1, 2, etc.
depending on my input. Instead I get exactly what is in the text file, despite the quotes. What am I doing wrong? I was hoping to get the variable values in addition to my text.
string readSettings(string name)
{
string path = System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) + "/Yuneec_Repair_Inv";
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(path + "/" + name + ".txt"))
{
string data = sr.ReadToEnd();
return data;
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The settings file for " + name + " could not be read:");
Console.WriteLine(e.Message);
string content = "error";
return content;
}
}
private void Form1_Load(object sender, EventArgs e)
{
createSettings("Email");
createSettings("Subject");
createSettings("Body");
yuneecEmail = readSettings("Email");
subject = readSettings("Subject");
body = readSettings("Body");
}
private void button2_Click(object sender, EventArgs e)
{
bodyTextBox.Text = body;
}
If you want to let your users customize certain parts of the text, you should use an "indicator" that you know beforehand and that can be searched for and parsed out. For example, everything between # and # is something you read as a string:
Hello #Mr Douglas#,
Today is #DayOfTheWeek#.....
At that point your user can replace whatever they need in between the # and # symbols and you read that (for example using Regular Expressions) and use that as your "variable" text.
Let me know if this is what you are after and I can provide some C# code as an example.
Ok, this is the example code for that:
StreamReader sr = new StreamReader(@"C:\temp\settings.txt");
var set = sr.ReadToEnd();
var settings = new Regex(@"(?<=\[)(.*?)(?=\])").Matches(set);
foreach (var setting in settings)
{
Console.WriteLine("Parameter read from settings file is " + setting);
}
Console.WriteLine("Press any key to finish program...");
Console.ReadKey();
And this is the source of the text file:
Hello [MrReceiver],
This is [User] from [Company] something else, not very versatile using this as an example :)
[Signature]
Hope this helps!
When you read text from a file as a string, you get a string of text, nothing more.
There's no part of the system which assumes it's C#, parses, compiles and executes it in the current scope, casts the result to text and gives you the result of that.
That would be mostly not what people want, and would be a big security risk - the last thing you want is to execute arbitrary code from outside your program with no checks.
If you need a templating engine, you need to build one - e.g. read in the string, process the string looking for keywords, e.g. %content%, then add the data in where they are - or find a template processing library and integrate it.
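As a rough illustration of the "build one" option, here is a minimal sketch that replaces %name% keywords with values supplied by the program. The `SimpleTemplate` class, the `Render` method and the %name% syntax are assumptions made for this example, not an existing library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimpleTemplate
{
    // Replaces every %name% placeholder with the matching entry in 'values'.
    // Unknown placeholders are left untouched so that typos stay visible.
    public static string Render(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"%(\w+)%", match =>
        {
            string replacement;
            return values.TryGetValue(match.Groups[1].Value, out replacement)
                ? replacement
                : match.Value;
        });
    }
}
```

With this approach the body file would contain `contents: %contents%` instead of a C# expression, and the form would call something like `SimpleTemplate.Render(body, new Dictionary<string, string> { { "contents", contents } })` before showing the text. Only lookups happen at run time; nothing from the file is executed as code.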
I have been working successfully with the C# OpenXml SDK (Unofficial Microsoft Package 2.5 from NuGet) for some time now, but have recently noticed that the following line of code returns different results depending on what mood Microsoft Word appears to be in when the file gets saved:
var fields = document.Descendants<FieldCode>();
From what I can tell, when creating the document in the first place (using Word 2013 on Windows 8.1) if you use the Insert->QuickParts->Field and choose MergeField from the Field names left hand pane, and then provide a Field name in the field properties and click OK then the field code is correctly saved in the document as I would expect.
Then when using the aforementioned line of code I will receive a field code count of 1 field. If I subsequently edit this document (and even leave this field well alone) the subsequent saving could mean that this field code no longer is returned in my query.
Another case of the same curiousness is when I see the FieldCode nodes split across multiple items. So rather than seeing say:
" MERGEFIELD Author \\* MERGEFORMAT "
As the node name, I will see:
" MERGEFIELD Aut"
"hor \\* MERGEFORMAT"
Split as two FieldCode node values. I have no idea why this would be the case, but it certainly makes my ability to match nodes that much more exciting. Is this expected behaviour? A known bug? I don't really want to have to crack open the raw xml and edit this document to work until I understand what is going on. Many thanks all.
I came across this very problem myself, and found a solution that exists within OpenXML: a utility class called MarkupSimplifier which is part of the PowerTools for Open XML project. Using this class solved all the problems I was having that you describe.
The full article is located here.
Here are some pertinent excerpts:
Perhaps the most useful simplification that this performs is to merge adjacent runs with identical formatting.
It goes on to say:
Open XML applications, including Word, can arbitrarily split runs as necessary. If you, for instance, add a comment to a document, runs will be split at the location of the start and end of the comment. After MarkupSimplifier removes comments, it can merge runs, resulting in simpler markup.
An example of the utility class in use is:
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
RemoveComments = true,
RemoveContentControls = true,
RemoveEndAndFootNotes = true,
RemoveFieldCodes = false,
RemoveLastRenderedPageBreak = true,
RemovePermissions = true,
RemoveProof = true,
RemoveRsidInfo = true,
RemoveSmartTags = true,
RemoveSoftHyphens = true,
ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);
I have used this many times with Word 2010 documents using VS2015 .Net Framework 4.5.2 and it has made my life much, much easier.
Update:
I have revisited this code and found that it cleans up runs in MERGEFIELDs but not in IF fields that reference merge fields, e.g.
{if {MERGEFIELD When39} = "Y???" "Y" "N" }
I have no idea why this might be so, and examination of the underlying XML offers no hints.
Word will often split a text run into multiple text runs for no reason I've ever understood. When searching, comparing, tidying, etc., we preprocess the body with a method that combines adjacent runs with identical formatting into a single text run.
/// <summary>
/// Combines the identical runs.
/// </summary>
/// <param name="body">The body.</param>
public static void CombineIdenticalRuns(W.Body body)
{
List<W.Run> runsToRemove = new List<W.Run>();
foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
{
List<W.Run> runs = para.Elements<W.Run>().ToList();
for (int i = runs.Count - 2; i >= 0; i--)
{
W.Text text1 = runs[i].GetFirstChild<W.Text>();
W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
if (text1 != null && text2 != null)
{
string rPr1 = "";
string rPr2 = "";
if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
if (rPr1 == rPr2)
{
text1.Text += text2.Text;
runsToRemove.Add(runs[i + 1]);
}
}
}
}
foreach (W.Run run in runsToRemove)
{
run.Remove();
}
}
I tried to simplify the document with PowerTools, but the result was a corrupted Word file. So I wrote this routine to simplify only the field codes that have specific names; it works in all parts of the document (main document part, headers and footers):
internal static void SimplifyFieldCodes(WordprocessingDocument document)
{
var masks = new string[] { Constants.VAR_MASK, Constants.INP_MASK, Constants.TBL_MASK, Constants.IMG_MASK, Constants.GRF_MASK };
SimplifyFieldCodesInElement(document.MainDocumentPart.RootElement, masks);
foreach (var headerPart in document.MainDocumentPart.HeaderParts)
{
SimplifyFieldCodesInElement(headerPart.Header, masks);
}
foreach (var footerPart in document.MainDocumentPart.FooterParts)
{
SimplifyFieldCodesInElement(footerPart.Footer, masks);
}
}
internal static void SimplifyFieldCodesInElement(OpenXmlElement element, string[] regexpMasks)
{
foreach (var run in element.Descendants<Run>()
.Select(item => (Run)item)
.ToList())
{
var fieldChar = run.Descendants<FieldChar>().FirstOrDefault();
if (fieldChar != null && fieldChar.FieldCharType == FieldCharValues.Begin)
{
string fieldContent = "";
List<Run> runsInFieldCode = new List<Run>();
var currentRun = run.NextSibling();
while ((currentRun is Run) && currentRun.Descendants<FieldCode>().FirstOrDefault() != null)
{
var currentRunFieldCode = currentRun.Descendants<FieldCode>().FirstOrDefault();
fieldContent += currentRunFieldCode.InnerText;
runsInFieldCode.Add((Run)currentRun);
currentRun = currentRun.NextSibling();
}
// If there is more than one Run for the FieldCode, and is one we must change, set the complete text in the first Run and remove the rest
if (runsInFieldCode.Count > 1)
{
// Check the field code to see whether it is one we must simplify (so that TOC, PAGEREF, etc. are left unchanged)
bool applyTransform = false;
foreach (string regexpMask in regexpMasks)
{
Regex regex = new Regex(regexpMask);
Match match = regex.Match(fieldContent);
if (match.Success)
{
applyTransform = true;
break;
}
}
if (applyTransform)
{
var currentRunFieldCode = runsInFieldCode[0].Descendants<FieldCode>().FirstOrDefault();
currentRunFieldCode.Text = fieldContent;
runsInFieldCode.RemoveAt(0);
foreach (Run runToRemove in runsInFieldCode)
{
runToRemove.Remove();
}
}
}
}
}
}
Hope this helps!!!
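If the goal is only to read the field name rather than to rewrite the document, the same "concatenate the fragments first" idea can be applied to plain strings. The sketch below assumes the `FieldCode` texts of one field have already been collected in document order; `FieldCodeText` and `GetMergeFieldName` are hypothetical names for this illustration and are not part of the OpenXml SDK or PowerTools.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldCodeText
{
    // Joins the fragments of one field code and returns the MERGEFIELD name,
    // or null when the joined text is not a MERGEFIELD instruction.
    public static string GetMergeFieldName(IEnumerable<string> fragments)
    {
        // Word may split the instruction anywhere, so match against the joined text only.
        string instruction = string.Concat(fragments);

        // The name is either a quoted string or a run of non-whitespace characters.
        Match match = Regex.Match(
            instruction,
            @"^\s*MERGEFIELD\s+(""[^""]+""|\S+)",
            RegexOptions.IgnoreCase);

        if (!match.Success) return null;
        return match.Groups[1].Value.Trim('"');
    }
}
```

For the split example from the question, the fragments `" MERGEFIELD Aut"` and `"hor \* MERGEFORMAT"` join to one instruction, and the method returns `Author`.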
I use iText5 for .NET to extract text from a PDF, using the code below.
private void button1_Click(object sender, EventArgs e)
{
PdfReader reader2 = new PdfReader("Scharfetter1969.pdf");
int pagen = reader2.NumberOfPages;
reader2.Close();
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
for (int i = 1; i < 2; i++)
{
textBox1.Text = "";
PdfReader reader = new PdfReader("Scharfetter1969.pdf");
String s = PdfTextExtractor.GetTextFromPage(reader, i, its);
s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
textBox1.Text = s;
reader.Close();
}
}
But I want to get the bibliographic data from a research paper PDF.
Here is an example of the data extracted from this PDF (in EndNote format):
%0 Journal Article
%T Repeated temperature modulation epitaxy for p-type doping and light-emitting diode based on ZnO
%A Tsukazaki, A.
%A Ohtomo, A.
%A Onuma, T.
%A Ohtani, M.
%A Makino, T.
%A Sumiya, M.
%A Ohtani, K.
%A Chichibu, S.F.
%A Fuke, S.
%A Segawa, Y.
%J Nature Materials
%V 4
%N 1
%P 42-46
%@ 1476-1122
%D 2004
%I Nature Publishing Group
But remember that this bibliographic information is not available in the metadata of this PDF. I want to access the article type (%0), title (%T), authors (%A), date (%D) and publisher (%I), and show each one in its own text box on a Windows Form.
I am using C#. Does anyone have code for this, or can anyone guide me on how to do it?
PDF is a one-way format. You put data in so that it renders consistently on all devices (monitors, printers, etc) but the format was never intended to pull data back out. Any and all attempts to do that will be pure guess work. iText's PdfTextExtractor works but you are going to have to piece things together based on your own arbitrary set of rules, and these rules will probably change from PDF to PDF. The supplied PDF was created by InDesign which does such a great job of making text look good that it actually makes it even harder to parse the data back out.
That said, if your PDFs are all visually consistent, you could try to pull the data out while retaining formatting and use the formatting rules to guess what is what. That post will get you some HTML formatting that you could guess at. (If this actually works I'd recommend returning something more specific than HTML but I'll leave that up to you.)
Running it against your supplied PDF shows that the title is using the font HelveticaNeue-LightExt at about 17pts so you could write a rule to look for all lines that use that font at that size and combine them together. Authors are done in HelveticaNeue-Condensed at about 10pts so that's another rule.
The code below is a modified version of the one linked to above. It's a fully working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It pulls out the title and authors for the supplied PDF, but you'll need to tweak it for other PDFs and metadata. See the comments in the code for specific implementation details.
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf.parser;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
PdfReader reader = new PdfReader(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "nmat4-42.pdf"));
TextWithFontExtractionStategy S = new TextWithFontExtractionStategy();
string F = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1, S);
//Buffers to hold various parts from the PDF
List<string> titles = new List<string>();
List<string> authors = new List<string>();
//Array of lines of text
string[] lines = F.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
//Temporary string
string t;
//Loop through each line in the array
foreach (string line in lines)
{
//See if the line looks like a "title"
if (line.Contains("HelveticaNeue-LightExt") && line.Contains("font-size:17.28003"))
{
//Remove the HTML tags
titles.Add(System.Text.RegularExpressions.Regex.Replace(line, "</?span.*?>", "").Trim());
}
//See if the line looks like an "author"
else if (line.Contains("HelveticaNeue-Condensed") && line.Contains("font-size:9.995972"))
{
//Remove the HTML tags and trim extra characters
t = System.Text.RegularExpressions.Regex.Replace(line, "</?span.*?>", "").Trim(new char[] { ' ', ',', '*' });
//Make sure we have a valid name, probably need some more exceptions here, too
if (!string.IsNullOrWhiteSpace(t) && t != "AND")
{
authors.Add(t);
}
}
}
//Write out the title to the console
Console.WriteLine("Title : {0}", string.Join(" ", titles.ToArray()));
//Write out each author
foreach (string author in authors)
{
Console.WriteLine("Author : {0}", author);
}
Console.WriteLine(F);
this.Close();
}
public class TextWithFontExtractionStategy : iTextSharp.text.pdf.parser.ITextExtractionStrategy
{
//HTML buffer
private StringBuilder result = new StringBuilder();
//Store last used properties
private Vector lastBaseLine;
private string lastFont;
private float lastFontSize;
//http://api.itextpdf.com/itext/com/itextpdf/text/pdf/parser/TextRenderInfo.html
private enum TextRenderMode
{
FillText = 0,
StrokeText = 1,
FillThenStrokeText = 2,
Invisible = 3,
FillTextAndAddToPathForClipping = 4,
StrokeTextAndAddToPathForClipping = 5,
FillThenStrokeTextAndAddToPathForClipping = 6,
AddTextToPaddForClipping = 7
}
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
{
string curFont = renderInfo.GetFont().PostscriptFontName;
//Check if faux bold is used
if ((renderInfo.GetTextRenderMode() == (int)TextRenderMode.FillThenStrokeText))
{
curFont += "-Bold";
}
//This code assumes that if the baseline changes then we're on a newline
Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();
Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(curBaseline[Vector.I1], curBaseline[Vector.I2], topRight[Vector.I1], topRight[Vector.I2]);
Single curFontSize = rect.Height;
//See if something has changed, either the baseline, the font or the font size
if ((this.lastBaseLine == null) || (curBaseline[Vector.I2] != lastBaseLine[Vector.I2]) || (curFontSize != lastFontSize) || (curFont != lastFont))
{
//if we've put down at least one span tag close it
if ((this.lastBaseLine != null))
{
this.result.AppendLine("</span>");
}
//If the baseline has changed then insert a line break
if ((this.lastBaseLine != null) && curBaseline[Vector.I2] != lastBaseLine[Vector.I2])
{
this.result.AppendLine("<br />");
}
//Create an HTML tag with appropriate styles
this.result.AppendFormat("<span style=\"font-family:{0};font-size:{1}\">", curFont, curFontSize);
}
//Append the current text
this.result.Append(renderInfo.GetText());
//Set currently used properties
this.lastBaseLine = curBaseline;
this.lastFontSize = curFontSize;
this.lastFont = curFont;
}
public string GetResultantText()
{
//If we wrote anything then we'll always have a missing closing tag so close it here
if (result.Length > 0)
{
result.Append("</span>");
}
return result.ToString();
}
//Not needed
public void BeginTextBlock() { }
public void EndTextBlock() { }
public void RenderImage(ImageRenderInfo renderInfo) { }
}
}
}
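The rules above match the font size as an exact string (for example "font-size:17.28003"), which is fragile if another PDF reports 17.28 or 17.3. A possible refinement, shown here only as a sketch, is to parse the span's style and compare the size with a tolerance. `SpanStyle` and `Matches` are hypothetical names; the sketch assumes the sizes were written with a '.' decimal separator and it inspects only the first span on a line.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SpanStyle
{
    // Matches the opening tag produced by the extraction strategy above.
    private static readonly Regex SpanPattern = new Regex(
        "<span style=\"font-family:(?<font>[^;]+);font-size:(?<size>[^\"]+)\">",
        RegexOptions.Compiled);

    // Returns true when the first span in 'line' uses exactly the given font and
    // its size is within 'tolerance' points of 'expectedSize'.
    public static bool Matches(string line, string font, float expectedSize, float tolerance = 0.1f)
    {
        Match match = SpanPattern.Match(line);
        if (!match.Success) return false;

        float size;
        if (!float.TryParse(match.Groups["size"].Value, NumberStyles.Float,
                            CultureInfo.InvariantCulture, out size))
        {
            return false; // size was not a plain number with '.' as the separator
        }

        return match.Groups["font"].Value == font
            && Math.Abs(size - expectedSize) <= tolerance;
    }
}
```

The title rule could then read `SpanStyle.Matches(line, "HelveticaNeue-LightExt", 17.3f)` and the author rule `SpanStyle.Matches(line, "HelveticaNeue-Condensed", 10f)`. Unlike `Contains`, the font comparison here is exact, so a faux-bold variant with the "-Bold" suffix would need its own rule.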