OpenXmlPowerTools.TextReplacer.SearchAndReplace is not working on word document - c#

I'm working on a simple document merge. I want to find some strings (outside any table) and replace them with other strings.
The issue is this: when I call TextReplacer.SearchAndReplace after accessing the tables with var table = wordDoc.MainDocumentPart.Document.Body.Elements<DocumentFormat.OpenXml.Wordprocessing.Table>();, SearchAndReplace does not work. I don't know what the issue is.
Example code:
private static async Task MergeDoc(WordprocessingDocument wordDoc) {
var table = wordDoc.MainDocumentPart.Document.Body.Elements<DocumentFormat.OpenXml.Wordprocessing.Table>();
TextReplacer.SearchAndReplace(wordDoc, "string to replace", "value", true);
}
If I remove the table variable, which is actually a reference to the tables in the Word document, then SearchAndReplace works.

Related

Reading specific columns from DBF (Visual FoxPro) file in C#

I have been using DbfDataReader to read DBF files in my C# application. So far, I can read column name, column index, and iterate through the records successfully. There does not appear to be a way to read specific column data I'd like without using the column index. For example, I can get at the FIRSTNAME value with a statement like:
using DbfDataReader;
var dbfPath = "/CONTACTS.DBF";
using (var dbfTable = new DbfTable(dbfPath, EncodingProvider.UTF8))
{
var dbfRecord = new DbfRecord(dbfTable);
while (dbfTable.Read(dbfRecord))
{
Console.WriteLine(dbfRecord.Values[1].ToString()); // would prefer to use something like dbfRecord.Values["FIRSTNAME"].ToString()
Console.WriteLine(dbfRecord.Values[2].ToString()); // would prefer to use something like dbfRecord.Values["LASTNAME"].ToString()
}
}
Here 1 is the index of the FIRSTNAME column and 2 is the index of the LASTNAME column. Is there any way to use "FIRSTNAME" (or the column name) as the key (or accessor) for what is essentially a name/value pair? My goal is to get all of the columns I care about without having to first build this map each time. (Please forgive me if the terms I am using are not exactly right.)
Thanks so much for taking a look at this...
Use the DbfDataReader class as below:
var dbfPath = "/CONTACTS.DBF";
var options = new DbfDataReaderOptions
{
SkipDeletedRecords = true,
Encoding = EncodingProvider.UTF8
};
using (var dbfDataReader = new DbfDataReader.DbfDataReader(dbfPath, options))
{
while (dbfDataReader.Read())
{
Console.WriteLine(dbfDataReader["FIRSTNAME"]);
Console.WriteLine(dbfDataReader["LASTNAME"]);
}
}

HtmlAgilityPack only parses table rows partially

I'm trying to parse the main <table> (the last one in the DOM tree) on this website: "https://aips.um.si/PredmetiBP5/Main.asp?Mode=prg&Zavod=77&Jezik=&Nac=1&Nivo=P&Prg=1571&Let=1"
I'm using HtmlAgilityPack and writing the code in C# in a WPF application in Visual Studio 17.
Right now I'm using this code:
iso = Encoding.GetEncoding("windows-1250");
web = new HtmlWeb()
{
AutoDetectEncoding = false,
OverrideEncoding = iso,
};
//http = https://aips.um.si/PredmetiBP5/Main.asp?Mode=prg&Zavod=77&Jezik=&Nac=1&Nivo=P&Prg=1571&Let=1
string http = formatLetnikLink(l.Attributes["onclick"].Value).ToString();
var htmlProgDoc = web.Load(http);
string s = htmlProgDoc.ParsedText;
htmlProgDoc.ParsedText correctly includes all the rows that are supposed to be in the last table. (I had this for debugging, just in case the watch window was broken or something.)
I first tried to get all the tables on the website, and realized that there are 6 <table></table> tags on it, even though you visually see only one. After debugging for a couple of hours, I realized that the main table is the last <table> in the DOM tree, and that the parser is not fully parsing all the <tr> tags that the table has. This is the problem: I need all the <tr> tags.
var tables = htmlProgDoc.DocumentNode.SelectNodes("//table");
There are 6 <table></table> tags, as expected, and every one of them is fully parsed, including all its rows and columns, except the last one. For the last one, it only parses the first two rows, and then the parser appears to append a </table> by itself. I also tried using the direct XPath selector copied from Firefox, "/html/body/div/center[2]/font/font/font/table", instead of "//table". It found the correct table, but the table also contained only the first 2 rows.
var theTableINeed = tables.Last();
//contains the correct table which I need, but with only the first two rows
The HTML on that page is malformed. One possible workaround is to extract the markup of the last table and parse it as a separate document.
var client = new WebClient();
string html = client.DownloadString(url);
int lastTableOpen = html.LastIndexOf("<table");
int lastTableClose = html.LastIndexOf("</table");
string lastTable = html.Substring(lastTableOpen, lastTableClose - lastTableOpen + 8);
Then use HtmlAgilityPack:
var table = new HtmlDocument();
table.LoadHtml(lastTable);
foreach (var row in table.DocumentNode.SelectNodes("//table//tr"))
{
// OuterHtml prints the row's markup; use row.InnerText for the text only.
Console.WriteLine(row.OuterHtml);
}
But I don't know if there are problems in the table itself.

Duplication with OpenXML (word document) and ID issues

Is it possible to duplicate a Word document element with OpenXML without running into "duplicate id" issues?
Currently, to duplicate, I clone the elements inside the body and append the cloned elements to the body. But if any of the elements has an ID, I get errors when I open the document in Word.
Here is an example of an error from the OpenXML validator:
[60] Description="Attribute 'id' should have unique value. Its
current value 'Rectangle 11' duplicates with
others."
And here is my code :
Document document = wordDocument.MainDocumentPart.Document;
Body body = document.Body;
IEnumerable<OpenXmlElement> elements = ((Body)body.CloneNode(true)).Elements();
foreach (var element in elements)
{
OpenXmlElement e = (OpenXmlElement)element.CloneNode(true);
body.AppendChild(e);
}
You can't just copy elements that have an id; you have to duplicate the parts too (search for OpenXmlPart for more information).
You can do this by combining the AddPart() and GetIdOfPart() functions (accessible from MainDocumentPart).
First approach:
When you have an element with an id, use AddPart(OpenXmlPart part) to add the element's part, and retrieve the newly generated id of the part with GetIdOfPart(OpenXmlPart part).
After that, you can replace the id in your cloned OpenXmlElement with the new one.
Second approach:
Alternatively, you could do it this way:
Check the highest id of the existing parts (and save it).
Clone all parts from the start and choose the ids yourself (by adding the saved highest id).
When you copy each element and find an id, add the saved highest id so that it matches the new part.
I hope one of these approaches helps, but in any case you will need to clone the parts.
DocIO is a .NET class library that can read, write and render Microsoft Word documents. Using DocIO, you can clone the elements such as paragraph, table, text run or the entire document and append it where you need.
The whole suite of controls is available for free (commercial applications also) through the community license program if you qualify. The community license is the full product with no limitations or watermarks.
Below is a simple example code snippet that clones all the paragraphs and tables in the document body and appends them at the end of the same document.
using Syncfusion.DocIO.DLS;
namespace DocIO_Clone
{
class Program
{
static void Main(string[] args)
{
using (WordDocument document = new WordDocument(@"InputWordFile.docx"))
{
int sectionCount = document.Sections.Count;
for (int i = 0; i < sectionCount; i++)
{
IWSection section = document.Sections[i];
int entityCount = section.Body.ChildEntities.Count;
for (int j = 0; j < entityCount; j++)
{
IEntity entity = section.Body.ChildEntities[j];
switch(entity.EntityType)
{
case EntityType.Paragraph:
IWParagraph paragraph = entity.Clone() as IWParagraph;
document.LastSection.Body.ChildEntities.Add(paragraph);
break;
case EntityType.Table:
IWTable table = entity.Clone() as IWTable;
document.LastSection.Body.ChildEntities.Add(table);
break;
}
}
}
document.Save("ResultDocument.docx");
}
}
}
}
For further information, please refer to our help documentation.
Note: I work for Syncfusion

Aspose.Words - MailMerge images

I am trying to loop through a Dataset, creating a page per item using Aspose.Words Mail-Merge functionality. The below code is looping through a Dataset - and passing some values to the Mail-Merge Execute function.
var blankDocument = new Document();
var pageDocument = new Document(sFilename);
...
foreach (DataRow row in ds.Tables[0].Rows){
var sBarCode = row["BarCode"].ToString();
var imageFilePath = HttpContext.Current.Server.MapPath("\\_temp\\") + sBarCode + ".png";
var tempDoc = (Document)pageDocument.Clone(true);
var fieldNames = new string[] { "Test", "Barcode" };
var fieldData = new object[] { imageFilePath, imageFilePath };
tempDoc.MailMerge.Execute(fieldNames, fieldData);
blankDocument.AppendDocument(tempDoc, ImportFormatMode.KeepSourceFormatting);
}
var stream = new MemoryStream();
blankDocument.Save(stream, SaveFormat.Docx);
// I then output this stream using headers,
// to cause the browser to download the document.
The mail merge item { MERGEFIELD Test } gets the correct data from the Dataset. However the actual image displays page 1's image on all pages using:
{ INCLUDEPICTURE "{MERGEFIELD Barcode }" \* MERGEFORMAT \d }
Say this is my data for the "Barcode" field:
c:\img1.png
c:\img2.png
c:\img3.png
Page one of this document displays c:\img1.png in text for the "Test" field, and the image that is shown is img1.png.
However Page 2 shows c:\img2.png as the text, but displays img1.png as the actual image.
Does anyone have any insight on this?
Edit: It seems this is more of a Word issue. When I toggle between Alt+F9 modes inside Word, the image actually displays c:\img1.png as the source. So that would be why it is being displayed on every page.
I've simplified it to:
{ INCLUDEPICTURE "{MERGEFIELD Barcode }" \d }
I also added test data for this field inside Word's Mailings Recipient List. When I preview, it doesn't pull in the data and change the image. So this is the root problem.
I know this is an old question, but I would still like to answer it.
With Aspose.Words it is very easy to insert images when executing a mail merge. To achieve this, simply use a merge field with a special name, like Image:MyImageFieldName.
https://docs.aspose.com/words/net/insert-checkboxes-html-or-images-during-mail-merge/#how-to-insert-images-from-a-database
Also, it is not necessary to loop through the rows in your dataset and execute a mail merge for each row. Simply pass the whole data into the MailMerge.Execute method and Aspose.Words will duplicate the template for each record in the data.
Here is a simple example of such a template
After executing mail merge using the following code:
// Create dummy data.
DataTable dt = new DataTable();
dt.Columns.Add("FirstName");
dt.Columns.Add("LastName");
dt.Columns.Add("MyImage");
dt.Rows.Add("John", "Smith", @"C:\Temp\1.png");
dt.Rows.Add("Jane", "Smith", @"C:\Temp\2.png");
// Open template, execute mail merge and save the result.
Document doc = new Document(@"C:\Temp\in.docx");
doc.MailMerge.Execute(dt);
doc.Save(@"C:\Temp\out.docx");
The result will look like the following:
Disclosure: I work on the Aspose.Words team.
If this were Word doing the output (I'm not sure about Aspose), there would be two possible problems here.
First, INCLUDEPICTURE expects backslashes to be doubled up, e.g. "c:\\img2.png", or (somewhat less reliably) forward slashes, or Mac ":" separators on that platform. It may be OK if the data comes in via a field result as you are doing here, though.
Second, INCLUDEPICTURE results have not updated automatically, "by design", since Microsoft modified a bunch of field behaviors for security reasons about 10 years ago. If you are merging to an output document, you can probably work around that by using the following nested fields:
{ INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } }
or to remove the fields in the result document,
{ IF { INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } } {
INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } } }
All the { } need to be inserted with Ctrl+F9 in the usual way.
(Don't ask me where this use of "TRUE" is documented - as far as I know, it is not.)

C# reusable library to treat a text file like a database table?

I would like to store values in a text file as comma- or tab-separated values (it doesn't matter which).
I am looking for a reusable library that can manipulate the data in this text file as if it were a SQL table.
I need select * from... and delete from where ID = ...... (ID will be the first column in the text file).
Is there some CodePlex project that does this kind of thing?
I do not need complex functionality like joining or relationships. I will just have 1 text file, which will become 1 database table.
SQLite
:)
Use LINQ to CSV.
http://www.codeproject.com/KB/linq/LINQtoCSV.aspx
http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-CSV-using-DynamicObject.aspx
If it's not CSV:
Let your file hold one record per line. At runtime, read each record into a collection of type Record (assuming Record is a custom class representing an individual record). You can then do LINQ operations on the collection and write the collection back to the file.
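The approach described above can be sketched roughly as follows. This is only a minimal illustration, not a library: Record and TextTable are hypothetical names made up for this example, the first column is assumed to be an integer ID, and the plain Split does not handle quoted fields that contain the separator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record type: the first column is the ID,
// the remaining columns are kept as raw strings.
public sealed class Record
{
    public int Id { get; set; }
    public string[] Fields { get; set; } = Array.Empty<string>();
}

// Hypothetical helper (not part of any library): treats one
// delimited text file as one table.
public static class TextTable
{
    // Rough equivalent of "select * from ...":
    // read every non-empty line into a Record.
    public static List<Record> SelectAll(string path, char separator = ',')
    {
        return File.ReadLines(path)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(separator))
            .Select(parts => new Record
            {
                Id = int.Parse(parts[0]),
                Fields = parts.Skip(1).ToArray()
            })
            .ToList(); // materialize so the file is closed before any rewrite
    }

    // Rough equivalent of "delete from ... where ID = id":
    // filter in memory, then rewrite the whole file.
    // Returns the number of rows removed.
    public static int DeleteById(string path, int id, char separator = ',')
    {
        List<Record> rows = SelectAll(path, separator);
        int removed = rows.RemoveAll(r => r.Id == id);
        if (removed > 0)
        {
            File.WriteAllLines(path, rows.Select(r =>
                string.Join(separator.ToString(),
                    new[] { r.Id.ToString() }.Concat(r.Fields))));
        }
        return removed;
    }
}
```

With this sketch, TextTable.SelectAll("database.txt") would return every row, and TextTable.DeleteById("database.txt", 2) would drop the rows whose first column is 2 and rewrite the file. Rewriting the whole file on each delete is simple but only reasonable for small files.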
Use ODBC. There is a Microsoft Text Driver for CSV files, so I think this should be possible. I haven't tested whether you can modify the data via ODBC, but you can test it easily.
For querying you can also use linq.
Have you looked at the FileHelpers library? It has the capability of reading and parsing a text file into CLR objects. Combining that with the power of something like LINQ to Objects, you have the functionality you need.
using System.Collections.Generic;
using System.IO;
using System.Linq;
public class Item
{
public int ID { get; set; }
public string Type { get; set; }
public string Instance { get; set; }
}
class Program
{
static void Main(string[] args)
{
string[] lines = File.ReadAllLines("database.txt");
var list = lines
.Select(l =>
{
var split = l.Split(',');
return new Item
{
ID = int.Parse(split[0]),
Type = split[1],
Instance = split[2]
};
});
Item secondItem = list.Where(item => item.ID == 2).Single();
List<Item> newList = list.ToList<Item>();
newList.RemoveAll(item => item.ID == 2);
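// Overwrite database.txt with the remaining rows from newList.
File.WriteAllLines("database.txt", newList.Select(i => i.ID + "," + i.Type + "," + i.Instance));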
}
}
What about deleting data? LINQ is for querying, not for manipulation.
However, List<T> provides a predicate-based RemoveAll that does what you want:
newList.RemoveAll(item => item.ID == 2);
You can also take a look at a more advanced solution, "LINQ to Text or CSV Files".
I would also suggest using ODBC. This should not complicate deployment: the whole configuration can be set in the connection string, so you do not need a DSN.
Together with a schema.ini file you can even set column names and data types; check this KB article from MS.
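As a rough illustration of the schema.ini idea, a file like the one below could sit in the same folder as the data file. It assumes the file is called database.txt, is comma-separated, and has no header row; the column names are borrowed from the Item class shown earlier in this thread and are only placeholders.

```ini
[database.txt]
Format=CSVDelimited
ColNameHeader=False
Col1=ID Long
Col2=Type Text
Col3=Instance Text
```

A DSN-less connection string for the text driver typically looks like Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\data\; where Dbq points at the folder (C:\data\ is a made-up path here), and the file name is then used as the table name in queries. As far as I know, the text driver is quite limited when it comes to modifying data, so it is worth checking that delete from actually works before relying on it.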
SQLite or LINQ to text and CSV :)
