Get text above table MS Word - c#

This one is probably a little stupid, but I really need it. I have a document with 5 tables, and each table has a heading. The heading is regular text with no special styling. I need to extract the data from those tables plus the heading of each.
Currently, using MS interop I was able to iterate through each cell of each table using something like this:
app.Tables[1].Cell(2, 2).Range.Text;
But now I'm struggling to figure out how to get the text right above the table.
Here's a screenshot:
For the first table I need to get "I NEED THIS TEXT" and for the second table I need to get "And this one also please".
So, basically, I need the last paragraph before each table. Any suggestions on how to do this?

Mellamokb's answer gave me a hint and a good example of how to search through paragraphs. While implementing his solution I came across the method "Previous", which does exactly what we need. Here's how to use it:
wd.Tables[1].Cell(1, 1).Range.Previous(WdUnits.wdParagraph, 2).Text;
Previous accepts two parameters. The first is the unit you want to move by, taken from this list: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdunits.aspx
The second is how many units to count back. In my case 2 worked. I expected 1 to be right, because the heading is directly before the table, but with 1 I got a strange special character: ♀, which looks like the female symbol.
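One possible explanation for the ♀, stated as an assumption since the thread never confirms it: the text Word returns in Range.Text carries control characters (\r for the paragraph mark, \a at the end of a cell, \f for a page or section break), and many consoles draw character 12 (\f) as ♀. If that is what happened here, the paragraph one step back held only a break. Below is a minimal sketch of a helper that strips those markers before the text is used. WordText and Clean are hypothetical names, not part of the interop API:

```csharp
static class WordText
{
    // Characters trimmed from both ends of the text:
    // ordinary whitespace plus '\r' (paragraph mark), '\a' (end-of-cell marker),
    // '\f' (page or section break) and '\v' (manual line break).
    private static readonly char[] Markers =
        { ' ', '\t', '\n', '\r', '\a', '\f', '\v' };

    // Returns the visible text, or an empty string for null or marker-only input.
    public static string Clean(string raw)
    {
        return (raw ?? string.Empty).Trim(Markers);
    }
}
```

With it, the heading lookup would read WordText.Clean(wd.Tables[1].Cell(1, 1).Range.Previous(WdUnits.wdParagraph, 2).Text).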

You might try something along the lines of this. I compare the paragraphs to the first cell of the table, and when there's a match, grab the previous paragraph as the table header. Of course this only works if the first cell of the table contains a unique paragraph that would not be found in another place in the document:
var tIndex = 1;
var tCount = oDoc.Tables.Count;
var tblData = oDoc.Tables[tIndex].Cell(1, 1).Range.Text;
var pCount = oDoc.Paragraphs.Count;
var prevPara = "";
for (var i = 1; i <= pCount; i++) {
    var para = oDoc.Paragraphs[i];
    var paraData = para.Range.Text;
    if (paraData == tblData) {
        // this paragraph is at the beginning of the table, so grab previous paragraph
        Console.WriteLine("Header: " + prevPara);
        tIndex++;
        if (tIndex <= tCount)
            tblData = oDoc.Tables[tIndex].Cell(1, 1).Range.Text;
        else
            break;
    }
    prevPara = paraData;
}
Sample Output:
Header: I NEED THIS TEXT
Header: AND THIS ONE also please

Conditional new Break for multi-column docx file, C#

This is a follow-up question for Creating Word file from ObservableCollection with C#.
I have a .docx file with a Body that has 2 columns for its SectionProperties. I have a dictionary of foreign words with their translation. On each line I need [Word] = [Translation] and whenever a new letter starts it should be in its own line, with 2 or 3 line breaks before and after that letter, like this:
A
A-word = translation
A-word = translation
B
B-word = translation
B-word = translation
...
I structured this in a for loop, so that in every iteration I'm creating a new paragraph with a possible Run for the letter (if a new one starts), a Run for the word and a Run for the translation. So the Run with the first letter is in the same Paragraph as the word and translation Run and it appends 2 or 3 Break objects before and after the Text.
In doing so the second column can sometimes start with 1 or 2 empty lines. Or the first column on the next page can start with empty lines.
This is what I want to avoid.
So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break? Or, can I format the Column itself so that it doesn't start with an empty line?
I have tried putting the letter Run in a separate, optional, Paragraph, but again, I find myself having to input line breaks and the problem remains.
In the spirit of my other answer you can extend the template capability.
Use the Productivity tool to generate a single page break object, something like:
private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));
Make a helper method that finds containers of a text tag:
public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
    where T : OpenXmlElement
{
    var regex = new Regex(tagRegex);
    // take the leaf elements whose text matches the tag and map them to their containers of type T
    return searchParent.Descendants()
        .Where(e => !(e is OpenXmlCompositeElement)
                    && regex.IsMatch(e.InnerText))
        .SelectMany(e =>
            e.Ancestors()
             .OfType<T>()
             .Union(e is T ? new T[] { (T)e } : new T[] { }))
        .ToList(); // can skip, prevents reevaluations
}
And another one that duplicates a range from the document and deletes range:
public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
    where T : OpenXmlElement
{
    // tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
    // or [page] [/page] - or whatever pattern you choose
    var tagElements = FindElements<T>(root, tagRegex);
    OpenXmlElement fromEl = tagElements.First();
    OpenXmlElement toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el
    // you may want to find a common parent here
    // I'll assume you've prepared the template so the elements are siblings.
    var result = new List<OpenXmlElement>();
    var insertAfter = toEl; // each copy goes after the previous one so the order is kept
    var step = fromEl.NextSibling();
    while (step != null && toEl != null && step != toEl) {
        // another method called DeleteRange will instead delete elements in that range within this loop
        var copy = step.CloneNode(true); // deep copy
        insertAfter = insertAfter.InsertAfterSelf(copy);
        result.Add(copy);
        step = step.NextSibling();
    }
    return result;
}
public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement) {
    // InnerText cannot be assigned, so work on the leaf text elements and set Text instead
    var replaceElements = FindElements<OpenXmlLeafTextElement>(parent, tagRegex);
    var regex = new Regex(tagRegex);
    foreach (var el in replaceElements) {
        el.Text = regex.Replace(el.Text, replacement);
    }
    return replaceElements;
}
Now you can have a document that looks like this:
[page]
[TitleLetter]
[WordTemplate][Word]: [Translation] [/WordTemplate]
[pageBreak]
[/page]
With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:
// letter -> (word -> translation); fill this from your own data
var vocabulary = new Dictionary<char, Dictionary<string, string>>();
// NOTE: the type arguments below were lost in the original post; Paragraph is an assumption
foreach (var letter in vocabulary.Keys.OrderByDescending(c => c)) {
    // in reverse order because the copy range comes after the template range
    var pageTemplate = DuplicateRange<Paragraph>(wordDocument, "\\[/?page\\]");
    foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()) {
        ReplaceTag(p, "\\[TitleLetter\\]", "" + letter);
        var pageBr = ReplaceTag(p, "\\[pageBreak\\]", "");
        foreach (var pbr in pageBr) {
            // put the break after the paragraph that held the tag, not inside its run
            var host = pbr as Paragraph ?? pbr.Ancestors<Paragraph>().FirstOrDefault();
            host?.InsertAfterSelf(PageBreakPara.CloneNode(true));
        }
        var wordTemplateFound = FindElements<Paragraph>(p, "\\[/?WordTemplate\\]");
        if (wordTemplateFound.Any()) {
            foreach (var word in vocabulary[letter].Keys) {
                var wordTemplate = DuplicateRange<Paragraph>(p, "\\[/?WordTemplate\\]")
                    .OfType<OpenXmlCompositeElement>()
                    .First(); // since it's a single paragraph template
                ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]", "");
                ReplaceTag(wordTemplate, "\\[Word\\]", word);
                ReplaceTag(wordTemplate, "\\[Translation\\]", vocabulary[letter][word]);
            }
        }
    }
}
...Or something like it.
Look into SdtElements if things start getting too complicated.
Don't use AltChunk despite the popularity of that answer: it requires Word to open and process the file, so you can't use a library to make a PDF out of it.
Word documents are messy. The solution above should work (I haven't tested it), but the template must be carefully crafted, so back up your template often.
Making a robust document engine isn't easy (since Word is messy); do the minimum you need and rely on the template being under your control (not user-editable).
The code above is far from optimized or streamlined. I've tried to condense it into the smallest footprint possible at the cost of presentability. There are probably bugs too :)

Adding numbers from two data frames in Deedle using multi key index

I am new to Deedle. I searched everywhere looking for examples that can help me to complete the following task:
Index data frame using multiple columns (3 in the example - Date, ID and Title)
Add numeric columns in multiple data frames together (Sales column in the example)
Group and add together sales occurred on the same day
My current approach is given below. First of all - it does not work because of the missing values and I don't know how to handle them easily while adding data frames. Second - I wonder if there is a better more elegant way to do it.
// Remove unused columns
var df = dfRaw.Columns[new[] { "Date", "ID", "Title", "Sales" }];
// Index data frame using 3 columns
var dfIndexed = df.IndexRowsUsing(r => Tuple.Create(r.GetAs<DateTime>("Date"), r.GetAs<string>("ID"), r.GetAs<string>("Title")) );
// Remove indexed columns
dfIndexed.DropColumn("Date");
dfIndexed.DropColumn("ID");
dfIndexed.DropColumn("Title");
// Add data frames. Does not work as it will add only
// keys existing in both data frames
dfTotal += dfIndexed;
Table 1
Date,ID,Title,Sales,Market
2014-03-01,ID1,Title1,1,US
2014-03-01,ID1,Title1,2,CA
2014-03-03,ID2,Title2,3,CA
Table 2
Date,ID,Title,Sales,Market
2014-03-02,ID1,Title1,2,US
2014-03-03,ID2,Title2,2,CA
Expected Results
Date,ID,Title,Sales
2014-03-01,ID1,Title1,3
2014-03-02,ID1,Title1,2
2014-03-03,ID2,Title2,5
I think that your approach with using tuples makes sense.
It is a bit unfortunate that there is no easy way to specify default values when adding!
The easiest solution I can think of is to realign both series to the same set of keys and use fill operation to provide defaults. Using simple series as an example, something like this should do the trick:
var allKeys = series1.Keys.Union(series2.Keys);
var aligned1 = series1.Realign(allKeys).FillMissing(0.0);
var aligned2 = series2.Realign(allKeys).FillMissing(0.0);
var res = aligned1 + aligned2;
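To make the align-then-fill idea concrete without relying on Deedle, here is a minimal sketch on plain dictionaries. It is not Deedle API: SalesTotals, GroupSum and Add are hypothetical names, and the key is assumed to be a (Date, ID, Title) tuple as in the question. GroupSum first merges rows that share a key. Add then takes the union of the keys and treats a missing key as 0, which is the effect Realign followed by FillMissing(0.0) is meant to have.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SalesTotals
{
    // Collapses rows that share a key (same Date, ID and Title) into one total.
    public static Dictionary<TKey, double> GroupSum<TKey>(
        IEnumerable<(TKey Key, double Sales)> rows)
    {
        return rows.GroupBy(r => r.Key)
                   .ToDictionary(g => g.Key, g => g.Sum(r => r.Sales));
    }

    // Adds two keyed totals; a key missing on one side counts as 0.
    public static Dictionary<TKey, double> Add<TKey>(
        IReadOnlyDictionary<TKey, double> left,
        IReadOnlyDictionary<TKey, double> right)
    {
        return left.Keys.Union(right.Keys).ToDictionary(
            key => key,
            key => (left.TryGetValue(key, out var l) ? l : 0.0)
                 + (right.TryGetValue(key, out var r) ? r : 0.0));
    }
}
```

Fed with the two sample tables from the question, this produces totals of 3, 2 and 5 for the three keys, matching the expected results.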

JSON.NET XML to JSon

Whilst trying to work on something else, I stumbled across JSON.NET, and have a quick question regarding the results.
I have an XML field in SQL, which I return in a data reader. I then run it through the following:
XmlDocument doc = new XmlDocument();
doc.LoadXml(rdr.GetString(0));
en.Add(JsonConvert.SerializeXmlNode(doc));
en is a List<string>, as many rows could be returned. The JSON that is created is as follows, with the real data modified but the structure intact:
"{\"Entity\":{\"#xmlns:xsd\":\"http://www.w3.org/2001/XMLSchema\",\"#xmlns:xsi\":\"http://www.w3.org/2001/XMLSchema-instance\",\"AKA\":{\"string\":[\"Name 1\",\"Name 2\"]},\"Countries\":{\"string\":[\"UK\",\"US\"]},\"IdentNumbers\":{\"string\":[\"Date Set 2\",\"Data Set 1\",\"Data Set 3\",\"Data Set 4\"]},\"PercentageMatch\":\"94\"}}"
So if there were 3 entries, then msg.d would contain three values, as can be seen from the FireBug output below.
How do I loop through this information on the client side and present it in a table?
EDIT
So, for the table layout: any single item needs to have a heading and its associated value. For any item that has more than one value, I need the table to have a single heading with each value on a new line. Something similar to this:
Heading 1
Single Item Value
Heading 2
First Item Value \n
Second Item Value
Heading 2
Single Item Value
EDIT
OK, I'm kind of getting to where I want it. I've produced this:
success: function (msg) {
    var resultHtml = "";
    $.each(msg.d, function (i, entity) {
        //now entity will contain one row of data - you could access the following objects :
        //entity.AKA is an array with which you could loop with
        resultHtml += '<label><b>Countries:</b></label>';
        resultHtml += '<text>' + entity.Countries + '</text>';
        resultHtml += '<label><b>Ident:</b></label>';
        resultHtml += '<text>' + entity.IdentNumbers + '</text>';
        //etc
    });
}
This produces each heading in bold with its value underneath. What I now need to work out is how to show only one instance at a time and have pages to move through :-) Any ideas?
Using $.each, maybe? Here's the syntax:
$.each(msg.d, function(i, entity) {
    //now entity will contain one row of data - you could access the following objects :
    //entity.AKA is an array with which you could loop with
    //entity.Countries
    //entity.IdentNumbers
    //etc
});
Then you could construct that table in your each loop. If you give me more info on how you'd want to set up your table (the format), we could help you on that.
Here's a fiddle for you. Resize the output window and check the table : http://jsfiddle.net/hungerpain/9KBDg/

How to read from DataGrid Column Cells?

I've been struggling with this problem for about 2 days now and have searched many boards for a solution :(
Using LINQ, I wrote XML attributes into my DataGrid's column named "Betrag".
Now I want to get all of those entries and sum them up into one number (all entries of the column are numbers!).
I hope somebody can help me with this problem.
Best Regards,
Fabian
Now some code :
data = new List<Daten>();
data = (from datensatz in doc1.Descendants("datensatz")
        select new Daten
        {
            //datum = "27.6.2012",
            datum = datensatz.Attribute("datum").Value,
            //zweck = "Eröffnung",
            zweck = datensatz.Attribute("zweck").Value,
            //empfang = benutzer,
            empfang = datensatz.Attribute("empfang").Value,
            //betrag = "0€"
            betrag = datensatz.Attribute("betrag").Value + "€"
        }).ToList();
this.Daten.ItemsSource = data;
//THIS CODE ADDS THE ATTRIBUTES TO MY GRID
then I tried this :
kontostand += Convert.ToInt32(Daten.Columns[3].GetCellContent(1).ToString());
Why not just do something like this...
var sum = data.Sum(item=>item.betrag);//you might have to parse as number.
You could put that value in a property on the page and then put a data-binding expression wherever you want to display the value.
I think you should avoid trying to sum the values in the cells.
Also, I think you should make the betrag property an integer, if possible. You could always add the currency symbol with String.Format in the front-end code.
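As a minimal sketch of the "parse as number" step, assuming betrag stays a string with "€" appended (as in the question's code) and holds whole numbers: Kontostand, ParseBetrag and Sum are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class Kontostand
{
    // Turns a value such as "15€" into 15. Assumes whole-number amounts.
    public static int ParseBetrag(string betrag)
    {
        var number = betrag.Replace("€", "").Trim();
        return int.Parse(number, CultureInfo.InvariantCulture);
    }

    // Sums the amounts taken from the bound data, not from the grid cells.
    public static int Sum(IEnumerable<string> betraege)
    {
        return betraege.Sum(b => ParseBetrag(b));
    }
}
```

With the question's list this could be called as Kontostand.Sum(data.Select(item => item.betrag)).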
This :
kontostand += Convert.ToInt32(Daten.Columns[3].GetCellContent(1).ToString());
Should be like this if it's an ASP.NET grid:
kontostand += Convert.ToInt32(Daten.Rows.Cells[3].innerText);
If not then you need to loop the rows.

Aspose.Words - MailMerge images

I am trying to loop through a Dataset, creating a page per item using Aspose.Words Mail-Merge functionality. The below code is looping through a Dataset - and passing some values to the Mail-Merge Execute function.
var blankDocument = new Document();
var pageDocument = new Document(sFilename);
...
foreach (DataRow row in ds.Tables[0].Rows) {
    var sBarCode = row["BarCode"].ToString();
    var imageFilePath = HttpContext.Current.Server.MapPath("\\_temp\\") + sBarCode + ".png";
    var tempDoc = (Document)pageDocument.Clone(true);
    var fieldNames = new string[] { "Test", "Barcode" };
    var fieldData = new object[] { imageFilePath, imageFilePath };
    tempDoc.MailMerge.Execute(fieldNames, fieldData);
    blankDocument.AppendDocument(tempDoc, ImportFormatMode.KeepSourceFormatting);
}
var stream = new MemoryStream();
blankDocument.Save(stream, SaveFormat.Docx);
// I then output this stream using headers,
// to cause the browser to download the document.
The mail merge item { MERGEFIELD Test } gets the correct data from the Dataset. However the actual image displays page 1's image on all pages using:
{ INCLUDEPICTURE "{MERGEFIELD Barcode }" \* MERGEFORMAT \d }
Say this is my data for the "Barcode" field:
c:\img1.png
c:\img2.png
c:\img3.png
Page one of this document displays c:\img1.png as the text for the "Test" field, and the image that is shown is img1.png.
However, page 2 shows c:\img2.png as the text but displays img1.png as the actual image.
Does anyone have any insight on this?
Edit: It seems this is more of a Word issue. When I toggle field codes with Alt+F9 inside Word, the image actually shows c:\img1.png as its source. That would be why it is displayed on every page.
I've simplified it to:
{ INCLUDEPICTURE "{MERGEFIELD Barcode }" \d }
I also added test data for this field in Word's Mailings Recipient List. When I preview, it doesn't pull in the data or change the image. So this is the root problem.
I know this is an old question, but I would still like to answer it.
With Aspose.Words it is very easy to insert images when executing a mail merge. To achieve this, simply use a merge field with a special name, like Image:MyImageFieldName.
https://docs.aspose.com/words/net/insert-checkboxes-html-or-images-during-mail-merge/#how-to-insert-images-from-a-database
Also, you do not need to loop through the rows in your dataset and execute the mail merge for each row. Simply pass the whole data set to the MailMerge.Execute method and Aspose.Words will duplicate the template for each record in the data.
Here is a simple example of such template
After executing mail merge using the following code:
// Create dummy data.
DataTable dt = new DataTable();
dt.Columns.Add("FirstName");
dt.Columns.Add("LastName");
dt.Columns.Add("MyImage");
dt.Rows.Add("John", "Smith", @"C:\Temp\1.png");
dt.Rows.Add("Jane", "Smith", @"C:\Temp\2.png");
// Open template, execute mail merge and save the result.
Document doc = new Document(@"C:\Temp\in.docx");
doc.MailMerge.Execute(dt);
doc.Save(@"C:\Temp\out.docx");
The result will look like the following:
Disclosure: I work at Aspose.Words team.
If this were Word doing the output (I'm not sure about Aspose), there would be two possible problems here.
INCLUDEPICTURE expects backslashes to be doubled up, e.g. "c:\\img2.png", or (somewhat less reliably) forward slashes, or Mac ":" separators on that platform. It may be OK if the data comes in via a field result as you are doing here, though.
INCLUDEPICTURE results have not updated automatically "by design" since Microsoft modified a bunch of field behaviors for security reasons about 10 years ago. If you are merging to an output document, you can probably work around that by using the following nested fields:
{ INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } }
or to remove the fields in the result document,
{ IF { INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } } {
INCLUDEPICTURE { IF TRUE "{ MERGEFIELD Barcode }" } } }
All the { } need to be inserted with Ctrl+F9 in the usual way.
(Don't ask me where this use of "TRUE" is documented - as far as I know, it is not.)
