Conditional new Break for multi-column docx file, C#

This is a follow-up question to "Creating Word file from ObservableCollection with C#".
I have a .docx file whose Body has 2 columns in its SectionProperties. I have a dictionary of foreign words with their translations. On each line I need [Word] = [Translation], and whenever a new letter starts, that letter should be on its own line, with 2 or 3 line breaks before and after it, like this:
A
A-word = translation
A-word = translation
B
B-word = translation
B-word = translation
...
I structured this as a for loop: in every iteration I create a new Paragraph with an optional Run for the letter (when a new letter starts), a Run for the word and a Run for the translation. So the letter Run is in the same Paragraph as the word and translation Runs, and 2 or 3 Break objects are appended before and after its Text.
As a result, the second column sometimes starts with 1 or 2 empty lines, or the first column on the next page starts with empty lines.
This is what I want to avoid.
So my question is: can I somehow check whether the end of the page has been reached, or whether the text is at the top of a column, so that I don't have to add a Break? Or can I format the column itself so that it never starts with an empty line?
I have tried putting the letter Run in a separate, optional Paragraph, but I still have to insert line breaks and the problem remains.
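A minimal sketch of producing the letter-grouped layout from the question out of a plain dictionary, using LINQ's OrderBy and GroupBy instead of tracking the previous letter by hand. The sample words and the BuildLines name are hypothetical, and turning each returned line into a Paragraph is still left to the Open XML code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: word -> translation.
var vocabulary = new Dictionary<string, string>
{
    ["Apfel"] = "apple",
    ["Birne"] = "pear",
    ["Auto"] = "car",
};

foreach (var line in BuildLines(vocabulary))
    Console.WriteLine(line);

// Sorts the entries, then emits one heading line per first letter,
// followed by one "word = translation" line per entry in that group.
static List<string> BuildLines(IDictionary<string, string> words)
{
    var result = new List<string>();
    var groups = words
        .OrderBy(kv => kv.Key, StringComparer.OrdinalIgnoreCase)
        .GroupBy(kv => char.ToUpperInvariant(kv.Key[0]));
    foreach (var group in groups)
    {
        result.Add(group.Key.ToString());
        foreach (var kv in group)
            result.Add($"{kv.Key} = {kv.Value}");
    }
    return result;
}
```

Because the heading is a separate entry, it can become its own Paragraph rather than a Run padded with Break elements inside the first word's paragraph.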

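In the spirit of my other answer, you can extend the template approach.
Use the Open XML SDK Productivity Tool to generate a single page-break paragraph, something like:
// requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
// using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Wordprocessing;
private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));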
Make a helper method that finds containers of a text tag:
public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
where T: OpenXmlElement
{
var regex = new Regex(tagRegex);
// check the leaf elements (e.g. Text) whose own text matches the tag,
// then return the leaf itself if it is a T, plus all of its ancestors that are T
return searchParent.Descendants()
.Where(e=>(!(e is OpenXmlCompositeElement)
&& regex.IsMatch(e.InnerText)))
.SelectMany(e =>
e.Ancestors()
.OfType<T>()
.Union(e is T ? new T[] { (T)e } : new T[] {} ))
.ToList(); // keep this: callers enumerate the result again after replacing the tags
}
And another one that duplicates a range of the document (a DeleteRange method that deletes the range would follow the same pattern), plus one that replaces tag text:
public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
where T: OpenXmlElement
{
// tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
// or [page] [/page] - or whatever pattern you choose
var tagElements = FindElements<T>(root, tagRegex);
var fromEl = tagElements.First();
var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el
// you may want to find a common parent here
// I'll assume you've prepared the template so the elements are siblings.
var result = new List<OpenXmlElement>();
var step = fromEl.NextSibling();
OpenXmlElement insertAfter = toEl; // advanced after each copy so the copies keep their original order
while (step != null && step != toEl){
// another method called DeleteRange will instead delete elements in that range within this loop
var copy = step.CloneNode(true); // deep copy, including runs and text
insertAfter.InsertAfterSelf(copy);
insertAfter = copy;
result.Add(copy);
step = step.NextSibling();
}
return result;
}
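When DuplicateRange clones the elements between the two tags, every clone is placed after an insertion point. If that point stays fixed at the closing tag, each new clone lands in front of the previous one and the copied range comes out reversed. The effect can be reproduced with a plain List<string>; the CopyRange helper and the sample items are hypothetical and only model the sibling order, not the Open XML API:

```csharp
using System;
using System.Collections.Generic;

// Plain-list model of a template range: start marker, content, end marker.
var template = new List<string> { "[page]", "title", "word", "break", "[/page]" };

Console.WriteLine(string.Join(" | ", CopyRange(template, advanceAnchor: false)));
Console.WriteLine(string.Join(" | ", CopyRange(template, advanceAnchor: true)));

// Copies the items strictly between the first and last element and inserts
// the copies (marked with ') after the last element. With advanceAnchor == false
// every copy goes directly after the end marker, which reverses the copies.
static List<string> CopyRange(List<string> source, bool advanceAnchor)
{
    var doc = new List<string>(source);
    int end = doc.Count - 1;   // index of the end marker
    int insertAt = end + 1;    // position right after the end marker
    for (int i = 1; i < end; i++)
    {
        doc.Insert(insertAt, doc[i] + "'");
        if (advanceAnchor)
            insertAt++;        // the next copy goes after the previous copy
    }
    return doc;
}
```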
public IEnumerable<Text> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement){
// work on the Text leaves: their Text property is writable, composite elements stay intact
// note: a tag that Word has split across several runs is not matched
var replaceElements = FindElements<Text>(parent, tagRegex);
var regex = new Regex(tagRegex);
foreach(var el in replaceElements){
el.Text = regex.Replace(el.Text, replacement);
}
return replaceElements;
}
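FindElements and ReplaceTag treat the tag argument as a regular expression, so the square brackets of a tag have to be escaped, as in "\\[Translation\\]". An unescaped "[TitleLetter]" is a character class that matches single letters, not the tag. A small stdlib-only demonstration (the sample string is made up); Regex.Escape is one way to build the literal pattern:

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "Title: [TitleLetter]";

// Unescaped: "[TitleLetter]" is a character class, so every single
// character T, i, t, l, e, L or r is replaced, not the tag as a whole.
Console.WriteLine(Regex.Replace(input, "[TitleLetter]", "X"));

// Escaped: Regex.Escape makes the opening bracket literal, so only
// the complete tag text "[TitleLetter]" matches.
string pattern = Regex.Escape("[TitleLetter]");
Console.WriteLine(pattern);
Console.WriteLine(Regex.Replace(input, pattern, "A"));
```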
Now you can have a document that looks like this:
[page]
[TitleLetter]
[WordTemplate][Word]: [Translation] [/WordTemplate]
[pageBreak]
[/page]
With that document you can duplicate the [page]...[/page] range, process it once per letter and, once you're out of letters, delete the template range:
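var vocabulary = new Dictionary<char, Dictionary<string, string>>(); // letter -> (word -> translation), fill from your data
var body = wordDocument.MainDocumentPart.Document.Body;
foreach (var letter in vocabulary.Keys.OrderByDescending(c=>c)){
// in reverse order because the copy range comes after the template range
var pageTemplate = DuplicateRange<Paragraph>(body, "\\[/?page\\]");
foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()){
ReplaceTag(p, "\\[TitleLetter\\]", letter.ToString());
var pageBr = ReplaceTag(p, "\\[pageBreak\\]", "");
foreach(var pbr in pageBr){
// the break paragraph goes after the paragraph that held the tag, not inside its run
pbr.Ancestors<Paragraph>().First().InsertAfterSelf(PageBreakPara.CloneNode(true));
}
var wordTemplateFound = FindElements<Paragraph>(p, "\\[/?WordTemplate\\]");
if (wordTemplateFound.Any()){
// the word template is a single paragraph, so clone that paragraph once per word
OpenXmlElement insertAfter = p;
foreach (var word in vocabulary[letter].Keys){
var wordTemplate = (OpenXmlCompositeElement)p.CloneNode(true);
insertAfter.InsertAfterSelf(wordTemplate);
insertAfter = wordTemplate;
ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]","");
ReplaceTag(wordTemplate, "\\[Word\\]",word);
ReplaceTag(wordTemplate, "\\[Translation\\]",vocabulary[letter][word]);
}
p.Remove(); // the template paragraph itself is no longer needed
}
}
}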
...Or something like it.
Look into SdtElements (content controls) if things start getting too complicated.
Don't use AltChunk despite the popularity of that answer: it requires Word to open and process the file, so you can't use a library to convert the result to PDF.
Word documents are messy. The solution above should work (I haven't tested it), but the template must be carefully crafted, so make backups of your template often.
Making a robust document engine isn't easy (since Word is messy); do the minimum you need and rely on the template being under your control (not user-editable).
The code above is far from optimized or streamlined; I've tried to condense it into the smallest footprint possible at the cost of presentability. There are probably bugs too :)

Related

How to add a newline to run in C# and OpenXML with justified text using bookmarks

I have a Word template with bookmarks that I populate at runtime. The whole template text is already set to justified. If I insert only one piece of text at a bookmark it's fine, but now one bookmark can contain multiple lines with line breaks between them. The code that works for one piece of text is here:
foreach (BookmarkStart bookMarkStart in doc.MainDocumentPart.RootElement.Descendants<BookmarkStart>())
{
if (bookMarkStart.Name == "Author")
{
var id = bookMarkStart.Id.Value;
var bookmarkEnd = bookMarkEnds.Where(i => i.Id.Value == id).First();
var runElement = new Run(new Text(author));
bookmarkEnd.Parent.InsertAfter(runElement, bookmarkEnd);
}
}
I have tried many things so far, and the closest I've come was to append a Break to the run and add new text after it. However, the line before each break is then justified so that its text is stretched across the whole line: e.g. if only two words are present, one is pushed to the left side of the page and the other to the right. If I append multiple text lines, only the last one looks correct.
I add multiple lines like this:
var id = bookMarkStart.Id.Value;
var bookmarkEnd = bookMarkEnds.Where(i => i.Id.Value == id).First();
var runElement = new Run();
runElement.Append(new Text(text1));
runElement.Append(new Break());
runElement.Append(new Text(text2));
bookmarkEnd.Parent.InsertAfter(runElement, bookmarkEnd);
Does anyone have an idea on how to achieve this?
I found a solution after posting the question here: use paragraphs. In short, I built multiple paragraphs, one for each line, and added them at the bookmark as needed. Here's the sample code:
ParagraphProperties pProp = new ParagraphProperties();
Justification just = new Justification() { Val = JustificationValues.Both };
pProp.Append(just);
var run = new Run(new Text(slText));
var para = new Paragraph();
para.Append(pProp);
para.Append(run);
bookmark.Parent.InsertAfterSelf(para);
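The above code can be repeated for every line. Note that each call inserts the new paragraph directly after the bookmark's paragraph, so lines added one after another end up in reverse order; add them last line first, or insert each new paragraph after the previously added one.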
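With one paragraph per line, the bookmark value first has to be split into lines. A minimal sketch of that splitting step on a hypothetical sample value; each resulting string would then get its own Paragraph with Justification set to JustificationValues.Both, as in the answer:

```csharp
using System;

// Hypothetical multi-line value that should go into one bookmark.
string value = "Line one of the author block\r\nLine two\nLine three";

// Split on Windows ("\r\n") and Unix ("\n") line endings. "\r\n" is listed
// first so it is consumed as one separator instead of leaving a stray '\r'.
string[] lines = value.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

// Each element would become its own justified Paragraph.
foreach (var line in lines)
    Console.WriteLine($"paragraph: {line}");
```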

How to write a Numbered List of Text in a specific location?

I need to write an array of strings as a numbered list, but at a specific location in a document.
For example, the array is:
sentence[0] : Jonathan Spielberg
sentence[1] : Stephanie Black
sentence[2] : Marcus Smith
sentence[3] : Kylie Ashton
...
Then it should be written in a specific location, let's say under the section heading "A. Candidate's Name"
A. Candidate's Name
1. Jonathan Spielberg
2. Stephanie Black
3. Marcus Smith
4. Kylie Ashton
My approach so far is to use a unique tag, which is then replaced by looping over the array and writing its items as a numbered list. Let's say the unique tag is ######CANDIDATESNAME#####. I've tried it this way, but it doesn't work.
How should I code this?
P.S.: I have a .doc/.docx template document that contains only the section headings; I just need to fill in the numbered list.
I would suggest the following solution:
1) Implement IReplacingCallback interface.
2) Use Range.Replace method to find the unique tag.
3) Move the cursor to the text (unique tag) and insert the numbered list.
Please read the following documentation page and use the following code to insert a numbered list at the position of the unique tag.
Find and Replace
string[] list = new string[] { "Jonathan Spielberg", "Stephanie Black", "Marcus Smith", "Kylie Ashton" };
Document mainDoc = new Document(MyDir + "in.docx");
mainDoc.Range.Replace(new Regex("######CANDIDATESNAME#####"), new FindandInsertList(list), false);
mainDoc.Save(MyDir + "Out.docx");
//--------------------------------------
public class FindandInsertList : IReplacingCallback
{
private string[] listitems;
public FindandInsertList(string[] list)
{
listitems = list;
}
ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
{
// This is a Run node that contains either the beginning or the complete match.
Node currentNode = e.MatchNode;
// The first (and may be the only) run can contain text before the match,
// in this case it is necessary to split the run.
if (e.MatchOffset > 0)
currentNode = SplitRun((Run)currentNode, e.MatchOffset);
// This array is used to store all nodes of the match for further removing.
ArrayList runs = new ArrayList();
// Find all runs that contain parts of the match string.
int remainingLength = e.Match.Value.Length;
while (
(remainingLength > 0) &&
(currentNode != null) &&
(currentNode.GetText().Length <= remainingLength))
{
runs.Add(currentNode);
remainingLength = remainingLength - currentNode.GetText().Length;
// Select the next Run node.
// Have to loop because there could be other nodes such as BookmarkStart etc.
do
{
currentNode = currentNode.NextSibling;
}
while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
}
// Split the last run that contains the match if there is any text left.
if ((currentNode != null) && (remainingLength > 0))
{
SplitRun((Run)currentNode, remainingLength);
runs.Add(currentNode);
}
// Create a DocumentBuilder and move it to the last run of the match.
DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document as Document);
builder.MoveTo((Run)runs[runs.Count - 1]);
builder.ListFormat.List = e.MatchNode.Document.Lists.Add(ListTemplate.NumberDefault);
foreach (string item in listitems)
{
builder.Writeln(item);
}
// End the numbered list.
builder.ListFormat.RemoveNumbers();
// Now remove all runs in the sequence.
foreach (Run run in runs)
run.Remove();
// Signal to the replace engine to do nothing else, because we have already done everything we wanted.
return ReplaceAction.Skip;
}
private static Run SplitRun(Run run, int position)
{
Run afterRun = (Run)run.Clone(true);
afterRun.Text = run.Text.Substring(position);
run.Text = run.Text.Substring(0, position);
run.ParentNode.InsertAfter(afterRun, run);
return afterRun;
}
}
I work with Aspose as Developer evangelist.
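Apart from the Aspose-specific calls, the core of this approach is "replace the tag with one numbered entry per array item". A plain-text sketch of just that step; the template lines and the ExpandPlaceholder helper are hypothetical, and in a real document the numbering would come from a Word list (as in the code above) rather than literal "1." prefixes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain-text template; only the placeholder line is replaced.
var template = new List<string> { "A. Candidate's Name", "######CANDIDATESNAME#####", "B. Next Section" };
string[] names = { "Jonathan Spielberg", "Stephanie Black", "Marcus Smith", "Kylie Ashton" };

var output = ExpandPlaceholder(template, "######CANDIDATESNAME#####", names);
output.ForEach(Console.WriteLine);

// Copies the template, replacing the line equal to the placeholder
// with "1. item", "2. item", ... in array order.
static List<string> ExpandPlaceholder(List<string> lines, string placeholder, string[] items)
{
    var result = new List<string>();
    foreach (var line in lines)
    {
        if (line.Trim() == placeholder)
            result.AddRange(items.Select((item, i) => $"{i + 1}. {item}"));
        else
            result.Add(line);
    }
    return result;
}
```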

c# - Reading a complex file into a comboBox

I tried some research, but I just don't know how to google this.
For example, I have a .db file (it works the same as a .txt file for me) written like this:
DIRT: 3;
STONE: 6;
So far, I have code that can put items into a comboBox from a file like this:
DIRT,
STONE,
That puts DIRT and STONE in the comboBox. This is the code I'm using for that:
string[] lineOfContents = System.IO.File.ReadAllLines(dbfile);
foreach (var line in lineOfContents)
{
string[] tokens = line.Split(',');
comboBox1.Items.Add(tokens[0]);
}
How do I extend this so that it puts e.g. DIRT and STONE in the combobox and keeps the numbers (3) in int variables (like int varDIRT = 3)?
It doesn't have to be a .txt or .db file; I've heard XML files are used for configuration too.
Try doing something like this:
cmb.DataSource = File.ReadAllLines("filePath").Select(d => new
{
// the file uses "NAME: value;", so split on ':' and strip the ';' and spaces
Name = d.Split(':').First().Trim(),
Value = Convert.ToInt32(d.Split(':').Last().Replace(";", "").Trim())
}).ToList();
cmb.DisplayMember = "Name";
cmb.ValueMember= "Value";
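Remember that this requires a using System.Linq; directive.
If you want to reference the selected value or text of the combobox, you can use:
object selectedValue = cmb.SelectedValue; // the Value (an int) of the selected item, or null if nothing is selected
string selectedName = cmb.Text; // the displayed Name of the selected item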
I think you've really got two questions, so I'll try to answer them separately.
The first question is "How can I parse a file that looks like this...
DIRT: 3;
STONE: 6;
into names and integers?" You could remove all the whitespace and semicolons from each line, and then split on colon. A cleaner way, in my opinion, would be to use a regular expression:
// load your file
var fileLines = new[]
{
"DIRT: 3;",
"STONE: 6;"
};
// This regular expression will match anything that
// begins with some letters, then has a colon followed
// by optional whitespace ending in a number and a semicolon.
var regex = new Regex(@"(\w+):\s*([0-9]+);", RegexOptions.Compiled);
foreach (var line in fileLines)
{
// Puts the tokens into an array.
// The zeroth token will be the entire matching string.
// Further tokens will be the contents of the parentheses in the expression.
var tokens = regex.Match(line).Groups;
// This is the name from the line, e.g. "DIRT" or "STONE"
var name = tokens[1].Value;
// This is the numerical value from the same line.
var value = int.Parse(tokens[2].Value);
}
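For comparison, the split-based approach mentioned at the start of this answer (strip the semicolon, split on the colon) can be sketched as follows. int.TryParse is used so that malformed lines are skipped instead of throwing; the ParseLines name and the sample lines are illustrative:

```csharp
using System;
using System.Collections.Generic;

var fileLines = new[] { "DIRT: 3;", "STONE: 6;", "not a valid line" };
var values = ParseLines(fileLines);
foreach (var pair in values)
    Console.WriteLine($"{pair.Key} -> {pair.Value}");

// Removes the trailing ';', splits on ':' and trims both halves.
// Lines without exactly one ':' or without an integer value are skipped.
static Dictionary<string, int> ParseLines(IEnumerable<string> lines)
{
    var result = new Dictionary<string, int>();
    foreach (var raw in lines)
    {
        var parts = raw.Trim().TrimEnd(';').Split(':');
        if (parts.Length == 2 && int.TryParse(parts[1].Trim(), out int number))
            result[parts[0].Trim()] = number;
    }
    return result;
}
```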
If you're not familiar with regular expressions, I encourage you to check them out; they make it very easy to format strings and pull out values. http://regexone.com/
The second question, "how do I store the value alongside the name?", I'm not sure I fully understand. If what you want to do is back each item with the numerical value specified in the file, the dub stylee's advice is good for you. You'll need to place the name as the display member and value as the value member. However, since your data is not in a table, you'll have to put the data somewhere accessible so that the Properties you want to use can be named. I recommend a dictionary:
// This is your ComboBox.
var comboBox = new ComboBox();
// load your file
var fileLines = new[]
{
"DIRT: 3;",
"STONE: 6;"
};
// This regular expression will match anything that
// begins with some letters, then has a colon followed
// by optional whitespace ending in a number and a semicolon.
var regex = new Regex(@"(\w+):\s*([0-9]+);", RegexOptions.Compiled);
// This does the same as the foreach loop did, but it puts the results into a dictionary.
var dictionary = fileLines.Select(line => regex.Match(line).Groups)
.ToDictionary(tokens => tokens[1].Value, tokens => int.Parse(tokens[2].Value));
// When you enumerate a dictionary, you get the entries as KeyValuePair objects.
// Binding them through DataSource (rather than Items.Add) is what makes ValueMember and SelectedValue work.
comboBox.DataSource = dictionary.ToList();
// DisplayMember and ValueMember need to be set to
// the names of usable properties on the item type.
// KeyValuePair has "Key" and "Value" properties.
comboBox.DisplayMember = "Key";
comboBox.ValueMember = "Value";
In this version, I have used Linq to construct the dictionary. If you don't like the Linq syntax, you can use a loop instead:
var dictionary = new Dictionary<string, int>();
foreach (var line in fileLines)
{
var tokens = regex.Match(line).Groups;
dictionary.Add(tokens[1].Value, int.Parse(tokens[2].Value));
}
You could also use the FileHelpers library. First, define your data record:
[DelimitedRecord(":")]
public class Record
{
public string Name;
[FieldTrim(TrimMode.Right,';')]
public int Value;
}
Then you read in your data like so:
var engine = new FileHelperEngine<Record>();
// Read from file
Record[] res = engine.ReadFile("FileIn.txt");
// write to file
engine.WriteFile("FileOut.txt", res);

Is there a better method of calling a comparison over a list of objects in C#?

I am reading lines from a large text file. Some of the lines contain occasional strings that come from a preset list of possibilities, and I want to check whether the line currently being read matches any of the strings in that list. If there is a match, I want to simply append the line to a different list and continue the loop I am using to read the file.
I was wondering if there is a more efficient way to do a line.Contains() or equivalence check against the first element in the list, then the second, etc., without using a nested loop or a long if statement full of "or"s.
Example of what I have now:
List<string> possible = new List<string> {"Cat", "Dog"};
using(StreamReader sr = new StreamReader(someFile))
{
string aLine;
while ((aLine = sr.ReadLine()) != null)
{
if (...)
{
foreach (string element in possible)
{
if (aLine.Contains(element))
{
otherList.Add(aLine); // add to some other list
break; // leave the foreach; a plain continue would only move to the next element
}
}
// other stuff
}
}
}
I don't know whether it is more efficient at run time, but you can eliminate a lot of code by using LINQ:
otherList.AddRange(File.ReadAllLines(somefile)
.Where(line => possible.Any(p => line.Contains(p))));
I guess you are looking for:
if(possible.Any(r=> line.Contains(r)))
{
}
You can separate the work into getting the data and then analysing it; you don't have to do both in the same loop.
After reading the lines, there are many ways to filter them. The most readable and maintainable, IMO, is to use LINQ.
You can change your code to this:
// get lines
var lines = File.ReadLines("someFile");
// what I am looking for
var clues = new List<string> { "Cat", "Dog" };
// filter 1. Are there clues? This is if you only want to know
var haveCluesInLines = lines.Any(l => clues.Any(c => l.Contains(c)));
// filter 2. Get lines with clues
var linesWithClues = lines.Where(l => clues.Any(c => l.Contains(c)));
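Edit:
Most likely you will have few clues and many lines. Any stops at the first clue that matches, so each line is only checked against the clues until one is found. Since File.ReadLines is lazy, each of the two queries above reads the file again; call ToList() on the lines first if you need both.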
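Both the LINQ filter from these answers and, as a swapped-in alternative not taken from the answers, a single combined regular expression can be tried on in-memory lines. The sample lines are made up, and whether the regex variant is actually faster depends on the number of clues and should be measured. Note that String.Contains is case-sensitive by default, so "Hotdog" does not match "Dog":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var clues = new List<string> { "Cat", "Dog" };
var lines = new[] { "A Cat sat", "nothing here", "Hotdog", "Dogma" };

// 1) LINQ: for each line, Any stops at the first clue it contains.
var withAny = lines.Where(l => clues.Any(c => l.Contains(c))).ToList();

// 2) One combined pattern ("Cat|Dog"); Regex.Escape keeps each clue literal.
var combined = new Regex(string.Join("|", clues.Select(Regex.Escape)));
var withRegex = lines.Where(l => combined.IsMatch(l)).ToList();

Console.WriteLine(string.Join(", ", withAny));
Console.WriteLine(string.Join(", ", withRegex));
```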

Get text above table MS Word

This one is probably a little stupid, but I really need it. I have a document with 5 tables, and each table has a heading. The heading is regular text with no special styling. I need to extract the data from those tables plus their headers.
Currently, using MS interop I was able to iterate through each cell of each table using something like this:
app.Tables[1].Cell(2, 2).Range.Text;
But now I'm struggling on trying to figure out how to get the text right above the table.
For the first table I need to get "I NEED THIS TEXT", and for the second table I need to get "And this one also please".
So, basically I need last paragraph before each table. Any suggestions on how to do this?
Mellamokb's answer gave me a hint and a good example of how to search through paragraphs. While implementing his solution I came across the function "Previous", which does exactly what we need. Here's how to use it:
wd.Tables[1].Cell(1, 1).Range.Previous(WdUnits.wdParagraph, 2).Text;
Previous takes two parameters. The first is the unit to move by, from this list: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdunits.aspx
The second is how many units to count back. In my case 2 worked. I expected 1 to be enough, since the heading is right before the table, but with 1 I got a strange special character, ♀, which looks like the female sign.
You might try something along the lines of this. I compare the paragraphs to the first cell of the table, and when there's a match, grab the previous paragraph as the table header. Of course this only works if the first cell of the table contains a unique paragraph that would not be found in another place in the document:
var tIndex = 1;
var tCount = oDoc.Tables.Count;
var tblData = oDoc.Tables[tIndex].Cell(1, 1).Range.Text;
var pCount = oDoc.Paragraphs.Count;
var prevPara = "";
for (var i = 1; i <= pCount; i++) {
var para = oDoc.Paragraphs[i];
var paraData = para.Range.Text;
if (paraData == tblData) {
// this paragraph is at the beginning of the table, so grab previous paragraph
Console.WriteLine("Header: " + prevPara);
tIndex++;
if (tIndex <= tCount)
tblData = oDoc.Tables[tIndex].Cell(1, 1).Range.Text;
else
break;
}
prevPara = paraData;
}
Sample Output:
Header: I NEED THIS TEXT
Header: AND THIS ONE also please
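The paragraph-matching logic of the second answer can be tried without Word by modelling the paragraphs as strings. The sample texts and the FindHeaders name are hypothetical, and the "\r" / "\r\a" endings for body and table-cell paragraphs are an assumption about what interop returns; TrimEnd strips them from the reported header. Like the original, it only works if each table's first cell text does not also appear as a paragraph earlier in the document:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical paragraph texts in document order (assumed endings:
// "\r" for body paragraphs, "\r\a" for paragraphs inside table cells).
var paragraphs = new List<string>
{
    "Intro\r", "I NEED THIS TEXT\r", "Name\r\a", "Alice\r\a",
    "Some text\r", "And this one also please\r", "Id\r\a", "42\r\a"
};
var firstCells = new List<string> { "Name\r\a", "Id\r\a" };

foreach (var header in FindHeaders(paragraphs, firstCells))
    Console.WriteLine("Header: " + header);

// Walks the paragraphs once; when one equals the first cell of the next
// table, the paragraph seen just before it is reported as that table's header.
static List<string> FindHeaders(List<string> paras, List<string> tableFirstCells)
{
    var headers = new List<string>();
    int table = 0;
    string previous = "";
    foreach (var text in paras)
    {
        if (table < tableFirstCells.Count && text == tableFirstCells[table])
        {
            headers.Add(previous.TrimEnd('\r', '\a'));
            table++;
        }
        previous = text;
    }
    return headers;
}
```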
