How to split a paragraph in Word into two paragraphs using the OpenXml SDK?


I have a method that splits each paragraph into two paragraphs at the same location in the Word file.
The logic works, but I'm losing the document's formatting and styles.
foreach (Paragraph p in body.Descendants<Paragraph>())
{
//splitting the paragraph
var part2 = p.InnerText.Substring(startIndex);
var part1 = p.InnerText.Substring(0, startIndex);
p.InnerText.Replace(p.InnerText, part1);
p.InnerText.Replace(p.InnerText, part2);
pargs.Add(part1);
pargs.Add(part2);
}
// clean the document
body.RemoveAllChildren<Paragraph>();
// re-create the paragraphs
for (int i = 0; i < pargs.Count; i++)
{
Paragraph para = body.AppendChild(new Paragraph());
Run run = para.AppendChild(new Run());
run.AppendChild(new Text(pargs[i]));
}
return pargs;
The above is simplified code.
I know my approach causes this problem, since I take the inner text of each paragraph and create a new paragraph without its styles. My question is: is there another approach?

If you want to keep the original style, you can copy the run properties and set them on the newly created Run.
Something like this (not tested, though):
// Deep-copy the formatting of the paragraph's first run (null if the run has none)
var properties = (RunProperties)paragraph
    .Descendants<Run>()
    .First()
    .RunProperties?.CloneNode(true);
...
var run = newParagraph.AppendChild(new Run()
{
    RunProperties = properties
});
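A different way to keep both paragraph and run formatting is to deep-copy the whole paragraph twice and trim the text in each copy, instead of rebuilding paragraphs from InnerText. The sketch below shows that idea on raw WordprocessingML with LINQ to XML (System.Xml.Linq) rather than the SDK classes, so it runs without the OpenXml package; with the SDK, the deep copy would be `CloneNode(true)` on the Paragraph. `ParagraphSplitter` and `Split` are hypothetical names, and the sketch assumes the offset counts only characters inside `w:t` elements (no field codes, tracked deletions or other text-bearing elements).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: splits a WordprocessingML <w:p> at a character offset.
public static class ParagraphSplitter
{
    // Main WordprocessingML namespace (the "w:" prefix in word/document.xml).
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns two deep copies of the paragraph: the first keeps text [0, offset),
    // the second keeps text [offset, end). Because each half is a full copy,
    // <w:pPr> (paragraph style) and every run's <w:rPr> are preserved.
    public static (XElement First, XElement Second) Split(XElement paragraph, int offset)
    {
        var first = new XElement(paragraph);    // XElement copy constructor = deep clone
        var second = new XElement(paragraph);
        KeepRange(first, 0, offset);
        KeepRange(second, offset, int.MaxValue);
        return (first, second);
    }

    // Keeps only the characters in [from, to), counted across all <w:t>
    // elements in document order; <w:t> elements left empty are removed.
    private static void KeepRange(XElement paragraph, int from, int to)
    {
        int pos = 0;
        foreach (XElement t in paragraph.Descendants(W + "t").ToList())
        {
            string text = t.Value;
            int start = pos;
            pos += text.Length;

            int keepStart = Math.Max(start, from) - start;
            int keepEnd = Math.Min(pos, to) - start;
            if (keepEnd <= keepStart)
            {
                t.Remove();   // the run keeps its formatting but shows no text
                continue;
            }
            t.Value = text.Substring(keepStart, keepEnd - keepStart);
            // Leading/trailing spaces at the cut must be preserved explicitly.
            t.SetAttributeValue(XNamespace.Xml + "space", "preserve");
        }
    }
}
```

To put the halves back into the document, `paragraph.ReplaceWith(first, second)` swaps the original element for the two new ones in place.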

Related

Add character spacing for a range of characters in Open XML (C#)

How do I add character spacing to a range of characters?
For example, in the word "toggling" I want to apply character spacing to the two characters "gg":
Spacing "Expanded" with a By value of 4pt.

Besides the SDK, Open XML also offers a tool (the Open XML SDK Productivity Tool) that converts any document to C# code.
The best way to find out how to use a Word feature is to make two short documents, one that uses the feature and one that doesn't. Then convert both documents to C# code and compare the generated code (with WinMerge, for example).
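If the Productivity Tool is not at hand, a similar comparison can be done on the raw XML: a .docx file is a ZIP package, and Word stores the document body in the `word/document.xml` part. The sketch below (`DocxXml` and its methods are hypothetical names) reads that part with System.IO.Compression and pretty-prints it so a line-based diff tool such as WinMerge lines up the differences. It assumes the main part sits at the conventional `word/document.xml` path rather than resolving it through `_rels/.rels`.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper for diffing two .docx files at the XML level.
public static class DocxXml
{
    // Reads word/document.xml from a .docx stream and returns it indented,
    // one element per line, which makes a text diff much easier to read.
    public static string ReadMainDocumentXml(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (var reader = new StreamReader(entry.Open()))
            {
                return XDocument.Parse(reader.ReadToEnd()).ToString();
            }
        }
    }

    // Convenience overload that opens the .docx file from disk.
    public static string ReadMainDocumentXml(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            return ReadMainDocumentXml(stream);
        }
    }
}
```

For example, write `DocxXml.ReadMainDocumentXml("with-feature.docx")` and `DocxXml.ReadMainDocumentXml("without-feature.docx")` to two text files (the file names are placeholders) and compare them.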

As you wish to apply a style to the middle of a paragraph, you will need (at least) 3 separate Run elements: the first for the start of the unstyled text, the second for the text that has the spacing and the third for the rest of the unstyled text.
To add the character spacing, add a Spacing element to the Run's RunProperties. Spacing has a Val property that sets the spacing in twentieths of a point (so for 4pt you set Val to 80).
The following code creates a document with spacing on the "gg" in the word "toggling":
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static void CreateDoc(string fileName)
{
    // Create a Wordprocessing document.
    using (WordprocessingDocument package =
        WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
    {
        // Add a new main document part.
        package.AddMainDocumentPart();
        // Create a body and a paragraph.
        Body body = new Body();
        Paragraph paragraph = new Paragraph();
        // Add the first part of the text to the paragraph in a Run.
        paragraph.AppendChild(new Run(new Text("This sentence has spacing between the gg in to")));
        // Create another Run to hold the text with spacing.
        Run secondRun = new Run();
        // Create a RunProperties with a Spacing child (80 twentieths of a point = 4pt).
        RunProperties runProps = new RunProperties();
        runProps.AppendChild(new Spacing() { Val = 80 });
        // Append the run properties to the Run we wish to assign spacing to.
        secondRun.AppendChild(runProps);
        // Add the text to the Run.
        secondRun.AppendChild(new Text("gg"));
        // Add the spaced Run to the paragraph.
        paragraph.AppendChild(secondRun);
        // Add the final text as a third Run.
        paragraph.AppendChild(new Run(new Text("ling")));
        // Add the paragraph to the body.
        body.AppendChild(paragraph);
        package.MainDocumentPart.Document = new Document(body);
        // Save changes to the main document part.
        package.MainDocumentPart.Document.Save();
    }
}
In the resulting document, the "gg" in "toggling" is rendered with 4pt of expanded character spacing.
Note that you can set Val to a negative number to condense the text rather than expand it.
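The two calculations in this answer, cutting the text into three runs and converting points to the twentieths-of-a-point value that Spacing.Val expects, can be factored into a small helper. A sketch follows; `CharacterSpacing`, `PointsToSpacingValue` and `SplitForSpacing` are hypothetical names, and the helper only produces the strings and the number to feed into the three Run elements built above.

```csharp
using System;

// Hypothetical helper for the three-run character-spacing technique.
public static class CharacterSpacing
{
    // w:spacing on a run is measured in twentieths of a point:
    // 4pt expanded -> 80, 1.5pt condensed (negative) -> -30.
    public static int PointsToSpacingValue(double points) =>
        (int)Math.Round(points * 20, MidpointRounding.AwayFromZero);

    // Splits text into the pieces that become separate runs: the text before
    // the spaced range, the spaced range itself, and the remainder.
    public static (string Before, string Spaced, string After) SplitForSpacing(
        string text, int start, int length)
    {
        if (start < 0 || length < 0 || start + length > text.Length)
            throw new ArgumentOutOfRangeException(nameof(start), "Range is outside the text.");
        return (text.Substring(0, start),
                text.Substring(start, length),
                text.Substring(start + length));
    }
}
```

With the SDK, the middle piece would go into the Run that carries `new Spacing() { Val = CharacterSpacing.PointsToSpacingValue(4) }`, and the other two into plain Runs.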

MigraDoc not adding page breaks automatically

I am tasked with refactoring an old MigraDoc project written by a developer who is no longer with my company, and I am having problems with the following bit of code:
var Split = new String[1];
Split[0] = "||";
if (invoiceObject.Note != null)
{
var Lines = invoiceObject.Note.Split(Split, StringSplitOptions.RemoveEmptyEntries);
for (var i = 0; i < Lines.Count(); i++)
{
if (i > 0)
lineItemParagraph.AddLineBreak();
lineItemParagraph.AddText("" + Lines[i].Replace(" ", " ").Replace("|", ""));
}
}
This works: it takes a notes field delimited by double pipes and breaks it out into new lines as expected. The issue is that for very large notes fields the rendered PDF has only one page and the text just runs off the page. (The item I am testing with has enough data in its notes field for 20+ pages in the rendered PDF.)
Edit
The code is inside of a text frame defined like this.
TextFrame lineItemFrame;
this.lineItemFrame = section.AddTextFrame();
this.lineItemFrame.Height = "3.0cm";
this.lineItemFrame.Width = "8.0cm";
this.lineItemFrame.Left = "0cm";
this.lineItemFrame.RelativeHorizontal = RelativeHorizontal.Margin;
this.lineItemFrame.Top = "9.0cm";
this.lineItemFrame.RelativeVertical = RelativeVertical.Page;
The Text frame is inside of a section that is defined like this. Looking through the code, it appears this is the only section on the PDF. Do I perhaps need more sections?
section = this.document.AddSection();
section.PageSetup.StartingNumber = 1;
I can't figure out how to make MigraDoc add the page breaks for me automatically.
Am I missing something painfully obvious?

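MigraDoc adds page breaks automatically, with two exceptions: TextFrames do not break across pages, and table rows do not break (tables break only between rows). Because the notes paragraph lives inside a TextFrame, it can never flow onto a second page; to let it break, add the paragraph to the section itself (for example with section.AddParagraph()) instead of to the frame.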

Place every sentence from a text file into an array but detect headers/titles

I need to get each sentence from a text document/string into an array.
The issue is how to handle headers, titles and similar sections of text that are not part of a sentence but don't end in a full stop (". ") that I could detect.
If I can't detect them, they will end up stuck to the front of the following sentence (if I use ". " to separate sentences), which I can't allow.
Initially I was going to use:
contentRefined = content.Replace(" \n", ". ");
I thought this would remove all of the empty lines and newlines and place full stops at the ends of headers, so they would be detected and treated as sentences. It would produce ". . " in places, but I could replace those with nothing.
It didn't work: it simply left the empty lines in place and put a ". " at the start of each empty line, as well as at the start of every paragraph.
I have now tried:
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
This removes the empty lines completely, but doesn't get me any closer to adding a full stop to the end of each header.
I need to place the sentences and headers/titles in an array, and I'm not sure whether there is a way to do this without splitting the string on something such as ". ".
Edit: the full current code, showing how I get the text from the file:
public void sentenceSplit()
{
content = File.ReadAllText(@"I:\Project\TLDR\Test Text.txt");
contentRefined = Regex.Replace(content, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
//contentRefined = content.Replace("\n", ". ");
}

I'm making an assumption that 'Header' and 'Title' are on their own line and do not end in a period.
If that's the case, then this may work for you:
var filePath = @"C:\Temp\temp.txt";
var sentences = new List<string>();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine();
if (line.Trim().EndsWith("."))
{
line.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries)
.ToList()
.ForEach(l => sentences.Add(l.Trim() + "."));
}
}
}
// Output sentences to console
sentences.ForEach(Console.WriteLine);
UPDATE
Another approach using the File.ReadAllLines() method, and displaying the sentences in a RichTextBox:
private void Form1_Load(object sender, EventArgs e)
{
var filePath = @"C:\Temp\temp.txt";
var sentences = File.ReadAllLines(filePath)
// Only select lines that end in a period
.Where(l => l.Trim().EndsWith("."))
// Split each line into sentences (one line may have many sentences)
.SelectMany(s => s.Split(new[] {'.'}, StringSplitOptions.RemoveEmptyEntries))
// Trim any whitespace off the ends of the sentence and add a period to the end
.Select(s => s.Trim() + ".")
// And finally cast it to a List (or you could do 'ToArray()')
.ToList();
// To show each sentence in the list on its own line in the rtb:
richTextBox1.Text = string.Join("\n", sentences);
// Or to show them all, one after another:
richTextBox1.Text = string.Join(" ", sentences);
}
UPDATE
Now that I think I understand what you're asking, here's what I would do. First, I would create some classes to manage all this stuff. If you break the document down into parts, you get something like:
HEADER
Paragraph sentence one. Paragraph sentence two. Paragraph
sentence three with a number, like in this quote: "$5.00 doesn't go as
far as it used to".
Header Over an Empty Section
Header over multiple paragraphs
Paragraph sentence one. Paragraph
sentence two. Paragraph sentence three with a number, like in this
quote: "$5.00 doesn't go as far as it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence
three with a number, like in this quote: "$5.00 doesn't go as far as
it used to".
Paragraph sentence one. Paragraph sentence two. Paragraph sentence
three with a number, like in this quote: "$5.00 doesn't go as far as
it used to".
So I would create the following classes. First, one to represent a 'Section'. This is defined by a Header and zero to many paragraphs:
private class Section
{
public string Header { get; set; }
public List<Paragraph> Paragraphs { get; set; }
public Section()
{
Paragraphs = new List<Paragraph>();
}
}
Then I would define a Paragraph, which contains one or more sentences:
private class Paragraph
{
public List<string> Sentences { get; set; }
public Paragraph()
{
Sentences = new List<string>();
}
}
Now I can populate a List of Sections to represent the document:
var filePath = @"C:\Temp\temp.txt";
var sections = new List<Section>();
var currentSection = new Section();
var currentParagraph = new Paragraph();
using (TextReader reader = new StreamReader(filePath))
{
while (reader.Peek() >= 0)
{
var line = reader.ReadLine().Trim();
// Ignore blank lines
if (string.IsNullOrWhiteSpace(line)) continue;
if (line.EndsWith("."))
{
// This line is a paragraph, so add all the sentences
// it contains to the current paragraph
line.Split(new[] {". "}, StringSplitOptions.RemoveEmptyEntries)
.Select(l => l.Trim().EndsWith(".") ? l.Trim() : l.Trim() + ".")
.ToList()
.ForEach(l => currentParagraph.Sentences.Add(l));
// Now add this paragraph to the current section
currentSection.Paragraphs.Add(currentParagraph);
// And set it to a new paragraph for the next loop
currentParagraph = new Paragraph();
}
else if (line.Length > 0)
{
// This line is a header, so we're starting a new section.
// Add the current section to our list and create a
// a new one, setting this line as the header.
sections.Add(currentSection);
currentSection = new Section {Header = line};
}
}
// Finally, if the current section contains any data, add it to the list
// (Header is null when the text does not start with a header line)
if (!string.IsNullOrEmpty(currentSection.Header) || currentSection.Paragraphs.Any())
{
sections.Add(currentSection);
}
}
Now we have the whole document in a list of sections, and we know the order, the headers, the paragraphs, and the sentences they contain. As an example of how you can analyze it, here's a way to write it back out to a RichTextBox:
// We can build the document section by section
var documentText = new StringBuilder();
foreach (var section in sections)
{
// Here we can display headers and paragraphs in a custom way.
// For example, we can separate all sections with a blank line:
documentText.AppendLine();
// If there is a header, we can underline it
if (!string.IsNullOrWhiteSpace(section.Header))
{
documentText.AppendLine(section.Header);
documentText.AppendLine(new string('-', section.Header.Length));
}
// We can mark each paragraph with an arrow (--> )
foreach (var paragraph in section.Paragraphs)
{
documentText.Append("--> ");
// And write out each sentence, separated by a space
documentText.AppendLine(string.Join(" ", paragraph.Sentences));
}
}
// To make the underline approach above look
// half-way decent, we need a fixed-width font
richTextBox1.Font = new Font(FontFamily.GenericMonospace, 9);
// Now set the RichTextBox Text equal to the StringBuilder Text
richTextBox1.Text = documentText.ToString();
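If all that is needed is the flat array the question asks for, with headers kept as their own entries, the same one-line-per-paragraph assumption can be applied to a string directly (for example the result of File.ReadAllText). This is a sketch, not the answer's code: `SentenceSplitter.SplitSentencesAndHeaders` is a hypothetical name, and it splits after a period followed by whitespace, so "$5.00" stays whole, but it will still split after abbreviations such as "e.g." and treats a line ending in `."` as a header.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one array entry per header line or per sentence.
public static class SentenceSplitter
{
    // Assumes (like the answer above) that each header sits on its own line and
    // does not end with a period, and that each paragraph is a single line.
    public static string[] SplitSentencesAndHeaders(string content)
    {
        var result = new List<string>();
        foreach (string raw in content.Split('\n'))
        {
            string line = raw.Trim();          // also strips '\r' from Windows line endings
            if (line.Length == 0) continue;    // skip blank lines

            if (!line.EndsWith("."))
            {
                result.Add(line);              // header/title: kept as its own entry
                continue;
            }

            // Split after a period that is followed by whitespace.
            foreach (string sentence in Regex.Split(line, @"(?<=\.)\s+"))
            {
                if (sentence.Length > 0) result.Add(sentence);
            }
        }
        return result.ToArray();
    }
}
```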

How to delete all matching lines of text

How can I delete the lines that contain a given word from the text in a RichTextBox?
Example:
[02/04/2014 17:04:21] Thread 1 Banned: xxxxxxxxx@xxxx.tld
[02/04/2014 17:04:21] Thread 2: Banned: xxxxxxxxx@xxxx.tld
[02/04/2014 17:04:21] Thread 3: Banned: xxxxxxxxx@xxxx.tld
[02/04/2014 17:04:21] Thread 4: Banned: xxxxxxxxx@xxxx.tld
I would like to delete all rows with the word "Banned" in the line.
How can I do this?
Thanks in advance.

You can use LINQ to remove all the lines that contain the word "Banned":
richTextBox1.Lines = richTextBox1.Lines
    .Where(line => !line.Contains("Banned"))
    .ToArray();

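I know this method looks ugly, but if you don't want to lose the formatting of the existing text in the RichTextBox, you should use it. This example is not tested, but you can take the logic from here.
// Walk backwards so deleting a line never shifts the lines still to be checked.
string[] lines = rtf.Lines;
for (int iLine = lines.Length - 1; iLine >= 0; iLine--)
{
    if (!lines[iLine].Contains("Banned"))
        continue;
    // RichTextBox.Text separates lines with a single '\n', so a line starts
    // after the preceding lines plus one separator each.
    int start = 0;
    for (int j = 0; j < iLine; j++)
        start += lines[j].Length + 1;
    // Also delete the line's trailing '\n' so no blank line is left behind.
    int length = lines[iLine].Length + (iLine < lines.Length - 1 ? 1 : 0);
    rtf.Select(start, length);
    rtf.SelectedText = string.Empty;
}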

You could try using the answer from the post "What is the best way to remove words from richtextbox?", adjusting the code slightly to neaten it up.
Code snippet from that post (which would need tidying up):
string[] lines = richTextBox1.Lines;
List<string> linesToAdd = new List<string>();
string filterString = "Banned";
foreach (string s in lines)
{
string temp = s;
if (s.Contains(filterString))
temp = s.Replace(filterString, string.Empty);
linesToAdd.Add(temp);
}
richTextBox1.Lines = linesToAdd.ToArray();
I would adjust the above code so that, while still using the loop, it checks whether the line contains the word you are looking for ("Banned") and then removes the line, or does whatever else is needed with it.
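The adjustment described above, keeping the loop but dropping whole lines that contain the word, might look like this. `LineFilter.RemoveLinesContaining` is a hypothetical helper that works on a plain string array, so it can be tested outside a form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the loop from the answer above, dropping whole lines.
public static class LineFilter
{
    // Returns the lines that do NOT contain `word` (ordinal, case-sensitive match).
    public static string[] RemoveLinesContaining(string[] lines, string word)
    {
        var kept = new List<string>();
        foreach (string line in lines)
        {
            // IndexOf with a StringComparison also works on .NET Framework,
            // where string.Contains has no StringComparison overload.
            if (line.IndexOf(word, StringComparison.Ordinal) < 0)
                kept.Add(line);
        }
        return kept.ToArray();
    }
}
```

Usage: `richTextBox1.Lines = LineFilter.RemoveLinesContaining(richTextBox1.Lines, "Banned");`. Like the LINQ answer, assigning Lines replaces the text, so any RTF formatting is lost.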
I hope this helps?

VSTO format word-footer to be left-bound

I need to implement a way to add, with one click, a one-line footer to a Word document.
The first part must be the absolute path of the document, aligned left. In addition, the actual page number must be aligned right.
This wasn't a problem on Excel; there I could use LeftFooter, CenterFooter, RightFooter.
On Word however there are no such properties to access.
Edit: I found a semi-working solution, which has some bugs and isn't properly designed, because I haven't found a proper way yet.
Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;
foreach (Word.Section wordSection in doc.Sections)
{
Word.Range PageNumberRange = wordSection.Range;
PageNumberRange.Fields.Add(PageNumberRange, Word.WdFieldType.wdFieldEmpty ,"PAGE Arabic ", true);
Word.Range footer = wordSection.Footers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Range;
footer.ParagraphFormat.Alignment = Word.WdParagraphAlignment.wdAlignParagraphCenter;
footer.Tables.Add(footer, 1, 3);
Word.Table tbl = footer.Tables[1];
tbl.Cell(1, 1).Range.Text = doc.FullName;
tbl.Cell(1, 3).Range.Text = PageNumberRange.Text;
/**/
footer.Font.ColorIndex = Word.WdColorIndex.wdBlack;
footer.Font.Size = 6;
PageNumberRange.Text = "";
The problems with this one are: it never overwrites the existing footer. If it writes "document1 ... 1" and you click again after saving the document, it doesn't change the footer. Furthermore, if there are multiple pages, every page except page 1 gets deleted.
I never imagined it could be so hard to implement such an easy task.

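An alternative approach using styles: Word's built-in Footer style defines a center tab stop and a right tab stop, so inserting the document path, two tabs and a PAGE field places the path on the left and the page number on the right.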
Document doc = this.application.ActiveDocument;
Section wordSection = doc.Sections[1];
Range footer = wordSection.Footers[WdHeaderFooterIndex.wdHeaderFooterPrimary].Range;
footer.Fields.Add(footer, WdFieldType.wdFieldEmpty, @"PAGE \* ARABIC", true);
footer.Collapse(WdCollapseDirection.wdCollapseStart);
footer.InsertBefore("\t \t");
footer.InsertBefore(doc.FullName);
footer.Font.Name = "Arial";
