Text Delta Determination (Hierarchical)

Text Delta Determination (Hierarchical) - c#

I have a case which requires comparison (and diff determination) of two texts.
These are actually hierarchical configuration files and all sub items are indented in every level. Example :
File 1 :
conf_1
conf_1_1
conf_1_2
conf_1_3
conf_1_3_1
conf_1_4
conf_1_4_1
conf_1_4_2
File 2 :
conf_1
conf_1_1
conf_1_2
conf_1_3
conf_1_3_1
conf_1_3_2
conf_1_4
conf_1_4_1
conf_1_4_2
conf_1_5
The comparison between these two files should result as :
Result :
conf_1
conf_1_3
conf_1_3_2
conf_1_5
Remarks :
I'm only interested in plus delta (the additions in second file).
Order of lines may change between two files, this shouldn't be
interpreted as difference, as soon as hierarchy is preserved.
I have a solution :
"Flattening" the lines of each files ( e.g. conf_1 > conf_1_3 > conf_1_3_1 ), performing a brute-force comparison (comparing each line in File1 with each line in File2) and then re-indenting the different lines.
But I'm looking for more efficient solutions.
Any idea?
Thanks in advance.

I would suggest populating a 2 hierarchical lists and processing them recursively.
Start with defining a simple class:
class Node
{
public string Text;
public List<Node> Children;
}
Here the Text is supposed to contain the text with indentation removed.
Then you would populate two node lists from your files, build another node list with differences, and write the result to another file. Something like this:
var nodes1 = ReadNodes(sourceFile1);
var nodes2 = ReadNodes(sourceFile2);
var diff = GetDiff(nodes1, nodes2);
if (diff.Count > 0)
{
using (var sw = new StreamWriter(diffFile))
WriteDiff(sw, diff);
}
The used methods are:
static List<Node> ReadNodes(string fileName)
{
// I'm leaving that part for you
}
static List<Node> GetDiff(List<Node> nodes1, List<Node> nodes2)
{
if (nodes2 == null || nodes2.Count == 0) return null;
if (nodes1 == null || nodes1.Count == 0) return nodes2;
var map = nodes1.ToDictionary(n => n.Text);
var diff = new List<Node>();
foreach (var n2 in nodes2)
{
Node n1;
if (!map.TryGetValue(n2.Text, out n1))
diff.Add(n2);
else
{
var childDiff = GetDiff(n1.Children, n2.Children);
if (childDiff != null && childDiff.Count > 0)
diff.Add(new Node { Text = n2.Text, Children = childDiff });
}
}
return diff;
}
static void WriteDiff(TextWriter output, List<Node> nodes, int indent = 0)
{
if (nodes == null) return;
foreach (var node in nodes)
{
for (int i = 0; i < indent; i++)
output.Write(' ');
output.WriteLine(node.Text);
WriteDiff(output, node.Children, indent + 4);
}
}
Test using your example:
var nodes1 = new List<Node>
{
new Node { Text = "conf_1", Children = new List<Node> {
new Node { Text = "conf_1_1" },
new Node { Text = "conf_1_2" },
new Node { Text = "conf_1_3", Children = new List<Node> {
new Node { Text = "conf_1_3_1" },
} },
new Node { Text = "conf_1_4", Children = new List<Node> {
new Node { Text = "conf_1_4_1" },
new Node { Text = "conf_1_4_2" },
} },
}},
};
var nodes2 = new List<Node>
{
new Node { Text = "conf_1", Children = new List<Node> {
new Node { Text = "conf_1_1" },
new Node { Text = "conf_1_2" },
new Node { Text = "conf_1_3", Children = new List<Node> {
new Node { Text = "conf_1_3_1" },
new Node { Text = "conf_1_3_2" },
} },
new Node { Text = "conf_1_4", Children = new List<Node> {
new Node { Text = "conf_1_4_1" },
new Node { Text = "conf_1_4_2" },
} },
new Node { Text = "conf_1_5" },
}},
};
var diff = GetDiff(nodes1, nodes2);
if (diff.Count > 0)
{
using (var sw = new StringWriter())
{
WriteDiff(sw, diff);
Console.WriteLine(sw.ToString());
}
}

Related

How to set values in dynamic JSON

I have an XML file from which dynamically creates a JSON. The last entry in each row should get a value when the JSON is finished.
However when the JSON is finished only the last entry from the last row has a value. I guess its overwriting since I need to recreate every entry for the right JSON structure.
I split the Group of each row and each point is a JObject except the one with "[0]", thats a JArray.
Has anyone an idea how I can avoid overwriting my JObjects? Thanks.
My XML
My JSON
private static void GenerateElement(JObject parent, string path, string value)
{
var parts = path.Split('.');
foreach (var item in parts)
{
var partsCount = parts.Count();
var currentElement = item;
if (partsCount > 1)
{
path = path.Remove(0, item.Length);
if (path.StartsWith("."))
{
path = path.Remove(0, 1);
}
else if (path.EndsWith("."))
{
path = path.Remove(path.Length - 1, 1);
}
parts = path.Split('.');
}
var nextElementPath = path;
var elementArrayName = currentElement;
if (currentElement.IndexOf("[0]") >= 0)
{
elementArrayName = currentElement.Substring(0, currentElement.IndexOf("[0]"));
}
else
{
parent[currentElement] = new JObject();
}
if (partsCount > 1)
{
if (parent[nextElementPath] == null)
{
if (parent[elementArrayName] == null)
{
parent[elementArrayName] = new JArray();
(parent[elementArrayName] as JArray).Add(new JObject());
}
}
if (parent[elementArrayName] is JArray)
{
parent = parent[elementArrayName][0] as JObject;
}
else
{
parent = parent[currentElement] as JObject;
}
}
else
{
parent[currentElement] = value;
}
}
}

Looking way to refactor these two methods into single method

Hi All i am trying to generate the word document with two different tables included in it, for this purpose i have two similar methods where i am passing word document reference and data object and table to the similar methods..
Now i am looking to make single method in generic way so that in different places i can use single method by passing parameters to it
Method 1 :
private static List<OpenXmlElement> RenderExhaustEquipmentTableDataAndNotes(MainDocumentPart mainDocumentPart, List<ProjectObject<ExhaustEquipment>> exhaustEquipment,Table table)
{
HtmlConverter noteConverter = new HtmlConverter(mainDocumentPart);
var equipmentExhaustTypes = new Dictionary<string, List<ProjectObject<ExhaustEquipment>>>();
foreach (var item in exhaustEquipment)
{
string exhaustEquipmentName = item.TargetObject.Name;
if (!equipmentExhaustTypes.ContainsKey(exhaustEquipmentName))
{
equipmentExhaustTypes.Add(exhaustEquipmentName, new List<ProjectObject<ExhaustEquipment>>());
}
equipmentExhaustTypes[exhaustEquipmentName].Add(item);
}
List<OpenXmlElement> notes = new List<OpenXmlElement>();
int noteIndex = 1;
foreach (var exhaustEquipmentItem in equipmentExhaustTypes)
{
List<string> noteIndices = new List<string>();
for (int exhaustEquipmentConditionIndex = 0; exhaustEquipmentConditionIndex < exhaustEquipmentItem.Value.Count; exhaustEquipmentConditionIndex++)
{
var condition = exhaustEquipmentItem.Value[exhaustEquipmentConditionIndex];
var row = new TableRow();
Run superscriptRun = new Run(new RunProperties(new VerticalTextAlignment { Val = VerticalPositionValues.Superscript }));
if (exhaustEquipmentConditionIndex == 0)
{
row.Append(RenderOpenXmlElementContentCell(new Paragraph(
new List<Run> {
new Run(new RunProperties(), new Text(exhaustEquipmentItem.Key) { Space = SpaceProcessingModeValues.Preserve }),
superscriptRun
}), 1,
new OpenXmlElement[] {new VerticalMerge { Val = MergedCellValues.Restart },new TableCellMargin {
LeftMargin = new LeftMargin { Width = "120" },
TopMargin = new TopMargin { Width = "80" } }
}));
}
else
{
row.Append(RenderTextContentCell(null, 1, null, null, new OpenXmlElement[] { new VerticalMerge { Val = MergedCellValues.Continue } }));
}
row.Append(RenderTextContentCell(condition.TargetObject.IsConstantVolume ? "Yes" : "No"));
row.Append(RenderTextContentCell($"{condition.TargetObject.MinAirflow:R2}"));
row.Append(RenderTextContentCell($"{condition.TargetObject.MaxAirflow:R2}"));
if (condition.TargetObject.NotesHTML?.Count > 0)
{
foreach (var note in condition.TargetObject.NotesHTML)
{
var compositeElements = noteConverter.Parse(note);
var htmlRuns = compositeElements.First().ChildElements.Where(c => c is Run).Cast<Run>().Select(n => n.CloneNode(true));
notes.Add(new Run(htmlRuns));
noteIndices.Add(noteIndex++.ToString(CultureInfo.InvariantCulture));
}
}
if (exhaustEquipmentConditionIndex == exhaustEquipmentItem.Value.Count - 1 && condition.TargetObject.NotesHTML?.Count > 0)
{
superscriptRun.Append(new Text($"({String.Join(',', noteIndices)})") { Space = SpaceProcessingModeValues.Preserve });
}
table.Append(row);
}
}
List<OpenXmlElement> notesSection = new List<OpenXmlElement>();
List<OpenXmlElement> result = RenderNotesArray(table, notes, notesSection);
return result;
}
and I am calling this method like this in below
var table = new Table(RenderTableProperties());
table.Append(new TableRow(
RenderTableHeaderCell("Name"),
RenderTableHeaderCell("Constant Volume"),
RenderTableHeaderCell("Minimum Airflow", units: "(cfm)"),
RenderTableHeaderCell("Wet Bulb Temperature", units: "(cfm)")
));
body.Append(RenderExhaustEquipmentTableDataAndNotes(mainDocumentPart, designHubProject.ExhaustEquipment, table));
Method 2:
private static List<OpenXmlElement> RenderInfiltrationTableData(MainDocumentPart mainDocumentPart, List<ProjectObject<Infiltration>> infiltration,Table table)
{
HtmlConverter noteConverter = new HtmlConverter(mainDocumentPart);
var nameByInflitrationObject = new Dictionary<string, List<ProjectObject<Infiltration>>>();
foreach (var infiltrationData in infiltration)
{
string infiltrationName = infiltrationData.TargetObject.Name;
if (!nameByInflitrationObject.ContainsKey(infiltrationName))
{
nameByInflitrationObject.Add(infiltrationName, new List<ProjectObject<Infiltration>>());
}
nameByInflitrationObject[infiltrationName].Add(infiltrationData);
}
List<OpenXmlElement> notes = new List<OpenXmlElement>();
int noteIndex = 1;
foreach (var inflitrationDataItem in nameByInflitrationObject)
{
List<string> noteIndices = new List<string>();
for (int inflitrationNameIndex = 0; inflitrationNameIndex < inflitrationDataItem.Value.Count; inflitrationNameIndex++)
{
var dataItem = inflitrationDataItem.Value[inflitrationNameIndex];
var row = new TableRow();
Run superscriptRun = new Run(new RunProperties(new VerticalTextAlignment { Val = VerticalPositionValues.Superscript }));
if (inflitrationNameIndex == 0)
{
row.Append(RenderOpenXmlElementContentCell(new Paragraph(
new List<Run> {
new Run(new RunProperties(), new Text(inflitrationDataItem.Key) { Space = SpaceProcessingModeValues.Preserve }),superscriptRun
}), 1,
new OpenXmlElement[] {new VerticalMerge { Val = MergedCellValues.Restart },new TableCellMargin {
LeftMargin = new LeftMargin { Width = "120" },
TopMargin = new TopMargin { Width = "80" }}
}));
}
else
{
row.Append(RenderTextContentCell(null, 1, null, null, new OpenXmlElement[] { new VerticalMerge { Val = MergedCellValues.Continue } }));
}
row.Append(RenderTextContentCell($"{dataItem.TargetObject.AirflowScalar.ToString("R2", CultureInfo.CurrentCulture)} cfm {EnumUtils.StringValueOfEnum(dataItem.TargetObject.InfiltrationCalculationType).ToLower(CultureInfo.CurrentCulture)}"));
if (dataItem.TargetObject.NotesHTML?.Count > 0)
{
foreach (var note in dataItem.TargetObject.NotesHTML)
{
var compositeElements = noteConverter.Parse(note);
var htmlRuns = compositeElements.First().ChildElements.Where(c => c is Run).Cast<Run>().Select(n => n.CloneNode(true));
notes.Add(new Run(htmlRuns));
noteIndices.Add(noteIndex++.ToString(CultureInfo.InvariantCulture));
}
}
if (inflitrationNameIndex == inflitrationDataItem.Value.Count - 1 && dataItem.TargetObject.NotesHTML?.Count > 0)
{
superscriptRun.Append(new Text($"({String.Join(',', noteIndices)})") { Space = SpaceProcessingModeValues.Preserve });
}
table.Append(row);
}
}
List<OpenXmlElement> notesSection = new List<OpenXmlElement>();
List<OpenXmlElement> result = RenderNotesArray(table, notes, notesSection);
return result;
}
and then i am calling this method here like as below
var table = new Table(RenderTableProperties());
table.Append(new TableRow(
RenderTableHeaderCell("Type"),
RenderTableHeaderCell("Air Flow")
));
body.Append(RenderInfiltrationTableData(mainDocumentPart, designHubProject.Infiltration, table));
i know these are lots of lines but is there any generic way to use single method out of these two similar methods and i am using .net core
Could any one please suggest any idea or suggestion how can i refactor these two methods into single method that would be very grateful.
many thanks in advance

Before we can create a single function that handles both types, achieving the highly laudable goal of removing gratuitous duplication, we should clean the code up to make it easier to see which parts, if any, are different between the two nearly identical methods. And there is a lot to clean up, even if we only had one function.
In short, your functions are too long, having too much much code in one place, and in fact too much code altogether.
In the following, the original code has been broken down into multiple functions with specific purposes and refactored to remove DIY nonsense in favor of the standard library functions and the removal of pointless code.
static IEnumerable<OpenXmlElement> RenderExhaustEquipmentTableDataAndNotes(MainDocumentPart mainDocumentPart, List<ProjectObject<ExhaustEquipment>> exhaustEquipment, Table table)
{
var equipmentByType = exhaustEquipment.ToLookup(item => item.TargetObject.Name);
List<OpenXmlElement> notes = new List<OpenXmlElement>();
foreach (var items in equipmentByType)
{
Run superscriptRun = CreateSuperScriptRun();
foreach (var item in items)
{
var row = new TableRow();
if (item == items.First())
{
row.Append(CreateFirstRowStartingCell(items.Key, superscriptRun));
}
else
{
row.Append(RenderTextContentCell(null, 1, null, null, new[] {
new VerticalMerge { Val = MergedCellValues.Continue }
}));
}
row.Append(RenderTextContentCell(item.TargetObject.IsConstantVolume ? "Yes" : "No"));
row.Append(RenderTextContentCell($"{item.TargetObject.MinAirflow:R2}"));
row.Append(RenderTextContentCell($"{item.TargetObject.MaxAirflow:R2}"));
table.Append(row);
var itemNotes = ParseNotes(mainDocumentPart, item.TargetObject.NotesHTML);
if (item == items.Last() && itemNotes.Any())
{
UpdateSuperScript(superscriptRun, itemNotes);
}
notes.AddRange(itemNotes);
}
}
List<OpenXmlElement> result = RenderNotesArray(table, notes, new List<OpenXmlElement>());
return result;
}
private static Run CreateSuperScriptRun()
{
return new Run(new RunProperties(new VerticalTextAlignment
{
Val = VerticalPositionValues.Superscript
}));
}
private static void UpdateSuperScript(Run superscriptRun, IEnumerable<OpenXmlElement> notes)
{
superscriptRun.Append(new Text($"({string.Join(",", Enumerable.Range(0, notes.Count()))})")
{
Space = SpaceProcessingModeValues.Preserve
});
}
private static IEnumerable<OpenXmlElement> ParseNotes(MainDocumentPart mainDocumentPart, IEnumerable<OpenXmlElement> notes)
{
return notes == null
? Enumerable.Empty<OpenXmlElement>()
: notes.Select(note => new HtmlConverter(mainDocumentPart).Parse(note))
.Select(note => note.First().ChildElements
.OfType<Run>()
.Select(n => n.CloneNode(true))).Select(htmlRuns => new Run(htmlRuns))
.ToList();
}
private OpenXmlElement CreateFirstRowStartingCell(string key, Run superscriptRun)
{
return RenderOpenXmlElementContentCell(
new Paragraph(new List<Run> {
new Run(new RunProperties(), new Text(key) { Space = SpaceProcessingModeValues.Preserve }),
superscriptRun
}),
1,
new OpenXmlElement[] {
new VerticalMerge { Val = MergedCellValues.Restart },
new TableCellMargin { LeftMargin = new LeftMargin { Width = "120" }, TopMargin = new TopMargin { Width = "80" } }
});
}
Now, let's tackle the second function:
static IEnunumerable<OpenXmlElement> RenderInfiltrationTableData(MainDocumentPart mainDocumentPart, IEnunumerable<ProjectObject<Infiltration>> infiltration, Table table)
{
var infiltrationsByType = infiltration.ToLookup(item => item.TargetObject.Name);
List<OpenXmlElement> notes = new List<OpenXmlElement>();
foreach (var inflitrations in infiltrationsByType)
{
Run superscriptRun = CreateSuperScriptRun();
foreach (var item in inflitrations)
{
var row = new TableRow();
if (item == inflitrations.First())
{
row.Append(CreateFirstRowStartingCell(inflitrations.Key, superscriptRun));
}
else
{
row.Append(RenderTextContentCell(null, 1, null, null, new[] {
new VerticalMerge { Val = MergedCellValues.Continue }
}));
}
row.Append(RenderTextContentCell($"{item.TargetObject.AirflowScalar:R2} cfm {item.TargetObject.InfiltrationCalculationType}").ToLower());
table.Append(row);
var itemNotes = ParseNotes(mainDocumentPart, item.TargetObject.NotesHTML);
if (item == inflitrations.Last() && itemNotes.Any())
{
UpdateSuperScript(superscriptRun, itemNotes);
}
notes.AddRange(itemNotes);
}
}
IEnumerable<OpenXmlElement> result = RenderNotesArray(table, notes, new List<OpenXmlElement>());
return result;
}
As we have seen, duplication can be massively reduced simply by extracting code into simple helper functions.
This also makes it far easier to see just where the differences are between the two functions.
It is simply a matter of
row.Append(RenderTextContentCell(item.TargetObject.IsConstantVolume ? "Yes" : "No"));
row.Append(RenderTextContentCell($"{item.TargetObject.MinAirflow:R2}"));
row.Append(RenderTextContentCell($"{item.TargetObject.MaxAirflow:R2}"));
vs.
row.Append(RenderTextContentCell($"{item.TargetObject.AirflowScalar:R2} cfm {item.TargetObject.InfiltrationCalculationType}").ToLower());
To achieve your desired goal of a single function, we can make a generic function, and require that the caller pass in a function that will take care of these differences.
static IEnumerable<OpenXmlElement> RenderTableDataAndNotes<T>(
MainDocumentPart mainDocumentPart,
IEnumerable<ProjectObject<T>> projects,
Table table,
Func<ProjectObject<T>, IEnumerable<OpenXmlElement>> createCells
) where T : ITargetObject
{
var projectsByType = projects.ToLookup(item => item.TargetObject.Name);
List<OpenXmlElement> notes = new List<OpenXmlElement>();
foreach (var items in projectsByType)
{
Run superscriptRun = CreateSuperScriptRun();
foreach (var item in items)
{
var row = new TableRow();
if (item == items.First())
{
row.Append(CreateFirstRowStartingCell(items.Key, superscriptRun));
}
else
{
row.Append(RenderTextContentCell(null, 1, null, null, new[] {
new VerticalMerge { Val = MergedCellValues.Continue }
}));
}
var itemCells = createCells(item);
foreach (var cell in itemCells)
{
row.Append(cell);
}
table.Append(row);
var itemNotes = ParseNotes(mainDocumentPart, item.TargetObject.NotesHTML);
if (item == items.Last() && itemNotes.Any())
{
UpdateSuperScript(superscriptRun, itemNotes);
}
notes.AddRange(itemNotes);
}
}
IEnumerable<OpenXmlElement> result = RenderNotesArray(table, notes, new List<OpenXmlElement>());
return result;
}
Now, when we call it for say some Exhaust Equipment, we do so as follows:
var rendered = RenderTableDataAndNotes(mainDocumentPart, exhaustProjects, table,
exhaust => new[] {
RenderTextContentCell(exhaust.TargetObject.IsConstantVolume ? "Yes" : "No"),
RenderTextContentCell($"{exhaust.TargetObject.MinAirflow:R2}"),
RenderTextContentCell($"{exhaust.TargetObject.MaxAirflow:R2}"),
});
And for infiltration projects, we would do as follows:
var rendered = RenderTableDataAndNotes(
mainDocumentPart,
infiltrationProjects,
table,
infiltration => new[] {
RenderTextContentCell($"{item.TargetObject.AirflowScalar:R2} cfm {item.TargetObject.InfiltrationCalculationType}")
.ToLower()
});
The code could still be substantially improved even now. Currently it requires that the various project types implement a common ITargetObject interface declaring the Name property used to group projects by type. If you refactored your code to reduce nesting by hoisting Name to the ProjectObject<T> type, then we could remove the constraint and the otherwise useless requirement that Infiltration and ExhaustEquipment implement the ITargetObject interface.
Note, if you can't change the types, you can adjust the code in a few ways.
For example, you can remove the type constraint on T and build the lookup outside and pass it to the function:
static IEnumerable<OpenXmlElement> RenderTableDataAndNotes<T>(
MainDocumentPart mainDocumentPart,
ILookup<string, ProjectObject<T>> projectsByType,
Table table,
Func<ProjectObject<T>, IEnumerable<OpenXmlElement>> createCells
)
Then you would call it as
var infiltrationProjectsByType = infiltrationProjects.ToLookup(project => project.Name);
var rendered = RenderTableDataAndNotes(
mainDocumentPart,
infiltrationProjectsByType,
table,
infiltration => new[] {
RenderTextContentCell($"{infiltration.TargetObject.AirflowScalar:R2} cfm {infiltration.TargetObject.InfiltrationCalculationType}").ToLower()
}
);

How to add multilevel list in word document using openxml sdk and c#?

I am trying to implement multilevel list in word document programmatically using openxml sdk v2.5 and C#. I have used the following code in order to display headings in multilevel list. I am calling the AddHeading method and passing the respective parameters as shown below. My expected result is to be as below.
1. Parent
1.1. childItem
1.1.1. subchildItem
1.2. childItem
2. Parent
2.1. childItem
3. Parent
But the output i am getting is
1. Parent
1. childItem
1. subchildItem
2. childItem
2. Parent
1. childItem
3. Parent
public static void AddHeading(WordprocessingDocument document, string colorVal, int fontSizeVal, string styleId, string styleName, string headingText, int numLvlRef, int numIdVal)
{
StyleRunProperties styleRunProperties = new StyleRunProperties();
Color color = new Color() { Val = colorVal };
DocumentFormat.OpenXml.Wordprocessing.FontSize fontSize1 = new DocumentFormat.OpenXml.Wordprocessing.FontSize();
fontSize1.Val = new StringValue(fontSizeVal.ToString());
styleRunProperties.Append(color);
styleRunProperties.Append(fontSize1);
AddStyleToDoc(document.MainDocumentPart.Document.MainDocumentPart, styleId, styleName, styleRunProperties, document);
Paragraph p = new Paragraph();
ParagraphProperties pp = new ParagraphProperties();
pp.ParagraphStyleId = new ParagraphStyleId() { Val = styleId };
pp.SpacingBetweenLines = new SpacingBetweenLines() { After = "0" };
ParagraphStyleId paragraphStyleId1 = new ParagraphStyleId() { Val = "ListParagraph" };
NumberingProperties numberingProperties1 = new NumberingProperties();
NumberingLevelReference numberingLevelReference1 = new NumberingLevelReference() { Val = numLvlRef };
NumberingId numberingId1 = new NumberingId() { Val = numIdVal }; //Val is 1, 2, 3 etc based on your numberingid in your numbering element
numberingProperties1.Append(numberingLevelReference1); Indentation indentation1 = new Indentation() { FirstLineChars = 0 };
numberingProperties1.Append(numberingId1);
pp.Append(paragraphStyleId1);
pp.Append(numberingProperties1);
pp.Append(indentation1);
p.Append(pp);
Run r = new Run();
Text t = new Text(headingText) { Space = SpaceProcessingModeValues.Preserve };
r.Append(t);
p.Append(r);
document.MainDocumentPart.Document.Body.Append(p);
}
public static void AddStyleToDoc(MainDocumentPart mainPart, string styleid, string stylename, StyleRunProperties styleRunProperties, WordprocessingDocument document)
{
StyleDefinitionsPart part = mainPart.StyleDefinitionsPart;
if (part == null)
{
part = AddStylesPartToPackage(mainPart);
AddNewStyle(part, styleid, stylename, styleRunProperties);
}
else
{
if (IsStyleIdInDocument(mainPart, styleid) != true)
{
string styleidFromName = GetStyleIdFromStyleName(document, stylename);
if (styleidFromName == null)
{
AddNewStyle(part, styleid, stylename, styleRunProperties);
}
else
styleid = styleidFromName;
}
}
}
public static string GetStyleIdFromStyleName(WordprocessingDocument doc, string styleName)
{
StyleDefinitionsPart stylePart = doc.MainDocumentPart.StyleDefinitionsPart;
string styleId = stylePart.Styles.Descendants<StyleName>()
.Where(s => s.Val.Value.Equals(styleName) &&
(((Style)s.Parent).Type == StyleValues.Paragraph))
.Select(n => ((Style)n.Parent).StyleId).FirstOrDefault();
return styleId;
}
public static StyleDefinitionsPart AddStylesPartToPackage(MainDocumentPart mainPart)
{
StyleDefinitionsPart part;
part = mainPart.AddNewPart<StyleDefinitionsPart>();
DocumentFormat.OpenXml.Wordprocessing.Styles root = new DocumentFormat.OpenXml.Wordprocessing.Styles();
root.Save(part);
return part;
}
public static bool IsStyleIdInDocument(MainDocumentPart mainPart, string styleid)
{
DocumentFormat.OpenXml.Wordprocessing.Styles s = mainPart.StyleDefinitionsPart.Styles;
int n = s.Elements<DocumentFormat.OpenXml.Wordprocessing.Style>().Count();
if (n == 0)
return false;
DocumentFormat.OpenXml.Wordprocessing.Style style = s.Elements<DocumentFormat.OpenXml.Wordprocessing.Style>()
.Where(st => (st.StyleId == styleid) && (st.Type == StyleValues.Paragraph))
.FirstOrDefault();
if (style == null)
return false;
return true;
}
private static void AddNewStyle(StyleDefinitionsPart styleDefinitionsPart, string styleid, string stylename, StyleRunProperties styleRunProperties)
{
DocumentFormat.OpenXml.Wordprocessing.Styles styles = styleDefinitionsPart.Styles;
DocumentFormat.OpenXml.Wordprocessing.Style style = new DocumentFormat.OpenXml.Wordprocessing.Style()
{
Type = StyleValues.Paragraph,
StyleId = styleid,
CustomStyle = false
};
style.Append(new StyleName() { Val = stylename });
style.Append(new BasedOn() { Val = "Normal" });
style.Append(new NextParagraphStyle() { Val = "Normal" });
style.Append(new UIPriority() { Val = 900 });
styles.Append(style);
}

To use multi-level lists in Microsoft Word, you need to ensure that:
you have the desired multi-level list set up correctly in your numbering definitions part (meaning your w:numbering element contains a w:num and corresponding w:abstractNum that specifies the different list levels);
you have one style for each list level that references both the numbering ID specified by the w:num element and the desired list level; and
your paragraphs reference the correct style for the list level.
Please have a look at my answers on how to create multi-level ordered lists with Open XML and display multi-level lists in Word documents using C# and the Open XML SDK for further details and explanations.

Finding difference between two tree structures C#

I am in need of a function that will compare the difference between two different file structures and return that difference as a file structure.
I have a class, "Element" that has properties ID and Children, with ID being a string and Children being a collection of Element.
public class Element
{
string ID { get; }
IEnumerable<Element> Children { get; }
}
Now, let's say I have the following structures of Elements:
Structure A Structure B
- Category 1 - Category 1
- Child X - Child X
- Child Y - Child Z
- Category 2
I would like to return a structure that tells me which elements are present in structure A but missing from structure B, which would look as follows:
Structure Diff
- Category 1
- Child Y
- Category 2
Is there a simple way of doing this using LINQ, or a straight-forward algorithm (Assuming there can be many levels to the tree).

Yes, it is. You can just compare two enumerables of strings that contains paths of files:
Category 1\
Category 1\Child X
Category 1\Child Y
Category 2\
Category 1\
Category 1\Child X
Category 1\Child Z
Having these two enumerables you can call Enumerable.Except method to keep items from the first enumerable that are missing in the second enumerable.

Sample implementation to get you started (tested only on one case):
internal class Program {
private static void Main(string[] args) {
var c1 = new Element[] {
new Element() {ID = "Category 1", Children = new Element[] {
new Element() {ID = "Child X" },
new Element() {ID = "Child Y" }
}},
new Element() {ID = "Category 2",}
};
var c2 = new Element[] {
new Element() {ID = "Category 1", Children = new Element[] {
new Element() {ID = "Child X" },
new Element() {ID = "Child Z" }
}},
};
var keys = new HashSet<string>(GetFlatKeys(c2));
var result = FindDiff(c1, keys).ToArray();
Console.WriteLine(result);
}
private static IEnumerable<Element> FindDiff(Element[] source, HashSet<string> keys, string key = null) {
if (source == null)
yield break;
foreach (var parent in source) {
key += "|" + parent.ID;
parent.Children = FindDiff(parent.Children, keys, key).ToArray();
if (!keys.Contains(key) || (parent.Children != null && parent.Children.Length > 0)) {
yield return parent;
}
}
}
private static IEnumerable<string> GetFlatKeys(IEnumerable<Element> source, string key = null) {
if (source == null)
yield break;
foreach (var parent in source) {
key += "|" + parent.ID;
yield return key;
foreach (var c in GetFlatKeys(parent.Children, key))
yield return c;
}
}
}
As said in another answer, it's easier to first get flat list of keys for each element in second tree, then filter out elements from the first tree based on that list.

Making a predictive text algorithm go faster

I am working on a windows phone dialler app and I have implemented prediction text in my app. When user taps on keypad, contacts that match input are generated. Prediction is too slow, it also blocks my main thread that's why I have implemented BackGroundWorker But still having problems with the performance
My code is:
private void dialer_TextChanged(object sender, TextChangedEventArgs e)
{
MainPage.DialerText = dialer.Text;
if(!bw1.IsBusy)
bw1.RunWorkerAsync();
}
void bw1_DoWork(object sender, DoWorkEventArgs e)
{
try
{
var digitMap = new Dictionary<int, string>() {
{ 1, "" },
{ 2, "[abcABC]" },
{ 3, "[defDEF]" },
{ 4, "[ghiGHI]" },
{ 5, "[jklJKL]" },
{ 6, "[mnoMNO]" },
{ 7, "[pqrsPQRS]" },
{ 8, "[tuvTUV]" },
{ 9, "[wxyzWXYZ]" },
{ 0, "" },
};
var enteredDigits = DialerText;
var charsAsInts = enteredDigits.ToCharArray().Select(x => int.Parse(x.ToString()));
var regexBuilder = new StringBuilder();
foreach (var val in charsAsInts)
regexBuilder.Append(digitMap[val]);
MainPage.pattern = regexBuilder.ToString();
MainPage.pattern = ".*" + MainPage.pattern + ".*";
}
catch (Exception f)
{
// MessageBox.Show(f.Message);
}
}
void bw1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
SearchListbox.ItemsSource = listobj.FindAll(x => x.PhoneNumbers.Any(a=>a.Contains(MainPage.DialerText)) | Regex.IsMatch(x.FirstName, MainPage.pattern));
}
BackGroundWorker also blocking my main thread, hence when I tap on the keypad there's a lag while input values are added to the TextBox. I want to add input to the TextTox without any lag, how to do it? Thank you.

You can grab a real speed-up here by moving away from exhaustive searches of the entire wordlist and instead, putting your words into a more efficient data-structure.
For lightning fast lookups over any size of word list (but more expensive in terms of memory), you should build a tree structure that contains your entire word list.
The root node represents zero dialled digits, and it is connected to (up to) ten more nodes, where the edges connecting the nodes represent one of the possible numbers pressed for 0 to 9.
Each node then contains the possible words that can be formed from the path taken through the tree from the root node, where the path is representative of the numbers pressed.
This means that a search no longer requires iterating the entire word list and can be completed in very few operations.
Here's the concept in practice with a 370000 word-list I found on the web. Search takes around about 0.02ms on my desktop. Nice and fast. Seems to take about ~50MB in memory.
void Main()
{
var rootNode = new Node();
//probably a bad idea, better to await in an async method
LoadNode(rootNode).Wait();
//let's search a few times to get meaningful timings
for(var i = 0; i < 5; ++i)
{
//"acres" in text-ese (specifically chosen for ambiguity)
var searchTerm = "22737";
var sw = Stopwatch.StartNew();
var wordList = rootNode.Search(searchTerm);
Console.WriteLine("Search complete in {0} ms",
sw.Elapsed.TotalMilliseconds);
Console.WriteLine("Search for {0}:", searchTerm);
foreach(var word in wordList)
{
Console.WriteLine("Found {0}", word);
}
}
GC.Collect();
var bytesAllocated = GC.GetTotalMemory(true);
Console.WriteLine("Allocated {0} bytes", bytesAllocated);
}
async Task LoadNode(Node rootNode)
{
var wordListUrl =
"https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt";
Console.WriteLine("Loading words from {0}", wordListUrl);
using(var httpClient = new HttpClient())
using(var stream = await httpClient.GetStreamAsync(wordListUrl))
using(var reader = new StreamReader(stream))
{
var wordCount = 0;
string word;
while( (word = await reader.ReadLineAsync()) != null )
{
word = word.ToLowerInvariant();
if(!Regex.IsMatch(word,#"^[a-z]+$"))
{
continue;
}
rootNode.Add(word);
wordCount++;
}
Console.WriteLine("Loaded {0} words", wordCount);
}
}
class Node
{
static Dictionary<int, string> digitMap = new Dictionary<int, string>() {
{ 1, "" },
{ 2, "abcABC" },
{ 3, "defDEF" },
{ 4, "ghiGHI" },
{ 5, "jklJKL" },
{ 6, "mnoMNO" },
{ 7, "pqrsPQRS" },
{ 8, "tuvTUV" },
{ 9, "wxyzWXYZ" },
{ 0, "" }};
static Dictionary<char,int> letterMap;
static Node()
{
letterMap = digitMap
.SelectMany(m => m.Value.Select(c=>new {ch = c, num = m.Key}))
.ToDictionary(x => x.ch, x => x.num);
}
List<string> words = new List<string>();
//the edges collection has exactly 10
//slots which represent the numbers [0-9]
Node[] edges = new Node[10];
public IEnumerable<string> Words{get{
return words;
}}
public void Add(string word, int pos = 0)
{
if(pos == word.Length)
{
if(word.Length > 0)
{
words.Add(word);
}
return;
}
var currentChar = word[pos];
int edgeIndex = letterMap[currentChar];
if(edges[edgeIndex] == null)
{
edges[edgeIndex] = new Node();
}
var nextNode = edges[edgeIndex];
nextNode.Add(word, pos+1);
}
public Node FindMostPopulatedNode()
{
Stack<Node> stk = new Stack<Node>();
stk.Push(this);
Node biggest = null;
while(stk.Any())
{
var node = stk.Pop();
biggest = biggest == null
? node
: (node.words.Count > biggest.words.Count
? node
: biggest);
foreach(var next in node.edges.Where(e=>e != null))
{
stk.Push(next);
}
}
return biggest;
}
public IEnumerable<string> Search(string numberSequenceString)
{
var numberSequence = numberSequenceString
.Select(n => int.Parse(n.ToString()));
return Search(numberSequence);
}
private IEnumerable<string> Search(IEnumerable<int> numberSequence)
{
if(!numberSequence.Any())
{
return words;
}
var first = numberSequence.First();
var remaining = numberSequence.Skip(1);
var nextNode = edges[first];
if(nextNode == null)
{
return Enumerable.Empty<string>();
}
return nextNode.Search(remaining);
}
}

There is a number of optimizations you could make to improve the speed:
Adding .* prefix and suffix to your regex pattern is not necessary, because IsMatch will detect a match anywhere in a string
Using a local Dictionary<int,string> for parts of your pattern can be replaced by a static array
Converting digits to ints can be replaced with subtraction
The foreach loop and appending could be replaced by string.Join
Here is how:
private static string[] digitMap = new[] {
""
, "", "[abcABC]", "[defDEF]"
, "[ghiGHI]", "[jklJKL]", "[mnoMNO]"
, "[pqrsPQRS]", "[tuvTUV]", "[wxyzWXYZ]"
};
void bw1_DoWork(object sender, DoWorkEventArgs e) {
try {
MainPage.pattern = string.Join("", DialerText.Select(c => digitMap[c-'0']));
} catch (Exception f) {
// MessageBox.Show(f.Message);
}
}

I suspect the reason for the blocking is that your background worker thread doesn't actually do very much.
I expect the bw1_RunWorkerCompleted method (which gets run on the MAIN thread after the background worker has completed) is far more long-running! Can you move some (all?) of this code into the bw1_DoWork method instead? Obviously "listobj" would have to be accessible to the background thread though - just pass it into your delegate and access it by the DoWorkEventArgs.Argument property for this...
private void dialer_TextChanged(object sender, TextChangedEventArgs e)
{
MainPage.DialerText = dialer.Text;
if(!bw1.IsBusy)
bw1.RunWorkerAsync(listobj);
}
void bw1_DoWork(object sender, DoWorkEventArgs e)
{
try
{
var digitMap = new Dictionary<int, string>() {
{ 1, "" },
{ 2, "[abcABC]" },
{ 3, "[defDEF]" },
{ 4, "[ghiGHI]" },
{ 5, "[jklJKL]" },
{ 6, "[mnoMNO]" },
{ 7, "[pqrsPQRS]" },
{ 8, "[tuvTUV]" },
{ 9, "[wxyzWXYZ]" },
{ 0, "" },
};
var enteredDigits = DialerText;
var charsAsInts = enteredDigits.ToCharArray().Select(x => int.Parse(x.ToString()));
var regexBuilder = new StringBuilder();
foreach (var val in charsAsInts)
regexBuilder.Append(digitMap[val]);
MainPage.pattern = regexBuilder.ToString();
MainPage.pattern = ".*" + MainPage.pattern + ".*";
var listobj = (ListObjectType)e.Argument;
e.Result = listobj.FindAll(x => x.PhoneNumbers.Any(a=>a.Contains(MainPage.DialerText)) | Regex.IsMatch(x.FirstName, MainPage.pattern));
}
catch (Exception f)
{
// MessageBox.Show(f.Message);
}
}
void bw1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
SearchListbox.ItemsSource = (IEnumerable<ListObjectItemType>)e.Result;
}
NB - you'll obviously need to replace the casts to ListObjectType, and ListObjectItemType with the actual types!

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Text Delta Determination (Hierarchical) - c#

Related

How to set values in dynamic JSON

Looking way to refactor these two methods into single method

How to add multilevel list in word document using openxml sdk and c#?

Finding difference between two tree structures C#

Making a predictive text algorithm go faster

Categories

Resources