Remove duplicated items from XML by an attribute


I am trying to delete <shipmentIndex Name="shipments">whatever...</shipmentIndex>
when it appears more than once, keeping only one occurrence.
I have surrounded the item I want to delete with **** in the code below.
The code I am using worked before, but after I added .Value == "shipments"
it fails.
How can I keep this code and fix only the .Value == "shipments" part so that it works?
class Program
{
static void Main(string[] args)
{
string renderedOutput =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<RootDTO xmlns:json='http://james.newtonking.com/projects/json'>" +
"<destination>" +
"<name>xxx</name>" +
"</destination>" +
"<orderData>" +
"<items json:Array='true'>" +
"<shipmentIndex Name=\"items\" >111</shipmentIndex>" +
"<barcode>12345</barcode>" +
"</items>" +
"<items json:Array='true'>" +
"<shipmentIndex Name=\"items\">222</shipmentIndex>" +
"<barcode>12345</barcode>" +
"</items>" +
"<items json:Array='true'>" +
"<shipmentIndex Name=\"items\">222</shipmentIndex>" +
"<barcode>12345</barcode>" +
"</items>" +
"<misCode>9876543210</misCode>" +
"<shipments>" +
"<sourceShipmentId></sourceShipmentId>" +
"<shipmentIndex shipments=\"shipments\">111</shipmentIndex>" +
"</shipments>" +
"<shipments>" +
"<sourceShipmentId></sourceShipmentId>" +
"<shipmentIndex Name=\"shipments\">222</shipmentIndex>" +
****
"<shipmentIndex Name=\"shipments\">222</shipmentIndex>" +
****
"</shipments>" +
"</orderData>" +
"</RootDTO>";
var xml = XElement.Parse(renderedOutput);
xml.Element("orderData").Descendants("shipments")
.SelectMany(s => s.Elements("shipmentIndex")
.GroupBy(g => g.Attribute("Name").Value == "shipments")
.SelectMany(m => m.Skip(1))).Remove();
}
}

I'm not sure I understand the question 100%, but here goes:
I think you want to restrict the results to elements whose Name attribute equals "shipments". Not all of the shipmentIndex elements have a Name attribute, though (one of them has a shipments attribute instead), so g.Attribute("Name") returns null for those elements and .Value throws a NullReferenceException. You need to check that the Name attribute exists first.
xml.Element("orderData").Descendants("shipments")
.SelectMany(s => s.Elements("shipmentIndex")
.GroupBy(g => g.Attribute("Name") != null && g.Attribute("Name").Value == "shipments")
.SelectMany(m => m.Skip(1))).Remove();
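A variation on the same idea, as a minimal sketch: casting an XAttribute to string yields null when the attribute is missing, so no explicit null check is needed. This version assumes the goal is to keep the first Name="shipments" element inside each shipments parent and remove the later ones. The helper name RemoveDuplicateShipmentIndexes and the shortened sample XML are illustrative only and are not part of the original code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: within each <shipments> element, keeps the first
// <shipmentIndex Name="shipments"> and removes any later ones.
static void RemoveDuplicateShipmentIndexes(XElement root)
{
    root.Descendants("shipments")
        .SelectMany(s => s.Elements("shipmentIndex")
            // The cast returns null when the Name attribute is absent.
            .Where(e => (string)e.Attribute("Name") == "shipments")
            .Skip(1))
        .Remove();
}

// Shortened, assumed sample data modelled on the XML in the question.
var xml = XElement.Parse(
    "<orderData>" +
    "<shipments><shipmentIndex shipments=\"shipments\">111</shipmentIndex></shipments>" +
    "<shipments>" +
    "<shipmentIndex Name=\"shipments\">222</shipmentIndex>" +
    "<shipmentIndex Name=\"shipments\">222</shipmentIndex>" +
    "</shipments>" +
    "</orderData>");

RemoveDuplicateShipmentIndexes(xml);
Console.WriteLine(xml);
```

Unlike the GroupBy version, the Where filter leaves elements without a matching Name attribute untouched, which may or may not be what you want.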

If you want to delete the duplicate directly from the renderedOutput string (note that this removes the first matching element, whether or not it is actually duplicated):
Match match = Regex.Match(renderedOutput, "<shipmentIndex Name=\"shipments\">([^<]*)</shipmentIndex>");
if (match.Success)
renderedOutput = renderedOutput.Remove(match.Index, match.Length);

Related

c# List inside list XML Linq

I have the following statement
xdoc.Descendants("Father").Select(p => new
{
Son1 = (string)p.Element("Son1").Value,
Son2 = (string)p.Element("Son2").Value,
Son3= (string)p.Element("Son3").Value,
Son4 = (string)p.Element("Son4").Value,
Son5 = (string)p.Element("Son5").Value
}).ToList().ForEach(p =>
{
Response.Write("Son1= " + p.Son1 + " ");
Response.Write("Son2=" + p.Son2 + " ");
Response.Write("Son3=" + p.Son3 + " ");
Response.Write(("Son4 =") + p.Son4 + " ");
Response.Write(("Son5 =") + p.Son5 + " ");
Response.Write("<br />");
});
It works fine as long as there is only one instance of each son. The problem is that I have multiple instances of Son5, and I don't know how to put Son5 inside a list.
Here is an example of my XML:

If you have several elements of the same type, parse them into a list or another collection:
var fathers = from f in xdoc.Descendants("Father")
select new {
Son1 = (string)f.Element("Son1"),
Son2 = (string)f.Element("Son2"),
Son3= (string)f.Element("Son3"),
Son4 = (string)f.Element("Son4"),
Son5 = f.Elements("Son5").Select(s5 => (string)s5).ToList()
};
Some notes:
Don't use the .Value property of XElement or XAttribute. You can cast the element itself to the appropriate data type without accessing its value. Benefits: less code, and more reliable when an element is missing (you will not get a NullReferenceException).
Consider using int or int? for the values if your elements contain integers.
If you have a single Father element, don't work with a collection of fathers. Just get the XML root and check whether it is null. After that you can create a single father object.
Writing the response:
foreach(var father in fathers)
{
Response.Write($"Son1={father.Son1} ");
Response.Write($"Son2={father.Son2} ");
Response.Write($"Son3={father.Son3} ");
Response.Write($"Son4={father.Son4} ");
Response.Write(String.Join(" ", father.Son5.Select(son5 => $"Son5={son5}")));
Response.Write("<br />");
}
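The notes about casting instead of using .Value can be illustrated with a small sketch. The XML here is made-up sample data (the Age and Height elements are assumptions for illustration and do not come from the question).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: Son2 and Height are absent, Age holds an integer.
var father = XElement.Parse(
    "<Father><Son1>Al</Son1><Age>40</Age><Son5>Bo</Son5><Son5>Cy</Son5></Father>");

var son1 = (string)father.Element("Son1");      // value of the existing element
var son2 = (string)father.Element("Son2");      // null instead of an exception
var age = (int?)father.Element("Age");          // parsed as an integer
var height = (int?)father.Element("Height");    // null because the element is absent
var son5 = father.Elements("Son5").Select(s => (string)s).ToList();

Console.WriteLine(son1);
Console.WriteLine(son2 == null);
Console.WriteLine(age);
Console.WriteLine(height.HasValue);
Console.WriteLine(string.Join(",", son5));
```

With father.Element("Son2").Value the same lookup would throw a NullReferenceException, because Element returns null for a missing child.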

Try this:
xdoc.Descendants("Father").Select(p => new
{
Son1 = p.Element("Son1").Value,
Son2 = p.Element("Son2").Value,
Son3= p.Element("Son3").Value,
Son4 = p.Element("Son4").Value,
Sons5 = p.Elements("Son5").Select(element => element.Value).ToList()
}).ToList().ForEach(p =>
{
Response.Write("Son1= " + p.Son1 + " ");
Response.Write("Son2=" + p.Son2 + " ");
Response.Write("Son3=" + p.Son3 + " ");
Response.Write("Son4 =" + p.Son4 + " ");
p.Sons5.ForEach(son5 => Response.Write("Son5 =" + son5 + " "));
Response.Write("<br />");
});
That will create a list of Son5 within your list of items, which you can iterate in the ForEach with another ForEach.

Extract Xpath value from IwebElement

I need to get the XPath of an IWebElement. Can someone help me out? Please find the code below.
IWebElement webElement;
if (!string.IsNullOrEmpty(webElement.GetAttribute("id")))
{
searchprop.Add("Id", webElement.GetAttribute("id"));
}
if (!string.IsNullOrEmpty(webElement.GetAttribute("XPath")))
{
searchprop.Add("XPath", webElement.GetAttribute("XPath"));
}
Here, it is obvious that I can't get the XPath value using webElement.GetAttribute("XPath"), since XPath is not an attribute. I need the XPath in the same way as I get the id value. How can I get it?

Maybe this method will solve your problem.
public String GetElementXPath(IWebDriver driver, IWebElement element)
{
String javaScript = "function getElementXPath(elt){" +
"var path = \"\";" +
"for (; elt && elt.nodeType == 1; elt = elt.parentNode){" +
"idx = getElementIdx(elt);" +
"xname = elt.tagName;" +
"if (idx > 1){" +
"xname += \"[\" + idx + \"]\";" +
"}" +
"path = \"/\" + xname + path;" +
"}" +
"return path;" +
"}" +
"function getElementIdx(elt){" +
"var count = 1;" +
"for (var sib = elt.previousSibling; sib ; sib = sib.previousSibling){" +
"if(sib.nodeType == 1 && sib.tagName == elt.tagName){" +
"count++;" +
"}" +
"}" +
"return count;" +
"}" +
"return getElementXPath(arguments[0]).toLowerCase();";
return (String)((IJavaScriptExecutor)driver).ExecuteScript(javaScript, element);
}

Merge 2 TSV files in C# code cleanup

I'm provided with 2 Excel files that I convert to TSV files, and in the end I have to deliver a single TSV file. The 1st file is the main file (strWorksheetPath), and all of its lines have to be included. The 2nd file (strPrintPath) has additional information, but not every line in the main file has extra information. To do this in C# I followed this msdn guide, and it's working fine. Unfortunately, file 1 has 23 columns and file 2 has 10, adding up to 33 columns and therefore 33 properties in total. I created some temp classes to see if everything is working, but it looks very messy in my opinion.
Is there a way to clean up my code and make it tidier, for example by not having to create temp classes or by condensing some of the repetitive code?
public static void ConvertTSVtoMontDataTable(string strWorksheetPath, string strPrintPath,
bool closeConnection = true)
{
// Check if the main file exist.
if (!File.Exists(strWorksheetPath)) return;
// Load both files.
var mainFile = File.ReadAllLines(strWorksheetPath);
var extraFile = File.ReadAllLines(strPrintPath);
// Create 2 lists.
var mainLines = mainFile.Select(line => new TempMainLine(line)).ToList();
var extraLines = extraFile.Select(line => new TempExtraLine(line)).ToList();
var lines = new List<TempLine>();
// Merge both files.
var leftOuterJoinQuery =
from worksheetLine in mainLines
join printLine in extraLines on string.Concat(worksheetLine.prop6, worksheetLine.prop8) equals
string.Concat(printLine.prop4, printLine.prop5) into lineGroup
from line in lineGroup.DefaultIfEmpty()
select
new TempLine(worksheetLine.prop0, worksheetLine.prop1, worksheetLine.prop2, worksheetLine.prop3,
worksheetLine.prop4, worksheetLine.prop5, worksheetLine.prop6, worksheetLine.prop7,
worksheetLine.prop8, worksheetLine.prop9, worksheetLine.prop10, worksheetLine.prop11,
worksheetLine.prop12, worksheetLine.prop13, worksheetLine.prop14, worksheetLine.prop15,
worksheetLine.prop16, worksheetLine.prop17, worksheetLine.prop18, worksheetLine.prop19,
worksheetLine.prop20, worksheetLine.prop21, worksheetLine.prop22, line == null ? "" : line.prop0,
line == null ? "" : line.prop1, line == null ? "" : line.prop2, line == null ? "" : line.prop3,
line == null ? "" : line.prop4, line == null ? "" : line.prop5, line == null ? "" : line.prop6,
line == null ? "" : line.prop7, line == null ? "" : line.prop8, line == null ? "" : line.prop9);
foreach (var tempLine in leftOuterJoinQuery)
{
lines.Add(tempLine);
}
// Write output to new temp file (TESTING)
using (
var file =
new StreamWriter(Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
"output.txt")))
{
foreach (var item in lines)
{
file.WriteLine(item.prop0 + (char)9 + item.prop1 + (char)9 + item.prop2 + (char)9 + item.prop3 +
(char)9 + item.prop4 + (char)9 + item.prop5 + (char)9 + item.prop6 + (char)9 +
item.prop7 + (char)9 + item.prop8 + (char)9 + item.prop9 + (char)9 + item.prop10 +
(char)9 + item.prop11 + (char)9 + item.prop12 + (char)9 + item.prop13 + (char)9 +
item.prop14 + (char)9 + item.prop15 + (char)9 + item.prop16 + (char)9 +
item.prop17 + (char)9 + item.prop18 + (char)9 + item.prop19 + (char)9 +
item.prop20 + (char)9 + item.prop21 + (char)9 + item.prop22 + (char)9 +
item.prop23 + (char)9 + item.prop24 + (char)9 + item.prop25 + (char)9 +
item.prop26 + (char)9 + item.prop27 + (char)9 + item.prop28 + (char)9 +
item.prop29 + (char)9 + item.prop30 + (char)9 + item.prop31 + (char)9 +
item.prop32);
}
}
}

I thought about this some more. Regardless of what your Temp* classes look like, something along the lines of the code below will work, assuming (based on the code you presented) that you're outputting every column from both files in the order in which they came in. If you need to exclude fields, change the order, etc., that would require some changes to the code below or a different solution entirely.
It basically just reads those two files in, joins on the Split() results, and then combines the two lines. I didn't see a point in handling the left outer join logic for a null printFile line, but if you need the extra tabs, you could replace line ?? "" with something like line ?? new String('\t', 10).
Note that this is probably not the most efficient way to go about this, and if your files are huge, you'd definitely want to optimize it a bit.
// Merge both files.
var lines =
from worksheetLine in mainFile
join printLine in extraFile on string.Concat(worksheetLine.Split('\t')[6], worksheetLine.Split('\t')[8]) equals
string.Concat(printLine.Split('\t')[4], printLine.Split('\t')[5]) into lineGroup
from line in lineGroup.DefaultIfEmpty()
select string.Concat(worksheetLine, line ?? "");
// Write output to new temp file (TESTING)
using (
var file =
new StreamWriter(Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
"output.txt")))
{
foreach (var item in lines)
{
file.WriteLine(item);
}
}
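As a hedged variation on the join above, the sketch below splits each line only once and puts a tab between the main and the extra columns. The helper name MergeLines, its parameters and the tiny sample data are assumptions made for illustration. For the files in the question, the key columns would be 6 and 8 in the main file, 4 and 5 in the extra file, with 10 extra columns.

```csharp
using System;
using System.Linq;

// Hypothetical helper: left outer join of main lines with extra lines.
// keyMain/keyExtra list the column indexes that form the join key.
static string[] MergeLines(string[] mainLines, string[] extraLines,
    int[] keyMain, int[] keyExtra, int extraColumnCount)
{
    // Split each line once and keep the cells next to the computed key.
    var main = mainLines.Select(l => l.Split('\t'))
        .Select(c => new { Cells = c, Key = string.Concat(keyMain.Select(i => c[i])) });
    var extra = extraLines.Select(l => l.Split('\t'))
        .Select(c => new { Cells = c, Key = string.Concat(keyExtra.Select(i => c[i])) });

    // Empty cells used when a main line has no matching extra line.
    var empty = Enumerable.Repeat("", extraColumnCount).ToArray();

    return (from m in main
            join e in extra on m.Key equals e.Key into matches
            from hit in matches.DefaultIfEmpty()
            select string.Join("\t", m.Cells.Concat(hit == null ? empty : hit.Cells)))
        .ToArray();
}

// Made-up sample: the key is column 1 in the main file and column 0 in the extra file.
var merged = MergeLines(
    new[] { "a\tk1\tx", "b\tk2\ty" },
    new[] { "k1\tE1" },
    new[] { 1 }, new[] { 0 }, 2);

foreach (var line in merged) Console.WriteLine(line);
```

Padding unmatched lines with empty cells keeps the column count the same on every output line, which is usually what a TSV consumer expects.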

how to increase the size of array or free the memory after each iteration. Error: Index was outside the bounds of the array c#

I read data from a 27 MB text file that contains 10001 rows, so I need to handle large data. I process each row and then write the result to another text file. This is the code I am using:
StreamReader streamReader = System.IO.File.OpenText("D:\\input.txt");
string lineContent = streamReader.ReadLine();
int count = 0;
using (StreamWriter writer = new StreamWriter("D:\\ft1.txt"))
{
do
{
if (lineContent != null)
{
string a = JsonConvert.DeserializeObject(lineContent).ToString();
string b = "[" + a + "]";
List<TweetModel> deserializedUsers = JsonConvert.DeserializeObject<List<TweetModel>>(b);
var CreatedAt = deserializedUsers.Select(user => user.created_at).ToArray();
var Text = deserializedUsers.Where(m => m.text != null).Select(user => new
{
a = Regex.Replace(user.text, @"[^\u0000-\u007F]", string.Empty)
.Replace(@"\/", "/")
.Replace("\\", @"\")
.Replace("\'", "'")
.Replace("\''", "''")
.Replace("\n", " ")
.Replace("\t", " ")
}).ToArray();
var TextWithTimeStamp = Text[0].a + " (timestamp:" + CreatedAt[0] + ")";
writer.WriteLine(TextWithTimeStamp);
}
lineContent = streamReader.ReadLine();
}
while (streamReader.Peek() != -1);
streamReader.Close();
This code works for the first 54 iterations, as I get 54 lines in the output file. After that it throws the error "Index was outside the bounds of the array." at the line
var TextWithTimeStamp = Text[0].a + " (timestamp:" + CreatedAt[0] + ")";
I am not sure whether the maximum capacity of the array has been exceeded. If so, how can I increase it? Or can I write each line as it is encountered in the loop with
writer.WriteLine(TextWithTimeStamp);
and then free the storage, or do something else that solves this issue? I tried using a list instead of an array, but the issue is the same. Please help.

Change this line
var TextWithTimeStamp = Text[0].a + " (timestamp:" + CreatedAt[0] + ")";
to
var TextWithTimeStamp = (Text.Any() ? Text.First().a : string.Empty) +
" (timestamp:" + (CreatedAt.Any() ? CreatedAt.First() : string.Empty) + ")";
Because Text and CreatedAt are built from the deserialized data, they can be empty (zero items) in some cases. For example, Text is empty when the item's text is null and the Where filter removes it.
In those cases Text[0] and CreatedAt[0] fail. So, before using the first element, check whether the collection contains any items. The LINQ method Any() is used for that purpose.
Update
If you want to skip the lines that do not contain text, change these lines
var TextWithTimeStamp = Text[0].a + " (timestamp:" + CreatedAt[0] + ")";
writer.WriteLine(TextWithTimeStamp);
to
if (Text.Any())
{
var TextWithTimeStamp = Text.First().a + " (timestamp:" + CreatedAt.First() + ")";
writer.WriteLine(TextWithTimeStamp);
}
Update 2
To include all the strings from CreatedAt rather than only the first one, you can join the values into one comma-separated string. A general example:
var strings = new List<string> { "a", "b", "c" };
var allStrings = string.Join(",", strings); //"a,b,c"
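The empty-collection case described in this answer can be reproduced in isolation. This is a minimal sketch with made-up data, not the code from the question. It shows that the exception is caused by an array of length 0 rather than by any capacity limit, and that FirstOrDefault() is another way to guard against it.

```csharp
using System;
using System.Linq;

// Hypothetical data: the only item is null, so the filter leaves nothing behind.
var texts = new string[] { null }.Where(t => t != null).ToArray();

// texts[0] would throw IndexOutOfRangeException here because texts.Length is 0.
var first = texts.FirstOrDefault() ?? string.Empty;

Console.WriteLine(texts.Length);
Console.WriteLine(first.Length);
```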

error: The query results cannot be enumerated more than once

Edit:
DataClassesDataContext dc = new DataClassesDataContext();
string _idCompany = Request["idCompany"];
var newes = dc.GetNewsCompany(Int64.Parse(_idCompany));
string date = "";
string newsHtml = "<center>";
if(newes.GetEnumerator().MoveNext()){
foreach (var item in newes) // the error is thrown here
{
// date = calendar.GetDayOfMonth(item.DateSend) + "/" + calendar.GetMonth(item.DateSend) + "/" + calendar.GetYear(item.DateSend).ToString();
// newsHtml += "<li class='news-item'><a style='text-decoration:none' class=\"link\" onclick=\"$(\'#BodyNews\').text(\'" + HttpUtility.HtmlEncode(item.Body).Trim() + "\');$(\'#BodyNews\').dialog({resizable:false});\" href=\"#\" > " + item.Title.ToString() + "</a> " + date + " </li>";
}
newsHtml += "</center>";
}
else
{
// var propertyCompany = dc.GetPropertyCompanyById(Int64.Parse(_idCompany));
// newsHtml += "<li class='news-item'><a style='text-decoration:none' class=\"link\" );$(\'#BodyNews\').dialog({resizable:false});\" href=\"#\" > " + "!به صفحه شخصی شرکت " + propertyCompany.FirstOrDefault().NameCompany + " خوش آمدید " + "</a> " + date + " </li>";
}
return newsHtml;
It throws the error: The query results cannot be enumerated more than once
How can I check whether the variable is empty or null without enumerating it?

Why bother with the if at all?
var newes = dc.GetNewsCompany(Int64.Parse(_idCompany));
//if (newes.GetEnumerator().MoveNext())//check is null or empty
var newesList = newes.ToList();
if (newesList.Count > 0)
{
...
}
You can always check the newesList.Count property afterward.
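Put together, the "materialize once, then check Count" approach might look like the sketch below. It is only an illustration: BuildNewsHtml is a hypothetical helper, a plain IEnumerable<string> of titles stands in for the result of dc.GetNewsCompany, and the "No news" placeholder text is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: enumerates the query result exactly once.
static string BuildNewsHtml(IEnumerable<string> newes)
{
    var newesList = newes.ToList(); // the only enumeration of the source
    if (newesList.Count == 0)
        return "<center>No news</center>"; // placeholder for the empty case

    // From here on, only the in-memory list is used.
    return "<center>" +
           string.Concat(newesList.Select(title => "<li>" + title + "</li>")) +
           "</center>";
}

Console.WriteLine(BuildNewsHtml(new[] { "First", "Second" }));
Console.WriteLine(BuildNewsHtml(new string[0]));
```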

I'm not sure what members are available on newes, but depending on what dc.GetNewsCompany returns, you could check for null:
if (newes == null) return;
Or, if it returns an empty collection or array, just check the count or length:
if (newes.Count == 0) return;
if (newes.Length == 0) return;

The error occurs because you call .GetEnumerator() on newes and then use newes again in a foreach loop. This causes the "double enumeration".
In general, avoid iterating such query results directly with foreach, since the DataReader stays locked for the whole loop. This means that you cannot use the same entity inside the loop.
It is better to call .ToList(). You can then call AsQueryable() on the list again if you want to run LINQ queries on it,
for example something like
var newes = dc.CompanyTable.Where(ln => ln.id.Equals(_idCompany));
List<CompanyTable> newesList = newes.ToList();
