XDocument load multiple XML - c#

I have an ASP.Net webform app.
I have one fileupload in page allow multiple files.
In a foreach loop I try to read all XML the user uploaded to the folder.
The problem occurs if I read the documents. I only get the first XML in the selection. I simple use XDocument.load(filename) in a foreach. What did I wrong?
I fire that loop in a button click event
protected void btnSaveDatabase_Click(object sender, EventArgs e)
{ //hiddenfield in page have name of xml in folder
var filesUploaded = hfXMLuploaded.Value.ToString().Split('-');
try
{
foreach (var file in filesUploaded)
{
string filename = Server.MapPath("~/SFA_XML_Upload/" + file);
var doc = XDocument.Load(filename);
Response.Write(doc.ToString()); // for testing
//SaveDataBaseMyTable(ReadMyTable(doc, filename));
}
my read method:
protected List<SFA_ORDCLI> ReadOrdcli(XDocument doc, string filename)
{
var ordcli = doc.Descendants("table")
.Where(i => (string)i.Attribute("name") == "ordcli")
.Descendants("row")
.Select(e => e.Elements());
List<SFA_ORDCLI> sfaOrdcliList = new List<SFA_ORDCLI>();
foreach (var row in ordcli)
{
SFA_ORDCLI sfaOrdcli = new SFA_ORDCLI();
sfaOrdcli.NomeFileXML = Path.GetFileNameWithoutExtension(filename);
foreach (var field in row)
{
var name = (string)field.Attribute("name");
switch (name)
{
case "RolCodEst"://key
sfaOrdcli.RolCodEst = (string)field;
break;
...other fields
}
}
if (sfaOrdcli != null) sfaOrdcliList.Add(sfaOrdcli);
}
return sfaOrdcliList;
}
in debug i see Load method receive the files name 'name.xml'
but the response print two time the first xml read.

Problem was on SaveAs method:
i did fileUpload.Saveas instead of
file.SaveAs(folderPath + Path.GetFileName(file.FileName));
where file is HttpPostedFile
sorry
bye and tks everybody

Related

Apose.Words ImportNode ignores font formatting when appendingchild

I am currently using Aspose.Words to open a document, pull content between a bookmark start and a bookmark end and then place that content into another document. The issue that I'm having is that when using the ImportNode method is imports onto my document but changes all of the fonts from Calibri to Times New Roman and changes the font size from whatever it was on the original document to 12pt.
The way I'm obtaining the content from the bookmark is by using the Aspose ExtractContent method.
Because I'm having the issue with the ImportNode stripping my font formatting I tried making some adjustments and saving each node to an HTML string using ToString(HtmlSaveOptions). This works mostly but the problem with this is it is stripping out my returns on the word document so none of my text has the appropriate spacing. My returns end up coming in as HTML in the following format
"<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>"
When using
DocumentBuilder.InsertHtml("<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>");
it does not correctly add the return on the word document.
Here is the code I'm using, please forgive the comments etc... this has been my attempts at correcting this.
public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
{
// Create a blank document.
Document dstDoc = new Document();
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
// Remove the first paragraph from the empty document.
dstDoc.FirstSection.Body.RemoveAllChildren();
// Create a new Builder for the temporary document that gets generated with the header or footer data.
// This allows us to control font and styles separately from the main document being built.
var newBuilder = new DocumentBuilder(dstDoc);
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.SaveFormat = SaveFormat.Html;
htmlSaveOptions.ExportFontsAsBase64 = true;
htmlSaveOptions.ExportFontResources = true;
htmlSaveOptions.ExportTextBoxAsSvg = true;
htmlSaveOptions.ExportRoundtripInformation = true;
htmlSaveOptions.Encoding = Encoding.UTF8;
// Obtain all the links from the source document
// This is used later to add hyperlinks to the html
// because by default extracting nodes using Aspose
// does not pull in the links in a usable way.
var srcDocLinks = srcDoc.Range.Fields.GroupBy(x => x.DisplayResult).Select(x => x.First()).Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).Distinct().ToList();
var childNodes = nodes.Cast<Node>().Select(x => x).ToList();
var oldBuilder = new DocumentBuilder(srcDoc);
oldBuilder.MoveToBookmark("Header");
var allchildren = oldBuilder.CurrentParagraph.Runs;
var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();
foreach (Node node in nodes)
{
var html = node.ToString(htmlSaveOptions);
try
{
//   is used by aspose because it works in XML
// If we see this character and the text of the node is \r we need to insert a break
if (html.Contains(" ") && node.Range.Text == "\r")
{
newBuilder.InsertHtml(html, false);
// Change the node into an HTML string
/*var htmlString = node.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
// Get all the child nodes of the html document
var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");
// Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
foreach (var childNode in allChildNodes)
{
// Get the style attribute from the child node
var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');
foreach (var childNodeStyle in childNodeStyles)
{
// Apply the font name and size to the new builder on the document.
if (childNodeStyle.ToLower().Contains("font-family"))
{
newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
}
if (childNodeStyle.ToLower().Contains("font-size"))
{
newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
.Replace("pt", "")
.Replace("px", "")
.Replace("em", "")
.Replace("rem", "")
.Replace("%", "")
.Trim());
}
}
}
// Insert the break with the corresponding font size and name.
newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
}
else
{
// Loop through the source document links so the link can be applied to the HTML.
foreach (var srcDocLink in srcDocLinks)
{
if (html.Contains(srcDocLink.DisplayResult))
{
// Now that we know the html string has one of the links in it we need to get the address from the node.
var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");
//Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
var linkStyleHtml = "";
foreach (var linkStyle in linkStyles)
{
if (linkStyle.ToLower().Contains("color"))
{
linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-family"))
{
linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-size"))
{
linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("text-decoration"))
{
linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
}
}
if (linkAddress.ToLower().Contains("mailto:"))
{
// Since the link has mailto included don't add the target attribute to the link.
html = new Regex($#"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
else
{
// Since the links is not an email include the target attribute.
html = new Regex($#"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
}
}
// Inseret the HTML String into the temporary document.
newBuilder.InsertHtml(html, false);
}
}
catch (Exception ex)
{
throw;
}
}
// This is just for debugging/troubleshooting purposes and to make sure thigns look correct
string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
dstDoc.Save(tempDocxPath);
// We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
dstDoc.Save(tempHtmlPath, htmlSaveOptions);
var tempHtmlDoc = new HtmlDocument();
tempHtmlDoc.Load(tempHtmlPath);
var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;
// Clean up our mess...
if (File.Exists(tempDocxPath))
{
File.Delete(tempDocxPath);
}
if (File.Exists(tempHtmlPath))
{
File.Delete(tempHtmlPath);
}
// Return the generated HTML string.
return htmlText;
}
Saving each node to HTML and then inserting them into the destination document is not a good idea. Because not all nodes can be properly saved to HTML and some formatting can be lost after Aspose.Words DOM -> HTML -> Aspose.Words DOM roundtrip.
Regarding the original issue, the problem might occur because you are using ImportFormatMode.UseDestinationStyles, in this case styles and default of the destination document are used and font might be changed. If you need to keep the source document formatting, you should use ImportFormatMode.KeepSourceFormatting.
If the problem occurs even with ImportFormatMode.KeepSourceFormatting this must be a bug and you should report this to Aspose.Words staff in the support forum.

The number of frames tag appears in the dataset but is not in the DICOMDIR C#

I add dicom files using the AddFile(dicomFile,name) method but the number of frames tag does not appear.
var sourcePath = Path.Combine(tempDirectory, "DICOM", $"PATIENT{i + 1}", $"STUDY{j + 1}", $"SERIES{k + 1}", $"SUBSERIES{l + 1}");
var dicomDir = new DicomDirectory { AutoValidate = false };
foreach (var file in new DirectoryInfo(tempDirectory).GetFiles("*.*", SearchOption.AllDirectories))
{
try
{
var dicomFile = DicomFile.Open(file.FullName);
if (dicomFile != null)
{
var referenceField = file.FullName.Replace(tempDirectory, string.Empty).Trim('\\');
dicomDir.AddFile(dicomFile, referenceField);
}
}
catch (Exception ex)
{
Log.Error(ex, ex.Message);
}
}
var dicomDirPath = Path.Combine(tempDirectory, "DICOMDIR");
dicomDir.Save(dicomDirPath);
resultDirectories.Add(dicomDirPath);
I also tried the addorupdate method but it doesn't work.
I use the fo-dicom library 4.0.7
When building a DICOMDIR with fo-dicom by iterative calling AddFile for each file, then you will get a DICOMDIR with all the required DicomTags. But of course there are a lot of tags that are optional and you can add them yourself.
The method AddFile returns an instance of type DicomDirectoryEntry, which gives you a reference to the patient record entry, the study record entry, the series record entry and the instance record entry. There you can add as many additional optional data that you wish. In your case it would look like
[...]
if (dicomFile != null)
{
var referenceField = file.FullName.Replace(tempDirectory, string.Empty).Trim('\\');
var entries = dicomDir.AddFile(dicomFile, referenceField);
// now you can add some additional data.
// but before adding values, make sure that those values are available
// in your original DicomFile to avoid NullReferenceExceptions.
if (dicomFile.Dataset.Contains(DicomTag.NumberOfFrames))
{
entries.InstanceRecord.AddOrUpdate(DicomTag.NumberOfFrames, dicomFile.Dataset.GetSingleValue<int>(DicomTag.NumberOfFrames));
}
}

No Errors yet nothing in Console?

Just after some help with some code i've written to extract data using HttpClient.
I am new to writing code so can't find my problem. Could someone pls help me troubleshoot this.
I expect to write the data of the table i'm scraping to the console line.
Any help appreciated
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using HtmlAgilityPack;
namespace weatherCheck
{
class Program
{
private static void Main(string[] args)
{
GetHtmlAsync();
Console.ReadLine();
}
protected static async void GetHtmlAsync()
{
var url = "https://www.weatherzone.com.au/vic/melbourne/melbourne";
var httpClient = new HttpClient();
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
//grab the rain chance, rain in mm and date
var MyTable = Enumerable.FirstOrDefault(htmlDocument.DocumentNode.Descendants("table")
.Where(table => table.Attributes.Contains("id"))
, table => table.Attributes["id"].Value == "forecast-table");
List<HtmlNode> rows = htmlDocument.DocumentNode.SelectNodes("//tr").ToList();
foreach (var row in rows)
{
try
{
if (MyTable != null)
{
Console.WriteLine(MyTable.GetAttributeValue("forecast-table", " "));
}
}
catch (Exception)
{
}
}
}
}
}
I used your code to look up the values but it didnt produce anything for me either. When i look at the htmlDocument.DocumentNode.OuterHtml to view the entire Html it is scraping, I dont see anything in the document that reflects an attribute forecast-table.
Also, you are validating MyTable each time you loop through rows. You should validate row != null along with printing attribute from row.
var MyTable = Enumerable.FirstOrDefault(htmlDocument.DocumentNode.Descendants("table")
.Where(table => table.Attributes.Contains("id")), table => table.Attributes["id"].Value == "forecast-table");
List<HtmlNode> rows = htmlDocument.DocumentNode.SelectNodes("//tr").ToList();
foreach (var row in rows)
{
try
{
if (row != null) // Here, it should be row, not My Table along with MyTable in line below.
Console.WriteLine(row.GetAttributeValue("forecast-table", " "));
}
catch (Exception)
{
}
}
Problem is
You also should know that Html you view by using Dev Tools on chrome is not the same as the one you see in HtmlAgilityPack. Chrome renders the page after executing the scripts where HtmlAgilityPack simply provides you with default HTML of the page. This is the reason why you are not able to get the value of forecast-table.
From Doc, For GetAttributeValue(name,def) it'll return def if the attribute not found.
So, it'll print ""(empty string if the attribute not found in your case)
remove async and await as you already calling httpClient.GetStringAsync(url);
var html =httpClient.GetStringAsync(url).Result;
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
And print,
Console.WriteLine(MyTable.GetAttributeValue("forecast-table","SOME_TEXT_HERE").ToString());

Aspose PDF - get text from page that has a matching string

I'm working with an existing library - the goal of the library is to pull text out of PDFs to verify against expected values to quality check recorded data vs data in pdf.
I'm looking for a way to succinctly pull a specific page worth of text given a string that should only fall on that specific page.
var pdfDocument = new Document(file.PdfFilePath);
var textAbsorber = new TextAbsorber{
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
pdfDocument.Pages.Accept(textAbsorber);
foreach (var page in pdfDocument.Pages)
{
}
I'm stuck inside the foreach(var page in pdfDocument.Pages) portion... or is that the right area to be looking?
Answer: Text Absorber recreated each page - inside the foreach loop.
If the absorber isn't recreated, it keeps text from previous loops.
public List<string> ProcessPage(MyInfoClass file, string find)
{
var pdfDocument = new Document(file.PdfFilePath);
foreach (Page page in pdfDocument.Pages)
{
var textAbsorber = new TextAbsorber {
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
page.Accept(textAbsorber);
var ext = textAbsorber.Text;
var exts = ext.Replace("\n", "").Split('\r').ToList();
if (ext.Contains(find))
return exts;
}
return null;
}

HtmlAgilityPack : illegal characters in path

I'm getting an "illegal characters in path" error in this code. I've mentioned "Error Occuring Here" as a comment in the line where the error is occuring.
var document = htmlWeb.Load(searchUrl);
var hotels = document.DocumentNode.Descendants("div")
.Where(x => x.Attributes.Contains("class") &&
x.Attributes["class"].Value.Contains("listing-content"));
int count = 1;
foreach (var hotel in hotels)
{
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(hotel.InnerText); // Error Occuring Here //
if (htmlDoc.DocumentNode != null)
{
var hotelName = htmlDoc.DocumentNode.SelectNodes("//div[#class='business-container-inner']//div[#class='business-content clearfix']//div[#class='business-name-wrapper']//h3[#class='business-name fn org']//div[#class='srp-business-name']//a[0]");
foreach (var name in hotelName)
{
Console.WriteLine(name.InnerHtml);
}
}
}
You should use LoadHtml method with loads a string. Load method loads from file
htmlDoc.LoadHtml(hotel.InnerText);
This simply means you're trying to load a file with an invalid character in the file path/name.
The error is here:
htmlDoc.Load(hotel.InnerText);
..because that overload expects the path to the file:
public void Load(string path)
Use LoadHtml to load a HTML fragment:
public void LoadHtml(string html)

Categories