Gracefully handle loading of XElement from an empty file - c#

I have a use-case where I'm required to read in some information from an XML file and act on it accordingly. The problem is that this XML file is technically allowed to be empty or full of whitespace, which means "there's no info, do nothing"; any other error should fail hard.
I'm currently thinking about something along the lines of:
public void Load (string fileName)
{
XElement xml = null;
try {
xml = XElement.Load (fileName);
}
catch (XmlException e) {
// Check if the file contains only whitespace here
// if not, re-throw the exception
}
if (xml != null) {
// Do this only if there wasn't an exception
doStuff (xml);
}
// Run this irrespective of whether there was any xml or not
tidyUp ();
}
Does this pattern seem ok? If so, how do people recommend checking, inside the catch block, whether the file contained only whitespace? Google only turns up checks for whether a string is whitespace...
Cheers muchly,
Graham

Well, the easiest way is probably to make sure it isn't whitespace in the first place, by reading the entire file into a string first (I'm assuming it isn't too huge):
using System.IO;
using System.Text;
using System.Xml.Linq;

public void Load (string fileName)
{
    string xmlString;
    // The using block disposes the reader and its underlying stream, releasing the file handle
    using (var reader = new StreamReader(fileName, Encoding.UTF8, true))
    {
        xmlString = reader.ReadToEnd();
    }
    if (!string.IsNullOrWhiteSpace(xmlString)) { // Use (xmlString.Trim().Length > 0) for .NET < 4
        var xml = XElement.Parse(xmlString); // Exceptions will bubble up
        doStuff(xml);
    }
    tidyUp();
}
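If you would rather keep the try/catch shape from the question, the whitespace check can live inside the catch block: read the raw text only after XElement.Load has failed, and rethrow if it contains anything other than whitespace. This is a minimal sketch; the XmlFileLoader and LoadOrNull names are made up for illustration, and it assumes the file is small enough to read a second time on the error path.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlFileLoader
{
    // Returns null when the file is empty or whitespace-only;
    // any other XmlException is rethrown unchanged ("fail hard").
    public static XElement LoadOrNull(string fileName)
    {
        try
        {
            return XElement.Load(fileName);
        }
        catch (XmlException)
        {
            // File.ReadAllText detects and strips a byte-order mark,
            // so a BOM-only file also counts as empty here.
            string text = File.ReadAllText(fileName);
            if (string.IsNullOrWhiteSpace(text))
            {
                return null;
            }
            throw; // a genuine parse error
        }
    }
}
```

The Load method from the question then becomes `var xml = XmlFileLoader.LoadOrNull(fileName); if (xml != null) doStuff(xml); tidyUp();`.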

Related

PdfDocument remains locked after closing

I have a Windows service which merges PDFs together on the fly and then moves them to another location. For the most part, I don't have control over what someone wants merged. Every so often a corrupted PDF gets processed, and creating the new PdfDocument throws a PdfException "Trailer not found". I catch the exception and close the document, but the PDF itself still appears to be locked afterwards. I need to delete the directory, but trying to do that throws an IOException and crashes the service.
I have verified that calling the PdfDocument constructor is what locks the PDF, and that the file remains locked immediately after closing.
Any ideas? Is there something iText can do to help with this, or do I need to come up with some sort of workaround where I check for corrupted PDFs up front?
ProcessDirectory
private void ProcessDirectory(string directoryPath)
{
EventLogManager.WriteInformation("ProcessDirectory");
// DON'T TOUCH THE BACKUPS, ERRORS AND WORK DIRECTORIES. Just in case they were made or renamed after the fact for some reason
if (directoryPath != this._errorsPath && directoryPath != this._backupsPath && directoryPath != this._workPath)
{
string pdfJsonPath = System.IO.Path.Combine(directoryPath, "pdf.json");
if (File.Exists(pdfJsonPath))
{
string workPath = System.IO.Path.Combine(this._workPath, System.IO.Path.GetFileName(directoryPath));
try
{
CopyToDirectory(directoryPath, workPath);
PdfMerge pdfMerge = null;
string jsonPath = System.IO.Path.Combine(workPath, "pdf.json");
using (StreamReader r = Helpers.GetStreamReader(jsonPath))
{
string json = r.ReadToEnd();
pdfMerge = JsonConvert.DeserializeObject<PdfMerge>(json);
}
FillFormFields(workPath, pdfMerge);
if (pdfMerge.Pdfs.Any(p => !String.IsNullOrWhiteSpace(p.OverlayFilename)))
{
ApplyOverlays(workPath, pdfMerge);
}
MergePdfs(workPath, pdfMerge);
//NumberPages(workPath, pdfMerge);
FinishPdf(workPath, pdfMerge);
// Move original to backups directory
if (DoSaveBackups)
{
string backupsPath = System.IO.Path.Combine(this._backupsPath, String.Format("{0}_{1}", System.IO.Path.GetFileName(directoryPath), DateTime.Now.ToString("yyyyMMddHHmmss")));
Directory.Move(directoryPath, backupsPath);
}
else
{
Directory.Delete(directoryPath, true);
}
}
catch (Exception ex)
{
EventLogManager.WriteError(ex);
if (DoSaveErrors)
{
// Move original to errors directory
string errorsPath = System.IO.Path.Combine(this._errorsPath, String.Format("{0}_{1}", System.IO.Path.GetFileName(directoryPath), DateTime.Now.ToString("yyyyMMddHHmmss")));
Directory.Move(directoryPath, errorsPath);
}
else
{
Directory.Delete(directoryPath, true);
}
}
// Delete work directory
// THIS IS WHERE THE IOEXCEPTION OCCURS AND THE SERVICE CRASHES
Directory.Delete(workPath, true);
}
else
{
EventLogManager.WriteInformation(String.Format("No pdf.json file. {0} skipped.", directoryPath));
}
}
}
FillFormFields
private void FillFormFields(string directoryPath, PdfMerge pdfMerge)
{
if (pdfMerge != null && pdfMerge.Pdfs != null)
{
string formPath = String.Empty;
string newFilePath;
PdfDocument document = null;
PdfAcroForm form;
PdfFormField pdfFormField;
foreach (var pdf in pdfMerge.Pdfs)
{
try
{
formPath = System.IO.Path.Combine(directoryPath, pdf.Filename);
newFilePath = System.IO.Path.Combine(
directoryPath,
String.Format("{0}{1}", String.Format("{0}{1}", System.IO.Path.GetFileNameWithoutExtension(pdf.Filename), "_Revised"), System.IO.Path.GetExtension(pdf.Filename)));
// THIS IS WHERE THE PDFEXCEPTION OCCURS
document = new PdfDocument(Helpers.GetPdfReader(formPath), new PdfWriter(newFilePath));
form = PdfAcroForm.GetAcroForm(document, true);
if (pdf.Fields != null && pdf.Fields.Count > 0)
{
foreach (var field in pdf.Fields)
{
if (field.Value != null)
{
pdfFormField = form.GetField(field.Name);
if (pdfFormField != null)
{
form.GetField(field.Name).SetValue(field.Value);
}
else
{
EventLogManager.WriteWarning(String.Format("Field '{0}' does not exist in '{1}'", field.Name, pdf.Filename));
}
}
}
}
form.FlattenFields();
}
catch (Exception ex)
{
throw new Exception(String.Format("An exception occurred filling form fields for {0}", pdf.Filename), ex);
}
finally
{
if (document != null)
{
document.Close();
}
}
// Now rename the new one back to the old name
File.Delete(formPath);
File.Move(newFilePath, formPath);
}
}
}
UPDATE
It seems that, in order for everything to be disposed properly, you have to declare separate PdfReader and PdfWriter objects in using statements and pass those into the PdfDocument, like this:
using (reader = Helpers.GetPdfReader(formPath))
{
using (writer = new PdfWriter(newFilePath))
{
using (document = new PdfDocument(reader, writer))
{
// The rest of the code here
}
}
}
I'm not sure why this is, other than that iText isn't disposing of the individual PdfReader and PdfWriter when disposing of the PdfDocument, which I had assumed it would.
Find out which of the itext7 classes implement IDisposable (from the documentation, or the Visual Studio Object Browser etc), and make sure you use them within using blocks, the same way you already have using blocks for StreamReader.
Edit: @sourkrause's solution can be shortened to:
using (reader = Helpers.GetPdfReader(formPath))
using (writer = new PdfWriter(newFilePath))
using (document = new PdfDocument(reader, writer))
{
// The rest of the code here
}
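A likely explanation for why the separate using blocks help: in the question, the PdfReader and PdfWriter are created inline inside the PdfDocument constructor call. When that constructor throws (the "Trailer not found" case), the assignment to document never happens, so the finally block has no new document to call Close() on, and nothing closes the reader and writer that already hold the files open. The sketch below reproduces this with hypothetical stand-in classes (TrackedResource, FailingDocument), not the real iText types, so it runs without iText.

```csharp
using System;

// Hypothetical stand-in for PdfReader / PdfWriter: it only records
// whether Dispose() was called.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

// Hypothetical stand-in for PdfDocument whose constructor fails,
// the way the real constructor fails on a corrupted PDF.
public sealed class FailingDocument : IDisposable
{
    public FailingDocument(TrackedResource reader, TrackedResource writer)
    {
        throw new InvalidOperationException("Trailer not found");
    }
    public void Dispose() { }
}

public static class ConstructorFailureDemo
{
    // Mirrors the question: the document is closed in finally, but it was
    // never assigned, so the reader and writer are left undisposed.
    public static (TrackedResource Reader, TrackedResource Writer) CloseInFinally()
    {
        var reader = new TrackedResource();
        var writer = new TrackedResource();
        FailingDocument document = null;
        try
        {
            document = new FailingDocument(reader, writer);
        }
        catch (InvalidOperationException)
        {
            // the question rethrows here; swallowed to keep the demo short
        }
        finally
        {
            document?.Dispose(); // document is still null
        }
        return (reader, writer);
    }

    // Mirrors the stacked using blocks: reader and writer are disposed
    // even though the document constructor threw.
    public static (TrackedResource Reader, TrackedResource Writer) StackedUsing()
    {
        TrackedResource r = null, w = null;
        try
        {
            using (var reader = new TrackedResource())
            using (var writer = new TrackedResource())
            {
                r = reader;
                w = writer;
                using (var document = new FailingDocument(reader, writer))
                {
                    // work with the document would go here
                }
            }
        }
        catch (InvalidOperationException)
        {
        }
        return (r, w);
    }
}
```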
I know this is an old question, but this is my approach to solving it in iText7, and it is quite different from the accepted answer. Since I could not use using statements, I took a different approach when closing out the document. This may seem like overkill, but it works very well.
First I closed the Document:
Document.Close();
Nothing out of the ordinary here. After doing this, however, I close/dispose the Reader and Writer instances. After closing them, I set the writer, reader, and document to null, in that order. The GC should take care of cleaning these up, but in my case the object holding these instances is still in use, so I do this additional step to free up some memory.
Step 2
Writer.Close();
Writer.Dispose();
Writer = null;
Step 3
Reader.SetCloseStream(true);
Reader.Close();
Reader = null;
Step 4
Document = null;
I would suggest you wrap each step in a try/catch; depending on how your code runs, you could see issues doing this all at once.
I believe the most important part here is the actions taken on the reader. For some reason, the reader does not seem to close the stream by default when calling .Close().
Update: while running in production, I have noticed that one file (so far, anyway) still held a lock when I tried to delete it right after closing. I added a catch that waits a few seconds before trying again, which seems to do the trick on those more "stubborn" files.
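The "wait a few seconds and try again" workaround could look like the following sketch. RetryHelper and RetryOnIOException are hypothetical names, and the attempt count and delay are arbitrary; it retries only on IOException, which is what the question reports from Directory.Delete.

```csharp
using System;
using System.IO;
using System.Threading;

public static class RetryHelper
{
    // Runs the action; on IOException waits `delay` and tries again,
    // up to `maxAttempts` attempts in total. The last IOException propagates.
    public static void RetryOnIOException(Action action, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // the file may still be locked; give it a moment
                Thread.Sleep(delay);
            }
        }
    }
}
```

For example: `RetryHelper.RetryOnIOException(() => Directory.Delete(workPath, true), 3, TimeSpan.FromSeconds(2));`. A retry only hides a handle that is still open for a while; it is a fallback, not a substitute for closing the reader and writer.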

Catch "FileNotFoundException"

I have code that builds the full path of a particular file:
string filePath = Path.Combine(Environment.GetFolderPath(
Environment.SpecialFolder.MyDocuments), "file.txt");
And later, I use this to read the text in the file:
StreamReader rdr = new StreamReader(filePath); // "C:\Users\<user>\Documents\file.txt"
string myString = rdr.ReadToEnd();
Trouble is, if the file doesn't exist, it throws a FileNotFoundException (obviously). I'd like to use an if/else to handle this, so that the user can browse to the file directly, but I'm not sure what to use to check whether filePath is valid.
For example, I can't use:
if (filePath == null)
because the top method to retrieve the string will always return a value, whether or not it is valid. How can I solve this?
While File.Exists() is appropriate as a start, please note that ignoring the exception can still lead to an error condition if the file becomes inaccessible (dropped network drive, file opened by another program, deleted, etc.) in the time between the call to File.Exists() and new StreamReader().
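To avoid that race entirely, you can skip the up-front check and handle the exception at the point where the file is actually opened. A minimal sketch; SafeRead and TryReadAllText are hypothetical names, and only "missing file" and "missing folder" are treated as expected, so other I/O errors still propagate.

```csharp
using System.IO;

public static class SafeRead
{
    // Returns false (instead of throwing) when the file or its directory
    // is missing at the moment it is opened.
    public static bool TryReadAllText(string path, out string text)
    {
        try
        {
            using (var reader = new StreamReader(path))
            {
                text = reader.ReadToEnd();
            }
            return true;
        }
        catch (FileNotFoundException)
        {
        }
        catch (DirectoryNotFoundException)
        {
        }
        text = null;
        return false;
    }
}
```

Usage: `if (!SafeRead.TryReadAllText(filePath, out string myString)) { /* let the user browse to the file */ }`.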
You can use File.Exists:
if(File.Exists(filePath))
{
//Do something
}
else
{
}
string filePath = Path.Combine(Environment.GetFolderPath(
    Environment.SpecialFolder.MyDocuments), "file.txt");
if (!File.Exists(filePath))
{
    /* browse your file */
}
else
{
    // Dispose the reader so the file is not left open
    using (StreamReader rdr = new StreamReader(filePath)) // "C:\Users\<user>\Documents\file.txt"
    {
        string myString = rdr.ReadToEnd();
    }
}

Isolated storage reading data : System.Xml.XmlException: Unexpected XML declaration

I am writing data and reading it back, and I get the exception "System.Xml.XmlException: Unexpected XML declaration". I can't figure out what the issue is.
I have also included the exception it prints. Please help me solve the issue.
Here is my code:
public static void WriteTopicState(Topic topic)
{
try
{
using (var store = IsolatedStorageFile.GetUserStoreForApplication())
{
using (StreamWriter sw = new StreamWriter(store.OpenFile("Stats.xml", FileMode.Append, FileAccess.Write)))
{
XmlSerializer serializer = new XmlSerializer(typeof(Topic));
serializer.Serialize(sw, topic);
serializer = null;
}
}
}
catch (Exception)
{
throw;
}
}
public static Topic ReadMockTestTopicState()
{
Topic topic = null;
try
{
using (IsolatedStorageFile isoStore = IsolatedStorageFile.GetUserStoreForApplication())
{
// Read application settings.
if (isoStore.FileExists("Stats.xml"))
{
using (var store = IsolatedStorageFile.GetUserStoreForApplication())
{
using (StreamReader SR = new StreamReader(store.OpenFile("Stats.xml", FileMode.Open, FileAccess.Read)))
{
XmlSerializer serializer = new XmlSerializer(typeof(Topic));
topic = (Topic)serializer.Deserialize(SR);
serializer = null;
}
}
}
else
{
// If setting does not exists return default setting.
topic = new Topic();
}
}
}
catch (Exception)
{
throw;
}
return topic;
}
Exception :
{System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. --->
System.InvalidOperationException: There is an error in XML document (9, 19). --->
System.Xml.XmlException: Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it. Line 9, position 19.
EDIT
If there is any other way to save the data, for example to a .txt file, that is fine for me too; the only requirement is that I need to append data to the existing document and read it back.
Your problem is that you are appending to your Stats.xml document; as a result, once it has been written to more than once, it contains a second XML declaration and multiple root elements.
If you wish to only store the latest stats, you should use FileMode.Create:
using (StreamWriter sw = new StreamWriter(store.OpenFile("Stats.xml", FileMode.Create, FileAccess.Write)))
A valid XML document can contain only one root element. If you wish to store multiple 'stats', a different strategy is required:
If writes only ever create new entries (e.g. no updates), write each topic to a different file and combine them when reading
Create a container element that stores multiple topics; parse it from disk, add to it, then overwrite the file on disk (you'll need to be careful with concurrency if you choose this option)
Use a different storage medium than the file system (e.g. a document database, a SQL database, etc.)
Many other options
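A rough sketch of the container-element option (read, extend, rewrite), assuming a simplified hypothetical Topic class and a plain file path instead of IsolatedStorageFile so it can run anywhere. Serializing a List<Topic> gives a single root element (XmlSerializer names it ArrayOfTopic by default), and opening with FileMode.Create replaces the old content instead of appending a second document. TopicStore is a hypothetical name, and there is no locking, so concurrent writers would need extra care.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, simplified stand-in for the question's Topic type.
public class Topic
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class TopicStore
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(List<Topic>));

    public static List<Topic> ReadAll(string path)
    {
        if (!File.Exists(path)) return new List<Topic>();
        using (var stream = File.OpenRead(path))
        {
            return (List<Topic>)Serializer.Deserialize(stream);
        }
    }

    // "Append" by reading the whole list, adding to it and rewriting the
    // file, so it always holds exactly one XML declaration and root element.
    public static void Append(string path, Topic topic)
    {
        List<Topic> topics = ReadAll(path);
        topics.Add(topic);
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            Serializer.Serialize(stream, topics);
        }
    }
}
```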

How Can I Handle This Xml Parsing Error?

Consider the following C# code:
using System.Xml.Linq;
namespace TestXmlParse
{
class Program
{
static void Main(string[] args)
{
var testxml =
@"<base>
<elem1 number='1'>
<elem2>yyy</elem2>
<elem3>xxx <yyy zzz aaa</elem3>
</elem1>
</base>";
XDocument.Parse(testxml);
}
}
}
I get a System.Xml.XmlException on the parse, of course, complaining about elem3. The error message is this:
System.Xml.XmlException was unhandled
Message='aaa' is an unexpected token. The expected token is '='. Line 4, position 59.
Source=System.Xml
LineNumber=4
LinePosition=59
Obviously this is not the real Xml (we get the xml from a third party) and while the best answer would be for the third party to clean up their xml before they send it to us, is there any other way I might fix this xml before I hand it off to the parser? I've devised a hacky way to fix this; catch the exception and use that to tell me where I need to look for characters which should be escaped. I was hoping for something a bit more elegant and comprehensive.
Any suggestions are welcome.
If this is a dupe, please point me to the other questions; I'll close this myself. I am more interested in an answer than any karma gain.
EDIT:
I guess I didn't make my question as clear as I had hoped. I know the "<" in elem3 is incorrect; I'm trying to find an elegant way to detect (and correct) any badly formed xml of that sort before I attempt the parse. As I say, I get this xml from a third-party and I can't control what they give me.
I would recommend that you do not manipulate the data you receive. If it is invalid, it's your client's problem.
Editing the input so it is valid xml can cause serious problems, e.g. instead of throwing an error you may end up processing wrong data (because you tried your best to make the xml valid, but this may lead to different data).
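If you follow that advice and reject bad input rather than repair it, you can still give the supplier a precise report, because XmlException exposes LineNumber and LinePosition. A minimal sketch; XmlCheck and FindWellFormednessError are hypothetical names.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Returns null if the text is well-formed XML, otherwise a message
    // built from the line and position reported by XmlException.
    public static string FindWellFormednessError(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}", ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```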
[EDIT]
I still think it's not a good idea, but sometimes you have to do what you have to do.
Here is a very simple class that parses the input and replaces the invalid opening tag. You could do this with a regex (which I am not good at), and this solution is not complete: depending on your requirements (or, let's say, the bad xml you get) you will have to adapt it (e.g. scan for complete xml elements instead of only the "<" and ">" brackets, put CDATA around the inner text of a node, and so on).
I just wanted to illustrate how you could do it, so please don't complain if it is slow/has bugs (as I mentioned, I would not do it).
using System.IO;
using System.Text;

class XmlCleaner
{
    public void Clean(Stream sourceStream, Stream targetStream)
    {
        const char openingIndicator = '<';
        const char closingIndicator = '>';
        const int bufferSize = 1024;
        char[] buffer = new char[bufferSize];
        bool startTagFound = false;
        StringBuilder writeBuffer = new StringBuilder();
        using (var reader = new StreamReader(sourceStream))
        {
            var writer = new StreamWriter(targetStream);
            try
            {
                int charsRead;
                while ((charsRead = reader.Read(buffer, 0, bufferSize)) > 0)
                {
                    // only look at the characters actually read in this pass
                    for (int i = 0; i < charsRead; i++)
                    {
                        char c = buffer[i];
                        if (c == openingIndicator)
                        {
                            if (startTagFound)
                            {
                                // we have 2 opening tags in a row without a closing one:
                                // escape the earlier one
                                writeBuffer.Replace("<", "&lt;");
                            }
                            startTagFound = true;
                            writeBuffer.Append(c);
                        }
                        else if (c == closingIndicator)
                        {
                            startTagFound = false;
                            // a complete tag: write the write buffer out
                            writeBuffer.Append(c);
                            writer.Write(writeBuffer.ToString());
                            writeBuffer.Clear();
                        }
                        else
                        {
                            writeBuffer.Append(c);
                        }
                    }
                }
                // write any trailing text after the last '>'
                writer.Write(writeBuffer.ToString());
            }
            finally
            {
                // unfortunately the StreamWriter's Dispose method closes the underlying stream, so we just flush it
                writer.Flush();
            }
        }
    }
}
To test it:
var testxml =
@"<base>
<elem1 number='1'>
<elem2>yyy</elem2>
<elem3>xxx <yyy zzz aaa</elem3>
</elem1>
</base>";
string result;
using (var source = new MemoryStream(Encoding.ASCII.GetBytes(testxml)))
using(var target = new MemoryStream()) {
XmlCleaner cleaner = new XmlCleaner();
cleaner.Clean(source, target);
target.Position = 0;
using (var reader = new StreamReader(target))
{
result = reader.ReadToEnd();
}
}
XDocument.Parse(result);
var expectedResult =
@"<base>
<elem1 number='1'>
<elem2>yyy</elem2>
<elem3>xxx &lt;yyy zzz aaa</elem3>
</elem1>
</base>";
Debug.Assert(result == expectedResult);

FileInfo.Length is greater than 0 but File is Empty?

I have an application that crunches a bunch of text files. Currently, I have code like this (snipped-together excerpt):
FileInfo info = new FileInfo(...);
if (info.Length > 0) {
string contents = getFileContents(...);
// uses a StreamReader
// returns reader.ReadToEnd();
Debug.Assert(!string.IsNullOrEmpty(contents)); // FAIL
}
private string getFileContents(string filename)
{
TextReader reader = null;
string text = "";
try
{
reader = new StreamReader(filename);
text = reader.ReadToEnd();
}
catch (IOException e)
{
// File is concurrently accessed. Come back later.
text = "";
}
finally
{
if (reader != null)
{
reader.Close();
}
}
return text;
}
Why am I getting a failed assert? The FileInfo.Length property was already used to validate that the file is non-empty.
Edit: This appears to be a bug: I'm catching IO exceptions and returning an empty string. But, because of the discussion around FileInfo.Length, here's something interesting: FileInfo.Length returns 2 for an empty, BOM-only text file (created in Notepad).
You might have a file which is empty apart from a byte-order mark. I think TextReader.ReadToEnd() would remove the byte-order mark, giving you an empty string.
Alternatively, the file could have been truncated between checking the length and reading it.
For diagnostic purposes, I suggest you log the file length when you get an empty string.
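A quick way to see the byte-order-mark effect: write a file that contains only a BOM, then compare FileInfo.Length with what StreamReader returns. A minimal sketch (BomDemo is a hypothetical name). A UTF-16 little-endian BOM is 2 bytes, which matches the length of 2 reported in the question's edit; a UTF-8 BOM is 3 bytes.

```csharp
using System.IO;

public static class BomDemo
{
    // Writes a file containing nothing but `bom` and returns the length
    // on disk together with the text StreamReader reads back.
    public static (long Length, string Text) WriteBomOnlyFile(string path, byte[] bom)
    {
        File.WriteAllBytes(path, bom);
        long length = new FileInfo(path).Length;
        string text;
        // StreamReader detects and strips a BOM by default
        using (var reader = new StreamReader(path))
        {
            text = reader.ReadToEnd();
        }
        return (length, text);
    }
}
```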
See that catch (IOException) block you have? That's what returns an empty string and triggers the assert even when the file is not empty.
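One way to stop that catch block from masquerading as an empty file is to return something an empty file can never produce, such as null, and let the caller decide what to do. A sketch with a hypothetical TryGetFileContents; note that FileNotFoundException derives from IOException, so a missing file also yields null here.

```csharp
using System.IO;

public static class FileContents
{
    // Returns null when the file could not be read (e.g. it is locked by
    // another process), so callers can tell that apart from an empty file.
    public static string TryGetFileContents(string filename)
    {
        try
        {
            using (var reader = new StreamReader(filename))
            {
                return reader.ReadToEnd();
            }
        }
        catch (IOException)
        {
            return null;
        }
    }
}
```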
If I remember correctly, a file ends with an end-of-file marker, which won't be included when you call ReadToEnd.
Therefore, the file size is not 0, but its content size is.
What's in the getFileContents method?
It may be repositioning the stream's pointer to the end of the stream before ReadToEnd() is called.
