How to get only xml from file in c#?

I have a problem parsing a file with XmlReader. The file contains data like this:
<Users>
<User>
<Email>email</Email>
<Key>23456</Key>
</User>
</Users>
asdfsof48f43uf489f3yf3y39fh3f489f3hf94[t]45.54tv,]5t
The file contains XML followed by encrypted data written from a byte[] array.
The problem I've encountered is that when I use:
using (var reader = XmlReader.Create(fileName))
{
while (reader.Read())
{
//parsing
}
}
I get a 'System.Xml.XmlException' at the line where the encrypted bytes begin.
My question is: how do I retrieve only the XML part and only the byte[] part?

If the encrypted data is always the last line, you can use the snippet below to read only the XML part, provided the XML is small enough to load into memory:
var fileLines = File.ReadAllLines(@"c:\temp\file.txt");
var xmlFromFile = string.Join("", fileLines, 0, fileLines.Length - 1);
using (var reader = XmlReader.Create(new StringReader(xmlFromFile)))
{
// Your logic goes here
}

You can do string parsing:
int start, end;
string myFile = File.ReadAllText("...");
start = myFile.IndexOf("<Users>");
end = myFile.IndexOf("</Users>") + 8;
myFile = myFile.Substring(start, end - start);
At that point you can load it into an XML document if you want. This all depends on you being 100% sure about the file format. This is a pretty fragile answer, so don't use it unless you fully trust your input file.
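Both answers above recover only the XML. A minimal sketch of getting the byte[] part as well is to work on the raw bytes and cut at the end of the closing root tag. The class and method names here are hypothetical, and the sketch assumes the XML is UTF-8 without a BOM, that "</Users>" first appears where the XML really ends, and that the payload does not itself begin with a CR or LF byte (the separator skip would swallow it).

```csharp
using System;
using System.Text;

static class MixedFile
{
    // Splits raw file bytes at the end of the closing root tag.
    // Returns false when the tag cannot be found.
    public static bool TrySplit(byte[] content, string closingTag, out string xml, out byte[] payload)
    {
        byte[] tag = Encoding.UTF8.GetBytes(closingTag);
        int end = IndexOf(content, tag);
        if (end < 0)
        {
            xml = null;
            payload = null;
            return false;
        }

        end += tag.Length;
        xml = Encoding.UTF8.GetString(content, 0, end);

        // Skip the line break (if any) that separates the XML from the payload.
        int start = end;
        while (start < content.Length && (content[start] == (byte)'\r' || content[start] == (byte)'\n'))
        {
            start++;
        }

        payload = new byte[content.Length - start];
        Array.Copy(content, start, payload, 0, payload.Length);
        return true;
    }

    // Naive byte search; fine for a one-off scan of a small file.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

With the file loaded through File.ReadAllBytes(fileName), a call such as MixedFile.TrySplit(bytes, "</Users>", out string xml, out byte[] encrypted) yields a string that can be passed to XmlReader.Create(new StringReader(xml)) and the untouched encrypted bytes.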

Related

Read Parts of an Xml File trough Stream instead of only one

So I've been working on an old piece of code for a project.
I've managed to optimize it for 64-bit usage.
But there's one issue: XmlSerializer.Deserialize breaks because the input text/deserialized data is too big (it overflows the 2 GB int limit).
I've tried to find a fix, but no answer was helpful.
Here's the code in question.
if (File.Exists(dir + "/" + fileName))
{
string XmlString = File.ReadAllText(dir + "/" + fileName, Encoding.UTF8);
BXML_LIST deserialized;
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(XmlString)))
{
using (XmlTextReader xmlTextReader = new XmlTextReader(input))
{
xmlTextReader.Normalization = false;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(BXML_LIST));
deserialized = (BXML_LIST)xmlSerializer.Deserialize(xmlTextReader);
}
}
xml_list.Add(deserialized);
}
Following many questions asked here, I thought I could use a method to "split" the XML file (while keeping the same type, BXML_LIST),
then deserialize each part and finally combine the results to match the original content, to avoid the overflow error when deserializing the whole file.
Thing is, I have no idea how to implement this. Any help or guidance would be amazing!
// Edit 1:
I've found a piece of code on another site; I don't know if it could be a reliable way to combine the split XML files:
var xml1 = XDocument.Load("file1.xml");
var xml2 = XDocument.Load("file2.xml");
//Combine and remove duplicates
var combinedUnique = xml1.Descendants("AllNodes")
.Union(xml2.Descendants("AllNodes"));
//Combine and keep duplicates
var combinedWithDups = xml1.Descendants("AllNodes")
.Concat(xml2.Descendants("AllNodes"));
Your code gives me the creeps: it uses memory very inefficiently.
string XmlString = File.ReadAllText - Here you load the entire file into memory for the first time.
Encoding.UTF8.GetBytes(XmlString) - Here you spend memory on the same data a second time.
new MemoryStream(...) - Here you spend memory on the same data a third time.
xmlSerializer.Deserialize - Here, memory is spent again on the deserialized data. But there's no getting away from that.
Write it like this:
using (XmlReader xmlReader = XmlReader.Create(dir + "/" + fileName))
{
XmlSerializer xmlSerializer = new XmlSerializer(typeof(BXML_LIST));
deserialized = (BXML_LIST)xmlSerializer.Deserialize(xmlReader);
}
In this case, xmlSerializer will read data from the file using xmlReader in a stream, in parts.
Perhaps, this may be enough to solve your problem.
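If the deserialized object graph itself is still too large, another option is to walk the file with XmlReader and materialise one child element at a time. The sketch below is not based on the real BXML_LIST layout, which the question does not show; the helper name and the idea of a repeated child element are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Yields each element with the given local name, one at a time,
    // so only the current element is held in memory.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string elementName)
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node after it, so no extra Read() call is made in this branch.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Each yielded XElement can then be mapped to the target type by hand or passed on to a serializer via element.CreateReader(), and the results added to a list as they arrive.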

XmlWriter trimming my string

I am trying to return an XML string as a CLOB from an Oracle stored procedure to a C# string.
Then I write this string to a file using the XmlWriter class.
My code looks like following:
string myString= ((Oracle.ManagedDataAccess.Types.OracleClob)(cmd.Parameters["paramName"].Value)).Value.ToString();
string fileName = DateTime.Now.ToString("yyyyMMddHHmmss");
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteRaw(myString);
stream.Position = 0;
var fileStreamResult = File(stream, "application/octet-stream", "ABCD"+fileName+".xml");
return fileStreamResult;
When I check my CLOB output, it is returned completely in myString.
When I check my end result, the XML file is trimmed at the end.
My string will be huge, for example a length of 3382563 or more.
Is there any setting for XmlWriter to write the complete string to the file?
Thanks in advance.
Sounds like all you want to do is grab some string value out of your Database, and write that string value in a text file. The string being xml does not actually force you into using an XML specific class or method unless you want to do XML specific operations, which I do not see in your snippet. Therefore, I suggest you simply grab the string value and spit it out in a file in the easiest way.
string myString = " blah blah blah keep my spaces ";
using (StreamWriter sw = new StreamWriter(@"M:\StackOverflowQuestionsAndAnswers\XMLWriterTrimmingString_45380476\bin\Debug\outputfile.xml"))
{
sw.Write(myString);
}
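For the download scenario in the question, the same idea can be applied without any writer at all. A plausible cause of the truncation (an assumption, since the question does not confirm it) is that XmlWriter buffers its output and the stream was handed on before the writer was flushed. The sketch below sidesteps that by encoding the whole string up front; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

static class XmlDownload
{
    // Encodes the complete string in one step, so nothing can be left
    // behind in a writer's internal buffer.
    public static MemoryStream ToUtf8Stream(string xml)
    {
        byte[] bytes = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).GetBytes(xml);
        return new MemoryStream(bytes); // Position starts at 0
    }
}
```

The returned stream can be passed to File(stream, "application/octet-stream", ...) as in the question. If XmlWriter is kept instead, calling writer.Flush() before resetting stream.Position should address the same buffering problem.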

Getting data from xml file and comparing it to a text file

So I have two files: a mot file and an xml file. What I need to do with these files is to read data from the xml file and compare it to the mot file if it exists. That's the general idea.
Before anything else, for those who are unfamiliar with what a mot
file is (I don't also have much knowledge about it, just the basics)...
(From Wikipedia) A mot file (or a Motorola S-Record
file) is a file format that conveys binary information in ASCII Hex text form.
(from another source) An S-record file consists of a
sequence of specially formatted ASCII character strings. An S-record
will be less than or equal to 78 bytes in length.
The format of a S-Record is:
S | Type | Record Length | Address (starting address) | Data | Checksum
(e.g. S21404200047524D5354524D0000801410AA5AA555F9)
([parsed] S2 14 042000 47524D5354524D0000801410AA5AA555 F9)
The specific idea is that I have data AA BB CC DD and so on allocated in addresses 0x042000 ~ 0x04200F. What’s written in the xml would be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data-set xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<record>
<File name="Test.mot">
<Address id="042000">
<Data>AA</Data>
</Address>
</File>
</record>
<record>
<File name="Test.mot">
<Address id="042001">
<Data>BB CC DD</Data>
</Address>
</File>
</record>
<record>
<File name="Test.mot">
<Address id="042004">
<Data>EE FF</Data>
</Address>
</File>
</record>
</data-set>
Then the program would get the data and address from the XML and search the .mot file for any hits. So if a mot file has a record S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9, then this is supposed to match what's in the XML, and the result is true, or 1. If anything in the XML doesn't have a match, then it would return false, or 0.
The problem now is that I'm not well-versed in C#, much less in XML, although I do have a tiny bit of experience with both. I initially thought it would be something like this:
using (StreamReader sr = new StreamReader("Test.mot"))
{
String line =String.Empty;
while ((line = sr.ReadLine()) != null)
{
if (line.Contains("042004") & line.Contains("EE FF"))
{
Console.WriteLine("Success");
}
else
{
Console.WriteLine("Failure");
}
}
}
But obviously, it didn't give the result I expected, and Failure keeps popping up. Am I right to use StreamReader to read the .mot file? And with regard to the XML file, will XmlDocument work? How do I get data from the XML and compare it with the .mot file? Could someone walk me through how to get this done, or provide guides on how to properly start with this?
Let me know if I'm not clear on anything.
EDIT:
I thought of an idea. I'm not sure if it's doable, though. Let's say the program will read the mot S-Record file, and it will identify the type of the record. From there every record line listed in the file would be broken down as shown in the sample below:
sample record line: "S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9"
S2 - type w/c means there would be a 3-byte address
14 - record length
F9 - checksum
042000 - AA
042001 - BB
042002 - CC
042003 - DD
...
04200F - 5E
With this new list, I think or I hope it would be easier for the program to use the data in the XML to locate it in the mot file.
Tell me if this will work, or if there are any alternatives.
Correct me if I'm wrong, as this is full of assumptions:
the XML only gives the starting values of the data package in the mot file:
          ||||||||||||
S214042000AABBCCDDEEFF01234567891A2B3C4D5EF9
          AABBCCDDEEFF
You could read the XML and place each record in a Record class:
public class Record
{
    // The properties must be public so they can be set from outside the class.
    public string FileName { get; set; }
    public string Id { get; set; }
    public string Data { get; set; }
    public Record() { } // default constructor
}
With the XmlDocument class you can read the XML, something like:
var document = new XmlDocument();
document.Load("your.xml"); // Load reads a file; LoadXml expects the XML text itself
var records = document.SelectNodes("/data-set/record");
var recordList = new List<Record>();
foreach (XmlNode r in records)
{
    var file = r.SelectSingleNode("File"); // element names are case-sensitive
    var fileName = file.Attributes["name"].Value;
    var address = file.SelectSingleNode("Address");
    var id = address.Attributes["id"].Value;
    // Strip the spaces so "BB CC DD" can be compared with the hex text in the mot file.
    var data = address.SelectSingleNode("Data").InnerText.Replace(" ", "");
    recordList.Add(new Record { FileName = fileName, Id = id, Data = data });
}
Afterwards you can read every line of the mot file by position,
since the address (e.g. 042000) always occupies characters 5 to 10 of an S2 record:
var fn = "Test.mot";
// Only the records that refer to this file are relevant (needs System.Linq).
var fileRecords = recordList.Where(r => r.FileName == fn).ToList();
using (StreamReader sr = new StreamReader(fn))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Index 4..9 holds the address; the data starts at index 10.
        bool match = fileRecords.Any(r =>
            line.Length >= 10 + r.Data.Length &&
            line.Substring(4, 6) == r.Id &&
            line.Substring(10, r.Data.Length) == r.Data);
        Console.WriteLine(match ? "Success" : "Failure");
    }
}
Let me know if it helped you out a bit
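The positional comparison only finds data that starts at the record's own address, so an XML entry such as 042001 would be missed even though its bytes are present in the line. The idea from the question's EDIT, expanding each record into an address-to-byte map, can be sketched as follows. The class and method names are hypothetical, only S2 records are handled, and the record length and checksum fields are not verified.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class SRecord
{
    // Expands one S2 record (3-byte address) into an address -> byte map.
    public static Dictionary<int, byte> ExpandS2(string line)
    {
        var bytes = new Dictionary<int, byte>();
        if (line == null || line.Length < 12 || !line.StartsWith("S2", StringComparison.Ordinal))
        {
            return bytes; // not an S2 data record
        }

        int address = int.Parse(line.Substring(4, 6), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        // The data sits between the address and the final checksum byte.
        string data = line.Substring(10, line.Length - 12);
        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            bytes[address + i / 2] = byte.Parse(data.Substring(i, 2), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }
        return bytes;
    }

    // Checks that the space-separated hex bytes in dataHex appear in memory
    // starting at the given hex address.
    public static bool Matches(Dictionary<int, byte> memory, string addressHex, string dataHex)
    {
        int address = int.Parse(addressHex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        string[] parts = dataHex.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            byte expected = byte.Parse(parts[i], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            if (!memory.TryGetValue(address + i, out byte actual) || actual != expected)
            {
                return false;
            }
        }
        return true;
    }
}
```

For a whole file, the maps of all lines could be merged into one dictionary before checking each XML entry with Matches, using the Address id and the Data text exactly as they appear in the XML.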

C# StreamReader read value of type [duplicate]

This is something that should be very simple. I just want to read numbers and words from a text file that consists of tokens separated by white space. How do you do this in C#? For example, in C++, the following code would work to read an integer, float, and word. I don't want to have to use a regex or write any special parsing code.
ifstream in("file.txt");
int int_val;
float float_val;
string string_val;
in >> int_val >> float_val >> string_val;
in.close();
Also, whenever a token is read, no more than one character beyond the token should be read in. This allows further file reading to depend on the value of the token that was read. As a concrete example, consider
string decider;
int size;
string name;
in >> decider;
if (decider == "name")
in >> name;
else if (decider == "size")
in >> size;
else if (!decider.empty() && decider[0] == '#')
read_remainder_of_line(in);
Parsing a binary PNM file is also a good example of why you would like to stop reading a file as soon as a full token is read in.
Brannon's answer explains how to read binary data. If you want to read text data, you should be reading strings and then parsing them - for which there are built-in methods, of course.
For example, to read a file with data:
10
10.5
hello
You might use:
using (TextReader reader = File.OpenText("test.txt"))
{
int x = int.Parse(reader.ReadLine());
double y = double.Parse(reader.ReadLine());
string z = reader.ReadLine();
}
Note that this has no error handling. In particular, it will throw an exception if the file doesn't exist, the first two lines have inappropriate data, or there are fewer than two lines. It will leave a value of null in z if the file only has two lines.
For a more robust solution which can fail more gracefully, you would want to check whether reader.ReadLine() returned null (indicating the end of the file) and use int.TryParse and double.TryParse instead of the Parse methods.
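A minimal sketch of that more defensive version is shown below. The helper name is hypothetical, and two choices here are assumptions rather than part of the answer: a missing third line is treated as a failure, and numbers are parsed with the invariant culture so that "10.5" is read the same way on every machine.

```csharp
using System.Globalization;
using System.IO;

static class SafeReading
{
    // Reads an int, a double and a string from three consecutive lines.
    // Returns false instead of throwing when a line is missing or malformed.
    public static bool TryReadValues(TextReader reader, out int x, out double y, out string z)
    {
        x = 0;
        y = 0;
        z = null;

        string first = reader.ReadLine();
        string second = reader.ReadLine();
        string third = reader.ReadLine();

        // ReadLine returns null once the end of the input is reached.
        if (first == null || second == null || third == null) return false;

        if (!int.TryParse(first, NumberStyles.Integer, CultureInfo.InvariantCulture, out x)) return false;
        if (!double.TryParse(second, NumberStyles.Float, CultureInfo.InvariantCulture, out y)) return false;

        z = third;
        return true;
    }
}
```

It can be called with the reader from File.OpenText("test.txt"), and the caller decides what to do when it returns false.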
That's assuming there's a line separator between values. If you actually want to read a string like this:
10 10.5 hello
then the code would be very similar:
using (TextReader reader = File.OpenText("test.txt"))
{
string text = reader.ReadLine();
string[] bits = text.Split(' ');
int x = int.Parse(bits[0]);
double y = double.Parse(bits[1]);
string z = bits[2];
}
Again, you'd want to perform appropriate error detection and handling. Note that if the file really just consisted of a single line, you may want to use File.ReadAllText instead, to make it slightly simpler. There's also File.ReadAllLines which reads the whole file into a string array of lines.
EDIT: If you need to split by any whitespace, then you'd probably be best off reading the whole file with File.ReadAllText and then using a regular expression to split it. At that point I do wonder how you represent a string containing a space.
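A regular expression is one way to do this; string.Split can also split on any whitespace when it is given a null separator array. A small sketch, with a hypothetical helper name:

```csharp
using System;

static class Tokens
{
    // A null separator array makes Split treat white-space characters as
    // delimiters; RemoveEmptyEntries collapses runs of them.
    public static string[] SplitOnWhitespace(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The text passed in could come from File.ReadAllText, and each token can then be handed to int.Parse, double.Parse, or used as a string.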
In my experience you generally know more about the format than this - whether there will be a line separator, or multiple values in the same line separated by spaces, etc.
I'd also add that mixed binary/text formats are generally unpleasant to deal with. Simple and efficient text handling tends to read into a buffer, which becomes problematic if there's binary data as well. If you need a text section in a binary file, it's generally best to include a length prefix so that just that piece of data can be decoded.
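The length-prefix idea can be sketched with BinaryWriter and BinaryReader, whose string methods already write and read a length prefix. The class and method names are hypothetical and the layout (string, then a 4-byte payload length, then the payload) is just one possible choice.

```csharp
using System.IO;
using System.Text;

static class LengthPrefixed
{
    // Writes a text section followed by a binary payload, each with its length.
    public static byte[] Pack(string text, byte[] payload)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8))
            {
                writer.Write(text);            // string with a length prefix
                writer.Write(payload.Length);  // 4-byte payload length
                writer.Write(payload);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    // Reads both sections back without scanning for a delimiter.
    public static void Unpack(byte[] packed, out string text, out byte[] payload)
    {
        using (var reader = new BinaryReader(new MemoryStream(packed), Encoding.UTF8))
        {
            text = reader.ReadString();
            int length = reader.ReadInt32();
            payload = reader.ReadBytes(length);
        }
    }
}
```

Because the reader knows exactly how many bytes belong to the text, it never has to buffer past the text section or guess where the binary data begins.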
using (FileStream fs = File.OpenRead("file.txt"))
{
BinaryReader reader = new BinaryReader(fs);
int intVal = reader.ReadInt32();
float floatVal = reader.ReadSingle();
string stringVal = reader.ReadString();
}
I like using the StreamReader for quick and easy file access. Something like:
string file = "data_file.txt";
using (StreamReader dataStream = new StreamReader(file))
{
    string datasample;
    while ((datasample = dataStream.ReadLine()) != null)
    {
        // datasample has the current line of text - write it to the console.
        Console.WriteLine(datasample);
    }
}
Not exactly the answer to your question, but just an idea to consider if you are new to C#: If you are using a custom text file to read some configuration parameters, you might want to check XML serialization topics in .NET.
XML serialization provides a simple way to write and read XML formatted files. For example, if you have a configuration class like this:
public class Configuration
{
public int intVal { get; set; }
public float floatVal { get; set; }
public string stringVal { get; set; }
}
you can simply save it and load it using the XmlSerializer class:
public void Save(Configuration config, string fileName)
{
XmlSerializer xml = new XmlSerializer(typeof(Configuration));
using (StreamWriter sw = new StreamWriter(fileName))
{
xml.Serialize(sw, config);
}
}
public Configuration Load(string fileName)
{
XmlSerializer xml = new XmlSerializer(typeof(Configuration));
using (StreamReader sr = new StreamReader(fileName))
{
return (Configuration)xml.Deserialize(sr);
}
}
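The Save method as defined above will create a file with roughly the following contents (the XML declaration and namespace attributes that XmlSerializer also emits are omitted here):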
<Configuration>
<intVal>0</intVal>
<floatVal>0.0</floatVal>
<stringVal></stringVal>
</Configuration>
Good thing about this approach is that you don't need to change the Save and Load methods if your Configuration class changes.
C# doesn't seem to have formatted stream readers like C++ (I would be happy to be corrected). So Jon Skeet's approach of reading the contents as a string and parsing them to the desired type would be the best.
Try something like this:
http://stevedonovan.blogspot.com/2005/04/reading-numbers-from-file-in-c.html
IMHO, reading a C# tutorial first would be really useful, to have the whole picture in mind before asking.
Here is my code to read numbers from the text file. It demonstrates the concept of reading numbers from text file "2 3 5 7 ..."
public class NumberReader
{
StreamReader reader;
public NumberReader(StreamReader reader)
{
this.reader = reader;
}
public UInt64 ReadUInt64()
{
UInt64 result = 0;
while (!reader.EndOfStream)
{
int c = reader.Read();
if (char.IsDigit((char) c))
{
result = 10 * result + (UInt64) (c - '0');
}
else
{
break;
}
}
return result;
}
}
Here is sample code to use this class:
using (StreamReader reader = File.OpenText("numbers.txt"))
{
NumberReader numbers = new NumberReader(reader);
while (! reader.EndOfStream)
{
ulong lastNumber = numbers.ReadUInt64();
}
}
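The same character-by-character idea can be extended to arbitrary whitespace-delimited tokens, which is closer to what the C++ >> operator does. The sketch below is an illustration with a hypothetical class name; it uses Peek so that the delimiter after a token is left unread. Note that this holds at the TextReader level only: a StreamReader still buffers ahead on the underlying stream, and Peek may return -1 on sources that cannot supply a character without blocking.

```csharp
using System.IO;
using System.Text;

public class TokenReader
{
    private readonly TextReader reader;

    public TokenReader(TextReader reader)
    {
        this.reader = reader;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    public string ReadToken()
    {
        // Skip any whitespace before the token.
        while (reader.Peek() >= 0 && char.IsWhiteSpace((char)reader.Peek()))
        {
            reader.Read();
        }

        var token = new StringBuilder();
        // Peek before reading so the character that ends the token stays in the reader.
        while (reader.Peek() >= 0 && !char.IsWhiteSpace((char)reader.Peek()))
        {
            token.Append((char)reader.Read());
        }

        return token.Length > 0 ? token.ToString() : null;
    }
}
```

A caller can branch on one token before deciding how to read the next, for example reading a "decider" token and then passing the following token to int.Parse or keeping it as a string.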

Using XDocument to write raw XML

I'm trying to create a spreadsheet in XML Spreadsheet 2003 format (so Excel can read it). I'm writing out the document using the XDocument class, and I need to get a newline in the body of one of the <Cell> tags. Excel, when it reads and writes, requires the files to have the literal string &#10; embedded in the string to correctly show the newline in the spreadsheet. It also writes it out as such.
The problem is that XDocument is writing CR-LF (\r\n) when I have newlines in my data, and it automatically escapes ampersands for me when I try to do a .Replace() on the input string, so I end up with &amp;#10; in my file, which Excel just happily writes out as a string literal.
Is there any way to make XDocument write out the literal &#10; as part of the XML stream? I know I can do it by deriving from XmlTextWriter, or literally just writing out the file with a TextWriter, but I'd prefer not to if possible.
I wonder if it might be better to use XmlWriter directly, and WriteRaw?
A quick check shows that XmlDocument makes a slightly better job of it, but xml and whitespace gets tricky very quickly...
I battled with this problem for a couple of days and finally came up with this solution. I used the XmlDocument.Save(Stream) method, then got the formatted XML string from the stream. Then I replaced the &amp;#10; occurrences with &#10; and used the TextWriter to write the string to a file.
string xml = "<?xml version=\"1.0\"?><?mso-application progid='Excel.Sheet'?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\">";
xml += "<Styles><Style ss:ID=\"s1\"><Alignment ss:Vertical=\"Center\" ss:WrapText=\"1\"/></Style></Styles>";
xml += "<Worksheet ss:Name=\"Default\"><Table><Column ss:Index=\"1\" ss:AutoFitWidth=\"0\" ss:Width=\"75\" /><Row><Cell ss:StyleID=\"s1\"><Data ss:Type=\"String\">Hello&amp;#10;&amp;#10;World</Data></Cell></Row></Table></Worksheet></Workbook>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml); //load the xml string
System.IO.MemoryStream stream = new System.IO.MemoryStream();
doc.Save(stream); //save the xml as a formatted string
stream.Position = 0; //reset the stream position since it will be at the end from the Save method
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
string formattedXML = reader.ReadToEnd(); //fetch the formatted XML into a string
formattedXML = formattedXML.Replace("&amp;#10;", "&#10;"); //Replace the unhelpful &amp;#10;'s with the wanted endline entity
System.IO.TextWriter writer = new System.IO.StreamWriter("C:\\Temp\\test1.xls");
writer.Write(formattedXML); //write the XML to a file
writer.Close();
