csvReader skips characters - c#

I am reading a CSV file in an ASP.NET web application to produce a report. The CsvReader does not read special characters such as ± or Σ.
var avar = FileUploader.PostedFile.FileName;
var myfile = File.OpenText(avar);
CsvReader csv = new CsvReader(myfile);
data = csv.GetRecords<T>().ToList();
The reader skips the special characters mentioned above. Every other character is read, including the characters surrounding the special ones. Can anyone tell me how to fix this? Thanks.

I use GetEncoding from the linked answer, "Effective way to find any file's Encoding", to find the encoding of my file.
Then I set the configuration:
CsvConfiguration config = new CsvConfiguration();
config.Delimiter = ",";
Encoding enc = GetEncoding(FileUploader.PostedFile.FileName);
config.Encoding = enc;
config.HasHeaderRecord = true;
config.QuoteNoFields = true;
Next, I open the file with a FileStream and pass it to a StreamReader:
FileStream stream = File.OpenRead(FileUploader.PostedFile.FileName);
StreamReader reader = new StreamReader(stream, Encoding.GetEncoding(enc.HeaderName));
CsvReader csv = new CsvReader(reader, config);
datas = csv.GetRecords<T>().ToList();
All characters are readable when I load a file. Here, datas is an IEnumerable collection.
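A minimal sketch of the idea behind the answer above: find the encoding first, then hand it to the StreamReader. The `DetectEncoding` helper here is hypothetical and only sniffs a byte-order mark (it ignores UTF-32); it is not the `GetEncoding` from the linked answer, and the sample bytes are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Hypothetical helper: guesses the encoding from a byte-order mark and
    // falls back to the caller's choice when no BOM is present.
    public static Encoding DetectEncoding(byte[] head, Encoding fallback)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return new UTF8Encoding(true);
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big endian
        return fallback;
    }

    // Decodes the whole buffer through a StreamReader using the detected encoding.
    public static string ReadAll(byte[] data, Encoding fallback)
    {
        Encoding enc = DetectEncoding(data, fallback);
        using (var reader = new StreamReader(new MemoryStream(data), enc))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        // "a±b" saved as ISO-8859-1: no BOM, and the byte 0xB1 alone is not valid UTF-8.
        byte[] latin1 = { 0x61, 0xB1, 0x62 };
        Console.WriteLine(ReadAll(latin1, Encoding.GetEncoding("iso-8859-1")));
    }
}
```

The same StreamReader could then be passed to the CsvReader. File.OpenText always assumes UTF-8, which is a likely reason the original code lost ± and Σ.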

Related

Hidden character in saved XML [duplicate]

I'm generating an utf-8 XML file using XDocument.
XDocument xml_document = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement(ROOT_NAME,
new XAttribute("note", note)
)
);
...
xml_document.Save(@file_path);
The file is generated correctly and validated with an xsd file with success.
When I try to upload the XML file to an online service, the service says that my file is wrong at line 1; I have discovered that the problem is caused by the BOM on the first bytes of the file.
Do you know why the BOM is appended to the file and how can I save the file without it?
As stated in the Wikipedia article on the byte order mark:
While Unicode standard allows BOM in UTF-8 it does not require or recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to identify a text stream or file as UTF-8 or that it was converted from another format that has a BOM.
Is it an XDocument problem or should I contact the guys of the online service provider to ask for a parser upgrade?
Use an XmlTextWriter and pass it to the XDocument's Save() method; that way you have more control over the encoding used:
var doc = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement("root", new XAttribute("note", "boogers"))
);
using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
{
doc.Save(writer);
}
The UTF8Encoding class constructor has an overload that specifies whether or not to use the BOM (Byte Order Mark) with a boolean value, in your case false.
The result of this code was verified using Notepad++ to inspect the file's encoding.
First of all: the service provider MUST handle it, because the XML spec states that a BOM may be present in a UTF-8 document.
You can force your XML to be saved without a BOM like this:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = new UTF8Encoding(false); // The false means, do not emit the BOM.
using (XmlWriter w = XmlWriter.Create("my.xml", settings))
{
doc.Save(w);
}
(Googled from here: http://social.msdn.microsoft.com/Forums/en/xmlandnetfx/thread/ccc08c65-01d7-43c6-adf3-1fc70fdb026a)
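If you want to check the result in code rather than in an editor, here is a small sketch that saves to memory and inspects the first three bytes. The class and method names are made up for illustration; only the XmlWriterSettings and UTF8Encoding usage comes from the answers above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class BomCheck
{
    // Serializes the document to memory; emitBom is passed straight to UTF8Encoding.
    public static byte[] Save(XDocument doc, bool emitBom)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(emitBom) };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
                doc.Save(writer);
            return stream.ToArray();
        }
    }

    // True when the buffer begins with the UTF-8 byte-order mark EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    static void Main()
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("root", new XAttribute("note", "example")));
        Console.WriteLine(StartsWithUtf8Bom(Save(doc, false)));
    }
}
```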
The most expedient way to get rid of the BOM when using XDocument is to save the document, read the file straight back in, and then write it out again. The File routines will strip the BOM for you:
XDocument xTasks = new XDocument();
XElement xRoot = new XElement("tasklist",
new XAttribute("timestamp",lastUpdated),
new XElement("lasttask",lastTask)
);
...
xTasks.Add(xRoot);
xTasks.Save("tasks.xml");
// read it straight in, write it straight back out. Done.
string[] lines = File.ReadAllLines("tasks.xml");
File.WriteAllLines("tasks.xml",lines);
(It's hokey, but it works for the sake of expediency; at least you'll have a well-formed file to upload to your online provider.) ;)
For UTF-8 documents:
String XMLDec = xDoc.Declaration.ToString();
StringBuilder sb = new StringBuilder(XMLDec);
sb.Append(xDoc.ToString());
Encoding encoding = new UTF8Encoding(false); // false = without BOM
File.WriteAllText(outPath, sb.ToString(), encoding);

Special character not reading in .txt file

I am using stream reader for reading a text file.
This is the contents of the .txt file:
</a> Schools's are a suitable public </a>
When I read that text I got:
<a>Schoolss are a suitable public<a>
As you can see, I didn't receive the quotation mark. How can I read the special character with a stream reader?
I used following code:
using (StreamReader reader = new StreamReader(CommonGetSet.FileName, System.Text.Encoding.ASCII))
{
string text = reader.ReadToEnd();
docKeyword = XDocument.Parse(text);
}
The problem you are having is that you are trying to load a text file with an xml reader, i.e. this part:
XDocument.Load(reader);
If you look at this question: What characters do I need to escape in XML documents?, you will see other characters that will be stripped/need escaping too.
If you inspect the StreamReader in the debugger you will see it shows the correct text, which is what the answer by @JinsPeter demonstrates. So you need to read the file in as plain text. The easiest way is to use either File.ReadAllText or File.ReadAllLines, depending on whether you want the result as a string or a string[] respectively:
string contents = File.ReadAllText(path);
string[] lines = File.ReadAllLines(path);
However, if for some reason you really want to use a StreamReader you can read directly from the stream using ReadToEnd, ReadLine or any other appropriate read method:
using (StreamReader reader = new StreamReader(path))
{
string contents = reader.ReadToEnd();
}
However, note that the StreamReader methods will read from the current position in the stream so you may need to set the position yourself.
For a list of other ways to read in a file in C# see this question: How to read an entire file to a string using C#?.
When I printed the same text from inside the StreamReader block, I got the ' character.
So the issue is with writing it to XML or HTML. Try to fix that rather than looking for a problem in StreamReader.
using (StreamReader inputStream = new StreamReader(filepath, System.Text.Encoding.UTF8))
{
string line = inputStream.ReadToEnd();
Console.WriteLine(line);
}
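One thing worth checking is the Encoding.ASCII argument in the question's reader. The sketch below assumes the apostrophe in the file is a typographic one (U+2019) stored as UTF-8, which is a guess about the asker's file, and decodes the same bytes both ways.

```csharp
using System;
using System.IO;
using System.Text;

static class AsciiLoss
{
    // Decodes the same bytes through a StreamReader using the given encoding.
    // BOM detection is switched off so that only the encoding argument matters.
    public static string Decode(byte[] data, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(data), encoding, false))
            return reader.ReadToEnd();
    }

    static void Main()
    {
        // "School’s" with a typographic apostrophe (U+2019), stored as UTF-8.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("School\u2019s");
        Console.WriteLine(Decode(utf8Bytes, Encoding.UTF8));   // apostrophe survives
        Console.WriteLine(Decode(utf8Bytes, Encoding.ASCII));  // apostrophe is no longer present
    }
}
```

ASCII only covers 7-bit characters, so any byte above 0x7F cannot come back as the original character.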

Inserting a doc file in place of a placeholder

I have a Word document with many pages. One of those pages contains only a placeholder. I want to replace that placeholder with the contents of another doc file without losing formatting; the file to insert may itself span many pages. How can I do this programmatically? I have searched a lot but could not find a way to insert a doc file in place of a placeholder. Thank you in advance.
Or how can we copy the contents of the doc to be inserted and then replace the placeholder with the copied content?
I found a post here. The code below is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true)) // open the in-memory copy for editing
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
What if, instead of "Hi Everyone!", we want to replace the text with binary data, i.e. an array of bytes:
byte[] binarydata = File.ReadAllBytes(filepaths);
How can we modify the program?
First of all, get the NuGet package called Novacode.Docx; it is the best document creator and editor I have found in the last few years.
using Novacode; // the DocX classes live in the Novacode namespace
void Main()
{
var doc = DocX.Load(@"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(@"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
This is the simplest implementation of what you're after, but it works.
For anything else, take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
That is the best place to start. If you do use this library, I would also recommend version 1.0.0.19, as there are some formatting issues in 1.0.0.22.

Weird character encoded characters (’) appearing from a feed

I've got a question regarding an XML feed and XSL transformation I'm doing. In a few parts of the outputted feed on an HTML page, I get weird characters (such as ’) appearing on the page.
On another site (that I don't own) that's using the same feed, it isn't getting these characters.
Here's the code I'm using to grab and return the transformed content:
string xmlUrl = "http://feedurl.com/feed.xml";
string xmlData = new System.Net.WebClient().DownloadString(xmlUrl);
string xslUrl = "http://feedurl.com/transform.xsl";
XsltArgumentList xslArgs = new XsltArgumentList();
xslArgs.AddParam("type", "", "specifictype");
string resultText = Utils.XslTransform(xmlData, xslUrl, xslArgs);
return resultText;
And my Utils.XslTransform function looks like this:
static public string XslTransform(string data, string xslurl)
{
TextReader textReader = new StringReader(data);
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
XmlReader xmlReader = XmlReader.Create(textReader, settings);
XmlReader xslReader = new XmlTextReader(Uri.UnescapeDataString(xslurl));
XslCompiledTransform myXslT = new XslCompiledTransform();
myXslT.Load(xslReader);
StringBuilder sb = new StringBuilder();
using (TextWriter tw = new StringWriter(sb))
{
myXslT.Transform(xmlReader, new XsltArgumentList(), tw);
}
string transformedData = sb.ToString();
return transformedData;
}
I'm not extremely knowledgeable with character encoding issues and I've been trying to nip this in the bud for a bit of time and could use any suggestions possible. I'm not sure if there's something I need to change with how the WebClient downloads the file or something going weird in the XslTransform.
Thanks!
Give HtmlEncode a try. So in this case you would reference System.Web and then make this change (just call the HtmlEncode function on the last line):
string xmlUrl = "http://feedurl.com/feed.xml";
string xmlData = new System.Net.WebClient().DownloadString(xmlUrl);
string xslUrl = "http://feedurl.com/transform.xsl";
XsltArgumentList xslArgs = new XsltArgumentList();
xslArgs.AddParam("type", "", "specifictype");
string resultText = Utils.XslTransform(xmlData, xslUrl, xslArgs);
return HttpUtility.HtmlEncode(resultText);
The character â is the first character of a multibyte UTF-8 sequence (’) that has been decoded with a single-byte encoding such as Windows-1252 instead of UTF-8. So I guess you generate the HTML in UTF-8 while the browser interprets it as something else. I see two ways to fix it:
The simplest solution would be to update the XSLT to include the HTML meta tag that will hint the correct encoding to browser: <meta charset="UTF-8">.
If your transform already defines a different encoding in meta tag and you'd like to keep it, this encoding needs to be specified in the function that saves XML as file. I assume this function took ASCII by default in your example. If your XSLT was configured to generate XML files directly to disk, you could adjust it with XSLT instruction <xsl:output encoding="ASCII"/>.
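To see where the stray â comes from, here is a small sketch that encodes a typographic apostrophe as UTF-8 and then decodes the bytes with a single-byte encoding. It uses ISO-8859-1 as a stand-in because it is available everywhere in .NET; with Windows-1252 the last two bytes would show up as € and ™ instead of control characters. The class name is made up for illustration.

```csharp
using System;
using System.Text;

static class Mojibake
{
    // Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1.
    public static string Misdecode(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    static void Main()
    {
        // U+2019 is three bytes in UTF-8 (E2 80 99), so it decodes to three characters.
        string garbled = Misdecode("\u2019");
        Console.WriteLine(garbled.Length);
        Console.WriteLine(garbled[0] == '\u00E2'); // the first character is â
    }
}
```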
To use WebClient.DownloadString you have to know which encoding the server is going to use and tell the WebClient in advance. It's a bit of a Catch-22.
But there is no need to do that. Use WebClient.DownloadData or WebClient.OpenRead and let an XML library figure out which encoding to use.
using (var web = new WebClient())
using (var stream = web.OpenRead("http://unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml"))
using (var reader = XmlReader.Create(stream, new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse }))
{
reader.MoveToContent();
//… use reader as you will, including var doc = XDocument.ReadFrom(reader);
}
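The same point can be checked without a network call. This sketch builds a small ISO-8859-1 document in memory (the content is made up) and compares letting the XML parser read the raw bytes with decoding them to a string first using the wrong encoding.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

static class DeclaredEncoding
{
    // Parses raw bytes; the XML reader picks the encoding from the BOM or the declaration.
    public static string RootValue(byte[] xmlBytes)
    {
        using (var stream = new MemoryStream(xmlBytes))
            return XDocument.Load(stream).Root.Value;
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] bytes = latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><r>caf\u00E9</r>");

        // Reading from bytes honours the declared encoding.
        Console.WriteLine(RootValue(bytes) == "caf\u00E9");
        // Decoding to a string first with a guessed encoding damages the text.
        Console.WriteLine(Encoding.UTF8.GetString(bytes).Contains("caf\u00E9"));
    }
}
```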

Encode XDocument from win-1251 to utf-8

I am trying to convert an XDocument from win-1251 to utf-8, but in the raw view the Russian characters are garbled.
var encoding = new UTF8Encoding(false,false);
XmlTextWriter xmlTextWriter = new XmlTextWriter("F:\\File", Encoding.GetEncoding("windows-1251"));
document.Save(xmlTextWriter);
xmlTextWriter.Close();
xmlTextWriter = null;
string text = File.ReadAllText("F:\\File", Encoding.Default);
XDocument documentcode = XDocument.Parse(text);
xmlTextWriter = new XmlTextWriter(_Stream, encoding);
documentcode.Save(xmlTextWriter);
xmlTextWriter.Flush();
_Stream.Position = 0;
Headers.ContentType = new MediaTypeHeaderValue("application/xml");
This is the raw-view in SOAPUI
<?xml version="1.0" encoding="utf-8"?><StatObservationList><StatObservation><ObjectID>0b575ec1-7dea-41c4-a1f0-287190715ed2</ObjectID><Name>Тестовое статнаблюдение</Name><Code>GPPCode42</Code></StatObservation><StatObservation><ObjectID>3a871ea1-06ee-4991-a263-d643b424bdd4</ObjectID><Name>МиСП</Name><Code /></StatObservation></StatObservationList>
I think I've got it now. The text in your XDocument has, for whatever reason, been decoded incorrectly using Windows-1251.
Ideally, you need to go back to the source and ensure it is decoded properly (as UTF-8). Converting after the fact may not be entirely loss-free, as UTF-8 data can contain byte values that have no character in Windows-1251 (a quick glance at the code page shows nothing for 0x98, for example).
However, to convert this after the fact the simplest way is just to get the text back, get the bytes for the encoding it was decoded with and then decode those with the correct encoding:
var windows1251 = Encoding.GetEncoding("windows-1251");
var utf8 = Encoding.UTF8;
var originalBytes = windows1251.GetBytes(document.ToString());
var correctXmlString = utf8.GetString(originalBytes);
var correctDocument = XDocument.Parse(correctXmlString);
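Here is a round trip of the same technique on a plain string, as a sketch. It assumes .NET Core or .NET 5+, where Windows-1251 is only available after registering CodePagesEncodingProvider; the class and method names are made up, and the sample word was picked to avoid the 0x98 byte mentioned above.

```csharp
using System;
using System.Text;

static class Recode
{
    // Reverses a wrong decode: re-encode with the encoding that was wrongly
    // applied, then decode the recovered bytes as UTF-8.
    public static string Repair(string garbled, Encoding wrong)
    {
        byte[] originalBytes = wrong.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static Encoding Windows1251()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("windows-1251");
    }

    static void Main()
    {
        Encoding cp1251 = Windows1251();
        // Simulate the bug: UTF-8 bytes decoded as Windows-1251.
        string garbled = cp1251.GetString(Encoding.UTF8.GetBytes("Тест"));
        Console.WriteLine(Repair(garbled, cp1251) == "Тест");
    }
}
```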
