ï»¿ characters appended to the beginning of each file

ï»¿ characters appended to the beginning of each file - c#

I've downloaded an HttpHandler class that concatenates JS files into one file and it keeps appending the ï»¿ characters at the start of each file it concatenates.
Any ideas on what is causing this? Could it be that once the files are processed they are written to the cache, and that's how the cache is storing/rendering them?
Any inputs would be greatly appreciated.
using System;
using System.Net;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Configuration;
using System.Web;
public class HttpCombiner : IHttpHandler {
private const bool DO_GZIP = false;
private readonly static TimeSpan CACHE_DURATION = TimeSpan.FromDays(30);
public void ProcessRequest (HttpContext context) {
HttpRequest request = context.Request;
// Read setName, contentType and version. All are required. They are
// used as cache key
string setName = request["s"] ?? string.Empty;
string contentType = request["t"] ?? string.Empty;
string version = request["v"] ?? string.Empty;
// Decide if browser supports compressed response
bool isCompressed = DO_GZIP && this.CanGZip(context.Request);
// Response is written as UTF8 encoding. If you are using languages
// like Arabic, you should change this to proper encoding
UTF8Encoding encoding = new UTF8Encoding(false);
// If the set has already been cached, write the response directly
// from cache. Otherwise generate the response and cache it
if (!this.WriteFromCache(context, setName, version, isCompressed,
contentType))
{
using (MemoryStream memoryStream = new MemoryStream(5000))
{
// Decide regular stream or GZipStream based on whether the
// response can be cached or not
using (Stream writer = isCompressed
? (Stream)(new GZipStream(memoryStream,
CompressionMode.Compress))
: memoryStream)
{
// Load the files defined in <appSettings> and process
// each file
string setDefinition = System.Configuration
.ConfigurationManager.AppSettings[setName] ?? "";
string[] fileNames = setDefinition.Split(
new char[] { ',' },
StringSplitOptions.RemoveEmptyEntries);
foreach (string fileName in fileNames)
{
byte[] fileBytes = this.GetFileBytes(
context, fileName.Trim(), encoding);
writer.Write(fileBytes, 0, fileBytes.Length);
}
writer.Close();
}
// Cache the combined response so that it can be directly
// written in subsequent calls
byte[] responseBytes = memoryStream.ToArray();
context.Cache.Insert(
GetCacheKey(setName, version, isCompressed),
responseBytes, null,
System.Web.Caching.Cache.NoAbsoluteExpiration,
CACHE_DURATION);
// Generate the response
this.WriteBytes(responseBytes, context, isCompressed,
contentType);
}
}
}
private byte[] GetFileBytes(HttpContext context, string virtualPath,
Encoding encoding)
{
if (virtualPath.StartsWith("http://",
StringComparison.InvariantCultureIgnoreCase))
{
using (WebClient client = new WebClient())
{
return client.DownloadData(virtualPath);
}
}
else
{
string physicalPath = context.Server.MapPath(virtualPath);
byte[] bytes = File.ReadAllBytes(physicalPath);
// TODO: Convert unicode files to specified encoding.
// For now, assuming files are either ASCII or UTF8
return bytes;
}
}
private bool WriteFromCache(HttpContext context, string setName,
string version, bool isCompressed, string contentType)
{
byte[] responseBytes = context.Cache[GetCacheKey(setName, version,
isCompressed)] as byte[];
if (null == responseBytes || 0 == responseBytes.Length) return false;
this.WriteBytes(responseBytes, context, isCompressed, contentType);
return true;
}
private void WriteBytes(byte[] bytes, HttpContext context,
bool isCompressed, string contentType)
{
HttpResponse response = context.Response;
response.AppendHeader("Content-Length", bytes.Length.ToString());
response.ContentType = contentType;
if (isCompressed)
response.AppendHeader("Content-Encoding", "gzip");
context.Response.Cache.SetCacheability(HttpCacheability.Public);
context.Response.Cache.SetExpires(DateTime.Now.Add(CACHE_DURATION));
context.Response.Cache.SetMaxAge(CACHE_DURATION);
context.Response.Cache.AppendCacheExtension(
"must-revalidate, proxy-revalidate");
response.OutputStream.Write(bytes, 0, bytes.Length);
response.Flush();
}
private bool CanGZip(HttpRequest request)
{
string acceptEncoding = request.Headers["Accept-Encoding"];
if (!string.IsNullOrEmpty(acceptEncoding) &&
(acceptEncoding.Contains("gzip")
|| acceptEncoding.Contains("deflate")))
return true;
return false;
}
private string GetCacheKey(string setName, string version,
bool isCompressed)
{
return "HttpCombiner." + setName + "." + version + "." + isCompressed;
}
public bool IsReusable
{
get { return true; }
}
}

The ï»¿ characters are the UTF BOM markers.

It's the UTF Byte Order Mark (BOM).
It will be at the start of each file, but your editor will ignore them there. When concatenated they end up in the middle, so you see them.

OK, I've debugged your code.
BOM marks appear in the source stream when the files are being read from the disk:
byte[] bytes = File.ReadAllBytes(physicalPath);
// TODO: Convert unicode files to specified encoding. For now, assuming
// files are either ASCII or UTF8
If you read the files properly, you can get rid of the marks.
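To illustrate what "reading the files properly" could look like, here is a minimal sketch, assuming the .js files are UTF-8 (with or without a BOM). The class and method names (BomFreeReader, ReadWithoutBom) are made up for this example and are not part of the original handler. The idea is to let StreamReader detect and consume the BOM, then re-encode the text as UTF-8 without one.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original HttpCombiner class.
public static class BomFreeReader
{
    // Reads a text file and returns its content as UTF-8 bytes without a BOM.
    public static byte[] ReadWithoutBom(string physicalPath)
    {
        string text;
        // The third argument (detectEncodingFromByteOrderMarks: true) makes the
        // reader recognise a leading BOM and skip it instead of returning it.
        using (StreamReader reader = new StreamReader(physicalPath, Encoding.UTF8, true))
        {
            text = reader.ReadToEnd();
        }
        // Encoding.GetBytes never emits a preamble, so no BOM is written back.
        return new UTF8Encoding(false).GetBytes(text);
    }
}
```

In the handler, the body of the else branch in GetFileBytes could then return BomFreeReader.ReadWithoutBom(physicalPath) instead of the raw File.ReadAllBytes result.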

I think this is the Byte Order Mark (BOM) for files with UTF-8 encoding. This mark lets a reader determine which encoding the file is stored in.

You didn't post what the actual solution was. Here's my solution. On the line where it reads the file into memory, I found a kind of strange way to remove the BOM:
byte[] bytes = File.ReadAllBytes(physicalPath);
String ss = new StreamReader(new MemoryStream(bytes), true).ReadToEnd();
byte[] b = StrToByteArray(ss);
return b;
And you also need this function:
public static byte[] StrToByteArray(string str)
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
Nitech

If you have the file's contents in a string, .Trim() will lop off the "BOM" quite handily.
You may not be able to do that, or you may want the whitespace at the ends of the file, but it's certainly an option.
For .js whitespace isn't significant, so this could work.
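One caveat, as far as I can tell from the String.Trim documentation: Trim() only removes U+FEFF on .NET Framework 3.5 SP1 and earlier; from .NET Framework 4 on it no longer treats that character as white space. A more explicit sketch is to strip the character yourself. This assumes the file was decoded as UTF-8, so the BOM shows up as the single character U+FEFF rather than as the three characters ï»¿. The names below are hypothetical.

```csharp
// Hypothetical helper for illustration only.
public static class BomTrimmer
{
    // Removes one leading U+FEFF (the decoded BOM) and leaves everything else intact.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

Unlike Trim(), this leaves leading and trailing white space alone, which matters if you want the files joined exactly as written.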

Check how your js files are encoded and use the same encoding in the code that does the reading and concatenation. These three characters usually point to a Unicode encoding (a UTF-8 BOM displayed as if it were Windows-1252).

Those characters are UTF-8 BOM. It doesn't seem like they're coming from the gzipped stream. It's more likely they are inserted to the response stream, so I would suggest clearing the response before working with it:
context.Response.Clear();

Related

c# converting a .csv file from Windows UTF-8 to w1252

I need to convert a .csv file from UTF-8 to W1252 (West European).
I have tried the example from the MSDN page and the following code without success:
Encoding utf8 = Encoding.UTF8;
//Encoding utf8 = new UTF8Encoding();
Encoding win1252 = Encoding.GetEncoding(1252);
string src = today.ToString("dd-MM-yyyy") + "-ups.csv";
string source = File.ReadAllText(src);
byte[] input = source.ToUTF8ByteArray();
byte[] output = Encoding.Convert(utf8, win1252, input);
File.WriteAllText(src + "w1252", win1252.GetString(output));
with the extension method
public static class StringHelper
{
public static byte[] ToUTF8ByteArray(this string str)
{
Encoding encoding = new UTF8Encoding();
return encoding.GetBytes(str);
}
}
After this, the file still reads with broken characters when opened as W1252 and works perfectly if opening with UTF-8, confirming that it is not good.
Thanks!

Why not read in the initial encoding (Encoding.UTF8), and write in target one (Encoding.GetEncoding(1252)):
string fileName = @"C:\MyFile.csv";
File.WriteAllText(fileName, File
.ReadAllText(fileName, Encoding.UTF8), Encoding.GetEncoding(1252));

Unity3D WWW Error C#

I am working in Unity trying to figure out the WWW class and access API from online-go.com
I get an error in the Debug.Log though. Additionally, the Debug on Line 58 just returns a blank string. I don't think I am fully understanding how to use WWW since this is the first time I am using it.
Necessary data rewind wasn't possible
UnityEngine.Debug:Log(Object)
<LoadWWW>c__Iterator0:MoveNext() (at Assets/OGS.cs:60)
using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using System;
using System.IO;
using System.Net;
using System.Text;
//using System.Net.httpclient;
public class OGS : MonoBehaviour {
string generateAPIClient = "http://beta.online-go.com/developer";
string APIKey = "0c63a59dd17ec69a48af5d9dc8b4e956";
string requestUserToken = "oauth2/access_token";
string clientID = "";
string clientSecret = "";
string baseURL = "http://online-go.com/";
string url = "";
string username;
string password;
string POST;
List<Settings> settings;
// Use this for initialization
void Start () {
Debug.Log("Opened");
settings = new List<Settings>();
Load("Settings");
clientID = AssignSetting("clientID");
clientSecret = AssignSetting("clientSecret");
username = AssignSetting("username");
password = AssignSetting("password");
POST = string.Format( "client_id={0}&client_secret={1}&grant_type=password&username={2}&password={3}",
clientID, clientSecret, username, password);
url = baseURL + requestUserToken;
StartCoroutine("LoadWWW");
}
//Assign settings loaded to settings variables
string AssignSetting (string item) {
int position = -1;
for(int i=0;i<settings.Count;i++) {
if(settings[i].name == item){return settings[i].value;}
}
return string.Empty;
}
IEnumerator LoadWWW() {
byte[] byteArray = GetBytes(POST);
Dictionary<string,string> headers = new Dictionary<string,string>();
headers.Add("Content-Type", "application/x-www-form-urlencoded");
WWW text = new WWW(url, byteArray, headers);
yield return text;
byteArray = text.bytes;
string POSTResponse = GetString(byteArray);
Debug.Log(POSTResponse);
Debug.Log(text.responseHeaders);
Debug.Log(text.error);
}
static byte[] GetBytes(string str)
{
byte[] bytes = new byte[str.Length * sizeof(char)];
System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
return bytes;
}
static string GetString(byte[] bytes)
{
char[] chars = new char[bytes.Length / sizeof(char)];
System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
return new string(chars);
}
private bool Load(string fileName)
{
// Handle any problems that might arise when reading the text
try
{
string line;
// Create a new StreamReader, tell it which file to read and what encoding the file
// was saved as
StreamReader theReader = new StreamReader(Application.dataPath + "/Resources/" + fileName + ".txt");
// Immediately clean up the reader after this block of code is done.
// You generally use the "using" statement for potentially memory-intensive objects
// instead of relying on garbage collection.
// (Do not confuse this with the using directive for namespace at the
// beginning of a class!)
using (theReader)
{
// While there's lines left in the text file, do this:
do
{
line = theReader.ReadLine();
if (line != null)
{
// Do whatever you need to do with the text line, it's a string now
// In this example, I split it into arguments based on comma
// deliniators, then send that array to DoStuff()
string[] entries = line.Split(':');
if (entries.Length > 0){
Settings newSetting = new Settings(entries[0], entries[1]);
settings.Add(newSetting);
}
}
}
while (line != null);
// Done reading, close the reader and return true to broadcast success
theReader.Close();
return true;
}
}
// If anything broke in the try block, we throw an exception with information
// on what didn't work
catch (Exception e)
{
Console.WriteLine("{0}\n", e.Message);
return false;
}
}
}

The "Necessary data rewind wasn't possible" error mainly occurs when a redirect is involved during the WWW call.
To fix this, make sure that the URLs you call do not redirect you to another page in the process. It is also a good idea to add some error handling before you use the value.
// wait for the result
yield return text;
// Handle the error if there is any
if (!string.IsNullOrEmpty(text.error)) {
Debug.Log(text.error);
}
// Now do with POSTResponse whatever you want if there were no errors.

Decode Base64 and Inflate Zlib compressed XML

Sorry for the long post, will try to make this as short as possible.
I'm consuming a json API (which has zero documentation of course) which returns something like this:
{
uncompressedlength: 743637,
compressedlength: 234532,
compresseddata: "lkhfdsbjhfgdsfgjhsgfjgsdkjhfgj"
}
The data (XML in this case) is compressed and then base64-encoded, and I am trying to extract it. All I have is their demo code, written in Perl, to decode it:
use Compress::Zlib qw(uncompress);
use MIME::Base64 qw(decode_base64);
my $uncompresseddata = uncompress(decode_base64($compresseddata));
Seems simple enough.
I've tried a number of methods to decode the base64:
private string DecodeFromBase64(string encodedData)
{
byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
string returnValue = System.Text.Encoding.Unicode.GetString(encodedDataAsBytes);
return returnValue;
}
public string base64Decode(string data)
{
try
{
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder utf8Decode = encoder.GetDecoder();
byte[] todecode_byte = Convert.FromBase64String(data);
int charCount = utf8Decode.GetCharCount(todecode_byte, 0, todecode_byte.Length);
char[] decoded_char = new char[charCount];
utf8Decode.GetChars(todecode_byte, 0, todecode_byte.Length, decoded_char, 0);
string result = new String(decoded_char);
return result;
}
catch (Exception e)
{
throw new Exception("Error in base64Decode" + e.Message);
}
}
And I have tried using Ionic.Zip.dll (DotNetZip?) and zlib.net to inflate the Zlib compression. But everything errors out. I am trying to track down where the problem is coming from. Is it the base64 decode or the Inflate?
I always get an error when inflating using zlib: I get a bad Magic Number error using zlib.net and I get "Bad state (invalid stored block lengths)" when using DotNetZip:
string decoded = DecodeFromBase64(compresseddata);
string decompressed = UnZipStr(GetBytes(decoded));
public static string UnZipStr(byte[] input)
{
using (MemoryStream inputStream = new MemoryStream(input))
{
using (Ionic.Zlib.DeflateStream zip =
new Ionic.Zlib.DeflateStream(inputStream, Ionic.Zlib.CompressionMode.Decompress))
{
using (StreamReader reader =
new StreamReader(zip, System.Text.Encoding.UTF8))
{
return reader.ReadToEnd();
}
}
}
}
After reading this:
http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
and following one of the comments, I changed the code to this:
MemoryStream memStream = new MemoryStream(Convert.FromBase64String(compresseddata));
memStream.ReadByte();
memStream.ReadByte();
DeflateStream deflate = new DeflateStream(memStream, CompressionMode.Decompress);
string doc = new StreamReader(deflate, System.Text.Encoding.UTF8).ReadToEnd();
And it's working fine.

This was the culprit:
http://george.chiramattel.com/blog/2007/09/deflatestream-block-length-does-not-match.html
With skipping the first two bytes I was able to simplify it to:
MemoryStream memStream = new MemoryStream(Convert.FromBase64String(compresseddata));
memStream.ReadByte();
memStream.ReadByte();
DeflateStream deflate = new DeflateStream(memStream, CompressionMode.Decompress);
string doc = new StreamReader(deflate, System.Text.Encoding.UTF8).ReadToEnd();

First, use System.IO.Compression.DeflateStream to re-inflate the data. You should be able to use a MemoryStream as the input stream. You can create a MemoryStream using the byte[] result of Convert.FromBase64String.
You are likely causing all kinds of trouble trying to convert the base64 result to a given encoding; use the raw data directly to Deflate.
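Putting the answers above together: the two skipped bytes are the zlib header (CMF and FLG in RFC 1950), which System.IO.Compression.DeflateStream does not expect because it reads raw deflate data. Below is a rough sketch under a few assumptions: the payload is UTF-8 text, the zlib stream has no preset dictionary (which would add four more header bytes), and the class name ZlibBase64 is invented for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only.
public static class ZlibBase64
{
    // Decodes base64, skips the 2-byte zlib header, and inflates the rest.
    public static string Decode(string base64)
    {
        // Feed the raw bytes to the inflater; do not turn them into a string first.
        byte[] raw = Convert.FromBase64String(base64);
        const int zlibHeaderLength = 2;
        if (raw.Length < zlibHeaderLength)
        {
            throw new ArgumentException("Input is too short to be zlib data.", "base64");
        }
        using (MemoryStream input = new MemoryStream(raw, zlibHeaderLength, raw.Length - zlibHeaderLength))
        using (DeflateStream deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the JSON from the question this would be called as string doc = ZlibBase64.Decode(compresseddata);. The using blocks also dispose the streams, which the shorter snippets above leave open.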

Convert UTF-16 text to another encoding (Windows-1250)

I have a text in a variable, text, encoded in the default (UTF-16) encoding. I would like to change it to Windows-1250. I have:
public static string EncodeToWin1250(string text)
{
Encoding unicode = Encoding.Unicode;
Encoding win1250 = Encoding.GetEncoding(1250);
byte[] unicodeBytes = unicode.GetBytes(text);
byte[] win1250Bytes = Encoding.Convert(unicode, win1250, unicodeBytes);
char[] win1250Chars = new char[win1250.GetCharCount(win1250Bytes, 0, win1250Bytes.Length)];
win1250.GetChars(win1250Bytes, 0, win1250Bytes.Length, win1250Chars, 0);
text = new string(win1250Chars);
return text;
}
but so far it doesn't work.
How do I fix this problem?
I am returning the string as a file:
[...]
result = BLL.DataExchange.MoneyS3.MoneyS3Export.EncodeToWin1250(result);
Context.Response.Clear();
Context.Response.AddHeader("Content-Disposition", "attachment; filename=invoicesIssued.xml");
Context.Response.ContentType = "application/octet-stream";
Context.Response.BufferOutput = false;
Context.Response.Write(result);
Context.Response.Flush();
Context.Response.Close();

All strings are stored internally as Unicode in .NET.
You can convert a string to a byte stream using a code page, as your code does. But you can't change the internal representation of the string: it's Unicode (encoded as UTF-16), period.
You may dump your encoded byte stream to a file or wherever you want. But you can't change the internal encoding of .NET string objects.
Your function should return a byte[] instead of a string (the win1250Bytes array, in fact, rather than win1250Chars).
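A minimal sketch of that suggestion, assuming the goal is to send Windows-1250 bytes to the client. The class name Win1250Export is invented here; the method keeps the name from the question but changes its return type.

```csharp
using System.Text;

// Hypothetical rewrite of the question's helper, returning bytes instead of a string.
public static class Win1250Export
{
    public static byte[] EncodeToWin1250(string text)
    {
        // Code page 1250 is available out of the box on .NET Framework. On
        // .NET Core / .NET 5+ it has to be enabled first with
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Encoding win1250 = Encoding.GetEncoding(1250);
        // GetBytes converts straight from the string's UTF-16 form to the target
        // encoding, so the intermediate Encoding.Convert step is not needed.
        return win1250.GetBytes(text);
    }
}
```

The response code would then write the bytes rather than the string, for example Context.Response.BinaryWrite(Win1250Export.EncodeToWin1250(result)); in place of Context.Response.Write(result);.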

C#/Why does Get html returns random junk characters?

I have this for ex:
Link
This code:
const String nick = "Alex";
const String log = "http://demonscity.combats.com/zayavka.pl?logs=";
foreach (DateTime cd in dateRange)
{
string str = log + String.Format("{0:MM_dd_yy}", cd.Date) + "&filter=" + nick;
String htmlCode = wc.DownloadString(str);
}
returns something...."‹\b\0\0\0\0\0\0я•XYsЫЦ~зЇёѕ™d)bг.тBҐ$ЪRЖ’<2УN&сh#р ’„\f\0J–—_Фџђ§¤нt¦г6ќѕУЄђ0’IQtТґcµо№X(jі-Щ/Ђі|g?`yҐ¶ц"
Other links work fine.
I think the problem is with the code page. How can I fix it? Or is it a server problem?

The issue is that the response is GZip-compressed (response has a Content-Encoding: gzip header). You need to first decompress it, then you'll be able to read it:
public class StackOverflow_6660689
{
public static void Test()
{
WebClient wc = new WebClient();
Encoding encoding = Encoding.GetEncoding("windows-1251");
byte[] data = wc.DownloadData("http://demonscity.combats.com/zayavka.pl?logs=08_07_11&filter=Alex");
GZipStream gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress);
MemoryStream decompressed = new MemoryStream();
gzip.CopyTo(decompressed);
string str = encoding.GetString(decompressed.GetBuffer(), 0, (int)decompressed.Length);
Console.WriteLine(str);
}
}

I think it is returning the result in gzip format, which it should not do unless the client explicitly accepts that format.
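An alternative sketch, assuming the server really does send Content-Encoding: gzip: instead of decompressing by hand, HttpWebRequest can do it through its AutomaticDecompression property. WebClient does not expose that property directly, so the usual approach is a small subclass. The name DecompressingWebClient is invented for this example.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass for illustration only.
public class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Advertise gzip/deflate in Accept-Encoding and unpack the body transparently.
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}
```

With this in place, setting the client's Encoding property to Encoding.GetEncoding("windows-1251") and calling DownloadString on the URL from the question should return readable text, provided the page really is in that code page.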
