FileStream and Encoding

I have a program that saves a text file using the stdio interface. It swaps the 4 most significant bits with the 4 least significant bits of each byte, except for the CR and LF characters.
I'm trying to "decode" this stream with a C# program, but I'm unable to get the original bytes.
StringBuilder sb = new StringBuilder();
StreamReader sr = new StreamReader("XXX.dat", Encoding.ASCII);
string sLine;
while ((sLine = sr.ReadLine()) != null) {
string s = "";
byte[] bytes = Encoding.ASCII.GetBytes(sLine);
for (int i = 0; i < sLine.Length; i++) {
byte c = bytes[i];
byte lb = (byte)((c & 0x0F) << 4), hb = (byte)((c & 0xF0) >> 4);
byte ascii = (byte)((lb) | (hb));
s += Encoding.ASCII.GetString(new byte[] { ascii });
}
sb.AppendLine(s);
}
sr.Close();
return (sb);
I've tried changing the encoding to UTF8, but it didn't work. I've also used a BinaryReader created from the 'sr' StreamReader, but that didn't help either.
StringBuilder sb = new StringBuilder();
StreamReader sr = new StreamReader("XXX.shb", Encoding.ASCII);
BinaryReader br = new BinaryReader(sr.BaseStream);
string sLine;
string s = "";
while (sr.EndOfStream == false) {
byte[] buffer = br.ReadBytes(1);
byte c = buffer[0];
byte lb = (byte)((c & 0x0F) << 4), hb = (byte)((c & 0xF0) >> 4);
byte ascii = (byte)((lb) | (hb));
s += Encoding.ASCII.GetString(new byte[] { ascii });
}
sr.Close();
return (sb);
If the file starts with 0xF2 0xF2 ..., I read everything except the expected value. Where is the error? (i.e.: 0xF6 0xF6).
This C code actually does the job:
...
while (fgets(line, 2048, bfd) != NULL) {
int cLen = strlen(xxx), lLen = strlen(line), i;
// Decode line
for (i = 0; i < lLen-1; i++) {
unsigned char c = (unsigned char)line[i];
line[i] = ((c & 0xF0) >> 4) | ((c & 0x0F) << 4);
}
xxx = realloc(xxx , cLen + lLen + 2);
xxx = strcat(xxx , line);
xxx = strcat(xxx , "\n");
}
fclose(bfd);
What is wrong with the C# code?

Got it.
The problem is the BinaryReader construction:
StreamReader sr = new StreamReader("XXX.shb", Encoding.ASCII);
BinaryReader br = new BinaryReader(sr.BaseStream);
I think this constructs a BinaryReader on top of the StreamReader, which "translates" the characters coming from the file.
This code actually works well:
FileInfo fi = new FileInfo("XXX.shb");
BinaryReader br = new BinaryReader(fi.OpenRead());
I wonder whether it is possible to read this kind of data line by line with a text stream reader, since line endings are preserved during the "encoding" phase.

I guess you should use a BinaryReader and ReadBytes(), and only call Encoding.ASCII.GetString() on the byte sequence after you have swapped the nibbles.
In your example, you read the file as ASCII (that is, the bytes are converted to .NET's internal two-byte UTF-16 chars on read, on the assumption that they are ASCII), and then convert the text BACK to bytes again as ASCII.
That round trip is unnecessary, and it loses every byte above 0x7F.
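A minimal sketch of that approach, assuming the whole file fits in memory and that CR (0x0D) and LF (0x0A) are the only bytes left unswapped, as the question describes. The names NibbleSwapDecoder and Decode are made up for illustration; in a real program the bytes would come from File.ReadAllBytes or from a BinaryReader opened directly on the file, not from the inline array used here.

```csharp
using System;
using System.Text;

static class NibbleSwapDecoder
{
    // Swap the high and low 4 bits of every byte except CR (0x0D) and LF (0x0A).
    public static byte[] Decode(byte[] raw)
    {
        var decoded = new byte[raw.Length];
        for (int i = 0; i < raw.Length; i++)
        {
            byte c = raw[i];
            decoded[i] = (c == 0x0D || c == 0x0A)
                ? c
                : (byte)(((c & 0x0F) << 4) | ((c & 0xF0) >> 4));
        }
        return decoded;
    }

    static void Main()
    {
        // "Hello" with swapped nibbles, followed by an untouched LF.
        // Hypothetical sample data; a real program would read the file bytes instead.
        byte[] encoded = { 0x84, 0x56, 0xC6, 0xC6, 0xF6, 0x0A };

        // Convert to text only after the raw bytes have been decoded.
        Console.Write(Encoding.ASCII.GetString(Decode(encoded)));
    }
}
```

Decoding to text before the swap cannot work for this data: Encoding.ASCII replaces every byte above 0x7F with '?', so the original value is already gone by the time the nibbles are swapped.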

Related

Reading multi language text file in c#

I have to read a text file which can contain characters from the following languages: English, Japanese, Chinese, French, Spanish, German, Italian.
My task is simply to read the data and write it to a new text file (placing a newline char \n after every 100 chars).
I cannot use File.ReadAllText or File.ReadAllLines, as the file size can be more than 500 MB. So I have written the following code:
using (var streamReader = new StreamReader(inputFilePath, Encoding.ASCII))
{
using (var streamWriter = new StreamWriter(outputFilePath,false))
{
char[] bytes = new char[100];
while (streamReader.Read(bytes, 0, 100) > 0)
{
var data = new string(bytes);
streamWriter.WriteLine(data);
}
MessageBox.Show("Compleated");
}
}
Besides ASCII, I have tried UTF-7, UTF-8, UTF-32 and IBM500, but had no luck reading and writing the multi-language characters.
Please help me to achieve this.

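You will have to look at the first 4 bytes of the file you are parsing.
If the file starts with a byte order mark (BOM), these bytes give you a hint about which encoding to use.
Here is a helper method I have written to do the task:
// Requires "using System.Text;" and must live in a static class (it is an extension method).
public static string GetStringFromEncodedBytes(this byte[] bytes) {
    // Fall back to the system default encoding when no BOM is found.
    var encoding = Encoding.Default;
    var skipBytes = 0;
    if (bytes.Length >= 3 && bytes[0] == 0x2b && bytes[1] == 0x2f && bytes[2] == 0x76) {
        encoding = Encoding.UTF7;
        skipBytes = 3;
    }
    else if (bytes.Length >= 3 && bytes[0] == 0xef && bytes[1] == 0xbb && bytes[2] == 0xbf) {
        encoding = Encoding.UTF8;
        skipBytes = 3;
    }
    // UTF-32 LE must be tested before UTF-16 LE: both BOMs start with FF FE.
    else if (bytes.Length >= 4 && bytes[0] == 0xff && bytes[1] == 0xfe && bytes[2] == 0 && bytes[3] == 0) {
        encoding = Encoding.UTF32;
        skipBytes = 4;
    }
    // 00 00 FE FF is the big-endian UTF-32 BOM; Encoding.UTF32 is little-endian.
    else if (bytes.Length >= 4 && bytes[0] == 0 && bytes[1] == 0 && bytes[2] == 0xfe && bytes[3] == 0xff) {
        encoding = new UTF32Encoding(true, true);
        skipBytes = 4;
    }
    else if (bytes.Length >= 2 && bytes[0] == 0xff && bytes[1] == 0xfe) {
        encoding = Encoding.Unicode;
        skipBytes = 2;
    }
    else if (bytes.Length >= 2 && bytes[0] == 0xfe && bytes[1] == 0xff) {
        encoding = Encoding.BigEndianUnicode;
        skipBytes = 2;
    }
    // Decode everything after the BOM.
    return encoding.GetString(bytes, skipBytes, bytes.Length - skipBytes);
}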

This is a good enough start to get to the answer. Read can return fewer than 100 chars; if i is not equal to 100 you need to read more chars to fill the line. French chars like é are no trouble: they are all handled by the C# char type.
char[] soFlow = new char[100];
using (StreamReader sr = new StreamReader("a.txt"))
using (StreamWriter sw = new StreamWriter("b.txt", false))
while (!sr.EndOfStream)
{
    try {
        int i = sr.Read(soFlow, 0, 100);
        // if i < 100, fewer chars were read; read again to fill a full line
        // write only the chars that were actually read
        sw.WriteLine(new string(soFlow, 0, i));
    }
    catch (Exception e) { Console.WriteLine(e.Message); }
}
Spec: Read(Char[], Int32, Int32) Reads a specified maximum of characters from the current stream into a buffer, beginning at the specified index.
Certainly worked for me anyway :)
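The chunked copy described above could also be sketched with ReadBlock, which keeps reading until the buffer is full or the input ends. This is only an illustration: WrapCopier and CopyInChunks are hypothetical names, and the demo uses StringReader and StringWriter so it runs without any files. For real files, the assumption is that the input is UTF-8 (or carries a BOM) and that both the StreamReader and the StreamWriter are opened with a Unicode encoding.

```csharp
using System;
using System.IO;

static class WrapCopier
{
    // Copies text from reader to writer, inserting '\n' after every `width` chars.
    public static void CopyInChunks(TextReader reader, TextWriter writer, int width)
    {
        var buffer = new char[width];
        int read;
        // ReadBlock returns fewer than `width` chars only at the end of the input.
        while ((read = reader.ReadBlock(buffer, 0, width)) > 0)
        {
            writer.Write(buffer, 0, read); // write only the chars actually read
            writer.Write('\n');
        }
    }

    static void Main()
    {
        var output = new StringWriter();
        CopyInChunks(new StringReader("日本語 français español"), output, 5);
        Console.Write(output.ToString());
    }
}
```

One limitation of counting chars: a character outside the Basic Multilingual Plane occupies two chars (a surrogate pair), so a chunk boundary can fall in the middle of such a character.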

How to replace extended ASCII characters in C#?

I am trying to replace non-printable characters, i.e. extended ASCII characters, in a HUGE string.
foreach (string line in File.ReadLines(txtfileName.Text))
{
MessageBox.Show( Regex.Replace(line,
@"\p{Cc}",
a => string.Format("[{0:X2}]", " ")
)); ;
}
This doesn't seem to be working.
Example:
AAÂAA should be converted to AA AA

Assuming the Encoding to be UTF8 try this:
string strReplacedVal = Encoding.ASCII.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.GetEncoding(
Encoding.ASCII.EncodingName,
new EncoderReplacementFallback(" "),
new DecoderExceptionFallback()
),
Encoding.UTF8.GetBytes(line)
)
);

Since you are opening the file as UTF-8, it must be. So, its code units are one byte and UTF-8 has the very nice feature of encoding characters above ␡ with bytes exclusively above 0x7f and characters at or below ␡ with bytes exclusively at or below 0x7f.
For efficiency, you can rewrite the file in place a few KB at a time.
Note that some characters might be replaced by more than one space, though.
// Operates on a UTF-8 encoded text file
using (var stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
{
const int size = 4096;
var buffer = new byte[size];
int count;
while ((count = stream.Read(buffer, 0, size)) > 0)
{
var changed = false;
for (int i = 0; i < count; i++)
{
// obliterate all bytes that are not encoded characters between ␠ and ␡
if (buffer[i] < ' ' | buffer[i] > '\x7f')
{
buffer[i] = (byte)' ';
changed = true;
}
}
if (changed)
{
stream.Seek(-count, SeekOrigin.Current);
stream.Write(buffer, 0, count);
}
}
}

Convert float to its binary representation (using MemoryStream?)

I'd like to convert a given float into its binary representation. I tried to write the float value into a MemoryStream, read this MemoryStream byte by byte and convert the bytes into their binary representation. But every attempt failed.
"Can't read closed stream" (but I only closed the writer)
For test purposes I simply wrote an integer (I think four bytes in size) and the length of the MemoryStream was 0, when I didn't flush the StreamWriter, and 1, when I did.
I'm sure there is a better way to convert floats to binary, but I also wanted to learn a little bit about the MemoryStream class.

You can use BitConverter.GetBytes(float) or use a BinaryWriter wrapping a MemoryStream and use BinaryWriter.Write(float). It's not clear exactly what you did with a MemoryStream before, but you don't want to use StreamWriter - that's for text.
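Both options can be sketched side by side. The class and method names below (FloatBytes, ViaMemoryStream) are invented for the example, and the value -7 is just a convenient test input.

```csharp
using System;
using System.IO;

static class FloatBytes
{
    // Writes the float in binary form into a MemoryStream and returns the raw bytes.
    public static byte[] ViaMemoryStream(float value)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(value); // 4 bytes, always little-endian
            writer.Flush();
            return ms.ToArray();
        }
    }

    static void Main()
    {
        // BitConverter uses the byte order of the machine it runs on.
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(-7f)));

        // -7f is 0xC0E00000, so the little-endian bytes are 00-00-E0-C0.
        Console.WriteLine(BitConverter.ToString(ViaMemoryStream(-7f)));
    }
}
```

On a little-endian machine the two lines print the same bytes; BitConverter.IsLittleEndian tells you which case applies.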

Using BitConverter, not MemoryStream:
// -7 produces "1 10000001 11000000000000000000000"
static string FloatToBinary(float f)
{
StringBuilder sb = new StringBuilder();
Byte[] ba = BitConverter.GetBytes(f);
foreach (Byte b in ba)
for (int i = 0; i < 8; i++)
{
sb.Insert(0,((b>>i) & 1) == 1 ? "1" : "0");
}
string s = sb.ToString();
string r = s.Substring(0, 1) + " " + s.Substring(1, 8) + " " + s.Substring(9); //sign exponent mantissa
return r;
}

BitConverter.GetBytes(3.141f)
.Reverse()
.Select(x => Convert.ToString(x, 2))
.Select(x => x.PadLeft(8, '0'))
.Aggregate("0b", (a, b) => a + "_" + b);
// res = "0b_01000000_01001001_00000110_00100101"
Couldn't resist using a "small" LINQ query.
Works with double too.

You might have run into a pitfall when using StreamWriter, as the following code shows:
// Write the float
var f = 1.23456f;
var ms = new MemoryStream();
var writer = new StreamWriter(ms);
writer.Write(f);
writer.Flush();
// Read 4 bytes to get the raw bytes (Ouch!)
ms.Seek(0, SeekOrigin.Begin);
var buffer = new char[4];
var reader = new StreamReader(ms);
reader.Read(buffer, 0, 4);
for (int i = 0; i < 4; i++)
{
Console.Write("{0:X2}", (int)buffer[i]);
}
Console.WriteLine();
// This is what you actually read: human readable text
for (int i = 0; i < buffer.Length; i++)
{
Console.Write(buffer[i]);
}
Console.WriteLine();
// This is what the float really looks like in memory.
var bytes = BitConverter.GetBytes(f);
for (int i = 0; i < bytes.Length; i++)
{
Console.Write("{0:X2}", (int)bytes[i]);
}
Console.ReadLine();
If you expect only 4 bytes to be in the stream and read those 4 bytes, everything looks fine at first sight. But actually the length is 7 and you have read only the first 4 bytes of the text representation of the float.
Comparing that to the output of the BitConverter reveals that using StreamWriter is not the correct thing here.

To answer your first question: in .NET, when you close/dispose a reader/writer, the underlying stream is also closed/disposed by default.
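If the stream has to stay usable after the writer is gone, the StreamWriter constructor that takes a leaveOpen flag (available since .NET Framework 4.5) avoids the "Can't read closed stream" error. The sketch below uses invented names (LeaveOpenDemo, WriteTextAndKeepStream) and an arbitrary buffer size of 1024.

```csharp
using System;
using System.IO;
using System.Text;

static class LeaveOpenDemo
{
    // Writes text through a StreamWriter, disposes the writer, and keeps ms open.
    public static long WriteTextAndKeepStream(MemoryStream ms, string text)
    {
        // UTF8Encoding(false) means no BOM is written; leaveOpen keeps ms alive.
        using (var writer = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        } // Dispose flushes the writer but does not close ms
        return ms.Length;
    }

    static void Main()
    {
        var ms = new MemoryStream();
        long length = WriteTextAndKeepStream(ms, "1.23456");
        Console.WriteLine(length);     // number of bytes of text written
        Console.WriteLine(ms.CanRead); // the stream is still open
        ms.Dispose();
    }
}
```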

Converting from C++ ifstream to C# FileStream

I am attempting to learn SharpDX via DirectX tutorials. I have this line of code in the C++ project I am working from:
std::ifstream fin("Models/skull.txt");
if(!fin)
{
MessageBox(0, L"Models/skull.txt not found.", 0, 0);
return;
}
UINT vcount = 0;
UINT tcount = 0;
std::string ignore;
fin >> ignore >> vcount;
fin >> ignore >> tcount;
fin >> ignore >> ignore >> ignore >> ignore;
float nx, ny, nz;
XMFLOAT4 black(0.0f, 0.0f, 0.0f, 1.0f);
std::vector<Vertex> vertices(vcount);
for(UINT i = 0; i < vcount; ++i)
{
fin >> vertices[i].Pos.x >> vertices[i].Pos.y >> vertices[i].Pos.z;
vertices[i].Color = black;
// Normal not used in this demo.
fin >> nx >> ny >> nz;
}
fin >> ignore;
fin >> ignore;
fin >> ignore;
mSkullIndexCount = 3*tcount;
std::vector<UINT> indices(mSkullIndexCount);
for(UINT i = 0; i < tcount; ++i)
{
fin >> indices[i*3+0] >> indices[i*3+1] >> indices[i*3+2];
}
fin.close();
I would like to know how to convert this to C#. I am 99% sure I need to use System.IO.FileStream, but I am unsure how all the C++ stuff works. What is really confusing me is the line fin >> ignore >> vcount; If someone can explain how to do the same thing in C#, I can probably figure it out from there.
As requested the text file resembles this:
VertexCount: 31076
TriangleCount: 60339
VertexList (pos, normal)
{
0.592978 1.92413 -2.62486 0.572276 0.816877 0.0721907
0.571224 1.94331 -2.66948 0.572276 0.816877 0.0721907
0.609047 1.90942 -2.58578 0.572276 0.816877 0.0721907
…
}
TriangleList
{
0 1 2
3 4 5
6 7 8
…
}

ignore is declared as a std::string. It appears that the original author of the code you are looking at was not aware of the std::istream::ignore function and uses a local variable to read in elements of the file that are simply discarded (that is, elements the author doesn't care about). So lines like:
fin >> ignore >> vcount;
read a string element (basically everything up to the next whitespace) and dump it into the local string that is ignored, and then read the vcount value (which is stored as an unsigned int).
If you are going to port this to C#, you could do the same thing (read in parts of the file and simply discard them) and it would be a fairly direct port.
As an example (not tested):
using (FileStream file = new FileStream("File.txt", FileMode.Open))
using (StreamReader reader = new StreamReader(file))
{
// with your sample, this will read "VertexCount: 31076"
string line = reader.ReadLine();
string sVCount = line.Substring(line.IndexOf(": ") + 2);
uint vcount = uint.Parse(sVCount);
// ... rest of your code
}
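For a port that stays closer to the token-by-token style of `fin >> ignore >> vcount`, one option is to split the input on whitespace and pull tokens one at a time. This is a sketch under assumptions, not part of the answer above: Tokens and Next are hypothetical helpers, and the sample string is a shortened stand-in for the file in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class TokenReaderDemo
{
    // Yields whitespace-separated tokens, similar to repeated `fin >> token` in C++.
    public static IEnumerable<string> Tokens(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A null separator array means "split on any whitespace".
            foreach (string token in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                yield return token;
            }
        }
    }

    // Returns the next token or throws if the input is exhausted.
    public static string Next(IEnumerator<string> tokens)
    {
        if (!tokens.MoveNext()) throw new EndOfStreamException("Unexpected end of input.");
        return tokens.Current;
    }

    static void Main()
    {
        string sample = "VertexCount: 1\nTriangleCount: 1\nVertexList (pos, normal)\n{\n" +
                        "0.592978 1.92413 -2.62486 0.572276 0.816877 0.0721907\n}\n";
        using (var reader = new StringReader(sample))
        using (IEnumerator<string> t = Tokens(reader).GetEnumerator())
        {
            Next(t);                                // fin >> ignore   ("VertexCount:")
            uint vcount = uint.Parse(Next(t));      // fin >> vcount
            Next(t);                                // "TriangleCount:"
            uint tcount = uint.Parse(Next(t));
            for (int i = 0; i < 4; i++) Next(t);    // "VertexList", "(pos,", "normal)", "{"
            // InvariantCulture makes '.' the decimal separator on every machine.
            float x = float.Parse(Next(t), CultureInfo.InvariantCulture);
            Console.WriteLine("{0} {1} {2}", vcount, tcount, x.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

With a real file, a StreamReader would take the place of the StringReader. Parsing with CultureInfo.InvariantCulture matters because float.Parse otherwise follows the current culture, which may expect a comma as the decimal separator.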

Thanks to Zac Howland's answer I was able to get everything working. For anyone else trying to convert Frank Luna's book from DirectX to SharpDX I hope this helps. Here is what I ended up doing:
private void _buildGeometryBuffers()
{
System.IO.FileStream fs = new System.IO.FileStream(@"Chapter6/Content/skull.txt", System.IO.FileMode.Open);
int vcount = 0;
int tcount = 0;
//string ignore = string.Empty; // this is not needed for my C# version
using (System.IO.StreamReader reader = new System.IO.StreamReader(fs))
{
// Get the vertex count
string currentLine = reader.ReadLine();
string extractedLine = currentLine.Substring(currentLine.IndexOf(" ") + 1);
vcount = int.Parse(extractedLine);
// Get the triangle count
currentLine = reader.ReadLine();
extractedLine = currentLine.Substring(currentLine.IndexOf(" ") + 1);
tcount = int.Parse(extractedLine);
// Create vertex buffer
// Skip over the first 2 lines (these are not the lines we are looking for)
currentLine = reader.ReadLine();
currentLine = reader.ReadLine();
string[] positions = new string[6];
List<VertexPosCol> vertices = new List<VertexPosCol>(vcount);
for (int i = 0; i < vcount; ++i)
{
currentLine = reader.ReadLine();
extractedLine = currentLine.Substring(currentLine.IndexOf("\t") + 1);
positions = extractedLine.Split(' ');
// We only use the first 3, the last 3 are normals which are not used.
vertices.Add(new VertexPosCol(
new Vector3(float.Parse(positions[0]), float.Parse(positions[1]), float.Parse(positions[2])),
Color.Black)
);
}
BufferDescription vbd = new BufferDescription();
vbd.Usage = ResourceUsage.Immutable;
vbd.SizeInBytes = Utilities.SizeOf<VertexPosCol>() * vcount;
vbd.BindFlags = BindFlags.VertexBuffer;
vbd.StructureByteStride = 0;
_vBuffer = Buffer.Create(d3dDevice, vertices.ToArray(), vbd);
// Create the index buffer
// Skip over the next 3 lines (these are not the lines we are looking for)
currentLine = reader.ReadLine();
currentLine = reader.ReadLine();
currentLine = reader.ReadLine();
string[] indexes = new string[6];
_meshIndexCount = 3 * tcount;
List<int> indices = new List<int>(_meshIndexCount);
for (int i = 0; i < tcount; ++i)
{
currentLine = reader.ReadLine();
extractedLine = currentLine.Substring(currentLine.IndexOf("\t") + 1);
indexes = extractedLine.Split(' ');
indices.Add(int.Parse(indexes[0]));
indices.Add(int.Parse(indexes[1]));
indices.Add(int.Parse(indexes[2]));
}
BufferDescription ibd = new BufferDescription();
ibd.Usage = ResourceUsage.Immutable;
ibd.SizeInBytes = Utilities.SizeOf<int>() * _meshIndexCount;
ibd.BindFlags = BindFlags.IndexBuffer;
_iBuffer = Buffer.Create(d3dDevice, indices.ToArray(), ibd);
}
fs.Close();
}
As always, if someone sees a problem with the code or a better way of doing something, I am open to ideas.

How to read Id3v2 tag

static void Main(string[] args)
{
FileStream fs = File.Open(@"C:\Skrillex - Rock n' Roll (Will Take You to the Mountain).mp3", FileMode.Open);
BinaryReader br = new BinaryReader(fs);
byte[] tag = new byte[3];
byte[] version = new byte[2];
byte[] flags = new byte[1];
byte[] size = new byte[4];
byte[] frameId = new byte[4];
byte[] frameSize = new byte[4];
byte[] frameFlags = new byte[2];
br.Read(tag, 0, tag.Length);
br.Read(version, 0, version.Length);
br.Read(flags, 0, flags.Length);
br.Read(size, 0, size.Length);
br.Read(frameId, 0, frameId.Length);
br.Read(frameSize, 0, frameSize.Length);
br.Read(frameFlags, 0, frameFlags.Length);
ulong iSize = (ulong)frameSize[0] << 21 | (ulong)frameSize[1] << 14 | (ulong)frameSize[2] << 7 | (ulong)frameSize[3];
Console.WriteLine("Frame Data Size : " + iSize.ToString());
byte[] body = new byte[iSize];
br.Read(body, 0, body.Length);
Console.WriteLine(BitConverter.ToString(body));
Console.WriteLine(ConvertHexToString(BitConverter.ToString(body)));
br.Close();
}
public string ConvertHexToString(string HexValue)
{
string StrValue = "";
HexValue = HexValue.Replace("-", "");
while (HexValue.Length > 0)
{
StrValue += Convert.ToChar(Convert.ToUInt32(HexValue.Substring(0, 2), 16)).ToString();
HexValue = HexValue.Substring(2, HexValue.Length - 2);
}
return StrValue;
}
I am writing code to read ID3v2.3 tags without an external library or Shell32.
The code above is my attempt, but it does not seem to work properly.
The following is the result when I run the code:
Frame Data Size : 91
01-FF-FE-52-00-6F-00-63-00-6B-00-20-00-6E-00-27-00-20-00-52-00-6F-00-6C-00-6C-00-20-00-28-00-57-00-69-00-6C-00-6C-00-20-00-54-00-61-00-6B-00-65-00-20-00-59-00-6F-00-75-00-20-00-74-00-6F-00-20-00-74-00-68-00-65-00-20-00-4D-00-6F-00-75-00-6E-00-74-00-61-00-69-00-6E-00-29-00
ÿþR
It does not return the song title "Rock n' Roll (Will Take You to the Mountain)" that is stored in the tag.
What is the problem?

The 01 at the start indicates that it is encoded as UTF-16 (2 bytes per character). The next two bytes, FF FE, are the byte order mark so you can tell whether to interpret the byte pairs as most significant first or least significant first. After that you have the actual text data.
0052 - R
006F - o
0063 - c
006B - k
etc.
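Put into code, the decoding described above might look like the sketch below. It assumes an ID3v2.3 text frame body, where the first byte is 0x00 for ISO-8859-1 or 0x01 for UTF-16 with a BOM; Id3Text and DecodeTextFrame are names invented for this example, and the sample bytes are the first few bytes of the dump in the question.

```csharp
using System;
using System.Text;

static class Id3Text
{
    // Decodes the body of an ID3v2.3 text frame (encoding byte + text).
    public static string DecodeTextFrame(byte[] body)
    {
        if (body.Length == 0) return string.Empty;

        if (body[0] == 0x00)
        {
            // ISO-8859-1: one byte per character, no BOM.
            return Encoding.GetEncoding("ISO-8859-1")
                           .GetString(body, 1, body.Length - 1).TrimEnd('\0');
        }

        if (body[0] == 0x01 && body.Length >= 3)
        {
            // FE FF means big-endian; FF FE means little-endian.
            bool bigEndian = body[1] == 0xFE && body[2] == 0xFF;
            Encoding utf16 = bigEndian ? Encoding.BigEndianUnicode : Encoding.Unicode;
            // Skip the encoding byte and the two BOM bytes.
            return utf16.GetString(body, 3, body.Length - 3).TrimEnd('\0');
        }

        throw new NotSupportedException("Unknown text encoding byte: " + body[0]);
    }

    static void Main()
    {
        byte[] body = { 0x01, 0xFF, 0xFE, 0x52, 0x00, 0x6F, 0x00, 0x63, 0x00, 0x6B, 0x00 };
        Console.WriteLine(DecodeTextFrame(body)); // prints "Rock"
    }
}
```

The question's ConvertHexToString turns every single byte into one char, which is why the output stops making sense after the BOM: each 00 byte becomes a NUL character.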
