I have code that relies heavily on byte arrays for its speed while it is not writing to a file.
Somewhere in either the read function or the datapoint-conversion function I am getting several unwanted "space" characters in the ASCII string converted from the byte array, even though the array was zero-filled first. This generates a lot of undesired whitespace. Here are the relevant parts of the current code:
//Within Read Function
var charBuf = Enumerable.Repeat<byte>(0, 1024).ToArray(); // Zero-fill the buffer (equivalent to new byte[1024])
int ret = Read(ConnectionID, charBuf, 1024); //Call to a custom dll to retrieve data
if (0 <= ret)
{
return charBuf;
}
//The datapoint's Message starts as an empty byte array that gets added to the Datapoint list
//The following converts the datapoint to a string depending on its input
var message = Encoding.ASCII.GetString(dataPoint.Message);
if (String.IsNullOrEmpty(message))
{
message = "ReadError";
}
Is there any way to eliminate these stray characters without too much extra code, or is there an error in my conversion? Either fix would be appreciated.
To avoid modifying the read function much beyond changing the datatype (so as not to hurt the code's speed), I decided to take care of the empty-space issue once the read function was no longer in use and all the data was being written to a file:
var message = Encoding.Default.GetString(dataPoint.Message);
int messageSize = dataPoint.Message.Length; // if no terminator is found, keep the whole message
const byte nullByte = 0x00;
for (int k = 0; k < dataPoint.Message.Length; k++)
{
    if (dataPoint.Message[k] == nullByte)
    {
        messageSize = k; // cut before the first NUL, not after it
        break;
    }
}
message = message.Substring(0, messageSize);
message is then appended to a text file, one message per line.
This ensures that although the buffer is defined as 1024 zeroed bytes, only the data actually received by the read function is accounted for (the data received contains no padding).
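For comparison, a one-line alternative that achieves the same trimming, assuming the payload itself contains no embedded NULs ahead of the padding:
var message = Encoding.ASCII.GetString(dataPoint.Message).TrimEnd('\0'); // drop trailing NUL padding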
I am receiving data from a CNC machine every 5 seconds. The data is 66 bytes long, and every two bytes have a special meaning according to the guide that I have. The device sends the data over a socket to a specific IP and port. I have been told that I should read the data as hex instead of ASCII.
This line of code:
string text = Encoding.ASCII.GetString(data.buffer, 0, 66);
returns this:
"\0\u0004\0\u0001\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\r\0\0\0\0\0\0:a\u0002#\0?\0`\u001b?\u0015U\0\0\0\0\u0001\u0010\0\u0018\0\0\u000f\a\0\0\0\0\0\0\0\0\0\0\0\0\0\0u/"
and of course it is not useful to me.
I did try to convert the byte array to a hex string with this code:
StringBuilder sb = new StringBuilder();
foreach (byte b in buffer)
sb.Append(b.ToString("X2"));
string hexString = sb.ToString();
And got this result:
00040001000000000000000000020000000000000000000000003A9D023F00A000601B841555000000000110001800000F070000000000000000000000000000752F
And when I try to convert this result back to a string: no success, nothing meaningful.
GOAL
What I am trying to achieve is to read the incoming socket data as hex and use every two bytes as a word to match a value. For example, the first two bytes should match either 0 or 1. With what I have, it returns ? (a question mark).
Thank you.
I have been told that I should read the data as hex instead of ascii
My gut feeling is this statement has been misquoted or misunderstood. There is no value in processing binary data as a hex string representation, just as there is no value in converting it to ASCII... The only sane way to process binary data is in binary, unless you have a meaningful way to convert it.
You mention you need word (2-byte) groupings; you could just convert this to an array of short or ushort, depending on your needs:
var bytes = new byte[66];
var shortArray = new short[bytes.Length / 2];
Buffer.BlockCopy(bytes, 0, shortArray, 0, bytes.Length);
or
for (int i = 0; i < shortArray.Length; i++)
    shortArray[i] = BitConverter.ToInt16(bytes, i * 2);
Disclaimer: this is just an example. Be very careful of the endianness of your data; there are other ways to do this.
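For instance, if the device sends big-endian (network byte order) words, a minimal sketch that works regardless of host endianness (assuming the 66-byte buffer from the question) could be:
// Assumes big-endian words from the device; verify against the device guide
var words = new ushort[bytes.Length / 2];
for (int i = 0; i < words.Length; i++)
{
    // Combine high and low bytes explicitly, independent of host endianness
    words[i] = (ushort)((bytes[i * 2] << 8) | bytes[i * 2 + 1]);
}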
The task is to take a picture, read all its bytes, and then write an additional 15 zero bytes after each non-zero byte from the original file. Example: it was B1, B2, ..., Bn and afterwards it must be B1, 0, 0, ..., 0, B2, 0, 0, ..., Bn, 0, 0, ..., 0. Then I need to save/replace the new picture.
In general I assume I can use something like ReadAllBytes to create an array of bytes, then create a new byte[] array and take one byte from the file, write 15 zero bytes, take the second byte, etc. But how can I be sure that it is working correctly? I'm not familiar with working with bytes, and if I try to print the bytes I've read from the file, it shows random symbols that don't make any sense, which leaves the question: am I doing it right? If possible, please direct me to the right approach and the functions I need to achieve it. Thanks in advance!
See How to convert image to byte array for how to read the image.
It seems that you'd like to be able to visually see the data. For debugging purposes, you can show each byte as a hex string which will allow you to "see" the hex values of each element of your array.
public string GetBytesAsHexString(byte[] bArr)
{
StringBuilder sb = new StringBuilder();
if (bArr != null && bArr.Length > 0)
{
for (int i = 0; i < bArr.Length; i++)
{
sb.AppendFormat("{0}{1}", bArr[i].ToString("X"), System.Environment.NewLine);
//sb.AppendFormat("{0}{1}", bArr[i].ToString("X2"), System.Environment.NewLine);
//sb.AppendFormat("{0}{1}", bArr[i].ToString("X4"), System.Environment.NewLine);
}
}
return sb.ToString();
}
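As for the padding task itself, a minimal sketch of the expansion (treating every byte the same way, as the B1,0,...,B2,0,... example suggests; file names are hypothetical):
byte[] original = File.ReadAllBytes(@"C:\input.jpg");
byte[] expanded = new byte[original.Length * 16]; // 1 data byte + 15 zeros each
for (int i = 0; i < original.Length; i++)
{
    expanded[i * 16] = original[i]; // the gaps stay zero; new arrays are zero-filled in C#
}
File.WriteAllBytes(@"C:\output.jpg", expanded);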
Can you use StreamReader to read a normal text file and then, in the middle of reading, close the StreamReader after saving the current position, and then open the StreamReader again and start reading from that position?
If not, what else can I use to accomplish the same thing without locking the file?
I tried this but it doesn't work:
var fs = File.Open(@"C:\testfile.txt", FileMode.Open, FileAccess.Read);
var sr = new StreamReader(fs);
Debug.WriteLine(sr.ReadLine()); //Prints:firstline
var pos = fs.Position;
while (!sr.EndOfStream)
{
Debug.WriteLine(sr.ReadLine());
}
fs.Seek(pos, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());
//Prints Nothing, i expect it to print SecondLine.
Here is the other code I also tried:
long position = -1;
StreamReaderSE sr = new StreamReaderSE(@"c:\testfile.txt");
Debug.WriteLine(sr.ReadLine());
position = sr.BytesRead;
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine("Wait");
sr.BaseStream.Seek(position, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());
I realize this is really belated, but I just stumbled onto this incredible flaw in StreamReader myself; the fact that you can't reliably seek when using StreamReader. Personally, my specific need is to have the ability to read characters, but then "back up" if a certain condition is met; it's a side effect of one of the file formats I'm parsing.
Using ReadLine() isn't an option because it's only useful in really trivial parsing jobs. I have to support configurable record/line delimiter sequences and support escape delimiter sequences. Also, I don't want to implement my own buffer so I can support "backing up" and escape sequences; that should be the StreamReader's job.
This method calculates the actual position in the underlying stream of bytes on demand. It works for UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, and any single-byte encoding (e.g. code pages 1252, 437, 28591, etc.), regardless of the presence of a preamble/BOM. This version will not work for UTF-7, Shift-JIS, or other variable-byte encodings.
When I need to seek to an arbitrary position in the underlying stream, I directly set BaseStream.Position and then call DiscardBufferedData() to get StreamReader back in sync for the next Read()/Peek() call.
And a friendly reminder: don't arbitrarily set BaseStream.Position. If you bisect a character, you'll invalidate the next Read() and, for UTF-16/-32, you'll also invalidate the result of this method.
public static long GetActualPosition(StreamReader reader)
{
System.Reflection.BindingFlags flags = System.Reflection.BindingFlags.DeclaredOnly | System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.GetField;
// The current buffer of decoded characters
char[] charBuffer = (char[])reader.GetType().InvokeMember("charBuffer", flags, null, reader, null);
// The index of the next char to be read from charBuffer
int charPos = (int)reader.GetType().InvokeMember("charPos", flags, null, reader, null);
// The number of decoded chars presently used in charBuffer
int charLen = (int)reader.GetType().InvokeMember("charLen", flags, null, reader, null);
// The current buffer of read bytes (byteBuffer.Length = 1024; this is critical).
byte[] byteBuffer = (byte[])reader.GetType().InvokeMember("byteBuffer", flags, null, reader, null);
// The number of bytes read while advancing reader.BaseStream.Position to (re)fill charBuffer
int byteLen = (int)reader.GetType().InvokeMember("byteLen", flags, null, reader, null);
// The number of bytes the remaining chars use in the original encoding.
int numBytesLeft = reader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);
// For variable-byte encodings, deal with partial chars at the end of the buffer
int numFragments = 0;
if (byteLen > 0 && !reader.CurrentEncoding.IsSingleByte)
{
if (reader.CurrentEncoding.CodePage == 65001) // UTF-8
{
byte byteCountMask = 0;
while ((byteBuffer[byteLen - numFragments - 1] >> 6) == 2) // if the byte is "10xx xxxx", it's a continuation-byte
byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
if ((byteBuffer[byteLen - numFragments - 1] >> 6) == 3) // if the byte is "11xx xxxx", it starts a multi-byte char.
byteCountMask |= (byte)(1 << ++numFragments); // count bytes & build the "complete char" mask
// see if we found as many bytes as the leading-byte says to expect
if (numFragments > 1 && ((byteBuffer[byteLen - numFragments] >> (7 - numFragments)) == byteCountMask))
numFragments = 0; // no partial-char in the byte-buffer to account for
}
else if (reader.CurrentEncoding.CodePage == 1200) // UTF-16LE
{
if (byteBuffer[byteLen - 1] >= 0xd8) // high-surrogate
numFragments = 2; // account for the partial character
}
else if (reader.CurrentEncoding.CodePage == 1201) // UTF-16BE
{
if (byteBuffer[byteLen - 2] >= 0xd8) // high-surrogate
numFragments = 2; // account for the partial character
}
}
return reader.BaseStream.Position - numBytesLeft - numFragments;
}
Of course, this uses Reflection to get at private variables, so there is risk involved. However, this method works with .Net 2.0, 3.0, 3.5, 4.0, 4.0.3, 4.5, 4.5.1, 4.5.2, 4.6, and 4.6.1. Beyond that risk, the only other critical assumption is that the underlying byte-buffer is a byte[1024]; if Microsoft changes it the wrong way, the method breaks for UTF-16/-32.
This has been tested against a UTF-8 file filled with Ažテ𣘺 (10 bytes: 0x41 C5 BE E3 83 86 F0 A3 98 BA) and a UTF-16 file filled with A𐐷 (6 bytes: 0x41 00 01 D8 37 DC). The point being to force-fragment characters along the byte[1024] boundaries, all the different ways they could be.
UPDATE (2013-07-03): I fixed the method, which originally used the broken code from that other answer. This version has been tested against data containing characters requiring the use of surrogate pairs. The data was put into 3 files, each with a different encoding: one UTF-8, one UTF-16LE, and one UTF-16BE.
UPDATE (2016-02): The only correct way to handle bisected characters is to directly interpret the underlying bytes. UTF-8 is properly handled, and UTF-16/-32 work (given the length of byteBuffer).
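A usage sketch, following the seek-and-resync workflow described above:
long pos = GetActualPosition(reader); // remember where the reader really is
// ... read ahead ...
reader.BaseStream.Position = pos;     // jump back
reader.DiscardBufferedData();         // resync for the next Read()/Peek()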
Yes you can, see this:
var sr = new StreamReader("test.txt");
sr.BaseStream.Seek(2, SeekOrigin.Begin); // Check sr.BaseStream.CanSeek first
Update:
Be aware that you can't necessarily use sr.BaseStream.Position for anything useful, because StreamReader uses buffers, so it will not reflect what you have actually read. I guess you're going to have problems finding the true position, because you can't just count characters (different encodings have different character lengths). I think the best way is to work with FileStreams themselves.
Update:
Use the TGREER.myStreamReader from here:
http://www.daniweb.com/software-development/csharp/threads/35078
This class adds BytesRead, etc. (it works with ReadLine() but apparently not with other read methods). Then you can do this:
File.WriteAllText("test.txt", "1234\n56789");
long position = -1;
using (var sr = new myStreamReader("test.txt"))
{
Console.WriteLine(sr.ReadLine());
position = sr.BytesRead;
}
Console.WriteLine("Wait");
using (var sr = new myStreamReader("test.txt"))
{
sr.BaseStream.Seek(position, SeekOrigin.Begin);
Console.WriteLine(sr.ReadToEnd());
}
If you just want to find a start position within a text stream, I added this extension to StreamReader so that I could determine where the edit of the stream should occur. Granted, the logic increments by characters, but for my purposes it works great for getting the position within a text/ASCII-based file based on a string pattern. You can then use that location as a start point for reading, to write a new file that excludes the data prior to the start point.
The position returned can be provided to Seek to start from that position within text-based stream reads. It works; I've tested it. However, there may be issues when matching non-ASCII Unicode chars in the matching algorithm. This was based on American English and the associated code page.
Basics: it scans through the text stream character by character, looking forward only for the sequential string pattern (matching the string parameter). Once the pattern stops matching the string parameter (going forward, char by char), it starts over from the current position, trying to get a match char by char. It eventually quits if the match can't be found in the stream. If the match is found, it returns the current "character" position within the stream, not StreamReader.BaseStream.Position, as that position is ahead due to the buffering the StreamReader does.
As indicated in the comments, this method WILL affect the position of the StreamReader, and it will be set back to the beginning (0) at the end of the method. StreamReader.BaseStream.Seek should be used to run to the position returned by this extension.
Note: the position returned by this extension will also work with BinaryReader.Seek as a start position when working with text files. I actually used this logic for that purpose, to rewrite a PostScript file back to disk after discarding the PJL header information, making the file a "proper" readable PostScript file that could be consumed by GhostScript. :)
The string to search for within the PostScript (after the PJL header) is: "%!PS-", which is followed by "Adobe" and the version.
public static class StreamReaderExtension
{
/// <summary>
/// Searches from the beginning of the stream for the indicated
/// <paramref name="pattern"/>. Once found, returns the position within the stream
/// that the pattern begins at.
/// </summary>
/// <param name="pattern">The <c>string</c> pattern to search for in the stream.</param>
/// <returns>If <paramref name="pattern"/> is found in the stream, then the start position
/// within the stream of the pattern; otherwise, -1.</returns>
/// <remarks>Please note: this method will change the current stream position of this instance of
/// <see cref="System.IO.StreamReader"/>. When it completes, the position of the reader will
/// be set to 0.</remarks>
public static long FindSeekPosition(this StreamReader reader, string pattern)
{
if (!string.IsNullOrEmpty(pattern) && reader.BaseStream.CanSeek)
{
try
{
reader.BaseStream.Position = 0;
reader.DiscardBufferedData();
StringBuilder buff = new StringBuilder();
long start = 0;
long charCount = 0;
List<char> matches = new List<char>(pattern.ToCharArray());
bool startFound = false;
while (!reader.EndOfStream)
{
char chr = (char)reader.Read();
if (chr == matches[0] && !startFound)
{
startFound = true;
start = charCount;
}
if (startFound && matches.Contains(chr))
{
buff.Append(chr);
if (buff.Length == pattern.Length
&& buff.ToString() == pattern)
{
return start;
}
bool reset = false;
if (buff.Length > pattern.Length)
{
reset = true;
}
else
{
string subStr = pattern.Substring(0, buff.Length);
if (buff.ToString() != subStr)
{
reset = true;
}
}
if (reset)
{
buff.Length = 0;
startFound = false;
start = 0;
}
}
charCount++;
}
}
finally
{
reader.BaseStream.Position = 0;
reader.DiscardBufferedData();
}
}
return -1;
}
}
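A usage sketch for the PostScript scenario above (file name is hypothetical; this assumes a single-byte encoding, so the character offset equals the byte offset):
using (var reader = new StreamReader(@"C:\job.prn"))
{
    long pos = reader.FindSeekPosition("%!PS-");
    if (pos >= 0)
    {
        reader.BaseStream.Seek(pos, SeekOrigin.Begin);
        reader.DiscardBufferedData();
        string postScript = reader.ReadToEnd(); // everything from "%!PS-" onward
    }
}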
FileStream.Position (or equivalently, StreamReader.BaseStream.Position) will usually be ahead -- possibly way ahead -- of the TextReader position because of the underlying buffering taking place.
If you can determine how newlines are handled in your text files, you can add up the number of bytes read based on line lengths and end-of-line characters.
File.WriteAllText("test.txt", "1234" + System.Environment.NewLine + "56789");
long position = -1;
long bytesRead = 0;
int newLineBytes = System.Environment.NewLine.Length;
using (var sr = new StreamReader("test.txt"))
{
string line = sr.ReadLine();
bytesRead += line.Length + newLineBytes;
Console.WriteLine(line);
position = bytesRead;
}
Console.WriteLine("Wait");
using (var sr = new StreamReader("test.txt"))
{
sr.BaseStream.Seek(position, SeekOrigin.Begin);
Console.WriteLine(sr.ReadToEnd());
}
For more complex text file encodings you might need to get fancier than this, but it worked for me.
I found that the suggestions above did not work for me; my use case was simply to back up one read position (I'm reading one char at a time with the default encoding). My simple solution was inspired by the commentary above... your mileage may vary.
I saved the BaseStream.Position before reading, then determined if I needed to back up... if yes, then set position and invoke DiscardBufferedData().
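In code, the pattern looks roughly like this (a sketch; MustBackUp is a hypothetical predicate, and it assumes the reader's buffer is empty when the position is saved, e.g. immediately after a DiscardBufferedData() call, since BaseStream.Position otherwise runs ahead):
long savedPos = reader.BaseStream.Position;
int c = reader.Read();
if (MustBackUp(c)) // hypothetical condition from the file format being parsed
{
    reader.BaseStream.Position = savedPos;
    reader.DiscardBufferedData(); // resync before the next Read()
}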
From MSDN:
StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader for reading lines of information from a standard text file.
In most of the examples involving StreamReader, you will see reading line by line using ReadLine(). The Seek method comes from the Stream class, which is used to read or handle data in bytes.
I'm trying to read a file and extract two blocks of data, call them block1 and block2, from a file that contains many blocks of data. Both blocks need to be returned in a byte array. Block1 begins at the place in the file where a line starts with "block1:", followed by the number of bytes to read. Block2, which does not necessarily appear after block1, begins where a line starts with "block2:", followed by the number of bytes to read. I am limited to .NET 3.5 at the highest.
You can use File.ReadAllBytes and extract your blocks from the returned byte[] using one of the Array.Copy overloads if you know the indexes they are in.
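For example, a sketch assuming the block's offset and length are already known:
byte[] allBytes = File.ReadAllBytes(fileName);
byte[] block = new byte[blockLength];                    // blockLength known in advance
Array.Copy(allBytes, blockStart, block, 0, blockLength); // blockStart known in advance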
As others have mentioned, without header information you'll need to, at the very least, stream the contents of the file through a filter of some kind looking for your "block" markers.
If you do have header information (or at least some information somewhere as to the offset of your block markers), you could use a memory mapped file:
http://www.developer.com/net/article.php/3828586/Using-Memory-Mapped-Files-in-NET-40.htm
This requires .NET 4.0, although you could also use the Win32 API if you're not using .NET 4.
Without any sort of header information in your file, you'll have to scan the entire file, searching for your block1: or block2: markers.
Update:
Here's a sample of how you'd do this (not necessarily the best implementation):
byte[] GetBlockOfData(string fileName, string blockName)
{
var allBytes = File.ReadAllBytes(fileName);
// Assuming block names are ASCII-encoded
var blockMarker = Encoding.ASCII.GetBytes(blockName + ":");
// Scan for the marker, leaving room for the marker itself
for (var i = 0; i < allBytes.Length - blockMarker.Length; i++)
{
if (allBytes[i] == blockMarker[0])
{
// See if this is the entire marker
var isMatch = true;
for (var j = 0; j < blockMarker.Length; j++)
{
if (allBytes[i + j] != blockMarker[j])
{
isMatch = false;
break;
}
}
if (isMatch)
{
// Assuming it's a byte...
var blockLength = allBytes[i + blockMarker.Length];
var result = new byte[blockLength];
Array.Copy(
allBytes, i + blockMarker.Length + 1, result, 0,
blockLength);
return result;
}
}
}
return null;
}
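Calling it would then look something like this (hypothetical file name):
byte[] block1 = GetBlockOfData(@"C:\data\file.bin", "block1");
byte[] block2 = GetBlockOfData(@"C:\data\file.bin", "block2");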
So I've got an algorithm that reads from a (very large, ~155+ MB) binary file, parses it according to a spec, and writes the necessary info out to a CSV (flat text). It works flawlessly for the first 15.5 million lines of output, which produces a CSV file of ~0.99-1.03 GB. That gets through barely over 20% of the binary file. After this it breaks, in that the printed data is suddenly not at all what is shown in the binary file. I checked the binary file, and the same pattern continues (data split up into "packets" - see code below). Due to how it's handled, memory usage never really increases (steady ~15K).
The functional code is listed below. Is it my algorithm (if so, why would it break after 15.5 million lines?!), or are there other implications I'm not considering due to the large file sizes? Any ideas?
(FYI: each "packet" is 77 bytes in length, beginning with a 3-byte "startcode" and ending with a 5-byte "endcode" - you'll see the pattern below)
Edit: the code has been updated based on the suggestions below... thanks!
private void readBin(string theFile)
{
List<int> il = new List<int>();
bool readyForProcessing = false;
byte[] packet = new byte[77];
try
{
FileStream fs_bin = new FileStream(theFile, FileMode.Open);
BinaryReader br = new BinaryReader(fs_bin);
while (br.BaseStream.Position < br.BaseStream.Length && working)
{
// Find the first startcode
while (!readyForProcessing)
{
// If last byte of endcode adjacent to first byte of startcod...
// This never occurs outside of ending/starting so it's safe
if (br.ReadByte() == 0x0a && br.PeekChar() == (char)0x16)
readyForProcessing = true;
}
// Read a full packet of 77 bytes
br.Read(packet, 0, packet.Length);
// Unnecessary I guess now, but ensures packet begins
// with startcode and ends with endcode
if (packet.Take(3).SequenceEqual(STARTCODE) &&
packet.Skip(packet.Length - ENDCODE.Length).SequenceEqual(ENDCODE))
{
il.Add(BitConverter.ToUInt16(packet, 3)); //il.ElementAt(0) == 2byte id
il.Add(BitConverter.ToUInt16(packet, 5)); //il.ElementAt(1) == 2byte semistable
il.Add(packet[7]); //il.ElementAt(2) == 1byte constant
for(int i = 8; i < 72; i += 2) //start at 8th byte, get 64 bytes
il.Add(BitConverter.ToUInt16(packet, i));
for (int i = 3; i < 35; i++)
{
sw.WriteLine(il.ElementAt(0) + "," + il.ElementAt(1) +
"," + il.ElementAt(2) + "," + il.ElementAt(i));
}
il.Clear();
}
else
{
// Handle "bad" packets
}
} // while
fs_bin.Flush();
br.Close();
fs_bin.Close();
}
catch (Exception e)
{
MessageBox.Show(e.ToString());
}
}
Your code is silently catching any exception that happens in the while loop and swallowing it.
This is a bad practice because it masks issues like the one you are running into.
Most likely, one of the methods you call inside the loop (int.Parse() for example) is throwing an exception because it encounters some problem in the format of the data (or your assumptions about that format).
Once an exception occurs, the loop that reads data is thrown off kilter because it is no longer positioned at a record boundary.
You should do several things to make this code more resilient:
Don't silently swallow exception in the run loop. Deal with them.
Don't read data byte by byte or field by field in the loop. Since your records are fixed size (77 bytes), read an entire record into a byte[] and then process it from there; a sketch follows below. This will help ensure you are always reading at a record boundary.
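A sketch of that fixed-size read, verifying the byte count actually returned (a single Read call is not guaranteed to fill the buffer):
byte[] record = new byte[77];
int bytesRead;
while ((bytesRead = fs_bin.Read(record, 0, record.Length)) == record.Length)
{
    // process one complete 77-byte record here
}
// bytesRead < 77 here means end-of-file or a truncated trailing record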
Don't put an empty generic catch block here and just silently catch and continue. You should check and see if you're getting an actual exception in there and go from there.
There is no need for the byteToHexString function. Just write the values as hexadecimal literals with the 0x prefix and compare the bytes directly:
i.e.
if(al[0] == 0x16 && al[1] == 0x3C && al[2] == 0x02)
{
...
}
I don't know what your doConvert function does (you didn't provide that source), but the BinaryReader class provides many different functions, one of which is ReadInt16. Unless your shorts are stored in an encoded format, that should be easier to use than your fairly obfuscated and confusing conversion. Even if they're encoded, it would still be far simpler to read the bytes in and manipulate them rather than doing several round trips converting to strings.
Edit
You appear to be making very liberal use of the LINQ extension methods (particularly ElementAt). Every time you call that function, it enumerates your list until it reaches that index. You'll have much better-performing (and less verbose) code if you just use the list's built-in indexer,
i.e. al[3] rather than al.ElementAt(3).
Also, you don't need to call Flush on an input Stream. Flush is used to tell the stream to write anything that it has in its write buffer to the underlying OS file handle. For an input stream it won't do anything.
I would suggest replacing your current sw.WriteLine call with this:
sw.WriteLine(BitConverter.ToString(packet)); and see if the data you're expecting on the row where it starts to mess up is actually what you're getting.
I would actually do this:
if (packet.Take(3).SequenceEqual(STARTCODE) &&
packet.Skip(packet.Length - ENDCODE.Length).SequenceEqual(ENDCODE))
{
ushort id = BitConverter.ToUInt16(packet, 3);
ushort semistable = BitConverter.ToUInt16(packet, 5);
byte constant = packet[7];
for(int i = 8; i < 72; i += 2)
{
il.Add(BitConverter.ToUInt16(packet, i));
}
foreach(ushort element in il)
{
sw.WriteLine(string.Format("{0},{1},{2},{3}", id, semistable, constant, element));
}
il.Clear();
}
else
{
//handle "bad" packets
}