C# extract data from start of StringBuilder

C# extract data from start of StringBuilder - c#

I want to extract some chars from start of StringBuilder variable.
I wrote bellow code :
private string getPart(StringBuilder data, int len)
{
string s = data.ToString(0, len);
data.Remove(0, len);
return s;
}
Any suggestion fro better coding?

If you want character extraction in many parts of your code,
you can try implementing the algorithm as an extension method:
public static class StringBuilderExtensions {
public static String Extract(this StringBuilder source, int length) {
if (Object.ReferenceEquals(null, source))
throw new ArgumentNullException("source");
else if ((length < 0) || (length > source.Length))
throw new ArgumentOutOfRangeException("length");
// Your actual algorithm
String result = source.ToString(0, length);
source.Remove(0, length);
return result;
}
}
...
StringBuilder data = ...
String s = data.Extract(len); // <- Just extract

You can use CopyTo method:
private string getPart(StringBuilder data, int len)
{
var output = new char[len];
data.CopyTo(0, output, 0, len);
data.Remove(0, len);
return new string(output);
}

Related

Read a fixed number of bytes when reading file using StreamReader

I'm using this port of the Mozilla character set detector to determine a file's encoding and then using that to construct a StreamReader. So far, so good.
However, the file format I am reading is an odd one and from time to time it is necessary to skip a number of bytes. That is, a file that is otherwise text, in one or other encoding, will have some raw bytes embedded in it.
I would like to read the stream as text, up to the point that I hit some text that indicates a byte stream follows, then I would like to read the byte stream, then resume reading as text. What is the best way of doing this (balance of simplicity and performance)?
I can't rely on seeking against the FileStream underlying the the StreamReader (and then discarding the buffered data in the latter) because I don't know how many bytes were used in reading the characters up to that point. I might abandon using StreamReader and switch to a bespoke class that uses parallel arrays of bytes and chars, populates the latter from the former using a decoder, and tracks the position in the byte array every time a character is read by using the encoding to calculate the number of bytes used for the character. Yuk.
To further clarify, the file has this format:
[encoded chars][embedded bytes indicator + len][len bytes][encoded chars]...
Where there many be zero one or many blocks of embedded bytes and the blocks of embedded chars may be any length.
So, for example:
ABC:123:DEF:456:$0099[0x00,0x01,0x02,... x 99]GHI:789:JKL:...
There are no line delimiters. I may have any number of fields (ABC, 123, ...) delimited by some character (in this case a colon). These fields may be in various codepages, including UTF-8 (not guaranteed to be single byte). When I hit a $ I know that the next 4 bytes contain a length (call it n), the next n bytes are to be read raw, and byte n + 1 will be another text field (GHI).

Proof of concept. This class works with UTF-16 string data, and ':' delimiters per OP. It expects binary length as a 4-byte, little-endian binary integer. It should be easy to adjust to more specific details of your (odd) file format. For example, any Decoder class should drop in to ReadString() and "just work".
To use it, construct it with a Stream class. For each individual data element, call ReportNextData(), which will tell you what kind of data is next, and then call the appropriate Read*() method. For binary data, call ReadBinaryLength() and then ReadBinaryData().
Note that ReadBinaryData() follows the stream contract; it is not guaranteed to return as many bytes as you asked for, so you may need to call it several times. However, if you ask for too many bytes, it will throw EndOfStreamException.
I tested it with this data (hex format):
410042004300240A0000000102030405060708090024050000000504030201580059005A003A310032003300
Which is:
ABC$[10][1234567890]$[5][54321]XYZ:123
Scan the data like so:
OddFileReader.NextData nextData;
while ((nextData = reader.ReportNextData()) != OddFileReader.NextData.Eof)
{
// Call appropriate Read*() here.
}
public class OddFileReader : IDisposable
{
public enum NextData
{
Unknown,
Eof,
String,
BinaryLength,
BinaryData
}
private Stream source;
private byte[] byteBuffer;
private int bufferOffset;
private int bufferEnd;
private NextData nextData;
private int binaryOffset;
private int binaryEnd;
private char[] characterBuffer;
public OddFileReader(Stream source)
{
this.source = source;
}
public NextData ReportNextData()
{
if (nextData != NextData.Unknown)
{
return nextData;
}
if (!PopulateBufferIfNeeded(1))
{
return (nextData = NextData.Eof);
}
if (byteBuffer[bufferOffset] == '$')
{
return (nextData = NextData.BinaryLength);
}
else
{
return (nextData = NextData.String);
}
}
public string ReadString()
{
ReportNextData();
if (nextData == NextData.Eof)
{
throw new EndOfStreamException();
}
else if (nextData != NextData.String)
{
throw new InvalidOperationException("Attempt to read non-string data as string");
}
if (characterBuffer == null)
{
characterBuffer = new char[1];
}
StringBuilder stringBuilder = new StringBuilder();
Decoder decoder = Encoding.Unicode.GetDecoder();
while (nextData == NextData.String)
{
byte b = byteBuffer[bufferOffset];
if (b == '$')
{
nextData = NextData.BinaryLength;
break;
}
else if (b == ':')
{
nextData = NextData.Unknown;
bufferOffset++;
break;
}
else
{
if (decoder.GetChars(byteBuffer, bufferOffset++, 1, characterBuffer, 0) == 1)
{
stringBuilder.Append(characterBuffer[0]);
}
if (bufferOffset == bufferEnd && !PopulateBufferIfNeeded(1))
{
nextData = NextData.Eof;
break;
}
}
}
return stringBuilder.ToString();
}
public int ReadBinaryLength()
{
ReportNextData();
if (nextData == NextData.Eof)
{
throw new EndOfStreamException();
}
else if (nextData != NextData.BinaryLength)
{
throw new InvalidOperationException("Attempt to read non-binary-length data as binary length");
}
bufferOffset++;
if (!PopulateBufferIfNeeded(sizeof(Int32)))
{
nextData = NextData.Eof;
throw new EndOfStreamException();
}
binaryEnd = BitConverter.ToInt32(byteBuffer, bufferOffset);
binaryOffset = 0;
bufferOffset += sizeof(Int32);
nextData = NextData.BinaryData;
return binaryEnd;
}
public int ReadBinaryData(byte[] buffer, int offset, int count)
{
ReportNextData();
if (nextData == NextData.Eof)
{
throw new EndOfStreamException();
}
else if (nextData != NextData.BinaryData)
{
throw new InvalidOperationException("Attempt to read non-binary data as binary data");
}
if (count > binaryEnd - binaryOffset)
{
throw new EndOfStreamException();
}
int bytesRead;
if (bufferOffset < bufferEnd)
{
bytesRead = Math.Min(count, bufferEnd - bufferOffset);
Array.Copy(byteBuffer, bufferOffset, buffer, offset, bytesRead);
bufferOffset += bytesRead;
}
else if (count < byteBuffer.Length)
{
if (!PopulateBufferIfNeeded(1))
{
throw new EndOfStreamException();
}
bytesRead = Math.Min(count, bufferEnd - bufferOffset);
Array.Copy(byteBuffer, bufferOffset, buffer, offset, bytesRead);
bufferOffset += bytesRead;
}
else
{
bytesRead = source.Read(buffer, offset, count);
}
binaryOffset += bytesRead;
if (binaryOffset == binaryEnd)
{
nextData = NextData.Unknown;
}
return bytesRead;
}
private bool PopulateBufferIfNeeded(int minimumBytes)
{
if (byteBuffer == null)
{
byteBuffer = new byte[8192];
}
if (bufferEnd - bufferOffset < minimumBytes)
{
int shiftCount = bufferEnd - bufferOffset;
if (shiftCount > 0)
{
Array.Copy(byteBuffer, bufferOffset, byteBuffer, 0, shiftCount);
}
bufferOffset = 0;
bufferEnd = shiftCount;
while (bufferEnd - bufferOffset < minimumBytes)
{
int bytesRead = source.Read(byteBuffer, bufferEnd, byteBuffer.Length - bufferEnd);
if (bytesRead == 0)
{
return false;
}
bufferEnd += bytesRead;
}
}
return true;
}
public void Dispose()
{
Stream source = this.source;
this.source = null;
if (source != null)
{
source.Dispose();
}
}
}

How to find and replace a large section of bytes in a file?

I'm looking to find a large section of bytes within a file, remove them and then import a new large section of bytes starting where the old ones started.
Here's a video of the manual process that I'm trying to re-create in C#, it might explain it a little better: https://www.youtube.com/watch?v=_KNx8WTTcVA
I have only basic experience with C# so am learning as I go along, any help with this would be very appreciated!
Thanks.

Refer to this question:
C# Replace bytes in Byte[]
Use the following class:
public static class BytePatternUtilities
{
private static int FindBytes(byte[] src, byte[] find)
{
int index = -1;
int matchIndex = 0;
// handle the complete source array
for (int i = 0; i < src.Length; i++)
{
if (src[i] == find[matchIndex])
{
if (matchIndex == (find.Length - 1))
{
index = i - matchIndex;
break;
}
matchIndex++;
}
else
{
matchIndex = 0;
}
}
return index;
}
public static byte[] ReplaceBytes(byte[] src, byte[] search, byte[] repl)
{
byte[] dst = null;
byte[] temp = null;
int index = FindBytes(src, search);
while (index >= 0)
{
if (temp == null)
temp = src;
else
temp = dst;
dst = new byte[temp.Length - search.Length + repl.Length];
// before found array
Buffer.BlockCopy(temp, 0, dst, 0, index);
// repl copy
Buffer.BlockCopy(repl, 0, dst, index, repl.Length);
// rest of src array
Buffer.BlockCopy(
temp,
index + search.Length,
dst,
index + repl.Length,
temp.Length - (index + search.Length));
index = FindBytes(dst, search);
}
return dst;
}
}
Usage:
byte[] allBytes = File.ReadAllBytes(#"your source file path");
byte[] oldbytePattern = new byte[]{49, 50};
byte[] newBytePattern = new byte[]{48, 51, 52};
byte[] resultBytes = BytePatternUtilities.ReplaceBytes(allBytes, oldbytePattern, newBytePattern);
File.WriteAllBytes(#"your destination file path", resultBytes)
The problem is when the file is too large, then you require a "windowing" function. Don't load all the bytes in memory as it will take up much space.

FileStream delimiter?

I have a large file with (text/Binary) format.
file format: (0 represent a byte)
00000FileName0000000Hello
World
world1
...
0000000000000000000000
Currently i'm using FileStream and i want to read the Hello.
I Know where Hello start, and it ends with a 0x0D 0x0A.
I also need to go back if the words is not equal to Hello.
How can i read until a carriage return?
is there any PEEK like function in FileStream so i can move back the read pointer`?
is FileStream even a good choice in this case?

You can use the method FileStream.Seek to change the read/write position.

You can use BinaryReader for reading binary content; however, it uses an inner buffer so you cannot rely the underlying Stream.Position anymore, because it can read more bytes in the background than you want. But you can re-implement its needed methods:
private byte[] ReadBytes(Stream s, int count)
{
buffer = new byte[count];
if (count == 0)
{
return buffer;
}
// reading one byte
if (count == 1)
{
int value = s.ReadByte();
if (value == -1)
threw new IOException("Out of stream");
buffer[0] = (byte)value;
return buffer;
}
// reading multiple bytes
int offset = 0;
do
{
int readBytes = s.Read(buffer, offset, count - offset);
if (readBytes == 0)
threw new IOException("Out of stream");
offset += readBytes;
}
while (offset < count);
return buffer;
}
public int ReadInt32(Stream s)
{
byte[] buffer = ReadBytes(s, 4);
return BitConverter.ToInt32(buffer, 0);
}
// similarly, write ReadInt16/64, etc, whatever you need
Assuming that you are on the start position, you can write a ReadString, too:
private string ReadString(Stream s, char delimiter)
{
var result = new List<char>();
int c;
while ((c = s.ReadByte()) != -1 && (char)c != delimiter)
{
result.Add((char)c);
}
return new string(result.ToArray());
}
Usage:
FileStream fs = GetMyFile(); // todo
if (!fs.CanSeek)
throw new NotSupportedException("sorry");
long posCurrent = fs.Position; // save current position
int posHello = ReadInt32(fs); // read position of "hello"
fs.Seek(posHello, SeekOrigin.Begin); // seeking to hello
string hello = ReadString(fs, '\n'); // reading hello
fs.Seek(posCurrent, SeekOrigin.Begin); // seeking back

Converting byte arrays (from readfile) to string

So, I am using ReadFile from kernel32 for reading the file. Here is my code in reading files with the help of SetFilePointer and ReadFile.
public long ReadFileMe(IntPtr filehandle, int startpos, int length, byte[] outdata)
{
IntPtr filea = IntPtr.Zero;
long ntruelen = GetFileSize(filehandle, filea);
int nRequestStart;
uint nRequestLen;
uint nApproxLength;
int a = 0;
if (ntruelen <= -1)
{
return -1;
}
else if (ntruelen == 0)
{
return -2;
}
if (startpos > ntruelen)
{
return -3;
}
else if (length <= 0)
{
return -5;
}
else if (length > ntruelen)
{
return -6;
}
else
{
nRequestStart = startpos;
nRequestLen = (uint)length;
outdata = new byte[nRequestLen - 1];
SetFilePointer(filehandle, (nRequestStart - 1), ref a, 0);
ReadFile(filehandle, outdata, nRequestLen, out nApproxLength, IntPtr.Zero);
return nApproxLength; //just for telling how many bytes are read in this function
}
}
When I used this function, it works (for another purpose) so this code is tested and works.
But the main problem is, I now need to convert the outdata on the parameter which the function puts the bytes into string.
I tried using Encoding.Unicode and so on (all UTF), but it doesn't work.

Try to use Encoding.GetString (Byte[], Int32, Int32) method. this decodes a sequence of bytes from the specified byte array into a string.

Hmm... Encoding.Name_of_encoding.GetString must work...
try smth like this:
var convertedBuffer = Encoding.Convert(
Encoding.GetEncoding( /*name of encoding*/),Encoding.UTF8, outdata);
var str = Encoding.UTF8.GetString(convertedBuffer);
UPDATE:
and what about this?:
using (var streamReader = new StreamReader(#"C:\test.txt", true))
{
var currentEncoding = streamReader.CurrentEncoding.EncodingName;
Console.WriteLine(currentEncoding);
}

You might need to add the out parameter on outdata parameter :
Passing Arrays Using ref and out
public long ReadFileMe(IntPtr filehandle, int startpos, int length, out byte[] outdata)

IntPtr to String and long?

I am using C# and the .NET 2.0 framework.
I have this method here to get a string out of a IntPtr:
void* cfstring = __CFStringMakeConstantString(StringToCString(name));
IntPtr result = AMDeviceCopyValue_Int(device, unknown, cfstring);
if (result != IntPtr.Zero)
{
byte length = Marshal.ReadByte(result, 8);
if (length > 0)
{
string s = Marshal.PtrToStringAnsi(new IntPtr(result.ToInt64() + 9L), length);
return s;
}
}
return String.Empty;
It works and all data which is saved as string in my USB-Device (return data of method AMDeviceCopyValue).
Now I get this:
£\rº\0\0Â¨\t\0\0\0ˆ\0\0\0\0\0\0x>Ô\0\0\0\0Å¨\t\0\0\0€1ÔxÕÍ¸MÔ\0\0\0\0È¨\t\0\0\0€)\0fŒ\a\0Value\0\0Ë¨\t\0\0\0€7fŒ\a\0Result\0Î¨\t\0\0\0ˆTÅfŒ\a\0\0Key\0\0\0\0ñ¨\t\0\0\0€+fŒ\a\0Port\0\0\0ô¨\t\0\0\0€%fŒ\a\0Key\0\0\0\0÷¨\t\0\0\0€:\0\0\0\0\0\0\0\0\0\0\0\0\0\0ú¨\t\0"
This is saved as long - so, how I can get this IntPtr into long and not string?

If you simply want to read a long instead of a string, use:
long l = Marshal.ReadInt64(result, 9);
return l;

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.