What's a good workaround for combining long strings? [closed]

What's a good workaround for combining long strings? [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
Right now I have it .Add() to a List<string> but this can get extremely long.
As you can see here, it added 33554432 strings to the list before finally throwing in the towel.
What can I do better here to workaround this?
StringBuilder
Since using a StringBuilder.AppendLine() it's been better. I haven't encountered the issue since but of course that doesn't mean it CAN'T occur.
My end goal
A lot of you are asking why I'm trying to combine strings and telling me to just read in chunks and such. This isn't really an option, I'm reading from an IMAP stream and I cannot chunk this as it's to be read to search data and to be shown to the user.
The only way I can reliably chunk it is if I were to create a new StringBuilder on-exception and start compiling to that, then maybe once read fully, combine all created StringBuilders into 1 string, that probably wouldn't even work well.
Im reading the Stream with an Extension ReadLine method
Note this is used for other kinds of operations as well, i know the return bit isn't exactly pretty or optimized for this case either.
public static string ReadLine(this Stream stream, ref int bodySize, Encoding encoding, bool returnAsByteString=false) {
bool bodySizeWasSpecified = bodySize > 0;
byte b = 0;
List<byte> bytes = new List<byte>();
while (true) {
#region Try Get 1 Byte from Stream
try {
int i = stream.ReadByte();
if (i == -1) {
break;//stream ended/closed
}
b = (byte)i;
} catch (IOException) {
return null;//timeout
}
#endregion
#region If there's a body size specified, decrement back 1
if (bodySizeWasSpecified) {
bodySize--;
}
#endregion
#region If Byte is \n or \r
if (b == 10 || b == 13) {
#region If ByteArray is Empty and the byte is \n reloop so we dont start with a leading \n
if (bytes.Count == 0 && b == 10) {
continue;
}
#endregion
#region We hit a newline, lets finish the reads here.
break;
#endregion
}
#endregion
#region Add the read byte to the Byte Array
bytes.Add(b);
#endregion
#region Break if bodysize was greater than 0 but now its 0
if (bodySizeWasSpecified && bodySize == 0) {
break;
}
#endregion
}
if (returnAsByteString) {
return string.Join(string.Empty, bytes.ToArray().Select(x => x.ToString("X2")));
}
return encoding.GetString(bytes.ToArray());
}

The alternative is that you could use the StringBuilder class and use the Append() or AppendLine function to add the strings to it. This will create a long string with all of them combined.

Related

Convert string to number array c#

I have a follow string example
0 0 1 2.33 4
2.1 2 11 2
There are many ways to convert it to an array, but I need the fastest one, because files can contain 1 billion elements.
string can contain an indefinite number of spaces between numbers
i'am trying
static void Main()
{
string str = "\n\n\n 1 2 3 \r 2322.2 3 4 \n 0 0 ";
byte[] byteArray = Encoding.ASCII.GetBytes(str);
MemoryStream stream = new MemoryStream(byteArray);
var values = ReadNumbers(stream);
}
public static IEnumerable<object> ReadNumbers(Stream st)
{
var buffer = new StringBuilder();
using (var sr = new StreamReader(st))
{
while (!sr.EndOfStream)
{
char digit = (char)sr.Read();
if (!char.IsDigit(digit) && digit != '.')
{
if (buffer.Length == 0) continue;
double ret = double.Parse(buffer.ToString() , culture);
buffer.Clear();
yield return ret;
}
else
{
buffer.Append(digit);
}
}
if (buffer.Length != 0)
{
double ret = double.Parse(buffer.ToString() , culture);
buffer.Clear();
yield return ret;
}
}
}

There are a few things you can do to improve the performance of your code. First, you can use the Split method to split the string into an array of strings, where each element of the array is a number in the string. This will be faster than reading each character of the string one at a time and checking if it is a digit.
Next, you can use double.TryParse to parse each element of the array into a double, rather than using double.Parse and catching any potential exceptions. TryParse will be faster because it does not throw an exception if the string is not a valid double.
Here is an example of how you could implement this:
public static IEnumerable<double> ReadNumbers(string str)
{
string[] parts = str.Split(new[] {' ', '\n', '\r', '\t'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string part in parts)
{
if (double.TryParse(part, NumberStyles.Any, CultureInfo.InvariantCulture, out double value))
{
yield return value;
}
}
}

I'd rather suggest the simpliest solution first and haunt for nano-seconds if there really is a problem with that code.
var doubles = myInput.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(x => double.Parse(x, whateverCulture))
Do that for every line in your file, not for the entire file at once, as reading such a huge file at once may crush your memory.
Pretty easy to understand. Afterwards perform a benchmark-test with your data and see if it really affects performance when trying to parse the data. However chances are the actual bottleneck is reading that huge file- which essentially is a IO-thing.

You can improve your solution by trying to avoid creating many objects on the heap. Especially buffer.ToString() is called repeatedly and creates new strings. You can use a ReadOnlySpan<char> struct to slice the string and at the same time avoid heap allocations. A span provides pointers into the original string without making copies of it or parts of it when slicing.
Also do not return the doubles as object, as this will box them. I.e., it will store them on the heap. See: Boxing and Unboxing (C# Programming Guide). If you prefer your solution over mine, use an IEnumerable<double> as return type of your method.
The use of ReadOnlySpans; however, has the disadvantage that it cannot be used in iterator methods. The reason is that a ReadOnlySpan must be allocated on the stack, but an iterator method wraps its state in a class. If you try, you will get the Compiler Error CS4013:
Instance of type cannot be used inside a nested function, query expression, iterator block or async method
Therefore, we must either store the numbers in a collection or consume them in-place. Since I don't know what you want to do with the numbers, I use the former approach:
public static List<double> ReadNumbers(string input)
{
ReadOnlySpan<char> inputSpan = input.AsSpan();
int start = 0;
bool isNumber = false;
var list = new List<double>(); // Improve by passing the expected maximum length.
int i;
for (i = 0; i < inputSpan.Length; i++) {
char c = inputSpan[i];
bool isDigit = Char.IsDigit(c);
if (isDigit && !isNumber) {
start = i;
isNumber = true;
} else if (isNumber && !isDigit && c != '.') {
isNumber = false;
if (Double.TryParse(inputSpan[start..i], CultureInfo.InvariantCulture, out double d)) {
list.Add(d);
}
}
}
if (isNumber) {
if (Double.TryParse(inputSpan[start..i], CultureInfo.InvariantCulture, out double d)) {
list.Add(d);
}
}
return list;
}
inputSpan[start..i] creates a slice as ReadOnlySpan<char>.
Test
string str = "\n\n\n 1 2 3 \r 2322.2 3 4 \n 0 0 ";
foreach (double d in ReadNumbers(str)) {
Console.WriteLine(d);
}
But whenever you are asking for speed, you must run benchmarks to compare the different approaches. Very often what seems a superior solution may fail in the benchmark.
See also: All About Span: Exploring a New .NET Mainstay

FizzBuzz test case pass [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
Improve this question
I am wondering can someone please tell me where I am going wrong with this FizzBuzz.
I get an error "not all code returns a value and struggling to find out exactly how to fix it.
My code is below:
for (int i = 0; i < input; i++)
{
if (i % 3 == 0)
{
Console.WriteLine("Fizz");
}
else if (i % 5 == 0)
{
Console.WriteLine("Buzz");
}
else if (i % 3 == 0 && i % 5 == 0)
{
Console.WriteLine("FizzBuzz");
}
else
{
Console.WriteLine(i);
}
}
And the test case is this:
[Test]
public void Test1()
{
var solution = new Solution();
Assert.AreEqual(solution.PrintFizzBuzz(15), "FizzBuzz");
}

First of like other people have said, not all the code is present so it's difficult for us to help. But I'll try anyway.
When testing with Assert.AreEqual the function PrintFizzBuzz should return something, in this case the string "FizzBuzz". But if you are using the Console.WriteLine method it will return nothing. The Console.Writeline method will print the string to the console, but the console is not actually the 'program', it's just for visualising some things like logging or debugging.
So to fix your issue you should use return instead of Console.Writeline and let your method PrintFizzBuzz return a string instead of voiding and printing it to the console. It's probably also better to rename your method in this case, because it doesn't print FizzBuzz anymore.
Your code has also another issue and that's when you input 15 it will print out "Fizz", because the check you do is modulo 3 and 15 % 3 = 0. You should order your check the otherway around, from specific to less specific.
An example for returning a string would be:
string SomeMethod()
{
return "Some String"
}

not all code returns a value
Means your method declared having a return type and not void but you are not returning anything from your method. One example case:
int Add(int a, int b)
{
int c = a + b;
// no return specified would through the same error
}

Convert string to ASCII without exceptions (like TryParse)

I am implementing a TryParse() method for an ASCII string class. The method takes a string and converts it to a C-style string (i.e. a null-terminated ASCII string).
I had been using only a Parse(), doing the conversion to ASCII using::
public static bool Parse(string s, out byte[] result)
{
result = null;
if (s == null || s.Length < 1)
return false;
byte[]d = new byte[s.Length + 1]; // Add space for null-terminator
System.Text.Encoding.ASCII.GetBytes(s).CopyTo(d, 0);
// GetBytes can throw exceptions
// (so can CopyTo() but I can replace that with a loop)
result = d;
return true;
}
However, as part of the idea of a TryParse is to remove the overhead of exceptions, and GetBytes() throws exceptions, I'm looking for a different method that does not do so.
Maybe there is a TryGetbytes()-like method?
Or maybe we can reason about the expected format of a standard .Net string and perform the change mathematically (I'm not overly familiar with UTF encodings)?
EDIT: I guess for non-ASCII chars in the string, the TryParse() method should return false
EDIT: I expect when I get around to implementing the ToString() method for this class I may need to do the reverse there.

Two options:
You could just ignore Encoding entirely, and write the loop yourself:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
byte buffer = new byte[s.Length + 1]; // Add space for null-terminator
for (int i = 0; i < s.Length; i++)
{
char c = s[i];
if (c > 127)
{
return false;
}
buffer[i] = (byte) c;
}
result = buffer;
return true;
}
That's simple, but may be slightly slower than using Encoding.GetBytes.
The second option would be to use a custom EncoderFallback:
public static bool TryParse(string s, out byte[] result)
{
result = null;
// TODO: It's not clear why you don't want to be able to convert an empty string
if (s == null || s.Length < 1)
{
return false;
}
var fallback = new CustomFallback();
var encoding = new ASCIIEncoding { EncoderFallback = fallback };
byte buffer = new byte[s.Length + 1]; // Add space for null-terminator
// Use overload of Encoding.GetBytes that writes straight into the buffer
encoding.GetBytes(s, 0, s.Length, buffer, 0);
if (fallback.HadErrors)
{
return false;
}
result = buffer;
return true;
}
That would require writing CustomFallback though - it would need to basically keep track of whether it had ever been asked to handle invalid input.
If you didn't mind an encoding processing the data twice, you could call Encoding.GetByteCount with a UTF-8-based encoding with a replacement fallback (with a non-ASCII replacement character), and check whether that returns the same number of bytes as the number of chars in the string. If it does, call Encoding.ASCII.GetBytes.
Personally I'd go for the first option unless you have reason to believe it's too slow.

There are two possible exceptions that Encoding.GetBytes might throw according to the documentation.
ArgumentNullException is easily avoided. Do a null check on your input and you can ensure this is never thrown.
EncoderFallbackException needs a bit more investigation... Reading the documentation:
A fallback strategy determines how an encoder handles invalid characters or how a decoder handles invalid bytes.
And if we looking in the documentation for ASCII encoding we see this:
It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
That means it doesn't use the Exception Fallback and thus will never throw an EncoderFallbackException.
So in summary if you are using ASCII encoding and ensure you don't pass in a null string then you will never have an exception thrown by the call to GetBytes.

The GetBytes method is throwing an exception because your Encoding.EncoderFallback specifies that it should throw an exception.
Create an encoding object with EncoderReplacementFallback to avoid exceptions on unencodable characters.
Encoding encodingWithFallback = new ASCIIEncoding() { DecoderFallback = DecoderFallback.ReplacementFallback };
encodingWithFallback.GetBytes("Hɘ££o wor£d!");
This way imitates the TryParse methods of the primitive .NET value types:
bool TryEncodingToASCII(string s, out byte[] result)
{
if (s == null || Regex.IsMatch(s, "[^\x00-\x7F]")) // If a single ASCII character is found, return false.
{
result = null;
return false;
}
result = Encoding.ASCII.GetBytes(s); // Convert the string to ASCII bytes.
return true;
}

How to count from a sentence in C# [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
So, I am creating a word filter for a game server in C# and basically I am trying to scourer the sentence for banned words and replace them with clean words. I've already done so, but now I'm up to the part where I want to scan the sentence for a list of sentence banned words. I'm hopeless at this bit, and I can't seem to wrap my head around it.
Basically I am CheckSentence(Message) in the ChatManager, and need the following code to count and return continue; if the value is more than 5. So far I have:
public bool CheckSentence(string Message)
{
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (Message.ToLower().Contains(Filter.Word) && Filter.IsSentence)
{
// count Message, if message contains >5
// from (Message.Contains(Filter.Word))
// continue; else (ignore)
}
}
return false;
}
I'm not too sure if that makes much sense, but I want it to continue; if there are more than 5 Message.Contains(Filter.Word)

public bool CheckSentence(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (lower.Contains(Filter.Word) && Filter.IsSentence)
{
count++;
}
}
return count >= 5;
}
If this becomes too slow, you may be better of caching the list of filtered words in a HashSet, and iterating over each word in the message, checking if it exists in the HashSet, which would give you O(n) speed, where N is the number of words.
LINQ Version
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Count(x => lower.Contains(x.Word)) >= 5;
}
EDIT 2: LINQ Updated As per #S.C. Comment
By #S.C.
For the linq version, there's no need to count past the first five. return _filteredWords.Where(x => x.IsSentence && lower.Contains(x.Word)).Skip(5).Any();
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Where(x => lower.Contains(x.Word))
.Skip(5)
.Any();
}
ToUpper vs ToLower
As #DevEstacion mentioned and per Microsoft best practices for using string recommendations here it is best to use ToUpperInvariant() for string comparisons rather than ToLowerInvariant().
EDIT:Using Continue
public bool CheckSentenceWithContinue(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (!Filter.IsSentence)
continue; // Move on to the next filter, as this is not a senetece word filter
if (!lower.Contains(Filter.Word))
continue; // Move on to the next filter, as the message does not contain this word
// If you are here it means filter is a Sentence filter, and the message contains the word, so increment the counter
count++;
}
return count >= 5;
}

I believe someone already posted a correct answer, I'm just here to provide an alternative.
So instead of doing a forloop or foreach, I'll be providing you with Regex solution.
public bool CheckSentence(string rawMessage)
{
/*
The string.Join("|", _filteredWords) will create the pattern for the Regex
the '|' means or so from the list of filtered words, it will look it up on
the raw message and get all matches
*/
return new Regex(string.Join("|", _filteredWords.Where(x => x.IsSentence)),
RegexOptions.IgnoreCase | RegexOptions.Compiled).Match(rawMessage).Length >= 5;
}
Benefits? much shorter, prevents loop and could be faster :)
Don't forget to add these two lines of using declaration on top of the .cs file
using System.Linq;
using System.Text.RegularExpressions;

Creating an array of all ASCII characters in C# [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
I am currently creating a program that uses arrays to categorize ASCII characters in a text document. I am stuck when it comes to creating the array itself, which is a critical part to the project's functionality. It is also suggested that I make the array out of charfrequency objects, which I know my code for is not quite right for this particular project. I used the code from another similar project, but am unsure how to translate it to a project that reads text from a file. I have included my charfrequency class code for reference as to a general idea of what I'm trying to do. I also need to display the results in a format like this:
H(72) = 1
e(101) = 1
l(108) = 2
o(111) = 1
.(46) = 1
I do not understand programming very well, so detailed explanations with relatively simple terms would be very helpful.
{
public class CharFrequency
{
private char m_character;
private long m_count;
public CharFrequency(char ch)
{
Character = ch;
Count = 0;
}
public CharFrequency(char ch, long charCount)
{
Character = ch;
Count = charCount;
}
public char Character
{
set
{
m_character = value;
}
get
{
return m_character;
}
}
public long Count
{
get
{
return m_count;
}
set
{
if (value < 0)
value = 0;
m_count = value;
}
}
public void Increment()
{
m_count++;
}
public override bool Equals(object obj)
{
bool equal = false;
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
if (this.Character == cf.Character)
equal = true;
return equal;
}
public override int GetHashCode()
{
return m_character.GetHashCode();
}
public override string ToString()
{
String s = String.Format("Character '{0}' ({1})'s frequency is {2}", m_character, (byte)m_character, m_count);
return s;
}
}
}

Since Unicode matches ASCII codes you can just select ASCII range with Enumerable.Range:
var allAscii = Enumerable.Range('\x1', 127).ToArray();
Note that C#/.Net uses UTF-16 (C# and UTF-16 characters) to represent char, but if you only looking for ASCII range it is not a problem (as ASCII covers characters with codes 1-127 it will not conflict with surrogate pairs which are encoded into 2 char in a string).

You can simply store your character frequencies in Dictionary<char, long>.

Perhaps you want to look at this:
http://stackoverflow.com/questions/3665757/c-sharp-convert-char-to-int
If it were me I would create an array in which I stored the character occurances like this:
long[] charCount = new long[256];
And then each time I see a character convert it to its integer value with something like:
int idx = (int)char.GetNumericValue(c);
And then count that character occurance like:
charCount[idx]++;

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

What's a good workaround for combining long strings? [closed] - c#

The alternative is that you could use the StringBuilder class and use the Append() or AppendLine function to add the strings to it. This will create a long string with all of them combined.

Related

Convert string to number array c#

FizzBuzz test case pass [closed]

Convert string to ASCII without exceptions (like TryParse)

How to count from a sentence in C# [closed]

Creating an array of all ASCII characters in C# [closed]

Categories

Resources