Creating an array of all ASCII characters in C# [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
I am currently creating a program that uses arrays to categorize ASCII characters in a text document. I am stuck when it comes to creating the array itself, which is a critical part to the project's functionality. It is also suggested that I make the array out of charfrequency objects, which I know my code for is not quite right for this particular project. I used the code from another similar project, but am unsure how to translate it to a project that reads text from a file. I have included my charfrequency class code for reference as to a general idea of what I'm trying to do. I also need to display the results in a format like this:
H(72) = 1
e(101) = 1
l(108) = 2
o(111) = 1
.(46) = 1
I do not understand programming very well, so detailed explanations with relatively simple terms would be very helpful.
{
public class CharFrequency
{
private char m_character;
private long m_count;
public CharFrequency(char ch)
{
Character = ch;
Count = 0;
}
public CharFrequency(char ch, long charCount)
{
Character = ch;
Count = charCount;
}
public char Character
{
set
{
m_character = value;
}
get
{
return m_character;
}
}
public long Count
{
get
{
return m_count;
}
set
{
if (value < 0)
value = 0;
m_count = value;
}
}
public void Increment()
{
m_count++;
}
public override bool Equals(object obj)
{
bool equal = false;
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
if (this.Character == cf.Character)
equal = true;
return equal;
}
public override int GetHashCode()
{
return m_character.GetHashCode();
}
public override string ToString()
{
String s = String.Format("Character '{0}' ({1})'s frequency is {2}", m_character, (byte)m_character, m_count);
return s;
}
}
}

Since Unicode matches ASCII codes you can just select ASCII range with Enumerable.Range:
var allAscii = Enumerable.Range('\x1', 127).ToArray();
Note that C#/.Net uses UTF-16 (C# and UTF-16 characters) to represent char, but if you only looking for ASCII range it is not a problem (as ASCII covers characters with codes 1-127 it will not conflict with surrogate pairs which are encoded into 2 char in a string).

You can simply store your character frequencies in Dictionary<char, long>.

Perhaps you want to look at this:
http://stackoverflow.com/questions/3665757/c-sharp-convert-char-to-int
If it were me I would create an array in which I stored the character occurances like this:
long[] charCount = new long[256];
And then each time I see a character convert it to its integer value with something like:
int idx = (int)char.GetNumericValue(c);
And then count that character occurance like:
charCount[idx]++;

Related

What's a good workaround for combining long strings? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
Right now I have it .Add() to a List<string> but this can get extremely long.
As you can see here, it added 33554432 strings to the list before finally throwing in the towel.
What can I do better here to workaround this?
StringBuilder
Since using a StringBuilder.AppendLine() it's been better. I haven't encountered the issue since but of course that doesn't mean it CAN'T occur.
My end goal
A lot of you are asking why I'm trying to combine strings and telling me to just read in chunks and such. This isn't really an option, I'm reading from an IMAP stream and I cannot chunk this as it's to be read to search data and to be shown to the user.
The only way I can reliably chunk it is if I were to create a new StringBuilder on-exception and start compiling to that, then maybe once read fully, combine all created StringBuilders into 1 string, that probably wouldn't even work well.
Im reading the Stream with an Extension ReadLine method
Note this is used for other kinds of operations as well, i know the return bit isn't exactly pretty or optimized for this case either.
public static string ReadLine(this Stream stream, ref int bodySize, Encoding encoding, bool returnAsByteString=false) {
bool bodySizeWasSpecified = bodySize > 0;
byte b = 0;
List<byte> bytes = new List<byte>();
while (true) {
#region Try Get 1 Byte from Stream
try {
int i = stream.ReadByte();
if (i == -1) {
break;//stream ended/closed
}
b = (byte)i;
} catch (IOException) {
return null;//timeout
}
#endregion
#region If there's a body size specified, decrement back 1
if (bodySizeWasSpecified) {
bodySize--;
}
#endregion
#region If Byte is \n or \r
if (b == 10 || b == 13) {
#region If ByteArray is Empty and the byte is \n reloop so we dont start with a leading \n
if (bytes.Count == 0 && b == 10) {
continue;
}
#endregion
#region We hit a newline, lets finish the reads here.
break;
#endregion
}
#endregion
#region Add the read byte to the Byte Array
bytes.Add(b);
#endregion
#region Break if bodysize was greater than 0 but now its 0
if (bodySizeWasSpecified && bodySize == 0) {
break;
}
#endregion
}
if (returnAsByteString) {
return string.Join(string.Empty, bytes.ToArray().Select(x => x.ToString("X2")));
}
return encoding.GetString(bytes.ToArray());
}
The alternative is that you could use the StringBuilder class and use the Append() or AppendLine function to add the strings to it. This will create a long string with all of them combined.

How to return multiple values in C# 7? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 5 years ago.
Improve this question
Is it is possible to return multiple values from a method natively?
What do you mean by natively?
C# 7 has a new feature that lets you return more than one value from a method thanks to tuple types and tuple literals.
Take the following function for instance:
(string, string, string) MyCoolFunction() // tuple return type
{
//...
return (firstValue, secondValue, thirdValue);
}
Which can be used like this:
var values = MyCoolFunction();
var firstValue = values.Item1;
var secondValue = values.Item2;
var thirdValue = values.Item3;
Or by using deconstruction syntax
(string first, string second, string third) = MyCoolFunction();
//...
var (first, second, third) = MyCoolFunction(); //Implicitly Typed Variables
Take some time to check out the Documentation, they have some very good examples (this answer's one are based on them!).
You are looking for Tuples. This is an example:
static (int count, double sum) Tally(IEnumerable<double> values)
{
int count = 0;
double sum = 0.0;
foreach (var value in values)
{
count++;
sum += value;
}
return (count, sum);
}
...
var values = ...
var t = Tally(values);
Console.WriteLine($"There are {t.count} values and their sum is {t.sum}");
Example stolen from http://www.thomaslevesque.com/2016/07/25/tuples-in-c-7/
You can also implement like this:
public class Program
{
public static void Main(string[] args)
{
var values=GetNumbers(6,2);
Console.Write(values);
}
static KeyValuePair<int,int> GetNumbers(int x,int y)
{
return new KeyValuePair<int,int>(x,y);
}
}

Check if a string contains not defined characters [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I have a pre defined string as Follows.
string preDefined="abc"; // or i can use char array in here
string value="ac";
string value1="abw";
I need some function to compare value with preDefined.
(value.SomefunctionContains(preDefined)
this function needs to return
value -> true;
value1 -> false
I knew that i can't use contains() or Any(). so plz help
You are just looking for if value has any character that is not in predefined, so this should do it:
!value.Any(x => !predefined.Contains(x))
Or it's more clear using All:
value.All(predefined.Contains);
private bool SomeFunction(string preDefined, string str)
{
foreach (char ch in str)
{
if (!preDefined.Contains(ch))
{
return false;
}
}
return true;
}
You can implement the following method to get the result :
private static bool DoesContain(string predefined, string value)
{
char[] c_pre = predefined.ToCharArray();
char[] c_val = value.ToCharArray();
char[] intersection = c_pre.Intersect(c_val).ToArray();
if (intersection.Length == c_val.Length) {
return true;
}
else {
return false;
}
}
Please not that this solution is a generalized implementation. IT also returns true even if the characters are not in the same order, unless ther include all.

Using unicode characters bigger than 2 bytes with .Net

I'm using this code to generate U+10FFFC
var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});
I know it's for private-use and such, but it does display a single character as I'd expect when displaying it. The problems come when manipulating this unicode character.
If I later do this:
foreach(var ch in s)
{
Console.WriteLine(ch);
}
Instead of it printing just the single character, it prints two characters (i.e. the string is apparently composed of two characters). If I alter my loop to add these characters back to an empty string like so:
string tmp="";
foreach(var ch in s)
{
Console.WriteLine(ch);
tmp += ch;
}
At the end of this, tmp will print just a single character.
What exactly is going on here? I thought that char contains a single unicode character and I never had to worry about how many bytes a character is unless I'm doing conversion to bytes. My real use case is I need to be able to detect when very large unicode characters are used in a string. Currently I have something like this:
foreach(var ch in s)
{
if(ch>=0x100000 && ch<=0x10FFFF)
{
Console.WriteLine("special character!");
}
}
However, because of this splitting of very large characters, this doesn't work. How can I modify this to make it work?
U+10FFFC is one Unicode code point, but string's interface does not expose a sequence of Unicode code points directly. Its interface exposes a sequence of UTF-16 code units. That is a very low-level view of text. It is quite unfortunate that such a low-level view of text was grafted onto the most obvious and intuitive interface available... I'll try not to rant much about how I don't like this design, and just say that not matter how unfortunate, it is just a (sad) fact you have to live with.
First off, I will suggest using char.ConvertFromUtf32 to get your initial string. Much simpler, much more readable:
var s = char.ConvertFromUtf32(0x10FFFC);
So, this string's Length is not 1, because, as I said, the interface deals in UTF-16 code units, not Unicode code points. U+10FFFC uses two UTF-16 code units, so s.Length is 2. All code points above U+FFFF require two UTF-16 code units for their representation.
You should note that ConvertFromUtf32 doesn't return a char: char is a UTF-16 code unit, not a Unicode code point. To be able to return all Unicode code points, that method cannot return a single char. Sometimes it needs to return two, and that's why it makes it a string. Sometimes you will find some APIs dealing in ints instead of char because int can be used to handle all code points too (that's what ConvertFromUtf32 takes as argument, and what ConvertToUtf32 produces as result).
string implements IEnumerable<char>, which means that when you iterate over a string you get one UTF-16 code unit per iteration. That's why iterating your string and printing it out yields some broken output with two "things" in it. Those are the two UTF-16 code units that make up the representation of U+10FFFC. They are called "surrogates". The first one is a high/lead surrogate and the second one is a low/trail surrogate. When you print them individually they do not produce meaningful output because lone surrogates are not even valid in UTF-16, and they are not considered Unicode characters either.
When you append those two surrogates to the string in the loop, you effectively reconstruct the surrogate pair, and printing that pair later as one gets you the right output.
And in the ranting front, note how nothing complains that you used a malformed UTF-16 sequence in that loop. It creates a string with a lone surrogate, and yet everything carries on as if nothing happened: the string type is not even the type of well-formed UTF-16 code unit sequences, but the type of any UTF-16 code unit sequence.
The char structure provides static methods to deal with surrogates: IsHighSurrogate, IsLowSurrogate, IsSurrogatePair, ConvertToUtf32, and ConvertFromUtf32. If you want you can write an iterator that iterates over Unicode characters instead of UTF-16 code units:
static IEnumerable<int> AsCodePoints(this string s)
{
for(int i = 0; i < s.Length; ++i)
{
yield return char.ConvertToUtf32(s, i);
if(char.IsHighSurrogate(s, i))
i++;
}
}
Then you can iterate like:
foreach(int codePoint in s.AsCodePoints())
{
// do stuff. codePoint will be an int will value 0x10FFFC in your example
}
If you prefer to get each code point as a string instead change the return type to IEnumerable<string> and the yield line to:
yield return char.ConvertFromUtf32(char.ConvertToUtf32(s, i));
With that version, the following works as-is:
foreach(string codePoint in s.AsCodePoints())
{
Console.WriteLine(codePoint);
}
As posted already by Martinho, it is much easier to create the string with this private codepoint that way:
var s = char.ConvertFromUtf32(0x10FFFC);
But to loop through the two char elements of that string is senseless:
foreach(var ch in s)
{
Console.WriteLine(ch);
}
What for? You will just get the high and low surrogate that encode the codepoint. Remember a char is a 16 bit type so it can hold just a max value of 0xFFFF. Your codepoint doesn't fit into a 16 bit type, indeed for the highest codepoint you'll need 21 bits (0x10FFFF) so the next wider type would just be a 32 bit type. The two char elements are not characters, but a surrogate pair. The value of 0x10FFFC is encoded into the two surrogates.
While #R. Martinho Fernandes's answer is correct, his AsCodePoints extension method has two issues:
It will throw an ArgumentException on invalid code points (high surrogate without low surrogate or vice versa).
You can't use char static methods that take (char) or (string, int) (such as char.IsNumber()) if you only have int code points.
I've split the code into two methods, one similar to the original but returns the Unicode Replacement Character on invalid code points. The second method returns a struct IEnumerable with more useful fields:
StringCodePointExtensions.cs
public static class StringCodePointExtensions {
const char ReplacementCharacter = '\ufffd';
public static IEnumerable<CodePointIndex> CodePointIndexes(this string s) {
for (int i = 0; i < s.Length; i++) {
if (char.IsHighSurrogate(s, i)) {
if (i + 1 < s.Length && char.IsLowSurrogate(s, i + 1)) {
yield return CodePointIndex.Create(i, true, true);
i++;
continue;
} else {
// High surrogate without low surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
} else if (char.IsLowSurrogate(s, i)) {
// Low surrogate without high surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
yield return CodePointIndex.Create(i, true, false);
}
}
public static IEnumerable<int> CodePointInts(this string s) {
return s
.CodePointIndexes()
.Select(
cpi => {
if (cpi.Valid) {
return char.ConvertToUtf32(s, cpi.Index);
} else {
return (int)ReplacementCharacter;
}
});
}
}
CodePointIndex.cs:
public struct CodePointIndex {
public int Index;
public bool Valid;
public bool IsSurrogatePair;
public static CodePointIndex Create(int index, bool valid, bool isSurrogatePair) {
return new CodePointIndex {
Index = index,
Valid = valid,
IsSurrogatePair = isSurrogatePair,
};
}
}
To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.
Yet another alternative to enumerate the UTF32 characters in a C# string is to use the System.Globalization.StringInfo.GetTextElementEnumerator method, as in the code below.
public static class StringExtensions
{
public static System.Collections.Generic.IEnumerable<UTF32Char> GetUTF32Chars(this string s)
{
var tee = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (tee.MoveNext())
{
yield return new UTF32Char(s, tee.ElementIndex);
}
}
}
public struct UTF32Char
{
private string s;
private int index;
public UTF32Char(string s, int index)
{
this.s = s;
this.index = index;
}
public override string ToString()
{
return char.ConvertFromUtf32(this.UTF32Code);
}
public int UTF32Code { get { return char.ConvertToUtf32(s, index); } }
public double NumericValue { get { return char.GetNumericValue(s, index); } }
public UnicodeCategory UnicodeCategory { get { return char.GetUnicodeCategory(s, index); } }
public bool IsControl { get { return char.IsControl(s, index); } }
public bool IsDigit { get { return char.IsDigit(s, index); } }
public bool IsLetter { get { return char.IsLetter(s, index); } }
public bool IsLetterOrDigit { get { return char.IsLetterOrDigit(s, index); } }
public bool IsLower { get { return char.IsLower(s, index); } }
public bool IsNumber { get { return char.IsNumber(s, index); } }
public bool IsPunctuation { get { return char.IsPunctuation(s, index); } }
public bool IsSeparator { get { return char.IsSeparator(s, index); } }
public bool IsSurrogatePair { get { return char.IsSurrogatePair(s, index); } }
public bool IsSymbol { get { return char.IsSymbol(s, index); } }
public bool IsUpper { get { return char.IsUpper(s, index); } }
public bool IsWhiteSpace { get { return char.IsWhiteSpace(s, index); } }
}

How to check if a string has at least 1 alphabetic character? [duplicate]

This question already has answers here:
How can I generate random alphanumeric strings?
(36 answers)
Closed 2 years ago.
My ASP.NET application requires me to generate a huge number of random strings such that each contain at least 1 alphabetic and numeric character and should be alphanumeric on the whole.
For this my logic is to generate the code again if the random string is numeric:
public static string GenerateCode(int length)
{
if (length < 2 || length > 32)
{
throw new RSGException("Length cannot be less than 2 or greater than 32.");
}
string newcode = Guid.NewGuid().ToString("n").Substring(0, length).ToUpper();
return newcode;
}
public static string GenerateNonNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
}
catch (Exception)
{
throw;
}
while (IsNumeric(newcode))
{
return GenerateNonNumericCode(length);
}
return newcode;
}
public static bool IsNumeric(string str)
{
bool isNumeric = false;
try
{
long number = Convert.ToInt64(str);
isNumeric = true;
}
catch (Exception)
{
isNumeric = false;
}
return isNumeric;
}
While debugging, it is working properly but when I ask it to create 10,000 random strings, its not able to handle it properly. When I export that data to Excel, I find at least 20 strings on an average that are numeric.
Is it a problem with my code or C#? - Mine.
If anyone's looking for code,
public static string GenerateCode(int length)
{
if (length < 2)
{
throw new A1Exception("Length cannot be less than 2.");
}
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var random = new Random();
var result = new string(
Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)])
.ToArray());
return result;
}
public static string GenerateAlphaNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
while (!IsAlphaNumeric(newcode))
{
newcode = GenerateCode(length);
}
}
catch (Exception)
{
throw;
}
return newcode;
}
public static bool IsAlphaNumeric(string str)
{
bool isAlphaNumeric = false;
Regex reg = new Regex("[0-9A-Z]+");
isAlphaNumeric = reg.IsMatch(str);
return isAlphaNumeric;
}
Thanks to all for your ideas.
If you want to stick with the Guid as the generator, you could always validate using a Regex
This will only return true if at least one alpha is present
Regex reg = new Regex("[a-zA-Z]+");
Then just use the IsMatch method to see if your string is valid
That way you don't need the (IMHO rather ugly) try..catch around the Convert.
Update : I see your subsequent comment about actually making your code slower. Are you instantiating the Regex object only once, or every time that the test is being done? If the latter then this will be rather inefficient, and you should consider using a "lazy-loaded" property on your class, e.g.
private Regex reg;
private Regex AlphaRegex
{
get
{
if (reg == null) reg = new Regex("[a-zA-Z]+");
return reg;
}
}
Then just use AlphaRegex.IsMatch() in your method. I would expect this to make a difference.
use name space then using System.Linq; use normal string
check whether the string consist at lest one character or number.
using System.Linq;
string StrCheck = "abcd123";
check the string has characters ---> StrCheck.Any(char.IsLetter)
check the string has numbers ---> StrCheck.Any(char.IsDigit)
if (StrCheck.Any(char.IsLetter) && StrCheck.Any(char.IsDigit))
{
//statement goes here.....
}
sorry for the late reply ...
I didn't quite understand what you want in the string except letters (abc etc) - lets say numbers.
You can generate a random character as following:
Random r = new Random();
r.Next('a', 'z'); //For lowercase
r.Next('A', 'Z'); //For capitals
//or you can convert lowercase to capital:
char c = 'k' + ('A' - 'a');
If you want to create a string:
var s = new StringBuilder();
for(int i = 0; i < length; ++i)
s.Append((char)r.Next('a', 'z' + 1)); //Changed to char
return s.ToString();
Note: I don't know ASP.NET so much, so I just act like it's C#.
To answer your question strictly, using your existing code: there is a problem with your recursion logic, which can be avoided by not using recursion (there is absolutely no reason to use recursion in GenerateNonNumericCode). Do the following instead:
public static string GenerateNonNumericCode(int length)
{
string newcode = GenerateCode(length);
while (IsNumeric(newcode))
{
newcode = GenerateCode(length);
}
return newcode;
}
Other General Notes
Your code is very inefficient--throwing exceptions is expensive, so using try/catch in a loop is therefore slow and pointless. As others have suggested, regex makes more sense (System.Text.RegularExpressions namespace).
Is it a problem with my code or C#?
When in doubt, the problem is almost never C#.
So, I would change the code to this:
static Random r = new Random();
public static string GenerateNonNumericCodeFaster(int length) {
var firstLength = r.Next(0, length - 1);
var secondLength = length - 1 - firstLength;
return GenerateCode(firstLength)
+ (char) r.Next((int)'A', (int)'G')
+ GenerateCode(secondLength);
}
You can keep your GenerateCode function as is. Everything else you toss out. The idea here of course is, rather than testing if the string contains an alphabetic character, you just explicitly PUT one in. In my tests, using this code could generate 10,000 8 character strings in 0.0172963 seconds compared to your code which takes around 52 seconds. So, yeah, this is about 3000 times faster :)

Categories