Large Unicode List to Char[] in C#

I convert a string containing one character per line into a Unicode list with this code:
string MultiLineCharArray = string.Join(Environment.NewLine, CharArray);
var UnicodeList = MultiLineCharArray.Select(str => Convert.ToUInt16(str)).ToList();
When I try to reverse it, the program dies badly; it does not even seem to try:
for (int i = 0; i < UnicodeList.Count; i++)
{
MultiLineCharArray = string.Join(Environment.NewLine, Convert.ToChar(UnicodeList[i]));
}
I need to turn the list back into MultiLineCharArray and then convert that into an array of the corresponding Unicode characters (e.g. code 65 = 'A'), going through each line and combining the results into a single string.
The Unicode list is very long (9,000 elements); maybe that is why the program crashed. Is there a more efficient way to do it?

Use LINQ and String functions
// to populate charArray with dummy chars A through J
var charArray = Enumerable.Range(65, 10).Select(i => (char)i);
// your existing code
var multiLineCharArray = string.Join(Environment.NewLine, charArray);
var unicodeList = multiLineCharArray.Select(str => Convert.ToUInt16(str)).ToList();
// to reverse
var multiLineCharArray1 = new string(unicodeList.Select(u => (char)u).ToArray());
var charArray1 = multiLineCharArray1.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
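If the newline-joined string is only needed for display, the list of ushort values produced by Convert.ToUInt16 can also be turned straight back into a char[] in a single pass, without reassigning a string on every iteration as the question's loop does. This is a minimal sketch assuming each ushort is one UTF-16 code unit; the helper name ToChars and the 9,000-element dummy data are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: convert UTF-16 code units back to characters in one pass.
// Assumes each ushort is a single UTF-16 code unit, as Convert.ToUInt16(char) produces.
static char[] ToChars(List<ushort> codeUnits)
{
    var chars = new char[codeUnits.Count];
    for (int i = 0; i < codeUnits.Count; i++)
    {
        chars[i] = (char)codeUnits[i];
    }
    return chars;
}

// Dummy data: 9,000 code units cycling through 'A'..'Z'.
var unicodeList = Enumerable.Range(0, 9000)
    .Select(i => (ushort)('A' + i % 26))
    .ToList();

char[] charArray = ToChars(unicodeList);
string text = new string(charArray);                              // all characters in one string
string multiLine = string.Join(Environment.NewLine, charArray);   // one character per line

Console.WriteLine(charArray.Length);      // 9000
Console.WriteLine(text.Substring(0, 5));  // ABCDE
```

Allocating the char[] once and building each string with a single call keeps the work linear in the list length, which is what matters at 9,000 elements.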

Related

Get first word on every new line in a long string?

I am trying to add a leaderboard to my Unity app.
I have a long string like the one below (just an example; the actual string is pipe-delimited HTTP data from my web service, not stored manually):
string str ="name1|10|junk data.....\n
name2|9|junk data.....\n
name3|8|junk data.....\n
name4|7|junk data....."
I want to get the first word (the text before the first pipe '|', like name1, name2...) from every line and store it in an array, and then get the numbers (10, 9, 8... after the '|') and store them in another array.
Does anyone know the best way to do this?
Fiddle here: https://dotnetfiddle.net/utp4HK
Code below. You may want to revisit the algorithm for performance, but if that is not an issue, this will do the trick:
using System;
public class Program
{
public static void Main()
{
string str ="name1|10|junk data.....\nname2|9|junk data.....\nname3|8|junkdata.....\nname4|7|junk data.....";
foreach (var line in str.Split('\n'))
{
Console.WriteLine(line.Split('|')[0]);
}
}
}
First split by new-line characters:
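// accept both "\r\n" and "\n"; Environment.NewLine alone is "\r\n" on Windows and would not split "\n"-separated data
string[] lines = str.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);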
Then you can use LINQ to get both arrays:
var data = lines.Select(l => l.Trim().Split('|')).Where(arr => arr.Length > 1);
string[] names = data.Select(arr => arr[0].Trim()).ToArray();
string[] numbers = data.Select(arr => arr[1].Trim()).ToArray();
Check out this link on splitting strings: http://msdn.microsoft.com/en-us/library/ms228388.aspx
You could first create an array of strings (one for each line) by splitting the long string with \n as the delimiter.
Then you could split each line with | as the delimiter. The name would be at index 0 of the resulting array and the number at index 1.
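The two-step split described above can be sketched as follows. It assumes lines may end in either "\n" or "\r\n" (the sample deliberately mixes both) and that the second field is an integer score; lines that do not fit that shape are skipped so the two arrays stay aligned. Variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

string str = "name1|10|junk data.....\nname2|9|junk data.....\r\nname3|8|junk data.....\nname4|7|junk data.....";

var names = new List<string>();
var scores = new List<int>();

// Step 1: split into lines, accepting both "\r\n" and "\n" line endings.
string[] lines = str.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

foreach (string line in lines)
{
    // Step 2: split each line on '|'; index 0 is the name, index 1 the number.
    string[] fields = line.Split('|');

    // Skip malformed lines (too few fields or a non-numeric score)
    // so that names[i] always belongs to scores[i].
    if (fields.Length < 2 || !int.TryParse(fields[1].Trim(), out int score))
        continue;

    names.Add(fields[0].Trim());
    scores.Add(score);
}

string[] nameArray = names.ToArray();
int[] scoreArray = scores.ToArray();

Console.WriteLine(string.Join(", ", nameArray));  // name1, name2, name3, name4
Console.WriteLine(string.Join(", ", scoreArray)); // 10, 9, 8, 7
```

Parsing the score to int up front makes it easy to sort the leaderboard numerically later; keep it as a string if you only need to display it.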
First of all, you can't write a string literal that spans multiple lines without using a verbatim string literal. With a verbatim string literal, you can split your string on \r\n or Environment.NewLine like this:
string str = @"name1|10|junk data.....
name2|9|junk data.....
name3|8|junk data.....
name4|7|junk data.....";
var array = str.Split(new []{Environment.NewLine},
StringSplitOptions.RemoveEmptyEntries);
foreach (var item in array)
{
Console.WriteLine(item.Split(new[]{"|"},
StringSplitOptions.RemoveEmptyEntries)[0].Trim());
}
Output:
name1
name2
name3
name4
Try this:
string str ="name1|10|junk data.....\n" +
"name2|9|junk data.....\n" +
"name3|8|junk data.....\n" +
"name4|7|junk data.....";
string[] tempArray1 = str.Split('\n');
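string[] tempArray2 = null;
string[,] newArray = null;
for (int i = 0; i < tempArray1.Length; i++)
{
tempArray2 = tempArray1[i].Split('|');
// newArray starts out null, so reading newArray[0, 0] would throw;
// allocate it on the first line instead (sized from that line,
// assuming every line has the same number of fields)
if (newArray == null)
{
newArray = new string[tempArray1.Length, tempArray2.Length];
}
for (int j = 0; j < tempArray2.Length && j < newArray.GetLength(1); j++)
{
newArray[i,j] = tempArray2[j];
}
}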

Parse for words starting with # character in a string

I have to write a program that parses a string for words starting with '#' and returns those words along with the # symbol.
I have tried something like:
char[] delim = { '#' };
string[] strArr = commenttext.Split(delim);
return strArr;
But it returns all the words without '#' in an array.
I need something pretty straightforward. No LINQ-like things.
If the string is "abc #ert #xyz" then I should get back #ert and #xyz.
If you define "word" as "separated by spaces" then this would work:
string[] strArr = commenttext.Split(' ')
.Where(w => w.StartsWith("#"))
.ToArray();
If you need something more complex, a Regular Expression might be more appropriate.
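A minimal sketch of the regular-expression route, written with a plain foreach to respect the "no LINQ" constraint. It assumes a tag is '#' followed by one or more word characters (\w matches letters, digits and underscore); the pattern and the helper name FindHashWords are assumptions to adapt, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: return every '#'-prefixed word, keeping the '#'.
// Assumption: a tag is '#' followed by word characters (\w).
static string[] FindHashWords(string text)
{
    MatchCollection matches = Regex.Matches(text, @"#\w+");
    var result = new List<string>();
    foreach (Match m in matches)
    {
        result.Add(m.Value);
    }
    return result.ToArray();
}

string[] tags = FindHashWords("abc #ert #xyz");
Console.WriteLine(string.Join(" ", tags)); // #ert #xyz
```

Unlike splitting on spaces, this also finds tags glued to other text ("abc#ert" yields "#ert"), and it stops at non-word characters, so "#C#" yields "#C".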
"I need something pretty straightforward. No LINQ-like things."
The non-Linq equivalent would be:
var words = commenttext.Split(' ');
List<string> temp = new List<string>();
foreach(string w in words)
{
if(w.StartsWith("#"))
temp.Add(w);
}
string[] strArr = temp.ToArray();
If you're against using LINQ, which you should not be unless you're required to use older .NET versions, an approach along these lines would suit your needs.
string[] words = commenttext.Split(' ');
for (int i = 0; i < words.Length; i++)
{
string word = words[i];
// split on spaces, then keep only the words that begin with '#'
if (word.StartsWith("#"))
{
// save in array / list
}
}
const string test = "#Amir abcdef #Stack #C# mnop xyz";
var hashWords = test.Split(' ').Where(m => m.StartsWith("#")).ToList();
foreach (var b in hashWords)
{
Console.WriteLine(b); // prints the word including its leading '#'
}
Console.ReadKey();

C# Regex.Split and Regular expression

I have a string that I need to split twice, keeping the part that comes after a special character.
Let's say:
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString,";#");
This gives me an array of two elements, each of the form value|GUID. But I need only the GUIDs, like:
[0]82e146e7-bc85-4bd4-a691-23d55c686f4b
[1]55140947-00d0-4d75-9b5c-00d8d5ab8436
Is there a way to do it in one or two lines?
You can do this, but just because you can do it in one line doesn't mean you should (readability suffers if you get too fancy here). There's obviously no validation here at all.
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
string[] guids = Regex.Split(myString, ";#")
.SelectMany(s => Regex.Split(s, @"\|").Skip(1))
.ToArray();
Assert.AreEqual(2, guids.Length);
Assert.AreEqual("82e146e7-bc85-4bd4-a691-23d55c686f4b", guids[0]);
Assert.AreEqual("55140947-00d0-4d75-9b5c-00d8d5ab8436", guids[1]);
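Since the one-liner above does no validation, here is a hedged sketch that keeps the same two splits but only accepts the text after the first '|' when Guid.TryParse recognizes it. The helper name ExtractGuids is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split on ";#", take the text after the first '|',
// and keep it only if it parses as a GUID.
static List<Guid> ExtractGuids(string input)
{
    var guids = new List<Guid>();
    foreach (string entry in input.Split(new[] { ";#" }, StringSplitOptions.RemoveEmptyEntries))
    {
        int bar = entry.IndexOf('|');
        if (bar < 0)
            continue; // no "value|" prefix, skip this entry
        if (Guid.TryParse(entry.Substring(bar + 1), out Guid parsed))
            guids.Add(parsed);
    }
    return guids;
}

string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
List<Guid> found = ExtractGuids(myString);
foreach (Guid g in found)
    Console.WriteLine(g); // default "D" format: lowercase, hyphenated
```

Returning Guid values rather than strings means malformed entries are dropped instead of silently passed along.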
You could easily do this without a regex if the last part of each entry is always a GUID (36 characters):
string[] guids = myString.Split(';').Select(c => c.Substring(c.Length - 36)).ToArray();
string[] guids = myString.Split(';').Select(x => x.Split('|')[1]).ToArray();
string myString = "Word 2010|82e146e7-bc85-4bd4-a691-23d55c686f4b;#Videos|55140947-00d0-4d75-9b5c-00d8d5ab8436";
//split the string by ";#"
string[] results = myString.Split(new string[] { ";#" }, StringSplitOptions.RemoveEmptyEntries);
//remove the "value|" part
results[0] = results[0].Substring(results[0].IndexOf('|') + 1);
results[1] = results[1].Substring(results[1].IndexOf('|') + 1);
//Same as above, but in a for loop; useful if there are more than 2 GUIDs to find
//for(int i = 0; i < results.Length; i++)
// results[i] = results[i].Substring(results[i].IndexOf('|') + 1);
foreach(string result in results)
Console.WriteLine(result);
var guids = Regex
.Matches(myString, @"HEX{8}-HEX{4}-HEX{4}-HEX{4}-HEX{12}".Replace("HEX", "[A-Fa-f0-9]"))
.Cast<Match>()
.Select(m => m.Value)
.ToArray();

How to replace special characters with their equivalent (such as " á " for " a") in C#?

I need to get the Portuguese text content out of an Excel file and create an xml which is going to be used by an application that doesn't support characters such as "ç", "á", "é", and others. And I can't just remove the characters, but replace them with their equivalent ("c", "a", "e", for example).
I assume there's a better way to do it than checking each character individually and replacing it with its counterpart. Any suggestions on how to do it?
You could try something like
var decomposed = "áéö".Normalize(NormalizationForm.FormD);
var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
var newString = new String(filtered.ToArray());
This decomposes the accented characters, filters out the accents and creates a new string. Combining diacritics fall into the NonSpacingMark Unicode category.
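The same decompose-and-filter idea, wrapped as a reusable function. This is a sketch: the name RemoveDiacritics is hypothetical, it uses CharUnicodeInfo.GetUnicodeCategory (equivalent to char.GetUnicodeCategory here), and it recomposes the result with FormC. Characters that have no canonical decomposition, such as 'ø' or 'ß', are left unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: decompose (FormD), drop combining marks,
// then recompose (FormC) whatever remains.
static string RemoveDiacritics(string input)
{
    string decomposed = input.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        // 'ç' becomes 'c' + U+0327 (combining cedilla), which is a NonSpacingMark
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacritics("ação é útil")); // acao e util
```

A StringBuilder avoids the intermediate LINQ sequence; for Excel-sized inputs either version is fine.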
string text = "..."; // the text to replace characters in
Dictionary<char, char> replacements = new Dictionary<char, char>();
// add your characters to the replacements dictionary,
// key: char to replace
// value: replacement char
replacements.Add('ç', 'c');
// ... add the remaining characters the same way
System.Text.StringBuilder replaced = new System.Text.StringBuilder();
for (int i = 0; i < text.Length; i++)
{
char character = text[i];
if (replacements.ContainsKey(character))
{
replaced.Append(replacements[character]);
}
else
{
replaced.Append(character);
}
}
// 'replaced' is now your converted text
For future reference, this is exactly what I ended up with:
string temp = stringToConvert.Normalize(NormalizationForm.FormD);
IEnumerable<char> filtered = temp;
filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
string final = new string(filtered.ToArray());
Performance is better with this solution (note that the pattern also removes digits and punctuation, and keeps any '|' characters):
string test = "áéíóúç";
string result = Regex.Replace(test.Normalize(NormalizationForm.FormD), "[^A-Za-z| ]", string.Empty);

C# Linq non-vowels

Given this string:
string str = "dry sky one two try";
var nonVowels = str.Split(' ').Where(x => !x.Contains("aeiou")); // not working
How can I extract the words that contain no vowels?
Come on now y'all. IndexOfAny is where it's at. :)
// if this is public, it's vulnerable to people setting individual elements.
private static readonly char[] Vowels = "aeiou".ToCharArray();
// C# 3
var nonVowelWords = str.Split(' ').Where(word => word.IndexOfAny(Vowels) < 0);
// C# 2
List<string> words = new List<string>(str.Split(' '));
words.RemoveAll(delegate(string word) { return word.IndexOfAny(Vowels) >= 0; });
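A small runnable version of the IndexOfAny approach, for checking the result against the question's sample. As in the answer above, it assumes lower-case vowels only; add "AEIOU" to the array if capitalized words must be handled. NonVowelWords is a hypothetical name.

```csharp
using System;
using System.Linq;

// Vowels to look for; lower case only, as in the question.
char[] vowels = "aeiou".ToCharArray();

// Keep a word only if IndexOfAny finds none of the vowels (it returns -1).
string[] NonVowelWords(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Where(word => word.IndexOfAny(vowels) < 0)
        .ToArray();

string[] result = NonVowelWords("dry sky one two try");
Console.WriteLine(string.Join(", ", result)); // dry, sky, try
```

IndexOfAny scans each word once for all vowels, so no per-vowel Contains calls are needed.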
This should work:
var nonVowels = str.Split(' ').Where(x => x.Intersect("aeiou").Count() == 0);
String.Contains(string) looks for the whole substring "aeiou", not for any one of those characters. Enumerable.Contains only checks for a single char, so you'd need multiple calls. Intersect handles this case.
Something like:
var nonVowelWords = str.Split(' ').Where(x => !Regex.IsMatch(x, @"[aeiou]"));
string str = "dry sky one two try";
char[] vowels = { 'a', 'e', 'i', 'o', 'u' };
// keep the words in which no character is a vowel
var nonVowels = str.Split(' ')
.Where(word => !word.Any(c => vowels.Contains(c)));
