How to build a Unicode string with emojis in C#?

I've been using the following code to convert Unicode code points into a string. The code points are read from a text file as a string array, such as ["1F3F3", "FE0F", "200D", "1F308"]. These particular code points make up the 🏳️‍🌈 emoji and are taken from the unicode.org emoji list (#1553 on the page).
public static void PrintEmoji(params string[] unicodeParts)
{
    var unicodeBuilder = new StringBuilder();
    foreach (var unicodePart in unicodeParts)
    {
        unicodeBuilder.Append((char) Convert.ToInt32(unicodePart, 16));
    }
    if (unicodeBuilder.ToString() is var unicodeResult && !string.IsNullOrWhiteSpace(unicodeResult))
        Console.WriteLine(unicodeResult);
}
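But this code only works when the input consists of UTF-16 code units (for example, 😀 U+1F600 given as the surrogate pair "D83D", "DE00"), and not for Unicode code points such as "1F3F3", because the (char) cast truncates any value above 0xFFFF. How should I modify my method so that it works with code points as well?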

Thanks to JosefZ, the following solution seems to work fine.
public static void PrintEmoji(params string[] unicodeParts)
{
    var unicodeBuilder = new StringBuilder();
    foreach (var unicodePart in unicodeParts)
    {
        // char.ConvertFromUtf32 returns a string: one char for U+0000..U+FFFF,
        // and a surrogate pair (two chars) for code points above U+FFFF.
        unicodeBuilder.Append(char.ConvertFromUtf32(Convert.ToInt32(unicodePart, 16)));
    }
    if (unicodeBuilder.ToString() is var unicodeResult && !string.IsNullOrWhiteSpace(unicodeResult))
        Console.WriteLine(unicodeResult);
}
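A minimal sketch of the same fix as a reusable function that returns the string instead of printing it, so the result can be checked (the EmojiBuilder and FromCodePoints names are hypothetical). Because the first and last code points of 🏳️‍🌈 are above U+FFFF, each one becomes a two-char surrogate pair, so the four code points produce a six-char string:

```csharp
using System.Globalization;
using System.Text;

public static class EmojiBuilder
{
    // Builds a string from hex code points such as "1F3F3", "FE0F", "200D", "1F308".
    // char.ConvertFromUtf32 returns one char for U+0000..U+FFFF and a surrogate pair
    // for anything above U+FFFF, so no code point is truncated.
    public static string FromCodePoints(params string[] hexCodePoints)
    {
        var builder = new StringBuilder();
        foreach (var hex in hexCodePoints)
        {
            int codePoint = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            builder.Append(char.ConvertFromUtf32(codePoint));
        }
        return builder.ToString();
    }
}
```

On Windows you may also need to set Console.OutputEncoding = Encoding.UTF8 before the console can display the result.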

Related

How to convert greek characters to HTML characters

I would like to be able to do this kind of operation:
var specialCharactersString = "αβ";
var encodedString = WebUtility.HtmlEncode(specialCharactersString);
Console.WriteLine(encodedString); // desired result: &alpha;&beta;
We work with an external database that stores data using both notations, &alpha;&beta; and &#945;&#946;. We want to be able to query both forms when the end user enters αβ.
So far, I tried:
WebUtility.HtmlEncode
HttpUtility.HtmlEncode
Encoding.GetEncoding(1253)
Thanks to @claudiom248, the answer was in another Stack Overflow post:
How to convert currency symbol to corresponding HTML entity
https://github.com/degant/web-utility-wrapper/blob/master/WebUtilityWrapper.cs
Unicode characters and HTML have always been a problem. Here is a helper I use; I hope it helps.
Update: The source is from https://www.codeproject.com/Articles/20255/Full-HTML-Character-Encoding-in-C with very minor modifications.
Usage: specialCharactersString.HtmlEncode()
using System;

public static class TextHelpers
{
    public static string HtmlEncode(this string text)
    {
        // HttpUtility escapes the HTML-special characters (&, <, >, quotes) first.
        var chars = System.Web.HttpUtility.HtmlEncode(text).ToCharArray();
        var result = new System.Text.StringBuilder(text.Length + (int)(text.Length * 0.1));
        foreach (char c in chars)
        {
            // Any remaining non-ASCII UTF-16 code unit becomes a numeric reference, e.g. α -> &#945;.
            int codeUnit = Convert.ToInt32(c);
            if (codeUnit > 127)
                result.AppendFormat("&#{0};", codeUnit);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
As mentioned by claudiom248, the built-in .NET encoders cannot map characters outside ASCII, such as Greek letters, to named HTML entities like &alpha;. You can certainly pull in a third-party library, but if you'd like to avoid the additional dependency, or if you only ever need to handle a small subset of characters, you can maintain a simple dictionary lookup.
void Main()
{
    var specialCharactersString = "αβ";
    var sb = new StringBuilder();
    foreach (var specialChar in specialCharactersString)
    {
        // Append the mapped entity; keep characters that have no mapping unchanged.
        if (_dict.TryGetValue(specialChar, out var mappedSpecialChar))
        {
            sb.Append(mappedSpecialChar);
        }
        else
        {
            sb.Append(specialChar);
        }
    }
    Console.WriteLine(sb.ToString());
}
private readonly Dictionary<char, string> _dict = new Dictionary<char, string>
{
    { 'α', "&alpha;" },
    { 'β', "&beta;" }
};
This will output &alpha;&beta; as expected.
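Since the question asks to query both notations, one possible sketch generates both forms of the user's input so the database can be searched for either one. Named entities need a lookup table (the map below is a hypothetical, deliberately tiny one covering only α and β), while numeric references can be computed from the character's UTF-16 value (α is U+03B1, i.e. 945). The GreekEntities class and its method names are assumptions for illustration:

```csharp
using System.Collections.Generic;
using System.Text;

public static class GreekEntities
{
    // Hypothetical, deliberately tiny map; add the letters your data actually uses.
    private static readonly Dictionary<char, string> Named = new Dictionary<char, string>
    {
        { '\u03B1', "&alpha;" }, // α
        { '\u03B2', "&beta;" },  // β
    };

    // "αβ" -> "&alpha;&beta;"; characters without a named entity are kept unchanged.
    public static string ToNamedEntities(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (Named.TryGetValue(c, out var entity)) sb.Append(entity);
            else sb.Append(c);
        }
        return sb.ToString();
    }

    // "αβ" -> "&#945;&#946;"; ASCII is kept unchanged. Works per UTF-16 char, so it is
    // only correct for characters in the Basic Multilingual Plane (all Greek letters are).
    public static string ToNumericEntities(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c > 127) sb.Append("&#").Append((int)c).Append(';');
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

You would then pass both strings as parameters to your query and match either one.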

Is there any way to append StringBuilder horizontally in C#?

I am trying to append two StringBuilders so that they produce something like:
Device # 1 2 3
Pt.Name ABC DEF GHI
What I have tried:
class Program
{
    class Device
    {
        public int ID { get; set; }
        public string PatName { get; set; }
    }
    static void Main(string[] args)
    {
        var datas = new List<Device>
        {
            new Device { ID = 1, PatName = "ABC" },
            new Device { ID = 2, PatName = "DEF" },
            new Device { ID = 3, PatName = "GHI" }
        };
        // there is a collection which has all this information
        StringBuilder sb = new StringBuilder();
        sb.AppendFormat("{0} {1}", "Device #", "Pt.Name").AppendLine();
        foreach (var data in datas)
        {
            var deviceId = data.ID;
            var patName = data.PatName;
            sb.AppendFormat("{0} {1}", deviceId, patName).AppendLine();
        }
        Console.WriteLine(sb);
    }
}
but it prints the data vertically, like this:
Device # Pt.Name
1 ABC
2 DEF
3 GHI
and if I remove the last AppendLine(), everything is appended to the end of the same line.
I want to use only one stringbuilder followed by only one foreach loop.
1. You could do it like this:
StringBuilder sb = new StringBuilder();
sb.Append("Device #");
foreach (var data in datas)
    sb.Append($" {data.ID}");
sb.AppendLine();
sb.Append("Pt.Name");
foreach (var data in datas)
    sb.Append($" {data.PatName}");
2. If you want to loop only once, you can use two StringBuilders:
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
sb1.Append("Device #");
sb2.Append("Pt.Name");
foreach (var data in datas)
{
    sb1.Append($" {data.ID}");
    sb2.Append($" {data.PatName}");
}
sb1.AppendLine().Append(sb2.ToString());
3. You could also use string.Join(), which builds its result with a string builder internally, to write a one-liner, at the cost of two extra Select calls:
string result = $"Device # {string.Join(" ", datas.Select(x => x.ID))}\r\nPt.Name {string.Join(" ", datas.Select(x => x.PatName))}";
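If you really want one StringBuilder and one loop, as the question asks, one possible sketch (not taken from any answer here; the HorizontalTable name and the column-width rule are assumptions) remembers where the first line ends and grows it with StringBuilder.Insert, while the second line grows with Append. Each column is padded to the wider of the ID and the name so the rows line up:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Same shape as the Device class in the question.
public class Device
{
    public int ID { get; set; }
    public string PatName { get; set; }
}

public static class HorizontalTable
{
    // One StringBuilder, one loop: each ID is inserted at the end of the first line,
    // and each name is appended to the end of the second line.
    public static string Build(IEnumerable<Device> devices)
    {
        var sb = new StringBuilder("Device #");
        int firstLineEnd = sb.Length;                  // insertion point for the next ID
        sb.Append('\n').Append("Pt.Name".PadRight(8)); // both headers are 8 chars wide
        foreach (var device in devices)
        {
            string id = device.ID.ToString();
            int width = Math.Max(id.Length, device.PatName.Length);
            string idCell = " " + id.PadRight(width);
            sb.Insert(firstLineEnd, idCell);
            firstLineEnd += idCell.Length;
            sb.Append(' ').Append(device.PatName.PadRight(width));
        }
        return sb.ToString();
    }
}
```

Each Insert shifts every character after it, so the cost grows quadratically with the number of columns; for a handful of devices that does not matter, but for many columns the two-builder option is cheaper.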
I love your question because it is based on avoiding two assumptions: 1) that strings are always printed left to right, and 2) that newlines always advance the point of printing downwards.[1]
Others have given answers that will probably meet your needs, but I wanted to write about why your way of thinking won't work. The assumptions above are so ingrained in people's thinking about how strings and terminals work that I'm sure many people thought your question was odd or even naïve; I did at first.
StringBuilder doesn't print strings to the screen. Somewhere, I suspect, you are calling Console.Write to print the string. StringBuilder lets you convert non-string values to strings and concatenate strings more efficiently than repeated String.Format calls and the + operator; see Immutability and the StringBuilder class.
When you are done using StringBuilder, what you have is a string of characters. It's called a string because it is a 1D structure: one character after another. There is nothing special about the newline characters in the string;[2] they are just characters in the sequence. Nothing in the string specifies the position of the characters, other than that each one comes after the previous one. When you call something like Console.Write, the position of each character on the screen is defined by the implementation of that method, by the implementation of the terminal, or both. They follow the conventions of our language, i.e. each character goes to the right of the previous one. When Console.Write encounters a newline, the next character is printed in the first position of the line below the current one.
If you are using String, StringBuilder and Console, you can write code that creates a single string with the pieces of text in the places you want, so that when Console.Write follows the left-to-right, top-to-bottom conventions, your text appears correctly. This is what the other answers here do.
Alternatively, you could find a library that gives you more control over where text is printed on the screen. These were very popular before graphical user interfaces, when people built interactive applications in text terminals. Two examples I remember are the CRT unit for Pascal and ncurses for C. If you want to investigate this approach, I'd suggest doing some web searches or even asking another question here. Many of the terminal applications you see at banks, hospitals and airlines were built with such libraries.
[1] This may be different on systems set up for languages that are not written like English or other Latin-script languages.
[2] The character or characters that represent a new line differ between operating systems.
Normally you cannot append horizontally to the right side of a StringBuilder, so you could roll your own extension method, such as:
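static class SbExtensions
{
    // Assumes the builder holds exactly two lines separated by "\n": the key is
    // appended to line 1 and the value to line 2, each padded to 10 characters.
    public static StringBuilder AppendRight(this StringBuilder sb, string key, string value)
    {
        string output = sb.ToString();
        string[] splitted = output.Split("\n");
        splitted[0] += key.PadRight(10, ' ');
        splitted[1] += value.PadRight(10, ' ');
        output = string.Join('\n', splitted);
        // Returns a new builder; the caller must use the return value.
        return new StringBuilder(output);
    }
}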
Simple solution:
string columns = "Device #".PadRight(10, ' ') + "\n" + "Pt.Name".PadRight(10, ' ');
StringBuilder sb = new StringBuilder(columns);
foreach (var data in datas)
{
    sb = sb.AppendRight(data.ID.ToString(), data.PatName);
}
Console.WriteLine(sb.ToString());
Console.ReadLine();
A more complex, dynamic one:
Just another solution, using the MathNet.Numerics library (https://numerics.mathdotnet.com/).
Introduce an array property in your entity class:
class Device
{
    public int ID { get; set; }
    public string PatName { get; set; }
    public string[] Array
    {
        get
        {
            return new string[] { ID.ToString(), PatName };
        }
    }
}
Then, in the Main method:
static void Main(string[] args)
{
    var datas = new List<Device>
    {
        new Device { ID = 1, PatName = "ABC" },
        new Device { ID = 2, PatName = "DEF" },
        new Device { ID = 3, PatName = "GHI" }
    };
    var MatrixValues = datas
        .SelectMany(x => x.Array)
        .Select((item, index) => new KeyValuePair<double, string>(Convert.ToDouble(index), item)).ToArray();
    var matrixIndexes = MatrixValues.Select(x => x.Key);
    var M = Matrix<double>.Build;
    var C = M.Dense(datas.Count, datas.First().Array.Count(), matrixIndexes.ToArray());
    var TR = C.Transpose();
    string columns = "Device #".PadRight(10, ' ') + "\n" + "Pt.Name".PadRight(10, ' ');
    StringBuilder sb = new StringBuilder(columns);
    for (int i = 0; i < TR.Storage.Enumerate().Count(); i += 2)
    {
        sb = sb.AppendRight(MatrixValues[i].Value, MatrixValues[i + 1].Value);
    }
    Console.WriteLine(sb.ToString());
    Console.ReadLine();
}
You also need these using directives:
using MathNet.Numerics.LinearAlgebra;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
PS: This may not be your desired solution, as it creates a new StringBuilder every time you append data.
This should get you going:
Code
StringBuilder deviceSB = new StringBuilder();
StringBuilder patNameSB = new StringBuilder();
deviceSB.Append("Device #".PadRight(9));
patNameSB.Append("Pt.Name".PadRight(9));
foreach (var data in datas)
{
    // Pad each ID into a 4-character column so it sits under a 3-letter name plus a space.
    deviceSB.Append($"{data.ID}".PadLeft(2).PadRight(4));
    patNameSB.Append($"{data.PatName} ");
}
deviceSB.AppendLine();
deviceSB.Append(patNameSB);
Or, optionally, without a loop:
StringBuilder result = new StringBuilder();
StringBuilder s1 = new StringBuilder("Device # ".PadRight(9));
StringBuilder s2 = new StringBuilder("Pt.Name ".PadRight(9));
s1 = s1.AppendJoin(String.Empty, datas.Select(x => x.ID.ToString().PadLeft(2).PadRight(4)));
s2 = s2.AppendJoin(' ', datas.Select(x => x.PatName));
result = result.Append(s1).AppendLine().Append(s2);
Note that I took the idea for the second option from @AshkanMobayenKhiabani, but instead of using strings I stick to StringBuilder, since it avoids creating intermediate strings.
Both options produce the same output.

How would I access a txt file and split the links

Alright, I have a program that grabs links from a website and puts them into a .txt file, BUT the links aren't separated onto their own lines, and I need to separate them somehow without doing it manually. Here is the code that grabs the links from the website, writes them to a text file, and then reads the text file back:
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
    var client = new WebClient();
    string text = client.DownloadString("https://currentlinks.com");
    File.WriteAllText("C:/ProgramData/oof.txt", text);
    string searchKeyword = "https://foobar.to/showthread.php";
    string fileName = "C:/ProgramData/oof.txt";
    string[] textLines = File.ReadAllLines(fileName);
    List<string> results = new List<string>();
    foreach (string line in textLines)
    {
        if (line.Contains(searchKeyword))
        {
            results.Add(line);
        }
        var sb = new StringBuilder();
        foreach (var item in results)
        {
            sb.Append(item);
        }
        textBox1.Text = sb.ToString();
        var parsed = textBox1;
        TextWriter tw = new StreamWriter("C:/ProgramData/parsed.txt");
        // write lines of text to the file
        tw.WriteLine(parsed);
        // close the stream
        tw.Close();
    }
}
You are getting all the links (URLs) in one single string. There is no straightforward way to get the URLs individually without making some assumptions.
With the sample data you shared, I assume that the URLs in the string follow a simple format without anything fancy in them: each one starts with http, and no URL contains "http" anywhere else.
With the above assumptions, I suggest the following code.
// Sample data as shared by the OP
string data = "https://forum.to/showthread.php?tid=22305https://forum.to/showthread.php?tid=22405https://forum.to/showthread.php?tid=22318";
// Split the string on "http".
var items = data.Split(new [] {"http"},StringSplitOptions.RemoveEmptyEntries).ToList();
// At this point, none of the strings in the items collection start with "http",
// so they look like this:
// s://forum.to/showthread.php?tid=22305
// s://forum.to/showthread.php?tid=22405
// s://forum.to/showthread.php?tid=22318
// So we need to add "http" back to the start of each item.
items = items.Select(i => "http" + i).ToList();
// After this, they look like this:
// https://forum.to/showthread.php?tid=22305
// https://forum.to/showthread.php?tid=22405
// https://forum.to/showthread.php?tid=22318
// Now create a single string with a newline between items,
// so that each URL is on its own line.
var text = String.Join("\r\n", items);
// Then write the text to the file.
File.WriteAllText("C:/ProgramData/oof.txt", text);
This should help you resolve your issue.
.Split way
Could you use yourString.Split("https://");?
Example:
//This simple example assumes that all links are https (not http)
string contents = "https://www.example.com/dogs/poodles/poodle1.htmlhttps://www.example.com/dogs/poodles/poodle2.html";
const string Prefix = "https://";
var linksWithoutPrefix = contents.Split(Prefix, StringSplitOptions.RemoveEmptyEntries);
//using System.Linq
var linksWithPrefix = linksWithoutPrefix.Select(l => Prefix + l);
foreach (var match in linksWithPrefix)
{
    Console.WriteLine(match);
}
Regex way
Another option is to use a regular expression.
Failed: I could not find or write the right regex before I had to go, so the attempt below does not work yet (see the last comment).
string contents = "http://www.example.com/dogs/poodles/poodle1.htmlhttp://www.example.com/dogs/poodles/poodle2.html";
//From https://regexr.com/ (requires using System.Text.RegularExpressions;)
var rgx = new Regex(@"(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*");
var matches = rgx.Matches(contents);
foreach (var match in matches)
{
    Console.WriteLine(match);
}
//This finds 'http://www.example.com/dogs/poodles/poodle1.htmlhttp' (note the htmlhttp at the end)
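A regex that does split the sample, offered as a sketch under the same assumption as the other answers (every link starts with http:// or https://, and no link contains that text inside it): match lazily from one protocol prefix up to the next prefix or the end of the input, using a lookahead so the next prefix is not consumed. The LinkSplitter name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkSplitter
{
    // "https?://" starts a link; ".+?" grows it one char at a time until the
    // lookahead sees the next "http://"/"https://" or the end of the input.
    private static readonly Regex LinkPattern =
        new Regex(@"https?://.+?(?=https?://|$)", RegexOptions.Singleline);

    public static List<string> Split(string contents) =>
        LinkPattern.Matches(contents).Cast<Match>().Select(m => m.Value).ToList();
}
```

The resulting list can be written one link per line with File.WriteAllLines(path, links).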

How to compare and convert emoji characters in C#

I am trying to figure out how to check whether a string contains a specific emoji. For example, look at the following two emoji:
Bicyclist: http://unicode.org/emoji/charts/full-emoji-list.html#1f6b4
US Flag: http://unicode.org/emoji/charts/full-emoji-list.html#1f1fa_1f1f8
Bicyclist is U+1F6B4, and the US flag is U+1F1FA U+1F1F8.
However, the emoji to check for are provided to me in an array like this, with just the numerical value in strings:
var checkFor = new string[] {"1F6B4","1F1FA-1F1F8"};
How can I convert those array values into actual unicode characters and check to see if a string contains them?
I can get something working for the Bicyclist, but for the US flag I'm stumped.
For the Bicyclist, I'm doing the following:
const string comparisonStr = "..."; //some string containing text and emoji
var hexVal = Convert.ToInt32(checkFor[0], 16);
var strVal = Char.ConvertFromUtf32(hexVal);
//now I can successfully do the following check
var exists = comparisonStr.Contains(strVal);
But this will not work with the US Flag because of the multiple code points.
You already got past the hard part. All you were missing is splitting the value in the array on '-', converting each part, and combining the resulting Unicode characters before performing the check.
Here is a sample program that should work:
static void Main(string[] args)
{
    const string comparisonStr = "bicyclist: \U0001F6B4, and US flag: \U0001F1FA\U0001F1F8"; //some string containing text and emoji
    var checkFor = new string[] { "1F6B4", "1F1FA-1F1F8" };
    foreach (var searchStringInHex in checkFor)
    {
        string searchString = string.Join(string.Empty, searchStringInHex.Split('-')
            .Select(hex => char.ConvertFromUtf32(Convert.ToInt32(hex, 16))));
        if (comparisonStr.Contains(searchString))
        {
            Console.WriteLine($"Found {searchStringInHex}!");
        }
    }
}
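One caveat with plain Contains: it compares raw UTF-16 code units, so a sequence can be found inside a different emoji. For example, the Australian flag followed by the Swedish flag (U+1F1E6 U+1F1FA U+1F1F8 U+1F1EA) contains U+1F1FA U+1F1F8 as a substring, so Contains reports a US flag that is not there. A sketch that compares whole text elements instead, assuming .NET 5 or later (where StringInfo segments text by the Unicode extended grapheme cluster rules); the EmojiSearch name is hypothetical:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class EmojiSearch
{
    // Turns "1F1FA-1F1F8" into the two-code-point string for the US flag.
    public static string FromHexSequence(string hexSequence) =>
        string.Concat(hexSequence.Split('-')
            .Select(hex => char.ConvertFromUtf32(Convert.ToInt32(hex, 16))));

    // Walks the text one text element (user-perceived character) at a time and
    // only reports a match when a whole element equals the searched sequence.
    public static bool ContainsTextElement(string text, string element)
    {
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            if (e.GetTextElement() == element) return true;
        }
        return false;
    }
}
```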

Removing duplicate substrings within a string in C#

How can I remove duplicate substrings within a string? For instance, if I have a string like smith:rodgers:someone:smith:white, how can I get a new string with the extra smith removed, like smith:rodgers:someone:white? I'd also like to keep the colons even though they are duplicated.
Many thanks
string input = "smith:rodgers:someone:smith:white";
string output = string.Join(":", input.Split(':').Distinct().ToArray());
Of course this code assumes that you're only looking for duplicate "field" values. That won't remove "smithsmith" in the following string:
"smith:rodgers:someone:smithsmith:white"
It would be possible to write an algorithm to do that, but quite difficult to make it efficient...
Something like this:
string withoutDuplicates = String.Join(":", myString.Split(':').Distinct().ToArray());
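The one-liners above rely on Enumerable.Distinct returning values in first-seen order. The current implementation does that, but the documentation describes the result as an unordered sequence, so if the order matters to you, a sketch that makes it explicit with a HashSet could look like this (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;

public static class FieldDeduplicator
{
    // Keeps the first occurrence of each separated field, in the original order.
    // HashSet<T>.Add returns false for a value that was already added.
    public static string RemoveDuplicateFields(string input, char separator = ':')
    {
        var seen = new HashSet<string>();
        var kept = new List<string>();
        foreach (var field in input.Split(separator))
        {
            if (seen.Add(field))
            {
                kept.Add(field);
            }
        }
        return string.Join(separator.ToString(), kept);
    }
}
```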
Assuming the format of that string:
var theString = "smith:rodgers:someone:smith:white";
var subStrings = theString.Split(new char[] { ':' });
var uniqueEntries = new List<string>();
foreach (var item in subStrings)
{
    if (!uniqueEntries.Contains(item))
    {
        uniqueEntries.Add(item);
    }
}
var uniquifiedStringBuilder = new StringBuilder();
foreach (var item in uniqueEntries)
{
    uniquifiedStringBuilder.AppendFormat("{0}:", item);
}
var uniqueString = uniquifiedStringBuilder.ToString().Substring(0, uniquifiedStringBuilder.Length - 1);
It is rather long-winded, but it shows the process of getting from one to the other.
Not sure why you want to keep the duplicate colons, but if you are expecting the output to be "smith:rodgers:someone::white", try this code:
public static string RemoveDuplicates(string input)
{
    string output = string.Empty;
    System.Collections.Specialized.StringCollection unique = new System.Collections.Specialized.StringCollection();
    string[] parts = input.Split(':');
    foreach (string part in parts)
    {
        output += ":";
        if (!unique.Contains(part))
        {
            unique.Add(part);
            output += part;
        }
    }
    output = output.Substring(1);
    return output;
}
Of course, I haven't checked for null input, but I'm sure you'll do that ;)
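The same behaviour as RemoveDuplicates, written as a sketch (hypothetical names) that avoids repeated += string concatenation and the StringCollection lookups: repeated fields are blanked in place, so every colon survives, and the array is joined once at the end:

```csharp
using System.Collections.Generic;

public static class FieldBlanker
{
    // "smith:rodgers:someone:smith:white" -> "smith:rodgers:someone::white"
    public static string BlankDuplicateFields(string input)
    {
        string[] fields = input.Split(':');
        var seen = new HashSet<string>();
        for (int i = 0; i < fields.Length; i++)
        {
            if (!seen.Add(fields[i]))
            {
                fields[i] = string.Empty; // repeated field: keep its slot, drop its text
            }
        }
        return string.Join(":", fields);
    }
}
```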
