How to convert greek characters to HTML characters - c#

I would like to be able to do this kind of operation:
var specialCharactersString = "αβ";
var encodedString = WebUtility.HtmlEncode(specialCharactersString);
Console.WriteLine(encodedString); // result: &alpha;&beta;
We work with an external database that stores data using both notations, αβ and &alpha;&beta;. We want to be able to query both terms when the end user enters αβ.
So far, I tried:
WebUtility.HtmlEncode
HttpUtility.HtmlEncode
Encoding.GetEncoding(1253)

Thanks to @claudiom248, the answer was in another Stack Overflow post.
How to convert currency symbol to corresponding HTML entity
https://github.com/degant/web-utility-wrapper/blob/master/WebUtilityWrapper.cs

Unicode characters and HTML have always been a problem. Here is a helper I use. Hope this helps.
Update: The source is from https://www.codeproject.com/Articles/20255/Full-HTML-Character-Encoding-in-C with very minor modification.
specialCharactersString.HtmlEncode()
public static class TextHelpers {
public static string HtmlEncode(this string text)
{
var chars = System.Web.HttpUtility.HtmlEncode(text).ToCharArray();
System.Text.StringBuilder result = new System.Text.StringBuilder(text.Length + (int)(text.Length * 0.1));
foreach (char c in chars)
{
int ansiValue = Convert.ToInt32(c);
if (ansiValue > 127)
result.AppendFormat("&#{0};", ansiValue);
else
result.Append(c);
}
return result.ToString();
}
}

As mentioned by claudiom248, the .NET Framework libraries do not map non-ASCII characters such as Greek letters to named HTML entities. You can certainly pull in a third-party library, but if you'd like to avoid the additional cost, or if you only have a small subset of characters that you always want to handle, you can maintain a simple dictionary lookup.
void Main()
{
var specialCharactersString = "αβ";
var sb = new StringBuilder();
foreach (var specialChar in specialCharactersString)
{
// Use the mapped entity when one exists; otherwise keep the character as-is.
if (_dict.TryGetValue(specialChar, out var mappedSpecialChar))
{
sb.Append(mappedSpecialChar);
}
else
{
sb.Append(specialChar);
}
}
Console.WriteLine(sb.ToString());
}
private Dictionary<char, string> _dict = new Dictionary<char, string>
{
{ 'α', "&alpha;" },
{ 'β', "&beta;" }
};
This will output &alpha;&beta; as expected.

Related

How to build a unicode string with emojis in c#?

I've been using the following code to translate unicode parts that are taken from a text file in a format of string array ["1F3F3", "FE0F", "200D", "1F308"]. The mentioned unicode parts are a sample of 🏳️‍🌈 emoji and are taken from unicode.org resource(#1553 on the page).
public static void PrintEmoji(params string[] unicodeParts)
{
var unicodeBuilder = new StringBuilder();
foreach (var unicodePart in unicodeParts)
{
unicodeBuilder.Append((char) Convert.ToInt32(unicodePart, 16));
}
if(unicodeBuilder.ToString() is var unicodeResult && !string.IsNullOrWhiteSpace(unicodeResult))
Console.WriteLine(unicodeResult);
}
But this code only works for UTF-16 code units, for example 😀 (U+1F600), and not for Unicode code parts. How should I modify my method so that it works with Unicode code parts as well?
Thanks to JosefZ, the following solution seems to work fine.
public static void PrintEmoji(params string[] unicodeParts)
{
var unicodeBuilder = new StringBuilder();
foreach (var unicodePart in unicodeParts)
{
unicodeBuilder.Append(char.ConvertFromUtf32(Convert.ToInt32(unicodePart,16)));
}
if(unicodeBuilder.ToString() is var unicodeResult && !string.IsNullOrWhiteSpace(unicodeResult))
Console.WriteLine(unicodeResult);
}

Is there any way to append StringBuilder horizontally in C#?

I am trying to append two StringBuilders so that they produce something like:
Device # 1 2 3
Pt.Name ABC DEF GHI
what I have tried is:
class Program
{
class Device
{
public int ID { get; set; }
public string PatName { get; set; }
}
static void Main(string[] args)
{
var datas = new List<Device>
{
new Device { ID = 1, PatName = "ABC" },
new Device { ID = 2, PatName = "DEF" },
new Device { ID = 3, PatName = "GHI" }
};
// there is a collection which has all this information
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0} {1}", "Device #", "Pt.Name").AppendLine();
foreach (var data in datas)
{
var deviceId = data.ID;
var patName = data.PatName;
sb.AppendFormat("{0} {1}", deviceId, patName).AppendLine();
}
Console.WriteLine(sb);
}
}
but it is printing it in vertical manner, like
Device # Pt.Name
1 ABC
2 DEF
3 GHI
and if I remove that last AppendLine(); it is appending it at the end in the same line.
I want to use only one stringbuilder followed by only one foreach loop.
1. You could do it like this:
StringBuilder sb = new StringBuilder();
sb.Append("Device #");
foreach (var data in datas)
sb.Append($" {data.ID}");
sb.AppendLine(); // start the second row
sb.Append("Pt.Name");
foreach (var data in datas)
sb.Append($" {data.PatName}");
2. If you want to loop only once, you can use two StringBuilders:
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
sb1.Append("Device #");
sb2.Append("Pt.Name");
foreach (var data in datas)
{
sb1.Append($" {data.ID}");
sb2.Append($" {data.PatName}");
}
sb1.AppendLine(); // end the first row before appending the second
sb1.Append(sb2.ToString());
3. You could also use string.Join(), which also relies on StringBuilder, to write a one-liner, although this way you have extra Select calls:
string result = $"Device # {string.Join(" ", datas.Select(x => x.ID))}\r\nPt.Name {string.Join(" ", datas.Select(x => x.PatName))}";
I love your question because it is based on avoiding these two assumptions: 1) that strings are always printed left to right and 2) that newlines always result in advancing the point of printing downwards.[1]
Others have given answers that will probably meet your needs, but I wanted to write about why your way of thinking won’t work. The assumptions above are so ingrained in people’s thinking about how strings and terminals work that I'm sure many people thought your question was odd or even naïve; I did at first.
StringBuilder doesn’t print strings to the screen. Somewhere, I suspect, you are calling Console.Write to print the string. StringBuilder allows you to convert non-string variables to strings and to concatenate strings together in a more efficient way than String.Format and the + operator; see Immutability and the StringBuilder class.
When you are done using StringBuilder, what you have is a string of characters. It’s called a string because it is a 1D structure, one character after another. There is nothing special about the newline characters in the string;[2] they are just characters in the list. There is nothing in the string that specifies the position of the characters other than that each one comes after the previous one. When you do something like Console.Write, the position of each character on the screen is defined by the implementation of that method, or the implementation of the terminal, or both. They follow the conventions of our language, i.e. each character is to the right of the previous one. When Console.Write encounters a newline, it prints the following character in the first position of the line below the current one.
If you are using String, StringBuilder and Console, you can write code to create a single string with the pieces of text in the places you want, so that when Console.Write follows the left-to-right, top-to-bottom conventions your text will appear correctly. This is what the other answers here do.
Alternatively, you could find a library which gives you more control over where text is printed on the screen. These were very popular before graphical user interfaces, when people built interactive applications in text terminals. Two examples I remember are CRT for Pascal and Ncurses for C. If you want to investigate this approach, I’d suggest doing some web searches or even asking another question here. Most terminal applications you see at banks, hospitals and airlines use such a library running on a VAX.
[1] This may be different on systems set up for languages which are not like English, or not like Latin.
[2] The character or characters which represent a new line are different on different operating systems.
Normally you cannot append horizontally to the right side of a StringBuilder, so you could roll your own extension method such as:
static class SbExtensions
{
public static StringBuilder AppendRight(this StringBuilder sb, string key, string value)
{
string output = sb.ToString();
string[] splitted = output.Split("\n");
splitted[0] += key.PadRight(10, ' ');
splitted[1] += value.PadRight(10, ' ');
output = string.Join('\n', splitted);
return new StringBuilder(output);
}
}
Simple solution:
string columns = "Device #".PadRight(10, ' ') + "\n" + "Pt.Name".PadRight(10, ' ');
StringBuilder sb = new StringBuilder(columns);
foreach (var data in datas)
{
sb = sb.AppendRight(data.ID.ToString(), data.PatName);
}
Console.WriteLine(sb.ToString());
Console.ReadLine();
complex one: dynamic
just another solution using MathNet.Numerics library at https://numerics.mathdotnet.com/
introduce an array property in your Entity class
class Device
{
public int ID { get; set; }
public string PatName { get; set; }
public string[] Array
{
get
{
return new string[] { ID.ToString(), PatName };
}
}
}
then in main method
static void Main(string[] args)
{
var datas = new List<Device>
{
new Device { ID = 1, PatName = "ABC" },
new Device { ID = 2, PatName = "DEF" },
new Device { ID = 3, PatName = "GHI" }
};
var MatrixValues = datas
.SelectMany(x => x.Array)
.Select((item, index) => new KeyValuePair<double, string>(Convert.ToDouble(index), item)).ToArray();
var matrixIndexes = MatrixValues.Select(x => x.Key);
var M = Matrix<double>.Build;
var C = M.Dense(datas.Count, datas.First().Array.Count(), matrixIndexes.ToArray());
var TR = C.Transpose();
string columns = "Device #".PadRight(10, ' ') + "\n" + "Pt.Name".PadRight(10, ' ');
StringBuilder sb = new StringBuilder(columns);
for (int i = 0; i < TR.Storage.Enumerate().Count(); i += 2)
{
sb = sb.AppendRight(MatrixValues[i].Value, MatrixValues[i + 1].Value);
}
Console.WriteLine(sb.ToString());
Console.ReadLine();
}
yea and those references
using MathNet.Numerics.LinearAlgebra;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
PS: this may not be your desired solution as it is creating multiple string builders when you append new data
This should get you going:
Code
StringBuilder deviceSB = new StringBuilder();
StringBuilder patNameSB = new StringBuilder();
deviceSB.Append("Device #".PadRight(9));
patNameSB.Append("Pt.Name".PadRight(9));
foreach (var data in datas)
{
deviceSB.Append($"{data.ID}".PadLeft(2).PadRight(4));
patNameSB.Append($"{data.PatName} ");
}
deviceSB.AppendLine();
deviceSB.Append(patNameSB);
Or, optionally, without a loop:
StringBuilder result = new StringBuilder();
StringBuilder s1 = new StringBuilder("Device # ".PadRight(9));
StringBuilder s2 = new StringBuilder("Pt.Name ".PadRight(9));
s1 = s1.AppendJoin(String.Empty, datas.Select(x => x.ID.ToString().PadLeft(2).PadRight(4)));
s2 = s2.AppendJoin(' ', datas.Select(x => x.PatName));
result = result.Append(s1).AppendLine().Append(s2);
Note that I took the idea for the second option from @AshkanMobayenKhiabani, but instead of using strings I stuck to StringBuilder, since it is much more performant than using strings!

Best way to search for many guids at once in a string?

I have before me the problem of searching for 90,000 GUIDs in a string. I need to get all instances of each GUID. Technically, I need to replace them too, but that's another story.
Currently, I'm searching for each one individually, using a regex. But it occurred to me that I could get better performance by searching for them all together. I've read about tries in the past and have never used one but it occurred to me that I could construct a trie of all 90,000 GUIDs and use that to search.
Alternatively, perhaps there is an existing library in .NET that can do this. It crossed my mind that there is no reason why I shouldn't be able to get good performance with just a giant regex, but this appears not to work.
Beyond that, perhaps there is some clever trick I could use relating to GUID structure to get better results.
This isn't really a crucial problem for me, but I thought I might be able to learn something.
Have a look at the Rabin-Karp string search algorithm. It's well suited for multi-pattern searching in a string:
http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_string_search_algorithm#Rabin.E2.80.93Karp_and_multiple_pattern_search
You will not get good performance with Regex because its performance is inherently poor. Additionally, if all the GUIDs share the same format, you should only need one Regex, and regex.Replace(input, replacement); would do it.
If you already have the list of GUIDs in memory, performance would be better if you loop over that list and call String.Replace, like so:
foreach (string guid in guids)
inputString = inputString.Replace(guid, replacement); // strings are immutable, so keep the result
I developed a method for replacing a large number of strings a while back, that may be useful:
A better way to replace many strings - obfuscation in C#
Another option would be to use a regular expression to find all GUIDs in the string, then loop through them and check if each is part of your set of GUIDs.
Basic example, using a Dictionary for fast lookup of the GUIDs:
Dictionary<string, string> guids = new Dictionary<string, string>();
guids.Add("3f74a071-54fc-10de-0476-a6b991f0be76", "(replacement)");
string text = "asdf 3f74a071-54fc-10de-0476-a6b991f0be76 lkaq2hlqwer";
text = Regex.Replace(text, @"[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}", m => {
string replacement;
if (guids.TryGetValue(m.Value, out replacement)) {
return replacement;
} else {
return m.Value;
}
});
Console.WriteLine(text);
Output:
asdf (replacement) lkaq2hlqwer
OK, this looks good. So to be clear here is the original code, which took 65s to run on the example string:
var unusedGuids = new HashSet<Guid>();
foreach (var guid in oldToNewGuid) {
var regex = guid.Key.ToString();
if (!Regex.IsMatch(xml, regex))
unusedGuids.Add(guid.Key);
else
xml = Regex.Replace(xml, regex, guid.Value.ToString());
}
The new code is as follows and takes 6.7s:
var unusedGuids = new HashSet<Guid>(oldToNewGuid.Keys);
var guidHashes = new MultiValueDictionary<int, Guid>();
foreach (var guid in oldToNewGuid.Keys) {
guidHashes.Add(guid.ToString().GetHashCode(), guid);
}
var indices = new List<Tuple<int, Guid>>();
const int guidLength = 36;
for (int i = 0; i <= xml.Length - guidLength; i++) {
var substring = xml.Substring(i, guidLength);
foreach (var value in guidHashes.GetValues(substring.GetHashCode())) {
if (value.ToString() == substring) {
unusedGuids.Remove(value);
indices.Add(new Tuple<int, Guid>(i, value));
break;
}
}
}
var builder = new StringBuilder();
int start = 0;
for (int i = 0; i < indices.Count; i++) {
var tuple = indices[i];
var substring = xml.Substring(start, tuple.Item1 - start);
builder.Append(substring);
builder.Append(oldToNewGuid[tuple.Item2].ToString());
start = tuple.Item1 + guidLength;
}
builder.Append(xml.Substring(start, xml.Length - start));
xml = builder.ToString();

Removing duplicate substrings within a string in C#

How can I remove duplicate substrings within a string? For instance, if I have a string like smith:rodgers:someone:smith:white, how can I get a new string with the extra smith removed, like smith:rodgers:someone:white? Also, I'd like to keep the colons even though they are duplicated.
Many thanks
string input = "smith:rodgers:someone:smith:white";
string output = string.Join(":", input.Split(':').Distinct().ToArray());
Of course this code assumes that you're only looking for duplicate "field" values. That won't remove "smithsmith" in the following string:
"smith:rodgers:someone:smithsmith:white"
It would be possible to write an algorithm to do that, but quite difficult to make it efficient...
Something like this:
string withoutDuplicates = String.Join(":", myString.Split(':').Distinct().ToArray());
Assuming the format of that string:
var theString = "smith:rodgers:someone:smith:white";
var subStrings = theString.Split(new char[] { ':' });
var uniqueEntries = new List<string>();
foreach(var item in subStrings)
{
if (!uniqueEntries.Contains(item))
{
uniqueEntries.Add(item);
}
}
var uniquifiedStringBuilder = new StringBuilder();
foreach(var item in uniqueEntries)
{
uniquifiedStringBuilder.AppendFormat("{0}:", item);
}
var uniqueString = uniquifiedStringBuilder.ToString().Substring(0, uniquifiedStringBuilder.Length - 1);
Is rather long-winded but shows the process to get from one to the other.
Not sure why you want to keep the duplicate colons. If you are expecting the output to be "smith:rodgers:someone::white", try this code:
public static string RemoveDuplicates(string input)
{
string output = string.Empty;
System.Collections.Specialized.StringCollection unique = new System.Collections.Specialized.StringCollection();
string[] parts = input.Split(':');
foreach (string part in parts)
{
output += ":";
if (!unique.Contains(part))
{
unique.Add(part);
output += part;
}
}
output = output.Substring(1);
return output;
}
Of course I've not checked for null input, but I'm sure you'll do it ;)

Java equivalents of C# String.Format() and String.Join()

I know this is a bit of a newbie question, but are there equivalents to C#'s string operations in Java?
Specifically, I'm talking about String.Format and String.Join.
The Java String object has a format method (as of 1.5), but no join method.
To get a bunch of useful String utility methods not already included you could use org.apache.commons.lang.StringUtils.
String.format. As for join, you need to write your own:
static String join(Collection<?> s, String delimiter) {
StringBuilder builder = new StringBuilder();
Iterator<?> iter = s.iterator();
while (iter.hasNext()) {
builder.append(iter.next());
if (!iter.hasNext()) {
break;
}
builder.append(delimiter);
}
return builder.toString();
}
The above comes from http://snippets.dzone.com/posts/show/91
Guava comes with the Joiner class.
import com.google.common.base.Joiner;
Joiner.on(separator).join(data);
As of Java 8, join() is now available as two class methods on the String class. In both cases the first argument is the delimiter.
You can pass individual CharSequences as additional arguments:
String joined = String.join(", ", "Antimony", "Arsenic", "Aluminum", "Selenium");
// "Antimony, Arsenic, Aluminum, Selenium"
Or you can pass an Iterable<? extends CharSequence>:
List<String> strings = new LinkedList<String>();
strings.add("EX");
strings.add("TER");
strings.add("MIN");
strings.add("ATE");
String joined = String.join("-", strings);
// "EX-TER-MIN-ATE"
Java 8 also adds a new class, StringJoiner, which you can use like this:
StringJoiner joiner = new StringJoiner("&");
joiner.add("x=9");
joiner.add("y=5667.7");
joiner.add("z=-33.0");
String joined = joiner.toString();
// "x=9&y=5667.7&z=-33.0"
TextUtils.join is available on Android
You can also use variable arguments for strings as follows:
String join (String delim, String ... data) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < data.length; i++) {
sb.append(data[i]);
if (i >= data.length-1) {break;}
sb.append(delim);
}
return sb.toString();
}
As for join, I believe this might look a little less complicated:
public String join (Collection<String> c) {
StringBuilder sb=new StringBuilder();
for(String s: c)
sb.append(s);
return sb.toString();
}
I don't get to use Java 5 syntax as much as I'd like (Believe it or not, I've been using 1.0.x lately) so I may be a bit rusty, but I'm sure the concept is correct.
edit addition: String appends can be slowish, but if you are working on GUI code or some short-running routine, it really doesn't matter if you take .005 seconds or .006, so if you had a collection called "joinMe" that you want to append to an existing string "target" it wouldn't be horrific to just inline this:
for(String s : joinMe)
target += s;
It's quite inefficient (and a bad habit), but not anything you will be able to perceive unless there are either thousands of strings or this is inside a huge loop or your code is really performance critical.
More importantly, it's easy to remember, short, quick and very readable. Performance isn't always the automatic winner in design choices.
Here is a pretty simple answer. Use += since it is less code and let the optimizer convert it to a StringBuilder for you. Using this method, you don't have to do any "is last" checks in your loop (performance improvement) and you don't have to worry about stripping off any delimiters at the end.
Iterator<String> iter = args.iterator();
output += iter.hasNext() ? iter.next() : "";
while (iter.hasNext()) {
output += "," + iter.next();
}
I didn't want to import an entire Apache library to add a simple join function, so here's my hack.
public String join(String delim, List<String> destinations) {
StringBuilder sb = new StringBuilder();
int delimLength = delim.length();
for (String s: destinations) {
sb.append(s);
sb.append(delim);
}
// we have appended the delimiter to the end
// in the previous for-loop. Let's now remove it.
if (sb.length() >= delimLength) {
return sb.substring(0, sb.length() - delimLength);
} else {
return sb.toString();
}
}
If you wish to join (concatenate) several strings into one, you should use a StringBuilder. It is far better than using
for(String s : joinMe)
target += s;
There is also a slight performance win over StringBuffer, since StringBuilder does not use synchronization.
For a general purpose utility method like this, it will (eventually) be called many times in many situations, so you should make it efficient and not allocate many transient objects. We've profiled many, many different Java apps and almost always find that string concatenation and string/char[] allocations take up a significant amount of time/memory.
Our reusable collection -> string method first calculates the size of the required result and then creates a StringBuilder with that initial size; this avoids unnecessary doubling/copying of the internal char[] used when appending strings.
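A minimal sketch of what such a pre-sized join could look like. The class name PresizedJoin and the method signature are assumptions made for illustration, not the code described above; only the idea (measure first, then allocate the StringBuilder once) comes from the answer.

```java
import java.util.Collection;

// Hypothetical helper, named for illustration only.
final class PresizedJoin {

    static String join(Collection<String> parts, String delim) {
        if (parts.isEmpty()) {
            return "";
        }
        // First pass: compute the exact length of the result.
        int size = delim.length() * (parts.size() - 1);
        for (String part : parts) {
            // String.valueOf mirrors how StringBuilder renders a null element ("null").
            size += String.valueOf(part).length();
        }
        // Second pass: append into a buffer that never needs to grow.
        StringBuilder sb = new StringBuilder(size);
        boolean first = true;
        for (String part : parts) {
            if (!first) {
                sb.append(delim);
            }
            sb.append(part);
            first = false;
        }
        return sb.toString();
    }
}
```

Calling PresizedJoin.join(Arrays.asList("a", "b", "c"), ", ") returns "a, b, c". The extra pass over the collection is the price for avoiding buffer reallocation, so whether it pays off depends on how large the joined result usually is.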
I wrote my own:
public static String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
Iterator<String> iter = col.iterator();
if (iter.hasNext())
sb.append(iter.next().toString());
while (iter.hasNext()) {
sb.append(delim);
sb.append(iter.next().toString());
}
return sb.toString();
}
but Collection isn't supported by JSP, so for tag function I wrote:
public static String join(List<?> list, String delim) {
int len = list.size();
if (len == 0)
return "";
StringBuilder sb = new StringBuilder(list.get(0).toString());
for (int i = 1; i < len; i++) {
sb.append(delim);
sb.append(list.get(i).toString());
}
return sb.toString();
}
and put to .tld file:
<?xml version="1.0" encoding="UTF-8"?>
<taglib version="2.1" xmlns="http://java.sun.com/xml/ns/javaee">
<function>
<name>join</name>
<function-class>com.core.util.ReportUtil</function-class>
<function-signature>java.lang.String join(java.util.List, java.lang.String)</function-signature>
</function>
</taglib>
and use it in JSP files as:
<%@taglib prefix="funnyFmt" uri="tag:com.core.util,2013:funnyFmt"%>
${funnyFmt:join(books, ", ")}
StringUtils is a pretty useful class in the Apache Commons Lang library.
There is MessageFormat.format() which works like C#'s String.Format().
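To make the difference between the two formatting styles concrete, here is a small sketch. The class and method names (FormatSketch, printfStyle, indexedStyle) are made up for illustration; String.format and java.text.MessageFormat are the standard library calls mentioned in this thread. Locale.ROOT is passed only to keep the output independent of the machine's default locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class, named for illustration only.
final class FormatSketch {

    // printf-style: %s and %d placeholders are consumed in argument order.
    static String printfStyle(String name, int count) {
        return String.format(Locale.ROOT, "%s has %d items", name, count);
    }

    // MessageFormat: indexed {0}, {1} placeholders, closer to C#'s String.Format.
    static String indexedStyle(String name, int count) {
        return new MessageFormat("{0} has {1} items", Locale.ROOT)
                .format(new Object[] { name, count });
    }
}
```

Both FormatSketch.printfStyle("Ada", 3) and FormatSketch.indexedStyle("Ada", 3) return "Ada has 3 items". One thing to watch for with MessageFormat is that a single quote in the pattern is an escape character, so a literal apostrophe has to be written as two single quotes.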
I see a lot of overly complex implementations of String.Join here. If you don't have Java 1.8, and you don't want to import a new library the below implementation should suffice.
public String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
boolean first = true; // a flag also handles an empty first element correctly
for ( String s : col ) {
if ( !first ) sb.append(delim);
sb.append(s);
first = false;
}
return sb.toString();
}
ArrayList<Double> j = new ArrayList<>();
j.add(1.0);
j.add(.92);
j.add(3.0);
String ntop = j.toString(); // ntop = "[1.0, 0.92, 3.0]"
So basically, the String ntop stores the value of the entire collection with comma separators and brackets.
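If the brackets are not wanted, one possible alternative on Java 8 or later is to map the elements to strings and collect them with Collectors.joining. This is a sketch under that assumption; the class name NumberJoinSketch and the method joinNumbers are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, named for illustration only.
final class NumberJoinSketch {

    // Converts each element to its string form, then joins with the delimiter.
    static String joinNumbers(List<Double> values, String delim) {
        return values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(delim));
    }
}
```

For the list above, NumberJoinSketch.joinNumbers(j, ", ") gives "1.0, 0.92, 3.0", the same content as toString() but without the surrounding brackets and with a delimiter of your choice.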
I would just use the string concatenation operator "+" to join two strings. s1 += s2;
