Remove accents from a text file - c#

I am having trouble removing accents from a text file: my program replaces characters that have diacritics with ?. Here is my code:
private void button3_Click(object sender, EventArgs e)
{
if (radioButton3.Checked)
{
byte[] tmp;
tmp = System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(richTextBox1.Text);
richTextBox2.Text = System.Text.Encoding.UTF8.GetString(tmp);
}
}

Taken from here: https://stackoverflow.com/a/249126/3047078
static string RemoveDiacritics(string text)
{
var normalizedString = text.Normalize(NormalizationForm.FormD);
var stringBuilder = new StringBuilder();
foreach (var c in normalizedString)
{
var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
if (unicodeCategory != UnicodeCategory.NonSpacingMark)
{
stringBuilder.Append(c);
}
}
return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}
usage:
string result = RemoveDiacritics("včľťšľžšžščýščýťčáčáčťáčáťýčťž");
results in vcltslzszscyscytcacactacatyctz

richTextBox1.Text = "včľťšľžšžščýščýťčáčáčťáčáťýčťž";
string text1 = richTextBox1.Text.Normalize(NormalizationForm.FormD);
string pattern = @"\p{M}";
string text2 = Regex.Replace(text1, pattern, "");
richTextBox2.Text = text2;
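First normalize the string to form D, which splits each accented character into a base letter followed by its combining marks.
Then use a regular expression to remove those marks. The pattern \p{M} matches Unicode category M, which covers all combining (diacritic) marks.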
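The two steps above can be folded into one helper. The following is a minimal sketch rather than a definitive implementation; the names DiacriticsDemo and StripMarks are made up for illustration, and the final recomposition to form C is borrowed from the RemoveDiacritics answer above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class DiacriticsDemo
{
    // Hypothetical helper name; combines the normalize and replace steps.
    public static string StripMarks(string text)
    {
        // Form D splits "č" into "c" followed by a combining caron (U+030C).
        string decomposed = text.Normalize(NormalizationForm.FormD);
        // \p{M} matches every combining mark; removing them leaves the base letters.
        string stripped = Regex.Replace(decomposed, @"\p{M}", "");
        // Recompose whatever is left into form C.
        return stripped.Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        Console.WriteLine(StripMarks("včľťšž")); // vcltsz
    }
}
```

Letters that have no canonical decomposition (for example ł or ø) are not affected by this approach and would need a separate mapping.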

Related

c# writing text to different selected listboxes and refresh

I am pretty new to C# and have a small task to do. At the moment I have code that reads directories and adds .ini files to different list boxes depending on a setting inside each file (this part seems to work perfectly). When I select an item in listBox2 and press the button, it writes a specific word to the selected .ini file (this part seems to work as well). Here is my problem: I want the same button to write a different word to the .ini file selected in listBox1, and I can't wrap my head around how to do that with one button. I think my problem is somewhere here. Also, do you know how to update my list boxes after I change the setting in the .ini files? Thanks.
private void button1_Click(object sender, EventArgs e)
{
var items = listBox2.SelectedItems;
//var items1 = listBox1.SelectedItems;
foreach (var item in items)
{
string fileName = listBox2.GetItemText(item);
string text = File.ReadAllText(fileName);
My entire code
namespace WindowsFormsApp5
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
string rootdir = @"C:\Users\isaced1\Desktop\test";
string[] files = Directory.GetFiles(rootdir, "*.ini", SearchOption.AllDirectories);
foreach (string item in files)
{
string fileContents = File.ReadAllText(item);
const string PATTERN = @"OTPM = true";
Match match = Regex.Match(fileContents, PATTERN, RegexOptions.IgnoreCase);
if (match.Success)
{
listBox1.Items.Add(item);
listBox1.ForeColor = Color.Green;
}
else
{
listBox2.Items.Add(item);
listBox2.ForeColor = Color.Red;
}
}
}
private void button1_Click(object sender, EventArgs e)
{
//listBox2.SelectedItem = listBox1.SelectedItem;
var items = listBox2.SelectedItems;
//var items1 = listBox1.SelectedItems;
foreach (var item in items)
{
string fileName = listBox2.GetItemText(item);
string text = File.ReadAllText(fileName);
const string PATTERN = @"OTPM = (?<Number>[false]+)";
Match match = Regex.Match(text, PATTERN, RegexOptions.IgnoreCase);
string otpmT = "true";
string otpmF = "false";
if (match.Success)
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmT.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
else
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmF.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
}
}
}
If I understand correctly, you just have to do the same thing again for your other list box:
private void button1_Click(object sender, EventArgs e)
{
// ------------------------------ Listbox2
var items = listBox2.SelectedItems;
foreach (var item in items)
{
string fileName = listBox2.GetItemText(item);
string text = File.ReadAllText(fileName);
const string PATTERN = @"OTPM = (?<Number>[false]+)";
Match match = Regex.Match(text, PATTERN, RegexOptions.IgnoreCase);
string otpmT = "true";
string otpmF = "false";
if (match.Success)
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmT.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
else
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmF.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
}
// ------------------------------ Listbox1
var items1 = listBox1.SelectedItems;
foreach (var item in items1)
{
string fileName = listBox1.GetItemText(item);
string text = File.ReadAllText(fileName);
const string PATTERN = @"OTPM = (?<Number>[false]+)";
Match match = Regex.Match(text, PATTERN, RegexOptions.IgnoreCase);
string otpmT = "true";
string otpmF = "false";
if (match.Success)
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmT.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
else
{
int index = match.Groups["Number"].Index;
int length = match.Groups["Number"].Length;
text = text.Remove(index, length);
text = text.Insert(index, otpmF.ToString());
File.WriteAllText(fileName, text);
Process.Start(fileName);
}
}
}
In this code I duplicated the code for listBox2; you just have to change what to check and what to write in the listBox1 part.
When the button is clicked, both list boxes are processed.
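Instead of duplicating the whole loop, the edit itself could be pulled into one helper that both list boxes call. This is a minimal sketch, assuming each file contains an entry of the form OTPM = true or OTPM = false; the names OtpmToggle and Toggle are hypothetical, and the pattern matches the whole words true/false rather than the character class [false]+ used above.

```csharp
using System;
using System.Text.RegularExpressions;

static class OtpmToggle
{
    // Matches "OTPM = true" or "OTPM = false", case-insensitively.
    static readonly Regex Pattern = new Regex(
        @"(?<key>OTPM\s*=\s*)(?<value>true|false)",
        RegexOptions.IgnoreCase);

    // Flips the first OTPM value; text without an OTPM entry is returned unchanged.
    public static string Toggle(string iniText)
    {
        return Pattern.Replace(iniText, m =>
        {
            bool isTrue = m.Groups["value"].Value.Equals("true", StringComparison.OrdinalIgnoreCase);
            return m.Groups["key"].Value + (isTrue ? "false" : "true");
        }, 1);
    }

    static void Main()
    {
        Console.WriteLine(Toggle("OTPM = false")); // OTPM = true
        Console.WriteLine(Toggle("OTPM = true"));  // OTPM = false
    }
}
```

In the button handler, each selected item from either list box could then be handled with File.WriteAllText(fileName, OtpmToggle.Toggle(File.ReadAllText(fileName))). To refresh the lists afterwards, one option is to clear both list boxes with Items.Clear() and rerun the scan that Form1_Load performs.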

c# read value from file and ignore everything except for value

I have a program that needs a config file whose values are displayed in the program.
Inside my text file I have Wireless = 1 & Cradle = 2.
In my program, a label should be populated with the release number only, not the other characters.
private string searchFile(String path, String searchText)
{
string regex = @"(?i)(?<=" + searchText + @"\s*=\s*)\d+";
return Regex.Match(File.ReadAllText(path),regex).Value;//version number
}
This is what I tried and it gives the correct output
string s="Wireless = 1 Cradle = 2";
Regex.Match(s, @"(?i)(?<=Wireless\s*=\s*)\d+").Value;//1
public static string match;
public static string ReadAllText(string path)
{
using (var r = new System.IO.StreamReader(path))
{
return r.ReadToEnd();
}
}
private string Wireless(String path, String searchText)
{
string regex = @"(?i)(?<=" + searchText + @"\s*=\s*)\d+";
match = Regex.Match(ReadAllText(path), regex).Value;
label1.Text = match;
return match;
}
private string Cradle(String path, String searchText)
{
string regex = @"(?i)(?<=" + searchText + @"\s*=\s*)\d+";
match = Regex.Match(ReadAllText(path), regex).Value;
label2.Text = match;
return match;
}
private void button1_Click(object sender, EventArgs e)
{
Wireless(@"\Storage Card\changelog.txt", "Wireless");
Cradle(@"\Storage Card\changelog.txt", "Cradle");
}
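The Wireless and Cradle methods above differ only in the label they update, so the lookup itself could live in one function. This is a minimal sketch, assuming the file holds key = number pairs as in the question; SettingReader and ReadNumber are hypothetical names, and Regex.Escape is an addition so that a key containing regex metacharacters is matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

static class SettingReader
{
    // Returns the digits that follow "key =", or an empty string when the key is absent.
    public static string ReadNumber(string content, string key)
    {
        string pattern = @"(?<=" + Regex.Escape(key) + @"\s*=\s*)\d+";
        return Regex.Match(content, pattern, RegexOptions.IgnoreCase).Value;
    }

    static void Main()
    {
        string content = "Wireless = 1 Cradle = 2";
        Console.WriteLine(ReadNumber(content, "Wireless")); // 1
        Console.WriteLine(ReadNumber(content, "cradle"));   // 2
    }
}
```

The click handler would then read the file once and assign label1.Text and label2.Text from two calls to the same function.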

String not getting decoded

I have a DevExpress report, and the data source has a field whose data comes in looking something like this:
PRODUCT - APPLE<BR/>ITEM NUMBER - 23454</BR>LOT NUMBER 3343 <BR/>
That is how it shows in a cell, so I decided to decode it, but nothing is working. I tried HttpUtility.HtmlDecode, and here I am trying WebUtility.HtmlDecode.
private void xrTableCell9_BeforePrint(object sender, System.Drawing.Printing.PrintEventArgs e)
{
XRTableCell cell = sender as XRTableCell;
string _description = WebUtility.HtmlDecode(Convert.ToString(GetCurrentColumnValue("Description")));
cell.Text = _description;
}
How can I decode the value of this column in the data source?
Thank you
If you also need to show the description with the < /> markup, you need to use HtmlEncode.
If you need to extract the text from that HTML:
public static string ExtractTextFromHtml(this string text)
{
if (String.IsNullOrEmpty(text))
return text;
var sb = new StringBuilder();
var doc = new HtmlDocument();
doc.LoadHtml(text);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
if (!String.IsNullOrWhiteSpace(node.InnerText))
sb.Append(HtmlEntity.DeEntitize(node.InnerText.Trim()) + " ");
}
return sb.ToString();
}
This requires the HtmlAgilityPack library.
To remove the br tags:
var str = Convert.ToString(GetCurrentColumnValue("Description"));
// Regex.Replace returns a new string, so keep the result.
str = Regex.Replace(str, @"</?\s?br\s?/?>", System.Environment.NewLine, RegexOptions.IgnoreCase);

Remove words from string c#

I am working on an ASP.NET 4.0 web application. Its main goal is to go to the URL in the MyURL variable, read the page from top to bottom, search for all lines that start with "description", and keep only those while removing all HTML tags. What I want to do next is remove the "description" text from the results afterwards so that only my device names are left. How would I do this?
protected void parseButton_Click(object sender, EventArgs e)
{
MyURL = deviceCombo.Text;
WebRequest objRequest = HttpWebRequest.Create(MyURL);
objRequest.Credentials = CredentialCache.DefaultCredentials;
using (StreamReader objReader = new StreamReader(objRequest.GetResponse().GetResponseStream()))
{
originalText.Text = objReader.ReadToEnd();
}
//Read all lines of file
String[] crString = { "<BR> " };
String[] aLines = originalText.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
String noHtml = String.Empty;
for (int x = 0; x < aLines.Length; x++)
{
if (aLines[x].Contains(filterCombo.SelectedValue))
{
noHtml += (RemoveHTML(aLines[x]) + "\r\n");
}
}
//Print results to textbox
resultsBox.Text = String.Join(Environment.NewLine, noHtml);
}
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
Ok so I figured out how to remove the words through one of my existing functions:
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n").Replace("description", "").Replace("INFRA:CORE:", "")
.Replace("RESERVED", "")
.Replace(":", "")
.Replace(";", "")
.Replace("-0/3/0", "");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
public static void Main(String[] args)
{
string str = "He is driving a red car.";
Console.WriteLine(str.Replace("red", "").Replace("  ", " "));
}
Output:
He is driving a car.
Note: the first argument of the second Replace is a double space.
Link : https://i.stack.imgur.com/rbluf.png
Try this. It will remove all occurrences of the word you want to remove.
Try something like this, using LINQ:
List<string> lines = new List<string>{
"Hello world",
"Description: foo",
"Garbage:baz",
"description purple"};
//now add all your lines from your html doc.
if (aLines[x].Contains(filterCombo.SelectedValue))
{
lines.Add(RemoveHTML(aLines[x]) + "\r\n");
}
var myDescriptions = lines.Where(x=>x.ToLower().StartsWith("description"))
.Select(x=> x.ToLower().Replace("description",string.Empty)
.Trim());
// you now have "foo" and "purple", and anything else.
You may have to adjust for colons, etc.
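A runnable variant of the LINQ idea, as a minimal sketch: DescriptionFilter and ExtractNames are hypothetical names, the sample lines are the ones from the snippet above, and the prefix is removed by position so that the rest of each line keeps its original casing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DescriptionFilter
{
    const string Prefix = "description";

    // Keeps only lines that start with "description" and strips that prefix,
    // along with any colon or spaces that follow it.
    public static List<string> ExtractNames(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Trim())
            .Where(line => line.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            .Select(line => line.Substring(Prefix.Length).TrimStart(':', ' '))
            .ToList();
    }

    static void Main()
    {
        var lines = new List<string>
        {
            "Hello world", "Description: foo", "Garbage:baz", "description purple"
        };
        Console.WriteLine(string.Join(",", ExtractNames(lines))); // foo,purple
    }
}
```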
void Main()
{
string test = "<html>wowzers description: none <div>description:a1fj391</div></html>";
IEnumerable<string> results = getDescriptions(test);
foreach (string result in results)
{
Console.WriteLine(result);
}
//result: none
// a1fj391
}
static Regex MyRegex = new Regex(
"description:\\s*(?<value>[\\d\\w]+)",
RegexOptions.Compiled);
IEnumerable<string> getDescriptions(string html)
{
foreach(Match match in MyRegex.Matches(html))
{
yield return match.Groups["value"].Value;
}
}
Adapted From Code Project
string value = "ABC - UPDATED";
int index = value.IndexOf(" - UPDATED");
if (index != -1)
{
value = value.Remove(index);
}
After this, value is "ABC", without the " - UPDATED" suffix.

Format String "Hello\World" to "HelloWorld"

I loop over the value in the first column of each row of a DataGridView. The values have a "\" in the middle. How do we convert the string so that it no longer contains the "\"?
ex.
"Hello\World" to "HelloWorld"
"Hi\There" to "HiThere"
etc
String handling
string hello = "Hello\\World";
string helloWithoutBackslashes = hello.Replace("\\",string.Empty);
or, using a verbatim string literal (the @ prefix)
string hi = @"Hi\There";
string hiWithoutBackslashes = hi.Replace(@"\",string.Empty);
I thought I would mix it up a bit.
public class StringCleaner
{
private readonly string dirtyString;
public StringCleaner(string dirtyString)
{
this.dirtyString = dirtyString;
}
public string Clean()
{
using (var sw = new System.IO.StringWriter())
{
foreach (char c in dirtyString)
{
if (c != '\\') sw.Write(c);
}
return sw.ToString();
}
}
}
