Counting/sorting characters in a text file - c#

I am trying to write a program that reads a text file, sorts it by character, and keeps track of how many times each character appears in the document. This is what I have so far.
class Program
{
static void Main(string[] args)
{
CharFrequency[] Charfreq = new CharFrequency[128];
try
{
string line;
System.IO.StreamReader file = new System.IO.StreamReader(#"C:\Users\User\Documents\Visual Studio 2013\Projects\Array_Project\wap.txt");
while ((line = file.ReadLine()) != null)
{
int ch = file.Read();
if (Charfreq.Contains(ch))
{
}
}
file.Close();
Console.ReadLine();
}
catch (Exception e)
{
Console.WriteLine("The process failed: {0}", e.ToString());
}
}
}
My question is, what should go in the if statement here?
I also have a Charfrequency class, which I'll include here in case it is helpful/necessary that I include it (and yes, it is necessary that I use an array versus a list or arraylist).
public class CharFrequency
{
private char m_character;
private long m_count;
public CharFrequency(char ch)
{
Character = ch;
Count = 0;
}
public CharFrequency(char ch, long charCount)
{
Character = ch;
Count = charCount;
}
public char Character
{
set
{
m_character = value;
}
get
{
return m_character;
}
}
public long Count
{
get
{
return m_count;
}
set
{
if (value < 0)
value = 0;
m_count = value;
}
}
public void Increment()
{
m_count++;
}
public override bool Equals(object obj)
{
bool equal = false;
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
if (this.Character == cf.Character)
equal = true;
return equal;
}
public override int GetHashCode()
{
return m_character.GetHashCode();
}
public override string ToString()
{
String s = String.Format("'{0}' ({1}) = {2}", m_character, (byte)m_character, m_count);
return s;
}
}

Have a look at this post.
https://codereview.stackexchange.com/questions/63872/counting-the-number-of-character-occurrences
It uses LINQ to achieve your goal

You shouldn't use Contains
first you need to initialize your Charfreq array:
CharFrequency[] Charfreq = new CharFrequency[128];
for (int i = 0; i < Charferq.Length; i++)
{
Charfreq[i] = new CharFrequency((char)i);
}
try
then you can
int ch;
// -1 means that there are no more characters to read,
// otherwise ch is the char read
while ((ch = file.Read()) != -1)
{
CharFrequency cf = new CharFrequency((char)ch);
// This works because CharFrequency overloads the
// Equals method, and the Equals method checks only
// for the Character property of CharFrequency
int ix = Array.IndexOf(Charfreq, cf);
// if there is the "right" charfrequency
if (ix != -1)
{
Charfreq[ix].Increment();
}
}
Note that this isn't the way I would write the program. This is the minimum changes needed to make your program working.
As a sidenote, this program will count the "frequency" of ASCII characters (characters with code <= 127)
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
And this is an useless initialization:
CharFrequency cf = (CharFrequency)obj;
is enough, otherwise you are creating a CharFrequency just to discard it the line below.

A dictionary is well suited for a task like this. You didn't say which character set and encoding the file was in. So, because Unicode is so common, let's assume the Unicode character set and UTF-8 encoding. (After all, it is the default for .NET, Java, JavaScript, HTML, XML,….) If that's not the case then read the file using the applicable encoding and fix your code because you currently are using UTF-8 in your StreamReader.
Next comes iterating across the "characters". And then incrementing the count for a "character" in the dictionary as it is seen in the text.
Unicode does have a few complex features. One is combining characters, where a base character can be overlaid with diacritics etc. Users view such combinations as one "character", or, as Unicode calls them, graphemes. Thankfully, .NET gives is the StringInfo class that iterates over them as a "text element."
So, if you think about it, using an array would be quite difficult. You'd have to build your own dictionary on top of your array.
The example below uses a Dictionary and is runnable using a LINQPad script. After it creates the dictionary, it orders and dumps it with a nice display.
var path = Path.GetTempFileName();
// Get some text we know is encoded in UTF-8 to simplify the code below
// and contains combining codepoints as a matter of example.
using (var web = new WebClient())
{
web.DownloadFile("http://superuser.com/questions/52671/which-unicode-characters-do-smilies-like-%D9%A9-%CC%AE%CC%AE%CC%83-%CC%83%DB%B6-consist-of", path);
}
// since the question asks to analyze a file
var content = File.ReadAllText(path, Encoding.UTF8);
var frequency = new Dictionary<String, int>();
var itor = System.Globalization.StringInfo.GetTextElementEnumerator(content);
while (itor.MoveNext())
{
var element = (String)itor.Current;
if (!frequency.ContainsKey(element))
{
frequency.Add(element, 0);
}
frequency[element]++;
}
var histogram = frequency
.OrderByDescending(f => f.Value)
// jazz it up with the list of codepoints in each text element
.Select(pair =>
{
var bytes = Encoding.UTF32.GetBytes(pair.Key);
var codepoints = new UInt32[bytes.Length/4];
Buffer.BlockCopy(bytes, 0, codepoints, 0, bytes.Length);
return new {
Count = pair.Value,
textElement = pair.Key,
codepoints = codepoints.Select(cp => String.Format("U+{0:X4}", cp) ) };
});
histogram.Dump(); // For use in LINQPad

Related

How to display a string up until a certain special character in a textbox from a file

I would want to know how do I display strings from a textfile in a textbox but only untill it reaches a '#" sign in the textfile in c# ?
string lines = outputToBox.ReadToEnd();
//outputToBox is streamreader var that holds the conent of the file
int index = lines.IndexOf('#');
txtDisplay.Text = lines.Substring(0, index);
The problem I now have is that it does not display all the lines in the textbox only the first one
It would help if you included an example of what you have and what you want it to look like. I assume your input looks something like this
x.field1#x.field2#x.field3
y.field1#y.field2#y.field3
z.field1#z.field2#z.field3
If there are multiple lines in the textbox you could turn it into an array and then run foreach through them (if you need an example I can show you)
string[] fileInput = System.IO.File.ReadAllLines(#FILE_PATH)
it would output like this
fileInput[0] = x.field1#x.field2#x.field3
then you can add
string[] myArray = fileInput[x].Split('#') // Into an array, so if you only want 'x.field1', you enter fileInput[0], and return myArray[0]
and implement your foreach. If you want very specific fields from the file that start with certain chars I recommend reading a bit about LINQ and how run small queries.
if your goal is to do this for every existing instance of a string in whatever file, you need a loop.
https://msdn.microsoft.com/en-us/library/bb308959.aspx (LINQ)
This should do the trick and is probably the optimal solution, without more information >:D.
void Main()
{
Debug.Print(Homework_Problem_Number_54_000_601.
Parse("NAME SURNAME NUMBER #2131231313"));
//this prints "NAME SURNAME NUMBER " without the quotes to the console
}
void SetTextBoxText(TextBox textBox, string value) { textBox.Text = value; }
unsafe static class Homework_Problem_Number_54_000_601
{
[ThreadStatic]static StringBuilder __builder;
public static string Parse(string textToParse)
{
if (textToParse == null) return null;
var builder = __builder = __builder ?? new StringBuilder();
builder.clear();
fixed (char* pTextToParse = textToParse)
{
var pTerminus = pTextToParse + textToParse.Length;
for (var pChar = pTextToParse; pChar < pTerminus; ++pChar)
{
switch (*pChar)
{
case '#':
return builder.ToString();
break;
default:
builder.Append(*pChar);
break;
}
}
}
throw new ArgumentException("textToParse was not in the expected format");
}
}
As for reading from the file, that's hard to say without a file format specification from you, but now that you have posted code I'd try this:
string lines = outputToBox.ReadToEnd();
StringBuilder builder = new StringBuilder();
foreach (string line in lines.Split('\n'))
{
int index = line.IndexOf('#');
if (index != -1) builder.AppendLine(line.Substring(0, index));
}
txtDisplay.Text = builder.ToString();
Don't forget to switch the TextBox to multi-line mode if needed.

C# linking hint with a random word from dictionary

I am displaying parts of a word by using:
public string GetPartialWord(string word)
{
if (string.IsNullOrEmpty(word))
{
return string.Empty;
}
char[] partialWord = word.ToCharArray();
int numberOfCharsToHide = word.Length / 2;
Random randomNumberGenerator = new Random();
HashSet<int> maskedIndices = new HashSet<int>();
for (int i = 0; i < numberOfCharsToHide; i++)
{
int rIndex = randomNumberGenerator.Next(0, word.Length);
while (!maskedIndices.Add(rIndex))
{
rIndex = randomNumberGenerator.Next(0, word.Length);
}
partialWord[rIndex] = '_';
}
return new string(partialWord);
}
Therefore: the word game would look like: _a_e
I am thinking of making adding a hint button to display another character. Any ideas on how to proceed?
G_m_ -> Hint -> G_me
You can use something like this:
public string GetWordAfterHint(string wordToProcess, string originalWord)
{
List<int> emptyIndexes = new List<int>();
for (int a = 0; a < wordToProcess.Length; a++)
{
if (wordToProcess[a] == '_')
{
emptyIndexes.Add(a);
}
}
// in case if word doesn't have empty positions
if (emptyIndexes.Count == 0)
{
return wordToProcess;
}
Random random = new Random();
var indexForLetter = random.Next(emptyIndexes.Count);
// create stringBuilder from string, because string is immutable and you can't change separate symbol
StringBuilder sb = new StringBuilder(wordToProcess);
// insert symbol from originalWord in empty previously generated position
sb[emptyIndexes[indexForLetter]] = originalWord[emptyIndexes[indexForLetter]]; //
//convert stringBuilder to string and return
return sb.ToString();
}
Method returns word after hint - if as wordToProcess you pass "_a_e" and as originalWord "game" then method returns ga_e or _ame.
Store the indexes of the characters displayed in an array or list. When the hint button is pressed, compute a random index. Compare that index with the indexes of letters already being displayed, and recalculate a new random index if necessary.
An Object Oriented approach:
From your GePartialWord method, return an instance of this class instead of a simple string:
public class GameWord
{
public string OriginalWord { get; set; }
public string GuessWord { get; set; }
public string Hint()
{
int index = this.GuessWord.IndexOf('_');
if (index != -1)
{
var builder = new StringBuilder(this.GuessWord);
builder[index] = this.OriginalWord[index];
this.GuessWord = builder.ToString();
return this.GuessWord; // if needed
}
// No more hints, the world has no underscores
return this.GuessWord;
}
}
So in your method you will do this instead:
public GameWord GetPartialWord(string word)
{
// The rest of your code
// Change this line return new string(partialWord);
// to this
return new GameWord{ OriginalWord = word, GuessWord = new string(partialWord)};
}
And in your form, create a private field like this:
private GameWord currentGameWord;
When you have the random word, call your GetPartialWord method and store the returned word in currentGameWord:
this.currentGameWord = GetPartialWord(someWord);
And because the method now returns an object, bind your textbox like this:
this.textBox1.Text = this.currentGameWord.GuessWord;
And in your button's click handler do this (your handler will have a different name):
private void HintButton_Click(object sender, EventArgs e)
{
this.textBox1.Text = this.currentGameWord.Hint();
}

How remove some special words from a string content?

I have some strings containing code for emoji icons, like :grinning:, :kissing_heart:, or :bouquet:. I'd like to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
However, there are 856 different emoji icons I have to remove (which, using this method, would take 856 calls to Replace()). Is there any other way to accomplish this?
You can use Regex to match the word between :anything:. Using Replace with function you can make other validation.
string pattern = #":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you don't want to use Replace that make use of a lambda expression, you can use \w, as #yorye-nathan mentioned, to match only words.
string pattern = #":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
i would solve it that way
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
UPDATE - refering to Detail's Comment
Another approach: replace only existing Emojs
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, #":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
But i'm not sure if it's that big difference for such short chat-strings and it's more important to have code that's easy to read/maintain. OP's question was about reducing redundancy Text.Replace().Text.Replace() and not about the most efficient solution.
I would use a combination of some of the techniques already suggested. Firstly, I'd store the 800+ emoji strings in a database and then load them up at runtime. Use a HashSet to store these in memory, so that we have a O(1) lookup time (very fast). Use Regex to pull out all potential pattern matches from the input and then compare each to our hashed emoji, removing the valid ones and leaving any non-emoji patterns the user has entered themselves...
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = #":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
Which results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:,
but valid :nonetheless:
You do not have to replace all 856 emoji's. You only have to replace those that appear in the string. So have a look at:
Finding a substring using C# with a twist
Basically you extract all tokens ie the strings between : and : and then replace those with string.Empty()
If you are concerned that the search will return strings that are not emojis such as :some other text: then you could have a hash table lookup to make sure that replacing said found token is appropriate to do.
Finally got around to write something up. I'm combining a couple previously mentioned ideas, with the fact we should only loop over the string once. Based on those requirement, this sound like the perfect job for Linq.
You should probably cache the HashSet. Other than that, this has O(n) performance and only goes over the list once. Would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straight forwards.
First load all Emoij in a HashSet so we can quickly look them up.
Split the string with input.Split(':') at the :.
Decide if we keep the current element.
If the last element was a match, keep the current element.
If the last element was no match, check if the current element matches.
If it does, ignore it. (This effectively removes the substring from the output).
If it doesn't, append : back and keep it.
Rebuild our string with a StringBuilder.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
Edit: I did something cool using the Rx library, but then realized Aggregate is the IEnumerable counterpart of Scan in Rx, thus simplifying the code even more.
If efficiency is a concern and to avoid processing "false positives", consider rewriting the string using a StringBuilder while skipping the special emoji tokens:
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//Add the rest of the text
sb.Append(input.Substring(startIndex));
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Outputs:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Use this code I put up below I think using this function your problem will be solved.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
I'd use extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This method show be better in terms of performance compare to one with Regex
I would split the text with the ':' and then build the string excluding the found emoji names.
const char marker = ':';
var textSections = text.Split(marker);
var emojiRemovedText = string.Empty;
var notMatchedCount = 0;
textSections.ToList().ForEach(section =>
{
if (emojiNames.Contains(section))
{
notMatchedCount = 0;
}
else
{
if (notMatchedCount++ > 0)
{
emojiRemovedText += marker.ToString();
}
emojiRemovedText += section;
}
});

C# Create Acronym from Word

Given any string, I'd like to create an intelligent acronym that represents the string. If any of you have used JIRA, they accomplish this pretty well.
For example, given the word: Phoenix it would generate PHX or given the word Privacy Event Management it would create PEM.
I've got some code that will accomplish the latter:
string.Join(string.Empty, model.Name
.Where(char.IsLetter)
.Where(char.IsUpper))
This case doesn't handle if there is only one word and its lower case either.
but it doesn't account for the first case. Any ideas? I'm using C# 4.5
For the Phoenix => PHX, I think you'll need to check the strings against a dictionary of known abbreviations. As for the multiple word/camel-case support, regex is your friend!
var text = "A Big copy DayEnergyFree good"; // abbreviation should be "ABCDEFG"
var pattern = #"((?<=^|\s)(\w{1})|([A-Z]))";
string.Join(string.Empty, Regex.Matches(text, pattern).OfType<Match>().Select(x => x.Value.ToUpper()))
Let me explain what's happening here, starting with the regex pattern, which covers a few cases for matching substrings.
// must be directly after the beginning of the string or line "^" or a whitespace character "\s"
(?<=^|\s)
// match just one letter that is part of a word
(\w{1})
// if the previous requirements are not met
|
// match any upper-case letter
([A-Z])
The Regex.Matches method returns a MatchCollection, which is basically an ICollection so to use LINQ expressions, we call OfType() to convert the MatchCollection into an IEnumerable.
Regex.Matches(text, pattern).OfType<Match>()
Then we select only the value of the match (we don't need the other regex matching meta-data) and convert it to upper-case.
Select(x => x.Value.ToUpper())
I was able to extract out the JIRA key generator and posted it here. pretty interesting, and even though its JavaScript it could easily be converted to c#.
Here is a simple function that generates an acronym. Basically it puts letters or numbers into the acronym when there is a space before of this character. If there are no spaces in the string the the string is returned back. It does not capitalize letters in the acronym, but it is easy to amend.
You can just copy it in your code and start using it.
Results are the following. Just an example:
Deloitte Private Pty Ltd - DPPL
Clearwater Investment Co Pty Ltd (AC & CC Family Trust) - CICPLACFT
ASIC - ASIC
private string Acronym(string value)
{
if (string.IsNullOrWhiteSpace(value))
{
return value;
} else
{
var builder = new StringBuilder();
foreach(char c in value)
{
if (char.IsWhiteSpace(c) || char.IsLetterOrDigit(c))
{
builder.Append(c);
}
}
string trimmedValue = builder.ToString().Trim();
builder.Clear();
if (trimmedValue.Contains(' '))
{
for(int charIndex = 0; charIndex < trimmedValue.Length; charIndex++)
{
if (charIndex == 0)
{
builder.Append(trimmedValue[0]);
} else
{
char currentChar = trimmedValue[charIndex];
char previousChar = trimmedValue[charIndex - 1];
if (char.IsLetterOrDigit(currentChar) && char.IsWhiteSpace(previousChar))
{
builder.Append(trimmedValue[charIndex]);
}
}
}
return builder.ToString();
} else
{
return trimmedValue;
}
}
}
I need a not repeating code,So I create the follow method.
If you use like this,you will get
HashSet<string> idHashSet = new HashSet<string>();
for (int i = 0; i < 100; i++)
{
var eName = "China National Petroleum";
Console.WriteLine($"count:{i+1},short name:{GetIdentifierCode(eName,ref idHashSet)}");
}
the method is this.
/// <summary>
/// 根据英文名取其简写Code,优先取首字母,然后在每个单词中依次取字母作为Code,最后若还有重复则使用默认填充符(A)填充
/// todo 当名称为中文时,使用拼音作为取Code的源
/// </summary>
/// <param name="name"></param>
/// <param name="idHashSet"></param>
/// <returns></returns>
public static string GetIdentifierCode(string name, ref HashSet<string> idHashSet)
{
var words = name;
var fillChar = 'A';
if (string.IsNullOrEmpty(words))
{
do
{
words += fillChar.ToString();
} while (idHashSet.Contains(words));
}
//if (IsChinese)
//{
// words = GetPinYin(words);
//}
//中国石油天然气集团公司(China National Petroleum)
var sourceWord = new List<string>(words.Split(' '));
var returnWord = sourceWord.Select(c => new List<char>()).ToList();
int index = 0;
do
{
var listAddWord = sourceWord[index];
var addWord = returnWord[index];
//最后若还有重复则使用默认填充符(A)填充
if (sourceWord.All(c => string.IsNullOrEmpty(c)))
{
returnWord.Last().Add(fillChar);
continue;
}
//字符取完后跳过
else if (string.IsNullOrEmpty(listAddWord))
{
if (index == sourceWord.Count - 1)
index = 0;
else
{
index++;
}
continue;
}
if (addWord == null)
addWord = new List<char>();
string addString = string.Empty;
//字符全为大写时,不拆分
if (listAddWord.All(a => char.IsUpper(a)))
{
addWord = listAddWord.ToCharArray().ToList();
returnWord[index] = addWord;
addString = listAddWord;
}
else
{
addString = listAddWord.First().ToString();
addWord.Add(listAddWord.First());
}
listAddWord = listAddWord.Replace(addString, "");
sourceWord[index] = listAddWord;
if (index == sourceWord.Count - 1)
index = 0;
else
{
index++;
}
} while (idHashSet.Contains(string.Concat(returnWord.SelectMany(c => c))));
words = string.Concat(returnWord.SelectMany(c => c));
idHashSet.Add(words);
return words;

compare the characters in two strings

In C#, how do I compare the characters in two strings.
For example, let's say I have these two strings
"bc3231dsc" and "bc3462dsc"
How do I programically figure out the the strings
both start with "bc3" and end with "dsc"?
So the given would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each characters from var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be a length of 9 character.
2. The strings are not case sensitive.
The string class has 2 methods (StartsWith and Endwith) that you can use.
After reading your question and the already given answers i think there are some constraints are missing, which are maybe obvious to you, but not to the community. But maybe we can do a little guess work:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length or you are only interested by comparing the characters read simultaneously from left to right.
Get some kind of enumeration that tells me where each block starts and how long it is.
Due to the fact, that a string is only a enumeration of chars you could use LINQ here to get an idea of the matching characters like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and now find out how long each block of consecutive true and false flags are with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
Currently i don't know if there is an already existing LINQ function that provides the same functionality. As far as i have already read it should be possible with SelectMany() (cause in theory you can accomplish any LINQ task with this method), but as an adhoc implementation the above was easier (for me).
These functions could then be used in a way something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would you then give this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is in someway the usage of Tuple cause it leads to the ugly property names Item1 and Item2 which are far away from being instantly readable. But if it is really wanted you could introduce your own simple class holding two integers and has some rock-solid property names. Also currently the information is lost about if each block is shared by both strings or if they are different. But once again it should be fairly simply to get this information also into the tuple or your own class.
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string tes2 = "bc3462dsc";
string firstmatch = GetMatch(test1, tes2, false);
string lasttmatch = GetMatch(test1, tes2, true);
string center1 = test1.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
string center2 = test2.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
}
public static string GetMatch(string fist, string second, bool isReverse)
{
if (isReverse)
{
fist = ReverseString(fist);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
char[] ar1 = fist.ToArray();
for (int i = 0; i < ar1.Length; i++)
{
if (fist.Length > i + 1 && ar1[i].Equals(second[i]))
{
builder.Append(ar1[i]);
}
else
{
break;
}
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
Pseudo code of what you need..
int stringpos = 0
string resultstart = ""
while not end of string (either of the two)
{
if string1.substr(stringpos) == string1.substr(stringpos)
resultstart =resultstart + string1.substr(stringpos)
else
exit while
}
resultstart has you start string.. you can do the same going backwards...
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts in bc3 and ends in dsc with anything between except a linefeed. By changing .*? to \d, you could specify that you only want digits between the two fields. From there, the possibilities are endless.
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
for UPDATE CONDITION
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>0 && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}

Categories