Best way to extract two numbers and following letter from string - c#

Say we have a string, called source. It contains "New York City - 12A - 1234B"
Here are the rules:
a. We know that the two digits closest to the beginning of the string should be kept, along with the character that follows them, and placed into a separate string called results;
b. We are not certain if this following character will be a number or a letter
c. The formatting of the string itself varies - it could be "NY 12A 1234B"
d. We couldn't care less about anything else!
Now I, in my infinite wisdom, have crafted this monstrosity. It works, but please tell me there is a better way to do this, or at least a cleaner, more performance-conscious way of doing it.
class Program
{
public static int i = 0;
public static int q = 0;
public static int x = 0;
public static string source = "New York City - 12A - 1234B";
public static string results = "";
public static char[] from_source_char;
public static List<string> from_source_list = new List<string>();
static void Main(string[] args)
{
from_source_char = source.ToCharArray();
foreach (char unit in from_source_char)
{
from_source_list.Add(unit.ToString());
}
Console.WriteLine("Doing while " + i.ToString() + " < " + (from_source_list.Count() - 1).ToString());
while (i < from_source_list.Count() - 1)
{
Console.WriteLine("i is at " + i.ToString());
Console.WriteLine("Examining " + from_source_list[i].ToString());
try
{
q = Convert.ToInt32(from_source_list[i]);
results += from_source_list[i].ToString();
Console.WriteLine("Found part 1!");
x++;
}
catch
{
Console.WriteLine("Disregarding " + from_source_list[i].ToString());
// do nothing
}
if (x == 2)
{
Console.WriteLine("Found final part! " + from_source_char[i+1].ToString());
results += from_source_char[i+1].ToString();
break;
}
i++;
}
Console.WriteLine("Result is " + results.ToString());
Thread.Sleep(999999);
}
}

You could use a Regex with this pattern: @"^.*?(?<numbers>\d{2}\w).*$".
Example:
var f = @"^.*?(?<numbers>\d{2}\w).*$";
var match = Regex.Match("NY 12A 1234B", f);
var result = match.Groups["numbers"].Value;
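If the input might not contain two digits at all, it is probably worth checking match.Success before reading the group. Here is a minimal sketch of that idea; the names TwoDigitExtractor and ExtractCode are made up for illustration, and the pattern is shortened to \d{2}\w on the assumption that Regex.Match returning the leftmost match is what you want:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoDigitExtractor
{
    // Two digits followed by one word character (letter, digit, or underscore).
    private static readonly Regex Pattern = new Regex(@"\d{2}\w");

    // Hypothetical helper: returns the leftmost match, or null when there is none.
    public static string ExtractCode(string source)
    {
        Match match = Pattern.Match(source);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractCode("New York City - 12A - 1234B")); // 12A
        Console.WriteLine(ExtractCode("NY 12A 1234B"));                // 12A
    }
}
```

Returning null for "no match" is only one option; throwing or returning an empty string would work just as well, depending on what the caller expects.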

Another version without regex:
char a = source.First(pos => char.IsDigit(pos));
int b = source.IndexOf(a);
string result = source.Substring(b, 3);
Console.WriteLine(result);
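Note that the snippet above starts at the first digit it finds and does not check that a second digit follows, and Substring will throw if fewer than three characters remain. A rough sketch of a stricter scan, assuming "two numbers" means two consecutive digits (DigitScanner and FirstTwoDigitsAndNext are hypothetical names):

```csharp
using System;

public static class DigitScanner
{
    // Returns the first pair of consecutive digits plus the character after it,
    // or null when the string contains no such three-character run.
    public static string FirstTwoDigitsAndNext(string source)
    {
        for (int i = 0; i + 2 < source.Length; i++)
        {
            if (char.IsDigit(source[i]) && char.IsDigit(source[i + 1]))
            {
                return source.Substring(i, 3);
            }
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstTwoDigitsAndNext("NY 12A 1234B")); // 12A
        Console.WriteLine(FirstTwoDigitsAndNext("A 1 12B"));      // 12B
    }
}
```

Unlike the regex version, this accepts any character after the two digits, including a space or a dash.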

Find How many times an Anagram is contained in a String - asp.net c#

I need to find how many times the Anagrams are contained in a String like in this example:(the anagrams and the string itself)
Input 1(String) = thegodsanddogsweredogged
Input 2(String) = dog
Output(int) = 3
The output is 3 because of these matches: "god", "dog", and "dog" (thegodsanddogsweredogged).
So far I have managed to count how many times the word "dog" is contained in the string:
public ActionResult FindAnagram (string word1, string word2)
{
int ?count = Regex.Matches(word1, word2).Count;
return View(count);
}
This counts how many times the word itself is contained, but I still get an error: Cannot convert null to 'int' because it is a non-nullable value type.
So I need to count how many times input 2 and its anagrams (dog, god, dgo, ogd, etc.) are contained in input 1 (in this case, 3 times).
Thank you
I wanted to post a variation which is more readable at the cost of some runtime performance. Anton's answer is probably more performant, but IMHO less readable than it could be.
The nice thing about anagrams is that you know their exact length, and you can figure out all possible anagram locations quite easily. For a 3 letter anagram in a 100 letter haystack, you know that there are 98 possible locations:
0..2
1..3
2..4
...
96..98
97..99
These indexes can be generated quite easily:
var amountOfPossibleAnagramLocations = haystack.Length - needle.Length + 1;
var substringIndexes = Enumerable.Range(0, amountOfPossibleAnagramLocations);
At this point, you simply take every listed substring and test if it's an anagram.
var anagramLength = needle.Length;
int count = 0;
foreach(var index in substringIndexes)
{
var substring = haystack.Substring(index, anagramLength);
if(substring.IsAnagramOf(needle))
count++;
}
Note that a lot of this can be condensed into a single LINQ chain:
var amountOfPossibleAnagramLocations = haystack.Length - needle.Length + 1;
var anagramLength = needle.Length;
var anagramCount = Enumerable
.Range(0, amountOfPossibleAnagramLocations)
.Select(x => haystack.Substring(x, anagramLength))
.Count(substring => substring.IsAnagramOf(needle));
Whether it's more readable or not depends on how comfortable you are with LINQ. I personally prefer it (up to a reasonable size, of course).
To check for an anagram, simply sort the characters and check for equality. I used an extension method for the readability bonus:
public static bool IsAnagramOf(this string word1, string word2)
{
var word1Sorted = String.Concat(word1.OrderBy(c => c));
var word2Sorted = String.Concat(word2.OrderBy(c => c));
return word1Sorted == word2Sorted;
}
I've omitted things like case insensitivity or ignoring whitespace for the sake of brevity.
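For what it's worth, here is one possible way to add the case insensitivity and whitespace handling mentioned above. It is only a sketch: the names AnagramExtensions, SortedKey and IsAnagramOfIgnoringCase are invented for this example, and upper-casing with the invariant culture is an assumption that may not suit every alphabet.

```csharp
using System;
using System.Linq;

public static class AnagramExtensions
{
    // Drops whitespace, upper-cases every character, and sorts the result,
    // so two anagrams end up with the same key.
    private static string SortedKey(string word)
    {
        return string.Concat(
            word.Where(c => !char.IsWhiteSpace(c))
                .Select(c => char.ToUpperInvariant(c))
                .OrderBy(c => c));
    }

    public static bool IsAnagramOfIgnoringCase(this string word1, string word2)
    {
        return SortedKey(word1) == SortedKey(word2);
    }

    public static void Main()
    {
        Console.WriteLine("Dog".IsAnagramOfIgnoringCase("g o d")); // True
        Console.WriteLine("dog".IsAnagramOfIgnoringCase("dig"));   // False
    }
}
```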
It would be better not to use Regex here and to write your own logic instead.
You can use a dictionary whose key is a char (a letter of the word) and whose value is an int (the number of times that letter occurs), and build such a dictionary for the word.
Anagrams have identical dictionaries, so you can build a temporary dictionary for each substring produced by sliding a window over your str and compare it with the dictionary built for your word.
Here is my code:
using System;
using System.Collections.Generic;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var str = "thegodsanddogsweredogged";
var word = "dog";
Console.WriteLine("Word: " + word);
Console.WriteLine("Str: " + str);
Console.WriteLine();
var count = CountAnagrams(str, word);
Console.WriteLine("Count: " + count);
Console.ReadKey();
}
private static int CountAnagrams(string str, string word) {
var charDict = BuildCharDict(word);
int count = 0;
for (int i = 0; i < str.Length - word.Length + 1; i++) {
string tmp = "";
for (int j = i; j < str.Length; j++) {
tmp += str[j];
if (tmp.Length == word.Length)
break;
}
var tmpCharDict = BuildCharDict(tmp);
if (CharDictsEqual(charDict, tmpCharDict)) {
count++;
Console.WriteLine("Anagram: " + tmp);
Console.WriteLine("Index: " + i);
Console.WriteLine();
}
}
return count;
}
private static Dictionary<char, int> BuildCharDict(string word) {
var charDict = new Dictionary<char, int>();
foreach (var ch in word)
{
if (charDict.ContainsKey(ch))
{
charDict[ch] += 1;
}
else
{
charDict[ch] = 1;
}
}
return charDict;
}
private static bool CharDictsEqual(Dictionary<char, int> dict1, Dictionary<char, int> dict2)
{
if (dict1.Count != dict2.Count)
return false;
foreach (var kv in dict1) {
if (!dict2.TryGetValue(kv.Key, out var val) || val != kv.Value)
{
return false;
}
}
return true;
}
}
}
Possibly there is a better solution, but mine works.
P.S. About your error: change int? to int, because your View probably expects a non-nullable int.
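The code above rebuilds a dictionary for every window. If that ever becomes a bottleneck, the counts can be updated incrementally as the window slides. This is a sketch of that variation, not a measured improvement; SlidingAnagramCounter, Count and Adjust are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class SlidingAnagramCounter
{
    public static int Count(string haystack, string needle)
    {
        if (needle.Length == 0 || haystack.Length < needle.Length) return 0;

        // balance[c] = occurrences the needle requires minus occurrences in the window.
        var balance = new Dictionary<char, int>();
        foreach (char c in needle) Adjust(balance, c, 1);

        int count = 0;
        for (int i = 0; i < haystack.Length; i++)
        {
            Adjust(balance, haystack[i], -1);                  // character entering the window
            if (i >= needle.Length)
            {
                Adjust(balance, haystack[i - needle.Length], 1); // character leaving the window
            }
            // Once the window is full, an empty dictionary means every count matched.
            if (i >= needle.Length - 1 && balance.Count == 0) count++;
        }
        return count;
    }

    // Changes one counter and removes the entry when it returns to zero.
    private static void Adjust(Dictionary<char, int> balance, char c, int delta)
    {
        balance.TryGetValue(c, out int current);
        int updated = current + delta;
        if (updated == 0) balance.Remove(c);
        else balance[c] = updated;
    }

    public static void Main()
    {
        Console.WriteLine(Count("thegodsanddogsweredogged", "dog")); // 3
    }
}
```

Each character of the haystack is touched a constant number of times, so the work no longer grows with the length of the word.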

Thousands separator after the decimal point [duplicate]

I wonder what would be the best way to format numbers so that the NumberGroupSeparator would work not only on the integer part to the left of the comma, but also on the fractional part, on the right of the comma.
Math.PI.ToString("###,###,##0.0##,###,###,###") // As documented ..
// ..this doesn't work
3.14159265358979 // result
3.141,592,653,589,79 // desired result
As documented on MSDN the NumberGroupSeparator works only to the left of the comma. I wonder why??
A little clunky, and it won't work for scientific numbers but here is a try:
class Program
{
static void Main(string[] args)
{
var π=Math.PI*10000;
Debug.WriteLine(Display(π));
// 31,415.926,535,897,931,899
}
static string Display(double x)
{
int s=Math.Sign(x);
x=Math.Abs(x);
StringBuilder text=new StringBuilder();
var y=Math.Truncate(x);
text.Append((s*y).ToString("#,#"));
x-=y;
if (x>0)
{
// 15 decimal places is max reasonable precision
y=Math.Truncate(x*Math.Pow(10, 15));
text.Append(".");
text.Append(y.ToString("#,#").TrimEnd('0'));
}
return text.ToString();
}
}
It might be best to work with the string generated by your .ToString():
class Program
{
static string InsertSeparators(string s)
{
string decSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator;
int separatorPos = s.IndexOf(decSeparator);
if (separatorPos >= 0)
{
string decPart = s.Substring(separatorPos + decSeparator.Length);
// split the string into parts of 3 or less characters
List<String> parts = new List<String>();
for (int i = 0; i < decPart.Length; i += 3)
{
string part = "";
for (int j = 0; (j < 3) && (i + j < decPart.Length); j++)
{
part += decPart[i + j];
}
parts.Add(part);
}
string groupSeparator = System.Threading.Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator;
s = s.Substring(0, separatorPos) + decSeparator + String.Join(groupSeparator, parts);
}
return s;
}
static void Main(string[] args)
{
for (int n = 0; n < 15; n++)
{
string s = Math.PI.ToString("0." + new string('#', n));
Console.WriteLine(InsertSeparators(s));
}
Console.ReadLine();
}
}
Outputs:
3
3.1
3.14
3.142
3.141,6
3.141,59
3.141,593
3.141,592,7
3.141,592,65
3.141,592,654
3.141,592,653,6
3.141,592,653,59
3.141,592,653,59
3.141,592,653,589,8
3.141,592,653,589,79
OK, not my strong side, but I guess this may be my best bet:
string input = Math.PI.ToString();
string decSeparator = System.Threading.Thread.CurrentThread
.CurrentCulture.NumberFormat.NumberGroupSeparator;
Regex RX = new Regex(@"([0-9]{3})");
string result = RX.Replace(input , @"$1" + decSeparator);
Thanks for listening..
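One caveat with the regex attempt above: it runs over the whole string, so it can also hit the integer part and can leave a trailing separator. A possible refinement, sketched under the assumption that only the digits after the decimal separator should be grouped (FractionGrouping and GroupFraction are names made up for this example):

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FractionGrouping
{
    // Formats value with up to `decimals` fractional digits and inserts the
    // culture's group separator after every third fractional digit.
    public static string GroupFraction(double value, int decimals, CultureInfo culture)
    {
        NumberFormatInfo nfi = culture.NumberFormat;
        string text = value.ToString("#,0." + new string('#', decimals), culture);

        int pos = text.IndexOf(nfi.NumberDecimalSeparator, StringComparison.Ordinal);
        if (pos < 0) return text; // no fractional part

        int fractionStart = pos + nfi.NumberDecimalSeparator.Length;
        string head = text.Substring(0, fractionStart);
        string fraction = text.Substring(fractionStart);

        // The lookahead (?=\d) skips the last group, so no separator is left dangling.
        string grouped = Regex.Replace(fraction, @"\d{3}(?=\d)",
            m => m.Value + nfi.NumberGroupSeparator);
        return head + grouped;
    }

    public static void Main()
    {
        Console.WriteLine(GroupFraction(1234.5678, 6, CultureInfo.InvariantCulture)); // 1,234.567,8
        Console.WriteLine(GroupFraction(Math.PI, 14, CultureInfo.InvariantCulture));
    }
}
```

Passing the culture explicitly keeps the output predictable; CultureInfo.CurrentCulture could be passed instead to follow the user's settings.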

C# How to generate a new string based on multiple ranged index

Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(@"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(@"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(@"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long strings the performance cost of concatenation can be severe.
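To illustrate the first and last notes together, here is a rough sketch that stores the edits in a list, sorts them by start index, and builds the result with a StringBuilder. It uses plain Split calls rather than the regexes above, does no validation, and still assumes the ranges don't overlap; FuriganaFormatter and Annotate are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FuriganaFormatter
{
    public static string Annotate(string spec)
    {
        int comma = spec.IndexOf(',');
        string word = spec.Substring(0, comma);

        // Parse each "start(-end):furigana" item, then sort by start index
        // so the edits may be listed in any order.
        var edits = spec.Substring(comma + 1)
            .Split(';')
            .Select(item =>
            {
                string[] parts = item.Split(':');
                string[] range = parts[0].Split('-');
                int start = int.Parse(range[0]);
                int end = range.Length > 1 ? int.Parse(range[1]) : start;
                return new { Start = start, End = end, Furigana = parts[1] };
            })
            .OrderBy(e => e.Start)
            .ToList();

        var text = new StringBuilder();
        int cursor = 0;
        foreach (var edit in edits)
        {
            text.Append(word.Substring(cursor, edit.Start - cursor)); // untouched characters
            if (text.Length > 0) text.Append(' ');                    // marker before the kanji
            text.Append(word.Substring(edit.Start, edit.End - edit.Start + 1));
            text.Append('[').Append(edit.Furigana).Append(']');
            cursor = edit.End + 1;
        }
        text.Append(word.Substring(cursor)); // whatever follows the last edit
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Annotate("ABCDEF,2-3:test2;1:test")); // A B[test] CD[test2]EF
    }
}
```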
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary characters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters of the English alphabet if you wanted to use English instead-
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch

Increment a number within a string (C#)

I have a string with a number at the end, after a dash ("-"). I'd like to create that same string with that number incremented by 1. Pretty simple, but I'm wondering if there's a better approach to this? Thanks!
string oldString = "BA-0001-3";
int lastIndex = oldString.LastIndexOf("-");
string oldNumber = oldString.Substring(lastIndex + 1);
string oldPartialString = oldString.Substring(0, lastIndex + 1); // keep the trailing dash
int newNumber = Convert.ToInt32(oldNumber) + 1;
string newString = oldPartialString + newNumber.ToString();
Regex?
Example:
Regex.Replace("BA-0001-3", @"([A-Z]{2}-\d{4}-)(\d+)",
m => m.Groups[1].Value + (Convert.ToInt32(m.Groups[2].Value) + 1))
I would probably use my friend string.Split:
string oldString = "BA-0001-3";
string[] parts = oldString.Split('-');
parts[parts.Length-1] = (Convert.ToInt32(parts[parts.Length-1])+1).ToString();
string newString = string.Join("-", parts);
A small tweak that will perhaps make it quicker (by accessing parts.Length and subtracting 1 only once - didn't profile so it's purely a guess, and it is likely a marginal difference anyway), but above all more robust (by using int.TryParse):
string oldString = "BA-0001-3";
string[] parts = oldString.Split('-');
int number;
int lastIndex = parts.Length-1;
parts[lastIndex] = (int.TryParse(parts[lastIndex], out number) ? ++number : 1).ToString();
string newString = string.Join("-", parts);
Updated per Ahmad Mageed's comments below. This is his answer much more than it is mine now :-)
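If you'd rather avoid splitting the whole string, the LastIndexOf approach from the question can be combined with int.TryParse. A small sketch (TrailingNumber and Increment are made-up names); note that, unlike the snippet above, it returns the input unchanged when the tail is not a number instead of falling back to 1, and it does not preserve leading zeros in the tail:

```csharp
using System;

public static class TrailingNumber
{
    // Increments the number after the last dash; returns the input unchanged
    // when there is no dash or the tail is not an integer.
    public static string Increment(string value)
    {
        int dash = value.LastIndexOf('-');
        if (dash < 0) return value;

        string tail = value.Substring(dash + 1);
        if (!int.TryParse(tail, out int number)) return value;

        return value.Substring(0, dash + 1) + (number + 1);
    }

    public static void Main()
    {
        Console.WriteLine(Increment("BA-0001-3")); // BA-0001-4
        Console.WriteLine(Increment("BA-0001-9")); // BA-0001-10
    }
}
```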
I would do it the way you have it now, but for fun wanted to see if I could do it with linq.
var x = "BA-0001-3".Split('-');
var y = x.First() + "-" + x.ElementAt(1) + "-" + (Convert.ToInt32(x.Last()) + 1);
This works in LINQPad.
Edit: Obviously I'm not a pro with linq. Hopefully there will be other answers/comments on how this can be improved.
Here's an example of how it could be done with RegEx:
public void Test()
{
System.Text.RegularExpressions.Regex rx = new Regex(@"(?<prefix>.*\-)(?<digit>\d+)");
string input = "BA-0001-3";
string output = string.Empty;
int digit = 0;
if (int.TryParse(rx.Replace(input, "${digit}"), out digit))
{
output = rx.Replace(input, "${prefix}" + (digit + 1));
}
Console.WriteLine(output);
}
Using the regex (which already seems to have now been filled in with more details) I end up with something like:
var regex = new Regex(@"^(?<Category>[A-Za-z]{1,2})-(?<Code>[0-9]{4})-(?<Number>[0-9]+)$");
var newCode = regex.Replace("BA-0001-3", new MatchEvaluator(ReplaceWithIncrementedNumber));
Where the MatchEvaluator function is:
public static string ReplaceWithIncrementedNumber(Match match)
{
Debug.Assert(match.Success);
var number = Int32.Parse(match.Groups["Number"].Value);
return String.Format("{0}-{1}-{2}", match.Groups["Category"].Value, match.Groups["Code"].Value, number + 1);
}
Here is an example of a class that exposes the three parts of your "part number". Not particularly fancy (also note the absence of error checking/validation).
class Program
{
static void Main(string[] args)
{
PartNumber p1 = new PartNumber("BA-0001-3");
for (int i = 0; i < 5; i++)
{
p1.Sub++;
Debug.WriteLine(p1);
}
PartNumber p2 = new PartNumber("BA", 2, 3);
for (int i = 0; i < 5; i++)
{
p2.Sub++;
Debug.WriteLine(p2);
}
}
}
class PartNumber
{
public PartNumber(string make, int model, int sub)
{
Make = make;
Model = model;
Sub = sub;
}
public PartNumber(string part)
{
//Might want to validate the string here
string [] fields = part.Split('-');
//Are there 3 fields? Are second and third fields valid ints?
Make = fields[0];
Model = Int32.Parse(fields[1]);
Sub = Int32.Parse(fields[2]);
}
public string Make { get; set; }
public int Model { get; set; }
public int Sub { get; set; }
public override string ToString()
{
return string.Format("{0}-{1:D4}-{2}", Make, Model, Sub);
}
}
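The validation that the constructor comments ask about could look something like the sketch below. It is written as a separate helper so it stands on its own; PartNumberParser and its TryParse method are hypothetical and not part of the class above, and restricting the numeric fields to plain digits is an assumption.

```csharp
using System;
using System.Globalization;

public static class PartNumberParser
{
    // Splits "MAKE-MODEL-SUB" and validates each field.
    // Returns false (and default values) on malformed input.
    public static bool TryParse(string part, out string make, out int model, out int sub)
    {
        make = null;
        model = 0;
        sub = 0;
        if (string.IsNullOrEmpty(part)) return false;

        string[] fields = part.Split('-');
        if (fields.Length != 3 || fields[0].Length == 0) return false;

        // NumberStyles.None accepts digits only: no sign, no whitespace.
        if (!int.TryParse(fields[1], NumberStyles.None, CultureInfo.InvariantCulture, out int m)) return false;
        if (!int.TryParse(fields[2], NumberStyles.None, CultureInfo.InvariantCulture, out int s)) return false;

        make = fields[0];
        model = m;
        sub = s;
        return true;
    }

    public static void Main()
    {
        if (TryParse("BA-0001-3", out string make, out int model, out int sub))
        {
            Console.WriteLine("{0}-{1:D4}-{2}", make, model, sub + 1); // BA-0001-4
        }
    }
}
```

The string constructor could call a helper like this and throw an ArgumentException when it returns false.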
