Extract the most appearing sub string - c#

I have a string for example
string text = "xfoofoobarbar fooxxfoo barxxxfoo";
This string contains foo 5 times, which makes it the longest, most frequent repeated substring of at least 2 characters, so it's my desired result.
bar appears only 3 times, so it's not the most frequent substring.
oo also appears 5 times, but foo is longer, so foo is preferred.
XababaY would result in ab, which occurs 2 times (no overlapping; ba also occurs 2 times but is ignored because ab comes first).
XaaaaaaaY would result in aa, because aa appears 3 times and has the most repetitions.
I would love to show what I've tried so far, but I honestly have no idea where to start. LINQ? Regex?
A hint or an approach in the right direction would help me too.

I would say the first place to start here is to generate a list of all the possible substrings from the input of length 2 to the length of the input:
string text = "xfoofoobarbar fooxxfoo barxxxfoo";
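// Lengths 2..text.Length: Range takes (start, count), so the count is text.Length - 1.
var allSubstrings = Enumerable.Range(2, text.Length - 1)
    .ToDictionary(k => k, v => FindSubStrings(text, v));
...
IEnumerable<string> FindSubStrings(string input, int length)
{
    // <= so that the substring ending at the last character is included
    for (var i = 0; i <= input.Length - length; i++)
    {
        yield return input.Substring(i, length);
    }
}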
Live example: http://rextester.com/ZUR68480
From there it should be as simple as grouping by the substring to get a count, and ordering the result appropriately. But your requirements seem to pick and choose between "longest length" and "most occurrences"; you can't have both!
Here is my full implementation. Note that it counts overlapping occurrences and prefers length over frequency, so at present it picks xxfoo (2 occurrences) as the winner rather than foo.
using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main(string[] args)
    {
        string text = "xfoofoobarbar fooxxfoo barxxxfoo";
        // For every length from 2 to text.Length, keep the most frequent substring of that length.
        var allSubstrings = Enumerable.Range(2, text.Length - 1)
            .Select(x => {
                var mostFrequent = FindSubStrings(text, x).GroupBy(y => y).OrderByDescending(y => y.Count()).First();
                return new Substrings {
                    Length = x,
                    Count = mostFrequent.Count(),
                    Value = mostFrequent.Key
                };
            })
            .ToList(); // materialize once; the result is enumerated twice below
        foreach (var item in allSubstrings)
        {
            Console.WriteLine(item.Length + ":" + item.Count + ":" + item.Value);
        }
        // Among the repeated substrings, prefer the longest, then the most frequent.
        var best = allSubstrings.Where(x => x.Count > 1).OrderByDescending(x => x.Length).ThenByDescending(x => x.Count).First();
        Console.WriteLine("Longest, most frequent substring is " + best.Value);
    }

    public class Substrings
    {
        public int Length { get; set; }
        public int Count { get; set; }
        public string Value { get; set; }
    }

    private static IEnumerable<string> FindSubStrings(string input, int length)
    {
        // <= so that the substring ending at the last character is included
        for (var i = 0; i <= input.Length - length; i++)
        {
            yield return input.Substring(i, length);
        }
    }
}
Live example: http://rextester.com/RJNP55827
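For comparison, here is a rough sketch of one way to read the rules in the question: count non-overlapping occurrences, let the highest count win, break ties by length, and break remaining ties by first appearance. That reading is my assumption, not something the answer above implements, and the names SubstringRanker, CountNonOverlapping and MostFrequent are made up for illustration. It is a brute-force approach and will be slow on long inputs.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringRanker
{
    // Counts occurrences of value in text, scanning left to right and
    // skipping past each match so that matches cannot overlap.
    public static int CountNonOverlapping(string text, string value)
    {
        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(value, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += value.Length;
        }
        return count;
    }

    // Returns the winning substring, or null when nothing of at least
    // minLength characters occurs twice without overlapping.
    public static string MostFrequent(string text, int minLength = 2)
    {
        string best = null;
        int bestCount = 1; // a candidate must occur at least twice to beat this
        var seen = new HashSet<string>();

        // A substring longer than half the text cannot occur twice without overlap.
        for (int length = minLength; length <= text.Length / 2; length++)
        {
            for (int start = 0; start + length <= text.Length; start++)
            {
                string candidate = text.Substring(start, length);
                if (!seen.Add(candidate)) continue; // already evaluated

                int count = CountNonOverlapping(text, candidate);
                // Higher count wins; on an equal count the longer candidate wins;
                // on equal count and length the earlier candidate is kept.
                bool longerTie = best != null && count == bestCount && candidate.Length > best.Length;
                if (count > bestCount || longerTie)
                {
                    best = candidate;
                    bestCount = count;
                }
            }
        }
        return best;
    }
}
```

With the three inputs from the question, SubstringRanker.MostFrequent returns foo, ab and aa respectively.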

Related


Efficient way of grouping characters in string c#

I want an efficient way of grouping strings whilst keeping duplicates and order.
Something like this
1100110002200 -> 101020
I tried this previously
_case.GroupBy(c => c).Select(g => g.Key)
but I got 102
The code below gives me what I want; I just want to optimize it, so that I don't have to scour the entire list each time.
static List<char> group(string _case)
{
var groups = new List<char>();
for (int i = 0; i < _case.Length; i++)
{
if (groups.LastOrDefault() != _case[i])
groups.Add(_case[i]);
}
return groups;
}
While I like the elegant solution of rshepp, it turns out that the very basic code can run even 5 times faster than that.
public static string Simplify2(string str)
{
if (string.IsNullOrEmpty(str)) { return str; }
StringBuilder sb = new StringBuilder();
char last = str[0];
sb.Append(last);
foreach (char c in str)
{
if (last != c)
{
sb.Append(c);
last = c;
}
}
return sb.ToString();
}
You could create a method that loops each character and checks the previous character for equality. If they aren't the same, append/yield return the character. This is pretty easy to do with Linq.
public static string Simplify(string str)
{
return string.Concat(str.Where((c, i) => i == 0 || c != str[i - 1]));
}
Usage:
string simplified = Simplify("1100110002200");
// 101020
In my testing, my method and yours are roughly equal in speed, mine being insignificantly slower after 10 million executions (4260ms vs 4241ms).
However, my method returns the result as a string whereas yours doesn't. If you need to convert your result back to a string (which is likely) then my method is indeed much faster/more efficient (4260ms vs 6569ms).

Formatting the string by rewriting the delimiters

I'm dealing with some legacy data, where they store each record in one huge/large string (one string = one record)
In each string, they split the data in some sort of delimiters, but each of them actually defines a meaning, for example: \vToyota\cBlue\cRed\cWhite\s200mph\oAndrew\oJohn
\v means vehicle, \c is color, \s is speed \o is Owner... something like that
My task requires me to reformat the data so that if there are multiple fields of one characteristic, I have to rewrite it as: (for example) \vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John
Edited: Alright. @DarrenYoung's suggestion works! Now I have an array of vToyota cBlue cRed cWhite s200mph oAndrew oJohn. I tested it on other data using the same method and it works too. Now I just need help finding a way to rewrite the first letter of each string whenever it is repeated.
Thank you!
I found this an interesting little puzzle to see what I could do with LINQ. The following seems to work:
private string FixIt(string foo)
{
var newFoo = "\\" + string.Join("\\",
foo.Split(new[] {'\\'}, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(s => s[0],
(c, g) =>
{
var cnt = 0;
return g.Select(x => cnt++ == 0
? x
: x[0] + cnt.ToString() + x.Substring(1));
})
.SelectMany(g => g));
return newFoo;
}
Input: \vToyota\cBlue\cRed\cWhite\s200mph\oAndrew\oJohn
Output: \vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John
That SelectMany is a handy thing to remember.
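One thing to keep in mind with the GroupBy version is that it emits the fields group by group, so a record whose tags are interleaved (for example a colour, then an owner, then another colour) would come out reordered. A minimal sketch of a loop that keeps the original field order is below; it assumes every field starts with a single-character tag, and the names TagNumbering and NumberRepeatedTags are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TagNumbering
{
    // Rewrites \cBlue\cRed as \cBlue\c2Red while leaving the field order untouched.
    public static string NumberRepeatedTags(string record)
    {
        var counts = new Dictionary<char, int>(); // how often each tag has been seen so far
        var result = new StringBuilder();

        var fields = record.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var field in fields)
        {
            char tag = field[0];
            int seen;
            counts.TryGetValue(tag, out seen); // seen stays 0 for a new tag
            seen++;
            counts[tag] = seen;

            result.Append('\\').Append(tag);
            if (seen > 1) result.Append(seen); // the first occurrence gets no number
            result.Append(field, 1, field.Length - 1); // the value after the tag
        }
        return result.ToString();
    }
}
```

For the sample record from the question this produces the same output as FixIt.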
Because I thought this question was interesting, I wrote a program that does what I believe to be a reasonable solution. I started with a few principal assumptions:
In "old data" situations you probably don't know every single option that is going to show up in the records. Consequently whatever approach is taken needs to quickly and easily accommodate new types of delimiters and tags. For that reason I did not use a string.split approach (even though this is easier to read). Instead all tokens are declared at the beginning of the file. Anything can be a token whether or not it has a "\" in front of it.
The solution needs to gracefully handle records that don't conform to the standards
The option of parsing integers for multiple records needs to be able to be disabled per record type. Speed, for example, doesn't (seem) to be able to appear multiple times per record. So, setting the value for speed to false in the "ALLOW_MULTIPLE" variable turns this parsing off, ensuring the correct output value.
In my solution I also created separate classes for readability and so the code could be quickly investigated. Although I would not suggest that this is production ready, the following should go a long way towards solving the issue. Best of luck!
// Just paste the rest of this into a new console application to see it work!
public class Program
{
private static readonly List<string> TOKENS = new List<string> {@"\v", @"\c", @"\o", @"\s"};
private static readonly List<string> DISPLAY = new List<string> {"Vehicle", "Color", "Owner", "Speed"};
private static readonly List<bool> ALLOW_MULTIPLE = new List<bool> {false, true, true, false};
private class RecordEntry
{
public string Value { get; set; }
public int Index { get; set; }
public string DataType { get; set; }
public override string ToString() { return DataType + ": " + Value; }
}
private class ParsedRecord
{
private List<RecordEntry> entries = new List<RecordEntry>();
public List<RecordEntry> Entries { get { return entries; } }
}
public static void Main(string[] args)
{
// sample records (second has a \m which is ignored since it isn't a recognized token)
var records = new[] {@"\vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John",
@"\vChevy\c2Orange\cGreen\s50mph\o2Bob\mWhite"};
var parsedData = new List<ParsedRecord>();
foreach (var record in records)
{
// character by character parsing
var currentParseRecord = new ParsedRecord();
parsedData.Add(currentParseRecord);
var currentRecord = new StringBuilder(record);
var currentToken = new StringBuilder();
for (var parseIdx = 0; parseIdx < currentRecord.Length; parseIdx++)
{
currentToken.Append(currentRecord[parseIdx]);
var recordIdx = 0;
var index = TOKENS.IndexOf(currentToken.ToString());
if (index < 0) continue;
// current char is used up now (was part of the token)
parseIdx++;
if (ALLOW_MULTIPLE[index] && currentRecord.Length > parseIdx + 1)
{
// assuming less than 10 records max - if more, would need to pull multiple numeric values here
if (!Int32.TryParse(currentRecord[parseIdx] + "", out recordIdx)) recordIdx = 0;
else parseIdx++;
}
// find the next token or end of string
int valueLength = FindNextToken(currentRecord, parseIdx) - parseIdx;
if (valueLength <= 0) valueLength = currentRecord.Length - parseIdx;
currentParseRecord.Entries.Add(new RecordEntry
{
DataType = DISPLAY[index],
Index = recordIdx,
Value = currentRecord.ToString(parseIdx, valueLength)
});
parseIdx += valueLength - 1;
currentToken.Clear();
}
}
}
private static int FindNextToken(StringBuilder value, int currentIndex)
{
for (var searchIdx = currentIndex; searchIdx < value.Length; searchIdx++) {
if (TOKENS.Any(checkToken => value.Length > searchIdx + checkToken.Length &&
value.ToString(searchIdx, checkToken.Length) == checkToken)) {
return searchIdx;
}
}
return -1;
}
}

How to check if a string has at least 1 alphabetic character? [duplicate]

This question already has answers here:
How can I generate random alphanumeric strings?
(36 answers)
Closed 2 years ago.
My ASP.NET application requires me to generate a huge number of random strings such that each contain at least 1 alphabetic and numeric character and should be alphanumeric on the whole.
For this my logic is to generate the code again if the random string is numeric:
public static string GenerateCode(int length)
{
if (length < 2 || length > 32)
{
throw new RSGException("Length cannot be less than 2 or greater than 32.");
}
string newcode = Guid.NewGuid().ToString("n").Substring(0, length).ToUpper();
return newcode;
}
public static string GenerateNonNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
}
catch (Exception)
{
throw;
}
while (IsNumeric(newcode))
{
return GenerateNonNumericCode(length);
}
return newcode;
}
public static bool IsNumeric(string str)
{
bool isNumeric = false;
try
{
long number = Convert.ToInt64(str);
isNumeric = true;
}
catch (Exception)
{
isNumeric = false;
}
return isNumeric;
}
While debugging, it works properly, but when I ask it to create 10,000 random strings, it's not able to handle it properly. When I export that data to Excel, I find on average at least 20 strings that are numeric.
Is it a problem with my code or C#? - Mine.
If anyone's looking for code,
// Shared instance: creating a new Random on every call can repeat seeds in a tight loop.
private static readonly Random random = new Random();
public static string GenerateCode(int length)
{
    if (length < 2)
    {
        throw new A1Exception("Length cannot be less than 2.");
    }
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var result = new string(
        Enumerable.Repeat(chars, length)
            .Select(s => s[random.Next(s.Length)])
            .ToArray());
    return result;
}
public static string GenerateAlphaNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
while (!IsAlphaNumeric(newcode))
{
newcode = GenerateCode(length);
}
}
catch (Exception)
{
throw;
}
return newcode;
}
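public static bool IsAlphaNumeric(string str)
{
    // Require at least one letter and at least one digit. A single pattern such as
    // "[0-9A-Z]+" matches any string that contains either, so it never rejects a code.
    return Regex.IsMatch(str, "[A-Z]") && Regex.IsMatch(str, "[0-9]");
}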
Thanks to all for your ideas.
If you want to stick with the Guid as the generator, you could always validate using a Regex
This will only return true if at least one alpha is present
Regex reg = new Regex("[a-zA-Z]+");
Then just use the IsMatch method to see if your string is valid
That way you don't need the (IMHO rather ugly) try..catch around the Convert.
Update : I see your subsequent comment about actually making your code slower. Are you instantiating the Regex object only once, or every time that the test is being done? If the latter then this will be rather inefficient, and you should consider using a "lazy-loaded" property on your class, e.g.
private Regex reg;
private Regex AlphaRegex
{
get
{
if (reg == null) reg = new Regex("[a-zA-Z]+");
return reg;
}
}
Then just use AlphaRegex.IsMatch() in your method. I would expect this to make a difference.
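As a sketch of the same idea without the lazy property, the patterns can live in static readonly fields so they are built once per process. The class and member names here (CodeValidator, HasLetterAndDigit) are hypothetical, and the digit check is my addition, since the question asks for at least one letter and at least one digit.

```csharp
using System.Text.RegularExpressions;

public static class CodeValidator
{
    // Built once and reused for every call.
    private static readonly Regex HasLetter = new Regex("[A-Za-z]", RegexOptions.Compiled);
    private static readonly Regex HasDigit = new Regex("[0-9]", RegexOptions.Compiled);

    // True only when the code contains at least one letter and at least one digit.
    public static bool HasLetterAndDigit(string code)
    {
        return code != null && HasLetter.IsMatch(code) && HasDigit.IsMatch(code);
    }
}
```

Whether RegexOptions.Compiled pays off depends on how many codes are validated, so it is worth measuring before keeping it.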
Add using System.Linq; and then check whether the string contains at least one letter and at least one digit:
using System.Linq;
string StrCheck = "abcd123";
// StrCheck.Any(char.IsLetter) is true when the string contains a letter
// StrCheck.Any(char.IsDigit) is true when the string contains a digit
if (StrCheck.Any(char.IsLetter) && StrCheck.Any(char.IsDigit))
{
    // statement goes here
}
Sorry for the late reply...
I didn't quite understand what you want in the string besides letters (abc etc.); let's say digits.
You can generate a random character as follows:
Random r = new Random();
char lower = (char)r.Next('a', 'z' + 1); // lowercase; the upper bound of Next is exclusive
char upper = (char)r.Next('A', 'Z' + 1); // capitals
// or you can convert lowercase to capital:
char c = (char)('k' + ('A' - 'a'));
If you want to create a string:
var s = new StringBuilder();
for(int i = 0; i < length; ++i)
s.Append((char)r.Next('a', 'z' + 1)); //Changed to char
return s.ToString();
Note: I don't know ASP.NET so much, so I just act like it's C#.
To answer your question strictly, using your existing code: there is a problem with your recursion logic, which can be avoided by not using recursion (there is absolutely no reason to use recursion in GenerateNonNumericCode). Do the following instead:
public static string GenerateNonNumericCode(int length)
{
string newcode = GenerateCode(length);
while (IsNumeric(newcode))
{
newcode = GenerateCode(length);
}
return newcode;
}
Other General Notes
Your code is very inefficient--throwing exceptions is expensive, so using try/catch in a loop is therefore slow and pointless. As others have suggested, regex makes more sense (System.Text.RegularExpressions namespace).
Is it a problem with my code or C#?
When in doubt, the problem is almost never C#.
So, I would change the code to this:
static Random r = new Random();
public static string GenerateNonNumericCodeFaster(int length) {
var firstLength = r.Next(0, length - 1);
var secondLength = length - 1 - firstLength;
return GenerateCode(firstLength)
+ (char) r.Next((int)'A', (int)'G')
+ GenerateCode(secondLength);
}
You can keep your GenerateCode function, except that its minimum-length check has to be relaxed: firstLength and secondLength can be 0 or 1, and the current check would throw for those. Everything else you toss out. The idea here, of course, is that rather than testing whether the string contains an alphabetic character, you just explicitly PUT one in. In my tests, this code generated 10,000 8-character strings in 0.0172963 seconds, compared to your code, which takes around 52 seconds. So, yeah, this is about 3000 times faster :)
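The same "put one in" idea can be extended to guarantee a digit as well, which the question also asks for. The following is only a sketch under my own assumptions: it draws from A-Z and 0-9 instead of a Guid, the names CodeGenerator and Generate are hypothetical, and System.Random is neither thread-safe nor cryptographically secure, so a real ASP.NET application would need locking or a different random source.

```csharp
using System;

public static class CodeGenerator
{
    private const string Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Digits = "0123456789";
    private const string All = Letters + Digits;

    // One shared instance; not thread-safe (see the note above).
    private static readonly Random Rng = new Random();

    public static string Generate(int length)
    {
        if (length < 2)
        {
            throw new ArgumentOutOfRangeException("length", "Length cannot be less than 2.");
        }

        var chars = new char[length];
        chars[0] = Letters[Rng.Next(Letters.Length)]; // guaranteed letter
        chars[1] = Digits[Rng.Next(Digits.Length)];   // guaranteed digit
        for (int i = 2; i < length; i++)
        {
            chars[i] = All[Rng.Next(All.Length)];
        }

        // Fisher-Yates shuffle so the guaranteed characters are not always in front.
        for (int i = length - 1; i > 0; i--)
        {
            int j = Rng.Next(i + 1);
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new string(chars);
    }
}
```

Because the letter and the digit are placed explicitly, no retry loop or validation pass is needed.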

Convert comma separated string of ints to int array

I only found a way to do it the opposite way round: create a comma separated string from an int list or array, but not on how to convert input like string str = "1,2,3,4,5"; to an array or list of ints.
Here is my implementation (inspired by this post by Eric Lippert):
public static IEnumerable<int> StringToIntList(string str)
{
if (String.IsNullOrEmpty(str))
{
yield break;
}
var chunks = str.Split(',').AsEnumerable();
using (var rator = chunks.GetEnumerator())
{
while (rator.MoveNext())
{
int i = 0;
if (Int32.TryParse(rator.Current, out i))
{
yield return i;
}
else
{
continue;
}
}
}
}
Do you think this is a good approach, or is there an easier, maybe even built-in way?
EDIT: Sorry for any confusion, but the method needs to handle invalid input like "1,2,,,3" or "###, 5," etc. by skipping it.
You should use a foreach loop, like this:
public static IEnumerable<int> StringToIntList(string str) {
if (String.IsNullOrEmpty(str))
yield break;
foreach(var s in str.Split(',')) {
int num;
if (int.TryParse(s, out num))
yield return num;
}
}
Note that like your original post, this will ignore numbers that couldn't be parsed.
If you want to throw an exception if a number couldn't be parsed, you can do it much more simply using LINQ:
return (str ?? "").Split(',').Select<string, int>(int.Parse);
If you don't want to have the current error handling behaviour, it's really easy:
return text.Split(',').Select(x => int.Parse(x));
Otherwise, I'd use an extra helper method (as seen this morning!):
public static int? TryParseInt32(string text)
{
int value;
return int.TryParse(text, out value) ? value : (int?) null;
}
and:
return text.Split(',').Select<string, int?>(TryParseInt32)
.Where(x => x.HasValue)
.Select(x => x.Value);
or if you don't want to use the method group conversion:
return text.Split(',').Select(t => TryParseInt32(t))
.Where(x => x.HasValue)
.Select(x => x.Value);
or in query expression form:
return from t in text.Split(',')
select TryParseInt32(t) into x
where x.HasValue
select x.Value;
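Putting the pieces together, here is a small self-contained sketch that skips invalid entries such as the ones in the question's EDIT. The names CsvInts and Parse are made up for illustration, and passing CultureInfo.InvariantCulture is my own choice so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CsvInts
{
    // Returns the entries that parse as integers, in order; everything else is skipped.
    public static int[] Parse(string csv)
    {
        return (csv ?? string.Empty)
            .Split(',')
            .Select(part =>
            {
                int value;
                // NumberStyles.Integer tolerates surrounding whitespace and a leading sign.
                bool ok = int.TryParse(part, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
                return ok ? (int?)value : null;
            })
            .Where(n => n.HasValue)
            .Select(n => n.Value)
            .ToArray();
    }
}
```

CsvInts.Parse("1,2,,,3") yields 1, 2, 3 and CsvInts.Parse("###, 5,") yields just 5; a null input gives an empty array.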
Without using a lambda function and for valid inputs only, I think it's clearer to do this:
Array.ConvertAll<string, int>(value.Split(','), Convert.ToInt32);
--EDIT-- It looks like I took his question heading too literally - he was asking for an array of ints rather than a List --EDIT ENDS--
Yet another helper method...
private static int[] StringToIntArray(string myNumbers)
{
List<int> myIntegers = new List<int>();
Array.ForEach(myNumbers.Split(",".ToCharArray()), s =>
{
int currentInt;
if (Int32.TryParse(s, out currentInt))
myIntegers.Add(currentInt);
});
return myIntegers.ToArray();
}
quick test code for it, too...
static void Main(string[] args)
{
string myNumbers = "1,2,3,4,5";
int[] myArray = StringToIntArray(myNumbers);
Console.WriteLine(myArray.Sum().ToString()); // sum is 15.
myNumbers = "1,2,3,4,5,6,bad";
myArray = StringToIntArray(myNumbers);
Console.WriteLine(myArray.Sum().ToString()); // sum is 21
Console.ReadLine();
}
Let us assume that you will be reading the string from the console. Import System.Linq and try this one:
int[] input = Console.ReadLine()
.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(int.Parse)
.ToArray();
This has been asked before. .NET has a built-in Array.ConvertAll function for converting an array of one type to an array of another type. You can combine this with Split to separate the string into an array of strings.
Example function:
static int[] ToIntArray(this string value, char separator)
{
return Array.ConvertAll(value.Split(separator), s=>int.Parse(s));
}
Taken from here
This is for longs, but you can modify it easily to work with ints.
private static long[] ConvertStringArrayToLongArray(string str)
{
return str.Split(",".ToCharArray()).Select(x => long.Parse(x.ToString())).ToArray();
}
I don't see why taking out the enumerator explicitly offers you any advantage over using a foreach. There's also no need to call AsEnumerable on chunks.
import java.util.*;
import java.io.*;
public class problem
{
public static void main(String args[])
{
String line;
String[] lineVector;
int n,m,i,j;
Scanner sc = new Scanner(System.in);
line = sc.nextLine();
lineVector = line.split(",");
//enter the size of the array
n=Integer.parseInt(lineVector[0]);
m=Integer.parseInt(lineVector[1]);
int arr[][]= new int[n][m];
//enter the array here
System.out.println("Enter the array:");
for(i=0;i<n;i++)
{
line = sc.nextLine();
lineVector = line.split(",");
for(j=0;j<m;j++)
{
arr[i][j] = Integer.parseInt(lineVector[j]);
}
}
sc.close();
}
}
On the first line, enter the size of the array separated by a comma. Then enter the values in the array, one row per line, separated by commas. The result is stored in the array arr.
e.g.
input:
2,3
1,2,3
2,4,6
will store values as
arr = {{1,2,3},{2,4,6}};
