C# anagrams of different string length? - c#

public static bool IsAnagramOf(this string word1, string word2)
{
return word1.OrderBy(x => x).SequenceEqual(word2.OrderBy(x => x));
}
I'm currently pulling everything from a large xml file with all english words. I'm then comparing each word against the given string to see if it's an anagram. I'm then storing each correct word and returning them.
However...
I'm wanting to make it so the anagrams do not have to be of equal string length.
For example: "Hello" contains "Hello", "Hell", "He" etc...
Is there any way to do this with relatively little code?
Thanks!
Edit: So including subanagrams as well as anagrams of equal length.

Maybe your method should be called ContainsTheSameSetOfLetters?
public static bool ContainsTheSameSetOfLetters(this string word1, string word2)
{
var chars = new HashSet<char>(word1);
return word2.All(x => chars.Contains(x));
}
If you care about the number of times a particular letter is used, you can use the following:
public static bool ContainsTheSameSetOfLetters(string word1, string word2)
{
var chars = word1.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
return word2.GroupBy(x => x).All(g => chars.ContainsKey(g.Key) && chars[g.Key] >= g.Count());
}
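A minimal sketch of how the counting approach above might be used to filter a word list for sub-anagrams. The class name `AnagramSketch`, the method names and the sample words are illustrative assumptions, and the comparison is made case-insensitive here, which the code above does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramSketch
{
    // True when every letter of candidate occurs in source at least as many times.
    public static bool IsSubAnagramOf(this string candidate, string source)
    {
        var available = source.ToLowerInvariant()
            .GroupBy(c => c)
            .ToDictionary(g => g.Key, g => g.Count());

        return candidate.ToLowerInvariant()
            .GroupBy(c => c)
            .All(g => available.TryGetValue(g.Key, out var n) && n >= g.Count());
    }

    // Hypothetical helper: keeps only the words that can be built from the letters of source.
    public static List<string> FindSubAnagrams(string source, IEnumerable<string> words)
    {
        return words.Where(w => w.IsSubAnagramOf(source)).ToList();
    }
}
```

For example, `AnagramSketch.FindSubAnagrams("Hello", new[] { "hell", "he", "hole", "help", "lol" })` keeps every word except "help". For a large word list it would be cheaper to build the letter counts of the source word once instead of once per candidate.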

Instead of using SequenceEqual, try creating an extension method that checks that the sequence starts with another sequence.


Selecting max number from a list of string in LINQ?

trying to select the max digit from a list of strings:
int maxDigit = this.myList.Where(x=> x.Name.Any(Char.IsDigit))
.Select(x => int.Parse(x.Name)).DefaultIfEmpty(0).Max();
It is int.Parse(x.Name) which is causing an exception as this is returning the entire name string e.g. 'myValue99' which of course cannot be parsed to an int. I just want to return 99. I don't know where the digits will be in the string therefore cannot for example take the last two.
I need the DefaultIfEmpty for cases where the string does not contain a number.
Assuming you want the max number and not the max digit, all you need is a function to convert "stuff99" to 99. Then the Linq part becomes child's play:
int maxNumber = myList.Max(ExtractNumberFromText);
or, to be closer to your specs:
int maxNumber = myList
.Select(ExtractNumberFromText)
.DefaultIfEmpty(0)
.Max();
@CodeCaster already pointed to a few applicable answers on this site for the second part. I adopted a simple one. No error checking.
// the specs: Only ever going to be my stuff1, stuff99
int ExtractNumberFromText(string text)
{
Match m = Regex.Match(text, @"\d+"); // \d+ rather than \d*, which would match the empty string at index 0
return int.Parse(m.Groups[0].Value); // exception for "abc"
// int.Parse("0" + m.Groups[0].Value); // use this for default to 0
}
You should select and parse only the digit characters in your string:
int maxDigit = this.myList.Where(x => x.Name.Any(Char.IsDigit))
.Select(x => int.Parse(new string(x.Name.Where(Char.IsDigit).ToArray())))
.DefaultIfEmpty(0).Max();
Assuming the input can contain the following categories:
nulls
Empty strings
Strings with only alphabetical characters
Strings with mixed alphabetical and numerical characters
Strings with only numerical characters
You want to introduce a method that extracts the number, if any, or returns a meaningful value if not:
private static int? ParseStringContainingNumber(string input)
{
if (String.IsNullOrEmpty(input))
{
return null;
}
var numbersInInput = new String(input.Where(Char.IsDigit).ToArray());
if (String.IsNullOrEmpty(numbersInInput))
{
return null;
}
int output;
if (!Int32.TryParse(numbersInInput, out output))
{
return null;
}
return output;
}
Note that not all characters for which Char.IsDigit returns true can be parsed by Int32.Parse(), hence the TryParse.
Then you can feed your list to this method:
var parsedInts = testData.Select(ParseStringContainingNumber)
.Where(i => i != null)
.ToList();
And do whatever you want with the parsedInts list, like calling IEnumerable<T>.Max() on it.
With the following test data:
var testData = new List<string>
{
"۱‎", // Eastern Arabic one, of which Char.IsDigit returns true.
"123",
"abc456",
null,
"789xyz",
"foo",
"9bar9"
};
This returns:
123
456
789
99
Note the last case in particular: the digits on either side of "bar" are concatenated into 99.
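The point about Char.IsDigit accepting more than int.Parse can handle could also be addressed by filtering on ASCII digits only. The following is a sketch under that assumption; the class `DigitSketch` and the method `ParseAsciiDigits` are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class DigitSketch
{
    // Keeps only the ASCII characters '0'..'9', which int.TryParse accepts,
    // and returns null when no number can be extracted.
    public static int? ParseAsciiDigits(string input)
    {
        if (string.IsNullOrEmpty(input)) return null;

        var digits = new string(input.Where(c => c >= '0' && c <= '9').ToArray());

        // TryParse returns false for an empty string or on overflow.
        return int.TryParse(digits, out var value) ? value : (int?)null;
    }
}
```

With this filter the Eastern Arabic digit is dropped before parsing, so `ParseAsciiDigits("۱")` yields null while `ParseAsciiDigits("9bar9")` still yields 99.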
To find the max digit (not number) in each string:
static void Main(string[] args)
{
List<string> strList = new List<string>() { "Value99", "46Text" };
List<int> resultList = new List<int>();
foreach (var str in strList)
{
char[] resultString = Regex.Match(str, @"\d+").Value.ToCharArray();
int maxInt = resultString.Select(s => Int32.Parse(s.ToString())).Max();
resultList.Add(maxInt);
}
}
It can be simple using Regex.
You stated 99, so you need to span more than one digit:
var maxNumber = myTestList.SelectMany(x => getAllNumbersFromString(x.Name)).DefaultIfEmpty(0).Max();
static List<int> getAllNumbersFromString(string str)
{
List<int> results = new List<int>();
var matchesCollection = Regex.Matches(str, "[0-9]+");
foreach (var numberMatch in matchesCollection)
{
results.Add(Convert.ToInt32(numberMatch.ToString()));
}
return results;
}
One digit only check:
int maxNumber = myTestList.SelectMany(x => x.Name.ToCharArray().ToList())
.Select(x => Char.IsDigit(x) ? (int)Char.GetNumericValue(x) : 0)
.DefaultIfEmpty(0).Max();
Probably there's a slicker way to do this but I would just do:
int tmp = 0;
int maxDigit = this.myList.Where(x=> x.Name.Any(Char.IsDigit))
.Select(x =>
(int.TryParse(x.Name, out tmp) ? tmp : 0)).Max(); // reuse the value TryParse already produced
You have to remember that Parse will error out if it can't parse the value but TryParse will just give you false.

What is the best way to check if a string can be parsed into an int array?

I need to determine if a string can be parsed into an array of int. The string MAY be in the format
"124,456,789,0"
In which case it can be converted thus:
int[] Ids = SearchTerm.Split(',').Select(int.Parse).ToArray();
However the string may also be something like:
"Here is a string, it is very nice."
In which case the parsing fails.
The logic currently branches in two directions based on whether the string contains a comma character (assuming that only the array-like strings will contain this character) but this logic is now flawed and comma characters are now appearing in other strings.
I could put a Try..Catch around it but I am generally averse to controlling logic flow by exceptions.
Is there an easy way to do this?
I could put a Try..Catch around it but I am generally averse to controlling logic flow by exceptions
Good attitude. If you can avoid the exception, do so.
A number of answers have suggested
int myint;
bool parseFailed = SearchTerm.Split(',')
.Any( s => !int.TryParse(s, out myint));
Which is not bad, but not great either. I would be inclined to first, write a better helper method:
static class Extensions
{
public static int? TryParseAsInteger(this string s)
{
int j;
bool success = int.TryParse(s, out j);
if (success)
return j;
else
return null;
}
}
Now you can say:
bool parseFailed = SearchTerm.Split(',')
.Any( s => s.TryParseAsInteger() == null);
But I assume that what you really want is the parsed state if it can succeed, rather than just answering the question "would a parse succeed?" With this helper method you can say:
List<int?> parse = SearchTerm.Split(',')
.Select( s => s.TryParseAsInteger() )
.ToList();
And now if the list contains any nulls, you know that it was bad; if it doesn't contain any nulls then you have the results you wanted:
int[] results = parse.Contains(null) ? null : parse.Select(x=>x.Value).ToArray();
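The same idea can be wrapped in a single Try-style method that reports success and hands back the array in one call. This is only a sketch; the class `IntArraySketch` and the method `TryParseIntArray` are hypothetical names, and treating an empty or whitespace-only string as a failure is an assumption.

```csharp
using System;

public static class IntArraySketch
{
    // Returns true and fills result only when every comma-separated piece parses as an int.
    public static bool TryParseIntArray(string text, out int[] result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(text)) return false;

        string[] parts = text.Split(',');
        var values = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Stop at the first piece that is not an integer; no exception is thrown.
            if (!int.TryParse(parts[i], out values[i])) return false;
        }

        result = values;
        return true;
    }
}
```

A caller can then branch on the return value: `if (IntArraySketch.TryParseIntArray(SearchTerm, out int[] ids)) { ... } else { ... }`.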
int myint;
bool parseFailed = SearchTerm.Split(',')
.Any( s => !int.TryParse(s, out myint));
You can use a multiline lambda expression to call int.TryParse on every result of Split:
var input = "124,456,789,0";
var parts = input.Split(new [] {","}, StringSplitOptions.RemoveEmptyEntries);
var numbers
= parts.Select(x =>
{
int v;
if (!int.TryParse(x, out v))
return (int?)null;
return (int?)v;
}).ToList();
if (numbers.Any(x => !x.HasValue))
Console.WriteLine("string cannot be parsed as int[]");
else
Console.WriteLine("OK");
It will not only check if value can be parsed to int, but also return the value if it can, so you don't have to do the parsing twice.
you can use RegEx to determine if the string match your pattern
something like this
string st = "124,456,789,0";
string pattS = @"[0-9](?:\d{0,2})";
Regex regex = new Regex(pattS);
var res = regex.Matches(st);
foreach (var re in res)
{
//your code here
}
tested on rubular.com here
How about,
int dummy;
var parsable = SearchTerm.Split(',').All(s => int.TryParse(s, out dummy));
but if you are doing that you might as well just catch the exception
Why don't you first remove the characters from the string and use
bool res = int.TryParse(text1, out num1);
Example below has no limit.
The BigInteger type is an immutable type that represents an arbitrarily large integer whose value in theory has no upper or lower bounds.
BigInteger MSDN
string test = "20,100,100,100,100,100,100";
test = test.Replace(",", "");
BigInteger num1 = 0;
bool res = BigInteger.TryParse(test, out num1);

Generic extraction of multiple decimals using Regular expressions

Hi how do you extract multiple decimals with different number of decimal places from a string?
I'm looking to find a generic way to extract 3 numbers out the following strings.
e.g
CC77X1722X12 => 77,1722,12
PC77.5X10102X12.5 => 77.5, 10102, 12.5
XP60.25X0.333X12 => 60.25, 0.333, 12
The three numbers are always separated by 'X', and the string always starts with 2 characters
Thanks!
Since you have such a specific pattern, you don't even need to use regular expressions. Because the first two characters can be ignored and all the numbers are separated by 'X' characters, this C# code should do the trick (with appropriate error handling added, of course)
public IEnumerable<decimal> ExtractNumbers(string s)
{ // For s = "CC77X1722X12"
string[] nums = s.Substring(2).Split('X'); // nums = ["77", "1722", "12"];
return nums.Select(num => decimal.Parse(num)); // returns [77, 1722, 12]
}
For production code, though, I would recommend decimal.TryParse over decimal.Parse. To use that method, you could write something like
public IEnumerable<decimal> ExtractNumbers(string s)
{
string[] nums = s.Substring(2).Split('X');
return nums
.Select(num => {
decimal d;
if (decimal.TryParse(num, out d)) // TryParse, not Parse, takes the out parameter
return new {Number = d, Succeeded = true};
return new {Number = 0m, Succeeded = false}; // 0m keeps both anonymous types identical
})
.Where(result => result.Succeeded)
.Select(result => result.Number);
}
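One thing to keep in mind with decimal.TryParse is that, without an explicit culture, it uses the current culture, so "77.5" may not parse as expected on a machine whose decimal separator is a comma. The following sketch pins the culture; the class `DecimalSketch` and the method `ExtractNumbersInvariant` are hypothetical names, and returning an empty list for very short input is an assumption.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class DecimalSketch
{
    // Skips the two-character prefix, splits on 'X', and keeps the parts that parse
    // as decimals using '.' as the decimal separator regardless of machine settings.
    public static List<decimal> ExtractNumbersInvariant(string s)
    {
        var result = new List<decimal>();
        if (string.IsNullOrEmpty(s) || s.Length < 3) return result;

        foreach (string part in s.Substring(2).Split('X'))
        {
            if (decimal.TryParse(part, NumberStyles.AllowDecimalPoint,
                                 CultureInfo.InvariantCulture, out decimal d))
            {
                result.Add(d);
            }
        }
        return result;
    }
}
```

For "XP60.25X0.333X12" this produces 60.25, 0.333 and 12; the leading "XP" is removed before splitting, so an 'X' in the prefix does not create an extra part.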

Sorting array of formatted time strings

I am trying to sort my arraylist.
The array list consists of data in time format.
Array:
9:15 AM, 10:20 AM
How should I sort it?
The result i get from below code is :
10:20 AM
9:15 AM
Below is my code:
String timeText = readFileTime.ReadLine();
timeSplit = timeText.Split(new char[] { '^' });
Array.Sort(timeSplit);
foreach (var sortedArray in timeSplit)
{
sortedTimeListBox.Items.Add(sortedArray);
}
Yes. Since you simply split a string, you're sorting an array of strings, which are compared character by character (so "10:20 AM" comes before "9:15 AM" because '1' sorts before '9'). To get the sorting you desire, you need to first convert each entry into a DateTime like this:
timeSplit = timeText
.Split(new char[] { '^' })
.Select(x => new { Time = DateTime.Parse(x), String = x })
.OrderBy(x => x.Time)
.Select(x => x.String)
.ToArray();
Here, what we've done is:
Split the string as you had done before
Create a new anonymous type that contains the original string and also that string converted into a DateTime.
Ordered it by the DateTime property
Select'ed back to the original string
Converted it into an array
timeSplit now contains the strings sorted as you wanted.
Array.Sort(timeSplit, delegate(string first, string second)
{
return DateTime.Compare(Convert.ToDateTime(first), Convert.ToDateTime(second));
});
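Both answers above rely on culture-dependent parsing (DateTime.Parse and Convert.ToDateTime). If the input is known to always look like "9:15 AM", a sketch with an explicit format might be more predictable. The class `TimeSortSketch`, the method `SortTimes` and the "h:mm tt" format are assumptions based on the sample data in the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class TimeSortSketch
{
    // Orders strings such as "9:15 AM" by the time of day they represent,
    // returning the original strings in sorted order.
    public static string[] SortTimes(string[] times)
    {
        return times
            .OrderBy(t => DateTime.ParseExact(
                t.Trim(), "h:mm tt", CultureInfo.InvariantCulture).TimeOfDay)
            .ToArray();
    }
}
```

ParseExact throws a FormatException for entries that do not match the format, so DateTime.TryParseExact would be the safer choice if the file may contain malformed lines.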

Is there a way of making strings file-path safe in c#?

My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?
Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more than 25 invalid characters.
foreach (var c in Path.GetInvalidFileNameChars())
{
fileName = fileName.Replace(c, '-');
}
Or in VB:
'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars
To strip invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());
To replace invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());
To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):
static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces invalid chars with a unique letter (Moves the Char into the letter range of unicode, starting at "A")
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.
First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.
Second, there are a variety of length limitations. Read the full list for NTFS here.
Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
Fourth, what do you do if two processes "arbitrarily" pick the same name?
In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
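For cases where an external name still has to be used, the first point above (reserved names and trailing periods) can be partly covered in code. The sketch below is an illustration only: the class `FileNameSketch`, the method `MakeSafe` and the choice of prefixing reserved names with the replacement character are assumptions, and it does not address length limits, other file systems or name collisions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileNameSketch
{
    // Windows reserved device names: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9.
    private static readonly HashSet<string> Reserved = new HashSet<string>(
        new[] { "CON", "PRN", "AUX", "NUL" }
            .Concat(Enumerable.Range(1, 9).SelectMany(i => new[] { "COM" + i, "LPT" + i })),
        StringComparer.OrdinalIgnoreCase);

    public static string MakeSafe(string name, char replacement = '_')
    {
        char[] invalid = Path.GetInvalidFileNameChars();

        // Replace invalid characters in a single pass over the input.
        string cleaned = new string(
            (name ?? "").Select(c => invalid.Contains(c) ? replacement : c).ToArray());

        // Names ending in a period or space are not allowed on Windows.
        cleaned = cleaned.TrimEnd('.', ' ');
        if (cleaned.Length == 0) cleaned = replacement.ToString();

        // "CON" and "CON.txt" are both reserved, so test the part before the first dot.
        if (Reserved.Contains(cleaned.Split('.')[0])) cleaned = replacement + cleaned;

        return cleaned;
    }
}
```

With the default replacement, `MakeSafe("CON")` gives "_CON" and `MakeSafe("report.")` gives "report". Which characters count as invalid still depends on the platform the code runs on.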
I agree with Grauenwolf and would highly recommend the Path.GetInvalidFileNameChars()
Here's my C# contribution:
string file = @"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(),
c => file = file.Replace(c.ToString(), String.Empty));
p.s. -- this is more cryptic than it should be -- I was trying to be concise.
Here's my version:
static string GetSafeFileName(string name, char replace = '_') {
char[] invalids = Path.GetInvalidFileNameChars();
return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}
I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.
I say all that w/o doing the profiling -- this one just "felt" nice to me. : )
Here's the function that I am using now (thanks jcollum for the C# example):
public static string MakeSafeFilename(string filename, char replaceChar)
{
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(c, replaceChar);
}
return filename;
}
I just put this in a "Helpers" class for convenience.
If you want to quickly strip out all special characters which is sometimes more user readable for file names this works nicely:
string myCrazyName = "q`w^e!r#t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
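myCrazyName,
@"\W", /*Matches any non-word character. Equivalent to '[^A-Za-z0-9_]' only with RegexOptions.ECMAScript; by default .NET also treats Unicode letters and digits as word characters*/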
"",
RegexOptions.IgnoreCase);
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"
Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:
public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
return Regex.Replace(s,
"[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]",
replacement, //can even use a replacement string of any length
RegexOptions.IgnoreCase);
//not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}
static class Utils
{
public static string MakeFileSystemSafe(this string s)
{
return new string(s.Where(IsFileSystemSafe).ToArray());
}
public static bool IsFileSystemSafe(char c)
{
return !Path.GetInvalidFileNameChars().Contains(c);
}
}
Why not convert the string to a Base64 equivalent like this:
string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName));
If you want to convert it back so you can read it:
UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName));
I used this to save PNG files with a unique name from a random description.
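One caveat with this approach: standard Base64 output can contain '/' and '+', and '/' is itself not allowed in file names. A common workaround is the URL-safe alphabet, sketched below. The class `Base64NameSketch` and its method names are hypothetical, and stripping the '=' padding is a design choice rather than a requirement.

```csharp
using System;
using System.Text;

public static class Base64NameSketch
{
    // Encodes text as Base64, then swaps '+' and '/' for '-' and '_' and drops the padding.
    public static string Encode(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text))
            .Replace('+', '-')
            .Replace('/', '_')
            .TrimEnd('=');
    }

    // Reverses Encode: restores the standard alphabet and re-adds the padding.
    public static string Decode(string name)
    {
        string s = name.Replace('-', '+').Replace('_', '/');
        s = s.PadRight(s.Length + (4 - s.Length % 4) % 4, '=');
        return Encoding.UTF8.GetString(Convert.FromBase64String(s));
    }
}
```

For example, "???" encodes to "Pz8/" in standard Base64 but to "Pz8_" here. Note that Base64 names are case-sensitive, so two different inputs could still collide on a case-insensitive file system, and the encoded name is roughly a third longer than the input.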
private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
if (e.KeyChar == 24 ||
e.KeyChar == 3 ||
e.KeyChar == 22 ||
e.KeyChar == 26 ||
e.KeyChar == 25)//Control-X, C, V, Z and Y (== is used because char.Equals(int) boxes the int and is always false)
return false;
if (e.KeyChar.Equals('\b'))//backspace
return false;
char[] charArray = Path.GetInvalidFileNameChars();
if (charArray.Contains(e.KeyChar))
return true;//Stop the character from being entered into the control since it is not valid in a file name
else
return false;
}
This solution comes from one of my older projects, where it has worked without problems for over 2 years. It replaces illegal characters with "!" and then collapses any doubled "!!"; substitute your own character.
public string GetSafeFilename(string filename)
{
string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
while (res.IndexOf("!!") >= 0)
res = res.Replace("!!", "!");
return res;
}
I find using this to be quick and easy to understand:
<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function
This works because a string is an IEnumerable of chars and there is a string constructor that takes a char array.
Many answers suggest using Path.GetInvalidFileNameChars(), which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting, because hackers will eventually find a way to bypass a blacklist.
Here is an example of code you could use :
string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
foreach (char c in filename)
{
if (!whitelist.Contains(c))
{
filename = filename.Replace(c, '-');
}
}
