Write line from text file if regex finds specific word - c#

I have multiple regex conditions in my application when my program reads and finds certain wording. I have a new requirement to write out that line to the Message.Body in my IF statement. I only need to look back 15 minutes. How do I send the line with this wording?
This is what a line in the log file starts with before an error happens:
10/30/2014 7:19:06 AM 19993108 There is not enough space on the disk:
I need the number after the time and before the message more than anything.
//This section looks for matching the words
Regex regex2 = new Regex("(?<time>.+(AM|PM)).*There is not enough space on the disk.");
var lastFailTime2 = File.ReadLines(file)
.Select(line => regex2.Match(line))
.Where(m => m.Success) // take only matched lines
.Select(m => DateTime.Parse(m.Groups["time"].Value))
.DefaultIfEmpty() // DateTime.MinValue if no failures
.Max();

Probably the fastest way would be using the Linq Extensions Library.
It's got an ElementAtMax() extension method, which returns the element for which the maximum selected value occurs (as opposed to LINQ Max(), which returns said maximum value).
EDIT: If for some reason you need to avoid adding third-party libraries to your code, it's not that complicated to write one yourself (though if at all possible, go with the former; this is basically reinventing the wheel):
public static TSource ElementAtMax<TSource, TComparable>(
this IEnumerable<TSource> source,
Func<TSource, TComparable> selector) where TComparable : IComparable
{
/* check for empty/null arguments */
TSource result = default(TSource);
TComparable currentMax = default(TComparable); // "= null" does not compile here: TComparable may be a value type
bool firstItem = true;
foreach (var item in source)
{
if (firstItem)
{
result = item;
currentMax = selector(item);
firstItem = false;
continue;
}
var nextVal = selector(item);
if (currentMax != null && currentMax.CompareTo(nextVal) > 0)
continue;
currentMax = nextVal;
result = item;
}
return result;
}

I would get a string of the file's text, and then use the IndexOf method to find the index of the match string (m.ToString()) within the text of the file. Then just count the number of instances of the newline character from the beginning of the text to the index of the match. Use this count to determine what line it occurred on.
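To tie the answers back to the question, here is a minimal sketch of returning the matching lines themselves, restricted to a recent time window. It assumes the line layout shown in the question (timestamp, then a number, then the message) and US-style timestamps. The class name DiskSpaceLogScanner, the method RecentFailures and the "id" group are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DiskSpaceLogScanner
{
    // Assumed layout: "<date> <time> AM|PM <number> There is not enough space on the disk..."
    private static readonly Regex DiskFull = new Regex(
        @"^(?<time>.+?\s(?:AM|PM))\s+(?<id>\d+)\s+There is not enough space on the disk");

    // Returns "<number>: <whole line>" for every matching line logged within `window` before `now`.
    public static List<string> RecentFailures(IEnumerable<string> lines, DateTime now, TimeSpan window)
    {
        var culture = CultureInfo.GetCultureInfo("en-US"); // assumption: US-style timestamps
        var hits = new List<string>();
        foreach (var line in lines)
        {
            Match m = DiskFull.Match(line);
            if (!m.Success) continue; // not a disk-space line

            DateTime timestamp;
            if (!DateTime.TryParse(m.Groups["time"].Value, culture, DateTimeStyles.None, out timestamp))
                continue; // unparsable timestamp, skip the line

            if (timestamp <= now && now - timestamp <= window)
                hits.Add(m.Groups["id"].Value + ": " + line);
        }
        return hits;
    }
}
```

A call such as DiskSpaceLogScanner.RecentFailures(File.ReadLines(file), DateTime.Now, TimeSpan.FromMinutes(15)) would then give the lines to join into Message.Body.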

Related

how to group a field and check some condition in a faster way

I have an input file that contains records of 4 fields, like [001|00|002|G1], about 20k records in total. I need to group by the first field [001], check a condition on the last field [G1], and write the result to a file, then continue with the next group, e.g. [002], check its last field, and write it to a file. I need a LINQ query for this. I tried building a distinct list of the first field and checking each value against the input file, but with 20k records that takes too long, so I am looking for LINQ to reduce the time. Thank you.
So you have a file with comma-separated strings, where every string contains four items separated by a vertical bar. You want to manipulate the data in your file.
First of all: good programming needs separation of concerns. You need to separate the way that your data is stored (a file with a certain format) from the actual data.
This means, that you need to have a function to read the storage to produce the data. A separate method will manipulate the data.
This has the advantage, that if you need the same file to manipulate it otherwise, you don't have to rewrite the reading of the file. Or if you decide to use a different file format, for instance a proper CSV file, or a database, you don't have to rewrite the data manipulation part.
So let's separate your problem into smaller parts:
(1) Read the file into a stream of characters and divide the stream of characters into a sequence of strings. Each string is the part between commas, without the whitespace.
(2) Convert the sequence of strings into a sequence of the data that it represents.
(3) Do your data manipulation.
To make the functions look more LINQ like, I'll use extension methods. See extension methods demystified
(1) Read the file to get a sequence of characters. For this we create an extension method for TextReader.
public static class ExtensionsForTextReader
{
    public static IEnumerable<string> ReadLines(this TextReader reader, char delimiter)
    {
        return reader.ReadLines(delimiter, null);
    }

    public static IEnumerable<string> ReadLines(this TextReader reader,
        char delimiter,
        IEqualityComparer<char> comparer)
    {
        // TODO: exception if no reader
        // if no comparer given, use default comparer
        if (comparer == null) comparer = EqualityComparer<char>.Default;
        List<char> chars = new List<char>();
        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            if (comparer.Equals(c, delimiter))
            {
                yield return new String(chars.ToArray());
                chars.Clear();
                continue;
            }
            chars.Add(c);
        }
        // return the text after the last delimiter, if there is any
        if (chars.Count > 0) yield return new String(chars.ToArray());
    }
}
This was easy and fairly straightforward.
(2) Convert the sequence of strings into a sequence of the data that it represents
I don't know what your data means, so let's call it MyData:
class MyData
{
public int A {get; set;}
public int B {get; set;}
public int C {get; set;}
public string D {get; set;}
}
We also need extension methods for strings that will convert every delimited string to an object of MyData:
public static MyData ToMyData(this string text)
{
    return text.ToMyData('|');
}
public static MyData ToMyData(this string text, char delimiter)
{
    // TODO: exception if text equals null
    // TODO: maybe you want to remove starting and trailing whitespaces
    // we'll split the text into four separate strings using ReadLines defined above
    using (TextReader textReader = new StringReader(text))
    {
        // I expect exactly 4 strings. Decide what to do if less or more
        IList<string> textParts = textReader.ReadLines(delimiter)
            .Take(5)
            .ToList();
        if (textParts.Count == 4)
        {
            return new MyData
            {
                A = Int32.Parse(textParts[0]),
                B = Int32.Parse(textParts[1]),
                C = Int32.Parse(textParts[2]),
                D = textParts[3],
            };
        }
        // TODO: decide what to do if incorrect input format; throwing is one option
        throw new FormatException("Expected exactly 4 fields: " + text);
    }
}
The same for a sequence of strings:
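public static IEnumerable<MyData> ToMyData(this IEnumerable<string> lines)
{
    return lines.ToMyData('|');
}
public static IEnumerable<MyData> ToMyData(this IEnumerable<string> lines, char delimiter)
{
    return lines.Select(line => line.ToMyData(delimiter));
}
For completeness, you can also write an overload that takes a comparer.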
This is all you need to convert from your storage into the data that is stored in it. Less than half a page of code.
// read the file and produce a sequence of MyData
using (TextReader textReader = new StreamReader(fileName))
{
IEnumerable<MyData> data = textReader.ReadLines(',').ToMyData('|');
// TODO: process the sequence of MyData
}
Now we can finally focus on your data manipulation.
You wrote: "First I need to group 001 as a group and check for last field G1 and should not print. Then I need to group all three 002 as a group and check each line's last field R3, H1, G1 and print all 3 records if any one is not G1."
The grouping won't be the problem. The result is a group with key 001, or 002, etc., and as values the MyData objects that have A == 001.
A group belongs in the end result if at least one of its D values is not "G1". A group in which every D equals "G1" (like 001 in your example) is dropped.
var groupsWithNonG1Member = textReader.ReadLines(',')
    .ToMyData('|')
    // make groups with same value for A:
    .GroupBy(data => data.A)
    // keep only those groups that have at least one groupMember with D != "G1"
    .Where(group => group.Any(groupMember => groupMember.D != "G1"));
So let's print the remaining items:
foreach (var group in groupsWithNonG1Member)
{
    Print(group.Key);
    foreach (MyData groupMember in group)
    {
        Print(groupMember.B);
        Print(groupMember.C);
        Print(groupMember.D);
    }
}
Because you separated the storage of your data (the file) from the manipulation, it is easy to change the storage device, or to change the layout of the storage to, for instance, JSON, without having to rewrite the data manipulation code.
Similarly: if you need different manipulations, for instance if you want the first 7 items with B == 5, ordered by ascending C, you can use the same file conversion methods
var DataWithBIs5 = textReader.ReadLines(',').ToMyData('|')
.Where(data => data.B == 5)
.OrderBy(data => data.C)
.Take(7);
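For comparison, a single-pass sketch that works directly on the raw record strings, in case the extension-method layering is more than you need. It assumes the rule quoted from the question (print every record of a group if any one of its last fields is not G1) and that each record has exactly four '|'-separated fields. RecordGrouping and RecordsToPrint are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordGrouping
{
    // Returns every record whose group (same first field) has at least one
    // member with a last field other than "G1". Input order is preserved.
    public static List<string> RecordsToPrint(IEnumerable<string> records)
    {
        return records
            .Select(r => r.Trim())
            .Where(r => r.Length > 0)
            .Select(r => new { Raw = r, Fields = r.Split('|') })
            .Where(x => x.Fields.Length == 4)             // skip malformed records
            .GroupBy(x => x.Fields[0])                    // one group per first field
            .Where(g => g.Any(x => x.Fields[3] != "G1"))  // drop all-G1 groups
            .SelectMany(g => g.Select(x => x.Raw))
            .ToList();
    }
}
```

GroupBy builds its lookup in one pass over the input, so 20k records are grouped without rescanning the file once per distinct key.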

How to combine items in List<string> to make new items efficiently

I have a case where I have the name of an object, and a bunch of file names. I need to match the correct file name with the object. The file name can contain numbers and words, separated by either hyphen(-) or underscore(_). I have no control of either file name or object name. For example:
10-11-12_001_002_003_13001_13002_this_is_an_example.svg
The object name in this case is just a string, representing a number
10001
I need to return true or false if the file name is a match for the object name. The different segments of the file name can match on their own, or any combination of two segments. In the example above, it should be true for the following cases (not every true case, just examples):
10001
10002
10003
11001
11002
11003
12001
12002
12003
13001
13002
And, we should return false for this case (among others):
13003
What I've come up with so far is this:
public bool IsMatch(string filename, string objectname)
{
var namesegments = GetNameSegments(filename);
var match = namesegments.Contains(objectname);
return match;
}
public static List<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-').ToList();
var newSegments = new List<string>();
foreach (var segment in segments)
{
foreach (var segment2 in segments)
{
if (segment == segment2)
continue;
var newToken = segment + segment2;
newSegments.Add(newToken);
}
}
return segments.Concat(newSegments).ToList();
}
One or two segments combined can make a match, and that is enough. Three or more segments combined should not be considered.
This does work so far, but is there a better way to do it, perhaps without nesting foreach loops?
First: don't change debugged, working, sufficiently efficient code for no reason. Your solution looks good.
However, we can make some improvements to your solution.
public static List<string> GetNameSegments(string filename)
Making the output a list puts restrictions on the implementation that are not required by the caller. It should be IEnumerable<String>. Particularly since the caller in this case only cares about the first match.
var segments = filename.Split('_', '-').ToList();
Why ToList? A list is array-backed. You've already got an array in hand. Just use the array.
Since there is no longer a need to build up a list, we can transform your two-loop solution into an iterator block:
public static IEnumerable<string> GetNameSegments(string filename)
{
var segments = filename.Split('_', '-');
foreach (var segment in segments)
yield return segment;
foreach (var s1 in segments)
foreach (var s2 in segments)
if (s1 != s2)
yield return s1 + s2;
}
Much nicer. Alternatively we could notice that this has the structure of a query and simply return the query:
public static IEnumerable<string> GetNameSegments(string filename)
{
var q1= filename.Split('_', '-');
var q2 = from s1 in q1
from s2 in q1
where s1 != s2
select s1 + s2;
return q1.Concat(q2);
}
Again, much nicer in this form.
Now let's talk about efficiency. As is often the case, we can achieve greater efficiency at a cost of increased complication. This code looks like it should be plenty fast enough. Your example has nine segments. Let's suppose that nine or ten is typical. Our solutions thus far consider the ten or so singletons first, and then the hundred or so combinations. That's nothing; this code is probably fine. But what if we had thousands of segments and were considering millions of possibilities?
In that case we should restructure the algorithm. One possibility would be this general solution:
public bool IsMatch(HashSet<string> segments, string name)
{
if (segments.Contains(name))
return true;
var q = from s1 in segments
where name.StartsWith(s1)
let s2 = name.Substring(s1.Length)
where s1 != s2
where segments.Contains(s2)
select 1; // Dummy. All we care about is if there is one.
return q.Any();
}
Your original solution is quadratic in the number of segments. This one is linear; we rely on the constant order contains operation. (This assumes of course that string operations are constant time because strings are short. If that's not true then we have a whole other kettle of fish to fry.)
How else could we extract wins in the asymptotic case?
If we happened to have the property that the collection was not a hash set but rather a sorted list then we could do even better; we could binary search the list to find the start and end of the range of possible prefix matches, and then pour the list into a hashset to do the suffix matches. That's still linear, but could have a smaller constant factor.
If we happened to know that the target string was small compared to the number of segments, we could attack the problem from the other end. Generate all possible combinations of partitions of the target string and check if both halves are in the segment set. The problem with this solution is that it is quadratic in memory usage in the size of the string. So what we'd want to do there is construct a special hash on character sequences and use that to populate the hash table, rather than the standard string hash. I'm sure you can see how the solution would go from there; I shan't spell out the details.
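A minimal sketch of the "attack from the other end" idea: try every split point of the target name and look both halves up in the segment set. This is the naive version built on Substring and the standard string hash, not the special character-sequence hash mentioned above. SplitMatcher and IsMatchBySplitting are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SplitMatcher
{
    // True if name is a single segment, or the concatenation of two different segments.
    public static bool IsMatchBySplitting(HashSet<string> segments, string name)
    {
        if (segments.Contains(name))
            return true;

        // Try each way of cutting name into a non-empty prefix and suffix.
        for (int cut = 1; cut < name.Length; cut++)
        {
            string prefix = name.Substring(0, cut);
            string suffix = name.Substring(cut);
            if (prefix != suffix && segments.Contains(prefix) && segments.Contains(suffix))
                return true;
        }
        return false;
    }
}
```

The number of lookups depends only on the length of the name, not on the number of segments. The cost is one pair of substring allocations per split point.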
Efficiency is very much dependent on the business problem that you're attempting to solve. Without knowing the full context/usage it's difficult to define the most efficient solution. What works for one situation won't always work for others.
I would always advocate to write working code and then solve any performance issues later down the line (or throw more tin at the problem as it's usually cheaper!) If you're having specific performance issues then please do tell us more...
I'm going to go out on a limb here and say (hope) that you're only going to be matching the filename against the object name once per execution. If that's the case I reckon this approach will be just about the fastest. In a circumstance where you're matching a single filename against multiple object names then the obvious choice is to build up an index of sorts and match against that as you were already doing, although I'd consider different types of collection depending on your expected execution/usage.
public static bool IsMatch(string filename, string objectName)
{
var segments = filename.Split('-', '_');
for (int i = 0; i < segments.Length; i++)
{
if (string.Equals(segments[i], objectName)) return true;
for (int ii = 0; ii < segments.Length; ii++)
{
if (ii == i) continue;
if (string.Equals($"{segments[i]}{segments[ii]}", objectName)) return true;
}
}
return false;
}
If you are willing to use the MoreLINQ NuGet package then this may be worth considering:
public static HashSet<string> GetNameSegments(string filename)
{
var segments = filename.Split(new char[] {'_', '-'}, StringSplitOptions.RemoveEmptyEntries).ToList();
var matches = segments
.Cartesian(segments, (x, y) => x == y ? null : x + y)
.Where(z => z != null)
.Concat(segments);
return new HashSet<string>(matches);
}
StringSplitOptions.RemoveEmptyEntries handles adjacent separators (e.g. --). Cartesian is roughly equivalent to your existing nested for loops. The Where is to remove null entries (i.e. if x == y). Concat is the same as your existing Concat. The use of HashSet allows for your Contains calls (in IsMatch) to be faster.

How to count from a sentence in C# [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
So, I am creating a word filter for a game server in C#, and basically I am trying to scour the sentence for banned words and replace them with clean words. I've already done that, but now I'm up to the part where I want to scan the sentence for a list of sentence-banned words. I'm hopeless at this bit, and I can't seem to wrap my head around it.
Basically I am calling CheckSentence(Message) in the ChatManager, and need the following code to count the matches and continue; if the value is more than 5. So far I have:
public bool CheckSentence(string Message)
{
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (Message.ToLower().Contains(Filter.Word) && Filter.IsSentence)
{
// count Message, if message contains >5
// from (Message.Contains(Filter.Word))
// continue; else (ignore)
}
}
return false;
}
I'm not too sure if that makes much sense, but I want it to continue; if there are more than 5 Message.Contains(Filter.Word)
public bool CheckSentence(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (lower.Contains(Filter.Word) && Filter.IsSentence)
{
count++;
}
}
return count > 5; // the question asks for more than 5 matches
}
If this becomes too slow, you may be better off caching the list of filtered words in a HashSet and iterating over each word in the message, checking whether it exists in the HashSet. That gives you O(n) time, where n is the number of words in the message.
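A rough sketch of that HashSet idea. Note that it changes the semantics: it matches whole words rather than substrings, so it only suits single-word filters. The separator list, BannedWordCounter and its method names are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedWordCounter
{
    // Assumed word separators; extend as needed for your chat input.
    private static readonly char[] Separators = { ' ', '\t', ',', '.', '!', '?', ';', ':' };

    // Build the set once and reuse it for every message.
    public static HashSet<string> BuildSet(IEnumerable<string> bannedWords)
    {
        return new HashSet<string>(bannedWords, StringComparer.OrdinalIgnoreCase);
    }

    // Counts how many distinct words of the message are banned (case-insensitive).
    public static int CountDistinctBanned(string message, HashSet<string> banned)
    {
        return message
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count(word => banned.Contains(word));
    }
}
```

The case-insensitive comparer on the set removes the need to call ToLower() on every message.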
LINQ Version
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Count(x => lower.Contains(x.Word)) > 5; // the question asks for more than 5 matches
}
EDIT 2: LINQ updated as per @S.C.'s comment
By @S.C.:
For the linq version, there's no need to count past the first five. return _filteredWords.Where(x => x.IsSentence && lower.Contains(x.Word)).Skip(5).Any();
public bool CheckSentenceLinq(string rawMessage)
{
var lower = rawMessage.ToLower();
return _filteredWords
.Where(x => x.IsSentence)
.Where(x => lower.Contains(x.Word))
.Skip(5)
.Any();
}
ToUpper vs ToLower
As @DevEstacion mentioned, and per Microsoft's best-practice recommendations for using strings, it is best to use ToUpperInvariant() rather than ToLowerInvariant() when normalizing strings for comparison.
EDIT:Using Continue
public bool CheckSentenceWithContinue(string rawMessage)
{
var lower = rawMessage.ToLower();
var count = 0;
foreach (WordFilter Filter in this._filteredWords.ToList())
{
if (!Filter.IsSentence)
continue; // Move on to the next filter, as this is not a sentence word filter
if (!lower.Contains(Filter.Word))
continue; // Move on to the next filter, as the message does not contain this word
// If you are here it means filter is a Sentence filter, and the message contains the word, so increment the counter
count++;
}
return count > 5; // the question asks for more than 5 matches
}
I believe someone already posted a correct answer, I'm just here to provide an alternative.
So instead of doing a forloop or foreach, I'll be providing you with Regex solution.
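public bool CheckSentence(string rawMessage)
{
    /*
    string.Join("|", ...) builds the pattern for the Regex from the filtered
    words; '|' means "or", so the pattern matches any of the words.
    Regex.Escape keeps special characters in a word from being read as regex
    syntax. Matches() then returns every occurrence in the raw message.
    */
    var pattern = string.Join("|", _filteredWords
        .Where(x => x.IsSentence)
        .Select(x => Regex.Escape(x.Word)));
    if (pattern.Length == 0) return false; // an empty pattern would match everywhere
    return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled)
        .Matches(rawMessage).Count > 5;
}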
Benefits? much shorter, prevents loop and could be faster :)
Don't forget to add these two using directives at the top of the .cs file
using System.Linq;
using System.Text.RegularExpressions;

how to find all the double characters in a string in c#

I am trying to get a count of all the double characters in a string using C#, i.e. "ssss" should be two doubles, not three.
For example, right now I have to loop over the string like this:
string s = "shopkeeper";
int d = 0;
for (int i = 1; i < s.Length; i++) if (s[i] == s[i - 1]) d++;
The value of d at the end should be 1.
Is there a shorter way to do this, in LINQ or regex? What are the performance implications, and what is the most efficient way? Thanks for your help.
I have read [How to check repeated letters in a string c#] and it's helpful, but it doesn't address double characters, which is what I am looking for.
Try the following regex to extract any double characters: "(.)\1"
UPD: a simple example:
foreach (var match in Regex.Matches("shhopkeeper", @"(.)\1"))
    Console.WriteLine(match);
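To get the count the question asks for, it should be enough to count the matches. Regex matches do not overlap, so "ssss" yields two doubles rather than three. A small sketch follows; DoubleCounter and CountDoubles are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class DoubleCounter
{
    // Counts non-overlapping pairs of identical adjacent characters.
    // RegexOptions.Singleline lets '.' match newline characters as well.
    public static int CountDoubles(string text)
    {
        return Regex.Matches(text, @"(.)\1", RegexOptions.Singleline).Count;
    }
}
```

Whether this beats the plain for loop would need measuring. The regex is shorter, but the loop does less work per character.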
This works:
var doubles =
text
.Skip(1)
.Aggregate(
text.Take(1).Select(x => x.ToString()).ToList(),
(a, c) =>
{
if (a.Last().Last() == c)
a[a.Count - 1] += c.ToString();
else
a.Add(c.ToString());
return a;
})
.Select(x => x.Length / 2)
.Sum();
It gives me these results:
"shopkeeper" -> 1
"beekeeper" -> 2
"bookkeeper" -> 3
"boookkkeeeper" -> 3
"booookkkkeeeeper" -> 6
First I would like to mention that there is no "natural" LINQ solution to this problem, so every standard LINQ-based solution will be ugly and highly inefficient compared to a simple for loop.
However, there is a solution in the "spirit" of LINQ for this and similar problems, like the linked How to check repeated letters in a string c#, or if you want to find not doubles but, let's say, triples, quadruples, etc.
The common subproblem is: given some sequence of elements, generate a new sequence of (value, count) pairs for the runs of consecutive elements that have the same value.
It can be done with a custom extension method like this (the name of the method could be different, it's not essential for the point):
public static class EnumerableEx
{
public static IEnumerable<TResult> Zip<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, int, TResult> resultSelector, IEqualityComparer<TSource> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<TSource>.Default;
using (var e = source.GetEnumerator())
{
for (bool more = e.MoveNext(); more;)
{
var value = e.Current;
int count = 1;
while ((more = e.MoveNext()) && comparer.Equals(e.Current, value)) count++;
yield return resultSelector(value, count);
}
}
}
}
Using this function in combination with standard LINQ, one can easily solve the original question:
var s = "shhopkeeperssss";
var countDoubles = s.Zip((value, count) => count / 2).Sum();
but also
var countTriples = s.Zip((value, count) => count / 3).Sum();
or
var countQuadruples = s.Zip((value, count) => count / 4).Sum();
or the question from the link
var repeatedChars = s.Zip((value, count) => new { Char = value, Count = count })
.Where(e => e.Count > 1);
etc.

c# - BinarySearch StringList with wildcard

I have a sorted StringList and wanted to replace
foreach (string line3 in CardBase.cardList)
if (line3.ToLower().IndexOf((cardName + Config.EditionShortToLong(edition)).ToLower()) >= 0)
{
return true;
}
with a binary search, since the cardList is rather large (~18k) and this search takes up around 80% of the time.
So I found the List.BinarySearch method, but my problem is that the lines in the cardList look like this:
Brindle_Boar_(Magic_2012).c1p247924.prod
But I have no way to generate the c1p... part, which is a problem because List.BinarySearch only finds exact matches.
How do I modify List.BinarySearch so that it finds a match if only a part of the string matches?
e. g.
searching for Brindle_Boar_(Magic_2012) should return the position of Brindle_Boar_(Magic_2012).c1p247924.prod
List.BinarySearch will return the ones complement of the index of the next item larger than the request if an exact match is not found.
So, you can do it like this (assuming you'll never get an exact match):
var key = (cardName + Config.EditionShortToLong(edition)).ToLower();
var list = CardBase.cardList;
var index = ~list.BinarySearch(key);
return index != list.Count && list[index].StartsWith(key, StringComparison.OrdinalIgnoreCase); // key is lower-cased, the list entries are not
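A self-contained sketch of the same idea that also copes with an exact match. It assumes the list was sorted with StringComparer.Ordinal, so the binary search and the prefix check agree on ordering and casing. PrefixSearch and IndexOfPrefix are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Returns the index of an entry that starts with prefix, or -1 if there is none.
    // Precondition (assumed): sortedOrdinal is sorted with StringComparer.Ordinal.
    public static int IndexOfPrefix(List<string> sortedOrdinal, string prefix)
    {
        int index = sortedOrdinal.BinarySearch(prefix, StringComparer.Ordinal);
        if (index < 0)
            index = ~index; // no exact match: complement gives the next larger entry

        bool found = index < sortedOrdinal.Count
            && sortedOrdinal[index].StartsWith(prefix, StringComparison.Ordinal);
        return found ? index : -1;
    }
}
```

In ordinal order a prefix sorts immediately before the entries that extend it, which is why checking the single entry at the insertion point is enough.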
BinarySearch() has an overload that takes an IComparer<T> as its second parameter. Implement a custom comparer and return 0 when you have a match within the string; you can use the same IndexOf() method there.
Edit:
Does a binary search make sense in your scenario? How do you determine that a certain item is "less" or "greater" than another item? Right now you only provide what would constitute a match. Only if you can answer this question, binary search applies in the first place.
You can take a look at the C5 Generic Collection Library (you can install it via NuGet also).
Use the SortedArray(T) type for your collection. It provides a handful of methods that could prove useful. You can even query for ranges of items very efficiently.
var data = new SortedArray<string>();
// query for first string greater than or equal to "Brindle_Boar_(Magic_2012)" and check if it starts
// with "Brindle_Boar_(Magic_2012)"
var a = data.RangeFrom("Brindle_Boar_(Magic_2012)").FirstOrDefault();
var found = a != null && a.StartsWith("Brindle_Boar_(Magic_2012)");
// query for first 5 items that start with "Brindle_Boar"
var b = data.RangeFrom("Brindle_Boar").Take(5).Where(s => s.StartsWith("Brindle_Boar"));
// query for all items that start with "Brindle_Boar" (provided only ascii chars)
var c = data.RangeFromTo("Brindle_Boar", "Brindle_Boar~").ToList();
// query for all items that start with "Brindle_Boar", iterates until first non-match
var d = data.RangeFrom("Brindle_Boar").TakeWhile(s => s.StartsWith("Brindle_Boar"));
The RangeFrom... methods perform a binary search, find the first element greater than or equal to your argument, and return an iterator from that position.
