C# Search multiline string for keyword and store variables


I have a multi-line string that I get from a database.
This string would have a format like:
The below text is for the label program
COMPANY=ComanyName
PRODUCT=ProductName
SERIALMASK=123456789YYWWXXXX
How do I go through this text and store variables or an array with ComanyName, ProductName, 123456789YYWWXXXX, so I can insert these values into textboxes on my Windows Forms Application?
My big hurdle is that sometimes the format would be:
The below text is for the label program
Company1 Information:
COMPANY=ComanyName
PRODUCT=ProductName
SERIALMASK=123456789YYWWXXXX
Company2 Information:
COMPANY=ComanyName
PRODUCT=ProductName
SERIALMASK=123456789YYWWXXXX
In that case, I only want to extract the first occurrence of the COMPANY, PRODUCT and SERIALMASK variables.
Right now I have code that saves each line in a variable, and I guess I could run a switch-case in the foreach loop and look for a substring. But I am hoping there is a more effective way.

Try code below
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input =
"COMPANY=ComanyName\n" +
"PRODUCT=ProductName\n" +
"SERIALMASK=123456789YYWWXXXX\n";
string pattern = @"(?'name'\w+)=(?'value'\w+)";
MatchCollection matches = Regex.Matches(input,pattern);
foreach(Match match in matches)
{
Console.WriteLine("Name : '{0}', Value : '{1}'", match.Groups["name"].Value, match.Groups["value"].Value);
}
Console.ReadLine();
Dictionary<string, string> dict = matches.Cast<Match>()
.GroupBy(x => x.Groups["name"].Value, y => y.Groups["value"].Value)
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
}
}
}
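A regex-free variant of the same idea, as a minimal sketch: split the text into lines, keep only lines that contain '=', and record the first value seen for each key. The names LabelTextParser and ParseFirstOccurrences are made up for this illustration. Unlike the \w+ pattern above, this keeps values that contain spaces or punctuation.

```csharp
using System;
using System.Collections.Generic;

public static class LabelTextParser
{
    // Returns the first value seen for each KEY=VALUE line; later repeats of a key are ignored.
    public static Dictionary<string, string> ParseFirstOccurrences(string text)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // no key before '=': skip headings and free text
            string key = line.Substring(0, eq).Trim();
            string value = line.Substring(eq + 1).Trim();
            if (!result.ContainsKey(key))
            {
                result.Add(key, value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string input =
            "The below text is for the label program\n" +
            "Company1 Information:\n" +
            "COMPANY=FirstCompany\n" +
            "PRODUCT=FirstProduct\n" +
            "SERIALMASK=123456789YYWWXXXX\n" +
            "Company2 Information:\n" +
            "COMPANY=SecondCompany\n";
        Dictionary<string, string> values = ParseFirstOccurrences(input);
        Console.WriteLine(values["COMPANY"]);    // FirstCompany
        Console.WriteLine(values["SERIALMASK"]); // 123456789YYWWXXXX
    }
}
```

The dictionary values can then be assigned to the text boxes; a key that never appears in the text is simply absent, so TryGetValue is the safer lookup in that case.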

This might do the trick for you
string allText = File.ReadAllText("JsonFound.txt");
List<string> allrecord = allText.Split(new string[] { "\r\n\r\n" }, StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.Contains(":"))
.ToList();
List<CompanyInfo> CompanyInfos = new List<CompanyInfo>();
List<string> infos = new List<string>();
foreach(string s in allrecord)
{
infos = s.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
.Skip(1) // skip the "CompanyN Information:" heading line
.SelectMany(q=>q.Split('='))
.ToList();
CompanyInfo ci = new CompanyInfo();
ci.CompanyName = infos[1];
ci.ProductName = infos[3];
ci.SerialMaskNumber = infos[5];
CompanyInfos.Add(ci);
}
The Class CompanyInfo would look like this
public class CompanyInfo
{
public string CompanyName
{
get;
set;
}
public string ProductName
{
get;
set;
}
public string SerialMaskNumber
{
get;
set;
}
}

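You can use a HashSet to drop duplicate records. Note that HashSet<CompanyInfo> only treats two objects as duplicates if CompanyInfo overrides Equals and GetHashCode (or an IEqualityComparer<CompanyInfo> is passed to the constructor); otherwise it compares references.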
var hashSet = new HashSet<CompanyInfo>(CompanyInfos);
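For the HashSet to actually remove records with equal field values, the element type needs value equality. The following is a sketch under that assumption; it is a hypothetical variant of CompanyInfo that implements IEquatable<CompanyInfo>, and CountDistinct is an illustrative helper, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of CompanyInfo with value equality over its three fields.
public class CompanyInfo : IEquatable<CompanyInfo>
{
    public string CompanyName { get; set; }
    public string ProductName { get; set; }
    public string SerialMaskNumber { get; set; }

    public bool Equals(CompanyInfo other)
    {
        return other != null
            && CompanyName == other.CompanyName
            && ProductName == other.ProductName
            && SerialMaskNumber == other.SerialMaskNumber;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CompanyInfo);
    }

    public override int GetHashCode()
    {
        unchecked // overflow is fine for hash mixing
        {
            int hash = 17;
            hash = hash * 31 + (CompanyName ?? "").GetHashCode();
            hash = hash * 31 + (ProductName ?? "").GetHashCode();
            hash = hash * 31 + (SerialMaskNumber ?? "").GetHashCode();
            return hash;
        }
    }
}

public static class DedupDemo
{
    // Counts records after the HashSet has collapsed equal ones.
    public static int CountDistinct(IEnumerable<CompanyInfo> items)
    {
        return new HashSet<CompanyInfo>(items).Count;
    }

    public static void Main()
    {
        var list = new List<CompanyInfo>
        {
            new CompanyInfo { CompanyName = "A", ProductName = "P", SerialMaskNumber = "1" },
            new CompanyInfo { CompanyName = "A", ProductName = "P", SerialMaskNumber = "1" },
            new CompanyInfo { CompanyName = "B", ProductName = "Q", SerialMaskNumber = "2" }
        };
        Console.WriteLine(CountDistinct(list)); // 2
    }
}
```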

Related

Search multiple column values with one search string

I have this query where I want to return results if the search string matches data in the 'FirstName' or 'LastName' column of my database. Each works individually: I can get results for Firstname = 'john' and for LastName = 'doe'. I also want to be able to pass in a search string of 'John doe' and get results. How would I implement this using .NET/LINQ?
snippet:
var query = _context.Players.Where(p => p.Firstname.Contains(searchString.ToLower().Trim()) || p.Lastname.Contains(searchString.ToLower().Trim()));

Use the Split function like the following:
var parts = searchString.Split();
// Assumes searchString has at least two words; parts[1] throws otherwise.
var query = _context.Players.Where(
p => p.Firstname.Contains(parts[0].ToLower().Trim())
|| p.Lastname.Contains(parts[1].ToLower().Trim()));
Extracted from the official docs:
If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters.

Separating the input data is also convenient
var parts = searchString.Split();
var partOne = parts[0].ToLower().Trim();
var partTwo = parts[1].ToLower().Trim();
var query = _context.Players.Where(
p => p.Firstname.Contains(partOne)
|| p.Lastname.Contains(partTwo));
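Indexing parts[0] and parts[1] assumes exactly two words. One way around that, sketched here against an in-memory list, is to require every search term to match either name. Player and PlayerSearch are stand-ins for illustration; whether the same chain of Where calls translates to SQL depends on the LINQ provider and is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity in the question.
public class Player
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

public static class PlayerSearch
{
    // Every whitespace-separated term must occur in the first or the last name.
    public static List<Player> Search(IEnumerable<Player> players, string searchString)
    {
        string[] terms = (searchString ?? "")
            .ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        IEnumerable<Player> query = players;
        foreach (string term in terms)
        {
            string t = term; // local copy so each lambda captures its own term
            query = query.Where(p =>
                p.Firstname.ToLower().Contains(t) || p.Lastname.ToLower().Contains(t));
        }
        return query.ToList();
    }

    public static void Main()
    {
        var players = new List<Player>
        {
            new Player { Firstname = "John", Lastname = "Doe" },
            new Player { Firstname = "Jane", Lastname = "Smith" }
        };
        Console.WriteLine(Search(players, "John doe").Count); // 1
        Console.WriteLine(Search(players, "j").Count);        // 2
    }
}
```

An empty search string produces no terms, so this sketch returns every player in that case.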

I created an extension method class to isolate the functional parts of the search algorithm. It is based on pattern matching. You can give this one a try.
public class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
static class Extensions
{
public static string Sanitize(this string item)
{
// Strings are immutable, so return the cleaned copy instead of assigning to the parameter.
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
return rgx.Replace(item, " ");
}
public static string GetPipedString(this string item)
{
// Join the non-empty words with '|' so they become regex alternatives.
return string.Join("|", item.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
public static IEnumerable<Person> FindPlayers(this IEnumerable<Person> persons, string searchKey)
{
searchKey = searchKey.Sanitize();
string pattern = string.Format(@"^(?:{0})\w*$", searchKey.GetPipedString());
return persons.Where(x => Regex.IsMatch(
string.Join(string.Empty,
new List<string>() { x.FirstName, x.LastName }),
pattern,
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace));
}
}
class Program
{
static void Main(string[] args)
{
/* Assuming peoples is the IEnumerable<Person>.
Anyways there is an explicit sanitization of the string to remove the non alphanum characters*/
var items = peoples.FindPlayers("ANY DATA SPACE SEPARATED").ToList();
}
}

how to use contains on list of string with where clause

When I try to filter records that match any item in the search list, nothing is returned. Please see the code:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
class Test {
// Main Method
public static void Main(String[] args)
{
List<String> searchlist = new List<String>();
searchlist.Add("Mang");
searchlist.Add("Apple");
List<String> firstlist = new List<String>();
firstlist.Add("Mango");
firstlist.Add("Apple");
firstlist.Add("Orange");
firstlist.Add("Grapes");
test1 obj = new test1();
obj.stringList = firstlist;
Console.Write(obj.stringList.Where(x=> searchlist.Contains(x)).Count());
}
public class test1
{
public string id { get; set; }
public string name { get; set; }
public List<String> stringList { get; set; }
}
}
In the above example, if I pass a full string like "Mango" it returns the result, but if I search for only "Mang" (a partial word) it does not work.

The reason is that Contains() on a collection compares its items for equality, so you are asking whether "Mang" == "Mango".
As stated in another answer here, you want to compare the strings using string.Contains, but it matters which string you call Contains on: the list item must contain the search term, not the other way round.
var result = obj.stringList.Where(item => searchlist.Any(searchString => item.Contains(searchString))).Count();

Try this
Console.Write(obj.stringList.Where(x => searchlist.Any(list => x.Contains(list))).Count());
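Both answers use string.Contains, which is case-sensitive, so "mang" would still miss "Mango". If that matters, a sketch of a case-insensitive variant could use IndexOf with StringComparison.OrdinalIgnoreCase. PartialMatch and Filter are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialMatch
{
    // Keeps every item that contains at least one of the terms, ignoring case.
    public static List<string> Filter(IEnumerable<string> items, IEnumerable<string> terms)
    {
        List<string> termList = terms.ToList();
        return items
            .Where(item => termList.Any(term =>
                item.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }

    public static void Main()
    {
        var firstlist = new List<string> { "Mango", "Apple", "Orange", "Grapes" };
        var searchlist = new List<string> { "mang", "Apple" };
        Console.WriteLine(Filter(firstlist, searchlist).Count); // 2
    }
}
```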

mongodb query: filter results by string attribute that is partly matched (filter string is subset of attribute string)

I want to query data from a MongoDB database and apply a filter to it. Stuff like this works fine:
var wantedAttributes = "word";
Collection.Find(Builders<MyModel>.Filter.Eq("Attributes", wantedAttributes)).ToList();
but only if my wantedAttributes matches the Attributes field value in the db exactly.
In my use case, the Attributes values are lists of strings, like:
word1, word2, word3
word2, word3, word1
word3, word1, word4
What I want is a way to match all entries that contain a given set of words, not necessarily in the same order. Extra words are allowed, but missing words are not.
So if wantedAttributes = word4, I want only the third entry, and if wantedAttributes = word1,word2, I want the first and the second.
wantedAttributes does not necessarily have to be a string of comma-separated words, but the database entries are.
What is the best way to achieve that?

Try this:
var wanteds = "word4";
var filter = Builders<MyModel>.Filter.Empty;
foreach (var wanted in wanteds.Split(','))
{
filter = filter & Builders<MyModel>.Filter.Where(m => m.Str.Contains(wanted.Trim()));
}
var models = collection.Find(filter).ToList();
My model:
class MyModel
{
[BsonElement("id")]
public int Id { get; set; }
[BsonElement("str")]
public string Str { get; set; }
}
You can customize the wanted string and string separator.
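One caveat with a substring test: "word1" is also contained in "word10". The matching rule from the question (every wanted word present, order irrelevant, extras allowed) can be written with exact tokens instead. This is only an in-memory sketch of the rule, not a MongoDB query; AttributeMatch and ContainsAll are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AttributeMatch
{
    // True when every comma-separated word in 'wanted' is also a word in 'stored'.
    public static bool ContainsAll(string stored, string wanted)
    {
        var storedSet = new HashSet<string>(Tokens(stored));
        return Tokens(wanted).All(w => storedSet.Contains(w));
    }

    // Splits on commas, trims each piece and drops empty ones.
    private static IEnumerable<string> Tokens(string csv)
    {
        return csv.Split(',').Select(s => s.Trim()).Where(s => s.Length > 0);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsAll("word1, word2, word3", "word1,word2")); // True
        Console.WriteLine(ContainsAll("word3, word1, word4", "word1,word2")); // False
        Console.WriteLine(ContainsAll("word10, word2", "word1"));             // False
    }
}
```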

Since you asked for the best way, I'd suggest storing the attributes as a string array instead of a string in the db and querying like so:
var wantedAttributes = new[] { "word1", "word2" };
var result = collections.AsQueryable<MyModel>()
.Where(m => wantedAttributes.All(a => m.Attributes.Contains(a)))
.ToList();
This way you can index the Attributes field and get very fast results. The m.Str.Contains(wanted.Trim()) method which @charles suggested will cause a regex match, which will not be able to use an index. Only prefixed regex queries can use indexes in MongoDB.
Here's a full test program:
using MongoDB.Entities;
using MongoDB.Entities.Core;
using System;
using System.Linq;
namespace StackOverFlow
{
public class MyModel : Entity
{
public string[] Attributes { get; set; }
}
public static class Program
{
private static void Main()
{
new DB("test");
new[] {
new MyModel{
Attributes = new[]{ "word1", "word2", "word3" }
},
new MyModel{
Attributes = new[]{ "word2", "word3", "word1" }
},
new MyModel{
Attributes = new[]{ "word3", "word1", "word4" }
}
}
.Save();
var wantedAttributes = new[] { "word1", "word2" };
var result = DB.Queryable<MyModel>()
.Where(m => wantedAttributes.All(a => m.Attributes.Contains(a)))
.ToList();
}
}
}

Interleave an array of email addresses avoiding items with same domain to be consecutive

I'm looking for an efficient way of sorting an array of email addresses to avoid items with the same domain to be consecutive, in C#.
Email addresses inside the array are already distinct and all of them are lower case.
Example:
Given an array with the following entries:
john.doe@domain1.com
jane_doe@domain1.com
patricksmith@domain2.com
erick.brown@domain3.com
I would like to obtain something similar to the following:
john.doe@domain1.com
patricksmith@domain2.com
jane_doe@domain1.com
erick.brown@domain3.com

With the help of an extension method (stolen from https://stackoverflow.com/a/27533369/172769), you can go like this:
List<string> emails = new List<string>();
emails.Add("john.doe@domain1.com");
emails.Add("jane_doe@domain1.com");
emails.Add("patricksmith@domain2.com");
emails.Add("erick.brown@domain3.com");
var q = emails.GroupBy(m => m.Split('@')[1]).Select(g => new List<string>(g)).Interleave();
The Interleave method is defined as:
public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source )
{
var queues = source.Select(x => new Queue<T>(x)).ToList();
while (queues.Any(x => x.Any())) {
foreach (var queue in queues.Where(x => x.Any())) {
yield return queue.Dequeue();
}
}
}
So basically, we create groups based on the domain part of the email addresses, project (or Select) each group into a List<string>, and then "Interleave" those lists.
I have tested against your sample data, but more thorough testing might be needed to find edge cases.
DotNetFiddle snippet
Cheers
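One edge case worth checking is group order. Round-robin interleaving visits groups in the order GroupBy produced them, so a small group that comes first can leave two addresses of a larger group side by side at the end. The sketch below adds a checker and an optional largest-group-first ordering. Spread, DomainOf and HasAdjacentSameDomain are hypothetical helpers, and ordering by size is a heuristic, not a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InterleaveCheck
{
    // Same round-robin idea as the Interleave method above.
    public static IEnumerable<T> Interleave<T>(this IEnumerable<IEnumerable<T>> source)
    {
        var queues = source.Select(x => new Queue<T>(x)).ToList();
        while (queues.Any(q => q.Count > 0))
        {
            foreach (var queue in queues.Where(q => q.Count > 0))
            {
                yield return queue.Dequeue();
            }
        }
    }

    public static string DomainOf(string email)
    {
        return email.Substring(email.LastIndexOf('@') + 1);
    }

    // Groups by domain, optionally puts the largest group first, then interleaves.
    public static List<string> Spread(IEnumerable<string> emails, bool largestFirst)
    {
        IEnumerable<IEnumerable<string>> groups =
            emails.GroupBy(e => DomainOf(e)).Select(g => g.AsEnumerable());
        if (largestFirst)
        {
            groups = groups.OrderByDescending(g => g.Count());
        }
        return groups.Interleave().ToList();
    }

    // True when two neighbouring addresses share a domain.
    public static bool HasAdjacentSameDomain(IList<string> emails)
    {
        for (int i = 1; i < emails.Count; i++)
        {
            if (DomainOf(emails[i]) == DomainOf(emails[i - 1])) return true;
        }
        return false;
    }

    public static void Main()
    {
        var emails = new List<string> { "a1@x.com", "b1@y.com", "b2@y.com" };
        Console.WriteLine(HasAdjacentSameDomain(Spread(emails, false))); // True
        Console.WriteLine(HasAdjacentSameDomain(Spread(emails, true)));  // False
    }
}
```

When one domain holds more than half of the addresses (rounded up), no ordering can avoid neighbours, so the checker is mainly useful for spotting the avoidable cases.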

This will distribute them semi-evenly and attempt to avoid matching domains next to each other (although in certain lists that may be impossible). This answer will use OOP and Linq.
DotNetFiddle.Net Example
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var seed = new List<string>()
{
"1@a.com",
"2@a.com",
"3@a.com",
"4@a.com",
"5@a.com",
"6@a.com",
"7@a.com",
"8@a.com",
"9@a.com",
"10@a.com",
"1@b.com",
"2@b.com",
"3@b.com",
"1@c.com",
"4@b.com",
"2@c.com",
"3@c.com",
"4@c.com"
};
var work = seed
// Create a list of EmailAddress objects
.Select(s => new EmailAddress(s)) // s.ToLower() ?
// Group the list by Domain
.GroupBy(s => s.Domain)
// Create a List<EmailAddressGroup>
.Select(g => new EmailAddressGroup(g))
.ToList();
var currentDomain = string.Empty;
while(work.Count > 0)
{
// this list should not be the same domain we just used
var noDups = work.Where(w => w.Domain != currentDomain);
// if none exist we are done, or it can't be solved
if (noDups.Count() == 0)
{
break;
}
// find the first group with the most items
var workGroup = noDups.First(w => w.Count() == noDups.Max(g => g.Count()));
// get the email address and remove it from the group list
var workItem = workGroup.Remove();
// if the group is empty remove it from *work*
if (workGroup.Count() == 0)
{
work.Remove(workGroup);
Console.WriteLine("removed: " + workGroup.Domain);
}
Console.WriteLine(workItem.FullEmail);
// last domain looked at.
currentDomain = workItem.Domain;
}
Console.WriteLine("Cannot disperse email addresses affectively, left overs:");
foreach(var workGroup in work)
{
while(workGroup.Count() > 0)
{
var item = workGroup.Remove();
Console.WriteLine(item.FullEmail);
}
}
}
public class EmailAddress
{
public EmailAddress(string emailAddress)
{
// Additional Email Address Validation
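var result = emailAddress.Split(new char[] {'@'}, StringSplitOptions.RemoveEmptyEntries)
.ToList();
if (result.Count() != 2)
{
// The exception has to be thrown, not just constructed.
throw new ArgumentException("Expected exactly one '@' in the address.", "emailAddress");
}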
this.FullEmail = emailAddress;
this.Name = result[0];
this.Domain = result[1];
}
public string Name { get; private set; }
public string Domain { get; private set; }
public string FullEmail { get; private set; }
}
public class EmailAddressGroup
{
private List<EmailAddress> _emails;
public EmailAddressGroup(IEnumerable<EmailAddress> emails)
{
this._emails = emails.ToList();
this.Domain = emails.First().Domain;
}
public int Count()
{
return _emails.Count();
}
public string Domain { get; private set; }
public EmailAddress Remove()
{
var result = _emails.First();
_emails.Remove(result);
return result;
}
}
}
Output:
1@a.com
1@b.com
2@a.com
1@c.com
3@a.com
2@b.com
4@a.com
2@c.com
5@a.com
3@b.com
6@a.com
3@c.com
7@a.com
removed: b.com
4@b.com
8@a.com
removed: c.com
4@c.com
9@a.com
Cannot disperse email addresses affectively, left overs:
10@a.com

Something like this will spread them equally, but you will have the problems (=consecutive elements) at the end of the new list...
var list = new List<string>();
list.Add("john.doe@domain1.com");
list.Add("jane_doe@domain1.com");
list.Add("patricksmith@domain2.com");
list.Add("erick.brown@domain3.com");
var x = list.GroupBy(content => content.Split('@')[1]);
var newlist = new List<string>();
bool addedSomething=true;
int i = 0;
while (addedSomething) {
addedSomething = false;
foreach (var grp in x) {
if (grp.Count() > i) {
newlist.Add(grp.ElementAt(i));
addedSomething = true;
}
}
i++;
}

Edit: Added a high level description :)
What this code does is group each element by the domain, sort the groups by size in descending order (largest group first), project the elements of each group into a stack, and pop them off of each stack (always pop the next element off the largest stack with a different domain). If there is only a single stack left, then its contents are yielded.
This should make sure that all domains are distributed as evenly as possible.
MaxBy extension method from: https://stackoverflow.com/a/31560586/969962
private IEnumerable<string> GetNonConsecutiveEmails(List<string> list)
{
var emailAddresses = list.Distinct().Select(email => new EmailAddress { Email = email, Domain = email.Split('@')[1]}).ToArray();
var groups = emailAddresses
.GroupBy(addr => addr.Domain)
.Select (group => new { Domain = group.Key, EmailAddresses = new Stack<EmailAddress>(group)})
.ToList();
EmailAddress lastEmail = null;
while(groups.Any(g => g.EmailAddresses.Any()))
{
// Try and pick from the largest stack.
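var stack = groups
.Where(g => (g.EmailAddresses.Any()) && (lastEmail == null ? true : lastEmail.Domain != g.Domain))
// DefaultIfEmpty() yields a single null when nothing qualifies; without it,
// the Aggregate call inside MaxBy throws on an empty sequence.
.DefaultIfEmpty()
.MaxBy(g => g.EmailAddresses.Count);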
// Null check to account for only 1 stack being left.
// If so, pop the elements off the remaining stack.
lastEmail = (stack ?? groups.First(g => g.EmailAddresses.Any())).EmailAddresses.Pop();
yield return lastEmail.Email;
}
}
class EmailAddress
{
public string Domain;
public string Email;
}
public static class Extensions
{
public static T MaxBy<T,U>(this IEnumerable<T> data, Func<T,U> f) where U:IComparable
{
return data.Aggregate((i1, i2) => f(i1).CompareTo(f(i2))>0 ? i1 : i2);
}
}

What I am trying to do here is to sort them first.
Then I re-arrange from a different end. I'm sure there're more efficient ways to do this but this is one easy way to do it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication4
{
class Program
{
static void Main(string[] args)
{
String[] emails = { "john.doe@domain1.com", "jane_doe@domain1.com", "patricksmith@domain2.com", "erick.brown@domain3.com" };
var result = process(emails);
}
static String[] process(String[] emails)
{
String[] result = new String[emails.Length];
var comparer = new DomainComparer();
Array.Sort(emails, comparer);
// Use <= so the middle element of an odd-length array is not dropped.
for (int i = 0, j = emails.Length - 1, k = 0; i <= j; i++, j--, k += 2)
{
if (i == j)
result[k] = emails[i];
else
{
result[k] = emails[i];
result[k + 1] = emails[j];
}
}
return result;
}
}
public class DomainComparer : IComparer<string>
{
public int Compare(string left, string right)
{
int at_pos = left.IndexOf('@');
var left_domain = left.Substring(at_pos, left.Length - at_pos);
at_pos = right.IndexOf('@');
var right_domain = right.Substring(at_pos, right.Length - at_pos);
return String.Compare(left_domain, right_domain);
}
}
}

Returning table with CLR

I want to write a CLR procedure which takes a text and returns a table with all the words in this text, but I can't figure out how to return a table. Could you please tell me how?
[Microsoft.SqlServer.Server.SqlFunction]
public static WhatTypeShouldIWriteHere Function1(SqlString str)
{
string[] words = Regex.Split(str, @"\W+").Distinct().ToArray();
//how to return a table with one column of words?
}
Thank you for your help.
UPDATED: I need to do it for sql-2005

Here is a full blown sample. I got tired of searching for this myself and even though this is answered, I thought I would post this just to keep a fresh reference online.
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;
using System.Collections.Generic;
public partial class UserDefinedFunctions {
[SqlFunction]
public static SqlBoolean RegexPatternMatch(string Input, string Pattern) {
return Regex.Match(Input, Pattern).Success ? new SqlBoolean(true) : new SqlBoolean(false);
}
[SqlFunction]
public static SqlString RegexGroupValue(string Input, string Pattern, int GroupNumber) {
Match m = Regex.Match(Input, Pattern);
SqlString value = m.Success ? m.Groups[GroupNumber].Value : null;
return value;
}
[SqlFunction(DataAccess = DataAccessKind.Read, FillRowMethodName = "FillMatches", TableDefinition = "GroupNumber int, MatchText nvarchar(4000)")]
public static IEnumerable RegexGroupValues(string Input, string Pattern) {
List<RegexMatch> GroupCollection = new List<RegexMatch>();
Match m = Regex.Match(Input, Pattern);
if (m.Success) {
for (int i = 0; i < m.Groups.Count; i++) {
GroupCollection.Add(new RegexMatch(i, m.Groups[i].Value));
}
}
return GroupCollection;
}
public static void FillMatches(object Group, out SqlInt32 GroupNumber, out SqlString MatchText) {
RegexMatch rm = (RegexMatch)Group;
GroupNumber = rm.GroupNumber;
MatchText = rm.MatchText;
}
private class RegexMatch {
public SqlInt32 GroupNumber { get; set; }
public SqlString MatchText { get; set; }
public RegexMatch(SqlInt32 group, SqlString match) {
this.GroupNumber = group;
this.MatchText = match;
}
}
};

You can return any collection that implements IEnumerable. Check this out.

[SqlFunction(DataAccess = DataAccessKind.Read, FillRowMethodName = "FillMatches", TableDefinition = "GroupNumber int, MatchText nvarchar(4000)")]
public static IEnumerable Findall(string Pattern, string Input)
{
List<RegexMatch> GroupCollection = new List<RegexMatch>();
Regex regex = new Regex(Pattern);
if (regex.Match(Input).Success)
{
int i = 0;
foreach (Match match in regex.Matches(Input))
{
GroupCollection.Add(new RegexMatch(i, match.Groups[0].Value));
i++;
}
}
return GroupCollection;
}
That was a slight alteration of the code by "Damon Drake".
This one does a findall instead of returning the first value found.
So
declare @txt varchar(100) = 'Race Stat 2017-2018 -(FINAL)';
select * from dbo.findall('(\d+)', @txt)
returns one row per match: (0, 2017) and (1, 2018).

This is a new area of SQL Server; you should consult this article, which shows the syntax of a table-valued function. That is what you want to create.
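Applied to the original question, a table-valued CLR function for a one-column table of distinct words might look like the sketch below. SplitWords, FillWord and the column name Word are made up for this illustration. It avoids LINQ and newer language features on the assumption that the SQL Server 2005 target limits what is available.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // SQL Server enumerates the returned collection and calls FillWord once per element.
    [SqlFunction(FillRowMethodName = "FillWord", TableDefinition = "Word nvarchar(4000)")]
    public static IEnumerable SplitWords(SqlString str)
    {
        List<string> words = new List<string>();
        if (str.IsNull)
        {
            return words; // NULL input gives an empty table
        }
        foreach (string word in Regex.Split(str.Value, @"\W+"))
        {
            // Regex.Split can return empty strings at the edges; skip them and duplicates.
            if (word.Length > 0 && !words.Contains(word))
            {
                words.Add(word);
            }
        }
        return words;
    }

    // Maps one element of the collection to the single output column.
    public static void FillWord(object row, out SqlString word)
    {
        word = new SqlString((string)row);
    }
}
```

The assembly still has to be registered in the database and exposed through a CREATE FUNCTION ... EXTERNAL NAME statement before it can be queried from T-SQL.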
