Find information that matches the entered word - c#

I am using C# .NET Core and calling my API through Postman. When a word is entered in my search input, the API should return the data that contains that word, regardless of accents or upper/lowercase.
This is the first time I have implemented this kind of search, and I'm not sure how to do it. I have been looking into how to filter the matches using StringComparison.CurrentCulture.
[HttpGet("{nombre}")]
public async Task<ActionResult<IEnumerable<FichaViewModel>>> GetSearchTramite(string nombre)
{
    List<string> lista = new List<string>();
    // My attempt at the filter (this line does not compile):
    // var coincidencias = _context.tra_tramites.Contains(String.CompareOrdinal());
    return await FichaHelper.GetFicha(_context, nombre, url);
}
All the data I work with is stored in a database.
I would sincerely appreciate any comments and/or examples or pages to help me in my project.
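A minimal sketch of accent- and case-insensitive matching, assuming the text to search has already been loaded into memory as strings. It decomposes each string to Unicode FormD, drops the combining accent marks, and then compares with StringComparison.OrdinalIgnoreCase. The SearchText class and its method names are hypothetical. Note that this also folds ñ to n. If the filter has to run inside the database through EF Core, a case- and accent-insensitive collation on the column is usually a better fit, since EF Core cannot translate a custom C# helper like this into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class SearchText
{
    // Removes accents: "Trámite" -> "Tramite". FormD splits each accented letter
    // into its base letter plus combining marks; the marks are dropped, then recomposed.
    public static string RemoveDiacritics(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // True when 'source' contains 'word', ignoring accents and upper/lowercase.
    public static bool ContainsIgnoringAccents(string source, string word) =>
        RemoveDiacritics(source).Contains(RemoveDiacritics(word), StringComparison.OrdinalIgnoreCase);

    // Filters an in-memory sequence (e.g. names already loaded from the database).
    public static List<string> Filter(IEnumerable<string> items, string word) =>
        items.Where(item => ContainsIgnoringAccents(item, word)).ToList();
}
```

With this in place, the controller could filter already-materialized results with SearchText.Filter, or call ContainsIgnoringAccents inside a LINQ-to-Objects Where.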

Related

Combining fuzzy search with synonym expansion in Azure search

I'm using the Microsoft.Azure.Search SDK to run an Azure Cognitive Services search that includes synonym expansion. My SynonymMap is as follows:
private async Task UploadSynonyms()
{
    var synonymMap = new SynonymMap()
    {
        Name = "desc-synonymmap",
        Synonyms = "\"dog\", \"cat\", \"rabbit\"\n "
    };
    await m_SearchServiceClient.SynonymMaps.CreateOrUpdateAsync(synonymMap);
}
This is mapped to Animal.Name as follows:
index.Fields.First(f => f.Name == nameof(Animal.Name)).SynonymMaps = new[] { "desc-synonymmap" };
I am trying to use both fuzzy matching and synonym matching, so that, for example:
If I search for 'dog' it returns any Animal with a Name of 'dog', 'cat' or 'rabbit'
If I search for 'dob' it fuzzy matches to 'dog' and returns any Animal with a Name of 'dog', 'cat' or 'rabbit', as they are all synonyms for 'dog'
My search method is as follows:
private async Task RunSearch()
{
    var parameters = new SearchParameters
    {
        SearchFields = new[] { nameof(Animal.Name) },
        QueryType = QueryType.Full
    };
    var results = await m_IndexClientForQueries.Documents.SearchAsync<Animal>("dog OR dog~", parameters);
}
When I search for 'dog' it correctly returns any result with dog/cat/rabbit as its Name. But when I search for 'dob' it only returns matches for 'dog', and not any synonyms.
This answer from January 2019 states that "Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regex terms aren't expanded." but this answer was posted over a year ago and things may have changed since then.
Is it possible to both fuzzy match and then match on synonyms in Azure Cognitive Search, or is there any workaround to achieve this?
Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regex terms aren't expanded
Unfortunately, this still holds true. Reference: https://learn.microsoft.com/en-us/azure/search/search-synonyms
The reason is that the words/graphs obtained for these terms are passed directly to the index (as per this doc).
Having said that, I can think of two options that may meet your requirement:
Option 1
Run a local fuzzy matcher that returns the possible matching words for a typed word.
Here is a reference that I found: Link 1. I also came across many packages that do similar tasks.
From the words obtained, you can build an OR query that combines all the matching words and issue it to Azure Cognitive Search.
For instance, when dob~ is typed, assume "dot" and "dog" are the words generated by the fuzzy-matching code.
We take these two words and issue a "dog OR dot" query to Azure. Synonyms then take effect because of the search term "dog", and the results are retrieved according to the synonym map.
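A minimal sketch of Option 1, assuming you keep a local vocabulary of the indexed terms (the list and the FuzzyExpansion class are hypothetical, and how you obtain the vocabulary is up to you). It uses a plain Levenshtein edit distance as a swapped-in technique, not tied to any particular package, and joins the surviving candidates into an OR query that contains no `~`, so synonym expansion still applies on the Azure side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FuzzyExpansion
{
    // Dynamic-programming Levenshtein distance using two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // prev now holds the row just computed
        }
        return prev[b.Length];
    }

    // Keeps vocabulary words within maxEdits of the typed term (case-insensitive).
    public static List<string> Expand(string term, IEnumerable<string> vocabulary, int maxEdits = 1) =>
        vocabulary
            .Where(w => Levenshtein(term.ToLowerInvariant(), w.ToLowerInvariant()) <= maxEdits)
            .ToList();

    // Plain OR query without '~', so the synonym map is still applied to each word.
    public static string BuildOrQuery(IEnumerable<string> words) => string.Join(" OR ", words);
}
```

For example, expanding "dob" against a vocabulary of dog, cat, rabbit and dot keeps dog and dot, and the resulting "dog OR dot" string could then be passed to SearchAsync with QueryType.Full as in the question.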
Option 2
You could consider handling this with the synonym map itself, for example by mapping "dob, dgo, dot" to "dog" along with the other synonyms.
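A minimal sketch of Option 2, assuming the synonym map uses the Solr rule format: an equivalency rule lists terms separated by commas, and an explicit mapping rewrites the terms on the left of `=>` into those on the right. The SynonymRules helper is hypothetical; its output string would go into the Synonyms property, as in the question's UploadSynonyms method. A real word such as "dot" is left out here on purpose, since mapping it would also redirect genuine searches for "dot".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SynonymRules
{
    // Builds Solr-format rules: one equivalency line for the synonyms, plus an
    // explicit mapping that rewrites known misspellings to the whole synonym group.
    public static string Build(IEnumerable<string> synonyms, IEnumerable<string> misspellings)
    {
        string group = string.Join(", ", synonyms);
        var lines = new List<string> { group };
        var typos = misspellings.ToList();
        if (typos.Count > 0)
            lines.Add(string.Join(", ", typos) + " => " + group);
        return string.Join("\n", lines);
    }
}
```

The trade-off is that every misspelling has to be listed by hand, whereas Option 1 finds candidates automatically.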

JSON Data format to remove escaped characters

I'm having some trouble parsing some JSON data and removing the escaped characters so that I can then assign the values to a List. I've read lots of pages on SO about this very thing, and while people there are having success, I am just not. Could anyone run their eyes over my method to see what I am doing wrong?
The API I am fetching the JSON data from is IPStack. It lets me capture location-based data about website visitors.
Here is how I am building the API path. The two query-string parameters I've added to the URI are the access key that IPStack gives you, and fields=main, which returns the main location-based data (there are a few other blocks of data you can also get).
string api_URI = "http://api.ipstack.com/";
string api_IP = "100.121.126.33";
string api_KEY = "8378273uy12938";
string api_PATH = string.Format("{0}{1}?access_key={2}&fields=main", api_URI, api_IP, api_KEY);
The rest of the code in my method to pull the JSON data in is as follows.
System.Net.WebClient wc = new System.Net.WebClient();
Uri myUri = new Uri(api_PATH, UriKind.Absolute);
var jsonResponse = wc.DownloadString(myUri);
dynamic Data = Json.Decode(jsonResponse);
This gives me a JSON string that looks like the following (I have put each key/value pair on its own line to show the format better). I have obfuscated the IP and key from my own details, but that won't matter for this summary.
"{
\"ip\":\"100.121.126.33\",
\"type\":\"ipv4\",
\"continent_code\":\"OC\",
\"continent_name\":\"Oceania\",
\"country_code\":\"AU\",
\"country_name\":\"Australia\"
}"
This is where I believe the issue lies, in that I cannot remove the escaped characters. I have tried to use Regex.Escape(jsonResponse.ToString()); and whilst this does not throw any errors, it actually doesn't remove the \ characters either. It leaves me with the exact same string that went into it.
The rest of my method creates a List of IPLookup, a class with one public string (country_name), just to limit the scope during the test.
List<IPLookup> List = new List<IPLookup>();
foreach (var x in Data)
{
    List.Add(new IPLookup()
    {
        country_name = x.country_name
    });
}
The actual error in Visual Studio is thrown when it tries to add country_name to the List: it complains that the object does not contain country_name, and I'm presuming that is because it still has its backslashes attached?
Any help or pointers on where I can look to fix this one up?
Resolved, just from the questions posed by Jon and Luke, which got me looking at the problem from another angle.
Rather than finishing my method with a foreach statement and trying to assign via x.something, I simply replaced that block of code with the following:
List<IPLookup> List = new List<IPLookup>();
List.Add(new IPLookup()
{
    country_name = Data.country_name,
});
I can now access the key/value pairs from this JSON data without having to remove the escaped characters that my debugger was showing me.
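The backslashes the debugger showed were not part of the data: the debugger displays a string as a C# literal, so each embedded quote appears as \". Regex.Escape adds regex escapes rather than removing anything, so it could not help here either. The response is a single JSON object, which is why reading its fields directly works. A minimal sketch of the same idea with System.Text.Json (built into .NET Core 3.0 and later) in place of Json.Decode, assuming IPLookup exposes country_name as a public property, since System.Text.Json binds properties rather than fields by default. IpStackJson is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Assumed shape of the question's IPLookup class. The property name matches the
// JSON key "country_name" exactly, so no attribute or naming policy is needed.
public class IPLookup
{
    public string country_name { get; set; }
}

public static class IpStackJson
{
    // Deserializes one IPStack response object and wraps it in a single-item list.
    public static List<IPLookup> Parse(string jsonResponse)
    {
        IPLookup lookup = JsonSerializer.Deserialize<IPLookup>(jsonResponse);
        return new List<IPLookup> { lookup };
    }
}
```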

Regex: Find pagenumber from partial matching urls

As we all know, regex patterns will make your stomach turn the first time you see them (or the 10th time, since you never went in head first and truly learned them. Guilty.). I'm currently reading up on it, but since I'm on a tight deadline I'll check here whether I can get a quicker and better answer/explanation in the meantime.
I have the URL of a forum thread, and I want to scan through the HTML and find the last page of the thread.
So say I have one of the following URLs identifying the thread in question:
https://www.somesite.com/forum/thread-93912 (absolute URL to the thread)
/forum/thread-93912 (relative URL to the thread)
and I want to get all integer values that appear directly after (as the next path segment) any of the above "partial" matches in the HTML document.
So from any of the following hrefs located anywhere in the html-document (the doc is represented as a single string):
https://www.somesite.com/forum/thread-93912/34
https://www.somesite.com/forum/thread-93912/34/morestuffhere/whatevs
/forum/thread-93912/34
/forum/thread-93912/34/somethingheretoo
I want to extract the number 34 (only 34), so I can parse it to int.
EDIT
Okay, to make it simpler:
Say I have all the html in htmlString, and in this string I want to find all numbers x that appear after my inputString /forum/thread-93912.
These all appear in the htmlString, and I want to extract the numbers:
thread-93912/34
thread-93912/14
thread-93912/84
thread-93912/64
thread-93912/4
You don't need regex. Just use System.Uri.Segments
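// Uri needs an absolute URL; for "/forum/thread-93912/34" use new Uri(baseUri, relativePath) first.
Uri url = new Uri("https://www.somesite.com/forum/thread-93912/34/morestuffhere/whatevs");
// Segments: "/", "forum/", "thread-93912/", "34/", ... so the page number is at index 3.
Console.WriteLine(url.Segments[3].TrimEnd('/'));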
\b(\d+)\b(?=[^\d]*$)
Try this, and grab the captured group. See the demo:
http://regex101.com/r/sU3fA2/55
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Matches the last run of digits in the string (no digits may follow it).
        Regex regex = new Regex(@"\b\d+\b(?=[^\d]*$)");
        Match match = regex.Match("/forum/thread-93912/34");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}
Since my question was a little hard to explain thoroughly (and since I "changed" my problem a little), I thought I'd add my own answer with the exact code I went with (which I came up with thanks to the other answers here, so I'll give you all an upvote!).
I'm sure this can be made prettier and more compact, but I went for clarity since I'm new to regex!
First, get all strings matching the url + some number (separated with a slash "/"), then extract that number to a group called "page".
// Escape the URL so characters such as '.' and '?' are matched literally.
Regex regex = new Regex(Regex.Escape(urlToThread) + @"/(?<page>\d+)");
MatchCollection matches = regex.Matches(htmlString);
Then iterate over all matches, extract the "page" value (guaranteed to be an integer), and parse it to an integer. Add all parsed integers to a list and sort it when done. The last one will be the greatest (the last page).
List<int> pages = new List<int>();
foreach (Match match in matches)
    pages.Add(int.Parse(match.Groups["page"].Value));
pages.Sort();
// And here we get the last page
int nrOfPages = pages[pages.Count - 1];
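The same steps can be folded into one self-contained helper. This is a sketch under two assumptions: the thread path is escaped before being used as a pattern, and a thread with no numbered links should yield null rather than throw on an empty list. ThreadPages and LastPage are hypothetical names. Taking Max() directly also avoids the sort.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ThreadPages
{
    // Returns the highest page number that directly follows threadPath + "/" in the HTML,
    // or null when no such link exists.
    public static int? LastPage(string html, string threadPath)
    {
        // Regex.Escape makes '.', '?', etc. in the path match literally.
        var regex = new Regex(Regex.Escape(threadPath) + @"/(?<page>\d+)");
        var pages = regex.Matches(html)
                         .Cast<Match>()
                         .Select(m => int.Parse(m.Groups["page"].Value))
                         .ToList();
        return pages.Count > 0 ? pages.Max() : (int?)null;
    }
}
```

Because the relative path "/forum/thread-93912" is also a substring of the absolute URL, passing the relative form finds links written either way.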

Easiest way to get each word of an e-mail (text file) into an array in C#

I am trying to build a phishing scanner for a class project, and I am stuck on getting an e-mail saved in a text file to copy properly into an array for later processing. What I want is for each word to be in its own array index.
Here is my sample e-mail:
Subject: Insufficient Funds Notice
Date: September 25, 2013
Insufficient Funds Notice
Unfortunately, on 09/25/2013 your available balance in your Wells Fargo account XXXXXX4653 was insufficient to cover one or more of your checks, Debit Card purchases, or other transactions.
An important notice regarding one or more of your payments is now available in your Messages & Alerts inbox.
To read the message, click here, and first confirm your identity.
Please make deposits to cover your payments, fees, and any other withdrawals or transactions you have initiated. If you have already taken care of this, please disregard this notice.
We appreciate your business and thank you for your prompt attention to this matter.
If you have questions after reading the notice in your inbox, please refer to the contact information in the notice. Please do not reply to this automated email.
Sincerely,
Wells Fargo Online Customer Service
wellsfargo.com | Fraud Information Center
4f57e44c-5d00-4673-8eae-9123909604b6
I don't want any of the punctuation; all I need are the words and numbers.
Here is the code I have written for it so far.
StreamReader sr1 = new StreamReader(lblDisplaySelectedFilePath.Text);
string line = sr1.ReadToEnd();
words = line.Split(' ');
int wordslowercount = 0;
foreach (string word in words)
{
    words[wordslowercount] = word.ToLower();
    wordslowercount = wordslowercount + 1;
}
The issue with the above code is that I keep getting words in the array that are strung together and/or have "\r" or "\n" attached. Here is an example of what is in the array that I don't want:
"notice\r\ndate:" I don't want the \r, \n, or the :, and the two words should be in different indexes.
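The regex \W matches any non-word character, so splitting your string on it produces a list of words with the punctuation, spaces, and line breaks acting as separators and being discarded. (It does not use word boundaries; that would be \b.) Filtering out the empty pieces leaves just the words: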
Regex.Split(inputString, "\\W").Where(x => !string.IsNullOrWhiteSpace(x));
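A slightly fuller sketch of that approach, assuming the file has already been read into a string (for example with File.ReadAllText on the selected path). Splitting on runs of non-word characters (\W+) avoids most empty pieces, and the remaining ones are filtered out before lowercasing. EmailWords is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailWords
{
    // Splits on runs of non-word characters, so spaces, \r, \n, ':' and ',' all act
    // as separators; empty pieces (e.g. at the start or end) are dropped.
    public static string[] ToWords(string text) =>
        Regex.Split(text, @"\W+")
             .Where(w => w.Length > 0)
             .Select(w => w.ToLowerInvariant())
             .ToArray();
}
```

One trade-off for a phishing scanner: "09/25/2013" becomes "09", "25", "2013" and "wellsfargo.com" becomes "wellsfargo", "com", because '/' and '.' are non-word characters too.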
using System;
using System.Text.RegularExpressions;

public class Example
{
    static string CleanInput(string strIn)
    {
        // Replace invalid characters with empty strings.
        try {
            return Regex.Replace(strIn, @"[^\w\.@-]", "",
                                 RegexOptions.None, TimeSpan.FromSeconds(1.5));
        }
        // If we timeout when replacing invalid characters,
        // we should return Empty.
        catch (RegexMatchTimeoutException) {
            return String.Empty;
        }
    }
}
Using line.Split(null) will split on white-space. From the C# String.Split method documentation:
If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.
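Splitting on white-space alone still leaves punctuation such as the ':' in "date:", and a plain line.Split(null) also keeps empty entries. A minimal sketch that combines white-space splitting with StringSplitOptions.RemoveEmptyEntries and then trims leading/trailing non-alphanumeric characters. It passes an empty separator array, which the documentation quoted above treats the same as null. WhitespaceWords is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class WhitespaceWords
{
    // An empty separator array means "split on Unicode white-space", like Split(null).
    public static string[] ToWords(string text) =>
        text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
            .Select(w => TrimNonAlphanumeric(w).ToLowerInvariant())
            .Where(w => w.Length > 0)
            .ToArray();

    // Strips leading/trailing characters that are not letters or digits ("date:" -> "date").
    private static string TrimNonAlphanumeric(string w)
    {
        int start = 0, end = w.Length;
        while (start < end && !char.IsLetterOrDigit(w[start])) start++;
        while (end > start && !char.IsLetterOrDigit(w[end - 1])) end--;
        return w.Substring(start, end - start);
    }
}
```

Unlike the \W split, this keeps interior punctuation, so dates like "09/25/2013" and domains like "wellsfargo.com" stay whole, which may be useful when scanning for suspicious links.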

Proximity Search example Lucene.Net

I want to do a proximity search with Lucene.Net. I saw this question, where it looks like that was the answer for him, but no code was supplied. The Java documentation says to use the ~ character with the number of words in between, but I don't see where this character would go in the code. Can anyone give me an example of a proximity search using Lucene.Net?
Edit:
What I have so far:
IndexSearcher searcher = new IndexSearcher(this.Directory, true);
string[] fieldList = new string[] { "Name", "Description" };
List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
foreach (string field in fieldList)
{
    occurs.Add(BooleanClause.Occur.SHOULD);
}
Query searchQuery = MultiFieldQueryParser.Parse(this.LuceneVersion, query, fieldList, occurs.ToArray(), this.Analyzer);
If I add "~" with any number to the query I pass to MultiFieldQueryParser, it errors out saying that for a fuzzy search the value should be between 0.0 and 1.0, but I want a proximity search with 3 words of separation, e.g. "my search"~3.
The tilde means either a fuzzy search if you apply it on a single term, or a proximity search if you apply it on a phrase. The error you're receiving sounds like you're applying it on a single term (term~10) instead of using a phrase ("term term"~10).
To do a proximity search use the tilde, "~", symbol at the end of a Phrase.
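Since the tilde only means "proximity" after a quoted phrase, one way to avoid the fuzzy-search error is to build the query string so the words are always wrapped in quotes before the slop is appended. A minimal sketch; ProximityQuery is a hypothetical helper, and its output is meant to be the `query` argument passed to MultiFieldQueryParser.Parse in the question's code (not tested here against a specific Lucene.Net version).

```csharp
using System;

public static class ProximityQuery
{
    // Builds Lucene query syntax for a proximity search: the words form a quoted phrase,
    // and "~N" after the closing quote is the slop. Without the quotes, "term~N" would
    // be parsed as a fuzzy query instead.
    public static string Build(string phrase, int slop)
    {
        if (slop < 0) throw new ArgumentOutOfRangeException(nameof(slop));
        // Escape backslashes and quotes so they cannot terminate the phrase early.
        string escaped = phrase.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "\"" + escaped + "\"~" + slop;
    }
}
```

For example, ProximityQuery.Build("my search", 3) yields the text "my search"~3, quotes included.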
The only differences between Lucene.NET and classic Java Lucene of the same version should be internal, not external: the goal is a highly compatible project, especially on the input (queries) and output (index files) side. So it should work however it works for Java Lucene. If it doesn't, that is a bug.
