Trouble with Regular Expression and Ampersand - c#

I'm having a bit of trouble with regex's (C#, ASP.NET), and I'm pretty sure I'm doing something fundamentally wrong. My task is to bind a dynamically created gridview to a datasource, and then iterate through a column in the grid, looking for the string "A&I". An example of what the data in the cell (in template column) looks like is:
Name: John Doe
Phone: 555-123-1234
Email: john.doe@url.com
Dept: DHS-A&I-MRB
Here's the code I'm using to find the string value:
foreach(GridViewRow gvrow in gv.Rows)
{
Match m = Regex.Match(gvrow.Cells[6].Text,"A&I");
if(m.Success)
{
gvrow.ForeColor = System.Drawing.Color.Red;
}
}
I'm not having any luck with any of these variations:
"A&I"
"[A][&][I]"
But when I strictly use "&", the row does turn red. Any suggestions?
Thanks, Dan

The regex looks fine to me. I suspect the text is HTML-encoded on the input, like:
A&amp;I
You could also do gvrow.Cells[6].Text.Contains("A&I") instead of regex. Or gvrow.Cells[6].Text.Contains("A&amp;I") if I'm right about the encoding issue.
string.Contains is also faster than Regex.
You could also call HttpUtility.HtmlDecode on the text before checking for the occurrence of A&I.
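The decode-then-search idea can be sketched as below. This is a minimal sketch, assuming the cell text really does arrive HTML-encoded. It uses System.Net.WebUtility.HtmlDecode instead of HttpUtility.HtmlDecode so that it also runs outside ASP.NET, and the class and method names (CellTextSearch, ContainsDecoded) are hypothetical.

```csharp
using System;
using System.Net;

public static class CellTextSearch
{
    // Hypothetical helper: decode HTML entities first, then do a plain substring search.
    public static bool ContainsDecoded(string cellText, string needle)
    {
        if (cellText == null) return false;
        string decoded = WebUtility.HtmlDecode(cellText);
        return decoded.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    public static void Main()
    {
        // Assumption: the bound cell holds the encoded form of the ampersand.
        Console.WriteLine(ContainsDecoded("Dept: DHS-A&amp;I-MRB", "A&I")); // True
        Console.WriteLine(ContainsDecoded("Dept: DHS-MRB", "A&I"));         // False
    }
}
```

In the original loop this would replace the Regex.Match call, for example if (CellTextSearch.ContainsDecoded(gvrow.Cells[6].Text, "A&I")) { ... }.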

Both of these match successfully:
Match m = Regex.Match("DHS-A&I-MRB", "A&I");
Match m0 = Regex.Match("DHS-A&I-MRB", @"A\&I");
Debug.WriteLine("m.Success = " + m.Success.ToString());
Debug.WriteLine("m0.Success = " + m0.Success.ToString());
Output:
m.Success = True
m0.Success = True
Perhaps the problem is elsewhere (possibly the wrong Cells index)?

Related

Look for words in a textbox text and display in data grid

I need to program a Windows Form in C#. It needs a TextBox and a button. In the TextBox I have to type a programming instruction, for example:
for(i=0;i<10;i++)
then click a button and in the datagrid it should be displayed something like this:
for - cycle
( - agrupation
i - variable
= - asignation
and so on
how can I identify the parts of the text?
I've tried foreach char but I'm really messed up :( help please
Here is a solution that I cobbled together and that you can use. I highly recommend you familiarise yourself with regular expressions:
https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
and here is a nice tester that I used:
http://regexstorm.net/tester
using System.Text.RegularExpressions;
string input = "for(i=0;i<10;i++)";
string pattern = @"^(\w+)(\W)(\w)(\W).*$";
MatchCollection matches = Regex.Matches(input, pattern);
string cycle = matches[0].Groups[1].Value;
string agrupation = matches[0].Groups[2].Value;
string variable = matches[0].Groups[3].Value;
string asignation = matches[0].Groups[4].Value;
string test = string.Format("cycle: {0}, agrupation: {1}, variable={2}, asignation: {3}", cycle, agrupation, variable, asignation);
Console.WriteLine(test);
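The pattern above only captures the first four parts. To label every part of the instruction, one option is to match tokens one at a time and look each up in a table. This is a rough sketch, not a real lexer: the class name, the label table and the extra labels ("separator", "comparison", "increment", "number", "symbol") are assumptions added for illustration, and the resulting pairs would still have to be added as rows to the grid.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InstructionTokenizer
{
    // Words/numbers first, then two-character operators, then any other single symbol.
    private static readonly Regex TokenPattern =
        new Regex(@"\w+|\+\+|--|[<>=!]=|[^\w\s]");

    // Hypothetical lookup table; extend it with whatever labels you need.
    private static readonly Dictionary<string, string> Labels =
        new Dictionary<string, string>
        {
            { "for", "cycle" },
            { "(", "agrupation" },
            { ")", "agrupation" },
            { "=", "asignation" },
            { ";", "separator" },
            { "<", "comparison" },
            { "++", "increment" }
        };

    public static List<KeyValuePair<string, string>> Tokenize(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            string label;
            if (!Labels.TryGetValue(m.Value, out label))
            {
                // Not in the table: classify by the first character.
                char c = m.Value[0];
                label = char.IsDigit(c) ? "number"
                      : (char.IsLetter(c) || c == '_') ? "variable"
                      : "symbol";
            }
            result.Add(new KeyValuePair<string, string>(m.Value, label));
        }
        return result;
    }

    public static void Main()
    {
        // Prints one "token - label" line per token, starting with "for - cycle".
        foreach (var token in Tokenize("for(i=0;i<10;i++)"))
        {
            Console.WriteLine(token.Key + " - " + token.Value);
        }
    }
}
```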

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
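You could even specify the maximum number of substrings using an overload of Split. A count of 2 is enough, since only the part before the first hyphen is needed (a count of 1 would return the whole string unsplit):
stringin.Split(new[] { '-' }, 2)[0].Trim();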
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
Which gives an index out of range exception (There is no hyphen to find).
Make a simple change to your regex and it works with or without - ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^
It seems that you want the substring that follows the dash ('-') if the original string contains one, or the original string if it doesn't.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you, it makes the solution much more complex than it can be.
For a string that I am sure will have no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings although you only want the first one. In that case the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, and you can clean the string up from there.
This is also a great place to start writing unit tests as well. They are very good for stuff like this.
Here's what the code could look like :
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
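Wrapped in a method, the IndexOf approach is easy to check against the cases from the question. The following is a minimal sketch; the names HyphenTrimmer and BeforeHyphen are hypothetical, and trimming the result is an assumption based on the expected output "test" for "test - 9894".

```csharp
using System;

public static class HyphenTrimmer
{
    // Returns the text before the first hyphen, trimmed;
    // returns the whole (trimmed) input when there is no hyphen.
    public static string BeforeHyphen(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        int hyphenIndex = input.IndexOf('-');
        string head = hyphenIndex > -1 ? input.Substring(0, hyphenIndex) : input;
        return head.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(BeforeHyphen("hsh4a - 8989")); // hsh4a
        Console.WriteLine(BeforeHyphen("test"));         // test
        Console.WriteLine(BeforeHyphen("-8989").Length); // 0
    }
}
```

The three calls in Main are the kind of cases a unit test would cover: a hyphen in the middle, no hyphen at all, and a leading hyphen.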

Get data from file and split into an array

I have information formatted on a webpage which looks like the following:
Key=submission_id, Value=300348811884547965
Key=formID, Value=50514289063151
Key=ip, Value=xxxxx
Key=editimage, Value=Yes
Key=openimage5, Value=Yes
Key=copyimage, Value=Yes
How would I go about getting the value of each line? I was thinking of doing some sort of next() while getting all the data after the second equals sign on each line, but I am unsure how to do it in C#. I am sure there is a better solution than what I have in mind. Please let me know your thoughts.
A regex works nicely for parsing data structured in this way.
Regex splitter = new Regex(@"Key=([\w]+), Value=([\w]+)");
string path = "TextFile1.txt";
string[] lines = System.IO.File.ReadAllLines(path);
lines.ToList().ForEach((s) =>
{
Match match = splitter.Match(s);
if (match.Success)
{
Console.WriteLine("The Key is " + match.Groups[1] + " and the value is " + match.Groups[2]);
}
});

How to check if a position in a string is empty in c#

I have strings with space-separated values, and I would like to pick out the text from one index to another and save it in a variable. The strings are as follows:
John Doe Villa Grazia 323334I
I managed to store the id card (3rd column) by using:
if (line.length > 39)
{
idCard = line.Substring(39, 46);
}
However, if I store the name and address (1st and 2nd columns) with Substring, there will be empty spaces, since they are not of the same length (unlike the id cards). How can I store these 2 values and remove the unnecessary spaces while keeping the spaces between name and surname?
Try this:
string line = " John Doe Villa Grazia 323334I";
string name = line.Substring(02, 16).Trim();
string address = line.Substring(18, 23).Trim();
string id = line.Substring(41, 07).Trim();
var values = line.Split(' ');
string name = values[0] + " " + values[1];
string idCard = values[4];
It will be impossible to do without database lookups on names if there aren't spaces for sure in the previous columns.
Are these actually space separated or are they really fix width columns?
By that I mean: do the "columns" start at the same index into the string in each case? From the way you're describing the data, it sounds like the latter, i.e. the ID column always starts at column 39 and runs for 7 characters.
In that case you need to pull the columns using the appropriate Substring calls, as you're already doing, and then use "string ".Trim() to cut off the spaces.
If the rows are, as it seems, fixed width, then you don't want to use Split at all.
How can you even get the ID like that, when everything in front of it is of variable length? If that was used for my name, "David Hedlund 323334I", the ID would start at pos 14, not 39.
Try this more dynamic approach:
var name = str.Substring(0, str.LastIndexOf(" "));
var id = str.Substring(str.LastIndexOf(" ")+1);
Looks like your parsing strategy will cause you a lot of trouble. You shouldn't count on the string's size in order to parse it.
Why not save the data in CSV format (John Doe, Villa Grazia, 323334I)?
that way, you can assume that each "column" will be separated by a comma which will make your parsing efforts easier.
Possible "DOH!" question, but are you sure they are spaces and not tabs? Looks like it "could" be a tab-separated file.
Also, for brownie points, you should use String.Empty instead of ' ' for comparisons; it's apparently more localisation- and memory-friendly.
The first approach would be - as already mentioned - a CSV-like structure with a defined token as the field separator.
The second one would be fixed field lengths so you know the first column goes from char 1 to char 20, the second column from char 21 to char 30, and so on.
There is nothing bad about this concept besides that the human readability may be poor if the columns are filled up to their maximum so no spaces remain between them.
You could write a helper function or class which knows about the field lengths and provides an index-based, fault-tolerant access to the particular column. This function would extract the particular string parts and remove the leading and trailing spaces but leave the spaces in between as they are.
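Such a helper might look like the sketch below. It is only an illustration: the class name FixedWidthLine, its members, and the column layout (name at 0, address at 16, id card at 39 for 7 characters) are assumptions, since only the id card position is stated in the question.

```csharp
using System;

public sealed class FixedWidthLine
{
    private readonly int[] _starts;
    private readonly int[] _lengths;

    // starts[i] and lengths[i] describe column i (zero-based start index, width in characters).
    public FixedWidthLine(int[] starts, int[] lengths)
    {
        if (starts.Length != lengths.Length)
            throw new ArgumentException("starts and lengths must have the same size");
        _starts = starts;
        _lengths = lengths;
    }

    // Returns the trimmed column text, or an empty string when the line is too short.
    // Spaces inside the value (e.g. between name and surname) are left untouched.
    public string GetField(string line, int column)
    {
        int start = _starts[column];
        if (line == null || start >= line.Length) return string.Empty;
        int length = Math.Min(_lengths[column], line.Length - start);
        return line.Substring(start, length).Trim();
    }

    public static void Main()
    {
        // Hypothetical layout: name 0-15, address 16-38, id card 39-45.
        var layout = new FixedWidthLine(new[] { 0, 16, 39 }, new[] { 16, 23, 7 });
        string line = "John Doe".PadRight(16) + "Villa Grazia".PadRight(23) + "323334I";

        Console.WriteLine(layout.GetField(line, 0)); // John Doe
        Console.WriteLine(layout.GetField(line, 1)); // Villa Grazia
        Console.WriteLine(layout.GetField(line, 2)); // 323334I
    }
}
```

Clamping the length to what is left of the line is what makes the access fault-tolerant: a short line yields a shorter or empty field instead of an ArgumentOutOfRangeException.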
If your values have fixed width, best not split it but use the right indexes in your array.
const string input = "John Doe Villa Grazia 323334I";
var name = input.Substring(0, 15).TrimEnd();
var place = input.Substring(16, 23).TrimEnd(); // Substring takes (startIndex, length): columns 16 to 38
var cardId = input.Substring(39).TrimEnd();
Assuming your values cannot contain two sequential spaces, we can maybe use "  " (a double space) as a separator.
The following code will split your string on the double space:
const string input = "John Doe  Villa Grazia  323334I";
var entries = input.Split(new[]{"  "}, StringSplitOptions.RemoveEmptyEntries)
.Select(s=>s.Trim()).ToArray();
string name = entries[0];
string place = entries[1];
string idCard = entries[2];

Highlight a list of words using a regular expression in c#

I have some site content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression which will allow me to replace all of the recognised abbreviations found in the content with some markup.
For example:
content: This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.
abbreviations: memb = Member; deb = Debut;
result: This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up.
[a title="Debut"]Deb[/a] of course should also be caught here.
(This is just example markup for simplicity).
Thanks.
EDIT:
CraigD's answer is nearly there, but there are issues. I only want to match whole words. I also want to keep the correct capitalisation of each word replaced, so that deb is still deb, and Deb is still Deb as per the original text. For example, this input:
This is just a little test of the memb.
And another memb, but not amemba.
Deb of course should also be caught here.deb!
First you would need to Regex.Escape() all the input strings.
Then you can look for them in the string, and iteratively replace them by the markup you have in mind:
string abbr = "memb";
string word = "Member";
string pattern = String.Format(@"\b{0}\b", Regex.Escape(abbr)); // verbatim string, so \b is a word boundary and not a backspace
string substitue = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
string output = Regex.Replace(input, pattern, substitue);
EDIT: I asked if a simple String.Replace() wouldn't be enough - but I can see why regex is desirable: you can use it to enforce "whole word" replacements only by making a pattern that uses word boundary anchors.
You can go as far as building a single pattern from all your escaped input strings, like this:
\b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b
and then using a match evaluator to find the right replacement. This way you can avoid iterating the input string more than once.
Not sure how well this will scale to a big word list, but I think it should give the output you want (although in your question the 'result' seems identical to 'content')?
Anyway, let me know if this is what you're after
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var input = @"This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.";
var dictionary = new Dictionary<string,string>
{
{"memb", "Member"}
,{"deb","Debut"}
};
var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
foreach (Match metamatch in Regex.Matches(input
, regex /*@"(memb)|(deb)"*/
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
{
input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
}
Console.Write (input);
Console.ReadLine();
}
}
}
For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving correct casing for matched strings.
public partial class Abbreviations : System.Web.UI.UserControl
{
private Dictionary<String, String> dictionary = DataHelper.GetAbbreviations();
protected void Page_Load(object sender, EventArgs e)
{
string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
litContent.Text = input;
}
private string GetExplanationMarkup(Match m)
{
return string.Format("<b title='{0}'>{1}</b>", dictionary[m.Value.ToLower()], m.Value);
}
}
The output looks like this (below). Note that it only matches full words, and that the casing is preserved from the original string:
This is just a little test of the <b title='Member'>memb</b>. And another <b title='Member'>memb</b>, but not amemba to see if it gets picked up. <b title='Debut'>Deb</b> of course should also be caught here.<b title='Debut'>deb</b>!
I doubt it will perform better than a plain string.Replace, so if performance is critical, measure it (perhaps after refactoring a bit to use a compiled regex). You can do the regex version as:
var abbrsWithPipes = "(abbr1|abbr2)";
var regex = new Regex(abbrsWithPipes);
return regex.Replace(html, m => GetReplaceForAbbr(m.Value));
You need to implement GetReplaceForAbbr, which receives the specific abbr being matched.
I'm doing pretty much exactly what you're looking for in my application, and this works for me:
the parameter str is your content:
public static string GetGlossaryString(string str)
{
List<string> glossaryWords = GetGlossaryItems();//this collection would contain your abbreviations; you could just make it a Dictionary so you can have the abbreviation-full term pairs and use them in the loop below
str = string.Format(" {0} ", str);//quick and dirty way to also search the first and last word in the content.
foreach (string word in glossaryWords)
str = Regex.Replace(str, "([\\W])(" + word + ")([\\W])", "$1<span class='glossaryItem'>$2</span>$3", RegexOptions.IgnoreCase);
return str.Trim();
}
