Multiple regex matches in single expression

I have the following sensitive data:
"Password":"123","RootPassword":"123qwe","PassPhrase":"phrase"
I would like to get the following safe data:
"Password":"***","RootPassword":"***","PassPhrase":"***"
It's my code:
internal class Program
{
private static void Main(string[] args)
{
var data = "\"Password\":\"123\",\"RootPassword\":\"123qwe\",\"PassPhrase\":\"phrase\"";
var safe1 = PasswordReplacer.Replace1(data);
var safe2 = PasswordReplacer.Replace2(data);
}
}
public static class PasswordReplacer
{
private const string RegExpReplacement = "$1***$2";
private const string Template = "(\"{0}\":\").*?(\")";
private static readonly string[] PasswordLiterals =
{
"password",
"RootPassword",
"PassPhrase"
};
public static string Replace1(string sensitiveInfo)
{
foreach (var literal in PasswordLiterals)
{
var pattern = string.Format(Template, literal);
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
sensitiveInfo = regex.Replace(sensitiveInfo, RegExpReplacement);
}
return sensitiveInfo;
}
public static string Replace2(string sensitiveInfo)
{
var multiplePattern = "(\"password\":\")|(\"RootPassword\":\")|(\"PassPhrase\":\").*?(\")"; //?
var regex = new Regex(string.Format(Template, multiplePattern), RegexOptions.IgnoreCase);
return regex.Replace(sensitiveInfo, RegExpReplacement);
}
}
The Replace1 method works as expected, but it replaces the values one literal at a time. My question is: is it possible to do the same using a single regex match? If so, I need help with Replace2.

The Replace2 can look like
public static string Replace2(string sensitiveInfo)
{
var multiplePattern = $"(\"(?:{string.Join("|", PasswordLiterals)})\":\")[^\"]*(\")";
return Regex.Replace(sensitiveInfo, multiplePattern, RegExpReplacement, RegexOptions.IgnoreCase);
}
See the C# demo.
The multiplePattern will hold a pattern like ("(?:password|RootPassword|PassPhrase)":")[^"]*("), see the regex demo. Quick details:
("(?:password|RootPassword|PassPhrase)":") - Group 1 ($1): a " char followed by either password, RootPassword or PassPhrase and then a ":" substring
[^"]* - any zero or more chars other than " as many as possible
(") - Group 2 ($2): a " char.
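The pattern breakdown above can be tried end to end with a small console program. This is only a sketch: the class name PasswordMasker and the method name Mask are hypothetical, and the Regex.Escape call is an addition (not part of the answer) that guards against key names containing regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PasswordMasker
{
    // Key names to mask; the same literals as in the question.
    private static readonly string[] PasswordLiterals = { "password", "RootPassword", "PassPhrase" };

    // Built once: ("(?:password|RootPassword|PassPhrase)":")[^"]*(")
    // Regex.Escape is an assumption added here so that a literal such as "pass.word"
    // would be matched literally instead of being read as a pattern.
    private static readonly Regex MaskRegex = new Regex(
        "(\"(?:" + string.Join("|", PasswordLiterals.Select(Regex.Escape)) + ")\":\")[^\"]*(\")",
        RegexOptions.IgnoreCase);

    // Keeps group 1 (key and opening quote) and group 2 (closing quote), masks the value.
    public static string Mask(string sensitiveInfo)
    {
        return MaskRegex.Replace(sensitiveInfo, "$1***$2");
    }

    public static void Main()
    {
        var data = "\"Password\":\"123\",\"RootPassword\":\"123qwe\",\"PassPhrase\":\"phrase\"";
        Console.WriteLine(Mask(data));
        // prints "Password":"***","RootPassword":"***","PassPhrase":"***"
    }
}
```

Keeping the Regex in a static field means the pattern is built once rather than on every call, which is the main practical gain over the loop in Replace1.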

Related

C# remove empty url parameters regex

I am trying to remove empty url type parameters from a string using C#. My code sample is here.
public static string test ()
{
string parameters = "one=aa&two=&three=aaa&four=";
string pattern = "&[a-zA-Z][a-zA-Z]*=&";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(parameters, replacement);
return parameters;
}
public static void Main(string[] args)
{
Console.WriteLine(test());
}
I tried the code in rextester
output: one=aa&two=&three=aaa&four=
expected output: one=aa&three=aaa

You absolutely do not need to roll your own Regex for this, try using HttpUtility.ParseQueryString():
public static string RemoveEmptyUrlParameters(string input)
{
var results = HttpUtility.ParseQueryString(input);
Dictionary<string, string> nonEmpty = new Dictionary<string, string>();
foreach(var k in results.AllKeys)
{
if(!string.IsNullOrWhiteSpace(results[k]))
{
nonEmpty.Add(k, results[k]);
}
}
return string.Join("&", nonEmpty.Select(kvp => $"{kvp.Key}={kvp.Value}"));
}
Fiddle here

Regex:
(?:^|&)[a-zA-Z]+=(?=&|$)
This matches the start of the string or an ampersand ((?:^|&)), followed by at least one English letter ([a-zA-Z]+) and an equal sign (=), and then nothing. The "nothing" is enforced by the positive look-ahead ((?=&|$)), which matches the end of the string or the start of a new parameter (introduced by &).
Code:
public static string test ()
{
string parameters = "one=aa&two=&three=aaa&four=";
string pattern = "(?:^|&)[a-zA-Z]+=(?=&|$)";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(parameters, replacement);
return result;
}
public static void Main(string[] args)
{
Console.WriteLine(test());
}
Note that this also returns the correct variable (as pointed out by Joel Anderson)
See it live here at ideone.
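One edge case worth checking with this pattern is an empty parameter at the very start of the string: the match then consumes "one=" but not the ampersand that follows it. The sketch below, which assumes a hypothetical helper named RemoveEmpty, applies the same regex and trims a leading ampersand afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryCleaner
{
    // Same pattern as in the answer: "name=" followed by '&' or end of string.
    private static readonly Regex EmptyParameter = new Regex("(?:^|&)[a-zA-Z]+=(?=&|$)");

    // Hypothetical helper: removes empty parameters, then drops an ampersand
    // left behind when the first parameter was the empty one.
    public static string RemoveEmpty(string parameters)
    {
        return EmptyParameter.Replace(parameters, "").TrimStart('&');
    }

    public static void Main()
    {
        Console.WriteLine(RemoveEmpty("one=aa&two=&three=aaa&four=")); // one=aa&three=aaa
        Console.WriteLine(RemoveEmpty("one=&two=bb"));                 // two=bb
    }
}
```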

The result of the Regex replace is not returned by the function. The function returns the variable "parameters", which is never updated or changed.
string parameters = "one=aa&two=&three=aaa&four=";
...
string result = rgx.Replace(parameters, replacement);
return parameters;
...
Perhaps you meant
return result;

Regex to find a string between two special known characters

I'm trying to parse messages transmitted over TCP for my own network protocol using regex, without success.
My commands start with ! followed by COMMAND_NAME and a list of arguments in the format ARGUMENT_NAME=ARGUMENT_VALUE, each enclosed in <>
for example:
!LOGIN?<USERNAME='user'><PASSWORD='password'>;
my code :
public class CommandParser
{
private Dictionary<string, string> arguments = new Dictionary<string, string>();
public CommandParser(string input)
{
Match commandMatch = Regex.Match(input, @"\!([^)]*)\&");
if (commandMatch.Success)
{
CommandName = commandMatch.Groups[1].Value;
}
// Here we call Regex.Match.
MatchCollection matches = Regex.Matches(input,"(?<!\\S)<([a-z0-9]+)=(\'[a-z0-9]+\')>(?!\\S)",
RegexOptions.IgnoreCase);
//
foreach (Match argumentMatch in matches)
{
arguments.Add(
argumentMatch.Groups[1].Value,
argumentMatch.Groups[2].Value);
}
}
public string CommandName { get; set; }
public Dictionary<string, string> Arguments
{
get { return arguments; }
}
/// <summary>
///
/// </summary>
public int ArgumentCount
{
get { return arguments.Count; }
}
}

To find the command name, finding the first word after the "!" should be enough:
/\!\w*/g
To match the key/value pairs in groups, you could try something like:
(\w+)='([a-zA-Z_]*)'
An example of the above regex can be found here.
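As a rough illustration of how these two patterns could be applied in C#, the sketch below parses the sample message from the question. The class and method names are hypothetical, and the capture group in !(\w+) is an assumption added so that the command name comes back without the leading "!".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommandSketch
{
    // Returns the word after the leading '!', or null when no command is found.
    public static string ParseName(string input)
    {
        Match m = Regex.Match(input, @"!(\w+)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Collects NAME='value' pairs; the value is captured without its quotes.
    public static Dictionary<string, string> ParseArguments(string input)
    {
        var arguments = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, @"(\w+)='([a-zA-Z_]*)'"))
        {
            arguments[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return arguments;
    }

    public static void Main()
    {
        string input = "!LOGIN?<USERNAME='user'><PASSWORD='password'>;";
        Console.WriteLine(ParseName(input)); // LOGIN
        foreach (var pair in ParseArguments(input))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Note that the value class [a-zA-Z_]* accepts only letters and underscores, so a value containing digits would not match; widening it to [^']* is one option if such values are expected.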

You do not need regex here; avoid it unless it is the last option left. You could do this with simple C# logic.
string input = "!LOGIN?<USERNAME='user'><PASSWORD='password'>";
string command = input.Substring(1, input.IndexOf('?') - 1);
Console.WriteLine($"command: {command}");
var parameters = input
.Replace($"!{command}?", string.Empty)
.Replace("<", "")
.Split(">".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
string[] kvpair;
foreach(var kv in parameters) {
kvpair = kv.Split('=');
Console.WriteLine($"pname: {kvpair[0]}, pvalue: {kvpair[1]}");
}
Output:
command: LOGIN
pname: USERNAME, pvalue: 'user'
pname: PASSWORD, pvalue: 'password'

Parse through POST

I use a StreamReader to read context.Request.InputStream to the end and end up with a string looking like
"Gamestart=true&GamePlayer=8&CurrentDay=Monday&..."
What would be the most efficient/"clean" way to parse that in a C# console?

You can use HttpUtility.ParseQueryString
Little sample:
string queryString = "Gamestart=true&GamePlayer=8&CurrentDay=Monday"; //Hardcoded just for example
NameValueCollection qscoll = HttpUtility.ParseQueryString(queryString);
foreach (String k in qscoll.AllKeys)
{
//Prints result in output window.
System.Diagnostics.Debug.WriteLine(k + " = " + qscoll[k]);
}

HttpUtility.ParseQueryString
Parses a query string into a NameValueCollection using UTF8 encoding.
http://msdn.microsoft.com/en-us/library/ms150046.aspx

I know this is a bit of a zombie post but I thought I'd add another answer since HttpUtility adds another assembly reference (System.Web), which may be undesirable to some.
using System.Net;
using System.Text.RegularExpressions;
static readonly Regex HttpQueryDelimiterRegex = new Regex(@"\?", RegexOptions.Compiled);
static readonly Regex HttpQueryParameterDelimiterRegex = new Regex(@"&", RegexOptions.Compiled);
static readonly Regex HttpQueryParameterRegex = new Regex(@"^(?<ParameterName>\S+)=(?<ParameterValue>\S*)$", RegexOptions.Compiled);
static string GetPath(string pathAndQuery)
{
var components = HttpQueryDelimiterRegex.Split(pathAndQuery, 2);
return components[0];
}
static Dictionary<string, string> GetQueryParameters(string pathAndQuery)
{
var parameters = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var components = HttpQueryDelimiterRegex.Split(pathAndQuery, 2);
if (components.Length > 1)
{
var queryParameters = HttpQueryParameterDelimiterRegex.Split(components[1]);
foreach(var queryParameter in queryParameters)
{
var match = HttpQueryParameterRegex.Match(queryParameter);
if (!match.Success) continue;
var parameterName = WebUtility.HtmlDecode(match.Groups["ParameterName"].Value) ?? string.Empty;
var parameterValue = WebUtility.HtmlDecode(match.Groups["ParameterValue"].Value) ?? string.Empty;
parameters[parameterName] = parameterValue;
}
}
return parameters;
}
I wish they would add that same method to WebUtility which is available in System.Net as of .NET 4.0.

Entity Framework 4: Any way to know the plural form of an Entity Type?

Given this code:
private ObjectQuery<E> GetEntity()
{ // Pluralization concern. Table and Type need to be consistently named.
// TODO: Don't get cute with database table names. XXX and XXXs for pluralization
return _dc.CreateQuery<E>("[" + typeof(E).Name + "s]");
}
Is there any way to determine an Entity type's plural name so I can access the table, rather than just adding an 's' to the name?
For example, Medium is singular and Media is plural.

You can also use the PluralizationService provided by EF 4. Here is a blog post that covers the service in good detail.
http://web.archive.org/web/20130521044050/http://www.danrigsby.com/blog/index.php/2009/05/19/entity-framework-40-pluralization

I'm not sure how entity framework does this, but I use the pluralizer from Ruby on Rails. You can find this at http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflector.rb#L106. This is easy enough to implement in C#.
The entire source for a translation to C# is:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
public static class Inflector
{
private static List<KeyValuePair<Regex, string>> _pluralRules = new List<KeyValuePair<Regex, string>>();
private static List<KeyValuePair<Regex, string>> _singularRules = new List<KeyValuePair<Regex, string>>();
private static List<KeyValuePair<string, string>> _irregulars = new List<KeyValuePair<string, string>>();
private static List<string> _uncountables = new List<string>();
static Inflector()
{
_uncountables.Add("equipment");
_uncountables.Add("information");
_uncountables.Add("rice");
_uncountables.Add("money");
_uncountables.Add("species");
_uncountables.Add("series");
_uncountables.Add("fish");
_uncountables.Add("sheep");
AddPlural("$", "s", true);
AddPlural("s$", "s");
AddPlural("(ax|test)is$", "$1es");
AddPlural("(octop|vir)us$", "$1i");
AddPlural("(alias|status)$", "$1es");
AddPlural("(bu)s$", "$1ses");
AddPlural("(buffal|tomat)o$", "$1oes");
AddPlural("([ti])um$", "$1a");
AddPlural("sis$", "ses");
AddPlural("(?:([^f])fe|([lr])f)$", "$1$2ves");
AddPlural("(hive)$", "$1s");
AddPlural("([^aeiouy]|qu)y$", "$1ies");
AddPlural("(x|ch|ss|sh)$", "$1es");
AddPlural("(matr|vert|ind)(?:ix|ex)$", "$1ices");
AddPlural("([m|l])ouse$", "$1ice");
AddPlural("^(ox)$", "$1en");
AddPlural("(quiz)$", "$1zes");
AddSingular("s$", "");
AddSingular("(n)ews$", "$1ews");
AddSingular("([ti])a$", "$1um");
AddSingular("((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$", "$1$2sis");
AddSingular("(^analy)ses$", "$1sis");
AddSingular("([^f])ves$", "$1fe");
AddSingular("(hive)s$", "$1");
AddSingular("(tive)s$", "$1");
AddSingular("([lr])ves$", "$1f");
AddSingular("([^aeiouy]|qu)ies$", "$1y");
AddSingular("(s)eries$", "$1eries");
AddSingular("(m)ovies$", "$1ovie");
AddSingular("(x|ch|ss|sh)es$", "$1");
AddSingular("([m|l])ice$", "$1ouse");
AddSingular("(bus)es$", "$1");
AddSingular("(o)es$", "$1");
AddSingular("(shoe)s$", "$1");
AddSingular("(cris|ax|test)es$", "$1is");
AddSingular("(octop|vir)i$", "$1us");
AddSingular("(alias|status)es$", "$1");
AddSingular("^(ox)en", "$1");
AddSingular("(vert|ind)ices$", "$1ex");
AddSingular("(matr)ices$", "$1ix");
AddSingular("(quiz)zes$", "$1");
AddIrregular("person", "people");
AddIrregular("man", "men");
AddIrregular("child", "children");
AddIrregular("sex", "sexes");
AddIrregular("move", "moves");
AddIrregular("cow", "kine");
}
private static void AddIrregular(string singular, string plural)
{
AddPlural(singular.Substring(0, 1).ToLower() + singular.Substring(1) + "$", plural.Substring(0, 1).ToLower() + plural.Substring(1));
AddPlural(singular.Substring(0, 1).ToUpper() + singular.Substring(1) + "$", plural.Substring(0, 1).ToUpper() + plural.Substring(1));
AddSingular(plural.Substring(0, 1).ToLower() + plural.Substring(1) + "$", singular.Substring(0, 1).ToLower() + singular.Substring(1));
AddSingular(plural.Substring(0, 1).ToUpper() + plural.Substring(1) + "$", singular.Substring(0, 1).ToUpper() + singular.Substring(1));
}
private static void AddPlural(string expression, string replacement)
{
AddPlural(expression, replacement, false);
}
private static void AddPlural(string expression, string replacement, bool caseSensitive)
{
var re = caseSensitive ? new Regex(expression) : new Regex(expression, RegexOptions.IgnoreCase);
_pluralRules.Insert(0, new KeyValuePair<Regex, string>(re, replacement));
}
private static void AddSingular(string expression, string replacement)
{
AddSingular(expression, replacement, false);
}
private static void AddSingular(string expression, string replacement, bool caseSensitive)
{
var re = caseSensitive ? new Regex(expression) : new Regex(expression, RegexOptions.IgnoreCase);
_singularRules.Insert(0, new KeyValuePair<Regex, string>(re, replacement));
}
public static string Pluralize(string value)
{
if (_uncountables.Contains(value))
return value;
foreach (var rule in _pluralRules)
{
if (rule.Key.Match(value).Success)
{
return rule.Key.Replace(value, rule.Value);
}
}
return value;
}
public static string Singularize(string value)
{
if (_uncountables.Contains(value))
return value;
foreach (var rule in _singularRules)
{
if (rule.Key.Match(value).Success)
{
return rule.Key.Replace(value, rule.Value);
}
}
return value;
}
public static string Camelize(string value, bool firstLetterUppercase = true)
{
if (firstLetterUppercase)
{
return
Regex.Replace(
Regex.Replace(value, "/(.?)", p => "::" + p.Groups[1].Value.ToUpperInvariant()),
"(?:^|_)(.)", p => p.Groups[1].Value.ToUpperInvariant()
);
}
else
{
return
value.Substring(0, 1).ToLowerInvariant() +
Camelize(value.Substring(1));
}
}
public static string Underscore(string value)
{
value = value.Replace("::", "/");
value = Regex.Replace(value, "([A-Z]+)([A-Z][a-z])", p => p.Groups[1].Value + "_" + p.Groups[2].Value);
value = Regex.Replace(value, "([a-z\\d])([A-Z])", p => p.Groups[1].Value + "_" + p.Groups[2].Value);
value = value.Replace("-", "_");
return value.ToLowerInvariant();
}
}

Have you tried
YourEntityObject.EntityKey.EntitySetName
Assuming your table names are plural.
If the generic method you have takes an entity (that inherits from EntityObject), then you can access the EntityKey from it.
private ObjectQuery<E> GetEntity(E e)
{ // Pluralization concern. Table and Type need to be consistently named.
// TODO: Don't get cute with database table names. XXX and XXXs for pluralization
return _dc.CreateQuery<E>("[" + e.EntityKey.EntitySetName + "]");
}

Formatting Twitter text (TweetText) with C#

Is there a better way to format text from Twitter to link the hyperlinks, username and hashtags? What I have is working but I know this could be done better. I am interested in alternative techniques. I am setting this up as a HTML Helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = @"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("{0}", url));
}
}
}
if (result.Contains("@"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("@" + screenName,
String.Format("@{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}

That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do are
1) looking up @name and replacing it with Real Name;
2) multiple @name's in a row get commas, if they don't have them;
3) Tweets that start with @name(s) are formatted "To @name:".
I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.
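To illustrate why a tweet's consistent format suits regex, here is a minimal sketch that only extracts mentions and hashtags and does not build any markup. The class name, method names, and both patterns are assumptions made for this example, not the parser described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TweetEntities
{
    // (?<!\S) requires start of text or whitespace before the marker, so the
    // "#anchor" fragment at the end of a URL is not picked up as a hashtag.
    private static readonly Regex MentionPattern = new Regex(@"(?<!\S)@(\w+)");
    private static readonly Regex HashTagPattern = new Regex(@"(?<!\S)#(\w+)");

    public static List<string> Mentions(string tweet)
    {
        return Collect(MentionPattern, tweet);
    }

    public static List<string> HashTags(string tweet)
    {
        return Collect(HashTagPattern, tweet);
    }

    // Returns the first capture group of every match, in order of appearance.
    private static List<string> Collect(Regex pattern, string tweet)
    {
        var found = new List<string>();
        foreach (Match m in pattern.Matches(tweet))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }

    public static void Main()
    {
        string tweet = "@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks";
        Console.WriteLine(string.Join(",", Mentions(tweet))); // user1,user2,user3
        Console.WriteLine(string.Join(",", HashTags(tweet))); // coollinks
    }
}
```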

I created a helper method to shorten text to 140 chars with the URL included. You can pass an empty share URL to exclude the URL from the tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, @"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count + 1;
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}

There is a good resource for parsing Twitter messages at this link; it worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
Urls
#hashtags
@usernames
BTW: Regex in the ParseURL() method needs reviewing, it parses stock symbols (BARC.L) into links.
