Parsing Line Breaks from Plain Text - c#

I have a process that parses emails. The software that we're using to retrieve and store the contents of the body doesn't seem to include line breaks, so I end up with something like this:
Good afternoon, [line-break] this is my email. [line-break] Info: data [line-break] More info: data
The [line-break] markers show where the line breaks should be. However, when we extract the body, we get just the text, which makes it tough to parse without the line breaks.
Essentially, what I need to do is parse each [Info]: [Data] pair. I can find where the [Info] tags begin, but without line breaks I'm struggling to know where the data associated with that info should end. The email is coming from Windows.
Is there any way to take plain text and encode it in some way that would include line breaks?
Example Email Contents
Good Morning, Order: 1234 The Total: $445 When: 7/10 Type: Dry
Good Morning, Order: 1235 The Total: $1743 Type: Frozen When: 7/22
Order: 1236 The Total: $950.14 Type: DRY When: 7/10
The Total: $514 Order: 1237 Type: Dry CSR: Tim W
Sorry, below is your order: Order: 1236 The Total: $500 When: 7/10 Type: Dry Creator: Josh A. Thank you
Now, I need to loop through the email and parse out the values for Order, Total, and Type. The other placeholder: values are irrelevant and random.

Try something like this.
You need to add all possible section identifiers to the pattern. The list can be updated over time with more known identifiers, which reduces the chance of mistakes when parsing the strings.
As of now, if the value marked by a known identifier contains an unknown identifier when the string is parsed, that part is removed.
If an unknown identifier is encountered, it's ignored.
Regex.Matches extracts all matching parts and returns each one's Value, Index and Length, so it's simple to use [Input].Substring(Index + Length, NextIndex - (Index + Length)) to return the value that follows the requested part, up to the start of the next identifier.
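A minimal standalone sketch of that slicing step, outside the EmailParser class. IdentifierSlicer and Slice are hypothetical names; unlike the class below, this sketch skips the CleanUpValue step and simply keeps the first occurrence of a repeated identifier:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdentifierSlicer
{
    // Hypothetical helper: maps each identifier to the text that follows it,
    // up to the start of the next identifier match (or the end of the input).
    public static Dictionary<string, string> Slice(string input, string pattern)
    {
        var result = new Dictionary<string, string>();
        MatchCollection matches = Regex.Matches(input, pattern);
        for (int i = 0; i < matches.Count; i++)
        {
            Match current = matches[i];
            // The value starts right after "Key:"
            int start = current.Index + current.Length;
            // ...and ends where the next known identifier begins
            int end = i + 1 < matches.Count ? matches[i + 1].Index : input.Length;
            string value = input.Substring(start, end - start).Trim();
            // Keep the first occurrence if an identifier appears twice
            if (!result.ContainsKey(current.Value))
            {
                result.Add(current.Value, value);
            }
        }
        return result;
    }
}
```

For "Good Morning, Order: 1234 The Total: $445 Type: Dry When: 7/10" and the pattern "Order:|The Total:|Type:|When:", this yields "1234", "$445", "Dry" and "7/10" for the four keys.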
The EmailParser class GetPartValue(string) returns the content of an identifier by its name (the name can include the colon char or not, e.g. "Order" or "Order:").
The Matches property returns a Dictionary<string, string> of all matched identifiers and their content. The content is cleaned up, as far as possible, by calling the CleanUpValue() method.
Adjust this method to deal with some specific/future requirements.
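For illustration, here is the clean-up rule in isolation, as a hedged sketch. ValueCleaner is a hypothetical name, and it adds an assumed guard for a value that starts directly with an unknown identifier (there, LastIndexOf returns -1 and Substring would throw). Like the original, it assumes the unknown identifier is a single word, and a value that legitimately contains a colon (a time such as 10:30) would be cut:

```csharp
using System;

public static class ValueCleaner
{
    // Removes a trailing unknown "Identifier: value" from a parsed value.
    // Assumption: a value that begins with the unknown key has no usable content.
    public static string CleanUpValue(string value)
    {
        int colon = value.IndexOf(':');
        if (colon < 0) return value.Trim();              // no unknown identifier inside
        int lastSpace = value.LastIndexOf(' ', colon);   // space just before the unknown key
        if (lastSpace < 0) return string.Empty;          // the whole value is the unknown key
        return value.Substring(0, lastSpace).Trim();
    }
}
```

With the sample input from the answer, "$445 Unknown: some value" is reduced to "$445".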
► If you don't pass a Pattern string, a default one is used.
► If you change the Pattern, setting the CurrentPatter property (perhaps using one stored in the app settings or edited in a GUI or whatever else), the Dictionary of matched values is rebuilt.
Initialize with:
string input = "Good Morning, Order: 1234 The Total: $445 Unknown: some value Type: Dry When: 7/10";
var parser = new EmailParser(input);
string value = parser.GetPartValue("The Total");
var values = parser.Matches;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class EmailParser
{
    // Default identifiers. The pattern is an instance field (not static), so a custom
    // pattern passed to one parser does not leak into parsers created later.
    private const string DefaultPattern = "Order:|The Total:|Type:|Creator:|When:|CSR:";
    private string m_Pattern = DefaultPattern;
    public EmailParser(string email) : this(email, null) { }
    public EmailParser(string email, string pattern)
    {
        if (!string.IsNullOrEmpty(pattern)) {
            m_Pattern = pattern;
        }
        Email = email;
        this.Matches = GetMatches();
    }
    public string Email { get; }
    public Dictionary<string, string> Matches { get; private set; }
    public string CurrentPatter {
        get => m_Pattern;
        set {
            if (value != m_Pattern) {
                m_Pattern = value;
                this.Matches = GetMatches();   // rebuild the matched values
            }
        }
    }
    public string GetPartValue(string part)
    {
        if (string.IsNullOrEmpty(part)) throw new ArgumentException("Part name required", nameof(part));
        if (part[part.Length - 1] != ':') part += ':';
        if (!Matches.TryGetValue(part, out string value)) {
            throw new ArgumentException("Part not included", nameof(part));
        }
        return value;
    }
    private Dictionary<string, string> GetMatches()
    {
        var dict = new Dictionary<string, string>();
        var matches = Regex.Matches(Email, m_Pattern, RegexOptions.Singleline);
        foreach (Match m in matches) {
            // The value runs from the end of this identifier to the start of the next one
            int startPosition = m.Index + m.Length;
            var next = m.NextMatch();
            string parsed = next.Success
                ? Email.Substring(startPosition, next.Index - startPosition).Trim()
                : Email.Substring(startPosition).Trim();
            // Keep the first occurrence if an identifier is repeated (Add would throw)
            if (!dict.ContainsKey(m.Value)) {
                dict.Add(m.Value, CleanUpValue(parsed));
            }
        }
        return dict;
    }
    // Drops a trailing unknown "Identifier: value" from a parsed value
    private string CleanUpValue(string value)
    {
        int pos = value.IndexOf(':');
        if (pos < 0) return value;
        int space = value.LastIndexOf(' ', pos);
        return space < 0 ? string.Empty : value.Substring(0, space).Trim();
    }
}

Related

Replacing first 16 digits in a string with Regex.Replace

I'm trying to replace only the first 16 digits of a string with Regex. I want it replaced with "*". I need to take this string:
"Request=Credit Card.Auth Only&Version=4022&HD.Network_Status_Byte=*&HD.Application_ID=TZAHSK!&HD.Terminal_ID=12991kakajsjas&HD.Device_Tag=000123&07.POS_Entry_Capability=1&07.PIN_Entry_Capability=0&07.CAT_Indicator=0&07.Terminal_Type=4&07.Account_Entry_Mode=1&07.Partial_Auth_Indicator=0&07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024&07.Transaction_Amount=142931&07.Association_Token_Indicator=0&17.CVV=200&17.Street_Address=123 Road SW&17.Postal_Zip_Code=90210&17.Invoice_Number=INV19291"
I need to replace the credit card number with asterisks, which is why I say the first 16 digits, as that is how many digits are in a credit card. I first split the string wherever there is a "." and then check whether each part contains "card" and "number". If it does, I want to replace the first 16 digits with "*".
This is what I've done:
public void MaskData(string input)
{
    if (input.Contains("."))
    {
        string[] userInput = input.Split('.');
        foreach (string uInput in userInput)
        {
            string lowerCaseInput = uInput.ToLower();
            string containsCard = "card";
            string containsNumber = "number";
            if (lowerCaseInput.Contains(containsCard) && lowerCaseInput.Contains(containsNumber))
            {
                tbStoreInput.Text += Regex.Replace(lowerCaseInput, @"[0-9]", "*") + Environment.NewLine;
            }
            else
            {
                tbStoreInput.Text += lowerCaseInput + Environment.NewLine;
            }
        }
    }
}
I am aware that the regex is wrong, but I'm not sure how to match only the first 16 digits. Right now it puts an asterisk in place of every digit in the line, as seen here:
"account_card_number=****************&**"
I don't want it to show the asterisks after the "&".
Same answer as in the comments, but explained.
Your regex pattern "[0-9]" matches a single digit, so each individual digit, including the digits after the &, is a match and gets replaced.
What you want to do is add a quantifier that restricts the match to a specific number of characters, i.e. 16, so your regex becomes "[0-9]{16}" and only those characters are affected by the replace operation. Because the whole 16-digit run is now one match, the replacement string must also be 16 asterisks ("****************") rather than a single "*".
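A minimal sketch of the quantifier fix applied to one segment, as in the question's loop. CardMasker and MaskFirstCardNumber are hypothetical names. It uses the Regex instance method Replace(input, replacement, count) with a count of 1, so only the first 16-digit run is masked; note that [0-9]{16} would also match the first 16 digits of a longer digit run:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardMasker
{
    // Exactly sixteen consecutive digits; the {16} quantifier stops the match
    // before the "&", so the digits after it are left alone.
    private static readonly Regex SixteenDigits = new Regex(@"[0-9]{16}");

    public static string MaskFirstCardNumber(string segment)
    {
        // count = 1: replace only the first match, with 16 asterisks
        return SixteenDigits.Replace(segment, new string('*', 16), 1);
    }
}
```

For example, "account_card_number=4242424242424242&07" becomes "account_card_number=****************&07".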
Disclaimer
My answer is purposely broader than what is asked by OP but I saw it as an opportunity to raise awareness of other tools that are available in C# (which are objects).
String replacement
Regex is not the only tool available to replace one simple string with another. Instead of
Regex.Replace(lowerCaseInput, @"[0-9]{16}", "****************")
it can also be (20 is the length of the "account_card_number=" prefix, and 36 = 20 + 16)
new StringBuilder()
    .Append(lowerCaseInput.Substring(0, 20))
    .Append(new string('*', 16))
    .Append(lowerCaseInput.Substring(36))
    .ToString();
Shifting from procedural to object
The real benefit comes from encapsulating the logic in an object that holds a string representation of a dictionary (entries separated by '.', keys and values separated by '=').
The only behavior this object has is to give back a string representation of the initial input, with some values (one in your case) masked from the user (I assume for security reasons).
using System;
using System.Linq;
using System.Text;
public sealed class CreditCardRequest
{
    private readonly string _input;
    public CreditCardRequest(string input) => _input = input;
    public static implicit operator string(CreditCardRequest request) => request.ToString();
    public override string ToString()
    {
        var entries = _input.Split(".", StringSplitOptions.RemoveEmptyEntries)
            .Select(entry => entry.Split("="))
            .ToDictionary(kv => kv[0].ToLower(), kv =>
            {
                if (kv[0] == "Account_Card_Number")
                {
                    // Mask the 16 card digits and keep whatever follows them (e.g. "&07")
                    return new StringBuilder()
                        .Append(new string('*', 16))
                        .Append(kv[1].Substring(16))
                        .ToString();
                }
                else
                {
                    return kv[1];
                }
            });
        var output = new StringBuilder();
        foreach (var kv in entries)
        {
            output.AppendFormat("{0}={1}{2}", kv.Key, kv.Value, Environment.NewLine);
        }
        return output.ToString();
    }
}
Usage becomes as follows:
tbStoreInput.Text = new CreditCardRequest(input);
The concerns of your code are now independent of each other (the rule for parsing the input is no longer tied to a UI component) and the implementation details are hidden.
You can even decide to use Regex in CreditCardRequest.ToString() if you wish; the UI will never notice the change.
The ToString method would then become:
public override string ToString()
{
    var output = new StringBuilder();
    if (_input.Contains("."))
    {
        foreach (string uInput in _input.Split('.'))
        {
            if (uInput.StartsWith("Account_Card_Number"))
            {
                output.AppendLine(Regex.Replace(uInput.ToLower(), @"[0-9]{16}", "****************"));
            }
            else
            {
                output.AppendLine(uInput.ToLower());
            }
        }
    }
    return output.ToString();
}
You can match the 16 digits that follow account_card_number= and replace them with 16 asterisks:
(?<=\baccount_card_number=)[0-9]{16}\b
Or you can use a capture group and use that group in the replacement like $1****************
\b(account_card_number=)[0-9]{16}\b
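Both patterns can be applied to the raw request with Regex.Replace. A sketch under one assumption: RegexOptions.IgnoreCase is added so the lowercase pattern also matches "Account_Card_Number=" in the original, un-lowercased string. RequestMasker, Mask and MaskWithGroup are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequestMasker
{
    // Lookbehind variant: only the 16 digits preceded by "account_card_number=" match,
    // so the key itself is not part of the match and stays untouched.
    public static string Mask(string request)
    {
        return Regex.Replace(
            request,
            @"(?<=\baccount_card_number=)[0-9]{16}\b",
            new string('*', 16),
            RegexOptions.IgnoreCase);
    }

    // Capture-group variant: $1 re-inserts the key (in its original casing),
    // followed by 16 asterisks.
    public static string MaskWithGroup(string request)
    {
        return Regex.Replace(
            request,
            @"\b(account_card_number=)[0-9]{16}\b",
            "$1" + new string('*', 16),
            RegexOptions.IgnoreCase);
    }
}
```

Since only the card-number digits are matched, the rest of the request (expiry, amount, CVV and so on) passes through unchanged.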

How to avoid large switch statements and/or regular expressions when converting code from one language to another

I have to convert a few hundred test cases written in Java to C#. At the moment, all I can think of is to define a set of regular expressions, try to match each one against a line, and perform an action based on which regex matched.
Any better ideas? (This still stinks.)
An example of the source and the target:
Java:
Request request = new Request(testRunner)
request.setUsername("userName")
request.setPassword("password")
log.info(request.getRequest())
C#
var request = new LoginRequest(LoginParams);
request.Username = "userName";
request.Password = "password";
var LoginResponse = Account.ExecuteCall(request, pathToApi);
The source I'm trying to convert is from SoapUI and the bits of script involved are within TestSteps of a humongous XML file. Also, most of them are simply forming some sort of request and checking for a specific response so there shouldn't be too many types to implement.
What I ended up doing was defining a base class (Map) that has a Pattern property, a Success indicator, and the lines of Code that a successful match results in. In some cases a line can simply be replaced by another, but in others (setUserName) I need to extract content from the original script to put in the C# code. In still other cases, a single line might be replaced with more than one. The transformation is all defined in the Match function.
public class SetUserName : Map
{
    // "\." matches a literal dot; the capture group holds the username literal
    internal override string Pattern { get { return @"request\.setUsername\(""(.*)""\)"; } }
    public override void Match(string line)
    {
        Match match = Regex.Match(line, Pattern);
        if (match.Success)
        {
            Success = true;
            CodeLines = new Code<CodeLine>
                {new CodeLine("request.Username = \"" + match.Groups[1].Value + "\"")};
        }
    }
}
Then I put the maps in a list ordered by occurrence and loop through each line of script:
foreach (string scriptLine in scriptLines)
{
    string line = Strip(scriptLine);
    if (string.IsNullOrEmpty(line) || Regex.Match(line, @"^\s+$").Success)
    {
        continue;
    }
    Map[] RegExes =
    {
        new Request(),
        new SetUserName(),
        new SetPassword(),
        new RunRequest()
    };
    foreach (Map map in RegExes)
    {
        map.Match(line);
        if (map.Success)
        {
            codeList.AddRange(map.CodeLines);
            break;
        }
    }
}
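The same ordered-rules idea can be expressed without one subclass per rule. The following is a sketch, not the approach above: it assumes every rule emits exactly one output line, so a rule can be a (pattern, replacement template) pair, with $1 standing for the first capture group. ScriptTranslator, Rules and Translate are hypothetical names, and unmatched lines are kept as comments so nothing is dropped silently:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptTranslator
{
    // Hypothetical rule table: anchored pattern + replacement template.
    // Order matters: the first matching rule wins.
    private static readonly List<(Regex Pattern, string Replacement)> Rules =
        new List<(Regex, string)>
        {
            (new Regex(@"^Request request = new Request\(\w+\)$"),
                "var request = new LoginRequest(LoginParams);"),
            (new Regex(@"^request\.setUsername\(""(.*)""\)$"),
                "request.Username = \"$1\";"),
            (new Regex(@"^request\.setPassword\(""(.*)""\)$"),
                "request.Password = \"$1\";"),
            (new Regex(@"^log\.info\(request\.getRequest\(\)\)$"),
                "var LoginResponse = Account.ExecuteCall(request, pathToApi);"),
        };

    public static List<string> Translate(IEnumerable<string> scriptLines)
    {
        var output = new List<string>();
        foreach (string raw in scriptLines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;   // skip blank lines
            bool matched = false;
            foreach (var rule in Rules)
            {
                if (rule.Pattern.IsMatch(line))
                {
                    // The pattern is anchored, so the whole line is replaced
                    output.Add(rule.Pattern.Replace(line, rule.Replacement));
                    matched = true;
                    break;
                }
            }
            if (!matched) output.Add("// UNTRANSLATED: " + line);
        }
        return output;
    }
}
```

Adding a construct then means adding one row to the table. A rule that expands to several lines would still need a callback (for example a Func<Match, IEnumerable<string>>) instead of a single template.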

Adding an incremental number to duplicate string

I'm working in C# (.NET 4 using Visual Studio) and I'm trying to figure out an algorithm to append incremental numbers to strings entered, based on existing strings in the program. I haven't had much luck searching around for an answer.
I have a List<string>. An example would be
{"MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)"}
So say the user wants to add another string to this list, and they've selected "MyItem" as the string to add. So given the input and the existing list, the algorithm would return "MyItem (3)" as the new string to add.
It's the same function as in Windows Explorer where you keep adding New Folders ("New Folder (1)", "New Folder (2)" and on and on)
I tried looping through the list and working out what the next logical number should be, but I'm getting stuck (and the code is getting large). Does anyone know an elegant way of doing this? (I'm not too good with Regex, so maybe that's what I'm missing.)
Get the input and search for it. If it's present in the list, count its occurrences and append count + 1 to the input; otherwise just add the input to the list:
var input = Console.ReadLine(); // just for example
if (list.Any(x => x == input))
{
    // Counts exact matches only: for the sample list this yields "MyItem (2)", which already exists
    var count = list.Count(x => x == input);
    list.Add(string.Format("{0} ({1})", input, count + 1));
}
else list.Add(input);
This should work:
var list = new List<string>{"MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)"} ;
string str = "MyItem";
string newStr = str;
int i = 0;
while(list.Contains(newStr))
{
i++;
newStr = string.Format("{0} ({1})",str,i);
}
// newStr = "MyItem (3)"
The following is a useful extension method that I came up with to simulate the behaviour of Windows Explorer.
The previous answers, I feel, were too simple and only partially satisfied the requirements; they were also not presented in a way that you could easily reuse them.
This solution assumes you first identify the list of strings you want to compare against. They might come from a file system or a database; it's up to you to resolve the list of values from your business domain. Once you have it, identifying the duplicates and generating a unique value is very repeatable.
Extension Method:
/// <summary>
/// Generate a uniquely numbered string to insert into this list
/// Uses convention of appending the value with the duplication index number in brackets "~ (#)"
/// </summary>
/// <remarks>This does not add the value to the list</remarks>
/// <param name="currentValues">The existing values to compare against</param>
/// <param name="input">The string to evaluate against this collection</param>
/// <param name="comparison">[Optional] One of the enumeration values that specifies how the strings will be compared, will default to OrdinalIgnoreCase </param>
/// <returns>A numbered variant of the input string that would be unique in the list of current values</returns>
public static string GetUniqueString(this IList<string> currentValues, string input, StringComparison comparison = StringComparison.OrdinalIgnoreCase)
{
    // This matches the pattern we are using, i.e. "A String Value (#)"
    var regex = new System.Text.RegularExpressions.Regex(@"\(([0-9]+)\)$");
    // this is the comparison value that we want to increment
    string prefix = input.Trim();
    string result = input.Trim();
    // let it through if there is no current match
    if (currentValues.Any(x => x.Equals(input, comparison)))
    {
        // Identify if the input value has already been incremented (makes this more reusable)
        var inputMatch = regex.Match(input);
        if (inputMatch.Success)
        {
            // this is the matched value
            var number = inputMatch.Groups[1].Captures[0].Value;
            // remove the numbering from the alias to create the prefix
            prefix = input.Replace(String.Format("({0})", number), "").Trim();
        }
        // Now evaluate all the existing items that have the same prefix
        // NOTE: you can do this as one line in Linq, this is a bit easier to read
        // I'm trimming the list for consistency
        var potentialDuplicates = currentValues.Select(x => x.Trim()).Where(x => x.StartsWith(prefix, comparison));
        int count = 0;
        int maxIndex = 0;
        foreach (string item in potentialDuplicates)
        {
            // Get the index from the current item
            var indexMatch = regex.Match(item);
            if (indexMatch.Success)
            {
                var index = int.Parse(indexMatch.Groups[1].Captures[0].Value);
                var test = item.Replace(String.Format("({0})", index), "").Trim();
                if (test.Equals(prefix, comparison))
                {
                    count++;
                    maxIndex = Math.Max(maxIndex, index);
                }
            }
        }
        int nextIndex = Math.Max(maxIndex, count) + 1;
        result = string.Format("{0} ({1})", prefix, nextIndex);
    }
    return result;
}
Implementation:
var list = new string [] { "MyItem (2)", "MyItem", "Other thing", "string here", "MyItem (1)" };
string input = Console.ReadLine(); // simplify testing, thanks @selman-genç
var result = list.GetUniqueString(input, StringComparison.OrdinalIgnoreCase);
// Display the result, you can add it to the list or whatever you need to do
Console.WriteLine(result);
Input | Result
---------------------------------
MyItem | MyItem (3)
myitem (1) | myitem (3)
MyItem (3) | MyItem (3)
MyItem (4) | MyItem (4)
MyItem 4 | MyItem 4
String Here | String Here (1)
a new value | a new value
Pseudo-code:
If the list has no such string, add it to the list.
Otherwise, set variable N = 1.
Scan the list and look for strings like the given string + " (*)" (here Regex would help).
If any string is found, take the number from the braces and compare it against N. Set N = MAX( that number + 1, N ).
After the list has been scanned, N contains the number to add.
So, add the string + " (N)" to the list.
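A direct C# rendering of the pseudo-code above, as a sketch: UniqueNames and NextName are hypothetical names, comparison is ordinal and case-sensitive (unlike the OrdinalIgnoreCase default of the extension method above), and Regex.Escape makes names containing characters such as "(" or "." match literally:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UniqueNames
{
    // If the name is free, use it; otherwise scan for "name (k)" entries
    // and use N = max(k + 1, N), starting from N = 1.
    public static string NextName(IEnumerable<string> existing, string name)
    {
        var items = new List<string>(existing);
        if (!items.Contains(name)) return name;

        // "name (k)" exactly, with k captured as group 1
        var numbered = new Regex("^" + Regex.Escape(name) + @" \((\d+)\)$");
        int n = 1;
        foreach (string item in items)
        {
            Match m = numbered.Match(item);
            if (m.Success)
            {
                n = Math.Max(int.Parse(m.Groups[1].Value) + 1, n);
            }
        }
        return string.Format("{0} ({1})", name, n);
    }
}
```

With the list from the question, NextName(list, "MyItem") returns "MyItem (3)", and a name not yet in the list comes back unchanged.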

c#: regex how to differentiate between two variations of a string

This is tough to explain well enough to ask the question, but I'll try:
I have two possibilities of user input:
S01E05 or 0105 (two different input strings)
which both translate to season 01, episode 05.
But if the user inputs it backwards, E05S01 or 0501, I need to be able to return the same result: Season 01 Episode 05.
The control for this would be the user defining the format of the original filename with something like this:
"SssEee" -- uppercase 'S' denoting that the following lowercase 's' belong to Season and uppercase 'E' denoting that the following lowercase 'e' belong to Episode. So if the user decides to define the format as EeeSss then my function should still return the same result since it knows which numbers belong to season or episode.
I don't have anything working quite yet to share, but what I was toying with is a loop that builds the regex pattern. The function, so far, accepts the user format and the file name:
public static int(string userFormat, string fileName)
{
}
the userFormat would be a string and look something like this:
t.t.t.SssEee
or even
t.SssEee
where t is for title, and the rest you know.
The file name might look like this:
battlestar.galactica.S01E05.mkv
I've got a function that extracts the title from the file name by using the userFormat to build the regex string:
public static string GetTitle(string userFormat, string fileName)
{
    string pattern = "^";
    char positionChar;
    string fileTitle;
    for (short i = 0; i < userFormat.Length; i++)
    {
        positionChar = userFormat[i];
        //build the regex pattern
        if (positionChar == 't')
        {
            pattern += @"\w+";
        }
        else if (positionChar == '#')
        {
            pattern += @"\d+";
        }
        else if (positionChar == ' ')
        {
            pattern += @"\s+";
        }
        else
            pattern += positionChar;
    }
    //pulls out the title with or without the delimiter
    Match title = Regex.Match(fileName, pattern, RegexOptions.IgnoreCase);
    fileTitle = title.Groups[0].Value;
    //remove the delimiter
    string[] tempString = fileTitle.Split(@"\/.-<>".ToCharArray());
    fileTitle = "";
    foreach (string part in tempString)
    {
        fileTitle += part + " ";
    }
    return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(fileTitle);
}
But I'm kind of stumped on how to extract the episode and season numbers. In my head, I'm thinking the process would look something like:
Look through the userFormat string to find the uppercase S
Determine how many lowercase 's' are following the uppercase S
Build the regex expression that describes this
Search through the file name and find that pattern
Extract the number from that pattern
Sounds simple enough, but I'm having a hard time putting it into action. The complication is the fact that the format in the filename could be S01E05 or it could be simply 0105. Either scenario would be identified by the user when they define the format.
Ex 1. the file name is battlestar.galactica.S01E05
the user format submitted will be t.t.?ss?ee
Ex 2. the file name is battlestar.galactica.0105
the user format submitted will be t.t.SssEee
Ex 3. the file name is battlestar.galactica.0501
the user format submitted will be t.t.EeeSss
Sorry for the book... the concept is simple: the regex function should be dynamic, allowing the user to define the format of a file name so that my method can generate the expression and use it to extract information from the file name. Something tells me this is simpler than it seems... but I'm at a loss. lol... any suggestions?
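The steps listed in the question (find the uppercase S, count the lowercase s characters that follow it, build a regex, match, extract) can be sketched as a small pattern builder. This is a hedged sketch under assumptions not stated in the question: uppercase S/E become optional literal markers ("S?" / "E?") so that both S01E05 and 0105 match the SssEee format, each run of lowercase s/e becomes a named group with exactly that many digits, t becomes \w+, and any other character is matched literally. It does not handle the "?" in the t.t.?ss?ee example. EpisodeFormat, BuildPattern and TryParse are hypothetical names:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class EpisodeFormat
{
    // Builds a regex from a user format such as "t.t.SssEee":
    //   t        -> a title word (\w+)
    //   S / E    -> an optional literal marker ("S?" / "E?")
    //   ss / eee -> a named group with exactly that many digits
    //   other    -> matched literally (Regex.Escape)
    public static Regex BuildPattern(string userFormat)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < userFormat.Length)
        {
            char c = userFormat[i];
            if (c == 's' || c == 'e')
            {
                // Count how many lowercase s (or e) follow in a row
                int run = 0;
                while (i + run < userFormat.Length && userFormat[i + run] == c) run++;
                string name = c == 's' ? "season" : "episode";
                sb.Append("(?<").Append(name).Append(@">\d{").Append(run).Append("})");
                i += run;
                continue;
            }
            if (c == 't') sb.Append(@"\w+");
            else if (c == 'S' || c == 'E') sb.Append(c).Append('?');
            else sb.Append(Regex.Escape(c.ToString()));
            i++;
        }
        return new Regex(sb.ToString(), RegexOptions.IgnoreCase);
    }

    public static bool TryParse(string userFormat, string fileName, out int season, out int episode)
    {
        season = episode = 0;
        Match m = BuildPattern(userFormat).Match(fileName);
        if (!m.Success || !m.Groups["season"].Success || !m.Groups["episode"].Success) return false;
        season = int.Parse(m.Groups["season"].Value);
        episode = int.Parse(m.Groups["episode"].Value);
        return true;
    }
}
```

For "t.t.SssEee" the builder produces ^\w+\.\w+\.S?(?<season>\d{2})E?(?<episode>\d{2}), which matches both battlestar.galactica.S01E05.mkv and battlestar.galactica.0105.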
So if I read this right, you know where the Season/Episode number is in the string because the user has told you. That is, you have t.t.<number>.more.stuff. And <number> can take one of these forms:
SssEee
EeeSss
ssee
eess
Or did you say that the user can define how many digits will be used for season and episode? That is, could it be S01E123?
I'm not sure you need a regex for this. Since you know the format, and it appears that things are separated by periods (I assume that there can't be periods in the individual fields), you should be able to use String.Split to extract the pieces, and you know from the user's format where the Season/Episode is in the resulting array. So you now have a string that takes one of the forms above.
You have the user's format definition and the Season/Episode number. You should be able to write a loop that steps through the two strings together and extracts the necessary information, or issues an error.
string UserFormat = "SssEee";
string EpisodeNumber = "0105";
int ifmt = 0;
int iepi = 0;
int season = 0;
int episode = 0;
while (ifmt < UserFormat.Length && iepi < EpisodeNumber.Length)
{
    if (UserFormat[ifmt] == 'S' || UserFormat[ifmt] == 'E')
    {
        // Marker position: consume the marker only if the input actually contains it
        if (EpisodeNumber[iepi] == UserFormat[ifmt])
        {
            ++iepi;
        }
        else if (!char.IsDigit(EpisodeNumber[iepi]))
        {
            // Error! Chars didn't match, and it wasn't a digit.
            break;
        }
        ++ifmt;
    }
    else
    {
        char c = EpisodeNumber[iepi];
        if (!char.IsDigit(c))
        {
            // error. Expected digit.
            break;
        }
        if (UserFormat[ifmt] == 'e')
        {
            episode = (episode * 10) + (int)c - (int)'0';
        }
        else if (UserFormat[ifmt] == 's')
        {
            season = (season * 10) + (int)c - (int)'0';
        }
        else
        {
            // user format is broken
            break;
        }
        ++iepi;
        ++ifmt;
    }
}
Note that you'll probably have to do some checking to see that the lengths are correct. That is, the code above will accept S01E1 when the user's format is SssEee. There's a bit more error handling that you can add, depending on how worried you are about bad input. But I think this gives you the gist of the idea.
I have to think that's going to be a whole lot easier than trying to dynamically build regular expressions.
After @Sinaesthetic answered my question, we can reduce his original post to:
The challenge is to receive any of these inputs:
0105 (if your input is 0105 you assume SxxEyy)
S01E05
E05S01 OR
1x05 (read as season 1 episode 5)
and transform any of these inputs into: S01E05
At this point title and file format are irrelevant, they just get tacked on to the ends.
Based on that the following code will always result in 'Battlestar.Galactica.S01E05.mkv'
static void Main(string[] args)
{
    string[] inputs = new string[6] { "E05S01", "S01E05", "0105", "105", "1x05", "1x5" };
    foreach (string input in inputs)
    {
        Console.WriteLine(FormatEpisodeTitle("Battlestar.Galactica", input, "mkv"));
    }
    Console.ReadLine();
}
private static string FormatEpisodeTitle(string showTitle, string identifier, string fileFormat)
{
    //first make identifier upper case
    identifier = identifier.ToUpper();
    //normalize EeeSss to SssEee
    int sIndex = identifier.IndexOf('S');
    int eIndex = identifier.IndexOf('E');
    if (eIndex >= 0 && sIndex > eIndex)
    {
        identifier = identifier.Substring(sIndex) + identifier.Substring(eIndex, sIndex - eIndex);
    }
    //now get rid of S and replace E with x as needed:
    identifier = identifier.Replace("S", string.Empty).Replace("E", "X");
    //at this point, if there isn't an "X" we need one, as in 105 or 0105
    if (identifier.IndexOf('X') == -1)
    {
        identifier = identifier.Substring(0, identifier.Length - 2) + "X" + identifier.Substring(identifier.Length - 2);
    }
    //now split by the 'X'
    string[] identifiers = identifier.Split('X');
    // and put it back together:
    identifier = 'S' + identifiers[0].PadLeft(2, '0') + 'E' + identifiers[1].PadLeft(2, '0');
    //tack it all together
    return showTitle + '.' + identifier + '.' + fileFormat;
}

Regular expression that returns a constant value as part of a match

I have a regular expression to match 2 different number formats: \=(?[0-9]+)\?|\+(?[0-9]+)\?
This should return 9876543 as its Value for ;1234567890123456?+1234567890123456789012345123=9876543? and ;1234567890123456?+9876543?
What I would like is to be able to return another value along with the matched 'Value'.
So, for example, if the first string was matched, I'd like it to return:
Value: 9876543
Format: LongFormat
And if matched in the second string:
Value: 9876543
Format: ShortFormat
Is this possible?
Another option, which is not quite the solution you wanted, but saves you using two separate regexes, is to use named groups, if your implementation supports it.
Here is some C#:
var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
string test2 = ";1234567890123456?+9876543?";
var match = regex.Match(test1);
Console.WriteLine("Long: {0}", match.Groups["Long"]); // 9876543
Console.WriteLine("Short: {0}", match.Groups["Short"]); // blank
match = regex.Match(test2);
Console.WriteLine("Long: {0}", match.Groups["Long"]); // blank
Console.WriteLine("Short: {0}", match.Groups["Short"]); // 9876543
Basically, just modify your regex to include the names, and then match.Groups[GroupName] will either have a value or won't. You could even use the Success property of the group to know which one matched (match.Groups["Long"].Success).
UPDATE:
You can get the group name out of the match, with the following code:
static void Main(string[] args)
{
    var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
    string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
    string test2 = ";1234567890123456?+9876543?";
    ShowGroupMatches(regex, test1);
    ShowGroupMatches(regex, test2);
    Console.ReadLine();
}
private static void ShowGroupMatches(Regex regex, string testCase)
{
    int i = 0;
    foreach (Group grp in regex.Match(testCase).Groups)
    {
        if (grp.Success && i != 0)
        {
            Console.WriteLine(regex.GroupNameFromNumber(i) + " : " + grp.Value);
        }
        i++;
    }
}
I'm ignoring the 0th group, because that is always the entire match in .NET
No, you can't match text that isn't there. The match can only return a substring of the target.
You essentially want to match against two patterns and take different actions in each case. See if you can separate them in your code:
if match(\=(?[0-9]+)\?) then
return 'Value: ' + match + 'Format: LongFormat'
else if match(\+(?[0-9]+)\?) then
return 'Value: ' + match + 'Format: ShortFormat'
(Excuse the dodgy pseudocode, but you get the idea.)
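In C#, that pseudocode could look like the sketch below: two separate patterns, with the format label supplied by the code rather than by the regex. CardTrackFormat and Classify are hypothetical names, and the named groups from the question are replaced by plain numbered groups:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardTrackFormat
{
    private static readonly Regex LongPattern = new Regex(@"=([0-9]+)\?");
    private static readonly Regex ShortPattern = new Regex(@"\+([0-9]+)\?");

    // Tries the long format first, then the short one.
    // Returns null when neither pattern matches.
    public static (string Value, string Format)? Classify(string input)
    {
        Match m = LongPattern.Match(input);
        if (m.Success) return (m.Groups[1].Value, "LongFormat");
        m = ShortPattern.Match(input);
        if (m.Success) return (m.Groups[1].Value, "ShortFormat");
        return null;
    }
}
```

For the two sample strings this returns ("9876543", "LongFormat") and ("9876543", "ShortFormat") respectively.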
You can't match text that isn't there - but, depending on what language you're using, you can process what you match, and conditionally add text based on what is there.
With some implementations of regex, you can specify a "callback function" which allows you to run logic against each result.
Here's a pseudo-code example:
Input.replaceAll( /[+=][0-9]+(?=\?)/ , formatValue );
formatValue : function(match,groups)
{
switch( left(match,1) )
{
case '+' : Format = 'Short'; break;
case '=' : Format = 'Long'; break;
default : Format = 'Unknown'; break;
}
Value : match.replace('[+=]');
return 'Value: '+Value+' Format: ' + Format;
}
What that will do, in a language that supports regex callbacks, is execute the formatValue function every time it finds a match, and use the result of the function as the replacement text.
You haven't specified which implementation you're using, so this may or may not be possible for you, but it is definitely worth checking out.
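.NET does support this: Regex.Replace has an overload that takes a MatchEvaluator, which is called once per match and whose return value becomes the replacement text. A sketch of the pseudo-code above in that form, where CallbackFormat and Annotate are hypothetical names and the sign and the digits are captured in separate groups instead of being stripped afterwards:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CallbackFormat
{
    public static string Annotate(string input)
    {
        // Group 1: "+" or "=", group 2: the digits; the lookahead requires a following "?"
        return Regex.Replace(input, @"([+=])([0-9]+)(?=\?)", m =>
        {
            string format = m.Groups[1].Value == "+" ? "Short" : "Long";
            return "Value: " + m.Groups[2].Value + " Format: " + format;
        });
    }
}
```

In the long sample, "+1234567890123456789012345123" is not rewritten, because it is followed by "=" rather than "?"; only "=9876543" is replaced.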
