Extract string parts separated by pound sign using C#


I have the following string: [5111110233857£254736283045£1000£25£212541£20120605152412£KEN£NAI],[5111110233858£254736283045£2500£25£257812£20120605152613£KEN£NAI]. The comma-separated strings are derived from a web service, and their number depends on the records returned.
Now, the fields are separated by the pound sign (£). I want to extract each field and save it to a database.
I have tried string.Split(), but I don't know how to use it on an unknown number of strings.

What you can do here is split by the comma, remove the [ and ] from each record, then split by the pound sign. For example:
using System.Linq; // for Select and ToArray
string test = "[5111110233857£254736283045£1000£25£212541£20120605152412£KEN£NAI],[5111110233858£254736283045£2500£25£257812£20120605152613£KEN£NAI]";
// Drop the leading '[' and trailing ']' of every comma-separated record
string[] commaSeparatedStrings = test.Split(',').Select(s => s.Substring(1, s.Length - 2)).ToArray();
foreach (string commaSeparatedString in commaSeparatedStrings)
{
    string[] numbers = commaSeparatedString.Split('£');
    foreach (string number in numbers)
    {
        // Parse the numeric fields here; use long rather than int, since values such as
        // 5111110233857 overflow int. KEN and NAI are text fields.
    }
}
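Several of the numeric fields (for example 5111110233857) are larger than int.MaxValue, and KEN/NAI are not numbers at all, so parsing each field needs a little care. A minimal sketch of per-field parsing follows; the class name FieldParsing is hypothetical, and treating the 14-digit field as a yyyyMMddHHmmss timestamp is only an assumption based on how the value looks.

```csharp
using System;
using System.Globalization;

public static class FieldParsing
{
    // Fields such as 5111110233857 exceed int.MaxValue (2,147,483,647), so long is used.
    // Returns null for non-numeric fields such as "KEN" or "NAI".
    public static long? ParseNumber(string field) =>
        long.TryParse(field, NumberStyles.None, CultureInfo.InvariantCulture, out long value)
            ? value
            : (long?)null;

    // Assumption: a 14-digit field like 20120605152412 is a yyyyMMddHHmmss timestamp.
    // Returns null if the field does not have exactly that shape.
    public static DateTime? ParseTimestamp(string field) =>
        DateTime.TryParseExact(field, "yyyyMMddHHmmss", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out DateTime value)
            ? value
            : (DateTime?)null;
}
```

Returning null instead of throwing lets the caller decide whether a bad field should skip the record or abort the database insert.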

A little program that removes the brackets, splits on the comma, then splits on the pound sign, using foreach loops instead of lambda expressions (easier to understand for some):
using System;
using System.Collections.Generic;
class Program
{
    static void Main(string[] args)
    {
        string s = "[5111110233857£254736283045£1000£25£212541£20120605152412£KEN£NAI],[5111110233858£254736283045£2500£25£257812£20120605152613£KEN£NAI]";
        // Remove the brackets, leaving only comma-separated records
        s = s.Replace("[", "").Replace("]", "");
        var split_s = s.Split(',');
        List<string> ans = new List<string>();
        foreach (var x in split_s)
        {
            // Split each record into its '£'-separated fields
            var t = x.Split('£');
            foreach (var y in t)
            {
                ans.Add(y);
            }
        }
        foreach (var x in ans)
        {
            Console.WriteLine(x);
        }
        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}
It uses a List<string>, a dynamic array to which you can add elements as you go, regardless of the number of values in your original string.
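Both answers flatten every field into one sequence, which loses the record boundaries that matter when each record becomes one database row. A sketch that keeps each record's fields together, assuming records are always wrapped in [ ] with no nested brackets (the class name PoundRecords is made up for illustration):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PoundRecords
{
    // Assumption: every record is "[...]" and contains no ']' of its own.
    private static readonly Regex RecordPattern = new Regex(@"\[([^\]]*)\]");

    // Returns one string[] per record, however many records the web service sent.
    public static List<string[]> Parse(string input)
    {
        var records = new List<string[]>();
        foreach (Match m in RecordPattern.Matches(input))
        {
            // Group 1 is the text between the brackets; '\u00A3' is the pound sign (£).
            records.Add(m.Groups[1].Value.Split('\u00A3'));
        }
        return records;
    }
}
```

Each string[] in the result can then be mapped to the columns of one row, and the loop over the list handles any number of records.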

Related

Replacing first 16 digits in a string with Regex.Replace

I'm trying to replace only the first 16 digits of a string with Regex. I want them replaced with "*". I need to take this string:
"Request=Credit Card.Auth Only&Version=4022&HD.Network_Status_Byte=*&HD.Application_ID=TZAHSK!&HD.Terminal_ID=12991kakajsjas&HD.Device_Tag=000123&07.POS_Entry_Capability=1&07.PIN_Entry_Capability=0&07.CAT_Indicator=0&07.Terminal_Type=4&07.Account_Entry_Mode=1&07.Partial_Auth_Indicator=0&07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024&07.Transaction_Amount=142931&07.Association_Token_Indicator=0&17.CVV=200&17.Street_Address=123 Road SW&17.Postal_Zip_Code=90210&17.Invoice_Number=INV19291"
and replace the credit card number with asterisks, which is why I say the first 16 digits, as that is how many digits are in a credit card. I first split the string wherever there is a "." and then check whether each part contains "card" and "number". If it does, I want to replace the first 16 digits with "*".
This is what I've done:
public void MaskData(string input)
{
    if (input.Contains("."))
    {
        string[] userInput = input.Split('.');
        foreach (string uInput in userInput)
        {
            string lowerCaseInput = uInput.ToLower();
            string containsCard = "card";
            string containsNumber = "number";
            if (lowerCaseInput.Contains(containsCard) && lowerCaseInput.Contains(containsNumber))
            {
                tbStoreInput.Text += Regex.Replace(lowerCaseInput, @"[0-9]", "*") + Environment.NewLine;
            }
            else
            {
                tbStoreInput.Text += lowerCaseInput + Environment.NewLine;
            }
        }
    }
}
I am aware that the Regex is wrong, but I'm not sure how to get only the first 16 digits; right now it's putting asterisks across the entire line, as seen here:
"account_card_number=****************&**"
I don't want it to show the asterisks after the "&".

Same answer as in the comments, but explained.
Your regex pattern "[0-9]" matches a single digit, so each individual digit, including the digits after the &, is a separate match and gets replaced.
What you want is to add a quantifier that requires a specific number of characters, i.e. 16, so your pattern becomes "[0-9]{16}". That way only those 16 digits are affected by the replace operation. Because the 16 digits now form a single match, the replacement should also be 16 asterisks ("****************"); a plain "*" would replace the whole card number with one asterisk.
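A small sketch of the "[0-9]{16}" pattern applied to one segment produced by the question's split on '.' (the helper name MaskSegment is hypothetical). Note that any run of 16 digits in the segment would be masked, and a longer run would have only its first 16 digits replaced.

```csharp
using System.Text.RegularExpressions;

public static class CardMasking
{
    // "[0-9]{16}" matches a run of exactly 16 consecutive digits as ONE match.
    // Each match is replaced as a whole, so the replacement is 16 asterisks;
    // "*" alone would collapse the card number into a single asterisk.
    public static string MaskSegment(string segment) =>
        Regex.Replace(segment, "[0-9]{16}", new string('*', 16));
}
```

For the segment "account_card_number=4242424242424242&07", the digits after the "&" are only two long, so they are left alone.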

Disclaimer
My answer is purposely broader than what the OP asked, but I saw it as an opportunity to raise awareness of other tools available in C# (namely, objects).
String replacement
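Regex is not the only tool available to replace one simple string with another. Assuming the card number always starts at index 20 (right after "account_card_number="), instead of
Regex.Replace(lowerCaseInput, @"[0-9]{16}", "****************")
it can also be
new StringBuilder()
    .Append(lowerCaseInput, 0, 20)                           // "account_card_number=" (20 chars)
    .Append('*', 16)                                         // 16 asterisks in place of the digits
    .Append(lowerCaseInput, 36, lowerCaseInput.Length - 36)  // everything after the card number
    .ToString();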
Shifting from procedural to object
Now the real meat is the possibility of encapsulating the logic into an object that holds a kind of string representation of a dictionary (entries separated by '.', while keys and values are separated by '=').
The only behavior this object has is to give back a string representation of the initial input, with some value (one, in your case) masked from the user (I assume for security reasons).
using System;
using System.Linq;
using System.Text;
public sealed class CreditCardRequest
{
    private readonly string _input;
    public CreditCardRequest(string input) => _input = input;
    // Lets a CreditCardRequest be assigned directly where a string is expected (e.g. TextBox.Text)
    public static implicit operator string(CreditCardRequest request) => request.ToString();
    public override string ToString()
    {
        var entries = _input.Split(".", StringSplitOptions.RemoveEmptyEntries)
            .Select(entry => entry.Split("="))
            .ToDictionary(kv => kv[0].ToLower(), kv =>
            {
                if (kv[0] == "Account_Card_Number")
                {
                    // Mask the 16 digits and keep whatever follows them (e.g. "&07")
                    return new StringBuilder()
                        .Append('*', 16)
                        .Append(kv[1].Substring(16))
                        .ToString();
                }
                else
                {
                    return kv[1];
                }
            });
        var output = new StringBuilder();
        foreach (var kv in entries)
        {
            output.AppendFormat("{0}={1}{2}", kv.Key, kv.Value, Environment.NewLine);
        }
        return output.ToString();
    }
}
Usage becomes as follows:
tbStoreInput.Text = new CreditCardRequest(input);
The concerns of your code are now independent of each other (the rule for parsing the input is no longer tied to a UI component) and the implementation details are hidden.
You can even decide to use Regex in CreditCardRequest.ToString() if you wish; the UI will never notice the change.
The method would then become:
// requires: using System.Text; using System.Text.RegularExpressions;
public override string ToString()
{
    var output = new StringBuilder();
    if (_input.Contains("."))
    {
        foreach (string uInput in _input.Split('.'))
        {
            if (uInput.StartsWith("Account_Card_Number"))
            {
                output.AppendLine(Regex.Replace(uInput.ToLower(), @"[0-9]{16}", "****************"));
            }
            else
            {
                output.AppendLine(uInput.ToLower());
            }
        }
    }
    return output.ToString();
}

You can match the 16 digits after the account number and replace them with 16 asterisks:
(?<=\baccount_card_number=)[0-9]{16}\b
Or you can use a capture group and use that group in the replacement, like $1****************
\b(account_card_number=)[0-9]{16}\b
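Both patterns are written in lowercase, so against the original mixed-case request they only match if the input is lowercased first or RegexOptions.IgnoreCase is passed. A sketch using IgnoreCase (the class and method names are illustrative only):

```csharp
using System.Text.RegularExpressions;

public static class CardNumberMasking
{
    private static readonly string Stars = new string('*', 16);

    // Lookbehind: the key is only checked, never consumed, so just the 16 digits are replaced.
    public static string MaskWithLookbehind(string input) =>
        Regex.Replace(input, @"(?<=\baccount_card_number=)[0-9]{16}\b", Stars,
                      RegexOptions.IgnoreCase);

    // Capture group: the key is part of the match and is written back through $1.
    public static string MaskWithGroup(string input) =>
        Regex.Replace(input, @"\b(account_card_number=)[0-9]{16}\b", "$1" + Stars,
                      RegexOptions.IgnoreCase);
}
```

Neither version changes the key itself, so the original casing "Account_Card_Number" survives in the output.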

How to read between a specified character in a string?

I was trying to create a list from a user input with something like this:
Create newlist: word1, word2, word3, etc...,
but how do I get those words one by one, using only the commas as references, going through them in order and placing them into an array, etc.? Example:
string Input = Console.ReadLine();
if (Input.Contains("Create new list:"))
{
    foreach (char character in Input)
    {
        if (character == ',') // when it reaches a comma
        {
            // code goes here, where I got stuck...
        }
    }
}
Edit: I didn't know "Split" existed, my mistake... but it would be great if you could explain how to use it for the problem above.

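You can use this:
String words = "word1, word2, word3";
List (requires using System.Linq):
List<string> wordsList = words.Split(',').ToList<string>();
Array:
string[] namesArray = words.Split(',');
Note that splitting on ',' alone leaves the space that follows each comma at the start of the next item (" word2"), so you may want to Trim() each one.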
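Applied to the input format from the question, you would also strip the "Create new list:" prefix and drop the empty entry left by a trailing comma. A sketch, assuming the prefix is exactly "Create new list:" (compared case-insensitively) as in the question's code; ListCommand and ParseWords are hypothetical names:

```csharp
using System;
using System.Linq;

public static class ListCommand
{
    // Hypothetical prefix, taken from the question's example code.
    private const string Prefix = "Create new list:";

    // Returns the trimmed words after the prefix, or an empty array if the prefix is missing.
    public static string[] ParseWords(string input)
    {
        if (!input.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return Array.Empty<string>();

        return input.Substring(Prefix.Length)
                    .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries) // drops the empty entry after a trailing comma
                    .Select(w => w.Trim())                                      // removes the space after each comma
                    .Where(w => w.Length > 0)                                   // skips items that were only whitespace
                    .ToArray();
    }
}
```

The words come back in the order they were typed, so the array can be used directly as the new list.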

@Patrick Artner beat me to it, but you can just split the input with the comma as the argument, or whatever you want the argument to be.
Here is the example from the documentation, where you can learn more:
using System;
public class Example {
    public static void Main() {
        String value = "This is a short string.";
        Char delimiter = 's';
        String[] substrings = value.Split(delimiter);
        foreach (var substring in substrings)
            Console.WriteLine(substring);
    }
}
The example displays the following output:
Thi
i
a
hort
tring.

Split with multiple characters

So I am coding a converter program that converts an old version of code to the new version: you just put the old text in a text box and it converts TXT to XML. I'm trying to get each item between two characters; below is the string I'm trying to split. I have put just the parameter names inside the quotes to protect my users' credentials. So I want to get every part of the code between the ",":
["Id","Username","Cash","Password"],["Id","Username","Cash","Password"]
And then add each string to a list so it would be like
Item 1
["Id","Username","Cash","Password"]
Item 2
["Id","Username","Cash","Password"]
I would split it using ",", but that would mess up because there is also a "," between the params inside each item, so I tried using "],":
string input = textBox1.Text;
string[] parts1 = input.Split(new string[] { "]," }, StringSplitOptions.None);
foreach (string str in parts1)
{
    // Params is a list...
    Params.Add(str);
}
MessageBox.Show(string.Join("\n\n", Params));
But it strips the ] from the end of each item, and it messes up in other ways.

This looks like a great opportunity for Regular Expressions.
My approach would be to get the row parts first, then get the column parts. I'm sure there are about 30 ways to do this, but this is my (simplistic) approach.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // A row is '[' followed by one or more non-']' characters and a closing ']'
            var rowPattern = new Regex(@"(?<row>\[[^\]]+\])", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
            // A column is a double-quoted value (lazy match up to the next quote)
            var columnPattern = new Regex(@"(?<column>\"".+?\"")", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
            var data = "[\"Id\",\"Username\",\"Cash\",\"Password\"],[\"Id\",\"Username\",\"Cash\",\"Password\"]";
            var rows = rowPattern.Matches(data);
            var rowCounter = 0;
            foreach (var row in rows)
            {
                Console.WriteLine("Row #{0}", ++rowCounter);
                var columns = columnPattern.Matches(row.ToString());
                foreach (var column in columns)
                    Console.WriteLine("\t{0}", column);
            }
            Console.ReadLine();
        }
    }
}
Hope this helps!!

You can use Regex.Split() together with positive lookbehind and lookahead to do this:
var parts = Regex.Split(input, "(?<=]),(?=\\[)");
Basically this says “split on , with ] right before it and [ right after it”.
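A self-contained sketch of that Regex.Split call (the wrapper class BracketGroups is illustrative). Unlike splitting on "],", the lookarounds are not consumed, so each item keeps both of its brackets:

```csharp
using System.Text.RegularExpressions;

public static class BracketGroups
{
    // (?<=\]) : the comma must be preceded by ']'
    // (?=\[)  : and followed by '[', so commas inside a group are left alone.
    public static string[] Split(string input) =>
        Regex.Split(input, @"(?<=\]),(?=\[)");
}
```

Each returned item can be added to the Params list as-is.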

Assuming that the character '|' does not occur in your original data, you can try:
input.Replace("],[", "]|[").Split(new char[]{'|'});
If the pipe character does occur, use another (non-occurring) character.

Finding the number of occurrences of strings in a specific format in a given text

I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The regex I've written for getting these words is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?

Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
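using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Fix
{
    public static String Execute(string largeText)
    {
        // Multiline makes ^ match at the start of every line, not only at the start of the text
        return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline);
    }
    private Dictionary<String, int> counters = new Dictionary<String, int>();
    private static String[] numbers = { "ONE", "TWO", "THREE" /* ... extend as needed */ };
    public String Evaluator(Match m)
    {
        String word = m.Groups[1].Value;
        int count;
        if (!counters.TryGetValue(word, out count))
            count = 0;
        count++;
        counters[word] = count;
        // Fall back to digits once the word list runs out
        String suffix = count <= numbers.Length ? numbers[count - 1] : count.ToString();
        return word + "_" + suffix + ":";
    }
}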
This should return what you requested when calling:
result = Fix.Execute(largeText);

I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
    static Dictionary<string, int> wordCount = new Dictionary<string, int>();
    static string AppendIndex(Match m)
    {
        string word = m.Groups[1].Value; // the matched word without its trailing colon
        if (wordCount.ContainsKey(word))
            wordCount[word] = wordCount[word] + 1;
        else
            wordCount.Add(word, 1);
        return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
    }
    static void Main()
    {
        string text = "....";
        // Multiline lets ^ match at the start of every line
        string result = Regex.Replace(text, @"^\b([A-Za-z0-9_]{4,})\b:",
            new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
    }
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx

If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.ContainsKey(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, create another dictionary that maps 1 to "one", 2 to "two", and so on. Maybe there is another solution for that.
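The steps above can be sketched without regex as follows. This is one possible reading of them, assuming the text uses '\n' line breaks and that a key is the text before the first ':' on a line, following the question's [A-Za-z0-9_]{4,} rule; the class name and the suffix list are hypothetical, and counts past the list fall back to digits:

```csharp
using System.Collections.Generic;

public static class OccurrenceNumbering
{
    // Hypothetical suffix words; counts beyond this list fall back to digits.
    private static readonly string[] Words = { "ONE", "TWO", "THREE" };

    // Mirrors the question's [A-Za-z0-9_]{4,} rule for what counts as a key.
    private static bool IsKey(string s)
    {
        if (s.Length < 4) return false;
        foreach (char c in s)
        {
            bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
            if (!ok) return false;
        }
        return true;
    }

    public static string Number(string text)
    {
        var counts = new Dictionary<string, int>();    // key -> occurrences seen so far
        string[] lines = text.Split('\n');             // step 1: line by line
        for (int i = 0; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');         // step 2: is the delimiter present?
            if (colon < 0) continue;
            string key = lines[i].Substring(0, colon); // step 3: text before the first ':'
            if (!IsKey(key)) continue;

            counts.TryGetValue(key, out int seen);     // step 4: count (0 if not seen yet)
            int count = seen + 1;
            counts[key] = count;

            string suffix = count <= Words.Length ? Words[count - 1] : count.ToString();
            lines[i] = key + "_" + suffix + lines[i].Substring(colon);
        }
        return string.Join("\n", lines);
    }
}
```

Anything after the colon (such as "// random text") is copied through unchanged.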

Look at this example (I know it's not perfect and not so nice). Leaving aside the exact argument for the Split function, I think it can help:
using System;
using System.Linq;
using System.Text.RegularExpressions;
class Program
{
    static void Main(string[] args)
    {
        string a = "word:word:test:-1+234=567:test:test:";
        string[] tks = a.Split(':');
        Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");
        // Note: this appends the total number of occurrences of x, not its running index
        var res = from x in tks
                  where re.Matches(x).Count > 0
                  select x + DecodeNO(tks.Count(y => y.Equals(x)));
        foreach (var item in res)
        {
            Console.WriteLine(item);
        }
        Console.ReadLine();
    }
    private static string DecodeNO(int n)
    {
        switch (n)
        {
            case 1:
                return "_one";
            case 2:
                return "_two";
            case 3:
                return "_three";
        }
        return "";
    }
}

How can I split this string into an array?

My string is as follows:
smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
I need back:
smtp:jblack@test.com
SMTP:jb@test.com
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
The problem is that the semicolons separate the addresses and are also part of the X400 address. Can anyone suggest how best to split this?
PS: I should have mentioned that the order differs, so it could be:
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com
There can be more than three addresses (4, 5, ... 10, etc.), including an X500 address; however, they all start with either smtp:, SMTP:, X400 or X500.

EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.
Will you always have three parts, and you just want to split on the first two semi-colons?
If so, just use the overload of Split which lets you specify the number of substrings to return:
string[] bits = text.Split(new char[]{';'}, 3);

May I suggest building a regular expression
(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?
or protocol-less
.*?:((?![^:;]*:).)*;?
In other words: find anything that starts with one of your protocols and match the colon. Then continue matching characters as long as you're not matching one of your protocols. Finish with a semicolon (optionally).
You can then parse through the list of matches splitting on ':' and you'll have your protocols. Additionally if you want to add protocols, just add them to the list.
Likely however you're going to want to specify the whole thing as case-insensitive and only list the protocols in their uppercase or lowercase versions.
The protocol-less version doesn't care what the names of the protocols are. It just finds them all the same, by matching everything up to, but excluding a string followed by a colon or a semi-colon.
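A sketch of applying the first pattern with RegexOptions.IgnoreCase, as suggested, so only the lowercase protocol names need to be listed, and then splitting each match at its first ':'. The class name ProtocolMatcher is illustrative; note that each value keeps the ';' that separated it from the next address:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProtocolMatcher
{
    // Case-insensitive, so "smtp" also covers "SMTP" and "x400" covers "X400".
    private static readonly Regex Address = new Regex(
        @"(smtp|x400|x500):((?!(?:smtp|x400|x500):).)*;?",
        RegexOptions.IgnoreCase);

    // Returns (protocol, value) pairs in order of appearance.
    public static List<(string Protocol, string Value)> Extract(string text)
    {
        var result = new List<(string Protocol, string Value)>();
        foreach (Match m in Address.Matches(text))
        {
            int colon = m.Value.IndexOf(':');
            result.Add((m.Value.Substring(0, colon), m.Value.Substring(colon + 1)));
        }
        return result;
    }
}
```

Adding a new protocol only means adding it to both alternations in the pattern.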

Split by the following regex pattern:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");
EDIT: a better one, which accepts more special characters in the protocol name:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=[^;:]+:)");
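A sketch of the second pattern checked against both sample orders from the question (AddressSplitter is a made-up name). The split happens only at a ';' whose following text, up to the next ':', contains no ';', so the separators inside the X400 address stay put:

```csharp
using System.Text.RegularExpressions;

public static class AddressSplitter
{
    // ';' followed by a non-empty run of non-';', non-':' characters and then ':'
    // marks the start of the next "name:" address; the lookahead is not consumed.
    public static string[] Split(string text) =>
        Regex.Split(text, @";(?=[^;:]+:)");
}
```

For the second sample, the split falls on the second ';' of ";;", so the X400 address keeps its own trailing ';'.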

http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
Check there: you can specify the maximum number of substrings you want, so in your case you would do
text.Split(new char[]{';'}, 3);

Not the fastest if you are doing this a lot but it will work for all cases I believe.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
string input1 = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com";
Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");
List<string> sets = new List<string>();
while (input2.Length > 0)
{
    // Take the first "key:value" pair, then cut it off the front of the string
    Match m1 = splitEmailRegex.Matches(input2)[0];
    string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
    sets.Add(s1);
    input2 = input2.Substring(s1.Length);
}
foreach (var set in sets)
{
    Console.WriteLine(set);
}
Console.ReadLine();
Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.

You could always split on the colon and have a little logic to grab the key and value.
string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
    // Value: this chunk up to and including its last ';' (the text after that is the next key)
    string value = bits[i].Contains(";") ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
    // Key: whatever follows the last ';' of the previous chunk
    string key = bits[i - 1].Contains(";") ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
    values.Add(String.Concat(key, ":", value));
}
Tested it with both of your samples and it works fine.

This piqued my curiosity... This code actually does the job, but again, it needs tidying :)
My final attempt - stop changing what you need ;=)
using System;
using System.Collections.Generic;
using System.Text;
class Program
{
    static void Main(string[] args)
    {
        string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:jblack@test.com;X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:jb@test.com;";
        string[] parts = fneh.Split(new char[] { ';' });
        List<string> addresses = new List<string>();
        StringBuilder address = new StringBuilder();
        foreach (string part in parts)
        {
            if (part.Contains(":"))
            {
                // A part containing ':' starts a new address; store the previous one
                if (address.Length > 0)
                {
                    addresses.Add(semiColonCorrection(address.ToString()));
                }
                address = new StringBuilder();
                address.Append(part);
            }
            else
            {
                // Otherwise the part continues the current address
                address.AppendFormat(";{0}", part);
            }
        }
        addresses.Add(semiColonCorrection(address.ToString()));
        foreach (string emailAddress in addresses)
        {
            Console.WriteLine(emailAddress);
        }
        Console.ReadKey();
    }
    // Addresses starting with x/X (X400/X500) get their trailing ';' back if the split removed it
    private static string semiColonCorrection(string address)
    {
        if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
        {
            return string.Format("{0};", address);
        }
        else
        {
            return address;
        }
    }
}

Try these regexes. You can extract what you're looking for using named groups.
X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)
Make sure you specify case-insensitive matching (RegexOptions.IgnoreCase) when constructing them. They seem to work with the samples you gave.

Lots of attempts. Here is mine ;)
using System;
using System.Text.RegularExpressions;
string src = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
Regex r = new Regex(@"
    (?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
    (?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
    (?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
foreach (Match m in r.Matches(src))
{
    if (m.Groups["smtp"].Captures.Count != 0)
        Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
    else if (m.Groups["X400"].Captures.Count != 0)
        Console.WriteLine("X400: {0}", m.Groups["X400"]);
    else if (m.Groups["X500"].Captures.Count != 0)
        Console.WriteLine("X500: {0}", m.Groups["X500"]);
}
This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.

This works!
using System;
using System.Collections.Generic;
string input =
    "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string[] parts = input.Split(';');
List<string> output = new List<string>();
foreach (string part in parts)
{
    if (part.Contains(":"))
    {
        // A part with ':' starts a new address
        output.Add(part + ";");
    }
    else if (part.Length > 0)
    {
        // Otherwise it belongs to the previous address
        output[output.Count - 1] += part + ";";
    }
}
foreach (string s in output)
{
    Console.WriteLine(s);
}

Do the semicolon (;) split and then loop over the result, re-combining each element where there is no colon (:) with the previous element.
using System;
using System.Collections.Generic;
string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
    + "Black;;smtp:jblack@test.com;SMTP:jb@test.com";
string[] rawSplit = input.Split(';');
List<string> result = new List<string>();
// now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
    if (buffer == string.Empty)
    {
        buffer = s;
    }
    else if (s.Contains(":"))
    {
        // A colon means a new address starts here
        result.Add(buffer);
        buffer = s;
    }
    else
    {
        // No colon: re-join with the previous element
        buffer += ";" + s;
    }
}
result.Add(buffer);
foreach (string s in result)
    Console.WriteLine(s);

Here is another possible solution:
string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });
bits[0], bits[1] and bits[2] will then contain the three parts, in the order they appear in your original string.
