The best way to split a string without a separator - c#

I have a string:
MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938
From the string above you can see that it is separated by a colon (:), but in the actual environment it does not have a standard layout.
The only fixed parts are the field names, for example MONEY-ID and MONEY-STAT.
How can I split it the right way and get the value that follows each field name?

Something like that should work:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Regex regex = new Regex(@"MONEY-ID(?<moneyId>.*?)\:MONEY-STAT(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$");
Match match = regex.Match(s);
if (match.Success)
{
Console.WriteLine("Money ID: " + match.Groups["moneyId"].Value);
Console.WriteLine("Money Stat: " + match.Groups["moneyStat"].Value);
Console.WriteLine("Money Paetr: " + match.Groups["moneyPaetr"].Value);
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
UPDATE
To answer the additional question: if we're not sure of the format, something like the following could be used:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
var itemsToExtract = new List<string> { "MONEY-STAT", "MONEY-PAYetr-", "MONEY-ID" };
string regexFormat = @"{0}(?<{1}>\d+)"; // sample - MONEY-ID(?<match>\d+); the greedy \d+ stops at the first non-digit
foreach (var item in itemsToExtract)
{
// Regex.Escape keeps the field name literal inside the pattern
var match = Regex.Match(s, string.Format(regexFormat, Regex.Escape(item), "match"));
if (match.Success)
{
Console.WriteLine("Value of {0} is: {1}", item, match.Groups["match"].Value);
}
}
Console.WriteLine("hit <enter>");
Console.ReadLine();

As Andre said, I would personally go with regular expressions.
Use named groups, something like:
"MONEY-ID(?<moneyid>.*)MONEY-STAT(?<moneystat>.*)MONEY-PAYetr(?<moneypay>.*)"
See this post for how to extract the groups.
This would probably be followed by a private method that trims unwanted characters (e.g. : or -) from each matched group.
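The trimming step described above could look roughly like this. It is a minimal sketch: the class name MoneyFieldParser, the helper TrimSeparators and the choice of characters to trim (:, - and space) are assumptions for illustration, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class MoneyFieldParser
{
    // Named groups as in the answer above. Each greedy .* also captures the
    // separator that follows the value, e.g. "123456:".
    private static readonly Regex Pattern = new Regex(
        "MONEY-ID(?<moneyid>.*)MONEY-STAT(?<moneystat>.*)MONEY-PAYetr(?<moneypay>.*)");

    // Hypothetical helper: the set of characters to trim is an assumption.
    private static string TrimSeparators(string value) => value.Trim(':', '-', ' ');

    // Returns null when the input does not contain all three field names in order.
    public static (string Id, string Stat, string Pay)? Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return null;

        return (TrimSeparators(m.Groups["moneyid"].Value),
                TrimSeparators(m.Groups["moneystat"].Value),
                TrimSeparators(m.Groups["moneypay"].Value));
    }
}
```

Because the captures are greedy, each one ends with the ':' separator, which is why the trim is needed; a value that legitimately starts or ends with '-' would lose that character too.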

Check this out:
string regex = @"^(?i:money-id)(?<moneyid>.*)(?i:money-stat)(?<moneystat>.*)(?i:money-pay)(?<moneypay>.*)$";
string input = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Match regexMatch = Regex.Match(input, regex);
// Trim() only removes whitespace, so TrimEnd(':') drops the separator captured by each greedy .*
string moneyID = regexMatch.Groups["moneyid"].Value.Trim().TrimEnd(':');
string moneyStat = regexMatch.Groups["moneystat"].Value.Trim().TrimEnd(':');
string moneyPay = regexMatch.Groups["moneypay"].Value.Trim().TrimEnd(':');

Try this:
string data = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
data = data.Replace("MONEY-", ";");
string[] myArray = data.Split(';');
foreach (string s in myArray)
{
if (!string.IsNullOrEmpty(s))
{
if (s.StartsWith("ID"))
{
}
else if (s.StartsWith("STAT"))
{
}
else if (s.StartsWith("PAYetr"))
{
}
}
}
This results in:
ID123456:
STAT43:
PAYetr-1232832938
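For illustration, the empty branches could be filled in to collect the values into a dictionary. This sketch assumes the prefixes are exactly ID, STAT and PAYetr- and that no value itself contains MONEY-; the class MoneySplitter and the dictionary keys are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class MoneySplitter
{
    // Hypothetical completion of the Replace/Split approach above.
    // Assumes the prefixes are exactly "ID", "STAT" and "PAYetr-" and that no value contains "MONEY-".
    public static Dictionary<string, string> Extract(string data)
    {
        var result = new Dictionary<string, string>();
        string[] parts = data.Replace("MONEY-", ";")
                             .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string raw in parts)
        {
            // Drop the ':' that stays at the end of every part except the last one.
            string s = raw.TrimEnd(':');

            if (s.StartsWith("ID", StringComparison.Ordinal))
                result["ID"] = s.Substring("ID".Length);
            else if (s.StartsWith("STAT", StringComparison.Ordinal))
                result["STAT"] = s.Substring("STAT".Length);
            else if (s.StartsWith("PAYetr-", StringComparison.Ordinal))
                result["PAYetr"] = s.Substring("PAYetr-".Length);
        }
        return result;
    }
}
```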

For example, using a regular expression with a lookbehind:
(?<=MONEY-ID)\d+
It will extract
123456
from your string.
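The same lookbehind idea generalizes to any field name. A sketch, assuming every value is a run of digits directly after its field name (the class LookbehindExtractor and its method ValueAfter are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class LookbehindExtractor
{
    // Returns the digits directly after fieldName, or null if the field is missing.
    // Regex.Escape protects against field names that contain regex metacharacters.
    public static string ValueAfter(string input, string fieldName)
    {
        Match m = Regex.Match(input, "(?<=" + Regex.Escape(fieldName) + @")\d+");
        return m.Success ? m.Value : null;
    }
}
```

For the sample string, ValueAfter(s, "MONEY-PAYetr-") would return the digits after the last field name.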

Related

How to read a specific line and text from a text file

string lot = "RU644276G01";
var year = "201" + lot.Substring(2, 1);
var folder = @"\\sinsdn38.ap.infineon.com\ArchView\03_Reports\" + year +
@"\" + lot.Substring(3, 2) + @"\" + lot.Substring(0,8) + @"\";
DirectoryInfo di = new DirectoryInfo(folder);
foreach (var fi in di.GetFiles("*.TLT"))
{
var file = fi.FullName;
string line;
using (StreamReader sr = new StreamReader(file))
{
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith("TEST-END"))
{
timeStampTextBox.Text = line;
}
}
}
}
This is my code currently.
I want to read from a specific line (for example line 8) that starts with "TEST-END". However, line 8 contains all of this:
"TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193"
but I only want to read "2017-01-08 15:51".
How do I change my code to get that? Currently I'm getting the whole line instead of just the timestamp I want.
Edit
How do I change the code so that the value of lot is not hard-coded? It should not have to be RU644276G01; it can be a different number typed in by the user. I have created a textbox for users to input the number.
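For the follow-up about a user-supplied lot number, one option is to move the path construction into a method that takes the textbox text as a parameter. This is a sketch that assumes every lot number follows the same layout as RU644276G01 (at least 8 characters, with the year digit at index 2); LotFolder, Build and lotTextBox are hypothetical names.

```csharp
using System;
using System.IO;

public static class LotFolder
{
    // Hypothetical helper: builds the report folder for a lot number the user typed,
    // using the same Substring layout as the hard-coded "RU644276G01" example.
    public static string Build(string baseFolder, string lot)
    {
        lot = (lot ?? string.Empty).Trim();

        // The layout reads characters 0..7, so anything shorter cannot be mapped.
        if (lot.Length < 8)
            throw new ArgumentException("Lot number must have at least 8 characters.", nameof(lot));

        // "201" + one digit assumes a year in the 2010s, exactly as the original code does.
        string year = "201" + lot.Substring(2, 1);
        return Path.Combine(baseFolder, year, lot.Substring(3, 2), lot.Substring(0, 8));
    }
}
```

In the form, the call might be LotFolder.Build(@"\\sinsdn38.ap.infineon.com\ArchView\03_Reports", lotTextBox.Text), followed by the existing DirectoryInfo/GetFiles loop.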
You need to extract the text. It follows a fairly regular pattern, so a regular expression should help:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
Regex re = new Regex(@"^(?:TEST-END : )(.*?\d{4}-\d{2}-\d{2} \d{2}:\d{2})");
var match = re.Match(line);
Console.WriteLine(match.Groups[1]);
Console.ReadLine(); // leave console open
}
}
Output:
2017-01-08 15:51 // this is group 1; group 0 is the full match, including "TEST-END : "
You can check it on regexr: https://regexr.com/3l1sf. If you hover over the text, it displays your capturing groups.
The regex means:
^ start of the string
(?:TEST-END : ) non capturing group, text must be present
( start of capturing group 1
.*? as few (0 to n) characters of any kind as possible
\d{4}-\d{2}-\d{2} \d{2}:\d{2} 4 digits-2 digits-2 digits, a space, 2 digits:2 digits
) end of group
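If the timestamp is needed as a DateTime rather than as text, the captured group can be parsed with an explicit format. A sketch, assuming the timestamp always uses a 24-hour yyyy-MM-dd HH:mm layout; TestEndParser is a hypothetical name:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TestEndParser
{
    // Same idea as the regex explained above, but group 1 holds only the timestamp.
    private static readonly Regex TestEnd =
        new Regex(@"^TEST-END\s*:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})");

    // Returns null for lines that do not start with a TEST-END timestamp.
    public static DateTime? Parse(string line)
    {
        Match m = TestEnd.Match(line);
        if (!m.Success)
            return null;

        // An explicit format and InvariantCulture keep the result independent of the
        // machine's regional settings. "HH" assumes a 24-hour clock.
        return DateTime.ParseExact(m.Groups[1].Value, "yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);
    }
}
```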
More about regular expressions:
Regex class
A regex tester (one of many; the one I use): https://regexr.com/
Here is my answer using regular expressions:
if (line.StartsWith("TEST-END"))
{
Regex re = new Regex(@"\d{4}-\d{2}-\d{2} \d{2}:\d{2}");
var match = re.Match(line);
if (match.Success)
{
timeStampTextBox.Text = match.Value;
}
}
Output: 2017-01-08 15:51
You can split the line on ':', like this:
var value = line.Split(':');
and then rebuild your date like this:
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
The statement above amounts to (once the surrounding spaces are trimmed)
date = "2017-01-08 15" + ":" + "51"
if (line.StartsWith("TEST-END"))
{
var value = line.Split(':');
var date = (value[1] + ":" + value[2].Replace("PROGRAM", "")).Trim();
timeStampTextBox.Text = date;
}
This is not the most robust answer; it only works for exactly the line format you gave.
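A variant of the split idea that avoids cutting through the time is to split on " : " (space, colon, space) instead of ':'. This sketch assumes the separators look exactly like the sample line and that the field after TEST-END has a one-word name; SplitTimestamp is a hypothetical name:

```csharp
using System;

public static class SplitTimestamp
{
    // Splitting on " : " rather than ':' keeps the colon inside "15:51" intact.
    // Assumes the "NAME : value NAME : value" layout from the question.
    public static string Extract(string line)
    {
        string[] parts = line.Split(new[] { " : " }, StringSplitOptions.None);
        if (parts.Length < 2 || parts[0].Trim() != "TEST-END")
            return null;

        // With only two parts, the value runs to the end of the line.
        if (parts.Length == 2)
            return parts[1].Trim();

        // Otherwise parts[1] is "value NEXTNAME", e.g. "2017-01-08 15:51 PROGRAM";
        // assumes the next field name contains no spaces.
        string valueAndNextName = parts[1].Trim();
        int lastSpace = valueAndNextName.LastIndexOf(' ');
        return lastSpace < 0 ? valueAndNextName : valueAndNextName.Substring(0, lastSpace);
    }
}
```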
I finally got all three parameters out of the line:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, string> dict = new Dictionary<string, string>();
string pattern = @"(?'name'[^\s]+)\s:\s(?'value'[\w\s\-]*|\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})";
string line = "TEST-END : 2017-01-08 15:51 PROGRAM : TLE8888QK-B2 BAU-NR : 95187193";
MatchCollection matches = Regex.Matches(line, pattern, RegexOptions.RightToLeft);
foreach (Match match in matches)
{
Console.WriteLine("name : '{0}', value : '{1}'", match.Groups["name"].Value, match.Groups["value"].Value);
dict.Add(match.Groups["name"].Value, match.Groups["value"].Value);
}
DateTime date = DateTime.Parse(dict["TEST-END"]);
Console.ReadLine();
}
}
}

I want to read text and replace parts that are surrounded by signals

I have a chunk of text, for example:
string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
I'm struggling to write a function that I can pass this text into and that returns the same string, but with the values surrounded by the tags "<!" and "!>" replaced with actual values. I have some code written but don't know how to progress any further.
if(OriginalText.Contains("<!")) //Checks if Change is necessary
{
string[] Total = OriginalText.Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries);
if(Total[1].Contains("!>")) //Checks if closing tag exists
{
string ExtTag = Total[1].Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries)[0];
ExtData.Add(Total[1].Split(
new Char[] { '<', '!' },
StringSplitOptions.RemoveEmptyEntries)[0]);
return Total[1];
}
}
The desired output would be:
"Hello my name is James and I am 21 years old"
I am currently getting this text from a database, so this function's purpose would be to read that text and insert the correct information.
Edit: I figured out how to do it, so I'm including my solution below. However, I'm writing this in a program called MatterSphere, so there will be references to functions that aren't standard C#; I have put comments next to them explaining what they do.
private string ConvertCodeToExtData(string OriginalText) //Accepts text with the identifying symbols as placeholders
{
string[] OriginalWords = OriginalText.Split(' '); //Creates array of individual words
string ConvertedText = string.Empty;
int Index = 0;
foreach(string OriginalWord in OriginalWords) //Go through each word in the array
{
if(OriginalWord.StartsWith("<") && OriginalWord.EndsWith(">")) //Checks if change is necessary (StartsWith/EndsWith are also safe for empty words caused by double spaces)
{
string[] ExtDataCodeAndSymbols = OriginalWord.Substring(1, OriginalWord.Length-2).Split('.'); //The text between < and > has 4 dot-separated parts: prefix symbol (e.g. £, $, #) . area to look in . field . suffix symbol
try
{
foreach(ExtendedData ex in this.CurrentSession.CurrentFile.ExtendedData) //Search through All data connected to the file, Extended data is essentially all the data from the database that is specific to the current user
{
if(ex.Code.ToLower() == ExtDataCodeAndSymbols[1].ToLower())
{
OriginalWords[Index] = ExtDataCodeAndSymbols[0] + ex.GetExtendedData(ExtDataCodeAndSymbols[2]).ToString() + ExtDataCodeAndSymbols[3]; //Replace code with new data
break;
}
}
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show("Extended Data Field " + ExtDataCodeAndSymbols[1] + "." + ExtDataCodeAndSymbols[2] + " Not found, please speak to your system administrator"); //Handles Error if Ext Data is not found
}
}
Index++;
}
ConvertedText = string.Join(" ", OriginalWords); //Joins all words into a single string, separated by single spaces
return ConvertedText;
}
The text goes in as "Hello my name is <.Person.name.> and I have <£.Account.Balance.> in my bank account" and comes out as "Hello my name is James and I have £100 in my bank account".
The symbols are optional, but the dots are required, because the function splits the tag contents on them.
If you have to use <!...!> placeholders, I suggest regular expressions:
using System.Text.RegularExpressions;
...
string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
Dictionary<string, string> substitutes =
new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
{ "name", "John" },
{ "age", "108"},
};
string result = Regex
.Replace(OriginalText,
@"<!([A-Za-z0-9]+)!>", // let placeholders contain letters and digits
match => substitutes[match.Groups[1].Value]);
Console.WriteLine(result);
Outcome:
Hello my name is John and I am 108 years old
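Note that substitutes[...] throws KeyNotFoundException when the text contains a placeholder with no entry in the dictionary. A sketch of a more forgiving variant that leaves unknown placeholders unchanged (Placeholders.Fill is a hypothetical name):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // Same pattern as the answer above: "<!", letters/digits, "!>".
    private static readonly Regex Tag = new Regex(@"<!([A-Za-z0-9]+)!>");

    // Replaces each known placeholder; unknown ones are kept verbatim (m.Value)
    // instead of throwing KeyNotFoundException.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Tag.Replace(template,
            m => values.TryGetValue(m.Groups[1].Value, out string value) ? value : m.Value);
    }
}
```

Case-insensitive matching of placeholder names still comes from the dictionary's comparer, e.g. StringComparer.OrdinalIgnoreCase as in the answer above.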
Assuming you are stuck with that format, and assuming you know the list of fields ahead of time, you can compose a dictionary of replacement strings and, well, replace them.
//Initialize fields dictionary
var fields = new Dictionary<string, string>();
fields.Add("name", "John");
fields.Add("age", "18");
//Replace each field if it is found
string text = OriginalText;
foreach (var key in fields.Keys)
{
string searchFor = "<!" + key + "!>";
text = text.Replace(searchFor, fields[key]);
}
If the values for the replacement fields come from a domain object, you could just iterate over the properties using reflection:
using System.Collections.Generic;
using System.Reflection; // BindingFlags
class Person
{
public string Name { get; set; }
public int Age { get; set; }
}
class Program
{
const string OriginalText = "Hello my name is <!name!> and I am <!age!> years old";
public static void Main()
{
var p = new Person();
p.Age = 18;
p.Name = "John";
//Initialize fields dictionary
var fields = new Dictionary<string, string>();
foreach (var prop in typeof(Person).GetProperties(BindingFlags.Public | BindingFlags.Instance))
{
fields.Add(prop.Name, prop.GetValue(p).ToString());
}
///etc....
And if you need the tag check to be case insensitive, you can use this instead of String.Replace():
string searchFor = @"\<\!" + Regex.Escape(key) + @"\!\>";
text = Regex.Replace(text, searchFor, fields[key], RegexOptions.IgnoreCase);
I think you're looking for this:
var str = string.Format("Hello my name is {0} and I am {1} years old", name, age);
Or, since C# 6, you can just use this:
var str = $"Hello my name is {name} and I am {age} years old";

C# Check string for specific length of numbers

I can search for and return the files in a given folder. I can also return the sequence of digits from each file name, like this:
public List<AvailableFile> GetAvailableFiles(string rootFolder)
{
List<AvailableFile> files = new List<AvailableFile>();
if (Directory.Exists(rootFolder))
{
Log.Info("Checking folder: " + rootFolder + " for files");
try
{
foreach (string f in Directory.GetFiles(rootFolder))
{
files = FileUpload.CreateFileList(f);
var getNumbers = new String(f.Where(Char.IsDigit).ToArray());
System.Diagnostics.Debug.WriteLine(getNumbers);
}
}
catch (System.Exception excpt)
{
Log.Fatal("GetAvailableFiles failed: " + excpt.Message);
}
}
return files;
}
What I want to do now is return only a sequence of digits that is exactly 8 characters long. For example, for a file named New File1 12345678 123, I only care about getting 12345678 back.
How can I modify my method to achieve this?
A regex seems to be good for this:
var r = new Regex(@".*?(\d{8})"); // lazy .*? so the first run of 8 digits is captured
foreach (string f in Directory.GetFiles(rootFolder))
{
files = FileUpload.CreateFileList(f);
var match = r.Match(f);
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // be aware that index zero contains the entire matched string
}
}
The regex matches the first occurrence of 8 consecutive digits and puts it into the Groups collection.
You could use a regular expression:
var match = Regex.Match(input, @"\d{8}");
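The regex answers above can also match 8 digits that are part of a longer number, for example 23456789 or 12345678 out of 123456789. If the run must be exactly 8 digits, lookarounds can rule out neighbouring digits. A sketch, assuming only the file name (not the folder path) should be searched; FileNumber.FindEightDigits is a hypothetical name:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileNumber
{
    // (?<!\d) and (?!\d) reject runs longer than 8 digits.
    private static readonly Regex EightDigits = new Regex(@"(?<!\d)\d{8}(?!\d)");

    // Searches only the file name part of the path, so digits in folder names are ignored.
    public static string FindEightDigits(string path)
    {
        Match m = EightDigits.Match(Path.GetFileName(path));
        return m.Success ? m.Value : null;
    }
}
```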

Regex find all occurrences of a pattern in a string

I have a problem finding all occurrences of a pattern in a string.
Consider this string:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
I want to return the 2 occurrences (in order to decode them later):
=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
and
=?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=
With the following regex code, it returns only 1 occurrence: the full string.
var charSetOccurences = new Regex(@"=\?.*\?B\?.*\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
}
Do you know what I'm missing?
When the regex engine sees .*, it matches everything up to the end of the string and then backtracks character by character (a greedy match). To avoid the problem, you can use a non-greedy match or explicitly define which characters may appear inside an encoded word (including + and /, which Base64 can contain):
@"=\?[a-zA-Z0-9?=+/-]*\?B\?[a-zA-Z0-9?=+/-]*\?="
A non-regex way:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
string[] charSetOccurences = msg.Split(new string[]{ " " }, StringSplitOptions.None);
foreach (string s in charSetOccurences)
{
string charSet = s.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
Console.WriteLine(charSet);
}
See an ideone.
If you still want to use a regex, make the .* lazy by adding a ?. Previous answers already mention this, but it seems you are not getting matches?
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?B\?.*?\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
string charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
Console.WriteLine(charSet);
}
See another ideone.
The output is the same in both cases:
windows-1258UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
windows-1258IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=
EDIT: Following your update, here is an all-in-one solution for your problem:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?[BQ]\?.*?\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
string[] encoding = match.Groups[0].Value.Split(new string[]{ "?" }, StringSplitOptions.None);
string charSet = encoding[1];
string encodeType = encoding[2];
string encodedString = encoding[3];
Console.WriteLine("Charset: " + charSet);
Console.WriteLine("Encoding type: " + encodeType);
Console.WriteLine("Encoded String: " + encodedString + "\n");
}
Returns:
Charset: windows-1258
Encoding type: B
Encoded String: UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz
Charset: windows-1258
Encoding type: B
Encoded String: IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=
See this.
Or, since we already have the regex, we can use capturing groups:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?(.*?)\?([BQ])\?(.*?)\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
Console.WriteLine("Charset: " + match.Groups[1].Value);
Console.WriteLine("Encoding type: " + match.Groups[2].Value);
Console.WriteLine("Encoded String: " + match.Groups[3].Value + "\n");
}
Returns the same output.
.* is greedy and will match everything from the first ? to the last ?B?.
You need to use either a non-greedy match
=\?.*?\?B\?.*?\?=
or exclude ? from your list of characters
=\?[^?]*\?B\?[^?]*\?=
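Since the question mentions decoding the matches later, here is a rough sketch of that step for the Base64 (B) form. Assumptions: only B-encoded words are decoded (Q-encoded words are returned unchanged), whitespace between adjacent encoded words is dropped as RFC 2047 describes, and the charset name is one that Encoding.GetEncoding recognizes; EncodedWordDecoder is a hypothetical name.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordDecoder
{
    // One encoded word: =?charset?encoding?payload?= . [^?] keeps each part from
    // running into the next word, like the [^?] pattern above.
    private static readonly Regex Word = new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    // Whitespace between two adjacent encoded words is not part of the decoded text.
    private static readonly Regex GapBetweenWords = new Regex(@"\?=\s+=\?");

    // Sketch: only the Base64 ("B") form is decoded; "Q" words are returned unchanged.
    // Encoding.GetEncoding throws for charsets the runtime does not know.
    public static string Decode(string input)
    {
        string joined = GapBetweenWords.Replace(input, "?==?");
        return Word.Replace(joined, m =>
        {
            if (!string.Equals(m.Groups[2].Value, "B", StringComparison.OrdinalIgnoreCase))
                return m.Value;

            Encoding encoding = Encoding.GetEncoding(m.Groups[1].Value);
            byte[] bytes = Convert.FromBase64String(m.Groups[3].Value);
            return encoding.GetString(bytes);
        });
    }
}
```

On .NET Core and .NET 5+, legacy code pages such as windows-1258 are not available by default; calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first makes them available.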

RegEx -- getting rid of double whitespaces?

I have an app that replaces "invalid" characters (as defined by my regex) with a space. If there are 2 or more spaces in a row in the filename, I want to collapse them into one. For example:
Deal A & B.txt would be renamed by my app to Deal A   B.txt (3 spaces between A and B). What I really want is this: Deal A B.txt (one space between A and B).
I'm trying to work out how to do this. I suppose my app will have to run through all the filenames once to replace invalid characters and then again to get rid of the extra whitespace.
Can anybody help me with this?
Here is my code currently for replacing the invalid chars:
public partial class CleanNames : Form
{
public CleanNames()
{
InitializeComponent();
}
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
StreamWriter errors = new StreamWriter(@"S:\Testing\Errors.txt", true);
var filesCount = new Dictionary<string, int>();
dataGridView1.Rows.Clear();
try
{
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
if (!System.IO.File.Exists(sanitized))
{
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = sanitizedFileName;
dataGridView1.Rows.Add(clean);
System.IO.File.Move(files2, sanitized);
}
else
{
if (filesCount.ContainsKey(sanitized))
{
filesCount[sanitized]++;
}
else
{
filesCount.Add(sanitized, 1);
}
string newFileName = String.Format("{0}{1}{2}",
System.IO.Path.GetFileNameWithoutExtension(sanitized),
filesCount[sanitized].ToString(),
System.IO.Path.GetExtension(sanitized));
string newFilePath = System.IO.Path.Combine(System.IO.Path.GetDirectoryName(sanitized), newFileName);
System.IO.File.Move(files2, newFilePath);
sanitized = newFileName;
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = newFileName;
dataGridView1.Rows.Add(clean);
}
}
}
catch (Exception e)
{
errors.Write(e);
}
}
private void SanitizeFileNames_Load(object sender, EventArgs e)
{ }
private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
{
}
private void button1_Click(object sender, EventArgs e)
{
Application.Exit();
}
}
The problem is that not all files will have the same number of spaces after a rename. For example, Deal A&B.txt becomes Deal A B.txt (1 space between A and B, which is fine). But I also have files like Deal A & B & C.txt, which after a rename becomes Deal A   B   C.txt (3 spaces between A, B and C, which is not acceptable).
Does anybody have any ideas/code for how to accomplish this?
Do the local equivalent of:
s/\s+/ /g;
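In C#, the equivalent of that Perl substitution is a single Regex.Replace call. A minimal sketch (WhitespaceCollapser is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // C# counterpart of Perl's s/\s+/ /g: every run of whitespace becomes one space.
    public static string Collapse(string s) => Regex.Replace(s, @"\s+", " ");
}
```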
Just add a space to your regPattern. Any run of invalid characters and spaces will then be replaced with a single space. You may waste a little time replacing a single space with a space, but you won't need a second string manipulation call.
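With the space added, the character class from the question would look like this. A sketch; note that it only folds plain spaces, not tabs, into the run (FileNameSanitizer is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class FileNameSanitizer
{
    // The question's character class plus a space: a run such as " & " is
    // replaced by a single space in one pass.
    private static readonly Regex Invalid = new Regex(@"[~#&$!%+{} ]+");

    public static string Sanitize(string fileName) => Invalid.Replace(fileName, " ");
}
```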
Does this help?
var regex = new System.Text.RegularExpressions.Regex("\\s{2,}");
var result = regex.Replace("Some text with a lot of spaces, and 2\t\ttabs.", " ");
Console.WriteLine(result);
output is:
Some text with a lot of spaces, and 2 tabs.
It just replaces any sequence of 2 or more whitespace characters with a single space...
Edit:
To clarify, I would just perform this regex right after your existing one:
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
Regex regExPattern2 = new Regex(@"\s{2,}");
and:
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
sanitizedFileName = regExPattern2.Replace(sanitizedFileName, replacement); // clean up whitespace
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
I hope that makes more sense.
You can perform another regex replace after your first one:
@" +" -> " "
As Fosco said, with formatting:
while (mystring.Contains("  ")) mystring = mystring.Replace("  ", " "); // "  " is two spaces, " " is one
After you're done sanitizing it your way, simply replace 2 spaces with 1 space, while 2 spaces exist in the string.
I think that's the right syntax...
