How should we reverse format & extract a string in C#?

How should we reverse format & extract a string in C#? - c#

Pretty sure we all do string.format and define some string in a specifed format.
I have a string which is always formatted in a way like this:
const string myString = string.Format("pt:{0}-first:{1}", inputString);
To get {0}, I can always check for pt:{ and read till }.
But what is the best/recommended way to extract {0} & {1} from the above variable myString ?

A Regex version of answer, but again, assuming your input doesnt contain '-'
var example = = "pt:hello-first:23";
var str = "pt:(?<First>[^-]+)-first:(?<Second>[^%]+)";
var match = new Regex(str).Match(example);
var first = match.Groups["First"].Value;
var second = match.Groups["Second"].Value;
It might be a good idea that you define what your variable can/cannot contain.

Not sure if this is the best way to do it, but it's the most obvious:
string example = "pt:123-first:456";
var split = example.Split('-');
var pt = split[0].Substring(split[0].IndexOf(':') + 1);
var first = split[1].Substring(split[1].IndexOf(':') + 1);
As Shawn said, if you can guarantee that the variables wont contain either : or - this will be adequate.

Related

Remove part of a string between an start and end

Code first:
string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>"
// some code to handle myString and save it in myEditedString
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>
I want to remove <at>onePossibleName</at> from myString. The string onePossibleName and disPossbileName could be any other string.
So far I am working with
string myEditedString = string.Join(" ", myString.Split(' ').Skip(1));
The problem here would be that if onePossibleName becomes one Possible Name.
Same goes for the try with myString.Remove(startIndex, count) - this is not the solution.

There will be different method depending on what you want, you can go with a IndexOf and a SubString, regex would be a solution too.
// SubString and IndexOf method
// Usefull if you don't care of the word in the at tag, and you want to remove the first at tag
if (myString.Contains("</at>"))
{
var myEditedString = myString.Substring(myString.IndexOf("</at>") + 5);
}
// Regex method
var stringToRemove = "onePossibleName";
var rgx = new Regex($"<at>{stringToRemove}</at>");
var myEditedString = rgx.Replace(myString, string.Empty, 1); // The 1 precise that only the first occurrence will be replaced

You could use this generic regular expression.
var myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>";
var rg = new Regex(#"<at>(.*?)<\/at>");
var result = rg.Replace(myString, "").Trim();
This would remove all 'at' tags and the content between. The Trim() call is to remove any white space at the beginning/end of the string after the replacement.

string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>"
int sFrom = myString.IndexOf("<at>") + "<at>".Length;
int sTo = myString.IndexOf("</at>");
string myEditedString = myString.SubString(sFrom, sFrom - sTo);
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>

C# String Format give wrong result

I'm trying to format a string of 6 digits with the following format:
1234/56.
I'm using the following:
string.Format("123456", "{0000/00}");
Result 123456

Try something like
var number = int.Parse("123456");
var formattedNumber = number.ToString("####/##", CultureInfo.InvariantCulture);
Edit to include zero fill (if required):
var number = int.Parse("123456");
var formattedNumber = number.ToString("####/##", CultureInfo.InvariantCulture).PadLeft(7,'0');
You will still have issues with any number below 10 (i.e "000009") showing up as 0000/9. In which you are probably better off with a substring modification like below.
var text = "123456";
var formattedNumber = text.Substring(0, 4) + "/" + text.Substring(4, 2);

Please try the below code.
string.Format("{0:####/##}",123456)
I tested the code and is working.
For reference on string formatting please visit:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings

If you are using numbers I would use #rob-s 's example above. If you are dealing with actual strings then you can use the following code sample.
var value = "123456";
var index = value.Length - 2;
var formattedValue = $"{value.Substring(0, index)}/{value.Substring(index)}";
Console.WriteLine(formattedValue);
Your output will look like what you have described in your question.
1234/56

The fastest way to trim string in C#

I need to trim paths in million strings like this:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
to
src\my_component\my_file.cpp
I.e. remove absolute part of the path, what is the fastest way to do that?
My try using regex:
Regex.Replace(path, #"(.*?)\src", ""),

I wouldn't go with regex for this, use the plain old method.
If the path prefix is always the same:
const string partToRemove = #"C:\workspace\my_projects\my_app\";
if (path.StartsWith(partToRemove, StringComparison.OrdinalIgnoreCase))
path = path.Substring(partToRemove.Length);
If the prefix is variable, you can get the last index of \src\:
var startIndex = path.LastIndexOf(#"\src\", StringComparison.OrdinalIgnoreCase);
if (startIndex >= 0)
path = path.Substring(startIndex + 1);

define the regex with a new and reuse it
there is a (significant) cost to creating the regex
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);

I'm not sure if you need speed here, but if you always get the full path, you could do a simple .Substring()
var path = #"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
Console.WriteLine(path.Substring(32));
However, I think you should sanitize your input first; in this case, the Uri class could do the parsing step:
var root = #"C:\workspace\my_projects\my_app\";
var path = #"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var relative = new Uri(root).MakeRelativeUri(new Uri(path));
Console.WriteLine(relative.OriginalString.Replace("/", "\\"));
Notice here the Uri will change the \ with a /: that's the .Replace reason.

Cant think any faster than this
path.Substring(33);
What is before src is constant. and it starts from index 33.
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
^
How ever if its not always constant. you can find it once. and do the rest inside loop.
int startInd = path.IndexOf(#"\src\") + 1;
// Do this inside loop. 1 million times
path.Substring(startInd);

If your files will all end in "src/filename.ext" you could use the Path class in the .NET framework for it and get around all caveats you could have with pathes and filenames:
result = "src\" + Path.GetFileName(path);
So you should first double-check that the conversion is the thing that takes to long.

Getting substring between two separators in an arbitrary position

I have following string:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
The / character is the separator between various elements in the string. I need to get the last two elements of the string. I have following code for this purpose. This works fine. Is there any faster/simpler code for this?
CODE
static void Main()
{
string component = String.Empty;
string version = String.Empty;
string source = "Test/Company/Business/Department/Logs.tvs/v1";
if (!String.IsNullOrEmpty(source))
{
String[] partsOfSource = source.Split('/');
if (partsOfSource != null)
{
if (partsOfSource.Length > 2)
{
component = partsOfSource[partsOfSource.Length - 2];
}
if (partsOfSource.Length > 1)
{
version = partsOfSource[partsOfSource.Length - 1];
}
}
}
Console.WriteLine(component);
Console.WriteLine(version);
Console.Read();
}

Why no regular expression? This one is fairly easy:
.*/(?<component>.*)/(?<version>.*)$
You can even label your groups so for your match all you need to do is:
component = myMatch.Groups["component"];
version = myMatch.Groups["version"];

The following should be faster, as it only scans as much of the string as it needs to to find two / and it doesn't bother splitting up the whole string:
string component = "";
string version = "";
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int last = source.LastIndexOf('/');
if (last != -1)
{
int penultimate = source.LastIndexOf('/', last - 1);
version = source.Substring(last + 1);
component = source.Substring(penultimate + 1, last - penultimate - 1);
}
That said, as with all performance questions: profile! Try the two side-by-side with a big list of real-life inputs and see which is fastest.
(Also, this will leave empty strings rather than throw an exception if there is no slash in the input... but throw if source is null, lazy me.)

Your approach is the most suitable one given that your are looking for substrings at a particular index. A LINQ expression to do the same in this case will likely not improve the code or its readability.
For reference, there is some great information from Microsoft here on working with strings and LINQ. In particular see the article here which covers some examples with both LINQ and RegEx.
EDIT: +1 For Matt's named group within RegEx approach... that's the nicest solution I've seen.

Your code mostly looks fine. A couple of points to note:
String.Split() will never return null, so you don't need the null check on it.
If the source string has fewer than two / characters, how would you deal with that? (The Original Post was updated to address this)
Do you really want to just output empty strings if your source string is null or empty (or invalid)? If you have specific expectations about the nature of the input, you may want to consider failing fast when those expectations are not met.

You could try something like this but I doubt it would be much faster. You could do some meassurements with System.Diagnostics.StopWatch to see if you feel the need.
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int index1 = source.LastIndexOf('/');
string last = source.Substring(index1 + 1);
string substring = source.Substring(0, index1);
int index2 = substring.LastIndexOf('/');
string secondLast = substring.Substring(index2 + 1);

I would try
string source = "Test/Company/Business/Department/Logs.tvs/v1";
var components = source.Split('/').Reverse().Take(2);
String last = string.Empty;
var enumerable = components as string[] ?? components.ToArray();
if (enumerable.Count() == 2)
last = enumerable.FirstOrDefault();
var secondLast = enumerable.LastOrDefault();
Hope this will help

you can retrieve the last two words using the process as below:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
String[] partsOfSource = source.Split('/');
if(partsOfSourch.length>2)
for(int i=partsOfSourch.length-2;i<=partsOfSource.length-1;i++)
console.writeline(partsOfSource[i]);

Extracting data from plain text string

I am trying to process a report from a system which gives me the following code
000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}
I need to extract the values between the curly brackets {} and save them in to variables. I assume I will need to do this using regex or similar? I've really no idea where to start!! I'm using c# asp.net 4.
I need the following variables
param1 = 000
param2 = GEN
param3 = OK
param4 = 1 //Q
param5 = 1 //M
param6 = 002 //B
param7 = 3e5e65656-e5dd-45678-b785-a05656569e //I
I will name the params based on what they actually mean. Can anyone please help me here? I have tried to split based on spaces, but I get the other garbage with it!
Thanks for any pointers/help!

If the format is pretty constant, you can use .NET string processing methods to pull out the values, something along the lines of
string line =
"000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}";
int start = line.IndexOf('{');
int end = line.IndexOf('}');
string variablePart = line.Substring(start + 1, end - start);
string[] variables = variablePart.Split(' ');
foreach (string variable in variables)
{
string[] parts = variable.Split('=');
// parts[0] holds the variable name, parts[1] holds the value
}
Wrote this off the top of my head, so there may be an off-by-one error somewhere. Also, it would be advisable to add error checking e.g. to make sure the input string has both a { and a }.

I would suggest a regular expression for this type of work.
var objRegex = new System.Text.RegularExpressions.Regex(#"^(\d+)=\[([A-Z]+)\] ([A-Z]+) \{Q=(\d+) M=(\d+) B=(\d+) I=([a-z0-9\-]+)\}$");
var objMatch = objRegex.Match("000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}");
if (objMatch.Success)
{
Console.WriteLine(objMatch.Groups[1].ToString());
Console.WriteLine(objMatch.Groups[2].ToString());
Console.WriteLine(objMatch.Groups[3].ToString());
Console.WriteLine(objMatch.Groups[4].ToString());
Console.WriteLine(objMatch.Groups[5].ToString());
Console.WriteLine(objMatch.Groups[6].ToString());
Console.WriteLine(objMatch.Groups[7].ToString());
}
I've just tested this out and it works well for me.

Use a regular expression.
Quick and dirty attempt:
(?<ID1>[0-9]*)=\[(?<GEN>[a-zA-Z]*)\] OK {Q=(?<Q>[0-9]*) M=(?<M>[0-9]*) B=(?<B>[0-9]*) I=(?<I>[a-zA-Z0-9\-]*)}
This will generate named groups called ID1, GEN, Q, M, B and I.
Check out the MSDN docs for details on using Regular Expressions in C#.
You can use Regex Hero for quick C# regex testing.

You can use String.Split
string[] parts = s.Split(new string[] {"=[", "] ", " {Q=", " M=", " B=", " I=", "}"},
StringSplitOptions.None);

This solution breaks up your report code into segments and stores the desired values into an array.
The regular expression matches one report code segment at a time and stores the appropriate values in the "Parsed Report Code Array".
As your example implied, the first two code segments are treated differently than the ones after that. I made the assumption that it is always the first two segments that are processed differently.
private static string[] ParseReportCode(string reportCode) {
const int FIRST_VALUE_ONLY_SEGMENT = 3;
const int GRP_SEGMENT_NAME = 1;
const int GRP_SEGMENT_VALUE = 2;
Regex reportCodeSegmentPattern = new Regex(#"\s*([^\}\{=\s]+)(?:=\[?([^\s\]\}]+)\]?)?");
Match matchReportCodeSegment = reportCodeSegmentPattern.Match(reportCode);
List<string> parsedCodeSegmentElements = new List<string>();
int segmentCount = 0;
while (matchReportCodeSegment.Success) {
if (++segmentCount < FIRST_VALUE_ONLY_SEGMENT) {
string segmentName = matchReportCodeSegment.Groups[GRP_SEGMENT_NAME].Value;
parsedCodeSegmentElements.Add(segmentName);
}
string segmentValue = matchReportCodeSegment.Groups[GRP_SEGMENT_VALUE].Value;
if (segmentValue.Length > 0) parsedCodeSegmentElements.Add(segmentValue);
matchReportCodeSegment = matchReportCodeSegment.NextMatch();
}
return parsedCodeSegmentElements.ToArray();
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How should we reverse format & extract a string in C#? - c#

Related

Remove part of a string between an start and end

C# String Format give wrong result

The fastest way to trim string in C#

Getting substring between two separators in an arbitrary position

Extracting data from plain text string

Categories

Resources