The fastest way to trim string in C#

I need to trim the paths in a million strings like this:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
to
src\my_component\my_file.cpp
That is, I want to remove the absolute part of the path. What is the fastest way to do that?
My attempt using regex:
Regex.Replace(path, @"(.*?)\src", "")

I wouldn't use a regex for this; plain string methods are enough.
If the path prefix is always the same:
const string partToRemove = @"C:\workspace\my_projects\my_app\";
if (path.StartsWith(partToRemove, StringComparison.OrdinalIgnoreCase))
    path = path.Substring(partToRemove.Length);
If the prefix is variable, you can get the last index of \src\:
var startIndex = path.LastIndexOf(@"\src\", StringComparison.OrdinalIgnoreCase);
if (startIndex >= 0)
    path = path.Substring(startIndex + 1);
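As a rough illustration of the variable-prefix case applied to a whole batch, the sketch below wraps the LastIndexOf approach in a helper. The names PathTrimmer, TrimToSrc and TrimAll are made up for this example, and it assumes every path should be cut at its last \src\ segment; paths without one are returned unchanged.

```csharp
using System;
using System.Collections.Generic;

public static class PathTrimmer
{
    // Hypothetical marker: assumes the relative part starts at the last "\src\".
    private const string Marker = @"\src\";

    public static string TrimToSrc(string path)
    {
        int startIndex = path.LastIndexOf(Marker, StringComparison.OrdinalIgnoreCase);
        // +1 skips the leading backslash so the result starts with "src\".
        return startIndex >= 0 ? path.Substring(startIndex + 1) : path;
    }

    public static List<string> TrimAll(IEnumerable<string> paths)
    {
        var result = new List<string>();
        foreach (string path in paths)
            result.Add(TrimToSrc(path));
        return result;
    }
}
```

Calling PathTrimmer.TrimAll(paths) once over the full list keeps the per-string work to a single index search and a single Substring allocation.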

Create the Regex instance once with new and reuse it;
there is a (significant) cost to constructing a Regex.
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
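Applied to the path problem from the question, reusing one instance might look like the sketch below. The class name SrcPathRegex and the pattern are assumptions made for illustration. Note that in the question's pattern the sequence \s is the whitespace class, so \src does not match a literal backslash followed by src; the backslash has to be escaped as \\.

```csharp
using System.Text.RegularExpressions;

public static class SrcPathRegex
{
    // Built once and shared by every call. The lazy .*? stops at the first
    // backslash that is directly followed by "src\".
    private static readonly Regex Prefix =
        new Regex(@"^.*?\\(?=src\\)", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Trim(string path)
    {
        // Paths that do not contain "\src\" are returned unchanged.
        return Prefix.Replace(path, "");
    }
}
```

Unlike the LastIndexOf approach, this pattern cuts at the first \src\ in the path, so the two differ when a path contains more than one src directory.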

I'm not sure whether you need speed here, but if you always get the full path, you could do a simple .Substring():
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
Console.WriteLine(path.Substring(32));
However, I think you should sanitize your input first; in this case, the Uri class could do the parsing step:
var root = @"C:\workspace\my_projects\my_app\";
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var relative = new Uri(root).MakeRelativeUri(new Uri(path));
Console.WriteLine(relative.OriginalString.Replace("/", "\\"));
Note that Uri replaces \ with /, which is the reason for the .Replace call.

I can't think of anything faster than this:
path.Substring(32);
The part before src is constant, and src starts at index 32.
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
                                ^
However, if the prefix is not always constant, you can find the index once and do the rest inside the loop.
int startInd = path.IndexOf(@"\src\") + 1;
// Do this inside the loop, 1 million times
path.Substring(startInd);

If your files all end in "src/filename.ext", you could use the Path class in the .NET Framework and avoid the caveats that come with paths and file names:
result = @"src\" + Path.GetFileName(path);
Also, you should first double-check that this conversion is really the part that takes too long.

Related

Regular expression does not fit to pattern

I want to go through a list of header files and record the files that they include. My problem is that the pattern does not match.
In this link you can find the pattern which I thought will work: https://regex101.com/r/jbJLxT/3
string rgxPat = "\\#include\\s+\"(?:\\w+\\/)*(\\w +\\.(?:hed|he|hdb|h))\"";
Regex incLRgx = new Regex(rgxPat, RegexOptions.IgnoreCase);
for (int i = alrdyChckd; i < missFiles.Count; i++)
{
    tmpStr = baseSBFolder + "\\" + missFiles[i].getPath() + "\\" + missFiles[i].getName();
    System.IO.StreamReader actFile = new System.IO.StreamReader(tmpStr);
    while ((actLine = actFile.ReadLine()) != null)
    {
        Match match = incLRgx.Match(actLine);
        if (match.Success)
        {
            missFiles.Add(baseSB.getFileByName(match.Groups[1].Value.ToString()));
        }
    }
    alrdyChckd++;
}
I checked the variables in the debugger, and the Match function always reports failure, even though the pattern and the actual line seem to agree.
Another problem is that I cannot include double quotes the way I wanted with the string = @"[pattern]" form, because a double quote closes the pattern.

This will give you the paths:
/^\#include\s+"(.+)"$/gm
Output of $1:
FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_cpif.h
EbsPartition/EbsCluster/EbsCluster_generic/EbsCore/Src/ebscore_basetypes.h
FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_types.h
If you want just the filenames then use:
/^\#include\s+".*\/([^\/]+)"$/gm
and $1 will give you:
rsc_cpif.h
ebscore_basetypes.h
rsc_types.h

You can capture the group by name to get the file path:
\#include(?<path>\s+"(?:\w+\/)*(\w+\.(?:hed|he|hdb|h))")
match.Groups["path"].Value.ToString()
This will give you the file path captured as "FSW/CustSW/CustSW_generic/RSC/Src/gen/rsc_cpif.h"
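On the verbatim-string problem raised in the question: inside an @"..." literal, a double quote is written as two double quotes. The sketch below shows one way this could look for #include lines. The class name IncludeParser and the simplified pattern (anything between the quotes) are assumptions for illustration, not the pattern from the question.

```csharp
using System.Text.RegularExpressions;

public static class IncludeParser
{
    // Verbatim string: backslashes are literal and "" stands for one double quote.
    private static readonly Regex Include =
        new Regex(@"^\s*#include\s+""(?<path>[^""]+)""", RegexOptions.Compiled);

    // Returns the quoted include path, or null if the line is not an #include "..." line.
    public static string GetIncludePath(string line)
    {
        Match match = Include.Match(line);
        return match.Success ? match.Groups["path"].Value : null;
    }
}
```

Because the named group sits between the quotes, the captured value contains neither the quotes nor the leading whitespace. If only the file name is needed, the result can be passed to Path.GetFileName.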

How should we reverse format & extract a string in C#?

I'm pretty sure we all use string.Format to build strings in a specified format.
I have a string which is always formatted in a way like this:
const string myString = string.Format("pt:{0}-first:{1}", inputString);
To get {0}, I can always check for pt:{ and read till }.
But what is the best/recommended way to extract {0} & {1} from the above variable myString ?

A regex version of the answer, again assuming that your input doesn't contain '-':
var example = "pt:hello-first:23";
var str = "pt:(?<First>[^-]+)-first:(?<Second>[^%]+)";
var match = new Regex(str).Match(example);
var first = match.Groups["First"].Value;
var second = match.Groups["Second"].Value;
It might be a good idea to define what your variables can and cannot contain.

Not sure if this is the best way to do it, but it's the most obvious:
string example = "pt:123-first:456";
var split = example.Split('-');
var pt = split[0].Substring(split[0].IndexOf(':') + 1);
var first = split[1].Substring(split[1].IndexOf(':') + 1);
As Shawn said, if you can guarantee that the variables won't contain either : or -, this will be adequate.

Split a string at 2 points

I have a file called file_test1.txt and I want to extract just test1 from the name and place it in a string. What's the best way of doing this?
E.g.
string fullfile = @"C:\file_test1.txt";
string section = [test1] from fullfile; // <- expected result
I want to be able to split on 'file_' and '.txt', as the 'test1' section could be longer or shorter; 'file_' and '.txt', however, will always be the same.

Try Path.GetFileNameWithoutExtension(fullfile).Substring(5) (or Substring("TEMPLATE_PREFIX".Length))

You can try Split:
var test = Path.GetFileNameWithoutExtension(fullfile).Split('_')[1];

Try the following:
string fullfile = @"C:\file_test1.txt";
var name = fullfile.Substring(8, fullfile.Length - 12);
As C:\file_ and .txt are fixed, you can take a Substring starting at index 8 (skipping the leading part) with a length of the total string length minus 12 (12 is the combined length of the leading part and the trailing extension).

Thought I'd give a solution that uses Split and handles files with multiple underscores:
string.Join("_", Path.GetFileNameWithoutExtension(file).Split('_').Skip(1));

String.Split() works quite well for my uses:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx

Obviously many ways to accomplish this. Here's yet another approach:
string fullfile = @"C:\file_test1.txt";
int index1 = fullfile.LastIndexOf("file_");
if (index1 != -1)
{
    int index2 = fullfile.IndexOf(".", index1);
    if (index2 != -1)
    {
        string section = fullfile.Substring(index1 + 5, index2 - index1 - 5);
    }
}

You could also get "test1", or any subsequent filename (assuming your file naming convention remains constant!) using this regular expression:
var defaultRegex = new Regex(@"(?<=_).*(?=.txt)");
var matches = defaultRegex.Matches(fullfile);
var match = matches[0].Value;
The regular expression:
(?<=_).*(?=.txt)
uses positive look behind to find text preceded by '_', and also positive lookahead to find text which has '.txt' ahead of it.
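One caveat with that pattern is that the unescaped dot in .txt matches any character, so a name such as file_test1_txt would also match. A slightly stricter variant is sketched below. The class name SectionExtractor is made up for this example, and the pattern assumes the 'file_' prefix and the '.txt' extension described in the question.

```csharp
using System.Text.RegularExpressions;

public static class SectionExtractor
{
    // \. matches a literal dot and $ ties ".txt" to the end of the string.
    private static readonly Regex Section =
        new Regex(@"(?<=file_).*(?=\.txt$)", RegexOptions.Compiled);

    // Returns the part between "file_" and ".txt", or null if the name does not fit.
    public static string GetSection(string fullfile)
    {
        Match match = Section.Match(fullfile);
        return match.Success ? match.Value : null;
    }
}
```

Checking match.Success also avoids the exception that indexing matches[0] would throw when nothing matches.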

Shorthand way to remove last forward slash and trailing characters from string

If I have the following string:
/lorem/ipsum/dolor
and I want this to become:
/lorem/ipsum
What is the short-hand way of removing the last forward slash, and all characters following it?
I know how I can do this by splitting the string into a List<>, removing the last item, and then joining, but is there a shorter way of writing this?
My question is not URL specific.

You can use Substring() and LastIndexOf():
str = str.Substring(0, str.LastIndexOf('/'));
EDIT (suggested comment)
To prevent any issues when the string may not contain a /, you could use something like:
int lastSlash = str.LastIndexOf('/');
str = (lastSlash > -1) ? str.Substring(0, lastSlash) : str;
Storing the position in a temp-variable would prevent the need to call .LastIndexOf('/') twice, but it could be dropped in favor of a one-line solution instead.
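If this comes up in several places, the guarded version could be wrapped in an extension method, roughly as sketched here. The names StringExtensions and RemoveLastSegment, and the optional separator parameter, are hypothetical and only serve the illustration.

```csharp
public static class StringExtensions
{
    // Removes the last separator and everything after it.
    // Returns the input unchanged when it contains no separator.
    public static string RemoveLastSegment(this string value, char separator = '/')
    {
        int last = value.LastIndexOf(separator);
        return last >= 0 ? value.Substring(0, last) : value;
    }
}
```

With this in scope, "/lorem/ipsum/dolor".RemoveLastSegment() yields "/lorem/ipsum", and a string without any slash comes back as it was.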

If there is a '/' at the end of the URL, remove it;
if not, just keep the original.
var url = this.Request.RequestUri.ToString();
url = url.EndsWith("/") ? url.Substring(0, url.Length - 1) : url;
url += @"/mycontroller";

You can do something like str.Remove(str.LastIndexOf("/")), but there is no built-in method to do what you want.
Edit: you could also use the Uri object to traverse directories, although it does not give exactly what you want:
Uri baseUri = new Uri("http://domain.com/lorem/ipsum/dolor");
Uri myUri = new Uri(baseUri, ".");
// myUri now contains http://domain.com/lorem/ipsum/

One simple way would be:
String s = "domain.com/lorem/ipsum/dolor";
s = s.Substring(0, s.LastIndexOf('/'));
Console.WriteLine(s);
Another option, if you only want to strip trailing slashes (TrimEnd leaves the last segment in place):
String s = "domain.com/lorem/ipsum/dolor";
s = s.TrimEnd('/');
Console.WriteLine(s);

You can use the regex /[^/]*$ and replace the match with the empty string:
var result = new Regex("/[^/]*$").Replace("domain.com/lorem/ipsum/dolor", "");
But it's probably overkill here. @newfurniturey's answer of Substring with LastIndexOf is probably best.

I like to create a String Extension for stuff like this:
/// <summary>
/// Returns the value with the suffix removed, if present
/// </summary>
public static string TrimIfEndsWith(
    this string value,
    string suffix)
{
    return
        value.EndsWith(suffix) ?
            value.Substring(0, value.Length - suffix.Length) :
            value;
}
You can then use it like this:
var myString = "/lorem/ipsum/dolor";
var myStringClean = myString.TrimIfEndsWith("/dolor");
You now have a re-usable extension across all of your projects that can be used to remove one trailing character or multiple.

using System.IO;
// Strings are immutable, so the result of TrimEnd has to be assigned.
mystring = mystring.TrimEnd(Path.AltDirectorySeparatorChar); // To remove "/"
mystring = mystring.TrimEnd(Path.DirectorySeparatorChar); // To remove "\"

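// The length check keeps the loop from failing once the string is empty.
while (input.Length > 0 && (input[input.Length - 1] == '/' || input[input.Length - 1] == '\\'))
{
    input = input.Substring(0, input.Length - 1);
}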

Thank you @Curt for your question.
I slightly improved @newfurniturey's code, and here is my version.
if (str.Contains('/'))
{
    str = str.Substring(0, str.LastIndexOf('/'));
}

I'm way late to the party, but if you're using C# 8.0+, another clean approach would be to use the range operator:
if (urlStr.EndsWith("/")) urlStr = urlStr[..^1];
If you're curious as to how this works, take a look at the spec for ranges in C#:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/ranges
tldr; urlStr[..^1] roughly translates to something along the lines of "Give me a substring comprised of the characters contained within the range of index 0 to whatever index is 1 away from the last index.".
In other words, it's similar to...
urlStr.Substring(0, urlStr.Length-1)

replace a character in a string in c# based on position with a string

I want to replace a character in a string with a string in C#.
I have tried the following.
In the program below, I want to replace the set of characters between ':' and the first occurrence of '-' with some other characters.
I was able to extract the set of characters between ':' and the first occurrence of '-'.
Can anyone tell me how to insert these back into the source string?
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of characters between ':' and the first occurrence of '-' with the characters extracted from the same position in another string of the same form.
Can anyone help with how this can be done?
Thank you.

If you want to replace the first ":Number-" in the source with the content from the target, you can use the following regex.
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var pattern1 = new Regex(@":\d{1,3}-{1}");
if (pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
    var res = pattern1.Replace(source, pattern1.Match(target).Value);
    // "tcm:10-426-8"
}
Edit: To avoid replacing part of your string with something empty, add an if-clause before the actual replacement.

Try a regex solution. This method takes the source and target strings and performs a regex replace on the first, targeting the first number after the 'tcm:', which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
    return rx.Replace(source, new MatchEvaluator((Match m) =>
    {
        var targetMatch = rx.Match(target);
        if (targetMatch.Success)
            return targetMatch.Value;
        return m.Value; //don't replace if no match
    }));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
    Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
    Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
    Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}

I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);

Here's a version without regex:
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8
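The same idea can be packaged as a small helper that also copes with input that lacks a ':' or '-'. This is only a sketch: the names TcmId and ReplaceFirstNumber are invented here, and returning the source unchanged on malformed input is an assumption about the desired behavior.

```csharp
public static class TcmId
{
    // Replaces the text between ':' and the first following '-' in source
    // with the corresponding text from target.
    public static string ReplaceFirstNumber(string source, string target)
    {
        int sourceColon = source.IndexOf(':');
        int sourceDash = source.IndexOf('-', sourceColon + 1);
        int targetColon = target.IndexOf(':');
        int targetDash = target.IndexOf('-', targetColon + 1);

        // Leave source untouched if either string lacks the "prefix:number-" shape.
        if (sourceColon < 0 || sourceDash < 0 || targetColon < 0 || targetDash < 0)
            return source;

        string number = target.Substring(targetColon + 1, targetDash - targetColon - 1);
        return source.Substring(0, sourceColon + 1) + number + source.Substring(sourceDash);
    }
}
```

For the first example in the question, TcmId.ReplaceFirstNumber("tcm:7-426-8", "tcm:10-15-2") gives "tcm:10-426-8".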
