Parse a string in c# after reading first and last alphabet - c#

I want to cut a string in C# based on its first and last alphabetic characters.
string name = "20150910000549659ABCD000007348summary.pdf";
string result = "ABCD000007348"; // Something like this
string name = "1234 ABCD000007348 summary.pdf";
After the leading digits (1234) the first letter is "A", and the trailing text starts at "s", so I want "ABCD000007348".

Simply use Regex:
string CutString(string input)
{
    Match result = Regex.Match(input, @"[a-zA-Z]+[0-9]+");
    return result.Value;
}

Since you didn't say if it's always a timestamp at the beginning, I've instead opted to iterate over the string to find the first alphabetical character, rather than hardcoding s.Remove(0, n); where n is however many digits are in a timestamp.
string s = "20150910000549659ABCD000007348summary.pdf";
s = s.Replace("summary.pdf", String.Empty);
int firstLetter = 0;
foreach (char c in s)
{
if (Char.IsLetter(c))
{
firstLetter = s.IndexOf(c);
break;
}
}
s = s.Remove(0, firstLetter);

Related

Regex replace all occurrences with something that is "derived" from the part to be replaced

I have the following line from an RTF document:
10 \u8314?\u8805? 0
(which in clear text reads 10 ⁺≥ 0). The special characters are escaped as \u followed by the decimal Unicode code point and then a question mark (the replacement character to print if the special character cannot be displayed). I want the text in a C# string variable that is equivalent to the following variable:
string expected = "10 \u207A\u2265 0";
In the debugger the variable should show the value 10 ⁺≥ 0. I therefore must replace every occurrence with the corresponding hexadecimal Unicode escape (0x207A = 8314 and 0x2265 = 8805). What is the simplest way to accomplish this with regular expressions?
The code is:
string str = @"10 \u8314?\u8805? 0";
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    string hex = @"\u" + int.Parse(value).ToString("X4");
    return hex;
});
This will return
string line = @"10 \u207A\u2265 0";
so the \u207A\u2265 won't be unescaped.
Note that the value is first converted to a number (int.Parse(value)) and then formatted as a hexadecimal string padded to at least four digits (ToString("X4")).
Or
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    char ch = (char)int.Parse(value);
    return ch.ToString();
});
This will return
string line = @"10 ⁺≥ 0";
If I understood your question correctly, you want to parse the unicode representation of the RTF to a C# string.
So, the one-liner solution looks like this:
string result = Regex.Replace(line, @"\\u(\d+?)\?", new MatchEvaluator(m => ((char)Convert.ToInt32(m.Groups[1].Value)).ToString()));
But I suggest a cleaner version:
private static string ReplaceRtfUnicodeChar(Match match) {
    int number = Convert.ToInt32(match.Groups[1].Value);
    char chr = (char)number;
    return chr.ToString();
}
public static void Main(string[] args)
{
    string line = @"10 \u8314?\u8805? 0";
    var r = new Regex(@"\\u(\d+?)\?");
    string result = r.Replace(line, new MatchEvaluator(ReplaceRtfUnicodeChar));
    Console.WriteLine(result); // Displays 10 ⁺≥ 0
}
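You have to use a MatchEvaluator:
string input = @"10 \u8314?\u8805? 0"; // verbatim string, so \u8314 stays literal text
Regex reg = new Regex(@"\\u([0-9]+)\?", RegexOptions.Multiline); // the RTF escape holds a decimal number
string result = reg.Replace(input, delegate(Match m) {
    // ConvertToWhatYouWant is a placeholder for your own conversion routine
    return ConvertToWhatYouWant(m.Value);
});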

Extracting parts of a string c#

In C# what would be the best way of splitting this sort of string?
%%x%%a,b,c,d
So that I end up with the value between the %% signs and another variable containing everything to the right of the second %%,
i.e. var x = "x"; var y = "a,b,c,d"
where a,b,c.. could be an arbitrarily long comma-separated list. I need to extract the list and the value between the two double-percentage signs.
To cope with the unbounded list, I thought of separating the string into %%x%% and a,b,c,d. At that point I can use something like this to get x:
var tag = "%%";
var startTag = tag;
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf(tag, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
Would the best approach be to use a regex, or to use lots of IndexOf and Substring calls to do the extracting based on the static %% characters?
Given that what you want is "x,a,b,c,d" the Split() function is actually pretty powerful and regex would be overkill for this.
Here's an example:
string test = "%%x%%a,b,c,d";
string[] result = test.Split(new char[] { '%', ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result) {
Console.WriteLine(s);
}
Basically we ask it to split on both '%' and ',' and to ignore empty results (e.g. the empty string between the two characters of "%%"). Here's the result:
x
a
b
c
d
To extract x:
If %% is always at the start:
string s = "%%x%%a,b,c,d,h";
s = s.Substring(2, s.LastIndexOf("%%") - 2);
//Console.WriteLine(s);
Otherwise:
string s = "v,u,m,n,%%x%%a,b,c,d,h";
s = s.Substring(s.IndexOf("%%") + 2, s.LastIndexOf("%%") - s.IndexOf("%%") - 2);
//Console.WriteLine(s);
If you need to get all the items at once (this works only when every item is a single character), use this:
string s = "m,n,%%x%%a,b,c,d";
var myList = s.Where(c => c != '%' && c != ',')
    .ToList();
This'll let you do it all in one go:
string pattern = "^%%(.+?)%%(?:(.+?)(?:,|$))*$";
string input = "%%x%%a,b,c,d";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
// "x"
string first = match.Groups[1].Value;
// { "a", "b", "c", "d" }
string[] repeated = match.Groups[2].Captures.Cast<Capture>()
.Select(c => c.Value).ToArray();
}
You can use char.IsLetter to get a list of all the letters:
string test = "%%x%%a,b,c,d";
var l = test.Where(c => char.IsLetter(c)).ToArray();
var output = string.Join(", ", l.OrderBy(c => c));
Since you want the value between the %% and everything after in separate variables and you don't need to parse the CSV, I think a RegEx solution would be your best choice.
var inputString = @"%%x%%a,b,c,d";
var regExPattern = @"^%%(?<x>.+)%%(?<csv>.+)$";
var match = Regex.Match(inputString, regExPattern);
foreach (var item in match.Groups)
{
    Console.WriteLine(item);
}
The pattern has two named groups, x and csv, so rather than just looping you can reference them by name and assign their values:
var x = match.Groups["x"].Value;
var y = match.Groups["csv"].Value;
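If the comma-separated part does need to be split as well, the named groups can be combined with Split. The following is only a sketch: the class name TagAndList and the method TryParse are made up for illustration, and the pattern is a slight variation on the one above (lazy .+? so the tag stops at the first closing %%, and .* so an empty list is accepted).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagAndList
{
    // Hypothetical helper: extracts the value between the %% markers and
    // the comma-separated items that follow the second %%.
    public static bool TryParse(string input, out string tag, out string[] items)
    {
        Match match = Regex.Match(input, @"^%%(?<x>.+?)%%(?<csv>.*)$");
        tag = match.Success ? match.Groups["x"].Value : null;
        items = match.Success
            ? match.Groups["csv"].Value.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            : new string[0];
        return match.Success;
    }

    public static void Main()
    {
        string tag;
        string[] items;
        if (TryParse("%%x%%a,b,c,d", out tag, out items))
        {
            Console.WriteLine(tag);                     // x
            Console.WriteLine(string.Join("|", items)); // a|b|c|d
        }
    }
}
```

The list length does not matter here, because Split handles however many items follow the second %%.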

Retrieve String Containing Specific substring C#

I have output in string format like the following:
"ABCDED 0000A1.txt PQRSNT 12345"
I want to retrieve the substring(s) containing .txt from the above string. For example, for the above it should return 0000A1.txt.
Thanks
You can either split the string at whitespace boundaries like it's already been suggested or repeatedly match the same regex like this:
var input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
var match = Regex.Match(input, @"\b([\w\d]+\.txt)\b");
while (match.Success) {
    Console.WriteLine("TEST: {0}", match.Value);
    match = match.NextMatch();
}
Split will work if spaces are the separator. If you use other separators, you can add them as needed.
string input = "ABCDED 0000A1.txt PQRSNT 12345";
string filename = input.Split(' ').FirstOrDefault(f => System.IO.Path.HasExtension(f));
filename is "0000A1.txt", and this will work for any extension.
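As a rough sketch of adding more separators and returning every match rather than only the first, something like the following could work. The class name FileNameFinder, the method FindFileNames, and the particular separator set are assumptions for illustration; note also that Path.GetExtension may throw on tokens containing invalid path characters on older .NET Framework versions.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameFinder
{
    // Hypothetical helper: splits on several separators and keeps every
    // token whose extension equals the requested one (case-insensitive).
    public static string[] FindFileNames(string input, string extension)
    {
        char[] separators = { ' ', '\t', ',', ';' }; // assumed separator set
        return input
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => string.Equals(
                Path.GetExtension(token), extension, StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }

    public static void Main()
    {
        string input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
        foreach (string name in FindFileNames(input, ".txt"))
        {
            Console.WriteLine(name); // 0000A1.txt, then THE.txt
        }
    }
}
```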
You can use a regex match for this.
Here is the code; plug it in and try it. Please comment.
string test = "afdkljfljalf dkfjd.txt lkjdfjdl";
string ffile = Regex.Match(test, @"([a-z0-9]+\.txt)").Groups[1].Value;
Console.WriteLine(ffile);
Reference: regexp
I did something like this:
string subString = "";
char period = '.';
int iSubStrIndex = 0;
if (myString != null)
{
    char[] chArString = myString.ToCharArray();
    // Remember the index of the last period in the string.
    for (int i = 0; i < myString.Length; i++)
    {
        if (chArString[i] == period)
            iSubStrIndex = i;
    }
    // Everything from the last period to the end of the string.
    subString = myString.Substring(iSubStrIndex);
}
Hope that helps.
First split your string into an array using
char[] whitespace = new char[] { ' ', '\t' };
string[] parts = myStr.Split(whitespace);
Then find .txt in the array...
// Find the first element containing ".txt".
string value1 = Array.Find(parts,
    element => element.IndexOf(".txt", StringComparison.Ordinal) >= 0);
Now value1 will hold "0000A1.txt".
Happy coding.

All elements before last comma in a string in c#

How can I get all the elements before the last comma (,) in a string in C#?
For example, if my string is
string s = "a,b,c,d";
then I want all the elements before d, i.e. before the last comma. So my new string should look like
string new_string = "a,b,c";
I have tried Split, but with that I can only get one particular element at a time.
string new_string = s.Remove(s.LastIndexOf(','));
If you want everything before the last occurrence, use:
int lastIndex = input.LastIndexOf(',');
if (lastIndex == -1)
{
// Handle case with no commas
}
else
{
string beforeLastIndex = input.Substring(0, lastIndex);
...
}
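The check above can be wrapped in a small helper. This is a minimal sketch: the names CommaHelper and BeforeLastComma are made up, and returning the input unchanged when there is no comma is an assumed policy, since the answer leaves that case open.

```csharp
using System;

public static class CommaHelper
{
    // Returns the text before the last comma. If there is no comma, the
    // input is returned unchanged (an assumption, not part of the answer).
    public static string BeforeLastComma(string input)
    {
        int lastIndex = input.LastIndexOf(',');
        return lastIndex == -1 ? input : input.Substring(0, lastIndex);
    }

    public static void Main()
    {
        Console.WriteLine(BeforeLastComma("a,b,c,d")); // a,b,c
        Console.WriteLine(BeforeLastComma("abcd"));    // abcd
    }
}
```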
Use the following regex: "(.*),"
Regex rgx = new Regex("(.*),");
string s = "a,b,c,d";
Console.WriteLine(rgx.Match(s).Groups[1].Value);
You can also try:
string s = "a,b,c,d";
string[] strArr = s.Split(',');
Array.Resize(ref strArr, Math.Max(strArr.Length - 1, 1));
string truncatedS = string.Join(",", strArr);

how do i delete one part of a string?

String mystring="start i dont know hot text can it to have here important=value5; x=1; important=value2; z=3;";
Suppose I want to get the value of "important". I know how to do it with a substring, but there are two occurrences, so how do I get the first one and then the next?
If that is not possible, I would like to try this: save the first value, then delete everything from "start" up to value5, so that the next query returns value2.
How can I do either of these two things?
This is how I get the first value:
string word = "important=";
int c= mystring.IndexOf(word);
int c2 = word.Length;
for (int i = c+c2; i < mystring.Length; i++)
{
if (mystring[i].ToString() == ";")
{
break;
}
else
{
label1.Text += mystring[i].ToString(); // c#
// label1.setText(label1.getText()+mystring[i].ToString(); //java
}
}
If you want to extract all values you could use a regex:
string input = "start i dont know hot text can it to have here important=value5; x=1; important=value2; z=3;";
Regex regex = new Regex(@"important=(?<value>\w+)");
List<string> values = new List<string>();
MatchCollection matches = regex.Matches(input);
foreach (Match match in matches)
{
string value= match.Groups["value"].Value;
values.Add(value);
}
You can save the values in an array, instead of showing them with MessageBox.
string mystring = "start i dont know hot text can it to have here important=value5; x=1; important=value2; z=3;";
string temp = mystring;
string word = "important=";
while (temp.IndexOf(word) >= 0) // >= 0 so that a match at position 0 is not skipped
{
MessageBox.Show( temp.Substring(temp.IndexOf(word) + word.Length).Split(';')[0]);
temp = temp.Remove(temp.IndexOf(word), word.Length);
}
You can use two methods:
String.Remove()
and
String.Replace()
Or use a regular expression: find all the matches and reconstruct the string yourself.
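One way to read the regular-expression suggestion is sketched below: collect every value that follows "important=" and rebuild the string without those entries. The class name ImportantValues, the method Extract, and the exact pattern (which also swallows the trailing semicolon and whitespace) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImportantValues
{
    // Hypothetical helper: returns each value after "important=" and, via
    // the out parameter, the input with those entries removed.
    public static List<string> Extract(string input, out string remainder)
    {
        var pattern = new Regex(@"important=(?<value>\w+);?\s*");
        var values = new List<string>();
        foreach (Match m in pattern.Matches(input))
        {
            values.Add(m.Groups["value"].Value);
        }
        remainder = pattern.Replace(input, string.Empty);
        return values;
    }

    public static void Main()
    {
        string input = "here important=value5; x=1; important=value2; z=3;";
        string remainder;
        List<string> values = Extract(input, out remainder);
        Console.WriteLine(string.Join(",", values)); // value5,value2
        Console.WriteLine(remainder);                // here x=1; z=3;
    }
}
```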
