I'd like to trim these purchase order file names (a few examples below) so that everything after the first "_" is omitted.
INCOLOR_fc06_NEW.pdf
Keep: INCOLOR (write this to db as the VendorID) Remove: _fc06_NEW.pdf
NORTHSTAR_sc09.xls
Keep: NORTHSTAR (write this to db as the VendorID) Remove: _sc09.xls
Our scenario: The managers upload these files to our intranet web server to make them available to download/view, etc. I'm using Brettle's NeatUpload, and for each uploaded file I write the file's attributes into the PO table (SQL 2000). The first part of the file name will be written to the DB as a VendorID.
The naming convention for these files is consistent in that the first part of the file name is always the vendor name (or Vendor ID), followed by an "_", then other unpredictable chars used to identify the type of purchase order, then the file extension, which is consistently either .xls, .XLS, .PDF, or .pdf.
I tried TrimEnd - but the array of chars that you have to provide ends up being long and can conflict with the part of the file name I want to keep. I have a feeling I'm not using TrimEnd properly.
What is the best way to use string.TrimEnd (or any other string manipulation in C#) that will strip off all chars after the first "_" ?
String s = "INCOLOR_fc06_NEW.pdf";
int index = s.IndexOf("_");
return index >= 0 ? s.Substring(0,index) : s;
I'll probably offend the anti-regex lobby, but here I go (ducking):
string stripped = Regex.Replace(filename, @"(?<=[^_]*)_.*", String.Empty);
This code will strip all extra characters after the first '_', unless there is no '_' in the string (then it will just return the original string).
It's one line of code. It's slower than the more elaborate IndexOf() algorithm, but when used in a non-performance-sensitive part of the code, it's a good solution.
Get your flame-throwers out...
TrimEnd removes trailing white space or, if you pass it a set of characters, any trailing characters that belong to that set. It has no notion of a delimiter, so it won't help you here. Read more about TrimEnd here:
http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx
Bnaffas code (with a small tweak):
String fileName = "INCOLOR_fc06_NEW.pdf";
int index = fileName.IndexOf("_");
return index >= 0 ? fileName.Substring(0, index) : fileName;
If you want to do something with the other parts, you could use Split:
string fileName = "INCOLOR_fc06_NEW.pdf";
string[] parts = fileName.Split('_');
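To make the TrimEnd point concrete, here is a small sketch. The class and method names are made up for illustration, and the character set passed to TrimEnd is just one guess at what the asker might have tried. It shows that TrimEnd keeps eating trailing characters for as long as they are in the set, which is why it can swallow the part you wanted to keep.

```csharp
using System;

public static class TrimEndDemo
{
    // TrimEnd(params char[]) strips trailing characters for as long as each
    // one is in the given set; it does not stop at a delimiter.
    public static string TrimSuffixChars(string fileName)
    {
        return fileName.TrimEnd('.', 'p', 'd', 'f', '_', 'N', 'E', 'W');
    }

    // IndexOf locates the first underscore, so only the vendor prefix is kept.
    public static string BeforeFirstUnderscore(string fileName)
    {
        int index = fileName.IndexOf('_');
        return index >= 0 ? fileName.Substring(0, index) : fileName;
    }
}

// TrimEndDemo.TrimSuffixChars("INCOLOR_fc06_NEW.pdf")  -> "INCOLOR_fc06" (stops at '6')
// TrimEndDemo.TrimSuffixChars("NEW_NEW.pdf")           -> ""             (vendor name eaten too)
// TrimEndDemo.BeforeFirstUnderscore("NEW_NEW.pdf")     -> "NEW"
```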
public string StripOffStuff(string sInput)
{
int iIndex = sInput.IndexOf("_");
return (iIndex >= 0) ? sInput.Substring(0, iIndex) : sInput;
}
// Call it like:
string sNewString = StripOffStuff("INCOLOR_fc06_NEW.pdf");
I would go with the SubString approach but to round out the available solutions here's a LINQ approach just for fun:
string filename = "INCOLOR_fc06_NEW.pdf";
string result = new string(filename.TakeWhile(c => c != '_').ToArray());
It'll return the original string if no underscore is found.
To go with all the "alternative" solutions, here's the second one that I thought of (after substring):
string filename = "INCOLOR_fc06_NEW.pdf";
string stripped = filename.Split('_')[0];
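Pulling the answers above together, here is a rough sketch of a helper for the upload scenario. GetVendorId and PurchaseOrderNames are hypothetical names, and returning null for an unusable name is my assumption about what the caller would want, not something the question specifies. It strips the extension first so that a file with no underscore, such as ACME.pdf, still yields a clean value.

```csharp
using System;
using System.IO;

public static class PurchaseOrderNames
{
    // Hypothetical helper: returns the text before the first underscore,
    // or null when no usable vendor prefix is present.
    public static string GetVendorId(string uploadedFileName)
    {
        if (string.IsNullOrEmpty(uploadedFileName))
            return null;

        // Drop the extension first, so "ACME.pdf" yields "ACME"
        // rather than "ACME.pdf" when there is no underscore.
        string baseName = Path.GetFileNameWithoutExtension(uploadedFileName);

        int index = baseName.IndexOf('_');
        string vendor = index >= 0 ? baseName.Substring(0, index) : baseName;

        // A name that starts with '_' has no vendor part at all.
        return vendor.Length > 0 ? vendor : null;
    }
}
```

A caller would check for null before writing to the PO table, for example rejecting the upload when no VendorID can be derived.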
Related
So I have this string which I have to trim and manipulate a little with it.
My string example:
string test = "studentName_123.pdf";
Now, what I want to do is remove only the _123 part, so that I end up with studentName.pdf
What I have tried:
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+".pdf";
This works, but I don't want to append the ".pdf" suffix manually, because I can have files that are not PDFs, e.g. studentName.docx or studentName.png.
So basically I just want the "_123" part removed while keeping the rest of the name, including the extension.
I think this might help you:
string test = "studentName_123.pdf";
string test_extracted = test.Substring(0, test.LastIndexOf("_") )+ test.Substring(test.LastIndexOf("."),test.Length - test.LastIndexOf(".") );
Using Remove(int startIndex, int count):
string test = "studentName_123.pdf";
string test_extracted = test.Remove(test.LastIndexOf("_"), test.LastIndexOf(".") - test.LastIndexOf("_"));
Sounds like you mean something like this?
string extension = Path.GetExtension(test);
string pdfName = Path.GetFileNameWithoutExtension(test).Split('_')[0];
string fullName = pdfName + extension;
Since you know which value you will always be replacing in your strings ("_123" in your example), just use the Replace method and replace it with an empty string; the method expects two arguments:
string test_extracted = test.Replace("_123", "");
This could be solved with a regular expression like this:
(\w*)_.*(\.\w*) where the first capture group (\w*) matches everything before the underscore and the second group (\.\w*) matches the file extension.
Lastly we just have to concatenate the groups, leaving out the part in between, like so:
string test = "studentName_123.pdf";
var regex = Regex.Match(test, @"(\w*)_.*(\.\w*)");
string newString = regex.Groups[1].Value + regex.Groups[2].Value;
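For comparison, a sketch that avoids both the regex and the repeated LastIndexOf calls by letting System.IO.Path split off the extension. RemoveLastSuffix and FileNameSuffix are illustrative names, and the sketch assumes the input is a bare file name with no directory part (any directory part would be dropped).

```csharp
using System;
using System.IO;

public static class FileNameSuffix
{
    // Removes the last "_suffix" from the base name and keeps the extension.
    public static string RemoveLastSuffix(string fileName)
    {
        string extension = Path.GetExtension(fileName);                 // ".pdf", or "" if none
        string baseName = Path.GetFileNameWithoutExtension(fileName);   // "studentName_123"

        int index = baseName.LastIndexOf('_');
        if (index >= 0)
            baseName = baseName.Substring(0, index);

        return baseName + extension;
    }
}

// FileNameSuffix.RemoveLastSuffix("studentName_123.pdf")  -> "studentName.pdf"
// FileNameSuffix.RemoveLastSuffix("studentName_123.docx") -> "studentName.docx"
// FileNameSuffix.RemoveLastSuffix("plain.pdf")            -> "plain.pdf"
```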
How do I get the whole text of a document concatenated into one string? I'm trying to split text on periods: string[] words = s.Split('.'); I want to take this text from a text document. But if my text document contains empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this first I need to process text file content to get whole text as single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way as I would with list content, for example: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text of a text document into a single string.
I think this is what you want:
var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
// replace Environment.NewLine with whatever newline sequence your file uses;
// substituting a space keeps words on adjacent lines from being fused together
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
// modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
var withoutTwoSpaces = withoutUglyCharacters.Replace("  ", " ");
var result = withoutTwoSpaces.Split('.').Select(i => i.Trim()).Where(i => i != "").ToList();
So first you read all the text from your file, then you remove all unwanted characters, and then you split on '.' and return the non-empty items.
Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisable to surround the method call with a try/catch.
var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
Creates a single string, ignoring any lines that are empty or purely whitespace, and joins the remaining lines with a space so that words on adjacent lines stay separate.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
Creates a string array, where the string is split at every period that is followed by whitespace.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;
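Another way to look at it, as a rough sketch: collapse every run of whitespace (line breaks and blank lines included) into a single space first, then split on periods and clean each piece. SentenceSplitter is a made-up name, and the choice to keep only letters, digits and spaces is my reading of the desired output in the question, so adjust the character class if you want to keep more.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    public static List<string> Split(string text)
    {
        // Collapse every run of whitespace, including line breaks and
        // blank lines, into one space so the text becomes a single line.
        string singleLine = Regex.Replace(text, @"\s+", " ");

        var sentences = new List<string>();
        foreach (string part in singleLine.Split('.'))
        {
            // Keep only letters, digits and spaces, then tidy up the
            // double spaces left behind by removed punctuation.
            string cleaned = Regex.Replace(part, @"[^\p{L}\p{Nd} ]", "");
            cleaned = Regex.Replace(cleaned, " {2,}", " ").Trim();

            if (cleaned.Length > 0)
                sentences.Add(cleaned);
        }
        return sentences;
    }
}
```

You would call it as SentenceSplitter.Split(File.ReadAllText(path)); because the whitespace is normalised up front, it does not matter which newline sequence the file uses.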
I would suggest iterating through all the characters and checking whether each one is in the range 'a' <= char <= 'z' or is a space. If it matches, append it to the line being built; otherwise check whether it is a '.' character, and if so, end the current line and start another one:
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Or if you prefer one-liners:
IEnumerable<string> lines = new string(str.Where(c => (char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == ' ' || c == '.').ToArray()).Split('.').Select(s => s.Trim());
I may be wrong about this, but I would think that you may not want to alter the string while you are splitting it. For example, part of the string contains curly single/double quotes (“), and removing them may not be desired. That raises another issue: reading a text file that contains such quotes (as your example text does) like below:
var stringFromFile = File.ReadAllText(fileLocation);
may not display those characters properly in a text box or the console, because ReadAllText decodes the file as UTF-8 by default and the file may have been saved in the system's ANSI code page instead. In that case the quotes show up as replacement characters (diamonds) in a text box on a form and as question marks (?) on the console. To keep the quotes and have them display properly, you can pass the OS's current ANSI encoding as a second argument to ReadAllText, like below:
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
Below is code using a simple Split call to split the string on periods (.). Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = @"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
// replace line breaks with a space so words on adjacent lines are not fused together
string bigString = stringFromFile.Replace(Environment.NewLine, " ");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s.Trim() != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}
I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The C# code I'm currently trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
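You could even specify the maximum number of substrings using an overload of the Split method. Note that the count has to be 2, because a count of 1 returns the whole string as a single element:
stringin.Split(new[] { '-' }, 2)[0].Trim();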
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this:
stringin.Substring(0, stringin.IndexOf("-") - 1)
which throws an ArgumentOutOfRangeException: there is no hyphen to find, so IndexOf returns -1 and the requested length becomes -2.
Make a simple change to your regex and it works with or without a hyphen. Ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^
It seems that you want the substring after the dash ('-') if the original string contains one, or the original string if it doesn't.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you; it makes the solution much more complex than it needs to be.
For a string that I'm sure will contain no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings even though you only need the first one. In that case the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, and you can clean the string up from there.
This is also a great place to start writing unit tests as well. They are very good for stuff like this.
Here's what the code could look like :
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
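Since unit tests were mentioned, here is a minimal sketch of what the checks might look like, written as plain method calls so it doesn't assume any particular test framework. HyphenStripper, BeforeHyphen and RunChecks are made-up names; in a real project you'd move the cases into whatever framework you already use. Trimming the result is my assumption, based on the question expecting "test" from "test - 9894".

```csharp
using System;

public static class HyphenStripper
{
    // Returns the text before the first hyphen, trimmed.
    // Input without a hyphen is returned trimmed but otherwise unchanged.
    public static string BeforeHyphen(string input)
    {
        if (input == null)
            return null;

        int index = input.IndexOf('-');
        string head = index >= 0 ? input.Substring(0, index) : input;
        return head.Trim();
    }

    // Hypothetical self-check covering the cases from the question
    // plus a couple of edge cases; throws on the first failure.
    public static void RunChecks()
    {
        Check(BeforeHyphen("test - 9894"), "test");
        Check(BeforeHyphen("hsh4a - 8989"), "hsh4a");
        Check(BeforeHyphen("test"), "test");      // no hyphen: must not throw
        Check(BeforeHyphen("-9894"), "");         // hyphen at index 0
        Check(BeforeHyphen("a-b-c"), "a");        // only the first hyphen matters
    }

    private static void Check(string actual, string expected)
    {
        if (actual != expected)
            throw new InvalidOperationException(
                "Expected '" + expected + "' but got '" + actual + "'.");
    }
}
```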
How can I split a string from the end back to some character I choose?
Let me explain with an example:
"C:\Users\Esat\Desktop\BilimResimler\1620855_759701257391419_1132489417_n.jpg"
I want to cut out this part: 1620855_759701257391419_1132489417_n.jpg. I have a lot of images and the image names are always changing, so I cannot use a fixed Substring call. How can I do this?
Just to add to the answers: if this refers to a file that physically exists on disk, then why not let FileInfo do the work for you?
var path = @"C:\Users\Esat\Desktop\BilimResimler\1620855_759701257391419_1132489417_n.jpg";
System.IO.FileInfo myImageFile = new System.IO.FileInfo(path);
Console.WriteLine(myImageFile.Name); // gives 1620855_759701257391419_1132489417_n.jpg
You can search for the last "\" character and remove everything from it onward, including the character itself.
OR
Keep the string from index 0 up to the length of "C:\Users\Esat\Desktop\BilimResimler\" minus 1 (36 - 1 by my count) and eliminate everything else.
This should do it
string imageNameAndPath = @"C:\Users\Esat\Desktop\BilimResimler\1620855_759701257391419_1132489417_n.jpg";
imageNameAndPath = imageNameAndPath.Substring(0, imageNameAndPath.LastIndexOf('\\'));
string FileName = Path.GetFileName(path);
You can also get your file name using below code.
var path = @"C:\Users\Esat\Desktop\BilimResimler\1620855_759701257391419_1132489417_n.jpg";
string ImgPath = path.Substring(path.LastIndexOf(#"\") + 1);
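One caveat worth a sketch: as far as I know, Path.GetFileName uses the separators of the platform the code runs on, so a Windows-style path like the one above is not split at the backslashes when the code runs on Linux or macOS. If that matters to you, a hand-rolled version that accepts either separator could look like this (PathParts and FileNameOf are illustrative names, not framework APIs):

```csharp
using System;

public static class PathParts
{
    // Returns everything after the last '\' or '/' in the path,
    // or the whole string when it contains no separator.
    public static string FileNameOf(string path)
    {
        // Look for either separator so that Windows-style paths are
        // handled the same way regardless of the operating system.
        int index = path.LastIndexOfAny(new[] { '\\', '/' });
        return index >= 0 ? path.Substring(index + 1) : path;
    }
}

// PathParts.FileNameOf(@"C:\Users\Esat\Desktop\BilimResimler\1620855_759701257391419_1132489417_n.jpg")
//   -> "1620855_759701257391419_1132489417_n.jpg"
```

On Windows with well-formed paths, Path.GetFileName is still the simpler choice.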
I have a file called file_test1.txt and I want to extract just test1 from the name and place it in a string. What's the best way of doing this?
E.g.
string fullfile = @"C:\file_test1.txt";
string section = [test1] from fullfile; // <- expected result
I want to be able to split on 'file_' and '.txt' as the 'test1' section could be larger or smaller however the 'file_' and '.txt' will always be the same.
Try Path.GetFileNameWithoutExtension(fullfile).Substring(5) (or Substring("TEMPLATE_PREFIX".Length))
You can try Split:
var test = Path.GetFileNameWithoutExtension(fullfile).Split('_')[1];
Try the following:
string fullfile = @"C:\file_test1.txt";
var name = fullfile.Substring(8, fullfile.Length - 12);
As C:\file_ and .txt are fixed, you can take a Substring starting at index 8 (skipping the 8-character leading part) with a length of the total string length minus 12 (12 = the length of the leading part plus the 4-character extension).
Thought I'd give a solution that uses Split and handles files with multiple underscores:
string.Join("_", Path.GetFileNameWithoutExtension(file).Split('_').Skip(1));
String.Split() works quite well for my uses:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx
Obviously many ways to accomplish this. Here's yet another approach:
string fullfile = @"C:\file_test1.txt";
int index1 = fullfile.LastIndexOf("file_");
if (index1 != -1)
{
int index2 = fullfile.IndexOf(".", index1);
if (index2 != -1)
{
string section = fullfile.Substring(index1 + 5, index2 - index1 - 5);
}
}
You could also get "test1", or any subsequent file name (assuming your file naming convention remains constant!), using this regular expression:
var defaultRegex = new Regex(@"(?<=_).*(?=\.txt)");
var matches = defaultRegex.Matches(fullfile);
var match = matches[0].Value;
The regular expression:
(?<=_).*(?=\.txt)
uses a positive lookbehind to find text preceded by '_', and a positive lookahead to find text followed by '.txt'. The dot is escaped so that it matches a literal '.' rather than any character.
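If the prefix and extension really are fixed, a sketch without regex or hard-coded indexes might be easier to maintain. GetSection and SectionExtractor are hypothetical names, and returning null when the name doesn't fit the convention is my assumption about what a caller would want.

```csharp
using System;

public static class SectionExtractor
{
    // Returns the part of the file name between the given prefix and suffix,
    // or null when the name does not follow that convention.
    public static string GetSection(string fullFile, string prefix, string suffix)
    {
        // Strip any directory part; index is -1 when there is none,
        // so Substring(0) then returns the whole string.
        int separator = fullFile.LastIndexOfAny(new[] { '\\', '/' });
        string name = fullFile.Substring(separator + 1);

        bool matches = name.Length >= prefix.Length + suffix.Length
            && name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            && name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);

        return matches
            ? name.Substring(prefix.Length, name.Length - prefix.Length - suffix.Length)
            : null;
    }
}

// SectionExtractor.GetSection(@"C:\file_test1.txt", "file_", ".txt") -> "test1"
// SectionExtractor.GetSection(@"C:\file_a_b.txt", "file_", ".txt")   -> "a_b"
// SectionExtractor.GetSection(@"C:\other.txt", "file_", ".txt")      -> null
```

Unlike Split('_')[1], this keeps any underscores inside the section, and unlike Substring(8, ...), it does not depend on the folder the file lives in.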