So I get last name data in all caps from the database. I need to format the text to have the first letter of each part of the name capitalized. Currently I have the code below which accounts for people with double last names (ex. SMITH-JONES becomes Smith-Jones) but while checking the results I noticed I still have an error when it comes to names like VAN BEBBER which becomes Van bebber. Any suggestions?
var fullLast = Last.Split('-');
var lastFormatted = new StringBuilder();
for (var i = 0; i < fullLast.Length; i++)
{
fullLast[i] = char.ToUpper(fullLast[i][0]) + fullLast[i].Substring(1).ToLower();
lastFormatted.Append(fullLast[i]);
if (i != fullLast.Length - 1)
lastFormatted.Append("-");
}
return string.Format(
"{0} {1}",
char.ToUpper(First[0]) + First.Substring(1).ToLower(),
lastFormatted);
This is a known issue with names - things are extremely inconsistent. Read this article for more information: http://www.w3.org/International/questions/qa-personal-names
In your example, you reference the last name "VAN BEBBER", which you want to be capitalized as "Van Bebber". However, as the article points out, there are other combinations from other areas of the world which would ruin most attempts at standardization - for instance, the last name "BIN OSMAN" would be properly capitalized as "bin Osman" - no capital "b" for "bin", which means "son of" and therefore doesn't fit well in the westernized concept of a last name.
You mention that you split last names by dashes, which most likely comes from the idea of a hyphenated last name - do you check the first name for dashes as well? The site gives the example name of "María-Jose Carreño Quiñones" - which is quite difficult to parse due to a double first name (separated by a hyphen) as well as a double last name (separated by a space). How would your program fare with that name?
To answer your question more directly, without bringing in more edge cases - you already know how to split a string via the dash - if you want to cover the case of last names with spaces, you should further split the last name string by spaces, and only then capitalize the first letter of the different split-up strings.
Alternatively, as Dai mentioned in a comment, you could use the ToTitleCase method - more information here: https://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase(v=vs.110).aspx This is most likely a better solution than trying to make your own. However, this page references the fact that not all languages capitalize in the same way (and indeed, different last names may come from different areas/cultures/languages), and therefore setting the correct language may not always yield the correct last name capitalization. Note also that ToTitleCase leaves words that are entirely uppercase unchanged (it treats them as acronyms), so you need to lowercase the input first; even then, it would capitalize "BIN OSMAN" as "Bin Osman", which is technically incorrect.
Here's a quick example from that page:
// Defines the String* with mixed casing.
String^ myString = "wAr aNd pEaCe";
// Creates a TextInfo based on the "en-US" culture.
/**** Personal Note - en-US may not be the correct culture for every last name! ****/
CultureInfo^ MyCI = gcnew CultureInfo( "en-US",false );
TextInfo^ myTI = MyCI->TextInfo;
// Changes a String* to titlecase. Outputs "War And Peace"
Console::WriteLine( "\"{0}\" to titlecase: {1}", myString, myTI->ToTitleCase( myString ) );
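A minimal C# sketch of the same idea, assuming the input arrives in all caps and that you keep your own list of particles that should stay lowercase. NameCasing, FormatLastName and the particle list are hypothetical names for illustration; only TextInfo.ToLower and TextInfo.ToTitleCase are framework calls.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class NameCasing
{
    // Hypothetical exception list: particles kept in lowercase.
    // Which entries belong here depends entirely on your data.
    private static readonly HashSet<string> LowercaseParticles =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "bin", "binti" };

    public static string FormatLastName(string raw, CultureInfo culture)
    {
        TextInfo textInfo = culture.TextInfo;

        // ToTitleCase leaves all-uppercase words untouched, so lowercase first.
        string titled = textInfo.ToTitleCase(textInfo.ToLower(raw));

        // Put any word found in the exception list back into lowercase.
        IEnumerable<string> words = titled
            .Split(' ')
            .Select(w => LowercaseParticles.Contains(w) ? textInfo.ToLower(w) : w);

        return string.Join(" ", words);
    }
}
```

Calling NameCasing.FormatLastName("VAN BEBBER", CultureInfo.InvariantCulture) gives "Van Bebber", and "BIN OSMAN" gives "bin Osman". An exception list like this only covers the cases you already know about, so it does not solve the general problem described above.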
I think you can use ToTitleCase method....
CultureInfo cultureInfo = CultureInfo.CurrentCulture; //Or use a specific culture
var str1 = cultureInfo.TextInfo.ToTitleCase("VAN BEBBER".ToLower(cultureInfo));
var str2 = cultureInfo.TextInfo.ToTitleCase("SMITH-JONES".ToLower(cultureInfo));
Why don't you do a split by space and then by '-'. That way you could capture all of the instances.
See this example:
// Skip empty entries so repeated spaces don't produce empty names
var names = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var formattedNames = new List<string>();
foreach (string name in names)
{
    // Capitalize each hyphen-separated part, then rejoin the parts with the hyphen
    var parts = name.Split('-');
    for (var i = 0; i < parts.Length; i++)
    {
        if (parts[i].Length > 0)
            parts[i] = char.ToUpper(parts[i][0]) + parts[i].Substring(1).ToLower();
    }
    formattedNames.Add(string.Join("-", parts));
}
// Joining avoids a stray trailing separator (and a '-' where a space belongs)
Console.Write(string.Join(" ", formattedNames));
I'm building a program which processes documents based on their file path and file name.
My current solution is based on file names containing 3 strings each separated by a space, dash and another space so that a valid name would be: "STRING1 - STRING2 - STRING3.pdf".
My program reads these values by using IndexOf().
string1 = fileName.Substring(0, fileName.IndexOf("-") - 1);
string3 = fileName.Substring(fileName.LastIndexOf("-") + 2);
The problem is that this breaks when the file names don't contain whitespaces, therefore breaking everything. So I opted to use Regex instead but how would I add a condition, so it doesn't add spaces to a name which already contains them.
Example,
string[] fileName = new string[3];
fileName[1] = "Test123 - Dog - Page 1.pdf";
fileName[2] = "Test123-Dog-Page1.pdf";
fileName[1] = Regex.Replace(fileName[1], "-", " - ");
fileName[2] = Regex.Replace(fileName[2], "-", " - ");
Output:
fileName[1] = Test123  -  Dog  -  Page 1.pdf
fileName[2] = Test123 - Dog - Page1.pdf
fileName[1] was originally valid, now it's invalid.
fileName[2] was originally invalid, now it's valid.
I need both to be valid via an if condition.
Ps. Apologies if anything is unclear, I'm new to posting on Stack
You don't need regex, in case pure string methods are more readable for you:
string FixFileName(string fn)
{
string fnwe = System.IO.Path.GetFileNameWithoutExtension(fn);
return string.Join(" - ", fnwe.Split('-').Select(token => token.Trim()))
+ System.IO.Path.GetExtension(fn);
}
Demo: https://dotnetfiddle.net/alv6sB
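If you would still rather use Regex, as the question asks, one possible sketch is to let the pattern swallow any whitespace that already surrounds the dash, so names that are already valid come out unchanged. FileNameFixer and Normalize are hypothetical names, and the sketch assumes the extension itself never contains a dash.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileNameFixer
{
    public static string Normalize(string fileName)
    {
        // Work on the name only, so the extension is left alone.
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName);

        // \s*-\s* matches a dash plus any surrounding whitespace,
        // so "a-b", "a - b" and "a  -b" all become "a - b".
        string normalized = Regex.Replace(stem, @"\s*-\s*", " - ");

        return normalized + extension;
    }
}
```

With this, "Test123 - Dog - Page 1.pdf" is returned as is, and "Test123-Dog-Page1.pdf" becomes "Test123 - Dog - Page1.pdf".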
How can I get the whole text of a document concatenated into a single string? I'm trying to split the text by periods: string[] words = s.Split('.'); I want to take this text from a text document, but my document contains line breaks and empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So first I need to process the text file content to get the whole text as a single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way I would with list content, for example: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text from a text document into a string.
I think this is what you want:
var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
//replace Environment.NewLine with whatever line ending your file uses;
//substitute a space so words on adjacent lines don't run together
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
//modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
//collapse runs of two or more spaces into a single space
var withoutTwoSpaces = Regex.Replace(withoutUglyCharacters, " {2,}", " ");
var result = withoutTwoSpaces.Split('.').Where(i => i.Trim() != "").Select(i => i.Trim()).ToList();
So first you read all text from your file, then you remove all unwanted characters and then split by . and return non empty items
Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
    if (!File.Exists(filePath))
        throw new FileNotFoundException($"Could not find file { filePath }!");
    var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
    var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
    return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
    throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisable to wrap the method call in a try/catch.
var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
Creates a single string, joining the lines with a space and ignoring any lines which are purely whitespace or empty.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
Creates a string array, where the string is split at every period that is followed by whitespace.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
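A small self-contained sketch of that split, for illustration only. SentenceSplitter is a hypothetical name, and stripping the final period before splitting is an added assumption so that the last sentence does not keep its trailing dot.

```csharp
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    public static string[] Split(string text)
    {
        // Drop surrounding whitespace and the final period, which has
        // no whitespace after it and would otherwise stay attached.
        string trimmed = text.Trim().TrimEnd('.');

        // Split at every period that is followed by one or more whitespace characters.
        return Regex.Split(trimmed, @"\.\s+");
    }
}
```

SentenceSplitter.Split("I came. I saw. I conquered.") returns the three entries "I came", "I saw" and "I conquered".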
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;
I would suggest iterating through all the characters and checking whether each one is in the range 'a' <= char <= 'z' or is a space. If it matches the condition, add it to the newly created string; otherwise check whether it is the '.' character, and if it is, end the current line and start another one:
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Working online example
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Where(c => (char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == ' ' || c == '.').ToArray()).Split('.').Select(s => s.Trim());
I may be wrong about this, but you may not want to alter the string when you split it. For example, part of the text contains curly quotes (“ ” ’), and removing them may not be desired. That raises another issue: if the file was saved in the system's ANSI code page rather than UTF-8, reading it like below:
var stringFromFile = File.ReadAllText(fileLocation);
will not display those characters properly in a text box or the console, because ReadAllText assumes UTF-8 by default. The quotes show up as replacement characters (diamonds) in a text box on a form and as question marks (?) in the console. To keep the quotes and have them display properly, pass the OS's current ANSI encoding to ReadAllText. On .NET Framework that is Encoding.Default (ASCIIEncoding.Default is the same inherited property, not ASCII):
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
Below is code using a simple Split call to split the string on periods (.). Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = @"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
string bigString = stringFromFile.Replace(Environment.NewLine, " ");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}
I'm currently trying to strip part of a string that may contain a hyphen.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The C# code I'm currently trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
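You could even limit the number of substrings using an overload of the Split method (a count of 2 gives the part before the first hyphen plus the rest; a count of 1 would return the whole string).
stringin.Split(new[] { '-' }, 2)[0].Trim();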
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
which throws an ArgumentOutOfRangeException: there is no hyphen to find, so IndexOf returns -1 and the length argument becomes -2.
Make a simple change to your regex and it works with or without - ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^
It seems that you want the part of the string before the dash ('-') if the original contains one, or the original string if it doesn't.
If that's your case:
String Details = "hsh4a - 8989";
int dashIndex = Details.IndexOf('-');
if (dashIndex >= 0) Details = Details.Substring(0, dashIndex).TrimEnd();
I wouldn't use regex for this case if I were you, it makes the solution much more complex than it can be.
For a string that I am sure will contain no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings even though you are only looking for the first one. In that case, the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, and you can cut the string from there.
This is also a great place to start writing unit tests as well. They are very good for stuff like this.
Here's what the code could look like :
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
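To make the unit-testing suggestion concrete, here is a rough sketch that wraps the logic above in a method and checks a few inputs without any test framework. HyphenStripper, BeforeHyphen and HyphenStripperChecks are hypothetical names, and trimming the result is an added assumption based on the "test - 9894" example in the question; in a real project the checks would normally live in a test framework of your choice.

```csharp
using System;

public static class HyphenStripper
{
    // Returns the text before the first hyphen, or the input unchanged
    // when it contains no hyphen.
    public static string BeforeHyphen(string input)
    {
        int hyphenIndex = input.IndexOf('-');
        return hyphenIndex > -1
            ? input.Substring(0, hyphenIndex).Trim()
            : input;
    }
}

public static class HyphenStripperChecks
{
    public static void Run()
    {
        Check("test - 9894", "test");   // hyphen with surrounding spaces
        Check("test", "test");          // no hyphen at all
        Check("ho-something", "ho");    // hyphen without spaces
        Check("-leading", "");          // hyphen as the first character
        Check("", "");                  // empty input
    }

    private static void Check(string input, string expected)
    {
        string actual = HyphenStripper.BeforeHyphen(input);
        if (actual != expected)
            throw new InvalidOperationException(
                "Input '" + input + "': expected '" + expected + "' but got '" + actual + "'.");
    }
}
```

Calling HyphenStripperChecks.Run() throws on the first failing case and returns silently when every case passes.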
I am trying to find a substring in the string below.
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
using the prefix http://example.com/TIGS/SIM/Lists. How can I get the words Team Discussion from it?
Sometimes the strings will be
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
I need `Team Discussion`
http://example.com/TIGS/ALIF/Lists/Artifical Lift Discussion Forum 2/DispForm.aspx?ID=8
I need `Artifical Lift Discussion Forum 2`
If you're always following that pattern, I recommend @Justin's answer. However, if you want a more robust method, you can always couple the System.Uri and Path.GetDirectoryName methods, then perform a String.Split. Like this example:
String url = @"http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
System.Uri uri = new System.Uri(url);
String dir = Path.GetDirectoryName(uri.AbsolutePath);
String[] parts = dir.Split(new[]{ Path.DirectorySeparatorChar });
Console.WriteLine(parts[parts.Length - 1]);
The only major problem, however, is that you're going to wind up with a path that's been "encoded" (i.e. your space is now going to be represented by a %20); Uri.UnescapeDataString can decode it again.
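As a variation on the same idea, the sketch below skips Path.GetDirectoryName and reads Uri.Segments directly, then decodes the %20 with Uri.UnescapeDataString. UrlFolder and LastFolder are hypothetical names, and the sketch assumes the wanted folder is always the segment right before the page name.

```csharp
using System;

public static class UrlFolder
{
    public static string LastFolder(string url)
    {
        var uri = new Uri(url);

        // Segments splits the path into pieces such as "/", "TIGS/", ..., "DispForm.aspx".
        string[] segments = uri.Segments;
        if (segments.Length < 2)
            return string.Empty;

        // The folder is the segment just before the page; drop its trailing slash.
        string folder = segments[segments.Length - 2].TrimEnd('/');

        // Turn "%20" and other escapes back into plain characters.
        return Uri.UnescapeDataString(folder);
    }
}
```

For the first URL in the question this returns "Team Discussion". The query string (?ID=1779) is not part of Segments, so it does not interfere.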
This solution will get you the last directory of your URL regardless of how many directories are in your URL.
string[] arr = s.Split('/');
string lastPart = arr[arr.Length - 2];
You could combine this solution into one line, however it would require splitting the string twice, once for the values, the second for the length.
If you wanted to see a regular expression example:
string input = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
string given = "http://example.com/TIGS/SIM/Lists";
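System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(System.Text.RegularExpressions.Regex.Escape(given) + @"/([^/]+)/"); // escape the dots in the prefix; capture up to the next slash only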
System.Text.RegularExpressions.Match match = regex.Match(input);
Console.WriteLine(match.Groups[1]); // Team Discussion
Here's a simple approach, assuming that your URL always has the same number of slashes before the part you want:
var value = url.Split(new[]{'/'}, StringSplitOptions.RemoveEmptyEntries)[5];
Here is another solution that provides the following advantages:
Does not require the use of regular expressions.
Does not require a certain 'count' of slashes to be present (indexing based on a specific number). I consider this a key benefit because it makes the code less likely to fail if some part of the URL changes. Ultimately it is best to base your parsing logic on whichever part of the text's structure you consider least likely to change.
This method, however, DOES rely on the following assumptions, which I consider to be the least likely to change:
URL must have "/Lists/" right before target text.
URL must have "/" right after target text.
Basically, I just split the string twice, using text that I expect to be surrounding the area I am interested in.
String urlToSearch = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx";
String result = "";
// First, get everything after "/Lists/"
string[] temp1 = urlToSearch.Split(new String[] { "/Lists/" }, StringSplitOptions.RemoveEmptyEntries);
if (temp1.Length > 1)
{
// Next, get everything before the first "/"
string[] temp2 = temp1[1].Split(new String[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
result = temp2[0];
}
Your answer will then be stored in the 'result' variable.
I have strings with space-separated values and I would like to pick up the text from one index to another and save it in a variable. The strings are as follows:
John Doe Villa Grazia 323334I
I managed to store the id card (3rd column) by using:
if (line.Length > 39)
{
    // Substring's second argument is a length, not an end index;
    // the ID is the last column, so take everything from index 39
    idCard = line.Substring(39).Trim();
}
However, if I store the name and address (1st and 2nd columns) with Substring there will be empty spaces, since they are not of the same length (unlike the id cards). How can I store these 2 values and remove the unnecessary spaces BUT keep the spaces between name and surname?
Try this:
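// columns padded to fixed widths: 2 leading spaces, 16 for the name, 23 for the address, 7 for the ID
string line = new string(' ', 2) + "John Doe".PadRight(16) + "Villa Grazia".PadRight(23) + "323334I";
string name = line.Substring(2, 16).Trim();
string address = line.Substring(18, 23).Trim();
string id = line.Substring(41, 7).Trim();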
var values = line.Split(' ');
string name = values[0] + " " + values[1];
string idCard = values[4];
Beyond that, it is impossible to do without database lookups on names, unless you can be sure how many spaces the earlier columns contain.
Are these actually space-separated, or are they really fixed-width columns?
By that I mean: do the "columns" start at the same index into the string in each case? From the way you're describing the data it sounds like the latter, i.e. the ID column always starts at index 39 and runs for 7 characters.
In which case you need to pull the columns using the appropriate Substring calls, as you're already doing, and then use Trim() to cut off the surrounding spaces.
If the rows are, as it seems, fixed width, then you don't want to use Split at all.
How can you even get the ID like that, when everything in front of it is of variable length? If that was used for my name, "David Hedlund 323334I", the ID would start at pos 14, not 39.
Try this more dynamic approach:
var name = str.Substring(0, str.LastIndexOf(" "));
var id = str.Substring(str.LastIndexOf(" ")+1);
Looks like your parsing strategy will cause you a lot of trouble. You shouldn't count on the string's size in order to parse it.
Why not save the data in CSV format (John Doe, Villa Grazia, 323334I)?
That way, you can assume that each "column" will be separated by a comma, which will make your parsing efforts easier.
Possible "DOH!" question, but are you sure they are spaces and not tabs? Looks like it "could" be a tab-separated file.
Also, for brownie points, you could use String.Empty instead of "" for comparisons; that is mainly a readability preference.
The first approach would be - as already mentioned - a CSV-like structure with a defined token as the field separator.
The second one would be fixed field lengths so you know the first column goes from char 1 to char 20, the second column from char 21 to char 30, and so on.
There is nothing bad about this concept besides that the human readability may be poor if the columns are filled up to their maximum so no spaces remain between them.
You could write a helper function or class which knows about the field lengths and provides an index-based, fault-tolerant access to the particular column. This function would extract the particular string parts and remove the leading and trailing spaces but leave the spaces in between as they are.
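One possible shape for such a helper is sketched below. FixedWidthLayout and GetField are hypothetical names, and the column widths passed in (16, 23 and 7 in the usage note) are assumptions about the file rather than known values.

```csharp
using System;

public sealed class FixedWidthLayout
{
    private readonly int[] _starts;
    private readonly int[] _widths;

    // widths: the number of characters each column occupies, in order.
    public FixedWidthLayout(params int[] widths)
    {
        _widths = (int[])widths.Clone();
        _starts = new int[widths.Length];

        int offset = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            _starts[i] = offset;
            offset += widths[i];
        }
    }

    // Returns the trimmed content of a column, or an empty string when the
    // column does not exist or the line is too short to contain it.
    public string GetField(string line, int column)
    {
        if (line == null || column < 0 || column >= _widths.Length)
            return string.Empty;

        int start = _starts[column];
        if (start >= line.Length)
            return string.Empty;

        // Clamp the length so short lines do not throw.
        int length = Math.Min(_widths[column], line.Length - start);

        // Trim removes only leading and trailing spaces; inner spaces are kept.
        return line.Substring(start, length).Trim();
    }
}
```

With var layout = new FixedWidthLayout(16, 23, 7); and a line built as "John Doe".PadRight(16) + "Villa Grazia".PadRight(23) + "323334I", layout.GetField(line, 0) returns "John Doe", column 1 returns "Villa Grazia" and column 2 returns "323334I". A line that is too short yields empty strings for the missing columns instead of an exception.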
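If your values have fixed width, it's best not to split the line but to use the right indexes into the string.
// column widths assumed here: 16 for the name, 23 for the place, the rest for the card ID
var input = "John Doe".PadRight(16) + "Villa Grazia".PadRight(23) + "323334I";
var name = input.Substring(0, 16).TrimEnd();
var place = input.Substring(16, 23).TrimEnd(); // the second argument is a length, not an end index
var cardId = input.Substring(39).TrimEnd();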
Assuming your values cannot contain two consecutive spaces, we can maybe use a double space as a separator?
The following code will split your string on double spaces:
const string input = "John Doe        Villa Grazia           323334I";
// the separator below is two spaces
var entries = input.Split(new[]{"  "}, StringSplitOptions.RemoveEmptyEntries)
.Select(s=>s.Trim()).ToArray();
string name = entries[0];
string place = entries[1];
string idCard = entries[2];