How to write sorting in more efficient way? - c#

I have a project where I have to write an efficient code which will be working as fast as possible, but I have lack of knowledge do to it so...
So I have an asp.net(MVC) project using entity framework and as well I have to use Web Service to get info about details from it.
First I make request to Web service and is responds with a long string, which i have to parse in a list of strings for further activities.
I parse this string like this:
string resultString;
char[] delimiterChars = { ',', ':', '"', '}', '{' };
List<string> words = resultString.Split(delimiterChars).ToList();
From here i have list with a lot of rows, which have information and a lot of junk rows, which look like this:
I decided to clear this list from junk info, so as not to work with it in further methods and not to check this rows with ifs and so on:
for (int i = words.Count - 1; i >= 0; i--)
{
if (words[i] == "" || words[i] == "data" || words[i] == "array") words.RemoveAt(i);
}
After this I got clear list, but every decimal number like prices, sizes and so on got separated by ,, so if I had price 21,55 in my list it now looks like 2 elements 21 and 55. I cant just delete , from separators, because string I get as a response from web service mainly separates info by putting ,.
So I decided to glue decimal numbers back (before this block list elements looked like: 1)attrValue 2)21 3)55 and after like : 1)attrValue 2)21.55):
for (int i = 0; i < words.Count(); i++)
{
if (words[i] == "attrValue")
{
try
{
var seconPartInt = Int32.Parse(words[i + 2]);
words[i + 1] += "." + words[i + 2];
}
catch { }
}
if (words[i].Contains("\\/")) words[i].Replace("\\/", "/");
}
Every thing is ok, list is sorted, decimals are gathered, but speed is slowed down by 30%. After some tests with stopwatch and commenting blocks of code it became clear that this code above slows down the whole program too much...
To sum up:
I cant use that slow code and at the same time do not know how to make it work faster. May be the problem is that I convert string to int so as to check whether next element in the list is second part if my number.
How could I optimize my code?

The first thing you should do is use this version of Split to avoid getting empty entries (https://msdn.microsoft.com/en-us/library/ms131448(v=vs.110).aspx).
List<string> words = resultString.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries)
.ToList();
Also, if you know that "data" and "array" are in the string and you never want them, replace them with blanks before you split the string.
resultString = resultString.Replace("data", String.Empty)
.Replace("array", String.Empty);
What I don't understand is how the comma can be both a field delimiter and a meaningful character, and how you can possibly know the difference (i.e. whether 25,50 should be a single value or two values).

Related

How to contact whole text from file into the string avoiding empty lines beetwen strings

How to get whole text from document contacted into the string. I'm trying to split text by dot: string[] words = s.Split('.'); I want take this text from text document. But if my text document contains empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this first I need to process text file content to get whole text as single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't to do this same way as it would be with list content for example: string concat = String.Join(" ", text.ToArray());,
I'm not sure how to contact text into string from text document
I think this is what you want:
var fileLocation = #"c:\\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
//replace Environment.NewLine with any new line character your file uses
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, "");
//modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
var withoutTwoSpaces = withoutUglyCharacters.Replace(" ", " ");
var result = withoutTwoSpaces.Split('.').Where(i => i != "").Select(i => i.TrimStart()).ToList();
So first you read all text from your file, then you remove all unwanted characters and then split by . and return non empty items
Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
var sentences = Regex.Split(lines, #"\.[\s]{1,}?");
return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisory you surround the method call with a try/catch.
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
Creates a string, and ignores any lines which are purely whitespace or empty.
var sentences = Regex.Split(lines, #".[\s]{1,}?");
Creates a string array, where the string is split at every period and whitespace following the period.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line))), #"") : null;
I would suggest you to iterate through all characters and just check if they are in range of 'a' >= char <= 'z' or if char == ' '. If it matches the condition then add it to the newly created string else check if it is '.' character and if it is then end your line and add another one :
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Working online example
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Select(c => (char)(((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20) ? c : c == '.' ? '\n' : '\0')).ToArray()).Split('\n').Select(s => s.Trim());
I may be wrong about this. I would think that you may not want to alter the string if you are splitting it. Example, there are double/single quote(s) (“) in part of the string. Removing them may not be desired which brings up the possibly of a question, reading a text file that contains single/double quotes (as your example data text shows) like below:
var stringFromFile = File.ReadAllText(fileLocation);
will not display those characters properly in a text box or the console because the default encoding using the ReadAllText method is UTF8. Example the single/double quotes will display (replacement characters) as diamonds in a text box on a form and will be displayed as a question mark (?) when displayed to the console. To keep the single/double quotes and have them display properly you can get the encoding for the OS’s current ANSI encoding by adding a parameter to the ReadAllText method like below:
string stringFromFile = File.ReadAllText(fileLocation, ASCIIEncoding.Default);
Below is code using a simple split method to .split the string on periods (.) Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = #"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, ASCIIEncoding.Default);
string bigString = stringFromFile.Replace(Environment.NewLine, "");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}

Regex.Split and string.Split not working as expected

I am attempting to split strings using '?' as the delimiter. My code reads data from a CSV file, and certain symbols (like fractions) are not recognized by C#, so I am trying to replace them with a relevant piece of data (bond coupon in this case). I have print statements in the following code (which is embedded in a loop with index variable i) to test the output:
string[] l = lines[i][1].Split('?');
//string[] l = Regex.Split(lines[i][1], #"\?");
System.Console.WriteLine("L IS " + l.Length.ToString() + " LONG");
for (int j = 0; j < l.Length; j++)
System.Console.WriteLine("L["+ j.ToString() + "] IS " + l[j]);
if (l.Length > 1)
{
double cpn = Convert.ToDouble(lines[i][12]);
string couponFrac = (cpn - Math.Floor(cpn)).ToString().Remove(0,1);
lines[i][1] = l[0].Remove(l[0].Length-1) + couponFrac + l[1]; // Recombine, replacing '?' with CPN
}
The issue is that both split methods (string.Split() and Regex.Split() ) produce inconsistent results with some of the string elements in lines splitting correctly and the others not splitting at all (and thus the question mark is still in the string).
Any thoughts? I've looked at similar posts on split methods and they haven't been too helpful.
I had no problem using String.Split. Could you post your input and output?
If at all you could probably use String.Replace to replace your desired '?' with a character that does not occur in the string and then use String.Split on that character to split the resultant string for the same effect. (just a try)
I didn't have any trouble parsing the following.
var qsv = "now?is?the?time";
var keywords = qsv.Split('?');
keywords.Dump();
screenshot of code and output...
UPDATE:
There doesn't appear to be any problem with Split. There is a problem somewhere else because in this small scale test it works just fine. I would suggest you use LinqPad to test out these kinds of scenarios small scale.
var qsv = "TII 0 ? 04/15/15";
var keywords = qsv.Split('?');
keywords.Dump();
qsv = "TII 0 ? 01/15/22";
keywords = qsv.Split('?');
keywords.Dump();
New updated output:

sentence capitalizer

I'm woefully attempting a programming assignment. I'm not looking for a "this is how you do this" but more of a "what am I doing wrong?"
I'm attempting to capitalize the start of each sentence from a string input. So for example the string "Hello. my name is john. i like to ride bikes." I would modify the string and return it with capitals for example: "Hello. My name is john. I like to ride bikes." My logic seems a bit flawed and I'm very lost.
What I have so far below. Basically all I'm doing is testing for a punctuation signifying the end of a sentence. And then trying to replace the character. Also testing if it's the at the end of the string as to not create IndexOutOfRange exceptions. Although, that's all I've been getting :(
private string SentenceCapitalizer(string input)
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '.' || input[i] == '!' || input[i] == '?')
{
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
}
}
return input;
}
Any help is greatly appreciated. I'm just learning C# so the most basic of help would be of service. I don't know much :P
Instead of
if (!(input[i + 2] >= input.Length))
It should be
if (!(i + 2 >= input.Length))
You are comparing indices, not characters
You are checking if your current index is less than or equal to the length of the string and then attempting to alter an index 2 further along
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
Should be changed to
if (!((i + 2) >= input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
This will check that there is a value 2 places after a punctuation mark. Also make use of >= rather than == since you're jumping 2 you might end up going over the length of the array where == still returns false but there is no index.
Strings are immutable, you can't do:
var str = "123";
str.Replace('1', '2');
You have to do:
var str = "123";
str = str.Replace('1', '2');
Ok, others have provided you with some pointers to stop the obvious errors, but I'll try to give you some thoughts on how to best implement this.
It is worth thinking about this as a 3-step process
Tokenize the string into sentences
Ensure that the first character of each token is uppercase
reconstruct the string by joining the tokens back together
(1) I'll leave to your imagination, but the idea is to end up with an array of strings with each element representing a "sentence" according to your requirement
(2) Is pretty much as simple as
// Upercase character 0, and join it to everything from character 1 onwards
var fixedToken = token[0].ToUpper(CultureInfo.CurrentCulture)
+ token.Substring(1);
(3) Is also simple
// reconstruct string by joining all tokens with a space
var reconstructed = String.Join(" ",tokens);

Extracting values from a string in C#

I have the following string which i would like to retrieve some values from:
============================
Control 127232:
map #;-
============================
Control 127235:
map $;NULL
============================
Control 127236:
I want to take only the Control . Hence is there a way to retrieve from that string above into an array containing like [127232, 127235, 127236]?
One way of achieving this is with regular expressions, which does introduce some complexity but will give the answer you want with a little LINQ for good measure.
Start with a regular expression to capture, within a group, the data you want:
var regex = new Regex(#"Control\s+(\d+):");
This will look for the literal string "Control" followed by one or more whitespace characters, followed by one or more numbers (within a capture group) followed by a literal string ":".
Then capture matches from your input using the regular expression defined above:
var matches = regex.Matches(inputString);
Then, using a bit of LINQ you can turn this to an array
var arr = matches.OfType<Match>()
.Select(m => long.Parse(m.Groups[1].Value))
.ToArray();
now arr is an array of long's containing just the numbers.
Live example here: http://rextester.com/rundotnet?code=ZCMH97137
try this (assuming your string is named s and each line is made with \n):
List<string> ret = new List<string>();
foreach (string t in s.Split('\n').Where(p => p.StartsWith("Control")))
ret.Add(t.Replace("Control ", "").Replace(":", ""));
ret.Add(...) part is not elegant, but works...
EDITED:
If you want an array use string[] arr = ret.ToArray();
SYNOPSYS:
I see you're really a newbie, so I try to explain:
s.Split('\n') creates a string[] (every line in your string)
.Where(...) part extracts from the array only strings starting with Control
foreach part navigates through returned array taking one string at a time
t.Replace(..) cuts unwanted string out
ret.Add(...) finally adds searched items into returning list
Off the top of my head try this (it's quick and dirty), assuming the text you want to search is in the variable 'text':
List<string> numbers = System.Text.RegularExpressions.Regex.Split(text, "[^\\d+]").ToList();
numbers.RemoveAll(item => item == "");
The first line splits out all the numbers into separate items in a list, it also splits out lots of empty strings, the second line removes the empty strings leaving you with a list of the three numbers. if you want to convert that back to an array just add the following line to the end:
var numberArray = numbers.ToArray();
Yes, the way exists. I can't recall a simple way for It, but string is to be parsed for extracting this values. Algorithm of it is next:
Find a word "Control" in string and its end
Find a group of digits after the word
Extract number by int.parse or TryParse
If not the end of the string - goto to step one
realizing of this algorithm is almost primitive..)
This is simplest implementation (your string is str):
int i, number, index = 0;
while ((index = str.IndexOf(':', index)) != -1)
{
i = index - 1;
while (i >= 0 && char.IsDigit(str[i])) i--;
if (++i < index)
{
number = int.Parse(str.Substring(i, index - i));
Console.WriteLine("Number: " + number);
}
index ++;
}
Using LINQ for such a little operation is doubtful.

How to split a string while preserving line endings?

I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
Accepted answer
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Replace allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, #"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.

Categories