Load huge txt file for winform quickly - c#

I am building a Sinhala-English dictionary. I have a file that contains the Sinhala meaning of every English word, and I thought I would load it while the form is loading. So I used the following code in the Form1_Load method to read the whole file content into a string variable:
private string DictionaryWords = "";
private string ss = null;
...
private void Form1_Load(object sender, EventArgs e)
{
this.BackColor = ColorTranslator.FromHtml("#AFC3E0");
string fileName = @"SI-utf8.Txt";
using (StreamReader sr = File.OpenText(fileName))
{
while ((ss = sr.ReadLine()) != null)
{
DictionaryWords += ss;
}
}
}
But unfortunately that txt file has 130000+ lines and its size is more than 5MB, so my WinForm is not loading.
I need to load this faster so that the WinForm can use Regex to get the right meaning for every English word.
Could anybody tell me a method to do this? I have tried everything.
I need to load this huge file into my project within 15, more or less, and use Regex to find each English word.

Well, there is too little code to analyze. I suspect that
DictionaryWords += ss;
is the culprit: appending to a string 130000 times means re-creating an ever longer string over and over again, which can well bring the system to its knees, but I have no rigorous proof (I've asked about DictionaryWords in the comments). Another possible candidate is your regular expression, which I haven't seen.
That's why I'll try to solve the problem from scratch.
We have a (long) dictionary in SI-utf8.Txt.
We should load the dictionary without freezing the UI.
We should use the loaded dictionary to translate English texts.
I have got something like this:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
...
// Loading dictionary (async, since dictionary can be quite long)
// static: we want just one dictionary for all the instances
private static readonly Task<IReadOnlyDictionary<string, string>> s_Dictionary =
Task<IReadOnlyDictionary<string, string>>.Run(() => {
char[] delimiters = { ' ', '\t' };
IReadOnlyDictionary<string, string> result = File
.ReadLines(@"SI-utf8.Txt")
.Where(line => !string.IsNullOrWhiteSpace(line))
.Select(line => line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
.Where(items => items.Length == 2)
.ToDictionary(items => items[0],
items => items[1],
StringComparer.OrdinalIgnoreCase);
return result;
});
Then we need a translation part:
// Let it be the simplest regex: English letters and apostrophes;
// you can improve it if you like
private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");
// Translation is async, since we have to wait for the dictionary to be loaded
private static async Task<string> Translate(string englishText) {
if (string.IsNullOrWhiteSpace(englishText))
return englishText;
var dictionary = await s_Dictionary;
return s_EnglishWords.Replace(englishText,
match => dictionary.TryGetValue(match.Value, out var translation)
? translation // if we know the translation
: match.Value); // if we don't know the translation
}
Usage:
// Note, that button event should be async as well
private async void button1_Click(object sender, EventArgs e) {
TranslationTextBox.Text = await Translate(OriginalTextBox.Text);
}
Edit: So, DictionaryWords is a string and thus
DictionaryWords += ss;
is the culprit. Please don't append to a string in a (long) loop: each append re-creates the string, which is slow. If you insist on looping, use StringBuilder:
// Let's pre-allocate a buffer for 6 million chars
StringBuilder sb = new StringBuilder(6 * 1024 * 1024);
using (StreamReader sr = File.OpenText(fileName))
{
while ((ss = sr.ReadLine()) != null)
{
sb.Append(ss);
}
}
DictionaryWords = sb.ToString();
Or why loop at all? Let .NET do the work for you (note that, unlike the ReadLine loop, ReadAllText keeps the line breaks):
DictionaryWords = File.ReadAllText(@"SI-utf8.Txt");
Edit 2: If the actual file is not that huge (that is, DictionaryWords += ss; alone spoils the fun), you can stick to a simple synchronous solution:
private static readonly Regex s_EnglishWords = new Regex("[A-Za-z']+");
private static readonly IReadOnlyDictionary<string, string> s_Dictionary = File
.ReadLines(@"SI-utf8.Txt")
.Where(line => !string.IsNullOrWhiteSpace(line))
.Select(line => line.Split(new char[] { ' ', '\t' },
StringSplitOptions.RemoveEmptyEntries))
.Where(items => items.Length == 2)
.ToDictionary(items => items[0],
items => items[1],
StringComparer.OrdinalIgnoreCase);
private static string Translate(string englishText) {
if (string.IsNullOrWhiteSpace(englishText))
return englishText;
return s_EnglishWords.Replace(englishText,
match => s_Dictionary.TryGetValue(match.Value, out var translation)
? translation
: match.Value);
}
And then the usage is quite simple:
// No async here: Translate is synchronous in this version
private void button1_Click(object sender, EventArgs e) {
TranslationTextBox.Text = Translate(OriginalTextBox.Text);
}

Related

How can I simulate user input from a console?

I'm doing some challenges on HackerRank. I usually use a Windows Forms project in Visual Studio for debugging, but I realized I lose a lot of time typing in the test cases. So I'd like a suggestion for an easy way to simulate Console.ReadLine().
The challenges usually describe the test cases with something like this:
5
1 2 1 3 2
3 2
And then it is read like this, using three ReadLine calls:
static void Main(String[] args) {
int n = Convert.ToInt32(Console.ReadLine());
string[] squares_temp = Console.ReadLine().Split(' ');
int[] squares = Array.ConvertAll(squares_temp,Int32.Parse);
string[] tokens_d = Console.ReadLine().Split(' ');
int d = Convert.ToInt32(tokens_d[0]);
int m = Convert.ToInt32(tokens_d[1]);
// your code goes here
}
Right now I am thinking of creating a file testCase.txt and using StreamReader.
using (StreamReader sr = new StreamReader("testCase.txt"))
{
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
This way I can replace Console.ReadLine() with sr.ReadLine(), but it requires keeping a text editor open, deleting the old case, copying in the new one and saving the file each time.
So is there a way I can use a TextBox, so that I only need to copy/paste into the text box and use StreamReader or something similar to read from it?
You can use the StringReader class to read from a string rather than a file.
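As a rough sketch of that idea (the class and method names below are made up for illustration): a StringReader can be handed to Console.SetIn, so that Console.ReadLine() reads from a string, which could just as well be the Text of a text box.

```csharp
using System;
using System.IO;

static class StringReaderDemo
{
    // Parses the three-line test case from the question using any TextReader.
    public static int[] ReadSquares(TextReader input, out int d, out int m)
    {
        input.ReadLine(); // first line: number of squares (not needed for parsing)
        int[] squares = Array.ConvertAll(input.ReadLine().Split(' '), int.Parse);
        string[] tokens = input.ReadLine().Split(' ');
        d = Convert.ToInt32(tokens[0]);
        m = Convert.ToInt32(tokens[1]);
        return squares;
    }

    static void Main()
    {
        // In a WinForms test harness this could be someTextBox.Text instead.
        string testCase = "5\n1 2 1 3 2\n3 2";

        // Redirect standard input so that Console.ReadLine() reads the string.
        Console.SetIn(new StringReader(testCase));

        int d, m;
        int[] squares = ReadSquares(Console.In, out d, out m);
        Console.WriteLine("{0} squares, d={1}, m={2}", squares.Length, d, m);
        // prints "5 squares, d=3, m=2"
    }
}
```

Because only standard input is redirected, the challenge code itself can keep calling Console.ReadLine() unchanged.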
The solution you accepted doesn't really emulate Console.ReadLine(), so you can't paste your code directly into HackerRank.
I solved it this way:
Just paste this class above the static Main method, or anywhere inside the main class, to hide the original System.Console:
class Console
{
public static Queue<string> TestData = new Queue<string>();
public static void SetTestData(string testData)
{
TestData = new Queue<string>(testData.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries).Select(x=>x.TrimStart()));
}
public static void SetTestDataFromFile(string path)
{
TestData = new Queue<string>(File.ReadAllLines(path));
}
public static string ReadLine()
{
return TestData.Dequeue();
}
public static void WriteLine(object value = null)
{
System.Console.WriteLine(value);
}
public static void Write(object value = null)
{
System.Console.Write(value); // Write, not WriteLine: no trailing newline
}
}
and use it this way.
//Paste the Console class here.
static void HackersRankProblem(String[] args)
{
Console.SetTestData(@"
6
6 12 8 10 20 16
");
int n = int.Parse(Console.ReadLine());
string arrStr = Console.ReadLine();
// ...
}
Now your code will look the same, and you can test with as much data as you want without changing your code.
Note: If you need more complex Write or WriteLine methods, just add them and forward the calls to the original System.Console methods.
Just set Application Arguments: <input.txt
and provide in input.txt your input text.
Be careful to save the file with ANSI encoding.

How to complete aspx connection string from text file

I must use a text file "db.txt" which contains the names of the server and database to complete my connection string.
db.txt looks like this:
<Anfang>
SERVER==dbServer\SQLEXPRESS
DATABASE==studentweb
<Ende>
The connection string:
string constr = ConfigurationManager.ConnectionStrings["DRIVER={SQL Server}; SERVER=SERVER DATABASE=DB UID=;PWD=;LANGUAGE=Deutsch;Trusted_Connection=YES"].ConnectionString;
Unfortunately we are only allowed to use classic ASP.NET (C# 2.0) and not the web.config.
I've searched a lot, but found nothing that helps.
Does anybody have an idea how to make it work?
Here is something to get you going.
In a nutshell, I put the DBInfo file through a method that reads the file line by line. When I see the line <anfang> I know the next line will be important, and when I see the line <ende> I know it's the end, so I need to grab everything in between. Hence why I came up with the booleans areWeThereYet and isItDoneYet which I use to start and stop gathering data from the file.
In this snippet I use a Dictionary<string, string> to store and return the values but, you could use something different. At first I was going to create a custom class that would hold all the DB information but, since this is a school assignment, we'll go step by step and start by using what's already available.
using System;
using System.Collections.Generic;
namespace _41167195
{
class Program
{
static void Main(string[] args)
{
string pathToDBINfoFile = @"M:\StackOverflowQuestionsAndAnswers\41167195\41167195\sample\DBInfo.txt";//the path to the file holding the info
Dictionary<string, string> connStringValues = DoIt(pathToDBINfoFile);//Get the values from the file using a method that returns a dictionary
string serverValue = connStringValues["SERVER"];//just for you to see what the results are
string dbValue = connStringValues["DATABASE"];//just for you to see what the results are
//Now you can adjust the line below using the stuff you got from above.
//string constr = ConfigurationManager.ConnectionStrings["DRIVER={SQL Server}; SERVER=SERVER DATABASE=DB UID=;PWD=;LANGUAGE=Deutsch;Trusted_Connection=YES"].ConnectionString;
}
private static Dictionary<string, string> DoIt(string incomingDBInfoPath)
{
Dictionary<string, string> retVal = new Dictionary<string, string>();//initialize a dictionary, this will be our return value
using (System.IO.StreamReader sr = new System.IO.StreamReader(incomingDBInfoPath))
{
string currentLine = string.Empty;
bool areWeThereYet = false;
bool isItDoneYet = false;
while ((currentLine = sr.ReadLine()) != null)//while there is something to read
{
if (currentLine.ToLower() == "<anfang>")
{
areWeThereYet = true;
continue;//force the while to go into the next iteration
}
else if (currentLine.ToLower() == "<ende>")
{
isItDoneYet = true;
}
if (areWeThereYet && !isItDoneYet)
{
string[] bleh = currentLine.Split(new string[] { "==" }, StringSplitOptions.RemoveEmptyEntries);
retVal.Add(bleh[0], bleh[1]);//add the value to the dictionary
}
else if (isItDoneYet)
{
break;//we are done, get out of here
}
else
{
continue;//we don't need this line
}
}
}
return retVal;
}
}
}
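To finish the commented-out line, the two values can be dropped into the string with string.Format. This is only a sketch: the class name is made up, and the semicolons between the keys are my assumption (the string in the question has none between SERVER and DATABASE). Note that ConfigurationManager.ConnectionStrings looks entries up by name in the config file, so the finished string would go straight to your connection object instead.

```csharp
using System;
using System.Collections.Generic;

static class ConnectionStringDemo
{
    // Fills the SERVER and DATABASE slots of the ODBC-style string from the question.
    public static string Build(IDictionary<string, string> values)
    {
        // "{{" and "}}" are escaped literal braces; {0} and {1} are the placeholders.
        return string.Format(
            "DRIVER={{SQL Server}};SERVER={0};DATABASE={1};UID=;PWD=;LANGUAGE=Deutsch;Trusted_Connection=YES",
            values["SERVER"], values["DATABASE"]);
    }

    static void Main()
    {
        // Same keys and values as the sample db.txt.
        Dictionary<string, string> values = new Dictionary<string, string>();
        values.Add("SERVER", @"dbServer\SQLEXPRESS");
        values.Add("DATABASE", "studentweb");
        Console.WriteLine(Build(values));
    }
}
```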

Array Randomly Splitting String

The string is being split using commas as delimiters. Every time the string is printed, it appears in a different order. The string is variable:
' String: Z1,TA,H999.00,T999.00 '
It splits successfully; however, even if the string is exactly the same, when printing the array we get random new lines and random missing data.
When printed to the text box it is either correctly split, or looks like:
-Z1
-T
-H999.00
-T999.
-00
If the loop runs again, we get different results. Occasionally it is displayed correctly.
I assume it's this code: (EDIT: IT'S NOT)
string[] ArrayCleanDataRX = CleanDataRX.Split(',');
foreach (string EntireList1 in ArrayCleanDataRX)
{
TxtZ1.AppendText(EntireList1);
TxtZ1.AppendText("\n");
}
Any Suggestions would be brilliant.
Thank you.
UPDATE: (Still Unsolved)
Update 2: More Code -
#region Global Strings
public string DirtyDataRX; //String contains Data from Serial
public string Z1 = "Z1"; //String to check if Data from serial Contains Z1
private void FeedbackProcessing(object sender, EventArgs e)
{
TxtDirtyDataRX.AppendText(DirtyDataRX); //Populate TxtDirtyTest with DirtyText String
var CleanDataRX = DirtyDataRX; //Clean Data = Dirty Text
var charstoremove = new string[] { "|", "-", "%", " ", " ", " ", "~", "$", "?", "'", ".,", "..,", "..", "..:", ".:", "...", "....", ".....", "......", "......", "......", "-" }; // Contents of CharsToRemove (Removes Bad Charecters from raw serial)
foreach (var c in charstoremove) //C is Char(s) to remove
{
CleanDataRX = CleanDataRX.Replace(c, string.Empty); //Replace C in CleanDataRX with nothing.
}
TxtCleanDataRX.AppendText(CleanDataRX); //Show DirtyDataRX in DirtyDataRX Textbox
#region IfZones and Array Loops
if (CleanDataRX.Contains(Z1)) // If CleanDataRX Contains "Z1" Run Code
{
string[] ArrayZ1 = CleanDataRX.Split(','); //New String Array from CleanDaraRX. Split using Comma as Delimiter
foreach (string StrArrayZ1 in ArrayZ1) // New string Called StrArrayZ1 in ArrayCleanDataRX
{
TxtZ1.AppendText(StrArrayZ1); //Append Textbox with String Array, Loop untill Empty
}
}
#region DirtyRX
private void serialPort1_DataReceived(object sender, System.IO.Ports.SerialDataReceivedEventArgs e)
{
DirtyDataRX = serialPort1.ReadExisting();
this.Invoke(new EventHandler(FeedbackProcessing));
}
#endregion
Code I think is irrelevant to the problem is left out to simplify the problem.
Note: Some array names have been edited slightly.
This piece of code is not enough to answer the question. However, if you are using multithreading, you have to use locks to avoid unpredictable results.
Example:
lock(TxtZ1)
{
string[] ArrayCleanDataRX = CleanDataRX.Split(',');
foreach (string EntireList1 in ArrayCleanDataRX)
{
TxtZ1.AppendText(EntireList1);
TxtZ1.AppendText("\n");
}
}
TxtZ1 is a textbox?
Then you should rather do something like:
string[] ArrayCleanDataRX = CleanDataRX.Split(',');
StringBuilder sb = new StringBuilder();
foreach (string EntireList1 in ArrayCleanDataRX)
{
sb.AppendLine(EntireList1);
}
TxtZ1.AppendText(sb.ToString());
I think it would also solve it if you use Environment.NewLine instead of \n. Strange things tend to happen with \n in Windows controls...
Thanks for your help. I solved the problem on my own...
It wasn't exactly the loop, or the split... sorry.
This may help others, though.
The error occurred because the string was still being built while it was processed. The problem is in how the data is read in the serial port's DataReceived handler. This is how it was:
DirtyDataRX = serialPort1.ReadExisting();
this.Invoke(new EventHandler(FeedbackProcessing));
As the data coming through is line-oriented, ReadLine should be used:
DirtyDataRX = serialPort1.ReadLine();
this.Invoke(new EventHandler(FeedbackProcessing));
Using ReadLine instead solves the issue at hand.
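To show why the chunked reads garbled the output, here is a small sketch with no serial port involved. LineAssembler is a hypothetical helper, not part of SerialPort, and it assumes each message ends with '\n'. It buffers whatever arrives and only hands back complete lines, which is roughly what ReadLine does for you.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects arbitrary chunks and returns complete lines only.
class LineAssembler
{
    private readonly StringBuilder buffer = new StringBuilder();

    // Appends a chunk and returns every line it completed (without the line break).
    public List<string> Feed(string chunk)
    {
        List<string> lines = new List<string>();
        buffer.Append(chunk);

        string pending = buffer.ToString();
        int newline;
        while ((newline = pending.IndexOf('\n')) >= 0)
        {
            lines.Add(pending.Substring(0, newline).TrimEnd('\r'));
            pending = pending.Substring(newline + 1);
        }

        // Keep the unfinished tail for the next chunk.
        buffer.Length = 0;
        buffer.Append(pending);
        return lines;
    }
}

static class LineAssemblerDemo
{
    static void Main()
    {
        LineAssembler assembler = new LineAssembler();
        // One message arriving in three pieces, as ReadExisting() might deliver it.
        foreach (string chunk in new[] { "Z1,TA,H9", "99.00,T999.", "00\n" })
        {
            foreach (string line in assembler.Feed(chunk))
            {
                Console.WriteLine(string.Join(" | ", line.Split(',')));
                // prints "Z1 | TA | H999.00 | T999.00"
            }
        }
    }
}
```

Splitting each chunk on its own, as the original handler did, would have produced fragments such as "H9" and "00" instead.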

Converting list result into textbox.Text property

I am making a little word guessing game, and I'd like for the user to be able to guess the words. I have the following code, and it loads words from a text file located at string filename into a list box. However, I would like for the words to appear one by one in a textbox. The catch does not throw any errors out at all, the textbox is simply empty. Is this possible and could you show me some code so I can have a play please? Cheers!
I can then hide this word using one textbox, and trigger some code to move onto the next word if the typed word into the second, visible textbox is correct.
async private void LoadWords(string filename)
{
var wordList = new List<String>();
Windows.Storage.StorageFolder localFolder = Windows.ApplicationModel.Package.Current.InstalledLocation;
try
{
Windows.Storage.StorageFile sampleFile = await localFolder.GetFileAsync(filename);
var words = await Windows.Storage.FileIO.ReadLinesAsync(sampleFile);
foreach (var word in words)
{
wordList.Add(word);
}
HiddenWordBox.Text = string.Join(Environment.NewLine, wordList);
}
catch (Exception e)
{
MessageDialog CatchMsg = new MessageDialog(e.Message);
}
}
Forgive me for my stupid question but isn't the following code required for it to be seen?
await CatchMsg.ShowAsync();
Not necessarily with the await operator.
Change
HiddenWordBox.Text = string.Join(Environment.NewLine, wordList);
To
HiddenWordBox.Text = string.Join(Environment.NewLine, wordList.ToArray());
and the output should be as follows: http://ideone.com/DmFI0Z
(Older .NET versions have no string.Join overload that accepts a List; from .NET 4 onward there is one that takes an IEnumerable<string>.)

Writing from a list to a text file C#

How do I code the whole list into the text file with commas in between each bit of data? Currently it is creating the file newData, but it is not putting in the variables from the list. Here is what I have so far.
public partial class Form1 : Form {
List<string> newData = new List<string>();
}
Above is where I create my list. Below is where I write it to the file.
private void saveToolStripMenuItem_Click(object sender, EventArgs e) {
TextWriter tw = new StreamWriter("NewData.txt");
tw.WriteLine(newData);
buttonSave.Enabled = true;
textBoxLatitude.Enabled = false;
textBoxLongtitude.Enabled = false;
textBoxElevation.Enabled = false;
}
And below is where the variables are coming from.
private void buttonSave_Click(object sender, EventArgs e) {
newData.Add (textBoxLatitude.Text);
newData.Add (textBoxLongtitude.Text);
newData.Add (textBoxElevation.Text);
textBoxLatitude.Text = null;
textBoxLongtitude.Text = null;
textBoxElevation.Text = null;
}
While you can use String.Join as others have mentioned they're ignoring three important things:
The fact that what you're really trying to do is write a comma-separated values file
The input that you're receiving and whether or not it will have commas in it
If you sanitize your input, what the current culture on the thread is when you write it out to the file
You want to write a comma-delimited file. There's no standardized format for this, but you do have to be careful of string content, especially in your case, where you're getting user input. Consider the following input:
latitude = "39,41"
longitude = "41,20"
There are a number of countries where the comma is used as a decimal separator, so this kind of input is very possible, depending on how distributed your application is (I'd be even more concerned if this was a website, personally).
And for the elevation, a comma is entirely possible in most other places, where it is used as the thousands separator:
elevation = 20,000
In all of the other answers, your output for the line in the file will be:
39,41,41,20,20,000
Which when parsed (assuming it will be parsed, you're creating a machine-readable format) will fail.
What you want to do is parse the content first into a decimal and then output that.
Assuming you sanitize your input like so:
decimal latitude = Decimal.Parse(textBoxLatitude.Text);
decimal longitude = Decimal.Parse(textBoxLongitude.Text);
decimal elevation = Decimal.Parse(textBoxElevation.Text);
You would then format the values so that there are no commas (if you want).
To that end, I really recommend that you use a dedicated CSV writer/parser (try ServiceStack's serializer on NuGet, or others, if you prefer), which accounts for commas within the content you want separated by commas.
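As a sketch of the parse-then-format step (the names are made up, and the hand-built NumberFormatInfo only stands in for a culture with a decimal comma; a real form would more likely pass CultureInfo.CurrentCulture), the values can be parsed with the input's number format and written back with the invariant culture, so the only commas left are the separators:

```csharp
using System;
using System.Globalization;

static class CsvNumberDemo
{
    // Parses text with the given number format, then writes it back with the
    // invariant culture ('.' as decimal separator, no digit grouping).
    public static string Normalize(string input, IFormatProvider inputFormat)
    {
        decimal value = decimal.Parse(input, NumberStyles.Number, inputFormat);
        return value.ToString(CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        // Stand-in for a culture that uses ',' as the decimal separator.
        NumberFormatInfo commaDecimal = new NumberFormatInfo();
        commaDecimal.NumberDecimalSeparator = ",";
        commaDecimal.NumberGroupSeparator = ".";

        string line = string.Join(",", new[]
        {
            Normalize("39,41", commaDecimal),
            Normalize("41,20", commaDecimal),
            Normalize("20,000", CultureInfo.InvariantCulture) // ',' groups thousands here
        });
        Console.WriteLine(line); // three fields, two separating commas
    }
}
```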
private void saveToolStripMenuItem_Click(object sender, EventArgs e)
{
// Dispose the writer so the buffered text is actually flushed to the file
using (TextWriter tw = new StreamWriter("NewData.txt"))
{
tw.WriteLine(String.Join(", ", newData));
}
// Add appropriate error detection
}
The above is not checked for syntax, but the key concept is String.Join.
In response to the discussion in both main answer threads, here is an example from my older code of a more robust way to handle CSV output:
public const string Quote = "\"";
public static void EmitCsvLine(TextWriter report, IList<string> values)
{
List<string> csv = new List<string>(values.Count);
for (var z = 0; z < values.Count; z += 1)
{
csv.Add(Quote + values[z].Replace(Quote, Quote + Quote) + Quote);
}
string line = String.Join(",", csv);
report.WriteLine(line);
}
This could be made slightly more general with an IEnumerable<object>, but in the code I took this from, I didn't have the need to.
You cannot output the list just by calling tw.WriteLine(newData);
But something like this will achieve that:
tw.WriteLine(string.Join(", ", newData));
you could:
StringBuilder b = new StringBuilder();
foreach (string s in yourList)
{
b.Append(s);
b.Append(", ");
}
string dir = @"c:\mypath"; // verbatim string: "\m" is not a valid escape sequence
File.WriteAllText(dir, b.ToString());
You have to iterate the List (not tested) or use string.Join, as the other users suggested (you need to convert your list to an array then)
private void saveToolStripMenuItem_Click(object sender, EventArgs e)
{
TextWriter tw = new StreamWriter("NewData.txt");
for (int i = 0; i < newData.Count; i++)
{
tw.Write(newData[i]);
if(i < newData.Count-1)
{
tw.Write(",");
}
}
tw.Close();
buttonSave.Enabled = true;
textBoxLatitude.Enabled = false;
textBoxLongtitude.Enabled = false;
textBoxElevation.Enabled = false;
}
