Remove Byte Order Mark from a File.ReadAllBytes (byte[]) - c#

I have an HTTPHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a Byte Order Mark (due to a bug in TFS 2005 auto merge), and in Firefox the BOM is being read as part of the actual content, so it's screwing up my class names etc. How can I strip out the BOM characters? Is there an easy way to do this without manually going through the byte array looking for "ï»¿"?

Expanding on Jon's comment with a sample.
var name = GetFileName();
var bytes = System.IO.File.ReadAllBytes(name);
System.IO.File.WriteAllBytes(name, bytes.Skip(3).ToArray());
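Note that the sample above drops the first three bytes unconditionally, so it would also damage a file that has no BOM. A minimal sketch of a guarded, in-memory version follows; the class and method names (BomHelper, StripUtf8Bom) are hypothetical and only the UTF-8 BOM (EF BB BF) is handled.

```csharp
using System;

public static class BomHelper
{
    // UTF-8 byte order mark: EF BB BF.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns a copy of the input without a leading UTF-8 BOM.
    // If there is no BOM, the original array is returned unchanged.
    public static byte[] StripUtf8Bom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        bool hasBom = bytes.Length >= Utf8Bom.Length
            && bytes[0] == Utf8Bom[0]
            && bytes[1] == Utf8Bom[1]
            && bytes[2] == Utf8Bom[2];
        if (!hasBom) return bytes;

        var result = new byte[bytes.Length - Utf8Bom.Length];
        Array.Copy(bytes, Utf8Bom.Length, result, 0, result.Length);
        return result;
    }
}
```

In the handler this could be applied to each file before combining, for example BomHelper.StripUtf8Bom(File.ReadAllBytes(name)).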

Expanding JaredPar sample to recurse over sub-directories:
using System.Linq;
using System.IO;
namespace BomRemover
{
/// <summary>
/// Remove UTF-8 BOM (EF BB BF) of all *.php files in current & sub-directories.
/// </summary>
class Program
{
private static void removeBoms(string filePattern, string directory)
{
foreach (string filename in Directory.GetFiles(directory, filePattern))
{
var bytes = System.IO.File.ReadAllBytes(filename);
if(bytes.Length > 2 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
{
System.IO.File.WriteAllBytes(filename, bytes.Skip(3).ToArray());
}
}
foreach (string subDirectory in Directory.GetDirectories(directory))
{
removeBoms(filePattern, subDirectory);
}
}
static void Main(string[] args)
{
string filePattern = "*.php";
string startDirectory = Directory.GetCurrentDirectory();
removeBoms(filePattern, startDirectory);
}
}
}
I needed this piece of C# code after discovering that a UTF-8 BOM corrupts the file when you try to do a basic PHP file download.

var text = File.ReadAllText(args.SourceFileName);
var streamWriter = new StreamWriter(args.DestFileName, args.Append, new UTF8Encoding(false));
streamWriter.Write(text);
streamWriter.Close();

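Another way, assuming UTF-8 to ASCII. This only suits content that is pure ASCII, because Encoding.ASCII replaces every non-ASCII character with '?'.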
File.WriteAllText(filename, File.ReadAllText(filename, Encoding.UTF8), Encoding.ASCII);

For larger files, use the following code, which streams line by line and is therefore memory efficient:
StreamReader sr = new StreamReader(path: @"<Input_file_full_path_with_byte_order_mark>",
detectEncodingFromByteOrderMarks: true);
// Note: UnicodeEncoding writes UTF-16 (little-endian here), without a BOM.
StreamWriter sw = new StreamWriter(path: @"<Output_file_without_byte_order_mark>",
append: false,
encoding: new UnicodeEncoding(bigEndian: false, byteOrderMark: false));
var lineNumber = 0;
while (!sr.EndOfStream)
{
sw.WriteLine(sr.ReadLine());
lineNumber += 1;
if (lineNumber % 100000 == 0)
Console.Write("\rLine# " + lineNumber.ToString("000000000000"));
}
sw.Flush();
sw.Close();
sr.Close(); // release the input file as well

Related

Read & write a single line from a file without overwrite [duplicate]

I have two text files, Source.txt and Target.txt. The source will never be modified and contains N lines of text. I want to delete a specific line of text in Target.txt and replace it with a specific line of text from Source.txt. I know which line number I need: it is line number 2 in both files.
I have something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(@"C:\target.xml");
using StreamWriter writer = new StreamWriter(@"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the writer, the target file gets erased. It writes the lines, but when I open the target file afterwards it only contains the copied lines; the rest are lost.
What can I do?
The easiest way is:
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
Usage:
lineChanger("new content for this line" , "sample.text" , 34);
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can move the file afterwards, once you are sure that the write operation has succeeded (no exception was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have a 1-based index in the user interface of your program if you wish, but convert it to a 0-based index before passing it on.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
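The TODO left in the streaming example could be filled in roughly as follows. This is only a sketch under assumptions: the class name SafeLineReplacer, the method ReplaceLine and the ".tmp" suffix are made up for illustration, and File.Replace is used for the swap, which expects the temporary file and the target to be on the same volume.

```csharp
using System.IO;

public static class SafeLineReplacer
{
    // Replaces the 1-based line `lineToEdit` of `targetPath` with `newLine`.
    // The result is written to a temporary file first and swapped in only
    // after both files have been closed without an exception.
    public static void ReplaceLine(string targetPath, int lineToEdit, string newLine)
    {
        string tempPath = targetPath + ".tmp"; // hypothetical temp name, same directory
        int lineNumber = 1;

        using (var reader = new StreamReader(targetPath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(lineNumber == lineToEdit ? newLine : line);
                lineNumber++;
            }
        }

        // Swap the new file in place of the old one; null means "no backup copy".
        File.Replace(tempPath, targetPath, null);
    }
}
```

If anything throws while the temporary file is being written, the original target is still untouched and only the leftover ".tmp" file needs cleaning up.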
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.
When you create a StreamWriter this way, it always creates the file from scratch. You will have to create a third file, copy from the target while replacing what you need, and then replace the old file.
But as far as I can see, what you need is XML manipulation; you might want to use XmlDocument and modify your file using XPath.
You need to open the output file for write access rather than using a new StreamWriter, which always overwrites the output file.
FileStream stm = null;
FileInfo fi = new FileInfo(@"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
With this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write the same number of bytes as the line you want to change.
I guess the code below should work (instead of the writer part from your example). Unfortunately I have no build environment at hand, so it's from memory, but I hope it helps.
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite))
{
var destinationReader = new StreamReader(fs);
var writer = new StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader.ReadLine();
}
line_number++;
}
}
The solution works fine, but I need to change a single line when the same text occurs in multiple places. For this, define a trackText: the search starts only after that text has been seen, and the first line containing oldText is then replaced with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
bool traced = false; // becomes true once trackText has been seen
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i< textLine.Length;i++)
{
if (textLine[i].Contains(trackText)) //start finding matching text after.
traced = true;
if (traced)
if (textLine[i].Contains(oldText)) // Match text
{
textLine[i] = newText; // replace text with new one.
traced = false;
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; //go out from loop
}
}
return lineNumber;
}

c# Find file encoding

I'm trying to write a method that estimates the encoding of a file. I searched the MSDN site and found this:
using (StreamReader sr = new StreamReader(openFileDialog1.FileName, true))
{
using (var reader = new StreamReader(openFileDialog1.FileName, defaultEncodingIfNoBom, true))
{
reader.Peek(); // you need this!
var encoding = reader.CurrentEncoding;
}
while (sr.Peek() >= 0)
{
Console.Write((char)sr.Read());
}
Console.WriteLine("The encoding used was {0}.", sr.CurrentEncoding);
Console.ReadLine();
Console.WriteLine();
textBox4.Text = sr.CurrentEncoding.ToString();
}
My problem with the code above is that for large files, it reads the entire file before there's any sort of output. Is there a way to limit this to reading just, say, the first ten lines of a file?
You could use System.Linq and the following:
string[] first10Lines = File.ReadLines(path).Take(10).ToArray();
That reads the first 10 lines of the file and stores them in an array. You can then loop over the array with foreach to write the data, adding a new line after each entry, like this:
foreach (string line in first10Lines)
{
File.AppendAllText(newPathName, line);
File.AppendAllText(newPathName, string.Format("{0}{1}", " ", Environment.NewLine));
}
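For the original goal of estimating the encoding without reading the whole file, the Peek() call from the question's snippet is enough on its own: it makes the reader fill its buffer and look at the byte order mark. A minimal sketch, assuming a hypothetical helper named EncodingSniffer.DetectFromBom; it only recognises files that start with a BOM and otherwise returns whatever fallback the caller passes in.

```csharp
using System.IO;
using System.Text;

public static class EncodingSniffer
{
    // Returns the encoding indicated by the file's BOM,
    // or `fallback` if the file does not start with one.
    public static Encoding DetectFromBom(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            reader.Peek(); // forces the reader to fill its buffer and inspect the BOM
            return reader.CurrentEncoding;
        }
    }
}
```

Only the first buffer of the file is read, so the cost does not grow with the file size.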

Problems with text Encoding

I read txt file in ANSI or UTF-8 Encoding. Txt file consists of such lines:
79005213750:hello
79005213751:привет
79005213752:серега
Read it with such code:
TextReader readFile = new StreamReader(file_path, Encoding.Default);
foreach (string line in ReadLineFromFile(readFile))
{}
private static IEnumerable<string> ReadLineFromFile(TextReader fileReader)
{
using (fileReader)
{
string currentLine;
while ((currentLine = fileReader.ReadLine()) != null)
{
yield return currentLine;
}
}
}
and after all manipulations with lines I save them:
SaveFileDialog saveFile1 = new SaveFileDialog();
saveFile1.DefaultExt = "*.txt";
saveFile1.Filter = "TXT Files|*.txt";
saveFile1.FileName = "rus_number-pass";
if (saveFile1.ShowDialog() == System.Windows.Forms.DialogResult.OK && saveFile1.FileName.Length > 0)
{
using (System.IO.StreamWriter file = new System.IO.StreamWriter(saveFile1.FileName))
foreach (string line in digits_ru)
{
file.WriteLine(line);
}
}
In the output I receive:
79005213750:hello
79005213751:привет
79005213752:серега
But I expect:
79005213750:hello
79005213751:привет
79005213752:серега
Can you help me? I've spent 2 days on this problem but can't solve it =\
I believe you are using one encoding (Encoding.Default) for read operations and another one (UTF-8, the StreamWriter default) for writing.
Use a different overload of the System.IO.StreamWriter constructor, e.g. this one:
public StreamWriter(string path, bool append, Encoding encoding)
and as the encoding parameter pass the same default encoding you pass into the reader:
TextReader readFile = new StreamReader(file_path, Encoding.Default);
After that, I think you'll see the expected characters in the output file.
By the way, be aware that using Encoding.Default is not recommended.
You can specify the target encoding by using the code page number:
var encoding = Encoding.GetEncoding(1251);
That is the Cyrillic (Windows-1251) code page, which I presume is what you need.
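Putting the two answers together, the point is simply that the reader and the writer get the same Encoding instance. A minimal sketch, with a hypothetical helper named EncodingPreservingCopy.CopyLines standing in for the question's read, manipulate and save steps:

```csharp
using System.IO;
using System.Text;

public static class EncodingPreservingCopy
{
    // Reads `sourcePath` and writes `targetPath` line by line,
    // using one and the same encoding for both sides.
    public static void CopyLines(string sourcePath, string targetPath, Encoding encoding)
    {
        using (var reader = new StreamReader(sourcePath, encoding))
        using (var writer = new StreamWriter(targetPath, false, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line); // manipulate `line` here if needed
            }
        }
    }
}
```

For the file in the question this might be called as CopyLines(file_path, saveFile1.FileName, Encoding.GetEncoding(1251)). On .NET Core and .NET 5+, code page 1251 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); on .NET Framework it works out of the box.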

C# Saving an MP4 Resource to a file

I've tried a few different ways but it won't open when it's saved. How can I accomplish this?
Basically I want to be able to save an MP4 file that's currently a resource file to a temp location that I can access as a path.
Here's something I've tried:
public static void WriteResourceToFile(string resourceName, string fileName)
{
using (Stream s = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
{
if (s != null)
{
byte[] buffer = new byte[s.Length];
char[] sb = new char[s.Length];
s.Read(buffer, 0, (int)(s.Length));
/* convert the byte into ASCII text */
for (int i = 0; i <= buffer.Length - 1; i++)
{
sb[i] = (char)buffer[i];
}
using (StreamWriter sw = new StreamWriter(fileName))
{
sw.Write(sb);
sw.Flush();
}
}
}
}
You're overcomplicating it.
Try something like this (note: not compiled or tested, and Stream.CopyTo() only exists in .NET 4.0 and later).
using (Stream s = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
using (FileStream fs = File.Open(@"c:\myfile.mp4", FileMode.Create))
{
s.CopyTo(fs);
}
Job done.
If you don't have .NET 4.0 available, you'll need to implement one yourself, like one of these: How do I copy the contents of one stream to another?
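A hand-rolled replacement for Stream.CopyTo() can be as small as the loop below. This is a sketch, not the linked answer's exact code; the class name StreamCopier and the buffer size are arbitrary choices.

```csharp
using System.IO;

public static class StreamCopier
{
    // Copies all remaining bytes from `input` to `output` in fixed-size chunks,
    // without interpreting them as text.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[81920]; // chunk size is an arbitrary choice
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }
}
```

In the snippet above, s.CopyTo(fs) would then become StreamCopier.CopyStream(s, fs).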
To get a list of all of the resource names in the current assembly, do something like this:
Assembly a = Assembly.GetExecutingAssembly();
foreach (string s in a.GetManifestResourceNames())
{
Console.WriteLine(s);
}
Console.ReadKey();
Take what turns up on the console and pass it into GetManifestResourceStream() in the first snippet I posted.
http://msdn.microsoft.com/en-us/library/system.reflection.assembly.getmanifestresourcenames.aspx
Why are you writing an MP4 as a string? You should write out the bytes without modification; your conversion to chars is modifying the data. Use a FileStream and call its Write method.
you could try something like this:
I pasted the wrong code in.... sorry, i was in a hurry
[HttpPost]
public ActionResult Create(VideoSermons video, HttpPostedFileBase videoFile)
{
var videoDb = new VideoSermonDb();
try
{
video.Path = Path.GetFileName(videoFile.FileName);
video.UserId = HttpContext.User.Identity.Name;
videoDb.Create(video);
if (videoFile != null && videoFile.ContentLength > 0)
{
var videoName = Path.GetFileName(videoFile.FileName);
var videoPath = Path.Combine(Server.MapPath("~/Videos/"),
System.IO.Path.GetFileName(videoFile.FileName));
videoFile.SaveAs(videoPath);
}
return RedirectToAction("Index");
}
catch
{
return View();
}
}
this actually loads video files to a directory, but it should work for your format as well.
-Thanks,

Delete specific line from a text file?

I need to delete an exact line from a text file, but I cannot for the life of me work out how to go about doing this.
Any suggestions or examples would be greatly appreciated.
Related Questions
Efficient way to delete a line from a text file (C#)
If the line you want to delete is based on the content of the line:
string line = null;
string line_to_delete = "the line i want to delete";
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
if (String.Compare(line, line_to_delete) == 0)
continue;
writer.WriteLine(line);
}
}
}
Or if it is based on line number:
string line = null;
int line_number = 0;
int line_to_delete = 12;
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
line_number++;
if (line_number == line_to_delete)
continue;
writer.WriteLine(line);
}
}
}
The best way to do this is to open the file in text mode, read each line with ReadLine(), and then write it to a new file with WriteLine(), skipping the one line you want to delete.
There is no generic delete-a-line-from-file function, as far as I know.
One way to do it if the file is not very big is to load all the lines into an array:
string[] lines = File.ReadAllLines("filename.txt");
string[] newLines = RemoveUnnecessaryLine(lines);
File.WriteAllLines("filename.txt", newLines);
Hope this simple and short code will help.
List<string> linesList = File.ReadAllLines("myFile.txt").ToList();
linesList.RemoveAt(0);
File.WriteAllLines("myFile.txt", linesList.ToArray());
OR use this
public void DeleteLinesFromFile(string strLineToDelete)
{
string strFilePath = "Provide the path of the text file";
string strSearchText = strLineToDelete;
string strOldText;
string n = "";
StreamReader sr = File.OpenText(strFilePath);
while ((strOldText = sr.ReadLine()) != null)
{
if (!strOldText.Contains(strSearchText))
{
n += strOldText + Environment.NewLine;
}
}
sr.Close();
File.WriteAllText(strFilePath, n);
}
You can actually use C# generics for this to make it real easy:
var file = new List<string>(System.IO.File.ReadAllLines("C:\\path"));
file.RemoveAt(12);
File.WriteAllLines("C:\\path", file.ToArray());
This can be done in three steps:
// 1. Read the content of the file
string[] readText = File.ReadAllLines(path);
// 2. Empty the file
File.WriteAllText(path, String.Empty);
// 3. Fill up again, but without the deleted line
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (!s.Equals(lineToBeRemoved))
{
writer.WriteLine(s);
}
}
}
Read and remember each line
Identify the one you want to get rid of
Forget that one
Write the rest back over the top of the file
I cared about the file's original end-of-line characters ("\n" or "\r\n") and wanted to maintain them in the output file (not overwrite them with whatever the current environment's characters are, as the other answers appear to do). So I wrote my own method to read a line without removing the end-of-line characters, then used it in my DeleteLines method (I wanted the option to delete multiple lines, hence the use of a collection of line numbers to delete).
DeleteLines was implemented as a FileInfo extension and ReadLineKeepNewLineChars a StreamReader extension (but obviously you don't have to keep it that way).
public static class FileInfoExtensions
{
public static FileInfo DeleteLines(this FileInfo source, ICollection<int> lineNumbers, string targetFilePath)
{
var lineCount = 1;
using (var streamReader = new StreamReader(source.FullName))
{
using (var streamWriter = new StreamWriter(targetFilePath))
{
string line;
while ((line = streamReader.ReadLineKeepNewLineChars()) != null)
{
if (!lineNumbers.Contains(lineCount))
{
streamWriter.Write(line);
}
lineCount++;
}
}
}
return new FileInfo(targetFilePath);
}
}
public static class StreamReaderExtensions
{
private const char EndOfFile = '\uffff';
/// <summary>
/// Reads a line, similar to ReadLine method, but keeps any
/// new line characters (e.g. "\r\n" or "\n").
/// </summary>
public static string ReadLineKeepNewLineChars(this StreamReader source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
char ch = (char)source.Read();
if (ch == EndOfFile)
return null;
var sb = new StringBuilder();
while (ch != EndOfFile)
{
sb.Append(ch);
if (ch == '\n')
break;
ch = (char)source.Read();
}
return sb.ToString();
}
}
Are you on a Unix operating system?
You can do this with the "sed" stream editor. Read the man page for "sed"
What?
Open the file, seek to the position, then overwrite the line through the stream.
Got it? Simple, stream-based, fast, and no array eating memory.
This works in VB. For example, search for the line culture=id, where culture is the name and id is the value, and change it to culture=en.
Fileopen(1, "text.ini")
dim line as string
dim currentpos as long
while true
line = lineinput(1)
dim namevalue() as string = split(line, "=")
if namevalue(0) = "line name value that i want to edit" then
currentpos = seek(1)
fileclose()
dim fs as filestream("test.ini", filemode.open)
dim sw as streamwriter(fs)
fs.seek(currentpos, seekorigin.begin)
sw.write(null)
sw.write(namevalue + "=" + newvalue)
sw.close()
fs.close()
exit while
end if
msgbox("org ternate jua bisa, no line found")
end while
that's all..use #d
