Azure Data Lake: Facing an issue while moving data from Blob to ADLS (C#)

I am creating an Azure Function in C# that does the following:
- extracts the zipped file from a blob,
- unzips it and copies it to Azure Data Lake Store.
I was able to unzip the file and upload it into another blob using the UploadFromStreamAsync(stream) function.
However, I am facing an issue while doing the same for ADLS.
I referred to the linked question "Upload to ADLS from file stream" and tried to first create the file using adlsFileSystemClient.FileSystem.Create and then append the stream using adlsFileSystemClient.FileSystem.Append in Data Lake, but it did not work.
The Create method creates a zero-byte file, but the append does nothing, and the Azure Function still completes successfully without any error. I also tried adlsFileSystemClient.FileSystem.AppendAsync, with the same result.
Code:
// Save blob(zip file) contents to a Memory Stream.
using (var zipBlobFileStream = new MemoryStream())
{
await blockBlob.DownloadToStreamAsync(zipBlobFileStream);
await zipBlobFileStream.FlushAsync();
zipBlobFileStream.Position = 0;
//use ZipArchive from System.IO.Compression to extract all the files from zip file
using (var zip = new ZipArchive(zipBlobFileStream))
{
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries)
{
string destfilename = $"{destcontanierPath2}/"+entry.FullName;
log.Info($"DestFilename: {destfilename}");
//get a block blob reference for the file, using the same name as the entry
var blob = extractcontainer.GetBlockBlobReference($"{destfilename}");
using (var stream = entry.Open())
{
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0)
{
await blob.UploadFromStreamAsync(stream);
//Creating a file and then append
adlsFileSystemClient.FileSystem.Create(_adlsAccountName, "/raw/Hello.txt",overwrite:true);
// Appending the stream to Azure Data Lake
using(var ms = new MemoryStream())
{
stream.CopyTo(ms);
ms.Position = 0; // rewind
log.Info($"**********MemoryStream: {ms}");
// do something with ms
await adlsFileSystemClient.FileSystem.AppendAsync(_adlsAccountName, "/raw/Hello.txt",ms,0);
}
}
}
}
}
}
New Interim Solution:
using (var zipBlobFileStream = new MemoryStream())
{
await blockBlob.DownloadToStreamAsync(zipBlobFileStream);
using (var zip = new ZipArchive(zipBlobFileStream))
{
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries)
{
entry.ExtractToFile(directoryPath + entry.FullName, true);
//Upload the File to ADLS
var parameters = new UploadParameters(directoryPath + entry.FullName, "/raw/" + md5, _adlsAccountName, isOverwrite: true, maxSegmentLength: 268435456 * 2);
var frontend = new Microsoft.Azure.Management.DataLake.StoreUploader.DataLakeStoreFrontEndAdapter(_adlsAccountName, adlsFileSystemClient);
var uploader = new DataLakeStoreUploader(parameters, frontend);
uploader.Execute();
File.Delete(directoryPath + entry.FullName);
}
}
}

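In your case, you could change your code as follows, and then it should work. Move the file-creation call out of the foreach loop. Note also that the stream returned by entry.Open() can be read only once: in the original code UploadFromStreamAsync consumes it, so the later stream.CopyTo(ms) copies nothing and the append receives an empty stream. Copying the entry into a MemoryStream first and rewinding it before each use avoids this.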
//Creating a file and then append
adlsFileSystemClient.FileSystem.Create(_adlsAccountName, "/raw/Hello.txt",overwrite:true);
using (var zipBlobFileStream = new MemoryStream())
{
await blockBlob.DownloadToStreamAsync(zipBlobFileStream);
await zipBlobFileStream.FlushAsync();
zipBlobFileStream.Position = 0;
//use ZipArchive from System.IO.Compression to extract all the files from zip file
using (var zip = new ZipArchive(zipBlobFileStream))
{
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries)
{
string destfilename = $"{destcontanierPath2}/"+entry.FullName;
log.Info($"DestFilename: {destfilename}");
//get a block blob reference for the file, using the same name as the entry
var blob = extractcontainer.GetBlockBlobReference($"{destfilename}");
using (var stream = entry.Open())
{
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0)
{
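// Buffer the entry once: the zip entry stream is forward-only and can be read a single time
using (MemoryStream ms = new MemoryStream())
{
stream.CopyTo(ms);
ms.Position = 0;
blob.UploadFromStream(ms);
ms.Position = 0; // rewind before reusing the buffer
adlsFileSystemClient.FileSystem.Append(_adlsAccountName, "/raw/Hello.txt", ms);
}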
}
}
}
}
}
}
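
The single-read behaviour of a zip entry stream can be reproduced without any Azure dependency. The following is a minimal sketch, not the answerer's code: it is written as a top-level program (C# 9 or later), builds a one-entry archive in memory (the entry name hello.txt and its content are made up for illustration), drains the entry stream once, and then compares a second read of the same stream with a re-read of a rewound MemoryStream buffer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Builds a small zip in memory; "hello.txt" is a made-up entry name.
static byte[] BuildSampleZip()
{
    using (var zipStream = new MemoryStream())
    {
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
        using (var entryStream = zip.CreateEntry("hello.txt").Open())
        {
            byte[] payload = Encoding.UTF8.GetBytes("hello data lake");
            entryStream.Write(payload, 0, payload.Length);
        }
        return zipStream.ToArray();
    }
}

// Returns the byte counts seen by the first read, by a second read of the
// same entry stream, and by a re-read of the rewound buffer.
static (long first, long second, long rewound) ReadEntryTwice(byte[] zipBytes)
{
    using (var zip = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
    using (var stream = zip.Entries[0].Open())
    using (var buffer = new MemoryStream())
    using (var secondCopy = new MemoryStream())
    using (var thirdCopy = new MemoryStream())
    {
        stream.CopyTo(buffer);      // first consumer drains the entry stream
        stream.CopyTo(secondCopy);  // nothing is left and the stream cannot be rewound
        buffer.Position = 0;        // the buffer, however, can be rewound
        buffer.CopyTo(thirdCopy);
        return (buffer.Length, secondCopy.Length, thirdCopy.Length);
    }
}

var counts = ReadEntryTwice(BuildSampleZip());
Console.WriteLine($"first={counts.first}, second={counts.second}, rewound={counts.rewound}");
```

The second count is zero, which matches the symptom in the question: the append call succeeds but has no data to write.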

How to make file content base64-encoded before making a zip file

I have a directory of CSV files. I need to first encode each file's content as a base64 string and then put it into a zip file.
I am able to zip the files with the code below, but how can I base64-encode the file content on the fly in between? Thanks!
var csvFiles = Directory.GetFiles(@"C:\Temp", "*.csv")
.Select(f => new FileInfo(f));
foreach (var file in csvFiles)
{
using (var newFile = ZipFile.Open($@"C:\tmp\{Path.GetFileNameWithoutExtension(file.Name)}.zip",
ZipArchiveMode.Create))
{
newFile.CreateEntryFromFile($@"C:\Temp\{file.Name}",
file.Name);
}
}
Disregarding your motives or other problems (conceptual or otherwise), here is a fully streamed solution with minimal allocations (let's be nice to your Large Object Heap). The CryptoStream with ToBase64Transform is just a way to stream the base64 encoding.
var csvFiles = Directory.GetFiles(@"D:\Temp");
using var outputStream = new FileStream(@"D:\Test.zip", FileMode.Create);
using var archive = new ZipArchive(outputStream, ZipArchiveMode.Create, true);
foreach (var file in csvFiles)
{
using var inputFile = new FileStream(file, FileMode.Open, FileAccess.Read);
using var base64Stream = new CryptoStream(inputFile, new ToBase64Transform(), CryptoStreamMode.Read);
var entry = archive.CreateEntry(Path.GetFileName(file));
using var zipStream = entry.Open();
base64Stream.CopyTo(zipStream);
}
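
To check that the streamed encoding round-trips, the same CryptoStream approach can be exercised entirely in memory. This is a sketch under stated assumptions rather than part of the answer above: the helper names ZipAsBase64 and ReadBackAndDecode, the entry name sample.csv.b64 and the sample CSV text are all made up, and the code is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

// Zips `content` as a single base64-encoded entry, entirely in memory.
static byte[] ZipAsBase64(byte[] content, string entryName)
{
    using (var output = new MemoryStream())
    {
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        using (var input = new MemoryStream(content))
        using (var base64Stream = new CryptoStream(input, new ToBase64Transform(), CryptoStreamMode.Read))
        using (var zipStream = archive.CreateEntry(entryName).Open())
        {
            // Bytes are encoded as they are read, so no full base64 string is built.
            base64Stream.CopyTo(zipStream);
        }
        // The archive has been disposed at this point, so the zip is complete.
        return output.ToArray();
    }
}

// Reads the single entry back and decodes it, to confirm the round trip.
static byte[] ReadBackAndDecode(byte[] zipBytes)
{
    using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
    using (var reader = new StreamReader(archive.Entries[0].Open(), Encoding.ASCII))
    {
        return Convert.FromBase64String(reader.ReadToEnd());
    }
}

byte[] original = Encoding.UTF8.GetBytes("id,name\n1,Ada\n2,Linus\n");
byte[] decoded = ReadBackAndDecode(ZipAsBase64(original, "sample.csv.b64"));
Console.WriteLine(Convert.ToBase64String(original) == Convert.ToBase64String(decoded));
```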
You need to create the base64 string, convert it to a byte array, and then create the archive entry from the byte array (by creating a stream).
Something like this should do the job:
var dirInfo = new DirectoryInfo(@"C:\Temp");
var csvFiles = dirInfo.GetFiles("*.csv"); // This already returns a `FileInfo[]`.
foreach (var file in csvFiles)
{
var fileBytes = File.ReadAllBytes(file.FullName);
var base64String = Convert.ToBase64String(fileBytes);
var base64Bytes = Encoding.UTF8.GetBytes(base64String);
string newFilePath = $@"C:\tmp\{Path.GetFileNameWithoutExtension(file.Name)}.zip";
using (var newFile = ZipFile.Open(newFilePath, ZipArchiveMode.Create))
{
// You might want to change the extension
// since the file is no longer in CSV format.
var zipEntry = newFile.CreateEntry(file.Name);
using (var base64Stream = new MemoryStream(base64Bytes))
using (var zipEntryStream = zipEntry.Open())
{
base64Stream.CopyTo(zipEntryStream);
}
}
}
Alternatively, you could save the base64 string to a temporary file, create the entry from that file, and then delete it; but I prefer not to write dummy data to disk when the job can be done in memory.

C# ZipArchive "End of Central Directory record could not be found"

I created a BizTalk custom pipeline component, which zips all message parts into a zip stream. After some failures I created the following test method in a separate test project.
Basically, I get an XML file which contains a filename and a UUID, which I use to call a stored procedure and get the Base64-encoded content of the database entry.
The base64 content seems valid, because after decoding it and saving it to the file system the files can be read by Windows Explorer without problems.
After saving the archiveStream to the file system I get the following error message from 7-Zip when I try to extract the file:
"Unexpected end of data". If I just open the file with 7-Zip there is no problem. I can even open the files from inside the 7-Zip explorer.
If I try to read the file from C# with the following code I get the error message:
"End of Central Directory record could not be found."
Unzip code:
private static void ReadDat() {
var path = @"...\zip\0e00128b-0a6e-4b99-944d-68e9c20a51c2.zip";
var stream = System.IO.File.OpenRead(path);
// End of Central Directory record could not be found:
var zipArchive = new ZipArchive(stream, ZipArchiveMode.Read, false);
foreach(var zipEntry in zipArchive.Entries) {
// zipEntry.Length is the uncompressed size (the stream from Open() is not seekable in read mode)
Console.WriteLine(zipEntry.Length);
}
}
Zip Code:
private static void StreamUuidList() {
var path = @"...\2017-08-05T132705.xml";
var xdoc = XDocument.Load(System.IO.File.OpenRead(path));
var files = xdoc.Root.Descendants().Where(d => d.Name.LocalName.Equals("NodeName"));
using (var archiveStream = new MemoryStream())
using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, true)) {
foreach (var file in files) {
var fileName = file.Elements().Where(e => e.Name.LocalName.Equals("FileName")).FirstOrDefault()?.Value ?? "";
var streamUuid = file.Elements().Where(e => e.Name.LocalName.Equals("StreamUUID")).FirstOrDefault()?.Value ?? "";
// validation here...
// get base64 content and convert content
var base64Content = GetStreamContent(streamUuid);
var data = Convert.FromBase64String(base64Content);
var dataStream = new MemoryStream(data);
dataStream.Seek(0, SeekOrigin.Begin);
// debug - save to file location
using (var fileStream = new FileStream($@"...\files\{fileName}", FileMode.Create)) {
dataStream.CopyTo(fileStream);
}
dataStream.Seek(0, SeekOrigin.Begin);
// create zip entry
var zipFile = archive.CreateEntry(fileName, GetCompressionLevelFromString("Optimal"));
using (var zipFileStream = zipFile.Open()) {
// copy data from message part stream into zip entry stream
dataStream.Seek(0, SeekOrigin.Begin);
dataStream.CopyTo(zipFileStream);
}
Console.WriteLine(fileName + ": " + streamUuid);
}
// debug - save to file location
archiveStream.Seek(0, SeekOrigin.Begin);
using (var fileStream = new FileStream($@"...\zip\{Guid.NewGuid()}.dat", FileMode.Create)) {
archiveStream.CopyTo(fileStream);
}
// debug end
}
}
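
One possible cause, judging only from the code shown: archiveStream is copied to disk inside the using block, while the ZipArchive is still open. In Create mode the central directory is written when the archive is disposed, so a copy taken earlier contains entry data but no End of Central Directory record. The sketch below reproduces that situation in memory. It is an illustration, not a confirmed diagnosis; the entry name part1.bin, the payload and the helper names are made up, and the code is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Returns true if `zipBytes` can be opened as a zip archive for reading.
static bool CanOpen(byte[] zipBytes)
{
    try
    {
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            return archive.Entries.Count >= 0;
        }
    }
    catch (InvalidDataException)
    {
        // Thrown when the End of Central Directory record is missing.
        return false;
    }
}

// Takes one snapshot of the stream while the archive is still open
// and one after the archive has been disposed.
static (byte[] early, byte[] complete) SnapshotBeforeAndAfterDispose()
{
    using (var archiveStream = new MemoryStream())
    {
        byte[] early;
        using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            using (var entryStream = archive.CreateEntry("part1.bin").Open())
            {
                entryStream.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);
            }
            early = archiveStream.ToArray();      // too early: no central directory yet
        }
        return (early, archiveStream.ToArray());  // after Dispose: the archive is complete
    }
}

var snapshots = SnapshotBeforeAndAfterDispose();
Console.WriteLine($"before dispose readable: {CanOpen(snapshots.early)}");
Console.WriteLine($"after dispose readable:  {CanOpen(snapshots.complete)}");
```

If this is indeed the cause, moving the debug copy of archiveStream to after the closing brace of the ZipArchive using block (while keeping the MemoryStream alive) should produce a readable file.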

Re-create zip file from another zip file

Given a zip file, I need to re-create it with a specified compression level (e.g. no compression).
I'm nearly there, but get the error:
Failed: Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory.
If I save the recreated zip file to windows, it looks like it's correct (correct file size, entries all exist with correct file sizes) but none of the files are extractable.
public static byte[] ReCompress(byte[] originalArchive, CompressionLevel newCompressionLevel)
{
var entries = new Dictionary<string, byte[]>();
///////////////////////////
// STEP 1: EXTRACT ALL FILES
///////////////////////////
using (var ms = new MemoryStream(originalArchive))
using (var originalZip = new ZipArchive(ms, ZipArchiveMode.Read))
{
foreach (var entry in originalZip.Entries)
{
var isFolder = entry.FullName.EndsWith("/");
if (!isFolder)
{
using (var stream = entry.Open())
using (var entryMS = new MemoryStream())
{
stream.CopyTo(entryMS);
entries.Add(entry.FullName, entryMS.ToArray());
}
}
else
{
entries.Add(entry.FullName, new byte[0]);
}
}
}
///////////////////////////
// STEP 2: BUILD ZIP FILE
///////////////////////////
using (var ms = new MemoryStream())
using (var newArchive = new ZipArchive(ms, ZipArchiveMode.Create, true))
{
foreach (var uncompressedEntry in entries)
{
var newEntry = newArchive.CreateEntry(uncompressedEntry.Key, newCompressionLevel);
using (var entryStream = newEntry.Open())
using (var writer = new BinaryWriter(entryStream, Encoding.UTF8))
{
writer.Write(uncompressedEntry.Value);
}
}
return ms.ToArray();
}
}
At the end of the function, if I do:
File.WriteAllBytes(@"D:\test.zip", ms.ToArray());
it creates a correctly structured archive of about 90 MB, but no files are extractable.
If I end with return ms.ToArray(), it returns a byte array of about 130 KB.
The zip archive is broken because you read its content from the MemoryStream before the archive is finished. ZipArchive writes the central directory only when it is disposed, so you need to call newArchive.Dispose() (or let its using block end) before calling ms.ToArray().
In this particular case you can do it like this:
using (var ms = new MemoryStream())
{
using (var newArchive = new ZipArchive(ms, ZipArchiveMode.Create, true))
{
foreach (var uncompressedEntry in entries)
{
var newEntry = newArchive.CreateEntry(uncompressedEntry.Key, newCompressionLevel);
using (var entryStream = newEntry.Open())
using (var writer = new BinaryWriter(entryStream, Encoding.UTF8))
{
writer.Write(uncompressedEntry.Value);
}
}
}
return ms.ToArray();
}
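
For reference, here is a self-contained variant that can be run and checked end to end. It is a sketch rather than the answerer's code: it copies each entry directly from the source archive into the new one instead of staging byte arrays in a dictionary, the helper BuildSample and the entry name docs/readme.txt are made up for the demonstration, and the code is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Re-creates a zip with a new compression level, copying entry by entry.
static byte[] ReCompress(byte[] originalArchive, CompressionLevel newCompressionLevel)
{
    using (var output = new MemoryStream())
    {
        using (var source = new ZipArchive(new MemoryStream(originalArchive), ZipArchiveMode.Read))
        using (var target = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var entry in source.Entries)
            {
                var newEntry = target.CreateEntry(entry.FullName, newCompressionLevel);
                // Folder entries end with "/" and have no content to copy.
                if (entry.FullName.EndsWith("/", StringComparison.Ordinal)) continue;
                using (var src = entry.Open())
                using (var dst = newEntry.Open())
                {
                    src.CopyTo(dst);
                }
            }
        }
        // Both archives are disposed here, so the central directory has been written.
        return output.ToArray();
    }
}

// Hypothetical helper for the demo: a one-entry archive with repetitive text.
static byte[] BuildSample()
{
    using (var ms = new MemoryStream())
    {
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        using (var s = zip.CreateEntry("docs/readme.txt", CompressionLevel.Optimal).Open())
        {
            byte[] text = Encoding.UTF8.GetBytes(new string('a', 10000));
            s.Write(text, 0, text.Length);
        }
        return ms.ToArray();
    }
}

byte[] compressed = BuildSample();
byte[] stored = ReCompress(compressed, CompressionLevel.NoCompression);
using (var check = new ZipArchive(new MemoryStream(stored), ZipArchiveMode.Read))
{
    Console.WriteLine($"{check.Entries[0].FullName}: {check.Entries[0].Length} bytes, archive {stored.Length} bytes");
}
```

Streaming each entry keeps only one entry's worth of data in flight at a time, which may matter for archives in the 90 MB range mentioned in the question.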

Nesting Zip Files and Folders in Memory using DotNetZip Library

We have a page where users can download media. We construct a folder structure similar to the following, zip it up, and send it back to the user in the response.
ZippedFolder.zip
- Folder A
  - File 1
  - File 2
- Folder B
  - File 3
  - File 4
The existing implementation that accomplishes this saves files and directories temporarily to file system and then deletes them at the end. We are trying to get away from doing this and would like to accomplish this entirely in memory.
I am able to successfully create a ZipFile with files in it, but the problem I am running into is creating Folder A and Folder B and adding files to those and then adding those two folders to the Zip File.
How can I do this without saving to the file system?
The code for just saving the file streams to the zip file and then setting the Output Stream on the response is the following.
public Stream CompressStreams(IList<Stream> Streams, IList<string> StreamNames, Stream OutputStream = null)
{
MemoryStream Response = null;
using (ZipFile ZippedFile = new ZipFile())
{
for (int i = 0, length = Streams.Count; i < length; i++)
{
ZippedFile.AddEntry(StreamNames[i], Streams[i]);
}
if (OutputStream != null)
{
ZippedFile.Save(OutputStream);
}
else
{
Response = new MemoryStream();
ZippedFile.Save(Response);
// Move the stream back to the beginning for reading
Response.Seek(0, SeekOrigin.Begin);
}
}
return Response;
}
EDIT: We are using DotNetZip as the zipping/unzipping library.
Here's another way of doing it, using System.IO.Compression.ZipArchive:
public Stream CompressStreams(IList<Stream> Streams, IList<string> StreamNames, Stream OutputStream = null)
{
MemoryStream Response = new MemoryStream();
using (ZipArchive ZippedFile = new ZipArchive(Response, ZipArchiveMode.Create, true))
{
for (int i = 0, length = Streams.Count; i < length; i++)
using (var entry = ZippedFile.CreateEntry(StreamNames[i]).Open())
{
Streams[i].CopyTo(entry);
}
}
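// Rewind so the archive is read from the start, both for the copy below and by the caller
Response.Seek(0, SeekOrigin.Begin);
if (OutputStream != null)
{
Response.CopyTo(OutputStream);
Response.Seek(0, SeekOrigin.Begin);
}
return Response;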
}
and a little test:
// FileMode.Create truncates an existing file, so no stale bytes remain after the new archive
using (var write = new FileStream(@"C:\users\Public\Desktop\Testzip.zip", FileMode.Create, FileAccess.Write))
using (var read = new FileStream(@"C:\windows\System32\drivers\etc\hosts", FileMode.Open, FileAccess.Read))
{
// Zip entry names use forward slashes as the folder separator
CompressStreams(new List<Stream>() { read }, new List<string>() { "A/One.txt" }, write);
}
Re: your comment: sorry, I'm not sure whether it creates something in the background, but you are not creating anything on disk yourself.
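
In a zip archive, folders are not separate containers: an entry whose name contains forward slashes is shown under the corresponding folder when the archive is opened. The sketch below uses System.IO.Compression and the folder layout from the question to build the nested structure in memory. The helper names ZipWithFolders and Text and the file contents are placeholders, and the code is written as a top-level program (C# 9 or later). DotNetZip's AddEntry(string, Stream) is understood to accept a path in the entry name in the same way, but treat that as an assumption to verify against the library's documentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Builds a zip in memory; each key is the entry's full path inside the archive.
static MemoryStream ZipWithFolders(IDictionary<string, Stream> filesByPath)
{
    var response = new MemoryStream();
    using (var zip = new ZipArchive(response, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (var file in filesByPath)
        {
            // "Folder A/File 1" places "File 1" inside "Folder A".
            using (var entryStream = zip.CreateEntry(file.Key).Open())
            {
                file.Value.CopyTo(entryStream);
            }
        }
    }
    response.Seek(0, SeekOrigin.Begin); // rewind so the caller can read from the start
    return response;
}

// Placeholder content standing in for the real media streams.
static Stream Text(string s) => new MemoryStream(Encoding.UTF8.GetBytes(s));

var files = new Dictionary<string, Stream>
{
    ["Folder A/File 1"] = Text("one"),
    ["Folder A/File 2"] = Text("two"),
    ["Folder B/File 3"] = Text("three"),
    ["Folder B/File 4"] = Text("four"),
};

using (var zipped = ZipWithFolders(files))
using (var readBack = new ZipArchive(zipped, ZipArchiveMode.Read))
{
    foreach (var name in readBack.Entries.Select(e => e.FullName))
        Console.WriteLine(name);
}
```

Nothing is written to the file system here; the returned MemoryStream can be copied straight to the response output stream.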

Extract ZipArchive entry to blob storage

Using the normal Windows file system, the ExtractToFile method would be sufficient:
using (ZipArchive archive = new ZipArchive(uploadedFile.InputStream, ZipArchiveMode.Read, true))
{
foreach (var entry in archive.Entries.Where(x => x.Length > 0))
{
entry.ExtractToFile(Path.Combine(location, entry.Name));
}
}
Now that we are using Azure, this obviously needs to change as we are using blob storage.
How can this be done?
ZipArchiveEntry class has an Open method which returns a stream. What you could do is create a blob using that stream.
static void ZipArchiveTest()
{
var storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
CloudBlobContainer container = storageAccount.CreateCloudBlobClient().GetContainerReference("temp");
container.CreateIfNotExists();
var zipFile = @"D:\node\test2.zip";
using (FileStream fs = new FileStream(zipFile, FileMode.Open))
{
using (ZipArchive archive = new ZipArchive(fs))
{
var entries = archive.Entries;
foreach (var entry in entries)
{
CloudBlockBlob blob = container.GetBlockBlobReference(entry.FullName);
using (var stream = entry.Open())
{
blob.UploadFromStream(stream);
}
}
}
}
}
I wrote a blog post on the same topic with more details: "Extract a zip file stored as Azure Blob with this simple method".
In case your zip file is also stored as a blob, the snippet below from that blog post should help.
// Save blob(zip file) contents to a Memory Stream.
using (var zipBlobFileStream = new MemoryStream())
{
await blockBlob.DownloadToStreamAsync(zipBlobFileStream);
await zipBlobFileStream.FlushAsync();
zipBlobFileStream.Position = 0;
//use ZipArchive from System.IO.Compression to extract all the files from zip file
using (var zip = new ZipArchive(zipBlobFileStream))
{
//Each entry here represents an individual file or a folder
foreach (var entry in zip.Entries)
{
//get a block blob reference for the file, using the same name as the entry
var blob = extractcontainer.GetBlockBlobReference(entry.FullName);
using (var stream = entry.Open())
{
//check for file or folder and update the above blob reference with actual content from stream
if (entry.Length > 0)
await blob.UploadFromStreamAsync(stream);
}
// TO-DO : Process the file (Blob)
//process the file here (blob) or you can write another process later
//to reference each of these files(blobs) on all files got extracted to other container.
}
}
}
