AzCopy Ignoring or recreating Checksum - C#

I'm using the Azure tool AzCopy to export data from table storage, modify the exported data, and then import the data into another table storage table. I'm using the following command to export:
AzCopy /Source:https://MYSERVER/MYTABLE/ /SourceKey:SOURCEKEY /Dest:C:\migration /Manifest:MYTABLE
Since you cannot add a filter for the export, I'm filtering the data post-export, removing data from the JSON as necessary. I'm then using the following command to import this data to another server:
AzCopy /Source:C:\export /Dest:https://MYOTHERSERVER/MYTABLE /DestType:Table /DestKey:DESTKEY /Manifest:MYTABLE /EntityOperation:InsertOrReplace
These operations work fine when I do not manipulate the JSON file. When I do, however, the contents of the file are, of course, changed and the checksum in the manifest file no longer matches. When I go to do the import, I get a "file is corrupt" message.
Here is what the manifest file looks like:
"Version":2,"PayloadFormat":"Json","Checksum":5500917691400439101,"AccountName":"SERVER","TableName":"MYTABLE","Timestamp":"2017-08-25T14:10:53.7489755Z","SplitSize":0,"TotalDataFiles":1}
How can I get AzCopy to either not validate the checksum or replace the checksum?
I've tried the following code to recreate the checksum, but when I run it on the original JSON, the result does not match:
var md5Hash = getFileHash(file);
var checksum = convertHash(md5Hash);

// Computes the MD5 hash of the file's contents.
private byte[] getFileHash(string filePath)
{
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(filePath))
    {
        return md5.ComputeHash(stream);
    }
}

// Interprets the first eight bytes of the hash as an unsigned 64-bit integer.
private string convertHash(byte[] data)
{
    var result = BitConverter.ToUInt64(data, 0);
    return result.ToString();
}
This returns 4500814390503865422.

AzCopy doesn't support skipping checksum validation during table import for now. By the way, the checksum recorded in the manifest file is actually a CRC rather than an MD5 hash, and it's calculated by aggregating the CRCs of all exported data files, not computed over a single file.

Related

File.ReadAllBytes takes a lot of RAM, is there a better way?

I am working with an encryption library, and I want to encrypt large files (with AES-GCM). Currently I have this for encrypting files, after writing a temp file from the CryptoStream and reading it back into Chiper:
byte[] Chiper = File.ReadAllBytes(BUFFER_PATH);
// Retrieve tag and create array to hold encrypted data.
byte[] AuthTag = encryptor.GetTag();
byte[] encrypted = new byte[Chiper.Length + aes.IV.Length + AuthTag.Length];
// Set needed data in byte array.
aes.IV.CopyTo(encrypted, 0);
AuthTag.CopyTo(encrypted, IV_LENGTH);
Chiper.CopyTo(encrypted, IV_LENGTH + TAG_LENGTH);
File.WriteAllBytes(END_PATH, encrypted);
This works fine; however, it uses a lot of RAM depending on the file size. Is there a better way to do this? I tried using a FileStream, but it conflicts with the rest of my code. Is there a way to use less memory, or none, when saving Chiper (a byte[])?
It appears that you're trying to write a composite file that has three pieces of information: the tag, the IV, and the ciphertext. Given that you can't get the tag value until after the encryption completes, you are assembling the composite file after the encryption finishes.
The problem comes in when you attempt to load a large encrypted file into memory. Fortunately, streams provide a nice simple solution for this:
byte[] authTag = encryptor.GetTag();
using (var tempfile = File.OpenRead(BUFFER_PATH))
using (var outstream = File.Create(END_PATH))
{
    // write tag
    outstream.Write(authTag, 0, authTag.Length);
    // write IV
    outstream.Write(aes.IV, 0, aes.IV.Length);
    // copy data from source file to output file
    tempfile.CopyTo(outstream);
}
On the other hand you could also write the data straight to the output file if you know ahead of time what size the tag and IV are going to be. Just allocate space for the tag value at the start of the file and come back and write it in after the fact. That saves you having to use a temporary file.
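For illustration, a minimal sketch of that idea might look like this, assuming encryptor is an ICryptoTransform that also exposes GetTag(), IV_LENGTH and TAG_LENGTH are the fixed sizes used above, and SOURCE_PATH is a hypothetical path to the plaintext (the leaveOpen overload of CryptoStream needs .NET Framework 4.7.2 or .NET Core); your library's types may differ:
using (var outstream = File.Create(END_PATH))
{
    // Reserve room for the tag and IV at the start of the file.
    outstream.Write(new byte[TAG_LENGTH + IV_LENGTH], 0, TAG_LENGTH + IV_LENGTH);

    // Stream the plaintext through the CryptoStream straight into the output
    // file, so no temporary buffer file is needed.
    using (var input = File.OpenRead(SOURCE_PATH))
    using (var crypto = new CryptoStream(outstream, encryptor, CryptoStreamMode.Write, leaveOpen: true))
    {
        input.CopyTo(crypto);
    }

    // The tag is available once encryption has finished; seek back and
    // fill in the reserved header (tag first, then IV, as above).
    byte[] authTag = encryptor.GetTag();
    outstream.Seek(0, SeekOrigin.Begin);
    outstream.Write(authTag, 0, authTag.Length);
    outstream.Write(aes.IV, 0, aes.IV.Length);
}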

Decryption of Encrypted Vector Tiles Mapbox

I'll try to be brief, but I'll share the whole picture.
Problem Statement
I am using vector tiles from tippecanoe (from Mapbox) to create .mbtiles from my GeoJSON data. The issue is that on a web client, when I open the browser's inspector, download the .pbf, and run it through this (mapbox-vector-tile-cs) library, I am able to successfully get the data from the tile. This means that anyone with some basic Google searching can also steal my data from the vector tiles.
What I was able to achieve
To address the security concern within the short timeline I have, I came up with a quick and dirty approach. After tippecanoe creates the .mbtiles SQLite DB, I run a Java utility I made to encrypt the data in the blob using AES-256 encryption, and stored it in two different ways in two different SQLite DBs:
Stored as bytes in a different .mbtiles SQLite DB (where it gets stored as a blob), along with z, x, y and metadata.
Encoded the encrypted data as base64 and stored the base64-encoded encrypted tile data in a string column, along with z, x, y and metadata.
and stored the key (base64 encoded) and initialization vector (base64 encoded) in a file (a rough sketch of this encryption step is shown below).
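For reference, the encryption step is roughly equivalent to the following (a minimal C# sketch, assuming AES-256 in its default CBC mode with the key and IV loaded from that file; the real utility is written in Java, and EncryptTile is just an illustrative name):
using System;
using System.Security.Cryptography;

// Hypothetical helper: encrypts one tile blob and returns it base64-encoded,
// matching the string-column storage variant described above.
static string EncryptTile(byte[] tileBlob, byte[] key, byte[] iv)
{
    using (var aes = Aes.Create())
    {
        aes.KeySize = 256;
        aes.Key = key;   // 32-byte key, stored base64-encoded in a file
        aes.IV = iv;     // 16-byte IV, stored base64-encoded in the same file

        using (var encryptor = aes.CreateEncryptor())
        {
            byte[] encrypted = encryptor.TransformFinalBlock(tileBlob, 0, tileBlob.Length);
            return Convert.ToBase64String(encrypted);
        }
    }
}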
The API side (Question 1)
Now, when I get the non-encrypted .pbf from the API, the response is sent with a gzip content encoding and an application/x-protobuf content type; these let the unencrypted blob data be treated as protobuf, and a .pbf file gets downloaded.
When I try to get the encrypted data from the API with the same headers as for the non-encrypted one, the download of the .pbf fails with "Failed - Network error". I realized this is because the application/x-protobuf header declares the file as a .pbf while the contents of the blob no longer match what's expected, hence the failure.
I removed the application/x-protobuf header, and since I can't gzip now, I removed the gzip header too. Now the data gets displayed in the Chrome browser instead of being downloaded; I figure it is now treated as just a plain response.
The question is: how can I make the API send a .pbf that has encrypted data in it, in such a way that this (mapbox-vector-tile-cs) library can parse the data? I know the data will need to be decrypted before I pass it for parsing; assume that it is decrypted and that I have the data that was stored in the blob of the .mbtiles.
This Library with a UWP project (Question 2)
So currently, as mentioned above (since I don't have a solution to the headers part), I removed the headers and let the API return a direct response.
The issue I am now facing is that when I pass the decrypted blob data (I checked that the decryption was successful and that the decrypted data is an exact match for what was stored in the blob) to the line
var layerInfos = VectorTileParser.Parse(stream);
it returns an IEnumerable<Tile> that is not null but contains 0 layers, while the actual tile contains 5 layers.
My question is: how do I get this (mapbox-vector-tile-cs) library to return the layers?
The code that fetches the tile from the server and decrypts it before I send it for parsing is below:
//this code downloads the tile; layerInfos is returned as an empty collection
private async Task<bool> ProcessTile(TileData t, int xOffset, int yOffset)
{
    var stream = await GetTileFromWeb(EncryptedTileURL, true);
    if (stream == null)
        return false;

    var layerInfos = VectorTileParser.Parse(stream);
    if (layerInfos.Count == 0)
        return false;

    return true;
}
The tiles are fetched from the server using a GetTileFromWeb() method:
private async Task<Stream> GetTileFromWeb(Uri uri, bool GetEnc = false)
{
    var handler = new HttpClientHandler();
    if (!GetEnc)
        handler.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

    var gzipWebClient = new HttpClient(handler);
    var bytes = await gzipWebClient.GetByteArrayAsync(uri);

    if (GetEnc)
    {
        // DecryptData (not shown) is assumed to return a readable stream
        // positioned at the start of the decrypted tile data.
        var decBytes = await DecryptData(bytes);
        return decBytes;
    }

    var stream = new MemoryStream(bytes);
    return stream;
}
PS: Sorry for such a long question; I am not used to going into such elaborate detail, but it seemed I needed to share more, as encryption is my forte while map data and vector tiles aren't.

Out Of Memory Exception in Foreach

I am trying to create a function that will retrieve all the uploaded files (which are saved as byte arrays in the database) and download them in a single zip file. I currently have 6,000 files to download (and the number could grow).
The functionality is already working (from retrieval to download) if I limit the number of files being downloaded; otherwise, I get an OutOfMemoryException in the foreach loop.
Here's some pseudo code (the files variable is a list of byte arrays and file names):
var files = getAllFilesFromDb();
foreach (var file in files)
{
    var tempFilePath = Path.Combine(path, file.filename);
    using (FileStream stream = new FileStream(tempFilePath, FileMode.Create, FileAccess.ReadWrite))
    {
        stream.Write(file.fileData, 0, file.fileData.Length);
    }
}

private readonly IEntityRepository<File> fileRepository;

IEnumerable<FileModel> getAllFilesFromDb()
{
    return fileRepository.Select(f => new FileModel() { fileData = f.byteArray, filename = f.fileName });
}
My question is, is there any other way to do this to avoid getting such errors?
To avoid this problem, you could avoid loading all the contents of all the files in one go. Most likely you will need to split your database call into two database calls.
Retrieve a list of all the files without their contents but with some identifier - like the PK of the table.
A method which retrieves the contents of an individual file.
Then your (pseudo)code becomes
get list of all files
for each file
get the file contents
write the file to disk
Another possibility is to alter the way your query works so that it uses deferred execution - this means it will not actually load all the files at once, but will stream them one at a time from the database. However, without seeing more code from your repository implementation, I cannot/will not guess the right solution for you.
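A minimal sketch of the two-call approach might look like this (assuming the entity has a primary key Id, and that GetFileById is a hypothetical repository method that loads a single file's contents; it is not part of your existing API):
// 1. Retrieve only the identifiers - no file contents are loaded here.
List<int> fileIds = fileRepository.Select(f => f.Id).ToList();

// 2. Load and write one file at a time, so only one file's bytes
//    are held in memory during each iteration.
foreach (var id in fileIds)
{
    FileModel file = GetFileById(id);
    var tempFilePath = Path.Combine(path, file.filename);
    File.WriteAllBytes(tempFilePath, file.fileData);
}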

Decompressing a Zip file from a string

I'm fetching an object from Couchbase where one of the fields contains a file. The file is zipped and then encoded in base64.
How would I be able to take this string and decompress it back to the original file?
Then, since I'm using ASP.NET MVC 4, how would I send it back to the browser as a downloadable file?
The original file is being created on a Linux system and decoded on a Windows system (C#).
You should use Convert.FromBase64String to get the bytes, then decompress them, and then use Controller.File to have the client download the file. To decompress, you need to open the zip file using some sort of ZIP library; .NET 4.5's built-in ZipArchive class should work, or you could use another library - both SharpZipLib and DotNetZip support reading from streams.
public ActionResult MyAction()
{
    string base64String = // get from Linux system
    byte[] zipBytes = Convert.FromBase64String(base64String);

    using (var zipStream = new MemoryStream(zipBytes))
    using (var zipArchive = new ZipArchive(zipStream))
    {
        var entry = zipArchive.Entries.Single();
        string mimeType = MimeMapping.GetMimeMapping(entry.Name);

        using (var decompressedStream = entry.Open())
            return File(decompressedStream, mimeType);
    }
}
You'll also need the MIME type of the file; you can use MimeMapping.GetMimeMapping to get that for most common types.
I've used SharpZipLib successfully for this type of task in the past.
For an example that's very close to what you need to do, have a look here.
Basically, the steps should be something like this:
get the compressed input as a string from the database
create a MemoryStream and write the base64-decoded bytes to it
seek back to the beginning of the memory stream
use the MemoryStream as the input to the SharpZipLib ZipFile class
follow the example provided above to unpack the contents of the ZipFile (a rough sketch of these steps is below)
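Putting those steps together, a sketch might look roughly like this (assuming the ICSharpCode.SharpZipLib.Zip namespace and that the archive holds a single file entry; UnzipFromBase64 is just an illustrative name):
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

// Decodes the base64 string, opens it as a zip archive, and returns the
// bytes of the first file entry found.
static byte[] UnzipFromBase64(string base64String)
{
    byte[] zipBytes = Convert.FromBase64String(base64String);

    using (var memoryStream = new MemoryStream(zipBytes))
    {
        var zipFile = new ZipFile(memoryStream);
        try
        {
            foreach (ZipEntry entry in zipFile)
            {
                if (!entry.IsFile)
                    continue;

                using (var entryStream = zipFile.GetInputStream(entry))
                using (var output = new MemoryStream())
                {
                    entryStream.CopyTo(output);
                    return output.ToArray();
                }
            }
        }
        finally
        {
            zipFile.Close();
        }
    }

    throw new InvalidDataException("The archive contains no file entries.");
}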
Update
If the string contains only the gzipped contents of the file (not a full zip archive), then you can simply use the GZipStream class in .NET to decompress the contents. You can find a sample here. The initial steps are the same as above (get the string from the DB, write the decoded bytes to a memory stream, and feed the memory stream as input to the GZipStream to decompress).
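A sketch of that variant might look like this (again assuming the base64 string wraps raw gzip data; GunzipFromBase64 is just an illustrative name):
using System;
using System.IO;
using System.IO.Compression;

// Decodes the base64 string and decompresses the gzip payload it contains.
static byte[] GunzipFromBase64(string base64String)
{
    byte[] gzipBytes = Convert.FromBase64String(base64String);

    using (var input = new MemoryStream(gzipBytes))
    using (var gzip = new GZipStream(input, CompressionMode.Decompress))
    using (var output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return output.ToArray();
    }
}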

file copy that won't change file hash

I'm having trouble copying a file and then verifying the integrity of the file afterward. I've tried every file copying method I can think of (File.Copy, file streams, trying to do a binary copy), but the file hash is always different after the copy. I've been searching around, and I've noticed a lot of people saying that copying a file from a network share can cause this, but I get the same results from shares as I do straight from my hard drive.
//File hashing method:
private byte[] hashFile(string file)
{
    try
    {
        byte[] sourceFile = ASCIIEncoding.ASCII.GetBytes(file);
        byte[] hash = new MD5CryptoServiceProvider().ComputeHash(sourceFile);
        return hash;
        ...
Using this method, the original file and the copied file each produce a consistent hash (individually) on every run, but the two hashes are not the same as each other. Does anyone know of a way to copy files without changing the file hash?
I think you are hashing the file name and not the content!
So of course it won't compute as the same!
Check the value and length of file versus byte[] sourceFile.
It seems you are passing the filename instead of the file contents to the hash function.
Use something like this:
byte[] hash = md5.ComputeHash(File.ReadAllBytes(filename));
Or this:
using (var stream = File.OpenRead(filename))
{
    byte[] hash = md5.ComputeHash(stream);
}
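Putting it together, a small sketch that copies a file and then verifies the copy by hashing the contents of both files might look like this (MD5 is used purely as an integrity check here; CopyAndVerify is just an illustrative name):
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Copies sourcePath to destPath and returns true if the content hashes match.
static bool CopyAndVerify(string sourcePath, string destPath)
{
    File.Copy(sourcePath, destPath, overwrite: true);

    using (var md5 = MD5.Create())
    using (var source = File.OpenRead(sourcePath))
    using (var dest = File.OpenRead(destPath))
    {
        byte[] sourceHash = md5.ComputeHash(source);
        byte[] destHash = md5.ComputeHash(dest);

        // Hashing the contents (not the file name) gives identical hashes
        // for an intact copy.
        return sourceHash.SequenceEqual(destHash);
    }
}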
