I need to programmatically download files with specified extensions from an Azure DevOps Git repository (C#, .NET Framework 4.8). The files are located in different server folders and sometimes in different branches. I was able to achieve this goal with the following code:
var connection = new VssConnection(new Uri(collectionUri), new VssBasicCredential("", personalAccessToken));
using (var ghc = connection.GetClient<GitHttpClient>())
{
    string[] extensionsToDownload = { "xml", "xslt" }; // just for example, real cases are not limited to these extensions
    string branch = "my-branch-name";
    GitVersionDescriptor version = new GitVersionDescriptor { Version = branch };

    // Get all items information
    GitItemRequestData data = new GitItemRequestData
    {
        ItemDescriptors = new[]
        {
            new GitItemDescriptor
            {
                Path = "/Some/Path",
                RecursionLevel = VersionControlRecursionType.Full,
                VersionType = GitVersionType.Branch,
                Version = branch
            },
            new GitItemDescriptor
            {
                Path = "/Another/Path",
                RecursionLevel = VersionControlRecursionType.Full,
                VersionType = GitVersionType.Branch,
                Version = branch
            }
        }
    };
    var items = ghc.GetItemsBatchAsync(data, project: projectName, repositoryId: repoName).Result;

    // Filter returned items by extension (GetItemsBatchAsync returns one list per descriptor)
    List<GitItem> filteredItems = items
        .SelectMany(item => item)
        .Where(item => item.GitObjectType == GitObjectType.Blob && extensionsToDownload.Contains(item.Path.Split('.').Last()))
        .ToList();

    // Download each item as a zip archive and extract it
    foreach (var item in filteredItems)
    {
        using (var stream = ghc.GetItemZipAsync(
            project: projectName, repositoryId: repoName, path: item.Path, includeContent: true, versionDescriptor: version).Result)
        using (var archive = new ZipArchive(stream))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                entry.ExtractToFile(Path.Combine(localFolder, entry.FullName.Trim('/')), true);
            }
        }
    }
}
However, this means each item requires a separate API call, which is clearly bad for performance. I thought, "There should be a way to batch download all items at once."
The GetBlobsZipAsync method seemed like exactly what I needed, but my attempt to use it failed miserably. All I got was VssUnauthorizedException: 'VS30063: You are not authorized to access https://dev.azure.com'. Very strange, because calling GetBlobZipAsync for each individual item id works perfectly (though that is almost the same as the initial solution, with the same far-from-ideal performance).
Dictionary<string, string> idToNameMappings = filteredItems.ToDictionary(
    k => k.ObjectId, v => Path.Combine(localFolder, v.Path.Trim('/')));

using (var stream = ghc.GetBlobsZipAsync(idToNameMappings.Keys, project: projectName, repositoryId: repoName).Result)
using (var archive = new ZipArchive(stream))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Entries in the returned archive are named by blob object id
        entry.ExtractToFile(idToNameMappings[entry.FullName], true);
    }
}
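For reference, the per-blob GetBlobZipAsync variant that does authorize correctly looks roughly like this (a minimal sketch, mirroring the parameter style of the calls above and assuming each returned archive holds a single entry for the requested blob; it has the same one-request-per-file cost as the initial solution):

foreach (var item in filteredItems)
{
    string targetPath = Path.Combine(localFolder, item.Path.Trim('/'));
    using (var stream = ghc.GetBlobZipAsync(project: projectName, repositoryId: repoName, sha1: item.ObjectId).Result)
    using (var archive = new ZipArchive(stream))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            // The zip wraps a single blob; extract it to the mapped local path
            entry.ExtractToFile(targetPath, true);
        }
    }
}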
Another option is to download all items as a zip archive and filter it on the client side:
foreach (var desc in data.ItemDescriptors)
{
    using (var stream = ghc.GetItemZipAsync(projectName, repoName, null, desc.Path, desc.RecursionLevel, versionDescriptor: version).Result)
    using (var archive = new ZipArchive(stream))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (extensionsToDownload.Contains(entry.FullName.Split('.').Last()))
            {
                entry.ExtractToFile(Path.Combine(localFolder, entry.FullName.Trim('/')), true);
            }
        }
    }
}
But that is even worse, because the repository contains a large amount of data files (including some binary content). Downloading several hundred MB of data to get less than 10 MB of XML files is not exactly efficient.
So for the moment I have given up and decided to stick with the initial solution. But maybe there's something I overlooked?
Related
I want to validate multipart compressed files such as Zip archives: when any part is missing, extraction raises an error, but I want to detect this before extraction. Different software also creates different naming structures for the parts.
I also referred to a DotNetZip-related question.
The first screenshot below is from the 7z software, and the second screenshot is from DotNetZip in C#.
One more thing: I also want to test whether the archive is corrupted or not, like the 7z software does. Please refer to the screenshot below for my requirements.
Please help me with these issues.
I am not sure if you will be able to see the exact error as shown in your screenshot, but I have some code which may help you find out whether the multipart file is readable.
I have used the NuGet package CombinationStream.
The ZipArchive constructor throws an ArgumentException or InvalidDataException if the stream is not a readable, valid archive.
Below is the code:
public static bool IsZipValid()
{
    try
    {
        string basePath = @"C:\multi-part-zip\";
        List<string> files = new List<string> {
            basePath + "somefile.zip.001",
            basePath + "somefile.zip.002",
            basePath + "somefile.zip.003",
            basePath + "somefile.zip.004",
            basePath + "somefile.zip.005",
            basePath + "somefile.zip.006",
            basePath + "somefile.zip.007",
            basePath + "somefile.zip.008"
        };
        // CombinationStream chains the parts into one readable stream
        using (var zipFile = new ZipArchive(
            new CombinationStream(files.Select(x => new FileStream(x, FileMode.Open) as Stream).ToList()),
            ZipArchiveMode.Read))
        {
            // Do whatever you want
        }
    }
    catch (InvalidDataException)
    {
        return false;
    }
    return true;
}
I am not sure whether this is what you are looking for, or whether you need more details from the error, but I hope this helps you get to a solution.
From your comments I understood that your issue is identifying the files (getting the list of parts that belong together). You can get a list of files like this:
List<string> files = System.IO.Directory.EnumerateFiles(@"D:\Zip\ForExtract\multipart\",
    "500mbInputData.*", SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();
or for your second case
List<string> files = System.IO.Directory.EnumerateFiles(@"D:\Zip\ForExtract\multipart\",
    "500mbInputData.zip.*", SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();
and then use the file list in your CombinationStream. The rest of the code would look like what Manoj Choudhari wrote. You could also put the path and the file name with a wildcard into parameters, so I'd suggest adding the following parameters to the function:
public static bool IsZipValid(string basePath, string fileNameWithWildcard)
{
    try
    {
        List<string> files = System.IO.Directory.EnumerateFiles(
            basePath, fileNameWithWildcard,
            SearchOption.TopDirectoryOnly).OrderBy(x => x).ToList();
        using (var zipFile = // ... rest is as Manoj wrote
and use it like:
if (IsZipValid(@"D:\Zip\ForExtract\multipart\", "500mbInputData.*")) { // ... }
or
if (IsZipValid(@"D:\Zip\ForExtract\multipart\", "500mbInputData.zip.*")) { // ... }
To find out which kind of files you have in the base path, you could write a helper function like
List<string> getZipFormat(string path)
{
    bool filesFound(string basePath, string pattern) => System.IO.Directory.EnumerateFiles(
        basePath, pattern, SearchOption.TopDirectoryOnly).Any();

    var isTar = filesFound(path, "*.tar.???");
    var isZip = filesFound(path, "*.z??");
    var is7Zip = filesFound(path, "*.7z.???");

    var result = new List<string>();
    if (isTar) result.Add("TAR");
    if (isZip) result.Add("ZIP");
    if (is7Zip) result.Add("7ZIP");
    return result;
}
Modify it to your needs; it will return a list of strings containing "TAR", "ZIP" or "7ZIP" (or more than one of them), depending on which patterns match the files in the base directory.
Usage (an example checking multiple archive formats):
var isValid = true;
var basePath = @"D:\Zip\ForExtract\multipart\";
foreach (var fmt in getZipFormat(basePath))
    switch (fmt)
    {
        case "TAR":
            isValid = isValid & IsZipValid(basePath, "500mbInputData.tar.*");
            break;
        case "ZIP":
            isValid = isValid & IsZipValid(basePath, "500mbInputData.zip.*");
            break;
        case "7ZIP":
            isValid = isValid & IsZipValid(basePath, "500mbInputData.7z.*");
            break;
        default:
            break;
    }
Note: in my experiments with this, it could happen that the files remained open although the program had ended, meaning the files would still be locked the next time the code ran. So I'd strongly suggest closing them explicitly, like:
var fStreams = files.Select(x =>
    new FileStream(x, FileMode.Open) as System.IO.Stream).ToList();
using (var cStream = new CombinationStream(fStreams))
using (var zipFile = new ZipArchive(cStream, ZipArchiveMode.Read))
{
    // Do whatever you want...
    // ... but ensure you close the files.
    // List<T>.ForEach executes immediately (a deferred Select would never run).
    fStreams.ForEach(s => s.Close());
}
I am building a web app which can store images. My DB stores the paths to these images, and all of them are stored in a specific directory. How can I delete all files from the download folder which do not exist in the DB, and all DB records whose files no longer exist?
For example, I have 3 files: File1.jpg, File2.jpg, File3.jpg.
My DB stores only File1.jpg and File2.jpg. For some reason File1.jpg was deleted from the directory, but its record still remains in the DB. What is the best way to delete File3.jpg from the folder (as it is not stored in the DB) and File1.jpg from the DB (as it does not exist in the folder)?
I have written a method to delete files which are not stored in the DB:
public async Task DeleteNonExistingImagesInFolder(string imagesDirectory)
{
    var images = _unitOfWork.Images.AsQueryable();
    DirectoryInfo d = new DirectoryInfo(imagesDirectory);
    FileInfo[] files = d.GetFiles();
    await Task.Run(() =>
    {
        foreach (var file in files)
        {
            if (!images.Where(i => i.Path == file.FullName).Any())
                file.Delete();
        }
    });
}
I have done the same thing for DB records:
public async Task DeleteNonExistingImagesInDB(string imagesDirectory)
{
    var images = _unitOfWork.Images.AsQueryable();
    DirectoryInfo d = new DirectoryInfo(imagesDirectory);
    FileInfo[] files = d.GetFiles();
    await Task.Run(() =>
    {
        foreach (var image in images)
        {
            if (!files.Where(f => f.FullName == image.Path).Any())
                _unitOfWork.Images.Remove(image.Id);
        }
    });
}
But maybe there is a faster approach.
Something like this is pretty efficient and takes only a short bit of code. It detects the changes you want by comparing two collections; below is a working example. See the end of the answer for hints on what you will need to change for your implementation.
IEnumerable<string> files = new List<string> { "file1.txt", "file4.txt" };
IEnumerable<string> dbFiles = new List<string> { "file1.txt", "file2.txt", "file3.txt" };

IEnumerable<string> addsToFileSystem = files.Except(dbFiles);
IEnumerable<string> addsToDb = dbFiles.Except(files);

foreach (string file in addsToFileSystem) {
    Console.WriteLine($"delete {file} from file system");
}

foreach (string file in addsToDb) {
    Console.WriteLine($"delete {file} from db");
}
Output:
delete file4.txt from file system
delete file2.txt from db
delete file3.txt from db
// get collection of files from "my files" directory and select just the file name
IEnumerable<string> files = Directory.EnumerateFiles("my files").Select(x => Path.GetFileName(x));

// replace with selecting the file names from your database
IEnumerable<string> dbFiles = _unitOfWork.Images.Select(x => x.FileName);

IEnumerable<string> addsToFileSystem = files.Except(dbFiles);
IEnumerable<string> addsToDb = dbFiles.Except(files);

foreach (string file in addsToFileSystem) {
    // remove from file system
}

foreach (string file in addsToDb) {
    // remove from db
}
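One caveat worth noting: on Windows, file names compare case-insensitively, so if the database and the directory may disagree on casing, you can pass a comparer to Except (a sketch, using the same files/dbFiles names as above):

IEnumerable<string> addsToFileSystem = files.Except(dbFiles, StringComparer.OrdinalIgnoreCase);
IEnumerable<string> addsToDb = dbFiles.Except(files, StringComparer.OrdinalIgnoreCase);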
MSBuildWorkspace does not show documents on computers other than the one I'm using to build the application.
I have seen Roslyn workspace for .NET Core's new .csproj format, but in my case it works fine on the development computer (Documents is populated), yet it is empty for the same project on a different computer.
So I am not sure why it would work on my development computer, but not on my other computer...?
This is the code:
public static IReadOnlyList<CodeFile> ReadSolution(string path)
{
    List<CodeFile> codes = new List<CodeFile>();
    using (MSBuildWorkspace workspace = MSBuildWorkspace.Create())
    {
        var solution = workspace.OpenSolutionAsync(path).Result;
        foreach (var project in solution.Projects)
        {
            // project.Documents.Count() is 0
            foreach (var doc in project.Documents)
            {
                if (doc.SourceCodeKind == SourceCodeKind.Regular)
                {
                    StringBuilder sb = new StringBuilder();
                    using (var sw = new StringWriter(sb))
                    {
                        var source = doc.GetTextAsync().Result;
                        source.Write(sw);
                        sw.WriteLine();
                    }
                    codes.Add(new CodeFile(doc.FilePath, sb.ToString()));
                }
            }
        }
    }
    return codes;
}
It turns out that newer versions of MSBuildWorkspace no longer throw exceptions on failures; instead they raise an event, WorkspaceFailed (https://github.com/dotnet/roslyn/issues/15056).
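To actually see those failures, you can subscribe to the event before opening the solution (a minimal sketch; WorkspaceFailed reports a WorkspaceDiagnostic with a kind and a message):

using (MSBuildWorkspace workspace = MSBuildWorkspace.Create())
{
    // Failures that used to be exceptions are reported here instead
    workspace.WorkspaceFailed += (sender, args) =>
        Console.WriteLine($"{args.Diagnostic.Kind}: {args.Diagnostic.Message}");
    var solution = workspace.OpenSolutionAsync(path).Result;
    // ...
}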
This indicated that the required build tools (v15 / VS2017) were not available. The new 2017 Build Tools installer is both too complicated for our end users and much slower to install than the 2015 build tools installer was.
Instead, I came up with this less robust method to avoid that dependency:
public static IReadOnlyList<CodeFile> ReadSolution(string path)
{
    List<CodeFile> codes = new List<CodeFile>();
    var dirName = System.IO.Path.GetDirectoryName(path);
    var dir = new DirectoryInfo(dirName);
    var csProjFiles = dir.EnumerateFiles("*.csproj", SearchOption.AllDirectories);
    foreach (var csProjFile in csProjFiles)
    {
        var csProjPath = csProjFile.Directory.FullName;
        using (var fs = new FileStream(csProjFile.FullName, FileMode.Open, FileAccess.Read))
        {
            using (var reader = XmlReader.Create(fs))
            {
                while (reader.Read())
                {
                    // Collect every <Compile Include="..."> entry from the project file
                    if (reader.Name.Equals("Compile", StringComparison.OrdinalIgnoreCase))
                    {
                        var fn = reader["Include"];
                        var filePath = Path.Combine(csProjPath, fn);
                        var text = File.ReadAllText(filePath);
                        codes.Add(new CodeFile(fn, text));
                    }
                }
            }
        }
    }
    return codes;
}
I had the same problem.
I was using a .NET Core console app.
I noticed that Microsoft.CodeAnalysis.Workspaces.MSBuild had the warning:
Package Microsoft.CodeAnalysis.Workspaces.MSBuild 2.10.0 was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v2.1'. This package may not be fully compatible with your project.
I changed to a .NET Framework console app and I could access the documents.
I'm looking for a way to move a folder and all its contents to a different location in the same library using the Client Object Model for SharePoint 2010 (C#).
For example, we have a folder for a project (say 12345) and its URL is
http://sharepoint/site/library/2012/12345
where 2012 represents a year. I'd like to programmatically move the 12345 folder to a different year, say 2014, which probably exists already but may not.
I've searched around, but the solutions I'm finding seem extremely complicated and are aimed at moving folders between different site collections. I'm hoping that because it's within the same library there might be a simpler solution? One idea is to rely on the Explorer View instead of CSOM.
Thanks a lot!
There is no built-in method in the SharePoint CSOM API for moving a folder with its files from one location to another.
The following class demonstrates how to move files from a source folder into a destination folder:
public static class FolderExtensions
{
    public static void MoveFilesTo(this Folder folder, string folderUrl)
    {
        var ctx = (ClientContext)folder.Context;
        if (!ctx.Web.IsPropertyAvailable("ServerRelativeUrl"))
        {
            ctx.Load(ctx.Web, w => w.ServerRelativeUrl);
        }
        ctx.Load(folder, f => f.Files, f => f.ServerRelativeUrl, f => f.Folders);
        ctx.ExecuteQuery();

        // Ensure target folder exists
        EnsureFolder(ctx.Web.RootFolder, folderUrl.Replace(ctx.Web.ServerRelativeUrl, string.Empty));
        foreach (var file in folder.Files)
        {
            var targetFileUrl = file.ServerRelativeUrl.Replace(folder.ServerRelativeUrl, folderUrl);
            file.MoveTo(targetFileUrl, MoveOperations.Overwrite);
        }
        ctx.ExecuteQuery();

        foreach (var subFolder in folder.Folders)
        {
            var targetFolderUrl = subFolder.ServerRelativeUrl.Replace(folder.ServerRelativeUrl, folderUrl);
            subFolder.MoveFilesTo(targetFolderUrl);
        }
    }

    public static Folder EnsureFolder(Folder parentFolder, string folderUrl)
    {
        var ctx = parentFolder.Context;
        var folderNames = folderUrl.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        var folderName = folderNames[0];
        var folder = parentFolder.Folders.Add(folderName);
        ctx.Load(folder);
        ctx.ExecuteQuery();
        if (folderNames.Length > 1)
        {
            var subFolderUrl = string.Join("/", folderNames, 1, folderNames.Length - 1);
            return EnsureFolder(folder, subFolderUrl);
        }
        return folder;
    }
}
Key points:
it ensures that the destination folder(s) exist
in the case of nested folders, their structure is preserved while moving files
Usage
var srcFolderUrl = "/news/pages";
var destFolderUrl = "/news/archive/pages";
using (var ctx = new ClientContext(url))
{
    var sourceFolder = ctx.Web.GetFolderByServerRelativeUrl(srcFolderUrl);
    sourceFolder.MoveFilesTo(destFolderUrl);
    sourceFolder.DeleteObject(); // delete the source folder if necessary
    ctx.ExecuteQuery();
}
Just in case someone needs this translated to PnP PowerShell. It's not battle-tested, but it works for me. Versions and metadata are moved as well within the same library.
$list = Get-PnPList -Identity Documents
$web = $list.ParentWeb
$folder = Ensure-PnPFolder -Web $list.ParentWeb -SiteRelativePath "Shared Documents/MoveTo"
$tofolder = Ensure-PnPFolder -Web $list.ParentWeb -SiteRelativePath "Shared Documents/MoveTwo"

function MoveFolder
{
    [cmdletbinding()]
    Param (
        $web,
        $fromFolder,
        $toFolder
    )
    $fromFolder.Context.Load($fromFolder.Files)
    $fromFolder.Context.Load($fromFolder.Folders)
    $fromFolder.Context.ExecuteQuery()
    foreach ($file in $fromFolder.Files)
    {
        $targetFileUrl = $file.ServerRelativeUrl.Replace($fromFolder.ServerRelativeUrl, $toFolder.ServerRelativeUrl);
        $file.MoveTo($targetFileUrl, [Microsoft.SharePoint.Client.MoveOperations]::Overwrite);
    }
    $fromFolder.Context.ExecuteQuery();
    foreach ($subFolder in $fromFolder.Folders)
    {
        $targetFolderUrl = $subFolder.ServerRelativeUrl.Replace($fromFolder.ServerRelativeUrl, $toFolder.ServerRelativeUrl);
        $targetFolderRelativePath = $targetFolderUrl.SubString($web.RootFolder.ServerRelativeUrl.Length)
        $tofolder = Ensure-PnPFolder -Web $list.ParentWeb -SiteRelativePath $targetFolderRelativePath
        MoveFolder -Web $web -fromFolder $subFolder -toFolder $tofolder
    }
}

$web.Context.Load($web.RootFolder)
$web.Context.ExecuteQuery()
MoveFolder -Web $web -fromFolder $folder -toFolder $tofolder
$folder.DeleteObject()
$web.Context.ExecuteQuery()
I am trying to clear the Firefox 8 browser cache programmatically. I am developing a site using ASP.NET, and I need to clear the browser cache for security reasons. I have tried many ways to clear the cache, but none seem to work. Any ideas?
Yes, you can do it, but...
You can't clear a browser's history via code, for browser security reasons.
However, you can delete all the files and folders under the browser's "cache" directory using file operations.
E.g., Mozilla's default cache location (hidden) is
"..AppData\Local\Mozilla\Firefox\Profiles\2nfq77n2.default\Cache"
How to delete all files and folders in a directory?
try it!
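A minimal sketch of that file operation (the profile folder name here is just an example; it varies per machine):

// Example cache path; the "2nfq77n2.default" part differs per profile
string cachePath = Environment.ExpandEnvironmentVariables(
    @"%LOCALAPPDATA%\Mozilla\Firefox\Profiles\2nfq77n2.default\Cache");
var dir = new DirectoryInfo(cachePath);
foreach (FileInfo file in dir.GetFiles())
    file.Delete();
foreach (DirectoryInfo subDir in dir.GetDirectories())
    subDir.Delete(true); // recursive delete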
I don't think this would be possible, due to security reasons.
At most you can set an HTTP header to tell the browser not to cache your pages, like this:
Cache-Control: no-cache
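In ASP.NET that header can be emitted per response, for example (a sketch for classic System.Web pages):

Response.AppendHeader("Cache-Control", "no-cache");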
It is not possible to clear the browser's cache programmatically; however, you can stop caching from your application.
The code below disables caching (and expires anything a browser has already cached) for responses from your application:
public static void DisablePageCaching()
{
    // Used for disabling page caching
    HttpContext.Current.Response.Cache.SetExpires(DateTime.UtcNow.AddDays(-1));
    HttpContext.Current.Response.Cache.SetValidUntilExpires(false);
    HttpContext.Current.Response.Cache.SetRevalidation(HttpCacheRevalidation.AllCaches);
    HttpContext.Current.Response.Cache.SetCacheability(HttpCacheability.NoCache);
    HttpContext.Current.Response.Cache.SetNoStore();
}
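For example, you could call the helper from a page's load handler so every response from that page is served fresh (a sketch; it assumes the method above is in scope):

protected void Page_Load(object sender, EventArgs e)
{
    DisablePageCaching(); // disable caching for this response
}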
Use this code (C#):
public static void DeleteFirefoxCache()
{
    string profilesPath = @"Mozilla\Firefox\Profiles";
    string localProfiles = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), profilesPath);
    string roamingProfiles = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData), profilesPath);

    if (Directory.Exists(localProfiles))
    {
        var profiles = Directory.GetDirectories(localProfiles).ToList();
        profiles.RemoveAll(prfl => prfl.ToLowerInvariant().EndsWith("geolocation")); // do not delete this profile.
        profiles.ForEach(delegate(string path)
        {
            var files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).ToList();
            foreach (string file in files)
            {
                // Common.IsFileLocked: user-defined helper that checks whether the file is in use
                if (!Common.IsFileLocked(new FileInfo(file)))
                    File.Delete(file);
            }
        });
    }

    if (Directory.Exists(roamingProfiles))
    {
        var profiles = Directory.GetDirectories(roamingProfiles).ToList();
        profiles.RemoveAll(prfl => prfl.ToLowerInvariant().EndsWith("geolocation")); // do not delete this profile.
        profiles.ForEach(delegate(string path)
        {
            var dirs = Directory.GetDirectories(path, "*", SearchOption.AllDirectories).ToList();
            dirs.ForEach(delegate(string dir)
            {
                var files = Directory.GetFiles(dir, "*.*", SearchOption.AllDirectories).ToList();
                foreach (string file in files)
                {
                    if (!Common.IsFileLocked(new FileInfo(file)))
                        File.Delete(file);
                }
            });
            var files0 = Directory.GetFiles(path, "*", SearchOption.TopDirectoryOnly).ToList();
            files0.ForEach(delegate(string file)
            {
                if (!Common.IsFileLocked(new FileInfo(file)))
                    File.Delete(file);
            });
        });
    }
}
My solution:
string userProfile = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
try
{
    string id = string.Empty;
    var lines = File.ReadAllLines($@"{userProfile}\AppData\Roaming\Mozilla\Firefox\profiles.ini");
    foreach (var line in lines)
    {
        // profiles.ini stores the profile folder as "Path=Profiles/<id>"
        if (line.Contains("Path=Profiles/"))
        {
            var text = line.Replace("Path=Profiles/", "");
            id = text.Trim();
        }
    }
    Array.ForEach(Directory.GetFiles($@"{userProfile}\AppData\Local\Mozilla\Firefox\Profiles\{id}\cache2\entries"), File.Delete);
}
catch { }
In ASP.NET / C# you can trigger this (note that this works on the server-side ASP.NET cache, not the browser's cache):
string cacheKey = "TestCache";

// Add an item to the (server-side) cache
Cache.Add(cacheKey, "Cache content", null, DateTime.Now.AddMinutes(30),
    TimeSpan.Zero, CacheItemPriority.High, null);

Cache.Remove(cacheKey); // remove the cached item