Different default identity for dynamically created app pools on Azure

by ingvar 13. June 2012 20:40

If you are creating app pools dynamically on Windows Azure, be sure to specify the desired identity. The reason is that if you deploy with osFamily="1", the default identity will be NetworkService, while if you deploy with osFamily="2", the identity of the app pool will be ApplicationPoolIdentity. The latter might not work for your setup!

The reason for this difference is that osFamily="1" gives you a Windows Server 2008 SP2, while osFamily="2" gives you a Windows Server 2008 R2, and the two versions of Windows behave differently when creating websites/application pools.
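For reference, the osFamily value is an attribute on the ServiceConfiguration element in the ServiceConfiguration.cscfg file. A minimal example (the service name is a placeholder):

<ServiceConfiguration serviceName="MyService" osFamily="2" osVersion="*"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <!-- Role elements go here -->
</ServiceConfiguration>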

Here is the code for creating an application pool where the identity is not left to a default value:

using (ServerManager serverManager = new ServerManager())
{
   ApplicationPool applicationPool = serverManager.ApplicationPools.Add("MyName");
   applicationPool.AutoStart = true;
   applicationPool.ManagedPipelineMode = ManagedPipelineMode.Integrated;
   applicationPool.ManagedRuntimeVersion = "v4.0";  
   /* This line is very important! */
   applicationPool.ProcessModel.IdentityType = ProcessModelIdentityType.ApplicationPoolIdentity;
   serverManager.CommitChanges();
}
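Note that ServerManager comes from the Microsoft.Web.Administration assembly and requires elevated privileges. And if you want the identity that osFamily="1" defaults to, the same property can be set to NetworkService instead:

applicationPool.ProcessModel.IdentityType = ProcessModelIdentityType.NetworkService;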

Tags:

Auto updating webrole

by ingvar 20. February 2012 21:17

While working on a cool Windows Azure deployment package for Composite C1 I did a lot of deployments. The stuff I did, like reconfiguring the IIS, needed testing on an actual Azure hosted service and not just the emulator. Always trying to optimize, I thought of ways to get around all this redeploying and came up with the idea of making it possible to change the WebRole behavior at run time. Then I could just “inject” a new WebRole and start testing without having to wait for a new deployment. After some fiddling around I found a really nice solution and will present it here!

Solution description

The solution I came up with was dynamically loading an assembly into a newly created AppDomain and calling methods on an instance of a class within that assembly.
This is a fairly simple task and all the code needed is shown here:

/* Creating domain */
AppDomainSetup domainSetup = new AppDomainSetup();
domainSetup.PrivateBinPath = folder;
domainSetup.ApplicationBase = folder;
AppDomain appDomain = AppDomain.CreateDomain("MyAssembly", null, domainSetup);

/* Creating remote proxy object */
IDynamicWebRole dynamicWebRole =
   (IDynamicWebRole)appDomain.CreateInstanceAndUnwrap(
      "MyAssembly",
      "MyAssembly.MyDynamicWebRole, MyAssembly");

/* Calling method */
dynamicWebRole.Run();

 

Common interface: IDynamicWebRole

The code for the IDynamicWebRole interface is in its own little assembly. The code is shown here and can be changed as you wish.

public interface IDynamicWebRole
{
    void Run();
}

There is no actual need for an interface, but both the Azure WebRole project and the assembly project that contains the actual IDynamicWebRole implementation need to share a type. That is why I created this interface and put it in its own assembly: the WebRole project and the implementation project both reference the small interface assembly.

Now it's time to look at the more interesting code in the WebRole. That is where all the magic happens!

The WebRole implementation

The WebRole implementation is rather complex. The WebRole needs to periodically look for new versions of the IDynamicWebRole implementation and, when there is a new version, download it, start a new AppDomain and create a remote instance of the IDynamicWebRole implementation. Here is all the code for the WebRole. After the code, I will go into more detail on how this works.

public class WebRole : RoleEntryPoint
{
    /* Initialize these fields */
    private readonly CloudBlobClient _client;
    private readonly string _assemblyBlobPath = 
        "mycontainer/MyAssembly.dll";
    private readonly string _dynamicWebRoleHandlerTypeFullName = 
        "MyAssembly.MyDynamicWebRole, MyAssembly";

    private AppDomain _appDomain = null;
    private IDynamicWebRole _dynamicWebRole;

    private volatile bool _keepRunning = true;
    private DateTime _lastModifiedUtc = DateTime.MinValue;


    public override void Run()
    {
       int tempFolderCounter = 0;

       while (_keepRunning)
       {
            CloudBlob assemblyBlob = 
                _client.GetBlobReference(_assemblyBlobPath);
            /* Properties are only populated after FetchAttributes has been called */
            assemblyBlob.FetchAttributes();
            DateTime lastModified = assemblyBlob.Properties.LastModifiedUtc;

           if (lastModified > _lastModifiedUtc)
           {
               /* Stop running appdomain */
               if (_appDomain != null)
               {
                   AppDomain.Unload(_appDomain);
                   _appDomain = null;
               }

               /* Create temp folder */
               string folder = Path.Combine(
                   AppDomain.CurrentDomain.BaseDirectory, 
                   tempFolderCounter.ToString());
               tempFolderCounter++;
               Directory.CreateDirectory(folder);

               /* Copy needed assemblies to the folder */
               File.Copy("DynamicWebRole.dll", 
                   Path.Combine(folder, "DynamicWebRole.dll"), true);

               File.Copy("Microsoft.WindowsAzure.StorageClient.dll", 
                   Path.Combine(folder, 
                       "Microsoft.WindowsAzure.StorageClient.dll"), true);

               /* Download from blob */
               string filename = 
                   _assemblyBlobPath.Remove(0, _assemblyBlobPath.LastIndexOf('/') + 1);
               string localPath = Path.Combine(folder, filename);
               assemblyBlob.DownloadToFile(localPath);

               string assemblyFileName = 
                   Path.GetFileNameWithoutExtension(localPath);

               /* Create new appdomain */
               AppDomainSetup domainSetup = new AppDomainSetup();
               domainSetup.PrivateBinPath = folder;
               domainSetup.ApplicationBase = folder;
               _appDomain = 
                  AppDomain.CreateDomain(assemblyFileName, null, domainSetup);

               /* Create IDynamicWebRole proxy instance for remoting */
               _dynamicWebRole = 
                   (IDynamicWebRole)_appDomain.CreateInstanceAndUnwrap(
                       assemblyFileName, _dynamicWebRoleHandlerTypeFullName);
                       
               /* Start the dynamic webrole in other thread */
               /* so we can continue testing for new assemblies */
               /* Thread will end when the appdomain is unloaded by us */
               new Thread(() => _dynamicWebRole.Run()).Start();

               _lastModifiedUtc = lastModified;
           }
           
           Thread.Sleep(30 * 1000);
       }
    }


    public override void OnStop()
    {
       _keepRunning = false;
    }
}

I have omitted all the exception handling to make the code more readable and easier to understand.
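The initialization of the fields at the top of the class is also left out. A minimal sketch of how _client could be initialized in the WebRole constructor, assuming the storage connection string is kept in a service configuration setting (the name "DataConnectionString" is just a placeholder):

public WebRole()
{
    /* Read the storage connection string from the service configuration */
    string connectionString =
        RoleEnvironment.GetConfigurationSettingValue("DataConnectionString");

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
    _client = storageAccount.CreateCloudBlobClient();
}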

IDynamicWebRole implementation

The last thing we need is to implement the IDynamicWebRole interface and put it in its own assembly. There are two important things to do when implementing the interface for the remoting to work: inheriting from the MarshalByRefObject class and overriding the InitializeLifetimeService method. This is shown in the following code:

public class MyDynamicWebRole : MarshalByRefObject, IDynamicWebRole
{
    public void Run()
    {
       /* Put your webrole implementation here */
    }

    public override object InitializeLifetimeService()
    {
       /* This is needed so the proxy doesn't get recycled */
       return null;
    }
}
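To push a new version of the dynamic web role, you simply build the implementation assembly and upload it to the blob the WebRole polls. A minimal sketch of how that could look from a developer machine (connection string and paths are placeholders):

CloudStorageAccount account =
    CloudStorageAccount.Parse("<storage connection string>");
CloudBlobClient client = account.CreateCloudBlobClient();

/* Must match the blob path the WebRole polls */
CloudBlob assemblyBlob = client.GetBlobReference("mycontainer/MyAssembly.dll");
assemblyBlob.UploadFile(@"C:\Build\MyAssembly.dll");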

That's all there is to it, enjoy! :)

Tags:

.NET | Azure | C#

Bitbucket setup on OSX Lion using Mercurial

by ingvar 12. February 2012 08:05

Here is a fast guide to getting Mercurial and Bitbucket working from the terminal in OSX Lion, enjoy :)

1. Download and install Mercurial

Download binaries from http://mercurial.selenic.com/ and install.

2. Creating Mercurial configuration

Create a .hgrc file in your home folder (~/.hgrc) and add the following lines. Change the name and email to the name and email you use on Bitbucket.

[ui]
username = Jon Doe <jon@doe.com>
ssh = ssh -C

3. Configure SSH 

3.1: Set up your default identity by issuing the following command and entering a proper passphrase

ssh-keygen

3.2: Copy the public key to your clipboard using the following command

cat ~/.ssh/id_rsa.pub | pbcopy

3.3: Add the public key to Bitbucket by browsing to Account -> SSH Keys and pasting the public key in the textbox to the left of the 'Add key' button. Then click the 'Add key' button.

4. All done!

Remember to use the SSH URL when cloning the repository and not the HTTPS URL.
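For example, cloning a repository over SSH looks something like this (user and repository names are placeholders):

hg clone ssh://hg@bitbucket.org/yourusername/yourrepository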

Tags:

Abstract for Danish Developer Conference 2012

by ingvar 18. January 2012 10:59

Here is my abstract for the talk I'm going to give at Danish Developer Conference 2012.

Danish version:

Skalering og monitorering på Azure - Abstract

Den 14. oktober sidste år blev Kulturnattens officielle website besøgt af over 60.000 unikke besøgende og over 500.000 side visninger på et døgn, mens selve Kulturnatten fandt sted. På trods af den massive trafik lykkedes det at lave et website, som performede godt ved hjælp af Windows Azure. Azure’s udskalering kombineret med vores monitorering gjorde os i stand til at skrue op eller ned for antallet af maskiner i løbet af dagen, så siden altid performede, som den skulle.

Med nye cloud services som Windows Azure er det blev meget billigere at håndtere mange besøgende på et website ved hjælp af udskalering på hardware. Det skaber dog et helt nyt sæt udfordringer, bland andet at skrive website logikken, så det kan håndtere at køre på flere maskiner på samme tid.

Med udgangspunkt i Composite’s erfaringer med lanceringen af Kulturnatten.dk vil jeg i min præsentation kigge på op- vs. udskalering, og hvilke klassiske softwareproblemer der er ved udskalering.  Jeg vil også komme ind på de out-of-the-bog løsninger, der er i Windows Azure, så som Content delivery network (CDN) og traffic manageren, der begge er gode services, man kan bruge ved udskalering.
Til sidst vil runde monitorering, som gør det muligt at skrue op og ned for antallet af maskiner, der håndterer et website. God monitorering gør det muligt at handle i tide og undgå at sitet går ned. Men mindst lige så interessant muliggør det også, at man selv kan justere antallet af servere løbende og derved spare penge i sidste ende.

 

 

English version:

Scaling and monitoring on Azure - Abstract

On 14 October last year, the official Culture Night website (kulturnatten.dk) was visited by over 60,000 unique visitors and had over 500,000 page views in the single day Culture Night took place. Despite the massive traffic we managed to create a website that performed very well by using Windows Azure. Azure's scaling out combined with our custom monitoring enabled us to increase or decrease the number of machines during the day, so the site continued performing as it should.

With new cloud services like Windows Azure, it has become much cheaper to handle a huge number of visitors to a website by scaling out on hardware. But it brings an entirely new set of challenges, among them writing the website logic so it can handle running on multiple machines simultaneously.

Based on Composite's experience with the launch of kulturnatten.dk, in my presentation I will look at scaling up vs. scaling out, and the classic software problems that come with scaling out. I will also touch on the out-of-the-box solutions in Windows Azure, such as the Content Delivery Network (CDN) and the Traffic Manager, both of which are good services to use when scaling out.
Finally I will cover monitoring, which makes it possible to adjust the number of machines that handle a website in an intelligent way. Good monitoring makes it possible to act in time and avoid the site going down. But just as interesting, it also makes it possible to turn the number of running machines down and thereby save money in the end.

Tags:

.NET | Azure | C#

Accessing and saving the IIS log from a Windows Azure WebRole

by ingvar 28. September 2011 21:36

A Windows Azure WebRole may get recycled due to crashes, maintenance or other things. And because a WebRole is a VM, all changed/added files are lost when it gets recycled. This includes the IIS server logs. For many reasons it would be nice to save these server logs outside the WebRole, so they don't get lost if/when a WebRole gets recycled. I have found a nice way of saving these log files to blob storage. The code could easily be modified to do other things with the log files.

The web role needs to have elevated privileges. This is done by adding the following element to the WebRole element in the ServiceDefinition.csdef file.


<Runtime executionContext="elevated" />

And here is the code needed to access the IIS log files:

CloudBlobContainer container = null; /* Initialize this */

Regex regex = new Regex(@"(?<string>\%[^\%]*\%)");

ServerManager serverManager = new ServerManager();

foreach (Site site in serverManager.Sites)
{
    /* Resolve environment variables in the log directory path */
    string directory = regex.Replace(
        site.LogFile.Directory,
        m => Environment.ExpandEnvironmentVariables(m.Groups["string"].Value)
    );

    if (!Directory.Exists(directory)) continue;

    IEnumerable<string> files =
        Directory.GetFiles(directory, "*.log", SearchOption.AllDirectories);
    foreach (string filePath in files)
    {
        /* Avoid name clashes if we have multiple instances */
        string blobName =
            string.Format("{0}_{1}", site.Name, Path.GetFileName(filePath));

        CloudBlob blob = container.GetBlobReference(blobName);

        /* Only upload new or changed log files */
        if (!blob.Exists() ||
            blob.Properties.LastModifiedUtc < File.GetLastWriteTimeUtc(filePath))
        {
            blob.UploadFile(filePath);
        }
    }
}

I use an extension method on the blob called Exists. See the implementation for it here.
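The implementation behind that link is not included in this post, but a minimal sketch of what such an Exists extension method could look like, using FetchAttributes and catching the not-found error:

public static class CloudBlobExtensions
{
    public static bool Exists(this CloudBlob blob)
    {
        try
        {
            /* FetchAttributes throws if the blob does not exist */
            blob.FetchAttributes();
            return true;
        }
        catch (StorageClientException e)
        {
            if (e.ErrorCode == StorageErrorCode.ResourceNotFound)
            {
                return false;
            }

            throw;
        }
    }
}

A nice side effect is that FetchAttributes also populates the blob properties, so the LastModifiedUtc comparison above works once Exists has returned true.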

Tags:

Composite C1 non-live edit multi instance Windows Azure deployment

by ingvar 7. August 2011 19:53

Introduction

This post is a technical description of how we made a non-live editing, multi datacenter, multi instance Composite C1 deployment. A very good overview of the setup can be found here. The setup can be split into three parts. The first one is the Windows Azure Web Role. The second one is the Composite C1 “Windows Azure Publisher” package. And the third part is the Windows Azure Blob Storage. The latter is the common resource shared between the first two and, except for its usage, is self-explanatory. In the rest of this blog post I will describe the first two parts of this setup in more technical detail. This setup also supports the Windows Azure Traffic Manager for handling geo DNS and failover, which is a really nice feature to put on top!

The non-live edit web role

The Windows Azure deployment package contains an empty website and a web role. The configuration for the package contains a blob connection string, a name for the website blob container, a name for the work blob container and a display name used by the C1 Windows Azure Publisher package. The web role periodically checks the timestamp of a specific blob in the named work blob container. If this specific blob has changed since the last synchronization, the web role will start a new synchronization from the named website blob container to the local file system. It is an optimized synchronization that I have described in an earlier blog post: How to do a fast recursive local folder to/from azure blob storage synchronization.

Because it is time consuming to download new files from the blob, the synchronization is done to a local folder and not to the live website. This minimizes the offline time of the website. All the paths of downloaded and deleted files are kept in memory. When the synchronization is done, the live website is put offline with app_offline.htm, all downloaded files are copied to the website and all deleted files are also deleted from the website. After this, the website is put back online. All this extra work is done to keep the offline time as low as possible. A sketch of this swap step is shown below.
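A minimal sketch of that swap step, assuming the downloaded and deleted paths have been collected in two lists (all names here are made up for illustration):

private void SwapWebsite(string websiteFolder, string stagingFolder,
    IEnumerable<string> downloadedFiles, IEnumerable<string> deletedFiles)
{
    /* Take the live website offline */
    File.WriteAllText(Path.Combine(websiteFolder, "app_offline.htm"), "Updating...");

    /* Copy the downloaded files from the staging folder to the website */
    foreach (string relativePath in downloadedFiles)
    {
        string target = Path.Combine(websiteFolder, relativePath);
        Directory.CreateDirectory(Path.GetDirectoryName(target));
        File.Copy(Path.Combine(stagingFolder, relativePath), target, true);
    }

    /* Delete the files that were deleted from the blob container */
    foreach (string relativePath in deletedFiles)
    {
        string target = Path.Combine(websiteFolder, relativePath);
        if (File.Exists(target))
        {
            File.Delete(target);
        }
    }

    /* Put the website back online */
    File.Delete(Path.Combine(websiteFolder, "app_offline.htm"));
}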

The web role writes its current status (Initialized, Ready, Updating, Stopped, etc.) in an xml file located in the named work blob container. During the synchronization (updating), the web role also includes the progress in the xml file. This xml is read by the local C1's Windows Azure Publisher and displayed to the user. This is a really nice feature, because if a web role is located in a datacenter on the other side of the planet, it will take longer before it is done with the synchronization. And this feature gives the user a full overview of the progress of each web role. See the movie below for how this feature looks.

All that is needed to start a new Azure instance with this is to create a new blob storage account, or use an existing one, and modify the blob connection string in the package configuration. This is also shown in the movie below.

Composite C1 Windows Azure Publisher

This Composite C1 package adds a new feature to an existing/new C1 web site. The package is installed on a local C1 instance and all development and future editing is done on this local C1 website. The package adds a way of configuring the Windows Azure setup and a way of publishing the current version of the website.

A configuration consists of the blob storage name and access key. It also contains two blob container names. One container is used to upload all files in the website and the other container is used for very simple communication between the web roles and the local C1 installation.

After a valid configuration has been done, it is possible to publish the website. The publish process is a local folder to blob container synchronization with the same optimization as the one I have described in this earlier blog post: How to do a fast recursive local folder to/from azure blob storage synchronization. Before the synchronization is started, the C1 application is halted. This is done to ensure that no changes will be made by the user while the synchronization is in progress. The first synchronization will obviously take some time because all files have to be uploaded to the blob, ranging from 5 to 15 minutes or even more, depending on the size of the website. Consecutive synchronizations are much faster. And if no large files like movies are added, a consecutive synchronization takes less than 1 minute!

The Windows Azure Publisher package also installs a feature that gives the user an overview of all the deployed web roles and their status. When a new publish is finished, all the web roles will start synchronizing and the current progress of each web role synchronization is also displayed in the overview. The movie below also shows this feature.

Here is a movie that shows how to configure and deploy the Azure package, adding the Windows Azure Traffic Manager, installing and configuring the C1 Azure Publisher package, the synchronization process and overview, and finally the result.

 

 

Tags:

.NET | Azure | Blob | C#

ContentMD5 not working in the Storage Client library

by ingvar 28. June 2011 16:33

Edit: 16th February

This bug was fixed in the Windows Azure SDK v. 1.5 or v. 1.6. So go download the newest version!

 

Edit: 6th July

This is a bug and it has been filed with the team that's developing the Windows Azure Client Library. More information here.

 

Edit: 1st July - new version of this blog post:

I think my old version (see below) was not totally clear on how to reproduce what I was trying to say. So I have made a new code snippet that shows that the ContentMD5 blob property is not populated by the Storage Client library even if the MD5 value is in the REST result. Below is the code that shows it. To try the code, just copy/paste it into a console application and change the account, storageKey and containerName to match your settings.

I should also add that I'm not doing this to show off or talk down Microsoft's work with Windows Azure. I think all the Azure stuff is awesome!! It is just that, if this is a bug and it gets fixed, it will make working with MD5 hash values on blobs a lot easier and faster. If I'm wrong I will take it all back :)

Here is the output of the code:

/md5test/MyBlob.txt MD5 =
MyBlob.txt MD5 = zhFORQHS9OLc6j4XtUbzOQ==

Here is the code:

string account = "ACCOUNT NAME";
string storageKey = "STORAGE KEY";
string containerName = "md5test";

/* ListBlobs done by storage library */
StorageCredentialsAccountAndKey credentials =
    new StorageCredentialsAccountAndKey(
        account, 
        storageKey);
CloudStorageAccount storageAccount = new CloudStorageAccount(credentials, false);
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerName);
container.CreateIfNotExist();

string blobContent = "This is a test";
MD5 md5 = new MD5CryptoServiceProvider();
byte[] hashBytes = md5.ComputeHash(Encoding.UTF8.GetBytes(blobContent));
CloudBlob cloudBlob = container.GetBlobReference("MyBlob.txt");
cloudBlob.Properties.ContentMD5 = Convert.ToBase64String(hashBytes);
cloudBlob.UploadText(blobContent);

foreach (var blob in container.ListBlobs().OfType<CloudBlob>())
{
    Console.WriteLine(blob.Uri.LocalPath + " MD5 = " +
        (blob.Properties.ContentMD5 ?? "Not populated"));
}

/* ListBlobs done by REST */
DateTime now = DateTime.UtcNow;

string nowString = now.ToString("R", CultureInfo.InvariantCulture);

string canonicalizedResource = 
    string.Format("/{0}/{1}\ncomp:list\nrestype:container", 
    account, containerName);
string canonicalizedHeaders = 
    string.Format("x-ms-date:{0}\nx-ms-version:2009-09-19", 
    nowString);

string messageSignature =
    string.Format("GET\n\n\n\n\n\n\n\n\n\n\n\n{0}\n{1}",
    canonicalizedHeaders,
    canonicalizedResource);

byte[] SignatureBytes = Encoding.UTF8.GetBytes(messageSignature);
HMACSHA256 SHA256 = new HMACSHA256(Convert.FromBase64String(storageKey));
String authorizationHeader = 
    string.Format("SharedKey {0}:{1}", 
    account, 
    Convert.ToBase64String(SHA256.ComputeHash(SignatureBytes)));


Uri url = new Uri(
   string.Format("http://{0}.blob.core.windows.net/{1}?restype=container&comp=list",
   account, containerName));
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
request.Method = "GET";
request.ContentLength = 0;
request.Headers.Add("x-ms-date", nowString);
request.Headers.Add("x-ms-version", "2009-09-19");
request.Headers.Add("Authorization", authorizationHeader);

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);

string responseString = reader.ReadToEnd();

reader.Close();
responseStream.Close();
response.Close();

XElement responseElement = XElement.Parse(responseString);

foreach (XElement element in responseElement.Descendants("Content-MD5"))
{
    string name = element.Parent.Parent.Element("Name").Value;
    string md5value = element.Value;

    if (string.IsNullOrEmpty(md5value))
    {
        md5value = "Not populated";
    }

    Console.WriteLine(name + " MD5 = " + md5value);
}

 

 

 

Original version of this blog post

This is a follow-up on my other blog post about the ContentMD5 blob property (Why is MD5 not part of ListBlobs result?). And it is motivated by the comment from Gaurav Mantri stating that the MD5 is actually in the REST response, it is just not populated by the Storage Client library.

So I also did some more testing and found out that the MD5 is actually calculated and set by the blob storage when you upload the blob. When I did a simple test of creating a new container, uploading a simple 'Hello World' txt file and doing a ListBlobs through the REST API, I got the XML shown below. Note that I did not populate the ContentMD5 when I uploaded the file! If I do a ListBlobs method call on a CloudBlobContainer instance, the ContentMD5 is NOT populated (read more about this in my other blog post). So my best guess is that this is an error in the API of some sort.

It would be very nice if this was fixed, so we did not have to compute the MD5 hash by hand and add it to the blob's metadata. It would make code like the local folder to/from blob storage synchronization I did much simpler (read it here).


<EnumerationResults ContainerName="http://myblobtest.blob.core.windows.net/md5test">
 <Blobs>
  <Blob>
   <Name>MyTestBlob.xml</Name>
   <Url>http://myblobtest.blob.core.windows.net/md5test/MyTestBlob.xml</Url>
   <Properties>
   <Last-Modified>Tue, 28 Jun 2011 16:20:30 GMT</Last-Modified>
   <Etag>0x8CE038BC293943D</Etag>
   <Content-Length>12</Content-Length>
   <Content-Type>text/xml</Content-Type>
   <Content-Encoding/>
   <Content-Language/>
   <Content-MD5>LL0183lu9Ms2aq/eo4TesA==</Content-MD5>
   <Cache-Control/>
   <BlobType>BlockBlob</BlobType>
   <LeaseStatus>unlocked</LeaseStatus>
  </Properties>
  </Blob>
 </Blobs>
 <NextMarker/>
</EnumerationResults>

 

Here is the code I did for doing the REST call:

string requestUrl = 
    "http://myblobtest.blob.core.windows.net/md5test?restype=container&comp=list";
string certificatePath = /* Path to certificate */;

HttpWebRequest httpWebRequest =
    (HttpWebRequest)HttpWebRequest.Create(new Uri(requestUrl, true));
httpWebRequest.ClientCertificates.Add(new X509Certificate2(certificatePath));
httpWebRequest.Headers.Add("x-ms-version", "2009-09-19");
httpWebRequest.ContentType = "application/xml";
httpWebRequest.ContentLength = 0;
httpWebRequest.Method = "GET";

using (HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse())
{
    using (Stream responseStream = httpWebResponse.GetResponseStream())
    {
        using (StreamReader reader = new StreamReader(responseStream))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}

Tags:

.NET | Azure | Blob

Why is MD5 not part of ListBlobs result?

by ingvar 26. June 2011 20:13

Edit 6th July:

It turns out that this is a bug in the Windows Azure Client Library. Read more here.

 

Original version:

For some strange reason the ContentMD5 blob property is only populated if you call FetchAttributes on the blob. This means that you cannot obtain the value of the ContentMD5 blob property by calling ListBlobs, with any parameter settings. Having to call FetchAttributes to obtain the value is not feasible on a large set of blobs, because doing a request per blob is going to take a long time. This means that it cannot be used in an optimal way when doing something like the fast folder synchronization I wrote about in this blog post.

The ContentMD5 blob property is never set by the blob storage itself. And if you try to put any value in it that is not a correct 128-bit MD5 hash of the file, you will get an exception. So it has limited usage.

I have made some code to illustrate the missing population of the ContentMD5 blob property when using the ListBlobs method.

Here is the output of running the code:

GetBlobReference without FetchAttributes: Not populated
GetBlobReference with FetchAttributes: zhFORQHS9OLc6j4XtUbzOQ==
ListBlobs with BlobListingDetails.None: Not populated
ListBlobs with BlobListingDetails.Metadata: Not populated
ListBlobs with BlobListingDetails.All: Not populated

And here is the code:

CloudStorageAccount storageAccount = /* Initialize this */;
CloudBlobClient client = storageAccount.CreateCloudBlobClient();

CloudBlobContainer container = client.GetContainerReference("mytest");
container.CreateIfNotExist();

const string blobContent = "This is a test";

/* Compute 128 MD5 hash of content */
MD5 md5 = new MD5CryptoServiceProvider();
byte[] hashBytes = md5.ComputeHash(Encoding.UTF8.GetBytes(blobContent));

/* Get the blob reference and upload it with MD5 hash */
CloudBlob blob = container.GetBlobReference("MyBlob1.txt");
blob.Properties.ContentMD5 = Convert.ToBase64String(hashBytes);
blob.UploadText(blobContent);



CloudBlob blob1 = container.GetBlobReference("MyBlob1.txt");

/* Not populated - this is expected */
Console.WriteLine("GetBlobReference without FetchAttributes: " + 
    (blob1.Properties.ContentMD5 ?? "Not populated"));



blob1.FetchAttributes();

/* Populated - this is expected */
Console.WriteLine("GetBlobReference with FetchAttributes: " + 
    (blob1.Properties.ContentMD5 ?? "Not populated"));



CloudBlob blob2 = container.ListBlobs().OfType<CloudBlob>().Single();
            
/* Not populated - this is NOT expected */
Console.WriteLine("ListBlobs with BlobListingDetails.None: " + 
    (blob2.Properties.ContentMD5 ?? "Not populated"));



CloudBlob blob3 = container.ListBlobs(new BlobRequestOptions
    { BlobListingDetails = BlobListingDetails.Metadata }).
    OfType<CloudBlob>().Single();

/* Not populated - this is NOT expected */
Console.WriteLine("ListBlobs with BlobListingDetails.Metadata: " + 
    (blob3.Properties.ContentMD5 ?? "Not populated"));



CloudBlob blob4 = container.ListBlobs(new BlobRequestOptions
    {
        UseFlatBlobListing = true,
        BlobListingDetails = BlobListingDetails.All
    }).
    OfType<CloudBlob>().Single();

/* Not populated - this is NOT expected */
Console.WriteLine("ListBlobs with BlobListingDetails.All: " + 
    (blob4.Properties.ContentMD5 ?? "Not populated"));

Tags:

.NET | Azure | Blob | C#

How to do a fast recursive local folder to/from azure blob storage synchronization

by ingvar 23. June 2011 20:49

Introduction

In this blog post I will describe how to do a synchronization of a local file system folder against a Windows Azure blob container/folder. There are many ways to do this, some faster than others. My way of doing this is especially fast if few files have been added/updated/deleted. If many are added/updated/deleted it is still fast, but uploading/downloading files to/from the blob will be the main time factor. The algorithm I'm going to describe was developed by me when I was implementing a non-live-editing Windows Azure deployment model for Composite C1. You can read more about the setup here. I will do a more technical blog post about this non-live-editing setup later.

Breakdown of the problem

The algorithm should only do one-way synchronization, meaning that it either updates the local file system folder to match what's stored in the blob container, or updates the blob container to match what's stored in the local folder. So I will split it up into two: one for synchronizing to the blob storage and one from the blob storage.

Because the blob storage is located on another computer, we can't compare the timestamps of the blobs against timestamps on local files. The reason for this is that the two computers' (the blob storage's and our local machine's) clocks will never be 100% in sync. What we can do is use file hashes like MD5. The only problem with file hashes is that they are expensive to calculate, so we have to do this as little as possible. We can accomplish this by saving the MD5 hash in the blob's metadata and caching the hash for the local file in memory. Even if we convert the hash value to a base 64 string, holding the hashes in memory for 10.000 files will cost less than 0.3 megabytes. So this scales fairly okay.

When working with the Windows Azure blob storage we have to take care not to do lots of requests. Especially, we should take care not to do a request for every file/blob we process. Each request is likely to take more than 50 ms, and if we have 10.000 files to process this will cost more than 8 minutes! So we should never use GetBlobReference/FetchAttributes to see if a blob exists and/or get its MD5 hash. But this is no problem, because we can use the ListBlobs method with the right options.

Semi Pseudo Algorithms

Let's start with some semi-pseudo code. I have left out some methods and properties, but they should be self-explanatory enough to get the overall understanding of the algorithms. I did this so it would be easier to read and understand. Further down I'll show the full C# code for these algorithms.

You might wonder why I store the MD5 hash value in the blob's metadata and not in the ContentMD5 property of the blob. The reason is that ContentMD5 is only populated with a value if FetchAttributes is called on the blob, which would make the algorithm perform really badly. I'll cover the odd behavior of the ContentMD5 blob property in a later blog post. Edit: Read it here.

Download the full source here: BlobSync.cs (7.80 kb).

Synchronizing to the blob

public void SynchronizeToBlob()
{
    DateTime lastSync = LastSyncTime;
    DateTime newLastSyncTime = DateTime.Now;

    IEnumerable<string> allFilesInStartFolder = GetAllFilesInStartFolder();

    /* Metadata is needed for the hash check below */
    var blobOptions = new BlobRequestOptions
    {
        UseFlatBlobListing = true,
        BlobListingDetails = BlobListingDetails.Metadata
    };
            
    /* This is the only request to the blob storage that we will do */
    /* except when we have to upload or delete to/from the blob */
    var blobs =
        Container.ListBlobs(blobOptions).
        OfType<CloudBlob>().
        Select(b => new
        {
            Blob = b,
            LocalPath = GetLocalPath(b)
        }).
        ToList();
    /* We use ToList here to avoid multiple requests when enumerating */

    foreach (string filePath in allFilesInStartFolder)
    {
        string fileHash = GetFileHashFromCache(filePath, lastSync);

        /* Checking for added files */
        var blob = blobs.Where(b => b.LocalPath == filePath).SingleOrDefault();
        if (blob == null) // Does not exist
        {
            UploadToBlobStorage(filePath, fileHash);
            continue;
        }

        /* Checking for changed files */
        if (fileHash != blob.Blob.Metadata["Hash"])
        {
            UploadToBlobStorage(filePath, fileHash, blob.Blob);
        }
    }

    /* Check for deleted files */
    foreach (var blob in blobs)
    {
        bool exists = allFilesInStartFolder.Where(f => blob.LocalPath == f).Any();

        if (!exists)
        {
            DeleteBlob(blob.Blob);
        }
    }

    LastSyncTime = newLastSyncTime;
}

 

Synchronizing from the blob

public void SynchronizeFromBlob()
{
    IEnumerable<string> allFilesInStartFolder = GetAllFilesInStartFolder();

    var blobOptions = new BlobRequestOptions
    {
        UseFlatBlobListing = true,
        BlobListingDetails = BlobListingDetails.Metadata
    };

    /* This is the only request to the blob storage that we will do */
    /* except when we have to upload or delete to/from the blob */
    var blobs =
        Container.ListBlobs(blobOptions).
        OfType<CloudBlob>().
        Select(b => new
        {
            Blob = b,
            LocalPath = GetLocalPath(b)
        }).
        ToList();
    /* We use ToList here to avoid multiple requests when enumerating */

    foreach (var blob in blobs)
    {
        /* Checking for added files */
        if (!File.Exists(blob.LocalPath))
        {
            DownloadFromBlobStorage(blob.Blob, blob.LocalPath);
        }

        /* Checking for changed files */
        string fileHash = GetFileHashFromCache(blob.LocalPath);
        if (fileHash != blob.Blob.Metadata["Hash"])
        {
            DownloadFromBlobStorage(blob.Blob, blob.LocalPath);
            UpdateFileHash(blob.LocalPath, blob.Blob.Metadata["Hash"]);
        }
    }

    /* Checking for deleted files */
    foreach (string filePath in allFilesInStartFolder)
    {
        bool exists = blobs.Where(b => b.LocalPath == filePath).Any();
        if (!exists)
        {
            File.Delete(filePath);
        }
    }
}

The rest of the code

In this section I will go through the missing methods and properties from the semi-pseudo algorithms above. Most of them are pretty simple and self-explanatory, but a few of them are more complex and need more attention.

LastSyncTime and Container

These are just get/set properties. Container should be initialized with the blob container that you wish to synchronize to/from. LastSyncTime is initialized with DateTime.MinValue. LocalFolder points to the local directory to synchronize to/from.

private DateTime LastSyncTime { get; set; }      
private CloudBlobContainer Container { get; set; }
/* Ends with a \ */
private string LocalFolder { get; set; }

UploadToBlobStorage

Simply adds the file hash to the blob's metadata and uploads the file.

private void UploadToBlobStorage(string filePath, string fileHash)
{
    string blobPath = filePath.Remove(0, LocalFolder.Length);
    CloudBlob blob = Container.GetBlobReference(blobPath);
    blob.Metadata["Hash"] = fileHash;
    blob.UploadFile(filePath);
}


private void UploadToBlobStorage(string filePath, string fileHash, CloudBlob cloudBlob)
{
    cloudBlob.Metadata["Hash"] = fileHash;
    cloudBlob.UploadFile(filePath);
}

DownloadFromBlobStorage

Simply downloads the blob

private void DownloadFromBlobStorage(CloudBlob blob, string filePath)
{
    blob.DownloadToFile(filePath);
}

DeleteBlob

Simply deletes the blob

private void DeleteBlob(CloudBlob blob)
{
    blob.Delete();
}

GetFileHashFromCache

There are two versions of this method. This one is used when synchronizing to the blob. It uses the LastWriteTime of the file and the last time we did a sync to skip calculating the file hash of files that have not been changed. This saves a lot of time, so it's worth the complexity.

private readonly Dictionary<string, string> _syncToBlobHashCache = new Dictionary<string, string>();
private string GetFileHashFromCache(string filePath, DateTime lastSync)
{
    if (File.GetLastWriteTime(filePath) <= lastSync && 
        _syncToBlobHashCache.ContainsKey(filePath))
    {
        return _syncToBlobHashCache[filePath];
    }
    else
    {
        using (FileStream file = new FileStream(filePath, FileMode.Open))
        {
            MD5 md5 = new MD5CryptoServiceProvider();

            string fileHash = Convert.ToBase64String(md5.ComputeHash(file));
            _syncToBlobHashCache[filePath] = fileHash;

            return fileHash;
        }
    }
}

GetFileHashFromCache and UpdateFileHash

This is the other version of the GetFileHashFromCache method. This one is used when synchronizing from the blob. The UpdateFileHash is used for updating the file hash cache when a new hash is obtained from a blob.

private readonly Dictionary<string, string> _syncFromBlobHashCache = new Dictionary<string, string>();
private string GetFileHashFromCache(string filePath)
{
    if (_syncFromBlobHashCache.ContainsKey(filePath))
    {
        return _syncFromBlobHashCache[filePath];
    }
    else
    {
        using (FileStream file = new FileStream(filePath, FileMode.Open))
        {
            MD5 md5 = new MD5CryptoServiceProvider();

            string fileHash = Convert.ToBase64String(md5.ComputeHash(file));
            _syncFromBlobHashCache[filePath] = fileHash;

            return fileHash;
        }
    }
}

private void UpdateFileHash(string filePath, string fileHash)
{
    _syncFromBlobHashCache[filePath] = fileHash;
}

GetAllFilesInStartFolder

This method returns all files in the start folder given by the LocalFolder property. It ToLowers all file paths. This is done because blob names are case sensitive, so when we compare paths returned from this method we want to compare all lower-cased paths. When comparing paths we also use the GetLocalPath method, which translates a blob path to a local path and also ToLowers the result.

private IEnumerable<string> GetAllFilesInStartFolder()
{
    Queue<string> foldersToProcess = new Queue<string>();
    foldersToProcess.Enqueue(LocalFolder);

    while (foldersToProcess.Count > 0)
    {
        string currentFolder = foldersToProcess.Dequeue();
        foreach (string subFolder in Directory.GetDirectories(currentFolder))
        {
            foldersToProcess.Enqueue(subFolder);
        }

        foreach (string filePath in Directory.GetFiles(currentFolder))
        {
            yield return filePath.ToLower();
        }
    }
}

GetLocalPath

Returns the local path of the given blob, using the LocalFolder property as the base folder. It ToLowers the result so we only compare lower-cased paths, because blob names are case sensitive.

private string GetLocalPath(CloudBlob blob)
{
    /* Path should only use \ and no /  */
    string path = blob.Uri.LocalPath.Remove(0, blob.Container.Name.Length + 2).Replace('/', '\\');

    /* Blob names are case sensitive, so when we check local */
    /* filenames against blob names we tolower all of it */
    return Path.Combine(LocalFolder, path).ToLower();
}

Tags:

.NET | Azure | C# | Blob

Azure cloud blob property vs metadata

by ingvar 5. June 2011 21:45

Both properties (some of them) and the metadata collection of a blob can be used to store metadata for a given blob. But there are small differences between them. When working with the blob storage, the number of HTTP REST requests plays a significant role when it comes to performance. The number of requests becomes very important if the blob storage contains a lot of small files. There are at least three properties found in the CloudBlob.Properties property that can be used freely: ContentType, ContentEncoding and ContentLanguage. These can hold very large strings! I have tried testing with a string containing 100.000 characters and it worked. They could possibly hold a lot more, but hey, 100.000 is a lot! So all three of them can be used to hold metadata.
So, what is the difference between using these properties and using the metadata collection? The difference lies in when they get populated. This is best illustrated by the following code:

CloudBlobContainer container; /* Initialized assumed */
CloudBlob blob1 = container.GetBlobReference("MyTestBlob.txt");
blob1.Properties.ContentType = "MyType";
blob1.Metadata["Meta"] = "MyMeta";
blob1.UploadText("Some content");

CloudBlob blob2 = container.GetBlobReference("MyTestBlob.txt");
string value21 = blob2.Properties.ContentType; /* Not populated */
string value22 = blob2.Metadata["Meta"]; /* Not populated */

CloudBlob blob3 = container.GetBlobReference("MyTestBlob.txt");
blob3.FetchAttributes();
string value31 = blob3.Properties.ContentType; /* Populated */
string value32 = blob3.Metadata["Meta"]; /* Populated */

CloudBlob blob4 = (CloudBlob)container.ListBlobs().First();
string value41 = blob4.Properties.ContentType; /* Populated */
string value42 = blob4.Metadata["Meta"]; /* Not populated */

BlobRequestOptions options = new BlobRequestOptions
  {
     BlobListingDetails = BlobListingDetails.Metadata
  };
CloudBlob blob5 = (CloudBlob)container.ListBlobs(options).First();
string value51 = blob5.Properties.ContentType; /* Populated */
string value52 = blob5.Metadata["Meta"]; /* populated */

 

The difference shows up when using ListBlobs on a container or blob directory, depending on the values of the BlobRequestOptions object. It might not seem to be a big difference, but imagine that there are 10.000 blobs, all with a metadata string value with a length of 100 characters. That sums to 1.000.000 extra characters to send when listing the blobs. So if the metadata is not used every time you do a ListBlobs call, you might consider moving it to the Metadata collection. I will investigate the performance of these methods of storing metadata for a blob in a later blog post.

Tags:

.NET | Azure | C# | Blob

About the author

Martin Ingvar Kofoed Jensen

Architect and Senior Developer at Composite on the open source project Composite C1 - C#/4.0, LINQ, Azure, Parallel and much more!
