<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en_US"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://suprasanna.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://suprasanna.com/" rel="alternate" type="text/html" hreflang="en_US" /><updated>2026-06-16T07:19:08-04:00</updated><id>https://suprasanna.com/feed.xml</id><title type="html">Suprasanna Sarkar</title><subtitle>Personal website and blog of Suprasanna Sarkar. Building intelligent data and AI systems at scale for real-world impact. Sharing insights on data engineering, machine learning, AI solutions, and technical leadership.</subtitle><author><name>Suprasanna Sarkar</name></author><entry><title type="html">Welcome to My Blog</title><link href="https://suprasanna.com/blog/welcome-to-my-blog/" rel="alternate" type="text/html" title="Welcome to My Blog" /><published>2025-11-30T00:00:00-05:00</published><updated>2025-11-30T00:00:00-05:00</updated><id>https://suprasanna.com/blog/welcome-to-my-blog</id><content type="html" xml:base="https://suprasanna.com/blog/welcome-to-my-blog/"><![CDATA[<h2 id="hello-world-">Hello, World! 👋</h2>

<p>Welcome to my personal blog! I’m excited to start sharing my experiences, learnings, and insights from my 17+ year journey in data, analytics, and AI/ML engineering.</p>

<h3 id="what-to-expect">What to Expect</h3>

<p>On this blog, you’ll find content about:</p>

<ul>
  <li><strong>Data Engineering</strong>: Building scalable data pipelines, ETL best practices, and the modern data stack</li>
  <li><strong>Machine Learning</strong>: ML model development, deployment, and MLOps strategies</li>
  <li><strong>AI/LLMs</strong>: Working with Large Language Models, prompt engineering, and AI applications</li>
  <li><strong>Cloud Architecture</strong>: Designing and implementing cloud-native solutions on AWS, Azure, and GCP</li>
  <li><strong>Leadership</strong>: Team management, technical mentoring, and agile methodologies</li>
  <li><strong>Best Practices</strong>: Code quality, testing, DevOps, and production-ready solutions</li>
</ul>

<h3 id="my-background">My Background</h3>

<p>I’ve spent over 17 years specializing in building and managing the entire lifecycle of data, analytics, and AI/Machine Learning systems. My passion lies in:</p>

<ul>
  <li>Designing scalable AI solutions that solve real-world problems</li>
  <li>Leading high-performing teams in agile environments</li>
  <li>Implementing DevOps practices for smooth operation</li>
  <li>Delivering impactful, user-centric solutions</li>
</ul>

<h3 id="why-this-blog">Why This Blog?</h3>

<p>I believe in giving back to the community that has helped me grow throughout my career. Through this blog, I aim to:</p>

<ol>
  <li><strong>Share Knowledge</strong>: Document solutions to complex problems I’ve encountered</li>
  <li><strong>Learn in Public</strong>: Deepen my understanding by explaining concepts</li>
  <li><strong>Connect</strong>: Build relationships with fellow engineers and technologists</li>
  <li><strong>Contribute</strong>: Help others accelerate their learning journey</li>
</ol>

<h3 id="stay-connected">Stay Connected</h3>

<p>I’ll be posting regularly about technical deep-dives, tutorials, case studies, and reflections on technology trends. Feel free to reach out if you have questions or want to discuss any topics!</p>

<p>You can find me on:</p>
<ul>
  <li><a href="https://github.com/spsarkar">GitHub</a></li>
  <li><a href="https://linkedin.com/in/yourprofile">LinkedIn</a></li>
  <li><a href="https://twitter.com/yourusername">Twitter</a></li>
</ul>

<p>Let’s learn and build amazing things together! 🚀</p>

<hr />

<p><em>Have a topic you’d like me to write about? Drop me an email or connect on social media!</em></p>]]></content><author><name>Suprasanna Sarkar</name></author><category term="blog" /><category term="introduction" /><category term="personal" /><summary type="html"><![CDATA[Welcome to my personal blog where I share insights about data engineering, machine learning, and technology leadership.]]></summary></entry><entry><title type="html">Deploy Azure Batch Apps Service Sample through Azure Batch Apps Portal</title><link href="https://suprasanna.com/azure/cloud%20computing/tutorial/deploy-azure-batch-apps-through-portal/" rel="alternate" type="text/html" title="Deploy Azure Batch Apps Service Sample through Azure Batch Apps Portal" /><published>2015-05-18T00:00:00-04:00</published><updated>2015-05-18T00:00:00-04:00</updated><id>https://suprasanna.com/azure/cloud%20computing/tutorial/deploy-azure-batch-apps-through-portal</id><content type="html" xml:base="https://suprasanna.com/azure/cloud%20computing/tutorial/deploy-azure-batch-apps-through-portal/"><![CDATA[<p>Microsoft Azure Batch Apps is an Azure service that lets you run compute-intensive, massively parallel workloads on demand. These workloads can be managed either through the Azure Batch Apps APIs or through the Batch Apps portal. This article uses a sample from GitHub (<a href="https://github.com/spsarkar/AzureBatchDependenciesStorage">https://github.com/spsarkar/AzureBatchDependenciesStorage</a>) to demonstrate how to use Azure Batch Apps through the Batch Apps management portal.</p>

<h2 id="preparing-azure-batch-service">Preparing Azure Batch Service</h2>

<p>You need Visual Studio 2013 or later to build this project. The project uses the following NuGet packages:</p>

<ul>
  <li>Microsoft Azure Batch Apps Cloud SDK (<code class="language-plaintext highlighter-rouge">PM&gt; Install-Package Microsoft.Azure.Batch.Apps.Cloud -Pre</code>)</li>
  <li>Windows Azure Storage (<code class="language-plaintext highlighter-rouge">PM&gt; Install-Package WindowsAzure.Storage -Pre</code>)</li>
</ul>

<p>If you haven’t done so already, create an Azure Batch service from the Azure Management portal.</p>

<p><img src="https://www.suprasanna.com/content/images/2015/05/AzureBatchCreation.jpg" alt="Create Azure Batch Service" /></p>

<h2 id="workflow-to-publish-and-run-azure-batch-applications">Workflow to Publish and Run Azure Batch Applications</h2>

<p>This article (<a href="http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/">http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/</a>) describes the basic concepts of the Azure Batch service, and its ‘Workflow to publish and run an application with Batch Apps’ section walks through how to publish and run a Batch Apps application. In brief, an Azure Batch Apps application consists of two major components: an application image and a cloud assembly.</p>

<p><img src="https://acomdpsstorage.blob.core.windows.net/dpsmedia-prod/azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/20150514052253/app_pub_workflow.png" alt="Azure Batch App Workflow" /></p>

<p><em>Diagram source: <a href="http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/">http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/</a></em></p>

<h3 id="application-image">Application Image</h3>

<p>An application image is a zip file containing the application executables and any supporting files they need. A dummy test application image (<code class="language-plaintext highlighter-rouge">mri-processing-dummy.zip</code>) is included in the GitHub project. This zip contains two Windows batch files (<code class="language-plaintext highlighter-rouge">niftiInit.bat</code> and <code class="language-plaintext highlighter-rouge">skullStrip.bat</code>) that copy the text content of the input file to the output files.</p>

<h3 id="cloud-assembly">Cloud Assembly</h3>

<p>A cloud assembly is a zip file containing the .NET assemblies that invoke the application and dispatch the workload to the Azure Batch service. It contains a job splitter and a task processor. The cloud assembly is not included in the GitHub project; you build it yourself by compiling the project and zipping the contents of its output folder.</p>

<h2 id="entry-point-to-azure-batch-apps">Entry Point to Azure Batch Apps</h2>

<p><code class="language-plaintext highlighter-rouge">ApplicationDefinition</code> represents the main entry point in the cloud assembly. This definition includes the following:</p>

<ul>
  <li>Job type</li>
  <li>Application name</li>
  <li>Job Splitter – splits a job into multiple tasks</li>
  <li>Task Processor – invokes the application executable for a given task</li>
</ul>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">ApplicationDefinition</span> 
<span class="p">{</span> 
    <span class="k">public</span> <span class="k">static</span> <span class="k">readonly</span> <span class="n">CloudApplication</span> <span class="n">Application</span> <span class="p">=</span> <span class="k">new</span> <span class="n">ParallelCloudApplication</span> 
    <span class="p">{</span> 
        <span class="n">ApplicationName</span> <span class="p">=</span> <span class="s">"AzureBatchNiftiProcessing"</span><span class="p">,</span> 
        <span class="n">JobType</span> <span class="p">=</span> <span class="s">"AzureBatchNiftiProcessing"</span><span class="p">,</span> 
        <span class="n">JobSplitterType</span> <span class="p">=</span> <span class="k">typeof</span><span class="p">(</span><span class="n">AzureBatchNiftiProcessingJobSplitter</span><span class="p">),</span> 
        <span class="n">TaskProcessorType</span> <span class="p">=</span> <span class="k">typeof</span><span class="p">(</span><span class="n">AzureBatchNiftiProcessingTaskProcessor</span><span class="p">)</span> 
    <span class="p">};</span> 
<span class="p">}</span>
</code></pre></div></div>

<h2 id="job-splitter">Job Splitter</h2>

<p>The job splitter splits your job into multiple tasks, which lets you run both parallel and dependent tasks. Here, each skull-strip task uses <code class="language-plaintext highlighter-rouge">TaskDependency.OnId</code> to wait for its corresponding reslice task. For more detailed information, please read this article (<a href="http://azure.microsoft.com/en-us/documentation/articles/batch-dotnet-get-started/">http://azure.microsoft.com/en-us/documentation/articles/batch-dotnet-get-started/</a>).</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">protected</span> <span class="k">override</span> <span class="n">IEnumerable</span><span class="p">&lt;</span><span class="n">TaskSpecifier</span><span class="p">&gt;</span> <span class="nf">Split</span><span class="p">(</span><span class="n">IJob</span> <span class="n">job</span><span class="p">,</span> <span class="n">JobSplitSettings</span> <span class="n">settings</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">var</span> <span class="n">reorientTask</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TaskSpecifier</span>
    <span class="p">{</span>
        <span class="n">TaskId</span> <span class="p">=</span> <span class="n">TaskIds</span><span class="p">.</span><span class="n">Reslice</span><span class="p">,</span>
        <span class="n">RequiredFiles</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Files</span><span class="p">.</span><span class="nf">Take</span><span class="p">(</span><span class="m">1</span><span class="p">).</span><span class="nf">ToList</span><span class="p">(),</span>
        <span class="n">Parameters</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Parameters</span><span class="p">,</span>                
    <span class="p">};</span>
    
    <span class="kt">var</span> <span class="n">reorientTask2</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TaskSpecifier</span>
    <span class="p">{</span>
        <span class="n">TaskId</span> <span class="p">=</span> <span class="n">TaskIds</span><span class="p">.</span><span class="n">Reslice1</span><span class="p">,</span>
        <span class="n">RequiredFiles</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Files</span><span class="p">.</span><span class="nf">Take</span><span class="p">(</span><span class="m">2</span><span class="p">).</span><span class="nf">ToList</span><span class="p">(),</span>
        <span class="n">Parameters</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Parameters</span><span class="p">,</span>
    <span class="p">};</span>
    
    <span class="kt">var</span> <span class="n">skullStripTask</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TaskSpecifier</span>
    <span class="p">{</span>
        <span class="n">TaskId</span> <span class="p">=</span> <span class="n">TaskIds</span><span class="p">.</span><span class="n">SkullStrip</span><span class="p">,</span>
        <span class="n">Parameters</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Parameters</span><span class="p">,</span>
        <span class="n">DependsOn</span> <span class="p">=</span> <span class="n">TaskDependency</span><span class="p">.</span><span class="nf">OnId</span><span class="p">(</span><span class="n">TaskIds</span><span class="p">.</span><span class="n">Reslice</span><span class="p">)</span>
    <span class="p">}.</span><span class="nf">RequiringAllJobFiles</span><span class="p">(</span><span class="n">job</span><span class="p">);</span>
    
    <span class="kt">var</span> <span class="n">skullStripTask2</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TaskSpecifier</span>
    <span class="p">{</span>
        <span class="n">TaskId</span> <span class="p">=</span> <span class="n">TaskIds</span><span class="p">.</span><span class="n">SkullStrip1</span><span class="p">,</span>
        <span class="n">Parameters</span> <span class="p">=</span> <span class="n">job</span><span class="p">.</span><span class="n">Parameters</span><span class="p">,</span>
        <span class="n">DependsOn</span> <span class="p">=</span> <span class="n">TaskDependency</span><span class="p">.</span><span class="nf">OnId</span><span class="p">(</span><span class="n">TaskIds</span><span class="p">.</span><span class="n">Reslice1</span><span class="p">)</span>
    <span class="p">}.</span><span class="nf">RequiringAllJobFiles</span><span class="p">(</span><span class="n">job</span><span class="p">);</span>
    
    <span class="k">return</span> <span class="k">new</span> <span class="n">List</span><span class="p">&lt;</span><span class="n">TaskSpecifier</span><span class="p">&gt;</span> <span class="p">{</span> <span class="n">reorientTask</span><span class="p">,</span> <span class="n">reorientTask2</span><span class="p">,</span> <span class="n">skullStripTask</span><span class="p">,</span> <span class="n">skullStripTask2</span> <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>
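
<p>To make the dependency structure concrete, the stand-alone sketch below models the four tasks as plain data and groups them into “waves” of tasks that could run in parallel. The <code class="language-plaintext highlighter-rouge">HypotheticalTask</code> type and <code class="language-plaintext highlighter-rouge">RunnableWaves</code> method are illustrative only and are not part of the Batch Apps SDK; the real scheduling is done by the Batch Apps runtime.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a TaskSpecifier: a task id plus the ids it depends on.
public sealed class HypotheticalTask
{
    public string Id { get; }
    public IReadOnlyList&lt;string&gt; DependsOn { get; }

    public HypotheticalTask(string id, params string[] dependsOn)
    {
        Id = id;
        DependsOn = dependsOn;
    }
}

public static class DependencySketch
{
    // Groups tasks into waves: every task in a wave depends only on tasks from
    // earlier waves, so the tasks inside one wave could run in parallel.
    public static List&lt;List&lt;string&gt;&gt; RunnableWaves(IEnumerable&lt;HypotheticalTask&gt; tasks)
    {
        var pending = tasks.ToDictionary(t =&gt; t.Id);
        var done = new HashSet&lt;string&gt;();
        var waves = new List&lt;List&lt;string&gt;&gt;();

        while (pending.Count &gt; 0)
        {
            // A task is runnable once all of its dependencies have completed.
            var wave = pending.Values
                .Where(t =&gt; t.DependsOn.All(done.Contains))
                .Select(t =&gt; t.Id)
                .OrderBy(id =&gt; id, StringComparer.Ordinal)
                .ToList();

            if (wave.Count == 0)
                throw new InvalidOperationException("Cyclic or unknown task dependency.");

            foreach (var id in wave)
            {
                pending.Remove(id);
                done.Add(id);
            }
            waves.Add(wave);
        }
        return waves;
    }
}
</code></pre></div></div>

<p>For the splitter above, the tasks <code class="language-plaintext highlighter-rouge">Reslice</code>, <code class="language-plaintext highlighter-rouge">Reslice1</code>, <code class="language-plaintext highlighter-rouge">SkullStrip</code> (depending on <code class="language-plaintext highlighter-rouge">Reslice</code>) and <code class="language-plaintext highlighter-rouge">SkullStrip1</code> (depending on <code class="language-plaintext highlighter-rouge">Reslice1</code>) form two waves: both reslice tasks first, then both skull-strip tasks.</p>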

<h2 id="defining--running-tasks">Defining &amp; Running Tasks</h2>

<p>At this stage, the task processor handles each non-merge task returned by the job splitter: it invokes the application with the appropriate arguments and returns a collection of outputs that need to be kept for later use. The application is launched through an <code class="language-plaintext highlighter-rouge">ExternalProcess</code>, for which you can specify the following:</p>

<ul>
  <li><strong>CommandPath</strong> – The path of the executable file</li>
  <li><strong>Arguments</strong> – The command-line arguments for the executable</li>
  <li><strong>WorkingDirectory</strong> – The working directory for the external program process</li>
  <li><strong>CancellationToken</strong> – A cancellation token that can be used to cancel the external process</li>
</ul>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">originalInputFileName</span> <span class="p">=</span> <span class="n">task</span><span class="p">.</span><span class="n">RequiredFiles</span><span class="p">[</span><span class="m">0</span><span class="p">].</span><span class="n">Name</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">inputFile</span> <span class="p">=</span> <span class="nf">LocalPath</span><span class="p">(</span><span class="n">originalInputFileName</span><span class="p">);</span>
<span class="kt">string</span> <span class="n">commandPathStr</span> <span class="p">=</span> <span class="nf">ExecutablePath</span><span class="p">(</span><span class="s">@"mri-processing\niftiInit.bat"</span><span class="p">);</span>
<span class="kt">string</span> <span class="n">strExecutableLocation</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="nf">GetDirectoryName</span><span class="p">(</span><span class="n">ExecutablesPath</span><span class="p">);</span>
<span class="c1">// strOutputFileName (the output file name for this task) is assumed to be defined earlier in the task processor; not shown here</span>
<span class="kt">var</span> <span class="n">outputFile</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="nf">Combine</span><span class="p">(</span><span class="n">strExecutableLocation</span><span class="p">,</span> <span class="n">strOutputFileName</span><span class="p">);</span>

<span class="kt">string</span> <span class="n">externalProcessArgs</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">"\"{0}\" \"{1}\""</span><span class="p">,</span> 
    <span class="n">inputFile</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="s">".nii"</span><span class="p">,</span> <span class="kt">string</span><span class="p">.</span><span class="n">Empty</span><span class="p">),</span> 
    <span class="n">outputFile</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="s">".nii"</span><span class="p">,</span> <span class="kt">string</span><span class="p">.</span><span class="n">Empty</span><span class="p">));</span>
    
<span class="kt">var</span> <span class="n">process</span> <span class="p">=</span> <span class="k">new</span> <span class="n">ExternalProcess</span>
<span class="p">{</span>
    <span class="n">CommandPath</span> <span class="p">=</span> <span class="n">commandPathStr</span><span class="p">,</span>
    <span class="n">Arguments</span> <span class="p">=</span> <span class="n">externalProcessArgs</span><span class="p">,</span>
    <span class="n">WorkingDirectory</span> <span class="p">=</span> <span class="n">LocalStoragePath</span>
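<span class="p">};</span>

<span class="c1">// Start the batch file with the arguments built above</span>
<span class="n">process</span><span class="p">.</span><span class="nf">Run</span><span class="p">();</span>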
</code></pre></div></div>
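
<p>Note that <code class="language-plaintext highlighter-rouge">Replace(".nii", string.Empty)</code> removes every occurrence of “.nii” in the path, not only the extension, so a folder name containing “.nii” would also be altered. If that matters in your environment, a slightly more defensive variant strips only a trailing extension. The <code class="language-plaintext highlighter-rouge">NiftiArgs</code> helper below is a hypothetical, stand-alone sketch that uses only <code class="language-plaintext highlighter-rouge">System.IO.Path</code>; it is not part of the sample project or the Batch Apps SDK.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;

public static class NiftiArgs
{
    // Removes a trailing ".nii" extension (case-insensitive); other paths are returned unchanged.
    public static string StripNiiExtension(string path)
    {
        return string.Equals(Path.GetExtension(path), ".nii", StringComparison.OrdinalIgnoreCase)
            ? Path.ChangeExtension(path, null)
            : path;
    }

    // Builds the quoted "input" "output" argument string passed to the batch file.
    public static string BuildArguments(string inputFile, string outputFile)
    {
        return string.Format("\"{0}\" \"{1}\"",
            StripNiiExtension(inputFile), StripNiiExtension(outputFile));
    }
}
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">StripNiiExtension("scans.nii.d/brain.nii")</code> returns <code class="language-plaintext highlighter-rouge">scans.nii.d/brain</code>, whereas the <code class="language-plaintext highlighter-rouge">Replace</code> call above would produce <code class="language-plaintext highlighter-rouge">scans.d/brain</code>.</p>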

<h2 id="building-cloud-assembly">Building Cloud Assembly</h2>

<p>You need to specify the access details of the Azure Storage account associated with your Azure Batch Apps service. You can find out how to locate these storage details here (<a href="http://sarkar.azurewebsites.net/2015/03/16/uploading-large-executable-on-azure-batch-service-app-management-portal">http://sarkar.azurewebsites.net/2015/03/16/uploading-large-executable-on-azure-batch-service-app-management-portal</a>).</p>

<p>Follow the instructions below to build the cloud assembly:</p>

<ol>
  <li>Add your Azure Storage access details to the DownloadFile method in the TaskProcessor.cs file of the AzureBatchDependenciesStorage project.</li>
  <li>Build the AzureBatchDependenciesStorage project.</li>
  <li>Open the output folder of the AzureBatchDependenciesStorage project.</li>
  <li>Select all the DLLs (and optionally PDB files) in the output folder.</li>
  <li>Right-click and choose Send To &gt; Compressed Folder.</li>
</ol>
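
<p>Steps 3 to 5 can also be scripted. The sketch below is a hypothetical helper (not part of the sample project) that uses <code class="language-plaintext highlighter-rouge">System.IO.Compression</code> to zip every DLL, and optionally every PDB, from the top level of the build output folder, mirroring the manual selection above. The folder and zip paths you pass in are your own; on .NET Framework 4.5 and later it needs references to System.IO.Compression and System.IO.Compression.FileSystem.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class CloudAssemblyPackager
{
    // Zips the *.dll files (and optionally *.pdb files) found directly in outputFolder.
    // Returns the number of files added to the archive.
    public static int CreateCloudAssemblyZip(string outputFolder, string zipPath, bool includePdb)
    {
        var patterns = includePdb ? new[] { "*.dll", "*.pdb" } : new[] { "*.dll" };
        var files = patterns
            .SelectMany(p =&gt; Directory.GetFiles(outputFolder, p, SearchOption.TopDirectoryOnly))
            .ToArray();

        // ZipArchiveMode.Create requires that the zip file does not already exist.
        if (File.Exists(zipPath))
            File.Delete(zipPath);

        using (var archive = ZipFile.Open(zipPath, ZipArchiveMode.Create))
        {
            foreach (var file in files)
            {
                // Store each file at the root of the zip, as Send To &gt; Compressed Folder does.
                archive.CreateEntryFromFile(file, Path.GetFileName(file));
            }
        }
        return files.Length;
    }
}
</code></pre></div></div>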

<h2 id="uploading-the-application-to-batch-apps-service">Uploading the Application to Batch Apps Service</h2>

<ol>
  <li>Open the Azure management portal (manage.windowsazure.com).</li>
  <li>Select Batch Services in the left-hand menu.</li>
  <li>Select your service in the list and click “Manage Batch Apps.” This opens the Batch Apps management portal.</li>
  <li>Select Services in the left-hand menu.</li>
  <li>Select your service in the list and click View Details.</li>
  <li>Choose the Manage Applications tab.</li>
  <li>Click New Application.</li>
  <li>Under “Select and upload a cloud assembly,” choose your cloud assembly zip file and click Upload.</li>
  <li>Under “Select and upload an application image,” choose your application image zip file and click Upload. (Be sure to leave the version as “default”.)</li>
  <li>Click Done.</li>
</ol>

<h2 id="running-jobs-from-azure-batch-apps-portal">Running Jobs from Azure Batch Apps Portal</h2>

<ol>
  <li>Open the Azure management portal (manage.windowsazure.com).</li>
  <li>Select Batch Services in the left-hand menu.</li>
  <li>Select your service in the list and click “Manage Batch Apps.” This opens the Batch Apps management portal.</li>
  <li>Select Services in the left-hand menu.</li>
  <li>Select your service in the list and click View Details.</li>
  <li>Choose the Manage Applications tab.</li>
  <li>Click Run Jobs.</li>
  <li>Enter a job name, select the job type, and enter any parameters your job needs.</li>
  <li>Under “Select the input files for your job,” choose your input files. For simplicity, this sample project requires exactly two input files named <code class="language-plaintext highlighter-rouge">brain.nii</code> and <code class="language-plaintext highlighter-rouge">tissue.nii</code>. The GitHub project contains a zip file (<code class="language-plaintext highlighter-rouge">TestInputFiles.zip</code>) with two input files that you can use here.</li>
  <li>Start the job.</li>
</ol>
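
<p>Because the sample expects exactly these two input names, a quick local check before submitting can save a failed run. The <code class="language-plaintext highlighter-rouge">InputCheck</code> helper below is a hypothetical sketch, not part of the sample or the Batch Apps client API; the expected names come from the step above, and the exact (case-sensitive) comparison is an assumption chosen to be on the strict side.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class InputCheck
{
    // Input file names this sample expects.
    private static readonly string[] ExpectedNames = { "brain.nii", "tissue.nii" };

    // Returns the expected input names that are missing from the selected files.
    // Names are compared exactly (case-sensitive), which is the stricter assumption.
    public static IReadOnlyList&lt;string&gt; MissingInputs(IEnumerable&lt;string&gt; selectedPaths)
    {
        var selected = new HashSet&lt;string&gt;(
            selectedPaths.Select(p =&gt; Path.GetFileName(p)), StringComparer.Ordinal);
        return ExpectedNames.Where(name =&gt; !selected.Contains(name)).ToList();
    }
}
</code></pre></div></div>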

<h2 id="running-jobs-using-azure-batch-app-client-apis">Running Jobs using Azure Batch App Client APIs</h2>

<p>Azure Batch Apps can also be managed through the Azure Batch Apps client APIs. The client example project (<code class="language-plaintext highlighter-rouge">ImageMagick.Console.Client</code>) from “Microsoft Azure Batch Apps Samples” (<a href="https://code.msdn.microsoft.com/Azure-Batch-Apps-Samples-dd781172">https://code.msdn.microsoft.com/Azure-Batch-Apps-Samples-dd781172</a>) demonstrates how to do this. Its source code and documentation can easily be reused to submit and monitor jobs for this sample project as well.</p>]]></content><author><name>Suprasanna Sarkar</name></author><category term="Azure" /><category term="Cloud Computing" /><category term="Tutorial" /><category term="Azure Batch" /><category term="Azure Batch Apps" /><category term="Cloud Computing" /><category term="Parallel Processing" /><category term="C#" /><summary type="html"><![CDATA[A comprehensive guide to deploying Azure Batch Apps using the management portal, including sample code and step-by-step instructions.]]></summary></entry><entry><title type="html">Uploading Large Executable on Azure Batch Service App Management Portal</title><link href="https://suprasanna.com/azure/cloud%20computing/uploading-large-executable-on-azure-batch-service/" rel="alternate" type="text/html" title="Uploading Large Executable on Azure Batch Service App Management Portal" /><published>2015-03-16T00:00:00-04:00</published><updated>2015-03-16T00:00:00-04:00</updated><id>https://suprasanna.com/azure/cloud%20computing/uploading-large-executable-on-azure-batch-service</id><content type="html" xml:base="https://suprasanna.com/azure/cloud%20computing/uploading-large-executable-on-azure-batch-service/"><![CDATA[<p>The Azure Batch Apps service includes a management portal where developers can manage jobs, upload applications and cloud assemblies, view logs, and download outputs without having to write their own client code. 
At the moment, it is not possible to upload a zip file of executables larger than 2 GB through the management portal; the upload fails with an error.</p>

<p>An application image over 2 GB is unusually large: some programs do become very large, but it is rare to see one above 200-300 MB, which is probably why a 2 GB limit in the management portal seemed reasonable. Still, some applications need a package that exceeds this limit. Usually it is not the executables themselves that are so large; for example, running an image-processing algorithm on volumetric (3D) brain images might require template files that make the application package big.</p>

<h2 id="the-workaround">The Workaround</h2>

<p>Even though the Batch Apps management portal doesn’t allow uploading application images that exceed this limit, we can still upload them with AzCopy (<a href="http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/">http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/</a>) or a similar Azure Storage tool. To do that, we need to understand how the Azure Batch Apps service uses Azure Storage.</p>

<h3 id="understanding-azure-batch-storage">Understanding Azure Batch Storage</h3>

<p>A storage account is created automatically when you create an Azure Batch Apps service. It is created in the same region as the Batch Apps service, and its name starts with ‘batchapps’: the exact naming convention is “batchapps” followed by random characters, up to the 24-character limit for storage account names. You can find the storage account for a particular service in the Batch Apps management portal: select the service, then “View details” -&gt; “Sync access Keys”.</p>
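
<p>If your subscription has many storage accounts, this naming convention is enough to narrow them down. The hypothetical helper below (not part of any Azure SDK) checks whether a storage account name fits the “batchapps” + generated-characters pattern within the 24-character limit; you could apply it to a list of account names copied from the portal. Requiring at least one generated character after the prefix is an assumption.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System.Text.RegularExpressions;

public static class BatchAppsStorageNaming
{
    // Storage account names are 3-24 lowercase letters and digits. Batch Apps accounts
    // start with "batchapps" (9 characters), leaving at most 15 generated characters.
    // Requiring at least one generated character is an assumption.
    private static readonly Regex BatchAppsAccount = new Regex("^batchapps[a-z0-9]{1,15}$");

    public static bool LooksLikeBatchAppsAccount(string accountName)
    {
        return accountName != null &amp;&amp; BatchAppsAccount.IsMatch(accountName);
    }
}
</code></pre></div></div>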

<h3 id="getting-storage-access-keys">Getting Storage Access Keys</h3>

<p>Once you know the storage account name from the Batch Apps portal, go to the Azure Management Portal to get the storage access key, which you will need to upload the executables. For detailed instructions, see the ‘View, copy, and regenerate storage access keys’ section of this article (<a href="http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/">http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/</a>).</p>

<h3 id="storage-container-naming-convention">Storage Container Naming Convention</h3>

<p>Because this storage account is created in your own subscription, you can locate it and upload your zip directly to the appropriate container:</p>

<ul>
  <li><strong>“clouddrives”</strong> (for application images)</li>
  <li><strong>“greenbutton-cloud-assemblies”</strong> (for cloud assemblies)</li>
</ul>

<p>Create the container if it doesn’t exist. The storage account itself is easy to spot, because its name is a not-so-attractive-looking generated string. The blob should be named <strong>ApplicationName.zip</strong>, where ApplicationName is the application name defined in your cloud assembly.</p>
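
<p>Putting these rules together, the upload target for AzCopy or any other tool is a blob URL built from the storage account name, the container name, and <strong>ApplicationName.zip</strong>. The hypothetical helper below only composes that URL and sanity-checks the container name against Azure’s general container-naming rules (3-63 lowercase letters, digits, and single hyphens); it does not upload anything. Pass the container that matches your package type from the list above.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Text.RegularExpressions;

public static class BatchAppsBlobLocation
{
    // Azure container names: 3-63 characters of lowercase letters, digits and single
    // hyphens, starting and ending with a letter or digit.
    private static readonly Regex ContainerName =
        new Regex("^(?=.{3,63}$)[a-z0-9]+(-[a-z0-9]+)*$");

    // Builds https://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;/&lt;ApplicationName&gt;.zip
    public static Uri BlobUri(string accountName, string containerName, string applicationName)
    {
        if (containerName == null || !ContainerName.IsMatch(containerName))
            throw new ArgumentException("Invalid container name: " + containerName, nameof(containerName));
        if (string.IsNullOrWhiteSpace(applicationName))
            throw new ArgumentException("Application name is required.", nameof(applicationName));

        return new Uri(string.Format("https://{0}.blob.core.windows.net/{1}/{2}.zip",
            accountName, containerName, Uri.EscapeDataString(applicationName)));
    }
}
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">BlobUri("batchappsabc123", "clouddrives", "AzureBatchNiftiProcessing")</code> yields <code class="language-plaintext highlighter-rouge">https://batchappsabc123.blob.core.windows.net/clouddrives/AzureBatchNiftiProcessing.zip</code>; the account name here is a placeholder.</p>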

<h3 id="uploading-large-files">Uploading Large Files</h3>

<p>Following this naming convention, you can use AzCopy or any other tool of your choice to upload the files to storage so that the Azure Batch Apps service finds your application correctly.</p>

<h2 id="additional-resources">Additional Resources</h2>

<ul>
  <li>To learn more about Azure Batch, please visit <a href="http://azure.microsoft.com/en-us/documentation/services/batch/">http://azure.microsoft.com/en-us/documentation/services/batch/</a></li>
  <li>You can find Azure Storage documentation here (<a href="http://azure.microsoft.com/en-us/documentation/services/storage/">http://azure.microsoft.com/en-us/documentation/services/storage/</a>)</li>
</ul>]]></content><author><name>Suprasanna Sarkar</name></author><category term="Azure" /><category term="Cloud Computing" /><category term="Azure Batch" /><category term="Azure Storage" /><category term="AzCopy" /><category term="Cloud Computing" /><summary type="html"><![CDATA[Workaround for uploading application images larger than 2GB to Azure Batch Apps using AzCopy and Azure Storage.]]></summary></entry><entry><title type="html">Azure Batch Apps Task Dependencies and Specifying Intermediate Output File as Required Files</title><link href="https://suprasanna.com/azure/cloud%20computing/azure-batch-task-dependencies/" rel="alternate" type="text/html" title="Azure Batch Apps Task Dependencies and Specifying Intermediate Output File as Required Files" /><published>2015-03-11T00:00:00-04:00</published><updated>2015-03-11T00:00:00-04:00</updated><id>https://suprasanna.com/azure/cloud%20computing/azure-batch-task-dependencies</id><content type="html" xml:base="https://suprasanna.com/azure/cloud%20computing/azure-batch-task-dependencies/"><![CDATA[<p>Azure Batch Service enables developers to provision compute resources on demand at very high scale and to process large amounts of compute work efficiently. Batch makes it easy for developers to use the cloud and take advantage of its scale and reliability without having to manage multiple instances, fault domains, error handling, and other concepts that the Batch service handles for them. There are two developer scenarios for using Azure Batch Service:</p>

<ul>
  <li>Using the Azure Batch API</li>
  <li>Using the Azure Batch Apps API</li>
</ul>

<p>If you would like to learn how to get started with Azure Batch Apps, please read the tutorial in this article (<a href="http://azure.microsoft.com/en-us/documentation/articles/batch-dotnet-get-started/">http://azure.microsoft.com/en-us/documentation/articles/batch-dotnet-get-started/</a>).</p>

<p>Azure Batch Apps is a feature of Azure Batch that provides an application-centric way of managing and executing Batch workloads; this article applies to Azure Batch Apps only. Batch Apps includes a management portal where you can manage jobs, view logs, and download outputs without having to write your own client code.</p>

<h2 id="task-dependencies-in-batch-apps">Task Dependencies in Batch Apps</h2>

<p>The job splitter splits a job into tasks. With the Azure Batch Apps API, some developers may want a single Azure Batch job to run a sequence of consecutive tasks, where some phases are embarrassingly parallel and others are not. These tasks have interdependencies: some can run in parallel, while others must wait for earlier tasks to provide their outputs. For simplicity, these developers would still like to run the whole pipeline as a single Azure Batch Apps job and use the Batch Apps management portal to submit jobs and download outputs.</p>

<p>As of this writing, the API offers no way to specify an intermediate task output file (which doesn’t exist at job submission time) as one of a task’s required files. I tried a few ways to make this work within the API, without success. RequiredFiles takes IFileSpecifier objects rather than plain strings (e.g. “intermediate.jpg”), and there is no implementation of IFileSpecifier available, so I implemented the interface myself, setting the file name to “intermediate.jpg”.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">MyFileSpecifier</span> <span class="p">:</span> <span class="n">IFileSpecifier</span>
<span class="p">{</span>
  <span class="k">public</span> <span class="kt">string</span> <span class="n">Name</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
  <span class="k">public</span> <span class="n">DateTime</span> <span class="n">Timestamp</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
  <span class="k">public</span> <span class="kt">string</span> <span class="n">OriginalPath</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
  <span class="k">public</span> <span class="kt">string</span> <span class="n">Hash</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span> 
</code></pre></div></div>

<p>But this didn’t work. The task failed with “An exception occurred processing task 2: One or more errors occurred. The remote server returned an error: (404) Not Found.”</p>

<p>The Batch Apps runtime doesn’t seem to download intermediate files from the job’s container. As a result, <code>Task.RequiredFiles</code> doesn’t contain the outputs of previously completed tasks; only the original input file is available. It makes sense for the runtime not to download every intermediate output file. Embarrassingly parallel computations often don’t need these files, and downloading them unnecessarily could significantly slow task execution and waste disk space.</p>

<h2 id="solution-approaches">Solution Approaches</h2>

<p>Because the runtime doesn’t download intermediate files, the batch application’s cloud assembly can download just the files it needs from the job’s container to the task virtual machines (TVMs). The job’s container lives in the Azure storage account associated with the Batch service. There are several ways to do this, including the following:</p>

<ul>
  <li>
    <p><strong>Using the Azure Batch Apps REST API</strong> (<a href="https://msdn.microsoft.com/en-us/library/azure/dn820126.aspx">https://msdn.microsoft.com/en-us/library/azure/dn820126.aspx</a>). This requires authenticating your application with Azure Active Directory using OAuth 2.0 (requests must be authenticated with an OAuth 2 bearer token issued by Azure Active Directory). That involves several extra steps in the Azure portal and a bit more code. I will cover this approach in a future blog post.</p>
  </li>
  <li>
    <p><strong>Using AzCopy or the Azure Storage library</strong> (the Microsoft.WindowsAzure.Storage NuGet package). This is the simplest way to get this working alongside the Azure Batch Apps .NET API in the cloud assembly.</p>
  </li>
</ul>
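
<p>Whichever approach you choose, the intermediate file is a blob in the job’s container. The sketch below builds that blob’s URL under two assumptions: the <code>"job-" + JobId</code> container naming used in the cloud assembly code later in this post, and the default public endpoint <code>https://{account}.blob.core.windows.net</code>. The <code>JobBlobLocator</code> class is hypothetical. This URL identifies the same blob that AzCopy or the storage library reads from, although authentication (for example, with the account key) is still required.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Linq;

// Hypothetical helper: builds the URL of a file stored in a Batch Apps job's container.
public static class JobBlobLocator
{
    // Assumes the container is named "job-" + jobId (as in the cloud assembly code)
    // and that the storage account uses the default public blob endpoint.
    public static Uri BlobUri(string storageAccountName, string jobId, string blobName)
    {
        string container = "job-" + jobId;
        // Escape each path segment separately so "folder/file" blob names keep their slashes.
        string path = string.Join("/", blobName.Split('/').Select(s =&gt; Uri.EscapeDataString(s)));
        return new Uri(string.Format("https://{0}.blob.core.windows.net/{1}/{2}",
            storageAccountName, container, path));
    }
}
</code></pre></div></div>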

<h2 id="using-azure-storage-library-to-get-required-files-in-tvm">Using Azure Storage Library to Get Required Files in TVM</h2>

<p>A storage account is created automatically when an Azure Batch Apps service is created. It is in the same region as the Batch Apps service, and its name starts with ‘batchapps’. The cloud assembly needs this storage account’s name and access key to download the files onto the task virtual machines.</p>
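
<p>You could read the name and key from configuration instead of hard-coding them in the cloud assembly. The following minimal sketch assumes two hypothetical environment variables, <code>BATCHAPPS_STORAGE_ACCOUNT</code> and <code>BATCHAPPS_STORAGE_KEY</code>, which you would set yourself. It builds a connection string in the same format as the code below and checks whether an account name starts with ‘batchapps’.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical helper for building the Batch Apps storage connection string.
public static class BatchStorageConfig
{
    // Same format as the connection string in the cloud assembly code.
    public static string BuildConnectionString(string accountName, string accountKey)
    {
        if (string.IsNullOrEmpty(accountName) || string.IsNullOrEmpty(accountKey))
            throw new ArgumentException("Storage account name and key are required.");

        return string.Format("DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            accountName, accountKey);
    }

    // The automatically created Batch Apps storage account name starts with "batchapps".
    public static bool LooksLikeBatchAppsAccount(string accountName)
    {
        return accountName?.StartsWith("batchapps", StringComparison.Ordinal) == true;
    }

    // Assumes the hypothetical BATCHAPPS_STORAGE_ACCOUNT and BATCHAPPS_STORAGE_KEY
    // environment variables have been set on the task virtual machine.
    public static string FromEnvironment()
    {
        return BuildConnectionString(
            Environment.GetEnvironmentVariable("BATCHAPPS_STORAGE_ACCOUNT"),
            Environment.GetEnvironmentVariable("BATCHAPPS_STORAGE_KEY"));
    }
}
</code></pre></div></div>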

<p>Add the following code to your <code>ParallelTaskProcessor</code> implementation.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">protected</span> <span class="k">override</span> <span class="n">TaskProcessResult</span> <span class="nf">RunExternalTaskProcess</span><span class="p">(</span><span class="n">ITask</span> <span class="n">task</span><span class="p">,</span> <span class="n">TaskExecutionSettings</span> <span class="n">settings</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">string</span> <span class="n">blobName</span> <span class="p">=</span> <span class="s">"resliced_brain.nii"</span><span class="p">;</span>
    <span class="kt">string</span> <span class="n">blobContainerName</span> <span class="p">=</span> <span class="s">"job-"</span> <span class="p">+</span> <span class="n">task</span><span class="p">.</span><span class="n">JobId</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
    <span class="kt">string</span> <span class="n">targetFolder</span> <span class="p">=</span> <span class="n">LocalStoragePath</span><span class="p">;</span>
    <span class="kt">string</span> <span class="n">strFileDownloaded</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="n">Empty</span><span class="p">;</span>
    <span class="kt">string</span> <span class="n">storageAccountName</span> <span class="p">=</span> <span class="s">"[YOUR BATCH STORAGE ACCOUNT NAME]"</span><span class="p">;</span>
    <span class="kt">string</span> <span class="n">storageAccountKey</span> <span class="p">=</span> <span class="s">"[YOUR BATCH STORAGE ACCOUNT KEY]"</span><span class="p">;</span>
    
    <span class="kt">string</span> <span class="n">connectionString</span> <span class="p">=</span> <span class="kt">string</span><span class="p">.</span><span class="nf">Format</span><span class="p">(</span><span class="s">@"DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}"</span><span class="p">,</span>
       <span class="n">storageAccountName</span><span class="p">,</span> <span class="n">storageAccountKey</span><span class="p">);</span>

    <span class="c1">//get a reference to the container where you want to put the files</span>
    <span class="n">CloudStorageAccount</span> <span class="n">cloudStorageAccount</span> <span class="p">=</span> <span class="n">CloudStorageAccount</span><span class="p">.</span><span class="nf">Parse</span><span class="p">(</span><span class="n">connectionString</span><span class="p">);</span>
    <span class="n">CloudBlobClient</span> <span class="n">cloudBlobClient</span> <span class="p">=</span> <span class="n">cloudStorageAccount</span><span class="p">.</span><span class="nf">CreateCloudBlobClient</span><span class="p">();</span>
    <span class="n">CloudBlobContainer</span> <span class="n">cloudBlobContainer</span> <span class="p">=</span> <span class="n">cloudBlobClient</span><span class="p">.</span><span class="nf">GetContainerReference</span><span class="p">(</span><span class="n">blobContainerName</span><span class="p">);</span>
    <span class="n">CloudBlockBlob</span> <span class="n">blobSource</span> <span class="p">=</span> <span class="n">cloudBlobContainer</span><span class="p">.</span><span class="nf">GetBlockBlobReference</span><span class="p">(</span><span class="n">blobName</span><span class="p">);</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">blobSource</span><span class="p">.</span><span class="nf">Exists</span><span class="p">())</span>
    <span class="p">{</span>
        <span class="c1">//blob storage uses forward slashes, windows uses backward slashes; do a replace</span>
        <span class="c1">// so localPath will be right</span>
        <span class="kt">string</span> <span class="n">localDestination</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="nf">Combine</span><span class="p">(</span><span class="n">targetFolder</span><span class="p">,</span> <span class="n">blobSource</span><span class="p">.</span><span class="n">Name</span><span class="p">.</span><span class="nf">Replace</span><span class="p">(</span><span class="s">@"/"</span><span class="p">,</span> <span class="s">@"\"</span><span class="p">));</span>
        <span class="c1">//if the directory path matching the "folders" in the blob name don't exist, create them</span>
        <span class="kt">string</span> <span class="n">dirPath</span> <span class="p">=</span> <span class="n">Path</span><span class="p">.</span><span class="nf">GetDirectoryName</span><span class="p">(</span><span class="n">localDestination</span><span class="p">);</span>
        
        <span class="k">if</span> <span class="p">(!</span><span class="n">Directory</span><span class="p">.</span><span class="nf">Exists</span><span class="p">(</span><span class="n">localDestination</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">Directory</span><span class="p">.</span><span class="nf">CreateDirectory</span><span class="p">(</span><span class="n">dirPath</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="n">blobSource</span><span class="p">.</span><span class="nf">DownloadToFile</span><span class="p">(</span><span class="n">localDestination</span><span class="p">,</span> <span class="n">FileMode</span><span class="p">.</span><span class="n">Create</span><span class="p">);</span>
        <span class="n">strFileDownloaded</span> <span class="p">=</span> <span class="n">localDestination</span><span class="p">;</span>
    <span class="p">}</span>
    
    <span class="kt">string</span> <span class="n">strInputFileName</span> <span class="p">=</span> <span class="s">"resliced_brain.nii"</span><span class="p">;</span>
    <span class="kt">string</span> <span class="n">blobContainerName</span> <span class="p">=</span> <span class="s">"job-"</span> <span class="p">+</span> <span class="n">task</span><span class="p">.</span><span class="n">JobId</span><span class="p">.</span><span class="nf">ToString</span><span class="p">();</span>
    <span class="kt">string</span> <span class="n">inputFile</span> <span class="p">=</span> <span class="nf">DownloadFile</span><span class="p">(</span><span class="n">blobContainerName</span><span class="p">,</span> <span class="n">strInputFileName</span><span class="p">,</span> <span class="n">LocalStoragePath</span><span class="p">);</span>

    <span class="c1">// code to specify this input File as a parameter to run the external processs task</span>
<span class="p">}</span>
</code></pre></div></div>
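
<p>The code above replaces forward slashes with backslashes, which assumes a Windows task VM. One alternative (a sketch, not the code from the repository) is to move the path mapping into a small helper. The helper splits the blob name on <code>/</code> and lets <code>Path.Combine</code> insert the platform’s separator. <code>BlobPaths</code> and its methods are hypothetical names.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.IO;
using System.Linq;

// Hypothetical helper: maps a blob name such as "stage1/resliced_brain.nii" to a
// path under the task's local storage folder.
public static class BlobPaths
{
    public static string ToLocalPath(string targetFolder, string blobName)
    {
        // Blob names always use '/' as the virtual-directory separator.
        string[] segments = blobName.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return Path.Combine(new[] { targetFolder }.Concat(segments).ToArray());
    }

    // Creates the parent directory of localPath if needed and returns localPath unchanged.
    public static string EnsureDirectoryFor(string localPath)
    {
        string dir = Path.GetDirectoryName(localPath);
        if (!string.IsNullOrEmpty(dir))
            Directory.CreateDirectory(dir); // does nothing if the directory already exists
        return localPath;
    }
}
</code></pre></div></div>

<p>With this helper, <code>RunExternalTaskProcess</code> could compute the download target as <code>BlobPaths.EnsureDirectoryFor(BlobPaths.ToLocalPath(targetFolder, blobSource.Name))</code>.</p>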

<p>The source code for a cloud assembly that demonstrates this approach is available at <a href="https://github.com/spsarkar/AzureBatchDependenciesStorage">https://github.com/spsarkar/AzureBatchDependenciesStorage</a>.</p>

<h2 id="additional-resources">Additional Resources</h2>

<ul>
  <li>Sign up for the preview here (<a href="https://account.windowsazure.com/PreviewFeatures">https://account.windowsazure.com/PreviewFeatures</a>)</li>
  <li>Learn about Batch (<a href="http://azure.microsoft.com/en-us/documentation/services/batch/">http://azure.microsoft.com/en-us/documentation/services/batch/</a>)</li>
</ul>]]></content><author><name>Suprasanna Sarkar</name></author><category term="Azure" /><category term="Cloud Computing" /><category term="Azure Batch" /><category term="Azure Batch Apps" /><category term="Cloud Computing" /><category term="Parallel Processing" /><summary type="html"><![CDATA[Learn how to handle task dependencies and intermediate output files in Azure Batch Apps using Azure Storage Library.]]></summary></entry></feed>