Bachelor of Technology
Vaish College of Engineering, Rohtak
August 2012 - July 2016
Department of Computer Science and Engineering
Optimizing the Linear Fascicle Evaluation Algorithm
Diffusion magnetic resonance imaging (dMRI) is a popular macro-scale method for building brain connectivity graphs, which help in understanding the brain-behavior relationship. However, it suffers from a serious shortcoming: a lack of access to ground truth. The LiFE algorithm is a popular technique for tackling some of these issues. LiFE is heavily compute-intensive because of its many irregular array accesses, taking several hours to prune even a low-resolution dMRI dataset. We applied various platform-independent and platform-dependent optimization techniques to reduce the execution time from 225 min with the original sequential CPU approach to 7.5 min on a CPU (16-core Skylake-based processor) and 0.83 min on a GPU (NVIDIA Titan-based) system.
Optimizing the Harris Corner Detection Benchmark
The Harris corner detection algorithm is a widely used image-processing benchmark for detecting corners in an image. Image-processing pipelines are viewed as graphs of interconnected stages, where each stage takes input from previous stages. The computations involved are large and heavily bandwidth-bound. After applying compiler optimization techniques such as loop tiling and code parallelization, we achieved a speedup of 4.4x on a CPU (4-core Haswell-based) over the OpenCV library implementation of Harris corner detection.
- Karan Aggarwal, Uday Bondhugula, "Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems", ACM International Conference on Supercomputing (ICS) 2019, Phoenix, Arizona, USA [pdf]
- Karan Aggarwal, Uday Bondhugula, Varsha Sreenivasan, Devarajan Sridharan, "Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems", ACM Transactions on Architecture and Code Optimization (TACO) 2019 (under review) [pdf]
Software Engineering Intern
AlphaICs Pvt. Ltd., Bangalore
May 2018 - July 2018
Manager: Siddharth Tiwary
AlphaICs Pvt. Ltd. is a startup focused on designing a specialized hardware chip for AI-based applications, known as the Real-AI processor (RAP). In one project, I designed an efficient software-based memory management unit that handles data transfer from main memory to cache, specialized for the RAP and using a modified least-recently-used page replacement policy. In another project, I helped implement an optimized library specific to the RAP hardware and helped map its routines to TensorFlow API calls using the flexibility of the XLA compiler.
- Intermediate: C, C++
- Basic: Python, MATLAB, Java, Julia
- Ubuntu, CentOS
- Intermediate: CUDA, OpenMP
- Basic: MRtrix, TensorFlow
- Mythological and Sci-fi Movies