Mr Partha Maji
I am a PhD researcher in the Computer Architecture Research Lab at the University of Cambridge, England. My current research lies at the intersection of cutting-edge machine learning and emerging computer architecture. Before embarking on a full-time research career, I spent almost a decade in industry (ARM UK and Broadcom UK) as a CPU subsystem architect and an ASIC design engineer, respectively. I was fortunate to receive several excellence awards and scholarships, including a Mentor Graphics prize for outstanding achievement in my master's degree. My earlier research on Network-on-Chip architecture for the Gannet multi-core System-on-a-Chip won best poster and best presentation awards from Epson Europe and the IET UK. I am also a former Chevening Scholar at the University of Edinburgh and a Fellow of the Cambridge Philosophical Society. My current research is funded by an EPSRC doctoral award. I am passionate about machine learning systems design, and in my spare time I love tinkering with deep neural networks. I am also a community mentor at www.deeplearning.ai, where I support other learners in the field of deep learning.
Co-design of Machine Learning Architecture and Hardware for Embedded Systems: Over the past few years, deep learning has driven advances in areas as diverse as image classification, machine translation, and speech recognition, all research topics that had long been difficult for AI researchers to crack. A subcategory of machine learning, deep learning uses neural networks to learn features automatically rather than relying on handcrafted programs. This has become feasible because computers are far faster than they were a decade ago. Most deep learning models are massive and take a long time to train; currently, researchers use multiple high-end GPUs to train such networks and run inference. Given its future potential, it would be very beneficial if we could deploy deep nets (CNNs, RNNs, LSTMs, etc.) on embedded platforms. However, large and complex network models work against the small silicon footprint and very limited power budget of portable computers, tablets, and smartphones. My research interests lie in understanding the behavior of these complex models, optimizing them, and developing novel low-power compute accelerator architectures that can run them efficiently. My approach relies heavily on cross-layer optimization across the stack: from the model, through the implementation, down to the underlying hardware architecture. I also aim for solutions that are mathematically grounded and well reasoned.
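One concrete way to shrink convolution cost on embedded hardware, explored in the publications below, is to approximate 2-D convolution kernels with separable 1-D passes: a rank-1 k×k kernel needs only 2k multiplies per output instead of k². A minimal NumPy sketch (all function names here are illustrative, not from any published code):

```python
import numpy as np

def conv2d_valid(img, k):
    """Naive 'valid' 2-D cross-correlation, for demonstration only."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
v = rng.standard_normal(3)      # vertical 1-D kernel
h = rng.standard_normal(3)      # horizontal 1-D kernel
K = np.outer(v, h)              # rank-1 3x3 kernel: K[a, b] = v[a] * h[b]

# Full 2-D convolution: 9 multiplies per output pixel.
full = conv2d_valid(img, K)

# Two 1-D passes (rows with h, then columns with v): 6 multiplies per output.
rows = conv2d_valid(img, h.reshape(1, 3))
sep = conv2d_valid(rows, v.reshape(3, 1))

assert np.allclose(full, sep)   # identical result for a rank-1 kernel
```

Real trained kernels are rarely exactly rank-1, so in practice the 1-D factors come from a low-rank approximation (e.g. an SVD of the kernel), trading a small accuracy loss for the arithmetic savings.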
Maji P. (Cambridge), Mundy A. (ARM Research), Dasika G. (ARM Research), Beu J. (ARM Research), Mattina M. (ARM Research), and Mullins R. (Cambridge). Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs. IEEE HPCA, EMC2 Workshop (Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications), 2019.
Maji P., Mullins R. (2017) 1D-FALCON: Accelerating Deep Convolutional Neural Network Inference by Co-optimization of Models and Underlying Arithmetic Implementation. In: Lintas A., Rovetta S., Verschure P., Villa A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10614. Springer, Cham
Maji, P.; Mullins, R. On the Reduction of Computational Complexity of Deep Convolutional Neural Networks. Entropy 2018, 20, 305.
Maji, P., Bates, D., Chadwick, A., and Mullins, R. ADaPT: Optimizing CNN Inference on IoT and Mobile Devices Using Approximately Separable 1-D Kernels. In Proceedings of the 1st International Conference on Internet of Things and Machine Learning (IML 2017). ACM, New York, NY, USA, Article 43, 12 pages.