Hayat’s Homepage

I am pursuing a PhD degree in Computer Science (major in Computer Vision) at Florida Atlantic University under the supervision of Prof. Arslan Munir. I work as a Graduate Research Assistant at the Intelligent Systems, Computer Architecture, Analytics and Security Laboratory (ISCAAS LAB). Earlier, I completed my Master’s degree in 2021 at Sejong University, South Korea.

Research Interests

My primary research focuses on multimodal-driven video analytics, encompassing action and activity recognition, temporal action localization, and spatio-temporal action detection in videos. I specialize in the pretraining, fine-tuning, and zero-shot evaluation of vision-language models (VLMs) for complex video understanding tasks. My expertise also extends to knowledge distillation, adversarial robustness, and scaling deep learning models through distributed training. I am also actively involved in benchmarking the performance of different NVIDIA HPC nodes using large language models (LLMs) and VLMs.

News

Sep, 2025: We are excited to announce that our paper entitled “DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition” has been accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology.
Sep, 2025: We are excited to announce the publication of our white paper in collaboration with NVIDIA’s Sustainable Computing Library.
Sep, 2025: Serving as a reviewer for AAAI 2026.
Oct, 2024: I successfully passed my Ph.D. Candidacy Exam.

Work Experience

Florida Atlantic University
Graduate Research Assistant
August 2024 - Present

Kansas State University
Graduate Research Assistant
January 2022 - July 2024

Sejong University
Full-time Researcher
February 2021 - December 2021

NINE VR
Machine Learning Engineer (Intern)
July 2020 - December 2020

Sejong University
Graduate Research Assistant
March 2019 - January 2021

Selected Publications

Comparison of Air-Cooled versus Liquid-Cooled NVIDIA GPU Systems [Online]

NVIDIA's Sustainable Computing Resource Center platform

This study compares liquid-cooled and air-cooled 8× NVIDIA HGX™ GPU systems, showing that direct-to-chip liquid cooling keeps GPUs cooler, boosts performance, and reduces power use by up to 16%. When scaled to large data centers, these efficiency gains translate to millions in annual energy savings, making liquid cooling a sustainable and high-performance choice for future AI infrastructure.

Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems [PDF]

arXiv

We benchmark LLMs and VLMs on two HGX nodes with 8× NVIDIA H100 GPUs using liquid and air cooling. Liquid-cooled systems sustain lower temperatures (41–50°C vs. 54–72°C) and deliver 17% higher performance (54 vs. 46 TFLOPs/GPU), improved performance-per-watt, and reduced energy overhead. These results highlight liquid cooling's efficiency and sustainability advantages for scaling AI infrastructure.

Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization [PDF]

arXiv

We introduce PCL-Former, a hierarchical multi-stage transformer for temporal action localization. It decomposes the task into three stages, each handled by a dedicated transformer module. The Proposal-Former detects candidate segments in untrimmed videos. The Classification-Former identifies action categories within those segments. The Localization-Former refines temporal boundaries with specialized loss functions.

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition [PDF]

IEEE Transactions on Circuits and Systems for Video Technology

We introduce a computationally efficient VFL-Net model, optimized for spatio-temporal context modeling using a nano-scale spatio-temporal focal modulation mechanism. Further, we combine the forward Kullback–Leibler (KL) divergence and spatio-temporal focal modulation to distill the local and global spatio-temporal context from the Video-FocalNet Base (teacher) to our proposed VFL-Net (student) model.

OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments [PDF]

arXiv

We introduce two object detection benchmarks, OD-VIRAT Large and OD-VIRAT Tiny, for surveillance imagery. Both cover 10 scenes recorded from significant height and distance. OD-VIRAT Large contains 8.7 million instances in 599,996 images, while OD-VIRAT Tiny has 288,901 instances in 19,860 images. Our proposed OD-VIRAT offers rich annotations of bounding boxes and categories.

Improving Adversarial Robustness Through Adaptive Learning-Driven Multi-Teacher Knowledge Distillation [PDF]

arXiv

We propose a multi-teacher adversarial robustness distillation framework with adaptive weighting. Adversarially trained CNNs on perturbed data act as teachers for a student model trained on clean data. Adaptive weights adjust the teachers' contributions based on precision. This enhances the student's learning and robustness to adversarial attacks. The student model remains resilient without exposure to perturbed data.

Vision-Based Semantic Segmentation in Scene Understanding for Autonomous Driving: Recent Achievements, Challenges, and Outlooks [PDF]

IEEE Transactions on Intelligent Transportation Systems

This survey reviews the current achievements in scene understanding, focusing on computationally complex deep learning models. It outlines the generic pipeline, evaluates state-of-the-art performance, and analyzes the time complexity of advanced modeling approaches. Additionally, it highlights key successes and limitations in current research efforts.

Efficient Fire Segmentation for IoT-Assisted Intelligent Transportation Systems [PDF]

IEEE Transactions on Intelligent Transportation Systems

We propose an efficient and lightweight CNN architecture for early fire detection and segmentation, focusing on IoT-enabled ITS environments. We effectively utilize depth-wise separable convolution, point-wise group convolution, and a channel shuffling strategy with an optimal number of convolution kernels per layer, significantly reducing the model size and computation costs.

Light-DehazeNet: A Novel Lightweight CNN Architecture for Single Image Dehazing [PDF]

IEEE Transactions on Image Processing

We present Light-DehazeNet (LD-Net), a lightweight CNN for hazy image reconstruction that jointly estimates the transmission map and atmospheric light using a transformed scattering model. A color visibility restoration method is proposed to avoid color distortion. Extensive experiments are conducted with synthetic and natural hazy images.

Cascaded Deep Reinforcement Learning-Based Multi-Revolution Low-Thrust Spacecraft Orbit-Transfer [PDF]

IEEE Access

We introduce a cascaded deep reinforcement learning (DRL) model to guide low-thrust spacecraft toward desired orbits by determining optimal thrust directions. A gradient-aided reward function based on orbital elements ensures mission requirements and optimal flight times. Results demonstrate time-efficient, near-optimal orbit-raising. This approach effectively improves spacecraft trajectory planning.

Professional Services

Reviewer at Journals

IEEE Transactions on Image Processing
IEEE Transactions on Multimedia
IEEE Transactions on Circuits and Systems for Video Technology
IEEE Access
Elsevier Journal of Image and Vision Computing
Neural Computing and Applications

Reviewer at Conferences

AAAI 2022
AAAI 2026

Hayat Ullah
