About the Customer
HubX is a leading technology hub that develops AI-powered, highly scalable mobile applications and games, reaching over 300 million users in more than 170 countries. It operates in a collaborative structure, sharing central resources with autonomous in-house studios. The internal team, the subject of this case study, focused on increasing the operational efficiency and speed of AI-powered image generation services that form the foundation of products used by millions. HubX is committed to providing fast and seamless AI experiences to its users.
Customer Challenge
HubX’s AI-powered image generation model ran on an event-driven, serverless-style architecture built on the AWS ecosystem, auto-scaled with Amazon SQS and KEDA. In this architecture, which relied on a massive 15 GB container image, the critical challenge was building a reliable, fast operational structure that could scale consistently under high concurrent demand. Although the architecture delivered on its promise of scalability, initial measurements showed that processing times needed optimization: the end-to-end time for each image generation request was an unacceptably high 518 seconds (approximately 8.5 minutes). This latency was well above the targets for user experience and put the application’s potential for global adoption at risk.
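The SQS-driven autoscaling described above is typically wired up with a KEDA ScaledObject; a minimal sketch, with the queue URL, resource names, and threshold as hypothetical placeholders rather than HubX’s actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: image-gen-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: image-gen-worker        # hypothetical Deployment running the model
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/image-gen-requests
        queueLength: "5"          # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: keda-aws-auth       # hypothetical TriggerAuthentication
```

KEDA scales the worker deployment in proportion to queue depth, which is what makes every second of node and image startup time directly visible to end users.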
The root causes stemmed from two major bottlenecks:
- Node Warm-up Time: Because standard NVIDIA AMIs were used, it took 3–4 minutes for the GPU nodes (g6e.xlarge and g6e.2xlarge instances) to become ready.
- Large Image and Model Sizes: Every new pod pulled the 15 GB container image and downloaded the 40 GB AI model, pushing the pull time alone past 8 minutes.
These delays meant long waits before capacity became available, which risked both operational efficiency and unnecessary cost from resources sitting idle before they could serve traffic. To address these challenges, the HubX and Onkatec teams began the project by defining scenarios for a comprehensive optimization effort.
Partner Solution
The DevOps and ML engineering teams at HubX, in collaboration with Onkatec, an AWS Advanced Partner, implemented a comprehensive three-phase optimization strategy to resolve this critical latency issue. The main goal was to optimize every layer of the architecture, creating a fast and scalable production pipeline for high-performance ML workloads.
Key AWS Services: Amazon EKS, Bottlerocket AMIs, and Amazon EFS (Elastic Throughput).
Phase 1: Node Warm-up Time Optimization
To eliminate the high initial latency, the first phase focused on the operating system layer. The teams switched the EKS nodes from standard NVIDIA AMIs to Bottlerocket AMIs, a container-focused, minimal Linux distribution whose minimalist design and atomic updates significantly reduce boot times. In addition, a Warm Pool strategy implemented with KEDA and Kubernetes PriorityClasses reduced the cold-start wait from 3–4 minutes to just 10 seconds.
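A warm pool of this kind is commonly built from a low-priority PriorityClass and a placeholder deployment that holds GPU capacity until real workloads preempt it; a sketch with hypothetical names and replica counts, not HubX’s actual manifests:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                          # below the default of 0, so real pods preempt these
globalDefault: false
description: "Placeholder pods evicted as soon as real workloads arrive"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-warm-pool              # hypothetical name
spec:
  replicas: 2                      # number of pre-warmed GPU nodes to hold
  selector:
    matchLabels:
      app: gpu-warm-pool
  template:
    metadata:
      labels:
        app: gpu-warm-pool
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              nvidia.com/gpu: "1"  # reserves a GPU so the node stays warm
```

The pause pods keep nodes booted and registered; when a real image generation pod arrives, the scheduler evicts a placeholder and the workload starts on an already-warm node.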
Phase 2: Image and Model Size Optimization
The second phase began with shrinking the container image itself. To eliminate the 8-minute delay caused by pulling the 15 GB container image and the 40 GB AI model for every new pod, two main steps were taken:
- Model Migration to EFS: The 40 GB AI model was removed from the container image entirely and is now loaded from Amazon EFS, a scalable, low-latency shared storage service. This alone cut the container image from 15 GB to 8 GB. EFS’s Elastic Throughput mode was used to strike the optimal performance/cost balance for model storage.
- Multistage Dockerfile and Image Cleanup: Following the technical know-how and recommendations of AWS Partner Onkatec, a multistage Dockerfile was adopted to drop build-time dependencies from the final image. After this meticulous cleanup and restructuring, the production image shrank to just 3.6 GB.
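The EFS-backed model store described above can be expressed as a static PersistentVolume and claim for the EFS CSI driver; a minimal sketch, with the file system ID, names, and capacity as hypothetical placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-store
spec:
  capacity:
    storage: 50Gi                      # nominal; EFS capacity is elastic
  accessModes: [ReadOnlyMany]          # many pods read the same model
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0 # hypothetical EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store
spec:
  accessModes: [ReadOnlyMany]
  storageClassName: efs-sc
  resources:
    requests:
      storage: 50Gi
```

Worker pods then mount the claim (for example at /models) and load the 40 GB model from shared storage instead of baking it into the image.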
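A multistage build of the kind described in the second step might look like the following sketch; the base images, file names, and entrypoint are assumptions for illustration, not HubX’s actual build:

```dockerfile
# Build stage: compilers and build-time dependencies stay here
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Runtime stage: only the runtime CUDA libraries and installed packages survive
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
COPY --from=build /install /usr/local
COPY app/ /app/
WORKDIR /app
CMD ["python3", "serve.py"]
```

Because only the final stage is shipped, compilers, headers, and pip caches from the build stage never reach the production image, which is how the image drops from 8 GB toward the 3.6 GB final size.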
Phase 3: Image Prefetching with Bottlerocket Data Volume
The most critical improvement leveraged Bottlerocket’s split-disk layout: a read-only OS volume and a writable data volume. A base image was pulled once on a reference node, a snapshot was taken of the EBS data volume containing that image, and new node groups were then launched from the snapshot with the image preloaded. This cut the container image pull time, already down to 2 minutes 45 seconds after the Phase 2 efforts, to just 8 seconds, making the pull process nearly instantaneous.
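Pointing new nodes at the snapshot is done through the launch template’s block device mappings; Bottlerocket’s data volume is attached at /dev/xvdb by default. A launch-template fragment, with the snapshot ID and volume size as hypothetical placeholders:

```json
{
  "LaunchTemplateData": {
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvdb",
        "Ebs": {
          "SnapshotId": "snap-0123456789abcdef0",
          "VolumeSize": 80,
          "VolumeType": "gp3",
          "DeleteOnTermination": true
        }
      }
    ]
  }
}
```

Nodes launched from this template boot with the container image already present on the data volume, so the kubelet skips the registry pull almost entirely.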
Results and Benefits
The strategic optimizations resulted in a dramatic performance improvement for the AI workload and successfully addressed all of the initial challenges. The measured metrics demonstrate the success of the solution:
| Metric | Initial Performance | Optimized Performance | Improvement |
| --- | --- | --- | --- |
| End-to-End Processing Time | 518 seconds (~8.5 minutes) | 73.94 seconds | 85.7% reduction |
| Node Warm-up Time | 3–4 minutes | 10 seconds (with Warm Pool) | ~96% reduction |
| Image Pull Time | 8 minutes | 8 seconds | 98.3% reduction |
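The reduction figures follow directly from the before/after measurements; a quick sanity check (taking 4 minutes as the warm-up baseline and 8 minutes as the pull baseline):

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction from the initial to the optimized value."""
    return (before - after) / before * 100

# End-to-end: 518 s -> 73.94 s
print(round(reduction(518, 73.94), 1))  # 85.7
# Node warm-up: 4 min -> 10 s
print(round(reduction(240, 10)))        # 96
# Image pull: 8 min -> 8 s
print(round(reduction(480, 8), 1))      # 98.3
```

Each line reproduces the corresponding percentage in the table above.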
About the Partner
With a focus on Cloud DevOps and infrastructure optimization, Onkatec provides SLA-driven AWS solutions to businesses that have critical workloads. Onkatec provides extensive services in cloud migration, infrastructure as code (IaC), CI/CD pipelines, observability, and cloud-native development. The company has a team of certified professionals and demonstrated AWS expertise.
Onkatec’s AWS-validated skills deliver high-performing, scalable, and secure cloud platforms tailored to each client’s demands. The company helps businesses reduce cloud complexity and accelerate digital transformation through proactive support, expert-led automation, and architectural best practices. Onkatec has consistently delivered measurable business results, including improved uptime, a 40% reduction in IT operating expenses, and faster time to market for application deployments.