AI/ML Computing Cluster Engineer

SK hynix America
San Jose, CA

Job Title: AI/ML Computing Cluster Engineer
Office Location: San Jose, CA
Work Model: Onsite

About SK hynix America
At SK hynix America, we're at the forefront of semiconductor innovation, developing advanced memory solutions that power everything from smartphones to data centers. As a global leader in DRAM and NAND flash technologies, we drive advances in mobile technology, empower cloud computing, and pioneer future technologies. Our cutting-edge memory technologies are essential in today's most advanced electronic devices and IT infrastructure, enabling better performance and user experiences across the digital landscape.
We're looking for innovative minds to join our mission of shaping the future of technology. At SK hynix America, you'll be part of a team that's pioneering breakthrough memory solutions while maintaining a strong commitment to sustainability. We're not just adapting to technological change; we're driving it, with significant investments in artificial intelligence, machine learning, and eco-friendly solutions and operational practices. As we continue to expand our market presence and push the boundaries of what's possible in semiconductor technology, we invite you to join our journey toward creating the next generation of memory solutions that will define the future of computing.

Job Overview:

As the AI/ML Computing Cluster Engineer, you will work on the development and operation of high-performance computing clusters that support AI/ML workloads. You will be responsible for the development, implementation, operation, and optimization of AI data center IT environments to ensure scalability, performance, reliability, and cost-effectiveness. This role requires collaboration with cross-functional teams to align computing infrastructure with the organization's strategic direction.

Responsibilities:

Computing Cluster Infrastructure Development

  • Design and implement distributed computing cluster infrastructure to support large-scale AI/ML model training and inference jobs, with a focus on transformer-based AI models.

  • Build and maintain distributed systems to ensure scalability, efficient resource allocation, and high throughput.

  • Optimize cluster performance through hardware selection, equipment configuration, network engineering, and performance analysis.

  • Deploy and operate data center networking infrastructure, using software systems for automation, design validation, deployment, and operational support.

  • Implement tools and processes to maintain high uptime and ensure infrastructure reliability during both model training and inference phases.

  • Identify and resolve performance bottlenecks, improving overall system throughput and response times.

Team Leadership & Collaboration

  • Collaborate with cross-functional teams, including research, security, and benchmark test engineering teams, to integrate infrastructure with AI workflows and ensure seamless deployment and operation.

  • Engage with technology vendors and partners to evaluate new solutions that drive innovation in AI computing infrastructure.

Qualifications:

  • Master’s degree or above in Computer Science, Electrical Engineering, or a related field.

  • 2+ years of experience in AI cluster engineering, MLOps, and benchmark testing, including GPU performance analysis, memory usage analysis, and energy/power monitoring tools.

  • Strong familiarity with AI computing architecture, AI/ML infrastructure requirements, memory architecture and its usage in AI/ML, and AI algorithm trends and best practices.

  • Expertise in optimizing resource utilization, improving system throughput, and reducing latency in both training and inference.

Equal Employment Opportunity:

SKHYA is an Equal Employment Opportunity Employer. We provide equal employment opportunities to all qualified applicants and employees without regard to race, sex, pregnancy, sexual orientation, religion, age, gender identity, national origin, color, protected veteran or disability status, genetic information, or any other status protected under applicable federal, state, or local laws, and we prohibit discrimination and harassment of any type.

Compensation:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. Pay within the provided range varies by work location and may also depend on job-related skills and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

Pay Range

$100,000 - $150,000 USD

Posted 2025-10-25
