Infrastructure Engineer (Kernel API)
Location: San Jose, CA; onsite from day one (4 days)
Job description:
Problem Solving and Deep-Level Troubleshooting: Investigating and troubleshooting problems and hardware faults within our GPU platforms that our automation can’t diagnose. This will involve taking data from system logs, kernel logs, and BMC Redfish APIs and, if the data is not there, working with hardware and kernel engineers to add the information you need to make accurate determinations.
Coordination and Collaboration: Working closely with our Data Centre Operations, Hardware Engineering and Capacity Planning teams to repair and remediate failed hardware, ensure consistent delivery of new hardware to customers, and roll out new upgrades across the fleet.
Automation and Tool Development: Automate routine processes and build hardware diagnostics, provisioning and repair tooling
Build Processes and Documentation: When you figure out the best way to do something, you’ll be working on building processes, documentation and tooling to help the next person who finds this problem
Validate and Test New Hardware: Crusoe is often the first company in the world to get the latest generation of AI hardware, before it’s fully tested. You’ll conduct rigorous testing and validation on this cutting-edge hardware and on servers that come back from repair.
On-Call: Participate in our on-call rotation, partnering with our US teams to provide follow-the-sun coverage
What You’ll Bring to the Team
Strong analytical, troubleshooting and problem-solving skills: Our automation takes care of the easy problems; you’ll be digging deep to figure out the hard ones
Linux experience: You’ll have a solid understanding of Linux internals and feel at home working in a terminal
Server Hardware and Provisioning: Exposure to server-class hardware & provisioning
Fundamentals of Hardware and Networking: You don’t need to be an expert, but you should know if an error message is due to a failed hardware component, a firmware bug, or a networking misconfiguration without escalating
Excellent communication and collaboration skills: You’ll be working with many different people across a lot of different teams - communication is critical
Education: Bachelor's Degree in Computer Science, related field, or self-educated in computer science fundamentals.
Bonus Points
Large-scale GPU operations: We work with cutting-edge hardware and software, so we understand most people won’t have worked with it, but it would be nice if you have!
Programming Proficiency: Proficiency with at least one programming language (Python, Go, or similar)