Nvidia Jetson Xavier NX review: Redefining GPU accelerated machine learning
Nvidia launched the Jetson Xavier NX embedded System-on-Module (SoM) at the end of last year. It is pin-compatible with the Jetson Nano SoM and includes a CPU, a GPU, PMICs, DRAM, and flash storage. However, it was missing an important accessory: its own development kit. Since a SoM is an embedded board with just a row of connector pins, it is hard to use out of the box. A development board connects all the pins on the module to ports like HDMI, Ethernet, and USB. A Jetson module combined with a development board looks similar to a Raspberry Pi or other Single Board Computers (SBCs). But don’t be fooled: this is no low-end, low-performance device.
Like the Jetson Nano, the Jetson Xavier NX developer kit is a machine learning platform; unlike the Jetson Nano, it isn’t an entry-level device. The Xavier is designed for applications that need some serious AI processing power.
On the SoM itself you get a hexa-core CPU using Nvidia’s custom Carmel ARM-based cores, a 384-core Volta-based GPU, and 8GB of LPDDR4x RAM at 51.2GB/s. The development board adds HDMI, DisplayPort, Gigabit Ethernet, 4x USB 3.1 ports, Wi-Fi, Bluetooth, 2x camera connectors, 40 GPIO pins, and an M.2 slot for an SSD!
The 8GB of RAM and support for M.2 NVMe make this a significant upgrade over the Jetson Nano, but the real upgrade is in the processing power. Compared to the Jetson Nano, the Xavier NX is anywhere between two and seven times faster, depending on the application.
This is due to the improved CPU, a hexa-core Nvidia Carmel (ARM v8.2 64-bit with 6MB L2 + 4MB L3 caches) upgraded from the quad-core Cortex-A57; a better GPU, with 384 Volta cores compared to 128 Maxwell cores; plus the inclusion of 48 Tensor Cores and two Deep Learning Accelerator (DLA) engines.
Read more: Artificial Intelligence vs Machine Learning: what’s the difference?
Nvidia’s Jetson modules are primarily designed for embedded applications, meaning the SoM will be embedded into a specific product. Anything from robots, drones, machine vision systems, high-resolution sensor arrays, video analytics, and autonomous machines can benefit from the machine learning performance, small form factor, and lower power requirements of the Xavier NX.
Nvidia’s primary aim is to sell the SoMs to device manufacturers. However, the development kit is essential for product design and development, and for anyone who wants to try advanced machine learning at home.
Performance and form factor are essential for embedded projects, but so is power usage. The Jetson Xavier NX delivers up to 21 Trillion Operations Per Second (TOPS) while using no more than 15 watts of power. When needed, the board can be switched into a 10W mode. Both power modes can be tweaked depending on how much CPU performance you need relative to GPU performance. For example, you could run just two CPU cores at 1.9GHz and the GPU at 1.1GHz, or alternatively you could use four CPU cores at 1.2GHz and clock the GPU at 800MHz. The level of control is exceptional.
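If you want to change modes from a script, JetPack’s nvpmodel utility can be driven programmatically. The sketch below is a minimal Python wrapper around it; the mode ID is a placeholder, since the numbering comes from the board’s nvpmodel configuration, so check the output of `nvpmodel -q` on your own device first.

```python
import subprocess

def current_power_mode() -> str:
    # "nvpmodel -q" reports the active power mode on JetPack systems
    return subprocess.run(["nvpmodel", "-q"], capture_output=True,
                          text=True, check=True).stdout.strip()

def set_power_mode(mode_id: int) -> None:
    # Mode IDs map to entries in the board's nvpmodel configuration and
    # differ between Jetson models, so treat the value as a placeholder.
    subprocess.run(["sudo", "nvpmodel", "-m", str(mode_id)], check=True)

if __name__ == "__main__":
    print(current_power_mode())
    set_power_mode(0)   # hypothetical mode ID; verify against your own board
```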
Tell me about the GPU
When you think of Nvidia you probably think about graphics cards and GPUs, and rightly so. While Graphics Processing Units are great for 3D gaming, it turns out that they are also good at running machine learning algorithms. Nvidia has a whole software ecosystem based around CUDA, its parallel computing platform and programming model. The CUDA toolkit gives you everything you need to develop GPU-accelerated applications and includes GPU-accelerated libraries, a compiler, development tools, and the CUDA runtime.
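To give a flavour of what GPU-accelerated code looks like, here is a minimal sketch using Numba (one of the tools I tried on the board, mentioned later in this review) to compile a small CUDA kernel in Python and launch it on the Jetson’s GPU. It is a hand-rolled vector add for illustration, not one of Nvidia’s CUDA samples.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    # Each CUDA thread handles one element of the arrays
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.arange(n, dtype=np.float32)
b = 2.0 * a
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # Numba handles the host/device copies

assert np.allclose(out, a + b)
```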
The Jetson Xavier NX has a 384-core GPU based on the Volta architecture. Each generation of GPU from Nvidia is based on a new microarchitecture design. This central design is then used to create different GPUs (with different core counts, and so on) for that generation. The Volta architecture is aimed at the data center and at AI applications. It can be found in PC graphics cards like the Nvidia Titan V.
The potential for fast and smooth 3D games, like those based on the various 3D engines released as open source by id Software, is good. I was able to build Doom 3 for the Xavier NX and run it at 4K! At Ultra High Quality the board managed 41 fps. Not bad for 15 watts!
Nvidia has a universal software offering that covers all of its Jetson boards, including the Jetson Nano and the Jetson Xavier NX, called JetPack. It is based on Ubuntu Linux and comes pre-installed with the CUDA toolkit and other relevant GPU-accelerated development packages like TensorRT and DeepStream. There is also a large collection of CUDA demos, from smoke particle simulations to Mandelbrot rendering, with a healthy dose of Gaussian blurs, JPEG encoding, and fog simulations along the way.
Read more: Jetson Nano review: Is it AI for the masses?
Make my machine learn
Having a good GPU for CUDA-based computations and for gaming is nice, but the real power of the Jetson Xavier NX comes when you start using it for machine learning (or AI, as the marketing people like to call it). The Jetson Xavier NX supports all the popular AI frameworks including TensorFlow, PyTorch, MXNet, Keras, and Caffe.
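Before diving into any of the frameworks, it is worth a quick sanity check that the framework can actually see the GPU. Here is a minimal PyTorch snippet, assuming you have installed Nvidia’s PyTorch build for Jetson:

```python
import torch

print(torch.cuda.is_available())        # should print True on a working JetPack + PyTorch install
print(torch.cuda.get_device_name(0))    # reports the integrated Volta GPU

# Run a small matrix multiply on the GPU to confirm everything works end to end
x = torch.rand(1024, 1024, device="cuda")
y = x @ x
print(float(y.sum()))
```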
All of Nvidia’s Jetson boards come with excellent documentation and example projects. Because they all use the same ecosystem and software (JetPack and so on), the examples work equally well on the Jetson Nano and on the Jetson Xavier NX. A great place to start is the Hello AI World example. It is simple to download and compile, and in just a few minutes, you will have an AI demo up and running for image classification, object detection, and semantic segmentation, all using pre-trained models.
Why pre-trained? The hardest part about machine learning is getting to the point where you can present data to a model and get a result. Before that the model needs training, and training AI models is not a trivial effort. To help, Nvidia provides pre-trained models as well as a Transfer Learning ToolKit (TLT) which allows developers to take the pre-trained models and retrain them with their own data.
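Nvidia’s TLT has its own workflow, but the underlying idea of transfer learning is easy to sketch in a few lines of PyTorch: take a network pre-trained on ImageNet, freeze its weights, and retrain only a new final layer on your own classes. This is a generic illustration rather than the TLT itself, and the DataLoader is a placeholder for your own data.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pre-trained on ImageNet
model = models.resnet18(pretrained=True)

# Freeze the existing weights so only the new layer learns
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with one sized for your own classes (e.g. 3)
num_classes = 3
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# my_dataloader is a placeholder: any DataLoader yielding (image, label) batches
# for images, labels in my_dataloader:
#     optimizer.zero_grad()
#     loss = loss_fn(model(images), labels)
#     loss.backward()
#     optimizer.step()
```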
The Hello AI World demo gives you a set of tools to play around with, including an image classifier and an object detection program. These tools can either process photos or use a live camera feed. I fished out a picture of a jellyfish (pun intended) from my visit to the Monterey Bay Aquarium in 2018 and asked the image classifier to label it.
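The same classification step can be driven from Python using the jetson-inference bindings that ship with the Hello AI World project. The sketch below is only indicative, as module and function names can differ between releases of the project, and the image path is a placeholder.

```python
import jetson.inference
import jetson.utils

# Load a pre-trained image classification network (GoogLeNet trained on ImageNet)
net = jetson.inference.imageNet("googlenet")

# Load the photo into GPU memory and classify it
img = jetson.utils.loadImage("jellyfish.jpg")    # placeholder path
class_id, confidence = net.Classify(img)

print(net.GetClassDesc(class_id), f"{confidence * 100:.1f}%")
```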
But this is just the tip of the iceberg. To demonstrate the power of the Xavier NX board, Nvidia has a setup which shows the Xavier NX performing parallel machine learning tasks including gaze detection, pose detection, voice detection, and people detection, all at the same time from video feeds. A service robot in a retail environment would need all of these functions so it can tell when a person is looking at it (gaze detection), what the person is saying (voice detection), and where the person is pointing (pose detection).
The cloud has gone native
One of the core technologies of “the cloud” is containerization: the ability to run self-contained microservices in a pre-defined environment. However, this concept isn’t limited to huge servers in a data center; it can also be applied to smaller devices. Container software like Docker runs on Arm-based systems, including the Raspberry Pi and the Xavier NX. The machine learning demo above is actually four separate containers running in parallel on the development board.
This means that developers can move away from monolithic firmware images that bundle the base operating system with the embedded applications, and embrace microservices and containers. Because a self-contained service can be developed without having to upgrade and update all the other applications, software updates become easier and the options for scaling increase.
The Xavier NX fully supports Docker and the containers have full access to the machine learning capabilities of the board including the GPU, the tensor cores, and the DLA engines.
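For example, a GPU-enabled container can be launched programmatically with the Docker SDK for Python by selecting the Nvidia container runtime. The image name and command below are placeholders, not a real service:

```python
import docker

client = docker.from_env()

# runtime="nvidia" selects the Nvidia container runtime, which is what exposes
# the GPU, tensor cores, and DLA engines to the container.
logs = client.containers.run(
    image="my-jetson-ml-service:latest",   # hypothetical image built for the Xavier NX
    command="python3 detect.py",            # hypothetical entry point
    runtime="nvidia",
    remove=True,
)
print(logs.decode())
```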
How fast is the Nvidia Jetson Xavier NX?
For those interested in some actual performance numbers: using my “threadtesttool” (here on GitHub) with eight threads, each calculating the first 12,500,000 primes, the Jetson Xavier NX was able to perform the test in 15 seconds. This compares to 46 seconds on the Jetson Nano and 92 seconds on a Raspberry Pi 4.
The tool can also test single-core performance by asking it to use just one thread. That takes 10 seconds on the Jetson Xavier NX and 46 seconds on the Raspberry Pi 4. If you set the Xavier NX into its 2-core 15W mode, where the CPU clock speeds are higher, then performing the same test takes only seven seconds!
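My tool is a standalone program, but the shape of the test is easy to reproduce. Below is a rough Python analogue (not the actual threadtesttool, and with a much smaller prime count, since interpreted Python is far slower than native code) that spreads the same prime-counting workload across worker processes:

```python
import time
from multiprocessing import Pool

def first_n_primes(n: int) -> int:
    # Naive trial division, purely to generate CPU load
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes[-1]

if __name__ == "__main__":
    workers = 8        # one job per CPU thread, as in the original test
    n = 20_000         # far smaller than 12,500,000, purely for illustration
    start = time.time()
    with Pool(workers) as pool:
        pool.map(first_n_primes, [n] * workers)
    print(f"{workers} workers x first {n} primes: {time.time() - start:.1f}s")
```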
Here are some CUDA performance numbers comparing the Jetson Nano with the Jetson Xavier NX:
| Benchmark | Jetson Nano | Jetson Xavier NX |
|---|---|---|
| convolutionFFT2D (in secs) | 15.1 | 8.4 |
| fastWalshTransform (in secs) | 12.2 | 3.5 |
| matrixMul (in GFlop/s) | 30.2 | 215.25 |
| sortingNetworks | 21.2 | 5.0 |
Even a cursory look at these numbers shows how much faster the Xavier NX is compared to the Nano.
Any good for doing development work?
As an Arm development environment, the Jetson Xavier NX is excellent. You get access to all the standard programming languages like C, C++, Python, Java, JavaScript, Go, and Rust. Plus there are all the Nvidia libraries and SDKs, like CUDA, cuDNN, and TensorRT. You can even install IDEs like Microsoft’s Visual Studio Code!
As I mentioned earlier, I was able to grab the source for the Doom 3 engine and build the game quite easily. Plus I was able to try different machine learning tools like PyTorch and Numba. When you factor in the support for a 4K display, 8GB of RAM, and access to NVMe storage, the Xavier NX development board is a joy to use.
Is the Nvidia Jetson Xavier NX the right board for you?
If you are just starting out with machine learning, then the Xavier NX probably isn’t the right option for your first investment. You can learn the basics of ML and AI on just about anything, including a Raspberry Pi. If you want to benefit from some hardware-based acceleration, then the Jetson Nano is highly recommended.
But if you have outgrown the Jetson Nano, or you are looking to build a professional product that requires greater processing power, then the Xavier NX is a must. Also, if you are just looking for a decent Arm-based development machine, for remote builds or as a desktop, then the Xavier NX is a potential winner.
The bottom line is this: if the Raspberry Pi 4 is good enough for you, stick with it. If you want better overall performance, hardware-accelerated machine learning, and a way into the Jetson ecosystem, then get a Jetson Nano. If you need more than that, then get an Xavier NX development kit.