Powered by the NVIDIA Pascal architecture, the Tesla P4 is a small form-factor, 50/75 W graphics card designed to boost the efficiency of scale-out servers running deep learning workloads, enabling smart, responsive AI-based services. It cuts inference latency by up to 15x in hyperscale infrastructure while boosting energy efficiency. A dedicated hardware decode engine can transcode and run inference on up to 35 HD video streams in real time. Additionally, the P4 uses a passive cooler for increased reliability and reduced power consumption.
Low-profile, plug-in card form factor
Enhanced programmability with page migration engine
Server-optimized for data center deployment
ECC protection
Responsive Experience with Real-Time Inference
The Tesla P4 delivers 22 TOPS of inference performance with INT8 operations, slashing latency by up to 15x.
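To make the INT8 idea concrete, here is a minimal, illustrative Python sketch of symmetric linear quantization, the kind of reduced-precision mapping those TOPS figures rely on. The max-abs scaling rule and the toy array are assumptions for demonstration only; TensorRT selects scales through calibration, not this rule.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map FP32 values onto the signed 8-bit range [-127, 127]."""
    scale = np.abs(x).max() / 127.0                      # illustrative max-abs rule
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(8).astype(np.float32)
q, s = quantize_int8(x)
print(x)
print(dequantize(q, s))   # close to x, at a quarter of the memory footprint
```

The INT8 codes carry roughly the same information for inference purposes while quartering memory traffic, which is where much of the latency and efficiency gain comes from.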
Efficiency for Low-Power Scale-Out Servers
The Tesla P4's small form factor and 50/75 W power footprint accelerate density-optimized, scale-out servers. It also delivers 60x better energy efficiency than CPUs on deep learning inference workloads, letting customers meet the growing demand for AI applications.
Unlock AI-Based Video Services with a Dedicated Decode Engine
The Tesla P4 can transcode and run inference on up to 35 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the GPU cores performing inference, as sketched below.
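As a rough illustration of exercising that decode engine, the snippet below shells out to FFmpeg's CUDA hardware-decode path, which keeps the CUDA cores free for inference. The filename is a placeholder, and an FFmpeg build with NVDEC support is assumed; this is not part of any NVIDIA SDK.

```python
import subprocess

# Decode one H.264 stream on the GPU's dedicated decode engine via FFmpeg's
# CUDA hwaccel, discarding the output ("-f null -") to benchmark decode alone.
# "stream.mp4" is a hypothetical input file.
subprocess.run(
    ["ffmpeg", "-hwaccel", "cuda", "-i", "stream.mp4", "-f", "null", "-"],
    check=True,
)
```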
Faster Deployment with TensorRT and DeepStream SDK
TensorRT is a library for optimizing deep learning models for production deployment. It takes trained neural networks, typically stored as 32- or 16-bit floating-point data, and optimizes them for reduced-precision INT8 operations. The NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
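Below is a minimal sketch of building an INT8 engine with the TensorRT Python API. The ONNX filename and the calibrator are placeholders, and the calls shown come from newer TensorRT releases than the P4-era toolkit, so exact names may differ on older versions.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse a trained FP32 model, assumed here to be exported to ONNX as "model.onnx".
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

# Request reduced-precision INT8 kernels. A calibrator (an IInt8EntropyCalibrator2
# implementation fed with representative inputs, omitted here) would supply the
# per-tensor dynamic ranges.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = MyCalibrator(...)  # hypothetical calibrator class

# Serialize the optimized engine for deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("model_int8.engine", "wb") as f:
    f.write(engine_bytes)
```

The serialized engine can then be loaded at inference time with a TensorRT runtime, so the expensive optimization step happens once, offline, rather than on every server start.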