Annual Information Service

The VPU Report: Visual Processors and CNN - The Next Generation Supercomputer

Published by: Jon Peddie Research
Product code: 519802
Published: annual subscription
Description

JPR's quarterly The VPU Report is a direct, detailed analysis of vision processors, offered as SoCs or as IP, from four of the leading companies in the field. Future quarterly issues will present up-to-the-minute information on VPUs from a different selection of companies. The current issue analyzes Movidius, Intel, Ceva and Inuitive, four of the most influential companies operating in the consumer edge device category. The report includes a summary of the companies reviewed (useful when so many entrants into this field are startups) and commentary that sheds light on the potential and pitfalls of VPU design.

Table of Contents

  • Executive summary
  • Introduction
    • STUDY GOALS AND OBJECTIVES
    • SCOPE OF REPORT
    • INTENDED AUDIENCE
    • Methodology
      • Information Sources
      • Primary research for this report
      • Secondary research for this report
  • Verisilicon
    • Executive summary analysis and overview
    • Technology Overview
      • Vivante IP in NXP i.MX6 Silicon
      • GPU-compute core overview
        • GC2000 to GC7000 evolution
        • Configurations in production or pre-production
      • VIP Series Vision and Image Processors
        • VIP Series Software Environment
        • Block Structure of VIP Engine
        • Programmable Engine
        • Neural Network Engine
        • Tensor Processing Fabric
        • Hardware Accelerators
        • Arithmetic throughput summary
      • Example Use Cases: Always-on Camera and Faster RCNN
      • Variants in Production
      • The Acuity Vision SDK
      • Roadmap
      • Market Status and ecosystem
      • Analysis
  • Cadence
    • Exec Summary
    • Technology Overview
      • Tensilica Processors
      • Vision Processors
      • IVP-32
      • IVP-EP
      • Vision P5
        • Super Gather vector assembly hardware
        • Multi-processor operation
      • Vision P6
        • Improved data formatting and Super Gather in P6
      • Software
      • Performance
        • Resource Summary of Tensilica Vision Processors
        • CNN performance of P5
    • Market Status
      • Ecosystem Support
      • Roadmap
      • Analysis
  • Wave Computing
    • Executive summary
    • Technology Overview
      • Wave Dataflow Processor Architecture
        • The Processing Element
        • The Cluster
        • The compute machine
        • DPU top level
        • Power management
        • The DPU board
        • The Machine Learning node
      • Software
        • Programming the CGRA Architecture
      • Performance
      • Analysis
    • Google Tensor Processing Unit
    • Summary
    • Definitions & methodology
    • SAM, TAM and PAM
    • Appendix
      • Glossary
    • Index

Table of Figures

  • Figure 1: Vivante IP integrated into NXP i.MX6 (source: NXP)
  • Figure 2: GC2000 Programmable compute core block diagram (Source: Verisilicon)
  • Figure 3: GC6x and GC7x ‘Vega’ programmable compute core block diagram (source: Verisilicon)
  • Figure 4: Verisilicon vision processing software environment (source: Verisilicon)
  • Figure 5: Verisilicon's VIP8xxx vision processing architecture block diagram (source: Verisilicon)
  • Figure 6: Always-on camera use case
  • Figure 7: Faster RCNN use case illustration
  • Figure 8: Acuity tools for CNN optimization and retraining (source: Verisilicon)
  • Figure 9: Tensilica IVP-32 block diagram (source: Cadence)
  • Figure 10: IVP-EP performance relative to IVP32 (source: Cadence)
  • Figure 11: P5 block diagram (source: Cadence)
  • Figure 12: Scatter gather mechanism on Vision P5 (source: Cadence)
  • Figure 13: Performance of P5 vs IVP-EP over common algorithms (source: Cadence)
  • Figure 14: P5 Multicore operation (source: Cadence)
  • Figure 15: Multicore operation control flow (source: Cadence)
  • Figure 16: Performance increase of P6 with Super Gather vs without
  • Figure 17: DPU board block diagram (source: Wave)
  • Figure 18: Wave Processing Element block diagram (source: Wave)
  • Figure 19: Cluster block diagram (source: Wave)
  • Figure 20: Cluster Word and Byte switches (source: Wave)
  • Figure 21: Wave compute machine block diagram (source: Wave)
  • Figure 22: Wave Computing's 32 x 32 array (Source: Wave)
  • Figure 23: Sleep/wake process (source: Wave)
  • Figure 24: The DPU board (source: Wave)
  • Figure 25: Process of translating Tensorflow output to Wave executables (source: Wave)
  • Figure 26: Mapping of Tensorflow session onto Wave hardware (source: Wave)
  • Figure 27: Wave software stack (source: Wave)
  • Figure 28: TPU block diagram (source: Google)

Table of Tables

  • Table 1: Table of GPU cores (source: Verisilicon)
  • Table 2: GC7000 series cores (source: Verisilicon)
  • Table 3: VIP 8 series Arithmetic throughput summary (Source: Verisilicon)
  • Table 4: VIP8000 Product variants matrix (Source: Verisilicon)
  • Table 5: IVP-32 Summary (source: Cadence)
  • Table 6: IVP-EP Summary (source: Cadence)
  • Table 7: Vision P5 summary (source: Cadence)
  • Table 8: P6 summary (source: Cadence)
  • Table 9: Summary of P6 vector arithmetic throughput (source: Cadence)
  • Table 10: Peak memory bandwidth of P6 vs access type (source: Cadence)
  • Table 11: Comparison of MAC resources of Tensilica Vision Processor IP cores (source: Cadence)
  • Table 12: Ecosystem partners (source: Cadence)
  • Table 13: Per-cycle and total 8-bit OPS per cluster (source: Wave)
  • Table 14: DPU Silicon summary (source: Wave)
  • Table 15: Summary of OPS at Full clock rate (source: Wave)
  • Table 16: Compute and memory resources of the Machine Learning node (source: Wave)
  • Table 17: Performance comparison on Word2Vec (source: Wave)
  • Table 18: Power benchmarks vs Haswell CPU and K80 GPU (source: Google)
  • Table 19: Performance limiting factors vs network type (source: Google)
  • Table 20: GOPS comparison between VIP nano and P6