

# TH1520 Video Image Processing User Manual

 Revision
 1.0.0

 Security
 Secret

 Date
 2023-08-26



#### Copyright © 2022 T-HEAD (Shanghai) Semiconductor Co., Ltd. All rights reserved.

This document is the property of T-HEAD (Shanghai) Semiconductor Co., Ltd. This document may only be distributed to: (i) a T-HEAD party having a legitimate business need for the information contained herein, or (ii) a non-T-HEAD party having a legitimate business need for the information contained herein. No license, expressed or implied, under any patent, copyright or trade secret right is granted or implied by the conveyance of this document. No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, translated into any language or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual, or otherwise without the prior written permission of T-HEAD (Shanghai) Semiconductor Co., Ltd.

#### **Trademarks and Permissions**

The T-HEAD Logo and all other trademarks indicated as such herein are trademarks of T-HEAD (Shanghai) Semiconductor Co., Ltd. All other products or service names are the property of their respective owners.

#### Notice

The purchased products, services and features are stipulated by the contract made between T-HEAD and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

#### 平头哥(上海)半导体技术有限公司 T-HEAD (Shanghai) Semiconductor Co., LTD

Address: 5th Floor Number 2 Chuan He Road 55, Number 366 Shang Ke Road, Shanghai free trade area, China

Website: www.t-head.cn

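#### Copyright © 2022 T-HEAD (Shanghai) Semiconductor Co., Ltd. All rights reserved.

Ownership of and intellectual property rights in this document belong to T-HEAD (Shanghai) Semiconductor Co., Ltd. and its affiliates (hereinafter "T-HEAD"). This document may only be distributed to: (i) T-HEAD employees who have a lawful employment relationship and need the information in this document, or (ii) partners outside T-HEAD who have a lawful cooperative relationship and need the information in this document. This document may not be used without the express consent of T-HEAD (Shanghai) Semiconductor Co., Ltd. Without the written permission of T-HEAD (Shanghai) Semiconductor Co., Ltd., no part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language or computer language.

#### Trademark Statement

The T-HEAD logo and all other trademarks are owned by T-HEAD (Shanghai) Semiconductor Co., Ltd. and its affiliates. Without the written consent of T-HEAD (Shanghai) Semiconductor Co., Ltd., no legal entity may use T-HEAD's trademarks or commercial marks.

#### Notice

The products, services, or features you purchase are subject to T-HEAD's commercial contracts and terms. All or part of the products, services, or features described in this document may not be within the scope of your purchase or use. Unless otherwise agreed in the contract, T-HEAD makes no express or implied statement or warranty regarding the content of this document.

Due to product version upgrades or other reasons, the content of this document is updated from time to time. Unless otherwise agreed, this document serves only as a usage guide, and none of the statements, information, or recommendations in it constitute any express or implied warranty. T-HEAD (Shanghai) Semiconductor Co., Ltd. assumes no legal liability for any loss arising from the use of this document by any third party.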

#### 平头哥(上海)半导体技术有限公司 T-HEAD (Shanghai) Semiconductor Co., LTD

- Address: 5th Floor, No. 2, Lane 55, Chuanhe Road, No. 366 Shangke Road, China (Shanghai) Pilot Free Trade Zone
- Website: www.t-head.cn



## **Revisions**

| Rev    | Description     | Author(s) | Date       |
|--------|-----------------|-----------|------------|
| V1.0.0 | Initial version | T-Head    | 2023-08-26 |



## Contents

- Revisions
- Contents
- Figures & Tables
- List of Abbreviations
- 1 GPU
  - 1.1 Overview
  - 1.2 Main Features
  - 1.3 Function Description
  - 1.4 Usage
- 2 G2D
  - 2.1 Overview
  - 2.2 Main Features
  - 2.3 Function Description
  - 2.4 Usage
- 3 DEWARP
  - 3.1 Overview
  - 3.2 Main Features
  - 3.3 Function Description



## **Figures & Tables**

- Figure & Table 1-1 High level visual format processing overview
- Figure & Table 1-2 Texture processing flow
- Figure & Table 1-3 Power up the core
- Figure & Table 2-1 G2D block function diagram
- Figure & Table 2-2 Supported formats
- Figure & Table 3-1 DEWARP block function diagram
- Figure & Table 3-2 DEWARP Input-output format summary
- Figure & Table 3-3 VSE relationship between input and output formats

## **List of Abbreviations**

| Abbreviations | Full Spelling           | Chinese Explanation |
|---------------|-------------------------|---------------------|
| BIF           | Bus Interface Module    | 总线接口模块              |
| FOV           | Field of View           | 视场角                 |
| LUT           | Look-Up-Table           | 查找表                 |
| PBE           | Pixel Backend           | 像素后端                |
| TPU           | Texture Processing Unit | 纹理处理单元              |
| USC           | Unified Shading Cluster | 统一的阴影集群             |

## 1 GPU

### 1.1 Overview


The IMG<sup>™</sup> B-Series BXM-4-64 GPU processes several different workload types, namely:

- 3D Graphics Workload, which involves processing vertex data and pixel data for rendering of 3D scenes.
- Compute Workload (GP-GPU), which involves general purpose data processing.

The BXM-4-64 architecture is fully OpenGL ES 3.2, OpenCL 1.2 EP1, Vulkan 1.2 and Android NN HAL compliant.

### 1.2 Main Features

The BXM-4-64 graphics processors are built around multi-threaded Unified Shading Clusters (USCs) which feature an ALU architecture with high SIMD efficiency, and support tile-based deferred rendering with concurrent processing of multiple tiles.

This BXM-4-64 GPU core has the following features:

- Base architecture, fully compliant with the following APIs:
  - OpenGL ES 3.2
  - OpenCL 1.2 EP2
  - Vulkan 1.2
  - Android NN HAL
- Tile-based deferred rendering architecture for 3D graphics workloads, with concurrent processing of multiple tiles
- Programmable high quality image anti-aliasing
- Fine grain triangle culling
- Support for DRM security
- Support for Imagination AI Synergy when paired with an Imagination Neural Network Accelerator (NNA) core
- Multi-threaded Unified Shading Cluster (USC) engine incorporating pixel shader, vertex shader and GP-GPU (compute shader) functionality
- USC incorporates an ALU architecture with high SIMD efficiency
- Fully virtualized memory addressing (up to 64GB address space), supporting unified memory architecture
- Fine-grained task switching, workload balancing and power management
- Advanced DMA driven operation for minimum host CPU interaction
- System Level Cache (SLC)
  - The size of the SLC is 64KB
- Specialised Texture Cache Unit (TCU)
- Compressed texture decoding





- Lossless data compression (PVRGC): PowerVR geometry compression, which is performed in the geometry processing phase of the 3D graphics workload
- Lossless and/or visually lossless low-area image compression: the Imagination frame buffer compression and decompression (TFBC) algorithm
- Dedicated processor for B-Series core firmware execution
  - Single-threaded firmware processor with a 2KB instruction cache and a 2KB data cache
- Separate power island for the firmware processor
- On-Chip Performance, Power and Statistics Registers

### 1.3 Function Description

The following figure illustrates at a high level how the GPU core reads texture formats into the TPU block and writes surface formats out from the PBE block. Other modules such as video and ISP cores can provide textures to the TPU. As with all suitable items in memory, surfaces can be read from memory into the TPU as textures.

Texture formats are also known as input, consumable, read or TPU formats.

Surface formats are also known as output, displayable, write or PBE formats.

Texture and surface formats are collectively known as visual formats.



Figure & Table 1-1 High level visual format processing overview





Figure & Table 1-2 Texture processing flow



### 1.4 Usage

1. Power up the core.

   The core's only requirement for the power-up procedure is that the reset to the power domain being powered up is asserted and then de-asserted as the last step of the procedure. The reset must be de-asserted only after the clocks to the domain have been enabled and running for 32 clock cycles. The following figure shows the power-up procedure of the core.

   Figure & Table 1-3 Power up the core

   Given this power-up sequence, software should first release the GPU clock gate enable and then, after 32 core clock cycles, release the GPU soft reset.

2. The soc\_timer counter can be reset, but it should not be reset once the GPU has first been initialized.
3. The QoS of the memory interface can be changed. It is strongly recommended that only MMU accesses (tag ID 0x0) be given a higher priority.
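The ordering constraint in step 1 can be sketched as a small software model. This is only an illustration under stated assumptions, not driver code: the class `PowerDomainModel`, its method names, and the idea of counting cycles in software are hypothetical and invented for this example. The real sequence is driven by SoC clock and reset controls that this manual does not list.

```python
# Minimum time the domain clock must run before reset is released
# (the 32-cycle figure comes from the power-up requirement above).
MIN_CLOCK_CYCLES_BEFORE_RESET_RELEASE = 32


class PowerDomainModel:
    """Toy model of a GPU power domain (hypothetical, for illustration only)."""

    def __init__(self):
        self.clock_enabled = False
        self.reset_asserted = True  # reset is held while the domain powers up
        self.cycles_since_clock_enable = 0

    def enable_clock(self):
        """Release the clock gate so the domain clock starts running."""
        self.clock_enabled = True
        self.cycles_since_clock_enable = 0

    def tick(self, cycles=1):
        """Advance time; cycles only count while the clock is running."""
        if self.clock_enabled:
            self.cycles_since_clock_enable += cycles

    def release_reset(self):
        """De-assert reset, which must be the last step of power-up."""
        if not self.clock_enabled:
            raise RuntimeError("clock must be enabled before releasing reset")
        if self.cycles_since_clock_enable < MIN_CLOCK_CYCLES_BEFORE_RESET_RELEASE:
            raise RuntimeError("clock has not been running for 32 cycles yet")
        self.reset_asserted = False


def power_up(domain):
    """Enable the clock, wait 32 cycles, then release reset."""
    domain.enable_clock()
    domain.tick(MIN_CLOCK_CYCLES_BEFORE_RESET_RELEASE)
    domain.release_reset()
    return domain
```

In this model, calling `release_reset()` before the clock has run for 32 cycles raises an error, which mirrors the requirement that reset release comes last.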

## 2 G2D

### 2.1 Overview

The G2D module is a high-performance multi-pipe 2D raster graphics core that accelerates 2D graphics display on a variety of consumer devices. Addressable screen sizes range from the smallest cell phones up to UltraHD 4K2K displays.

#### 2.1.1 Block Diagram


The block diagram of the G2D module is as follows:



Figure & Table 2-1 G2D block function diagram

### 2.2 Main Features

Main features of G2D:

- Bit Blit
- Stretch Blit
- Alpha blending, including Java 2 Porter-Duff compositing blending rules



- 32K x 32K coordinate system
- Rectangle fill and clear
- 90° / 180° / 270° / X-Flip / Y-Flip / XY-Flip / Mirror rotation
- Line draw
- ROP Support
- Clipping
- Dither
- Multi-source blending
  - Full support for multi-source blending with variable block size to improve BW and reduce SW overhead
  - Up to 8 sources are supported
  - Programmable block size guarantees cache efficiency so each source is read once and the blended destination is written once
  - Supports 90, 180, 270 degree rotation with different block size for higher cache efficiency
  - Supports independent source rotation with blending
  - Multi-source support for up to 4 video inputs
- YUV support
  - Source cache enhancements for input YUV420 to make all the requests 64 bytes. The 420 surface is packed to a 422 surface internally.
  - Supports 2-pixel-aligned surfaces for 8-bit YUV formats, 4-pixel-aligned surfaces for 10-bit YUV formats, and 1-pixel-aligned surfaces for RGB formats
  - Supports separate U and V strides in multi-plane YUV formats
  - Full multi-destination support for converting non-planar YUV formats to planar YUV. Used in extracting various components from the input color into different destination planes
  - YUV420 2-plane and YUV422 packed output with alpha blending supported
  - When the source is one of (UYVY/YUY2/YV12/NV12/NV16) and the destination is one of (UYVY/YUY2), the direct path can be enabled, disabling color conversion to save power
  - Supports BT2020 for YUV2RGB and RGB2YUV. BT709 optional for 8bit
  - Support for 10-bit per component formats for A2R10G10B10, YUV420 2-plane, and YUV422 2-plane
- Clock disabling
- AXI bus
  - Supports a programmable number of outstanding requests, up to a maximum of 32
  - ACE-Lite and coherency support
- Additional enhancements
  - Fully functional MMU with variable page size support
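Two of the rules in the list above, the direct-path condition and the per-format pixel alignment, can be written as simple predicates. The following is a minimal sketch, assuming formats are identified by the name strings used in this manual. The function names are hypothetical and not part of any G2D driver API, and exactly which surface property must be aligned (for example width or start position) is not specified here.

```python
# Format names as written in the feature list above.
DIRECT_PATH_SOURCES = {"UYVY", "YUY2", "YV12", "NV12", "NV16"}
DIRECT_PATH_DESTINATIONS = {"UYVY", "YUY2"}


def direct_path_allowed(source_format, destination_format):
    """True when the source/destination pair matches the direct-path rule."""
    return (source_format in DIRECT_PATH_SOURCES
            and destination_format in DIRECT_PATH_DESTINATIONS)


def required_pixel_alignment(is_yuv, bits_per_component):
    """Return the surface alignment granularity in pixels.

    RGB formats: 1 pixel; 8-bit YUV: 2 pixels; 10-bit YUV: 4 pixels.
    """
    if not is_yuv:
        return 1
    if bits_per_component == 8:
        return 2
    if bits_per_component == 10:
        return 4
    raise ValueError("only 8-bit and 10-bit YUV formats are described")
```

For example, `direct_path_allowed("NV12", "YUY2")` is true, while an NV12 destination does not qualify because only UYVY and YUY2 are listed as direct-path destinations.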

### 2.3 Function Description

#### 2.3.1 Stretch Blit

The stretch blit primitive performs a bit blit operation with stretch or shrink. The modified Bresenham algorithm is used to generate corresponding coordinates for fast stretching. The stretch factor is specified in a 16.0 fixed-point format. Stretch blits are not allowed to overlap, that is, no part of the source and destination can share any piece of memory. Non-stretch blits can overlap. For stretch blits, clipping is performed on a per-pixel basis.
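To illustrate the idea of generating stretch coordinates with integer arithmetic, the sketch below uses a generic Bresenham-style error accumulator in one dimension. It is not the hardware's modified algorithm, whose details are not given in this manual. It also includes a simple check for the no-overlap rule. Both function names are hypothetical.

```python
def stretch_source_indices(src_len, dst_len):
    """Map each destination pixel to a source pixel index.

    Uses integer error accumulation (Bresenham-style), so no floating
    point is needed. Works for both stretch and shrink.
    """
    if src_len <= 0 or dst_len <= 0:
        raise ValueError("lengths must be positive")
    indices = []
    src = 0
    error = 0
    for _ in range(dst_len):
        indices.append(src)
        error += src_len
        # Step the source index once for every full destination step covered.
        while error >= dst_len:
            error -= dst_len
            src += 1
    return indices


def ranges_overlap(start_a, size_a, start_b, size_b):
    """True if two byte ranges share any memory (not allowed for stretch blit)."""
    return start_a < start_b + size_b and start_b < start_a + size_a
```

Stretching 3 source pixels to 5 destination pixels gives the source indices `[0, 0, 1, 1, 2]`, and shrinking 4 pixels to 2 gives `[0, 2]`.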

#### 2.3.2 Filter Blit

Filter blit performs high quality scaling, up or down, using a FIR re-sampling filter with 3 or 5 taps. Sub-pixel coordinates (locations between the pixel grids) are generated by the drawing engine. The filter block in the drawing engine uses the sub-pixel information to select the appropriate filter kernel. GC620 processes 1 pixel every cycle when performing filter blit.

Stretch and shrink factors in 15.16 fixed-point format are supported. Generating a single destination pixel requires 9 or 25 source pixels (3 × 3 or 5 × 5, depending on the number of taps). An image is scaled in two passes, one for the X-dimension (HOR\_FILTER\_BLT) and the other for the Y-dimension (VER\_FILTER\_BLT). Software sets up the filter kernel/coefficient table and the kernel size.
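As a worked example of the 15.16 fixed-point notation, the sketch below converts a scale factor to and from a raw integer. It assumes a non-negative value with 15 integer bits and 16 fractional bits. The function names are hypothetical, and whether the hardware expects the factor as source/destination or destination/source is not stated here.

```python
FRACTION_BITS = 16
INTEGER_BITS = 15
ONE = 1 << FRACTION_BITS                             # 1.0 in 15.16 format
MAX_RAW = (1 << (INTEGER_BITS + FRACTION_BITS)) - 1  # largest encodable value


def to_fixed_15_16(factor):
    """Encode a non-negative factor as a 15.16 fixed-point integer."""
    if factor < 0:
        raise ValueError("factor must be non-negative")
    raw = round(factor * ONE)
    if raw > MAX_RAW:
        raise ValueError("factor does not fit in 15 integer bits")
    return raw


def from_fixed_15_16(raw):
    """Decode a 15.16 fixed-point integer back to a float."""
    return raw / ONE


def source_pixels_per_output(taps):
    """Source pixels needed for one destination pixel with a taps x taps kernel."""
    if taps not in (3, 5):
        raise ValueError("the filter supports 3 or 5 taps")
    return taps * taps
```

A factor of 1.0 encodes as `0x10000`, 0.5 as `0x8000`, and 2.25 as `0x24000`.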

When the stretch or shrink factor is 1, the filter blit works as a bit blit copy. In that case it can be used as a format converter, for instance from YUV to RGB.

When the stretch or shrink factor (scale ratio) is not 1:1, filter blit requires both a vertical pass and a horizontal pass to do the scaling. Shrink performance will be less than 1 pixel per cycle for each vertical pass and for each horizontal pass. Stretch performance will be close to the performance for the 1:1 scale ratio.
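The two-pass structure can be illustrated with a separable filter: a 1D kernel is applied along rows, then along columns. The sketch below is shown at a 1:1 scale ratio for brevity, so it filters without resampling. The kernel values, the edge clamping, and all function names are assumptions chosen for the example rather than the hardware's coefficient table.

```python
def filter_rows(image, kernel):
    """Horizontal pass: apply a 1D FIR kernel to every row, clamping at edges."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, coeff in enumerate(kernel):
                sx = min(max(x + k - radius, 0), width - 1)
                acc += coeff * row[sx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns so the row filter can be reused vertically."""
    return [list(column) for column in zip(*image)]


def two_pass_filter(image, kernel):
    """Horizontal pass followed by a vertical pass with the same kernel."""
    horizontal = filter_rows(image, kernel)
    return transpose(filter_rows(transpose(horizontal), kernel))
```

With a 3-tap kernel, each output pixel of the two passes depends on a 3 × 3 neighbourhood, that is, 9 source pixels.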

#### 2.3.3 Supported Formats

The graphics engine supports the following source and destination formats for data, bit blits and filter blits. In addition to these source and destination RGB formats, their swizzle formats (ARGB, RGBA, ABGR, BGRA) are also supported. For YUV formats, the G2D also supports the U/V-swapped variants (NV21, NV61).

| Format                               | Bit Blit<br>Input | Bit Blit<br>Output | Stretch Blit<br>Input | Stretch Blit<br>Output | OPF<br>Input | OPF<br>Output | MultiSource<br>Input | MultiSource<br>Output |
|--------------------------------------|-------------------|--------------------|-----------------------|------------------------|--------------|---------------|----------------------|-----------------------|
| A1R5G5B5                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| A4R4G4B4                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| A8R8G8B8                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| X1R5G5B5                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| X4R4G4B4                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| X8R8G8B8                             | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| R5G6B5                               | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| A2R10G10B10                          | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| A8                                   | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| sRGB                                 | N                 | N                  | N                     | N                      | N            | N             | N                    | N                     |
| YUY2 (packed YUV422)                 | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| UYVY (packed YUV422)                 | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| YV12 (planar YUV420)                 | N                 | Y                  | N                     | Y                      | N            | Y             | N                    | Y                     |
| NV12 (semi-planar YUV420)            | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| NV16 (semi-planar YUV422)            | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| P010 (semi-planar YUV420)            | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| NV12-10bit (semi-planar YUV420)      | Y                 | Y                  | Y                     | Y                      | Y            | Y             | Y                    | Y                     |
| NV16-10bit (semi-planar YUV422)      | Y                 | N                  | Y                     | N                      | Y            | N             | Y                    | N                     |
| YUV420-2101010 (semi-planar YUV420)  | N                 | N                  | N                     | N                      | N            | N             | N                    | N                     |
| 8-bit color index                    | Y                 | N                  | Y                     | N                      | N            | N             | Y                    | N                     |
| planar RGB888/YUV444                 | N                 | Y                  | N                     | Y                      | N            | Y             | N                    | Y                     |

Figure & Table 2-2 Supported formats

### 2.4 Usage

1. Set the AQINTRENBL register to 0x01000000.
2. Set the AQCMDBUFFERADDR register to the base address of the command buffer.
3. Set the AQCMDBUFFERCTRL register to 0xffffffff.
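The three steps above can be sketched as an ordered sequence of register writes. This is a minimal illustration, assuming a caller-supplied `write_register(name, value)` hook. That hook, the function name `start_g2d`, and the 32-bit width check are assumptions for the example. Register offsets are not given in this section, so registers are referred to by name only.

```python
def start_g2d(write_register, command_buffer_base):
    """Issue the three register writes from the usage steps, in order.

    write_register(name, value) is a hypothetical hook; a real driver
    would translate each register name into an MMIO access.
    """
    # Assumption: registers are 32 bits wide, as the constants suggest.
    if not 0 <= command_buffer_base <= 0xFFFFFFFF:
        raise ValueError("command buffer base must fit in 32 bits")
    write_register("AQINTRENBL", 0x01000000)
    write_register("AQCMDBUFFERADDR", command_buffer_base)
    write_register("AQCMDBUFFERCTRL", 0xFFFFFFFF)
```

Passing a hook that records its arguments, such as `log.append`, makes it easy to check the order of the writes in a unit test before touching hardware.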



## 3 DEWARP

### 3.1 Overview

The DEWARP processor provides both dewarp processing and scalar functionality in a single module. The high-performance dewarp processing module corrects the distortion introduced in images by fisheye and wide-angle lenses, and can also scale images up.

The scalar processing module provides high-performance scalar processing for YUV images. It supports both downscaling and upscaling, but upscaling is available only when the input data comes from DDR memory. Three scalar engines and dual AXI master write interfaces provide efficient throughput with a maximum output size of 4096x3072.

DEWARP supports performance of up to 4K@60fps.

#### 3.1.1 Block Diagram

The block diagram of the DEWARP module is as follows:



Figure & Table 3-1 DEWARP block function diagram

An external DEC400 module is linked to DEWARP.





### 3.2 Main Features

The DEWARP module consists of a Dewarp module and a Scalar module. Their main features are as follows:

- Dewarp Main Features
  - Line-based architecture and configurable map to support various lens distortions
  - YUV422 and YUV420 inputs and outputs:
    - Inputs: YUV422sp, YUV422 interleave, YUV420sp
    - Outputs: YUV422sp, YUV422 interleave, YUV420sp
    - Note: Only YUV422sp is supported when connected with VSE.
  - Configurable bilinear interpolation engine
  - Grid map with 16x16 pixel macroblocks indexed using X, Y vertex coordinates
  - Fisheye correction
  - Wide Field of View (FOV) correction
  - Keystone correction
  - Scaling and 4PTZ view
  - High performance, up to 4K@60fps
  - Optimized pixel reuse technology
  - Flexible pixel cache to achieve low bandwidth consumption
  - Low power design techniques
- Scalar Main Features
  - Three scalar engines, with maximum resolutions of: 4K, 1080p and 1080p
  - Supports YCbCr 4:2:2 and YCbCr 4:2:0 semi-planar inputs:
    - For data from DDR: Supports YUV422sp, YUV422 interleave and YUV420sp
    - For data from the ISP stream interface: Supports YUV420sp and YUV422sp
    - For data from Dewarp: Supports YUV422sp
  - Supports YUV and RGB888 output: supports different output formats by choosing different scaling factors for luminance and chrominance components: YUV444, YUV422, YUV420, RGB888
  - Supports co-sited and non-co-sited output formats through programmable phase offsets
  - Supports 8-bit or 10-bit component data
  - Bypass support
  - Input and output pictures up to 4K pixel width
  - Separate scaling in horizontal and vertical directions, and for chrominance and luminance components
  - Lookup table for programmable scaling characteristics
  - Shadow registers for on-the-fly reprogramming of control registers
  - Low power design techniques

### 3.3 Function Description

#### 3.3.1 Pixel Interpolation for Downscaling

For downscaling, the input picture pixels are considered as areas. Pixels of the downscaled picture are calculated as an average of the input picture pixels, weighted according to their contributing area part.

#### NOTE

The maximum downscale factor is 128.
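The area-weighted average can be sketched in one dimension as follows. Each output pixel covers a span of the input row, and every input pixel contributes in proportion to how much of it lies inside that span. This is a minimal software illustration, not the hardware data path. The function name and the use of exact fractions are choices made for the example.

```python
from fractions import Fraction

MAX_DOWNSCALE_FACTOR = 128  # limit stated in the note above


def downscale_row(pixels, out_len):
    """Area-weighted downscale of one row of pixel values."""
    in_len = len(pixels)
    if out_len <= 0 or out_len > in_len:
        raise ValueError("out_len must be between 1 and the input length")
    if in_len > out_len * MAX_DOWNSCALE_FACTOR:
        raise ValueError("downscale factor exceeds 128")
    scale = Fraction(in_len, out_len)  # input pixels covered per output pixel
    result = []
    for i in range(out_len):
        start = i * scale
        end = start + scale
        total = Fraction(0)
        j = int(start)  # first input pixel touched by this output pixel
        while j < end:
            # Length of input pixel j that falls inside [start, end).
            overlap = min(end, j + 1) - max(start, j)
            total += overlap * pixels[j]
            j += 1
        result.append(float(total / scale))
    return result
```

Halving the row `[10, 20, 30, 40]` yields `[15.0, 35.0]`. With a non-integer factor, for example 3 pixels to 2, the middle input pixel is split equally between the two outputs.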

#### 3.3.2 Pixel Interpolation for Upscaling

The basic calculation for an upscaled picture is the interpolation of two adjacent pixels. Interpolation factors are taken from a programmable lookup table. The distance between two input pixels is divided into 64 intervals with one LUT entry for each interval. For calculation of the output sample, the LUT entry of the interval containing the position of the output sample is used. If a linear LUT is programmed, then bilinear interpolation will result.

#### NOTE

- The maximum upscale factor for the horizontal direction is 4.
- The maximum upscale factor for the vertical direction is 4.
- The scalar engine supports upscaling only when data is from DDR memory.
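The LUT-driven interpolation can be sketched in one dimension. The distance between two input pixels is split into 64 intervals, the interval containing the output sample selects a LUT entry, and that entry is used as the weight of the right-hand neighbour. The meaning and encoding of the LUT entries, the mapping of output positions onto the input grid, the edge clamping, and the function names are all assumptions made for this example.

```python
LUT_INTERVALS = 64        # intervals between two adjacent input pixels
MAX_UPSCALE_FACTOR = 4    # limit stated in the note above


def linear_lut():
    """LUT whose entry k is k/64, which gives linear interpolation."""
    return [k / LUT_INTERVALS for k in range(LUT_INTERVALS)]


def upscale_row(pixels, out_len, lut):
    """Upscale one row by interpolating between adjacent input pixels."""
    in_len = len(pixels)
    if in_len == 0 or out_len < in_len:
        raise ValueError("out_len must be at least the input length")
    if out_len > in_len * MAX_UPSCALE_FACTOR:
        raise ValueError("upscale factor exceeds 4")
    if len(lut) != LUT_INTERVALS:
        raise ValueError("LUT must have 64 entries")
    result = []
    for i in range(out_len):
        # Output sample position on the input grid, in 1/64-pixel units.
        pos64 = (i * in_len * LUT_INTERVALS) // out_len
        left = pos64 // LUT_INTERVALS
        interval = pos64 % LUT_INTERVALS
        right = min(left + 1, in_len - 1)  # clamp at the right edge
        weight = lut[interval]
        result.append((1 - weight) * pixels[left] + weight * pixels[right])
    return result
```

With the linear LUT, upscaling `[0, 64]` to four samples gives `[0.0, 32.0, 64.0, 64.0]`. Programming a different LUT changes the interpolation characteristic without changing the loop.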

#### 3.3.3 Format Summary

| Format             | DeWarp<br>Input from<br>DMA Read | DeWarp<br>Output | Scalar<br>Input from<br>ISP | Scalar<br>Input from<br>DeWarp | Scalar<br>Input from<br>DMA Read | Scalar<br>Output |
|--------------------|----------------------------------|------------------|-----------------------------|--------------------------------|----------------------------------|------------------|
| YUV422 semi-planar | YES                              | YES              | YES                         | YES                            | YES                              | YES              |
| YUV420 semi-planar | YES                              | YES              | YES                         |                                | YES                              | YES              |
| YUV422 interleave  | YES                              | YES              |                             |                                | YES                              | YES              |
| YUV444 planar      |                                  |                  |                             |                                |                                  | YES              |
| YUV444 interleave  |                                  |                  |                             |                                |                                  | YES              |
| RGB888 planar      |                                  |                  |                             |                                |                                  | YES              |
| RGB888 interleave  |                                  |                  |                             |                                |                                  | YES              |

Figure & Table 3-2 DEWARP Input-output format summary

For VSE (Scalar), Figure & Table 3-3 shows the relationship between input formats and output formats, where the output formats are constrained by the input format.



| Input Format | Output Format                                                               |
|--------------|-----------------------------------------------------------------------------|
| YUV422       | YUV422, YUV420, YUV444, RGB                                                 |
| YUV420       | YUV420<br>YUV422, YUV444, RGB (Supported when source data is from DDR.)     |

Figure & Table 3-3 VSE relationship between input and output formats

Each scalar supports RGB output, but only one of the three scalars can be configured for RGB output at a time.
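These constraints can be expressed as two small checks. The sketch below is built on one reading of Figure & Table 3-3, which is an assumption: for YUV420 input, YUV420 output is always available, while YUV422, YUV444 and RGB output require the source data to come from DDR. The function names and format strings are hypothetical.

```python
VSE_OUTPUT_FORMATS = {"YUV422", "YUV420", "YUV444", "RGB"}


def vse_output_allowed(input_format, output_format, source_is_ddr):
    """Check an input/output format pair against Figure & Table 3-3."""
    if output_format not in VSE_OUTPUT_FORMATS:
        return False
    if input_format == "YUV422":
        return True
    if input_format == "YUV420":
        # Assumption: the DDR condition applies to every output except YUV420.
        return output_format == "YUV420" or source_is_ddr
    return False


def rgb_assignment_valid(scalar_output_formats):
    """At most one of the three scalars may be configured for RGB output."""
    if len(scalar_output_formats) != 3:
        raise ValueError("expected one output format per scalar engine")
    return sum(fmt == "RGB" for fmt in scalar_output_formats) <= 1
```

For example, `rgb_assignment_valid(["RGB", "YUV420", "YUV422"])` passes, while configuring two scalars for RGB at once does not.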