# MIPS® Architecture Extension: nanoMIPS32™ DSP Technical Reference Manual



Revision 0.04, April 27, 2018. Public.

Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries.

This document contains information that is proprietary to MIPS Tech, LLC, a Wave Computing company ("MIPS") and MIPS' affiliates as applicable. Any copying, reproducing, modifying or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS or MIPS' affiliates as applicable or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines. Any document provided in source format (i.e., in a modifiable form such as in FrameMaker or Microsoft Word format) is subject to use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions. UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS (AND MIPS' AFFILIATES AS APPLICABLE). MIPS and MIPS' affiliates reserve the right to change the information contained in this document to improve function, design or otherwise.

MIPS and MIPS' affiliates do not assume any liability arising out of the application or use of this information, or of any error or omission in such information. Any warranties, whether express, statutory, implied or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose, are excluded. Except as expressly provided in any written license agreement from MIPS or an authorized third party, the furnishing of this document does not give recipient any license to any intellectual property rights, including any patent rights, that cover the information in this document.

The information contained in this document shall not be exported, reexported, transferred, or released, directly or indirectly, in violation of the law of any country or international law, regulation, treaty, Executive Order, statute, amendments or supplements thereto. Should a conflict arise regarding the export, reexport, transfer, or release of the information contained in this document, the laws of the United States of America shall be the governing law.

The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government ("Government"), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party.

MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPSr3, MIPS32, MIPS64, microMIPS32, microMIPS64, MIPS-3D, MIPS16, MIPS16e, MIPS-Based, MIPSsim, MIPSpro, MIPS-VERIFIED, Aptiv logo, microAptiv logo, interAptiv logo, microMIPS logo, MIPS Technologies logo, MIPS-VERIFIED logo, proAptiv logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, M14K, 5K, 5Kc, 5Kf, 24K, 24Kc, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, 1074K, 1074Kc, 1074Kf, R3000, R4000, R5000, Aptiv, ASMACRO, Atlas, "At the core of the user experience.", BusBridge, Bus Navigator, CLAM, CorExtend, CoreFPGA, CoreLV, EC, FPGA View, FS2, FS2 FIRST SILICON SOLUTIONS logo, FS2 NAVIGATOR, HyperDebug, HyperJTAG, IASim, iFlowtrace, interAptiv, JALGO, Logic Navigator, Malta, MDMX, MED, MGB, microAptiv, microMIPS, Navigator, OCI, PDtrace, the Pipeline, proAptiv, Pro Series, SEAD-3, SmartMIPS, SOC-it, and YAMON are trademarks or registered trademarks of MIPS and MIPS' affiliates as applicable in the United States and other countries.

All other trademarks referred to herein are the property of their respective owners.

# Contents

- Chapter 1: About This Book
  - 1.1: Typographical Conventions
    - 1.1.1: Italic Text
    - 1.1.2: Bold Text
    - 1.1.3: Courier Text
  - 1.2: UNPREDICTABLE and UNDEFINED
    - 1.2.1: UNPREDICTABLE
    - 1.2.2: UNDEFINED
    - 1.2.3: UNSTABLE
  - 1.3: Special Symbols in Pseudocode Notation
  - 1.4: Notation for Register Field Accessibility
  - 1.5: For More Information
- Chapter 2: Guide to the Instruction Set
  - 2.1: Understanding the Instruction Fields
    - 2.1.1: Instruction Fields
    - 2.1.2: Instruction Descriptive Name and Mnemonic
    - 2.1.3: Format Field
    - 2.1.4: Purpose Field
    - 2.1.5: Description Field
    - 2.1.6: Restrictions Field
    - 2.1.7: Availability and Compatibility Fields
    - 2.1.8: Operation Field
    - 2.1.9: Exceptions Field
    - 2.1.10: Programming Notes and Implementation Notes Fields
  - 2.2: Operation Section Notation and Functions
    - 2.2.1: Instruction Execution Ordering
    - 2.2.2: Pseudocode Functions
  - 2.3: Op and Function Subfield Notation
  - 2.4: FPU Instructions

- Chapter 3: The nanoMIPS® DSP Application Specific Extension to the nanoMIPS32® Architecture
  - 3.1: Base Architecture Requirements
  - 3.2: Compliance and Subsetting
  - 3.3: Introduction to the nanoMIPS® DSP Module
  - 3.4: DSP Applications and their Requirements
  - 3.5: Fixed-Point Data Types
  - 3.6: Saturating Math
  - 3.7: Conventions Used in the Instruction Mnemonics
  - 3.8: Effect of Endian-ness on Register SIMD Data
  - 3.9: Additional Register State for the DSP Module
  - 3.10: Software Detection of the DSP Module
  - 3.11: Exception Table for the DSP Module
  - 3.12: DSP Module Instructions that Read and Write the DSPControl Register
  - 3.13: Arithmetic Exceptions

- Chapter 4: nanoMIPS® DSP Module Instruction Summary
  - 4.1: The nanoMIPS® DSP Module Instruction Summary
- Chapter 5: Instruction Encoding
  - 5.1: Instruction Bit Encoding
- Chapter 6: The MIPS® DSP Module Instruction Set
  - 6.1: Compliance and Subsetting
  - 6.2: DSP Module Specific Pseudocode Functions
    - 6.2.1: ValidateAccessToDSPResources()
    - 6.2.2: ValidateAccessToDSP2Resources()
  - ABSQ_S.PH
  - ABSQ_S.QB
  - ABSQ_S.W
  - ADDQ[_S].PH
  - ADDQ_S.W
  - ADDQH[_R].PH
  - ADDQH[_R].W
  - ADDSC
  - ADDU[_S].PH
  - ADDU[_S].QB
  - ADDWC
  - ADDUH[_R].QB
  - BALIGN
  - BITREV
  - BPOSGE32C
  - CMP.cond.PH
  - CMPGDU.cond.QB
  - CMPGU.cond.QB
  - CMPU.cond.QB
  - DPA.W.PH
  - DPAQ_S.W.PH
  - DPAQ_SA.L.W
  - DPAQX_S.W.PH
  - DPAQX_SA.W.PH
  - DPAU.H.QBL
  - DPAU.H.QBR
  - DPAX.W.PH
  - DPS.W.PH
  - DPSQ_S.W.PH
  - DPSQ_SA.L.W
  - DPSQX_S.W.PH
  - DPSQX_SA.W.PH
  - DPSU.H.QBL
  - DPSU.H.QBR
  - DPSX.W.PH
  - EXTP
  - EXTPDP
  - EXTPDPV
  - EXTPV
  - EXTR[_RS].W
  - EXTR_S.H
  - EXTRV[_RS].W
  - EXTRV_S.H
  - INSV
  - LBUX
  - LWX
  - MADD
  - MADDU
  - MAQ_S[A].W.PHL
  - MAQ_S[A].W.PHR
  - MFHI
  - MFLO
  - MODSUB
  - MSUB
  - MSUBU
  - MTHI
  - MTHLIP
  - MTLO
  - MUL[_S].PH
  - MULEQ_S.W.PHL
  - MULEQ_S.W.PHR
  - MULEU_S.PH.QBL
  - MULEU_S.PH.QBR
  - MULQ_RS.PH
  - MULQ_RS.W
  - MULQ_S.PH
  - MULQ_S.W
  - MULSA.W.PH
  - MULSAQ_S.W.PH
  - MULT
  - MULTU
  - PACKRL.PH
  - PICK.PH
  - PICK.QB
  - PRECEQ.W.PHL
  - PRECEQ.W.PHR
  - PRECEQU.PH.QBL
  - PRECEQU.PH.QBLA
  - PRECEQU.PH.QBR
  - PRECEQU.PH.QBRA
  - PRECEU.PH.QBL
  - PRECEU.PH.QBLA
  - PRECEU.PH.QBR
  - PRECEU.PH.QBRA
  - PRECR.QB.PH
  - PRECR_SRA[_R].PH.W
  - PRECRQ.PH.W
  - PRECRQ.QB.PH
  - PRECRQU_S.QB.PH
  - PRECRQ_RS.PH.W
  - PREPEND
  - RADDU.W.QB
  - RDDSP
  - REPL.PH
  - REPL.QB
  - REPLV.PH
  - REPLV.QB
  - SHILO
  - SHILOV
  - SHLL[_S].PH
  - SHLL.QB
  - SHLLV[_S].PH
  - SHLLV.QB
  - SHLLV_S.W
  - SHLL_S.W
  - SHRA[_R].QB
  - SHRA[_R].PH
  - SHRAV[_R].PH
  - SHRAV[_R].QB
  - SHRAV_R.W
  - SHRA_R.W
  - SHRL.PH
  - SHRL.QB
  - SHRLV.PH
  - SHRLV.QB
  - SUBQ[_S].PH
  - SUBQ_S.W
  - SUBQH[_R].PH
  - SUBQH[_R].W
  - SUBU[_S].PH
  - SUBU[_S].QB
  - SUBUH[_R].QB
  - WRDSP
- Appendix A: Endian-Agnostic Reference to Register Elements
  - A.1: Using Endian-Agnostic Instruction Names
  - A.2: Mapping Endian-Agnostic Instruction Names to DSP Module Instructions
- Appendix B: Revision History

# **List of Figures**

- Figure 2.1: Example of Instruction Description
- Figure 2.2: Example of Instruction Fields
- Figure 2.3: Example of Instruction Descriptive Name and Mnemonic
- Figure 2.4: Example of Instruction Format
- Figure 2.5: Example of Instruction Purpose
- Figure 2.6: Example of Instruction Description
- Figure 2.7: Example of Instruction Restrictions
- Figure 2.8: Example of Instruction Operation
- Figure 2.9: Example of Instruction Exception
- Figure 2.10: Example of Instruction Programming Notes
- Figure 2.11: COP_LW Pseudocode Function
- Figure 2.12: COP_LD Pseudocode Function
- Figure 2.13: COP_SW Pseudocode Function
- Figure 2.14: COP_SD Pseudocode Function
- Figure 2.15: CoprocessorOperation Pseudocode Function
- Figure 2.16: MisalignedSupport Pseudocode Function
- Figure 2.17: AddressTranslation Pseudocode Function
- Figure 2.18: LoadMemory Pseudocode Function
- Figure 2.19: StoreMemory Pseudocode Function
- Figure 2.20: Prefetch Pseudocode Function
- Figure 2.21: SyncOperation Pseudocode Function
- Figure 2.22: ValueFPR Pseudocode Function
- Figure 2.23: StoreFPR Pseudocode Function
- Figure 2.24: CheckFPException Pseudocode Function
- Figure 2.25: FPConditionCode Pseudocode Function
- Figure 2.26: SetFPConditionCode Pseudocode Function
- Figure 2.27: Are64BitFPOperationsEnabled Pseudocode Function
- Figure 2.28: IsCoprocessorEnabled Pseudocode Function
- Figure 2.29: IsCoprocessor2 Pseudocode Function
- Figure 2.30: IsEJTAGImplemented Pseudocode Function
- Figure 2.31: IsFloatingPointImplemented Pseudocode Function
- Figure 2.32: sign_extend Pseudocode Functions
- Figure 2.33: memory_address Pseudocode Function
- Figure 2.34: Instruction Fetch Implicit memory_address Wrapping
- Figure 2.35: AddressTranslation Implicit memory_address Wrapping
- Figure 2.36: SignalException Pseudocode Function
- Figure 2.37: SignalDebugBreakpointException Pseudocode Function
- Figure 2.38: SignalDebugModeBreakpointException Pseudocode Function
- Figure 2.39: NullifyCurrentInstruction Pseudocode Function
- Figure 2.40: PolyMult Pseudocode Function
- Figure 3.1: Computing the Value of a Fixed-Point (Q7) Number
- Figure 3.2: A Paired-Half (PH) Representation in a GPR for the microMIPS32 Architecture
- Figure 3.3: A Quad-Byte (QB) Representation in a GPR for the nanoMIPS32 Architecture
- Figure 3.4: Operation of MULQ_RS.PH rd, rs, rt
- Figure 3.5: MIPS® DSP Module Control Register (DSPControl) Format
- Figure 3.6: Config3 Register Format
- Figure 3.7: CP0 Status Register Format

- Figure 6.1: ValidateAccessToDSPResource Pseudocode Function
- Figure 6.2: ValidateAccessToDSP2Resources Pseudocode Function
- Figure 6.3: Operation of the INSV Instruction
- Figure A.1: The Endian-Independent PHL and PHR Elements in a GPR for the microMIPS32 Architecture
- Figure A.2: The Big-Endian PH0 and PH1 Elements in a GPR for the microMIPS32 Architecture
- Figure A.3: The Little-Endian PH0 and PH1 Elements in a GPR for the microMIPS32 Architecture
- Figure A.4: The Endian-Independent QBL and QBR Elements in a GPR for the microMIPS32 Architecture
- Figure A.5: The Endian-Independent QBLA and QBRA Elements in a GPR for the microMIPS32 Architecture

# **List of Tables**

- Table 1.1: Symbols Used in Instruction Operation Statements
- Table 1.2: Read/Write Register Field Notation
- Table 2.1: AccessLength Specifications for Loads/Stores
- Table 3.1: Data Size of DSP Applications
- Table 3.2: The Value of a Fixed-Point Q31 Number
- Table 3.3: The Limits of Q15 and Q31 Representations
- Table 3.4: MIPS® DSP Module Control Register (DSPControl) Field Descriptions
- Table 3.5: Instructions that set the ouflag bits in DSPControl
- Table 3.6: Cause Register ExcCode Field
- Table 3.7: Exception Table for the DSP Module
- Table 3.8: Instructions that Read/Write Fields in DSPControl
- Table 4.1: List of Instructions in nanoMIPS® DSP Module in Arithmetic Sub-class
- Table 4.2: List of Instructions in nanoMIPS® DSP Module in GPR-Based Shift Sub-class
- Table 4.3: List of Instructions in nanoMIPS® DSP Module in Multiply Sub-class
- Table 4.4: List of Instructions in MIPS® DSP Module in Bit/Manipulation Sub-class
- Table 4.5: List of Instructions in MIPS® DSP Module in Compare-Pick Sub-class
- Table 4.6: List of Instructions in MIPS® DSP Module in Accumulator and DSPControl Access Sub-class
- Table 4.7: List of Instructions in MIPS® DSP Module in Indexed-Load Sub-class
- Table 4.8: List of Instructions in MIPS® DSP Module in Branch Sub-class
- Table 5.1: Symbols Used in the Instruction Encoding Tables

# **Chapter 1**

# **About This Book**

This chapter defines the terminology and conventions used to describe features of the MIPS<sup>®</sup> Architecture, such as instructions and control and status registers.

# **1.1 Typographical Conventions**

This section describes the use of *italic*, **bold** and courier fonts in this book.

## 1.1.1 Italic Text

- is used for emphasis
- is used for *bits*, *fields*, and *registers* that are important from a software perspective (for instance, address bits used by software, and programmable fields and registers), and various *floating point instruction formats*, such as *S* and *D*
- is used for the memory access types, such as cached and uncached

## 1.1.2 Bold Text

- represents a term that is being **defined**
- is used for **bits** and **fields** that are important from a hardware perspective (for instance, **register** bits, which are not programmable but accessible only to hardware)
- is used for ranges of numbers; the range is indicated by an ellipsis. For instance, **5..1** indicates numbers 5 through 1
- is used to emphasize UNPREDICTABLE and UNDEFINED behavior, as defined below.

## 1.1.3 Courier Text

Courier fixed-width font is used for text that is displayed on the screen, and for examples of code and instruction pseudocode.

# **1.2 UNPREDICTABLE and UNDEFINED**

The terms **UNPREDICTABLE** and **UNDEFINED** are used throughout this book to describe the behavior of the processor in certain cases. **UNDEFINED** behavior or operations can occur only as the result of executing instructions in a privileged mode (i.e., in Kernel Mode or Debug Mode, or with the CP0 usable bit set in the Status register). Unprivileged software can never cause **UNDEFINED** behavior or operations. Conversely, both privileged and unprivileged software can cause **UNPREDICTABLE** results or operations.

## **1.2.1 UNPREDICTABLE**

**UNPREDICTABLE** results may vary from processor implementation to implementation, instruction to instruction, or as a function of time on the same implementation or instruction. Software can never depend on results that are **UNPREDICTABLE**. **UNPREDICTABLE** operations may cause a result to be generated or not. If a result is generated, it is **UNPREDICTABLE**. **UNPREDICTABLE** operations may cause arbitrary exceptions.

**UNPREDICTABLE** results or operations have several implementation restrictions:

- Implementations of operations generating **UNPREDICTABLE** results must not depend on any data source (memory or internal state) which is inaccessible in the current processor mode
- **UNPREDICTABLE** operations must not read, write, or modify the contents of memory or internal state which is inaccessible in the current processor mode. For example, **UNPREDICTABLE** operations executed in user mode must not access memory or internal state that is only accessible in Kernel Mode or Debug Mode or in another process
- **UNPREDICTABLE** operations must not halt or hang the processor

## **1.2.2 UNDEFINED**

**UNDEFINED** operations or behavior may vary from processor implementation to implementation, instruction to instruction, or as a function of time on the same implementation or instruction. **UNDEFINED** operations or behavior may vary from nothing to creating an environment in which execution can no longer continue. **UNDEFINED** operations or behavior may cause data loss.

**UNDEFINED** operations or behavior has one implementation restriction:

- **UNDEFINED** operations or behavior must not cause the processor to hang (that is, enter a state from which there is no exit other than powering down the processor). The assertion of any of the reset signals must restore the processor to an operational state

## 1.2.3 UNSTABLE

**UNSTABLE** results or values may vary as a function of time on the same implementation or instruction. Unlike **UNPREDICTABLE** values, software may depend on the fact that a sampling of an **UNSTABLE** value results in a legal transient value that was correct at some point in time prior to the sampling.

**UNSTABLE** values have one implementation restriction:

- Implementations of operations generating **UNSTABLE** results must not depend on any data source (memory or internal state) which is inaccessible in the current processor mode

# **1.3 Special Symbols in Pseudocode Notation**

In this book, algorithmic descriptions of an operation are written in a high-level pseudocode resembling Pascal. Special symbols used in the pseudocode notation are listed in Table 1.1.
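As an illustration only, the constant and bit-string notations in Table 1.1 can be modelled in a few lines of Python. This is a sketch under stated assumptions, not part of the architecture definition: the helper names `based_constant`, `replicate`, and `concat` are hypothetical, and bit strings are represented here as Python strings of `0` and `1` characters purely for readability.

```python
def based_constant(text: str) -> int:
    """Evaluate a constant written as b#n; a plain n defaults to base 10."""
    if "#" in text:
        base, digits = text.split("#", 1)
        return int(digits, int(base))
    return int(text, 10)


def replicate(bit: int, count: int) -> str:
    """Model x^y: a count-bit string formed by count copies of a single bit."""
    if bit not in (0, 1):
        raise ValueError("x must be a single-bit value (0 or 1)")
    return str(bit) * count


def concat(*bit_strings: str) -> str:
    """Model bit string concatenation: operands are joined left to right."""
    return "".join(bit_strings)


if __name__ == "__main__":
    # The three b#n examples from Table 1.1, plus the default-base case.
    for literal in ("10#100", "2#100", "16#100", "100"):
        print(literal, "=", based_constant(literal))
    # Three copies of bit 0 concatenated with the bit string 101.
    print(concat(replicate(0, 3), "101"))
```

Python's own `0b100` literal happens to match the `0bn` notation in the table, so no helper is needed for that case.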

| Symbol                                        | Meaning                                                                                                                                                                                                                                                           |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| •                                             | Assignment                                                                                                                                                                                                                                                        |
| =,≠                                           | Tests for equality and inequality                                                                                                                                                                                                                                 |
|                                               | Bit string concatenation                                                                                                                                                                                                                                          |
| x <sup>y</sup>                                | A <i>y</i> -bit string formed by <i>y</i> copies of the single-bit value <i>x</i>                                                                                                                                                                                 |
| b#n                                           | A constant value $n$ in base $b$ . For instance 10#100 represents the decimal value 100, 2#100 represents the binary value 100 (decimal 4), and 16#100 represents the hexadecimal value 100 (decimal 256). If the "b#" prefix is omitted, the default base is 10. |
| 0bn                                           | A constant value <i>n</i> in base 2. For instance 0b100 represents the binary value 100 (decimal 4).                                                                                                                                                              |
| 0xn                                           | A constant value $n$ in base 16. For instance $0x100$ represents the hexadecimal value 100 (decimal 256).                                                                                                                                                         |
| x<sub>y..z</sub> | Selection of bits $y$ through $z$ of bit string $x$. Little-endian bit notation (rightmost bit is 0) is used. If $y$ is less than $z$, this expression is an empty (zero length) bit string. |
| x.bit[y]                                      | Bit <i>y</i> of bitstring <i>x</i> . Alternative to the traditional MIPS notation $x_{y}$ .                                                                                                                                                                       |
| x.bits[y..z] | Selection of bits <i>y</i> through <i>z</i> of bit string <i>x</i>. Alternative to the traditional MIPS notation $x_{y..z}$. |
| x.byte[y] | Byte <i>y</i> of bitstring <i>x</i>. Equivalent to the traditional MIPS notation $x_{8*y+7..8*y}$. |
| x.bytes[y..z] | Selection of bytes <i>y</i> through <i>z</i> of bit string <i>x</i>. Alternative to the traditional MIPS notation $x_{8*y+7..8*z}$. |
| x.halfword[y]<br>x.word[i]<br>x.doubleword[i] | Similar extraction of particular bitfields (used in e.g., MSA packed SIMD vectors). |
| x.bit31, x.byte0, etc.                        | Examples of abbreviated form of x.bit[y], etc. notation, when y is a constant.                                                                                                                                                                                    |
| x.fieldy | Selection of a named subfield of bitstring <i>x</i>, typically a register or instruction encoding.<br>More formally described as "Field y of register x".<br>For example, FIR.D = "the D bit of the Coprocessor 1 Floating-point Implementation Register (FIR)". |
| +, -                                          | 2's complement or floating point arithmetic: addition, subtraction                                                                                                                                                                                                |
| *, ∞                                          | 2's complement or floating point multiplication (both used for either)                                                                                                                                                                                            |
| div                                           | 2's complement integer division                                                                                                                                                                                                                                   |
| mod                                           | 2's complement modulo                                                                                                                                                                                                                                             |
| /                                             | Floating point division                                                                                                                                                                                                                                           |
| <                                             | 2's complement less-than comparison                                                                                                                                                                                                                               |
| >                                             | 2's complement greater-than comparison                                                                                                                                                                                                                            |
| ≤                                             | 2's complement less-than or equal comparison                                                                                                                                                                                                                      |
| 2                                             | 2's complement greater-than or equal comparison                                                                                                                                                                                                                   |
| nor                                           | Bitwise logical NOR                                                                                                                                                                                                                                               |
| xor                                           | Bitwise logical XOR                                                                                                                                                                                                                                               |
| and                                           | Bitwise logical AND                                                                                                                                                                                                                                               |
| or                                            | Bitwise logical OR                                                                                                                                                                                                                                                |
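
The bit-string entries in the table above (bit selection, replication, concatenation, byte extraction, and `b#n` constants) can be illustrated with a minimal Python sketch. This is not part of the architecture definition: the helper names `bits`, `byte`, `replicate`, `concat`, `const`, and `sign_extend16` are hypothetical, and because Python integers carry no width, the width of the right-hand operand of a concatenation is passed explicitly.

```python
def bits(x: int, y: int, z: int) -> int:
    """Bits y down to z of x; bit 0 is the rightmost bit."""
    if y < z:
        return 0  # empty (zero-length) bit string, modeled here as 0
    width = y - z + 1
    return (x >> z) & ((1 << width) - 1)


def byte(x: int, y: int) -> int:
    """Byte y of x, i.e. bits 8*y+7 down to 8*y."""
    return bits(x, 8 * y + 7, 8 * y)


def replicate(bit: int, y: int) -> int:
    """A y-bit string formed by y copies of a single-bit value."""
    return ((1 << y) - 1) if (bit & 1) else 0


def concat(hi: int, lo: int, lo_width: int) -> int:
    """Concatenate hi and lo, where lo is lo_width bits wide (assumed width)."""
    return (hi << lo_width) | (lo & ((1 << lo_width) - 1))


def const(b: int, n: str) -> int:
    """The constant b#n: the digit string n interpreted in base b."""
    return int(n, b)


def sign_extend16(x: int) -> int:
    """Sign-extend a 16-bit value to 32 bits: 16 copies of bit 15, then x."""
    return concat(replicate(bits(x, 15, 15), 16), x, 16)
```

For instance, `const(16, "100")` evaluates to 256 and `byte(0x12345678, 1)` to `0x56`, matching the conventions described in the table.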

### Table 1.1 Symbols Used in Instruction Operation Statements

| Symbol        | Meaning                                                                                                                                                                                                                                                                                                                                                               |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| not           | Bitwise inversion                                                                                                                                                                                                                                                                                                                                                     |
| &&            | Logical (non-Bitwise) AND                                                                                                                                                                                                                                                                                                                                             |
| <<            | Logical Shift left (shift in zeros at right-hand-side)                                                                                                                                                                                                                                                                                                                |
| >>            | Logical Shift right (shift in zeros at left-hand-side)                                                                                                                                                                                                                                                                                                                |
| GPRLEN        | The length in bits (32 or 64) of the CPU general-purpose registers                                                                                                                                                                                                                                                                                                    |
| GPR[x]        | CPU general-purpose register x. The content of $GPR[0]$ is always zero. In Release 2 of the Architecture, $GPR[x]$ is a short-hand notation for $SGPR[SRSCtl_{CSS}, x]$ .                                                                                                                                                                                             |
| SGPR[s,x]     | In Release 2 of the Architecture and subsequent releases, multiple copies of the CPU general-purpose registers may be implemented. <i>SGPR[s,x]</i> refers to GPR set <i>s</i> , register <i>x</i> .                                                                                                                                                                  |
| FPR[x]        | Floating Point operand register x                                                                                                                                                                                                                                                                                                                                     |
| FCC[CC]       | Floating Point condition code CC. <i>FCC[0]</i> has the same value as <i>COC[1]</i> .<br>Release 6 removes the floating point condition codes.                                                                                                                                                                                                                        |
| FPR[x]        | Floating Point (Coprocessor unit 1), general register x                                                                                                                                                                                                                                                                                                               |
| CPR[z,x,s]    | Coprocessor unit z, general register x, select s                                                                                                                                                                                                                                                                                                                      |
| CP2CPR[x]     | Coprocessor unit 2, general register <i>x</i>                                                                                                                                                                                                                                                                                                                         |
| CCR[z,x]      | Coprocessor unit <i>z</i> , control register <i>x</i>                                                                                                                                                                                                                                                                                                                 |
| CP2CCR[x]     | Coprocessor unit 2, control register <i>x</i>                                                                                                                                                                                                                                                                                                                         |
| COC[z]        | Coprocessor unit z condition signal                                                                                                                                                                                                                                                                                                                                   |
| Xlat[x]       | Translation of the MIPS16e GPR number x into the corresponding 32-bit GPR number                                                                                                                                                                                                                                                                                      |
| BigEndianMem  | Endian mode as configured at chip reset ( $0 \rightarrow$ Little-Endian, $1 \rightarrow$ Big-Endian). Specifies the endianness of the memory interface (see LoadMemory and StoreMemory pseudocode function descriptions) and the endianness of Kernel and Supervisor mode execution.                                                                                  |
| BigEndianCPU  | The endianness for load and store instructions ( $0 \rightarrow$ Little-Endian, $1 \rightarrow$ Big-Endian). In User mode, this endianness may be switched by setting the <i>RE</i> bit in the <i>Status</i> register. Thus, BigEndianCPU may be computed as (BigEndianMem XOR ReverseEndian).                                                                        |
| ReverseEndian | Signal to reverse the endianness of load and store instructions. This feature is available in User mode only, and is implemented by setting the <i>RE</i> bit of the <i>Status</i> register. Thus, ReverseEndian may be computed as $(SR_{RE} \text{ and User mode})$ .                                                                                               |
| LLbit         | Bit of <b>virtual</b> state used to specify operation for instructions that provide atomic read-modify-write. <i>LLbit</i> is set when a linked load occurs and is tested by the conditional store. It is cleared, during other CPU operation, when a store to the location would no longer be atomic. In particular, it is cleared by exception return instructions. |
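
The relationship between the three endianness symbols in the table above can be sketched as follows. This is an illustrative Python model only; the function names and the `status_re` and `user_mode` parameters are assumptions introduced here, standing in for the <i>RE</i> bit of the <i>Status</i> register and the current processor mode.

```python
def reverse_endian(status_re: int, user_mode: bool) -> int:
    """ReverseEndian = (Status.RE and User mode)."""
    return 1 if (status_re & 1) and user_mode else 0


def big_endian_cpu(big_endian_mem: int, status_re: int, user_mode: bool) -> int:
    """BigEndianCPU = BigEndianMem XOR ReverseEndian (0 = little, 1 = big)."""
    return (big_endian_mem & 1) ^ reverse_endian(status_re, user_mode)
```

Under this model, setting <i>RE</i> flips the load/store endianness only while in User mode; in any other mode the result equals BigEndianMem.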

### Table 1.1 Symbols Used in Instruction Operation Statements (Continued)

| Symbol | Meaning |
|--------|---------|
| I:,<br>I+n:,<br>I-n: | This occurs as a prefix to <i>Operation</i> description lines and functions as a label. It indicates the instruction time during which the pseudocode appears to "execute." Unless otherwise indicated, all effects of the current instruction appear to occur during the instruction time of the current instruction. No label is equivalent to a time label of <b>I</b>. Sometimes effects of an instruction appear to occur either earlier or later, that is, during the instruction time of another instruction. When this happens, the instruction operation is written in sections labeled with the instruction time, relative to the current instruction <b>I</b>, in which the effect of that pseudocode appears to occur. For example, an instruction may have a result that is not available until after the next instruction. Such an instruction has the portion of the instruction operation description that writes the result register in a section labeled <b>I+1</b>. The effect of pseudocode statements for the current instruction labeled <b>I+1</b> appears to occur "at the same time" as the effect of pseudocode statements labeled <b>I</b> for the following instruction. Within one pseudocode sequence, the effects of the statements take place in order. However, between sequences of statements for different instructions that occur "at the same time," there is no defined order. Programs must not depend on a particular order of evaluation between such sections. |
| PC | The <i>Program Counter</i> value. During the instruction time of an instruction, this is the address of the instruction word. The address of the instruction that occurs during the next instruction time is determined by assigning a value to <i>PC</i> during an instruction time. If no value is assigned to <i>PC</i> during an instruction time by any pseudocode statement, it is automatically incremented by either 2 (in the case of a 16-bit MIPS16e instruction) or 4 before the next instruction time. A taken branch assigns the target address to the <i>PC</i> during the instruction time of the instruction in the branch delay slot.<br>In the MIPS Architecture, the PC value is only visible indirectly, such as when the processor stores the restart address into a GPR on a jump-and-link or branch-and-link instruction, or into a Coprocessor 0 register on an exception. Release 6 adds PC-relative address computation and load instructions. The PC value contains a full 32-bit address, all of which are significant during a memory reference. |
| ISA Mode | In processors that implement the MIPS16e Application Specific Extension or the microMIPS base architectures, the <i>ISA Mode</i> is a single-bit register that determines in which mode the processor is executing, as follows:<br>Encoding 0: The processor is executing 32-bit MIPS instructions<br>Encoding 1: The processor is executing MIPS16e or microMIPS instructions<br>In the MIPS Architecture, the <i>ISA Mode</i> value is only visible indirectly, such as when the processor stores a combined value of the upper bits of PC and the <i>ISA Mode</i> into a GPR on a jump-and-link or branch-and-link instruction, or into a Coprocessor 0 register on an exception. |
| PABITS | The number of physical address bits implemented is represented by the symbol PABITS. As such, if 36 physical address bits were implemented, the size of the physical address space would be $2^{PABITS} = 2^{36}$ bytes. |
| FP32RegistersMode | Indicates whether the FPU has 32-bit or 64-bit floating point registers (FPRs). It is optional if the FPU has 32 64-bit FPRs in which 64-bit data types are stored in any FPR. |
|                                   | [...] have a compatibility mode in which the processor references the FPRs as if it were a microMIPS32 implementation. In such a case, <b>FP32RegistersMode</b> is computed from the FR bit in the <i>Status</i> register. If this bit is a 0, the processor operates as if it had 32, 32-bit FPRs. If this bit [...] | | | |
|                                   | The value of <b>FP32RegistersMode</b> is computed from the FR bit in the <i>Status</i> register. | | | |
| InstructionInBranchDelaySlot | Indicates whether the instruction at the Program Counter address was executed in the delay slot of a branch or jump. This condition reflects the <i>dynamic</i> state of the instruction, not the <i>static</i> state. That is, the value is false if a branch or jump occurs to an instruction whose PC immediately follows a branch or jump, but which is not executed in the delay slot of a branch or jump. | | | |
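
As a quick check of the PABITS arithmetic in the table above, the following minimal Python sketch computes the size of the physical address space for a given number of address bits. The function name is illustrative only and is not an architectural symbol.

```python
def physical_address_space_bytes(pabits: int) -> int:
    """Return the size, in bytes, of a physical address space with `pabits` address bits."""
    if pabits < 0:
        raise ValueError("pabits must be non-negative")
    # 2 ** PABITS, computed with a shift.
    return 1 << pabits


size = physical_address_space_bytes(36)
print(size)              # 68719476736 bytes
print(size // 2 ** 30)   # 64 (GiB)
```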

# Table 1.1 Symbols Used in Instruction Operation Statements (Continued)

| Symbol                 | Meaning                                                                                                     |
|------------------------|-------------------------------------------------------------------------------------------------------------|
| SignalException(exception, argument) | Causes an exception to be signaled, using the exception parameter as the type of exception and the argument parameter as an exception-specific argument. Control does not return from this pseudocode function; the exception is signaled at the point of the call. |
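
The non-returning behavior of SignalException can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: the `ArchException` class, the `signal_exception` helper, and the toy `add_with_overflow_check` function are hypothetical names introduced here, and a Python `raise` stands in for the architectural exception mechanism.

```python
from typing import NoReturn


class ArchException(Exception):
    """Hypothetical carrier for a signaled exception (not an architectural name)."""

    def __init__(self, exception: str, argument: int = 0):
        super().__init__(f"{exception}({argument})")
        self.exception = exception
        self.argument = argument


def signal_exception(exception: str, argument: int = 0) -> NoReturn:
    """Model of SignalException: control never returns to the caller."""
    raise ArchException(exception, argument)


def add_with_overflow_check(a: int, b: int) -> int:
    """Toy 32-bit signed add that signals instead of returning on overflow."""
    total = a + b
    if not -(1 << 31) <= total < (1 << 31):
        signal_exception("IntegerOverflow")
    # Reached only when no exception was signaled.
    return total
```

Any statement placed after the call to `signal_exception` inside the `if` branch would never execute, which mirrors the statement that the exception is signaled at the point of the call.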


# 1.4 Notation for Register Field Accessibility

In this document, the read/write properties of register fields use the notations shown in Table 1.2.

| Read/Write<br>Notation | Hardware Interpretation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Software Interpretation                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| R/W                    | A field in which all bits are readable and writable by software and, potentially, by hardware.<br>Hardware updates of this field are visible to software reads. Software updates of this field are visible to hardware reads.<br>If the Reset State of this field is "Undefined", either software or hardware must initialize the value before the first read will return a predictable value. This should not be confused with the formal definition of <b>UNDEFINED</b> behavior. | |
| R                      | A field which is either static or is updated only by hardware.<br>If the Reset State of this field is either "0", "Preset", or "Externally Set", hardware initializes this field to zero or to the appropriate state, respectively, on powerup. The term "Preset" is used to suggest that the processor establishes the appropriate state, whereas the term "Externally Set" is used to suggest that the state is established via an external source (e.g., personality pins or initialization bit stream). These terms are suggestions only, and are not intended to act as a requirement on the implementation.<br>If the Reset State of this field is "Undefined", hardware updates this field only under those conditions specified in the description of the field. | A field to which the value written by software is ignored by hardware. Software may write any value to this field without affecting hardware behavior. Software reads of this field return the last value updated by hardware.<br>If the Reset State of this field is "Undefined", software reads of this field result in an <b>UNPREDICTABLE</b> value except after a hardware update done under the conditions specified in the description of the field. |

| Read/Write<br>Notation | Hardware Interpretation                                                                                                                                                                                               | Software Interpretation                                                                                                                                                                                                                                                                                                                                                                                                       |  |
|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| R0                     | R0 = reserved, read as zero, ignore writes by software. | <b>Architectural Compatibility:</b> R0 fields are reserved, and may be used for not-yet-defined purposes in future revisions of the architecture. |  |
|                        | Hardware ignores software writes to an R0 field. Neither the occurrence of such writes, nor the values written, affects hardware behavior.<br>Hardware always returns 0 to software reads of R0 fields.<br>The Reset State of an R0 field must always be 0. | When writing an R0 field, current software should only write either all 0s, or, preferably, write back the same value that was read from the field.<br>Current software should not assume that the value read from R0 fields is zero, because this may not be true on future hardware. |  |
|                        | If software performs an mtc0 instruction which writes a non-zero value to an R0 field, the write to the R0 field will be ignored, but permitted writes to other fields in the register will not be affected. | Future revisions of the architecture may redefine an R0 field, but must do so in such a way that software which is unaware of the new definition and either writes zeros or writes back the value it has read from the field will continue to work correctly. |  |
|                        |  | Writing back the same value that was read is guaranteed to have no unexpected effects on current or future hardware behavior (except for the non-atomicity of such read-writes). |  |
|                        |  | Writing zeros to an R0 field may not be preferred because in the future this may interfere with the operation of other software which has been updated for the new field definition. |  |
| 0                      | Release 6<br>Release 6 legacy "0" behaves like R0: read as zero, nonzero writes ignored.<br>Legacy "0" should not be defined for new control register fields; R0 should be used instead. |  |  |
|                        | HW returns 0 when read.<br>HW ignores writes. | Only zero should be written, or the value read from the register. |  |
|                        | pre-Release 6<br>pre-Release 6 legacy "0": read as zero, nonzero writes <b>UNDEFINED</b>. |  |  |
|                        | A field which hardware does not update, and for which hardware can assume a zero value. | A field to which the value written by software must be zero. Software writes of non-zero values to this field may result in <b>UNDEFINED</b> behavior of the hardware. Software reads of this field return zero as long as all previous software writes are zero.<br>If the Reset State of this field is "Undefined", software must write this field with zero before it is guaranteed to read as zero. |  |

| Read/Write<br>Notation | Hardware Interpretation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Software Interpretation                                                                                                                                              |  |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| R/W0                   | Like R/W, except that writes of non-zero to an R/W0 field are ignored.<br>For example, Status.NMI. |  |  |
|                        | Hardware may set or clear an R/W0 bit.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Software can only clear an R/W0 bit.                                                                                                                                 |  |
|                        | <ul> <li>Hardware ignores software writes of nonzero to an R/W0 field. Neither the occurrence of such writes, nor the values written, affects hardware behavior.</li> <li>Software writes of 0 to an R/W0 field may have an effect.</li> <li>Hardware may return 0 or nonzero to software reads of an R/W0 bit.</li> <li>If software performs an mtc0 instruction which writes a non-zero value to an R/W0 field, the write to the R/W0 field will be ignored, but permitted writes to other fields in the register will not be affected.</li> </ul> | Software writes 0 to an R/W0 field to clear the field.<br>Software writes nonzero to an R/W0 bit in order to<br>guarantee that the bit is not affected by the write. |  |

### Table 1.2 Read/Write Register Field Notation (Continued)
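
The software-visible effect of the R/W, R, R0, and R/W0 notations can be sketched as a small Python model. This is a simplified illustration, not a description of any real register: the field layout in `FIELDS` is hypothetical, hardware-initiated updates are not modeled, and a software write of 0 to an R/W0 field is assumed to clear it, as the Software Interpretation column describes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Field:
    name: str
    lsb: int      # position of the least significant bit
    width: int    # number of bits in the field
    access: str   # "R/W", "R", "R0", or "R/W0"

    @property
    def mask(self) -> int:
        return ((1 << self.width) - 1) << self.lsb


def software_write(current: int, written: int, fields: Sequence[Field]) -> int:
    """Return the register value after a software write of `written`."""
    result = current
    for f in fields:
        new_bits = written & f.mask
        if f.access == "R/W":
            result = (result & ~f.mask) | new_bits
        elif f.access == "R/W0":
            # Nonzero writes are ignored; a write of 0 clears the field.
            if new_bits == 0:
                result &= ~f.mask
        # "R" and "R0": software writes are ignored by hardware.
    return result


def software_read(current: int, fields: Sequence[Field]) -> int:
    """Return the value software observes; R0 fields always read as zero."""
    value = current
    for f in fields:
        if f.access == "R0":
            value &= ~f.mask
    return value


# Hypothetical 13-bit register layout used only for illustration.
FIELDS = [
    Field("ctl", 0, 4, "R/W"),
    Field("status", 4, 4, "R"),
    Field("rsvd", 8, 4, "R0"),
    Field("sticky", 12, 1, "R/W0"),
]
```

With this layout, writing all ones to a register holding `0x1050` changes only the R/W field, while writing all zeros also clears the R/W0 bit. Writing back the value just read leaves every field unchanged, which matches the recommendation for R0 fields.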

# **1.5 For More Information**

MIPS processor manuals and additional information about MIPS products can be found at http://www.o kru.com.


# **Chapter 2**

# **Guide to the Instruction Set**

This chapter provides a detailed guide to understanding the instruction descriptions, which are listed in alphabetical order in the tables at the beginning of the next chapter.

# 2.1 Understanding the Instruction Fields

Figure 2.1 shows an example instruction. Following the figure are descriptions of the fields listed below:

- "Instruction Fields"
- "Instruction Descriptive Name and Mnemonic"
- "Format Field"
- "Purpose Field"
- "Description Field"
- "Restrictions Field"
- "Operation Field"
- "Exceptions Field"
- "Programming Notes and Implementation Notes Fields"

| Callout | Example instruction content |
|---------|-----------------------------|
| Instruction mnemonic and descriptive name | Example Instruction Name, EXAMPLE |
| Instruction encoding: constant and variable fields (bit range, width) | 31:26 SPECIAL 000000 (6 bits); 25:21 0 (5 bits); 20:16 rt (5 bits); 15:11 rd (5 bits); 10:6 0 00000 (5 bits); 5:0 EXAMPLE 000000 (6 bits) |
| Assembler format(s) for each definition | <b>Format:</b> EXAMPLE rd, rs, rt |
| Architecture level at which instruction was defined/redefined | MIPS32 |
| Short description | <b>Purpose:</b> Example Instruction Name<br>To execute an EXAMPLE op. |
| Symbolic description | <b>Description:</b> GPR[rd] ← GPR[rs] exampleop GPR[rt] |
| Full description of instruction operation | This section describes the [...] and illustrations. It [...] information that [...] section. |
| Restrictions on instruction and operands | <b>Restrictions:</b><br>[...] values of the instruction encoding fields such as register specifiers, operand values, operand formats, address [...] for addressed locations. |
| High-level language description of the instruction operation | <b>Operation:</b><br>/* This section [...] a high level pseudo language. It is precise in [...] the Description section is not, but is also missing [...] that is hard to express [...] */<br>temp ← GPR[rs] exampleop GPR[rt] |
| Exceptions that the instruction can cause | <b>Exceptions:</b><br>A list of exceptions taken by the instruction. |
| Notes for programmers | <b>Programming Notes:</b><br>Information useful to programmers, but not necessary to describe the operation of the instruction. |
| Notes for implementers | <b>Implementation Notes:</b><br>Like Programming Notes, except for processor implementors. |

# Figure 2.1 Example of Instruction Description

# 2.1.1 Instruction Fields

Fields encoding the instruction word are shown in register form at the top of the instruction description. The following rules are followed:

- The values of constant fields and the *opcode* names are listed in uppercase (SPECIAL and ADD in Figure 2.2). Constant values in a field are shown in binary below the symbolic or hexadecimal value.
- All variable fields are listed with the lowercase names used in the instruction description (*rs*, *rt*, and *rd* in Figure 2.2).
- Fields that contain zeros but are not named are unused fields that are required to be zero (bits 10:6 in Figure 2.2). If such fields are set to non-zero values, the operation of the processor is **UNPREDICTABLE**.

| 31 26             | 25 21   | 20 16   | 15 11 | 10 6       | 5 0           |
|-------------------|---------|---------|-------|------------|---------------|
| SPECIAL<br>000000 | rs      | rt      | rd    | 0<br>00000 | ADD<br>100000 |
| 6                 | 5       | 5       | 5     | 5          | 6             |

Figure 2.2 Example of Instruction Fields
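As a rough illustration of the field layout in Figure 2.2, the following Python sketch packs and unpacks the six fields of the ADD example. The names `encode_add` and `decode_fields` are hypothetical helpers introduced here for illustration; they are not part of the architecture specification.

```python
SPECIAL = 0b000000  # constant opcode field, bits 31:26
ADD = 0b100000      # constant function field, bits 5:0


def encode_add(rs: int, rt: int, rd: int) -> int:
    """Pack the ADD fields of Figure 2.2 into a 32-bit instruction word."""
    for name, reg in (("rs", rs), ("rt", rt), ("rd", rd)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must fit in a 5-bit field")
    # Bits 10:6 are the unnamed field that is required to be zero.
    return (SPECIAL << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (0 << 6) | ADD


def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the fields of Figure 2.2."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "zero": (word >> 6) & 0x1F,
        "function": word & 0x3F,
    }
```

A decoder built this way can check that the `zero` field is 0 before treating the word as ADD, since non-zero values in that field make the operation **UNPREDICTABLE**.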

# 2.1.2 Instruction Descriptive Name and Mnemonic

The instruction descriptive name and mnemonic are printed as page headings for each instruction, as shown in Figure 2.3.



#### Figure 2.3 Example of Instruction Descriptive Name and Mnemonic

| Add Word | ADD |
|----------|-----|

# 2.1.3 Format Field

The assembler formats for the instruction and the architecture level at which the instruction was originally defined are given in the *Format* field. If the instruction definition was later extended, the architecture levels at which it was extended and the assembler formats for the extended definition are shown in their order of extension (for an example, see C.cond.fmt). The MIPS architecture levels are inclusive; higher architecture levels include all instructions in previous levels. Extensions to instructions are backwards compatible. The original assembler formats are valid for the extended architecture.

#### Figure 2.4 Example of Instruction Format

| Format: | ADD rd,rs,rt | MIPS32 |
|---------|--------------|--------|

The assembler format is shown with literal parts of the assembler instruction printed in uppercase characters. The variable parts, the operands, are shown as the lowercase names of the appropriate fields.

The architectural level at which the instruction was first defined, for example "MIPS32", is shown at the right side of the page. Instructions introduced at different times by different ISA family members are indicated by markings such as "MIPS64, MIPS32 Release 2". Instructions removed by a particular architecture release are indicated in the Availability section.

There can be more than one assembler format for each architecture level. Floating point operations on formatted data show an assembly format with the actual assembler mnemonic for each valid value of the *fmt* field. For example, the ADD.fmt instruction lists both ADD.S and ADD.D.

The assembler format lines sometimes include parenthetical comments to help explain variations in the formats (once again, see C.cond.fmt). These comments are not a part of the assembler format.

The term *decoded\_immediate* is used if the immediate field is encoded within the binary format but the assembler format uses the decoded value. The term *left\_shifted\_offset* is used if the offset field is encoded within the binary format but the assembler format uses the value after the appropriate amount of left shifting.

# 2.1.4 Purpose Field

The Purpose field gives a short description of the use of the instruction.

#### Figure 2.5 Example of Instruction Purpose

Purpose: Add Word

To add 32-bit integers. If an overflow occurs, then trap.

### 2.1.5 Description Field

If a one-line symbolic description of the instruction is feasible, it appears immediately to the right of the *Description* heading. The main purpose is to show how fields in the instruction are used in the arithmetic or logical operation.

#### Figure 2.6 Example of Instruction Description

| <b>Description:</b> GPR[rd] ← GPR[rs] + GPR[rt]                                                                                                            |
|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| The 32-bit word value in GPR <i>rt</i> is added to the 32-bit value in GPR <i>rs</i> to produce a 32-bit result.                                           |
| • If the addition results in 32-bit 2's complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. |
| • If the addition does not overflow, the 32-bit result is placed into GPR <i>rd</i> .                                                                      |

The body of the section is a description of the operation of the instruction in text, tables, and figures. This description complements the high-level language description in the *Operation* section.

This section uses acronyms for register descriptions. "GPR rt" is the CPU general-purpose register specified by the instruction field rt. "FPR fs" is the floating point operand register specified by the instruction field fs. "CP1 register fd" is the coprocessor 1 general register specified by the instruction field fd. "FCSR" is the floating point Control/Status register.

### 2.1.6 Restrictions Field

The *Restrictions* field documents any possible restrictions that may affect the instruction. Most restrictions fall into one of the following six categories:

- Valid values for instruction fields (for example, see floating point ADD.fmt)
- Alignment requirements for memory addresses (for example, see LW)
- Valid values of operands (for example, see ALNV.PS)
- Valid operand formats (for example, see floating point ADD.fmt)
- Order of instructions necessary to guarantee correct execution. These ordering constraints avoid pipeline hazards for which some processors do not have hardware interlocks (for example, see MUL).
- Valid memory access types (for example, see LL/SC)

#### Figure 2.7 Example of Instruction Restrictions

| Restrictions: |
|---------------|
| None          |

# 2.1.7 Availability and Compatibility Fields

The *Availability* and *Compatibility* sections are not provided for all instructions. These sections list considerations relevant to whether and how an implementation may implement some instructions, when software may use such instructions, and how software can determine if an instruction or feature is present. Such considerations include:

- Some instructions are not present on all architecture releases. Sometimes the implementation is required to signal a Reserved Instruction exception, but sometimes executing such an instruction encoding is architecturally defined to give UNPREDICTABLE results.
- Some instructions are available for implementations of a particular architecture release, but may be provided only if an optional feature is implemented. Control register bits typically allow software to determine if the feature is present.
- Some instructions may not behave the same way on all implementations. Typically this involves behavior that was UNPREDICTABLE in some implementations, but which is made architectural and guaranteed consistent so that software can rely on it in subsequent architecture releases.
- Some instructions are prohibited for certain architecture releases and/or optional feature combinations.
- Some instructions may be removed for certain architecture releases. Implementations may then be required to signal a Reserved Instruction exception for the removed instruction encoding; but sometimes the instruction encoding is reused for other instructions.

All of these considerations may apply to the same instruction. If such considerations applicable to an instruction are simple, the architecture level in which an instruction was defined or redefined in the *Format* field, and/or the *Restrictions* section, may be sufficient; but if the set of such considerations applicable to an instruction is complicated, the *Availability* and *Compatibility* sections may be provided.

## 2.1.8 Operation Field

The *Operation* field describes the operation of the instruction as pseudocode in a high-level language notation resembling Pascal. This formal description complements the *Description* section; it is not complete in itself because many of the restrictions are either difficult to include in the pseudocode or are omitted for legibility.

Figure 2.8 Example of Instruction Operation

```
Operation:
    temp ← (GPR[rs]_31 || GPR[rs]_{31..0}) + (GPR[rt]_31 || GPR[rt]_{31..0})
    if temp_32 ≠ temp_31 then
       SignalException(IntegerOverflow)
    else
       GPR[rd] ← temp
    endif
```

See 2.2 "Operation Section Notation and Functions" on page 15 for more information on the formal notation used here.
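The overflow test in Figure 2.8 can be sketched in Python as follows. This is an illustrative model only: `add_word` and `IntegerOverflow` are hypothetical names, register values are passed as unsigned 32-bit integers, and the sketch assumes that writing the 33-bit `temp` to GPR[rd] keeps its low 32 bits.

```python
MASK32 = 0xFFFFFFFF
MASK33 = 0x1FFFFFFFF


class IntegerOverflow(Exception):
    """Stands in for SignalException(IntegerOverflow)."""


def extend_to_33_bits(value: int) -> int:
    # GPR_31 || GPR_{31..0}: prepend a copy of the sign bit.
    value &= MASK32
    return value | (((value >> 31) & 1) << 32)


def add_word(rs_value: int, rt_value: int) -> int:
    """Return the 32-bit sum, or raise IntegerOverflow as in Figure 2.8."""
    temp = (extend_to_33_bits(rs_value) + extend_to_33_bits(rt_value)) & MASK33
    # Overflow occurred if bit 32 and bit 31 of the 33-bit sum differ.
    if ((temp >> 32) & 1) != ((temp >> 31) & 1):
        raise IntegerOverflow("32-bit 2's complement overflow")
    # Assumption: only the low 32 bits of temp reach the destination register.
    return temp & MASK32
```

Adding a duplicated sign bit before the addition is what lets a single bit comparison detect overflow in both the positive and the negative direction.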

## 2.1.9 Exceptions Field

The *Exceptions* field lists the exceptions that can be caused by *Operation* of the instruction. It omits exceptions that can be caused by the instruction fetch, for instance, TLB Refill, and also omits exceptions that can be caused by asynchronous external events such as an Interrupt. Although a Bus Error exception may be caused by the operation of a load or store instruction, this section does not list Bus Error for load and store instructions because the relationship between load and store instructions and external error indications, like Bus Error, is dependent upon the implementation.

#### Figure 2.9 Example of Instruction Exception

| Exceptions:      |
|------------------|
| Integer Overflow |

An instruction may cause implementation-dependent exceptions that are not present in the Exceptions section.

#### 2.1.10 Programming Notes and Implementation Notes Fields

The *Notes* sections contain material that is useful for programmers and implementors, respectively, but that is not necessary to describe the instruction and does not belong in the description sections.

#### Figure 2.10 Example of Instruction Programming Notes

#### **Programming Notes:**

ADDU performs the same arithmetic operation but does not trap on overflow.

# 2.2 Operation Section Notation and Functions

In an instruction description, the *Operation* section uses a high-level language notation to describe the operation performed by each instruction. Special symbols used in the pseudocode are described in the previous chapter. Specific pseudocode functions are described below.

This section presents information about the following topics:

- "Instruction Execution Ordering"
- "Pseudocode Functions"

## 2.2.1 Instruction Execution Ordering

Each of the high-level language statements in the *Operation* section is executed sequentially (except as constrained by conditional and loop constructs).

### 2.2.2 Pseudocode Functions

There are several functions used in the pseudocode descriptions. These are used either to make the pseudocode more readable, to abstract implementation-specific behavior, or both.

These functions are defined in this section, and include the following:

- "Coprocessor General Register Access Functions"
- "Memory Operation Functions"
- "Floating Point Functions"
- "Instruction Mode Checking Functions"
- "Miscellaneous Functions"

#### 2.2.2.1 Coprocessor General Register Access Functions

Defined coprocessors, except for CP0, have instructions to exchange words and doublewords between coprocessor general registers and the rest of the system. What a coprocessor does with a word or doubleword supplied to it and how a coprocessor supplies a word or doubleword is defined by the coprocessor itself. This behavior is abstracted into the functions described in this section.

#### 2.2.2.1.1 COP\_LW

The COP\_LW function defines the action taken by coprocessor *z* when supplied with a word from memory during a load word operation. The action is coprocessor-specific. The typical action would be to store the contents of *memword* in coprocessor general register *rt*.

#### Figure 2.11 COP\_LW Pseudocode Function

COP\_LW (z, rt, memword)
 z: The coprocessor unit number
 rt: Coprocessor general register specifier
 memword: A 32-bit word value supplied to the coprocessor

/\* Coprocessor-dependent action \*/

endfunction COP\_LW

#### 2.2.2.1.2 COP\_LD

The COP\_LD function defines the action taken by coprocessor z when supplied with a doubleword from memory during a load doubleword operation. The action is coprocessor-specific. The typical action would be to store the contents of memdouble in coprocessor general register *rt*.

#### Figure 2.12 COP\_LD Pseudocode Function

COP\_LD (z, rt, memdouble)
z: The coprocessor unit number
rt: Coprocessor general register specifier
memdouble: 64-bit doubleword value supplied to the coprocessor.

/\* Coprocessor-dependent action \*/

endfunction COP\_LD

#### 2.2.2.1.3 COP\_SW

The COP\_SW function defines the action taken by coprocessor z to supply a word of data during a store word operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order word in coprocessor general register rt.

#### Figure 2.13 COP\_SW Pseudocode Function

```
dataword ← COP_SW (z, rt)
    z: The coprocessor unit number
    rt: Coprocessor general register specifier
    dataword: 32-bit word value
    /* Coprocessor-dependent action */
endfunction COP_SW
```

#### 2.2.2.1.4 COP\_SD

The COP\_SD function defines the action taken by coprocessor z to supply a doubleword of data during a store doubleword operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order doubleword in coprocessor general register rt.

#### Figure 2.14 COP\_SD Pseudocode Function

```
datadouble ← COP_SD (z, rt)
    z: The coprocessor unit number
    rt: Coprocessor general register specifier
    datadouble: 64-bit doubleword value
    /* Coprocessor-dependent action */
endfunction COP_SD
```

#### 2.2.2.1.5 CoprocessorOperation

The CoprocessorOperation function performs the specified Coprocessor operation.

#### Figure 2.15 CoprocessorOperation Pseudocode Function

CoprocessorOperation (z, cop\_fun)

/\* z: Coprocessor unit number \*/

/\* cop\_fun: Coprocessor function from function field of instruction \*/

/\* Transmit the cop\_fun value to coprocessor z \*/

endfunction CoprocessorOperation

#### 2.2.2.2 Memory Operation Functions

Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or doubleword is the smallest byte address of the bytes that form the object. For big-endian ordering this is the most-significant byte; for a little-endian ordering this is the least-significant byte.

In the *Operation* pseudocode for load and store operations, the following functions summarize the handling of virtual addresses and the access of physical memory. The size of the data item to be loaded or stored is passed in the *AccessLength* field. The valid constant names and values are shown in Table 2.1. The bytes within the addressed unit of memory (word for 32-bit processors or doubleword for 64-bit processors) that are used can be determined directly from the *AccessLength* and the two or three low-order bits of the address.

#### 2.2.2.2.1 Misaligned Support

MIPS processors originally required all memory accesses to be naturally aligned. MSA (the MIPS SIMD Architecture) supported misaligned memory accesses for its 128-bit packed SIMD vector loads and stores, from its introduction in MIPS Release 5. Release 6 requires systems to provide support for misaligned memory accesses for all ordinary memory reference instructions: the system must provide a mechanism to complete a misaligned memory reference for these instructions, ranging from full execution in hardware to trap-and-emulate.

The pseudocode function MisalignedSupport encapsulates the version number check to determine if misalignment is supported for an ordinary memory access.

#### Figure 2.16 MisalignedSupport Pseudocode Function

```
predicate ← MisalignedSupport ()
    return Config.AR ≥ 2 // Architecture Revision 2 corresponds to MIPS Release 6.
endfunction MisalignedSupport
```

See Appendix B, "Misaligned Memory Accesses" on page 511 for a more detailed discussion of misalignment, including pseudocode functions for the actual misaligned memory access.

#### 2.2.2.2.2 AddressTranslation

The AddressTranslation function translates a virtual address to a physical address and its cacheability and coherency attribute, describing the mechanism used to resolve the memory reference.

Given the virtual address *vAddr*, and whether the reference is to Instructions or Data (*IorD*), find the corresponding physical address (*pAddr*) and the cacheability and coherency attribute (*CCA*) used to resolve the reference. If the virtual address is in one of the unmapped address spaces, the physical address and *CCA* are determined directly by the virtual address. If the virtual address is in one of the mapped address spaces then the TLB or fixed mapping MMU determines the physical address and access type; if the required translation is not present in the TLB or the desired access is not permitted, the function fails and an exception is taken.

#### Figure 2.17 AddressTranslation Pseudocode Function

```
(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS)
    /* pAddr: physical address */
    /* CCA:   Cacheability&Coherency Attribute, the method used to access caches */
    /*        and memory and resolve the reference */
    /* vAddr: virtual address */
    /* IorD:  Indicates whether access is for INSTRUCTION or DATA */
    /* LorS:  Indicates whether access is for LOAD or STORE */
    /* See the address translation description for the appropriate MMU */
    /* type in Volume III of this book for the exact translation mechanism */
endfunction AddressTranslation
```

#### 2.2.2.2.3 LoadMemory

The LoadMemory function loads a value from memory.

This action uses cache and main memory as specified in both the Cacheability and Coherency Attribute (*CCA*) and the access (*IorD*) to find the contents of *AccessLength* memory bytes, starting at physical location *pAddr*. The data is returned in a fixed-width naturally aligned memory element (*MemElem*). The low-order 2 (or 3) bits of the address and the *AccessLength* indicate which of the bytes within *MemElem* need to be passed to the processor. If the memory access type of the reference is *uncached*, only the referenced bytes are read from memory and marked as valid within the memory element. If the access type is *cached* but the data is not present in cache, an implementation-specific *size* and *alignment* block of memory is read and loaded into the cache to satisfy a load reference. At a minimum, this block is the entire memory element.

#### Figure 2.18 LoadMemory Pseudocode Function

```
MemElem ← LoadMemory (CCA, AccessLength, pAddr, vAddr, IorD)
    /* MemElem:      Data is returned in a fixed width with a natural alignment. The */
    /*               width is the same size as the CPU general-purpose register, */
    /*               32 or 64 bits, aligned on a 32- or 64-bit boundary, */
    /*               respectively. */
    /* CCA:          Cacheability&Coherency Attribute = method used to access caches */
    /*               and memory and resolve the reference */
    /* AccessLength: Length, in bytes, of access */
    /* pAddr:        physical address */
    /* vAddr:        virtual address */
    /* IorD:         Indicates whether access is for Instructions or Data */
endfunction LoadMemory
```
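The way the low-order address bits and *AccessLength* select bytes within a memory element can be sketched as follows. This is a simplified model under stated assumptions: memory is a flat Python `bytes` object indexed by physical address, caches and the *CCA* are ignored, *AccessLength* uses the Table 2.1 encoding (number of bytes minus one), and the function names are hypothetical.

```python
HALFWORD = 1  # AccessLength encodings from Table 2.1 (bytes - 1)
WORD = 3


def load_memory(memory: bytes, p_addr: int, access_length: int,
                elem_bytes: int = 4) -> bytes:
    """Return the naturally aligned memory element that contains p_addr."""
    offset = p_addr & (elem_bytes - 1)  # low-order 2 (or 3) address bits
    if offset + access_length + 1 > elem_bytes:
        raise ValueError("access crosses a memory element boundary")
    base = p_addr - offset
    return bytes(memory[base:base + elem_bytes])


def extract_value(mem_elem: bytes, p_addr: int, access_length: int,
                  big_endian: bool) -> int:
    """Pick the referenced bytes out of mem_elem and form an integer."""
    offset = p_addr & (len(mem_elem) - 1)
    count = access_length + 1
    lanes = mem_elem[offset:offset + count]
    # The object's address is its smallest byte address: the most-significant
    # byte for big-endian ordering, the least-significant for little-endian.
    return int.from_bytes(lanes, "big" if big_endian else "little")
```

For a halfword at address 6 in a 32-bit element, the sketch returns the element at addresses 4 to 7 and uses only bytes 6 and 7 to form the value.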

#### 2.2.2.2.4 StoreMemory

The StoreMemory function stores a value to memory.

The specified data is stored into the physical location *pAddr* using the memory hierarchy (data caches and main memory) as specified by the Cacheability and Coherency Attribute (*CCA*). The *MemElem* contains the data for an aligned, fixed-width memory element (a word for 32-bit processors, a doubleword for 64-bit processors), though only the bytes that are actually stored to memory need be valid. The low-order two (or three) bits of *pAddr* and the *AccessLength* field indicate which of the bytes within the *MemElem* data should be stored; only these bytes in memory will actually be changed.

#### Figure 2.19 StoreMemory Pseudocode Function

```
StoreMemory (CCA, AccessLength, MemElem, pAddr, vAddr)
    /* CCA:          Cacheability&Coherency Attribute, the method used to access */
    /*               caches and memory and resolve the reference. */
    /* AccessLength: Length, in bytes, of access */
    /* MemElem:      Data in the width and alignment of a memory element. */
    /*               The width is the same size as the CPU general */
    /*               purpose register, either 4 or 8 bytes, */
    /*               aligned on a 4- or 8-byte boundary. For a */
    /*               partial-memory-element store, only the bytes that will be */
    /*               stored must be valid. */
    /* pAddr:        physical address */
    /* vAddr:        virtual address */
endfunction StoreMemory
```
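A partial-memory-element store can be modeled with a short sketch. It assumes a flat `bytearray` as physical memory, ignores caches and the *CCA*, and takes *AccessLength* in the Table 2.1 encoding (number of bytes minus one); `store_memory` is a hypothetical name.

```python
def store_memory(memory: bytearray, mem_elem: bytes, p_addr: int,
                 access_length: int, elem_bytes: int = 4) -> None:
    """Write only the bytes of mem_elem selected by p_addr and access_length."""
    if len(mem_elem) != elem_bytes:
        raise ValueError("mem_elem must be exactly one memory element wide")
    offset = p_addr & (elem_bytes - 1)  # low-order 2 (or 3) bits of pAddr
    count = access_length + 1
    if offset + count > elem_bytes:
        raise ValueError("access crosses a memory element boundary")
    # Only the selected byte lanes change; the rest of the element in memory
    # keeps its previous contents, so the other lanes of mem_elem may hold
    # arbitrary data.
    memory[p_addr:p_addr + count] = mem_elem[offset:offset + count]
```

Because only the selected slice is written, a caller can pass a full-width element in which just the stored bytes are valid.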

#### 2.2.2.2.5 Prefetch

The Prefetch function prefetches data from memory.

Prefetch is an advisory instruction for which an implementation-specific action is taken. The action taken may increase performance but must not change the meaning of the program or alter architecturally visible state.

#### Figure 2.20 Prefetch Pseudocode Function

```
Prefetch (CCA, pAddr, vAddr, DATA, hint)
    /* CCA:   Cacheability&Coherency Attribute, the method used to access */
    /*        caches and memory and resolve the reference. */
    /* pAddr: physical address */
    /* vAddr: virtual address */
    /* DATA:  Indicates that access is for DATA */
    /* hint:  hint that indicates the possible use of the data */
endfunction Prefetch
```

Table 2.1 lists the data access lengths and their labels for loads and stores.

| AccessLength Name | Value | Meaning           |
|-------------------|-------|-------------------|
| DOUBLEWORD        | 7     | 8 bytes (64 bits) |
| SEPTIBYTE         | 6     | 7 bytes (56 bits) |
| SEXTIBYTE         | 5     | 6 bytes (48 bits) |
| QUINTIBYTE        | 4     | 5 bytes (40 bits) |
| WORD              | 3     | 4 bytes (32 bits) |
| TRIPLEBYTE        | 2     | 3 bytes (24 bits) |
| HALFWORD          | 1     | 2 bytes (16 bits) |
| BYTE              | 0     | 1 byte (8 bits)   |

 Table 2.1 AccessLength Specifications for Loads/Stores

#### 2.2.2.2.6 SyncOperation

The SyncOperation function orders loads and stores to synchronize shared memory.

This action makes the effects of the synchronizable loads and stores indicated by *stype* occur in the same order for all processors.

#### Figure 2.21 SyncOperation Pseudocode Function

SyncOperation(stype)

/\* stype: Type of load/store ordering to perform. \*/

/\* Perform implementation-dependent operation to complete the \*/

/\* required synchronization operation \*/

endfunction SyncOperation

#### 2.2.2.3 Floating Point Functions

The pseudocode shown below specifies how the unformatted contents loaded or moved to CP1 registers are interpreted to form a formatted value. If an FPR contains a value in some format, rather than unformatted contents from a load (uninterpreted), it is valid to interpret the value in that format (but not to interpret it in a different format).

#### 2.2.2.3.1 ValueFPR

The ValueFPR function returns a formatted value from the floating point registers.

#### Figure 2.22 ValueFPR Pseudocode Function

```
value ← ValueFPR(fpr, fmt)
    /* value: The formatted value from the FPR */
    /* fpr:   The FPR number */
    /* fmt:   The format of the data, one of: */
    /*        S, D, W, L, PS, */
    /*        OB, QH, */
    /*        UNINTERPRETED_WORD, */
    /*        UNINTERPRETED_DOUBLEWORD */
    /* The UNINTERPRETED values are used to indicate that the datatype */
    /* is not known as, for example, in SWC1 and SDC1 */
    case fmt of
        S, W, UNINTERPRETED_WORD:
            valueFPR ← FPR[fpr]
        D, UNINTERPRETED_DOUBLEWORD:
            if (FP32RegistersMode = 0)
                if (fpr_0 ≠ 0) then
                    valueFPR ← UNPREDICTABLE
                else
                    valueFPR ← FPR[fpr+1]_{31..0} || FPR[fpr]_{31..0}
                endif
            else
                valueFPR ← FPR[fpr]
            endif
        L:
            if (FP32RegistersMode = 0) then
                valueFPR ← UNPREDICTABLE
            else
                valueFPR ← FPR[fpr]
            endif
        DEFAULT:
            valueFPR ← UNPREDICTABLE
    endcase
endfunction ValueFPR
```

The pseudocode shown below specifies the way a binary encoding representing a formatted value is stored into CP1 registers by a computational or move operation. This binary representation is visible to store or move-from instructions. Once an FPR receives a value from StoreFPR(), it is not valid to interpret the value with ValueFPR() in a different format.

#### 2.2.2.3.2 StoreFPR

#### Figure 2.23 StoreFPR Pseudocode Function

```
StoreFPR (fpr, fmt, value)
    /* fpr:   The FPR number */
    /* fmt:   The format of the data, one of: */
    /*        S, D, W, L, PS, */
    /*        OB, QH, */
    /*        UNINTERPRETED_WORD, */
    /*        UNINTERPRETED_DOUBLEWORD */
    /* value: The formatted value to be stored into the FPR */
    /* The UNINTERPRETED values are used to indicate that the datatype */
    /* is not known as, for example, in LWC1 and LDC1 */
    case fmt of
        S, W, UNINTERPRETED_WORD:
            FPR[fpr] ← value
        D, UNINTERPRETED_DOUBLEWORD:
            if (FP32RegistersMode = 0)
                if (fpr_0 ≠ 0) then
                    UNPREDICTABLE
                else
                    FPR[fpr] ← UNPREDICTABLE^32 || value_{31..0}
                    FPR[fpr+1] ← UNPREDICTABLE^32 || value_{63..32}
                endif
            else
                FPR[fpr] ← value
            endif
        L:
            if (FP32RegistersMode = 0) then
                UNPREDICTABLE
            else
                FPR[fpr] ← value
            endif
    endcase
endfunction StoreFPR
```
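The register-pairing behavior of StoreFPR can be sketched in Python. The model below makes several assumptions that are not part of the specification: the register file is a plain list of integers, **UNPREDICTABLE** operations raise a hypothetical `Unpredictable` exception, **UNPREDICTABLE** upper bits are modeled as zero, and `fpr_0 ≠ 0` is read as "bit 0 of the register number is set" (an odd register number). Formats other than those shown in the figure are rejected.

```python
class Unpredictable(Exception):
    """Marks an operation whose result the architecture leaves UNPREDICTABLE."""


def store_fpr(fpr_file: list, fpr: int, fmt: str, value: int,
              fp32_registers_mode: int) -> None:
    """Model of StoreFPR for the S, W, D, L and UNINTERPRETED formats."""
    if fmt in ("S", "W", "UNINTERPRETED_WORD"):
        fpr_file[fpr] = value
    elif fmt in ("D", "UNINTERPRETED_DOUBLEWORD"):
        if fp32_registers_mode == 0:
            if fpr & 1:
                raise Unpredictable("doubleword store to an odd register")
            # The doubleword is split across an even/odd register pair.
            fpr_file[fpr] = value & 0xFFFFFFFF
            fpr_file[fpr + 1] = (value >> 32) & 0xFFFFFFFF
        else:
            fpr_file[fpr] = value
    elif fmt == "L":
        if fp32_registers_mode == 0:
            raise Unpredictable("L format with 32-bit FP registers")
        fpr_file[fpr] = value
    else:
        raise ValueError(f"format {fmt!r} is not covered by this sketch")
```

With `fp32_registers_mode` equal to 0, a 64-bit value stored to register 2 lands in registers 2 (low word) and 3 (high word).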

#### 2.2.2.3.3 CheckFPException

The pseudocode shown below checks for an enabled floating point exception and conditionally signals the exception.

#### Figure 2.24 CheckFPException Pseudocode Function

```
CheckFPException()
    /* A floating point exception is signaled if the E bit of the Cause field is a 1 */
    /* (Unimplemented Operations have no enable) or if any bit in the Cause field */
    /* and the corresponding bit in the Enable field are both 1 */
    if ( (FCSR_17 = 1) or
         ((FCSR_{16..12} and FCSR_{11..7}) ≠ 0) ) then
        SignalException(FloatingPointException)
    endif
endfunction CheckFPException
```
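The bit test in Figure 2.24 translates directly into shifts and masks. The sketch below takes the FCSR as a plain integer and raises a hypothetical `FloatingPointException` in place of SignalException; the bit positions are those used in the figure.

```python
class FloatingPointException(Exception):
    """Stands in for SignalException(FloatingPointException)."""


def check_fp_exception(fcsr: int) -> None:
    """Raise if the E cause bit is set or any cause bit has its enable set."""
    unimplemented = (fcsr >> 17) & 1   # FCSR bit 17: E bit, has no enable
    cause = (fcsr >> 12) & 0x1F        # FCSR bits 16..12
    enables = (fcsr >> 7) & 0x1F       # FCSR bits 11..7
    if unimplemented == 1 or (cause & enables) != 0:
        raise FloatingPointException
```

A cause bit alone does not signal; it must line up with the enable bit five positions below it.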

#### 2.2.2.3.4 FPConditionCode

The FPConditionCode function returns the value of a specific floating point condition code.

#### Figure 2.25 FPConditionCode Pseudocode Function

```
tf ← FPConditionCode(cc)
    /* tf: The value of the specified condition code */
    /* cc: The Condition code number in the range 0..7 */
    if cc = 0 then
        FPConditionCode ← FCSR_23
    else
        FPConditionCode ← FCSR_{24+cc}
    endif
endfunction FPConditionCode
```
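As a small illustration of the non-contiguous condition code layout in Figure 2.25, the following sketch reads a condition code from an FCSR value passed as an integer. The function name is hypothetical.

```python
def fp_condition_code(fcsr: int, cc: int) -> int:
    """Return floating point condition code cc (0..7) from an FCSR value."""
    if not 0 <= cc <= 7:
        raise ValueError("cc must be in the range 0..7")
    # Condition code 0 lives in bit 23; codes 1..7 live in bits 25..31.
    bit = 23 if cc == 0 else 24 + cc
    return (fcsr >> bit) & 1
```

Bit 24 is skipped by this mapping, so setting it does not affect any condition code returned by the sketch.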

#### 2.2.2.3.5 SetFPConditionCode

The SetFPConditionCode function writes a new value to a specific floating point condition code.

#### Figure 2.26 SetFPConditionCode Pseudocode Function

endfunction SetFPConditionCode

#### 2.2.2.4 Instruction Mode Checking Functions

#### 2.2.2.4.1 Are64BitFPOperationsEnabled

The Are64BitFPOperationsEnabled function is used to determine if a 64-bit floating point instruction may be executed (and conversely, whether a Reserved Instruction exception should be signaled). On a Release 1 processor, such operations are never enabled and this function returns 0. On a Release 2 processor, which supports a 64-bit FPU on a 32-bit processor (and therefore, on a 64-bit processor running with 64-bit operations disabled), the function simply checks the *F64* bit in the *FIR* register.

#### Figure 2.27 Are64BitFPOperationsEnabled Pseudocode Function

```
enabled ← Are64BitFPOperationsEnabled()
    /* enabled: true if 64-bit floating point operations are enabled; */
    /* false if they are not */
    if (ArchitectureRevision() ≥ 2) then
        Are64BitFPOperationsEnabled ← FIR_F64
    else
        Are64BitFPOperationsEnabled ← 0
    endif
endfunction Are64BitFPOperationsEnabled
```

#### 2.2.2.4.2 IsCoprocessorEnabled

The IsCoprocessorEnabled function is used to determine if access is available to one of the four coprocessors. This is primarily done by looking at the value of the appropriate CU bit in the *Status* register, but this is complicated by the fact that access to coprocessor 0 is also enabled if the processor is running in Kernel Mode or Debug Mode.

#### Figure 2.28 IsCoprocessorEnabled Pseudocode Function

```
enabled ← IsCoprocessorEnabled(z)
    /* enabled: true if the coprocessor is enabled; false if it is not */
    /* z: The coprocessor unit number in the range 0..3 */
    case z of
        0:
            IsCoprocessorEnabled ←
                (Status_KSU = 0b00) or (Debug_DM = 1) or
                (Status_EXL = 1) or (Status_ERL = 1)
        1:
            IsCoprocessorEnabled ← (Status_CU1 = 1)
        2:
            IsCoprocessorEnabled ← (Status_CU2 = 1)
        3:
            IsCoprocessorEnabled ← (Status_CU3 = 1)
    endcase
endfunction IsCoprocessorEnabled
```
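The case analysis in Figure 2.28 can be sketched as follows. To avoid assuming register bit positions, the relevant *Status* and *Debug* fields are passed as separate attributes of a hypothetical `ModeState` record; the sketch mirrors the figure as written and does not model anything beyond it.

```python
from dataclasses import dataclass


@dataclass
class ModeState:
    """Hypothetical snapshot of the fields that Figure 2.28 reads."""
    ksu: int = 0       # Status.KSU (0b00 means Kernel Mode)
    exl: int = 0       # Status.EXL
    erl: int = 0       # Status.ERL
    cu1: int = 0       # Status.CU1
    cu2: int = 0       # Status.CU2
    cu3: int = 0       # Status.CU3
    debug_dm: int = 0  # Debug.DM


def is_coprocessor_enabled(state: ModeState, z: int) -> bool:
    """Return True if coprocessor z (0..3) is accessible in the given state."""
    if z == 0:
        return (state.ksu == 0b00 or state.debug_dm == 1
                or state.exl == 1 or state.erl == 1)
    if z in (1, 2, 3):
        return getattr(state, f"cu{z}") == 1
    raise ValueError("z must be in the range 0..3")
```

For example, `is_coprocessor_enabled(ModeState(ksu=0b10, exl=1), 0)` is true because EXL forces kernel-level access even though KSU selects a less privileged mode.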

#### 2.2.2.4.3 IsCoprocessor2Implemented

The IsCoprocessor2Implemented function is used to determine if coprocessor 2 is implemented. This is determined by the state of the *C2* bit in the *Config1* register.

#### Figure 2.29 IsCoprocessor2 Pseudocode Function

endfunction IsCoprocessor2Implemented

#### 2.2.2.4.4 IsEJTAGImplemented

The IsEJTAGImplemented function is used to determine if EJTAG is implemented by the processor. This is determined by the state of the *EP* bit in the *Config1* register.

#### Figure 2.30 IsEJTAGImplemented Pseudocode Function

```
impl ← IsEJTAGImplemented()
    /* impl: true if EJTAG is implemented; false if it is not */
    IsEJTAGImplemented ← Config1_EP
endfunction IsEJTAGImplemented
```

#### 2.2.2.4.5 IsFloatingPointImplemented

The IsFloatingPointImplemented function is used to determine if floating point is implemented by the processor and, additionally, whether a particular floating point datatype is implemented. Whether floating point is implemented at all is determined by the state of the *FP* bit in the *Config1* register. The determination of whether a particular datatype is implemented is done by looking at the architecture of the chip (MIPS32 or MIPS64, as determined by the *AT* field in the *Config* register), and the state of the *S*, *D*, and *PS* bits in the *FIR* coprocessor 1 register.

#### Figure 2.31 IsFloatingPointImplemented Pseudocode Function

```
impl ← IsFloatingPointImplemented(fmt)
    /* impl: true if floating point is implemented; false if it is not */
    /* fmt: The floating point datatype to be checked: */
    /*      0: Determine if any floating point datatype is implemented */
    /*      S, D, W, L, PS: Determine if a specific datatype is */
    /*                      implemented */
    if Config1<sub>FP</sub> = 0 then
        IsFloatingPointImplemented ← 0
    else
        case fmt of
            0:
                IsFloatingPointImplemented ← 1
            S:
                IsFloatingPointImplemented ← FIR<sub>S</sub>
            W:
                IsFloatingPointImplemented ←
                    ( ((ArchitectureRevision() = 1) and FIR<sub>S</sub>)
                      or
                      ((ArchitectureRevision() ≥ 2) and FIR<sub>W</sub>) )
            D:
                IsFloatingPointImplemented ← FIR<sub>D</sub>
            L: /* L datatype is valid on a MIPS64 Release 1 implementation */
               /* or on a Release 2 implementation with the L bit set in FIR */
                IsFloatingPointImplemented ←
                    ( ((ArchitectureRevision() = 1) and
                          ((Config<sub>AT</sub> = 1) or (Config<sub>AT</sub> = 2)))
                      or
                      ((ArchitectureRevision() ≥ 2) and FIR<sub>L</sub>) )
            PS:
                IsFloatingPointImplemented ← FIR<sub>PS</sub> and
                    ( ((ArchitectureRevision() = 1) and
                          ((Config<sub>AT</sub> = 1) or (Config<sub>AT</sub> = 2)))
                      or
                      (ArchitectureRevision() ≥ 2) )
        endcase
    endif
endfunction IsFloatingPointImplemented
```
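The decision logic of Figure 2.31 can be sketched in Python as follows. This is an illustration only, not part of the architecture definition. The function name, the fir dictionary, and the parameters config1\_fp, config\_at, and arch\_rev are hypothetical stand-ins for the Config1 FP bit, the FIR bits, the Config AT field, and the value returned by ArchitectureRevision(). The L case tests the FIR L bit, as the comment in the figure describes.

```python
def is_floating_point_implemented(fmt, config1_fp, fir, config_at, arch_rev):
    """Illustrative mirror of the Figure 2.31 decision logic.

    fmt        -- "0", "S", "D", "W", "L", or "PS"
    config1_fp -- Config1.FP bit (0 or 1)
    fir        -- dict of FIR bits, e.g. {"S": 1, "D": 1, "W": 1, "L": 0, "PS": 0}
    config_at  -- Config.AT field value (1 or 2 is treated as MIPS64 here)
    arch_rev   -- hypothetical stand-in for ArchitectureRevision()
    """
    if config1_fp == 0:
        return False                      # no floating point at all
    is_mips64 = config_at in (1, 2)
    if fmt == "0":
        return True                       # any datatype: FP bit alone decides
    if fmt == "S":
        return bool(fir["S"])
    if fmt == "D":
        return bool(fir["D"])
    if fmt == "W":
        return (arch_rev == 1 and bool(fir["S"])) or \
               (arch_rev >= 2 and bool(fir["W"]))
    if fmt == "L":
        return (arch_rev == 1 and is_mips64) or \
               (arch_rev >= 2 and bool(fir["L"]))
    if fmt == "PS":
        return bool(fir["PS"]) and \
               ((arch_rev == 1 and is_mips64) or arch_rev >= 2)
    raise ValueError("unknown format: %r" % (fmt,))
```

For instance, with Config1 FP set, the W datatype on a Release 2 implementation depends only on the FIR W bit, while on Release 1 it follows the FIR S bit.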

#### 2.2.2.5 Pseudocode Functions Related to Sign and Zero Extension

#### 2.2.2.5.1 Sign extension and zero extension in pseudocode

Much pseudocode uses a generic function sign\_extend without specifying the bit position from which the extension is done, when the intention is obvious, for example sign\_extend(immediate16) or sign\_extend(disp9).

However, sometimes it is necessary to specify the bit position, for example sign\_extend(temp<sub>31..0</sub>) or the more complicated (offset<sub>15</sub>)<sup>GPRLEN-(16+2)</sup> || offset || 0<sup>2</sup>.

The explicit notation sign\_extend.nbits(val) or sign\_extend(val,nbits) is suggested as a simplification. Both forms say to sign extend val as if it were an nbits-sized signed integer. The width to be sign extended to is usually apparent from context, and is usually GPRLEN, 32, or 64 bits. The previous examples then become:

```
sign_extend(temp<sub>31..0</sub>)
= sign_extend.32(temp)
```

and

```
(offset<sub>15</sub>)<sup>GPRLEN-(16+2)</sup> || offset || 0<sup>2</sup> = sign_extend.16(offset) << 2
```

Note that sign\_extend.N(value) extends from bit position N-1, if the bits are numbered 0..N-1 as is typical.

#### Figure 2.32 sign\_extend Pseudocode Functions

```
sign_extend.nbits(val) = sign_extend(val,nbits) /* syntactic equivalents */
function sign_extend(val,nbits)
    return (val<sub>nbits-1</sub>)<sup>GPRLEN-nbits</sup> || val<sub>nbits-1..0</sub>
end function
/* The earlier examples can be expressed as */
    (offset<sub>15</sub>)<sup>GPRLEN-(16+2)</sup> || offset || 0<sup>2</sup>
    = sign_extend.16(offset) << 2
/* and */
    sign_extend(temp<sub>31..0</sub>)
    = sign_extend.32(temp)
```

The same notations apply to zero\_extend, although zero extension is less common than sign extension in the MIPS ISA.

Floating point pseudocode may use notations such as zero\_extend.fmt, corresponding to the format of the FPU instruction. For example, zero\_extend.S and zero\_extend.D are equivalent to zero\_extend.32 and zero\_extend.64, respectively.

Existing pseudocode may use any of these, or other, notations.
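As a concrete reading of sign\_extend(val,nbits) and its zero\_extend counterpart, the following Python sketch models both on plain integers. It assumes GPRLEN is 32; the width parameter and the Python function names are illustrative and do not appear in the architecture pseudocode.

```python
GPRLEN = 32  # assumed register width for this sketch


def sign_extend(val, nbits, width=GPRLEN):
    """Treat val[nbits-1..0] as a signed integer and widen it to `width` bits."""
    val &= (1 << nbits) - 1                 # keep only val[nbits-1..0]
    if (val >> (nbits - 1)) & 1:            # bit nbits-1 is the sign bit
        # replicate the sign bit into bits width-1..nbits
        val |= ((1 << (width - nbits)) - 1) << nbits
    return val


def zero_extend(val, nbits):
    """Keep val[nbits-1..0]; all higher bits become zero."""
    return val & ((1 << nbits) - 1)


# The branch-offset example: sign_extend.16(offset) << 2, kept to GPRLEN bits
offset = 0xFFFF
target_offset = (sign_extend(offset, 16) << 2) & ((1 << GPRLEN) - 1)
```

With offset equal to 0xFFFF, target\_offset is 0xFFFFFFFC, which matches the concatenation form: fourteen copies of bit 15, the 16 offset bits, and two zero bits.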

#### 2.2.2.5.2 memory\_address

The pseudocode function memory\_address performs mode-dependent address space wrapping for compatibility between MIPS32 and MIPS64. It is applied to all memory references. It may be specified explicitly in some places, particularly for new memory reference instructions, but it is also declared to apply implicitly to all memory references as defined below. In addition, certain instructions that are used to calculate effective memory addresses but which are not themselves memory accesses specify memory\_address explicitly in their pseudocode.

#### Figure 2.33 memory\_address Pseudocode Function

```
function memory_address(ea)
    return ea
end function
```

On a 32-bit CPU, memory\_address returns its 32-bit effective address argument unchanged.

In addition to the use of memory\_address for all memory references (including load and store instructions, LL/SC), Release 6 extends this behavior to control transfers (branch and call instructions), and to the PC-relative address calculation instructions (ADDIUPC, AUIPC, ALUIPC). In newer instructions the function is explicit in the pseudocode.

Implicit address space wrapping for all instruction fetches is described by the following pseudocode fragment which should be considered part of instruction fetch:

#### Figure 2.34 Instruction Fetch Implicit memory\_address Wrapping

```
PC ← memory_address( PC )
( instruction_data, length ) ← instruction_fetch( PC )
/* decode and execute instruction */
```

Implicit address space wrapping for all data memory accesses is described by the following pseudocode, which is inserted at the top of the AddressTranslation pseudocode function:

#### Figure 2.35 AddressTranslation Implicit memory\_address Wrapping

```
(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS)
    vAddr ← memory_address(vAddr)
```

In addition to its use in instruction pseudocode,

#### 2.2.2.6 Miscellaneous Functions

This section lists miscellaneous functions not covered in previous sections.

#### 2.2.2.6.1 SignalException

The SignalException function signals an exception condition.

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

#### Figure 2.36 SignalException Pseudocode Function

```
SignalException(Exception, argument)
    /* Exception: The exception condition that exists. */
    /* argument: An exception-dependent argument, if any */
endfunction SignalException
```

#### 2.2.2.6.2 SignalDebugBreakpointException

The SignalDebugBreakpointException function signals a condition that causes entry into Debug Mode from non-Debug Mode.

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

#### Figure 2.37 SignalDebugBreakpointException Pseudocode Function

```
SignalDebugBreakpointException()
endfunction SignalDebugBreakpointException
```

#### 2.2.2.6.3 SignalDebugModeBreakpointException

The SignalDebugModeBreakpointException function signals a condition that causes entry into Debug Mode from Debug Mode (i.e., an exception generated while already running in Debug Mode).

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

#### Figure 2.38 SignalDebugModeBreakpointException Pseudocode Function

```
SignalDebugModeBreakpointException()
endfunction SignalDebugModeBreakpointException
```

#### 2.2.2.6.4 NullifyCurrentInstruction

The NullifyCurrentInstruction function nullifies the current instruction.

The instruction is aborted, inhibiting not only the functional effect of the instruction, but also all exceptions detected during fetch, decode, or execution of the instruction in question. For branch-likely instructions, nullification kills the instruction in the delay slot of the branch-likely instruction.

#### Figure 2.39 NullifyCurrentInstruction Pseudocode Function

```
NullifyCurrentInstruction()
endfunction NullifyCurrentInstruction
```

#### 2.2.2.6.5 PolyMult

The PolyMult function multiplies two binary polynomial coefficients.

#### Figure 2.40 PolyMult Pseudocode Function

```
PolyMult(x, y)
    temp ← 0
    for i in 0 .. 31
        if x<sub>i</sub> = 1 then
            temp ← temp xor (y<sub>(31-i)..0</sub> || 0<sup>i</sup>)
        endif
    endfor
    PolyMult ← temp
endfunction PolyMult
```
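The PolyMult pseudocode is a carry-less (XOR-based) multiplication truncated to 32 bits. A minimal Python sketch under that reading, with operands modelled as non-negative integers (the name poly\_mult is illustrative):

```python
def poly_mult(x, y):
    """Carry-less multiply of two 32-bit values, keeping the low 32 bits."""
    temp = 0
    for i in range(32):
        if (x >> i) & 1:
            # y[(31-i)..0] || 0^i is y shifted left by i, truncated to 32 bits
            temp ^= (y << i) & 0xFFFFFFFF
    return temp
```

For example, multiplying the polynomial x + 1 (0b11) by itself yields x<sup>2</sup> + 1 (0b101), because the two middle terms cancel under XOR.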

# 2.3 Op and Function Subfield Notation

In some instructions, the instruction subfields *op* and *function* can have constant 5- or 6-bit values. When reference is made to these instructions, uppercase mnemonics are used. For instance, in the floating point ADD instruction, *op*=COP1 and *function*=ADD. In other cases, a single field has both fixed and variable subfields, so the name contains both upper- and lowercase characters.

# 2.4 FPU Instructions

In the detailed description of each FPU instruction, all variable subfields in an instruction format (such as *fs*, *ft*, *immediate*, and so on) are shown in lowercase. The instruction name (such as ADD, SUB, and so on) is shown in uppercase.

For the sake of clarity, an alias is sometimes used for a variable subfield in the formats of specific instructions. For example, rs=base in the format for load and store instructions. Such an alias is always lowercase since it refers to a variable subfield.

Bit encodings for mnemonics are given in Volume I, in the chapters describing the CPU, FPU, MDMX, and MIPS16e instructions.

See "Op and Function Subfield Notation" on page 28 for a description of the op and function subfields.

# **Chapter 3**

# The nanoMIPS® DSP Application Specific Extension to the nanoMIPS32® Architecture

# 3.1 Base Architecture Requirements

The Release 6 nanoMIPS DSP Module requires the implementation of the Release 6 nanoMIPS baseline architecture for support, specifically the Instruction Set and Privileged Resource Architectures.

# 3.2 Compliance and Subsetting

Instruction subsetting is not allowed for any version of the DSP Module.

# 3.3 Introduction to the nanoMIPS® DSP Module

This document contains a complete specification of the DSP Module for the nanoMIPS32<sup>TM</sup> architecture. Statements about the DSP Module include MIPS DSP Rev1/2/3 and nanoMIPS DSP except where noted. The table entries in Chapter 4, "nanoMIPS® DSP Module Instruction Summary" on page 41 contain notations that flag the Rev2 instructions and the changes related to nanoMIPS; this information is also available in the per-instruction pages. The extension comprises new integer instructions and new state that includes new HI-LO accumulator pairs and a *DSPControl* register. 32-bit and 64-bit versions of the DSP Module exist, which can be included with 32-bit and 64-bit versions of the baseline architecture, respectively.

The Module has been designed to benefit a wide range of DSP, multimedia, and DSP-like algorithms. The performance increase from these extensions can be used to integrate DSP-like functionality into MIPS cores used in a SOC (System on Chip), potentially reducing overall system cost. The Module includes many of the typical features found in other integer-based DSP extensions, for example, support for operations on fractional data types and register SIMD (Single Instruction Multiple Data) operations such as add, subtract, multiply, shift, etc. In addition, the extension includes some key features that efficiently address specific problems often encountered in DSP applications. These include, for example, support for complex multiplication, variable bit insertion and extraction, and the implementation and use of virtual circular buffers.

This chapter contains a basic overview of the principles behind DSP application processing and the data types and structures needed to efficiently process such applications. Chapter 4, "nanoMIPS® DSP Module Instruction Summary" on page 41, contains a list of all the instructions in the DSP Module arranged by function type. Chapter 5, "Instruction Encoding" on page 57, describes the position of the new instructions in the MIPS instruction opcode map. The rest of the specification contains a complete list of all the instructions that comprise the DSP Module, and serves as a quick reference guide to all the instructions. Finally, various Appendix chapters describe how to implement and use the DSP Module instructions in some common algorithms and inner loops.

# 3.4 DSP Applications and their Requirements

The DSP Module has been designed specifically to improve the performance of a set of DSP and DSP-like applications. Table 3.1 shows these application areas sorted by the size of the data operands typically preferred by that application for internal computations. For example, raw audio data is usually signed 16-bit, but 32-bit internal calculations are often necessary for high quality audio. (Typically, an internal precision of about 28 bits may be all that is required which can be achieved using a fractional data type of the appropriate width.) There is some cross-over in some cases, which are not explicitly listed here. For example, some hand-held consumer devices may use lower precision internal arithmetic for audio processing, that is, 16-bit internal data formats may be sufficient for the quality required for hand-held devices.

#### Table 3.1 Data Size of DSP Applications

| In/Out Data Size | Internal Data Size | Applications |
|------------------|--------------------|--------------|
| 8 bits           | 8/16 bits          | <ul><li>Printer image processing.</li><li>Still JPEG processing.</li><li>Moving video processing.</li></ul> |
| 16 bits          | 16 bits            | <ul><li>Voice processing. For example, G.723.1, G.729, G.726, echo cancellation, noise cancellation, channel equalization, etc.</li><li>Soft modem processing. For example, V.92.</li><li>General DSP processing. For example, filters, correlation, convolution, etc.</li></ul> |
| 16/24 bits       | 32 bits            | <ul><li>Audio decoding and encoding. For example, MP3, AAC, SRS TruSurround, Dolby Digital Decoder, Pro Logic II, etc.</li></ul> |

# 3.5 Fixed-Point Data Types

Typical implementations of DSP algorithms use fractional fixed-point arithmetic, for reasons of size, cost, and power efficiency. Unlike floating-point arithmetic, fractional fixed-point arithmetic assumes that the position of the binary point is fixed with respect to the bits representing the fractional value in the operand. To understand this type of arithmetic further, please consult DSP textbooks or other references that are easily available on the internet.

Fractional fixed-point data types are often referred to using Q format notation. The general form for this notation is Qm.n, where Q designates that the data is in fractional fixed-point format, *m* is the number of bits used to designate the twos complement integer portion of the number, and *n* is the number of bits used to designate the twos complement fractional part of the number. Because the twos complement number is signed, the number of bits required to express a number is m+n+1, where the additional bit is required to denote the sign. In typical usage, it is very common for *m* to be zero. That is, only fractional bits are represented. In this case, a Q notation of the form Q0.*n* is abbreviated to Q*n*.

For example, a 32-bit word can be used to represent data in Q31 format, which implies one (left-most) sign bit followed by the binary point and then 31 bits representing the fractional data value. The interpretation of the 32 bits of the Q31 representation is shown in Table 3.2. Negative values are represented using the twos-complement of the equivalent positive value. This format can represent numbers in the range of -1.0 to +0.9999999999... Similarly a 16-bit halfword can be used to represent data in Q15 format, which implies one sign bit followed by 15 fractional bits that represent a value between -1.0 and +0.9999...
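The relationship between a real value and its Qn bit pattern can be sketched as a scale by 2<sup>n</sup>. The Python helpers below are illustrative only: the names to\_q and from\_q are hypothetical, and clamping out-of-range inputs is a choice made for this sketch rather than something the format itself requires.

```python
def to_q(value, n):
    """Encode a real value as an (n+1)-bit Qn two's-complement bit pattern."""
    scaled = int(round(value * (1 << n)))      # scale by 2^n
    lo, hi = -(1 << n), (1 << n) - 1           # representable integer range
    scaled = max(lo, min(hi, scaled))          # clamp (sketch-specific choice)
    return scaled & ((1 << (n + 1)) - 1)       # two's-complement bit pattern


def from_q(bits, n):
    """Decode an (n+1)-bit Qn bit pattern into a Python float."""
    if bits & (1 << n):                        # sign bit set: negative value
        bits -= 1 << (n + 1)
    return bits / (1 << n)
```

Under these assumptions, to\_q(0.5, 15) gives 0x4000 and from\_q(0x8000, 15) gives -1.0, while +1.0 cannot be encoded and clamps to 0x7FFF in Q15.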

#### Table 3.2 The Value of a Fixed-Point Q31 Number

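| Bit | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight | ± | 2<sup>-1</sup> | 2<sup>-2</sup> | 2<sup>-3</sup> | 2<sup>-4</sup> | 2<sup>-5</sup> | 2<sup>-6</sup> | 2<sup>-7</sup> | 2<sup>-8</sup> | 2<sup>-9</sup> | 2<sup>-10</sup> | 2<sup>-11</sup> | 2<sup>-12</sup> | 2<sup>-13</sup> | 2<sup>-14</sup> | 2<sup>-15</sup> | 2<sup>-16</sup> | 2<sup>-17</sup> | 2<sup>-18</sup> | 2<sup>-19</sup> | 2<sup>-20</sup> | 2<sup>-21</sup> | 2<sup>-22</sup> | 2<sup>-23</sup> | 2<sup>-24</sup> | 2<sup>-25</sup> | 2<sup>-26</sup> | 2<sup>-27</sup> | 2<sup>-28</sup> | 2<sup>-29</sup> | 2<sup>-30</sup> | 2<sup>-31</sup> |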

Table 3.3 shows the limits of the Q15 and the Q31 representations. Note that the value -1.0 can be represented exactly, but the value +1.0 cannot. For practical purposes, 0x7FFFFFFF is used to represent 1.0 inexactly. Thus, the multiplication of two values where both are -1.0 will result in an overflow since there is no representation for +1.0 in fixed-point format. Saturating instructions must check for this case and prevent the overflow by clamping the result to the maximal representable value. Instructions in the DSP Module that operate on fractional data types include a "Q" in the instruction mnemonic; the assumed size of the instruction operands is detailed in the instruction description.

| Fixed-Point<br>Representation | Definition          | Hexadecimal<br>Representation | Decimal<br>Equivalent             |
|-------------------------------|---------------------|-------------------------------|-----------------------------------|
| Q15 minimum                   | $-2^{15}/2^{15}$    | 0x8000                        | -1.0                              |
| Q15 maximum                   | $(2^{15}-1)/2^{15}$ | 0x7FFF                        | 0.999969482421875                 |
| Q31 minimum                   | $-2^{31}/2^{31}$    | 0x80000000                    | -1.0                              |
| Q31 maximum                   | $(2^{31}-1)/2^{31}$ | 0x7FFFFFFF                    | 0.9999999995343387126922607421875 |

Given a fixed-point representation, we can compute the corresponding decimal value by using bit weights per position as shown in Figure 3.1 for a hypothetical Q7 format number representation with 8 total bits.

DSP applications often, but not always, prefer to saturate the result after an arithmetic operation that causes an overflow or underflow. For operations on signed values, saturation clamps the result to the smallest negative or largest positive value in the case of underflow and overflow, respectively. For operations on unsigned values, saturation clamps the result to either zero or the maximum positive value.

| Bit weights | -2<sup>0</sup> | 2<sup>-1</sup> | 2<sup>-2</sup> | 2<sup>-3</sup> | 2<sup>-4</sup> | 2<sup>-5</sup> | 2<sup>-6</sup> | 2<sup>-7</sup> | Decimal value |
|---|---|---|---|---|---|---|---|---|---|
| Example binary value | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 2<sup>-1</sup> + 2<sup>-2</sup> + 2<sup>-5</sup> = 0.5 + 0.25 + 0.03125 = 0.78125 |
| Example binary value | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2<sup>-2</sup> + 2<sup>-3</sup> = 0.25 + 0.125 = 0.375 |
| Maximum positive value | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2<sup>-1</sup> + 2<sup>-2</sup> + 2<sup>-3</sup> + 2<sup>-4</sup> + 2<sup>-5</sup> + 2<sup>-6</sup> + 2<sup>-7</sup> = 0.5 + 0.25 + 0.125 + 0.0625 + 0.03125 + 0.015625 + 0.0078125 = 0.9921875 |
| Example binary value | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -2<sup>0</sup> + 2<sup>-2</sup> + 2<sup>-4</sup> = -1.0 + 0.25 + 0.0625 = -0.6875 |
| Maximum negative value | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2<sup>0</sup> = -1.0 |

#### Figure 3.1 Computing the Value of a Fixed-Point (Q7) Number
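The bit-weight method of Figure 3.1 amounts to summing the weight of every bit that is set, with the sign bit carrying a weight of -2<sup>0</sup>. A short Python sketch (the function name and the string-of-bits argument are illustrative choices):

```python
def q7_value(bits):
    """Value of an 8-character binary string, MSB first, read as a Q7 number."""
    # Sign bit weighs -2^0; the remaining bits weigh 2^-1 .. 2^-7.
    weights = [-1.0] + [2.0 ** -k for k in range(1, 8)]
    return sum(w for b, w in zip(bits, weights) if b == "1")
```

Applied to the rows of the figure, q7\_value("01100100") evaluates to 0.78125 and q7\_value("10101000") evaluates to -0.6875.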

# 3.6 Saturating Math

Many of the DSP Module arithmetic instructions provide optional saturation of the results, as detailed in each instruction's description.

Saturation of fixed-point addition, subtraction, or shift operations that result in an underflow or overflow requires clamping the result value to the closest available fixed-point value representable in the given number of result bits. For operations on unsigned values, underflow is clamped to zero, and overflow to the largest positive fixed-point value. For operations on signed values, underflow is clamped to the minimum negative fixed-point value and overflow to the maximum positive value.

Saturation of fractional fixed-point multiplication operations clamps the result to the maximum representable fixed-point value when both input multiplicands are equal to the minimum negative value of -1.0, which is independent of the Q format used.
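The clamping rules for addition and subtraction can be illustrated with a few Python helpers. This is a sketch on mathematical integers, not a model of any specific DSP Module instruction; the function names and the default element widths are assumptions made for the example.

```python
def sat_add_signed(a, b, nbits=16):
    """Signed add: clamp to the most negative or most positive nbits value."""
    lo = -(1 << (nbits - 1))
    hi = (1 << (nbits - 1)) - 1
    return max(lo, min(hi, a + b))


def sat_add_unsigned(a, b, nbits=8):
    """Unsigned add: overflow clamps to the largest nbits value."""
    return min((1 << nbits) - 1, a + b)


def sat_sub_unsigned(a, b, nbits=8):
    """Unsigned subtract: underflow clamps to zero."""
    return max(0, a - b)
```

For example, adding 1 to the 16-bit maximum 0x7FFF stays at 0x7FFF instead of wrapping to the most negative value, and subtracting 10 from an unsigned 5 yields 0.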

# 3.7 Conventions Used in the Instruction Mnemonics

DSP Module instructions with a **Q** in the mnemonic assume the input operands to be in fractional fixed-point format. Multiplication instructions that operate on fractional fixed-point data will not produce correct results when used with integer fixed-point data. However, addition and subtraction instructions will work correctly with either fractional fixed-point or signed integer fixed-point data.

Instructions that use unsigned data are indicated with the letter U. This letter appears after the letter Q for fractional in the instruction mnemonic. For example, the **ADDQU** instruction performs an unsigned addition of fractional data. In the MIPS base instruction set, the overflow trap distinguishes signed and unsigned arithmetic instructions. In the DSP Module, the results of saturation distinguish signed and unsigned arithmetic instructions.

Some instructions provide optional rounding up, saturation, or rounding up and saturation of the result(s). These instructions use one of the modifiers \_RS, \_R, \_S, or \_SA in their mnemonic. For example, MULQ\_RS is a multiply instruction (MUL) where the result is the same size as the input operands (indicated by the absence of E for expanded result in the mnemonic) that assumes fractional (Q) input data operands, and where the result is rounded up and saturated (\_RS) before writing the result in the destination register. (For fractional multiplication, saturation clamps the result to the maximum positive representable value if both multiplicands are equal to -1.0.) Several multiply-accumulate (dot product) instructions use a variant of the saturation flag, \_SA, indicating that the accumulated value is saturated in addition to the regular fractional multiplication saturation check.

The DSP Module instructions provide support for single-instruction, multiple data (SIMD) operations where a single instruction can invoke multiple operations on multiple data operands. As noted previously, DSP applications typically use data types that are 8, 16, or 32 bits wide. In the nanoMIPS32 architecture a general-purpose register (GPR) is 32 bits wide, and in the nanoMIPS64 architecture, 64 bits wide. Thus, each GPR can be used to hold one or more operands of each size. For example, a 64-bit GPR can store eight 8-bit operands, a 32-bit GPR can store two 16-bit operands, and so on. A GPR containing multiple data operands is referred to as a *vector*.

nanoMIPS32 implementations of the DSP Module support three basic formats for data operands: 32 bit, 16 bit, and 8 bit. The latter format is motivated by the fact that video applications typically operate on 8-bit data. The instruction mnemonics indicate the supported data types as follows:

- W = "Word", 1 × 32-bit
- PH = "Paired Halfword", 2 × 16-bit. See Figure 3.2.
- QB = "Quad Byte", 4 × 8-bit. See Figure 3.3.
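The PH and QB formats can be pictured as fixed bit slices of a 32-bit GPR value. The Python helpers below are a sketch with hypothetical names; elements are returned from the left (most-significant) end to the right (least-significant) end.

```python
def unpack_ph(reg):
    """Split a 32-bit value into (left, right) 16-bit halfwords."""
    return (reg >> 16) & 0xFFFF, reg & 0xFFFF


def unpack_qb(reg):
    """Split a 32-bit value into four bytes, most-significant first."""
    return tuple((reg >> shift) & 0xFF for shift in (24, 16, 8, 0))


def pack_ph(left, right):
    """Combine two 16-bit halfwords into one 32-bit value."""
    return ((left & 0xFFFF) << 16) | (right & 0xFFFF)
```

For the register value 0x12345678, the PH view is (0x1234, 0x5678) and the QB view is (0x12, 0x34, 0x56, 0x78).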



#### Figure 3.2 A Paired-Half (PH) Representation in a GPR for the nanoMIPS32 Architecture

#### Figure 3.3 A Quad-Byte (QB) Representation in a GPR for the nanoMIPS32 Architecture



For example, **MULQ\_RS.PH rd**, **rs**,**rt** refers to the multiply instruction (**MUL**) that multiplies two vector elements of type fractional (**Q**) 16 bit (Halfword) data (**PH**) with rounding and saturation (**\_RS**). Each source register supplies two data elements and the two results are written into the destination register in the corresponding vector position as shown in Figure 3.4.

When an instruction shows two format types, then the first is the output size and the second is the input size. For example, **PRECRQ.PH.W** is the (fractional) precision reduction instruction that creates a **PH** output format and uses **W** format as input from the two source registers. When the instruction only shows one format then this implies the same source and destination format.



#### Figure 3.4 Operation of MULQ\_RS.PH rd, rs, rt
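The element-wise behavior of MULQ\_RS.PH described above can be sketched in Python. This is an approximation for illustration, not the normative definition, which is given on the instruction page. It assumes that rounding adds half of the result's least-significant bit before the upper 16 bits of the Q31 product are kept, and it does not model any update of the *DSPControl* *ouflag* field. The function names are hypothetical.

```python
def mulq_rs_q15(a, b):
    """Multiply two Q15 bit patterns with rounding and saturation.

    Returns (result_bits, saturated).
    """
    if a == 0x8000 and b == 0x8000:
        return 0x7FFF, True                    # -1.0 * -1.0 clamps to the maximum
    sa = a - 0x10000 if a & 0x8000 else a      # reinterpret as signed 16-bit
    sb = b - 0x10000 if b & 0x8000 else b
    product = (sa * sb) << 1                   # Q15 * Q15 = Q30; shift to Q31
    product += 0x8000                          # assumed rounding: add half an LSB
    return (product >> 16) & 0xFFFF, False     # keep the upper 16 bits


def mulq_rs_ph(rs, rt):
    """Apply mulq_rs_q15 independently to the left and right halfwords."""
    left, _ = mulq_rs_q15((rs >> 16) & 0xFFFF, (rt >> 16) & 0xFFFF)
    right, _ = mulq_rs_q15(rs & 0xFFFF, rt & 0xFFFF)
    return (left << 16) | right
```

With both sources equal to 0x40008000 (0.5 in the left element and -1.0 in the right), the sketch produces 0x20007FFF: 0.25 on the left and the saturated maximum on the right.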

# 3.8 Effect of Endian-ness on Register SIMD Data

The order of data in memory and therefore in the register has a direct impact on the algorithm being executed. To reduce the effort required by the programmer and the development tools to take endian-ness into account, many of the instructions operate on pre-defined bits of a given register. The assembler can be used to map the endian-agnostic names to the actual instructions based on the endian-ness of the processor during the compilation and assembling of the instructions.

When a SIMD vector is loaded into a register or stored back to memory from a register, the endian-ness of the processor and memory has an impact on the view of the data. For example, consider a vector of eight byte values aligned in memory on a 64-bit boundary and loaded into a 64-bit register using the load double instruction: the order of the eight byte values within the register depends on the processor endian-ness. In a big-endian processor, the byte value stored at the lowest memory address is loaded into the left-most (most-significant) 8 bits of the 64-bit register. In a little-endian processor, the same byte value is loaded into the right-most (least-significant) 8 bits of the register.

In general, if the byte elements are numbered 0-7 according to their order in memory, in a big-endian configuration, element 0 is at the most-significant end and element 7 is at the least-significant end. In a little-endian configuration, the order is reversed. This effect applies to all the sizes of data when they are in SIMD format.
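The effect of endian-ness on element order can be reproduced with Python's built-in int.from\_bytes, which models a 64-bit load of eight bytes from memory. The helper names are illustrative.

```python
def load_doubleword(memory, byteorder):
    """Model a 64-bit load of eight bytes; byteorder is "big" or "little"."""
    return int.from_bytes(memory[:8], byteorder)


def elements_msb_first(reg):
    """List the eight byte elements of a 64-bit value, most-significant first."""
    return [(reg >> shift) & 0xFF for shift in range(56, -8, -8)]


memory = bytes(range(8))                  # elements 0..7 in memory order
big = load_doubleword(memory, "big")
little = load_doubleword(memory, "little")
```

Here elements\_msb\_first(big) lists the elements as 0 through 7, while elements\_msb\_first(little) lists them as 7 through 0, which is the reversal described above.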

To avoid dealing with the endian-ness issue directly, the instructions in the DSP Module simply refer to the left and right elements of the register when it is required to specify a subset of the elements. This issue can quite easily be dealt with in the assembler or user code using suitably defined mnemonics that use the appropriate instruction for a given endian-ness of the processor. A description of how to do this is specified in Appendix A.

# 3.9 Additional Register State for the DSP Module

The DSP Module adds four new registers. The operating system is required to recognize the presence of the DSP Module and to include these additional registers in context save and restore operations.

• Three additional *HI-LO* registers to create a total of four accumulator registers. Many common DSP computations involve accumulation, e.g., convolution. DSP Module instructions that target the accumulators use two bits to specify the destination accumulator, with the zero value referring to the original accumulator of the MIPS architecture.

Release 6 of the MIPS Architecture moves the accumulators into the DSP Module for use as a DSP resource exclusively.

• A new control register, *DSPControl*, is used to hold extra state bits needed for efficient support of the new instructions. Figure 3.5 illustrates the bits in this register. Table 3.4 describes the use of the various bits and the instructions that refer to the fields. Table 3.5 lists the instructions that affect the *DSPControl* register *ouflag* field.

#### Figure 3.5 MIPS® DSP Module Control Register (DSPControl) Format

| 31 28 | 27 24 | 23 16  | 15 | 14  | 13 | 12 7   | 6 | 5 0 |
|-------|-------|--------|----|-----|----|--------|---|-----|
| 0     | ccond | ouflag | 0  | EFI | c  | scount | 0 | pos |
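The field positions given for *DSPControl* can be turned into a small decoder. The Python sketch below is illustrative only: the names `DSPCONTROL_FIELDS` and `decode_dspcontrol` are hypothetical, and the input is assumed to be a 32-bit value such as one obtained with RDDSP.

```python
# Field layout of DSPControl: name -> (most-significant bit, least-significant bit).
DSPCONTROL_FIELDS = {
    "ccond": (27, 24),
    "ouflag": (23, 16),
    "EFI": (14, 14),
    "c": (13, 13),
    "scount": (12, 7),
    "pos": (5, 0),
}


def decode_dspcontrol(value):
    """Split a 32-bit DSPControl value into its named fields (hypothetical helper)."""
    fields = {}
    for name, (msb, lsb) in DSPCONTROL_FIELDS.items():
        width = msb - lsb + 1
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    return fields
```

For example, `decode_dspcontrol(1 << 13)["c"]` evaluates to 1 while every other field is 0.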

#### Table 3.4 MIPS® DSP Module Control Register (DSPControl) Field Descriptions

| Name   | Bits      | Description | Read/Write | Reset State | Compliance |
|--------|-----------|-------------|------------|-------------|------------|
| 0      | 31:28, 15 | Not used in the nanoMIPS32 architecture, but these are reserved bits since they are used in the nanoMIPS64 architecture. Must be written as zero; returns zero on read. | 0 | 0 | Required |
| ccond  | 27:24     | Condition code bits set by vector comparison instructions and used as source selectors by PICK instructions. The vector element size determines the number of bits set by a comparison (1, 2, or 4); bits not set are <b>UNPREDICTABLE</b> after the comparison. | R/W | 0 | Required |
| ouflag | 23:16     | Overflow/underflow indication bits set when the result(s) of specific instructions (listed in Table 3.5) caused, or, if optional saturation has been used, would have caused overflow or underflow. | R/W | 0 | Required |
| EFI    | 14        | Extract Fail Indicator. This bit is set to 1 when one of the extraction instructions (EXTP, EXTPV, EXTPDP, or EXTPDPV) fails. Failure occurs when there are insufficient bits to extract, i.e., when the value of the <i>pos</i> field in the <i>DSPControl</i> register is less than the <i>size</i> argument specified in the instruction. This bit is not sticky: it is set or reset after each extraction operation. | R/W | 0 | Required |
| c      | 13        | Carry bit set and used by a special add instruction used to implement a 64-bit addition across two GPRs in a nanoMIPS32 implementation. Instruction ADDSC sets the bit and instruction ADDWC uses this bit. | R/W | 0 | Required |
| scount | 12:7      | This field is used by the INSV instruction to specify the size of the bit field to be inserted. | R/W | 0 | Required |
| pos    | 5:0       | This field is used by the variable insert instruction INSV to specify the position to insert bits. It is also used to indicate the extract position for the EXTP, EXTPV, EXTPDP, and EXTPDPV instructions. The <i>decrement pos</i> (DP) variants of these instructions decrement the value of the <i>pos</i> field by the amount <i>size</i>+1 after the extraction completes successfully.<br>The MTHLIP instruction increments the value of <i>pos</i> by 32 after copying the value of LO to HI. | R/W | 0 | Required |
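The interaction between the *pos* and *EFI* fields during extraction can be sketched as follows. This Python model covers only the *DSPControl* bookkeeping described in the table, not the extracted data. The function name `extract_update` is hypothetical, and the 6-bit wrap of *pos* is an assumption of this sketch rather than something stated in this section.

```python
def extract_update(pos, size, decrement_pos):
    """Model the pos/EFI side effects of an extract instruction.

    Hypothetical helper. Returns (new_pos, efi). `decrement_pos` selects the
    DP variants (EXTPDP, EXTPDPV); the plain variants leave pos untouched.
    """
    if pos < size:
        # Insufficient bits to extract: EFI is set. pos is left unchanged,
        # since the decrement applies only to successful extractions.
        return pos, 1
    if decrement_pos:
        # DP variants step pos down by size+1 after a successful extraction.
        # Assumption: the result wraps within the 6-bit pos field.
        pos = (pos - (size + 1)) & 0x3F
    return pos, 0
```

For instance, a successful DP extraction with *pos* = 31 and *size* = 7 leaves *pos* = 23 and *EFI* = 0, while the same request with *pos* = 5 reports failure and leaves *pos* at 5.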


The bits of the overflow flag (*ouflag*) field in the *DSPControl* register are set by a number of instructions. These bits are sticky and can be reset only by an explicit write to these bits in the register (using the **WRDSP** instruction). The table below shows which bits can be set by which instructions and under what conditions.

#### Table 3.5 Instructions that set the ouflag bits in DSPControl

| Bit Number | Instructions That Set This Bit |
|------------|--------------------------------|
| 16         | Instructions that set this bit when the destination is accumulator (<i>HI-LO</i> pair) zero and an operation overflow or underflow occurs are: DPAQ_S, DPAQ_SA, DPSQ_S, DPSQ_SA, MAQ_S, MAQ_SA, MULSAQ_S, DPAQX_S, DPAQX_SA, DPSQX_S, and DPSQX_SA. |
| 17         | Instructions as above, when the destination is accumulator (<i>HI-LO</i> pair) one. |
| 18         | Instructions as above, when the destination is accumulator (<i>HI-LO</i> pair) two. |
| 19         | Instructions as above, when the destination is accumulator (<i>HI-LO</i> pair) three. |
| 20         | Instructions that on an overflow/underflow will set this bit are: ABSQ_S, ADD, ADD_S, ADDQ, ADDQ_S, ADDU, ADDU_S, ADDWC, SUB, SUB_S, SUBQ, SUBQ_S, SUBU, and SUBU_S. |
| 21         | Instructions that on an overflow/underflow will set this bit are: MUL, MUL_S, MULEQ_S, MULEU_S, MULQ_RS, and MULQ_S. |
| 22         | Instructions that on an overflow/underflow will set this bit are: PRECRQ_RS, PRECRQU_RS, SHLL, SHLL_S, SHLLV, and SHLLV_S. |
| 23         | Instructions that on an overflow/underflow will set this bit are: EXTR, EXTR_S, EXTR_RS, EXTRV, and EXTRV_RS. |
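The sticky behavior of the *ouflag* bits can be modelled in a few lines. In the Python sketch below, the category labels in `OUFLAG_MEANING` are informal summaries of the rows of Table 3.5, and all function names are hypothetical. The model assumes that clearing is done by an explicit write of the field, as with WRDSP.

```python
# Informal labels for the ouflag bits (DSPControl bits 23:16).
OUFLAG_MEANING = {
    16: "accumulator 0 dot-product/multiply-accumulate",
    17: "accumulator 1 dot-product/multiply-accumulate",
    18: "accumulator 2 dot-product/multiply-accumulate",
    19: "accumulator 3 dot-product/multiply-accumulate",
    20: "add/subtract/absolute value",
    21: "multiply",
    22: "precision reduction/left shift",
    23: "accumulator extract",
}


def record_overflow(dspcontrol, bit):
    """Set one ouflag bit, as an overflowing instruction would (sticky OR)."""
    if bit not in OUFLAG_MEANING:
        raise ValueError("not an ouflag bit")
    return dspcontrol | (1 << bit)


def clear_ouflag(dspcontrol):
    """Clear the whole ouflag field, modelling an explicit write to it."""
    return dspcontrol & ~(0xFF << 16) & 0xFFFFFFFF


def pending_overflows(dspcontrol):
    """List the categories whose ouflag bit is currently set."""
    return [OUFLAG_MEANING[b] for b in range(16, 24) if dspcontrol & (1 << b)]
```

Because the bits are sticky, software can run a block of DSP code and inspect the field once at the end, then clear it before the next block.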


# 3.10 Software Detection of the DSP Module

Bit 10 of the CP0 *Config3* register, "DSP Present" (DSPP), indicates the presence of the DSP Module Rev1, and bit 11, "DSP Rev2 Present" (DSP2P), indicates the presence of the DSP Module Rev2, as shown in Figure 3.6. Valid DSP Module Rev2 implementations set both the DSPP and DSP2P bits; the condition of DSP2P set and DSPP unset is invalid. Software may read the DSPP and DSP2P bits of the *Config3* register to check whether the processor implements DSP Module Rev1 and DSP Module Rev2.

Release 6 of the MIPS Architecture moves the accumulators into the DSP Module for use as a DSP resource exclusively, and introduces the compact branch BPOSGE32C, for which DSP Module Rev3 is required. An implementation supports Rev3 if CP0 Config3<sub>DSPP</sub>=1 and Config3<sub>DSP2P</sub>=1 and Config<sub>AR</sub>=2.

Software must read Config3<sub>MMAR</sub> to determine if Release 6 nanoMIPS is supported. If CP0 Config3<sub>DSPP</sub>=1 and Config3<sub>DSP2P</sub>=1 and Config3<sub>MMAR</sub>>=3, then Release 6 nanoMIPS DSP is supported.

Any attempt to execute a DSP Module instruction must cause a Reserved Instruction Exception if DSPP and DSP2P do not indicate the presence of the appropriate DSP Module implementation. The DSPP and DSP2P bits are fixed by the hardware implementation and are read-only for software.

#### Figure 3.6 Config3 Register Format

| 31 12 | 11    | 10   | 9 0 |
|-------|-------|------|-----|
|       | DSP2P | DSPP |     |
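A detection routine built on these two bits might look like the following Python sketch. It operates on a *Config3* value that software is assumed to have read already; the names `CONFIG3_DSPP`, `CONFIG3_DSP2P`, and `dsp_module_revision` are hypothetical, and the sketch does not attempt to distinguish Rev3.

```python
CONFIG3_DSPP = 1 << 10   # "DSP Present"
CONFIG3_DSP2P = 1 << 11  # "DSP Rev2 Present"


def dsp_module_revision(config3):
    """Classify a Config3 value: 0 = no DSP Module, 1 = Rev1, 2 = Rev2 or later."""
    dspp = bool(config3 & CONFIG3_DSPP)
    dsp2p = bool(config3 & CONFIG3_DSP2P)
    if dsp2p and not dspp:
        # DSP2P set with DSPP clear is an invalid combination.
        raise ValueError("invalid Config3: DSP2P set while DSPP is clear")
    if dsp2p:
        return 2
    return 1 if dspp else 0
```

Rejecting the invalid combination explicitly, rather than treating it as "no DSP", makes a misconfigured or mis-modelled core easier to spot during bring-up.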

The "DSP Module Enable" (DSPEn) bit—the MX bit, bit 24 in the CP0 *Status* register as shown in Figure 3.7—is used to enable access to the extra instructions defined by the DSP Module as well as enabling four modified move instructions (MTLO/HI and MFLO/HI) that provide access to the three additional accumulators *ac1*, *ac2*, and *ac3*. Executing a DSP Module instruction or one of the four modified move instructions when DSPEn is set to zero causes a DSP State Disabled Exception and results in exception code 26 in the CP0 *Cause* register. This allows the OS to do lazy context-switching. Table 3.6 shows the *Cause* Register exception code fields.

#### Figure 3.7 CP0 Status Register Format

| 31 25 | 24 | 23 0 |
|-------|----|------|
|       | MX |      |

| Exception Code Value (Decimal) | Exception Code Value (Hexadecimal) | Mnemonic | Description                         |
|--------------------------------|------------------------------------|----------|-------------------------------------|
| 26                             | 16#1a                              | DSPDis   | DSP Module State Disabled Exception |

#### Table 3.6 Cause Register ExcCode Field
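The lazy context-switching scheme mentioned above can be outlined as a handler that enables the DSP Module only when a task first touches DSP state. The Python sketch below is a model, not kernel code: `handle_dsp_disabled` and the `restore_dsp_state` callback are hypothetical, and the exception code is assumed to have been extracted from the *Cause* register by the caller.

```python
STATUS_MX = 1 << 24    # DSPEn (MX) bit of the CP0 Status register
EXCCODE_DSPDIS = 26    # DSP Module State Disabled exception code


def handle_dsp_disabled(status, exc_code, restore_dsp_state):
    """Return the Status value to resume with after an exception.

    For a DSPDis exception, the MX bit is set first so that the DSP state
    can be accessed, then the (hypothetical) callback reloads the task's
    accumulators and DSPControl. Other exceptions leave Status unchanged.
    """
    if exc_code != EXCCODE_DSPDIS:
        return status
    status |= STATUS_MX
    restore_dsp_state()
    return status
```

Tasks that never execute a DSP Module instruction never take the exception, so their context switches can skip the extra accumulators and *DSPControl*.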

# 3.11 Exception Table for the DSP Module

Table 3.7 shows the exceptions caused when a DSP Module or DSP Module Rev2 instruction, MTLO/HI or MFLO/HI, or any other instruction such as a CorExtend instruction attempts to access the new DSP Module state, that is, *ac1*, *ac2*, or *ac3*, or the *DSPControl* register, and all other possible exceptions that relate to the DSP Module.

Implementation Note: Any implementation of the DSP Module must not read or write ac1, ac2, or ac3 if Status<sub>MX</sub>=0 for any instruction which might be interpreted as having a field which encodes an accumulator number. Such instructions include:

- DSP Module Rev1 and Rev2 instructions.
- MADD, MADDU, MSUB, MSUBU, MULT, or MULTU from the base instruction set.
- MADDP, MFLHXU, MTLHX, MULTP, or PPERM from the SmartMIPS® ASE instruction set.

| Config3<sub>DSP2P</sub> | Config3<sub>DSPP</sub> | Status<sub>MX</sub> | Exception for DSP Module Rev2 (or Greater) Instructions | Exception for DSP Module Rev1 Instructions |
|-------------------------|------------------------|---------------------|---------------------------------------------------------|--------------------------------------------|
| 0                       | 0                      | ×                   | Reserved Instruction                                    | Reserved Instruction                       |
| 0                       | 1                      | 0                   | Reserved Instruction                                    | DSP Module State Disabled                  |
| 0                       | 1                      | 1                   | Reserved Instruction                                    | None                                       |
| 1                       | 1                      | 0                   | DSP Module State Disabled                               | DSP Module State Disabled                  |
| 1                       | 1                      | 1                   | None                                                    | None                                       |

#### Table 3.7 Exception Table for the DSP Module
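The decision logic of Table 3.7 reduces to two checks: whether the relevant revision is implemented, and whether the module is enabled. The Python sketch below expresses this for illustration. The function name `dsp_exception` and the string return values are hypothetical.

```python
def dsp_exception(dsp2p, dspp, mx, rev2_instruction):
    """Return the exception raised for a DSP Module instruction, or None.

    dsp2p, dspp, mx are the Config3.DSP2P, Config3.DSPP and Status.MX bits;
    rev2_instruction is True for instructions introduced in Rev2 or later.
    """
    if dsp2p and not dspp:
        raise ValueError("invalid configuration: DSP2P set while DSPP is clear")
    # A Rev2 instruction needs DSP2P; a Rev1 instruction needs only DSPP.
    implemented = dsp2p if rev2_instruction else dspp
    if not implemented:
        return "Reserved Instruction"
    if not mx:
        return "DSP Module State Disabled"
    return None
```

Note that the presence check takes priority: a Rev2 instruction on a Rev1-only implementation raises Reserved Instruction regardless of the MX bit.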

# 3.12 DSP Module Instructions that Read and Write the DSPControl Register

Many DSP Module instructions read and write the *DSPControl* register, some explicitly and some implicitly. As with other register resources in the architecture, it is the responsibility of the hardware implementation to ensure that appropriate execution dependency barriers are inserted and the pipeline is stalled for read-after-write dependencies and other data dependencies that may occur. Table 3.8 lists the DSP Module instructions that can read and write the *DSPControl* register and the bits or fields in the register that they read or write.

| Instruction                                        | Read/Write | DSPControl Field (Bits) |
|----------------------------------------------------|------------|-------------------------|
| WRDSP                                              | W          | All (31:0)              |
| EXTPDP, EXTPDPV, MTHLIP                            | W          | pos (5:0)               |
| ADDSC                                              | W          | c (13)                  |
| EXTP, EXTPV, EXTPDP, EXTPDPV                       | W          | EFI (14)                |
| See Table 3.5                                      | W          | ouflag (23:16)          |
| CMP, CMPU, and CMPGDU variants                     | W          | ccond (27:24)           |
| RDDSP                                              | R          | All (31:0)              |
| BPOSGE32C, EXTP, EXTPV, EXTPDP, EXTPDPV, INSV      | R          | pos (5:0)               |
| INSV                                               | R          | scount (12:7)           |
| ADDWC                                              | R          | c (13)                  |
| PICK variants                                      | R          | ccond (27:24)           |

#### Table 3.8 Instructions that Read/Write Fields in DSPControl
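One dependency listed in Table 3.8 is the carry bit *c*, written by ADDSC and read by ADDWC. The Python sketch below models how this pair can build a 64-bit addition from two 32-bit halves. The function names are hypothetical and the model is a simplification: in particular, the *ouflag* update that ADDWC may perform on overflow is not modelled.

```python
MASK32 = 0xFFFFFFFF


def addsc(rs, rt):
    """Model ADDSC: return (32-bit sum, carry-out destined for DSPControl.c)."""
    total = (rs & MASK32) + (rt & MASK32)
    return total & MASK32, total >> 32


def addwc(rs, rt, carry):
    """Model ADDWC: 32-bit sum of both operands plus the carry bit."""
    return ((rs & MASK32) + (rt & MASK32) + carry) & MASK32


def add64(a, b):
    """64-bit addition built from the two steps: low halves first, then high."""
    lo, carry = addsc(a & MASK32, b & MASK32)
    hi = addwc(a >> 32, b >> 32, carry)
    return (hi << 32) | lo
```

In real code the ADDWC must follow the ADDSC without an intervening write to the *c* field, which is the kind of read-after-write dependency the hardware is required to honor.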

# 3.13 Arithmetic Exceptions

Under no circumstances do any of the DSP Module instructions cause an arithmetic exception. Other exceptions are possible, for example, the indexed load instruction can cause an address exception. The specific exceptions caused by the different instructions are listed in the per-instruction description pages.

# nanoMIPS® DSP Module Instruction Summary

# 4.1 The nanoMIPS® DSP Module Instruction Summary

The tables in this chapter list all the instructions in the DSP Module. For operation details about each instruction, refer to the per-page descriptions. In each table, the column entitled "Writes GPR / ac / DSPControl" indicates the explicit write performed by each instruction. For the *DSPControl* register, this column records only writes to fields other than *ouflag*, which a large number of instructions write as a side effect.

All instructions from the first version of the MIPS® DSP Module onwards are included in the nanoMIPS DSP Module unless explicitly stated otherwise. Release 6 nanoMIPS deprecates BPOSGE32, and replaces PREPEND, BALIGN, LBUX, LHX, and LWX with instructions in the baseline nanoMIPS Instruction Set as indicated in the table below.

| Instruction Mnemonics | Input Data Type | Output Data Type | Writes GPR / ac / DSPControl | App | Description |
|-----------------------|-----------------|------------------|------------------------------|-----|-------------|
| ADDQ.PH rd,rs,rt<br>ADDQ_S.PH rd,rs,rt | Pair Q15 | Pair Q15 | GPR | VoIP<br>SoftM | Element-wise addition of two vectors of Q15 fractional values, with optional saturation. |
| ADDQ_S.W rd,rs,rt | Q31 | Q31 | GPR | Audio | Add two Q31 fractional values with saturation. |
| ADDU.QB rd,rs,rt<br>ADDU_S.QB rd,rs,rt | Quad Unsigned Byte | Quad Unsigned Byte | GPR | Video | Element-wise addition of unsigned byte values, with optional unsigned saturation. |
| ADDUH.QB rd,rs,rt<br>ADDUH_R.QB rd,rs,rt<br>Introduced in DSP-R2. | Quad Unsigned Byte | Quad Unsigned Byte | GPR | Video | Element-wise addition of vectors of four unsigned byte values, halving each result by right-shifting by one bit position. Results may be optionally rounded up in the least-significant bit. |
| ADDU.PH rd,rs,rt<br>ADDU_S.PH rd,rs,rt<br>Introduced in DSP-R2. | Pair Unsigned Halfword | Pair Unsigned Halfword | GPR | Video | Element-wise addition of vectors of two unsigned halfword values, with optional saturation on overflow. |
| ADDQH.PH rd,rs,rt<br>ADDQH_R.PH rd,rs,rt<br>Introduced in DSP-R2. | Pair Signed Halfword | Pair Signed Halfword | GPR | Misc | Element-wise addition of vectors of two signed halfword values, halving each result by right-shifting by one bit position. Results may be optionally rounded up in the least-significant bit. |

#### Table 4.1 List of Instructions in nanoMIPS® DSP Module in Arithmetic Sub-class
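To make the terms "saturation" and "halving" in the table concrete, the Python sketch below models two of the listed operations on packed 32-bit register values. It is written from the summary descriptions in the table only; the function names are hypothetical, *DSPControl* side effects are not modelled, and the per-instruction pages remain the authority on exact behavior.

```python
def _signed16(x):
    """Interpret a 16-bit pattern as a two's-complement value."""
    return x - 0x10000 if x & 0x8000 else x


def addq_s_ph(rs, rt):
    """Saturating element-wise add of two packed Q15 pairs (ADDQ_S.PH-style)."""
    result = 0
    for shift in (0, 16):
        a = _signed16((rs >> shift) & 0xFFFF)
        b = _signed16((rt >> shift) & 0xFFFF)
        # Clamp each sum to the representable Q15 range instead of wrapping.
        s = max(-0x8000, min(0x7FFF, a + b))
        result |= (s & 0xFFFF) << shift
    return result


def adduh_qb(rs, rt, rounding=False):
    """Halving add of four packed unsigned bytes (ADDUH.QB / ADDUH_R.QB-style)."""
    result = 0
    for shift in (0, 8, 16, 24):
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        # The 9-bit sum is halved, so the result always fits in a byte.
        half = (a + b + (1 if rounding else 0)) >> 1
        result |= half << shift
    return result
```

Halving avoids overflow without saturating, which is why the halving forms are common in averaging operations such as pixel interpolation.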

| Instruction Mnemonics | Input Data Type | Output Data Type | Writes GPR / ac / DSPControl | App | Description |
|-----------------------|-----------------|------------------|------------------------------|-----|-------------|
| ADDQH.W rd,rs,rt<br>ADDQH_R.W rd,rs,rt<br>Introduced in DSP-R2. | Signed Word | Signed Word | GPR | Misc | Add two signed word values, halving the result by right-shifting by one bit position. Result may be optionally rounded up in the least-significant bit. |
| SUBQ.PH rd,rs,rt<br>SUBQ_S.PH rd,rs,rt | Pair Q15 | Pair Q15 | GPR | VoIP | Element-wise subtraction of two vectors of Q15 fractional values, with optional saturation. |
| SUBQ_S.W rd,rs,rt | Q31 | Q31 | GPR | Audio | Subtraction with Q31 fractional values, with saturation. |
| SUBU.QB rd,rs,rt<br>SUBU_S.QB rd,rs,rt | Quad Unsigned Byte | Quad Unsigned Byte | GPR | Video | Element-wise subtraction of unsigned byte values, with optional unsigned saturation. |
| SUBUH.QB rd,rs,rt<br>SUBUH_R.QB rd,rs,rt<br>Introduced in DSP-R2. | Quad Unsigned Byte | Quad Unsigned Byte | GPR | Video | Element-wise subtraction of unsigned byte values, shifting the results right one bit position (halving). The results may be optionally rounded up by adding 1 to each result at the most-significant discarded bit position before shifting. |
| SUBU.PH rd,rs,rt<br>SUBU_S.PH rd,rs,rt<br>Introduced in DSP-R2. | Pair Unsigned Halfword | Pair Unsigned Halfword | GPR | Video | Element-wise subtraction of vectors of two unsigned halfword values, with optional saturation on overflow. |
| SUBQH.PH rd,rs,rt<br>SUBQH_R.PH rd,rs,rt<br>Introduced in DSP-R2. | Pair Signed Halfword | Pair Signed Halfword | GPR | Misc | Element-wise subtraction of vectors of two signed halfword values, halving each result by right-shifting by one bit position. Results may be optionally rounded up in the least-significant bit. |
| SUBQH.W rd,rs,rt<br>SUBQH_R.W rd,rs,rt<br>Introduced in DSP-R2. | Signed Word | Signed Word | GPR | Misc | Subtract two signed word values, halving the result by right-shifting by one bit position. Result may be optionally rounded up in the least-significant bit. |
| ADDSC rd,rs,rt | Signed Word | Signed Word | GPR & DSPControl | Audio | Add two signed words and set the carry bit in the <i>DSPControl</i> register. |
| ADDWC rd,rs,rt | Signed Word | Signed Word | GPR | Audio | Add two signed words with the carry bit from the <i>DSPControl</i> register. |
| MODSUB rd,rs,rt | Signed Word | Signed Word | GPR | Misc | Modulo addressing support: update a byte index into a circular buffer by subtracting a specified decrement (in bytes) from the index, resetting the index to a specified value if the subtraction results in underflow. |
| RADDU.W.QB rd,rs | Quad Unsigned Byte | Unsigned Word | GPR | Misc | Reduce (add together) the 4 unsigned byte values in <i>rs</i>, zero-extending the sum to 32 bits before writing to the destination register. For example, if all 4 input values are 0x80 (decimal 128), then the result in <i>rd</i> is 0x200 (decimal 512). |
| ABSQ_S.QB rd,rt<br>Introduced in DSP-R2. | Quad Q7 | Quad Q7 | GPR | Misc | Find the absolute value of each of four Q7 fractional byte elements in the source register, saturating values of -1.0 to the maximum positive Q7 fractional value. |
| ABSQ_S.PH rd,rt | Pair Q15 | Pair Q15 | GPR | Misc | Find the absolute value of each of two Q15 fractional halfword elements in the source register, saturating values of -1.0 to the maximum positive Q15 fractional value. |
| ABSQ_S.W rd,rt | Q31 | Q31 | GPR | Misc | Find the absolute value of the Q31 fractional element in the source register, saturating the value -1.0 to the maximum positive Q31 fractional value. |
| PRECR.QB.PH rd,rs,rt<br>Introduced in DSP-R2. | Two Pair Integer Halfwords | Four Integer Bytes | GPR | Misc | Reduce the precision of four signed integer halfword input values by discarding the eight most-significant bits from each to create four signed integer byte output values. The two halfword values from register <i>rs</i> are used to create the two left-most byte results, allowing an endian-agnostic implementation. |
| PRECRQ.QB.PH rd,rs,rt | 2 Pair Q15 | Quad Byte | GPR | Misc | Reduce the precision of four Q15 fractional input values by truncation to create four Q7 fractional output values. The two Q15 values from register <i>rs</i> are written to the two left-most byte results, allowing an endian-agnostic implementation. |
| PRECR_SRA.PH.W rt,rs,sa<br>PRECR_SRA_R.PH.W rt,rs,sa<br>Introduced in DSP-R2. | Two Integer Words | Pair Integer Halfword | GPR | Misc | Reduce the precision of two integer word values to create a pair of integer halfword values. Each word value is first shifted right arithmetically by <i>sa</i> bit positions, and optionally rounded up by adding 1 at the most-significant discard bit position. The 16 least-significant bits of each word are then written to the corresponding halfword elements of destination register <i>rt</i>. |
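
The saturating absolute value described for ABSQ_S.PH can be modeled in plain C. This is a minimal sketch rather than the architectural pseudocode: the function names `absq_s_q15` and `absq_s_ph` are hypothetical, the left halfword is assumed to occupy bits 31..16, and any DSPControl side effects are not modeled.

```c
#include <stdint.h>

/* Saturating absolute value of one Q15 element.
   -1.0 (0x8000) has no positive counterpart in Q15, so it maps to 0x7FFF. */
static int16_t absq_s_q15(int16_t x)
{
    if (x == INT16_MIN)
        return INT16_MAX;
    return (int16_t)(x < 0 ? -x : x);
}

/* Apply the element operation to both halfwords of a 32-bit register value.
   Assumption: left element in bits 31..16, right element in bits 15..0. */
static uint32_t absq_s_ph(uint32_t rt)
{
    uint16_t hi = (uint16_t)absq_s_q15((int16_t)(rt >> 16));
    uint16_t lo = (uint16_t)absq_s_q15((int16_t)(rt & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}
```

For example, `absq_s_ph(0x8000FFFF)` yields `0x7FFF0001`: the left element saturates and the right element (-1 as a raw integer) becomes +1.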

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|------------------------------------------------------------------------------------------------------|-----------------------|--------------------------|------------------------------------|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PRECRQ.PH.W rd,rs,rt<br>PRECRQ_RS.PH.W rd,rs,rt | 2 Q31 | Pair halfword | GPR | Misc | Reduce the precision of two Q31 fractional input values by truncation to create two Q15 fractional output values. The Q15 value obtained from register <i>rs</i> creates the left-most result, allowing an endian-agnostic implementation. Results may be optionally rounded up and saturated before being written to the destination. |
| PRECRQU_S.QB.PH<br>rd,rs,rt                                                                          | 2 Pair Q15            | Quad<br>Unsigned<br>Byte | GPR                                | Misc  | Reduce the precision of four Q15 fractional values by saturating and truncating to create four unsigned byte values.                                                                                                                                                                                                                                            |
| PRECEQ.W.PHL rd,rt<br>PRECEQ.W.PHR rd,rt                                                             | Q15                   | Q31                      | GPR                                | Misc  | Expand the precision of a Q15 fractional value<br>to create a Q31 fractional value by adding 16<br>least-significant bits to the input value.                                                                                                                                                                                                                   |
| PRECEQU.PH.QBL rd,rt<br>PRECEQU.PH.QBR rd,rt<br>PRECEQU.PH.QBLA<br>rd,rt<br>PRECEQU.PH.QBRA<br>rd,rt | Unsigned<br>Byte      | Q15                      | GPR                                | Video | Expand the precision of two unsigned byte values by prepending a sign bit and adding seven least-significant bits to each to create two Q15 fractional values.                                                                                                                                                                                                  |
| PRECEU.PH.QBL rd,rt<br>PRECEU.PH.QBR rd,rt<br>PRECEU.PH.QBLA rd,rt<br>PRECEU.PH.QBRA rd,rt | Unsigned<br>Byte | Unsigned<br>halfword | GPR | Video | Expand the precision of two unsigned byte values by prepending eight most-significant zero bits to each (zero extension) to create two unsigned halfword values. |
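
The precision-expansion rows above amount to simple bit placement. The following C sketch illustrates the PRECEQ.W.PHL/PHR and PRECEQU.PH.QBL descriptions; the function names are hypothetical, and the element layout (left-most element in the most-significant bits) is an assumption of the sketch.

```c
#include <stdint.h>

/* PRECEQ.W.PHL-style: the left Q15 halfword becomes a Q31 word by
   appending 16 zero least-significant bits. */
static uint32_t preceq_w_phl(uint32_t rt)
{
    return rt & 0xFFFF0000u;
}

/* PRECEQ.W.PHR-style: same expansion applied to the right halfword. */
static uint32_t preceq_w_phr(uint32_t rt)
{
    return rt << 16;
}

/* One unsigned byte to Q15: a zero sign bit, the eight byte bits,
   then seven zero least-significant bits. */
static uint16_t byte_to_q15(uint8_t b)
{
    return (uint16_t)((uint16_t)b << 7);
}

/* PRECEQU.PH.QBL-style: expand the two left-most bytes of rt. */
static uint32_t precequ_ph_qbl(uint32_t rt)
{
    uint16_t hi = byte_to_q15((uint8_t)(rt >> 24));
    uint16_t lo = byte_to_q15((uint8_t)(rt >> 16));
    return ((uint32_t)hi << 16) | lo;
}
```

With input `0xFF801234`, `precequ_ph_qbl` produces `0x7F804000`: byte 0xFF becomes 0x7F80 and byte 0x80 becomes 0x4000, both non-negative Q15 values.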

# Table 4.2 List of Instructions in nanoMIPS® DSP Module in GPR-Based Shift Sub-class

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|--------------------------------------------------------------------------------------------|--------------------------|--------------------------|------------------------------------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SHLL.QB rd, rt, sa<br>SHLLV.QB rd, rt, rs | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | GPR | Misc | Element-wise left shift of four byte values. Zeros are inserted into the bits emptied by the shift. The shift amount is specified by the three least-significant bits of <i>sa</i> or <i>rs</i>. |
| SHLL.PH rd, rt, sa<br>SHLLV.PH rd, rt, rs<br>SHLL_S.PH rd, rt, sa<br>SHLLV_S.PH rd, rt, rs | Pair Signed<br>halfword | Pair Signed<br>halfword | GPR | Misc | Element-wise left shift of two signed halfwords, with optional saturation on overflow. Zeros are inserted into the bits emptied by the shift. The shift amount is specified by the four least-significant bits of <i>sa</i> or <i>rs</i>. |
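
The saturating halfword shift (SHLL_S.PH) can be sketched in C as follows. This is an illustrative model under stated assumptions: the names `shll_s_h` and `shll_s_ph` are hypothetical, saturation is modeled as clamping the mathematically exact result to the signed 16-bit range, and DSPControl flag updates are omitted.

```c
#include <stdint.h>

/* Saturating left shift of one signed halfword.
   A multiply by 2^sa stands in for the shift so that negative inputs
   do not trigger undefined behavior in C. Only the four
   least-significant bits of sa are used. */
static int16_t shll_s_h(int16_t x, unsigned sa)
{
    int32_t wide = (int32_t)x * ((int32_t)1 << (sa & 15u));
    if (wide > INT16_MAX)
        return INT16_MAX;
    if (wide < INT16_MIN)
        return INT16_MIN;
    return (int16_t)wide;
}

/* Apply the shift to both halfword elements of a register value. */
static uint32_t shll_s_ph(uint32_t rt, unsigned sa)
{
    uint16_t hi = (uint16_t)shll_s_h((int16_t)(rt >> 16), sa);
    uint16_t lo = (uint16_t)shll_s_h((int16_t)(rt & 0xFFFFu), sa);
    return ((uint32_t)hi << 16) | lo;
}
```

Shifting `0x40000001` left by 1 gives `0x7FFF0002`: the left element overflows and clamps, while the right element shifts normally.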

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|--------------------------------------------------------------------------------------------------------------------|--------------------------|--------------------------|------------------------------------|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SHLL_S.W rd, rt, sa<br>SHLLV_S.W rd, rt, rs | Signed<br>Word | Signed<br>Word | GPR | Misc | Left shift of a signed word, with saturation on overflow. Zeros are inserted into the bits emptied by the shift. The shift amount is specified by the five least-significant bits of <i>sa</i> or <i>rs</i>. Use the base nanoMIPS32 instructions SLL or SLLV for non-saturating shift operations. |
| SHRL.QB rd, rt, sa<br>SHRLV.QB rd, rt, rs                                                                          | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | GPR                                | Video | Element-wise logical right shift of four byte values. Zeros are inserted into the bits emptied by the shift. The shift amount is specified by the three least-significant bits of sa or <i>rs</i> .                                                                                                       |
| SHRL.PH rd, rt, sa<br>SHRLV.PH rd, rt, rs<br>Introduced in DSP-R2. | Pair Halfwords | Pair Halfwords | GPR | Video | Element-wise logical right shift of two halfword values. Zeros are inserted into the bits emptied by the shift. The shift amount is specified by the four least-significant bits of <i>rs</i> or the <i>sa</i> argument. |
| SHRA.QB rd,rt,sa<br>SHRA_R.QB rd,rt,sa<br>SHRAV.QB rd,rt,rs<br>SHRAV_R.QB rd,rt,rs<br>Introduced in DSP-R2. | Quad Byte | Quad Byte | GPR | Misc | Element-wise arithmetic (sign preserving) right shift of four byte values. Optional rounding may be performed, adding 1 at the most-significant discard bit position. The shift amount is specified by the three least-significant bits of <i>rs</i> or by the argument <i>sa</i>. |
| SHRA.PH rd, rt, sa<br>SHRAV.PH rd, rt, rs<br>SHRA_R.PH rd, rt, sa<br>SHRAV_R.PH rd, rt, rs | Pair Signed<br>halfword | Pair Signed<br>halfword | GPR | Misc | Element-wise arithmetic (sign preserving) right shift of two halfword values. Optionally, rounding may be performed, adding 1 at the most-significant discard bit position. The shift amount is specified by the four least-significant bits of <i>rs</i> or by the argument <i>sa</i>. |
| SHRA_R.W rd, rt, sa<br>SHRAV_R.W rd, rt, rs | Signed<br>Word | Signed<br>Word | GPR | Video | Arithmetic (sign preserving) right shift of a word value. Optionally, rounding may be performed, adding 1 at the most-significant discard bit position. The shift amount is specified by the five least-significant bits of <i>rs</i> or the argument <i>sa</i>. |
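
"Adding 1 at the most-significant discard bit position" is the usual round-half-up step before an arithmetic right shift. The sketch below models it for a single word, in the spirit of SHRA_R.W. The function name is hypothetical, and the code assumes that `>>` on a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined but mainstream compilers provide.

```c
#include <stdint.h>

/* Rounding arithmetic right shift of a signed word.
   Only the five least-significant bits of sa are used.
   Assumption: >> on negative signed values is an arithmetic shift. */
static int32_t shra_r_w(int32_t rt, unsigned sa)
{
    sa &= 31u;
    if (sa == 0)
        return rt;                      /* nothing is discarded, so no rounding */

    /* Add 1 at the highest bit that will be discarded, in 64-bit
       arithmetic so the addition cannot overflow. */
    int64_t wide = (int64_t)rt + ((int64_t)1 << (sa - 1));
    return (int32_t)(wide >> sa);
}
```

For instance, `shra_r_w(5, 1)` returns 3 (2.5 rounds up) and `shra_r_w(-5, 1)` returns -2 (-2.5 also rounds toward positive infinity).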

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|----------------------------------------------------------|---------------------------------------------------------|------------------------------|------------------------------------|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MULEU_S.PH.QBL rd,rs,rt<br>MULEU_S.PH.QBR rd,rs,rt | Pair Unsigned Byte, Pair Unsigned Halfword | Pair Unsigned Halfword | GPR | Still Image | Element-wise multiplication of two unsigned byte values from register <i>rs</i> with two unsigned halfword values from register <i>rt</i>. Each 24-bit product is truncated to 16 bits, with saturation if the product exceeds 0xFFFF, and written to the corresponding element in the destination register. |
| MULQ_RS.PH rd,rs,rt | Pair Q15 | Pair Q15 | GPR | Misc | Element-wise multiplication of two Q15 fractional values to create two Q15 fractional results, with rounding and saturation. After multiplication, each 32-bit product is rounded up by adding 0x00008000, then truncated to create a Q15 fractional value that is written to the destination register. If both multiplicands are -1.0, the result is saturated to the maximum positive Q15 fractional value.<br>To stay compliant with the base architecture, this instruction leaves the base <i>HI-LO</i> pair <b>UNPREDICTABLE</b> after the operation. The other DSP Module accumulators <i>ac1-ac3</i> are untouched. |
| MULEQ_S.W.PHL rd,rs,rt<br>MULEQ_S.W.PHR rd,rs,rt | Pair Q15 | Q31 | GPR | VoIP | Multiplication of two Q15 fractional values, shifting the product left by 1 bit to create a Q31 fractional result. If both multiplicands are -1.0, the result is saturated to the maximum positive Q31 value.<br>To stay compliant with the base architecture, this instruction leaves the base <i>HI-LO</i> pair <b>UNPREDICTABLE</b> after the operation. The other DSP Module accumulators <i>ac1-ac3</i> are untouched. |
| DPAU.H.QBL ac,rs,rt<br>DPAU.H.QBR ac,rs,rt | Pair Bytes | Halfword | ac | Image | Dot-product accumulation. Two pairs of corresponding unsigned byte elements from source registers <i>rt</i> and <i>rs</i> are separately multiplied, and the two 16-bit products are then summed together. The summed products are then added to the accumulator. |
| DPSU.H.QBL ac,rs,rt<br>DPSU.H.QBR ac,rs,rt | Pair Bytes | Halfword | ac | Image | Dot-product subtraction. Two pairs of corresponding unsigned byte elements from source registers <i>rt</i> and <i>rs</i> are separately multiplied, and the two 16-bit products are then summed together. The summed products are then subtracted from the accumulator. |
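
The rounding and saturation steps in the MULQ_RS.PH description can be followed in a short C model. This is a sketch only: the function names are hypothetical, the unpredictable state of the <i>HI-LO</i> pair and any DSPControl updates are not modeled, and `>>` on a negative signed value is assumed to be an arithmetic shift.

```c
#include <stdint.h>

/* One Q15 x Q15 multiply with rounding and saturation.
   Assumption: >> on negative signed values is an arithmetic shift. */
static int16_t mulq_rs_q15(int16_t a, int16_t b)
{
    /* -1.0 * -1.0 would be +1.0, which Q15 cannot represent. */
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;

    /* Q15 * Q15 gives Q30; doubling aligns the product as Q31. */
    int32_t prod = ((int32_t)a * b) * 2;

    /* Round by adding 0x00008000, then keep the upper 16 bits. */
    int64_t rounded = (int64_t)prod + 0x8000;
    return (int16_t)(rounded >> 16);
}

/* Element-wise form over both halfwords of two register values. */
static uint32_t mulq_rs_ph(uint32_t rs, uint32_t rt)
{
    uint16_t hi = (uint16_t)mulq_rs_q15((int16_t)(rs >> 16), (int16_t)(rt >> 16));
    uint16_t lo = (uint16_t)mulq_rs_q15((int16_t)(rs & 0xFFFFu), (int16_t)(rt & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}
```

With both operands equal to `0x40008000` (0.5 and -1.0 in each register), the result is `0x20007FFF`: 0.5 × 0.5 = 0.25 in the left element, and the saturated -1.0 × -1.0 case in the right element.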

# Table 4.3 List of Instructions in nanoMIPS® DSP Module in Multiply Sub-class

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|-------------------------------------------------|-------------------------|-------------------------|------------------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DPA.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Doubleword | ac | VoIP /<br>SoftM | Dot-product accumulation. The two pairs of corresponding signed integer halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate integer word products. The products are then summed and accumulated into the specified accumulator. |
| DPAX.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Doubleword | ac | VoIP | Dot-product with crossed operands and accumulation. The two crossed pairs of signed integer halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate integer word products. The products are then summed and accumulated into the specified accumulator. |
| DPAQ_S.W.PH ac,rs,rt | Pair Q15 | Q32.31 | ac | VoIP /<br>SoftM | Dot-product accumulation. Two pairs of corresponding Q15 fractional values from source registers <i>rt</i> and <i>rs</i> are separately multiplied and left-shifted 1 bit to create two Q31 fractional products. For each product, if both multiplicands are equal to -1.0, the product is clamped to the maximum positive Q31 fractional value. The products are then summed, and the sum is then sign extended to the width of the accumulator and accumulated into the specified accumulator.<br>This instruction may be used to compute the imaginary component of a 16-bit complex multiplication operation after first swapping the operands to place them in the correct order. |
| DPAQX_S.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Q32.31 | ac | VoIP | Dot-product with saturating fractional multiplication and using crossed operands, with a final accumulation. The two crossed pairs of signed fractional halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate fractional word products. The products are then summed and accumulated into the specified accumulator. |
| DPAQX_SA.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Q32.31 | ac | VoIP | Dot-product with saturating fractional multiplication and using crossed operands, with a final saturating accumulation. The two crossed pairs of signed fractional halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate fractional word products. The products are then summed and accumulated with saturation into the specified accumulator. |
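
The DPAQ_S.W.PH description (multiply, shift left by one, clamp the -1.0 × -1.0 case, sum, accumulate) maps onto a few lines of C. The sketch below is illustrative: the function names are hypothetical, the accumulator is modeled as a plain 64-bit signed integer that is assumed not to overflow, and DSPControl updates are omitted.

```c
#include <stdint.h>

/* Q15 x Q15 -> Q31 product with the -1.0 * -1.0 case clamped
   to the maximum positive Q31 value. */
static int32_t mul_q15_sat(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT32_MAX;
    return ((int32_t)a * b) * 2;        /* the doubling is the 1-bit left shift */
}

/* Dot-product accumulate over corresponding halfword elements.
   Assumption: the running sum stays within the int64_t range. */
static int64_t dpaq_s_w_ph(int64_t ac, uint32_t rs, uint32_t rt)
{
    int64_t p_left  = mul_q15_sat((int16_t)(rs >> 16), (int16_t)(rt >> 16));
    int64_t p_right = mul_q15_sat((int16_t)(rs & 0xFFFFu), (int16_t)(rt & 0xFFFFu));
    return ac + p_left + p_right;       /* products are sign extended to 64 bits */
}
```

Starting from a zero accumulator, `dpaq_s_w_ph(0, 0x40004000, 0x40004000)` returns `0x40000000`, which is 0.25 + 0.25 = 0.5 in Q31.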

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|-------------------------------------------------|-------------------------|------------------------|------------------------------------|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DPS.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Doubleword | ac | VoIP /<br>SoftM | Dot-product subtraction. The two pairs of corresponding signed integer halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate integer word products. The products are then summed and subtracted from the specified accumulator. |
| DPSX.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Q32.31 | ac | VoIP | Dot-product with crossed operands and subtraction. The two crossed pairs of signed integer halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate integer word products. The products are then summed and subtracted from the specified accumulator. |
| DPSQ_S.W.PH ac,rs,rt | Pair Q15 | Q32.31 | ac | VoIP /<br>SoftM | Dot-product subtraction. Two pairs of corresponding Q15 fractional values from source registers <i>rt</i> and <i>rs</i> are separately multiplied and left-shifted 1 bit to create two Q31 fractional products. For each product, if both multiplicands are equal to -1.0, the product is clamped to the maximum positive Q31 fractional value. The products are then summed, and the sum is then sign extended to the width of the accumulator and subtracted from the specified accumulator.<br>This instruction may be used to compute the imaginary component of a 16-bit complex multiplication operation after first swapping the operands to place them in the correct order. |
| DPSQX_S.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Q32.31 | ac | VoIP | Dot-product with saturating fractional multiplication and using crossed operands, with a final subtraction. The two crossed pairs of signed fractional halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate fractional word products. The products are then summed and subtracted from the specified accumulator. |
| DPSQX_SA.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Q32.31 | ac | VoIP | Dot-product with saturating fractional multiplication and using crossed operands, with a final saturating subtraction. The two crossed pairs of signed fractional halfword values from source registers <i>rt</i> and <i>rs</i> are separately multiplied to create two separate fractional word products. The products are then summed and subtracted with saturation from the specified accumulator. |
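
The difference between "corresponding" and "crossed" operand pairs is easiest to see side by side. The sketch below models the integer forms DPS.W.PH and DPSX.W.PH. The function names are hypothetical, the crossed pairing is assumed to be left-of-<i>rs</i> with right-of-<i>rt</i> and right-of-<i>rs</i> with left-of-<i>rt</i>, and the accumulator is a plain `int64_t` assumed not to overflow.

```c
#include <stdint.h>

/* Extract the left (bits 31..16) and right (bits 15..0) signed halfwords. */
static int16_t hi16(uint32_t r) { return (int16_t)(r >> 16); }
static int16_t lo16(uint32_t r) { return (int16_t)(r & 0xFFFFu); }

/* Corresponding pairs: left*left + right*right, subtracted from ac. */
static int64_t dps_w_ph(int64_t ac, uint32_t rs, uint32_t rt)
{
    int32_t p1 = (int32_t)hi16(rs) * hi16(rt);
    int32_t p2 = (int32_t)lo16(rs) * lo16(rt);
    return ac - ((int64_t)p1 + p2);
}

/* Crossed pairs: left*right + right*left, subtracted from ac. */
static int64_t dpsx_w_ph(int64_t ac, uint32_t rs, uint32_t rt)
{
    int32_t p1 = (int32_t)hi16(rs) * lo16(rt);
    int32_t p2 = (int32_t)lo16(rs) * hi16(rt);
    return ac - ((int64_t)p1 + p2);
}
```

With `ac = 100`, `rs = 0x00020003` and `rt = 0x00050007`, the corresponding form subtracts 2·5 + 3·7 = 31 and the crossed form subtracts 2·7 + 3·5 = 29.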

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|------------------------------------------------|-----------------------|------------------------|------------------------------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MULSAQ_S.W.PH ac,rs,rt | Pair Q15 | Q32.31 | ac | SoftM | Complex multiplication step. Performs element-wise fractional multiplication of the two Q15 fractional values from registers <i>rt</i> and <i>rs</i>, subtracting one product from the other to create a Q31 fractional result that is added to accumulator <i>ac</i>. The intermediate products are saturated to the maximum positive Q31 fractional value if both multiplicands are equal to -1.0. |
| DPAQ_SA.L.W ac,rs,rt                           | Q31                   | Q63                    | ac                                 | Audio  | Fractional multiplication of two Q31 fractional values to produce a Q63 fractional product. If both multiplicands are -1.0 the product is saturated to the maximum positive Q63 fractional value. The product is then added to accumulator <i>ac</i> . If the addition results in overflow or underflow, the accumulator is saturated to the maximum positive or minimum negative value.                                                                                                                                     |
| DPSQ_SA.L.W ac,rs,rt | Q31 | Q63 | ac | Audio | Fractional multiplication of two Q31 fractional values to produce a Q63 fractional product. If both multiplicands are -1.0, the product is saturated to the maximum positive Q63 fractional value. The product is then subtracted from accumulator <i>ac</i>. If the subtraction results in overflow or underflow, the accumulator is saturated to the maximum positive or minimum negative value. |
| MAQ_S.W.PHL ac,rs,rt<br>MAQ_S.W.PHR ac,rs,rt | Q15 | Q32.31 | ac | SoftM | Fractional multiply-accumulate. The product of two Q15 fractional values is sign extended to the width of the accumulator and added to accumulator <i>ac</i>. The intermediate product is saturated to the maximum positive Q31 fractional value if both multiplicands are equal to -1.0. |
| MAQ_SA.W.PHL ac,rs,rt<br>MAQ_SA.W.PHR ac,rs,rt | Q15                   | Q31                    | ac                                 | speech | Fractional multiply-accumulate with satura-<br>tion after accumulation. The product of two<br>Q15 fractional values is sign extended to the<br>width of the accumulator and added to accu-<br>mulator <i>ac</i> . The intermediate product is satu-<br>rated to the maximum positive Q31 fractional<br>value if both multiplicands are equal to -1.0.<br>If the accumulation results in overflow or<br>underflow, the accumulator value is saturated<br>to the maximum positive or minimum negative<br>Q31 fractional value. |
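
The saturating fractional multiply-accumulate rows above can be read as a small arithmetic model. The following C sketch is illustrative only: the helper names `q15_mul_sat`, `maq_s`, and `maq_sa` are hypothetical, the operands are passed as already-selected halfwords (the PHL/PHR variants differ only in which halfword of each register is used), and the accumulator is modelled as a plain 64-bit integer rather than a HI/LO pair.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 x Q15 -> Q31 fractional product. Doubling the integer product
 * aligns the binary point; the only overflowing case, (-1.0) * (-1.0),
 * is clamped to the largest positive Q31 value. */
static int32_t q15_mul_sat(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT32_MAX;
    return (int32_t)a * (int32_t)b * 2;
}

/* MAQ_S-style step (hypothetical helper): sign-extend the Q31 product
 * to 64 bits and add it to the accumulator, wrapping on overflow. */
static int64_t maq_s(int64_t ac, int16_t a, int16_t b)
{
    int64_t p = q15_mul_sat(a, b);
    return (int64_t)((uint64_t)ac + (uint64_t)p);
}

/* MAQ_SA-style step (hypothetical helper): as above, then clamp the
 * sum to the Q31 range. Assumes ac holds a sign-extended Q31 value. */
static int64_t maq_sa(int64_t ac, int16_t a, int16_t b)
{
    assert(ac >= INT32_MIN && ac <= INT32_MAX);
    int64_t sum = ac + q15_mul_sat(a, b);
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return sum;
}
```

For example, `q15_mul_sat(0x4000, 0x4000)` models 0.5 × 0.5 and yields the Q31 encoding of 0.25, `0x20000000`.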

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|-------------------------------------------------------------------------------------------------------|-------------------------|-------------------------|------------------------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MUL.PH rd,rs,rt<br>MUL_S.PH rd,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Pair Signed<br>Halfword | GPR | speech | Element-wise multiplication of two vectors of signed integer halfwords, writing the 16 least-significant bits of each 32-bit product to the corresponding element of the destination register. Optional saturation clamps each 16-bit result to the maximum positive or minimum negative value if the product cannot be accurately represented in 16 bits. |
| MULQ_S.PH rd,rs,rt<br>Introduced in DSP-R2.                                                           | Pair Q15                | Pair Q15                | GPR                                | speech | Element-wise multiplication of two vectors of Q15 fractional halfwords, writing the 16 most-significant bits of each Q31-format product to the corresponding element of the destination register. Each result is saturated to the maximum positive Q15 value if both multiplicands were equal to -1.0 (0x8000 hexadecimal).                                                                                                                                                 |
| MULQ_S.W rd,rs,rt<br>Introduced in DSP-R2. | Q31 | Q31 | GPR | speech | Fractional multiplication of two Q31 format words to create a Q63 format result that is truncated by discarding the 32 least-significant bits before being written to the destination register. The result is saturated to the maximum positive Q31 value if both multiplicands were equal to -1.0 (0x80000000 hexadecimal). |
| MULQ_RS.W rd,rs,rt<br>Introduced in DSP-R2. | Q31 | Q31 | GPR | speech | Multiplication of two Q31 fractional words to create a Q63-format intermediate product that is rounded up by adding a 1 at bit position 31. The 32 most-significant bits of the rounded result are then written to the destination register. If both multiplicands were equal to -1.0 (0x80000000 hexadecimal), rounding is not performed and the result is clamped to the maximum positive Q31 value before being written to the destination. |
| MULSA.W.PH ac,rs,rt<br>Introduced in DSP-R2. | Pair Signed<br>Halfword | Double-<br>word | ac | speech | Element-wise multiplication of two vectors of signed integer halfwords to create two 32-bit word intermediate results. The right intermediate result is subtracted from the left intermediate result, and the resulting difference is accumulated into the specified accumulator. |
| MADD ac,rs,rt<br>MADDU ac,rs,rt<br>MSUB ac,rs,rt<br>MSUBU ac,rs,rt<br>MULT ac,rs,rt<br>MULTU ac,rs,rt | Word | Double-<br>word | ac | Misc | Allows these instructions to target accumulators <i>ac1</i>, <i>ac2</i>, and <i>ac3</i> (in addition to the original <i>ac0</i> destination). |
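
The difference between the truncating MULQ_S.W and the rounding MULQ_RS.W rows can be sketched in C as follows. This is a minimal model under stated assumptions, not the architectural definition: the function names are hypothetical, and the code relies on arithmetic right shift of negative 64-bit values, which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shared Q31 x Q31 -> Q63 product. Callers exclude (-1.0) * (-1.0),
 * the one case whose doubled product does not fit in 64 bits. */
static int64_t q31_product(int32_t a, int32_t b)
{
    assert(!(a == INT32_MIN && b == INT32_MIN));
    return (int64_t)a * (int64_t)b * 2;
}

/* MULQ_S.W-style: discard the 32 least-significant bits (truncate).
 * Assumes >> on a negative int64_t is an arithmetic shift. */
static int32_t mulq_s_w(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;              /* saturate -1.0 * -1.0 */
    return (int32_t)(q31_product(a, b) >> 32);
}

/* MULQ_RS.W-style: add 1 at bit position 31 before taking the
 * 32 most-significant bits; the saturating case skips rounding. */
static int32_t mulq_rs_w(int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;
    return (int32_t)((q31_product(a, b) + (INT64_C(1) << 31)) >> 32);
}
```

With operands `1` and `0x40000000`, the Q63 product has only bit 31 set, so the truncating form returns 0 while the rounding form returns 1.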

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|----------------------------------|-----------------------|-------------------------|------------------------------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| BITREV rd,rt                     | Unsigned<br>Word      | Unsigned<br>Word        | GPR                                | Audio/<br>FFT   | Reverse the order of the 16 least-significant<br>bits of register <i>rt</i> , writing the result to register<br><i>rd</i> . The 16 most-significant bits are set to zero.                                                                                               |
| INSV rt,rs                       | Unsigned<br>Word      | Unsigned<br>Word        | GPR                                | Misc            | Like the Release 2 INS instruction, except that<br>the 5 bits for <i>pos</i> and <i>size</i> values are obtained<br>from the <i>DSPControl</i> register. <i>size</i> =<br>scount[14:10], and <i>pos</i> = pos[20:16].                                                   |
| REPL.QB rd,imm<br>REPLV.QB rd,rt | Byte                  | Quad Byte               | GPR                                | Video /<br>Misc | Replicate a signed byte value into the four byte elements of register <i>rd</i> . The byte value is given by the 8 least-significant bits of the specified 10-bit immediate constant or by the 8 least-significant bits of register <i>rt</i> .                         |
| REPL.PH rd,imm<br>REPLV.PH rd,rt | Signed<br>halfword    | Pair Signed<br>halfword | GPR                                | Misc            | Replicate a signed halfword value into the two halfword elements of register <i>rd</i>. The halfword value is given by the 16 least-significant bits of register <i>rt</i>, or by the value of the 10-bit immediate constant, sign-extended to 16 bits. |

Table 4.4 List of Instructions in MIPS® DSP Module in Bit-Manipulation Sub-class
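
The bit-manipulation descriptions in Table 4.4 translate fairly directly into C. The sketch below is illustrative; the function names are hypothetical, and the immediate form of REPL.PH takes the raw 10-bit field as an unsigned value.

```c
#include <assert.h>
#include <stdint.h>

/* BITREV-style: reverse the 16 least-significant bits of rt.
 * The 16 most-significant bits of the result are zero. */
static uint32_t bitrev16(uint32_t rt)
{
    uint32_t out = 0;
    for (int i = 0; i < 16; i++)
        out = (out << 1) | ((rt >> i) & 1u);
    return out;
}

/* REPLV.QB-style: copy the low byte of rt into all four byte lanes. */
static uint32_t replv_qb(uint32_t rt)
{
    return (rt & 0xFFu) * 0x01010101u;
}

/* REPL.PH-style: sign-extend a 10-bit immediate to 16 bits and copy
 * it into both halfword lanes. */
static uint32_t repl_ph(uint32_t imm10)
{
    assert(imm10 <= 0x3FFu);
    uint32_t h = imm10;
    if (h & 0x200u)          /* bit 9 is the sign bit of the immediate */
        h |= 0xFC00u;
    return (h << 16) | h;
}
```

The 16-bit reversal is the operation typically used to generate bit-reversed indices for FFTs of up to 65536 points, which matches the Audio/FFT application noted in the table.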

# Table 4.5 List of Instructions in MIPS® DSP Module in Compare-Pick Sub-class

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|--------------------------------------------------------------------------------------------------|--------------------------|--------------------------|------------------------------------|-------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CMPU.EQ.QB rs,rt<br>CMPU.LT.QB rs,rt<br>CMPU.LE.QB rs,rt                                         | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | DSPControl                         | Video | Element-wise unsigned comparison of the four<br>unsigned byte elements of <i>rs</i> and <i>rt</i> , recording<br>the boolean comparison results to the four<br>right-most bits in the <i>ccond</i> field of the<br><i>DSPControl</i> register.                                                             |
| CMPGDU.EQ.QB rd,rs,rt<br>CMPGDU.LT.QB rd,rs,rt<br>CMPGDU.LE.QB rd,rs,rt<br>Introduced in DSP-R2. | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | GPR<br>DSPControl                  | Video | Element-wise unsigned comparison of the four right-most unsigned byte elements of <i>rs</i> and <i>rt</i> , recording the boolean comparison results to the four least-significant bits of register <i>rd</i> and to the four right-most bits in the <i>ccond</i> field of the <i>DSPControl</i> register. |
| CMPGU.EQ.QB rd,rs,rt<br>CMPGU.LT.QB rd,rs,rt<br>CMPGU.LE.QB rd,rs,rt                             | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | GPR                                | Video | Element-wise unsigned comparison of the four right-most unsigned byte elements of <i>rs</i> and <i>rt</i> , recording the boolean comparison results to the four least-significant bits of register <i>rd</i> .                                                                                            |

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|--------------------------------------------------------------------------------------------|--------------------------|--------------------------|------------------------------------|-------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CMP.EQ.PH rs,rt<br>CMP.LT.PH rs,rt<br>CMP.LE.PH rs,rt                                      | Pair Signed<br>halfword  | Pair Signed<br>halfword  | DSPControl                         | Misc  | Element-wise signed comparison of the two<br>halfword elements of <i>rs</i> and <i>rt</i> , recording the<br>boolean comparison results to the two<br>right-most bits in the <i>ccond</i> field of the<br><i>DSPControl</i> register.                                                                                                                                                                                                     |
| PICK.QB rd,rs,rt                                                                           | Quad<br>Unsigned<br>Byte | Quad<br>Unsigned<br>Byte | GPR                                | Video | Element-wise selection of unsigned bytes from<br>the four bytes of registers <i>rs</i> and <i>rt</i> into the<br>corresponding elements of register <i>rd</i> , based<br>on the value of the four right-most bits of the<br><i>ccond</i> field in the <i>DSPControl</i> register. If the<br>corresponding <i>ccond</i> bit is 1, the byte value is<br>copied from register <i>rs</i> , otherwise it is copied<br>from <i>rt</i> .         |
| PICK.PH rd,rs,rt                                                                           | Pair Signed<br>halfword  | Pair Signed<br>halfword  | GPR                                | Misc  | Element-wise selection of signed halfwords<br>from the two halfwords in registers <i>rs</i> and <i>rt</i><br>into the corresponding elements of register <i>rd</i> ,<br>based on the value of the two right-most bits of<br>the <i>ccond</i> field in the <i>DSPControl</i> register. If<br>the corresponding <i>ccond</i> bit is 1, the halfword<br>value is copied from register <i>rs</i> , otherwise it is<br>copied from <i>rt</i> . |
| APPEND rt,rs,sa<br>Introduced in DSP-R2.                                                   | Two Words                | Word                     | GPR                                | Misc  | Shifts the 32-bit word in register <i>rt</i> left by <i>sa</i> bits, inserting the <i>sa</i> least-significant bits from register <i>rs</i> into the bit positions emptied by the shift. The 32-bit result is then written to register <i>rt</i> .                                                                                                                                                                                        |
| PREPEND rt,rs,sa<br>Introduced in DSP-R2.<br>Replaced by EXTW in<br>baseline nanoMIPS ISA. | Two Words                | Word                     | GPR                                | Misc  | Shifts the 32-bit word in register <i>rt</i> right by <i>sa</i> bits, inserting the <i>sa</i> least-significant bits from register <i>rs</i> into the bit positions emptied by the shift. The 32-bit result is then written to register <i>rt</i> .                                                                                                                                                                                       |
| BALIGN rt,rs,bp<br>Introduced in DSP-R2.<br>Replaced by EXTW in<br>baseline nanoMIPS ISA.  | Two Words                | Word                     | GPR                                | Misc  | Packs <i>bp</i> bytes from register <i>rt</i> and (4- <i>bp</i> ) bytes from register <i>rs</i> into a 32-bit word and writes it to register <i>rt</i> .                                                                                                                                                                                                                                                                                  |
| PACKRL.PH rd,rs,rt                                                                         | Pair Signed<br>Halfwords | Pair Signed<br>Halfword  | GPR                                | Misc  | Pack two halfwords taken from registers <i>rs</i> and <i>rt</i> into destination register <i>rd</i> .                                                                                                                                                                                                                                                                                                                                     |
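
A compare followed by a pick gives an element-wise select. The C sketch below models CMPU.LT.QB and PICK.QB on plain integers. It is a hedged illustration: the function names are hypothetical, the <i>ccond</i> bits are passed as an ordinary 4-bit value instead of living in <i>DSPControl</i>, and the assumption that bit <i>i</i> corresponds to byte lane <i>i</i> (lane 0 being the least-significant byte) is made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* CMPU.LT.QB-style: one result bit per byte lane, set when the
 * unsigned byte of rs is less than the matching byte of rt.
 * Assumption: bit i of the result describes byte lane i. */
static uint32_t cmpu_lt_qb(uint32_t rs, uint32_t rt)
{
    uint32_t ccond = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t a = (uint8_t)(rs >> (8 * i));
        uint8_t b = (uint8_t)(rt >> (8 * i));
        if (a < b)
            ccond |= 1u << i;
    }
    return ccond;
}

/* PICK.QB-style: per lane, take the byte from rs when the matching
 * condition bit is 1, otherwise take it from rt. */
static uint32_t pick_qb(uint32_t ccond, uint32_t rs, uint32_t rt)
{
    assert((ccond & ~0xFu) == 0);
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = 0xFFu << (8 * i);
        rd |= ((ccond >> i) & 1u) ? (rs & lane) : (rt & lane);
    }
    return rd;
}
```

Chaining the two, `pick_qb(cmpu_lt_qb(rs, rt), rs, rt)`, computes an unsigned byte-wise minimum, a common pattern in video kernels.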

# Table 4.6 List of Instructions in MIPS® DSP Module in Accumulator and DSPControl Access Sub-class

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|---------------------------------------------------------------------|-----------------------|------------------------|------------------------------------|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| EXTR.W rt,ac,shift<br>EXTR_R.W rt,ac,shift<br>EXTR_RS.W rt,ac,shift | Q63                   | Q31                    | GPR                                | Misc | Extract a Q31 fractional value from the 32<br>least-significant bits of 64-bit accumulator <i>ac</i> .<br>The accumulator value may be shifted right<br>logically by <i>shift</i> bits prior to the extraction,<br>and the extracted value may be optionally<br>rounded or rounded and saturated before being<br>written to register <i>rt</i> .<br>The <i>shift</i> argument value ranges from 0 to 31.<br>The optional rounding step adds 1 at the<br>most-significant bit position discarded by the<br>shift. The optional saturation clamps the<br>extracted value to the maximum positive Q31<br>value if the rounding step results in overflow. |
| EXTR_S.H rt,ac,shift                                                | Q63                   | Q15                    | GPR                                | Misc | Extract a Q15 fractional value from the 16 least-significant bits of 64-bit accumulator <i>ac</i>. The accumulator value may be shifted right logically by <i>shift</i> bits prior to the extraction, and the extracted value is saturated before being written to register <i>rt</i>.<br>The <i>shift</i> argument value ranges from 0 to 31. The saturation clamps the extracted value to the maximum positive or minimum negative Q15 value if the shifted accumulator value cannot be represented accurately as a Q15 format value. |
| EXTRV_S.H rt,ac,rs                                                  | Q63                   | Q15                    | GPR                                | Misc | Extract a Q15 fractional value from the 16 least-significant bits of 64-bit accumulator <i>ac</i>. The accumulator value may be shifted right logically by <i>shift</i> bits prior to the extraction, and the extracted value is saturated before being written to register <i>rt</i>.<br>The <i>shift</i> argument ranges from 0 to 31 and is given by the five least-significant bits of register <i>rs</i>. The saturation clamps the extracted value to the maximum positive or minimum negative Q15 value if the shifted accumulator value cannot be represented accurately as a Q15 format value. |


| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|----------------------------------------------------------------------------|-----------------------|------------------------|------------------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| EXTRV.W rt,ac,rs<br>EXTRV_R.W rt,ac,rs<br>EXTRV_RS.W rt,ac,rs              | Q63                   | Q31                    | GPR                                | Misc            | Extract a Q31 fractional value from the 32<br>least-significant bits of 64-bit accumulator <i>ac</i> .<br>The accumulator value may be shifted right<br>logically by <i>shift</i> bits prior to the extraction,<br>and the extracted value may be optionally<br>rounded or rounded and saturated before being<br>written to register <i>rt</i> .<br>The <i>shift</i> argument value is provided by the<br>five least-significant bits of <i>rs</i> and ranges from<br>0 to 31. The optional rounding step adds 1 at<br>the most-significant bit position discarded by<br>the shift. The optional saturation clamps the<br>extracted value to the maximum positive Q31<br>value if the rounding step results in overflow. |
| EXTP rt,ac,size<br>EXTPV rt,ac,rs<br>EXTPDP rt,ac,size<br>EXTPDPV rt,ac,rs | Unsigned<br>DWord     | Unsigned<br>Word       | GPR /<br>DSPControl                | Audio/<br>Video | Extract a set of <i>size</i>+1 contiguous bits from accumulator <i>ac</i>, right-justifying and sign-extending the result to 32 bits before writing the result to register <i>rt</i>.<br>The position of the left-most bit to extract is given by the value of the <i>pos</i> field in the <i>DSPControl</i> register (see Appendix A for details). The number of bits (less one) to extract is provided either by the <i>size</i> immediate operand or by the five least-significant bits of <i>rs</i>.<br>The EXTPDP and EXTPDPV instructions also decrement the <i>pos</i> field by <i>size</i>+1 to facilitate sequential bit field extraction operations. |
| SHILO ac,shift<br>SHILOV ac,rs                                             | Unsigned<br>DWord     | Unsigned<br>DWord      | ac                                 | Misc            | Shift accumulator <i>ac</i> left or right by the specified number of bits, writing the shifted value back to the accumulator. The signed shift argument is specified either by the immediate operand <i>shift</i> or by the six least-significant bits of register <i>rs</i>. A positive shift argument results in a right shift of up to 31 bits, and a negative shift argument results in a left shift of up to 32 bits. |
| MTHLIP rs, ac                                                              | Unsigned<br>Word      | Unsigned<br>Word       | ac /<br>DSPControl                 | Audio/<br>Video | Copy the <i>LO</i> register of the specified accumulator to the <i>HI</i> register, copy <i>rs</i> to <i>LO</i>, and increment the <i>pos</i> field in <i>DSPControl</i> by 32. |
| MFHI/MFLO/MTHI/MTLO                                                        | Unsigned<br>Word      | Unsigned<br>Word       | GPR/ac                             | Misc            | Copy an unsigned word between the <i>HI</i> or <i>LO</i> register of the specified accumulator and the specified GPR. |
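
The MTHLIP row describes a three-part update that is easiest to see on a small state model. In the C sketch below, `struct acc_model` and both functions are hypothetical names introduced for illustration; <i>pos</i> is held as a plain integer, so the width and overflow behaviour of the real <i>DSPControl</i> field are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one accumulator plus the pos field. */
struct acc_model {
    uint32_t hi;   /* upper 32 bits of the accumulator */
    uint32_t lo;   /* lower 32 bits of the accumulator */
    uint32_t pos;  /* stand-in for the DSPControl pos field */
};

/* MTHLIP-style update: LO moves to HI, rs becomes the new LO,
 * and pos advances by 32 to account for the 32 new bits. */
static void mthlip(struct acc_model *s, uint32_t rs)
{
    assert(s != 0);
    s->hi = s->lo;
    s->lo = rs;
    s->pos += 32;
}

/* View the HI/LO pair as one 64-bit value. */
static uint64_t acc_value(const struct acc_model *s)
{
    return ((uint64_t)s->hi << 32) | s->lo;
}
```

Seen as a 64-bit value, the update shifts the old low word into the high half and appends a fresh word below it, which is the refill step of a bitstream reader built on the EXTPDP instructions.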


| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App  | Description |
|--------------------------|-----------------------|------------------------|------------------------------------|------|-------------|
| WRDSP rt,mask            | Unsigned<br>Word | Unsigned<br>Word | DSPControl | Misc | Overwrite specific fields in the <i>DSPControl</i> register using the corresponding bits from the specified GPR. Bits in the <i>mask</i> argument correspond to specific fields in <i>DSPControl</i>; a value of 1 causes the corresponding <i>DSPControl</i> field to be overwritten using the corresponding bits in <i>rt</i>, otherwise the field is unchanged. |
| RDDSP rt,mask            | Unsigned<br>Word                                                                                                                                                                                                                                                                             | Unsigned<br>Word       | GPR                                                                                                                                                                                                                                                                                                                                                                  | Misc | Copy the values of specific fields in the <i>DSPControl</i> register to the specified GPR.<br>Bits in the <i>mask</i> argument correspond to specific fields in <i>DSPControl</i> ; a value of 1 causes the corresponding <i>DSPControl</i> field to be copied to the corresponding bits in <i>rt</i> , otherwise the bits in <i>rt</i> are unchanged. |

# Table 4.7 List of Instructions in MIPS® DSP Module in Indexed-Load Sub-class

| Instruction<br>Mnemonics | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App | Description |
|----------------------------------------------------------------------|-----------------------|------------------------|------------------------------------|------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| LBUX rd,index(base)<br>Replaced by LBUX in<br>baseline nanoMIPS ISA. | -                     | Unsigned<br>byte       | GPR                                | Misc | Index byte load from address base+(index).<br>Loads the byte in the low-order bits of the des-<br>tination register and zero-extends the result. |
| LHX rd,index(base)<br>Replaced by LHX in<br>baseline nanoMIPS ISA.   | -                     | Signed<br>halfword     | GPR                                | Misc | Index halfword load from address<br>base+(index). Loads the halfword in the<br>low-order bits of the register and sign-extends<br>the result.    |
| LWX rd, index(base)<br>Replaced by LWX in<br>baseline nanoMIPS ISA.  | -                     | Signed<br>Word         | GPR                                | Misc | Indexed word load from address base+(index).                                                                                                     |

| Instruction<br>Mnemonics                          | Input<br>Data<br>Type | Output<br>Data<br>Type | Writes<br>GPR / ac /<br>DSPControl | App             | Description                                                            |
|---------------------------------------------------|-----------------------|------------------------|------------------------------------|-----------------|------------------------------------------------------------------------|
| BPOSGE32 offset<br>Deprecated in<br>nanoMIPS DSP. | -                     | -                      | -                                  | Audio/<br>Video | Branch if the <i>pos</i> value is greater than or equal to integer 32. |
| BPOSGE32C offset<br>Introduced in DSP-R3.         |                       |                        |                                    |                 |                                                                        |

## Table 4.8 List of Instructions in MIPS® DSP Module in Branch Sub-class

# **Chapter 5**

# **Instruction Encoding**

The opcode map for DSP instructions is under development and will be made available in a subsequent non-preliminary release.

# 5.1 Instruction Bit Encoding

This chapter describes the bit encoding tables used for the MIPS DSP ASE. Table 5.1 describes the meaning of the symbols used in the tables. These tables only list the instruction encoding for the MIPS DSP ASE instructions. See Volumes I and II of this multi-volume set for a full encoding of all instructions.

| Symbol | Meaning                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| *      | Operation or field codes marked with this symbol are reserved for future use. Executing such an instruction must cause a Reserved Instruction Exception.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| δ      | (Also <i>italic</i> field name.) Operation or field codes marked with this symbol denote a field class. The instruction word must be further decoded by examining additional tables that show values for another instruction field. |
| β      | Operation or field codes marked with this symbol represent a valid encoding for a higher-order MIPS ISA level.<br>Executing such an instruction must cause a Reserved Instruction Exception.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| θ      | Operation or field codes marked with this symbol are available to licensed MIPS partners. To avoid multiple conflicting instruction definitions, MIPS Technologies will assist the partner in selecting appropriate encodings if requested by the partner. The partner is not required to consult with MIPS Technologies when one of these encodings is used. If no instruction is encoded with this value, executing such an instruction must cause a Reserved Instruction Exception ( <i>SPECIAL2</i> encoding or coprocessor instruction encoding for a coprocessor to which access is allowed) or a Coprocessor Unusable Exception (coprocessor instruction encoding for a coprocessor to which access is not allowed). |
| σ      | Field codes marked with this symbol represent an EJTAG support instruction and implementation of this encoding is optional for each implementation. If the encoding is not implemented, executing such an instruction must cause a Reserved Instruction Exception. If the encoding is implemented, it must match the instruction encoding as shown in the table. |
| ε      | Operation or field codes marked with this symbol are reserved for MIPS Application Specific Extensions. If the ASE is not implemented, executing such an instruction must cause a Reserved Instruction Exception. |
| φ      | Operation or field codes marked with this symbol are obsolete and will be removed from a future revision of the MIPS32 ISA. Software should avoid using these operation or field codes.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ⊕      | Operation or field codes marked with this symbol are valid for Release 2 implementations of the architecture.<br>Executing such an instruction in a Release 1 implementation must cause a Reserved Instruction Exception. |

## Table 5.1 Symbols Used in the Instruction Encoding Tables

# **Chapter 6**

# The MIPS® DSP Module Instruction Set

# 6.1 Compliance and Subsetting

There are no instruction subsets allowed for the MIPS DSP Module: all instructions must be implemented with all data format types as shown. Instructions are listed in alphabetical order, with a secondary sort on data type format from narrowest to widest, i.e., quad byte, paired halfword, and word.

# 6.2 DSP Module Specific Pseudocode Functions

This section defines the pseudocode functions that are specific to the DSP Module and DSP Module Rev2. These functions are used in the Operation section of each DSP Module instruction description.

## 6.2.1 ValidateAccessToDSPResources()

The ValidateAccessToDSPResources function is used to determine if access is available to the DSP Module resources. This is done by looking at the state of the DSPP bit in *Config3* and the MX bit in the *Status* register.

#### Figure 6.1 ValidateAccessToDSPResources Pseudocode Function

```
ValidateAccessToDSPResources()
/* The function does not return if an exception is signaled */
   /* If DSP is not implemented by the processor, a Reserved */
   /* Instruction exception is signaled */
   if (Config3_{DSPP} = 0) then
      SignalException(ReservedInstruction)
   endif
   case Status_{MX} of
      /* MX off */
      1#0:
         SignalException(DSPDisabled)
      /* MX on */
      1#1:
         /* Access allowed to DSP Module resources */
   endcase
endfunction ValidateAccessToDSPResources
```

# 6.2.2 ValidateAccessToDSP2Resources()

The ValidateAccessToDSP2Resources function is used to determine if access is available to the DSP Module Rev2 resources. This is done by checking the state of the DSP2P bit (DSP Rev2 Present, bit 11 in the *Config3* CP0 register), and the MX bit in the *Status* register.

#### Figure 6.2 ValidateAccessToDSP2Resources Pseudocode Function

```
ValidateAccessToDSP2Resources()
/* The function does not return if an exception is signaled */
   /* If DSP Module Rev2 is not implemented by the processor, a */
   /* Reserved Instruction exception is signaled */
   if ((Config3_{DSP2P} = 0) or (Config3_{DSPP} = 0)) then
      SignalException(ReservedInstruction)
   endif
   case Status_{MX} of
      /* MX off */
      1#0:
         SignalException(DSPDisabled)
      /* MX on */
      1#1:
         /* Access allowed to DSP Module Rev2 resources */
   endcase
endfunction ValidateAccessToDSP2Resources
```
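
The two access checks above can be modeled in C as a single function that reports which exception, if any, would be signaled. This is only a sketch: the enum, the function name `validate_dsp_access`, and the use of plain booleans in place of the *Config3* and *Status* register fields are assumptions made for illustration and are not defined by this manual.

```c
#include <stdbool.h>

/* Outcome of the access check. These names are illustrative only. */
typedef enum {
    DSP_ACCESS_OK,
    DSP_EXC_RESERVED_INSTRUCTION,
    DSP_EXC_DSP_DISABLED
} dsp_access_t;

/*
 * dspp, dsp2p : values of the Config3 DSPP and DSP2P bits
 * mx          : value of the Status MX bit
 * need_rev2   : true models ValidateAccessToDSP2Resources(),
 *               false models ValidateAccessToDSPResources()
 */
dsp_access_t validate_dsp_access(bool dspp, bool dsp2p, bool mx, bool need_rev2)
{
    /* The presence check comes first: a missing module is a Reserved Instruction. */
    if (!dspp || (need_rev2 && !dsp2p)) {
        return DSP_EXC_RESERVED_INSTRUCTION;
    }
    /* The module is present but not enabled. */
    if (!mx) {
        return DSP_EXC_DSP_DISABLED;
    }
    return DSP_ACCESS_OK;
}
```

Note the ordering: in this model an absent module reports Reserved Instruction even when MX is clear, mirroring the order of the checks in the pseudocode.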

## ABSQ\_S.PH

Find Absolute Value of Two Fractional Halfwords

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 0001000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |

Format: ABSQ\_S.PH rt, rs (DSP)

Purpose: Find Absolute Value of Two Fractional Halfwords

Find the absolute value of each of a pair of Q15 fractional halfword values with 16-bit saturation.

**Description:**  $rt \leftarrow sat16(abs(rs_{31..16})) || sat16(abs(rs_{15..0}))$ 

For each value in the pair of Q15 fractional halfword values in register *rs*, the absolute value is found and written to the corresponding Q15 halfword in register *rt*. If either input value is the minimum Q15 value (-1.0 in decimal, 0x8000 in hexadecimal), the corresponding result is saturated to 0x7FFF.

This instruction sets bit 20 in the ouflag field of the DSPControl register if either input value was saturated.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← satAbs16( GPR[rs]_{31..16} )
tempA_{15..0} ← satAbs16( GPR[rs]_{15..0} )
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}

function satAbs16( a_{15..0} )
    if ( a_{15..0} = 0x8000 ) then
        DSPControl_{ouflag:20} ← 1
        temp_{15..0} ← 0x7FFF
    else
        if ( a_{15} = 1 ) then
            temp_{15..0} ← -a_{15..0}
        else
            temp_{15..0} ← a_{15..0}
        endif
    endif
    return temp_{15..0}
endfunction satAbs16
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled
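
As an illustration of the ABSQ\_S.PH operation, the following C sketch models the per-halfword saturating absolute value and the packing of the two results. The function names and the `dspcontrol` pointer parameter are hypothetical; the sketch assumes the access checks have already passed and models only the data path and bit 20 of the ouflag field.

```c
#include <stdint.h>

/* Bit 20 of DSPControl, within the ouflag field. */
#define ABSQ_PH_OUFLAG_BIT20 (1u << 20)

/* Saturating absolute value of one Q15 element held as a raw 16-bit pattern. */
uint16_t sat_abs16(uint16_t a, uint32_t *dspcontrol)
{
    if (a == 0x8000u) {
        /* -1.0 has no positive counterpart in Q15, so clamp and flag. */
        *dspcontrol |= ABSQ_PH_OUFLAG_BIT20;
        return 0x7FFFu;
    }
    if (a & 0x8000u) {
        /* Negative input: two's complement negation, kept to 16 bits. */
        return (uint16_t)(0u - a);
    }
    return a;
}

/* Models rt <- satAbs16(rs[31..16]) || satAbs16(rs[15..0]). */
uint32_t absq_s_ph(uint32_t rs, uint32_t *dspcontrol)
{
    uint32_t hi = sat_abs16((uint16_t)(rs >> 16), dspcontrol);
    uint32_t lo = sat_abs16((uint16_t)(rs & 0xFFFFu), dspcontrol);
    return (hi << 16) | lo;
}
```

For example, the input 0x8000C000 (the pair -1.0, -0.5) produces 0x7FFF4000 in this model and sets the flag, because only the upper element saturates.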


## ABSQ\_S.QB

Find Absolute Value of Four Fractional Byte Values

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 0000000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |

Purpose: Find Absolute Value of Four Fractional Byte Values

Find the absolute value of four fractional byte vector elements with saturation.

**Description:**  $rt \leftarrow sat8(abs(rs_{31..24})) || sat8(abs(rs_{23..16})) || sat8(abs(rs_{15..8})) || sat8(abs(rs_{7..0}))$

For each of the four Q7 fractional byte elements in register rs, the absolute value is found and written to the corresponding byte in register rt. If any input value is the minimum Q7 value (-1.0 in decimal, 0x80 in hexadecimal), the corresponding result is saturated to 0x7F.

This instruction sets bit 20 in the ouflag field of the DSPControl register if any input value was saturated.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
tempD_{7..0} ← abs8( GPR[rs]_{31..24} )
tempC_{7..0} ← abs8( GPR[rs]_{23..16} )
tempB_{7..0} ← abs8( GPR[rs]_{15..8} )
tempA_{7..0} ← abs8( GPR[rs]_{7..0} )
GPR[rt]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function abs8( a_{7..0} )
    if ( a_{7..0} = 0x80 ) then
        DSPControl_{ouflag:20} ← 1
        temp_{7..0} ← 0x7F
    else
        if ( a_{7} = 1 ) then
            temp_{7..0} ← -a_{7..0}
        else
            temp_{7..0} ← a_{7..0}
        endif
    endif
    return temp_{7..0}
endfunction abs8
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled


## ABSQ\_S.W

Find Absolute Value of Fractional Word

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 0010000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |

Format: ABSQ\_S.W rt, rs (DSP)

Purpose: Find Absolute Value of Fractional Word

Find the absolute value of a fractional Q31 value with 32-bit saturation.

**Description:**  $rt \leftarrow sat32(abs(rs_{31..0}))$ 

The absolute value of the Q31 fractional value in register rs is found and written to destination register rt. If the input value is the minimum Q31 value (-1.0 in decimal, 0x80000000 in hexadecimal), the result is saturated to 0x7FFFFFFF before being sign-extended and written to register rt.

This instruction sets bit 20 in the ouflag field of the DSPControl register if the input value was saturated.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← satAbs32( GPR[rs]_{31..0} )
GPR[rt]_{31..0} ← temp_{31..0}

function satAbs32( a_{31..0} )
    if ( a_{31..0} = 0x80000000 ) then
        DSPControl_{ouflag:20} ← 1
        temp_{31..0} ← 0x7FFFFFFF
    else
        if ( a_{31} = 1 ) then
            temp_{31..0} ← -a_{31..0}
        else
            temp_{31..0} ← a_{31..0}
        endif
    endif
    return temp_{31..0}
endfunction satAbs32
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled
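
The reason for the saturation case is easiest to see numerically: Q31 covers the range from -1.0 up to just below +1.0, so the magnitude of -1.0 cannot be represented. The C sketch below models the word operation and adds a hypothetical helper, `q31_to_double`, to view results as real numbers. The function names and the `dspcontrol` parameter are assumptions for illustration, and the conversion of a `uint32_t` bit pattern to `int32_t` relies on the usual two's complement behavior of mainstream compilers.

```c
#include <stdint.h>

/* Interpret a raw 32-bit pattern as a Q31 fraction (value / 2^31). */
double q31_to_double(uint32_t x)
{
    return (double)(int32_t)x / 2147483648.0;
}

/* Models rt <- sat32(abs(rs)); sets ouflag bit 20 when the input is -1.0. */
uint32_t absq_s_w(uint32_t rs, uint32_t *dspcontrol)
{
    if (rs == 0x80000000u) {
        *dspcontrol |= (1u << 20);
        return 0x7FFFFFFFu;      /* largest Q31 value, just below +1.0 */
    }
    if (rs & 0x80000000u) {
        return 0u - rs;          /* two's complement negation */
    }
    return rs;
}
```

In this model, 0xC0000000 (-0.5) maps to 0x40000000 (+0.5) without touching the flag, while 0x80000000 maps to 0x7FFFFFFF and sets it.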


## ADDQ[\_S].PH

Add Fractional Halfword Vectors

|           | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|-----------|----------------|--------|--------|--------|----|---------|------|
| ADDQ.PH   | P32A<br>001000 | rt     | rs     | rd     | 0  | 0000001 | 101  |
| ADDQ_S.PH | P32A<br>001000 | rt     | rs     | rd     | 1  | 0000001 | 101  |
| Bits      | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDQ[\_S].PH

ADDQ.PH rd, rs, rt (DSP)

ADDQ\_S.PH rd, rs, rt (DSP)

Purpose: Add Fractional Halfword Vectors

Element-wise addition of two vectors of Q15 fractional values to produce a vector of Q15 fractional results, with optional saturation.

**Description:**  $rd \leftarrow sat16(rs_{31..16} + rt_{31..16}) || sat16(rs_{15..0} + rt_{15..0})$ 

Each of the two fractional halfword elements in register *rt* is added to the corresponding fractional halfword element in register *rs*.

For the non-saturating version of the instruction, the result of each addition is written into the corresponding element in register *rd*. If the addition results in overflow or underflow, the result modulo $2^{16}$ (that is, the low-order 16 bits of the sum) is written to the corresponding element in register *rd*.

For the saturating version of the instruction, signed saturating arithmetic is performed, where an overflow is clamped to the largest representable value (0x7FFF hexadecimal) and an underflow to the smallest representable value (0x8000 hexadecimal) before being written to the destination register *rd*.

For each instruction, if either of the individual additions results in underflow, overflow, or saturation, a 1 is written to bit 20 in the ouflag field of the *DSPControl* register.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ADDQ.PH:
    ValidateAccessToDSPResources()
    tempB_{15..0} ← add16( GPR[rs]_{31..16} , GPR[rt]_{31..16} )
    tempA_{15..0} ← add16( GPR[rs]_{15..0} , GPR[rt]_{15..0} )
    GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}

ADDQ_S.PH:
    ValidateAccessToDSPResources()
    tempB_{15..0} ← satAdd16( GPR[rs]_{31..16} , GPR[rt]_{31..16} )
    tempA_{15..0} ← satAdd16( GPR[rs]_{15..0} , GPR[rt]_{15..0} )
    GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}

function add16( a_{15..0}, b_{15..0} )
    temp_{16..0} ← ( a_{15} || a_{15..0} ) + ( b_{15} || b_{15..0} )
    if ( temp_{16} ≠ temp_{15} ) then
        DSPControl_{ouflag:20} ← 1
    endif
    return temp_{15..0}
endfunction add16

function satAdd16( a_{15..0}, b_{15..0} )
    temp_{16..0} ← ( a_{15} || a_{15..0} ) + ( b_{15} || b_{15..0} )
    if ( temp_{16} ≠ temp_{15} ) then
        if ( temp_{16} = 0 ) then
            temp_{15..0} ← 0x7FFF
        else
            temp_{15..0} ← 0x8000
        endif
        DSPControl_{ouflag:20} ← 1
    endif
    return temp_{15..0}
endfunction satAdd16
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled
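
The difference between the wrapping and saturating variants of ADDQ[\_S].PH can be sketched in C as follows. The names `addq16` and `addq_ph`, the `saturate` flag standing in for the choice of variant, and the `dspcontrol` pointer are all hypothetical. The sketch detects overflow by computing the sum in a wider type, which is equivalent to comparing bits 16 and 15 of the 17-bit interim value in the pseudocode; the `uint16_t` to `int16_t` conversion assumes two's complement behavior.

```c
#include <stdint.h>
#include <stdbool.h>

/* One 16-bit lane. saturate=true models ADDQ_S.PH, false models ADDQ.PH. */
uint16_t addq16(uint16_t a, uint16_t b, bool saturate, uint32_t *dspcontrol)
{
    /* Exact signed sum; it needs at most 17 bits, so int32_t is ample. */
    int32_t sum = (int32_t)(int16_t)a + (int32_t)(int16_t)b;

    if (sum > INT16_MAX || sum < INT16_MIN) {
        *dspcontrol |= (1u << 20);              /* ouflag bit 20 */
        if (saturate) {
            return (uint16_t)((sum > 0) ? 0x7FFFu : 0x8000u);
        }
    }
    return (uint16_t)sum;                       /* low-order 16 bits */
}

/* Models rd <- lane(rs[31..16], rt[31..16]) || lane(rs[15..0], rt[15..0]). */
uint32_t addq_ph(uint32_t rs, uint32_t rt, bool saturate, uint32_t *dspcontrol)
{
    uint32_t hi = addq16((uint16_t)(rs >> 16), (uint16_t)(rt >> 16), saturate, dspcontrol);
    uint32_t lo = addq16((uint16_t)rs, (uint16_t)rt, saturate, dspcontrol);
    return (hi << 16) | lo;
}
```

With rs = 0x7FFF8000 and rt = 0x00018000, both lanes overflow: the wrapping form yields 0x80000000 and the saturating form yields 0x7FFF8000, and both set the flag.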

## ADDQ\_S.W

Add Fractional Words

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 1100000 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDQ\_S.W rd, rs, rt (DSP)

**Purpose:** Add Fractional Words

Addition of two Q31 fractional values to produce a Q31 fractional result, with saturation.

**Description:**  $rd \leftarrow sat32(rs_{31..0} + rt_{31..0})$ 

The Q31 fractional word in register *rt* is added to the corresponding fractional word in register *rs*. The result is then written to the destination register *rd*.

Signed saturating arithmetic is used, where an overflow is clamped to the largest representable value (0x7FFFFFFF hexadecimal) and an underflow to the smallest representable value (0x80000000 hexadecimal) before being sign-extended and written to the destination register *rd*.

If the addition results in underflow, overflow, or saturation, a 1 is written to bit 20 in the *DSPControl* register within the *ouflag* field.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← satAdd32( GPR[rs]_{31..0} , GPR[rt]_{31..0} )
GPR[rd]_{31..0} ← temp_{31..0}

function satAdd32( a_{31..0}, b_{31..0} )
    temp_{32..0} ← ( a_{31} || a_{31..0} ) + ( b_{31} || b_{31..0} )
    if ( temp_{32} ≠ temp_{31} ) then
        if ( temp_{32} = 0 ) then
            temp_{31..0} ← 0x7FFFFFFF
        else
            temp_{31..0} ← 0x80000000
        endif
        DSPControl_{ouflag:20} ← 1
    endif
    return temp_{31..0}
endfunction satAdd32
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled
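
A compact way to model the ADDQ\_S.W data path in C is to form the exact sum in a 64-bit integer and clamp it to the signed 32-bit range. This is a sketch under assumptions: the function name `addq_s_w` and the `dspcontrol` parameter are hypothetical, and the range comparison is used in place of the bit 32 versus bit 31 test in the pseudocode, to which it is equivalent.

```c
#include <stdint.h>

/* Models rd <- sat32(rs + rt); sets ouflag bit 20 when the result is clamped. */
uint32_t addq_s_w(uint32_t rs, uint32_t rt, uint32_t *dspcontrol)
{
    /* Exact signed sum; it needs at most 33 bits. */
    int64_t sum = (int64_t)(int32_t)rs + (int64_t)(int32_t)rt;

    if (sum > INT32_MAX) {
        *dspcontrol |= (1u << 20);
        return 0x7FFFFFFFu;     /* overflow clamps to the largest Q31 value */
    }
    if (sum < INT32_MIN) {
        *dspcontrol |= (1u << 20);
        return 0x80000000u;     /* underflow clamps to the smallest Q31 value */
    }
    return (uint32_t)sum;
}
```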


## ADDQH[\_R].PH

Add Fractional Halfword Vectors And Shift Right to Halve Results

|            | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|------------|----------------|--------|--------|--------|----|---------|------|
| ADDQH.PH   | P32A<br>001000 | rt     | rs     | rd     | 0  | 0001001 | 101  |
| ADDQH_R.PH | P32A<br>001000 | rt     | rs     | rd     | 1  | 0001001 | 101  |
| Bits       | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDQH[\_R].PH

ADDQH.PH rd, rs, rt (DSP-R2)

ADDQH\_R.PH rd, rs, rt (DSP-R2)

Purpose: Add Fractional Halfword Vectors And Shift Right to Halve Results

Element-wise fractional addition of halfword vectors, with a right shift by one bit to halve each result, with optional rounding.

**Description:**  $rd \leftarrow round((rs_{31..16} + rt_{31..16}) >> 1) || round((rs_{15..0} + rt_{15..0}) >> 1)$ 

Each element from the two halfword values in register *rs* is added to the corresponding halfword element in register *rt* to create an interim 17-bit result.

In the non-rounding instruction variant, each interim result is then shifted right by one bit before being written to the corresponding halfword element of destination register *rd*.

In the rounding version of the instruction, a value of 1 is added at the least-significant bit position of each interim result; the interim result is then right-shifted by one bit and written to the destination register.

This instruction does not modify the DSPControl register.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ADDQH.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} ← rightShift1AddQ16( GPR[rs]_{31..16} , GPR[rt]_{31..16} )
    tempA_{15..0} ← rightShift1AddQ16( GPR[rs]_{15..0} , GPR[rt]_{15..0} )
    GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}

ADDQH_R.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} ← roundRightShift1AddQ16( GPR[rs]_{31..16} , GPR[rt]_{31..16} )
    tempA_{15..0} ← roundRightShift1AddQ16( GPR[rs]_{15..0} , GPR[rt]_{15..0} )
    GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}

function rightShift1AddQ16( a_{15..0} , b_{15..0} )
    temp_{16..0} ← (( a_{15} || a_{15..0} ) + ( b_{15} || b_{15..0} ))
    return temp_{16..1}
endfunction rightShift1AddQ16

function roundRightShift1AddQ16( a_{15..0} , b_{15..0} )
    temp_{16..0} ← (( a_{15} || a_{15..0} ) + ( b_{15} || b_{15..0} ))
    temp_{16..0} ← temp_{16..0} + 1
    return temp_{16..1}
endfunction roundRightShift1AddQ16
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled
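
Because the sum is formed in 17 bits and then halved, the result always fits back into 16 bits, which is why no saturation or flag update is needed. The C sketch below illustrates this for ADDQH[\_R].PH. The names `addqh16` and `addqh_ph` and the `round_up` parameter are hypothetical. The shift is done on an unsigned copy of the sum so that the code extracts bits 16..1 without depending on how a particular compiler shifts negative signed values.

```c
#include <stdint.h>
#include <stdbool.h>

/* One 16-bit lane. round_up=true models ADDQH_R.PH, false models ADDQH.PH. */
uint16_t addqh16(uint16_t a, uint16_t b, bool round_up)
{
    /* 17-bit signed interim result, held in a wider type. */
    int32_t sum = (int32_t)(int16_t)a + (int32_t)(int16_t)b;
    if (round_up) {
        sum += 1;               /* add 1 at the least-significant bit */
    }
    /* Take bits 16..1 of the two's complement interim value. */
    return (uint16_t)(((uint32_t)sum >> 1) & 0xFFFFu);
}

/* Models rd <- lane(rs[31..16], rt[31..16]) || lane(rs[15..0], rt[15..0]). */
uint32_t addqh_ph(uint32_t rs, uint32_t rt, bool round_up)
{
    uint32_t hi = addqh16((uint16_t)(rs >> 16), (uint16_t)(rt >> 16), round_up);
    uint32_t lo = addqh16((uint16_t)rs, (uint16_t)rt, round_up);
    return (hi << 16) | lo;
}
```

For instance, halving 1 + 2 gives 1 without rounding and 2 with rounding, while 0x7FFF + 0x7FFF halves to 0x7FFF in both variants.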

## ADDQH[\_R].W

Add Fractional Words And Shift Right to Halve Results

|           | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|-----------|----------------|--------|--------|--------|----|---------|------|
| ADDQH.W   | P32A<br>001000 | rt     | rs     | rd     | 0  | 0010001 | 101  |
| ADDQH_R.W | P32A<br>001000 | rt     | rs     | rd     | 1  | 0010001 | 101  |
| Bits      | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDQH[\_R].W

ADDQH.W rd, rs, rt (DSP-R2)

ADDQH\_R.W rd, rs, rt (DSP-R2)

Purpose: Add Fractional Words And Shift Right to Halve Results

Fractional addition of word vectors, with a right shift by one bit to halve the result, with optional rounding.

**Description:**  $rd \leftarrow round((rs_{31..0} + rt_{31..0}) >> 1)$ 

The word in register rs is added to the word in register rt to create an interim 33-bit result.

In the non-rounding instruction variant, the interim result is then shifted right by one bit before being written to the destination register *rd*.

In the rounding version of the instruction, a value of 1 is added at the least-significant bit position of the interim result; the interim result is then right-shifted by one bit and written to the destination register.

This instruction does not modify the DSPControl register.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ADDQH.W
    ValidateAccessToDSP2Resources()
    tempA_{31..0} ← rightShift1AddQ32( GPR[rs]_{31..0} , GPR[rt]_{31..0} )
    GPR[rd]_{31..0} ← tempA_{31..0}

ADDQH_R.W
    ValidateAccessToDSP2Resources()
    tempA_{31..0} ← roundRightShift1AddQ32( GPR[rs]_{31..0} , GPR[rt]_{31..0} )
    GPR[rd]_{31..0} ← tempA_{31..0}

function rightShift1AddQ32( a_{31..0} , b_{31..0} )
    temp_{32..0} ← (( a_{31} || a_{31..0} ) + ( b_{31} || b_{31..0} ))
    return temp_{32..1}
endfunction rightShift1AddQ32

function roundRightShift1AddQ32( a_{31..0} , b_{31..0} )
    temp_{32..0} ← (( a_{31} || a_{31..0} ) + ( b_{31} || b_{31..0} ))
    temp_{32..0} ← temp_{32..0} + 1
    return temp_{32..1}
endfunction roundRightShift1AddQ32
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

ADDSC Add Signed Word and Set Carry Bit

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 1110000 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDSC rd, rs, rt (DSP)

Purpose: Add Signed Word and Set Carry Bit

Add two signed 32-bit values and set the carry bit in the DSPControl register if the addition generates a carry-out bit.

**Description:** DSPControl[c], rd ← rs + rt

The 32-bit signed value in register *rt* is added to the 32-bit signed value in register *rs*. The result is then written into register *rd*. The carry bit result out of the addition operation is written to bit 13 (the c field) of the *DSPControl* register.

This instruction does not modify the ouflag field in the DSPControl register.

#### **Restrictions:**

No data-dependent exceptions are possible.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{32..0} ← ( 0 || GPR[rs]_{31..0} ) + ( 0 || GPR[rt]_{31..0} )
DSPControl_{c:13} ← temp_{32}
GPR[rd]_{31..0} ← temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

Note that this is really two's complement (modulo) arithmetic on the two integer values, where the overflow is preserved in architectural state. The ADDWC instruction can be used to do an add using this carry bit. These instructions are provided in the MIPS32 ISA to support 64-bit addition and subtraction using two pairs of 32-bit GPRs to hold each 64-bit value. In the MIPS64 ISA, 64-bit addition and subtraction can be performed directly, without requiring the use of these instructions.
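The 64-bit addition mentioned in the note can be sketched in Python as a pair of chained operations. The function names `addsc`, `addwc`, and `add64` are hypothetical; the carry is returned explicitly rather than stored in DSPControl bit 13, and the ouflag side effect of ADDWC is left out for brevity.

```python
MASK32 = 0xFFFFFFFF

def addsc(rs, rt):
    """Hypothetical ADDSC model: return (rd, carry); carry is the bit written to DSPControl bit 13."""
    total = (rs & MASK32) + (rt & MASK32)
    return total & MASK32, total >> 32

def addwc(rs, rt, carry):
    """Hypothetical ADDWC model: rd <- rs + rt + carry (ouflag update omitted)."""
    return ((rs & MASK32) + (rt & MASK32) + carry) & MASK32

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values, each held as a (hi, lo) pair of 32-bit words."""
    lo, carry = addsc(a_lo, b_lo)   # low words first; carry-out is captured
    hi = addwc(a_hi, b_hi, carry)   # high words plus the carry-in
    return hi, lo

print(add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001))
```

The low-word addition must come first, since the high-word addition consumes the carry it produces.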

ADDU[_S].PH Unsigned Add Integer Halfwords

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| ADDU.PH        |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 0  | 0100001 | 101  |
| ADDU_S.PH      |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 1  | 0100001 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | ADDU[_S].PH |            |        |
|---------|-------------|------------|--------|
|         | ADDU.PH     | rd, rs, rt | DSP-R2 |
|         | ADDU_S.PH   | rd, rs, rt | DSP-R2 |

Purpose: Unsigned Add Integer Halfwords

Add two pairs of unsigned integer halfwords, with optional saturation.

**Description:**  $rd \leftarrow sat16(rs_{31..16} + rt_{31..16}) || sat16(rs_{15..0} + rt_{15..0})$ 

The two unsigned integer halfword elements in register *rt* are added to the corresponding unsigned integer halfword elements in register *rs*.

For the non-saturating version of the instruction, the result modulo 65,536 is written into the corresponding element in register *rd*.

For the saturating version of the instruction, the addition is performed using unsigned saturating arithmetic. Results that overflow are clamped to the largest representable value (65,535 decimal, 0xFFFF hexadecimal) before being written to the destination register *rd*.

For either instruction, if any of the individual additions result in overflow or saturation, a 1 is written to bit 20 in the *DSPControl* register within the ouflag field.
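The behavior described above can be sketched in Python. This is a reading of the prose rather than the manual's own pseudocode: `addu_ph` is a hypothetical name, and the returned boolean stands in for the write of 1 to DSPControl bit 20.

```python
def addu_ph(rs, rt, saturate=False):
    """Hypothetical model: return (rd, overflow) for ADDU.PH or ADDU_S.PH (saturate=True)."""
    rd, overflow = 0, False
    for shift in (16, 0):                  # high halfword, then low halfword
        total = ((rs >> shift) & 0xFFFF) + ((rt >> shift) & 0xFFFF)
        if total > 0xFFFF:                 # carry out of bit 15
            overflow = True                # would set ouflag bit 20 in DSPControl
            total = 0xFFFF if saturate else total & 0xFFFF
        rd |= total << shift
    return rd, overflow

print(addu_ph(0xFFFF0001, 0x00020003))
print(addu_ph(0xFFFF0001, 0x00020003, saturate=True))
```

The two variants differ only in what is stored for an overflowing element; both report the overflow.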

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

#### **Exceptions:**

Reserved Instruction, DSP Disabled

ADDU[_S].QB Unsigned Add Quad Byte Vectors

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| ADDU.QB        |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 0  | 0011001 | 101  |
| ADDU_S.QB      |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 1  | 0011001 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | ADDU[_S].QB |            |     |
|---------|-------------|------------|-----|
|         | ADDU.QB     | rd, rs, rt | DSP |
|         | ADDU_S.QB   | rd, rs, rt | DSP |

Purpose: Unsigned Add Quad Byte Vectors

Element-wise addition of two vectors of unsigned byte values to produce a vector of unsigned byte results, with optional saturation.

**Description:**  $rd \leftarrow sat8(rs_{31..24} + rt_{31..24}) || sat8(rs_{23..16} + rt_{23..16}) || sat8(rs_{15..8} + rt_{15..8}) || sat8(rs_{7..0} + rt_{7..0})$ 

The four byte elements in register rt are added to the corresponding byte elements in register rs.

For the non-saturating version of the instruction, the result modulo 256 is written into the corresponding element in register *rd*.

For the saturating version of the instruction, the addition is performed using unsigned saturating arithmetic. Results that overflow are clamped to the largest representable value (255 decimal, 0xFF hexadecimal) before being written to the destination register *rd*.

For either instruction, if any of the individual additions result in overflow or saturation, a 1 is written to bit 20 in the *DSPControl* register within the *ouflag* field.
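As a worked illustration of the modulo and saturating behaviors, here is a minimal Python sketch. The name `addu_qb` is hypothetical, and the overflow flag is returned to the caller instead of being written to DSPControl bit 20.

```python
def addu_qb(rs, rt, saturate=False):
    """Hypothetical model: return (rd, overflow) for ADDU.QB or ADDU_S.QB (saturate=True)."""
    rd, overflow = 0, False
    for shift in (24, 16, 8, 0):           # byte lanes D, C, B, A
        total = ((rs >> shift) & 0xFF) + ((rt >> shift) & 0xFF)
        if total > 0xFF:                   # bit 8 of the 9-bit sum is set
            overflow = True
            total = 0xFF if saturate else total & 0xFF
        rd |= total << shift
    return rd, overflow

# Only the top lane (0xF0 + 0x20) overflows in this example.
print(addu_qb(0xF0102030, 0x20010203))
print(addu_qb(0xF0102030, 0x20010203, saturate=True))
```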

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ADDU.QB:
    ValidateAccessToDSPResources()
    tempD_{7..0} ← addU8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} ← addU8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} ← addU8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} ← addU8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

ADDU_S.QB:
    ValidateAccessToDSPResources()
    tempD_{7..0} ← satAddU8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} ← satAddU8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} ← satAddU8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} ← satAddU8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function addU8( a_{7..0}, b_{7..0} )
    temp_{8..0} ← ( 0 || a_{7..0} ) + ( 0 || b_{7..0} )
    if ( temp_{8} = 1 ) then
        DSPControl_{ouflag:20} ← 1
    endif
    return temp_{7..0}
endfunction addU8

function satAddU8( a_{7..0}, b_{7..0} )
    temp_{8..0} ← ( 0 || a_{7..0} ) + ( 0 || b_{7..0} )
    if ( temp_{8} = 1 ) then
        temp_{7..0} ← 0xFF
        DSPControl_{ouflag:20} ← 1
    endif
    return temp_{7..0}
endfunction satAddU8
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

ADDWC Add Word with Carry Bit

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 1111000 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: ADDWC rd, rs, rt (DSP)

Purpose: Add Word with Carry Bit

Add two signed 32-bit values with the carry bit in the DSPControl register.

**Description:** rd ← rs + rt + DSPControl<sub>c:13</sub>

The 32-bit value in register *rt* is added to the 32-bit value in register *rs* and the carry bit in the *DSPControl* register. The result is then written to destination register *rd*.

If the addition results in either overflow or underflow, this instruction writes a 1 to bit 20 in the *ouflag* field of the *DSPControl* register.
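A minimal Python sketch of this description follows. It assumes that "overflow or underflow" means the signed sum falls outside the 32-bit two's-complement range; that reading, and the name `addwc`, are assumptions. The carry bit and the ouflag result are passed and returned explicitly instead of living in DSPControl.

```python
def addwc(rs, rt, carry_in):
    """Hypothetical model: return (rd, overflow); carry_in is DSPControl bit 13 (0 or 1)."""
    def s32(x):
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & 0x80000000 else x

    total = s32(rs) + s32(rt) + carry_in
    # Assumed overflow rule: the exact sum is not representable as a signed 32-bit word.
    overflow = not (-(1 << 31) <= total < (1 << 31))
    return total & 0xFFFFFFFF, overflow

print(addwc(0x7FFFFFFF, 0, 1))
```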

#### **Restrictions:**

No data-dependent exceptions are possible.

#### **Operation:**

#### **Exceptions:**

Reserved Instruction, DSP Disabled

ADDUH[_R].QB Unsigned Add Vector Quad-Bytes And Right Shift to Halve Results

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| ADDUH.QB       |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 0  | 0101001 | 101  |
| ADDUH_R.QB     |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | 1  | 0101001 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | ADDUH[_R].QB |            |        |
|---------|--------------|------------|--------|
|         | ADDUH.QB     | rd, rs, rt | DSP-R2 |
|         | ADDUH_R.QB   | rd, rs, rt | DSP-R2 |

Purpose: Unsigned Add Vector Quad-Bytes And Right Shift to Halve Results

Element-wise unsigned addition of unsigned byte vectors, with right shift by one bit to halve each result, with optional rounding.

**Description:** rd $\leftarrow$ round((rs<sub>31..24</sub> + rt<sub>31..24</sub>)>>1) || round((rs<sub>23..16</sub> + rt<sub>23..16</sub>)>>1) || round((rs<sub>15..8</sub> + rt<sub>15..8</sub>)>>1) || round((rs<sub>7..0</sub> + rt<sub>7..0</sub>)>>1)

Each element from the four unsigned byte values in register *rs* is added to the corresponding unsigned byte element in register *rt* to create an unsigned interim result.

In the non-rounding instruction variant, each interim result is then shifted right by one bit before being written to the corresponding unsigned byte element of destination register *rd*.

In the rounding version of the instruction, a value of 1 is added at the least-significant bit position of each interim result before being right-shifted by one bit and written to the destination register.

This instruction does not modify the DSPControl register.
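The per-byte halving add can be illustrated with a short Python sketch. The name `adduh_qb` is hypothetical; each lane's sum is kept at 9-bit precision before the shift, as the description requires.

```python
def adduh_qb(rs, rt, rounding=False):
    """Hypothetical model of ADDUH.QB (rounding=False) and ADDUH_R.QB (rounding=True)."""
    rd = 0
    for shift in (24, 16, 8, 0):           # byte lanes D, C, B, A
        total = ((rs >> shift) & 0xFF) + ((rt >> shift) & 0xFF)  # 9-bit interim result
        if rounding:
            total += 1                     # add 1 at the least-significant bit
        rd |= ((total >> 1) & 0xFF) << shift
    return rd

print(hex(adduh_qb(0xFF010203, 0xFF020202)))
print(hex(adduh_qb(0xFF010203, 0xFF020202, rounding=True)))
```

Even the largest case (0xFF + 0xFF + 1) halves to 0xFF, so no lane can overflow.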

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ADDUH.QB
    ValidateAccessToDSPResources()
    tempD_{7..0} ← rightShift1AddU8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} ← rightShift1AddU8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} ← rightShift1AddU8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} ← rightShift1AddU8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

ADDUH_R.QB
    ValidateAccessToDSPResources()
    tempD_{7..0} ← roundRightShift1AddU8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} ← roundRightShift1AddU8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} ← roundRightShift1AddU8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} ← roundRightShift1AddU8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function rightShift1AddU8( a_{7..0}, b_{7..0} )
    temp_{8..0} ← ( ( 0 || a_{7..0} ) + ( 0 || b_{7..0} ) )
    return temp_{8..1}
endfunction rightShift1AddU8

function roundRightShift1AddU8( a_{7..0}, b_{7..0} )
    temp_{8..0} ← ( ( 0 || a_{7..0} ) + ( 0 || b_{7..0} ) )
    temp_{8..0} ← temp_{8..0} + 1
    return temp_{8..1}
endfunction roundRightShift1AddU8
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### BALIGN

### Byte Align Contents from Two Registers

```
Format: BALIGN rt, rs, bp
EXTW rt, rs, rt, 8*(4-bp)
```

DSP-R2 Replaced with EXTW in nanoMIPS

Purpose: Byte Align Contents from Two Registers

Create a word result by combining a specified number of bytes from each of two source registers.

**Description:** rt ← (rt << 8\*bp) || (rs >> 8\*(4-bp))

The 32-bit word in register rt is left-shifted as a 32-bit value by bp byte positions, and the right-most word in register rs is right-shifted as a 32-bit value by (4-bp) byte positions. The shifted values are then or-ed together to create a 32-bit result that is written to destination register rt.

The argument bp is provided by the instruction, and is interpreted as an unsigned two-bit integer taking values between zero and three.

#### **Restrictions:**

No data-dependent exceptions are possible.

#### **Operation:**

#### **Implementation Notes:**

When bp is equal to zero, no left-shift is performed. When bp is equal to two, the result is equivalent to a PACKRL operation when the destination register is identical to the first source register. The assembler is expected to map these two variants of the BALIGN instructions to the appropriate equivalents. The only valid values of bp that the hardware must implement are when bp is equal to 1 and 3. If this instruction is passed through to the hardware with bp value equal to 0 or 2, the result is **UNPREDICTABLE**.
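The shift-and-combine step can be sketched in Python as below. The function name `balign` is hypothetical, and raising an error for bp values of 0 and 2 is a choice made for this sketch: the manual only says the hardware result is UNPREDICTABLE for those values.

```python
def balign(rt, rs, bp):
    """Hypothetical model of BALIGN for the two bp values the hardware must implement."""
    if bp not in (1, 3):
        # Sketch-level choice: bp = 0 or 2 is UNPREDICTABLE in hardware, so refuse it here.
        raise ValueError("bp must be 1 or 3")
    left = (rt << (8 * bp)) & 0xFFFFFFFF          # rt shifted left by bp bytes
    right = (rs & 0xFFFFFFFF) >> (8 * (4 - bp))   # rs shifted right by (4 - bp) bytes
    return left | right

print(hex(balign(0x11223344, 0xAABBCCDD, 1)))
print(hex(balign(0x11223344, 0xAABBCCDD, 3)))
```

With bp equal to 1 the result keeps the three low bytes of rt followed by the top byte of rs; with bp equal to 3 it keeps the low byte of rt followed by the three top bytes of rs.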

#### **Exceptions:**

BITREV Bit-Reverse Halfword

| 31..26 | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|--------|--------|--------|---------|------|------|------|
| 001000 | rt     | rs     | 0011000 | 100  | 111  | 111  |
| 6      | 5      | 5      | 7       | 3    | 3    | 3    |

Format: BITREV rt, rs (DSP)

Purpose: Bit-Reverse Halfword

To reverse the order of the bits of the least-significant halfword in the specified register.

**Description:** rt ← rs<sub>0..15</sub>

The right-most halfword value in register *rs* is bit-reversed into the right-most halfword position in the destination register *rt*. The 16 most-significant bits of the destination register are zero-filled.
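A small Python sketch makes the bit ordering concrete. The name `bitrev` is hypothetical; the loop moves bit 0 of the source halfword to bit 15 of the result, bit 1 to bit 14, and so on, and the upper 16 bits of the result stay zero.

```python
def bitrev(rs):
    """Hypothetical model of BITREV: reverse the low 16 bits of rs, zero the upper 16."""
    half = rs & 0xFFFF
    result = 0
    for _ in range(16):
        result = (result << 1) | (half & 1)   # shift in the current lowest source bit
        half >>= 1
    return result

print(hex(bitrev(0x0001)))
print(hex(bitrev(0xFFFF1234)))
```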

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

$$
\begin{split}
& \text{ValidateAccessToDSPResources()} \\
& \text{temp}_{15..0} \leftarrow \text{GPR[rs]}_{0..15} \\
& \text{GPR[rt]}_{31..0} \leftarrow 0^{16} \mid\mid \text{temp}_{15..0}
\end{split}
$$

#### **Exceptions:**

Reserved Instruction, DSP Disabled

BPOSGE32C Branch on Greater Than or Equal To Value 32 in DSPControl Pos Field

| 31..26         | 25..21 | 20..16 | 15..14 | 13..1   | 0     |
|----------------|--------|--------|--------|---------|-------|
| P32A<br>100010 | x      | 00100  | 01     | s[13:1] | s[14] |
| 6              | 5      | 5      | 2      | 13      | 1     |

Format: BPOSGE32C offset (DSP-R3)

Purpose: Branch on Greater Than or Equal To Value 32 in DSPControl Pos Field

Perform a PC-relative branch if the value of the pos field in the DSPControl register is greater than or equal to 32.

**Description:** if  $(DSPControl_{pos:5..0} \ge 32)$  then goto PC+offset

First, the *offset* argument is left-shifted by one bit to form a 17-bit signed integer value. This value is added to the address of the instruction immediately following the branch to form a target branch address. Then, if the value of the pos field of the *DSPControl* register is greater than or equal to 32, the branch is taken and execution begins from the target address.
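The target computation can be sketched in Python under two stated assumptions: the offset is taken as a 16-bit field, as the operation pseudocode for this instruction treats it, and addresses are 32 bits wide. The function name `bposge32c_target` and its parameters are hypothetical.

```python
def bposge32c_target(next_pc, offset, pos):
    """Hypothetical model: return the address executed after the branch.

    next_pc: address of the instruction immediately following the branch
    offset:  16-bit offset field (assumption taken from the operation pseudocode)
    pos:     value of the DSPControl pos field (bits 5..0)
    """
    if (pos & 0x3F) < 32:
        return next_pc & 0xFFFFFFFF        # branch not taken
    disp = (offset & 0xFFFF) << 1          # left-shift by one bit: 17-bit value
    if disp & 0x10000:                     # sign bit of the 17-bit value
        disp -= 0x20000
    return (next_pc + disp) & 0xFFFFFFFF

print(hex(bposge32c_target(0x1004, 0x0004, 32)))
print(hex(bposge32c_target(0x1004, 0xFFFE, 40)))
```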

#### **Restrictions:**

Any instruction may be placed at PC + 4, where PC is that of the branch. An exception on such an instruction does not affect CP0 CAUSE<sub>BD</sub>, and CP0 EPC is that of the instruction in the slot after the branch.

#### Availability:

This instruction is introduced by and required as of Revision 3 of the DSP Module.

#### **Operation:**

```
I:   ValidateAccessToDSPResources()
     se_offset_{GPRLEN..0} ← ( offset_{15} )^{GPRLEN-17} || offset_{15..0} || 0^{1}
     branch_condition ← ( DSPControl_{pos:5..0} >= 32 ? 1 : 0 )
I+1: if ( branch_condition = 1 ) then
         PC_{GPRLEN..0} ← PC_{GPRLEN..0} + se_offset_{GPRLEN..0}
     endif
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

With the 17-bit signed instruction offset, the conditional branch range is  $\pm 64$  Kbytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside of this range.

CMP.cond.PH Compare Vectors of Signed Integer Halfword Values

| 31..26         | 25..21 | 20..16 | 15..10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|---------|------|
| CMP.EQ.PH      |        |        |        |         |      |
| P32A<br>001000 | rt     | rs     | x      | 0000000 | 101  |
| CMP.LE.PH      |        |        |        |         |      |
| P32A<br>001000 | rt     | rs     | x      | 0010000 | 101  |
| CMP.LT.PH      |        |        |        |         |      |
| P32A<br>001000 | rt     | rs     | x      | 0001000 | 101  |
| 6              | 5      | 5      | 6      | 7       | 3    |

| Format: | CMP.cond.PH |        |     |
|---------|-------------|--------|-----|
|         | CMP.EQ.PH   | rs, rt | DSP |
|         | CMP.LT.PH   | rs, rt | DSP |
|         | CMP.LE.PH   | rs, rt | DSP |

Purpose: Compare Vectors of Signed Integer Halfword Values

Perform an element-wise comparison of two vectors of two signed integer halfwords, recording the results of the comparison in condition code bits.

**Description:** DSPControl<sub>ccond:25..24</sub>  $\leftarrow$  (rs<sub>31..16</sub> cond rt<sub>31..16</sub>) || (rs<sub>15..0</sub> cond rt<sub>15..0</sub>)

The two signed integer halfword elements in register *rs* are compared with the corresponding signed integer halfword element in register *rt*. The two 1-bit boolean comparison results are written to bits 24 and 25 of the *DSPControl* register's 4-bit condition code field. The values of the two remaining condition code bits (bits 26 through 27 of the *DSPControl* register) are **UNPREDICTABLE**.
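The comparison can be sketched in Python as follows. The names `cmp_ph`, `_s16`, and `_COND` are hypothetical; the function returns only the two defined condition bits (bit 1 for the upper halfword, bit 0 for the lower) and does not model the two UNPREDICTABLE bits.

```python
import operator

_COND = {"eq": operator.eq, "lt": operator.lt, "le": operator.le}

def _s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def cmp_ph(cond, rs, rt):
    """Hypothetical model: the 2-bit value written to DSPControl ccond bits 25..24."""
    op = _COND[cond]
    cc_b = op(_s16(rs >> 16), _s16(rt >> 16))   # upper halfwords -> bit 25
    cc_a = op(_s16(rs), _s16(rt))               # lower halfwords -> bit 24
    return (int(cc_b) << 1) | int(cc_a)

# Upper halfwords: -1 vs 1; lower halfwords: 1 vs 1.
print(bin(cmp_ph("lt", 0xFFFF0001, 0x00010001)))
```

The signed interpretation matters here: 0xFFFF compares as -1, so it is less than 1.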

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
CMP.EQ.PH
    ValidateAccessToDSPResources()
    ccB ← GPR[rs]_{31..16} EQ GPR[rt]_{31..16}
    ccA ← GPR[rs]_{15..0} EQ GPR[rt]_{15..0}
    DSPControl_{ccond:25..24} ← ccB || ccA
    DSPControl_{ccond:27..26} ← UNPREDICTABLE

CMP.LT.PH
    ValidateAccessToDSPResources()
    ccB ← GPR[rs]_{31..16} LT GPR[rt]_{31..16}
    ccA ← GPR[rs]_{15..0} LT GPR[rt]_{15..0}
    DSPControl_{ccond:25..24} ← ccB || ccA
    DSPControl_{ccond:27..26} ← UNPREDICTABLE

CMP.LE.PH
    ValidateAccessToDSPResources()
    ccB ← GPR[rs]_{31..16} LE GPR[rt]_{31..16}
    ccA ← GPR[rs]_{15..0} LE GPR[rt]_{15..0}
    DSPControl_{ccond:25..24} ← ccB || ccA
    DSPControl_{ccond:27..26} ← UNPREDICTABLE
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

CMPGDU.cond.QB Compare Unsigned Vector of Four Bytes and Write Result to GPR and DSPControl

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| CMPGDU.EQ.QB   |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 0110000 | 101  |
| CMPGDU.LE.QB   |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 1000000 | 101  |
| CMPGDU.LT.QB   |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 0111000 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | CMPGDU.cond.QB |            |        |
|---------|----------------|------------|--------|
|         | CMPGDU.EQ.QB   | rd, rs, rt | DSP-R2 |
|         | CMPGDU.LT.QB   | rd, rs, rt | DSP-R2 |
|         | CMPGDU.LE.QB   | rd, rs, rt | DSP-R2 |

Purpose: Compare Unsigned Vector of Four Bytes and Write Result to GPR and DSPControl

Compare two vectors of four unsigned bytes each, recording the comparison results in condition code bits that are written to both the specified destination GPR and the condition code bits in the DSPControl register.

**Description:** DSPControl[ccond]<sub>27..24</sub> ← (rs<sub>31..24</sub> cond rt<sub>31..24</sub>) || (rs<sub>23..16</sub> cond rt<sub>23..16</sub>) || (rs<sub>15..8</sub> cond rt<sub>15..8</sub>) || (rs<sub>7..0</sub> cond rt<sub>7..0</sub>); rd ← 0<sup>(GPRLEN-4)</sup> || DSPControl[ccond]<sub>27..24</sub>

Each of the unsigned byte elements in register *rs* is compared with the corresponding unsigned byte element in register *rt*. The four 1-bit boolean comparison results are written to the four least-significant bits of destination register *rd* and to bits 24 through 27 of the *DSPControl* register's 4-bit condition code field. The remaining bits in destination register *rd* are set to zero.
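The dual write to the GPR and to DSPControl can be sketched in Python. The name `cmpgdu_qb` is hypothetical, and DSPControl is modeled as a plain 32-bit integer that is passed in and returned.

```python
import operator

_COND = {"eq": operator.eq, "lt": operator.lt, "le": operator.le}

def cmpgdu_qb(cond, rs, rt, dspcontrol):
    """Hypothetical model: return (rd, new_dspcontrol) for CMPGDU.cond.QB."""
    op = _COND[cond]
    cc = 0
    for shift in (24, 16, 8, 0):           # lanes D, C, B, A -> bits 3, 2, 1, 0
        cc = (cc << 1) | int(op((rs >> shift) & 0xFF, (rt >> shift) & 0xFF))
    # Replace only the ccond field (bits 27..24); all other bits are preserved.
    new_dsp = (dspcontrol & 0xF0FFFFFF) | (cc << 24)
    return cc, new_dsp

rd, dsp = cmpgdu_qb("eq", 0x11223344, 0x11FF33FF, 0xFFFFFFFF)
print(bin(rd), hex(dsp))
```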

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
CMPGDU.EQ.QB
    ValidateAccessToDSP2Resources()
    ccD ← GPR[rs]_{31..24} EQ GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} EQ GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} EQ GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} EQ GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} ← ccD || ccC || ccB || ccA
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA

CMPGDU.LT.QB
    ValidateAccessToDSP2Resources()
    ccD ← GPR[rs]_{31..24} LT GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} LT GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} LT GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} LT GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} ← ccD || ccC || ccB || ccA
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA

CMPGDU.LE.QB
    ValidateAccessToDSP2Resources()
    ccD ← GPR[rs]_{31..24} LE GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} LE GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} LE GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} LE GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} ← ccD || ccC || ccB || ccA
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

CMPGU.cond.QB Compare Vectors of Unsigned Byte Values and Write Results to a GPR

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| CMPGU.EQ.QB    |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 0011000 | 101  |
| CMPGU.LE.QB    |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 0101000 | 101  |
| CMPGU.LT.QB    |        |        |        |    |         |      |
| P32A<br>001000 | rt     | rs     | rd     | x  | 0100000 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | CMPGU.cond.QB |            |     |
|---------|---------------|------------|-----|
|         | CMPGU.EQ.QB   | rd, rs, rt | DSP |
|         | CMPGU.LT.QB   | rd, rs, rt | DSP |
|         | CMPGU.LE.QB   | rd, rs, rt | DSP |

Purpose: Compare Vectors of Unsigned Byte Values and Write Results to a GPR

Perform an element-wise comparison of two vectors of unsigned bytes, recording the results of the comparison in condition code bits that are written to the specified GPR.

**Description:**  $rd \leftarrow (rs_{31..24} \text{ cond } rt_{31..24}) || (rs_{23..16} \text{ cond } rt_{23..16}) || (rs_{15..8} \text{ cond } rt_{15..8}) || (rs_{7..0} \text{ cond } rt_{7..0})$ 

Each of the unsigned byte elements in register *rs* is compared with the corresponding unsigned byte element in register *rt*. The four 1-bit boolean comparison results are written to the four least-significant bits of destination register *rd*. The remaining bits in *rd* are set to zero.
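One way a 4-bit result mask in a GPR might be used is to count how many bytes of a word equal a given value. The Python sketch below illustrates that idea; both function names are hypothetical, and the byte-replication step is an assumption of this example rather than something the instruction does itself.

```python
def cmpgu_eq_qb(rs, rt):
    """Hypothetical model of CMPGU.EQ.QB: 4-bit mask, bit 3 = top byte, bit 0 = low byte."""
    rd = 0
    for shift in (24, 16, 8, 0):
        rd = (rd << 1) | int(((rs >> shift) & 0xFF) == ((rt >> shift) & 0xFF))
    return rd

def count_matching_bytes(word, byte):
    """Count the bytes of word equal to byte, using the comparison mask."""
    pattern = (byte & 0xFF) * 0x01010101   # replicate the byte into all four lanes
    return bin(cmpgu_eq_qb(word, pattern)).count("1")

print(count_matching_bytes(0x20412020, 0x20))
```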

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
CMPGU.EQ.QB
    ValidateAccessToDSPResources()
    ccD ← GPR[rs]_{31..24} EQ GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} EQ GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} EQ GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} EQ GPR[rt]_{7..0}
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA

CMPGU.LT.QB
    ValidateAccessToDSPResources()
    ccD ← GPR[rs]_{31..24} LT GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} LT GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} LT GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} LT GPR[rt]_{7..0}
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA

CMPGU.LE.QB
    ValidateAccessToDSPResources()
    ccD ← GPR[rs]_{31..24} LE GPR[rt]_{31..24}
    ccC ← GPR[rs]_{23..16} LE GPR[rt]_{23..16}
    ccB ← GPR[rs]_{15..8} LE GPR[rt]_{15..8}
    ccA ← GPR[rs]_{7..0} LE GPR[rt]_{7..0}
    GPR[rd]_{31..0} ← 0^{(GPRLEN-4)} || ccD || ccC || ccB || ccA
```

## **Exceptions:**

CMPU.cond.QB: Compare Vectors of Unsigned Byte Values

| Instruction        | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|--------------------|----------------|--------|--------|--------|----|---------|------|
| CMPU.EQ.QB         | P32A<br>001000 | rt     | rs     | x      | x  | 1001000 | 101  |
| CMPU.LE.QB         | P32A<br>001000 | rt     | rs     | x      | x  | 1011000 | 101  |
| CMPU.LT.QB         | P32A<br>001000 | rt     | rs     | x      | x  | 1010000 | 101  |
| Field width (bits) | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

| Format: | CMPU.cond.QB      |
|---------|-------------------|
|         | CMPU.EQ.QB rs, rt |
|         | CMPU.LT.QB rs, rt |
|         | CMPU.LE.QB rs, rt |
Purpose: Compare Vectors of Unsigned Byte Values

Perform an element-wise comparison of two vectors of unsigned bytes, recording the results of the comparison in condition code bits.

**Description:** DSPControl<sub>ccond:27..24</sub>  $\leftarrow$  (rs<sub>31..24</sub> cond rt<sub>31..24</sub>) || (rs<sub>23..16</sub> cond rt<sub>23..16</sub>) || (rs<sub>15..8</sub> cond rt<sub>15..8</sub>) || (rs<sub>7..0</sub> cond rt<sub>7..0</sub>)

Each of the unsigned byte elements in register *rs* is compared with the corresponding unsigned byte element in register *rt*. The four 1-bit boolean comparison results are written to bits 24 through 27 of the *DSPControl* register's 4-bit condition code field.
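A minimal Python sketch of how the four result bits land in the *DSPControl* condition code field follows. The names `cmpu_qb`, `CCOND_SHIFT` and `CCOND_MASK` are hypothetical, and the sketch assumes that all *DSPControl* bits outside 27..24 are left unchanged, which the description above does not state explicitly.

```python
import operator

CCOND_SHIFT = 24                 # ccond field occupies DSPControl bits 27..24
CCOND_MASK = 0xF << CCOND_SHIFT

# Hypothetical mapping from condition suffix to an unsigned comparison.
_CONDITIONS = {"EQ": operator.eq, "LT": operator.lt, "LE": operator.le}


def cmpu_qb(rs: int, rt: int, cond: str, dspcontrol: int) -> int:
    """Illustrative model of CMPU.cond.QB: returns the updated DSPControl value."""
    compare = _CONDITIONS[cond]
    cc = 0
    for shift in (24, 16, 8, 0):  # ccD first, ccA last
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        cc = (cc << 1) | int(compare(a, b))
    # Assumption: bits outside the ccond field are preserved.
    return (dspcontrol & ~CCOND_MASK & 0xFFFFFFFF) | (cc << CCOND_SHIFT)
```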

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
CMPU.EQ.QB
    ValidateAccessToDSPResources()
    ccD \leftarrow GPR[rs]_{31..24} EQ GPR[rt]_{31..24}
    ccC \leftarrow GPR[rs]_{23..16} EQ GPR[rt]_{23..16}
    ccB \leftarrow GPR[rs]_{15..8} EQ GPR[rt]_{15..8}
    ccA \leftarrow GPR[rs]_{7..0} EQ GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} \leftarrow ccD || ccC || ccB || ccA
CMPU.LT.QB
    ValidateAccessToDSPResources()
    ccD \leftarrow GPR[rs]_{31..24} LT GPR[rt]_{31..24}
    ccC \leftarrow GPR[rs]_{23..16} LT GPR[rt]_{23..16}
    ccB \leftarrow GPR[rs]_{15..8} LT GPR[rt]_{15..8}
    ccA \leftarrow GPR[rs]_{7..0} LT GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} \leftarrow ccD || ccC || ccB || ccA
CMPU.LE.QB
    ValidateAccessToDSPResources()
    ccD \leftarrow GPR[rs]_{31..24} LE GPR[rt]_{31..24}
    ccC \leftarrow GPR[rs]_{23..16} LE GPR[rt]_{23..16}
    ccB \leftarrow GPR[rs]_{15..8} LE GPR[rt]_{15..8}
    ccA \leftarrow GPR[rs]_{7..0} LE GPR[rt]_{7..0}
    DSPControl_{ccond:27..24} \leftarrow ccD || ccC || ccB || ccA
```

## **Exceptions:**

DPA.W.PH: Dot Product with Accumulate on Vector Integer Halfword Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 00     | 000   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPA.W.PH ac, rs, rt

DSP-R2

Purpose: Dot Product with Accumulate on Vector Integer Halfword Elements

Generate the dot-product of two integer halfword vector elements using full-size intermediate products and then accumulate into the specified accumulator register.

**Description:** ac  $\leftarrow$  ac + ((rs<sub>31..16</sub> \* rt<sub>31..16</sub>) + (rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

Each of the two halfword integer values from register rt is multiplied with the corresponding halfword element from register rs to create two integer word results. These two products are summed to generate a dot-product result, which is then accumulated into the specified 64-bit HI/LO accumulator, creating a 64-bit integer result.

The value of ac selects an accumulator numbered from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

This instruction does not set any bits of the ouflag field in the DSPControl register.
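The arithmetic above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the helper names `sign_extend` and `dpa_w_ph` are hypothetical, the accumulator is passed as a single unsigned 64-bit value standing for HI||LO, and the halfwords are treated as signed, following the sign extension of the products in the Operation pseudocode.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dpa_w_ph(acc: int, rs: int, rt: int) -> int:
    """Illustrative model of DPA.W.PH: returns the new 64-bit HI||LO value."""
    prod_b = sign_extend(rs >> 16, 16) * sign_extend(rt >> 16, 16)  # left halfwords
    prod_a = sign_extend(rs, 16) * sign_extend(rt, 16)              # right halfwords
    # No saturation and no ouflag update: the sum simply wraps modulo 2**64.
    return (acc + prod_b + prod_a) & MASK64
```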

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
tempB_{31..0} \leftarrow (GPR[rs]_{31..16} * GPR[rt]_{31..16})
tempA_{31..0} \leftarrow (GPR[rs]_{15..0} * GPR[rt]_{15..0})
dotp_{32..0} \leftarrow (tempB_{31} || tempB_{31..0}) + (tempA_{31} || tempA_{31..0})
acc_{63..0} \leftarrow (HI[ac]_{31..0} || LO[ac]_{31..0}) + ((dotp_{32})^{31} || dotp_{32..0})
(HI[ac]_{31..0} || LO[ac]_{31..0}) \leftarrow acc_{63..32} || acc_{31..0}
```

#### **Exceptions:**

DPAQ\_S.W.PH: Dot Product with Accumulation on Fractional Halfword Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 00     | 001   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAQ\_S.W.PH ac, rs, rt

Purpose: Dot Product with Accumulation on Fractional Halfword Elements

Element-wise multiplication of two vectors of fractional halfword elements and accumulation of the 32-bit intermediate products into the specified 64-bit accumulator register, with saturation.

**Description:** ac  $\leftarrow$  ac + (sat32(rs<sub>31..16</sub> \* rt<sub>31..16</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

Each of the two Q15 fractional halfword values from register *rt* is multiplied with the corresponding halfword value from register *rs*, and each result is left-shifted by one bit position to generate two Q31 fractional format intermediate products. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is accumulated into the specified 64-bit *HI/LO* accumulator to produce a final Q32.31 fractional result.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of a halfword multiplication, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the ouflag field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
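As an illustration of the Q15 multiplication, saturation and flag behavior described above, here is a hedged Python sketch. The function names `mul_q15_q15` and `dpaq_s_w_ph` are hypothetical, the accumulator is modeled as one unsigned 64-bit HI||LO value, and *DSPControl* is passed in and returned explicitly rather than held as machine state.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_q15_q15(a: int, b: int):
    """Return (Q31 product, saturated?) for two Q15 halfwords."""
    if (a & 0xFFFF) == 0x8000 and (b & 0xFFFF) == 0x8000:
        return 0x7FFFFFFF, True  # -1.0 * -1.0 saturates to the maximum positive Q31
    return (sign_extend(a, 16) * sign_extend(b, 16)) << 1, False


def dpaq_s_w_ph(acc: int, rs: int, rt: int, ac: int, dspcontrol: int):
    """Illustrative model of DPAQ_S.W.PH: returns (new HI||LO, new DSPControl)."""
    prod_b, sat_b = mul_q15_q15(rs >> 16, rt >> 16)  # left halfwords
    prod_a, sat_a = mul_q15_q15(rs, rt)              # right halfwords
    if sat_a or sat_b:
        dspcontrol |= 1 << (16 + ac)                 # ouflag bit for accumulator ac
    # The accumulation itself is not saturated; it wraps modulo 2**64.
    return (acc + prod_b + prod_a) & MASK64, dspcontrol
```

For instance, multiplying 0.5 by 0.5 and 0.5 by 0.25 (`rs = 0x40004000`, `rt = 0x40002000`) adds `0x20000000 + 0x10000000` to the accumulator and leaves the ouflag bits untouched.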

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{31..16}, GPR[rt]_{31..16} )
tempA_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{15..0}, GPR[rt]_{15..0} )
dotp_{63..0} \leftarrow ( (tempB_{31})^{32} || tempB_{31..0} ) + ( (tempA_{31})^{32} || tempA_{31..0} )
tempC_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow tempC_{63..32} || tempC_{31..0}

function multiplyQ15Q15( acc_{1..0}, a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} \leftarrow 0x7FFFFFFF
        DSPControl_{ouflag:16+acc} \leftarrow 1
    else
        temp_{31..0} \leftarrow ( a_{15..0} * b_{15..0} ) << 1
    endif
    return temp_{31..0}
endfunction multiplyQ15Q15
```

#### **Exceptions**:

DPAQ\_SA.L.W: Dot Product with Accumulate on Fractional Word Element

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 01     | 001   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAQ\_SA.L.W ac, rs, rt

DSP

Purpose: Dot Product with Accumulate on Fractional Word Element

Multiplication of two fractional word elements, accumulating the product to the specified 64-bit accumulator register, with saturation.

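**Description:** ac  $\leftarrow$  sat64(ac + sat32(rs<sub>31..0</sub> \* rt<sub>31..0</sub>))

The Q31 fractional word values from registers *rs* and *rt* are multiplied together and the result is left-shifted by one bit position to generate a Q63 fractional format intermediate product. If both multiplicands are equal to -1.0 (0x80000000 hexadecimal), the intermediate product is saturated to the maximum positive Q63 fractional value.

The intermediate product is then added to the specified 64-bit *HI/LO* accumulator, creating a Q63 fractional result. If the accumulation results in overflow or underflow, the accumulator is saturated to either the maximum positive Q63 fractional value (0x7FFFFFFFFFFFFFFF hexadecimal) or the minimum negative Q63 fractional value (0x8000000000000000 hexadecimal), respectively.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.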

If saturation occurs, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the ouflag field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
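The two saturation points, one in the multiplication and one in the accumulation, can be sketched in Python as follows. This is an illustrative model only: the name `dpaq_sa_l_w` and the constants are hypothetical, the accumulator is one unsigned 64-bit HI||LO value, and the saturation limits are taken to be the largest and smallest two's-complement 64-bit values.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
Q63_MAX = (1 << 63) - 1   # 0x7FFFFFFFFFFFFFFF
Q63_MIN = -(1 << 63)      # 0x8000000000000000 as a bit pattern


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dpaq_sa_l_w(acc: int, rs: int, rt: int, ac: int, dspcontrol: int):
    """Illustrative model of DPAQ_SA.L.W: returns (new HI||LO, new DSPControl)."""
    flag = 1 << (16 + ac)
    if (rs & MASK32) == 0x80000000 and (rt & MASK32) == 0x80000000:
        product = Q63_MAX          # -1.0 * -1.0 is not representable in Q63
        dspcontrol |= flag
    else:
        product = (sign_extend(rs, 32) * sign_extend(rt, 32)) << 1
    total = sign_extend(acc, 64) + product
    if total > Q63_MAX:            # overflow: clamp to the maximum positive value
        total, dspcontrol = Q63_MAX, dspcontrol | flag
    elif total < Q63_MIN:          # underflow: clamp to the minimum negative value
        total, dspcontrol = Q63_MIN, dspcontrol | flag
    return total & MASK64, dspcontrol
```

Python integers are unbounded, so the sketch can compare the exact sum against the limits directly; the pseudocode achieves the same effect with a 65-bit temporary.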

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
dotp_{63..0} \leftarrow multiplyQ31Q31( ac, GPR[rs]_{31..0}, GPR[rt]_{31..0} )
temp_{64..0} \leftarrow HI[ac]_{31} || HI[ac]_{31..0} || LO[ac]_{31..0}
temp_{64..0} \leftarrow temp_{64..0} + dotp_{63..0}
if ( temp_{64} \neq temp_{63} ) then
    if ( temp_{64} = 1 ) then
        temp_{63..0} \leftarrow 0x8000000000000000
    else
        temp_{63..0} \leftarrow 0x7FFFFFFFFFFFFFFF
    endif
    DSPControl_{ouflag:16+ac} \leftarrow 1
endif
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow temp_{63..32} || temp_{31..0}

function multiplyQ31Q31( acc_{1..0}, a_{31..0}, b_{31..0} )
    if (( a_{31..0} = 0x80000000 ) and ( b_{31..0} = 0x80000000 )) then
        temp_{63..0} \leftarrow 0x7FFFFFFFFFFFFFFF
        DSPControl_{ouflag:16+acc} \leftarrow 1
    else
        temp_{63..0} \leftarrow ( a_{31..0} * b_{31..0} ) << 1
    endif
    return temp_{63..0}
endfunction multiplyQ31Q31
```

## **Exceptions:**

DPAQX\_S.W.PH: Cross Dot Product with Accumulation on Fractional Halfword Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 10     | 001   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAQX\_S.W.PH ac, rs, rt

**Purpose:** Cross Dot Product with Accumulation on Fractional Halfword Elements

Element-wise cross multiplication of two vectors of fractional halfword elements and accumulation of the 32-bit intermediate products into the specified 64-bit accumulator register, with saturation.

**Description:** ac  $\leftarrow$  ac + (sat32(rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>31..16</sub>))

The left Q15 fractional halfword value from register rt is multiplied with the right halfword element from register rs, and the result is left-shifted by one bit position to generate a Q31 fractional format intermediate product. Similarly, the right Q15 fractional halfword value from register rt is multiplied with the left halfword element from register rs, and the result is left-shifted by one bit position to generate a second Q31 fractional format intermediate product. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is accumulated into the specified 64-bit *HI/LO* accumulator to produce a final Q32.31 fractional result.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of a halfword multiplication, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the ouflag field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{31..16}, GPR[rt]_{15..0} )
tempA_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{15..0}, GPR[rt]_{31..16} )
dotp_{63..0} \leftarrow ( (tempB_{31})^{32} || tempB_{31..0} ) + ( (tempA_{31})^{32} || tempA_{31..0} )
tempC_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow tempC_{63..32} || tempC_{31..0}

function multiplyQ15Q15( acc_{1..0}, a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} \leftarrow 0x7FFFFFFF
        DSPControl_{ouflag:16+acc} \leftarrow 1
    else
        temp_{31..0} \leftarrow ( a_{15..0} * b_{15..0} ) << 1
    endif
    return temp_{31..0}
endfunction multiplyQ15Q15
```

#### **Exceptions:**

DPAQX\_SA.W.PH: Cross Dot Product with Accumulation on Fractional Halfword Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 11     | 001   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAQX\_SA.W.PH ac, rs, rt

DSP-R2

Purpose: Cross Dot Product with Accumulation on Fractional Halfword Elements

Element-wise cross multiplication of two vectors of fractional halfword elements and accumulation of the 32-bit intermediate products into the specified 64-bit accumulator register, with saturation of the accumulator.

**Description:** ac  $\leftarrow$  sat32(ac + (sat32(rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>31..16</sub>)))

The left Q15 fractional halfword value from register rt is multiplied with the right halfword element from register rs, and the result is left-shifted by one bit position to generate a Q31 fractional format intermediate product. Similarly, the right Q15 fractional halfword value from register rt is multiplied with the left halfword element from register rs, and the result is left-shifted by one bit position to generate a second Q31 fractional format intermediate product. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is accumulated into the specified 64-bit *HI/LO* accumulator to produce a Q32.31 fractional result. If this result is larger than or equal to +1.0, or smaller than -1.0, it is saturated to the Q31 range.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of halfword multiplication or accumulation, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the ouflag field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
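The cross pairing of halfwords and the final clamp to the Q31 range can be sketched in Python. The sketch is illustrative, with hypothetical names (`mul_q15_q15`, `dpaqx_sa_w_ph`); it models the accumulator as one unsigned 64-bit HI||LO value and applies the clamp to the wrapped 64-bit sum, as the Operation pseudocode does.

```python
MASK64 = (1 << 64) - 1
Q31_MAX = (1 << 31) - 1   # 0x7FFFFFFF
Q31_MIN = -(1 << 31)      # sign-extended 0x80000000


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_q15_q15(a: int, b: int):
    """Return (Q31 product, saturated?) for two Q15 halfwords."""
    if (a & 0xFFFF) == 0x8000 and (b & 0xFFFF) == 0x8000:
        return Q31_MAX, True
    return (sign_extend(a, 16) * sign_extend(b, 16)) << 1, False


def dpaqx_sa_w_ph(acc: int, rs: int, rt: int, ac: int, dspcontrol: int):
    """Illustrative model of DPAQX_SA.W.PH: returns (new HI||LO, new DSPControl)."""
    flag = 1 << (16 + ac)
    prod_b, sat_b = mul_q15_q15(rs >> 16, rt)   # left of rs with right of rt
    prod_a, sat_a = mul_q15_q15(rs, rt >> 16)   # right of rs with left of rt
    if sat_a or sat_b:
        dspcontrol |= flag
    total = sign_extend(acc + prod_b + prod_a, 64)  # wrapped 64-bit sum
    if total > Q31_MAX:
        total, dspcontrol = Q31_MAX, dspcontrol | flag
    elif total < Q31_MIN:
        total, dspcontrol = Q31_MIN, dspcontrol | flag
    return total & MASK64, dspcontrol
```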

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{31..16}, GPR[rt]_{15..0} )
tempA_{31..0} \leftarrow multiplyQ15Q15( ac, GPR[rs]_{15..0}, GPR[rt]_{31..16} )
dotp_{63..0} \leftarrow ( (tempB_{31})^{32} || tempB_{31..0} ) + ( (tempA_{31})^{32} || tempA_{31..0} )
tempC_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
if ( tempC_{63} = 0 ) and ( tempC_{62..31} \neq 0 ) then
    tempC_{63..0} \leftarrow 0^{32} || 0x7FFFFFFF
    DSPControl_{ouflag:16+ac} \leftarrow 1
endif
if ( tempC_{63} = 1 ) and ( tempC_{62..31} \neq 1^{32} ) then
    tempC_{63..0} \leftarrow 1^{32} || 0x80000000
    DSPControl_{ouflag:16+ac} \leftarrow 1
endif
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow tempC_{63..32} || tempC_{31..0}

function multiplyQ15Q15( acc_{1..0}, a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} \leftarrow 0x7FFFFFFF
        DSPControl_{ouflag:16+acc} \leftarrow 1
    else
        temp_{31..0} \leftarrow ( a_{15..0} * b_{15..0} ) << 1
    endif
    return temp_{31..0}
endfunction multiplyQ15Q15
```

# **Exceptions:**

DPAU.H.QBL: Dot Product with Accumulate on Vector Unsigned Byte Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 10     | 000   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAU.H.QBL ac, rs, rt

Purpose: Dot Product with Accumulate on Vector Unsigned Byte Elements

Element-wise multiplication of the two left-most elements of the four elements of each of two vectors of unsigned bytes, accumulating the sum of the products into the specified 64-bit accumulator register.

**Description:** ac  $\leftarrow$  ac + zero\_extend((rs<sub>31..24</sub> \* rt<sub>31..24</sub>) + (rs<sub>23..16</sub> \* rt<sub>23..16</sub>))

The two left-most elements of the four unsigned byte elements of each of registers *rt* and *rs* are multiplied together using unsigned arithmetic to generate two 16-bit unsigned intermediate products. The intermediate products are then zero-extended to 64 bits and accumulated into the specified 64-bit *HI/LO* accumulator.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

This instruction does not set any bits in the ouflag field in the DSPControl register.
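A short Python sketch of the unsigned byte dot product follows. It is illustrative only: the function name `dpau_h_qbl` is hypothetical and the accumulator is modeled as a single unsigned 64-bit HI||LO value.

```python
MASK64 = (1 << 64) - 1


def dpau_h_qbl(acc: int, rs: int, rt: int) -> int:
    """Illustrative model of DPAU.H.QBL: returns the new 64-bit HI||LO value."""
    # Two left-most byte lanes, multiplied with unsigned arithmetic.
    prod_b = ((rs >> 24) & 0xFF) * ((rt >> 24) & 0xFF)  # bits 31..24
    prod_a = ((rs >> 16) & 0xFF) * ((rt >> 16) & 0xFF)  # bits 23..16
    # Products are at most 255 * 255, so each fits in 16 bits; they are
    # zero-extended and added, and the sum wraps modulo 2**64.
    return (acc + prod_b + prod_a) & MASK64
```

The two right-most bytes of each operand do not contribute, which is why `0xFFFF1234` and `0xFFFF5678` produce the same result as `0xFFFF0000` and `0xFFFF0000`.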

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} \leftarrow multiplyU8U8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
tempA_{15..0} \leftarrow multiplyU8U8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
dotp_{63..0} \leftarrow ( 0^{48} || tempB_{15..0} ) + ( 0^{48} || tempA_{15..0} )
tempC_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow tempC_{63..32} || tempC_{31..0}

function multiplyU8U8( a_{7..0}, b_{7..0} )
    temp_{17..0} \leftarrow ( 0 || a_{7..0} ) * ( 0 || b_{7..0} )
    return temp_{15..0}
endfunction multiplyU8U8
```

## **Exceptions:**

DPAU.H.QBR: Dot Product with Accumulate on Vector Unsigned Byte Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 11     | 000   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAU.H.QBR ac, rs, rt

DSP

Purpose: Dot Product with Accumulate on Vector Unsigned Byte Elements

Element-wise multiplication of the two right-most elements of the four elements of each of two vectors of unsigned bytes, accumulating the sum of the products into the specified 64-bit accumulator register.

**Description:** ac  $\leftarrow$  ac + zero\_extend((rs<sub>15..8</sub> \* rt<sub>15..8</sub>) + (rs<sub>7..0</sub> \* rt<sub>7..0</sub>))

The two right-most elements of the four unsigned byte elements of each of registers *rt* and *rs* are multiplied together using unsigned arithmetic to generate two 16-bit unsigned intermediate products. The intermediate products are then zero-extended to 64 bits and accumulated into the specified 64-bit *HI/LO* accumulator.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

This instruction does not set any bits in the ouflag field in the DSPControl register.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} \leftarrow multiplyU8U8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
tempA_{15..0} \leftarrow multiplyU8U8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
dotp_{63..0} \leftarrow ( 0^{48} || tempB_{15..0} ) + ( 0^{48} || tempA_{15..0} )
tempC_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow tempC_{63..32} || tempC_{31..0}
```

## **Exceptions:**

DPAX.W.PH: Cross Dot Product with Accumulate on Vector Integer Halfword Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 01     | 000   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPAX.W.PH ac, rs, rt

Purpose: Cross Dot Product with Accumulate on Vector Integer Halfword Elements

Generate the cross dot-product of two integer halfword vector elements using full-size intermediate products and then accumulate into the specified accumulator register.

**Description:** ac  $\leftarrow$  ac + ((rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + (rs<sub>15..0</sub> \* rt<sub>31..16</sub>))

The left halfword integer value from register rt is multiplied with the right halfword element from register rs to create an integer word result. Similarly, the right halfword integer value from register rt is multiplied with the left halfword element from register rs to create the second integer word result. These two products are summed to generate the dot-product result, which is then accumulated into the specified 64-bit HI/LO accumulator, creating a 64-bit integer result.

The value of ac selects an accumulator numbered from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

This instruction will not set any bits of the ouflag field in the DSPControl register.
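The only difference from a straight dot product is the pairing of the halfwords, which a brief Python sketch can make concrete. The names `s16` and `dpax_w_ph` are hypothetical, the halfwords are assumed to be signed (following the sign extension in the Operation pseudocode), and the accumulator is one unsigned 64-bit HI||LO value.

```python
MASK64 = (1 << 64) - 1


def s16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dpax_w_ph(acc: int, rs: int, rt: int) -> int:
    """Illustrative model of DPAX.W.PH: returns the new 64-bit HI||LO value."""
    cross_b = s16(rs >> 16) * s16(rt)   # left of rs with right of rt
    cross_a = s16(rs) * s16(rt >> 16)   # right of rs with left of rt
    return (acc + cross_b + cross_a) & MASK64
```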

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} \leftarrow (GPR[rs]_{31..16} * GPR[rt]_{15..0})
tempA_{31..0} \leftarrow (GPR[rs]_{15..0} * GPR[rt]_{31..16})
dotp_{32..0} \leftarrow ((tempB_{31}) || tempB_{31..0}) + ((tempA_{31}) || tempA_{31..0})
acc_{63..0} \leftarrow (HI[ac]_{31..0} || LO[ac]_{31..0}) + ((dotp_{32})^{31} || dotp_{32..0})
(HI[ac]_{31..0} || LO[ac]_{31..0}) \leftarrow acc_{63..32} || acc_{31..0}
```

### **Exceptions:**

DPS.W.PH: Dot Product with Subtract on Vector Integer Half-Word Elements

| Bits               | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Encoding           | P32A<br>001000 | rt     | rs     | ac     | 00     | 010   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPS.W.PH ac, rs, rt

DSP-R2

Purpose: Dot Product with Subtract on Vector Integer Half-Word Elements

Generate the dot-product of two integer halfword vector elements using full-size intermediate products and then subtract from the specified accumulator register.

**Description:** ac  $\leftarrow$  ac - ((rs<sub>31..16</sub> \* rt<sub>31..16</sub>) + (rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

Each of the two halfword integer values from register rt is multiplied with the corresponding halfword element from register rs to create two integer word results. These two products are summed to generate the dot-product result, which is then subtracted from the specified 64-bit *HI/LO* accumulator, creating a 64-bit integer result.

The value of ac selects an accumulator numbered from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

This instruction will not set any bits of the ouflag field in the DSPControl register.
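To show how the subtraction propagates from LO into HI, the following Python sketch models the accumulator as two separate 32-bit registers. The names `s16` and `dps_w_ph` and the `(hi, lo)` calling convention are assumptions made for the example, and the halfwords are treated as signed.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def s16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dps_w_ph(hi: int, lo: int, rs: int, rt: int):
    """Illustrative model of DPS.W.PH: returns the new (HI, LO) pair."""
    dotp = s16(rs >> 16) * s16(rt >> 16) + s16(rs) * s16(rt)
    acc = ((hi & MASK32) << 32) | (lo & MASK32)  # HI || LO as one 64-bit value
    acc = (acc - dotp) & MASK64                  # wraps modulo 2**64, no saturation
    return acc >> 32, acc & MASK32
```

Subtracting a dot product of 23 from a zero accumulator borrows through all 64 bits, giving `HI = 0xFFFFFFFF` and `LO = 0xFFFFFFE9`.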

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSP2Resources()
tempB<sub>31..0</sub> ← ( GPR[rs]<sub>31..16</sub> * GPR[rt]<sub>31..16</sub> )
tempA<sub>31..0</sub> ← ( GPR[rs]<sub>15..0</sub> * GPR[rt]<sub>15..0</sub> )
dotp<sub>32..0</sub> ← ( (tempB<sub>31</sub>) || tempB<sub>31..0</sub> ) + ( (tempA<sub>31</sub>) || tempA<sub>31..0</sub> )
acc<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - ( (dotp<sub>32</sub>)<sup>31</sup> || dotp<sub>32..0</sub> )
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← acc<sub>63..32</sub> || acc<sub>31..0</sub>
```

#### **Exceptions:**


# DPSQ\_S.W.PH Dot Product with Subtraction on Fractional Halfword Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 00     | 011   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSQ\_S.W.PH ac, rs, rt

Purpose: Dot Product with Subtraction on Fractional Halfword Elements

Element-wise multiplication of two vectors of fractional halfword elements and subtraction of the accumulated 32-bit intermediate products from the specified 64-bit accumulator register, with saturation.

**Description:** ac  $\leftarrow$  ac - (sat32(rs<sub>31..16</sub> \* rt<sub>31..16</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

Each of the two Q15 fractional halfword values from register *rt* is multiplied with the corresponding halfword element from register *rs*, and each result is left-shifted by one bit position to generate two Q31 fractional format intermediate products. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is subtracted from the specified 64-bit *HI/LO* accumulator to produce a final Q32.31 fractional result.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of a halfword multiplication, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the ouflag field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
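
As an illustration of the saturating Q15 multiply and the ouflag update described above, here is a minimal C sketch. The names `dspcontrol`, `mul_q15_q15`, and `dpsq_s_w_ph` are hypothetical stand-ins for the DSPControl register, the per-element multiply, and the instruction itself; the 64-bit `acc` argument models the selected HI/LO pair.

```c
#include <stdint.h>

static uint32_t dspcontrol;   /* hypothetical stand-in for DSPControl */

/* Q15 x Q15 -> Q31; saturates only when both inputs are -1.0 (0x8000). */
static int32_t mul_q15_q15(unsigned ac, int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        dspcontrol |= 1u << (16 + ac);   /* ouflag bit for accumulator ac */
        return INT32_MAX;                /* 0x7FFFFFFF */
    }
    return (int32_t)a * (int32_t)b * 2;  /* product shifted left by one bit */
}

/* Hypothetical model of DPSQ_S.W.PH; the accumulator itself is not saturated. */
static int64_t dpsq_s_w_ph(int64_t acc, unsigned ac, uint32_t rs, uint32_t rt)
{
    int32_t temp_b = mul_q15_q15(ac, (int16_t)(rs >> 16), (int16_t)(rt >> 16));
    int32_t temp_a = mul_q15_q15(ac, (int16_t)(rs & 0xFFFF), (int16_t)(rt & 0xFFFF));
    int64_t dotp = (int64_t)temp_b + (int64_t)temp_a;
    return (int64_t)((uint64_t)acc - (uint64_t)dotp);
}
```

With both left halfwords equal to 0x8000 and ac = 1, the left product saturates to 0x7FFFFFFF and bit 17 of the modeled DSPControl value is set.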

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>31..16</sub>, GPR[rt]<sub>31..16</sub> )
tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>15..0</sub>, GPR[rt]<sub>15..0</sub> )
dotp<sub>63..0</sub> ← ( (tempB<sub>31</sub>)<sup>32</sup> || tempB<sub>31..0</sub> ) + ( (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub> )
tempC<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - dotp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempC<sub>63..32</sub> || tempC<sub>31..0</sub>
```

#### **Exceptions:**


# DPSQ\_SA.L.W Dot Product with Subtraction on Fractional Word Element

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 01     | 011   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSQ\_SA.L.W ac, rs, rt

Purpose: Dot Product with Subtraction on Fractional Word Element

Multiplication of two fractional word elements, subtracting the accumulated product from the specified 64-bit accumulator register, with saturation.

**Description:** ac  $\leftarrow$  sat64(ac - sat32(rs<sub>31..0</sub> \* rt<sub>31..0</sub>))

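The Q31 fractional word values in registers *rt* and *rs* are multiplied together and the result is left-shifted by one bit position to generate a 64-bit Q63 fractional format intermediate product. If both multiplicands are equal to -1.0 (0x80000000 hexadecimal), the intermediate product is saturated to the maximum positive Q63 fractional value (0x7FFFFFFFFFFFFFFF hexadecimal).

The intermediate product is then subtracted from the specified 64-bit *HI/LO* accumulator, creating a Q63 fractional result. If the accumulation results in overflow or underflow, the accumulator is saturated to the maximum positive Q63 fractional value (0x7FFFFFFFFFFFFFFF hexadecimal) or the minimum negative Q63 fractional value (0x8000000000000000 hexadecimal), respectively.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.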

If saturation occurs, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
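
The 64-bit saturating subtraction can be sketched in C without a 65-bit temporary by testing operand and result signs. This is only an illustration: `dspcontrol`, `mul_q31_q31`, and `dpsq_sa_l_w` are hypothetical names, and because the body of multiplyQ31Q31 is not shown in this section, the multiply below assumes it mirrors multiplyQ15Q15 at word width.

```c
#include <stdint.h>

static uint32_t dspcontrol;   /* hypothetical stand-in for DSPControl */

/* Assumed Q31 x Q31 -> Q63 multiply, saturating when both inputs are -1.0. */
static int64_t mul_q31_q31(unsigned ac, int32_t a, int32_t b)
{
    if (a == INT32_MIN && b == INT32_MIN) {
        dspcontrol |= 1u << (16 + ac);
        return INT64_MAX;
    }
    return (int64_t)a * (int64_t)b * 2;
}

/* Hypothetical model of DPSQ_SA.L.W; acc stands in for HI[ac] || LO[ac]. */
static int64_t dpsq_sa_l_w(int64_t acc, unsigned ac, int32_t rs, int32_t rt)
{
    int64_t dotp = mul_q31_q31(ac, rs, rt);
    uint64_t diff = (uint64_t)acc - (uint64_t)dotp;   /* wraps modulo 2^64 */

    /* Signed overflow: operands differ in sign and the result's sign
       differs from the accumulator's sign. */
    if (((acc ^ dotp) & (acc ^ (int64_t)diff)) < 0) {
        dspcontrol |= 1u << (16 + ac);
        return acc < 0 ? INT64_MIN : INT64_MAX;
    }
    return (int64_t)diff;
}
```

A negative accumulator that underflows clamps to 0x8000000000000000, and a positive one that overflows clamps to 0x7FFFFFFFFFFFFFFF, matching the two branches of the Operation pseudocode.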

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
dotp<sub>63..0</sub> ← multiplyQ31Q31( ac, GPR[rs]<sub>31..0</sub>, GPR[rt]<sub>31..0</sub> )
temp<sub>64..0</sub> ← HI[ac]<sub>31</sub> || HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub>
temp<sub>64..0</sub> ← temp - dotp<sub>63..0</sub>
if ( temp<sub>64</sub> ≠ temp<sub>63</sub> ) then
    if ( temp<sub>64</sub> = 1 ) then
        temp<sub>63..0</sub> ← 0x8000000000000000
    else
        temp<sub>63..0</sub> ← 0x7FFFFFFFFFFFFFFF
    endif
    DSPControl<sub>ouflag:16+ac</sub> ← 1
endif
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← temp<sub>63..32</sub> || temp<sub>31..0</sub>
```

#### **Exceptions:**


# DPSQX\_S.W.PH Cross Dot Product with Subtraction on Fractional Halfword Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 10     | 011   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSQX\_S.W.PH ac, rs, rt DSP

Purpose: Cross Dot Product with Subtraction on Fractional Halfword Elements

Element-wise cross multiplication of two vectors of fractional halfword elements and subtraction of the accumulated 32-bit intermediate products from the specified 64-bit accumulator register, with saturation.

**Description:** ac  $\leftarrow$  ac - (sat32(rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>31..16</sub>))

The left Q15 fractional halfword value from register *rt* is multiplied with the right halfword element from register *rs*, and the result is left-shifted by one bit position to generate a Q31 fractional format intermediate product. Similarly, the right Q15 fractional halfword value from register *rt* is multiplied with the left halfword element from register *rs*, and the result is left-shifted by one bit position to generate a second Q31 fractional format intermediate product. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is subtracted from the specified 64-bit *HI/LO* accumulator to produce a final Q32.31 fractional result.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of a halfword multiplication, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>31..16</sub>, GPR[rt]<sub>15..0</sub> )
tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>15..0</sub>, GPR[rt]<sub>31..16</sub> )
dotp<sub>63..0</sub> ← ( (tempB<sub>31</sub>)<sup>32</sup> || tempB<sub>31..0</sub> ) + ( (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub> )
tempC<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - dotp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempC<sub>63..32</sub> || tempC<sub>31..0</sub>

function multiplyQ15Q15( acc<sub>1..0</sub>, a<sub>15..0</sub>, b<sub>15..0</sub> )
    if ( a<sub>15..0</sub> = 0x8000 ) and ( b<sub>15..0</sub> = 0x8000 ) then
        temp<sub>31..0</sub> ← 0x7FFFFFFF
        DSPControl<sub>ouflag:16+acc</sub> ← 1
    else
        temp<sub>31..0</sub> ← ( a<sub>15..0</sub> * b<sub>15..0</sub> ) << 1
    endif
    return temp<sub>31..0</sub>
endfunction multiplyQ15Q15
```

#### **Exceptions:**


# DPSQX\_SA.W.PH Cross Dot Product with Subtraction on Fractional Halfword Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 11     | 011   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSQX\_SA.W.PH ac, rs, rt DSP-R2

Purpose: Cross Dot Product with Subtraction on Fractional Halfword Elements

Element-wise cross multiplication of two vectors of fractional halfword elements and subtraction of the accumulated 32-bit intermediate products from the specified 64-bit accumulator register, with saturation of the accumulator.

**Description:** ac  $\leftarrow$  sat32(ac - (sat32(rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + sat32(rs<sub>15..0</sub> \* rt<sub>31..16</sub>)))

The left Q15 fractional halfword value from register *rt* is multiplied with the right halfword element from register *rs*, and the result is left-shifted by one bit position to generate a Q31 fractional format intermediate product. Similarly, the right Q15 fractional halfword value from register *rt* is multiplied with the left halfword element from register *rs*, and the result is left-shifted by one bit position to generate a second Q31 fractional format intermediate product. If both multiplicands for either of the multiplications are equal to -1.0 (0x8000 hexadecimal), the resulting intermediate product is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal).

The two intermediate products are then sign-extended and summed to generate a 64-bit, Q32.31 fractional format dot-product result that is subtracted from the specified 64-bit *HI/LO* accumulator to produce a Q32.31 fractional result. If this result is larger than or equal to +1.0, or smaller than -1.0, it is saturated to the Q31 range.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs as a result of halfword multiplication or accumulation, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
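
The accumulator clamp to the Q31 range can be sketched on its own in C. This is an illustration only: `dspcontrol` and `clamp_acc_q31` are hypothetical names, and the argument `t` stands for the 64-bit difference between the accumulator and the dot product, computed beforehand.

```c
#include <stdint.h>

static uint32_t dspcontrol;   /* hypothetical stand-in for DSPControl */

/* Clamp a 64-bit intermediate result to the Q31 range and record
   saturation in the ouflag bit of accumulator ac (bits 16..19). */
static int64_t clamp_acc_q31(int64_t t, unsigned ac)
{
    if (t > INT32_MAX) {              /* result >= +1.0 */
        dspcontrol |= 1u << (16 + ac);
        return INT32_MAX;             /* 0x000000007FFFFFFF */
    }
    if (t < INT32_MIN) {              /* result < -1.0 */
        dspcontrol |= 1u << (16 + ac);
        return INT32_MIN;             /* 0xFFFFFFFF80000000 */
    }
    return t;
}
```

The two comparisons are intended to match the bit tests on bits 63 and 62..31 of the intermediate value: a non-negative value with any of bits 62..31 set is at least 2^31, and a negative value with any of those bits clear is below -2^31.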

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>31..16</sub>, GPR[rt]<sub>15..0</sub> )
tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>15..0</sub>, GPR[rt]<sub>31..16</sub> )
dotp<sub>63..0</sub> ← ( (tempB<sub>31</sub>)<sup>32</sup> || tempB<sub>31..0</sub> ) + ( (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub> )
tempC<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - dotp<sub>63..0</sub>
if ( tempC<sub>63</sub> = 0 ) and ( tempC<sub>62..31</sub> ≠ 0 ) then
    tempC<sub>63..0</sub> ← 0<sup>32</sup> || 0x7FFFFFFF
    DSPControl<sub>ouflag:16+ac</sub> ← 1
endif
if ( tempC<sub>63</sub> = 1 ) and ( tempC<sub>62..31</sub> ≠ 1<sup>32</sup> ) then
    tempC<sub>63..0</sub> ← 1<sup>32</sup> || 0x80000000
    DSPControl<sub>ouflag:16+ac</sub> ← 1
endif
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempC<sub>63..32</sub> || tempC<sub>31..0</sub>

function multiplyQ15Q15( acc<sub>1..0</sub>, a<sub>15..0</sub>, b<sub>15..0</sub> )
    if ( a<sub>15..0</sub> = 0x8000 ) and ( b<sub>15..0</sub> = 0x8000 ) then
        temp<sub>31..0</sub> ← 0x7FFFFFFF
        DSPControl<sub>ouflag:16+acc</sub> ← 1
    else
        temp<sub>31..0</sub> ← ( a<sub>15..0</sub> * b<sub>15..0</sub> ) << 1
    endif
    return temp<sub>31..0</sub>
endfunction multiplyQ15Q15
```

# **Exceptions:**

# DPSU.H.QBL Dot Product with Subtraction on Vector Unsigned Byte Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 10     | 010   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSU.H.QBL ac, rs, rt DSP

Purpose: Dot Product with Subtraction on Vector Unsigned Byte Elements

Element-wise multiplication of the two left-most elements of the four elements of each of two vectors of unsigned bytes, subtracting the sum of the products from the specified 64-bit accumulator register.

**Description:** ac  $\leftarrow$  ac - zero\_extend((rs<sub>31..24</sub> \* rt<sub>31..24</sub>) + (rs<sub>23..16</sub> \* rt<sub>23..16</sub>))

The two left-most elements of the four unsigned byte elements of each of registers rt and rs are multiplied together using unsigned arithmetic to generate two 16-bit unsigned intermediate products. The intermediate products are then zero-extended to 64 bits and subtracted from the specified 64-bit *HI/LO* accumulator. The result of the subtraction is written back to the specified 64-bit *HI/LO* accumulator.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

This instruction does not set any bits in the *ouflag* field in the DSPControl register.
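
A minimal C sketch of the unsigned byte dot product follows. It is an illustration under stated assumptions: `dpsu_h_qbl` is a hypothetical name, and the 64-bit `acc` argument stands in for the selected HI/LO pair.

```c
#include <stdint.h>

/* Hypothetical model of DPSU.H.QBL; acc stands in for HI[ac] || LO[ac]. */
static uint64_t dpsu_h_qbl(uint64_t acc, uint32_t rs, uint32_t rt)
{
    /* Unsigned 8x8 -> 16-bit products of the two left-most byte pairs. */
    uint32_t temp_b = ((rs >> 24) & 0xFF) * ((rt >> 24) & 0xFF);
    uint32_t temp_a = ((rs >> 16) & 0xFF) * ((rt >> 16) & 0xFF);

    /* Zero-extend, sum, and subtract modulo 2^64; ouflag is untouched. */
    return acc - ((uint64_t)temp_b + (uint64_t)temp_a);
}
```

The two right-most bytes of each operand do not take part, so changing bits 15..0 of rs or rt leaves the result unchanged.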

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB<sub>15..0</sub> ← multiplyU8U8( GPR[rs]<sub>31..24</sub>, GPR[rt]<sub>31..24</sub> )
tempA<sub>15..0</sub> ← multiplyU8U8( GPR[rs]<sub>23..16</sub>, GPR[rt]<sub>23..16</sub> )
dotp<sub>63..0</sub> ← ( 0<sup>48</sup> || tempB<sub>15..0</sub> ) + ( 0<sup>48</sup> || tempA<sub>15..0</sub> )
tempC<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - dotp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempC<sub>63..32</sub> || tempC<sub>31..0</sub>
```

## **Exceptions:**


# DPSU.H.QBR Dot Product with Subtraction on Vector Unsigned Byte Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 11     | 010   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSU.H.QBR ac, rs, rt DSP

Purpose: Dot Product with Subtraction on Vector Unsigned Byte Elements

Element-wise multiplication of the two right-most elements of the four elements of each of two vectors of unsigned bytes, subtracting the sum of the products from the specified 64-bit accumulator register.

**Description:** ac  $\leftarrow$  ac - zero\_extend((rs<sub>15..8</sub> \* rt<sub>15..8</sub>) + (rs<sub>7..0</sub> \* rt<sub>7..0</sub>))

The two right-most elements of the four unsigned byte elements of each of registers *rt* and *rs* are multiplied together using unsigned arithmetic to generate two 16-bit unsigned intermediate products. The intermediate products are then zero-extended to 64 bits and subtracted from the specified 64-bit *HI/LO* accumulator.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

This instruction does not set any bits in the *ouflag* field in the DSPControl register.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB<sub>15..0</sub> ← multiplyU8U8( GPR[rs]<sub>15..8</sub>, GPR[rt]<sub>15..8</sub> )
tempA<sub>15..0</sub> ← multiplyU8U8( GPR[rs]<sub>7..0</sub>, GPR[rt]<sub>7..0</sub> )
dotp<sub>63..0</sub> ← ( 0<sup>48</sup> || tempB<sub>15..0</sub> ) + ( 0<sup>48</sup> || tempA<sub>15..0</sub> )
tempC<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - dotp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempC<sub>63..32</sub> || tempC<sub>31..0</sub>
```

### **Exceptions:**


# DPSX.W.PH Cross Dot Product with Subtract on Vector Integer Halfword Elements

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 01     | 010   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: DPSX.W.PH ac, rs, rt

Purpose: Cross Dot Product with Subtract on Vector Integer Halfword Elements

Generate the cross dot-product of two integer halfword vector elements using full-size intermediate products and then subtract from the specified accumulator register.

**Description:** ac  $\leftarrow$  ac - ((rs<sub>31..16</sub> \* rt<sub>15..0</sub>) + (rs<sub>15..0</sub> \* rt<sub>31..16</sub>))

The left halfword integer value from register rt is multiplied with the right halfword element from register rs to create an integer word result. Similarly, the right halfword integer value from register rt is multiplied with the left halfword element from register rs to create the second integer word result. These two products are summed to generate the dot-product result, which is then subtracted from the specified 64-bit HI/LO accumulator, creating a 64-bit integer result.

The value of ac selects an accumulator numbered from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

This instruction will not set any bits of the *ouflag* field in the DSPControl register.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} \leftarrow (GPR[rs]_{31..16} * GPR[rt]_{15..0})
tempA_{31..0} \leftarrow (GPR[rs]_{15..0} * GPR[rt]_{31..16})
dotp_{32..0} \leftarrow ((tempB_{31}) || tempB_{31..0}) + ((tempA_{31}) || tempA_{31..0})
acc_{63..0} \leftarrow (HI[ac]_{31..0} || LO[ac]_{31..0}) - ((dotp_{32})^{31} || dotp_{32..0})
(HI[ac]_{31..0} || LO[ac]_{31..0}) \leftarrow acc_{63..32} || acc_{31..0}
```

#### **Exceptions:**


# EXTP Extract Fixed Bitfield From Arbitrary Position in Accumulator to GPR

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | size   | ac     | 10     | 011   | 001  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTP rt, ac, size DSP

Purpose: Extract Fixed Bitfield From Arbitrary Position in Accumulator to GPR

Extract *size*+1 contiguous bits from a 64-bit accumulator from a position specified in the DSPControl register, writing the bits to a GPR with zero-extension.

**Description:** rt ← zero\_extend(ac<sub>pos..pos-size</sub>)

A set of *size*+1 contiguous bits are extracted from an arbitrary position in accumulator *ac*, zero-extended to 32 bits, and then written to register rt.

The bit position, *start\_pos*, of the first bit of the contiguous set to extract is specified by the pos field in bits 0 through 5 of the DSPControl register. The position of the last bit in the set is *start\_pos* - *size*, where *size* is specified in the instruction.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, accumulator *ac* remains unmodified.

If $start\_pos - (size + 1) \ge -1$, the extraction is valid; otherwise the extraction is invalid and is said to have failed. The value of the destination register is **UNPREDICTABLE** when the extraction is invalid. Upon an invalid extraction this instruction writes a 1 to bit 14, the Extract Failed Indicator (EFI) bit of the DSPControl register; upon a valid extraction it writes a 0.

The values of bits 0 to 5 in the pos field of the *DSPControl* register are unchanged by this instruction.
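
The validity test and the bitfield selection can be sketched in C as below. This is a minimal illustration, not the architectural definition: `extp` is a hypothetical name, the accumulator is passed as one 64-bit value, and a return value of -1 is this sketch's own way of signaling a failed extraction (the point at which hardware would set EFI and leave rt UNPREDICTABLE).

```c
#include <stdint.h>

/* Hypothetical model of EXTP. Returns the zero-extended field,
   or -1 when the extraction is invalid. */
static int64_t extp(uint64_t acc, unsigned pos, unsigned size)
{
    pos &= 0x3F;    /* DSPControl pos field is 6 bits wide */
    size &= 0x1F;   /* size field of the instruction is 5 bits wide */

    /* start_pos - (size + 1) >= -1 is the same as start_pos >= size. */
    if (pos < size)
        return -1;

    uint64_t mask = ((uint64_t)1 << (size + 1)) - 1;
    return (int64_t)((acc >> (pos - size)) & mask);
}
```

For example, with pos = 15 and size = 7 the call returns bits 15..8 of the accumulator, while pos = 3 with size = 7 fails because fewer than eight bits are available at and below bit 3.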

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become UNPREDICTABLE.

## **Operation:**

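```
ValidateAccessToDSPResources()
start_pos<sub>5..0</sub> ← DSPControl<sub>pos:5..0</sub>
if ( start_pos - (size+1) >= -1 ) then
    temp<sub>size..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> )<sub>start_pos..start_pos-size</sub>
    temp<sub>31..0</sub> ← 0<sup>(32-(size+1))</sup> || temp<sub>size..0</sub>
    GPR[rt]<sub>31..0</sub> ← temp<sub>31..0</sub>
    DSPControl<sub>EFI:14</sub> ← 0
else
    DSPControl<sub>EFI:14</sub> ← 1
    GPR[rt] ← UNPREDICTABLE
endif
```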

#### **Exceptions:**


# EXTPDP Extract Fixed Bitfield From Arbitrary Position in Accumulator to GPR and Decrement Pos

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | size   | ac     | 11     | 011   | 001  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTPDP rt, ac, size DSP

Purpose: Extract Fixed Bitfield From Arbitrary Position in Accumulator to GPR and Decrement Pos

Extract *size*+1 contiguous bits from a 64-bit accumulator from a position specified in the DSPControl register, writing the bits to a GPR with zero-extension and modifying the extraction position.

**Description:** rt ← zero\_extend(ac<sub>pos..pos-size</sub>); DSPControl<sub>pos:5..0</sub> -= (size+1)

A set of *size*+1 contiguous bits are extracted from an arbitrary position in accumulator *ac*, zero-extended to 32 bits, then written to register rt.

The bit position, *start\_pos*, of the first bit of the contiguous set to extract is specified by the *pos* field in bits 0 through 5 of the DSPControl register. The position of the last bit in the extracted set is *start\_pos* - *size*, where the *size* argument is specified in the instruction.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, accumulator ac remains unmodified.

If $start\_pos - (size + 1) \ge -1$, the extraction is valid and the value of the pos field in the DSPControl register is decremented by size+1. Otherwise, the extraction is invalid and is said to have failed. The value of the destination register is **UNPREDICTABLE** when the extraction is invalid, and the value of the pos field in the DSPControl register (bits 0 through 5) is not modified.

Upon an invalid extraction this instruction writes a 1 to bit 14, the Extract Failed Indicator (EFI) bit of the DSPControl register, and 0 otherwise.
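
Because pos is decremented after each valid extraction, successive EXTPDP instructions walk down the accumulator field by field. The C sketch below illustrates this under stated assumptions: `dsp_pos`, `dsp_efi`, and `extpdp` are hypothetical names modeling the pos field, the EFI bit, and the instruction; the return value 0 on failure is a placeholder for an UNPREDICTABLE result; and wrapping pos to 63 when pos equals size is what 6-bit arithmetic yields, assumed here rather than stated in the text.

```c
#include <stdint.h>

static unsigned dsp_pos;   /* hypothetical model of DSPControl pos (6 bits) */
static unsigned dsp_efi;   /* hypothetical model of DSPControl EFI (bit 14) */

/* Hypothetical model of EXTPDP; acc stands in for HI[ac] || LO[ac]. */
static uint32_t extpdp(uint64_t acc, unsigned size)
{
    size &= 0x1F;
    if (dsp_pos < size) {          /* start_pos - (size + 1) < -1 */
        dsp_efi = 1;               /* pos is left unmodified */
        return 0;                  /* placeholder: UNPREDICTABLE */
    }
    uint64_t mask = ((uint64_t)1 << (size + 1)) - 1;
    uint32_t field = (uint32_t)((acc >> (dsp_pos - size)) & mask);
    dsp_pos = (dsp_pos - (size + 1)) & 0x3F;   /* 6-bit decrement */
    dsp_efi = 0;
    return field;
}
```

Starting from pos = 31 with size = 7, two calls on an accumulator holding 0xAABBCCDD return 0xAA and then 0xBB, leaving pos at 15.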

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become UNPREDICTABLE.

#### **Operation:**

```
ValidateAccessToDSPResources()
start_pos<sub>5..0</sub> ← DSPControl<sub>pos:5..0</sub>
if ( start_pos - (size+1) >= -1 ) then
    temp<sub>size..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> )<sub>start_pos..start_pos-size</sub>
    GPR[rt] ← 0<sup>(GPRLEN-(size+1))</sup> || temp<sub>size..0</sub>
    DSPControl<sub>pos:5..0</sub> ← DSPControl<sub>pos:5..0</sub> - (size + 1)
    DSPControl<sub>EFI:14</sub> ← 0
else
    DSPControl<sub>EFI:14</sub> ← 1
    GPR[rt] ← UNPREDICTABLE
endif
```

## **Exceptions:**


# EXTPDPV Extract Variable Bitfield From Arbitrary Position in Accumulator to GPR and Decrement Pos

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 11     | 100   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTPDPV rt, ac, rs DSP

Purpose: Extract Variable Bitfield From Arbitrary Position in Accumulator to GPR and Decrement Pos

Extract a fixed number of contiguous bits from a 64-bit accumulator from a position specified in the *DSPControl* register, writing the bits to a GPR with zero-extension and modifying the extraction position.

**Description:** rt ← zero\_extend(ac<sub>pos..pos-GPR[rs][4:0]</sub>); DSPControl<sub>pos:5..0</sub> -= (GPR[rs]<sub>4..0</sub>+1)

A fixed number of contiguous bits are extracted from an arbitrary position in accumulator ac, zero-extended to 32 bits, then written to destination register rt. The number of bits extracted is size+1, where size is specified by the five least-significant bits in register rs, interpreted as a five-bit unsigned integer. The remaining bits in register rs are ignored.

The bit position, *start\_pos*, of the first bit of the contiguous set to extract is specified by the pos field in bits 0 through 5 of the *DSPControl* register. The position of the last bit in the extracted set is *start\_pos* - *size*.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, accumulator ac remains unmodified.

If $start\_pos - (size + 1) \ge -1$, the extraction is valid and the value of the pos field in the *DSPControl* register is decremented by size+1. Otherwise, the extraction is invalid and is said to have failed. The value of the destination register is **UNPREDICTABLE** when the extraction is invalid, and the value of the pos field in the *DSPControl* register (bits 0 through 5) is not modified.

Upon an invalid extraction this instruction writes a 1 to bit 14, the Extract Failed Indicator (EFI) bit of the *DSPControl* register, and 0 otherwise.
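The behavior described above can be sketched as a small Python model. This is an illustration only, not the manual's Operation pseudocode: the function name `extpdpv`, the argument layout, the use of `None` to stand in for an UNPREDICTABLE result, and the modulo-64 wrap of the 6-bit pos field are all assumptions made for the example.

```python
def extpdpv(acc, rs, dspcontrol):
    """Hypothetical model of EXTPDPV.

    acc        -- 64-bit accumulator value (HI[ac] || LO[ac]) as an unsigned int
    rs         -- value of GPR[rs]
    dspcontrol -- 32-bit DSPControl value
    Returns (rt, new_dspcontrol); rt is None when the extraction fails,
    because the manual leaves GPR[rt] UNPREDICTABLE in that case.
    """
    POS_MASK = 0x3F      # pos field, DSPControl bits 5..0
    EFI_BIT = 1 << 14    # Extract Failed Indicator, DSPControl bit 14

    start_pos = dspcontrol & POS_MASK
    size = rs & 0x1F     # only the five least-significant bits of rs are used

    if start_pos - (size + 1) >= -1:
        # Bits start_pos..start_pos-size, zero-extended.
        field = (acc >> (start_pos - size)) & ((1 << (size + 1)) - 1)
        # Assumption: the 6-bit pos field wraps modulo 64 on decrement.
        new_pos = (start_pos - (size + 1)) & POS_MASK
        new_ctl = (dspcontrol & ~POS_MASK & ~EFI_BIT & 0xFFFFFFFF) | new_pos
        return field, new_ctl

    # Invalid extraction: pos is left unchanged and EFI is set.
    return None, dspcontrol | EFI_BIT


# Extract 8 bits (size = 7) starting at bit 15, then at the new position.
rt, ctl = extpdpv(0x0123456789ABCDEF, 7, 15)
print(hex(rt), ctl)
```

With pos = 15 and size = 7 the model returns bits 15..8 of the accumulator and leaves pos at 7, so a second call with the same size returns bits 7..0.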

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

#### **Exceptions:**


### EXTPV Extract Variable Bitfield From Arbitrary Position in Accumulator to GPR

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 10     | 100   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTPV rt, ac, rs (DSP)

Purpose: Extract Variable Bitfield From Arbitrary Position in Accumulator to GPR

Extract a variable number of contiguous bits from a 64-bit accumulator from a position specified in the *DSPControl* register, writing the bits to a GPR with zero-extension.

**Description:** rt ← zero\_extend(ac<sub>pos..pos-GPR[rs][4:0]</sub>)

A variable number of contiguous bits are extracted from an arbitrary position in accumulator ac, zero-extended to 32 bits, then written to register rt. The number of bits extracted is size+1, where size is specified by the five least-significant bits in register rs, interpreted as a five-bit unsigned integer. The remaining bits in register rs are ignored.

The position of the first bit of the contiguous set to extract, *start\_pos*, is specified by the pos field in bits 0 through 5 of the *DSPControl* register. The position of the last bit in the contiguous set is *start\_pos* - *size*.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, accumulator ac remains unmodified.

An extraction is valid if $start\_pos - (size + 1) \ge -1$; otherwise, the extraction is invalid and is said to have failed. The value of the destination register is **UNPREDICTABLE** when the extraction is invalid. Upon an invalid extraction this instruction writes a 1 to bit 14, the Extract Failed Indicator (EFI) bit of the *DSPControl* register, and 0 otherwise.

The values of bits 0 to 5 in the pos field of the DSPControl register are unchanged by this instruction.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
start_pos_{5..0} ← DSPControl_{pos:5..0}
size_{4..0} ← GPR[rs]_{4..0}
if ( start_pos - (size+1) >= -1 ) then
    temp_{size..0} ← ( HI[ac]_{31..0} || LO[ac]_{31..0} )_{start_pos..start_pos-size}
    GPR[rt] ← 0^{(GPRLEN-(size+1))} || temp_{size..0}
    DSPControl_{EFI:14} ← 0
else
    DSPControl_{EFI:14} ← 1
    GPR[rt] ← UNPREDICTABLE
endif
```

## **Exceptions:**


### EXTR[\_RS].W Extract Word Value With Right Shift From Accumulator to GPR

| Instruction        | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| EXTR.W             | P32A<br>001000 | rt     | shift  | ac     | 00     | 111   | 001  | 111  | 111  |
| EXTR_R.W           | P32A<br>001000 | rt     | shift  | ac     | 01     | 111   | 001  | 111  | 111  |
| EXTR_RS.W          | P32A<br>001000 | rt     | shift  | ac     | 10     | 111   | 001  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTR[\_RS].W<br>EXTR.W rt, ac, shift (DSP)<br>EXTR\_R.W rt, ac, shift (DSP)<br>EXTR\_RS.W rt, ac, shift (DSP)

Purpose: Extract Word Value With Right Shift From Accumulator to GPR

Extract a word value from a 64-bit accumulator to a GPR with right shift, and with optional rounding or rounding and saturation.

**Description:** rt ← sat32(round(ac >> shift))

The value in accumulator ac is shifted right by shift bits with sign extension (arithmetic shift right). The 32 least-significant bits of the shifted value are then written to the destination register rt.

The rounding variant of the instruction adds a 1 at the most-significant discarded bit position. The 32 least-significant bits of the rounded result are then written to the destination register.

The rounding and saturating variant of the instruction adds a 1 at the most-significant discarded bit position. If the rounding operation results in an overflow, the shifted value is clamped to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal). The rounded and saturated result is then written to the destination register.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, ac remains unmodified.

For all variants of the instruction, including EXTR.W, bit 23 of the DSPControl register is set to 1 if either of the rounded or non-rounded calculation results in overflow or saturation.
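As a rough illustration of the three variants, the following Python sketch models the 65-bit intermediate value (the shifted accumulator plus one guard bit holding the most-significant discarded bit). The names `_shift_acc`, `_overflow`, `extr_w`, and the `variant` argument are hypothetical, and the function returns the ouflag decision as a boolean rather than updating a modeled *DSPControl* register.

```python
MASK65 = (1 << 65) - 1


def _shift_acc(acc, shift):
    """Arithmetic right shift of the 64-bit accumulator, keeping one guard bit.

    Bit 0 of the result is the most-significant discarded bit (0 when shift = 0).
    """
    signed = acc - (1 << 64) if acc & (1 << 63) else acc
    return ((signed << 1) >> shift) & MASK65


def _overflow(temp):
    """True when bits 64..32 are neither all zeros nor all ones."""
    top = temp >> 32                      # 33 bits wide
    return top != 0 and top != 0x1FFFFFFFF


def extr_w(acc, shift, variant=""):
    """variant: "" for EXTR.W, "r" for EXTR_R.W, "rs" for EXTR_RS.W.

    Returns (rt, ouflag), where ouflag tells whether DSPControl bit 23
    would be set by the instruction.
    """
    temp = _shift_acc(acc, shift & 0x1F)
    ouflag = _overflow(temp)
    plain = (temp >> 1) & 0xFFFFFFFF      # result without rounding

    temp = (temp + 1) & MASK65            # add 1 at the guard-bit position
    rounded_overflow = _overflow(temp)
    ouflag = ouflag or rounded_overflow   # flagged for all variants
    result = (temp >> 1) & 0xFFFFFFFF

    if variant == "":
        return plain, ouflag
    if variant == "rs" and rounded_overflow:
        # Saturate according to the sign of the 65-bit intermediate.
        result = 0x7FFFFFFF if (temp >> 64) == 0 else 0x80000000
    return result, ouflag


print(extr_w(3, 1), extr_w(3, 1, "r"))    # 1.5 truncates to 1, rounds to 2
```

Note that the model checks overflow on both the unrounded and the rounded value even for the plain variant, matching the statement above that bit 23 is set if either calculation overflows.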

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are UNPREDICTABLE and the values of the operand vectors become UNPREDICTABLE.

#### **Operation:**

```
EXTR.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, shift )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif

EXTR_R.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, shift )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}

EXTR_RS.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, shift )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        if ( temp_{64} = 0 ) then
            temp_{32..1} ← 0x7FFFFFFF
        else
            temp_{32..1} ← 0x80000000
        endif
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}

function _shiftShortAccRightArithmetic( ac_{1..0}, shift_{4..0} )
    if ( shift_{4..0} = 0 ) then
        temp_{64..0} ← ( HI[ac]_{31..0} || LO[ac]_{31..0} || 0 )
    else
        temp_{64..0} ← ( (HI[ac]_{31})^{shift} || HI[ac]_{31..0} || LO[ac]_{31..shift-1} )
    endif
    return temp_{64..0}
endfunction _shiftShortAccRightArithmetic
```

#### **Exceptions:**

### EXTR\_S.H Extract Halfword Value From Accumulator to GPR With Right Shift and Saturate

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | shift  | ac     | 11     | 111   | 001  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTR\_S.H rt, ac, shift (DSP)

Purpose: Extract Halfword Value From Accumulator to GPR With Right Shift and Saturate

Extract a halfword value from a 64-bit accumulator to a GPR with right shift and saturation.

**Description:** rt ← sat16(ac >> shift)

The value in the 64-bit accumulator *ac* is shifted right by *shift* bits with sign extension (arithmetic shift right). The 64-bit value is then saturated to 16 bits, sign-extended to 32 bits, and written to the destination register *rt*. The shift argument is provided in the instruction.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, ac remains unmodified.

This instruction sets bit 23 of the DSPControl register in the ouflag field if the operation results in saturation.
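A minimal Python sketch of the shift-and-saturate behavior follows. It assumes the range comparisons are signed 64-bit comparisons; the function name `extr_s_h` and the returned boolean (standing in for the ouflag bit) are illustrative choices, not part of the architecture definition.

```python
def extr_s_h(acc, shift):
    """Hypothetical model of EXTR_S.H.

    acc   -- 64-bit accumulator value (HI[ac] || LO[ac]) as an unsigned int
    shift -- shift amount; only the low five bits are used
    Returns (rt, ouflag) with rt as an unsigned 32-bit value.
    """
    # Interpret the accumulator as a signed 64-bit quantity.
    signed = acc - (1 << 64) if acc & (1 << 63) else acc
    value = signed >> (shift & 0x1F)       # arithmetic shift right

    ouflag = False
    if value > 0x7FFF:                     # above the largest 16-bit value
        value, ouflag = 0x7FFF, True
    elif value < -0x8000:                  # below the smallest 16-bit value
        value, ouflag = -0x8000, True

    # Sign-extend the halfword result to 32 bits.
    return value & 0xFFFFFFFF, ouflag


print(extr_s_h(0x12345, 4))   # fits in 16 bits after the shift
print(extr_s_h(0x12345, 0))   # saturates to 0x7FFF and sets the flag
```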

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
ValidateAccessToDSPResources()
temp_{63..0} ← shiftShortAccRightArithmetic( ac, shift )
if ( temp_{63..0} > 0x0000000000007FFF ) then
    temp_{31..0} ← 0x00007FFF
    DSPControl_{ouflag:23} ← 1
else if ( temp_{63..0} < 0xFFFFFFFFFFFF8000 ) then
    temp_{31..0} ← 0xFFFF8000
    DSPControl_{ouflag:23} ← 1
endif
GPR[rt]_{31..0} ← temp_{31..0}

function shiftShortAccRightArithmetic( ac_{1..0}, shift_{4..0} )
    sign ← HI[ac]_{31}
    if ( shift = 0 ) then
        temp_{63..0} ← HI[ac]_{31..0} || LO[ac]_{31..0}
    else
        temp_{63..0} ← sign^{shift} || (( HI[ac]_{31..0} || LO[ac]_{31..0} ) >> shift )
    endif
    if ( sign ≠ temp_{31} ) then
        DSPControl_{ouflag:23} ← 1
    endif
    return temp_{63..0}
endfunction shiftShortAccRightArithmetic
```

#### **Exceptions:**


### EXTRV[\_RS].W Extract Word Value With Variable Right Shift From Accumulator to GPR

| Instruction        | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------------|----------------|--------|--------|--------|--------|-------|------|------|------|
| EXTRV.W            | P32A<br>001000 | rt     | rs     | ac     | 00     | 111   | 010  | 111  | 111  |
| EXTRV_R.W          | P32A<br>001000 | rt     | rs     | ac     | 01     | 111   | 010  | 111  | 111  |
| EXTRV_RS.W         | P32A<br>001000 | rt     | rs     | ac     | 10     | 111   | 010  | 111  | 111  |
| Field width (bits) | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTRV[\_RS].W<br>EXTRV.W rt, ac, rs (DSP)<br>EXTRV\_R.W rt, ac, rs (DSP)<br>EXTRV\_RS.W rt, ac, rs (DSP)

Purpose: Extract Word Value With Variable Right Shift From Accumulator to GPR

Extract a word value from a 64-bit accumulator to a GPR with variable right shift, and with optional rounding or rounding and saturation.

**Description:** rt ← sat32(round(ac >> rs<sub>4..0</sub>))

The value in accumulator ac is shifted right by *shift* bits with sign extension (arithmetic shift right). The lower 32 bits of the shifted value are then written to the destination register rt. The number of bits to shift is given by the five least-significant bits of register rs; the remaining bits of rs are ignored.

The rounding variant of the instruction adds a 1 at the most-significant discarded bit position. The 32 least-significant bits of the rounded result are then written to the destination register.

The rounding and saturating variant of the instruction adds a 1 at the most-significant discarded bit position. If the rounding operation results in an overflow, the shifted value is clamped to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal). The rounded and saturated result is then written to the destination register.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, ac remains unmodified.

For all variants of the instruction, including EXTRV.W, bit 23 of the *DSPControl* register is set to 1 if either of the rounded or non-rounded calculation results in overflow or saturation.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
EXTRV.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, GPR[rs]_{4..0} )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif

EXTRV_R.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, GPR[rs]_{4..0} )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}

EXTRV_RS.W
    ValidateAccessToDSPResources()
    temp_{64..0} ← _shiftShortAccRightArithmetic( ac, GPR[rs]_{4..0} )
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        DSPControl_{ouflag:23} ← 1
    endif
    temp_{64..0} ← temp + 1
    if (( temp_{64..32} ≠ 0 ) and ( temp_{64..32} ≠ 0x1FFFFFFFF )) then
        if ( temp_{64} = 0 ) then
            temp_{32..1} ← 0x7FFFFFFF
        else
            temp_{32..1} ← 0x80000000
        endif
        DSPControl_{ouflag:23} ← 1
    endif
    GPR[rt]_{31..0} ← temp_{32..1}
```

## **Exceptions:**

### EXTRV\_S.H Extract Halfword Value Variable From Accumulator to GPR With Right Shift and Saturate

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 11     | 111   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: EXTRV\_S.H rt, ac, rs (DSP)

Purpose: Extract Halfword Value Variable From Accumulator to GPR With Right Shift and Saturate

Extract a halfword value from a 64-bit accumulator to a GPR with right shift and saturation.

**Description:**  $rt \leftarrow sat16(ac >> rs_{4..0})$ 

The value in the 64-bit accumulator *ac* is shifted right by *shift* bits with sign extension (arithmetic shift right). The 64-bit value is then saturated to 16 bits and sign-extended to 32 bits before being written to the destination register *rt*. The five least-significant bits of register *rs* provide the shift argument, interpreted as a five-bit unsigned integer; the remaining bits in *rs* are ignored.

The value of ac can range from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture. After the execution of this instruction, ac remains unmodified.

This instruction sets bit 23 of the DSPControl register in the ouflag field if the operation results in saturation.

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
ValidateAccessToDSPResources()
shift_{4..0} ← GPR[rs]_{4..0}
temp_{63..0} ← shiftShortAccRightArithmetic( ac, shift )
if ( temp_{63..0} > 0x0000000000007FFF ) then
    temp_{31..0} ← 0x00007FFF
    DSPControl_{ouflag:23} ← 1
else if ( temp_{63..0} < 0xFFFFFFFFFFFF8000 ) then
    temp_{31..0} ← 0xFFFF8000
    DSPControl_{ouflag:23} ← 1
endif
GPR[rt]_{31..0} ← temp_{31..0}
```

### **Exceptions:**


### INSV Insert Bit Field Variable

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 0100000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |

Format: INSV rt, rs (DSP)

Purpose: Insert Bit Field Variable

To merge a right-justified bit field from register rs into a specified field in register rt.

**Description:** rt ← InsertFieldVar(rt, rs, Scount, Pos)

The DSPControl register provides the *size* value from the *Scount* field, and the *pos* value from the *pos* field. The rightmost *size* bits from register *rs* are merged into the value from register *rt* starting at bit position *pos*. The result is put back in register *rt*. These *pos* and *size* values are converted by the instruction into the fields *msb* (the most significant bit of the field), and *lsb* (least significant bit of the field), as follows:

```
pos ← DSPControl<sub>5..0</sub>
size ← DSPControl<sub>12..7</sub>
msb ← pos+size-1
lsb ← pos
```

The values of *pos* and *size* must satisfy all of the following relations, or the instruction results in UNPREDICTABLE results:

0 ≤ pos < 32<br>0 < size ≤ 32<br>0 < pos+size ≤ 32

Figure 6.3 shows the symbolic operation of the instruction.

[Figure: GPR rs holds the right-justified field EFGH in bits size-1..0 (size = msb-lsb+1). The initial value of GPR rt is IJKL (bits 31..msb+1), MNOP (bits msb..lsb), QRST (bits lsb-1..0), with msb = pos+size-1 and lsb = pos. The final value of GPR rt is IJKL, EFGH, QRST.]

### Figure 6.3 Operation of the INSV Instruction

#### **Restrictions:**

The operation is **UNPREDICTABLE** if lsb > msb.

#### **Operation:**

```
ValidateAccessToDSPResources()
if (lsb > msb) then
    UNPREDICTABLE
endif
GPR[rt]<sub>31..0</sub> ← GPR[rt]<sub>31..msb+1</sub> || GPR[rs]<sub>msb-lsb..0</sub> || GPR[rt]<sub>lsb-1..0</sub>
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Implementation Notes**

The destination of this instruction is register rt because that register is used as both a source and destination of the instruction. Since most implementations have potential critical paths around source register decode, and typically decode registers rs and rt as source registers, the instruction is defined with the destination as register rt instead of register rd to minimize the impact on source register decode.

One implementation method is to shift the register *rs* value left by *lsb* bits and merge that value into the register *rt* value based on a merge mask. The merge mask has a 1 in every bit position from which the corresponding output bit comes from register *rs* and a 0 in every bit position from which the corresponding output bit comes from register *rt*. The mask can be calculated by subtracting two constants generated from the fields of the instruction, as follows:

Some implementations may choose to use the ALU to calculate the *merge\_mask* in parallel with shifting the register *rs* value to the left, then using the *merge\_mask* to bit-select from the register *rt* value or the shifted register *rs* value.
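The shift-and-merge approach can be illustrated with a short Python sketch. The mask expression used here, 2^(msb+1) minus 2^lsb, is one plausible reading of "subtracting two constants" and is an assumption of this example, as are the function name `insv` and the choice to raise an error where the architecture says the result is UNPREDICTABLE.

```python
def insv(rt, rs, dspcontrol):
    """Hypothetical model of INSV using a merge mask.

    rt, rs     -- 32-bit register values
    dspcontrol -- DSPControl value; pos in bits 5..0, scount in bits 12..7
    Returns the new 32-bit value of rt.
    """
    pos = dspcontrol & 0x3F
    size = (dspcontrol >> 7) & 0x3F
    if not (0 <= pos < 32 and 0 < size <= 32 and pos + size <= 32):
        # The architecture leaves this case UNPREDICTABLE; the model rejects it.
        raise ValueError("pos/size out of range")

    msb = pos + size - 1
    lsb = pos

    # Assumed mask formula: ones in bits msb..lsb, zeros elsewhere.
    merge_mask = (1 << (msb + 1)) - (1 << lsb)
    shifted_rs = (rs << lsb) & 0xFFFFFFFF

    # Take masked bits from the shifted rs value, the rest from rt.
    return (rt & ~merge_mask & 0xFFFFFFFF) | (shifted_rs & merge_mask)


# Insert the low 8 bits of rs at bit position 8 (pos = 8, size = 8).
print(hex(insv(0xFFFFFFFF, 0xA5, (8 << 7) | 8)))
```

Computing the mask and the shifted value independently mirrors the note that the two can be produced in parallel before the final bit-select.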

### LBUX Load Unsigned Byte Indexed

```
Format: LBUX rd, index(base)
LBUX rd, rs(rt)
```

DSP Replaced with LBUX in nanoMIPS

Purpose: Load Unsigned Byte Indexed

To load a byte from memory as an unsigned value, using indexed addressing.

**Description:** rd ← memory[base+index]

The contents of GPR *index* are added to the contents of GPR *base* to form an effective address. The contents of the 8-bit byte at the memory location specified by the aligned effective address are fetched, zero-extended to the GPR register length and placed in GPR *rd*.

#### **Restrictions:**

None.

#### **Operation:**

```
ValidateAccessToDSPResources()
vAddr_{31..0} ← GPR[index]_{31..0} + GPR[base]_{31..0}
( pAddr, CCA ) ← AddressTranslation( vAddr, DATA, LOAD )
pAddr ← pAddr_{PSIZE-1..2} || ( pAddr_{1..0} xor ReverseEndian^{2} )
memword_{GPRLEN..0} ← LoadMemory( CCA, BYTE, pAddr, vAddr, DATA )
GPR[rd]_{31..0} ← zero_extend( memword_{7..0} )
```

### **Exceptions:**

Reserved Instruction, DSP Disabled, TLB Refill, TLB Invalid, Bus Error, Address Error, Watch


### LHX Load Halfword Indexed

```
Format: LHX rd, index(base)
LHX rd, rs(rt)
```

DSP Replaced with LHX in nanoMIPS

Purpose: Load Halfword Indexed

To load a halfword value from memory as a signed value, using indexed addressing.

**Description:** rd  $\leftarrow$  memory[base+index]

The contents of GPR *index* are added to the contents of GPR *base* to form an effective address. The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended to the length of the destination GPR, and placed in GPR *rd*.

#### **Restrictions:**

The effective address must be naturally-aligned. If the least-significant bit of the effective address is non-zero, an Address Error exception occurs.
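The address formation, alignment check, and sign extension can be sketched in Python as follows. The sketch assumes a flat, little-endian `bytes` object in place of real memory and skips address translation and cache attributes entirely; the function name `lhx` and the use of `ValueError` to stand in for the Address Error exception are illustrative only.

```python
import struct


def lhx(memory, base, index):
    """Hypothetical model of LHX over a flat little-endian byte buffer.

    memory      -- bytes-like object standing in for memory (an assumption)
    base, index -- values of GPR[base] and GPR[index]
    Returns the sign-extended halfword as an unsigned 32-bit value.
    """
    vaddr = (base + index) & 0xFFFFFFFF   # 32-bit effective address
    if vaddr & 1:
        # Models the Address Error exception for a misaligned halfword.
        raise ValueError("Address Error: misaligned halfword access")

    # "<h" reads a signed little-endian 16-bit value, which gives the
    # sign extension for free; mask to present it as a 32-bit register.
    (value,) = struct.unpack_from("<h", memory, vaddr)
    return value & 0xFFFFFFFF


mem = bytes([0x34, 0x12, 0xFE, 0xFF])
print(hex(lhx(mem, 0, 0)), hex(lhx(mem, 0, 2)))
```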

#### **Operation:**

```
ValidateAccessToDSPResources()
vAddr<sub>31..0</sub> ← GPR[index]<sub>31..0</sub> + GPR[base]<sub>31..0</sub>
if ( vAddr<sub>0</sub>≠0 ) then
    SignalException( AddressError )
endif
( pAddr, CCA ) ← AddressTranslation( vAddr, DATA, LOAD )
halfword<sub>GPRLEN..0</sub> ← LoadMemory( CCA, HALFWORD, pAddr, vAddr, DATA )
GPR[rd]<sub>31..0</sub> ← sign_extend( halfword<sub>15..0</sub> )
```

### **Exceptions:**

Reserved Instruction, DSP Disabled, TLB Refill, TLB Invalid, Bus Error, Address Error, Watch

### LWX Load Word Indexed

```
Format: LWX rd, index(base)
    LWX rd, rs(rt)
```

DSP Replaced with LWX in nanoMIPS

Purpose: Load Word Indexed

To load a word value from memory as a signed value, using indexed addressing.

**Description:** rd ← memory[base+index]

The contents of GPR *index* are added to the contents of GPR *base* to form an effective address. The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed in GPR *rd*.

### **Restrictions:**

The effective address must be naturally-aligned. If either of the two least-significant bits of the address is non-zero, an Address Error exception occurs.

### **Operation:**

### **Exceptions:**

Reserved Instruction, DSP Disabled, TLB Refill, TLB Invalid, Bus Error, Address Error, Watch


### MADD Multiply Word and Add to Accumulator

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 00     | 101   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MADD ac, rs, rt (DSP)

Purpose: Multiply Word and Add to Accumulator

To multiply two 32-bit integer words and add the 64-bit result to the specified accumulator.

**Description:** (HI[ac]||LO[ac]) ← (HI[ac]||LO[ac]) + (rs<sub>31..0</sub> \* rt<sub>31..0</sub>)

The 32-bit signed integer word in register *rs* is multiplied by the corresponding 32-bit signed integer word in register *rt* to produce a 64-bit result. The 64-bit product is added to the specified 64-bit accumulator.

These special registers HI and LO are specified by the value of ac. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.
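The signed multiply-accumulate, including the silent wraparound implied by the absence of arithmetic exceptions, can be sketched in Python. The function name `madd` and the separate `hi`/`lo` arguments are choices made for this example and do not correspond to any defined interface.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def _s32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def madd(hi, lo, rs, rt):
    """Hypothetical model of MADD: returns the new (HI[ac], LO[ac]) pair."""
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    # The signed 64-bit product is added modulo 2**64; overflow is not
    # detected or signaled.
    acc = (acc + _s32(rs) * _s32(rt)) & MASK64
    return acc >> 32, acc & MASK32


print(madd(0, 10, 3, 4))                                # plain accumulate
print([hex(v) for v in madd(0x7FFFFFFF, MASK32, 1, 1)])  # wraps past the largest positive value
```

The second call shows the accumulator wrapping from the largest positive 64-bit value to the most negative one without any indication.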

#### **Restrictions:**

None

This instruction does not provide the capability of writing directly to a target GPR.

#### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSP2Resources()
endif
temp<sub>63..0</sub> ← ((GPR[rs]<sub>31</sub>)<sup>32</sup> || GPR[rs]<sub>31..0</sub>) * ((GPR[rt]<sub>31</sub>)<sup>32</sup> || GPR[rt]<sub>31..0</sub>)
acc<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) + temp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← acc<sub>63..32</sub> || acc<sub>31..0</sub>
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Implementation Notes:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.

#### **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the sizes of the operands are known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.


### MADDU Multiply Unsigned Word and Add to Accumulator

| 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | rs     | ac     | 01     | 101   | 010  | 111  | 111  |
| 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MADDU ac, rs, rt (DSP)

Purpose: Multiply Unsigned Word and Add to Accumulator

To multiply two 32-bit unsigned integer words and add the 64-bit result to the specified accumulator.

**Description:** (HI[ac]||LO[ac])  $\leftarrow$  (HI[ac]||LO[ac]) + (rs<sub>31..0</sub> \* rt<sub>31..0</sub>)

The 32-bit unsigned integer word in register *rs* is multiplied by the corresponding 32-bit unsigned integer word in register *rt* to produce a 64-bit result. The 64-bit product is added to the specified 64-bit accumulator.

These special registers HI and LO are specified by the value of ac. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.
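The accumulate step described above can be illustrated with a small C model. This is a sketch only: the `dsp_acc_t` structure and the function name `maddu_model` are assumptions made for illustration and are not part of the architecture definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the four DSP accumulators (ac0..ac3). */
typedef struct {
    uint32_t hi[4];
    uint32_t lo[4];
} dsp_acc_t;

/* Model of MADDU ac, rs, rt: (HI[ac]||LO[ac]) += rs * rt, all unsigned. */
void maddu_model(dsp_acc_t *st, unsigned ac, uint32_t rs, uint32_t rt)
{
    assert(ac < 4);  /* ac is a 2-bit field */
    uint64_t acc = ((uint64_t)st->hi[ac] << 32) | st->lo[ac];
    acc += (uint64_t)rs * (uint64_t)rt;  /* wraps modulo 2^64, no exception */
    st->hi[ac] = (uint32_t)(acc >> 32);
    st->lo[ac] = (uint32_t)acc;
}
```

For example, starting from a cleared accumulator, multiplying 0xFFFFFFFF by 0xFFFFFFFF leaves HI = 0xFFFFFFFE and LO = 0x00000001 in this model.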

#### **Restrictions:**

None

This instruction does not provide the capability of writing directly to a target GPR.

#### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSP2Resources()
endif
temp<sub>64..0</sub> ← ( 0<sup>32</sup> || GPR[rs]<sub>31..0</sub> ) * ( 0<sup>32</sup> || GPR[rt]<sub>31..0</sub> )
acc<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) + temp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← acc<sub>63..32</sub> || acc<sub>31..0</sub>
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Implementation Notes:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.

#### **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.

# MADDU

Multiply Unsigned Word and Add to Accumulator

MAQ\_S[A].W.PHL: Multiply with Accumulate Single Vector Fractional Halfword Element

| Bits         | 31..26         | 25..21 | 20..16 | 15..14 | 13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------|----------------|--------|--------|--------|----|----|-------|------|------|------|
| MAQ_S.W.PHL  | P32A<br>001000 | rt     | rs     | ac     | 0  | 1  | 101   | 001  | 111  | 111  |
| MAQ_SA.W.PHL | P32A<br>001000 | rt     | rs     | ac     | 1  | 1  | 101   | 001  | 111  | 111  |
| Width        | 6              | 5      | 5      | 2      | 1  | 1  | 3     | 3    | 3    | 3    |

Format: MAQ\_S[A].W.PHL

MAQ\_S.W.PHL ac, rs, rt (DSP)

MAQ\_SA.W.PHL ac, rs, rt (DSP)

Purpose: Multiply with Accumulate Single Vector Fractional Halfword Element

To multiply one pair of elements from two vectors of fractional halfword values using full-sized intermediate products and accumulate the result into the specified 64-bit accumulator, with optional saturating accumulation.

**Description:** ac  $\leftarrow$  sat32(ac + sat32(rs<sub>31..16</sub> \* rt<sub>31..16</sub>))

The left-most Q15 fractional halfword values from the paired halfword vectors in each of registers rt and rs are multiplied together, and the product left-shifted by one bit position to generate a Q31 fractional format intermediate result. If both multiplicands are equal to -1.0 in Q15 fractional format (0x8000 hexadecimal), the intermediate result is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal). The intermediate result is then sign-extended and accumulated into accumulator ac to generate a 64-bit Q32.31 fractional format result.

In the saturating accumulation variant of this instruction, if the accumulation of the intermediate product with the accumulator results in a value that cannot be represented as a Q31 fractional format value, the accumulator is saturated to either the maximum positive Q31 fractional format value (0x7FFFFFFF hexadecimal) or the minimum negative Q31 fractional format value (0x80000000 hexadecimal), sign-extended to 64 bits.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If overflow or saturation occurs, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
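As an illustration of the non-saturating variant, the following C sketch models the Q15 multiply and the accumulate step. The type `dsp_state_t` and the function names are hypothetical, chosen for this example only. The accumulator is held as a single 64-bit value standing in for HI[ac]||LO[ac].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model state: four accumulators plus DSPControl. */
typedef struct {
    uint64_t acc[4];      /* HI[ac] || LO[ac] held as one 64-bit value */
    uint32_t dspcontrol;  /* ouflag bits for ac0..ac3 are bits 16..19  */
} dsp_state_t;

/* Sign-extend a 16-bit Q15 bit pattern to a 32-bit integer. */
static int32_t q15_bits(uint32_t bits)
{
    bits &= 0xFFFFu;
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Q15 x Q15 -> Q31; only -1.0 * -1.0 needs saturation. */
int32_t multiply_q15_q15(dsp_state_t *st, unsigned ac, uint32_t a, uint32_t b)
{
    assert(ac < 4);  /* ac is a 2-bit field */
    int32_t x = q15_bits(a), y = q15_bits(b);
    if (x == -32768 && y == -32768) {
        st->dspcontrol |= 1u << (16 + ac);  /* record the saturation */
        return INT32_MAX;                   /* 0x7FFFFFFF */
    }
    return x * y * 2;                       /* product shifted left by one */
}

/* MAQ_S.W.PHL ac, rs, rt: non-saturating accumulate of the left halfwords. */
void maq_s_w_phl(dsp_state_t *st, unsigned ac, uint32_t rs, uint32_t rt)
{
    int32_t q31 = multiply_q15_q15(st, ac, rs >> 16, rt >> 16);
    st->acc[ac] += (uint64_t)(int64_t)q31;  /* sign-extend, wrap modulo 2^64 */
}
```

With both left halfwords equal to 0x4000 (0.5 in Q15), the model adds 0x20000000 (0.25 in Q31) to the accumulator. With both equal to 0x8000, it adds 0x7FFFFFFF and sets the corresponding *ouflag* bit.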

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
MAQ_S.W.PHL
    ValidateAccessToDSPResources()
    tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>31..16</sub>, GPR[rt]<sub>31..16</sub> )
    tempB<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) + ( (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub> )
    ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempB<sub>63..32</sub> || tempB<sub>31..0</sub>

MAQ_SA.W.PHL
    ValidateAccessToDSPResources()
    tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>31..16</sub>, GPR[rt]<sub>31..16</sub> )
    tempA<sub>31..0</sub> ← sat32AccumulateQ31( ac, tempA )
    tempB<sub>63..0</sub> ← (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub>
    ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempB<sub>63..32</sub> || tempB<sub>31..0</sub>
```

```
function sat32AccumulateQ31( acc<sub>1..0</sub>, a<sub>31..0</sub> )
    sign<sub>A</sub> ← a<sub>31</sub>
    temp<sub>63..0</sub> ← HI[acc]<sub>31..0</sub> || LO[acc]<sub>31..0</sub>
    temp<sub>63..0</sub> ← temp + ( (sign<sub>A</sub>)<sup>32</sup> || a<sub>31..0</sub> )
    if ( temp<sub>32</sub> ≠ temp<sub>31</sub> ) then
        if ( temp<sub>32</sub> = 0 ) then
            temp<sub>31..0</sub> ← 0x7FFFFFFF
        else
            temp<sub>31..0</sub> ← 0x80000000
        endif
        DSPControl<sub>ouflag:16+acc</sub> ← 1
    endif
    return temp<sub>31..0</sub>
endfunction sat32AccumulateQ31
```
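The saturating accumulate can also be expressed in C. The sketch below is an assumption-laden model, not a normative definition: the `dsp_regs_t` structure and the name `sat32_accumulate_q31` are hypothetical, and the model assumes that a clear bit 32 with a set bit 31 indicates positive overflow (clamp to 0x7FFFFFFF), with the opposite pattern indicating negative overflow (clamp to 0x80000000).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: HI/LO pairs for ac0..ac3 plus DSPControl. */
typedef struct {
    uint32_t hi[4], lo[4];
    uint32_t dspcontrol;
} dsp_regs_t;

/* Returns the 32-bit Q31 sum, saturated; sets ouflag bit 16+ac on saturation. */
uint32_t sat32_accumulate_q31(dsp_regs_t *st, unsigned ac, int32_t a)
{
    assert(ac < 4);  /* ac is a 2-bit field */
    uint64_t temp = ((uint64_t)st->hi[ac] << 32) | st->lo[ac];
    temp += (uint64_t)(int64_t)a;             /* add sign-extended operand */
    unsigned b32 = (unsigned)(temp >> 32) & 1u;
    unsigned b31 = (unsigned)(temp >> 31) & 1u;
    uint32_t result = (uint32_t)temp;
    if (b32 != b31) {                         /* not representable in Q31 */
        result = (b32 == 0) ? 0x7FFFFFFFu : 0x80000000u;
        st->dspcontrol |= 1u << (16 + ac);
    }
    return result;
}
```

The caller is expected to sign-extend the returned value and write it back to the accumulator, as the MAQ\_SA operation does.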

### **Exceptions:**

Reserved Instruction, DSP Disabled

### **Programming Notes:**

The MAQ\_SA version of the instruction is useful for compliance with some ITU speech processing codecs that require a 32-bit saturation after every multiply-accumulate operation.

MAQ\_S[A].W.PHR: Multiply with Accumulate Single Vector Fractional Halfword Element

| Bits         | 31..26         | 25..21 | 20..16 | 15..14 | 13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------------|----------------|--------|--------|--------|----|----|-------|------|------|------|
| MAQ_S.W.PHR  | P32A<br>001000 | rt     | rs     | ac     | 0  | 0  | 101   | 001  | 111  | 111  |
| MAQ_SA.W.PHR | P32A<br>001000 | rt     | rs     | ac     | 1  | 0  | 101   | 001  | 111  | 111  |
| Width        | 6              | 5      | 5      | 2      | 1  | 1  | 3     | 3    | 3    | 3    |

Format: MAQ\_S[A].W.PHR

MAQ\_S.W.PHR ac, rs, rt (DSP)

MAQ\_SA.W.PHR ac, rs, rt (DSP)

Purpose: Multiply with Accumulate Single Vector Fractional Halfword Element

To multiply one pair of elements from two vectors of fractional halfword values using full-sized intermediate products and accumulate the result into the specified 64-bit accumulator, with optional saturating accumulation.

**Description:** ac  $\leftarrow$  sat32(ac + sat32(rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

The right-most Q15 fractional halfword values from each of the registers *rt* and *rs* are multiplied together and the product left-shifted by one bit position to generate a Q31 fractional format intermediate result. If both multiplicands are equal to -1.0 in Q15 fractional format (0x8000 hexadecimal), the intermediate result is saturated to the maximum positive Q31 fractional value (0x7FFFFFFF hexadecimal). The intermediate result is then sign-extended and accumulated into accumulator *ac* to generate a 64-bit Q32.31 fractional format result.

In the saturating accumulation variant of this instruction, if the accumulation of the intermediate product with the accumulator results in a value that cannot be represented as a Q31 fractional format value, the accumulator is saturated to either the maximum positive Q31 fractional format value (0x7FFFFFFF hexadecimal) or the minimum negative Q31 fractional format value (0x80000000 hexadecimal), sign-extended to 64 bits.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If overflow or saturation occurs, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
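To make the fixed-point arithmetic concrete, the following C sketch computes the Q31 intermediate product of the right halfwords and converts Q15 and Q31 bit patterns to ordinary fractions. The helper names are illustrative assumptions. The sketch covers only the multiply step and does not model the accumulator or the *ouflag* update.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret bit patterns as fractional values (helper names are illustrative). */
double q15_to_double(int16_t q) { return q / 32768.0; }
double q31_to_double(int32_t q) { return q / 2147483648.0; }

/* Product of the right (low) halfwords of rs and rt as a Q31 value,
 * saturating the single overflow case -1.0 * -1.0. */
int32_t q31_product_right(uint32_t rs, uint32_t rt)
{
    int32_t a = (int32_t)(rs & 0xFFFFu), b = (int32_t)(rt & 0xFFFFu);
    if (a & 0x8000) a -= 0x10000;   /* sign-extend the Q15 patterns */
    if (b & 0x8000) b -= 0x10000;
    assert(a >= -32768 && a <= 32767 && b >= -32768 && b <= 32767);
    if (a == -32768 && b == -32768) return INT32_MAX;
    return a * b * 2;               /* Q30 product shifted to Q31 */
}
```

For instance, right halfwords 0x4000 (0.5) and 0xC000 (-0.5) give the Q31 pattern 0xE0000000, which `q31_to_double` reports as -0.25. The upper halfwords of both registers are ignored.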

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the result is **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
MAQ_S.W.PHR
    ValidateAccessToDSPResources()
    tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>15..0</sub>, GPR[rt]<sub>15..0</sub> )
    tempB<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) + ( (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub> )
    ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempB<sub>63..32</sub> || tempB<sub>31..0</sub>

MAQ_SA.W.PHR
    ValidateAccessToDSPResources()
    tempA<sub>31..0</sub> ← multiplyQ15Q15( ac, GPR[rs]<sub>15..0</sub>, GPR[rt]<sub>15..0</sub> )
    tempA<sub>31..0</sub> ← sat32AccumulateQ31( ac, tempA )
    tempB<sub>63..0</sub> ← (tempA<sub>31</sub>)<sup>32</sup> || tempA<sub>31..0</sub>
    ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← tempB<sub>63..32</sub> || tempB<sub>31..0</sub>
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

### **Programming Notes:**

The MAQ\_SA version of the instruction is useful for compliance with some ITU speech processing codecs that require a 32-bit saturation after every multiply-accumulate operation.

MFHI: Move from HI register

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MFHI  | P32A<br>001000 | rt     | x      | ac     | 00     | 000   | 001  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MFHI rs, ac (DSP)

Purpose: Move from HI register

To copy the special purpose *HI* register to a GPR.

**Description:** rs ← HI[ac]

The *HI* part of accumulator *ac* is copied to the general-purpose register rs. The *HI* part of the accumulator is defined to be bits 32 through 63 of the DSP Module accumulator register.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥2)) then
    ValidateAccessToDSPResources()
endif
GPR[rs]<sub>31..0</sub> ← HI[ac]<sub>31..0</sub>
```

#### **Exceptions:**

## MFHI

MFLO: Move from LO register

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MFLO  | P32A<br>001000 | rt     | x      | ac     | 01     | 000   | 001  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MFLO rt, ac (DSP)

Purpose: Move from LO register

To copy the special purpose LO register to a GPR.

**Description:** rt ← LO[ac]

The LO part of accumulator *ac* is copied to the general-purpose register *rt*. The LO part of the accumulator is defined to be bits 0 through 31 of the DSP Module accumulator register.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥2)) then
    ValidateAccessToDSPResources()
endif
GPR[rt]<sub>31..0</sub> ← LO[ac]<sub>31..0</sub>
```

### **Exceptions:**

# MFLO

MODSUB: Modular Subtraction on an Index Value

| Bits   | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|--------|----------------|--------|--------|--------|----|---------|------|
| MODSUB | P32A<br>001000 | rt     | rs     | rd     | x  | 1010010 | 101  |
| Width  | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MODSUB rd, rs, rt

Purpose: Modular Subtraction on an Index Value

Do a modular subtraction on a specified index value, using the specified decrement and modular roll-around values.

**Description:**  $rd \leftarrow (\text{GPR[rs]} == 0 \;?\; \text{zero\_extend}(\text{GPR[rt]}_{23..8}) : \text{GPR[rs]} - \text{GPR[rt]}_{7..0})$ 

The 32-bit value in register *rs* is compared to the value zero. If it is zero, then the index value has reached the bottom of the buffer and must be rolled back around to the top of the buffer. The index value of the top element of the buffer is obtained from bits 8 through 23 in register *rt*; this value is zero-extended to 32 bits and written to destination register *rd*.

If the value of register *rs* is not zero, then it is simply decremented by the size of the elements in the buffer. The size of the elements, in bytes, is specified by bits 0 through 7 of register *rt*, interpreted as an unsigned integer.

This instruction does not modify the *ouflag* field in the DSPControl register.
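The roll-around behaviour can be sketched in C as follows. The function names are hypothetical, and `modsub_ctl` is only a convenience for packing the *rt* operand. Its assertion reflects an assumption of this example (a non-zero decrement that divides the last index evenly, so a walk started at the top lands exactly on zero), not a requirement stated by the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the rt operand: bits 23..8 = last index, bits 7..0 = decrement. */
uint32_t modsub_ctl(uint16_t last_index, uint8_t decr)
{
    assert(decr != 0 && last_index % decr == 0);  /* example-only assumption */
    return ((uint32_t)last_index << 8) | decr;
}

/* Model of MODSUB rd, rs, rt: returns the value written to rd. */
uint32_t modsub_model(uint32_t rs, uint32_t rt)
{
    uint32_t decr      = rt & 0xFFu;
    uint32_t lastindex = (rt >> 8) & 0xFFFFu;
    return (rs == 0) ? lastindex : rs - decr;
}

/* Apply MODSUB repeatedly to walk a circular buffer downwards. */
uint32_t modsub_walk(uint32_t index, uint32_t ctl, unsigned steps)
{
    while (steps--)
        index = modsub_model(index, ctl);
    return index;
}
```

With a last index of 12 and a decrement of 4, the index sequence produced by the model is 12, 8, 4, 0, 12, and so on.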

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
ValidateAccessToDSPResources()
decr<sub>7..0</sub> ← GPR[rt]<sub>7..0</sub>
lastindex<sub>15..0</sub> ← GPR[rt]<sub>23..8</sub>
if ( GPR[rs]<sub>31..0</sub> = 0x00000000 ) then
    GPR[rd]<sub>31..0</sub> ← 0<sup>(GPRLEN-16)</sup> || lastindex<sub>15..0</sub>
else
    GPR[rd]<sub>31..0</sub> ← GPR[rs]<sub>31..0</sub> - decr<sub>7..0</sub>
endif
```

#### **Exceptions:**

### MODSUB

MSUB: Multiply Word and Subtract from Accumulator

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MSUB  | P32A<br>001000 | rt     | rs     | ac     | 10     | 101   | 010  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MSUB ac, rs, rt (DSP)

Purpose: Multiply Word and Subtract from Accumulator

To multiply two 32-bit integer words and subtract the 64-bit result from the specified accumulator.

**Description:** (HI[ac]||LO[ac])  $\leftarrow$  (HI[ac]||LO[ac]) - (rs<sub>31..0</sub> \* rt<sub>31..0</sub>)

The 32-bit signed integer word in register *rs* is multiplied by the corresponding 32-bit signed integer word in register *rt* to produce a 64-bit result. The 64-bit product is subtracted from the specified 64-bit accumulator.

These special registers HI and LO are specified by the value of ac. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.
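A minimal C model of the signed multiply-subtract is shown below. It is a sketch under stated assumptions: the four accumulators are represented by a plain `uint64_t acc[4]` array (each element standing in for HI[ac]||LO[ac]), and the name `msub_model` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MSUB ac, rs, rt: acc[ac] -= (signed) rs * rt. */
void msub_model(uint64_t acc[4], unsigned ac, int32_t rs, int32_t rt)
{
    assert(ac < 4);  /* ac is a 2-bit field */
    int64_t prod = (int64_t)rs * (int64_t)rt;  /* exact 64-bit signed product */
    acc[ac] -= (uint64_t)prod;                 /* wraps modulo 2^64, no exception */
}
```

Unsigned arithmetic is used for the subtraction so that wrap-around is well defined in C; the resulting bit pattern is the same as for a two's-complement signed subtraction.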

#### **Restrictions:**

None

This instruction does not provide the capability of writing directly to a target GPR.

#### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSP2Resources()
endif
temp<sub>63..0</sub> ← ((GPR[rs]<sub>31</sub>)<sup>32</sup> || GPR[rs]<sub>31..0</sub>) * ((GPR[rt]<sub>31</sub>)<sup>32</sup> || GPR[rt]<sub>31..0</sub>)
acc<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - temp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← acc<sub>63..32</sub> ||acc<sub>31..0</sub>
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Implementation Notes:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.

#### **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.

### MSUB

Multiply Word and Subtract from Accumulator

MSUBU: Multiply Unsigned Word and Subtract from Accumulator

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MSUBU | P32A<br>001000 | rt     | rs     | ac     | 11     | 101   | 010  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MSUBU ac, rs, rt (DSP)

Purpose: Multiply Unsigned Word and Subtract from Accumulator

To multiply two 32-bit unsigned integer words and subtract the 64-bit result from the specified accumulator.

**Description:** (HI[ac]||LO[ac])  $\leftarrow$  (HI[ac]||LO[ac]) - (rs<sub>31..0</sub> \* rt<sub>31..0</sub>)

The 32-bit unsigned integer word in register *rs* is multiplied by the corresponding 32-bit unsigned integer word in register *rt* to produce a 64-bit result. The 64-bit product is subtracted from the specified 64-bit accumulator.

These special registers HI and LO are specified by the value of ac. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.

#### **Restrictions:**

None

This instruction does not provide the capability of writing directly to a target GPR.

#### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSP2Resources()
endif
temp<sub>64..0</sub> ← ( 0<sup>32</sup> || GPR[rs]<sub>31..0</sub> ) * ( 0<sup>32</sup> || GPR[rt]<sub>31..0</sub> )
acc<sub>63..0</sub> ← ( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) - temp<sub>63..0</sub>
( HI[ac]<sub>31..0</sub> || LO[ac]<sub>31..0</sub> ) ← acc<sub>63..32</sub> || acc<sub>31..0</sub>
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Implementation Notes:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.

#### **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
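One possible way to perform the explicit overflow check mentioned above is to compare the product with the accumulator before subtracting. The C sketch below models this for the unsigned case. The function name `msubu_checked` and the use of a single `uint64_t` for HI||LO are assumptions of the example, and the check itself is a software convention rather than something the instruction provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of MSUBU on one 64-bit accumulator value. Returns true when the
 * subtraction borrowed, i.e. the mathematical result was negative and the
 * stored value has wrapped modulo 2^64. */
bool msubu_checked(uint64_t *acc, uint32_t rs, uint32_t rt)
{
    assert(acc != NULL);
    uint64_t prod = (uint64_t)rs * (uint64_t)rt;  /* exact unsigned product */
    bool borrow = prod > *acc;
    *acc -= prod;
    return borrow;
}
```

On real hardware the equivalent check would be done with ordinary compare instructions on values read from the accumulator, since the instruction itself raises no exception and sets no flag.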

# MSUBU

Multiply Unsigned Word and Subtract from Accumulator

MTHI: Move to HI register

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MTHI  | P32A<br>001000 | x      | rs     | ac     | 10     | 000   | 001  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MTHI rs, ac (DSP)

Purpose: Move to HI register

To copy a GPR to the special purpose HI part of the specified accumulator register.

**Description:** HI[ac] ← GPR[rs]

The source register *rs* is copied to the *HI* part of accumulator *ac*. The *HI* part of the accumulator is defined to be bits 32 to 63 of the DSP Module accumulator register.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

#### **Restrictions:**

A computed result written to the *HI/LO* pair by DIV, DIVU, DDIV, DDIVU, DMULT, DMULTU, MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either *HI* or *LO*. Note that this restriction only applies to the original *HI/LO* accumulator pair, and does not apply to the new accumulators, *ac1*, *ac2*, and *ac3*.

If an MTHI instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of *LO* are **UNPREDICTABLE**. The following example shows this illegal situation:

### **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSPResources()
endif
HI[ac]<sub>31..0</sub> ← GPR[rs]<sub>31..0</sub>
```

### **Exceptions:**

## MTHI

MTHLIP: Copy LO to HI and a GPR to LO and Increment Pos by 32

| Bits   | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|--------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MTHLIP | P32A<br>001000 | x      | rs     | ac     | 00     | 001   | 001  | 111  | 111  |
| Width  | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MTHLIP rs, ac (DSP)

Purpose: Copy LO to HI and a GPR to LO and Increment Pos by 32

Copy the LO part of an accumulator to the HI part, copy a GPR to LO, and increment the pos field in the *DSPControl* register by 32.

**Description:** ac  $\leftarrow$  LO[ac]<sub>31..0</sub> || GPR[rs]<sub>31..0</sub>; DSPControl<sub>pos:5..0</sub> += 32

The 32 least-significant bits of the specified accumulator are copied to the most-significant 32 bits of the same accumulator. Then the 32 least-significant bits of register *rs* are copied to the least-significant 32 bits of the accumulator. The instruction then increments the value of bits 0 through 5 of the *DSPControl* register (the *pos* field) by 32.

The result of this instruction is **UNPREDICTABLE** if the value of the *pos* field before the execution of the instruction is greater than 32.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.
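The description above can be modelled in C roughly as follows. This is a hedged sketch: the `dsp_ctx_t` structure and the function name are hypothetical, the model returns -1 in the case the architecture leaves **UNPREDICTABLE** rather than choosing a result, and masking the updated *pos* value to six bits is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model: HI/LO pairs for ac0..ac3 plus DSPControl. */
typedef struct {
    uint32_t hi[4], lo[4];
    uint32_t dspcontrol;   /* pos field is bits 5..0 */
} dsp_ctx_t;

/* Model of MTHLIP rs, ac. Returns 0 on success, or -1 (state untouched)
 * where the architecture leaves the result UNPREDICTABLE (pos > 32). */
int mthlip_model(dsp_ctx_t *st, unsigned ac, uint32_t rs)
{
    assert(ac < 4);  /* ac is a 2-bit field */
    uint32_t pos = st->dspcontrol & 0x3Fu;
    if (pos > 32)
        return -1;
    st->hi[ac] = st->lo[ac];   /* old LO moves up into HI */
    st->lo[ac] = rs;           /* GPR value becomes the new LO */
    /* Assumption of this sketch: the sum is masked to the 6-bit field. */
    st->dspcontrol = (st->dspcontrol & ~0x3Fu) | ((pos + 32) & 0x3Fu);
    return 0;
}
```

A typical use is refilling an accumulator that serves as a bit-stream buffer: the previously loaded word shifts into *HI*, the new word enters *LO*, and *pos* advances by the 32 bits just added.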

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

## **Exceptions:**

Reserved Instruction, DSP Disabled

# MTHLIP

Copy LO to HI and a GPR to LO and Increment Pos by 32

MTLO: Move to LO register

| Bits  | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|--------|-------|------|------|------|
| MTLO  | P32A<br>001000 | x      | rs     | ac     | 11     | 000   | 001  | 111  | 111  |
| Width | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |

Format: MTLO rs, ac (DSP)

Purpose: Move to LO register

To copy a GPR to the special purpose LO part of the specified accumulator register.

**Description:** LO[ac] ← GPR[rs]

The source register *rs* is copied to the *LO* part of accumulator *ac*. The *LO* part of the accumulator is defined to be bits 0 to 31 of the DSP Module accumulator register.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

#### **Restrictions:**

A computed result written to the *HI/LO* pair by DIV, DIVU, DDIV, DDIVU, DMULT, DMULTU, MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either *HI* or *LO*. Note that this restriction only applies to the original *HI/LO* accumulator pair, and does not apply to the new accumulators, *ac1*, *ac2*, and *ac3*.

If an MTHI instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of *LO* are **UNPREDICTABLE**. The following example shows this illegal situation:

## **Operation:**

```
if (( ac ≠ 0 ) or (Config<sub>AR</sub> ≥ 2)) then
    ValidateAccessToDSPResources()
endif
LO[ac]<sub>31..0</sub> ← GPR[rs]<sub>31..0</sub>
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

# MTLO

MUL[\_S].PH: Multiply Vector Integer HalfWords to Same Size Products

| Bits     | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------|----------------|--------|--------|--------|----|---------|------|
| MUL.PH   | P32A<br>001000 | rt     | rs     | rd     | 0  | 0000101 | 101  |
| MUL_S.PH | P32A<br>001000 | rt     | rs     | rd     | 1  | 0000101 | 101  |
| Width    | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MUL[\_S].PH

MUL.PH rd, rs, rt (DSP-R2)

MUL\_S.PH rd, rs, rt (DSP-R2)

Purpose: Multiply Vector Integer HalfWords to Same Size Products

Multiply two vector halfword values.

**Description:**  $rd \leftarrow (rs_{31..16} * rt_{31..16}) || (rs_{15..0} * rt_{15..0})$ 

Each of the two integer halfword elements in register *rs* is multiplied by the corresponding integer halfword element in register *rt* to create a 32-bit signed integer intermediate result.

In the non-saturation version of the instruction, the 16 least-significant bits of each 32-bit intermediate result are written to the corresponding vector element in destination register *rd*.

In the saturating version of the instruction, intermediate results that cannot be represented in 16 bits are clipped to either the maximum positive 16-bit value (0x7FFF hexadecimal) or the minimum negative 16-bit value (0x8000 hexadecimal), depending on the sign of the intermediate result. The saturated results are then written to the destination register.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3*, are unchanged.

In the saturating instruction variant, if either multiplication results in an overflow or underflow, the instruction writes a 1 to bit 21 in the *ouflag* field in the *DSPControl* register.
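The lane-wise behavior described above can be modeled in a few lines of Python. This is an illustrative sketch, not part of the architecture definition: the names `to_signed16` and `mul_ph` are hypothetical, and the `ouflag` return value stands in for bit 21 of *DSPControl*. It follows the Operation pseudocode for this instruction, in which the non-saturating helper also sets the flag when the product does not fit in 16 bits.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mul_ph(rs, rt, saturate=False):
    """Model MUL.PH (saturate=False) or MUL_S.PH (saturate=True).

    Returns (rd, ouflag); ouflag models DSPControl bit 21 (assumed name).
    """
    rd = 0
    ouflag = False
    for shift in (16, 0):  # left halfword lane, then right halfword lane
        product = to_signed16(rs >> shift) * to_signed16(rt >> shift)
        if product > 0x7FFF or product < -0x8000:
            ouflag = True
            if saturate:
                # Clip according to the sign of the intermediate result.
                product = 0x7FFF if product > 0 else -0x8000
        rd |= (product & 0xFFFF) << shift
    return rd, ouflag
```

For example, `mul_ph(0x7FFF0002, 0x00020003, saturate=True)` clips the left lane to 0x7FFF, while the non-saturating form keeps only the low 16 bits of the same product.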

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
LO_{31..0} ← UNPREDICTABLE

function MultiplyI16I16( a_{15..0}, b_{15..0} )
    temp_{31..0} ← a_{15..0} * b_{15..0}
    if ( temp_{31..0} > 0x7FFF ) or ( temp_{31..0} < 0xFFFF8000 ) then
        DSPControl_{ouflag:21} ← 1
    endif
    return temp_{15..0}
endfunction MultiplyI16I16

function satMultiplyI16I16( a_{15..0}, b_{15..0} )
    temp_{31..0} ← a_{15..0} * b_{15..0}
    if ( temp_{31..0} > 0x7FFF ) then
        temp_{31..0} ← 0x00007FFF
        DSPControl_{ouflag:21} ← 1
    else
        if ( temp_{31..0} < 0xFFFF8000 ) then
            temp_{31..0} ← 0xFFFF8000
            DSPControl_{ouflag:21} ← 1
        endif
    endif
    return temp_{15..0}
endfunction satMultiplyI16I16
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

## **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of *HI* and *LO* are **UNPREDICTABLE**. To stay compliant with the base architecture, this multiply instruction states the same requirement. This requirement does not apply to the new accumulators *ac1*-*ac3*. A programmer must therefore save the value in *ac0* (which is the same as *HI* and *LO*) across a GPR-targeting multiply instruction, if needed, while the values in *ac1*-*ac3* do not need to be saved.

# MULEQ\_S.W.PHL Multiply Vector Fractional Left Halfwords to Expanded Width Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0000100 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULEQ\_S.W.PHL rd, rs, rt

Purpose: Multiply Vector Fractional Left Halfwords to Expanded Width Products

Multiply two Q15 fractional halfword values to produce a Q31 fractional word result, with saturation.

**Description:**  $rd \leftarrow sat32(rs_{31..16} * rt_{31..16})$ 

The left-most Q15 fractional halfword value from the paired halfword vector in register rs is multiplied by the corresponding Q15 fractional halfword value from register rt. The result is left-shifted one bit position to create a Q31 format result and written into the destination register rd. If both input values are -1.0 in Q15 format (0x8000 in hexadecimal) the result is clamped to the maximum positive Q31 fractional value (0x7FFFFFFF in hexadecimal) before being written to the destination register.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3* are unmodified.

If the result is saturated, this instruction writes a 1 to bit 21 in the *ouflag* field of the *DSPControl* register.
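As a rough illustration of the Q15 to Q31 expansion described above, the following Python sketch models the instruction on plain integers. The function names are hypothetical, and the returned boolean stands in for bit 21 of *DSPControl*; it is a sketch under those assumptions rather than a reference model.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mul_q15_q15_to_q31(a, b):
    """Multiply two Q15 halfwords into a Q31 word, saturating -1.0 * -1.0."""
    a &= 0xFFFF
    b &= 0xFFFF
    if a == 0x8000 and b == 0x8000:
        # +1.0 is not representable in Q31, so clamp and raise the flag.
        return 0x7FFFFFFF, True
    product = (to_signed16(a) * to_signed16(b)) << 1
    return product & 0xFFFFFFFF, False


def muleq_s_w_phl(rs, rt):
    """Model MULEQ_S.W.PHL: operate on the left halfwords, bits 31..16."""
    return mul_q15_q15_to_q31(rs >> 16, rt >> 16)
```

With 0.5 (0x4000) in both left halfwords the result is 0x20000000, which is 0.25 in Q31 format.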

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become UNPREDICTABLE.

## **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← multiplyQ15Q15ouflag21( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
GPR[rd]_{31..0} ← temp_{31..0}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE

function multiplyQ15Q15ouflag21( a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} ← 0x7FFFFFFF
        DSPControl_{ouflag:21} ← 1
    else
        temp_{31..0} ← ( a_{15..0} * b_{15..0} ) << 1
    endif
    return temp_{31..0}
endfunction multiplyQ15Q15ouflag21
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULEQ\_S.W.PHL, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULEQ\_S.W.PHL instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result the values in these accumulators need not be saved.

# MULEQ\_S.W.PHR Multiply Vector Fractional Right Halfwords to Expanded Width Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0001100 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULEQ\_S.W.PHR rd, rs, rt

Purpose: Multiply Vector Fractional Right Halfwords to Expanded Width Products

Multiply two Q15 fractional halfword values to produce a Q31 fractional word result, with saturation.

**Description:**  $rd \leftarrow sat32(rs_{15..0} * rt_{15..0})$ 

The right-most Q15 fractional halfword value from register *rs* is multiplied by the corresponding Q15 fractional halfword value from register *rt*. The result is left-shifted one bit position to create a Q31 format result and written into the destination register *rd*. If both input values are -1.0 in Q15 format (0x8000 in hexadecimal) the result is clamped to the maximum positive Q31 fractional value (0x7FFFFFFF in hexadecimal) before being written to the destination register.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3* are unmodified.

If the result is saturated, this instruction writes a 1 to bit 21 in the *ouflag* field of the *DSPControl* register.
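To make the fixed-point interpretation concrete, the sketch below models the right-halfword multiply and converts operands and result to real numbers. All function names are hypothetical helpers introduced for illustration, and the returned boolean stands in for bit 21 of *DSPControl*.

```python
def q15_to_float(halfword):
    """Convert a 16-bit Q15 pattern to a Python float in [-1.0, 1.0)."""
    halfword &= 0xFFFF
    signed = halfword - 0x10000 if halfword & 0x8000 else halfword
    return signed / 32768.0


def q31_to_float(word):
    """Convert a 32-bit Q31 pattern to a Python float in [-1.0, 1.0)."""
    word &= 0xFFFFFFFF
    signed = word - 0x100000000 if word & 0x80000000 else word
    return signed / 2147483648.0


def muleq_s_w_phr(rs, rt):
    """Model MULEQ_S.W.PHR: multiply the right halfwords, bits 15..0."""
    a = rs & 0xFFFF
    b = rt & 0xFFFF
    if a == 0x8000 and b == 0x8000:
        return 0x7FFFFFFF, True  # clamp -1.0 * -1.0 and raise the flag
    signed_a = a - 0x10000 if a & 0x8000 else a
    signed_b = b - 0x10000 if b & 0x8000 else b
    return ((signed_a * signed_b) << 1) & 0xFFFFFFFF, False
```

Multiplying 0.5 (0x4000) by 0.25 (0x2000) yields 0x10000000, which `q31_to_float` maps back to 0.125; the left halfwords of both operands are ignored.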

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become UNPREDICTABLE.

## **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← multiplyQ15Q15ouflag21( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
GPR[rd]_{31..0} ← temp_{31..0}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE

function multiplyQ15Q15ouflag21( a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} ← 0x7FFFFFFF
        DSPControl_{ouflag:21} ← 1
    else
        temp_{31..0} ← ( a_{15..0} * b_{15..0} ) << 1
    endif
    return temp_{31..0}
endfunction multiplyQ15Q15ouflag21
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

## **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULEQ\_S.W.PHR, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULEQ\_S.W.PHR instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result the values in these accumulators need not be saved.

# MULEU\_S.PH.QBL Multiply Unsigned Vector Left Bytes by Halfwords to Halfword Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0010010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULEU\_S.PH.QBL rd, rs, rt

Purpose: Multiply Unsigned Vector Left Bytes by Halfwords to Halfword Products

Multiply two left-most unsigned byte vector elements in a byte vector by two unsigned halfword vector elements to produce two unsigned halfword results, with saturation.

**Description:**  $rd \leftarrow sat16(rs_{31..24} * rt_{31..16}) || sat16(rs_{23..16} * rt_{15..0})$ 

The two left-most unsigned byte elements in a four-element byte vector in register *rs* are multiplied as unsigned integer values with the two corresponding unsigned halfword elements from register *rt*. The eight most-significant bits of each 24-bit result are discarded, and the remaining 16 least-significant bits are written to the corresponding elements in halfword vector register *rd*. The instruction saturates the result to the maximum positive value (0xFFFF hexadecimal) if any of the discarded bits from each intermediate result are non-zero.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3* are unmodified.

If either result is saturated this instruction writes a 1 to bit 21 in the DSPControl register in the ouflag field.
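The byte-by-halfword multiply with unsigned saturation described above might be sketched in Python as follows. This is an illustrative model derived from the Description text only; the function name is hypothetical and the returned boolean stands in for bit 21 of *DSPControl*.

```python
def muleu_s_ph_qbl(rs, rt):
    """Model MULEU_S.PH.QBL on 32-bit register values.

    Returns (rd, ouflag); ouflag models DSPControl bit 21 (assumed name).
    """
    rd = 0
    ouflag = False
    # Pair rs[31..24] with rt[31..16], and rs[23..16] with rt[15..0].
    for byte_shift, half_shift in ((24, 16), (16, 0)):
        product = ((rs >> byte_shift) & 0xFF) * ((rt >> half_shift) & 0xFFFF)
        if product > 0xFFFF:
            # Some of the upper 8 bits of the 24-bit product are non-zero.
            product = 0xFFFF
            ouflag = True
        rd |= product << half_shift
    return rd, ouflag
```

For instance, a byte value of 2 multiplied by the halfword 0x8000 gives 0x10000, which does not fit in 16 bits and is therefore replaced by 0xFFFF.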

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

#### **Exceptions**:

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture this multiply instruction, MULEU\_S.PH.QBL, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULEU\_S.PH.QBL instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result the values in these accumulators need not be saved.

# MULEU\_S.PH.QBR Multiply Unsigned Vector Right Bytes with halfwords to Half Word Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0011010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULEU\_S.PH.QBR rd, rs, rt

Purpose: Multiply Unsigned Vector Right Bytes with halfwords to Half Word Products

Element-wise multiplication of unsigned byte elements with corresponding unsigned halfword elements, with saturation.

**Description:**  $rd \leftarrow sat16(rs_{15..8} * rt_{31..16}) || sat16(rs_{7..0} * rt_{15..0})$ 

The two right-most unsigned byte elements in a four-element byte vector in register *rs* are multiplied as unsigned integer values with the corresponding right-most 16-bit unsigned values from register *rt*. Each result is clipped to preserve the 16 least-significant bits and written back into the respective halfword element positions in the destination register *rd*. The instruction saturates the result to the maximum positive value (0xFFFF hexadecimal) if any of the clipped bits are non-zero.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3* must be unmodified.

This instruction writes a 1 to bit 21 in the *ouflag* field in the *DSPControl* register if either multiplication results in saturation.
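A small Python model can make the lane pairing explicit: byte `rs[15..8]` pairs with halfword `rt[31..16]`, and byte `rs[7..0]` pairs with halfword `rt[15..0]`, as in the Description formula. The function name is hypothetical, the returned boolean stands in for bit 21 of *DSPControl*, and the sketch is derived from the prose rather than from an official Operation listing.

```python
def muleu_s_ph_qbr(rs, rt):
    """Model MULEU_S.PH.QBR on 32-bit register values.

    Returns (rd, ouflag); ouflag models DSPControl bit 21 (assumed name).
    """
    lanes = (
        ((rs >> 8) & 0xFF, (rt >> 16) & 0xFFFF, 16),  # result to rd[31..16]
        (rs & 0xFF, rt & 0xFFFF, 0),                  # result to rd[15..0]
    )
    rd = 0
    ouflag = False
    for byte, halfword, out_shift in lanes:
        product = byte * halfword
        if product > 0xFFFF:
            product = 0xFFFF  # unsigned saturation to the 16-bit maximum
            ouflag = True
        rd |= product << out_shift
    return rd, ouflag
```

Note that a product of exactly 0xFFFF (for example 0xFF times 0x0101) fits in 16 bits, so it is written unchanged and the flag is not set.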

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

## **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture this multiply instruction, MULEU\_S.PH.QBR, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULEU\_S.PH.QBR instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result the values in these accumulators need not be saved.

# MULQ\_RS.PH Multiply Vector Fractional Halfwords to Fractional Halfword Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0100010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULQ\_RS.PH rd, rs, rt

Purpose: Multiply Vector Fractional Halfwords to Fractional Halfword Products

Multiply Q15 fractional halfword vector elements with rounding and saturation to produce two Q15 fractional halfword results.

**Description:**  $rd \leftarrow rndQ15(rs_{31..16} * rt_{31..16}) || rndQ15(rs_{15..0} * rt_{15..0})$ 

The two Q15 fractional halfword elements from register *rs* are separately multiplied by the corresponding Q15 fractional halfword elements from register *rt* to produce 32-bit intermediate results. Each intermediate result is left-shifted by one bit position to produce a Q31 fractional value, then rounded by adding 0x00008000 hexadecimal. The rounded intermediate result is then truncated to a Q15 fractional value and written to the corresponding position in destination register *rd*.

If the two input values to either multiplication are both -1.0 (0x8000 in hexadecimal), the final halfword result is saturated to the maximum positive Q15 value (0x7FFF in hexadecimal) and rounding and truncation are not performed.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3* must be unmodified.

If either result is saturated this instruction writes a 1 to bit 21 in the *DSPControl* register in the *ouflag* field.
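The shift, round, and truncate sequence described above can be traced with a short Python sketch. The helper names are hypothetical, and the returned boolean stands in for bit 21 of *DSPControl*; treat it as a sketch of the described steps rather than a definitive model.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def rnd_q15_multiply(a, b):
    """One lane: Q15 * Q15 to rounded Q15, saturating -1.0 * -1.0."""
    a &= 0xFFFF
    b &= 0xFFFF
    if a == 0x8000 and b == 0x8000:
        return 0x7FFF, True  # no rounding or truncation in this case
    temp = (to_signed16(a) * to_signed16(b)) << 1  # Q31 intermediate
    temp += 0x00008000                             # round at bit 15
    return (temp >> 16) & 0xFFFF, False            # keep bits 31..16


def mulq_rs_ph(rs, rt):
    """Model MULQ_RS.PH: returns (rd, ouflag)."""
    high, flag_high = rnd_q15_multiply(rs >> 16, rt >> 16)
    low, flag_low = rnd_q15_multiply(rs, rt)
    return (high << 16) | low, flag_high or flag_low
```

The rounding step is visible with operands 0x0001 and 0x4000: the Q31 intermediate is 0x00008000, which rounds up to a Q15 result of 0x0001 instead of truncating to zero.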

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← rndQ15MultiplyQ15Q15( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
tempA_{15..0} ← rndQ15MultiplyQ15Q15( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE

function rndQ15MultiplyQ15Q15( a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} ← 0x7FFF0000
        DSPControl_{ouflag:21} ← 1
    else
        temp_{31..0} ← ( a_{15..0} * b_{15..0} ) << 1
        temp_{31..0} ← temp_{31..0} + 0x00008000
    endif
    return temp_{31..16}
endfunction rndQ15MultiplyQ15Q15
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULQ\_RS.PH, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULQ\_RS.PH instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result, the values in these accumulators need not be saved.

# MULQ\_RS.W Multiply Fractional Words to Same Size Product with Saturation and Rounding

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0110010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULQ\_RS.W rd, rs, rt

Purpose: Multiply Fractional Words to Same Size Product with Saturation and Rounding

Multiply fractional Q31 word values, with saturation and rounding.

**Description:**  $rd \leftarrow round(sat32(rs_{31..0} * rt_{31..0}))$ 

The Q31 fractional format words in registers *rs* and *rt* are multiplied together and the product shifted left by one bit position to create a 64-bit fractional format intermediate result. The intermediate result is rounded up by adding a 1 at bit position 31, and then truncated by discarding the 32 least-significant bits to create a 32-bit fractional format result. The result is then written to destination register *rd*.

If both input multiplicands are equal to -1 (0x80000000 hexadecimal), rounding is not performed and the maximum positive Q31 fractional format value (0x7FFFFFFF hexadecimal) is written to the destination register.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3*, are unchanged.

This instruction, on an overflow or underflow of the operation, writes a 1 to bit 21 in the *DSPControl* register in the *ouflag* field.
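The 64-bit intermediate and the rounding at bit position 31 can be illustrated with the following Python sketch. The function names are hypothetical, and the returned boolean stands in for bit 21 of *DSPControl*; Python integers are unbounded, so the result is masked back to 32 bits explicitly.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def mulq_rs_w(rs, rt):
    """Model MULQ_RS.W: returns (rd, ouflag)."""
    rs &= 0xFFFFFFFF
    rt &= 0xFFFFFFFF
    if rs == 0x80000000 and rt == 0x80000000:
        return 0x7FFFFFFF, True  # -1.0 * -1.0 saturates, no rounding
    temp = (to_signed32(rs) * to_signed32(rt)) << 1  # 64-bit Q63 product
    temp += 0x80000000                               # add 1 at bit 31
    return (temp >> 32) & 0xFFFFFFFF, False          # keep bits 63..32
```

With `rs = 0x00000001` and `rt = 0x40000000` the shifted product is exactly 0x80000000, so adding the rounding constant carries into bit 32 and the result is 1 rather than 0.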

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
if ( GPR[rs]_{31..0} = 0x80000000 ) and ( GPR[rt]_{31..0} = 0x80000000 ) then
    temp_{63..0} ← 0x7FFFFFFF00000000
    DSPControl_{ouflag:21} ← 1
else
    temp_{63..0} ← ( GPR[rs]_{31..0} * GPR[rt]_{31..0} ) << 1
    temp_{63..0} ← temp_{63..0} + ( 0^{32} || 0x80000000 )
endif
GPR[rd]_{31..0} ← temp_{63..32}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULQ\_RS.W, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULQ\_RS.W instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result, the values in these accumulators need not be saved.

DSP-R2

# MULQ\_S.PH Multiply Vector Fractional Half-Words to Same Size Products

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0101010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULQ\_S.PH rd, rs, rt

Purpose: Multiply Vector Fractional Half-Words to Same Size Products

Multiply two vector fractional Q15 values to create a Q15 result, with saturation.

**Description:**  $rd \leftarrow sat16(rs_{31..16} * rt_{31..16}) || sat16(rs_{15..0} * rt_{15..0})$ 

The two vector fractional Q15 values in register rs are multiplied with the corresponding elements in register rt to produce two 32-bit products. Each product is left-shifted by one bit position to create a Q31 fractional word intermediate result. The two 32-bit intermediate results are then each truncated by discarding the 16 least-significant bits of each result, and the resulting Q15 fractional format halfwords are then written to the corresponding positions in destination register rd. For each halfword result, if both input multiplicands are equal to -1 (0x8000 hexadecimal), the final halfword result is saturated to the maximum positive Q15 value (0x7FFF hexadecimal).

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3*, must be untouched.

This instruction, on an overflow or underflow of either of the two vector operations, writes a 1 to bit 21 in the *ouflag* field in the *DSPControl* register.
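The truncating behavior described above can be sketched in Python as shown below. The helper names are hypothetical, and the returned boolean stands in for bit 21 of *DSPControl*. Unlike a rounding multiply, the low 16 bits of the Q31 intermediate are simply discarded.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def sat16_multiply_q15(a, b):
    """One lane: Q15 * Q15 to truncated Q15, saturating -1.0 * -1.0."""
    a &= 0xFFFF
    b &= 0xFFFF
    if a == 0x8000 and b == 0x8000:
        return 0x7FFF, True
    temp = (to_signed16(a) * to_signed16(b)) << 1  # Q31 intermediate
    return (temp >> 16) & 0xFFFF, False            # discard bits 15..0


def mulq_s_ph(rs, rt):
    """Model MULQ_S.PH: returns (rd, ouflag)."""
    high, flag_high = sat16_multiply_q15(rs >> 16, rt >> 16)
    low, flag_low = sat16_multiply_q15(rs, rt)
    return (high << 16) | low, flag_high or flag_low
```

In the first test case below, the right lane multiplies 0x0001 by 0x4000; the Q31 intermediate 0x00008000 truncates to 0x0000.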

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
tempB_{15..0} ← sat16MultiplyQ15Q15( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
tempA_{15..0} ← sat16MultiplyQ15Q15( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE

function sat16MultiplyQ15Q15( a_{15..0}, b_{15..0} )
    if ( a_{15..0} = 0x8000 ) and ( b_{15..0} = 0x8000 ) then
        temp_{31..0} ← 0x7FFF0000
        DSPControl_{ouflag:21} ← 1
    else
        temp_{31..0} ← ( a_{15..0} * b_{15..0} )
        temp_{31..0} ← ( temp_{30..0} || 0 )
    endif
    return temp_{31..16}
endfunction sat16MultiplyQ15Q15
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULQ\_S.PH, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULQ\_S.PH instruction.

Note that the requirement on *HI* and *LO* does not apply to the new accumulator registers *ac1*, *ac2*, and *ac3*; as a result, the values in these accumulators need not be saved.

# MULQ\_S.W Multiply Fractional Words to Same Size Product with Saturation

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0111010 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: MULQ\_S.W rd, rs, rt

Purpose: Multiply Fractional Words to Same Size Product with Saturation

Multiply two Q31 fractional format word values to create a fractional Q31 result, with saturation.

**Description:**  $rd \leftarrow sat32(rs_{31..0} * rt_{31..0})$ 

The Q31 fractional format words in registers rs and rt are multiplied together to create a 64-bit fractional format intermediate result. The intermediate result is left-shifted by one bit position, and then truncated by discarding the 32 least-significant bits to create a Q31 fractional format result. This result is then written to destination register rd.

If both input multiplicands are equal to -1 (0x80000000 hexadecimal), the product is clipped to the maximum positive Q31 fractional format value (0x7FFFFFFF hexadecimal), and written to the destination register.

To stay compliant with the base architecture, this instruction leaves the base *HI/LO* pair (accumulator *ac0*) **UNPREDICTABLE** after the operation completes. The other DSP Module accumulators, *ac1*, *ac2*, and *ac3*, are unchanged.

On an overflow or underflow of the operation, this instruction writes a 1 to bit 21 of the *DSPControl* register, within the *ouflag* field.
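The behavior described above can be sketched in C as follows. This is a minimal software model under stated assumptions, not a definition of the instruction: the function name `mulq_s_w` and the `ouflag21` out-parameter (standing in for bit 21 of *DSPControl*) are hypothetical, and the unpredictable state of *ac0* is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of MULQ_S.W. ouflag21 stands in for DSPControl
 * bit 21; it is only ever set here, never cleared (sticky flag). */
static int32_t mulq_s_w(int32_t rs, int32_t rt, bool *ouflag21)
{
    if (rs == INT32_MIN && rt == INT32_MIN) {
        /* -1.0 * -1.0 is not representable in Q31: clip and flag. */
        *ouflag21 = true;
        return INT32_MAX; /* 0x7FFFFFFF */
    }
    /* Full 64-bit product of the two Q31 operands. */
    int64_t prod = (int64_t)rs * (int64_t)rt;
    /* Left-shift by one in unsigned arithmetic (avoids shifting a
     * negative signed value), then keep the upper 32 bits. */
    uint64_t shifted = (uint64_t)prod << 1;
    return (int32_t)(uint32_t)(shifted >> 32);
}
```

For example, multiplying 0.5 by 0.5 (0x40000000 in Q31) gives 0x20000000, which is 0.25 in Q31; the low 32 bits of the shifted product are discarded, so the result is truncated rather than rounded.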

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSP2Resources()
if ( GPR[rs]_{31..0} = 0x80000000 ) and ( GPR[rt]_{31..0} = 0x80000000 ) then
    temp_{63..0} ← 0x7FFFFFFF00000000
    DSPControl_{ouflag:21} ← 1
else
    temp_{63..0} ← ( GPR[rs]_{31..0} * GPR[rt]_{31..0} ) << 1
endif
GPR[rd]_{31..0} ← temp_{63..32}
HI[0]_{31..0} ← UNPREDICTABLE
LO[0]_{31..0} ← UNPREDICTABLE
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

The base MIPS32 architecture states that after a GPR-targeting multiply instruction such as MUL, the contents of registers *HI* and *LO* are **UNPREDICTABLE**. To maintain compliance with the base architecture, this multiply instruction, MULQ\_S.W, has the same requirement. Software must save and restore the *ac0* register if the previous value in the *ac0* register is needed following the MULQ\_S.W instruction.

Note that the requirement on HI and LO does not apply to the new accumulator registers ac1, ac2, and ac3; as a result, the values in these accumulators need not be saved.


| MULSA.W.PH |  |  |  |  |  |  |  | Multiply and Subtract Vector Integer Halfword Elements and Accumulate |
|---|---|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | ac | 10 | 110 | 010 | 111 | 111 |
| 6 | 5 | 5 | 2 | 2 | 3 | 3 | 3 | 3 |

Format: MULSA.W.PH ac, rs, rt

DSP-R2

Purpose: Multiply and Subtract Vector Integer Halfword Elements and Accumulate

To multiply and subtract two integer vector elements using full-size intermediate products, accumulating the result into the specified accumulator.

**Description:** ac  $\leftarrow$  ac + ((rs<sub>31..16</sub> \* rt<sub>31..16</sub>) - (rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

Each of the two halfword integer elements from register *rt* is multiplied by the corresponding element in *rs* to create two word results. The right-most result is subtracted from the left-most result to generate the intermediate result, which is then added to the specified 64-bit accumulator.

The value of ac selects an accumulator numbered from 0 to 3. When ac=0, this refers to the original HI/LO register pair of the MIPS32 architecture.

This instruction does not set any bits of the ouflag field in the DSPControl register.
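As an illustration of the description above, the following C sketch models the arithmetic. It is a sketch only: the function name `mulsa_w_ph` is hypothetical, and `acc` is assumed to hold the 64-bit *HI*||*LO* value of the selected accumulator.

```c
#include <stdint.h>

/* Hypothetical model of MULSA.W.PH. acc represents HI[ac] || LO[ac]. */
static int64_t mulsa_w_ph(int64_t acc, uint32_t rs, uint32_t rt)
{
    /* Signed 16 x 16 products; each fits in 32 bits. */
    int32_t left  = (int32_t)(int16_t)(rs >> 16) * (int32_t)(int16_t)(rt >> 16);
    int32_t right = (int32_t)(int16_t)(rs & 0xFFFF) * (int32_t)(int16_t)(rt & 0xFFFF);
    /* Left product minus right product, held in a wider type. */
    int64_t dotp = (int64_t)left - (int64_t)right;
    /* Accumulate modulo 2^64; no saturation and no ouflag update. */
    return (int64_t)((uint64_t)acc + (uint64_t)dotp);
}
```

With *rs* holding the halfwords (3, -2) and *rt* holding (4, 5), the left product is 12, the right product is -10, and the accumulator increases by 22.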

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSP2Resources()
tempB_{31..0} ← ( GPR[rs]_{31..16} * GPR[rt]_{31..16} )
tempA_{31..0} ← ( GPR[rs]_{15..0} * GPR[rt]_{15..0} )
dotp_{32..0} ← ( (tempB_{31}) || tempB_{31..0} ) - ( (tempA_{31}) || tempA_{31..0} )
acc_{63..0} ← ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + ( (dotp_{32})^{31} || dotp_{32..0} )
( HI[ac]_{31..0} || LO[ac]_{31..0} ) ← acc_{63..32} || acc_{31..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled


| MULSAQ\_S.W.PH |  |  |  |  |  |  |  | Multiply And Subtract Vector Fractional Halfwords And Accumulate |
|---|---|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | ac | 11 | 110 | 010 | 111 | 111 |
| 6 | 5 | 5 | 2 | 2 | 3 | 3 | 3 | 3 |
| Format: MULSAQ\_S.W.PH ac, rs, rt |  |  |  |  |  |  |  | DSP |

Purpose: Multiply And Subtract Vector Fractional Halfwords And Accumulate

Multiply and subtract two Q15 fractional halfword vector elements using full-size intermediate products, accumulating the result into the specified accumulator, with saturation.

**Description:** ac  $\leftarrow$  ac + (sat32(rs<sub>31..16</sub> \* rt<sub>31..16</sub>) - sat32(rs<sub>15..0</sub> \* rt<sub>15..0</sub>))

The two corresponding Q15 fractional values from registers *rt* and *rs* are multiplied together and left-shifted by 1 bit to generate two Q31 fractional format intermediate products. If the input multiplicands to either of the multiplications are both -1.0 (0x8000 hexadecimal), the intermediate result is saturated to 0x7FFFFFFF hexadecimal.

The two intermediate products (named left and right) are summed with alternating sign to create a sum-of-products, i.e., the sign of the right product is negated before summation. The sum-of-products is then sign-extended to 64 bits and accumulated into the specified 64-bit accumulator, producing a Q32.31 result.

The value of *ac* can range from 0 to 3; a value of 0 refers to the original *HI/LO* register pair of the MIPS32 architecture.

If saturation occurs, a 1 is written to one of bits 16 through 19 of the *DSPControl* register, within the *ouflag* field. The value of *ac* determines which of these bits is set: bit 16 corresponds to *ac0*, bit 17 to *ac1*, bit 18 to *ac2*, and bit 19 to *ac3*.
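The saturating multiply and the alternating-sign accumulation described above can be sketched in C. The names `mul_q15_sat` and `mulsaq_s_w_ph` are hypothetical, `acc` is assumed to hold the 64-bit *HI*||*LO* value of accumulator *ac*, and `dspcontrol` is a plain variable standing in for the *DSPControl* register.

```c
#include <stdint.h>

/* Hypothetical helper: Q15 x Q15 -> Q31, saturating the single
 * overflow case and setting ouflag bit 16 + ac in *dspcontrol. */
static int32_t mul_q15_sat(int16_t a, int16_t b, unsigned ac, uint32_t *dspcontrol)
{
    if (a == INT16_MIN && b == INT16_MIN) {
        *dspcontrol |= UINT32_C(1) << (16 + ac);
        return INT32_MAX; /* 0x7FFFFFFF */
    }
    /* Doubling is the one-bit left shift; it cannot overflow here. */
    return (int32_t)a * (int32_t)b * 2;
}

/* Hypothetical model of MULSAQ_S.W.PH. acc represents HI[ac] || LO[ac]. */
static int64_t mulsaq_s_w_ph(int64_t acc, uint32_t rs, uint32_t rt,
                             unsigned ac, uint32_t *dspcontrol)
{
    int32_t left  = mul_q15_sat((int16_t)(rs >> 16), (int16_t)(rt >> 16), ac, dspcontrol);
    int32_t right = mul_q15_sat((int16_t)(rs & 0xFFFF), (int16_t)(rt & 0xFFFF), ac, dspcontrol);
    /* Sign-extend to 64 bits, subtract right from left, accumulate. */
    int64_t dotp = (int64_t)left - (int64_t)right;
    return (int64_t)((uint64_t)acc + (uint64_t)dotp);
}
```

Note that in this sketch saturation applies only to the individual Q15 products; the 64-bit accumulation itself wraps, matching the Operation pseudocode.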

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{31..0} ← multiplyQ15Q15( ac, rs_{31..16}, rt_{31..16} )
tempA_{31..0} ← multiplyQ15Q15( ac, rs_{15..0}, rt_{15..0} )
dotp_{63..0} ← ( (tempB_{31})^{32} || tempB_{31..0} ) - ( (tempA_{31})^{32} || tempA_{31..0} )
tempC_{63..0} ← ( HI[ac]_{31..0} || LO[ac]_{31..0} ) + dotp_{63..0}
( HI[ac]_{31..0} || LO[ac]_{31..0} ) ← tempC_{63..32} || tempC_{31..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled


| MULT |  |  |  |  |  |  |  | Multiply Word |
|---|---|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | ac | 00 | 110 | 010 | 111 | 111 |
| 6 | 5 | 5 | 2 | 2 | 3 | 3 | 3 | 3 |
| Format: MULT ac, rs, rt |  |  |  |  |  |  |  | DSP |

Purpose: Multiply Word

To multiply two 32-bit signed integers, writing the 64-bit result to the specified accumulator.

**Description:** ac  $\leftarrow$  rs<sub>31..0</sub> \* rt<sub>31..0</sub>

The 32-bit signed integer value in register *rt* is multiplied by the corresponding 32-bit signed integer value in register *rs*, to produce a 64-bit result that is written to the specified accumulator register.

The value of *ac* selects an accumulator numbered from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.
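A minimal C sketch of the signed multiply, assuming a hypothetical `hilo_pair` type to represent the *HI* and *LO* halves of the selected accumulator (neither the type nor the function name `mult_model` comes from the architecture):

```c
#include <stdint.h>

/* Hypothetical holder for the HI and LO halves of one accumulator. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} hilo_pair;

/* Hypothetical model of MULT: signed 32 x 32 -> 64-bit product. */
static hilo_pair mult_model(int32_t rs, int32_t rt)
{
    /* Sign-extend both operands to 64 bits before multiplying. */
    uint64_t prod = (uint64_t)((int64_t)rs * (int64_t)rt);
    hilo_pair result = { (uint32_t)(prod >> 32), (uint32_t)prod };
    return result;
}
```

The product of two 32-bit signed values always fits in 64 bits, which is consistent with the statement that no arithmetic exception occurs.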

#### **Restrictions:**

None

## **Operation:**

```
if ( ( ac ≠ 0 ) or ( Config_{AR} ≥ 2 ) ) then
    ValidateAccessToDSP2Resources()
endif
temp_{63..0} ← ( (GPR[rs]_{31})^{32} || GPR[rs]_{31..0} ) * ( (GPR[rt]_{31})^{32} || GPR[rt]_{31..0} )
( HI[ac]_{31..0} || LO[ac]_{31..0} ) ← temp_{63..32} || temp_{31..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

## **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.

## **Implementation Note:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.


| MULTU |  |  |  |  |  |  |  | Multiply Unsigned Word |
|---|---|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | ac | 01 | 110 | 010 | 111 | 111 |
| 6 | 5 | 5 | 2 | 2 | 3 | 3 | 3 | 3 |
| Format: MULTU ac, rs, rt |  |  |  |  |  |  |  | DSP |

Purpose: Multiply Unsigned Word

To multiply 32-bit unsigned integers, writing the 64-bit result to the specified accumulator.

**Description:** ac  $\leftarrow$  rs<sub>31..0</sub> \* rt<sub>31..0</sub>

The 32-bit unsigned integer value in register *rt* is multiplied by the corresponding 32-bit unsigned integer value in register *rs*, to produce a 64-bit unsigned result that is written to the specified accumulator register.

The value of *ac* selects an accumulator numbered from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.

In Release 6 of the MIPS Architecture, accumulators are eliminated from MIPS32.

No arithmetic exception occurs under any circumstances.
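The unsigned variant differs from the signed multiply only in how the operands are extended to 64 bits. A short C sketch, with the hypothetical function name `multu_model` returning the 64-bit *HI*||*LO* value:

```c
#include <stdint.h>

/* Hypothetical model of MULTU: unsigned 32 x 32 -> 64-bit product.
 * The return value represents HI[ac] || LO[ac]. */
static uint64_t multu_model(uint32_t rs, uint32_t rt)
{
    /* Zero-extend both operands to 64 bits before multiplying. */
    return (uint64_t)rs * (uint64_t)rt;
}
```

For the bit pattern 0xFFFFFFFF in both operands, this yields 0xFFFFFFFE00000001, whereas a signed multiply of the same bit patterns (-1 times -1) would yield 1.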

#### **Restrictions:**

None

## **Operation:**

```
if ( ( ac ≠ 0 ) or ( Config_{AR} ≥ 2 ) ) then
    ValidateAccessToDSP2Resources()
endif
temp_{64..0} ← ( 0^{32} || GPR[rs]_{31..0} ) * ( 0^{32} || GPR[rt]_{31..0} )
( HI[ac]_{31..0} || LO[ac]_{31..0} ) ← temp_{63..32} || temp_{31..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

## **Programming Notes:**

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read *LO* or *HI* before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in register *rt*. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.

## **Implementation Note:**

Processors which implement a multiplier array which is not square (for example,  $32 \times 16$ ), and which therefore has an operation latency which is data dependent, should assume that the shorter operand is in register *rt*.


| PACKRL.PH |  |  |  |  |  | Pack a Vector of Halfwords from Vector Halfword Sources |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3 | 2..0 |
| P32A<br>001000 | rt | rs | rd | x | 0110101 | 101 |
| 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format: PACKRL.PH rd, rs, rt |  |  |  |  |  | DSP |

Purpose: Pack a Vector of Halfwords from Vector Halfword Sources

Pick two elements for a halfword vector using the right halfword and left halfword respectively from the two source registers.

**Description:**  $rd \leftarrow rs_{15..0} || rt_{31..16}$ 

The right halfword element from register rs and the left halfword from register rt are packed into the two halfword positions of the destination register rd.
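In C terms, the packing amounts to two shifts and an OR. The function name `packrl_ph` below is hypothetical and the sketch ignores the access check performed by the instruction:

```c
#include <stdint.h>

/* Hypothetical model of PACKRL.PH: rd = rs[15..0] || rt[31..16]. */
static uint32_t packrl_ph(uint32_t rs, uint32_t rt)
{
    /* Right halfword of rs moves to the left position;
     * left halfword of rt moves to the right position. */
    return (rs << 16) | (rt >> 16);
}
```

For instance, with *rs* = 0x11112222 and *rt* = 0x33334444, the destination receives 0x22223333.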

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are UNPREDICTABLE and the values of the operand vectors become UNPREDICTABLE.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← GPR[rs]_{15..0}
tempA_{15..0} ← GPR[rt]_{31..16}
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled


| PICK.PH |  |  |  |  |  | Pick a Vector of Halfword Values Based on Condition Code Bits |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3 | 2..0 |
| P32A<br>001000 | rt | rs | rd | x | 1000101 | 101 |
| 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format: PICK.PH rd, rs, rt |  |  |  |  |  | DSP |

Purpose: Pick a Vector of Halfword Values Based on Condition Code Bits

Select two halfword elements from either of two source registers based on condition code bits, writing the selected elements to the destination register.

**Description:**  $rd \leftarrow pick(cc_{25}, rs_{31..16}, rt_{31..16}) || pick(cc_{24}, rs_{15..0}, rt_{15..0})$ 

The two right-most condition code bits in the DSPControl register are used to select halfword values from the corresponding element of either source register rs or source register rt. If the value of the corresponding condition code bit is 1, then the halfword value is selected from register rs; otherwise, it is selected from rt. The selected halfwords are written to the destination register rd.
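The selection can be sketched in C by building a halfword mask from the two condition code bits. The function name `pick_ph` is hypothetical, and `dspcontrol` is a plain value standing in for the *DSPControl* register:

```c
#include <stdint.h>

/* Hypothetical model of PICK.PH. Bits 25 and 24 of dspcontrol
 * select the left and right halfword respectively. */
static uint32_t pick_ph(uint32_t dspcontrol, uint32_t rs, uint32_t rt)
{
    uint32_t mask = 0;
    if (dspcontrol & (UINT32_C(1) << 25))
        mask |= 0xFFFF0000u; /* left halfword comes from rs */
    if (dspcontrol & (UINT32_C(1) << 24))
        mask |= 0x0000FFFFu; /* right halfword comes from rs */
    /* Positions with a clear condition bit come from rt. */
    return (rs & mask) | (rt & ~mask);
}
```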

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be in the specified format. If they are not, the results are UNPREDICTABLE and the values of the operand vectors become UNPREDICTABLE.

### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← ( DSPControl_{ccond:25} = 1 ? GPR[rs]_{31..16} : GPR[rt]_{31..16} )
tempA_{15..0} ← ( DSPControl_{ccond:24} = 1 ? GPR[rs]_{15..0} : GPR[rt]_{15..0} )
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```

### **Exceptions:**

Reserved Instruction, DSP Disabled


| PICK.QB |  |  |  |  |  | Pick a Vector of Byte Values Based on Condition Code Bits |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3 | 2..0 |
| P32A<br>001000 | rt | rs | rd | x | 0111101 | 101 |
| 6 | 5 | 5 | 5 | 1 | 7 | 3 |

Format: PICK.QB rd, rs, rt

Purpose: Pick a Vector of Byte Values Based on Condition Code Bits

Select four byte elements from either of two source registers based on condition code bits, writing the selected elements to the destination register.

**Description:**  $rd \leftarrow pick(cc_{27}, rs_{31..24}, rt_{31..24}) || pick(cc_{26}, rs_{23..16}, rt_{23..16}) || pick(cc_{25}, rs_{15..8}, rt_{15..8}) || pick(cc_{24}, rs_{7..0}, rt_{7..0})$ 

Four condition code bits in the *DSPControl* register are used to select byte values from the corresponding byte element of either source register *rs* or source register *rt*. If the value of the corresponding condition code bit is 1, then the byte value is selected from register *rs*; otherwise, it is selected from *rt*. The selected bytes are written to the destination register *rd*.
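A C sketch of the byte-wise selection, following the Description line above (condition code bit 24 + *i* governs byte *i*, counting bytes from the least-significant end). The function name `pick_qb` is hypothetical, and `dspcontrol` is a plain value standing in for the *DSPControl* register:

```c
#include <stdint.h>

/* Hypothetical model of PICK.QB. Bits 27..24 of dspcontrol select,
 * per byte, whether the result byte comes from rs (1) or rt (0). */
static uint32_t pick_qb(uint32_t dspcontrol, uint32_t rs, uint32_t rt)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 4; i++) {
        /* Condition bit 24 + i controls bits 8i+7..8i. */
        if (dspcontrol & (UINT32_C(1) << (24 + i)))
            mask |= UINT32_C(0xFF) << (8 * i);
    }
    return (rs & mask) | (rt & ~mask);
}
```

With condition bits 27 and 25 set, *rs* = 0x11223344 and *rt* = 0xAABBCCDD, the result is 0x11BB33DD.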

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

## **Exceptions:**

Reserved Instruction, DSP Disabled

DSP


| PRECEQ.W.PHL |  |  |  |  |  | Precision Expand Fractional Halfword to Fractional Word Value |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 0101000 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQ.W.PHL rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand Fractional Halfword to Fractional Word Value

Expand the precision of a Q15 fractional value taken from the left element of a paired halfword vector to create a Q31 fractional word value.

**Description:** rt ← expand\_prec(rs<sub>31..16</sub>)

The left Q15 fractional halfword value from the paired halfword vector in register *rs* is expanded to a Q31 fractional value and written to destination register *rt*. The precision expansion is achieved by appending 16 least-significant zero bits to the original halfword value to generate the 32-bit fractional value.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← GPR[rs]_{31..16} || 0^{16}
GPR[rt]_{31..0} ← temp_{31..0}
```

### **Exceptions:**

Reserved Instruction, DSP Disabled

| PRECEQ.W.PHR |  |  |  |  |  | Precision Expand Fractional Halfword to Fractional Word Value |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 0110000 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQ.W.PHR rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand Fractional Halfword to Fractional Word Value

Expand the precision of a Q15 fractional value taken from the right element of a paired halfword vector to create a Q31 fractional word value.

**Description:** rt  $\leftarrow$  expand\_prec(rs<sub>15..0</sub>)

The right Q15 fractional halfword value from the paired halfword vector in register *rs* is expanded to a Q31 fractional value and written to destination register *rt*. The precision expansion is achieved by appending 16 least-significant zero bits to the original halfword value to generate the 32-bit fractional value.
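Appending sixteen zero bits is a left shift of the selected halfword into the upper half of the word. The C sketch below covers both the left (PHL) and right (PHR) variants; the function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical model of PRECEQ.W.PHL: rs[31..16] || 0^16. */
static uint32_t preceq_w_phl(uint32_t rs)
{
    /* The left halfword is already in place; clear the low half. */
    return rs & 0xFFFF0000u;
}

/* Hypothetical model of PRECEQ.W.PHR: rs[15..0] || 0^16. */
static uint32_t preceq_w_phr(uint32_t rs)
{
    /* Move the right halfword up; the low half fills with zeros. */
    return rs << 16;
}
```

The represented value is unchanged by the expansion: Q15 0x4000 (0.5) becomes Q31 0x40000000 (0.5).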

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} ← GPR[rs]_{15..0} || 0^{16}
GPR[rt]_{31..0} ← temp_{31..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

| PRECEQU.PH.QBL |  |  |  |  |  | Precision Expand two Unsigned Bytes to Fractional Halfword Values |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 0111000 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQU.PH.QBL rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand two Unsigned Bytes to Fractional Halfword Values

Expand the precision of two unsigned byte values taken from the two left-most elements of a quad byte vector to create two Q15 fractional halfword values.

**Description:**  $rt \leftarrow expand_prec(rs_{31..24}) || expand_prec(rs_{23..16})$ 

The two left-most unsigned integer byte values from the four byte elements in register *rs* are expanded to create two Q15 fractional values that are then written to destination register *rt*. The precision expansion is achieved by pre-pending a single zero bit (for positive sign) to the original byte value and appending seven least-significant zeros to generate each 16-bit fractional value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^{1} || GPR[rs]_{31..24} || 0^{7}
tempA_{15..0} ← 0^{1} || GPR[rs]_{23..16} || 0^{7}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

| PRECEQU.PH.QBLA |  |  |  |  |  | Precision Expand two Unsigned Bytes to Fractional Halfword Values |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 0111001 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQU.PH.QBLA rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand two Unsigned Bytes to Fractional Halfword Values

Expand the precision of two unsigned byte values taken from the two left-alternate aligned elements of a quad byte vector to create two Q15 fractional halfword values.

**Description:** rt  $\leftarrow$  expand\_prec(rs<sub>31..24</sub>) || expand\_prec(rs<sub>15..8</sub>)

The two left-alternate aligned unsigned integer byte values from the four byte elements in register *rs* are expanded to create two Q15 fractional values that are then written to destination register *rt*. The precision expansion is achieved by pre-pending a single zero bit (for positive sign) to the original byte value and appending seven least-significant zeros to generate each 16-bit fractional value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^{1} || GPR[rs]_{31..24} || 0^{7}
tempA_{15..0} ← 0^{1} || GPR[rs]_{15..8} || 0^{7}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```

## **Exceptions:**

Reserved Instruction, DSP Disabled

| PRECEQU.PH.QBR |  |  |  |  |  | Precision Expand two Unsigned Bytes to Fractional Halfword Values |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 1001000 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQU.PH.QBR rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand two Unsigned Bytes to Fractional Halfword Values

Expand the precision of two unsigned byte values taken from the two right-most elements of a quad byte vector to create two Q15 fractional halfword values.

**Description:** rt  $\leftarrow$  expand\_prec(rs<sub>15..8</sub>) || expand\_prec(rs<sub>7..0</sub>)

The two right-most unsigned integer byte values from the four byte elements in register *rs* are expanded to create two Q15 fractional values that are then written to destination register *rt*. The precision expansion is achieved by pre-pending a single zero bit (for positive sign) to the original byte value and appending seven least-significant zeros to generate each 16-bit fractional value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^{1} || GPR[rs]_{15..8} || 0^{7}
tempA_{15..0} ← 0^{1} || GPR[rs]_{7..0} || 0^{7}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| PRECEQU.PH.QBRA |  |  |  |  |  | Precision Expand two Unsigned Bytes to Fractional Halfword Values |
|---|---|---|---|---|---|---|
| 31..26 | 25..21 | 20..16 | 15..9 | 8..6 | 5..3 | 2..0 |
| P32A<br>001000 | rt | rs | 1001001 | 100 | 111 | 111 |
| 6 | 5 | 5 | 7 | 3 | 3 | 3 |
| Format: PRECEQU.PH.QBRA rt, rs |  |  |  |  |  | DSP |

Purpose: Precision Expand two Unsigned Bytes to Fractional Halfword Values

Expand the precision of two unsigned byte values taken from the two right-alternate aligned elements of a quad byte vector to create two Q15 fractional halfword values.

**Description:** rt  $\leftarrow$  expand\_prec(rs<sub>23..16</sub>) || expand\_prec(rs<sub>7..0</sub>)

The two right-alternate aligned unsigned integer byte values from the four byte elements in register rs are expanded to create two Q15 fractional values that are then written to destination register *rt*. The precision expansion is achieved by pre-pending a single zero bit (for positive sign) to the original byte value and appending seven least-significant zeros to generate each 16-bit fractional value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^1 || GPR[rs]_{23..16} || 0^7
tempA_{15..0} ← 0^1 || GPR[rs]_{7..0} || 0^7
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
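The expansion above can be sketched in Python as a plain integer model. This is only an illustration: the function name `precequ_ph_qbra` is hypothetical, registers are modeled as 32-bit unsigned integers, and the DSP access check is not modeled.

```python
def precequ_ph_qbra(rs: int) -> int:
    """Illustrative model of PRECEQU.PH.QBRA (name and signature are assumptions)."""
    rs &= 0xFFFFFFFF
    byte_b = (rs >> 16) & 0xFF  # rs[23..16]
    byte_a = rs & 0xFF          # rs[7..0]
    # 0^1 || byte || 0^7 is the byte shifted left by 7 inside a 16-bit field.
    temp_b = byte_b << 7
    temp_a = byte_a << 7
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    print(hex(precequ_ph_qbra(0x12FF3480)))
```

Because the sign bit of each result is always zero, every output halfword is a non-negative Q15 value.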

#### **Exceptions:**

| PRECEU.PH.QBL | Precision Expand Two Unsigned Bytes to Unsigned Halfword Values |
|---------------|-----------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 1011000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format: PRECEU.PH.QBL rt, rs | | | | | | DSP |

Purpose: Precision Expand Two Unsigned Bytes to Unsigned Halfword Values

Expand the precision of two unsigned byte values taken from the two left-most elements of a quad byte vector to create two unsigned halfword values.

**Description:** rt  $\leftarrow$  expand\_prec8u16(rs<sub>31..24</sub>) || expand\_prec8u16(rs<sub>23..16</sub>)

The two left-most unsigned integer byte values from the four byte elements in register *rs* are expanded to create two unsigned halfword values that are then written to destination register *rt*. The precision expansion is achieved by prepending eight most-significant zeros to each original value to generate each 16 bit unsigned value.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^8 || GPR[rs]_{31..24}
tempA_{15..0} ← 0^8 || GPR[rs]_{23..16}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
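As a rough illustration of the operation above, the following Python sketch zero-extends the two left-most bytes. The function name `preceu_ph_qbl` is hypothetical and the DSP access check is omitted.

```python
def preceu_ph_qbl(rs: int) -> int:
    """Illustrative model of PRECEU.PH.QBL (name is an assumption)."""
    rs &= 0xFFFFFFFF
    temp_b = (rs >> 24) & 0xFF  # 0^8 || rs[31..24]
    temp_a = (rs >> 16) & 0xFF  # 0^8 || rs[23..16]
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    print(hex(preceu_ph_qbl(0xAABBCCDD)))
```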

### **Exceptions:**

| PRECEU.PH.QBLA | Precision Expand Two Unsigned Bytes to Unsigned Halfword Values |
|----------------|-----------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 1011001 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format: PRECEU.PH.QBLA rt, rs | | | | | | DSP |

Purpose: Precision Expand Two Unsigned Bytes to Unsigned Halfword Values

Expand the precision of two unsigned integer byte values taken from the two left-alternate aligned positions of a quad byte vector to create two unsigned halfword values.

**Description:** rt  $\leftarrow$ expand\_prec8u16(rs<sub>31..24</sub>) || expand\_prec8u16(rs<sub>15..8</sub>)

The two left-alternate aligned unsigned integer byte values from the four byte elements in register *rs* are each expanded to unsigned halfword values and written to destination register *rt*. The precision expansion is achieved by prepending eight most-significant zero bits to the original byte value to generate each 16-bit unsigned halfword value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^8 || GPR[rs]_{31..24}
tempA_{15..0} ← 0^8 || GPR[rs]_{15..8}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
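A minimal Python sketch of the left-alternate selection, assuming registers are modeled as 32-bit unsigned integers. The name `preceu_ph_qbla` is illustrative only.

```python
def preceu_ph_qbla(rs: int) -> int:
    """Illustrative model of PRECEU.PH.QBLA (name is an assumption)."""
    rs &= 0xFFFFFFFF
    temp_b = (rs >> 24) & 0xFF  # 0^8 || rs[31..24]
    temp_a = (rs >> 8) & 0xFF   # 0^8 || rs[15..8]
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    print(hex(preceu_ph_qbla(0xAABBCCDD)))
```

For this selection the result equals the input shifted right by eight bits and masked with 0x00FF00FF, which is one way to cross-check the byte positions.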

#### **Exceptions:**

| PRECEU.PH.QBR | Precision Expand two Unsigned Bytes to Unsigned Halfword Values |
|---------------|-----------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 1101000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format: PRECEU.PH.QBR rt, rs | | | | | | DSP |

Purpose: Precision Expand two Unsigned Bytes to Unsigned Halfword Values

Expand the precision of two unsigned integer byte values taken from the two right-most elements of a quad byte vector to create two unsigned halfword values.

**Description:** rt  $\leftarrow$  expand\_prec8u16(rs<sub>15..8</sub>) || expand\_prec8u16(rs<sub>7..0</sub>)

The two right-most unsigned integer byte values from the four byte elements in register *rs* are expanded to create two unsigned halfword values that are then written to destination register *rt*. The precision expansion is achieved by prepending eight most-significant zero bits to each original value to generate each 16 bit halfword value.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^8 || GPR[rs]_{15..8}
tempA_{15..0} ← 0^8 || GPR[rs]_{7..0}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
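The same operation, sketched in Python under the assumption that registers are 32-bit unsigned integers. The function name `preceu_ph_qbr` is hypothetical.

```python
def preceu_ph_qbr(rs: int) -> int:
    """Illustrative model of PRECEU.PH.QBR (name is an assumption)."""
    rs &= 0xFFFFFFFF
    temp_b = (rs >> 8) & 0xFF  # 0^8 || rs[15..8]
    temp_a = rs & 0xFF         # 0^8 || rs[7..0]
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    print(hex(preceu_ph_qbr(0xAABBCCDD)))
```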

### **Exceptions:**

| PRECEU.PH.QBRA | Precision Expand Two Unsigned Bytes to Unsigned Halfword Values |
|----------------|-----------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 1101001 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format: PRECEU.PH.QBRA rt, rs | | | | | | DSP |

Purpose: Precision Expand Two Unsigned Bytes to Unsigned Halfword Values

Expand the precision of two unsigned byte values taken from the two right-alternate aligned positions of a quad byte vector to create two unsigned halfword values.

**Description:** rt  $\leftarrow$  expand\_prec8u16(rs<sub>23..16</sub>) || expand\_prec8u16(rs<sub>7..0</sub>)

The two right-alternate aligned unsigned integer byte values from the four byte elements in register rs are each expanded to unsigned halfword values and written to destination register rt. The precision expansion is achieved by pre-pending eight most-significant zero bits to the original byte value to generate each 16 bit unsigned halfword value.

## **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← 0^8 || GPR[rs]_{23..16}
tempA_{15..0} ← 0^8 || GPR[rs]_{7..0}
GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
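The four PRECEU.PH variants differ only in which source bytes they select. The following Python sketch puts those selections in one table, with bit positions taken from the Description lines of each variant. The names `PRECEU_PH_BYTE_SHIFTS` and `preceu_ph` are hypothetical and not part of the architecture.

```python
# Right-shift amounts that bring the selected source byte to bits 7..0:
# (byte for the upper result halfword, byte for the lower result halfword).
PRECEU_PH_BYTE_SHIFTS = {
    "QBL": (24, 16),   # rs[31..24], rs[23..16]
    "QBLA": (24, 8),   # rs[31..24], rs[15..8]
    "QBR": (8, 0),     # rs[15..8],  rs[7..0]
    "QBRA": (16, 0),   # rs[23..16], rs[7..0]
}


def preceu_ph(variant: str, rs: int) -> int:
    """Illustrative model of PRECEU.PH.<variant>; zero-extends two bytes."""
    hi_shift, lo_shift = PRECEU_PH_BYTE_SHIFTS[variant]
    rs &= 0xFFFFFFFF
    temp_b = (rs >> hi_shift) & 0xFF
    temp_a = (rs >> lo_shift) & 0xFF
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    for name in PRECEU_PH_BYTE_SHIFTS:
        print(name, hex(preceu_ph(name, 0xAABBCCDD)))
```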

#### **Exceptions:**

| PRECR.QB.PH | Precision Reduce Four Integer Halfwords to Four Bytes |
|-------------|-------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0001101 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECR.QB.PH rd, rs, rt | | | | | | DSP-R2 |

Purpose: Precision Reduce Four Integer Halfwords to Four Bytes

Reduce the precision of four integer halfwords to four byte values.

**Description:** rd ← rs<sub>23..16</sub> || rs<sub>7..0</sub> || rt<sub>23..16</sub> || rt<sub>7..0</sub>

The 8 least-significant bits of each of the two integer halfword values in registers *rs* and *rt* are taken to produce four byte-sized results that are written to the four byte elements in destination register *rd*. The two byte values obtained from *rs* are written to the two left-most destination byte elements, and the two byte values obtained from *rt* are written to the two right-most destination byte elements.
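The byte packing described above can be sketched in Python as follows. This is derived only from the Description line, so treat it as an assumption-laden illustration: the function name `precr_qb_ph` is hypothetical and registers are modeled as 32-bit unsigned integers.

```python
def precr_qb_ph(rs: int, rt: int) -> int:
    """Illustrative model: rd = rs[23..16] || rs[7..0] || rt[23..16] || rt[7..0]."""
    rs &= 0xFFFFFFFF
    rt &= 0xFFFFFFFF
    temp_d = (rs >> 16) & 0xFF  # low byte of the upper halfword of rs
    temp_c = rs & 0xFF          # low byte of the lower halfword of rs
    temp_b = (rt >> 16) & 0xFF  # low byte of the upper halfword of rt
    temp_a = rt & 0xFF          # low byte of the lower halfword of rt
    return (temp_d << 24) | (temp_c << 16) | (temp_b << 8) | temp_a


if __name__ == "__main__":
    print(hex(precr_qb_ph(0x11223344, 0x55667788)))
```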

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

### **Exceptions:**

| PRECR_SRA[_R].PH.W | Precision Reduce Two Integer Words to Halfwords after a Right Shift |
|--------------------|---------------------------------------------------------------------|

| 31..26           | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|------------------|--------|--------|--------|----|---------|------|
| PRECR_SRA.PH.W   |        |        |        |    |         |      |
| P32A<br>001000   | rt     | rs     | sa     | 0  | 1111001 | 101  |
| PRECR_SRA_R.PH.W |        |        |        |    |         |      |
| P32A<br>001000   | rt     | rs     | sa     | 1  | 1111001 | 101  |
| 6                | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECR_SRA[_R].PH.W | | | | | | |
| PRECR_SRA.PH.W rt, rs, sa | | | | | | DSP-R2 |
| PRECR_SRA_R.PH.W rt, rs, sa | | | | | | DSP-R2 |

Purpose: Precision Reduce Two Integer Words to Halfwords after a Right Shift

Do an arithmetic right shift of two integer words with optional rounding, and then reduce the precision to halfwords.

**Description:** rt  $\leftarrow$ (round(rt >> sa))<sub>15..0</sub> || (round(rs >> sa))<sub>15..0</sub>

The two words in registers *rs* and *rt* are right shifted arithmetically by the specified shift amount *sa* to create interim results. The 16 least-significant bits of each interim result are then written to the corresponding elements of destination register *rt*.

In the rounding version of the instruction, a value of 1 is added at the most-significant discarded bit position after the shift is performed. The 16 least-significant bits of each interim result are then written to the corresponding elements of destination register *rt*.

The shift amount sa is interpreted as a five-bit unsigned integer taking values between 0 and 31.

This instruction does not write any bits of the *ouflag* field in the DSPControl register.
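A minimal Python sketch of both variants, assuming registers are modeled as 32-bit values and relying on Python's arithmetic right shift for negative integers. The names `precr_sra_ph_w` and `rounding` are hypothetical. Rounding is modeled as shifting by *sa* - 1, adding 1, then dropping one more bit, which places the added 1 at the most-significant discarded bit position.

```python
def _to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def precr_sra_ph_w(rt: int, rs: int, sa: int, rounding: bool = False) -> int:
    """Illustrative model of PRECR_SRA.PH.W / PRECR_SRA_R.PH.W."""
    if not 0 <= sa <= 31:
        raise ValueError("sa must be a five-bit unsigned value")

    def reduce(word: int) -> int:
        value = _to_signed32(word)
        if rounding and sa > 0:
            # Add 1 at the most-significant discarded bit, then discard it.
            value = ((value >> (sa - 1)) + 1) >> 1
        else:
            value >>= sa
        return value & 0xFFFF

    return (reduce(rt) << 16) | reduce(rs)


if __name__ == "__main__":
    print(hex(precr_sra_ph_w(0x0001234C, 0xFFFFFFF8, 4)))
    print(hex(precr_sra_ph_w(0x0001234C, 0xFFFFFFF8, 4, rounding=True)))
```

With a shift amount of zero the two variants return the same value, since no bits are discarded.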

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
PRECR_SRA.PH.W
      ValidateAccessToDSP2Resources()
      if (sa_{4..0} = 0) then
             tempB_{15..0} ← GPR[rt]_{15..0}
             tempA_{15..0} ← GPR[rs]_{15..0}
      else
             tempB_{15..0} ← ( (GPR[rt]_{31})^{sa} || GPR[rt]_{31..sa} )
             tempA_{15..0} ← ( (GPR[rs]_{31})^{sa} || GPR[rs]_{31..sa} )
      endif
      GPR[rt]_{31..0} ← tempB_{15..0} || tempA_{15..0}
PRECR_SRA_R.PH.W
      ValidateAccessToDSP2Resources()
      if (sa_{4..0} = 0) then
             tempB_{16..0} ← ( GPR[rt]_{15..0} || 0 )
             tempA_{16..0} ← ( GPR[rs]_{15..0} || 0 )
      else
             tempB_{32..0} ← ( (GPR[rt]_{31})^{sa} || GPR[rt]_{31..sa-1} ) + 1
             tempA_{32..0} ← ( (GPR[rs]_{31})^{sa} || GPR[rs]_{31..sa-1} ) + 1
      endif
      GPR[rt]_{31..0} ← tempB_{16..1} || tempA_{16..1}
```

## **Exceptions:**

| PRECRQ.PH.W | Precision Reduce Fractional Words to Fractional Halfwords |
|-------------|-----------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0011101 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECRQ.PH.W rd, rs, rt | | | | | | DSP |

Purpose: Precision Reduce Fractional Words to Fractional Halfwords

Reduce the precision of two fractional words to produce two fractional halfword values.

**Description:** rd  $\leftarrow$ rs<sub>31..16</sub> || rt<sub>31..16</sub>

The 16 most-significant bits from each of the Q31 fractional word values in registers rs and rt are written to destination register rd, creating a vector of two Q15 fractional values. The fractional word from the rs register is used to create the left-most Q15 fractional value in rd, and the fractional word from the rt register is used to create the right-most Q15 fractional value.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← GPR[rs]_{31..16}
tempA_{15..0} ← GPR[rt]_{31..16}
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}
```
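As a rough illustration, the truncation above amounts to keeping the upper halfword of each source word. The Python function name `precrq_ph_w` below is hypothetical, and the DSP access check is not modeled.

```python
def precrq_ph_w(rs: int, rt: int) -> int:
    """Illustrative model: rd = rs[31..16] || rt[31..16]."""
    temp_b = (rs >> 16) & 0xFFFF  # upper halfword of rs, left-most result
    temp_a = (rt >> 16) & 0xFFFF  # upper halfword of rt, right-most result
    return (temp_b << 16) | temp_a


if __name__ == "__main__":
    print(hex(precrq_ph_w(0x12345678, 0x9ABCDEF0)))
```

No rounding or saturation takes place here. The low 16 bits of each Q31 input are simply dropped.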

### **Exceptions:**

| PRECRQ.QB.PH | Precision Reduce Four Fractional Halfwords to Four Bytes |
|--------------|----------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0010101 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECRQ.QB.PH rd, rs, rt | | | | | | DSP |

Purpose: Precision Reduce Four Fractional Halfwords to Four Bytes

Reduce the precision of four fractional halfwords to four byte values.

**Description:**  $rd \leftarrow rs_{31..24} || rs_{15..8} || rt_{31..24} || rt_{15..8}$ 

The four Q15 fractional values in registers *rs* and *rt* are truncated by dropping the eight least significant bits from each value to produce four fractional byte values. The four fractional byte values are written to the four byte elements of destination register *rd*. The two values obtained from register *rt* are placed in the two right-most byte positions in the destination register, and the two values obtained from register *rs* are placed in the two remaining byte positions.
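The truncation described above can be sketched in Python. This is derived only from the Description line. The function name `precrq_qb_ph` is hypothetical and registers are modeled as 32-bit unsigned integers.

```python
def precrq_qb_ph(rs: int, rt: int) -> int:
    """Illustrative model: rd = rs[31..24] || rs[15..8] || rt[31..24] || rt[15..8]."""
    rs &= 0xFFFFFFFF
    rt &= 0xFFFFFFFF
    temp_d = (rs >> 24) & 0xFF  # high byte of the upper halfword of rs
    temp_c = (rs >> 8) & 0xFF   # high byte of the lower halfword of rs
    temp_b = (rt >> 24) & 0xFF  # high byte of the upper halfword of rt
    temp_a = (rt >> 8) & 0xFF   # high byte of the lower halfword of rt
    return (temp_d << 24) | (temp_c << 16) | (temp_b << 8) | temp_a


if __name__ == "__main__":
    print(hex(precrq_qb_ph(0x11223344, 0x55667788)))
```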

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

### **Exceptions:**

| PRECRQU_S.QB.PH | Precision Reduce Fractional Halfwords to Unsigned Bytes With Saturation |
|-----------------|-------------------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0101101 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECRQU_S.QB.PH rd, rs, rt | | | | | | DSP |

Purpose: Precision Reduce Fractional Halfwords to Unsigned Bytes With Saturation

Reduce the precision of four fractional halfwords to produce four unsigned byte values, with saturation.

**Description:** rd  $\leftarrow$ sat(reduce\_prec(rs<sub>31..16</sub>)) || sat(reduce\_prec(rs<sub>15..0</sub>)) || sat(reduce\_prec(rt<sub>31..16</sub>)) || sat(reduce\_prec(rt<sub>15..0</sub>))

The four Q15 fractional halfwords from registers rs and rt are used to create four unsigned byte values that are written to corresponding elements of destination register rd. The two halfwords from the rs register and the two halfwords from the *rt* register are used to create the four unsigned byte values.

Each unsigned byte value is created from the Q15 fractional halfword input value after first examining the sign and magnitude of the halfword. If the sign of the halfword value is positive and the value is greater than 0x7F80 hexadecimal, the result is clamped to the maximum unsigned 8-bit value (255 decimal, 0xFF hexadecimal). If the sign of the halfword value is negative, the result is clamped to the minimum unsigned 8-bit value (0 decimal, 0x00 hexadecimal). Otherwise, the sign bit is discarded from the input and the result is taken from the eight most-significant bits that remain.

If clamping was needed to produce any of the unsigned output values, bit 22 of the DSPControl register is set to 1.

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
ValidateAccessToDSPResources()
tempD_{7..0} ← sat8ReducePrecision( GPR[rs]_{31..16} )
tempC_{7..0} ← sat8ReducePrecision( GPR[rs]_{15..0} )
tempB_{7..0} ← sat8ReducePrecision( GPR[rt]_{31..16} )
tempA_{7..0} ← sat8ReducePrecision( GPR[rt]_{15..0} )
GPR[rd]_{31..0} ← tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function sat8ReducePrecision( a_{15..0} )
    sign ← a_{15}
    mag_{14..0} ← a_{14..0}
    if ( sign = 0 ) then
        if ( mag_{14..0} > 0x7F80 ) then
            temp_{7..0} ← 0xFF
            DSPControl_{ouflag:22} ← 1
        else
            temp_{7..0} ← mag_{14..7}
        endif
    else
        temp_{7..0} ← 0x00
        DSPControl_{ouflag:22} ← 1
    endif
    return temp_{7..0}
endfunction sat8ReducePrecision
```
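The saturating reduction can be modeled in Python as shown below. This is a sketch under stated assumptions: the function names are hypothetical, and the effect on bit 22 of *DSPControl* is represented by a returned boolean rather than by real register state.

```python
def sat8_reduce_precision(a: int) -> tuple[int, bool]:
    """Return (unsigned byte, saturated?) for one Q15 halfword."""
    a &= 0xFFFF
    if a & 0x8000:        # negative input clamps to 0x00
        return 0x00, True
    if a > 0x7F80:        # too large to represent, clamps to 0xFF
        return 0xFF, True
    return (a >> 7) & 0xFF, False  # mag[14..7]


def precrqu_s_qb_ph(rs: int, rt: int) -> tuple[int, bool]:
    """Illustrative model; the bool stands in for DSPControl ouflag bit 22."""
    halfwords = [(rs >> 16) & 0xFFFF, rs & 0xFFFF,
                 (rt >> 16) & 0xFFFF, rt & 0xFFFF]
    rd = 0
    ouflag22 = False
    for halfword in halfwords:
        byte, saturated = sat8_reduce_precision(halfword)
        rd = (rd << 8) | byte
        ouflag22 = ouflag22 or saturated
    return rd, ouflag22


if __name__ == "__main__":
    print(precrqu_s_qb_ph(0x7FFF8000, 0x40007F80))
```

Note that an input of exactly 0x7F80 yields 0xFF without saturating, because the comparison is strictly greater-than.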

## **Exceptions:**

| PRECRQ_RS.PH.W | Precision Reduce Fractional Words to Halfwords With Rounding and Saturation |
|----------------|-----------------------------------------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|----------------|--------|--------|--------|----|---------|------|
| P32A<br>001000 | rt     | rs     | rd     | x  | 0100101 | 101  |
| 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format: PRECRQ_RS.PH.W rd, rs, rt | | | | | | DSP |

Purpose: Precision Reduce Fractional Words to Halfwords With Rounding and Saturation

Reduce the precision of two fractional words to produce two fractional halfword values, with rounding and saturation.

**Description:** rd  $\leftarrow$ truncQ15SatRound(rs<sub>31..0</sub>) || truncQ15SatRound(rt<sub>31..0</sub>)

The Q31 fractional word values in registers *rs* and *rt* are used to create two Q15 fractional halfword values that are written to the two halfword elements in destination register *rd*. The fractional word from the *rs* register is used to create the left-most Q15 fractional halfword result in *rd*, and the fractional word from the *rt* register is used to create the right-most halfword value.

Each input Q31 fractional value is rounded and saturated before being truncated to create the Q15 fractional halfword result. First, the value 0x00008000 is added to the input Q31 value to round even, creating an interim rounded result. If this addition causes overflow, the interim rounded result is saturated to the maximum Q31 value (0x7FFFFFFF hexadecimal). Then, the 16 least-significant bits of the interim rounded and saturated result are discarded and the 16 most-significant bits are written to the destination register in the appropriate position.

If either of the rounding operations results in overflow and saturation, a 1 is written to bit 22 in the *DSPControl* register within the *ouflag* field.

### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

### **Operation:**

```
ValidateAccessToDSPResources()
tempB_{15..0} ← trunc16Sat16Round( GPR[rs]_{31..0} )
tempA_{15..0} ← trunc16Sat16Round( GPR[rt]_{31..0} )
GPR[rd]_{31..0} ← tempB_{15..0} || tempA_{15..0}

function trunc16Sat16Round( a_{31..0} )
    temp_{32..0} ← ( a_{31} || a_{31..0} ) + 0x00008000
    if ( temp_{32} ≠ temp_{31} ) then
        temp_{32..0} ← 0 || 0x7FFFFFFF
        DSPControl_{ouflag:22} ← 1
    endif
    return temp_{31..16}
endfunction trunc16Sat16Round
```
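A minimal Python sketch of the round-saturate-truncate sequence. The function names are hypothetical, and the *DSPControl* update is modeled as a returned boolean. Overflow is detected by comparing the sum with the largest Q31 value. This is assumed to match the 33-bit sign-bit comparison in the pseudocode, since adding a positive constant can only overflow upward.

```python
def trunc16_sat16_round(a: int) -> tuple[int, bool]:
    """Return (Q15 halfword, saturated?) for one Q31 word."""
    a &= 0xFFFFFFFF
    signed = a - (1 << 32) if a & 0x80000000 else a
    temp = signed + 0x00008000
    if temp > 0x7FFFFFFF:      # rounding overflowed the Q31 range
        return 0x7FFF, True    # upper halfword of 0x7FFFFFFF
    return (temp >> 16) & 0xFFFF, False


def precrq_rs_ph_w(rs: int, rt: int) -> tuple[int, bool]:
    """Illustrative model; the bool stands in for DSPControl ouflag bit 22."""
    temp_b, sat_b = trunc16_sat16_round(rs)
    temp_a, sat_a = trunc16_sat16_round(rt)
    return (temp_b << 16) | temp_a, sat_b or sat_a


if __name__ == "__main__":
    print(precrq_rs_ph_w(0x7FFFFFFF, 0x12348000))
```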

## **Exceptions:**

## PREPEND

## **Right Shift and Prepend Bits to the MSB**

```
Format: PREPEND rt, rs, sa
EXTW rt, rs, rt, sa
```

DSP-R2 Replaced with EXTW in nanoMIPS

Purpose: Right Shift and Prepend Bits to the MSB

Logically right-shift register *rt*, replacing the bits emptied by the shift with the least-significant bits of register *rs*.

**Description:** rt  $\leftarrow$ rs<sub>sa-1..0</sub> || (rt >> sa)

The word value in register rt is logically right-shifted by the specified shift amount sa, and sa bits from the least-significant positions of register rs are written into the sa most-significant bits emptied by the shift. The result is then written to destination register rt.
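The shift-and-prepend behavior described above can be sketched in Python. This is based only on the Description line and the paragraph above. The function name `prepend` is hypothetical, registers are modeled as 32-bit unsigned integers, and *sa* is assumed to lie between 0 and 31.

```python
def prepend(rt: int, rs: int, sa: int) -> int:
    """Illustrative model: rt = rs[sa-1..0] || (rt >> sa)."""
    if not 0 <= sa <= 31:
        raise ValueError("sa is assumed to be in the range 0..31")
    rt &= 0xFFFFFFFF
    rs &= 0xFFFFFFFF
    low_bits = rs & ((1 << sa) - 1)  # sa least-significant bits of rs
    # With sa == 0 no bits are taken from rs and rt is returned unchanged.
    return ((low_bits << (32 - sa)) | (rt >> sa)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(prepend(0x12345678, 0xABCDEF9A, 8)))
```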

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

## **Operation:**

## **Exceptions:**

| RADDU.W.QB | Unsigned Reduction Add Vector Quad Bytes |
|------------|------------------------------------------|

| 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|---------|------|------|------|
| P32A<br>001000 | rt     | rs     | 1111000 | 100  | 111  | 111  |
| 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format: RADDU.W.QB rt, rs | | | | | | DSP |

Purpose: Unsigned Reduction Add Vector Quad Bytes

Reduction add of four unsigned byte values in a vector register to produce an unsigned word result.

**Description:** rt  $\leftarrow$ zero\_extend(rs<sub>31..24</sub> + rs<sub>23..16</sub> + rs<sub>15..8</sub> + rs<sub>7..0</sub>)

The unsigned byte elements in register *rs* are added together as unsigned 8-bit values, and the result is zero extended to a word and written to register *rt*.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{9..0} ← ( 0^2 || GPR[rs]_{31..24} ) + ( 0^2 || GPR[rs]_{23..16} ) + ( 0^2 || GPR[rs]_{15..8} ) + ( 0^2 || GPR[rs]_{7..0} )
GPR[rt]_{31..0} ← 0^{(GPRLEN-10)} || temp_{9..0}
```
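For illustration, the reduction can be written in a few lines of Python. The function name `raddu_w_qb` is hypothetical. The largest possible sum is 4 × 255 = 1020, which fits in the 10-bit temporary used above.

```python
def raddu_w_qb(rs: int) -> int:
    """Illustrative model: sum the four unsigned bytes of rs."""
    rs &= 0xFFFFFFFF
    # Extract bytes at bit offsets 0, 8, 16, 24 and add them as unsigned values.
    return sum((rs >> shift) & 0xFF for shift in (0, 8, 16, 24))


if __name__ == "__main__":
    print(raddu_w_qb(0x01020304), raddu_w_qb(0xFFFFFFFF))
```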

## **Exceptions:**

| RDDSP | Read DSPControl Register Fields to a GPR |
|-------|------------------------------------------|

| 31..26         | 25..21 | 20..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|----------------|--------|--------|--------|-------|------|------|------|
| P32A<br>001000 | rt     | mask   | 00     | 011   | 001  | 111  | 111  |
| 6              | 5      | 7      | 2      | 3     | 3    | 3    | 3    |
| Format: RDDSP rt, mask | | | | | | | DSP |
| RDDSP rt | | | | | | | Assembly Idiom |

Purpose: Read DSPControl Register Fields to a GPR

To copy selected fields from the special-purpose *DSPControl* register to the specified GPR.

**Description:** rt  $\leftarrow$ select(mask, DSPControl)

Selected fields in the special register *DSPControl* are copied into the corresponding bits of destination register *rt*. Each of bits 0 through 5 of the *mask* operand corresponds to a specific field in the *DSPControl* register. A mask bit value of 1 indicates that the bits from the corresponding field in *DSPControl* will be copied into the same bit positions in register *rt*, and a mask bit value of 0 indicates that the corresponding bit positions in *rt* will be set to zero. Bits 6 through 9 of the *mask* operand are ignored.

The table below shows the correspondence between the bits in the *mask* operand and the fields in the *DSPControl* register; mask bit 0 is the least-significant bit in *mask*.

| Bit                 | 31..24 | 23..16 | 15 | 14  | 13 | 12..7  | 6 | 5..0 |
|---------------------|--------|--------|----|-----|----|--------|---|------|
| DSPControl<br>field | ccond  | ouflag | 0  | EFI | c  | scount |   | pos  |
| Mask bit            | 4      | 3      |    | 5   | 2  | 1      |   | 0    |

For example, to copy only the bits from the scount field in *DSPControl*, the value of the *mask* operand used will be 2 decimal (0x02 hexadecimal). After execution of the instruction, bits 7 through 12 of register *rt* will have the value of bits 7 through 12 from the scount field in *DSPControl*. The remaining bits in register *rt* will be set to zero.

The one-operand version of the instruction provides a convenient assembly idiom that allows the programmer to read all fields in the *DSPControl* register into the destination register, i.e., it is equivalent to specifying a *mask* value of 31 decimal (0x1F hexadecimal).
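As an informal illustration of the mask selection described above, the following Python sketch models RDDSP on plain integers. The names `rddsp` and `FIELD_MASKS` are hypothetical helpers, not part of the architecture, and the bit ranges follow the Operation pseudocode of this instruction (which copies *ccond* from bits 27..24).

```python
# Hypothetical model: DSPControl bit mask selected by each mask bit,
# following the Operation pseudocode of RDDSP.
FIELD_MASKS = {
    0: 0x0000003F,  # pos    (bits 5..0)
    1: 0x00001F80,  # scount (bits 12..7)
    2: 0x00002000,  # c      (bit 13)
    3: 0x00FF0000,  # ouflag (bits 23..16)
    4: 0x0F000000,  # ccond  (bits 27..24)
    5: 0x00004000,  # EFI    (bit 14)
}

def rddsp(dspcontrol, mask):
    """Return the value written to GPR[rt] for a given DSPControl and mask."""
    temp = 0
    for bit, field in FIELD_MASKS.items():
        if (mask >> bit) & 1:
            temp |= dspcontrol & field
    return temp & 0xFFFFFFFF
```

With every *DSPControl* bit set, `rddsp(0xFFFFFFFF, 0x02)` keeps only the scount field.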

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} \leftarrow 0^{32}
if ( mask_0 = 1 ) then
    temp_{5..0} \leftarrow DSPControl_{pos:5..0}
endif
if ( mask_1 = 1 ) then
    temp_{12..7} \leftarrow DSPControl_{scount:12..7}
endif
if ( mask_2 = 1 ) then
    temp_{13} \leftarrow DSPControl_{c:13}
endif
if ( mask_3 = 1 ) then
    temp_{23..16} \leftarrow DSPControl_{ouflag:23..16}
endif
if ( mask_4 = 1 ) then
    temp_{27..24} \leftarrow DSPControl_{ccond:27..24}
endif
if ( mask_5 = 1 ) then
    temp_{14} \leftarrow DSPControl_{efi:14}
endif
GPR[rt]_{31..0} \leftarrow temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| REPL.PH  | Replicate Immediate Integer into all Vector Element Positions |        |        |    |         |      |
|----------|----------------|--------|--------|----|---------|------|
| Bits     | 31..26         | 25..21 | 20..11 | 10 | 9..3    | 2..0 |
| Encoding | P32A<br>001000 | rt     | s      | x  | 0000111 | 101  |
| Width    | 6              | 5      | 10     | 1  | 7       | 3    |
| Format   | REPL.PH rd, immediate |  |       |    |         | DSP  |

Purpose: Replicate Immediate Integer into all Vector Element Positions

Replicate a sign-extended, 10-bit signed immediate integer value into the two halfwords in a halfword vector.

**Description:** rd  $\leftarrow$  sign\_extend(immediate) || sign\_extend(immediate)

The specified 10-bit signed immediate integer value is sign-extended to 16 bits and replicated into the two halfword positions in destination register rd.
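The sign extension and replication can be sketched in Python as follows. The function name `repl_ph` is hypothetical, and the model works on plain integers rather than on real registers.

```python
def repl_ph(immediate):
    """Model of REPL.PH: sign-extend a 10-bit immediate and replicate it."""
    imm10 = immediate & 0x3FF
    # Copy bit 9 into bits 15..10 to form the 16-bit halfword.
    half = imm10 | (0xFC00 if imm10 & 0x200 else 0)
    return (half << 16) | half
```

For instance, an immediate of -512 (bit pattern 0x200) yields the halfword 0xFE00 in both element positions.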

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are UNPREDICTABLE and the values of the operand vectors become UNPREDICTABLE.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{15..0} \leftarrow (immediate_9)^6 || immediate_{9..0}
GPR[rd]_{31..0} \leftarrow temp_{15..0} || temp_{15..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| REPL.QB  | Replicate Immediate Integer into all Vector Element Positions |        |        |    |       |      |      |      |
|----------|----------------|--------|--------|----|-------|------|------|------|
| Bits     | 31..26         | 25..21 | 20..13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
| Encoding | P32A<br>001000 | rt     | u      | x  | 010   | 111  | 111  | 111  |
| Width    | 6              | 5      | 8      | 1  | 3     | 3    | 3    | 3    |
| Format   | REPL.QB rt, immediate |  |       |    |       |      |      | DSP  |

Purpose: Replicate Immediate Integer into all Vector Element Positions

Replicate an immediate byte into all elements of a quad byte vector.

**Description:** $rt \leftarrow immediate || immediate || immediate || immediate$

The specified 8-bit signed immediate value is replicated into the four byte elements of destination register rt.
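A minimal Python sketch of the byte replication, assuming a hypothetical helper name `repl_qb`; multiplying by 0x01010101 is just one convenient way to copy a byte into all four positions.

```python
def repl_qb(immediate):
    """Model of REPL.QB: replicate the low 8 bits into four byte elements."""
    byte = immediate & 0xFF
    # 0x01010101 has a 1 in the lowest bit of every byte position.
    return byte * 0x01010101
```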

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{7..0} \leftarrow immediate_{7..0}
GPR[rt]_{31..0} \leftarrow temp_{7..0} || temp_{7..0} || temp_{7..0} || temp_{7..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| REPLV.PH | Replicate a Halfword into all Vector Element Positions |        |        |         |      |      |      |
|----------|----------------|--------|--------|---------|------|------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
| Encoding | P32A<br>001000 | rt     | rs     | 0000001 | 100  | 111  | 111  |
| Width    | 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format   | REPLV.PH rt, rs |       |        |         |      |      | DSP  |

Purpose: Replicate a Halfword into all Vector Element Positions

Replicate a variable halfword into the elements of a halfword vector.

**Description:**  $rt \leftarrow (rs_{15..0} || rs_{15..0})$ 

The halfword value in register rs is replicated into the two halfword elements of destination register rt.
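For illustration, a short Python model of the register-sourced replication; `replv_ph` is a hypothetical name and the argument stands for the 32-bit value held in *rs*.

```python
def replv_ph(rs_value):
    """Model of REPLV.PH: replicate bits 15..0 of rs into both halfwords."""
    half = rs_value & 0xFFFF  # bits 31..16 of rs do not affect the result
    return (half << 16) | half
```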

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{15..0} \leftarrow GPR[rs]_{15..0}
GPR[rt]_{31..0} \leftarrow temp_{15..0} || temp_{15..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| REPLV.QB | Replicate Byte into all Vector Element Positions |        |        |         |      |      |      |
|----------|----------------|--------|--------|---------|------|------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..9   | 8..6 | 5..3 | 2..0 |
| Encoding | P32A<br>001000 | rt     | rs     | 0001001 | 100  | 111  | 111  |
| Width    | 6              | 5      | 5      | 7       | 3    | 3    | 3    |
| Format   | REPLV.QB rt, rs |       |        |         |      |      | DSP  |

Purpose: Replicate Byte into all Vector Element Positions

Replicate a variable byte into all elements of a quad byte vector.

**Description:**  $rt \leftarrow rs_{7..0} || rs_{7..0} || rs_{7..0} || rs_{7..0}$ 

The byte value in register rs is replicated into the four byte elements of destination register rt.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{7..0} \leftarrow GPR[rs]_{7..0}
GPR[rt]_{31..0} \leftarrow temp_{7..0} || temp_{7..0} || temp_{7..0} || temp_{7..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHILO    | Shift an Accumulator Value Leaving the Result in the Same Accumulator |        |        |        |        |         |      |
|----------|----------------|--------|--------|--------|--------|---------|------|
| Bits     | 31..26         | 25..22 | 21..16 | 15..14 | 13..10 | 9..3    | 2..0 |
| Encoding | P32A<br>001000 | x      | s      | ac     | x      | 0000011 | 101  |
| Width    | 6              | 4      | 6      | 2      | 4      | 7       | 3    |
| Format   | SHILO ac, shift |       |        |        |        |         | DSP  |

Purpose: Shift an Accumulator Value Leaving the Result in the Same Accumulator

Shift the *HI/LO* paired value in a 64-bit accumulator either left or right, leaving the result in the same accumulator.

**Description:** $ac \leftarrow (shift >= 0)\ ?\ (ac >> shift) : (ac << -shift)$

The *HI/LO* register pair is treated as a single 64-bit accumulator that is shifted logically by *shift* bits, with the result of the shift written back to the source accumulator. The *shift* argument is a six-bit signed integer value: a positive argument results in a right shift of up to 31 bits, and a negative argument results in a left shift of up to 32 bits.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.
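The shift direction rule can be sketched in Python as below. This is a model under stated assumptions: `shilo` is a hypothetical name, the accumulator is passed and returned as a (HI, LO) pair of 32-bit integers, and the six-bit *shift* field is decoded by ordinary two's-complement sign extension.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shilo(hi, lo, shift):
    """Model of SHILO: returns the new (HI, LO) pair."""
    field = shift & 0x3F
    amount = field - 64 if field & 0x20 else field  # range -32..31
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    if amount >= 0:
        acc >>= amount                      # logical right shift
    else:
        acc = (acc << -amount) & MASK64     # left shift, zeros shifted in
    return (acc >> 32) & MASK32, acc & MASK32
```

A shift argument of -32 moves the LO word into HI and clears LO.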

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
sign \leftarrow shift_5
shift_{5..0} \leftarrow ( sign = 0 ? shift_{5..0} : -shift_{5..0} )
if ( shift_{5..0} = 0 ) then
    temp_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} )
else
    if ( sign = 0 ) then
        temp_{63..0} \leftarrow 0^{shift} || (( HI[ac]_{31..0} || LO[ac]_{31..0} ) >> shift )
    else
        temp_{63..0} \leftarrow (( HI[ac]_{31..0} || LO[ac]_{31..0} ) << shift ) || 0^{shift}
    endif
endif
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow temp_{63..32} || temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHILOV   | Variable Shift of Accumulator Value Leaving the Result in the Same Accumulator |        |        |        |        |       |      |      |      |
|----------|----------------|--------|--------|--------|--------|-------|------|------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| Encoding | P32A<br>001000 | x      | rs     | ac     | 01     | 001   | 001  | 111  | 111  |
| Width    | 6              | 5      | 5      | 2      | 2      | 3     | 3    | 3    | 3    |
| Format   | SHILOV ac, rs  |        |        |        |        |       |      |      | DSP  |

Purpose: Variable Shift of Accumulator Value Leaving the Result in the Same Accumulator

Shift the *HI/LO* paired value in an accumulator either left or right by the amount specified in a GPR, leaving the result in the same accumulator.

**Description:** $ac \leftarrow (GPR[rs]_{5..0} >= 0)\ ?\ (ac >> GPR[rs]_{5..0}) : (ac << -GPR[rs]_{5..0})$

The *HI/LO* register pair is treated as a single 64-bit accumulator that is shifted logically by *shift* bits, with the result of the shift written back to the source accumulator. The *shift* argument is provided by the six least-significant bits of register *rs*; the remaining bits of *rs* are ignored. The *shift* argument is interpreted as a six-bit signed integer: a positive argument results in a right shift of up to 31 bits, and a negative argument results in a left shift of up to 32 bits.

The value of *ac* can range from 0 to 3. When *ac*=0, this refers to the original *HI/LO* register pair of the MIPS32 architecture.
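The following Python sketch mirrors the sign-and-magnitude steps of the Operation pseudocode for the variable shift. The name `shilov` is hypothetical; `rs_value` stands for the full 32-bit contents of *rs*, of which only bits 5..0 are used.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shilov(hi, lo, rs_value):
    """Model of SHILOV: returns the new (HI, LO) pair."""
    field = rs_value & 0x3F          # upper bits of rs are ignored
    sign = field >> 5
    # Six-bit negation gives the left-shift magnitude (1..32).
    amount = field if sign == 0 else (-field) & 0x3F
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    if amount != 0:
        if sign == 0:
            acc >>= amount
        else:
            acc = (acc << amount) & MASK64
    return (acc >> 32) & MASK32, acc & MASK32
```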

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
sign \leftarrow GPR[rs]_5
shift_{5..0} \leftarrow ( sign = 0 ? GPR[rs]_{5..0} : -GPR[rs]_{5..0} )
if ( shift_{5..0} = 0 ) then
    temp_{63..0} \leftarrow ( HI[ac]_{31..0} || LO[ac]_{31..0} )
else
    if ( sign = 0 ) then
        temp_{63..0} \leftarrow 0^{shift} || (( HI[ac]_{31..0} || LO[ac]_{31..0} ) >> shift )
    else
        temp_{63..0} \leftarrow (( HI[ac]_{31..0} || LO[ac]_{31..0} ) << shift ) || 0^{shift}
    endif
endif
( HI[ac]_{31..0} || LO[ac]_{31..0} ) \leftarrow temp_{63..32} || temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLL[_S].PH | Shift Left Logical Vector Pair Halfwords |        |        |        |        |         |      |
|-------------|----------------|--------|--------|--------|--------|---------|------|
| Bits        | 31..26         | 25..21 | 20..16 | 15..12 | 11..10 | 9..3    | 2..0 |
| SHLL.PH     | P32A<br>001000 | rt     | rs     | sa     | 00     | 1110110 | 101  |
| SHLL_S.PH   | P32A<br>001000 | rt     | rs     | sa     | 10     | 1110110 | 101  |
| Width       | 6              | 5      | 5      | 4      | 2      | 7       | 3    |
| Format      | SHLL.PH rt, rs, sa<br>SHLL_S.PH rt, rs, sa | | | | |   | DSP<br>DSP |

Purpose: Shift Left Logical Vector Pair Halfwords

Element-wise shift of two independent halfwords in a vector data type by a fixed number of bits, with optional saturation.

**Description:** $rt \leftarrow sat16(rs_{31..16} << sa) || sat16(rs_{15..0} << sa)$

The two halfword values in register *rs* are each independently shifted left by *sa* bits, inserting zeros into the least-significant bit positions emptied by the shift. In the saturating version of the instruction, if the shift results in an overflow the intermediate result is saturated to either the maximum positive or the minimum negative 16-bit value, depending on the sign of the original unshifted value. The two independent results are then written to the corresponding halfword elements of destination register *rt*.

This instruction writes a 1 to bit 22 in the *DSPControl* register in the *ouflag* field if any of the left shift operations results in an overflow or saturation.
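The per-element behavior, including the overflow test on the discarded bits, can be modeled in Python roughly as follows. The names `shift16_left` and `shll_ph` are hypothetical; the boolean in the returned pair stands in for the write of 1 to bit 22 of *DSPControl*.

```python
def shift16_left(a, s, saturate):
    """One halfword element; returns (result, overflow)."""
    a &= 0xFFFF
    s &= 0xF
    if s == 0:
        return a, False
    sign = a >> 15
    temp = (a << s) & 0xFFFF
    # discard = sign replicated (16 - s) times, followed by bits 14..15-s of a.
    upper = ((0xFFFF if sign else 0) << s) & 0xFFFF
    lower = (a >> (15 - s)) & ((1 << s) - 1)
    discard = upper | lower
    overflow = discard not in (0x0000, 0xFFFF)
    if overflow and saturate:
        temp = 0x8000 if sign else 0x7FFF
    return temp, overflow

def shll_ph(rs_value, sa, saturate=False):
    """Model of SHLL.PH (saturate=False) and SHLL_S.PH (saturate=True)."""
    hi, ov_hi = shift16_left(rs_value >> 16, sa, saturate)
    lo, ov_lo = shift16_left(rs_value, sa, saturate)
    return (hi << 16) | lo, ov_hi or ov_lo
```

Shifting 0x4000 left by one bit changes its sign, so the non-saturating form returns 0x8000 for that element while the saturating form returns 0x7FFF; both report overflow.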

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SHLL.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow shift16Left( GPR[rs]_{31..16}, sa )
    tempA_{15..0} \leftarrow shift16Left( GPR[rs]_{15..0}, sa )
    GPR[rt]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SHLL_S.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow sat16ShiftLeft( GPR[rs]_{31..16}, sa )
    tempA_{15..0} \leftarrow sat16ShiftLeft( GPR[rs]_{15..0}, sa )
    GPR[rt]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

function shift16Left( a_{15..0}, s_{3..0} )
    if ( s_{3..0} = 0 ) then
        temp_{15..0} \leftarrow a_{15..0}
    else
        sign \leftarrow a_{15}
        temp_{15..0} \leftarrow ( a_{15-s..0} || 0^s )
        discard_{15..0} \leftarrow ( sign^{(16-s)} || a_{14..14-(s-1)} )
        if (( discard_{15..0} \neq 0x0000 ) and ( discard_{15..0} \neq 0xFFFF )) then
            DSPControl_{ouflag:22} \leftarrow 1
        endif
    endif
    return temp_{15..0}
endfunction shift16Left

function sat16ShiftLeft( a_{15..0}, s_{3..0} )
    if ( s_{3..0} = 0 ) then
        temp_{15..0} \leftarrow a_{15..0}
    else
        sign \leftarrow a_{15}
        temp_{15..0} \leftarrow ( a_{15-s..0} || 0^s )
        discard_{15..0} \leftarrow ( sign^{(16-s)} || a_{14..14-(s-1)} )
        if (( discard_{15..0} \neq 0x0000 ) and ( discard_{15..0} \neq 0xFFFF )) then
            temp_{15..0} \leftarrow ( sign = 0 ? 0x7FFF : 0x8000 )
            DSPControl_{ouflag:22} \leftarrow 1
        endif
    endif
    return temp_{15..0}
endfunction sat16ShiftLeft
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLL.QB  | Shift Left Logical Vector Quad Bytes |        |        |        |    |       |      |      |      |
|----------|----------------|--------|--------|--------|----|-------|------|------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
| Encoding | P32A<br>001000 | rt     | rs     | sa     | 0  | 100   | 001  | 111  | 111  |
| Width    | 6              | 5      | 5      | 3      | 1  | 3     | 3    | 3    | 3    |
| Format   | SHLL.QB rt, rs, sa |    |        |        |    |       |      |      | DSP  |

Purpose: Shift Left Logical Vector Quad Bytes

Element-wise left shift of four independent bytes in a vector data type by a fixed number of bits.

**Description:** $rt \leftarrow (rs_{31..24} << sa) || (rs_{23..16} << sa) || (rs_{15..8} << sa) || (rs_{7..0} << sa)$

The four byte values in register *rs* are each independently shifted left by *sa* bits and the *sa* least significant bits of each value are set to zero. The four independent results are then written to the corresponding byte elements of destination register *rt*.

This instruction writes a 1 to bit 22 in the *DSPControl* register in the *ouflag* field if any of the left shift operations results in an overflow.
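A Python sketch of the element-wise byte shift is given below. The name `shll_qb` is hypothetical. As an assumption, overflow is taken to mean that at least one nonzero bit is shifted out of a byte, which matches the `discard ≠ 0x00` test in the Operation pseudocode; the returned boolean models the write to bit 22 of *DSPControl*.

```python
def shll_qb(rs_value, sa):
    """Model of SHLL.QB: returns (result, overflow)."""
    sa &= 0x7
    result = 0
    overflow = False
    for i in range(4):
        byte = (rs_value >> (8 * i)) & 0xFF
        shifted = byte << sa
        result |= (shifted & 0xFF) << (8 * i)
        # Assumption: any nonzero bit pushed past bit 7 counts as overflow.
        if shifted >> 8:
            overflow = True
    return result, overflow
```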

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempD_{7..0} \leftarrow shift8Left( GPR[rs]_{31..24}, sa_{2..0} )
tempC_{7..0} \leftarrow shift8Left( GPR[rs]_{23..16}, sa_{2..0} )
tempB_{7..0} \leftarrow shift8Left( GPR[rs]_{15..8}, sa_{2..0} )
tempA_{7..0} \leftarrow shift8Left( GPR[rs]_{7..0}, sa_{2..0} )
GPR[rt]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function shift8Left( a_{7..0}, s_{2..0} )
    if ( s_{2..0} = 0 ) then
        temp_{7..0} \leftarrow a_{7..0}
    else
        sign \leftarrow a_7
        if ( discard_{7..0} \neq 0x00 ) then
            DSPControl_{ouflag:22} \leftarrow 1
        endif
    endif
    return temp_{7..0}
endfunction shift8Left
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLLV[_S].PH | Shift Left Logical Variable Vector Pair Halfwords |        |        |        |    |         |      |
|--------------|----------------|--------|--------|--------|----|---------|------|
| Bits         | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHLLV.PH     | P32A<br>001000 | rt     | rs     | rd     | 0  | 1110001 | 101  |
| SHLLV_S.PH   | P32A<br>001000 | rt     | rs     | rd     | 1  | 1110001 | 101  |
| Width        | 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format       | SHLLV.PH rd, rt, rs<br>SHLLV_S.PH rd, rt, rs | | | | |  | DSP<br>DSP |

Purpose: Shift Left Logical Variable Vector Pair Halfwords

Element-wise left shift of the two right-most independent halfwords in a vector data type by a variable number of bits, with optional saturation.

**Description:**  $rd \leftarrow sat16(rt_{31..16} << rs_{3..0}) || sat16(rt_{15..0} << rs_{3..0})$ 

The two halfword values in register *rt* are each independently shifted left by *shift* bits, inserting zeros into the least-significant bit positions emptied by the shift. In the saturating version of the instruction, if the shift results in an overflow the intermediate result is saturated to either the maximum positive or the minimum negative 16-bit value, depending on the sign of the original unshifted value. The two independent results are then written to the corresponding halfword elements of destination register *rd*.

The four least-significant bits of *rs* provide the shift value, interpreted as a four-bit unsigned integer; the remaining bits of *rs* are ignored.

This instruction writes a 1 to bit 22 in the *DSPControl* register in the ouflag field if any of the left shift operations results in an overflow or saturation.
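As a rough model of the variable-shift form, the Python sketch below takes the shift amount from the four least-significant bits of *rs*. The name `shllv_ph` is hypothetical, and overflow is detected with a signed range check rather than the discard-bit comparison of the pseudocode; the two are intended to be equivalent. The returned boolean stands for the write to bit 22 of *DSPControl*.

```python
def shllv_ph(rt_value, rs_value, saturate=False):
    """Model of SHLLV.PH / SHLLV_S.PH: returns (result, overflow)."""
    shift = rs_value & 0xF           # remaining bits of rs are ignored
    result = 0
    overflow = False
    for i in range(2):
        half = (rt_value >> (16 * i)) & 0xFFFF
        value = half - 0x10000 if half & 0x8000 else half
        shifted = value << shift
        if not -0x8000 <= shifted <= 0x7FFF:
            overflow = True
            if saturate:
                shifted = 0x7FFF if value >= 0 else -0x8000
        result |= (shifted & 0xFFFF) << (16 * i)
    return result, overflow
```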

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SHLLV.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow shift16Left( GPR[rt]_{31..16}, GPR[rs]_{3..0} )
    tempA_{15..0} \leftarrow shift16Left( GPR[rt]_{15..0}, GPR[rs]_{3..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SHLLV_S.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow sat16ShiftLeft( GPR[rt]_{31..16}, GPR[rs]_{3..0} )
    tempA_{15..0} \leftarrow sat16ShiftLeft( GPR[rt]_{15..0}, GPR[rs]_{3..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLLV.QB | Shift Left Logical Variable Vector Quad Bytes |        |        |        |    |         |      |
|----------|----------------|--------|--------|--------|----|---------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| Encoding | P32A<br>001000 | rt     | rs     | rd     | x  | 1110010 | 101  |
| Width    | 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format   | SHLLV.QB rd, rt, rs |   |        |        |    |         | DSP  |

**Purpose:** Shift Left Logical Variable Vector Quad Bytes

Element-wise left shift of four independent bytes in a vector data type by a variable number of bits.

**Description:**  $rd \leftarrow (rt_{31..24} << rs_{2..0}) || (rt_{23..16} << rs_{2..0}) || (rt_{15..8} << rs_{2..0}) || (rt_{7..0} << rs_{2..0})$ 

The four byte values in register *rt* are each independently shifted left by *shift* bits, inserting zeros into the least-significant bit positions emptied by the shift. The four independent results are then written to the corresponding byte elements of destination register *rd*.

The three least-significant bits of *rs* provide the shift value, interpreted as a three-bit unsigned integer; the remaining bits of *rs* are ignored.

This instruction writes a 1 to bit 22 in the *DSPControl* register in the *ouflag* field if any of the left shift operations results in an overflow.
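One way to model this instruction in Python without a per-byte loop is to shift the whole word and mask off the bits that crossed a byte boundary. The name `shllv_qb` is hypothetical, and the overflow condition (any nonzero bit shifted out of a byte) is an assumption about how overflow is detected; the returned boolean models the write to bit 22 of *DSPControl*.

```python
def shllv_qb(rt_value, rs_value):
    """Model of SHLLV.QB: returns (result, overflow)."""
    shift = rs_value & 0x7                      # remaining bits of rs ignored
    keep = ((0xFF << shift) & 0xFF) * 0x01010101
    result = (rt_value << shift) & keep         # drop bits that crossed bytes
    # Top `shift` bits of every byte are the ones shifted out.
    lost = ((0xFF00 >> shift) & 0xFF) * 0x01010101
    overflow = (rt_value & lost) != 0           # assumption, see lead-in
    return result, overflow
```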

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempD_{7..0} \leftarrow shift8Left( GPR[rt]_{31..24}, GPR[rs]_{2..0} )
tempC_{7..0} \leftarrow shift8Left( GPR[rt]_{23..16}, GPR[rs]_{2..0} )
tempB_{7..0} \leftarrow shift8Left( GPR[rt]_{15..8}, GPR[rs]_{2..0} )
tempA_{7..0} \leftarrow shift8Left( GPR[rt]_{7..0}, GPR[rs]_{2..0} )
GPR[rd]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLLV_S.W | Shift Left Logical Variable Vector Word |        |        |        |    |         |      |
|-----------|----------------|--------|--------|--------|----|---------|------|
| Bits      | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| Encoding  | P32A<br>001000 | rt     | rs     | rd     | x  | 1111010 | 101  |
| Width     | 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format    | SHLLV_S.W rd, rt, rs |  |        |        |    |         | DSP  |

Purpose: Shift Left Logical Variable Vector Word

A left shift of the word in a vector data type by a variable number of bits, with saturation.

**Description:**  $rd \leftarrow sat32(rt_{31..0} << rs_{4..0})$ 

The word element in register *rt* is shifted left by *shift* bits, inserting zeros into the least-significant bit positions emptied by the shift. If the shift results in an overflow the intermediate result is saturated to either the maximum positive or the minimum negative 32-bit value, depending on the sign of the original unshifted value.

The shifted result is then written to destination register rd.

The five least-significant bits of *rs* are used as the shift value, interpreted as a five-bit unsigned integer; the remaining bits of *rs* are ignored.

This instruction writes a 1 to bit 22 in the *DSPControl* register in the *ouflag* field if the left shift operation results in an overflow or saturation.
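The saturating behavior can be sketched in Python by treating the word as a signed integer and clamping the shifted value. The name `shllv_s_w` is hypothetical, and the signed range check is an alternative formulation chosen for brevity, not the discard-bit test used by the pseudocode; the returned boolean models the write to bit 22 of *DSPControl*.

```python
def shllv_s_w(rt_value, rs_value):
    """Model of SHLLV_S.W: returns (result, overflow)."""
    shift = rs_value & 0x1F          # remaining bits of rs are ignored
    word = rt_value & 0xFFFFFFFF
    value = word - (1 << 32) if word & 0x80000000 else word
    shifted = value << shift
    overflow = not -(1 << 31) <= shifted <= (1 << 31) - 1
    if overflow:
        # Saturate according to the sign of the unshifted value.
        shifted = (1 << 31) - 1 if value >= 0 else -(1 << 31)
    return shifted & 0xFFFFFFFF, overflow
```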

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} \leftarrow sat32ShiftLeft( GPR[rt]_{31..0}, GPR[rs]_{4..0} )
GPR[rd]_{31..0} \leftarrow temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHLL_S.W | Shift Left Logical Word with Saturation |        |        |        |    |         |      |
|----------|----------------|--------|--------|--------|----|---------|------|
| Bits     | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| Encoding | P32A<br>001000 | rt     | rs     | sa     | x  | 1111110 | 101  |
| Width    | 6              | 5      | 5      | 5      | 1  | 7       | 3    |
| Format   | SHLL_S.W rt, rs, sa |   |        |        |    |         | DSP  |

Purpose: Shift Left Logical Word with Saturation

To execute a left shift of a word with saturation by a fixed number of bits.

**Description:** rt  $\leftarrow$  sat32(rs << sa)

The 32-bit word in register rs is shifted left by sa bits, with zeros inserted into the bit positions emptied by the shift. If the shift results in a signed overflow, the shifted result is saturated to either the maximum positive (hexadecimal 0x7FFFFFFF) or minimum negative (hexadecimal 0x80000000) 32-bit value, depending on the sign of the original unshifted value. The shifted result is then written to destination register rt.

The instruction's sa field specifies the shift value, interpreted as a five-bit unsigned integer.

If the shift operation results in an overflow and saturation, this instruction writes a 1 to bit 22 of the *DSPControl* register within the *ouflag* field.
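A bit-level Python sketch of the saturating word shift follows. The name `shll_s_w` is hypothetical. The overflow test assumes that a signed overflow occurs exactly when the original sign bit and the *sa* bits below it are not all equal; the returned boolean models the write to bit 22 of *DSPControl*.

```python
def shll_s_w(rs_value, sa):
    """Model of SHLL_S.W: returns (result, overflow)."""
    a = rs_value & 0xFFFFFFFF
    s = sa & 0x1F
    if s == 0:
        return a, False
    sign = a >> 31
    temp = (a << s) & 0xFFFFFFFF
    # Bits 31..31-s: the old sign bit plus every bit that passes through it.
    top = a >> (31 - s)
    overflow = top not in (0, (1 << (s + 1)) - 1)
    if overflow:
        temp = 0x80000000 if sign else 0x7FFFFFFF
    return temp, overflow
```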

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} \leftarrow sat32ShiftLeft( GPR[rs]_{31..0}, sa_{4..0} )
GPR[rt]_{31..0} \leftarrow temp_{31..0}

function sat32ShiftLeft( a_{31..0}, s_{4..0} )
    if ( s = 0 ) then
        temp_{31..0} \leftarrow a
    else
        sign \leftarrow a_{31}
        temp_{31..0} \leftarrow ( a_{31-s..0} || 0^s )
        discard_{31..0} \leftarrow ( sign^{(32-s)} || a_{30..30-(s-1)} )
        if (( discard_{31..0} \neq 0x00000000 ) and ( discard_{31..0} \neq 0xFFFFFFFF )) then
            temp_{31..0} \leftarrow ( sign = 0 ? 0x7FFFFFFF : 0x80000000 )
            DSPControl_{ouflag:22} \leftarrow 1
        endif
    endif
    return temp_{31..0}
endfunction sat32ShiftLeft
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

To do a logical left shift of a word in a register without saturation, use the MIPS32 SLL instruction.


| SHRA[_R].QB | Shift Right Arithmetic Vector of Four Bytes |  |  |  |  |  |  |  |  |
|-------------|--------|--------|--------|--------|----|-------|------|------|------|
| Bits        | 31..26 | 25..21 | 20..16 | 15..13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
| SHRA.QB     | P32A<br>001000 | rt | rs | sa | 0 | 000 | 111 | 111 | 111 |
| SHRA_R.QB   | P32A<br>001000 | rt | rs | sa | 1 | 000 | 111 | 111 | 111 |
| Width       | 6 | 5 | 5 | 3 | 1 | 3 | 3 | 3 | 3 |
| Format      | SHRA.QB rt, rs, sa |  |  |  |  |  |  |  | DSP-R2 |
| Format      | SHRA_R.QB rt, rs, sa |  |  |  |  |  |  |  | DSP-R2 |

Purpose: Shift Right Arithmetic Vector of Four Bytes

To execute an arithmetic right shift on four independent bytes by a fixed number of bits.

**Description:**  $rt \leftarrow round(rs_{31..24} >> sa) || round(rs_{23..16} >> sa) || round(rs_{15..8} >> sa) || round(rs_{7..0} >> sa)$ 

The four byte elements in register *rs* are each shifted right arithmetically by *sa* bits, then written to the corresponding vector elements in destination register *rt*. The *sa* argument is interpreted as an unsigned three-bit integer taking values from zero to seven.

In the rounding variant of the instruction, a value of 1 is added at the most significant discarded bit position of each result prior to writing the rounded result to the destination register.
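As an illustration of the rounding rule, the sketch below models one plausible reading of both variants in Python. The names `shra_qb` and `rounding` are hypothetical, and adding `1 << (sa - 1)` before an arithmetic shift is assumed to be equivalent to adding 1 at the most-significant discarded bit position.

```python
def shra_qb(rs: int, sa: int, rounding: bool = False) -> int:
    """Arithmetic right shift of the four bytes packed in rs by sa (0..7)."""
    sa &= 0x7  # three-bit unsigned shift amount
    out = 0
    for lane in range(4):
        byte = (rs >> (8 * lane)) & 0xFF
        # Reinterpret the byte as a signed value in -128..127.
        signed = byte - 0x100 if byte & 0x80 else byte
        if rounding and sa > 0:
            # Add 1 at the most-significant discarded bit, then shift.
            signed += 1 << (sa - 1)
        out |= ((signed >> sa) & 0xFF) << (8 * lane)
    return out
```

With `sa = 1`, the byte 0xFD (-3) becomes 0xFE (-2) without rounding and 0xFF (-1) with rounding, so exact halves round toward positive infinity in this model.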

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SHRA.QB
    ValidateAccessToDSP2Resources()
    tempD_{7..0} \leftarrow ( (GPR[rs]_{31})^{sa} || GPR[rs]_{31..24+sa} )
    tempC_{7..0} \leftarrow ( (GPR[rs]_{23})^{sa} || GPR[rs]_{23..16+sa} )
    tempB_{7..0} \leftarrow ( (GPR[rs]_{15})^{sa} || GPR[rs]_{15..8+sa} )
    tempA_{7..0} \leftarrow ( (GPR[rs]_{7})^{sa} || GPR[rs]_{7..sa} )
    GPR[rt]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

SHRA_R.QB
    ValidateAccessToDSP2Resources()
    if ( sa_{2..0} = 0 ) then
        tempD_{8..0} \leftarrow ( GPR[rs]_{31..24} || 0 )
        tempC_{8..0} \leftarrow ( GPR[rs]_{23..16} || 0 )
        tempB_{8..0} \leftarrow ( GPR[rs]_{15..8} || 0 )
        tempA_{8..0} \leftarrow ( GPR[rs]_{7..0} || 0 )
    else
        tempD_{8..0} \leftarrow ( (GPR[rs]_{31})^{sa} || GPR[rs]_{31..24+sa-1} ) + 1
        tempC_{8..0} \leftarrow ( (GPR[rs]_{23})^{sa} || GPR[rs]_{23..16+sa-1} ) + 1
        tempB_{8..0} \leftarrow ( (GPR[rs]_{15})^{sa} || GPR[rs]_{15..8+sa-1} ) + 1
        tempA_{8..0} \leftarrow ( (GPR[rs]_{7})^{sa} || GPR[rs]_{7..sa-1} ) + 1
    endif
    GPR[rt]_{31..0} \leftarrow tempD_{8..1} || tempC_{8..1} || tempB_{8..1} || tempA_{8..1}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHRA[_R].PH | Shift Right Arithmetic Vector Pair Halfwords |  |  |  |  |  |  |  |
|-------------|--------|--------|--------|--------|----|----|---------|------|
| Bits        | 31..26 | 25..21 | 20..16 | 15..12 | 11 | 10 | 9..3    | 2..0 |
| SHRA.PH     | P32A<br>001000 | rt | rs | sa | x | 0 | 1100110 | 101 |
| SHRA_R.PH   | P32A<br>001000 | rt | rs | sa | x | 1 | 1100110 | 101 |
| Width       | 6 | 5 | 5 | 4 | 1 | 1 | 7 | 3 |
| Format      | SHRA.PH rt, rs, sa |  |  |  |  |  |  | DSP |
| Format      | SHRA_R.PH rt, rs, sa |  |  |  |  |  |  | DSP |

Purpose: Shift Right Arithmetic Vector Pair Halfwords

Element-wise arithmetic right-shift of two independent halfwords in a vector data type by a fixed number of bits, with optional rounding.

**Description:** $rt \leftarrow rnd16(rs_{31..16} >> sa) || rnd16(rs_{15..0} >> sa)$

The two halfword values in register *rs* are each independently shifted right by *sa* bits, with each value's original sign bit duplicated into the *sa* most-significant bits emptied by the shift.

In the non-rounding variant of this instruction, the two independent results are then written to the corresponding halfword elements of destination register *rt*.

In the rounding variant of the instruction, a 1 is added at the most-significant discarded bit position before the results are written to destination register *rt*.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SHRA.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow shift16RightArithmetic( GPR[rs]_{31..16}, sa )
    tempA_{15..0} \leftarrow shift16RightArithmetic( GPR[rs]_{15..0}, sa )
    GPR[rt]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SHRA_R.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow rnd16ShiftRightArithmetic( GPR[rs]_{31..16}, sa )
    tempA_{15..0} \leftarrow rnd16ShiftRightArithmetic( GPR[rs]_{15..0}, sa )
    GPR[rt]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

function shift16RightArithmetic( a_{15..0}, s_{3..0} )
    if ( s_{3..0} = 0 ) then
        temp_{15..0} \leftarrow a_{15..0}
    else
        sign \leftarrow a_{15}
        temp_{15..0} \leftarrow ( sign^{s} || a_{15..s} )
    endif
    return temp_{15..0}
endfunction shift16RightArithmetic

function rnd16ShiftRightArithmetic( a_{15..0}, s_{3..0} )
    if ( s_{3..0} = 0 ) then
        temp_{16..0} \leftarrow ( a_{15..0} || 0 )
    else
        sign \leftarrow a_{15}
        temp_{16..0} \leftarrow ( sign^{s} || a_{15..s-1} )
    endif
    temp_{16..0} \leftarrow temp + 1
    return temp_{16..1}
endfunction rnd16ShiftRightArithmetic
```
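The 17-bit intermediate used by `rnd16ShiftRightArithmetic` can be checked against ordinary arithmetic. The Python sketch below transcribes the pseudocode bit by bit and compares it with a simpler "add half, then shift" formulation. Both function names are illustrative and are not part of the specification.

```python
def rnd16_bitwise(a: int, s: int) -> int:
    """Literal transcription: build the 17-bit temp, add 1, keep bits 16..1."""
    a &= 0xFFFF
    s &= 0xF
    if s == 0:
        temp = a << 1                                # a_{15..0} || 0
    else:
        sign = (a >> 15) & 1
        sign_bits = ((1 << s) - 1) if sign else 0    # sign replicated s times
        kept = a >> (s - 1)                          # a_{15..s-1}, 17 - s bits
        temp = (sign_bits << (17 - s)) | kept
    temp = (temp + 1) & 0x1FFFF                      # stay within 17 bits
    return (temp >> 1) & 0xFFFF                      # temp_{16..1}


def rnd16_arith(a: int, s: int) -> int:
    """Same result via signed arithmetic: add half an LSB, then shift."""
    a &= 0xFFFF
    s &= 0xF
    signed = a - 0x10000 if a & 0x8000 else a
    bias = (1 << (s - 1)) if s else 0
    return ((signed + bias) >> s) & 0xFFFF
```

The extra low-order bit kept in `temp` is what allows a single `+ 1` to act as the rounding increment for every non-zero shift amount.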

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHRAV[_R].PH | Shift Right Arithmetic Variable Vector Pair Halfwords |  |  |  |  |  |  |
|--------------|--------|--------|--------|--------|----|---------|------|
| Bits         | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRAV.PH     | P32A<br>001000 | rt | rs | rd | 0 | 0110001 | 101 |
| SHRAV_R.PH   | P32A<br>001000 | rt | rs | rd | 1 | 0110001 | 101 |
| Width        | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format       | SHRAV.PH rd, rt, rs |  |  |  |  |  | DSP |
| Format       | SHRAV_R.PH rd, rt, rs |  |  |  |  |  | DSP |

Purpose: Shift Right Arithmetic Variable Vector Pair Halfwords

Element-wise arithmetic right shift of two independent halfwords in a vector data type by a variable number of bits, with optional rounding.

**Description:**  $rd \leftarrow rnd16(rt_{31..16} >> rs_{3..0}) || rnd16(rt_{15..0} >> rs_{3..0})$ 

The two halfword values in register *rt* are each independently shifted right, with each value's original sign bit duplicated into the most-significant bits emptied by the shift. In the non-rounding variant of this instruction, the two independent results are then written to the corresponding halfword elements of destination register *rd*.

In the rounding variant of this instruction, a 1 is added at the most-significant discarded bit position before the results are written to destination register *rd*.

The shift amount sa is given by the four least-significant bits of register rs; the remaining bits of rs are ignored.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SHRAV.PH
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow shift16RightArithmetic( GPR[rt]_{31..16}, GPR[rs]_{3..0} )
    tempA_{15..0} \leftarrow shift16RightArithmetic( GPR[rt]_{15..0}, GPR[rs]_{3..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}
```
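A minimal Python sketch of both variants follows. It assumes the rounding variant applies the same `rnd16ShiftRightArithmetic` rule used by SHRA_R.PH, with the shift amount taken from the four least-significant bits of *rs*. The names `shrav_ph` and `_sra16` are hypothetical.

```python
def _sra16(h: int, s: int, rounding: bool) -> int:
    """Arithmetic right shift of one 16-bit lane, optionally rounded."""
    signed = h - 0x10000 if h & 0x8000 else h
    if rounding and s:
        signed += 1 << (s - 1)  # 1 at the most-significant discarded bit
    return (signed >> s) & 0xFFFF


def shrav_ph(rt: int, rs: int, rounding: bool = False) -> int:
    """Model of SHRAV[_R].PH: only rs bits 3..0 select the shift amount."""
    s = rs & 0xF
    hi = _sra16((rt >> 16) & 0xFFFF, s, rounding)
    lo = _sra16(rt & 0xFFFF, s, rounding)
    return (hi << 16) | lo
```

Because the upper bits of *rs* are masked off, `shrav_ph(x, 0xFFFFFFF4)` and `shrav_ph(x, 4)` give the same result in this model.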

#### **Exceptions:**

Reserved Instruction, DSP Disabled


| SHRAV[_R].QB | Shift Right Arithmetic Variable Vector of Four Bytes |  |  |  |  |  |  |
|--------------|--------|--------|--------|--------|----|---------|------|
| Bits         | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRAV.QB     | P32A<br>001000 | rt | rs | rd | 0 | 0111001 | 101 |
| SHRAV_R.QB   | P32A<br>001000 | rt | rs | rd | 1 | 0111001 | 101 |
| Width        | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format       | SHRAV.QB rd, rt, rs |  |  |  |  |  | DSP-R2 |
| Format       | SHRAV_R.QB rd, rt, rs |  |  |  |  |  | DSP-R2 |

Purpose: Shift Right Arithmetic Variable Vector of Four Bytes

To execute an arithmetic right shift on four independent bytes by a variable number of bits.

**Description:**  $rd \leftarrow round(rt_{31..24} >> rs_{2..0}) || round(rt_{23..16} >> rs_{2..0}) || round(rt_{15..8} >> rs_{2..0}) || round(rt_{7..0} >> rs_{2..0})$ 

The four byte elements in register *rt* are each shifted right arithmetically by *sa* bits, then written to the corresponding byte elements in destination register *rd*. The *sa* argument is provided by the three least-significant bits of register *rs*, interpreted as an unsigned three-bit integer taking values from zero to seven. The remaining bits of *rs* are ignored.

In the rounding variant of the instruction, a value of 1 is added at the most significant discarded bit position of each result prior to writing the rounded result to the destination register.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

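```
SHRAV.QB
    ValidateAccessToDSP2Resources()
    sa_{2..0} \leftarrow GPR[rs]_{2..0}
    if ( sa_{2..0} = 0 ) then
        tempD_{7..0} \leftarrow GPR[rt]_{31..24}
        tempC_{7..0} \leftarrow GPR[rt]_{23..16}
        tempB_{7..0} \leftarrow GPR[rt]_{15..8}
        tempA_{7..0} \leftarrow GPR[rt]_{7..0}
    else
        tempD_{7..0} \leftarrow ( (GPR[rt]_{31})^{sa} || GPR[rt]_{31..24+sa} )
        tempC_{7..0} \leftarrow ( (GPR[rt]_{23})^{sa} || GPR[rt]_{23..16+sa} )
        tempB_{7..0} \leftarrow ( (GPR[rt]_{15})^{sa} || GPR[rt]_{15..8+sa} )
        tempA_{7..0} \leftarrow ( (GPR[rt]_{7})^{sa} || GPR[rt]_{7..sa} )
    endif
    GPR[rd]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

SHRAV_R.QB
    ValidateAccessToDSP2Resources()
    sa_{2..0} \leftarrow GPR[rs]_{2..0}
    if ( sa_{2..0} = 0 ) then
        tempD_{8..0} \leftarrow ( GPR[rt]_{31..24} || 0 )
        tempC_{8..0} \leftarrow ( GPR[rt]_{23..16} || 0 )
        tempB_{8..0} \leftarrow ( GPR[rt]_{15..8} || 0 )
        tempA_{8..0} \leftarrow ( GPR[rt]_{7..0} || 0 )
    else
        tempD_{8..0} \leftarrow ( (GPR[rt]_{31})^{sa} || GPR[rt]_{31..24+sa-1} ) + 1
        tempC_{8..0} \leftarrow ( (GPR[rt]_{23})^{sa} || GPR[rt]_{23..16+sa-1} ) + 1
        tempB_{8..0} \leftarrow ( (GPR[rt]_{15})^{sa} || GPR[rt]_{15..8+sa-1} ) + 1
        tempA_{8..0} \leftarrow ( (GPR[rt]_{7})^{sa} || GPR[rt]_{7..sa-1} ) + 1
    endif
    GPR[rd]_{31..0} \leftarrow tempD_{8..1} || tempC_{8..1} || tempB_{8..1} || tempA_{8..1}
```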


#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SHRAV_R.W | Shift Right Arithmetic Variable Word with Rounding |  |  |  |  |  |  |
|-----------|--------|--------|--------|--------|----|---------|------|
| Bits      | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRAV_R.W | P32A<br>001000 | rt | rs | rd | x | 1011010 | 101 |
| Width     | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format    | SHRAV_R.W rd, rt, rs |  |  |  |  |  | DSP |

Purpose: Shift Right Arithmetic Variable Word with Rounding

Arithmetic right shift with rounding of a signed 32-bit word by a variable number of bits.

**Description:**  $rd \leftarrow rnd32(rt_{31..0} >> rs_{4..0})$ 

The word value in register *rt* is shifted right, with the value's original sign bit duplicated into the most-significant bits emptied by the shift. A 1 is then added at the most-significant discarded bit position before the result is written to destination register *rd*.

The shift amount sa is given by the five least-significant bits of register rs; the remaining bits of rs are ignored.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} \leftarrow rnd32ShiftRightArithmetic( GPR[rt]_{31..0}, GPR[rs]_{4..0} )
GPR[rd]_{31..0} \leftarrow temp_{31..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled


| SHRA_R.W | Shift Right Arithmetic Word with Rounding |  |  |  |  |  |  |
|----------|--------|--------|--------|--------|----|---------|------|
| Bits     | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRA_R.W | P32A<br>001000 | rt | rs | sa | x | 1011110 | 101 |
| Width    | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format   | SHRA_R.W rt, rs, sa |  |  |  |  |  |  |

Purpose: Shift Right Arithmetic Word with Rounding

To execute an arithmetic right shift with rounding on a word by a fixed number of bits.

**Description:** $rt \leftarrow rnd32(rs_{31..0} >> sa)$

The word in register *rs* is shifted right by *sa* bits, and the sign bit is duplicated into the *sa* bits emptied by the shift. The shifted result is then rounded by adding a 1 bit to the most-significant discarded bit. The rounded result is then written to the destination register *rt*.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
temp_{31..0} \leftarrow rnd32ShiftRightArithmetic( GPR[rs]_{31..0}, sa_{4..0} )
GPR[rt]_{31..0} \leftarrow temp_{31..0}

function rnd32ShiftRightArithmetic( a_{31..0}, s_{4..0} )
    if ( s_{4..0} = 0 ) then
        temp_{32..0} \leftarrow ( a_{31..0} || 0 )
    else
        sign \leftarrow a_{31}
        temp_{32..0} \leftarrow ( sign^{s} || a_{31..s-1} )
    endif
    temp_{32..0} \leftarrow temp + 1
    return temp_{32..1}
endfunction rnd32ShiftRightArithmetic
```
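The effect of the rounding step can be sketched in Python. This is an illustrative model only: the name `shra_r_w` is an assumption, and the 33-bit `temp` of the pseudocode is replaced by Python's unbounded integers, which absorb the carry from the rounding increment in the same way.

```python
def shra_r_w(rs: int, sa: int) -> int:
    """Rounded arithmetic right shift of a 32-bit register pattern."""
    rs &= 0xFFFFFFFF
    sa &= 0x1F  # five-bit shift amount
    signed = rs - (1 << 32) if rs & 0x80000000 else rs
    if sa:
        # Add 1 at the most-significant discarded bit position.
        signed += 1 << (sa - 1)
    return (signed >> sa) & 0xFFFFFFFF
```

As a comparison, an unrounded arithmetic shift of -5 by one bit gives -3, whereas `shra_r_w(0xFFFFFFFB, 1)` returns `0xFFFFFFFE` (-2).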

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

To do an arithmetic right shift of a word in a register without rounding, use the MIPS32 SRA instruction.


| SHRL.PH | Shift Right Logical Two Halfwords |  |  |  |  |  |  |  |
|---------|--------|--------|--------|--------|-------|------|------|------|
| Bits    | 31..26 | 25..21 | 20..16 | 15..12 | 11..9 | 8..6 | 5..3 | 2..0 |
| SHRL.PH | P32A<br>001000 | rt | rs | sa | 001 | 111 | 111 | 111 |
| Width   | 6 | 5 | 5 | 4 | 3 | 3 | 3 | 3 |

Purpose: Shift Right Logical Two Halfwords

To execute a right shift of two independent halfwords in a vector data type by a fixed number of bits.

**Description:**  $rt \leftarrow (rs_{31..16} >> sa) || (rs_{15..0} >> sa)$ 

The two halfwords in register *rs* are independently logically shifted right, inserting zeros into the bit positions emptied by the shift. The two halfword results are then written to the corresponding halfword elements in destination register *rt*.

The shift amount is provided by the *sa* field, which is interpreted as a four-bit unsigned integer taking values between 0 and 15.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
tempB_{15..0} \leftarrow 0^{sa} || GPR[rs]_{31..sa+16}
tempA_{15..0} \leftarrow 0^{sa} || GPR[rs]_{15..sa}
GPR[rt]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}
```
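The lane independence of this operation is easy to see in a small model. The Python sketch below uses the hypothetical name `shrl_ph` and contrasts the per-halfword shift with a plain 32-bit shift, which would let bits from the upper halfword leak into the lower one.

```python
def shrl_ph(rs: int, sa: int) -> int:
    """Logical right shift of each 16-bit halfword of rs by sa (0..15)."""
    sa &= 0xF
    hi = ((rs >> 16) & 0xFFFF) >> sa  # zeros enter from the top of the lane
    lo = (rs & 0xFFFF) >> sa
    return (hi << 16) | lo
```

For `rs = 0x8001FFFF` and `sa = 4`, this model returns `0x08000FFF`, whereas `0x8001FFFF >> 4` is `0x08001FFF`: the low bit of the upper halfword has crossed the lane boundary in the plain shift.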

#### **Exceptions:**

Reserved Instruction, DSP Disabled


| SHRL.QB | Shift Right Logical Vector Quad Bytes |  |  |  |  |  |  |  |  |
|---------|--------|--------|--------|--------|----|-------|------|------|------|
| Bits    | 31..26 | 25..21 | 20..16 | 15..13 | 12 | 11..9 | 8..6 | 5..3 | 2..0 |
| SHRL.QB | P32A<br>001000 | rt | rs | sa | 1 | 100 | 001 | 111 | 111 |
| Width   | 6 | 5 | 5 | 3 | 1 | 3 | 3 | 3 | 3 |
| Format  | SHRL.QB rt, rs, sa |  |  |  |  |  |  |  | DSP |

Purpose: Shift Right Logical Vector Quad Bytes

Element-wise logical right shift of four independent bytes in a vector data type by a fixed number of bits.

**Description:**  $rt \leftarrow (rs_{31..24} >> sa) || (rs_{23..16} >> sa) || (rs_{15..8} >> sa) || (rs_{7..0} >> sa)$

The four byte values in register *rs* are each independently shifted right by *sa* bits and the *sa* most-significant bits of each value are set to zero. The four independent results are then written to the corresponding byte elements of destination register *rt*.
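A minimal Python sketch of this description follows, assuming *sa* is a three-bit value (0 to 7) as for the other byte-vector shifts. The helper name `shift8_right` loosely mirrors the `shift8Right` helper named in the SHRLV.QB operation; its body here is an assumption, as is the name `shrl_qb`.

```python
def shift8_right(a: int, s: int) -> int:
    """Logical right shift of one byte: s zero bits followed by a_{7..s}."""
    return (a & 0xFF) >> (s & 0x7)


def shrl_qb(rs: int, sa: int) -> int:
    """Apply shift8_right to each of the four byte lanes of rs."""
    out = 0
    for lane in range(4):
        out |= shift8_right(rs >> (8 * lane), sa) << (8 * lane)
    return out
```

For instance, `shrl_qb(0x80FF7F01, 1)` evaluates to `0x407F3F00`: each byte is halved independently and no bits move between lanes.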

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

#### **Exceptions:**

Reserved Instruction, DSP Disabled

#### **Programming Notes:**

To do a logical left shift of a word in a register without saturation, use the MIPS32 SLL instruction.


| SHRLV.PH | Shift Variable Right Logical Pair of Halfwords |  |  |  |  |  |  |
|----------|--------|--------|--------|--------|----|---------|------|
| Bits     | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRLV.PH | P32A<br>001000 | rt | rs | rd | x | 1100010 | 101 |
| Width    | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format   | SHRLV.PH rd, rt, rs |  |  |  |  |  | DSP |

Purpose: Shift Variable Right Logical Pair of Halfwords

To execute a right shift of two independent halfwords in a vector data type by a variable number of bits.

**Description:**  $rd \leftarrow (rt_{31..16} >> rs_{3..0}) || (rt_{15..0} >> rs_{3..0})$ 

The two halfwords in register *rt* are independently logically shifted right, inserting zeros into the bit positions emptied by the shift. The two halfword results are then written to the corresponding halfword elements in destination register *rd*.

The shift amount is provided by the four least-significant bits of register *rs*, which are interpreted as a four-bit unsigned integer taking values between 0 and 15. The remaining bits of *rs* are ignored.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSP2Resources()
sa_{3..0} \leftarrow GPR[rs]_{3..0}
tempB_{15..0} \leftarrow 0^{sa} || GPR[rt]_{31..sa+16}
tempA_{15..0} \leftarrow 0^{sa} || GPR[rt]_{15..sa}
GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}
```

#### **Exceptions:**

Reserved Instruction, DSP Disabled


| SHRLV.QB | Shift Right Logical Variable Vector Quad Bytes |  |  |  |  |  |  |
|----------|--------|--------|--------|--------|----|---------|------|
| Bits     | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SHRLV.QB | P32A<br>001000 | rt | rs | rd | x | 1101010 | 101 |
| Width    | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format   | SHRLV.QB rd, rt, rs |  |  |  |  |  | DSP |

Purpose: Shift Right Logical Variable Vector Quad Bytes

Element-wise logical right shift of four independent bytes in a vector data type by a variable number of bits.

**Description:**  $rd \leftarrow (rt_{31..24} >> rs_{2..0}) || (rt_{23..16} >> rs_{2..0}) || (rt_{15..8} >> rs_{2..0}) || (rt_{7..0} >> rs_{2..0})$ 

The four byte values in register *rt* are each independently shifted right, inserting zeros into the most-significant bit positions emptied by the shift. The four independent results are then written to the corresponding byte elements of destination register *rd*.

The three least-significant bits of *rs* provide the shift value, interpreted as an unsigned integer; the remaining bits of *rs* are ignored.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
tempD_{7..0} \leftarrow shift8Right( GPR[rt]_{31..24}, GPR[rs]_{2..0} )
tempC_{7..0} \leftarrow shift8Right( GPR[rt]_{23..16}, GPR[rs]_{2..0} )
tempB_{7..0} \leftarrow shift8Right( GPR[rt]_{15..8}, GPR[rs]_{2..0} )
tempA_{7..0} \leftarrow shift8Right( GPR[rt]_{7..0}, GPR[rs]_{2..0} )
GPR[rd]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}
```
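On a host without this instruction, the same result can be obtained without a per-lane loop. The Python sketch below is one possible emulation, assuming `shift8Right` is a plain per-byte logical shift: it shifts the whole word once and then masks off the bits that crossed a byte boundary. The name `shrlv_qb` is hypothetical.

```python
def shrlv_qb(rt: int, rs: int) -> int:
    """Per-byte logical right shift of rt by the low three bits of rs."""
    s = rs & 0x7                            # remaining bits of rs are ignored
    lane_mask = (0xFF >> s) * 0x01010101    # keep the low 8 - s bits of each byte
    return ((rt & 0xFFFFFFFF) >> s) & lane_mask
```

With `s = 1` the mask is `0x7F7F7F7F`, which clears the bit that each byte would otherwise inherit from its more-significant neighbour.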

#### **Exceptions:**

Reserved Instruction, DSP Disabled


| SUBQ[_S].PH | Subtract Fractional Halfword Vector |  |  |  |  |  |  |
|-------------|--------|--------|--------|--------|----|---------|------|
| Bits        | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SUBQ.PH     | P32A<br>001000 | rt | rs | rd | 0 | 1000001 | 101 |
| SUBQ_S.PH   | P32A<br>001000 | rt | rs | rd | 1 | 1000001 | 101 |
| Width       | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format      | SUBQ.PH rd, rs, rt |  |  |  |  |  | DSP |
| Format      | SUBQ_S.PH rd, rs, rt |  |  |  |  |  | DSP |

Purpose: Subtract Fractional Halfword Vector

Element-wise subtraction of one vector of Q15 fractional halfword values from another to produce a vector of Q15 fractional halfword results, with optional saturation.

**Description:** $rd \leftarrow sat16(rs_{31..16} - rt_{31..16}) || sat16(rs_{15..0} - rt_{15..0})$

The two fractional halfwords in register *rt* are subtracted from the corresponding fractional halfword elements in register *rs*.

For the non-saturating version of this instruction, each result is written to the corresponding element in register *rd*. In the case of overflow or underflow, the wrapped result (the difference modulo $2^{16}$) is written to register *rd*.

For the saturating version of the instruction, the subtraction is performed using signed saturating arithmetic. If the operation results in an overflow or an underflow, the result is clamped to either the largest representable value (0x7FFF hexadecimal) or the smallest representable value (0x8000 hexadecimal), respectively, before being written to the destination register *rd*.

For both instructions, if any of the individual subtractions result in underflow, overflow, or saturation, a 1 is written to bit 20 in the *DSPControl* register within the *ouflag* field.

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SUBQ.PH:
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow subtract16( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow subtract16( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SUBQ_S.PH:
    ValidateAccessToDSPResources()
    tempB_{15..0} \leftarrow sat16Subtract( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow sat16Subtract( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

function subtract16( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( a_{15} || a_{15..0} ) - ( b_{15} || b_{15..0} )
    if ( temp_{16} \neq temp_{15} ) then
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{15..0}
endfunction subtract16

function sat16Subtract( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( a_{15} || a_{15..0} ) - ( b_{15} || b_{15..0} )
    if ( temp_{16} \neq temp_{15} ) then
        if ( temp_{16} = 0 ) then
            temp \leftarrow 0x7FFF
        else
            temp \leftarrow 0x8000
        endif
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{15..0}
endfunction sat16Subtract
```
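The difference between the wrapping and saturating variants can be illustrated with a short Python model. The names `subq_ph` and `_sub16` are hypothetical, and the second element of the returned tuple stands in for bit 20 of the *DSPControl* *ouflag* field.

```python
def _sub16(a: int, b: int, saturate: bool) -> tuple:
    """Subtract two Q15 lanes; return (16-bit result, overflow flag)."""
    sa = a - 0x10000 if a & 0x8000 else a
    sb = b - 0x10000 if b & 0x8000 else b
    diff = sa - sb
    overflow = not (-0x8000 <= diff <= 0x7FFF)
    if overflow and saturate:
        diff = 0x7FFF if diff > 0 else -0x8000  # clamp to the Q15 range
    return diff & 0xFFFF, overflow              # wraps when not saturating


def subq_ph(rs: int, rt: int, saturate: bool = False) -> tuple:
    """Model of SUBQ.PH (saturate=False) and SUBQ_S.PH (saturate=True)."""
    hi, of_hi = _sub16((rs >> 16) & 0xFFFF, (rt >> 16) & 0xFFFF, saturate)
    lo, of_lo = _sub16(rs & 0xFFFF, rt & 0xFFFF, saturate)
    return (hi << 16) | lo, (of_hi or of_lo)
```

With `rs = 0x7FFF8000` and `rt = 0x80007FFF`, both lanes overflow: the wrapping form returns `0xFFFF0001` and the saturating form returns `0x7FFF8000`, and the flag is raised in both cases.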

#### **Exceptions:**

Reserved Instruction, DSP Disabled

| SUBQ_S.W | Subtract Fractional Word |  |  |  |  |  |  |
|----------|--------|--------|--------|--------|----|---------|------|
| Bits     | 31..26 | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
| SUBQ_S.W | P32A<br>001000 | rt | rs | rd | x | 1101000 | 101 |
| Width    | 6 | 5 | 5 | 5 | 1 | 7 | 3 |
| Format   | SUBQ_S.W rd, rs, rt |  |  |  |  |  | DSP |

Purpose: Subtract Fractional Word

One Q31 fractional word is subtracted from another to produce a Q31 fractional result, with saturation.

**Description:**  $rd \leftarrow sat32(rs_{31..0} - rt_{31..0})$ 

The Q31 fractional word in register *rt* is subtracted from the corresponding fractional word in register *rs*, and the 32-bit result is written to destination register *rd*. The subtraction is performed using signed saturating arithmetic. If the operation results in an overflow or an underflow, the result is clamped to either the largest representable value (0x7FFFFFFF hexadecimal) or the smallest representable value (0x80000000 hexadecimal), respectively, before being sign-extended and written to the destination register *rd*.

If the subtraction results in underflow, overflow, or saturation, a 1 is written to bit 20 in the *DSPControl* register within the *ouflag* field.
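As a rough illustration of this behavior, the Python sketch below models the saturating 32-bit subtraction and converts Q31 bit patterns to real values. The names `subq_s_w` and `q31_to_float` are hypothetical, and the returned flag stands in for bit 20 of the *DSPControl* *ouflag* field.

```python
def _signed32(w: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement value."""
    w &= 0xFFFFFFFF
    return w - (1 << 32) if w & 0x80000000 else w


def subq_s_w(rs: int, rt: int) -> tuple:
    """Return (rd, overflow flag) for the saturating Q31 difference rs - rt."""
    diff = _signed32(rs) - _signed32(rt)
    if diff > 0x7FFFFFFF:
        return 0x7FFFFFFF, True    # clamp to the largest Q31 value
    if diff < -0x80000000:
        return 0x80000000, True    # clamp to the smallest Q31 value
    return diff & 0xFFFFFFFF, False


def q31_to_float(w: int) -> float:
    """Interpret a 32-bit pattern as a Q31 fraction in [-1.0, 1.0)."""
    return _signed32(w) / 2**31
```

For example, subtracting 0.25 (`0x20000000`) from 0.5 (`0x40000000`) gives `0x20000000` with no flag, while `subq_s_w(0x7FFFFFFF, 0x80000000)` clamps to `0x7FFFFFFF` and reports overflow.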

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

#### **Exceptions:**

**SUBQH[\_R].PH** Subtract Fractional Halfword Vectors And Shift Right to Halve Results

| Bits       | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|------------|----------------|--------|--------|--------|----|---------|------|
| SUBQH.PH   | P32A<br>001000 | rt     | rs     | rd     | 0  | 1001001 | 101  |
| SUBQH_R.PH | P32A<br>001000 | rt     | rs     | rd     | 1  | 1001001 | 101  |
| Width      | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: SUBQH[\_R].PH

- SUBQH.PH rd, rs, rt (DSP-R2)
- SUBQH\_R.PH rd, rs, rt (DSP-R2)

Purpose: Subtract Fractional Halfword Vectors And Shift Right to Halve Results

Element-wise fractional subtraction of halfword vectors, with a right shift by one bit to halve each result, with optional rounding.

**Description:**  $rd \leftarrow round((rs_{31..16} - rt_{31..16}) >> 1) || round((rs_{15..0} - rt_{15..0}) >> 1)$ 

Each element from the two halfword values in register *rt* is subtracted from the corresponding halfword element in register *rs* to create an interim 17-bit result.

In the non-rounding instruction variant, each interim result is then shifted right by one bit before being written to the corresponding halfword element of destination register *rd*.

In the rounding version of the instruction, a value of 1 is added at the least-significant bit position of each interim result; the interim result is then right-shifted by one bit and written to the destination register.

This instruction does not modify the DSPControl register.
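As a hedged illustration of the halving subtraction described above, the following C sketch models one halfword lane with a sign-extended interim difference and then keeps bits 16..1. The names `sext16`, `subqh_lane`, and `subqh_ph`, and the `round` flag that selects the rounding variant, are assumptions made for this example.

```c
#include <stdint.h>

/* Portable sign extension of a 16-bit pattern (illustrative helper). */
static int32_t sext16(uint16_t v)
{
    return (int32_t)v - ((v & 0x8000u) ? 0x10000 : 0);
}

/* One lane: 17-bit interim difference, optional +1, then bits 16..1. */
static uint16_t subqh_lane(uint16_t a, uint16_t b, int round)
{
    int32_t temp = sext16(a) - sext16(b);   /* fits in 17 bits */
    if (round) {
        temp += 1;                          /* add 1 at the LSB position */
    }
    /* Bits 16..1 of the two's complement interim value. */
    return (uint16_t)(((uint32_t)temp >> 1) & 0xFFFFu);
}

/* Both lanes of rd = halve(rs.hi - rt.hi) || halve(rs.lo - rt.lo). */
static uint32_t subqh_ph(uint32_t rs, uint32_t rt, int round)
{
    uint16_t hi = subqh_lane((uint16_t)(rs >> 16), (uint16_t)(rt >> 16), round);
    uint16_t lo = subqh_lane((uint16_t)rs, (uint16_t)rt, round);
    return ((uint32_t)hi << 16) | lo;
}
```

Because the interim value carries one extra bit, the halved result always fits in 16 bits, which is why no overflow flag is needed.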

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SUBQH.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} \leftarrow rightShift1SubQ16( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow rightShift1SubQ16( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SUBQH_R.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} \leftarrow roundRightShift1SubQ16( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow roundRightShift1SubQ16( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

function rightShift1SubQ16( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( ( a_{15} || a_{15..0} ) - ( b_{15} || b_{15..0} ) )
    return temp_{16..1}
endfunction rightShift1SubQ16

function roundRightShift1SubQ16( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( ( a_{15} || a_{15..0} ) - ( b_{15} || b_{15..0} ) )
    temp_{16..0} \leftarrow temp_{16..0} + 1
    return temp_{16..1}
endfunction roundRightShift1SubQ16
```

#### **Exceptions:**

**SUBQH[\_R].W** Subtract Fractional Words And Shift Right to Halve Results

| Bits      | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|-----------|----------------|--------|--------|--------|----|---------|------|
| SUBQH.W   | P32A<br>001000 | rt     | rs     | rd     | 0  | 1010001 | 101  |
| SUBQH_R.W | P32A<br>001000 | rt     | rs     | rd     | 1  | 1010001 | 101  |
| Width     | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: SUBQH[\_R].W

- SUBQH.W rd, rs, rt (DSP-R2)
- SUBQH\_R.W rd, rs, rt (DSP-R2)

Purpose: Subtract Fractional Words And Shift Right to Halve Results

Fractional subtraction of word vectors, with a right shift by one bit to halve the result, with optional rounding.

**Description:**  $rd \leftarrow round((rs_{31..0} - rt_{31..0}) >> 1)$ 

The word in register *rt* is subtracted from the word in register *rs* to create an interim 33-bit result.

In the non-rounding instruction variant, the interim result is then shifted right by one bit before being written to the destination register *rd*.

In the rounding version of the instruction, a value of 1 is added at the least-significant bit position of the interim result; the interim result is then right-shifted by one bit and written to the destination register.

This instruction does not modify the DSPControl register.
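The 33-bit interim result described above can be modeled in C with a 64-bit intermediate. This is a sketch under stated assumptions: `sext32`, `subqh_w`, and the `round` flag are names invented for the example, and GPR contents are passed as raw 32-bit patterns.

```c
#include <stdint.h>

/* Portable sign extension of a 32-bit pattern (illustrative helper). */
static int64_t sext32(uint32_t v)
{
    return (int64_t)v - ((v & UINT32_C(0x80000000)) ? (INT64_C(1) << 32) : 0);
}

/* Sketch of rd = bits 32..1 of (rs - rt), with an optional +1 for rounding. */
static uint32_t subqh_w(uint32_t rs, uint32_t rt, int round)
{
    int64_t temp = sext32(rs) - sext32(rt);   /* fits in 33 bits */
    if (round) {
        temp += 1;                            /* add 1 at the LSB position */
    }
    /* Shift out bit 0; truncation to 32 bits keeps bits 32..1. */
    return (uint32_t)((uint64_t)temp >> 1);
}
```

For example, under this model `subqh_w(7, 2, 0)` yields 2 and `subqh_w(7, 2, 1)` yields 3.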

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SUBQH.W
    ValidateAccessToDSP2Resources()
    tempA_{31..0} \leftarrow rightShift1SubQ32( GPR[rs]_{31..0}, GPR[rt]_{31..0} )
    GPR[rd]_{31..0} \leftarrow tempA_{31..0}

SUBQH_R.W
    ValidateAccessToDSP2Resources()
    tempA_{31..0} \leftarrow roundRightShift1SubQ32( GPR[rs]_{31..0}, GPR[rt]_{31..0} )
    GPR[rd]_{31..0} \leftarrow tempA_{31..0}

function rightShift1SubQ32( a_{31..0}, b_{31..0} )
    temp_{32..0} \leftarrow ( ( a_{31} || a_{31..0} ) - ( b_{31} || b_{31..0} ) )
    return temp_{32..1}
endfunction rightShift1SubQ32

function roundRightShift1SubQ32( a_{31..0}, b_{31..0} )
    temp_{32..0} \leftarrow ( ( a_{31} || a_{31..0} ) - ( b_{31} || b_{31..0} ) )
    temp_{32..0} \leftarrow temp_{32..0} + 1
    return temp_{32..1}
endfunction roundRightShift1SubQ32
```

#### **Exceptions:**

**SUBU[\_S].PH** Subtract Unsigned Integer Halfwords

| Bits      | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|-----------|----------------|--------|--------|--------|----|---------|------|
| SUBU.PH   | P32A<br>001000 | rt     | rs     | rd     | 0  | 1100001 | 101  |
| SUBU_S.PH | P32A<br>001000 | rt     | rs     | rd     | 1  | 1100001 | 101  |
| Width     | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: SUBU[\_S].PH

- SUBU.PH rd, rs, rt (DSP-R2)
- SUBU\_S.PH rd, rs, rt (DSP-R2)

Purpose: Subtract Unsigned Integer Halfwords

Element-wise subtraction of pairs of unsigned integer halfwords, with optional saturation.

**Description:**  $rd \leftarrow sat16(rs_{31..16} - rt_{31..16}) || sat16(rs_{15..0} - rt_{15..0})$ 

The two unsigned integer halfwords in register *rt* are subtracted from the corresponding unsigned integer halfwords in register *rs*. The unsigned results are then written to the corresponding elements in destination register *rd*.

In the saturating version of the instruction, if either subtraction results in an underflow the result is clamped to the minimum unsigned integer halfword value (0x0000 hexadecimal), before being written to the destination register *rd*.

For both instruction variants, if either subtraction causes an underflow, the instruction writes a 1 to bit 20 in the *DSPControl* register in the *ouflag* field.
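The two variants can be sketched in C as follows, using the operand order of the Description formula (each *rs* halfword minus the corresponding *rt* halfword). The function names, the `saturate` flag, and the `dspcontrol` pointer that models the *DSPControl* register are assumptions for illustration.

```c
#include <stdint.h>

/* One unsigned halfword lane with a 17-bit interim difference. */
static uint16_t subu_lane(uint16_t a, uint16_t b, int saturate,
                          uint32_t *dspcontrol)
{
    uint32_t temp = ((uint32_t)a - (uint32_t)b) & 0x1FFFFu;

    if (temp & 0x10000u) {                 /* bit 16 set: underflow */
        *dspcontrol |= UINT32_C(1) << 20;  /* ouflag bit 20 */
        if (saturate) {
            return 0x0000;                 /* clamp to the minimum value */
        }
    }
    return (uint16_t)(temp & 0xFFFFu);     /* wraps modulo 2^16 otherwise */
}

/* Sketch of SUBU.PH (saturate = 0) and SUBU_S.PH (saturate = 1). */
static uint32_t subu_ph(uint32_t rs, uint32_t rt, int saturate,
                        uint32_t *dspcontrol)
{
    uint16_t hi = subu_lane((uint16_t)(rs >> 16), (uint16_t)(rt >> 16),
                            saturate, dspcontrol);
    uint16_t lo = subu_lane((uint16_t)rs, (uint16_t)rt, saturate, dspcontrol);
    return ((uint32_t)hi << 16) | lo;
}
```

Note that the flag is set on underflow in both variants; only the returned lane value differs.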

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SUBU.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} \leftarrow subtractU16U16( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow subtractU16U16( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

SUBU_S.PH
    ValidateAccessToDSP2Resources()
    tempB_{15..0} \leftarrow satU16SubtractU16U16( GPR[rs]_{31..16}, GPR[rt]_{31..16} )
    tempA_{15..0} \leftarrow satU16SubtractU16U16( GPR[rs]_{15..0}, GPR[rt]_{15..0} )
    GPR[rd]_{31..0} \leftarrow tempB_{15..0} || tempA_{15..0}

function subtractU16U16( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( 0 || a_{15..0} ) - ( 0 || b_{15..0} )
    if ( temp_{16} = 1 ) then
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{15..0}
endfunction subtractU16U16

function satU16SubtractU16U16( a_{15..0}, b_{15..0} )
    temp_{16..0} \leftarrow ( 0 || a_{15..0} ) - ( 0 || b_{15..0} )
    if ( temp_{16} = 1 ) then
        temp_{15..0} \leftarrow 0x0000
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{15..0}
endfunction satU16SubtractU16U16
```

#### **Exceptions:**

**SUBU[\_S].QB** Subtract Unsigned Quad Byte Vector

| Bits      | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|-----------|----------------|--------|--------|--------|----|---------|------|
| SUBU.QB   | P32A<br>001000 | rt     | rs     | rd     | 0  | 1011001 | 101  |
| SUBU_S.QB | P32A<br>001000 | rt     | rs     | rd     | 1  | 1011001 | 101  |
| Width     | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: SUBU[\_S].QB

- SUBU.QB rd, rs, rt (DSP)
- SUBU\_S.QB rd, rs, rt (DSP)

Purpose: Subtract Unsigned Quad Byte Vector

Element-wise subtraction of one vector of unsigned byte values from another to produce a vector of unsigned byte results, with optional saturation.

**Description:**  $rd \leftarrow sat8(rs_{31..24} - rt_{31..24}) || sat8(rs_{23..16} - rt_{23..16}) || sat8(rs_{15..8} - rt_{15..8}) || sat8(rs_{7..0} - rt_{7..0})$ 

The four byte elements in *rt* are subtracted from the corresponding byte elements in register *rs*.

For the non-saturating version of the instruction, the result modulo 256 is written into the corresponding position in register *rd*.

For the saturating version of the instruction, the subtraction is performed using unsigned saturating arithmetic. If the subtraction results in underflow, the value is clamped to the smallest representable value (0 decimal, 0x00 hexadecimal) before being written to the destination register *rd*.

For each instruction, if any of the individual subtractions results in underflow or saturation, a 1 is written to bit 20 in the *DSPControl* register within the *ouflag* field.
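A minimal C sketch of the four-lane byte subtraction described above is given here for illustration. The name `subu_qb`, the `saturate` flag selecting the saturating variant, and the `dspcontrol` pointer modeling the *DSPControl* register are assumptions, not names from this manual.

```c
#include <stdint.h>

/* Sketch of SUBU.QB (saturate = 0) and SUBU_S.QB (saturate = 1). */
static uint32_t subu_qb(uint32_t rs, uint32_t rt, int saturate,
                        uint32_t *dspcontrol)
{
    uint32_t rd = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint32_t a = (rs >> (8 * lane)) & 0xFFu;
        uint32_t b = (rt >> (8 * lane)) & 0xFFu;
        uint32_t temp = (a - b) & 0x1FFu;      /* 9-bit interim result */

        if (temp & 0x100u) {                   /* bit 8 set: underflow */
            *dspcontrol |= UINT32_C(1) << 20;  /* ouflag bit 20 */
            if (saturate) {
                temp = 0;                      /* clamp to 0x00 */
            }
        }
        rd |= (temp & 0xFFu) << (8 * lane);    /* modulo 256 otherwise */
    }
    return rd;
}
```

A single sticky flag bit is shared by all four lanes, so software cannot tell from the flag alone which lane underflowed.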

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
SUBU.QB:
    ValidateAccessToDSPResources()
    tempD_{7..0} \leftarrow subtractU8( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} \leftarrow subtractU8( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} \leftarrow subtractU8( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} \leftarrow subtractU8( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

SUBU_S.QB:
    ValidateAccessToDSPResources()
    tempD_{7..0} \leftarrow satU8Subtract( GPR[rs]_{31..24}, GPR[rt]_{31..24} )
    tempC_{7..0} \leftarrow satU8Subtract( GPR[rs]_{23..16}, GPR[rt]_{23..16} )
    tempB_{7..0} \leftarrow satU8Subtract( GPR[rs]_{15..8}, GPR[rt]_{15..8} )
    tempA_{7..0} \leftarrow satU8Subtract( GPR[rs]_{7..0}, GPR[rt]_{7..0} )
    GPR[rd]_{31..0} \leftarrow tempD_{7..0} || tempC_{7..0} || tempB_{7..0} || tempA_{7..0}

function subtractU8( a_{7..0}, b_{7..0} )
    temp_{8..0} \leftarrow ( 0 || a_{7..0} ) - ( 0 || b_{7..0} )
    if ( temp_{8} = 1 ) then
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{7..0}
endfunction subtractU8

function satU8Subtract( a_{7..0}, b_{7..0} )
    temp_{8..0} \leftarrow ( 0 || a_{7..0} ) - ( 0 || b_{7..0} )
    if ( temp_{8} = 1 ) then
        temp_{7..0} \leftarrow 0x00
        DSPControl_{ouflag:20} \leftarrow 1
    endif
    return temp_{7..0}
endfunction satU8Subtract
```

#### **Exceptions:**

**SUBUH[\_R].QB** Subtract Unsigned Bytes And Right Shift to Halve Results

| Bits       | 31..26         | 25..21 | 20..16 | 15..11 | 10 | 9..3    | 2..0 |
|------------|----------------|--------|--------|--------|----|---------|------|
| SUBUH.QB   | P32A<br>001000 | rt     | rs     | rd     | 0  | 1101001 | 101  |
| SUBUH_R.QB | P32A<br>001000 | rt     | rs     | rd     | 1  | 1101001 | 101  |
| Width      | 6              | 5      | 5      | 5      | 1  | 7       | 3    |

Format: SUBUH[\_R].QB

- SUBUH.QB rd, rs, rt (DSP-R2)
- SUBUH\_R.QB rd, rs, rt (DSP-R2)

Purpose: Subtract Unsigned Bytes And Right Shift to Halve Results

Element-wise subtraction of two vectors of unsigned bytes, with a one-bit right shift to halve results and optional rounding.

**Description:**  $rd \leftarrow round((rs_{31..24} - rt_{31..24}) >>1) || round((rs_{23..16} - rt_{23..16}) >>1) || round((rs_{15..8} - rt_{15..8}) >>1) || round((rs_{7..0} - rt_{7..0}) >>1)$ 

The four unsigned byte values in register *rt* are subtracted from the corresponding unsigned byte values in register *rs*. Each unsigned result is then halved by shifting right by one bit position. The byte results are then written to the corresponding elements of destination register *rd*.

In the rounding variant of the instruction, a value of 1 is added to the result of each subtraction at the discarded bit position before the right shift.

The results of this instruction never overflow; no bits of the ouflag field in the DSPControl register are written.
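The following C sketch illustrates one plausible reading of the description above. It assumes that each lane forms a 9-bit interim difference and keeps bits 8..1, by analogy with the halving subtract instructions for halfwords and words; that detail, the name `subuh_qb`, and the `round` flag are assumptions rather than text from this manual.

```c
#include <stdint.h>

/* Sketch of SUBUH.QB (round = 0) and SUBUH_R.QB (round = 1). */
static uint32_t subuh_qb(uint32_t rs, uint32_t rt, int round)
{
    uint32_t rd = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint32_t a = (rs >> (8 * lane)) & 0xFFu;
        uint32_t b = (rt >> (8 * lane)) & 0xFFu;

        /* Assumed 9-bit interim result, with 1 added at the bit that the
         * shift discards when rounding is requested. */
        uint32_t temp = (a - b + (round ? 1u : 0u)) & 0x1FFu;

        /* Keep bits 8..1 of the interim result as the byte result. */
        rd |= ((temp >> 1) & 0xFFu) << (8 * lane);
    }
    return rd;
}
```

Under this model no lane can overflow, which is consistent with the statement that the *ouflag* field is never written.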

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

#### **Exceptions:**

**WRDSP** Write Fields to DSPControl Register from a GPR

| Bits  | 31..26         | 25..21 | 20..14 | 13..12 | 11..9 | 8..6 | 5..3 | 2..0 |
|-------|----------------|--------|--------|--------|-------|------|------|------|
| WRDSP | P32A<br>001000 | rt     | mask   | 01     | 011   | 001  | 111  | 111  |
| Width | 6              | 5      | 7      | 2      | 3     | 3    | 3    | 3    |

Format: WRDSP

- WRDSP rt, mask (DSP)
- WRDSP rt (Assembly Idiom)

Purpose: Write Fields to DSPControl Register from a GPR

To copy selected fields from the specified GPR to the special-purpose DSPControl register.

**Description:** DSPControl  $\leftarrow$  select(mask, GPR[rt])

Selected fields in the special register *DSPControl* are overwritten with the corresponding bits from the source GPR *rt*. Each of bits 0 through 5 of the *mask* operand corresponds to a specific field in the *DSPControl* register. A mask bit value of 1 indicates that the field will be overwritten using the bits from the same bit positions in register *rt*, and a mask bit value of 0 indicates that the corresponding field will be unchanged. Bits 6 through 9 of the *mask* operand are ignored.

The table below shows the correspondence between the bits in the *mask* operand and the fields in the *DSPControl* register; mask bit 0 is the least-significant bit in *mask*.

| Bit              | 31..24 | 23..16 | 15 | 14  | 13 | 12..7  | 6 | 5..0 |
|------------------|--------|--------|----|-----|----|--------|---|------|
| DSPControl field | ccond  | ouflag | 0  | EFI | c  | scount |   | pos  |
| Mask bit         | 4      | 3      |    | 5   | 2  | 1      |   | 0    |

For example, to overwrite only the scount field in *DSPControl*, the value of the *mask* operand used will be 2 decimal (0x02 hexadecimal). After execution of the instruction, the scount field in *DSPControl* will have the value of bits 7 through 12 of the specified source register *rt* and the remaining bits in *DSPControl* are unmodified.

The one-operand version of the instruction provides a convenient assembly idiom that allows the programmer to write all the allowable fields in the *DSPControl* register from the source GPR, i.e., it is equivalent to specifying a *mask* value of 31 decimal (0x1F hexadecimal).
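The field selection can be sketched in C as a table-driven merge. This is an illustrative model built from the mask-bit table above; the function name `wrdsp`, the array name `field`, and passing *DSPControl* by value are assumptions made for the example.

```c
#include <stdint.h>

/* Sketch of DSPControl <- select(mask, GPR[rt]). Returns the new value. */
static uint32_t wrdsp(uint32_t dspcontrol, uint32_t rt, unsigned mask)
{
    /* Bit positions of each field, indexed by mask bit number. */
    static const uint32_t field[6] = {
        UINT32_C(0x0000003F),   /* mask bit 0: pos,    bits 5..0   */
        UINT32_C(0x00001F80),   /* mask bit 1: scount, bits 12..7  */
        UINT32_C(0x00002000),   /* mask bit 2: c,      bit 13      */
        UINT32_C(0x00FF0000),   /* mask bit 3: ouflag, bits 23..16 */
        UINT32_C(0xFF000000),   /* mask bit 4: ccond,  bits 31..24 */
        UINT32_C(0x00004000),   /* mask bit 5: EFI,    bit 14      */
    };
    uint32_t select = 0;

    for (unsigned i = 0; i < 6; i++) {
        if (mask & (1u << i)) {
            select |= field[i];
        }
    }
    /* Keep unselected bits, take selected bits from rt. */
    return (dspcontrol & ~select) | (rt & select);
}
```

With a *mask* of 0x02, only bits 12..7 (scount) change; with the one-operand idiom's value of 0x1F, every field except EFI is taken from *rt*.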

#### **Restrictions:**

No data-dependent exceptions are possible.

The operands must be values in the specified format. If they are not, the results are **UNPREDICTABLE** and the values of the operand vectors become **UNPREDICTABLE**.

#### **Operation:**

```
ValidateAccessToDSPResources()
newbits_{31..0} \leftarrow 0^{32}
overwrite_{31..0} \leftarrow 0xFFFFFFFF
if ( mask_0 = 1 ) then
    overwrite_{5..0} \leftarrow 0^6
    newbits_{5..0} \leftarrow GPR[rt]_{5..0}
endif
if ( mask_1 = 1 ) then
    overwrite_{12..7} \leftarrow 0^6
    newbits_{12..7} \leftarrow GPR[rt]_{12..7}
endif
if ( mask_2 = 1 ) then
    overwrite_{13} \leftarrow 0
    newbits_{13} \leftarrow GPR[rt]_{13}
endif
if ( mask_3 = 1 ) then
    overwrite_{23..16} \leftarrow 0^8
    newbits_{23..16} \leftarrow GPR[rt]_{23..16}
endif
if ( mask_4 = 1 ) then
    overwrite_{31..24} \leftarrow 0^8
    newbits_{31..24} \leftarrow GPR[rt]_{31..24}
endif
if ( mask_5 = 1 ) then
    overwrite_{14} \leftarrow 0
    newbits_{14} \leftarrow GPR[rt]_{14}
endif
DSPControl \leftarrow DSPControl and overwrite_{31..0}
DSPControl \leftarrow DSPControl or newbits_{31..0}
```

#### **Exceptions:**

# **Endian-Agnostic Reference to Register Elements**

# A.1 Using Endian-Agnostic Instruction Names

Certain instructions in the Module operate on only a subset of the elements in a register. In most cases, this is simply the left (**L**) or right (**R**) half of the register. Some instructions refer to the left-alternating (**LA**) or right-alternating (**RA**) elements of the register. This type of reference does not take the endian-ness of the processor and memory into account. Because the DSP Module instructions simply use the left or right part of the register regardless of endian-ness, this section describes how users can define macros that translate an endian-agnostic instruction name to the appropriate real instruction for a given processor endian-ness.

An example is given below that uses actual element numbers in the mnemonics to be endian-agnostic.

In the microMIPS32 architecture, the following conventions could be used:

- PH0 refers to halfword element 0 (from a pair in the specified register).
- PH1 refers to halfword element 1.
- QB01 refers to byte elements 0 and 1 (from a quad in the specified register).
- QB23 refers to byte elements 2 and 3.
- QB02 refers to (even) byte elements 0 and 2.
- QB13 refers to (odd) byte elements 1 and 3.

The even and odd subsets are mainly used in storing, computing on, and loading complex numbers that have a real and imaginary part. If the real and imaginary parts of a complex number are stored in consecutive memory locations, then computations that involve only the real or only the imaginary parts must first extract these to a different register. This can most effectively be done using the even and odd formats of the relevant operations.

Note that these mnemonics are translated by the assembler to underlying real instructions that operate on absolute element positions in the register based on the endian-ness of the processor.

# A.2 Mapping Endian-Agnostic Instruction Names to DSP Module Instructions

To illustrate this process, we will use one instruction as an example. This can be repeated for all the relevant instructions in the Module. The MULEQ\_S instruction multiplies fractional data operands to expanded full-size results in a destination register with optional saturation. Since the result occupies twice the width of the input operands, only half the operands from the source registers are operated on at a time. So the complete instruction mnemonic would be given as MULEQ\_S.W.PH0 rd, rs, rt where the second part (after the first dot) indicates the size of the result, and the third part (after the second dot) indicates the element of the source register being used, which in this example is the 0<sup>th</sup> element. The real instructions that the hardware implements are MULEQ\_S.W.PHL and MULEQ\_S.W.PHR which operate on the left halfword element and the right halfword element respectively, of the given source registers, as shown in Figure A.1. The user can map the user instruction (with .PH0) to the MULEQ\_S.W.PHL real instruction if the processor is big-endian or to the real instruction MULEQ\_S.W.PHR if the processor is little-endian.

#### Figure A.1 The Endian-Independent PHL and PHR Elements in a GPR for the microMIPS32 Architecture



The MULEQ\_S.W.PH1 rd, rs, rt instruction can then be mapped to the MULEQ\_S.W.PHR real instruction if the processor is big-endian (see Figure A.2), and to the MULEQ\_S.W.PHL real instruction if the processor is little-endian (see Figure A.3).

#### Figure A.2 The Big-Endian PH0 and PH1 Elements in a GPR for the microMIPS32 Architecture



#### Figure A.3 The Little-Endian PH0 and PH1 Elements in a GPR for the microMIPS32 Architecture



To specify the even and odd type operations, a user instruction (to use even elements) such as **PRECEQ\_S.PH.QB02** (which precision-expands the values) would be mapped to **PRECEQ\_S.PH.QBLA** or **PRECEQ\_S.PH.QBRA**, depending on whether the endian-ness of the processor is big or little, respectively. (**LA** stands for left-alternating and **RA** for right-alternating.)
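The mappings stated in this appendix can be collected into a small lookup, sketched here in C. Only the three mappings given explicitly in the text (PH0, PH1, and QB02) are included; the function name `map_element_suffix` and its interface are hypothetical and stand in for whatever macro mechanism a given assembler provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Translate an endian-agnostic element suffix to the real suffix.
 * Returns NULL for suffixes whose mapping is not stated in the text. */
static const char *map_element_suffix(const char *user_suffix, bool big_endian)
{
    if (strcmp(user_suffix, "PH0") == 0) {
        return big_endian ? "PHL" : "PHR";
    }
    if (strcmp(user_suffix, "PH1") == 0) {
        return big_endian ? "PHR" : "PHL";
    }
    if (strcmp(user_suffix, "QB02") == 0) {
        return big_endian ? "QBLA" : "QBRA";
    }
    return NULL;
}
```

For instance, on a little-endian processor this maps the user instruction MULEQ\_S.W.PH0 to the real instruction MULEQ\_S.W.PHR.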

#### Figure A.4 The Endian-Independent QBL and QBR Elements in a GPR for the microMIPS32 Architecture



#### Figure A.5 The Endian-Independent QBLA and QBRA Elements in a GPR for the microMIPS32 Architecture



# Appendix **B**

# **Revision History**

Vertical change bars in the left page margin note the location of changes to this document since its last release.

NOTE: Change bars on figure titles are used to denote a potential change in the figure itself.

| Version | Date              | Comments                                                                                                          |
|---------|-------------------|-------------------------------------------------------------------------------------------------------------------|
| 0.01    | August 30, 2017   | Initial revision.                                                                                                 |
| 0.02    | November 17, 2017 | Updated cover and formatting.                                                                                     |
| 0.03    | March 26, 2018    | <ul><li>Fixed bits and typo in ADDQH_R.H</li><li>Fixed broken cross references in the Overview chapter.</li></ul> |
| 0.04    | April 27, 2018    | Changed confidentiality level to Public.                                                                          |