Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs (v2)

This commit fixes amdgpu to work on the Radeon HD 7870 XT
which has never worked with the Linux open source drivers before.

Some boards have "harvested" chips, meaning that some parts of
the chip are disabled and fused off, and the board is sold at a
lower price under a different marketing name.
On a harvested chip, any of the following can be disabled:
- CUs (Compute Units)
- RBs (Render Backends, a.k.a. ROPs)
- Memory channels (i.e. the chip has lower memory bandwidth)
- TCCs (Texture Channel Caches, i.e. less L2 cache)

Handle chips with harvested TCCs by patching the registers
that configure how TCCs are mapped.

If some TCCs are disabled, we need to make sure that
the disabled TCCs are not used, and the remaining TCCs
are used optimally.

TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.

Note that the TCC configuration is highly relevant to performance.
A suboptimal configuration (e.g. CHAN_STEER=0) can significantly
reduce gaming performance.

For optimal performance:
- Rely on the CHAN_STEER value from the golden registers table:
  skip only the disabled TCCs and keep the mapping order.
- Limit NUM_TCC_BANKS to the number of active TCCs to avoid thrashing,
  which performs better than using the same TCC twice.
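
The CHAN_STEER patching described above can be modelled in plain userspace C. This is only a sketch: patch_chan_steer() is a hypothetical helper, the register values are passed in as ordinary integers instead of being read from hardware, and the sample values in the usage note are illustrative, not real golden register settings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the CHAN_STEER patching (not kernel code).
 *
 * chan_steer:     combined 64-bit TCP_CHAN_STEER_HI:LO value, where each
 *                 4-bit nibble holds the index of one TCC
 * dis_tcc_mask:   bit N set means TCC N is disabled
 * num_max_tcc:    number of TCCs on the full (non-harvested) chip
 * num_active_tcc: out parameter, number of TCCs kept in the mapping
 *
 * Returns the patched value: the original nibbles in their original
 * order, with every disabled TCC removed and the rest packed downwards.
 */
uint64_t patch_chan_steer(uint64_t chan_steer, uint32_t dis_tcc_mask,
                          unsigned int num_max_tcc,
                          unsigned int *num_active_tcc)
{
    uint64_t patched = 0;
    unsigned int i, active = 0;

    assert(num_max_tcc <= 16); /* 64 bits hold at most 16 nibbles */

    for (i = 0; i < num_max_tcc; i++) {
        unsigned int tcc = (chan_steer >> (4 * i)) & 0xf;

        if (dis_tcc_mask & (1u << tcc))
            continue; /* skip nibbles that point at a disabled TCC */

        patched |= (uint64_t)tcc << (4 * active);
        active++;
    }

    *num_active_tcc = active;
    return patched;
}
```

With six TCCs mapped in order (chan_steer = 0x543210) and TCCs 2 and 3 disabled (dis_tcc_mask = 0xc), the helper returns 0x5410 and reports four active TCCs. The diff below then writes the active count minus one into NUM_TCC_BANKS, which would be 3 in this case.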

v2:
- Also consider CGTS_USER_TCC_DISABLE for disabled TCCs.
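
As a rough illustration of the v2 change, the disabled-TCC mask can be thought of as the union of two disable fields, clamped to the number of TCCs the chip can have. In this sketch disabled_tcc_mask() is a hypothetical helper, and the two field values are passed in as plain integers; in the driver they come from the TCC_DISABLE fields of CGTS_TCC_DISABLE and CGTS_USER_TCC_DISABLE. The clamp assumes that the bitmask helper used in the diff yields the low num_max_tcc bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model (not kernel code): combine two disable
 * fields into a single mask of disabled TCCs.
 *
 * tcc_disable:      value of the first TCC_DISABLE field
 * user_tcc_disable: value of the second (user) TCC_DISABLE field
 * num_max_tcc:      number of TCCs on the full chip
 */
uint32_t disabled_tcc_mask(uint32_t tcc_disable, uint32_t user_tcc_disable,
                           unsigned int num_max_tcc)
{
    uint32_t valid;

    assert(num_max_tcc >= 1 && num_max_tcc <= 32);

    /* Only bits below num_max_tcc refer to TCCs that can exist */
    valid = (uint32_t)((1ull << num_max_tcc) - 1);

    /* A TCC counts as disabled if either field marks it */
    return (tcc_disable | user_tcc_disable) & valid;
}
```

For example, with six possible TCCs, one field marking TCC 2 (0x4) and the other marking TCC 3 (0x8), the combined mask is 0xc. Bits at or above num_max_tcc are ignored.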

Link: https://bugs.freedesktop.org/show_bug.cgi?id=60879
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2664
Fixes: 2cd46ad22383 ("drm/amdgpu: add graphic pipeline implementation for si v8")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 00218d15528fab9f6b31241fe5904eea4fcaa30d)

Authored by Timur Kristóf and committed by Alex Deucher
fe2b84f9 13e4cf11

+66
drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
···
 	mutex_unlock(&adev->grbm_idx_mutex);
 }
 
+/**
+ * gfx_v6_0_setup_tcc() - setup which TCCs are used
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Verify whether the current GPU has any TCCs disabled,
+ * which can happen when the GPU is harvested and some
+ * memory channels are disabled, reducing the memory bus width.
+ * For example, on the Radeon HD 7870 XT (Tahiti LE).
+ *
+ * If some TCCs are disabled, we need to make sure that
+ * the disabled TCCs are not used, and the remaining TCCs
+ * are used optimally.
+ *
+ * TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
+ * TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.
+ *
+ * For optimal performance:
+ * - Rely on the CHAN_STEER from the golden registers table,
+ *   only skip disabled TCCs but keep the mapping order.
+ * - Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
+ *   which performs better than using the same TCC twice.
+ */
+static void gfx_v6_0_setup_tcc(struct amdgpu_device *adev)
+{
+	u32 i, tcc, tcp_addr_config, num_active_tcc = 0;
+	u64 chan_steer, patched_chan_steer = 0;
+	const u32 num_max_tcc = adev->gfx.config.max_texture_channel_caches;
+	const u32 dis_tcc_mask =
+		amdgpu_gfx_create_bitmask(num_max_tcc) &
+		(REG_GET_FIELD(RREG32(mmCGTS_TCC_DISABLE),
+			       CGTS_TCC_DISABLE, TCC_DISABLE) |
+		 REG_GET_FIELD(RREG32(mmCGTS_USER_TCC_DISABLE),
+			       CGTS_USER_TCC_DISABLE, TCC_DISABLE));
+
+	/* When no TCC is disabled, the golden registers table already has optimal TCC setup */
+	if (!dis_tcc_mask)
+		return;
+
+	/* Each 4-bit nibble contains the index of a TCC used by all TCPs */
+	chan_steer = RREG32(mmTCP_CHAN_STEER_LO) | ((u64)RREG32(mmTCP_CHAN_STEER_HI) << 32ull);
+
+	/* Patch the TCP to TCC mapping to skip disabled TCCs */
+	for (i = 0; i < num_max_tcc; ++i) {
+		tcc = (chan_steer >> (u64)(4 * i)) & 0xf;
+
+		if (!((1 << tcc) & dis_tcc_mask)) {
+			/* Copy enabled TCC indices to the patched register value. */
+			patched_chan_steer |= (u64)tcc << (u64)(4 * num_active_tcc);
+			++num_active_tcc;
+		}
+	}
+
+	WARN_ON(num_active_tcc != num_max_tcc - hweight32(dis_tcc_mask));
+
+	/* Patch number of TCCs used by TCPs */
+	tcp_addr_config = REG_SET_FIELD(RREG32(mmTCP_ADDR_CONFIG),
+					TCP_ADDR_CONFIG, NUM_TCC_BANKS,
+					num_active_tcc - 1);
+
+	WREG32(mmTCP_ADDR_CONFIG, tcp_addr_config);
+	WREG32(mmTCP_CHAN_STEER_HI, upper_32_bits(patched_chan_steer));
+	WREG32(mmTCP_CHAN_STEER_LO, lower_32_bits(patched_chan_steer));
+}
+
 static void gfx_v6_0_config_init(struct amdgpu_device *adev)
 {
 	adev->gfx.config.double_offchip_lds_buf = 0;
···
 	gfx_v6_0_tiling_mode_table_init(adev);
 
 	gfx_v6_0_setup_rb(adev);
+	gfx_v6_0_setup_tcc(adev);
 
 	gfx_v6_0_setup_spi(adev);
 