
Enhance Dense Layout Evaluation in All Tiers of Memory Hierarchy (release dense layoutloop evaluation, tests passing) #301

Merged
angshuman-parashar merged 4 commits into NVlabs:master from JianmingTONG:layoutloop
Mar 14, 2025

Conversation

@JianmingTONG (Collaborator)

Overview

This PR enables timeloop to evaluate data layout in all tiers of the memory hierarchy for all data spaces, verified on typical convolutions. Tests pass; formatting is clean.

Please see https://docs.google.com/presentation/d/1L3QLatmdSMooeVGd1ixgejq0fpmbsWQslSx8S5qZUKI/edit?usp=sharing for more details.

How do I use this feature?

Simple! Just:
(1) Add "ranks" to each data space in your problem configuration file, such as layer.yaml below.
(2) Define a layout.yaml for the input problem configuration, such as layout.yaml below. The concrete specifications for how to write a layout.yaml are listed below.
(3) No changes are required to the mapping, constraint, or architecture definitions.

Ready to Run?

scons -j32
./bin/timeloop-model arch.yaml layer.yaml layout.yaml mapping.yaml
architecture:
  version: 0.2

  subtree:
  - name: System
    
    local:
    - name: MainMemory
      class: DRAM
      attributes:
        width: 64
        block_size: 8
        word_bits: 8

    subtree:
    - name: Chip
      attributes:
        technology: 45nm

      local:
      - name: GlobalBuffer
        class: SRAM
        attributes:
          depth: 8192
          width: 64
          block_size: 8
          word_bits: 8
          read_bandwidth: 8
          write_bandwidth: 8

      subtree:
      - name: PE[0..143]
        local:
        - name: RegisterFile
          class: regfile
          attributes:
            depth: 16
            width: 8
            block_size: 1
            datawidth: 8
            meshX: 12
        - name: MACC
          class: intmac
          attributes:
            datawidth: 8
            meshX: 12
problem:
  instance:
    C: 4
    Hdilation: 1
    Hstride: 1
    M: 1
    N: 1
    P: 24
    Q: 24
    R: 3
    S: 3
    Wdilation: 1
    Wstride: 1
  shape:
    coefficients:
    - default: 1
      name: Wstride
    - default: 1
      name: Hstride
    - default: 1
      name: Wdilation
    - default: 1
      name: Hdilation
    data_spaces:
    - name: Weights
      ranks: [C, M, R, S]
      projection:
        - [ [C] ]
        - [ [M] ]
        - [ [R] ]
        - [ [S] ]
    - name: Inputs
      ranks: [N, C, W, H]
      projection:
        - [ [N] ]
        - [ [C] ]
        - [ [R, Wdilation], [P, Wstride] ] # SOP form: R*Wdilation + P*Wstride
        - [ [S, Hdilation], [Q, Hstride] ] # SOP form: S*Hdilation + Q*Hstride 
    - name: Outputs
      ranks: [N, M, Q, P]
      projection:
        - [ [N] ]
        - [ [M] ]
        - [ [Q] ]
        - [ [P] ]
      read_write: true
    dimensions:
    - C
    - M
    - R
    - S
    - N
    - P
    - Q
    name: CNN_Layer
mapping:
  - target: RegisterFile
    type: datatype
    keep:
      - Weights
    bypass:
      - Inputs
      - Outputs
  - target: GlobalBuffer
    type: datatype
    keep:
      - Inputs
      - Outputs
    bypass:
      - Weights
  - target: MainMemory
    type: datatype
    keep:
      - Weights
      - Inputs
      - Outputs
    bypass:
      []
  - target: RegisterFile
    type: temporal
    factors: C1 M1 R1 S1 N1 P1 Q1
    permutation: RSPQCMN
  - target: GlobalBuffer
    type: spatial
    factors: C2 M1 R1 S1 N1 P2 Q3
    permutation: CMRSNPQ
    split: 0
  - target: GlobalBuffer
    type: temporal
    factors: C2 M1 R3 S3 N1 P12 Q8
    permutation: CMRSNPQ
  - target: MainMemory
    type: temporal
    factors: C1 M1 R1 S1 N1 P1 Q1
    permutation: CMRSNPQ
layout:
  - target: MainMemory
    type: interline
    factors: R=1 S=1 P=1 Q=1 C=1 M=1 N=1 H=1 W=1
    permutation: SRCQPMNHW

  - target: MainMemory
    type: intraline
    factors: R=3 S=3 P=24 Q=24 C=4 M=1 N=1 H=27 W=27
    permutation: SRCQPMNHW

  - target: GlobalBuffer
    type: interline
    factors: R=3 S=3 P=24 Q=6 C=2 M=1 N=1 H=27 W=27
    permutation: SRHWCQPMN

  - target: GlobalBuffer
    type: intraline
    factors: R=1 S=1 P=1 Q=4 C=2 M=1 N=1 H=1 W=1
    permutation: SRWHCQPMN

  - target: RegisterFile
    type: interline
    factors: R=1 S=1 P=1 Q=1 C=1 M=1 N=1 H=1 W=1
    permutation: SRCQPMNHW

Layout Specifications

The layout.yaml specifies the layout of each data space.

We assume each data space is a tensor projected from the problem dimensions C, M, R, S, N, P, Q.
Each data space has multiple ranks, and each rank is projected to one or more dimensions.

For example, {"Inputs", {"Z", "V", "W", "H"}} means data space "Inputs" has four ranks including Z, V, W, and H.

Take another example:
std::map<std::string, std::vector<FactorizedDimensionID>> rankToFactorizedDimensionID = {
  {"V", {0}},
  {"H", {3, 6}},
  {"E", {1}},
  {"Z", {4}},
  {"I", {6}},
  {"U", {5}},
  {"J", {2}},
  {"K", {3}},
  {"W", {2, 5}}
};
std::map<std::string, FactorizedDimensionID> dimensionNameToDimID = {
  {"C", 0}, {"M", 1}, {"R", 2}, {"S", 3}, {"N", 4}, {"P", 5}, {"Q", 6}
};
means that rank "W" is projected to dimensions "R" (dimID=2) and "P" (dimID=5), and rank "H" is projected to dimensions "S" (dimID=3) and "Q" (dimID=6).
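
To make the projection rule concrete, here is a minimal Python sketch of how a rank's sum-of-products (SOP) projection maps dimension coordinates to a rank coordinate. The helper `project` is hypothetical and not part of timeloop; it only mirrors the SOP form shown in the YAML comments above (e.g., W = R*Wdilation + P*Wstride for the Inputs data space).

```python
# Hypothetical illustration, not timeloop code: evaluate one rank's
# sum-of-products (SOP) projection for given dimension coordinates.

def project(sop, coords, coefficients):
    """Evaluate an SOP projection.

    sop          -- list of product terms, e.g. [["R", "Wdilation"], ["P", "Wstride"]]
    coords       -- current dimension coordinates, e.g. {"R": 1, "P": 5}
    coefficients -- coefficient values, e.g. {"Wdilation": 1, "Wstride": 2}
    """
    total = 0
    for term in sop:
        product = 1
        for name in term:
            # A name is either a dimension coordinate or a coefficient.
            product *= coords.get(name, coefficients.get(name, 1))
        total += product
    return total

# Inputs rank W projects as R*Wdilation + P*Wstride (see the YAML above):
w = project([["R", "Wdilation"], ["P", "Wstride"]],
            {"R": 1, "P": 5}, {"Wdilation": 1, "Wstride": 2})
# w = 1*1 + 5*2 = 11
```

Single-dimension ranks such as Weights' C are just the degenerate one-term case of the same rule.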

permutation specifies the loop order of all ranks; left to right runs from the innermost to the outermost loop.
factors specifies the factor of each rank in its individual loop.

type:

  • interline specifies how ranks are interleaved across memory lines.
  • intraline specifies how ranks are interleaved within a memory line.
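One plausible reading of these two factor sets (an assumption for illustration, not confirmed against the implementation): the product of a data space's intraline factors over its ranks gives the number of elements packed into one memory line, while the product of its interline factors gives the number of lines. A minimal Python sketch under that assumption, with `parse_factors` and `line_footprint` as hypothetical helpers:

```python
# Hypothetical sketch, not timeloop code: derive a data space's line
# footprint from interline/intraline factor strings.
from math import prod

def parse_factors(spec):
    """Parse a factors string like 'R=3 S=3 C=4' into a dict."""
    return {k: int(v) for k, v in (f.split("=") for f in spec.split())}

def line_footprint(intraline, interline, ranks):
    """Return (elements per line, number of lines) for the given ranks,
    assuming intraline factors pack within a line and interline factors
    count lines."""
    elems_per_line = prod(intraline[r] for r in ranks)
    num_lines = prod(interline[r] for r in ranks)
    return elems_per_line, num_lines

# Weights ranks [C, M, R, S] at MainMemory, using the factors from the
# example layout.yaml above:
intra = parse_factors("R=3 S=3 P=24 Q=24 C=4 M=1 N=1 H=27 W=27")
inter = parse_factors("R=1 S=1 P=1 Q=1 C=1 M=1 N=1 H=1 W=1")
per_line, lines = line_footprint(intra, inter, ["C", "M", "R", "S"])
# per_line = 4*1*3*3 = 36, lines = 1
```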
layout:
  - target: MainMemory
    type: interline
    factors: J=1 K=1 U=1 I=1 V=1 E=1 Z=1 H=1 W=1
    permutation: KJVIUEZHW

  - target: MainMemory
    type: intraline
    factors: J=3 K=3 U=24 I=24 V=4 E=1 Z=1 H=27 W=27
    permutation: KJVIUEZHW

  - target: GlobalBuffer
    type: interline
    factors: J=3 K=3 U=24 I=6 V=2 E=1 Z=1 H=27 W=27
    permutation: KJHWVIUEZ

  - target: GlobalBuffer
    type: intraline
    factors: J=1 K=1 U=1 I=4 V=2 E=1 Z=1 H=1 W=1
    permutation: KJWHVIUEZ

  - target: RegisterFile
    type: interline
    factors: J=1 K=1 U=1 I=1 V=1 E=1 Z=1 H=1 W=1
    permutation: KJVIUEZHW
Dimension Order: C-0, M-1, R-2, S-3, N-4, P-5, Q-6
Rank List: K J W H V I U E Z
Target: RegisterFile
 num_read_ports: 2, num_write_ports: 2
  Data space: Inputs
  Type: interline
    Rank: W dimension=(2,5)-(R,P), factor=1
    Rank: H dimension=(3,6)-(S,Q), factor=1
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=1
  Data space: Outputs
  Type: interline
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: U dimension=4-P, factor=1
    Rank: I dimension=3-Q, factor=1
  Data space: Weights
  Type: interline
    Rank: E dimension=5-M, factor=1
    Rank: V dimension=2-C, factor=1
    Rank: J dimension=1-R, factor=1
    Rank: K dimension=0-S, factor=1
  Data space: Inputs
  Type: intraline (Default: Order from dataSpaceNameToRanks, factor as 1 because of no specification in yaml)
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=1
    Rank: W dimension=(2,5)-(R,P), factor=1
    Rank: H dimension=(3,6)-(S,Q), factor=1
  Data space: Outputs
  Type: intraline (Default: Order from dataSpaceNameToRanks, factor as 1 because of no specification in yaml)
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: I dimension=3-Q, factor=1
    Rank: U dimension=4-P, factor=1
  Data space: Weights
  Type: intraline (Default: Order from dataSpaceNameToRanks, factor as 1 because of no specification in yaml)
    Rank: V dimension=2-C, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: J dimension=1-R, factor=1
    Rank: K dimension=0-S, factor=1

Target: GlobalBuffer
 num_read_ports: 2, num_write_ports: 2
  Data space: Inputs
  Type: interline
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=2
    Rank: W dimension=(2,5)-(R,P), factor=27
    Rank: H dimension=(3,6)-(S,Q), factor=27
  Data space: Outputs
  Type: interline
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: U dimension=4-P, factor=24
    Rank: I dimension=3-Q, factor=6
  Data space: Weights
  Type: interline
    Rank: E dimension=5-M, factor=1
    Rank: V dimension=2-C, factor=2
    Rank: J dimension=1-R, factor=3
    Rank: K dimension=0-S, factor=3
  Data space: Inputs
  Type: intraline
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=2
    Rank: W dimension=(2,5)-(R,P), factor=1
    Rank: H dimension=(3,6)-(S,Q), factor=1
  Data space: Outputs
  Type: intraline
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: U dimension=4-P, factor=1
    Rank: I dimension=3-Q, factor=4
  Data space: Weights
  Type: intraline
    Rank: E dimension=5-M, factor=1
    Rank: V dimension=2-C, factor=2
    Rank: J dimension=1-R, factor=1
    Rank: K dimension=0-S, factor=1

Target: MainMemory
   num_read_ports: 2, num_write_ports: 2
  Data space: Inputs
  Type: interline
    Rank: W dimension=(2,5)-(R,P), factor=1
    Rank: H dimension=(3,6)-(S,Q), factor=1
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=1
  Data space: Outputs
  Type: interline
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: U dimension=4-P, factor=1
    Rank: I dimension=3-Q, factor=1
  Data space: Weights
  Type: interline
    Rank: E dimension=5-M, factor=1
    Rank: V dimension=2-C, factor=1
    Rank: J dimension=1-R, factor=1
    Rank: K dimension=0-S, factor=1
  Data space: Inputs
  Type: intraline
    Rank: Z dimension=6-N, factor=1
    Rank: V dimension=2-C, factor=1
    Rank: W dimension=(2,5)-(R,P), factor=1
    Rank: H dimension=(3,6)-(S,Q), factor=1
  Data space: Outputs
  Type: intraline
    Rank: Z dimension=6-N, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: I dimension=3-Q, factor=1
    Rank: U dimension=4-P, factor=1
  Data space: Weights
  Type: intraline
    Rank: V dimension=2-C, factor=1
    Rank: E dimension=5-M, factor=1
    Rank: J dimension=1-R, factor=1
    Rank: K dimension=0-S, factor=1

Enjoy! :D

@angshuman-parashar self-assigned this Mar 14, 2025
@angshuman-parashar (Collaborator)

Clang on Darwin is unhappy about an unused variable:

src/model/buffer.cpp:1014:23: error: variable 'spatial_data_requirement' set but not used [-Werror,-Wunused-but-set-variable]
                  int spatial_data_requirement = 1;

It looks like the entire code block in lines 1014-1026 needs to be removed or moved into the #ifdef DEBUG section. Once you do that, the build completes successfully, and Simba regression tests pass.

@JianmingTONG (Collaborator, Author)

Updated! This was caused by a version issue on my end -- sorry for the trouble, and thanks for the help!

@angshuman-parashar (Collaborator)

I don't believe c266f5b5a3da38b9d2684cf387891c5776ae46de fixed the issue. I can still see line 1014 has the offending variable.

@JianmingTONG (Collaborator, Author) commented Mar 14, 2025

Thanks for the reminder! Fixed :D

@angshuman-parashar (Collaborator)

Another unused variable:

src/model/buffer.cpp:873:62: error: unused parameter 'compute_cycles' [-Werror,-Wunused-parameter]
                                         const std::uint64_t compute_cycles)

You can just add a (void) compute_cycles; statement at the beginning of the method to avoid this issue.

Also there are some indentation inconsistencies, e.g.,

       for (unsigned j = 0; j < problem::GetShape()->NumDataSpaces; j++)
          {
            if (problem::GetShape()->DataSpaceIDToName.at(j)

If possible please use either (preferred):

       for (unsigned j = 0; j < problem::GetShape()->NumDataSpaces; j++)
       {
         if (problem::GetShape()->DataSpaceIDToName.at(j)

or:

       for (unsigned j = 0; j < problem::GetShape()->NumDataSpaces; j++) {
         if (problem::GetShape()->DataSpaceIDToName.at(j)

@JianmingTONG (Collaborator, Author)

Fixed the unused compute_cycles parameter and corrected the formatting -- thanks!

@angshuman-parashar merged commit c072c92 into NVlabs:master Mar 14, 2025