3. Configure the Core

The Chromite generator provides the user with a number of customization hooks at both the ISA and the micro-architectural levels. By changing a simple configuration file, the user can generate an instance of the core ranging in size from embedded micro-controllers to Linux-capable high-performance cores, or anywhere in between. The typical flow of configuration is shown in Fig. 3.1.

_images/configure.png

Fig. 3.1 Configuration Flow

3.1. ISA Level Configurations

In RISC-V, both the Unprivileged and the Privileged specs offer a large number of choices for configuring an implementation. The Unprivileged spec offers various extensions and sub-extensions, such as Multiply-Divide, Floating Point, Atomic and Compressed, which a user can choose to implement or not.

The Privileged spec, on the other hand, provides a much larger space of configurability to the user. Apart from choosing which privilege modes to implement (Machine, Hypervisor, Supervisor or User), the spec also provides a large number of Control and Status Registers (CSRs) which impact various aspects of the RISC-V system. For example, the misa CSR can be used to dynamically enable or disable execution of certain sub-extensions. Similarly, the valid and legal values of the satp.mode field indicate which paging schemes are supported by the underlying implementation.

To capture all such possible choices of the RISC-V ISA in a single standard format, InCore has proposed the RISCV-CONFIG YAML format, which has also been adopted by the riscv-community, primarily for the ISA compatibility framework. The Chromite core generator uses the same YAML inputs to control various ISA level features of the core.

3.1.1. Generating CSRs

For implementing the CSR module, Chromite uses the CSR-BOX utility to automatically create a bsv module which implements all the necessary CSRs as per the input YAML specification provided in riscv-config format. An example of the isa YAML is provided here. CSR-BOX ensures that the WARL functions specified in the YAML are faithfully replicated in bsv. Along with the CSRs, CSR-BOX also provides methods and logic to handle traps and xRET instructions based on the privilege modes (U, S, H) defined in the ISA node of the input yaml.

Note that CSR-BOX allows one to split the CSRs in a daisy-chain fashion to reduce the impact on timing when instantiating a large number of CSRs. Thus, apart from the isa yaml, CSR-BOX also requires a grouping yaml file which indicates which daisy-chain unit should contain which set of CSRs.

CSR-BOX also takes in an optional debug spec yaml (as defined by riscv-config) to capture basic debug-related information, such as where the debug parking loop code is placed in the memory map. Providing the debug spec also directs CSR-BOX to implement the necessary logic for handling custom debug interrupts like halt, resume and step. The debug CSRs must be defined in the debug spec. TODO provide example LINK

CSR-BOX also allows the user to define custom CSRs that may be required by the implementation. Chromite uses a custom CSR to control the enabling/disabling of caches and branch predictors. The details of this CSR are provided here. An example YAML containing the definition of these CSRs, which can be fed into CSR-BOX, is available here.

3.1.2. Other Derived Configuration Settings

Other than the CSRs, Chromite derives the following parameters from the input isa yaml:

  • The ISA string indicates which extensions are enabled in hardware and in its associated collaterals.

  • The max value in the supported_xlen node sets the xlen variable in Chromite. This is used to define the width of the integer register file, alu operations, bypass width, virtual address size, etc.

  • The flen variable in Chromite is set based on the presence of ‘F’ or ‘D’ characters in the ISA string.

  • If the ‘S’ extension is present in the ISA string, then Chromite determines the supervisor page translation mode to be implemented from the max legal value of the satp.mode csr field present in the input yaml.

  • The asid length to be used in the implementation is also derived by checking legal values of the satp.asid csr field.

  • The size of the physical address to be implemented is derived from the physical_addr_sz node of the isa yaml

  • The number of mhpmcounters (and therefore mhpmevents) and their behavior is also captured from the csrs defined in the input isa yaml

  • The number of pmp entries and their granularity are also captured from the input isa yaml.

  • Custom interrupts/exceptions and their cause values are also captured from the input isa yaml. The implementation creates an entry in the defines file for the name and cause value. The usage of these custom causes needs to be implemented separately in the bsv code.

  • The max size of the cause field in the mcause csr is also derived by checking for the max cause value being used after accounting for the custom interrupts and exceptions.
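
The derivations listed above can be illustrated with a minimal Python sketch. It is not Chromite's actual code: the function name derive_core_params and the flattened dictionary layout are assumptions made for illustration, and only the ISA, supported_xlen and physical_addr_sz nodes named above are used. The mapping of ‘F’ to a 32-bit and ‘D’ to a 64-bit flen follows the standard RISC-V definition of those extensions.

```python
import re

def derive_core_params(isa_node):
    """Derive a few core parameters from a riscv-config style ISA node (sketch)."""
    # Single-letter extensions follow the RV<xlen> prefix and stop at the
    # first multi-letter extension (e.g. Zicsr) or underscore.
    match = re.match(r"RV(32|64|128)([A-Y]*)", isa_node["ISA"])
    if match is None:
        raise ValueError("unrecognised ISA string: " + isa_node["ISA"])
    letters = match.group(2)

    # D implies 64-bit floating-point registers, F alone implies 32-bit ones.
    if "D" in letters:
        flen = 64
    elif "F" in letters:
        flen = 32
    else:
        flen = 0

    return {
        "xlen": max(isa_node["supported_xlen"]),
        "flen": flen,
        "supervisor": "S" in letters,
        "paddr_size": isa_node["physical_addr_sz"],
    }

# Hypothetical input values, chosen only to exercise the function.
example = {
    "ISA": "RV64IMAFDCSUZicsr_Zifencei",
    "supported_xlen": [64],
    "physical_addr_sz": 32,
}
params = derive_core_params(example)
```

Settings that depend on individual CSR fields (satp.mode, satp.asid, the pmp and mhpm counters) are left out of the sketch because their encoding in the yaml is not described here.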

3.2. Micro-Architectural Configuration hooks

The Chromite core has also defined a custom schema to control various micro-architectural features of the core. A sample configuration file is available here

The following provides a list and description of the configuration hooks available at the micro-architectural level. Note, there are also hooks in this configuration which control the bluespec compilation commands and the verilator commands as well.

3.2.1. num_harts

  • Description: Total number of harts to be instantiated in the dummy test-soc. Note that these will be non-coherent cores simply acting as masters on the fast-bus.

  • Examples:

    num_harts: 2
    

3.2.2. overlap_redirections

  • Description: When set to True, this field indicates that the branch resolution and the latching of the new PC to the I$ happen in the same cycle. When set to False, there is a single-cycle latency between branch resolution and the new PC being latched to the I$.

  • Examples:

    overlap_redirections: True
    

3.2.3. isb_sizes

  • Description: A dictionary controlling the size of the inter-stage buffers of the pipeline. The variable isb_s0s1 controls the size of the isb between stage0 and stage1. Similarly isb_s1s2 dictates the size of the isb between stage1 and stage2 and so on. By increasing isb_s0s1 and isb_s1s2 one can shadow the stalls or latencies in the backend stages of the pipeline by fetching more instructions into the front-end stages of the pipeline.

    There is a restriction, however, that isb_s2s3 should always be 1. This is because the outputs of the register file accessed in stage2 are not buffered, and neither is the bypass scheme implemented to handle this scenario.

    One can however increase the number of in-flight instructions by increasing the sizes of isb_s3s4 and isb_s4s5 (increasing isb_s3s4 has a larger impact).

    Also note that if write-after-write stalls are disabled, the size of the wawid is defined by the sum of isb_s3s4 and isb_s4s5. Therefore, increasing the number of in-flight instructions causes a logarithmic increase in the size of the wawid used for maintaining bypass of operands.

  • Examples:

    isb_sizes :
      isb_s0s1: 2
      isb_s1s2: 2
      isb_s2s3: 1
      isb_s3s4: 2
      isb_s4s5: 2
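
The relationship between the ISB sizes and the wawid can be sketched in Python as follows. The function name wawid_bits is hypothetical, and the width formula (the ceiling of log2 of the number of in-flight instructions) is an assumption inferred from the "logarithmic increase" described above, not taken from the Chromite sources.

```python
def wawid_bits(isb_sizes):
    """Estimate the wawid width implied by the ISB sizes (assumed ceil(log2(n)))."""
    if isb_sizes["isb_s2s3"] != 1:
        raise ValueError("isb_s2s3 must be 1")
    in_flight = isb_sizes["isb_s3s4"] + isb_sizes["isb_s4s5"]
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1; keep at least 1 bit.
    return max(1, (in_flight - 1).bit_length())

# The values from the example above.
sizes = {"isb_s0s1": 2, "isb_s1s2": 2, "isb_s2s3": 1, "isb_s3s4": 2, "isb_s4s5": 2}
bits = wawid_bits(sizes)
```

Under this assumption, doubling isb_s3s4 and isb_s4s5 from 2 to 4 adds only a single bit to the id.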
    

3.2.4. merged_rf

  • Description: Boolean field to indicate if the architectural register files for floating-point and integer should be implemented as a single extended register file in hardware or as separate ones. This field only makes sense when ‘F’ support is enabled in the ISA string of the input isa yaml. Under certain targets like FPGA, or certain technologies, maintaining a single register file might lead to better area and timing.

  • Examples:

    merged_rf: True
    

3.2.5. total_events

  • Description: This field indicates the total number of events that can be used to program the mhpm counters. It is used to capture the size of the event signals that drive the counters.

  • Examples:

    total_events: 28
    

3.2.6. waw_stalls

  • Description: Indicates if stalls must occur on a WAW hazard. If you are looking for higher performance, set this to False. Setting this to True leads to instructions stalling in stage3 on a WAW hazard.

    Setting this to False also means that the scoreboard will allocate a unique id to the destination register of every instruction that is offloaded for execution. The size of this id depends on the number of in-flight instructions after the execution stage, which in turn depends on the sizes of isb_s3s4 and isb_s4s5 as defined above.

  • Examples:

    waw_stalls: False
    

3.2.7. iepoch_size

  • Description: Integer value indicating the size of the epochs for the instruction memory subsystem. The only allowed value is 2.

  • Examples:

    iepoch_size: 2
    

3.2.8. depoch_size

  • Description: Integer value indicating the size of the epochs for the data memory subsystem. The only allowed value is 1.

  • Examples:

    depoch_size: 1
    

3.2.9. s_extension

  • Description: Describes various supervisor and MMU related parameters. These parameters only take effect when “S” is present in the ISA field.

    • sfence_i_complexity: string indicating the complexity of the sfence operation supported in the ITLB. Valid values are: [‘simple’, ‘complex’]. With a simple sfence, all TLB entries are flushed upon receiving an sfence operation. The working of a complex sfence is described in section 4.2.1 of the RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.11.

    • sfence_d_complexity: string indicating the complexity of the sfence operation supported in the DTLB. Valid values are: [‘simple’, ‘complex’]. With a simple sfence, all TLB entries are flushed upon receiving an sfence operation. The working of a complex sfence is described in section 4.2.1 of the RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.11.

    • dtlb_config: a dictionary which contains the implementation parameters of the DTLB. This dictionary can contain one of set_associative, fully_associative or dummy_tlb, the descriptions of which are given below. The itlb_config dictionary, used in the examples below, takes the same form for the ITLB.

      • set_associative: indicates the implementation of a set-associative TLB. Note here that parameters not applicable under a given virtualization mode will be ignored by the configuration framework, but they will need to be set to some values for the schema checker to pass.

        • 4kb: dictionary of parameters for 4kb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

        • 4mb: dictionary of parameters for 4mb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

        • 2mb: dictionary of parameters for 2mb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

        • 1gb: dictionary of parameters for 1gb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

        • 512gb: dictionary of parameters for 512gb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

        • 256tb: dictionary of parameters for 256tb pages

          • ways : integer indicating the number of ways for this splitTLB

          • sets : integer indicating the number of sets for this splitTLB

          • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

      • fully_associative: indicates the implementation of a fully-associative TLB with the following parameters

        • tlb_size: integer indicating the number of entries in the TLB

        • replacement: integer indicating the replacement algorithm. 0- Random, 1- round-robin, 2 pseudo LRU. Currently only Random is supported.

      • dummy_tlb: indicates that no TLB is present, just a functional interface between the caches and the ptwalk

  • Examples:

    The following is an example of configuring fully-associative TLBs with 4 entries each and a random replacement policy

    s_extension:
      sfence_i_complexity: simple
      sfence_d_complexity: complex
      dtlb_config:
        fully_associative:
          tlb_size: 4
          replacement: 0
      itlb_config:
        fully_associative:
          tlb_size: 4
          replacement: 0
    

    The following is an example of configuring set-associative TLB in sv39

    s_extension:
      sfence_i_complexity: simple
      sfence_d_complexity: complex
      dtlb_config:
        set_associative:
          4kb: {ways: 4, sets: 2, replacement: 0}
          2mb: {ways: 2, sets: 2, replacement: 0}
          1gb: {ways: 1, sets: 1, replacement: 0}
      itlb_config:
        set_associative:
          4kb: {ways: 4, sets: 2, replacement: 0}
          2mb: {ways: 2, sets: 2, replacement: 0}
          1gb: {ways: 1, sets: 1, replacement: 0}
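
To compare TLB configurations, it can help to count the entries they describe. The following Python sketch does this for the two documented forms. The helper names are hypothetical, the dummy_tlb form is not handled because its parameters are not described here, and the replacement check reflects the statement above that only Random (0) is currently supported.

```python
def check_replacement(params):
    """Reject replacement policies other than Random (0), the only one supported."""
    if params["replacement"] != 0:
        raise ValueError("only replacement: 0 (Random) is currently supported")

def tlb_entries(tlb_config):
    """Total number of entries described by a dtlb_config or itlb_config node."""
    if "fully_associative" in tlb_config:
        params = tlb_config["fully_associative"]
        check_replacement(params)
        return params["tlb_size"]
    total = 0
    for params in tlb_config["set_associative"].values():
        check_replacement(params)
        total += params["ways"] * params["sets"]
    return total

# The two dtlb_config examples above, written as Python dictionaries.
fully = {"fully_associative": {"tlb_size": 4, "replacement": 0}}
sv39 = {
    "set_associative": {
        "4kb": {"ways": 4, "sets": 2, "replacement": 0},
        "2mb": {"ways": 2, "sets": 2, "replacement": 0},
        "1gb": {"ways": 1, "sets": 1, "replacement": 0},
    }
}
fully_total = tlb_entries(fully)
sv39_total = tlb_entries(sv39)
```

For the sv39 example this gives 4*2 + 2*2 + 1*1 = 13 entries, against 4 entries for the fully-associative example.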
    

3.2.10. a_extension

  • Description: Describes various A-extension related parameters. These parameters take effect only when the “A” extension is enabled in the riscv_config ISA.

    • reservation_size: integer indicating the size of the reservation in bytes. The value must be a power of 2, with a minimum of 4. For an RV64 system the minimum should be 8 bytes.

  • Examples:

    a_extension:
      reservation_size: 8
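
The constraints on reservation_size can be expressed as a short check. This is a sketch with a hypothetical function name, not the schema check that Chromite itself performs.

```python
def check_reservation_size(size, xlen):
    """Return True if size is a power of 2 and meets the minimum for this xlen."""
    minimum = 8 if xlen == 64 else 4
    # A positive power of two has exactly one bit set.
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and size >= minimum

ok_rv64 = check_reservation_size(8, 64)
too_small_rv64 = check_reservation_size(4, 64)
```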
    

3.2.11. m_extension

  • Description: Describes various M-extension related parameters. These parameters take effect only if “M” is present in the ISA field. The multiplier used in the core is a retimed one. The parameters below indicate the number of input and output registers around the combinational block to enable retiming.

    • mul_stages_out: Number of stages to be inserted after the multiplier combinational block. Minimum value is 1.

    • mul_stages_in: Number of stages to be inserted before the multiplier combinational block. Minimum value is 0

    • div_stages: an integer indicating the number of cycles for a single division operation. Max value is limited to the XLEN defined in the ISA.

  • Examples:

    m_extension:
      mul_stages_in  : 2
      mul_stages_out : 2
      div_stages: 32
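
The limits stated for the M-extension parameters can be collected into one check, sketched below. The function name and the error messages are hypothetical. Only the limits given above are tested, so no lower bound is assumed for div_stages.

```python
def check_m_extension(cfg, xlen):
    """Return a list of violations of the documented m_extension limits."""
    errors = []
    if cfg["mul_stages_out"] < 1:
        errors.append("mul_stages_out must be at least 1")
    if cfg["mul_stages_in"] < 0:
        errors.append("mul_stages_in must be at least 0")
    if cfg["div_stages"] > xlen:
        errors.append("div_stages must not exceed XLEN")
    return errors

# The example above, checked against an assumed 64-bit core.
m_cfg = {"mul_stages_in": 2, "mul_stages_out": 2, "div_stages": 32}
m_errors = check_m_extension(m_cfg, 64)
```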
    

3.2.12. fd_extension

  • Description: Captures the number of stages to add in between each stage of the FMA modules. Each FMA module has 4 stages, namely Pre, Mac, Post and Round. The inputs should be passed once through each stage to obtain the final result, hence the stages are connected one after the other. Registers can be inserted in between the stages to enable retiming. Different numbers of registers produce different delays and enable retiming for different technologies. Two different stages can also be clubbed together before inserting a register. Registers can be inserted on the input and output sides of each stage.

    • spfma: Describes the configuration for single precision FMA.

      • stage<n>: Describes the configuration for the stage. n can take values from 1 to 4.

        • mod: Module name for the stage. The stages should be set up such that all the 4 steps are performed once for each input. The allowed configurations for each stage are as follows:

        • in: Number of registers to be added on the input side of the stage.

        • out: Number of registers to be added on the output side of the stage.

    • dpfma: Describes the configuration for double precision FMA. The node follows the same format as the spfma node.

    • ordering_depth: Number of inflight Floating point instructions in the pipeline. For maximum throughput this should be two greater than the sum of all registers in spfma and dpfma.
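
The rule for ordering_depth can be worked through with a small Python sketch. The function name is hypothetical and the dictionary layout (stage1 to stage4 under spfma and dpfma, each with in and out counts) is assumed from the description above. The mod entries are omitted because they do not affect the register count.

```python
def recommended_ordering_depth(fd_cfg):
    """Sum all in/out registers in spfma and dpfma, then add two."""
    total = 0
    for fma in ("spfma", "dpfma"):
        for stage in fd_cfg[fma].values():
            total += stage["in"] + stage["out"]
    return total + 2

# Hypothetical configuration: one output register after each of the 4 stages.
one_reg_per_stage = {"stage%d" % n: {"in": 0, "out": 1} for n in range(1, 5)}
fd_cfg = {"spfma": one_reg_per_stage, "dpfma": one_reg_per_stage}
depth = recommended_ordering_depth(fd_cfg)
```

With 4 registers in each of spfma and dpfma, the recommended ordering_depth under this rule is 4 + 4 + 2 = 10.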

3.2.13. branch_predictor

  • Description: Describes various branch predictor related parameters.

    • instantiate: boolean value indicating if the predictor needs to be instantiated

    • predictor: string indicating the type of predictor to be implemented. Valid values are: [‘gshare’]

    • btb_depth: integer indicating the size of the branch target buffer

    • bht_depth: integer indicating the size of the branch history buffer

    • history_len: integer indicating the size of the global history register

    • history_bits: integer indicating the number of bits used for indexing bht/btb.

    • ras_depth: integer indicating the size of the return address stack.

  • Examples:

    branch_predictor:
      instantiate: True
      predictor: gshare
      btb_depth: 32
      bht_depth: 512
      history_len: 8
      history_bits: 5
      ras_depth: 8

3.2.14. icache_configuration

  • Description: Describes the various instruction cache related features.

    • instantiate: boolean value indicating if the cache needs to be instantiated or not. Valid values are: [‘enable’, ‘disable’]

    • sets: integer indicating the number of sets in the cache

    • word_size: integer indicating the number of bytes in a word. Fixed to 4.

    • block_size: integer indicating the number of words in a cache-block.

    • ways: integer indicating the number of the ways in the cache

    • fb_size: integer indicating the number of fill-buffer entries in the cache

    • replacement: strings indicating the replacement policy. Valid values are: [“PLRU”, “RR”, “Random”]

    • ecc_enable: boolean field indicating if ECC should be enabled on the cache.

    • one_hot_select: boolean value indicating if the bsv one-hot selection function should be used instead of conventional for-loops to choose amongst lines/fb-lines. This choice has no effect on the functionality.

    If supervisor is enabled, then the max size of a single way should not exceed 4 kilobytes.

  • Examples:

    icache_configuration:
      instantiate: True
      sets: 4
      word_size: 4
      block_size: 16
      ways: 4
      fb_size: 4
      replacement: "PLRU"
      ecc_enable: false
      one_hot_select: false
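
The limit on the size of a single way can be checked from the cache parameters, as in the following sketch. The function names are hypothetical. The sketch assumes that a way holds sets x block_size x word_size bytes and that 4 kilobytes means 4096 bytes.

```python
def way_size_bytes(cache_cfg):
    """Bytes in one way: sets x words per block x bytes per word."""
    return cache_cfg["sets"] * cache_cfg["block_size"] * cache_cfg["word_size"]

def way_fits_limit(cache_cfg, supervisor_enabled):
    """Apply the 4 kilobyte per-way limit only when supervisor is enabled."""
    return (not supervisor_enabled) or way_size_bytes(cache_cfg) <= 4096

# The example above: 4 sets x 16 words x 4 bytes.
icache = {"sets": 4, "word_size": 4, "block_size": 16, "ways": 4}
icache_way = way_size_bytes(icache)
icache_ok = way_fits_limit(icache, supervisor_enabled=True)
```

The example configuration uses 256 bytes per way, so it stays well inside the limit. With the same block and word sizes, 64 sets would reach the limit exactly.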
    

3.2.15. dcache_configuration

  • Description: Describes the various data cache related features.

    • instantiate: boolean value indicating if the cache needs to be instantiated or not. Valid values are: [‘enable’, ‘disable’]

    • sets: integer indicating the number of sets in the cache

    • word_size: integer indicating the number of bytes in a word. Fixed to 4.

    • block_size: integer indicating the number of words in a cache-block.

    • ways: integer indicating the number of the ways in the cache

    • fb_size: integer indicating the number of fill-buffer entries in the cache

    • sb_size: integer indicating the number of store-buffer entries in the cache. Fixed to 2

    • lb_size: integer indicating the number of lines to be stored in the store buffer. Applicable only when rwports == 1r1w

    • ib_size: integer indicating the number of io-buffer entries in the cache. Defaults to 2

    • replacement: strings indicating the replacement policy. Valid values are: [“PLRU”, “RR”, “Random”]

    • ecc_enable: boolean field indicating if ECC should be enabled on the cache.

    • one_hot_select: boolean value indicating if the bsv one-hot selection function should be used instead of conventional for-loops to choose amongst lines/fb-lines. This choice has no effect on the functionality.

    • rwports: number of read-write ports available on the brams. Allowed values are 1rw, 1r1w and 2rw

    If supervisor is enabled, then the max size of a single way should not exceed 4 kilobytes.

  • Examples:

    dcache_configuration:
      instantiate: True
      sets: 4
      word_size: 4
      block_size: 16
      ways: 4
      fb_size: 4
      sb_size: 2
      lb_size: 2
      ib_size: 2
      replacement: "PLRU"
      ecc_enable: false
      one_hot_select: false
      rwports: 1r1w
    

3.2.16. reset_pc

  • Description: Integer value indicating the reset value of program counter

  • Example:

3.2.17. bus_protocol

  • Description: bus protocol for the master interfaces of the core. Fixed to “AXI4”

  • Examples:

    bus_protocol: AXI4
    

3.2.18. verilator_configuration

  • Description: describes the various configurations for verilator compilation.

    • coverage: indicates the type of coverage that the user would like to track. Valid values are: [“none”, “line”, “toggle”, “all”]

    • trace: boolean value indicating if vcd dumping should be enabled.

    • threads: an integer field indicating the number of threads to be used during simulation

    • verbosity: a boolean field indicating if the verbose/display statements in the generated verilog should be compiled or not.

    • out_dir: name of the directory where the final executable will be dumped.

    • opt_fast: gcc flags for compiling the fast components of the design in verilator

    • opt_slow: gcc flags for compiling the slow components of the design in verilator

    • opt: gcc flags for compiling the design in verilator

  • Examples:

    verilator_configuration:
      coverage: "none"
      trace: False
      threads: 1
      verbosity: True
      open_ocd: False
      sim_speed: fast
    

3.2.19. bsc_compile_options

  • Description: Describes the various bluespec compile options

    • test_memory_size: size of the BRAM memory in the test-SoC in bytes. Default is 32MB

    • assertions: boolean value indicating if assertions used in the design should be compiled or not

    • trace_dump: boolean value indicating if the logic to generate a simple trace should be implemented or not. Note this is only for simulation and not a real trace

    • trace_dump_limit: sets the limit on the number of instructions in the rtl.dump file. If the number of instructions crosses the limit, a new file is created and the log continues there.

    • compile_target: a string indicating if the bsv files are being compiled for simulation or for asic/fpga synthesis. The valid values are: [‘sim’, ‘asic’, ‘fpga’]

    • suppress_warnings: List of warnings which can be suppressed during bluespec compilation. Valid values are: [“none”, “all”, “G0010”, “T0054”, “G0020”, “G0024”, “G0023”, “G0096”, “G0036”, “G0117”, “G0015”]

    • ovl_assertions: boolean value indicating if OVL based assertions must be turned on/off

    • ovl_path: string indicating the path where the OVL library is installed.

    • sva_assertions: boolean value indicating if SVA based assertions must be turned on/off

    • verilog_dir: the directory name of where the generated verilog will be dumped

    • open_ocd: a boolean field indicating if the test-bench should have an open-ocd vpi enabled.

    • build_dir: the directory name where the bsv build files will be dumped

    • top_module: name of the top-level bluespec module to be compiled.

    • top_file: file containing the top-level module.

    • top_dir: directory containing the top_file.

    • cocotb_sim: boolean variable. When set, the terminating conditions in the test-bench environment are disabled, as the cocotb environment is meant to handle that. When set to false, the bluespec test-bench holds the terminating conditions.

  • Examples:

    bsc_compile_options:
      assertions: True
      trace_dump: True
      suppress_warnings: "none"
      top_module: mkTbSoc
      top_file: TbSoc
      top_dir: base_sim
      out_dir: bin
    

3.2.20. asic_params

  • Description: This node captures the parameters required to generate ASIC physical design scripts, such as synthesis scripts.

    • tech_size: an integer indicating the size of the technology, e.g. 65, 90, etc.

    • frequency_mhz: an integer indicating the target frequency in MHz.

  • Examples:

    asic_params:
      tech_size: 65
      frequency: 600

3.2.21. noinline_modules

  • Description: This node contains multiple module names, each of which takes a boolean value. Setting a module to True generates a separate verilog file for that module during bluespec compilation. If set to False, that module will be inlined into the module above it in the hierarchy in the generated verilog.

  • Examples:

    noinline_modules:
      stage0: False
      stage1: True
      stage2: False
      stage3: False