Friday, January 3, 2025

Building Hardware Neurons with SystemVerilog: A Parameterizable Approach and Guided by an AI prompt; Part #1

Artificial Intelligence (AI) and Machine Learning (ML) are rapidly transforming many fields. A core component of many successful ML algorithms, especially neural networks, is the neuron. This post explores implementing a basic yet parameterizable neuron in SystemVerilog, a hardware description language used to design and verify digital circuits. The implementation was guided by an AI prompt, demonstrating the potential of AI-assisted hardware design.


Unsupervised Learning and Neural Networks

Unsupervised learning empowers algorithms to find patterns in unlabeled data, autonomously identifying relationships and structures. Neural networks, inspired by biological neurons, are highly effective in unsupervised learning tasks. These networks are built from interconnected neurons, each performing a relatively simple computation.


From Prompt to Hardware: Specifying a Neuron in SystemVerilog

To create a hardware neuron, a precise specification is crucial. We used the following prompt to guide the AI's generation of the SystemVerilog code:

"Write SystemVerilog parameterizable RTL code for the neuron from the picture. x inputs and theta should have separately parameterized width. x inputs and theta should have a separately parameterized number of inputs.  x inputs are unsigned. Internal signals and theta input parameters should be signed integers. Final output value limited to the range of -32768 to 32767, then finally transferred to the range 0 to 2*32767 and represented as a 16-bit wide signal, but the actual output is a 14-bit signal consisting of bits [15:2] of this 16-bit value."

This prompt emphasizes several key requirements:

  • Parameterization: The code must be flexible. It must allow the bit widths of the inputs (x) and the weights (theta) to be adjusted separately, and likewise the number of each.

  • Data Types: The x inputs are explicitly unsigned. The theta weights and all internal signals are signed integers, so negative weights and negative intermediate sums are handled correctly.

  • Output Range and Bit Width: The weighted sum must be clamped to [-32768, 32767] and offset into an unsigned 16-bit range. It is then output as a 14-bit signal made of bits [15:2] of that 16-bit value.

  • Matching Input Numbers: The prompt asks for the two counts to be parameterized separately. However, this neuron computes a simple dot product, so the number of x inputs must match the number of theta weights.
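Written as equations, the requirements above reduce to four steps. The symbols are our own shorthand, not taken from the prompt:

  • N is the common number of inputs.
  • s is the weighted sum and s_lim is the clamped sum.
  • s_shift is the offset value and h is the 14-bit output.

Taking bits [15:2] of a 16-bit value is the same as dividing by 4 and discarding the remainder.

$$
\begin{aligned}
s &= \sum_{i=0}^{N-1} \theta_i \, x_i \\
s_{\mathrm{lim}} &= \min\bigl(\max(s,\,-32768),\; 32767\bigr) \\
s_{\mathrm{shift}} &= s_{\mathrm{lim}} + 32768 \;\in\; [0,\, 65535] \\
h &= \left\lfloor s_{\mathrm{shift}} / 4 \right\rfloor \;\in\; [0,\, 16383]
\end{aligned}
$$

For example, take x = (200, 10, 0, 5) and theta = (3, -4, 7, 1). This gives s = 565, so s_shift = 33333 and h = 8333. Note that the top of the offset range is 32767 + 32768 = 65535, one more than the 2*32767 = 65534 quoted in the prompt.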


The SystemVerilog Implementation

Based on this prompt, the following SystemVerilog code was generated:

Code snippet

module neuron #(
  parameter int X_WIDTH          = 8,  // Width of each x input (unsigned)
  parameter int THETA_WIDTH      = 10, // Width of each theta weight (signed)
  parameter int NUM_X_INPUTS     = 4,  // Number of x inputs
  parameter int NUM_THETA_INPUTS = 4   // Number of theta weights (must equal NUM_X_INPUTS)
)(
  input  logic                          clk,
  input  logic                          rst_n,                         // Active-low, synchronous reset
  input  logic        [X_WIDTH-1:0]     x     [NUM_X_INPUTS-1:0],      // Unsigned inputs
  input  logic signed [THETA_WIDTH-1:0] theta [NUM_THETA_INPUTS-1:0],  // Signed weights
  output logic        [13:0]            h_x                            // Bits [15:2] of the offset 16-bit sum
);

  // Input matching check (essential for this neuron), evaluated at elaboration time
  if (NUM_X_INPUTS != NUM_THETA_INPUTS) begin : g_input_count_check
    $error("NUM_X_INPUTS must equal NUM_THETA_INPUTS in this neuron implementation.");
  end

  // Internal signals (signed integers)
  integer int_sum;                   // 32-bit signed weighted sum (combinational)

  // Two's complement reminder: if limited_sum[15] == 0, the value is non-negative and
  // its magnitude is limited_sum[14:0]. If limited_sum[15] == 1, the value is negative.
  // To get its magnitude, invert all 16 bits and add 1
  // (e.g. 16'hFFFB is -5, and ~16'hFFFB + 1 = 16'h0005).
  logic signed [15:0] limited_sum;   // int_sum clamped to [-32768, 32767]
  logic        [15:0] shifted_sum;   // limited_sum offset to [0, 65535]

  // Dot product. Each x[i] is zero-extended by one bit so it stays non-negative
  // as a signed operand. A plain signed'(x[i]) would make any x[i] with its MSB set negative.
  always_comb begin
    int_sum = 0;
    for (int i = 0; i < NUM_X_INPUTS; i++) begin
      int_sum += $signed({1'b0, x[i]}) * theta[i];
    end
  end

  always_ff @(posedge clk) begin
    if (!rst_n) begin
      limited_sum <= '0;
      shifted_sum <= '0;
      h_x         <= '0;
    end else begin
      // Stage 1: limit the sum to [-32768, 32767].
      // Maximum positive 16-bit signed value: 32767 = 16'h7FFF.
      // Minimum negative 16-bit signed value: -32768 = 16'h8000 (two's complement).
      if (int_sum > 32767)
        limited_sum <= 16'sh7FFF;
      else if (int_sum < -32768)
        limited_sum <= 16'sh8000;
      else
        limited_sum <= 16'(int_sum);

      // Stage 2: shift the range to [0, 65535]. The 16-bit addition wraps,
      // so this is equivalent to inverting limited_sum[15].
      shifted_sum <= limited_sum + 16'd32768;

      // Stage 3: output the 14-bit value [15:2]
      h_x <= shifted_sum[15:2];
    end
  end

endmodule

Explanation:

  • Parameters: X_WIDTH, THETA_WIDTH, NUM_X_INPUTS, and NUM_THETA_INPUTS provide flexibility. As the prompt requires, the number of x inputs and the number of theta weights have separate parameters.

  • Input Matching Check: An if statement at module level is evaluated at elaboration time. It calls $error when the number of x inputs differs from the number of theta weights, so a mismatched configuration is flagged before simulation or synthesis. This check is essential for a dot-product-based neuron.

  • Inputs: x and theta are unpacked arrays with parameterized element widths and element counts. x is unsigned, while theta is signed.

  • Internal Signals: int_sum holds the weighted sum. limited_sum holds the clamped sum, and shifted_sum holds that value offset into the unsigned range.

  • Dot Product and Zero Extension: An always_comb block computes int_sum with a for loop. x is unsigned and theta is signed, so each x[i] is zero-extended with $signed({1'b0, x[i]}) before the multiplication. A plain signed'(x[i]) cast would keep the X_WIDTH-bit width and reinterpret any input whose MSB is set as negative. With X_WIDTH = 8, for example, x = 200 would become -56.

  • Sequential Logic: The always_ff block forms a three-stage pipeline with a synchronous, active-low reset. limited_sum, shifted_sum, and h_x are each registered, so h_x reflects a given set of inputs three clock cycles after they are applied.

  • Limiting and Shifting: The sum is saturated to [-32768, 32767] and then offset by 32768 into [0, 65535]. In 16-bit arithmetic, adding 32768 is the same as inverting the sign bit.

  • Output: h_x takes bits [15:2] of shifted_sum. This is the offset value divided by 4 with the remainder discarded, giving a 14-bit result in [0, 16383].
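The pipeline makes h_x hard to check by eye, so it can help to compare the RTL against a bit-accurate software model. The package below is a minimal sketch under our own assumptions:

  • The names neuron_ref_pkg, neuron_ref, x_vec_t, theta_vec_t, and neuron_ref_demo are hypothetical.
  • The sizes are fixed to the module's default parameters.
  • The three-cycle latency is not modeled. A testbench would therefore compare neuron_ref(x, theta) with h_x three clock cycles after driving x and theta.

```systemverilog
// Hypothetical bit-accurate reference model of the neuron's arithmetic,
// fixed to the default parameters (X_WIDTH = 8, THETA_WIDTH = 10, 4 inputs).
// It does not model the three-cycle pipeline latency.
package neuron_ref_pkg;
  localparam int X_WIDTH     = 8;
  localparam int THETA_WIDTH = 10;
  localparam int NUM_INPUTS  = 4;

  typedef logic        [X_WIDTH-1:0]     x_vec_t     [NUM_INPUTS];
  typedef logic signed [THETA_WIDTH-1:0] theta_vec_t [NUM_INPUTS];

  function automatic logic [13:0] neuron_ref(x_vec_t x, theta_vec_t theta);
    int                 acc = 0;
    logic signed [15:0] limited;
    logic        [15:0] shifted;

    // Dot product with x zero-extended so it is never treated as negative
    foreach (x[i])
      acc += $signed({1'b0, x[i]}) * theta[i];

    // Saturate to the 16-bit signed range
    if (acc > 32767)       limited = 16'sh7FFF;
    else if (acc < -32768) limited = 16'sh8000;
    else                   limited = 16'(acc);

    // Offset into [0, 65535] (wraps in 16 bits), then keep bits [15:2]
    shifted = limited + 16'd32768;
    return shifted[15:2];
  endfunction
endpackage

module neuron_ref_demo;
  import neuron_ref_pkg::*;

  x_vec_t     x;
  theta_vec_t theta;

  initial begin
    x     = '{8'd200, 8'd10, 8'd0, 8'd5};
    theta = '{10'sd3, -10'sd4, 10'sd7, 10'sd1};
    // 600 - 40 + 0 + 5 = 565 -> 565 + 32768 = 33333 -> bits [15:2] = 8333
    $display("neuron_ref = %0d", neuron_ref(x, theta)); // prints "neuron_ref = 8333"
  end
endmodule
```

The model deliberately uses the same zero extension, clamp, and bit slice as the RTL, so a mismatch points to a specific stage. The same inputs also exercise the zero-extension issue: reinterpreting x = 200 as -56 would give a different result.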

Conclusion

This SystemVerilog implementation provides an accurate and parameterizable hardware model of a neuron. The explicit handling of signed and unsigned data, the input matching check, and the saturating output stage make it a solid starting point for real hardware design. Parameterizing the number of inputs also makes the module more versatile.

This building block can be used to construct more complex neural networks in hardware, potentially leading to significant performance gains in specialized applications.

The use of an AI prompt to generate this code highlights the potential of AI in hardware design.

© 2025 ASIC Stoic. All rights reserved.