The Mamba model transformer with a language modeling head on top (a linear layer with weights tied to the input embeddings).
It starts with a linear projection that expands the input embeddings. Then, a convolution is applied ahead of the SiLU activation.
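
As a rough illustration of the pieces named above, the following pure-Python sketch runs an expanding linear projection, a causal per-channel convolution, and a language modeling head that reuses the embedding matrix. All sizes, weights, and function names (`matvec`, `causal_depthwise_conv`, `lm_head`) are hypothetical toy choices made for this example, not the library's implementation; the convolution is assumed to be causal and per-channel, and the head is applied directly to the embeddings only to show the weight tying.

```python
# Illustrative sketch only: toy sizes and hand-picked weights,
# not the actual Mamba implementation.

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def causal_depthwise_conv(sequence, kernels):
    """Per-channel causal 1D convolution over a list of token vectors.

    sequence: list of T vectors, each with C channels.
    kernels:  list of C kernels of length K; kernels[c][-1] weights the
              current token, kernels[c][0] the oldest token in the window.
    Positions before the start of the sequence count as zeros, so the
    output at step t never depends on tokens after t.
    """
    kernel_size = len(kernels[0])
    output = []
    for t in range(len(sequence)):
        row = []
        for c, kernel in enumerate(kernels):
            acc = 0.0
            for k, weight in enumerate(kernel):
                source = t - (kernel_size - 1) + k
                if source >= 0:
                    acc += weight * sequence[source][c]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical toy parameters: vocabulary of 3 tokens, hidden size 2.
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Expanding projection: hidden size 2 -> inner size 4.
expand_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
# One kernel of size 2 per inner channel.
conv_kernels = [[0.5, 0.5]] * 4

token_ids = [0, 1, 2]
hidden = [embedding[i] for i in token_ids]              # embedding lookup
expanded = [matvec(expand_weight, h) for h in hidden]   # linear expansion
mixed = causal_depthwise_conv(expanded, conv_kernels)   # mixes neighbouring tokens


def lm_head(hidden_state):
    """Tied head: reuses the embedding matrix instead of a separate weight."""
    return matvec(embedding, hidden_state)


# A real model would apply the head to the final hidden states after all
# blocks; the raw embedding is used here purely to demonstrate the tying.
logits = lm_head(hidden[-1])
```

Tying the head to the embedding matrix means the output layer adds no parameters of its own: each logit is the dot product between a hidden state and the corresponding token's embedding.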