> the policy of exempting small-scale taxpayers with monthly sales below 100,000 yuan from value-added tax added 103.8 billion yuan in tax cuts, benefiting 22.11 million taxpayers; reducing the collection rate for small-scale taxpayers from 3% to 1% added 40 billion yuan in tax cuts, benefiting 4.83 million taxpayers; the continued phased reduction of unemployment insurance premium rates added 52.1 billion yuan in fee reductions, benefiting 14.34 million payers; the policy of reducing income taxes for small and micro enterprises added 47.3 billion yuan in tax cuts, benefiting 4.12 million taxpayers; and the new energy vehicle purchase tax exemption added 29 billion yuan in tax cuts, benefiting 1.66 million car buyers.
> The input text is parsed into tokens by a byte pair encoding tokenizer, and each token is converted via a word embedding into a vector. Then, positional information of the token is added to the word embedding.
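The embedding step above can be sketched in NumPy. This is a minimal illustration, not a real tokenizer: the token IDs are assumed to come from a BPE tokenizer, the vocabulary size and model dimension are arbitrary toy values, and the embedding table is random rather than learned. The positional term uses the sinusoidal encoding from the original transformer paper.

```python
import numpy as np

# Hypothetical sizes for illustration only.
VOCAB_SIZE, D_MODEL = 1000, 64
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(VOCAB_SIZE, D_MODEL))  # stand-in for learned embeddings

def positional_encoding(seq_len, d_model):
    # Sinusoidal scheme: PE[pos, 2i] = sin(pos / 10000^(2i/d)),
    #                    PE[pos, 2i+1] = cos(pos / 10000^(2i/d)).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

token_ids = np.array([5, 42, 7])     # assumed output of a BPE tokenizer
x = embedding_table[token_ids]       # look up one vector per token: shape (3, 64)
x = x + positional_encoding(len(token_ids), D_MODEL)  # add positional information
```

The positional term is what lets later layers distinguish token order, since the attention operation itself is permutation-invariant.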
> Like earlier seq2seq models, the original transformer model used an encoder/decoder architecture. The encoder consists of encoding layers that process the input iteratively one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output.
> The function of each encoder layer is to generate encodings that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[12] To achieve this, each encoder and decoder layer makes use of an attention mechanism.
> For each part of the input, attention weighs the relevance of every other part and draws from them to produce the output.[13] Each decoder layer has an additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings.
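The weighing described above is scaled dot-product attention. The sketch below shows a single head with no masking and toy dimensions; in self-attention the queries, keys, and values are all projections of the same input (here, for brevity, the input itself).

```python
import numpy as np

def attention(Q, K, V):
    # Score every query against every key, scaled by sqrt of the key dimension.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns scores into relevance weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a relevance-weighted sum of the values.
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # 3 positions, model dimension 4
out = attention(x, x, x)      # self-attention: Q = K = V = input
```

In the decoder's extra attention mechanism (cross-attention), Q comes from the decoder's own states while K and V come from the encoder's output.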
> Both the encoder and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.[13]
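The feed-forward sub-layer with its residual connection and layer normalization can be sketched as follows. The weights are random stand-ins for learned parameters, and the expansion factor is the conventional choice rather than anything mandated by the source.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def feed_forward_sublayer(x, W1, b1, W2, b2):
    hidden = np.maximum(0.0, x @ W1 + b1)   # expand and apply ReLU
    out = hidden @ W2 + b2                  # project back to the model dimension
    return layer_norm(x + out)              # residual connection, then layer norm

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                       # toy sizes; d_ff is typically 4 * d_model
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
x = rng.normal(size=(3, d_model))
y = feed_forward_sublayer(x, W1, b1, W2, b2)
```

The residual connection (`x + out`) lets each sub-layer learn a correction to its input rather than a full transformation, which makes deep stacks of such layers easier to train.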