- This repo is based on https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/transformer.py.
- The Transformer model, along with related functions and models, is placed in the directory transformer/.
- We apply our Transformer model to a text classification task using the Imdb dataset and to an odd number generation task to test our model. The corresponding bash scripts and models are under imdb/ and odd_numbers/.
We implemented our model with reference to torch.nn.modules.transformer, which is composed of Transformer, TransformerEncoder, TransformerDecoder, TransformerEncoderLayer, and TransformerDecoderLayer.
There are some differences between PyTorch's model and ours. For the attention mask, we only allow its dtype to be oneflow.int32, oneflow.int64 or oneflow.int8.
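
To illustrate the integer-only mask restriction, here is a minimal sketch that builds a causal (look-ahead) mask from plain Python integers. It assumes, for illustration only, that a nonzero entry marks a position that must not be attended to. The name causal_mask is hypothetical and not part of this repo, so check the code in transformer/ for the convention actually used.

```python
def causal_mask(size):
    """Return a size x size integer mask as nested lists.

    Assumption (not confirmed by this repo): 1 marks a future position
    that is hidden from the current one, 0 marks a visible position.
    """
    return [[1 if col > row else 0 for col in range(size)] for row in range(size)]


if __name__ == "__main__":
    # Assumption: with OneFlow installed, the nested list could be converted
    # to an accepted dtype, e.g. oneflow.tensor(mask, dtype=oneflow.int32).
    for row in causal_mask(3):
        print(row)
    # [0, 1, 1]
    # [0, 0, 1]
    # [0, 0, 0]
```

Because the entries are integers rather than floats or booleans, such a mask would need an explicit integer dtype when converted to a tensor.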
We first test our model on the Imdb dataset. Download and extract it as follows:
mkdir datasets; cd datasets;
wget https://oneflow-public.oss-cn-beijing.aliyuncs.com/datasets/models/Imdb_ofrecord.tar.gz
tar zxf Imdb_ofrecord.tar.gz

The bash script imdb/train.sh will train our model on the Imdb dataset.
cd imdb
sh train.sh

The default parameters are displayed below. You can modify them to fit your own environment.
BATCH_SIZE=32
EPOCH=1
LEARNING_RATE=0.0001
SEQUENCE_LEN=128
VOCAB_SZ=100000
D_MODEL=512
DROPOUT=0.1
NHEAD=8
NUM_LAYERS=4
DIM_FF=1024
IMDB_PATH="../datasets/imdb"
LOAD_DIR="."
SAVE_DIR="best_model"

The script imdb/infer.sh tests the classification result. We use the text "It is awesome! It is nice. The director does a good job!" as an example.
cd imdb
sh infer.sh

The default parameters are displayed below.
SEQUENCE_LEN=128
VOCAB_SZ=100000
D_MODEL=512
DROPOUT=0.1
NHEAD=8
NUM_LAYERS=4
DIM_FF=1024
LOAD_DIR="best_model"
TEXT="It is awesome! It is nice. The director does a good job!"

This task generates a sequence of odd numbers from the input even numbers. For example, if we input [2422, 2424, 2426], we get [0, 2423, 2425, 2427] as a result.
We generate the data ourselves. Please read odd_numbers/train_transformer_odd_numbers.py for details.
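
The input/output relationship described above can be sketched as follows. This is an illustrative reconstruction from the example in this README, not the code in odd_numbers/train_transformer_odd_numbers.py. The names make_pair and make_batch are hypothetical, and treating the leading 0 as a start-of-sequence token is an assumption.

```python
import random


def make_pair(start, length=3):
    """Build one (input, target) pair for the odd number task.

    The input is `length` consecutive even numbers beginning at `start`.
    The target is a leading 0 (assumed start token) followed by each
    input number plus one.
    """
    if start % 2 != 0:
        raise ValueError("start must be even")
    src = [start + 2 * i for i in range(length)]
    tgt = [0] + [n + 1 for n in src]
    return src, tgt


def make_batch(batch_size, vocab_size=10000, length=3, seed=None):
    """Sample random pairs whose values all stay below vocab_size."""
    rng = random.Random(seed)
    # The largest target value is start + 2 * length - 1.
    max_start = vocab_size - 2 * length
    starts = [rng.randrange(2, max_start + 1, 2) for _ in range(batch_size)]
    return [make_pair(s, length) for s in starts]


if __name__ == "__main__":
    print(make_pair(2422))  # ([2422, 2424, 2426], [0, 2423, 2425, 2427])
```

The vocab_size default mirrors VOCAB_SZ=10000 from the training script; the sequence length of 3 is taken from the example and may differ from what the real script uses.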
You can use the bash script odd_numbers/train.sh to train this model.
cd odd_numbers
sh train.sh

Note that the default parameters are as follows:
BATCH_SIZE=128
EPOCH=30
LEARNING_RATE=0.0001
VOCAB_SZ=10000
D_MODEL=512
DROPOUT=0.0
NHEAD=2
NUM_ENCODER_LAYERS=1
NUM_DECODER_LAYERS=1
DIM_FF=128
LOAD_DIR="."
SAVE_DIR="best_model"

The bash script odd_numbers/infer.sh runs inference with the trained model.
cd odd_numbers
sh infer.sh

The default parameters are set as below:
VOCAB_SZ=10000
D_MODEL=512
DROPOUT=0.0
NHEAD=2
NUM_ENCODER_LAYERS=1
NUM_DECODER_LAYERS=1
DIM_FF=128
LOAD_DIR="best_model"
INPUT_START=4386

The parameter input_start is the first number of the input sequence. If it is 4386, the program generates the sequence [4386, 4388, 4390] as input.
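
When changing D_MODEL or NHEAD in any of the scripts, keep in mind that PyTorch's multi-head attention requires the model dimension to be divisible by the number of heads. Assuming this repo keeps that constraint, the sketch below checks a parameter pair before a run is launched. The helper head_dim is hypothetical and not part of the repo.

```python
def head_dim(d_model, nhead):
    """Return the per-head dimension, or raise if the split is uneven."""
    if nhead <= 0:
        raise ValueError("nhead must be positive")
    if d_model % nhead != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by nhead={nhead}"
        )
    return d_model // nhead


if __name__ == "__main__":
    # Default values from the imdb and odd_numbers scripts.
    print(head_dim(512, 8))  # 64
    print(head_dim(512, 2))  # 256
```

Both default configurations pass this check; a combination such as D_MODEL=512 with NHEAD=6 would not.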