<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="https://diyago.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://diyago.github.io/" rel="alternate" type="text/html" /><updated>2022-05-21T07:15:14+00:00</updated><id>https://diyago.github.io/feed.xml</id><title type="html">Machine &amp;amp; Deep Learning Blog by Insaf Ashrapov</title><subtitle>Machine &amp; Deep Learning Blog by Insaf Ashrapov, Senior Data Scientist 
</subtitle><entry><title type="html">Hackathon: Who is better to spot generated image?</title><link href="https://diyago.github.io/2021/01/12/gan-hackaton.html" rel="alternate" type="text/html" title="Hackathon: Who is better to spot generated image?" /><published>2021-01-12T00:00:00+00:00</published><updated>2021-01-12T00:00:00+00:00</updated><id>https://diyago.github.io/2021/01/12/gan-hackaton</id><content type="html" xml:base="https://diyago.github.io/2021/01/12/gan-hackaton.html">&lt;p&gt;The online hackathon by Digital Leader was held from 19.10 to 27.11.2020. I have shown that a trained neural network distinguishes generated face images better than humans do.
As a result, I took 4th place and won swag prizes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/gan_hack/poster.png&quot; alt=&quot;poster.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The work of other participants can be found here:
&lt;a href=&quot;https://hackathon.digitalleader.org/?fv-page=2#contest&quot;&gt;https://hackathon.digitalleader.org/?fv-page=2#contest&lt;/a&gt;&lt;/p&gt;</content><author><name></name></author><summary type="html">The online hackathon by Digital Leader was held from 19.10 to 27.11.2020. I have shown that a trained neural network distinguishes generated face images better than humans do. As a result, I took 4th place and won swag prizes.</summary></entry><entry><title type="html">Graph classification by computer vision</title><link href="https://diyago.github.io/2020/11/07/graphs-vs-cv.html" rel="alternate" type="text/html" title="Graph classification by computer vision" /><published>2020-11-07T00:00:00+00:00</published><updated>2020-11-07T00:00:00+00:00</updated><id>https://diyago.github.io/2020/11/07/graphs-vs-cv</id><content type="html" xml:base="https://diyago.github.io/2020/11/07/graphs-vs-cv.html">&lt;p&gt;Graph analysis is becoming more and more popular, but how does it perform compared to a computer vision approach? We will show that while computer vision models are much slower to train, they perform comparably to graph-based methods.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Github repo with all code, &lt;a href=&quot;https://github.com/Diyago/Graph-clasification-by-computer-vision&quot;&gt;link&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Originally posted &lt;a href=&quot;https://towardsdatascience.com/graph-classification-by-computer-vision-286572aaa750&quot;&gt;on Medium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;graph-analysis&quot;&gt;Graph analysis&lt;/h2&gt;

&lt;p&gt;In general, graph theory represents pairwise relationships between objects. We won’t go into much detail here, but you may think of a graph as some kind of network, like the one below:
&lt;img src=&quot;/images/graphs/title.jpg&quot; alt=&quot;title.jpg&quot; /&gt;
&lt;em&gt;Network. Photo by Alina Grubnyak on Unsplash&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The main point to know here is that by connecting objects with edges we can visualize graphs, and then use classic computer vision models on the resulting images. Unfortunately, we may lose some of the initial information: for example, a graph may contain different types of objects and connections, which may be impossible to visualize in 2D.&lt;/p&gt;

&lt;h3 id=&quot;libraries&quot;&gt;Libraries&lt;/h3&gt;

&lt;p&gt;There are plenty of libraries to look at if you are willing to start working with graphs:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;networkx&lt;/strong&gt; — classical algorithms, visualizations&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;pytorch_geometric&lt;/strong&gt; — SOTA graph algorithms, a framework on top of PyTorch&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;graph-tool&lt;/strong&gt; — classical algorithms, visualizations&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;scikit-network&lt;/strong&gt; — classical algorithms, sklearn like API&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;TensorFlow Graphics&lt;/strong&gt; — SOTA graph algorithms, a framework on top of TensorFlow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of them is aimed at its own specific role, so which one to use depends on your task.&lt;/p&gt;
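&lt;p&gt;As a quick illustration (a toy sketch of my own, not code from the original project), here is how you might build and inspect a small graph with networkx, the first library above:&lt;/p&gt;

```python
# A toy sketch (not from the article): build a small graph with networkx
# and inspect it before rendering it for the computer vision pipeline.
import networkx as nx

# a tiny molecule-like graph: nodes are atoms, edges are bonds
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)])

print(G.number_of_nodes())  # 5
print(G.number_of_edges())  # 5
# the degree distribution is part of what gets lost when rendering to 2D
print(sorted(dict(G.degree()).values()))  # [1, 2, 2, 2, 3]
```

&lt;p&gt;Once the graph is built, networkx drawing functions (as in the image-generation script later in this post) turn it into an image.&lt;/p&gt;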

&lt;h3 id=&quot;theory&quot;&gt;Theory&lt;/h3&gt;

&lt;p&gt;This article is aimed more at practical usage, so for the theory I will only leave some links:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Hands-on Graph Neural Networks with PyTorch &amp;amp; PyTorch Geometric&lt;/li&gt;
  &lt;li&gt;CS224W: Machine Learning with Graphs&lt;/li&gt;
  &lt;li&gt;Graph classification will be based on Graph Convolutional Networks (GCN), &lt;a href=&quot;https://arxiv.org/abs/1609.02907&quot;&gt;arxiv link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;model-architecture&quot;&gt;Model architecture&lt;/h3&gt;

&lt;p&gt;We will be using the following architecture as a baseline:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;* GCNConv - 6 blocks
* JumpingKnowledge for aggregating convolutions
* global_add_pool with relu
* Final layer is softmax
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SimpleGNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;Original from http://pages.di.unipi.it/citraro/files/slides/Landolfi_tutorial.pdf&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;nb&quot;&gt;super&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SimpleGNN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ModuleList&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GCNConv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_node_features&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                  &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
            &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GCNConv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JumpingKnowledge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;cat&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jk_lin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;in_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;layers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lin_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lin_2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Linear&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;in_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hidden&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_classes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;forward&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Batch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_data_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conv&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;convs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;conv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge_index&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge_index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jk_lin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;global_add_pool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lin_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lin_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The code is based on this &lt;a href=&quot;http://pages.di.unipi.it/citraro/files/slides/Landolfi_tutorial.pdf&quot;&gt;tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;computer-vision&quot;&gt;Computer vision&lt;/h3&gt;
&lt;p&gt;You will get all the required theory and technical skills by following the article “Guide how to learn and master computer vision in 2020”.
Besides, you should be familiar with the following topics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;EfficientNet, &lt;a href=&quot;https://arxiv.org/abs/1905.11946&quot;&gt;arxiv link&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Focal Loss, &lt;a href=&quot;https://arxiv.org/abs/1708.02002&quot;&gt;arxiv link&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;albumentations — augmentation library&lt;/li&gt;
  &lt;li&gt;pytorch-lightning — pytorch framework&lt;/li&gt;
&lt;/ul&gt;
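&lt;p&gt;To make the Focal Loss idea concrete, here is a minimal sketch in plain Python using the paper’s default alpha and gamma; this is my own illustration, not the exact implementation used in the project:&lt;/p&gt;

```python
# A rough sketch of the binary Focal Loss (arXiv:1708.02002) in plain
# Python; an illustration, not the project's actual loss code.
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one prediction: down-weights easy, confident examples."""
    # probability assigned to the true class
    p_t = p if target == 1 else 1.0 - p
    # the modulating factor (1 - p_t) ** gamma shrinks the loss on easy examples
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)  # confident and correct: tiny loss
hard = focal_loss(0.1, 1)  # confident and wrong: much larger loss
print(easy, hard)
```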

&lt;h3 id=&quot;model-architecture-1&quot;&gt;Model architecture&lt;/h3&gt;

&lt;p&gt;We will be using the following model without any hyper-parameter tuning:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;efficientnet_b2b as encoder&lt;/li&gt;
  &lt;li&gt;FocalLoss and average precision as early stopping criteria&lt;/li&gt;
  &lt;li&gt;TTA with left-right and up-down flips&lt;/li&gt;
  &lt;li&gt;Augmentation with albumentation&lt;/li&gt;
  &lt;li&gt;Pytorch-lightning as training model framework&lt;/li&gt;
  &lt;li&gt;4-fold ensembling&lt;/li&gt;
  &lt;li&gt;mixup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code is available at this &lt;a href=&quot;https://github.com/Diyago/Graph-clasification-by-computer-vision/blob/main/fit_predict_graph.py#L48&quot;&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;experiment&quot;&gt;Experiment&lt;/h2&gt;

&lt;h3 id=&quot;data&quot;&gt;Data&lt;/h3&gt;

&lt;p&gt;We will predict the activity (against COVID?) of different molecules.
Dataset sample:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;smiles, activity
OC=1C=CC=CC1CNC2=NC=3C=CC=CC3N2, 1
CC(=O)NCCC1=CNC=2C=CC(F)=CC12, 1
O=C([C@@H]1[C@H](C2=CSC=C2)CCC1)N, 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To generate images for the computer vision approach, we first convert the graph to the networkx format and then get the desired images by calling the draw_kamada_kawai function:&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot; Full code link 
https://github.com/Diyago/Graph-clasification-by-computer-vision/blob/main/generate_images.py&quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;__name__&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;__main__&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ohd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OneHotDegree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_degree&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;covid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COVID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'./data/COVID/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ohd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;covid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;long&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;utils&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_networkx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;covid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)])&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;draw_kamada_kawai&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;savefig&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;./train/id_{}_y_{}.jpg&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                                                    &lt;span class=&quot;n&quot;&gt;covid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]),&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;jpg&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                                            
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/graphs/exmps.png&quot; alt=&quot;salts.png&quot; /&gt;
&lt;em&gt;Different molecules visualization will be used for the computer vision approach. Image by Insaf Ashrapov&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Link to the &lt;a href=&quot;https://github.com/yangkevin2/coronavirus_data/raw/master/data/mpro_xchem.csv&quot;&gt;dataset&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;experiment-results&quot;&gt;Experiment results&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;TEST metrics
### Computer vision
* ROC AUC 0.697
* MAP 0.183

### Graph method
* ROC AUC 0.702
* MAP 0.199
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the results are practically the same, with the graph method scoring slightly higher. Besides, it takes only 1 minute to train the GNN versus 30 minutes for the CNN.
I have to say that this is mostly a proof-of-concept project with many simplifications. In other words, you may visualize graphs and train well-known computer vision models instead of fancy new GNNs.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Github repo with all the code, &lt;a href=&quot;https://github.com/Diyago/Graph-clasification-by-computer-vision&quot;&gt;link&lt;/a&gt;, by Insaf Ashrapov&lt;/li&gt;
  &lt;li&gt;GNN tutorial, &lt;a href=&quot;http://pages.di.unipi.it/citraro/files/slides/Landolfi_tutorial.pdf&quot;&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><summary type="html">Graph analysis is becoming more and more popular, but how does it perform compared to a computer vision approach? We will show that while computer vision models are much slower to train, they perform comparably to graph-based methods.</summary></entry><entry><title type="html">Talk: Automatic satellite building construction monitoring</title><link href="https://diyago.github.io/2020/09/20/datafest-satt.html" rel="alternate" type="text/html" title="Talk: Automatic satellite building construction monitoring" /><published>2020-09-20T00:00:00+00:00</published><updated>2020-09-20T00:00:00+00:00</updated><id>https://diyago.github.io/2020/09/20/datafest-satt</id><content type="html" xml:base="https://diyago.github.io/2020/09/20/datafest-satt.html">&lt;p&gt;&lt;em&gt;This weekend there was a big Data Science event, DataFest. I gave a talk on “Automatic satellite building construction monitoring” (in Russian).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At Data Fest Online 2020 I spoke about one of our research projects: automated satellite monitoring of residential building construction.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/MOWbWHTgnng&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; encrypted-media&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;</content><author><name></name></author><summary type="html">This weekend there was a big Data Science event, DataFest. I gave a talk on “Automatic satellite building construction monitoring” (in Russian).</summary></entry><entry><title type="html">GANs for tabular data</title><link href="https://diyago.github.io/2020/03/26/gans-tabular.html" rel="alternate" type="text/html" title="GANs for tabular data" /><published>2020-03-26T00:00:00+00:00</published><updated>2020-03-26T00:00:00+00:00</updated><id>https://diyago.github.io/2020/03/26/gans-tabular</id><content type="html" xml:base="https://diyago.github.io/2020/03/26/gans-tabular.html">&lt;p&gt;&lt;em&gt;GANs are well known for their success in realistic image generation. However, they can also be applied to tabular data generation. We will review and examine some recent papers about tabular GANs in action.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;originally-posted-on-medium&quot;&gt;Originally posted &lt;a href=&quot;https://towardsdatascience.com/review-of-gans-for-tabular-data-a30a2199342&quot;&gt;on Medium&lt;/a&gt;.&lt;/h4&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;github-repo&quot;&gt;&lt;a href=&quot;https://github.com/Diyago/GAN-for-tabular-data&quot;&gt;Github repo&lt;/a&gt;&lt;/h4&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-is-gan&quot;&gt;What is GAN&lt;/h2&gt;

&lt;p&gt;“GAN composes of two deep networks: the &lt;strong&gt;generator&lt;/strong&gt; and the &lt;strong&gt;discriminator”&lt;/strong&gt; [1]. Both of them are trained simultaneously. Generally, the model structure and training process are presented this way:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/gan.jpeg&quot; alt=&quot;gan.jpeg&quot; /&gt;
&lt;em&gt;GAN training pipeline. By Jonathan Hui — What is Generative Adversarial Networks GAN? [1]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The task of the generator is to generate samples which won’t be distinguished from real samples by the discriminator. I won’t give much detail here, but if you would like to dive deeper, you can read the medium post and the original paper by Ian J. Goodfellow.
Recent architectures such as StyleGAN 2 can produce outstanding photo-realistic images.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/faces.png&quot; alt=&quot;png.jpeg&quot; /&gt;
&lt;em&gt;Hand-picked examples of human faces generated by StyleGAN 2, Source arXiv:1912.04958v2 [7]&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;problems&quot;&gt;Problems&lt;/h3&gt;

&lt;p&gt;While face generation seems to be not a problem anymore, there are plenty of issues we need to resolve:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Training speed&lt;/strong&gt;. Training StyleGAN 2 takes about a week on a DGX-1 (8x NVIDIA Tesla V100).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Image quality&lt;/strong&gt; in specific domains. Even state-of-the-art networks still fail on other tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/cars.png&quot; alt=&quot;cars.png&quot; /&gt;
&lt;em&gt;Hand-picked examples of cars and cats generated by StyleGAN 2, Source arXiv:1912.04958v2 [7]&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;tabular-gans&quot;&gt;Tabular GANs&lt;/h2&gt;

&lt;p&gt;Even generating cats and dogs remains a heavy task for GANs because of non-trivial data distributions and high object-type variety. In such domains the image background also matters, and GANs often fail to generate it convincingly.
Therefore, I’ve been wondering what GANs can achieve on tabular data. Unfortunately, there aren’t many articles; the following two appear to be the most promising.&lt;/p&gt;

&lt;h3 id=&quot;tgan-synthesizing-tabular-data-using-generative-adversarial-networks-arxiv181111264v1-3&quot;&gt;TGAN: Synthesizing Tabular Data using Generative Adversarial Networks arXiv:1811.11264v1 [3]&lt;/h3&gt;

&lt;p&gt;First, they point out why generating tabular data poses its own challenges:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;mixed data types (int, decimal, categorical, time, text);&lt;/li&gt;
  &lt;li&gt;different shapes of distribution (multi-modal, long-tailed, non-Gaussian…);&lt;/li&gt;
  &lt;li&gt;sparse one-hot-encoded vectors and highly imbalanced categorical columns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;task-formalizing&quot;&gt;Task formalizing&lt;/h4&gt;

&lt;p&gt;Let us say table T contains n_c continuous variables and n_d discrete (categorical) variables, and each row is a vector C. These variables have an unknown joint distribution P, and each row is sampled independently from P. The objective is to train a generative model M that generates a new synthetic table T_synth with a distribution similar to P. A machine learning model trained on T_synth should achieve accuracy on a real test table T_test similar to that of a model trained on T.&lt;/p&gt;
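&lt;p&gt;The “similar accuracy” criterion can be checked directly: train one model on real data and one on a synthetic stand-in, and compare their scores on the same held-out test table. A toy sketch with scikit-learn (the data generator and model here are assumptions for illustration, not the paper’s setup):&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_table(n, shift=0.0):
    # stand-in for rows drawn from the unknown joint distribution P;
    # T_synth would instead come from the trained generator M
    x = rng.normal(loc=shift, size=(n, 4))
    y = (np.sign(x[:, 0] + 0.5 * x[:, 1]) + 1.0) // 2
    return x, y.astype(int)

x_train, y_train = make_table(2000)        # T_train
x_synth, y_synth = make_table(2000, 0.05)  # imitates T_synth
x_test, y_test = make_table(1000)          # T_test

acc_real = accuracy_score(
    y_test, LogisticRegression().fit(x_train, y_train).predict(x_test))
acc_synth = accuracy_score(
    y_test, LogisticRegression().fit(x_synth, y_synth).predict(x_test))
# the closer acc_synth is to acc_real, the better M captured P
```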

&lt;h4 id=&quot;preprocessing-numerical-variables&quot;&gt;Preprocessing numerical variables&lt;/h4&gt;

&lt;p&gt;“Neural networks can effectively generate values with a distribution centered over (−1, 1) using tanh” [3]. However, they show that networks fail to generate suitable values for multi-modal data. Therefore, they cluster each numerical variable by training a Gaussian mixture model (GMM) with m (m = 5) components for each column C.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/equa_1.png&quot; alt=&quot;equa_1.png&quot; /&gt;
&lt;em&gt;Normalizing using GMM using mean and standard deviation. Source arXiv:1811.11264v1 [3]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Finally, the GMM is used to normalize C to obtain V. Besides, they compute the probability of C coming from each of the m Gaussian distributions as a vector U.&lt;/p&gt;
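&lt;p&gt;The clustering step can be sketched with scikit-learn (an illustration of the idea, not TGAN’s exact code; the paper additionally scales and clips the normalized value per its equation above):&lt;/p&gt;

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# a bimodal numerical column C, the case where plain tanh scaling fails
c = np.concatenate([rng.normal(-4.0, 1.0, 500),
                    rng.normal(3.0, 0.5, 500)]).reshape(-1, 1)

m = 5  # number of GMM components, as in the paper
gmm = GaussianMixture(n_components=m, random_state=0).fit(c)

# U: probability of each value coming from each of the m Gaussians
u = gmm.predict_proba(c)

# V: value normalized by the mean and std of its most likely component
k = u.argmax(axis=1)
v = (c[:, 0] - gmm.means_[k, 0]) / np.sqrt(gmm.covariances_[k, 0, 0])
```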

&lt;h4 id=&quot;preprocessing-categorical-variables&quot;&gt;Preprocessing categorical variables&lt;/h4&gt;

&lt;p&gt;Because the cardinality of categorical variables is usually low, they found the probability distribution can be generated directly using softmax. However, it is necessary to convert categorical variables to a one-hot-encoded representation and add noise to the binary variables.
After preprocessing, they convert T with n_c + n_d columns to the vectors V, U, and D. These vectors are the output of the generator and the input of the discriminator in the GAN. “GAN does not have access to GMM parameters” [3].&lt;/p&gt;

&lt;h4 id=&quot;generator&quot;&gt;Generator&lt;/h4&gt;

&lt;p&gt;They generate a numerical variable in two steps: first the value scalar V, then the cluster vector U, finally applying tanh. Categorical features are generated as a probability distribution over all possible labels with softmax. An LSTM with an attention mechanism is used to generate the desired row. The input to the LSTM at each step is the random variable z, the weighted context vector, the previous hidden state, and the embedding vector.&lt;/p&gt;

&lt;h4 id=&quot;discriminator&quot;&gt;Discriminator&lt;/h4&gt;

&lt;p&gt;A Multi-Layer Perceptron (MLP) with LeakyReLU and BatchNorm is used. The first layer takes the concatenated vectors (V, U, D) together with a mini-batch diversity feature vector from the LSTM. The loss function is the ordinary log loss with an added KL divergence term over the input variables.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/disc.png&quot; alt=&quot;disc.png&quot; /&gt;
&lt;em&gt;Example of using TGAN to generate a simple census table. The generator generates T features one by one. The discriminator concatenates all features together, then uses a Multi-Layer Perceptron (MLP) with LeakyReLU to distinguish real and fake data. Source arXiv:1811.11264v1 [3]&lt;/em&gt;&lt;/p&gt;

&lt;h4 id=&quot;results&quot;&gt;Results&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/results.png&quot; alt=&quot;results.png&quot; /&gt;
&lt;em&gt;Accuracy of machine learning models trained on the real and synthetic training set. (BN — Bayesian networks, Gaussian Copula). Source arXiv:1811.11264v1 [3]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They evaluate the model on two datasets, KDD99 and covertype. For some reason, they used weak models without boosting (xgboost, etc.). Anyway, TGAN performs reasonably well and robustly, outperforming Bayesian networks. The average performance gap between real and synthetic data is 5.7%.&lt;/p&gt;

&lt;h2 id=&quot;modeling-tabular-data-using-conditional-gan-ctgan-arxiv190700503v2-4&quot;&gt;Modeling Tabular Data using Conditional GAN (CTGAN) arXiv:1907.00503v2 [4]&lt;/h2&gt;

&lt;p&gt;The key improvements over the earlier TGAN are mode-specific normalization, which handles non-Gaussian and multimodal distributions, and a conditional generator with training-by-sampling, which deals with imbalanced discrete columns.&lt;/p&gt;

&lt;h4 id=&quot;task-formalizing-1&quot;&gt;Task formalizing&lt;/h4&gt;

&lt;p&gt;The initial data remain the same as in TGAN. However, they address different problems.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Likelihood of fitness&lt;/strong&gt;. Do columns in T_syn follow the same joint distribution as T_train?&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Machine learning efficacy&lt;/strong&gt;. When training a model to predict one column from the others, can a model learned from T_syn achieve performance on T_test similar to that of a model learned on T_train?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;preprocessing&quot;&gt;Preprocessing&lt;/h4&gt;

&lt;p&gt;Preprocessing for discrete columns stays the same.
For continuous variables, a variational Gaussian mixture model (VGM) is used. It first estimates the number of modes m and then fits a Gaussian mixture. The initial vector C is then normalized almost as in TGAN, except that the value is normalized within each mode. The mode is represented as a one-hot vector beta ([0, 0, …, 1, 0]), and alpha is the normalized value of C.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/exm.png&quot; alt=&quot;exm.png&quot; /&gt;
&lt;em&gt;An example of mode-specific normalization. Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As a result, each initial row is represented as the concatenation of the one-hot-encoded discrete columns with the representation of the continuous variables discussed above:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/equa_2.png&quot; alt=&quot;equa_2.png&quot; /&gt;
&lt;em&gt;Preprocessed row. Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;h4 id=&quot;training&quot;&gt;Training&lt;/h4&gt;

&lt;p&gt;“The final solution consists of three key elements, namely: the conditional vector, the generator loss, and the training-by-sampling method” [4].&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/ctgan.png&quot; alt=&quot;ctgan.png&quot; /&gt;
&lt;em&gt;&lt;strong&gt;CTGAN&lt;/strong&gt; model. The conditional generator can generate synthetic rows conditioned on one of the discrete columns. With training-by-sampling, the cond and training data are sampled according to the log-frequency of each category, thus CTGAN can evenly explore all possible discrete values. Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;h4 id=&quot;conditional-vector&quot;&gt;Conditional vector&lt;/h4&gt;

&lt;p&gt;The conditional vector is the concatenation of the one-hot vectors of all discrete columns, with only the selected category set. “For instance, for two discrete columns, D1 = {1, 2, 3} and D2 = {1, 2}, the condition (D2 = 1) is expressed by the mask vectors m1 = [0, 0, 0] and m2 = [1, 0]; so cond = [0, 0, 0, 1, 0]” [4].&lt;/p&gt;
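&lt;p&gt;This construction is easy to reproduce in code (a small illustrative helper; the function name is mine, not from the paper):&lt;/p&gt;

```python
def cond_vector(column_sizes, selected_column, selected_category):
    # one mask vector m_i per discrete column; only the selected
    # column gets a 1, at the index of the selected category
    masks = []
    for i, size in enumerate(column_sizes):
        m = [0] * size
        if i == selected_column:
            m[selected_category] = 1
        masks.append(m)
    # cond is the concatenation of all mask vectors
    return [bit for m in masks for bit in m]

# the paper's example: D1 = {1, 2, 3}, D2 = {1, 2}, condition D2 = 1
cond = cond_vector([3, 2], selected_column=1, selected_category=0)
```

&lt;p&gt;Here the call reproduces cond = [0, 0, 0, 1, 0] from the quoted example: category 1 of D2 sits at index 0 of its one-hot vector.&lt;/p&gt;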

&lt;h4 id=&quot;generator-loss&quot;&gt;Generator loss&lt;/h4&gt;

&lt;p&gt;“During training, the conditional generator is free to produce any set of one-hot discrete vectors” [4]. To enforce that the generator produces d_i (the generated discrete one-hot column) equal to m_i (the mask vector), they penalize its loss by adding the cross-entropy between them, averaged over all instances of the batch.&lt;/p&gt;
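&lt;p&gt;A sketch of that penalty term in NumPy (illustrative only, not the authors’ implementation; the toy batch below is an assumption):&lt;/p&gt;

```python
import numpy as np

def cond_loss(d_gen, masks):
    # cross-entropy between generated one-hot columns d_i and masks m_i,
    # averaged over the batch; this term is added to the generator loss
    eps = 1e-8
    total = 0.0
    for d, m in zip(d_gen, masks):
        total += -np.sum(m * np.log(d + eps), axis=1)
    return float(np.mean(total))

# toy batch of 2 rows, one discrete column with 3 categories
d = [np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])]
m = [np.array([[1, 0, 0], [0, 1, 0]])]
loss = cond_loss(d, m)
```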

&lt;h4 id=&quot;training-by-sampling&quot;&gt;Training-by-sampling&lt;/h4&gt;

&lt;p&gt;“Specifically, the goal is to resample efficiently in a way that all the categories from discrete attributes are sampled evenly during the training process, as a result, to get real data distribution during the test” [4].
In other words, the output produced by the conditional generator must be assessed by the critic, which estimates the distance between the learned conditional distribution P_G(row|cond) and the conditional distribution on real data P(row|cond). “The sampling of real training data and the construction of cond vector should comply to help critics estimate the distance” [4]. Properly sampling the cond vector and the training data helps the model evenly explore all possible values in the discrete columns.
The model structure is given below. As opposed to TGAN, there is no LSTM layer; the model is trained with the WGAN loss with gradient penalty.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/disc_2.png&quot; alt=&quot;disc_2.png&quot; /&gt;
&lt;em&gt;Generator. Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/dd.png&quot; alt=&quot;dd.png&quot; /&gt;
&lt;em&gt;Discriminator. Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They also propose a model based on a variational autoencoder (VAE), but it is out of the scope of this article.&lt;/p&gt;

&lt;h4 id=&quot;results-1&quot;&gt;Results&lt;/h4&gt;

&lt;p&gt;The proposed networks CTGAN and TVAE outperform other methods. As the authors note, TVAE outperforms CTGAN in several cases, but GANs have several favorable properties: the generator in a GAN does not have access to real data during the entire training process, unlike TVAE.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/reults2.png&quot; alt=&quot;reults2.png&quot; /&gt;
&lt;em&gt;Benchmark results over three sets of experiments, namely Gaussian mixture simulated data (GM Sim.), Bayesian network simulated data (BN Sim.), and real data. They report the average of each metric. For real datasets (f1, etc). Source arXiv:1907.00503v2 [4]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Besides, they published the source code on GitHub; with slight modifications it is used further in this article.&lt;/p&gt;

&lt;h2 id=&quot;applying-ctgan-to-generating-data-for-increasing-train-semi-supervised&quot;&gt;Applying CTGAN to generating data for increasing train (semi-supervised)&lt;/h2&gt;

&lt;p&gt;This is something I have long wanted to examine. After a brief familiarization with recent developments in GANs, I’ve been thinking about how to apply them to the problems I solve at work daily. So here is my idea.&lt;/p&gt;

&lt;h4 id=&quot;task-formalization&quot;&gt;Task formalization&lt;/h4&gt;

&lt;p&gt;Let’s say we have T_train and T_test (train and test sets respectively). We need to train a model on T_train and make predictions on T_test. However, we will enlarge the train set with new GAN-generated data that is somehow similar to T_test, without using its ground-truth labels.&lt;/p&gt;

&lt;h4 id=&quot;experiment-design&quot;&gt;Experiment design&lt;/h4&gt;

&lt;p&gt;Let’s say we have T_train and T_test (train and test sets respectively). T_train is smaller and might have a different data distribution. First, we train CTGAN on T_train with ground-truth labels (step 1), then generate additional data T_synth (step 2). Second, we train boosting in an adversarial way on the concatenation of T_train and T_synth (target set to 0) with T_test (target set to 1) (steps 3 &amp;amp; 4). The goal is to apply this adversarial boosting model to find rows that look more like T_test. Note that the original ground-truth labels aren’t used for adversarial training. As a result, we take the top rows from T_train and T_synth sorted by correspondence to T_test (steps 5 &amp;amp; 6). Finally, we train a new boosting model on them and check the results on T_test.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/tabular-gan/exp.png&quot; alt=&quot;exp.png&quot; /&gt;
&lt;em&gt;Experiment design and workflow&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, for benchmark purposes we will also test ordinary training without these tricks, as well as the same pipeline but without CTGAN (in step 3 we won’t use T_synth).&lt;/p&gt;
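&lt;p&gt;Steps 3 to 6 boil down to adversarial filtering, which can be sketched like this (toy data; scikit-learn’s gradient boosting stands in for CatBoost, and all names and sizes are illustrative):&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# toy stand-ins: the pool T_train + T_synth vs. a slightly shifted T_test
train_pool = rng.normal(0.0, 1.0, size=(600, 5))  # T_train + T_synth
test = rng.normal(0.4, 1.0, size=(400, 5))        # T_test (shifted)

# steps 3-4: adversarial model separates the pool (0) from test (1)
x = np.vstack([train_pool, test])
y = np.array([0] * 600 + [1] * 400)
adv = GradientBoostingClassifier(random_state=0).fit(x, y)

# steps 5-6: keep the pool rows that look most like T_test
p_test_like = adv.predict_proba(train_pool)[:, 1]
top = np.argsort(p_test_like)[-300:]  # indices of the top-scoring rows
filtered_train = train_pool[top]
```

&lt;p&gt;A new model is then trained on filtered_train (with the original labels restored) and evaluated on T_test.&lt;/p&gt;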

&lt;h4 id=&quot;code&quot;&gt;Code&lt;/h4&gt;

&lt;p&gt;The experiment code and results are released as a GitHub repo here. The pipeline and data preparation are based on the Benchmarking Categorical Encoders article and its repo. We follow almost the same pipeline, but for speed only single validation and the CatBoost encoder were chosen. Due to the lack of GPU memory, some of the datasets were skipped.&lt;/p&gt;

&lt;h4 id=&quot;datasets&quot;&gt;Datasets&lt;/h4&gt;

&lt;p&gt;All datasets come from different domains. They have different numbers of observations and of categorical and numerical features. The aim of all datasets is binary classification. Preprocessing was simple: all time-based columns were removed; the remaining columns were either categorical or numerical. In addition, during training T_train was sampled at 5%, 10%, 25%, 50%, and 75%.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;Name&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Total points&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Train points&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Test points&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Number of features&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Number of categorical features&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;Short description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.kaggle.com/blastchar/telco-customer-churn&quot;&gt;Telecom&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;7.0k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;4.2k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;2.8k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;20&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;16&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Churn prediction for telecom data&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.kaggle.com/wenruliu/adult-income-dataset&quot;&gt;Adult&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;48.8k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;29.3k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;19.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;15&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Predict if a person’s income is bigger than 50k&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.kaggle.com/c/amazon-employee-access-challenge/data&quot;&gt;Employee&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;32.7k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;19.6k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;13.1k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;10&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;9&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Predict an employee’s access needs, given his/her job role&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.kaggle.com/c/home-credit-default-risk/data&quot;&gt;Credit&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;307.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;184.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;123k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;121&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;18&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Loan repayment&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.crowdanalytix.com/contests/propensity-to-fund-mortgages&quot;&gt;Mortgages&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;45.6k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;27.4k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;18.2k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;20&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;9&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Predict if a house mortgage is funded&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.crowdanalytix.com/contests/mckinsey-big-data-hackathon&quot;&gt;Taxi&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;892.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;535.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;357k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;8&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;5&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Predict the probability of an offer being accepted by a certain driver&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;a href=&quot;https://www.drivendata.org/competitions/50/worldbank-poverty-prediction/page/99/&quot;&gt;Poverty_A&lt;/a&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;37.6k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;22.5k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;15.0k&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;41&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;38&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;Predict whether a given household in a given country is poor&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Datasets properties&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;results-2&quot;&gt;Results&lt;/h3&gt;

&lt;p&gt;At first sight, in terms of both the metric and stability (std), GAN shows the worst results. However, by sampling the initial train set and then applying adversarial training, we obtain the best metric results and stability (sample_original). To determine the best sampling strategy, the ROC AUC scores of each dataset were scaled (min-max scale) and then averaged across datasets.&lt;/p&gt;
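&lt;p&gt;The scaling step looks like this (a sketch using three of the scores from Table 1.2; the full experiment averages over all datasets and sample fractions):&lt;/p&gt;

```python
import numpy as np

# ROC AUC scores: rows are datasets, columns are sampling strategies
strategies = ['None', 'gan', 'sample_original']
scores = np.array([
    [0.997, 0.998, 0.997],  # credit
    [0.986, 0.966, 0.972],  # employee
    [0.984, 0.964, 0.988],  # mortgages
])

# min-max scale within each dataset, then average per strategy
lo = scores.min(axis=1, keepdims=True)
hi = scores.max(axis=1, keepdims=True)
scaled = (scores - lo) / (hi - lo)
avg = scaled.mean(axis=0)
best = strategies[int(avg.argmax())]
```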

&lt;h2 id=&quot;results-3&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;To determine the best sampling strategy, I compared the top score of each dataset for each type of sampling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1.2&lt;/strong&gt; Different sampling results across the dataset, higher is better (100% - maximum per dataset ROC AUC)&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;dataset_name&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;None&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;gan&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;sample_original&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;credit&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.997&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.998&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.997&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;employee&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.986&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.966&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.972&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;mortgages&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.984&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.964&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.988&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;poverty_A&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.937&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.950&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.933&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;taxi&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.966&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.938&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.987&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;adult&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.995&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.967&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.998&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;telecom&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.995&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.868&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.992&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Table 1.3&lt;/strong&gt; Different sampling results, higher is better for a mean (ROC AUC), lower is better for std (100% - maximum per dataset ROC AUC)&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;sample_type&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;mean&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;std&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;None&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.980&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.036&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;gan&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.969&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.06&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;sample_original&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.981&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;&lt;strong&gt;0.032&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;We can see that GAN outperformed the other sampling types in 2 datasets, whereas sampling from the original outperformed the other methods in 3 of 7 datasets. Of course, there isn’t much difference, but these types of sampling might be an option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1.4&lt;/strong&gt; same_target_prop is equal to 1 when the target rates for train and test differ by no more than 5%. Higher is better.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;sample_type&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;same_target_prop&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;prop_test_score&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;None&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.964&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;None&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.985&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;gan&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.966&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;gan&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.945&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;sample_original&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.973&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;sample_original&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;1&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.984&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let’s define same_target_prop as equal to 1 when the target rates for train and test differ by no more than 5%. When train and test have almost the same target rate, None and sample_original perform better. However, gan starts performing noticeably better when the target distribution changes.&lt;/p&gt;


&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] Jonathan Hui. GAN — What is Generative Adversarial Networks GAN? (2018), medium article
[2] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks (2014). arXiv:1406.2661
[3] Lei Xu LIDS, Kalyan Veeramachaneni. Synthesizing Tabular Data using Generative Adversarial Networks (2018). arXiv:1811.11264v1 [cs.LG]
[4] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. Modeling Tabular Data using Conditional GAN (2019). arXiv:1907.00503v2 [cs.LG]
[5] Denis Vorotyntsev. Benchmarking Categorical Encoders (2019). Medium post
[6] Insaf Ashrapov. GAN-for-tabular-data (2020). Github repository.
[7] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila. Analyzing and Improving the Image Quality of StyleGAN (2019) arXiv:1912.04958v2 [cs.CV]&lt;/p&gt;</content><author><name></name></author><summary type="html">We well know GANs for success in the realistic image generation. However, they can be applied in tabular data generation. We will review and examine some recent papers about tabular GANs in action.</summary></entry><entry><title type="html">Guide how to learn and master computer vision in 2020</title><link href="https://diyago.github.io/2019/12/15/guide-cv.html" rel="alternate" type="text/html" title="Guide how to learn and master computer vision in 2020" /><published>2019-12-15T00:00:00+00:00</published><updated>2019-12-15T00:00:00+00:00</updated><id>https://diyago.github.io/2019/12/15/guide-cv</id><content type="html" xml:base="https://diyago.github.io/2019/12/15/guide-cv.html">&lt;p&gt;&lt;em&gt;This post will focus on resources, which I believe will boost your knowledge in computer vision the most and mainly based on my own experience.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;original-medium-post&quot;&gt;Original &lt;a href=&quot;https://towardsdatascience.com/guide-to-learn-computer-vision-in-2020-36f19d92c934&quot;&gt;Medium post&lt;/a&gt;&lt;/h4&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before starting to learn computer vision, it helps to get the basics of machine learning and Python.&lt;/p&gt;

&lt;h2 id=&quot;frameworks&quot;&gt;Frameworks&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/images/guide_cv/keras_vs_torch.png&quot; alt=&quot;keras_vs_torch.png&quot; /&gt;
&lt;em&gt;Star Wars: Luke Skywalker &amp;amp; Darth Vader&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You don’t have to choose one from the beginning, but applying newly gained knowledge is necessary.
There are not many options: PyTorch or Keras (TensorFlow). PyTorch may require more code but gives much more flexibility in return, so use it. Besides, most researchers in deep learning have started to use PyTorch.
Albumentations (image augmentation) and Catalyst (a framework with a high-level API on top of PyTorch) might be useful as well; use them, especially the first one.&lt;/p&gt;

&lt;h2 id=&quot;hardware&quot;&gt;Hardware&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Nvidia GPU 10xx+ will be more than enough ($300+)&lt;/li&gt;
  &lt;li&gt;Kaggle kernels — only 30 hours/week (free)&lt;/li&gt;
  &lt;li&gt;Google Colab — 12-hour session limit, unknown weekly limits (free)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;theory--practise&quot;&gt;Theory &amp;amp; Practise&lt;/h2&gt;

&lt;h3 id=&quot;online-courses&quot;&gt;Online courses&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CS231n&lt;/code&gt; is the top online course, covering all the necessary fundamentals of computer vision, with lecture videos on YouTube. They even have exercises, but I can’t advise solving them. (free)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Fast.ai&lt;/code&gt; is the next course you should watch. Fast.ai is also a high-level framework on top of PyTorch, but they change their API too frequently, and the lack of documentation makes it unreliable to use. However, the theory and useful tricks make the course well worth the time. (free)
While taking these courses, I encourage you to put the theory into practice by applying it in one of the frameworks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;articles-and-code&quot;&gt;Articles and code&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;ArXiv.org — all recent papers appear here. (free)&lt;/li&gt;
  &lt;li&gt;https://paperswithcode.com/sota — the state of the art in most common deep learning tasks, not only computer vision. (free)&lt;/li&gt;
  &lt;li&gt;Github — if something has been implemented, you will find it here. (free)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;books&quot;&gt;Books&lt;/h3&gt;

&lt;p&gt;There is not much to read, but these two books will be useful no matter whether you choose PyTorch or Keras:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Deep Learning with Python by Keras creator and Google AI researcher François Chollet. Easy to follow, and you may get insights you didn’t have before. (not free)&lt;/li&gt;
  &lt;li&gt;Deep Learning with PyTorch by the PyTorch team’s Eli Stevens &amp;amp; Luca Antiga (free)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;kaggle&quot;&gt;Kaggle&lt;/h3&gt;

&lt;p&gt;Competitions — Kaggle is a well-known online platform for a wide variety of machine learning competitions, many of them about computer vision. You can start participating even before finishing the courses, because from the beginning of a competition there will be many open kernels (end-to-end code) which you can run directly in the browser. (free)&lt;/p&gt;

&lt;h2 id=&quot;tough-jedi-way&quot;&gt;Tough (jedi) way&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/images/guide_cv/jedi.png&quot; alt=&quot;jedi.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Star Wars’ Jedi: Yoda&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An alternative path, from Sergei Belousov aka bes, is tough, but you will gain the knowledge needed not only to do fit-predict but to perform your own research.
You just need to read and implement all the articles below (free). Just reading them will also be great.&lt;/p&gt;

&lt;h2 id=&quot;architectures&quot;&gt;Architectures&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;AlexNet: &lt;a href=&quot;https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks&quot;&gt;https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;ZFNet: &lt;a href=&quot;https://arxiv.org/abs/1311.2901&quot;&gt;https://arxiv.org/abs/1311.2901&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;VGG16: &lt;a href=&quot;https://arxiv.org/abs/1409.1556&quot;&gt;https://arxiv.org/abs/1409.1556&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;ResNet: &lt;a href=&quot;https://arxiv.org/abs/1512.03385&quot;&gt;https://arxiv.org/abs/1512.03385&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;GoogLeNet: &lt;a href=&quot;https://arxiv.org/abs/1409.4842&quot;&gt;https://arxiv.org/abs/1409.4842&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Inception: &lt;a href=&quot;https://arxiv.org/abs/1512.00567&quot;&gt;https://arxiv.org/abs/1512.00567&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Xception: &lt;a href=&quot;https://arxiv.org/abs/1610.02357&quot;&gt;https://arxiv.org/abs/1610.02357&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;MobileNet: &lt;a href=&quot;https://arxiv.org/abs/1704.04861&quot;&gt;https://arxiv.org/abs/1704.04861&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;semantic-segmentation&quot;&gt;Semantic Segmentation&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;FCN: &lt;a href=&quot;https://arxiv.org/abs/1411.4038&quot;&gt;https://arxiv.org/abs/1411.4038&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;SegNet: &lt;a href=&quot;https://arxiv.org/abs/1511.00561&quot;&gt;https://arxiv.org/abs/1511.00561&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;UNet: &lt;a href=&quot;https://arxiv.org/abs/1505.04597&quot;&gt;https://arxiv.org/abs/1505.04597&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;PSPNet: &lt;a href=&quot;https://arxiv.org/abs/1612.01105&quot;&gt;https://arxiv.org/abs/1612.01105&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;DeepLab: &lt;a href=&quot;https://arxiv.org/abs/1606.00915&quot;&gt;https://arxiv.org/abs/1606.00915&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;ICNet: &lt;a href=&quot;https://arxiv.org/abs/1704.08545&quot;&gt;https://arxiv.org/abs/1704.08545&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;ENet: &lt;a href=&quot;https://arxiv.org/abs/1606.02147&quot;&gt;https://arxiv.org/abs/1606.02147&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;generative-adversarial-networks&quot;&gt;Generative adversarial networks&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;GAN: &lt;a href=&quot;https://arxiv.org/abs/1406.2661&quot;&gt;https://arxiv.org/abs/1406.2661&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;DCGAN: &lt;a href=&quot;https://arxiv.org/abs/1511.06434&quot;&gt;https://arxiv.org/abs/1511.06434&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;WGAN: &lt;a href=&quot;https://arxiv.org/abs/1701.07875&quot;&gt;https://arxiv.org/abs/1701.07875&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Pix2Pix: &lt;a href=&quot;https://arxiv.org/abs/1611.07004&quot;&gt;https://arxiv.org/abs/1611.07004&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;CycleGAN: &lt;a href=&quot;https://arxiv.org/abs/1703.10593&quot;&gt;https://arxiv.org/abs/1703.10593&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;object-detection&quot;&gt;Object detection&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;RCNN: &lt;a href=&quot;https://arxiv.org/abs/1311.2524&quot;&gt;https://arxiv.org/abs/1311.2524&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Fast-RCNN: &lt;a href=&quot;https://arxiv.org/abs/1504.08083&quot;&gt;https://arxiv.org/abs/1504.08083&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Faster-RCNN: &lt;a href=&quot;https://arxiv.org/abs/1506.01497&quot;&gt;https://arxiv.org/abs/1506.01497&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;SSD: &lt;a href=&quot;https://arxiv.org/abs/1512.02325&quot;&gt;https://arxiv.org/abs/1512.02325&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YOLO: &lt;a href=&quot;https://arxiv.org/abs/1506.02640&quot;&gt;https://arxiv.org/abs/1506.02640&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YOLO9000: &lt;a href=&quot;https://arxiv.org/abs/1612.08242&quot;&gt;https://arxiv.org/abs/1612.08242&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;instance-segmentation&quot;&gt;Instance Segmentation&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Mask-RCNN: &lt;a href=&quot;https://arxiv.org/abs/1703.06870&quot;&gt;https://arxiv.org/abs/1703.06870&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;YOLACT: &lt;a href=&quot;https://arxiv.org/abs/1904.02689&quot;&gt;https://arxiv.org/abs/1904.02689&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;pose-estimation&quot;&gt;Pose estimation&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;PoseNet: &lt;a href=&quot;https://arxiv.org/abs/1505.07427&quot;&gt;https://arxiv.org/abs/1505.07427&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;DensePose: &lt;a href=&quot;https://arxiv.org/abs/1802.00434&quot;&gt;https://arxiv.org/abs/1802.00434&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><summary type="html">This post will focus on resources, which I believe will boost your knowledge in computer vision the most and mainly based on my own experience.</summary></entry><entry><title type="html">Severstal Steel Defect Detection Challenge on Kaggle</title><link href="https://diyago.github.io/2019/11/20/kaggle-severstal.html" rel="alternate" type="text/html" title="Severstal Steel Defect Detection Challenge on Kaggle" /><published>2019-11-20T00:00:00+00:00</published><updated>2019-11-20T00:00:00+00:00</updated><id>https://diyago.github.io/2019/11/20/kaggle-severstal</id><content type="html" xml:base="https://diyago.github.io/2019/11/20/kaggle-severstal.html">&lt;p&gt;&lt;em&gt;&lt;strong&gt;Top 2% (31/2431) solution write-up&lt;/strong&gt;. Steel is one of the most important building materials of modern times. Steel buildings are resistant to natural and man-made wear which has made the material ubiquitous around the world. To help make production of steel more efficient, this competition will help identify defects.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;h3 id=&quot;31-place-solution-on-github-&quot;&gt;31st place &lt;a href=&quot;https://github.com/Diyago/Severstal-Steel-Defect-Detection&quot;&gt;solution on Github&lt;/a&gt;&lt;/h3&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Can you detect and classify defects in steel? Segmentation in Pytorch
https://www.kaggle.com/c/severstal-steel-defect-detection/overview&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/kaggle-severstal/input_data.png&quot; alt=&quot;input_data.png&quot; /&gt;
&lt;em&gt;Input data&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team - [ods.ai] stainless&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Insaf Ashrapov&lt;/li&gt;
  &lt;li&gt;Igor Krashenyi&lt;/li&gt;
  &lt;li&gt;Pavel Pleskov&lt;/li&gt;
  &lt;li&gt;Anton Zakharenkov&lt;/li&gt;
  &lt;li&gt;Nikolai Popov&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt;
We tried almost every type of model from qubvel’s segmentation models library (Unet, FPN, PSPNet) with different encoders from resnet to senet152. FPN with se-resnext50 outperformed the other models. Lighter models like resnet34 didn’t perform well enough on their own but were useful in the final blend. Se-resnext101 could possibly perform much better with more training time, but we didn’t test that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Augmentations and Preprocessing&lt;/strong&gt;
From the &lt;strong&gt;Albumentations&lt;/strong&gt; library:
HFlip, VFlip, RandomBrightnessContrast – training speed was not too fast, so these basic augmentations performed well enough. In addition, we used big crops for training and/or finetuning on the full image size, because attention blocks in image tasks rely on the same input size at training and inference time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;We used both pure pytorch and Catalyst framework for training.&lt;/li&gt;
  &lt;li&gt;Losses: BCE and BCE with dice performed quite well, but Lovasz loss dramatically outperformed them in terms of validation and public score. However, combined with the classification model, BCE with dice gave a better result; that could be because Lovasz helped the model filter out false-positive masks. Focal loss performed quite poorly due to the imperfect labeling.&lt;/li&gt;
  &lt;li&gt;Optimizer: Adam and RAdam. LookAhead and Over9000 didn’t work well for us.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Crops with a mask, BalanceClassSampler with upsampler mode from catalyst significantly increased training speed.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We tried our own classification model (resnet34 with CBAM), setting the goal to improve F1 for each class. The optimal threshold was disappointingly unstable, but we reached an averaged F1 of 95.1+. As a result, Cheng’s classification was used.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Validation: kfold with 10 folds. Despite the shake-up, local, public and private scores correlated surprisingly well.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Pseudolabeling: we did two rounds of pseudo labeling by training on the best public submit and validating on the out-of-fold predictions. It didn’t work the third time but gave us a huge improvement.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Postprocessing: filling holes, removing small masks below a threshold. We tried removing small objects via connected components, with no improvement.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Hardware: a bunch of Nvidia cards&lt;/li&gt;
&lt;/ul&gt;
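For reference, the BCE-plus-dice combination mentioned above can be sketched in plain numpy (an illustration of the loss itself; in training it is computed on GPU tensors):

```python
import numpy as np

def bce_dice_loss(probs, target, dice_weight=1.0, eps=1e-7):
    """Combined binary cross-entropy + soft-dice loss.

    probs  : predicted mask probabilities in [0, 1]
    target : ground-truth binary mask
    """
    probs = probs.clip(eps, 1 - eps)
    bce = -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))
    intersection = (probs * target).sum()
    dice = (2 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return bce + dice_weight * (1 - dice)  # dice term penalises poor overlap
```

The dice term rewards overlap between prediction and mask, which is why adding it to BCE helps on segmentation tasks where the positive class is rare.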

&lt;p&gt;&lt;strong&gt;Ensembling&lt;/strong&gt;
Simple averaging of segmentation models with different encoders, both FPN and Unet, applied to images classified as having a mask. One of the unchosen submits would have given us 16th place.&lt;/p&gt;</content><author><name></name></author><summary type="html">Top 2% (31/2431) solution write-up. Steel is one of the most important building materials of modern times. Steel buildings are resistant to natural and man-made wear which has made the material ubiquitous around the world. To help make production of steel more efficient, this competition will help identify defects.</summary></entry><entry><title type="html">Talk: Banking models interpretation</title><link href="https://diyago.github.io/2019/11/09/banking-inter.html" rel="alternate" type="text/html" title="Talk: Banking models interpretation" /><published>2019-11-09T00:00:00+00:00</published><updated>2019-11-09T00:00:00+00:00</updated><id>https://diyago.github.io/2019/11/09/banking-inter</id><content type="html" xml:base="https://diyago.github.io/2019/11/09/banking-inter.html">&lt;p&gt;&lt;em&gt;This talk was given at the AI Journey Conference in Moscow, a conference featuring leading international and Russian experts in AI and data analysis and top companies developing and applying AI in business&lt;/em&gt;&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/hnr4pkxUMpk&quot; frameborder=&quot;0&quot; allow=&quot;autoplay; encrypted-media&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;</content><author><name></name></author><summary type="html">Talk was given at AI Journey Conference in Moscow. Conference with leading international and Russian experts in AI and data analysis, top companies in the development and application of AI in business</summary></entry><entry><title type="html">Kaggle APTOS 2019 Blindness Detection Challenge</title><link href="https://diyago.github.io/2019/10/04/kaggle-blindness.html" rel="alternate" type="text/html" title="Kaggle APTOS 2019 Blindness Detection Challenge" /><published>2019-10-04T00:00:00+00:00</published><updated>2019-10-04T00:00:00+00:00</updated><id>https://diyago.github.io/2019/10/04/kaggle-blindness</id><content type="html" xml:base="https://diyago.github.io/2019/10/04/kaggle-blindness.html">&lt;p&gt;&lt;em&gt;&lt;strong&gt;Top 3% (76/2943) solution write-up&lt;/strong&gt; for the &lt;a href=&quot;https://www.kaggle.com/c/aptos2019-blindness-detection&quot;&gt;Kaggle APTOS 2019 Blindness Detection&lt;/a&gt;. Imagine being able to detect blindness before it happened. Millions of people suffer from diabetic retinopathy, the leading cause of blindness among working aged adults&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This repository consists of code and configs that were used to train our best single model. The solution is powered by awesome &lt;a href=&quot;https://github.com/catalyst-team/catalyst&quot;&gt;Catalyst&lt;/a&gt; library.&lt;/p&gt;

&lt;h3 id=&quot;data&quot;&gt;Data&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/images/kaggle_blindness/input.png&quot; alt=&quot;input.png&quot; /&gt;
&lt;em&gt;Input Data&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;2015 competition data was used for pretraining all our models; without it our models performed much worse. We used different techniques: first training on the old data, then finetuning on the new train set; alternatively, training on both datasets, then finetuning on the new train data. Besides, starting the finetuning by freezing all layers and training only the last FC layer gave us more stable results.&lt;/p&gt;

&lt;h3 id=&quot;models-and-preprocessing&quot;&gt;Models and Preprocessing&lt;/h3&gt;

&lt;p&gt;From the beginning, EfficientNet outperformed other models. Using fp16 (available in Kaggle kernels) allowed a bigger batch size, which sped up training and inference.&lt;/p&gt;

&lt;p&gt;Models used in the final submission:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;EfficientNet-B5 (best single model): 224x224 (tta with Hflip, preprocessing - crop_from_gray, circle_crop, ben_preprocess=10)&lt;/li&gt;
  &lt;li&gt;EfficientNet-B4: 256x256 (tta with Hflip, preprocessing - crop_from_gray, circle_crop, ben_preprocess=20)&lt;/li&gt;
  &lt;li&gt;EfficientNet-B5: 256x256 (tta with Hflip, preprocessing - crop_from_gray, circle_crop, ben_preprocess=30)&lt;/li&gt;
  &lt;li&gt;EfficientNet-B5: (256x256) without specific preprocess, two models with different augmentations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We tried bigger image sizes, but they gave worse results. EfficientNet-B2 and EfficientNet-B6 gave worse results as well.&lt;/p&gt;

&lt;h3 id=&quot;augmentations&quot;&gt;Augmentations&lt;/h3&gt;
&lt;p&gt;From the &lt;a href=&quot;https://github.com/albu/albumentations&quot;&gt;Albumentations&lt;/a&gt; library:
Hflip, VFlip, RandomScale, CenterCrop, RandomBrightnessContrast, ShiftScaleRotate, RandomGamma, JpegCompression, HueSaturationValue, RGBShift, ChannelShuffle, ToGray, Cutout&lt;/p&gt;

&lt;h3 id=&quot;training&quot;&gt;Training&lt;/h3&gt;
&lt;p&gt;First 3 models were trained using &lt;a href=&quot;https://github.com/catalyst-team/catalyst&quot;&gt;Catalyst&lt;/a&gt; library and the last one with FastAi, both of them work on top of Pytorch.&lt;/p&gt;

&lt;p&gt;We used both ordinal regression and plain regression. Models trained for classification weren’t good enough to use.
Adam with OneCycle was used for training. WarmUp helped to get more stable results. RAdam and label smoothing didn’t help to improve the score.&lt;/p&gt;

&lt;p&gt;We tried to use the leak investigated &lt;a href=&quot;https://www.kaggle.com/miklgr500/leakage-detection-about-8-test-dataset&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://www.kaggle.com/konradb/adversarial-validation-quick-fast-ai-approach&quot;&gt;here&lt;/a&gt; by overriding the output results. Almost 10% of the public test data were part of the train set. Results dropped significantly, which means the train data annotation was pretty bad.&lt;/p&gt;

&lt;p&gt;We tried kappa coefficient optimization; it didn’t give a reliable improvement on public, but it could have helped us on private by almost +0.003.&lt;/p&gt;
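For context, kappa optimization on regression outputs usually means tuning the cut-points that map continuous predictions to the 0..4 grades. The sketch below is a reconstruction; the threshold values and search strategy are assumptions, not our exact code:

```python
import numpy as np

def to_grades(preds, thresholds=(0.5, 1.5, 2.5, 3.5)):
    """Map continuous regression outputs to ordinal grades 0..4."""
    return np.digitize(preds, thresholds)

def quadratic_weighted_kappa(rater_a, rater_b, n_classes=5):
    """Plain-numpy QWK; sklearn's cohen_kappa_score(weights='quadratic')
    computes the same metric."""
    conf = np.zeros((n_classes, n_classes))
    for i, j in zip(rater_a, rater_b):
        conf[i, j] += 1
    idx = np.arange(n_classes)
    w = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    expected = np.outer(conf.sum(axis=1), conf.sum(axis=0)) / conf.sum()
    return 1 - (w * conf).sum() / (w * expected).sum()

# The thresholds themselves are then tuned (e.g. with scipy.optimize.minimize
# or a coordinate-wise grid search) to maximise QWK on validation folds.
```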

&lt;h3 id=&quot;hardware&quot;&gt;Hardware&lt;/h3&gt;
&lt;p&gt;We used 1x 2080, 1x Tesla V40, and 1x 1070ti. The final submission was an ensemble of the models listed above.&lt;/p&gt;

&lt;h3 id=&quot;team&quot;&gt;Team&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.kaggle.com/mamatml&quot;&gt;Mamat Shamshiev&lt;/a&gt;, &lt;a href=&quot;https://www.kaggle.com/insaff&quot;&gt;Insaf Ashrapov&lt;/a&gt;, &lt;a href=&quot;https://www.kaggle.com/mnikita&quot;&gt;Mishunyayev Nikita&lt;/a&gt;&lt;/p&gt;</content><author><name></name></author><summary type="html">Top 3% (76/2943) solution write-up for the Kaggle APTOS 2019 Blindness Detection. Imagine being able to detect blindness before it happened. Millions of people suffer from diabetic retinopathy, the leading cause of blindness among working aged adults</summary></entry><entry><title type="html">Road detection using segmentation models and albumentations libraries on Keras</title><link href="https://diyago.github.io/2019/08/25/road-detection.html" rel="alternate" type="text/html" title="Road detection using segmentation models and albumentations libraries on Keras" /><published>2019-08-25T00:00:00+00:00</published><updated>2019-08-25T00:00:00+00:00</updated><id>https://diyago.github.io/2019/08/25/road-detection</id><content type="html" xml:base="https://diyago.github.io/2019/08/25/road-detection.html">&lt;p&gt;&lt;em&gt;In this article, I will show how to write own data generator and how to use albumentations as augmentation library. Along with segmentation_models library, which provides dozens of pretrained heads to Unet and other unet-like architectures. For the full code go to Github. Link to dataset.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;original-medium-post&quot;&gt;Original &lt;a href=&quot;https://towardsdatascience.com/road-detection-using-segmentation-models-and-albumentations-libraries-on-keras-d5434eaf73a8&quot;&gt;Medium post&lt;/a&gt;&lt;/h4&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;theory&quot;&gt;Theory&lt;/h2&gt;

&lt;p&gt;The task of semantic image segmentation is to label each pixel of an image with a corresponding class of what is being represented. For such a task, the Unet architecture, with a variety of improvements, has shown the best results. The core idea behind it is just a few convolution blocks, which extract deep image features of different types, followed by so-called deconvolution or upsampling blocks, which restore the initial shape of the input image. Besides, after the convolution layers we have skip-connections, which help the network remember the initial image and fight vanishing gradients. For more detailed information you can read the arxiv article or another article.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/segmentation_road/unet.png&quot; alt=&quot;unet.png&quot; /&gt;
&lt;em&gt;Vanilla U-Net https://arxiv.org/abs/1505.04597&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We came for practice, so let’s get to it.&lt;/p&gt;

&lt;h2 id=&quot;datasetsatellite-images&quot;&gt;Dataset—satellite images&lt;/h2&gt;

&lt;p&gt;For segmentation we don’t need much data to start getting a decent result; even 100 annotated photos will be enough. For now, we will be using the Massachusetts Roads Dataset from https://www.cs.toronto.edu/~vmnih/data/; there are about 1100+ annotated train images, and they even provide validation and test datasets. Unfortunately, there is no download button, so we have to use a script. This script will get the job done (it might take some time to complete).
Let’s take a look at some image examples:
&lt;img src=&quot;/images/segmentation_road/input_data.png&quot; alt=&quot;input_data.png&quot; /&gt;
&lt;em&gt;Massachusetts Roads Dataset image and ground truth mask ex.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Annotation and image quality seem to be pretty good, so the network should be able to detect roads.&lt;/p&gt;

&lt;h2 id=&quot;libraries-installation&quot;&gt;Libraries installation&lt;/h2&gt;

&lt;p&gt;First of all, you need Keras with TensorFlow installed. For Unet construction, we will be using Pavel Yakubovskiy’s library called segmentation_models, and for data augmentation the albumentations library. I will write about them in more detail later. Both libraries get updated pretty frequently, so I prefer to install them directly from git.&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;go&quot;&gt;conda install -c conda-forge keras
pip install git+https://github.com/qubvel/efficientnet
pip install git+https://github.com/qubvel/classification_models.git
pip install git+https://github.com/qubvel/segmentation_models
pip install git+https://github.com/albu/albumentations
pip install tta-wrapper
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;defining-data-generator&quot;&gt;Defining data generator&lt;/h2&gt;

&lt;p&gt;As a data generator, we will be using our custom generator. It should inherit from keras.utils.Sequence and define the following methods:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__init__&lt;/code&gt; (class initializing)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__len__&lt;/code&gt; (return the length of the dataset)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;on_epoch_end&lt;/code&gt; (behavior at the end of epochs)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__getitem__&lt;/code&gt; (generate a batch for feeding into the network)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One main advantage of using a custom generator is that you can work with data in any format you have and do whatever you want; just don’t forget to generate the desired output (batch) for Keras.&lt;/p&gt;

&lt;p&gt;Here we define the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__init__&lt;/code&gt; method. The main part of it is setting the paths for images (self.image_filenames) and mask names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self.mask_names&lt;/code&gt;). Don’t forget to sort them, because the mask corresponding to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self.image_filenames[i]&lt;/code&gt; should be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;self.mask_names[i]&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'../data/val_test'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_folder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'img/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask_folder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'masks/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
             &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;768&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
             &lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
             &lt;span class=&quot;n&quot;&gt;suffle&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_filenames&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listdir_fullpath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_folder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask_names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listdir_fullpath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask_folder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt;
    &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;suffle&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;suffle&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;listdir_fullpath&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;listdir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The next important thing is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__getitem__&lt;/code&gt;. Usually, we cannot store all images in RAM, so every time we generate a new batch of data we should read the corresponding images. Below we define the method for training. For that, we create an empty numpy array (np.empty), which will store the images and masks. Then we read the images with the read_image_mask method and apply augmentation to each image-mask pair. Eventually, we return a batch (X, y), which is ready to be fed into the network.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__getitem__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;data_index_min&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;data_index_max&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_filenames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

      &lt;span class=&quot;n&quot;&gt;indexes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_filenames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_index_min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_index_max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;this_batch_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;indexes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# The last batch can be smaller than the others
&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;this_batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;this_batch_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sample_index&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;enumerate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;indexes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

          &lt;span class=&quot;n&quot;&gt;X_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_sample&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_image_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_filenames&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; 
                                                  &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

          &lt;span class=&quot;c1&quot;&gt;# if augmentation is defined, we assume its a train set
&lt;/span&gt;          &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;

              &lt;span class=&quot;c1&quot;&gt;# Augmentation code
&lt;/span&gt;              &lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_sample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;image_augm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'image'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;mask_augm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'mask'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
              &lt;span class=&quot;c1&quot;&gt;# divide by 255 to normalize images from 0 to 1
&lt;/span&gt;              &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_augm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask_augm&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
              &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;test_generator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataGeneratorFolder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'./data/road_segmentation_ideal/training'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                           &lt;span class=&quot;n&quot;&gt;image_folder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'input/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                           &lt;span class=&quot;n&quot;&gt;mask_folder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'output/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                           &lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;train_generator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataGeneratorFolder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'./data/road_segmentation_ideal/training'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                                      &lt;span class=&quot;n&quot;&gt;image_folder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'input/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                                      &lt;span class=&quot;n&quot;&gt;mask_folder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'output/'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                                      &lt;span class=&quot;n&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                      &lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;512&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                      &lt;span class=&quot;n&quot;&gt;nb_y_features&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmentation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aug_with_crop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
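&lt;p&gt;One detail these generators rely on: a Keras Sequence also needs a __len__ method reporting the number of batches per epoch, which is just a ceiling division (a minimal sketch with hypothetical names, mirroring the min() clipping in __getitem__ above):&lt;/p&gt;

```python
import math

def num_batches(n_images, batch_size):
    # Batches per epoch; the last batch may be smaller,
    # which is why __getitem__ clips data_index_max with min(...)
    return math.ceil(n_images / batch_size)

print(num_batches(10, 4))  # 3 batches: sizes 4, 4 and 2
```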

&lt;h2 id=&quot;data-augmentation--albumentations&quot;&gt;Data augmentation — albumentations&lt;/h2&gt;

&lt;p&gt;Data augmentation is a strategy that lets you significantly increase the diversity of data available for training models without actually collecting new data. It helps to prevent over-fitting and makes the model more robust.
There are plenty of libraries for this task: imgaug, Augmentor, solt, the built-in methods of Keras/PyTorch, or you can write custom augmentations with the OpenCV library. However, I highly recommend the albumentations library. It’s super fast and convenient to use. For usage examples, go to the official repository or take a look at the example notebooks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/segmentation_road/segmentation_output.png&quot; alt=&quot;segmentation_output.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In our task, we will be using basic augmentations such as flips and contrast changes, along with non-trivial ones such as ElasticTransform. Examples of them are shown in the image above.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;aug_with_crop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;crop_prob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Compose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;RandomCrop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;crop_prob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;HorizontalFlip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;VerticalFlip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;RandomRotate90&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Transpose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ShiftScaleRotate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shift_limit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scale_limit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.04&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rotate_limit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;RandomBrightnessContrast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;RandomGamma&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;IAAEmboss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.25&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Blur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blur_limit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;OneOf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ElasticTransform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha_affine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.03&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;GridDistortion&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;OpticalDistortion&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distort_limit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shift_limit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After defining the desired augmentation, you can easily get your augmented output like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aug_with_crop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image_aug&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'image'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mask_aug&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;augmented&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'mask'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;callbacks&quot;&gt;Callbacks&lt;/h2&gt;

&lt;p&gt;We will be using common callbacks:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;ModelCheckpoint — saves the weights of the model during training&lt;/li&gt;
  &lt;li&gt;ReduceLROnPlateau — reduces the learning rate if the validation metric stops improving&lt;/li&gt;
  &lt;li&gt;EarlyStopping — stops training once the validation metric has stopped improving for several epochs&lt;/li&gt;
  &lt;li&gt;TensorBoard — a great way to monitor training progress&lt;/li&gt;
&lt;/ul&gt;
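&lt;p&gt;To make the plateau behaviour concrete, here is a simplified pure-Python model of the reduce-on-plateau idea (an illustration of the logic only, not Keras’s actual ReduceLROnPlateau implementation, which additionally supports a cooldown period):&lt;/p&gt;

```python
def reduce_on_plateau(metric_history, lr=1e-3, factor=0.1, patience=10, min_lr=1e-6):
    # Scale lr down by `factor` each time the monitored metric fails
    # to improve for `patience` consecutive epochs (floored at min_lr).
    best = float('-inf')
    wait = 0
    for value in metric_history:
        if value > best:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr

# one improvement, then 12 stagnant epochs: lr drops once, from 1e-3 to about 1e-4
history = [0.5] + [0.4] * 12
final_lr = reduce_on_plateau(history, lr=1e-3, patience=10)
```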

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;keras.callbacks&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ModelCheckpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ReduceLROnPlateau&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EarlyStopping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TensorBoard&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# reduces learning rate on plateau
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lr_reducer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ReduceLROnPlateau&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;factor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;cooldown&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;patience&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;min_lr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.1e-5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# model autosave callbacks
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode_autosave&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ModelCheckpoint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;./weights/road_crop.efficientnetb0imgsize.h5&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                                &lt;span class=&quot;n&quot;&gt;monitor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'val_iou_score'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
                                &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'max'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;save_best_only&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;period&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# stop training when the validation metric stops improving
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;early_stopping&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EarlyStopping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;patience&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'auto'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; 

&lt;span class=&quot;c1&quot;&gt;# tensorboard for monitoring logs
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tensorboard&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TensorBoard&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;log_dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'./logs/tenboard'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;histogram_freq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                          &lt;span class=&quot;n&quot;&gt;write_graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;write_images&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;callbacks&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode_autosave&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lr_reducer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tensorboard&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;early_stopping&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;training&quot;&gt;Training&lt;/h2&gt;

&lt;p&gt;As the model, we will be using Unet. The easiest way to use it is to take it from the segmentation_models library.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;backbone_name: the name of the classification model to use as the encoder. EfficientNet is currently state-of-the-art among classification models, so let us try it. While it should give faster inference and has fewer trainable parameters, it consumes more GPU memory than the well-known ResNet models. There are many other options to try&lt;/li&gt;
  &lt;li&gt;encoder_weights — using ImageNet weights speeds up training&lt;/li&gt;
  &lt;li&gt;encoder_freeze: if True, sets all layers of the encoder (backbone model) as non-trainable. It can be useful to freeze the encoder at first, train the model, and then unfreeze it&lt;/li&gt;
  &lt;li&gt;decoder_filters — you can specify the number of filters in the decoder blocks. In some cases, a heavier encoder with a simplified decoder might be useful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After initializing the Unet model, you should compile it. We also set IOU (intersection over union) as the metric to monitor and bce_jaccard_loss (binary cross-entropy plus Jaccard loss) as the loss to optimize. I gave links above, so I won’t go into further detail on them here.
&lt;img src=&quot;/images/segmentation_road/tens_logs.png&quot; alt=&quot;tens_logs.png&quot; /&gt;
&lt;em&gt;Tensorboard logs&lt;/em&gt;&lt;/p&gt;
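&lt;p&gt;Since the post links out for the details, a minimal NumPy sketch of the two ingredients may still help (an illustrative definition, not the segmentation_models implementation; the IOU here is the soft/continuous variant):&lt;/p&gt;

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # binary cross-entropy averaged over pixels
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def jaccard_loss(y_true, y_pred, eps=1e-7):
    # 1 minus the soft IOU (intersection over union)
    inter = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - inter
    return float(1 - (inter + eps) / (union + eps))

def bce_jaccard(y_true, y_pred):
    # the combined objective: binary cross-entropy plus Jaccard loss
    return bce(y_true, y_pred) + jaccard_loss(y_true, y_pred)
```

A perfect prediction drives the Jaccard term to zero, while the cross-entropy term keeps gradients well-behaved for individual pixels.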

&lt;p&gt;After starting training, you can watch the TensorBoard logs. As we can see, the model trains pretty well; even after 50 epochs we haven’t reached a global/local optimum.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/segmentation_road/metrics.png&quot; alt=&quot;metrics.png&quot; /&gt;
&lt;em&gt;Loss and IOU metric history&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;inference&quot;&gt;Inference&lt;/h3&gt;

&lt;p&gt;So we have 0.558 IOU on validation, but every pixel with a predicted probability higher than 0 is counted as mask. By picking an appropriate threshold, we can further improve the result by 0.039 (7%).
&lt;img src=&quot;/images/segmentation_road/inference_code.png&quot; alt=&quot;inference_code.png&quot; /&gt;
&lt;em&gt;Validation threshold adjusting&lt;/em&gt;&lt;/p&gt;
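&lt;p&gt;The threshold sweep itself is a one-liner once you have predictions; a sketch with synthetic stand-ins for the validation masks and predicted probabilities (all names and data here are hypothetical):&lt;/p&gt;

```python
import numpy as np

def iou(y_true, y_pred, eps=1e-7):
    inter = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - inter
    return (inter + eps) / (union + eps)

# Hypothetical validation set: 4 ground-truth masks and noisy probabilities
# that are higher on true-mask pixels.
rng = np.random.default_rng(0)
y_true = (rng.random((4, 64, 64)) > 0.5).astype(float)
y_prob = y_true * 0.6 + rng.random((4, 64, 64)) * 0.4

# Sweep thresholds and keep the one with the best validation IOU.
thresholds = np.arange(0.1, 0.9, 0.05)
scores = [iou(y_true, (y_prob > t).astype(float)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
```

&lt;p&gt;On real predictions, the same loop over the validation set is what produces the gain reported above.&lt;/p&gt;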

&lt;p&gt;&lt;img src=&quot;/images/segmentation_road/finish.png&quot; alt=&quot;finish.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Metrics are certainly interesting, but the model's predictions are much more insightful. From the images we see that our network has picked up the task quite well, which is great. For the inference code and the metric calculation, you can read the full code.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@phdthesis{MnihThesis,
    author = {Volodymyr Mnih},
    title = {Machine Learning for Aerial Image Labeling},
    school = {University of Toronto},
    year = {2013}
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name></name></author><summary type="html">In this article, I will show how to write own data generator and how to use albumentations as augmentation library. Along with segmentation_models library, which provides dozens of pretrained heads to Unet and other unet-like architectures. For the full code go to Github. Link to dataset.</summary></entry><entry><title type="html">Poster: Automatic salt deposits segmentation: A deep learning approach</title><link href="https://diyago.github.io/2019/06/20/salt-poster.html" rel="alternate" type="text/html" title="Poster: Automatic salt deposits segmentation: A deep learning approach" /><published>2019-06-20T00:00:00+00:00</published><updated>2019-06-20T00:00:00+00:00</updated><id>https://diyago.github.io/2019/06/20/salt-poster</id><content type="html" xml:base="https://diyago.github.io/2019/06/20/salt-poster.html">&lt;p&gt;&lt;em&gt;Being honored to present a poster about image segmentation at the last international summit, Machines Can See 2019 , Moscow, Russia #deeplearning #cv #poster&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/kaggle-salt/poster.png&quot; alt=&quot;poster.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;-&amp;gt; &lt;strong&gt;10 th&lt;/strong&gt;
&lt;img src=&quot;/images/kaggle-salt/plan.png&quot; alt=&quot;plan.png&quot; /&gt;&lt;/p&gt;</content><author><name></name></author><summary type="html">Being honored to present a poster about image segmentation at the last international summit, Machines Can See 2019 , Moscow, Russia #deeplearning #cv #poster</summary></entry></feed>