Adding the skin_mnist task to the default tasks #579

Merged
s314cy merged 1 commit into develop from skin_mnist_task
Jul 6, 2023

@walidabn
Collaborator

  • As part of our dogfooding, we added a simple classification task, skin_mnist, which classifies skin diseases into 7 categories. The data comes from the HAM10000 dataset, which is publicly available on Kaggle and the Harvard Dataverse.

@s314cy force-pushed the skin_mnist_task branch 2 times, most recently from cc4fa7c to 0f45c27 on June 29, 2023 08:48
@s314cy
Contributor

s314cy commented Jun 29, 2023

to be merged after #578 has been reviewed

@s314cy force-pushed the skin_mnist_task branch from 0f45c27 to 31c12e5 on July 6, 2023 08:07
@s314cy self-requested a review on July 6, 2023 08:30
@s314cy force-pushed the skin_mnist_task branch from 31c12e5 to fb8c48e on July 6, 2023 08:33
@s314cy merged commit e1b67f1 into develop on Jul 6, 2023
@s314cy deleted the skin_mnist_task branch on July 6, 2023 08:36
@martinjaggi
Member

Can we get more documentation on this new task (and, one level higher, on how easily or not this went when following the documentation on how to add new tasks)?

Documentation needed on this task:

  • reference the dataset more precisely for full reproducibility. a paper would be best but links can work too (problem of permanence)
  • more details on model and train/test split
  • are there published accuracy numbers on this dataset in the literature, and how do they compare to yours here?

@walidabn
Collaborator Author

Task documentation

  • The Dataset : We trained our model on the HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. In practice, the dataset is sometimes also referred to as "Skin MNIST". The dataset and its documentation are available on the Harvard Dataverse (https://doi.org/10.7910/DVN/DBW86T), with the following BibTeX citation.
    @data{DVN/DBW86T_2018,
    author = {Tschandl, Philipp},
    publisher = {Harvard Dataverse},
    title = {{The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions}},
    UNF = {UNF:6:KCZFcBLiFE5ObWcTc2ZBOA==},
    year = {2018},
    version = {V4},
    doi = {10.7910/DVN/DBW86T},
    url = {https://doi.org/10.7910/DVN/DBW86T}
    }

The dataset is also thoroughly described in this paper

  • Model : We first used AutoKeras to determine which model would be a good fit for this image classification problem. Given a simple problem with a small dataset (here, an image classification task), AutoKeras tries out a set of different models, trains and tests each of them for a given number of epochs (here, 50), and returns the model that achieves the highest test accuracy.
    This results in a convolutional neural network with 3 Conv2D layers with ReLU activation, with 256, 128 and 64 filters respectively, each using (3,3) kernels. Each convolutional layer is followed by a (2,2) max pooling layer and a dropout layer with a dropout probability of 0.3. After the third convolutional block, the data goes through a Dense layer with 32 units, followed by a softmax layer with 7 outputs, since the task is a 7-class classification problem. We split the train set equally between 3 clients in a federated setup. Each client has the same label distribution (i.e. each peer has the same number of examples for each of the 7 labels).

  • Results : We compared the results obtained in this task with those of the same model and hyperparameters in a single-participant setting, where one participant pools all the data (the global case); this achieves 69.4% accuracy. In our federated setup with 3 clients, we reach 56% accuracy. State-of-the-art models such as RegNetY-320 reach 91% accuracy on the task. Earlier results from AlexNet and InceptionV3 reached 85% accuracy. The current model is a simple convolutional neural network with only 3 convolutional layers and no preprocessing of any kind, which explains the gap between state-of-the-art models and our model.
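
For reference, the equal, label-balanced split across 3 clients described under Model could be sketched roughly as follows. This is only an illustration in plain Python, not the code used in this PR: the function name `split_equally_by_label`, the seed, and the choice to drop leftover examples when a class does not divide evenly by the number of clients are all assumptions.

```python
from collections import defaultdict
import random


def split_equally_by_label(labels, num_clients=3, seed=0):
    """Return one list of example indices per client.

    Every client receives the same number of examples for each label.
    Hypothetical helper: examples that do not divide evenly are dropped.
    """
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        per_client = len(indices) // num_clients  # remainder is dropped
        for c in range(num_clients):
            clients[c].extend(indices[c * per_client:(c + 1) * per_client])
    return clients


# Toy usage: 70 examples, 10 for each of the 7 labels (made-up data).
toy_labels = [i % 7 for i in range(70)]
client_indices = split_equally_by_label(toy_labels)
```

With the toy labels above, each of the 3 clients ends up with 3 examples per label, and one example per label is left unused. Other ways of handling the remainder (for instance giving it to one client) would break the strict per-label equality described above, which is why the sketch drops it.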
