<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Merve Noyan on Medium]]></title>
        <description><![CDATA[Stories by Merve Noyan on Medium]]></description>
        <link>https://medium.com/@merveenoyan?source=rss-763008f53ee9------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*zW4mUQSjHBHYZyix0iJ19Q@2x.jpeg</url>
            <title>Stories by Merve Noyan on Medium</title>
            <link>https://medium.com/@merveenoyan?source=rss-763008f53ee9------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 19 Apr 2026 15:25:52 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@merveenoyan/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Open-Source Text Generation & Conversational AI Ecosystem in Hugging Face]]></title>
            <link>https://merveenoyan.medium.com/open-source-text-generation-conversational-ai-ecosystem-in-hugging-face-39f1def88636?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/39f1def88636</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[conversational-ai]]></category>
            <category><![CDATA[open-source]]></category>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Fri, 23 Jun 2023 11:23:18 GMT</pubDate>
            <atom:updated>2023-06-23T11:24:56.732Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Photo by Yuriy Kovalev on Unsplash" src="https://cdn-images-1.medium.com/max/1024/0*MnKnXTApE651hibG" /></figure><p>Text generation and conversational technologies have been around for ages. With the recent boom of text generation models like GPT-4 and open-source alternatives (Falcon, MPT and more!) going mainstream, these technologies will stick around and become more integrated into everyday products. In this post, I’ll go through a brief background on how they work, the types of text generation models, the tools in the Hugging Face ecosystem that enable building products on open-source alternatives, and the challenges and questions we face and how we respond to them.</p><p><strong>Small Background on Text Generation</strong></p><p>Text generation models are essentially trained with the objective of completing text. An early challenge in working with these models was controlling both the coherence and the diversity of the text through inference parameters and discriminative biases. Outputs that sounded more coherent were less creative and closer to the original training data, and wouldn’t sound like something a human would say. Recent developments overcame these challenges, and user-friendly UIs enabled everyone to try these models out.</p><p>Having more open-source text generation models to choose from lets companies keep their data private (it’s part of their intellectual property!), adapt models to their domains faster, and cut inference costs instead of relying on closed paid APIs.</p><p>Simply put, these models are first trained with the objective of text completion, and later optimized through a process called reinforcement learning from human feedback (RLHF). This optimization is mainly made over how natural and coherent the text sounds, rather than the validity of the answer. You can get more information about this process <a href="https://huggingface.co/blog/rlhf">here</a>. 
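</p><p>As a minimal illustration of those inference parameters, here is a plain-Python sketch (not any particular library’s API) of temperature scaling, one of the knobs that trades coherence against diversity when sampling the next token:</p>

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw next-token scores into sampling probabilities.
    Low temperature sharpens the distribution (more 'coherent' picks);
    high temperature flattens it (more diverse picks)."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# sharp[0] > flat[0]: lowering the temperature concentrates probability
# on the highest-scoring token; raising it spreads probability out.
```

<p>Real generation pipelines combine this with strategies like top-k or nucleus sampling.</p><p>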
In this post, we will not go through the details of this.</p><p>One thing you need to know about before we move on is fine-tuning. This is the process of taking a very large model and transferring the knowledge contained in it to a use case, a downstream task. These tasks can come in the form of instructions. As the model size grows, models generalize better to instructions that do not exist in the fine-tuning data.</p><p>As of now, there are two main types of text generation models. Models that complete a given text are referred to as causal language models, illustrated below. The best-known examples are GPT-3 and BLOOM. These models are trained on large amounts of text where the latter part of the text is masked, such that the model learns to complete the given text.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/902/1*UBVrCtdBQt-UwNR9R8OrDA.png" /></figure><p>All causal language models on the Hugging Face Hub can be found <a href="https://huggingface.co/models?pipeline_tag=text-generation">here</a>.</p><p>The second type of text generation model is commonly referred to as a text-to-text generation model. These models are trained on text pairs, which can be questions and answers, or instructions and responses. The most popular ones are T5 and BART (which, as of now, aren’t state-of-the-art). Google has recently released the FLAN-T5 series of models. FLAN is a recent technique developed for instruction fine-tuning, and FLAN-T5 is essentially T5 fine-tuned using FLAN. As of now, the FLAN-T5 series of models are state-of-the-art and open-source, available on the <a href="https://huggingface.co/models?search=google/flan">Hugging Face Hub</a>. 
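</p><p>A plain-Python sketch (illustrative only; real pipelines work with tokenizers and tensors) of how training examples differ between the two families:</p>

```python
# Causal LM: one stream of tokens; the label at each position is simply
# the next token in the stream.
def causal_example(token_ids):
    return {"inputs": token_ids[:-1], "labels": token_ids[1:]}

# Text-to-text: explicit (source, target) pairs, e.g. a question and its
# answer, fed to the encoder and decoder respectively.
def text_to_text_example(source_ids, target_ids):
    return {"inputs": source_ids, "labels": target_ids}

stream = [5, 8, 13, 21]                        # toy token ids
print(causal_example(stream))                  # labels are inputs shifted by one
print(text_to_text_example([5, 8], [13, 21]))  # source and target stay separate
```

<p>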
Below you can see an illustration of how these models work.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UBtE3hT35cpYJtbX.png" /><figcaption>Picture taken from FLAN-T5</figcaption></figure><p>The GPT-3 model itself is a causal language model, while the models in the backend of ChatGPT (the UI for the GPT-series models) are fine-tuned through RLHF on prompts that can consist of conversations or instructions. It’s an important distinction to make between these models. On the Hugging Face Hub, you can find causal language models, text-to-text models, and causal language models fine-tuned on instructions (we’ll link to some later in this blog post).</p><p>Snippets to use these models are given in either the model repository or the documentation page for that model type on Hugging Face.</p><p><strong>Licensing</strong></p><p>Many of the available text generation models are either closed-source or have licenses that limit commercial use. As of now, the <a href="https://huggingface.co/mosaicml/mpt-30b">MPT-30B</a> and <a href="https://huggingface.co/tiiuae/falcon-40b">Falcon</a> models are fully open-source, with open-source-friendly licenses (Apache 2.0) that allow commercial use. These are causal language models, and versions fine-tuned on various instruction datasets exist on the Hugging Face Hub in a range of sizes depending on your needs.</p><p><a href="https://huggingface.co/mosaicml/mpt-30b-chat">MPT-30B-Chat</a> has a CC-BY-NC-SA license (non-commercial use only), while <a href="https://huggingface.co/mosaicml/mpt-30b-instruct">MPT-30B-Instruct</a> has a CC-BY-SA 3.0 license that allows commercial use. <a href="https://huggingface.co/tiiuae/falcon-7b-instruct">Falcon-7B-Instruct</a> has an Apache 2.0 license that allows commercial use. Another popular model is OpenAssistant, built on Meta’s LLaMA model. 
LLaMA has a restrictive license, so OpenAssistant checkpoints built on LLaMA don’t have fully open-source licenses, but there are other OpenAssistant models built on open-source models like <a href="https://huggingface.co/models?search=openassistant/falcon">Falcon</a> or <a href="https://huggingface.co/models?search=openassistant/pythia">pythia</a> that can be used.</p><p>Some of the existing instruction datasets are either crowd-sourced or built from the outputs of existing models (e.g. the models behind ChatGPT). The Alpaca dataset created at Stanford was generated from the outputs of the models behind ChatGPT, and OpenAI doesn’t allow those outputs to be used for training models. Moreover, there are various crowd-sourced instruction datasets with open-source licenses, like <a href="https://huggingface.co/datasets/OpenAssistant/oasst1">oasst1</a> (created by thousands of people voluntarily!) or <a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k">databricks/databricks-dolly-15k</a>. Models fine-tuned on these datasets can be distributed.</p><p><strong>How can you use these models?</strong></p><p>Response times and handling concurrent users remain a challenge for serving these models. To address this, Hugging Face has released <a href="https://github.com/huggingface/text-generation-inference">text-generation-inference</a> (TGI), an open-source serving solution for large language models built with Rust, Python and gRPC.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dr8U1F4Yt-84LJa0Cgve9A.png" /><figcaption>Screenshot from Hugging Chat 🌈</figcaption></figure><p>TGI currently powers <a href="https://huggingface.co/chat/">HuggingChat</a>, the chat UI for large language models. It currently uses OpenAssistant as its backend model. You can chat as much as you want with HuggingChat, and enable the search feature for validated responses. You can also give feedback on each response so that model authors can train better models. 
The UI of HuggingChat is also <a href="https://github.com/huggingface/chat-ui">open-sourced</a> (yes 🤯), and soon there will be a Docker image released on <a href="https://huggingface.co/spaces">Hugging Face Spaces</a> (an app store for machine learning), so you can have your very own HuggingChat instance.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fSfpwKO221K9o2ds_y1BEQ.png" /></figure><p><strong>How to find the best model as of now?</strong></p><p>Hugging Face hosts an LLM leaderboard <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">here</a>. People upload their models, metrics that evaluate the text generation task are computed on Hugging Face’s clusters, and the results are added to the leaderboard. If you can’t find the language or domain you’re looking for, you can filter models <a href="https://huggingface.co/models?pipeline_tag=text-generation&amp;sort=downloads">here</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kdDcXeFsSVjw4iTAt9MWwQ.png" /></figure><p><strong>Models created with love by Hugging Face with BigScience and BigCode</strong></p><p>Hugging Face has two main large language models, <a href="https://huggingface.co/bigscience/bloom">BLOOM</a> 🌸 and <a href="https://huggingface.co/bigcode/starcoder">StarCoder</a> 🌟. StarCoder is a causal language model trained on code from GitHub (80+ programming languages 🤯). It’s not fine-tuned on instructions, so it serves more as a coding assistant that completes a given piece of code, e.g. translating Python to C++, explaining concepts (what’s recursion?) or acting as a terminal. You can try all of the StarCoder checkpoints in <a href="https://huggingface.co/spaces/bigcode/bigcode-playground">this application</a>. 
It also comes with a <a href="https://marketplace.visualstudio.com/items?itemName=HuggingFace.huggingface-vscode">VSCode extension</a>.</p><p>BLOOM is a causal language model trained on 46 natural languages and 13 programming languages. It is the first open-source model to have more parameters than GPT-3. You can find the available checkpoints in the <a href="https://huggingface.co/docs/transformers/model_doc/bloom">BLOOM documentation</a>.</p><p><strong>Bonus: Parameter-Efficient Fine-Tuning (PEFT)</strong></p><p>If you’d like to fine-tune one of the existing large models on your own instruction dataset, it is nearly impossible to do so on consumer hardware and later deploy the result (since instruction-tuned models are the same size as the original checkpoints used for fine-tuning). <a href="https://github.com/huggingface/peft">PEFT</a> is a library that lets you fine-tune a small subset of the parameters for more efficiency. With PEFT, you can do low-rank adaptation (LoRA), prefix tuning, prompt tuning and p-tuning.</p><p>That’s all for this blog post; I’m planning to write another one as new tools and models are released. Please let me know what you think or build!</p><p><strong>Further Resources</strong></p><ul><li>AWS has released TGI-based LLM deployment deep learning containers called LLM Inference Containers; read about them <a href="https://aws.amazon.com/tr/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/">here</a></li><li><a href="https://huggingface.co/tasks/text-generation">Text Generation task page</a> to find out more about the task itself</li><li>PEFT announcement <a href="https://huggingface.co/blog/peft">blog post</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=39f1def88636" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Complete Guide on Deep Learning Architectures Part 2: Autoencoders]]></title>
            <link>https://merveenoyan.medium.com/complete-guide-on-deep-learning-architectures-part-2-autoencoders-293351bbe027?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/293351bbe027</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Sun, 11 Jun 2023 16:05:15 GMT</pubDate>
            <atom:updated>2023-06-11T16:05:15.074Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hQhM-0Vg3mhepFBR6YnMvg@2x.jpeg" /><figcaption>Photo by Daniele Levis Pelusi on Unsplash</figcaption></figure><h3>Autoencoder: Basic Ideas</h3><p>An autoencoder is a type of neural network that reconstructs its input at the output. The basic idea is that we compress our inputs in such a way that we keep the most important features needed to reconstruct them.</p><p>As humans, when we’re asked to draw a tree with the fewest possible strokes (given that we’ve seen so many trees in our lifetime), we draw a line for the trunk and a couple of branches on top as an abstraction of what trees look like. This is what an autoencoder does.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/921/1*oEeRB1379pwMwIs-FhUj4g@2x.jpeg" /></figure><p>A typical autoencoder looks like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/726/1*uS5oZwfArv5uoHn4YxHL0w@2x.jpeg" /></figure><p>Let’s take a concrete case: image reconstruction.</p><p>We have an input layer with 784 units (assuming we feed in 28x28 images); we could simply stack a layer with 28 units on top of it, and our output layer will have 784 units again.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/953/1*axGaAP8RaRunTukpUm40ww@2x.jpeg" /></figure><p>The first part is called the “encoder”; it encodes our inputs as latent variables. The second part is called the “decoder”; it reconstructs our inputs from the latent variables.</p><p>A hidden layer with fewer units is enough to do the compression and obtain the latent variables. This is called an “undercomplete autoencoder” (there are other types of autoencoders too; this one just gives the main idea, and we’ll go over the others as well). 
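</p><p>A quick numpy sketch of the shapes in this 784 → 28 → 784 example (random, untrained weights, just to see the compression):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))                       # one flattened 28x28 image
W_enc = rng.standard_normal((784, 28)) * 0.01  # encoder weights
W_dec = rng.standard_normal((28, 784)) * 0.01  # decoder weights

latent = np.maximum(0, x @ W_enc)                # encode: 784 -> 28 (ReLU)
recon = 1.0 / (1.0 + np.exp(-(latent @ W_dec)))  # decode: 28 -> 784 (sigmoid)

print(latent.shape)  # (1, 28)  -- the compressed representation
print(recon.shape)   # (1, 784) -- same size as the input
```

<p>Training adjusts the weights so that the reconstruction matches the input; the shapes stay exactly as shown.</p><p>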
So in short, it’s just another feed-forward neural network with the following characteristics:</p><ul><li>an input layer, a hidden layer with fewer units, and an output layer</li><li>it’s unsupervised: we pass our inputs through, get the output, and compare it with the input again</li><li>our loss function compares the input with its compressed-then-reconstructed version to judge whether the model is successful</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/572/1*CG_KkuRV-OPYz-FYDdU5bw@2x.jpeg" /></figure><ul><li>An autoencoder whose decoder is a single linear layer essentially does the same thing as principal component analysis (even though the training objective is to copy the input).</li><li>Another core concept of autoencoders is weight tying: the weights of the decoder are tied to the weights of the encoder. When you transpose the weight matrix of the encoder, you get the weights of the decoder. It’s common practice to tie decoder weights to encoder weights; this saves memory (fewer parameters) and reduces overfitting. I tried to explain my intuition below in multiple graphics. Let’s take a look.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/935/1*5VHIQEj6XDXj2eZSsPztmQ@2x.jpeg" /></figure><p>For the autoencoder below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/895/1*ASGlljkrhhMu4Tkg5rGEaw@2x.jpeg" /></figure><p>The weight matrices of the encoder and decoder look like this (you can skip this if you know what transpose means):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mb9DlVmgqFKfUkWfOxrGyA@2x.jpeg" /></figure><h3>Keras Implementation</h3><p>Let’s implement the network above using the Keras Subclassing API. 
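</p><p>As a quick aside before the code: the weight-tying idea above can be sketched in numpy, where the decoder matrix is just a transposed view of the encoder’s:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((784, 28))  # encoder: 784 inputs -> 28 latents

W_dec = W_enc.T  # decoder: 28 latents -> 784 outputs, no new parameters

# The transpose is a view, not a copy: the decoder "weights" share the
# encoder's memory, which is where the parameter savings come from.
print(W_dec.shape)          # (28, 784)
print(W_dec.base is W_enc)  # True
```

<p>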
I left comments for each layer to walk you through how to create it.</p><pre><br>import tensorflow as tf<br>from tensorflow.keras import layers, losses<br>from tensorflow.keras.models import Model<br><br>class Autoencoder(Model):<br><br>  def __init__(self, latent_dim):<br>    super(Autoencoder, self).__init__()<br>    self.latent_dim = latent_dim<br><br>    # define our encoder and decoder with the Sequential API<br>    # flatten the image and pass it to the latent layer to encode<br>    self.encoder = tf.keras.Sequential([<br>      layers.Flatten(),<br>      layers.Dense(latent_dim, activation='relu'),<br>    ])<br><br>    # reconstruct latent outputs with another dense layer<br>    # reshape back to the image size<br>    self.decoder = tf.keras.Sequential([<br>      layers.Dense(784, activation='sigmoid'),<br>      layers.Reshape((28, 28))<br>    ])<br><br>  # give the input to the encoder and pass encoder outputs (latents) to the decoder<br>  def call(self, x):<br>    encoded = self.encoder(x)<br>    decoded = self.decoder(encoded)<br>    return decoded<br><br># initialize the model with a latent dimension of 128<br>autoencoder = Autoencoder(128)<br><br># we can use a simple MSE loss to compare the input with the reconstruction<br>autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())<br><br># we don't have a y_train, given that we want the output to be the same as the input :) <br>autoencoder.fit(x_train, x_train,<br>                epochs=10,<br>                shuffle=True,<br>                validation_data=(x_test, x_test))</pre><p>In contrast to the undercomplete autoencoder, a complete autoencoder has as many latent units as inputs, and an overcomplete autoencoder has more units in the latent dimension than in the encoder and decoder. This causes the model to learn nothing useful and instead overfit. In undercomplete autoencoders, on the other hand, the encoder and decoder can be overloaded with information if the hidden layer is too small. To avoid over-engineering the layer sizes and to add more capabilities to autoencoders, regularized autoencoders were introduced. 
These models have loss functions that do more than just copy the input to the output; they make the model more robust to noisy, sparse or missing data. There are two types of regularized autoencoders, called the denoising autoencoder and the sparse autoencoder. We will not go through them in depth in this post, since their implementations don’t differ much from the plain autoencoder.</p><h3>Sparse Autoencoder</h3><p>Sparse autoencoders have a loss function with a sparsity penalty on the latent dimension (added to the encoder output) on top of the reconstruction loss. These sparse features can be used to make the problem supervised, where the outputs depend on those features. This way, autoencoders can be used for problems like classification.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/760/1*hG7lGmkK_jdytNnLW2mEnA@2x.jpeg" /></figure><h3>Denoising Autoencoder</h3><p>Denoising autoencoders are a type of autoencoder that removes noise from a given input. To do this, we train the autoencoder on a corrupted, noisy version of the input and ask the model to output the original version without the noise. You can see a comparison of the loss functions below. The implementation is the same as for a normal autoencoder, except for the input.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RqUI7WXAb89UQLYlw8ghTg@2x.jpeg" /></figure><h3>Stacked Autoencoder</h3><p>Before ReLU existed, vanishing gradients made it practically impossible to train deep neural networks. As a hacky workaround, stacked autoencoders were created. One autoencoder was trained to learn the features of the training data; then the decoder layer was cut off, another encoder was added on top, and the new network was trained. At the end, a softmax layer was added so these features could be used for classification. 
This could be one of the early techniques to do transfer learning.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*F5oRdkNGCaCjm0vfpJgjGQ@2x.jpeg" /></figure><p>There are variational autoencoders and other types heavily used in generative AI. I will go through them in another blog post. Thanks a lot if you’ve read this far and let me know if there’s anything I can improve :)</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=293351bbe027" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Machine Learning Applications We Built at Açık Yazılım Ağı]]></title>
            <link>https://merveenoyan.medium.com/a%C3%A7%C4%B1k-yaz%C4%B1l%C4%B1m-a%C4%9F%C4%B1nda-geli%C5%9Ftirdi%C4%9Fimi-makine-%C3%B6%C4%9Frenmesi-uygulamalar%C4%B1-dbbe10d7f736?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/dbbe10d7f736</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Mon, 27 Feb 2023 15:19:02 GMT</pubDate>
            <atom:updated>2023-03-28T21:14:01.152Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fLj-X_dD84ieLdez0UHSNQ.png" /><figcaption>the afetharita interface</figcaption></figure><blockquote>Authors: Merve Noyan &amp; Alara Dirik</blockquote><p>This blog contains the applications we developed for disaster response. You can find the English version here 👉 <a href="https://huggingface.co/blog/using-ml-for-disasters">https://huggingface.co/blog/using-ml-for-disasters</a></p><p>The magnitude 7.7 and 7.6 earthquakes that struck Southeastern Türkiye on February 6, 2023 affected 10 provinces and, as of February 21, left more than 42,000 people dead and more than 120,000 injured. Within hours of the earthquake, on the <strong>Açık Yazılım Ağı</strong> (Open Software Network) Discord server, we started building a project that could serve rescue teams, earthquake survivors and people who wanted to help: <strong>afetharita.com</strong>. On the first day of the earthquake, we observed that survivors were sharing written calls for help as Instagram stories or tweets. We looked for ways to scrape this data automatically and turn it into something meaningful, all while racing against time. In this blog post, we will describe the applications we built and the paths we followed while developing them.</p><p>When I was invited to the Discord server, there was a great deal of chaos about how we would work and what we would do. Since we decided that we wanted to build machine-learning-based applications to extract and process information, and that we also needed a registry for models and datasets, we opened the Hugging Face organization account below. 
The models we trained, the datasets that contain no personal data, and our applications are still hosted in this account.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S7rO7vI8Ky-0ek1jPWnGCw.png" /></figure><p>On the first day, we needed to extract information from the Instagram stories people shared, or from screenshots of those stories on Twitter, structure it, and write it to a database. After trying various open-source OCR (optical character recognition) tools, we went with easyocr to build this application and Gradio for the interface, and we exposed API endpoints from the interface so that teams could use the OCR. From the screenshots submitted to this application, we extracted the text with OCR, then used our own open-source address model (which I will get to below) to extract the addresses, names and phone numbers, and wrote them to the database.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pBKgr59FyngTDe6m5xZ5CA.png" /></figure><p>Later, we were given labeled data containing the addresses and personal information (later anonymized) of survivors calling for help through various channels. We started experimenting both with few-shot prompting of closed-source models and with training our own address-extraction model using the transformers library. We used <a href="https://huggingface.co/dbmdz/bert-base-turkish-cased">dbmdz/bert-base-turkish-cased</a> as the base model and trained our first address-extraction model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xv1Vk8QS9L9byx9IYB9Csg.png" /></figure><p>The model was then used in afetharita to parse addresses. The parsed addresses would be passed to a geocoding API to obtain longitude and latitude, and the coordinates would then go to the backend to be displayed on the interface. To host this model for inference, we used Hugging Face’s ready-made Inference API. 
This saved us from pulling the model, writing a FastAPI application around it, building a Docker image, setting up CI/CD for every deployment, and then deploying it to a cloud provider.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7SXxTYRkxCLrx-d64nH-Xg.png" /></figure><p>Next, we were asked to train an intent-classification model to classify survivors’ needs from the data we had. Each tweet contained more than one need; examples of these needs are shelter, food, or logistics. We first experimented with zero-shot prompting using open-source NLI models on the Hugging Face Hub, and with few-shot prompting using OpenAI’s davinci endpoint. NLI models were especially useful because we could run inference directly with candidate labels and change the labels as data drift occurred. Davinci also worked well, but we didn’t choose it because it made up labels that didn’t exist when we fed its output to the backend. Later, once we had labeled data, we decided to fine-tune BERT. We recorded the performance of our experiments in the metadata section of the model cards in the model repositories, and then built a leaderboard.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o7gqpRecWdwUy9bqhFzouw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mLWmYr2lPmulh444kGiIMA.png" /></figure><p>We cared a lot about avoiding false negatives (a need existing but appearing not to), and the classes were imbalanced, so we benchmarked using recall and the macro average of the F1 scores.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CI9IUy0EvpGjnZ-I6D6T6Q.png" /></figure><p>We created a separate test set to prevent leakage into the training set and to benchmark our models consistently. 
Since this problem is multi-label classification, we found and recorded the threshold at which the model performed best, and we stuck to that threshold in production to decide which labels go into the basket.</p><p>Since data labelers were working to provide us with more accurate and up-to-date datasets, we wanted to crowdsource the evaluation of our address-extraction model. To evaluate it, we set up a labeling interface using Argilla and Gradio, where people could enter a tweet and flag the output as correct/incorrect/ambiguous.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tSV5-lWEiFr0WKqLFvpvMQ.png" /></figure><p>We then removed the duplicates from the dataset and decided to use it to evaluate the model in our subsequent experiments.</p><p>Separately, in another project that we later put into production, we ran few-shot experiments with OpenAI’s davinci to extract the more specific needs (diapers, medication, etc.) from each tweet as free text. We wrapped this model with FastAPI and deployed it to the cloud.</p><p>The address and need-classification models are currently used in production to generate the points on the heat map below, so that volunteers and search-and-rescue teams can deliver what survivors need.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fLj-X_dD84ieLdez0UHSNQ.png" /></figure><p>Our MLOps pipeline for the address recognition and intent classification models is shown below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w8TDzhHJLqw2WmR3Uzz1Iw.png" /></figure><p><strong>Technical observations:</strong> Using Hugging Face transformers gave an accuracy improvement of around 10–15 percent, both compared with few-shot prompting with davinci and against the other techniques. Another benefit of transformers was that, since the models are open source, the model was at hand if a problem occurred while using the Inference API, and we could ship to production quickly. 
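</p><p>The multi-label thresholding described above can be sketched as follows (label names and scores are made up for illustration):</p>

```python
def labels_above_threshold(scores, threshold):
    """A label goes into the 'basket' when its score clears the threshold."""
    return [label for label, score in scores.items() if score >= threshold]

# Toy sigmoid outputs for one tweet; the real labels came from the dataset.
scores = {"shelter": 0.91, "food": 0.42, "logistics": 0.77}
THRESHOLD = 0.5  # in practice, tuned against the held-out test set

print(labels_above_threshold(scores, THRESHOLD))  # ['shelter', 'logistics']
```

<p>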
Since we trained only the classifier layer when fine-tuning BERT, each training run took about 3–4 minutes on a Colab GPU, after which we pushed the models to the Hugging Face Hub. Having the model on the Hub was enough for the Inference API to work, so we didn’t spend any extra effort turning the model into an API.</p><p><strong>Note:</strong> Behind these applications are dozens of people who worked day and night to ship them in a short time.</p><h3>Remote Sensing Applications</h3><p>The teams working on remote sensing built applications to assess the damage to buildings and infrastructure, in order to direct search-and-rescue operations. The lack of electricity and stable mobile networks in the first 48 hours of the earthquake, combined with collapsed roads, made it extremely difficult to assess the extent of the damage and where help was needed. Search-and-rescue operations were also heavily affected by false reports of collapsed and damaged buildings, due to difficulties in communication and transportation. In an effort to address these problems and to build open-source tools that can be used in the future, we started by collecting pre- and post-earthquake satellite imagery of the affected regions from Planet Labs, Maxar and the Copernicus Open Access Hub.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*StM0x6IFAMr7Bk3AOatkfw.png" /></figure><p>Our first approach was to quickly label satellite images for object detection and instance segmentation, with a single category, “buildings”. The goal was to assess the extent of the damage by comparing the number of standing buildings in pre- and post-earthquake images collected from the same region. To make the models easier to train, we started by cropping 1080x1080 satellite images into smaller 640x640 tiles. We then fine-tuned <a href="https://huggingface.co/spaces/deprem-ml/deprem_satellite_test">YOLOv5</a>, YOLOv8 and EfficientNet models for building detection. 
For semantic segmentation, we fine-tuned <a href="https://huggingface.co/spaces/deprem-ml/deprem_satellite_test">SegFormer</a> and released these applications as Hugging Face Spaces apps.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5Q0iY1WGKtFHNIfuI68Alw.png" /></figure><p>Again, dozens of people worked on labeling, data preparation and model training. In addition to individual volunteers, companies like <a href="https://co-one.co/">Co-One</a> volunteered to label the satellite data with more detailed annotations for buildings and infrastructure, such as <em>no damage</em>, <em>destroyed</em>, <em>damaged</em>, and <em>undamaged facility</em>. Our ultimate goal right now is to release a comprehensive open-source dataset that can speed up search-and-rescue operations worldwide in the future.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cYZHjkY36L1EK5BtANFubg.jpeg" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=dbbe10d7f736" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Complete Guide on Deep Learning Architectures, Chapter 1 on ConvNets]]></title>
            <link>https://merveenoyan.medium.com/complete-guide-on-deep-learning-architectures-chapter-1-on-convnets-1d3e8086978d?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/1d3e8086978d</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Sat, 19 Nov 2022 16:05:39 GMT</pubDate>
            <atom:updated>2022-11-19T16:05:39.954Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eF561ZR8Jq5nAlDYVhTtbQ@2x.jpeg" /></figure><p>Welcome to my guide, where I compiled my intuitions, notes and code on deep learning architectures. As a pre-requisite, you only need to know the basics of deep learning: the forward pass, the backward pass, activations and so on.</p><p>My motivation for writing this comes from seeing most people find the theory overwhelming and fail to implement the algorithms (I was one of them!). Before every midterm &amp; final I always made sure I understood the intuition well enough to solve any problem, studied from a wide range of resources so as not to get overfitted to one person’s intuition, and tried to write everything down. I hope this guide will be useful to people taking technical interviews or exams, or who simply want to understand.</p><h3>Convolutional Neural Networks</h3><h4>Convolution: Basic Ideas</h4><p>Convolution is an operation used to extract features from data. The data can be 1D, 2D or 3D. I’ll explain the operation with a concrete example; all you need to know for now is that the operation simply takes a matrix made of numbers, moves it across the data, and takes the sum of products between the data and that matrix at each position. This matrix is called the <strong>kernel</strong> or <strong>filter</strong>. You might say, “what does this have to do with feature extraction, and how am I supposed to apply it?”.</p><p>Don’t panic! We’re getting to it.</p><p>To illustrate the intuition, let’s take a look at this example. We have this 1D data and we visualize it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9m7kGmT735a1mOm0T9i_Tw.png" /></figure><p>I have the kernel [-1, 1]. I’ll start from the left-most element, place the kernel, multiply the overlapping numbers and sum them up. Kernels have something called a <strong>center</strong>, which is one of their elements. Here, we pick the center as 1 (the element on the right).
Now, the kernel’s center has to touch every single element, so we put a zero to the left of the data for convenience. If I didn’t pad it, I’d have to start by multiplying -1 with the left-most element, and 1 would never touch the left-most element; that’s why we apply padding. Let’s see what it looks like.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9pzDGmiacsu1rB000KOSdA.png" /></figure><p>I multiply the left-most element (which is currently a pad) with -1 and the first element (zero) with 1, sum them up, get a 0 and note it down. Now, I move the kernel by one position and do the same, noting it down again. This movement is called <strong>striding</strong>; it’s usually done by moving the kernel one pixel at a time, but you can also move it by more. The result (the convolved data) is currently the array [0, 0].</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aEkuM4Lc8XyA2Dxjjebb7w.png" /></figure><p>I repeat this until the right element of the kernel has touched every element, which yields the result below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4U76oLjqj6A0mbx-TU_TBg.jpeg" /></figure><p>Notice anything? The filter captures the changes in the data (the derivatives!), and this is one characteristic we can extract from our data. Let’s visualize.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_66ziCIFvasmqasdh6LZSQ.jpeg" /></figure><p>The convolved data (the result of the convolution) is called a <strong>feature map</strong>, and the name makes sense: it shows the features we can extract, the characteristics of the data, the change.</p><p>This is exactly how edge detection filters work as well! Let’s see it on 2-dimensional data.
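As a quick aside, the 1D walkthrough above can be written as a short Python sketch (the function name is mine; the left zero-padding and stride of 1 follow the text):

```python
def conv1d(data, kernel):
    # Pad on the left so the kernel's right-most element (the center)
    # touches every element of the data.
    padded = [0] * (len(kernel) - 1) + list(data)
    out = []
    for i in range(len(data)):  # stride of 1
        window = padded[i:i + len(kernel)]
        out.append(sum(w * k for w, k in zip(window, kernel)))
    return out

# A step up and back down: the filter responds exactly where the data changes.
print(conv1d([0, 0, 1, 1, 1, 0, 0], [-1, 1]))  # [0, 0, 1, 0, 0, -1, 0]
```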
This time, our kernel will be different: a 3x3 kernel (it could’ve been 2x2 as well, just saying).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*buqx-5MFVuTi3FZM0uktLA.jpeg" /></figure><p>This filter is actually quite famous, but I won’t spoil it for you just yet :). The previous filter was [-1 1], while this one is [-1 0 1] repeated in three rows; nothing fundamentally different, it also captures changes along the horizontal axis. Let’s see an example and apply the convolution. Below is our 2D data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/999/1*23jTPwz-bFQ_Nu-XqI1YhQ.jpeg" /></figure><p>Think of this as an image from which we want to extract the horizontal changes. Again, the center of the filter has to touch every single pixel, so we pad the image.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T9xmjiacVx7MCTDg1alidw.jpeg" /></figure><p>The feature map will be the same size as the original data. The result of each convolution is written to the position that the center of the kernel touches in the original matrix; for the first one, that’s the top-left position.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fdiWG3HUYbKVrczh2IgxHA.jpeg" /></figure><p>If we keep applying the convolution, we get the following feature map.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/951/1*g-hqpwugZv8y8AVhf7F-kQ.jpeg" /></figure><p>It shows us the horizontal changes, i.e. the edges. This filter is actually called the <strong>Prewitt Filter</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/744/1*tFG-KxBSw4cApgYxM1bdxQ.jpeg" /></figure><p>You can flip the Prewitt filter to get the changes in the vertical direction. The Sobel filter is another filter for edge detection.</p><h4>Convolutional Neural Networks</h4><p>Fine, but what does this have to do with deep learning? Well, brute-forcing hand-picked filters does not work well for every image.
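Before moving on, the 2D Prewitt convolution walked through above can be sketched in plain Python (the zero padding keeps the output the same size as the input, as described; the helper name is mine):

```python
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    # Zero-pad so the kernel center visits every pixel ("same"-size output).
    padded = [[0] * (w + 2 * pw) for _ in range(h + 2 * ph)]
    for r in range(h):
        for c in range(w):
            padded[r + ph][c + pw] = image[r][c]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = sum(kernel[i][j] * padded[r + i][c + j]
                            for i in range(kh) for j in range(kw))
    return out

prewitt = [[-1, 0, 1],
           [-1, 0, 1],
           [-1, 0, 1]]

# An image with an edge between its 0-columns and 1-columns:
image = [[0, 0, 1, 1]] * 4
print(conv2d(image, prewitt))
```

The feature map responds strongly around the columns where the pixel values change and stays zero in the flat regions.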
We could somehow find the optimal filters for extracting important information, or even for detecting objects in images. That’s where convolutional neural networks come into play. We convolve images with various filters, and the kernel values themselves become the parameters that we optimize; in the end, we find the best filters for our problem.</p><p>The idea is that we use filters to extract information: we randomly initialize multiple filters, create our feature maps, feed them to a classifier and do backpropagation. Before diving into it, I’d like to introduce you to something called “pooling”.</p><p>As you can see above, many pixels in the feature map signal the change. To know that there’s an edge, we really only need to see that there is a change (an edge, a corner, anything), and that’s it. In the above example, we could’ve kept only one of the 2’s, and that would be enough. This way, we store fewer parameters and still keep the features. This operation of keeping the most important element of a region of the feature map is called <strong>pooling</strong>. With pooling, we lose the exact location of the pixel where there’s an edge, but we end up storing fewer parameters. It also makes our feature extraction mechanism more robust to small changes: e.g. to know there’s a face in an image, we only need to know that there are two eyes, a nose and a mouth; the distances between those elements and their sizes vary from face to face, and pooling makes the model more robust against these variations. Another good thing about pooling is that it helps us handle varying input sizes. I’d like you to watch <a href="https://www.youtube.com/watch?v=f1fXCRtSUWU">this video</a> to build a better intuition. Below is the max pooling operation, where for every four pixels we keep the maximum one. There are various types of pooling, e.g.
average pooling, weighted pooling or L2 pooling.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oZQ6lILAx5FuGxnqmX3Z2g.jpeg" /></figure><p>Let’s get to the architecture. We will use a Keras example, and I will walk you through what’s happening. Below is our model (again, don’t panic, I will walk you through it).</p><p>If you don’t know what the <a href="https://medium.com/analytics-vidhya/keras-model-sequential-api-vs-functional-api-fc1439a6fb10">Keras Sequential API</a> does: it stacks layers like lego bricks and connects them. Each layer has different hyperparameters: the Conv2D layer takes the number of convolution filters, the kernel size and the activation function, MaxPooling2D takes the pooling size, and the dense layer takes the number of output units (again, don’t panic).</p><p>Most convnet implementations don’t pad the way image processing does to let the kernel touch every pixel, since padding with zeroes comes with the assumption that we might have features at the borders, and it adds computational complexity on top.
That’s why you see the first output size as (26, 26): we lose information along the borders.</p><pre>model = keras.Sequential(<br>    [<br>        keras.Input(shape=input_shape),<br>        layers.Conv2D(32, kernel_size=(3, 3), activation=&quot;relu&quot;),<br>        layers.MaxPooling2D(pool_size=(2, 2)),<br>        layers.Conv2D(64, kernel_size=(3, 3), activation=&quot;relu&quot;),<br>        layers.MaxPooling2D(pool_size=(2, 2)),<br>        layers.Flatten(),<br>        layers.Dropout(0.5),<br>        layers.Dense(num_classes, activation=&quot;softmax&quot;),<br>    ]<br>)</pre><pre>model.summary()</pre><pre>Model: &quot;sequential&quot;<br>_________________________________________________________________<br>Layer (type)                 Output Shape              Param #   <br>=================================================================<br>conv2d (Conv2D)              (None, 26, 26, 32)        320       <br>_________________________________________________________________<br>max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         <br>_________________________________________________________________<br>conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     <br>_________________________________________________________________<br>max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         <br>_________________________________________________________________<br>flatten (Flatten)            (None, 1600)              0         <br>_________________________________________________________________<br>dropout (Dropout)            (None, 1600)              0         <br>_________________________________________________________________<br>dense (Dense)                (None, 10)                16010     <br>=================================================================<br>Total params: 34,826<br>Trainable params: 34,826<br>Non-trainable params: 0<br>_________________________________________________________________</pre><p>Convolutional
neural networks start with an input layer followed by a convolutional layer. Keras Conv2D layers take the number of kernels and the kernel size as parameters. What happens is illustrated below: here, we convolve the image with 32 kernels and end up with 32 feature maps, each slightly smaller than the input since we don’t pad.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3A_0WappULYAjY2DOUV8Qg.jpeg" /></figure><p>After the convolutional layers, we put a max pooling layer to reduce the number of stored parameters and make the model robust to changes, as discussed above.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*I-ZsUdDiU7zOqDgPmL7D4w.jpeg" /></figure><p>Then, these feature maps are concatenated together and flattened.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/823/1*5SbA0Tl4iCgNiVhf_Wwg4A@2x.jpeg" /></figure><p>Later on, we use something called dropout, which randomly drops a portion of the units during training to avoid overfitting. Finally, the flattened features go through a dense layer for the classification part, and backpropagation takes place.</p><h3>Backpropagation in Convolutional Neural Networks in Theory</h3><p>How does backpropagation work here? We want to optimize for the best kernel values, so they are our weights. In the end, we expect the classifier to figure out the relationship between pixel values, kernels and classes: we have a long flattened array whose elements are pooled and activated versions of the pixels convolved with the initial weights (the kernel elements), and we update those weights to answer the question “which kernels should I apply to distinguish a cat photo from a dog photo?”. So the point of training CNNs is to come up with the optimal kernels, and these are found thanks to backpropagation.
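As a sanity check on the summary above, the parameter counts can be computed by hand: each Conv2D filter has kernel_height x kernel_width x input_channels weights plus one bias, and a dense layer has one weight per input-output pair plus one bias per output (the helper names are mine):

```python
def conv2d_params(kh, kw, in_channels, filters):
    # Each filter: kh * kw * in_channels weights + 1 bias.
    return filters * (kh * kw * in_channels + 1)

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

print(conv2d_params(3, 3, 1, 32))    # first Conv2D on a 1-channel image -> 320
print(conv2d_params(3, 3, 32, 64))   # second Conv2D -> 18496
print(dense_params(5 * 5 * 64, 10))  # Dense after Flatten -> 16010
```

These match the 320, 18496 and 16010 entries in model.summary(), and the pooling, flatten and dropout layers contribute no parameters at all.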
Prior to CNNs, people would try many filters on an image to extract features themselves, but the generic filters (like the ones we’ve seen above, e.g. Prewitt or Sobel) do not necessarily work for all images, given how different images can be, even within the same dataset. This is why CNNs outperform traditional image processing techniques.</p><p>There are a couple of advantages in terms of storage when we use convolutional neural networks.</p><h4>Parameter Sharing</h4><p>In convolutional neural networks, we convolve with the same filter across all pixels and all images, which means we store far fewer parameters; this is much more efficient than going through an image with a dense neural network. This is called “weight tying”, and those weights are called “tied weights”. The same idea is also seen in autoencoders.</p><h4>Sparse Interactions</h4><p>In densely connected neural networks, we input the whole piece of data at once (which is overwhelming, given that images have hundreds or thousands of pixels), whereas in convnets we use smaller kernels to extract features. This is called sparse interaction, and it helps us use less memory.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1d3e8086978d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Open-Source Contribution: How to Get Started]]></title>
            <link>https://merveenoyan.medium.com/open-source-contributio-how-to-get-started-9c149fde5a6?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/9c149fde5a6</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Mon, 18 Jul 2022 18:52:21 GMT</pubDate>
            <atom:updated>2022-08-04T16:22:14.580Z</atom:updated>
            <content:encoded><![CDATA[<h3>Open-Source: How to Get Started</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WlYzRiBmnwcE9zM25lhMmw.jpeg" /><figcaption>Drawing of Comet Morehouse, 1908 (<strong>C/1908 R1</strong>)</figcaption></figure><p>This article is for people who’d like to get started with contributing to open-source, whether through code, community contributions, or writing their own libraries from scratch. Note that this article is quite opinionated, as my experience is limited.</p><h3>How I Started My Journey</h3><p>I came across <a href="https://www.youtube.com/watch?v=G5lmya6eKtc">this video</a> by Hugging Face on YouTube, and it changed my life. I started following Thomas Wolf (the person in the video, who is Chief Scientist at Hugging Face) on Twitter, and I must say I was a big fan of what Hugging Face was doing. One day I saw him posting about a contribution sprint.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/982/1*31uJNlmY1iaQZlzlovfpcg.png" /></figure><p>I signed up without knowing anything about open-source workflows. We were supposed to commit a dataset script that would load the dataset, create splits, and define configurations and various other specifications of the dataset.</p><p>I must be honest, I struggled quite a lot at first.
I didn’t even know about formatting, let alone CI!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/1*QYprXKdcR5HG6hqhQnJU0w.png" /></figure><p>I needed guidance, but after a while I started helping people out myself, thanks to Quentin and Yacine teaching me about the workflows 🤩 They also usually write contribution guides, which help a lot if you follow them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/744/1*86IrNfpjwhhvC9Dndek20w.png" /><figcaption>hehe</figcaption></figure><p>After that, I participated in other sprints (like the wav2vec one), where I failed: due to the quality of the data, the model wasn’t performing well, and I didn’t have time to find other data sources without breaching licenses or violating PII. After a while, I started teaching people how to use Hugging Face Transformers; I even included it in my talk at Google I/O.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4hqHNdbkiguievCY" /></figure><p>This is another way of contributing to open-source. <strong>You don’t necessarily need to contribute code to contribute to open-source, there are other ways:</strong></p><ul><li>writing documentation (blog posts, library documentation, projects on GitHub)</li><li>gathering people and giving workshops to teach them about a specific open-source tool</li><li>opening issues/feature requests on GitHub, giving feedback and testing beta versions</li><li>answering questions on forums, Stack Overflow or developer Discords</li></ul><p>These are more valuable than you think. A library with no adoption due to a lack of documentation or resources is just code. Codebases require a community to thrive.</p><h3>Get Started with Community Contributions</h3><ul><li>Pick a new tool in the ecosystem, test it, see whether anything doesn’t work or isn’t intuitive in terms of developer experience, and give feedback.
Even if you don’t find a bug, pointing out how the usage could be simplified is a valuable thing to report.</li><li>Writing a blog post about how to use the tool, or putting up a notebook on Kaggle or a GitHub repository with a potential use case for the tool, is a good place to start.</li><li>If you’re good at shooting videos, you can record a walkthrough of the ecosystem, develop an application using the tool, and more!</li><li>If you’d like to give a talk at a developer conference, putting up a deck on the ecosystem, on the new things in the latest release, or going through an end-to-end application works very well. At Google I/O, I went through a concept called transfer learning and how to train a model with transfer learning using Google’s stack and also Hugging Face.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/975/1*JbxKqFQCmE0GzBnNk1GToQ.png" /><figcaption>small part of my graph 🙂</figcaption></figure><h3>Getting Started with Code Contributions</h3><p>It might be quite intimidating and overwhelming to start contributing to a big codebase: the fear of your PR being rejected, receiving a lot of review comments and thinking that’s a bad thing, tests not passing, and more. Here are a couple of things no one will tell you, but that should be obvious:</p><ul><li>No one starts contributing to open-source at the age of 1. Everyone has learnt it from someone else.</li><li>As open-source maintainers, we love to onboard new contributors and are more than happy to provide guidance.</li><li>Your PR being rejected is not the end of the world; it doesn’t mean your code is bad. There are multiple possible reasons for it (that I’ll go through), and you can always open a second PR.</li></ul><p><strong>🌈 Open an issue before opening a PR, or find the issue that you want to solve and discuss it with the developers first 🌈</strong></p><p>There are design decisions associated with each codebase.
If you come across a design you don’t agree with, or you see a problem, first find the relevant issue if it exists (or open one) and discuss it with the developers. Come up with an initial design for a solution and then open a PR that you can iterate on.</p><p>🌈 <strong>Find good first issues </strong>🌈</p><p>Most libraries have issue tags under <a href="https://github.com/scikit-learn/scikit-learn/labels">https://github.com/organization/library-name/labels</a>. Filter for issues that have the good first issue tag; they are usually a good way to get started with a codebase.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MvqF4GhtjBv9qQhHnQVxaA.png" /></figure><p>These don’t guarantee your PR will be merged, but they will ease the process and make it more likely for your PRs to be merged.</p><p>✨ <strong>Try to follow the developer communities</strong> ✨</p><p>As you can tell from my story, my journey began with me following Hugging Face. Like Hugging Face, many developer communities, such as scikit-learn’s, have open-source contribution sprints. Keeping in touch with the core developers helps you get notified about the sprints and more. The good thing about sprints is that the core developers dedicate a large portion of their time to helping you out, which makes it easier for you to get started. At Hugging Face, we have run a good number of sprints, including:</p><ul><li>A documentation sprint, where the transformers library needed updated docstrings for its functions and, with the help of the community, we documented the codebase.</li><li>wav2vec/2 sprints, where everyone trained an audio model in their own language.
Given that the number of languages is overwhelming for the core developers to handle, we provided compute, scripts and guidance for participants to train and commit a model.</li></ul><p>🙌🏻 <strong>Some tools open-source libraries use that you can get familiar with:</strong></p><ul><li><a href="https://pre-commit.com/">pre-commit</a>: Some libraries use pre-commit hooks. Once installed, it formats your code when you commit.</li><li>black/flake8/isort/any formatting or quality tool: Every codebase has a style guide that it follows. For Python, black formats the code automatically, while flake8 checks the code for quality issues, e.g. a variable you’ve defined but never used. For other languages, you need to find out what works.</li><li>test libraries: If you’re contributing a new addition to an open-source codebase, you need to be familiar with testing, as you’ll be expected to write tests for your code to preserve code coverage.</li></ul><h3>Getting Started with Writing Your Own Library</h3><p>I recently started writing a library from scratch with my colleagues. Before that, I tried writing another library with one of my friends and failed horribly. The main reason was that I couldn’t come up with a design for the functions, classes &amp; co. I got overwhelmed and couldn’t turn my ideas into a library. This time I was lucky enough to work with a <a href="https://twitter.com/adrinjalali">very experienced open-source developer</a> (follow him!). A couple of things I learnt from him gave me a starting point:</p><ul><li>Always put developer experience before anything else. A codebase that is not intuitive will not be adopted.</li><li>Before anything else, he writes executable notebook-like code cells that define how users will interact with the library, and only then implements the code itself.
(Check <a href="https://github.com/skops-dev/skops/commit/59790276ceb09970c0afcbbca4385f2c0537157f#diff-23fb8ed00cd8ac4f126f603b95e320475aade82860fe6fe948cd8841d6292e5b">here</a>)</li><li>Related: document your code well, e.g. write the docstrings first and then implement.</li><li>Write tests. Writing tests might be cumbersome, but it makes sure your code always works as intended and is robust against edge cases, prevents consistency problems the next time someone contributes code, and helps with backwards compatibility.</li><li>Later, put up contribution guidelines to onboard new developers. Try to define rules about merging, styling and so on, such that the project easily scales to multiple maintainers or contributors; e.g. we don’t merge our own code ourselves, as a sanity check.</li></ul><p>To finish up this blog, I have a couple of thank-yous for the people who onboarded me to open-source and convinced me to do it full-time (give them a follow if you want to learn more about contributing 🙂):</p><ul><li><a href="https://twitter.com/YJernite">Yacine</a> &amp; <a href="https://twitter.com/qlhoest">Quentin</a> for helping me out in my first sprint</li><li><a href="https://twitter.com/osanseviero">Omar</a> &amp; <a href="https://twitter.com/julien_c">Julien</a> for reaching out to me to let me work with them</li><li><a href="https://twitter.com/adrinjalali">Adrin</a> for teaching me a new thing every single day.</li></ul><p>(and more people from Hugging Face whose names I couldn’t fit in this blog)</p><p>This article is open to additions: I will add your stories below it, so feel free to reach out to me on <a href="https://twitter.com/mervenoyann">my Twitter</a>, I’d love to hear your open-source stories.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/0*5K0z3hv1YhVSXEuG.gif" /><figcaption>Copyrighted by Looney Tunes (public domain)</figcaption></figure><img
src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9c149fde5a6" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Putting Keras on Hub for Collaborative Training and Reproducibility]]></title>
            <link>https://merveenoyan.medium.com/putting-keras-on-hub-for-collaborative-training-and-reproducibility-9018301de877?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/9018301de877</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Fri, 22 Apr 2022 13:26:00 GMT</pubDate>
            <atom:updated>2022-05-16T11:15:20.677Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*c3exo5p3_R7GrWBBobS6Cg.png" /></figure><p>Reproducibility, experiment tracking and collaborative training are a couple of open problems in open-source machine learning. At Hugging Face, we are tackling these issues by enabling collaborative training, model versioning and hosting of open-source machine learning demos on the Hugging Face Hub. Thanks to the recent Keras integration, you can easily:</p><ul><li>Push and save a model to the Hugging Face Hub and let others pull it easily,</li><li>Version your models,</li><li>Reproduce your experiments with an automatically generated model card and a hosted TensorBoard instance,</li><li>Load a model with only one line of code,</li><li>Easily load your models for demonstration on Spaces.</li></ul><h3>Pushing Your Model to the Hugging Face Hub</h3><p>After your training is done, you can call push_to_hub_keras() to push your model to the Hugging Face Hub.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e6c2dfcd92007fbec8244252f27b1f22/href">https://medium.com/media/e6c2dfcd92007fbec8244252f27b1f22/href</a></iframe><p>This generates a model card that includes your model’s hyperparameters, its metrics as JSON, a plot of the model, and a couple of sections related to the intended use of your model, biases the training data might have, and limitations around putting your model into production. The tags are written to the model card and are used by others to find your models easily.
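As a minimal sketch of the call: the toy model, repo name and tag below are illustrative, the call requires being logged in via huggingface-cli login, and the argument names assume the huggingface_hub Keras utilities available at the time of writing:

```python
import tensorflow as tf
from huggingface_hub import push_to_hub_keras  # Keras utilities in huggingface_hub

# A toy model, purely for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Pushes the SavedModel, generates the model card and uploads TensorBoard logs.
push_to_hub_keras(model, "your-username/my-toy-model", tags=["example"])
```

Loading is symmetric: from_pretrained_keras("your-username/my-toy-model") returns the Keras model.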
The repository will host your TensorBoard traces like below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z-W2sAKWsLjK_FlZLaX4mA.png" /><figcaption>Your TensorBoard traces will look like this 🙌🏻</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hyYISzHuFc3MrWn0gNH_BQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/608/1*_hBAw1O1lYvhzd7Cr7dijQ.png" /><figcaption>Your model history, hyperparameters and plot are included in the card 🃏</figcaption></figure><p>You can see an example hosted model with its model card and TensorBoard traces <a href="https://huggingface.co/merve/model-card-example/tensorboard">here</a>.</p><p>Your model is saved with tf.keras.models.save_model(), and you can pass the rest of the arguments through **model_save_kwargs. This saves your model in the SavedModel format, with weights and graph, which you can later convert to other model formats with TensorFlow Extended to use your model in browsers, on edge devices, or behind REST/gRPC APIs.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1269df0ee40cb0e52154bd2035224fcd/href">https://medium.com/media/1269df0ee40cb0e52154bd2035224fcd/href</a></iframe><h3>Loading Your Model and Building a Demo</h3><p>After your model is pushed, you can simply load it with from_pretrained_keras().
You can also load a Keras model that you’ve liked on the Hub, like below.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/435fd8a8067ce2c0e6e9c51edc489329/href">https://medium.com/media/435fd8a8067ce2c0e6e9c51edc489329/href</a></iframe><p>To demonstrate your model’s capabilities, you can simply load it with from_pretrained_keras() and write an inference function in an app.py file; then create a Spaces repository, drag and drop your app.py file, and your application will be ready.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-ZVNcE9UaAGaYWQgqNSUlA.gif" /><figcaption>You can simply drag and drop your app.py file and your Space will be built! ✨</figcaption></figure><p>We organized a sprint to train the Keras examples, host them on the Hugging Face Hub and build interactive demos for reproducibility. If you liked a Keras example, simply look at the bottom of the example page and you’ll find links to the model and Space repositories.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3luzDsDhsK8_dzQQrVZYoA.png" /><figcaption>Find the awesome tutorial here: <a href="https://keras.io/examples/vision/probing_vits/">https://keras.io/examples/vision/probing_vits/</a></figcaption></figure><p>The Keras examples are all hosted on this <a href="https://huggingface.co/keras-io">page</a>, where you can see the model repositories and Spaces applications.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7fc6_PXNdIYzMNwrQp1lsg.png" /><figcaption>Where was this model during Game of Thrones! 🥲</figcaption></figure><p>We’ve written two guides on building <a href="https://huggingface.co/blog/streamlit-spaces">Streamlit</a> and <a href="https://huggingface.co/blog/gradio-spaces">Gradio</a> demos and easily hosting them on the Hugging Face Hub.</p><p>Thanks a lot for reading 🤗 I’d love to hear your experience and feature requests for Keras at merve@hf.co.
We’re planning a second round of the Keras sprint (and more community events!); if you’d like to be informed, feel free to join our <a href="https://huggingface.co/join/discord">Discord channel</a> 👾.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9018301de877" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bir Veri Bilimcinin Araç Çantası]]></title>
            <link>https://merveenoyan.medium.com/bir-veri-bilimcinin-ara%C3%A7-%C3%A7antas%C4%B1-ca51fb5cd19e?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/ca51fb5cd19e</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Fri, 15 Oct 2021 08:00:01 GMT</pubDate>
            <atom:updated>2021-10-15T09:39:05.445Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bXYUPJ2C839x7h0IcIFgmw.jpeg" /><figcaption>Image: The New Yorker, <a href="https://paidpost.newyorker.com/article/hbo/rise-ethical-machines">https://paidpost.newyorker.com/article/hbo/rise-ethical-machines</a></figcaption></figure><p>Hello! In this blog post I’ve compiled the tools that I’ve seen data scientists use in Kaggle competitions and at companies, and that I love myself. It’s worth noting that this is a highly opinionated post. If there are tools you think should be included, feel free to write them.</p><p>I’ve left sklearn, fast.ai, TensorFlow and PyTorch out of this post; those are libraries you should know anyway. Likewise, whatever field you work in has its own specialized tools, and I won’t cover those either (opencv for computer vision, spaCy and NLTK for NLP, and so on).</p><p>Broadly speaking, the tools that will take you one step further are:</p><ul><li>MLOps platforms (W&amp;B, Comet and others)</li><li>Solutions that host pretrained models (Hugging Face, AllenNLP, Stanza, TensorFlow Hub)</li><li>Libraries that let you easily build interfaces for your data science and machine learning projects (Streamlit, Gradio)</li><li>AutoML tools that give you a good baseline solution (H2O, AutoNLP)</li></ul><p>I’ll go into detail on all of these shortly.</p><p>In this <a href="https://merveenoyan.medium.com/python-geliştirirken-en-çok-kullandığım-araçlar-a14602dbf7fb">post</a> I also talked about tools like postman, ngrok and gazpacho, which I used while developing backends at the place where I’ve worked as a machine learning engineer for the past year, and about libraries like flake8 and black, which I learned at the open-source sprints I joined; it might interest you.
I’ve started recording videos about all of these, and I plan to publish them soon.</p><p>If we’re ready, let’s begin.</p><h3>MLOps Platforms</h3><p>MLOps is a model-centric, rather than code-centric, counterpart of DevOps. In MLOps, instead of version control of code, there is experiment tracking: you look at which model produced which results with which hyperparameters. Machine learning has its own end-to-end challenges; another one is data drift, which you need to handle with a tool that monitors the model in production, and these platforms provide that as well. The two best-known ones are <a href="https://wandb.ai/site">Weights &amp; Biases</a> and <a href="https://www.comet.ml/site/">Comet</a>. These tools are currently used by very large data science teams. At this <a href="https://www.reddit.com/r/MachineLearning/comments/p43g70/p_mlops_monitoringcompanies_marketcomparison/?utm_source=share&amp;utm_medium=ios_app&amp;utm_name=iossmf">link</a> you can see a comparison of these platforms (it also includes others such as MLFlow, KubeFlow and neptune.ai). My observation is that Comet covers more of the process end to end, while W&amp;B is widely used in the Kaggle community (perhaps due to UX differences). They are free for personal projects, which is why they are used so much in Kaggle competitions, and they earn their revenue from companies like Google and OpenAI.</p><h3>Platforms Hosting Pretrained Models</h3><p>In its current form, machine learning has shifted to a paradigm where pretrained models are widely used. Fine-tuning a pretrained model performs better than training from scratch. For this, you need to make good use of platforms/libraries such as <a href="https://huggingface.co">Hugging Face</a>, <a href="https://allennlp.org">AllenNLP</a> and <a href="https://www.tensorflow.org/hub">TensorFlow Hub</a>.
Hugging Face has an AllenNLP integration, and the data science teams of companies like Facebook and Google also <a href="http://hf.co/organizations">host</a> their models on Hugging Face. You can take a model from these platforms and use it directly with its own weights, or fine-tune it on your own data. These platforms also offer datasets and preprocessing tools. Besides NLP, Hugging Face and TF Hub host audio models/datasets and image models/datasets as well. If you want a platform specialized in audio, I’d recommend taking a look at <a href="https://coqui.ai">Coqui</a>.</p><h3>Interface Libraries for Demos</h3><p>If you’re a data scientist or machine learning engineer, you need to build interfaces for your customers or project stakeholders around the work that normally lives scattered across notebooks (models, exploratory data analysis, datasets), and that’s a hard job unless you’re a very good front-end developer. This is where libraries like streamlit and gradio come to the rescue. Besides components that take hyperparameters for running predictions with your models, they also include interactive data visualization packages and domain-focused packages (such as spacy-streamlit). Having used both, my take is this: if you want to turn your machine learning model into a demo quickly, use Gradio. Gradio is more model-centric and accepts various kinds of inputs for models (most recently they even added a whiteboard for sketching). In Streamlit, if you want a component beyond the core library, you have to install it separately.
Below, I recreated the GPT-2 text generation app I love using streamlit.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ci1yIl3Q-FR7K9ydmAFSoA@2x.jpeg" /></figure><p>Below, I turned my exploratory data analysis into an interface with streamlit.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*guqw-3tf_vMKE26PfGmkUQ@2x.jpeg" /></figure><p>In Gradio, taking a model, building an interface and making it shareable is five lines of code and takes three minutes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/990/1*CQDXqaEzk9-v9QOF5SmwXA@2x.jpeg" /></figure><p>I covered how to do this in separate blog posts for <a href="https://huggingface.co/blog/streamlit-spaces">Streamlit</a> and <a href="https://huggingface.co/blog/gradio-spaces">Gradio</a>, and the code is available in those posts.</p><h3>AutoML Tools</h3><p>Although AutoML is often perceived as a magic solution, it is actually one of the tools that belongs in a machine learning engineer’s toolbox. These tools give you a baseline solution that you can improve upon. I recorded a podcast with two AutoML engineers and asked them what AutoML can and cannot do, and I got roughly the same answers from both. Most of these platforms are sold as paid solutions for companies, because they train your models and run hyperparameter optimization, which means many GPU hours and high costs; as far as I know, they also have free plans for personal use.
Writing the search algorithms behind these also takes real skill; my personal opinion is that open-source AutoML tools (like TPOT) don’t work well enough (in that example a genetic algorithm runs under the hood, and I wouldn’t recommend using it unless you truly know what you’re doing).</p><p>If you’d like to check out other tools, you can look at <a href="https://www.kaggle.com/kaggle-survey-2021">this year’s Kaggle data science survey</a>.</p><p>I plan to rewrite this post in a year or two with whatever tools have emerged by then. Thank you for reading.</p><p>Disclaimer: This post does not represent the views of my employer (Hugging Face).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ca51fb5cd19e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My reviews on Machine Learning, Data Science and Statistics books]]></title>
            <link>https://merveenoyan.medium.com/my-reviews-on-machine-learning-data-science-and-statistics-books-d1d70924b3d?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/d1d70924b3d</guid>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Thu, 09 Sep 2021 09:47:41 GMT</pubDate>
            <atom:updated>2021-09-09T09:47:41.608Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bfUCm2jzfV653_2zXdsU4A.jpeg" /></figure><p>I receive questions daily about content for learning machine learning, statistics or data science. I usually learn from books, so I wanted to write a post about the resources I’ve used, some of them e-books. I’ve finished some of the books, and some are still in the queue. I’ve categorized the books by the topics they cover, and it’s important to note that which book suits you depends on your background: you might be a stats graduate trying to learn machine learning, in which case I’d suggest books on CS, or you might be a self-taught person who needs to learn the theory. Without further ado, let’s review!</p><h3>Books on Machine Learning/Deep Learning</h3><p><strong>Introducing MLOps: </strong>Although this is not considered a technical book (there’s no code inside), it contains very valuable information on how to manage machine learning processes in production; I think entry-level and junior machine learning engineers can benefit from it. You can get the free PDF of the book here: <a href="https://pages.dataiku.com/oreilly-introducing-mlops">https://pages.dataiku.com/oreilly-introducing-mlops</a></p><p><strong>AI and Machine Learning for Coders (Laurence Moroney): </strong>This book is considered more beginner-level than Geron’s book; it explains machine learning to software engineers who want to build an MVP with machine learning models. It’s particularly good for people with a software-developer background, but if you’re advanced it may not work for you.</p><p><strong>Introduction to Machine Learning with Python — Andreas Müller</strong>: I finished this book back when I was trying to learn ML best practices; I call it “the sklearn book”. It doesn’t cover data science much; it’s more about machine learning (mostly statistical learning) algorithms. It’s great because it covers best
practices on supervised and unsupervised learning. It doesn’t cover the theory much, but it’s still a good book for people who aren’t that interested in the theory of the algorithms and just want to train a model. There’s a small chapter on deep learning as well.</p><p><strong>Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow — Aurelien Geron</strong>: I finished this book on TF 1.0 and just bought it for 2.0; it’s one of the best books I’ve read on implementation. It covers both the theory and practice of machine learning and deep learning, though not the theory as deeply as the book below (Deep Learning) does. The good part is the deployment chapter: it covers ML on the edge and in the cloud, and it has a section on reinforcement learning with TF-Agents, which most TensorFlow books don’t cover. It covers the whole TensorFlow ecosystem, and I refer to this book a lot.</p><p><strong>Deep Learning</strong> — <strong>Bengio, Courville, Goodfellow: </strong>It may be the only book (at least that I’ve seen) that delves this deeply into the theory of deep learning. There’s everything from linear algebra to GANs; it gives you the fundamentals and teaches you the advanced parts, but it doesn’t cover code. I’ve read it, but it’s not an easy read (unlike the one below, Deep Learning with Python). There’s no practical part; you have to use the book below for that.</p><p><strong>Deep Learning with Python</strong> — <strong>François Chollet</strong>: I finished this book and ran all the code inside; it covers the whole Keras ecosystem. It may be one of my favorite books, as it helped me learn deep learning (I come from an operations research background, so all I knew back then was statistical learning algorithms).
The theory of each algorithm is explained in a simple way and paired with the code.</p><p><strong>Artificial Intelligence: A Modern Approach — Peter Norvig &amp; Stuart Russell</strong>: I would call this the “AI book”. I have the third edition; the fourth edition might cover machine learning more. It does have chapters on machine learning and deep learning, but they are really small. This book covers artificial intelligence algorithms, search algorithms, AI-based decision making and reinforcement learning. Also, it was odd to see that it has no dedicated chapter on unsupervised learning. But it’s the best book on AI; I also read Tom Mitchell’s book, and this one helps you comprehend everything better in comparison. It’s also relatively easy to read compared to the Deep Learning book above.</p><p><strong>Natural Language Processing with Python — Bird, Klein &amp; Loper</strong>: It’s a book I refer to from time to time when I’m trying to solve NLP problems, as it covers the whole NLTK library. It tells you what your workflow should be depending on the problem, and each chapter covers a different problem: e.g. segmentation is a problem in its own right but a prerequisite for POS tagging, and POS tagging is a prerequisite for relationship extraction, and the book guides you through the whole process. It’s a good starting point if you’re new to natural language processing. It also helps you learn to model a problem, which is great know-how in my opinion.</p><p><strong>Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play — David Foster</strong>: This is one of my favorite books for making fun side projects.
It explains how to generate data in various domains (image, sound, text) and covers architectures like CycleGAN, VAE and DCGAN, and even language models such as GPT-2 for text generation.</p><h3>Statistics, Data Science</h3><p><strong>Python Data Science Handbook: Essential Tools for Working with Data — Jake VanderPlas</strong>: I enjoyed reading this book a lot, and it was one of the books I started from the beginning and finished completely. It walks through all the steps of taking raw data with pandas and sklearn and managing the process of exploratory data analysis, data preprocessing, model building and hyperparameter optimization. It’s quite comprehensive for an end-to-end data science workflow, and I’d highly recommend it.</p><p><strong>Naked Statistics — Charles Wheelan:</strong> This is one of my favorite books; I wish I had come across it during my bachelor’s. A very entertaining and intuitive book that explains the core concepts of statistics, and it covers a bit of machine learning too. I remember finishing it in two or three days, but it’s not a book from which you can learn statistics completely; it just gives you intuition.</p><p><strong>The Art of Statistics (How to Learn from Data) — David Spiegelhalter:</strong> This book is very similar to Naked Statistics, but more advanced. The author explains statistics in an entertaining way. Unfortunately, this is not a book where you can learn statistics (it’s not enough to apply statistics, in my opinion), but it does answer most statistics-related questions, like why we need it and how we can use it.</p><p>If you’d like to study statistics, I have the book below.</p><p><strong>Probability and Statistics for Engineers — Jay L. Devore</strong>: It’s a textbook where you can learn probability and statistics from scratch.
I’m usually very skeptical of intuitions (though I like them), so I usually prefer to study and solve problems, and I finished this book in one semester, though I passed the class with a BA (that is completely my fault :’)). It’s one of the most comprehensive books I’ve read.</p><h3>Cognitive Science:</h3><p><strong>The Book of Why — Judea Pearl: </strong>This book covers causality; Judea Pearl explains why causality should not be confused with correlation and how you can distinguish between the two. I read some of it but had to take a break because of my exams.</p><p><strong>Thinking, Fast and Slow — Daniel Kahneman: </strong>Daniel Kahneman is one of the two best-known names in cognitive science, along with Amos Tversky, and he received the Nobel Prize in economics. This book explains the cognitive biases we have and how people behave when they think fast. I heard of this book from the data science lecturer in my master’s program, who said it’s good to read it to understand people’s behavior when you want to draw inferences from data.</p><h3>Python:</h3><p><strong>Data Structures and Algorithms in Python — Tamassia, Goodrich, Goldwasser</strong>: I’m not a computer science graduate, so I sat down and took the data structures and algorithms class, and that’s where I met this book. The main reason I took the course is that good companies often interview you on data structures and algorithms, but later on I realized how necessary this is for thinking like an engineer. I have to admit that every time I read it, it changes my perspective on these concepts.
It is both a theoretical and a practical book: it contains the theory and application of each algorithm in Python, which I recommend you learn DS&amp;A with, as it’s an easy language and algorithms and data structures don’t really change from language to language.</p><p><strong>Flask Web Development: Developing Web Applications with Python — Miguel Grinberg</strong>: I loved this book because I had no idea about development. The book covers development as a concept (e.g. databases, requests, Bootstrap, security), then walks you through how you can implement it all in Flask. I still refer to it from time to time and recommend it if you want to develop something and have no idea about development.</p><h3>Manga Guides</h3><p>I loved the manga guides; they hold your attention and explain the concepts visually. I mainly read the ones on linear algebra (oh how I wish I had read it during my bachelor’s), statistics and microprocessors. I’ll read the ones on databases and cryptography when I make time. I recommend these series to anyone who gets bored while studying.</p><p>I’m currently reading Deep Learning for Coders with fastai &amp; PyTorch by Sylvain Gugger &amp; Jeremy Howard. I’ll add the review soon.</p><p>Some books in the queue:</p><ul><li>Learning Tensorflow.js by Gant Laborde</li><li>Approaching (Almost) Any Machine Learning Problem by Abhishek Thakur</li><li>The Most Human Human by Brian Christian</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d1d70924b3d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Tools I Use Most When Developing in Python]]></title>
            <link>https://merveenoyan.medium.com/python-geli%C5%9Ftirirken-en-%C3%A7ok-kulland%C4%B1%C4%9F%C4%B1m-ara%C3%A7lar-a14602dbf7fb?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/a14602dbf7fb</guid>
            <category><![CDATA[backend]]></category>
            <category><![CDATA[yazılım]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[software]]></category>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Mon, 23 Aug 2021 17:56:14 GMT</pubDate>
            <atom:updated>2021-08-23T17:56:14.960Z</atom:updated>
            <content:encoded><![CDATA[<p>Hello! I’m writing this post in response to earlier requests. My job also requires me to develop backends in Python, and although I’ve had my struggles as someone coming from data science, I’ll talk about the tools that make my life easier. I’ll start with the Python libraries I love, then my environment preference, and then my choice of IDE and text editor.</p><h3><strong>Libraries</strong></h3><h3>Flask</h3><p>More than a library, Flask could be called a platform for backend development with Python. I use Flask because I find it more flexible than Django. I learned it by reading Miguel Grinberg’s <a href="https://www.oreilly.com/library/view/flask-web-development/9781491991725/">“Flask Web Development: Developing Web Applications with Python”</a> from cover to cover and applying what I read. If you have no idea about backend development, this book also gives you the fundamentals you need on security, logging, databases, requests and more. I won’t cover any of that in this post; this post is more about helper tools.</p><h3>Postman (and Responses)</h3><p>Postman is a tool that makes it easy to send requests to the API you’re developing. Before Postman, I tested before every update by trying to automate the unit tests I wrote myself, using a separate library for this called <a href="https://pypi.org/project/responses/">responses</a>. With the responses library you write simple tests: you send your request with requests and assert whether the output returns the response you expect; there are nice examples in the documentation I linked. Instead of sitting down and spending time on that, I decided to keep my requests as a collection in Postman, which made my job considerably easier.</p><h3>Pdb</h3><p>Pdb (the Python debugger) may be the tool that eases my work the most.
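As a small sketch of the workflow, with a hypothetical function that isn't from the post:

```python
# Hypothetical buggy function: suppose the discount comes out wrong and
# we want to inspect the local state right before the suspicious line.
def apply_discount(price, rate):
    discounted = price - price * rate
    # Uncomment the next line and run the script in a terminal; the
    # program stops here and drops you into a (Pdb) prompt:
    # import pdb; pdb.set_trace()
    # At the prompt you can print price, rate and discounted, step
    # line by line with `n`, or continue with `c`.
    return round(discounted, 2)

print(apply_discount(100, 0.15))  # 85.0
```

The set_trace line is commented out here so the snippet runs straight through; in a real session you leave it in and the terminal becomes an interactive debugger at that point.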
One line above the code that throws the error, you write import pdb;pdb.set_trace() and run your code in the terminal; it executes up to the pdb call and stops there, and you can inspect the values your variables hold at that moment, the response if a request is being made, or anything else happening right then; you then continue by stepping through your code line by line with n.</p><h3>Black</h3><p>I discovered <a href="https://calmcode.io/black/introduction.html">Black</a> at huggingface’s open-source datasets (library) event. Black formats your code with just a single command, which is very useful for CI/CD (especially in open-source projects). You can use it for the people who will contribute to your project. By the way, you can automate it to run before every commit.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*P--c4J4540ZD37Fp45TdcQ.png" /></figure><h3>flake8</h3><p>What makes <a href="https://calmcode.io/flake8/usage.html">flake8</a> different from black is that it shows you your mistakes (formatting, unused libraries and variables, rather than your code’s behavior). It’s especially convenient for those who previously used pycodestyle and pyflakes separately, because it’s a library built on top of those two.</p><h3>Ngrok</h3><p><a href="https://calmcode.io/ngrok/the-problem.html">Ngrok</a> is a service that lets other people access an application while you host it on your own computer.
I generally use it when I’m too lazy to deploy the data-labeling tools we build for our chatbot to the cloud and grant people access; you can also use it to give demos to your customers, so you don’t have to make people open ssh connections and deal with the technical details.</p><h3>Gazpacho</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e4SkVja2f5fSH6YBrw0O1A.png" /><figcaption>Source: Gazpacho’s own PyPI page</figcaption></figure><p>I learned about <a href="https://calmcode.io/gazpacho/introduction.html">Gazpacho</a> from calmcode. The nice thing about this library is that it has no dependencies of its own. I use it for simple web scraping.</p><h3>Pyenv</h3><p>Virtual environments let you easily manage your project’s dependencies and Python versions, and let other people run your project without problems. For those unfamiliar, I linked the <a href="https://calmcode.io/virtualenv/intro.html">virtual environments</a> tutorial series (underlined). I use pyenv for this. I put together a <a href="https://github.com/merveenoyan/ultimate-venv-cheatsheet">cheatsheet</a> for the different virtual environment tools, commands included. I was using Python 3.7.6 for a project and noticed that some intermediate versions like it can’t be installed with conda, which is why I switched to pyenv.</p><p>I’ll add to this post over time.</p><p>Thanks to calmcode.io, which hosts guides for all of these tools and libraries and much more, and to its developer Vincent.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a14602dbf7fb" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[On Finding a Job Abroad or Remotely as a Software Developer]]></title>
            <link>https://merveenoyan.medium.com/yaz%C4%B1l%C4%B1mc%C4%B1-olarak-yurtd%C4%B1%C5%9F%C4%B1-uzaktan-i%C5%9F-bulma-%C3%BCzerine-ce3d5260d2a?source=rss-763008f53ee9------2</link>
            <guid isPermaLink="false">https://medium.com/p/ce3d5260d2a</guid>
            <category><![CDATA[development]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[remote-working]]></category>
            <category><![CDATA[job-opportunities]]></category>
            <dc:creator><![CDATA[Merve Noyan]]></dc:creator>
            <pubDate>Mon, 03 May 2021 13:58:02 GMT</pubDate>
            <atom:updated>2021-05-03T13:58:02.309Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OqCSMR3f-AL3CnGpMLgQyQ.jpeg" /><figcaption>Source: Geekbot</figcaption></figure><p>Last month, together with Hadi Tok, an Android developer at Facebook, and Adem İlter, a developer and designer who has worked remotely for Superpeer, we did a Discord broadcast on the theme of “finding a software job abroad”. Since we took questions by voice, I didn’t record the stream, and a lot of people wrote to me because of that. I wanted both to recap what we discussed and to share my own experience as someone working at a scale-up in the United Kingdom.</p><p>I believe luck is built on maximizing opportunities: if you truly believe you deserve the job, work on increasing your opportunities. The first thing Hadi, Adem and I talked about was the need to do your job as well as possible. Because the job market abroad is very competitive, you need a good answer to the question “why should they hire me?”. I see people in Turkey who, fresh out of school and with no work experience, say they want to work abroad; that is a very long shot, and it’s better to first learn the trade in Turkey and then look for a remote job or one abroad, because companies prefer to hire people from their own country for entry-level positions. If you ask <strong>“I’ve never had a job; what do I need to do to get hired in Turkey?”</strong>, you need to build a portfolio and show employers that you know the tools they need well enough. Showcasing your end-to-end projects (a data science project, say, or a mobile app) on your github profile with a good README, and even writing good documentation for your code, is a plus. Writing articles about your work on platforms like Medium also does the job.
Beyond that, joining software communities in Turkey and around the world, meeting employers, meeting people like yourself and attending events, joining bootcamps and building and showcasing projects, and growing your network all help increase your opportunities. Participating in the open-source sprints of companies with open-source platforms, or solving open issues on their github, can also get you a job at those companies; I once received an offer from a company I love that way. You can also reach out to people working at start-ups/scale-ups and ask whether they have openings; applying to smaller companies helps you both get hired quickly and learn a great deal on the job.</p><p><strong>How did I get hired? Why did I choose to work for a company abroad? </strong>Competition is especially fierce in my field: employers look for data scientists with a master’s or PhD. I got past that barrier because I had worked with a chatbot framework not everyone knows (Rasa), but before that I had already worked with a Turkish company, so I didn’t go straight into an entry-level job abroad. Also, while I was job hunting, a headhunter reached out to me; he runs a company that recruits for start-ups and scale-ups in England. I decided to take my chances, and since I liked both the culture and the salary, I went forward with the company I work for now. My company has people from all over the world, and they are genuinely tolerant and understanding, with a great culture. I think I would have worked here even if they had offered less.</p><p><strong>How can I find a job abroad as an experienced developer? </strong>You can look at European recruitment sites. At this <a href="https://www.tablecrowdtalent.com/blog/seven-places-to-build-awareness-startups">link</a> there is a blog post by the headhunter who found me; it lists a number of websites you can sign up on to look for jobs.
If you want faster results, it’s worth trying start-ups first. When taking a remote job, make sure the company really hires from other parts of the world, because even when a posting says remote, employers may still want someone from their own country. You can add headhunters on Linkedin and tell them you’re looking for a job; sooner or later, when they find one that suits you, they’ll get back to you. Beyond that, apply to as many jobs as you can; at one point I applied to a great many companies just to try, and most of them got back to me.</p><p><strong>So how do salary, taxation and insurance work in jobs abroad or remote jobs?</strong></p><p>Abroad, you work either as an employee or as a contractor. To work as an employee, first of all you need a work permit in the country where your company is based; in that case your salary is taxed in that country, and there are treaties between Turkey and other countries to prevent double taxation. If you’re a contractor, you invoice the company monthly and have to handle your taxes yourself. Many people, myself included, solve this by founding a sole proprietorship, but because there are certain gaps in the law, many people choose not to take this route. If you do choose to set up a company and deal with it, do it only if your salary can cover the sole-proprietorship expenses and the accountant’s fees (they aren’t huge costs). Since you are in the position of an employer, you pay Bağkur monthly. The nice part is that if you intend to move, you can start the job remotely as a contractor and, after a while, ask the company to sponsor your work visa and relocate.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ce3d5260d2a" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>