Driven by curiosity and a passion for solving real-world problems with data. Comfortable across the full data lifecycle — from ingestion and preprocessing to modeling, deployment, and insight delivery. Proficient in Python for data analysis and machine learning, with experience in cloud platforms (AWS, Azure, GCP) and data visualization for clear communication.
Currently seeking a Data Scientist or Data Engineering role where I can apply my analytical, programming, and communication skills.
Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.
- ⚙️ Ingest and sample large network datasets with Polars
- 🧱 Transform raw flow logs into feature-rich tabular format
- 🛠️ Develop modular ETL pipeline for local or streamed flow data
- 🧠 Integrate anomaly detection and classification models (e.g. Isolation Forest, LOF, Random Forest, LGBM)
- 🧪 Evaluate under real-world class imbalance
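
The evaluation step is worth a small illustration, because accuracy alone is misleading when malicious flows are rare. The following is a minimal sketch using only the Python standard library. The labels and predictions are made-up toy values (1,000 flows, 10 of them malicious), not data or results from the project, and the helper names are hypothetical.

```python
# Toy illustration only: labels and predictions are invented, not project data.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks a malicious flow."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of flows labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall(y_true, y_pred):
    """Precision and recall for the malicious class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# 1,000 flows with a 1% positive rate: 10 malicious, 990 benign.
y_true = [1] * 10 + [0] * 990

# Baseline that never raises an alert.
always_benign = [0] * 1000

# Hypothetical detector: flags 20 flows, 8 of which are truly malicious.
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

for name, y_pred in [("always benign", always_benign), ("detector", detector)]:
    acc = accuracy(y_true, y_pred)
    prec, rec = precision_recall(y_true, y_pred)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

On this toy data the never-alert baseline gets a higher accuracy than the detector while catching nothing, which is why precision and recall are the more useful view for rare-event detection.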
- 📊 Working with large tabular datasets
- 🧮 Handling class imbalance in cybersecurity contexts
- 🔍 Anomaly detection techniques
- 📈 Practical ML evaluation
- 🧰 Prototyping realistic data pipelines
And that's Ritchie Vink, creator of Polars, with my graffiti:
- 👨‍💻 All of my projects are available at https://github.com/anopsy
- 📑 If you'd like to hire me, check my CV
- 📫 How to reach me: [email protected]
- ⚡ Fun fact: 🎨 I paint graffiti portraits
🎨 Selected Projects
┣━━ Data Science Content Intern at NannyML:
┃   ┣━━ 📈Post-Deployment Data Science blogs
┃   ┃   ┣━━ 📉Data Quality and Covariate Shift
┃   ┃   ┗━━ 🌀Models aren't Forever
┃   ┣━━ contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets
┃   ┗━━ contributed to docs
┃
┣━━ PyData and PyLadies Con speaker and volunteer at:
┃   ┣━━ 💽PyData Amsterdam 2024 Talk - Alice in Open Source Land
┃   ┣━━ 🤖PyLadiesCon 2024 Talk
┃   ┗━━ 🏃PyData Open Source Sprint
┃
┣━━ Contributed to OSS at:
┃   ┣━━ 🧱scikit-lego
┃   ┃   ┣━━ contributed to docs
┃   ┃   ┗━━ made ColumnSelector dataframe agnostic using Narwhals
┃   ┣━━ 🐳🦄narwhals
┃   ┃   ┣━━ worked on pyarrow/dask backend implementation
┃   ┃   ┗━━ contributed to docs and tests
┃   ┗━━ 💡embetter
┃       ┣━━ deprecated a method
┃       ┗━━ added pre-commit hooks
┃
┣━━ Juniors_vs_ChatGPT
┃   - Did ChatGPT replace Juniors and Interns?
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/API/polars🐻‍❄️/hvplot📊
┃
┣━━ Compensation Prediction
┃   - How much do Engineers earn?
┃   ┣━━ data modeling
┃   ┣━━ model evaluation
┃   ┣━━ containerization using docker
┃   ┣━━ building streamlit app
┃   ┗━━ python🐍/scikit-learn/streamlit📈/docker📦
┃
┣━━ MaskMap: Decoding the Hidden Spectrum
┃   - Prototype of a diagnosis support tool that uses NLP to identify symptoms of Autistic Masking
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling
┃   ┣━━ deploying
┃   ┗━━ python🐍/pandas🐼/FastAPI
┃
┣━━ Equity in Healthcare: Women in Data Science Datathon 2024
┃   - WIDS Datathon Project predicting a timely diagnosis of Metastatic Cancer
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/pandas🐼/ensemble🌳/keras🧠
┃
┣━━ Relative Search Volumes Analysis
┃   - Search Volumes for Autism vs Autism Spectrum Disorder around the world
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┣━━ Steelplate Defect Visual EDA
┃   - Colorful joyplots for Visual EDA
┃   ┣━━ data visualization
┃   ┣━━ ensemble
┃   ┗━━ python🐍/pandas🐼/xgb🌳/seaborn🎨
┃
┣━━ hossenfelder - 🦺WIP
┃   - Data Analysis and Prediction of views on Sabine Hossenfelder YT channel
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┗━━ MyFalaClassifier - 🦺WIP
    - Detector of surfable waves
    ┣━━ live-stream scraping
    ┣━━ image processing
    ┣━━ transfer learning
    ┣━━ deploying
    ┗━━ python🐍/keras🧠


