Introduction

OxyKodit

OxyKodit helps you unlock the power of your texts and data by implementing tailored AI solutions, ranging from Natural Language Processing (NLP) techniques to Large Language Models (LLMs) and Machine Learning (ML). Every project is delivered on the timeline we agree upon in the initial contract, within the budget assigned to the deliverables and milestones we set out for your specific needs.

Being the founder, resident Data Scientist, chief Machine Learning Engineer, full-stack Software Developer and head of Quality Assurance all at once, I am passionate about ensuring code robustness and quality through rigorous testing, introducing proper levels of abstraction and battle-testing solutions. With a Master's in Computer Science (software engineering) and a PhD in the field of BioNLP, I have been working in the domain since 2006 and am familiar with both “traditional” NLP techniques and cutting-edge LLMs. I love designing custom algorithms and models for specific business use-cases, diving into the peculiarities of the application domain and data, and identifying ways to introduce efficiencies across business operations.

Email me to start a conversation about how we can achieve your data-driven goals together. I look forward to discussing your project!

Sofie Van Landeghem

Project opportunities

As a non-exhaustive list, OxyKodit offers the following types of projects. Each can be set up separately, depending on the current phase and requirements of your specific use-case:

  1. AI proof-of-concept: Identify ways to integrate Artificial Intelligence into your business and domain by implementing a tangible PoC using public data and/or confidential in-house data.
  2. Production-grade code: Ensure your code base is ready for production by performing code reviews, extended quality assurance and making sure there is a robust unit testing framework in place.
  3. Open-source maintenance: Implement features, fix bugs, review PRs, organize issues, engage the community and answer discussion threads.


1. AI proof-of-concept

  • Perform a thorough analysis of a specific business case, including performance requirements.
  • Analyse the available data sources in terms of quantity, quality and predictive signal.
  • Implement an annotation framework to produce, in collaboration with the business partners, a realistic training and/or evaluation dataset.
  • Determine a suitable strategy to coordinate relevant LLM models, NLP algorithms and data integration methods.
  • Analyse the trade-off between using closed LLMs (through an external API) or open-source models (deployed in-house); a small illustrative sketch follows this list.
  • Example data sources: news, research articles, patents, contracts, clinical trials, physician notes and EMRs, social media, customer requests, …
  • Example NLP components:
  • Example projects I have worked on:
    • Implemented a large-scale text mining framework called EVEX to identify biomolecular events in millions of research articles.
    • Analysed a set of legal documents to identify similar paragraphs and sentences, and used NLP and clustering techniques to implement a template generator that can significantly reduce required editing time for a new document.
    • Designed and implemented an NLP strategy to mine diagnostic reports, identify relevant information and summarize patient characteristics through named entity recognition and relation extraction techniques.
    • Implemented the data model and helped design the graphical user interface for an annotation framework focused on capturing the molecular mechanisms of leaf growth and development in the Arabidopsis plant.
    • Created a novel framework Diffany to analyse the rewiring of biomolecular interactions under stress conditions such as plant drought or human cancer.
    • Implemented an optimization framework to create hyper-personalized cocktails and mocktails according to user preferences.
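
To make the closed-versus-open trade-off above more tangible, here is a minimal Python sketch: the same summarization request sent either to a hosted API or to a locally deployed open-weight model. The prompt, model names and helper functions are illustrative assumptions only, not recommendations for any particular project.

    # Minimal sketch contrasting a closed LLM behind an external API with an
    # open-weight model deployed in-house. Prompt and model names are
    # illustrative assumptions only.
    PROMPT = "Summarize the following clinical note in one sentence:\n{text}"

    def summarize_with_hosted_api(text: str) -> str:
        """Closed model via an external API: data leaves your infrastructure."""
        from openai import OpenAI  # assumes the openai client library is installed
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        )
        return response.choices[0].message.content

    def summarize_with_local_model(text: str) -> str:
        """Open-weight model run in-house: data stays on your own hardware."""
        from transformers import pipeline  # assumes transformers is installed
        generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
        result = generator(PROMPT.format(text=text), max_new_tokens=60, return_full_text=False)
        return result[0]["generated_text"]

Which option wins depends on data sensitivity, expected volume, latency requirements and the accuracy each model can reach on your task, which is exactly what the trade-off analysis in a proof-of-concept is meant to quantify.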


2. Production-grade code

  • Perform rigorous testing and assess the current quality of both the data and the code base.
  • Identify structural errors (if any) in the dataset and/or annotation guidelines.
  • Make the code more robust and more performant in terms of speed and memory usage.
  • Tune the algorithms in terms of predictive performance to make them more accurate and reliable.
  • Iterate on both data and the ML models.
  • Make sure there is a robust unit testing framework in place (a small sketch follows this list).
  • Example projects I have worked on:
    • Performed an in-depth review of an existing code base to identify opportunities to increase performance, both in terms of efficiencies (memory usage, throughput) and accuracy (correctness of the results).
    • Ran a range of hyperparameter tuning experiments and tried different architectures to optimize the F-score of supervised ML models.
    • Performed a critical evaluation of existing tools to recognize mentions of cell lines in text, and developed two new annotated datasets to further boost development of NLP algorithms in this domain.
    • Analysed an NER dataset from a customer and identified structural ambiguities and conflicts. Refined the annotation guidelines accordingly and trained new models on the curated dataset, obtaining much more robust and accurate results.
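
As a small illustration of what a robust unit testing framework can look like in practice, here is a hedged pytest sketch; the extract_entities function and its expected behaviour are hypothetical stand-ins for a project-specific NLP component.

    # Hedged pytest sketch: extract_entities is a hypothetical stand-in for a
    # project-specific NLP component, used here only to show the testing style.
    import pytest

    def extract_entities(text: str) -> list[tuple[str, str]]:
        """Toy placeholder for an NER component: returns (entity, label) pairs."""
        known_drugs = {"aspirin", "ibuprofen"}
        tokens = [tok.strip(".,").lower() for tok in text.split()]
        return [(tok, "DRUG") for tok in tokens if tok in known_drugs]

    def test_known_entity_is_found():
        assert ("aspirin", "DRUG") in extract_entities("The patient was given aspirin.")

    def test_empty_input_returns_no_entities():
        assert extract_entities("") == []

    @pytest.mark.parametrize("text", ["No relevant mentions here.", "12345", "   "])
    def test_irrelevant_input_is_handled_gracefully(text):
        # Robustness checks: odd but valid inputs should not raise or add noise.
        assert extract_entities(text) == []

Tests like these, run automatically on every change, are part of what turns a promising prototype into a code base that can safely go into production.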


3. Open-source maintenance

  • Extend the code base with new functionality according to specific feature requests.
  • Fix self-encountered and user-reported bugs.
  • Review PRs with a keen eye for detail, ensuring maintainability and robustness of the code base.
  • Organize and answer issues and discussion threads on GitHub.
  • Engage with the community at large and provide guidance on contributing to open-source.
  • Example repositories I have worked on:
    • The NLP library spaCy and its ecosystem, including spacy-llm, spacy-transformers and the underlying ML library Thinc
    • The popular web framework FastAPI for building high-performance APIs
    • A research tool called Diffany to infer and visualize differential networks
    • Typer: a library for easily building CLI applications in Python
    • neuralcoref: a neural network framework for performing Coreference Resolution
    • A partially vibe-coded hobby project to support language learning through fun vocabulary exercises: LexiKon

Highlighted Blogs

For a full overview of all blog posts, go here.

11 June 2025



Inside the black box: What open-weight LLMs today can (and can’t) do

The recent wave of open-weight LLMs is leveling the playing field – allowing developers to build robust, privacy-friendly and cost-efficient NLP applications. But which open model suits your application best?

31 May 2025



Data doesn’t lie — but it can mislead: How to ensure integrity of your ML applications

This talk explores the hidden story behind the performance metrics, moving beyond a single F-measure or accuracy score to delve into the intricacies of the dataset and its domain.