Andrej Baranovskij
5,451 posts
Andrej Baranovskij
@andrejusb
Sparrow Creator: Open-Source AI Doc Extraction 🚀 | ML/Oracle Dev | @katana_ml | Try: sparrow.katanaml.io | github.com/katanaml
Katana ML 👉
katanaml.io
Joined March 2010
154 Following
6,687 Followers
  • Pinned
    Andrej Baranovskij
    @andrejusb
    Nov 11, 2021
    We launched Katana ML katanaml.io in 2018 and now it is time to update the website, to explain where we are now and what we do with #MachineLearning, #MLOps, and #opensource 🚀🚀🚀
    Katana
    @katana_ml
    Nov 11, 2021
    We have a new website: katanaml.io. It explains what we do with ML in a simple and straightforward way. It features our open source product Skipper, which we use to run #MLOps. #MachineLearning #MLOps
  • Andrej Baranovskij
    @andrejusb
    Oct 6, 2022
    It took me ~1 hour to build this dashboard layout (the data is dummy) in @streamlit, using default Streamlit components. I think it would take 10 times longer with HTML/JS. Now I can focus on functionality 👍, not on div alignment 🤣 Code: github.com/katanaml/sparr…
  • Andrej Baranovskij
    @andrejusb
    Apr 2, 2024
    Do you need LLM to output structured JSON? Then I recommend: 1. Hermes-2-Pro-Mistral-7B LLM (huggingface.co/NousResearch/H…) 2. @UnstructuredIO for data pre-processing and data preparation for LLM tasks 3. @langchain Pydantic parser to process LLM output (python.langchain.com/docs/modules/m…)
    45K
  • Andrej Baranovskij
    @andrejusb
    May 6, 2024
    You Don't Need RAG to Extract Invoice Data. Complete video: youtu.be/watch?v=_GoGdF… Code: github.com/katanaml/sparr… Documents like invoices or receipts can be processed by LLM directly, without RAG. I explain how you can do this locally with @ollama and Instructor by @jxnlco.
    52K
  • Andrej Baranovskij
    @andrejusb
    Jan 15, 2024
    🚀 FastAPI and LlamaIndex RAG: Creating Efficient APIs 🔥 Complete video: youtu.be/watch?v=vntNI3… Code: github.com/katanaml/sparr… FastAPI works great with LlamaIndex RAG. In this video, I show how to build a POST endpoint to execute inference requests for LlamaIndex. RAG
    94K
  • Andrej Baranovskij
    @andrejusb
    Apr 18, 2022
    I'm impressed with the OCR quality by Mindee docTR. This is an open-source solution and can be easily integrated into document processing pipelines. Well done @MindeeAPI #Python 🚀 docTR GitHub: github.com/mindee/doctr Full video with OCR tests: youtu.be/3nYPIDCToes
  • Andrej Baranovskij
    @andrejusb
    Nov 30, 2023
    Running Starling-7B LLM model on local CPU with @Ollama_ai and getting great results for invoice data extraction, even better than Zephyr, Mistral or Llama2. Prompt: retrieve gross worth value for each invoice item from the table. format response as following {"gross_worth":
    79K
  • Andrej Baranovskij
    @andrejusb
    Sep 25, 2023
    Llama 2 LLM for PDF Invoice Data Extraction. I show how you can extract data from a text PDF invoice using the Llama 2 LLM and explain how you can improve data retrieval using carefully crafted prompts. Complete video: youtube.com/watch?v=WGNpdv…
    39K
  • Andrej Baranovskij
    @andrejusb
    Jan 8, 2024
    Transforming Invoice Data into JSON: Local LLM with LlamaIndex & Pydantic 🚀 Complete video: youtu.be/watch?v=VKeYaI… Code: github.com/katanaml/sparr… I explain how to get structured JSON output with LlamaIndex and dynamic Pydantic class. This helps to implement the use case of
    148K
  • Andrej Baranovskij
    @andrejusb
    Oct 22, 2023
    🔥 Invoice Data Processing with Llama2 13B LLM RAG on Local CPU 🚀 I explain production LLM RAG setup with Weaviate @philipvollet, Haystack for LLM API, and Llama.cpp to run Llama2 13b. Give it a try 🧑‍💻 Full video: youtu.be/XuvdgCuydsM Code: github.com/katanaml/llm-r…
    47K
  • Andrej Baranovskij
    @andrejusb
    Apr 19, 2024
    JSON structured output from invoice document with Llama3:8b Instruct. Running in Sparrow RAG with @llama_index, @weaviate_io and @ollama 🚀 @katana_ml
    42K
  • Andrej Baranovskij
    @andrejusb
    Jun 10, 2024
    Effective Table Data Extraction from PDF without LLM. Sparrow Parse helps to read tabular data from PDFs, relying on various libraries, such as Unstructured or PyMuPDF4LLM. This allows us to avoid data hallucination errors often produced by LLMs when processing complex data
    28K
  • Andrej Baranovskij
    @andrejusb
    Jun 17, 2024
    Avoid LLM Hallucinations: Use Sparrow Parse for Tabular PDF Data, Instructor LLM for Forms. LLMs tend to hallucinate and produce incorrect results for table data extraction. For this reason, in Sparrow we are using Instructor by @jxnlco structured output for LLM to query form data
    21K
  • Andrej Baranovskij
    @andrejusb
    Mar 25, 2024
    LLM Structured Output for Function Calling with Ollama. Complete video: youtu.be/watch?v=_-FrUR… Code: github.com/katanaml/sparr… I explain how function calling works with LLM. This is an often confused concept: the LLM doesn't call a function, the LLM returns a JSON response with values to be
    22K
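
The point made in the Mar 25, 2024 post above, that the LLM only returns JSON naming a function and its arguments while the application performs the call, can be sketched in Python roughly as follows. This is a minimal illustration and not code from the Sparrow repository: `get_invoice_total`, the `TOOLS` registry, the `name`/`arguments` JSON shape, and the hard-coded response string are all assumptions made for the example, and no model is actually invoked.

```python
import json


# Hypothetical tool exposed by the application (illustrative only,
# not taken from the Sparrow code base).
def get_invoice_total(invoice_id: str, currency: str = "USD") -> dict:
    totals = {"INV-001": 120.50}  # dummy data
    return {
        "invoice_id": invoice_id,
        "total": totals.get(invoice_id),
        "currency": currency,
    }


# Registry mapping function names the model may return to real callables.
TOOLS = {"get_invoice_total": get_invoice_total}


def dispatch(llm_response: str) -> dict:
    """Parse the model's JSON reply and call the named function ourselves."""
    call = json.loads(llm_response)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown function: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for a model reply; the JSON shape is an assumption for this sketch.
simulated_llm_response = (
    '{"name": "get_invoice_total", "arguments": {"invoice_id": "INV-001"}}'
)

result = dispatch(simulated_llm_response)
print(result)
```

Keeping the registry explicit means a name the application does not recognize is rejected instead of being executed, which is one reasonable way to treat model output as untrusted input.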
