Knowledge is power

CREDENTIALS FOR DEMO (https://frontend.hz.siriusfrk.me/)

Username: backend

Password: QWE!#2asdzxc

Inspiration

Large international companies generate enormous amounts of data, including datasheets, reports, emails and more. This data comes in different languages, can be structured or unstructured, and is sometimes even hidden. Extracting valuable information from this sea of data takes substantial effort and innovation. Seeing the potential of modern AI technologies, we set out to build a solution that offers a new way to interact with data, surfaces useful insights, and makes data handling more efficient.

What it does

The system uses artificial intelligence to process large volumes of information in different formats. It extracts specific types of data, such as text, images, and tables, and converts them to embeddings (vectors). Modern models extract facts from the text and images, and tables are parsed into databases. On top of this, the system offers a combined search feature: ML models match the indexed text against the user's request, and a large language model (LLM) turns the results into an accurate answer to the query. The system also supports multi-language interaction, for example returning answers in German.
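
To make the "tables into databases" step more concrete, here is a minimal sketch rather than our actual pipeline. It assumes a table has already been extracted from a document as CSV text and loads it into an in-memory SQLite database. The table contents, the `load_table` helper, and the `pump_specs` table name are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical table as it might come out of a datasheet, already in CSV form.
raw_table = """part,max_pressure_bar,weight_kg
P-100,16,12.5
P-200,25,18.0
"""


def load_table(conn, name, csv_text):
    """Create a table from the CSV header and insert the remaining rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = ", ".join(f'"{col}"' for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', body)
    return len(body)


conn = sqlite3.connect(":memory:")
inserted = load_table(conn, "pump_specs", raw_table)

# Values are stored as text here, so cast before comparing numerically.
heavy_duty = conn.execute(
    "SELECT part FROM pump_specs WHERE CAST(max_pressure_bar AS REAL) > 20"
).fetchall()
print(inserted, heavy_duty)
```

Once a table is in a database, exact questions such as "which parts are rated above 20 bar" can be answered with a query instead of a fuzzy text match.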

How we built it

We use AI models to convert data from various formats into embeddings, potentially combining several technologies depending on the format. Search then runs over these embeddings, and prompt engineering with an LLM turns the retrieved material into precise answers to search queries.
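
The idea of searching over embeddings can be sketched in a few lines. This is an illustration under strong assumptions, not our implementation: the `embed` function below is a toy normalised bag-of-words vector standing in for a real embedding model, and the document names and texts are invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: L2-normalised word counts."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def search(query, index, top_k=2):
    """Return the top_k (score, document id) pairs, best first."""
    q = embed(query)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical corpus; in a real system these would be extracted passages.
documents = {
    "pump_datasheet.pdf": "Maximum operating pressure of the pump is 16 bar",
    "q3_report.docx": "Revenue grew in the third quarter across all regions",
    "email_0042.eml": "Please schedule the maintenance visit for next week",
}
index = {doc_id: embed(text) for doc_id, text in documents.items()}

results = search("operating pressure of the pump", index)
for score, doc_id in results:
    print(f"{score:.3f}  {doc_id}")
```

With a real embedding model the vectors would be dense and would capture meaning across languages, but the ranking step (embed the query, compare it with every stored vector, keep the best matches) would look much the same.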

Challenges we ran into

Handling vast amounts of data in various formats and languages, and extracting useful information from hidden data, were substantial challenges. Making the system efficient and inexpensive to run while preserving privacy was a further concern.

Accomplishments that we're proud of

We implemented a substantial part of the combined search feature, with working ML models that process text and user requests effectively. Every answer also contains links to the documents used to produce it, which increases trust in the model's responses. The system offers multilingual support, which adds to its usefulness in a multicultural corporate environment.
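
One way to attach source links to every answer is sketched below. It is a simplified illustration: the `Passage` class, the prompt wording, the example URL, and the hard-coded `llm_answer` string are all assumptions made for the example, and no real LLM is called.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source_url: str  # hypothetical link to the original document
    text: str


def build_prompt(question, passages, answer_language="English"):
    """Number the retrieved passages so the model can cite them."""
    numbered = "\n".join(
        f"[{i}] {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        f"Answer the question in {answer_language} using only the passages below.\n"
        "Cite passages by their number in square brackets.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


def attach_sources(answer, passages):
    """Append the links behind each passage number to the model's answer."""
    links = "\n".join(
        f"[{i}] {p.source_url}" for i, p in enumerate(passages, start=1)
    )
    return f"{answer}\n\nSources:\n{links}"


passages = [
    Passage(
        "https://example.com/docs/pump_datasheet.pdf",
        "Maximum operating pressure of the pump is 16 bar.",
    )
]
prompt = build_prompt(
    "What is the maximum pressure?", passages, answer_language="German"
)

# A fixed string stands in for the model's reply in this sketch.
llm_answer = "Der maximale Betriebsdruck beträgt 16 bar [1]."
final = attach_sources(llm_answer, passages)
print(final)
```

Because the passage numbers in the prompt match the numbers in the source list, a reader can follow each citation back to the document it came from.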

What we learned

We learned to extract various types of data from raw datasets and convert them into embeddings. During development we also learned how to use prompt engineering to get accurate responses from an LLM without additional training. We also gained insight into optimizing metrics to reduce costs and simplify data search across different languages.

What's next for Knowledge is power

Looking ahead, there is room for further improvement, including the integration of Computer Vision (CV) models and better handling of structured and non-text data such as tables, graphics, and audio to provide a more powerful search experience. We also plan to keep optimizing metrics to reduce costs and simplify data access in various languages, and potentially to run the system on local hardware to ensure privacy. The ultimate goal is a tool that is cheaper than human labor and supports multilingual interaction, helping large multinational corporations streamline their business operations.