SynthesisAI Documentation

Introduction

SynthesisAI is an advanced research synthesis platform that combines Large Language Models (LLMs) with knowledge graphs to transform the research process. The platform enables researchers, academics, and professionals to efficiently analyze, synthesize, and generate insights from large volumes of research data. Whether you are starting with a single PubMed article, exploring a new research topic, or analyzing large datasets, SynthesisAI provides an integrated solution for research synthesis that delivers accurate, context-aware results.

With features such as graph-based Retrieval-Augmented Generation (RAG), semantic chunking, and knowledge graph integration, SynthesisAI enables quicker and more effective discovery, reducing time spent on manual tasks while increasing the depth and quality of research insights.


System Architecture

SynthesisAI is designed using a multi-layer architecture that enables efficient processing, analysis, and synthesis of research content. Below is a breakdown of each key component of the system:

1. Data Retrieval Layer

The Data Retrieval Layer focuses on acquiring data from various research databases and transforming it into a usable format for further processing.

  • Intelligent Search Processing

    • Leverages LLMs to generate precise MeSH (Medical Subject Headings) terms based on user input.
    • Retrieves targeted results from research databases like PubMed, Google Scholar, and others.
    • Enhances search results by adjusting parameters based on initial user inputs and feedback.
  • Database Integration

    • Seamlessly integrates with PubMed, Scopus, and other research databases to provide researchers with real-time access to articles.
    • Supports querying of large-scale repositories, enabling the platform to pull the most relevant articles and datasets.
  • Smart Query Generation

    • Automatically converts user-provided research interests into optimized search queries (see the sketch after this list).
    • Adjusts queries based on ongoing feedback to improve result relevance.
  • Adaptive Search

    • Continuously refines search queries to improve result accuracy and relevance, adapting to user preferences.
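
To make the retrieval flow concrete, here is a minimal Go sketch of the MeSH-driven search step. The generateMeSHTerms helper is a hypothetical stand-in for the LLM call (it is not part of the Modus SDK), while the URL it builds targets NCBI's public E-utilities esearch endpoint.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// generateMeSHTerms stands in for the LLM call that maps a free-text research
// interest to MeSH terms. In the real system this would go through a
// Modus-hosted model; here it is a hypothetical stub.
func generateMeSHTerms(interest string) []string {
	return []string{"Neoplasms", "Immunotherapy"} // placeholder output
}

// buildPubMedQuery turns MeSH terms into an E-utilities esearch URL.
// The esearch endpoint and its db/term/retmax/retmode parameters are part of
// the public NCBI E-utilities API.
func buildPubMedQuery(terms []string, maxResults int) string {
	clauses := make([]string, 0, len(terms))
	for _, t := range terms {
		clauses = append(clauses, fmt.Sprintf("%q[MeSH Terms]", t))
	}
	q := url.Values{}
	q.Set("db", "pubmed")
	q.Set("term", strings.Join(clauses, " AND "))
	q.Set("retmax", fmt.Sprint(maxResults))
	q.Set("retmode", "json")
	return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + q.Encode()
}

func main() {
	terms := generateMeSHTerms("checkpoint inhibitors for solid tumours")
	fmt.Println(buildPubMedQuery(terms, 20))
}
```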

2. Text Processing Layer

This layer segments and analyzes the text retrieved from academic papers, breaking it into manageable chunks for downstream processing.

  • Section-Based Chunking

    • Identifies common patterns within research papers (e.g., introduction, methods, results, conclusions).
    • Segments text at logical academic section boundaries so that related content stays grouped together (a minimal chunking sketch follows this list).
  • LLM-Based Chunking

    • Uses LLMs to identify logical breakpoints in papers where standard chunking methods are not sufficient.
    • Produces coherent, meaningful chunks even for papers with unconventional structures.
  • Semantic Chunking

    • Employs Natural Language Processing (NLP) techniques to detect meaningful boundaries within sections of the text.
    • Ensures that chunked content maintains semantic consistency, which is vital for downstream analysis.
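
Below is a minimal Go sketch of the section-based pass, assuming papers arrive as plain text with headings on their own lines. The heading list and the Chunk type are illustrative, not SynthesisAI's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Chunk is one semantically grouped slice of a paper.
type Chunk struct {
	Section string
	Text    string
}

// headingPattern matches common top-level headings found in most papers.
// Real papers vary widely, which is why the platform falls back to LLM-based
// and semantic chunking when this rule-based pass is not enough.
var headingPattern = regexp.MustCompile(`(?im)^(abstract|introduction|methods?|materials and methods|results|discussion|conclusions?)\s*$`)

// chunkBySection splits full text at recognised headings, keeping each
// section's heading together with its body.
func chunkBySection(text string) []Chunk {
	locs := headingPattern.FindAllStringIndex(text, -1)
	if len(locs) == 0 {
		return []Chunk{{Section: "body", Text: text}}
	}
	chunks := make([]Chunk, 0, len(locs))
	for i, loc := range locs {
		end := len(text)
		if i+1 < len(locs) {
			end = locs[i+1][0]
		}
		section := strings.ToLower(strings.TrimSpace(text[loc[0]:loc[1]]))
		chunks = append(chunks, Chunk{Section: section, Text: strings.TrimSpace(text[loc[1]:end])})
	}
	return chunks
}

func main() {
	paper := "Introduction\nWhy this matters...\nMethods\nWhat we did...\nResults\nWhat we found..."
	for _, c := range chunkBySection(paper) {
		fmt.Printf("[%s] %s\n", c.Section, c.Text)
	}
}
```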

3. Embedding and Graph Generation Layer

In this layer, the processed text is converted into high-dimensional embeddings that allow the system to understand relationships between research concepts and build knowledge graphs.

  • Text Embedding

    • Utilizes Google’s text-embedding-004 model to convert the processed content into vector embeddings.
    • Embeddings facilitate comparisons based on semantic similarity, enabling the system to recognize related research concepts.
  • Graph Generation

    • Detects relationships between different research chunks using LLMs and creates nodes and edges that represent these relationships.
    • Builds an interconnected knowledge graph in Dgraph, representing the relationships between findings, methods, and concepts from the research papers (a simplified example follows this list).
  • Semantic Relationship Preservation

    • Ensures that relationships between different research entities (e.g., drugs, diseases, methodologies) are accurately represented in the graph.
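
The Go sketch below illustrates the idea behind this layer under simplified assumptions: embedChunk stands in for the text-embedding-004 call, a cosine-similarity threshold decides whether two chunks are related, and the related pair is expressed as the JSON body of a Dgraph set mutation. The predicate names are illustrative, not SynthesisAI's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// embedChunk stands in for the call to Google's text-embedding-004 model; the
// real system obtains these vectors through Modus.
func embedChunk(text string) []float64 {
	return []float64{0.1, 0.3, 0.9} // placeholder vector
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Node mirrors, in simplified form, how a chunk and its relationships could be
// written to Dgraph with a JSON set mutation (blank-node UIDs, custom predicates).
type Node struct {
	UID       string  `json:"uid"`
	Text      string  `json:"chunk.text"`
	RelatedTo []*Node `json:"chunk.related_to,omitempty"`
}

func main() {
	a, b := "Drug X reduced tumour growth.", "Tumour growth slowed under Drug X."
	if cosine(embedChunk(a), embedChunk(b)) > 0.8 { // similarity threshold is an assumption
		mutation := &Node{UID: "_:chunkA", Text: a,
			RelatedTo: []*Node{{UID: "_:chunkB", Text: b}}}
		payload, _ := json.MarshalIndent(mutation, "", "  ")
		fmt.Println(string(payload)) // body of a Dgraph JSON set mutation
	}
}
```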

4. RAG and Agentic Systems

At the heart of SynthesisAI’s analytical power are its RAG and agentic systems, which drive content generation and provide high-quality insights.

  • Retrieval-Augmented Generation (RAG)

    • Uses the knowledge graph to inform the generation of high-quality, context-aware content.
    • Incorporates relevant information retrieved from the graph to ensure generated outputs are factual, accurate, and insightful.
  • Agentic Systems

    • Enables parallel report generation, allowing multiple sections to be processed simultaneously (a goroutine-based sketch follows this list).
    • Conducts comparative analysis across different research papers, facilitating deeper insights.
    • Automates the generation of MeSH terms and research highlights, making it easier for researchers to stay informed.
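
Here is a goroutine-based Go sketch of parallel section generation, assuming each section can be drafted independently from retrieved graph context. generateSection is a hypothetical stand-in for the RAG call.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// generateSection stands in for the RAG call that drafts one report section
// from knowledge-graph context; in SynthesisAI this is backed by an LLM, here
// it is a hypothetical stub.
func generateSection(name string, graphContext string) string {
	return fmt.Sprintf("## %s\n(draft grounded in: %s)\n", name, graphContext)
}

func main() {
	sections := []string{"Background", "Methods Compared", "Key Findings", "Open Questions"}
	drafts := make([]string, len(sections))

	var wg sync.WaitGroup
	for i, name := range sections {
		wg.Add(1)
		go func(i int, name string) { // one agent per section
			defer wg.Done()
			drafts[i] = generateSection(name, "retrieved graph facts")
		}(i, name)
	}
	wg.Wait() // sections are generated concurrently, then assembled in order

	fmt.Println(strings.Join(drafts, "\n"))
}
```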

Technical Implementation

Core Technologies

  1. Modus Framework

    • The backbone framework for API development and database management.
    • Handles caching, model plugin integration, and auto-generation of the GraphQL schema (illustrated in the sketch after this list).
    • Manages database connections and ensures smooth data handling between components.
  2. Language Models

    • Meta Llama 3.1 (via Hypermode): Powers text generation and relationship extraction.
    • Google Gemini: Handles text embeddings and more complex writing tasks, so each stage of the pipeline uses the model best suited to it.
  3. Database Infrastructure

    • Dgraph: A high-performance graph database used to store and query knowledge graphs that link research concepts.
  4. Frontend Development

    • Next.js Framework: A modern React framework used to develop the frontend interface for SynthesisAI.
    • Responsive UI: Ensures the platform is accessible on all devices, from desktop to mobile.
    • Real-time Result Visualization: Presents research insights and graphs interactively, allowing users to explore the data.
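
As a rough illustration of how Modus surfaces backend logic (item 1 above), the sketch below shows an exported Go function of the kind Modus exposes through its auto-generated GraphQL API. SynthesizeReport and the query shape in the comment are assumptions for illustration, not SynthesisAI's actual API.

```go
package main

import "fmt"

// In Modus, exported Go functions are picked up and exposed through an
// auto-generated GraphQL schema; this stub only illustrates the shape of such
// a function.
func SynthesizeReport(topic string) string {
	return "Report for: " + topic // placeholder; the real function runs the full pipeline
}

// A client would then call it through GraphQL roughly as:
//
//   query {
//     synthesizeReport(topic: "cancer immunotherapy")
//   }

func main() {
	fmt.Println(SynthesizeReport("cancer immunotherapy"))
}
```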

System Flow

[System flow diagram]

Key Features

  • Intelligent Research Processing

    • Automatic generation of MeSH terms for easy search and categorization.
    • Multi-source data integration from diverse academic repositories.
    • Adaptive and customizable chunking techniques.
  • Advanced Analysis Capabilities

    • Supports cross-paper comparative analysis for enhanced insights.
    • Guides users through the systematic review process using advanced AI-powered tools.
    • Detects intersections between research areas, opening new avenues for exploration.
  • Efficient Content Generation

    • Facilitates parallel report generation, drastically reducing time spent on manual writing.
    • Generates contextually accurate content and fact-checked summaries.
    • Provides automatic research highlight generation for quick takeaways.

System Workflow

  1. Input Processing

    • The user submits a research query, paper, or idea.
    • The system automatically generates relevant MeSH terms and prepares the database queries.
    • The user’s preferences and feedback help adjust query parameters in real-time.
  2. Content Processing

    • Retrieved content is processed using multi-tier chunking strategies (section-based, LLM-based, and semantic chunking).
    • Metadata and context are extracted from the retrieved content and refined for further analysis.
  3. Knowledge Graph Creation

    • The content is embedded into high-dimensional vectors using text embedding models.
    • Relationships between different chunks of text are identified, and a knowledge graph is created in Dgraph, connecting related concepts and findings.
  4. Content Generation

    • RAG enhances content generation by using context from the knowledge graph to produce fact-checked summaries and detailed reports.
    • Agent-based systems process report sections in parallel for faster results (the end-to-end flow is sketched below).
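
Putting the four steps together, a compressed Go sketch of the workflow might look like the following; every function here is a hypothetical placeholder for the richer components described in the architecture sections above.

```go
package main

import "fmt"

// The four workflow stages as hypothetical function signatures.
func expandQuery(input string) []string          { return []string{"mesh:Neoplasms"} }
func fetchAndChunk(terms []string) []string      { return []string{"chunk A", "chunk B"} }
func buildKnowledgeGraph(chunks []string) string { return "graph-ref" }
func generateReport(graph string) string         { return "final report" }

// runPipeline strings the stages together in the order the workflow describes:
// input processing -> content processing -> graph creation -> content generation.
func runPipeline(input string) string {
	terms := expandQuery(input)
	chunks := fetchAndChunk(terms)
	graph := buildKnowledgeGraph(chunks)
	return generateReport(graph)
}

func main() {
	fmt.Println(runPipeline("checkpoint inhibitors for solid tumours"))
}
```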

Performance Optimization

SynthesisAI implements several performance-enhancing techniques to ensure that the platform remains fast and efficient even when handling large-scale data.

  1. Caching

    • Model responses are cached via Modus Collections, which reduces latency for frequently repeated queries.
    • Cache hits avoid redundant model calls, keeping the system responsive (a simplified caching sketch follows this list).
  2. Parallel Processing

    • Sections of the research report are processed in parallel, ensuring that users receive results faster.
    • Embedding generation and knowledge graph updates occur concurrently, allowing for faster system updates.
  3. Smart Retrieval

    • The RAG system intelligently retrieves relevant information from the knowledge graph, ensuring that only the most pertinent data is used.
    • Optimized graph traversal ensures efficient navigation of large knowledge graphs.
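
The caching idea can be illustrated with a small Go sketch. This is not the Modus Collections API, just a simplified in-memory cache keyed by a hash of the prompt, showing why repeated queries become cheap.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// responseCache is a deliberately simplified stand-in for model-response
// caching: completions are stored under a hash of the prompt.
type responseCache struct {
	mu    sync.RWMutex
	items map[string]string
}

func newResponseCache() *responseCache {
	return &responseCache{items: make(map[string]string)}
}

func key(prompt string) string {
	sum := sha256.Sum256([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

// getOrGenerate returns a cached response when the same prompt was seen
// before, and only calls the (expensive) generator on a miss.
func (c *responseCache) getOrGenerate(prompt string, generate func(string) string) string {
	k := key(prompt)
	c.mu.RLock()
	if v, ok := c.items[k]; ok {
		c.mu.RUnlock()
		return v
	}
	c.mu.RUnlock()

	v := generate(prompt)
	c.mu.Lock()
	c.items[k] = v
	c.mu.Unlock()
	return v
}

func main() {
	cache := newResponseCache()
	slowModel := func(p string) string { return "answer to: " + p } // stand-in for an LLM call
	fmt.Println(cache.getOrGenerate("summarise paper 123", slowModel))
	fmt.Println(cache.getOrGenerate("summarise paper 123", slowModel)) // served from cache
}
```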

Future Developments

SynthesisAI is continuously evolving. Potential areas for enhancement include:

  1. Enhanced Model Integration

    • The integration of additional LLM architectures to further enhance text generation and analysis capabilities.
    • The adoption of improved embedding techniques for even more accurate and efficient processing.
  2. Extended Database Support

    • Adding more research databases and repositories to provide a more extensive range of academic content.
    • Improvements in cross-database search capabilities and metadata handling.
  3. Advanced Analysis Features

    • Developing additional tools for comparative analysis, such as side-by-side visualizations and advanced filtering options.
    • Enhanced systematic review capabilities to help researchers organize and manage their findings more efficiently.

Challenges Faced

This hackathon challenged me to learn a new framework and dive deep into a lot of new concepts. I also took on the additional challenge of learning Golang over the past two weeks. While it was initially tough, it has been a fun and rewarding experience, especially thanks to the Modus platform, which provided the resources and tools to make it possible.

Accomplishments We're Proud Of

  • Learning Golang: Gaining proficiency in Golang was a huge accomplishment and a valuable skill for future projects.
  • Integrating Modus Backend into My Frontend Application: Successfully incorporating the Modus backend with my frontend application was a key milestone.
  • Creating a Complex Schema with Dgraph: Designing a complex schema for my application using Dgraph was an intricate task that I’m proud of achieving.
  • Switching to Dgraph from PostgreSQL: Initially, I had planned to use PostgreSQL for user authentication and data storage. However, I decided to migrate to Dgraph for its scalability and performance, which turned out to be a great decision.

What We Learned

  • More About Golang: I gained a deeper understanding of Golang, its syntax, and how to use it effectively for building scalable applications.
  • Building Scalable Backends: I learned how to design and implement scalable backend architectures, making my application more efficient.
  • Integrating AI Models: I also gained insight into how to integrate artificial intelligence models seamlessly into my applications, opening up possibilities for future use cases.

What's Next for SynthesisAI

  • Expanding Application Capabilities: We're planning to expand the application's functionality from simple research writing to more advanced research and generation tasks using Retrieval-Augmented Generation (RAG).
  • Adding a Chat Interface: A chat interface will be added so users can interact with and explore the results of their research more intuitively.
  • Implementing Disparate Analysis: We aim to introduce disparate analysis capabilities to allow more detailed and varied data interpretations.
  • Connecting to Multiple Databases: Future updates will include the ability to connect to multiple databases, allowing us to pull in more data from diverse sources for better insights.
  • Building a Model Infrastructure: We plan to build model infrastructure for Modus using LangChain, making it easier to build AI agents on the platform. This would let developers combine LangChain’s well-known capabilities with Modus for better performance and flexibility, provided compatibility with TinyGo can be ensured.

Conclusion

SynthesisAI represents a major leap forward in the field of research synthesis, leveraging advanced artificial intelligence and knowledge graph technology to streamline research analysis. Its multi-layered architecture, intelligent processing layers, and cutting-edge tools enable researchers to work more efficiently and effectively. By combining these capabilities, SynthesisAI is helping researchers unlock new insights, discover novel connections, and accelerate scientific progress.

As the platform continues to evolve, SynthesisAI promises to be an indispensable tool for academic professionals, research institutions, and industries worldwide.
