Note-Splicer: AI-Powered Personal Note Management and RAG System

Note-Splicer is a Python-based system that turns unstructured personal notes (such as lecture transcripts, study materials, or research documents) into a structured, searchable knowledge base. It uses Retrieval-Augmented Generation (RAG) so that you can ask questions in natural language and receive coherent answers grounded in your own notes.

Key Features

Note Processing: Converts raw text files into structured JSON summaries using generative AI
Data Compilation: Splices individual processed notes into a unified master file
Vector Database: Builds a persistent ChromaDB vector database for semantic search
RAG Querying: Implements a RAG pipeline for natural language queries against your notes
Multiple LLM Support: Compatible with various Large Language Models (e.g., Gemini, DeepSeek via Ollama)
Streamlit UI: Interactive web interface for querying the knowledge base
Modular Architecture: Organized into separate components for processing, database creation, and querying

Project Structure

note-splicer/
├── requirements.txt                    # Python dependencies
├── Spliced_Notes.json                  # Compiled master notes file
├── Spliced_Notes.txt                   # (Empty) Alternative text format
├── note_summarizer/                    # Core processing components
│   ├── README.md                       # Detailed usage guide
│   ├── requirements.txt                # Component-specific dependencies
│   ├── extracted_notes.txt             # Raw extracted notes
│   ├── spliced_notes_schema.json       # JSON schema for notes structure
│   ├── Spliced_Notes.json              # Component-level compiled notes
│   ├── structure.json                  # Additional structure definitions
│   ├── structure_example.json          # Example structure
│   ├── training_dataset.jsonl          # Training data for fine-tuning
│   ├── Generated_Notes/                # AI-generated structured notes
│   │   └── Module_1/
│   │       └── Module_1/
│   │           ├── Module_1_Part_1.json
│   │           ├── Module_1_Part_2.json
│   │           └── ...
│   └── note_summarizer/                # Scripts directory
│       ├── create_finetune_dataset.py  # Dataset creation for model fine-tuning
│       ├── create_vector_db.py         # Vector database builder
│       ├── extract_notes.py            # Note extraction utilities
│       ├── finetune_lora.py            # LoRA fine-tuning script
│       ├── note_processor.py           # Main note processing script
│       ├── prepare_dataset.py          # Dataset preparation
│       ├── query_rag.py                # RAG query interface
│       ├── run_poli_sci_qa.py          # Political science QA runner
│       └── splice_json.py              # JSON splicing utility
├── src/                                # Source code directory
│   ├── format_notes.py                 # Note formatting utilities
│   └── note_summarizer/                # Alternative script location
│       ├── create_finetune_dataset.py
│       ├── create_vector_db.py
│       ├── extract_notes.py
│       ├── finetune_lora.py
│       ├── note_processor.py
│       ├── prepare_dataset.py
│       ├── query_rag.py
│       ├── run_poli_sci_qa.py
│       └── splice_json.py
├── Lecture Transcripts/                # Raw input data
│   └── Organized_Notes/                # Organized raw transcripts
│       └── Module_1/
│           └── Module_1/
│               ├── Module_1_Part_1.txt
│               ├── Module_1_Part_2.txt
│               └── ...
├── final-notes/                        # Final processed outputs
│   ├── Final Q1 Citations.md
│   └── Final Q2 Citations.md
├── Note_Training/                      # Training data organization
│   └── Organized_Notes/                # Training transcripts
├── Simple_Notes_Organized/             # Simplified note organization
│   ├── Spliced_Notes.txt
│   └── Organized_Notes/                # Simplified organized notes
├── textbook_cleanup/                   # Textbook processing utilities
│   ├── extracted_text cleaned.txt
│   ├── extracted_text.txt
│   └── textbook_cleaner.py
└── vector_db/                          # Vector database storage
    ├── chroma.sqlite3                  # ChromaDB database file
    └── [collection_id]/                # Collection-specific data
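
The tree above suggests that generated notes mirror the transcript folders, with each `.txt` transcript producing a `.json` file of the same name. The following is a minimal sketch of that mapping, assuming the two root directories shown in the tree; the function names `output_path_for` and `pending_transcripts` are hypothetical and are not taken from the project's scripts.

```python
from pathlib import Path

# Directory names are taken from the project tree; treating them as fixed
# defaults is an assumption made for this sketch.
TRANSCRIPT_ROOT = Path("Lecture Transcripts/Organized_Notes")
GENERATED_ROOT = Path("note_summarizer/Generated_Notes")


def output_path_for(transcript: Path,
                    transcript_root: Path = TRANSCRIPT_ROOT,
                    generated_root: Path = GENERATED_ROOT) -> Path:
    """Mirror a transcript path under the generated-notes root, as .json."""
    relative = transcript.relative_to(transcript_root)
    return (generated_root / relative).with_suffix(".json")


def pending_transcripts(transcript_root: Path,
                        generated_root: Path) -> list[Path]:
    """List transcripts that have no generated JSON file yet."""
    return sorted(
        t for t in transcript_root.rglob("*.txt")
        if not output_path_for(t, transcript_root, generated_root).exists()
    )
```

A helper like `pending_transcripts` would let a processing run skip files that were already summarized, which matters when each file costs an LLM call.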

Installation

Clone the repository:

```
git clone https://github.com/Integer-Conversion-Error/note-splicer.git
cd note-splicer
```

Set up a virtual environment:

```
python -m venv .venv
.\.venv\Scripts\activate  # On Windows
# or
source .venv/bin/activate  # On macOS/Linux
```

Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables (optional, for API keys): Create a .env file in the project root with your API keys:
```
GROQ_API_KEY=your_groq_api_key_here
GOOGLE_API_KEY=your_google_api_key_here
```
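
Before running the scripts, it can help to confirm that the keys are actually set. The sketch below uses only the standard library: `parse_env_text` is a simplified stand-in for a real `.env` loader, and `missing_api_keys` reports which of the two key names from the example above are unset or blank. Both function names are hypothetical, and how the project itself loads the `.env` file is not stated in this README.

```python
import os

# Key names come from the .env example above; which providers actually
# need which key is an assumption of this sketch.
EXPECTED_KEYS = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_api_keys(env=None) -> list[str]:
    """Return the expected key names that are unset or blank."""
    env = os.environ if env is None else env
    return [k for k in EXPECTED_KEYS if not env.get(k, "").strip()]
```

Calling `missing_api_keys()` with no argument checks the current process environment; passing the result of `parse_env_text` checks a file's contents instead.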

Usage

Full Workflow

Process Raw Notes:

```
python src/note_summarizer/note_processor.py
```

Compile Master JSON:

```
python src/note_summarizer/splice_json.py
```

Build Vector Database:

```
python src/note_summarizer/create_vector_db.py
```

Query Your Notes:

```
python src/note_summarizer/query_rag.py "Your question here"
```
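
To make the "Compile Master JSON" step more concrete, here is a minimal sketch of what splicing could look like. It assumes the master file is simply a list of the per-file JSON objects in sorted path order; the actual layout written by `splice_json.py` is not documented here, and the function names below are hypothetical.

```python
import json
from pathlib import Path


def splice_notes(generated_root: Path) -> list[dict]:
    """Collect every per-file JSON under generated_root into one list."""
    spliced = []
    # Sorting is lexicographic, so "Part_10" sorts before "Part_2".
    for path in sorted(generated_root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            spliced.append(json.load(handle))
    return spliced


def write_master(spliced: list[dict], master_path: Path) -> None:
    """Write the combined notes to a single master JSON file."""
    with master_path.open("w", encoding="utf-8") as handle:
        json.dump(spliced, handle, indent=4, ensure_ascii=False)
```

Under these assumptions, `write_master(splice_notes(Path("note_summarizer/Generated_Notes")), Path("Spliced_Notes.json"))` would produce one file containing every processed module.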

Streamlit Web Interface

For an interactive experience:

```
streamlit run src/note_summarizer/query_rag.py
```

This launches a web interface where you can ask questions about your notes and receive AI-generated answers.

Examples

Original Raw Notes (Input)

Here's an example of raw lecture transcript text from Lecture Transcripts/Organized_Notes/Module_1/Module_1/Module_1_Part_1.txt:

Hi, everyone, and welcome to this first official course of Poll 2101, Introduction to Canadian Politics.
So, in this first model, what I want to talk about is, it's a broad theme of the Canadian constitution and the Canadian political community.
So, in many ways, what I want to see is the evolution of Canada over time from what wasn't called Canada at first, but that gradually became Canada in the 19th century.

But before we talk about some of these key moments, constitutional moments, I want to talk about a key element of when we talk about Canadian politics and the political life of Canada.
That might seem obvious, but it's important to always keep in mind, which is that geography matters a great deal in politics, especially when comparing different countries, and territory matters a great deal in Canadian politics.
This is a massive country, really large, you know, it takes the same time to go from Montreal to Vancouver, then Montreal to Paris on a plane.

And so, size matters, why?
Because it created different regional patterns of settlement, different regional interests, very different, just sort of physical geography, okay?
And it also matters because of the distribution of the population that is essentially close to the American border in the south, that also creates an important difference between people who live in the north and people who live in the south in many ways.
But I think really one of the main important differences is the economy, and we're going to get back to that frequently.
The resources are not the same in each region, they have very different economic interests, they have different economic markets, whether it's Asia, in British Columbia, or more closer to Europe, in Eastern Canada.

So that creates significantly different interests, okay?
And that's a great, really important.
And that was this notion of different interests, different territory, or different regions, sorry I should say, is very much related to the importance of resources, natural resources in the history of Canada and in Canadian politics.
And that was a key aspect, and one of the most important scholars of Canadian politics in the 20s, 30s, and 40s, Harold Dennis, who is an economic historian slash political economist, who developed this theory that is called the Staple Thesis Theory.

So what is the Staple Thesis Theory?
It is basically an approach that stipulates that Canada was economically dominated by the export of a series of staples to outside markets, whether first Great Britain, and then the United States.
So that includes fur, fish, later wheat, and today oil in many ways.
So you cannot understand the key aspect of Canadian politics without understanding the fact that the Western provinces, especially Alberta, is so dependent on oil.

Generated Structured Notes (Output)

After processing with the AI system, the raw text is transformed into structured, summarized notes in JSON format (from note_summarizer/Generated_Notes/Module_1/Module_1/Module_1_Part_1.json):

{
    "source_file": "Module_1_Part_1.txt",
    "notes": [
        {
            "id": 1,
            "note": "Geography is a critical factor in Canadian politics, shaping regional patterns of settlement, interests, and the distribution of its population."
        },
        {
            "id": 2,
            "note": "Canada's immense size contributes to distinct regional economic interests and markets, for example, Asia for British Columbia and Europe for Eastern Canada."
        },
        {
            "id": 3,
            "note": "Harold Dennis, a 20th-century economic historian and political economist, developed the Staple Thesis Theory to explain Canadian economic development."
        },
        {
            "id": 4,
            "note": "The Staple Thesis Theory posits that Canada's economy was dominated by the export of staples (like fur, fish, wheat, and oil) to external markets, initially Great Britain and later the United States."
        },
        {
            "id": 5,
            "note": "The Staple Thesis helps explain political conflicts, such as those between the West (dependent on oil) and the central power, and regional partisan dominance (Conservatives in the West, Liberals in the East)."
        },
        {
            "id": 6,
            "note": "Before European settlement, Canada was home to over 50 distinct Indigenous nations and cultures."
        },
        {
            "id": 7,
            "note": "Early European contact with Indigenous peoples was initially viewed as interactions between nations."
        },
        {
            "id": 8,
            "note": "The Royal Proclamation of 1763 is a pivotal document defining the relationship between the British/Canadian Crown and Indigenous peoples."
        },
        {
            "id": 9,
            "note": "The Royal Proclamation of 1763 forbade the purchase or settlement of land in areas designated as 'Indian Territory' without Crown approval, requiring treaties to be negotiated."
        },
        {
            "id": 10,
            "note": "Over time, as European settlement expanded, these treaties were often not respected, leading to the displacement, depopulation, and genocide of Indigenous peoples, whose numbers significantly declined in the 19th century."
        }
    ]
}
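
A structure like the one above maps naturally onto a vector database, where each note becomes one document with an ID and some metadata. The sketch below shows one way to flatten a file's notes into those three parallel lists and, optionally, store them with ChromaDB. The ID format, the metadata fields, the collection name `notes`, and both function names are assumptions for illustration; `create_vector_db.py` may do this differently.

```python
def to_records(file_notes: dict) -> tuple[list[str], list[str], list[dict]]:
    """Flatten one generated-notes object into ids, documents, metadatas."""
    source = file_notes["source_file"]
    ids, documents, metadatas = [], [], []
    for entry in file_notes["notes"]:
        # Prefix with the source file so IDs stay unique across files.
        ids.append(f"{source}:{entry['id']}")
        documents.append(entry["note"])
        metadatas.append({"source_file": source, "note_id": entry["id"]})
    return ids, documents, metadatas


def store_records(ids, documents, metadatas,
                  db_path="vector_db", collection_name="notes"):
    """Add the records to a persistent Chroma collection (needs chromadb)."""
    import chromadb  # third-party; imported lazily so to_records runs without it

    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(collection_name)
    # Uses Chroma's default embedding function unless one is configured.
    collection.add(ids=ids, documents=documents, metadatas=metadatas)
    return collection
```

Keeping `source_file` in the metadata means a query result can be traced back to the transcript it came from.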

Query Example

Using the RAG system to query the processed notes:

Query: "What is the Staple Thesis Theory?"

Response: The Staple Thesis Theory, developed by Harold Dennis, posits that Canada's economy was dominated by the export of staples (like fur, fish, wheat, and oil) to external markets, initially Great Britain and later the United States. This theory helps explain political conflicts, such as those between the West (dependent on oil) and the central power, and regional partisan dominance.
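
The exchange above follows the usual RAG pattern: retrieve the notes most relevant to the question, then pass them to the LLM as context. The sketch below illustrates that flow with a deliberately simple stand-in for retrieval, ranking notes by how many words they share with the question rather than by embedding similarity as a vector database would. The prompt wording and all function names are hypothetical, not the ones used in `query_rag.py`.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def top_notes(question: str, notes: list[str], k: int = 3) -> list[str]:
    """Rank notes by shared-word count with the question (toy retrieval)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(n)), i, n) for i, n in enumerate(notes)]
    scored = [s for s in scored if s[0] > 0]  # drop notes with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then input order
    return [n for _, _, n in scored[:k]]


def build_prompt(question: str, context_notes: list[str]) -> str:
    """Assemble the retrieved notes and the question into one prompt string."""
    context = "\n".join(f"- {n}" for n in context_notes)
    return (
        "Answer the question using only the notes below.\n\n"
        f"Notes:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` is what would be sent to the chosen model; restricting the model to the supplied notes is what keeps the answer tied to your own material.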

Technologies Used

Python: Core programming language
ChromaDB: Vector database for semantic search
Sentence Transformers: For generating text embeddings
LiteLLM: Unified interface for multiple LLMs
Streamlit: Web interface for querying
PyTorch & Transformers: For AI/ML operations
Hugging Face Libraries: Model management and fine-tuning
Pandas & NumPy: Data processing
JSON Schema: Data validation

Data Sources

The system is designed to work with personal notes, including:

Lecture transcripts (as shown in examples)
Study materials
Research documents
Textbook excerpts
Personal annotations

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

License

This project is open-source. Please check the repository for licensing information.

Contact

For questions or support, please open an issue on the GitHub repository.
