A C# WPF application for truedoc.xyz that allows users to view, manage, and work with PDF and image files.
When processing documents like receipts, TrueDoc Desktop extracts structured data using AI, which can then be translated to multiple languages or converted to speech.
- Upload and view PDF documents and images
- Side-by-side layout with document preview on the left and tools on the right
- Text extraction using Qwen-VL models (Max, Plus, Chat)
- Multi-language translation support using Qwen-Chat
- Customizable AI prompts for optimized text extraction
- Comprehensive debug logging system for troubleshooting
- Save documents to your local system
- Print documents directly from the application
- Automatic creation of processing folders to organize output files
- PDF operations: Convert to images, password protection, and digital signatures
- Image operations: Convert to PDF, increase DPI, and crop
- Drag-and-drop support for easy file loading
- Windows OS
- .NET 8.0 or higher
- Microsoft Edge WebView2 Runtime (for PDF viewing)
- DashScope API key (for Qwen-VL functionality)
- PdfiumViewer native dependencies (for PDF to image conversion)
- Clone this repository
- Open the solution in Visual Studio
- Build and run the application
- Configure your DashScope API key in the Settings menu
- Install PdfiumViewer native libraries (automatically installed via the "Install PDF Dependencies" option)
For PDF to image conversion to work properly, you need to install the native Pdfium libraries:
- Start the application
- Click on the "Install PDF Dependencies" option in the top-right corner of the application
- The application will automatically download and install the appropriate dependencies
- Download the latest version of the Pdfium binary distribution from:
  - Pdfium Binary Downloads
  - Choose the appropriate package for your system (e.g., `pdfium-windows-x64.zip` for 64-bit Windows)
- Extract the downloaded ZIP file
- Copy the following DLL files to your application's bin directory:
  - `pdfium.dll` and all related DLLs in the package
  - Also copy them to the `Libraries` subfolder in the application directory
- Alternatively, you can install PdfiumViewer via NuGet with native dependencies:
  - `Install-Package PdfiumViewer.Native.x86.v8-xfa` or
  - `Install-Package PdfiumViewer.Native.x86_64.v8-xfa`
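After a manual install, it can help to confirm that the native library landed where the steps above put it. The following is a minimal sketch, not the application's actual check: the `PdfiumCheck` class and its method names are hypothetical, and it assumes `pdfium.dll` is expected both next to the executable and in the `Libraries` subfolder.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PdfiumCheck
{
    // Returns the expected pdfium.dll locations that do not exist yet.
    // Assumption: the two locations are the app directory and its "Libraries" subfolder.
    public static string[] FindMissing(string appDirectory)
    {
        var expected = new[]
        {
            Path.Combine(appDirectory, "pdfium.dll"),
            Path.Combine(appDirectory, "Libraries", "pdfium.dll")
        };
        return expected.Where(path => !File.Exists(path)).ToArray();
    }

    // Picks the NuGet native package that matches the bitness of the running process.
    public static string SuggestedNuGetPackage() =>
        Environment.Is64BitProcess
            ? "PdfiumViewer.Native.x86_64.v8-xfa"
            : "PdfiumViewer.Native.x86.v8-xfa";
}
```

Calling `PdfiumCheck.FindMissing(AppContext.BaseDirectory)` at startup and showing the returned paths is one way to tell the user which copy step was skipped.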
- Upload documents:
  - Click on "Upload PDF" or "Upload Image" to load a document
  - Or drag and drop files directly onto the application window
- View the document in the preview panel on the left
- Use the tools on the right panel to perform actions on the document
- All processed files are automatically saved to a `<filename>_processing` folder in the same directory as the original file
- Convert to Image: Convert PDF pages to image files (JPEG or PNG) with selectable DPI
- Set Password: Add password protection to a PDF document
- Remove Password: Remove password protection from a PDF document
- Sign Document: Add a digital signature to a PDF document
- Convert to PDF: Convert an image to a PDF document
- Increase DPI: Enhance image resolution by increasing its DPI
- Crop Image: Select and crop a portion of an image
- Save As: Save a copy of the document
- Print: Print the document using the system's default print handler
- Extract Data with AI: Use Qwen-VL to extract text from images
The text extraction feature uses DashScope's Qwen-VL models to extract text from images. To use this feature:
- Open the Settings menu and enter your DashScope API key
- Configure your preferred model and prompts (see below)
- Load an image document (JPG, PNG, etc.)
- Click the "Extract Data with AI" button
- View the extracted text in the AI Text Extraction Results section
- Use the "Copy Text" button to copy text to clipboard
You can customize the AI settings to optimize text extraction for different types of documents:
- qwen-vl-max: The most powerful model with highest accuracy (recommended for complex documents)
- qwen-vl-plus: Mid-tier model with good balance of performance and speed
- qwen-vl-chat: Lighter model optimized for real-time applications
The system prompt sets the context for the AI. You can customize this to make the model focus on specific types of content:
- Default: "You are a helpful assistant that extracts text from images."
- For legal documents: "You are a legal document specialist that extracts text from legal documents with high accuracy."
- For tables: "You are an expert at extracting tabular data from images and preserving the table structure."
The user prompt provides specific instructions for each image:
- Default: "Extract all text content from this image. Just return the extracted text without any additional commentary."
- For formatting: "Extract all text from this image, preserving the original formatting, paragraphs, and bullet points."
- For selective extraction: "Extract only the phone numbers and email addresses from this image."
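To illustrate how the model choice, system prompt, and user prompt fit together, here is a minimal sketch that assembles a request body. It is not the application's actual code. It assumes the OpenAI-style chat message shape that DashScope's compatible mode is understood to accept, so verify the field names against the DashScope documentation before relying on it. The `ExtractionRequest` class is hypothetical, and `System.Text.Json` is used only to keep the sketch free of external packages (the application itself lists Newtonsoft.Json).

```csharp
using System;
using System.Text.Json;

public static class ExtractionRequest
{
    // Builds a JSON request body combining the selected model, both prompts, and the image.
    // Assumption: OpenAI-style "messages" with an inline base64 data URL for the image.
    public static string BuildPayload(
        string model, string systemPrompt, string userPrompt,
        byte[] imageBytes, string mimeType = "image/png")
    {
        string dataUrl = $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";

        var payload = new
        {
            model,
            messages = new object[]
            {
                // The system prompt sets the overall context for the model.
                new { role = "system", content = systemPrompt },
                // The user message carries the image plus the per-image instruction.
                new
                {
                    role = "user",
                    content = new object[]
                    {
                        new { type = "image_url", image_url = new { url = dataUrl } },
                        new { type = "text", text = userPrompt }
                    }
                }
            }
        };
        return JsonSerializer.Serialize(payload);
    }
}
```

Switching from general text extraction to, say, table extraction then only means passing a different `systemPrompt` and `userPrompt`; the rest of the request stays the same.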
To get a DashScope API key, visit DashScope Platform.
All processed files are automatically saved to a dedicated processing folder:
- The folder is named `<filename>_processing` and is created in the same directory as the original file
- This organization keeps your original files separate from processed versions
- Each operation (convert, crop, sign, etc.) saves its output to this folder with a descriptive prefix:
  - protected_filename.pdf
  - cropped_filename.jpg
  - signed_filename.pdf
  - etc.
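The naming scheme above can be sketched as a small path helper. This is an illustration under stated assumptions rather than the application's implementation: the `ProcessingPaths` class is hypothetical, and it assumes `<filename>` means the file name without its extension.

```csharp
using System;
using System.IO;

public static class ProcessingPaths
{
    // Returns "<directory>/<filename>_processing" for the given source file.
    // Assumption: <filename> excludes the extension.
    public static string GetProcessingFolder(string sourcePath)
    {
        string directory = Path.GetDirectoryName(Path.GetFullPath(sourcePath)) ?? ".";
        string baseName = Path.GetFileNameWithoutExtension(sourcePath);
        return Path.Combine(directory, baseName + "_processing");
    }

    // Returns "<processing folder>/<prefix>_<filename><extension>",
    // creating the processing folder if it does not exist yet.
    public static string GetOutputPath(string sourcePath, string prefix, string newExtension = null)
    {
        string folder = GetProcessingFolder(sourcePath);
        Directory.CreateDirectory(folder); // does nothing if the folder already exists

        string baseName = Path.GetFileNameWithoutExtension(sourcePath);
        string extension = newExtension ?? Path.GetExtension(sourcePath);
        return Path.Combine(folder, $"{prefix}_{baseName}{extension}");
    }
}
```

For example, on Windows `ProcessingPaths.GetOutputPath(@"C:\Docs\invoice.pdf", "signed")` yields `C:\Docs\invoice_processing\signed_invoice.pdf`. The optional `newExtension` argument covers operations that change the file type, such as converting an image to PDF.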
The application includes a comprehensive logging system that saves:
- API requests and responses
- Image processing details
- Error messages and troubleshooting information
Logs are stored in the "logs" folder in the application directory and are organized by date.
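A date-organized log layout like the one described can be sketched as follows. The `DebugLog` class, the `yyyy-MM-dd.log` file name, and the line format are assumptions made for illustration; the application's actual naming may differ.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DebugLog
{
    // Returns "<appDirectory>/logs/<yyyy-MM-dd>.log" for the given date.
    // Assumption: one log file per day, named after the date.
    public static string GetLogFilePath(string appDirectory, DateTime date)
    {
        string day = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return Path.Combine(appDirectory, "logs", day + ".log");
    }

    // Appends one timestamped line to today's log file, creating the "logs" folder if needed.
    public static void Append(string appDirectory, string category, string message)
    {
        DateTime now = DateTime.Now;
        string path = GetLogFilePath(appDirectory, now);
        Directory.CreateDirectory(Path.GetDirectoryName(path));

        string time = now.ToString("HH:mm:ss.fff", CultureInfo.InvariantCulture);
        File.AppendAllText(path, $"{time} [{category}] {message}{Environment.NewLine}");
    }
}
```

A call such as `DebugLog.Append(AppContext.BaseDirectory, "API", "Request sent")` would add a line to the current day's file, so each day's API traffic and errors end up in a separate file.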
- Microsoft.Web.WebView2 - For PDF viewing
- System.Drawing.Common - For image handling
- WindowsAPICodePack-Shell - For enhanced file dialogs
- Newtonsoft.Json - For JSON serialization/deserialization
- System.Security.Cryptography.ProtectedData - For secure API key storage
- iTextSharp - For PDF manipulation
- PdfiumViewer - For PDF to image conversion
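As an illustration of how `System.Security.Cryptography.ProtectedData` can protect a stored API key, here is a minimal sketch using the Windows Data Protection API. The `ApiKeyStore` class is hypothetical, and storing the result as a Base64 string scoped to the current user is an assumption, not a description of how the application persists its settings.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ApiKeyStore
{
    // Encrypts the key so that only the current Windows user can decrypt it,
    // and returns the ciphertext as Base64 so it can be written to a settings file.
    public static string Protect(string apiKey)
    {
        byte[] plain = Encoding.UTF8.GetBytes(apiKey);
        byte[] cipher = ProtectedData.Protect(plain, null, DataProtectionScope.CurrentUser);
        return Convert.ToBase64String(cipher);
    }

    // Reverses Protect; throws CryptographicException if the data was
    // protected by a different user or has been tampered with.
    public static string Unprotect(string stored)
    {
        byte[] cipher = Convert.FromBase64String(stored);
        byte[] plain = ProtectedData.Unprotect(cipher, null, DataProtectionScope.CurrentUser);
        return Encoding.UTF8.GetString(plain);
    }
}
```

`ProtectedData` works only on Windows, which matches the application's platform requirement. The key never needs to be written to disk in plain text: the settings file holds the output of `Protect`, and `Unprotect` is called just before an API request is made.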
This application is built with WPF (Windows Presentation Foundation) in C#, targeting .NET 8.
This project is licensed under the MIT License - see the LICENSE file for details.
