<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Loger on Medium]]></title>
        <description><![CDATA[Stories by Loger on Medium]]></description>
        <link>https://medium.com/@otranslator?source=rss-1775b3748501------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*eJaQHvwtwN9xlGjdzOknTA.jpeg</url>
            <title>Stories by Loger on Medium</title>
            <link>https://medium.com/@otranslator?source=rss-1775b3748501------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 04 Apr 2026 03:49:50 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@otranslator/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[How to safely and efficiently split and merge PDFs?]]></title>
            <link>https://medium.com/@otranslator/how-to-safely-and-efficiently-split-and-merge-pdfs-19c37456d6f7?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/19c37456d6f7</guid>
            <category><![CDATA[pdf-split-tool]]></category>
            <category><![CDATA[pdf-converter]]></category>
            <category><![CDATA[pdf]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Tue, 07 Oct 2025 05:21:26 GMT</pubDate>
            <atom:updated>2025-10-07T05:21:26.717Z</atom:updated>
            <content:encoded><![CDATA[<p>Have you ever needed to merge several PDF files into one, or extract just a few pages from a large file?</p><p>Most people handle this in one of two ways:</p><p><strong>Download specialized software:</strong> You have to spend time finding, downloading, and installing a program. Many free programs also come with intrusive ads, and some are distributed through unsafe websites that put your computer at risk.</p><p><strong>Use online websites:</strong> Processing files on a webpage seems convenient, but it carries a serious hidden risk: you must first upload your files to the website’s server. Once uploaded, your contracts, resumes, reports, and other documents containing personal information or company secrets are out of your control. You have no way of knowing how the website stores or handles them, which creates a risk of data leakage.</p><p>There is now a third option: with today’s AI tools, you can have AI generate a web application that does the job for you.</p><p>For example, in Gemini, enabling Canvas and entering a single prompt such as “Generate a PDF splitting page that runs in the local browser” is enough to get a working PDF splitting tool right away.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vtLubXcv4935z5KuHpNzTQ.png" /></figure><p>If you don’t have access to such AI tools, or don’t want to refine an AI-generated application yourself (that is, adjust its features to your own needs), you can try <a href="https://oconvertor.com">O.Convertor</a>, a collection of in-browser tools that I had AI build and tailor to my own requirements.</p><p>The main advantages of these tools are:</p><p><strong>Absolutely secure, no upload required:</strong> All processing happens in your own browser. Your files never leave your computer at any point. 
Because nothing is uploaded, no third-party server ever receives your documents.</p><p><strong>Extremely simple, no installation needed:</strong> It’s a website, so you can use it directly by clicking the link. The interface is intuitive: you can split or merge pages simply by dragging and dropping.</p><p><strong>Completely free:</strong> Clean pages with no ads. (A few article links at the bottom point to <a href="https://otranslator.com">O.Translator</a>, a document translation website I developed.)</p><p><strong>PDF Merger Tool:</strong> <a href="https://oconvertor.com/tools/merge-pdf">https://oconvertor.com/tools/merge-pdf</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sVk_p8qKXP7AFByQy9WgpQ.png" /><figcaption>PDF Merger</figcaption></figure><p><strong>PDF Splitter Tool:</strong> <a href="https://oconvertor.com/tools/split-pdf">https://oconvertor.com/tools/split-pdf</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lqDg_kK23rW_r5AwfoxB3w.png" /><figcaption>PDF Splitter</figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=19c37456d6f7" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Perfect Layouts, Perfect Translations: How to Use AI to Translate Document]]></title>
            <link>https://medium.com/@otranslator/perfect-layouts-perfect-translations-how-to-use-ai-to-translate-document-29916357139e?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/29916357139e</guid>
            <category><![CDATA[pdf-translation]]></category>
            <category><![CDATA[document-translation]]></category>
            <category><![CDATA[pdf]]></category>
            <category><![CDATA[epub]]></category>
            <category><![CDATA[ai-translation]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Mon, 10 Mar 2025 13:11:17 GMT</pubDate>
            <atom:updated>2025-03-10T13:11:17.698Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*b22E44FD7TJ9np2IgqSN1Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RxciGMlVlEZoompOsnBpTw.jpeg" /></figure><h3>Introduction</h3><p>In today’s global marketplace, accurate document translation has become essential for businesses and individuals alike. However, traditional translation methods often struggle to maintain the original document’s formatting and layout, creating extra work to restructure the translated content.</p><p>Enter <a href="https://otranslator.com/"><strong>otranslator.com</strong></a>, an AI-powered translation platform designed to solve exactly this problem. Our service specializes in translating documents while preserving their original layouts, making the translation process seamless and efficient.</p><p>Whether you need to translate business reports, academic papers, presentations, books, or any other type of document, <a href="https://otranslator.com/">otranslator.com</a> supports a wide range of formats, including PDF, DOCX, XLSX, PPTX, EPUB, and more. In this guide, we’ll walk you through the entire process of using otranslator.com to translate your documents with precision and ease.</p><h3>Step 1: Logging In to Your Account</h3><p>The first step to using otranslator.com is accessing your account. 
We offer multiple convenient login options:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RqIaKamrVEC8p-d9wcYfWA.png" /></figure><h4>Google Account Login</h4><ol><li>Click “Sign In with Google”</li><li>Select your Google account</li><li>Authorize the connection</li></ol><h4>Email Verification Login</h4><ol><li>Enter your email address</li><li>Click “Sign In with Email”</li><li>Open the login link sent to your email</li></ol><p>Pro tip: Creating an account allows you to track your translation history and manage your credits more efficiently.</p><h3>Step 2: Uploading Your Document</h3><p>Once logged in, you’re ready to upload the document you need to translate:</p><ol><li>On the home page, click the “Start Translation” button</li><li>Select the file from your computer, or drag and drop it into the designated area</li><li>Choose the original language of your document from the dropdown menu</li><li>Select the target language you want to translate into</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6ryuMT_lRRNGN-n-QEIROg.png" /></figure><p><strong>Supported file formats:</strong> PDF, DOCX, XLSX, PPTX, EPUB, TXT, HTML, SRT, AI, INDD, ODT, ODS, ODP, JPG, PNG, JSON, XML, MD, PO, XLF, CSV, TSV, TEX, and more.</p><p>Note: For optimal results, ensure your document is clearly formatted and readable. 
While our system can handle most document types, extremely complex layouts or documents with unusual fonts may require additional attention.</p><h3>Step 3: Initiating the Translation</h3><p>After uploading your document and selecting languages:</p><ol><li>Review your language selections to ensure they’re correct</li><li>Click the “Start Translation” button</li><li>The system will begin processing your document and generating a translation preview</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AT0QkHHCkffZ_5w1ok-gOw.png" /></figure><p>During this process, our AI analyzes your document’s structure, content, and formatting to prepare an accurate translation while maintaining the original layout.</p><h3>Step 4: Previewing the Translation</h3><p>Once the initial processing is complete, you’ll be presented with a translation preview:</p><ol><li>Examine the preview to assess the quality of the translation</li><li>Check how the layout has been preserved</li><li>Note the total credits required for the complete translation</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Gx9ggph6ZNWpjpVBrHIWLw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7rco8kUerQ_vzemKdoeosg.png" /></figure><p>This preview stage is crucial as it allows you to:</p><ul><li>Verify the translation meets your expectations</li><li>Confirm the formatting has been preserved correctly</li><li>Make an informed decision about proceeding with the full translation</li></ul><h3>Step 5: Paying with Credits for Full Translation</h3><p>If you’re satisfied with the preview and ready to proceed:</p><ol><li>Review the credit cost displayed for the complete translation</li><li>Confirm the credit payment when prompted</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QF9ojtsVXk9vJK-N1ZISZA.png" /></figure><p>If you need more credits:</p><ol><li>Select your preferred credit package</li><li>Complete the 
payment through our secure payment system</li><li>Return to your translation to continue the process</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JqZy6dTGt3nLCBKQvL93FA.png" /></figure><p>We offer various credit packages to suit different needs, from one-time translations to regular bulk processing.</p><h3>Step 6: Downloading Your Translated Document</h3><p>Once the translation is complete:</p><ol><li>You’ll receive a notification that your document is ready</li><li>Click the download button to save the translated document to your device</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CZyH2M0MLWkQRtmEZoeo0g.png" /></figure><p>Your translated document will keep the formatting, layout, images, tables, and other elements of the original, saving you hours of reformatting work.</p><h3>Step 7: Fine-tuning Your Translation (Optional)</h3><p>If you notice specific sections that need improvement:</p><ol><li>Click the “Post-editing” button</li><li>Use our built-in editor to modify specific sections of text</li><li>Save your changes</li><li>Generate a new version of the translated document with your edits incorporated</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*U3hkYwK07ehx5MZApXDQng.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yGIFip0FEwN83F526uvlNA.png" /></figure><p>This feature is particularly useful for:</p><ul><li>Technical terminology that may require domain-specific adjustments</li><li>Brand names or product terms that should remain untranslated</li><li>Culturally specific content that may need adaptation</li></ul><h3>Conclusion</h3><p><a href="https://otranslator.com/">otranslator.com</a> revolutionizes document translation by combining powerful AI with layout preservation technology. 
By following the steps outlined in this guide, you can transform your documents into different languages while maintaining their professional appearance and formatting.</p><p>Whether you’re a business expanding into international markets, a researcher sharing findings globally, or an individual communicating across language barriers, our platform offers an efficient solution for all your document translation needs.</p><p>Ready to try it for yourself? Visit <a href="https://otranslator.com/">otranslator.com</a> today and experience the perfect blend of translation accuracy and formatting preservation.</p><p><em>Have questions or need assistance? Contact our support team at </em><a href="https://discord.gg/tdZaYSKzqG"><em>Discord</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=29916357139e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Revolutionizing PDF Translation with AI: An In-Depth Look at O.Translator’s Innovation]]></title>
            <link>https://medium.com/@otranslator/revolutionizing-pdf-translation-with-ai-an-in-depth-look-at-o-translators-innovation-1a8317dde11f?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/1a8317dde11f</guid>
            <category><![CDATA[gemini]]></category>
            <category><![CDATA[openai]]></category>
            <category><![CDATA[pdf]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[translation]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Sun, 05 Jan 2025 07:01:52 GMT</pubDate>
            <atom:updated>2025-01-05T07:01:52.004Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jX0oICRe4-5XqT2kUq9P1g.jpeg" /></figure><p>As the digital world continues to expand, the need for efficient and accurate document translation keeps growing. PDFs (Portable Document Format files) are among the most widely used formats for sharing information because they look the same across different devices and platforms. However, translating PDFs has historically been difficult, which hinders seamless global communication.</p><p>At <a href="https://otranslator.com/">O.Translator</a>, we have been at the forefront of addressing these challenges by leveraging advanced artificial intelligence (AI) technologies. This article examines the current state of PDF translation, the limitations of traditional methods, and how AI is revolutionizing this field.</p><h3>The Intrinsic Challenges of PDF Translation</h3><p>PDFs were originally designed to preserve document formatting and ensure that files appear the same on any device. While this makes them ideal for sharing finalized documents, it complicates editing or translating their content.</p><h4><strong>Limitations of Traditional Translation Methods</strong></h4><p><strong>1. Designed for Display, Not Editing</strong>: PDFs record where each piece of text is drawn on the page rather than storing editable paragraphs, so their content is hard to modify in place. Most translation workflows therefore convert PDFs into editable formats like DOCX (Microsoft Word) before translation. 
This conversion is not seamless and often leads to:</p><ul><li><strong>Formatting Issues</strong>: The structure and layout can become disordered during conversion, resulting in misaligned text, disrupted paragraphs, and misplaced images.</li><li><strong>Floating Text on Images</strong>: Text embedded within or overlaid on images may not convert properly, leading to disjointed or missing content.</li><li><strong>Mathematical Formulas and Special Characters</strong>: Equations and symbols might not be converted accurately because of their complex formatting, causing errors in translated documents.</li></ul><p><strong>2. Inadequate Contextual Understanding in Machine Translation</strong>:</p><ul><li><strong>Fragmented Sentences</strong>: PDFs often segment text for layout purposes, breaking sentences across lines or columns. Traditional machine translation tools may treat these fragments as separate sentences, leading to incoherent translations.</li><li><strong>Lack of Contextual Awareness</strong>: Without understanding the broader context, machines can produce literal translations that miss the intended meaning, tone, or nuance of the original text.</li></ul><p>These challenges result in a labor-intensive process that requires significant manual correction to ensure the translated document retains the integrity of the original.</p><h3>The AI Revolution in PDF Translation</h3><p>Advancements in AI, particularly in large language models (LLMs), have opened new possibilities for translating PDFs more accurately and efficiently.</p><h4><strong>Enhanced Translation Capabilities with Large Language Models</strong></h4><p><strong>1. Improved Contextual Analysis</strong>:</p><ul><li><strong>Deep Learning Algorithms</strong>: LLMs are deep neural networks trained on vast amounts of text, which lets them take the surrounding context into account. 
This allows for more accurate translations that consider the nuances of language.</li><li><strong>Natural Language Processing (NLP)</strong>: Advanced NLP techniques enable the AI to interpret idiomatic expressions, cultural references, and stylistic elements, producing translations that are fluent and contextually appropriate.</li></ul><p><strong>2. Near Human-Level Translation Quality</strong>:</p><ul><li><strong>Consistency and Coherence</strong>: By considering entire paragraphs or sections rather than isolated sentences, LLMs maintain the logical flow of the text.</li><li><strong>Adaptability</strong>: The AI can adjust translations based on the subject matter, whether it’s technical, legal, literary, or colloquial, ensuring the terminology and tone are suitable for the intended audience.</li></ul><h4><strong>Analytical Advancements in PDF Structure Interpretation</strong></h4><p><strong>1. Accurate Sentence Reconstruction</strong>:</p><ul><li><strong>Text Segmentation Recognition</strong>: AI models can identify when text fragments are part of the same sentence or thought, even when separated by formatting in the PDF.</li><li><strong>Sentence Merging</strong>: By understanding the document’s structure, the AI can merge fragmented text appropriately, preserving the meaning in the translation.</li></ul><p><strong>2. 
Direct PDF Translation Without Conversion</strong>:</p><ul><li><strong>Layout Preservation</strong>: AI technologies have become better at analyzing and replicating the layout of the original PDF, maintaining the positioning of text, images, tables, and other elements in the translated document.</li><li><strong>Formula and Symbol Handling</strong>: Enhanced capabilities allow the AI to recognize mathematical formulas and special symbols directly within the PDF and keep them intact, so they are not garbled during translation.</li></ul><h4><strong>Continuous Improvement of AI Models</strong></h4><p>The field of AI is rapidly evolving, with models becoming increasingly sophisticated in handling complex tasks related to document analysis and translation.</p><ul><li><strong>Refinement Through Training</strong>: Ongoing training with diverse datasets helps the AI learn and adapt to new formats, languages, and subjects.</li><li><strong>Integration of Multimodal Data</strong>: Future developments aim to incorporate visual and contextual cues from images and graphics within PDFs to further enhance translation accuracy.</li></ul><h3>Introducing O.Translator: Bridging the Language Gap</h3><p>At <a href="https://otranslator.com/">O.Translator</a>, we have harnessed these AI advancements to develop a solution that addresses the longstanding challenges of PDF translation.</p><h4><strong>Our Approach</strong></h4><ol><li><strong>Leveraging Advanced AI Models</strong>: We utilize state-of-the-art LLMs that have been fine-tuned specifically for document translation tasks. 
This ensures high-quality translations that retain the original document’s intent and style.</li><li><strong>Direct PDF Translation</strong>: Our platform translates PDFs directly without the need for intermediate format conversions, preserving the original layout and formatting.</li><li><strong>Handling Complex Content</strong>: Whether it’s technical manuals with intricate diagrams, academic papers with mathematical equations, or marketing materials with embedded graphics, our AI is equipped to handle diverse content types accurately.</li></ol><p>Example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z1yILqShOAjjcnJ3XLaz3Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Xkg1NFGO1Chgdi09-a00hw.png" /></figure><h4><strong>Benefits to the Consumer</strong></h4><ol><li><strong>Cost-Effectiveness</strong>: By automating the translation process, we significantly reduce costs compared to traditional human translation services, making high-quality translations accessible to a wider audience.</li><li><strong>Time Efficiency</strong>: Our AI-powered platform delivers rapid turnaround times, enabling users to obtain translated documents promptly without compromising on quality.</li><li><strong>Ease of Use</strong>: With a user-friendly interface, clients can upload PDFs and receive translations seamlessly, without the need for technical expertise or manual formatting adjustments.</li></ol><h4><strong>Addressing the High Demand for Document Translation</strong></h4><p>The globalized nature of today’s economy and academia necessitates effective communication across languages. 
PDFs are prevalent in various fields, including:</p><ul><li><strong>E-books and Publications</strong>: Authors and publishers require translations that maintain the integrity of the original work, including layout, images, and stylistic elements.</li><li><strong>Business Reports and Legal Documents</strong>: Accurate translations are crucial for international collaborations, compliance, and negotiations.</li><li><strong>Academic Papers and Research</strong>: Scholars need precise translations to share findings with the global community, where accuracy in terminology and data representation is paramount.</li></ul><p>By providing a reliable and efficient translation service, <a href="https://otranslator.com/">O.Translator</a> meets the growing demand for accessible multilingual content.</p><h3>The Technical Underpinnings of Our Solution</h3><h4><strong>Advanced Natural Language Processing</strong></h4><p>Our AI models are built upon cutting-edge NLP techniques that enable:</p><ul><li><strong>Semantic Understanding</strong>: The AI comprehends the meaning behind the text, allowing for translations that capture subtle nuances.</li><li><strong>Contextual Relevance</strong>: By analyzing surrounding text, the AI ensures that translations are contextually appropriate, reducing errors commonly found in phrase-based translations.</li></ul><h4><strong>Security and Privacy Considerations</strong></h4><p>We recognize the importance of maintaining confidentiality, especially with sensitive documents.</p><ul><li><strong>Secure Data Handling</strong>: All documents are processed using encrypted connections, and we adhere to strict data protection protocols.</li><li><strong>Compliance with Regulations</strong>: Our platform is designed to comply with international data privacy regulations to ensure our clients’ information is safeguarded.</li></ul><h3>The Future of PDF Translation with AI</h3><p>The integration of AI in PDF translation is not just a technological advancement; it’s 
a paradigm shift in how we approach multilingual communication.</p><h4><strong>Anticipated Developments</strong></h4><ul><li><strong>Enhanced Multilingual Support</strong>: Continued expansion of language pairs and dialects to cater to a broader global audience.</li><li><strong>Integration with Other AI Technologies</strong>: Incorporating speech recognition and text-to-speech capabilities for accessible translations in different formats.</li><li><strong>Customization and Personalization</strong>: Allowing users to define translation styles or industry-specific terminology for tailored outputs.</li></ul><h4><strong>Collaborative Opportunities</strong></h4><ul><li><strong>Human-AI Synergy</strong>: Combining AI efficiency with human expertise for specialized translations, such as literary works or sensitive legal documents.</li><li><strong>API Integration</strong>: Providing services that integrate with other platforms and applications, enabling automated workflows and increased productivity.</li></ul><h3>Conclusion</h3><p>The challenges of PDF translation have long been a barrier to effective global communication. However, with the advent of AI and the development of sophisticated language models, we are witnessing a revolution in how documents are translated and shared across languages.</p><p>At <a href="https://otranslator.com/">O.Translator</a>, our commitment is to harness these technological advancements to provide solutions that are not only efficient and cost-effective but also maintain the highest standards of accuracy and quality. By addressing the inherent difficulties of PDF translation, we are enabling individuals and organizations to communicate more effectively in an increasingly interconnected world.</p><p>The journey towards perfecting AI-driven translation is ongoing. We continue to invest in research and development to enhance our platform’s capabilities, ensuring that we meet the evolving needs of our clients. 
Through innovation and dedication, we aim to break down language barriers and facilitate the seamless exchange of knowledge and ideas globally.</p><p><strong>About </strong><a href="https://otranslator.com/"><strong>O.Translator</strong></a><strong> ( </strong><a href="https://otranslator.com/">https://otranslator.com/</a> )</p><p><a href="https://otranslator.com/">O.Translator</a> is a leading AI-powered translation platform specializing in direct PDF translation. By leveraging advanced artificial intelligence and natural language processing technologies, we provide high-quality translations that preserve the original document’s formatting and integrity. Our mission is to make accurate and efficient translation services accessible to all, fostering better communication and collaboration worldwide.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DZfGeNQ17gY_RtipkS5vdw.png" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1a8317dde11f" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Translate arXiv Papers with AI]]></title>
            <link>https://medium.com/@otranslator/how-to-translate-arxiv-papers-with-ai-eda328fd1c6e?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/eda328fd1c6e</guid>
            <category><![CDATA[translation]]></category>
            <category><![CDATA[latex]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[arxiv]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Fri, 13 Dec 2024 05:09:15 GMT</pubDate>
            <atom:updated>2024-12-13T05:09:15.834Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h5uid7-HxxjTYHw0iwFnnw.jpeg" /></figure><h3>I. Introduction to arXiv</h3><p><a href="https://arxiv.org/">arXiv</a> is an open-access preprint platform launched in 1991 and now operated by Cornell University. It hosts academic papers across many fields, such as physics, mathematics, and computer science. arXiv gives researchers worldwide a free space to communicate and share their research findings.</p><h3>II. Format Features of arXiv Papers</h3><p>On arXiv, the primary format for distributing papers is PDF, which is convenient for reading and printing. In addition, many papers make their <strong>TeX source code</strong> available (TeX is a typesetting system widely used in academia). Having the TeX source greatly simplifies further editing and translation of a paper.</p><h3>III. Analysis of Current Methods for Translating arXiv Papers</h3><p>For the vast number of English-language papers, there are three main translation approaches, each with its own advantages and disadvantages.</p><h3>1. Online HTML Translation (Free)</h3><p>For papers that include TeX source code, arXiv provides an HTML page generated with <a href="http://dlmf.nist.gov/LaTeXML">LaTeXML</a>, a LaTeX to XML/HTML/MathML converter. On the HTML page, you can use a tool such as the <strong>Google Translate</strong> plugin to preview a translation online.</p><h3>2. 
Online PDF Translation (Free or Paid)</h3><h4>Using Word Documents as an Intermediate Format (e.g., DeepL Translation)</h4><p><strong>Method:</strong> Convert the PDF to Word format, translate and replace the text within Word, and finally regenerate the translated PDF using Word’s automatic typesetting features.</p><p><strong>Disadvantages:</strong></p><ul><li><strong>Incorrect Recognition of Complex Mathematical Formulas:</strong> During the conversion from PDF to DOCX, complex mathematical formulas may not be recognized correctly, leading to garbled text or misaligned formatting.</li><li><strong>Layout Issues:</strong> The layout of paragraphs, charts, references, and other elements may become disordered, requiring time-consuming reorganization.</li></ul><h4>Direct Translation and Replacement of Text within the Original PDF (e.g., Google Translate, O.Translator)</h4><p><strong>Method:</strong> Extract detailed information about the PDF text (words, colors, fonts, positions, orientation, etc.), then translate the text and replace the original in place.</p><p><strong>Disadvantages:</strong></p><ul><li><strong>Limited Functionality for Direct PDF Text Editing:</strong> Tools for editing long paragraphs directly inside a PDF are limited, which makes translation and proofreading less convenient.</li><li><strong>Complexity of PDF Formats:</strong> Because PDFs store text as positioned fragments rather than whole sentences, complex code is needed to reassemble complete sentences; otherwise, broken sentences degrade translation quality.</li></ul><p><strong>Note:</strong> <a href="https://otranslator.com/">O.Translator</a> has made significant improvements in PDF translation. You can click the link below to learn more:</p><p><a href="https://otranslator.com/">O.Translator: The world&#39;s smartest AI document translator</a></p><h3>3. 
Translating TeX Source Code and Recompiling</h3><p><strong>Method:</strong> Download the paper’s TeX source code, extract the text to be translated, and then embed the translated text back into the TeX source code. Recompile to generate the PDF document in the target language.</p><h4>Advantages:</h4><ul><li><strong>Complete Sentences:</strong> Text in the source code is usually stored as complete sentences, which avoids the fragmented sentences produced when extracting text from PDFs and improves translation quality.</li><li><strong>Preservation of Formulas and Formatting:</strong> Mathematical formulas remain TeX code and are left untouched, so they are not garbled during translation. After recompilation, formulas and formatting render correctly.</li><li><strong>High-Quality Typesetting:</strong> Thanks to TeX’s powerful typesetting, the translated document is professionally formatted and suitable for formal publication and presentation.</li></ul><h4>Disadvantages:</h4><ul><li><strong>High Technical Threshold:</strong> Requires familiarity with TeX, including experience with the compilation environment and process.</li><li><strong>Complex Workflow:</strong> Involves multiple steps, such as text extraction, translation, code modification, and compilation, which may be challenging for beginners.</li></ul><h4>Existing TeX Translation Solutions:</h4><ol><li><strong>Direct AI Translation of TeX Files Using Prompts:</strong> Because of AI output-length limits and the randomness of AI output, the translated files often fail to compile without manual fixes.</li><li><strong>Open-Source arXiv Translation Solutions:</strong> Because many of these projects are not actively maintained, their reliability is inconsistent.</li></ol><p><strong>O.Translator</strong> has recently launched a paid AI translation feature for arXiv papers based on the TeX translation method (direct TeX file translation is also supported).</p><ul><li><strong>Support for Numerous Target Languages:</strong> Over 50 
languages, including Simplified and Traditional Chinese, Japanese, Korean, French, German, Portuguese, and more.</li><li><strong>Superior Translation Quality:</strong> O.Translator’s paid translation, based on GPT-4o-mini, offers more consistent and contextually accurate translations than other paper translation tools.</li></ul><h4>Enhanced Bilingual Version:</h4><p>Accuracy is paramount in paper translation. Where the translation is unclear, comparing it with the original text aids comprehension. Because the translation works on the TeX source, the layout can be preserved while the original text is shown next to the translation.</p><p><strong>Example of O.Translator’s Bilingual Output:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mqTWaNGnVMQy4YQpvTNYEQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RdC4suqs3W_WFzOIVDlnbw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YHx21SDunGo2h47JsTNgPw.png" /></figure><p><strong>Pricing Details of O.Translator:</strong></p><p>O.Translator charges by the number of tokens in the text that actually needs translation.</p><blockquote><strong>1. Token Calculation in TeX Translation:</strong></blockquote><blockquote>When translating with the TeX method, only tokens in the text that actually needs translation are counted.</blockquote><blockquote>Comments, citations, standalone mathematical formulas, and TeX commands in the source are not counted.</blockquote><blockquote><strong>2. 
Compilation Errors Due to TeX File Complexity:</strong></blockquote><blockquote>Because TeX files can be complex, some papers may fail to compile after their source has been modified.</blockquote><blockquote>We continuously monitor these failures and work to fix them.</blockquote><p><strong>If you need to translate papers like these, give it a try.</strong></p><p><strong>The website also offers a free preview mode so you can assess translation quality.</strong></p><p>👉 <a href="https://otranslator.com/">Visit O.Translator</a></p><p>In scientific research, language should not be a barrier to knowledge. We hope this introduction serves as a practical reference and helps you navigate the academic world of arXiv with ease and confidence.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=eda328fd1c6e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OfficeTranslator.com is Now OTranslator.com]]></title>
            <link>https://medium.com/@otranslator/officetranslator-com-is-now-otranslator-com-a734e105dd21?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/a734e105dd21</guid>
            <category><![CDATA[pdf]]></category>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[translation-services]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Sun, 12 May 2024 14:32:38 GMT</pubDate>
            <atom:updated>2024-05-12T14:32:38.241Z</atom:updated>
            <content:encoded><![CDATA[<p>Since the launch of OfficeTranslator.com last year, we have been committed to providing high-quality, efficient online translation services that help users around the world overcome language barriers and communicate seamlessly. We know that as technology advances and user needs evolve, we must keep innovating to meet your expectations.</p><p>Today, we are excited to announce that OfficeTranslator.com is officially being renamed OTranslator.com.</p><p><strong>Why OTranslator.com?</strong></p><ul><li><strong>Wider range of services</strong>: The new name better reflects that our services are not limited to office documents. They also cover other kinds of content and scenarios, such as websites and personal files.</li><li><strong>Technological innovation</strong>: OTranslator.com will continue to adopt the most advanced translation technologies, including artificial intelligence and machine learning, to deliver more accurate and natural translations.</li><li><strong>Improved user experience</strong>: We have comprehensively upgraded the platform to make it more intuitive and convenient. The new platform will support more languages and features, helping users complete translation tasks more efficiently.</li></ul><p><strong>Our commitment</strong></p><p>Although the brand name has changed, our commitment to high-quality translation services has not. OTranslator.com will continue to put customers first, keep improving our services, and meet your translation needs.</p><p><strong>What will happen next?</strong></p><ul><li>Existing OfficeTranslator.com users do not need to take any action.
Your account information, translation history, and subscriptions will be transferred to OTranslator.com automatically.</li><li>We will complete the brand transition gradually over the next few weeks. During this period, you may see both brand names, OfficeTranslator.com and OTranslator.com.</li></ul><p>We sincerely appreciate your continued support and understanding. OTranslator.com looks forward to continuing to provide you with high-quality translation services and to embracing a more exciting future together.</p><p>If you have any questions or need assistance, please feel free to contact our customer support team.</p><p>Thank you once again for your support!</p><p>The OTranslator Team</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E3cXJgIjCFDAqVryVSOQuQ.png" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a734e105dd21" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Translate the Entire Excel File?]]></title>
            <link>https://medium.com/@otranslator/how-to-translate-the-entire-excel-file-69ba15350bb0?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/69ba15350bb0</guid>
            <category><![CDATA[translation]]></category>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[excel]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Sat, 09 Dec 2023 05:50:59 GMT</pubDate>
            <atom:updated>2024-04-18T07:54:47.649Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5QD8mq2cc3csjb38tc0Hqg.png" /></figure><h3>I. <strong>What is an Excel file?</strong></h3><p>Excel is a spreadsheet application developed and published by Microsoft.</p><p>It provides powerful data processing and analysis capabilities, allowing users to manage and calculate data, create charts and reports, and perform various data analysis tasks.</p><p>An Excel file (workbook) consists of one or more worksheets. Users enter and edit data in cells and use built-in functions and formulas for calculations.</p><p>Excel also supports sorting, filtering, and charting data, as well as connecting to and importing from external data sources.</p><p>It is widely used office software across many fields and industries.</p><h3>II. <strong>Common Ways of Translating Excel</strong></h3><h4><strong>1. Translation Using Functions</strong></h4><p>Excel’s official translation function can translate specified text when it is called in a formula. However, translating an entire file this way is cumbersome.</p><h4><strong>2. Using Tools for One-Click Translation</strong></h4><p><a href="https://medium.com/@loger.zhu/how-to-translate-documents-pdf-docx-pptx-using-chatgpt-e0f86aeed69b">Translating Documents with ChatGPT: A Comprehensive Guide</a></p><blockquote>If you want to build your own Excel translation program, the following tips on reading text from Excel files may be useful.</blockquote><h3>III. <strong>How to Read and Translate Text Used in Excel Files</strong></h3><blockquote><em>A perfect Excel translation should extract and translate all text while retaining the original formatting.</em></blockquote><h4><strong>1. 
Problem</strong></h4><p>There are many Node.js libraries for reading xlsx files, such as xlsx, exceljs, and node-xlsx.</p><p>However, apart from Microsoft Office, no program can guarantee full compatibility with every Excel feature.</p><p>This means that reading and regenerating Excel files with these libraries may lose some features.</p><h4><strong>2. Solution</strong></h4><p>The core problem is the parse-and-regenerate round trip, which can alter the document content.</p><p>If we instead locate the text inside the file and replace it in place with the translated text, the original formatting and content, such as formulas, are preserved.</p><p>First, we need to understand the structure of an .xlsx file.</p><h4><strong>3. Structure of the .xlsx File</strong></h4><p>An .xlsx file is essentially a ZIP archive containing multiple directories and files.</p><p>The main directories and files include:</p><ul><li><strong>_rels directory</strong>: Contains the package relationship files.</li><li><strong>docProps directory</strong>: Contains document property files, such as core properties and extended properties.</li><li><strong>xl directory</strong>: Contains the Excel workbook files.</li><li><strong>[Content_Types].xml</strong>: Defines the content types of the parts in the package.</li></ul><p>Under the <strong>xl</strong> directory, there is a special file called <strong>xl/sharedStrings.xml</strong>. <strong>The “t” elements in this file hold the text of the workbook’s string cells, which covers most of the text in a typical Excel file.</strong></p><h4><strong>4. 
Reading and Modifying Text in Excel</strong></h4><p>Third-party libraries required:</p><blockquote>jszip: A library for creating, reading, and manipulating ZIP files in JavaScript.</blockquote><blockquote>xmldom (@xmldom/xmldom): A lightweight XML parsing library for JavaScript that enables the creation, modification, and traversal of XML document nodes.</blockquote><p>4.1. <strong>Reading Excel File Using jszip</strong></p><pre>import fs from &#39;fs&#39;<br>import JSZip from &#39;jszip&#39;<br><br>const excelFile = &#39;excel file path&#39;<br>const fileBuffer = fs.readFileSync(excelFile)<br>const zip = await JSZip.loadAsync(fileBuffer)</pre><p>4.2. <strong>Reading Content of sharedStrings.xml Using xmldom</strong></p><pre>import { DOMParser, XMLSerializer } from &#39;@xmldom/xmldom&#39;<br><br>const parser = new DOMParser()<br>const xml = await zip.file(&#39;xl/sharedStrings.xml&#39;)?.async(&#39;string&#39;)<br>// A workbook without any text cells may have no sharedStrings.xml<br>if (!xml) throw new Error(&#39;xl/sharedStrings.xml not found&#39;)<br>const doc = parser.parseFromString(xml, &#39;application/xml&#39;)</pre><p>4.3. <strong>Accessing and Translating All Text Nodes</strong></p><pre>// Access all text nodes in document order<br>const nodes = doc.getElementsByTagName(&#39;t&#39;)<br>for (let i = 0; i &lt; nodes.length; i++) {<br>  const node = nodes[i]<br>  // translate() is a placeholder for your own ChatGPT-based translation function<br>  node.textContent = await translate(node.textContent ?? &#39;&#39;)<br>}</pre><p>4.4. <strong>Replacing the translated sharedStrings.xml in the compressed file</strong></p><pre>const serializer = new XMLSerializer()<br>const modifiedXml = serializer.serializeToString(doc)<br>zip.file(&#39;xl/sharedStrings.xml&#39;, modifiedXml, {<br>  compression: &#39;DEFLATE&#39;,<br>  compressionOptions: { level: 3 }<br>})<br><br>// Generate the translated Excel file<br>fs.writeFileSync(&#39;translated.xlsx&#39;, await zip.generateAsync({ type: &#39;nodebuffer&#39; }))</pre><p>Following these steps produces a translated file with its original formatting intact.</p><h4><strong>5. 
Optimization Suggestions</strong></h4><ul><li>Translate worksheet names, which are stored in xl/workbook.xml. Formulas that refer to a sheet by name must be updated to match the translated name.</li><li>Some strings in sharedStrings.xml act as identifiers, such as names or values that formulas reference. When translating workbooks with many formulas, avoid translating these strings, or the formulas may stop matching.</li><li>Send related strings to ChatGPT together so it can use their context, rather than translating each string in isolation. This offers significant room for improvement, but it requires careful handling.</li></ul><p>I will continue to share code for translating other file formats. If you are interested, please follow me. Thank you.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=69ba15350bb0" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Document Translation — How to Translate EPUB Files?]]></title>
            <link>https://medium.com/@otranslator/document-translation-how-to-translate-epub-files-576c7a5f1478?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/576c7a5f1478</guid>
            <category><![CDATA[typescript]]></category>
            <category><![CDATA[epub]]></category>
            <category><![CDATA[translation]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Fri, 08 Dec 2023 16:22:22 GMT</pubDate>
            <atom:updated>2024-04-18T07:56:01.890Z</atom:updated>
            <content:encoded><![CDATA[<h3>How to Translate EPUB Files?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E3cXJgIjCFDAqVryVSOQuQ.png" /></figure><h3>I. What is an EPUB File?</h3><ol><li>EPUB is an open e-book file format widely used in electronic publishing.</li><li>It is commonly used to store and present electronic publications such as e-books, magazines, and newspapers.</li><li>Its content is structured and can include text, images, tables, and more.</li></ol><h3>II. Structure of an EPUB File</h3><p>Brief structure:</p><p><strong>META-INF Folder</strong>: This folder contains the EPUB’s container metadata. Examples are the container file (container.xml, which points to the package file) and optional encryption information.</p><p><strong>OEBPS Folder</strong>: The OEBPS (Open eBook Publication Structure) folder is, by convention, the core content folder of the EPUB file and holds the book’s components.</p><ul><li><strong>content.opf file</strong>: The main metadata (package) file of the EPUB. It contains the book’s description, its reading order, and references to its text and media resources.</li><li><strong>toc.ncx file</strong>: The table of contents file used by EPUB 2 (EPUB 3 uses an XHTML navigation document instead). It defines the chapters and structure of the book and provides navigation.</li><li><strong>HTML files</strong>: The content of an EPUB is usually stored as (X)HTML files, each typically representing a chapter or section.</li><li><strong>CSS files</strong>: Style sheets that control the layout, appearance, and typography of the book.</li><li><strong>Image, audio, and video files</strong>: Embedded media that enrich the content and add interactive elements.</li><li><strong>Other files</strong>: Auxiliary files such as fonts and scripts, used to customize and enhance the
functionality and appearance of the e-book.</li></ul><p>EPUB files use the ZIP compression format and open-standard technologies such as XML, HTML, and CSS for structured content, layout, and presentation. This structure makes EPUB files easy to create, edit, distribute, and read, and provides a consistent reading experience across different e-readers and platforms.</p><h3>III. Extracting Translatable Text from EPUBs</h3><p>To translate an EPUB while keeping its layout unchanged, locate every piece of text, translate it, and write the translation back into the same position.</p><p>Third-party libraries required:</p><blockquote><em>jszip: A library for creating, reading, and manipulating ZIP files in JavaScript.</em></blockquote><blockquote><em>Cheerio: A fast, flexible, and lightweight library for parsing and manipulating HTML, with a jQuery-like API.</em></blockquote><h3>1. Reading EPUB Files Using jszip</h3><pre>import fs from &#39;fs&#39;<br>import JSZip from &#39;jszip&#39;<br><br>const epubFile = &#39;epub file path&#39;<br>const fileBuffer = fs.readFileSync(epubFile)<br>const zip = await JSZip.loadAsync(fileBuffer)</pre><h3>2. Locating All HTML Files Based on File Extension</h3><pre>for (const filePath of Object.keys(zip.files)) {<br>  if (filePath.endsWith(&#39;.html&#39;) || filePath.endsWith(&#39;.htm&#39;) || filePath.endsWith(&#39;.xhtml&#39;)) {<br>    const html = await zip.file(filePath)?.async(&#39;string&#39;)<br>    if (html) {<br>      // Extract and translate the text of this file (steps 3 and 4)<br>    }<br>  }<br>}</pre><h3>3. 
Reading Text Nodes from HTML Using cheerio and Translating</h3><pre>import * as cheerio from &#39;cheerio&#39;<br><br>const $ = cheerio.load(html)<br>const textNodes = []<br>for (const selector of [&#39;body&#39;, &#39;head&#39;]) {<br>  $(selector)<br>    .find(&#39;*&#39;)<br>    .not(&#39;script, style&#39;) // skip scripts and CSS, which must not be translated<br>    .contents()<br>    .each(function () {<br>      // nodeType === 3 is a text node<br>      if (this.nodeType === 3 &amp;&amp; this.data.trim() !== &#39;&#39;) {<br>        textNodes.push(this)<br>      }<br>    })<br>}<br>// Translate outside .each() so each call can be awaited<br>for (const node of textNodes) {<br>  // translateWithChatGPT() is a placeholder for your own ChatGPT-based translation function<br>  node.data = await translateWithChatGPT(node.data)<br>}</pre><h3>4. Replacing the Translated HTML in the Compressed File</h3><pre>zip.file(filePath, $.html({ xml: true }), {<br>  compression: &#39;DEFLATE&#39;,<br>  compressionOptions: { level: 3 }<br>})</pre><blockquote>EPUB readers validate XHTML strictly. Malformed markup can cause an “Invalid document” error.</blockquote><blockquote>The { xml: true } option serializes the document as XML, so empty elements are self-closed and every tag is properly closed.</blockquote><h3>5. Optimization Suggestions</h3><ul><li>Translate the EPUB metadata, such as the book title in content.opf, the same way. The metadata is also XML, so the same approach applies.</li><li>Send related text to ChatGPT together so it can use context, rather than translating each text node in the HTML separately. This offers significant room for improvement but requires careful handling.</li></ul><p>If you are looking for a tool that translates EPUB files using ChatGPT, take a look at:</p><p><a href="https://medium.com/@loger.zhu/how-to-translate-documents-pdf-docx-pptx-using-chatgpt-e0f86aeed69b">Translating Documents with ChatGPT: A Comprehensive Guide</a></p><p>I will continue to share code for translating other file formats. If you are interested, please follow me.
Thank you.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=576c7a5f1478" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[PDF Translation — How to Use PDF.js]]></title>
            <link>https://medium.com/@otranslator/pdf-transaction-how-to-use-pdf-js-b669f092fc1d?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/b669f092fc1d</guid>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[pdf-translation]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Thu, 07 Sep 2023 03:48:56 GMT</pubDate>
            <atom:updated>2024-04-18T07:55:23.189Z</atom:updated>
            <content:encoded><![CDATA[<h3>How to extract the text information of a PDF?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Go7wmgqcLcx--l7RyqaeKA.png" /></figure><h3>1. Common PDF Document Translation Solutions:</h3><h4>1.1 Translation Solution via Word Document (DeepL Translation)</h4><p>This solution first converts the PDF into Word format and then translates and replaces the text in Word. Finally, it relies on Word’s automatic layout to regenerate the translated PDF.</p><p><strong>Disadvantages:</strong> Because the PDF format is so flexible, converting PDF to Word cannot perfectly preserve all formatting. In addition, the automatic re-layout after translation may misalign some tables.</p><h4>1.2 Translate and replace the text of the PDF directly (Google Translate)</h4><p>This approach extracts detailed information about the PDF text (such as content, color, font, position, and direction). It then translates the text and writes the translation in place of the original.</p><p><strong>Disadvantage:</strong> When the translated text is longer than the original, the font size must be reduced to fit it into the original position.</p><blockquote>If the translated text is about the same length as the original or shorter, this approach can produce a near-perfect translated document.</blockquote><p><a href="https://otranslator.com">OTranslator.com</a> implements solution 1.2. It replaces Google’s machine translation with <strong>ChatGPT</strong> and is optimized for the many variations of the PDF format.</p><p>Examples of the translated output can be found in the previous article:</p><p><a href="https://medium.com/@loger.zhu/how-to-translate-documents-pdf-docx-pptx-using-chatgpt-e0f86aeed69b">How to translate documents (PDF/EPUB/Word/Excel/PowerPoint) using ChatGPT?</a></p><h3>2. 
How to extract text info from a PDF using PDF.js?</h3><blockquote>PDF.js is a JavaScript library for rendering and interacting with PDF documents on a web page. Although it is primarily used to display PDFs in a browser, PDF.js also provides APIs for extracting text information from PDFs.</blockquote><h4>2.1 Install</h4><pre>npm install pdfjs-dist</pre><h4>2.2 Load PDF Documents</h4><pre>// Node.js entry point in pdfjs-dist v4; older versions use &#39;pdfjs-dist/legacy/build/pdf.js&#39;<br>import { getDocument, OPS } from &#39;pdfjs-dist/legacy/build/pdf.mjs&#39;<br><br>const pdfFile = &quot;Your PDF file path&quot;<br>const pdf = await getDocument({<br>  url: pdfFile,<br>  cMapUrl: &#39;./node_modules/pdfjs-dist/cmaps/&#39;,<br>  cMapPacked: true<br>}).promise</pre><h4>2.3 Extract Text info</h4><p>To read the text on the first page, first get a reference to that page with getPage().</p><pre>const pageNumber = 1<br>const page = await pdf.getPage(pageNumber)</pre><p>Then get the page’s text items from getTextContent().items.</p><pre>const textContent = await page.getTextContent()<br>console.log(textContent.items)</pre><p>Each item describes a small run of text on the page, for example:</p><pre>{<br> &quot;str&quot;: &quot;Rearranging to avoid summing the infinite tail of the distribution...&quot;,<br> &quot;dir&quot;: &quot;ltr&quot;,<br> &quot;width&quot;: 269.9515540245999,<br> &quot;height&quot;: 10.1,<br> &quot;transform&quot;: [10.1, 0, 0, 10.1, 108.1, 375.5],<br> &quot;fontName&quot;: &quot;g_d0_f2&quot;,<br> &quot;hasEOL&quot;: false<br>}</pre><p>Field Description</p><ul><li><strong>str</strong>: Text content</li><li><strong>dir</strong>: Text direction: ttb (top to bottom), ltr (left to right), rtl (right to left)</li><li><strong>width</strong>: Text width</li><li><strong>height</strong>: Text height</li><li><strong>transform</strong>: The transformation matrix of the text item.
The last two numbers (108.1, 375.5) are the X and Y coordinates of the text’s origin in PDF user space, where Y increases upward.</li><li><strong>fontName</strong>: Internal ID of the font within the PDF, which must be resolved to obtain the actual font details.</li><li><strong>hasEOL</strong>: Whether the text item is followed by a line break.</li></ul><h3>3. Calculation of Text Information</h3><p>Each TextContent item above already gives us the text and its coordinates.</p><p>Once translation is complete, these two pieces of information alone are enough to write the translated text into the translated document. However, the result can be improved.</p><h4>3.1 Get the font information of the text and use the original font to output the translation</h4><pre>// commonObjs only holds fonts that have already been loaded,<br>// e.g. while building the operator list or rendering the page<br>await page.getOperatorList()<br>const fontFace = page.commonObjs.get(item.fontName)</pre><p>fontFace contains font information such as the font’s name, type, and binary data. The font information serves three main purposes:</p><ul><li>Find and download the full font by name. To reduce file size, PDFs often embed only the subset of a font’s characters that the document actually uses, so the embedded font may lack characters the translation needs.</li><li>Determine whether the translated characters can be printed with the original font. Each font supports its own set of characters. For example, an English font usually has no Chinese characters, and printing Chinese text with it would display square boxes in the PDF.</li><li>Calculate the length of the translated text.
When the translated text is too long, the font size must be reduced in proportion to its length.</li></ul><h4>3.2 Angle of Text Printing</h4><p>Not all text is horizontal. Some text, such as a watermark, may be printed at a 45-degree angle.</p><p>The actual angle can be calculated from the transform field.</p><pre>// Rotation angle in radians (multiply by 180 / Math.PI for degrees)<br>const angle = Math.atan2(item.transform[1], item.transform[0])</pre><h4>3.3 Get the text color and output the translation using the original font color</h4><p>Calculating text color is a complex problem, and little information about it is available online.</p><p>First, you need to understand PDF operators. Put simply, a PDF page is the result of executing a sequence of drawing operations. For example:</p><pre>setTextMatrix [1, 0, 0, 1, 100, 300] // Set the text matrix; the last two values are the text origin (100, 300)<br>setFillRGBColor [255, 255, 0]        // Set the fill color to yellow, RGB(255, 255, 0)<br>moveText [125, 130]                  // Move the text position by (125, 130)<br>showText [glyph1, glyph2, ...]       // Draw the glyphs at the current position</pre><p>As this shows, text is drawn in whatever fill color is in effect at that moment, and at the current text position. Therefore, computing an item’s color takes two steps:</p><ol><li>Locate the item in the operator list.</li><li>Replay the color-related operators to determine the fill color in effect when the item is drawn.</li></ol><p>First, get a page’s operator list and print it out to study it.
(In my opinion, learning with a specific goal in mind is the most effective way to study.)</p><pre>const operatorList = await page.getOperatorList()<br><br>for (let fnIndex = 0; fnIndex &lt; operatorList.fnArray.length; fnIndex++) {<br>    const fn = operatorList.fnArray[fnIndex]<br>    const args = operatorList.argsArray[fnIndex]<br>    // Print each operator name and its arguments for study<br>    console.log(Object.keys(OPS).find(key =&gt; OPS[key] === fn), args)<br>}</pre><p>Then, replay the operators that change the text position and color to find where the item is drawn and which fill color is in effect at that moment.</p><p>The details are explained in the code comments, so they are not repeated here.</p><pre>// State tracked while replaying the operator list<br>interface PDFStatus {<br>  currentColor?: unknown[] // arguments of the last setFillRGBColor<br>  currentMatrix?: number[]<br>  currentXY?: number[]<br>  leading?: number<br>  font?: [string, number] // [font name, font size]<br>}<br><br>export function getItemColor(item: TextItem, operatorList: PDFOperatorList) {<br>  // Stack for saved states<br>  const stack: PDFStatus[] = []<br>  // Current state<br>  let currentStatus: PDFStatus = {}<br>  <br>  // Analyze the page operators in order<br>  for (let fnIndex = 0; fnIndex &lt; operatorList.fnArray.length; fnIndex++) {<br>    const fn = operatorList.fnArray[fnIndex]<br>    const args = operatorList.argsArray[fnIndex]<br>    switch (fn) {<br>      // Push currentStatus onto the stack<br>      case OPS.save:<br>        stack.push(currentStatus)<br>        currentStatus = { ...currentStatus }<br>        break<br>      // Restore currentStatus from the stack<br>      case OPS.restore:<br>        currentStatus = stack.pop() ?? 
{}<br>        break<br>      // Set the text fill color<br>      case OPS.setFillRGBColor:<br>        currentStatus.currentColor = [args[0], args[1], args[2]]<br>        break<br>      // Set the text matrix; args[4] and args[5] are the x and y of the text origin<br>      case OPS.setTextMatrix:<br>        currentStatus.currentMatrix = [args[4], args[5]]<br>        currentStatus.currentXY = [args[4], args[5]]<br>        break<br>      // Set the leading (line spacing)<br>      case OPS.setLeading:<br>        currentStatus.leading = args[0]<br>        break<br>      // Set the font and font size<br>      case OPS.setFont:<br>        currentStatus.font = [args[0], args[1]]<br>        break<br>      // Line break: move the current position down by the leading, to the start of the next line<br>      case OPS.nextLine:<br>      case OPS.nextLineShowText:<br>      case OPS.nextLineSetSpacingShowText:<br>        if (currentStatus.leading &amp;&amp; currentStatus.currentXY) {<br>          currentStatus.currentXY = [currentStatus.currentXY[0], currentStatus.currentXY[1] - currentStatus.leading]<br>        }<br>        break<br>      // Move the text position by the given offset<br>      case OPS.moveText:<br>        if (currentStatus.currentXY) {<br>          currentStatus.currentXY = [currentStatus.currentXY[0] + args[0], currentStatus.currentXY[1] + args[1]]<br>        }<br>        break<br>      // Draw text<br>      case OPS.showText:<br>        if (currentStatus.currentXY) {<br>          let x = currentStatus.currentXY[0]<br>          const y = currentStatus.currentXY[1]<br>          // Check whether the current position matches the item position (tolerance: 1/5 of the text height)<br>          const isMatch = () =&gt;<br>            Math.abs(x - item.transform[4]) &lt; item.height / 5 &amp;&amp; Math.abs(y - item.transform[5]) &lt; item.height / 5<br>          if (isMatch()) {<br>            return currentStatus.currentColor<br>          }<br>          if (args[0]) {<br>            // Compute the position of each drawn glyph and compare it with the item coordinates<br>          
  for (const charInfo of args[0]) {<br>              if (typeof charInfo?.width === &#39;number&#39; &amp;&amp; currentStatus.font) {<br>                // Glyph object: check for a match, then advance x by the glyph width (in 1/1000 units) times the font size<br>                if (isMatch()) {<br>                  return currentStatus.currentColor<br>                }<br>                x += (charInfo.width / 1000) * currentStatus.font[1]<br>              } else if (typeof charInfo === &#39;number&#39; &amp;&amp; currentStatus.font) {<br>                // Number: a TJ spacing adjustment, which is subtracted from x<br>                if (isMatch()) {<br>                  return currentStatus.currentColor<br>                }<br>                x -= (charInfo / 1000) * currentStatus.font[1]<br>              }<br>            }<br>          }<br>        }<br>        break<br>    }<br>  }<br>  // No matching text position was found<br>  return undefined<br>}</pre><p>I will continue to share code for translating other file formats. If you are interested, please follow me. Thank you.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b669f092fc1d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Translating Documents with ChatGPT: A Comprehensive Guide]]></title>
            <link>https://medium.com/@otranslator/how-to-translate-documents-pdf-docx-pptx-using-chatgpt-e0f86aeed69b?source=rss-1775b3748501------2</link>
            <guid isPermaLink="false">https://medium.com/p/e0f86aeed69b</guid>
            <category><![CDATA[document-translation]]></category>
            <category><![CDATA[translation]]></category>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[pdf-translation]]></category>
            <dc:creator><![CDATA[Loger]]></dc:creator>
            <pubDate>Wed, 30 Aug 2023 16:09:58 GMT</pubDate>
            <atom:updated>2024-04-18T07:53:10.112Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lZ-l6YYe2d2hgoNpkaxheg.png" /></figure><h3>1. The Advantages of ChatGPT Translation</h3><ul><li><strong>Enhanced Understanding</strong>: Unlike many other machine translation tools, ChatGPT excels at accurately translating long, complex sentences because it takes the entire paragraph into account.</li><li><strong>Consistency in Style and Tone</strong>: ChatGPT preserves the original tone and style. It captures emotional nuances and conveys them accurately, whereas other tools often produce standardized, neutral translations.</li><li><strong>Improved Expression</strong>: Using natural language generation (NLG), ChatGPT produces smoother, more precise translations that approach human quality. It keeps sentence structure correct and preserves the meaning of the original text.</li><li><strong>Multi-Language Support</strong>: ChatGPT can translate between many languages, which makes it versatile and broadly useful.</li></ul><h3>2. Document Translation with ChatGPT</h3><p>Although ChatGPT excels at generating natural language text, it does not translate document files directly. It can still be used for document translation by feeding it the document’s content piece by piece and collecting the translated results.</p><h3>3. Drawbacks of ChatGPT Translation</h3><p>Because ChatGPT is a large, complex model, it needs more computational resources and time to process input text. As a result, it translates more slowly than other methods.</p><h3>4. 
Translate Documents on OTranslator.com</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d88IIC1ncgbH-kEQi06F1Q.png" /></figure><p><a href="https://otranslator.com">OTranslator.com</a> can translate a wide range of file formats.</p><p><strong>Supported Formats:</strong></p><ul><li>Office documents (<strong>Word, Excel, PPT</strong>) and ebooks (<strong>EPUB</strong>): the original formatting is fully retained.</li><li>Game text exported by MTool as <strong>ManualTransFile.json</strong>: specifically optimized for Japanese games.</li><li>Game text exported by XUnity.AutoTranslator as TXT files: also specifically optimized for Japanese games.</li><li><strong>PDF</strong>: most of the formatting is preserved.</li><li>Subtitles (<strong>SRT</strong>): bilingual output is supported.</li><li><strong>PO</strong> files</li><li><strong>TXT</strong> files</li><li><strong>HTML</strong> files</li><li><strong>XML</strong> files</li><li><strong>XLF</strong> (XML Localization Interchange File Format): a standard format for exchanging localization data, published by the OASIS standards consortium.</li><li><strong>JSON</strong> files: common JSON string values are translated.</li><li><strong>ZIP</strong> files: for batch translation; the archive may contain any of the formats above except PDF.</li></ul><p>Formatting is preserved as described above, and files containing fewer than 1,500 words can be translated for free.</p><p><strong>Supported Languages:</strong></p><blockquote>Simplified Chinese, Traditional Chinese, English, Spanish, French, Hindi, Bengali, Portuguese, Russian, Japanese, German, Italian, Korean, Turkish, Dutch, Polish, Ukrainian, Romanian, Vietnamese, Indonesian, Thai, Czech, Hungarian, Slovak, Bulgarian, Serbian, Croatian, Slovenian, Icelandic, Finnish, Swedish, Danish, Norwegian, Albanian, Armenian, Azerbaijani, Belarusian, Catalan, Estonian, Filipino, Greek, Gujarati, Haitian Creole, Irish, Latvian, Lithuanian, Malay, Marathi, Maltese, Punjabi, Sinhala, Tamil, Lao</blockquote><p><strong>How to Translate a File</strong></p><ol><li>Go to <a href="https://otranslator.com">https://otranslator.com/</a>.</li><li>Log in with your Google account.</li><li>Upload your document.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GLiILuJhG2kzXOzgHn_2vg.png" /></figure><p>4. Check the translation result.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w2JG5zmwrhS2SGdM3GQYJQ.png" /></figure><p>More examples:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IMkkJm41G0jxCCGE2hAIxg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9htsYAam2l_bxYo8lijd0Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_bm0bXMnRSMx8f5uaNEvEA.png" /></figure><p>Don’t miss the opportunity to explore a world of seamless communication across languages.</p><p>Give it a try today and unleash the power of <a href="https://otranslator.com">OTranslator.com</a>!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e0f86aeed69b" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>