Closed
Description
For example, when parsing a PDF such as test/pdf/misc/i28_line_break_210.pdf from this repo with the following code:
const pdfParser = new PDFParser();

pdfParser.on('pdfParser_dataError', (errData: PDFParserError) => {
  // handle error
});

pdfParser.on('pdfParser_dataReady', (pdfData: PDFData) => {
  try {
    // write pdfData.Pages[0].Texts to file
  } catch (error) {
    // handle error
  }
});

pdfParser.loadPDF(`test/pdf/misc/i28_line_break_210.pdf`);
The resulting output is something like this:
[ { x: -0.25,
    y: 48.75,
    w: 3,
    clr: 0,
    sw: 0.32553125,
    A: 'left',
    R: [ { T: '%20', S: -1, TS: [ 0, 15, 0, 0 ] } ] },
  { x: -0.25,
    y: 48.75,
    w: 110.016,
    clr: 0,
    sw: 0.32553125,
    A: 'left',
    R: [ { T: 'BY%20ORDER%20OF%20THE%20', S: -1, TS: [ 0, 16, 1, 1 ] } ] },
  { x: -0.25,
    y: 48.75,
    w: 3,
    clr: 0,
    sw: 0.32553125,
    A: 'left',
    R: [ { T: '%20', S: -1, TS: [ 0, 16, 1, 1 ] } ] },
  { x: -0.25,
    y: 48.75,
    w: 140.376,
    clr: 0,
    sw: 0.32553125,
    A: 'left',
    R: [ { T: 'SECRETARY%20OF%20THE%20AIR', S: -1, TS: [ 0, 16, 1, 1 ] } ] },
  ...
Sometimes the first item has a unique x/y coordinate, but thereafter all elements have the same x/y coordinate.
This breaks any spatially aware parsing or grouping algorithm that depends on these coordinates, since distinct pieces of text can no longer be told apart by position.
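To illustrate why this matters, here is a rough sketch of the kind of grouping that relies on the coordinates. It is not code from the library: `PositionedText` and `groupIntoLines` are hypothetical names, the interface only covers the fields visible in the output above, and it assumes `T` is URI-encoded as shown there.

```typescript
// Hypothetical minimal shape of a text item (only the fields used here).
interface PositionedText {
  x: number;
  y: number;
  R: { T: string }[];
}

// Group text items into lines by their y coordinate (rounded to tolerate
// small jitter), then order the items of each line from left to right by x.
function groupIntoLines(texts: PositionedText[], precision = 2): string[][] {
  const byY: Record<string, PositionedText[]> = {};
  for (const t of texts) {
    const key = t.y.toFixed(precision);
    (byY[key] = byY[key] || []).push(t);
  }
  return Object.keys(byY)
    .sort((a, b) => parseFloat(a) - parseFloat(b)) // top of the page first
    .map((key) =>
      byY[key]
        .slice()
        .sort((a, b) => a.x - b.x)
        .map((t) => t.R.map((r) => decodeURIComponent(r.T)).join(""))
    );
}

// The four items from the output above: all share x = -0.25 and y = 48.75.
const collapsedSample: PositionedText[] = [
  { x: -0.25, y: 48.75, R: [{ T: "%20" }] },
  { x: -0.25, y: 48.75, R: [{ T: "BY%20ORDER%20OF%20THE%20" }] },
  { x: -0.25, y: 48.75, R: [{ T: "%20" }] },
  { x: -0.25, y: 48.75, R: [{ T: "SECRETARY%20OF%20THE%20AIR" }] },
];

// Every item ends up in one single "line", whatever the real layout was.
const collapsedLines = groupIntoLines(collapsedSample);
```

With identical coordinates the grouping step has nothing to work with: everything lands in the same bucket, and the left-to-right ordering within it is decided only by the original item order.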
The output was fine in version 3.2.0 and is broken in versions 3.2.1 and 3.2.2.
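One possible way to compare versions is to run the same small check against pdfData.Pages[0].Texts from each of them. The sketch below is only a heuristic of my own, not part of the library: `TextPosition` and `maxSharedPositionRatio` are hypothetical names, and what counts as "too many items at one position" is an assumption to tune per document.

```typescript
// Hypothetical minimal shape: only the coordinates matter for this check.
interface TextPosition {
  x: number;
  y: number;
}

// Return the fraction of items that sit at the single most common (x, y)
// position. A value close to 1 suggests the coordinates have collapsed.
function maxSharedPositionRatio(texts: TextPosition[]): number {
  if (texts.length === 0) return 0;
  const counts: Record<string, number> = {};
  let max = 0;
  for (const t of texts) {
    const key = t.x + "," + t.y;
    counts[key] = (counts[key] || 0) + 1;
    if (counts[key] > max) max = counts[key];
  }
  return max / texts.length;
}

// Coordinates of the four items shown in the output above.
const reportedPositions: TextPosition[] = [
  { x: -0.25, y: 48.75 },
  { x: -0.25, y: 48.75 },
  { x: -0.25, y: 48.75 },
  { x: -0.25, y: 48.75 },
];

// All four items share one position, so the ratio is 1.
const reportedRatio = maxSharedPositionRatio(reportedPositions);
```

Because the check only looks at the parsed items, it can be applied to the result of either loading method.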
The issue happens for both parseBuffer() and loadPDF().
The major refactor of version 3.2.1 added "Type3 glyph font support", but after reviewing the diff I was unable to identify where the above undesirable behavior was introduced.
pderiy