chars/bytes confusion in the error emitter #44080

`src/librustc_errors/snippet.rs` has a [big comment](https://github.com/rust-lang/rust/blob/32b50e280faf56f21cbd82d1cf82cb4795535143/src/librustc_errors/snippet.rs#L123-L124) saying that the column info is provided in characters, not in bytes. However, the error emitter doesn't care about that at all and uses these columns like byte offsets all over the place. This leads to bugs like #44023 and #44078.

As an example, look at how span printing changes depending on the characters used:

Correct case:

```
12 |       "B   "";
   |  ___________^
```

Now add an emoji character:

```
12 |       "😊   "";
   |  ___________^
```

Note how it's off by one char now. This can stack up:

```
12 |       "😊😊😊😊   "";
   |  ______________^
```

If I didn't use any spaces at all, I'd run into #44078.
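The drift in the examples above can be reproduced outside the compiler. The following is a minimal standalone sketch, not emitter code: `columns_of_last_quote` is a hypothetical helper that computes both the byte offset and the char offset of the last `"` in a line, so the two units can be compared directly.

```rust
/// Hypothetical helper for illustration only; it is not part of rustc.
/// Returns `(byte_col, char_col)` for the last `"` in `line`.
pub fn columns_of_last_quote(line: &str) -> (usize, usize) {
    // `rfind` reports a byte offset into the UTF-8 encoded string.
    let byte_col = line.rfind('"').expect("line should contain a quote");
    // The char offset is the number of chars that precede that byte offset.
    let char_col = line[..byte_col].chars().count();
    (byte_col, char_col)
}
```

For the ASCII line `"B   "";` both columns agree. For `"😊   "";` the byte column is three larger than the char column, because the emoji is one char but four bytes in UTF-8. Each additional emoji widens the gap by another three.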

Now this can be fixed by going through the emitter code and looking for all places where the position is used as a byte position. A more robust fix is to stop trusting that people read comments and to encode the unit in the type system instead. There is already a mechanism for that inside the compiler: `libsyntax_pos::CharPos`! Just convert the types of the `start_col` and `end_col` members of the `MultilineAnnotation` and `Annotation` structs to `CharPos`, or maybe to `BytePos` if that's preferred.
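A rough sketch of what the typed version could look like. The `BytePos` and `CharPos` definitions here are simplified stand-ins written for this example (the real ones live in `libsyntax_pos` and may differ in detail), `Annotation` is reduced to its two column fields, and `byte_to_char_pos` and `underline_len` are hypothetical helpers, not existing compiler functions.

```rust
// Simplified stand-ins for the compiler's position newtypes (assumption:
// the real definitions in libsyntax_pos may use different inner types).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct BytePos(pub usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct CharPos(pub usize);

// Reduced version of the annotation struct: only the column fields,
// now carrying their unit in the type.
#[derive(Debug)]
pub struct Annotation {
    pub start_col: CharPos,
    pub end_col: CharPos,
}

/// Hypothetical helper: convert a byte offset within `line` to a char offset.
/// Returns `None` if the offset is out of range or not on a char boundary.
pub fn byte_to_char_pos(line: &str, pos: BytePos) -> Option<CharPos> {
    line.get(..pos.0).map(|prefix| CharPos(prefix.chars().count()))
}

/// Hypothetical helper: number of chars covered by the annotation.
pub fn underline_len(ann: &Annotation) -> usize {
    ann.end_col.0.saturating_sub(ann.start_col.0)
}
```

With this shape, assigning a `BytePos` to `start_col` is a compile error rather than a silent off-by-N, and every conversion between the two units has to go through an explicit call that can be audited.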
