33

Vector's new data() method provides both a const and a non-const version.
However, string's data() method only provides a const version.
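Here is a minimal sketch of the asymmetry. It assumes a C++11/14 compiler, and `fill_x`, `filled_vector` and `filled_string` are hypothetical names that are not part of any library. Writing through `v.data()` compiles, but the equivalent `s.data()` call does not, so pre-C++17 code usually falls back to `&s[0]`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C-style API that fills a caller-supplied buffer of n chars.
// It never touches buf when n == 0.
void fill_x(char* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) buf[i] = 'x';
}

std::vector<char> filled_vector(std::size_t n) {
    std::vector<char> v(n);
    fill_x(v.data(), v.size());  // OK: vector::data() has a non-const overload returning char*
    return v;
}

std::string filled_string(std::size_t n) {
    std::string s(n, '\0');
    // fill_x(s.data(), s.size());  // ill-formed in C++11/14: data() returns const char*
    fill_x(&s[0], s.size());        // workaround: non-const operator[] yields a char&
    return s;
}
```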

I think they changed the wording about std::string so that the chars are now required to be contiguous (like std::vector).

Was std::string::data just missed? Or is there a good reason to only allow const access to a string's underlying characters?

Note: std::vector::data has another nice feature: calling data() on an empty vector is not undefined behavior, whereas &vec.front() is undefined behavior if the vector is empty.
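To illustrate the point about empty vectors, here is a sketch that assumes a hypothetical C-style function `sum(const int*, std::size_t)` which never dereferences the pointer when the length is zero. Passing `v.data()` then needs no emptiness check:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style API taking (pointer, length). It never reads p when n == 0,
// so a null or otherwise non-dereferenceable pointer is fine for an empty range.
long sum(const int* p, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

long sum_vector(const std::vector<int>& v) {
    // Calling data() is well-defined even when v is empty (the result may or may not
    // be null). Using &v.front() here instead would be undefined for an empty v.
    return sum(v.data(), v.size());
}
```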

  • I didn't know std::vector::data returns null when the vector is empty. Why is that a nice feature? Commented Sep 22, 2011 at 17:16
  • Personally, I prefer to use 'empty' to check whether a string or vector is empty, but that is just me. Commented Sep 22, 2011 at 17:19
  • @R.MartinhoFernandes Because you can easily pass the vector's data to a function that takes a pointer and copes with null pointers itself, without checking for emptiness yourself. Not an important feature, but a nice one. Commented Sep 22, 2011 at 17:21
  • Anyway, the point is moot. std::vector::data is not spec'd to return NULL. Commented Sep 22, 2011 at 17:26
  • @Anders: f(v.empty() ? NULL : &v.front()) is quite a mouthful, though, compared to f(v.data()). Commented Sep 22, 2011 at 17:28

4 Answers

31

In C++98/03 there was good reason to not have a non-const data() due to the fact that string was often implemented as COW. A non-const data() would have required a copy to be made if the refcount was greater than 1. While possible, this was not seen as desirable in C++98/03.
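To see why a non-const data() is awkward for COW, here is a deliberately simplified toy class. `CowString` is hypothetical and not modeled on any real standard library implementation. The const accessor can hand out the shared buffer, but the non-const one has to unshare first, because the caller might write through the pointer:

```cpp
#include <memory>
#include <string>

// Toy copy-on-write string, for illustration only. It ignores the thread-safety
// problems that real COW strings ran into.
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Read-only access: copies of a CowString may keep sharing one buffer.
    const char* data() const { return buf_->c_str(); }

    // Mutable access: the caller may write through the returned pointer, so this
    // object must own its buffer exclusively. If the refcount is greater than 1,
    // copy the characters first; this is the price a non-const data() imposes.
    char* data() {
        if (buf_.use_count() > 1) buf_ = std::make_shared<std::string>(*buf_);
        return &(*buf_)[0];
    }

    bool shares_buffer_with(const CowString& other) const { return buf_ == other.buf_; }

private:
    std::shared_ptr<std::string> buf_;  // copied (and shared) by the implicit copy constructor
};
```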

In Oct. 2005 the committee voted in LWG 464, which added const and non-const data() to vector, and const and non-const at() to map. At that time, string had not been changed to outlaw COW, but by C++11 a COW string was no longer conforming. The string spec was also tightened in C++11: the characters are required to be contiguous, and there is always a terminating null exposed by operator[](size()). In C++03, the terminating null was only guaranteed by the const overload of operator[].
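A small sketch of the two C++11 guarantees mentioned above, assuming a C++11 or later compiler (the function names are hypothetical). The terminator is visible through the non-const operator[](size()), and data() and c_str() point at the same characters:

```cpp
#include <string>

// Since C++11, the non-const operator[] at index size() also returns a reference
// to a null character (which must not be modified). Not guaranteed in C++03.
bool terminator_visible(std::string& s) {  // deliberately takes a non-const string
    return s[s.size()] == '\0';
}

// Since C++11, data() and c_str() are both specified to return a pointer p such that
// p + i == &operator[](i) for every i in [0, size()], so they compare equal.
bool data_matches_c_str(const std::string& s) {
    return s.data() == s.c_str();
}
```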

So in short a non-const data() looks a lot more reasonable for a C++11 string. To the best of my knowledge, it was never proposed.

Update

charT* data() noexcept;

was added to basic_string in the C++1z working draft N4582 by David Sankel's P0272R1 at the Jacksonville meeting in Feb. 2016.
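A sketch of what P0272R1 enables, assuming a C++17 or later compiler. `to_upper_ascii` and `shout` are hypothetical helpers. The `static_assert` documents that data() on a non-const string now yields `char*`:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Requires C++17 or later: the non-const overload of data() returns charT*.
static_assert(std::is_same<decltype(std::declval<std::string&>().data()), char*>::value,
              "non-const std::string::data() should return char*");

// Hypothetical C-style helper that modifies a caller-supplied buffer in place.
void to_upper_ascii(char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] >= 'a' && p[i] <= 'z') p[i] = static_cast<char>(p[i] - 'a' + 'A');
}

std::string shout(std::string s) {
    to_upper_ascii(s.data(), s.size());  // direct char* access, valid even when s is empty
    return s;
}
```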

Nice job David!


7 Comments

So it looks like the missing non-const data method was just forgotten. How does one file a Defect Report against the standard?
@deft_code Indeed, it seems so. Another oversight may be the lack of data() in std::initializer_list, which has size() and the iterators begin/end, but not data() for template genericity with vector and string (using &(*begin()) means dereferencing a potentially invalid iterator if the container is empty).
In case anyone from the future finds this useful, I've asked about this on std-discussion and will submit a defect report (unless I'm told not to). Also, an alternative link for submitting issues is here (it's not a massive page like the other link). Depending on how this goes, I may bring up std::initializer_list too.
@Cornstalks I'd like to know how it turned out, can you share some follow-up?
In C++14 (N3937) there is no change. string still has only a const data().
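Regarding the initializer_list remark: a generic helper can still avoid &*begin(), because `std::initializer_list<E>::begin()` already returns `const E*`. This sketch (hypothetical `ptr_of`, C++11) uses data() where it exists and falls back to begin() for initializer_list:

```cpp
#include <initializer_list>
#include <vector>

// Containers with a data() member, such as vector and string. The trailing return
// type removes this overload (SFINAE) for types without data().
template <class C>
auto ptr_of(C& c) -> decltype(c.data()) { return c.data(); }

// initializer_list has no data(), but begin() is already a const E*, so no iterator
// is dereferenced and the result is safe to form even for an empty list.
template <class E>
const E* ptr_of(std::initializer_list<E> il) { return il.begin(); }
```

C++17 later added the free function `std::data` in `<iterator>`, which includes an overload for `std::initializer_list` along these lines.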
2

Historically, the string data has been exposed only as const, because a mutable pointer would prevent several common optimizations, like copy-on-write (COW). COW is now, if I'm not mistaken, far less common, because it behaves badly in multithreaded programs.

BTW, yes they are now required to be contiguous:

[string.require].5: The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
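The identity can be spelled out directly in code. This is only a sketch (hypothetical `is_contiguous`): on a conforming C++11 implementation it always returns true, so it illustrates what the wording requires rather than serving as a useful runtime test:

```cpp
#include <cstddef>
#include <string>

// Checks &*(s.begin() + n) == &*s.begin() + n for all n in [0, s.size()).
// For an empty string the loop body never runs, so s.begin() is never dereferenced.
bool is_contiguous(const std::string& s) {
    for (std::size_t n = 0; n < s.size(); ++n)
        if (&*(s.begin() + n) != &*s.begin() + n) return false;
    return true;
}
```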

Another reason might be to avoid code such as:

std::string ret;
strcpy(ret.data(), "whatthe...");  // ret is empty: there is no room for these characters

The same goes for any other function that writes into a preallocated char array.

8 Comments

How does this answer the question? Given that std::basic_string<>'s storage is now required to be contiguous, why wasn't a non-const overload of std::basic_string<>::data() added?
@rodrigo, sorry I wasn't clear. I meant, is there a good reason to only allow const access to the character data via the data() method. Thanks for the contiguous storage reference.
I don't see how a non-const data() is any worse for COW than, say, the existing possibility of getting a non-const-qualified pointer from non-const &front(). In both cases the implementation would have to perform the copy before returning the address. Contiguous or not, the problem for COW is that if the user can modify the element through a pointer, then the referand must be an element of that instance alone.
@SteveJessop AFAIK, basic_string doesn't have a front() member, so the only ways to get pointers to characters are &*iterators, &s[x], c_str() and now data(). I guess it has to do with use cases: if you use operator[], you are likely to modify the string, but if you use c_str()/data(), you are likely not to modify it. YMMV.
I like the term "copy on fright" to describe the "copy on write and non-const blah blah blah". The string gets copied on write, or if you just frighten it by taking a reference or shouting "Boo!" at it. (I first heard that term used by Andy Sawyer).
1

Although I'm not that well-versed in the standard, it might be because std::string doesn't need to store null-terminated data (though it can), and it doesn't need an explicit length field (though it can have one). So changing the underlying data, e.g. adding a '\0' in the middle, might get the string's length field out of sync with the actual char data and thus leave the object in an invalid state.

3 Comments

The spec says that data() and c_str() "shall not alter any of the values stored in the character array." I think that means they can't add a '\0', but maybe that doesn't count because the '\0' would be outside the range [0,size()).
No, you can have embedded '\0's in a std::string, and they should work just like any other character. Don't try to printf() them though.
See my answer for clarification about std::string (not enough room here)
0

@Christian Rau

From the time the original Plauger (around 1995 I think) string class was STL-ized by the committee (turned into a Sequence, templatified), std::string has always been std::vector plus string-related stuff (conversion from/to 0-terminated, concatenation, ...), plus some oddities, like COW that's actually "Copy on Write and on non-const begin()/end()/operator[]".

But ultimately a std::string is really a std::vector under another name, with a slightly different focus and intent. So:

  • just like std::vector, std::string has either a size data member or both start and end data members;
  • just like std::vector, std::string does not care about the value of its elements, embedded NUL or others.
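A short sketch of the embedded-NUL point, assuming C++11 (`Lengths` and `lengths_after_embedded_nul` are hypothetical names). Writing '\0' into the middle is an ordinary element write, so size() is unchanged, while a C function reading through c_str() stops at the first NUL:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct Lengths {
    std::size_t size;   // what std::string tracks
    std::size_t c_len;  // what a C function sees through c_str()
};

// Overwrites s[1] with '\0' (assumes s.size() >= 2) and reports both lengths.
Lengths lengths_after_embedded_nul(std::string s) {
    s[1] = '\0';                                // an ordinary element write
    return {s.size(), std::strlen(s.c_str())};  // size() unchanged; strlen stops early
}
```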

std::string is not a C string with syntax sugar, utility functions and some encapsulation, just like std::vector<T> is not T[] with syntax sugar, utility functions and some encapsulation.
