The distinction between reference types and value types are basically a performance tradeoff in the design of the language. Reference types have some overhead on construction and destruction and garbage collection, because they are created on the heap. Value types on the other hand have overhead on method calls , because the whole object is copied rather than just a pointer.
Because strings can be much larger than the size of a pointer, they are designed as reference types. Also, as Servy pointed out, the size of a value type must be known at compile time, which is not always the case for strings.
Strings aren't value types since they can be huge, and need to be stored on the heap. Value types are stored on the stack. Stack allocating strings would break all sorts of things: the stack is only 1MB, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, and memory usage would balloon, etc
That is why a string is really immutable because when you change it even if it is of the same size the compiler doesn't know that and has to allocate a new array and assign characters to the positions in the array. It makes sense if you think of strings as a way that languages protect you from having to allocate memory on the fly