0%

Java系统化学习之String

String介绍

API描述

java.lang.String的源码中明确说明,String类表示字符串,Java中所有的字符串传字面值(如:”abc”)都是该类的实例;并且字符串是常量,一旦创建之后则不可以改变,String buffers支持可变的字符串。因为String对象是不可变的,但是可以共享。

1
2
3
4
5
6
7
8
9
10
11
12
 * The {@code String} class represents character strings. All
* string literals in Java programs, such as {@code "abc"}, are
* implemented as instances of this class.
* <p>
* Strings are constant; their values cannot be changed after they
* are created. String buffers support mutable strings.
* Because String objects are immutable they can be shared.

public final class String
implements java.io.Serializable,
Comparable<String>, CharSequence {
}

通过API定义总结以下信息:

  1. 使用final修饰,则不能被继承(该类中的成员方法默认都是final)
  2. 实现接口Serializable,说明可以进行序列化
  3. 实现接口Comparable,说明可以大小比较
  4. 实现接口CharSequence,说明String本身就是char类型的数组,而且通过成员变量定义可以进行佐证
  5. 通过char字符数组实现

如何创建字符串

直接赋值

直接赋值方式创建对象是在方法区的常量池。
String str = "abc";

使用new

通过构造方法创建字符串对象是在堆内存.
String str = new String("abc");

字符串拼接

String str = "abc" + "bdc";

分析

参考代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
package com.sunld.string;

public class StringTest {

public static void main(String[] args) {
String str1 = "sunld";
String str2 = new String("sunld");
//引用传递,str3直接指向st2的堆内存地址
String str3 = str2;
String str4 = "sunld";
/**
* ==:
* 1. 基本数据类型:比较的是基本数据类型的值是否相同
* 2. 引用数据类型:比较的是引用数据类型的地址值是否相同
* 3. 所以在这里的话:String类对象==比较,比较的是地址,而不是内容
*/
System.out.println(str1==str2);//false
System.out.println(str1==str3);//false
System.out.println(str3==str2);//true
System.out.println(str1==str4);//true
}

}

内存分配

特殊说明

  1. 在使用new创建对象时,首先会在堆中的对象区域创建一个区域分配空间和完成初始化
  2. 在使用字符串时,首先会判断字符串常量池中是否存在,否则则创建,然后堆对象会指向该字符串
  3. 直接赋值:只开辟一块堆内存空间,并且会自动入池,不会产生垃圾。
  4. 构造方法:会开辟两块堆内存空间,其中一块堆内存会变成垃圾被系统回收,而且不能够自动入池,需要通过public String intern();方法进行手工入池。
  5. String类对象一旦声明则不可以改变;而改变的只是地址,原来的字符串还是存在的,并且产生垃圾
  6. 当使用字符串拼接生成字符串时,如果在编译器就可以确定字符串,则使用==判断时则返回true
1
2
3
4
5
6
7
8
public class TestString {
public static void main(String args[]){
//对匿名对象"hello"进行手工入池操作
String str =new String("sunld").intern();
String str1="sunld";
System.out.println(str==str1);//true
}
}

源码分析

成员变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/**
* String 使用final 修饰的字符数组进行存储,
* 并且字符串的长度和是否为空都是通过该数组的长度判断
*/
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;

/**
* Class String is special cased within the Serialization Stream Protocol.
*
* A String instance is written into an ObjectOutputStream according to
* <a href="{@docRoot}/../platform/serialization/spec/output.html">
* Object Serialization Specification, Section 6.2, "Stream Elements"</a>
*/
private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0];

字符串长度

String内部是以char数组的形式存储,数组的长度是int类型,那么String允许的最大长度就是Integer.MAX_VALUE了。又由于java中的字符是以16位存储的,因此大概需要4GB的内存才能存储最大长度的字符串。不过这仅仅是对字符串变量而言,如果是字符串字面量(string literals),如“abc”、”1a2b”之类写在代码中的字符串literals,那么允许的最大长度取决于字符串在常量池中的存储大小,也就是字符串在class格式文件中的存储格式:

1
2
3
4
5
CONSTANT_Utf8_info {
        u1 tag;
        u2 length;
        u1 bytes[length];
}

u2是无符号的16位整数,因此理论上允许的string literal的最大长度是2^16-1=65535。然而实际测试表明,允许的最大长度仅为65534,超过就编译错误了,有兴趣可以写段代码试试,估计是length还不能为0。

构造函数

Constructor Description
String() 初始化新创建的 String对象,使其表示空字符序列。
String(byte[] bytes) 通过使用平台的默认字符集解码指定的字节数组来构造新的 String 。
String(byte[] bytes, Charset charset) 构造一个新的String由指定用指定的字节的数组解码charset 。
String(byte[] bytes, int offset, int length) 通过使用平台的默认字符集解码指定的字节子阵列来构造新的 String 。
String(byte[] bytes, int offset, int length, Charset charset) 构造一个新的String通过使用指定的指定字节子阵列解码charset 。
String(byte[] bytes, int offset, int length, String charsetName) 构造一个新的 String通过使用指定的字符集解码指定的字节子阵列。
String(byte[] bytes, String charsetName) 构造一个新的String由指定用指定的字节的数组解码charset 。
String(char[] value) 分配一个新的 String ,以便它表示当前包含在字符数组参数中的字符序列。
String(char[] value, int offset, int count) 分配一个新的 String ,其中包含字符数组参数的子阵列中的字符。
String(int[] codePoints, int offset, int count) 分配一个新的 String ,其中包含 Unicode code point数组参数的子阵列中的 字符 。
String(String original) 初始化新创建的String对象,使其表示与参数相同的字符序列
换句话说,新创建的字符串是参数字符串的副本。
String(StringBuffer buffer) 分配一个新的字符串,其中包含当前包含在字符串缓冲区参数中的字符序列。
String(StringBuilder builder) 分配一个新的字符串,其中包含当前包含在字符串构建器参数中的字符序列。

判断功能

equals

比较字符串内容是否相同。重写Object类中的equals方法。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/**
* Compares this string to the specified object. The result is {@code
* true} if and only if the argument is not {@code null} and is a {@code
* String} object that represents the same sequence of characters as this
* object.
*
* @param anObject
* The object to compare this {@code String} against
*
* @return {@code true} if the given object represents a {@code String}
* equivalent to this string, {@code false} otherwise
*
* @see #compareTo(String)
* @see #equalsIgnoreCase(String)
* 判断对象存储内容是否相同
*/
public boolean equals(Object anObject) {
if (this == anObject) {// 首先判断是否属于同一对象
return true;
}
if (anObject instanceof String) {
// 类型String----长度---数组中的字符
String anotherString = (String)anObject;
int n = value.length;
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {// 循环判断不等
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}

equalsIgnoreCase

比较字符串的内容是否相同,忽略大小写。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
* Compares this {@code String} to another {@code String}, ignoring case
* considerations. Two strings are considered equal ignoring case if they
* are of the same length and corresponding characters in the two strings
* are equal ignoring case.
*
* <p> Two characters {@code c1} and {@code c2} are considered the same
* ignoring case if at least one of the following is true:
* <ul>
* <li> The two characters are the same (as compared by the
* {@code ==} operator)
* <li> Applying the method {@link
* java.lang.Character#toUpperCase(char)} to each character
* produces the same result
* <li> Applying the method {@link
* java.lang.Character#toLowerCase(char)} to each character
* produces the same result
* </ul>
*
* @param anotherString
* The {@code String} to compare this {@code String} against
*
* @return {@code true} if the argument is not {@code null} and it
* represents an equivalent {@code String} ignoring case; {@code
* false} otherwise
*
* @see #equals(Object)
*/
public boolean equalsIgnoreCase(String anotherString) {
return (this == anotherString) ? true
: (anotherString != null)
&& (anotherString.value.length == value.length)
&& regionMatches(true, 0, anotherString, 0, value.length);
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
/**
* Tests if two string regions are equal.
* <p>
* A substring of this {@code String} object is compared to a substring
* of the argument {@code other}. The result is {@code true} if these
* substrings represent character sequences that are the same, ignoring
* case if and only if {@code ignoreCase} is true. The substring of
* this {@code String} object to be compared begins at index
* {@code toffset} and has length {@code len}. The substring of
* {@code other} to be compared begins at index {@code ooffset} and
* has length {@code len}. The result is {@code false} if and only if
* at least one of the following is true:
* <ul><li>{@code toffset} is negative.
* <li>{@code ooffset} is negative.
* <li>{@code toffset+len} is greater than the length of this
* {@code String} object.
* <li>{@code ooffset+len} is greater than the length of the other
* argument.
* <li>{@code ignoreCase} is {@code false} and there is some nonnegative
* integer <i>k</i> less than {@code len} such that:
* <blockquote><pre>
* this.charAt(toffset+k) != other.charAt(ooffset+k)
* </pre></blockquote>
* <li>{@code ignoreCase} is {@code true} and there is some nonnegative
* integer <i>k</i> less than {@code len} such that:
* <blockquote><pre>
* Character.toLowerCase(this.charAt(toffset+k)) !=
Character.toLowerCase(other.charAt(ooffset+k))
* </pre></blockquote>
* and:
* <blockquote><pre>
* Character.toUpperCase(this.charAt(toffset+k)) !=
* Character.toUpperCase(other.charAt(ooffset+k))
* </pre></blockquote>
* </ul>
*
* @param ignoreCase if {@code true}, ignore case when comparing
* characters.
* @param toffset the starting offset of the subregion in this
* string.
* @param other the string argument.
* @param ooffset the starting offset of the subregion in the string
* argument.
* @param len the number of characters to compare.
* @return {@code true} if the specified subregion of this string
* matches the specified subregion of the string argument;
* {@code false} otherwise. Whether the matching is exact
* or case insensitive depends on the {@code ignoreCase}
* argument.
*/
public boolean regionMatches(boolean ignoreCase, int toffset,
String other, int ooffset, int len) {
char ta[] = value;
int to = toffset;
char pa[] = other.value;
int po = ooffset;
// Note: toffset, ooffset, or len might be near -1>>>1.
if ((ooffset < 0) || (toffset < 0)
|| (toffset > (long)value.length - len)
|| (ooffset > (long)other.value.length - len)) {
return false;
}
while (len-- > 0) {
char c1 = ta[to++];
char c2 = pa[po++];
if (c1 == c2) {
continue;
}
if (ignoreCase) {
// If characters don't match but case may be ignored,
// try converting both characters to uppercase.
// If the results match, then the comparison scan should
// continue.
char u1 = Character.toUpperCase(c1);
char u2 = Character.toUpperCase(c2);
if (u1 == u2) {
continue;
}
// Unfortunately, conversion to uppercase does not work properly
// for the Georgian alphabet, which has strange rules about case
// conversion. So we need to make one last check before
// exiting.
if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
continue;
}
}
return false;
}
return true;
}

startsWith

判断字符串对象是否以指定的str开头。源码中表现为按照长度截断之后进行循环比对

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
* Tests if the substring of this string beginning at the
* specified index starts with the specified prefix.
*
* @param prefix the prefix.
* @param toffset where to begin looking in this string.
* @return {@code true} if the character sequence represented by the
* argument is a prefix of the substring of this object starting
* at index {@code toffset}; {@code false} otherwise.
* The result is {@code false} if {@code toffset} is
* negative or greater than the length of this
* {@code String} object; otherwise the result is the same
* as the result of the expression
* <pre>
* this.substring(toffset).startsWith(prefix)
* </pre>
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = toffset;
char pa[] = prefix.value;
int po = 0;
int pc = prefix.value.length;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}

endsWith

判断字符串对象是否以指定的str结尾,源码中转换成startsWith处理。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* Tests if this string ends with the specified suffix.
*
* @param suffix the suffix.
* @return {@code true} if the character sequence represented by the
* argument is a suffix of the character sequence represented by
* this object; {@code false} otherwise. Note that the
* result will be {@code true} if the argument is the
* empty string or is equal to this {@code String} object
* as determined by the {@link #equals(Object)} method.
*/
public boolean endsWith(String suffix) {
return startsWith(suffix, value.length - suffix.value.length);
}

获取功能

length

获取字符串的长度,其实也就是字符个数

1
2
3
4
5
6
7
8
9
10
11
12
/**
* Returns the length of this string.
* The length is equal to the number of <a href="Character.html#unicode">Unicode
* code units</a> in the string.
* 返回字符串的长度,长度等于字符串中的Unicode代码单元的数目(UTF-16的代码单元的数目)
*
* @return the length of the sequence of characters represented by this
* object.
*/
public int length() {
return value.length;
}

charAt

获取指定索引处的字符

indexOf

获取str在字符串对象中第一次出现的索引

substring(int start,int end)

从start开始,到end结束截取字符串。包括start,不包括end

转换功能

toCharArray

把字符串转换为字符数组

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
/**
* Converts this string to a new character array.
*
* @return a newly allocated character array whose length is the length
* of this string and whose contents are initialized to contain
* the character sequence represented by this string.
*/
public char[] toCharArray() {
// Cannot use Arrays.copyOf because of class initialization order issues
/**
* 1. String 和Arrays 都属于rt.jar中的类,但是BootstrapClassloader在加载这两个类的顺序是不同的。
* 2. 所以当String.class被加载进内存的时候,Arrays此时没有被加载,所以直接使用肯定会抛异常。
* 3. 而System.arrayCopy是使用native代码,则不会有这个问题。
*/
char result[] = new char[value.length];
System.arraycopy(value, 0, result, 0, value.length);
return result;
}

toLowerCase

把字符串转换为小写字符串

toUpperCase

把字符串转换为大写字符串

其他常用

trim

去除字符串两端空格

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/**
* Returns a string whose value is this string, with any leading and trailing
* whitespace removed.
* <p>
* If this {@code String} object represents an empty character
* sequence, or the first and last characters of character sequence
* represented by this {@code String} object both have codes
* greater than {@code '\u005Cu0020'} (the space character), then a
* reference to this {@code String} object is returned.
* <p>
* Otherwise, if there is no character with a code greater than
* {@code '\u005Cu0020'} in the string, then a
* {@code String} object representing an empty string is
* returned.
* <p>
* Otherwise, let <i>k</i> be the index of the first character in the
* string whose code is greater than {@code '\u005Cu0020'}, and let
* <i>m</i> be the index of the last character in the string whose code
* is greater than {@code '\u005Cu0020'}. A {@code String}
* object is returned, representing the substring of this string that
* begins with the character at index <i>k</i> and ends with the
* character at index <i>m</i>-that is, the result of
* {@code this.substring(k, m + 1)}.
* <p>
* This method may be used to trim whitespace (as defined above) from
* the beginning and end of a string.
*
* @return A string whose value is this string, with any leading and trailing white
* space removed, or this string if it has no leading or
* trailing white space.
*/
public String trim() {
int len = value.length;
int st = 0;
char[] val = value; /* avoid getfield opcode */

while ((st < len) && (val[st] <= ' ')) {
st++;
}
while ((st < len) && (val[len - 1] <= ' ')) {
len--;
}
return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

split

按照指定符号分割字符串

compareTo

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
/**
* Compares two strings lexicographically.
* The comparison is based on the Unicode value of each character in
* the strings. The character sequence represented by this
* {@code String} object is compared lexicographically to the
* character sequence represented by the argument string. The result is
* a negative integer if this {@code String} object
* lexicographically precedes the argument string. The result is a
* positive integer if this {@code String} object lexicographically
* follows the argument string. The result is zero if the strings
* are equal; {@code compareTo} returns {@code 0} exactly when
* the {@link #equals(Object)} method would return {@code true}.
* <p>
* This is the definition of lexicographic ordering. If two strings are
* different, then either they have different characters at some index
* that is a valid index for both strings, or their lengths are different,
* or both. If they have different characters at one or more index
* positions, let <i>k</i> be the smallest such index; then the string
* whose character at position <i>k</i> has the smaller value, as
* determined by using the &lt; operator, lexicographically precedes the
* other string. In this case, {@code compareTo} returns the
* difference of the two character values at position {@code k} in
* the two string -- that is, the value:
* <blockquote><pre>
* this.charAt(k)-anotherString.charAt(k)
* </pre></blockquote>
* If there is no index position at which they differ, then the shorter
* string lexicographically precedes the longer string. In this case,
* {@code compareTo} returns the difference of the lengths of the
* strings -- that is, the value:
* <blockquote><pre>
* this.length()-anotherString.length()
* </pre></blockquote>
*
* @param anotherString the {@code String} to be compared.
* @return the value {@code 0} if the argument string is equal to
* this string; a value less than {@code 0} if this string
* is lexicographically less than the string argument; and a
* value greater than {@code 0} if this string is
* lexicographically greater than the string argument.
*/
/**
* 1. 比较字符串大小,因为String实现了Comparable<String>接口,所有重写了compareTo方法
* 2. 返回int类型,正数为大,负数为小,基于字符的ASSIC码比较
*/
public int compareTo(String anotherString) {
int len1 = value.length;
int len2 = anotherString.value.length;
int lim = Math.min(len1, len2); //获取长度比较小的字符串长度
char v1[] = value;
char v2[] = anotherString.value;

int k = 0;
while (k < lim) { //当前索引小于两个字符串中长度较小的字符串长度时,循环继续
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
return c1 - c2; //从前向后遍历,有一个字符不相同,返回差值
}
k++;
}
return len1 - len2; //如果遍历结束,都相同,比较两个字符串长度
}

String的不可变性

在定义String时增加了final标记,并且存储数据的数组也被定义为final,则表示String一但创建就注定不可变。对String对象的任何改变都不影响到原对象,相关的任何change操作都会生成新的对象

原因

因为String太过常用,JAVA类库的设计者在实现时做了个小小的变化,即采用了享元模式,每当生成一个新内容的字符串时,他们都被添加到一个共享池中,当第二次再次生成同样内容的字符串实例时,就共享此对象,而不是创建一个新对象,但是这样的做法仅仅适合于通过=符号进行的初始化。需要说明一点的是,在object中,equals()是用来比较内存地址的,但是String重写了equals()方法,用来比较内容的,即使是不同地址,只要内容一致,也会返回true,这也就是为什么a.equals(c)返回true的原因了。

优点

  1. 可以实现多个变量引用堆内存中的同一个字符串实例,避免创建的开销。
  2. 安全性更高
  3. 传参与基础类型一样,更加直观

字符串常量池

后续会在JVM中进行详细的讲解。

字符串常量池概述

常量池表(Constant_Pool table)

  1. Class文件中存储所有常量(包括字符串)的table。
  2. Class文件中的内容,不是运行内容,表示Class文件中的字节码指令

运行时常量池(Runtime Constant Pool)

  1. JVM内存中方法区的一部分,这是运行时的内容
  2. 这部分内容(绝大部分)是随着JVM运行时候,从常量池转化而来,每个Class对应一个运行时常量池
  3. 除了 Class中常量池内容,还可能包括动态生成并加入这里的内容

字符串常量池(String Pool)

  1. 这部分也在方法区中,但与Runtime Constant Pool不是一个概念,String Pool是JVM实例全局共享的,全局只有一个
  2. JVM规范要求进入这里的String实例叫“被驻留的interned string”,各个JVM可以有不同的实现,HotSpot是设置了一个哈希表StringTable来引用堆中的字符串实例,被引用就是被驻留。

亨元模式

字符串在使用过程中使用到了享元模式(一个系统中如果有多处用到了相同的一个元素,那么我们应该只存储一份此元素,而让所有地方都引用这一个元素),而那个存储元素的地方就叫做“字符串常量池 - String Pool”

详细分析

String str = "hello";的执行过程:

其他

String与StringBuffer、StringBuilder

  1. String是不可变的字符序列
  2. StringBuffer和StringBuilder是可变的字符序列
  3. StringBuffer是线程安全的,效率低
  4. StringBuilder和String是线程不安全的,效率高一些,
  5. 效率从高到低:StringBuilder>String>StringBuffer

String循环拼接对象

1
2
3
4
5
6
7
8
9
10
package com.sunld.string;
public class Test1 {
public static void main(String[] args) {
String string = "";
for(int i=0;i<10000;i++) {
string += "hello";
}
System.out.println(string);
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Compiled from "Test1.java"
public class com.sunld.string.Test1 {
public com.sunld.string.Test1();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: ldc #16 // String
2: astore_1
3: iconst_0
4: istore_2
5: goto 31
8: new #18 // class java/lang/StringBuilder
11: dup
12: aload_1
13: invokestatic #20 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
16: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
19: ldc #29 // String hello
21: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
27: astore_1
28: iinc 2, 1
31: iload_2
32: sipush 10000
35: if_icmplt 8
38: getstatic #39 // Field java/lang/System.out:Ljava/io/PrintStream;
41: aload_1
42: invokevirtual #45 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
45: return
}

8~35行是整个循环过程,并且每次都会new一个StringBuilder(String拼接被jvm优化为使用StringBuilder处理)

使用StringBuilder

1
2
3
4
5
6
7
8
9
10
11
12
13
package com.sunld.string;

public class Test2 {

public static void main(String[] args) {
StringBuilder string = new StringBuilder();
for(int i=0;i<10000;i++) {
string.append("hello");
}
System.out.println(string.toString());
}

}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Compiled from "Test2.java"
public class com.sunld.string.Test2 {
public com.sunld.string.Test2();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: new #16 // class java/lang/StringBuilder
3: dup
4: invokespecial #18 // Method java/lang/StringBuilder."<init>":()V
7: astore_1
8: iconst_0
9: istore_2
10: goto 23
13: aload_1
14: ldc #19 // String hello
16: invokevirtual #21 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: pop
20: iinc 2, 1
23: iload_2
24: sipush 10000
27: if_icmplt 13
30: getstatic #25 // Field java/lang/System.out:Ljava/io/PrintStream;
33: aload_1
34: invokevirtual #31 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
37: invokevirtual #35 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
40: return
}

13~27整个循环处理过程,只创建了一次对象。

==与equals

==在对字符串比较的时候,对比的是内存地址,而equals比较的是字符串内容,在开发的过程中,equals()通过接受参数,可以避免空指向"hello".equals(str)

编码问题

java中的String类是按照Unicode的方式进行编码,并且默认编码方式为UTF-16。使用字节的方式可以指定编码方式,以及编码方式的转换(转换不对会出现乱码问题)。并且不能的编码方式对应的数组长度不同。

示例

1
2
3
4
5
6
7
8
package com.sunld.string;

public class TestString2 {
public static void main(String[] args) {
String b = "𝄞";
System.out.println(b.length()); //输出为2
}
}

通过length的描述:知道返回的字符串长度时Unicode代码单元的数目

1
2
3
4
5
6
7
8
9
/**
* Returns the length of this string.
* The length is equal to the number of <a href="Character.html#unicode">Unicode
* code units</a> in the string.
* 返回字符串的长度,长度等于字符串中的Unicode代码单元的数目(UTF-16的代码单元的数目)
*
* @return the length of the sequence of characters represented by this
* object.
*/

说明

在java中的编码方式分为内码和外码:

  1. 内码:char或string在内存中使用的编码方式,java中的使用utf16作为内码
  2. 外码:除内码都可以认为是外码,包括class文件的编码

代码单元:一种转换格式(UTF)中最小的一个分隔,称为一个代码单元(code unit),一个转换公式只会包含整数个单元。UTF-X中的X表示各自码单元的位数。
UTF-16,可以包含一个单元和两个单元,也就是两个字节和四个字节

替代方案:codePointCount

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
/**
* Returns the number of Unicode code points in the specified text
* range of this {@code String}. The text range begins at the
* specified {@code beginIndex} and extends to the
* {@code char} at index {@code endIndex - 1}. Thus the
* length (in {@code char}s) of the text range is
* {@code endIndex-beginIndex}. Unpaired surrogates within
* the text range count as one code point each.
*
* @param beginIndex the index to the first {@code char} of
* the text range.
* @param endIndex the index after the last {@code char} of
* the text range.
* @return the number of Unicode code points in the specified text
* range
* @exception IndexOutOfBoundsException if the
* {@code beginIndex} is negative, or {@code endIndex}
* is larger than the length of this {@code String}, or
* {@code beginIndex} is larger than {@code endIndex}.
* @since 1.5
* 可以使用该方法返回字符串的准确长度(相对于length方法)
*/
public int codePointCount(int beginIndex, int endIndex) {
if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
throw new IndexOutOfBoundsException();
}
return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
}

参考

  1. Java常用类(二)String类详解
  2. String源码分析(基于JDK1.8)
  3. 刨根究底字符编码之十四——UTF-16究竟是怎么编码的