原文地址:https://cstack.github.io/db_tutorial/parts/part4.html 原文作者: cstack 译者:Win-Man
目前为止,我们的数据库具备了插入记录和输出所有记录行的能力。让我们花点时间测试一下目前数据库的能力。
我准备使用 rspec 写测试,因为我对这个比较熟悉,并且语法比较易读。
我会定义一个简短的助手去发送一系列命令给我们的数据库程序,然后对返回的输出结果做断言。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 describe 'database' do def run_script(commands) raw_output = nil IO.popen("./db", "r+") do |pipe| commands.each do |command| pipe.puts command end pipe.close_write # Read entire output raw_output = pipe.gets(nil) end raw_output.split("\n") end it 'inserts and retrieves a row' do result = run_script([ "insert 1 user1 person1@example.com", "select", ".exit", ]) expect(result).to match_array([ "db > Executed.", "db > (1, user1, person1@example.com)", "Executed.", "db > ", ]) end end
这个简单的测试确保了我们得到了和我们输入一样的输出结果,并且这个测试确实通过了:
1 2 3 4 5 bundle exec rspec . Finished in 0.00871 seconds (files took 0.09506 seconds to load) 1 example, 0 failures
然后就可以测试往数据库中插入大量的记录了:
1 2 3 4 5 6 7 8 it 'prints error message when table is full' do script = (1..1401).map do |i| "insert #{i} user#{i} person#{i}@example.com" end script << ".exit" result = run_script(script) expect(result[-2]).to eq('db > Error: Table full.') end
再重新运行测试。
1 2 3 4 5 bundle exec rspec .. Finished in 0.01553 seconds (files took 0.08156 seconds to load) 2 examples, 0 failures
测试通过了,我们的数据库可以存储 1400 行数据目前,因为我们设置了最大的数据页大小是 100,并且每个数据页上可以存储 14 条记录。
重新翻看我们写的代码,我意识到我们可能没有正确的处理如何存储 text 字段。可以很容易的验证这个 case:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 it 'allows inserting strings that are the maximum length' do long_username = "a"*32 long_email = "a"*255 script = [ "insert 1 #{long_username} #{long_email}", "select", ".exit", ] result = run_script(script) expect(result).to match_array([ "db > Executed.", "db > (1, #{long_username}, #{long_email})", "Executed.", "db > ", ]) end
然后测试就跑不通了
1 2 3 4 5 6 7 8 9 10 Failures: 1) database allows inserting strings that are the maximum length Failure/Error: raw_output.split("\n") ArgumentError: invalid byte sequence in UTF-8 # ./spec/main_spec.rb:14:in `split' # ./spec/main_spec.rb:14:in `run_script' # ./spec/main_spec.rb:48:in `block (2 levels) in <top (required)>'
如果我们手动测试这个例子,我们可以在输出的记录中看到一些奇怪的字符(我对字符串进行了缩写):
1 2 3 4 5 6 db > insert 1 aaaaa... aaaaa... Executed. db > select (1, aaaaa...aaa\�, aaaaa...aaa\�) Executed. db >
这是怎么回事?我们再看一下我们对于 Row 的定义,我们给 username 字段分配了 32 字节的空间,email 字段分配了 255 字节的空间。但是 C 语言中 string 类型的字段应该是以 null 字符结尾的,我们并没有给这个结束符分配空间。所以这个问题的结局方案就是多分配一个字节的空间:
1 2 3 4 5 6 7 8 const uint32_t COLUMN_EMAIL_SIZE = 255 ; typedef struct { uint32_t id; - char username[COLUMN_USERNAME_SIZE]; - char email[COLUMN_EMAIL_SIZE]; + char username[COLUMN_USERNAME_SIZE + 1 ]; + char email[COLUMN_EMAIL_SIZE + 1 ]; } Row;
再次运行测试,可以看到测试通过了
1 2 3 4 5 bundle exec rspec ... Finished in 0.0188 seconds (files took 0.08516 seconds to load) 3 examples, 0 failures
我们不应该允许插入的 username 和 email 字段的值超过我们定义的 column size ,可以增加一个测试:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 it 'prints error message if strings are too long' do long_username = "a"*33 long_email = "a"*256 script = [ "insert 1 #{long_username} #{long_email}", "select", ".exit", ] result = run_script(script) expect(result).to match_array([ "db > String is too long.", "db > Executed.", "db > ", ]) end
为了实现这个限制,我们需要更新一下我们的 parser 解析器。目前使用的是 scanf():
1 2 3 4 5 6 7 8 9 10 if (strncmp (input_buffer->buffer, "insert" , 6 ) == 0 ) { statement->type = STATEMENT_INSERT; int args_assigned = sscanf ( input_buffer->buffer, "insert %d %s %s" , &(statement->row_to_insert.id), statement->row_to_insert.username, statement->row_to_insert.email); if (args_assigned < 3 ) { return PREPARE_SYNTAX_ERROR; } return PREPARE_SUCCESS; }
但是 scanf() 有一些缺点。如果读取的字符串大小超过了分配的空间,会导致缓存溢出并且数据会被写入到未知的空间。我们需要在拷贝数据到 Row 结构体之前先检查一下字符串的长度。为了实现这个,我们需要将输入通过空格切割一下。
我准备试用 strtok() 来实现这个。这个比较容易理解:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 +PrepareResult prepare_insert (InputBuffer* input_buffer, Statement* statement) { + statement->type = STATEMENT_INSERT; + + char * keyword = strtok(input_buffer->buffer, " " ); + char * id_string = strtok(NULL , " " ); + char * username = strtok(NULL , " " ); + char * email = strtok(NULL , " " ); + + if (id_string == NULL || username == NULL || email == NULL ) { + return PREPARE_SYNTAX_ERROR; + } + + int id = atoi(id_string); + if (strlen (username) > COLUMN_USERNAME_SIZE) { + return PREPARE_STRING_TOO_LONG; + } + if (strlen (email) > COLUMN_EMAIL_SIZE) { + return PREPARE_STRING_TOO_LONG; + } + + statement->row_to_insert.id = id; + strcpy (statement->row_to_insert.username, username); + strcpy (statement->row_to_insert.email, email); + + return PREPARE_SUCCESS; +} + PrepareResult prepare_statement (InputBuffer* input_buffer, Statement* statement) { if (strncmp (input_buffer->buffer, "insert" , 6 ) == 0 ) { + return prepare_insert(input_buffer, statement); - statement->type = STATEMENT_INSERT; - int args_assigned = sscanf ( - input_buffer->buffer, "insert %d %s %s" , &(statement->row_to_insert.id), - statement->row_to_insert.username, statement->row_to_insert.email); - if (args_assigned < 3 ) { - return PREPARE_SYNTAX_ERROR; - } - return PREPARE_SUCCESS; }
在输入缓冲区上连续调用strtok,每当缓冲区到达分隔符(在本例中是空格)时,就插入一个空字符,从而将它分解为子字符串。它返回一个指向子字符串开头的指针。
可以调用 strlen() 函数确认每个 text 字段值是不是超过了长度限制。
我们可以像处理别的错误一样处理这个错误:
1 2 3 4 5 6 enum PrepareResult_t { PREPARE_SUCCESS, + PREPARE_STRING_TOO_LONG, PREPARE_SYNTAX_ERROR, PREPARE_UNRECOGNIZED_STATEMENT };
1 2 3 4 5 6 7 8 9 switch (prepare_statement(input_buffer, &statement)) { case (PREPARE_SUCCESS): break ; + case (PREPARE_STRING_TOO_LONG): + printf ("String is too long.\n" ); + continue ; case (PREPARE_SYNTAX_ERROR): printf ("Syntax error. Could not parse statement.\n" ); continue ;
再次运行测试,测试通过
1 2 3 4 5 bundle exec rspec .... Finished in 0.02284 seconds (files took 0.116 seconds to load) 4 examples, 0 failures
既然已经增加了一种错误情况,我们不妨再处理一个错误情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 it 'prints an error message if id is negative' do script = [ "insert -1 cstack foo@bar.com", "select", ".exit", ] result = run_script(script) expect(result).to match_array([ "db > ID must be positive.", "db > Executed.", "db > ", ]) end
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 enum PrepareResult_t { PREPARE_SUCCESS, + PREPARE_NEGATIVE_ID, PREPARE_STRING_TOO_LONG, PREPARE_SYNTAX_ERROR, PREPARE_UNRECOGNIZED_STATEMENT @@ -148 ,9 +147 ,6 @@ PrepareResult prepare_insert (InputBuffer* input_buffer, Statement* statement) { } int id = atoi(id_string); + if (id < 0 ) { + return PREPARE_NEGATIVE_ID; + } if (strlen (username) > COLUMN_USERNAME_SIZE) { return PREPARE_STRING_TOO_LONG; } @@ -230 ,9 +226 ,6 @@ int main (int argc, char * argv[]) { switch (prepare_statement(input_buffer, &statement)) { case (PREPARE_SUCCESS): break ; + case (PREPARE_NEGATIVE_ID): + printf ("ID must be positive.\n" ); + continue ; case (PREPARE_STRING_TOO_LONG): printf ("String is too long.\n" ); continue ;
目前这些测试已经足够了,下一个章节会是一个重要的特性:持久化。我们需要将数据存储到文件中,并且中文件中读取出来。
那会很棒的。
在这给出这个部分中与之前代码不同的部分:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 @@ -22 ,6 +22 ,8 @@ enum PrepareResult_t { PREPARE_SUCCESS, + PREPARE_NEGATIVE_ID, + PREPARE_STRING_TOO_LONG, PREPARE_SYNTAX_ERROR, PREPARE_UNRECOGNIZED_STATEMENT }; @@ -34 ,8 +36 ,8 @@ #define COLUMN_EMAIL_SIZE 255 typedef struct { uint32_t id; - char username[COLUMN_USERNAME_SIZE]; - char email[COLUMN_EMAIL_SIZE]; + char username[COLUMN_USERNAME_SIZE + 1 ]; + char email[COLUMN_EMAIL_SIZE + 1 ]; } Row; @@ -150 ,18 +152 ,40 @@ MetaCommandResult do_meta_command (InputBuffer* input_buffer, Table *table) { } } -PrepareResult prepare_statement (InputBuffer* input_buffer, - Statement* statement) {- if (strncmp (input_buffer->buffer, "insert" , 6 ) == 0 ) { +PrepareResult prepare_insert (InputBuffer* input_buffer, Statement* statement) { statement->type = STATEMENT_INSERT; - int args_assigned = sscanf ( - input_buffer->buffer, "insert %d %s %s" , &(statement->row_to_insert.id), - statement->row_to_insert.username, statement->row_to_insert.email - ); - if (args_assigned < 3 ) { + + char * keyword = strtok(input_buffer->buffer, " " ); + char * id_string = strtok(NULL , " " ); + char * username = strtok(NULL , " " ); + char * email = strtok(NULL , " " ); + + if (id_string == NULL || username == NULL || email == NULL ) { return PREPARE_SYNTAX_ERROR; } + + int id = atoi(id_string); + if (id < 0 ) { + return PREPARE_NEGATIVE_ID; + } + if (strlen (username) > COLUMN_USERNAME_SIZE) { + return PREPARE_STRING_TOO_LONG; + } + if (strlen (email) > COLUMN_EMAIL_SIZE) { + return PREPARE_STRING_TOO_LONG; + } + + statement->row_to_insert.id = id; + strcpy (statement->row_to_insert.username, username); + strcpy (statement->row_to_insert.email, email); + return PREPARE_SUCCESS; + +} +PrepareResult prepare_statement (InputBuffer* input_buffer, + Statement* statement) {+ if (strncmp (input_buffer->buffer, "insert" , 6 ) == 0 ) { + return prepare_insert(input_buffer, statement); } if (strcmp (input_buffer->buffer, "select" ) == 0 ) { statement->type = STATEMENT_SELECT; @@ -223 ,6 +247 ,12 @@ int main (int argc, char * argv[]) { switch (prepare_statement(input_buffer, &statement)) { case (PREPARE_SUCCESS): break ; + case (PREPARE_NEGATIVE_ID): + printf ("ID must be positive.\n" ); + continue ; + case (PREPARE_STRING_TOO_LONG): + printf ("String is too long.\n" ); + continue ; case (PREPARE_SYNTAX_ERROR): printf ("Syntax error. Could not parse statement.\n" ); continue ;
以及我们增加的测试:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 +describe 'database' do + def run_script(commands) + raw_output = nil + IO.popen("./db", "r+") do |pipe| + commands.each do |command| + pipe.puts command + end + + pipe.close_write + + # Read entire output + raw_output = pipe.gets(nil) + end + raw_output.split("\n") + end + + it 'inserts and retrieves a row' do + result = run_script([ + "insert 1 user1 person1@example.com", + "select", + ".exit", + ]) + expect(result).to match_array([ + "db > Executed.", + "db > (1, user1, person1@example.com)", + "Executed.", + "db > ", + ]) + end + + it 'prints error message when table is full' do + script = (1..1401).map do |i| + "insert #{i} user#{i} person#{i}@example.com" + end + script << ".exit" + result = run_script(script) + expect(result[-2]).to eq('db > Error: Table full.') + end + + it 'allows inserting strings that are the maximum length' do + long_username = "a"*32 + long_email = "a"*255 + script = [ + "insert 1 #{long_username} #{long_email}", + "select", + ".exit", + ] + result = run_script(script) + expect(result).to match_array([ + "db > Executed.", + "db > (1, #{long_username}, #{long_email})", + "Executed.", + "db > ", + ]) + end + + it 'prints error message if strings are too long' do + long_username = "a"*33 + long_email = "a"*256 + script = [ + "insert 1 #{long_username} #{long_email}", + "select", + ".exit", + ] + result = run_script(script) + expect(result).to match_array([ + "db > String is too long.", + "db > Executed.", + "db > ", + ]) + end + + it 'prints an error message if id is negative' do + script = [ + "insert -1 cstack foo@bar.com", + "select", + ".exit", + ] + result = run_script(script) + expect(result).to match_array([ + "db > ID must be positive.", + "db > Executed.", + "db > ", + ]) + end +end