Update some storage documentation for new adjustment workflows

+135 -104

2 changed files



+131 -95

src/docs/contributor/database.diviner

···
 This document describes key components of the database schema and should answer
 questions like how to store new types of data.

-= Database System =
+Database System
+===============

-Phabricator uses MySQL with InnoDB engine. The only exception is the
+Phabricator uses MySQL or another MySQL-compatible database (like MariaDB
+or Amazon RDS).
+
+Phabricator the InnoDB table engine. The only exception is the
 `search_documentfield` table which uses MyISAM because MySQL doesn't support
-fulltext search in InnoDB.
+fulltext search in InnoDB (recent versions do, but we haven't added support
+yet).

-Let us know if you need to use other database system: @{article:Give Feedback!
-Get Support!}.
+We are unlikely to ever support other incompatible databases like PostgreSQL or
+SQLite.

-= PHP Drivers =
+PHP Drivers
+===========

 Phabricator supports [[ http://www.php.net/book.mysql | MySQL ]] and
-[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions. Most installations
-use MySQL but MySQLi should work equally well.
+[[ http://www.php.net/book.mysqli | MySQLi ]] PHP extensions.

-= Databases =
+Databases
+=========

 Each Phabricator application has its own database. The names are prefixed by
-`phabricator_`. This design has two advantages:
+`phabricator_` (this is configurable). This design has two advantages:

-  * Each database is easier to comprehend and to maintain.
-  * We don't do cross-database joins so each database can live on its own machine
-    which is useful for load-balancing.
+  - Each database is easier to comprehend and to maintain.
+  - We don't do cross-database joins so each database can live on its own
+    machine. This gives us flexibility in sharding data later.

-= Connections =
+Connections
+===========

 Phabricator specifies if it will use any opened connection just for reading or
-also for writing. This allows opening write connections to master and read
-connections to slave in master/slave replication. It is useful for
-load-balancing.
+also for writing. This allows opening write connections to a primary and read
+connections to a replica in primary/replica setups (which are not actually
+supported yet).

-= Tables =
+Tables
+======

-Each table name is prefixed by its application. For example, Differential
-revisions are stored in database `phabricator_differential` and table
-`differential_revision`. This duplicity allows easy recognition of the table in
-DarkConsole (see @{article:Using DarkConsole}) and other places.
+Most table names are prefixed by their application names. For example,
+Differential revisions are stored in database `phabricator_differential` and
+table `differential_revision`. This generally makes queries easier to recognize
+and understand.

-The exception is tables which share the same schema over different databases
-such as `edge`.
+The exception is a few tables which share the same schema over different
+databases such as `edge`.

-We use lower-case table names with words separated by underscores. The reason is
-that MySQL can be configured (with `lower_case_table_names`) to lower-case the
-table names anyway.
+We use lower-case table names with words separated by underscores.

-= Column Names =
+Column Names
+============

-Phabricator uses camelCase names for columns. The main advantage is that they
+Phabricator uses `camelCase` names for columns. The main advantage is that they
 directly map to properties in PHP classes.

 Don't use MySQL reserved words (such as `order`) for column names.

-= Data Types =
+Data Types
+==========

-Phabricator uses `int unsigned` columns for storing dates instead of `date` or
-`datetime`. We don't need to care about time-zones in both MySQL and PHP because
-of it. The other reason is that PHP internally uses numbers for storing dates.
+Phabricator defines a set of abstract data types (like `uint32`, `epoch`, and
+`phid`) which map to MySQL column types. The mapping depends on the MySQL
+version.

-Phabricator uses UTF-8 encoding for storing all text data. We use
-`utf8_general_ci` collation for free-text and `utf8_bin` for identifiers.
+Phabricator uses `utf8mb4` character sets where available (MySQL 5.5 or newer),
+and `binary` character sets in most other cases. The primary motivation is to
+allow 4-byte unicode characters to be stored (the `utf8` character set, which
+is more widely available, does not support them). On newer MySQL, we use
+`utf8mb4` to take advantage of improved collation rules.
+
+Phabricator stores dates with an `epoch` abstract data type, which maps to
+`int unsigned`. Although this makes dates less readable when browsing the
+database, it makes date and time manipulation more consistent and
+straightforward in the application.

 We don't use the `enum` data type because each change to the list of possible
 values requires altering the table (which is slow with big tables). We use
 numbers (or short strings in some cases) mapped to PHP constants instead.

-= JSON =
+JSON and Other Serialized Data
+==============================

-Some data don't require structured access - you don't need to filter or order by
+Some data don't require structured access -- we don't need to filter or order by
 them. We store these data as text fields in JSON format. This approach has
 several advantages:

-  * If we decide to add another unstructured field then we don't need to alter the
-    table (which is slow for big tables in MySQL).
-  * Table structure is not cluttered by fields which could be unused most of the
-    time.
+  - If we decide to add another unstructured field then we don't need to alter
+    the table (which is slow for big tables in MySQL).
+  - Table structure is not cluttered by fields which could be unused most of the
+    time.

 An example of such usage can be found in column
 `differential_diffproperty.data`.

-= Primary Keys =
+Primary Keys
+============

-Most tables have auto-increment column named `id`. However creating such column
-is not required for tables which are not usually directly referenced (such as
-tables expressing M:N relations). Example of such table is
-`differential_relationship`.
+Most tables have auto-increment column named `id`. Adding an ID column is
+appropriate for most tables (even tables that have another natural unique key),
+as it improves consistency and makes it easier to perform generic operations
+on objects.

-= Indexes =
+For example, @{class:LiskMigrationIterator} allows you to very easily apply a
+migration to a table using a constant amount of memory provided the table has
+an `id` column.
+
+Indexes
+======

 Create all indexes necessary for fast query execution in most cases. Don't
 create indexes which are not used. You can analyze queries @{article:Using
···
 `(a, b) IN ((%s, %d), (%s, %d))`. Use `AND` and `OR` instead:
 `((a = %s AND b = %d) OR (a = %s AND b = %d))`.

-= Foreign Keys =
+Foreign Keys
+============

-We don't use InnoDB's foreign keys because our application is so great that
-no inconsistencies can arise. It will just slow us down.
+We don't use foreign keys because they're complicated and we haven't experienced
+significant issues with data inconsistency that foreign keys could help prevent.
+Empirically, we have witnessed first hand as `ON DELETE CASCADE` relationships
+accidentally destroy huge amounts of data. We may pursue foreign keys
+eventually, but there isn't a strong case for them at the present time.

-= PHIDs =
+PHIDs
+=====

 Each globally referencable object in Phabricator has its associated PHID
-(Phabricator ID) which serves as a global identifier. We use PHIDs for
-referencing data in different databases.
+("Phabricator ID") which serves as a global identifier, similar to a GUID.
+We use PHIDs for referencing data in different databases.

 We use both autoincrementing IDs and global PHIDs because each is useful in
-different contexts. Autoincrementing IDs are chronologically ordered and allow
-us to construct short, human-readable object names (like D2258) and URIs. Global
-PHIDs allow us to represent relationships between different types of objects in
-a homogeneous way.
+different contexts. Autoincrementing IDs are meaningfully ordered and allow
+us to construct short, human-readable object names (like `D2258`) and URIs.
+Global PHIDs allow us to represent relationships between different types of
+objects in a homogeneous way.

-For example, the concept of "subscribers" is more powerfully done with PHIDs
-because we could theoretically have users, projects, teams, and more all as
-"subscribers" of other objects. Using an ID column we would need to add a
-"type" column to avoid ID collision; using PHIDs does not require this
-additional column.
+For example, infrastructure like "subscribers" can be implemented easily with
+PHID relationships: different types of objects (users, projects, mailing lists)
+are permitted to subscribe to different types of objects (revisions, tasks,
+etc). Without PHIDs, we would need to add a "type" column to avoid ID collision;
+using PHIDs makes implementing features like this simpler.

-= Transactions =
+Transactions
+============

 Transactional code should be written using transactions. Example of such code is
 inserting multiple records where one doesn't make sense without the other or
 selecting data later used for update. See chapter in @{class:LiskDAO}.

-= Advanced Features =
+Advanced Features
+=================

 We don't use MySQL advanced features such as triggers, stored procedures or
 events because we like expressing the application logic in PHP more than in SQL.
-Some of these features (especially triggers) can also cause big confusion.
+Some of these features (especially triggers) can also cause a great deal of
+confusion, and are generally more difficult to debug, profile, version control,
+update, and understand than application code.

-Avoiding these advanced features is also good for supporting other database
-systems (which we don't support anyway).
+Schema Denormalization
+======================
+
+Phabricator uses schema denormalization sparingly. Avoid denormalization unless
+there is a compelling reason (usually, performance) to denormalize.
+
+Schema Changes and Migrations
+=============================

-= Schema Denormalization =
+To create a new schema change or migration:

-Phabricator uses schema denormalization for performance reasons sparingly. Try
-to avoid it if possible.
+**Create a database patch**. Database patches go in
+`resources/sql/autopatches/`. To change a schema, use a `.sql` file and write
+in SQL. To perform a migration, use a `.php` file and write in PHP. Name your
+file `YYYYMMDD.patchname.ext`. For example, `20141225.christmas.sql`.

-= Changing the Schema =
+**Keep patches small**. Most schema change statements are not transactional. If
+a patch contains several SQL statements and fails partway through, it normally
+can not be rolled back. When a user tries to apply the patch again later, the
+first statement (which, for example, adds a column) may fail (because the column
+already exists). This can be avoided by keeping patches small (generally, one
+statement per patch).

-There are three simple steps to update the schema:
+**Use namespace and character set variables**. When defining a `.sql` patch,
+you should use these variables instead of hard-coding namespaces or character
+set names:

-  # Create a `.sql` file in `resources/sql/patches/`. This file should:
-    - Contain the appropriate MySQL commands to update the schema.
-    - Be named as `YYYYMMDD.patchname.ext`. For example, `20130217.example.sql`.
-    - Use `${NAMESPACE}` rather than `phabricator` for database names.
-    - Use `COLLATE utf8_bin` for any columns that are to be used as identifiers,
-      such as PHID columns. Otherwise, use `COLLATE utf8_general_ci`.
-    - Name all indexes so it is possible to delete them later.
-  # Edit `src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php` and
-    add your patch to
-    @{method@phabricator:PhabricatorBuiltinPatchList::getPatches}.
-  # Run `bin/storage upgrade`.
+| Variable | Meaning | Notes |
+|---|---|---|
+| {$NAMESPACE} | Storage Namespace | Defaults to `phabricator` |
+| {$CHARSET} | Default Charset | Mostly used to specify table charset |
+| {$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
+| {$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
+| {$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
+| {$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |

-It is also possible to create more complex patches in PHP for data migration
-(due to schema changes or otherwise.) However, the schema changes themselves
-should be done in separate `.sql` files. Order can be guaranteed by editing
-`src/infrastructure/storage/patch/PhabricatorBuiltinPatchList.php`
-appropriately.

-See the
-[[https://secure.phabricator.com/rPb39175342dc5bee0c2246b05fa277e76a7e96ed3
-| commit adding policy storage for Paste ]] for a reasonable example of the code
-changes.
+**Test your patch**. Run `bin/storage upgrade` to test your patch.

-= See Also =
+See Also
+========

-  * @{class:LiskDAO}
-  * @{class:PhabricatorPHID}
+  - @{class:LiskDAO}
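
As a rough illustration of the patch conventions described in the new "Schema Changes and Migrations" section, a small `.sql` autopatch might look like the sketch below. Only the `resources/sql/autopatches/` location, the example file name, and the `{$...}` variables come from the documentation in this change; the `example` database suffix, the `example_gift` table, and its columns are hypothetical and exist only for illustration. The file is a template rather than plain SQL: the variables are expected to be substituted when the patch is applied with `bin/storage upgrade`.

```sql
-- Hypothetical file: resources/sql/autopatches/20141225.christmas.sql
-- A single statement per patch, so a failure cannot leave it half-applied.
-- Database suffix, table name, and columns are illustrative assumptions.
CREATE TABLE {$NAMESPACE}_example.example_gift (
  -- Auto-increment `id` column, as recommended for most tables.
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- Global identifier for cross-database references.
  phid VARBINARY(64) NOT NULL,
  -- Ordinary case-sensitive text uses the text collation variable.
  name VARCHAR(128) COLLATE {$COLLATE_TEXT} NOT NULL,
  -- Dates are stored as epoch timestamps in `int unsigned` columns.
  dateCreated INT UNSIGNED NOT NULL,
  -- Indexes are given explicit names.
  UNIQUE KEY `key_phid` (phid)
) ENGINE=InnoDB DEFAULT CHARSET={$CHARSET} COLLATE={$COLLATE_TEXT};
```

Using the variables instead of literal names means the same patch applies unchanged whether the install uses the default `phabricator` namespace or a custom one, and whichever character set the MySQL version supports.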

+4 -9

src/infrastructure/storage/management/workflow/PhabricatorStorageManagementAdjustWorkflow.php

···
       pht(
         "Found %s issues(s) with schemata, detailed above.\n\n".
         "You can review issues in more detail from the web interface, ".
-        "in Config > Database Status.\n\n".
+        "in Config > Database Status. To better understand the adjustment ".
+        "workflow, see \"Managing Storage Adjustments\" in the ".
+        "documentation.\n\n".
         "MySQL needs to copy table data to make some adjustments, so these ".
-        "migrations may take some time.".
-
-        // TODO: Remove warning once this stabilizes.
-        "\n\n".
-        "WARNING: This workflow is new and unstable. If you continue, you ".
-        "may unrecoverably destory data. Make sure you have a backup before ".
-        "you proceed.",
-
+        "migrations may take some time.",
         new PhutilNumber(count($adjustments))));

     $prompt = pht('Fix these schema issues?');
