twitter-to-sqlite import recreates archive- tables, closes #17 · alice.mosphere.at/twitter-to-sqlite@6a8012b

+3 -1

README.md

···
 
 You can request an archive of your Twitter data by [following these instructions](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive).
 
-Twitter will send you a link to download a `.zip` file. You can import the contents of that file into a set of tables (each beginning with the `archive-` prefix) using the `import` command:
+Twitter will send you a link to download a `.zip` file. You can import the contents of that file into a set of tables in a new database file called `archive.db` (each table beginning with the `archive-` prefix) using the `import` command:
 
     $ twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2.zip
 
 This command does not populate any of the regular tables, since Twitter's export data does not exactly match the schema returned by the Twitter API.
+
+It will delete and recreate all of your `archive-*` tables every time you run it. If this is not what you want, run the command against a new SQLite database file name rather than running it against one that already exists.
 
 You may want to use other commands to populate tables based on data from the archive. For example, to retrieve full API versions of each of the tweets you have favourited in your archive, you could run the following:
 

+21 -4

tests/test_import.py

···
 from .utils import create_zip
 
 
-def test_cli_import(tmpdir):
+@pytest.fixture
+def import_test_dir(tmpdir):
     archive = str(tmpdir / "archive.zip")
-    output = str(tmpdir / "output.db")
     buf = io.BytesIO()
     zf = create_zip(buf)
     zf.close()
     open(archive, "wb").write(buf.getbuffer())
+    return tmpdir, archive
+
+
+def test_cli_import(import_test_dir):
+    tmpdir, archive = import_test_dir
+    output = str(tmpdir / "output.db")
     result = CliRunner().invoke(cli.cli, ["import", output, archive])
     assert 0 == result.exit_code, result.stderr
     db = sqlite_utils.Database(output)
···
         {"savedSearchId": "42214", "query": "simonw"},
         {"savedSearchId": "55814", "query": "django"},
     ] == list(db["archive-saved-search"].rows)
-    dd = list(db["archive-account"].rows)
     assert [
         {
             "pk": "c4e32e91742df2331ef3ad1e481d1a64d781183a",
···
             "createdAt": "2006-11-15T13:18:50.000Z",
             "accountDisplayName": "Simon Willison",
         }
-    ] == dd
+    ] == list(db["archive-account"].rows)
+
+
+def test_deletes_existing_archive_tables(import_test_dir):
+    tmpdir, archive = import_test_dir
+    output = str(tmpdir / "output.db")
+    db = sqlite_utils.Database(output)
+    # Create a table
+    db["archive-foo"].create({"id": int})
+    assert ["archive-foo"] == db.table_names()
+    result = CliRunner().invoke(cli.cli, ["import", output, archive])
+    # That table should have been deleted
+    assert "archive-foo" not in db.table_names()

+4

twitter_to_sqlite/cli.py

···
 def import_(db_path, archive_path):
     "Import data from a Twitter exported archive"
     db = sqlite_utils.Database(db_path)
+    # Drop archive-* tables that already exist
+    for table in db.tables:
+        if table.name.startswith("archive-"):
+            table.drop()
     for filename, content in utils.read_archive_js(archive_path):
         filename = filename[: -len(".js")]
         if filename not in archive.transformers:
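
For readers who want to see the effect of the new loop in isolation, here is a minimal sketch of the same "drop every table with a given prefix" idea. It uses only the standard-library `sqlite3` module, not `sqlite_utils`, so it is not the project's code; the helper name `drop_prefixed_tables` and the `tweets` table are hypothetical and exist only for illustration.

```python
import sqlite3


def drop_prefixed_tables(conn, prefix="archive-"):
    """Drop every table whose name starts with `prefix`; return the dropped names.

    Hypothetical helper: it mirrors the intent of the loop added to import_,
    but goes through sqlite_master instead of sqlite_utils' db.tables.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    dropped = [name for (name,) in rows if name.startswith(prefix)]
    for name in dropped:
        # Table names containing '-' must be quoted; double any embedded quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(f"DROP TABLE {quoted}")
    conn.commit()
    return dropped


# Usage: one archive table and one unrelated table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "archive-foo" (id INTEGER)')
conn.execute('CREATE TABLE "tweets" (id INTEGER)')

dropped = drop_prefixed_tables(conn)
remaining = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(dropped, remaining)
```

Only tables matching the prefix are removed, so anything else in the same database file is left untouched, which matches the behaviour the README change describes.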
