README.md at 10dddcc2822863f9a6dbdc5cd4b28c6c4e4e9170 · alice.mosphere.at/twitter-to-sqlite

alice.mosphere.at / twitter-to-sqlite
fork
this repo has no description
fork
twitter-to-sqlite / README.md
at 10dddcc2822863f9a6dbdc5cd4b28c6c4e4e9170 369 lines 19 kB view raw view rendered
wrap content
Simon Willison Fixed typo 5y ago
197e69ce
  1# twitter-to-sqlite
  2
  3[![PyPI](https://img.shields.io/pypi/v/twitter-to-sqlite.svg)](https://pypi.org/project/twitter-to-sqlite/)
  4[![GitHub changelog](https://img.shields.io/github/v/release/dogsheep/twitter-to-sqlite?include_prereleases&label=changelog)](https://github.com/dogsheep/twitter-to-sqlite/releases)
  5[![CircleCI](https://circleci.com/gh/dogsheep/twitter-to-sqlite.svg?style=svg)](https://circleci.com/gh/dogsheep/twitter-to-sqlite)
  6[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/dogsheep/twitter-to-sqlite/blob/master/LICENSE)
  7
  8Save data from Twitter to a SQLite database.
  9
 10<!-- toc -->
 11
 12- [How to install](#how-to-install)
 13- [Authentication](#authentication)
 14- [Retrieving tweets by specific accounts](#retrieving-tweets-by-specific-accounts)
 15- [Retrieve user profiles in bulk](#retrieve-user-profiles-in-bulk)
 16- [Retrieve tweets in bulk](#retrieve-tweets-in-bulk)
 17- [Retrieving Twitter followers](#retrieving-twitter-followers)
 18- [Retrieving friends](#retrieving-friends)
 19- [Retrieving favorited tweets](#retrieving-favorited-tweets)
 20- [Retrieving Twitter lists](#retrieving-twitter-lists)
 21- [Retrieving Twitter list memberships](#retrieving-twitter-list-memberships)
 22- [Retrieving just follower and friend IDs](#retrieving-just-follower-and-friend-ids)
 23- [Retrieving tweets from your home timeline](#retrieving-tweets-from-your-home-timeline)
 24- [Retrieving your mentions](#retrieving-your-mentions)
 25- [Providing input from a SQL query with --sql and --attach](#providing-input-from-a-sql-query-with---sql-and---attach)
 26- [Running searches](#running-searches)
 27- [Capturing tweets in real-time with track and follow](#capturing-tweets-in-real-time-with-track-and-follow)
 28  * [track](#track)
 29  * [follow](#follow)
 30- [Importing data from your Twitter archive](#importing-data-from-your-twitter-archive)
 31- [Design notes](#design-notes)
 32
 33<!-- tocstop -->
 34
 35## How to install
 36
 37    $ pip install twitter-to-sqlite
 38
 39## Authentication
 40
 41First, you will need to create a Twitter application at https://developer.twitter.com/en/apps. You may need to apply for a Twitter developer account - if so, you may find this [example of an email application](https://raw.githubusercontent.com/dogsheep/twitter-to-sqlite/master/email.png) useful that has been approved in the past.
 42
 43Once you have created your application, navigate to the "Keys and tokens" page and make note of the following:
 44
 45* Your API key
 46* Your API secret key
 47* Your access token
 48* Your access token secret
 49
 50You will need to save all four of these values to a JSON file in order to use this tool.
 51
 52You can create that JSON file by running the following command and pasting in the values at the prompts:
 53
 54    $ twitter-to-sqlite auth
 55    Create an app here: https://developer.twitter.com/en/apps
 56    Then navigate to 'Keys and tokens' and paste in the following:
 57
 58    API key: xxx
 59    API secret key: xxx
 60    Access token: xxx
 61    Access token secret: xxx
 62
 63This will create a file called `auth.json` in your current directory containing the required values. To save the file at a different path or filename, use the `--auth=myauth.json` option.
 64
 65## Retrieving tweets by specific accounts
 66
 67The `user-timeline` command retrieves all of the tweets posted by the specified user accounts. It defaults to the account belonging to the authenticated user:
 68
 69    $ twitter-to-sqlite user-timeline twitter.db
 70    Importing tweets  [#####-------------------------------]  2799/17780  00:01:39
 71
 72All of these commands assume that there is an `auth.json` file in the current directory. You can provide the path to your `auth.json` file using `-a`:
 73
 74    $ twitter-to-sqlite user-timeline twitter.db -a /path/to/auth.json
 75
 76To load tweets for other users, pass their screen names as arguments:
 77
 78    $ twitter-to-sqlite user-timeline twitter.db cleopaws nichemuseums
 79
 80Twitter's API only returns up to around 3,200 tweets for most user accounts, but you may find that it returns all available tweets for your own user account.
 81
 82You can pass numeric Twitter user IDs instead of screen names using the `--ids` parameter.
 83
 84You can use `--since` to retrieve every tweet since the last time you imported for that user, or `--since_id=xxx` to retrieve every tweet since a specific tweet ID.
 85
 86This command also accepts `--sql` and `--attach` options, documented below.
 87
 88## Retrieve user profiles in bulk
 89
 90If you have a list of Twitter screen names (or user IDs) you can bulk fetch their fully inflated Twitter profiles using the `users-lookup` command:
 91
 92    $ twitter-to-sqlite users-lookup users.db simonw cleopaws
 93
 94You can pass user IDs instead using the `--ids` option:
 95
 96    $ twitter-to-sqlite users-lookup users.db 12497 3166449535 --ids
 97
 98This command also accepts `--sql` and `--attach` options, documented below.
 99
100## Retrieve tweets in bulk
101
102If you have a list of tweet IDS you can bulk fetch them using the `statuses-lookup` command:
103
104    $ twitter-to-sqlite statuses-lookup tweets.db 1122154819815239680 1122154178493575169
105
106The `--sql` and `--attach` options are supported.
107
108Here's a recipe to retrieve any tweets that existing tweets are in-reply-to which have not yet been stored in your database:
109
110    $ twitter-to-sqlite statuses-lookup tweets.db \
111        --sql='
112            select in_reply_to_status_id
113            from tweets
114            where in_reply_to_status_id is not null' \
115        --skip-existing
116
117The `--skip-existing` option means that tweets that have already been stored in the database will not be fetched again.
118
119## Retrieving Twitter followers
120
121The `followers` command retrieves details of every follower of the specified accounts. You can use it to retrieve your own followers, or you can pass one or more screen names to pull the followers for other accounts.
122
123The following command pulls your followers and saves them in a SQLite database file called `twitter.db`:
124
125    $ twitter-to-sqlite followers twitter.db
126
127This command is **extremely slow**, because Twitter impose a rate limit of no more than one request per minute to this endpoint! If you are running it against an account with thousands of followers you should expect this to take several hours.
128
129To retrieve followers for another account, use:
130
131    $ twitter-to-sqlite followers twitter.db cleopaws
132
133This command also accepts the `--ids`, `--sql` and `--attach` options.
134
135See [Analyzing my Twitter followers with Datasette](https://simonwillison.net/2018/Jan/28/analyzing-my-twitter-followers/) for the original inspiration for this command.
136
137## Retrieving friends
138
139The `friends` command works like the `followers` command, but retrieves the specified (or currently authenticated) user's friends - defined as accounts that the user is following.
140
141    $ twitter-to-sqlite friends twitter.db
142
143It takes the same options as the `followers` command.
144
145## Retrieving favorited tweets
146
147The `favorites` command retrieves tweets that have been favorited by a specified user. Called without any extra arguments it retrieves tweets favorited by the currently authenticated user:
148
149    $ twitter-to-sqlite favorites faves.db
150
151You can also use the `--screen_name` or `--user_id` arguments to retrieve favorite tweets for another user:
152
153    $ twitter-to-sqlite favorites faves-obama.db --screen_name=BarackObama
154
155## Retrieving Twitter lists
156
157The `lists` command retrieves all of the lists belonging to one or more users.
158
159    $ twitter-to-sqlite lists lists.db simonw dogsheep
160
161This command also accepts the `--sql` and `--attach` and `--ids` options.
162
163To additionally fetch the list of members for each list, use `--members`.
164
165## Retrieving Twitter list memberships
166
167The `list-members` command can be used to retrieve details of one or more Twitter lists, including all of their members.
168
169    $ twitter-to-sqlite list-members members.db simonw/the-good-place
170
171You can pass multiple `screen_name/list_slug` identifiers.
172
173If you know the numeric IDs of the lists instead, you can use `--ids`:
174
175    $ twitter-to-sqlite list-members members.db 927913322841653248 --ids
176
177## Retrieving just follower and friend IDs
178
179It's also possible to retrieve just the numeric Twitter IDs of the accounts that specific users are following ("friends" in Twitter's API terminology) or followed-by:
180
181    $ twitter-to-sqlite followers-ids members.db simonw cleopaws
182
183This will populate the `following` table with `followed_id`/`follower_id` pairs for the two specified accounts, listing every account ID that is following either of those two accounts.
184
185    $ twitter-to-sqlite friends-ids members.db simonw cleopaws
186
187This will do the same thing but pull the IDs that those accounts are following.
188
189Both of these commands also support `--sql` and `--attach` as an alternative to passing screen names as direct command-line arguments. You can use `--ids` to process the inputs as user IDs rather than screen names.
190
191The underlying Twitter APIs have a rate limit of 15 requests every 15 minutes - though they do return up to 5,000 IDs in each call. By default both of these subcommands will wait for 61 seconds between API calls in order to stay within the rate limit - you can adjust this behaviour down to just one second delay if you know you will not be making many calls using `--sleep=1`.
192
193## Retrieving tweets from your home timeline
194
195The `home-timeline` command retrieves up to 800 tweets from the home timeline of the authenticated user - generally this means tweets from people you follow.
196
197    $ twitter-to-sqlite home-timeline twitter.db
198    Importing timeline  [#################--------]  591/800  00:01:14
199
200The tweets are stored in the `tweets` table, and a record is added to the `timeline_tweets` table noting that this tweet came in due to being spotted in the timeline of your user.
201
202You can use `--since` to retrieve just tweets that have been posted since the last time this command was run, or `--since_id=xxx` to explicitly pass in a tweet ID to use as the last position.
203
204You can then view your timeline in Datasette using the following URL:
205
206`/tweets/tweets?_where=id+in+(select+tweet+from+[timeline_tweets])&_sort_desc=id&_facet=user`
207
208This will filter your tweets table to just tweets that appear in your timeline, ordered by most recent first and use faceting to show you which users are responsible for the most tweets.
209
210## Retrieving your mentions
211
212The `mentions-timeline` command works like `home-timeline` except it retrieves tweets that mention the authenticated user's account. It records the user account that was mentioned in a `mentions_tweets` table.
213
214It supports `--since` and `--since_id` in the same was as `home-timeline` does.
215
216## Providing input from a SQL query with --sql and --attach
217
218This option is available for some subcommands - run `twitter-to-sqlite command-name --help` to check.
219
220You can provide Twitter screen names (or user IDs or tweet IDs) directly as command-line arguments, or you can provide those screen names or IDs by executing a SQL query.
221
222For example: consider a SQLite database with an `attendees` table listing names and Twitter accounts - something like this:
223
224| First   | Last       | Twitter      |
225|---------|------------|--------------|
226| Simon   | Willison   | simonw       |
227| Avril   | Lavigne    | AvrilLavigne |
228
229You can run the `users-lookup` command to pull the Twitter profile of every user listed in that database by loading the screen names using a `--sql` query:
230
231    $ twitter-to-sqlite users-lookup my.db --sql="select Twitter from attendees"
232
233If your database table contains Twitter IDs, you can select those IDs and pass the `--ids` argument. For example, to fetch the profiles of users who have had their user IDs inserted into the `following` table using the `twitter-to-sqlite friends-ids` command:
234
235    $ twitter-to-sqlite users-lookup my.db --sql="select follower_id from following" --ids
236
237Or to avoid re-fetching users that have already been fetched:
238
239    $ twitter-to-sqlite users-lookup my.db \
240        --sql="select followed_id from following where followed_id not in (
241            select id from users)" --ids
242
243If your data lives in a separate database file you can attach it using `--attach`. For example, consider the attendees example above but the data lives in an `attendees.db` file, and you want to fetch the user profiles into a `tweets.db` file. You could do that like this:
244
245    $ twitter-to-sqlite users-lookup tweets.db \
246        --attach=attendees.db \
247        --sql="select Twitter from attendees.attendees"
248
249The filename (without the extension) will be used as the database alias within SQLite. If you want a different alias for some reason you can specify that with a colon like this:
250
251    $ twitter-to-sqlite users-lookup tweets.db \
252        --attach=foo:attendees.db \
253        --sql="select Twitter from foo.attendees"
254
255## Running searches
256
257The `search` command runs a search against the Twitter [standard search API](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets).
258
259    $ twitter-to-sqlite search tweets.db "dogsheep"
260
261This will import up to around 320 tweets that match that search term into the `tweets` table. It will also create a record in the `search_runs` table recording that the search took place, and many-to-many records in the `search_runs_tweets` table recording which tweets were seen for that search at that time.
262
263You can use the `--since` parameter to check for previous search runs with the same arguments and only retrieve tweets that were posted since the last retrieved matching tweet.
264
265The following additional options for `search` are supported:
266
267* `--geocode`: `latitude,longitude,radius` where radius is a number followed by mi or km
268* `--lang`: ISO 639-1 language code e.g. `en` or `es`
269* `--locale`: Locale: only `ja` is currently effective
270* `--result_type`: `mixed`, `recent` or `popular`. Defaults to `mixed`
271* `--count`: Number of results per page, defaults to the maximum of 100
272* `--stop_after`: Stop after this many results
273* `--since_id`: Pull tweets since this Tweet ID. You probably want to use `--since` instead of this.
274
275## Capturing tweets in real-time with track and follow
276
277This functionality is **experimental**. Please [file bug reports](https://github.com/dogsheep/twitter-to-sqlite/issues) if you find any!
278
279Twitter provides a real-time API which can be used to subscribe to tweets as they happen. `twitter-to-sqlite` can use this API to continually update a SQLite database with tweets matching certain keywords, or referencing specific users.
280
281### track
282
283To track keywords, use the `track` command:
284
285    $ twitter-to-sqlite track tweets.db kakapo
286
287This command will continue to run until you hit Ctrl+C. It will capture any tweets mentioning the keyword [kakapo](https://en.wikipedia.org/wiki/Kakapo) and store them in the `tweets.db` database file.
288
289You can pass multiple keywords as a space separated list. This will capture tweets matching either of those keywords:
290
291    $ twitter-to-sqlite track tweets.db kakapo raccoon
292
293You can enclose phrases in quotes to search for tweets matching both of those keywords:
294
295    $ twitter-to-sqlite track tweets.db 'trash panda'
296
297See [the Twitter track documentation](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters#track) for advanced tips on using this command.
298
299Add the `--verbose` option to see matching tweets (in their verbose JSON form) displayed to the terminal as they are captured:
300
301    $ twitter-to-sqlite track tweets.db raccoon --verbose
302
303### follow
304
305The `follow` command will capture all tweets that are relevant to one or more specific Twitter users.
306
307    $ twitter-to-sqlite follow tweets.db nytimes
308
309This includes tweets by those users, tweets that reply to or quote those users and retweets by that user. See [the Twitter follow documentation](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters#follow) for full details.
310
311The command accepts one or more screen names.
312
313You can feed it numeric Twitter user IDs instead of screen names by using the `--ids` flag.
314
315The command also supports the `--sql` and `--attach` options, and the `--verbose` option for displaying tweets as they are captured.
316
317Here's how to start following tweets from every user ID currently represented as being followed in the `following` table (populated using the `friends-ids` command):
318
319    $ twitter-to-sqlite follow tweets.db \
320        --sql="select distinct followed_id from following" \
321        --ids
322
323## Importing data from your Twitter archive
324
325You can request an archive of your Twitter data by [following these instructions](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive).
326
327Twitter will send you a link to download a `.zip` file. You can import the contents of that file into a set of tables in a new database file called `archive.db` (each table beginning with the `archive_` prefix) using the `import` command:
328
329    $ twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2.zip
330
331This command does not populate any of the regular tables, since Twitter's export data does not exactly match the schema returned by the Twitter API.
332
333It will delete and recreate the corresponding `archive_*` tables every time you run it. If this is not what you want, run the command against a new SQLite database file name rather than running it against one that already exists.
334
335If you have already decompressed your archive, you can run this against the directory that you decompressed it to:
336
337    $ twitter-to-sqlite import archive.db ~/Downloads/twitter-2019-06-25-b31f2/
338
339You can also run it against one or more specific files within that folder. For example, to import just the follower.js and following.js files:
340
341    $ twitter-to-sqlite import archive.db \
342        ~/Downloads/twitter-2019-06-25-b31f2/follower.js \
343        ~/Downloads/twitter-2019-06-25-b31f2/following.js
344
345You may want to use other commands to populate tables based on data from the archive. For example, to retrieve full API versions of each of the tweets you have favourited in your archive, you could run the following:
346
347    $ twitter-to-sqlite statuses-lookup archive.db \
348        --sql='select tweetId from archive_like' \
349        --skip-existing
350
351If you want these imported tweets to then be reflected in the `favorited_by` table, you can do so by applying the following SQL query:
352
353    $ sqlite3 archive.db
354    SQLite version 3.22.0 2018-01-22 18:45:57
355    Enter ".help" for usage hints.
356    sqlite> INSERT OR IGNORE INTO favorited_by (tweet, user)
357       ...>     SELECT tweetId, 'YOUR_TWITTER_ID' FROM archive_like;
358    <Ctrl+D>
359
360Replace YOUR_TWITTER_ID with your numeric Twitter ID. If you don't know that ID you can find it out by running the following:
361
362    $ twitter-to-sqlite fetch \
363        "https://api.twitter.com/1.1/account/verify_credentials.json" \
364        | grep '"id"' | head -n 1
365
366## Design notes
367
368* Tweet IDs are stored as integers, to afford sorting by ID in a sensible way
369* While we configure foreign key relationships between tables, we do not ask SQLite to enforce them. This is used by the `following` table to allow the `followers-ids` and `friends-ids` commands to populate it with user IDs even if the user accounts themselves are not yet present in the `users` table.
Configure Feed

Configure Feed