Database-less torrent website with IPFS
2022-01-10
TL;DR
Since IPFS, a p2p network for storing and retrieving files, has a JavaScript implementation that runs in the browser, a torrent website can use it to retrieve a small database from peers instead of from the server. A .sqlite file of 13 MB, containing around 135,000 records, can be retrieved from the IPFS network in 8 seconds over a 100 Mbit connection. Since the database object is small enough to be cached by the browser, subsequent visits take zero seconds.
1. Websites go down, torrents get lost
Local file-sharing websites are struggling with law enforcement, and their only defence is to frequently relocate: changing DNS name and fronting the site with a new IP acting as a reverse proxy. Many of them have been closed by their owners because of threats of legal action. One unfortunate side effect is that the owners hardly ever dump a backup of the database to the wild, so those torrents are virtually lost: they can't be reached by the public anymore.
2. Database as a "torrent", served from peers
Years ago I found out about IPFS, a p2p network and protocol that aims to decentralize the Internet. It shares the same peer and DHT principles as BitTorrent, retrieving files by their hash, even though it's not intended as a replacement for it. From the user's perspective it's much easier to share and retrieve a file on this network: the only calls you need are add() and cat(). And guess what? It runs in the browser too!
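As a rough sketch of that API surface (this assumes the js-ipfs browser bundle used later in section 3.2 is loaded and that the code runs inside an async function; the content is just a placeholder):

const node = await Ipfs.create()              // spawn an in-browser IPFS node
const { cid } = await node.add('hello world') // add() stores the content and returns its CID
let text = ''
for await (const chunk of node.cat(cid)) {    // cat() streams the content back by CID
  text += new TextDecoder().decode(chunk)
}
console.log(cid.toString(), text)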
Well, what could they do if someone hosts a torrent website without hosting any data or metadata about torrents? What if the torrent database is retrieved from a peer-to-peer network by the browser? In this case, who is violating what?
The only information you need to store about a torrent is the title, the size and, of course, the magnet URI. Considering an average torrent title to be 50 bytes long, the size 14 bytes¹ and the magnet URI 40 bytes (without tracker information), a single record weighs about ~100 bytes. What about 100,000 records? Only 10 MB of storage for that many torrents. It's relatively small compared to worldwide torrent websites (ThePirateBay hosts a 7 GB database with ~20 million records), but it's enough for a local website or a community built around a general topic, and small enough to be downloaded almost instantly from the IPFS network.
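For illustration, a single record could look something like this (the values and exact field sizes are made up, just to match the averages above):

// Hypothetical record, roughly matching the size estimates above
const record = {
  title: 'Some.Release.Title.2019.1080p.WEB-DL.x264',   // ~50 bytes
  size: '1473810432',                                    // size in bytes, stored as a string
  magnet: 'c39fe3eefbdb62da9c27eb6398ff4a7d2e26e7ab'     // 40-character info hash, no trackers
}
// ~100 bytes per record: 100,000 records ≈ 10 MB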
3. Let's see it in action!
As I wanted to test whether the whole thing could actually work, I started looking for a dump and found the database of TNT Village. What an irony: TNT Village, closed in 2019, was the Italian file-sharing website, moderated by ScambioEtico, a project/political movement from the Italian Pirate Party. When it closed, the owner decided to dump the database in its current state before shutting everything down. Unfortunately, no one used it to start a new website and the community moved to other closed websites (private trackers). The torrents inside it are largely dead.
3.1 Shrinking the database, adding it on IPFS
To shrink the dump, originally 26 MB with 135,000 records, I dropped irrelevant columns, like the author and post referring to the (now closed) forum, and mildly relevant ones, like the author of the torrent and its description. The result is a 13 MB database, half the size of the original. I then added and pinned it to my local IPFS node.
$ sqlite3
sqlite> .mode csv
sqlite> .import dump.csv torrent
sqlite> alter table torrent drop column *column to save space*;  -- repeated for each dropped column
sqlite> vacuum;
sqlite> .save dump.sqlite
$ ipfs add dump.sqlite
$ ipfs pin add 'Qm...some1337cid...ftW'
$ ipfs daemon
3.2 IPFS on browser
<script src="https://cdn.jsdelivr.net/npm/ipfs/dist/index.min.js"></script>
<script type="text/javascript">
async function loadAndCat(cid) {
const node = await Ipfs.create()
const stream = await node.cat(cid)
let data = ''
for await (const chunk of stream) {
data += chunk.toString()
}
console.log(data)
}
loadIPFSAndCat('QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme')
</script>
3.3 SQLite on browser
<script src="https://cdn.jsdelivr.net/npm/sql.js@1.6.2/dist/sql-wasm.min.js"></script>
<script type="text/javascript">
async function loadDBAndExec(query) {
const SQLPromise = initSqlJs({
locateFile: file => 'https://cdn.jsdelivr.net/npm/sql.js@1.6.2/dist/sql-wasm.wasm'
})
const DBPromise = fetch('/path/to/database.sqlite') // This will be done on IPFS
.then(result => result.arrayBuffer())
const [SQL, buffer] = await Promise.all([SQLPromise, DBPromise])
const db = new SQL.Database(new Uint8Array(buffer))
db.exec(query)
}
loadDBAndExec('SELECT * FROM my_table LIMIT 10;')
</script>
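For reference, sql.js hands query results back as an array of { columns, values } objects, one per statement. A small sketch of reading rows out of it, assuming a db handle like the one above and the torrent table from the dump:

// db.exec() returns e.g. [{ columns: ['title', 'magnet'], values: [[...], [...]] }]
const [result] = db.exec('SELECT title, magnet FROM torrent LIMIT 3;')
if (result) {                                 // result is undefined when nothing matches
  for (const [title, magnet] of result.values) {
    console.log(title, magnet)                // each entry of values is one row
  }
}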
3.4 Merge
// Download the database from IPFS and load the sql.js module in parallel,
// then open the database in memory once both are ready
const dbPromise = Promise.all([getDbFromIPFS(), loadDbModule()])
  .then(([buffer, SQL]) => new SQL.Database(buffer))

async function getDbFromIPFS() {
  const node = await Ipfs.create()
  const stream = await node.cat('Qm...some1337cid...ftW')
  // Copy the incoming chunks into a single preallocated buffer
  let i = 0
  const data = new Uint8Array(12873728) // DB size in bytes
  for await (const chunk of stream) {
    data.set(chunk, i)
    i += chunk.length
  }
  return data
}

function loadDbModule() {
  return initSqlJs({
    locateFile: file => 'https://cdn.jsdelivr.net/npm/sql.js@1.6.2/dist/sql-wasm.wasm'
  })
}

function search(searchString) {
  // Bind the search string as a parameter instead of concatenating it into the SQL
  return dbPromise
    .then(database => database.exec(
      'SELECT * FROM torrent WHERE title LIKE $pattern LIMIT 10;',
      { $pattern: searchString + '%' }
    ))
}

search('something').then(result => console.log(result))
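To turn this into an actual search page, search() only needs to be wired to an input field. A rough sketch, assuming a #search text input and a #results list exist in the page and that the columns come back in (title, size, magnet) order:

document.querySelector('#search').addEventListener('input', event => {
  search(event.target.value).then(([result]) => {
    const rows = result ? result.values : []  // exec() returns [] when nothing matches
    document.querySelector('#results').innerHTML = rows
      .map(([title, size, magnet]) =>
        `<li><a href="magnet:?xt=urn:btih:${magnet}">${title}</a> (${size} bytes)</li>`)
      .join('')
  })
})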
4. Results
- Database size: 13 MB
- Records: 135,000, each defined as (title, size, category, magnet URI)
  - magnet URI stored without tracker information to save space
- Loading time (first visit): 8 seconds on a cold start (download from IPFS)
- Loading time (on reload): 0 seconds (browser caching)
5. Final thoughts
5.1 Pros ordered by relevance
- It resists censorship
  - taking down one website is useless; you would have to take the file off IPFS itself
- In 8 seconds you have full access to every torrent ever uploaded
- After the first visit, the loading time is 0 until the database is updated (a new CID to download)
5.2 Cons ordered by relevance
- Not fully decentralized
  - connecting to IPFS peers from the browser depends on a few WebRTC servers that act as gateways; maybe WebTorrent has more RTC servers
- Text search is limited to "starts with" or "contains"
  - unfortunately sql.js doesn't ship the full-text search module; other builds may include it
- Small database
  - since I consider 8 seconds the maximum a user should wait before actually being able to use a website of this kind, the database can't be heavier than 13 MB, which puts the upper limit at around 130,000 records
- Non-technical users may be scared off by the loading percentage (it can be perceived as an ad or a cryptominer)
Peace!
Footnotes
¹ Size expressed in bytes, stored as a string.