I don't think it would be difficult (for the NSA, CIA, etc.) to automate. The big part is getting taps on the data line of any interesting site and of the user.
These record the IP addresses and the arrival/departure times of the data sent/received (the metadata; Snowden said they are already doing
this).
You then build a signature from these times and match them up. As long as the data path is constant and the variation in delay is much smaller than the time
between data exchanges (mouse clicks), they can match you after 5-10 exchanges. For a low-traffic site, given enough exchanges from a visitor,
there is no need to even match the IP addresses.
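A minimal sketch of the signature idea, assuming the signature is just the list of gaps between consecutive exchanges and that both taps saw the same, aligned set of exchanges. A constant path delay shifts every timestamp equally, so it drops out of the gaps. The function names, the sample timestamps, the 120 ms delay and the 50 ms jitter tolerance are all hypothetical, chosen only for illustration.

```python
from typing import Sequence


def gaps(times: Sequence[float]) -> list[float]:
    # The "signature": time between consecutive exchanges (clicks).
    # A constant path delay shifts every timestamp equally, so it cancels here.
    return [b - a for a, b in zip(times, times[1:])]


def signatures_match(site_times: Sequence[float], user_times: Sequence[float],
                     jitter: float = 0.05) -> bool:
    """True if both timestamp series have the same gaps to within `jitter` s.

    Assumes both series cover the same exchanges (same length, aligned) and
    that delay variation is much smaller than the gaps themselves."""
    if len(site_times) != len(user_times) or len(site_times) < 2:
        return False
    return all(abs(s - u) <= jitter
               for s, u in zip(gaps(site_times), gaps(user_times)))


user = [0.0, 3.2, 7.9, 8.6, 15.1]   # hypothetical send times at the user's ISP
site = [t + 0.12 for t in user]     # same clicks, 120 ms later at the site
other = [0.0, 2.0, 7.9, 8.6, 15.1]  # a different (hypothetical) user
print(signatures_match(site, user))   # True
print(signatures_match(site, other))  # False
```

With only a handful of gaps, each needing to agree within the tolerance, the chance that an unrelated user matches falls off quickly as more exchanges are added, which is the intuition behind "5-10 exchanges".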
The matching should be O(N), where N is the number of users being watched.
First sort them into bins based on the times. N = millions is not an issue.
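The binning step could look roughly like this: a sketch, assuming each watched user's gaps are quantized into bins of a hypothetical `width` seconds and hashed into a dictionary. Hashing into bins, rather than doing a comparison sort, is what keeps building the index at one pass over the N users. Because jitter can push a gap just across a bin edge, a lookup probes the neighbouring bins as well and then confirms with the exact tolerance test. The names, bin width, jitter and sample data are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def gaps(times):
    # Inter-arrival gaps between consecutive exchanges
    return [b - a for a, b in zip(times, times[1:])]


def gap_key(times, width):
    # Quantize each gap into a bin index of `width` seconds
    return tuple(int(g // width) for g in gaps(times))


def build_index(users, width):
    """Single pass over the N watched users: O(N) time."""
    index = defaultdict(list)
    for uid, times in users.items():
        index[gap_key(times, width)].append(uid)
    return index


def lookup(index, users, site_times, width, jitter):
    """Return users whose gaps all agree with the site's within `jitter`.

    Requires jitter <= width, so a matching gap lands in the same bin or an
    adjacent one. Probes 3**k neighbouring keys for k gaps, a cost that does
    not grow with N."""
    key = gap_key(site_times, width)
    site_gaps = gaps(site_times)
    matches = []
    for offsets in product((-1, 0, 1), repeat=len(key)):
        probe = tuple(k + o for k, o in zip(key, offsets))
        for uid in index.get(probe, []):  # .get does not insert new keys
            # Bins are coarse; confirm with the exact tolerance test
            if all(abs(a - b) <= jitter
                   for a, b in zip(gaps(users[uid]), site_gaps)):
                matches.append(uid)
    return matches


users = {  # hypothetical timestamps tapped at each user's ISP
    "alice": [0.0, 3.2, 7.9, 8.6, 15.1],
    "bob": [0.0, 2.0, 5.5, 9.0, 12.0],
}
index = build_index(users, width=0.1)
site = [t + 0.12 for t in users["alice"]]  # alice's clicks as seen at the site
print(lookup(index, users, site, width=0.1, jitter=0.05))
```

Each user is stored under exactly one key, so a user can appear at most once in the result; the 3**k probe cost is fixed by the window length, not by how many users are being watched.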
I think Tor uses the same data path for each session, so it's no help;
against the NSA, it's a "honey pot".
Algorithm:
Write a bot to read SM (social media) and look for interesting words.
Identify an interesting poster; get his post times.
Match the post times to the site's data I/O times.
Match those data times to Joe Blow's data times, tapped from his ISP.
Send robot cop to arrest Joe.