Aral Balkan @aral

Better Blocker’s HTTP Archive (HAR) collection now available over DAT

Our evolving collection of HAR files, gathered from our web crawls in search of trackers, is now available over DAT under a Creative Commons ShareAlike license for researchers and anyone else who’s interested:

archive.better.fyi

DAT link:

dat://2f063e531e6352f0326b8e0f076d74fc6fd694d77d4d5dafeb77659847214247

Please do let me know how you get on if you have a play with it. mastodon.ar.al/media/6fJB_6ySk


@aral Very interesting. What kind of scientific questions would you like researchers to answer with this data?

@hiemstra That’s up to them. We gather and use it to classify the prevalence of (and to block) third-party trackers.

@aral Sure, just wondered if there's anything you would like us to work on.

@hiemstra Visualisations are always great to have and help raise awareness. Also, classifications: are there patterns between groups of trackers, the types of sites, etc.? It would be interesting to see IP-address-related studies for clustering trackers or determining ownership. Those are just off the top of my head. Anything that gives us further insight and possibly even results in predictive blocking abilities would help the Web :)
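
As a starting point for the kind of prevalence analysis mentioned above, here is a minimal sketch that counts requests per third-party host in a single HAR file. It relies only on the standard HAR layout (`log.entries[].request.url`). The function names and the idea of passing in the crawled site's host are assumptions for illustration, not how Better Blocker does it, and the "third-party" test is a naive hostname suffix check rather than a proper public-suffix comparison.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    """Load a HAR file (HAR is plain JSON) from a local path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def third_party_hosts(har: dict, first_party_host: str) -> Counter:
    """Count requests per host that is neither the crawled site nor a subdomain of it.

    Assumption: first_party_host is supplied by the caller; this sketch does
    not try to infer it from the HAR file itself.
    """
    counts = Counter()
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname
        if host is None:
            # e.g. data: URLs have no hostname
            continue
        if host == first_party_host or host.endswith("." + first_party_host):
            continue
        counts[host] += 1
    return counts
```

Summing these counters across many crawled sites would give a rough ranking of the most widespread third-party hosts, which could then feed a visualisation or a clustering step.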

@aral Thanks! This is helpful. Looks like an interesting dataset to explore.

@hiemstra Neat :) Look forward to seeing what you do with it :)