One of the things I dabble in is data mining. I'm quite good at getting computers to do difficult things. I also have quite strong political views and interests (although I'm not a 'Political Person' with a capital 'P').
So I built a prototype data workflow that scrapes the entire Register of Members' Interests from the HoC, and extracts funders, amounts and dates. This shows the tangled web of political funding.
#dataviz #ukpolitics #money
More on this project as it evolves.
The HoC has made the data as opaque as possible to get at. It's published as pure text in PDF files, and requires scraping. TheyWorkForYou.com do this as best they can and publish the data as XML, but it still requires post-processing.
A lot of the data is badly formatted and buggers up the calculations:
The process uses a KNIME workflow to scrape the register archives. Grey nodes are funders, coloured nodes are MPs.
At present, the scraping only goes back to 2022. I could extend it further but there's enough here to be getting on with right now.
The output is dumped into a Power BI dataset: I'm just getting to grips with this tool.
I'm currently working to fix anomalous amounts: the 'red supergiant' is Siobhain McDonagh.
That anomaly is anything but.
'Name of donor: Waheed Alli . Address of donor: private . Amount of donation or nature and value if donation in kind: Interest free loan of £1,200,000, to be repaid on the sale of the home I share with a family member. The move was necessary to provide the family member with complete ground floor access. . Date received: 14 March 2023 . Date accepted: 14 March 2023 . '
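For anyone curious what the post-processing has to cope with, here's a rough Python sketch of pulling the donor, amounts and dates out of an entry like the one above. To be clear, this is not the KNIME workflow itself: the regular expressions, the `parse_entry` function and the `is_loan` flag are all my own illustrative assumptions, and they only cover entries laid out like this one.

```python
import re
from datetime import datetime

# Hypothetical patterns, tuned only to the layout of the entry quoted above.
DONOR_RE = re.compile(r"Name of donor:\s*(.+?)\s*\.\s*Address of donor")
AMOUNT_RE = re.compile(r"£\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")
DATE_RE = re.compile(r"Date (received|accepted):\s*(\d{1,2} \w+ \d{4})")

SAMPLE = (
    "Name of donor: Waheed Alli . Address of donor: private . "
    "Amount of donation or nature and value if donation in kind: "
    "Interest free loan of £1,200,000, to be repaid on the sale of the home "
    "I share with a family member. The move was necessary to provide the "
    "family member with complete ground floor access. . "
    "Date received: 14 March 2023 . Date accepted: 14 March 2023 . "
)


def parse_entry(text):
    """Extract donor, amounts (in pounds) and dates from one register entry."""
    donor = DONOR_RE.search(text)

    amounts = []
    for match in AMOUNT_RE.finditer(text):
        pounds = int(match.group(1).replace(",", ""))
        pence = int(match.group(2) or 0)
        amounts.append(pounds + pence / 100)

    # Month names are parsed with %B, which assumes an English locale.
    dates = {
        kind: datetime.strptime(value, "%d %B %Y").date()
        for kind, value in DATE_RE.findall(text)
    }

    return {
        "donor": donor.group(1) if donor else None,
        "amounts": amounts,
        "dates": dates,
        # Assumption: flag loans so they can be told apart from donations.
        "is_loan": bool(re.search(r"\bloan\b", text, re.IGNORECASE)),
    }


entry = parse_entry(SAMPLE)
```

Flagging loans separately is just one way to stop a single big entry swamping the totals; whether a loan should count alongside donations is a judgement call.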
@deadlyvices Whoa! Now that IS interesting.
@sellathechemist It's a very interesting dog's breakfast
@deadlyvices I'm sure this is not accidental.
@deadlyvices This is brilliant. Big data is not just a tool for the powerful.