I’ve not done exactly this before, but I have done similar things (programmatic manipulation of a big pile of Lots Of Things).
IME, whatever automated strategy you go for won’t be very good initially and will want refinement. So you need to be able to try an approach, see how well it went, tweak it, try again, etc.
Doing all that trial and error over an Internet connection will be painfully slow, so I would:
1. Download everything to a local machine (Gmail does provide IMAP access last time I looked).
2. Implement a filtering tool using your ideas for what wants keeping. You could look at something like SpamAssassin, or indeed script it yourself in something like Python.
3. Run the tool on a copy of your downloaded email mountain.
4. See how it did. Good enough? Goto 5. Needs work? Goto 2.
5. Upload reduced mountain to new email service.
6. Six months later, realize you missed something important and be glad you forgot to delete the downloaded mountain.
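For step 1, here is a minimal sketch using only Python's standard `imaplib` and `mailbox` modules. It assumes IMAP is enabled on the Gmail account, that you log in with an app password, and that the folder holding everything is called `[Gmail]/All Mail` (that name can differ with the account's language setting). The function names `fetch_all` and `save_raw_messages` are hypothetical, and mbox is just one convenient local format.

```python
import imaplib
import mailbox


def fetch_all(host, user, password, folder='"[Gmail]/All Mail"'):
    """Yield the raw bytes of every message in one IMAP folder.

    The folder is opened read-only, so read/unread flags are left alone.
    """
    conn = imaplib.IMAP4_SSL(host)  # port 993 by default
    try:
        conn.login(user, password)
        typ, _ = conn.select(folder, readonly=True)
        if typ != "OK":
            raise RuntimeError(f"could not select folder {folder}")
        # UIDs stay stable across sessions, unlike sequence numbers.
        typ, data = conn.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            typ, msg_data = conn.uid("FETCH", uid, "(RFC822)")
            # A successful fetch returns [(envelope_line, raw_bytes), b')'].
            if typ == "OK" and msg_data and isinstance(msg_data[0], tuple):
                yield msg_data[0][1]
    finally:
        conn.logout()


def save_raw_messages(raw_messages, mbox_path):
    """Append raw RFC 822 messages to a local mbox file; return how many."""
    box = mailbox.mbox(mbox_path)
    box.lock()
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```

Usage would be something like `save_raw_messages(fetch_all("imap.gmail.com", "you@gmail.com", app_password), "gmail-all.mbox")`. Fetching one message per round trip is slow, but it only has to happen once; everything after that runs against the local file.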
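Steps 2 to 4 could start from a hand-rolled filter like the sketch below. The keep rules (`KEEP_SENDERS`, `KEEP_SUBJECT`, treating `List-Unsubscribe` and `Precedence: bulk` as signs of bulk mail) are purely illustrative assumptions to be replaced with your own ideas of what wants keeping. The source mbox is only read, never modified, and rejected messages go to a separate file so you can skim them when judging how the run went.

```python
import email.utils
import mailbox
import os
import re

# Hypothetical keep rules: replace these with your own criteria.
KEEP_SENDERS = {"bank@example.com", "landlord@example.com"}
KEEP_SUBJECT = re.compile(r"\b(invoice|receipt|contract|tax)\b", re.IGNORECASE)
BULK_PRECEDENCE = {"bulk", "list", "junk"}


def should_keep(msg):
    """Return True if a message should survive the cull."""
    sender = email.utils.parseaddr(str(msg.get("From", "")))[1].lower()
    subject = str(msg.get("Subject", ""))
    if sender in KEEP_SENDERS or KEEP_SUBJECT.search(subject):
        return True
    # Mailing lists and newsletters usually carry one of these headers.
    if "List-Unsubscribe" in msg:
        return False
    if str(msg.get("Precedence", "")).strip().lower() in BULK_PRECEDENCE:
        return False
    # Anything not clearly bulk mail is kept: a false keep is cheap,
    # a false delete is not.
    return True


def filter_mbox(src_path, keep_path, drop_path):
    """Split src_path into keep_path and drop_path; return (kept, dropped)."""
    for path in (keep_path, drop_path):
        if os.path.exists(path):
            os.remove(path)  # each trial run starts from scratch
    src = mailbox.mbox(src_path, create=False)
    keep = mailbox.mbox(keep_path)
    drop = mailbox.mbox(drop_path)
    kept = dropped = 0
    try:
        for msg in src:
            if should_keep(msg):
                keep.add(msg)
                kept += 1
            else:
                drop.add(msg)
                dropped += 1
    finally:
        keep.close()  # close() also flushes pending writes
        drop.close()
        src.close()
    return kept, dropped
```

Each pass of the "needs work? Goto 2" loop is then: edit the rules, rerun `filter_mbox`, compare the counts, and look through the dropped file for anything that should have survived.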
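Step 5 can also go over IMAP, assuming the new service offers it: `imaplib`'s `append` uploads one message at a time. This sketch assumes a hypothetical host, credentials, and target folder `INBOX`. It passes each message's original `Date` header as the IMAP INTERNALDATE so the new service should show the original dates rather than the upload time, and it sends the stored bytes unchanged via `mailbox.mbox.get_bytes`.

```python
import email.utils
import imaplib
import mailbox
import time
from datetime import timezone


def internal_date_for(msg):
    """IMAP INTERNALDATE string built from the message's Date header.

    Falls back to the current time if the header is missing or unparseable,
    so one bad date does not abort the upload.
    """
    raw_date = msg.get("Date")
    try:
        dt = email.utils.parsedate_to_datetime(str(raw_date)) if raw_date else None
    except (TypeError, ValueError, IndexError):
        dt = None
    if dt is None:
        return imaplib.Time2Internaldate(time.time())
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    return imaplib.Time2Internaldate(dt)


def upload_mbox(mbox_path, host, user, password, folder="INBOX"):
    """APPEND every message in a local mbox to an IMAP folder; return count."""
    src = mailbox.mbox(mbox_path, create=False)
    conn = imaplib.IMAP4_SSL(host)
    uploaded = 0
    try:
        conn.login(user, password)
        for key, msg in src.iteritems():
            raw = src.get_bytes(key)  # original bytes, not re-serialized
            typ, data = conn.append(folder, None, internal_date_for(msg), raw)
            if typ != "OK":
                raise RuntimeError(f"APPEND failed after {uploaded} messages: {data}")
            uploaded += 1
    finally:
        conn.logout()
        src.close()
    return uploaded
```

Because the source file is only read, a failed or interrupted upload leaves the reduced mountain intact; just note the reported count before retrying so you don't upload duplicates.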