Bypassing Newspapers.com paywall and hunting down obituaries

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ@lemmy.dbzer0.com – 60 points – 12 months ago

(As part of the Reddit migration, any time I'm only able to find info on Reddit, I'm reposting it to kbin/Lemmy.)

TL;DR - To get the page's OCR text from Newspapers.com, take the URL of the page with the thumbnail and replace /image/ with /newspage/.

EDIT: @godless pointed out that some libraries have access to Newspapers.com through a Library Edition portal. My local library has several newspaper archives, and I figured the first couple would be the most complete. Nope, but there was Newspapers.com Library Edition access buried below the fold. That worked!

Bonus tip - Also search for current info on close family members. The Spokeo hit came from searching his mother's name, and Spokeo is too dumb to understand that deceased people don't move with their families to future homes. It treated his records as if he were living ("Current" address, phone numbers, etc. were listed, even though they were for his sister, who's still alive).

And here's my rant/vent/story...

I was looking for an obituary in that nebulous early '90s time period where only some info is digitized. His family's having a memorial for him next week and I was hoping to bring a pic of the newspaper from his birthday and deathday, along with the obit. I had a general idea of the date of death, knew the city and funeral home, and his name minus middle initial. Sites like legacy.com refused to return a match. Even the state and county records sites were useless.

After a couple hours, I had only 2 partial hits. Bing Chat (yeah, I was surprised, too) said it found the obit, but it was locked behind a paywall. The newspaper that had it (which I had checked earlier) said nothing was there. It appears that its obits are only available online going back to 2004. Dates before that were supposedly available in the paper's archive. The archive was 404. Or, rather, the entire domain was 404.

The second hit was on Spokeo - one of those obnoxious sites that gives partial info and then wants you to subscribe to 3 different levels of services. But from there I got his middle initial and the exact birth and death dates. That info helped.

I eventually made it to Newspapers.com, which threw up a paywall, but indicated it had the info. I did the usual: checking the source and CSS, reader mode, incognito, etc. Judging by the CSS, the image was probably there. Nope. The only info I could find on getting through that barrier was on Reddit. It doesn't lead to the paper image, just the OCR text. Take the URL of the page with the thumbnail and replace /image/ with /newspage/.
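If you'd rather not hand-edit the URL, here's a minimal sketch of that swap in Python. It only rewrites the string and doesn't fetch anything. The function name and the example URL (including the page ID) are made up for illustration, and I'm assuming /image/ shows up as its own path segment, which is all the tip above tells us.

```python
from urllib.parse import urlsplit, urlunsplit


def image_to_newspage(url: str) -> str:
    """Swap the first 'image' path segment for 'newspage'.

    Hypothetical helper: the URL layout is assumed from the tip in this
    post, not taken from any documented Newspapers.com API.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Replace only the first matching segment; leave the rest of the path,
    # the query string, and the fragment untouched.
    for i, segment in enumerate(segments):
        if segment == "image":
            segments[i] = "newspage"
            break
    return urlunsplit(parts._replace(path="/".join(segments)))


# Example with a made-up page ID.
print(image_to_newspage("https://www.newspapers.com/image/123456789/"))
```

Working on path segments instead of a blind find-and-replace means a URL that happens to contain "image" somewhere else (say, in a query parameter) doesn't get mangled.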

Good. It existed and was exactly where I was expecting through the whole search. Now to get the paper image that the text was extracted from... nope. Gotta sign up.

One last thing to try again, since Newspapers.com gave me the exact PAGE NUMBER.

I tried looking into the archives of the paper available in the library's database. It appears most obits (the non-newsworthy ones) were excluded. My hypothesis is that the paper sold the archives to a site that stipulated that they must be excluded from other sources. It's the only explanation I can think of.

So, looks like I'll be visiting the library Monday to see if they have microfiche of the paper. WTF is going on that I can't find a major metropolitan newspaper's obit section in 2023? I can find 15 million pictures of influencers' breakfasts, but a 2x2 inch shred of paper is completely inaccessible. Not even a torrent out there of this stuff because who the fuck would make it hard to find an old newspaper?

(Forgot to mention that I used Google, Bing, DDG, and SearXNG. Bing was the most helpful, Google the least helpful.)

This shit right here is why I pirate - "great" business models. If there was a torrent of the entire decade's worth of that newspaper, it would have been easier to download that, compared to jumping through all these hoops.

I know this is a piracy community, but honestly I don't mind paying for Newspapers.com. I use it a lot just to read up on old articles and stuff, and they seem to be doing a pretty good job adding new newspapers to the archive all the time.

For me piracy is great when the product is just outright overpriced because some corporate tools in New York or Los Gatos are trying to make their VC people happy.

I'm sure it's a fine service if you want to use it regularly, but I just wanted 1 tiny thing. If they had a $1-per-obit or $1-per-page deal, sure. Instead, there's this whole microcosm of bullshit where some are archived, others available, some omitted from public collections, some on different 3rd party sites, etc.

The family paid for an obit. It wasn't in the 1800s. The paper has been digitized. I should be able to go to the paper with the name, exact date, and city and find it. They literally say it doesn't exist. Not "it's on our archive site" or "it's on our partner site", just nothing.

I would have thrown a couple bucks to any of the sites for access, but no, I need to sign up for a subscription, give them all my details, get spam calls for the next 100 years, just no. Super frustrating.

I agree. There should be some sort of a la carte service where you can pay a couple of bucks and use it for like a day.