Getting PDF attributes (title, description) in bulk
-
Does anyone know of a way to crawl PDFs' metadata in bulk? Screaming Frog can give me a list of the PDFs, but it can't give me any info on their metadata (e.g. title, description).
I'm happy to hack together a solution if nothing is currently available. I'm just hoping to get some insight into where this metadata is stored and how to read it.
-
It's definitely an interesting question. I don't know why Screaming Frog doesn't do it yet; it sounds like a good feature request.
So, I looked into it a little further. I love Screaming Frog for its ease of use, but I know it's not the only crawler out there. I came back with two things that should work: Apache Nutch and Apache Tika. Of the two, Tika looks better suited to your needs out of the box.
I'll definitely play with Tika in the future.
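On where the metadata lives: a PDF keeps fields such as Title, Subject, Author and Keywords in its document information dictionary, and sometimes also in an embedded XMP packet. Tika reads those and can dump them as JSON. Here is a rough, untested-against-your-files sketch of how a bulk run might look in Python. It assumes the PDFs are already downloaded into a local folder, that Java is installed, and that a tika-app jar sits at the hypothetical path in TIKA_JAR. The key names in TITLE_KEYS and DESCRIPTION_KEYS are guesses, since Tika's metadata keys differ between versions; check them against the raw JSON for one of your own files.

```python
import csv
import json
import subprocess
import sys
from pathlib import Path

# Assumption: location of a downloaded tika-app jar; adjust to your setup.
TIKA_JAR = "tika-app.jar"

# Assumption: candidate key names, tried in order. Tika's keys vary by
# version, so verify these against the JSON Tika prints for your files.
TITLE_KEYS = ("dc:title", "pdf:docinfo:title", "title")
DESCRIPTION_KEYS = ("dc:description", "dc:subject", "pdf:docinfo:subject", "subject")


def first_present(metadata, keys):
    """Return the first non-empty value found under any of the given keys."""
    for key in keys:
        value = metadata.get(key)
        if isinstance(value, list):  # multi-valued fields arrive as lists
            value = "; ".join(str(v) for v in value)
        if value:
            return str(value)
    return ""


def tika_metadata(pdf_path, jar=TIKA_JAR):
    """Run tika-app on one file and return its metadata as a dict."""
    result = subprocess.run(
        ["java", "-jar", jar, "--json", str(pdf_path)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def main(folder, out_csv="pdf_metadata.csv"):
    """Write one CSV row (file, title, description) per PDF under folder."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "title", "description"])
        for pdf in sorted(Path(folder).rglob("*.pdf")):
            meta = tika_metadata(pdf)
            writer.writerow([
                str(pdf),
                first_present(meta, TITLE_KEYS),
                first_present(meta, DESCRIPTION_KEYS),
            ])


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

You would run it with the folder of PDFs as the only argument. Starting a JVM per file is slow, so for thousands of PDFs it may be worth looking at running Tika as a server instead, but this should be enough to see which fields your files actually carry.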
-
Thanks Travis, I'll take a look at both.