
Slightly related to this: where can I find data sanitizers for common file formats (PDF, MP3 and so on)?


I strip all mp3 metadata using the 'id3mtag' tool[1].

  id3 -d *.mp3 ; id3 -2 -d *.mp3
That deletes all tags, both ID3v1 and ID3v2.

I don't do this for security. I just don't like MP3 metadata competing with the metadata in the filename, and most MP3 metadata is laughably bad anyway[2], so I just wipe it.

[1] /usr/ports/audio/id3mtag on FreeBSD

[2] Misspellings, "First Last" instead of "Last, First", ALL CAPS ALL THE TIME, and special characters/Unicode that always break car stereo implementations.
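For anyone without id3mtag, here is a minimal Python sketch of what deleting both tag types amounts to at the byte level. It assumes at most one ID3v2 tag at the very start of the file and at most one ID3v1 tag in the last 128 bytes. It ignores other trailers such as APE, Lyrics3 or "Enhanced TAG" blocks, and the function names are my own, not anything from id3mtag.

```python
from pathlib import Path


def syncsafe_to_int(b: bytes) -> int:
    # ID3v2 tag sizes are "syncsafe": 4 bytes, only the low 7 bits of each count.
    n = 0
    for byte in b:
        n = (n << 7) | (byte & 0x7F)
    return n


def strip_id3(data: bytes) -> bytes:
    # Leading ID3v2 tag: "ID3", major version, revision, flags, 4-byte size.
    # The size excludes the 10-byte header (and the optional v2.4 footer).
    if len(data) >= 10 and data[:3] == b"ID3":
        major, flags = data[3], data[5]
        size = syncsafe_to_int(data[6:10])
        has_footer = major == 4 and bool(flags & 0x10)
        data = data[10 + size + (10 if has_footer else 0):]
    # Trailing ID3v1 tag: a fixed 128-byte block that starts with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]
    return data


def strip_file(path: str) -> None:
    # Rewrites the file in place; keep a backup if the originals matter.
    p = Path(path)
    p.write_bytes(strip_id3(p.read_bytes()))
```

Usage would be something like `for f in glob.glob("*.mp3"): strip_file(f)`. Note the ID3v1 check is a heuristic: audio data that happens to contain "TAG" exactly 128 bytes from the end would be cut too.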


What counts as sanitizing? How do you know a file is malicious?


Especially with PDFs, my "sanitization" can be your "stripped away all the fonts and functionality - might as well have given me a plain .TXT", and vice versa.


"might as well have given me a plain .TXT"

Yes, please - that sounds fantastic.


I agree, but 1. it's surprisingly complicated as a general solution (positioning and such), and 2. it's not really a solution for the usual end user (who might appreciate a JPEG instead).


(btw there's `pdftotext`, which is pretty good in most cases)


Read data according to spec, drop stuff that is incorrect and write it back.

For example, if an MP3's genre field is 999 bytes long, cut it down to 32 bytes.
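A minimal Python sketch of the "drop what's incorrect" step, assuming the tag has already been parsed into a dict of text fields. The field names and the `FIELD_LIMITS` table are hypothetical: the 32-byte genre limit just follows the example above and is not taken from the ID3 spec (in ID3v1 the genre is a single byte; ID3v2 text frames have no such cap). Writing the cleaned fields back out is left to whatever tag writer you use.

```python
# Hypothetical per-field byte limits; "genre": 32 mirrors the comment above,
# not a value from any ID3 specification.
FIELD_LIMITS = {"title": 30, "artist": 30, "album": 30, "genre": 32}


def clamp_utf8(text: str, limit: int) -> str:
    # Truncate to at most `limit` bytes of UTF-8; errors="ignore" discards
    # a multi-byte character that the cut would otherwise split in half.
    raw = text.encode("utf-8")[:limit]
    return raw.decode("utf-8", errors="ignore")


def sanitize_fields(fields: dict[str, str]) -> dict[str, str]:
    clean = {}
    for name, value in fields.items():
        if name not in FIELD_LIMITS:
            continue  # unknown field: drop it rather than pass it through
        # Drop control characters and line breaks (anything not printable).
        value = "".join(ch for ch in value if ch.isprintable())
        clean[name] = clamp_utf8(value, FIELD_LIMITS[name])
    return clean
```

The whitelist approach (keep only known fields, clamp each one) is the point here: anything the spec doesn't describe never makes it into the rewritten file.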



