LSI and higher-order techniques, such as PARAFAC, can decompose term-document matrices into individual concepts. One potential application of this may be to predict the topics of damaged files based on filename, metadata, and what data is present and uncorrupted.
(Note to self: potential future research area).