Data that contain errors. Dirty data take many forms, including: spelling or punctuation errors, incorrect values or a data type that does not match the field, incomplete or outdated data, duplicate data, inconsistent data, incorrectly ordered data, and improperly parsed fields from disparate systems. Errors can be introduced at any stage as data are entered, stored and managed. Using a dirty dataset can lead to spurious associations, false conclusions and misdirected investments. SYNONYM. Dirty dataset
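EXAMPLE. A minimal Python sketch of how three of the error types named in this entry (incomplete data, incorrect data type and duplicate data) might be flagged. The sample records, the field names and the helper find_issues are hypothetical and chosen only for illustration; real data cleaning usually needs checks specific to the dataset.

```python
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
RECORDS = [
    {"id": 1, "name": "Ada Lovelace", "age": "36"},
    {"id": 2, "name": "", "age": "41"},                # incomplete: blank name
    {"id": 3, "name": "Alan Turing", "age": "forty"},  # wrong type: age is not a number
    {"id": 1, "name": "Ada Lovelace", "age": "36"},    # duplicate of the first record
]


def find_issues(records):
    """Return (row_index, issue) pairs for three simple dirty-data checks."""
    issues = []
    seen = Counter()
    for index, record in enumerate(records):
        # Incomplete data: any field that is None or blank.
        for field, value in record.items():
            if value is None or str(value).strip() == "":
                issues.append((index, f"missing {field}"))

        # Incorrect data type: a non-blank age should parse as an integer.
        age = record.get("age")
        if age is not None and str(age).strip() != "":
            try:
                int(age)
            except (TypeError, ValueError):
                issues.append((index, "age is not an integer"))

        # Duplicate data: the same field/value pairs have appeared before.
        key = tuple(sorted(record.items()))
        seen[key] += 1
        if seen[key] > 1:
            issues.append((index, "duplicate record"))
    return issues


if __name__ == "__main__":
    for index, issue in find_issues(RECORDS):
        print(index, issue)
```

The sketch only reports problems and leaves the records unchanged, since deciding whether to correct, drop or keep a flagged record depends on how the data will be used.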