we need to talk.
As data warehouse specialists, our job is to analyze your customers data. Oftentimes, this data is stored in one of your operational database systems and in log files stored somewhere on the server you’re taking care of.
We source 80 % of our data from these two data sources. Databases and log files.
Now comes the caveat. Our clients (and your users) want to analyze historical data. They want to know the development of a product’s sales over time. They want to analyze the traffic on their web sites over the last 5 years.
So, our first question is this:
Do you have daily backups of your data?
If yes, did you keep them for the last 5 years?
How about your log files?
For some reason (mostly storage), system administrators do not take the overhead of keeping a daily backup of their operational databases and log files. Our problem, as data analysts, is that product information (such as the price of a product or it’s features) change over time. When we come to a client, we often see only the last state of a product. That is, because your operational systems store only the latest version of a record. For that reason, we hate SQL’s UPDATE statement, because we are also interested in the old version of the record.
We have never met. But there is a good chance that one of our colleagues will come to you one day and ask for one thing: historical data. Please, do us two favours. Dump your database tables to a compressed CSV or XML every day and turn off log rotate (or backup your log files daily, as well).
For example, a simple command for MySQL is doing all the heavy work for you:
mysqldump --all-databases --xml <code>--lock-tables=false</code> -ppassword | gzip -c | cat > /media/bak/servername-$(date +%Y-%m-%d)-mysql.xml.gz
While we don’t like XML files that much, this command backups all your databases to a compressed XML file. Daily if you schedule it using a CRON job. The only caveat is that you need to have shell access to the database server.
Another option is the use of a tool, such as EMS Data Export for MySQL (which is also available for other database systems, such as Oracle, Microsoft SQL Server, DB2, PostgreSQL and Firebird/Interbase). The process for exporting the tables of a given database using their tools is described in the following articles:
- Export all tables of an Oracle database
- Export all tables of a Microsoft SQL Server database
- Export all tables of a DB2 database
- Export all tables of a PostgreSQL database
- Export all tables of a Firebird/Interbase database
- Export all tables of a MySQL database
Thank you for reading. Please leave your comments.