Dear System Administrators…

we need to talk.

As data warehouse specialists, our job is to analyze your customers data. Oftentimes, this data is stored in one of your operational database systems and in log files stored somewhere on the server you’re taking care of.

We source 80 % of our data from these two data sources. Databases and log files.

Now comes the caveat. Our clients (and your users) want to analyze historical data. They want to know the development of a product’s sales over time. They want to analyze the traffic on their web sites over the last 5 years.

So, our first question is this:

Do you have daily backups of your data?

If yes, did you keep them for the last 5 years?

How about your log files?

For some reason (mostly storage), system administrators do not take the overhead of keeping a daily  backup of their operational databases and log files. Our problem, as data analysts, is that product information (such as the price of a product or it’s features) change over time. When we come to a client, we often see only the last state of a product. That is, because your operational systems store only the latest version of a record. For that reason, we hate SQL’s UPDATE statement, because we are also interested in the old version of the record.

We have never met. But there is a good chance that one of our colleagues will come to you one day and ask for one thing: historical data. Please, do us two favours. Dump your database tables to a compressed CSV or XML every day and turn off log rotate (or backup your log files daily, as well).

For example, a simple command for MySQL is doing all the heavy work for you:

mysqldump --all-databases --xml <code>--lock-tables=false</code> -ppassword | gzip -c | cat &gt; /media/bak/servername-$(date +%Y-%m-%d)-mysql.xml.gz

While we don’t like XML files that much, this command backups all your databases to a compressed XML file. Daily if you schedule it using a CRON job. The only caveat is that you need to have shell access to the database server.

Another option is the use of a tool, such as EMS Data Export for MySQL (which is also available for other database systems, such as Oracle, Microsoft SQL Server, DB2, PostgreSQL and Firebird/Interbase). The process for exporting the tables of a given database using their tools is described in the following articles:

Thank you for reading. Please leave your comments.

3 thoughts on “Dear System Administrators…

  1. Pingback: Week in Review | Michael Olschimke

  2. Hi, The post is informative. I haven’t worked on database, Just curious to know where all these Historical data are stored?

    • Hi Raman,

      the historical data is stored in zipped files – not the best option, but better than nothing.
      If you have a chance and your SQL Server supports this, turn on CDC, which is more fun from a BI perspective.

      In any other case, I would take zipped database exports as described in the article.

      Cheers,

      Michael

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>