I am not an expert in database design, since for most of my career I have worked with alternative data storage and data access solutions. But one of the very first projects I had to do back in 1985, when I was a student, was to write the code for a fully functional database architecture, in Pascal, from scratch. You will probably find some of my questions naive, and some intriguing.

  • When were variable-width fields first introduced? How is this type of data stored, depending on vendor or implementation: arrays, variable arrays, linked lists, or something else? Why is it always necessary to specify a max length? Is it to make indexing more efficient, or because of limitations in the way packets are transmitted across an intranet or the Internet? Is NoSQL technology better at dealing with variable-width records?
  • How do you store images or videos in databases? I'm talking about physically storing the videos, e.g. using the 'blob' binary data type. And how do you store vector images? In graph databases?
  • Why is importing CSV columns containing variable-width text so cumbersome with SQL Server? Is it any easier with other database systems?
  • Are there any fast, efficient database clients that let you perform some of the computations (maybe sorting or simple analytics) in memory with traditional SQL: NOT on the database server itself, but locally on your machine, or even on some external machine? Or is the only option based on cloud-computing technology, MapReduce, Hadoop, etc.? By fast client, I mean something far superior to Toad or Brio (the two clients I've been working with), as basic data selection (without joins) using their interface, on your desktop, is 100 times slower than querying the database server directly from a Python script.

Comment by Vincent Granville on July 9, 2013 at 12:58pm

@Richard: At eBay about 4 years ago, there was a system telling you about DB server usage ("red" meaning tons of users at the same time, or some very big queries running or in the queue). DB access/processing (Teradata) could be very slow at times, although I think you could schedule your job during the night if priority was low. That's how I came up with the idea (probably not an original one) of moving some of the processing (painful joins, sorting, etc.) off the fast but overloaded DB server and onto your slow but underloaded local machine.
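To make the idea concrete, here is a minimal sketch in Python, not a description of any system used at eBay. It assumes the raw rows have already been pulled from the server with simple, join-free selects, and it uses the standard-library sqlite3 module as a stand-in for a local in-memory SQL engine. The table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def local_join_and_sort(users, orders):
    """Join, aggregate, and sort two extracted row sets locally.

    `users` is a list of (user_id, name) tuples and `orders` is a list of
    (order_id, user_id, amount) tuples. Both schemas are hypothetical.
    """
    # ":memory:" keeps the whole database in local RAM; nothing touches the server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO users VALUES (?, ?)", users)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
        # The join and the sort both run on the local machine.
        return conn.execute(
            "SELECT u.name, SUM(o.amount) AS total "
            "FROM users u JOIN orders o ON o.user_id = u.user_id "
            "GROUP BY u.user_id, u.name "
            "ORDER BY total DESC"
        ).fetchall()
    finally:
        conn.close()


# Hypothetical rows, standing in for the results of two plain SELECTs on the server.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(10, 1, 20.0), (11, 2, 5.0), (12, 1, 7.5), (13, 3, 40.0)]

result = local_join_and_sort(users, orders)
for name, total in result:
    print(name, total)
```

Whether this pays off depends on how much data has to travel over the network first: the server only does cheap single-table selects, but the local machine must be able to hold both extracts in memory.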
