Good morning all,

I am conducting research on data warehousing and analytics tools for a graduate project using my organization's data sources. I work at a college, so we often have a list of students for whom we need additional information. Our current system (Informer) limits filters to around 50 values, which forces us to run a query multiple times. This seems inefficient to me. Could anyone advise whether there is a better system that allows large filters, sometimes up to 10,000 values? I am open to both open-source and commercial systems.

Thank you,

-------------------------------------------
Michelle Ford
Research Analyst
-------------------------------------------


Replies to This Discussion

Are you sure you can't simplify the conditions? Are there other derived conditions (particular cohort years, for example) that would be better suited to filtering within the solutions you've already built out?

If you can't simplify your restrictions, it sounds like you would have better luck constructing a raw SQL query. If your Informer system is Entrinsik Informer, then it should just be a wrapper around one or more underlying databases. Depending on which database it is, you should be able to install a developer studio that will accommodate your needs -- MySQL Workbench for MySQL, SQL Server Management Studio for Microsoft SQL Server, and Oracle SQL Developer for Oracle. All of these tools are free for developers -- the pricing model is built around the server software, so those vendors have already gotten their dues.

The first stop down this route would be your college's IT department, to figure out what type of database your data warehouse connects to. Then get permission to download the relevant developer studio, obtain the proper logins, and start building your SQL query.

For a filter that large, I would either find a way to derive the sub-selection criteria with the more flexible SQL syntax, or at the very worst load the filter values into a temp table. Then I would use that temp table in an INNER JOIN against the table you want to filter, so that you pull out only the matches between your filter list and the desired source table.
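As a rough sketch of the temp-table approach -- the table and column names here are just placeholders, and the #temp syntax is SQL Server's (MySQL and Oracle spell temporary tables a little differently):

    -- 1. Stage the filter list in a temp table. The IDs can be pasted in as
    --    INSERT statements or bulk-loaded from a file; there's no 50-value cap.
    CREATE TABLE #filter_ids (student_id VARCHAR(20) PRIMARY KEY);

    INSERT INTO #filter_ids (student_id)
    VALUES ('A0001'), ('A0002'), ('A0003');  -- ...and so on, even 10,000 rows

    -- 2. INNER JOIN the temp table to the source table so only matching
    --    students come back, in a single pass.
    SELECT s.*
    FROM students AS s
    INNER JOIN #filter_ids AS f
        ON f.student_id = s.student_id;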

Thank you, Andrew. It is Entrinsik Informer, and we are connected to a U2 data source. The problem is that if students add or drop after the 14th day, the list will not be the same when running a query from Informer. For federal and other reporting, we treat the 14th day as final, but at times we need additional information that is not archived with the 14th-day data. So some students could be left out or added if we relied on querying the live database.

To put it in perspective, the other week I spent two hours running a query 21 times to get the information I needed, because the filter maxed out at 50 IDs and I had a list of 1,002 students. Now that I think about it, I may have been able to run it once with a filter on the term, then again with a filter on the missing IDs, and remove any extra items. Doh!
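Once we have this in SQL, that kind of cleanup should just be a couple of set operations instead of 21 runs. Something like the following, with made-up table names (my_list holding my 1,002 IDs and term_results holding the term-filtered pull):

    -- Extras the term filter pulled in that aren't on my list (to remove):
    SELECT student_id FROM term_results
    EXCEPT
    SELECT student_id FROM my_list;

    -- IDs on my list that the term filter missed (to look up separately):
    SELECT student_id FROM my_list
    EXCEPT
    SELECT student_id FROM term_results;

(EXCEPT is SQL Server/PostgreSQL syntax; older MySQL versions would need a LEFT JOIN ... WHERE ... IS NULL instead.)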

I do like your idea of storing the information in a SQL database and then connecting it to Informer. I will have to get with IT on this. I am starting to store our historical files in an Access database, simply to have them all in one location, but I need to clean the historical data so it can be queried.


Ehh, not familiar with U2. If the Rocket (the manufacturer) documents are to be believed, it does support a SQL interface through the ODBC (Open Database Connectivity) driver set, but it requires a bit of database configuration to perform the necessary mapping. It might be overkill, but it would add more flexibility without requiring you to learn a rather niche DB query language.

I used to work for my local state higher ed department, and we similarly had problems with late enrollments being caught up in new data pulls. We ended up using SQL with an audit trail table to "rewind" a student's school and status to an arbitrary point in the past. As complicated as Institutional Research can get, it's extremely helpful to tackle the problem from a SQL standpoint so your data is structured right.
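To illustrate the rewind idea (the schema here is illustrative -- ours was more involved): if every change lands in an audit table like status_audit(student_id, school, status, changed_at), then each student's state as of the census date is just their most recent audit row on or before that date:

    DECLARE @as_of DATETIME = '2015-09-14';  -- 14th-day census date (placeholder)

    SELECT a.student_id, a.school, a.status
    FROM status_audit AS a
    INNER JOIN (
        -- Each student's last change on or before the as-of date
        SELECT student_id, MAX(changed_at) AS last_change
        FROM status_audit
        WHERE changed_at <= @as_of
        GROUP BY student_id
    ) AS latest
        ON  latest.student_id = a.student_id
        AND latest.last_change = a.changed_at;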

Glad to hear a different filter criterion would at least get you headed in the right direction. I think Access is an okay starting point, but loading the data into a more stable server platform like SQL Server or MySQL might be more consistent with your Institutional Research needs over the long term. Access tends to bog down under even a modest load, and server resources are considerably more powerful than an individual desktop.



I definitely plan to move it to SQL, but I don't have access to a SQL database yet, so Access is a holding place for all of the files until I do.

I wish the data were structured better. I like the idea of the audit trail table providing historical information. It would be helpful if the current system captured the date and time of changes; then we could simply filter on a date instead of maintaining multiple systems. Oh well, one can dream.

Thanks again! 


I'm glad you found it useful! I think you're headed in the right direction, and learning to load data into SQL is a valuable skill in its own right. Luckily, Access supports migrating data to MS SQL Server in a relatively user-friendly way, and you can install a SQL Server database on your local machine as long as it stays under 2 GB (the free edition's database size limit).
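If the built-in migration path gives you trouble, a one-off load can also be done straight from T-SQL with OPENROWSET -- roughly like this, where the file path and table names are placeholders, and it assumes the Access (ACE) OLE DB provider is installed and ad hoc distributed queries are enabled:

    -- One-time setup: allow ad hoc OPENROWSET queries (needs admin rights).
    EXEC sp_configure 'show advanced options', 1;  RECONFIGURE;
    EXEC sp_configure 'Ad Hoc Distributed Queries', 1;  RECONFIGURE;

    -- Pull a table straight out of the Access file into a new SQL Server table.
    SELECT *
    INTO dbo.HistoricalStudents
    FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                    'C:\data\Historical.accdb';
                    'Admin'; '',
                    Students);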

Best of luck!


