Hi there,

I am new here and I require help analyzing a set of data that I have.

This dataset is a list of train journeys for a certain time period on a particular day in a country. Each line in the dataset consists of five fields:

  1. a unique journey ID
  2. destination station
  3. time when the commuter tapped out of the station
  4. origin station
  5. time when the commuter tapped into the station

What methods should I use, and what code should I write, to find out the following?

  1. What is the underlying train network according to the dataset provided?
  2. How many trains run from any one station to another during that time frame?
  3. Suppose a commuter boards a train at Clementi Station at 9:25am and heads to Kallang Station. How many people are on the train at each station along the journey from Clementi Station to Kallang Station, starting from 9:25am?

I would really appreciate it if someone could help me with this problem.

Thank you so much!!!

Regards,

Esther

Replies to This Discussion

I am also new here and to the analytics field. Let me give it a try:

It would be helpful if you could provide some more data or information, for example the top 10 records from the dataset.

Q1: Take India as an example. Trains there belong to regional networks or zones, such as:

Southern Railway,
Central Railway,
Western Railway.

A field like that would help identify the network. Is there any such data or column in your dataset?

Q2: This can be done with a pivot table that collapses (aggregates) all the data within a time frame.
Q3: If you have a dataset that is updated station by station, this can be worked out in plain Excel, or in R by filtering a data frame.
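
To make the Q2 and Q3 suggestions a bit more concrete, here is a rough sketch in plain Python rather than Excel or R. The sample rows, the HH:MM time format, and the function names `count_journeys` and `on_board` are all my assumptions, since the real dataset has not been posted. Note that it counts journeys (commuters), not trains, because the five fields describe taps and not train runs.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows, in the field order given in the question:
# (journey_id, destination, tap_out_time, origin, tap_in_time)
journeys = [
    ("J1", "Kallang", "09:58", "Clementi", "09:25"),
    ("J2", "Kallang", "10:05", "Clementi", "09:31"),
    ("J3", "Clementi", "09:50", "Kallang", "09:15"),
    ("J4", "Bugis", "09:40", "Clementi", "09:05"),
]


def to_time(text):
    """Parse an 'HH:MM' string (assumed format) into a time object."""
    return datetime.strptime(text, "%H:%M").time()


def count_journeys(rows, start, end):
    """Pivot-style count of journeys per (origin, destination) pair
    whose tap-in time falls within [start, end]."""
    lo, hi = to_time(start), to_time(end)
    counts = Counter()
    for _journey_id, destination, _tap_out, origin, tap_in in rows:
        if lo <= to_time(tap_in) <= hi:
            counts[(origin, destination)] += 1
    return counts


def on_board(rows, at):
    """Filter journeys in progress at time `at`:
    tapped in at or before `at` and not yet tapped out."""
    t = to_time(at)
    return [row for row in rows if to_time(row[4]) <= t < to_time(row[2])]


pair_counts = count_journeys(journeys, "09:00", "09:30")
in_progress = on_board(journeys, "09:25")

for (origin, destination), n in sorted(pair_counts.items()):
    print(origin, "->", destination, n)
print([row[0] for row in in_progress])
```

The `on_board` filter only tells you who is somewhere in the system at 9:25am. To get the head count on one particular train at each station between Clementi and Kallang, you would still need the order of the stations on the line, which is what Q1 asks for, so treat this as a starting point only.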

Gunasekaran Sengodan
