RSBB 2023 - WeLoveDataScience Hackstathon: Stats / Data Science competition
Hack-what?
It is a mix of hacking, statistics and marathon.
Concretely, we are organizing a small competition open to all RSBB participants and to students in statistics/machine learning.
In short...
We come with...
Business context
Real dataset coming along with some metadata
Open questions / topics
Our support and guidance
You come with...
Your minds
Some time (limited, but on purpose)
Your knowledge/experience in statistics
Any tool you have at your disposal and would like to use
We have a match! Then we think it is possible to end up with:
Application of your preferred domain of expertise: statistics/ML
Presentation of your approach, analyses and outcome
Code
Ideas
Fun
Ohhh, and we also come with
Feedback on your work
Prizes!
Opportunity to present your work to a company
Recognition of your work
More concretely
Agenda
You register on the RSBB2023 website, either with teammates already or alone (we will find some for you). Deadline: October 8th
You come back to this page on October 13th: the data will be made available!
You work as you wish, and we will organize a Teams session with every team on Tuesday and/or Wednesday
You prepare a presentation (a notebook/R Markdown document is also possible)
You will present on Friday afternoon
Some information we already want to share
Business context
The data are kindly provided by a tour operator that rents out properties (houses, flats, castles...).
Think "booking.com", "airbnb"...
They consist of a subset of the information available within the company: bookings,
some characteristics of properties, some of customers, satisfaction surveys...
We selected a shortlist of 200 variables for you, plus two IDs (enough to be of interest, yet still manageable
in limited time). There are more than 175k records (for the same reasons!).
The data thus come from a database and combine information from several dimensions.
There are even temporal data for time-series lovers! Note: reservation prices will not be made available;
for the rest, based on any experience you may have with such a company, you can imagine what
those 200 variables might contain.
What do we expect from you?
Think about what you would like to do with those data:
A (business-oriented) question/problem that statistics can help answer
A statistical algorithm/method you would like to use: clustering, factorial analysis, descriptive, predictive,
time-series, anomaly detection... Could your current research be applied to these data?
Then: do your best to answer your own questions / apply your methodology
You can use any tool you want, but we prefer open-source programming languages (Python/R).
Still, if you want to do some analyses in Excel, that is fine with us. Commercial software is also accepted.
You can derive new variables (feature engineering)
You may even enrich the data with external sources (open data?)
You can ask questions and get support from WeLoveDataScience via Teams (or in person on Thursday during the conference)
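As an illustration of deriving new variables, here is a minimal Python sketch. It assumes bookings carry a booking date, an arrival date and a departure date; the field names (booked_on, arrival, departure) and the two sample records are purely hypothetical and are not taken from the competition dataset.

```python
from datetime import date

# Hypothetical booking records: field names and values are illustrative only,
# they are NOT the actual variables of the competition dataset.
bookings = [
    {"booking_id": 1, "booked_on": date(2023, 1, 10),
     "arrival": date(2023, 7, 1), "departure": date(2023, 7, 8)},
    {"booking_id": 2, "booked_on": date(2023, 6, 28),
     "arrival": date(2023, 7, 1), "departure": date(2023, 7, 3)},
]


def derive_features(booking):
    """Return a copy of the booking with three derived variables added."""
    enriched = dict(booking)
    # Number of days between the reservation and the arrival.
    enriched["lead_time_days"] = (booking["arrival"] - booking["booked_on"]).days
    # Length of the stay in nights.
    enriched["stay_nights"] = (booking["departure"] - booking["arrival"]).days
    # Calendar month of arrival, a simple seasonality indicator.
    enriched["arrival_month"] = booking["arrival"].month
    return enriched


enriched_bookings = [derive_features(b) for b in bookings]
```

The same idea carries over to R or to a data-frame library; plain dictionaries are used here only to keep the sketch free of dependencies.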
Part of a good hackathon: ideation! (important but only one slide)
What other opportunities are there for this tour operator? What else would you propose to do?
What if you had access to other data you suspect this tour operator has in-house (which ones, and then... what?)
Can you think of any open data source (tip: we are in Belgium, so we filtered on transactions in this country) or open-source software that could be useful?
Well: anything you might want to share; it could even be "I know someone who... could help you / would enjoy working in a company with such a business"...
You prepare your deliverables and present them
Several formats are possible: slides (PowerPoint/LaTeX), but why not notebooks, R Markdown or a small interactive application (Shiny...)
Deliverable: code (ideally on GitHub) or, for non-open-source tools, the name and functionalities of the software
Pay attention: this will be a presentation of approximately 5 minutes only! Summarize, synthesize, get to the point.
Be aware that any private-sector company likes to know "what's in it for me" and does not (necessarily) want/like
to see all the details
Be ready for questions
Be creative
Competition: evaluation criteria
Participants will be evaluated based on various criteria:
Relevance of the business question (is the question of interest: would good answers to it add value?)
Relevance of the methodology used to address the question
Reproducibility of the analysis
Quality of code (if used)
Scalability of the proposed approach (what if the company has 100x more data? Is it still valid?)
Honesty/transparency about limitations of your project
Quality of the presentation: is it to the point? At the right level of detail? Understandable by the business? Are the answers to our questions satisfying?