STAT-437-mhudelson-2025-09-18-11-13-18

 

Title of Course [High Dimensional Data Learning and Visualization]

Prefix and Number [STAT 437]

Semester and Year [tbd]

Number of Credit Hours [3 Credits (2 credits lecture, 1 credit studio)]

Prerequisites [STAT 435]

Course Details

Day and Time: [tbd]

Meeting Location: [tbd]

 

Instructor Contact Information

Instructor Name: [tbd]

Instructor Contact Information: [office location, phone, email] [tbd]

Instructor Office Hours: [tbd]

 

TA Name: [tbd]

TA Contact Information: [office location, phone, email] [tbd]

TA Office Hours: [tbd]

 

Course Description

[This course is the second part of a two-course sequence; the first part is STAT 435 “Statistical Modeling for Data Analytics”. STAT 435 focuses on supervised learning via regression models and their regularized versions, whereas STAT 437 focuses on visualization, non-predictive modeling, and unsupervised learning.
The course covers the following topics: data visualization (via R packages such as ggplot2, gganimate, igraph, and plotly); metric-based clustering (such as hierarchical clustering and K-means); probabilistic and metric-based classification (such as nearest neighbors, mixture models, support vector machines, tree-based methods, and neural networks); algebraic and probabilistic dimension reduction (such as principal components, spectral methods, and latent variable models); and scalable and approximate inferential methods (such as variational inference and approximate Bayesian computation).
All methods covered in the course will be implemented in R.]

 

Course Materials 

Required Books: [An Introduction to Statistical Learning: with Applications in R, Corrected 8th printing, 2017, G. James, D. Witten, T. Hastie, R. Tibshirani]

Supplementary Texts: [1. ggplot2: Elegant Graphics for Data Analysis, 2009, Hadley Wickham
2. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2009, T. Hastie, R. Tibshirani, and J. Friedman
3. R for Data Science, 2019, Hadley Wickham and Garrett Grolemund]

Other Materials: [None]

Fees: [None]

Student Learning Outcomes (SLOs)

Course Learning Outcomes (students will be able to:)

[Perform data visualization]
[Perform clustering, classification, and dimension reduction]
[Perform approximate and scalable statistical inference]

Activities Supporting the Learning Outcomes

[There will be approximately 4 to 7 homework assignments. Each assignment usually contains both conceptual and applied exercises; the applied exercises require software implementation.

There will be 2 projects, each requiring a written report to be submitted.]

Assessment of the Learning Outcomes

Homework is graded on a per-problem basis. Answers must be well organized and submitted with the necessary supporting computer code.

Written project reports are graded with a rubric covering the Introduction, Methods and Results, Discussion, and Appendix. Each report should contain at least the following sections:
Introduction: This section must contain a summary of the problems to be resolved, a review of pertinent existing methods that have been or are used to tackle these problems, and a statement of the proposed solutions to the problems. It should usually not exceed 2 pages in length.
Methods and Results: This section should detail how the problems described in the Introduction are resolved and present the solutions with sufficient supporting evidence. It is usually the main part of the project report and contains the most relevant outputs from your analysis.
Discussion: This section should discuss aspects of the solutions or results that can be further improved, associated issues that may require further investigation, and implications of the results not directly related to the problems.
Appendix: This section contains the full, relevant computer code used to conduct the analysis.

Course Schedule

[Please note that a WSU semester is 15 weeks + Thanksgiving/Spring Break. The schedule below does not include the break.]

Week 1 [dates]
Lesson Topic:
1. Basics on R packages dplyr and ggplot2
2. Create scatter plot
Assignment: Homework 1 assigned
Assessment: N/A

Week 2 [dates]
Lesson Topic:
1. Elementary Visualizations (via ggplot2): density plot, histogram, boxplot, barplot, pie chart
2. Advanced Visualizations via ggplot2: faceting, annotation
Assignment: Project 1 assigned
Assessment: N/A

Week 3 [dates]
Lesson Topic:
1. Advanced Visualizations via ggplot2: adjusting scales, legends, fonts, orientation
2. Advanced Visualizations via ggplot2: math expressions, and other ggplot2 tricks
Assignment: Homework 1 due; Homework 2 assigned
Assessment: Homework 1 graded on a per-problem basis

Week 4 [dates]
Lesson Topic:
1. Visualizing spatial data, networks and graphs
2. Basics for interactive and dynamic visualization
3. K-means clustering: I
Assignment: None
Assessment: N/A

Week 5 [dates]
Lesson Topic:
1. K-means clustering: II
2. Hierarchical clustering: I
Assignment: Homework 2 due; Homework 3 assigned
Assessment: Homework 2 graded on a per-problem basis

Week 6 [dates]
Lesson Topic:
1. Hierarchical clustering: II
Assignment: None
Assessment: N/A

Week 7 [dates]
Lesson Topic:
1. Hierarchical clustering: III
2. Bayes classifier
3. Nearest-neighbor classifier: Part I
Assignment: Homework 3 due; Homework 4 assigned
Assessment: Homework 3 graded on a per-problem basis

Week 8 [dates]
Lesson Topic:
1. Nearest-neighbor classifier: Part II
2. Discriminant analysis for classification: Part I
Assignment: Project 2 assigned
Assessment: N/A

Week 9 [dates]
Lesson Topic:
1. Discriminant analysis for classification: Part II
Assignment: None
Assessment: N/A

Week 10 [dates]
Lesson Topic:
1. Discriminant analysis for classification: Part III
Assignment: Project 1 due
Assessment: Project 1 graded using the project rubric

Week 11 [dates]
Lesson Topic:
1. Support vector machines for classification: Part I
Assignment: Homework 4 due; Homework 5 assigned
Assessment: Homework 4 graded on a per-problem basis

Week 12 [dates]
Lesson Topic:
1. Support vector machines for classification: Part II
2. Neural networks: I
Assignment: None
Assessment: N/A

Week 13 [dates]
Lesson Topic:
1. Neural networks: II
2. Principal component analysis for dimension reduction: Part I
Assignment: None
Assessment: Homework 5 graded on a per-problem basis

Week 14 [dates]
Lesson Topic:
1. Principal component analysis for dimension reduction: Part II
Assignment: None
Assessment: N/A

Week 15 [dates]
Lesson Topic:
1. Principal component analysis for dimension reduction: Part III
2. Multidimensional scaling (if time allows)
Assignment: Project 2 due
Assessment: Project 2 graded using the project rubric

 

 

Expectations for Student Effort 

For each hour of lecture equivalent and each hour in studio, students should expect to have a minimum of two hours of work outside of class. 

 

Grading

Assignment Breakdown
Type of Assignment (tests, papers, etc) Number/Frequency Percent of Overall Grade
In-class discussion Weekly 12%
Weekly class survey Weekly 3%
Assignments 5 50%
Projects 2 35%

 

Grading Schema
Grade Percent Grade Percent
A 93-100 C 73-76.99
A- 90-92.99 C- 70-72.99
B+ 87-89.99 D+ 66-69.99
B 83-86.99 D 60-65.99
B- 80-82.99 F 0-59.99
C+ 77-79.99

 


Attendance and Make-Up Policy 

Students should make all reasonable efforts to attend all class meetings. However, in the event a student is unable to attend a class, it is the responsibility of the student to inform the instructor as soon as possible, explain the reason for the absence (and provide documentation, if appropriate), and make up class work missed within a reasonable amount of time, if allowed. Missing class meetings may result in a reduced overall grade in the class.

Late assignments are not accepted, except under extenuating circumstances. If there are extenuating circumstances, such as an extended illness, a student should contact the instructor before the homework due time and provide supporting evidence. The instructor, at his or her discretion, may then extend the due date of a homework assignment. The maximum extension is usually 7 days. However, any such decision, in particular regarding the length of an extension, will be made by the instructor on a case-by-case basis.

 


Academic Integrity Statement

Academic integrity is the cornerstone of higher education. As such, all members of the university community share responsibility for maintaining and promoting the principles of integrity in all activities, including academic integrity and honest scholarship. Academic integrity will be strongly enforced in this course. Students who violate WSU’s Academic Integrity Policy (identified in Washington Administrative Code WAC 504-26-010(3) and WAC 504-26-404) will receive a score of zero on any graded coursework, which may result in failing the course, will not have the option to withdraw from the course pending an appeal, and will be reported to the Center for Community Standards.
Cheating includes, but is not limited to, plagiarism and unauthorized collaboration as defined in the Standards of Conduct for Students, WAC 504-26-010(3). You need to read and understand all of the definitions of cheating: http://app.leg.wa.gov/WAC/default.aspx?cite=504-26-010. If you have any questions about what is and is not allowed in this course, you should ask course instructors before proceeding. If you wish to appeal a faculty member’s decision relating to academic integrity, please use the form available at communitystandards.wsu.edu. Make sure you submit your appeal within 21 calendar days of the faculty member’s decision.