  • Scraping Grailed

    Recently, I have been selling a large portion of my closet on Grailed, a community marketplace for men’s clothing centered on streetwear and designer pieces.

    As an avid user of this platform, I find it hard to compare prices of current listings against sold listings of the same item, since you have to scan through every listing to spot any overall trend over time. That makes it hard to work out the best price and the listing features that help an item sell faster. Unfortunately, Grailed does not have a public API, so I thought this would be a perfect opportunity to scrape the relevant features from each listing and visualize fashion trends through data.

    Read on →

  • Relearning R

    A hodgepodge of notes on learning R, kept for my own reference and segmented so they are easy to read.

    Read on →

  • Python Integration in RStudio?

    It’s finally here! Well, it has been here for a while, actually. The reticulate package provides tools for interoperability between Python and R. It allows:

    • Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
    • Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).
    • Flexible binding to different versions of Python including virtual environments and Conda environments.

    Read on →

  • On Missing Values

    Missing values – NaN, N/A, or just empty observations – can significantly affect the conclusions drawn from data.

    They can occur systematically:

    • as nonresponse: no information is provided for one or more items, or for a whole unit; sensitive subjects like income can push people not to answer
    • from attrition in longitudinal studies, where participants drop out before the experiment or test ends
    • on purpose – entities such as governments or private organizations choose not to, or fail to, report critical statistics; or the information is simply not available

    Or at random:

    • missing completely at random (MCAR): the events that cause any particular data item to be missing are independent of both the observable variables and the unobservable parameters of interest, and occur entirely at random – analyses of the observed data remain unbiased
    • missing at random (MAR): missingness is not completely random, but it can be fully accounted for by variables with complete information – for example, an employee forgot to enter certain observations on a certain day in a study on accidents (the reason has to be unrelated to the missing variable itself)
    • missing not at random (MNAR): neither MCAR nor MAR, because the missingness depends on the missing value itself – for example, users fail to complete a depression survey because of their level of depression
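
    To make the three mechanisms concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made up for illustration: the age and income distributions, the drop probabilities, the age-45 and 50,000 cutoffs, and the helper names (`apply_missingness`, `mean_income`, `age_adjusted_mean`). It compares the mean of the observed incomes against the true mean under each mechanism.

    ```python
    import random
    import statistics

    rng = random.Random(0)  # fixed seed so the run is repeatable

    # Simulated population: age is always observed, income may go missing.
    full = []
    for _ in range(20_000):
        age = rng.uniform(20, 65)
        income = 20_000 + 1_000 * (age - 20) + rng.gauss(0, 5_000)
        full.append((age, income))

    def apply_missingness(rows, p_missing):
        """Keep a row unless its income is dropped with probability p_missing(age, income)."""
        return [(a, inc) for a, inc in rows if rng.random() >= p_missing(a, inc)]

    def mean_income(rows):
        return statistics.fmean(inc for _, inc in rows)

    def age_adjusted_mean(observed, population):
        """Reweight observed group means by each age group's share of the population."""
        total = 0.0
        for in_group in (lambda a: a > 45, lambda a: a <= 45):
            share = sum(in_group(a) for a, _ in population) / len(population)
            group_mean = statistics.fmean(inc for a, inc in observed if in_group(a))
            total += share * group_mean
        return total

    true_mean = mean_income(full)

    # MCAR: every income has the same chance of going missing.
    mcar = apply_missingness(full, lambda a, inc: 0.3)
    # MAR: the chance depends only on age, which is fully observed.
    mar = apply_missingness(full, lambda a, inc: 0.6 if a > 45 else 0.1)
    # MNAR: the chance depends on the income value itself.
    mnar = apply_missingness(full, lambda a, inc: 0.6 if inc > 50_000 else 0.1)

    print(f"true mean:          {true_mean:10.0f}")
    print(f"MCAR observed mean: {mean_income(mcar):10.0f}")
    print(f"MAR observed mean:  {mean_income(mar):10.0f}")
    print(f"MAR age-adjusted:   {age_adjusted_mean(mar, full):10.0f}")
    print(f"MNAR observed mean: {mean_income(mnar):10.0f}")
    ```

    In this setup the MCAR mean should land close to the true mean, while the naive MAR and MNAR means come out too low. The MAR bias largely disappears once the estimate is reweighted by age, because age is fully observed and explains the missingness. No such fix is available from the observed data alone in the MNAR case.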

    Read on →

  • Relearning SQL

    There’s no argument that SQL is a must-learn language if you want to work with data. As with any language, it helps to keep notes so you can refresh your memory for future projects.

    Read on →