Browse Datasets
High-Resolution Load Dataset from Smart Meters Across Various Cities in Morocco
The dataset includes detailed measurements of electricity consumption from several areas of Laâyoune, Boujdour, Marrakech and Foum Eloued. It presents both domestic and industrial consumption profiles, recorded every 10 minutes for Laâyoune, Boujdour and Foum Eloued and every 30 minutes for Marrakech. The data are organized in several tables showing usage trends by city and area, which allows analysis of daily variations in demand, seasonal fluctuations and peak loads. All data are in amperes (A), except for Marrakech, which is in kilowatts (kW).
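Because the cities are recorded at different intervals, a common first step is to bring the 10-minute series onto the 30-minute cadence used for Marrakech. The following is a minimal sketch of that idea using only the Python standard library; the function name `downsample_mean` and the sample readings are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def downsample_mean(readings, factor=3):
    """Average consecutive groups of `factor` readings.

    Three 10-minute readings make one 30-minute value.
    An incomplete group at the end of the series is dropped.
    """
    usable = len(readings) - len(readings) % factor
    return [mean(readings[i:i + factor]) for i in range(0, usable, factor)]


# Hypothetical 10-minute current readings in amperes (illustrative only).
ten_minute_amps = [12.0, 15.0, 18.0, 20.0, 22.0, 24.0, 30.0]

thirty_minute_amps = downsample_mean(ten_minute_amps)
print(thirty_minute_amps)  # [15.0, 22.0]
```

The result is still in amperes. Comparing it with the Marrakech values in kilowatts would additionally require voltage and power-factor information, which the description above does not provide.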
Gallstone
The clinical dataset was collected from the Internal Medicine Outpatient Clinic of Ankara VM Medical Park Hospital and includes data from 319 individuals (June 2022–June 2023), 161 of whom were diagnosed with gallstone disease. It contains 38 features, including demographic, bioimpedance, and laboratory data, and was ethically approved by the Ankara City Hospital Ethics Committee (E2-23-4632). Demographic variables are age, sex, height, weight, and BMI. Bioimpedance data includes total, extracellular, and intracellular water, muscle and fat mass, protein, visceral fat area, and hepatic fat. Laboratory features are glucose, total cholesterol, HDL, LDL, triglycerides, AST, ALT, ALP, creatinine, GFR, CRP, hemoglobin, and vitamin D. The dataset is complete, with no missing values, and balanced in terms of disease status, eliminating the need for additional preprocessing. It provides a strong foundation for machine learning-based gallstone prediction using non-imaging features.
BEED: Bangalore EEG Epilepsy Dataset
The Bangalore EEG Epilepsy Dataset (BEED) is a comprehensive EEG collection for epileptic seizure detection and classification. Recorded at a neurological research centre in Bangalore, India, it features high-fidelity EEG signals captured using the standard 10-20 electrode system at a 256 Hz sampling rate. BEED contains 16,000 segments of 20-second EEG recordings evenly distributed across four categories: Healthy Subjects (0), Generalized Seizures (1), Focal Seizures (2), and Seizure Events (3), where seizure activity occurs with events like eye blinking, nail biting, or staring. Each category includes data from 20 adult subjects (ages 21-55) with equal gender representation. The dataset comprises 16 EEG channels (X1-X16) corresponding to different brain regions, with a binary label (y) indicating seizure presence (1) or absence (0). BEED supports machine learning in seizure detection, epilepsy analysis, and EEG research with its balanced, high-resolution data.
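The figures quoted above imply the size of each segment and of each category. The short sketch below works through that arithmetic; it relies only on the numbers stated in the description and makes no assumption about how the distributed files are actually laid out.

```python
# Figures taken from the dataset description.
SAMPLING_RATE_HZ = 256
SEGMENT_SECONDS = 20
TOTAL_SEGMENTS = 16_000
CATEGORIES = 4
CHANNELS = 16

# Samples recorded on one channel during one 20-second segment.
samples_per_channel = SAMPLING_RATE_HZ * SEGMENT_SECONDS

# Segments per category, given the even split across four categories.
segments_per_category = TOTAL_SEGMENTS // CATEGORIES

# Values in one segment across all 16 channels.
values_per_segment = samples_per_channel * CHANNELS

print(samples_per_channel)    # 5120
print(segments_per_category)  # 4000
print(values_per_segment)     # 81920
```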
RecGym: Gym Workouts Recognition Dataset with IMU and Capacitive Sensor
The RecGym dataset is a collection of gym workouts recorded with IMU and capacitive sensors, designed for research and development in recommendation systems and fitness applications. The dataset records ten volunteers' gym sessions with a sensing unit composed of an IMU sensor (columns A_x, A_y, A_z, G_x, G_y, G_z) and a body capacitance sensor (column C_1). The sensing units were worn at three positions: on the wrist, in the pocket, and on the calf, with a sampling rate of 20 Hz. The dataset contains the motion signals of twelve activities: eleven workouts (Adductor, ArmCurl, BenchPress, LegCurl, LegPress, Riding, RopeSkipping, Running, Squat, StairsClimber, Walking) and a "Null" activity for the time the volunteer spends between workouts. Each participant performed the workouts listed above in five sessions over five days (each session lasts around one hour). Altogether, the dataset presents fifty sessions of normalized gym workout data.
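Activity recognition on sensor streams like this one is often done on fixed-length windows. The sketch below shows one way to cut a 20 Hz signal into overlapping windows; the 2-second window length, the 50% overlap, and the helper name `sliding_windows` are illustrative assumptions, not settings prescribed by the dataset.

```python
def sliding_windows(samples, window_size, step):
    """Return overlapping windows of `window_size` samples, `step` samples apart.

    A trailing stretch shorter than `window_size` is dropped.
    """
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, step)]


SAMPLING_RATE_HZ = 20   # from the dataset description
WINDOW_SECONDS = 2      # assumption, chosen for illustration
window_size = SAMPLING_RATE_HZ * WINDOW_SECONDS  # 40 samples per window
step = window_size // 2                          # 50% overlap (assumption)

# Placeholder signal: 100 samples, i.e. 5 seconds at 20 Hz (not real data).
signal = list(range(100))

windows = sliding_windows(signal, window_size, step)
print(len(windows))     # 4
print(len(windows[0]))  # 40
```

In practice the same windowing would be applied to each sensor column (A_x through G_z and C_1), with each window labeled by the activity performed during it.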
Inflation Research Abstracts Classification
This dataset contains abstracts of scientific papers on inflation in economics. The task is to classify them according to whether they include machine learning methodologies.
Drug Induced Autoimmunity Prediction
This dataset comprises molecular descriptors generated using RDKit, specifically curated for the study of drug-induced autoimmunity through ensemble machine learning approaches. It is divided into a training set and a testing set, containing numerical features that represent molecular properties and structural characteristics of drugs. The dataset supports predictive modeling tasks aimed at identifying potential autoimmune risks associated with drug candidates. These molecular descriptors include physicochemical properties, providing a comprehensive foundation for machine learning analysis. The dataset facilitates the development of interpretable models for drug toxicity prediction, contributing to advancements in computational toxicology and drug safety assessment.
PIRvision_FoG_presence_detection
The PIRvision dataset contains occupancy detection data collected from a Synchronized Low-Energy Electronically-chopped Passive Infra-Red sensing node in residential and office environments. Each observation represents 4 seconds of recorded human activity within the sensor Field-of-View (FoV).
Lattice-physics (PWR fuel assembly neutronics simulation results)
This dataset encompasses lattice-physics parameters—the infinite multiplication factor (k-inf) and the pin power peaking factor (PPPF)—modeled as functions of variations in fuel pin enrichments for the NuScale US600 fuel assembly type C-01 (NFAC-01) [NuScale FSAR]. These critical parameters were computed using the MCNP6 code, a Monte Carlo-based tool for nuclear reactor criticality simulations. Fuel pin enrichments were uniformly sampled within the range of 0.7–5.0 weight percent (w/o) U-235 to generate the dataset. The dataset contains 39 features, each representing the enrichment of a specific fuel rod in a one-eighth symmetry of the NFAC assembly. The outputs of interest are the k-inf and PPPF values associated with these enrichments.
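To make the shape of one input record concrete, the sketch below draws a single configuration of 39 enrichments from the stated range. It is only an illustration: drawing each pin independently is an assumption, the names are hypothetical, and the k-inf and PPPF outputs come from MCNP6 simulations that are not reproduced here.

```python
import random

N_FUEL_PINS = 39                 # features per record, from the description
ENRICHMENT_MIN = 0.7             # w/o U-235, from the description
ENRICHMENT_MAX = 5.0             # w/o U-235, from the description


def sample_enrichments(rng):
    """Draw one enrichment per fuel pin, uniformly over the stated range.

    Independent sampling per pin is an assumption made for this sketch.
    """
    return [rng.uniform(ENRICHMENT_MIN, ENRICHMENT_MAX)
            for _ in range(N_FUEL_PINS)]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
configuration = sample_enrichments(rng)

print(len(configuration))        # 39
```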
Gas sensor array low-concentration
This dataset contains the responses to 6 gases collected by a sensor array of 10 metal oxide semiconductor sensors, with gas concentrations at the ppb level (below the minimum detection limit of the sensors).
Twitter Geospatial Data
Seven days of geo-tagged Tweet data from the United States with exact GPS location and timestamp.