Browse Datasets
Gas sensor array low-concentration
This dataset contains 6 gas responses collected by an array of 10 metal oxide semiconductor sensors, with gas concentrations at the ppb level (below the minimum detection limit of the sensors).
Twitter Geospatial Data
Seven days of geo-tagged Tweet data from the United States with exact GPS location and timestamp.
CAN-MIRGU
A Comprehensive CAN Bus Attack Dataset from Moving Vehicles for Intrusion Detection System Evaluation. This dataset includes CAN bus attacks collected from a modern automobile equipped with autonomous driving capabilities, operating in real-world driving scenarios. The dataset encompasses physically verified attacks to enhance the comparison and validation of in-vehicle network Intrusion Detection Systems.
Assessing Mathematics Learning in Higher Education
MathE is a mathematical platform developed under the MathE project (mathe.pixel-online.org). The dataset has 9546 answers to questions on mathematical topics taught in higher education. The file has eight features: Student ID, Student Country, Question ID, Type of answer (correct or incorrect), Question level (basic or advanced), Math Topic, Math Subtopic, and Question Keywords. The question level was set by the professor who submitted the question. The data was obtained from February 2019 until December 2023.
Turkish Crowdfunding Startups
This dataset contains data on crowdfunding campaigns in Turkey. The dataset includes various characteristics such as crowdfunding projects, project descriptions, targeted and raised funds, campaign durations, and number of backers. Collected in 2022, this dataset provides a valuable resource for researchers who want to understand and analyze the crowdfunding ecosystem in Turkey. In total, there are data from more than 1500 projects on 6 different platforms. The dataset is particularly useful for training natural language processing (NLP) and machine learning models. This dataset is an important reference point for studies on the characteristics of successful crowdfunding campaigns and provides comprehensive information for entrepreneurs, investors and researchers in Turkey.
Synthetic Circle Data Set
This dataset comprises 10000 two-dimensional points arranged into 100 circles, each containing 100 points. It was designed to evaluate clustering algorithms, such as k-means, by providing a clear and structured clustering challenge.
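A dataset with the same shape (100 circles of 100 points each) can be sketched as below. This is only an illustration, not the generator used for the published data: the grid layout of centers, the radius, the uniform sampling inside each disc, and the name `make_circles` are all assumptions.

```python
import math
import random


def make_circles(n_circles=100, points_per_circle=100, radius=3.0, spacing=10.0, seed=0):
    """Return a list of (x, y, label) tuples, one label per circle.

    Assumed layout (hypothetical): circle centers sit on a square grid
    with `spacing` between neighbors, so circles do not overlap as long
    as radius < spacing / 2.
    """
    rng = random.Random(seed)
    side = math.ceil(math.sqrt(n_circles))  # grid width needed to fit all circles
    points = []
    for label in range(n_circles):
        cx = (label % side) * spacing
        cy = (label // side) * spacing
        for _ in range(points_per_circle):
            # sqrt of a uniform variate gives a uniform density over the disc
            r = radius * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((cx + r * math.cos(theta), cy + r * math.sin(theta), label))
    return points


points = make_circles()
```

The `label` column plays the role of ground truth, so a clustering result (for example from k-means with k = 100) can be compared against it.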
Micro Gas Turbine Electrical Energy Prediction
This dataset consists of measurements of electrical power corresponding to an input control signal over time, collected from a 3-kilowatt commercial micro gas turbine.
Printed Circuit Board Processed Image
This CSV dataset, originally used for test-pad coordinate retrieval from PCB images, has potential applications in classification (e.g., grey test-pad detection), anomaly detection (e.g., fake test pads), and clustering for grey test-pad discovery. The dataset includes X and Y, representing pixel positions, and R, G, B values determining pixel color (min-max normalized from the 0-255 range). A 'Grey' field indicates approximately grey pixels. The dataset was originally used for a two-stage discovery of a large number of test-pad clusters (>100), presented in: Swee Chuan Tan and Schumann Tong Wei Kit, "Fast retrievals of test-pad coordinates from photo images of printed circuit boards," 2016 International Conference on Advanced Mechatronic Systems (ICAMechS), 2016, pp. 464-467, https://api.semanticscholar.org/CorpusID:38544897. This dataset contains more pixels than the one in the paper because a different extraction method was used.
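As a rough illustration of the color encoding described above, min-max normalization maps an 8-bit channel value from 0-255 onto 0-1. The sketch below also includes a grey check; that rule (all three channels within a small tolerance of each other) and the names `minmax_normalize`, `is_approx_grey`, and `tol` are hypothetical, not the definition used to build the dataset's 'Grey' field.

```python
def minmax_normalize(value, lo=0, hi=255):
    """Min-max normalize a channel value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def is_approx_grey(r, g, b, tol=0.05):
    """Hypothetical grey test: the normalized channels are nearly equal.

    The tolerance is an assumed value chosen for illustration only.
    """
    return max(r, g, b) - min(r, g, b) <= tol


# Example pixel with raw 8-bit channel values (made-up numbers).
raw_pixel = (128, 130, 127)
r, g, b = (minmax_normalize(c) for c in raw_pixel)
grey_flag = is_approx_grey(r, g, b)
```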
An eye on the vine - a dataset for fungi segmentation in microscopic vine wood images
This dataset is intended to help solve the problem of pathogen segmentation in fluorescence microscopy images of vine wood. Because no dataset is available to cast the problem into a supervised framework, this dataset provides a collection of realistic images based on knowledge of the image formation model in fluorescence microscopy.
PhiUSIIL Phishing URL (Website)
The PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs analyzed while constructing the dataset are recent. Features are extracted from the URL and from the source code of the webpage. Features such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb are derived from existing features.