We Care For Parkinsons
{Big Data, Big Dream}

Index of this Chapter


Big data technology has the potential to automatically detect Parkinson’s disease (PD) symptoms and inform clinicians about the progression of the disease. This chapter introduces big data in healthcare.

It includes real-world data analyses that apply machine learning, deep learning, and statistics to Parkinson’s disease and other diseases. It also provides a FHIR on Azure playground for learning FHIR, HL7, and EMR.

If you have any comments or suggestions, please let us know. If you would like to publish your post or experience here, contact us!

Build a customized chatbot using GPT-3 Index and ChatGPT


GPT stands for Generative Pre-trained Transformer, a family of Large Language Models (LLMs) built by OpenAI; GPT-3 was released in June 2020. The GPT-3 model was later iterated into GPT-3.5, which is closely related to InstructGPT, to improve its ability to follow instructions and complete tasks.

Authored and collected by: Cell

Build local ChatGPT-like by using LLaMa and Alpaca


“Stanford Alpaca” is a research project developed by a team at Stanford University that fine-tunes LLaMA to follow instructions, and “LLaMA” stands for “Large Language Model Meta AI”, a family of foundation language models released by Meta AI.

Authored and collected by: Cell

Create a REST API in Java, Spring Boot and MySQL


Spring Boot is a popular framework for building web applications using the Java programming language. It provides many features, such as auto-configuration, that make it easy to create and deploy web applications.

Authored and collected by: Cell

GitHub Copilot · Your AI pair programmer


GitHub Copilot is an artificial intelligence (AI) tool developed by GitHub in collaboration with OpenAI, designed to help programmers write code more efficiently and effectively. It uses deep learning models to generate code suggestions in real time as developers write their code.

Authored and collected by: Leon, Cell

Dockerizing Wordpress with Nginx on Ubuntu by using free Oracle Cloud


Install WordPress with Docker, Nginx, and MySQL, secured with SSL. This guide shows how to build a high-performance setup with Docker, Docker Compose, Nginx, MySQL, and Let’s Encrypt to run WordPress on an Ubuntu 22.04.1 VM on Oracle Cloud.

Authored and collected by: Cell

Docker Cheat Sheet


Here are The Ultimate Docker Cheat Sheet and other Docker cheat sheets.

Collected by: Cell

DevOps Tools


A DevOps tool is an application that helps automate the software development process. It mainly focuses on communication and collaboration between product management, software development, and operations professionals. DevOps tools also enable teams to automate most software development processes, such as builds, conflict management, dependency management, and deployment, which reduces manual effort.

Authored and collected by: Cell

Azure DevOps Self-hosted agents


An agent is computing infrastructure with installed agent software that runs one job at a time. To build your code or deploy your software using Azure Pipelines, you need at least one agent. As you add more code and people, you’ll eventually need more. When your pipeline runs, the system begins one or more jobs. This post introduces how to install the agent on macOS machines.

Authored and collected by: Cell

Apache Kafka and Redis


Kafka and Redis are two of the most popular tools for message streaming use cases such as log aggregation. Both provide data streaming and aggregation in their own ways. Here, we compare the two with regard to their capabilities and performance tests.

Authored and collected by: Cell

Cron: a job scheduler


The cron command-line utility is a job scheduler on Unix-like operating systems; the tasks it runs are known as cron jobs.

Authored and collected by: Cell
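
To give a rough idea of what a cron job entry looks like, the sketch below splits a crontab line into its five schedule fields (minute, hour, day of month, month, day of week) and the command. The helper name `parse_crontab_line` and the sample entry are hypothetical, and the sketch ignores special strings such as `@daily` and does not validate field ranges.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    command: str


def parse_crontab_line(line: str) -> CronEntry:
    """Split one crontab line into five schedule fields plus the command.

    Hypothetical helper for illustration; field ranges are not validated.
    """
    parts = line.split(None, 5)  # at most 6 pieces: 5 fields + the command
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return CronEntry(*parts)


# Sample entry (made up): run a backup script at 02:30 every Monday.
entry = parse_crontab_line("30 2 * * 1 /usr/local/bin/backup.sh --full")
print(entry.hour, entry.minute, entry.command)
```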

Oracle Cloud Infrastructure Always Free services


Oracle began offering Always Free services including Compute, Storage, and Autonomous Database. The newest Always Free services include Ampere A1 Compute, Autonomous JSON Database, NoSQL, APEX Application Development, Logging, Service Connector Hub, Application Performance Monitoring (APM), flexible load balancer, flexible network load balancer, VPN Connect V2, Oracle Security Zones, Oracle Security Advisor, and OCI Bastion, making Oracle’s Always Free portfolio of services and resources among the most generous in the industry.

Authored and collected by: Cell

Powered by Dask


Dask is a flexible library for parallel computing in Python. Dask uses existing Python APIs and data structures to make it easy to switch from NumPy, pandas, and scikit-learn to their Dask-powered equivalents.

Authored and collected by: Cell, Sheryl

Backup WordPress and MySQL Docker Containers


We can build a multi-container WordPress installation consisting of a MySQL database, an Nginx web server, and WordPress itself. This post introduces how to back up your MySQL database and WordPress data.

Authored and collected by: Sheryl, Cell

Set up gitlab-runner for GitLab CI


Use your local development machine as a GitLab CI runner instead of the shared runners. GitLab Runner is a build instance that runs jobs across multiple machines and sends the results to GitLab; it can be placed on separate users, servers, or your local machine. You can register the runner as shared or specific after installing it.

Authored and collected by: Sheryl, Cell

What is EEG (Electroencephalography)


The EEG is an electrophysiological technique for the recording of electrical activity arising from the human brain. Given its exquisite temporal sensitivity, the main utility of EEG is in the evaluation of dynamic cerebral functioning.

Authored and collected by: Cell, Sheryl

A list of all public EEG-datasets.


A list of all public EEG-datasets for study.

Creating a CI/CD pipeline by using Python to test links


GitHub Actions supports CI/CD, which means developers can use it to create a CI/CD pipeline. In this tutorial, we build a CI/CD pipeline with GitHub Actions that calls a Python function to test hyperlinks to open-source health websites and libraries.

Authored and collected by: Sheryl, Cell
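
The kind of Python function such a pipeline could call might look like the sketch below. It is not the tutorial's own code: the names `extract_links`, `fetch_status`, and `find_broken_links` are hypothetical, and it uses only the standard library. The HTTP check is passed in as a parameter so the logic can be exercised without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except (URLError, OSError):
        return 0


def find_broken_links(html: str, fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Map each link whose status is outside 200-399 to the status observed."""
    broken = {}
    for url in extract_links(html):
        status = fetch(url)
        if not 200 <= status < 400:
            broken[url] = status
    return broken


# Offline demonstration with a fake status lookup (URLs are placeholders).
page = '<a href="https://example.org/ok">ok</a> <a href="https://example.org/gone">gone</a>'
fake_statuses = {"https://example.org/ok": 200, "https://example.org/gone": 404}
print(find_broken_links(page, fetch=fake_statuses.get))
```

In a CI job, a script built on this idea would typically exit with a non-zero status when the returned dictionary is not empty, so that the pipeline run is marked as failed.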

Amazon Lightsail, EC2 example-1: Launch a Wordpress Website


Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud. Lightsail is an easy-to-use cloud platform that offers everything needed to build an application or website, plus a cost-effective monthly plan. This post shows how to launch a WordPress website using EC2 or Lightsail.

Authored and collected by: Sheryl

Azure HDInsight - Hadoop, Spark, & Kafka Service


Azure HDInsight is an easy, cost-effective, enterprise-grade service for open-source analytics. It is a managed, full-spectrum analytics service in the cloud for enterprises. You can use open-source frameworks such as Hadoop, Apache Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more.

Authored and collected by: Sheryl

Azure - create Linux server


Azure supports common Linux distributions, including Red Hat, SUSE, Ubuntu, CentOS, Debian, Oracle Linux, and CoreOS. We can use Azure to create Linux virtual machines (VMs), deploy and run containers in Kubernetes, and more.

Authored and collected by: Sheryl

Workshop 1: Big data, Big Dream


Big Data, Big Dream. Mr. Bin Zhu has 18+ years of software development experience across multiple platforms and technologies. He built a complete big data ecosystem, and the entire team behind it, for an Internet company ranked in the global top ten by traffic.

Currently, he is Cofounder & CTO of Faimdata in Montreal.

A Tour of Machine Learning Ten Algorithms


Machine learning algorithms are programs that can learn from data and improve from experience without human intervention. Learning tasks may include learning the function that maps the input to the output; learning the hidden structure in unlabeled data; or ‘instance-based learning’, where a class label is produced for a new instance by comparing the new instance (row) to instances from the training data, which were stored in memory. ‘Instance-based learning’ does not create an abstraction from specific instances.
Machine learning is also often referred to as predictive analytics, or predictive modelling.
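
Instance-based learning as described above can be illustrated with a k-nearest-neighbours classifier. The sketch below is a minimal pure-Python version written for this index page, not code from the post; the function name `knn_predict` and the toy rows are made up. Note that nothing is fitted: the stored training rows are compared directly with the new instance.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_predict(training: List[Row], new_instance: Sequence[float], k: int = 3) -> str:
    """Label new_instance by majority vote among its k nearest stored rows."""
    # Sort the stored rows by Euclidean distance to the new instance.
    nearest = sorted(training, key=lambda row: dist(row[0], new_instance))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two features per row.
training_rows = [
    ((1.0, 1.1), "A"),
    ((1.2, 0.9), "A"),
    ((0.8, 1.0), "A"),
    ((5.0, 5.2), "B"),
    ((5.1, 4.9), "B"),
    ((4.8, 5.0), "B"),
]
print(knn_predict(training_rows, (1.1, 1.0)))
print(knn_predict(training_rows, (4.9, 5.1)))
```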

Installing Anaconda in Hadoop - Using Anaconda with Spark


Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Package versions are managed by the package management system conda.

Authored and collected by: Sheryl

Anaconda Installation in OS, and Hadoop


Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Package versions are managed by the package management system conda.

Authored and collected by: Sheryl

Cheat Sheets for Big Data 4: SQL


List of Data Science Cheatsheets & Infographics (SQL)
Collected by: Sheryl, Cell

Cheat Sheets for Big Data 3: General


List of Data Science Cheatsheets & Infographics (General)
Collected by: Sheryl, Cell

Cheat Sheets for Big Data 2: R


List of Data Science Cheatsheets & Infographics (R part)
Collected by: Sheryl, Cell

Cheat Sheets for Big Data 1: Python


List of Data Science Cheatsheets & Infographics (Python)
Collected by: Sheryl, Cell

Hadoop_Hortonworks Sandbox HDP installation (Hadoop Part1)


In this post, we set up an environment for Hadoop, big data, and Spark by completing the Hortonworks Sandbox HDP installation. Later, we will use this system for big data projects on health data and other data.
Author: Sheryl, Cell.

EEG ERP Datasets for Parkinsons disease


This post shares some datasets related to PD and introduces the datasets and related studies… Magnetoencephalography and electroencephalography (M/EEG) measure the weak electromagnetic signals generated by neuronal activity in the brain. Using these signals to characterize and locate neural activation in the brain is a challenge that requires expertise in physics, signal processing, statistics, and numerical methods.

Classifying Quantized Dataset with Random Forest Classifier (Part 2)


In this post, we’re going to finish the work started in the previous one and classify a quantized version of the Wrist-worn Accelerometer Dataset. There are many ways to classify datasets with numerical features, but the Decision Tree is one of the most intuitive and is simple in its underlying implementation. We are going to build a Decision Tree classifier using the NumPy library and generalize it to a Random Forest, an ensemble of randomly generated trees that is less prone to data noise.

Author: Ilia Zaitsev.
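
To make the tree-to-forest idea concrete, here is a heavily simplified sketch. It is not the post's NumPy implementation: it uses pure Python, each "tree" is only a one-split stump chosen by Gini impurity, and the forest randomizes only the rows (bootstrap sampling), whereas a full Random Forest usually also samples features. All names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


class Stump:
    """A depth-1 decision tree: one feature, one threshold, two leaf labels."""

    def __init__(self, samples: List[Sample]) -> None:
        labels = [label for _, label in samples]
        self.default = Counter(labels).most_common(1)[0][0]
        self.split: Optional[Tuple[int, float, str, str]] = None
        best = gini(labels)  # a split must beat the unsplit impurity
        for feature in range(len(samples[0][0])):
            values = sorted({x[feature] for x, _ in samples})
            # Candidate thresholds are midpoints between neighbouring values.
            for low, high in zip(values, values[1:]):
                threshold = (low + high) / 2
                left = [y for x, y in samples if x[feature] <= threshold]
                right = [y for x, y in samples if x[feature] > threshold]
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
                if score < best:
                    best = score
                    self.split = (
                        feature,
                        threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0],
                    )

    def predict(self, x: Sequence[float]) -> str:
        if self.split is None:
            return self.default
        feature, threshold, left_label, right_label = self.split
        return left_label if x[feature] <= threshold else right_label


def random_forest(samples: List[Sample], n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [Stump(rng.choices(samples, k=len(samples))) for _ in range(n_trees)]


def forest_predict(forest: List[Stump], x: Sequence[float]) -> str:
    """Majority vote across all trees."""
    return Counter(tree.predict(x) for tree in forest).most_common(1)[0][0]


# Toy data (made up): two well-separated classes.
toy = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"),
    ((5.0, 5.2), "B"), ((5.1, 4.9), "B"), ((4.8, 5.0), "B"),
]
forest = random_forest(toy)
print(forest_predict(forest, (1.0, 1.0)), forest_predict(forest, (5.0, 5.0)))
```

The vote is what makes the ensemble less sensitive to noise: a single stump trained on an unlucky sample can be wrong, but it is outvoted by the others.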

Using K-Means Clustering to Quantize Dataset Samples (Part 1)


Clustering algorithms are used to analyze data in an unsupervised fashion, in cases when labels are not available or to get new insights about the dataset. The K-Means algorithm is one of the oldest clustering algorithms, developed several decades ago but still applied in Machine Learning tasks. One way to use this algorithm is vector quantization, a process that reduces the dimensionality of the analyzed data. In this post, I’m going to write a simple implementation of K-Means and apply it to the Wrist-worn Accelerometer Dataset.

Author: Ilia Zaitsev.
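
As a small illustration of K-Means used for vector quantization, the sketch below runs the standard assignment and update steps in pure Python and then replaces every point with the index of its nearest centroid. It is an assumption-laden toy, not the post's implementation: the function names are hypothetical, the initial centroids are passed in explicitly to keep the run deterministic, and the readings are made up rather than taken from the accelerometer dataset.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(centroids: Sequence[Point], point: Point) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], point))


def k_means(points: List[Point], initial: List[Point], iterations: int = 10) -> List[Point]:
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(initial)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for point in points:
            clusters[nearest(centroids, point)].append(point)
        # New centroid = per-axis mean; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


def quantize(centroids: Sequence[Point], points: List[Point]) -> List[int]:
    """Vector quantization: replace each point with its centroid index."""
    return [nearest(centroids, point) for point in points]


# Toy 3-axis readings (made up for illustration).
readings = [(0.1, 0.0, 1.0), (0.0, 0.1, 0.9), (0.9, 1.0, 0.1), (1.0, 0.9, 0.0)]
codebook = k_means(readings, initial=[readings[0], readings[2]])
print(quantize(codebook, readings))
```

After quantization, each three-value reading is represented by a single code index, which is the sense in which the data becomes lower-dimensional.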